#查询改写与谓词优化

# 冗余查询改写

考虑如下语句:

SELECT AREA_NO FROM (SELECT * FROM AREA);

从语句内容可以看出,该SQL语句的目标是获取AERA表的AREA_NO列,但多出一个冗余的子查询,这将导致SQL引擎执行SQL时查询出多余的投影列。

这种冗余的写法增加了执行器的负担,因此YashanDB会将上述语句改写成如下类似结构:

SELECT AREA_NO FROM AREA;

# "IN + 子查询"改写

对于 "IN + 子查询" 的语句,YashanDB则会改写成半连接。因为如果按照原SQL的语义去执行,则每次从BRANCHES表里取出一次数据时,都要完整执行一遍AREA的子查询。而改写后的语句,将会大幅减少AREA表的扫描次数,从而提升性能。如下:

EXPLAIN SELECT * FROM branches b WHERE b.area_no IN (SELECT area_no FROM area);

PLAN_DESCRIPTION                                                 
---------------------------------------------------------------- 
SQL hash value: 2158759181                                      
Optimizer: ADOPT_C                                              
                                                                
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
| Id | Operation type                 | Name                 | Owner      | Rows     | Cost(%CPU)  | Partition info                 |
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
|  0 | SELECT STATEMENT               |                      |            |          |             |                                |
|  1 |  NESTED LOOPS SEMI             |                      |            |    100000|      690( 0)|                                |
|  2 |   TABLE ACCESS FULL            | BRANCHES             | SYS        |    100000|      442( 0)|                                |
|  3 |   TABLE ACCESS BY INDEX ROWID  | AREA                 | SYS        |          |             |                                |
|* 4 |    INDEX UNIQUE SCAN           | SYS_C_111            | SYS        |         1|      149( 0)|                                |
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
                                                                
Operation Information (identified by operation id):             
---------------------------------------------------             
                                                                
   4 - Predicate : access("AREA"."AREA_NO" = "B"."AREA_NO")     

# "EXISTS + 子查询" 改写

与 "IN + 子查询" 类似,"exists + 子查询"的组合也可能被改写成半连接,例如:

EXPLAIN SELECT * FROM branches b WHERE EXISTS (SELECT area_no FROM area a WHERE b.area_no = a.area_no);

PLAN_DESCRIPTION                                                 
---------------------------------------------------------------- 
SQL hash value: 3585099968                                      
Optimizer: ADOPT_C                                              
                                                                
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
| Id | Operation type                 | Name                 | Owner      | Rows     | Cost(%CPU)  | Partition info                 |
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
|  0 | SELECT STATEMENT               |                      |            |          |             |                                |
|  1 |  NESTED LOOPS SEMI             |                      |            |    100000|      690( 0)|                                |
|  2 |   TABLE ACCESS FULL            | BRANCHES             | SYS        |    100000|      442( 0)|                                |
|  3 |   TABLE ACCESS BY INDEX ROWID  | AREA                 | SYS        |          |             |                                |
|* 4 |    INDEX UNIQUE SCAN           | SYS_C_111            | SYS        |         1|      149( 0)|                                |
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
                                                                
Operation Information (identified by operation id):             
---------------------------------------------------             
                                                                
   4 - Predicate : access("A"."AREA_NO" = "B"."AREA_NO")        

# "ANY/All + 子查询" 改写

当比较运算符(>、>=、<、<=) 右边为 any/all时,any 子查询会被改写成半连接,all子查询改写成反连接,例如:

EXPLAIN SELECT product FROM sales_info_hash s WHERE amount > ANY(SELECT revenue_total FROM finance_info f WHERE s.branch = f.branch);

PLAN_DESCRIPTION                                                 
---------------------------------------------------------------- 
SQL hash value: 777020315                                       
Optimizer: ADOPT_C                                              
                                                                
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
| Id | Operation type                 | Name                 | Owner      | Rows     | Cost(%CPU)  | Partition info                 |
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
|  0 | SELECT STATEMENT               |                      |            |          |             |                                |
|* 1 |  HASH JOIN SEMI                |                      |            |      2660|     2056( 0)|                                |
|  2 |   PART SCAN ALL                |                      |            |    100000|      442( 0)| [0,1]                          |
|  3 |    TABLE ACCESS FULL           | SALES_INFO_HASH      | SYS        |    100000|      442( 0)|                                |
|  4 |   TABLE ACCESS FULL            | FINANCE_INFO         | SYS        |    100000|      442( 0)|                                |
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
                                                                
Operation Information (identified by operation id):             
---------------------------------------------------             
                                                                
   1 - Predicate : access("S"."BRANCH" = "F"."BRANCH")          
                   filter("S"."AMOUNT" > "F"."REVENUE_TOTAL")   


EXPLAIN SELECT product FROM sales_info_hash WHERE amount > ALL(SELECT revenue_total FROM finance_info);

PLAN_DESCRIPTION
---------------------------------------------------------------- 
SQL hash value: 3243986469                                      
Optimizer: ADOPT_C                                              
                                                                
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
| Id | Operation type                 | Name                 | Owner      | Rows     | Cost(%CPU)  | Partition info                 |
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
|  0 | SELECT STATEMENT               |                      |            |          |             |                                |
|* 1 |  NESTED LOOPS ANTI             |                      |            |         1|   664389( 0)|                                |
|  2 |   PART SCAN ALL                |                      |            |    100000|      442( 0)| [0,1]                          |
|  3 |    TABLE ACCESS FULL           | SALES_INFO_HASH      | SYS        |    100000|      442( 0)|                                |
|  4 |   VIEW                         |                      |            |    100000|      447( 0)|                                |
|  5 |    TABLE ACCESS FULL           | FINANCE_INFO         | SYS        |    100000|      442( 0)|                                |
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
                                                                
Operation Information (identified by operation id):             
---------------------------------------------------             
                                                                
   1 - Predicate : filter(INVERT("SALES_INFO_HASH"."AMOUNT" > "VSQ$1@SEL$0"."REVENUE_TOTAL"))

# 谓词相关处理

# 谓词组合优化

谓词重组包括合并和扩展两类:

  • 谓词合并,去除冗余的谓词条件

    多个冗余的谓词,合并成一个。例如: A = B AND A != B 谓词,合并为 恒FALSE。

  • 谓词扩展,根据关系代数公式构造成等价高效的谓词条件

    根据已有条件,构造出新的谓词。例如:A > B AND B > C,可以推导出 A > C条件,在某些场景下,新构造出来的A > C能够将笛卡尔连接转换成内连接。

# 谓词下推

  • 谓词下推就是在不改变谓词语义的情况下将谓词放置到效率最高的地方。很显然谓词所启的作用越早,效率越高。

  • 如索引相关的放置到索引上,存储引擎可以根据该谓词减小返回的结果集大小,这比让存储引擎返回全部索引再由SQL引擎过滤结果集更优。