#查询改写与谓词优化
# 冗余查询改写
思考如下语句:
SELECT AREA_NO FROM (SELECT * FROM AREA);
从语句内容可以看出,该SQL语句的目标是获取AERA表的AREA_NO列,但多出一个冗余的子查询,这将导致SQL引擎执行SQL时查询出多余的投影列。
这种冗余的写法增加了执行器的负担,因此YashanDB会将上述语句改写成如下类似结构:
SELECT AREA_NO FROM AREA;
# “IN / NOT IN + 子查询”改写
对于“IN + 子查询”的语句,当查询中不包含AGGR、GROUP BY、CONNECT BY、WINDOW FUNCTION时,YashanDB会将其改写成半连接或反连接。因为如果按照原SQL的语义去执行,则每次从BRANCHES表里取出一次数据时都要完整执行一遍AREA的子查询。而改写后的语句,将会大幅减少AREA表的扫描次数,从而提升性能。
示例
EXPLAIN SELECT * FROM branches b WHERE b.area_no IN (SELECT area_no FROM area);
PLAN_DESCRIPTION
----------------------------------------------------------------
SQL hash value: 2158759181
Optimizer: ADOPT_C
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
| Id | Operation type | Name | Owner | Rows | Cost(%CPU) | Partition info |
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
| 0 | SELECT STATEMENT | | | | | |
| 1 | NESTED LOOPS SEMI | | | 101| 179( 0)| |
| 2 | TABLE ACCESS FULL | BRANCHES | REGRESS | 100000| 132( 0)| |
|* 3 | INDEX UNIQUE SCAN | SYS_C_17 | REGRESS | 1| 6( 0)| |
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
Operation Information (identified by operation id):
---------------------------------------------------
3 - Predicate : access("AREA"."AREA_NO" = "B"."AREA_NO")
EXPLAIN SELECT * FROM branches b WHERE b.area_no NOT IN (SELECT area_no FROM area);
PLAN_DESCRIPTION
----------------------------------------------------------------
SQL hash value: 708164047
Optimizer: ADOPT_C
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
| Id | Operation type | Name | Owner | Rows | Cost(%CPU) | Partition info |
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
| 0 | SELECT STATEMENT | | | | | |
|* 1 | NESTED LOOPS ANTI | | | 1| 1463780( 0)| |
| 2 | TABLE ACCESS FULL | BRANCHES | REGRESS | 100000| 132( 0)| |
| 3 | INDEX FAST FULL SCAN | SYS_C_17 | REGRESS | 100000| 92( 0)| |
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
Operation Information (identified by operation id):
---------------------------------------------------
1 - Predicate : filter(INVERT("B"."AREA_NO" <> "AREA"."AREA_NO"))
# “EXISTS / NOT EXISTS + 子查询”改写
与“IN / NOT IN + 子查询”类似,当查询中不包含AGGR、GROUP BY、CONNECT BY、WINDOW FUNCTION时,“EXISTS / NOT EXISTS + 子查询”的组合也可能被改写成半连接或反连接。
示例
EXPLAIN SELECT * FROM branches b WHERE EXISTS (SELECT area_no FROM area a WHERE b.area_no = a.area_no);
PLAN_DESCRIPTION
----------------------------------------------------------------
SQL hash value: 3585099968
Optimizer: ADOPT_C
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
| Id | Operation type | Name | Owner | Rows | Cost(%CPU) | Partition info |
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
| 0 | SELECT STATEMENT | | | | | |
| 1 | NESTED LOOPS SEMI | | | 101| 179( 0)| |
| 2 | TABLE ACCESS FULL | BRANCHES | REGRESS | 100000| 132( 0)| |
|* 3 | INDEX UNIQUE SCAN | SYS_C_23 | REGRESS | 1| 6( 0)| |
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
Operation Information (identified by operation id):
---------------------------------------------------
3 - Predicate : access("A"."AREA_NO" = "B"."AREA_NO")
EXPLAIN SELECT * FROM branches b WHERE NOT EXISTS (SELECT area_no FROM area a WHERE b.area_no = a.area_no);
PLAN_DESCRIPTION
----------------------------------------------------------------
SQL hash value: 3371762311
Optimizer: ADOPT_C
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
| Id | Operation type | Name | Owner | Rows | Cost(%CPU) | Partition info |
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
| 0 | SELECT STATEMENT | | | | | |
| 1 | NESTED LOOPS ANTI | | | 99900| 179( 0)| |
| 2 | TABLE ACCESS FULL | BRANCHES | REGRESS | 100000| 132( 0)| |
|* 3 | INDEX UNIQUE SCAN | SYS_C_23 | REGRESS | 1| 6( 0)| |
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
Operation Information (identified by operation id):
---------------------------------------------------
3 - Predicate : access("A"."AREA_NO" = "B"."AREA_NO")
# “ANY/All + 子查询”改写
当比较运算符(>、>=、<、<=) 右边为any/all时,any子查询会被改写成半连接,all子查询改写成反连接。
示例
EXPLAIN SELECT product FROM sales_info_hash s WHERE amount > ANY(SELECT revenue_total FROM finance_info f WHERE s.branch = f.branch);
PLAN_DESCRIPTION
----------------------------------------------------------------
SQL hash value: 777020315
Optimizer: ADOPT_C
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
| Id | Operation type | Name | Owner | Rows | Cost(%CPU) | Partition info |
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
| 0 | SELECT STATEMENT | | | | | |
|* 1 | HASH JOIN SEMI | | | 2660| 2056( 0)| |
| 2 | PART SCAN ALL | | | 100000| 442( 0)| [0,1] |
| 3 | TABLE ACCESS FULL | SALES_INFO_HASH | SYS | 100000| 442( 0)| |
| 4 | TABLE ACCESS FULL | FINANCE_INFO | SYS | 100000| 442( 0)| |
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
Operation Information (identified by operation id):
---------------------------------------------------
1 - Predicate : access("S"."BRANCH" = "F"."BRANCH")
filter("S"."AMOUNT" > "F"."REVENUE_TOTAL")
EXPLAIN SELECT product FROM sales_info_hash WHERE amount > ALL(SELECT revenue_total FROM finance_info);
PLAN_DESCRIPTION
----------------------------------------------------------------
SQL hash value: 3243986469
Optimizer: ADOPT_C
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
| Id | Operation type | Name | Owner | Rows | Cost(%CPU) | Partition info |
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
| 0 | SELECT STATEMENT | | | | | |
|* 1 | NESTED LOOPS ANTI | | | 1| 664389( 0)| |
| 2 | PART SCAN ALL | | | 100000| 442( 0)| [0,1] |
| 3 | TABLE ACCESS FULL | SALES_INFO_HASH | SYS | 100000| 442( 0)| |
| 4 | VIEW | | | 100000| 447( 0)| |
| 5 | TABLE ACCESS FULL | FINANCE_INFO | SYS | 100000| 442( 0)| |
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
Operation Information (identified by operation id):
---------------------------------------------------
1 - Predicate : filter(INVERT("SALES_INFO_HASH"."AMOUNT" > "VSQ$1@SEL$0"."REVENUE_TOTAL"))
# 谓词相关处理
# 谓词组合优化
谓词重组包括合并和扩展两类:
谓词合并,去除冗余的谓词条件
将多个冗余的谓词合并成一个。例如:
A = B AND A != B
谓词,合并为恒FALSE
。谓词扩展,根据关系代数公式构造成等价高效的谓词条件
根据已有条件,构造出新的谓词。例如:
A > B AND B > C
可以推导出A > C
条件,在某些场景下,新构造出来的A > C
能够将笛卡尔连接转换成内连接。
# 谓词下推
谓词下推是指在不改变谓词语义的情况下将谓词放置到效率最高的地方。很显然谓词越早发挥作用,效率越高。
“因地制宜”合理放置谓词,例如将索引相关的谓词放置到索引上,存储引擎可以根据该谓词减小返回的结果集大小,相较于让存储引擎返回全部索引再由SQL引擎过滤结果集更优。