MySQL 8.0 QueryResolver 源码笔记
前言
核心逻辑在SELECT_LEX::prepare()中,按照源码注释,其包括的大致内容如下:
/**
Prepare query block for optimization.
Resolve table and column information.
Resolve all expressions (item trees), ie WHERE clause, join conditions,
GROUP BY clause, HAVING clause, ORDER BY clause, LIMIT clause.
Prepare all subqueries recursively as part of resolving the expressions.
Apply permanent transformations to the abstract syntax tree, such as
semi-join transformation, derived table transformation, elimination of
constant values and redundant clauses (e.g ORDER BY, GROUP BY).
**/
通俗地讲,Resolver可以分为两部分:Semantic check和Rewriter。
Semantic check
这个概念来自于Oracle,主要负责将field、db甚至host等资源信息去和真实数据库内的对象去做绑定。举个例子,Parser是不关心field是否真实存在的,只要符合格式,随便写个字符串它都会语法解析成Item_field对象;而Semantic check步骤正是去结合数据库元信息去检查字段、表名、数据库名等是否合法真实存在。
需注意的是,基本的用户SELECT权限检查在MySQL的实现中放在了 precheck() 函数中,列权限检查放在了 SELECT_LEX::prepare() 的 setup_fields() 函数中。即,对于MySQL的实现,Semantic check仅是个逻辑上的流程概念。在一些其他数据库的实现中,可能会更直观地看到SemanticChecker这样的逻辑或者插件函数。
Table和Fields的绑定(逻辑分散在Sql_cmd_dml::prepare/SELECT_LEX::prepare等地方):
-
Parser时会找到所有的表,在Sql_cmd_dml::prepare阶段会按照表名和当前session所在的database(如果显式指定了则以用户指定的为准)去执行 open tables,如果表不合法则打开失败。
-
Item_field绑定Field的核心是set_field(Field*)函数。而 set_field 的主要调用逻辑都是 fix_field()。fix_field 的注释较为缺乏,实际用途是根据不同的resolution结果去绑定一个Item_field到物理或生成的Field上。fix_field会在多个地方被调用的原因是在 SQL Rewrite 的过程中,列发生变化,比如derive table优化或subquery优化,生成表的列优化为物理表的列等。这部分逻辑非常分散且复杂
Rewriter
剩下的 Resolve all expressions 、subquery、消除常量或冗余语句等都属于Rewriter过程,在很多MySQL分析中还是会把其称作Resolver。
TABLE
-
propagate_nullability: nullable table指的是table可能包含全为null的row,根据 JOIN关系null row可以被传播。如果能确定一个table为nullable(更准确的变量或方法命名叫contain_null_row)会使得一些优化退化,比如access method不能为EQ_REF、outer join不能优化为inner join等。
-
setup_tables():
-
leaf_tables:base table对应的是视图概念,获取leaf_tables即是透过视图找到视图背后关联的基表。但是base table一般指产生视图的物理表,一些常量表(如 SELECT 1+1;)不是物理表,但也算leaf table。因此leaf table相对来说概念更广一些,尽管绝大多数情况下都是物理表。
-
处理table hint 和 index hint。
-
-
resolve_placeholder_tables() : derived table, view OR table function
-
resolve_derived(): prepare derived table (SELECT_LEX_UNIT)
-
merge_derived(): 尝试merge derive table到outer query block。
![](/Users/lhfcws/Library/Application Support/marktext/images/2019-09-25-20-30-08-image.png)
-
setup_materialized_derived(): 对于剩下不能采用 merge 算法的 derived table ,会转为materialize 物化方式去处理。但此时只是做一些变量设置等预处理,实际的物化执行是在executor阶段执行。
-
用 materialize 算法处理 leaf tables 中的非物理表, resolve_derived() 之后对其 setup_materialized_derived() 。
-
如果 query block 中有 table function,整个过程会处理两遍。第一遍会跳过 table function 的 table ,第二遍才专门再对table function 的 table 执行一遍上述逻辑。这里的考虑应该是先 resolve 了外部环境(相对于table function),因为有可能函数参数会有依赖外部的 derived table。这里可以参见其代码里注释的一个例子。
/* A table function TF may depend on a previous derived table DT in the same FROM clause. Thus DT must be resolved before TF. It is the case, as the loops below progress from left to right in FROM. Alas this progress misses formerly-nested derived tables which are handled last, by the special branch 'if (!first_execution)'. So, assume a prepared statement: - SELECT FROM (SELECT FROM (SELECT ...) AS DT) AS DT1, JSON_TABLE(DT1.col); - DT1 is resolved, which materializes DT - JSON_TABLE is resolved - DT1 is merged Now we execute the statement: - in the first loop in the function, DT1 is not resolved again, as it was merged, and DT is not reached - we must thus defer resolution of JSON_TABLE - until DT is reached and resolved by the special branch 'if (!first_execution)' for formerly-nested derived tables - and then we can resolve JSON_TABLE. So, we run the code in this function twice: a first time for all non-JSON_TABLE tables, a second time for JSON_TABLE. */
SELECT fields
- setup_wilds(): 将 SELECT * 的通配符展开
- setup_fields(): 这里的fields不是指所有字段,而是select_fields,在阅读相关MySQL代码时需区分字段field和select field 两个概念,命名并不是很直观。这里的核心是fix_field() 和 列权限检查。
WHERE conds/ Join conds
-
setup_conds(): 核心是 where_cond->fix_fields() 和 setup_join_cond()
-
setup_join_cond():
-
如果有nested join,则递归 setup_join_cond(child_join)
-
如果存在A [LEFT] JOIN B ON join_cond,则 TABLE_LIST->join_cond 在解析后不为空,但优化过程中 inner join 的 join cond 会被优化进where_cond 里然后变为NULL。如果 join_cond 存在,也对其 fix_fields()
-
-
. setup_having : having_cond->fix_fields() 。having没有专门的函数括起
GROUPBY / ORDERBY
-
setup_order():
-
find_order_in_list(): 尝试在select fields里去寻找可以映射的列,否则就得在最后投影的all fields里加上当前列。同时该函数里也做了 fix_fields() 。
-
Check: ORDER BY for agg expression: Aggregated expressions in ORDER BY are not supported by SQL standard, but MySQL has some limited support for them.
select count(l_returnflag) as sum_qty from lineitem order by sum(l_quantity);
-
-
setup_group(): 和 setup_order() 基本类似。但GROUP BY明确不支持 agg的Item。
Subquery
-
remove_redundant_subquery_clauses() : 主要是去掉一些不规范的多余的语句,比如:
1. For a scalar subquery without LIMIT: ORDER BY is redundant, as the number of rows to order must be 1. 2. Remove GROUP BY if there are no aggregate functions, no HAVING clause, no ROLLUP and no windowing functions. 3. For a table subquery predicate (IN/ANY/ALL/EXISTS/etc): since it does not support LIMIT the following clauses are redundant: ORDER BY DISTINCT GROUP BY if there are no aggregate functions and no HAVING clause. ...
-
resolve_subquery():
Perform early unconditional subquery transformations: - Conver subquery predicate into semi-join, or - Mark the subquery for execution using materialization, or - Perform IN->EXISTS transformation, or - Perform more/less ALL/ANY -> MIN/MAX rewrite - Substitute trivial scalar-context subquery with its value
其中 convert semi join 在这一步更准确应该是 Prepare conversion,这一步并不会做实际的 rewrite semijoin 操作,而是去找到合法的可以转换的 subquery ,压入 sj_candidates 数组中,等待后面的 flatten_subqueries() 处理。后面列的几个调用实际实现在
Item_in_subselect::select_in_like_transformer
和Item_in_subselect::single_value_transformer
中。 -
flatten_subqueries(): bottom-up convert IN-subquery into semi-join
TODO
Others
-
setup_ftfuncs: 全文索引的prepare工作
-
remove_unused_windows:remove any unused explicit window
definitions 消除无用多余的窗口函数。 -
resolve_rollup:当 OLAP_TYPE=ROLLUP_TYPE 时才会调用,Resolve items for rollup processing。