05 - SQL Statement Execution Flow

2021-11-04 10:04:57

SQL statement execution flow

The whole flow is driven by the exec_simple_query function, whose skeleton is shown below:

/*
 * exec_simple_query
 *
 * Execute a "simple Query" protocol message.
 */
static void
exec_simple_query(const char *query_string)
{
	...
	// Obtain the raw parse tree
	/*
	 * Do basic parsing of the query or queries (this should be safe even if
	 * we are in aborted transaction state!)
	 */
	parsetree_list = pg_parse_query(query_string);

	...
	// Process each SQL statement in turn
	/*
	 * Run through the raw parsetree(s) and process each one.
	 */
	foreach(parsetree_item, parsetree_list)
	{
		...
		
		// Analyze and rewrite the raw parse tree to produce query trees
		querytree_list = pg_analyze_and_rewrite(parsetree, query_string,
												NULL, 0, NULL);
		// Optimize the query trees and generate execution plans
		plantree_list = pg_plan_queries(querytree_list,
										CURSOR_OPT_PARALLEL_OK, NULL);

		...
		
		// Execute the statement
		/*
		 * Run the portal to completion, and then drop it (and the receiver).
		 */
		(void) PortalRun(portal,
						 FETCH_ALL,
						 true,	/* always top level */
						 true,
						 receiver,
						 receiver,
						 completionTag);

		...
	}
	...
}

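Query analysis and rewriting

Lexical and syntactic parsing

Lexing and parsing are done with Flex and Bison; see https://my.oschina.net/Greedxuji/blog/4290160 for details.

Query analysis and rewriting

After lexical and syntactic parsing, an SQL statement becomes a raw parse tree. Query analysis then analyzes and rewrites that raw tree, turning it into one or more query trees.

This work is done mainly in the pg_analyze_and_rewrite function, in two steps: parse analysis followed by rule rewriting.

The code skeleton is as follows: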

/*
 * Given a raw parsetree (gram.y output), and optionally information about
 * types of parameter symbols ($n), perform parse analysis and rule rewriting.
 *
 * A list of Query nodes is returned, since either the analyzer or the
 * rewriter might expand one query to several.
 *
 * NOTE: for reasons mentioned above, this must be separate from raw parsing.
 */
List *
pg_analyze_and_rewrite(RawStmt *parsetree, const char *query_string,
					   Oid *paramTypes, int numParams,
					   QueryEnvironment *queryEnv)
{
	Query	   *query;
	List	   *querytree_list;

	TRACE_POSTGRESQL_QUERY_REWRITE_START(query_string);

	/*
	 * (1) Perform parse analysis.
	 */
	if (log_parser_stats)
		ResetUsage();

	// Parse analysis of the raw parse tree
	query = parse_analyze(parsetree, query_string, paramTypes, numParams,
						  queryEnv);

	if (log_parser_stats)
		ShowUsage("PARSE ANALYSIS STATISTICS");

	// Rewrite the query tree produced by parse analysis
	/*
	 * (2) Rewrite the queries, as necessary
	 */
	querytree_list = pg_rewrite_query(query);

	TRACE_POSTGRESQL_QUERY_REWRITE_DONE(query_string);

	return querytree_list;
}

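Query analysis: parse_analyze

Query analysis converts the raw parse tree into a query tree. Because the raw parse tree is a tree structure, the analyzer walks its nodes and applies the matching handler to each one.

The basic call stack is shown below. It covers every statement type that needs analysis, SELECT included; each SQL statement is simply routed to the transform function for its node type.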

parse_analyze
	->transformTopLevelStmt
		->transformOptionalSelectInto
			->transformStmt
				->transformInsertStmt
				->transformDeleteStmt
				->transformUpdateStmt
				->transformSelectStmt
				->transformDeclareCursorStmt
				->transformExplainStmt
				->transformCreateTableAsStmt
				->transformCallStmt

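Key functions

The example used here is “ SELECT * FROM A_TBL,B_TBL WHERE xx = xx ”.

Executing this command starts with a call to transformSelectStmt.

A SELECT command carries eight kinds of information: WITH, FROM, the target list, WHERE, HAVING, ORDER BY, GROUP BY, and DISTINCT. Each kind has its own handler function. The code is as follows: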

static Query *
transformSelectStmt(ParseState *pstate, SelectStmt *stmt)
{
	Query	   *qry = makeNode(Query);
	Node	   *qual;
	ListCell   *l;

	qry->commandType = CMD_SELECT;

	/* process the WITH clause independently of all else */
	if (stmt->withClause)
	{
		qry->hasRecursive = stmt->withClause->recursive;
		qry->cteList = transformWithClause(pstate, stmt->withClause);
		qry->hasModifyingCTE = pstate->p_hasModifyingCTE;
	}

	/* Complain if we get called from someplace where INTO is not allowed */
	if (stmt->intoClause)
		ereport(ERROR,
				(errcode(ERRCODE_SYNTAX_ERROR),
				 errmsg("SELECT ... INTO is not allowed here"),
				 parser_errposition(pstate,
									exprLocation((Node *) stmt->intoClause))));

	/* make FOR UPDATE/FOR SHARE info available to addRangeTableEntry */
	pstate->p_locking_clause = stmt->lockingClause;

	/* make WINDOW info available for window functions, too */
	pstate->p_windowdefs = stmt->windowClause;

	/* process the FROM clause */
	transformFromClause(pstate, stmt->fromClause);

	/* transform targetlist */
	qry->targetList = transformTargetList(pstate, stmt->targetList,
										  EXPR_KIND_SELECT_TARGET);

	/* mark column origins */
	markTargetListOrigins(pstate, qry->targetList);

	/* transform WHERE */
	qual = transformWhereClause(pstate, stmt->whereClause,
								EXPR_KIND_WHERE, "WHERE");

	/* initial processing of HAVING clause is much like WHERE clause */
	qry->havingQual = transformWhereClause(pstate, stmt->havingClause,
										   EXPR_KIND_HAVING, "HAVING");

	/*
	 * Transform sorting/grouping stuff.  Do ORDER BY first because both
	 * transformGroupClause and transformDistinctClause need the results. Note
	 * that these functions can also change the targetList, so it's passed to
	 * them by reference.
	 */
	qry->sortClause = transformSortClause(pstate,
										  stmt->sortClause,
										  &qry->targetList,
										  EXPR_KIND_ORDER_BY,
										  false /* allow SQL92 rules */ );

	qry->groupClause = transformGroupClause(pstate,
											stmt->groupClause,
											&qry->groupingSets,
											&qry->targetList,
											qry->sortClause,
											EXPR_KIND_GROUP_BY,
											false /* allow SQL92 rules */ );

	if (stmt->distinctClause == NIL)
	{
		qry->distinctClause = NIL;
		qry->hasDistinctOn = false;
	}
	else if (linitial(stmt->distinctClause) == NULL)
	{
		/* We had SELECT DISTINCT */
		qry->distinctClause = transformDistinctClause(pstate,
													  &qry->targetList,
													  qry->sortClause,
													  false);
		qry->hasDistinctOn = false;
	}
	else
	{
		/* We had SELECT DISTINCT ON */
		qry->distinctClause = transformDistinctOnClause(pstate,
														stmt->distinctClause,
														&qry->targetList,
														qry->sortClause);
		qry->hasDistinctOn = true;
	}

	/* transform LIMIT */
	qry->limitOffset = transformLimitClause(pstate, stmt->limitOffset,
											EXPR_KIND_OFFSET, "OFFSET");
	qry->limitCount = transformLimitClause(pstate, stmt->limitCount,
										   EXPR_KIND_LIMIT, "LIMIT");

	/* transform window clauses after we have seen all window functions */
	qry->windowClause = transformWindowDefinitions(pstate,
												   pstate->p_windowdefs,
												   &qry->targetList);

	/* resolve any still-unresolved output columns as being type text */
	if (pstate->p_resolve_unknowns)
		resolveTargetListUnknowns(pstate, qry->targetList);

	qry->rtable = pstate->p_rtable;
	qry->jointree = makeFromExpr(pstate->p_joinlist, qual);

	qry->hasSubLinks = pstate->p_hasSubLinks;
	qry->hasWindowFuncs = pstate->p_hasWindowFuncs;
	qry->hasTargetSRFs = pstate->p_hasTargetSRFs;
	qry->hasAggs = pstate->p_hasAggs;

	foreach(l, stmt->lockingClause)
	{
		transformLockingClause(pstate, qry,
							   (LockingClause *) lfirst(l), false);
	}

	assign_query_collations(pstate, qry);

	/* this must be done after collations, for reliable comparison of exprs */
	if (pstate->p_hasAggs || qry->groupClause || qry->groupingSets || qry->havingQual)
		parseCheckAggregates(pstate, qry);

	return qry;
}

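FROM handling:

transformFromClause

FROM handling walks the fromlist and passes each "base table" to transformFromClauseItem. The item may be a plain table or a subquery, as in select * from aa,(select * from bb) as BB; so it is handled according to its node type:

RangeVar, via transformTableEntry: an ordinary base table. Its information is stored directly in the pstate->p_rtable list, and results are later displayed in the order of that list.
RangeSubselect, via transformRangeSubselect: a subquery used as a table. Because it is a complete SELECT statement, it ends up going through transformStmt again. The result is stored in pstate->p_rtable, with the rtekind field set to RTE_SUBQUERY to tell it apart.
RangeFunction, via transformRangeFunction: the function is looked up and addRangeTableEntryForFunction is eventually called. The result is stored in pstate->p_rtable, with rtekind set to RTE_FUNCTION.
RangeTableFunc, via transformRangeTableFunc: handles XMLTABLE. The result is stored in pstate->p_rtable, with rtekind set to RTE_TABLEFUNC.
RangeTableSample, via a recursive call to transformFromClauseItem: the relation being sampled is processed first, and the TABLESAMPLE clause is then attached to its entry.
JoinExpr, via transformFromClauseItem: a join. The left and right nodes are processed recursively to obtain their base-table information, and a new RTE is created and stored in pstate->p_rtable, with rtekind set to RTE_JOIN.

Once processing is complete, all table names are added to pstate->p_namespace. That list is used later to resolve column names: to expand select * into every column, or to check whether a requested column exists.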
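To make the bookkeeping concrete, the following is a minimal sketch of a range table whose entries are appended in FROM order and tagged by kind. ToyRte, ToyParseState, toy_add_rte, and toy_lookup are hypothetical names invented for this illustration; they are not PostgreSQL's RangeTblEntry, ParseState, or any real parser function, and the real structures carry far more information.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified mirror of the rtekind tags mentioned above. */
typedef enum
{
	TOY_RTE_RELATION,
	TOY_RTE_SUBQUERY,
	TOY_RTE_FUNCTION,
	TOY_RTE_JOIN
} ToyRteKind;

typedef struct
{
	ToyRteKind	kind;
	const char *alias;
} ToyRte;

#define TOY_MAX_RTES 16

/* Hypothetical stand-in for the parse state: just the range table. */
typedef struct
{
	ToyRte		rtable[TOY_MAX_RTES];
	int			nrtes;
} ToyParseState;

/* Append an entry; return its 1-based index, or 0 if the table is full. */
static int
toy_add_rte(ToyParseState *ps, ToyRteKind kind, const char *alias)
{
	if (ps->nrtes >= TOY_MAX_RTES)
		return 0;
	ps->rtable[ps->nrtes].kind = kind;
	ps->rtable[ps->nrtes].alias = alias;
	return ++ps->nrtes;
}

/* Resolve a table name; return its 1-based index, or 0 if unknown. */
static int
toy_lookup(const ToyParseState *ps, const char *alias)
{
	for (int i = 0; i < ps->nrtes; i++)
		if (strcmp(ps->rtable[i].alias, alias) == 0)
			return i + 1;
	return 0;
}
```

For select * from aa,(select * from bb) as BB; one would call toy_add_rte twice, once with TOY_RTE_RELATION for aa and once with TOY_RTE_SUBQUERY for BB; a later toy_lookup plays the role that the namespace list plays when column names are resolved.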

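Obtaining the target columns:

transformTargetList

When all columns are requested, * has to be expanded into the full column list, as in “ SELECT * FROM A_TBL,B_TBL WHERE xx = xx ”. During column-name resolution, each name is checked against the tables in pstate->p_namespace; a column that does not exist raises an error.

Target items fall into plain column references and indirection (dotted) references:

Plain column names only: transformed and stored directly in qry->targetList.
Column reference containing a *, handled by ExpandColumnRefStar: a bare * expands to all columns (SELECT *, dname FROM emp, dept). For a qualified *, the function also checks that the qualified name has at most four parts (SELECT emp.*, dname FROM emp, dept). The result is stored in qry->targetList.
Indirection ending in .*, handled by ExpandIndirectionStar: the expression is transformed and its columns are verified; if they exist they are stored in qry->targetList.

The corresponding code is as follows: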

List *
transformTargetList(ParseState *pstate, List *targetlist,
					ParseExprKind exprKind)
{
	List	   *p_target = NIL;
	bool		expand_star;
	ListCell   *o_target;

	/* Shouldn't have any leftover multiassign items at start */
	Assert(pstate->p_multiassign_exprs == NIL);

	/* Expand "something.*" in SELECT and RETURNING, but not UPDATE */
	expand_star = (exprKind != EXPR_KIND_UPDATE_SOURCE);

	foreach(o_target, targetlist)
	{
		ResTarget  *res = (ResTarget *) lfirst(o_target);

		/*
		 * Check for "something.*".  Depending on the complexity of the
		 * "something", the star could appear as the last field in ColumnRef,
		 * or as the last indirection item in A_Indirection.
		 */
		if (expand_star)
		{
			if (IsA(res->val, ColumnRef))
			{
				ColumnRef  *cref = (ColumnRef *) res->val;

				if (IsA(llast(cref->fields), A_Star))
				{
					/* It is something.*, expand into multiple items */
					p_target = list_concat(p_target,
										   ExpandColumnRefStar(pstate,
															   cref,
															   true));
					continue;
				}
			}
			else if (IsA(res->val, A_Indirection))
			{
				A_Indirection *ind = (A_Indirection *) res->val;

				if (IsA(llast(ind->indirection), A_Star))
				{
					/* It is something.*, expand into multiple items */
					p_target = list_concat(p_target,
										   ExpandIndirectionStar(pstate,
																 ind,
																 true,
																 exprKind));
					continue;
				}
			}
		}

		/*
		 * Not "something.*", or we want to treat that as a plain whole-row
		 * variable, so transform as a single expression
		 */
		p_target = lappend(p_target,
						   transformTargetEntry(pstate,
												res->val,
												NULL,
												exprKind,
												res->name,
												false));
	}

	...
}
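The star expansion described in this section can be illustrated with a small stand-alone sketch. ToyTable and toy_expand_target are hypothetical names made up for this example; the real ExpandColumnRefStar works on parse nodes and namespace items rather than strings, so this only shows the idea of turning "*" or "t.*" into a list of column names.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical table description: a name plus up to four column names. */
typedef struct
{
	const char *name;
	const char *cols[4];
	int			ncols;
} ToyTable;

/*
 * Expand one target item into out[] and return how many names were written.
 * "*" expands to every column of every table, "t.*" to the columns of t,
 * and anything else is copied as a single item.  Returns -1 when "t.*"
 * names a table that is not in tabs[].
 */
static int
toy_expand_target(const ToyTable *tabs, int ntabs, const char *item,
				  const char **out, int maxout)
{
	int			n = 0;
	size_t		len = strlen(item);

	if (strcmp(item, "*") == 0)
	{
		for (int t = 0; t < ntabs; t++)
			for (int c = 0; c < tabs[t].ncols && n < maxout; c++)
				out[n++] = tabs[t].cols[c];
		return n;
	}

	if (len > 2 && strcmp(item + len - 2, ".*") == 0)
	{
		for (int t = 0; t < ntabs; t++)
		{
			if (strlen(tabs[t].name) == len - 2 &&
				strncmp(tabs[t].name, item, len - 2) == 0)
			{
				for (int c = 0; c < tabs[t].ncols && n < maxout; c++)
					out[n++] = tabs[t].cols[c];
				return n;
			}
		}
		return -1;				/* unknown table */
	}

	if (maxout < 1)
		return 0;
	out[0] = item;				/* ordinary column: a single target entry */
	return 1;
}
```

With tables emp and dept, the item "*" yields the columns of both tables in FROM order, while "emp.*" yields only the columns of emp.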

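WHERE handling:

transformWhereClause

This function handles the WHERE clause. There is no WHERE-specific transformation; the clause goes through the general transformExpr function. A column reference in the condition is handled by the T_ColumnRef branch of transformExpr. When the condition combines several expressions with AND/OR, the T_BoolExpr branch is taken, and transformBoolExpr recurses into each argument until the T_ColumnRef branch is reached again.

The final WHERE result is stored in the jointree: qry->jointree = makeFromExpr(pstate->p_joinlist, qual);. This is why the later plan optimization works on the jointree.

The code is as follows: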

Node *
transformWhereClause(ParseState *pstate, Node *clause,
					 ParseExprKind exprKind, const char *constructName)
{
	Node	   *qual;

	if (clause == NULL)
		return NULL;

	qual = transformExpr(pstate, clause, exprKind);

	qual = coerce_to_boolean(pstate, qual, constructName);

	return qual;
}

HAVING handling:

transformWhereClause

HAVING is processed in the same way as the WHERE clause:

/* initial processing of HAVING clause is much like WHERE clause */
	qry->havingQual = transformWhereClause(pstate, stmt->havingClause,
										   EXPR_KIND_HAVING, "HAVING");

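GROUP BY handling:

transformGroupClause

GROUP BY has to be handled together with ORDER BY. ORDER BY is transformed first (transformSortClause), because transformGroupClause takes the resulting sortClause as input; GROUP BY is transformed afterwards.

ORDER BY handling:

DISTINCT handling:

These two are not covered here.

Rewriting: pg_rewrite_query

The query is rewritten according to the rules defined in pg_rewrite.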

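Logical query optimization

The query tree that comes out of analysis and rewriting is not yet optimal. When SELECT subqueries are deeply nested, the lowest base tables sit far from the root of the tree, which adds to query time. In addition, the nodes of the query tree are independent of one another, so redundant work is possible. Logical optimization addresses both: it builds on relational algebra from database theory, uses the query tree as its carrier, and walks the tree transforming it while keeping the semantics of each syntactic unit and the final result unchanged. The outcome is a query tree without redundancy.

In the code this is implemented in the pg_plan_queries function.

Basic steps and related code:

Utility statements (such as DDL) are not planned.
Non-utility statements are handled by pg_plan_query, which calls planner.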

List *
pg_plan_queries(List *querytrees, int cursorOptions, ParamListInfo boundParams)
{
	List	   *stmt_list = NIL;
	ListCell   *query_list;

	foreach(query_list, querytrees)
	{
		Query	   *query = lfirst_node(Query, query_list);
		PlannedStmt *stmt;

		if (query->commandType == CMD_UTILITY)
		{
			/* Utility commands require no planning. */
			stmt = makeNode(PlannedStmt);
			stmt->commandType = CMD_UTILITY;
			stmt->canSetTag = query->canSetTag;
			stmt->utilityStmt = query->utilityStmt;
			stmt->stmt_location = query->stmt_location;
			stmt->stmt_len = query->stmt_len;
		}
		else
		{
			stmt = pg_plan_query(query, cursorOptions, boundParams);
		}

		stmt_list = lappend(stmt_list, stmt);
	}

	return stmt_list;
}


PlannedStmt *
pg_plan_query(Query *querytree, int cursorOptions, ParamListInfo boundParams)
{
	PlannedStmt *plan;

	...
	/* call the optimizer */
	plan = planner(querytree, cursorOptions, boundParams);

	...

	return plan;
}

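Handling non-utility statements

Non-utility statements are handled in the planner function. If planner_hook is set, the hook function is called; otherwise standard_planner is called by default.

standard_planner processes the query tree recursively, recording global results in the PlannerGlobal structure. It then calls create_plan to build the plan tree from the PlannerInfo, and finally copies the basic information held in PlannerGlobal and PlannerInfo into a PlannedStmt, which it returns.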

PlannedStmt *
planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
{
	PlannedStmt *result;

	if (planner_hook)
		result = (*planner_hook) (parse, cursorOptions, boundParams);
	else
		result = standard_planner(parse, cursorOptions, boundParams);
	return result;
}
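The hook dispatch in planner follows a common function-pointer pattern, sketched below in isolation. ToyPlan, toy_planner, toy_standard_planner, toy_planner_hook, and toy_my_hook are hypothetical names for this illustration only; the sketch assumes nothing about PostgreSQL beyond the if/else shown in the excerpt above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a plan result. */
typedef struct
{
	int			cost;
} ToyPlan;

typedef ToyPlan (*toy_planner_hook_type) (int query_id);

/* NULL means "no extension installed", mirroring the check on planner_hook. */
static toy_planner_hook_type toy_planner_hook = NULL;

/* Default behavior when no hook is set. */
static ToyPlan
toy_standard_planner(int query_id)
{
	ToyPlan		p = {query_id * 10};

	return p;
}

/* Entry point: call the hook if one is installed, else the default. */
static ToyPlan
toy_planner(int query_id)
{
	if (toy_planner_hook)
		return (*toy_planner_hook) (query_id);
	return toy_standard_planner(query_id);
}

/* Example hook: delegate to the default planner, then adjust its result. */
static ToyPlan
toy_my_hook(int query_id)
{
	ToyPlan		p = toy_standard_planner(query_id);

	p.cost += 1;
	return p;
}
```

The design choice worth noting is that a hook usually wraps the default implementation rather than replacing it, so installing toy_my_hook changes the result only by the adjustment it applies.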

PlannedStmt *
standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
{
	...
	
	/* primary planning entry point (may recurse for subqueries) */
	root = subquery_planner(glob, parse, NULL,
							false, tuple_fraction);

	/* Select best Path and turn it into a Plan */
	final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
	best_path = get_cheapest_fractional_path(final_rel, tuple_fraction);

	top_plan = create_plan(root, best_path);

	...
	
	/* build the PlannedStmt result */
	result = makeNode(PlannedStmt);

	result->commandType = parse->commandType;
	...
	result->stmt_len = parse->stmt_len;

	result->jitFlags = PGJIT_NONE;
	...

	return result;
}

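Plan tree generation and optimization

This happens in subquery_planner, which handles the query piece by piece according to type. Plan optimization involves sublink pull-up, function processing, subquery pull-up, and similar operations. Because generation and optimization of the plan tree are handled per type and run together, they are described together here.

Processing CTEs (common table expressions)

SS_process_ctes: processes the CTE clauses (WITH) of the query. A CTE is a temporary result set; it lets the result of a subquery be used as an independent result set. The function walks the cteList and plans each sub-result-set by calling subquery_planner again; the resulting plans are stored in the root->glob->subplans list.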

void
SS_process_ctes(PlannerInfo *root)
{
	ListCell   *lc;

	Assert(root->cte_plan_ids == NIL);

	foreach(lc, root->parse->cteList)
	{
		...
		
		/*
		 * Generate Paths for the CTE query.  Always plan for full retrieval
		 * --- we don't have enough info to predict otherwise.
		 */
		subroot = subquery_planner(root->glob, subquery,
								   root,
								   cte->cterecursive, 0.0);

		...

		plan = create_plan(subroot, best_path);

		...
		
		/*
		 * Add the subplan and its PlannerInfo to the global lists.
		 */
		root->glob->subplans = lappend(root->glob->subplans, plan);
		root->glob->subroots = lappend(root->glob->subroots, subroot);
		
		...
	}
}

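Sublink pull-up

pull_up_sublinks: converts ANY (sub-SELECT) and EXISTS in the command into JOINs, so that the sublink is merged with the parent query and both are optimized together.

An ANY sublink is converted into a semi-join; the conversion applies only in WHERE or JOIN/ON clauses.

An EXISTS or NOT EXISTS sublink is converted into a semi-join or an anti-semi-join, respectively.

Basic flow

Because the WHERE-related nodes are stored in the jointree, sublink pull-up passes root->parse->jointree to pull_up_sublinks_jointree_recurse. That function checks the node type found in the jointree and handles each type separately:

RangeTblRef: returned as is, with no optimization.
FromExpr: contains two fields, the base-table list (fromlist) and the WHERE condition (quals). They are processed as follows:
- fromlist: recursive call to pull_up_sublinks_jointree_recurse
- quals: call to pull_up_sublinks_qual_recurse, which pulls up sublinks in the WHERE condition
JoinExpr: contains the left and right inputs (larg and rarg) and the ON condition (quals). They are processed as follows:
- pull_up_sublinks_jointree_recurse is called on the left and right nodes
- sublinks in the ON condition (quals) are pulled up according to the join type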

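The conversion in detail

The actual conversion is subject to these restrictions:

The sublink's sub-select must not use Vars of the parent query, which would create a circular dependency.
The test expression must contain Vars of the parent query.
The test expression must not contain any volatile functions.

Var: a target column of a base table during query analysis and optimization, or an output column of a subquery plan.

The convert_ANY_sublink_to_join function:

contain_vars_of_level: the parent-reference check; tests whether the sub-select refers to Vars of the parent query.
pull_varnos: the test-expression check; collects the parent relations referenced by the test expression.
contain_volatile_functions: the volatile-function check.
addRangeTableEntryForSubquery: creates a RangeTblEntry named ANY_subquery, which is appended to the parent query's range table (the rtable list).
generate_subquery_vars: builds Vars, based on the new rtable entry, that represent the sub-select's outputs.
convert_testexpr: replaces the Params in the test expression with those Vars, using a mutator function.
Finally the JoinExpr node is built; its larg is filled in by the caller.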

JoinExpr *
convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
							Relids available_rels)
{
	JoinExpr   *result;
	Query	   *parse = root->parse;
	Query	   *subselect = (Query *) sublink->subselect;
	Relids		upper_varnos;
	int			rtindex;
	RangeTblEntry *rte;
	RangeTblRef *rtr;
	List	   *subquery_vars;
	Node	   *quals;
	ParseState *pstate;

	Assert(sublink->subLinkType == ANY_SUBLINK);

	/*
	 * The sub-select must not refer to any Vars of the parent query. (Vars of
	 * higher levels should be okay, though.)
	 */
	if (contain_vars_of_level((Node *) subselect, 1))
		return NULL;

	/*
	 * The test expression must contain some Vars of the parent query, else
	 * it's not gonna be a join.  (Note that it won't have Vars referring to
	 * the subquery, rather Params.)
	 */
	upper_varnos = pull_varnos(sublink->testexpr);
	if (bms_is_empty(upper_varnos))
		return NULL;

	/*
	 * However, it can't refer to anything outside available_rels.
	 */
	if (!bms_is_subset(upper_varnos, available_rels))
		return NULL;

	/*
	 * The combining operators and left-hand expressions mustn't be volatile.
	 */
	if (contain_volatile_functions(sublink->testexpr))
		return NULL;

	/* Create a dummy ParseState for addRangeTableEntryForSubquery */
	pstate = make_parsestate(NULL);

	/*
	 * Okay, pull up the sub-select into upper range table.
	 *
	 * We rely here on the assumption that the outer query has no references
	 * to the inner (necessarily true, other than the Vars that we build
	 * below). Therefore this is a lot easier than what pull_up_subqueries has
	 * to go through.
	 */
	rte = addRangeTableEntryForSubquery(pstate,
										subselect,
										makeAlias("ANY_subquery", NIL),
										false,
										false);
	parse->rtable = lappend(parse->rtable, rte);
	rtindex = list_length(parse->rtable);

	/*
	 * Form a RangeTblRef for the pulled-up sub-select.
	 */
	rtr = makeNode(RangeTblRef);
	rtr->rtindex = rtindex;

	/*
	 * Build a list of Vars representing the subselect outputs.
	 */
	subquery_vars = generate_subquery_vars(root,
										   subselect->targetList,
										   rtindex);

	/*
	 * Build the new join's qual expression, replacing Params with these Vars.
	 */
	quals = convert_testexpr(root, sublink->testexpr, subquery_vars);

	/*
	 * And finally, build the JoinExpr node.
	 */
	result = makeNode(JoinExpr);
	result->jointype = JOIN_SEMI;
	result->isNatural = false;
	result->larg = NULL;		/* caller must fill this in */
	result->rarg = (Node *) rtr;
	result->usingClause = NIL;
	result->quals = quals;
	result->alias = NULL;
	result->rtindex = 0;		/* we don't need an RTE for it */

	return result;
}

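Why pull-up works

What is a semi-join (SEMI-JOIN)? Table AA looks for matching rows in table BB; the rows of AA that satisfy the condition are returned, and no rows of BB are returned.

What is IN? AA IN BB returns the rows of AA that satisfy the condition against BB.

It follows that:

The two work on the same principle, so an IN clause can be converted into a semi-join.
Because rows from the right side are never output, the pulled-up sub-select becomes the right child (rarg) of the JoinExpr, and the caller fills the left child (larg) with the parent query's relations, whose rows are the ones returned.
So the "pull-up" described above simply parses the query inside the sublink and turns it into node information in a JoinExpr, which reduces query work and saves time.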
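The semi-join semantics described above can be checked with a tiny stand-alone sketch over integer arrays. toy_semi_join and toy_inner_join_count are hypothetical helpers written for this illustration, using a naive nested loop; they say nothing about how the executor actually implements semi-joins.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Semi-join of outer[] against inner[] on equality: each outer value is
 * emitted at most once if any inner value matches it, and inner values are
 * never emitted.  Returns the number of values written to out[], which must
 * have room for nouter entries.
 */
static size_t
toy_semi_join(const int *outer, size_t nouter,
			  const int *inner, size_t ninner, int *out)
{
	size_t		n = 0;

	for (size_t i = 0; i < nouter; i++)
	{
		for (size_t j = 0; j < ninner; j++)
		{
			if (outer[i] == inner[j])
			{
				out[n++] = outer[i];
				break;			/* the first match is enough */
			}
		}
	}
	return n;
}

/* Plain inner join, for contrast: one result row per matching pair. */
static size_t
toy_inner_join_count(const int *outer, size_t nouter,
					 const int *inner, size_t ninner)
{
	size_t		n = 0;

	for (size_t i = 0; i < nouter; i++)
		for (size_t j = 0; j < ninner; j++)
			if (outer[i] == inner[j])
				n++;
	return n;
}
```

The early break is what distinguishes the two: duplicates on the inner side multiply rows in an inner join but not in a semi-join, which is why IN can be rewritten as a semi-join without changing the result.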

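Subquery optimization

The function is pull_up_subqueries; its source comment reads: Check to see if any subqueries in the jointree can be merged into this query.

Although it is called subquery pull-up, what it really does is analyze the jointree left by pull_up_sublinks and try to optimize it further. Sublink pull-up does not add the base tables of the sub-select to the parent query's range table (the rtable list), so this step checks whether the subquery can be merged into the parent query.

Concretely: it checks whether the jointree still contains aliased result sets and, if so, replaces each with the node type of the corresponding query (RangeTblRef, FromExpr, JoinExpr).

The flow is implemented in pull_up_subqueries_recurse, which walks the jointree. The jointree contains three node types:

RangeTblRef: candidate for pull-up
- RTE_SUBQUERY that is a simple subquery: pulled up by pull_up_simple_subquery. Because levels change after the pull-up, index numbers, level numbers, and variable references have to be adjusted; see the code for those adjustments, which are not covered here.
- RTE_SUBQUERY that is a simple UNION ALL: flattened into an append relation by pull_up_simple_union_all.
- RTE_VALUES: a simple VALUES entry is pulled up by pull_up_simple_values.
FromExpr: walks fromlist, calling pull_up_subqueries_recurse recursively.
JoinExpr: calls pull_up_subqueries_recurse on the left and right nodes, with arguments chosen according to the join type.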

static Node *
pull_up_subqueries_recurse(PlannerInfo *root, Node *jtnode,
						   JoinExpr *lowest_outer_join,
						   JoinExpr *lowest_nulling_outer_join,
						   AppendRelInfo *containing_appendrel)
{
	Assert(jtnode != NULL);
	if (IsA(jtnode, RangeTblRef))
	{
		int			varno = ((RangeTblRef *) jtnode)->rtindex;
		RangeTblEntry *rte = rt_fetch(varno, root->parse->rtable);

		/*
		 * Is this a subquery RTE, and if so, is the subquery simple enough to
		 * pull up?
		 *
		 * If we are looking at an append-relation member, we can't pull it up
		 * unless is_safe_append_member says so.
		 */
		if (rte->rtekind == RTE_SUBQUERY &&
			is_simple_subquery(rte->subquery, rte, lowest_outer_join) &&
			(containing_appendrel == NULL ||
			 is_safe_append_member(rte->subquery)))
			return pull_up_simple_subquery(root, jtnode, rte,
										   lowest_outer_join,
										   lowest_nulling_outer_join,
										   containing_appendrel);

		/*
		 * Alternatively, is it a simple UNION ALL subquery?  If so, flatten
		 * into an "append relation".
		 *
		 * It's safe to do this regardless of whether this query is itself an
		 * appendrel member.  (If you're thinking we should try to flatten the
		 * two levels of appendrel together, you're right; but we handle that
		 * in set_append_rel_pathlist, not here.)
		 */
		if (rte->rtekind == RTE_SUBQUERY &&
			is_simple_union_all(rte->subquery))
			return pull_up_simple_union_all(root, jtnode, rte);

		/*
		 * Or perhaps it's a simple VALUES RTE?
		 *
		 * We don't allow VALUES pullup below an outer join nor into an
		 * appendrel (such cases are impossible anyway at the moment).
		 */
		if (rte->rtekind == RTE_VALUES &&
			lowest_outer_join == NULL &&
			containing_appendrel == NULL &&
			is_simple_values(root, rte))
			return pull_up_simple_values(root, jtnode, rte);

		/* Otherwise, do nothing at this node. */
	}
	else if (IsA(jtnode, FromExpr))
	{
		FromExpr   *f = (FromExpr *) jtnode;
		ListCell   *l;

		Assert(containing_appendrel == NULL);
		/* Recursively transform all the child nodes */
		foreach(l, f->fromlist)
		{
			lfirst(l) = pull_up_subqueries_recurse(root, lfirst(l),
												   lowest_outer_join,
												   lowest_nulling_outer_join,
												   NULL);
		}
	}
	else if (IsA(jtnode, JoinExpr))
	{
		JoinExpr   *j = (JoinExpr *) jtnode;

		Assert(containing_appendrel == NULL);
		/* Recurse, being careful to tell myself when inside outer join */
		switch (j->jointype)
		{
			case JOIN_INNER:
				j->larg = pull_up_subqueries_recurse(root, j->larg,
													 lowest_outer_join,
													 lowest_nulling_outer_join,
													 NULL);
				j->rarg = pull_up_subqueries_recurse(root, j->rarg,
													 lowest_outer_join,
													 lowest_nulling_outer_join,
													 NULL);
				break;
			case JOIN_LEFT:
			case JOIN_SEMI:
			case JOIN_ANTI:
				j->larg = pull_up_subqueries_recurse(root, j->larg,
													 j,
													 lowest_nulling_outer_join,
													 NULL);
				j->rarg = pull_up_subqueries_recurse(root, j->rarg,
													 j,
													 j,
													 NULL);
				break;
			case JOIN_FULL:
				j->larg = pull_up_subqueries_recurse(root, j->larg,
													 j,
													 j,
													 NULL);
				j->rarg = pull_up_subqueries_recurse(root, j->rarg,
													 j,
													 j,
													 NULL);
				break;
			case JOIN_RIGHT:
				j->larg = pull_up_subqueries_recurse(root, j->larg,
													 j,
													 j,
													 NULL);
				j->rarg = pull_up_subqueries_recurse(root, j->rarg,
													 j,
													 lowest_nulling_outer_join,
													 NULL);
				break;
			default:
				elog(ERROR, "unrecognized join type: %d",
					 (int) j->jointype);
				break;
		}
	}
	else
		elog(ERROR, "unrecognized node type: %d",
			 (int) nodeTag(jtnode));
	return jtnode;
}

UNION ALL handling

/*
	 * If this is a simple UNION ALL query, flatten it into an appendrel. We
	 * do this now because it requires applying pull_up_subqueries to the leaf
	 * queries of the UNION ALL, which weren't touched above because they
	 * weren't referenced by the jointree (they will be after we do this).
	 */
	 if (parse->setOperations)
		flatten_simple_union_all(root);

RowMark handling

/*
	 * Preprocess RowMark information.  We need to do this after subquery
	 * pullup, so that all base relations are present.
	 */
	preprocess_rowmarks(root);

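Expression preprocessing

The target list, withCheckOptions, RETURNING expressions, the HAVING clause, WINDOW clauses, and LIMIT/OFFSET clauses are all processed by the preprocess_expression function.

Processing steps

flatten_join_alias_vars: replaces join alias variables with base-relation variables
eval_const_expressions: simplifies constant expressions
canonicalize_qual: canonicalizes the condition expressions in quals
SS_process_sublinks: converts sublinks into subplans
SS_replace_correlation_vars: replaces upper-level Vars with Param nodes
make_ands_implicit: converts quals or havingQual to implicit-AND format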

static Node *
preprocess_expression(PlannerInfo *root, Node *expr, int kind)
{
	/*
	 * Fall out quickly if expression is empty.  This occurs often enough to
	 * be worth checking.  Note that null->null is the correct conversion for
	 * implicit-AND result format, too.
	 */
	if (expr == NULL)
		return NULL;

	/*
	 * If the query has any join RTEs, replace join alias variables with
	 * base-relation variables.  We must do this first, since any expressions
	 * we may extract from the joinaliasvars lists have not been preprocessed.
	 * For example, if we did this after sublink processing, sublinks expanded
	 * out from join aliases would not get processed.  But we can skip this in
	 * non-lateral RTE functions, VALUES lists, and TABLESAMPLE clauses, since
	 * they can't contain any Vars of the current query level.
	 */
	if (root->hasJoinRTEs &&
		!(kind == EXPRKIND_RTFUNC ||
		  kind == EXPRKIND_VALUES ||
		  kind == EXPRKIND_TABLESAMPLE ||
		  kind == EXPRKIND_TABLEFUNC))
		expr = flatten_join_alias_vars(root->parse, expr);

	/*
	 * Simplify constant expressions.
	 *
	 * Note: an essential effect of this is to convert named-argument function
	 * calls to positional notation and insert the current actual values of
	 * any default arguments for functions.  To ensure that happens, we *must*
	 * process all expressions here.  Previous PG versions sometimes skipped
	 * const-simplification if it didn't seem worth the trouble, but we can't
	 * do that anymore.
	 *
	 * Note: this also flattens nested AND and OR expressions into N-argument
	 * form.  All processing of a qual expression after this point must be
	 * careful to maintain AND/OR flatness --- that is, do not generate a tree
	 * with AND directly under AND, nor OR directly under OR.
	 */
	expr = eval_const_expressions(root, expr);

	/*
	 * If it's a qual or havingQual, canonicalize it.
	 */
	if (kind == EXPRKIND_QUAL)
	{
		expr = (Node *) canonicalize_qual((Expr *) expr, false);

#ifdef OPTIMIZER_DEBUG
		printf("After canonicalize_qual()\n");
		pprint(expr);
#endif
	}

	/* Expand SubLinks to SubPlans */
	if (root->parse->hasSubLinks)
		expr = SS_process_sublinks(root, expr, (kind == EXPRKIND_QUAL));

	/*
	 * XXX do not insert anything here unless you have grokked the comments in
	 * SS_replace_correlation_vars ...
	 */

	/* Replace uplevel vars with Param nodes (this IS possible in VALUES) */
	if (root->query_level > 1)
		expr = SS_replace_correlation_vars(root, expr);

	/*
	 * If it's a qual or havingQual, convert it to implicit-AND format. (We
	 * don't want to do this before eval_const_expressions, since the latter
	 * would be unable to simplify a top-level AND correctly. Also,
	 * SS_process_sublinks expects explicit-AND format.)
	 */
	if (kind == EXPRKIND_QUAL)
		expr = (Node *) make_ands_implicit((Expr *) expr);

	return expr;
}
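Of the steps in preprocess_expression, constant simplification is the easiest to picture in isolation. The sketch below folds constants in a toy expression tree that has only integer constants, variables, and addition. ToyExpr and toy_fold are hypothetical names for this illustration; the real eval_const_expressions handles far more (function calls, boolean flattening, default arguments) and builds new nodes rather than overwriting them, so this shows only the bottom-up idea.

```c
#include <assert.h>
#include <stddef.h>

typedef enum
{
	TOY_CONST,
	TOY_VAR,
	TOY_ADD
} ToyTag;

typedef struct ToyExpr
{
	ToyTag		tag;
	int			value;			/* TOY_CONST: the constant; TOY_VAR: an id */
	struct ToyExpr *left;		/* TOY_ADD operands, non-NULL for TOY_ADD */
	struct ToyExpr *right;
} ToyExpr;

/*
 * Fold constants bottom-up, in place.  An ADD node whose operands are both
 * constants after folding is overwritten with the resulting constant.
 * Assumes every TOY_ADD node has two non-NULL operands.
 */
static ToyExpr *
toy_fold(ToyExpr *e)
{
	if (e == NULL || e->tag != TOY_ADD)
		return e;

	e->left = toy_fold(e->left);
	e->right = toy_fold(e->right);

	if (e->left->tag == TOY_CONST && e->right->tag == TOY_CONST)
	{
		e->value = e->left->value + e->right->value;
		e->tag = TOY_CONST;
		e->left = NULL;
		e->right = NULL;
	}
	return e;
}
```

For an expression such as (1 + 2) + x, the inner addition collapses to a constant while the outer one survives because x is not known at planning time.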

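Per-type handling

Several of the conversion steps above end up calling a function of the form XXX_XXX_mutator, which carries out the conversion according to node type.

The one described here is process_sublinks_mutator, called from SS_process_sublinks. Because the nodes in a sublink can be of any type, this function also dispatches on type; when no case matches, it falls back to expression_tree_mutator.

SubLink: process_sublinks_mutator is called again on sublink->testexpr to obtain the processed testexpr, and make_subplan is then called to build a subplan node using it.
AND and OR: each argument of the boolean expression is processed by process_sublinks_mutator, the results are collected in a new list, and a new expression node holding the list and the AND/OR type is created and returned.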

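Creating the subquery plan

make_subplan works as follows:

Set tuple_fraction: a value between 0 and 1 is the fraction of rows expected to be fetched. It is chosen according to the sublink type (EXISTS_SUBLINK, ALL_SUBLINK, and so on), because ANY, EXISTS, and ALL behave differently.
Call subquery_planner to optimize the sublink's query tree, exactly as for a complete query tree.
Call create_plan and build_subplan to create the plan.

Once the subplan has been created, it is returned.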

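Preprocessing expressions in conditions

preprocess_qual_conditions walks the jointree, locates the quals of each node according to its type, and calls preprocess_expression on them:

RangeTblRef: nothing to do.
FromExpr: walks fromlist, calling preprocess_qual_conditions recursively, then calls preprocess_expression on the node's quals.
JoinExpr: calls preprocess_qual_conditions on the left and right children, then calls preprocess_expression on the node's quals.

Eliminating outer joins

reduce_outer_joins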

/*
 * reduce_outer_joins
 *		Attempt to reduce outer joins to plain inner joins.
 *
 * The idea here is that given a query like
 *		SELECT ... FROM a LEFT JOIN b ON (...) WHERE b.y = 42;
 * we can reduce the LEFT JOIN to a plain JOIN if the "=" operator in WHERE
 * is strict.  The strict operator will always return NULL, causing the outer
 * WHERE to fail, on any row where the LEFT JOIN filled in NULLs for b's
 * columns.  Therefore, there's no need for the join to produce null-extended
 * rows in the first place --- which makes it a plain join not an outer join.
 * (This scenario may not be very likely in a query written out by hand, but
 * it's reasonably likely when pushing quals down into complex views.)
 *
 * More generally, an outer join can be reduced in strength if there is a
 * strict qual above it in the qual tree that constrains a Var from the
 * nullable side of the join to be non-null.  (For FULL joins this applies
 * to each side separately.)
 *
 * Another transformation we apply here is to recognize cases like
 *		SELECT ... FROM a LEFT JOIN b ON (a.x = b.y) WHERE b.y IS NULL;
 * If the join clause is strict for b.y, then only null-extended rows could
 * pass the upper WHERE, and we can conclude that what the query is really
 * specifying is an anti-semijoin.  We change the join type from JOIN_LEFT
 * to JOIN_ANTI.  The IS NULL clause then becomes redundant, and must be
 * removed to prevent bogus selectivity calculations, but we leave it to
 * distribute_qual_to_rels to get rid of such clauses.
 *
 * Also, we get rid of JOIN_RIGHT cases by flipping them around to become
 * JOIN_LEFT.  This saves some code here and in some later planner routines,
 * but the main reason to do it is to not need to invent a JOIN_REVERSE_ANTI
 * join type.
 *
 * To ease recognition of strict qual clauses, we require this routine to be
 * run after expression preprocessing (i.e., qual canonicalization and JOIN
 * alias-var expansion).
 */
void
reduce_outer_joins(PlannerInfo *root)
{
	reduce_outer_joins_state *state;

	/*
	 * To avoid doing strictness checks on more quals than necessary, we want
	 * to stop descending the jointree as soon as there are no outer joins
	 * below our current point.  This consideration forces a two-pass process.
	 * The first pass gathers information about which base rels appear below
	 * each side of each join clause, and about whether there are outer
	 * join(s) below each side of each join clause. The second pass examines
	 * qual clauses and changes join types as it descends the tree.
	 */
	state = reduce_outer_joins_pass1((Node *) root->parse->jointree);

	/* planner.c shouldn't have called me if no outer joins */
	if (state == NULL || !state->contains_outer)
		elog(ERROR, "so where are the outer joins?");

	reduce_outer_joins_pass2((Node *) root->parse->jointree,
							 state, root, NULL, NIL, NIL);
}
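The first reduction described in the source comment (a strict WHERE qual on the nullable side makes a LEFT JOIN equivalent to an inner join) can be verified on toy data. ToyRow, toy_join, and toy_count_where_b_eq are hypothetical helpers invented for this sketch; the join is a naive nested loop on a.x = b.y, and NULL is modeled with a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A joined row; b_is_null marks a null-extended row from the LEFT JOIN. */
typedef struct
{
	int			a_x;
	int			b_y;
	bool		b_is_null;
} ToyRow;

/*
 * Join a[] and b[] on a.x = b.y.  With left_join set, an a row that has no
 * match is still emitted once, null-extended on the b side.
 */
static size_t
toy_join(const int *a, size_t na, const int *b, size_t nb,
		 bool left_join, ToyRow *out)
{
	size_t		n = 0;

	for (size_t i = 0; i < na; i++)
	{
		bool		matched = false;

		for (size_t j = 0; j < nb; j++)
		{
			if (a[i] == b[j])
			{
				out[n++] = (ToyRow) {a[i], b[j], false};
				matched = true;
			}
		}
		if (left_join && !matched)
			out[n++] = (ToyRow) {a[i], 0, true};
	}
	return n;
}

/*
 * Strict filter "b.y = k": when b.y is NULL the comparison yields NULL,
 * which WHERE treats as not true, so null-extended rows never pass.
 */
static size_t
toy_count_where_b_eq(const ToyRow *rows, size_t n, int k)
{
	size_t		cnt = 0;

	for (size_t i = 0; i < n; i++)
		if (!rows[i].b_is_null && rows[i].b_y == k)
			cnt++;
	return cnt;
}
```

The LEFT JOIN produces extra null-extended rows, but the strict filter discards every one of them, so both join types give the same rows after the WHERE clause. That is the observation that lets the planner downgrade the join.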

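Generating the query plan

grouping_planner first deals with LIMIT, ORDER BY, and GROUP BY, and then uses setOperations (the set-operation tree) to decide whether the statement is a UNION/INTERSECT/EXCEPT or an ordinary statement:

Process the LIMIT clause.
Process UNION/INTERSECT/EXCEPT by calling plan_set_operations, which distinguishes two cases internally:
- recursive UNION, generate_recursion_path: the left and right sides are processed and then combined
- non-recursive, recurse_set_operations: follows the regular flow
Process ordinary statements, following the regular flow.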

static void
grouping_planner(PlannerInfo *root, bool inheritance_update,
				 double tuple_fraction)
{
	Query	   *parse = root->parse;
	int64		offset_est = 0;
	int64		count_est = 0;
	double		limit_tuples = -1.0;
	bool		have_postponed_srfs = false;
	PathTarget *final_target;
	List	   *final_targets;
	List	   *final_targets_contain_srfs;
	bool		final_target_parallel_safe;
	RelOptInfo *current_rel;
	RelOptInfo *final_rel;
	FinalPathExtraData extra;
	ListCell   *lc;

	...
	
	// Process the LIMIT clause
	/* Tweak caller-supplied tuple_fraction if have LIMIT/OFFSET */
	if (parse->limitCount || parse->limitOffset)
	{
		tuple_fraction = preprocess_limit(root, tuple_fraction,
										  &offset_est, &count_est);

		/*
		 * If we have a known LIMIT, and don't have an unknown OFFSET, we can
		 * estimate the effects of using a bounded sort.
		 */
		if (count_est > 0 && offset_est >= 0)
			limit_tuples = (double) count_est + (double) offset_est;
	}

	/* Make tuple_fraction accessible to lower-level routines */
	root->tuple_fraction = tuple_fraction;

	if (parse->setOperations)
	{
		...
		
		// Process UNION/INTERSECT/EXCEPT
		/*
		 * Construct Paths for set operations.  The results will not need any
		 * work except perhaps a top-level sort and/or LIMIT.  Note that any
		 * special work for recursive unions is the responsibility of
		 * plan_set_operations.
		 */
		current_rel = plan_set_operations(root);

		...
	}
	else
	{
		// Process ordinary statements
		...
	}
	...
}
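The LIMIT estimate in the excerpt above boils down to one small rule, isolated here so it can be tried out. toy_limit_tuples is a hypothetical helper, not a PostgreSQL function; it assumes, following the comment in the excerpt, that a positive count estimate means a known LIMIT and that a negative offset estimate means an OFFSET that cannot be estimated.

```c
#include <assert.h>
#include <stdint.h>

/*
 * With a known positive LIMIT (count_est > 0) and an OFFSET that is either
 * absent or known (offset_est >= 0), a bounded sort only has to keep
 * count + offset tuples.  Returns -1.0 when no bound can be estimated,
 * matching the initial value of limit_tuples in the excerpt.
 */
static double
toy_limit_tuples(int64_t count_est, int64_t offset_est)
{
	if (count_est > 0 && offset_est >= 0)
		return (double) count_est + (double) offset_est;
	return -1.0;
}
```

For LIMIT 10 OFFSET 5 the sort bound is 15 tuples, since the first five rows still have to be produced before they can be skipped.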

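Flow for ordinary statements

Apart from the special statements, everything follows the ordinary flow:

preprocess_groupclause, grouping clause: reorders the GROUP BY items to follow the ORDER BY order, so that an index can later satisfy both ORDER BY and GROUP BY quickly.
preprocess_targetlist, target list: not analyzed here.
get_agg_clause_costs: collects the cost of the aggregate functions used.
select_active_windows: selects the window clauses that are actually in use.
query_planner: creates the access paths for the query; this part is important enough to be covered separately.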
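The GROUP BY reordering attributed to preprocess_groupclause in the list above can be sketched on plain integer ids. toy_reorder_group is a hypothetical helper written for this illustration, and the rule it applies (match the longest usable prefix of ORDER BY, then append the remaining GROUP BY items in their original order) is a simplified assumption about the approach, not a transcription of the PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_GROUP 64

/*
 * Reorder group[] so that it starts with the longest prefix of order[] whose
 * items all appear in group[]; the remaining group items keep their original
 * relative order.  Items are identified by small integer ids.  Always writes
 * ngroup items to out[].
 */
static void
toy_reorder_group(const int *group, size_t ngroup,
				  const int *order, size_t norder, int *out)
{
	bool		used[TOY_MAX_GROUP] = {false};
	size_t		n = 0;

	if (ngroup > TOY_MAX_GROUP)
	{
		/* Too many items for this sketch: leave the order unchanged. */
		for (size_t g = 0; g < ngroup; g++)
			out[g] = group[g];
		return;
	}

	for (size_t i = 0; i < norder; i++)
	{
		bool		found = false;

		for (size_t g = 0; g < ngroup; g++)
		{
			if (!used[g] && group[g] == order[i])
			{
				used[g] = true;
				out[n++] = group[g];
				found = true;
				break;
			}
		}
		if (!found)
			break;				/* prefix ends at the first ungrouped item */
	}

	/* Append whatever was not matched, in its original order. */
	for (size_t g = 0; g < ngroup; g++)
		if (!used[g])
			out[n++] = group[g];
}
```

With GROUP BY a, b, c and ORDER BY c, a the grouping order becomes c, a, b, so a single sort on (c, a, b) can serve both clauses.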

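Creating the query access paths

Note: it helps to understand how a query engine works before reading this code; otherwise the reasons behind these steps will not be clear.

query_planner distinguishes ordinary statements from statements whose fromlist has length 1 ("SELECT expression" and "INSERT ... VALUES()"); for the latter it calls a function that handles them directly and returns the result. The flow for ordinary statements is the focus here. An ordinary query has three elements, the data sources, the output, and the conditions, which are filled in one after another:

setup_append_rel_array, collecting base-table information: sets up root->append_rel_array, sized from root->parse->rtable.

add_base_rels_to_query, building the RelOptInfo array (base-table information; sets the data sources): creates RelOptInfo entries according to the jointree node type and stores them in root->simple_rel_array. In short, it creates the array of base tables and fills in their data sources.

build_simple_rel: sets the RelOptInfo fields, filling in the output target list and the condition quals.

build_base_rel_tlists, setting the target columns (sets the output): walks the target list (here the argument is root->processed_tlist), finds every Var node, and adds it to the reltargetlist of the RelOptInfo of the base table the Var belongs to; a Var that is already present is not added again. In short, it fills in the output of each base table by binding column names to their data source (select a1 , b1 from aa, bb where aa.a1 = bb.b1; binds a1 to aa).

pull_var_clause: calls pull_var_clause_walker to find the Vars that are used
add_vars_to_targetlist: adds the Vars to the RelOptInfo entries in root->simple_rel_array

deconstruct_jointree, setting the constraints (sets the conditions): calls deconstruct_recurse, which handles each node type:

RangeTblRef: returned as is, no processing.
FromExpr: finds the Relids of all base tables in the FromExpr, then binds the conditions related to each base table to its RelOptInfo. The binding is done by distribute_qual_to_rels, whose flow is fairly involved and is described later.
JoinExpr: searches according to the join type, finds the Relids of all base tables in the left and right children, then binds the conditions related to each base table to its RelOptInfo. Finally make_outerjoininfo is called to handle join-order constraints.

reconsider_outer_join_clauses: handles outer-join clauses.

generate_base_implied_equalities: generates the implied equality conditions.

create_lateral_join_info: builds the LATERAL join information.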
