preparedstatement将PREPARE,BIND,EXECUTE分开,其好处是避免了重复的语法分析、语义分析与重写,对于复杂SQL来说,其效果更加明显。
In the extended-query protocol, execution of SQL commands is divided into multiple steps. The state retained between steps is represented by two types of objects: prepared statements and portals. A prepared statement represents the result of parsing and semantic analysis of a textual query string. A prepared statement is not in itself ready to execute, because it might lack specific values for parameters. A portal represents a ready-to-execute or already-partially-executed statement, with any missing parameter values filled in. (For SELECT statements, a portal is equivalent to an open cursor, but we choose to use a different term since cursors don't handle non-SELECT statements.) The overall execution cycle consists of a parse step, which creates a prepared statement from a textual query string; a bind step, which creates a portal given a prepared statement and values for any needed parameters; and an execute step that runs a portal's query. In the case of a query that returns rows (SELECT, SHOW, etc), the execute step can be told to fetch only a limited number of rows, so that multiple execute steps might be needed to complete the operation. The backend can keep track of multiple prepared statements and portals (but note that these exist only within a session, and are never shared across sessions). Existing prepared statements and portals are referenced by names assigned when they were created. In addition, an “unnamed” prepared statement and portal exist. Although these behave largely the same as named objects, operations on them are optimized for the case of executing a query only once and then discarding it, whereas operations on named objects are optimized on the expectation of multiple uses.
不同于SIMPLE QUERY,P B E的入口不太一样:
switch (firstchar) { case 'Q': /* simple query */ { const char *query_string; /* Set statement_timestamp() */ SetCurrentStatementStartTimestamp(); query_string = pq_getmsgstring(&input_message); pq_getmsgend(&input_message); if (am_walsender) { if (!exec_replication_command(query_string)) exec_simple_query(query_string); } else exec_simple_query(query_string); send_ready_for_query = true; } break; case 'P': /* parse */ { const char *stmt_name; const char *query_string; int numParams; Oid *paramTypes = NULL; forbidden_in_wal_sender(firstchar); /* Set statement_timestamp() */ SetCurrentStatementStartTimestamp(); stmt_name = pq_getmsgstring(&input_message); query_string = pq_getmsgstring(&input_message); numParams = pq_getmsgint(&input_message, 2); if (numParams > 0) { paramTypes = (Oid *) palloc(numParams * sizeof(Oid)); for (int i = 0; i < numParams; i++) paramTypes[i] = pq_getmsgint(&input_message, 4); } pq_getmsgend(&input_message); exec_parse_message(query_string, stmt_name, paramTypes, numParams); } break; case 'B': /* bind */ forbidden_in_wal_sender(firstchar); /* Set statement_timestamp() */ SetCurrentStatementStartTimestamp(); /* * this message is complex enough that it seems best to put * the field extraction out-of-line */ exec_bind_message(&input_message); break; case 'E': /* execute */ { const char *portal_name; int max_rows; forbidden_in_wal_sender(firstchar); /* Set statement_timestamp() */ SetCurrentStatementStartTimestamp(); portal_name = pq_getmsgstring(&input_message); max_rows = pq_getmsgint(&input_message, 4); pq_getmsgend(&input_message); exec_execute_message(portal_name, max_rows); } break;
prepare处理流程:
StorePreparedStatement prepare.c:432 PrepareQuery prepare.c:173 standard_ProcessUtility utility.c:737 pgss_ProcessUtility pg_stat_statements.c:1201 pgaudit_ProcessUtility_hook pgaudit.c:1412 ProcessUtility utility.c:521 PortalRunUtility pquery.c:1157 PortalRunMulti pquery.c:1303 PortalRun pquery.c:779 exec_simple_query postgres.c:1326 PostgresMain postgres.c:4445 BackendRun postmaster.c:4883 BackendStartup postmaster.c:4567 ServerLoop postmaster.c:1854 PostmasterMain postmaster.c:1487 main main.c:231 __libc_start_main 0x00007f32f566f555 _start 0x0000000000484799
PREPARE时,语句将被parsed, analyzed, and rewritten。BIND的时候被plan,EXECUTE的时候被执行。
最后通过调用StorePreparedStatement保存在per-backend prepared_queries哈希中(实际上把它放在全局变量的话,是可以做到backend无关的)。
FetchPreparedStatement prepare.c:477 UtilityReturnsTuples utility.c:2020 ChoosePortalStrategy pquery.c:258 PortalStart pquery.c:464 exec_simple_query postgres.c:1287 PostgresMain postgres.c:4445 BackendRun postmaster.c:4883 BackendStartup postmaster.c:4567 ServerLoop postmaster.c:1854 PostmasterMain postmaster.c:1487 main main.c:231 __libc_start_main 0x00007f32f566f555 _start 0x0000000000484799
GetCachedPlan plancache.c:1157 ExecuteQuery prepare.c:233 standard_ProcessUtility utility.c:742 pgss_ProcessUtility pg_stat_statements.c:1201 pgaudit_ProcessUtility_hook pgaudit.c:1412 ProcessUtility utility.c:521 PortalRunUtility pquery.c:1157 PortalRunMulti pquery.c:1303 PortalRun pquery.c:779 exec_simple_query postgres.c:1326 PostgresMain postgres.c:4445 BackendRun postmaster.c:4883 BackendStartup postmaster.c:4567 ServerLoop postmaster.c:1854 PostmasterMain postmaster.c:1487 main main.c:231 __libc_start_main 0x00007f32f566f555 _start 0x0000000000484799
BIND:
执行计划生成发生在BIND环节,因为CBO生成执行计划需要依赖于参数,选择custom还是generic执行计划也是在这一步。通过FetchPreparedStatement获取CachedPlanSource(未plan,所以叫plansource,至于已经plan的,那叫做generic plan,由choose_custom_plan和参数plan_cache_mode决定),如下:
/* * Lookup an existing query in the hash table. If the query does not * actually exist, throw ereport(ERROR) or return NULL per second parameter. * * Note: this does not force the referenced plancache entry to be valid, * since not all callers care. */ PreparedStatement * FetchPreparedStatement(const char *stmt_name, bool throwError) { PreparedStatement *entry; /* * If the hash table hasn't been initialized, it can't be storing * anything, therefore it couldn't possibly store our plan. */ if (prepared_queries) entry = (PreparedStatement *) hash_search(prepared_queries, stmt_name, HASH_FIND, NULL); else entry = NULL; if (!entry && throwError) ereport(ERROR, (errcode(ERRCODE_UNDEFINED_PSTATEMENT), errmsg("prepared statement \"%s\" does not exist", stmt_name))); return entry; }
决定新生成plan还是复用generic_plan的逻辑如下:
CachedPlan * GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams, bool useResOwner, QueryEnvironment *queryEnv) { ...... /* Make sure the querytree list is valid and we have parse-time locks */ qlist = RevalidateCachedQuery(plansource, queryEnv); /* Decide whether to use a custom plan */ customplan = choose_custom_plan(plansource, boundParams); if (!customplan) { if (CheckCachedPlan(plansource)) { /* We want a generic plan, and we already have a valid one */ plan = plansource->gplan; Assert(plan->magic == CACHEDPLAN_MAGIC); } else { /* Build a new generic plan */ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv); /* Just make real sure plansource->gplan is clear */ ReleaseGenericPlan(plansource); /* Link the new generic plan into the plansource */ plansource->gplan = plan; plan->refcount++; /* Immediately reparent into appropriate context */ if (plansource->is_saved) { /* saved plans all live under CacheMemoryContext */ MemoryContextSetParent(plan->context, CacheMemoryContext); plan->is_saved = true; } else { /* otherwise, it should be a sibling of the plansource */ MemoryContextSetParent(plan->context, MemoryContextGetParent(plansource->context)); } /* Update generic_cost whenever we make a new generic plan */ plansource->generic_cost = cached_plan_cost(plan, false); /* * If, based on the now-known value of generic_cost, we'd not have * chosen to use a generic plan, then forget it and make a custom * plan. This is a bit of a wart but is necessary to avoid a * glitch in behavior when the custom plans are consistently big * winners; at some point we'll experiment with a generic plan and * find it's a loser, but we don't want to actually execute that * plan. */ customplan = choose_custom_plan(plansource, boundParams); /* * If we choose to plan again, we need to re-copy the query_list, * since the planner probably scribbled on it. We can force * BuildCachedPlan to do that by passing NIL. */ qlist = NIL; } } if (customplan) { /* Build a custom plan */ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv); /* Accumulate total costs of custom plans, but 'ware overflow */ if (plansource->num_custom_plans < INT_MAX) { plansource->total_custom_cost += cached_plan_cost(plan, true); plansource->num_custom_plans++; } } ......
需要注意的是,BIND中区分了语句和portal(虽然portal是运行时表示,portal是execute的前提条件)。命名portal和未命名portal的区别在于:命名portal会持续到事务结束或被显示销毁,未命名会在下一次BIND执行时自动销毁,命名PORTAL必须在下次BIND之前显示被销毁。
https://jdbc.postgresql.org/documentation/head/server-prepare.html
注:上图也说明了,PG服务端是支持一次接收多个可连续的请求命令的,如上面的PARSE和BIND。
第二次请求不包含PARSE,如下:
这是在客户端进行处理的。
https://www.postgresql.org/docs/current/sql-prepare.html
https://www.postgresql.org/docs/current/view-pg-prepared-statements.html
显示执行的PREPARE会在pg_prepared_statements中实时显示预编译的语句,协议级的PREPARE则不会在此体现。因为postgresql的plancache是per backend的,所以要验证的话,就得在java中查询pg_prepared_statements。
zjh@postgres=# PREPARE fooplan (int, text, bool, numeric) AS zjh@postgres-# INSERT INTO foo VALUES($1, $2, $3, $4); PREPARE zjh@postgres=# select * from pg_prepared_statements ; name | statement | prepare_time | parameter_types | from_sql ---------+-----------------------------------------------+------------------------------+--------------------------------+---------- fooplan | PREPARE fooplan (int, text, bool, numeric) AS+| 2022-01-26 10:04:10.81974+00 | {integer,text,boolean,numeric} | t | INSERT INTO foo VALUES($1, $2, $3, $4); | | | (1 row)
按照https://www.postgresql.org/message-id/CAL454F2yiTPqnTAVw78teOCnHvYxMSjzSekH8wjOPxVNTLFejw%40mail.gmail.com的说法,JDBC只要设置setPrepareThreshold(1)即可。javadoc也确实是如此说的,如下:
/** * Turn on the use of prepared statements in the server (server side prepared statements are * unrelated to jdbc PreparedStatements) As of build 302, this method is equivalent to * <code>setPrepareThreshold(1)</code>. * * @param flag use server prepare * @throws SQLException if something goes wrong * @since 7.3 * @deprecated As of build 302, replaced by {@link #setPrepareThreshold(int)} */ @Deprecated void setUseServerPrepare(boolean flag) throws SQLException;
代码中判断是否oneshotquery是通过org.postgresql.jdbc.PgStatement#executeInternal中调用isOneShotQuery(cachedQuery),而它这事通过mPrepareThreshold==0来判断,pgjdbc客户端好几个鸡肋。