PostgreSQL 11 preview - 强制auto prepared statment开关(自动化plan cache)(类似Oracle cursor_sharing force)

2021-12-27 03:09:57

背景

通常一个QUERY在执行时分为几个步骤，语义解析、SQL REWRITE、生成所有可选执行路径、选择最优路径、执行等多个步骤。

对于同一类SQL，只是更换SQL中的一些FILTER变量时，实际上很多步骤是可以共享的。例如语义解析、SQL REWRITE、生成执行计划都是可共享的，这些步骤通常比较耗CPU资源，通过prepared statement对于高并发的小事务来说，可以大幅降低CPU开销，降低延迟，性能提升非常明显。

比如：

1、简单调用，耗费较多资源

select * from tbl where id=1;  
  
select * from tbl where id=2;

2、绑定变量，一次语义解析、SQL REWRITE、生成执行计划，多次BIND，EXEC。

prepare ps1(int) as select * from tbl where id=$1;  
  
execute ps1(1);  
  
execute ps1(2);  
  
....

目前PostgreSQL支持几种绑定变量手段：

1、DB端绑定变量

https://www.postgresql.org/docs/10/static/sql-prepare.html

2、协议绑定变量

《学习 PostgreSQL Frontend/Backend protocol (通信协议)》

驱动使用例子

libpq  
  
jdbc  
  
...

https://www.postgresql.org/docs/10/static/libpq-async.html

https://jdbc.postgresql.org/documentation/head/server-prepare.html

3、UDF中动态SQL绑定变量

《PostgreSQL Oracle 兼容性之 - DBMS_SQL(存储过程动态SQL中使用绑定变量)》

PostgreSQL绑定变量的使用

1、每个会话各自有各自的绑定变量，所以切换会话的话，绑定变量会失效，需要重新绑定。

《PostgreSQL plan cache 源码浅析 - 如何确保不会计划倾斜》

2、函数调用是默认用绑定变量的。（因为函数调用接口固定，很容易实现绑定变量）。

但是并不是所有场景都可以用绑定变量的，例如我们在使用连接池时，如果需要用transaction级复用机制，那么就无法使用绑定变量，因为事务结束后，连接就可能被其他会话复用，而你再次发起execute ps请求时，可能从池里分配给你的连接并不是你之前执行prepare statement的连接，导致ps不存在的错误。

那么怎么让数据库在执行简单SQL的时候，能够用上prepared statement呢？

在Oracle里面可以设置CURSOR_SHARING参数来搞定。

Oracle CURSOR_SHARING 强制绑定变量

设置CURSOR_SHARING=force，即使你使用的是简单SQL，那么在数据库中也会自动帮你转换为prepared statement，当然相比正式的绑定变量还是更耗费资源一点，毕竟parser省不了。

CURSOR_SHARING determines what kind of SQL statements can share the same cursors.  
  
参数：  
  
FORCE  
Forces statements that may differ in some literals, but are otherwise identical, to share a cursor, unless the literals affect the meaning of the statement.  
  
SIMILAR  
Causes statements that may differ in some literals, but are otherwise identical, to share a cursor, unless the literals affect either the meaning of the statement or the degree to which the plan is optimized.  
  
EXACT  
Only allows statements with identical text to share the same cursor.

PostgreSQL 数据库强制自动绑定变量

PostgreSQL 11可能会加入这个patch，即使你执行的是简单SQL，内部也会自动对其绑定。

那么势必要让所有会话共享执行计划。最后会设计成什么样还不清楚，因为这个PATCH还没有提交。

可能的做法是1、通过开关来控制是否开启强制绑定变量，2、限制执行多少次后开启强制绑定变量，3、共享绑定变量，4、根据执行计划耗时设置阈值，来决定是否使用强制绑定变量。。。。。

https://www.postgresql.org/message-id/flat/8e76d8fc-8b8c-14bd-d4d1-e9cf193a74f5@postgrespro.ru#8e76d8fc-8b8c-14bd-d4d1-e9cf193a74f5@postgrespro.ru

Hi hackers,  
  
There were a lot of discussions about query plan caching in hackers   
mailing list, but I failed to find some clear answer for my question and   
the current consensus on this question in Postgres community. As far as   
I understand current state is the following:  
1. We have per-connection prepared statements.  
2. Queries executed inside plpgsql code are implicitly prepared.  
  
It is not always possible or convenient to use prepared statements.  
For example, if pgbouncer is used to perform connection pooling.  
Another use case (which is actually the problem I am trying to solve   
now) is partitioning.  
Efficient execution of query to partitioned table requires hardcoded   
value for partitioning key.  
Only in this case optimizer will be able to construct efficient query   
plan which access only affected tables (partitions).  
  
My small benchmark for distributed partitioned table based on pg_pathman   
+ postgres_fdw shows 3 times degrade of performance in case of using   
prepared statements.  
But without prepared statements substantial amount of time is spent in   
query compilation and planning. I was be able to speed up benchmark more   
than two time by  
sending prepared queries directly to the remote nodes.  
  
So what I am thinking now is implicit query caching. If the same query   
with different literal values is repeated many times, then we can try to   
generalize this query and replace it with prepared query with   
parameters. I am not considering now shared query cache: is seems to be   
much harder to implement. But local caching of generalized queries seems   
to be not so difficult to implement and requires not so much changes in   
Postgres code. And it can be useful not only for sharding, but for many   
other cases where prepared statements can not be used.  
  
I wonder if such option was already considered and if it was for some   
reasons rejected: can you point me at this reasons?