[Rediscovering the Beauty of PostgreSQL] - 10 Involution & Yu the Great Tames the Flood

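Background


Scenario:
Involution (excessive internal competition), demand outstripping supply (hailing a ride at peak hours, e-commerce flash sales), hot-spot data updates.
Real-world parallels: limited resources facing unlimited demand (train tickets during the Spring Festival travel rush, students signing up for cram classes, turf wars over resources inside a company, and so on).

Challenge:
When a hot row appears in the system, a large number of concurrent requests update the same row. Because the finest-grained lock in the database is the row lock, these requests can only run serially:
while one session is updating, every other session waits. This can exhaust the available connections, so that new sessions cannot connect and the failure cascades into an avalanche.
If the flash-sale item has only 10 units in stock, only 10 requests can actually complete a purchase. All the other waiting sessions do useless work and waste a large number of connections and a lot of waiting time.

PG solution:
Yu the Great tames the flood (channel the flow and eliminate useless waiting):

  • SKIP LOCKED
  • advisory lock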
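
Both items rest on the same idea: a session that finds the hot row busy gives up at once instead of joining a queue. The sketch below models that idea in memory, using Python's threading.Lock as a stand-in for the database lock. The names HotRow, try_decrement and run are hypothetical and chosen for this illustration only. It is a model of the behaviour, not PostgreSQL code.

```python
import threading


class HotRow:
    """In-memory stand-in for the single hot row (a model, not a database)."""

    def __init__(self, cnt):
        self.cnt = cnt
        self.lock = threading.Lock()


def try_decrement(row):
    """Take one unit of stock only if the row is free right now; never wait."""
    if not row.lock.acquire(blocking=False):
        return False  # row is busy: give up at once instead of queueing
    try:
        if row.cnt < 1:
            return False  # sold out
        row.cnt -= 1
        return True
    finally:
        row.lock.release()


def run(stock=1000, workers=12, attempts=200):
    """Let several threads hammer one row; return (row, successful decrements)."""
    row = HotRow(stock)
    successes = []

    def worker():
        # Each thread counts how many of its attempts actually took stock.
        successes.append(sum(try_decrement(row) for _ in range(attempts)))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return row, sum(successes)


if __name__ == "__main__":
    row, ok = run()
    print(f"succeeded: {ok}, remaining stock: {row.cnt}")
```

The number of successful attempts varies from run to run, but the remaining stock always equals the initial stock minus the successes, and no thread ever blocks. One difference from the database is that a PostgreSQL row lock or transaction-level advisory lock is held until the transaction ends, whereas this model releases the lock as soon as the decrement is done.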

Example


Test table: one hot row with a stock of 10 million.

create table a (
id int primary key ,  -- product ID
cnt int  ,  -- stock
ts timestamp  -- last modified time
);
insert into a values (1, 10000000, now());

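Decrement the stock and return the updated row

postgres=# update a set cnt=cnt-1, ts=clock_timestamp() where id=1 returning *;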
id |   cnt   |             ts
----+---------+----------------------------
1 | 9999993 | 2021-06-01 14:41:14.775177
(1 row)
UPDATE 1
postgres=# update a set cnt=cnt-1, ts=clock_timestamp() where id=1 returning *;
id |   cnt   |             ts
----+---------+----------------------------
1 | 9999992 | 2021-06-01 14:41:17.747961
(1 row)
UPDATE 1

Concurrency test


1. Traditional method

update a set cnt=cnt-1, ts=clock_timestamp() where id=1 returning *;
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 12 -j 12 -T 120
pgbench (PostgreSQL) 14.0
transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 12
number of threads: 12
duration: 120 s
number of transactions actually processed: 2301279
latency average = 0.625 ms
latency stddev = 0.562 ms
initial connection time = 8.466 ms
tps = 19177.578464 (without initial connection time)
statement latencies in milliseconds:
0.625  update a set cnt=cnt-1, ts=clock_timestamp() where id=1 returning *;

2. SKIP LOCKED: skip rows that are already locked

If another session holds the row lock, the subquery returns no row, so ctid is compared with NULL and the UPDATE touches zero rows instead of waiting.

update a set cnt=cnt-1 , ts=clock_timestamp() where
ctid =
(select ctid from a where id=1 and cnt>=1 for update skip locked)
returning *;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Update on a  (cost=2.36..3.48 rows=1 width=18)
   InitPlan 1 (returns $1)
     ->  LockRows  (cost=0.12..2.36 rows=1 width=12)
           ->  Index Scan using a_pkey on a a_1  (cost=0.12..2.35 rows=1 width=12)
                 Index Cond: (id = 1)
                 Filter: (cnt >= 1)
   ->  Tid Scan on a  (cost=0.00..1.12 rows=1 width=18)
         TID Cond: (ctid = $1)
(8 rows)
pgbench (PostgreSQL) 14.0
transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 12
number of threads: 12
duration: 120 s
number of transactions actually processed: 7165617
latency average = 0.201 ms
latency stddev = 0.150 ms
initial connection time = 11.126 ms
tps = 59717.700525 (without initial connection time)
statement latencies in milliseconds:
0.202  update a set cnt=cnt-1 , ts=clock_timestamp() where

3. Advisory lock: eliminate row-lock waits entirely

update a set cnt=cnt-1, ts=clock_timestamp() where id=1 and pg_try_advisory_xact_lock(id) returning *;
                              QUERY PLAN
-----------------------------------------------------------------------
 Update on a  (cost=0.12..2.36 rows=1 width=18)
   ->  Index Scan using a_pkey on a  (cost=0.12..2.36 rows=1 width=18)
         Index Cond: (id = 1)
         Filter: pg_try_advisory_xact_lock((id)::bigint)
(4 rows)
postgres=# begin;
BEGIN
postgres=*# update a set cnt=cnt-1, ts=clock_timestamp() where id=1 and pg_try_advisory_xact_lock(id) returning *;
id |   cnt   |             ts
----+---------+----------------------------
1 | 6839129 | 2021-06-01 14:47:54.232782
(1 row)
UPDATE 1
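Another session probes the advisory lock for the same product ID. Because it cannot acquire the lock, it performs no update and returns immediately: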
postgres=# update a set cnt=cnt-1, ts=clock_timestamp() where id=1 and pg_try_advisory_xact_lock(id) returning *;
id | cnt | ts
----+-----+----
(0 rows)
UPDATE 0
transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 12
number of threads: 12
duration: 120 s
number of transactions actually processed: 10701637
latency average = 0.134 ms
latency stddev = 0.705 ms
initial connection time = 10.577 ms
tps = 89184.703653 (without initial connection time)
statement latencies in milliseconds:
0.136  update a set cnt=cnt-1, ts=clock_timestamp() where id=1 and pg_try_advisory_xact_lock(id) returning *;
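
With both non-blocking variants, a busy lock shows up in the client only as an UPDATE that affected zero rows, so the caller has to decide whether to retry or fail fast. The sketch below shows one way to handle that on the application side. It is an assumption for illustration, not part of the original test: try_update is a hypothetical callable that runs the UPDATE and returns the RETURNING row, or None when zero rows were updated, and the retry count and backoff are arbitrary.

```python
import time


def decrement_with_retry(try_update, max_attempts=3, backoff_s=0.0):
    """Call try_update() until it returns a row or the attempts run out.

    try_update is a hypothetical callable supplied by the application:
    it runs the non-blocking UPDATE and returns the RETURNING row,
    or None when the statement updated zero rows.
    """
    for attempt in range(1, max_attempts + 1):
        row = try_update()
        if row is not None:
            return row  # the update went through
        if attempt < max_attempts and backoff_s > 0:
            # Linear backoff before the next try; the policy is arbitrary here.
            time.sleep(backoff_s * attempt)
    return None  # still busy (or sold out): report failure to the user


if __name__ == "__main__":
    # Fake session for demonstration: busy twice, then the update succeeds.
    outcomes = iter([None, None, (1, 9999991, "2021-06-01 14:50:00")])
    print(decrement_with_retry(lambda: next(outcomes)))
```

Zero rows does not by itself tell the caller why nothing was updated. With the SKIP LOCKED statement above it can mean either that the row was locked or that cnt>=1 no longer holds, so a flash-sale front end would typically keep max_attempts small and answer "try again later" rather than loop.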

TPS improvement

12 concurrent clients:
19177 (traditional method) -> 59717 (skip locked) -> 89184 (advisory lock)

800 concurrent clients:
374 (traditional method) -> 34495 (skip locked) -> 70444 (advisory lock)
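
To put the figures above in proportion, the following sketch computes each method's speedup over the traditional method, together with the average latency implied by Little's law (latency = clients / tps). The latency estimate assumes a closed loop in which every client issues statements back to back with no think time, as pgbench does by default. The function and key names are illustrative.

```python
# TPS figures copied from the summary above; the dictionary keys are
# illustrative names chosen for this sketch.
TPS = {
    12: {"traditional": 19177, "skip locked": 59717, "advisory lock": 89184},
    800: {"traditional": 374, "skip locked": 34495, "advisory lock": 70444},
}


def speedup(clients, method):
    """TPS of `method` relative to the traditional method at the same concurrency."""
    return TPS[clients][method] / TPS[clients]["traditional"]


def implied_latency_ms(clients, method):
    """Average latency implied by Little's law for a closed loop of
    `clients` sessions with no think time: latency = clients / tps."""
    return clients / TPS[clients][method] * 1000.0


if __name__ == "__main__":
    for clients, results in TPS.items():
        for method in results:
            print(f"{clients:>3} clients  {method:<13}  "
                  f"{speedup(clients, method):7.1f}x  "
                  f"{implied_latency_ms(clients, method):9.3f} ms")
```

At 12 clients the implied latencies agree with the averages pgbench reported (0.625, 0.201 and 0.134 ms) to within rounding, which suggests the estimate is reasonable. At 800 clients it implies roughly 2.1 seconds per update for the traditional method, against about 23 ms with skip locked and about 11 ms with the advisory lock. These 800-client latencies are derived estimates, not measured values.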

Key points


1. skip locked
https://www.postgresql.org/docs/14/sql-select.html

2. advisory lock (per database; session or transaction level)
https://www.postgresql.org/docs/14/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS
https://www.postgresql.org/docs/14/explicit-locking.html#ADVISORY-LOCKS

3. tid scan
https://www.postgresql.org/docs/14/runtime-config-query.html#RUNTIME-CONFIG-QUERY-ENABLE

4. ctid
https://www.postgresql.org/docs/14/ddl-system-columns.html

5. update / delete ... returning
https://www.postgresql.org/docs/14/dml-returning.html

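201801/20180105_03.md  "Four Flash-Sale Methods in PostgreSQL - Adding a Batch Streaming Method for Incrementing and Decrementing Stock"
201711/20171107_31.md  "HTAP Database PostgreSQL Scenarios and Performance Tests, Part 30 - (OLTP) Flash Sale - High-Concurrency Single-Row Updates"
201611/20161117_01.md  "A Chat About the Technology Behind Double 11 - A Different Flash-Sale Technique, the Bare Flash Sale"
201509/20150914_01.md  "Optimizing Flash-Sale Scenarios in PostgreSQL"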