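Background
Scenarios:
Cutthroat competition ("involution"), demand outstripping supply (hailing a ride at peak hours, e-commerce flash sales), hot-row updates
Social analogy: limited resources facing unlimited demand (train tickets during the Spring Festival travel rush, students signing up for cram classes, turf wars over resources inside a company, and so on)
Challenges:
When a hot row appears in the system, a large number of concurrent requests are updating the same row. Because the finest-grained lock in the database is the row lock, these concurrent requests can only execute serially.
While one session is updating, all other sessions wait. This can exhaust the available connections so that new sessions cannot connect, which triggers an avalanche.
If the flash-sale item has only 10 units in stock, only 10 requests can actually complete a purchase. All the other waiting sessions do useless work, wasting a large number of connections and a lot of waiting time.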
PostgreSQL solutions:
Follow Yu the Great's approach to flood control (channel the flow and eliminate useless waiting):
- SKIP LOCKED
- advisory lock
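The difference between queuing on a lock and giving up immediately can be illustrated with a toy simulation. This is only a sketch under stated assumptions: a Python `threading.Lock` stands in for the row lock on the hot row, and the function name `run`, its parameters, and the `hold_seconds` delay are all hypothetical. It does not talk to PostgreSQL and says nothing about real throughput.

```python
import threading
import time


def run(mode, workers=8, requests_per_worker=5, stock=1000, hold_seconds=0.001):
    """Toy model: one threading.Lock stands in for the row lock on the hot row."""
    row_lock = threading.Lock()
    counter_lock = threading.Lock()  # protects the bookkeeping counters only
    state = {"stock": stock, "updated": 0, "skipped": 0}

    def worker():
        for _ in range(requests_per_worker):
            if mode == "blocking":
                row_lock.acquire()  # wait in line, like a plain UPDATE
                got_it = True
            else:
                # Give up at once if the lock is busy, like a try-lock.
                got_it = row_lock.acquire(blocking=False)
            if not got_it:
                with counter_lock:
                    state["skipped"] += 1
                continue
            try:
                time.sleep(hold_seconds)  # pretend to perform the update
                state["stock"] -= 1       # safe: only the lock holder runs this
                with counter_lock:
                    state["updated"] += 1
            finally:
                row_lock.release()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state


if __name__ == "__main__":
    print("blocking:", run("blocking"))
    print("try-lock:", run("try"))
```

In blocking mode every request eventually updates the stock but spends most of its time waiting. In try-lock mode, requests that find the lock busy return immediately and are counted as skipped, which is the idea behind both solutions listed above.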
Example
Test table: one hot row with a stock of 10 million.
create table a (
  id int primary key , -- product ID
  cnt int , -- stock
  ts timestamp -- last modified time
);
insert into a values (1, 10000000, now());
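Deduct stock and return the updated row
postgres=# update a set cnt=cnt-1, ts=clock_timestamp() where id=1 returning *;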
id | cnt | ts
----+---------+----------------------------
1 | 9999993 | 2021-06-01 14:41:14.775177
(1 row)
UPDATE 1
postgres=# update a set cnt=cnt-1, ts=clock_timestamp() where id=1 returning *;
id | cnt | ts
----+---------+----------------------------
1 | 9999992 | 2021-06-01 14:41:17.747961
(1 row)
UPDATE 1
Concurrency test
1. Traditional method
update a set cnt=cnt-1, ts=clock_timestamp() where id=1 returning *;
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 12 -j 12 -T 120
pgbench (PostgreSQL) 14.0
transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 12
number of threads: 12
duration: 120 s
number of transactions actually processed: 2301279
latency average = 0.625 ms
latency stddev = 0.562 ms
initial connection time = 8.466 ms
tps = 19177.578464 (without initial connection time)
statement latencies in milliseconds:
0.625 update a set cnt=cnt-1, ts=clock_timestamp() where id=1 returning *;
2. SKIP LOCKED: skip rows that are already locked
update a set cnt=cnt-1 , ts=clock_timestamp() where
ctid =
(select ctid from a where id=1 and cnt>=1 for update skip locked)
returning *;
QUERY PLAN
-----------------------------------------------------------------------------------
 Update on a  (cost=2.36..3.48 rows=1 width=18)
   InitPlan 1 (returns $1)
     ->  LockRows  (cost=0.12..2.36 rows=1 width=12)
           ->  Index Scan using a_pkey on a a_1  (cost=0.12..2.35 rows=1 width=12)
                 Index Cond: (id = 1)
                 Filter: (cnt >= 1)
   ->  Tid Scan on a  (cost=0.00..1.12 rows=1 width=18)
         TID Cond: (ctid = $1)
(8 rows)
pgbench (PostgreSQL) 14.0
transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 12
number of threads: 12
duration: 120 s
number of transactions actually processed: 7165617
latency average = 0.201 ms
latency stddev = 0.150 ms
initial connection time = 11.126 ms
tps = 59717.700525 (without initial connection time)
statement latencies in milliseconds:
0.202 update a set cnt=cnt-1 , ts=clock_timestamp() where
3. Advisory lock: eliminate row-lock waits entirely
QUERY PLAN
-----------------------------------------------------------------------
 Update on a  (cost=0.12..2.36 rows=1 width=18)
   ->  Index Scan using a_pkey on a  (cost=0.12..2.36 rows=1 width=18)
         Index Cond: (id = 1)
         Filter: pg_try_advisory_xact_lock((id)::bigint)
(4 rows)
postgres=# begin;
BEGIN
postgres=*# update a set cnt=cnt-1, ts=clock_timestamp() where id=1 and pg_try_advisory_xact_lock(id) returning *;
id | cnt | ts
----+---------+----------------------------
1 | 6839129 | 2021-06-01 14:47:54.232782
(1 row)
UPDATE 1
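Another session probes the advisory lock for the same product ID. If it cannot acquire the lock, it does not perform the update and returns immediately: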
postgres=# update a set cnt=cnt-1, ts=clock_timestamp() where id=1 and pg_try_advisory_xact_lock(id) returning *;
id | cnt | ts
----+-----+----
(0 rows)
UPDATE 0
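Because a session that fails to get the advisory lock returns zero rows instead of waiting, the caller has to decide what an empty result means. One possible way to handle it is a bounded retry, sketched below. Everything here is an assumption for illustration: `deduct_with_retry` and `try_update` are hypothetical names, and `try_update` stands for whatever function in the application executes the UPDATE ... RETURNING statement and returns the list of returned rows. Note that an empty result can also mean that no row matched the other conditions, so the caller may need a separate check for that case.

```python
import time


def deduct_with_retry(try_update, max_attempts=3, backoff_seconds=0.0):
    """Call try_update() until it returns a row or the attempts run out.

    try_update is a hypothetical callable that runs the
    UPDATE ... RETURNING statement and returns the list of returned rows
    (empty when the advisory lock was not acquired or no row matched).
    """
    for attempt in range(1, max_attempts + 1):
        rows = try_update()
        if rows:
            return rows[0]  # the updated row, e.g. (id, cnt, ts)
        if attempt < max_attempts:
            # Simple linear backoff before the next attempt.
            time.sleep(backoff_seconds * attempt)
    return None  # caller can report "busy, please try again"
```

Whether to retry at all is a design choice. In a flash sale it may be preferable to fail fast and tell the user to try again, since retries add load back onto the hot row.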
transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 12
number of threads: 12
duration: 120 s
number of transactions actually processed: 10701637
latency average = 0.134 ms
latency stddev = 0.705 ms
initial connection time = 10.577 ms
tps = 89184.703653 (without initial connection time)
statement latencies in milliseconds:
0.136 update a set cnt=cnt-1, ts=clock_timestamp() where id=1 and pg_try_advisory_xact_lock(id) returning *;
TPS improvement
12 concurrent clients:
19177 (traditional method) -> 59717 (skip locked) -> 89184 (advisory lock)
800 concurrent clients:
374 (traditional method) -> 34495 (skip locked) -> 70444 (advisory lock)
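To make the comparison easier to read, the relative speedups can be computed from the TPS figures above. The snippet below is a small arithmetic sketch. The dictionary layout and the function name `speedups` are chosen here for illustration, and the numbers are the rounded TPS values from the summary.

```python
# TPS figures from the summary above, keyed by number of concurrent clients.
tps = {
    12: {"traditional": 19177, "skip locked": 59717, "advisory lock": 89184},
    800: {"traditional": 374, "skip locked": 34495, "advisory lock": 70444},
}


def speedups(results):
    """Return each method's TPS divided by the traditional method's TPS."""
    base = results["traditional"]
    return {
        name: round(value / base, 1)
        for name, value in results.items()
        if name != "traditional"
    }


if __name__ == "__main__":
    for clients, results in tps.items():
        print(clients, "clients:", speedups(results))
```

At 12 clients this gives roughly 3.1x for skip locked and 4.7x for the advisory lock. At 800 clients the factors are roughly 92x and 188x, because the traditional method collapses under lock waits while the other two degrade only moderately.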
Key points
1. skip locked
https://www.postgresql.org/docs/14/sql-select.html
2. advisory lock (database->session|xact level)
https://www.postgresql.org/docs/14/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS
https://www.postgresql.org/docs/14/explicit-locking.html#ADVISORY-LOCKS
3. tid scan
https://www.postgresql.org/docs/14/runtime-config-query.html#RUNTIME-CONFIG-QUERY-ENABLE
4. ctid
https://www.postgresql.org/docs/14/ddl-system-columns.html
5. update/delete returning
https://www.postgresql.org/docs/14/dml-returning.html