【重新发现PostgreSQL之美】- 23 彭祖的长寿秘诀

2021-08-13 03:17:46

背景

场景:

论坛、短视频、社交、电商等推荐业务.
在推荐的过程中要过滤已读列表.

挑战:

已读列表巨大, 普通数据库没有多值列类型, 每个已读ID需要存储1条.
使用not in过滤, 性能极差.

PG 解决方案:

1、多值列, 一个用户已读只需要存储1条
2、datasketch 近似类型, 几KB空间可以存储上亿唯一值.
3、压缩存储多值列 roaringbitmap类型, 效率和空间的平衡.
4、使用partial index, 拆散过滤列表, 降低每次请求的无效数据.

例子

本文不谈推荐算法, 有兴趣的读者可以参考末尾文档或我的github, 为了观察方便, 本文尽量简化过程, 就用一个排序维度进行过滤.

传统方案

1、已读列表, 存储已读视频ID

uid int,  -- UID

vid int   -- 已读VID

);

create index idx_u_readlist on u_readlist (uid);

2、视频ID表, 存储推荐分数

vid int primary key,  -- 视频ID

score float4,   -- 推荐分数

ts timestamp

);

create index idx_vscore on vscore (score desc);

3、插入1000万条视频

4、插入1000个用户, 每个用户已读5万条视频.

insert into u_readlist select generate_series(1,1000),vid from a;

5、传统数据库没有多值列数据类型, 需要5000万条记录存储已读列表.

6、为1号用户推荐视频并过滤已读, 取出100个vid.

where vid not in (select vid from u_readlist where uid=1)

order by score desc

limit 100;

QUERY PLAN

-------------------------------------------------------------------------------------------------

Limit  (cost=48446.32..48451.80 rows=100 width=8) (actual time=103.616..103.704 rows=100 loops=1)

->  Index Scan using idx_vscore on vscore  (cost=48446.32..322781.27 rows=4999930 width=8) (actual time=103.615..103.693 rows=100 loops=1)

Filter: (NOT (hashed SubPlan 1))

Rows Removed by Filter: 50000

SubPlan 1

->  Bitmap Heap Scan on u_readlist  (cost=434.17..48321.27 rows=49846 width=4) (actual time=13.819..53.265 rows=50000 loops=1)

Recheck Cond: (uid = 1)

Heap Blocks: exact=50000

->  Bitmap Index Scan on idx_u_readlist  (cost=0.00..421.71 rows=49846 width=0) (actual time=4.610..4.610 rows=50000 loops=1)

Index Cond: (uid = 1)

Planning Time: 0.086 ms

Execution Time: 104.279 ms

(12 rows)

PG 解决方案1

1、PG可以使用数组存储多个已读VID,记录数节省5万倍.

uid int,  -- UID

vid int[]   -- 已读VID

);

create index idx_p_u_readlist on p_u_readlist (uid);

2、5000万条压缩到1000条

3、空间约仅占1/10

4、为1号用户推荐视频并过滤已读, 取出100个vid.

where vid not in (select unnest(vid) from p_u_readlist where uid=1)

order by score desc

limit 100;

Limit  (cost=3.01..8.50 rows=100 width=8) (actual time=56.343..56.438 rows=100 loops=1)

->  Index Scan using idx_vscore on vscore  (cost=3.01..274337.96 rows=4999930 width=8) (actual time=56.342..56.427 rows=100 loops=1)

Filter: (NOT (hashed SubPlan 1))

Rows Removed by Filter: 50000

SubPlan 1

->  ProjectSet  (cost=0.28..2.55 rows=10 width=4) (actual time=0.113..3.904 rows=50000 loops=1)

->  Index Scan using idx_p_u_readlist on p_u_readlist  (cost=0.28..2.49 rows=1 width=18) (actual time=0.018..0.022 rows=1 loops=1)

Index Cond: (uid = 1)

Planning Time: 0.112 ms

Execution Time: 56.814 ms

(10 rows)

PG 解决方案2

在VID表存储已读UID, 而不是在UID中存储已读VID

vid int primary key,  -- 视频ID

score float4,   -- 推荐分数

uid int[],   -- 已读UID

ts timestamp

);

create index idx_p_vscore on p_vscore (score desc);

插入1000万条视频, score排名前5万个视频全部被1000个用户已读.

update p_vscore set

uid=array(select generate_series(1,1000))

where vid in (

select vid from p_vscore order by score desc limit 50000

);

public | p_vscore      | table             | postgres | unlogged    | heap          | 687 MB  |

为1号用户推荐视频并过滤已读, 取出100个vid

where uid is null or not uid @> array[1]

order by score desc

limit 100;

Limit  (cost=0.43..3.08 rows=100 width=8) (actual time=384.599..384.696 rows=100 loops=1)

->  Index Scan using idx_p_vscore on p_vscore  (cost=0.43..264791.46 rows=9999712 width=8) (actual time=384.598..384.685 rows=100 loops=1)

Filter: ((uid IS NULL) OR (NOT (uid @> '{1}'::integer[])))

Rows Removed by Filter: 50000

Planning Time: 0.062 ms

Execution Time: 384.713 ms

(6 rows)

和之前视频讲的一样, 为什么这种方式更慢了? 具有olap属性, 如下:
《重新发现PostgreSQL之美- 21 探访宇航员的食物》

PG 解决方案3

1、使用HLL、roaringbitmap存储过滤列表, 可以节省列表的存储空间, 有些用户可能已读VID达到上百万, 需要大量的存储空间.

create unlogged table p1_u_readlist (

uid int,  -- UID

vid hll   -- 已读VID

);

create index idx_p1_u_readlist on p1_u_readlist (uid);

5000万条压缩到1000条

public | p1_u_readlist | table             | postgres | unlogged    | heap          | 1376 kB |

为1号用户推荐视频并过滤已读, 取出100个vid.

where not exists (select 1 from p1_u_readlist where uid=1 and p1_u_readlist.vid = hll_add(p1_u_readlist.vid,hll_hash_integer(vscore.vid)))

order by score desc

limit 100;

Limit  (cost=0.71..5.23 rows=100 width=8) (actual time=602.038..632.683 rows=100 loops=1)

->  Nested Loop Anti Join  (cost=0.71..449335.43 rows=9949861 width=8) (actual time=602.037..632.672 rows=100 loops=1)

Join Filter: (p1_u_readlist.vid = hll_add(p1_u_readlist.vid, hll_hash_integer(vscore.vid, 0)))

Rows Removed by Join Filter: 100

->  Index Scan using idx_vscore on vscore  (cost=0.43..249335.74 rows=9999860 width=8) (actual time=0.010..37.183 rows=52695 loops=1)

->  Materialize  (cost=0.28..2.50 rows=1 width=1287) (actual time=0.000..0.000 rows=1 loops=52695)

->  Index Scan using idx_p1_u_readlist on p1_u_readlist  (cost=0.28..2.49 rows=1 width=1287) (actual time=0.019..0.021 rows=1 loops=1)

Index Cond: (uid = 1)

Planning Time: 0.146 ms

Execution Time: 632.714 ms

(10 rows)

和之前视频讲的一样, 为什么这种方式更慢了? 具有olap属性, 如下:
《重新发现PostgreSQL之美- 21 探访宇航员的食物》

PG 解决方案4

在VID表存储已读UID, 而不是在UID中存储已读VID

使用HLL、roaringbitmap存储过滤列表

在VID表存储已读UID, 而不是在UID中存储已读VID

vid int primary key,  -- 视频ID

score float4,   -- 推荐分数

uid hll,   -- 已读UID

ts timestamp

);

create index idx_p1_vscore on p1_vscore (score desc);

插入1000万条视频, score排名前5万个视频全部被1000个用户已读.

update p1_vscore set

uid= (select hll_add_agg(hll_hash_integer(i)) from generate_series(1,1000) i)

where vid in (

select vid from p1_vscore order by score desc limit 50000

);

public | p_vscore      | table             | postgres | unlogged    | heap          | 687 MB  |

public | p1_vscore | table | postgres | unlogged    | heap          | 488 MB |

为1号用户推荐视频并过滤已读, 取出100个vid.

where uid is null or uid <> hll_add(uid,hll_hash_integer(1))

order by score desc

limit 100;

Limit  (cost=0.43..3.38 rows=100 width=8) (actual time=565.654..565.773 rows=100 loops=1)

->  Index Scan using idx_p1_vscore on p1_vscore  (cost=0.43..339509.68 rows=11541416 width=8) (actual time=565.653..565.763 rows=100 loops=1)

Filter: ((uid IS NULL) OR (uid <> hll_add(uid, '-8604791237420463362'::hll_hashval)))

Rows Removed by Filter: 50000

Planning Time: 0.090 ms

Execution Time: 565.795 ms

(6 rows)

PG 解决方案5

采用partial index, 注意看清楚, 这个方法直接将过滤已读从5万缩小到1000

uid int,  -- UID

hashid int,  -- partial index hashid

vid int[]   -- 已读VID

);

create index idx_pp_u_readlist on pp_u_readlist (uid,hashid);

假设hash 50片, 5000万条压缩到50000条,

public | pp_u_readlist | table | postgres | unlogged    | heap          | 229 MB |

视频表创建partial index

declare

begin

for i in 0..49 loop

execute format( 'create index p_vscore_p%s on vscore (score desc) where abs(mod(hashint4(vid),50))=%s', i,i);

end loop;

end;

$$;

List of relations

Schema |     Name      | Type  |  Owner   |  Table   | Persistence | Access method |  Size   | Description

--------+---------------+-------+----------+----------+-------------+---------------+---------+-------------

public | p_vscore_p0   | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p1   | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p10  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |

public | p_vscore_p11  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p12  | index | postgres | vscore   | unlogged    | btree         | 4432 kB |

public | p_vscore_p13  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |

public | p_vscore_p14  | index | postgres | vscore   | unlogged    | btree         | 4400 kB |

public | p_vscore_p15  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p16  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p17  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |

public | p_vscore_p18  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |

public | p_vscore_p19  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p2   | index | postgres | vscore   | unlogged    | btree         | 4408 kB |

public | p_vscore_p20  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p21  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p22  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |

public | p_vscore_p23  | index | postgres | vscore   | unlogged    | btree         | 4432 kB |

public | p_vscore_p24  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |

public | p_vscore_p25  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p26  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |

public | p_vscore_p27  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |

public | p_vscore_p28  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p29  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p3   | index | postgres | vscore   | unlogged    | btree         | 4424 kB |

public | p_vscore_p30  | index | postgres | vscore   | unlogged    | btree         | 4432 kB |

public | p_vscore_p31  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p32  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |

public | p_vscore_p33  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p34  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |

public | p_vscore_p35  | index | postgres | vscore   | unlogged    | btree         | 4400 kB |

public | p_vscore_p36  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |

public | p_vscore_p37  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |

public | p_vscore_p38  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p39  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |

public | p_vscore_p4   | index | postgres | vscore   | unlogged    | btree         | 4432 kB |

public | p_vscore_p40  | index | postgres | vscore   | unlogged    | btree         | 4432 kB |

public | p_vscore_p41  | index | postgres | vscore   | unlogged    | btree         | 4400 kB |

public | p_vscore_p42  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p43  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p44  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |

public | p_vscore_p45  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p46  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |

public | p_vscore_p47  | index | postgres | vscore   | unlogged    | btree         | 4400 kB |

public | p_vscore_p48  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p49  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |

public | p_vscore_p5   | index | postgres | vscore   | unlogged    | btree         | 4408 kB |

public | p_vscore_p6   | index | postgres | vscore   | unlogged    | btree         | 4408 kB |

public | p_vscore_p7   | index | postgres | vscore   | unlogged    | btree         | 4416 kB |

public | p_vscore_p8   | index | postgres | vscore   | unlogged    | btree         | 4424 kB |

public | p_vscore_p9   | index | postgres | vscore   | unlogged    | btree         | 4408 kB |

为1号用户推荐视频并过滤已读, 取出100个vid.
每次获取时业务随机或采用round robin负载均衡方式从0-49个分片中获取100个.

where vid not in (select unnest(vid) from pp_u_readlist where uid=1 and hashid=0)

and abs(mod(hashint4(vid),50))=0

order by score desc

limit 100;

Limit  (cost=3.01..157.54 rows=100 width=8) (actual time=1.223..1.331 rows=100 loops=1)

->  Index Scan using p_vscore_p0 on vscore  (cost=3.01..38635.65 rows=25000 width=8) (actual time=1.222..1.317 rows=100 loops=1)

Filter: (NOT (hashed SubPlan 1))

Rows Removed by Filter: 973

SubPlan 1

->  ProjectSet  (cost=0.29..2.57 rows=10 width=4) (actual time=0.021..0.115 rows=973 loops=1)

->  Index Scan using idx_pp_u_readlist on pp_u_readlist  (cost=0.29..2.51 rows=1 width=18) (actual time=0.007..0.008 rows=1 loops=1)

Index Cond: ((uid = 1) AND (hashid = 0))

Planning Time: 0.515 ms

Execution Time: 1.366 ms

(10 rows)

其他方案略

datasketch
roaringbitmap

总结

推荐查询性能提升:

已读列表存储空间节约:

原理

1、多值列, 一个用户已读只需要存储1条
2、datasketch 近似类型, 几KB空间可以存储上亿唯一值.
3、压缩存储多值列roaringbitmap类型, 效率和空间的平衡.
4、使用partial index, 拆散过滤列表, 降低每次请求的无效数据.

参考

《PostgreSQL 大量IO扫描、计算浪费的优化- 推荐模块, 过滤已推荐. (热点用户、已推荐列表超大)》

《近似查询处理(Approximate Query Processing) - DataSketches - 压缩、distinct、分位、rank、高
频柱状图(count distinct, quantiles, most-frequent items, joins, matrix computations, and graph analysis) 等实时大数据近似分析》

《PostgreSQL pg_roaringbitmap - 用户画像、标签、高效检索》

除了本文的应用, hll, roaringbitmap还经常被用在实时分析系统, 画像系统中, 支持高速的PV UV统计, 滑窗分析, 人群圈选等.

码农公寓

背景

例子

传统方案

PG 解决方案1

PG 解决方案2

PG 解决方案3

PG 解决方案4

PG 解决方案5

其他方案略

总结

原理

参考

相关文章