背景
场景:
- 论坛、短视频、社交、电商等推荐业务.
- 在推荐的过程中要过滤已读列表.
挑战:
- 已读列表巨大, 普通数据库没有多值列类型, 每个已读ID需要存储1条.
- 使用not in过滤, 性能极差.
PG 解决方案:
- 1、多值列, 一个用户已读只需要存储1条
- 2、datasketch 近似类型, 几KB空间可以存储上亿唯一值.
- 3、压缩存储多值列 roaringbitmap类型, 效率和空间的平衡.
- 4、使用partial index, 拆散过滤列表, 降低每次请求的无效数据.
例子
本文不谈推荐算法, 有兴趣的读者可以参考末尾文档或我的github, 为了观察方便, 本文尽量简化过程, 就用一个排序维度进行过滤.
传统方案
1、已读列表, 存储已读视频ID
uid int, -- UID
vid int -- 已读VID
);
create index idx_u_readlist on u_readlist (uid);
2、视频ID表, 存储推荐分数
vid int primary key, -- 视频ID
score float4, -- 推荐分数
ts timestamp
);
create index idx_vscore on vscore (score desc);
3、插入1000万条视频
4、插入1000个用户, 每个用户已读5万条视频.
insert into u_readlist select generate_series(1,1000),vid from a;
5、传统数据库没有多值列数据类型, 需要5000万条记录存储已读列表.
6、为1号用户推荐视频并过滤已读, 取出100个vid.
where vid not in (select vid from u_readlist where uid=1)
order by score desc
limit 100;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Limit (cost=48446.32..48451.80 rows=100 width=8) (actual time=103.616..103.704 rows=100 loops=1)
-> Index Scan using idx_vscore on vscore (cost=48446.32..322781.27 rows=4999930 width=8) (actual time=103.615..103.693 rows=100 loops=1)
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 50000
SubPlan 1
-> Bitmap Heap Scan on u_readlist (cost=434.17..48321.27 rows=49846 width=4) (actual time=13.819..53.265 rows=50000 loops=1)
Recheck Cond: (uid = 1)
Heap Blocks: exact=50000
-> Bitmap Index Scan on idx_u_readlist (cost=0.00..421.71 rows=49846 width=0) (actual time=4.610..4.610 rows=50000 loops=1)
Index Cond: (uid = 1)
Planning Time: 0.086 ms
Execution Time: 104.279 ms
(12 rows)
PG 解决方案1
1、PG可以使用数组存储多个已读VID,记录数节省5万倍.
uid int, -- UID
vid int[] -- 已读VID
);
create index idx_p_u_readlist on p_u_readlist (uid);
2、5000万条压缩到1000条
3、空间约仅占1/10
4、为1号用户推荐视频并过滤已读, 取出100个vid.
where vid not in (select unnest(vid) from p_u_readlist where uid=1)
order by score desc
limit 100;
Limit (cost=3.01..8.50 rows=100 width=8) (actual time=56.343..56.438 rows=100 loops=1)
-> Index Scan using idx_vscore on vscore (cost=3.01..274337.96 rows=4999930 width=8) (actual time=56.342..56.427 rows=100 loops=1)
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 50000
SubPlan 1
-> ProjectSet (cost=0.28..2.55 rows=10 width=4) (actual time=0.113..3.904 rows=50000 loops=1)
-> Index Scan using idx_p_u_readlist on p_u_readlist (cost=0.28..2.49 rows=1 width=18) (actual time=0.018..0.022 rows=1 loops=1)
Index Cond: (uid = 1)
Planning Time: 0.112 ms
Execution Time: 56.814 ms
(10 rows)
PG 解决方案2
在VID表存储已读UID, 而不是在UID中存储已读VID
vid int primary key, -- 视频ID
score float4, -- 推荐分数
uid int[], -- 已读UID
ts timestamp
);
create index idx_p_vscore on p_vscore (score desc);
插入1000万条视频, score排名前5万个视频全部被1000个用户已读.
update p_vscore set
uid=array(select generate_series(1,1000))
where vid in (
select vid from p_vscore order by score desc limit 50000
);
public | p_vscore | table | postgres | unlogged | heap | 687 MB |
为1号用户推荐视频并过滤已读, 取出100个vid
where uid is null or not uid @> array[1]
order by score desc
limit 100;
Limit (cost=0.43..3.08 rows=100 width=8) (actual time=384.599..384.696 rows=100 loops=1)
-> Index Scan using idx_p_vscore on p_vscore (cost=0.43..264791.46 rows=9999712 width=8) (actual time=384.598..384.685 rows=100 loops=1)
Filter: ((uid IS NULL) OR (NOT (uid @> '{1}'::integer[])))
Rows Removed by Filter: 50000
Planning Time: 0.062 ms
Execution Time: 384.713 ms
(6 rows)
和之前视频讲的一样, 为什么这种方式更慢了? 具有olap属性, 如下:
《重新发现PostgreSQL之美- 21 探访宇航员的食物》
PG 解决方案3
1、使用HLL、roaringbitmap存储过滤列表, 可以节省列表的存储空间, 有些用户可能已读VID达到上百万, 需要大量的存储空间.
create unlogged table p1_u_readlist (
uid int, -- UID
vid hll -- 已读VID
);
create index idx_p1_u_readlist on p1_u_readlist (uid);
5000万条压缩到1000条
public | p1_u_readlist | table | postgres | unlogged | heap | 1376 kB |
为1号用户推荐视频并过滤已读, 取出100个vid.
where not exists (select 1 from p1_u_readlist where uid=1 and p1_u_readlist.vid = hll_add(p1_u_readlist.vid,hll_hash_integer(vscore.vid)))
order by score desc
limit 100;
Limit (cost=0.71..5.23 rows=100 width=8) (actual time=602.038..632.683 rows=100 loops=1)
-> Nested Loop Anti Join (cost=0.71..449335.43 rows=9949861 width=8) (actual time=602.037..632.672 rows=100 loops=1)
Join Filter: (p1_u_readlist.vid = hll_add(p1_u_readlist.vid, hll_hash_integer(vscore.vid, 0)))
Rows Removed by Join Filter: 100
-> Index Scan using idx_vscore on vscore (cost=0.43..249335.74 rows=9999860 width=8) (actual time=0.010..37.183 rows=52695 loops=1)
-> Materialize (cost=0.28..2.50 rows=1 width=1287) (actual time=0.000..0.000 rows=1 loops=52695)
-> Index Scan using idx_p1_u_readlist on p1_u_readlist (cost=0.28..2.49 rows=1 width=1287) (actual time=0.019..0.021 rows=1 loops=1)
Index Cond: (uid = 1)
Planning Time: 0.146 ms
Execution Time: 632.714 ms
(10 rows)
和之前视频讲的一样, 为什么这种方式更慢了? 具有olap属性, 如下:
《重新发现PostgreSQL之美- 21 探访宇航员的食物》
PG 解决方案4
在VID表存储已读UID, 而不是在UID中存储已读VID
使用HLL、roaringbitmap存储过滤列表
在VID表存储已读UID, 而不是在UID中存储已读VID
vid int primary key, -- 视频ID
score float4, -- 推荐分数
uid hll, -- 已读UID
ts timestamp
);
create index idx_p1_vscore on p1_vscore (score desc);
插入1000万条视频, score排名前5万个视频全部被1000个用户已读.
update p1_vscore set
uid= (select hll_add_agg(hll_hash_integer(i)) from generate_series(1,1000) i)
where vid in (
select vid from p1_vscore order by score desc limit 50000
);
public | p_vscore | table | postgres | unlogged | heap | 687 MB |
public | p1_vscore | table | postgres | unlogged | heap | 488 MB |
为1号用户推荐视频并过滤已读, 取出100个vid.
where uid is null or uid <> hll_add(uid,hll_hash_integer(1))
order by score desc
limit 100;
Limit (cost=0.43..3.38 rows=100 width=8) (actual time=565.654..565.773 rows=100 loops=1)
-> Index Scan using idx_p1_vscore on p1_vscore (cost=0.43..339509.68 rows=11541416 width=8) (actual time=565.653..565.763 rows=100 loops=1)
Filter: ((uid IS NULL) OR (uid <> hll_add(uid, '-8604791237420463362'::hll_hashval)))
Rows Removed by Filter: 50000
Planning Time: 0.090 ms
Execution Time: 565.795 ms
(6 rows)
PG 解决方案5
采用partial index, 注意看清楚, 这个方法直接将过滤已读从5万缩小到1000
uid int, -- UID
hashid int, -- partial index hashid
vid int[] -- 已读VID
);
create index idx_pp_u_readlist on pp_u_readlist (uid,hashid);
假设hash 50片, 5000万条压缩到50000条,
public | pp_u_readlist | table | postgres | unlogged | heap | 229 MB |
视频表创建partial index
declare
begin
for i in 0..49 loop
execute format( 'create index p_vscore_p%s on vscore (score desc) where abs(mod(hashint4(vid),50))=%s', i,i);
end loop;
end;
$$;
List of relations
Schema | Name | Type | Owner | Table | Persistence | Access method | Size | Description
--------+---------------+-------+----------+----------+-------------+---------------+---------+-------------
public | p_vscore_p0 | index | postgres | vscore | unlogged | btree | 4416 kB |
public | p_vscore_p1 | index | postgres | vscore | unlogged | btree | 4416 kB |
public | p_vscore_p10 | index | postgres | vscore | unlogged | btree | 4408 kB |
public | p_vscore_p11 | index | postgres | vscore | unlogged | btree | 4416 kB |
public | p_vscore_p12 | index | postgres | vscore | unlogged | btree | 4432 kB |
public | p_vscore_p13 | index | postgres | vscore | unlogged | btree | 4408 kB |
public | p_vscore_p14 | index | postgres | vscore | unlogged | btree | 4400 kB |
public | p_vscore_p15 | index | postgres | vscore | unlogged | btree | 4416 kB |
public | p_vscore_p16 | index | postgres | vscore | unlogged | btree | 4416 kB |
public | p_vscore_p17 | index | postgres | vscore | unlogged | btree | 4424 kB |
public | p_vscore_p18 | index | postgres | vscore | unlogged | btree | 4424 kB |
public | p_vscore_p19 | index | postgres | vscore | unlogged | btree | 4416 kB |
public | p_vscore_p2 | index | postgres | vscore | unlogged | btree | 4408 kB |
public | p_vscore_p20 | index | postgres | vscore | unlogged | btree | 4416 kB |
public | p_vscore_p21 | index | postgres | vscore | unlogged | btree | 4416 kB |
public | p_vscore_p22 | index | postgres | vscore | unlogged | btree | 4408 kB |
public | p_vscore_p23 | index | postgres | vscore | unlogged | btree | 4432 kB |
public | p_vscore_p24 | index | postgres | vscore | unlogged | btree | 4408 kB |
public | p_vscore_p25 | index | postgres | vscore | unlogged | btree | 4416 kB |
public | p_vscore_p26 | index | postgres | vscore | unlogged | btree | 4408 kB |
public | p_vscore_p27 | index | postgres | vscore | unlogged | btree | 4408 kB |
public | p_vscore_p28 | index | postgres | vscore | unlogged | btree | 4416 kB |
public | p_vscore_p29 | index | postgres | vscore | unlogged | btree | 4416 kB |
public | p_vscore_p3 | index | postgres | vscore | unlogged | btree | 4424 kB |
public | p_vscore_p30 | index | postgres | vscore | unlogged | btree | 4432 kB |
public | p_vscore_p31 | index | postgres | vscore | unlogged | btree | 4416 kB |
public | p_vscore_p32 | index | postgres | vscore | unlogged | btree | 4424 kB |
public | p_vscore_p33 | index | postgres | vscore | unlogged | btree | 4416 kB |
public | p_vscore_p34 | index | postgres | vscore | unlogged | btree | 4424 kB |
public | p_vscore_p35 | index | postgres | vscore | unlogged | btree | 4400 kB |
public | p_vscore_p36 | index | postgres | vscore | unlogged | btree | 4424 kB |
public | p_vscore_p37 | index | postgres | vscore | unlogged | btree | 4408 kB |
public | p_vscore_p38 | index | postgres | vscore | unlogged | btree | 4416 kB |
public | p_vscore_p39 | index | postgres | vscore | unlogged | btree | 4408 kB |
public | p_vscore_p4 | index | postgres | vscore | unlogged | btree | 4432 kB |
public | p_vscore_p40 | index | postgres | vscore | unlogged | btree | 4432 kB |
public | p_vscore_p41 | index | postgres | vscore | unlogged | btree | 4400 kB |
public | p_vscore_p42 | index | postgres | vscore | unlogged | btree | 4416 kB |
public | p_vscore_p43 | index | postgres | vscore | unlogged | btree | 4416 kB |
public | p_vscore_p44 | index | postgres | vscore | unlogged | btree | 4424 kB |
public | p_vscore_p45 | index | postgres | vscore | unlogged | btree | 4416 kB |
public | p_vscore_p46 | index | postgres | vscore | unlogged | btree | 4424 kB |
public | p_vscore_p47 | index | postgres | vscore | unlogged | btree | 4400 kB |
public | p_vscore_p48 | index | postgres | vscore | unlogged | btree | 4416 kB |
public | p_vscore_p49 | index | postgres | vscore | unlogged | btree | 4424 kB |
public | p_vscore_p5 | index | postgres | vscore | unlogged | btree | 4408 kB |
public | p_vscore_p6 | index | postgres | vscore | unlogged | btree | 4408 kB |
public | p_vscore_p7 | index | postgres | vscore | unlogged | btree | 4416 kB |
public | p_vscore_p8 | index | postgres | vscore | unlogged | btree | 4424 kB |
public | p_vscore_p9 | index | postgres | vscore | unlogged | btree | 4408 kB |
为1号用户推荐视频并过滤已读, 取出100个vid.
每次获取时业务随机或采用round robin负载均衡方式从0-49个分片中获取100个.
where vid not in (select unnest(vid) from pp_u_readlist where uid=1 and hashid=0)
and abs(mod(hashint4(vid),50))=0
order by score desc
limit 100;
Limit (cost=3.01..157.54 rows=100 width=8) (actual time=1.223..1.331 rows=100 loops=1)
-> Index Scan using p_vscore_p0 on vscore (cost=3.01..38635.65 rows=25000 width=8) (actual time=1.222..1.317 rows=100 loops=1)
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 973
SubPlan 1
-> ProjectSet (cost=0.29..2.57 rows=10 width=4) (actual time=0.021..0.115 rows=973 loops=1)
-> Index Scan using idx_pp_u_readlist on pp_u_readlist (cost=0.29..2.51 rows=1 width=18) (actual time=0.007..0.008 rows=1 loops=1)
Index Cond: ((uid = 1) AND (hashid = 0))
Planning Time: 0.515 ms
Execution Time: 1.366 ms
(10 rows)
其他方案略
datasketch
roaringbitmap
总结
推荐查询性能提升:
已读列表存储空间节约:
原理
1、多值列, 一个用户已读只需要存储1条
2、datasketch 近似类型, 几KB空间可以存储上亿唯一值.
3、压缩存储多值列roaringbitmap类型, 效率和空间的平衡.
4、使用partial index, 拆散过滤列表, 降低每次请求的无效数据.
参考
《PostgreSQL 大量IO扫描、计算浪费的优化- 推荐模块, 过滤已推荐. (热点用户、已推荐列表超大)》
《推荐系统, 已阅读过滤, 大量CPU和IO浪费的优化思路2 - partial index - hash 分片,降低过滤量》
《PostgreSQL 推荐系统优化总计- 空间、时间、标量等混合多模查询场景, 大量已读过滤导致CPU IO剧增(类挖矿概率下降优化)》
《PostgreSQL pg_roaringbitmap - 用户画像、标签、高效检索》
除了本文的应用, hll, roaringbitmap还经常被用在实时分析系统, 画像系统中, 支持高速的PV UV统计, 滑窗分析, 人群圈选等.