【重新发现PostgreSQL之美】- 23 彭祖的长寿秘诀

背景


场景:

  • 论坛、短视频、社交、电商等推荐业务.
  • 在推荐的过程中要过滤已读列表.

挑战:

  • 已读列表巨大, 普通数据库没有多值列类型, 每个已读ID需要存储1条.
  • 使用not in过滤, 性能极差.

PG 解决方案:

  • 1、多值列, 一个用户已读只需要存储1条
  • 2、datasketch 近似类型, 几KB空间可以存储上亿唯一值.
  • 3、压缩存储多值列 roaringbitmap类型, 效率和空间的平衡.
  • 4、使用partial index, 拆散过滤列表, 降低每次请求的无效数据.

例子


本文不谈推荐算法, 有兴趣的读者可以参考末尾文档或我的github, 为了观察方便, 本文尽量简化过程, 就用一个排序维度进行过滤.

传统方案

1、已读列表, 存储已读视频ID

uid int,  -- UID
vid int   -- 已读VID
);
create index idx_u_readlist on u_readlist (uid);

2、视频ID表, 存储推荐分数

vid int primary key,  -- 视频ID
score float4,   -- 推荐分数
ts timestamp
);
create index idx_vscore on vscore (score desc);

3、插入1000万条视频

4、插入1000个用户, 每个用户已读5万条视频.

insert into u_readlist select generate_series(1,1000),vid from a;

5、传统数据库没有多值列数据类型, 需要5000万条记录存储已读列表.

6、为1号用户推荐视频并过滤已读, 取出100个vid.

where vid not in (select vid from u_readlist where uid=1)
order by score desc
limit 100;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Limit  (cost=48446.32..48451.80 rows=100 width=8) (actual time=103.616..103.704 rows=100 loops=1)
->  Index Scan using idx_vscore on vscore  (cost=48446.32..322781.27 rows=4999930 width=8) (actual time=103.615..103.693 rows=100 loops=1)
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 50000
SubPlan 1
->  Bitmap Heap Scan on u_readlist  (cost=434.17..48321.27 rows=49846 width=4) (actual time=13.819..53.265 rows=50000 loops=1)
Recheck Cond: (uid = 1)
Heap Blocks: exact=50000
->  Bitmap Index Scan on idx_u_readlist  (cost=0.00..421.71 rows=49846 width=0) (actual time=4.610..4.610 rows=50000 loops=1)
Index Cond: (uid = 1)
Planning Time: 0.086 ms
Execution Time: 104.279 ms
(12 rows)

PG 解决方案1

1、PG可以使用数组存储多个已读VID,记录数节省5万倍.

uid int,  -- UID
vid int[]   -- 已读VID
);
create index idx_p_u_readlist on p_u_readlist (uid);

2、5000万条压缩到1000条

3、空间约仅占1/10

4、为1号用户推荐视频并过滤已读, 取出100个vid.

where vid not in (select unnest(vid) from p_u_readlist where uid=1)
order by score desc
limit 100;
Limit  (cost=3.01..8.50 rows=100 width=8) (actual time=56.343..56.438 rows=100 loops=1)
->  Index Scan using idx_vscore on vscore  (cost=3.01..274337.96 rows=4999930 width=8) (actual time=56.342..56.427 rows=100 loops=1)
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 50000
SubPlan 1
->  ProjectSet  (cost=0.28..2.55 rows=10 width=4) (actual time=0.113..3.904 rows=50000 loops=1)
->  Index Scan using idx_p_u_readlist on p_u_readlist  (cost=0.28..2.49 rows=1 width=18) (actual time=0.018..0.022 rows=1 loops=1)
Index Cond: (uid = 1)
Planning Time: 0.112 ms
Execution Time: 56.814 ms
(10 rows)

PG 解决方案2

在VID表存储已读UID, 而不是在UID中存储已读VID

vid int primary key,  -- 视频ID
score float4,   -- 推荐分数
uid int[],   -- 已读UID
ts timestamp
);
create index idx_p_vscore on p_vscore (score desc);

插入1000万条视频, score排名前5万个视频全部被1000个用户已读.

update p_vscore set
uid=array(select generate_series(1,1000))
where vid in (
select vid from p_vscore order by score desc limit 50000
);
public | p_vscore      | table             | postgres | unlogged    | heap          | 687 MB  |

为1号用户推荐视频并过滤已读, 取出100个vid

where uid is null or not uid @> array[1]
order by score desc
limit 100;
Limit  (cost=0.43..3.08 rows=100 width=8) (actual time=384.599..384.696 rows=100 loops=1)
->  Index Scan using idx_p_vscore on p_vscore  (cost=0.43..264791.46 rows=9999712 width=8) (actual time=384.598..384.685 rows=100 loops=1)
Filter: ((uid IS NULL) OR (NOT (uid @> '{1}'::integer[])))
Rows Removed by Filter: 50000
Planning Time: 0.062 ms
Execution Time: 384.713 ms
(6 rows)

和之前视频讲的一样, 为什么这种方式更慢了? 具有olap属性, 如下:
《重新发现PostgreSQL之美- 21 探访宇航员的食物》

PG 解决方案3

1、使用HLL、roaringbitmap存储过滤列表, 可以节省列表的存储空间, 有些用户可能已读VID达到上百万, 需要大量的存储空间.

create unlogged table p1_u_readlist (
uid int,  -- UID
vid hll   -- 已读VID
);
create index idx_p1_u_readlist on p1_u_readlist (uid);

5000万条压缩到1000条

public | p1_u_readlist | table             | postgres | unlogged    | heap          | 1376 kB |

为1号用户推荐视频并过滤已读, 取出100个vid.

where not exists (select 1 from p1_u_readlist where uid=1 and p1_u_readlist.vid = hll_add(p1_u_readlist.vid,hll_hash_integer(vscore.vid)))
order by score desc
limit 100;
Limit  (cost=0.71..5.23 rows=100 width=8) (actual time=602.038..632.683 rows=100 loops=1)
->  Nested Loop Anti Join  (cost=0.71..449335.43 rows=9949861 width=8) (actual time=602.037..632.672 rows=100 loops=1)
Join Filter: (p1_u_readlist.vid = hll_add(p1_u_readlist.vid, hll_hash_integer(vscore.vid, 0)))
Rows Removed by Join Filter: 100
->  Index Scan using idx_vscore on vscore  (cost=0.43..249335.74 rows=9999860 width=8) (actual time=0.010..37.183 rows=52695 loops=1)
->  Materialize  (cost=0.28..2.50 rows=1 width=1287) (actual time=0.000..0.000 rows=1 loops=52695)
->  Index Scan using idx_p1_u_readlist on p1_u_readlist  (cost=0.28..2.49 rows=1 width=1287) (actual time=0.019..0.021 rows=1 loops=1)
Index Cond: (uid = 1)
Planning Time: 0.146 ms
Execution Time: 632.714 ms
(10 rows)

和之前视频讲的一样, 为什么这种方式更慢了? 具有olap属性, 如下:
《重新发现PostgreSQL之美- 21 探访宇航员的食物》

PG 解决方案4

在VID表存储已读UID, 而不是在UID中存储已读VID

使用HLL、roaringbitmap存储过滤列表

在VID表存储已读UID, 而不是在UID中存储已读VID

vid int primary key,  -- 视频ID
score float4,   -- 推荐分数
uid hll,   -- 已读UID
ts timestamp
);
create index idx_p1_vscore on p1_vscore (score desc);

插入1000万条视频, score排名前5万个视频全部被1000个用户已读.

update p1_vscore set
uid= (select hll_add_agg(hll_hash_integer(i)) from generate_series(1,1000) i)
where vid in (
select vid from p1_vscore order by score desc limit 50000
);
public | p_vscore      | table             | postgres | unlogged    | heap          | 687 MB  |
public | p1_vscore | table | postgres | unlogged    | heap          | 488 MB |

为1号用户推荐视频并过滤已读, 取出100个vid.

where uid is null or uid <> hll_add(uid,hll_hash_integer(1))
order by score desc
limit 100;
Limit  (cost=0.43..3.38 rows=100 width=8) (actual time=565.654..565.773 rows=100 loops=1)
->  Index Scan using idx_p1_vscore on p1_vscore  (cost=0.43..339509.68 rows=11541416 width=8) (actual time=565.653..565.763 rows=100 loops=1)
Filter: ((uid IS NULL) OR (uid <> hll_add(uid, '-8604791237420463362'::hll_hashval)))
Rows Removed by Filter: 50000
Planning Time: 0.090 ms
Execution Time: 565.795 ms
(6 rows)

PG 解决方案5

采用partial index, 注意看清楚, 这个方法直接将过滤已读从5万缩小到1000

uid int,  -- UID
hashid int,  -- partial index hashid
vid int[]   -- 已读VID
);
create index idx_pp_u_readlist on pp_u_readlist (uid,hashid);

假设hash 50片, 5000万条压缩到50000条,

public | pp_u_readlist | table | postgres | unlogged    | heap          | 229 MB |

视频表创建partial index

declare
begin
for i in 0..49 loop
execute format( 'create index p_vscore_p%s on vscore (score desc) where abs(mod(hashint4(vid),50))=%s', i,i);
end loop;
end;
$$;
List of relations
Schema |     Name      | Type  |  Owner   |  Table   | Persistence | Access method |  Size   | Description
--------+---------------+-------+----------+----------+-------------+---------------+---------+-------------
public | p_vscore_p0   | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p1   | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p10  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |
public | p_vscore_p11  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p12  | index | postgres | vscore   | unlogged    | btree         | 4432 kB |
public | p_vscore_p13  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |
public | p_vscore_p14  | index | postgres | vscore   | unlogged    | btree         | 4400 kB |
public | p_vscore_p15  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p16  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p17  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |
public | p_vscore_p18  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |
public | p_vscore_p19  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p2   | index | postgres | vscore   | unlogged    | btree         | 4408 kB |
public | p_vscore_p20  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p21  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p22  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |
public | p_vscore_p23  | index | postgres | vscore   | unlogged    | btree         | 4432 kB |
public | p_vscore_p24  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |
public | p_vscore_p25  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p26  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |
public | p_vscore_p27  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |
public | p_vscore_p28  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p29  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p3   | index | postgres | vscore   | unlogged    | btree         | 4424 kB |
public | p_vscore_p30  | index | postgres | vscore   | unlogged    | btree         | 4432 kB |
public | p_vscore_p31  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p32  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |
public | p_vscore_p33  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p34  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |
public | p_vscore_p35  | index | postgres | vscore   | unlogged    | btree         | 4400 kB |
public | p_vscore_p36  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |
public | p_vscore_p37  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |
public | p_vscore_p38  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p39  | index | postgres | vscore   | unlogged    | btree         | 4408 kB |
public | p_vscore_p4   | index | postgres | vscore   | unlogged    | btree         | 4432 kB |
public | p_vscore_p40  | index | postgres | vscore   | unlogged    | btree         | 4432 kB |
public | p_vscore_p41  | index | postgres | vscore   | unlogged    | btree         | 4400 kB |
public | p_vscore_p42  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p43  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p44  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |
public | p_vscore_p45  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p46  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |
public | p_vscore_p47  | index | postgres | vscore   | unlogged    | btree         | 4400 kB |
public | p_vscore_p48  | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p49  | index | postgres | vscore   | unlogged    | btree         | 4424 kB |
public | p_vscore_p5   | index | postgres | vscore   | unlogged    | btree         | 4408 kB |
public | p_vscore_p6   | index | postgres | vscore   | unlogged    | btree         | 4408 kB |
public | p_vscore_p7   | index | postgres | vscore   | unlogged    | btree         | 4416 kB |
public | p_vscore_p8   | index | postgres | vscore   | unlogged    | btree         | 4424 kB |
public | p_vscore_p9   | index | postgres | vscore   | unlogged    | btree         | 4408 kB |

为1号用户推荐视频并过滤已读, 取出100个vid.
每次获取时业务随机或采用round robin负载均衡方式从0-49个分片中获取100个.

where vid not in (select unnest(vid) from pp_u_readlist where uid=1 and hashid=0)
and abs(mod(hashint4(vid),50))=0
order by score desc
limit 100;
Limit  (cost=3.01..157.54 rows=100 width=8) (actual time=1.223..1.331 rows=100 loops=1)
->  Index Scan using p_vscore_p0 on vscore  (cost=3.01..38635.65 rows=25000 width=8) (actual time=1.222..1.317 rows=100 loops=1)
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 973
SubPlan 1
->  ProjectSet  (cost=0.29..2.57 rows=10 width=4) (actual time=0.021..0.115 rows=973 loops=1)
->  Index Scan using idx_pp_u_readlist on pp_u_readlist  (cost=0.29..2.51 rows=1 width=18) (actual time=0.007..0.008 rows=1 loops=1)
Index Cond: ((uid = 1) AND (hashid = 0))
Planning Time: 0.515 ms
Execution Time: 1.366 ms
(10 rows)

其他方案略

datasketch
roaringbitmap

总结

推荐查询性能提升:

已读列表存储空间节约:

原理

1、多值列, 一个用户已读只需要存储1条
2、datasketch 近似类型, 几KB空间可以存储上亿唯一值.
3、压缩存储多值列roaringbitmap类型, 效率和空间的平衡.
4、使用partial index, 拆散过滤列表, 降低每次请求的无效数据.

参考

《PostgreSQL 大量IO扫描、计算浪费的优化- 推荐模块, 过滤已推荐. (热点用户、已推荐列表超大)》

《推荐系统, 已阅读过滤, 大量CPU和IO浪费的优化思路2 - partial index - hash 分片,降低过滤量》

《PostgreSQL 推荐系统优化总计- 空间、时间、标量等混合多模查询场景, 大量已读过滤导致CPU IO剧增(类挖矿概率下降优化)》

《近似查询处理(Approximate Query Processing) - DataSketches - 压缩、distinct、分位、rank、高
频柱状图(count distinct, quantiles, most-frequent items, joins, matrix computations, and graph analysis) 等实时大数据近似分析》

《PostgreSQL pg_roaringbitmap - 用户画像、标签、高效检索》

除了本文的应用, hll, roaringbitmap还经常被用在实时分析系统, 画像系统中, 支持高速的PV UV统计, 滑窗分析, 人群圈选等.

 

上一篇:PG数据库升级步骤说明(pg_dumpall和pg_upgrade)


下一篇:为IIS添加json扩展类型文件的MiME类型