背景
对于较大数据量的表,如果在索引字段上面有小结果集JOIN,用nestloop JOIN是比较好的方法。
但是nestloop带来的一个问题就是离散IO,这个是无法回避的问题,特别是硬件IO能力不行的情况下,性能会比较糟糕。
有什么优化方法呢?
PostgreSQL提供了一个命令,可以修改物理存储的顺序,减少离散IO就靠它了。
例子
创建两张表
postgres=# create unlogged table test01(id int primary key, info text);
CREATE TABLE
postgres=# create unlogged table test02(id int primary key, info text);
CREATE TABLE
产生一些离散primary key数据
postgres=# insert into test01 select trunc(random()*10000000), md5(random()::text) from generate_series(1,10000000) on conflict on constraint test01_pkey do nothing;
INSERT 0 6322422
postgres=# insert into test02 select trunc(random()*10000000), md5(random()::text) from generate_series(1,10000000) on conflict on constraint test02_pkey do nothing;
INSERT 0 6320836
分析表
postgres=# analyze test01;
postgres=# analyze test02;
清除缓存,并重启
$ pg_ctl stop -m fast
# echo 3 > /proc/sys/vm/drop_caches
$ pg_ctl start
第一次调用,耗费大量的离散IO,执行时间18.490毫秒(我这台机器是SSD,IOPS能力算好的,差的机器时间更长)
postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.*,t2.* from test01 t1,test02 t2 where t1.id=t2.id and t1.id between 1 and 1000;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=19.25..7532.97 rows=623 width=74) (actual time=0.465..17.221 rows=402 loops=1)
Output: t1.id, t1.info, t2.id, t2.info
Buffers: shared hit=1929 read=1039 dirtied=188
-> Bitmap Heap Scan on public.test01 t1 (cost=18.82..2306.39 rows=623 width=37) (actual time=0.416..8.019 rows=640 loops=1)
Output: t1.id, t1.info
Recheck Cond: ((t1.id >= 1) AND (t1.id <= 1000))
Heap Blocks: exact=637
Buffers: shared hit=5 read=637 dirtied=123
-> Bitmap Index Scan on test01_pkey (cost=0.00..18.66 rows=623 width=0) (actual time=0.254..0.254 rows=640 loops=1)
Index Cond: ((t1.id >= 1) AND (t1.id <= 1000))
Buffers: shared hit=4 read=1
-> Index Scan using test02_pkey on public.test02 t2 (cost=0.43..8.38 rows=1 width=37) (actual time=0.013..0.013 rows=1 loops=640)
Output: t2.id, t2.info
Index Cond: (t2.id = t1.id)
Buffers: shared hit=1924 read=402 dirtied=65
Planning time: 26.668 ms
Execution time: 18.490 ms
(17 rows)
第二次,缓存命中5.4毫秒
postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.*,t2.* from test01 t1,test02 t2 where t1.id=t2.id and t1.id between 1 and 1000;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=19.25..7532.97 rows=623 width=74) (actual time=0.392..5.150 rows=402 loops=1)
Output: t1.id, t1.info, t2.id, t2.info
Buffers: shared hit=2968
-> Bitmap Heap Scan on public.test01 t1 (cost=18.82..2306.39 rows=623 width=37) (actual time=0.373..1.760 rows=640 loops=1)
Output: t1.id, t1.info
Recheck Cond: ((t1.id >= 1) AND (t1.id <= 1000))
Heap Blocks: exact=637
Buffers: shared hit=642
-> Bitmap Index Scan on test01_pkey (cost=0.00..18.66 rows=623 width=0) (actual time=0.218..0.218 rows=640 loops=1)
Index Cond: ((t1.id >= 1) AND (t1.id <= 1000))
Buffers: shared hit=5
-> Index Scan using test02_pkey on public.test02 t2 (cost=0.43..8.38 rows=1 width=37) (actual time=0.004..0.004 rows=1 loops=640)
Output: t2.id, t2.info
Index Cond: (t2.id = t1.id)
Buffers: shared hit=2326
Planning time: 0.956 ms
Execution time: 5.434 ms
(17 rows)
根据索引字段调整表的物理顺序,降低离散IO。
postgres=# cluster test01 using test01_pkey;
CLUSTER
postgres=# cluster test02 using test02_pkey;
CLUSTER
postgres=# analyze test01;
postgres=# analyze test02;
清除缓存,重启数据库
$ pg_ctl stop -m fast
# echo 3 > /proc/sys/vm/drop_caches
$ pg_ctl start
第一次调用,降低到了5.4毫秒
postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.*,t2.* from test01 t1,test02 t2 where t1.id=t2.id and t1.id between 1 and 1000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.86..5618.07 rows=668 width=74) (actual time=0.069..4.072 rows=402 loops=1)
Output: t1.id, t1.info, t2.id, t2.info
Buffers: shared hit=2323 read=12
-> Index Scan using test01_pkey on public.test01 t1 (cost=0.43..30.79 rows=668 width=37) (actual time=0.040..0.557 rows=640 loops=1)
Output: t1.id, t1.info
Index Cond: ((t1.id >= 1) AND (t1.id <= 1000))
Buffers: shared hit=5 read=6
-> Index Scan using test02_pkey on public.test02 t2 (cost=0.43..8.35 rows=1 width=37) (actual time=0.004..0.004 rows=1 loops=640)
Output: t2.id, t2.info
Index Cond: (t2.id = t1.id)
Buffers: shared hit=2318 read=6 -- 注意在cluster之后,shared hit并没有下降,因为LOOP了多次,但是性能确比cluster 之前提升了很多,因为需要访问的HEAP page少了,OS cache可以瞬间命中。
Planning time: 42.356 ms
Execution time: 5.426 ms
(13 rows)
第二次调用,3.6毫秒
postgres=# explain (analyze,verbose,timing,costs,buffers) select t1.*,t2.* from test01 t1,test02 t2 where t1.id=t2.id and t1.id between 1 and 1000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.86..5618.07 rows=668 width=74) (actual time=0.055..3.414 rows=402 loops=1)
Output: t1.id, t1.info, t2.id, t2.info
Buffers: shared hit=2335
-> Index Scan using test01_pkey on public.test01 t1 (cost=0.43..30.79 rows=668 width=37) (actual time=0.037..0.374 rows=640 loops=1)
Output: t1.id, t1.info
Index Cond: ((t1.id >= 1) AND (t1.id <= 1000))
Buffers: shared hit=11
-> Index Scan using test02_pkey on public.test02 t2 (cost=0.43..8.35 rows=1 width=37) (actual time=0.003..0.004 rows=1 loops=640)
Output: t2.id, t2.info
Index Cond: (t2.id = t1.id)
Buffers: shared hit=2324
Planning time: 1.042 ms
Execution time: 3.620 ms
(13 rows)
小结
通过cluster, 将表的物理顺序和索引对齐,所以如果查询的值是连续的,在使用嵌套循环时可以大幅减少离散IO,取得非常好查询优化的效果。
如果查询的值是跳跃的,那么这种方法就没有效果啦,不过好在PostgreSQL有bitmap index scan,在读取heap tuple前,会对ctid排序,按排序后的ctid取heap tuple,也可以起到减少离散IO的作用。