Greenplum 海量数据，大宽表行存 VS 列存

2021-11-01 21:04:50

背景

GPDB支持行存列存，而且相关的选择原理我们之前的文章也有很详细的分析，同时在性能方面也做过较多的评测对比。

但是在大宽表上，相差到底有多悬殊我们之前没有对其进行过对比。主要的差异来自deform和QUERY扫描的存储空间（理论上，数据量越大，同时访问的列越少，列存的优势越明显。或者是访问到的列越靠后，优势越明显）。

一些原理参考

《PostgreSQL VOPS 向量计算 + DBLINK异步并行 - 单实例 10亿聚合计算跑进2秒》

《PostgreSQL 向量化执行插件(瓦片式实现-vops) 10x提速OLAP》

一些选择参考本文末尾的链接文章。

测试

1、创建一个函数，用于创建1000个列的表(每列类型INT8)

create or replace function crt_tbl(name, text, text) returns void as $$  
declare  
  sql text;  
begin  
  sql := 'create table '||$1||'(';  
  for i in 1..1000 loop  
    sql := sql||'c'||i||' int8 default random()*10000,';  
  end loop;  
  sql := rtrim(sql, ',');  
  sql := sql||') with (APPENDONLY=true, ORIENTATION='||$2||', COMPRESSTYPE='||$3||')';  
  execute sql;  
end;  
$$ language plpgsql strict;

2、创建行存表

select crt_tbl('tbl_1000_row_nonc', 'row', 'none');  
  
select crt_tbl('tbl_1000_row_c', 'row', 'zlib');

3、创建列存表

select crt_tbl('tbl_1000_column_nonc', 'column', 'none');  
  
select crt_tbl('tbl_1000_column_c', 'column', 'zlib');

4、写入100万行测试数据

postgres=# insert into tbl_1000_row_nonc (c1) select generate_series(1,1000000);  
INSERT 0 1000000  
Time: 221003.467 ms  
postgres=# insert into tbl_1000_row_c select * from tbl_1000_row_nonc;  
INSERT 0 1000000  
Time: 12298.931 ms  
postgres=# insert into tbl_1000_column_nonc select * from tbl_1000_row_nonc;  
INSERT 0 1000000  
Time: 23332.039 ms  
postgres=# insert into tbl_1000_column_c select * from tbl_1000_row_nonc;  
INSERT 0 1000000  
Time: 17017.119 ms

5、空间对比

postgres=# select pg_size_pretty(pg_total_relation_size('tbl_1000_row_nonc'));  
 pg_size_pretty   
----------------  
 7653 MB  
(1 row)  
  
Time: 282.180 ms  
postgres=# select pg_size_pretty(pg_total_relation_size('tbl_1000_column_nonc'));  
 pg_size_pretty   
----------------  
 7647 MB  
(1 row)  
  
Time: 55.315 ms  
  
postgres=# select pg_size_pretty(pg_total_relation_size('tbl_1000_row_c'));  
 pg_size_pretty   
----------------  
 2522 MB  
(1 row)  
  
Time: 56.017 ms  
postgres=# select pg_size_pretty(pg_total_relation_size('tbl_1000_column_c'));  
 pg_size_pretty   
----------------  
 2520 MB  
(1 row)  
  
Time: 55.557 ms

6、第一列查询对比

postgres=# explain analyze select c1,count(*) from tbl_1000_row_nonc group by c1;  
                                                                 QUERY PLAN                                                                    
---------------------------------------------------------------------------------------------------------------------------------------------  
 Gather Motion 48:1  (slice1; segments: 48)  (cost=259714.00..272214.00 rows=1000000 width=16)  
   Rows out:  1000000 rows at destination with 352 ms to end, start offset by 1.483 ms.  
   ->  HashAggregate  (cost=259714.00..272214.00 rows=20834 width=16)  
         Group By: c1  
         Rows out:  Avg 20833.3 rows x 48 workers.  Max 20854 rows (seg42) with 0.005 ms to first row, 101 ms to end, start offset by 63 ms.  
         ->  Append-only Scan on tbl_1000_row_nonc  (cost=0.00..254714.00 rows=20834 width=8)  
               Rows out:  0 rows (seg0) with 19 ms to end, start offset by 82 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 405K bytes.  
   (slice1)    Executor memory: 500K bytes avg x 48 workers, 500K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_bitmapscan=off; enable_seqscan=off; optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 353.793 ms  
(15 rows)  
  
postgres=# explain analyze select c1,count(*) from tbl_1000_row_c group by c1;  
                                                                 QUERY PLAN                                                                    
---------------------------------------------------------------------------------------------------------------------------------------------  
 Gather Motion 48:1  (slice1; segments: 48)  (cost=95537.00..108037.00 rows=1000000 width=16)  
   Rows out:  1000000 rows at destination with 1128 ms to end, start offset by 1.360 ms.  
   ->  HashAggregate  (cost=95537.00..108037.00 rows=20834 width=16)  
         Group By: c1  
         Rows out:  Avg 20833.3 rows x 48 workers.  Max 20854 rows (seg42) with 0.003 ms to first row, 699 ms to end, start offset by 42 ms.  
         ->  Append-only Scan on tbl_1000_row_c  (cost=0.00..90537.00 rows=20834 width=8)  
               Rows out:  0 rows (seg0) with 13 ms to end, start offset by 42 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 405K bytes.  
   (slice1)    Executor memory: 500K bytes avg x 48 workers, 500K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_bitmapscan=off; enable_seqscan=off; optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 1130.006 ms  
(15 rows)  
  
  
postgres=# explain analyze select c1,count(*) from tbl_1000_column_nonc group by c1;  
                                                                  QUERY PLAN                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------  
 Gather Motion 48:1  (slice1; segments: 48)  (cost=259518.00..272018.00 rows=1000000 width=16)  
   Rows out:  1000000 rows at destination with 245 ms to end, start offset by 1.365 ms.  
   ->  HashAggregate  (cost=259518.00..272018.00 rows=20834 width=16)  
         Group By: c1  
         Rows out:  Avg 20833.3 rows x 48 workers.  Max 20854 rows (seg42) with 0.005 ms to first row, 9.968 ms to end, start offset by 60 ms.  
         ->  Append-only Columnar Scan on tbl_1000_column_nonc  (cost=0.00..254518.00 rows=20834 width=8)  
               Rows out:  0 rows (seg0) with 16 ms to end, start offset by 60 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 405K bytes.  
   (slice1)    Executor memory: 821K bytes avg x 48 workers, 821K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_bitmapscan=off; enable_seqscan=off; optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 246.967 ms  
(15 rows)  
  
  
postgres=# explain analyze select c1,count(*) from tbl_1000_column_c group by c1;  
                                                                  QUERY PLAN                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------  
 Gather Motion 48:1  (slice1; segments: 48)  (cost=259518.00..272018.00 rows=1000000 width=16)  
   Rows out:  1000000 rows at destination with 249 ms to end, start offset by 1.450 ms.  
   ->  HashAggregate  (cost=259518.00..272018.00 rows=20834 width=16)  
         Group By: c1  
         Rows out:  Avg 20833.3 rows x 48 workers.  Max 20854 rows (seg42) with 0.004 ms to first row, 8.861 ms to end, start offset by 46 ms.  
         ->  Append-only Columnar Scan on tbl_1000_column_c  (cost=0.00..254518.00 rows=20834 width=8)  
               Rows out:  0 rows (seg0) with 16 ms to end, start offset by 45 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 405K bytes.  
   (slice1)    Executor memory: 853K bytes avg x 48 workers, 853K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_bitmapscan=off; enable_seqscan=off; optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 250.583 ms  
(15 rows)

7、最后一列查询对比

postgres=# explain analyze select c1000,count(*) from tbl_1000_row_nonc group by c1000;  
                                                                      QUERY PLAN                                                                         
-------------------------------------------------------------------------------------------------------------------------------------------------------  
 Gather Motion 48:1  (slice2; segments: 48)  (cost=260048.04..260167.34 rows=9544 width=16)  
   Rows out:  10001 rows at destination with 296 ms to end, start offset by 1.806 ms.  
   ->  HashAggregate  (cost=260048.04..260167.34 rows=199 width=16)  
         Group By: tbl_1000_row_nonc.c1000  
         Rows out:  Avg 208.4 rows x 48 workers.  Max 223 rows (seg17) with 0.002 ms to first row, 133 ms to end, start offset by 21 ms.  
         ->  Redistribute Motion 48:48  (slice1; segments: 48)  (cost=259714.00..259904.88 rows=199 width=16)  
               Hash Key: tbl_1000_row_nonc.c1000  
               Rows out:  Avg 8749.4 rows x 48 workers at destination.  Max 9360 rows (seg46) with 141 ms to end, start offset by 21 ms.  
               ->  HashAggregate  (cost=259714.00..259714.00 rows=199 width=16)  
                     Group By: tbl_1000_row_nonc.c1000  
                     Rows out:  Avg 8749.4 rows x 48 workers.  Max 8814 rows (seg18) with 0.003 ms to first row, 209 ms to end, start offset by 63 ms.  
                     ->  Append-only Scan on tbl_1000_row_nonc  (cost=0.00..254714.00 rows=20834 width=8)  
                           Rows out:  0 rows (seg0) with 16 ms to end, start offset by 64 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 417K bytes.  
   (slice1)    Executor memory: 1818K bytes avg x 48 workers, 1818K bytes max (seg0).  
   (slice2)    Executor memory: 434K bytes avg x 48 workers, 434K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_bitmapscan=off; enable_seqscan=off; optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 298.222 ms  
(22 rows)  
  
postgres=# explain analyze select c1000,count(*) from tbl_1000_row_c group by c1000;  
                                                                       QUERY PLAN                                                                         
--------------------------------------------------------------------------------------------------------------------------------------------------------  
 Gather Motion 48:1  (slice2; segments: 48)  (cost=95886.65..96011.53 rows=9990 width=16)  
   Rows out:  10001 rows at destination with 1244 ms to end, start offset by 1.760 ms.  
   ->  HashAggregate  (cost=95886.65..96011.53 rows=209 width=16)  
         Group By: tbl_1000_row_c.c1000  
         Rows out:  Avg 208.4 rows x 48 workers.  Max 223 rows (seg17) with 0.002 ms to first row, 432 ms to end, start offset by 32 ms.  
         ->  Redistribute Motion 48:48  (slice1; segments: 48)  (cost=95537.00..95736.80 rows=209 width=16)  
               Hash Key: tbl_1000_row_c.c1000  
               Rows out:  Avg 8749.4 rows x 48 workers at destination.  Max 9360 rows (seg46) with 755 ms to end, start offset by 57 ms.  
               ->  HashAggregate  (cost=95537.00..95537.00 rows=209 width=16)  
                     Group By: tbl_1000_row_c.c1000  
                     Rows out:  Avg 8749.4 rows x 48 workers.  Max 8814 rows (seg18) with 0.004 ms to first row, 1085 ms to end, start offset by 63 ms.  
                     ->  Append-only Scan on tbl_1000_row_c  (cost=0.00..90537.00 rows=20834 width=8)  
                           Rows out:  0 rows (seg0) with 22 ms to end, start offset by 59 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 417K bytes.  
   (slice1)    Executor memory: 1818K bytes avg x 48 workers, 1818K bytes max (seg0).  
   (slice2)    Executor memory: 434K bytes avg x 48 workers, 434K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_bitmapscan=off; enable_seqscan=off; optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 1246.276 ms  
(22 rows)  
  
postgres=# explain analyze select c1000,count(*) from tbl_1000_column_c group by c1000;  
                                                                       QUERY PLAN                                                                          
---------------------------------------------------------------------------------------------------------------------------------------------------------  
 Gather Motion 48:1  (slice2; segments: 48)  (cost=259870.80..259996.80 rows=10080 width=16)  
   Rows out:  10001 rows at destination with 78 ms to end, start offset by 1.783 ms.  
   ->  HashAggregate  (cost=259870.80..259996.80 rows=210 width=16)  
         Group By: tbl_1000_column_c.c1000  
         Rows out:  Avg 208.4 rows x 48 workers.  Max 223 rows (seg17) with 0.004 ms to first row, 24 ms to end, start offset by 26 ms.  
         ->  Redistribute Motion 48:48  (slice1; segments: 48)  (cost=259518.00..259719.60 rows=210 width=16)  
               Hash Key: tbl_1000_column_c.c1000  
               Rows out:  Avg 8749.4 rows x 48 workers at destination.  Max 9360 rows (seg46) with 31 ms to end, start offset by 23 ms.  
               ->  HashAggregate  (cost=259518.00..259518.00 rows=210 width=16)  
                     Group By: tbl_1000_column_c.c1000  
                     Rows out:  Avg 8749.4 rows x 48 workers.  Max 8814 rows (seg18) with 0.004 ms to first row, 5.962 ms to end, start offset by 42 ms.  
                     ->  Append-only Columnar Scan on tbl_1000_column_c  (cost=0.00..254518.00 rows=20834 width=8)  
                           Rows out:  0 rows (seg0) with 28 ms to end, start offset by 30 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 417K bytes.  
   (slice1)    Executor memory: 1787K bytes avg x 48 workers, 1787K bytes max (seg0).  
   (slice2)    Executor memory: 434K bytes avg x 48 workers, 434K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_bitmapscan=off; enable_seqscan=off; optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 80.152 ms  
(22 rows)  
  
postgres=# explain analyze select c1000,count(*) from tbl_1000_column_nonc group by c1000;  
                                                                       QUERY PLAN                                                                          
---------------------------------------------------------------------------------------------------------------------------------------------------------  
 Gather Motion 48:1  (slice2; segments: 48)  (cost=259856.48..259977.37 rows=9671 width=16)  
   Rows out:  10001 rows at destination with 77 ms to end, start offset by 2.019 ms.  
   ->  HashAggregate  (cost=259856.48..259977.37 rows=202 width=16)  
         Group By: tbl_1000_column_nonc.c1000  
         Rows out:  Avg 208.4 rows x 48 workers.  Max 223 rows (seg17) with 0.003 ms to first row, 26 ms to end, start offset by 35 ms.  
         ->  Redistribute Motion 48:48  (slice1; segments: 48)  (cost=259518.00..259711.42 rows=202 width=16)  
               Hash Key: tbl_1000_column_nonc.c1000  
               Rows out:  Avg 8749.4 rows x 48 workers at destination.  Max 9360 rows (seg46) with 20 ms to end, start offset by 30 ms.  
               ->  HashAggregate  (cost=259518.00..259518.00 rows=202 width=16)  
                     Group By: tbl_1000_column_nonc.c1000  
                     Rows out:  Avg 8749.4 rows x 48 workers.  Max 8814 rows (seg18) with 0.006 ms to first row, 5.476 ms to end, start offset by 25 ms.  
                     ->  Append-only Columnar Scan on tbl_1000_column_nonc  (cost=0.00..254518.00 rows=20834 width=8)  
                           Rows out:  0 rows (seg0) with 27 ms to end, start offset by 25 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 417K bytes.  
   (slice1)    Executor memory: 1755K bytes avg x 48 workers, 1755K bytes max (seg0).  
   (slice2)    Executor memory: 1330K bytes avg x 48 workers, 1330K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_bitmapscan=off; enable_seqscan=off; optimizer=off  
 Optimizer status: legacy query optimizer  
 Total runtime: 78.976 ms  
(22 rows)

9、更新对比

postgres=# update tbl_1000_row_nonc set c1000=c1000+1;  
UPDATE 1000000  
Time: 7085.645 ms  
postgres=# update tbl_1000_row_c set c1000=c1000+1;  
UPDATE 1000000  
Time: 6734.279 ms  
postgres=# update tbl_1000_column_nonc set c1000=c1000+1;  
UPDATE 1000000  
Time: 13514.749 ms  
postgres=# update tbl_1000_column_c set c1000=c1000+1;  
UPDATE 1000000  
Time: 10629.104 ms

小结

表	空间	查询第一列	查询最后一列	全表更新
AO行存不压缩	7653 MB	353.793 ms	298.222 ms	7085.645 ms
AO行存压缩	2522 MB	1130.006 ms	1246.276 ms	6734.279 ms
AO列存不压缩	7647 MB	246.967 ms	80.152 ms	13514.749 ms
AO列存压缩	2520 MB	250.583 ms	78.976 ms	10629.104 ms

对于大宽表，查询性能显然列存储要好很多(不管是第一列还是最后一列，性能一致)，写入和更新性能列存储略差因为需要操作更多的数据文件。

列存第一列和最后一列性能差异是返回记录数导致的，除了第一列，其他列的取值空间都是10000，第一列是100万。

有兴趣可以再测试一些其他参数，比如压缩级别，块大小，HEAP TABLE：

where storage_parameter is:
   APPENDONLY={TRUE|FALSE}
   BLOCKSIZE={8192-2097152}
   ORIENTATION={COLUMN|ROW}
   COMPRESSTYPE={ZLIB|QUICKLZ|RLE_TYPE|NONE}
   COMPRESSLEVEL={0-9}
   CHECKSUM={TRUE|FALSE}
   FILLFACTOR={10-100}
   OIDS[=TRUE|FALSE]