【重新发现PostgreSQL之美】- 22 黄帝内经

背景


场景:
电商、社交、SAAS软件行业.
用户表、租户表、用户行为表.

挑战:
按企业、appid等hash分区可能出现倾斜, 例如某些APPID的数据量可能非常大, 热数据可能在同一个分区, saas的某些大企业可能被分到同一分区, 使得数据倾斜.

PG 解决方案:
非对称分区方法.

例子


1、创建1个分区表, 按appid hash分区

appid int,
c1 int,
c2 int,
info text,
ts timestamp
) partition by hash (appid);
create unlogged table t0 partition of t for values with (modulus 4, REMAINDER 0);
create unlogged table t1 partition of t for values with (modulus 4, REMAINDER 1);
create unlogged table t2 partition of t for values with (modulus 4, REMAINDER 2);
create unlogged table t3 partition of t for values with (modulus 4, REMAINDER 3);

2、写入2000万记录, 其中1000万appid=1, 制造倾斜

insert into t select 1, random()*100, random()*1000, md5(random()::text), clock_timestamp() from generate_series(1,10000000);
List of relations
Schema |   Name   |       Type        |  Owner   | Persistence | Access method |  Size   | Description
--------+----------+-------------------+----------+-------------+---------------+---------+-------------
public | t        | partitioned table | postgres | unlogged    |               | 0 bytes |
public | t0       | table             | postgres | unlogged    | heap          | 1014 MB |
public | t1       | table             | postgres | unlogged    | heap          | 189 MB  |
public | t2       | table             | postgres | unlogged    | heap          | 222 MB  |
public | t3       | table             | postgres | unlogged    | heap          | 186 MB  |

3、解决办法, 对于appid=1的分区, 再次按c1 hash分区, 其他分区保持不变.

create unlogged table tt0(like t) PARTITION BY hash (c1);
create unlogged table tt0_0 partition of tt0 for values with (modulus 4, REMAINDER 0);
create unlogged table tt0_1 partition of tt0 for values with (modulus 4, REMAINDER 1);
create unlogged table tt0_2 partition of tt0 for values with (modulus 4, REMAINDER 2);
create unlogged table tt0_3 partition of tt0 for values with (modulus 4, REMAINDER 3);
alter table t DETACH PARTITION t0;
insert into tt0 select * from t0;
alter table t attach partition tt0 FOR VALUES WITH ( MODULUS 4, REMAINDER 0 );
end;
Unlogged partitioned table "public.t"
Column |            Type             | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
appid  | integer                     |           |          |         | plain    |             |              |
c1     | integer                     |           |          |         | plain    |             |              |
c2     | integer                     |           |          |         | plain    |             |              |
info   | text                        |           |          |         | extended |             |              |
ts     | timestamp without time zone |           |          |         | plain    |             |              |
Partition key: HASH (appid)
Partitions: t1 FOR VALUES WITH (modulus 4, remainder 1),
t2 FOR VALUES WITH (modulus 4, remainder 2),
t3 FOR VALUES WITH (modulus 4, remainder 3),
tt0 FOR VALUES WITH (modulus 4, remainder 0), PARTITIONED
postgres=# \d+ tt0
Unlogged partitioned table "public.tt0"
Column |            Type             | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
appid  | integer                     |           |          |         | plain    |             |              |
c1     | integer                     |           |          |         | plain    |             |              |
c2     | integer                     |           |          |         | plain    |             |              |
info   | text                        |           |          |         | extended |             |              |
ts     | timestamp without time zone |           |          |         | plain    |             |              |
Partition of: t FOR VALUES WITH (modulus 4, remainder 0)
Partition constraint: satisfies_hash_partition('16552'::oid, 4, 0, appid)
Partition key: HASH (c1)
Partitions: tt0_0 FOR VALUES WITH (modulus 4, remainder 0),
tt0_1 FOR VALUES WITH (modulus 4, remainder 1),
tt0_2 FOR VALUES WITH (modulus 4, remainder 2),
tt0_3 FOR VALUES WITH (modulus 4, remainder 3)

可以看到appid=1的数据已经打散

List of relations
Schema |   Name   |       Type        |  Owner   | Persistence | Access method |  Size   | Description
--------+----------+-------------------+----------+-------------+---------------+---------+-------------
public | t        | partitioned table | postgres | unlogged    |               | 0 bytes |
public | t1       | table             | postgres | unlogged    | heap          | 189 MB  |
public | t2       | table             | postgres | unlogged    | heap          | 222 MB  |
public | t3       | table             | postgres | unlogged    | heap          | 186 MB  |
public | tt0      | partitioned table | postgres | unlogged    |               | 0 bytes |
public | tt0_0    | table             | postgres | unlogged    | heap          | 279 MB  |
public | tt0_1    | table             | postgres | unlogged    | heap          | 309 MB  |
public | tt0_2    | table             | postgres | unlogged    | heap          | 254 MB  |
public | tt0_3    | table             | postgres | unlogged    | heap          | 172 MB  |

4、检查执行计划, 分区被正常路由

QUERY PLAN
------------------------------------------------------------
Seq Scan on t2 t  (cost=0.00..62947.58 rows=9965 width=53)
Filter: (appid = 2)
(2 rows)
postgres=# explain select * from t where appid=1;
QUERY PLAN
--------------------------------------------------------------------------
Append  (cost=0.00..337327.48 rows=10031536 width=53)
->  Seq Scan on tt0_0 t_1  (cost=0.00..78943.27 rows=2771208 width=53)
Filter: (appid = 1)
->  Seq Scan on tt0_1 t_2  (cost=0.00..87582.51 rows=3057186 width=53)
Filter: (appid = 1)
->  Seq Scan on tt0_2 t_3  (cost=0.00..71825.61 rows=2499200 width=53)
Filter: (appid = 1)
->  Seq Scan on tt0_3 t_4  (cost=0.00..48818.40 rows=1703942 width=53)
Filter: (appid = 1)
(9 rows)
postgres=# explain select * from t where appid=1 and c1=1;
QUERY PLAN
----------------------------------------------------------------
Seq Scan on tt0_0 t  (cost=0.00..87595.73 rows=99763 width=53)
Filter: ((appid = 1) AND (c1 = 1))
(2 rows)

层次不限、分区方法不限

1、如果t1也有问题, 我们可以继续对t1分区进行二次分区, 选择不同的分区个数

create unlogged table tt1(like t) PARTITION BY hash (c1);
create unlogged table tt1_0 partition of tt1 for values with (modulus 5, REMAINDER 0);
create unlogged table tt1_1 partition of tt1 for values with (modulus 5, REMAINDER 1);
create unlogged table tt1_2 partition of tt1 for values with (modulus 5, REMAINDER 2);
create unlogged table tt1_3 partition of tt1 for values with (modulus 5, REMAINDER 3);
create unlogged table tt1_4 partition of tt1 for values with (modulus 5, REMAINDER 4);
alter table t DETACH PARTITION t1;
insert into tt1 select * from t1;
alter table t attach partition tt1 FOR VALUES WITH ( MODULUS 4, REMAINDER 1 );
end;
Unlogged partitioned table "public.t"
Column |            Type             | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
appid  | integer                     |           |          |         | plain    |             |              |
c1     | integer                     |           |          |         | plain    |             |              |
c2     | integer                     |           |          |         | plain    |             |              |
info   | text                        |           |          |         | extended |             |              |
ts     | timestamp without time zone |           |          |         | plain    |             |              |
Partition key: HASH (appid)
Partitions: t2 FOR VALUES WITH (modulus 4, remainder 2),
t3 FOR VALUES WITH (modulus 4, remainder 3),
tt0 FOR VALUES WITH (modulus 4, remainder 0), PARTITIONED,
tt1 FOR VALUES WITH (modulus 4, remainder 1), PARTITIONED
postgres=# \d+ tt1
Unlogged partitioned table "public.tt1"
Column |            Type             | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
appid  | integer                     |           |          |         | plain    |             |              |
c1     | integer                     |           |          |         | plain    |             |              |
c2     | integer                     |           |          |         | plain    |             |              |
info   | text                        |           |          |         | extended |             |              |
ts     | timestamp without time zone |           |          |         | plain    |             |              |
Partition of: t FOR VALUES WITH (modulus 4, remainder 1)
Partition constraint: satisfies_hash_partition('16552'::oid, 4, 1, appid)
Partition key: HASH (c1)
Partitions: tt1_0 FOR VALUES WITH (modulus 5, remainder 0),
tt1_1 FOR VALUES WITH (modulus 5, remainder 1),
tt1_2 FOR VALUES WITH (modulus 5, remainder 2),
tt1_3 FOR VALUES WITH (modulus 5, remainder 3),
tt1_4 FOR VALUES WITH (modulus 5, remainder 4)

路由正常

QUERY PLAN
-----------------------------------------------------------------------
Append  (cost=0.00..53418.63 rows=9951 width=53)
->  Seq Scan on tt1_0 t_1  (cost=0.00..12278.30 rows=2289 width=53)
Filter: (appid = 3)
->  Seq Scan on tt1_1 t_2  (cost=0.00..8540.11 rows=1593 width=53)
Filter: (appid = 3)
->  Seq Scan on tt1_2 t_3  (cost=0.00..10130.17 rows=1886 width=53)
Filter: (appid = 3)
->  Seq Scan on tt1_3 t_4  (cost=0.00..9849.98 rows=1837 width=53)
Filter: (appid = 3)
->  Seq Scan on tt1_4 t_5  (cost=0.00..12570.31 rows=2346 width=53)
Filter: (appid = 3)
(11 rows)
postgres=# explain select * from t where appid=3 and c1=2;
QUERY PLAN
-------------------------------------------------------------
Seq Scan on tt1_1 t  (cost=0.00..9476.14 rows=103 width=53)
Filter: ((appid = 3) AND (c1 = 2))
(2 rows)

2、我们也可以对t2使用range或list分区

create unlogged table tt2(like t) PARTITION BY range (ts);
create unlogged table tt2_2021 partition of tt2 for values from ('2021-01-01') to ('2022-01-01');
create unlogged table tt2_2022 partition of tt2 for values from ('2022-01-01') to ('2023-01-01');
alter table t DETACH PARTITION t2;
insert into tt2 select * from t2;
alter table t attach partition tt2 FOR VALUES WITH ( MODULUS 4, REMAINDER 2 );
end;
Unlogged partitioned table "public.t"
Column |            Type             | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
appid  | integer                     |           |          |         | plain    |             |              |
c1     | integer                     |           |          |         | plain    |             |              |
c2     | integer                     |           |          |         | plain    |             |              |
info   | text                        |           |          |         | extended |             |              |
ts     | timestamp without time zone |           |          |         | plain    |             |              |
Partition key: HASH (appid)
Partitions: t3 FOR VALUES WITH (modulus 4, remainder 3),
tt0 FOR VALUES WITH (modulus 4, remainder 0), PARTITIONED,
tt1 FOR VALUES WITH (modulus 4, remainder 1), PARTITIONED,
tt2 FOR VALUES WITH (modulus 4, remainder 2), PARTITIONED
postgres=# \d+ tt2
Unlogged partitioned table "public.tt2"
Column |            Type             | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
appid  | integer                     |           |          |         | plain    |             |              |
c1     | integer                     |           |          |         | plain    |             |              |
c2     | integer                     |           |          |         | plain    |             |              |
info   | text                        |           |          |         | extended |             |              |
ts     | timestamp without time zone |           |          |         | plain    |             |              |
Partition of: t FOR VALUES WITH (modulus 4, remainder 2)
Partition constraint: satisfies_hash_partition('16552'::oid, 4, 2, appid)
Partition key: RANGE (ts)
Partitions: tt2_2021 FOR VALUES FROM ('2021-01-01 00:00:00') TO ('2022-01-01 00:00:00'),
tt2_2022 FOR VALUES FROM ('2022-01-01 00:00:00') TO ('2023-01-01 00:00:00')

使用PG, 支持非对称分区(深度不对称、分区方法不对称、分区数不对称), 彻底解决数据倾斜问题.

除了分, 还能合, 对于某些比较小的分区可以合起来减少分区数. 和黄帝内经的人与自然,五行平衡学说类似, 实则泻之,虚则补之.

参考


《如何修改PostgreSQL分区表分区范围- detach attach - 拆分、合并、非平衡分区表、深度不一致分区表》

《PostgreSQL hash分区表扩容、缩容(增加分区、减少分区、分区重分布、拆分区、合并分区), hash算法hash_any, 混合hash MODULUS 分区- attach , detach》

https://www.postgresql.org/docs/14/sql-createtable.html

 

上一篇:hbase_异常_04_util.FSUtils: Waiting for dfs to exit safe mode...


下一篇:记一次故障处理----主机异常关闭后mongodb二进制文件损坏