PostgreSQL 多维空间几何对象 相交、包含 高效率检索实践 - cube


多维空间对象的几何运算,高效率检索实践。

例如我们在数据库中存储了多维几何对象,可以使用lower, upper的数组来表达,例如3维度对象:

CUBE
[
xmin1
ymin1
zmin1
,
xmax1
ymax1
zmax1
]
在介绍CUBE类型前,我们可以使用6个字段(xmin,xmax,ymin,ymax,zmin,zmax)来表达一个立方体。

包含和相交查询
在介绍CUBE类型前,我们如果使用6个字段来表达立方体,那么相交,包含分别如何标示呢?

包含:
(xmin1 <= xmin2 and xmax1 >= xmax2)
and
(ymin1 <= ymin2 and ymax1 >= ymax2)
and
(zmin1 <= zmin2 and zmax1 >= zmax2)
相交:
每个坐标都相交,注意任意坐标相交的方位有


-----

-----


---


---



---

每条边都有相交即CUBE相交,表达如下

((xmin1 >= xmin2 and xmin1 <= xmax2) or (xmax1 >= xmin2 and xmax1 <= xmax2) or (xmin1 <= xmin2 and xmax1 >= xmax2))
and
((ymin1 >= ymin2 and ymin1 <= ymax2) or (ymax1 >= ymin2 and ymax1 <= ymax2) or (ymin1 <= ymin2 and ymax1 >= ymax2))
and
((zmin1 >= zmin2 and zmin1 <= zmax2) or (zmax1 >= zmin2 and zmax1 <= zmax2) or (zmin1 <= zmin2 and zmax1 >= zmax2))
使用6个字段的空间计算性能
1、创建测试表

create table test1 (
id int primary key,
x_min int,
y_min int,
z_min int,
x_max int,
y_max int,
z_max int
);
2、写入100万记录

insert into test1 select id, x, y, z, x+1+(random()100)::int, y+1+(random()100)::int, z+1+(random()*100)::int
from (select id, (random()1000)::int x, (random()1000)::int y, (random()*1000)::int z from generate_series(1,1000000) t(id)) t ;
记录如下

postgres=# select * from test1 limit 10;

id x_min y_min z_min x_max y_max z_max
1 37 367 948 93 372 989
2 994 543 596 1031 613 617
3 399 616 897 444 624 959
4 911 624 67 1007 705 84
5 286 560 882 334 632 936
6 370 748 897 403 779 992
7 723 292 484 756 358 503
8 514 48 792 556 98 879
9 17 400 485 26 435 514
10 240 631 841 253 642 897

(10 rows)
3、包含查询

select * from test1 where
(x_min <= 37 and x_max >= 93)
and
(y_min <= 367 and y_max >= 372)
and
(z_min <= 948 and z_max >= 989);

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test1 where
(x_min <= 37 and x_max >= 93)
and
(y_min <= 367 and y_max >= 372)
and
(z_min <= 948 and z_max >= 989);

                                                                     QUERY PLAN                                                                            

Seq Scan on public.test1 (cost=0.00..13220.05 rows=539 width=28) (actual time=0.024..79.397 rows=15 loops=1)
Output: id, x_min, y_min, z_min, x_max, y_max, z_max
Filter: ((test1.x_min <= 37) AND (test1.x_max >= 93) AND (test1.y_min <= 367) AND (test1.y_max >= 372) AND (test1.z_min <= 948) AND (test1.z_max >= 989))
Rows Removed by Filter: 999985
Buffers: shared hit=1835
Planning Time: 0.103 ms
Execution Time: 79.421 ms
(7 rows)

Time: 79.947 ms

id x_min y_min z_min x_max y_max z_max
1 37 367 948 93 372 989
104882 17 327 924 111 389 1012
178185 31 315 897 104 380 990
228661 9 363 934 101 394 1001
275030 21 334 912 102 379 1012
405290 10 356 911 102 435 996
586417 35 362 930 128 454 1016
594367 23 312 943 112 395 1017
622753 11 365 916 93 427 995
645719 32 309 918 94 377 1015
757900 34 339 905 98 430 998
784203 36 344 945 95 390 1035
824046 23 367 946 115 423 1021
878257 37 339 948 123 398 1033
914020 26 358 918 109 379 1019

(15 rows)

Time: 80.269 ms
4、相交查询

select * from test1 where
((x_min >= 37 and x_min <= 93) or (x_max >= 37 and x_max <= 93) or (x_min <= 37 and x_max >= 93))
and
((y_min >= 367 and y_min <= 372) or (y_max >= 367 and y_max <= 372) or (y_min <= 367 and y_max >= 372))
and
((z_min >= 948 and z_min <= 989) or (z_max >= 948 and z_max <= 989) or (z_min <= 948 and z_max >= 989))
;

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test1 where
((x_min >= 37 and x_min <= 93) or (x_max >= 37 and x_max <= 93) or (x_min <= 37 and x_max >= 93))
and
((y_min >= 367 and y_min <= 372) or (y_max >= 367 and y_max <= 372) or (y_min <= 367 and y_max >= 372))
and
((z_min >= 948 and z_min <= 989) or (z_max >= 948 and z_max <= 989) or (z_min <= 948 and z_max >= 989))
;

                   QUERY PLAN                                                                                                                          

Seq Scan on public.test1 (cost=0.00..39229.87 rows=4364 width=28) (actual time=0.026..119.539 rows=483 loops=1)
Output: id, x_min, y_min, z_min, x_max, y_max, z_max
Filter: ((((test1.x_min >= 37) AND (test1.x_min <= 93)) OR ((test1.x_max >= 37) AND (test1.x_max <= 93)) OR ((test1.x_min <= 37) AND (test1.x_max >= 93))) AND (((test1.y_min >= 367) AND (test1.y_min <= 372)) OR ((test1.y_max >= 367) AND (test1.y_max <= 372)) OR ((test1.y_min <= 367) AND (test1.y_max >= 372))) AND (((test1.z_min >= 948) AND (test1.z_min <= 989)) OR ((test1.z_max >= 948) AND (test1.z_max <= 989)) OR ((test1.z_min <= 948) AND (test1.z_max >= 989))))
Rows Removed by Filter: 999517
Buffers: shared hit=1835
Planning Time: 0.135 ms
Execution Time: 119.621 ms
(7 rows)

Time: 120.283 ms
cube 类型
cube的多维体表达方法如下

It does not matter which order the opposite corners of a cube are entered in.

The cube functions automatically swap values if needed to create a uniform “lower left — upper right” internal representation.

When the corners coincide, cube stores only one corner along with an “is point” flag to avoid wasting space.

1、创建 cube 插件

create extension cube;
2、创建测试表

create table test2 (
id int primary key,
cb cube
);
3、将数据导入test2 cube表

insert into test2 select id, cube(array[x_min,y_min,z_min], array[x_max,y_max,z_max]) from test1;
4、给CUBE类型创建gist索引

create index idx_test2_cb on test2 using gist(cb);
5、包含查询性能

explain (analyze,verbose,timing,costs,buffers) select * from test2 where cb @> cube '[(37,367,948), (93,372,989)]';

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test2 where cb @> cube '[(37,367,948), (93,372,989)]';

                                                       QUERY PLAN                                                              

Index Scan using idx_test2_cb on public.test2 (cost=0.25..20.65 rows=1000 width=60) (actual time=0.154..0.247 rows=15 loops=1)
Output: id, cb
Index Cond: (test2.cb @> '(37, 367, 948),(93, 372, 989)'::cube)
Buffers: shared hit=26
Planning Time: 0.196 ms
Execution Time: 0.269 ms
(6 rows)

postgres=# timing
Timing is on.
postgres=# select * from test2 where cb @> cube '[(37,367,948), (93,372,989)]';

id cb
1 (37, 367, 948),(93, 372, 989)
228661 (9, 363, 934),(101, 394, 1001)
586417 (35, 362, 930),(128, 454, 1016)
824046 (23, 367, 946),(115, 423, 1021)
914020 (26, 358, 918),(109, 379, 1019)
104882 (17, 327, 924),(111, 389, 1012)
594367 (23, 312, 943),(112, 395, 1017)
645719 (32, 309, 918),(94, 377, 1015)
784203 (36, 344, 945),(95, 390, 1035)
275030 (21, 334, 912),(102, 379, 1012)
757900 (34, 339, 905),(98, 430, 998)
878257 (37, 339, 948),(123, 398, 1033)
405290 (10, 356, 911),(102, 435, 996)
622753 (11, 365, 916),(93, 427, 995)
178185 (31, 315, 897),(104, 380, 990)

(15 rows)

Time: 0.685 ms
6、相交查询性能

select * from test2 where cb && cube '[(37,367,948), (93,372,989)]';

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test2 where cb && cube '[(37,367,948), (93,372,989)]';

                                                        QUERY PLAN                                                              

Index Scan using idx_test2_cb on public.test2 (cost=0.25..76.66 rows=5000 width=60) (actual time=0.086..0.943 rows=483 loops=1)
Output: id, cb
Index Cond: (test2.cb && '(37, 367, 948),(93, 372, 989)'::cube)
Buffers: shared hit=505
Planning Time: 0.085 ms
Execution Time: 1.011 ms
(6 rows)

Time: 1.506 ms
7、除此以外,CUBE还支持很多的几何计算操作符,也可以做包含点的查询。

https://www.postgresql.org/docs/devel/static/cube.html

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test2 where cb @> cube '(37,367,948)';

                                                        QUERY PLAN                                                            

Index Scan using idx_test2_cb on public.test2 (cost=0.25..20.65 rows=1000 width=60) (actual time=0.153..0.420 rows=107 loops=1)
Output: id, cb
Index Cond: (test2.cb @> '(37, 367, 948)'::cube)
Buffers: shared hit=121
Planning Time: 0.077 ms
Execution Time: 0.448 ms
(6 rows)

Time: 0.893 ms
优化
如果SQL请求返回的记录数非常多,建议流式返回,同时建议根据BLOCK设备的随机IO能力设置正确的random_page_cost参数。

《PostgreSQL 10 参数模板 - 珍藏级》

流式返回例子

postgres=# begin;
BEGIN
postgres=# declare cur1 cursor for select * from test2 where cb && cube '[(37,367,948), (93,372,989)]';
DECLARE CURSOR
postgres=# timing
Timing is on.
postgres=# fetch 10 from cur1;

id cb
41724 (65, 363, 939),(87, 425, 980)
115087 (72, 362, 977),(97, 454, 1005)
235266 (74, 362, 958),(133, 457, 994)
489571 (51, 362, 970),(101, 393, 989)
655616 (77, 359, 932),(79, 455, 1026)
786710 (73, 358, 942),(160, 374, 960)
1 (37, 367, 948),(93, 372, 989)
6441 (48, 368, 949),(88, 426, 964)
59620 (29, 364, 939),(60, 452, 997)
153554 (22, 367, 959),(75, 374, 997)

(10 rows)

Time: 0.297 ms
postgres=# end;
COMMIT
Time: 0.138 ms
如果是SSD盘,建议random_page_cost设置为1.1-1.3

alter system set random_page_cost=1.3;
select pg_reload_conf();
小结
使用cube插件,我们在对多维几何空间对象进行查询时,可以使用GIST索引,性能非常棒。

在100万空间对象的情况下,性能提升了100倍。

PS, test1表(分字段表达)即使使用BTREE索引,效果也不好,因为多字段的范围检索,初级索引是要全扫描的,以前有一个智能DNS的例子类似,使用GIST提升了20多倍性能。

《PostgreSQL 黑科技 range 类型及 gist index 20x+ speedup than Mysql index combine query》

使用CUBE插件,我们还可以用来计算多维对象的向量相似性,按向量相似性排序。参考末尾连接。

参考
《PostgreSQL 相似人群圈选,人群扩选,向量相似 使用实践》

《PostgreSQL 黑科技 range 类型及 gist index 20x+ speedup than Mysql index combine query》

《通过空间思想理解GiST索引的构造》

https://www.postgresql.org/docs/devel/static/cube.html
转自阿里云德哥

上一篇:PostgreSQL merge insert(upsert/insert into on conflict) 如何区分数据是INSERT还是UPDATE


下一篇:PostgreSQL 分区表、继承表 记录去重方法