景
多维空间对象的几何运算,高效率检索实践。
例如我们在数据库中存储了多维几何对象,可以使用lower, upper的数组来表达,例如3维度对象:
CUBE
[
xmin1
ymin1
zmin1
,
xmax1
ymax1
zmax1
]
在介绍CUBE类型前,我们可以使用6个字段(xmin,xmax,ymin,ymax,zmin,zmax)来表达一个立方体。
包含和相交查询
在介绍CUBE类型前,我们如果使用6个字段来表达立方体,那么相交,包含分别如何标示呢?
包含:
(xmin1 <= xmin2 and xmax1 >= xmax2)
and
(ymin1 <= ymin2 and ymax1 >= ymax2)
and
(zmin1 <= zmin2 and zmax1 >= zmax2)
相交:
每个坐标都相交,注意任意坐标相交的方位有
-----
或
-----
或
---
或
---
或
或
---
每条边都有相交即CUBE相交,表达如下
((xmin1 >= xmin2 and xmin1 <= xmax2) or (xmax1 >= xmin2 and xmax1 <= xmax2) or (xmin1 <= xmin2 and xmax1 >= xmax2))
and
((ymin1 >= ymin2 and ymin1 <= ymax2) or (ymax1 >= ymin2 and ymax1 <= ymax2) or (ymin1 <= ymin2 and ymax1 >= ymax2))
and
((zmin1 >= zmin2 and zmin1 <= zmax2) or (zmax1 >= zmin2 and zmax1 <= zmax2) or (zmin1 <= zmin2 and zmax1 >= zmax2))
使用6个字段的空间计算性能
1、创建测试表
create table test1 (
id int primary key,
x_min int,
y_min int,
z_min int,
x_max int,
y_max int,
z_max int
);
2、写入100万记录
insert into test1 select id, x, y, z, x+1+(random()100)::int, y+1+(random()100)::int, z+1+(random()*100)::int
from (select id, (random()1000)::int x, (random()1000)::int y, (random()*1000)::int z from generate_series(1,1000000) t(id)) t ;
记录如下
postgres=# select * from test1 limit 10;
id | x_min | y_min | z_min | x_max | y_max | z_max |
---|---|---|---|---|---|---|
1 | 37 | 367 | 948 | 93 | 372 | 989 |
2 | 994 | 543 | 596 | 1031 | 613 | 617 |
3 | 399 | 616 | 897 | 444 | 624 | 959 |
4 | 911 | 624 | 67 | 1007 | 705 | 84 |
5 | 286 | 560 | 882 | 334 | 632 | 936 |
6 | 370 | 748 | 897 | 403 | 779 | 992 |
7 | 723 | 292 | 484 | 756 | 358 | 503 |
8 | 514 | 48 | 792 | 556 | 98 | 879 |
9 | 17 | 400 | 485 | 26 | 435 | 514 |
10 | 240 | 631 | 841 | 253 | 642 | 897 |
(10 rows)
3、包含查询
select * from test1 where
(x_min <= 37 and x_max >= 93)
and
(y_min <= 367 and y_max >= 372)
and
(z_min <= 948 and z_max >= 989);
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test1 where
(x_min <= 37 and x_max >= 93)
and
(y_min <= 367 and y_max >= 372)
and
(z_min <= 948 and z_max >= 989);
QUERY PLAN
Seq Scan on public.test1 (cost=0.00..13220.05 rows=539 width=28) (actual time=0.024..79.397 rows=15 loops=1)
Output: id, x_min, y_min, z_min, x_max, y_max, z_max
Filter: ((test1.x_min <= 37) AND (test1.x_max >= 93) AND (test1.y_min <= 367) AND (test1.y_max >= 372) AND (test1.z_min <= 948) AND (test1.z_max >= 989))
Rows Removed by Filter: 999985
Buffers: shared hit=1835
Planning Time: 0.103 ms
Execution Time: 79.421 ms
(7 rows)
Time: 79.947 ms
id | x_min | y_min | z_min | x_max | y_max | z_max |
---|---|---|---|---|---|---|
1 | 37 | 367 | 948 | 93 | 372 | 989 |
104882 | 17 | 327 | 924 | 111 | 389 | 1012 |
178185 | 31 | 315 | 897 | 104 | 380 | 990 |
228661 | 9 | 363 | 934 | 101 | 394 | 1001 |
275030 | 21 | 334 | 912 | 102 | 379 | 1012 |
405290 | 10 | 356 | 911 | 102 | 435 | 996 |
586417 | 35 | 362 | 930 | 128 | 454 | 1016 |
594367 | 23 | 312 | 943 | 112 | 395 | 1017 |
622753 | 11 | 365 | 916 | 93 | 427 | 995 |
645719 | 32 | 309 | 918 | 94 | 377 | 1015 |
757900 | 34 | 339 | 905 | 98 | 430 | 998 |
784203 | 36 | 344 | 945 | 95 | 390 | 1035 |
824046 | 23 | 367 | 946 | 115 | 423 | 1021 |
878257 | 37 | 339 | 948 | 123 | 398 | 1033 |
914020 | 26 | 358 | 918 | 109 | 379 | 1019 |
(15 rows)
Time: 80.269 ms
4、相交查询
select * from test1 where
((x_min >= 37 and x_min <= 93) or (x_max >= 37 and x_max <= 93) or (x_min <= 37 and x_max >= 93))
and
((y_min >= 367 and y_min <= 372) or (y_max >= 367 and y_max <= 372) or (y_min <= 367 and y_max >= 372))
and
((z_min >= 948 and z_min <= 989) or (z_max >= 948 and z_max <= 989) or (z_min <= 948 and z_max >= 989))
;
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test1 where
((x_min >= 37 and x_min <= 93) or (x_max >= 37 and x_max <= 93) or (x_min <= 37 and x_max >= 93))
and
((y_min >= 367 and y_min <= 372) or (y_max >= 367 and y_max <= 372) or (y_min <= 367 and y_max >= 372))
and
((z_min >= 948 and z_min <= 989) or (z_max >= 948 and z_max <= 989) or (z_min <= 948 and z_max >= 989))
;
QUERY PLAN
Seq Scan on public.test1 (cost=0.00..39229.87 rows=4364 width=28) (actual time=0.026..119.539 rows=483 loops=1)
Output: id, x_min, y_min, z_min, x_max, y_max, z_max
Filter: ((((test1.x_min >= 37) AND (test1.x_min <= 93)) OR ((test1.x_max >= 37) AND (test1.x_max <= 93)) OR ((test1.x_min <= 37) AND (test1.x_max >= 93))) AND (((test1.y_min >= 367) AND (test1.y_min <= 372)) OR ((test1.y_max >= 367) AND (test1.y_max <= 372)) OR ((test1.y_min <= 367) AND (test1.y_max >= 372))) AND (((test1.z_min >= 948) AND (test1.z_min <= 989)) OR ((test1.z_max >= 948) AND (test1.z_max <= 989)) OR ((test1.z_min <= 948) AND (test1.z_max >= 989))))
Rows Removed by Filter: 999517
Buffers: shared hit=1835
Planning Time: 0.135 ms
Execution Time: 119.621 ms
(7 rows)
Time: 120.283 ms
cube 类型
cube的多维体表达方法如下
It does not matter which order the opposite corners of a cube are entered in.
The cube functions automatically swap values if needed to create a uniform “lower left — upper right” internal representation.
When the corners coincide, cube stores only one corner along with an “is point” flag to avoid wasting space.
1、创建 cube 插件
create extension cube;
2、创建测试表
create table test2 (
id int primary key,
cb cube
);
3、将数据导入test2 cube表
insert into test2 select id, cube(array[x_min,y_min,z_min], array[x_max,y_max,z_max]) from test1;
4、给CUBE类型创建gist索引
create index idx_test2_cb on test2 using gist(cb);
5、包含查询性能
explain (analyze,verbose,timing,costs,buffers) select * from test2 where cb @> cube '[(37,367,948), (93,372,989)]';
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test2 where cb @> cube '[(37,367,948), (93,372,989)]';
QUERY PLAN
Index Scan using idx_test2_cb on public.test2 (cost=0.25..20.65 rows=1000 width=60) (actual time=0.154..0.247 rows=15 loops=1)
Output: id, cb
Index Cond: (test2.cb @> '(37, 367, 948),(93, 372, 989)'::cube)
Buffers: shared hit=26
Planning Time: 0.196 ms
Execution Time: 0.269 ms
(6 rows)
postgres=# timing
Timing is on.
postgres=# select * from test2 where cb @> cube '[(37,367,948), (93,372,989)]';
id | cb |
---|---|
1 | (37, 367, 948),(93, 372, 989) |
228661 | (9, 363, 934),(101, 394, 1001) |
586417 | (35, 362, 930),(128, 454, 1016) |
824046 | (23, 367, 946),(115, 423, 1021) |
914020 | (26, 358, 918),(109, 379, 1019) |
104882 | (17, 327, 924),(111, 389, 1012) |
594367 | (23, 312, 943),(112, 395, 1017) |
645719 | (32, 309, 918),(94, 377, 1015) |
784203 | (36, 344, 945),(95, 390, 1035) |
275030 | (21, 334, 912),(102, 379, 1012) |
757900 | (34, 339, 905),(98, 430, 998) |
878257 | (37, 339, 948),(123, 398, 1033) |
405290 | (10, 356, 911),(102, 435, 996) |
622753 | (11, 365, 916),(93, 427, 995) |
178185 | (31, 315, 897),(104, 380, 990) |
(15 rows)
Time: 0.685 ms
6、相交查询性能
select * from test2 where cb && cube '[(37,367,948), (93,372,989)]';
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test2 where cb && cube '[(37,367,948), (93,372,989)]';
QUERY PLAN
Index Scan using idx_test2_cb on public.test2 (cost=0.25..76.66 rows=5000 width=60) (actual time=0.086..0.943 rows=483 loops=1)
Output: id, cb
Index Cond: (test2.cb && '(37, 367, 948),(93, 372, 989)'::cube)
Buffers: shared hit=505
Planning Time: 0.085 ms
Execution Time: 1.011 ms
(6 rows)
Time: 1.506 ms
7、除此以外,CUBE还支持很多的几何计算操作符,也可以做包含点的查询。
https://www.postgresql.org/docs/devel/static/cube.html
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test2 where cb @> cube '(37,367,948)';
QUERY PLAN
Index Scan using idx_test2_cb on public.test2 (cost=0.25..20.65 rows=1000 width=60) (actual time=0.153..0.420 rows=107 loops=1)
Output: id, cb
Index Cond: (test2.cb @> '(37, 367, 948)'::cube)
Buffers: shared hit=121
Planning Time: 0.077 ms
Execution Time: 0.448 ms
(6 rows)
Time: 0.893 ms
优化
如果SQL请求返回的记录数非常多,建议流式返回,同时建议根据BLOCK设备的随机IO能力设置正确的random_page_cost参数。
《PostgreSQL 10 参数模板 - 珍藏级》
流式返回例子
postgres=# begin;
BEGIN
postgres=# declare cur1 cursor for select * from test2 where cb && cube '[(37,367,948), (93,372,989)]';
DECLARE CURSOR
postgres=# timing
Timing is on.
postgres=# fetch 10 from cur1;
id | cb |
---|---|
41724 | (65, 363, 939),(87, 425, 980) |
115087 | (72, 362, 977),(97, 454, 1005) |
235266 | (74, 362, 958),(133, 457, 994) |
489571 | (51, 362, 970),(101, 393, 989) |
655616 | (77, 359, 932),(79, 455, 1026) |
786710 | (73, 358, 942),(160, 374, 960) |
1 | (37, 367, 948),(93, 372, 989) |
6441 | (48, 368, 949),(88, 426, 964) |
59620 | (29, 364, 939),(60, 452, 997) |
153554 | (22, 367, 959),(75, 374, 997) |
(10 rows)
Time: 0.297 ms
postgres=# end;
COMMIT
Time: 0.138 ms
如果是SSD盘,建议random_page_cost设置为1.1-1.3
alter system set random_page_cost=1.3;
select pg_reload_conf();
小结
使用cube插件,我们在对多维几何空间对象进行查询时,可以使用GIST索引,性能非常棒。
在100万空间对象的情况下,性能提升了100倍。
PS, test1表(分字段表达)即使使用BTREE索引,效果也不好,因为多字段的范围检索,初级索引是要全扫描的,以前有一个智能DNS的例子类似,使用GIST提升了20多倍性能。
《PostgreSQL 黑科技 range 类型及 gist index 20x+ speedup than Mysql index combine query》
使用CUBE插件,我们还可以用来计算多维对象的向量相似性,按向量相似性排序。参考末尾连接。
参考
《PostgreSQL 相似人群圈选,人群扩选,向量相似 使用实践》
《PostgreSQL 黑科技 range 类型及 gist index 20x+ speedup than Mysql index combine query》
《通过空间思想理解GiST索引的构造》
https://www.postgresql.org/docs/devel/static/cube.html
转自阿里云德哥