PostGIS 空间索引-Building Indexes 创建空间索引

Building Indexes 创建索引

Indexes are what make using a spatial database for large data sets possible. Without indexing, any search for a feature would require a “sequential scan” of every record in the database. Indexing speeds up searching by organizing the data into a search tree which can be quickly traversed to find a particular record. PostgreSQL supports three kinds of indexes by default: B-Tree indexes, SP-GiST and GiST indexes.

索引使得使用空间数据库来处理大型数据集成为可能。 在没有索引的情况下,对某个特性的任何搜索都需要对数据库中的每条记录进行“顺序扫描”。 索引通过将数据组织到搜索树中来加快搜索速度,搜索树可以快速遍历以找到特定的记录。 PostgreSQL默认支持三种索引:B-Tree索引、SP-GiST索引和GiST索引。

  • B-Trees are used for data which can be sorted along one axis; for example, numbers, letters, dates. Spatial data can be sorted along a space-filling curve, Z-order curve or Hilbert curve. This representation however does not allow speeding up common operations.

b树用于可沿一个轴排序的数据; 例如,数字,字母,日期。 空间数据可以沿着空间填充曲线、z顺序曲线或希尔伯特曲线进行排序。 然而,这种表示方式不允许加速常见操作。

  • GiST (Generalized Search Trees) indexes break up data into “things to one side”, “things which overlap”, “things which are inside” and can be used on a wide range of data-types, including GIS data. PostGIS uses an R-Tree index implemented on top of GiST to index GIS data.

GiST(广义搜索树)索引将数据分解为“一侧的数据”、“重叠的数据”、“内部的数据”,可以用于广泛的数据类型,包括GIS数据。 PostGIS使用在GiST上实现的R-Tree索引来索引GIS数据。

1.1 GiST Indexes GIST索引

GiST stands for “Generalized Search Tree” and is a generic form of indexing. In addition to GIS indexing, GiST is used to speed up searches on all kinds of irregular data structures (integer arrays, spectral data, etc) which are not amenable to normal B-Tree indexing.

GiST代表“广义搜索树”,是一种通用的索引形式。 除了GIS索引,GiST还用于加快对各种不规则数据结构(整数数组、光谱数据等)的搜索,这些数据结构不适合常规的B-Tree索引。

Once a GIS data table exceeds a few thousand rows, you will want to build an index to speed up spatial searches of the data (unless all your searches are based on attributes, in which case you’ll want to build a normal index on the attribute fields).

一旦一个GIS数据表超过了几千行,您将希望构建一个索引来加速数据的空间搜索(除非您的所有搜索都基于属性,在这种情况下,您将希望在属性字段上构建一个普通索引)。

The syntax for building a GiST index on a “geometry” column is as follows:

在"geometry"列上建立GiST索引的语法如下:

CREATE INDEX [indexname] ON [tablename] USING GIST ( [geometryfield] );

The above syntax will always build a 2D-index. To get the an n-dimensional index for the geometry type, you can create one using this syntax:

上面的语法将始终构建一个2d索引。 要获取几何类型的n维索引,可以使用以下语法创建一个索引:

CREATE INDEX [indexname] ON [tablename] USING GIST ([geometryfield] gist_geometry_ops_nd);

Building a spatial index is a computationally intensive exercise. It also blocks write access to your table for the time it creates,so on a production system you may want to do in in a slower CONCURRENTLY-aware way:

构建空间索引是一个计算密集型的练习。 它还会在创建表的时候阻塞对表的写访问,所以在生产系统上,你可能想要以一种更慢的并发感知方式来做:

CREATE INDEX CONCURRENTLY [indexname] ON [tablename] USING GIST ( [geometryfield] );

After building an index, it is sometimes helpful to force PostgreSQL to collect table statistics, which are used to optimize query plans:

在建立索引之后,强制PostgreSQL收集表统计信息有时是有帮助的,这些数据被用来优化查询计划:

VACUUM ANALYZE [table_name] [(column_name)];

1.2 BRIN Indexes BRIN 索引

BRIN stands for “Block Range Index” and is a generic form of indexing that has been introduced in PostgreSQL 9.5. BRIN is a lossy kind of index, and its main usage is to provide a compromise for both read and write performance. Its primary goal is to handle very large tables for which some of the columns have some natural correlation with their physical location within the table. In addition to GIS indexing, BRIN is used to speed up searches on various kinds of regular or irregular data structures (integer, arrays etc).

BRIN代表“块范围索引”,是一种通用的索引形式,在PostgreSQL 9.5中引入。 BRIN是一种有损索引,它的主要用途是提供读写性能的折衷。 它的主要目标是处理非常大的表,其中一些列与它们在表中的物理位置有某种自然的相关性。 除了GIS索引,BRIN还被用来加快对各种规则或不规则数据结构(整数、数组等)的搜索速度。

Once a GIS data table exceeds a few thousand rows, you will want to build an index to speed up spatial searches of the data (unless all your searches are based on attributes, in which case you’ll want to build a normal index on the attribute fields). GiST indexes are really performant as long as their size doesn’t exceed the amount of RAM available for the database, and as long as you can afford the storage size, and the penalty in write workload. Otherwise, BRIN index can be considered as an alternative.

一旦一个GIS数据表超过了几千行,您将希望构建一个索引来加速数据的空间搜索(除非您的所有搜索都基于属性,在这种情况下,您将希望在属性字段上构建一个普通索引)。 只要它们的大小不超过数据库可用的RAM,只要您能够承受存储大小和写工作负载的代价,GiST索引的性能就非常好。 否则,可以考虑使用BRIN索引作为替代。

The idea of a BRIN index is to store only the bouding box englobing all the geometries contained in all the rows in a set of table blocks, called a range. Obviously, this indexing method will only be efficient if the data is physically ordered in a way where the resulting bouding boxes for block ranges will be mutually exclusive. The resulting index will be really small, but will be less efficient than a GiST index in many cases.

BRIN索引的想法是只存储一个边界框,这个边界框包含了一组表块中所有行的所有几何图形,称为范围。 显然,只有当数据的物理排序方式是块范围的结果绑定框互斥时,此索引方法才有效。 生成的索引将非常小,但在很多情况下会比GiST索引效率低。

Building a BRIN index is way less intensive than building a GiST index. It’s quite common to build a BRIN index in more than ten time less than a GiST index would have required. As a BRIN index only store one bouding box for one to many table blocks,it’s pretty common to consume up to a thousand time less disk space for this kind of indexes.

建立一个BRIN索引比建立一个GiST索引要少得多。 通常情况下,构建BRIN索引所需的时间比构建GiST索引所需的时间少十倍以上。 由于BRIN索引只存储一个到多个表块的绑定框,因此这种索引所占用的磁盘空间通常要少1000倍。

You can choose the number of blocks to summarize in a range. If you decrease this number, the index will be bigger but will probably help to get better performance.

您可以选择要在范围内进行汇总的块的数量。 如果减少这个数字,索引会变大,但可能有助于获得更好的性能。

The syntax for building a BRIN index on a “geometry” column is as follows:

在“geometry”列上建立BRIN索引的语法如下:

CREATE INDEX [indexname] ON [tablename] USING BRIN ( [geometryfield] );

The above syntax will always build a 2D-index. To get a 3D-dimensional index, you can create one using this syntax

上面的语法将始终构建一个2d索引。 要获得三维索引,可以使用此语法创建一个索引

CREATE INDEX [indexname] ON [tablename] USING BRIN ([geometryfield] 

brin_geometry_inclusion_ops_3d);

You can also get a 4D-dimensional index using the 4D operator class

您还可以使用4D操作符类获得4D维度索引

CREATE INDEX [indexname] ON [tablename] USING BRIN ([geometryfield] 
brin_geometry_inclusion_ops_4d);

These above syntaxes will use the default number or block in a range, which is 128. To specify the number of blocks you want to summarise in a range, you can create one using this syntax

以上语法将在一个范围内使用默认的数字或块,即128。 要指定要在一个范围内汇总的块的数量,可以使用此语法创建一个

CREATE INDEX [indexname] ON [tablename] USING BRIN ( [geometryfield] ) WITH (pages_per_range = [number]);

Also, keep in mind that a BRIN index will only store one index value for a large number of rows. If your table stores geometries with a mixed number of dimensions, it’s likely that the resulting index will have poor performance. You can avoid this drop of performance by choosing the operator class whith the least number of dimensions of the stored geometries

Also the “geography” datatype is supported for BRIN indexing. The syntax for building a BRIN index on a “geography” column is as follows:

另外,请记住,对于大量的行,BRIN索引只存储一个索引值。 如果表存储的几何图形的维数是混合的,那么得到的索引很可能性能很差。 您可以通过选择存储几何图形维数最少的操作符类来避免这种性能下降

BRIN索引也支持“geography”数据类型。 在“geography”列上建立BRIN索引的语法如下:

CREATE INDEX [indexname] ON [tablename] USING BRIN ( [geographyfield] );

The above syntax will always build a 2D-index for geospatial objects on the spheroid.

上面的语法总是为球体上的地理空间对象建立一个2d索引。

Currently, just the “inclusion support” is considered here, meaning that just &&, ~ and @ operators can be used for the 2D cases (both for “geometry” and for “geography”), and just the &&& operator can be used for the 3D geometries. There is no support for kNN searches at the moment.

目前,这里只考虑“inclusion support”,这意味着只有&&、~和@操作符可以用于2D情况(对于“geometry”和“geography”),而只有&&&操作符可以用于3D几何。 目前不支持kNN搜索。

1.3 SP-GiST Indexes SP-GiST 索引

SP-GiST stands for “Space-Partitioned Generalized Search Tree” and is a generic form of indexing that supports partitioned search trees, such as quad-trees, k-d trees, and radix trees (tries). The common feature of these data structures is that they repeatedly divide the search space into partitions that need not be of equal size. In addition to GIS indexing, SP-GiST is used to speed up searches on many kinds of data, such as phone routing, ip routing, substring search, etc.

SP-GiST代表“空间分块的广义搜索树”,是一种支持分块搜索树的通用索引形式,如四叉树、k-d树和基数树(尝试)。 这些数据结构的共同特征是,它们重复地将搜索空间划分为大小不等的分区。 除了GIS索引外,SP-GiST还用于加快对多种数据的搜索,如电话路由、ip路由、子字符串搜索等。

As it is the case for GiST indexes, SP-GiST indexes are lossy, in the sense that they store the bounding box englobing the spatial objects. SP-GiST indexes can be considered as an alternative to GiST indexes. The performance tests reveal that SP-GiST indexes are especially beneficial when there are many overlapping objects, that is, with so-called “spaghetti data”.

就像GiST索引的情况一样,SP-GiST索引也是有损的,因为它们存储了包含空间对象的边界框。 SP-GiST索引可以作为GiST索引的替代。 性能测试显示,SP-GiST索引在存在许多重叠对象(即所谓的“意大利面条数据”)时特别有用。

Once a GIS data table exceeds a few thousand rows, an SP-GiST index may be used to speed up spatial searches of the data. The syntax for building an SP-GiST index on a “geometry” column is as follows:

一旦一个GIS数据表超过几千行,可以使用SP-GiST索引来加速数据的空间搜索。 在"geometry"列上建立SP-GiST索引的语法如下:

CREATE INDEX [indexname] ON [tablename] USING SPGIST ( [geometryfield] );

The above syntax will build a 2-dimensional index. A 3-dimensional index for the geometry type can be created using the 3D operator class:

上面的语法将构建一个二维索引。 可以使用3D操作符类创建几何类型的三维索引:

CREATE INDEX [indexname] ON [tablename] USING SPGIST ([geometryfield] spgist_geometry_ops_3d);

Building a spatial index is a computationally intensive operation. It also blocks write access to your table for the time it creates,so on a production system you may want to do in in a slower CONCURRENTLY-aware way:

建立空间索引是一个计算密集型的操作。 它还会在创建表的时候阻塞对表的写访问,所以在生产系统上,你可能想要以一种更慢的并发感知方式来做:

CREATE INDEX CONCURRENTLY [indexname] ON [tablename] USING SPGIST ( [geometryfield] ); 

After building an index, it is sometimes helpful to force PostgreSQL to collect table statistics, which are used to optimize query plans:

在建立索引之后,强制PostgreSQL收集表统计信息有时是有帮助的,这些数据被用来优化查询计划:

VACUUM ANALYZE [table_name] [(column_name)];

An SP-GiST index can accelerate queries involving the following operators:

SP-GiST索引可以加速涉及以下操作符的查询:

  • <<, &<, &>, >>, <<|, &<|, |&>, |>>, &&, @>, <@, and ~=, for 2-dimensional indexes
  • <<, &<, &>, >>, <<|, &<|, |&>, |>>, &&, @>, <@, and ~=, 这是二维索引
  • &/&, ~==, @>>, and <<@, for 3-dimensional indexes.
  • &/&, ~==, @>>, and <<@, 这是三维索引

There is no support for kNN searches at the moment.

目前不支持kNN搜索。

1.4 Using Indexes 使用索引

Ordinarily, indexes invisibly speed up data access: once the index is built, the query planner transparently decides when to use index information to speed up a query plan. Unfortunately, the PostgreSQL query planner sometimes does not optimize the use of GiST indexes well, so sometimes searches which should use a spatial index instead may perform a sequential scan of the whole table.

通常,索引无形地加速了数据访问:一旦建立了索引,查询规划器就会透明地决定何时使用索引信息来加速查询计划。 不幸的是,PostgreSQL查询规划器有时不能很好地优化GiST索引的使用,所以有时应该使用空间索引的搜索可能会执行整个表的顺序扫描。

If you find your spatial indexes are not being used (or your attribute indexes, for that matter) there are a couple things you can do:

如果你发现你的空间索引没有被使用(或者你的属性索引没有被使用),你可以做一些事情:

  • Firstly, read query plan and check your query actually tries to compute the thing you need. A runaway JOIN condition, either forgotten or to the wrong table, can unexpectedly bring you all of your table multiple times. To get query plan, add EXPLAIN keyword in front of your query.
  • Second, make sure statistics are gathered about the number and distributions of values in a table, to provide the query planner with better information to make decisions around index usage. VACUUM ANALYZE will compute both.You should regularly vacuum your databases anyways - many PostgreSQL DBAs have VACUUM run as an off-peak cron job on a regular basis.
  • If vacuuming does not help, you can temporarily force the planner to use the index information by using the set enable_seqscan to off; command. This way you can check whether planner is at all capable to generate an index accelerated query plan for your query. You should only use this command only for debug: generally speaking, the planner knows better than you do about when to use indexes. Once you have run your query, do not forget to set ENABLE_SEQSCAN back on, so that other queries will utilize the planner as normal.
  • If set enable_seqscan to off; helps your query to run, your Postgres is likely not tuned for your hardware. If you find the planner wrong about the cost of sequential vs index scans try reducing the value of random_page_cost in postgresql.conf or using set random_page_cost to 1.1;. Default value for the parameter is 4, try setting it to 1 (on SSD) or 2 (on fast magnetic disks). Decreasing the value makes the planner more inclined of using Index scans.
  • If set enable_seqscan to off; does not help your query, it may happen you use a construction Postgres is not yet able to untangle. A subquery with inline select is one example - you need to rewrite it to the form planner can optimize, say, a LATERAL JOIN.

  • 首先,阅读查询计划,并检查您的查询实际尝试计算您需要的东西。 失控的JOIN条件(忘记或指向错误的表)可能会意外地多次返回所有表。 要获取查询计划,请在查询前添加EXPLAIN关键字。
  • 其次,确保收集了关于表中值的数量和分布的统计信息,以便为查询规划器提供更好的信息,以便围绕索引使用做出决策。 VACUUM ANALYZE将计算两者。 无论如何,您都应该定期对数据库进行真空处理——许多PostgreSQL dba都定期将vacuum作为非高峰cron作业运行。
  • 如果清理不起作用,可以使用set enable_seqscan关闭,临时强制规划器使用索引信息; 命令。 通过这种方式,您可以检查planner是否能够为您的查询生成索引加速查询计划。 您应该仅在调试时使用此命令:一般来说,规划器比您更清楚何时使用索引。 一旦您运行了查询,不要忘记重新设置ENABLE_SEQSCAN,以便其他查询将像往常一样使用规划器。
  • 如果将enable_seqscan设置为off; 帮助您的查询运行,您的Postgres可能没有针对您的硬件进行调优。 如果你发现规划器错误的顺序与索引扫描的成本,尝试减少在postgresql.conf中的random_page_cost值或使用设置random_page_cost为1.1; 该参数默认值为4,请尝试将其设置为1 (SSD硬盘)或2(快速硬盘)。 降低该值会使规划器更倾向于使用Index扫描。
  • 如果将enable_seqscan设置为off; 对您的查询没有帮助,可能发生您使用的构造Postgres还不能解缠。 带有内联选择的子查询就是一个例子——您需要将其重写为表单规划器可以优化的形式,比如横向JOIN。
上一篇:开发者:聚合型Web3.0应用将对时代产生颠覆作用


下一篇:Ubuntu安装PostgreSQL