宝剑赠英雄 - 任意字段\条件等效查询, 探探PostgreSQL多列展开式B树

标签

PostgreSQL , 多列索引 , btree , gin , gist , brin , btree_gist , btree_gin , 复合索引 , composite index , 任意字段等效查询


背景

很多人小时候都有一个武侠梦,独孤求败更是金庸武侠小说里的一位传奇人物。

纵横江湖三十馀载,杀尽仇寇奸人,败尽英雄豪杰,天下更无抗手,无可奈何,惟隐居深谷,以雕为友。 呜呼,生平求一敌手而不可得,诚寂寥难堪也。

独孤老前辈的佩剑描写非常有意思,从使用的佩剑,可以看出一个人的武功修为。

第一柄是一柄青光闪闪的无名利剑。「凌厉刚猛,无坚不摧,弱冠前以之与河朔群雄争锋。」

第二柄是紫薇软剑,「三十岁前所用,误伤义士不祥,乃弃之深谷。」

第三柄是玄铁重剑,「重剑无锋,大巧不工,四十岁之前恃之横行天下。」

第四柄是柄已腐朽的木剑,原因是独孤求败「四十岁后,不滞于物,草木竹石均可为剑。」

个人感觉和我们现在搞IT的也很相似哦,开发语言有C, python, java, ....各式各样的可选,数据库有Oracle, PostgreSQL, MySQL等等,也是各式各样的产品可选。

宝剑赠英雄,美玉配佳人。

不管你是李逍遥、还是杨过、小龙女,希望你也能找到合适你的武器。

宝剑赠英雄 - 任意字段\条件等效查询, 探探PostgreSQL多列展开式B树

宝剑赠英雄 - 任意字段\条件等效查询, 探探PostgreSQL多列展开式B树

进入正题。

多列的组合查询在实际的应用中也较为常见,比如淘宝购物网站的商品搜索

宝剑赠英雄 - 任意字段\条件等效查询, 探探PostgreSQL多列展开式B树

有发货地、分类、是否包邮、是否货到付款、是否天猫、二手、等等许多选项,这些选项在设计时可能使用多个字段来表示(当然,有些可能会使用BIT合并成单个字段来表述)。

举个非常简单的例子

CREATE TABLE test2 (    
  major int,    
  minor int,    
  name varchar    
);    

查询条件中,包含test2表的两个字段的检索

SELECT name FROM test2 WHERE major = constant AND minor = constant;    

这种情况下,我们可以使用两个索引,也可以使用一个复合索引(多列索引)。

CREATE INDEX test2_mm_idx ON test2 (major, minor);    

以上例子可以转化为对多个字段的任意组合查询需求(任意单一、任意两个、任意三个,任意若干个字段的查询需求)。

看过我写的文档的童鞋,可能会联想到我在之前写过一篇文档

《PostgreSQL 9.6 黑科技 bloom 算法索引,一个索引支撑任意列组合查询》

不过本文并不是要讲bloom,本文要讲一讲另一种技术(gin索引的暗藏功能,多列展开式B树)。

在开始正文之前,大家有没有想过这些问题呢?

1. 哪些索引方法支持多列索引?

PostgreSQL目前支持btree, hash, gin, gist, sp-gist, brin, bloom, rum等众多索引访问方法,哪些访问方法支持多列索引呢?

2. 多列索引的内部存储结构如何?

比如b-tree单列索引很好理解,就是以被索引列值为KEY的类B-Tree(nbtree)结构。但是当使用多列索引时,内部是如何组织的呢?

不同的索引方法,内部组织有什么差异呢?

3. 多列索引支持哪些查询组合

比如index on (a,b,c)三列,那么哪些查询条件能用上多列索引呢?比如where a=? and b>?

不同的索引方法,适用的查询条件是不是都一样呢?

4. 不同的查询组合,使用多列索引的效率如何,效率是否一样(是否与索引访问方法有关?)

比如b-tree index on (a,b,c)三列,where a=? and b>? 以及 where b>? and c=? 效率一样吗?

5. 多列索引,每个列的顺序是否可以指定

比如,是不是所有的索引方法都可创建这样的索引 index on (a,b desc, c desc nulls first)

6. 同样的查询组合,使用什么索引方法更高效

比如 where a=? and b=? and c=? 这样的查询,适用gin好还是b-tree好呢?

7. 如何选择合适的索引方法,与查询条件,数据分布有关系吗?

要回答这些问题,需要对索引方法,内部存储结构有一定的了解。

本文将以gin和btree为例,讲解一下multi column index,它们的内部存储结构,适应的场景。

一 单列索引的内部结构

建议读者先了解一下单列索引的内部结构,本文就不展开了,可以参考我之前写的一些文章。

《深入浅出PostgreSQL B-Tree索引结构》

《B-Tree和B+Tree》

《PostgreSQL GIN索引实现原理》

《从难缠的模糊查询聊开 - PostgreSQL独门绝招之一 GIN , GiST , SP-GiST , RUM 索引原理与技术背景》

二 PostgreSQL use bitmap combine single column indexs

在没有multi column index时,如果我们有多个列的查询条件,通常可以使用选择性好的列,或者多个列索引的组合。

PostgreSQL 使用多个列索引组合查询时,可以使用多列查询结果的ctid bitmap and or ,筛选出最终符合多列条件的ctid。

不仅适用于多列的查询条件,也适用于单列的多个查询条件。

例如

WHERE x = 42 OR x = 47 OR x = 53 OR x = 99    

WHERE x = 5 AND y = 6    

https://www.postgresql.org/docs/9.6/static/indexes-bitmap-scans.html

Combining Multiple Indexes

A single index scan can only use query clauses that use the index's columns with operators of its operator class and are joined with AND.     
For example, given an index on (a, b) a query condition like WHERE a = 5 AND b = 6 could use the index, but a query like WHERE a = 5 OR b = 6 could not directly use the index.    

Fortunately, PostgreSQL has the ability to combine multiple indexes (including multiple uses of the same index) to handle cases that cannot be implemented by single index scans.     
The system can form AND and OR conditions across several index scans.     
For example, a query like WHERE x = 42 OR x = 47 OR x = 53 OR x = 99 could be broken down into four separate scans of an index on x, each scan using one of the query clauses.     
The results of these scans are then ORed together to produce the result.     
Another example is that if we have separate indexes on x and y,     
one possible implementation of a query like WHERE x = 5 AND y = 6 is to use each index with the appropriate query clause and then AND together the index results to identify the result rows.    

支持单列组合条件,也支持多列组合条件。    

To combine multiple indexes, the system scans each needed index and prepares a bitmap in memory giving the locations of table rows that are reported as matching that index's conditions.     
The bitmaps are then ANDed and ORed together as needed by the query. Finally, the actual table rows are visited and returned.     

注意bitmap扫描是按CTID顺序输出,而非KEY的顺序输出,所以如果对列有ORDER BY的需求,那么会需要额外的sort,优化器会根据实际情况选择合适的执行计划(bitmap 或 单个索引filter但是不用sort)    

The table rows are visited in physical order, because that is how the bitmap is laid out;     
this means that any ordering of the original indexes is lost, and so a separate sort step will be needed if the query has an ORDER BY clause.     
For this reason, and because each additional index scan adds extra time, the planner will sometimes choose to use a simple index scan even though additional indexes are available that could have been used as well.    

In all but the simplest applications, there are various combinations of indexes that might be useful, and the database developer must make trade-offs to decide which indexes to provide.     
Sometimes multicolumn indexes are best, but sometimes it's better to create separate indexes and rely on the index-combination feature.     
For example, if your workload includes a mix of queries that sometimes involve only column x, sometimes only column y, and sometimes both columns,     
you might choose to create two separate indexes on x and y, relying on index combination to process the queries that use both columns.     
You could also create a multicolumn index on (x, y).     
This index would typically be more efficient than index combination for queries involving both columns,     
but as discussed in Section 11.3, it would be almost useless for queries involving only y, so it should not be the only index.     
A combination of the multicolumn index and a separate index on y would serve reasonably well. For queries involving only x, the multicolumn index could be used,     
though it would be larger and hence slower than an index on x alone. The last alternative is to create all three indexes,     
but this is probably only reasonable if the table is searched much more often than it is updated and all three types of query are common.     
If one of the types of query is much less common than the others, you'd probably settle for creating just the two indexes that best match the common types.    

在选择单列还是多列索引时,请根据查询需求选择。如果查询很多,更新插入删除很少,那么如果查询条件有a, b, a and b这类的,可以建立3个索引,分别是(a), (b), (a,b)。      

三 哪些索引方法支持multi column index

Currently, only the B-tree, GiST, GIN, and BRIN index types support multicolumn indexes.

Up to 32 columns can be specified. (This limit can be altered when building PostgreSQL; see the file pg_config_manual.h.)

目前PostgreSQL的B-tree, GiST, GIN, and BRIN索引方法,支持多列索引。

目前支持最多32个列的多列索引,实际上可以更大(通过调整pg_config_manual.h可以做到更大,但是还有另一个限制,indextuple不能超过约1/4的数据块大小,也就是说复合索引列很多的情况下,可能会触发这个限制)。

四 multi column index的查询组合与效率解读

Multicolumn Indexes

https://www.postgresql.org/docs/current/static/indexes-multicolumn.html

由于b-tree, gin , gist, brin都支持multi column索引,但是这几种索引的内部存储方式不一样,所以不同的组合查询的效率也不一样。

例如a,b,c三列的组合索引,select * from tbl where a=? and b>? 以及 where b=?,这两种查询组合,哪个效率高?和索引方法有大大的关系。

btree 查询组合与效率

b-tree多列索引支持任意列的组合查询

A multicolumn B-tree index can be used with query conditions that involve any subset of the index's columns, but the index is most efficient when there are constraints on the leading (leftmost) columns.
虽然b-tree多列索引支持任意列的组合查询,但是最有效的查询还是包含驱动列条件的查询。

The exact rule is that equality constraints on leading columns, plus any inequality constraints on the first column that does not have an equality constraint, will be used to limit the portion of the index that is scanned.
对于b-tree的多列索引来说,一个查询要扫描索引的哪些部分呢?

从驱动列开始算,按索引列的顺序算到非驱动列的第一个不相等条件为止(没有任何条件也算)。

Constraints on columns to the right of these columns are checked in the index, so they save visits to the table proper, but they do not reduce the portion of the index that has to be scanned.

For example, given an index on (a, b, c) and a query condition WHERE a = 5 AND b >= 42 AND c < 77, the index would have to be scanned from the first entry with a = 5 and b = 42 up through the last entry with a = 5.

(WHERE a = 5 AND b >= 42 AND c < 77),从a=5, b=42开始的所有索引条目,都会被扫描。

Index entries with c >= 77 would be skipped, but they'd still have to be scanned through.

其他例子

(WHERE b >= 42 AND c < 77),所有索引条目,都会被扫描。只要不包含驱动列,则扫描所有索引条目。

(WHERE a = 5 AND c < 77),a=5的所有索引条目,都会被扫描。

(WHERE a >= 5 AND b=1 and c < 77),从a=5开始的所有索引条目,都会被扫描。

This index could in principle be used for queries that have constraints on b and/or c with no constraint on a — but the entire index would have to be scanned, so in most cases the planner would prefer a sequential table scan over using the index.

建议有频繁的复合查询,并且复合查询带有驱动列以及其他列的查询时,可以考虑使用多列索引。

gist 查询组合与效率

gist多列索引支持任意列的组合查询。

A multicolumn GiST index can be used with query conditions that involve any subset of the index's columns.

Conditions on additional columns restrict the entries returned by the index, but the condition on the first column is the most important one for determining how much of the index needs to be scanned.

注意与b-tree不一样的地方,驱动列的选择性决定了需要扫描多少索引条目,扫描多少条目与非驱动列无关(而b-tree是与非驱动列也有关的)。

A GiST index will be relatively ineffective if its first column has only a few distinct values, even if there are many distinct values in additional columns.

如果驱动列的选择性不好、其他列的选择性很好,即使查询条件同时包含了 驱动列以及其他列 ,也需要扫描很多索引条目,因为扫描多少索引条目和其他列无关。

这么说,并不建议使用gist多列索引。

如果一定要使用GIST多列索引,请一定要把选择性好的列作为驱动列。

gin 查询组合与效率

gin多列索引支持任意列的组合查询。

并且任意查询条件的查询效率都是一样的。

A multicolumn GIN index can be used with query conditions that involve any subset of the index's columns.

Unlike B-tree or GiST, index search effectiveness is the same regardless of which index column(s) the query conditions use.

brin 查询组合与效率

brin多列索引支持任意列的组合查询。

并且任意查询条件的查询效率都是一样的。

如果有brin组合查询的必要(比如多个与ctid线性相关的列的范围查询,无所谓线性的方向),任何时候都建议使用BRIN的multi column index,除非想针对不同的列使用不同的pages_per_range(比如有些列10个块的范围和另外一些列100个块的范围覆盖差不多,那么建议它们使用不同的pages_per_range)

A multicolumn BRIN index can be used with query conditions that involve any subset of the index's columns.

Like GIN and unlike B-tree or GiST, index search effectiveness is the same regardless of which index column(s) the query conditions use.

The only reason to have multiple BRIN indexes instead of one multicolumn BRIN index on a single table is to have a different pages_per_range storage parameter.

五 multi column使用建议

多列索引每个列的operator class必须和实际查询匹配,在创建索引时可以指定。

Of course, each column must be used with operators appropriate to the index type; clauses that involve other operators will not be considered.

Multicolumn indexes should be used sparingly.

In most situations, an index on a single column is sufficient and saves space and time.

Indexes with more than three columns are unlikely to be helpful unless the usage of the table is extremely stylized.

六 多列索引,在不同组合下的查询效率差异,原理剖析 - 从索引内部结构说起

前面分析了b-tree, gin都支持任意组合查询。

但是b-tree推荐使用包含驱动列的查询条件,如果查询条件未包含驱动列,则需要扫描整个复合索引。

而gin则通吃,可以输入任意组合列作为查询条件,并且效率一致。

例如

index on (a,b,c)

b-tree 对于包含驱动列a查询条件的SQL,效率可能比较好,不包括a查询条件的SQL,即使走索引,也要扫描整个索引的所有条目。

而gin 则无论任何查询条件,效果都一样。

这是为什么呢?必须从索引的内部存储组织结构来分析。

b-tree multi column index 剖析

btree 对被索引列按创建索引时指定的顺序排序,然后建立B树。

如create index idx on tbl using btree (a,b desc nulls first,c desc, e);

所以B树中的KEY实际上就是被索引列的组合对象,这个结构决定了什么查询能用上这个复合索引。

(a,b,c), row?    
(a,b,c), row?    
(a,b,c), row?    
....    

要达到最高效的使用这种复合索引,必须带上驱动列的条件。

如果order by要用上索引,那么必须order by的写法要与创建索引时指定的顺序一致。

例如select * from tbl where a=? order by a,b desc nulls first;

gin multi column index 剖析

gin 的复合索引很有趣,它将所有列展开,然后将展开后的数据(列ID+值)排序并建立B树。

因此在gin的复合索引中,B树的KEY实际上是列ID+值。

(column_a, v1), row?    
(column_b, v1), row?    
(column_b, v2), row?    
(column_c, v2), row?    
....    

这样的树,以任意组合进行查询,效率都一样。

where a=? 与 where b=? 效率一样,而且和B-tree的单列索引的效率几乎一致(当索引层级一致时)。

where a=? and b=? 与 where b=? and c=? 效率一样(复合索引查两次,在内部使用bitmapAnd取得结果)。

仅仅当多列组合查询时,gin效率可能比不上b-tree的带驱动列的查询(因为b-tree索引不需要bitmapAnd,而gin需要内部bitmapAnd)。

七 例子

创建一个测试表,包含3个字段

postgres=# create table t3(c1 int, c2 text, c3 int);    
CREATE TABLE    

插入100万记录,其中c2,c3的值固定

postgres=# postgres=# insert into t3 select generate_series(1,1000000),'test',1;    
INSERT 0 1000000    

创建gin复合索引

postgres=# create index idx_t3_1 on t3 using gin(c1,c2,c3);    
CREATE INDEX    

查询c1=1,效率与单列索引一致

这个查询结果也可以说明另一个问题,不同列并不是单纯展开后直接构建B树,它依旧添加了列ID进来,所以即使c3=1有100万记录,并不影响c1=1的扫描PAGE数。

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t3 where c1=1;    
                                                   QUERY PLAN                                                        
-----------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on public.t3  (cost=5.01..6.02 rows=1 width=13) (actual time=0.021..0.021 rows=1 loops=1)    
   Output: c1, c2, c3    
   Recheck Cond: (t3.c1 = 1)    
   Heap Blocks: exact=1    
   Buffers: shared hit=5    
   ->  Bitmap Index Scan on idx_t3_1  (cost=0.00..5.01 rows=1 width=0) (actual time=0.016..0.016 rows=1 loops=1)    
         Index Cond: (t3.c1 = 1)    
         Buffers: shared hit=4    
 Planning time: 0.076 ms    
 Execution time: 0.047 ms    
(10 rows)    

查询c2=?,效率与单列索引一致

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t3 where c2='test';    
                                                            QUERY PLAN                                                                
----------------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on public.t3  (cost=8121.00..26027.00 rows=1000000 width=13) (actual time=74.467..179.603 rows=1000000 loops=1)    
   Output: c1, c2, c3    
   Recheck Cond: (t3.c2 = 'test'::text)    
   Heap Blocks: exact=5406    
   Buffers: shared hit=5542    
   ->  Bitmap Index Scan on idx_t3_1  (cost=0.00..7871.00 rows=1000000 width=0) (actual time=73.640..73.640 rows=1000000 loops=1)    
         Index Cond: (t3.c2 = 'test'::text)    
         Buffers: shared hit=136    
 Planning time: 0.130 ms    
 Execution time: 230.770 ms    
(10 rows)    

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t3 where c2='t';    
                                                   QUERY PLAN                                                        
-----------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on public.t3  (cost=5.00..6.01 rows=1 width=13) (actual time=0.014..0.014 rows=0 loops=1)    
   Output: c1, c2, c3    
   Recheck Cond: (t3.c2 = 't'::text)    
   Buffers: shared hit=4    
   ->  Bitmap Index Scan on idx_t3_1  (cost=0.00..5.00 rows=1 width=0) (actual time=0.013..0.013 rows=0 loops=1)    
         Index Cond: (t3.c2 = 't'::text)    
         Buffers: shared hit=4    
 Planning time: 0.081 ms    
 Execution time: 0.039 ms    
(9 rows)    

查询c3=?,效率与单列索引一致


postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t3 where c3=1;    
                                                            QUERY PLAN                                                                
----------------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on public.t3  (cost=8121.00..26027.00 rows=1000000 width=13) (actual time=77.949..182.939 rows=1000000 loops=1)    
   Output: c1, c2, c3    
   Recheck Cond: (t3.c3 = 1)    
   Heap Blocks: exact=5406    
   Buffers: shared hit=5542    
   ->  Bitmap Index Scan on idx_t3_1  (cost=0.00..7871.00 rows=1000000 width=0) (actual time=77.116..77.116 rows=1000000 loops=1)    
         Index Cond: (t3.c3 = 1)    
         Buffers: shared hit=136    
 Planning time: 0.083 ms    
 Execution time: 234.558 ms    
(10 rows)    

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t3 where c3=2;    
                                                   QUERY PLAN                                                        
-----------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on public.t3  (cost=5.00..6.01 rows=1 width=13) (actual time=0.015..0.015 rows=0 loops=1)    
   Output: c1, c2, c3    
   Recheck Cond: (t3.c3 = 2)    
   Buffers: shared hit=4    
   ->  Bitmap Index Scan on idx_t3_1  (cost=0.00..5.00 rows=1 width=0) (actual time=0.014..0.014 rows=0 loops=1)    
         Index Cond: (t3.c3 = 2)    
         Buffers: shared hit=4    
 Planning time: 0.081 ms    
 Execution time: 0.040 ms    
(9 rows)    

gin任意组合(不需要限定驱动列)多列查询的隐含bitmapAnd, bitmapOr操作

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t3 where c1=2 and c3=1;    
                                                   QUERY PLAN                                                        
-----------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on public.t3  (cost=9.01..10.03 rows=1 width=13) (actual time=0.044..0.044 rows=1 loops=1)    
   Output: c1, c2, c3    
   Recheck Cond: ((t3.c1 = 2) AND (t3.c3 = 1))    
   Heap Blocks: exact=1    
   Buffers: shared hit=10    
   ->  Bitmap Index Scan on idx_t3_1  (cost=0.00..9.01 rows=1 width=0) (actual time=0.040..0.040 rows=1 loops=1)    
         Index Cond: ((t3.c1 = 2) AND (t3.c3 = 1))    
         Buffers: shared hit=9    
 Planning time: 0.061 ms    
 Execution time: 0.063 ms    
(10 rows)    

没有驱动列,一样高效无比

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t3 where c2='test' and c3=2;    
                                                   QUERY PLAN                                                        
-----------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on public.t3  (cost=9.00..10.02 rows=1 width=13) (actual time=0.052..0.052 rows=0 loops=1)    
   Output: c1, c2, c3    
   Recheck Cond: ((t3.c2 = 'test'::text) AND (t3.c3 = 2))    
   Buffers: shared hit=9    
   ->  Bitmap Index Scan on idx_t3_1  (cost=0.00..9.00 rows=1 width=0) (actual time=0.051..0.051 rows=0 loops=1)    
         Index Cond: ((t3.c2 = 'test'::text) AND (t3.c3 = 2))    
         Buffers: shared hit=9    
 Planning time: 0.086 ms    
 Execution time: 0.075 ms    
(9 rows)    

gin复合索引的展开式B树决定了不能按单列设置顺序

postgres=# create index idx_t1_0 on t1 using gin (c1, c2 desc);    
ERROR:  0A000: access method "gin" does not support ASC/DESC options    
LOCATION:  ComputeIndexAttrs, indexcmds.c:1248    

八 btree vs gin 多列索引

1. 由于btree index, 多列值根据创建索引的DDL指定顺序sort后,多列的值组合后作为一个KEY存储在B树中。

例如4条记录如下

1,1,2;     
1,100,2;     
2,1,10;     
1,1,3;     

btree 中的key排序后分布(有多少条记录,就有多少KEY)

1,1,2; 1,1,3; 1,100,2; 2,1,10;     

2. GIN MULTI COLUMN INDEX 构建了一个包含多种数据类型的B-TREE , 将多列的数据展开后,排序后分布 (key的数量为每列的count distinct总和)

column1,1; column2,1; column1,2; column3,2; column3,3; column3,10; column2,100;     

更形象的比喻

比如有三幅扑克牌(每幅54张牌),每一幅代表一列,如果要创建3列的复合索引,那么B-TREE会创建出54个条目的B树,而GIN会创建出包含162个条目的B树。

请看这个例子,可以说明这个情况

postgres=# create table t2(c1 int2, c2 int4, c3 int8, c4 numeric, c5 text, c6 timestamp);    
CREATE TABLE    
postgres=# insert into t2 select c1,c1,c1,c1,c5,c6 from (select trunc(random()*1000) c1, md5(random()::text) c5, now()+(random()*10000||' sec')::interval c6 from generate_series(1,100000)) t;    
INSERT 0 100000    
postgres=# create index idx_t2_1 on t2 using gin (c1,c2,c3,c4,c5,c6);    
CREATE INDEX    
postgres=# select count(distinct c4) from t2;    
 count     
-------    
  1000    
(1 row)    

postgres=# select count(distinct c1) from t2;    
 count     
-------    
  1000    
(1 row)    

postgres=# select count(distinct c2) from t2;    
 count     
-------    
  1000    
(1 row)    

postgres=# select count(distinct c3) from t2;    
 count     
-------    
  1000    
(1 row)    

postgres=# select count(distinct c5) from t2;    
 count     
-------    
 99996    
(1 row)    

postgres=# select count(distinct c6) from t2;    
 count      
--------    
 100000    
(1 row)    

postgres=# select 99996+100000+4000;    
 ?column?     
----------    
   203996    
(1 row)    

postgres=# select * from gin_metapage_info(get_raw_page('idx_t2_1',0));    
 pending_head | pending_tail | tail_free_size | n_pending_pages | n_pending_tuples | n_total_pages | n_entry_pages | n_data_pages | n_entries | version     
--------------+--------------+----------------+-----------------+------------------+---------------+---------------+--------------+-----------+---------    
   4294967295 |   4294967295 |              0 |               0 |                0 |          2611 |          2610 |            0 |    203996 |       2    
(1 row)    

n_entries = count(distinct indexed column1) + ....     

b-tree 要高效的使用复合索引,必须带驱动列的查询条件。

GIN 使用复合索引,可以任意组合查询条件,当有多列查询条件时,使用隐含的bitmapAnd or bitmapOr。

什么情况下,gin比btree慢?

通常,多列查询时,如果使用了驱动列,那么B-TREE索引会更快一些,因为它不需要使用bitmapAnd or bitmapOr,而GIN需要。

其他情况,gin都比btree的复合查询快。

九 gin 多列索引的独门秘籍

通过前面的分析,我们已经摸清楚了GIN的复合索引结构(展开式B树),并且也知道GIN是key+ctid list的类b+tree树组织形式,它可以有很高的压缩比,也可以高效的查询单KEY。

那么GIN具备这些特性后,有哪些独门秘籍呢?

1. 任意组合查询,都可以达到很高效,你不需要创建多个b-tree索引了,一个GIN搞定它(不必担心GIN的IO,有fastupdate技术支撑,并且读写(entry合并)不堵塞)。

比如淘宝的搜索页面,用户可能根据任意属性,勾选任意条件进行查询。

创建一张测试表, 6个字段

postgres=# create table taobao(c1 int, c2 int, c3 timestamp, c4 text, c5 numeric, c6 text);  
CREATE TABLE  

插入1000万随机记录

postgres=# insert into taobao select random()*2000, random()*3000, now()+((50000-100000*random())||' sec')::interval , md5(random()::text), round((random()*1000000)::numeric,2), md5(random()::text) from generate_series(1,10000000);  
INSERT 0 10000000  

创建GIN多列索引

postgres=# create index idx_taobao_gin on taobao using gin(c1,c2,c3,c4,c5,c6);  

数据样例

postgres=# select * from taobao limit 10;  
  c1  |  c2  |             c3             |                c4                |    c5     |                c6                  
------+------+----------------------------+----------------------------------+-----------+----------------------------------  
 1405 |  882 | 2017-02-06 09:41:24.985878 | 49982683517aab7d194f3affe74ba827 |  65157.79 | 256ad6d098a6536e3548b5af91a26557  
 1277 | 1269 | 2017-02-06 09:52:16.313212 | b40c1febdb7f62c916d3632a03092261 | 940751.10 | 9911b203e38b57c55a769312c2aaaeba  
  870 |  159 | 2017-02-06 05:59:46.853421 | 96a0f84d9f9381d77364d93ca2d7aa6f | 419618.52 | 1e716d90055d3b32027a5e80a19e2f4f  
 1990 | 1100 | 2017-02-07 00:35:26.849744 | 684b66b25eb57d97f604eb9d92dfc8b0 | 764940.62 | eea82a253995a70da23a9f9ba3015175  
  625 | 1076 | 2017-02-06 01:48:13.929789 | 7fe094f548cffa367ebeb28d4f188875 | 482201.22 | 81ba27a8123dbd3e3a741c5984368709  
  968 |  554 | 2017-02-06 06:05:29.971936 | 6c8b34e4eb7c5eca3a8ad6131c5aecd3 | 583617.05 | b5cfc01c845cc87a4eb8a425ac9b2e01  
 1795 |  667 | 2017-02-06 20:18:46.027376 | b4ef2282064ef3e7a4eb3c624ba42334 | 911819.49 | a794b4635bb314972d891c7218063642  
  222 | 1041 | 2017-02-06 20:29:15.686187 | f3a6bc2b683e272c293d09353fb2465b | 722814.10 | 22b3b06a69b134dee9e3236907637683  
 1596 | 2153 | 2017-02-05 22:38:54.795333 | 1455f7f7f198d09e380e680321a31968 | 672431.00 | efac7b0a2154e25321a522b98903df41  
  313 | 2955 | 2017-02-06 22:59:49.117719 | 7840bbd2be443904b094b9f8919730c0 | 525295.10 | b5feb900191fd404f3a3de407fa62c93  
(10 rows)  

 public | taobao | table | postgres | 1202 MB |   
 public | idx_taobao_gin | index | postgres | taobao | 3761 MB |   

查询测试

任意单一字段

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from taobao where c1=1;  
                                                          QUERY PLAN                                                            
------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.taobao  (cost=46.51..4873.01 rows=4840 width=90) (actual time=1.602..10.038 rows=5043 loops=1)  
   Output: c1, c2, c3, c4, c5, c6  
   Recheck Cond: (taobao.c1 = 1)  
   Heap Blocks: exact=4958  
   Buffers: shared hit=4966  
   ->  Bitmap Index Scan on idx_taobao_gin  (cost=0.00..45.30 rows=4840 width=0) (actual time=0.849..0.849 rows=5043 loops=1)  
         Index Cond: (taobao.c1 = 1)  
         Buffers: shared hit=8  
 Planning time: 0.319 ms  
 Execution time: 10.346 ms  
(10 rows)  

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from taobao where c2=882;  
                                                          QUERY PLAN                                                            
------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.taobao  (cost=34.16..3287.73 rows=3246 width=90) (actual time=1.175..7.024 rows=3373 loops=1)  
   Output: c1, c2, c3, c4, c5, c6  
   Recheck Cond: (taobao.c2 = 882)  
   Heap Blocks: exact=3350  
   Buffers: shared hit=3358  
   ->  Bitmap Index Scan on idx_taobao_gin  (cost=0.00..33.35 rows=3246 width=0) (actual time=0.651..0.651 rows=3373 loops=1)  
         Index Cond: (taobao.c2 = 882)  
         Buffers: shared hit=8  
 Planning time: 0.094 ms  
 Execution time: 7.236 ms  
(10 rows)  

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from taobao where c3='2017-02-06 09:41:24.985878';  
                                                      QUERY PLAN                                                         
-----------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.taobao  (cost=8.01..9.02 rows=1 width=90) (actual time=0.025..0.025 rows=1 loops=1)  
   Output: c1, c2, c3, c4, c5, c6  
   Recheck Cond: (taobao.c3 = '2017-02-06 09:41:24.985878'::timestamp without time zone)  
   Heap Blocks: exact=1  
   Buffers: shared hit=6  
   ->  Bitmap Index Scan on idx_taobao_gin  (cost=0.00..8.01 rows=1 width=0) (actual time=0.019..0.019 rows=1 loops=1)  
         Index Cond: (taobao.c3 = '2017-02-06 09:41:24.985878'::timestamp without time zone)  
         Buffers: shared hit=5  
 Planning time: 0.125 ms  
 Execution time: 0.057 ms  
(10 rows)  

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from taobao where c4='b40c1febdb7f62c916d3632a03092261';  
                                                      QUERY PLAN                                                         
-----------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.taobao  (cost=8.01..9.02 rows=1 width=90) (actual time=0.028..0.028 rows=1 loops=1)  
   Output: c1, c2, c3, c4, c5, c6  
   Recheck Cond: (taobao.c4 = 'b40c1febdb7f62c916d3632a03092261'::text)  
   Heap Blocks: exact=1  
   Buffers: shared hit=6  
   ->  Bitmap Index Scan on idx_taobao_gin  (cost=0.00..8.01 rows=1 width=0) (actual time=0.023..0.023 rows=1 loops=1)  
         Index Cond: (taobao.c4 = 'b40c1febdb7f62c916d3632a03092261'::text)  
         Buffers: shared hit=5  
 Planning time: 0.101 ms  
 Execution time: 0.058 ms  
(10 rows)  

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from taobao where c5='764940.62';  
                                                      QUERY PLAN                                                         
-----------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.taobao  (cost=8.01..9.02 rows=1 width=90) (actual time=0.028..0.028 rows=1 loops=1)  
   Output: c1, c2, c3, c4, c5, c6  
   Recheck Cond: (taobao.c5 = 764940.62)  
   Heap Blocks: exact=1  
   Buffers: shared hit=6  
   ->  Bitmap Index Scan on idx_taobao_gin  (cost=0.00..8.01 rows=1 width=0) (actual time=0.023..0.023 rows=1 loops=1)  
         Index Cond: (taobao.c5 = 764940.62)  
         Buffers: shared hit=5  
 Planning time: 0.127 ms  
 Execution time: 0.069 ms  
(10 rows)  

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from taobao where c6='test';  
                                                      QUERY PLAN                                                         
-----------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.taobao  (cost=8.01..9.02 rows=1 width=90) (actual time=0.023..0.023 rows=0 loops=1)  
   Output: c1, c2, c3, c4, c5, c6  
   Recheck Cond: (taobao.c6 = 'test'::text)  
   Buffers: shared hit=5  
   ->  Bitmap Index Scan on idx_taobao_gin  (cost=0.00..8.01 rows=1 width=0) (actual time=0.020..0.020 rows=0 loops=1)  
         Index Cond: (taobao.c6 = 'test'::text)  
         Buffers: shared hit=5  
 Planning time: 0.088 ms  
 Execution time: 0.054 ms  
(9 rows)  

任意2字段

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from taobao where c2=1 and c3='2017-02-06 09:41:24.985878';  
                                                       QUERY PLAN                                                         
------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.taobao  (cost=15.00..16.02 rows=1 width=90) (actual time=0.060..0.060 rows=0 loops=1)  
   Output: c1, c2, c3, c4, c5, c6  
   Recheck Cond: ((taobao.c2 = 1) AND (taobao.c3 = '2017-02-06 09:41:24.985878'::timestamp without time zone))  
   Buffers: shared hit=11  
   ->  Bitmap Index Scan on idx_taobao_gin  (cost=0.00..15.00 rows=1 width=0) (actual time=0.057..0.057 rows=0 loops=1)  
         Index Cond: ((taobao.c2 = 1) AND (taobao.c3 = '2017-02-06 09:41:24.985878'::timestamp without time zone))  
         Buffers: shared hit=11  
 Planning time: 0.100 ms  
 Execution time: 0.093 ms  
(9 rows)  

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from taobao where c2=1 or c3='2017-02-06 09:41:24.985878';  
                                                             QUERY PLAN                                                               
------------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.taobao  (cost=42.98..3305.68 rows=3247 width=90) (actual time=1.165..6.947 rows=3330 loops=1)  
   Output: c1, c2, c3, c4, c5, c6  
   Recheck Cond: ((taobao.c2 = 1) OR (taobao.c3 = '2017-02-06 09:41:24.985878'::timestamp without time zone))  
   Heap Blocks: exact=3295  
   Buffers: shared hit=3308  
   ->  BitmapOr  (cost=42.98..42.98 rows=3247 width=0) (actual time=0.650..0.650 rows=0 loops=1)  
         Buffers: shared hit=13  
         ->  Bitmap Index Scan on idx_taobao_gin  (cost=0.00..33.35 rows=3246 width=0) (actual time=0.638..0.638 rows=3329 loops=1)  
               Index Cond: (taobao.c2 = 1)  
               Buffers: shared hit=8  
         ->  Bitmap Index Scan on idx_taobao_gin  (cost=0.00..8.01 rows=1 width=0) (actual time=0.011..0.011 rows=1 loops=1)  
               Index Cond: (taobao.c3 = '2017-02-06 09:41:24.985878'::timestamp without time zone)  
               Buffers: shared hit=5  
 Planning time: 0.099 ms  
 Execution time: 7.161 ms  
(15 rows)  

任意3字段

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from taobao where c4='b40c1febdb7f62c916d3632a03092261' and c5=1 and c6='test';  
                                                                 QUERY PLAN                                                                   
--------------------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.taobao  (cost=22.00..23.02 rows=1 width=90) (actual time=0.051..0.051 rows=0 loops=1)  
   Output: c1, c2, c3, c4, c5, c6  
   Recheck Cond: ((taobao.c4 = 'b40c1febdb7f62c916d3632a03092261'::text) AND (taobao.c5 = '1'::numeric) AND (taobao.c6 = 'test'::text))  
   Buffers: shared hit=13  
   ->  Bitmap Index Scan on idx_taobao_gin  (cost=0.00..22.00 rows=1 width=0) (actual time=0.048..0.048 rows=0 loops=1)  
         Index Cond: ((taobao.c4 = 'b40c1febdb7f62c916d3632a03092261'::text) AND (taobao.c5 = '1'::numeric) AND (taobao.c6 = 'test'::text))  
         Buffers: shared hit=13  
 Planning time: 0.115 ms  
 Execution time: 0.084 ms  
(9 rows)  

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from taobao where c4='b40c1febdb7f62c916d3632a03092261' or c5=1 or c6='test';  
                                                              QUERY PLAN                                                                
--------------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.taobao  (cost=24.03..27.08 rows=3 width=90) (actual time=0.061..0.062 rows=1 loops=1)  
   Output: c1, c2, c3, c4, c5, c6  
   Recheck Cond: ((taobao.c4 = 'b40c1febdb7f62c916d3632a03092261'::text) OR (taobao.c5 = '1'::numeric) OR (taobao.c6 = 'test'::text))  
   Heap Blocks: exact=1  
   Buffers: shared hit=16  
   ->  BitmapOr  (cost=24.03..24.03 rows=3 width=0) (actual time=0.055..0.055 rows=0 loops=1)  
         Buffers: shared hit=15  
         ->  Bitmap Index Scan on idx_taobao_gin  (cost=0.00..8.01 rows=1 width=0) (actual time=0.025..0.025 rows=1 loops=1)  
               Index Cond: (taobao.c4 = 'b40c1febdb7f62c916d3632a03092261'::text)  
               Buffers: shared hit=5  
         ->  Bitmap Index Scan on idx_taobao_gin  (cost=0.00..8.01 rows=1 width=0) (actual time=0.016..0.016 rows=0 loops=1)  
               Index Cond: (taobao.c5 = '1'::numeric)  
               Buffers: shared hit=5  
         ->  Bitmap Index Scan on idx_taobao_gin  (cost=0.00..8.01 rows=1 width=0) (actual time=0.012..0.012 rows=0 loops=1)  
               Index Cond: (taobao.c6 = 'test'::text)  
               Buffers: shared hit=5  
 Planning time: 0.110 ms  
 Execution time: 0.122 ms  
(18 rows)  

任意4字段

不再举例,都能用上索引

2. 由于ctid list组织,所以可以做到很好的压缩,前面讲的posting list compress就是这个功效。所以它节约空间,提升效率。

postgres=# create table t4(c1 int,c2 int);  
CREATE TABLE  
postgres=# insert into t4 select generate_series(1,10000000),1;  
INSERT 0 10000000  

postgres=# create index btree_t4_c1 on t4 using btree(c1);  
CREATE INDEX  
postgres=# create index btree_t4_c2 on t4 using btree(c2);  
CREATE INDEX  

postgres=# create index btree_t4_c1 on t4 using btree(c1);  
CREATE INDEX  
postgres=# create index btree_t4_c2 on t4 using btree(c2);  
CREATE INDEX  

postgres=# create index gin_t4_c1 on t4 using gin(c1);  
CREATE INDEX  
postgres=# create index gin_t4_c2 on t4 using gin(c2);  
CREATE INDEX  

postgres=# \di+  
                            List of relations  
 Schema |    Name     | Type  |  Owner   | Table |  Size   | Description   
--------+-------------+-------+----------+-------+---------+-------------  
 public | btree_t4_c1 | index | postgres | t4    | 214 MB  | // btree为全entry索引,  
 public | btree_t4_c2 | index | postgres | t4    | 214 MB  | // 所以即使c2字段1000万全重复值,存储空间也一样  
 public | gin_t4_c1   | index | postgres | t4    | 534 MB  | // 原本存储heap ctid的,被拆分为pos 2 bytes、blkid 4 bytes来存储posting list的长度、posting tree root的blkid,所以比btree多了一点  
 public | gin_t4_c2   | index | postgres | t4    | 10 MB   | // ctid list(posting list)自动压缩,压缩比很高  

3. 使用btree_gin插件,可以对任意标量数据类型创建GIN索引,前面已有例子。

4. gin对多值类型(如数组、文本、全文检索)的支持就不多说了,那是GIN的发源地,支持非常棒。

注意,目前gin还不支持sort,所以如果你有大数据量的ORDER BY limit 小数据输出需求,建议还是使用b-tree。

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t4 where c2=1 order by c2 limit 1;  
                                                                 QUERY PLAN                                                                    
---------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=81163.88..81163.90 rows=1 width=8) (actual time=754.234..754.235 rows=1 loops=1)  
   Output: c1, c2  
   Buffers: shared hit=1307  
   ->  Bitmap Heap Scan on public.t4  (cost=81163.88..250411.70 rows=9999985 width=8) (actual time=754.234..754.234 rows=1 loops=1)  
         Output: c1, c2  
         Recheck Cond: (t4.c2 = 1)  
         Heap Blocks: exact=1  
         Buffers: shared hit=1307  
         ->  Bitmap Index Scan on gin_t4_c2  (cost=0.00..78663.89 rows=9999985 width=0) (actual time=745.651..745.651 rows=10000000 loops=1)   
             // gin还不适合limit输出,但是可以通过修改内核来改进  
               Index Cond: (t4.c2 = 1)  
               Buffers: shared hit=1306  
 Planning time: 0.091 ms  
 Execution time: 754.261 ms  
(13 rows)  

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t4 order by c2 limit 1;  
                                                               QUERY PLAN                                                                  
-----------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.43..0.46 rows=1 width=8) (actual time=0.031..0.031 rows=1 loops=1)  
   Output: c1, c2  
   Buffers: shared hit=1 read=3  
   ->  Index Scan using btree_t4_c2 on public.t4  (cost=0.43..221670.43 rows=10000000 width=8) (actual time=0.030..0.030 rows=1 loops=1)  
         Output: c1, c2  
         Buffers: shared hit=1 read=3  
 Planning time: 0.141 ms  
 Execution time: 0.048 ms  
(8 rows)  

postgres=# drop index btree_t4_c2;  

// 不限制c2任何条件的话,不能使用gin,order by也不能使用gin  

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t4 order by c2 limit 1;  
                                                                    QUERY PLAN                                                                      
--------------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=10000194247.77..10000194247.78 rows=1 width=8) (actual time=1719.522..1719.523 rows=1 loops=1)  
   Output: c1, c2  
   Buffers: shared hit=44248  
   ->  Sort  (cost=10000194247.77..10000219247.74 rows=9999985 width=8) (actual time=1719.520..1719.520 rows=1 loops=1)  
         Output: c1, c2  
         Sort Key: t4.c2  
         Sort Method: top-N heapsort  Memory: 25kB  
         Buffers: shared hit=44248  
         ->  Seq Scan on public.t4  (cost=10000000000.00..10000144247.85 rows=9999985 width=8) (actual time=0.009..754.991 rows=10000000 loops=1)  
               Output: c1, c2  
               Buffers: shared hit=44248  
 Planning time: 0.084 ms  
 Execution time: 1719.543 ms  
(13 rows)  

// 限制c2任何条件的话,可以使用gin,但是order by依旧不能使用gin  
postgres=# set enable_sort=off;  
SET  
Time: 0.112 ms  
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t4 where c2>0 order by c2 limit 1;  
                                                                     QUERY PLAN                                                                       
----------------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=10000301719.00..10000301719.00 rows=1 width=8) (actual time=2801.874..2801.875 rows=1 loops=1)  
   Output: c1, c2  
   Buffers: shared hit=45554  
   ->  Sort  (cost=10000301719.00..10000326719.00 rows=10000000 width=8) (actual time=2801.873..2801.873 rows=1 loops=1)  
         Output: c1, c2  
         Sort Key: t4.c2  
         Sort Method: top-N heapsort  Memory: 25kB  
         Buffers: shared hit=45554  
         ->  Bitmap Heap Scan on public.t4  (cost=82471.00..251719.00 rows=10000000 width=8) (actual time=817.773..1905.348 rows=10000000 loops=1)  
               Output: c1, c2  
               Recheck Cond: (t4.c2 > 0)  
               Heap Blocks: exact=44248  
               Buffers: shared hit=45554  
               ->  Bitmap Index Scan on gin_t4_c2  (cost=0.00..79971.00 rows=10000000 width=0) (actual time=809.234..809.234 rows=10000000 loops=1)  
                     Index Cond: (t4.c2 > 0)  
                     Buffers: shared hit=1306  
 Planning time: 0.103 ms  
 Execution time: 2801.909 ms  
(18 rows)  

十 小结

宝剑赠英雄,美玉配佳人。

PostgreSQL 数据库,了解越多,才会更加随心所欲,驾驭自如。

参考

https://www.postgresql.org/docs/9.6/static/indexes-bitmap-scans.html

https://www.postgresql.org/docs/current/static/indexes-multicolumn.html

https://www.postgresql.org/docs/9.6/static/btree-gin.html

https://www.postgresql.org/docs/9.6/static/btree-gist.html

https://www.postgresql.org/docs/9.6/static/pageinspect.html

《深入浅出PostgreSQL B-Tree索引结构》

《B-Tree和B+Tree》

《PostgreSQL GIN索引实现原理》

《从难缠的模糊查询聊开 - PostgreSQL独门绝招之一 GIN , GiST , SP-GiST , RUM 索引原理与技术背景》

上一篇:Python异步IO --- 轻松管理10k+并发连接


下一篇:XML基础