【学习资料】第14期快速入门PostgreSQL应用开发与管理 - 4 高级SQL用法

背景

本章大

1. 聚集函数

常用聚合函数

统计类的聚合函

排序聚合

Hypothetical-Set聚合函数

分析

2. 查询

3. union\union all\except\intersect

4. 

5. 

优化器如何强制连接顺序

6. 

左外

右外

全外

7. 窗口查询

第二章:高SQL用法

1. 聚集函数

https://www.postgresql.org/docs/9.6/static/functions-aggregate.html

常用聚合函数

Function

Argument Type(s)

Return Type

Description

array_agg(expression)

any

array of the argument type

input values, including nulls, concatenated into an array

avg(expression)

smallint, int, bigint, real, double precision, numeric, or interval

numeric for any integer-type argument, double precision for a floating-point argument, otherwise the same as the argument data type

the average (arithmetic mean) of all input values

bit_and(expression)

smallint, int, bigint, or bit

same as argument data type

the bitwise AND of all non-null input values, or null if none

bit_or(expression)

smallint, int, bigint, or bit

same as argument data type

the bitwise OR of all non-null input values, or null if none

bool_and(expression)

bool

bool

true if all input values are true, otherwise false

bool_or(expression)

bool

bool

true if at least one input value is true, otherwise false

count(*)

-

bigint

number of input rows

count(expression)

any

bigint

number of input rows for which the value of expression is not null

every(expression)

bool

bool

equivalent to bool_and

json_agg(expression)

any

json

aggregates values as a JSON array

json_object_agg(name,value)

(any, any)

json

aggregates name/value pairs as a JSON object

max(expression)

any array, numeric, string, or date/time type

same as argument type

maximum value of expression across all input values

min(expression)

any array, numeric, string, or date/time type

same as argument type

minimum value of expression across all input values

string_agg(expression,delimiter)

(text, text) or (bytea, bytea)

same as argument types

input values concatenated into a string, separated by delimiter

sum(expression)

smallint, int, bigint, real, double precision, numeric, interval, or money

bigint for smallint or int arguments, numeric for bigint arguments, otherwise the same as the argument data type

sum of expression across all input values

xmlagg(expression)

xml

xml

concatenation of XML values (see alsoSection 9.14.1.7)

上图中所有聚合函数, 当没有行输入时, 除了count返回0, 其他都返回null.

使用sum, array_agg时, 当没有行输入, 返回NULL可能有点别扭, 可以使用coalesce来替代NULL,如coalesce(sum(x), 0)

coalesce(array_agg(x), '{}'::int[])

例子:

聚合后得到数组, null将计入数组元素

postgres=# select array_agg(id) from (values(null),(1),(2)) as t(id);    

 array_agg     

------------    

 {NULL,1,2}   

(1 row)    

算平均值时不计算null

postgres=# select avg(id) from (values(null),(1),(2)) as t(id);    

        avg             

--------------------    

 1.5000000000000000    

(1 row)    

算bit与|或时也不计算NULL

postgres=# select bit_and(id) from (values(null),(1),(2)) as t(id);    

 bit_and    

---------    

       0   

(1 row)    

postgres=# select bit_or(id) from (values(null),(1),(2)) as t(id);    

 bit_or    

--------    

      3   

(1 row)    

算布尔逻辑时也不计算NULL

postgres=# select bool_and(id) from (values(null),(true),(false)) as t(id);    

 bool_and    

----------    

 f    

(1 row)    

every是bool_and的别名, 实际上是SQL标准中定义的.

postgres=# select every(id) from (values(null),(true),(false)) as t(id);    

 every    

-------    

 f    

(1 row)    

SQL标准中还定义了any和some为bool_or的别名, 但是因为any和some还可以被解释为子查询, 所以在PostgreSQL中any和some的布尔逻辑聚合不可用.

postgres=# select any(id) from (values(null),(true),(false)) as t(id);    

ERROR:  syntax error at or near "any"    

LINE 1: select any(id) from (values(null),(true),(false)) as t(id);    

               ^    

postgres=# select some(id) from (values(null),(true),(false)) as t(id);    

ERROR:  syntax error at or near "some"    

LINE 1: select some(id) from (values(null),(true),(false)) as t(id);    

               ^    

bool_or的例子

postgres=# select bool_or(id) from (values(null),(true),(false)) as t(id);    

 bool_or    

---------    

 t    

(1 row)    

计算非空的表达式个数, count带表达式时, 不计算null

postgres=# select count(id) from (values(null),(1),(2)) as t(id);    

 count    

-------    

     2   

(1 row)    

计算表达式(含空值)的个数, count(*)计算null, 注意count(*)是一个独立的聚合函数. 请和count(express)区分开来.

postgres=# select count(*) from (values(null),(1),(2)) as t(id);    

 count    

-------    

     3   

(1 row)    

postgres=# select count(*) from (values(null),(null),(1),(2)) as t(id);    

 count    

-------    

     4   

(1 row)    

聚合后得到json, 不带key的json聚合

postgres=# select json_agg(id) from (values(null),(true),(false)) as t(id);    

      json_agg           

---------------------    

 [null, true, false]    

(1 row)    

聚合后得到json, 带key的json聚合, 注意key不能为null, 否则报错.

postgres=# select json_object_agg(c1,c2) from (values('a',null),('b',true),('c',false)) as t(c1,c2);    

             json_object_agg                 

-----------------------------------------    

 { "a" : null, "b" : true, "c" : false }    

(1 row)    

postgres=# select json_object_agg(c1,c2) from (values(null,null),('b',true),('c',false)) as t(c1,c2);    

ERROR:  22023: field name must not be null    

LOCATION:  json_object_agg_transfn, json.c:1959    

 计算最大最小值, max, min都不计算null

postgres=# select max(id) from (values(null),(1),(2)) as t(id);    

 max     

-----    

   2    

(1 row)    

postgres=# select min(id) from (values(null),(1),(2)) as t(id);    

 min     

-----    

   1    

(1 row)    

聚合后得到字符串, 字符串聚合

postgres=# select string_agg(c1,'***') from (values('a',null),('b',true),('c',false)) as t(c1,c2);    

 string_agg    

------------    

 a***b***c   

(1 row)    

postgres=# select string_agg(id,'***') from (values(null),('digoal'),('zhou')) as t(id);    

  string_agg       

---------------    

 digoal***zhou    

(1 row)    

计算总和, sum不计算null, 当所有行都是null时, 即没有任何行输入, 返回null.

postgres=# select sum(id) from (values(null),(1),(2)) as t(id);    

 sum     

-----    

   3    

(1 row)    

postgres=# select sum(id::int) from (values(null),(null),(null)) as t(id);    

 sum     

-----    

        

(1 row)    

(1 row)    

聚合后得到xml

postgres=# select xmlagg(id::xml) from (values(null),('<foo>digoal</foo>'),('<bar/>')) as t(id);    

         xmlagg              

-------------------------    

 <foo>digoal</foo><bar/>    

(1 row)    

某些聚合函数得到的结果可能和行的输入顺序有关, 例如array_agg, json_agg, json_object_agg, string_agg, and xmlagg, 以及某些自定义聚合函数. 如何来实现呢?

支持聚合函数中使用order by的PostgreSQL版本可以用如下语法:

postgres=# select string_agg(id,'***' order by id) from (values(null),('digoal'),('zhou')) as t(id);    

  string_agg       

---------------    

 digoal***zhou    

(1 row)    

postgres=# select string_agg(id,'***' order by id desc) from (values(null),('digoal'),('zhou')) as t(id);    

  string_agg       

---------------    

 zhou***digoal    

(1 row)    

统计类的聚合函

Function

Argument Type

Return Type

Description

corr(Y, X)

double precision

double precision

correlation coefficient

covar_pop(Y, X)

double precision

double precision

population covariance

covar_samp(Y, X)

double precision

double precision

sample covariance

regr_avgx(Y, X)

double precision

double precision

average of the independent variable (sum(X)/N)

regr_avgy(Y, X)

double precision

double precision

average of the dependent variable (sum(Y)/N)

regr_count(Y, X)

double precision

bigint

number of input rows in which both expressions are nonnull

regr_intercept(Y, X)

double precision

double precision

y-intercept of the least-squares-fit linear equation determined by the (X, Y) pairs

regr_r2(Y, X)

double precision

double precision

square of the correlation coefficient

regr_slope(Y, X)

double precision

double precision

slope of the least-squares-fit linear equation determined by the (X, Y) pairs

regr_sxx(Y, X)

double precision

double precision

sum(X^2) - sum(X)^2/N ("sum of squares" of the independent variable)

regr_sxy(Y, X)

double precision

double precision

sum(X*Y) - sum(X) * sum(Y)/N ("sum of products" of independent times dependent variable)

regr_syy(Y, X)

double precision

double precision

sum(Y^2) - sum(Y)^2/N ("sum of squares" of the dependent variable)

stddev(expression)

smallint, int, bigint, real, double precision, or numeric

double precision for floating-point arguments, otherwise numeric

historical alias for stddev_samp

stddev_pop(expression)

smallint, int, bigint, real, double precision, or numeric

double precision for floating-point arguments, otherwise numeric

population standard deviation of the input values

stddev_samp(expression)

smallint, int, bigint, real, double precision, or numeric

double precision for floating-point arguments, otherwise numeric

sample standard deviation of the input values

variance(expression)

smallint, int, bigint, real, double precision, or numeric

double precision for floating-point arguments, otherwise numeric

historical alias for var_samp

var_pop(expression)

smallint, int, bigint, real, double precision, or numeric

double precision for floating-point arguments, otherwise numeric

population variance of the input values (square of the population standard deviation)

var_samp(expression)

smallint, int, bigint, real, double precision, or numeric

double precision for floating-point arguments, otherwise numeric

sample variance of the input values (square of the sample standard deviation)

相关性统计:

corr, regr_r2

总体|样本方差, 标准方差:

variance, var_pop, var_samp

stddev, stddev_pop, stddev_samp

总体协方差, 样本协方差:

covar_pop, covar_samp

线性回归:

regr_avgx, regr_avgy, regr_count, regr_intercept(截距), regr_r2(相关度corr的平方), regr_slope(斜率), regr_sxx, regr_sxy, regr_syy.

排序聚合

Function

Direct Argument Type(s)

Aggregated Argument Type(s)

Return Type

Description

mode() WITHIN GROUP (ORDER BYsort_expression)

-

any sortable type

same as sort expression

returns the most frequent input value (arbitrarily choosing the first one if there are multiple equally-frequent results)

percentile_cont(fraction) WITHIN GROUP (ORDER BY sort_expression)

double precision

double precisionor interval

same as sort expression

continuous percentile: returns a value corresponding to the specified fraction in the ordering, interpolating between adjacent input items if needed

percentile_cont(fractions) WITHIN GROUP (ORDER BY sort_expression)

double precision[]

double precisionor interval

array of sort expression's type

multiple continuous percentile: returns an array of results matching the shape of the fractionsparameter, with each non-null element replaced by the value corresponding to that percentile

percentile_disc(fraction) WITHIN GROUP (ORDER BY sort_expression)

double precision

any sortable type

same as sort expression

discrete percentile: returns the first input value whose position in the ordering equals or exceeds the specified fraction

percentile_disc(fractions) WITHIN GROUP (ORDER BY sort_expression)

double precision[]

any sortable type

array of sort expression's type

multiple discrete percentile: returns an array of results matching the shape of the fractionsparameter, with each non-null element replaced by the input value corresponding to that percentile

mode比较好理解, 就是取分组中出现频率最高的值或表达式, 如果最高频率的值有多个, 则随机取一个.

postgres=# create table test(id int, info text);    

CREATE TABLE    

postgres=# insert into test values (1,'test1');    

INSERT 0 1    

postgres=# insert into test values (1,'test1');    

INSERT 0 1    

postgres=# insert into test values (1,'test2');    

INSERT 0 1    

postgres=# insert into test values (1,'test3');    

INSERT 0 1    

postgres=# insert into test values (2,'test1');    

INSERT 0 1    

postgres=# insert into test values (2,'test1');    

INSERT 0 1    

postgres=# insert into test values (2,'test1');    

INSERT 0 1    

postgres=# insert into test values (3,'test4');    

INSERT 0 1    

postgres=# insert into test values (3,'test4');    

INSERT 0 1    

postgres=# insert into test values (3,'test4');    

INSERT 0 1    

postgres=# insert into test values (3,'test4');    

INSERT 0 1    

postgres=# insert into test values (3,'test4');    

INSERT 0 1    

postgres=# select * from test;    

 id | info     

----+-------    

  1 | test1   

  1 | test1   

  1 | test2   

  1 | test3   

  2 | test1   

  2 | test1   

  2 | test1   

  3 | test4   

  3 | test4   

  3 | test4   

  3 | test4   

  3 | test4   

(12 rows)    

取出所有数据中, 出现频率最高的info, 有可能是test1也有可能是test4, 因为他们的出现频率一致.

mode的返回结果数据类型和order by后面的表达式一致.

postgres=# select mode() within group (order by info) from test;    

 mode     

-------    

 test1   

(1 row)    

如果按INFO来分组的话, 取出出现频率最高的info, 实际上这个操作是没有任何意义的, 返回值就是所有记录的info的唯一值.

postgres=# select mode() within group (order by info) from test group by info;    

 mode     

-------    

 test1   

 test2   

 test3   

 test4   

(4 rows)    

按id来分组, 取出组内出现频率最高的info值, 这个是有意义的.

postgres=# select mode() within group (order by info) from test group by id;    

 mode     

-------    

 test1   

 test1   

 test4   

(3 rows)    

id=1 , 出现频率最高的info是test1. 出现2次.

如下:

postgres=# select id,info,count(*) from test group by id,info;    

 id | info | count     

----+-------+-------    

  1 | test1 |     2   

  1 | test3 |     1   

  3 | test4 |     5   

  1 | test2 |     1   

  2 | test1 |     3   

(5 rows)    

如果要返回mode()并返回频率次数. 可以使用row_number()窗口来实现. 如下.

postgres=# select id,info,cnt from (select id,info,cnt,row_number() over(partition by id order by cnt desc) as rn from (select id,info,count(*) cnt from test group by id,info) t) t where t.rn=1;    

 id | info | cnt     

----+-------+-----    

  1 | test1 |  2    

  2 | test1 |  3    

  3 | test4 |  5    

(3 rows)    

其他, mode的返回结果数据类型和order by后面的表达式一致.

postgres=# select mode() within group (order by id) from test;    

 mode    

------    

    3    

(1 row)    

postgres=# select mode() within group (order by id+1) from test;    

 mode    

------    

    4    

(1 row)    

另外还有4个函数是和数据分布有关的, 需要指定从0到1的分布位置. 返回排序后, 在指定分布位置的值或表达式的值.

src/backend/utils/adt/orderedsetaggs.c    

    

    

        if (percentile < 0 || percentile > 1 || isnan(percentile))    

               ereport(ERROR,    

                               (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),    

                                errmsg("percentile value %g is not between 0 and 1",    

                                               percentile)));    

同时还需要注意区分连续分布和离散分布.

postgres=# create table test(id int, info text);    

CREATE TABLE    

postgres=# insert into test values (1,'test1');    

INSERT 0 1    

postgres=# insert into test values (2,'test2');    

INSERT 0 1    

postgres=# insert into test values (3,'test2');    

INSERT 0 1    

postgres=# insert into test values (4,'test2');    

INSERT 0 1    

postgres=# insert into test values (5,'test2');    

INSERT 0 1    

postgres=# insert into test values (6,'test2');    

INSERT 0 1    

postgres=# insert into test values (7,'test2');    

INSERT 0 1    

postgres=# insert into test values (8,'test3');    

INSERT 0 1    

postgres=# insert into test values (100,'test3');   

INSERT 0 1    

postgres=# insert into test values (1000,'test4');   

INSERT 0 1    

postgres=# select * from test;    

  id  | info      

------+-------    

    1 | test1   

    2 | test2   

    3 | test2   

    4 | test2   

    5 | test2   

    6 | test2   

    7 | test2   

    8 | test3   

  100 | test3   

 1000 | test4   

(10 rows)    

取连续分布的中位数可以用percentile_cont(0.5)来获得.

postgres=# select percentile_cont(0.5) within group (order by id) from test;    

 percentile_cont     

-----------------    

             5.5    

(1 row)    

这个5.5是怎么计算来的呢? 参考本文末尾:

  If (CRN = FRN = RN) then the result is    

    (value of expression from row at RN)    

  Otherwise the result is    

    (CRN - RN) * (value of expression for row at FRN) +    

    (RN - FRN) * (value of expression for row at CRN)    

解释:

N = 当前分组的行数= 10

RN = (1+传入参数*(N-1)) = (1+0.5*(10-1)) = 5.5

CRN = ceiling(RN) = 6

FRN = floor(RN) = 5

value of expression for row at FRN : 当前分组内第FRN行的值= 5

value of expression for row at CRN : 当前分组内第CRN行的值= 6

所以最终中位数值:

    (CRN - RN) * (value of expression for row at FRN) +    

    (RN - FRN) * (value of expression for row at CRN) =     

(6-5.5)*(5) + (5.5 - 5)*(6) = 5.5;    

使用info分组:

postgres=# select percentile_cont(0.5) within group (order by id),info from test group by info;    

 percentile_cont | info      

-----------------+-------    

               1 | test1    

             4.5 | test2    

              54 | test3    

            1000 | test4    

(4 rows)    

验证这个值4.5 | test2 :

    2 | test2   

    3 | test2   

    4 | test2   

    5 | test2   

    6 | test2   

    7 | test2   

N = 当前分组的行数= 6

RN = (1+传入参数*(N-1)) = (1+0.5*(6-1)) = 3.5

CRN = ceiling(RN) = 4

FRN = floor(RN) = 3

value of expression for row at FRN : 当前分组内第FRN行的值= 4

value of expression for row at CRN : 当前分组内第CRN行的值= 5

所以最终中位数值:

    (CRN - RN) * (value of expression for row at FRN) +    

    (RN - FRN) * (value of expression for row at CRN) =     

(4-3.5)*(4) + (3.5 - 3)*(5) = 4.5;    

当输入参数为数组时, 返回值也是数组, 如下:

postgres=# select percentile_cont(array[0.5, 1]) within group (order by id) from test;    

 percentile_cont     

-----------------    

 {5.5,1000}   

(1 row)    

接下来看一下稀疏分布:

返回行号大于等于指定百分比的值或表达式值.

例如:

postgres=# select id from test;    

  id     

------    

    1    

    2    

    3    

    4    

    5    

    6    

    7    

    8    

  100    

 1000    

(10 rows)    

当前组一共10行, 取位置在0.5的.即行号>=0.5*10的第一行的值或表达式的值.

postgres=# select percentile_disc(0.5) within group (order by id) from test;    

 percentile_disc     

-----------------    

               5    

(1 row)    

postgres=# select percentile_disc(0.5) within group (order by id^2) from test;    

 percentile_disc     

-----------------    

              25    

(1 row)    

输入0.11, 表示行号返回>=1.1的第一行的值.

postgres=# select percentile_disc(0.11) within group (order by id) from test;    

 percentile_disc     

-----------------    

               2    

(1 row)    

再看个例子

postgres=# select id,info,count(*) over (partition by info) from test;    

  id  | info  | count     

------+-------+-------    

    1 | test1 |     1   

    2 | test2 |     6   

    3 | test2 |     6   

    4 | test2 |     6   

    5 | test2 |     6   

    6 | test2 |     6   

    7 | test2 |     6   

    8 | test3 |     2   

  100 | test3 |     2   

 1000 | test4 |     1   

(10 rows)    

取分组的数据, 主要看test2 这个组一共有6行, 0.3*6=1.8, 所以它需要取第二行的数据.

postgres=# select info,percentile_disc(0.3) within group (order by id) from test group by info;    

 info  | percentile_disc     

-------+-----------------    

 test1 |               1    

 test2 |              3    

 test3 |               8    

 test4 |            1000    

(4 rows)    

注意

最终计算的是表达式的分布数, 而不是计算列值的分布数后再计算表达式.

验证如下:

或者你可以看代码:

postgres=# select percentile_cont(0.5) within group (order by id^2),info from test group by info;    

 percentile_cont | info      

-----------------+-------    

               1 | test1    

            20.5 | test2    

            5032 | test3    

         1000000 | test4    

(4 rows)    

    

postgres=# select percentile_cont(0.5) within group (order by id),info from test group by info;    

 percentile_cont | info      

-----------------+-------    

               1 | test1    

             4.5 | test2    

              54 | test3    

            1000 | test4    

(4 rows)    

    

postgres=# select 4.5^2;    

      ?column?           

---------------------    

 20.2500000000000000    

(1 row)    

    

postgres=# select 54^2;    

 ?column?    

----------    

     2916   

(1 row)    

Hypothetical-Set聚合函数

函数

返回

描述

rank()

bigint

rank of the current row with gaps; same as row_number of its first peer

dense_rank()

bigint

rank of the current row without gaps; this function counts peer groups

percent_rank()

double precision

relative rank of the current row: (rank - 1) / (total rows - 1)

cume_dist()

double precision

relative rank of the current row: (number of rows preceding or peer with current row) / (total rows)

the four ranking functions are defined so that they give the same answer for any two peer rows.

rank 返回值在分组内的等级, 如果值有重复的话, 跳级处理.

dense_rank 返回值在分组内的等级, 如果值有重复的话, 不跳级处理.

percent_rank 返回(rank - 1) / (total rows - 1), rank指当前rank值, rows指当前组的记录数

cume_dist 返回(number of rows preceding or peer with current row) / (total rows), 即截至当前记录等级一共有多少行除以本组的总行数.

看一个例子比较明白.

postgres=# select *,row_number() over(partition by info order by id),rank() over(partition by info order by id),dense_rank() over(partition by info order by id),percent_rank() over(partition by info order by id),cume_dist() over(partition by info order by id) from test;    

  id  | info  | row_number | rank | dense_rank | percent_rank |     cume_dist         

------+-------+------------+------+------------+--------------+-------------------    

    1 | test1 |          1 |   1 |          1 |            0 |                 1    

    2 | test2 |          1 |   1 |          1 |            0 | 0.444444444444444    

    2 | test2 |          2 |   1 |          1 |           0 | 0.444444444444444    

    2 | test2 |          3 |   1 |          1 |            0 | 0.444444444444444    

    2 | test2 |          4 |   1 |          1 |            0 | 0.444444444444444    

    3 | test2 |          5 |   5 |          2 |         0.5 | 0.555555555555556    

    4 | test2 |          6 |   6 |          3 |        0.625 | 0.666666666666667    

    5 | test2 |          7 |   7 |          4 |         0.75 | 0.777777777777778    

    6 | test2 |          8 |   8 |          5 |       0.875 | 0.888888888888889    

    7 | test2 |          9 |   9 |          6 |            1 |                 1    

    8 | test3 |          1 |   1 |          1 |            0 |               0.5    

  100 | test3 |          2 |   2 |          2 |            1 |                 1    

 1000 | test4 |          1 |   1 |          1 |            0 |                 1    

(13 rows)    

算法:

以info='test2'这个组为例:

    2 | test2 |          1 |   1 |          1 |            0 | 0.444444444444444    

    2 | test2 |          2 |   1 |          1 |            0 | 0.444444444444444    

    2 | test2 |          3 |   1 |          1 |            0 | 0.444444444444444    

    2 | test2 |          4 |   1 |          1 |            0 | 0.444444444444444    

id=2 的rank和dense_rank都是1.

percent_rank 返回(rank - 1) / (total rows - 1), rank指当前rank值, rows指当前组的记录数

cume_dist 返回(number of rows preceding or peer with current row) / (total rows), 截至当前记录等级一共有多少行除以本组的总行数.

所以

percent_rank = (1-1)/(9-1)=0    

cume_dist = (4)/(9) = 0.444444444444444    

    3 | test2 |          5 |   5 |          2 |          0.5 | 0.555555555555556    

rank = 5, 跳级

dense_rank = 2, 不跳级

percent_rank = (5-1)/(9-1)=0.5    

cume_dist = (5)/(9) = 0.555555555555556    

这些窗口函数的另一种用法, 聚合用法.

Function

Direct Argument Type(s)

Aggregated Argument Type(s)

Return Type

Description

rank(args) WITHIN GROUP (ORDER BY sorted_args)

VARIADIC "any"

VARIADIC "any"

bigint

rank of the hypothetical row, with gaps for duplicate rows

dense_rank(args) WITHIN GROUP (ORDER BY sorted_args)

VARIADIC "any"

VARIADIC "any"

bigint

rank of the hypothetical row, without gaps

percent_rank(args) WITHIN GROUP (ORDER BYsorted_args)

VARIADIC "any"

VARIADIC "any"

double precision

relative rank of the hypothetical row, ranging from 0 to 1

cume_dist(args) WITHIN GROUP (ORDER BY sorted_args)

VARIADIC "any"

VARIADIC "any"

double precision

relative rank of the hypothetical row, ranging from 1/N to 1

这些用法比较奇特, 其实是要返回给定参数在集合中的位置.

例如:

1    

2    

3    

4    

5    

如果我们给一个参数值是2.2, 应该排在以上数据中的第三行.

例子:

postgres=# select * from test order by info,id;    

  id  | info      

------+-------    

    1 | test1   

    2 | test2   

    2 | test2   

    2 | test2   

    2 | test2   

    3 | test2   

    4 | test2   

    5 | test2   

    6 | test2   

    7 | test2   

    8 | test3   

  100 | test3   

 1000 | test4   

(13 rows)    

    

postgres=# select info,rank(4.9) within group (order by id),dense_rank(4.9) within group (order by id) from test group by info;   

 info  | rank | dense_rank     

-------+------+------------    

 test1 |   2 |          2    

 test2 |   7 |          4    

 test3 |   1 |          1    

 test4 |   1 |          1    

(4 rows)    

4.9在test1这个分组, 排名第2, 并且这个分组只有1个值, 所以没有gap.

重点关注test2这个组, 这个组有9个值, 其中有4个重复值2, 所以4.9在这里排名需要考虑gap.

rank 返回7, 即4.9在这里考虑GAP排名第7

dense_rank 返回4, 即4.9在这里不考虑GAP排名第4.

又如:

postgres=# select info,rank(5) within group (order by id),dense_rank(5) within group (order by id) from test group by info;   

 info  | rank | dense_rank     

-------+------+------------    

 test1 |   2 |          2    

 test2 |   7 |          4    

 test3 |   1 |          1   

 test4 |   1 |          1    

(4 rows)    

postgres=# select info,rank(5.1) within group (order by id),dense_rank(5.1) within group (order by id) from test group by info;   

 info  | rank | dense_rank     

-------+------+------------    

 test1 |   2 |          2    

 test2 |   8 |          5    

 test3 |   1 |          1    

 test4 |   1 |          1    

(4 rows)    

 

最后要看计算0~1代表位置的聚合函数percent_rank和cume_dist.

算法

percent_rank 返回(rank - 1) / (total rows - 1), rank指当前rank值, rows指当前组的记录数

cume_dist 返回(number of rows preceding or peer with current row) / (total rows), 截至当前记录等级一共有多少行除以本组的总行数.

例子1 :

postgres=# select info,rank(4.9) within group (order by id),dense_rank(4.9) within group (order by id),percent_rank(4.9) within group (order by id),cume_dist(4.9) within group (order by id) from test group by info;    

 info  | rank | dense_rank |   percent_rank    |    cume_dist         

-------+------+------------+-------------------+-------------------    

 test1 |   2 |          2 |                 1 |                 1    

 test2 |   7 |          4 | 0.666666666666667 |               0.7    

 test3 |   1 |          1 |                 0 | 0.333333333333333    

 test4 |   1 |          1 |                 0 |               0.5    

(4 rows)    

同样以test2为分组, 讲解算法. 把4.9插入到这个分组后. 数据应该变成:

    2 | test2   

    2 | test2   

    2 | test2   

    2 | test2   

    3 | test2   

    4 | test2   

    4.9 | test2 # 计算位置    

    5 | test2    

    6 | test2   

    7 | test2   

一共10行.

percent_rank 返回(rank - 1) / (total rows - 1), rank指当前rank值, rows指当前组的记录数

cume_dist 返回(number of rows preceding or peer with current row) / (total rows), 截至当前记录等级一共有多少行除以本组的总行数.

所以4.9对应的percent_rank 和cume_dist 分别为:

percent_rank = (rank - 1) / (total rows - 1) = (7-1)/(10-1) = 0.666666666666667     

cume_dist = (7)/10 = 0.7    

例子2 :

postgres=# select info,rank(5) within group (order by id),dense_rank(5) within group (order by id),percent_rank(5) within group (order by id),cume_dist(5) within group (order by id) from test group by info;   

 info  | rank | dense_rank |   percent_rank    |    cume_dist         

-------+------+------------+-------------------+-------------------    

 test1 |   2 |          2 |                 1 |                 1    

 test2 |   7 |          4 | 0.666666666666667 |               0.8    

 test3 |   1 |          1 |                 0 | 0.333333333333333    

 test4 |   1 |          1 |                 0 |               0.5    

(4 rows)    

插入计算值5后, 数据变成

    2 | test2   

    2 | test2   

    2 | test2   

    2 | test2   

    3 | test2   

    4 | test2   

    5 | test2 # 计算位置即参数值    

    5 | test2   

    6 | test2   

    7 | test2   

依旧10行. 但是截至当前记录等级一共有多少行? 注意是8了.

percent_rank = (rank - 1) / (total rows - 1) = (7-1)/(10-1) = 0.666666666666667     

cume_dist = (8)/10 = 0.8    

例子3 :

postgres=# select info,rank(5.1) within group (order by id),dense_rank(5.1) within group (order by id),percent_rank(5.1) within group (order by id),cume_dist(5.1) within group (order by id) from test group by info;    

 info  | rank | dense_rank |   percent_rank    |    cume_dist         

-------+------+------------+-------------------+-------------------    

 test1 |   2 |          2 |                 1 |                 1    

 test2 |   8 |          5 | 0.777777777777778 |               0.8    

 test3 |   1 |          1 |                 0 | 0.333333333333333    

 test4 |   1 |          1 |                 0 |               0.5    

(4 rows)    

插入计算值5.1后, 数据变成:

    2 | test2   

    2 | test2   

    2 | test2   

    2 | test2   

    3 | test2   

    4 | test2   

    5 | test2   

    5.1 | test2 # 计算位置即参数值    

    6 | test2   

    7 | test2   

例子4 :

postgres=# select info,rank(5) within group (order by id desc),dense_rank(5) within group (order by id desc),percent_rank(5) within group (order by id desc),cume_dist(5) within group (order by id desc) from test group by info;    

 info  | rank | dense_rank |   percent_rank    | cume_dist     

-------+------+------------+-------------------+-----------    

 test1 |   1 |          1 |                 0 |      0.5    

 test2 |   3 |          3 | 0.222222222222222 |       0.4    

 test3 |   3 |          3 |                 1 |         1   

 test4 |   2 |          2 |                 1 |         1   

(4 rows)    

插入计算值5后, 数据变成:

    7 | test2   

    6 | test2   

    5 | test2 # 注意这才是计算位置即插入位置.    

    5 | test2   

    4 | test2   

    3 | test2   

    2 | test2   

    2 | test2   

    2 | test2   

    2 | test2   

 

多维分析

=> SELECT * FROM items_sold;    
 brand | size | sales    
-------+------+-------    
 Foo   | L    |  10    
 Foo   | M    |  20    
 Bar   | M    |  15    
 Bar   | L    |  5    
(4 rows)    
    
=> SELECT brand, size, sum(sales) FROM items_sold GROUP BY GROUPING SETS ((brand), (size), ());    
 brand | size | sum    
-------+------+-----    
 Foo   |      |  30    
 Bar   |      |  20    
       | L    |  15    
       | M    |  35    
       |      |  50    
(5 rows)    

其中GROUP BY GROUPING SETS ((brand), (size), ());

相当于以下三个group by的union all(未在分组的列以NULL代替。):

group by brand    

group by size    

group by ()    

分组集合除了可以用GROUPING SETS来指定,另外还提供了两个特殊的写法rollup和cube.

ROLLUP ( e1, e2, e3, ... )    

代表递减分组,一般用于异构结构的分组如国家,省份,城市,乡镇这样的结构查询。

逐级分组汇总结果,它相当于如下写法:

 

 GROUPING SETS (    

    ( e1, e2, e3, ... ),    

    ...   

    ( e1, e2 )    

    ( e1 )   

    ( ) -- 注意包含全集    

)    

还有一种写法是CUBE

CUBE ( a, b, c )    

cube是任意组合,相当于:

GROUPING SETS (    

    ( a, b, c ),    

    ( a, b   ),    

    ( a,   c ),    

    ( a      ),    

    (   b, c ),    

    (   b    ),    

    (      c ),    

    (        )    -- 注意包含全集   

)    

在cube和rollup中使用括号可以将多个表达式作为单个表达式来处理:

ROLLUP ( a, (b,c), d )    

递减,相当于

GROUPING SETS (    

    ( a, b, c, d ),    

    ( a, b, c   ),    

    ( a         ),    

    (           )    

)    

CUBE ( (a,b), (c,d) )    

相当于:

GROUPING SETS (    

    ( a, b, c, d ),    

    ( a, b      ),    

    (      c, d ),    

    (           )    

)    

同时cube,rollup,grouping sets还可以混合使用:

GROUP BY a, CUBE(b,c), GROUPING SETS ((d), (e))   

相当于:

GROUP BY GROUPING SETS (    

  (a,b,c,d), (a,b,c,e),    

  (a,b,d),  (a,b,e),    

  (a,c,d),  (a,c,e),    

  (a,d),    (a,e)    

)    

既然分组聚合相当于多个group by 的union all,那么我们如何区分当前输出的记录是哪个分组group by的输出呢?

用grouping(cols)可以表示未参与聚合的表达式的比特位,并转换为INT输出。

例如:

=> SELECT * FROM items_sold;    

 make  | model | sales    

-------+-------+-------    

 Foo   | GT    | 10    

 Foo   | Tour  | 20    

 Bar   | City  | 15    

 Bar   | Sport |  5    

(4 rows)    

grouping()中必须包含group by后面的任意或所有列。

=> SELECT make, model, GROUPING(make,model), sum(sales) FROM items_sold GROUP BY ROLLUP(make,model);    

 make  | model | grouping | sum    

-------+-------+----------+-----    

 Foo   | GT    |        0 | 10    

 Foo   | Tour  |        0 | 20    

 Bar   | City  |        0 | 15    

 Bar   | Sport |        0 | 5    

 Foo  |       |        1 | 30    

 Bar  |       |        1 | 20    

       |      |        3 | 50    

(7 rows)    

grouping()中必须包含group by后面的任意或所有列,不能包含未参与聚合的列。

grouping()中的每个表达式用1个比特位表示。

 

postgres=# create table tbl(c1 int,c2 int,c3 int, c4 int, c5 int);    
CREATE TABLE    
postgres=# select c1,count(*),grouping(c1,c2,c3) from tbl group by cube(c1,c2,c3);    
 c1 | count | grouping     
----+-------+----------    
    |     0 |        7   -- b'111'::int = 7    
(1 row)    
    
postgres=# select c1,count(*),grouping(c1,c2) from tbl group by cube(c1,c2,c3);    
 c1 | count | grouping     
----+-------+----------    
    |     0 |        3  b'11'::int = 3    
(1 row)    
    
postgres=# select c1,count(*),grouping(c2) from tbl group by cube(c1,c2,c3);    
 c1 | count | grouping     
----+-------+----------    
    |     0 |        1  b'1'::int = 1    
(1 row)    
    
postgres=# insert into tbl values (1,2,3,4,5);    
INSERT 0 1    
postgres=# insert into tbl values (1,2,3,4,6);    
INSERT 0 1    
postgres=# insert into tbl values (2,3,4,5,6);    
INSERT 0 1    
postgres=# select c1,count(*),grouping(c2) from tbl group by cube(c1,c2,c3);    
 c1 | count | grouping     
----+-------+----------    
  1 |     2 |        0    
  1 |     2 |        0    
  1 |     2 |        1    
  2 |     1 |        0    
  2 |     1 |        0    
  2 |     1 |        1    
    |     3 |        1    
  1 |     2 |        1    
    |     2 |        1    
  2 |     1 |        1    
    |     1 |        1    
    |     2 |        0    
    |     2 |        0    
    |     1 |        0    
    |     1 |        0    
(15 rows)    

 grouping中包含未参与聚合的列将报错:

postgres=# select c1,count(*),grouping(c4) from tbl group by cube(c1,c2,c3);    
ERROR:  arguments to GROUPING must be grouping expressions of the associated query level    
LINE 1: select c1,count(*),grouping(c4) from tbl group by cube(c1,c2...    
                                    ^    
postgres=# select c1,count(*),grouping(c1,c2,c3,c4) from tbl group by cube(c1,c2,c3);    
ERROR:  arguments to GROUPING must be grouping expressions of the associated query level    
LINE 1: select c1,count(*),grouping(c1,c2,c3,c4) from tbl group by c...    
                                             ^    
postgres=# select c1,count(*),grouping(c1,c2,c3) from tbl group by cube(c1,c2,c3);    
 c1 | count | grouping     
----+-------+----------    
  1 |     2 |        0    
  1 |     2 |        1    
  1 |     2 |        3    
  2 |     1 |        0    
  2 |     1 |        1    
  2 |     1 |        3    
    |     3 |        7    
  1 |     2 |        2    
    |     2 |        6    
  2 |     1 |        2    
    |     1 |        6    
    |     2 |        4    
    |     2 |        5    
    |     1 |        4    
    |     1 |        5    
(15 rows)    

2. 查询

select子查询只能返回一列

postgres=# select (select * from (values (1,2),(2,3)) as t(c1,c2)) , relname, relkind from pg_class;    

ERROR:  subquery must return only one column    

LINE 1: select (select * from (values (1,2),(2,3)) as t(c1,c2)) , re...    

               ^    

select子查询只能返回一条记录

postgres=# select (select * from (values (1),(2)) as t(c1)) , relname, relkind from pg_class;    

ERROR:  more than one row returned by a subquery used as an expression    

子查询可以用在select子句也可以用在源中,还可以用在with, update from语句

 

 

postgres=# select (select * from (values (1),(2)) as t(c1) limit 1) , relname, relkind from pg_class;    
 c1 |                    relname                    | relkind     
----+-----------------------------------------------+---------    
  1 | pg_type                                       | r    
  1 | pg_toast_187550                               | t    
  1 | new_type                                      | c    
  1 | pg_toast_187550_index                         | i    
  1 | test                                          | r    
  1 | pg_toast_187556                               | t    
    
postgres=# select t.relname from (select * from pg_class limit 1) t , pg_class where t.relname=pg_class.relname;    
 relname     
---------    
 pg_type    
(1 row)    

3. union\union all\except\intersect

两条或者多条查询结果的合并、去重合并、相交、求差。

union all(不去重)

postgres=# select * from (values (1),(1)) t(id) union all select * from (values (2),(2)) t(id);  

 id   

----  

  1  

  1  

  2  

  2  

(4 rows)  

union(去重)

postgres=# select * from (values (1),(1)) t(id) union select * from (values (2),(2)) t(id);  

 id   

----  

  1  

  2  

(2 rows)  

except(去重)

postgres=# select * from (values (1),(1),(2)) t(id) except select * from (values (2),(2)) t(id);  

 id   

----  

  1  

(1 row)  

intersect(去重)

postgres=# select * from (values (1),(1),(2)) t(id) intersect select * from (values (2),(2)) t(id);  

 id   

----  

  2  

(1 row)  

  

4. 

当某个表或者子句的A字段要和B字段进行关联时,可以使用自关联。

 

postgres=# create table tbl8(c1 int, c2 int, info text);  
CREATE TABLE  
postgres=# insert into tbl8 select generate_series(1,100), generate_series(2,101),md5(random()::text) ;  
INSERT 0 100  
  
postgres=# select t1.* from tbl8 t1, tbl8 t2 where t1.c1=t2.c2 and t2.c2<10 limit 10;  
 c1 | c2 |               info                 
----+----+----------------------------------  
  2 |  3 | b54fbfd843c2343330ceaa1758882ee4  
  3 |  4 | df999680d87435f2d2d8d5604aca2f1e  
  4 |  5 | 497ad2cfa8acd1f7062bb1bd7aa7a646  
  5 |  6 | ef55bcdc85dfa8d54978e4c2085ec55a  
  6 |  7 | 45f60117d6d577389707b22823a513e5  
  7 |  8 | ebde16ccaeced9a400a1608d591a2bf0  
  8 |  9 | fdeb1505dd02aca33abed20531592302  
  9 | 10 | 010d07f3b9b05b2c36b46440b1dd92aa  
(8 rows)  

 5.内连接

内连接,仅仅输出符合连接条件的记录。

digoal=# create table tbl_join_1(id int primary key,info text);  
CREATE TABLE  
digoal=# create table tbl_join_2(id int primary key,info text);  
CREATE TABLE  
digoal=# create table tbl_join_3(id int primary key,info text);  
CREATE TABLE  
digoal=# create table tbl_join_4(id int primary key,info text);  
CREATE TABLE  
digoal=# create table tbl_join_5(id int primary key,info text);  
CREATE TABLE  
digoal=# create table tbl_join_6(id int primary key,info text);  
CREATE TABLE  
digoal=# create table tbl_join_7(id int primary key,info text);  
CREATE TABLE  
digoal=# create table tbl_join_8(id int primary key,info text);  
CREATE TABLE  
digoal=# create table tbl_join_9(id int primary key,info text);  
CREATE TABLE  
  
digoal=# insert into tbl_join_1 select generate_series(1,10),md5(random()::text);  
digoal=# insert into tbl_join_2 select generate_series(1,100),md5(random()::text);  
digoal=# insert into tbl_join_3 select generate_series(1,1000),md5(random()::text);  
digoal=# insert into tbl_join_4 select generate_series(1,10000),md5(random()::text);  
digoal=# insert into tbl_join_5 select generate_series(1,100000),md5(random()::text);  
digoal=# insert into tbl_join_6 select generate_series(1,1000000),md5(random()::text);  
digoal=# insert into tbl_join_7 select generate_series(1,2000000),md5(random()::text);  
digoal=# insert into tbl_join_8 select generate_series(1,3000000),md5(random()::text);  
digoal=# insert into tbl_join_9 select generate_series(1,4000000),md5(random()::text);  


非显示JOIN

postgres=# select t1.* from tbl_join_1 t1, tbl_join_2 t2 where t1.id=t2.id and t2.id=1;  
 id |               info                 
----+----------------------------------  
  1 | 5b6049ec7b94d17c0bd92dc4da436311  
(1 row)  

显示JOIN

postgres=# select t1.* from tbl_join_1 t1 join tbl_join_2 t2 on(t1.id=t2.id and t2.id=1);  
 id |               info                 
----+----------------------------------  
  1 | 5b6049ec7b94d17c0bd92dc4da436311  
(1 row)  


优化器如何强制连接顺序?

当join_collapse_limit=1时, 显示JOIN的关联按SQL写法进行关联. 例如 :

以下写法非显示关联,无法强制JOIN顺序

digoal=# set join_collapse_limit=1;  
  
digoal=# explain select t1.info, t5.info from tbl_join_1 t1,  
tbl_join_2 t2,  
tbl_join_3 t3,  
tbl_join_4 t4,  
tbl_join_5 t5,  
tbl_join_6 t6,  
tbl_join_7 t7,  
tbl_join_8 t8,  
tbl_join_9 t9  
where   
t1.id=t2.id and  
t2.id=t3.id and  
t3.id=t4.id and  
t4.id=t5.id and  
t5.id=t6.id and  
t6.id=t7.id and  
t7.id=t8.id and  
t8.id=t9.id and  
t9.id=10000;  
  
                                                              QUERY PLAN                                                                
--------------------------------------------------------------------------------------------------------------------------------------  
 Nested Loop  (cost=2.72..25.99 rows=1 width=65)  
   ->  Nested Loop  (cost=2.29..22.93 rows=1 width=69)  
         ->  Nested Loop  (cost=1.86..19.87 rows=1 width=69)  
               ->  Nested Loop  (cost=1.43..16.82 rows=1 width=69)  
                     ->  Nested Loop  (cost=1.00..13.76 rows=1 width=69)  
                           ->  Nested Loop  (cost=0.71..10.84 rows=1 width=36)  
                                 ->  Nested Loop  (cost=0.43..7.93 rows=1 width=36)  
                                       ->  Nested Loop  (cost=0.15..5.03 rows=1 width=36)  
                                             ->  Index Scan using tbl_join_1_pkey on tbl_join_1 t1  (cost=0.15..2.77 rows=1 width=36)  
                                                   Index Cond: (id = 10000)  
                                             ->  Seq Scan on tbl_join_2 t2  (cost=0.00..2.25 rows=1 width=4)  
                                                   Filter: (id = 10000)  
                                       ->  Index Only Scan using tbl_join_3_pkey on tbl_join_3 t3  (cost=0.28..2.89 rows=1 width=4)  
                                             Index Cond: (id = 10000)  
                                 ->  Index Only Scan using tbl_join_4_pkey on tbl_join_4 t4  (cost=0.29..2.90 rows=1 width=4)  
                                       Index Cond: (id = 10000)  
                           ->  Index Scan using tbl_join_5_pkey on tbl_join_5 t5  (cost=0.29..2.91 rows=1 width=37)  
                                 Index Cond: (id = 10000)  
                     ->  Index Only Scan using tbl_join_6_pkey on tbl_join_6 t6  (cost=0.42..3.04 rows=1 width=4)  
                           Index Cond: (id = 10000)  
               ->  Index Only Scan using tbl_join_7_pkey on tbl_join_7 t7  (cost=0.43..3.04 rows=1 width=4)  
                     Index Cond: (id = 10000)  
         ->  Index Only Scan using tbl_join_8_pkey on tbl_join_8 t8  (cost=0.43..3.05 rows=1 width=4)  
               Index Cond: (id = 10000)  
   ->  Index Only Scan using tbl_join_9_pkey on tbl_join_9 t9  (cost=0.43..3.05 rows=1 width=4)  
         Index Cond: (id = 10000)  
(26 rows)  
  
Time: 8.398 ms  

以下写法显示关联,可以强制JOIN顺序

digoal=# set join_collapse_limit=1;  
  
digoal=# explain select t1.info, t5.info from   
tbl_join_1 t1 join tbl_join_2 t2 on (t1.id=t2.id)  
join tbl_join_3 t3 on (t2.id=t3.id)  
join tbl_join_4 t4 on (t3.id=t4.id)  
join tbl_join_5 t5 on (t4.id=t5.id)  
join tbl_join_6 t6 on (t5.id=t6.id)  
join tbl_join_7 t7 on (t6.id=t7.id)  
join tbl_join_8 t8 on (t7.id=t8.id)  
join tbl_join_9 t9 on (t8.id=t9.id)  
where t9.id=10000;  
  
                                                              QUERY PLAN                                                                
--------------------------------------------------------------------------------------------------------------------------------------  
 Nested Loop  (cost=2.72..25.99 rows=1 width=65)  
   ->  Nested Loop  (cost=2.29..22.93 rows=1 width=69)  
         ->  Nested Loop  (cost=1.86..19.87 rows=1 width=69)  
               ->  Nested Loop  (cost=1.43..16.82 rows=1 width=69)  
                     ->  Nested Loop  (cost=1.00..13.76 rows=1 width=69)  
                           ->  Nested Loop  (cost=0.71..10.84 rows=1 width=36)  
                                 ->  Nested Loop  (cost=0.43..7.93 rows=1 width=36)  
                                       ->  Nested Loop  (cost=0.15..5.03 rows=1 width=36)  
                                             ->  Index Scan using tbl_join_1_pkey on tbl_join_1 t1  (cost=0.15..2.77 rows=1 width=36)  
                                                   Index Cond: (id = 10000)  
                                             ->  Seq Scan on tbl_join_2 t2  (cost=0.00..2.25 rows=1 width=4)  
                                                   Filter: (id = 10000)  
                                       ->  Index Only Scan using tbl_join_3_pkey on tbl_join_3 t3  (cost=0.28..2.89 rows=1 width=4)  
                                             Index Cond: (id = 10000)  
                                 ->  Index Only Scan using tbl_join_4_pkey on tbl_join_4 t4  (cost=0.29..2.90 rows=1 width=4)  
                                       Index Cond: (id = 10000)  
                           ->  Index Scan using tbl_join_5_pkey on tbl_join_5 t5  (cost=0.29..2.91 rows=1 width=37)  
                                 Index Cond: (id = 10000)  
                     ->  Index Only Scan using tbl_join_6_pkey on tbl_join_6 t6  (cost=0.42..3.04 rows=1 width=4)  
                           Index Cond: (id = 10000)  
               ->  Index Only Scan using tbl_join_7_pkey on tbl_join_7 t7  (cost=0.43..3.04 rows=1 width=4)  
                     Index Cond: (id = 10000)  
         ->  Index Only Scan using tbl_join_8_pkey on tbl_join_8 t8  (cost=0.43..3.05 rows=1 width=4)  
               Index Cond: (id = 10000)  
   ->  Index Only Scan using tbl_join_9_pkey on tbl_join_9 t9  (cost=0.43..3.05 rows=1 width=4)  
         Index Cond: (id = 10000)  
(26 rows)  
  
Time: 0.829 ms  

设置join_collapse_limit=1后, 按照SQL写法进行关联. 执行计划的时间也缩短了.

6. 外连接

左外连接

左边的表,即使没有匹配,也输出

postgres=# create table tab1(id int, info text, crt_time timestamp);  
CREATE TABLE  
Time: 26.056 ms  
postgres=# create table tab2(id int, info text, crt_time timestamp);  
CREATE TABLE  
Time: 36.215 ms  
postgres=# insert into tab1 values (1,'test',now());  
INSERT 0 1  
Time: 0.520 ms  
postgres=# insert into tab1 values (2,'test',now());  
INSERT 0 1  
Time: 0.297 ms  
postgres=# insert into tab2 values (2,'test',now());  
INSERT 0 1  
Time: 11.325 ms  
postgres=# insert into tab2 values (3,'test',now());  
INSERT 0 1  
Time: 0.352 ms  
postgres=# select * from tab1;  
 id | info |          crt_time            
----+------+----------------------------  
  1 | test | 2017-04-11 17:48:29.37083  
  2 | test | 2017-04-11 17:48:32.742795  
(2 rows)  
  
Time: 0.506 ms  
postgres=# select * from tab2;  
 id | info |          crt_time            
----+------+----------------------------  
  2 | test | 2017-04-11 17:48:39.722821  
  3 | test | 2017-04-11 17:48:41.901834  
(2 rows)  
  
Time: 0.335 ms  

右表没有被匹配时,输出NULL

postgres=# select * from tab1 left join tab2 on (tab1.id=tab2.id);  
 id | info |          crt_time          | id | info |          crt_time            
----+------+----------------------------+----+------+----------------------------  
  1 | test | 2017-04-11 17:48:29.37083  |    |      |   
  2 | test | 2017-04-11 17:48:32.742795 |  2 | test | 2017-04-11 17:48:39.722821  
(2 rows)  

过滤在A表但是不在B表的记录。

postgres=# select tab1.* from tab1 left join tab2 on (tab1.id=tab2.id) where tab2.* is null;  
 id | info |         crt_time            
----+------+---------------------------  
  1 | test | 2017-04-11 17:48:29.37083  
(1 row)  
  
postgres=# select * from tab1 where id in (select id from tab1 except select id from tab2);  
 id | info |         crt_time            
----+------+---------------------------  
  1 | test | 2017-04-11 17:48:29.37083  
(1 row)  

过外部where filter,可以过滤连接=false的记录

postgres=# select * from tab1 left join tab2 on (tab1.id=tab2.id) where tab1.id=1;  
 id | info |         crt_time          | id | info | crt_time   
----+------+---------------------------+----+------+----------  
  1 | test | 2017-04-11 17:48:29.37083 |    |      |   
(1 row)  
  
postgres=# explain select * from tab1 left join tab2 on (tab1.id=tab2.id) where tab1.id=1;  
                            QUERY PLAN                              
------------------------------------------------------------------  
 Nested Loop Left Join  (cost=0.00..48.81 rows=36 width=88)  
   Join Filter: (tab1.id = tab2.id)  
   ->  Seq Scan on tab1  (cost=0.00..24.12 rows=6 width=44)  
         Filter: (id = 1)  
   ->  Materialize  (cost=0.00..24.16 rows=6 width=44)  
         ->  Seq Scan on tab2  (cost=0.00..24.12 rows=6 width=44)  
               Filter: (id = 1)  
(7 rows)  


join内的条件,不会过滤未连接的记录

postgres=# select * from tab1 left join tab2 on (tab1.id=tab2.id and tab1.id=1);  

 id | info |          crt_time          | id | info | crt_time   

----+------+----------------------------+----+------+----------  

  1 | test | 2017-04-11 17:48:29.37083  |   |      |   

  2 | test | 2017-04-11 17:48:32.742795 |    |     |   

(2 rows)  

  

postgres=# explain select * from tab1 left join tab2 on (tab1.id=tab2.id and tab1.id=1);  

                             QUERY PLAN                                

---------------------------------------------------------------------  

 Hash Left Join (cost=35.42..298.96 rows=1130 width=88)  

   Hash Cond: (tab1.id = tab2.id)  

   Join Filter: (tab1.id = 1)  

   -> Seq Scan on tab1 (cost=0.00..21.30 rows=1130 width=44) 

   -> Hash  (cost=21.30..21.30 rows=1130 width=44)  

         -> Seq Scan on tab2 (cost=0.00..21.30 rows=1130 width=44) 

(6 rows)  

右外

左表没有被匹配时,输出NULL

postgres=# select * from tab1 right join tab2 on (tab1.id=tab2.id);  

 id | info |          crt_time          | id | info |          crt_time            

----+------+----------------------------+----+------+----------------------------  

  2 | test | 2017-04-11 17:48:32.742795 |  2 | test | 2017-04-11 17:48:39.722821  

    |     |                           |  3 | test | 2017-04-11 17:48:41.901834  

(2 rows)  

全外

没有被匹配时,输出NULL

postgres=# select * from tab1 full join tab2 on (tab1.id=tab2.id);  

 id | info |          crt_time          | id | info |          crt_time            

----+------+----------------------------+----+------+----------------------------  

  1 | test | 2017-04-11 17:48:29.37083  |   |      |   

  2 | test | 2017-04-11 17:48:32.742795 |  2 | test | 2017-04-11 17:48:39.722821  

    |     |                           |  3 | test | 2017-04-11 17:48:41.901834  

(3 rows)  

 

 7. 窗口查询

https://www.postgresql.org/docs/9.6/static/functions-window.html

常用窗口函数

Function

Return Type

Description

row_number()

bigint

number of the current row within its partition, counting from 1

rank()

bigint

rank of the current row with gaps; same as row_number of its first peer

dense_rank()

bigint

rank of the current row without gaps; this function counts peer groups

percent_rank()

double precision

relative rank of the current row: (rank - 1) / (total rows - 1)

cume_dist()

double precision

relative rank of the current row: (number of rows preceding or peer with current row) / (total rows)

ntile(num_buckets integer)

integer

integer ranging from 1 to the argument value, dividing the partition as equally as possible

lag(value anyelement [, offset integer [, default anyelement ]])

same type as value

returns value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead return default (which must be of the same type as value). Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null

lead(value anyelement [, offset integer [, default anyelement ]])

same type as value

returns value evaluated at the row that is offset rows after the current row within the partition; if there is no such row, instead return default (which must be of the same type as value). Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null

first_value(value any)

same type as value

returns value evaluated at the row that is the first row of the window frame

last_value(value any)

same type as value

returns value evaluated at the row that is the last row of the window frame

nth_value(value any, nth integer)

same type as value

returns value evaluated at the row that is the nth row of the window frame (counting from 1); null if no such row

其他函数也可用在窗口语法中

例子

CREATE TABLE window_test(id int, name text, subject text, score numeric);  

INSERT INTO window_test VALUES(1,'digoal','数学',99.5),  

(2,'digoal','语文',89.5),  

(3,'digoal','英语',79.5),  

(4,'digoal','物理',99.5),  

(5,'digoal','化学',98.5),  

(6,'刘德华','数学',89.5),  

(7,'刘德华','语文',99.5),  

(8,'刘德华','英语',79.5),  

(9,'刘德华','物理',89.5),  

(10,'刘德华','化学',69.5),  

(11,'张学友','数学',89.5),  

(12,'张学友','语文',91.5),  

(13,'张学友','英语',92.5),  

(14,'张学友','物理',93.5),  

(15,'张学友','化学',94.5);  

-- 取出每门课程的第一名.

SELECT id,name,subject,score FROM   

  (SELECT row_number() OVER (PARTITION BY subject ORDER BY score DESC) AS rn,* FROM window_test) AS t   

WHERE rn=1 ORDER BY SUBJECT;  

 id | name  | subject | score   

----+--------+---------+-------  

  5 | digoal | 化学    | 98.5  

  1 | digoal | 数学    | 99.5  

  4 | digoal | 物理    | 99.5  

 13 | 张学友英语    | 92.5  

  7 | 刘德华语文    | 99.5  

每个人每门成绩与第一名的差距,第一名,排名

postgres=# SELECT row_number() over(partition by subject order by score desc) as ord,

id,name,subject,score,max(score) over (partition by subject) as top1, 

max(score) over (partition by subject) - score as diff from window_test ;  

 ord | id | name  | subject | score | top1 | diff   

-----+----+--------+---------+-------+------+------  

   1 |  5 | digoal | 化学    | 98.5 | 98.5 |  0.0  

   2 | 15 | 张学友化学    | 94.5 | 98.5 |  4.0  

   3 | 10 | 刘德华化学    | 69.5 | 98.5 | 29.0  

   1 |  1 | digoal | 数学    | 99.5 | 99.5 |  0.0  

   2 | 11 | 张学友数学    | 89.5 | 99.5 | 10.0  

   3 |  6 | 刘德华数学    | 89.5 | 99.5 | 10.0  

   1 |  4 | digoal | 物理    | 99.5 | 99.5 |  0.0  

   2 | 14 | 张学友物理    | 93.5 | 99.5 |  6.0  

   3 |  9 | 刘德华物理    | 89.5 | 99.5 | 10.0  

   1 | 13 | 张学友英语    | 92.5 | 92.5 |  0.0  

   2 |  3 | digoal | 英语    | 79.5 | 92.5 | 13.0  

   3 |  8 | 刘德华英语    | 79.5 | 92.5 | 13.0  

   1 |  7 | 刘德华语文    | 99.5 | 99.5 |  0.0  

   2 | 12 | 张学友语文    | 91.5 | 99.5 |  8.0  

   3 |  2 | digoal | 语文    |  89.5 | 99.5 | 10.0  

(15 rows)  

 

 

上一篇:【学习资料】第13期快速入门PostgreSQL应用开发与管理 - 3 访问数据


下一篇:刷脸支付会不会刷到别人的钱包