Postgresql 表连接方法介绍(和Oracle对比测试)

作者:杨云龙,瀚高PG实验室核心成员,数据库高级工程师,擅长HGDB、Post-greSQL、Oracle等主流数据库。

表连接方式方法介绍(Nested Loop/Hash Join/Merge Join/Join) 数据库版本(oracle11.2.0.4 and PostgreSQL 13.1)

环境构造

-- oracle11.2.0.4
--城市、国家
drop table country purge;
CREATE TABLE country (
country_id int primary key,
country_name VARCHAR(50) NOT NULL
);

drop table city purge;
CREATE TABLE city (
city_id int primary key,
city_name VARCHAR(50) NOT NULL,
country_id int NOT NULL
);

begin
for i in 1 .. 10 loop
insert into country values(i,'country'||i);
end loop;
commit;
end;

begin
for i in 1 .. 10000 loop
insert into city values(i,'city'||i,ceil(i/1000));
end loop;
commit;
end;

execute dbms_stats.gather_table_stats(ownname =>'SCOTT',tabname => 'CITY' ,estimate_percent=> 100 ,cascade => true);
execute dbms_stats.gather_table_stats(ownname =>'SCOTT',tabname => 'COUNTRY' ,estimate_percent=> 100 ,cascade => true);

-- PostgreSQL 13.1
drop table country;
CREATE TABLE country (
country_id integer primary key,
country_name text NOT NULL
);

drop table city purge;
CREATE TABLE city (
city_id integer primary key,
city_name text NOT NULL,
country_id integer NOT NULL
);

insert into country values (generate_series(1,10),'country'||generate_series(1,10));

insert into city values(generate_series(1,10000),'city'||generate_series(1,10000),ceil(random()*(10-1)+1));

analyze city;
analyze country;

Nested Loop

Postgresql 表连接方法介绍(和Oracle对比测试)

如上图所示,为Nested Loop方式介绍。 其算法:驱动表返回一行数据,通过连接列传值给被驱动表,驱动表返回多少行,被驱动表就要被扫描多少次。 被驱动表索引方式索引唯一扫描或者范围扫描。在被驱动表数据子集较少的情况下,嵌套循环是比较好的选择,也就是适合结果集比较小的查询,通常超过10000行被认为大行,会变成低效。 举例如下:

Oracle 执行举例
SQL> explain plan for select city_name,country_name from city,country where city.country_id=country.country_id and city.city_id=99;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 2738185913

---------------------------------------------------------------------------------------------
| Id  | Operation		     | Name	    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT	     |		    |	  1 |	 93 |	  2   (0)| 00:00:01 |
|   1 |  NESTED LOOPS		     |		    |	  1 |	 93 |	  2   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| CITY	    |	  1 |	 53 |	  1   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN	     | SYS_C0018242 |	  1 |	    |	  1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| COUNTRY	    |	  1 |	 40 |	  1   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN	     | SYS_C0018239 |	  1 |	    |	  0   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("CITY"."CITY_ID"=99)
   5 - access("CITY"."COUNTRY_ID"="COUNTRY"."COUNTRY_ID")

18 rows selected.

语句对比

--下面方式 oracle会选择 hash  全表扫描
explain plan for select city_name,country_name from city,country where city.country_id=country.country_id and city.city_id>=10;

--强制使用nested loop 方式
explain plan for select /*+ leading(city) use_nl(country) */ city_name,country_name from city,country
 where city.country_id=country.country_id and city.city_id>=10;

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 103883790

---------------------------------------------------------------------------------------------
| Id  | Operation		     | Name	    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT	     |		    |  9991 |	907K| 10005   (1)| 00:02:01 |
|   1 |  NESTED LOOPS		     |		    |  9991 |	907K| 10005   (1)| 00:02:01 |
|   2 |   NESTED LOOPS		     |		    |  9991 |	907K| 10005   (1)| 00:02:01 |
|*  3 |    TABLE ACCESS FULL	     | CITY	    |  9991 |	517K|	 11   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN	     | SYS_C0018239 |	  1 |	    |	  0   (0)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID| COUNTRY	    |	  1 |	 40 |	  1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("CITY"."CITY_ID">=10)
   4 - access("CITY"."COUNTRY_ID"="COUNTRY"."COUNTRY_ID")

Note
-----
   - dynamic sampling used for this statement (level=2)

22 rows selected.
--由上所示,被驱动别执行了 9991次,也就是(10000-9),资源消耗明显上升
Postgresql 执行举例
mydb=# explain analyze select city_name,country_name from city,country where city.country_id=country.country_id and city.city_id=99;
                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.44..16.49 rows=1 width=40) (actual time=0.031..0.036 rows=1 loops=1)
   ->  Index Scan using city_pkey on city  (cost=0.29..8.30 rows=1 width=12) (actual time=0.015..0.017 rows=1 loops=1)
         Index Cond: (city_id = 99)
   ->  Index Scan using country_pkey on country  (cost=0.15..8.17 rows=1 width=36) (actual time=0.006..0.006 rows=1 loops=1)
         Index Cond: (country_id = city.country_id)
 Planning Time: 0.621 ms
 Execution Time: 0.098 ms
(7 rows)

--返回多行,强制使用nested loop,比默认选择hash join 多30ms
mydb=# set enable_hashjoin=off;
SET
mydb=# set enable_mergejoin=off;
SET
mydb=# explain analyze select city_name,country_name from city,country where city.country_id=country.country_id and city.city_id>=10;
                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.15..1934.29 rows=9991 width=40) (actual time=0.050..43.354 rows=9991 loops=1)
   ->  Seq Scan on city  (cost=0.00..188.00 rows=9991 width=12) (actual time=0.025..4.870 rows=9991 loops=1)
         Filter: (city_id >= 10)
         Rows Removed by Filter: 9
   ->  Index Scan using country_pkey on country  (cost=0.15..0.17 rows=1 width=36) (actual time=0.002..0.002 rows=1 loops=9991)
         Index Cond: (country_id = city.country_id)
 Planning Time: 0.393 ms
 Execution Time: 45.366 ms
(8 rows)

PG Nested loop说明

Postgresql使用嵌套循环执行查询,那么它可以通过迭代表1中的所有条目、迭代表2中的所有条目,然后在表1和表2中的行对满足筛选条件时发出一行。嵌套循环是Postgresql唯一可以用来处理任何连接的连接算法。例如下面语句,Oracle会选择Merge Join,Postgresql则选择 Nested Loop,详细可参考 Merge Join部分。

--即使数据量较大情况,以下语句pg还会选择Nested Loop
explain  select a.city_name,b.city_id from city a,city2 b where a.country_id<b.country_id;

Hash Join

两表关联时主要依靠哈希运算来得到结果集的表连接方式,只支持等值。 算法:两表等值连接,返回大量数据,较少的表选为驱动表,将驱动表相关列读入PGA中的work area(PG则放入内存work_mem),然后对驱动表的连接列进行hash运算生成hash table,然后读取被驱动表并对关联列进行hash运算,然后到pga探测hash table,找到数据关联上。如果HASH表太大,无法一次构造在内存中,则分成若干个partition,写入磁盘的temporary segment,则会多一个写的代价,会降低效率

Postgresql 表连接方法介绍(和Oracle对比测试)

上图为Hash大概流程,如果统计信息等都准确,数据库会自动选择最好执行计划。当使用ORDERED提示时,FROM子句中的第一张表将用于建立哈希表。

Oracle举例
-- country 作为驱动表,不管使用那种join(left/right/full)方式,Oracle都会选择小表为驱动表。
SQL> explain plan for select city_name,country_name from city,country where city.country_id=country.country_id;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 114462077

------------------------------------------------------------------------------
| Id  | Operation	   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |	     | 10000 |	 781K|	  14   (0)| 00:00:01 |
|*  1 |  HASH JOIN	   |	     | 10000 |	 781K|	  14   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| COUNTRY |	  10 |	 400 |	   3   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| CITY    | 10000 |	 390K|	  11   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("CITY"."COUNTRY_ID"="COUNTRY"."COUNTRY_ID")

Note
-----
   - dynamic sampling used for this statement (level=2)

19 rows selected.

Postgresql举例
mydb=# explain analyze select city_name,country_name from city,country where city.country_id=country.country_id;
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=38.58..227.91 rows=10000 width=40) (actual time=0.185..12.086 rows=10000 loops=1)
   Hash Cond: (city.country_id = country.country_id)
   ->  Seq Scan on city  (cost=0.00..163.00 rows=10000 width=12) (actual time=0.026..3.735 rows=10000 loops=1)
   ->  Hash  (cost=22.70..22.70 rows=1270 width=36) (actual time=0.064..0.067 rows=10 loops=1)
         Buckets: 2048  Batches: 1  Memory Usage: 17kB
         ->  Seq Scan on country  (cost=0.00..22.70 rows=1270 width=36) (actual time=0.025..0.031 rows=10 loops=1)
 Planning Time: 1.953 ms
 Execution Time: 13.983 ms
(8 rows)

为了可以使用小表建立hash table,优化器会转换,如 right join 被改为Left Join

mydb=# explain analyze select * from country right join city on city.country_id=country.country_id;
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=38.58..227.91 rows=10000 width=52) (actual time=0.087..11.883 rows=10000 loops=1)
   Hash Cond: (city.country_id = country.country_id)
   ->  Seq Scan on city  (cost=0.00..163.00 rows=10000 width=16) (actual time=0.020..3.035 rows=10000 loops=1)
   ->  Hash  (cost=22.70..22.70 rows=1270 width=36) (actual time=0.044..0.047 rows=10 loops=1)
         Buckets: 2048  Batches: 1  Memory Usage: 17kB
         ->  Seq Scan on country  (cost=0.00..22.70 rows=1270 width=36) (actual time=0.024..0.028 rows=10 loops=1)
 Planning Time: 0.413 ms
 Execution Time: 13.588 ms
(8 rows)

Merge Join

两张表在做连接时用排序操作和合并操作来得到结果集的连接方式。排序主要处理非等值管理。算法:先对两张表根据连接列各自进行排序,嵌套循环是从被驱动表的索引中匹配数据,排序合并连接是从内存(PGA中的work area)中匹配数据,严格来说没有驱动表,可以认为较少表作为驱动表。HASH JOIN只需要把驱动表放入PGA中,但是排序连接合并需要将两表结果集均放入PGA中

Postgresql 表连接方法介绍(和Oracle对比测试)

上图为Merge大概流程,如果统计信息等都准确,数据库会自动选择最好执行计划。

Oracle举例
col PLAN_TABLE_OUTPUT for a100
set lines 200 pages 999
explain plan for select a.city_name,b.country_name from city a,country b where a.country_id<b.country_id;

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1026867539

------------------------------------------------------------------------------------------------------
| Id  | Operation		      | Name	     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT	      | 	     | 45000 |	1054K|	     |	  62   (2)| 00:00:01 |
|   1 |  MERGE JOIN		      | 	     | 45000 |	1054K|	     |	  62   (2)| 00:00:01 |
|   2 |   SORT JOIN		      | 	     |	  10 |	 120 |	     |	   2   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| COUNTRY      |	  10 |	 120 |	     |	   2   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN	      | SYS_C0018239 |	  10 |	     |	     |	   1   (0)| 00:00:01 |
|*  5 |   SORT JOIN		      | 	     | 10000 |	 117K|	 408K|	  60   (2)| 00:00:01 |
|   6 |    TABLE ACCESS FULL	      | CITY	     | 10000 |	 117K|	     |	  11   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access(INTERNAL_FUNCTION("A"."COUNTRY_ID")<INTERNAL_FUNCTION("B"."COUNTRY_ID"))
       filter(INTERNAL_FUNCTION("A"."COUNTRY_ID")<INTERNAL_FUNCTION("B"."COUNTRY_ID"))

19 rows selected.
Postgresql举例

merge join需要两个JOIN的表的KEY都是先排好顺序的,有索引的话没有排序过程,merge join的好处是两个表都只扫描一次. 而nested loop的话其中一个表扫描一次, 另一个表则循环多次.

Postgresql中,Merge Join只适用于自然连接和等值连接,如下例所示,即使将enable_nestloop关闭,pg在此种连接方式时都会转变成Nested Loop,且初始成本估算很大。由此看出,Postgresql与Oracle 机制有区别。

mydb=# analyze city;
ANALYZE
mydb=# analyze country;
ANALYZE
mydb=# set enable_nestloop=off;
SET
mydb=#  explain analyze select a.city_name,b.country_name from city a,country b where a.country_id<b.country_id;
                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=10000000000.00..10000016637.12 rows=333333 width=18) (actual time=0.104..419.608 rows=400687 loops=1)
   Join Filter: (a.country_id < b.country_id)
   Rows Removed by Join Filter: 599313
   ->  Seq Scan on city a  (cost=0.00..1636.00 rows=100000 width=13) (actual time=0.066..20.058 rows=100000 loops=1)
   ->  Materialize  (cost=0.00..1.15 rows=10 width=13) (actual time=0.000..0.001 rows=10 loops=100000)
         ->  Seq Scan on country b  (cost=0.00..1.10 rows=10 width=13) (actual time=0.014..0.019 rows=10 loops=1)
 Planning Time: 0.649 ms
 Execution Time: 457.499 ms
(8 rows)

hash join 只有当在内存中执行是最快连接方式,当pg认为hash连接所需超出内存时,会选择Merge join,Merge Join可以更有效使用磁盘。Postgresql中,Merge Join仅适用于条件为相等的连接。其工作原理,比较每个输出的第一行(最小值),如果相等,结果中添加一行,并从input删除已比较的两行;如果字段不相同,删除id最小的,并继续与下一个最小的对比。

Postgres可以非常有效地在磁盘上执行合并,方法是将数据的一小部分保存在内存中,对该数据执行合并,将合并结果写入磁盘,然后从磁盘将更多的输入读取到内存中。这种方法一次在磁盘上读写大量的数据,比Hash 随机读高效。

mydb=# explain  select a.city_name,b.city_id from city a,city2 b where a.country_id=b.country_id and a.city_id<30000;
                                      QUERY PLAN                                      
--------------------------------------------------------------------------------------
 Hash Join  (cost=3274.21..3764393.24 rows=333242482 width=13)
   Hash Cond: (a.country_id = b.country_id)
   ->  Index Scan using city_pkey on city a  (cost=0.29..1052.87 rows=30033 width=13)
         Index Cond: (city_id < 30000)
   ->  Hash  (cost=1634.63..1634.63 rows=99863 width=8)
         ->  Seq Scan on city2 b  (cost=0.00..1634.63 rows=99863 width=8)
(6 rows)

mydb=# set enable_hashjoin=off;
SET
mydb=# explain  select a.city_name,b.city_id from city a,city2 b where a.country_id=b.country_id and a.city_id<30000;
                                         QUERY PLAN                                         
--------------------------------------------------------------------------------------------
 Merge Join  (cost=13213.55..5012000.94 rows=333242482 width=13)
   Merge Cond: (a.country_id = b.country_id)
   ->  Sort  (cost=3286.46..3361.55 rows=30033 width=13)
         Sort Key: a.country_id
         ->  Index Scan using city_pkey on city a  (cost=0.29..1052.87 rows=30033 width=13)
               Index Cond: (city_id < 30000)
   ->  Sort  (cost=9927.09..10176.74 rows=99863 width=8)
         Sort Key: b.country_id
         ->  Seq Scan on city2 b  (cost=0.00..1634.63 rows=99863 width=8)
(9 rows)

Merge Join方法主要是内存不够用时,效果更好

当数据量特别大时,会选择Merge Join,如果没有索引,会选择"Materialize",将结果缓存内存或文件中,避免多次全表扫描。如下所示:

mydb=# create  unlogged table tbl1(id int, info text) with (autovacuum_enabled=off);
CREATE TABLE
mydb=#  create  unlogged table tbl2(id int, info text) with (autovacuum_enabled=off);
CREATE TABLE
mydb=# insert into tbl1 select generate_series(1,10000000);  
INSERT 0 10000000
mydb=# insert into tbl2 select generate_series(1,10000000);  
INSERT 0 10000000
mydb=#  explain select * from tbl1,tbl2 where tbl1.id=tbl2.id;  
                                    QUERY PLAN                                    
----------------------------------------------------------------------------------
 Merge Join  (cost=2075531.84..2370522825.11 rows=157893676470 width=72)
   Merge Cond: (tbl1.id = tbl2.id)
   ->  Sort  (cost=1037765.92..1051814.66 rows=5619496 width=36)
         Sort Key: tbl1.id
         ->  Seq Scan on tbl1  (cost=0.00..100442.96 rows=5619496 width=36)
   ->  Materialize  (cost=1037765.92..1065863.40 rows=5619496 width=36)
         ->  Sort  (cost=1037765.92..1051814.66 rows=5619496 width=36)
               Sort Key: tbl2.id
               ->  Seq Scan on tbl2  (cost=0.00..100442.96 rows=5619496 width=36)
(9 rows)

mydb=#  explain select * from tbl1,tbl2 where tbl1.id<tbl2.id;  
                                 QUERY PLAN                                 
----------------------------------------------------------------------------
 Nested Loop  (cost=0.00..720393977232.90 rows=10526245098005 width=72)
   Join Filter: (tbl1.id < tbl2.id)
   ->  Seq Scan on tbl1  (cost=0.00..100442.96 rows=5619496 width=36)
   ->  Materialize  (cost=0.00..172443.44 rows=5619496 width=36)
         ->  Seq Scan on tbl2  (cost=0.00..100442.96 rows=5619496 width=36)
(5 rows)

mydb=# set enable_mergejoin=off;
SET
mydb=#  explain select * from tbl1,tbl2 where tbl1.id=tbl2.id;  
                                 QUERY PLAN                                 
----------------------------------------------------------------------------
 Hash Join  (cost=214589.66..5526742276.56 rows=157893676470 width=72)
   Hash Cond: (tbl1.id = tbl2.id)
   ->  Seq Scan on tbl1  (cost=0.00..100442.96 rows=5619496 width=36)
   ->  Hash  (cost=100442.96..100442.96 rows=5619496 width=36)
         ->  Seq Scan on tbl2  (cost=0.00..100442.96 rows=5619496 width=36)
(5 rows)

mydb=#  explain select * from tbl1,tbl2 where tbl1.id=tbl2.id and tbl1.id<2000;  
                                    QUERY PLAN                                    
----------------------------------------------------------------------------------
 Merge Join  (cost=1449858.75..790941515.17 rows=52631216124 width=72)
   Merge Cond: (tbl1.id = tbl2.id)
   ->  Sort  (cost=412092.83..416775.74 rows=1873165 width=36)
         Sort Key: tbl1.id
         ->  Seq Scan on tbl1  (cost=0.00..114491.70 rows=1873165 width=36)
               Filter: (id < 2000)
   ->  Materialize  (cost=1037765.92..1065863.40 rows=5619496 width=36)
         ->  Sort  (cost=1037765.92..1051814.66 rows=5619496 width=36)
               Sort Key: tbl2.id
               ->  Seq Scan on tbl2  (cost=0.00..100442.96 rows=5619496 width=36)
(10 rows)

三种连接方法的优劣

还有一种连接叫笛卡尔积(Cross Join/Cartesian Product),两表关联没有任何条件。结果集为两张表行数的乘积,此种方式基本不会用到,这次不做讨论。

类别 Nested Loop Hash Join Merge Join
使用条件 任何条件 等值连接 非等值,’<>'除外
使用资源 CPU, disk IO 内存、临时空间 内存、临时空间
特点 高选择索引或进行限制性搜索时效率高,能快速返回第一次搜索结果 表记录多,缺乏有效索引, 效率高 非等值连接,效率高
缺点 数据量大效率低,尤其无索引或限制条件不够时 需要大量内存,可能会用磁盘 表都需要排序,结果最后返回,PG批量依次内存排序

Join连接类型

Postgresql 表连接方法介绍(和Oracle对比测试)

inner Join
--结果只包含满足连接条件的记录
select a.id,a.col1,b.col2 from a,b where a.id=b.id;
--or
select a.id,a.col1,b.col2 from a inner join b on a.id=b.id;
--or
select id,a.col1,b.col2 from a inner join b using(id);
Full outer Join
SELECT * FROM TableA FULL OUTER JOIN TableB ON TableA.name = TableB.name;
	ID NAME 	  ID NAME
---------- ------ ---------- ------
	 1 aaa		   1 aaa
			       2 bbb
	 3 ccc

--如果没有匹配记录,null作为值,可以去掉相同结果集
SELECT * FROM TableA FULL OUTER JOIN TableB ON TableA.name = TableB.name 
where tablea.id is null or tableb.id is null;
	ID      NAME  ID       NAME
---------- ------ ------ ------
			        2     bbb
	 3       ccc
left/right join
--如上图所示,产生A表所有结果,B没有的用null作为值,除相同值外,right与left正好相反
SELECT * FROM TableA left JOIN TableB ON TableA.name = TableB.name;
	ID NAME 	  ID NAME
---------- ------ ---------- ------
	 1 aaa		   1 aaa
	 3 ccc
--可以去掉相同结果集
SELECT * FROM TableA left JOIN TableB ON TableA.name = TableB.name 
where  tableb.id is null;
	ID NAME 	  ID NAME
---------- ------ ---------- ------
	 3 ccc

union 介绍

UNION 操作符用于合并两个或多个 SELECT 语句的结果集。

请注意,UNION 内部的 SELECT 语句必须拥有相同数量的列。列也必须拥有相似的数据类型。同时,每条 SELECT 语句中的列的顺序必须相同。UNION 只选取记录,有去重功能,而UNION ALL会列出所有记录。

SELECT name FROM TableA UNION SELECT name FROM TableB;
NAME
------------------------------
aaa
bbb

SELECT name FROM TableA UNION all SELECT name FROM TableB;

NAME
------------------------------
aaa
aaa
bbb

有些情况下可以使用union 代替or ,提供查询效率

select * from t1 where owner='SCOTT' or object_id in (select object_id from t2);
--上述语句会使用FILTER,可以调整sql,或者t2.object_id创建索引
select * from t1 where owner='SCOTT' 
union 
select * from t1 where object_id in (select object_id from t2);

--列转行举例
select employee,child_1 as child from personnel
union all
select employee,child_2 as child from personnel;

自连接

分页查询
--四层分页更适合页数比较大的分页查询,返回结果集小的当做驱动表,扫描被驱动表。
with tmp as
(SELECT * FROM
    ( SELECT rid, ROWNUM as RN
       FROM
           (SELECT rowid as rid
             FROM t1
             where owner='SYS'
             order by object_id desc
            ) WHERE ROWNUM <= 500
   )  WHERE RN > 490
) 
select  /*+ use_nl(a) leading(b) */ owner,object_name,object_id,rn
from t1 a,tmp b
where a.rowid=b.rid;
--三层select是常用的
select * from (select owner,object_name,object_id,rownum as rn
 from(select owner,object_name,object_id from t1 where owner='SYS' order by object_id desc) 
 where rownum<=500) 
 where rn>490;
删除重复行
--极致函数
delete from products p1
where rowid < (
   select max(p2.rowid) from products p2
   where p1.name = p2.name
   and p1.price =p2.price
);
--使用非等值连接
delete from products p1
where exists (
   select * from products p2
   where p1.name = p2.name
   and p1.price = p2.price
   and p1.rowid < p2.rowid
);
关联子查询比较行与行 eg
--求与上一年营业额一样的年份
select year,sale from sales s1
where sale = (
   select sale from sales s2 
   where s2.year = s1.year-1
) order by year;

--or
select s1.year,s1.sale from sales s1,sales s2
where s2.sale =s1.sale
and s2.year = s1.year-1
order by year;

--求同期对比情况
select s1.year,s1.sale,
case when s1.sale = s2.sale then '持平'
     when s1.sale > s2.sale then '增长'
     when s1.sale < s2.sale then '减少'
else '-' end as var
from sales s1,sales s2
where s2.year = s1.year -1
order by year;     

参考文档

  • Join introduce: https://www.cnblogs.com/xqzt/p/4469673.html
  • Join introduce: https://logicalread.com/
  • pg join method: http://www.jasongj.com/2015/03/07/Join1/
  • pg join method: https://developer.aliyun.com/article/70992
  • pg join method: https://www.interdb.jp/pg/pgsql03.html
  • pg Merge Joins: https://malisper.me/postgres-merge-joins/
  • 表连接方法参考 :书籍 《SQL 优化核心思想》
  • SQL连接示例参考:书籍 《SQL进阶教程》
上一篇:1002 写出这个数 (20 分)


下一篇:电脑数据恢复软件推荐10款