由于业务需要,需实现如下功能的sql语句:查询出一个表T中每个不同id值所对应的最大val值。
---构造环境。
SQL> insert into t values (1,2,1);
已创建 1 行。
SQL> insert into t values (1,2,3);
已创建 1 行。
SQL> insert into t values (1,3,4);
已创建 1 行。
SQL> insert into t values (1,3,5);
已创建 1 行。
SQL> insert into t values (2,3,5);
已创建 1 行。
SQL> insert into t values (2,3,4);
已创建 1 行。
SQL> insert into t values (2,3,6);
已创建 1 行
SQL> insert into t values (2,3,56);
已创建 1 行。
SQL> commit;
提交完成。
SQL> select * from t;
ID STATE VAL
---------- ---------- ----------
1 2 1
1 2 3
1 3 4
1 3 5
2 3 5
2 3 4
2 3 6
2 3 56
已选择8行。
-----------------------------------------
SQL> select id,state ,val
2 from (select id ,state,val, row_number()
3 over (partition by id order by val desc ) rn from t )
4 where rn=1;
ID STATE VAL
---------- ---------- ----------
1 3 5
2 3 56 ---结果符合要求
-----查看使用分析函数的执行计划。
SQL> set autot on
SQL> select id,state ,val
2 from (select id ,state,val, row_number()
3 over (partition by id order by val desc ) rn from t )
4 where rn=1;
ID STATE VAL
---------- ---------- ----------
1 3 5
2 3 56
----查询出 id 为1对应的最大值 5 ,id 为2 时对应的最大值56
----使用分析函数的执行计划
执行计划
----------------------------------------------------------
Plan hash value: 3047187157
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8 | 368 | 4 (25)| 00:00:01 |
|* 1 | VIEW | | 8 | 368 | 4 (25)| 00:00:01 |
|* 2 | WINDOW SORT PUSHED RANK| | 8 | 264 | 4 (25)| 00:00:01 |
| 3 | TABLE ACCESS FULL | T | 8 | 264 | 3 (0)| 00:00:01 |
----注意这里使用了 WINDOW SORT PUSHED RANK,只进行了一次FTS
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RN"=1)
2 - filter(ROW_NUMBER() OVER ( PARTITION BY "ID" ORDER BY
INTERNAL_FUNCTION("VAL") DESC )<=1)
Note
-----
- dynamic sampling used for this statement
统计信息
----------------------------------------------------------
76 recursive calls
0 db block gets
21 consistent gets
0 physical reads
.......
3 sorts (memory)
0 sorts (disk)
2 rows processed
SQL> set linesize 999
SQL> /
ID STATE VAL
---------- ---------- ----------
1 3 5
2 3 56
执行计划
----------------------------------------------------------
Plan hash value: 3047187157
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8 | 368 | 4 (25)| 00:00:01 |
|* 1 | VIEW | | 8 | 368 | 4 (25)| 00:00:01 |
|* 2 | WINDOW SORT PUSHED RANK| | 8 | 264 | 4 (25)| 00:00:01 |
| 3 | TABLE ACCESS FULL | T | 8 | 264 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RN"=1)
2 - filter(ROW_NUMBER() OVER ( PARTITION BY "ID" ORDER BY
INTERNAL_FUNCTION("VAL") DESC )<=1)
Note
----
- dynamic sampling used for this statement
统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
7 consistent gets -------逻辑读为 7
0 physical reads
0 redo size
568 bytes sent via SQL*Net to client
416 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
2 rows processed
---没有使用分析函数的执行计划,可以看出此计划执行了两次全表扫描(见下面的例子),而使用了分析函数的情况下,只是扫描了一次!
---显然当数据量巨大时,两种方式的资源消耗差别是巨大的。
SQL> select id,state ,val from
2 t a where a.val in
3 (select max(b.val) from t b
4 where b.id =a.id group by id ); ---在业务处理时应该慎用关联子查询!
ID STATE VAL
---------- ---------- ----------
1 3 5
2 3 56
执行计划
----------------------------------------------------------
Plan hash value: 2845706984
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 33 | 6 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL | T | 8 | 264 | 3 (0)| 00:00:01 |
|* 3 | FILTER | | | | | |
| 4 | SORT GROUP BY NOSORT| | 1 | 26 | 3 (0)| 00:00:01|
|* 5 | TABLE ACCESS FULL | T | 1 | 26 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT 0 FROM "T" "B" WHERE "B"."ID"=:B1 GROUP
BY "ID" HAVING MAX("B"."VAL")=:B2))
3 - filter(MAX("B"."VAL")=:B1)
5 - filter("B"."ID"=:B1)
Note
-----
- dynamic sampling used for this statement
统计信息
----------------------------------------------------------
296 recursive calls
0 db block gets
203 consistent gets ---逻辑读为 203 是使用分析函数的30倍!
0 physical reads
11 sorts (memory)
0 sorts (disk)
2 rows processed
总结:相关子查询对于外部查询的每一个值都会有一个结果与其对应,其计算的过程是这样的:
1.扫描外查询的第一条记录
2.扫描子查询,并将第一条记录的对应值传给子查询,由此计算出子查询的结果
3.根据子查询的结果,返回外查询的结果。
4.重复上述动作,开始扫描外查询的第二条记录,第三条记录,直至全部扫描完毕。
所以,当使用分析函数能够满足业务需求时,最好使用分析函数。处理大量业务数据时,相关子查询是相当耗费资源的。慎用!