首先来看一个例子:
--删除emp表 drop table emp purge; --创建emp表 CREATE TABLE emp ( emp_id NUMBER(6), ename VARCHAR2(45), dept_id NUMBER(4), hire_date DATE, sal NUMBER(8,2) ); --创建emp数据 INSERT INTO emp (emp_id, ename, dept_id, hire_date, sal) VALUES (101, ‘Tom‘, 20, TO_DATE(‘21-09-1989‘, ‘DD-MM-YYYY‘), 2000); INSERT INTO emp (emp_id, ename, dept_id, hire_date, sal) VALUES (102, ‘Mike‘, 20, TO_DATE(‘13-01-1993‘, ‘DD-MM-YYYY‘), 8000); INSERT INTO emp (emp_id, ename, dept_id, hire_date, sal) VALUES (120, ‘John‘, 50, TO_DATE(‘18-07-1996‘, ‘DD-MM-YYYY‘), 1000); INSERT INTO emp (emp_id, ename, dept_id, hire_date, sal) VALUES (121, ‘Joy‘, 50, TO_DATE(‘10-04-1997‘, ‘DD-MM-YYYY‘), 4000); INSERT INTO emp (emp_id, ename, dept_id, hire_date, sal) VALUES (122, ‘Rich‘, 50, TO_DATE(‘01-05-1995‘, ‘DD-MM-YYYY‘), 3000); INSERT INTO emp (emp_id, ename, dept_id, hire_date, sal) VALUES (123, ‘Kate‘, 50, TO_DATE(‘10-10-1997‘, ‘DD-MM-YYYY‘), 5000); INSERT INTO emp (emp_id, ename, dept_id, hire_date, sal) VALUES (124, ‘Jess‘, 50, TO_DATE(‘16-11-1999‘, ‘DD-MM-YYYY‘), 6000); INSERT INTO emp (emp_id, ename, dept_id, hire_date, sal) VALUES (100, ‘Stev‘, 10, TO_DATE(‘01-01-1990‘, ‘DD-MM-YYYY‘), 7000); COMMIT;
查询结果:
SELECT
emp_id,ename,dept_id,hire_date,sal,
SUM(sal) OVER (PARTITION BY dept_id ORDER BY hire_date) sum_sal,
SUM(sal) OVER (PARTITION BY dept_id ) sum_sal2,
SUM(sal) OVER ( ) sum_sal3
FROM emp;
从上面的查询结果可以看出,
SUM() OVER()这个分析函数返回的数据行数和原数据行数一样,也就是说原来数据有多少行,使用分析函数后还是返回多少行,而聚合函数SUM不管有多少行,每个分组内都聚合成了一行。比如,select sum(sal) from emp只返回一行数据。
PARTITION BY后面的字段是分组范围,ORDER BY后面的字段是排序,但是这个排序会影响计算结果。
SUM_SAL是按照dept_id分组且按照hire_date排序后,在每个分组内对排序后的数据每行的逐行累加。
SUM_SAL2没有ORDER BY,就只是简单的将按照dept_id分组后的数据进行加和。
SUM_SAL3没有PARTITION BY和ORDER BY,就是对所有数据的加和。
1、开窗函数之ROWS:
SELECT emp_id,ename,dept_id,hire_date,sal, -- 以下均为首先按dept_id进行分组,其次按照hire_date进行排序,且所有统计不能跨越其所在分区,故不再重复 -- 窗口范围为该分区的第一行到该分区的最后一行,与sum_sal_part等同 SUM(sal) OVER (PARTITION BY dept_id ORDER BY hire_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) sum_1_to_last, -- 窗口范围为该分区的第一行到本行,与sum_sal_part_order等同 SUM(sal) OVER (PARTITION BY dept_id ORDER BY hire_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) sum_1_to_cur, -- 窗口范围为该分区的第一行到本行前一行,统计的是第一行到本行前一行薪资的累计 SUM(sal) OVER (PARTITION BY dept_id ORDER BY hire_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1/*value_expr*/ PRECEDING) sum_1_to_curbef1, -- 窗口范围为该分区的第一行到本行后一行,统计的是第一行到本行后一行薪资的累计 SUM(sal) OVER (PARTITION BY dept_id ORDER BY hire_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 FOLLOWING) sum_1_to_curaft1 FROM emp order by dept_id,hire_date;
查询结果:
2、开窗函数之RANGE:
RANGE窗口中只能对NUMBER、DATE起作用。
RANGE窗口中,ORDER BY中只能有一列;但是在ROWS窗口中,ORDER BY中可以有多列。
SELECT emp_id,ename,dept_id,hire_date,sal, -- 后面均为以dept_id分组,再按hire_date排序,且所有统计不能跨分区,由于是逻辑范围,因此PRECEDING和FOLLOWING表达式有符号 -- 窗口范围为该分区的第一行到该分区的最后一行,与sum_sal_part等同,在非条件表达式中等同于ROWS SUM(sal) OVER (PARTITION BY dept_id ORDER BY sal RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) sum_1_to_last, -- 窗口范围为该分区的第一行到本行,与sum_sal_part_order等同,在非条件表达式中等同于ROWS SUM(sal) OVER (PARTITION BY dept_id ORDER BY sal RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) sum_1_to_cur, -- 窗口范围为该分区内小于(当前行sal - 2500)的所有的薪资累计 SUM(sal) OVER (PARTITION BY dept_id ORDER BY sal RANGE BETWEEN UNBOUNDED PRECEDING AND 2500/*value_expr*/ PRECEDING) sum1, -- 窗口范围为该分区内小于(当前行sal + 2500)的所有薪资累计 SUM(sal) OVER (PARTITION BY dept_id ORDER BY sal RANGE BETWEEN UNBOUNDED PRECEDING AND 2500/*value_expr*/ FOLLOWING) sum2 FROM emp;
查询结果:
3、开窗函数之KEEP
KEEP为聚合函数的特殊关键字。(弄不明白这个函数)
当聚合函数MIN、MAX、SUM、AVG、COUNT、VARIANCE和STDDEV使用KEEP。
当KEEP和DENSE_RANK FIRST/DENSE_RANK LAST 一起使用时,获取一组中排名第一或者排名最后的记录。必须由order by子句来排序。后面也可以接over()分析函数部分。
Min(col2) keep (dense_rank first order by coll)保留按coll排名第一的col2的最小值。
Min(col2) keep (dense_rank first order by co11) over (partition by co 13 )按col3分组保留按coll排名各组第一的col2 的最小值。
SELECT emp_id,ename,dept_id,hire_date,sal,
DENSE_RANK() OVER(PARTITION BY dept_id ORDER BY sal) DENSE_RANK,
MIN(hire_date) KEEP (DENSE_RANK FIRST ORDER BY sal) OVER(PARTITION BY dept_id) min_first,
MIN(hire_date) KEEP (DENSE_RANK LAST ORDER BY sal) OVER(PARTITION BY dept_id) min_last,
MAX(hire_date) KEEP (DENSE_RANK FIRST ORDER BY sal) OVER(PARTITION BY dept_id) max_first,
MAX(hire_date) KEEP (DENSE_RANK LAST ORDER BY sal) OVER(PARTITION BY dept_id) max_last
FROM emp;
查询结果:
1
2
2
2
2
2