1. 开窗函数的作用
说明 : 开窗函数就是 根据指定的开窗规则 为表的每条记录,标记状态,不会增加或者减少表的记录数
2. 语法
分析函数、聚合函数 over(partition by 列名 order by 列名 rows between 开始位置 and 结束位置)
3. 分区 : partition by 列名
说明 : 按照指定字段分区,可以为多个字段,相当于 group by
不指定分区时,操作的是全表
4. 排序 : order by 列名
说明 : 按照指定字段,进行排序
如果指定分区,则是分区内排序,不指定分区,则是全表排序
5. 开窗范围 : rows between 开始位置 and 结束位置
说明 : 计算的范围
注意 : 不指定时,默认范围: 第一行到当前行
位置参数 :
current row:当前行
n preceding:往前 n 行数据
n following:往后 n 行数据
unbounded:起点,
unbounded preceding 表示该窗口最前面的行(起点)
unbounded following 表示该窗口最后面的行(终点)
示例 :
rows between unbounded preceding and current row(表示从起点到当前行)
rows between 2 preceding and 1 following(表示往前2行到往后1行)
rows between 2 preceding and 1 current row(表示往前2行到当前行)
rows between current row and unbounded following(表示当前行到终点)
with t1 as (
select 'huawei' as channel ,'2021-08-01' as credit_date,5 as score
union all
select 'huawei' as channel ,'2021-08-01' as credit_date,5 as score
union all
select 'huawei' as channel ,'2021-08-03' as credit_date,12 as score
union all
select 'huawei' as channel ,'2021-08-04' as credit_date,25 as score
union all
select 'huawei' as channel ,'2021-08-02' as credit_date,1 as score
union all
select 'huawei' as channel ,'2021-08-07' as credit_date,7 as score
union all
select 'huawei' as channel ,'2021-08-08' as credit_date,10 as score
union all
select 'huawei' as channel ,'2021-08-06' as credit_date,33 as score
union all
select 'vivo' as channel ,'2021-09-01' as credit_date,1 as score
union all
select 'vivo' as channel ,'2021-09-02' as credit_date,2 as score
union all
select 'vivo' as channel ,'2021-09-04' as credit_date,5 as score
union all
select 'vivo' as channel ,'2021-09-09' as credit_date,9 as score
union all
select 'vivo' as channel ,'2021-09-07' as credit_date,77 as score
union all
select 'vivo' as channel ,'2021-09-08' as credit_date,10 as score
union all
select 'vivo' as channel ,'2021-09-11' as credit_date,3 as score
)
select *
,sum(t1.score) over(partition by t1.channel
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) as sum1 -- 开窗范围:首行~当前行(默认参数为此参数)
,sum(t1.score) over(partition by t1.channel
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
) as sum2 -- 开窗范围:当前行~末尾行
from t1
;
channel credit_date score sum1 sum2
huawei 2021-08-01 5 5 98
huawei 2021-08-06 33 38 93
huawei 2021-08-08 10 48 60
huawei 2021-08-07 7 55 50
huawei 2021-08-02 1 56 43
huawei 2021-08-04 25 81 42
huawei 2021-08-03 12 93 17
huawei 2021-08-01 5 98 5
vivo 2021-09-08 10 10 107
vivo 2021-09-07 77 87 97
vivo 2021-09-09 9 96 20
vivo 2021-09-04 5 101 11
vivo 2021-09-02 2 103 6
vivo 2021-09-01 1 104 4
vivo 2021-09-11 3 107 3
View Code
6. 聚合函数
count、sum、min、max、avg
7. 分析函数
rank : 跳跃排序 当排序字段值相同时, 自增序号重复,且会跳过下一个序号 示例(1,2,3,4,4,6)
row_number : 连续排序 当排序字段值相同时, 自增序号不会重复 示例(1,2,3,4,5,6)
dense_rank : 密集排序 当排序字段值相同时, 自增序号重复,且不会跳过下一个序号 示例(1,2,3,4,4,5)
cume_dist : <= 当前值的记录数据 / 分组内总记录数
percent_rank : 分组内当前行的RANK值-1/分组内总行数-1
ntile(x) : 将窗口内数据进行切片(根据序号),并返回该记录的切片编号
当切片不均时,默认增加第一个切片的分布
-- 7.1 排序函数示例
-- rank、dense_rank、row_number
-- 排序函数示例
-- rank、dense_rank、row_number
with t1 as (
select 'huawei' as channel ,'2021-08-01' as credit_date,5 as score
union all
select 'huawei' as channel ,'2021-08-01' as credit_date,5 as score
union all
select 'huawei' as channel ,'2021-08-03' as credit_date,12 as score
union all
select 'huawei' as channel ,'2021-08-04' as credit_date,25 as score
union all
select 'huawei' as channel ,'2021-08-02' as credit_date,1 as score
union all
select 'huawei' as channel ,'2021-08-07' as credit_date,7 as score
union all
select 'huawei' as channel ,'2021-08-08' as credit_date,10 as score
union all
select 'huawei' as channel ,'2021-08-06' as credit_date,33 as score
union all
select 'vivo' as channel ,'2021-09-01' as credit_date,1 as score
union all
select 'vivo' as channel ,'2021-09-02' as credit_date,2 as score
union all
select 'vivo' as channel ,'2021-09-04' as credit_date,5 as score
union all
select 'vivo' as channel ,'2021-09-09' as credit_date,9 as score
union all
select 'vivo' as channel ,'2021-09-07' as credit_date,77 as score
union all
select 'vivo' as channel ,'2021-09-08' as credit_date,10 as score
union all
select 'vivo' as channel ,'2021-09-11' as credit_date,3 as score
)
select *
,rank() over(partition by t1.channel order by t1.score) as rank
,dense_rank() over(partition by t1.channel order by t1.score) as dense_rank
,row_number() over(partition by t1.channel order by t1.score) as row_number
from t1;
channel credit_date score rank dense_rank row_number
huawei 2021-08-02 1 1 1 1
huawei 2021-08-01 5 2 2 2
huawei 2021-08-01 5 2 2 3
huawei 2021-08-07 7 4 3 4
huawei 2021-08-08 10 5 4 5
huawei 2021-08-03 12 6 5 6
huawei 2021-08-04 25 7 6 7
huawei 2021-08-06 33 8 7 8
vivo 2021-09-01 1 1 1 1
vivo 2021-09-02 2 2 2 2
vivo 2021-09-11 3 3 3 3
vivo 2021-09-04 5 4 4 4
vivo 2021-09-09 9 5 5 5
vivo 2021-09-08 10 6 6 6
vivo 2021-09-07 77 7 7 7
View Code
-- 7.2 统计当前记录组内占比
-- cume_dist