连续区间问题在业务分析中也很常见:连续N天登陆、车辆持续运行时长、系统状态报告连续日期等等等,接下来以真实需求的简化版进行分析
案例:某线上作业每天都会运行一次,且独立于其他的任务,运行结果会保留到一张表中run_status_result_1d;运行状态只有两个值:failed/succeeded,数据如下:
run_date |
status |
... |
... |
2021-08-01 |
succeeded |
2021-08-02 |
succeeded |
2021-08-03 |
succeeded |
2021-08-04 |
succeeded |
2021-08-05 |
succeeded |
2021-08-06 |
failed |
2021-08-07 |
failed |
2021-08-08 |
failed |
2021-08-09 |
succeeded |
2021-08-10 |
succeeded |
2021-08-11 |
succeeded |
... |
... |
需求:作业运行状态的连续日期 结果数据展示如下:
status |
start_date |
end_date |
succeeded |
2021-08-01 |
2021-08-05 |
failed |
2021-08-06 |
2021-08-07 |
... |
... |
... |
分析:利用sum(if)组合进行分析
- 生成一些数据
with tb1 as ( select run_date, status from values('2021-08-01','succeeded'), ('2021-08-02','succeeded'), ('2021-08-03','succeeded'), ('2021-08-04','succeeded'), ('2021-08-05','succeeded'), ('2021-08-06','failed'), ('2021-08-07','failed'), ('2021-08-08','failed'), ('2021-08-09','succeeded'), ('2021-08-10','succeeded'), ('2021-08-11','succeeded'), ('2021-08-12','succeeded') t(run_date,status) )
- 利用lag函数拿到上一条运行状态
tb2 as ( select run_date, status, lag(status,1,status) over(order by run_date) lag_status from tb1 )
- 利用当前数据状态与上一条状态的异同进行分分组
tb3 as ( select run_date, status, lag_status, sum(if(status!=lag_status,1,0)) over(order by run_date) flag from tb2 ) --数据分组效果如下: run_date status lag_status flag 2021-08-01 succeeded succeeded 0 2021-08-02 succeeded succeeded 0 2021-08-03 succeeded succeeded 0 2021-08-04 succeeded succeeded 0 2021-08-05 succeeded succeeded 0 2021-08-06 failed succeeded 1 2021-08-07 failed failed 1 2021-08-08 failed failed 1 2021-08-09 succeeded failed 2 2021-08-10 succeeded succeeded 2 2021-08-11 succeeded succeeded 2 2021-08-12 succeeded succeeded 2
- 对上述数据按flag求最大时间和最小时间即为结果
select min(status) as status, min(run_date) as start_date, max(run_date) as end_date from tb3 group by flag; --结果如下: status start_date end_date succeeded 2021-08-01 2021-08-05 failed 2021-08-06 2021-08-08 succeeded 2021-08-09 2021-08-12
对于案例中使用sum(if)组合,在业务需求分析中经常使用,也可以简化很多代码,还未使用的小伙伴,可以尝试使用下
拜了个拜