同时在线问题在游戏、电商、直播、教育等互联网行业中很常见,该指标直接展示了各行业的竞争力,Boss也是很关注这个点,结果放到大屏展示也是在显眼的位置。这个问题如何进行分析呢,话不多说,直接上示例
案例:某游戏玩家玩游戏时间表如下:玩家ID、上线时间、下线时间
player_id |
on_time |
off_time |
10001 |
2021-08-19 08:08:08 |
2021-08-19 12:11:23 |
10002 |
2021-08-19 09:01:07 |
2021-08-19 13:15:12 |
10003 |
2021-08-19 10:12:14 |
2021-08-19 10:50:18 |
10004 |
2021-08-19 09:20:21 |
2021-08-19 12:46:25 |
10005 |
2021-08-19 10:14:22 |
2021-08-19 13:26:15 |
10006 |
2021-08-19 11:11:25 |
2021-08-19 16:24:18 |
10007 |
2021-08-19 12:00:00 |
2021-08-19 17:48:03 |
10008 |
2021-08-19 13:12:14 |
2021-08-19 18:18:14 |
10009 |
2021-08-19 15:21:21 |
2021-08-19 21:15:17 |
10010 |
2021-08-19 18:20:02 |
2021-08-19 23:45:12 |
... |
... |
... |
要求展示每天的最高在线人数:
date |
online_ct |
2021-08-01 |
254120 |
2021-08-02 |
198652 |
... |
... |
分析:在之前的文章中我们也使用过增加一个列用作标记,应用在这里可以对每个玩家上线标记1,下线标记-1,应用流式思想来一条数据就累加这个标记值,当然这个需要按时间先后顺序。累加的值就代表当前在线的人数问题。
- 首先生成示例数据
with tb1 as ( select player_id, on_time, off_time, substr(on_time,1,10) as dt from values('10001','2021-08-19 08:08:08','2021-08-19 12:11:23'), ('10002','2021-08-19 09:01:07','2021-08-19 13:15:12'), ('10003','2021-08-19 10:12:14','2021-08-19 10:50:18'), ('10004','2021-08-19 09:20:21','2021-08-19 12:46:25'), ('10005','2021-08-19 10:14:22','2021-08-19 13:26:15'), ('10006','2021-08-19 11:11:25','2021-08-19 16:24:18'), ('10007','2021-08-19 12:00:00','2021-08-19 17:48:03'), ('10008','2021-08-19 13:12:14','2021-08-19 18:18:14'), ('10009','2021-08-19 15:21:21','2021-08-19 21:15:17'), ('10010','2021-08-19 18:20:02','2021-08-19 23:45:12') t(player_id,on_time,off_time) )
- 利用前面分析的列转行方法,将每个玩家的上下线时间转成一列
tb2 as ( select player_id, on_time as on_off_time, dt, 1 as flag from tb1 union all select player_id, off_time as on_off_time, dt, -1 as flag from tb1 )
- 按时间先后顺序进行累加
tb3 as ( select player_id, on_off_time, dt, flag, sum(flag) over(partition by dt order by on_off_time) sums from tb2 ) --结果如下 player_id on_off_time dt flag sums 10001 2021-08-19 08:08:08 2021-08-19 1 1 10002 2021-08-19 09:01:07 2021-08-19 1 2 10004 2021-08-19 09:20:21 2021-08-19 1 3 10003 2021-08-19 10:12:14 2021-08-19 1 4 10005 2021-08-19 10:14:22 2021-08-19 1 5 10003 2021-08-19 10:50:18 2021-08-19 -1 4 10006 2021-08-19 11:11:25 2021-08-19 1 5 10007 2021-08-19 12:00:00 2021-08-19 1 6 10001 2021-08-19 12:11:23 2021-08-19 -1 5 10004 2021-08-19 12:46:25 2021-08-19 -1 4 10008 2021-08-19 13:12:14 2021-08-19 1 5 10002 2021-08-19 13:15:12 2021-08-19 -1 4 10005 2021-08-19 13:26:15 2021-08-19 -1 3 10009 2021-08-19 15:21:21 2021-08-19 1 4 10006 2021-08-19 16:24:18 2021-08-19 -1 3 10007 2021-08-19 17:48:03 2021-08-19 -1 2 10008 2021-08-19 18:18:14 2021-08-19 -1 1 10010 2021-08-19 18:20:02 2021-08-19 1 2 10009 2021-08-19 21:15:17 2021-08-19 -1 1 10010 2021-08-19 23:45:12 2021-08-19 -1 0
- 按天统计最高人数
select dt as date, max(sums) as online_ct from tb3 group by dt; --结果如下 date online_ct 2021-08-19 6
掌握该类问题分析方法对业务需求分析有很大帮助,该类问题也可细化为每个小时在线人数等等。
拜了个拜