MaxCompute 使用SQL进行同时在线问题分析

同时在线问题在游戏、电商、直播、教育等互联网行业中很常见,该指标直接展示了各行业的竞争力,Boss也是很关注这个点,结果放到大屏展示也是在显眼的位置。这个问题如何进行分析呢,话不多说,直接上示例

案例:某游戏玩家玩游戏时间表如下:玩家ID、上线时间、下线时间

player_id

on_time

off_time

10001

2021-08-19 08:08:08

2021-08-19 12:11:23

10002

2021-08-19 09:01:07

2021-08-19 13:15:12

10003

2021-08-19 10:12:14

2021-08-19 10:50:18

10004

2021-08-19 09:20:21

2021-08-19 12:46:25

10005

2021-08-19 10:14:22

2021-08-19 13:26:15

10006

2021-08-19 11:11:25

2021-08-19 16:24:18

10007

2021-08-19 12:00:00

2021-08-19 17:48:03

10008

2021-08-19 13:12:14

2021-08-19 18:18:14

10009

2021-08-19 15:21:21

2021-08-19 21:15:17

10010

2021-08-19 18:20:02

2021-08-19 23:45:12

...

...

...

要求展示每天的最高在线人数:

date

online_ct

2021-08-01

254120

2021-08-02

198652

...

...

分析:在之前的文章中我们也使用过增加一个列用作标记,应用在这里可以对每个玩家上线标记1,下线标记-1,应用流式思想来一条数据就累加这个标记值,当然这个需要按时间先后顺序。累加的值就代表当前在线的人数问题。

  1. 首先生成示例数据
with tb1 as (
    select 
        player_id,
        on_time,
        off_time,
          substr(on_time,1,10) as dt
    from values('10001','2021-08-19 08:08:08','2021-08-19 12:11:23'),
               ('10002','2021-08-19 09:01:07','2021-08-19 13:15:12'),
               ('10003','2021-08-19 10:12:14','2021-08-19 10:50:18'),
               ('10004','2021-08-19 09:20:21','2021-08-19 12:46:25'),
               ('10005','2021-08-19 10:14:22','2021-08-19 13:26:15'),
               ('10006','2021-08-19 11:11:25','2021-08-19 16:24:18'),
               ('10007','2021-08-19 12:00:00','2021-08-19 17:48:03'),
               ('10008','2021-08-19 13:12:14','2021-08-19 18:18:14'),
               ('10009','2021-08-19 15:21:21','2021-08-19 21:15:17'),
               ('10010','2021-08-19 18:20:02','2021-08-19 23:45:12')
               t(player_id,on_time,off_time)
)
  1. 利用前面分析的列转行方法,将每个玩家的上下线时间转成一列
tb2 as (
    select 
       player_id,
       on_time as on_off_time,
       dt,
       1 as flag 
    from tb1
    union all 
    select 
       player_id,
       off_time as on_off_time,
       dt,
       -1 as flag 
    from tb1
)
  1. 按时间先后顺序进行累加
tb3 as (
    select 
        player_id,
        on_off_time,
        dt,
        flag,
        sum(flag) over(partition by dt order by on_off_time) sums
    from tb2
)
--结果如下
player_id   on_off_time         dt                  flag    sums
10001   2021-08-19 08:08:08 2021-08-19  1           1
10002   2021-08-19 09:01:07 2021-08-19  1           2
10004   2021-08-19 09:20:21 2021-08-19  1           3
10003   2021-08-19 10:12:14 2021-08-19  1           4
10005   2021-08-19 10:14:22 2021-08-19  1           5
10003   2021-08-19 10:50:18 2021-08-19  -1      4
10006   2021-08-19 11:11:25 2021-08-19  1           5
10007   2021-08-19 12:00:00 2021-08-19  1           6
10001   2021-08-19 12:11:23 2021-08-19  -1      5
10004   2021-08-19 12:46:25 2021-08-19  -1      4
10008   2021-08-19 13:12:14 2021-08-19  1           5
10002   2021-08-19 13:15:12 2021-08-19  -1      4
10005   2021-08-19 13:26:15 2021-08-19  -1      3
10009   2021-08-19 15:21:21 2021-08-19  1           4
10006   2021-08-19 16:24:18 2021-08-19  -1      3
10007   2021-08-19 17:48:03 2021-08-19  -1      2
10008   2021-08-19 18:18:14 2021-08-19  -1      1
10010   2021-08-19 18:20:02 2021-08-19  1           2
10009   2021-08-19 21:15:17 2021-08-19  -1      1
10010   2021-08-19 23:45:12 2021-08-19  -1      0
  1. 按天统计最高人数
select 
    dt as date,
    max(sums) as online_ct
from tb3
group by dt;
--结果如下
date                online_ct
2021-08-19  6

掌握该类问题分析方法对业务需求分析有很大帮助,该类问题也可细化为每个小时在线人数等等。

拜了个拜

上一篇:X-Cart 学习笔记(二)X-Cart框架1


下一篇:ADO.NET结合C#常见的数据库操作(二)