MaxCompute SQL使用小技巧之多列转多行

我们知道SQL中有个explode函数,经常使用的也是该函数,但还有个函数posexplode用法和其类似,使用好这个函数,可以简化不少工作。话不多说,直接上示例。

案例:有一张表,其数据如下

seat_no

bd_time

A,B,C,D

8:00,8:10,8:20,8:30

H,I,J,K

9:10,9:20,9:30,9:40

...

...

展示如下:

sn

bt

A

8:00

B

8:10

C

8:20

D

8:30

...

...

分析:看到字段数据,可以想到使用lateral view explode展开,我们先使用explode展开看下

with tb1 as (
    select
        seat_no,
        bd_time
    from values('A,B,C,D','8:00,8:10,8:20,8:30'),
               ('H,I,J,K','9:10,9:20,9:30,9:40') 
               t(seat_no,bd_time)
)
select 
    seat_no,
    bd_time,
    exp_sn
from tb1 
    lateral view explode(split(seat_no,',')) t as exp_sn;
--只展开seat_no 结果如下
seat_no bd_time                         exp_sn
A,B,C,D 8:00,8:10,8:20,8:30 A
A,B,C,D 8:00,8:10,8:20,8:30 B
A,B,C,D 8:00,8:10,8:20,8:30 C
A,B,C,D 8:00,8:10,8:20,8:30 D
H,I,J,K 9:10,9:20,9:30,9:40 H
H,I,J,K 9:10,9:20,9:30,9:40 I
H,I,J,K 9:10,9:20,9:30,9:40 J
H,I,J,K 9:10,9:20,9:30,9:40 K

select 
    seat_no,
    bd_time,
    exp_sn,
    exp_bt
from tb1 
    lateral view explode(split(seat_no,',')) t as exp_sn
    lateral view explode(split(bd_time,',')) t as exp_bt;
--两个字段均展开
seat_no bd_time                         exp_sn  exp_bt
A,B,C,D 8:00,8:10,8:20,8:30 A               8:00
A,B,C,D 8:00,8:10,8:20,8:30 A               8:10
A,B,C,D 8:00,8:10,8:20,8:30 A               8:20
A,B,C,D 8:00,8:10,8:20,8:30 A               8:30
A,B,C,D 8:00,8:10,8:20,8:30 B               8:00
A,B,C,D 8:00,8:10,8:20,8:30 B               8:10
A,B,C,D 8:00,8:10,8:20,8:30 B               8:20
A,B,C,D 8:00,8:10,8:20,8:30 B               8:30
A,B,C,D 8:00,8:10,8:20,8:30 C               8:00
A,B,C,D 8:00,8:10,8:20,8:30 C               8:10
A,B,C,D 8:00,8:10,8:20,8:30 C               8:20
A,B,C,D 8:00,8:10,8:20,8:30 C               8:30
A,B,C,D 8:00,8:10,8:20,8:30 D               8:00
A,B,C,D 8:00,8:10,8:20,8:30 D               8:10
A,B,C,D 8:00,8:10,8:20,8:30 D               8:20
A,B,C,D 8:00,8:10,8:20,8:30 D               8:30
H,I,J,K 9:10,9:20,9:30,9:40 H               9:10
H,I,J,K 9:10,9:20,9:30,9:40 H               9:20
H,I,J,K 9:10,9:20,9:30,9:40 H               9:30
H,I,J,K 9:10,9:20,9:30,9:40 H               9:40
H,I,J,K 9:10,9:20,9:30,9:40 I               9:10
H,I,J,K 9:10,9:20,9:30,9:40 I               9:20
H,I,J,K 9:10,9:20,9:30,9:40 I               9:30
H,I,J,K 9:10,9:20,9:30,9:40 I               9:40
H,I,J,K 9:10,9:20,9:30,9:40 J               9:10
H,I,J,K 9:10,9:20,9:30,9:40 J               9:20
H,I,J,K 9:10,9:20,9:30,9:40 J               9:30
H,I,J,K 9:10,9:20,9:30,9:40 J               9:40
H,I,J,K 9:10,9:20,9:30,9:40 K               9:10
H,I,J,K 9:10,9:20,9:30,9:40 K               9:20
H,I,J,K 9:10,9:20,9:30,9:40 K               9:30
H,I,J,K 9:10,9:20,9:30,9:40 K               9:40

从上述结果可以看到,按两个字段展开相当于直接笛卡尔积了,在之前的文章中我们也分析过生成索引的posexplode函数,接下来我们加上索引试试看

select 
    seat_no,
    bd_time,
    inx_sn,
    exp_sn
from tb1 
    lateral view posexplode(split(seat_no,',')) t as inx_sn,exp_sn;
--只展开seat_no 结果如下
seat_no bd_time inx_sn  exp_sn
A,B,C,D 8:00,8:10,8:20,8:30 0   A
A,B,C,D 8:00,8:10,8:20,8:30 1   B
A,B,C,D 8:00,8:10,8:20,8:30 2   C
A,B,C,D 8:00,8:10,8:20,8:30 3   D
H,I,J,K 9:10,9:20,9:30,9:40 0   H
H,I,J,K 9:10,9:20,9:30,9:40 1   I
H,I,J,K 9:10,9:20,9:30,9:40 2   J
H,I,J,K 9:10,9:20,9:30,9:40 3   K

select 
    seat_no,
    bd_time,
    inx_sn,
    exp_sn,
    inx_bt,
    exp_bt
from tb1 
    lateral view posexplode(split(seat_no,',')) t as inx_sn,exp_sn
    lateral view posexplode(split(bd_time,',')) t as inx_bt,exp_bt;
--两个字段均展开
seat_no bd_time inx_sn  exp_sn  inx_bt  exp_bt
A,B,C,D 8:00,8:10,8:20,8:30 0   A   0   8:00
A,B,C,D 8:00,8:10,8:20,8:30 0   A   1   8:10
A,B,C,D 8:00,8:10,8:20,8:30 0   A   2   8:20
A,B,C,D 8:00,8:10,8:20,8:30 0   A   3   8:30
A,B,C,D 8:00,8:10,8:20,8:30 1   B   0   8:00
A,B,C,D 8:00,8:10,8:20,8:30 1   B   1   8:10
A,B,C,D 8:00,8:10,8:20,8:30 1   B   2   8:20
A,B,C,D 8:00,8:10,8:20,8:30 1   B   3   8:30
A,B,C,D 8:00,8:10,8:20,8:30 2   C   0   8:00
A,B,C,D 8:00,8:10,8:20,8:30 2   C   1   8:10
A,B,C,D 8:00,8:10,8:20,8:30 2   C   2   8:20
A,B,C,D 8:00,8:10,8:20,8:30 2   C   3   8:30
A,B,C,D 8:00,8:10,8:20,8:30 3   D   0   8:00
A,B,C,D 8:00,8:10,8:20,8:30 3   D   1   8:10
A,B,C,D 8:00,8:10,8:20,8:30 3   D   2   8:20
A,B,C,D 8:00,8:10,8:20,8:30 3   D   3   8:30
H,I,J,K 9:10,9:20,9:30,9:40 0   H   0   9:10
H,I,J,K 9:10,9:20,9:30,9:40 0   H   1   9:20
H,I,J,K 9:10,9:20,9:30,9:40 0   H   2   9:30
H,I,J,K 9:10,9:20,9:30,9:40 0   H   3   9:40
H,I,J,K 9:10,9:20,9:30,9:40 1   I   0   9:10
H,I,J,K 9:10,9:20,9:30,9:40 1   I   1   9:20
H,I,J,K 9:10,9:20,9:30,9:40 1   I   2   9:30
H,I,J,K 9:10,9:20,9:30,9:40 1   I   3   9:40
H,I,J,K 9:10,9:20,9:30,9:40 2   J   0   9:10
H,I,J,K 9:10,9:20,9:30,9:40 2   J   1   9:20
H,I,J,K 9:10,9:20,9:30,9:40 2   J   2   9:30
H,I,J,K 9:10,9:20,9:30,9:40 2   J   3   9:40
H,I,J,K 9:10,9:20,9:30,9:40 3   K   0   9:10
H,I,J,K 9:10,9:20,9:30,9:40 3   K   1   9:20
H,I,J,K 9:10,9:20,9:30,9:40 3   K   2   9:30
H,I,J,K 9:10,9:20,9:30,9:40 3   K   3   9:40

从上述结果看,posexplode的索引的确省下了不少事,可以利用两个索引值相同进行过滤,可得到想要的结果

select 
    seat_no,
    bd_time,
    exp_sn,
    exp_bt
from tb1 
    lateral view posexplode(split(seat_no,',')) t as inx_sn,exp_sn
    lateral view posexplode(split(bd_time,',')) t as inx_bt,exp_bt
where inx_sn = inx_bt;
--结果如下
seat_no bd_time exp_sn  exp_bt
A,B,C,D 8:00,8:10,8:20,8:30 A   8:00
A,B,C,D 8:00,8:10,8:20,8:30 B   8:10
A,B,C,D 8:00,8:10,8:20,8:30 C   8:20
A,B,C,D 8:00,8:10,8:20,8:30 D   8:30
H,I,J,K 9:10,9:20,9:30,9:40 H   9:10
H,I,J,K 9:10,9:20,9:30,9:40 I   9:20
H,I,J,K 9:10,9:20,9:30,9:40 J   9:30
H,I,J,K 9:10,9:20,9:30,9:40 K   9:40

关于posexplode具体细节用法,可参考阿里云文档

https://help.aliyun.com/document_detail/293597.html#section-2yc-ymd-p11

拜了个拜

上一篇:MaxCompute 使用SQL进行重叠交叉区间问题分析


下一篇:MaxCompute 使用SQL进行分区间段统计分析