我们知道SQL中有个explode函数,经常使用的也是该函数,但还有个函数posexplode用法和其类似,使用好这个函数,可以简化不少工作。话不多说,直接上示例。
案例:有一张表,其数据如下
seat_no |
bd_time |
A,B,C,D |
8:00,8:10,8:20,8:30 |
H,I,J,K |
9:10,9:20,9:30,9:40 |
... |
... |
展示如下:
sn |
bt |
A |
8:00 |
B |
8:10 |
C |
8:20 |
D |
8:30 |
... |
... |
分析:看到字段数据,可以想到使用lateral view explode展开,我们先使用explode展开看下
with tb1 as ( select seat_no, bd_time from values('A,B,C,D','8:00,8:10,8:20,8:30'), ('H,I,J,K','9:10,9:20,9:30,9:40') t(seat_no,bd_time) ) select seat_no, bd_time, exp_sn from tb1 lateral view explode(split(seat_no,',')) t as exp_sn; --只展开seat_no 结果如下 seat_no bd_time exp_sn A,B,C,D 8:00,8:10,8:20,8:30 A A,B,C,D 8:00,8:10,8:20,8:30 B A,B,C,D 8:00,8:10,8:20,8:30 C A,B,C,D 8:00,8:10,8:20,8:30 D H,I,J,K 9:10,9:20,9:30,9:40 H H,I,J,K 9:10,9:20,9:30,9:40 I H,I,J,K 9:10,9:20,9:30,9:40 J H,I,J,K 9:10,9:20,9:30,9:40 K select seat_no, bd_time, exp_sn, exp_bt from tb1 lateral view explode(split(seat_no,',')) t as exp_sn lateral view explode(split(bd_time,',')) t as exp_bt; --两个字段均展开 seat_no bd_time exp_sn exp_bt A,B,C,D 8:00,8:10,8:20,8:30 A 8:00 A,B,C,D 8:00,8:10,8:20,8:30 A 8:10 A,B,C,D 8:00,8:10,8:20,8:30 A 8:20 A,B,C,D 8:00,8:10,8:20,8:30 A 8:30 A,B,C,D 8:00,8:10,8:20,8:30 B 8:00 A,B,C,D 8:00,8:10,8:20,8:30 B 8:10 A,B,C,D 8:00,8:10,8:20,8:30 B 8:20 A,B,C,D 8:00,8:10,8:20,8:30 B 8:30 A,B,C,D 8:00,8:10,8:20,8:30 C 8:00 A,B,C,D 8:00,8:10,8:20,8:30 C 8:10 A,B,C,D 8:00,8:10,8:20,8:30 C 8:20 A,B,C,D 8:00,8:10,8:20,8:30 C 8:30 A,B,C,D 8:00,8:10,8:20,8:30 D 8:00 A,B,C,D 8:00,8:10,8:20,8:30 D 8:10 A,B,C,D 8:00,8:10,8:20,8:30 D 8:20 A,B,C,D 8:00,8:10,8:20,8:30 D 8:30 H,I,J,K 9:10,9:20,9:30,9:40 H 9:10 H,I,J,K 9:10,9:20,9:30,9:40 H 9:20 H,I,J,K 9:10,9:20,9:30,9:40 H 9:30 H,I,J,K 9:10,9:20,9:30,9:40 H 9:40 H,I,J,K 9:10,9:20,9:30,9:40 I 9:10 H,I,J,K 9:10,9:20,9:30,9:40 I 9:20 H,I,J,K 9:10,9:20,9:30,9:40 I 9:30 H,I,J,K 9:10,9:20,9:30,9:40 I 9:40 H,I,J,K 9:10,9:20,9:30,9:40 J 9:10 H,I,J,K 9:10,9:20,9:30,9:40 J 9:20 H,I,J,K 9:10,9:20,9:30,9:40 J 9:30 H,I,J,K 9:10,9:20,9:30,9:40 J 9:40 H,I,J,K 9:10,9:20,9:30,9:40 K 9:10 H,I,J,K 9:10,9:20,9:30,9:40 K 9:20 H,I,J,K 9:10,9:20,9:30,9:40 K 9:30 H,I,J,K 9:10,9:20,9:30,9:40 K 9:40
从上述结果可以看到,按两个字段展开相当于直接笛卡尔积了,在之前的文章中我们也分析过生成索引的posexplode函数,接下来我们加上索引试试看
select seat_no, bd_time, inx_sn, exp_sn from tb1 lateral view posexplode(split(seat_no,',')) t as inx_sn,exp_sn; --只展开seat_no 结果如下 seat_no bd_time inx_sn exp_sn A,B,C,D 8:00,8:10,8:20,8:30 0 A A,B,C,D 8:00,8:10,8:20,8:30 1 B A,B,C,D 8:00,8:10,8:20,8:30 2 C A,B,C,D 8:00,8:10,8:20,8:30 3 D H,I,J,K 9:10,9:20,9:30,9:40 0 H H,I,J,K 9:10,9:20,9:30,9:40 1 I H,I,J,K 9:10,9:20,9:30,9:40 2 J H,I,J,K 9:10,9:20,9:30,9:40 3 K select seat_no, bd_time, inx_sn, exp_sn, inx_bt, exp_bt from tb1 lateral view posexplode(split(seat_no,',')) t as inx_sn,exp_sn lateral view posexplode(split(bd_time,',')) t as inx_bt,exp_bt; --两个字段均展开 seat_no bd_time inx_sn exp_sn inx_bt exp_bt A,B,C,D 8:00,8:10,8:20,8:30 0 A 0 8:00 A,B,C,D 8:00,8:10,8:20,8:30 0 A 1 8:10 A,B,C,D 8:00,8:10,8:20,8:30 0 A 2 8:20 A,B,C,D 8:00,8:10,8:20,8:30 0 A 3 8:30 A,B,C,D 8:00,8:10,8:20,8:30 1 B 0 8:00 A,B,C,D 8:00,8:10,8:20,8:30 1 B 1 8:10 A,B,C,D 8:00,8:10,8:20,8:30 1 B 2 8:20 A,B,C,D 8:00,8:10,8:20,8:30 1 B 3 8:30 A,B,C,D 8:00,8:10,8:20,8:30 2 C 0 8:00 A,B,C,D 8:00,8:10,8:20,8:30 2 C 1 8:10 A,B,C,D 8:00,8:10,8:20,8:30 2 C 2 8:20 A,B,C,D 8:00,8:10,8:20,8:30 2 C 3 8:30 A,B,C,D 8:00,8:10,8:20,8:30 3 D 0 8:00 A,B,C,D 8:00,8:10,8:20,8:30 3 D 1 8:10 A,B,C,D 8:00,8:10,8:20,8:30 3 D 2 8:20 A,B,C,D 8:00,8:10,8:20,8:30 3 D 3 8:30 H,I,J,K 9:10,9:20,9:30,9:40 0 H 0 9:10 H,I,J,K 9:10,9:20,9:30,9:40 0 H 1 9:20 H,I,J,K 9:10,9:20,9:30,9:40 0 H 2 9:30 H,I,J,K 9:10,9:20,9:30,9:40 0 H 3 9:40 H,I,J,K 9:10,9:20,9:30,9:40 1 I 0 9:10 H,I,J,K 9:10,9:20,9:30,9:40 1 I 1 9:20 H,I,J,K 9:10,9:20,9:30,9:40 1 I 2 9:30 H,I,J,K 9:10,9:20,9:30,9:40 1 I 3 9:40 H,I,J,K 9:10,9:20,9:30,9:40 2 J 0 9:10 H,I,J,K 9:10,9:20,9:30,9:40 2 J 1 9:20 H,I,J,K 9:10,9:20,9:30,9:40 2 J 2 9:30 H,I,J,K 9:10,9:20,9:30,9:40 2 J 3 9:40 H,I,J,K 9:10,9:20,9:30,9:40 3 K 0 9:10 H,I,J,K 9:10,9:20,9:30,9:40 3 K 1 9:20 H,I,J,K 9:10,9:20,9:30,9:40 3 K 2 9:30 H,I,J,K 9:10,9:20,9:30,9:40 3 K 3 9:40
从上述结果看,posexplode的索引的确省下了不少事,可以利用两个索引值相同进行过滤,可得到想要的结果
select seat_no, bd_time, exp_sn, exp_bt from tb1 lateral view posexplode(split(seat_no,',')) t as inx_sn,exp_sn lateral view posexplode(split(bd_time,',')) t as inx_bt,exp_bt where inx_sn = inx_bt; --结果如下 seat_no bd_time exp_sn exp_bt A,B,C,D 8:00,8:10,8:20,8:30 A 8:00 A,B,C,D 8:00,8:10,8:20,8:30 B 8:10 A,B,C,D 8:00,8:10,8:20,8:30 C 8:20 A,B,C,D 8:00,8:10,8:20,8:30 D 8:30 H,I,J,K 9:10,9:20,9:30,9:40 H 9:10 H,I,J,K 9:10,9:20,9:30,9:40 I 9:20 H,I,J,K 9:10,9:20,9:30,9:40 J 9:30 H,I,J,K 9:10,9:20,9:30,9:40 K 9:40
关于posexplode具体细节用法,可参考阿里云文档
https://help.aliyun.com/document_detail/293597.html#section-2yc-ymd-p11
拜了个拜