数据说明:
+-----+-----------+------------+------------+---------+---------+
|id |device_type|business_gmv|order_source|pay_money|event_day|
+-----+-----------+------------+------------+---------+---------+
|10001|jingdong |16 |1 |1000 |20211020 |
|10004|jingdong |15 |1 |2000 |20211021 |
+-----+-----------+------------+------------+---------+---------+
使用炸裂函数进行如下操作:
select
device_type_all,
business_gmv_all,
order_source,
pay_money,
event_day_all
from order_hbi
LATERAL VIEW OUTER EXPLODE(array('ALL', device_type)) col1 AS device_type_all
LATERAL VIEW OUTER EXPLODE(array('ALL', business_gmv)) col1 AS business_gmv_all
LATERAL VIEW OUTER EXPLODE(array('ALL', event_day)) col1 AS event_day_all
order by event_day_all asc
语句说明:
1.首先将device_type 和ALL的组合数组 进行炸裂,那么这里原来的两行数据在各自加上ALL 之后会得到一共四行数据:
+---------------+----------------+------------+---------+-------------+
|device_type_all|business_gmv_all|order_source|pay_money|event_day_all|
+---------------+----------------+------------+---------+-------------+
|ALL |16 |1 |1000 |20211020 |
|jingdong |16 |1 |1000 |20211020 |
|ALL |15 |1 |2000 |20211021 |
|jingdong |15 |1 |2000 |20211021 |
+---------------+----------------+------------+---------+-------------+
2.将business_gmv 和ALL 的组合数组进行炸裂,则由上述的数据再次翻倍,即每一行在基于business_gmv 进行和ALL 的炸裂则得到如下的八行数据
+---------------+----------------+------------+---------+-------------+
|device_type_all|business_gmv_all|order_source|pay_money|event_day_all|
+---------------+----------------+------------+---------+-------------+
|ALL |ALL |1 |1000 |20211020 |
|ALL |16 |1 |1000 |20211020 |
|jingdong |ALL |1 |1000 |20211020 |
|jingdong |16 |1 |1000 |20211020 |
|ALL |ALL |1 |2000 |20211021 |
|jingdong |15 |1 |2000 |20211021 |
|ALL |15 |1 |2000 |20211021 |
|jingdong |ALL |1 |2000 |20211021 |
+---------------+----------------+------------+---------+-------------+
再基于上述结果进行event_day 的炸裂,则最终可以得到如下的结果的16行数据:
+---------------+----------------+------------+---------+-------------+
|device_type_all|business_gmv_all|order_source|pay_money|event_day_all|
+---------------+----------------+------------+---------+-------------+
|ALL |ALL |1 |1000 |20211020 |
|ALL |16 |1 |1000 |20211020 |
|jingdong |ALL |1 |1000 |20211020 |
|jingdong |16 |1 |1000 |20211020 |
|jingdong |15 |1 |2000 |20211021 |
|ALL |ALL |1 |2000 |20211021 |
|ALL |15 |1 |2000 |20211021 |
|jingdong |ALL |1 |2000 |20211021 |
|ALL |ALL |1 |1000 |ALL |
|ALL |16 |1 |1000 |ALL |
|jingdong |16 |1 |1000 |ALL |
|jingdong |15 |1 |2000 |ALL |
|jingdong |ALL |1 |1000 |ALL |
|ALL |ALL |1 |2000 |ALL |
|ALL |15 |1 |2000 |ALL |
|jingdong |ALL |1 |2000 |ALL |
+---------------+----------------+------------+---------+-------------+
基于上述结果统计的目的是可以统计多纬度的指标的聚合结果
4.加上group by 进行统计结果的分析,聚合订单金额
select
device_type_all,
business_gmv_all,
order_source,
sum(pay_money) pay_money,
event_day_all
from order_hbi
LATERAL VIEW OUTER EXPLODE(array('ALL', device_type)) col1 AS device_type_all
LATERAL VIEW OUTER EXPLODE(array('ALL', business_gmv)) col1 AS business_gmv_all
LATERAL VIEW OUTER EXPLODE(array('ALL', event_day)) col1 AS event_day_all
group by device_type_all,business_gmv_all,order_source,event_day_all
order by event_day_all asc
可以得到如下的结果:
+---------------+----------------+------------+---------+-------------+
|device_type_all|business_gmv_all|order_source|pay_money|event_day_all|
+---------------+----------------+------------+---------+-------------+
|jingdong |ALL |1 |1000.0 |20211020 |
|ALL |ALL |1 |1000.0 |20211020 |
|ALL |16 |1 |1000.0 |20211020 |
|jingdong |16 |1 |1000.0 |20211020 |
|jingdong |15 |1 |2000.0 |20211021 |
|ALL |15 |1 |2000.0 |20211021 |
|jingdong |ALL |1 |2000.0 |20211021 |
|ALL |ALL |1 |2000.0 |20211021 |
|ALL |16 |1 |1000.0 |ALL |
|jingdong |16 |1 |1000.0 |ALL |
|ALL |ALL |1 |3000.0 |ALL |
|ALL |15 |1 |2000.0 |ALL |
|jingdong |15 |1 |2000.0 |ALL |
|jingdong |ALL |1 |3000.0 |ALL |
+---------------+----------------+------------+---------+-------------+
比如我们可以直接从这个结果中选择所有设备类型(device_type_all)并且所有的bussiness_gmv 并且所有日期的订单总量则可以直接选取如下的数据,订单总金额为3000
|ALL |ALL |1 |3000.0 |ALL
也可以宣组jingdong 所有business_gmv_all 所有日期的订单总额则为:3000
|jingdong |ALL |1 |3000.0 |ALL |
假如不想要某个字段的聚合结果可以用!=ALL 先过滤掉,比如:business_gmv_all !=‘ALL’
SELECT device_type_all,sum(pay_money) gmv,event_day_all
FROM (
select
device_type_all,business_gmv_all,order_source,
sum(pay_money) pay_money,
event_day_all
from order_hbi
LATERAL VIEW OUTER EXPLODE(array('ALL', device_type)) col1 AS device_type_all
LATERAL VIEW OUTER EXPLODE(array('ALL', business_gmv)) col1 AS business_gmv_all
LATERAL VIEW OUTER EXPLODE(array('ALL', event_day)) col1 AS event_day_all
group by device_type_all,business_gmv_all,order_source,event_day_all
order by event_day_all asc
) tmp
where business_gmv_all !='ALL'
group by device_type_all,event_day_all
得到如下结果:可以选取想要的数据避免重复计算,直接选取即可
+---------------+------+-------------+
|device_type_all|gmv |event_day_all|
+---------------+------+-------------+
|ALL |1000.0|20211020 |
|jingdong |1000.0|20211020 |
|jingdong |2000.0|20211021 |
|ALL |2000.0|20211021 |
|ALL |3000.0|ALL |
|jingdong |3000.0|ALL |
+---------------+------+-------------+