1、explode
explode(ARRAY) 列表中的每个元素生成一行
explode(MAP) map中每个key-value对,生成一行,key为一列,value为一列
限制:
1、No other expressions are allowed in SELECT
SELECT pageid, explode(adid_list) AS myCol... is not supported
2、UDTF's can't be nested
SELECT explode(explode(adid_list)) AS myCol... is not supported
3、GROUP BY / CLUSTER BY / DISTRIBUTE BY / SORT BY is not supported
SELECT explode(adid_list) AS myCol ... GROUP BY myCol is not supported
2、lateral view
可使用lateral view解除以上限制,语法:
lateralView: LATERAL VIEW explode(expression) tableAlias AS columnAlias (',' columnAlias)*
fromClause: FROM baseTable (lateralView)*
案例:
table名称为pageAds
SELECT pageid, adid
FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid;
输出结果:
3、多个lateral view
from语句后面可以带多个lateral view语句
案例:
表名:baseTable
from后只有一个lateral view:
SELECT myCol1, col2 FROM baseTable
LATERAL VIEW explode(col1) myTable1 AS myCol1;
结果:
多个lateral view:
SELECT myCol1, myCol2 FROM baseTable
LATERAL VIEW explode(col1) myTable1 AS myCol1
LATERAL VIEW explode(col2) myTable2 AS myCol2;
结果:
4、Outer Lateral Views
如果array类型的字段为空,但依然需返回记录,可使用outer关键词。
比如:select * from src LATERAL VIEW explode(array()) C AS a limit 10;
这条语句中的array字段是个空列表,这条语句不管src表中是否有记录,结果都是空的。
而:select * from src LATERAL VIEW OUTER explode(array()) C AS a limit 10;
结果中的记录数为src表的记录数,只是a字段为NULL。
比如:
238 val_238 NULL
86 val_86 NULL
311 val_311 NULL
27 val_27 NULL
165 val_165 NULL
409 val_409 NULL
255 val_255 NULL
278 val_278 NULL
98 val_98 NULL
官方文档:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-explode