Group By Operator
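Performs grouped aggregation. Common attributes:
aggregations: the aggregate function(s) that the grouping feeds
mode: the aggregation mode, usually hash, meaning the aggregation is driven by a hash computed over the keys
keys: the grouping columns; when there is no keys attribute, the whole input forms a single group
outputColumnNames: the temporary names of the output columns
For example: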
explain select sum(sal) from tb_emp;
Its Group By Operator looks like this:
+---------------------------------------------------------------------------------------------+
|Explain |
+---------------------------------------------------------------------------------------------+
| Group By Operator |
| aggregations: sum(sal) |
| mode: hash |
| outputColumnNames: _col0 |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE|
+---------------------------------------------------------------------------------------------+
Another example:
explain select deptno,sum(sal) from tb_emp group by deptno;
Its Group By Operator looks like this (here _col0 holds the key deptno and _col1 holds sum(sal)):
+------------------------------------------------------------------------------------------------+
|Explain |
+------------------------------------------------------------------------------------------------+
| Group By Operator |
| aggregations: sum(sal) |
| keys: deptno (type: int) |
| mode: hash |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 89 Data size: 718 Basic stats: COMPLETE Column stats: NONE|
+------------------------------------------------------------------------------------------------+
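How GROUP BY executes
A GROUP BY query is turned into a MapReduce job as follows:
Map: emit key-value pairs, using the columns in the GROUP BY clause as the key and the (partial) result of the aggregate function as the value
Shuffle: hash the key and send each key-value pair to a Reducer chosen by that hash value
Reduce: compute the final result from the columns and aggregate functions in the SELECT clause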
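The three phases above can be sketched in plain Python. This is only an illustrative simulation, not Hive's actual implementation: the function names (map_phase, shuffle_phase, reduce_phase, group_by_sum), the sample rows, and the number of reducers are all assumptions made for the example. The map side keeps one partial sum per key in a dictionary, which roughly mirrors what mode: hash suggests in the plans above.

```python
from collections import defaultdict

def map_phase(rows, key_cols, value_col):
    """Map side: build one partial sum per key in a hash table (dict)."""
    partial = defaultdict(int)
    for row in rows:
        key = tuple(row[c] for c in key_cols)  # GROUP BY columns form the key
        partial[key] += row[value_col]
    return list(partial.items())

def shuffle_phase(map_outputs, num_reducers):
    """Shuffle: route each (key, partial sum) pair to a reducer by hash(key)."""
    buckets = [[] for _ in range(num_reducers)]
    for pairs in map_outputs:
        for key, value in pairs:
            buckets[hash(key) % num_reducers].append((key, value))
    return buckets

def reduce_phase(bucket):
    """Reduce: merge the partial sums that arrived for each key."""
    totals = defaultdict(int)
    for key, value in bucket:
        totals[key] += value
    return dict(totals)

def group_by_sum(splits, key_cols, value_col, num_reducers=2):
    """Run map, shuffle, and reduce over the input splits and collect the result."""
    map_outputs = [map_phase(split, key_cols, value_col) for split in splits]
    result = {}
    for bucket in shuffle_phase(map_outputs, num_reducers):
        result.update(reduce_phase(bucket))
    return result

# Hypothetical tb_emp rows, split across two mappers.
splits = [
    [{"deptno": 10, "sal": 1000}, {"deptno": 20, "sal": 800}],
    [{"deptno": 10, "sal": 500}, {"deptno": 30, "sal": 700}],
]

# select deptno, sum(sal) from tb_emp group by deptno;
by_dept = group_by_sum(splits, ["deptno"], "sal")

# select sum(sal) from tb_emp;  (no keys: every row falls into one group)
total = group_by_sum(splits, [], "sal")
```

With an empty key list every row maps to the same empty key, so the result has exactly one group. That corresponds to the first plan above, where the Group By Operator has no keys attribute.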
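Summary
- The Group By Operator has roughly four attributes: aggregations, mode, keys, and outputColumnNames.
- A query that does not use group by can still have a Group By Operator. In that case the whole data set is treated as a single group; in other words, there are no keys.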
References
Hive执行计划分析之group by执行计划分析_进击的数据小白-CSDN博客