Hive Execution Plans: Group By Operator

Group By Operator performs grouped aggregation. Its common attributes are:

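aggregations: the aggregate function(s) the grouping is computed for
mode: the aggregation mode, typically hash, meaning rows are grouped by hashing the keys
keys: the grouping columns; when the keys attribute is absent, there is only one group
outputColumnNames: the temporary names of the output columns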

For example:

explain select sum(sal) from tb_emp;

Its Group By Operator looks like this:

+---------------------------------------------------------------------------------------------+
|Explain                                                                                      |
+---------------------------------------------------------------------------------------------+
|              Group By Operator                                                              |
|                aggregations: sum(sal)                                                       |
|                mode: hash                                                                   |
|                outputColumnNames: _col0                                                     |
|                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE|
+---------------------------------------------------------------------------------------------+

Another example:

explain select deptno,sum(sal) from tb_emp group by deptno;

Its Group By Operator looks like this:

+------------------------------------------------------------------------------------------------+
|Explain                                                                                         |
+------------------------------------------------------------------------------------------------+
|              Group By Operator                                                                 |
|                aggregations: sum(sal)                                                          |
|                keys: deptno (type: int)                                                        |
|                mode: hash                                                                      |
|                outputColumnNames: _col0, _col1                                                 |
|                Statistics: Num rows: 89 Data size: 718 Basic stats: COMPLETE Column stats: NONE|
+------------------------------------------------------------------------------------------------+
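The difference between the two plans can be mimicked in a few lines of Python. This is only an illustrative sketch, not Hive's implementation: the hash_aggregate function and the sample rows are hypothetical. With no keys, every row falls into the same (empty) group, so a single row comes out; with deptno as the key, one row comes out per distinct department.

```python
from collections import defaultdict

def hash_aggregate(rows, keys, agg_column):
    """Sum agg_column per group; an empty key list means one global group."""
    table = defaultdict(int)  # plays the role of the in-memory hash table
    for row in rows:
        group_key = tuple(row[k] for k in keys)  # () when there are no keys
        table[group_key] += row[agg_column]
    return dict(table)

# Hypothetical sample rows standing in for tb_emp
tb_emp = [
    {"deptno": 10, "sal": 1000},
    {"deptno": 20, "sal": 2000},
    {"deptno": 10, "sal": 1500},
]

total = hash_aggregate(tb_emp, [], "sal")             # like: select sum(sal) from tb_emp
per_dept = hash_aggregate(tb_emp, ["deptno"], "sal")  # like: ... group by deptno
print(total)
print(per_dept)
```

The empty tuple used as the group key in the first call corresponds to the missing keys attribute in the first plan.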

How GROUP BY is executed

A GROUP BY query is converted into a MapReduce (MR) job as follows:

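Map: generate key-value pairs, using the columns in the GROUP BY clause as the key and the result of the aggregate function as the value
Shuffle: hash each key and send the key-value pairs to different reducers according to the hash value
Reduce: perform the reduce according to the columns in the SELECT clause and the aggregate function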
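The three phases above can be sketched in plain Python for a query like select deptno, sum(sal) from tb_emp group by deptno. This is a simplified simulation under stated assumptions, not Hive's actual code: the function names, the NUM_REDUCERS value, the sample splits, and the use of Python's built-in hash for partitioning are all hypothetical.

```python
from collections import defaultdict

NUM_REDUCERS = 3  # hypothetical number of reducers

def map_phase(split):
    """Map: emit (deptno, partial sum of sal) pairs for one input split."""
    partial = defaultdict(int)
    for deptno, sal in split:
        partial[deptno] += sal
    return list(partial.items())

def shuffle_phase(map_outputs):
    """Shuffle: route each pair to a reducer chosen by hashing its key."""
    buckets = [defaultdict(list) for _ in range(NUM_REDUCERS)]
    for pairs in map_outputs:
        for key, value in pairs:
            buckets[hash(key) % NUM_REDUCERS][key].append(value)
    return buckets

def reduce_phase(bucket):
    """Reduce: merge the partial sums of every key held by this reducer."""
    return {key: sum(values) for key, values in bucket.items()}

# Hypothetical input splits of (deptno, sal) rows
splits = [
    [(10, 1000), (20, 2000), (10, 1500)],
    [(20, 500), (30, 3000)],
]

map_outputs = [map_phase(s) for s in splits]
buckets = shuffle_phase(map_outputs)
result = {}
for bucket in buckets:
    result.update(reduce_phase(bucket))
print(sorted(result.items()))
```

Because all pairs with the same key hash to the same reducer, each department's final sum is computed in exactly one place.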

Summary

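  • Group By Operator has roughly four attributes: aggregations, mode, keys, and outputColumnNames
  • A query without GROUP BY can still have a Group By Operator; the whole dataset is then treated as a single group, in other words there are no keys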

