When the GROUP BY includes the partition column dt, the execution plan ends up with a single reduce task
0: jdbc:hive2://172.0.0.1:10015/> SET hive.execution.engine=spark;
No rows affected (0.011 seconds)
0: jdbc:hive2://172.0.0.1:10015/> explain SELECT id AS pvid
. . . . . . . . . . . . . . . . . .> , dt
. . . . . . . . . . . . . . . . . .> , req_time
. . . . . . . . . . . . . . . . . .> , SUM(click) AS click_cnt
. . . . . . . . . . . . . . . . . .> FROM dwd_flow
. . . . . . . . . . . . . . . . . .> WHERE dt >= '20210602'
. . . . . . . . . . . . . . . . . .> AND dt <= '20210617'
. . . . . . . . . . . . . . . . . .> AND from_unixtime(req_time, 'yyyyMMdd') >= '20210602'
. . . . . . . . . . . . . . . . . .> AND from_unixtime(req_time, 'yyyyMMdd') <= '20210617'
. . . . . . . . . . . . . . . . . .> GROUP BY id, dt, req_time;
INFO : Semantic Analysis Completed
INFO : Completed compiling command(queryId=hive_20210621222525_9c76944f-d2fa-4a0f-b089-40db7e79a74e); Time taken: 0.739 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Starting task [Stage-2:EXPLAIN] in serial mode
INFO : Completed executing command(queryId=hive_20210621222525_9c76944f-d2fa-4a0f-b089-40db7e79a74e); Time taken: 0.003 seconds
INFO : OK
+----------------------------------------------------+--+
| Explain |
+----------------------------------------------------+--+
| STAGE DEPENDENCIES: |
| Stage-1 is a root stage |
| Stage-0 depends on stages: Stage-1 |
| |
| STAGE PLANS: |
| Stage: Stage-1 |
| Spark |
| Edges: |
| Reducer 2 <- Map 1 (GROUP, 1) |
| DagName: hive_20210621222525_9c76944f-d2fa-4a0f-b089-40db7e79a74e:11882 |
| Vertices: |
| Map 1 |
| Map Operator Tree: |
| TableScan |
| alias: dwd_flow |
| filterExpr: ((from_unixtime(req_time, 'yyyyMMdd') >= '20210602') and (from_unixtime(req_time, 'yyyyMMdd') <= '20210617')) (type: boolean) |
| Statistics: Num rows: 17229612672 Data size: 1207269292666 Basic stats: COMPLETE Column stats: PARTIAL |
| Filter Operator |
| predicate: ((from_unixtime(req_time, 'yyyyMMdd') >= '20210602') and (from_unixtime(req_time, 'yyyyMMdd') <= '20210617')) (type: boolean) |
| Statistics: Num rows: 1914401408 Data size: 352249859072 Basic stats: COMPLETE Column stats: PARTIAL |
| Group By Operator |
| aggregations: sum(click) |
| keys: id (type: string), dt (type: string), req_time (type: int) |
| mode: hash |
| outputColumnNames: _col0, _col1, _col2, _col3 |
| Statistics: Num rows: 9432 Data size: 1810944 Basic stats: COMPLETE Column stats: PARTIAL |
| Reduce Output Operator |
| key expressions: _col0 (type: string), _col1 (type: string), _col2 (type: int) |
| sort order: +++ |
| Map-reduce partition columns: _col0 (type: string), _col1 (type: string), _col2 (type: int) |
| Statistics: Num rows: 9432 Data size: 1810944 Basic stats: COMPLETE Column stats: PARTIAL |
| value expressions: _col3 (type: bigint) |
| Reducer 2 |
| Reduce Operator Tree: |
| Group By Operator |
| aggregations: sum(VALUE._col0) |
| keys: KEY._col0 (type: string), KEY._col1 (type: string), KEY._col2 (type: int) |
| mode: mergepartial |
| outputColumnNames: _col0, _col1, _col2, _col3 |
| Statistics: Num rows: 2 Data size: 384 Basic stats: COMPLETE Column stats: PARTIAL |
| File Output Operator |
| compressed: false |
| Statistics: Num rows: 2 Data size: 384 Basic stats: COMPLETE Column stats: PARTIAL |
| table: |
| input format: org.apache.hadoop.mapred.TextInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| ListSink |
| |
+----------------------------------------------------+--+
------------------------------------------------------------------------
Without dt in the GROUP BY (and with the extra filters on id), the execution plan has 250 reduce tasks
0: jdbc:hive2://172.0.0.1:10015/> SET hive.execution.engine=spark;
No rows affected (0.012 seconds)
0: jdbc:hive2://172.0.0.1:10015/>
0: jdbc:hive2://172.0.0.1:10015/> explain SELECT id AS pvid
. . . . . . . . . . . . . . . . . .> , req_time
. . . . . . . . . . . . . . . . . .> , SUM(click) AS click_cnt
. . . . . . . . . . . . . . . . . .> FROM dwd_flow
. . . . . . . . . . . . . . . . . .> WHERE dt >= '20210602'
. . . . . . . . . . . . . . . . . .> AND dt <= '20210617'
. . . . . . . . . . . . . . . . . .> AND from_unixtime(req_time, 'yyyyMMdd') >= '20210602'
. . . . . . . . . . . . . . . . . .> AND from_unixtime(req_time, 'yyyyMMdd') <= '20210617'
. . . . . . . . . . . . . . . . . .> AND id IS NOT NULL
. . . . . . . . . . . . . . . . . .> AND id != ''
. . . . . . . . . . . . . . . . . .> GROUP BY id, req_time;
INFO : Semantic Analysis Completed
INFO : Completed compiling command(queryId=hive_20210621222525_8faef825-1a1f-4971-82eb-a64e923b2fa9); Time taken: 0.785 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Starting task [Stage-2:EXPLAIN] in serial mode
INFO : Completed executing command(queryId=hive_20210621222525_8faef825-1a1f-4971-82eb-a64e923b2fa9); Time taken: 0.002 seconds
INFO : OK
+----------------------------------------------------+--+
| Explain |
+----------------------------------------------------+--+
| STAGE DEPENDENCIES: |
| Stage-1 is a root stage |
| Stage-0 depends on stages: Stage-1 |
| |
| STAGE PLANS: |
| Stage: Stage-1 |
| Spark |
| Edges: |
| Reducer 2 <- Map 1 (GROUP, 250) |
| DagName: hive_20210621222525_8faef825-1a1f-4971-82eb-a64e923b2fa9:11883 |
| Vertices: |
| Map 1 |
| Map Operator Tree: |
| TableScan |
| alias: dwd_flow |
| filterExpr: ((((from_unixtime(req_time, 'yyyyMMdd') >= '20210602') and (from_unixtime(req_time, 'yyyyMMdd') <= '20210617')) and id is not null) and (id <> '')) (type: boolean) |
| Statistics: Num rows: 17229612672 Data size: 1207269292666 Basic stats: COMPLETE Column stats: NONE |
| Filter Operator |
| predicate: ((((from_unixtime(req_time, 'yyyyMMdd') >= '20210602') and (from_unixtime(req_time, 'yyyyMMdd') <= '20210617')) and id is not null) and (id <> '')) (type: boolean) |
| Statistics: Num rows: 957200704 Data size: 67070516259 Basic stats: COMPLETE Column stats: NONE |
| Group By Operator |
| aggregations: sum(click) |
| keys: id (type: string), req_time (type: int) |
| mode: hash |
| outputColumnNames: _col0, _col1, _col2 |
| Statistics: Num rows: 957200704 Data size: 67070516259 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| key expressions: _col0 (type: string), _col1 (type: int) |
| sort order: ++ |
| Map-reduce partition columns: _col0 (type: string), _col1 (type: int) |
| Statistics: Num rows: 957200704 Data size: 67070516259 Basic stats: COMPLETE Column stats: NONE |
| value expressions: _col2 (type: bigint) |
| Reducer 2 |
| Reduce Operator Tree: |
| Group By Operator |
| aggregations: sum(VALUE._col0) |
| keys: KEY._col0 (type: string), KEY._col1 (type: int) |
| mode: mergepartial |
| outputColumnNames: _col0, _col1, _col2 |
| Statistics: Num rows: 478600352 Data size: 33535258129 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| Statistics: Num rows: 478600352 Data size: 33535258129 Basic stats: COMPLETE Column stats: NONE |
| table: |
| input format: org.apache.hadoop.mapred.TextInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| ListSink |
| |
+----------------------------------------------------+--+
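
The two plans differ mainly in the size estimate at the Reduce Output Operator: 1810944 bytes with Column stats: PARTIAL (GROUP BY id, dt, req_time) versus 67070516259 bytes with Column stats: NONE (GROUP BY id, req_time). A rough sketch of how such an estimate could turn into the reducer counts 1 and 250 follows. It assumes the count is ceil(estimated bytes / bytes per reducer), clamped to a maximum. The 256 MiB per reducer and the cap of 1009 are assumptions, not values shown in the transcript, and estimate_reducers is a hypothetical helper, not Hive's actual code.

```python
import math

# Assumed settings; neither value appears in the transcript.
BYTES_PER_REDUCER = 256 * 1024 * 1024  # assumed 256 MiB of input per reducer
MAX_REDUCERS = 1009                    # assumed upper bound on reducers


def estimate_reducers(data_size_bytes,
                      bytes_per_reducer=BYTES_PER_REDUCER,
                      max_reducers=MAX_REDUCERS):
    """Hypothetical helper: ceil(size / bytes_per_reducer), clamped to [1, max_reducers]."""
    wanted = math.ceil(data_size_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, wanted))


# "Data size" reported at the Reduce Output Operator of each plan.
with_dt = estimate_reducers(1810944)         # GROUP BY id, dt, req_time
without_dt = estimate_reducers(67070516259)  # GROUP BY id, req_time

print(with_dt, without_dt)  # prints: 1 250
```

Under these assumptions the sketch reproduces both edges, (GROUP, 1) and (GROUP, 250). That would suggest the single reduce task comes from the very small row and size estimate (9432 rows) produced when partial column statistics are used, rather than from the actual data volume.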