Note:
Hive version 2.1.1
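Sampling overview
When the data volume is very large and processing the entire data set is difficult, sampling becomes especially important. Sampling lets you estimate and infer the properties of the whole from the sampled data. It is an economical and effective method widely used in scientific experiments, quality inspection, and social surveys.
Hive supports three kinds of data sampling:
- Random sampling
- Bucket table sampling
- Block sampling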
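1. Random sampling
Hive has a random function, rand(). We can sample a table by ordering its rows on rand() and then using a limit clause to cap the number of sampled rows returned.
When distribute by rand() and sort by rand() are used instead of order by rand(), distribute by assigns rows to reducers at random and sort by orders them randomly within each reducer, so the data is randomly distributed across the mapper and reducer stages.
Code: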
select * from ods_fact_sale order by rand() limit 20;
select * from ods_fact_sale where sale_date = '2011-08-16 00:00:00.0' distribute by rand() sort by rand() limit 10;
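Test log:
As the test log shows, random sampling performs poorly because it has to sort. The order by rand() query took 814 seconds and the distribute by / sort by query took 1517 seconds; each launched 117 mappers and read about 31 GB from HDFS. The benefit is that later work on the small sample is cheaper than work on the full data set.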
hive>
> select * from ods_fact_sale order by rand() limit 20;
Query ID = root_20201231105936_75f9fb76-9149-4884-8faf-4254fd1e3b30
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
20/12/31 10:59:37 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1609141291605_0022, Tracking URL = http://hp3:8088/proxy/application_1609141291605_0022/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1609141291605_0022
Hadoop job information for Stage-1: number of mappers: 117; number of reducers: 1
2020-12-31 10:59:46,944 Stage-1 map = 0%, reduce = 0%
2020-12-31 11:00:01,475 Stage-1 map = 1%, reduce = 0%, Cumulative CPU 13.27 sec
2020-12-31 11:00:02,506 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 27.08 sec
2020-12-31 11:00:13,893 Stage-1 map = 3%, reduce = 0%, Cumulative CPU 50.64 sec
2020-12-31 11:00:25,199 Stage-1 map = 5%, reduce = 0%, Cumulative CPU 74.57 sec
2020-12-31 11:00:37,526 Stage-1 map = 7%, reduce = 0%, Cumulative CPU 99.39 sec
2020-12-31 11:00:49,832 Stage-1 map = 9%, reduce = 0%, Cumulative CPU 123.39 sec
2020-12-31 11:01:01,139 Stage-1 map = 10%, reduce = 0%, Cumulative CPU 147.16 sec
2020-12-31 11:01:12,412 Stage-1 map = 12%, reduce = 0%, Cumulative CPU 170.94 sec
2020-12-31 11:01:24,721 Stage-1 map = 14%, reduce = 0%, Cumulative CPU 194.79 sec
2020-12-31 11:01:35,987 Stage-1 map = 15%, reduce = 0%, Cumulative CPU 206.61 sec
2020-12-31 11:01:47,263 Stage-1 map = 16%, reduce = 0%, Cumulative CPU 230.32 sec
2020-12-31 11:01:49,314 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 242.44 sec
2020-12-31 11:01:58,542 Stage-1 map = 18%, reduce = 0%, Cumulative CPU 254.01 sec
2020-12-31 11:02:00,591 Stage-1 map = 19%, reduce = 0%, Cumulative CPU 266.02 sec
2020-12-31 11:02:09,819 Stage-1 map = 20%, reduce = 0%, Cumulative CPU 277.88 sec
2020-12-31 11:02:12,895 Stage-1 map = 21%, reduce = 0%, Cumulative CPU 289.49 sec
2020-12-31 11:02:24,167 Stage-1 map = 22%, reduce = 0%, Cumulative CPU 312.52 sec
2020-12-31 11:02:31,327 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 324.24 sec
2020-12-31 11:02:34,390 Stage-1 map = 24%, reduce = 0%, Cumulative CPU 336.04 sec
2020-12-31 11:02:42,588 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 348.11 sec
2020-12-31 11:02:45,663 Stage-1 map = 26%, reduce = 0%, Cumulative CPU 359.64 sec
2020-12-31 11:02:56,917 Stage-1 map = 27%, reduce = 0%, Cumulative CPU 383.53 sec
2020-12-31 11:03:06,149 Stage-1 map = 28%, reduce = 0%, Cumulative CPU 395.32 sec
2020-12-31 11:03:09,227 Stage-1 map = 29%, reduce = 0%, Cumulative CPU 407.11 sec
2020-12-31 11:03:16,393 Stage-1 map = 30%, reduce = 0%, Cumulative CPU 418.82 sec
2020-12-31 11:03:19,467 Stage-1 map = 31%, reduce = 0%, Cumulative CPU 430.2 sec
2020-12-31 11:03:27,645 Stage-1 map = 32%, reduce = 0%, Cumulative CPU 441.85 sec
2020-12-31 11:03:38,914 Stage-1 map = 33%, reduce = 0%, Cumulative CPU 465.44 sec
2020-12-31 11:03:43,008 Stage-1 map = 34%, reduce = 0%, Cumulative CPU 477.26 sec
2020-12-31 11:03:51,199 Stage-1 map = 35%, reduce = 0%, Cumulative CPU 489.1 sec
2020-12-31 11:03:55,286 Stage-1 map = 36%, reduce = 0%, Cumulative CPU 500.86 sec
2020-12-31 11:04:02,462 Stage-1 map = 37%, reduce = 0%, Cumulative CPU 512.68 sec
2020-12-31 11:04:06,560 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 524.44 sec
2020-12-31 11:04:17,815 Stage-1 map = 39%, reduce = 0%, Cumulative CPU 548.46 sec
2020-12-31 11:04:23,958 Stage-1 map = 40%, reduce = 0%, Cumulative CPU 560.07 sec
2020-12-31 11:04:30,092 Stage-1 map = 41%, reduce = 0%, Cumulative CPU 572.02 sec
2020-12-31 11:04:35,194 Stage-1 map = 42%, reduce = 0%, Cumulative CPU 584.2 sec
2020-12-31 11:04:42,337 Stage-1 map = 43%, reduce = 0%, Cumulative CPU 595.69 sec
2020-12-31 11:04:47,456 Stage-1 map = 44%, reduce = 0%, Cumulative CPU 607.62 sec
2020-12-31 11:04:58,717 Stage-1 map = 45%, reduce = 0%, Cumulative CPU 631.07 sec
2020-12-31 11:05:03,819 Stage-1 map = 46%, reduce = 0%, Cumulative CPU 642.81 sec
2020-12-31 11:05:09,966 Stage-1 map = 47%, reduce = 0%, Cumulative CPU 654.43 sec
2020-12-31 11:05:15,077 Stage-1 map = 48%, reduce = 0%, Cumulative CPU 665.79 sec
2020-12-31 11:05:21,220 Stage-1 map = 49%, reduce = 0%, Cumulative CPU 677.29 sec
2020-12-31 11:05:26,334 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 688.58 sec
2020-12-31 11:05:38,617 Stage-1 map = 51%, reduce = 0%, Cumulative CPU 710.6 sec
2020-12-31 11:05:41,688 Stage-1 map = 52%, reduce = 0%, Cumulative CPU 723.47 sec
2020-12-31 11:05:49,869 Stage-1 map = 53%, reduce = 0%, Cumulative CPU 735.39 sec
2020-12-31 11:05:52,936 Stage-1 map = 54%, reduce = 0%, Cumulative CPU 747.02 sec
2020-12-31 11:06:01,119 Stage-1 map = 55%, reduce = 0%, Cumulative CPU 759.4 sec
2020-12-31 11:06:05,217 Stage-1 map = 56%, reduce = 0%, Cumulative CPU 771.34 sec
2020-12-31 11:06:16,493 Stage-1 map = 57%, reduce = 0%, Cumulative CPU 795.58 sec
2020-12-31 11:06:25,733 Stage-1 map = 58%, reduce = 0%, Cumulative CPU 807.6 sec
2020-12-31 11:06:28,797 Stage-1 map = 59%, reduce = 0%, Cumulative CPU 819.42 sec
2020-12-31 11:06:38,003 Stage-1 map = 60%, reduce = 0%, Cumulative CPU 831.37 sec
2020-12-31 11:06:39,030 Stage-1 map = 61%, reduce = 0%, Cumulative CPU 842.84 sec
2020-12-31 11:06:49,244 Stage-1 map = 62%, reduce = 0%, Cumulative CPU 854.85 sec
2020-12-31 11:07:00,504 Stage-1 map = 63%, reduce = 0%, Cumulative CPU 879.03 sec
2020-12-31 11:07:01,528 Stage-1 map = 64%, reduce = 0%, Cumulative CPU 891.09 sec
2020-12-31 11:07:11,764 Stage-1 map = 65%, reduce = 0%, Cumulative CPU 902.4 sec
2020-12-31 11:07:13,809 Stage-1 map = 66%, reduce = 0%, Cumulative CPU 913.91 sec
2020-12-31 11:07:24,033 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 937.48 sec
2020-12-31 11:07:36,294 Stage-1 map = 69%, reduce = 0%, Cumulative CPU 961.98 sec
2020-12-31 11:07:47,557 Stage-1 map = 70%, reduce = 0%, Cumulative CPU 973.78 sec
2020-12-31 11:07:48,577 Stage-1 map = 71%, reduce = 0%, Cumulative CPU 986.12 sec
2020-12-31 11:07:58,802 Stage-1 map = 72%, reduce = 0%, Cumulative CPU 997.56 sec
2020-12-31 11:07:59,822 Stage-1 map = 73%, reduce = 0%, Cumulative CPU 1009.51 sec
2020-12-31 11:08:10,088 Stage-1 map = 74%, reduce = 0%, Cumulative CPU 1021.66 sec
2020-12-31 11:08:21,359 Stage-1 map = 75%, reduce = 0%, Cumulative CPU 1045.31 sec
2020-12-31 11:08:22,387 Stage-1 map = 76%, reduce = 0%, Cumulative CPU 1057.46 sec
2020-12-31 11:08:32,659 Stage-1 map = 77%, reduce = 0%, Cumulative CPU 1069.52 sec
2020-12-31 11:08:34,714 Stage-1 map = 78%, reduce = 0%, Cumulative CPU 1081.69 sec
2020-12-31 11:08:44,962 Stage-1 map = 79%, reduce = 0%, Cumulative CPU 1093.68 sec
2020-12-31 11:08:57,248 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 1117.85 sec
2020-12-31 11:08:59,303 Stage-1 map = 81%, reduce = 0%, Cumulative CPU 1129.42 sec
2020-12-31 11:09:08,562 Stage-1 map = 82%, reduce = 0%, Cumulative CPU 1141.26 sec
2020-12-31 11:09:14,718 Stage-1 map = 82%, reduce = 27%, Cumulative CPU 1142.0 sec
2020-12-31 11:09:19,840 Stage-1 map = 83%, reduce = 27%, Cumulative CPU 1153.68 sec
2020-12-31 11:09:25,991 Stage-1 map = 83%, reduce = 28%, Cumulative CPU 1153.9 sec
2020-12-31 11:09:31,094 Stage-1 map = 84%, reduce = 28%, Cumulative CPU 1165.59 sec
2020-12-31 11:09:42,338 Stage-1 map = 85%, reduce = 28%, Cumulative CPU 1177.7 sec
2020-12-31 11:10:04,855 Stage-1 map = 86%, reduce = 28%, Cumulative CPU 1201.17 sec
2020-12-31 11:10:08,950 Stage-1 map = 86%, reduce = 29%, Cumulative CPU 1201.22 sec
2020-12-31 11:10:16,121 Stage-1 map = 87%, reduce = 29%, Cumulative CPU 1212.59 sec
2020-12-31 11:10:27,370 Stage-1 map = 88%, reduce = 29%, Cumulative CPU 1224.54 sec
2020-12-31 11:10:37,625 Stage-1 map = 89%, reduce = 29%, Cumulative CPU 1236.28 sec
2020-12-31 11:10:38,646 Stage-1 map = 89%, reduce = 30%, Cumulative CPU 1236.32 sec
2020-12-31 11:10:48,884 Stage-1 map = 90%, reduce = 30%, Cumulative CPU 1248.28 sec
2020-12-31 11:11:00,139 Stage-1 map = 91%, reduce = 30%, Cumulative CPU 1260.32 sec
2020-12-31 11:11:23,680 Stage-1 map = 92%, reduce = 30%, Cumulative CPU 1283.88 sec
2020-12-31 11:11:26,757 Stage-1 map = 92%, reduce = 31%, Cumulative CPU 1283.92 sec
2020-12-31 11:11:34,927 Stage-1 map = 93%, reduce = 31%, Cumulative CPU 1295.65 sec
2020-12-31 11:11:46,182 Stage-1 map = 94%, reduce = 31%, Cumulative CPU 1308.17 sec
2020-12-31 11:11:58,462 Stage-1 map = 95%, reduce = 31%, Cumulative CPU 1320.31 sec
2020-12-31 11:12:02,563 Stage-1 map = 95%, reduce = 32%, Cumulative CPU 1320.36 sec
2020-12-31 11:12:08,713 Stage-1 map = 96%, reduce = 32%, Cumulative CPU 1331.91 sec
2020-12-31 11:12:19,990 Stage-1 map = 97%, reduce = 32%, Cumulative CPU 1343.48 sec
2020-12-31 11:12:43,556 Stage-1 map = 98%, reduce = 32%, Cumulative CPU 1367.18 sec
2020-12-31 11:12:45,603 Stage-1 map = 98%, reduce = 33%, Cumulative CPU 1367.24 sec
2020-12-31 11:12:55,854 Stage-1 map = 99%, reduce = 33%, Cumulative CPU 1378.97 sec
2020-12-31 11:13:07,114 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 1390.76 sec
2020-12-31 11:13:09,162 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1392.76 sec
MapReduce Total cumulative CPU time: 23 minutes 12 seconds 760 msec
Ended Job = job_1609141291605_0022
MapReduce Jobs Launched:
Stage-Stage-1: Map: 117 Reduce: 1 Cumulative CPU: 1392.76 sec HDFS Read: 31436905540 HDFS Write: 1147 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 23 minutes 12 seconds 760 msec
OK
654691105 2011-05-25 00:00:00.0 PROD10 53
752859493 2011-11-08 00:00:00.0 PROD4 92
620442730 2010-06-11 00:00:00.0 PROD5 22
524983813 2011-04-11 00:00:00.0 PROD6 31
89887602 2010-08-18 00:00:00.0 PROD7 45
93701058 2011-10-31 00:00:00.0 PROD4 62
739459682 2011-01-15 00:00:00.0 PROD4 93
480818608 2010-07-12 00:00:00.0 PROD2 87
457915153 2011-09-09 00:00:00.0 PROD9 85
405422684 2011-11-23 00:00:00.0 PROD10 86
322983965 2012-04-06 00:00:00.0 PROD8 7
588940412 2010-08-15 00:00:00.0 PROD8 51
421954935 2012-01-24 00:00:00.0 PROD4 17
749374812 2010-12-12 00:00:00.0 PROD4 62
298315594 2010-06-13 00:00:00.0 PROD5 75
723116860 2011-01-17 00:00:00.0 PROD10 89
167011022 2011-01-20 00:00:00.0 PROD4 69
430667509 2011-07-07 00:00:00.0 PROD6 63
665176804 2012-08-25 00:00:00.0 PROD7 77
648219864 2012-05-15 00:00:00.0 PROD7 74
Time taken: 814.055 seconds, Fetched: 20 row(s)
hive>
> select * from ods_fact_sale where sale_date = '2011-08-16 00:00:00.0' distribute by rand() sort by rand() limit 10;
Query ID = root_20201231135813_71f7d916-8e6f-4c7a-846f-49b78194da8d
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 469
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
20/12/31 13:58:14 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1609141291605_0023, Tracking URL = http://hp3:8088/proxy/application_1609141291605_0023/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1609141291605_0023
Hadoop job information for Stage-1: number of mappers: 117; number of reducers: 469
2020-12-31 13:58:21,609 Stage-1 map = 0%, reduce = 0%
2020-12-31 13:58:31,907 Stage-1 map = 1%, reduce = 0%, Cumulative CPU 7.78 sec
2020-12-31 13:58:32,938 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 17.16 sec
2020-12-31 13:58:39,109 Stage-1 map = 3%, reduce = 0%, Cumulative CPU 24.81 sec
2020-12-31 13:58:46,309 Stage-1 map = 4%, reduce = 0%, Cumulative CPU 41.49 sec
2020-12-31 13:58:49,396 Stage-1 map = 5%, reduce = 0%, Cumulative CPU 50.29 sec
2020-12-31 13:58:52,477 Stage-1 map = 6%, reduce = 0%, Cumulative CPU 57.84 sec
2020-12-31 13:58:56,588 Stage-1 map = 7%, reduce = 0%, Cumulative CPU 66.39 sec
2020-12-31 13:58:59,672 Stage-1 map = 8%, reduce = 0%, Cumulative CPU 73.76 sec
2020-12-31 13:59:04,828 Stage-1 map = 9%, reduce = 0%, Cumulative CPU 81.84 sec
2020-12-31 13:59:12,006 Stage-1 map = 10%, reduce = 0%, Cumulative CPU 97.09 sec
2020-12-31 13:59:14,061 Stage-1 map = 11%, reduce = 0%, Cumulative CPU 104.49 sec
2020-12-31 13:59:20,215 Stage-1 map = 12%, reduce = 0%, Cumulative CPU 112.37 sec
2020-12-31 13:59:21,246 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 119.89 sec
2020-12-31 13:59:28,440 Stage-1 map = 15%, reduce = 0%, Cumulative CPU 135.55 sec
2020-12-31 13:59:36,643 Stage-1 map = 16%, reduce = 0%, Cumulative CPU 151.18 sec
2020-12-31 13:59:41,772 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 158.62 sec
2020-12-31 13:59:44,854 Stage-1 map = 18%, reduce = 0%, Cumulative CPU 166.61 sec
2020-12-31 13:59:48,955 Stage-1 map = 19%, reduce = 0%, Cumulative CPU 173.93 sec
2020-12-31 13:59:52,032 Stage-1 map = 20%, reduce = 0%, Cumulative CPU 182.11 sec
2020-12-31 13:59:56,135 Stage-1 map = 21%, reduce = 0%, Cumulative CPU 189.52 sec
2020-12-31 14:00:03,337 Stage-1 map = 22%, reduce = 0%, Cumulative CPU 204.95 sec
2020-12-31 14:00:08,474 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 212.83 sec
2020-12-31 14:00:10,529 Stage-1 map = 24%, reduce = 0%, Cumulative CPU 220.27 sec
2020-12-31 14:00:16,678 Stage-1 map = 26%, reduce = 0%, Cumulative CPU 235.53 sec
2020-12-31 14:00:24,875 Stage-1 map = 27%, reduce = 0%, Cumulative CPU 251.0 sec
2020-12-31 14:00:31,029 Stage-1 map = 28%, reduce = 0%, Cumulative CPU 258.48 sec
2020-12-31 14:00:33,084 Stage-1 map = 29%, reduce = 0%, Cumulative CPU 266.53 sec
2020-12-31 14:00:38,214 Stage-1 map = 30%, reduce = 0%, Cumulative CPU 273.81 sec
2020-12-31 14:00:40,263 Stage-1 map = 31%, reduce = 0%, Cumulative CPU 281.73 sec
2020-12-31 14:00:45,380 Stage-1 map = 32%, reduce = 0%, Cumulative CPU 289.15 sec
2020-12-31 14:00:52,560 Stage-1 map = 33%, reduce = 0%, Cumulative CPU 304.46 sec
2020-12-31 14:00:56,667 Stage-1 map = 34%, reduce = 0%, Cumulative CPU 312.72 sec
2020-12-31 14:00:59,737 Stage-1 map = 35%, reduce = 0%, Cumulative CPU 320.25 sec
2020-12-31 14:01:04,867 Stage-1 map = 36%, reduce = 0%, Cumulative CPU 328.38 sec
2020-12-31 14:01:05,893 Stage-1 map = 37%, reduce = 0%, Cumulative CPU 335.74 sec
2020-12-31 14:01:13,071 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 350.86 sec
2020-12-31 14:01:20,251 Stage-1 map = 40%, reduce = 0%, Cumulative CPU 366.17 sec
2020-12-31 14:01:27,416 Stage-1 map = 41%, reduce = 0%, Cumulative CPU 373.71 sec
2020-12-31 14:01:28,442 Stage-1 map = 42%, reduce = 0%, Cumulative CPU 381.35 sec
2020-12-31 14:01:34,585 Stage-1 map = 43%, reduce = 0%, Cumulative CPU 388.67 sec
2020-12-31 14:01:35,607 Stage-1 map = 44%, reduce = 0%, Cumulative CPU 396.63 sec
2020-12-31 14:01:43,802 Stage-1 map = 45%, reduce = 0%, Cumulative CPU 412.13 sec
2020-12-31 14:01:48,929 Stage-1 map = 46%, reduce = 0%, Cumulative CPU 419.64 sec
2020-12-31 14:01:52,005 Stage-1 map = 47%, reduce = 0%, Cumulative CPU 427.68 sec
2020-12-31 14:01:54,056 Stage-1 map = 48%, reduce = 0%, Cumulative CPU 433.61 sec
2020-12-31 14:02:00,199 Stage-1 map = 49%, reduce = 0%, Cumulative CPU 441.53 sec
2020-12-31 14:02:02,274 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 449.88 sec
2020-12-31 14:02:09,465 Stage-1 map = 51%, reduce = 0%, Cumulative CPU 465.79 sec
2020-12-31 14:02:15,630 Stage-1 map = 52%, reduce = 0%, Cumulative CPU 473.67 sec
2020-12-31 14:02:17,687 Stage-1 map = 53%, reduce = 0%, Cumulative CPU 481.54 sec
2020-12-31 14:02:23,854 Stage-1 map = 54%, reduce = 0%, Cumulative CPU 489.38 sec
2020-12-31 14:02:25,905 Stage-1 map = 55%, reduce = 0%, Cumulative CPU 497.17 sec
2020-12-31 14:02:33,084 Stage-1 map = 56%, reduce = 0%, Cumulative CPU 506.49 sec
2020-12-31 14:02:41,293 Stage-1 map = 57%, reduce = 0%, Cumulative CPU 523.33 sec
2020-12-31 14:02:43,344 Stage-1 map = 58%, reduce = 0%, Cumulative CPU 532.39 sec
2020-12-31 14:02:49,496 Stage-1 map = 59%, reduce = 0%, Cumulative CPU 540.88 sec
2020-12-31 14:02:51,541 Stage-1 map = 60%, reduce = 0%, Cumulative CPU 548.98 sec
2020-12-31 14:02:57,686 Stage-1 map = 61%, reduce = 0%, Cumulative CPU 557.3 sec
2020-12-31 14:03:00,757 Stage-1 map = 62%, reduce = 0%, Cumulative CPU 567.09 sec
2020-12-31 14:03:08,949 Stage-1 map = 63%, reduce = 0%, Cumulative CPU 585.16 sec
2020-12-31 14:03:13,044 Stage-1 map = 64%, reduce = 0%, Cumulative CPU 593.36 sec
2020-12-31 14:03:18,168 Stage-1 map = 65%, reduce = 0%, Cumulative CPU 602.73 sec
2020-12-31 14:03:21,233 Stage-1 map = 66%, reduce = 0%, Cumulative CPU 610.98 sec
2020-12-31 14:03:27,362 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 620.83 sec
2020-12-31 14:03:29,405 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 629.08 sec
2020-12-31 14:03:37,585 Stage-1 map = 69%, reduce = 0%, Cumulative CPU 646.57 sec
2020-12-31 14:03:43,726 Stage-1 map = 70%, reduce = 0%, Cumulative CPU 654.23 sec
2020-12-31 14:03:45,778 Stage-1 map = 71%, reduce = 0%, Cumulative CPU 662.3 sec
2020-12-31 14:03:50,901 Stage-1 map = 72%, reduce = 0%, Cumulative CPU 670.0 sec
2020-12-31 14:03:52,949 Stage-1 map = 73%, reduce = 0%, Cumulative CPU 678.12 sec
2020-12-31 14:03:58,069 Stage-1 map = 74%, reduce = 0%, Cumulative CPU 685.59 sec
2020-12-31 14:04:05,244 Stage-1 map = 75%, reduce = 0%, Cumulative CPU 701.02 sec
2020-12-31 14:04:09,343 Stage-1 map = 76%, reduce = 0%, Cumulative CPU 709.68 sec
2020-12-31 14:04:12,416 Stage-1 map = 77%, reduce = 0%, Cumulative CPU 717.45 sec
2020-12-31 14:04:17,540 Stage-1 map = 78%, reduce = 0%, Cumulative CPU 717.45 sec
2020-12-31 14:04:19,590 Stage-1 map = 79%, reduce = 0%, Cumulative CPU 734.92 sec
2020-12-31 14:04:28,796 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 752.8 sec
2020-12-31 14:04:33,910 Stage-1 map = 81%, reduce = 0%, Cumulative CPU 761.43 sec
2020-12-31 14:04:38,010 Stage-1 map = 82%, reduce = 0%, Cumulative CPU 771.05 sec
2020-12-31 14:04:46,214 Stage-1 map = 83%, reduce = 0%, Cumulative CPU 780.66 sec
2020-12-31 14:04:55,479 Stage-1 map = 84%, reduce = 0%, Cumulative CPU 791.47 sec
2020-12-31 14:05:04,703 Stage-1 map = 85%, reduce = 0%, Cumulative CPU 801.23 sec
2020-12-31 14:05:22,125 Stage-1 map = 86%, reduce = 0%, Cumulative CPU 820.39 sec
2020-12-31 14:05:31,339 Stage-1 map = 87%, reduce = 0%, Cumulative CPU 830.03 sec
2020-12-31 14:05:40,555 Stage-1 map = 88%, reduce = 0%, Cumulative CPU 839.37 sec
2020-12-31 14:05:49,765 Stage-1 map = 89%, reduce = 0%, Cumulative CPU 848.73 sec
2020-12-31 14:05:57,956 Stage-1 map = 90%, reduce = 0%, Cumulative CPU 848.87 sec
2020-12-31 14:06:07,193 Stage-1 map = 91%, reduce = 0%, Cumulative CPU 866.8 sec
2020-12-31 14:06:21,535 Stage-1 map = 92%, reduce = 0%, Cumulative CPU 882.17 sec
2020-12-31 14:06:28,722 Stage-1 map = 93%, reduce = 0%, Cumulative CPU 890.1 sec
2020-12-31 14:06:34,871 Stage-1 map = 94%, reduce = 0%, Cumulative CPU 897.74 sec
2020-12-31 14:06:42,042 Stage-1 map = 95%, reduce = 0%, Cumulative CPU 905.22 sec
2020-12-31 14:06:49,212 Stage-1 map = 96%, reduce = 0%, Cumulative CPU 912.75 sec
2020-12-31 14:06:56,368 Stage-1 map = 97%, reduce = 0%, Cumulative CPU 920.31 sec
2020-12-31 14:07:10,714 Stage-1 map = 98%, reduce = 0%, Cumulative CPU 935.37 sec
2020-12-31 14:07:16,855 Stage-1 map = 99%, reduce = 0%, Cumulative CPU 943.06 sec
2020-12-31 14:07:24,004 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 950.73 sec
2020-12-31 14:07:30,153 Stage-1 map = 100%, reduce = 1%, Cumulative CPU 957.08 sec
2020-12-31 14:07:40,370 Stage-1 map = 100%, reduce = 2%, Cumulative CPU 968.59 sec
2020-12-31 14:07:47,548 Stage-1 map = 100%, reduce = 3%, Cumulative CPU 977.62 sec
2020-12-31 14:07:57,780 Stage-1 map = 100%, reduce = 4%, Cumulative CPU 989.98 sec
2020-12-31 14:08:08,025 Stage-1 map = 100%, reduce = 5%, Cumulative CPU 1002.35 sec
2020-12-31 14:08:16,225 Stage-1 map = 100%, reduce = 6%, Cumulative CPU 1012.48 sec
2020-12-31 14:08:26,474 Stage-1 map = 100%, reduce = 7%, Cumulative CPU 1023.83 sec
2020-12-31 14:08:35,686 Stage-1 map = 100%, reduce = 8%, Cumulative CPU 1035.25 sec
2020-12-31 14:08:43,876 Stage-1 map = 100%, reduce = 9%, Cumulative CPU 1044.42 sec
2020-12-31 14:08:54,122 Stage-1 map = 100%, reduce = 10%, Cumulative CPU 1056.78 sec
2020-12-31 14:09:04,381 Stage-1 map = 100%, reduce = 11%, Cumulative CPU 1068.06 sec
2020-12-31 14:09:12,577 Stage-1 map = 100%, reduce = 12%, Cumulative CPU 1077.07 sec
2020-12-31 14:09:22,814 Stage-1 map = 100%, reduce = 13%, Cumulative CPU 1089.5 sec
2020-12-31 14:09:32,020 Stage-1 map = 100%, reduce = 14%, Cumulative CPU 1100.92 sec
2020-12-31 14:09:42,265 Stage-1 map = 100%, reduce = 15%, Cumulative CPU 1112.77 sec
2020-12-31 14:09:50,446 Stage-1 map = 100%, reduce = 16%, Cumulative CPU 1122.02 sec
2020-12-31 14:10:00,697 Stage-1 map = 100%, reduce = 17%, Cumulative CPU 1134.19 sec
2020-12-31 14:10:10,947 Stage-1 map = 100%, reduce = 18%, Cumulative CPU 1145.76 sec
2020-12-31 14:10:18,126 Stage-1 map = 100%, reduce = 19%, Cumulative CPU 1154.87 sec
2020-12-31 14:10:28,387 Stage-1 map = 100%, reduce = 20%, Cumulative CPU 1166.3 sec
2020-12-31 14:10:38,627 Stage-1 map = 100%, reduce = 21%, Cumulative CPU 1178.67 sec
2020-12-31 14:10:46,829 Stage-1 map = 100%, reduce = 22%, Cumulative CPU 1188.38 sec
2020-12-31 14:10:56,045 Stage-1 map = 100%, reduce = 23%, Cumulative CPU 1199.71 sec
2020-12-31 14:11:06,291 Stage-1 map = 100%, reduce = 24%, Cumulative CPU 1211.25 sec
2020-12-31 14:11:14,480 Stage-1 map = 100%, reduce = 25%, Cumulative CPU 1220.47 sec
2020-12-31 14:11:24,728 Stage-1 map = 100%, reduce = 26%, Cumulative CPU 1231.71 sec
2020-12-31 14:11:34,956 Stage-1 map = 100%, reduce = 27%, Cumulative CPU 1243.07 sec
2020-12-31 14:11:43,155 Stage-1 map = 100%, reduce = 28%, Cumulative CPU 1252.2 sec
2020-12-31 14:11:52,379 Stage-1 map = 100%, reduce = 29%, Cumulative CPU 1263.42 sec
2020-12-31 14:12:02,628 Stage-1 map = 100%, reduce = 30%, Cumulative CPU 1274.71 sec
2020-12-31 14:12:12,877 Stage-1 map = 100%, reduce = 31%, Cumulative CPU 1285.68 sec
2020-12-31 14:12:21,081 Stage-1 map = 100%, reduce = 32%, Cumulative CPU 1294.61 sec
2020-12-31 14:12:32,371 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 1308.38 sec
2020-12-31 14:12:40,558 Stage-1 map = 100%, reduce = 34%, Cumulative CPU 1317.37 sec
2020-12-31 14:12:48,765 Stage-1 map = 100%, reduce = 35%, Cumulative CPU 1326.48 sec
2020-12-31 14:13:00,064 Stage-1 map = 100%, reduce = 36%, Cumulative CPU 1337.97 sec
2020-12-31 14:13:08,279 Stage-1 map = 100%, reduce = 37%, Cumulative CPU 1348.92 sec
2020-12-31 14:13:16,478 Stage-1 map = 100%, reduce = 38%, Cumulative CPU 1358.16 sec
2020-12-31 14:13:27,748 Stage-1 map = 100%, reduce = 39%, Cumulative CPU 1369.61 sec
2020-12-31 14:13:36,954 Stage-1 map = 100%, reduce = 40%, Cumulative CPU 1381.52 sec
2020-12-31 14:13:45,162 Stage-1 map = 100%, reduce = 41%, Cumulative CPU 1390.5 sec
2020-12-31 14:13:56,426 Stage-1 map = 100%, reduce = 42%, Cumulative CPU 1404.04 sec
2020-12-31 14:14:04,630 Stage-1 map = 100%, reduce = 43%, Cumulative CPU 1413.17 sec
2020-12-31 14:14:15,907 Stage-1 map = 100%, reduce = 44%, Cumulative CPU 1424.15 sec
2020-12-31 14:14:24,108 Stage-1 map = 100%, reduce = 45%, Cumulative CPU 1433.07 sec
2020-12-31 14:14:33,324 Stage-1 map = 100%, reduce = 46%, Cumulative CPU 1444.27 sec
2020-12-31 14:14:44,594 Stage-1 map = 100%, reduce = 47%, Cumulative CPU 1457.57 sec
2020-12-31 14:14:51,765 Stage-1 map = 100%, reduce = 48%, Cumulative CPU 1464.38 sec
2020-12-31 14:15:00,997 Stage-1 map = 100%, reduce = 49%, Cumulative CPU 1475.56 sec
2020-12-31 14:15:12,284 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 1486.63 sec
2020-12-31 14:15:20,479 Stage-1 map = 100%, reduce = 51%, Cumulative CPU 1495.52 sec
2020-12-31 14:15:28,690 Stage-1 map = 100%, reduce = 52%, Cumulative CPU 1506.78 sec
2020-12-31 14:15:39,970 Stage-1 map = 100%, reduce = 53%, Cumulative CPU 1518.01 sec
2020-12-31 14:15:48,182 Stage-1 map = 100%, reduce = 54%, Cumulative CPU 1527.01 sec
2020-12-31 14:15:57,401 Stage-1 map = 100%, reduce = 55%, Cumulative CPU 1538.2 sec
2020-12-31 14:16:08,678 Stage-1 map = 100%, reduce = 56%, Cumulative CPU 1551.7 sec
2020-12-31 14:16:16,895 Stage-1 map = 100%, reduce = 57%, Cumulative CPU 1560.42 sec
2020-12-31 14:16:25,107 Stage-1 map = 100%, reduce = 58%, Cumulative CPU 1569.44 sec
2020-12-31 14:16:36,370 Stage-1 map = 100%, reduce = 59%, Cumulative CPU 1580.69 sec
2020-12-31 14:16:45,596 Stage-1 map = 100%, reduce = 60%, Cumulative CPU 1592.01 sec
2020-12-31 14:16:52,776 Stage-1 map = 100%, reduce = 61%, Cumulative CPU 1601.0 sec
2020-12-31 14:17:05,078 Stage-1 map = 100%, reduce = 62%, Cumulative CPU 1614.51 sec
2020-12-31 14:17:13,284 Stage-1 map = 100%, reduce = 63%, Cumulative CPU 1623.31 sec
2020-12-31 14:17:21,491 Stage-1 map = 100%, reduce = 64%, Cumulative CPU 1632.19 sec
2020-12-31 14:17:32,762 Stage-1 map = 100%, reduce = 65%, Cumulative CPU 1643.45 sec
2020-12-31 14:17:40,961 Stage-1 map = 100%, reduce = 66%, Cumulative CPU 1654.75 sec
2020-12-31 14:17:49,164 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 1663.65 sec
2020-12-31 14:18:00,460 Stage-1 map = 100%, reduce = 68%, Cumulative CPU 1674.82 sec
2020-12-31 14:18:09,686 Stage-1 map = 100%, reduce = 69%, Cumulative CPU 1685.96 sec
2020-12-31 14:18:18,922 Stage-1 map = 100%, reduce = 70%, Cumulative CPU 1694.86 sec
2020-12-31 14:18:29,166 Stage-1 map = 100%, reduce = 71%, Cumulative CPU 1706.34 sec
2020-12-31 14:18:38,369 Stage-1 map = 100%, reduce = 72%, Cumulative CPU 1717.78 sec
2020-12-31 14:18:48,641 Stage-1 map = 100%, reduce = 73%, Cumulative CPU 1728.87 sec
2020-12-31 14:18:56,848 Stage-1 map = 100%, reduce = 74%, Cumulative CPU 1737.66 sec
2020-12-31 14:19:06,087 Stage-1 map = 100%, reduce = 75%, Cumulative CPU 1748.69 sec
2020-12-31 14:19:16,359 Stage-1 map = 100%, reduce = 76%, Cumulative CPU 1759.78 sec
2020-12-31 14:19:24,564 Stage-1 map = 100%, reduce = 77%, Cumulative CPU 1768.65 sec
2020-12-31 14:19:34,806 Stage-1 map = 100%, reduce = 78%, Cumulative CPU 1779.88 sec
2020-12-31 14:19:45,062 Stage-1 map = 100%, reduce = 79%, Cumulative CPU 1791.24 sec
2020-12-31 14:19:53,273 Stage-1 map = 100%, reduce = 80%, Cumulative CPU 1800.31 sec
2020-12-31 14:20:02,499 Stage-1 map = 100%, reduce = 81%, Cumulative CPU 1811.28 sec
2020-12-31 14:20:12,750 Stage-1 map = 100%, reduce = 82%, Cumulative CPU 1822.45 sec
2020-12-31 14:20:20,981 Stage-1 map = 100%, reduce = 83%, Cumulative CPU 1831.51 sec
2020-12-31 14:20:31,219 Stage-1 map = 100%, reduce = 84%, Cumulative CPU 1843.78 sec
2020-12-31 14:20:41,474 Stage-1 map = 100%, reduce = 85%, Cumulative CPU 1855.11 sec
2020-12-31 14:20:48,653 Stage-1 map = 100%, reduce = 86%, Cumulative CPU 1864.26 sec
2020-12-31 14:20:58,906 Stage-1 map = 100%, reduce = 87%, Cumulative CPU 1875.36 sec
2020-12-31 14:21:09,160 Stage-1 map = 100%, reduce = 88%, Cumulative CPU 1886.85 sec
2020-12-31 14:21:19,417 Stage-1 map = 100%, reduce = 89%, Cumulative CPU 1897.92 sec
2020-12-31 14:21:26,596 Stage-1 map = 100%, reduce = 90%, Cumulative CPU 1907.18 sec
2020-12-31 14:21:36,846 Stage-1 map = 100%, reduce = 91%, Cumulative CPU 1918.57 sec
2020-12-31 14:21:47,109 Stage-1 map = 100%, reduce = 92%, Cumulative CPU 1929.52 sec
2020-12-31 14:21:55,303 Stage-1 map = 100%, reduce = 93%, Cumulative CPU 1938.42 sec
2020-12-31 14:22:05,571 Stage-1 map = 100%, reduce = 94%, Cumulative CPU 1949.77 sec
2020-12-31 14:22:14,793 Stage-1 map = 100%, reduce = 95%, Cumulative CPU 1960.81 sec
2020-12-31 14:22:23,001 Stage-1 map = 100%, reduce = 96%, Cumulative CPU 1969.72 sec
2020-12-31 14:22:33,270 Stage-1 map = 100%, reduce = 97%, Cumulative CPU 1981.0 sec
2020-12-31 14:22:43,503 Stage-1 map = 100%, reduce = 98%, Cumulative CPU 1992.01 sec
2020-12-31 14:22:50,683 Stage-1 map = 100%, reduce = 99%, Cumulative CPU 2000.89 sec
2020-12-31 14:23:05,030 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2016.56 sec
MapReduce Total cumulative CPU time: 33 minutes 36 seconds 560 msec
Ended Job = job_1609141291605_0023
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
20/12/31 14:23:06 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1609141291605_0024, Tracking URL = http://hp3:8088/proxy/application_1609141291605_0024/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1609141291605_0024
Hadoop job information for Stage-2: number of mappers: 2; number of reducers: 1
2020-12-31 14:23:17,212 Stage-2 map = 0%, reduce = 0%
2020-12-31 14:23:24,475 Stage-2 map = 50%, reduce = 0%, Cumulative CPU 5.37 sec
2020-12-31 14:23:25,505 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 11.9 sec
2020-12-31 14:23:30,651 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 14.14 sec
MapReduce Total cumulative CPU time: 14 seconds 140 msec
Ended Job = job_1609141291605_0024
MapReduce Jobs Launched:
Stage-Stage-1: Map: 117 Reduce: 469 Cumulative CPU: 2016.56 sec HDFS Read: 31438766866 HDFS Write: 79070 HDFS EC Read: 0 SUCCESS
Stage-Stage-2: Map: 2 Reduce: 1 Cumulative CPU: 14.14 sec HDFS Read: 207188 HDFS Write: 614 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 33 minutes 50 seconds 700 msec
OK
601096637 2011-08-16 00:00:00.0 PROD10 28
7504198 2011-08-16 00:00:00.0 PROD7 22
7666912 2011-08-16 00:00:00.0 PROD7 70
393337914 2011-08-16 00:00:00.0 PROD5 55
98814403 2011-08-16 00:00:00.0 PROD4 45
744615937 2011-08-16 00:00:00.0 PROD7 73
124859277 2011-08-16 00:00:00.0 PROD3 69
212317100 2011-08-16 00:00:00.0 PROD10 48
504809117 2011-08-16 00:00:00.0 PROD3 33
268235827 2011-08-16 00:00:00.0 PROD9 91
Time taken: 1517.782 seconds, Fetched: 10 row(s)
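Both runs above shuffle every row that reaches the reduce stage. A common variation, shown here only as a sketch, filters on rand() in the map stage first so that far fewer rows are distributed and sorted. The 0.0001 threshold is an illustrative assumption, not a value from the tests above. Choose it so that the threshold multiplied by the table's row count comfortably exceeds the limit, otherwise fewer rows than requested may come back. This was not timed on the test cluster, and the full table is still scanned. Only the shuffle and sort should shrink.

```sql
-- Sketch only: pre-filter on rand() before the random shuffle.
-- The 0.0001 threshold is an assumed, untuned value.
select *
from ods_fact_sale
where rand() <= 0.0001      -- keeps roughly 0.01% of rows in the map stage
distribute by rand()        -- spread the surviving rows randomly over reducers
sort by rand()              -- random order within each reducer
limit 20;
```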
2. Bucket table sampling
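Hive supports bucket table sampling and block sampling. A bucket table is a table whose buckets were defined with the CLUSTERED BY clause when the table was created. The syntax for bucket table sampling is:
table_sample: TABLESAMPLE (BUCKET x OUT OF y [ON colname])
The TABLESAMPLE clause lets you write a query against a sample of the data instead of the whole table. It appears in the FROM clause and can be used on any table. Bucket numbering starts at 1. colname names the column to sample on. It can be any non-partition column, or you can use rand() to sample on the entire row rather than on a single column. Rows are assigned on colname to one of y buckets, numbered 1 through y, and the rows belonging to bucket x are returned. The example below returns rows from bucket 1 of 100:
Code: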
-- Randomly sample about one percent of the data; the limit caps the output at 100 rows
select * from ods_fact_sale tablesample(bucket 1 out of 100 on rand()) limit 100;
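The query above samples with rand() on a table that is not necessarily bucketed. For comparison, here is a minimal sketch of sampling a table that really was created with CLUSTERED BY. The table name ods_fact_sale_bucketed and the column names id, prod_name and sale_nums are hypothetical, chosen only for illustration. Only sale_date appears in the queries above. The sketch also assumes ods_fact_sale has four columns in this order with compatible types. According to the Hive documentation, when the column in the TABLESAMPLE clause matches the CLUSTERED BY column, Hive can read only the bucket files it needs instead of scanning the whole table. This was not verified in the tests here.

```sql
-- Hypothetical bucketed copy of the table; names and types are assumptions.
create table ods_fact_sale_bucketed (
  id        bigint,
  sale_date string,
  prod_name string,
  sale_nums int
)
clustered by (id) into 32 buckets;

-- Populate the bucketed table from the original table.
insert overwrite table ods_fact_sale_bucketed
select * from ods_fact_sale;

-- Sample on the bucketing column: return rows from bucket 3 of the 32 buckets.
select *
from ods_fact_sale_bucketed tablesample(bucket 3 out of 32 on id)
limit 10;
```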
Test log:
hive> select * from ods_fact_sale tablesample(bucket 1 out of 100 on rand()) limit 100;
Query ID = root_20210106102309_b7fd3c38-74f3-4877-bf44-d5bb24a62a93
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
21/01/06 10:23:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1609141291605_0029, Tracking URL = http://hp3:8088/proxy/application_1609141291605_0029/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1609141291605_0029
Hadoop job information for Stage-1: number of mappers: 117; number of reducers: 0
2021-01-06 10:23:18,751 Stage-1 map = 0%, reduce = 0%
2021-01-06 10:23:26,042 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 7.34 sec
2021-01-06 10:23:30,196 Stage-1 map = 3%, reduce = 0%, Cumulative CPU 14.66 sec
2021-01-06 10:23:34,325 Stage-1 map = 5%, reduce = 0%, Cumulative CPU 21.83 sec
2021-01-06 10:23:38,447 Stage-1 map = 7%, reduce = 0%, Cumulative CPU 29.09 sec
2021-01-06 10:23:42,571 Stage-1 map = 8%, reduce = 0%, Cumulative CPU 32.67 sec
2021-01-06 10:23:43,616 Stage-1 map = 9%, reduce = 0%, Cumulative CPU 36.22 sec
2021-01-06 10:23:46,695 Stage-1 map = 10%, reduce = 0%, Cumulative CPU 43.32 sec
2021-01-06 10:23:49,779 Stage-1 map = 11%, reduce = 0%, Cumulative CPU 46.9 sec
2021-01-06 10:23:50,812 Stage-1 map = 12%, reduce = 0%, Cumulative CPU 50.38 sec
2021-01-06 10:23:53,928 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 53.95 sec
2021-01-06 10:23:54,958 Stage-1 map = 14%, reduce = 0%, Cumulative CPU 57.4 sec
2021-01-06 10:23:58,044 Stage-1 map = 15%, reduce = 0%, Cumulative CPU 60.99 sec
2021-01-06 10:24:02,163 Stage-1 map = 16%, reduce = 0%, Cumulative CPU 68.22 sec
2021-01-06 10:24:03,201 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 71.85 sec
2021-01-06 10:24:06,282 Stage-1 map = 18%, reduce = 0%, Cumulative CPU 75.38 sec
2021-01-06 10:24:07,315 Stage-1 map = 19%, reduce = 0%, Cumulative CPU 78.96 sec
2021-01-06 10:24:10,398 Stage-1 map = 20%, reduce = 0%, Cumulative CPU 82.35 sec
2021-01-06 10:24:11,427 Stage-1 map = 21%, reduce = 0%, Cumulative CPU 85.78 sec
2021-01-06 10:24:15,539 Stage-1 map = 22%, reduce = 0%, Cumulative CPU 92.86 sec
2021-01-06 10:24:18,621 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 96.44 sec
2021-01-06 10:24:19,646 Stage-1 map = 24%, reduce = 0%, Cumulative CPU 100.07 sec
2021-01-06 10:24:22,724 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 103.7 sec
2021-01-06 10:24:23,749 Stage-1 map = 26%, reduce = 0%, Cumulative CPU 107.35 sec
2021-01-06 10:24:26,827 Stage-1 map = 27%, reduce = 0%, Cumulative CPU 114.56 sec
2021-01-06 10:24:29,950 Stage-1 map = 28%, reduce = 0%, Cumulative CPU 118.18 sec
2021-01-06 10:24:30,972 Stage-1 map = 29%, reduce = 0%, Cumulative CPU 121.63 sec
2021-01-06 10:24:34,056 Stage-1 map = 30%, reduce = 0%, Cumulative CPU 125.19 sec
2021-01-06 10:24:35,083 Stage-1 map = 31%, reduce = 0%, Cumulative CPU 128.71 sec
2021-01-06 10:24:38,153 Stage-1 map = 32%, reduce = 0%, Cumulative CPU 132.27 sec
2021-01-06 10:24:42,256 Stage-1 map = 33%, reduce = 0%, Cumulative CPU 139.28 sec
2021-01-06 10:24:43,284 Stage-1 map = 34%, reduce = 0%, Cumulative CPU 142.67 sec
2021-01-06 10:24:46,354 Stage-1 map = 35%, reduce = 0%, Cumulative CPU 146.25 sec
2021-01-06 10:24:47,379 Stage-1 map = 36%, reduce = 0%, Cumulative CPU 149.89 sec
2021-01-06 10:24:50,443 Stage-1 map = 37%, reduce = 0%, Cumulative CPU 153.59 sec
2021-01-06 10:24:51,469 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 157.12 sec
2021-01-06 10:24:55,594 Stage-1 map = 39%, reduce = 0%, Cumulative CPU 164.4 sec
2021-01-06 10:24:58,671 Stage-1 map = 40%, reduce = 0%, Cumulative CPU 168.07 sec
2021-01-06 10:24:59,695 Stage-1 map = 41%, reduce = 0%, Cumulative CPU 171.6 sec
2021-01-06 10:25:03,822 Stage-1 map = 43%, reduce = 0%, Cumulative CPU 179.17 sec
2021-01-06 10:25:07,927 Stage-1 map = 44%, reduce = 0%, Cumulative CPU 186.31 sec
2021-01-06 10:25:12,024 Stage-1 map = 46%, reduce = 0%, Cumulative CPU 193.51 sec
2021-01-06 10:25:16,114 Stage-1 map = 48%, reduce = 0%, Cumulative CPU 200.55 sec
2021-01-06 10:25:20,199 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 207.57 sec
2021-01-06 10:25:24,294 Stage-1 map = 51%, reduce = 0%, Cumulative CPU 214.76 sec
2021-01-06 10:25:28,387 Stage-1 map = 53%, reduce = 0%, Cumulative CPU 221.97 sec
2021-01-06 10:25:32,493 Stage-1 map = 55%, reduce = 0%, Cumulative CPU 228.8 sec
2021-01-06 10:25:36,581 Stage-1 map = 56%, reduce = 0%, Cumulative CPU 235.82 sec
2021-01-06 10:25:40,678 Stage-1 map = 58%, reduce = 0%, Cumulative CPU 243.01 sec
2021-01-06 10:25:44,771 Stage-1 map = 60%, reduce = 0%, Cumulative CPU 250.35 sec
2021-01-06 10:25:48,863 Stage-1 map = 62%, reduce = 0%, Cumulative CPU 257.45 sec
2021-01-06 10:25:52,971 Stage-1 map = 63%, reduce = 0%, Cumulative CPU 264.79 sec
2021-01-06 10:25:56,067 Stage-1 map = 64%, reduce = 0%, Cumulative CPU 268.38 sec
2021-01-06 10:25:57,091 Stage-1 map = 65%, reduce = 0%, Cumulative CPU 272.06 sec
2021-01-06 10:26:00,161 Stage-1 map = 66%, reduce = 0%, Cumulative CPU 275.58 sec
2021-01-06 10:26:01,181 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 279.07 sec
2021-01-06 10:26:04,288 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 282.68 sec
2021-01-06 10:26:08,389 Stage-1 map = 69%, reduce = 0%, Cumulative CPU 289.73 sec
2021-01-06 10:26:09,413 Stage-1 map = 70%, reduce = 0%, Cumulative CPU 293.31 sec
2021-01-06 10:26:12,491 Stage-1 map = 71%, reduce = 0%, Cumulative CPU 296.94 sec
2021-01-06 10:26:13,517 Stage-1 map = 72%, reduce = 0%, Cumulative CPU 300.47 sec
2021-01-06 10:26:16,646 Stage-1 map = 73%, reduce = 0%, Cumulative CPU 304.17 sec
2021-01-06 10:26:17,667 Stage-1 map = 74%, reduce = 0%, Cumulative CPU 307.78 sec
2021-01-06 10:26:21,775 Stage-1 map = 75%, reduce = 0%, Cumulative CPU 314.72 sec
2021-01-06 10:26:24,842 Stage-1 map = 76%, reduce = 0%, Cumulative CPU 318.24 sec
2021-01-06 10:26:25,864 Stage-1 map = 77%, reduce = 0%, Cumulative CPU 321.84 sec
2021-01-06 10:26:28,990 Stage-1 map = 78%, reduce = 0%, Cumulative CPU 325.54 sec
2021-01-06 10:26:30,010 Stage-1 map = 79%, reduce = 0%, Cumulative CPU 328.98 sec
2021-01-06 10:26:33,100 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 335.9 sec
2021-01-06 10:26:36,182 Stage-1 map = 81%, reduce = 0%, Cumulative CPU 339.46 sec
2021-01-06 10:26:37,208 Stage-1 map = 82%, reduce = 0%, Cumulative CPU 342.94 sec
2021-01-06 10:26:40,278 Stage-1 map = 83%, reduce = 0%, Cumulative CPU 346.48 sec
2021-01-06 10:26:41,299 Stage-1 map = 84%, reduce = 0%, Cumulative CPU 350.03 sec
2021-01-06 10:26:44,368 Stage-1 map = 85%, reduce = 0%, Cumulative CPU 353.74 sec
2021-01-06 10:26:48,470 Stage-1 map = 86%, reduce = 0%, Cumulative CPU 360.97 sec
2021-01-06 10:26:49,519 Stage-1 map = 87%, reduce = 0%, Cumulative CPU 364.47 sec
2021-01-06 10:26:52,587 Stage-1 map = 88%, reduce = 0%, Cumulative CPU 367.97 sec
2021-01-06 10:26:53,612 Stage-1 map = 89%, reduce = 0%, Cumulative CPU 371.48 sec
2021-01-06 10:26:56,714 Stage-1 map = 90%, reduce = 0%, Cumulative CPU 375.17 sec
2021-01-06 10:26:57,740 Stage-1 map = 91%, reduce = 0%, Cumulative CPU 378.75 sec
2021-01-06 10:27:01,849 Stage-1 map = 92%, reduce = 0%, Cumulative CPU 386.16 sec
2021-01-06 10:27:04,955 Stage-1 map = 93%, reduce = 0%, Cumulative CPU 389.82 sec
2021-01-06 10:27:05,978 Stage-1 map = 94%, reduce = 0%, Cumulative CPU 393.33 sec
2021-01-06 10:27:09,048 Stage-1 map = 95%, reduce = 0%, Cumulative CPU 397.01 sec
2021-01-06 10:27:10,079 Stage-1 map = 96%, reduce = 0%, Cumulative CPU 400.49 sec
2021-01-06 10:27:13,200 Stage-1 map = 97%, reduce = 0%, Cumulative CPU 404.31 sec
2021-01-06 10:27:16,273 Stage-1 map = 98%, reduce = 0%, Cumulative CPU 411.48 sec
2021-01-06 10:27:18,312 Stage-1 map = 99%, reduce = 0%, Cumulative CPU 414.97 sec
2021-01-06 10:27:20,354 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 418.35 sec
MapReduce Total cumulative CPU time: 6 minutes 58 seconds 350 msec
Ended Job = job_1609141291605_0029
MapReduce Jobs Launched:
Stage-Stage-1: Map: 117 Cumulative CPU: 418.35 sec HDFS Read: 62555036 HDFS Write: 629015 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 6 minutes 58 seconds 350 msec
OK
169387977 2011-01-30 00:00:00.0 PROD10 87
169387995 2011-05-10 00:00:00.0 PROD6 86
169388013 2011-04-14 00:00:00.0 PROD10 46
169388092 2010-06-07 00:00:00.0 PROD3 34
169388149 2010-06-21 00:00:00.0 PROD7 16
169388210 2011-10-05 00:00:00.0 PROD3 85
169388272 2012-02-27 00:00:00.0 PROD6 65
169388359 2012-08-30 00:00:00.0 PROD10 10
169388383 2011-11-09 00:00:00.0 PROD5 95
169388414 2011-07-25 00:00:00.0 PROD2 35
169388433 2011-05-18 00:00:00.0 PROD9 85
169388697 2010-12-25 00:00:00.0 PROD9 20
169388811 2012-04-03 00:00:00.0 PROD9 49
169388872 2010-11-23 00:00:00.0 PROD7 71
169388935 2012-04-18 00:00:00.0 PROD6 62
169389026 2011-03-21 00:00:00.0 PROD10 80
169389070 2010-09-09 00:00:00.0 PROD3 90
169389083 2010-05-20 00:00:00.0 PROD3 41
169389370 2011-01-28 00:00:00.0 PROD6 39
169389409 2012-08-09 00:00:00.0 PROD2 20
169389430 2012-08-23 00:00:00.0 PROD3 47
169389517 2011-10-25 00:00:00.0 PROD8 33
169389759 2010-09-03 00:00:00.0 PROD3 14
169389802 2010-08-22 00:00:00.0 PROD3 55
169389899 2012-01-14 00:00:00.0 PROD5 80
169389935 2010-06-10 00:00:00.0 PROD9 25
169390249 2010-09-05 00:00:00.0 PROD7 89
169390332 2012-07-28 00:00:00.0 PROD9 24
169390405 2011-09-30 00:00:00.0 PROD6 82
169390432 2010-09-04 00:00:00.0 PROD6 3
169390525 2011-04-24 00:00:00.0 PROD6 50
169390529 2012-06-29 00:00:00.0 PROD4 36
169390596 2011-09-29 00:00:00.0 PROD2 69
169390726 2011-01-09 00:00:00.0 PROD4 20
169390784 2011-08-20 00:00:00.0 PROD7 19
169390821 2010-07-14 00:00:00.0 PROD4 44
169390835 2010-09-24 00:00:00.0 PROD2 15
169390858 2012-08-08 00:00:00.0 PROD5 3
169391297 2011-03-24 00:00:00.0 PROD10 75
169391461 2012-03-14 00:00:00.0 PROD4 32
169391509 2010-11-23 00:00:00.0 PROD3 28
169391526 2012-03-28 00:00:00.0 PROD6 35
169391558 2011-02-21 00:00:00.0 PROD2 79
169391632 2010-10-09 00:00:00.0 PROD9 37
169391649 2012-09-22 00:00:00.0 PROD8 80
169391761 2011-03-15 00:00:00.0 PROD7 45
169391765 2011-01-23 00:00:00.0 PROD4 71
169391951 2012-03-08 00:00:00.0 PROD3 97
169392051 2011-05-13 00:00:00.0 PROD9 27
169392357 2010-05-22 00:00:00.0 PROD4 8
169392408 2011-01-06 00:00:00.0 PROD7 31
169392481 2012-07-25 00:00:00.0 PROD10 81
169392709 2012-08-12 00:00:00.0 PROD3 75
169392782 2012-07-28 00:00:00.0 PROD2 8
169392825 2011-03-14 00:00:00.0 PROD7 89
169392843 2010-10-31 00:00:00.0 PROD3 19
169392864 2011-05-19 00:00:00.0 PROD4 88
169392979 2012-05-11 00:00:00.0 PROD4 65
169393180 2011-05-02 00:00:00.0 PROD4 99
169393214 2011-10-27 00:00:00.0 PROD7 31
169393460 2012-07-27 00:00:00.0 PROD8 63
169393613 2011-03-03 00:00:00.0 PROD9 55
169393624 2010-04-24 00:00:00.0 PROD7 80
169393740 2011-08-17 00:00:00.0 PROD8 71
169394026 2012-06-07 00:00:00.0 PROD9 76
169394117 2012-02-29 00:00:00.0 PROD4 72
169394147 2011-12-23 00:00:00.0 PROD7 53
169394177 2011-01-07 00:00:00.0 PROD7 35
169394508 2012-05-24 00:00:00.0 PROD3 88
169394552 2011-07-16 00:00:00.0 PROD4 41
169394614 2010-08-17 00:00:00.0 PROD6 98
169394631 2010-09-23 00:00:00.0 PROD10 45
169394679 2011-01-22 00:00:00.0 PROD6 57
169394778 2011-09-03 00:00:00.0 PROD10 45
169394824 2011-06-04 00:00:00.0 PROD8 82
169394827 2010-07-14 00:00:00.0 PROD9 42
169394830 2012-03-09 00:00:00.0 PROD10 36
169394864 2010-09-17 00:00:00.0 PROD9 56
169394881 2011-07-01 00:00:00.0 PROD6 7
169395019 2011-11-17 00:00:00.0 PROD6 66
169395142 2012-01-21 00:00:00.0 PROD6 54
169395197 2012-08-10 00:00:00.0 PROD5 72
169395226 2010-09-20 00:00:00.0 PROD3 88
169395253 2011-12-31 00:00:00.0 PROD4 56
169395358 2010-07-16 00:00:00.0 PROD2 75
169395367 2010-12-16 00:00:00.0 PROD4 86
169395398 2012-01-07 00:00:00.0 PROD5 18
169395418 2011-05-08 00:00:00.0 PROD7 82
169395463 2011-08-23 00:00:00.0 PROD9 44
169395636 2011-01-16 00:00:00.0 PROD8 11
169395766 2012-06-05 00:00:00.0 PROD4 43
169395909 2011-12-10 00:00:00.0 PROD5 79
169395943 2012-05-11 00:00:00.0 PROD4 27
169395960 2012-01-17 00:00:00.0 PROD7 43
169396093 2011-08-28 00:00:00.0 PROD8 60
169396142 2010-11-13 00:00:00.0 PROD7 46
169396183 2011-06-16 00:00:00.0 PROD8 88
169396195 2010-10-06 00:00:00.0 PROD3 60
169396279 2012-06-18 00:00:00.0 PROD2 65
169396328 2011-05-14 00:00:00.0 PROD5 21
Time taken: 251.799 seconds, Fetched: 100 row(s)
hive>
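The run above samples on rand(), and the log shows that the job still launched 117 mappers. According to the Hive sampling manual, when the column in the TABLESAMPLE clause matches the CLUSTERED BY column of a bucketed table, only the required buckets are scanned. The following is a minimal sketch of that setup and was not part of the tests in this article. The table name ods_fact_sale_bucketed, the column names and types, and the bucket count of 32 are all assumptions and must be adapted to the real schema.

```sql
-- Hypothetical bucketed copy of ods_fact_sale. Column names and types are
-- assumptions inferred from the sample output, not taken from the real DDL.
create table ods_fact_sale_bucketed (
  id        bigint,
  sale_date string,
  prod_name string,
  sale_nums int
)
clustered by (id) into 32 buckets;

-- Populate the bucketed table; rows are distributed to buckets by the hash of id.
insert overwrite table ods_fact_sale_bucketed
select * from ods_fact_sale;

-- Sample on the bucketing column: returns rows from bucket 3 out of 32.
select * from ods_fact_sale_bucketed tablesample(bucket 3 out of 32 on id) s limit 100;
```

Sampling on a column instead of rand() is also repeatable: the same rows fall into the same bucket on every run, which helps when a result has to be reproduced.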
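3. Block sampling
1) tablesample(n percent): samples the given percentage of the table based on data size; the result can be saved to a new Hive table. For example, to take 10% of the original table:
create table xxx_new as select * from xxx tablesample(10 percent);
(Note: in testing, the select statement could not carry a where condition, and subqueries were not supported. A workaround is to create an intermediate table or to use random sampling instead.)
2) tablesample(n M): specifies the size of the sampled data, in megabytes.
3) tablesample(n rows): specifies the number of rows to sample. Here n applies to each map task, so every map task returns up to n rows. The number of map tasks can be read from the log of a simple query on the table (look for "number of mappers: x").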
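The three forms listed above can be sketched side by side as follows. The syntax follows the Hive sampling manual; the percentage, size, and row values are arbitrary and chosen only for illustration, and the alias s is optional.

```sql
-- 1) By percentage of input size: about 0.1% of the data.
select * from ods_fact_sale tablesample(0.1 percent) s;

-- 2) By data size: about 100 MB of input.
select * from ods_fact_sale tablesample(100M) s;

-- 3) By row count: up to 10 rows from each input split.
select * from ods_fact_sale tablesample(10 rows) s;
```

Percentage and size sampling work at the granularity of HDFS blocks, so the amount of data actually read can be larger than the amount requested.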
Code:
create table sample_test1 as select * from ods_fact_sale tablesample(10000 rows);
Test log:
hive>
> create table sample_test1 as select * from ods_fact_sale tablesample(10000 rows);
Query ID = root_20210106103549_9aaeea0b-6414-40ea-af0b-2942c80ad3a4
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
21/01/06 10:35:50 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1609141291605_0031, Tracking URL = http://hp3:8088/proxy/application_1609141291605_0031/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1609141291605_0031
Hadoop job information for Stage-1: number of mappers: 117; number of reducers: 0
2021-01-06 10:43:18,970 Stage-1 map = 0%, reduce = 0%
2021-01-06 10:43:25,150 Stage-1 map = 1%, reduce = 0%, Cumulative CPU 2.23 sec
2021-01-06 10:43:26,183 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 4.48 sec
2021-01-06 10:43:29,274 Stage-1 map = 3%, reduce = 0%, Cumulative CPU 6.63 sec
2021-01-06 10:43:33,375 Stage-1 map = 4%, reduce = 0%, Cumulative CPU 10.81 sec
2021-01-06 10:43:34,415 Stage-1 map = 5%, reduce = 0%, Cumulative CPU 13.01 sec
2021-01-06 10:43:37,513 Stage-1 map = 6%, reduce = 0%, Cumulative CPU 15.14 sec
2021-01-06 10:43:38,545 Stage-1 map = 7%, reduce = 0%, Cumulative CPU 17.34 sec
2021-01-06 10:43:41,660 Stage-1 map = 9%, reduce = 0%, Cumulative CPU 22.66 sec
2021-01-06 10:43:45,757 Stage-1 map = 10%, reduce = 0%, Cumulative CPU 27.05 sec
2021-01-06 10:43:48,836 Stage-1 map = 11%, reduce = 0%, Cumulative CPU 29.23 sec
2021-01-06 10:43:49,866 Stage-1 map = 12%, reduce = 0%, Cumulative CPU 31.3 sec
2021-01-06 10:43:53,953 Stage-1 map = 14%, reduce = 0%, Cumulative CPU 36.41 sec
2021-01-06 10:43:57,029 Stage-1 map = 15%, reduce = 0%, Cumulative CPU 38.51 sec
2021-01-06 10:44:01,131 Stage-1 map = 16%, reduce = 0%, Cumulative CPU 42.67 sec
2021-01-06 10:44:02,159 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 44.8 sec
2021-01-06 10:44:05,239 Stage-1 map = 18%, reduce = 0%, Cumulative CPU 47.68 sec
2021-01-06 10:44:06,263 Stage-1 map = 19%, reduce = 0%, Cumulative CPU 49.82 sec
2021-01-06 10:44:09,337 Stage-1 map = 20%, reduce = 0%, Cumulative CPU 51.91 sec
2021-01-06 10:44:10,363 Stage-1 map = 21%, reduce = 0%, Cumulative CPU 54.01 sec
2021-01-06 10:44:14,485 Stage-1 map = 22%, reduce = 0%, Cumulative CPU 58.3 sec
2021-01-06 10:44:17,602 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 60.69 sec
2021-01-06 10:44:18,629 Stage-1 map = 24%, reduce = 0%, Cumulative CPU 62.8 sec
2021-01-06 10:44:20,695 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 62.8 sec
2021-01-06 10:44:21,722 Stage-1 map = 26%, reduce = 0%, Cumulative CPU 67.1 sec
2021-01-06 10:44:25,807 Stage-1 map = 27%, reduce = 0%, Cumulative CPU 72.28 sec
2021-01-06 10:44:28,901 Stage-1 map = 28%, reduce = 0%, Cumulative CPU 74.58 sec
2021-01-06 10:44:29,928 Stage-1 map = 29%, reduce = 0%, Cumulative CPU 76.67 sec
2021-01-06 10:44:33,007 Stage-1 map = 30%, reduce = 0%, Cumulative CPU 78.79 sec
2021-01-06 10:44:34,028 Stage-1 map = 31%, reduce = 0%, Cumulative CPU 80.96 sec
2021-01-06 10:44:37,102 Stage-1 map = 32%, reduce = 0%, Cumulative CPU 83.07 sec
2021-01-06 10:44:41,245 Stage-1 map = 33%, reduce = 0%, Cumulative CPU 87.27 sec
2021-01-06 10:44:42,273 Stage-1 map = 34%, reduce = 0%, Cumulative CPU 89.43 sec
2021-01-06 10:44:45,358 Stage-1 map = 35%, reduce = 0%, Cumulative CPU 91.67 sec
2021-01-06 10:44:46,384 Stage-1 map = 36%, reduce = 0%, Cumulative CPU 93.74 sec
2021-01-06 10:44:49,455 Stage-1 map = 37%, reduce = 0%, Cumulative CPU 95.87 sec
2021-01-06 10:44:50,475 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 97.96 sec
2021-01-06 10:44:54,573 Stage-1 map = 39%, reduce = 0%, Cumulative CPU 102.2 sec
2021-01-06 10:44:57,641 Stage-1 map = 40%, reduce = 0%, Cumulative CPU 104.33 sec
2021-01-06 10:44:58,664 Stage-1 map = 41%, reduce = 0%, Cumulative CPU 106.43 sec
2021-01-06 10:45:01,731 Stage-1 map = 42%, reduce = 0%, Cumulative CPU 108.6 sec
2021-01-06 10:45:02,748 Stage-1 map = 43%, reduce = 0%, Cumulative CPU 110.65 sec
2021-01-06 10:45:05,815 Stage-1 map = 44%, reduce = 0%, Cumulative CPU 112.77 sec
2021-01-06 10:45:09,914 Stage-1 map = 45%, reduce = 0%, Cumulative CPU 117.8 sec
2021-01-06 10:45:11,961 Stage-1 map = 46%, reduce = 0%, Cumulative CPU 120.7 sec
2021-01-06 10:45:13,062 Stage-1 map = 47%, reduce = 0%, Cumulative CPU 122.89 sec
2021-01-06 10:45:15,114 Stage-1 map = 48%, reduce = 0%, Cumulative CPU 125.05 sec
2021-01-06 10:45:17,165 Stage-1 map = 49%, reduce = 0%, Cumulative CPU 127.48 sec
2021-01-06 10:45:19,206 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 129.7 sec
2021-01-06 10:45:23,292 Stage-1 map = 51%, reduce = 0%, Cumulative CPU 134.0 sec
2021-01-06 10:45:25,332 Stage-1 map = 52%, reduce = 0%, Cumulative CPU 136.29 sec
2021-01-06 10:45:27,388 Stage-1 map = 53%, reduce = 0%, Cumulative CPU 138.46 sec
2021-01-06 10:45:29,446 Stage-1 map = 54%, reduce = 0%, Cumulative CPU 140.55 sec
2021-01-06 10:45:31,492 Stage-1 map = 55%, reduce = 0%, Cumulative CPU 142.66 sec
2021-01-06 10:45:33,543 Stage-1 map = 56%, reduce = 0%, Cumulative CPU 144.88 sec
2021-01-06 10:45:37,635 Stage-1 map = 57%, reduce = 0%, Cumulative CPU 149.09 sec
2021-01-06 10:45:39,684 Stage-1 map = 58%, reduce = 0%, Cumulative CPU 151.19 sec
2021-01-06 10:45:41,722 Stage-1 map = 59%, reduce = 0%, Cumulative CPU 153.36 sec
2021-01-06 10:45:43,772 Stage-1 map = 60%, reduce = 0%, Cumulative CPU 155.63 sec
2021-01-06 10:45:45,845 Stage-1 map = 61%, reduce = 0%, Cumulative CPU 157.83 sec
2021-01-06 10:45:47,898 Stage-1 map = 62%, reduce = 0%, Cumulative CPU 160.0 sec
2021-01-06 10:45:50,964 Stage-1 map = 63%, reduce = 0%, Cumulative CPU 164.4 sec
2021-01-06 10:45:53,011 Stage-1 map = 64%, reduce = 0%, Cumulative CPU 166.52 sec
2021-01-06 10:45:56,082 Stage-1 map = 65%, reduce = 0%, Cumulative CPU 169.38 sec
2021-01-06 10:45:57,100 Stage-1 map = 66%, reduce = 0%, Cumulative CPU 171.54 sec
2021-01-06 10:45:59,150 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 173.78 sec
2021-01-06 10:46:01,196 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 175.86 sec
2021-01-06 10:46:05,279 Stage-1 map = 69%, reduce = 0%, Cumulative CPU 180.23 sec
2021-01-06 10:46:07,323 Stage-1 map = 70%, reduce = 0%, Cumulative CPU 182.35 sec
2021-01-06 10:46:09,367 Stage-1 map = 71%, reduce = 0%, Cumulative CPU 184.54 sec
2021-01-06 10:46:11,417 Stage-1 map = 72%, reduce = 0%, Cumulative CPU 186.79 sec
2021-01-06 10:46:13,466 Stage-1 map = 73%, reduce = 0%, Cumulative CPU 189.05 sec
2021-01-06 10:46:15,512 Stage-1 map = 74%, reduce = 0%, Cumulative CPU 191.37 sec
2021-01-06 10:46:19,604 Stage-1 map = 75%, reduce = 0%, Cumulative CPU 196.46 sec
2021-01-06 10:46:21,656 Stage-1 map = 76%, reduce = 0%, Cumulative CPU 198.58 sec
2021-01-06 10:46:23,700 Stage-1 map = 77%, reduce = 0%, Cumulative CPU 200.71 sec
2021-01-06 10:46:25,743 Stage-1 map = 78%, reduce = 0%, Cumulative CPU 202.83 sec
2021-01-06 10:46:27,790 Stage-1 map = 79%, reduce = 0%, Cumulative CPU 205.01 sec
2021-01-06 10:46:31,884 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 210.06 sec
2021-01-06 10:46:33,933 Stage-1 map = 81%, reduce = 0%, Cumulative CPU 212.22 sec
2021-01-06 10:46:36,001 Stage-1 map = 82%, reduce = 0%, Cumulative CPU 215.35 sec
2021-01-06 10:46:38,047 Stage-1 map = 83%, reduce = 0%, Cumulative CPU 217.44 sec
2021-01-06 10:46:40,097 Stage-1 map = 84%, reduce = 0%, Cumulative CPU 219.61 sec
2021-01-06 10:46:42,146 Stage-1 map = 85%, reduce = 0%, Cumulative CPU 221.86 sec
2021-01-06 10:46:45,215 Stage-1 map = 86%, reduce = 0%, Cumulative CPU 226.88 sec
2021-01-06 10:46:47,259 Stage-1 map = 87%, reduce = 0%, Cumulative CPU 229.08 sec
2021-01-06 10:46:50,350 Stage-1 map = 88%, reduce = 0%, Cumulative CPU 231.42 sec
2021-01-06 10:46:51,376 Stage-1 map = 89%, reduce = 0%, Cumulative CPU 233.67 sec
2021-01-06 10:46:53,421 Stage-1 map = 90%, reduce = 0%, Cumulative CPU 235.9 sec
2021-01-06 10:46:55,456 Stage-1 map = 91%, reduce = 0%, Cumulative CPU 238.13 sec
2021-01-06 10:46:59,543 Stage-1 map = 92%, reduce = 0%, Cumulative CPU 242.35 sec
2021-01-06 10:47:01,588 Stage-1 map = 93%, reduce = 0%, Cumulative CPU 244.55 sec
2021-01-06 10:47:03,636 Stage-1 map = 94%, reduce = 0%, Cumulative CPU 246.69 sec
2021-01-06 10:47:05,701 Stage-1 map = 95%, reduce = 0%, Cumulative CPU 248.95 sec
2021-01-06 10:47:07,755 Stage-1 map = 96%, reduce = 0%, Cumulative CPU 251.08 sec
2021-01-06 10:47:09,798 Stage-1 map = 97%, reduce = 0%, Cumulative CPU 253.23 sec
2021-01-06 10:47:13,877 Stage-1 map = 98%, reduce = 0%, Cumulative CPU 257.48 sec
2021-01-06 10:47:15,930 Stage-1 map = 99%, reduce = 0%, Cumulative CPU 259.56 sec
2021-01-06 10:47:17,973 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 262.3 sec
MapReduce Total cumulative CPU time: 4 minutes 22 seconds 300 msec
Ended Job = job_1609141291605_0031
Stage-4 is filtered out by condition resolver.
Stage-3 is selected by condition resolver.
Stage-5 is filtered out by condition resolver.
Launching Job 3 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
21/01/06 10:47:19 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1609141291605_0032, Tracking URL = http://hp3:8088/proxy/application_1609141291605_0032/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1609141291605_0032
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2021-01-06 10:47:29,293 Stage-3 map = 0%, reduce = 0%
2021-01-06 10:47:37,518 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 5.67 sec
MapReduce Total cumulative CPU time: 5 seconds 670 msec
Ended Job = job_1609141291605_0032
Moving data to directory hdfs://nameservice1/user/hive/warehouse/test.db/sample_test1
MapReduce Jobs Launched:
Stage-Stage-1: Map: 117 Cumulative CPU: 262.3 sec HDFS Read: 61853025 HDFS Write: 47856547 HDFS EC Read: 0 SUCCESS
Stage-Stage-3: Map: 1 Cumulative CPU: 5.67 sec HDFS Read: 47866656 HDFS Write: 47847187 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 4 minutes 27 seconds 970 msec
OK
Time taken: 709.589 seconds
hive>
> select count(*) from sample_test1;
Query ID = root_20210106105110_0c94562d-021f-45ac-bf4e-d0fa98dcf849
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
21/01/06 10:51:10 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1609141291605_0033, Tracking URL = http://hp3:8088/proxy/application_1609141291605_0033/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1609141291605_0033
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2021-01-06 10:51:17,757 Stage-1 map = 0%, reduce = 0%
2021-01-06 10:51:25,012 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.8 sec
2021-01-06 10:51:30,170 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.23 sec
MapReduce Total cumulative CPU time: 6 seconds 230 msec
Ended Job = job_1609141291605_0033
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 6.23 sec HDFS Read: 47855329 HDFS Write: 107 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 230 msec
OK
1170000
Time taken: 20.625 seconds, Fetched: 1 row(s)
hive>
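The count above is consistent with the per-map-task behaviour described for tablesample(n rows): 117 mappers × 10000 rows = 1170000 rows. If the goal is roughly 10000 rows in total, n can be derived by dividing the target by the number of mappers and rounding up: 10000 / 117 ≈ 85.5, so n = 86. The sketch below assumes the table still splits into 117 map tasks and that every split holds at least 86 rows; sample_test2 is a hypothetical table name.

```sql
-- Assumes 117 input splits, as reported in the log above ("number of mappers: 117").
-- 117 splits x 86 rows per split = 10062 rows, slightly above the 10000 target.
create table sample_test2 as select * from ods_fact_sale tablesample(86 rows);

-- Check the size of the resulting sample.
select count(*) from sample_test2;
```

Because the number of splits depends on file layout and input format settings, it is worth re-reading "number of mappers" from a recent job log before choosing n.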
References
1.https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Sampling
2.https://blog.csdn.net/baidu_20183817/article/details/84099049