Hive: Fast Queries in Hive (Avoiding MapReduce Queries)

2023-10-10 13:26:04

Avoiding MapReduce queries

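If you query a column of a table, Hive by default launches a MapReduce job to do the work.
However, we can set a parameter to avoid the MapReduce job. First, a few small points about queries that do not use MapReduce at all.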

hive (zb_dwd)> select * from  user_id  limit 1;
OK
14510812944   
Time taken: 1.608 seconds, Fetched: 1 row(s)

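In this case, Hive can simply read the files in the storage directory for user_id and print their formatted contents to the console.
Likewise, when the filter conditions in the WHERE clause are only on partition columns, no MapReduce step is needed either.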

hive (zb_dwd)> select * from  user_id  where date_id='20140512' limit 1;
OK
14510812944 
Time taken: 0.782 seconds, Fetched: 1 row(s)
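To make the point above concrete, here is a minimal Python sketch of what such a direct read amounts to. It assumes an uncompressed text-format table laid out the way Hive lays out partitioned tables (one col=value subdirectory per partition) and Hive's default \001 field delimiter; the helper name fetch_rows and the use of a local directory in place of HDFS are hypothetical. It illustrates why a filter on a partition column needs no MapReduce: the filter only chooses which directory to read, and LIMIT just stops reading early.

```python
import os
from typing import Optional

# Hive's default field delimiter for text tables is \001 (Ctrl-A).
FIELD_DELIM = "\x01"


def fetch_rows(table_dir: str, partition: Optional[dict[str, str]] = None,
               limit: int = 1) -> list[list[str]]:
    """Hypothetical helper: read rows straight from a table's files, no MapReduce.

    Assumes an uncompressed text table stored as table_dir/<col>=<value>/...;
    partition keys must be given in the table's partition-column order.
    """
    if limit <= 0:
        return []
    root = table_dir
    for col, val in (partition or {}).items():
        # Partition pruning: a filter on a partition column only picks a subdirectory.
        root = os.path.join(root, f"{col}={val}")

    rows: list[list[str]] = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk subdirectories in a deterministic order
        for name in sorted(filenames):
            if name.startswith(("_", ".")):
                continue  # Hadoop input formats skip hidden files such as _SUCCESS
            with open(os.path.join(dirpath, name), encoding="utf-8") as f:
                for line in f:
                    rows.append(line.rstrip("\n").split(FIELD_DELIM))
                    if len(rows) >= limit:
                        return rows  # LIMIT: stop as soon as enough rows are read
    return rows
```

With a partition such as {"date_id": "20140512"}, only the date_id=20140512 subdirectory is ever opened.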

Parameter settings

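Launching a MapReduce job for a Hive query carries system overhead. To address this, starting with Hive 0.10.0, simple queries that need no aggregation, such as SELECT <col> FROM <table> LIMIT n, can skip the MapReduce job and fetch their data directly through a Fetch task. There are three ways to enable this: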

1.

Running set hive.fetch.task.conversion=more; turns on the Fetch task, so simple column queries like the one above no longer launch a MapReduce job.

hive> set hive.fetch.task.conversion=more;
hive> SELECT id, money FROM m limit 10;
OK
1       122
1       185
1       231
1       292
1       316
1       329
1       355
1       356
1       362
1       364
Time taken: 0.138 seconds, Fetched: 10 row(s)
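The rules for which queries qualify come from the property description quoted in the hive-site.xml snippet of method 3: minimal allows only SELECT *, filters on partition columns and LIMIT, while more allows any column list and any filter, and both modes exclude subqueries, aggregations, DISTINCT, lateral views and joins. The Python sketch below encodes those rules as a simple predicate, which is why SELECT id, money FROM m limit 10 qualifies under more but would not under minimal. The SimpleQuery fields are a hypothetical summary of a parsed query, not a Hive API, and the sketch ignores TABLESAMPLE and virtual columns.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleQuery:
    """Hypothetical summary of a parsed query's features (not a Hive API)."""
    select_star: bool = True                               # SELECT * vs. a column list
    filter_columns: set[str] = field(default_factory=set)  # columns used in WHERE
    single_source: bool = True                             # one table, no subquery
    has_aggregation: bool = False                          # aggregates or DISTINCT
    has_join: bool = False
    has_lateral_view: bool = False


def can_use_fetch_task(q: SimpleQuery, partition_cols: set[str], mode: str) -> bool:
    """Sketch of the hive.fetch.task.conversion rules as described in hive-site.xml."""
    # Restrictions shared by both modes.
    if not q.single_source or q.has_aggregation or q.has_join or q.has_lateral_view:
        return False
    if mode == "minimal":
        # SELECT STAR, FILTER on partition columns, LIMIT only
        return q.select_star and q.filter_columns <= partition_cols
    if mode == "more":
        # SELECT (any columns), FILTER (any columns), LIMIT
        return True
    raise ValueError(f"unknown mode: {mode!r}")
```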

2.

Setting it when starting bin/hive

bin/hive --hiveconf hive.fetch.task.conversion=more
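If you start Hive from a script, --hiveconf can be passed once per setting. Below is a minimal Python sketch that builds such a command line; the helper name hive_command and the bin/hive path are assumptions, and -e is the Hive CLI option for running a single query string. Actually running it requires a local Hive installation, so the subprocess call is left commented out.

```python
import shlex
from typing import Optional


def hive_command(conf: dict[str, str], query: Optional[str] = None,
                 hive_bin: str = "bin/hive") -> list[str]:
    """Hypothetical helper: build an argv list passing each setting via --hiveconf."""
    argv = [hive_bin]
    for key, value in conf.items():
        argv += ["--hiveconf", f"{key}={value}"]
    if query is not None:
        argv += ["-e", query]  # run one query string, then exit
    return argv


cmd = hive_command({"hive.fetch.task.conversion": "more"},
                   "SELECT id, money FROM m LIMIT 10")
print(shlex.join(cmd))
# bin/hive --hiveconf hive.fetch.task.conversion=more -e 'SELECT id, money FROM m LIMIT 10'
# subprocess.run(cmd, check=True)  # needs Hive installed; not executed here
```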

3.

Both methods above turn on the Fetch task, but only temporarily (for the current session). To enable it permanently, add the following configuration to ${HIVE_HOME}/conf/hive-site.xml:

<property>
  <name>hive.fetch.task.conversion</name>
  <value>more</value>
  <description>
    Some select queries can be converted to single FETCH task 
    minimizing latency. Currently the query should be single
    sourced not having any subquery and should not have
    any aggregations or distincts (which incur RS),
    lateral views and joins.
    1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
    2. more    : SELECT, FILTER, LIMIT only (+TABLESAMPLE, virtual columns)
  </description>
</property>
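In hive-site.xml this <property> element sits inside the file's <configuration> root element. To check which value a given hive-site.xml actually sets, a small Python sketch using the standard library's xml.etree.ElementTree can look the property up. read_hive_property is a hypothetical helper name; for a real file you would pass it the contents of ${HIVE_HOME}/conf/hive-site.xml.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def read_hive_property(xml_text: str, name: str) -> Optional[str]:
    """Hypothetical helper: return the <value> of the named <property>, or None."""
    root = ET.fromstring(xml_text)  # the root element is <configuration>
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


SAMPLE = """<configuration>
  <property>
    <name>hive.fetch.task.conversion</name>
    <value>more</value>
  </property>
</configuration>"""

print(read_hive_property(SAMPLE, "hive.fetch.task.conversion"))  # more
```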

码农公寓
