Merging small files with Hive on the Tez engine


Tags (space-separated): Hive





set hive.exec.dynamic.partition=true;            -- enable dynamic partitioning
set hive.exec.dynamic.partition.mode=nonstrict;  -- allow all partition columns to be dynamic
set hive.exec.max.dynamic.partitions=3000;       -- max dynamic partitions per statement
set hive.exec.max.dynamic.partitions.pernode=500; -- max dynamic partitions per node
set hive.tez.container.size=6656;                -- Tez container size, in MB
set hive.tez.java.opts=-Xmx5120m;                -- JVM heap for Tez tasks
set hive.merge.tezfiles=true;                    -- merge small files at the end of a Tez job
set hive.merge.smallfiles.avgsize=1280000000;    -- merge when the average file size is below ~1.28 GB
set hive.merge.size.per.task=1280000000;         -- target size of each merged file, ~1.28 GB
set hive.execution.engine=tez;                   -- use the Tez execution engine


Overwriting the partitioned table with itself forces Hive to rewrite every partition, merging the small files along the way:

insert overwrite table zhaobo_test.lazy_st_rpt_priv_occupation_new partition (pt)
select * from zhaobo_test.lazy_st_rpt_priv_occupation_new;

============= Merging with Tez ============



Try the Tez execution engine together with hive.merge.tezfiles. You will usually also want to set the size thresholds explicitly:

set hive.execution.engine=tez;               -- use the Tez execution engine
set hive.merge.tezfiles=true;                -- add a file-merge step at the end of the job
set hive.merge.smallfiles.avgsize=128000000; -- merge when the average file size is below 128 MB
set hive.merge.size.per.task=128000000;      -- target ~128 MB per merged file













================ Merging with the MR engine ============

If you want to use the MR engine instead, add the following settings (I haven't tried this personally):

set hive.merge.mapredfiles=true;             -- add a file-merge step at the end of the job
set hive.merge.smallfiles.avgsize=128000000; -- merge when the average file size is below 128 MB
set hive.merge.size.per.task=128000000;      -- target ~128 MB per merged file

The settings above spawn one extra step that merges the output files, so each part file ends up at roughly 128 MB.
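As a worked example of how the two thresholds interact, here is a sketch in plain shell arithmetic; the partition size and file count are made-up numbers, not from a real cluster:

```shell
# Hypothetical partition: 1000 files of ~4 MB each (invented numbers).
threshold=128000000                  # hive.merge.smallfiles.avgsize
per_task=128000000                   # hive.merge.size.per.task
total=$((1000 * 4 * 1000 * 1000))    # ~4 GB of data spread over 1000 files
avg=$((total / 1000))                # average file size: ~4 MB

if [ "$avg" -lt "$threshold" ]; then
  # The merge step rewrites the data into roughly ceil(total / per_task) files.
  merged=$(( (total + per_task - 1) / per_task ))
  echo "merge triggered: ~${merged} files of ~128 MB each"
fi
```

Because the 4 MB average is far below the 128 MB threshold, the merge step fires and collapses the 1000 small files into about 32 full-size files.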

Fetch the list of partitions:

beeline -u jdbc:hive2://10.111.55.163:10000 -n deploy --showHeader=false --outputformat=tsv2 --silent=true -e "show partitions ods.t_city" > found_partitions.txt
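The resulting file contains one partition path per line, which is the format the loop below consumes. A hypothetical sample (partition values invented for illustration):

```shell
# Build a sample found_partitions.txt like the one beeline would produce
# (the pt values here are made up).
cat > found_partitions.txt <<'EOF'
pt=20210101
pt=20210102
pt=20210103
EOF

wc -l < found_partitions.txt
```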

Run the script:

#!/bin/bash
# Run ALTER TABLE ... CONCATENATE on every partition listed in found_partitions.txt.

while read -r line; do
    echo "the next partition is $line"
    # Turn a partition path like pt=20210101/city=bj into pt='20210101',city='bj'
    partition=`(echo $line | sed -e 's/\//,/g' -e "s/=/='/g" -e "s/,/',/g")`\'
    beeline -u jdbc:hive2://10.111.55.163:10000 -n deploy -e "alter table database.table partition($partition) concatenate"
done < found_partitions.txt
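The sed pipeline in the loop turns a raw partition path into the quoted spec that ALTER TABLE expects. A standalone sketch, using a made-up two-level partition value:

```shell
# Sample partition path as `show partitions` would print it (value is made up).
line='pt=20210101/city=bj'

# Step by step: / -> ,  then  = -> ='  then  , -> ',  and the trailing \' closes
# the quote around the last value.
partition=`(echo $line | sed -e 's/\//,/g' -e "s/=/='/g" -e "s/,/',/g")`\'

echo "$partition"
```

This prints pt='20210101',city='bj', ready to drop into the partition(...) clause.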

