Hive 利用 on tez 引擎合并小文件

2023-12-31 20:33:28

Hive 利用 on tez 引擎合并小文件

标签（空格分隔）： Hive

\[f(N) + \sum_{i=2}^N f(N-i+1)*X_i\]



SET hive.exec.dynamic.partition=true;   
SET hive.exec.dynamic.partition.mode=nonstrict;  
set hive.exec.max.dynamic.partitions=3000;
set hive.exec.max.dynamic.partitions.pernode=500;
SET hive.tez.container.size=6656;
SET hive.tez.java.opts=-Xmx5120m;
set hive.merge.tezfiles=true;
set hive.merge.smallfiles.avgsize=1280000000;
set hive.merge.size.per.task=1280000000;
set hive.execution.engine=tez;


insert overwrite table zhaobo_test.lazy_st_rpt_priv_occupation_new partition (pt) select * from zhaobo_test.lazy_st_rpt_priv_occupation_new;


=============tez 合并========



Try using TEZ execution engine and then hive.merge.tezfiles. You might also want to specify the size as well.

set hive.execution.engine=tez; -- TEZ execution engine
set hive.merge.tezfiles=true; -- Notifying that merge step is required
set hive.merge.smallfiles.avgsize=128000000; --128MB
set hive.merge.size.per.task=128000000; -- 128MB













================合并============

If you want to go with MR engine then add following settings (I haven't tried it personally)
set hive.merge.mapredfiles=true; -- Notifying that merge step is required
set hive.merge.smallfiles.avgsize=128000000; --128MB
set hive.merge.size.per.task=128000000; -- 128MB
Above setting will spawn one more step to merge the files and approx size of each part file should be 128MB.

获取 partition.

beeline -u jdbc:hive2://10.111.55.163:10000 -n   deploy --showHeader=false --outputformat=tsv2 --silent=true -e "show partitions ods.t_city" > found_partitions.txt

开始执行

#!/bin/bash

for line in `cat found_partitions.txt`; 
do
    echo "the next partition is $line"
    partition=`(echo $line | sed -e 's/\//,/g' -e "s/=/='/g" -e "s/,/',/g")`\'
    beeline -u jdbc:hive2://10.111.55.163:10000 -n  deploy -e "alter table database.table partition($partition) concatenate" 
done

码农公寓

Hive 利用 on tez 引擎 合并小文件

获取 partition.

开始执行

相关文章

Hive 利用 on tez 引擎合并小文件