Incremental loading into Hive with shell scripts

2024-01-27 19:30:40

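Approach:

1. Early every morning, export the previous day's incremental data from the business system to a text file and FTP it to one of the master nodes of the Hadoop cluster.

　　The default upload path is: /mnt/data/crawler/

2. On the master node, a shell script calls the hive command to load the local increment file into a Hive staging (temporary) table.

3. In a second shell script, Hive SQL merges the staging table into the main Hive table: rows whose id already exists are updated, and rows with a new id are inserted.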
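The three steps above can be tied together by a small driver that runs the load script and then the merge script, stopping at the first failure. This is only a sketch: the function name `run_incremental_load`, the `script_dir` argument, and the cron schedule in the comment are assumptions made for illustration, and it presumes that both scripts exit with a non-zero status when they fail.

```shell
#!/bin/bash
# Hypothetical driver for the two scripts described in this article.
# Assumption: test_temp.sh and test.sh live in script_dir and exit non-zero on failure.

run_incremental_load() {
    local script_dir=$1
    # Default to yesterday's date (GNU date), the same default test_temp.sh uses.
    local dt=${2:-$(date -d yesterday +%Y%m%d)}

    # Step 2: load the local increment file into the staging table.
    bash "$script_dir/test_temp.sh" "$dt" || return 1

    # Step 3: merge the staging table into the main table.
    bash "$script_dir/test.sh" || return 1
}

# Example cron entry (the time and paths are assumptions):
# 0 1 * * * /bin/bash -c 'source /path/to/driver.sh && run_incremental_load /path/to/scripts'
```

Running the merge only after a successful load keeps a missing or broken export file from touching the main table.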

Steps:
1. Table definitions: create two tables, test_temp (staging) and test (main).


-- Staging table: plain text, comma-separated, matching the export file
create table crawler.test_temp(
    id string,
    name string,
    email string,
    create_time string
)
row format delimited
fields terminated by ','
stored as textfile
;

+++++++++++++++++++++++++++++++++

-- Main table: partitioned by dt, stored as ORC
create table crawler.test(
    id string,
    name string,
    email string,
    create_time string
)
partitioned by (dt string)
row format delimited
fields terminated by '\t'
stored as orc
;
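Because the staging table expects four comma-separated fields per line (id, name, email, create_time), it can help to check an export file before loading it. The sketch below is an optional addition, not part of the original scripts; the function name `check_field_count` is hypothetical. It also assumes that field values never contain a comma, since a plain delimited text table has no quoting.

```shell
#!/bin/bash
# Hypothetical pre-load check: every non-empty line must have exactly
# the expected number of comma-separated fields (4 for crawler.test_temp).

check_field_count() {
    local file=$1
    local expected=${2:-4}
    awk -F',' -v n="$expected" '
        # Report each non-empty line whose field count differs from n.
        NF > 0 && NF != n { printf "line %d: %d fields\n", NR, NF; bad = 1 }
        # Exit status 1 if any bad line was seen, otherwise 0.
        END { exit bad + 0 }
    ' "$file"
}
```

For example, `check_field_count /mnt/data/crawler/test_temp_20240126.txt || exit 1` would stop a load script before it reaches the hive call (the date in the file name is only an illustration).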

2. Write the shell script test_temp.sh, which loads the local increment file into the Hive staging table


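#!/bin/bash
##################################
# Usage:                         #
#   script_name [yyyymmdd]       #
# The date argument is optional; #
# default is system date minus 1 #
##################################
dt=''
table=test_temp
# Current system date
sysdate=$(date +%Y%m%d)
# Yesterday's date, format: YYYYMMDD
yesterday=$(date -d yesterday +%Y%m%d)
# Directory that holds the data files
file_path=/mnt/data/crawler/
if [ $# -eq 1 ]; then
    dt=$1
elif [ $# -eq 0 ]; then
    dt=$yesterday
else
    echo "Invalid arguments!"
    # 0 = success, non-zero = failure
    exit 1
fi
# Expected file name: <table>_<yyyymmdd>.txt
filename=${file_path}${table}_${dt}.txt
if [ ! -e "$filename" ]; then
    echo "$filename: data file does not exist!"
    exit 1
fi
# OVERWRITE replaces the previous contents of the staging table
hive <<EOF
load data local inpath '$filename' overwrite into table crawler.$table;
EOF
if [ $? -eq 0 ]; then
    echo ""
    echo "$dt $table loaded successfully!"
else
    echo ""
    echo "$dt $table load failed!"
    # Report the failure to the caller
    exit 1
fi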
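The load script accepts any single argument as the date, so a typo such as 2024-01-27 only surfaces later as a "file does not exist" message. A possible hardening step is to validate the argument first. The helper below is a sketch with a hypothetical name, `is_valid_yyyymmdd`, and it assumes GNU date, which the script already relies on for `date -d yesterday`.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only for a real calendar date written as YYYYMMDD.

is_valid_yyyymmdd() {
    local dt=$1
    # Must be exactly eight digits.
    [[ $dt =~ ^[0-9]{8}$ ]] || return 1
    # Round-trip through GNU date; impossible dates such as 20240230 are rejected.
    [ "$(date -d "$dt" +%Y%m%d 2>/dev/null)" = "$dt" ]
}
```

In test_temp.sh it could be called right after `dt` is set, for example `is_valid_yyyymmdd "$dt" || { echo "Invalid date: $dt"; exit 1; }`.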

3. Write the shell script test.sh, which merges the staging data into the main table


#!/bin/bash
##################################
table=test
# Current system date
sysdate=$(date +%Y%m%d)
# Incremental overwrite: all staging rows, plus the existing main-table rows
# whose id does not appear in the staging table
hive <<EOF
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table crawler.test partition (dt)
select a.id, a.name, a.email, a.create_time, a.create_time as dt
from (
    select id, name, email, create_time from crawler.test_temp
    union all
    select t.id, t.name, t.email, t.create_time
    from crawler.test t
    left outer join crawler.test_temp t1
    on t.id = t1.id
    where t1.id is null
) a;
quit;
EOF
if [ $? -eq 0 ]; then
    echo "$sysdate $0 incremental extraction finished!"
else
    echo "$sysdate $0 incremental extraction failed!"
    # Report the failure to the caller
    exit 1
fi
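The merge query keeps every row from the staging table and adds only those main-table rows whose id is absent from staging (the left outer join filtered on `t1.id is null`). The same rule can be tried on two small comma-separated files without a cluster. This is a local illustration with awk, not what Hive executes; the function name `merge_increment` and the sample rows in the usage note are made up.

```shell
#!/bin/bash
# Hypothetical local simulation of the merge rule used in test.sh:
# staging rows win; main rows survive only if their id is not in staging.

merge_increment() {
    local main_file=$1
    local temp_file=$2
    awk -F',' '
        # First file (staging): remember the id and print the row.
        NR == FNR { seen[$1] = 1; print; next }
        # Second file (main): print only rows whose id was not seen in staging.
        !($1 in seen)
    ' "$temp_file" "$main_file"
}
```

With a main file holding ids 1 and 2 and a staging file holding ids 1 and 3, the output contains the staging versions of rows 1 and 3 followed by the untouched row 2, which mirrors an update plus an insert.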
