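The goal was to load all 365 days of data from the ods-layer newlogs table into the dwd-layer logs table, partitioned by day, but the job failed. The details are as follows.
Before running the SQL, enable dynamic partitioning and set these parameters: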
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=3000;
set hive.exec.max.dynamic.partitions=6000;
set mapreduce.map.memory.mb=2048;
set mapreduce.reduce.memory.mb=3072;
The HQL statement:
insert overwrite table dwd_myshops.dwd_logs partition(date)
select userid,event,time,goodid,title,price,shopid,mark,
from_unixtime(cast(time/1000 as bigint),'yyyyMMdd') date
from ods_myshops.ods_newlogs;
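Before running an insert like the one above, it can help to check how many partitions it would create, so the count can be compared with the limits set earlier (3000 per node, 6000 in total). The query below is a minimal sketch that reuses the same date expression; the alias partition_cnt is a hypothetical name, not part of the original job.

```sql
-- Count the distinct day values that the insert would turn into partitions.
-- partition_cnt is an illustrative alias (assumption, not from the original job).
select count(distinct from_unixtime(cast(time/1000 as bigint),'yyyyMMdd')) as partition_cnt
from ods_myshops.ods_newlogs;
```

For a full year of logs the result should be around 365, well under both limits, which suggests that the partition-count settings are not what makes the job fail.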
The error output:
MapReduce Total cumulative CPU time: 17 seconds 220 msec
Ended Job = job_1616718205783_0010 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1616718205783_0010_m_000000 (and more) from job job_1616718205783_0010
Task with the most failures(4):
-----
Task ID:
task_1616718205783_0010_m_000000
URL:
http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1616718205783_0010&tipid=task_1616718205783_0010_m_000000
-----
Diagnostic Messages for this Task:
Error: Java heap space
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 17.22 sec HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 17 seconds 220 msec
I then changed the dynamic partition parameters:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=3000;
set hive.optimize.sort.dynamic.partition=true;
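hive.optimize.sort.dynamic.partition=true
With this parameter enabled, rows are sorted by the dynamic partition column before they are written, so each reducer keeps only one record writer open at a time and each partition typically ends up as a single file. This avoids the OOM seen with dynamic partitioning: in the failed run the job was map-only (the log shows Map: 1 and no reducers), so a single mapper had to hold an open writer for every one of the roughly 365 partitions.
The cost is an extra sort and reduce stage, which noticeably slows down how fast a reducer processes and writes each partition.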
With these settings, rerunning the HQL statement succeeded and the data was partitioned by day.
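One way to confirm the result is sketched below. These checks are an assumption about how one might verify the load and were not part of the original run; the aliases ods_rows and dwd_rows are hypothetical names.

```sql
-- List the partitions created in the target table; one per day is expected.
show partitions dwd_myshops.dwd_logs;

-- Compare row counts between source and target. They should match,
-- because the insert selects every row of the source table without a filter.
select count(*) as ods_rows from ods_myshops.ods_newlogs;
select count(*) as dwd_rows from dwd_myshops.dwd_logs;
```

If the two counts differ, or the partition list contains unexpected values, the time column in the source data is the first thing to inspect.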