Consider boosting spark.yarn.executor.memoryOverhead

2023-11-04 13:50:58

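Preface

This article is part of the column "Spark异常问题汇总" (Spark Exception Troubleshooting Roundup). The column is the author's original work; please cite the source when quoting it. If you spot omissions or mistakes, please point them out in the comments. Thank you!

For the column's table of contents and references, see Spark异常问题汇总.

Problem description

spark-submit fails with the following error: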

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 9, bj-yarn002.aibee.cn, executor 7): ExecutorLostFailure (executor 7 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 5.1 GB of 4.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1524)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1512)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1511)

Source data

Less than 100 MB

spark-submit parameters

--executor-cores 1 --num-executors 5 --executor-memory 4g --driver-memory 4g

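Root cause

This is a very specific error that arises from how Spark executors run inside YARN containers.

Each executor runs in a YARN container whose memory limit is the executor memory plus the memory overhead (4.5 GB in the log above). The executor's memory use exceeded that limit, usually because of an occasional spike, so YARN killed the container with the error message shown above.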

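Solution

By default, spark.yarn.executor.memoryOverhead is executorMemory * 0.10, with a minimum of 384 MB. With --executor-memory 4g that works out to about 409 MB.

Note: starting with Spark 2.3, this parameter has been renamed to spark.executor.memoryOverhead.

Depending on the application and its data load, this default can be too low.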
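To make the numbers in the error message concrete, here is a minimal Python sketch of how the container limit can be derived. It assumes the documented default of max(executorMemory * 0.10, 384 MB) and assumes YARN rounds each request up to a multiple of 512 MB; that increment is a guess about this cluster, not something stated in the post, and the function names are made up for illustration.

```python
import math

MIN_OVERHEAD_MB = 384   # documented minimum for the memory overhead
OVERHEAD_FACTOR = 0.10  # documented default fraction of executor memory


def default_overhead_mb(executor_memory_mb):
    """Default overhead: 10% of executor memory, but never below 384 MB."""
    return max(int(executor_memory_mb * OVERHEAD_FACTOR), MIN_OVERHEAD_MB)


def container_size_mb(executor_memory_mb, overhead_mb=None, yarn_increment_mb=512):
    """Container size YARN would grant for one executor.

    yarn_increment_mb is an ASSUMED cluster setting (hypothetical here);
    check your own YARN scheduler configuration for the real value.
    """
    if overhead_mb is None:
        overhead_mb = default_overhead_mb(executor_memory_mb)
    requested = executor_memory_mb + overhead_mb
    # Round the request up to the next multiple of the allocation increment.
    return math.ceil(requested / yarn_increment_mb) * yarn_increment_mb


# --executor-memory 4g with the default overhead: 4096 + 409 = 4505 -> 4608 MB
default_limit = container_size_mb(4096)
# Same executor memory with the overhead raised to 1024 MB: 5120 MB
boosted_limit = container_size_mb(4096, overhead_mb=1024)

print(default_limit / 1024, boosted_limit / 1024)  # 4.5 5.0
```

Under these assumptions the default setup yields the 4.5 GB limit seen in the log. Note that a 1024 MB overhead gives a 5.0 GB limit, while the log reports a 5.1 GB peak, so if the error persists a somewhat larger overhead may be needed.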

We can raise spark.yarn.executor.memoryOverhead to 1 GB by adding the following to the spark-submit command line:

--conf spark.yarn.executor.memoryOverhead=1024
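Because the property name depends on the Spark version, it can help to pick the key programmatically. The following Python sketch assembles the full command line from the parameters used in this post. The helper names, the Spark version "2.2.0", and the application file my_app.py are hypothetical placeholders, not details from the original job.

```python
def overhead_conf_key(spark_version):
    """Return the overhead property name for a given Spark version string."""
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    if (major, minor) >= (2, 3):
        return "spark.executor.memoryOverhead"   # name since Spark 2.3
    return "spark.yarn.executor.memoryOverhead"  # name before Spark 2.3


def build_submit_args(spark_version, overhead_mb, app):
    """Build a spark-submit argument list with the overhead set explicitly."""
    return [
        "spark-submit",
        "--executor-cores", "1",
        "--num-executors", "5",
        "--executor-memory", "4g",
        "--driver-memory", "4g",
        "--conf", f"{overhead_conf_key(spark_version)}={overhead_mb}",
        app,  # hypothetical application file
    ]


# The version and file name below are assumptions for illustration only.
args = build_submit_args("2.2.0", 1024, "my_app.py")
print(" ".join(args))
```

The resulting list can be passed to subprocess.run, or printed and pasted into a shell. The overhead value is in megabytes, as in the command above.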

For more on spark.executor.memoryOverhead, see my blog post: spark.executor.memoryOverhead
