Let's start with the error output:
20/07/17 10:20:07 INFO yarn.YarnAllocator: Will request 1 executor container(s), each with 1 core(s) and 1408 MB memory (including 384 MB of overhead)
20/07/17 10:20:07 INFO yarn.YarnAllocator: Submitted 1 unlocalized container requests.
20/07/17 10:20:07 WARN yarn.YarnAllocator: Cannot find executorId for container: container_1594881950724_0016_01_000264
20/07/17 10:20:07 INFO yarn.YarnAllocator: Completed container container_1594881950724_0016_01_000264 (state: COMPLETE, exit status: -100)
20/07/17 10:20:07 INFO yarn.YarnAllocator: Container marked as failed: container_1594881950724_0016_01_000264. Exit status: -100. Diagnostics: Container released by application.
20/07/17 10:20:08 INFO yarn.YarnAllocator: Launching container container_1594881950724_0016_01_000265 on host mip-test-hdp134 for executor with ID 264
20/07/17 10:20:08 INFO yarn.YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.
20/07/17 10:20:08 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
20/07/17 10:20:08 INFO impl.ContainerManagementProtocolProxy: Opening proxy : mip-test-hdp134:23855
20/07/17 10:20:08 ERROR yarn.YarnAllocator: Failed to launch executor 264 on container container_1594881950724_0016_01_000265
org.apache.spark.SparkException: Exception while starting container container_1594881950724_0016_01_000265 on host mip-test-hdp134
    at org.apache.spark.deploy.yarn.ExecutorRunnable.startContainer(ExecutorRunnable.scala:125)
    at org.apache.spark.deploy.yarn.ExecutorRunnable.run(ExecutorRunnable.scala:65)
    at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$runAllocatedContainers$1$$anon$2.run(YarnAllocator.scala:546)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist
    at sun.reflect.GeneratedConstructorAccessor22.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
    at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
    at org.apache.hadoop.yarn.client.api.impl.NMClientImpl.startContainer(NMClientImpl.java:205)
    at org.apache.spark.deploy.yarn.ExecutorRunnable.startContainer(ExecutorRunnable.scala:122)
    ... 5 more
20/07/17 10:20:10 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL TERM
20/07/17 10:20:10 INFO yarn.ApplicationMaster: Final app status: UNDEFINED, exitCode: 16, (reason: Shutdown hook called before final status was reported.)
20/07/17 10:20:10 INFO util.ShutdownHookManager: Shutdown hook called
The key line is:
Caused by: org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist
After some searching, the fix I found is as follows.
Add the following configuration to yarn-site.xml:
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>spark_shuffle,mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
    <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
Then restart YARN.
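For context: YARN only asks for the spark_shuffle aux service when the job uses Spark's external shuffle service, which is typically turned on together with dynamic resource allocation. The matching Spark-side settings (in spark-defaults.conf, or passed with --conf at submit time) look like this:

```
spark.shuffle.service.enabled    true
spark.dynamicAllocation.enabled  true
```

If you never enable these on the Spark side, executors fall back to serving shuffle data themselves and the NodeManager-side aux service is not required.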
However, these two steps alone are not enough. You also need to check whether spark-*-yarn-shuffle.jar (where * is the Spark version number) exists under ${HADOOP_HOME}/share/hadoop/yarn/lib. If it is missing, copy it over from the Spark installation directory.
spark-*-yarn-shuffle.jar lives in Spark's yarn directory (some people say it is in the jar directory; this probably varies across Spark versions, and I haven't dug into it).
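The check-copy-restart sequence might look like this on each NodeManager host. This is only a sketch: the SPARK_HOME/HADOOP_HOME defaults and the yarn-daemon.sh script location are assumptions based on a typical Hadoop 2.x layout, so adjust them for your distribution.

```shell
# Sketch only: run on every NodeManager host.
# SPARK_HOME / HADOOP_HOME defaults below are assumptions.
SPARK_HOME=${SPARK_HOME:-/opt/spark}
HADOOP_HOME=${HADOOP_HOME:-/opt/hadoop}

# Locate the shuffle-service jar shipped with Spark
# (under yarn/ in some distributions, elsewhere in others).
find "$SPARK_HOME" -name 'spark-*-yarn-shuffle.jar'

# Copy it onto the NodeManager classpath if it is not already there.
cp -n "$SPARK_HOME"/yarn/spark-*-yarn-shuffle.jar \
      "$HADOOP_HOME"/share/hadoop/yarn/lib/

# Restart the NodeManager so yarn-site.xml and the new jar take effect.
"$HADOOP_HOME"/sbin/yarn-daemon.sh stop nodemanager
"$HADOOP_HOME"/sbin/yarn-daemon.sh start nodemanager
```

Remember that the jar and the yarn-site.xml change must be present on every NodeManager, not just the one that happened to log the error.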
References:
1. Dynamic resource allocation for Spark jobs submitted to YARN
https://www.cnblogs.com/hejunhong/p/12335258.html
2. Spark task exception: The auxService spark_shuffle does not exist