这是我的分析,当然查阅书籍和网络。如有什么不对的,请各位批评指正。以下的类有的并不完全,只列出重要的方法。
如要转载,请注上作者以及出处。
一、源码阅读环境
需要安装jdk1.7.0版本及其以上版本,还需要安装Eclipse阅读hadoop源码。
Eclipse安装教程参见我的博客。
Hadoop源码官网下载。我下载的是2.7.3版本的。其中source是源代码工程,需要你编译才能执行。而binary是编译好的克执行文件。
如果你要搭建Hadoop集群,则下载binary的。如果阅读源代码,下载source的。
这里我们需要分析源代码,下载source,解压后文件名是hadoop-2.7.3-src。
把Hadoop导入Eclipse:
1.打开Eclipse,点击File->New->Java Project,会弹出如下图1所示:
图 1
我们就把Hadoop源码导入了Eclipse,但会报很多错误,具体解决方案参见我的博客 Eclipse中导入Hadoop源代码工程 。不过不影响我们看源码。
2.如果你想看某个类,但你不知道在哪?先定位到hadoop-2.7.3-src
以Job.java为例子,
在Windows中:你可以在文件资源管理器的搜索栏里输入想要搜索的类名,如下图2所示:
图 2
然后右键该文件,选择打开文件所在的位置。这个位置和Eclipse中hadoop-2.7.3-src项目的目录结构相对应。
比如我们搜到的Job.java在hadoop-2.7.3-src\hadoop-mapreduce-project\hadoop-mapreduce-client\hadoop-mapreduce-client-core\src\main\java\org\apache\hadoop\mapreduce中。
在Linux中,需要在终端通过find命令查找。
find . -name "Job.java" 第一个参数是路径,其中.表示当前目录,/表示根目录。如下图3所示:
图 3
我们就可以在Eclipse中hadoop-2.7.3-src项目中找到,如下图4所示:
图 4
二、分析前须知:
我们运行装好的集群时,要想启动集群,通常在主节点Master的hadoop-2.7.3/sbin/,目录下运行(命令行)./start-all.sh脚本,直接全部启动。如下代码所示:
start-all.sh
"${HADOOP_HDFS_HOME}"/sbin/start-dfs.sh --config $HADOOP_CONF_DIR 它会启动start-dfs.sh脚本
"${HADOOP_YARN_HOME}"/sbin/start-yarn.sh --config $HADOOP_CONF_DIR 它会启动start-yarn.sh脚本
#Add other possible options
nameStartOpt="$nameStartOpt $@" #---------------------------------------------------------
# namenodes NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -namenodes) echo "Starting namenodes on [$NAMENODES]" "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
--config "$HADOOP_CONF_DIR" \
--hostnames "$NAMENODES" \
--script "$bin/hdfs" start namenode $nameStartOpt #---------------------------------------------------------
# datanodes (using default slaves file) if [ -n "$HADOOP_SECURE_DN_USER" ]; then
echo \
"Attempting to start secure cluster, skipping datanodes. " \
"Run start-secure-dns.sh as root to complete startup."
else
"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
--config "$HADOOP_CONF_DIR" \
--script "$bin/hdfs" start datanode $dataStartOpt
fi #---------------------------------------------------------
# secondary namenodes (if any) SECONDARY_NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -secondarynamenodes >/dev/null) if [ -n "$SECONDARY_NAMENODES" ]; then
echo "Starting secondary namenodes [$SECONDARY_NAMENODES]" "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
--config "$HADOOP_CONF_DIR" \
--hostnames "$SECONDARY_NAMENODES" \
--script "$bin/hdfs" start secondarynamenode
fi #---------------------------------------------------------
# quorumjournal nodes (if any) SHARED_EDITS_DIR=$($HADOOP_PREFIX/bin/hdfs getconf -confKey dfs.namenode.shared.edits.dir >&-) case "$SHARED_EDITS_DIR" in
qjournal://*)
JOURNAL_NODES=$(echo "$SHARED_EDITS_DIR" | sed 's,qjournal://\([^/]*\)/.*,\1,g; s/;/ /g; s/:[0-9]*//g')
echo "Starting journal nodes [$JOURNAL_NODES]"
"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
--config "$HADOOP_CONF_DIR" \
--hostnames "$JOURNAL_NODES" \
--script "$bin/hdfs" start journalnode ;;
esac #---------------------------------------------------------
# ZK Failover controllers, if auto-HA is enabled
AUTOHA_ENABLED=$($HADOOP_PREFIX/bin/hdfs getconf -confKey dfs.ha.automatic-failover.enabled)
if [ "$(echo "$AUTOHA_ENABLED" | tr A-Z a-z)" = "true" ]; then
echo "Starting ZK Failover Controllers on NN hosts [$NAMENODES]"
"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
--config "$HADOOP_CONF_DIR" \
--hostnames "$NAMENODES" \
--script "$bin/hdfs" start zkfc
fi # eof
start-dfs.sh
我们可以看到该脚本会启动namenode,datanode以及secondarynamenode等。
# Start all yarn daemons. Run this on master node. echo "starting yarn daemons" bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd` DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/yarn-config.sh # start resourceManager
"$bin"/yarn-daemon.sh --config $YARN_CONF_DIR start resourcemanager
# start nodeManager
"$bin"/yarn-daemons.sh --config $YARN_CONF_DIR start nodemanager
# start proxyserver
#"$bin"/yarn-daemon.sh --config $YARN_CONF_DIR start proxyserver
start-yarn.sh
我们可以看到该脚本会启动resourcemanager,nodemanager等。
这几个分别对应NameNode.java, DataNode.java, SecondaryNameNode.java, ResourceManager.java以及NodeManager.java。 而且都有public static void main ...(String argv[]){...}方法。就是启动后,它们都处于运行状态。
NameNode.java在hadoop-2.7.3-src/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
DataNode.java在hadoop-2.7.3-src/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
SecondaryNameNode.java在hadoop-2.7.3-src/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java
ResourceManager.java在hadoop-2.7.3-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
NodeManager.java在hadoop-2.7.3-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
三、YARN(MapReduce 2)
对于节点数超出4000的大型集群,MapReduce 1系统开始面临着扩展性的瓶颈。因此2010年雅虎的一个团队开始设计了下一代的MapRduce。由此YARN(Yet Another Resource Negotiator)应运而生。
YARN将MapReduce 1中的Jobtracker的两种角色划分为两个独立的守护进程:管理集群上资源使用的资源管理器和管理集群上运行任务生命周期的应用管理器。基本思路是:应用服务器与资源管理器协商集群的计算资源:容器(每个容器都有特定的内存上限),在这些容器上运行特定应用程序的进程。容器由集群节点上运行的节点管理器监视,以确保应用程序使用的资源不会超过分配给它的资源。
YARN设计的精妙之处在于不同的YARN应用可以在同一个集群上共存。此外,用户甚至有可能在同一个YARN集群上运行多个不同版本的MapReduce,这使得Mapreduce 升级过程更容易管理。
YARN上的MapReduce比经典的MapReduce 1包括更多的实体:
- 提交MapReduce作业的客户端
- YARN资源管理器,负责协调集群上计算资源的分配
- YARN节点管理器,负责启动和监视集群中机器上的计算容器(container)
- MapReduce应用程序master负责协调运行MapReduce作业的任务。它和MapReduce任务在容器中运行,这些容器由资源管理器分配并由节点管理器进行管理。
- 分布式文件系统(一般为HDFS),用来与其他实体间共享作业文件。
作业的运行过程如下图5所示,并具体分析
图 5 Hadoop 使用YARN 运行 MapReduce 的过程
我们在进行MR的编写完成后,会调用job.waitForCompletion(boolean)来将作业提交到集群并等待作业完成,在该方法内部,首先会判断Job状态并调用submit()方法进行提交,将任务提交到集群后会立刻返回。
Hadoop会提供一些自带的测试用例,其中比较常见的如WordCount等。我们就以WordCount为例。Hadoop提供的自带的WordCount.java在hadoop-2.7.3-src/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/WordCount.java
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.examples; import java.io.IOException;
import java.util.StringTokenizer; import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser; //Hadoop 自带测试用例WordCount
public class WordCount { //继承泛型类Mapper
public static class TokenizerMapper
extends Mapper<Object, Text, Text, IntWritable>{ //定义hadoop数据类型IntWritable实例one,并且赋值为1
private final static IntWritable one = new IntWritable(1);
//定义hadoop数据类型Text实例word
private Text word = new Text(); //实现map函数
public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {
//Java的字符串分解类,默认分隔符“空格”、“制表符(‘\t’)”、“换行符(‘\n’)”、“回车符(‘\r’)”
StringTokenizer itr = new StringTokenizer(value.toString());
//循环条件表示返回是否还有分隔符。
while (itr.hasMoreTokens()) {
/*
nextToken():返回从当前位置到下一个分隔符的字符串
word.set()Java数据类型与hadoop数据类型转换
*/
word.set(itr.nextToken());
//hadoop全局类context输出函数write;
context.write(word, one);
}
}
} //继承泛型类Reducer
public static class IntSumReducer
extends Reducer<Text,IntWritable,Text,IntWritable> {
//实例化IntWritable
private IntWritable result = new IntWritable(); //实现reduce
public void reduce(Text key, Iterable<IntWritable> values,
Context context
) throws IOException, InterruptedException {
//循环values,并记录单词个数
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
//Java数据类型sum,转换为hadoop数据类型result
result.set(sum);
//输出结果到hdfs
context.write(key, result);
}
} public static void main(String[] args) throws Exception {
//实例化Configuration
Configuration conf = new Configuration();
/*
GenericOptionsParser是hadoop框架中解析命令行参数的基本类。
getRemainingArgs();返回数组【一组路径】
*/
/*
函数实现
public String[] getRemainingArgs() {
return (commandLine == null) ? new String[]{} : commandLine.getArgs();
}
*/
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
//如果只有一个路径,则输出需要有输入路径和输出路径
if (otherArgs.length < 2) {
System.err.println("Usage: wordcount <in> [<in>...] <out>");
System.exit(2);
}
//实例化job
Job job = Job.getInstance(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
/*
指定CombinerClass类
这里很多人对CombinerClass不理解
*/
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
//reduce输出Key的类型,是Text
job.setOutputKeyClass(Text.class);
// reduce输出Value的类型
job.setOutputValueClass(IntWritable.class);
//添加输入路径
for (int i = 0; i < otherArgs.length - 1; ++i) {
FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
}
//添加输出路径
FileOutputFormat.setOutputPath(job,
new Path(otherArgs[otherArgs.length - 1]));
//提交job,这句话调用Job.java中的waitForCompletion(...)方法
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
WordCount.java
最后一句job.waitForCompletion(true)调用Job.java中的waitForCompletion(...)方法。
1. 作业提交
(1)MapReduce 2 中的作业提交是使用与MapReduce 1 相同的用户API。即Job的submit()方法创建一个内部的JobSummiter实例,并且调用其submitJobInternal()方法。提交作业后,waitForCompletion()每秒轮询作业的进度,如果发现自上次报告后有改变,便把进度报告到控制台。作业完成后,如果成功就显示作业计数器;如果失败则导致作业失败的错误被记录到控制台。
其中submit()以及waitForCompletion()都在Job.java 类中。而submitJobInternal()在JobSubmitter.java类中。而且这两个类都hadoop-2.7.3-src\hadoop-mapreduce-project\hadoop-mapreduce-client\hadoop-mapreduce-client-core\src\main\java\org\apache\hadoop\mapreduce中。
/**
* The job submitter's view of the Job.
*
* <p>It allows the user to configure the
* job, submit it, control its execution, and query the state. The set methods
* only work until the job is submitted, afterwards they will throw an
* IllegalStateException. </p>
*
* <p>
* Normally the user creates the application, describes various facets of the
* job via {@link Job} and then submits the job and monitor its progress.</p>
*
* <p>Here is an example on how to submit a job:</p>
* <p><blockquote><pre>
* // Create a new Job
* Job job = Job.getInstance();
* job.setJarByClass(MyJob.class);
*
* // Specify various job-specific parameters
* job.setJobName("myjob");
*
* job.setInputPath(new Path("in"));
* job.setOutputPath(new Path("out"));
*
* job.setMapperClass(MyJob.MyMapper.class);
* job.setReducerClass(MyJob.MyReducer.class);
*
* // Submit the job, then poll for progress until the job is complete
* job.waitForCompletion(true);
* </pre></blockquote>
*
*
*/ /*
* Job作业提交流程:
* 1.我们在进行MR的编写完成后,
* a.会调用job.waitForCompletion(boolean)来将作业提交到集群并等待作业完成。
* b.在该方法内部,首先会判断Job状态并调用submit()方法进行提交,将任务提交到集群后会立刻返回;
* c.提交后,会判断waitForCompletion()的参数布尔变量,若为true的话,表示在作业进行的过程中会实时地进行状态监控
* 并打印输出其状态,调用monitorAndPrintJob()。否则,首先会获取线程休眠间隔时间(默认为5000ms),其次循环调用
* isComplete()方法来获取任务执行状态,未完成的话,启动线程休眠配置指定的时间,如此循环,知道任务执行完成或则失败。
* 2.submit()方法内部
* a.确保作业状态;
* b.调用setUserNewAPI()来进行api设置 ;
* c.调用connect()方法连接RM(ResourceManager);
* d.获取JobSubmitter对象,getJobSubmitter(fs,client)
* e.submitter对象进行提交作业的提交:submitJobInternal(Job.this,cluster)
* 3.在连接RM方法connnect()内部,
* a.会创建Cluster实例,Cluster构造函数内部重要的是初始化部分
* b.在初始化函数内部,使用java.util.ServiceLoader创建客户端代理,目前包含两个代理对象,
* LocalClientProtocolProvider(本地作业)和YarnClientProtocolProvider(yarn作业),
* 此处会根据mapreduce.framework.name的配置创建相应的客户端。
* 通过LocalClientProtocolProvider创建LocalJobRunner对象,在此就不进行详细说明了。
* 通过YarnClientProtocolProvider创建YarnRunner对象,YarnRunner保证当前JobClient运行在Yarn上。
* c.在YarnRunner实例化的过程中,创建客户端代理的流程如下:
* Cluster->ClientProtocol(YarnRunner)->ResourceMgrDelegate->client(YarnClientImpl)->rmClient(ApplicationClientProtocol)
* 在YarnClientImpl的serviceStart阶段会创建RPC代理,注意其中的协议.
* Cluster:主要是提供一个访问map/reduce集群的途径;
* YarnRunner: 保证当前JobClient运行在Yarn上,在实例化的过程中创建ResourceMgrDelegate;
* ResourceMgrDelegate:主要负责与RM进行信息交互;
* YarnClientImpl:主要负责实例化rmClient;
* rmClient:是各个客户端与RM交互的协议,主要是负责提交或终止Job任务和获取信息(applications、cluster metrics、nodes、queues和ACLs)
* 4.接下来,看最核心的部分,JobSubmitter.submitJobInternal(Job,Cluster),主要负责将作业提交到系统上运行,主要包括:
* a.校验作业的输入输出checkSpecs(Job),主要是确保输出目录是否设置且在FS上不存在;
* b.通过JobSubmissionFiles来获取Job运行时资源文件存放的目录,属性信息key为yarn.app.mapreduce.am.staging-dir,
* 默认的目录为/tmp/Hadoop-yarn/staging/hadoop/.staging/JobSubmissionFiles.getStagingDir():
* 在方法内部进行判断若目录存在则判断其所属关系及操作权限;不存在的话,创建并赋予权限700;
* c.为缓存中的Job组织必须的统计信息,设置主机名及地址信息,获取jobId,获取提交目录,如
* /tmp/hadoop-yarn/staging/root/.staging/job_1395778831382_0002;
* d.拷贝作业的jar和配置信息到分布式文件系统上的map-reduce系统目录,调用copyAndConfigureFiles(job,submitJobDir),
* 主要是拷贝-libjars,-files,-archives属性对应的信息至submitJobDir;
* e.计算作业的输入分片数,调用writeSplits()写job.split,job.splitmetainfo;
* f.调用writeConf(conf,submitJobFile)将job.xml文件写入到JobTracker的文件系统;
* g.提交作业到JobTracker并监控器状态,调用yarnRunner对象的submitJob(jobId,submitJobDir,job.getCredentials())
* 5.真正的提交在YarnRunner对象的submitJob(…)方法内部:
*
* 问题1:在进行MR编写时,Hadoop 2.x若引用了hadoop-*-mr1-*.jar的话,在使用Java进行调用的时候,会使用本地方式运行;
* 而使用hadoop jar进行调用时,才会提交到yarn环境上运行。
*/
//中间省略很多
@InterfaceAudience.Public
@InterfaceStability.Evolving
public class Job extends JobContextImpl implements JobContext {
private static final Log LOG = LogFactory.getLog(Job.class); private JobState state = JobState.DEFINE;
private JobStatus status;
private long statustime;
private Cluster cluster;
private ReservationId reservationId; /**
* @deprecated Use {@link #getInstance()}
*/
@Deprecated
public Job() throws IOException {
this(new JobConf(new Configuration()));
} /**
* @deprecated Use {@link #getInstance(Configuration)}
*/
@Deprecated
public Job(Configuration conf) throws IOException {
this(new JobConf(conf));
} /**
* @deprecated Use {@link #getInstance(Configuration, String)}
*/
@Deprecated
public Job(Configuration conf, String jobName) throws IOException {
this(new JobConf(conf));
setJobName(jobName);
} Job(JobConf conf) throws IOException {
super(conf, null);
// propagate existing user credentials to job
this.credentials.mergeAll(this.ugi.getCredentials());
this.cluster = null;
} Job(JobStatus status, JobConf conf) throws IOException {
this(conf);
setJobID(status.getJobID());
this.status = status;
state = JobState.RUNNING;
} /**
* Creates a new {@link Job} with no particular {@link Cluster} and a given jobName.
* A Cluster will be created from the conf parameter only when it's needed.
*
* The <code>Job</code> makes a copy of the <code>Configuration</code> so
* that any necessary internal modifications do not reflect on the incoming
* parameter.
*
* @param conf the configuration
* @return the {@link Job} , with no connection to a cluster yet.
* @throws IOException
*/
public static Job getInstance(Configuration conf, String jobName)
throws IOException {
// create with a null Cluster
Job result = getInstance(conf);
result.setJobName(jobName);
return result;
} private synchronized void connect()
throws IOException, InterruptedException, ClassNotFoundException {
if (cluster == null) {
//会创建Cluster实例,Cluster构造函数内部重要的是初始化部分
cluster =
ugi.doAs(new PrivilegedExceptionAction<Cluster>() {
public Cluster run()
throws IOException, InterruptedException,
ClassNotFoundException {
return new Cluster(getConfiguration());
}
});
}
} /**
* Submit the job to the cluster and return immediately.
* @throws IOException
*/
public void submit()
throws IOException, InterruptedException, ClassNotFoundException {
ensureState(JobState.DEFINE); //确保作业状态;
setUseNewAPI(); //调用setUserNewAPI()来进行api设置 ;
connect(); //调用connect()方法连接RM(ResourceManager);
//获取JobSubmitter对象,getJobSubmitter(fs,client)
final JobSubmitter submitter =
getJobSubmitter(cluster.getFileSystem(), cluster.getClient());
status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
public JobStatus run() throws IOException, InterruptedException,
ClassNotFoundException {
//submitter对象进行提交作业的提交:submitJobInternal(Job.this,cluster)
return submitter.submitJobInternal(Job.this, cluster);
}
});
state = JobState.RUNNING;
LOG.info("The url to track the job: " + getTrackingURL());
} /**
* Submit the job to the cluster and wait for it to finish.
* @param verbose print the progress to the user
* @return true if the job succeeded
* @throws IOException thrown if the communication with the
* <code>JobTracker</code> is lost
*/
//将作业提交到集群并等待作业完成。
public boolean waitForCompletion(boolean verbose
) throws IOException, InterruptedException,
ClassNotFoundException {
//判断Job状态并调用submit()方法进行提交,将任务提交到集群后会立刻返回;
if (state == JobState.DEFINE) {
submit();
}
/*提交后,会判断waitForCompletion()的参数布尔变量,若为true的话,表示在作业进行的过程中会实时地进行状态监控并打印输出其状态,
* 调用monitorAndPrintJob()。否则,首先会获取线程休眠间隔时间(默认为5000ms),其次循环调用isComplete()方法
* 来获取任务执行状态,未完成的话,启动线程休眠配置指定的时间,如此循环,知道任务执行完成或则失败。
*/
if (verbose) {
monitorAndPrintJob();
} else {
// get the completion poll interval from the client.
int completionPollIntervalMillis =
Job.getCompletionPollInterval(cluster.getConf());
while (!isComplete()) {
try {
Thread.sleep(completionPollIntervalMillis);
} catch (InterruptedException ie) {
}
}
}
return isSuccessful();
} }
Job.java
/**
* Provides a way to access information about the map/reduce cluster.
*/
//主要是提供一个访问map/reduce集群的途径;
@InterfaceAudience.Public
@InterfaceStability.Evolving
public class Cluster { @InterfaceStability.Evolving
public static enum JobTrackerStatus {INITIALIZING, RUNNING}; private ClientProtocolProvider clientProtocolProvider;
private ClientProtocol client;
private UserGroupInformation ugi;
private Configuration conf;
private FileSystem fs = null;
private Path sysDir = null;
private Path stagingAreaDir = null;
private Path jobHistoryDir = null;
private static final Log LOG = LogFactory.getLog(Cluster.class); private static ServiceLoader<ClientProtocolProvider> frameworkLoader =
ServiceLoader.load(ClientProtocolProvider.class); static {
ConfigUtil.loadResources();
} public Cluster(Configuration conf) throws IOException {
this(null, conf); //调用双参数构造函数
} public Cluster(InetSocketAddress jobTrackAddr, Configuration conf)
throws IOException {
this.conf = conf;
this.ugi = UserGroupInformation.getCurrentUser();
initialize(jobTrackAddr, conf); //
} /*
* 在初始化函数内部,使用java.util.ServiceLoader创建客户端代理,目前包含两个代理对象,
* LocalClientProtocolProvider(本地作业)和YarnClientProtocolProvider(yarn作业),
* 此处会根据mapreduce.framework.name的配置创建相应的客户端。
* 通过LocalClientProtocolProvider创建LocalJobRunner对象,在此就不进行详细说明了。
* 通过YarnClientProtocolProvider创建YarnRunner对象,YarnRunner保证当前JobClient运行在Yarn上。
*/
private void initialize(InetSocketAddress jobTrackAddr, Configuration conf)
throws IOException { synchronized (frameworkLoader) {
for (ClientProtocolProvider provider : frameworkLoader) { //
LOG.debug("Trying ClientProtocolProvider : "
+ provider.getClass().getName());
ClientProtocol clientProtocol = null;
try {
if (jobTrackAddr == null) {
clientProtocol = provider.create(conf);
} else {
clientProtocol = provider.create(jobTrackAddr, conf);
} if (clientProtocol != null) {
clientProtocolProvider = provider;
client = clientProtocol;
LOG.debug("Picked " + provider.getClass().getName()
+ " as the ClientProtocolProvider");
break;
}
else {
LOG.debug("Cannot pick " + provider.getClass().getName()
+ " as the ClientProtocolProvider - returned null protocol");
}
}
catch (Exception e) {
LOG.info("Failed to use " + provider.getClass().getName()
+ " due to error: ", e);
}
}
} if (null == clientProtocolProvider || null == client) {
throw new IOException(
"Cannot initialize Cluster. Please check your configuration for "
+ MRConfig.FRAMEWORK_NAME
+ " and the correspond server addresses.");
}
} ClientProtocol getClient() {
return client;
} Configuration getConf() {
return conf;
} }
Cluster.java
图6中的1:run job 就是submit()方法实现把作业提交到集群。这个方法内部有一个调用submitter.submitJobInternal(Job.this, cluster),即调用JobSubmitter.java中的submitJobInternal(...)方法,
subJobInternal(...)方法向系统提交作业,它内部调用submitClient.submitJob(...),即ClientProtocol.java的submitJob(...)方法。
@InterfaceAudience.Private
@InterfaceStability.Unstable
class JobSubmitter {
protected static final Log LOG = LogFactory.getLog(JobSubmitter.class);
private static final String SHUFFLE_KEYGEN_ALGORITHM = "HmacSHA1";
private static final int SHUFFLE_KEY_LENGTH = 64;
private FileSystem jtFs;
private ClientProtocol submitClient;
private String submitHostName;
private String submitHostAddress; JobSubmitter(FileSystem submitFs, ClientProtocol submitClient)
throws IOException {
this.submitClient = submitClient;
this.jtFs = submitFs;
} /**
* configure the jobconf of the user with the command line options of
* -libjars, -files, -archives.
* @param job
* @throws IOException
*/
private void copyAndConfigureFiles(Job job, Path jobSubmitDir)
throws IOException {
JobResourceUploader rUploader = new JobResourceUploader(jtFs);
rUploader.uploadFiles(job, jobSubmitDir); // Get the working directory. If not set, sets it to filesystem working dir
// This code has been added so that working directory reset before running
// the job. This is necessary for backward compatibility as other systems
// might use the public API JobConf#setWorkingDirectory to reset the working
// directory.
job.getWorkingDirectory();
} /**
* Internal method for submitting jobs to the system.
*
* <p>The job submission process involves:
* <ol>
* <li>
* Checking the input and output specifications of the job.
* </li>
* <li>
* Computing the {@link InputSplit}s for the job.
* </li>
* <li>
* Setup the requisite accounting information for the
* {@link DistributedCache} of the job, if necessary.
* </li>
* <li>
* Copying the job's jar and configuration to the map-reduce system
* directory on the distributed file-system.
* </li>
* <li>
* Submitting the job to the <code>JobTracker</code> and optionally
* monitoring it's status.
* </li>
* </ol></p>
* @param job the configuration to submit
* @param cluster the handle to the Cluster
* @throws ClassNotFoundException
* @throws InterruptedException
* @throws IOException
*/
//主要负责将作业提交到系统上运行,
JobStatus submitJobInternal(Job job, Cluster cluster)
throws ClassNotFoundException, InterruptedException, IOException { //validate the jobs output specs
//校验作业的输入输出checkSpecs(Job),主要是确保输出目录是否设置且在FS上不存在;
checkSpecs(job); Configuration conf = job.getConfiguration();
addMRFrameworkToDistributedCache(conf); /*
* 通过JobSubmissionFiles来获取Job运行时资源文件存放的目录,属性信息key为
* yarn.app.mapreduce.am.staging-dir,默认的目录为/tmp/Hadoop-yarn/staging/hadoop/.staging/
* JobSubmissionFiles.getStagingDir():在方法内部进行判断若目录存在则判断其所属关系及操作权限;
* 不存在的话,创建并赋予权限700;
*/
Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf);
//configure the command line options correctly on the submitting dfs
//为缓存中的Job组织必须的统计信息,设置主机名及地址信息,获取jobId,
//获取提交目录,如/tmp/hadoop-yarn/staging/root/.staging/job_1395778831382_0002;
InetAddress ip = InetAddress.getLocalHost();
if (ip != null) {
submitHostAddress = ip.getHostAddress();
submitHostName = ip.getHostName();
conf.set(MRJobConfig.JOB_SUBMITHOST,submitHostName);
conf.set(MRJobConfig.JOB_SUBMITHOSTADDR,submitHostAddress);
}
JobID jobId = submitClient.getNewJobID();
job.setJobID(jobId);
Path submitJobDir = new Path(jobStagingArea, jobId.toString());
JobStatus status = null;
try {
conf.set(MRJobConfig.USER_NAME,
UserGroupInformation.getCurrentUser().getShortUserName());
conf.set("hadoop.http.filter.initializers",
"org.apache.hadoop.yarn.server.webproxy.amfilter.AmFilterInitializer");
conf.set(MRJobConfig.MAPREDUCE_JOB_DIR, submitJobDir.toString());
LOG.debug("Configuring job " + jobId + " with " + submitJobDir
+ " as the submit dir");
// get delegation token for the dir
TokenCache.obtainTokensForNamenodes(job.getCredentials(),
new Path[] { submitJobDir }, conf); populateTokenCache(conf, job.getCredentials()); // generate a secret to authenticate shuffle transfers
if (TokenCache.getShuffleSecretKey(job.getCredentials()) == null) {
KeyGenerator keyGen;
try {
keyGen = KeyGenerator.getInstance(SHUFFLE_KEYGEN_ALGORITHM);
keyGen.init(SHUFFLE_KEY_LENGTH);
} catch (NoSuchAlgorithmException e) {
throw new IOException("Error generating shuffle secret key", e);
}
SecretKey shuffleKey = keyGen.generateKey();
TokenCache.setShuffleSecretKey(shuffleKey.getEncoded(),
job.getCredentials());
}
if (CryptoUtils.isEncryptedSpillEnabled(conf)) {
conf.setInt(MRJobConfig.MR_AM_MAX_ATTEMPTS, 1);
LOG.warn("Max job attempts set to 1 since encrypted intermediate" +
"data spill is enabled");
} //拷贝作业的jar和配置信息到分布式文件系统上的map-reduce系统目录,调用copyAndConfigureFiles(job,submitJobDir),
//主要是拷贝-libjars,-files,-archives属性对应的信息至submitJobDir;
copyAndConfigureFiles(job, submitJobDir); Path submitJobFile = JobSubmissionFiles.getJobConfPath(submitJobDir); // Create the splits for the job
//计算作业的输入分片数,调用writeSplits()写job.split,job.splitmetainfo;
LOG.debug("Creating splits at " + jtFs.makeQualified(submitJobDir));
int maps = writeSplits(job, submitJobDir);
conf.setInt(MRJobConfig.NUM_MAPS, maps);
LOG.info("number of splits:" + maps); // write "queue admins of the queue to which job is being submitted"
// to job file.
String queue = conf.get(MRJobConfig.QUEUE_NAME,
JobConf.DEFAULT_QUEUE_NAME);
AccessControlList acl = submitClient.getQueueAdmins(queue);
conf.set(toFullPropertyName(queue,
QueueACL.ADMINISTER_JOBS.getAclName()), acl.getAclString()); // removing jobtoken referrals before copying the jobconf to HDFS
// as the tasks don't need this setting, actually they may break
// because of it if present as the referral will point to a
// different job.
TokenCache.cleanUpTokenReferral(conf); if (conf.getBoolean(
MRJobConfig.JOB_TOKEN_TRACKING_IDS_ENABLED,
MRJobConfig.DEFAULT_JOB_TOKEN_TRACKING_IDS_ENABLED)) {
// Add HDFS tracking ids
ArrayList<String> trackingIds = new ArrayList<String>();
for (Token<? extends TokenIdentifier> t :
job.getCredentials().getAllTokens()) {
trackingIds.add(t.decodeIdentifier().getTrackingId());
}
conf.setStrings(MRJobConfig.JOB_TOKEN_TRACKING_IDS,
trackingIds.toArray(new String[trackingIds.size()]));
} // Set reservation info if it exists
ReservationId reservationId = job.getReservationId();
if (reservationId != null) {
conf.set(MRJobConfig.RESERVATION_ID, reservationId.toString());
} // Write job file to submit dir
//调用writeConf(conf,submitJobFile)将job.xml文件写入到JobTracker的文件系统;
writeConf(conf, submitJobFile); //
// Now, actually submit the job (using the submit name)
//
//提交作业到JobTracker并监控器状态,调用yarnRunner对象的submitJob(jobId,submitJobDir,job.getCredentials())
//真正的提交在YarnRunner对象的submitJob(…)方法内部:
printTokens(jobId, job.getCredentials());
status = submitClient.submitJob(
jobId, submitJobDir.toString(), job.getCredentials());
if (status != null) {
return status;
} else {
throw new IOException("Could not launch job");
}
} finally {
if (status == null) {
LOG.info("Cleaning up the staging area " + submitJobDir);
if (jtFs != null && submitJobDir != null)
jtFs.delete(submitJobDir, true); }
}
} @SuppressWarnings("unchecked")
private <T extends InputSplit>
int writeNewSplits(JobContext job, Path jobSubmitDir) throws IOException,
InterruptedException, ClassNotFoundException {
Configuration conf = job.getConfiguration();
InputFormat<?, ?> input =
ReflectionUtils.newInstance(job.getInputFormatClass(), conf); List<InputSplit> splits = input.getSplits(job);
T[] array = (T[]) splits.toArray(new InputSplit[splits.size()]); // sort the splits into order based on size, so that the biggest
// go first
Arrays.sort(array, new SplitComparator());
JobSplitWriter.createSplitFiles(jobSubmitDir, conf,
jobSubmitDir.getFileSystem(conf), array);
return array.length;
} private int writeSplits(org.apache.hadoop.mapreduce.JobContext job,
Path jobSubmitDir) throws IOException,
InterruptedException, ClassNotFoundException {
JobConf jConf = (JobConf)job.getConfiguration();
int maps;
if (jConf.getUseNewMapper()) {
maps = writeNewSplits(job, jobSubmitDir);
} else {
maps = writeOldSplits(jConf, jobSubmitDir);
}
return maps;
} //method to write splits for old api mapper.
private int writeOldSplits(JobConf job, Path jobSubmitDir)
throws IOException {
org.apache.hadoop.mapred.InputSplit[] splits =
job.getInputFormat().getSplits(job, job.getNumMapTasks());
// sort the splits into order based on size, so that the biggest
// go first
Arrays.sort(splits, new Comparator<org.apache.hadoop.mapred.InputSplit>() {
public int compare(org.apache.hadoop.mapred.InputSplit a,
org.apache.hadoop.mapred.InputSplit b) {
try {
long left = a.getLength();
long right = b.getLength();
if (left == right) {
return 0;
} else if (left < right) {
return 1;
} else {
return -1;
}
} catch (IOException ie) {
throw new RuntimeException("Problem getting input split size", ie);
}
}
});
JobSplitWriter.createSplitFiles(jobSubmitDir, job,
jobSubmitDir.getFileSystem(job), splits);
return splits.length;
} }
JobSubmitter.java
难点1 (2)MapReduce 2 实现了ClientProtocol,当mapreduce.framework.name设置为yarn时启动。提交的过程与经典的非常相似。从资源管理器(而不是Jobtracker)获取新的作业ID,在YARN命名法中它是一个应用程序ID。
public interface ClientProtocol extends VersionedProtocol {...}是个接口类。它有两个实现类。分别是:(我们程序跟踪到这一步丢了:当类调用接口的方法时,会跟丢;这时就要看该接口的实现类)
public class LocalJobRunner implements ClientProtocol {...} //Implements MapReduce locally, in-process, for debugging.
public class YARNRunner implements ClientProtocol {...} //This class enables the current JobClient (0.22 hadoop) to run on YARN.
我们是在YARN上运行的,所以选用YARNRunner.java类。 对于LocalJobRunner,我们以后在讨论。
其中ClientProtocol.java类在hadoop-2.7.3-src\hadoop-mapreduce-project\hadoop-mapreduce-client\hadoop-mapreduce-client-core\src\main\java\org\apache\hadoop\mapreduce\protocol中
YARNRunner.java类在hadoop-2.7.3-src\hadoop-mapreduce-project\hadoop-mapreduce-client\hadoop-mapreduce-client-jobclient\src\main\java\org\apache\hadoop\mapred
/**
* This class enables the current JobClient (0.22 hadoop) to run on YARN.
*/
@SuppressWarnings("unchecked")
public class YARNRunner implements ClientProtocol { private static final Log LOG = LogFactory.getLog(YARNRunner.class); private final RecordFactory recordFactory = RecordFactoryProvider.getRecordFactory(null);
private ResourceMgrDelegate resMgrDelegate;
private ClientCache clientCache;
private Configuration conf;
private final FileContext defaultFileContext; /**
* Yarn runner incapsulates the client interface of
* yarn
* @param conf the configuration object for the client
*/ /*
* 在YarnRunner实例化的过程中,创建客户端代理的流程如下:
* Cluster->ClientProtocol(YarnRunner)->ResourceMgrDelegate->client(YarnClientImpl)->rmClient(ApplicationClientProtocol)
* 保证当前JobClient运行在Yarn上,在实例化的过程中创建ResourceMgrDelegate;
*/
public YARNRunner(Configuration conf) {
this(conf, new ResourceMgrDelegate(new YarnConfiguration(conf)));
} /**
* Similar to {@link #YARNRunner(Configuration)} but allowing injecting
* {@link ResourceMgrDelegate}. Enables mocking and testing.
* @param conf the configuration object for the client
* @param resMgrDelegate the resourcemanager client handle.
*/
public YARNRunner(Configuration conf, ResourceMgrDelegate resMgrDelegate) {
this(conf, resMgrDelegate, new ClientCache(conf, resMgrDelegate));
} /**
* Similar to {@link YARNRunner#YARNRunner(Configuration, ResourceMgrDelegate)}
* but allowing injecting {@link ClientCache}. Enable mocking and testing.
* @param conf the configuration object
* @param resMgrDelegate the resource manager delegate
* @param clientCache the client cache object.
*/
public YARNRunner(Configuration conf, ResourceMgrDelegate resMgrDelegate,
ClientCache clientCache) {
this.conf = conf;
try {
this.resMgrDelegate = resMgrDelegate;
this.clientCache = clientCache;
this.defaultFileContext = FileContext.getFileContext(this.conf);
} catch (UnsupportedFileSystemException ufe) {
throw new RuntimeException("Error in instantiating YarnClient", ufe);
}
} @Override
public JobStatus submitJob(JobID jobId, String jobSubmitDir, Credentials ts)
throws IOException, InterruptedException { addHistoryToken(ts); // Construct necessary information to start the MR AM
//这个appContext很重要, 里面拼接了各种环境变量, 以及启动App Master的脚本 这个对象会一直贯穿于各个类之间, 直到AM启动
ApplicationSubmissionContext appContext =
createApplicationSubmissionContext(conf, jobSubmitDir, ts); // Submit to ResourceManager
////通过ResourceMgrDelegate来sumbit这个appContext, ResourceMgrDelegate类是用来和Resource Manager在通讯的
try {
ApplicationId applicationId =
resMgrDelegate.submitApplication(appContext); //这个appMaster并不是我们说的ApplicationMaster对象, 这样的命名刚开始也把我迷惑了。。。
ApplicationReport appMaster = resMgrDelegate
.getApplicationReport(applicationId);
String diagnostics =
(appMaster == null ?
"application report is null" : appMaster.getDiagnostics());
if (appMaster == null
|| appMaster.getYarnApplicationState() == YarnApplicationState.FAILED
|| appMaster.getYarnApplicationState() == YarnApplicationState.KILLED) {
throw new IOException("Failed to run job : " +
diagnostics);
}
return clientCache.getClient(jobId).getJobStatus(jobId);
} catch (YarnException e) {
throw new IOException(e);
}
} //ApplicationSubmissionContext只需要记住amContainer的启动脚本在里面, 后面会用到。
public ApplicationSubmissionContext createApplicationSubmissionContext(
Configuration jobConf,
String jobSubmitDir, Credentials ts) throws IOException {
ApplicationId applicationId = resMgrDelegate.getApplicationId(); // Setup resource requirements
Resource capability = recordFactory.newRecordInstance(Resource.class);
capability.setMemory(
conf.getInt(
MRJobConfig.MR_AM_VMEM_MB, MRJobConfig.DEFAULT_MR_AM_VMEM_MB
)
);
capability.setVirtualCores(
conf.getInt(
MRJobConfig.MR_AM_CPU_VCORES, MRJobConfig.DEFAULT_MR_AM_CPU_VCORES
)
);
LOG.debug("AppMaster capability = " + capability); // Setup LocalResources
Map<String, LocalResource> localResources =
new HashMap<String, LocalResource>(); Path jobConfPath = new Path(jobSubmitDir, MRJobConfig.JOB_CONF_FILE); URL yarnUrlForJobSubmitDir = ConverterUtils
.getYarnUrlFromPath(defaultFileContext.getDefaultFileSystem()
.resolvePath(
defaultFileContext.makeQualified(new Path(jobSubmitDir))));
LOG.debug("Creating setup context, jobSubmitDir url is "
+ yarnUrlForJobSubmitDir); localResources.put(MRJobConfig.JOB_CONF_FILE,
createApplicationResource(defaultFileContext,
jobConfPath, LocalResourceType.FILE));
if (jobConf.get(MRJobConfig.JAR) != null) {
Path jobJarPath = new Path(jobConf.get(MRJobConfig.JAR));
LocalResource rc = createApplicationResource(
FileContext.getFileContext(jobJarPath.toUri(), jobConf),
jobJarPath,
LocalResourceType.PATTERN);
String pattern = conf.getPattern(JobContext.JAR_UNPACK_PATTERN,
JobConf.UNPACK_JAR_PATTERN_DEFAULT).pattern();
rc.setPattern(pattern);
localResources.put(MRJobConfig.JOB_JAR, rc);
} else {
// Job jar may be null. For e.g, for pipes, the job jar is the hadoop
// mapreduce jar itself which is already on the classpath.
LOG.info("Job jar is not present. "
+ "Not adding any jar to the list of resources.");
} // TODO gross hack
for (String s : new String[] {
MRJobConfig.JOB_SPLIT,
MRJobConfig.JOB_SPLIT_METAINFO }) {
localResources.put(
MRJobConfig.JOB_SUBMIT_DIR + "/" + s,
createApplicationResource(defaultFileContext,
new Path(jobSubmitDir, s), LocalResourceType.FILE));
} // Setup security tokens
DataOutputBuffer dob = new DataOutputBuffer();
ts.writeTokenStorageToStream(dob);
ByteBuffer securityTokens = ByteBuffer.wrap(dob.getData(), 0, dob.getLength()); // Setup the command to run the AM
//这里才是设定Appmaster类的地方,
//MRJobConfig.APPLICATION_MASTER_CLASS = org.apache.hadoop.mapreduce.v2.app.MRAppMaster
////所以最后通过命令在nodemanager那边执行的其实是MRAppMaster类的main方法
List<String> vargs = new ArrayList<String>(8);
vargs.add(MRApps.crossPlatformifyMREnv(jobConf, Environment.JAVA_HOME)
+ "/bin/java"); Path amTmpDir =
new Path(MRApps.crossPlatformifyMREnv(conf, Environment.PWD),
YarnConfiguration.DEFAULT_CONTAINER_TEMP_DIR);
vargs.add("-Djava.io.tmpdir=" + amTmpDir);
MRApps.addLog4jSystemProperties(null, vargs, conf); // Check for Java Lib Path usage in MAP and REDUCE configs
warnForJavaLibPath(conf.get(MRJobConfig.MAP_JAVA_OPTS,""), "map",
MRJobConfig.MAP_JAVA_OPTS, MRJobConfig.MAP_ENV);
warnForJavaLibPath(conf.get(MRJobConfig.MAPRED_MAP_ADMIN_JAVA_OPTS,""), "map",
MRJobConfig.MAPRED_MAP_ADMIN_JAVA_OPTS, MRJobConfig.MAPRED_ADMIN_USER_ENV);
warnForJavaLibPath(conf.get(MRJobConfig.REDUCE_JAVA_OPTS,""), "reduce",
MRJobConfig.REDUCE_JAVA_OPTS, MRJobConfig.REDUCE_ENV);
warnForJavaLibPath(conf.get(MRJobConfig.MAPRED_REDUCE_ADMIN_JAVA_OPTS,""), "reduce",
MRJobConfig.MAPRED_REDUCE_ADMIN_JAVA_OPTS, MRJobConfig.MAPRED_ADMIN_USER_ENV); // Add AM admin command opts before user command opts
// so that it can be overridden by user
String mrAppMasterAdminOptions = conf.get(MRJobConfig.MR_AM_ADMIN_COMMAND_OPTS,
MRJobConfig.DEFAULT_MR_AM_ADMIN_COMMAND_OPTS);
warnForJavaLibPath(mrAppMasterAdminOptions, "app master",
MRJobConfig.MR_AM_ADMIN_COMMAND_OPTS, MRJobConfig.MR_AM_ADMIN_USER_ENV);
vargs.add(mrAppMasterAdminOptions); // Add AM user command opts
String mrAppMasterUserOptions = conf.get(MRJobConfig.MR_AM_COMMAND_OPTS,
MRJobConfig.DEFAULT_MR_AM_COMMAND_OPTS);
warnForJavaLibPath(mrAppMasterUserOptions, "app master",
MRJobConfig.MR_AM_COMMAND_OPTS, MRJobConfig.MR_AM_ENV);
vargs.add(mrAppMasterUserOptions); if (jobConf.getBoolean(MRJobConfig.MR_AM_PROFILE,
MRJobConfig.DEFAULT_MR_AM_PROFILE)) {
final String profileParams = jobConf.get(MRJobConfig.MR_AM_PROFILE_PARAMS,
MRJobConfig.DEFAULT_TASK_PROFILE_PARAMS);
if (profileParams != null) {
vargs.add(String.format(profileParams,
ApplicationConstants.LOG_DIR_EXPANSION_VAR + Path.SEPARATOR
+ TaskLog.LogName.PROFILE));
}
} vargs.add(MRJobConfig.APPLICATION_MASTER_CLASS);
vargs.add("1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR +
Path.SEPARATOR + ApplicationConstants.STDOUT);
vargs.add("2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR +
Path.SEPARATOR + ApplicationConstants.STDERR); Vector<String> vargsFinal = new Vector<String>(8);
// Final command
StringBuilder mergedCommand = new StringBuilder();
for (CharSequence str : vargs) {
mergedCommand.append(str).append(" ");
}
vargsFinal.add(mergedCommand.toString()); LOG.debug("Command to launch container for ApplicationMaster is : "
+ mergedCommand); // Setup the CLASSPATH in environment
// i.e. add { Hadoop jars, job jar, CWD } to classpath.
Map<String, String> environment = new HashMap<String, String>();
MRApps.setClasspath(environment, conf); // Shell
environment.put(Environment.SHELL.name(),
conf.get(MRJobConfig.MAPRED_ADMIN_USER_SHELL,
MRJobConfig.DEFAULT_SHELL)); // Add the container working directory in front of LD_LIBRARY_PATH
MRApps.addToEnvironment(environment, Environment.LD_LIBRARY_PATH.name(),
MRApps.crossPlatformifyMREnv(conf, Environment.PWD), conf); // Setup the environment variables for Admin first
MRApps.setEnvFromInputString(environment,
conf.get(MRJobConfig.MR_AM_ADMIN_USER_ENV,
MRJobConfig.DEFAULT_MR_AM_ADMIN_USER_ENV), conf);
// Setup the environment variables (LD_LIBRARY_PATH, etc)
MRApps.setEnvFromInputString(environment,
conf.get(MRJobConfig.MR_AM_ENV), conf); // Parse distributed cache
MRApps.setupDistributedCache(jobConf, localResources); Map<ApplicationAccessType, String> acls
= new HashMap<ApplicationAccessType, String>(2);
acls.put(ApplicationAccessType.VIEW_APP, jobConf.get(
MRJobConfig.JOB_ACL_VIEW_JOB, MRJobConfig.DEFAULT_JOB_ACL_VIEW_JOB));
acls.put(ApplicationAccessType.MODIFY_APP, jobConf.get(
MRJobConfig.JOB_ACL_MODIFY_JOB,
MRJobConfig.DEFAULT_JOB_ACL_MODIFY_JOB)); // Setup ContainerLaunchContext for AM container
//根据前面的拼接的命令生成AM的container 在后面会通过这个对象来启动container 从而启动MRAppMaster
ContainerLaunchContext amContainer =
ContainerLaunchContext.newInstance(localResources, environment,
vargsFinal, null, securityTokens, acls); Collection<String> tagsFromConf =
jobConf.getTrimmedStringCollection(MRJobConfig.JOB_TAGS); // Set up the ApplicationSubmissionContext
ApplicationSubmissionContext appContext =
recordFactory.newRecordInstance(ApplicationSubmissionContext.class);
appContext.setApplicationId(applicationId); // ApplicationId
appContext.setQueue( // Queue name
jobConf.get(JobContext.QUEUE_NAME,
YarnConfiguration.DEFAULT_QUEUE_NAME));
// add reservationID if present
ReservationId reservationID = null;
try {
reservationID =
ReservationId.parseReservationId(jobConf
.get(JobContext.RESERVATION_ID));
} catch (NumberFormatException e) {
// throw exception as reservationid as is invalid
String errMsg =
"Invalid reservationId: " + jobConf.get(JobContext.RESERVATION_ID)
+ " specified for the app: " + applicationId;
LOG.warn(errMsg);
throw new IOException(errMsg);
}
if (reservationID != null) {
appContext.setReservationID(reservationID);
LOG.info("SUBMITTING ApplicationSubmissionContext app:" + applicationId
+ " to queue:" + appContext.getQueue() + " with reservationId:"
+ appContext.getReservationID());
}
appContext.setApplicationName( // Job name
jobConf.get(JobContext.JOB_NAME,
YarnConfiguration.DEFAULT_APPLICATION_NAME));
appContext.setCancelTokensWhenComplete(
conf.getBoolean(MRJobConfig.JOB_CANCEL_DELEGATION_TOKEN, true));
//设置AM Container
appContext.setAMContainerSpec(amContainer); // AM Container
appContext.setMaxAppAttempts(
conf.getInt(MRJobConfig.MR_AM_MAX_ATTEMPTS,
MRJobConfig.DEFAULT_MR_AM_MAX_ATTEMPTS));
appContext.setResource(capability);
appContext.setApplicationType(MRJobConfig.MR_APPLICATION_TYPE);
if (tagsFromConf != null && !tagsFromConf.isEmpty()) {
appContext.setApplicationTags(new HashSet<String>(tagsFromConf));
} return appContext;
} }
YARNRunner.java
JobSubmitter类中subJobInternal(...)方法中调用JobID jobId = submitClient.getNewJobID(),完成了图6中2:get new application
getNewjobID()是ClientProtocol.java和YARNRunner.java类中的方法。在YARNRunner.java类中,getNewjobID()方法进一步调用ResourceMgrDelegate.java类中的getNewjobID()方法。
YARNRunner类中的构造函数以及submitJob(...)方法内部有resMgrDelegate.submitApplication(appContext),即调用ResourceMgrDelegate.java的方法submitApplication(...)。
ResourceMgrDelegate.java类在hadoop-2.7.3-src\hadoop-mapreduce-project\hadoop-mapreduce-client\hadoop-mapreduce-client-jobclient\src\main\java\org\apache\hadoop\mapred
(3)作业客户端检查作业的输出说明,计算输入分片(虽然有选项yarn.app.mapreduce.am.compute-splits-in-cluster在集群上来产生分片,这可以使具有多个分片的作业从中受益)并将作业资源(包括作业JAR、配置和分片信息)复制到HDFS。
完成了图6的3:copy job resources
JobSubmitter类中subJobInternal(...)方法中调用copyAndConfigureFiles(job, submitJobDir),将作业资源复制到HDFS。
分片:具体的分片细节由InputSplitFormat指定。分片的规则为FileInputFormat中的getSplits()方法指定。
subJobInternal(...)方法中调用maps = writeSplits(job, submitJobDir)实现分片。writeSplits(...)方法又会调用maps = writeNewSplits(job, jobSubmitDir)或者writeOldSplits(jConf, jobSubmitDir)这两个方法实现分片。这两个方法都会调用List<InputSplit> splits = input.getSplits(job)方法实现分片。而getSplits(...)方法是抽象类public abstract class InputFormat<K, V> {...}的方法。
(4)最后,通过调用资源管理器上的submitApplication()方法提交作业。
完成了图6的4:submit application
JobSubmitter类中subJobInternal(...)方法中调用status = submitClient.submitJob(...)方法,真正的提交作业----->submitJob(...)方法是接口ClientProtocol.java和它的实现类YARNRunner.java类中的方法---->在YARNRunner.java类中,submitJob(...)方法又会进一步调用ResourceMgrDelegate.java类中的submitApplication(...)方法---->而ResourceMgrDelegate.java类中的submitApplication(...)方法又进一步调用它父类抽象类YarnClient.java的submitApplication(...)方法---->抽象类YarnClient.java的submitApplication(...)方法进一步调用子类YarnClientImpl.java的submitApplication(...)方法---->
具体如下:
public abstract class YarnClient extends AbstractService {...}是个抽象类,它有两个扩展类,分别是:
public class ResourceMgrDelegate extends YarnClient {...}
public class YarnClientImpl extends YarnClient {...}
public class ResourceMgrDelegate extends YarnClient {
private static final Log LOG = LogFactory.getLog(ResourceMgrDelegate.class); private YarnConfiguration conf;
private ApplicationSubmissionContext application;
private ApplicationId applicationId;
@Private
@VisibleForTesting
protected YarnClient client;
private Text rmDTService; /**
* Delegate responsible for communicating with the Resource Manager's
* {@link ApplicationClientProtocol}.
* @param conf the configuration object.
*/
//主要负责与RM进行信息交互
public ResourceMgrDelegate(YarnConfiguration conf) {
super(ResourceMgrDelegate.class.getName());
this.conf = conf;
this.client = YarnClient.createYarnClient();
init(conf);
start();
} @Override
protected void serviceInit(Configuration conf) throws Exception {
client.init(conf);
super.serviceInit(conf);
} @Override
protected void serviceStart() throws Exception {
client.start();
super.serviceStart();
} @Override
protected void serviceStop() throws Exception {
client.stop();
super.serviceStop();
} public String getFilesystemName() throws IOException, InterruptedException {
return FileSystem.get(conf).getUri().toString();
} public JobID getNewJobID() throws IOException, InterruptedException {
try {
this.application = client.createApplication().getApplicationSubmissionContext();
this.applicationId = this.application.getApplicationId();
return TypeConverter.fromYarn(applicationId);
} catch (YarnException e) {
throw new IOException(e);
}
} public ApplicationId getApplicationId() {
return applicationId;
} @Override
public YarnClientApplication createApplication() throws
YarnException, IOException {
return client.createApplication();
}
//
@Override
public ApplicationId
submitApplication(ApplicationSubmissionContext appContext)
throws YarnException, IOException {
return client.submitApplication(appContext);
} }
ResourceMgrDelegate.java
@InterfaceAudience.Public
@InterfaceStability.Stable
public abstract class YarnClient extends AbstractService { /**
* Create a new instance of YarnClient.
*/
@Public
public static YarnClient createYarnClient() {
YarnClient client = new YarnClientImpl(); //新建YarnClientImpl对象
return client;
} @Private
protected YarnClient(String name) {
super(name);
} /**
* <p>
* Submit a new application to <code>YARN.</code> It is a blocking call - it
* will not return {@link ApplicationId} until the submitted application is
* submitted successfully and accepted by the ResourceManager.
* </p>
*
* <p>
* Users should provide an {@link ApplicationId} as part of the parameter
* {@link ApplicationSubmissionContext} when submitting a new application,
* otherwise it will throw the {@link ApplicationIdNotProvidedException}.
* </p>
*
* <p>This internally calls {@link ApplicationClientProtocol#submitApplication
* (SubmitApplicationRequest)}, and after that, it internally invokes
* {@link ApplicationClientProtocol#getApplicationReport
* (GetApplicationReportRequest)} and waits till it can make sure that the
* application gets properly submitted. If RM fails over or RM restart
* happens before ResourceManager saves the application's state,
* {@link ApplicationClientProtocol
* #getApplicationReport(GetApplicationReportRequest)} will throw
* the {@link ApplicationNotFoundException}. This API automatically resubmits
* the application with the same {@link ApplicationSubmissionContext} when it
* catches the {@link ApplicationNotFoundException}</p>
*
* @param appContext
* {@link ApplicationSubmissionContext} containing all the details
* needed to submit a new application
* @return {@link ApplicationId} of the accepted application
* @throws YarnException
* @throws IOException
* @see #createApplication()
*/
public abstract ApplicationId submitApplication(
ApplicationSubmissionContext appContext) throws YarnException,
IOException; }
YarnClient.java
@Private
@Unstable
//主要负责实例化rmClient;
public class YarnClientImpl extends YarnClient { private static final Log LOG = LogFactory.getLog(YarnClientImpl.class); protected ApplicationClientProtocol rmClient;
protected long submitPollIntervalMillis;
private long asyncApiPollIntervalMillis;
private long asyncApiPollTimeoutMillis;
protected AHSClient historyClient;
private boolean historyServiceEnabled;
protected TimelineClient timelineClient;
@VisibleForTesting
Text timelineService;
@VisibleForTesting
String timelineDTRenewer;
protected boolean timelineServiceEnabled;
protected boolean timelineServiceBestEffort; private static final String ROOT = "root"; public YarnClientImpl() {
super(YarnClientImpl.class.getName());
} @SuppressWarnings("deprecation")
@Override
protected void serviceInit(Configuration conf) throws Exception {
asyncApiPollIntervalMillis =
conf.getLong(YarnConfiguration.YARN_CLIENT_APPLICATION_CLIENT_PROTOCOL_POLL_INTERVAL_MS,
YarnConfiguration.DEFAULT_YARN_CLIENT_APPLICATION_CLIENT_PROTOCOL_POLL_INTERVAL_MS);
asyncApiPollTimeoutMillis =
conf.getLong(YarnConfiguration.YARN_CLIENT_APPLICATION_CLIENT_PROTOCOL_POLL_TIMEOUT_MS,
YarnConfiguration.DEFAULT_YARN_CLIENT_APPLICATION_CLIENT_PROTOCOL_POLL_TIMEOUT_MS);
submitPollIntervalMillis = asyncApiPollIntervalMillis;
if (conf.get(YarnConfiguration.YARN_CLIENT_APP_SUBMISSION_POLL_INTERVAL_MS)
!= null) {
submitPollIntervalMillis = conf.getLong(
YarnConfiguration.YARN_CLIENT_APP_SUBMISSION_POLL_INTERVAL_MS,
YarnConfiguration.DEFAULT_YARN_CLIENT_APPLICATION_CLIENT_PROTOCOL_POLL_INTERVAL_MS);
} if (conf.getBoolean(YarnConfiguration.APPLICATION_HISTORY_ENABLED,
YarnConfiguration.DEFAULT_APPLICATION_HISTORY_ENABLED)) {
historyServiceEnabled = true;
historyClient = AHSClient.createAHSClient();
historyClient.init(conf);
} if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED,
YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) {
timelineServiceEnabled = true;
timelineClient = createTimelineClient();
timelineClient.init(conf);
timelineDTRenewer = getTimelineDelegationTokenRenewer(conf);
timelineService = TimelineUtils.buildTimelineTokenService(conf);
} timelineServiceBestEffort = conf.getBoolean(
YarnConfiguration.TIMELINE_SERVICE_CLIENT_BEST_EFFORT,
YarnConfiguration.DEFAULT_TIMELINE_SERVICE_CLIENT_BEST_EFFORT);
super.serviceInit(conf);
} TimelineClient createTimelineClient() throws IOException, YarnException {
return TimelineClient.createTimelineClient();
} @Override
protected void serviceStart() throws Exception {
//rmClient是各个客户端与RM交互的协议,主要是负责提交或终止Job任务和获取信息(applications、cluster metrics、nodes、queues和ACLs)
try {
rmClient = ClientRMProxy.createRMProxy(getConfig(),
ApplicationClientProtocol.class);
if (historyServiceEnabled) {
historyClient.start();
}
if (timelineServiceEnabled) {
timelineClient.start();
}
} catch (IOException e) {
throw new YarnRuntimeException(e);
}
super.serviceStart();
} @Override
protected void serviceStop() throws Exception {
if (this.rmClient != null) {
RPC.stopProxy(this.rmClient);
}
if (historyServiceEnabled) {
historyClient.stop();
}
if (timelineServiceEnabled) {
timelineClient.stop();
}
super.serviceStop();
} private GetNewApplicationResponse getNewApplication()
throws YarnException, IOException {
GetNewApplicationRequest request =
Records.newRecord(GetNewApplicationRequest.class);
return rmClient.getNewApplication(request);
} @Override
public YarnClientApplication createApplication()
throws YarnException, IOException {
ApplicationSubmissionContext context = Records.newRecord
(ApplicationSubmissionContext.class);
GetNewApplicationResponse newApp = getNewApplication();
ApplicationId appId = newApp.getApplicationId();
context.setApplicationId(appId);
return new YarnClientApplication(newApp, context);
} @Override
public ApplicationId
submitApplication(ApplicationSubmissionContext appContext)
throws YarnException, IOException {
ApplicationId applicationId = appContext.getApplicationId();
if (applicationId == null) {
throw new ApplicationIdNotProvidedException(
"ApplicationId is not provided in ApplicationSubmissionContext");
}
//将appContext设置到一个request里面
SubmitApplicationRequest request =
Records.newRecord(SubmitApplicationRequest.class);
request.setApplicationSubmissionContext(appContext); // Automatically add the timeline DT into the CLC
// Only when the security and the timeline service are both enabled
if (isSecurityEnabled() && timelineServiceEnabled) {
addTimelineDelegationToken(appContext.getAMContainerSpec());
} //TODO: YARN-1763:Handle RM failovers during the submitApplication call.
////通过rmClient提交request, 这个rmClient其实就是ClientRMService类,
//是用来和Resource Manager做RPC的call, 通过这个类, 可以直接和RM对话
rmClient.submitApplication(request); //这里提交任务 int pollCount = 0;
long startTime = System.currentTimeMillis();
EnumSet<YarnApplicationState> waitingStates =
EnumSet.of(YarnApplicationState.NEW,
YarnApplicationState.NEW_SAVING,
YarnApplicationState.SUBMITTED);
EnumSet<YarnApplicationState> failToSubmitStates =
EnumSet.of(YarnApplicationState.FAILED,
YarnApplicationState.KILLED);
////一直循环, 直到状态变为NEW为止, 如果长时间状态没变, 那么就timeout
while (true) {
try {
ApplicationReport appReport = getApplicationReport(applicationId);
YarnApplicationState state = appReport.getYarnApplicationState();
if (!waitingStates.contains(state)) {
if(failToSubmitStates.contains(state)) {
throw new YarnException("Failed to submit " + applicationId +
" to YARN : " + appReport.getDiagnostics());
}
LOG.info("Submitted application " + applicationId);
break;
} long elapsedMillis = System.currentTimeMillis() - startTime;
if (enforceAsyncAPITimeout() &&
elapsedMillis >= asyncApiPollTimeoutMillis) {
throw new YarnException("Timed out while waiting for application " +
applicationId + " to be submitted successfully");
} // Notify the client through the log every 10 poll, in case the client
// is blocked here too long.
if (++pollCount % 10 == 0) {
LOG.info("Application submission is not finished, " +
"submitted application " + applicationId +
" is still in " + state);
}
try {
Thread.sleep(submitPollIntervalMillis);
} catch (InterruptedException ie) {
LOG.error("Interrupted while waiting for application "
+ applicationId
+ " to be successfully submitted.");
}
} catch (ApplicationNotFoundException ex) {
// FailOver or RM restart happens before RMStateStore saves
// ApplicationState
LOG.info("Re-submit application " + applicationId + "with the " +
"same ApplicationSubmissionContext");
rmClient.submitApplication(request);
}
} return applicationId;
} }
YarnClientImpl.java
YARNRunner类的构造函数---->调用ResourceMgrDelegate的构造函数---->调用抽象类YarnClient的createYarnClient()方法---->调用YarnClientImpl的构造函数生成对象并返回给YarnClient类和ResourceMgrDelegate类的对象(所以ResourceMgrDelegate类中对象调用的方法最终调用的是YarnClientImpl类中的方法)。
YARNRunner类的构造函数---->调用ResourceMgrDelegate的构造函数---->调用抽象类AbstractService的init()以及start()方法,该抽象类是抽象类YarnClient的父类---->该抽象类的两个方法分别会调用该类内部方法serviceInit()以及serviceStart()方法,(其实最终对用的是YarnClientImpl类中的相应的serviceInit()以及serviceStart()方法)。
YARNRunner类的submitJob()方法---->调用ResourceMgrDelegate类的submitApplication()方法---->调用抽象类YarnClient的submitApplication()方法(其实最终对用的是YarnClientImpl类中的submitApplication()方法)。
到目前为止, 所有的内容都还是在提交Job的那台Client机器上, 还没有到ResourceManger那边。接下来是ResourceManger端:
难点2 接着上面,YarnClientImpl.java的submitApplication(...)方法---->YarnClientImpl.java的submitApplication(...)方法内部进一步调用接口ApplicationClientProtocol.java的submitApplication(...)方法,调用它的实现类ClientRMService.java中的submitApplication(...)方法,以及它的构造函数。
public interface ApplicationClientProtocol extends ApplicationBaseProtocol {...}是个接口,它有四个实现类,其中只有两个实现类中有submitApplication()方法,这两个实现类分别是:(我们程序跟踪到这一步丢了:当类调用接口的方法时,会跟丢;这时就要看该接口的实现类)
public class ApplicationClientProtocolPBClientImpl implements ApplicationClientProtocol, Closeable {...}
public class ClientRMService extends AbstractService implements ApplicationClientProtocol {...} //这里选用这个类,
参考2 参考
ClientRMService.java 在 hadoop-2.7.3-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
/**
* The client interface to the Resource Manager. This module handles all the rpc
* interfaces to the resource manager from the client.
*/
public class ClientRMService extends AbstractService implements
ApplicationClientProtocol {
private static final ArrayList<ApplicationReport> EMPTY_APPS_REPORT = new ArrayList<ApplicationReport>(); private static final Log LOG = LogFactory.getLog(ClientRMService.class); final private AtomicInteger applicationCounter = new AtomicInteger(0);
final private YarnScheduler scheduler;
final private RMContext rmContext;
private final RMAppManager rmAppManager; private Server server;
protected RMDelegationTokenSecretManager rmDTSecretManager; private final RecordFactory recordFactory = RecordFactoryProvider.getRecordFactory(null);
InetSocketAddress clientBindAddress; private final ApplicationACLsManager applicationsACLsManager;
private final QueueACLsManager queueACLsManager; // For Reservation APIs
private Clock clock;
private ReservationSystem reservationSystem;
private ReservationInputValidator rValidator; public ClientRMService(RMContext rmContext, YarnScheduler scheduler,
RMAppManager rmAppManager, ApplicationACLsManager applicationACLsManager,
QueueACLsManager queueACLsManager,
RMDelegationTokenSecretManager rmDTSecretManager) {
this(rmContext, scheduler, rmAppManager, applicationACLsManager,
queueACLsManager, rmDTSecretManager, new UTCClock());
} public ClientRMService(RMContext rmContext, YarnScheduler scheduler,
RMAppManager rmAppManager, ApplicationACLsManager applicationACLsManager,
QueueACLsManager queueACLsManager,
RMDelegationTokenSecretManager rmDTSecretManager, Clock clock) {
super(ClientRMService.class.getName());
this.scheduler = scheduler;
this.rmContext = rmContext;
this.rmAppManager = rmAppManager;
this.applicationsACLsManager = applicationACLsManager;
this.queueACLsManager = queueACLsManager;
this.rmDTSecretManager = rmDTSecretManager;
this.reservationSystem = rmContext.getReservationSystem();
this.clock = clock;
this.rValidator = new ReservationInputValidator(clock);
} @Override
protected void serviceInit(Configuration conf) throws Exception {
clientBindAddress = getBindAddress(conf);
super.serviceInit(conf);
} @Override
protected void serviceStart() throws Exception {
Configuration conf = getConfig();
YarnRPC rpc = YarnRPC.create(conf);
this.server =
rpc.getServer(ApplicationClientProtocol.class, this,
clientBindAddress,
conf, this.rmDTSecretManager,
conf.getInt(YarnConfiguration.RM_CLIENT_THREAD_COUNT,
YarnConfiguration.DEFAULT_RM_CLIENT_THREAD_COUNT)); // Enable service authorization?
if (conf.getBoolean(
CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION,
false)) {
InputStream inputStream =
this.rmContext.getConfigurationProvider()
.getConfigurationInputStream(conf,
YarnConfiguration.HADOOP_POLICY_CONFIGURATION_FILE);
if (inputStream != null) {
conf.addResource(inputStream);
}
refreshServiceAcls(conf, RMPolicyProvider.getInstance());
} this.server.start();
clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
YarnConfiguration.RM_ADDRESS,
YarnConfiguration.DEFAULT_RM_ADDRESS,
server.getListenerAddress());
super.serviceStart();
} @Override
protected void serviceStop() throws Exception {
if (this.server != null) {
this.server.stop();
}
super.serviceStop();
} ApplicationId getNewApplicationId() {
ApplicationId applicationId = org.apache.hadoop.yarn.server.utils.BuilderUtils
.newApplicationId(recordFactory, ResourceManager.getClusterTimeStamp(),
applicationCounter.incrementAndGet());
LOG.info("Allocated new applicationId: " + applicationId.getId());
return applicationId;
} @Override
public GetNewApplicationResponse getNewApplication(
GetNewApplicationRequest request) throws YarnException {
GetNewApplicationResponse response = recordFactory
.newRecordInstance(GetNewApplicationResponse.class);
response.setApplicationId(getNewApplicationId());
// Pick up min/max resource from scheduler...
response.setMaximumResourceCapability(scheduler
.getMaximumResourceCapability()); return response;
} @Override
public SubmitApplicationResponse submitApplication(
SubmitApplicationRequest request) throws YarnException {
ApplicationSubmissionContext submissionContext = request
.getApplicationSubmissionContext();
ApplicationId applicationId = submissionContext.getApplicationId(); // ApplicationSubmissionContext needs to be validated for safety - only
// those fields that are independent of the RM's configuration will be
// checked here, those that are dependent on RM configuration are validated
// in RMAppManager. //开始各种验证 一不开心就不让干活
String user = null;
try {
// Safety
user = UserGroupInformation.getCurrentUser().getShortUserName();
} catch (IOException ie) {
LOG.warn("Unable to get the current user.", ie);
RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST,
ie.getMessage(), "ClientRMService",
"Exception in submitting application", applicationId);
throw RPCUtil.getRemoteException(ie);
} //各种验证
// Check whether app has already been put into rmContext,
// If it is, simply return the response
if (rmContext.getRMApps().get(applicationId) != null) {
LOG.info("This is an earlier submitted application: " + applicationId);
return SubmitApplicationResponse.newInstance();
} //继续验证
if (submissionContext.getQueue() == null) {
submissionContext.setQueue(YarnConfiguration.DEFAULT_QUEUE_NAME);
}
if (submissionContext.getApplicationName() == null) {
submissionContext.setApplicationName(
YarnConfiguration.DEFAULT_APPLICATION_NAME);
}
if (submissionContext.getApplicationType() == null) {
submissionContext
.setApplicationType(YarnConfiguration.DEFAULT_APPLICATION_TYPE);
} else {
if (submissionContext.getApplicationType().length() > YarnConfiguration.APPLICATION_TYPE_LENGTH) {
submissionContext.setApplicationType(submissionContext
.getApplicationType().substring(0,
YarnConfiguration.APPLICATION_TYPE_LENGTH));
}
} try {
// call RMAppManager to submit application directly
//干活 通过rmAppManager提交
rmAppManager.submitApplication(submissionContext,
System.currentTimeMillis(), user); LOG.info("Application with id " + applicationId.getId() +
" submitted by user " + user);
RMAuditLogger.logSuccess(user, AuditConstants.SUBMIT_APP_REQUEST,
"ClientRMService", applicationId);
} catch (YarnException e) {
LOG.info("Exception in submitting application with id " +
applicationId.getId(), e);
RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST,
e.getMessage(), "ClientRMService",
"Exception in submitting application", applicationId);
throw e;
} SubmitApplicationResponse response = recordFactory
.newRecordInstance(SubmitApplicationResponse.class);
return response;
} }
ClientRMService.java
ClientRMService.java中的submitApplication(...)方法---->调用RMAppManager.java的submitApplication(...)方法
/**
* This class manages the list of applications for the resource manager.
*/
public class RMAppManager implements EventHandler<RMAppManagerEvent>,
Recoverable { private static final Log LOG = LogFactory.getLog(RMAppManager.class); private int maxCompletedAppsInMemory;
private int maxCompletedAppsInStateStore;
protected int completedAppsInStateStore = 0;
private LinkedList<ApplicationId> completedApps = new LinkedList<ApplicationId>(); private final RMContext rmContext;
private final ApplicationMasterService masterService;
private final YarnScheduler scheduler;
private final ApplicationACLsManager applicationACLsManager;
private Configuration conf; public RMAppManager(RMContext context,
YarnScheduler scheduler, ApplicationMasterService masterService,
ApplicationACLsManager applicationACLsManager, Configuration conf) {
this.rmContext = context;
this.scheduler = scheduler;
this.masterService = masterService;
this.applicationACLsManager = applicationACLsManager;
this.conf = conf;
this.maxCompletedAppsInMemory = conf.getInt(
YarnConfiguration.RM_MAX_COMPLETED_APPLICATIONS,
YarnConfiguration.DEFAULT_RM_MAX_COMPLETED_APPLICATIONS);
this.maxCompletedAppsInStateStore =
conf.getInt(
YarnConfiguration.RM_STATE_STORE_MAX_COMPLETED_APPLICATIONS,
YarnConfiguration.DEFAULT_RM_STATE_STORE_MAX_COMPLETED_APPLICATIONS);
if (this.maxCompletedAppsInStateStore > this.maxCompletedAppsInMemory) {
this.maxCompletedAppsInStateStore = this.maxCompletedAppsInMemory;
}
} @SuppressWarnings("unchecked")
protected void submitApplication(
ApplicationSubmissionContext submissionContext, long submitTime,
String user) throws YarnException {
ApplicationId applicationId = submissionContext.getApplicationId(); //创建一个RMAppImpl对象 其实就是启动RMApp状态机 以及执行RMAppEvent
RMAppImpl application =
createAndPopulateNewRMApp(submissionContext, submitTime, user, false);
ApplicationId appId = submissionContext.getApplicationId(); //如果有安全认证enable的话会走这里, 比如kerberos啥的 我就不这么麻烦了 以看懂为主, 直接到else
if (UserGroupInformation.isSecurityEnabled()) {
try {
this.rmContext.getDelegationTokenRenewer().addApplicationAsync(appId,
parseCredentials(submissionContext),
submissionContext.getCancelTokensWhenComplete(),
application.getUser());
} catch (Exception e) {
LOG.warn("Unable to parse credentials.", e);
// Sending APP_REJECTED is fine, since we assume that the
// RMApp is in NEW state and thus we haven't yet informed the
// scheduler about the existence of the application
assert application.getState() == RMAppState.NEW;
this.rmContext.getDispatcher().getEventHandler()
.handle(new RMAppEvent(applicationId,
RMAppEventType.APP_REJECTED, e.getMessage()));
throw RPCUtil.getRemoteException(e);
}
} else {
// Dispatcher is not yet started at this time, so these START events
// enqueued should be guaranteed to be first processed when dispatcher
// gets started.
//启动RMApp的状态机, 这里rmContext其实是resourceManager的Client代理,
//这一步就是让去RM端的dispatcher去处理RMAppEventType.START事件
this.rmContext.getDispatcher().getEventHandler()
.handle(new RMAppEvent(applicationId, RMAppEventType.START));
}
} private RMAppImpl createAndPopulateNewRMApp(
ApplicationSubmissionContext submissionContext, long submitTime,
String user, boolean isRecovery) throws YarnException {
ApplicationId applicationId = submissionContext.getApplicationId();
ResourceRequest amReq =
validateAndCreateResourceRequest(submissionContext, isRecovery); // Create RMApp
//创建 RMApp
RMAppImpl application =
new RMAppImpl(applicationId, rmContext, this.conf,
submissionContext.getApplicationName(), user,
submissionContext.getQueue(),
submissionContext, this.scheduler, this.masterService,
submitTime, submissionContext.getApplicationType(),
submissionContext.getApplicationTags(), amReq); // Concurrent app submissions with same applicationId will fail here
// Concurrent app submissions with different applicationIds will not
// influence each other
if (rmContext.getRMApps().putIfAbsent(applicationId, application) !=
null) {
String message = "Application with id " + applicationId
+ " is already present! Cannot add a duplicate!";
LOG.warn(message);
throw new YarnException(message);
}
// Inform the ACLs Manager
this.applicationACLsManager.addApplication(applicationId,
submissionContext.getAMContainerSpec().getApplicationACLs());
String appViewACLs = submissionContext.getAMContainerSpec()
.getApplicationACLs().get(ApplicationAccessType.VIEW_APP);
rmContext.getSystemMetricsPublisher().appACLsUpdated(
application, appViewACLs, System.currentTimeMillis());
return application;
} }
RMAppManager.java
RMAppManager.java 在 hadoop-2.7.3-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
难点3 RMAppManager之后,跟踪就出现了瓶颈,看了很多资料后
实际上RMAPPManager类的submitApplication()方法内部会调用该类的createAndPopulateNewRMApp()方法,该方法构建一个app(其实RMApp)并放入applicationACLS 。submitApplication()方法最后还会调用this.rmContext.getDispatcher().getEventHandler().handle(new RMAppEvent(applicationId, RMAppEventType.START)),触发app启动事件,往异步处理器增加个RMAppEvent事件,类型枚值RMAppEventType.START,在RM内部会注册该类型的事件会用什么处理器来处理。其中this.rmContext=RMContextImpl this.rmContext.getDispatcher()=AsyncDispatcher this.rmContext.getDispatcher().getEventHandler()=AsyncDispatcher$GenericEventHandler。
我们先来看一下createAndPopulateNewRMApp()方法,在RMAppManager类中,它调用了RMAppImpl构造函数,来创建RMApp。即会调用RMAppImpl.java, 包括全部的。
@SuppressWarnings({ "rawtypes", "unchecked" })
public class RMAppImpl implements RMApp, Recoverable { private static final Log LOG = LogFactory.getLog(RMAppImpl.class);
private static final String UNAVAILABLE = "N/A"; // Immutable fields
private final ApplicationId applicationId;
private final RMContext rmContext;
private final Configuration conf;
private final String user;
private final String name;
private final ApplicationSubmissionContext submissionContext;
private final Dispatcher dispatcher;
private final YarnScheduler scheduler;
private final ApplicationMasterService masterService;
private final StringBuilder diagnostics = new StringBuilder();
private final int maxAppAttempts;
private final ReadLock readLock;
private final WriteLock writeLock;
private final Map<ApplicationAttemptId, RMAppAttempt> attempts
= new LinkedHashMap<ApplicationAttemptId, RMAppAttempt>();
private final long submitTime;
private final Set<RMNode> updatedNodes = new HashSet<RMNode>();
private final String applicationType;
private final Set<String> applicationTags; private final long attemptFailuresValidityInterval; private Clock systemClock; private boolean isNumAttemptsBeyondThreshold = false; // Mutable fields
private long startTime;
private long finishTime = 0;
private long storedFinishTime = 0;
// This field isn't protected by readlock now.
private volatile RMAppAttempt currentAttempt;
private String queue;
private EventHandler handler;
private static final AppFinishedTransition FINISHED_TRANSITION =
new AppFinishedTransition();
private Set<NodeId> ranNodes = new ConcurrentSkipListSet<NodeId>(); // These states stored are only valid when app is at killing or final_saving.
private RMAppState stateBeforeKilling;
private RMAppState stateBeforeFinalSaving;
private RMAppEvent eventCausingFinalSaving;
private RMAppState targetedFinalState;
private RMAppState recoveredFinalState;
private ResourceRequest amReq; Object transitionTodo; private static final StateMachineFactory<RMAppImpl,
RMAppState,
RMAppEventType,
RMAppEvent> stateMachineFactory
= new StateMachineFactory<RMAppImpl,
RMAppState,
RMAppEventType,
RMAppEvent>(RMAppState.NEW) // Transitions from NEW state
.addTransition(RMAppState.NEW, RMAppState.NEW,
RMAppEventType.NODE_UPDATE, new RMAppNodeUpdateTransition())
.addTransition(RMAppState.NEW, RMAppState.NEW_SAVING,
RMAppEventType.START, new RMAppNewlySavingTransition())
.addTransition(RMAppState.NEW, EnumSet.of(RMAppState.SUBMITTED,
RMAppState.ACCEPTED, RMAppState.FINISHED, RMAppState.FAILED,
RMAppState.KILLED, RMAppState.FINAL_SAVING),
RMAppEventType.RECOVER, new RMAppRecoveredTransition())
.addTransition(RMAppState.NEW, RMAppState.KILLED, RMAppEventType.KILL,
new AppKilledTransition())
.addTransition(RMAppState.NEW, RMAppState.FINAL_SAVING,
RMAppEventType.APP_REJECTED,
new FinalSavingTransition(new AppRejectedTransition(),
RMAppState.FAILED)) // Transitions from NEW_SAVING state
.addTransition(RMAppState.NEW_SAVING, RMAppState.NEW_SAVING,
RMAppEventType.NODE_UPDATE, new RMAppNodeUpdateTransition())
.addTransition(RMAppState.NEW_SAVING, RMAppState.SUBMITTED,
RMAppEventType.APP_NEW_SAVED, new AddApplicationToSchedulerTransition())
.addTransition(RMAppState.NEW_SAVING, RMAppState.FINAL_SAVING,
RMAppEventType.KILL,
new FinalSavingTransition(
new AppKilledTransition(), RMAppState.KILLED))
.addTransition(RMAppState.NEW_SAVING, RMAppState.FINAL_SAVING,
RMAppEventType.APP_REJECTED,
new FinalSavingTransition(new AppRejectedTransition(),
RMAppState.FAILED))
.addTransition(RMAppState.NEW_SAVING, RMAppState.NEW_SAVING,
RMAppEventType.MOVE, new RMAppMoveTransition()) // Transitions from SUBMITTED state
.addTransition(RMAppState.SUBMITTED, RMAppState.SUBMITTED,
RMAppEventType.NODE_UPDATE, new RMAppNodeUpdateTransition())
.addTransition(RMAppState.SUBMITTED, RMAppState.SUBMITTED,
RMAppEventType.MOVE, new RMAppMoveTransition())
.addTransition(RMAppState.SUBMITTED, RMAppState.FINAL_SAVING,
RMAppEventType.APP_REJECTED,
new FinalSavingTransition(
new AppRejectedTransition(), RMAppState.FAILED))
.addTransition(RMAppState.SUBMITTED, RMAppState.ACCEPTED,
RMAppEventType.APP_ACCEPTED, new StartAppAttemptTransition())
.addTransition(RMAppState.SUBMITTED, RMAppState.FINAL_SAVING,
RMAppEventType.KILL,
new FinalSavingTransition(
new AppKilledTransition(), RMAppState.KILLED)) // Transitions from ACCEPTED state
.addTransition(RMAppState.ACCEPTED, RMAppState.ACCEPTED,
RMAppEventType.NODE_UPDATE, new RMAppNodeUpdateTransition())
.addTransition(RMAppState.ACCEPTED, RMAppState.ACCEPTED,
RMAppEventType.MOVE, new RMAppMoveTransition())
.addTransition(RMAppState.ACCEPTED, RMAppState.RUNNING,
RMAppEventType.ATTEMPT_REGISTERED)
.addTransition(RMAppState.ACCEPTED,
EnumSet.of(RMAppState.ACCEPTED, RMAppState.FINAL_SAVING),
// ACCEPTED state is possible to receive ATTEMPT_FAILED/ATTEMPT_FINISHED
// event because RMAppRecoveredTransition is returning ACCEPTED state
// directly and waiting for the previous AM to exit.
RMAppEventType.ATTEMPT_FAILED,
new AttemptFailedTransition(RMAppState.ACCEPTED))
.addTransition(RMAppState.ACCEPTED, RMAppState.FINAL_SAVING,
RMAppEventType.ATTEMPT_FINISHED,
new FinalSavingTransition(FINISHED_TRANSITION, RMAppState.FINISHED))
.addTransition(RMAppState.ACCEPTED, RMAppState.KILLING,
RMAppEventType.KILL, new KillAttemptTransition())
.addTransition(RMAppState.ACCEPTED, RMAppState.FINAL_SAVING,
RMAppEventType.ATTEMPT_KILLED,
new FinalSavingTransition(new AppKilledTransition(), RMAppState.KILLED))
.addTransition(RMAppState.ACCEPTED, RMAppState.ACCEPTED,
RMAppEventType.APP_RUNNING_ON_NODE,
new AppRunningOnNodeTransition()) // Transitions from RUNNING state
.addTransition(RMAppState.RUNNING, RMAppState.RUNNING,
RMAppEventType.NODE_UPDATE, new RMAppNodeUpdateTransition())
.addTransition(RMAppState.RUNNING, RMAppState.RUNNING,
RMAppEventType.MOVE, new RMAppMoveTransition())
.addTransition(RMAppState.RUNNING, RMAppState.FINAL_SAVING,
RMAppEventType.ATTEMPT_UNREGISTERED,
new FinalSavingTransition(
new AttemptUnregisteredTransition(),
RMAppState.FINISHING, RMAppState.FINISHED))
.addTransition(RMAppState.RUNNING, RMAppState.FINISHED,
// UnManagedAM directly jumps to finished
RMAppEventType.ATTEMPT_FINISHED, FINISHED_TRANSITION)
.addTransition(RMAppState.RUNNING, RMAppState.RUNNING,
RMAppEventType.APP_RUNNING_ON_NODE,
new AppRunningOnNodeTransition())
.addTransition(RMAppState.RUNNING,
EnumSet.of(RMAppState.ACCEPTED, RMAppState.FINAL_SAVING),
RMAppEventType.ATTEMPT_FAILED,
new AttemptFailedTransition(RMAppState.ACCEPTED))
.addTransition(RMAppState.RUNNING, RMAppState.KILLING,
RMAppEventType.KILL, new KillAttemptTransition()) // Transitions from FINAL_SAVING state
.addTransition(RMAppState.FINAL_SAVING,
EnumSet.of(RMAppState.FINISHING, RMAppState.FAILED,
RMAppState.KILLED, RMAppState.FINISHED), RMAppEventType.APP_UPDATE_SAVED,
new FinalStateSavedTransition())
.addTransition(RMAppState.FINAL_SAVING, RMAppState.FINAL_SAVING,
RMAppEventType.ATTEMPT_FINISHED,
new AttemptFinishedAtFinalSavingTransition())
.addTransition(RMAppState.FINAL_SAVING, RMAppState.FINAL_SAVING,
RMAppEventType.APP_RUNNING_ON_NODE,
new AppRunningOnNodeTransition())
// ignorable transitions
.addTransition(RMAppState.FINAL_SAVING, RMAppState.FINAL_SAVING,
EnumSet.of(RMAppEventType.NODE_UPDATE, RMAppEventType.KILL,
RMAppEventType.APP_NEW_SAVED, RMAppEventType.MOVE)) // Transitions from FINISHING state
.addTransition(RMAppState.FINISHING, RMAppState.FINISHED,
RMAppEventType.ATTEMPT_FINISHED, FINISHED_TRANSITION)
.addTransition(RMAppState.FINISHING, RMAppState.FINISHING,
RMAppEventType.APP_RUNNING_ON_NODE,
new AppRunningOnNodeTransition())
// ignorable transitions
.addTransition(RMAppState.FINISHING, RMAppState.FINISHING,
EnumSet.of(RMAppEventType.NODE_UPDATE,
// ignore Kill/Move as we have already saved the final Finished state
// in state store.
RMAppEventType.KILL, RMAppEventType.MOVE)) // Transitions from KILLING state
.addTransition(RMAppState.KILLING, RMAppState.KILLING,
RMAppEventType.APP_RUNNING_ON_NODE,
new AppRunningOnNodeTransition())
.addTransition(RMAppState.KILLING, RMAppState.FINAL_SAVING,
RMAppEventType.ATTEMPT_KILLED,
new FinalSavingTransition(
new AppKilledTransition(), RMAppState.KILLED))
.addTransition(RMAppState.KILLING, RMAppState.FINAL_SAVING,
RMAppEventType.ATTEMPT_UNREGISTERED,
new FinalSavingTransition(
new AttemptUnregisteredTransition(),
RMAppState.FINISHING, RMAppState.FINISHED))
.addTransition(RMAppState.KILLING, RMAppState.FINISHED,
// UnManagedAM directly jumps to finished
RMAppEventType.ATTEMPT_FINISHED, FINISHED_TRANSITION)
.addTransition(RMAppState.KILLING,
EnumSet.of(RMAppState.FINAL_SAVING),
RMAppEventType.ATTEMPT_FAILED,
new AttemptFailedTransition(RMAppState.KILLING)) .addTransition(RMAppState.KILLING, RMAppState.KILLING,
EnumSet.of(
RMAppEventType.NODE_UPDATE,
RMAppEventType.ATTEMPT_REGISTERED,
RMAppEventType.APP_UPDATE_SAVED,
RMAppEventType.KILL, RMAppEventType.MOVE)) // Transitions from FINISHED state
// ignorable transitions
.addTransition(RMAppState.FINISHED, RMAppState.FINISHED,
RMAppEventType.APP_RUNNING_ON_NODE,
new AppRunningOnNodeTransition())
.addTransition(RMAppState.FINISHED, RMAppState.FINISHED,
EnumSet.of(
RMAppEventType.NODE_UPDATE,
RMAppEventType.ATTEMPT_UNREGISTERED,
RMAppEventType.ATTEMPT_FINISHED,
RMAppEventType.KILL, RMAppEventType.MOVE)) // Transitions from FAILED state
// ignorable transitions
.addTransition(RMAppState.FAILED, RMAppState.FAILED,
RMAppEventType.APP_RUNNING_ON_NODE,
new AppRunningOnNodeTransition())
.addTransition(RMAppState.FAILED, RMAppState.FAILED,
EnumSet.of(RMAppEventType.KILL, RMAppEventType.NODE_UPDATE,
RMAppEventType.MOVE)) // Transitions from KILLED state
// ignorable transitions
.addTransition(RMAppState.KILLED, RMAppState.KILLED,
RMAppEventType.APP_RUNNING_ON_NODE,
new AppRunningOnNodeTransition())
.addTransition(
RMAppState.KILLED,
RMAppState.KILLED,
EnumSet.of(RMAppEventType.APP_ACCEPTED,
RMAppEventType.APP_REJECTED, RMAppEventType.KILL,
RMAppEventType.ATTEMPT_FINISHED, RMAppEventType.ATTEMPT_FAILED,
RMAppEventType.NODE_UPDATE, RMAppEventType.MOVE)) .installTopology(); private final StateMachine<RMAppState, RMAppEventType, RMAppEvent>
stateMachine; private static final int DUMMY_APPLICATION_ATTEMPT_NUMBER = -1; public RMAppImpl(ApplicationId applicationId, RMContext rmContext,
Configuration config, String name, String user, String queue,
ApplicationSubmissionContext submissionContext, YarnScheduler scheduler,
ApplicationMasterService masterService, long submitTime,
String applicationType, Set<String> applicationTags,
ResourceRequest amReq) { this.systemClock = new SystemClock(); this.applicationId = applicationId;
this.name = name;
this.rmContext = rmContext;
this.dispatcher = rmContext.getDispatcher();
this.handler = dispatcher.getEventHandler();
this.conf = config;
this.user = user;
this.queue = queue;
this.submissionContext = submissionContext;
this.scheduler = scheduler;
this.masterService = masterService;
this.submitTime = submitTime;
this.startTime = this.systemClock.getTime();
this.applicationType = applicationType;
this.applicationTags = applicationTags;
this.amReq = amReq; int globalMaxAppAttempts = conf.getInt(YarnConfiguration.RM_AM_MAX_ATTEMPTS,
YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS);
int individualMaxAppAttempts = submissionContext.getMaxAppAttempts();
if (individualMaxAppAttempts <= 0 ||
individualMaxAppAttempts > globalMaxAppAttempts) {
this.maxAppAttempts = globalMaxAppAttempts;
LOG.warn("The specific max attempts: " + individualMaxAppAttempts
+ " for application: " + applicationId.getId()
+ " is invalid, because it is out of the range [1, "
+ globalMaxAppAttempts + "]. Use the global max attempts instead.");
} else {
this.maxAppAttempts = individualMaxAppAttempts;
} this.attemptFailuresValidityInterval =
submissionContext.getAttemptFailuresValidityInterval();
if (this.attemptFailuresValidityInterval > 0) {
LOG.info("The attemptFailuresValidityInterval for the application: "
+ this.applicationId + " is " + this.attemptFailuresValidityInterval
+ ".");
} ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
this.readLock = lock.readLock();
this.writeLock = lock.writeLock(); this.stateMachine = stateMachineFactory.make(this); rmContext.getRMApplicationHistoryWriter().applicationStarted(this);
rmContext.getSystemMetricsPublisher().appCreated(this, startTime);
} @Override
public ApplicationId getApplicationId() {
return this.applicationId;
} @Override
public ApplicationSubmissionContext getApplicationSubmissionContext() {
return this.submissionContext;
} @Override
public FinalApplicationStatus getFinalApplicationStatus() {
// finish state is obtained based on the state machine's current state
// as a fall-back in case the application has not been unregistered
// ( or if the app never unregistered itself )
// when the report is requested
if (currentAttempt != null
&& currentAttempt.getFinalApplicationStatus() != null) {
return currentAttempt.getFinalApplicationStatus();
}
return createFinalApplicationStatus(this.stateMachine.getCurrentState());
} @Override
public RMAppState getState() {
this.readLock.lock();
try {
return this.stateMachine.getCurrentState();
} finally {
this.readLock.unlock();
}
} @Override
public String getUser() {
return this.user;
} @Override
public float getProgress() {
RMAppAttempt attempt = this.currentAttempt;
if (attempt != null) {
return attempt.getProgress();
}
return 0;
} @Override
public RMAppAttempt getRMAppAttempt(ApplicationAttemptId appAttemptId) {
this.readLock.lock(); try {
return this.attempts.get(appAttemptId);
} finally {
this.readLock.unlock();
}
} @Override
public String getQueue() {
return this.queue;
} @Override
public void setQueue(String queue) {
this.queue = queue;
} @Override
public String getName() {
return this.name;
} @Override
public RMAppAttempt getCurrentAppAttempt() {
return this.currentAttempt;
} @Override
public Map<ApplicationAttemptId, RMAppAttempt> getAppAttempts() {
this.readLock.lock(); try {
return Collections.unmodifiableMap(this.attempts);
} finally {
this.readLock.unlock();
}
} private FinalApplicationStatus createFinalApplicationStatus(RMAppState state) {
switch(state) {
case NEW:
case NEW_SAVING:
case SUBMITTED:
case ACCEPTED:
case RUNNING:
case FINAL_SAVING:
case KILLING:
return FinalApplicationStatus.UNDEFINED;
// finished without a proper final state is the same as failed
case FINISHING:
case FINISHED:
case FAILED:
return FinalApplicationStatus.FAILED;
case KILLED:
return FinalApplicationStatus.KILLED;
}
throw new YarnRuntimeException("Unknown state passed!");
} @Override
public int pullRMNodeUpdates(Collection<RMNode> updatedNodes) {
this.writeLock.lock();
try {
int updatedNodeCount = this.updatedNodes.size();
updatedNodes.addAll(this.updatedNodes);
this.updatedNodes.clear();
return updatedNodeCount;
} finally {
this.writeLock.unlock();
}
} @Override
public ApplicationReport createAndGetApplicationReport(String clientUserName,
boolean allowAccess) {
this.readLock.lock(); try {
ApplicationAttemptId currentApplicationAttemptId = null;
org.apache.hadoop.yarn.api.records.Token clientToAMToken = null;
String trackingUrl = UNAVAILABLE;
String host = UNAVAILABLE;
String origTrackingUrl = UNAVAILABLE;
int rpcPort = -1;
ApplicationResourceUsageReport appUsageReport =
RMServerUtils.DUMMY_APPLICATION_RESOURCE_USAGE_REPORT;
FinalApplicationStatus finishState = getFinalApplicationStatus();
String diags = UNAVAILABLE;
float progress = 0.0f;
org.apache.hadoop.yarn.api.records.Token amrmToken = null;
if (allowAccess) {
trackingUrl = getDefaultProxyTrackingUrl();
if (this.currentAttempt != null) {
currentApplicationAttemptId = this.currentAttempt.getAppAttemptId();
trackingUrl = this.currentAttempt.getTrackingUrl();
origTrackingUrl = this.currentAttempt.getOriginalTrackingUrl();
if (UserGroupInformation.isSecurityEnabled()) {
// get a token so the client can communicate with the app attempt
// NOTE: token may be unavailable if the attempt is not running
Token<ClientToAMTokenIdentifier> attemptClientToAMToken =
this.currentAttempt.createClientToken(clientUserName);
if (attemptClientToAMToken != null) {
clientToAMToken = BuilderUtils.newClientToAMToken(
attemptClientToAMToken.getIdentifier(),
attemptClientToAMToken.getKind().toString(),
attemptClientToAMToken.getPassword(),
attemptClientToAMToken.getService().toString());
}
}
host = this.currentAttempt.getHost();
rpcPort = this.currentAttempt.getRpcPort();
appUsageReport = currentAttempt.getApplicationResourceUsageReport();
progress = currentAttempt.getProgress();
}
diags = this.diagnostics.toString(); if (currentAttempt != null &&
currentAttempt.getAppAttemptState() == RMAppAttemptState.LAUNCHED) {
if (getApplicationSubmissionContext().getUnmanagedAM() &&
clientUserName != null && getUser().equals(clientUserName)) {
Token<AMRMTokenIdentifier> token = currentAttempt.getAMRMToken();
if (token != null) {
amrmToken = BuilderUtils.newAMRMToken(token.getIdentifier(),
token.getKind().toString(), token.getPassword(),
token.getService().toString());
}
}
} RMAppMetrics rmAppMetrics = getRMAppMetrics();
appUsageReport.setMemorySeconds(rmAppMetrics.getMemorySeconds());
appUsageReport.setVcoreSeconds(rmAppMetrics.getVcoreSeconds());
} if (currentApplicationAttemptId == null) {
currentApplicationAttemptId =
BuilderUtils.newApplicationAttemptId(this.applicationId,
DUMMY_APPLICATION_ATTEMPT_NUMBER);
} return BuilderUtils.newApplicationReport(this.applicationId,
currentApplicationAttemptId, this.user, this.queue,
this.name, host, rpcPort, clientToAMToken,
createApplicationState(), diags,
trackingUrl, this.startTime, this.finishTime, finishState,
appUsageReport, origTrackingUrl, progress, this.applicationType,
amrmToken, applicationTags);
} finally {
this.readLock.unlock();
}
} private String getDefaultProxyTrackingUrl() {
try {
final String scheme = WebAppUtils.getHttpSchemePrefix(conf);
String proxy = WebAppUtils.getProxyHostAndPort(conf);
URI proxyUri = ProxyUriUtils.getUriFromAMUrl(scheme, proxy);
URI result = ProxyUriUtils.getProxyUri(null, proxyUri, applicationId);
return result.toASCIIString();
} catch (URISyntaxException e) {
LOG.warn("Could not generate default proxy tracking URL for "
+ applicationId);
return UNAVAILABLE;
}
} @Override
public long getFinishTime() {
this.readLock.lock(); try {
return this.finishTime;
} finally {
this.readLock.unlock();
}
} @Override
public long getStartTime() {
this.readLock.lock(); try {
return this.startTime;
} finally {
this.readLock.unlock();
}
} @Override
public long getSubmitTime() {
return this.submitTime;
} @Override
public String getTrackingUrl() {
RMAppAttempt attempt = this.currentAttempt;
if (attempt != null) {
return attempt.getTrackingUrl();
}
return null;
} @Override
public String getOriginalTrackingUrl() {
RMAppAttempt attempt = this.currentAttempt;
if (attempt != null) {
return attempt.getOriginalTrackingUrl();
}
return null;
} @Override
public StringBuilder getDiagnostics() {
this.readLock.lock(); try {
return this.diagnostics;
} finally {
this.readLock.unlock();
}
} @Override
public int getMaxAppAttempts() {
return this.maxAppAttempts;
} @Override
public void handle(RMAppEvent event) { this.writeLock.lock(); try {
ApplicationId appID = event.getApplicationId();
LOG.debug("Processing event for " + appID + " of type "
+ event.getType());
final RMAppState oldState = getState();
try {
/* keep the master in sync with the state machine */
this.stateMachine.doTransition(event.getType(), event);
} catch (InvalidStateTransitonException e) {
LOG.error("Can't handle this event at current state", e);
/* TODO fail the application on the failed transition */
} if (oldState != getState()) {
LOG.info(appID + " State change from " + oldState + " to "
+ getState());
}
} finally {
this.writeLock.unlock();
}
} @Override
public void recover(RMState state) {
ApplicationStateData appState =
state.getApplicationState().get(getApplicationId());
this.recoveredFinalState = appState.getState();
LOG.info("Recovering app: " + getApplicationId() + " with " +
+ appState.getAttemptCount() + " attempts and final state = "
+ this.recoveredFinalState );
this.diagnostics.append(appState.getDiagnostics());
this.storedFinishTime = appState.getFinishTime();
this.startTime = appState.getStartTime(); for(int i=0; i<appState.getAttemptCount(); ++i) {
// create attempt
createNewAttempt();
((RMAppAttemptImpl)this.currentAttempt).recover(state);
}
} private void createNewAttempt() {
ApplicationAttemptId appAttemptId =
ApplicationAttemptId.newInstance(applicationId, attempts.size() + 1);
RMAppAttempt attempt =
new RMAppAttemptImpl(appAttemptId, rmContext, scheduler, masterService,
submissionContext, conf,
// The newly created attempt maybe last attempt if (number of
// previously failed attempts(which should not include Preempted,
// hardware error and NM resync) + 1) equal to the max-attempt
// limit.
maxAppAttempts == (getNumFailedAppAttempts() + 1), amReq);
attempts.put(appAttemptId, attempt);
currentAttempt = attempt;
} private void
createAndStartNewAttempt(boolean transferStateFromPreviousAttempt) {
createNewAttempt();
handler.handle(new RMAppStartAttemptEvent(currentAttempt.getAppAttemptId(),
transferStateFromPreviousAttempt));
} private void processNodeUpdate(RMAppNodeUpdateType type, RMNode node) {
NodeState nodeState = node.getState();
updatedNodes.add(node);
LOG.debug("Received node update event:" + type + " for node:" + node
+ " with state:" + nodeState);
} private static class RMAppTransition implements
SingleArcTransition<RMAppImpl, RMAppEvent> {
public void transition(RMAppImpl app, RMAppEvent event) {
}; } private static final class RMAppNodeUpdateTransition extends RMAppTransition {
public void transition(RMAppImpl app, RMAppEvent event) {
RMAppNodeUpdateEvent nodeUpdateEvent = (RMAppNodeUpdateEvent) event;
app.processNodeUpdate(nodeUpdateEvent.getUpdateType(),
nodeUpdateEvent.getNode());
};
} private static final class AppRunningOnNodeTransition extends RMAppTransition {
public void transition(RMAppImpl app, RMAppEvent event) {
RMAppRunningOnNodeEvent nodeAddedEvent = (RMAppRunningOnNodeEvent) event; // if final state already stored, notify RMNode
if (isAppInFinalState(app)) {
app.handler.handle(
new RMNodeCleanAppEvent(nodeAddedEvent.getNodeId(), nodeAddedEvent
.getApplicationId()));
return;
} // otherwise, add it to ranNodes for further process
app.ranNodes.add(nodeAddedEvent.getNodeId());
};
} /**
* Move an app to a new queue.
* This transition must set the result on the Future in the RMAppMoveEvent,
* either as an exception for failure or null for success, or the client will
* be left waiting forever.
*/
private static final class RMAppMoveTransition extends RMAppTransition {
public void transition(RMAppImpl app, RMAppEvent event) {
RMAppMoveEvent moveEvent = (RMAppMoveEvent) event;
try {
app.queue = app.scheduler.moveApplication(app.applicationId,
moveEvent.getTargetQueue());
} catch (YarnException ex) {
moveEvent.getResult().setException(ex);
return;
} // TODO: Write out change to state store (YARN-1558)
// Also take care of RM failover
moveEvent.getResult().set(null);
}
} // synchronously recover attempt to ensure any incoming external events
// to be processed after the attempt processes the recover event.
private void recoverAppAttempts() {
for (RMAppAttempt attempt : getAppAttempts().values()) {
attempt.handle(new RMAppAttemptEvent(attempt.getAppAttemptId(),
RMAppAttemptEventType.RECOVER));
}
} private static final class RMAppRecoveredTransition implements
MultipleArcTransition<RMAppImpl, RMAppEvent, RMAppState> { @Override
public RMAppState transition(RMAppImpl app, RMAppEvent event) { RMAppRecoverEvent recoverEvent = (RMAppRecoverEvent) event;
app.recover(recoverEvent.getRMState());
// The app has completed.
if (app.recoveredFinalState != null) {
app.recoverAppAttempts();
new FinalTransition(app.recoveredFinalState).transition(app, event);
return app.recoveredFinalState;
} if (UserGroupInformation.isSecurityEnabled()) {
// asynchronously renew delegation token on recovery.
try {
app.rmContext.getDelegationTokenRenewer()
.addApplicationAsyncDuringRecovery(app.getApplicationId(),
app.parseCredentials(),
app.submissionContext.getCancelTokensWhenComplete(),
app.getUser());
} catch (Exception e) {
String msg = "Failed to fetch user credentials from application:"
+ e.getMessage();
app.diagnostics.append(msg);
LOG.error(msg, e);
}
} // No existent attempts means the attempt associated with this app was not
// started or started but not yet saved.
if (app.attempts.isEmpty()) {
app.scheduler.handle(new AppAddedSchedulerEvent(app.applicationId,
app.submissionContext.getQueue(), app.user,
app.submissionContext.getReservationID()));
return RMAppState.SUBMITTED;
} // Add application to scheduler synchronously to guarantee scheduler
// knows applications before AM or NM re-registers.
app.scheduler.handle(new AppAddedSchedulerEvent(app.applicationId,
app.submissionContext.getQueue(), app.user, true,
app.submissionContext.getReservationID())); // recover attempts
app.recoverAppAttempts(); // Last attempt is in final state, return ACCEPTED waiting for last
// RMAppAttempt to send finished or failed event back.
if (app.currentAttempt != null
&& (app.currentAttempt.getState() == RMAppAttemptState.KILLED
|| app.currentAttempt.getState() == RMAppAttemptState.FINISHED
|| (app.currentAttempt.getState() == RMAppAttemptState.FAILED
&& app.getNumFailedAppAttempts() == app.maxAppAttempts))) {
return RMAppState.ACCEPTED;
} // YARN-1507 is saving the application state after the application is
// accepted. So after YARN-1507, an app is saved meaning it is accepted.
// Thus we return ACCECPTED state on recovery.
return RMAppState.ACCEPTED;
}
} private static final class AddApplicationToSchedulerTransition extends
RMAppTransition {
@Override
public void transition(RMAppImpl app, RMAppEvent event) {
app.handler.handle(new AppAddedSchedulerEvent(app.applicationId,
app.submissionContext.getQueue(), app.user,
app.submissionContext.getReservationID()));
}
} private static final class StartAppAttemptTransition extends RMAppTransition {
@Override
public void transition(RMAppImpl app, RMAppEvent event) {
app.createAndStartNewAttempt(false);
};
} private static final class FinalStateSavedTransition implements
MultipleArcTransition<RMAppImpl, RMAppEvent, RMAppState> { @Override
public RMAppState transition(RMAppImpl app, RMAppEvent event) {
if (app.transitionTodo instanceof SingleArcTransition) {
((SingleArcTransition) app.transitionTodo).transition(app,
app.eventCausingFinalSaving);
} else if (app.transitionTodo instanceof MultipleArcTransition) {
((MultipleArcTransition) app.transitionTodo).transition(app,
app.eventCausingFinalSaving);
}
return app.targetedFinalState; }
} private static class AttemptFailedFinalStateSavedTransition extends
RMAppTransition {
@Override
public void transition(RMAppImpl app, RMAppEvent event) {
String msg = null;
if (event instanceof RMAppFailedAttemptEvent) {
msg = app.getAppAttemptFailedDiagnostics(event);
}
LOG.info(msg);
app.diagnostics.append(msg);
// Inform the node for app-finish
new FinalTransition(RMAppState.FAILED).transition(app, event);
}
} private String getAppAttemptFailedDiagnostics(RMAppEvent event) {
String msg = null;
RMAppFailedAttemptEvent failedEvent = (RMAppFailedAttemptEvent) event;
if (this.submissionContext.getUnmanagedAM()) {
// RM does not manage the AM. Do not retry
msg = "Unmanaged application " + this.getApplicationId()
+ " failed due to " + failedEvent.getDiagnosticMsg()
+ ". Failing the application.";
} else if (this.isNumAttemptsBeyondThreshold) {
msg = "Application " + this.getApplicationId() + " failed "
+ this.maxAppAttempts + " times due to "
+ failedEvent.getDiagnosticMsg() + ". Failing the application.";
}
return msg;
} private static final class RMAppNewlySavingTransition extends RMAppTransition {
@Override
public void transition(RMAppImpl app, RMAppEvent event) { // If recovery is enabled then store the application information in a
// non-blocking call so make sure that RM has stored the information
// needed to restart the AM after RM restart without further client
// communication
LOG.info("Storing application with id " + app.applicationId);
app.rmContext.getStateStore().storeNewApplication(app);
}
} private void rememberTargetTransitions(RMAppEvent event,
Object transitionToDo, RMAppState targetFinalState) {
transitionTodo = transitionToDo;
targetedFinalState = targetFinalState;
eventCausingFinalSaving = event;
} private void rememberTargetTransitionsAndStoreState(RMAppEvent event,
Object transitionToDo, RMAppState targetFinalState,
RMAppState stateToBeStored) {
rememberTargetTransitions(event, transitionToDo, targetFinalState);
this.stateBeforeFinalSaving = getState();
this.storedFinishTime = this.systemClock.getTime(); LOG.info("Updating application " + this.applicationId
+ " with final state: " + this.targetedFinalState);
// we lost attempt_finished diagnostics in app, because attempt_finished
// diagnostics is sent after app final state is saved. Later on, we will
// create GetApplicationAttemptReport specifically for getting per attempt
// info.
String diags = null;
switch (event.getType()) {
case APP_REJECTED:
case ATTEMPT_FINISHED:
case ATTEMPT_KILLED:
diags = event.getDiagnosticMsg();
break;
case ATTEMPT_FAILED:
RMAppFailedAttemptEvent failedEvent = (RMAppFailedAttemptEvent) event;
diags = getAppAttemptFailedDiagnostics(failedEvent);
break;
default:
break;
}
ApplicationStateData appState =
ApplicationStateData.newInstance(this.submitTime, this.startTime,
this.user, this.submissionContext,
stateToBeStored, diags, this.storedFinishTime);
this.rmContext.getStateStore().updateApplicationState(appState);
} private static final class FinalSavingTransition extends RMAppTransition {
Object transitionToDo;
RMAppState targetedFinalState;
RMAppState stateToBeStored; public FinalSavingTransition(Object transitionToDo,
RMAppState targetedFinalState) {
this(transitionToDo, targetedFinalState, targetedFinalState);
} public FinalSavingTransition(Object transitionToDo,
RMAppState targetedFinalState, RMAppState stateToBeStored) {
this.transitionToDo = transitionToDo;
this.targetedFinalState = targetedFinalState;
this.stateToBeStored = stateToBeStored;
} @Override
public void transition(RMAppImpl app, RMAppEvent event) {
app.rememberTargetTransitionsAndStoreState(event, transitionToDo,
targetedFinalState, stateToBeStored);
}
} private static class AttemptUnregisteredTransition extends RMAppTransition {
@Override
public void transition(RMAppImpl app, RMAppEvent event) {
app.finishTime = app.storedFinishTime;
}
} private static class AppFinishedTransition extends FinalTransition {
public AppFinishedTransition() {
super(RMAppState.FINISHED);
} public void transition(RMAppImpl app, RMAppEvent event) {
app.diagnostics.append(event.getDiagnosticMsg());
super.transition(app, event);
};
} private static class AttemptFinishedAtFinalSavingTransition extends
RMAppTransition {
@Override
public void transition(RMAppImpl app, RMAppEvent event) {
if (app.targetedFinalState.equals(RMAppState.FAILED)
|| app.targetedFinalState.equals(RMAppState.KILLED)) {
// Ignore Attempt_Finished event if we were supposed to reach FAILED
// FINISHED state
return;
} // pass in the earlier attempt_unregistered event, as it is needed in
// AppFinishedFinalStateSavedTransition later on
app.rememberTargetTransitions(event,
new AppFinishedFinalStateSavedTransition(app.eventCausingFinalSaving),
RMAppState.FINISHED);
};
} private static class AppFinishedFinalStateSavedTransition extends
RMAppTransition {
RMAppEvent attemptUnregistered; public AppFinishedFinalStateSavedTransition(RMAppEvent attemptUnregistered) {
this.attemptUnregistered = attemptUnregistered;
}
@Override
public void transition(RMAppImpl app, RMAppEvent event) {
new AttemptUnregisteredTransition().transition(app, attemptUnregistered);
FINISHED_TRANSITION.transition(app, event);
};
} private static class AppKilledTransition extends FinalTransition {
public AppKilledTransition() {
super(RMAppState.KILLED);
} @Override
public void transition(RMAppImpl app, RMAppEvent event) {
app.diagnostics.append(event.getDiagnosticMsg());
super.transition(app, event);
};
} private static class KillAttemptTransition extends RMAppTransition {
@Override
public void transition(RMAppImpl app, RMAppEvent event) {
app.stateBeforeKilling = app.getState();
// Forward app kill diagnostics in the event to kill app attempt.
// These diagnostics will be returned back in ATTEMPT_KILLED event sent by
// RMAppAttemptImpl.
app.handler.handle(
new RMAppAttemptEvent(app.currentAttempt.getAppAttemptId(),
RMAppAttemptEventType.KILL, event.getDiagnosticMsg()));
}
} private static final class AppRejectedTransition extends
FinalTransition{
public AppRejectedTransition() {
super(RMAppState.FAILED);
} public void transition(RMAppImpl app, RMAppEvent event) {
app.diagnostics.append(event.getDiagnosticMsg());
super.transition(app, event);
};
} private static class FinalTransition extends RMAppTransition { private final RMAppState finalState; public FinalTransition(RMAppState finalState) {
this.finalState = finalState;
} public void transition(RMAppImpl app, RMAppEvent event) {
for (NodeId nodeId : app.getRanNodes()) {
app.handler.handle(
new RMNodeCleanAppEvent(nodeId, app.applicationId));
}
app.finishTime = app.storedFinishTime;
if (app.finishTime == 0 ) {
app.finishTime = app.systemClock.getTime();
}
// Recovered apps that are completed were not added to scheduler, so no
// need to remove them from scheduler.
if (app.recoveredFinalState == null) {
app.handler.handle(new AppRemovedSchedulerEvent(app.applicationId,
finalState));
}
app.handler.handle(
new RMAppManagerEvent(app.applicationId,
RMAppManagerEventType.APP_COMPLETED)); app.rmContext.getRMApplicationHistoryWriter()
.applicationFinished(app, finalState);
app.rmContext.getSystemMetricsPublisher()
.appFinished(app, finalState, app.finishTime);
};
} private int getNumFailedAppAttempts() {
int completedAttempts = 0;
long endTime = this.systemClock.getTime();
// Do not count AM preemption, hardware failures or NM resync
// as attempt failure.
for (RMAppAttempt attempt : attempts.values()) {
if (attempt.shouldCountTowardsMaxAttemptRetry()) {
if (this.attemptFailuresValidityInterval <= 0
|| (attempt.getFinishTime() > endTime
- this.attemptFailuresValidityInterval)) {
completedAttempts++;
}
}
}
return completedAttempts;
} private static final class AttemptFailedTransition implements
MultipleArcTransition<RMAppImpl, RMAppEvent, RMAppState> { private final RMAppState initialState; public AttemptFailedTransition(RMAppState initialState) {
this.initialState = initialState;
} @Override
public RMAppState transition(RMAppImpl app, RMAppEvent event) {
int numberOfFailure = app.getNumFailedAppAttempts();
LOG.info("The number of failed attempts"
+ (app.attemptFailuresValidityInterval > 0 ? " in previous "
+ app.attemptFailuresValidityInterval + " milliseconds " : " ")
+ "is " + numberOfFailure + ". The max attempts is "
+ app.maxAppAttempts);
if (!app.submissionContext.getUnmanagedAM()
&& numberOfFailure < app.maxAppAttempts) {
if (initialState.equals(RMAppState.KILLING)) {
// If this is not last attempt, app should be killed instead of
// launching a new attempt
app.rememberTargetTransitionsAndStoreState(event,
new AppKilledTransition(), RMAppState.KILLED, RMAppState.KILLED);
return RMAppState.FINAL_SAVING;
} boolean transferStateFromPreviousAttempt;
RMAppFailedAttemptEvent failedEvent = (RMAppFailedAttemptEvent) event;
transferStateFromPreviousAttempt =
failedEvent.getTransferStateFromPreviousAttempt(); RMAppAttempt oldAttempt = app.currentAttempt;
app.createAndStartNewAttempt(transferStateFromPreviousAttempt);
// Transfer the state from the previous attempt to the current attempt.
// Note that the previous failed attempt may still be collecting the
// container events from the scheduler and update its data structures
// before the new attempt is created. We always transferState for
// finished containers so that they can be acked to NM,
// but when pulling finished container we will check this flag again.
((RMAppAttemptImpl) app.currentAttempt)
.transferStateFromPreviousAttempt(oldAttempt);
return initialState;
} else {
if (numberOfFailure >= app.maxAppAttempts) {
app.isNumAttemptsBeyondThreshold = true;
}
app.rememberTargetTransitionsAndStoreState(event,
new AttemptFailedFinalStateSavedTransition(), RMAppState.FAILED,
RMAppState.FAILED);
return RMAppState.FINAL_SAVING;
}
}
} @Override
public String getApplicationType() {
return this.applicationType;
} @Override
public Set<String> getApplicationTags() {
return this.applicationTags;
} @Override
public boolean isAppFinalStateStored() {
RMAppState state = getState();
return state.equals(RMAppState.FINISHING)
|| state.equals(RMAppState.FINISHED) || state.equals(RMAppState.FAILED)
|| state.equals(RMAppState.KILLED);
} @Override
public YarnApplicationState createApplicationState() {
RMAppState rmAppState = getState();
// If App is in FINAL_SAVING state, return its previous state.
if (rmAppState.equals(RMAppState.FINAL_SAVING)) {
rmAppState = stateBeforeFinalSaving;
}
if (rmAppState.equals(RMAppState.KILLING)) {
rmAppState = stateBeforeKilling;
}
return RMServerUtils.createApplicationState(rmAppState);
} public static boolean isAppInFinalState(RMApp rmApp) {
RMAppState appState = ((RMAppImpl) rmApp).getRecoveredFinalState();
if (appState == null) {
appState = rmApp.getState();
}
return appState == RMAppState.FAILED || appState == RMAppState.FINISHED
|| appState == RMAppState.KILLED;
} public RMAppState getRecoveredFinalState() {
return this.recoveredFinalState;
} @Override
public Set<NodeId> getRanNodes() {
return ranNodes;
} @Override
public RMAppMetrics getRMAppMetrics() {
Resource resourcePreempted = Resource.newInstance(0, 0);
int numAMContainerPreempted = 0;
int numNonAMContainerPreempted = 0;
long memorySeconds = 0;
long vcoreSeconds = 0;
for (RMAppAttempt attempt : attempts.values()) {
if (null != attempt) {
RMAppAttemptMetrics attemptMetrics =
attempt.getRMAppAttemptMetrics();
Resources.addTo(resourcePreempted,
attemptMetrics.getResourcePreempted());
numAMContainerPreempted += attemptMetrics.getIsPreempted() ? 1 : 0;
numNonAMContainerPreempted +=
attemptMetrics.getNumNonAMContainersPreempted();
// getAggregateAppResourceUsage() will calculate resource usage stats
// for both running and finished containers.
AggregateAppResourceUsage resUsage =
attempt.getRMAppAttemptMetrics().getAggregateAppResourceUsage();
memorySeconds += resUsage.getMemorySeconds();
vcoreSeconds += resUsage.getVcoreSeconds();
}
} return new RMAppMetrics(resourcePreempted,
numNonAMContainerPreempted, numAMContainerPreempted,
memorySeconds, vcoreSeconds);
} @Private
@VisibleForTesting
public void setSystemClock(Clock clock) {
this.systemClock = clock;
} @Override
public ReservationId getReservationId() {
return submissionContext.getReservationID();
} @Override
public ResourceRequest getAMResourceRequest() {
return this.amReq;
} protected Credentials parseCredentials() throws IOException {
Credentials credentials = new Credentials();
DataInputByteBuffer dibb = new DataInputByteBuffer();
ByteBuffer tokens = submissionContext.getAMContainerSpec().getTokens();
if (tokens != null) {
dibb.reset(tokens);
credentials.readTokenStorageStream(dibb);
tokens.rewind();
}
return credentials;
}
}
RMAppImpl.java
RMAppImpl.java 在 hadoop-2.7.3-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
接下来看一下 this.rmContext.getDispatcher().getEventHandler().handle(new RMAppEvent(applicationId, RMAppEventType.START)), this.rmContext是RMContext接口类的对象,该接口是ResourceManager的上下文或语境,该接口只有一个实现类RMContextImpl。该类的getDispatcher()方法会返回接口类Dispatcher对象,我们这里选择的是Dispatcher的实现类AsyncDispatcher类,而AsyncDispatcher类的getEventHandler()方法会返回接口类EventHandler对象,其实在AsyncDispatcher类内部已经把EventHandler接口的对象初始化为AsyncDispatcher类的内部类GenericEventHandler类的对象,并调用该内部类的handle()方法来处理,其中RMAppEventType注册到*异步调度器的地方在ResourceManager.java中。handle()方法最后会在队列eventQueue中添加事件event。 handle函数里,最终把event事件放进了队列eventQueue中:eventQueue.put(event);注意这个异步调度器AsyncDispatcher类是公用的。RMAppEventType.START事件放入队列eventQueue中,会被 RMAppImpl 类获取,进入其handle函数。
/**
* Dispatches {@link Event}s in a separate thread. Currently only single thread
* does that. Potentially there could be multiple channels for each event type
* class and a thread pool can be used to dispatch the events.
*/
@SuppressWarnings("rawtypes")
@Public
@Evolving
public class AsyncDispatcher extends AbstractService implements Dispatcher { private static final Log LOG = LogFactory.getLog(AsyncDispatcher.class); private final BlockingQueue<Event> eventQueue;
private volatile int lastEventQueueSizeLogged = 0;
private volatile boolean stopped = false; // Configuration flag for enabling/disabling draining dispatcher's events on
// stop functionality.
private volatile boolean drainEventsOnStop = false; // Indicates all the remaining dispatcher's events on stop have been drained
// and processed.
private volatile boolean drained = true;
private Object waitForDrained = new Object(); // For drainEventsOnStop enabled only, block newly coming events into the
// queue while stopping.
private volatile boolean blockNewEvents = false;
private final EventHandler handlerInstance = new GenericEventHandler(); private Thread eventHandlingThread;
protected final Map<Class<? extends Enum>, EventHandler> eventDispatchers;
private boolean exitOnDispatchException; public AsyncDispatcher() {
this(new LinkedBlockingQueue<Event>());
} public AsyncDispatcher(BlockingQueue<Event> eventQueue) {
super("Dispatcher");
this.eventQueue = eventQueue;
this.eventDispatchers = new HashMap<Class<? extends Enum>, EventHandler>();
} Runnable createThread() {
return new Runnable() {
@Override
public void run() {
while (!stopped && !Thread.currentThread().isInterrupted()) {
drained = eventQueue.isEmpty();
// blockNewEvents is only set when dispatcher is draining to stop,
// adding this check is to avoid the overhead of acquiring the lock
// and calling notify every time in the normal run of the loop.
if (blockNewEvents) {
synchronized (waitForDrained) {
if (drained) {
waitForDrained.notify();
}
}
}
Event event;
try {
event = eventQueue.take();
} catch(InterruptedException ie) {
if (!stopped) {
LOG.warn("AsyncDispatcher thread interrupted", ie);
}
return;
}
if (event != null) {
dispatch(event);
}
}
}
};
} @Override
protected void serviceInit(Configuration conf) throws Exception {
this.exitOnDispatchException =
conf.getBoolean(Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY,
Dispatcher.DEFAULT_DISPATCHER_EXIT_ON_ERROR);
super.serviceInit(conf);
} @Override
protected void serviceStart() throws Exception {
//start all the components
super.serviceStart();
eventHandlingThread = new Thread(createThread());
eventHandlingThread.setName("AsyncDispatcher event handler");
eventHandlingThread.start();
} public void setDrainEventsOnStop() {
drainEventsOnStop = true;
} @Override
protected void serviceStop() throws Exception {
if (drainEventsOnStop) {
blockNewEvents = true;
LOG.info("AsyncDispatcher is draining to stop, igonring any new events.");
long endTime = System.currentTimeMillis() + getConfig()
.getLong(YarnConfiguration.DISPATCHER_DRAIN_EVENTS_TIMEOUT,
YarnConfiguration.DEFAULT_DISPATCHER_DRAIN_EVENTS_TIMEOUT); synchronized (waitForDrained) {
while (!drained && eventHandlingThread != null
&& eventHandlingThread.isAlive()
&& System.currentTimeMillis() < endTime) {
waitForDrained.wait(1000);
LOG.info("Waiting for AsyncDispatcher to drain. Thread state is :" +
eventHandlingThread.getState());
}
}
}
stopped = true;
if (eventHandlingThread != null) {
eventHandlingThread.interrupt();
try {
eventHandlingThread.join();
} catch (InterruptedException ie) {
LOG.warn("Interrupted Exception while stopping", ie);
}
} // stop all the components
super.serviceStop();
} @SuppressWarnings("unchecked")
protected void dispatch(Event event) {
//all events go thru this loop
if (LOG.isDebugEnabled()) {
LOG.debug("Dispatching the event " + event.getClass().getName() + "."
+ event.toString());
} Class<? extends Enum> type = event.getType().getDeclaringClass(); try{
EventHandler handler = eventDispatchers.get(type);
if(handler != null) {
handler.handle(event);
} else {
throw new Exception("No handler for registered for " + type);
}
} catch (Throwable t) {
//TODO Maybe log the state of the queue
LOG.fatal("Error in dispatcher thread", t);
// If serviceStop is called, we should exit this thread gracefully.
if (exitOnDispatchException
&& (ShutdownHookManager.get().isShutdownInProgress()) == false
&& stopped == false) {
Thread shutDownThread = new Thread(createShutDownThread());
shutDownThread.setName("AsyncDispatcher ShutDown handler");
shutDownThread.start();
}
}
} @SuppressWarnings("unchecked")
@Override
public void register(Class<? extends Enum> eventType,
EventHandler handler) {
/* check to see if we have a listener registered */
EventHandler<Event> registeredHandler = (EventHandler<Event>)
eventDispatchers.get(eventType);
LOG.info("Registering " + eventType + " for " + handler.getClass());
if (registeredHandler == null) {
eventDispatchers.put(eventType, handler);
} else if (!(registeredHandler instanceof MultiListenerHandler)){
/* for multiple listeners of an event add the multiple listener handler */
MultiListenerHandler multiHandler = new MultiListenerHandler();
multiHandler.addHandler(registeredHandler);
multiHandler.addHandler(handler);
eventDispatchers.put(eventType, multiHandler);
} else {
/* already a multilistener, just add to it */
MultiListenerHandler multiHandler
= (MultiListenerHandler) registeredHandler;
multiHandler.addHandler(handler);
}
} @Override
public EventHandler getEventHandler() {
return handlerInstance;
} class GenericEventHandler implements EventHandler<Event> {
public void handle(Event event) {
if (blockNewEvents) {
return;
}
drained = false; /* all this method does is enqueue all the events onto the queue */
int qSize = eventQueue.size();
if (qSize != 0 && qSize % 1000 == 0
&& lastEventQueueSizeLogged != qSize) {
lastEventQueueSizeLogged = qSize;
LOG.info("Size of event-queue is " + qSize);
}
int remCapacity = eventQueue.remainingCapacity();
if (remCapacity < 1000) {
LOG.warn("Very low remaining capacity in the event-queue: "
+ remCapacity);
}
try {
eventQueue.put(event);
} catch (InterruptedException e) {
if (!stopped) {
LOG.warn("AsyncDispatcher thread interrupted", e);
}
// Need to reset drained flag to true if event queue is empty,
// otherwise dispatcher will hang on stop.
drained = eventQueue.isEmpty();
throw new YarnRuntimeException(e);
}
};
} /**
* Multiplexing an event. Sending it to different handlers that
* are interested in the event.
* @param <T> the type of event these multiple handlers are interested in.
*/
static class MultiListenerHandler implements EventHandler<Event> {
List<EventHandler<Event>> listofHandlers; public MultiListenerHandler() {
listofHandlers = new ArrayList<EventHandler<Event>>();
} @Override
public void handle(Event event) {
for (EventHandler<Event> handler: listofHandlers) {
handler.handle(event);
}
} void addHandler(EventHandler<Event> handler) {
listofHandlers.add(handler);
} } Runnable createShutDownThread() {
return new Runnable() {
@Override
public void run() {
LOG.info("Exiting, bbye..");
System.exit(-1);
}
};
} @VisibleForTesting
protected boolean isEventThreadWaiting() {
return eventHandlingThread.getState() == Thread.State.WAITING;
} @VisibleForTesting
protected boolean isDrained() {
return this.drained;
}
}
AsyncDispatcher.java
AsyncDispatcher.java 在 hadoop-2.7.3-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java
在GenericEventHandler中handle()方法会通过eventQueue.put(event)往队列中添加数据,即所谓的生产过程。那么,这个GenericEventHandler是如何获得的呢?getEventHandler()方法告诉了我们答案。
关于AsyncDispatcher,可以参考
小结:
关于ApplicationMaster启动流程,可以参考
RMStateStore是存储ResourceManager状态的基础接口,真实的存储器需要实现存储和加载方法。 关于RMStateStore,可以参考
一部分文章在这会说: 在文章的开头有写“事件调度器”, 在ResourceManager那边会有AsyncDispatcher来调度所有事件, 这里的话会通过ApplicationEventDispatcher去做RMAppImpl的transition方法, 看一下RMAppImpl类的初始化的时候的各种event和transition。 介绍的不清楚。
另一部分转而看ResourceManager,因为this.rmContext.getDispatcher().getEventHandler().handle(new RMAppEvent(applicationId, RMAppEventType.START)),触发app启动事件,往异步处理器增加个RMAppEvent事件,类型枚值RMAppEventType.START,在RM内部会注册该类型的事件会用什么处理器来处理。
不过殊途同归。
public enum RMAppEventType {
// Source: ClientRMService
START,
RECOVER,
KILL,
MOVE, // Move app to a new queue // Source: Scheduler and RMAppManager
APP_REJECTED, // Source: Scheduler
APP_ACCEPTED, // Source: RMAppAttempt
ATTEMPT_REGISTERED,
ATTEMPT_UNREGISTERED,
ATTEMPT_FINISHED, // Will send the final state
ATTEMPT_FAILED,
ATTEMPT_KILLED,
NODE_UPDATE, // Source: Container and ResourceTracker
APP_RUNNING_ON_NODE, // Source: RMStateStore
APP_NEW_SAVED,
APP_UPDATE_SAVED,
}
RMAppEventType.java
RMAppEventType.java 在 hadoop-2.7.3-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppEventType.java
public enum RMAppState {
NEW,
NEW_SAVING,
SUBMITTED,
ACCEPTED,
RUNNING,
FINAL_SAVING,
FINISHING,
FINISHED,
FAILED,
KILLING,
KILLED
}
RMAppState.java
RMAppState.java 在 hadoop-2.7.3-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppState.java
在ResourceManager内部:
/**
* The ResourceManager is the main class that is a set of components.
* "I am the ResourceManager. All your resources belong to us..."
*
*/
@SuppressWarnings("unchecked")
public class ResourceManager extends CompositeService implements Recoverable { /**
* Priority of the ResourceManager shutdown hook.
*/
public static final int SHUTDOWN_HOOK_PRIORITY = 30; private static final Log LOG = LogFactory.getLog(ResourceManager.class);
private static long clusterTimeStamp = System.currentTimeMillis(); /**
* "Always On" services. Services that need to run always irrespective of
* the HA state of the RM.
*/
@VisibleForTesting
protected RMContextImpl rmContext;
private Dispatcher rmDispatcher;
@VisibleForTesting
protected AdminService adminService; /**
* "Active" services. Services that need to run only on the Active RM.
* These services are managed (initialized, started, stopped) by the
* {@link CompositeService} RMActiveServices.
*
* RM is active when (1) HA is disabled, or (2) HA is enabled and the RM is
* in Active state.
*/
protected RMActiveServices activeServices;
protected RMSecretManagerService rmSecretManagerService; protected ResourceScheduler scheduler;
protected ReservationSystem reservationSystem;
private ClientRMService clientRM;
protected ApplicationMasterService masterService;
protected NMLivelinessMonitor nmLivelinessMonitor;
protected NodesListManager nodesListManager;
protected RMAppManager rmAppManager;
protected ApplicationACLsManager applicationACLsManager;
protected QueueACLsManager queueACLsManager;
private WebApp webApp;
private AppReportFetcher fetcher = null;
protected ResourceTrackerService resourceTracker; @VisibleForTesting
protected String webAppAddress;
private ConfigurationProvider configurationProvider = null;
/** End of Active services */ private Configuration conf; private UserGroupInformation rmLoginUGI; public ResourceManager() {
super("ResourceManager");
} public RMContext getRMContext() {
return this.rmContext;
} public static long getClusterTimeStamp() {
return clusterTimeStamp;
} @VisibleForTesting
protected static void setClusterTimeStamp(long timestamp) {
clusterTimeStamp = timestamp;
} @VisibleForTesting
Dispatcher getRmDispatcher() {
return rmDispatcher;
} @Override
protected void serviceInit(Configuration conf) throws Exception {
this.conf = conf;
this.rmContext = new RMContextImpl(); this.configurationProvider =
ConfigurationProviderFactory.getConfigurationProvider(conf);
this.configurationProvider.init(this.conf);
rmContext.setConfigurationProvider(configurationProvider); // load core-site.xml
InputStream coreSiteXMLInputStream =
this.configurationProvider.getConfigurationInputStream(this.conf,
YarnConfiguration.CORE_SITE_CONFIGURATION_FILE);
if (coreSiteXMLInputStream != null) {
this.conf.addResource(coreSiteXMLInputStream);
} // Do refreshUserToGroupsMappings with loaded core-site.xml
Groups.getUserToGroupsMappingServiceWithLoadedConfiguration(this.conf)
.refresh(); // Do refreshSuperUserGroupsConfiguration with loaded core-site.xml
// Or use RM specific configurations to overwrite the common ones first
// if they exist
RMServerUtils.processRMProxyUsersConf(conf);
ProxyUsers.refreshSuperUserGroupsConfiguration(this.conf); // load yarn-site.xml
InputStream yarnSiteXMLInputStream =
this.configurationProvider.getConfigurationInputStream(this.conf,
YarnConfiguration.YARN_SITE_CONFIGURATION_FILE);
if (yarnSiteXMLInputStream != null) {
this.conf.addResource(yarnSiteXMLInputStream);
} //校验配置合法性,yarn.resourcemanager.am.max-attempts ,validate expireIntvl >= heartbeatIntvl
validateConfigs(this.conf); // Set HA configuration should be done before login
this.rmContext.setHAEnabled(HAUtil.isHAEnabled(this.conf));
if (this.rmContext.isHAEnabled()) {
HAUtil.verifyAndSetConfiguration(this.conf);
} // Set UGI and do login
// If security is enabled, use login user
// If security is not enabled, use current user
this.rmLoginUGI = UserGroupInformation.getCurrentUser();
try {
doSecureLogin();
} catch(IOException ie) {
throw new YarnRuntimeException("Failed to login", ie);
} // register the handlers for all AlwaysOn services using setupDispatcher().
rmDispatcher = setupDispatcher();
addIfService(rmDispatcher);
rmContext.setDispatcher(rmDispatcher); adminService = createAdminService();
addService(adminService);
rmContext.setRMAdminService(adminService); rmContext.setYarnConfiguration(conf); //创建并初始化ResourceManager的内部类RMActiveServices
createAndInitActiveServices(); webAppAddress = WebAppUtils.getWebAppBindURL(this.conf,
YarnConfiguration.RM_BIND_HOST,
WebAppUtils.getRMWebAppURLWithoutScheme(this.conf)); RMApplicationHistoryWriter rmApplicationHistoryWriter =
createRMApplicationHistoryWriter();
addService(rmApplicationHistoryWriter);
rmContext.setRMApplicationHistoryWriter(rmApplicationHistoryWriter); SystemMetricsPublisher systemMetricsPublisher = createSystemMetricsPublisher();
addService(systemMetricsPublisher);
rmContext.setSystemMetricsPublisher(systemMetricsPublisher); super.serviceInit(this.conf);
} protected QueueACLsManager createQueueACLsManager(ResourceScheduler scheduler,
Configuration conf) {
return new QueueACLsManager(scheduler, conf);
} @VisibleForTesting
protected void setRMStateStore(RMStateStore rmStore) {
rmStore.setRMDispatcher(rmDispatcher);
rmStore.setResourceManager(this);
rmContext.setStateStore(rmStore);
} protected EventHandler<SchedulerEvent> createSchedulerEventDispatcher() {
return new SchedulerEventDispatcher(this.scheduler);
} protected Dispatcher createDispatcher() {
return new AsyncDispatcher();
} protected ResourceScheduler createScheduler() {
String schedulerClassName = conf.get(YarnConfiguration.RM_SCHEDULER,
YarnConfiguration.DEFAULT_RM_SCHEDULER);
LOG.info("Using Scheduler: " + schedulerClassName);
try {
Class<?> schedulerClazz = Class.forName(schedulerClassName);
if (ResourceScheduler.class.isAssignableFrom(schedulerClazz)) {
return (ResourceScheduler) ReflectionUtils.newInstance(schedulerClazz,
this.conf);
} else {
throw new YarnRuntimeException("Class: " + schedulerClassName
+ " not instance of " + ResourceScheduler.class.getCanonicalName());
}
} catch (ClassNotFoundException e) {
throw new YarnRuntimeException("Could not instantiate Scheduler: "
+ schedulerClassName, e);
}
} protected ReservationSystem createReservationSystem() {
String reservationClassName =
conf.get(YarnConfiguration.RM_RESERVATION_SYSTEM_CLASS,
AbstractReservationSystem.getDefaultReservationSystem(scheduler));
if (reservationClassName == null) {
return null;
}
LOG.info("Using ReservationSystem: " + reservationClassName);
try {
Class<?> reservationClazz = Class.forName(reservationClassName);
if (ReservationSystem.class.isAssignableFrom(reservationClazz)) {
return (ReservationSystem) ReflectionUtils.newInstance(
reservationClazz, this.conf);
} else {
throw new YarnRuntimeException("Class: " + reservationClassName
+ " not instance of " + ReservationSystem.class.getCanonicalName());
}
} catch (ClassNotFoundException e) {
throw new YarnRuntimeException(
"Could not instantiate ReservationSystem: " + reservationClassName, e);
}
} protected ApplicationMasterLauncher createAMLauncher() {
return new ApplicationMasterLauncher(this.rmContext);
} private NMLivelinessMonitor createNMLivelinessMonitor() {
return new NMLivelinessMonitor(this.rmContext
.getDispatcher());
} protected AMLivelinessMonitor createAMLivelinessMonitor() {
return new AMLivelinessMonitor(this.rmDispatcher);
} protected RMNodeLabelsManager createNodeLabelManager()
throws InstantiationException, IllegalAccessException {
return new RMNodeLabelsManager();
} protected DelegationTokenRenewer createDelegationTokenRenewer() {
return new DelegationTokenRenewer();
} protected RMAppManager createRMAppManager() {
return new RMAppManager(this.rmContext, this.scheduler, this.masterService,
this.applicationACLsManager, this.conf);
} protected RMApplicationHistoryWriter createRMApplicationHistoryWriter() {
return new RMApplicationHistoryWriter();
} protected SystemMetricsPublisher createSystemMetricsPublisher() {
return new SystemMetricsPublisher();
} // sanity check for configurations
protected static void validateConfigs(Configuration conf) {
// validate max-attempts
int globalMaxAppAttempts =
conf.getInt(YarnConfiguration.RM_AM_MAX_ATTEMPTS,
YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS);
if (globalMaxAppAttempts <= 0) {
throw new YarnRuntimeException("Invalid global max attempts configuration"
+ ", " + YarnConfiguration.RM_AM_MAX_ATTEMPTS
+ "=" + globalMaxAppAttempts + ", it should be a positive integer.");
} // validate expireIntvl >= heartbeatIntvl
long expireIntvl = conf.getLong(YarnConfiguration.RM_NM_EXPIRY_INTERVAL_MS,
YarnConfiguration.DEFAULT_RM_NM_EXPIRY_INTERVAL_MS);
long heartbeatIntvl =
conf.getLong(YarnConfiguration.RM_NM_HEARTBEAT_INTERVAL_MS,
YarnConfiguration.DEFAULT_RM_NM_HEARTBEAT_INTERVAL_MS);
if (expireIntvl < heartbeatIntvl) {
throw new YarnRuntimeException("Nodemanager expiry interval should be no"
+ " less than heartbeat interval, "
+ YarnConfiguration.RM_NM_EXPIRY_INTERVAL_MS + "=" + expireIntvl
+ ", " + YarnConfiguration.RM_NM_HEARTBEAT_INTERVAL_MS + "="
+ heartbeatIntvl);
}
} /**
* RMActiveServices handles all the Active services in the RM.
*/
@Private
public class RMActiveServices extends CompositeService { private DelegationTokenRenewer delegationTokenRenewer;
private EventHandler<SchedulerEvent> schedulerDispatcher;
private ApplicationMasterLauncher applicationMasterLauncher;
private ContainerAllocationExpirer containerAllocationExpirer;
private ResourceManager rm;
private boolean recoveryEnabled;
private RMActiveServiceContext activeServiceContext; RMActiveServices(ResourceManager rm) {
super("RMActiveServices");
this.rm = rm;
} @Override
protected void serviceInit(Configuration configuration) throws Exception {
activeServiceContext = new RMActiveServiceContext();
rmContext.setActiveServiceContext(activeServiceContext); conf.setBoolean(Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY, true);
rmSecretManagerService = createRMSecretManagerService();
addService(rmSecretManagerService); containerAllocationExpirer = new ContainerAllocationExpirer(rmDispatcher);
addService(containerAllocationExpirer);
rmContext.setContainerAllocationExpirer(containerAllocationExpirer); AMLivelinessMonitor amLivelinessMonitor = createAMLivelinessMonitor();
addService(amLivelinessMonitor);
rmContext.setAMLivelinessMonitor(amLivelinessMonitor); AMLivelinessMonitor amFinishingMonitor = createAMLivelinessMonitor();
addService(amFinishingMonitor);
rmContext.setAMFinishingMonitor(amFinishingMonitor); RMNodeLabelsManager nlm = createNodeLabelManager();
nlm.setRMContext(rmContext);
addService(nlm);
rmContext.setNodeLabelManager(nlm); boolean isRecoveryEnabled = conf.getBoolean(
YarnConfiguration.RECOVERY_ENABLED,
YarnConfiguration.DEFAULT_RM_RECOVERY_ENABLED); RMStateStore rmStore = null;
if (isRecoveryEnabled) {
recoveryEnabled = true;
rmStore = RMStateStoreFactory.getStore(conf);
boolean isWorkPreservingRecoveryEnabled =
conf.getBoolean(
YarnConfiguration.RM_WORK_PRESERVING_RECOVERY_ENABLED,
YarnConfiguration.DEFAULT_RM_WORK_PRESERVING_RECOVERY_ENABLED);
rmContext
.setWorkPreservingRecoveryEnabled(isWorkPreservingRecoveryEnabled);
} else {
recoveryEnabled = false;
rmStore = new NullRMStateStore();
} try {
rmStore.init(conf);
rmStore.setRMDispatcher(rmDispatcher);
rmStore.setResourceManager(rm);
} catch (Exception e) {
// the Exception from stateStore.init() needs to be handled for
// HA and we need to give up master status if we got fenced
LOG.error("Failed to init state store", e);
throw e;
}
rmContext.setStateStore(rmStore); if (UserGroupInformation.isSecurityEnabled()) {
delegationTokenRenewer = createDelegationTokenRenewer();
rmContext.setDelegationTokenRenewer(delegationTokenRenewer);
} // Register event handler for NodesListManager
nodesListManager = new NodesListManager(rmContext);
rmDispatcher.register(NodesListManagerEventType.class, nodesListManager);
addService(nodesListManager);
rmContext.setNodesListManager(nodesListManager); // Initialize the scheduler
scheduler = createScheduler();
scheduler.setRMContext(rmContext);
addIfService(scheduler);
rmContext.setScheduler(scheduler); schedulerDispatcher = createSchedulerEventDispatcher();
addIfService(schedulerDispatcher);
rmDispatcher.register(SchedulerEventType.class, schedulerDispatcher); // Register event handler for RmAppEvents
//注册RMAppEvent事件的事件处理器
//RMAppManager往异步处理器增加个RMAppEvent事件,类型枚值RMAppEventType.START,所以由ApplicationEventDispatcher(rmContext)来处理
rmDispatcher.register(RMAppEventType.class,
new ApplicationEventDispatcher(rmContext)); // Register event handler for RmAppAttemptEvents
rmDispatcher.register(RMAppAttemptEventType.class,
new ApplicationAttemptEventDispatcher(rmContext)); // Register event handler for RmNodes
rmDispatcher.register(
RMNodeEventType.class, new NodeEventDispatcher(rmContext)); nmLivelinessMonitor = createNMLivelinessMonitor();
addService(nmLivelinessMonitor); resourceTracker = createResourceTrackerService();
addService(resourceTracker);
rmContext.setResourceTrackerService(resourceTracker); DefaultMetricsSystem.initialize("ResourceManager");
JvmMetrics.initSingleton("ResourceManager", null); // Initialize the Reservation system
if (conf.getBoolean(YarnConfiguration.RM_RESERVATION_SYSTEM_ENABLE,
YarnConfiguration.DEFAULT_RM_RESERVATION_SYSTEM_ENABLE)) {
reservationSystem = createReservationSystem();
if (reservationSystem != null) {
reservationSystem.setRMContext(rmContext);
addIfService(reservationSystem);
rmContext.setReservationSystem(reservationSystem);
LOG.info("Initialized Reservation system");
}
} // creating monitors that handle preemption
createPolicyMonitors(); masterService = createApplicationMasterService();
addService(masterService) ;
rmContext.setApplicationMasterService(masterService); applicationACLsManager = new ApplicationACLsManager(conf); queueACLsManager = createQueueACLsManager(scheduler, conf); rmAppManager = createRMAppManager();
// Register event handler for RMAppManagerEvents
rmDispatcher.register(RMAppManagerEventType.class, rmAppManager); clientRM = createClientRMService();
addService(clientRM);
rmContext.setClientRMService(clientRM); applicationMasterLauncher = createAMLauncher();
rmDispatcher.register(AMLauncherEventType.class,
applicationMasterLauncher); addService(applicationMasterLauncher);
if (UserGroupInformation.isSecurityEnabled()) {
addService(delegationTokenRenewer);
delegationTokenRenewer.setRMContext(rmContext);
} new RMNMInfo(rmContext, scheduler); super.serviceInit(conf);
} @Override
protected void serviceStart() throws Exception {
RMStateStore rmStore = rmContext.getStateStore();
// The state store needs to start irrespective of recoveryEnabled as apps
// need events to move to further states.
rmStore.start(); if(recoveryEnabled) {
try {
LOG.info("Recovery started");
rmStore.checkVersion();
if (rmContext.isWorkPreservingRecoveryEnabled()) {
rmContext.setEpoch(rmStore.getAndIncrementEpoch());
}
RMState state = rmStore.loadState();
recover(state);
LOG.info("Recovery ended");
} catch (Exception e) {
// the Exception from loadState() needs to be handled for
// HA and we need to give up master status if we got fenced
LOG.error("Failed to load/recover state", e);
throw e;
}
} super.serviceStart();
} @Override
protected void serviceStop() throws Exception { super.serviceStop();
DefaultMetricsSystem.shutdown();
if (rmContext != null) {
RMStateStore store = rmContext.getStateStore();
try {
store.close();
} catch (Exception e) {
LOG.error("Error closing store.", e);
}
} } protected void createPolicyMonitors() {
if (scheduler instanceof PreemptableResourceScheduler
&& conf.getBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS,
YarnConfiguration.DEFAULT_RM_SCHEDULER_ENABLE_MONITORS)) {
LOG.info("Loading policy monitors");
List<SchedulingEditPolicy> policies = conf.getInstances(
YarnConfiguration.RM_SCHEDULER_MONITOR_POLICIES,
SchedulingEditPolicy.class);
if (policies.size() > 0) {
for (SchedulingEditPolicy policy : policies) {
LOG.info("LOADING SchedulingEditPolicy:" + policy.getPolicyName());
// periodically check whether we need to take action to guarantee
// constraints
SchedulingMonitor mon = new SchedulingMonitor(rmContext, policy);
addService(mon);
}
} else {
LOG.warn("Policy monitors configured (" +
YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS +
") but none specified (" +
YarnConfiguration.RM_SCHEDULER_MONITOR_POLICIES + ")");
}
}
}
} @Private
public static class SchedulerEventDispatcher extends AbstractService
implements EventHandler<SchedulerEvent> { private final ResourceScheduler scheduler;
private final BlockingQueue<SchedulerEvent> eventQueue =
new LinkedBlockingQueue<SchedulerEvent>();
private volatile int lastEventQueueSizeLogged = 0;
private final Thread eventProcessor;
private volatile boolean stopped = false;
private boolean shouldExitOnError = false; public SchedulerEventDispatcher(ResourceScheduler scheduler) {
super(SchedulerEventDispatcher.class.getName());
this.scheduler = scheduler;
this.eventProcessor = new Thread(new EventProcessor());
this.eventProcessor.setName("ResourceManager Event Processor");
} @Override
protected void serviceInit(Configuration conf) throws Exception {
this.shouldExitOnError =
conf.getBoolean(Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY,
Dispatcher.DEFAULT_DISPATCHER_EXIT_ON_ERROR);
super.serviceInit(conf);
} @Override
protected void serviceStart() throws Exception {
this.eventProcessor.start();
super.serviceStart();
} private final class EventProcessor implements Runnable {
@Override
public void run() { SchedulerEvent event; while (!stopped && !Thread.currentThread().isInterrupted()) {
try {
event = eventQueue.take();
} catch (InterruptedException e) {
LOG.error("Returning, interrupted : " + e);
return; // TODO: Kill RM.
} try {
scheduler.handle(event);
} catch (Throwable t) {
// An error occurred, but we are shutting down anyway.
// If it was an InterruptedException, the very act of
// shutdown could have caused it and is probably harmless.
if (stopped) {
LOG.warn("Exception during shutdown: ", t);
break;
}
LOG.fatal("Error in handling event type " + event.getType()
+ " to the scheduler", t);
if (shouldExitOnError
&& !ShutdownHookManager.get().isShutdownInProgress()) {
LOG.info("Exiting, bbye..");
System.exit(-1);
}
}
}
}
} @Override
protected void serviceStop() throws Exception {
this.stopped = true;
this.eventProcessor.interrupt();
try {
this.eventProcessor.join();
} catch (InterruptedException e) {
throw new YarnRuntimeException(e);
}
super.serviceStop();
} @Override
public void handle(SchedulerEvent event) {
try {
int qSize = eventQueue.size();
if (qSize != 0 && qSize % 1000 == 0
&& lastEventQueueSizeLogged != qSize) {
lastEventQueueSizeLogged = qSize;
LOG.info("Size of scheduler event-queue is " + qSize);
}
int remCapacity = eventQueue.remainingCapacity();
if (remCapacity < 1000) {
LOG.info("Very low remaining capacity on scheduler event queue: "
+ remCapacity);
}
this.eventQueue.put(event);
} catch (InterruptedException e) {
LOG.info("Interrupted. Trying to exit gracefully.");
}
}
} @Private
public static class RMFatalEventDispatcher
implements EventHandler<RMFatalEvent> { @Override
public void handle(RMFatalEvent event) {
LOG.fatal("Received a " + RMFatalEvent.class.getName() + " of type " +
event.getType().name() + ". Cause:\n" + event.getCause()); ExitUtil.terminate(1, event.getCause());
}
} public void handleTransitionToStandBy() {
if (rmContext.isHAEnabled()) {
try {
// Transition to standby and reinit active services
LOG.info("Transitioning RM to Standby mode");
transitionToStandby(true);
adminService.resetLeaderElection();
return;
} catch (Exception e) {
LOG.fatal("Failed to transition RM to Standby mode.");
ExitUtil.terminate(1, e);
}
}
} @Private
public static final class ApplicationEventDispatcher implements
EventHandler<RMAppEvent> { private final RMContext rmContext; public ApplicationEventDispatcher(RMContext rmContext) {
this.rmContext = rmContext;
} @Override
public void handle(RMAppEvent event) {
ApplicationId appID = event.getApplicationId();
RMApp rmApp = this.rmContext.getRMApps().get(appID);
if (rmApp != null) {
try {
//
rmApp.handle(event);
} catch (Throwable t) {
LOG.error("Error in handling event type " + event.getType()
+ " for application " + appID, t);
}
}
}
} @Private
public static final class ApplicationAttemptEventDispatcher implements
EventHandler<RMAppAttemptEvent> { private final RMContext rmContext; public ApplicationAttemptEventDispatcher(RMContext rmContext) {
this.rmContext = rmContext;
} @Override
public void handle(RMAppAttemptEvent event) {
ApplicationAttemptId appAttemptID = event.getApplicationAttemptId();
ApplicationId appAttemptId = appAttemptID.getApplicationId();
RMApp rmApp = this.rmContext.getRMApps().get(appAttemptId);
if (rmApp != null) {
RMAppAttempt rmAppAttempt = rmApp.getRMAppAttempt(appAttemptID);
if (rmAppAttempt != null) {
try {
rmAppAttempt.handle(event);
} catch (Throwable t) {
LOG.error("Error in handling event type " + event.getType()
+ " for applicationAttempt " + appAttemptId, t);
}
}
}
}
} @Private
public static final class NodeEventDispatcher implements
EventHandler<RMNodeEvent> { private final RMContext rmContext; public NodeEventDispatcher(RMContext rmContext) {
this.rmContext = rmContext;
} @Override
public void handle(RMNodeEvent event) {
NodeId nodeId = event.getNodeId();
RMNode node = this.rmContext.getRMNodes().get(nodeId);
if (node != null) {
try {
((EventHandler<RMNodeEvent>) node).handle(event);
} catch (Throwable t) {
LOG.error("Error in handling event type " + event.getType()
+ " for node " + nodeId, t);
}
}
}
} protected void startWepApp() { // Use the customized yarn filter instead of the standard kerberos filter to
// allow users to authenticate using delegation tokens
// 4 conditions need to be satisfied -
// 1. security is enabled
// 2. http auth type is set to kerberos
// 3. "yarn.resourcemanager.webapp.use-yarn-filter" override is set to true
// 4. hadoop.http.filter.initializers container AuthenticationFilterInitializer Configuration conf = getConfig();
boolean enableCorsFilter =
conf.getBoolean(YarnConfiguration.RM_WEBAPP_ENABLE_CORS_FILTER,
YarnConfiguration.DEFAULT_RM_WEBAPP_ENABLE_CORS_FILTER);
boolean useYarnAuthenticationFilter =
conf.getBoolean(
YarnConfiguration.RM_WEBAPP_DELEGATION_TOKEN_AUTH_FILTER,
YarnConfiguration.DEFAULT_RM_WEBAPP_DELEGATION_TOKEN_AUTH_FILTER);
String authPrefix = "hadoop.http.authentication.";
String authTypeKey = authPrefix + "type";
String filterInitializerConfKey = "hadoop.http.filter.initializers";
String actualInitializers = "";
Class<?>[] initializersClasses =
conf.getClasses(filterInitializerConfKey); // setup CORS
if (enableCorsFilter) {
conf.setBoolean(HttpCrossOriginFilterInitializer.PREFIX
+ HttpCrossOriginFilterInitializer.ENABLED_SUFFIX, true);
} boolean hasHadoopAuthFilterInitializer = false;
boolean hasRMAuthFilterInitializer = false;
if (initializersClasses != null) {
for (Class<?> initializer : initializersClasses) {
if (initializer.getName().equals(
AuthenticationFilterInitializer.class.getName())) {
hasHadoopAuthFilterInitializer = true;
}
if (initializer.getName().equals(
RMAuthenticationFilterInitializer.class.getName())) {
hasRMAuthFilterInitializer = true;
}
}
if (UserGroupInformation.isSecurityEnabled()
&& useYarnAuthenticationFilter
&& hasHadoopAuthFilterInitializer
&& conf.get(authTypeKey, "").equals(
KerberosAuthenticationHandler.TYPE)) {
ArrayList<String> target = new ArrayList<String>();
for (Class<?> filterInitializer : initializersClasses) {
if (filterInitializer.getName().equals(
AuthenticationFilterInitializer.class.getName())) {
if (hasRMAuthFilterInitializer == false) {
target.add(RMAuthenticationFilterInitializer.class.getName());
}
continue;
}
target.add(filterInitializer.getName());
}
actualInitializers = StringUtils.join(",", target); LOG.info("Using RM authentication filter(kerberos/delegation-token)"
+ " for RM webapp authentication");
RMAuthenticationFilter
.setDelegationTokenSecretManager(getClientRMService().rmDTSecretManager);
conf.set(filterInitializerConfKey, actualInitializers);
}
} // if security is not enabled and the default filter initializer has not
// been set, set the initializer to include the
// RMAuthenticationFilterInitializer which in turn will set up the simple
// auth filter. String initializers = conf.get(filterInitializerConfKey);
if (!UserGroupInformation.isSecurityEnabled()) {
if (initializersClasses == null || initializersClasses.length == 0) {
conf.set(filterInitializerConfKey,
RMAuthenticationFilterInitializer.class.getName());
conf.set(authTypeKey, "simple");
} else if (initializers.equals(StaticUserWebFilter.class.getName())) {
conf.set(filterInitializerConfKey,
RMAuthenticationFilterInitializer.class.getName() + ","
+ initializers);
conf.set(authTypeKey, "simple");
}
} Builder<ApplicationMasterService> builder =
WebApps
.$for("cluster", ApplicationMasterService.class, masterService,
"ws")
.with(conf)
.withHttpSpnegoPrincipalKey(
YarnConfiguration.RM_WEBAPP_SPNEGO_USER_NAME_KEY)
.withHttpSpnegoKeytabKey(
YarnConfiguration.RM_WEBAPP_SPNEGO_KEYTAB_FILE_KEY)
.at(webAppAddress);
String proxyHostAndPort = WebAppUtils.getProxyHostAndPort(conf);
if(WebAppUtils.getResolvedRMWebAppURLWithoutScheme(conf).
equals(proxyHostAndPort)) {
if (HAUtil.isHAEnabled(conf)) {
fetcher = new AppReportFetcher(conf);
} else {
fetcher = new AppReportFetcher(conf, getClientRMService());
}
builder.withServlet(ProxyUriUtils.PROXY_SERVLET_NAME,
ProxyUriUtils.PROXY_PATH_SPEC, WebAppProxyServlet.class);
builder.withAttribute(WebAppProxy.FETCHER_ATTRIBUTE, fetcher);
String[] proxyParts = proxyHostAndPort.split(":");
builder.withAttribute(WebAppProxy.PROXY_HOST_ATTRIBUTE, proxyParts[0]); }
webApp = builder.start(new RMWebApp(this));
} /**
* Helper method to create and init {@link #activeServices}. This creates an
* instance of {@link RMActiveServices} and initializes it.
* @throws Exception
*/
protected void createAndInitActiveServices() throws Exception {
activeServices = new RMActiveServices(this);
//最后调用的是RMActiveServices类的serviceInit函数
activeServices.init(conf);
} /**
* Helper method to start {@link #activeServices}.
* @throws Exception
*/
void startActiveServices() throws Exception {
if (activeServices != null) {
clusterTimeStamp = System.currentTimeMillis();
activeServices.start();
}
} /**
* Helper method to stop {@link #activeServices}.
* @throws Exception
*/
void stopActiveServices() throws Exception {
if (activeServices != null) {
activeServices.stop();
activeServices = null;
}
} void reinitialize(boolean initialize) throws Exception {
ClusterMetrics.destroy();
QueueMetrics.clearQueueMetrics();
if (initialize) {
resetDispatcher();
createAndInitActiveServices();
}
} @VisibleForTesting
protected boolean areActiveServicesRunning() {
return activeServices != null && activeServices.isInState(STATE.STARTED);
} synchronized void transitionToActive() throws Exception {
if (rmContext.getHAServiceState() == HAServiceProtocol.HAServiceState.ACTIVE) {
LOG.info("Already in active state");
return;
} LOG.info("Transitioning to active state"); this.rmLoginUGI.doAs(new PrivilegedExceptionAction<Void>() {
@Override
public Void run() throws Exception {
try {
startActiveServices();
return null;
} catch (Exception e) {
reinitialize(true);
throw e;
}
}
}); rmContext.setHAServiceState(HAServiceProtocol.HAServiceState.ACTIVE);
LOG.info("Transitioned to active state");
} synchronized void transitionToStandby(boolean initialize)
throws Exception {
if (rmContext.getHAServiceState() ==
HAServiceProtocol.HAServiceState.STANDBY) {
LOG.info("Already in standby state");
return;
} LOG.info("Transitioning to standby state");
HAServiceState state = rmContext.getHAServiceState();
rmContext.setHAServiceState(HAServiceProtocol.HAServiceState.STANDBY);
if (state == HAServiceProtocol.HAServiceState.ACTIVE) {
stopActiveServices();
reinitialize(initialize);
}
LOG.info("Transitioned to standby state");
} @Override
protected void serviceStart() throws Exception {
if (this.rmContext.isHAEnabled()) {
transitionToStandby(true);
} else {
transitionToActive();
} startWepApp();
if (getConfig().getBoolean(YarnConfiguration.IS_MINI_YARN_CLUSTER,
false)) {
int port = webApp.port();
WebAppUtils.setRMWebAppPort(conf, port);
}
super.serviceStart();
} protected void doSecureLogin() throws IOException {
InetSocketAddress socAddr = getBindAddress(conf);
SecurityUtil.login(this.conf, YarnConfiguration.RM_KEYTAB,
YarnConfiguration.RM_PRINCIPAL, socAddr.getHostName()); // if security is enable, set rmLoginUGI as UGI of loginUser
if (UserGroupInformation.isSecurityEnabled()) {
this.rmLoginUGI = UserGroupInformation.getLoginUser();
}
} @Override
protected void serviceStop() throws Exception {
if (webApp != null) {
webApp.stop();
}
if (fetcher != null) {
fetcher.stop();
}
if (configurationProvider != null) {
configurationProvider.close();
}
super.serviceStop();
transitionToStandby(false);
rmContext.setHAServiceState(HAServiceState.STOPPING);
} protected ResourceTrackerService createResourceTrackerService() {
return new ResourceTrackerService(this.rmContext, this.nodesListManager,
this.nmLivelinessMonitor,
this.rmContext.getContainerTokenSecretManager(),
this.rmContext.getNMTokenSecretManager());
} protected ClientRMService createClientRMService() {
return new ClientRMService(this.rmContext, scheduler, this.rmAppManager,
this.applicationACLsManager, this.queueACLsManager,
this.rmContext.getRMDelegationTokenSecretManager());
} protected ApplicationMasterService createApplicationMasterService() {
return new ApplicationMasterService(this.rmContext, scheduler);
} protected AdminService createAdminService() {
return new AdminService(this, rmContext);
} protected RMSecretManagerService createRMSecretManagerService() {
return new RMSecretManagerService(conf, rmContext);
} @Private
public ClientRMService getClientRMService() {
return this.clientRM;
} /**
* return the scheduler.
* @return the scheduler for the Resource Manager.
*/
@Private
public ResourceScheduler getResourceScheduler() {
return this.scheduler;
} /**
* return the resource tracking component.
* @return the resource tracking component.
*/
@Private
public ResourceTrackerService getResourceTrackerService() {
return this.resourceTracker;
} @Private
public ApplicationMasterService getApplicationMasterService() {
return this.masterService;
} @Private
public ApplicationACLsManager getApplicationACLsManager() {
return this.applicationACLsManager;
} @Private
public QueueACLsManager getQueueACLsManager() {
return this.queueACLsManager;
} @Private
WebApp getWebapp() {
return this.webApp;
} @Override
public void recover(RMState state) throws Exception {
// recover RMdelegationTokenSecretManager
rmContext.getRMDelegationTokenSecretManager().recover(state); // recover AMRMTokenSecretManager
rmContext.getAMRMTokenSecretManager().recover(state); // recover applications
rmAppManager.recover(state); setSchedulerRecoveryStartAndWaitTime(state, conf);
} /*main函数中主要分析服务初始化和服务启动,RM是个综合服务类继承结构CompositeService->AbstractService,RM初始化是会先进入父类的init函数,
* AbstractService抽取了服务的基本操作如start、stop、close,只要我们的服务覆盖serviceStart、serviceStop、serviceInit等函数就可以控制自己的服务了,
* 这相当于对服务做了统一的管理。
*/
public static void main(String argv[]) {
//未捕获异常处理类
Thread.setDefaultUncaughtExceptionHandler(new YarnUncaughtExceptionHandler());
StringUtils.startupShutdownMessage(ResourceManager.class, argv, LOG);
try {
//载入控制文件
Configuration conf = new YarnConfiguration();
//创建空RM对象,并未包含任何服务,也未启动
GenericOptionsParser hParser = new GenericOptionsParser(conf, argv);
argv = hParser.getRemainingArgs();
// If -format-state-store, then delete RMStateStore; else startup normally
if (argv.length == 1 && argv[0].equals("-format-state-store")) {
deleteRMStateStore(conf);
} else {
ResourceManager resourceManager = new ResourceManager();
//添加关闭钩子
ShutdownHookManager.get().addShutdownHook(
new CompositeServiceShutdownHook(resourceManager),
SHUTDOWN_HOOK_PRIORITY);
//初始化服务 ,会调用父类AbstractService的init函数,该函数内部调用serviceInit函数,实际上调用的是ResourceManager的serviceInit函数
resourceManager.init(conf);
//启动RM
resourceManager.start();
}
} catch (Throwable t) {
LOG.fatal("Error starting ResourceManager", t);
System.exit(-1);
}
} /**
* Register the handlers for alwaysOn services
*/
private Dispatcher setupDispatcher() {
Dispatcher dispatcher = createDispatcher();
dispatcher.register(RMFatalEventType.class,
new ResourceManager.RMFatalEventDispatcher());
return dispatcher;
} private void resetDispatcher() {
Dispatcher dispatcher = setupDispatcher();
((Service)dispatcher).init(this.conf);
((Service)dispatcher).start();
removeService((Service)rmDispatcher);
// Need to stop previous rmDispatcher before assigning new dispatcher
// otherwise causes "AsyncDispatcher event handler" thread leak
((Service) rmDispatcher).stop();
rmDispatcher = dispatcher;
addIfService(rmDispatcher);
rmContext.setDispatcher(rmDispatcher);
} private void setSchedulerRecoveryStartAndWaitTime(RMState state,
Configuration conf) {
if (!state.getApplicationState().isEmpty()) {
long waitTime =
conf.getLong(YarnConfiguration.RM_WORK_PRESERVING_RECOVERY_SCHEDULING_WAIT_MS,
YarnConfiguration.DEFAULT_RM_WORK_PRESERVING_RECOVERY_SCHEDULING_WAIT_MS);
rmContext.setSchedulerRecoveryStartAndWaitTime(waitTime);
}
} /**
* Retrieve RM bind address from configuration
*
* @param conf
* @return InetSocketAddress
*/
public static InetSocketAddress getBindAddress(Configuration conf) {
return conf.getSocketAddr(YarnConfiguration.RM_ADDRESS,
YarnConfiguration.DEFAULT_RM_ADDRESS, YarnConfiguration.DEFAULT_RM_PORT);
} /**
* Deletes the RMStateStore
*
* @param conf
* @throws Exception
*/
private static void deleteRMStateStore(Configuration conf) throws Exception {
RMStateStore rmStore = RMStateStoreFactory.getStore(conf);
rmStore.init(conf);
rmStore.start();
try {
LOG.info("Deleting ResourceManager state store...");
rmStore.deleteStore();
LOG.info("State store deleted");
} finally {
rmStore.stop();
}
}
}
ResourceManager.java
我们可以从main()函数开始分析,main()函数内部调用resourceManager.init(conf),该函数初始化服务 ,会调用父类AbstractService的init函数,该函数内部调用serviceInit函数,实际上调用的是ResourceManager的serviceInit函数。且ResourceManager的serviceInit函数内部会调用createAndInitActiveServices(),该函数创建并初始化ResourceManager的内部类RMActiveServices,该函数内部会调用activeServices.init(conf),即最后调用的是ResourceManager类的内部类RMActiveServices类的serviceInit函数。serviceInit函数内部调用rmDispatcher.register(RMAppEventType.class, new ApplicationEventDispatcher(rmContext)),即注册RMAppEvent事件的事件处理器,与前面的RMAppManager类的RMAppEventType.START呼应,即RMAppManager往异步处理器增加个RMAppEvent事件,类型枚值RMAppEventType.START,所以由ApplicationEventDispatcher(rmContext)来处理。 其中ApplicationEventDispatcher类是ResourceManager类的一个内部类,它的handle方法内会调用rmApp.handle(event), rmApp是RMApp接口类的对象,这里是它的实现类RMAppImpl的对象,即调用的是RMAppImpl类的handle方法,该函数内部会调用this.stateMachine.doTransition(event.getType(), event),其实在RMAppImpl类构造函数里有this.stateMachine = stateMachineFactory.make(this), stateMachine通过状态工厂创建,状态工厂核心addTransition,这个stateMachine是个状态机工厂,其中绑定了很多的事件转换。 各种状态转变对应的处理器,有个submit应该是对应到MAppEventType.START ,在RMAppImpl类内部有.addTransition(RMAppState.NEW, RMAppState.NEW_SAVING, RMAppEventType.START, new RMAppNewlySavingTransition()), 意思就是接受RMAppEventType.START类型的事件,已经捕捉了RMAppEventType.START事件, 会把RMApp的状态从NEW变成NEW_SAVING, 调用回调类是RMAppNewlySavingTransition。 参考
其中addTransition()方法是StateMachineFactory类的方法。在addTransition函数中,就将第二个参数postState传给了新构建的内部类SingleInternalArc。
/**
* State machine topology.
* This object is semantically immutable. If you have a
* StateMachineFactory there's no operation in the API that changes
* its semantic properties.
*
* @param <OPERAND> The object type on which this state machine operates.
* @param <STATE> The state of the entity.
* @param <EVENTTYPE> The external eventType to be handled.
* @param <EVENT> The event object.
*
*/
@Public
@Evolving
final public class StateMachineFactory
<OPERAND, STATE extends Enum<STATE>,
EVENTTYPE extends Enum<EVENTTYPE>, EVENT> { private final TransitionsListNode transitionsListNode; private Map<STATE, Map<EVENTTYPE,
Transition<OPERAND, STATE, EVENTTYPE, EVENT>>> stateMachineTable; private STATE defaultInitialState; private final boolean optimized; /**
* Constructor
*
* This is the only constructor in the API.
*
*/
public StateMachineFactory(STATE defaultInitialState) {
this.transitionsListNode = null;
this.defaultInitialState = defaultInitialState;
this.optimized = false;
this.stateMachineTable = null;
} private StateMachineFactory
(StateMachineFactory<OPERAND, STATE, EVENTTYPE, EVENT> that,
ApplicableTransition<OPERAND, STATE, EVENTTYPE, EVENT> t) {
this.defaultInitialState = that.defaultInitialState;
this.transitionsListNode
= new TransitionsListNode(t, that.transitionsListNode);
this.optimized = false;
this.stateMachineTable = null;
} private StateMachineFactory
(StateMachineFactory<OPERAND, STATE, EVENTTYPE, EVENT> that,
boolean optimized) {
this.defaultInitialState = that.defaultInitialState;
this.transitionsListNode = that.transitionsListNode;
this.optimized = optimized;
if (optimized) {
makeStateMachineTable();
} else {
stateMachineTable = null;
}
} private interface ApplicableTransition
<OPERAND, STATE extends Enum<STATE>,
EVENTTYPE extends Enum<EVENTTYPE>, EVENT> {
void apply(StateMachineFactory<OPERAND, STATE, EVENTTYPE, EVENT> subject);
} private class TransitionsListNode {
final ApplicableTransition<OPERAND, STATE, EVENTTYPE, EVENT> transition;
final TransitionsListNode next; TransitionsListNode
(ApplicableTransition<OPERAND, STATE, EVENTTYPE, EVENT> transition,
TransitionsListNode next) {
this.transition = transition;
this.next = next;
}
} static private class ApplicableSingleOrMultipleTransition
<OPERAND, STATE extends Enum<STATE>,
EVENTTYPE extends Enum<EVENTTYPE>, EVENT>
implements ApplicableTransition<OPERAND, STATE, EVENTTYPE, EVENT> {
final STATE preState;
final EVENTTYPE eventType;
final Transition<OPERAND, STATE, EVENTTYPE, EVENT> transition; ApplicableSingleOrMultipleTransition
(STATE preState, EVENTTYPE eventType,
Transition<OPERAND, STATE, EVENTTYPE, EVENT> transition) {
this.preState = preState;
this.eventType = eventType;
this.transition = transition;
} @Override
public void apply
(StateMachineFactory<OPERAND, STATE, EVENTTYPE, EVENT> subject) {
Map<EVENTTYPE, Transition<OPERAND, STATE, EVENTTYPE, EVENT>> transitionMap
= subject.stateMachineTable.get(preState);
if (transitionMap == null) {
// I use HashMap here because I would expect most EVENTTYPE's to not
// apply out of a particular state, so FSM sizes would be
// quadratic if I use EnumMap's here as I do at the top level.
transitionMap = new HashMap<EVENTTYPE,
Transition<OPERAND, STATE, EVENTTYPE, EVENT>>();
subject.stateMachineTable.put(preState, transitionMap);
}
transitionMap.put(eventType, transition);
}
} /**
* @return a NEW StateMachineFactory just like {@code this} with the current
* transition added as a new legal transition. This overload
* has no hook object.
*
* Note that the returned StateMachineFactory is a distinct
* object.
*
* This method is part of the API.
*
* @param preState pre-transition state
* @param postState post-transition state
* @param eventType stimulus for the transition
*/
public StateMachineFactory
<OPERAND, STATE, EVENTTYPE, EVENT>
addTransition(STATE preState, STATE postState, EVENTTYPE eventType) {
return addTransition(preState, postState, eventType, null);
} /**
* @return a NEW StateMachineFactory just like {@code this} with the current
* transition added as a new legal transition. This overload
* has no hook object.
*
*
* Note that the returned StateMachineFactory is a distinct
* object.
*
* This method is part of the API.
*
* @param preState pre-transition state
* @param postState post-transition state
* @param eventTypes List of stimuli for the transitions
*/
public StateMachineFactory<OPERAND, STATE, EVENTTYPE, EVENT> addTransition(
STATE preState, STATE postState, Set<EVENTTYPE> eventTypes) {
return addTransition(preState, postState, eventTypes, null);
} /**
* @return a NEW StateMachineFactory just like {@code this} with the current
* transition added as a new legal transition
*
* Note that the returned StateMachineFactory is a distinct
* object.
*
* This method is part of the API.
*
* @param preState pre-transition state
* @param postState post-transition state
* @param eventTypes List of stimuli for the transitions
* @param hook transition hook
*/
public StateMachineFactory<OPERAND, STATE, EVENTTYPE, EVENT> addTransition(
STATE preState, STATE postState, Set<EVENTTYPE> eventTypes,
SingleArcTransition<OPERAND, EVENT> hook) {
StateMachineFactory<OPERAND, STATE, EVENTTYPE, EVENT> factory = null;
for (EVENTTYPE event : eventTypes) {
if (factory == null) {
factory = addTransition(preState, postState, event, hook);
} else {
factory = factory.addTransition(preState, postState, event, hook);
}
}
return factory;
} /**
* @return a NEW StateMachineFactory just like {@code this} with the current
* transition added as a new legal transition
*
* Note that the returned StateMachineFactory is a distinct object.
*
* This method is part of the API.
*
* @param preState pre-transition state
* @param postState post-transition state
* @param eventType stimulus for the transition
* @param hook transition hook
*/
//
public StateMachineFactory
<OPERAND, STATE, EVENTTYPE, EVENT>
addTransition(STATE preState, STATE postState,
EVENTTYPE eventType,
SingleArcTransition<OPERAND, EVENT> hook){
return new StateMachineFactory<OPERAND, STATE, EVENTTYPE, EVENT>
(this, new ApplicableSingleOrMultipleTransition<OPERAND, STATE, EVENTTYPE, EVENT>
(preState, eventType, new SingleInternalArc(postState, hook)));
} /**
* @return a NEW StateMachineFactory just like {@code this} with the current
* transition added as a new legal transition
*
* Note that the returned StateMachineFactory is a distinct object.
*
* This method is part of the API.
*
* @param preState pre-transition state
* @param postStates valid post-transition states
* @param eventType stimulus for the transition
* @param hook transition hook
*/
public StateMachineFactory
<OPERAND, STATE, EVENTTYPE, EVENT>
addTransition(STATE preState, Set<STATE> postStates,
EVENTTYPE eventType,
MultipleArcTransition<OPERAND, EVENT, STATE> hook){
return new StateMachineFactory<OPERAND, STATE, EVENTTYPE, EVENT>
(this,
new ApplicableSingleOrMultipleTransition<OPERAND, STATE, EVENTTYPE, EVENT>
(preState, eventType, new MultipleInternalArc(postStates, hook)));
} /**
* @return a StateMachineFactory just like {@code this}, except that if
* you won't need any synchronization to build a state machine
*
* Note that the returned StateMachineFactory is a distinct object.
*
* This method is part of the API.
*
* The only way you could distinguish the returned
* StateMachineFactory from {@code this} would be by
* measuring the performance of the derived
* {@code StateMachine} you can get from it.
*
* Calling this is optional. It doesn't change the semantics of the factory,
* if you call it then when you use the factory there is no synchronization.
*/
public StateMachineFactory
<OPERAND, STATE, EVENTTYPE, EVENT>
installTopology() {
return new StateMachineFactory<OPERAND, STATE, EVENTTYPE, EVENT>(this, true);
} /**
* Effect a transition due to the effecting stimulus.
* @param state current state
* @param eventType trigger to initiate the transition
* @param cause causal eventType context
* @return transitioned state
*/
private STATE doTransition
(OPERAND operand, STATE oldState, EVENTTYPE eventType, EVENT event)
throws InvalidStateTransitonException {
// We can assume that stateMachineTable is non-null because we call
// maybeMakeStateMachineTable() when we build an InnerStateMachine ,
// and this code only gets called from inside a working InnerStateMachine .
Map<EVENTTYPE, Transition<OPERAND, STATE, EVENTTYPE, EVENT>> transitionMap
= stateMachineTable.get(oldState);
if (transitionMap != null) {
Transition<OPERAND, STATE, EVENTTYPE, EVENT> transition
= transitionMap.get(eventType);
if (transition != null) {
return transition.doTransition(operand, oldState, event, eventType);
}
}
throw new InvalidStateTransitonException(oldState, eventType);
} private synchronized void maybeMakeStateMachineTable() {
if (stateMachineTable == null) {
makeStateMachineTable();
}
} private void makeStateMachineTable() {
Stack<ApplicableTransition<OPERAND, STATE, EVENTTYPE, EVENT>> stack =
new Stack<ApplicableTransition<OPERAND, STATE, EVENTTYPE, EVENT>>(); Map<STATE, Map<EVENTTYPE, Transition<OPERAND, STATE, EVENTTYPE, EVENT>>>
prototype = new HashMap<STATE, Map<EVENTTYPE, Transition<OPERAND, STATE, EVENTTYPE, EVENT>>>(); prototype.put(defaultInitialState, null); // I use EnumMap here because it'll be faster and denser. I would
// expect most of the states to have at least one transition.
stateMachineTable
= new EnumMap<STATE, Map<EVENTTYPE,
Transition<OPERAND, STATE, EVENTTYPE, EVENT>>>(prototype); for (TransitionsListNode cursor = transitionsListNode;
cursor != null;
cursor = cursor.next) {
stack.push(cursor.transition);
} while (!stack.isEmpty()) {
stack.pop().apply(this);
}
} private interface Transition<OPERAND, STATE extends Enum<STATE>,
EVENTTYPE extends Enum<EVENTTYPE>, EVENT> {
STATE doTransition(OPERAND operand, STATE oldState,
EVENT event, EVENTTYPE eventType);
} private class SingleInternalArc
implements Transition<OPERAND, STATE, EVENTTYPE, EVENT> { private STATE postState;
private SingleArcTransition<OPERAND, EVENT> hook; // transition hook SingleInternalArc(STATE postState,
SingleArcTransition<OPERAND, EVENT> hook) {
this.postState = postState;
this.hook = hook;
} @Override
public STATE doTransition(OPERAND operand, STATE oldState,
EVENT event, EVENTTYPE eventType) {
if (hook != null) {
hook.transition(operand, event);
}
return postState;
}
} private class MultipleInternalArc
implements Transition<OPERAND, STATE, EVENTTYPE, EVENT>{ // Fields
private Set<STATE> validPostStates;
private MultipleArcTransition<OPERAND, EVENT, STATE> hook; // transition hook MultipleInternalArc(Set<STATE> postStates,
MultipleArcTransition<OPERAND, EVENT, STATE> hook) {
this.validPostStates = postStates;
this.hook = hook;
} @Override
public STATE doTransition(OPERAND operand, STATE oldState,
EVENT event, EVENTTYPE eventType)
throws InvalidStateTransitonException {
STATE postState = hook.transition(operand, event); if (!validPostStates.contains(postState)) {
throw new InvalidStateTransitonException(oldState, eventType);
}
return postState;
}
} /*
* @return a {@link StateMachine} that starts in
* {@code initialState} and whose {@link Transition} s are
* applied to {@code operand} .
*
* This is part of the API.
*
* @param operand the object upon which the returned
* {@link StateMachine} will operate.
* @param initialState the state in which the returned
* {@link StateMachine} will start.
*
*/
public StateMachine<STATE, EVENTTYPE, EVENT>
make(OPERAND operand, STATE initialState) {
return new InternalStateMachine(operand, initialState);
} /*
* @return a {@link StateMachine} that starts in the default initial
* state and whose {@link Transition} s are applied to
* {@code operand} .
*
* This is part of the API.
*
* @param operand the object upon which the returned
* {@link StateMachine} will operate.
*
*/
public StateMachine<STATE, EVENTTYPE, EVENT> make(OPERAND operand) {
return new InternalStateMachine(operand, defaultInitialState);
} private class InternalStateMachine
implements StateMachine<STATE, EVENTTYPE, EVENT> {
private final OPERAND operand;
private STATE currentState; InternalStateMachine(OPERAND operand, STATE initialState) {
this.operand = operand;
this.currentState = initialState;
if (!optimized) {
maybeMakeStateMachineTable();
}
} @Override
public synchronized STATE getCurrentState() {
return currentState;
} @Override
public synchronized STATE doTransition(EVENTTYPE eventType, EVENT event)
throws InvalidStateTransitonException {
currentState = StateMachineFactory.this.doTransition
(operand, currentState, eventType, event);
return currentState;
}
} /**
* Generate a graph represents the state graph of this StateMachine
* @param name graph name
* @return Graph object generated
*/
@SuppressWarnings("rawtypes")
public Graph generateStateGraph(String name) {
maybeMakeStateMachineTable();
Graph g = new Graph(name);
for (STATE startState : stateMachineTable.keySet()) {
Map<EVENTTYPE, Transition<OPERAND, STATE, EVENTTYPE, EVENT>> transitions
= stateMachineTable.get(startState);
for (Entry<EVENTTYPE, Transition<OPERAND, STATE, EVENTTYPE, EVENT>> entry :
transitions.entrySet()) {
Transition<OPERAND, STATE, EVENTTYPE, EVENT> transition = entry.getValue();
if (transition instanceof StateMachineFactory.SingleInternalArc) {
StateMachineFactory.SingleInternalArc sa
= (StateMachineFactory.SingleInternalArc) transition;
Graph.Node fromNode = g.getNode(startState.toString());
Graph.Node toNode = g.getNode(sa.postState.toString());
fromNode.addEdge(toNode, entry.getKey().toString());
} else if (transition instanceof StateMachineFactory.MultipleInternalArc) {
StateMachineFactory.MultipleInternalArc ma
= (StateMachineFactory.MultipleInternalArc) transition;
Iterator iter = ma.validPostStates.iterator();
while (iter.hasNext()) {
Graph.Node fromNode = g.getNode(startState.toString());
Graph.Node toNode = g.getNode(iter.next().toString());
fromNode.addEdge(toNode, entry.getKey().toString());
}
}
}
}
return g;
}
}
StateMachineFactory.java
StateMachineFactory.java 在 hadoop-2.7.3-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/StateMachineFactory.java
它调用的是StateMachineFactory的addTransition()函数,
//StateMachineFactory.java
public StateMachineFactory
<OPERAND, STATE, EVENTTYPE, EVENT>
addTransition(STATE preState, STATE postState,
EVENTTYPE eventType,
SingleArcTransition<OPERAND, EVENT> hook){
return new StateMachineFactory<OPERAND, STATE, EVENTTYPE, EVENT>
(this, new ApplicableSingleOrMultipleTransition<OPERAND, STATE, EVENTTYPE, EVENT>
(preState, eventType, new SingleInternalArc(postState, hook)));
}
addTransition()方法内部会调用 new StateMachineFactory<OPERAND, STATE, EVENTTYPE, EVENT> (this, new ApplicableSingleOrMultipleTransition<OPERAND, STATE, EVENTTYPE, EVENT> (preState, eventType, new SingleInternalArc(postState, hook))), 其中初始化的内部类SingleInternalArc中,保存了状态转换之后的值postState,此时的值就是RMAppState.NEW_SAVING。 也保存了回调函数hook=RMAppNewlySavingTransition。
之后就该返回到RMAppImpl类的handle函数中,调用this.stateMachine.doTransition(event.getType(), event), 进入到StateMachineFactory类中内部接口类Transition的doTransition方法, 再调用StateMachineFactory类的doTransition方法。
到 return transition.doTransition(operand, oldState, event, eventType), 其中oldState=RMAppState.NEW, transition=org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc。 进入内部类SingleInternalArc的函数doTransition中。该方法内部
会调用此处调用的hook.transition(operand, event), 并且最后返回值return postState;就是上面保存的RMAppState.NEW_SAVING,到上层保存在变量currentState里,即返回到RMAppImpl类的handle函数中,这个变量在RMAppImpl中被get函数getState()获取。
即if (oldState != getState()) { LOG.info(appID + " State change from " + oldState + " to " + getState()); }, 打印出来状态由 NEW 转变成 NEW_SAVING 。 例如:
2017-02-20 22:59:07,702 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1487602693580_0001 State change from NEW to NEW_SAVING
此时就完成了一次状态机的状态转变。
再回到hook变量,hook初始化的就是上面状态机绑定的回调类RMAppNewlySavingTransition。再次回调到了RMAppImpl类中的内部类RMAppNewlySavingTransition的transition函数里, 该函数内部会调用
app.rmContext.getStateStore().storeNewApplication(app), 其中app.rmContext=RMContextImpl , app.rmContext.getStateStore()=RMStateStore , 进入RMStateStore类的storeNewApplication函数里,
@Private
@Unstable
/**
* Base class to implement storage of ResourceManager state.
* Takes care of asynchronous notifications and interfacing with YARN objects.
* Real store implementations need to derive from it and implement blocking
* store and load methods to actually store and load the state.
*/
public abstract class RMStateStore extends AbstractService { // constants for RM App state and RMDTSecretManagerState.
protected static final String RM_APP_ROOT = "RMAppRoot";
protected static final String RM_DT_SECRET_MANAGER_ROOT = "RMDTSecretManagerRoot";
protected static final String DELEGATION_KEY_PREFIX = "DelegationKey_";
protected static final String DELEGATION_TOKEN_PREFIX = "RMDelegationToken_";
protected static final String DELEGATION_TOKEN_SEQUENCE_NUMBER_PREFIX =
"RMDTSequenceNumber_";
protected static final String AMRMTOKEN_SECRET_MANAGER_ROOT =
"AMRMTokenSecretManagerRoot";
protected static final String VERSION_NODE = "RMVersionNode";
protected static final String EPOCH_NODE = "EpochNode";
private ResourceManager resourceManager;
private final ReadLock readLock;
private final WriteLock writeLock; public static final Log LOG = LogFactory.getLog(RMStateStore.class); /**
* The enum defines state of RMStateStore.
*/
public enum RMStateStoreState {
ACTIVE,
FENCED
}; private static final StateMachineFactory<RMStateStore,
RMStateStoreState,
RMStateStoreEventType,
RMStateStoreEvent>
stateMachineFactory = new StateMachineFactory<RMStateStore,
RMStateStoreState,
RMStateStoreEventType,
RMStateStoreEvent>(
RMStateStoreState.ACTIVE)
.addTransition(RMStateStoreState.ACTIVE,
EnumSet.of(RMStateStoreState.ACTIVE, RMStateStoreState.FENCED),
RMStateStoreEventType.STORE_APP, new StoreAppTransition())
.addTransition(RMStateStoreState.ACTIVE,
EnumSet.of(RMStateStoreState.ACTIVE, RMStateStoreState.FENCED),
RMStateStoreEventType.UPDATE_APP, new UpdateAppTransition())
.addTransition(RMStateStoreState.ACTIVE,
EnumSet.of(RMStateStoreState.ACTIVE, RMStateStoreState.FENCED),
RMStateStoreEventType.REMOVE_APP, new RemoveAppTransition())
.addTransition(RMStateStoreState.ACTIVE,
EnumSet.of(RMStateStoreState.ACTIVE, RMStateStoreState.FENCED),
RMStateStoreEventType.STORE_APP_ATTEMPT,
new StoreAppAttemptTransition())
.addTransition(RMStateStoreState.ACTIVE,
EnumSet.of(RMStateStoreState.ACTIVE, RMStateStoreState.FENCED),
RMStateStoreEventType.UPDATE_APP_ATTEMPT,
new UpdateAppAttemptTransition())
.addTransition(RMStateStoreState.ACTIVE,
EnumSet.of(RMStateStoreState.ACTIVE, RMStateStoreState.FENCED),
RMStateStoreEventType.STORE_MASTERKEY,
new StoreRMDTMasterKeyTransition())
.addTransition(RMStateStoreState.ACTIVE,
EnumSet.of(RMStateStoreState.ACTIVE, RMStateStoreState.FENCED),
RMStateStoreEventType.REMOVE_MASTERKEY,
new RemoveRMDTMasterKeyTransition())
.addTransition(RMStateStoreState.ACTIVE,
EnumSet.of(RMStateStoreState.ACTIVE, RMStateStoreState.FENCED),
RMStateStoreEventType.STORE_DELEGATION_TOKEN,
new StoreRMDTTransition())
.addTransition(RMStateStoreState.ACTIVE,
EnumSet.of(RMStateStoreState.ACTIVE, RMStateStoreState.FENCED),
RMStateStoreEventType.REMOVE_DELEGATION_TOKEN,
new RemoveRMDTTransition())
.addTransition(RMStateStoreState.ACTIVE,
EnumSet.of(RMStateStoreState.ACTIVE, RMStateStoreState.FENCED),
RMStateStoreEventType.UPDATE_DELEGATION_TOKEN,
new UpdateRMDTTransition())
.addTransition(RMStateStoreState.ACTIVE,
EnumSet.of(RMStateStoreState.ACTIVE, RMStateStoreState.FENCED),
RMStateStoreEventType.UPDATE_AMRM_TOKEN,
new StoreOrUpdateAMRMTokenTransition())
.addTransition(RMStateStoreState.ACTIVE, RMStateStoreState.FENCED,
RMStateStoreEventType.FENCED)
.addTransition(RMStateStoreState.FENCED, RMStateStoreState.FENCED,
EnumSet.of(
RMStateStoreEventType.STORE_APP,
RMStateStoreEventType.UPDATE_APP,
RMStateStoreEventType.REMOVE_APP,
RMStateStoreEventType.STORE_APP_ATTEMPT,
RMStateStoreEventType.UPDATE_APP_ATTEMPT,
RMStateStoreEventType.FENCED,
RMStateStoreEventType.STORE_MASTERKEY,
RMStateStoreEventType.REMOVE_MASTERKEY,
RMStateStoreEventType.STORE_DELEGATION_TOKEN,
RMStateStoreEventType.REMOVE_DELEGATION_TOKEN,
RMStateStoreEventType.UPDATE_DELEGATION_TOKEN,
RMStateStoreEventType.UPDATE_AMRM_TOKEN)); private final StateMachine<RMStateStoreState,
RMStateStoreEventType,
RMStateStoreEvent> stateMachine; private static class StoreAppTransition
implements MultipleArcTransition<RMStateStore, RMStateStoreEvent,
RMStateStoreState> {
@Override
public RMStateStoreState transition(RMStateStore store,
RMStateStoreEvent event) {
if (!(event instanceof RMStateStoreAppEvent)) {
// should never happen
LOG.error("Illegal event type: " + event.getClass());
return RMStateStoreState.ACTIVE;
}
boolean isFenced = false;
ApplicationStateData appState =
((RMStateStoreAppEvent) event).getAppState();
ApplicationId appId =
appState.getApplicationSubmissionContext().getApplicationId();
LOG.info("Storing info for app: " + appId);
try {
store.storeApplicationStateInternal(appId, appState);
store.notifyApplication(new RMAppEvent(appId,
RMAppEventType.APP_NEW_SAVED));
} catch (Exception e) {
LOG.error("Error storing app: " + appId, e);
isFenced = store.notifyStoreOperationFailedInternal(e);
}
return finalState(isFenced);
};
} private static class UpdateAppTransition implements
MultipleArcTransition<RMStateStore, RMStateStoreEvent,
RMStateStoreState> {
@Override
public RMStateStoreState transition(RMStateStore store,
RMStateStoreEvent event) {
if (!(event instanceof RMStateUpdateAppEvent)) {
// should never happen
LOG.error("Illegal event type: " + event.getClass());
return RMStateStoreState.ACTIVE;
}
boolean isFenced = false;
ApplicationStateData appState =
((RMStateUpdateAppEvent) event).getAppState();
ApplicationId appId =
appState.getApplicationSubmissionContext().getApplicationId();
LOG.info("Updating info for app: " + appId);
try {
store.updateApplicationStateInternal(appId, appState);
store.notifyApplication(new RMAppEvent(appId,
RMAppEventType.APP_UPDATE_SAVED));
} catch (Exception e) {
LOG.error("Error updating app: " + appId, e);
isFenced = store.notifyStoreOperationFailedInternal(e);
}
return finalState(isFenced);
};
} private static class RemoveAppTransition implements
MultipleArcTransition<RMStateStore, RMStateStoreEvent,
RMStateStoreState> {
@Override
public RMStateStoreState transition(RMStateStore store,
RMStateStoreEvent event) {
if (!(event instanceof RMStateStoreRemoveAppEvent)) {
// should never happen
LOG.error("Illegal event type: " + event.getClass());
return RMStateStoreState.ACTIVE;
}
boolean isFenced = false;
ApplicationStateData appState =
((RMStateStoreRemoveAppEvent) event).getAppState();
ApplicationId appId =
appState.getApplicationSubmissionContext().getApplicationId();
LOG.info("Removing info for app: " + appId);
try {
store.removeApplicationStateInternal(appState);
} catch (Exception e) {
LOG.error("Error removing app: " + appId, e);
isFenced = store.notifyStoreOperationFailedInternal(e);
}
return finalState(isFenced);
};
} private static class StoreAppAttemptTransition implements
MultipleArcTransition<RMStateStore, RMStateStoreEvent,
RMStateStoreState> {
@Override
public RMStateStoreState transition(RMStateStore store,
RMStateStoreEvent event) {
if (!(event instanceof RMStateStoreAppAttemptEvent)) {
// should never happen
LOG.error("Illegal event type: " + event.getClass());
return RMStateStoreState.ACTIVE;
}
boolean isFenced = false;
ApplicationAttemptStateData attemptState =
((RMStateStoreAppAttemptEvent) event).getAppAttemptState();
try {
if (LOG.isDebugEnabled()) {
LOG.debug("Storing info for attempt: " + attemptState.getAttemptId());
}
store.storeApplicationAttemptStateInternal(attemptState.getAttemptId(),
attemptState);
store.notifyApplicationAttempt(new RMAppAttemptEvent
(attemptState.getAttemptId(),
RMAppAttemptEventType.ATTEMPT_NEW_SAVED));
} catch (Exception e) {
LOG.error("Error storing appAttempt: " + attemptState.getAttemptId(), e);
isFenced = store.notifyStoreOperationFailedInternal(e);
}
return finalState(isFenced);
};
} private static class UpdateAppAttemptTransition implements
MultipleArcTransition<RMStateStore, RMStateStoreEvent,
RMStateStoreState> {
@Override
public RMStateStoreState transition(RMStateStore store,
RMStateStoreEvent event) {
if (!(event instanceof RMStateUpdateAppAttemptEvent)) {
// should never happen
LOG.error("Illegal event type: " + event.getClass());
return RMStateStoreState.ACTIVE;
}
boolean isFenced = false;
ApplicationAttemptStateData attemptState =
((RMStateUpdateAppAttemptEvent) event).getAppAttemptState();
try {
if (LOG.isDebugEnabled()) {
LOG.debug("Updating info for attempt: " + attemptState.getAttemptId());
}
store.updateApplicationAttemptStateInternal(attemptState.getAttemptId(),
attemptState);
store.notifyApplicationAttempt(new RMAppAttemptEvent
(attemptState.getAttemptId(),
RMAppAttemptEventType.ATTEMPT_UPDATE_SAVED));
} catch (Exception e) {
LOG.error("Error updating appAttempt: " + attemptState.getAttemptId(), e);
isFenced = store.notifyStoreOperationFailedInternal(e);
}
return finalState(isFenced);
};
} private static class StoreRMDTTransition implements
MultipleArcTransition<RMStateStore, RMStateStoreEvent,
RMStateStoreState> {
@Override
public RMStateStoreState transition(RMStateStore store,
RMStateStoreEvent event) {
if (!(event instanceof RMStateStoreRMDTEvent)) {
// should never happen
LOG.error("Illegal event type: " + event.getClass());
return RMStateStoreState.ACTIVE;
}
boolean isFenced = false;
RMStateStoreRMDTEvent dtEvent = (RMStateStoreRMDTEvent) event;
try {
LOG.info("Storing RMDelegationToken and SequenceNumber");
store.storeRMDelegationTokenState(
dtEvent.getRmDTIdentifier(), dtEvent.getRenewDate());
} catch (Exception e) {
LOG.error("Error While Storing RMDelegationToken and SequenceNumber ",
e);
isFenced = store.notifyStoreOperationFailedInternal(e);
}
return finalState(isFenced);
}
} private static class RemoveRMDTTransition implements
MultipleArcTransition<RMStateStore, RMStateStoreEvent,
RMStateStoreState> {
@Override
public RMStateStoreState transition(RMStateStore store,
RMStateStoreEvent event) {
if (!(event instanceof RMStateStoreRMDTEvent)) {
// should never happen
LOG.error("Illegal event type: " + event.getClass());
return RMStateStoreState.ACTIVE;
}
boolean isFenced = false;
RMStateStoreRMDTEvent dtEvent = (RMStateStoreRMDTEvent) event;
try {
LOG.info("Removing RMDelegationToken and SequenceNumber");
store.removeRMDelegationTokenState(dtEvent.getRmDTIdentifier());
} catch (Exception e) {
LOG.error("Error While Removing RMDelegationToken and SequenceNumber ",
e);
isFenced = store.notifyStoreOperationFailedInternal(e);
}
return finalState(isFenced);
}
} private static class UpdateRMDTTransition implements
MultipleArcTransition<RMStateStore, RMStateStoreEvent,
RMStateStoreState> {
@Override
public RMStateStoreState transition(RMStateStore store,
RMStateStoreEvent event) {
if (!(event instanceof RMStateStoreRMDTEvent)) {
// should never happen
LOG.error("Illegal event type: " + event.getClass());
return RMStateStoreState.ACTIVE;
}
boolean isFenced = false;
RMStateStoreRMDTEvent dtEvent = (RMStateStoreRMDTEvent) event;
try {
LOG.info("Updating RMDelegationToken and SequenceNumber");
store.updateRMDelegationTokenState(
dtEvent.getRmDTIdentifier(), dtEvent.getRenewDate());
} catch (Exception e) {
LOG.error("Error While Updating RMDelegationToken and SequenceNumber ",
e);
isFenced = store.notifyStoreOperationFailedInternal(e);
}
return finalState(isFenced);
}
} private static class StoreRMDTMasterKeyTransition implements
MultipleArcTransition<RMStateStore, RMStateStoreEvent,
RMStateStoreState> {
@Override
public RMStateStoreState transition(RMStateStore store,
RMStateStoreEvent event) {
if (!(event instanceof RMStateStoreRMDTMasterKeyEvent)) {
// should never happen
LOG.error("Illegal event type: " + event.getClass());
return RMStateStoreState.ACTIVE;
}
boolean isFenced = false;
RMStateStoreRMDTMasterKeyEvent dtEvent =
(RMStateStoreRMDTMasterKeyEvent) event;
try {
LOG.info("Storing RMDTMasterKey.");
store.storeRMDTMasterKeyState(dtEvent.getDelegationKey());
} catch (Exception e) {
LOG.error("Error While Storing RMDTMasterKey.", e);
isFenced = store.notifyStoreOperationFailedInternal(e);
}
return finalState(isFenced);
}
} private static class RemoveRMDTMasterKeyTransition implements
MultipleArcTransition<RMStateStore, RMStateStoreEvent,
RMStateStoreState> {
@Override
public RMStateStoreState transition(RMStateStore store,
RMStateStoreEvent event) {
if (!(event instanceof RMStateStoreRMDTMasterKeyEvent)) {
// should never happen
LOG.error("Illegal event type: " + event.getClass());
return RMStateStoreState.ACTIVE;
}
boolean isFenced = false;
RMStateStoreRMDTMasterKeyEvent dtEvent =
(RMStateStoreRMDTMasterKeyEvent) event;
try {
LOG.info("Removing RMDTMasterKey.");
store.removeRMDTMasterKeyState(dtEvent.getDelegationKey());
} catch (Exception e) {
LOG.error("Error While Removing RMDTMasterKey.", e);
isFenced = store.notifyStoreOperationFailedInternal(e);
}
return finalState(isFenced);
}
} private static class StoreOrUpdateAMRMTokenTransition implements
MultipleArcTransition<RMStateStore, RMStateStoreEvent,
RMStateStoreState> {
@Override
public RMStateStoreState transition(RMStateStore store,
RMStateStoreEvent event) {
if (!(event instanceof RMStateStoreAMRMTokenEvent)) {
// should never happen
LOG.error("Illegal event type: " + event.getClass());
return RMStateStoreState.ACTIVE;
}
RMStateStoreAMRMTokenEvent amrmEvent = (RMStateStoreAMRMTokenEvent) event;
boolean isFenced = false;
try {
LOG.info("Updating AMRMToken");
store.storeOrUpdateAMRMTokenSecretManagerState(
amrmEvent.getAmrmTokenSecretManagerState(), amrmEvent.isUpdate());
} catch (Exception e) {
LOG.error("Error storing info for AMRMTokenSecretManager", e);
isFenced = store.notifyStoreOperationFailedInternal(e);
}
return finalState(isFenced);
}
} private static RMStateStoreState finalState(boolean isFenced) {
return isFenced ? RMStateStoreState.FENCED : RMStateStoreState.ACTIVE;
} public RMStateStore() {
super(RMStateStore.class.getName());
ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
this.readLock = lock.readLock();
this.writeLock = lock.writeLock();
stateMachine = stateMachineFactory.make(this);
} public static class RMDTSecretManagerState {
// DTIdentifier -> renewDate
Map<RMDelegationTokenIdentifier, Long> delegationTokenState =
new HashMap<RMDelegationTokenIdentifier, Long>(); Set<DelegationKey> masterKeyState =
new HashSet<DelegationKey>(); int dtSequenceNumber = 0; public Map<RMDelegationTokenIdentifier, Long> getTokenState() {
return delegationTokenState;
} public Set<DelegationKey> getMasterKeyState() {
return masterKeyState;
} public int getDTSequenceNumber() {
return dtSequenceNumber;
}
} /**
* State of the ResourceManager
*/
public static class RMState {
Map<ApplicationId, ApplicationStateData> appState =
new TreeMap<ApplicationId, ApplicationStateData>(); RMDTSecretManagerState rmSecretManagerState = new RMDTSecretManagerState(); AMRMTokenSecretManagerState amrmTokenSecretManagerState = null; public Map<ApplicationId, ApplicationStateData> getApplicationState() {
return appState;
} public RMDTSecretManagerState getRMDTSecretManagerState() {
return rmSecretManagerState;
} public AMRMTokenSecretManagerState getAMRMTokenSecretManagerState() {
return amrmTokenSecretManagerState;
}
} private Dispatcher rmDispatcher; /**
* Dispatcher used to send state operation completion events to
* ResourceManager services
*/
public void setRMDispatcher(Dispatcher dispatcher) {
this.rmDispatcher = dispatcher;
} AsyncDispatcher dispatcher; @Override
protected void serviceInit(Configuration conf) throws Exception{
// create async handler
dispatcher = new AsyncDispatcher();
dispatcher.init(conf);
dispatcher.register(RMStateStoreEventType.class,
new ForwardingEventHandler());
dispatcher.setDrainEventsOnStop();
initInternal(conf);
} @Override
protected void serviceStart() throws Exception {
dispatcher.start();
startInternal();
} /**
* Derived classes initialize themselves using this method.
*/
protected abstract void initInternal(Configuration conf) throws Exception; /**
* Derived classes start themselves using this method.
* The base class is started and the event dispatcher is ready to use at
* this point
*/
protected abstract void startInternal() throws Exception; @Override
protected void serviceStop() throws Exception {
dispatcher.stop();
closeInternal();
} /**
* Derived classes close themselves using this method.
* The base class will be closed and the event dispatcher will be shutdown
* after this
*/
protected abstract void closeInternal() throws Exception; /**
* 1) Versioning scheme: major.minor. For e.g. 1.0, 1.1, 1.2...1.25, 2.0 etc.
* 2) Any incompatible change of state-store is a major upgrade, and any
* compatible change of state-store is a minor upgrade.
* 3) If theres's no version, treat it as CURRENT_VERSION_INFO.
* 4) Within a minor upgrade, say 1.1 to 1.2:
* overwrite the version info and proceed as normal.
* 5) Within a major upgrade, say 1.2 to 2.0:
* throw exception and indicate user to use a separate upgrade tool to
* upgrade RM state.
*/
public void checkVersion() throws Exception {
Version loadedVersion = loadVersion();
LOG.info("Loaded RM state version info " + loadedVersion);
if (loadedVersion != null && loadedVersion.equals(getCurrentVersion())) {
return;
}
// if there is no version info, treat it as CURRENT_VERSION_INFO;
if (loadedVersion == null) {
loadedVersion = getCurrentVersion();
}
if (loadedVersion.isCompatibleTo(getCurrentVersion())) {
LOG.info("Storing RM state version info " + getCurrentVersion());
storeVersion();
} else {
throw new RMStateVersionIncompatibleException(
"Expecting RM state version " + getCurrentVersion()
+ ", but loading version " + loadedVersion);
}
} /**
* Derived class use this method to load the version information from state
* store.
*/
protected abstract Version loadVersion() throws Exception; /**
* Derived class use this method to store the version information.
*/
protected abstract void storeVersion() throws Exception; /**
* Get the current version of the underlying state store.
*/
protected abstract Version getCurrentVersion(); /**
* Get the current epoch of RM and increment the value.
*/
public abstract long getAndIncrementEpoch() throws Exception; /**
* Blocking API
* The derived class must recover state from the store and return a new
* RMState object populated with that state
* This must not be called on the dispatcher thread
*/
public abstract RMState loadState() throws Exception; /**
* Non-Blocking API
* ResourceManager services use this to store the application's state
* This does not block the dispatcher threads
* RMAppStoredEvent will be sent on completion to notify the RMApp
*/
@SuppressWarnings("unchecked")
public void storeNewApplication(RMApp app) {
ApplicationSubmissionContext context = app
.getApplicationSubmissionContext();
assert context instanceof ApplicationSubmissionContextPBImpl;
ApplicationStateData appState =
ApplicationStateData.newInstance(
app.getSubmitTime(), app.getStartTime(), context, app.getUser());
dispatcher.getEventHandler().handle(new RMStateStoreAppEvent(appState));
} @SuppressWarnings("unchecked")
public void updateApplicationState(
ApplicationStateData appState) {
dispatcher.getEventHandler().handle(new RMStateUpdateAppEvent(appState));
} public void updateFencedState() {
handleStoreEvent(new RMStateStoreEvent(RMStateStoreEventType.FENCED));
} /**
* Blocking API
* Derived classes must implement this method to store the state of an
* application.
*/
protected abstract void storeApplicationStateInternal(ApplicationId appId,
ApplicationStateData appStateData) throws Exception; protected abstract void updateApplicationStateInternal(ApplicationId appId,
ApplicationStateData appStateData) throws Exception; @SuppressWarnings("unchecked")
/**
* Non-blocking API
* ResourceManager services call this to store state on an application attempt
* This does not block the dispatcher threads
* RMAppAttemptStoredEvent will be sent on completion to notify the RMAppAttempt
*/
public void storeNewApplicationAttempt(RMAppAttempt appAttempt) {
Credentials credentials = getCredentialsFromAppAttempt(appAttempt); AggregateAppResourceUsage resUsage =
appAttempt.getRMAppAttemptMetrics().getAggregateAppResourceUsage();
ApplicationAttemptStateData attemptState =
ApplicationAttemptStateData.newInstance(
appAttempt.getAppAttemptId(),
appAttempt.getMasterContainer(),
credentials, appAttempt.getStartTime(),
resUsage.getMemorySeconds(),
resUsage.getVcoreSeconds()); dispatcher.getEventHandler().handle(
new RMStateStoreAppAttemptEvent(attemptState));
} @SuppressWarnings("unchecked")
public void updateApplicationAttemptState(
ApplicationAttemptStateData attemptState) {
dispatcher.getEventHandler().handle(
new RMStateUpdateAppAttemptEvent(attemptState));
} /**
* Blocking API
* Derived classes must implement this method to store the state of an
* application attempt
*/
protected abstract void storeApplicationAttemptStateInternal(
ApplicationAttemptId attemptId,
ApplicationAttemptStateData attemptStateData) throws Exception; protected abstract void updateApplicationAttemptStateInternal(
ApplicationAttemptId attemptId,
ApplicationAttemptStateData attemptStateData) throws Exception; /**
* RMDTSecretManager call this to store the state of a delegation token
* and sequence number
*/
public void storeRMDelegationToken(
RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate) {
handleStoreEvent(new RMStateStoreRMDTEvent(rmDTIdentifier, renewDate,
RMStateStoreEventType.STORE_DELEGATION_TOKEN));
} /**
* Blocking API
* Derived classes must implement this method to store the state of
* RMDelegationToken and sequence number
*/
protected abstract void storeRMDelegationTokenState(
RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate)
throws Exception; /**
* RMDTSecretManager call this to remove the state of a delegation token
*/
public void removeRMDelegationToken(
RMDelegationTokenIdentifier rmDTIdentifier) {
handleStoreEvent(new RMStateStoreRMDTEvent(rmDTIdentifier, null,
RMStateStoreEventType.REMOVE_DELEGATION_TOKEN));
} /**
* Blocking API
* Derived classes must implement this method to remove the state of RMDelegationToken
*/
protected abstract void removeRMDelegationTokenState(
RMDelegationTokenIdentifier rmDTIdentifier) throws Exception; /**
* RMDTSecretManager call this to update the state of a delegation token
* and sequence number
*/
public void updateRMDelegationToken(
RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate) {
handleStoreEvent(new RMStateStoreRMDTEvent(rmDTIdentifier, renewDate,
RMStateStoreEventType.UPDATE_DELEGATION_TOKEN));
} /**
* Blocking API
* Derived classes must implement this method to update the state of
* RMDelegationToken and sequence number
*/
protected abstract void updateRMDelegationTokenState(
RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate)
throws Exception; /**
* RMDTSecretManager call this to store the state of a master key
*/
public void storeRMDTMasterKey(DelegationKey delegationKey) {
handleStoreEvent(new RMStateStoreRMDTMasterKeyEvent(delegationKey,
RMStateStoreEventType.STORE_MASTERKEY));
} /**
* Blocking API
* Derived classes must implement this method to store the state of
* DelegationToken Master Key
*/
protected abstract void storeRMDTMasterKeyState(DelegationKey delegationKey)
throws Exception; /**
* RMDTSecretManager call this to remove the state of a master key
*/
public void removeRMDTMasterKey(DelegationKey delegationKey) {
handleStoreEvent(new RMStateStoreRMDTMasterKeyEvent(delegationKey,
RMStateStoreEventType.REMOVE_MASTERKEY));
} /**
* Blocking API
* Derived classes must implement this method to remove the state of
* DelegationToken Master Key
*/
protected abstract void removeRMDTMasterKeyState(DelegationKey delegationKey)
throws Exception; /**
* Blocking API Derived classes must implement this method to store or update
* the state of AMRMToken Master Key
*/
protected abstract void storeOrUpdateAMRMTokenSecretManagerState(
AMRMTokenSecretManagerState amrmTokenSecretManagerState, boolean isUpdate)
throws Exception; /**
* Store or Update state of AMRMToken Master Key
*/
public void storeOrUpdateAMRMTokenSecretManager(
AMRMTokenSecretManagerState amrmTokenSecretManagerState, boolean isUpdate) {
handleStoreEvent(new RMStateStoreAMRMTokenEvent(
amrmTokenSecretManagerState, isUpdate,
RMStateStoreEventType.UPDATE_AMRM_TOKEN));
} /**
* Non-blocking API
* ResourceManager services call this to remove an application from the state
* store
* This does not block the dispatcher threads
* There is no notification of completion for this operation.
*/
@SuppressWarnings("unchecked")
public void removeApplication(RMApp app) {
ApplicationStateData appState =
ApplicationStateData.newInstance(
app.getSubmitTime(), app.getStartTime(),
app.getApplicationSubmissionContext(), app.getUser());
for(RMAppAttempt appAttempt : app.getAppAttempts().values()) {
appState.attempts.put(appAttempt.getAppAttemptId(), null);
} dispatcher.getEventHandler().handle(new RMStateStoreRemoveAppEvent(appState));
} /**
* Blocking API
* Derived classes must implement this method to remove the state of an
* application and its attempts
*/
protected abstract void removeApplicationStateInternal(
ApplicationStateData appState) throws Exception; // TODO: This should eventually become cluster-Id + "AM_RM_TOKEN_SERVICE". See
// YARN-1779
public static final Text AM_RM_TOKEN_SERVICE = new Text(
"AM_RM_TOKEN_SERVICE"); public static final Text AM_CLIENT_TOKEN_MASTER_KEY_NAME =
new Text("YARN_CLIENT_TOKEN_MASTER_KEY"); public Credentials getCredentialsFromAppAttempt(RMAppAttempt appAttempt) {
Credentials credentials = new Credentials(); SecretKey clientTokenMasterKey =
appAttempt.getClientTokenMasterKey();
if(clientTokenMasterKey != null){
credentials.addSecretKey(AM_CLIENT_TOKEN_MASTER_KEY_NAME,
clientTokenMasterKey.getEncoded());
}
return credentials;
} @VisibleForTesting
protected boolean isFencedState() {
return (RMStateStoreState.FENCED == getRMStateStoreState());
} // Dispatcher related code
protected void handleStoreEvent(RMStateStoreEvent event) {
this.writeLock.lock();
try { if (LOG.isDebugEnabled()) {
LOG.debug("Processing event of type " + event.getType());
} final RMStateStoreState oldState = getRMStateStoreState(); this.stateMachine.doTransition(event.getType(), event); if (oldState != getRMStateStoreState()) {
LOG.info("RMStateStore state change from " + oldState + " to "
+ getRMStateStoreState());
} } catch (InvalidStateTransitonException e) {
LOG.error("Can't handle this event at current state", e);
} finally {
this.writeLock.unlock();
}
} /**
* This method is called to notify the ResourceManager that the store
* operation has failed.
* @param failureCause the exception due to which the operation failed
*/
protected void notifyStoreOperationFailed(Exception failureCause) {
if (isFencedState()) {
return;
}
if (notifyStoreOperationFailedInternal(failureCause)) {
updateFencedState();
}
} @SuppressWarnings("unchecked")
private boolean notifyStoreOperationFailedInternal(
Exception failureCause) {
boolean isFenced = false;
LOG.error("State store operation failed ", failureCause);
if (HAUtil.isHAEnabled(getConfig())) {
LOG.warn("State-store fenced ! Transitioning RM to standby");
isFenced = true;
Thread standByTransitionThread =
new Thread(new StandByTransitionThread());
standByTransitionThread.setName("StandByTransitionThread Handler");
standByTransitionThread.start();
} else if (YarnConfiguration.shouldRMFailFast(getConfig())) {
LOG.fatal("Fail RM now due to state-store error!");
rmDispatcher.getEventHandler().handle(
new RMFatalEvent(RMFatalEventType.STATE_STORE_OP_FAILED,
failureCause));
} else {
LOG.warn("Skip the state-store error.");
}
return isFenced;
} @SuppressWarnings("unchecked")
/**
* This method is called to notify the application that
* new application is stored or updated in state store
* @param event App event containing the app id and event type
*/
private void notifyApplication(RMAppEvent event) {
rmDispatcher.getEventHandler().handle(event);
} @SuppressWarnings("unchecked")
/**
* This method is called to notify the application attempt
* that new attempt is stored or updated in state store
* @param event App attempt event containing the app attempt
* id and event type
*/
private void notifyApplicationAttempt(RMAppAttemptEvent event) {
rmDispatcher.getEventHandler().handle(event);
} /**
* EventHandler implementation which forward events to the FSRMStateStore
* This hides the EventHandle methods of the store from its public interface
*/
private final class ForwardingEventHandler
implements EventHandler<RMStateStoreEvent> { @Override
public void handle(RMStateStoreEvent event) {
handleStoreEvent(event);
}
} /**
* Derived classes must implement this method to delete the state store
* @throws Exception
*/
public abstract void deleteStore() throws Exception; public void setResourceManager(ResourceManager rm) {
this.resourceManager = rm;
} private class StandByTransitionThread implements Runnable {
@Override
public void run() {
LOG.info("RMStateStore has been fenced");
resourceManager.handleTransitionToStandBy();
}
} public RMStateStoreState getRMStateStoreState() {
this.readLock.lock();
try {
return this.stateMachine.getCurrentState();
} finally {
this.readLock.unlock();
}
}
}
RMStateStore.java
RMStateStore.java 在 hadoop-2.7.3-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
storeNewApplication函数内dispatcher.getEventHandler().handle(new RMStateStoreAppEvent(appState)), handle函数的参数new RMStateStoreAppEvent(appState),初始化了一个RMStateStoreEventType.STORE_APP事件。
public class RMStateStoreAppEvent extends RMStateStoreEvent { private final ApplicationStateData appState; public RMStateStoreAppEvent(ApplicationStateData appState) {
super(RMStateStoreEventType.STORE_APP);
this.appState = appState;
} public ApplicationStateData getAppState() {
return appState;
}
}
此处的dispatcher是RMStateStore类自己的变量,只在初始化时绑定了一个RMStateStoreEventType, 看RMStateStore类的serviceInit()函数, 该函数内部调用dispatcher.register(RMStateStoreEventType.class, new ForwardingEventHandler()), 调用的类是RMStateStore 类的内部类 ForwardingEventHandler, 该内部类的handle函数调用了函数handleStoreEvent(), 该函数内部调用 this.stateMachine.doTransition(event.getType(), event), 同前面一样,又会进入类StateMachineFactory中的内部类SingleInternalArc里, 不过这次的状态机工厂是RMStateStore类的内部变量,上面的状态机工厂是RMAppImpl类的,他们绑定的事件不同。 可以在RMStateStore类最开始.addTransition(RMStateStoreState.ACTIVE, EnumSet.of(RMStateStoreState.ACTIVE, RMStateStoreState.FENCED), RMStateStoreEventType.STORE_APP, new StoreAppTransition()), 看到RMStateStoreEventType.STORE_APP事件只是将状态RMStateStoreState.ACTIVE转变为 EnumSet.of(RMStateStoreState.ACTIVE, RMStateStoreState.FENCED)。 {不确定,,,主要作用就是完成RMAppImpl类当前信息的日志记录。日记记录是为了RM的重启。 详情见ResourceManager重启过程。 }
RMStateStoreEventType.STORE_APP事件绑定的类是StoreAppTransition, 我们追踪一下addTransition()函数, 如下:
//StateMachineFactory.java
public StateMachineFactory
<OPERAND, STATE, EVENTTYPE, EVENT>
addTransition(STATE preState, Set<STATE> postStates,
EVENTTYPE eventType,
MultipleArcTransition<OPERAND, EVENT, STATE> hook){
return new StateMachineFactory<OPERAND, STATE, EVENTTYPE, EVENT>
(this,
new ApplicableSingleOrMultipleTransition<OPERAND, STATE, EVENTTYPE, EVENT>
(preState, eventType, new MultipleInternalArc(postStates, hook)));
}
最后会在StateMachineFactory中的内部类MultipleInternalArc里调用hook.transition(operand, event),它是接口类MultipleArcTransition的函数。 在看一下绑定类StoreAppTransition,它是RMStateStore类的内部类, 该类实现了MultipleArcTransition类,所以最后调用的是RMStateStore类的内部类StoreAppTransition的函数transition()。
在该函数内部进一步调用store.notifyApplication(new RMAppEvent(appId, RMAppEventType.APP_NEW_SAVED)), 而notifyApplication函数内部进一步调用 rmDispatcher.getEventHandler().handle(event), 向*调度器发送了一个RMAppEvent(appId, RMAppEventType.APP_NEW_SAVED)事件, 事件类型是 RMAppEventType.APP_NEW_SAVED, 由于在ResourceManager中,将RMAppEventType类型的事件绑定到了ResourceManager类的内部类ApplicationEventDispatcher中, ApplicationEventDispatcher类的handle()函数中调用rmApp.handle(event), 最终调用的是RMAppImpl类的handle()函数, 所以和前面的RMAppEventType.START事件一样,会被RMAppImpl类处理。
RMAppImpl类内部 .addTransition(RMAppState.NEW_SAVING, RMAppState.SUBMITTED,
RMAppEventType.APP_NEW_SAVED, new AddApplicationToSchedulerTransition()),
RMAppImpl收到RMAppEventType.APP_NEW_SAVED事件后,
将自身的运行状态由NEW_SAVING转换为SUBMITTED ,调用回调类是RMAppImpl类的内部类
AddApplicationToSchedulerTransition, 该内部类的transition()函数内部会调用 app.handler.handle(
new AppAddedSchedulerEvent(app.applicationId,
app.submissionContext.getQueue(), app.user, app.submissionContext.getReservationID())),
如下所示:
//RMAppImpl.java
private static final class AddApplicationToSchedulerTransition extends
RMAppTransition {
@Override
public void transition(RMAppImpl app, RMAppEvent event) {
app.handler.handle(new AppAddedSchedulerEvent(app.applicationId,
app.submissionContext.getQueue(), app.user,
app.submissionContext.getReservationID()));
}
}
在AppAddedSchedulerEvent类中,如下所示:
//AppAddedSchedulerEvent.java
public AppAddedSchedulerEvent(ApplicationId applicationId, String queue,
String user, ReservationId reservationID) {
this(applicationId, queue, user, false, reservationID);
} public AppAddedSchedulerEvent(ApplicationId applicationId, String queue,
String user, boolean isAppRecovering, ReservationId reservationID) {
super(SchedulerEventType.APP_ADDED);
this.applicationId = applicationId;
this.queue = queue;
this.user = user;
this.reservationID = reservationID;
this.isAppRecovering = isAppRecovering;
}
scheduler收到SchedulerEventType.APP_ADDED事件之后,首先进行权限检查,然后将应用程序信息保存到内部的数据结构中,并向RMAppImpl发送APP_ACCEPTED事件。 具体过程如下:
public enum SchedulerEventType { // Source: Node
NODE_ADDED,
NODE_REMOVED,
NODE_UPDATE,
NODE_RESOURCE_UPDATE,
NODE_LABELS_UPDATE, // Source: RMApp
APP_ADDED,
APP_REMOVED, // Source: RMAppAttempt
APP_ATTEMPT_ADDED,
APP_ATTEMPT_REMOVED, // Source: ContainerAllocationExpirer
CONTAINER_EXPIRED, // Source: RMContainer
CONTAINER_RESCHEDULED, // Source: SchedulingEditPolicy
DROP_RESERVATION,
PREEMPT_CONTAINER,
KILL_CONTAINER
}
SchedulerEventType.java
SchedulerEventType.java 在 hadoop-2.7.3-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/SchedulerEventType.java
新建的事件类AppAddedSchedulerEvent, 其中有super(SchedulerEventType.APP_ADDED), 事件类型SchedulerEventType.APP_ADDED, 由于在ResourceManager中,将SchedulerEventType类型的事件绑定到了EventHandler<SchedulerEvent> 的对象schedulerDispatcher上, 绑定过程是在ResourceManager类的内部类RMActiveServices的serviceInit()函数中, 如下所示:
//ResourceManager类的内部类RMActiveServices的serviceInit()函数的绑定SchedulerEventType类型的事件
schedulerDispatcher = createSchedulerEventDispatcher();
addIfService(schedulerDispatcher);
rmDispatcher.register(SchedulerEventType.class, schedulerDispatcher);
我们进入createSchedulerEventDispatcher()函数, 如下所示:
//ResourceManager.java
protected EventHandler<SchedulerEvent> createSchedulerEventDispatcher() {
return new SchedulerEventDispatcher(this.scheduler);
}
这里发现传入的调度器scheduler是this.scheduler, this.scheduler 在绑定ScheduleEventType类型的事件前面进行的初始化,如下所示:
//ResourceManager类的内部类RMActiveServices的serviceInit()函数
// Initialize the scheduler
scheduler = createScheduler();
scheduler.setRMContext(rmContext);
addIfService(scheduler);
rmContext.setScheduler(scheduler); schedulerDispatcher = createSchedulerEventDispatcher();
addIfService(schedulerDispatcher);
rmDispatcher.register(SchedulerEventType.class, schedulerDispatcher); // Register event handler for RmAppEvents
//注册RMAppEvent事件的事件处理器
//RMAppManager往异步处理器增加个RMAppEvent事件,类型枚值RMAppEventType.START,所以由ApplicationEventDispatcher(rmContext)来处理
rmDispatcher.register(RMAppEventType.class,
new ApplicationEventDispatcher(rmContext)); // Register event handler for RmAppAttemptEvents
rmDispatcher.register(RMAppAttemptEventType.class,
new ApplicationAttemptEventDispatcher(rmContext));
再进入createScheduler函数, 如下所示:
//ResourceManager.java
protected ResourceScheduler createScheduler() {
String schedulerClassName = conf.get(YarnConfiguration.RM_SCHEDULER,
YarnConfiguration.DEFAULT_RM_SCHEDULER);
LOG.info("Using Scheduler: " + schedulerClassName);
try {
Class<?> schedulerClazz = Class.forName(schedulerClassName);
if (ResourceScheduler.class.isAssignableFrom(schedulerClazz)) {
return (ResourceScheduler) ReflectionUtils.newInstance(schedulerClazz,
this.conf);
} else {
throw new YarnRuntimeException("Class: " + schedulerClassName
+ " not instance of " + ResourceScheduler.class.getCanonicalName());
}
} catch (ClassNotFoundException e) {
throw new YarnRuntimeException("Could not instantiate Scheduler: "
+ schedulerClassName, e);
}
}
发现用的默认调度器是YarnConfiguration.DEFAULT_RM_SCHEDULER, 而它是在YarnConfiguration.java中,取值为org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler。 如下所示:
//YarnConfiguration.java
/** The class to use as the resource scheduler.*/
public static final String RM_SCHEDULER =
RM_PREFIX + "scheduler.class"; public static final String DEFAULT_RM_SCHEDULER =
"org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
进入CapacityScheduler类中,如下所示:
@LimitedPrivate("yarn")
@Evolving
@SuppressWarnings("unchecked")
public class CapacityScheduler extends
AbstractYarnScheduler<FiCaSchedulerApp, FiCaSchedulerNode> implements
PreemptableResourceScheduler, CapacitySchedulerContext, Configurable { private static final Log LOG = LogFactory.getLog(CapacityScheduler.class);
private YarnAuthorizationProvider authorizer; private CSQueue root;
// timeout to join when we stop this service
protected final long THREAD_JOIN_TIMEOUT_MS = 1000; static final Comparator<CSQueue> queueComparator = new Comparator<CSQueue>() {
@Override
public int compare(CSQueue q1, CSQueue q2) {
if (q1.getUsedCapacity() < q2.getUsedCapacity()) {
return -1;
} else if (q1.getUsedCapacity() > q2.getUsedCapacity()) {
return 1;
} return q1.getQueuePath().compareTo(q2.getQueuePath());
}
}; static final Comparator<FiCaSchedulerApp> applicationComparator =
new Comparator<FiCaSchedulerApp>() {
@Override
public int compare(FiCaSchedulerApp a1, FiCaSchedulerApp a2) {
return a1.getApplicationId().compareTo(a2.getApplicationId());
}
}; @Override
public void setConf(Configuration conf) {
yarnConf = conf;
} private void validateConf(Configuration conf) {
// validate scheduler memory allocation setting
int minMem = conf.getInt(
YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB,
YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_MB);
int maxMem = conf.getInt(
YarnConfiguration.RM_SCHEDULER_MAXIMUM_ALLOCATION_MB,
YarnConfiguration.DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_MB); if (minMem <= 0 || minMem > maxMem) {
throw new YarnRuntimeException("Invalid resource scheduler memory"
+ " allocation configuration"
+ ", " + YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB
+ "=" + minMem
+ ", " + YarnConfiguration.RM_SCHEDULER_MAXIMUM_ALLOCATION_MB
+ "=" + maxMem + ", min and max should be greater than 0"
+ ", max should be no smaller than min.");
} // validate scheduler vcores allocation setting
int minVcores = conf.getInt(
YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_VCORES,
YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_VCORES);
int maxVcores = conf.getInt(
YarnConfiguration.RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES,
YarnConfiguration.DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES); if (minVcores <= 0 || minVcores > maxVcores) {
throw new YarnRuntimeException("Invalid resource scheduler vcores"
+ " allocation configuration"
+ ", " + YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_VCORES
+ "=" + minVcores
+ ", " + YarnConfiguration.RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES
+ "=" + maxVcores + ", min and max should be greater than 0"
+ ", max should be no smaller than min.");
}
} @Override
public Configuration getConf() {
return yarnConf;
} private CapacitySchedulerConfiguration conf;
private Configuration yarnConf; private Map<String, CSQueue> queues = new ConcurrentHashMap<String, CSQueue>(); private AtomicInteger numNodeManagers = new AtomicInteger(0); private ResourceCalculator calculator;
private boolean usePortForNodeName; private boolean scheduleAsynchronously;
private AsyncScheduleThread asyncSchedulerThread;
private RMNodeLabelsManager labelManager; /**
* EXPERT
*/
private long asyncScheduleInterval;
private static final String ASYNC_SCHEDULER_INTERVAL =
CapacitySchedulerConfiguration.SCHEDULE_ASYNCHRONOUSLY_PREFIX
+ ".scheduling-interval-ms";
private static final long DEFAULT_ASYNC_SCHEDULER_INTERVAL = 5; private boolean overrideWithQueueMappings = false;
private List<QueueMapping> mappings = null;
private Groups groups; @VisibleForTesting
public synchronized String getMappedQueueForTest(String user)
throws IOException {
return getMappedQueue(user);
} public CapacityScheduler() {
super(CapacityScheduler.class.getName());
} @Override
public QueueMetrics getRootQueueMetrics() {
return root.getMetrics();
} public CSQueue getRootQueue() {
return root;
} @Override
public CapacitySchedulerConfiguration getConfiguration() {
return conf;
} @Override
public synchronized RMContainerTokenSecretManager
getContainerTokenSecretManager() {
return this.rmContext.getContainerTokenSecretManager();
} @Override
public Comparator<FiCaSchedulerApp> getApplicationComparator() {
return applicationComparator;
} @Override
public ResourceCalculator getResourceCalculator() {
return calculator;
} @Override
public Comparator<CSQueue> getQueueComparator() {
return queueComparator;
} @Override
public int getNumClusterNodes() {
return numNodeManagers.get();
} @Override
public synchronized RMContext getRMContext() {
return this.rmContext;
} @Override
public synchronized void setRMContext(RMContext rmContext) {
this.rmContext = rmContext;
} private synchronized void initScheduler(Configuration configuration) throws
IOException {
this.conf = loadCapacitySchedulerConfiguration(configuration);
validateConf(this.conf);
this.minimumAllocation = this.conf.getMinimumAllocation();
initMaximumResourceCapability(this.conf.getMaximumAllocation());
this.calculator = this.conf.getResourceCalculator();
this.usePortForNodeName = this.conf.getUsePortForNodeName();
this.applications =
new ConcurrentHashMap<ApplicationId,
SchedulerApplication<FiCaSchedulerApp>>();
this.labelManager = rmContext.getNodeLabelManager();
authorizer = YarnAuthorizationProvider.getInstance(yarnConf);
initializeQueues(this.conf); scheduleAsynchronously = this.conf.getScheduleAynschronously();
asyncScheduleInterval =
this.conf.getLong(ASYNC_SCHEDULER_INTERVAL,
DEFAULT_ASYNC_SCHEDULER_INTERVAL);
if (scheduleAsynchronously) {
asyncSchedulerThread = new AsyncScheduleThread(this);
} LOG.info("Initialized CapacityScheduler with " +
"calculator=" + getResourceCalculator().getClass() + ", " +
"minimumAllocation=<" + getMinimumResourceCapability() + ">, " +
"maximumAllocation=<" + getMaximumResourceCapability() + ">, " +
"asynchronousScheduling=" + scheduleAsynchronously + ", " +
"asyncScheduleInterval=" + asyncScheduleInterval + "ms");
} private synchronized void startSchedulerThreads() {
if (scheduleAsynchronously) {
Preconditions.checkNotNull(asyncSchedulerThread,
"asyncSchedulerThread is null");
asyncSchedulerThread.start();
}
} @Override
public void serviceInit(Configuration conf) throws Exception {
Configuration configuration = new Configuration(conf);
super.serviceInit(conf);
initScheduler(configuration);
} @Override
public void serviceStart() throws Exception {
startSchedulerThreads();
super.serviceStart();
} @Override
public void serviceStop() throws Exception {
synchronized (this) {
if (scheduleAsynchronously && asyncSchedulerThread != null) {
asyncSchedulerThread.interrupt();
asyncSchedulerThread.join(THREAD_JOIN_TIMEOUT_MS);
}
}
super.serviceStop();
} @Override
public synchronized void
reinitialize(Configuration conf, RMContext rmContext) throws IOException {
Configuration configuration = new Configuration(conf);
CapacitySchedulerConfiguration oldConf = this.conf;
this.conf = loadCapacitySchedulerConfiguration(configuration);
validateConf(this.conf);
try {
LOG.info("Re-initializing queues...");
refreshMaximumAllocation(this.conf.getMaximumAllocation());
reinitializeQueues(this.conf);
} catch (Throwable t) {
this.conf = oldConf;
refreshMaximumAllocation(this.conf.getMaximumAllocation());
throw new IOException("Failed to re-init queues", t);
}
} long getAsyncScheduleInterval() {
return asyncScheduleInterval;
} private final static Random random = new Random(System.currentTimeMillis()); /**
* Schedule on all nodes by starting at a random point.
* @param cs
*/
static void schedule(CapacityScheduler cs) {
// First randomize the start point
int current = 0;
Collection<FiCaSchedulerNode> nodes = cs.getAllNodes().values();
int start = random.nextInt(nodes.size());
for (FiCaSchedulerNode node : nodes) {
if (current++ >= start) {
cs.allocateContainersToNode(node);
}
}
// Now, just get everyone to be safe
for (FiCaSchedulerNode node : nodes) {
cs.allocateContainersToNode(node);
}
try {
Thread.sleep(cs.getAsyncScheduleInterval());
} catch (InterruptedException e) {}
} static class AsyncScheduleThread extends Thread { private final CapacityScheduler cs;
private AtomicBoolean runSchedules = new AtomicBoolean(false); public AsyncScheduleThread(CapacityScheduler cs) {
this.cs = cs;
setDaemon(true);
} @Override
public void run() {
while (true) {
if (!runSchedules.get()) {
try {
Thread.sleep(100);
} catch (InterruptedException ie) {}
} else {
schedule(cs);
}
}
} public void beginSchedule() {
runSchedules.set(true);
} public void suspendSchedule() {
runSchedules.set(false);
} } @Private
public static final String ROOT_QUEUE =
CapacitySchedulerConfiguration.PREFIX + CapacitySchedulerConfiguration.ROOT; static class QueueHook {
public CSQueue hook(CSQueue queue) {
return queue;
}
}
private static final QueueHook noop = new QueueHook(); private void initializeQueueMappings() throws IOException {
overrideWithQueueMappings = conf.getOverrideWithQueueMappings();
LOG.info("Initialized queue mappings, override: "
+ overrideWithQueueMappings);
// Get new user/group mappings
List<QueueMapping> newMappings = conf.getQueueMappings();
//check if mappings refer to valid queues
for (QueueMapping mapping : newMappings) {
if (!mapping.queue.equals(CURRENT_USER_MAPPING) &&
!mapping.queue.equals(PRIMARY_GROUP_MAPPING)) {
CSQueue queue = queues.get(mapping.queue);
if (queue == null || !(queue instanceof LeafQueue)) {
throw new IOException(
"mapping contains invalid or non-leaf queue " + mapping.queue);
}
}
}
//apply the new mappings since they are valid
mappings = newMappings;
// initialize groups if mappings are present
if (mappings.size() > 0) {
groups = new Groups(conf);
}
} @Lock(CapacityScheduler.class)
private void initializeQueues(CapacitySchedulerConfiguration conf)
throws IOException { root =
parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT,
queues, queues, noop);
labelManager.reinitializeQueueLabels(getQueueToLabels());
LOG.info("Initialized root queue " + root);
initializeQueueMappings();
setQueueAcls(authorizer, queues);
} @Lock(CapacityScheduler.class)
private void reinitializeQueues(CapacitySchedulerConfiguration conf)
throws IOException {
// Parse new queues
Map<String, CSQueue> newQueues = new HashMap<String, CSQueue>();
CSQueue newRoot =
parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT,
newQueues, queues, noop); // Ensure all existing queues are still present
validateExistingQueues(queues, newQueues); // Add new queues
addNewQueues(queues, newQueues); // Re-configure queues
root.reinitialize(newRoot, clusterResource);
initializeQueueMappings(); // Re-calculate headroom for active applications
root.updateClusterResource(clusterResource, new ResourceLimits(
clusterResource)); labelManager.reinitializeQueueLabels(getQueueToLabels());
setQueueAcls(authorizer, queues);
} @VisibleForTesting
public static void setQueueAcls(YarnAuthorizationProvider authorizer,
Map<String, CSQueue> queues) throws IOException {
for (CSQueue queue : queues.values()) {
AbstractCSQueue csQueue = (AbstractCSQueue) queue;
authorizer.setPermission(csQueue.getPrivilegedEntity(),
csQueue.getACLs(), UserGroupInformation.getCurrentUser());
}
} private Map<String, Set<String>> getQueueToLabels() {
Map<String, Set<String>> queueToLabels = new HashMap<String, Set<String>>();
for (CSQueue queue : queues.values()) {
queueToLabels.put(queue.getQueueName(), queue.getAccessibleNodeLabels());
}
return queueToLabels;
} /**
* Ensure all existing queues are present. Queues cannot be deleted
* @param queues existing queues
* @param newQueues new queues
*/
@Lock(CapacityScheduler.class)
private void validateExistingQueues(
Map<String, CSQueue> queues, Map<String, CSQueue> newQueues)
throws IOException {
// check that all static queues are included in the newQueues list
for (Map.Entry<String, CSQueue> e : queues.entrySet()) {
if (!(e.getValue() instanceof ReservationQueue)) {
String queueName = e.getKey();
CSQueue oldQueue = e.getValue();
CSQueue newQueue = newQueues.get(queueName);
if (null == newQueue) {
throw new IOException(queueName + " cannot be found during refresh!");
} else if (!oldQueue.getQueuePath().equals(newQueue.getQueuePath())) {
throw new IOException(queueName + " is moved from:"
+ oldQueue.getQueuePath() + " to:" + newQueue.getQueuePath()
+ " after refresh, which is not allowed.");
}
}
}
} /**
* Add the new queues (only) to our list of queues...
* ... be careful, do not overwrite existing queues.
* @param queues
* @param newQueues
*/
@Lock(CapacityScheduler.class)
private void addNewQueues(
Map<String, CSQueue> queues, Map<String, CSQueue> newQueues)
{
for (Map.Entry<String, CSQueue> e : newQueues.entrySet()) {
String queueName = e.getKey();
CSQueue queue = e.getValue();
if (!queues.containsKey(queueName)) {
queues.put(queueName, queue);
}
}
} @Lock(CapacityScheduler.class)
static CSQueue parseQueue(
CapacitySchedulerContext csContext,
CapacitySchedulerConfiguration conf,
CSQueue parent, String queueName, Map<String, CSQueue> queues,
Map<String, CSQueue> oldQueues,
QueueHook hook) throws IOException {
CSQueue queue;
String fullQueueName =
(parent == null) ? queueName
: (parent.getQueuePath() + "." + queueName);
String[] childQueueNames =
conf.getQueues(fullQueueName);
boolean isReservableQueue = conf.isReservable(fullQueueName);
if (childQueueNames == null || childQueueNames.length == 0) {
if (null == parent) {
throw new IllegalStateException(
"Queue configuration missing child queue names for " + queueName);
}
// Check if the queue will be dynamically managed by the Reservation
// system
if (isReservableQueue) {
queue =
new PlanQueue(csContext, queueName, parent,
oldQueues.get(queueName));
} else {
queue =
new LeafQueue(csContext, queueName, parent,
oldQueues.get(queueName)); // Used only for unit tests
queue = hook.hook(queue);
}
} else {
if (isReservableQueue) {
throw new IllegalStateException(
"Only Leaf Queues can be reservable for " + queueName);
}
ParentQueue parentQueue =
new ParentQueue(csContext, queueName, parent, oldQueues.get(queueName)); // Used only for unit tests
queue = hook.hook(parentQueue); List<CSQueue> childQueues = new ArrayList<CSQueue>();
for (String childQueueName : childQueueNames) {
CSQueue childQueue =
parseQueue(csContext, conf, queue, childQueueName,
queues, oldQueues, hook);
childQueues.add(childQueue);
}
parentQueue.setChildQueues(childQueues);
} if(queue instanceof LeafQueue == true && queues.containsKey(queueName)
&& queues.get(queueName) instanceof LeafQueue == true) {
throw new IOException("Two leaf queues were named " + queueName
+ ". Leaf queue names must be distinct");
}
queues.put(queueName, queue); LOG.info("Initialized queue: " + queue);
return queue;
} public CSQueue getQueue(String queueName) {
if (queueName == null) {
return null;
}
return queues.get(queueName);
} private static final String CURRENT_USER_MAPPING = "%user"; private static final String PRIMARY_GROUP_MAPPING = "%primary_group"; private String getMappedQueue(String user) throws IOException {
for (QueueMapping mapping : mappings) {
if (mapping.type == MappingType.USER) {
if (mapping.source.equals(CURRENT_USER_MAPPING)) {
if (mapping.queue.equals(CURRENT_USER_MAPPING)) {
return user;
}
else if (mapping.queue.equals(PRIMARY_GROUP_MAPPING)) {
return groups.getGroups(user).get(0);
}
else {
return mapping.queue;
}
}
if (user.equals(mapping.source)) {
return mapping.queue;
}
}
if (mapping.type == MappingType.GROUP) {
for (String userGroups : groups.getGroups(user)) {
if (userGroups.equals(mapping.source)) {
return mapping.queue;
}
}
}
}
return null;
} private String getQueueMappings(ApplicationId applicationId, String queueName,
String user) {
if (mappings != null && mappings.size() > 0) {
try {
String mappedQueue = getMappedQueue(user);
if (mappedQueue != null) {
// We have a mapping, should we use it?
if (queueName.equals(YarnConfiguration.DEFAULT_QUEUE_NAME)
|| overrideWithQueueMappings) {
LOG.info("Application " + applicationId + " user " + user
+ " mapping [" + queueName + "] to [" + mappedQueue
+ "] override " + overrideWithQueueMappings);
queueName = mappedQueue;
RMApp rmApp = rmContext.getRMApps().get(applicationId);
rmApp.setQueue(queueName);
}
}
} catch (IOException ioex) {
String message = "Failed to submit application " + applicationId +
" submitted by user " + user + " reason: " + ioex.getMessage();
this.rmContext.getDispatcher().getEventHandler()
.handle(new RMAppEvent(applicationId,
RMAppEventType.APP_REJECTED, message));
return null;
}
}
return queueName;
} private synchronized void addApplicationOnRecovery(
ApplicationId applicationId, String queueName, String user) {
queueName = getQueueMappings(applicationId, queueName, user);
if (queueName == null) {
// Exception encountered while getting queue mappings.
return;
}
// sanity checks.
CSQueue queue = getQueue(queueName);
if (queue == null) {
//During a restart, this indicates a queue was removed, which is
//not presently supported
if (!YarnConfiguration.shouldRMFailFast(getConfig())) {
this.rmContext.getDispatcher().getEventHandler().handle(
new RMAppEvent(applicationId, RMAppEventType.KILL,
"Application killed on recovery as it was submitted to queue " +
queueName + " which no longer exists after restart."));
return;
} else {
String queueErrorMsg = "Queue named " + queueName
+ " missing during application recovery."
+ " Queue removal during recovery is not presently supported by the"
+ " capacity scheduler, please restart with all queues configured"
+ " which were present before shutdown/restart.";
LOG.fatal(queueErrorMsg);
throw new QueueInvalidException(queueErrorMsg);
}
}
if (!(queue instanceof LeafQueue)) {
// During RM restart, this means leaf queue was converted to a parent
// queue, which is not supported for running apps.
if (!YarnConfiguration.shouldRMFailFast(getConfig())) {
this.rmContext.getDispatcher().getEventHandler().handle(
new RMAppEvent(applicationId, RMAppEventType.KILL,
"Application killed on recovery as it was submitted to queue " +
queueName + " which is no longer a leaf queue after restart."));
return;
} else {
String queueErrorMsg = "Queue named " + queueName
+ " is no longer a leaf queue during application recovery."
+ " Changing a leaf queue to a parent queue during recovery is"
+ " not presently supported by the capacity scheduler. Please"
+ " restart with leaf queues before shutdown/restart continuing"
+ " as leaf queues.";
LOG.fatal(queueErrorMsg);
throw new QueueInvalidException(queueErrorMsg);
}
}
// Submit to the queue
try {
queue.submitApplication(applicationId, user, queueName);
} catch (AccessControlException ace) {
// Ignore the exception for recovered app as the app was previously
// accepted.
}
queue.getMetrics().submitApp(user);
SchedulerApplication<FiCaSchedulerApp> application =
new SchedulerApplication<FiCaSchedulerApp>(queue, user);
applications.put(applicationId, application);
LOG.info("Accepted application " + applicationId + " from user: " + user
+ ", in queue: " + queueName);
if (LOG.isDebugEnabled()) {
LOG.debug(applicationId + " is recovering. Skip notifying APP_ACCEPTED");
}
} private synchronized void addApplication(ApplicationId applicationId,
String queueName, String user) {
queueName = getQueueMappings(applicationId, queueName, user);
if (queueName == null) {
// Exception encountered while getting queue mappings.
return;
}
// sanity checks.
CSQueue queue = getQueue(queueName);
if (queue == null) {
String message = "Application " + applicationId +
" submitted by user " + user + " to unknown queue: " + queueName;
this.rmContext.getDispatcher().getEventHandler()
.handle(new RMAppEvent(applicationId,
RMAppEventType.APP_REJECTED, message));
return;
}
if (!(queue instanceof LeafQueue)) {
String message = "Application " + applicationId +
" submitted by user " + user + " to non-leaf queue: " + queueName;
this.rmContext.getDispatcher().getEventHandler()
.handle(new RMAppEvent(applicationId,
RMAppEventType.APP_REJECTED, message));
return;
}
// Submit to the queue
try {
queue.submitApplication(applicationId, user, queueName);
} catch (AccessControlException ace) {
LOG.info("Failed to submit application " + applicationId + " to queue "
+ queueName + " from user " + user, ace);
this.rmContext.getDispatcher().getEventHandler()
.handle(new RMAppEvent(applicationId,
RMAppEventType.APP_REJECTED, ace.toString()));
return;
}
// update the metrics
queue.getMetrics().submitApp(user);
SchedulerApplication<FiCaSchedulerApp> application =
new SchedulerApplication<FiCaSchedulerApp>(queue, user);
applications.put(applicationId, application);
LOG.info("Accepted application " + applicationId + " from user: " + user
+ ", in queue: " + queueName);
rmContext.getDispatcher().getEventHandler()
.handle(new RMAppEvent(applicationId, RMAppEventType.APP_ACCEPTED));
} private synchronized void addApplicationAttempt(
ApplicationAttemptId applicationAttemptId,
boolean transferStateFromPreviousAttempt,
boolean isAttemptRecovering) {
SchedulerApplication<FiCaSchedulerApp> application =
applications.get(applicationAttemptId.getApplicationId());
if (application == null) {
LOG.warn("Application " + applicationAttemptId.getApplicationId() +
" cannot be found in scheduler.");
return;
}
CSQueue queue = (CSQueue) application.getQueue(); FiCaSchedulerApp attempt =
new FiCaSchedulerApp(applicationAttemptId, application.getUser(),
queue, queue.getActiveUsersManager(), rmContext);
if (transferStateFromPreviousAttempt) {
attempt.transferStateFromPreviousAttempt(application
.getCurrentAppAttempt());
}
application.setCurrentAppAttempt(attempt); queue.submitApplicationAttempt(attempt, application.getUser());
LOG.info("Added Application Attempt " + applicationAttemptId
+ " to scheduler from user " + application.getUser() + " in queue "
+ queue.getQueueName());
if (isAttemptRecovering) {
if (LOG.isDebugEnabled()) {
LOG.debug(applicationAttemptId
+ " is recovering. Skipping notifying ATTEMPT_ADDED");
}
} else {
rmContext.getDispatcher().getEventHandler().handle(
new RMAppAttemptEvent(applicationAttemptId,
RMAppAttemptEventType.ATTEMPT_ADDED));
}
} private synchronized void doneApplication(ApplicationId applicationId,
RMAppState finalState) {
SchedulerApplication<FiCaSchedulerApp> application =
applications.get(applicationId);
if (application == null){
// The AppRemovedSchedulerEvent maybe sent on recovery for completed apps,
// ignore it.
LOG.warn("Couldn't find application " + applicationId);
return;
}
CSQueue queue = (CSQueue) application.getQueue();
if (!(queue instanceof LeafQueue)) {
LOG.error("Cannot finish application " + "from non-leaf queue: "
+ queue.getQueueName());
} else {
queue.finishApplication(applicationId, application.getUser());
}
application.stop(finalState);
applications.remove(applicationId);
} private synchronized void doneApplicationAttempt(
ApplicationAttemptId applicationAttemptId,
RMAppAttemptState rmAppAttemptFinalState, boolean keepContainers) {
LOG.info("Application Attempt " + applicationAttemptId + " is done." +
" finalState=" + rmAppAttemptFinalState); FiCaSchedulerApp attempt = getApplicationAttempt(applicationAttemptId);
SchedulerApplication<FiCaSchedulerApp> application =
applications.get(applicationAttemptId.getApplicationId()); if (application == null || attempt == null) {
LOG.info("Unknown application " + applicationAttemptId + " has completed!");
return;
} // Release all the allocated, acquired, running containers
for (RMContainer rmContainer : attempt.getLiveContainers()) {
if (keepContainers
&& rmContainer.getState().equals(RMContainerState.RUNNING)) {
// do not kill the running container in the case of work-preserving AM
// restart.
LOG.info("Skip killing " + rmContainer.getContainerId());
continue;
}
completedContainer(
rmContainer,
SchedulerUtils.createAbnormalContainerStatus(
rmContainer.getContainerId(), SchedulerUtils.COMPLETED_APPLICATION),
RMContainerEventType.KILL);
} // Release all reserved containers
for (RMContainer rmContainer : attempt.getReservedContainers()) {
completedContainer(
rmContainer,
SchedulerUtils.createAbnormalContainerStatus(
rmContainer.getContainerId(), "Application Complete"),
RMContainerEventType.KILL);
} // Clean up pending requests, metrics etc.
attempt.stop(rmAppAttemptFinalState); // Inform the queue
String queueName = attempt.getQueue().getQueueName();
CSQueue queue = queues.get(queueName);
if (!(queue instanceof LeafQueue)) {
LOG.error("Cannot finish application " + "from non-leaf queue: "
+ queueName);
} else {
queue.finishApplicationAttempt(attempt, queue.getQueueName());
}
} @Override
@Lock(Lock.NoLock.class)
public Allocation allocate(ApplicationAttemptId applicationAttemptId,
List<ResourceRequest> ask, List<ContainerId> release,
List<String> blacklistAdditions, List<String> blacklistRemovals) { FiCaSchedulerApp application = getApplicationAttempt(applicationAttemptId);
if (application == null) {
LOG.info("Calling allocate on removed " +
"or non existant application " + applicationAttemptId);
return EMPTY_ALLOCATION;
} // Sanity check
SchedulerUtils.normalizeRequests(
ask, getResourceCalculator(), getClusterResource(),
getMinimumResourceCapability(), getMaximumResourceCapability()); // Release containers
releaseContainers(release, application); synchronized (application) { // make sure we aren't stopping/removing the application
// when the allocate comes in
if (application.isStopped()) {
LOG.info("Calling allocate on a stopped " +
"application " + applicationAttemptId);
return EMPTY_ALLOCATION;
} if (!ask.isEmpty()) { if(LOG.isDebugEnabled()) {
LOG.debug("allocate: pre-update" +
" applicationAttemptId=" + applicationAttemptId +
" application=" + application);
}
application.showRequests(); // Update application requests
application.updateResourceRequests(ask); LOG.debug("allocate: post-update");
application.showRequests();
} if(LOG.isDebugEnabled()) {
LOG.debug("allocate:" +
" applicationAttemptId=" + applicationAttemptId +
" #ask=" + ask.size());
} application.updateBlacklist(blacklistAdditions, blacklistRemovals); return application.getAllocation(getResourceCalculator(),
clusterResource, getMinimumResourceCapability());
}
} @Override
@Lock(Lock.NoLock.class)
public QueueInfo getQueueInfo(String queueName,
boolean includeChildQueues, boolean recursive)
throws IOException {
CSQueue queue = null;
queue = this.queues.get(queueName);
if (queue == null) {
throw new IOException("Unknown queue: " + queueName);
}
return queue.getQueueInfo(includeChildQueues, recursive);
} @Override
@Lock(Lock.NoLock.class)
public List<QueueUserACLInfo> getQueueUserAclInfo() {
UserGroupInformation user = null;
try {
user = UserGroupInformation.getCurrentUser();
} catch (IOException ioe) {
// should never happen
return new ArrayList<QueueUserACLInfo>();
} return root.getQueueUserAclInfo(user);
} private synchronized void nodeUpdate(RMNode nm) {
if (LOG.isDebugEnabled()) {
LOG.debug("nodeUpdate: " + nm + " clusterResources: " + clusterResource);
} FiCaSchedulerNode node = getNode(nm.getNodeID()); List<UpdatedContainerInfo> containerInfoList = nm.pullContainerUpdates();
List<ContainerStatus> newlyLaunchedContainers = new ArrayList<ContainerStatus>();
List<ContainerStatus> completedContainers = new ArrayList<ContainerStatus>();
for(UpdatedContainerInfo containerInfo : containerInfoList) {
newlyLaunchedContainers.addAll(containerInfo.getNewlyLaunchedContainers());
completedContainers.addAll(containerInfo.getCompletedContainers());
} // Processing the newly launched containers
for (ContainerStatus launchedContainer : newlyLaunchedContainers) {
containerLaunchedOnNode(launchedContainer.getContainerId(), node);
} // Process completed containers
for (ContainerStatus completedContainer : completedContainers) {
ContainerId containerId = completedContainer.getContainerId();
LOG.debug("Container FINISHED: " + containerId);
completedContainer(getRMContainer(containerId),
completedContainer, RMContainerEventType.FINISHED);
} // Now node data structures are upto date and ready for scheduling.
if(LOG.isDebugEnabled()) {
LOG.debug("Node being looked for scheduling " + nm
+ " availableResource: " + node.getAvailableResource());
}
} /**
* Process resource update on a node.
*/
private synchronized void updateNodeAndQueueResource(RMNode nm,
ResourceOption resourceOption) {
updateNodeResource(nm, resourceOption);
root.updateClusterResource(clusterResource, new ResourceLimits(
clusterResource));
} /**
* Process node labels update on a node.
*
* TODO: Currently capacity scheduler will kill containers on a node when
* labels on the node changed. It is a simply solution to ensure guaranteed
* capacity on labels of queues. When YARN-2498 completed, we can let
* preemption policy to decide if such containers need to be killed or just
* keep them running.
*/
private synchronized void updateLabelsOnNode(NodeId nodeId,
Set<String> newLabels) {
FiCaSchedulerNode node = nodes.get(nodeId);
if (null == node) {
return;
} // labels is same, we don't need do update
if (node.getLabels().size() == newLabels.size()
&& node.getLabels().containsAll(newLabels)) {
return;
} // Kill running containers since label is changed
for (RMContainer rmContainer : node.getRunningContainers()) {
ContainerId containerId = rmContainer.getContainerId();
completedContainer(rmContainer,
ContainerStatus.newInstance(containerId,
ContainerState.COMPLETE,
String.format(
"Container=%s killed since labels on the node=%s changed",
containerId.toString(), nodeId.toString()),
ContainerExitStatus.KILLED_BY_RESOURCEMANAGER),
RMContainerEventType.KILL);
} // Unreserve container on this node
RMContainer reservedContainer = node.getReservedContainer();
if (null != reservedContainer) {
dropContainerReservation(reservedContainer);
} // Update node labels after we've done this
node.updateLabels(newLabels);
} private synchronized void allocateContainersToNode(FiCaSchedulerNode node) {
if (rmContext.isWorkPreservingRecoveryEnabled()
&& !rmContext.isSchedulerReadyForAllocatingContainers()) {
return;
} // Assign new containers...
// 1. Check for reserved applications
// 2. Schedule if there are no reservations RMContainer reservedContainer = node.getReservedContainer();
if (reservedContainer != null) {
FiCaSchedulerApp reservedApplication =
getCurrentAttemptForContainer(reservedContainer.getContainerId()); // Try to fulfill the reservation
LOG.info("Trying to fulfill reservation for application " +
reservedApplication.getApplicationId() + " on node: " +
node.getNodeID()); LeafQueue queue = ((LeafQueue)reservedApplication.getQueue());
CSAssignment assignment =
queue.assignContainers(
clusterResource,
node,
// TODO, now we only consider limits for parent for non-labeled
// resources, should consider labeled resources as well.
new ResourceLimits(labelManager.getResourceByLabel(
RMNodeLabelsManager.NO_LABEL, clusterResource))); RMContainer excessReservation = assignment.getExcessReservation();
if (excessReservation != null) {
Container container = excessReservation.getContainer();
queue.completedContainer(
clusterResource, assignment.getApplication(), node,
excessReservation,
SchedulerUtils.createAbnormalContainerStatus(
container.getId(),
SchedulerUtils.UNRESERVED_CONTAINER),
RMContainerEventType.RELEASED, null, true);
} } // Try to schedule more if there are no reservations to fulfill
if (node.getReservedContainer() == null) {
if (calculator.computeAvailableContainers(node.getAvailableResource(),
minimumAllocation) > 0) {
if (LOG.isDebugEnabled()) {
LOG.debug("Trying to schedule on node: " + node.getNodeName() +
", available: " + node.getAvailableResource());
}
root.assignContainers(
clusterResource,
node,
// TODO, now we only consider limits for parent for non-labeled
// resources, should consider labeled resources as well.
new ResourceLimits(labelManager.getResourceByLabel(
RMNodeLabelsManager.NO_LABEL, clusterResource)));
}
} else {
LOG.info("Skipping scheduling since node " + node.getNodeID() +
" is reserved by application " +
node.getReservedContainer().getContainerId().getApplicationAttemptId()
);
} } @Override
public void handle(SchedulerEvent event) {
switch(event.getType()) {
case NODE_ADDED:
{
NodeAddedSchedulerEvent nodeAddedEvent = (NodeAddedSchedulerEvent)event;
addNode(nodeAddedEvent.getAddedRMNode());
recoverContainersOnNode(nodeAddedEvent.getContainerReports(),
nodeAddedEvent.getAddedRMNode());
}
break;
case NODE_REMOVED:
{
NodeRemovedSchedulerEvent nodeRemovedEvent = (NodeRemovedSchedulerEvent)event;
removeNode(nodeRemovedEvent.getRemovedRMNode());
}
break;
case NODE_RESOURCE_UPDATE:
{
NodeResourceUpdateSchedulerEvent nodeResourceUpdatedEvent =
(NodeResourceUpdateSchedulerEvent)event;
updateNodeAndQueueResource(nodeResourceUpdatedEvent.getRMNode(),
nodeResourceUpdatedEvent.getResourceOption());
}
break;
case NODE_LABELS_UPDATE:
{
NodeLabelsUpdateSchedulerEvent labelUpdateEvent =
(NodeLabelsUpdateSchedulerEvent) event; for (Entry<NodeId, Set<String>> entry : labelUpdateEvent
.getUpdatedNodeToLabels().entrySet()) {
NodeId id = entry.getKey();
Set<String> labels = entry.getValue();
updateLabelsOnNode(id, labels);
}
}
break;
case NODE_UPDATE:
{
NodeUpdateSchedulerEvent nodeUpdatedEvent = (NodeUpdateSchedulerEvent)event;
RMNode node = nodeUpdatedEvent.getRMNode();
nodeUpdate(node);
if (!scheduleAsynchronously) {
allocateContainersToNode(getNode(node.getNodeID()));
}
}
break;
case APP_ADDED:
{
AppAddedSchedulerEvent appAddedEvent = (AppAddedSchedulerEvent) event;
String queueName =
resolveReservationQueueName(appAddedEvent.getQueue(),
appAddedEvent.getApplicationId(),
appAddedEvent.getReservationID());
if (queueName != null) {
if (!appAddedEvent.getIsAppRecovering()) {
addApplication(appAddedEvent.getApplicationId(), queueName,
appAddedEvent.getUser());
} else {
addApplicationOnRecovery(appAddedEvent.getApplicationId(), queueName,
appAddedEvent.getUser());
}
}
}
break;
case APP_REMOVED:
{
AppRemovedSchedulerEvent appRemovedEvent = (AppRemovedSchedulerEvent)event;
doneApplication(appRemovedEvent.getApplicationID(),
appRemovedEvent.getFinalState());
}
break;
case APP_ATTEMPT_ADDED:
{
AppAttemptAddedSchedulerEvent appAttemptAddedEvent =
(AppAttemptAddedSchedulerEvent) event;
addApplicationAttempt(appAttemptAddedEvent.getApplicationAttemptId(),
appAttemptAddedEvent.getTransferStateFromPreviousAttempt(),
appAttemptAddedEvent.getIsAttemptRecovering());
}
break;
case APP_ATTEMPT_REMOVED:
{
AppAttemptRemovedSchedulerEvent appAttemptRemovedEvent =
(AppAttemptRemovedSchedulerEvent) event;
doneApplicationAttempt(appAttemptRemovedEvent.getApplicationAttemptID(),
appAttemptRemovedEvent.getFinalAttemptState(),
appAttemptRemovedEvent.getKeepContainersAcrossAppAttempts());
}
break;
case CONTAINER_EXPIRED:
{
ContainerExpiredSchedulerEvent containerExpiredEvent =
(ContainerExpiredSchedulerEvent) event;
ContainerId containerId = containerExpiredEvent.getContainerId();
completedContainer(getRMContainer(containerId),
SchedulerUtils.createAbnormalContainerStatus(
containerId,
SchedulerUtils.EXPIRED_CONTAINER),
RMContainerEventType.EXPIRE);
}
break;
case DROP_RESERVATION:
{
ContainerPreemptEvent dropReservationEvent = (ContainerPreemptEvent)event;
RMContainer container = dropReservationEvent.getContainer();
dropContainerReservation(container);
}
break;
case PREEMPT_CONTAINER:
{
ContainerPreemptEvent preemptContainerEvent =
(ContainerPreemptEvent)event;
ApplicationAttemptId aid = preemptContainerEvent.getAppId();
RMContainer containerToBePreempted = preemptContainerEvent.getContainer();
preemptContainer(aid, containerToBePreempted);
}
break;
case KILL_CONTAINER:
{
ContainerPreemptEvent killContainerEvent = (ContainerPreemptEvent)event;
RMContainer containerToBeKilled = killContainerEvent.getContainer();
killContainer(containerToBeKilled);
}
break;
case CONTAINER_RESCHEDULED:
{
ContainerRescheduledEvent containerRescheduledEvent =
(ContainerRescheduledEvent) event;
RMContainer container = containerRescheduledEvent.getContainer();
recoverResourceRequestForContainer(container);
}
break;
default:
LOG.error("Invalid eventtype " + event.getType() + ". Ignoring!");
}
} private synchronized void addNode(RMNode nodeManager) {
FiCaSchedulerNode schedulerNode = new FiCaSchedulerNode(nodeManager,
usePortForNodeName, nodeManager.getNodeLabels());
this.nodes.put(nodeManager.getNodeID(), schedulerNode);
Resources.addTo(clusterResource, schedulerNode.getTotalResource()); // update this node to node label manager
if (labelManager != null) {
labelManager.activateNode(nodeManager.getNodeID(),
schedulerNode.getTotalResource());
} root.updateClusterResource(clusterResource, new ResourceLimits(
clusterResource));
int numNodes = numNodeManagers.incrementAndGet();
updateMaximumAllocation(schedulerNode, true); LOG.info("Added node " + nodeManager.getNodeAddress() +
" clusterResource: " + clusterResource); if (scheduleAsynchronously && numNodes == 1) {
asyncSchedulerThread.beginSchedule();
}
} private synchronized void removeNode(RMNode nodeInfo) {
// update this node to node label manager
if (labelManager != null) {
labelManager.deactivateNode(nodeInfo.getNodeID());
} FiCaSchedulerNode node = nodes.get(nodeInfo.getNodeID());
if (node == null) {
return;
}
Resources.subtractFrom(clusterResource, node.getTotalResource());
root.updateClusterResource(clusterResource, new ResourceLimits(
clusterResource));
int numNodes = numNodeManagers.decrementAndGet(); if (scheduleAsynchronously && numNodes == 0) {
asyncSchedulerThread.suspendSchedule();
} // Remove running containers
List<RMContainer> runningContainers = node.getRunningContainers();
for (RMContainer container : runningContainers) {
completedContainer(container,
SchedulerUtils.createAbnormalContainerStatus(
container.getContainerId(),
SchedulerUtils.LOST_CONTAINER),
RMContainerEventType.KILL);
} // Remove reservations, if any
RMContainer reservedContainer = node.getReservedContainer();
if (reservedContainer != null) {
completedContainer(reservedContainer,
SchedulerUtils.createAbnormalContainerStatus(
reservedContainer.getContainerId(),
SchedulerUtils.LOST_CONTAINER),
RMContainerEventType.KILL);
} this.nodes.remove(nodeInfo.getNodeID());
updateMaximumAllocation(node, false); LOG.info("Removed node " + nodeInfo.getNodeAddress() +
" clusterResource: " + clusterResource);
} @Lock(CapacityScheduler.class)
@Override
protected synchronized void completedContainer(RMContainer rmContainer,
ContainerStatus containerStatus, RMContainerEventType event) {
if (rmContainer == null) {
LOG.info("Null container completed...");
return;
} Container container = rmContainer.getContainer(); // Get the application for the finished container
FiCaSchedulerApp application =
getCurrentAttemptForContainer(container.getId());
ApplicationId appId =
container.getId().getApplicationAttemptId().getApplicationId();
if (application == null) {
LOG.info("Container " + container + " of" + " unknown application "
+ appId + " completed with event " + event);
return;
} // Get the node on which the container was allocated
FiCaSchedulerNode node = getNode(container.getNodeId()); // Inform the queue
LeafQueue queue = (LeafQueue)application.getQueue();
queue.completedContainer(clusterResource, application, node,
rmContainer, containerStatus, event, null, true); LOG.info("Application attempt " + application.getApplicationAttemptId()
+ " released container " + container.getId() + " on node: " + node
+ " with event: " + event);
} @Lock(Lock.NoLock.class)
@VisibleForTesting
@Override
public FiCaSchedulerApp getApplicationAttempt(
ApplicationAttemptId applicationAttemptId) {
return super.getApplicationAttempt(applicationAttemptId);
} @Lock(Lock.NoLock.class)
public FiCaSchedulerNode getNode(NodeId nodeId) {
return nodes.get(nodeId);
} @Lock(Lock.NoLock.class)
Map<NodeId, FiCaSchedulerNode> getAllNodes() {
return nodes;
} @Override
@Lock(Lock.NoLock.class)
public void recover(RMState state) throws Exception {
// NOT IMPLEMENTED
} @Override
public void dropContainerReservation(RMContainer container) {
if(LOG.isDebugEnabled()){
LOG.debug("DROP_RESERVATION:" + container.toString());
}
completedContainer(container,
SchedulerUtils.createAbnormalContainerStatus(
container.getContainerId(),
SchedulerUtils.UNRESERVED_CONTAINER),
RMContainerEventType.KILL);
} @Override
public void preemptContainer(ApplicationAttemptId aid, RMContainer cont) {
if(LOG.isDebugEnabled()){
LOG.debug("PREEMPT_CONTAINER: application:" + aid.toString() +
" container: " + cont.toString());
}
FiCaSchedulerApp app = getApplicationAttempt(aid);
if (app != null) {
app.addPreemptContainer(cont.getContainerId());
}
} @Override
public void killContainer(RMContainer cont) {
if (LOG.isDebugEnabled()) {
LOG.debug("KILL_CONTAINER: container" + cont.toString());
}
completedContainer(cont, SchedulerUtils.createPreemptedContainerStatus(
cont.getContainerId(), SchedulerUtils.PREEMPTED_CONTAINER),
RMContainerEventType.KILL);
} @Override
public synchronized boolean checkAccess(UserGroupInformation callerUGI,
QueueACL acl, String queueName) {
CSQueue queue = getQueue(queueName);
if (queue == null) {
if (LOG.isDebugEnabled()) {
LOG.debug("ACL not found for queue access-type " + acl
+ " for queue " + queueName);
}
return false;
}
return queue.hasAccess(acl, callerUGI);
} @Override
public List<ApplicationAttemptId> getAppsInQueue(String queueName) {
CSQueue queue = queues.get(queueName);
if (queue == null) {
return null;
}
List<ApplicationAttemptId> apps = new ArrayList<ApplicationAttemptId>();
queue.collectSchedulerApplications(apps);
return apps;
} private CapacitySchedulerConfiguration loadCapacitySchedulerConfiguration(
Configuration configuration) throws IOException {
try {
InputStream CSInputStream =
this.rmContext.getConfigurationProvider()
.getConfigurationInputStream(configuration,
YarnConfiguration.CS_CONFIGURATION_FILE);
if (CSInputStream != null) {
configuration.addResource(CSInputStream);
return new CapacitySchedulerConfiguration(configuration, false);
}
return new CapacitySchedulerConfiguration(configuration, true);
} catch (Exception e) {
throw new IOException(e);
}
} private synchronized String resolveReservationQueueName(String queueName,
ApplicationId applicationId, ReservationId reservationID) {
CSQueue queue = getQueue(queueName);
// Check if the queue is a plan queue
if ((queue == null) || !(queue instanceof PlanQueue)) {
return queueName;
}
if (reservationID != null) {
String resQName = reservationID.toString();
queue = getQueue(resQName);
if (queue == null) {
String message =
"Application "
+ applicationId
+ " submitted to a reservation which is not yet currently active: "
+ resQName;
this.rmContext.getDispatcher().getEventHandler()
.handle(new RMAppEvent(applicationId,
RMAppEventType.APP_REJECTED, message));
return null;
}
if (!queue.getParent().getQueueName().equals(queueName)) {
String message =
"Application: " + applicationId + " submitted to a reservation "
+ resQName + " which does not belong to the specified queue: "
+ queueName;
this.rmContext.getDispatcher().getEventHandler()
.handle(new RMAppEvent(applicationId,
RMAppEventType.APP_REJECTED, message));
return null;
}
// use the reservation queue to run the app
queueName = resQName;
} else {
// use the default child queue of the plan for unreserved apps
queueName = queueName + ReservationConstants.DEFAULT_QUEUE_SUFFIX;
}
return queueName;
} @Override
public synchronized void removeQueue(String queueName)
throws SchedulerDynamicEditException {
LOG.info("Removing queue: " + queueName);
CSQueue q = this.getQueue(queueName);
if (!(q instanceof ReservationQueue)) {
throw new SchedulerDynamicEditException("The queue that we are asked "
+ "to remove (" + queueName + ") is not a ReservationQueue");
}
ReservationQueue disposableLeafQueue = (ReservationQueue) q;
// at this point we should have no more apps
if (disposableLeafQueue.getNumApplications() > 0) {
throw new SchedulerDynamicEditException("The queue " + queueName
+ " is not empty " + disposableLeafQueue.getApplications().size()
+ " active apps " + disposableLeafQueue.pendingApplications.size()
+ " pending apps");
} ((PlanQueue) disposableLeafQueue.getParent()).removeChildQueue(q);
this.queues.remove(queueName);
LOG.info("Removal of ReservationQueue " + queueName + " has succeeded");
} @Override
public synchronized void addQueue(Queue queue)
throws SchedulerDynamicEditException { if (!(queue instanceof ReservationQueue)) {
throw new SchedulerDynamicEditException("Queue " + queue.getQueueName()
+ " is not a ReservationQueue");
} ReservationQueue newQueue = (ReservationQueue) queue; if (newQueue.getParent() == null
|| !(newQueue.getParent() instanceof PlanQueue)) {
throw new SchedulerDynamicEditException("ParentQueue for "
+ newQueue.getQueueName()
+ " is not properly set (should be set and be a PlanQueue)");
} PlanQueue parentPlan = (PlanQueue) newQueue.getParent();
String queuename = newQueue.getQueueName();
parentPlan.addChildQueue(newQueue);
this.queues.put(queuename, newQueue);
LOG.info("Creation of ReservationQueue " + newQueue + " succeeded");
} @Override
public synchronized void setEntitlement(String inQueue,
QueueEntitlement entitlement) throws SchedulerDynamicEditException,
YarnException {
LeafQueue queue = getAndCheckLeafQueue(inQueue);
ParentQueue parent = (ParentQueue) queue.getParent(); if (!(queue instanceof ReservationQueue)) {
throw new SchedulerDynamicEditException("Entitlement can not be"
+ " modified dynamically since queue " + inQueue
+ " is not a ReservationQueue");
} if (!(parent instanceof PlanQueue)) {
throw new SchedulerDynamicEditException("The parent of ReservationQueue "
+ inQueue + " must be an PlanQueue");
} ReservationQueue newQueue = (ReservationQueue) queue; float sumChilds = ((PlanQueue) parent).sumOfChildCapacities();
float newChildCap = sumChilds - queue.getCapacity() + entitlement.getCapacity(); if (newChildCap >= 0 && newChildCap < 1.0f + CSQueueUtils.EPSILON) {
// note: epsilon checks here are not ok, as the epsilons might accumulate
// and become a problem in aggregate
if (Math.abs(entitlement.getCapacity() - queue.getCapacity()) == 0
&& Math.abs(entitlement.getMaxCapacity() - queue.getMaximumCapacity()) == 0) {
return;
}
newQueue.setEntitlement(entitlement);
} else {
throw new SchedulerDynamicEditException(
"Sum of child queues would exceed 100% for PlanQueue: "
+ parent.getQueueName());
}
LOG.info("Set entitlement for ReservationQueue " + inQueue + " to "
+ queue.getCapacity() + " request was (" + entitlement.getCapacity() + ")");
} @Override
public synchronized String moveApplication(ApplicationId appId,
String targetQueueName) throws YarnException {
FiCaSchedulerApp app =
getApplicationAttempt(ApplicationAttemptId.newInstance(appId, 0));
String sourceQueueName = app.getQueue().getQueueName();
LeafQueue source = getAndCheckLeafQueue(sourceQueueName);
String destQueueName = handleMoveToPlanQueue(targetQueueName);
LeafQueue dest = getAndCheckLeafQueue(destQueueName);
// Validation check - ACLs, submission limits for user & queue
String user = app.getUser();
try {
dest.submitApplication(appId, user, destQueueName);
} catch (AccessControlException e) {
throw new YarnException(e);
}
// Move all live containers
for (RMContainer rmContainer : app.getLiveContainers()) {
source.detachContainer(clusterResource, app, rmContainer);
// attach the Container to another queue
dest.attachContainer(clusterResource, app, rmContainer);
}
// Detach the application..
source.finishApplicationAttempt(app, sourceQueueName);
source.getParent().finishApplication(appId, app.getUser());
// Finish app & update metrics
app.move(dest);
// Submit to a new queue
dest.submitApplicationAttempt(app, user);
applications.get(appId).setQueue(dest);
LOG.info("App: " + app.getApplicationId() + " successfully moved from "
+ sourceQueueName + " to: " + destQueueName);
return targetQueueName;
} /**
* Check that the String provided in input is the name of an existing,
* LeafQueue, if successful returns the queue.
*
* @param queue
* @return the LeafQueue
* @throws YarnException
*/
private LeafQueue getAndCheckLeafQueue(String queue) throws YarnException {
CSQueue ret = this.getQueue(queue);
if (ret == null) {
throw new YarnException("The specified Queue: " + queue
+ " doesn't exist");
}
if (!(ret instanceof LeafQueue)) {
throw new YarnException("The specified Queue: " + queue
+ " is not a Leaf Queue. Move is supported only for Leaf Queues.");
}
return (LeafQueue) ret;
} /** {@inheritDoc} */
@Override
public EnumSet<SchedulerResourceTypes> getSchedulingResourceTypes() {
if (calculator.getClass().getName()
.equals(DefaultResourceCalculator.class.getName())) {
return EnumSet.of(SchedulerResourceTypes.MEMORY);
}
return EnumSet
.of(SchedulerResourceTypes.MEMORY, SchedulerResourceTypes.CPU);
} @Override
public Resource getMaximumResourceCapability(String queueName) {
CSQueue queue = getQueue(queueName);
if (queue == null) {
LOG.error("Unknown queue: " + queueName);
return getMaximumResourceCapability();
}
if (!(queue instanceof LeafQueue)) {
LOG.error("queue " + queueName + " is not an leaf queue");
return getMaximumResourceCapability();
}
return ((LeafQueue)queue).getMaximumAllocation();
} private String handleMoveToPlanQueue(String targetQueueName) {
CSQueue dest = getQueue(targetQueueName);
if (dest != null && dest instanceof PlanQueue) {
// use the default child reservation queue of the plan
targetQueueName = targetQueueName + ReservationConstants.DEFAULT_QUEUE_SUFFIX;
}
return targetQueueName;
} @Override
public Set<String> getPlanQueues() {
Set<String> ret = new HashSet<String>();
for (Map.Entry<String, CSQueue> l : queues.entrySet()) {
if (l.getValue() instanceof PlanQueue) {
ret.add(l.getKey());
}
}
return ret;
}
}
CapacityScheduler.java
CapacityScheduler.java 在 hadoop-2.7.3-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
其实由前面可知, ResourceManager 类的 createSchedulerEventDispatcher() 函数调用的是 ResourceManager 类的内部类 SchedulerEventDispatcher, 如下所示
//ResourceManager类的内部类SchedulerEventDispatcher
public SchedulerEventDispatcher(ResourceScheduler scheduler) {
super(SchedulerEventDispatcher.class.getName());
this.scheduler = scheduler;
this.eventProcessor = new Thread(new EventProcessor());
this.eventProcessor.setName("ResourceManager Event Processor");
}
会进一步调用内部类 SchedulerEventDispatcher 的内部类 EventProcessor, 如下所示:
//ResourceManager类的内部类SchedulerEventDispatcher的内部类EventProcessor
private final class EventProcessor implements Runnable {
@Override
public void run() { SchedulerEvent event; while (!stopped && !Thread.currentThread().isInterrupted()) {
try {
event = eventQueue.take();
} catch (InterruptedException e) {
LOG.error("Returning, interrupted : " + e);
return; // TODO: Kill RM.
} try {
scheduler.handle(event);
} catch (Throwable t) {
// An error occurred, but we are shutting down anyway.
// If it was an InterruptedException, the very act of
// shutdown could have caused it and is probably harmless.
if (stopped) {
LOG.warn("Exception during shutdown: ", t);
break;
}
LOG.fatal("Error in handling event type " + event.getType()
+ " to the scheduler", t);
if (shouldExitOnError
&& !ShutdownHookManager.get().isShutdownInProgress()) {
LOG.info("Exiting, bbye..");
System.exit(-1);
}
}
}
}
}
在函数run()内部, 有scheduler.handle(event), 我们知道,这个scheduler类型是默认调度起CapacityScheduler的对象。 所以最后调用的是 CapacityScheduler类的 handle()函数, 如下所示:
//CapacityScheduler.java
public void handle(SchedulerEvent event) {
switch(event.getType()) {
......
5 case APP_ADDED:
{
AppAddedSchedulerEvent appAddedEvent = (AppAddedSchedulerEvent) event;
String queueName =
resolveReservationQueueName(appAddedEvent.getQueue(),
appAddedEvent.getApplicationId(),
appAddedEvent.getReservationID());
if (queueName != null) {
if (!appAddedEvent.getIsAppRecovering()) {
addApplication(appAddedEvent.getApplicationId(), queueName,
appAddedEvent.getUser());
} else {
addApplicationOnRecovery(appAddedEvent.getApplicationId(), queueName,
appAddedEvent.getUser());
}
}
}
break;
.......
}
我们可以看到APP_ADDED对应的操作, 其中 handle()函数内部调用 addApplication(appAddedEvent.getApplicationId(), queueName, appAddedEvent.getUser()), 进入函数addApplication(),如下所示:
//CapacityScheduler.java
private synchronized void addApplication(ApplicationId applicationId,
String queueName, String user) {
queueName = getQueueMappings(applicationId, queueName, user);
if (queueName == null) {
// Exception encountered while getting queue mappings.
return;
}
// sanity checks.
CSQueue queue = getQueue(queueName);
if (queue == null) {
String message = "Application " + applicationId +
" submitted by user " + user + " to unknown queue: " + queueName;
this.rmContext.getDispatcher().getEventHandler()
.handle(new RMAppEvent(applicationId,
RMAppEventType.APP_REJECTED, message));
return;
}
if (!(queue instanceof LeafQueue)) {
String message = "Application " + applicationId +
" submitted by user " + user + " to non-leaf queue: " + queueName;
this.rmContext.getDispatcher().getEventHandler()
.handle(new RMAppEvent(applicationId,
RMAppEventType.APP_REJECTED, message));
return;
}
// Submit to the queue
try {
queue.submitApplication(applicationId, user, queueName);
} catch (AccessControlException ace) {
LOG.info("Failed to submit application " + applicationId + " to queue "
+ queueName + " from user " + user, ace);
this.rmContext.getDispatcher().getEventHandler()
.handle(new RMAppEvent(applicationId,
RMAppEventType.APP_REJECTED, ace.toString()));
return;
}
// update the metrics
queue.getMetrics().submitApp(user);
SchedulerApplication<FiCaSchedulerApp> application =
new SchedulerApplication<FiCaSchedulerApp>(queue, user);
applications.put(applicationId, application);
LOG.info("Accepted application " + applicationId + " from user: " + user
+ ", in queue: " + queueName);
rmContext.getDispatcher().getEventHandler()
.handle(new RMAppEvent(applicationId, RMAppEventType.APP_ACCEPTED));
}
该函数内部会调用 this.rmContext.getDispatcher().getEventHandler().handle(new RMAppEvent(applicationId, RMAppEventType.APP_REJECTED, message)), 前面我们类似的已经分析过,其中this.rmContext=RMContextImpl this.rmContext.getDispatcher()=AsyncDispatcher this.rmContext.getDispatcher().getEventHandler()=AsyncDispatcher$GenericEventHandler。 其中RMAppEventType事件已经在ResourceManager类的内部类RMActiveServices的serviceInit()函数中注册过,我们知道最后调用的是 RMAppImpl 类的 handle()函数, 进一步在RMAppImpl 类中, (上一次的状态是从RMAppState.NEW_SAVING 转变为 RMAppState.SUBMITTED)调用StateMachineFactory类的 .addTransition(。。。)函数。
同理rmContext.getDispatcher().getEventHandler().handle(new RMAppEvent(applicationId, RMAppEventType.APP_ACCEPTED)), 最后在 RMAppImpl 类中调用StateMachineFactory类的 .addTransition(。。。)函数。
我们知道,上一次的状态是RMAppState.SUBMITTED,具体如下所示:
//RMAppImpl.java
private static final StateMachineFactory<RMAppImpl,
RMAppState,
RMAppEventType,
RMAppEvent> stateMachineFactory
= new StateMachineFactory<RMAppImpl,
RMAppState,
RMAppEventType,
RMAppEvent>(RMAppState.NEW) // Transitions from NEW state
.......
.addTransition(RMAppState.NEW, RMAppState.NEW_SAVING,
RMAppEventType.START, new RMAppNewlySavingTransition())
...... // Transitions from NEW_SAVING state
......
.addTransition(RMAppState.NEW_SAVING, RMAppState.SUBMITTED,
RMAppEventType.APP_NEW_SAVED, new AddApplicationToSchedulerTransition())
...... // Transitions from SUBMITTED state
......
.addTransition(RMAppState.SUBMITTED, RMAppState.FINAL_SAVING,
RMAppEventType.APP_REJECTED,
new FinalSavingTransition(
new AppRejectedTransition(), RMAppState.FAILED))
.addTransition(RMAppState.SUBMITTED, RMAppState.ACCEPTED,
RMAppEventType.APP_ACCEPTED, new StartAppAttemptTransition())
......
.installTopology();
由此可知, 对于RMAppImpl收到RMAppEventType.APP_ACCEPTED 事件后,将自身的运行状态由 RMAppState.SUBMITTED转变为 RMAppState.ACCEPTED , 调用回调类是RMAppImpl类的内部类StartAppAttemptTransition, 如下所示:
//RMAppImpl.java 的内部类 StartAppAttemptTransition
private static final class StartAppAttemptTransition extends RMAppTransition {
@Override
public void transition(RMAppImpl app, RMAppEvent event) {
app.createAndStartNewAttempt(false);
};
}
进入函数createAndStartNewAttempt, 如下所示:
//RMAppImpl.java
private void createNewAttempt() {
ApplicationAttemptId appAttemptId =
ApplicationAttemptId.newInstance(applicationId, attempts.size() + 1);
RMAppAttempt attempt =
new RMAppAttemptImpl(appAttemptId, rmContext, scheduler, masterService,
submissionContext, conf,
// The newly created attempt maybe last attempt if (number of
// previously failed attempts(which should not include Preempted,
// hardware error and NM resync) + 1) equal to the max-attempt
// limit.
maxAppAttempts == (getNumFailedAppAttempts() + 1), amReq);
attempts.put(appAttemptId, attempt);
currentAttempt = attempt;
} private void
createAndStartNewAttempt(boolean transferStateFromPreviousAttempt) {
createNewAttempt();
handler.handle(new RMAppStartAttemptEvent(currentAttempt.getAppAttemptId(),
transferStateFromPreviousAttempt));
}
createNewAttempt()函数创建了一个运行实例对象RMAppAttemptImpl。 并且handler.handle(new RMAppStartAttemptEvent(currentAttempt.getAppAttemptId(), transferStateFromPreviousAttempt)),调用 RMAppStartAttemptEvent类,如下所示:
//RMAppStartAttemptEvent.java
public class RMAppStartAttemptEvent extends RMAppAttemptEvent { private final boolean transferStateFromPreviousAttempt; public RMAppStartAttemptEvent(ApplicationAttemptId appAttemptId,
boolean transferStateFromPreviousAttempt) {
super(appAttemptId, RMAppAttemptEventType.START);
this.transferStateFromPreviousAttempt = transferStateFromPreviousAttempt;
} public boolean getTransferStateFromPreviousAttempt() {
return transferStateFromPreviousAttempt;
}
}
其中事件类型RMAppAttemptEventType.START, 由于在ResourceManager中,将RMAppAttemptEventType类型的事件绑定到了ApplicationAttemptEventDispatcher类,如下所示:
public enum RMAppAttemptEventType {
// Source: RMApp
START,
KILL, // Source: AMLauncher
LAUNCHED,
LAUNCH_FAILED, // Source: AMLivelinessMonitor
EXPIRE, // Source: ApplicationMasterService
REGISTERED,
STATUS_UPDATE,
UNREGISTERED, // Source: Containers
CONTAINER_ALLOCATED,
CONTAINER_FINISHED, // Source: RMStateStore
ATTEMPT_NEW_SAVED,
ATTEMPT_UPDATE_SAVED, // Source: Scheduler
ATTEMPT_ADDED, // Source: RMAttemptImpl.recover
RECOVER }
RMAppAttemptEventType.java
RMAppAttemptEventType.java 在 hadoop-2.7.3-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptEventType.java
//ResourceManager.java的内部类RMActiveServices的serviceInit()函数
// Register event handler for RmAppAttemptEvents
rmDispatcher.register(RMAppAttemptEventType.class,
new ApplicationAttemptEventDispatcher(rmContext));
进入ApplicationAttemptEventDispatcher,如下所示:
//ResourceManager.java的内部类ApplicationAttemptEventDispatcher
@Private
public static final class ApplicationAttemptEventDispatcher implements
EventHandler<RMAppAttemptEvent> { private final RMContext rmContext; public ApplicationAttemptEventDispatcher(RMContext rmContext) {
this.rmContext = rmContext;
} @Override
public void handle(RMAppAttemptEvent event) {
ApplicationAttemptId appAttemptID = event.getApplicationAttemptId();
ApplicationId appAttemptId = appAttemptID.getApplicationId();
RMApp rmApp = this.rmContext.getRMApps().get(appAttemptId);
if (rmApp != null) {
RMAppAttempt rmAppAttempt = rmApp.getRMAppAttempt(appAttemptID);
if (rmAppAttempt != null) {
try {
rmAppAttempt.handle(event);
} catch (Throwable t) {
LOG.error("Error in handling event type " + event.getType()
+ " for applicationAttempt " + appAttemptId, t);
}
}
}
}
}
该类函数handle()函数内部调用 rmAppAttempt.handle(event), 其中rmAppAttempt 是 接口RMAppAttempt的对象, 该接口只有一个实现类 public class RMAppAttemptImpl implements RMAppAttempt, Recoverable {。。。}。 所以最终调用的是RMAppAttemptImpl类的handle()函数。
@SuppressWarnings({"unchecked", "rawtypes"})
public class RMAppAttemptImpl implements RMAppAttempt, Recoverable { private static final Log LOG = LogFactory.getLog(RMAppAttemptImpl.class); private static final RecordFactory recordFactory = RecordFactoryProvider
.getRecordFactory(null); public final static Priority AM_CONTAINER_PRIORITY = recordFactory
.newRecordInstance(Priority.class);
static {
AM_CONTAINER_PRIORITY.setPriority(0);
} private final StateMachine<RMAppAttemptState,
RMAppAttemptEventType,
RMAppAttemptEvent> stateMachine; private final RMContext rmContext;
private final EventHandler eventHandler;
private final YarnScheduler scheduler;
private final ApplicationMasterService masterService; private final ReadLock readLock;
private final WriteLock writeLock; private final ApplicationAttemptId applicationAttemptId;
private final ApplicationSubmissionContext submissionContext;
private Token<AMRMTokenIdentifier> amrmToken = null;
private volatile Integer amrmTokenKeyId = null;
private SecretKey clientTokenMasterKey = null; private ConcurrentMap<NodeId, List<ContainerStatus>>
justFinishedContainers =
new ConcurrentHashMap<NodeId, List<ContainerStatus>>();
// Tracks the previous finished containers that are waiting to be
// verified as received by the AM. If the AM sends the next allocate
// request it implicitly acks this list.
private ConcurrentMap<NodeId, List<ContainerStatus>>
finishedContainersSentToAM =
new ConcurrentHashMap<NodeId, List<ContainerStatus>>();
private Container masterContainer; private float progress = 0;
private String host = "N/A";
private int rpcPort = -1;
private String originalTrackingUrl = "N/A";
private String proxiedTrackingUrl = "N/A";
private long startTime = 0;
private long finishTime = 0;
private long launchAMStartTime = 0;
private long launchAMEndTime = 0; // Set to null initially. Will eventually get set
// if an RMAppAttemptUnregistrationEvent occurs
private FinalApplicationStatus finalStatus = null;
private final StringBuilder diagnostics = new StringBuilder();
private int amContainerExitStatus = ContainerExitStatus.INVALID; private Configuration conf;
// Since AM preemption, hardware error and NM resync are not counted towards
// AM failure count, even if this flag is true, a new attempt can still be
// re-created if this attempt is eventually failed because of preemption,
// hardware error or NM resync. So this flag indicates that this may be
// last attempt.
private final boolean maybeLastAttempt;
private static final ExpiredTransition EXPIRED_TRANSITION =
new ExpiredTransition(); private RMAppAttemptEvent eventCausingFinalSaving;
private RMAppAttemptState targetedFinalState;
private RMAppAttemptState recoveredFinalState;
private RMAppAttemptState stateBeforeFinalSaving;
private Object transitionTodo; private RMAppAttemptMetrics attemptMetrics = null;
private ResourceRequest amReq = null; private static final StateMachineFactory<RMAppAttemptImpl,
RMAppAttemptState,
RMAppAttemptEventType,
RMAppAttemptEvent>
stateMachineFactory = new StateMachineFactory<RMAppAttemptImpl,
RMAppAttemptState,
RMAppAttemptEventType,
RMAppAttemptEvent>(RMAppAttemptState.NEW) // Transitions from NEW State
.addTransition(RMAppAttemptState.NEW, RMAppAttemptState.SUBMITTED,
RMAppAttemptEventType.START, new AttemptStartedTransition())
.addTransition(RMAppAttemptState.NEW, RMAppAttemptState.FINAL_SAVING,
RMAppAttemptEventType.KILL,
new FinalSavingTransition(new BaseFinalTransition(
RMAppAttemptState.KILLED), RMAppAttemptState.KILLED))
.addTransition(RMAppAttemptState.NEW, RMAppAttemptState.FINAL_SAVING,
RMAppAttemptEventType.REGISTERED,
new FinalSavingTransition(
new UnexpectedAMRegisteredTransition(), RMAppAttemptState.FAILED))
.addTransition( RMAppAttemptState.NEW,
EnumSet.of(RMAppAttemptState.FINISHED, RMAppAttemptState.KILLED,
RMAppAttemptState.FAILED, RMAppAttemptState.LAUNCHED),
RMAppAttemptEventType.RECOVER, new AttemptRecoveredTransition()) // Transitions from SUBMITTED state
.addTransition(RMAppAttemptState.SUBMITTED,
EnumSet.of(RMAppAttemptState.LAUNCHED_UNMANAGED_SAVING,
RMAppAttemptState.SCHEDULED),
RMAppAttemptEventType.ATTEMPT_ADDED,
new ScheduleTransition())
.addTransition(RMAppAttemptState.SUBMITTED, RMAppAttemptState.FINAL_SAVING,
RMAppAttemptEventType.KILL,
new FinalSavingTransition(new BaseFinalTransition(
RMAppAttemptState.KILLED), RMAppAttemptState.KILLED))
.addTransition(RMAppAttemptState.SUBMITTED, RMAppAttemptState.FINAL_SAVING,
RMAppAttemptEventType.REGISTERED,
new FinalSavingTransition(
new UnexpectedAMRegisteredTransition(), RMAppAttemptState.FAILED)) // Transitions from SCHEDULED State
.addTransition(RMAppAttemptState.SCHEDULED,
EnumSet.of(RMAppAttemptState.ALLOCATED_SAVING,
RMAppAttemptState.SCHEDULED),
RMAppAttemptEventType.CONTAINER_ALLOCATED,
new AMContainerAllocatedTransition())
.addTransition(RMAppAttemptState.SCHEDULED, RMAppAttemptState.FINAL_SAVING,
RMAppAttemptEventType.KILL,
new FinalSavingTransition(new BaseFinalTransition(
RMAppAttemptState.KILLED), RMAppAttemptState.KILLED))
.addTransition(RMAppAttemptState.SCHEDULED,
RMAppAttemptState.FINAL_SAVING,
RMAppAttemptEventType.CONTAINER_FINISHED,
new FinalSavingTransition(
new AMContainerCrashedBeforeRunningTransition(),
RMAppAttemptState.FAILED)) // Transitions from ALLOCATED_SAVING State
.addTransition(RMAppAttemptState.ALLOCATED_SAVING,
RMAppAttemptState.ALLOCATED,
RMAppAttemptEventType.ATTEMPT_NEW_SAVED, new AttemptStoredTransition()) // App could be killed by the client. So need to handle this.
.addTransition(RMAppAttemptState.ALLOCATED_SAVING,
RMAppAttemptState.FINAL_SAVING,
RMAppAttemptEventType.KILL,
new FinalSavingTransition(new BaseFinalTransition(
RMAppAttemptState.KILLED), RMAppAttemptState.KILLED))
.addTransition(RMAppAttemptState.ALLOCATED_SAVING,
RMAppAttemptState.FINAL_SAVING,
RMAppAttemptEventType.CONTAINER_FINISHED,
new FinalSavingTransition(
new AMContainerCrashedBeforeRunningTransition(),
RMAppAttemptState.FAILED)) // Transitions from LAUNCHED_UNMANAGED_SAVING State
.addTransition(RMAppAttemptState.LAUNCHED_UNMANAGED_SAVING,
RMAppAttemptState.LAUNCHED,
RMAppAttemptEventType.ATTEMPT_NEW_SAVED,
new UnmanagedAMAttemptSavedTransition())
// attempt should not try to register in this state
.addTransition(RMAppAttemptState.LAUNCHED_UNMANAGED_SAVING,
RMAppAttemptState.FINAL_SAVING,
RMAppAttemptEventType.REGISTERED,
new FinalSavingTransition(
new UnexpectedAMRegisteredTransition(), RMAppAttemptState.FAILED))
// App could be killed by the client. So need to handle this.
.addTransition(RMAppAttemptState.LAUNCHED_UNMANAGED_SAVING,
RMAppAttemptState.FINAL_SAVING,
RMAppAttemptEventType.KILL,
new FinalSavingTransition(new BaseFinalTransition(
RMAppAttemptState.KILLED), RMAppAttemptState.KILLED)) // Transitions from ALLOCATED State
.addTransition(RMAppAttemptState.ALLOCATED, RMAppAttemptState.LAUNCHED,
RMAppAttemptEventType.LAUNCHED, new AMLaunchedTransition())
.addTransition(RMAppAttemptState.ALLOCATED, RMAppAttemptState.FINAL_SAVING,
RMAppAttemptEventType.LAUNCH_FAILED,
new FinalSavingTransition(new LaunchFailedTransition(),
RMAppAttemptState.FAILED))
.addTransition(RMAppAttemptState.ALLOCATED, RMAppAttemptState.FINAL_SAVING,
RMAppAttemptEventType.KILL,
new FinalSavingTransition(
new KillAllocatedAMTransition(), RMAppAttemptState.KILLED)) .addTransition(RMAppAttemptState.ALLOCATED, RMAppAttemptState.FINAL_SAVING,
RMAppAttemptEventType.CONTAINER_FINISHED,
new FinalSavingTransition(
new AMContainerCrashedBeforeRunningTransition(), RMAppAttemptState.FAILED)) // Transitions from LAUNCHED State
.addTransition(RMAppAttemptState.LAUNCHED, RMAppAttemptState.RUNNING,
RMAppAttemptEventType.REGISTERED, new AMRegisteredTransition())
.addTransition(RMAppAttemptState.LAUNCHED,
EnumSet.of(RMAppAttemptState.LAUNCHED, RMAppAttemptState.FINAL_SAVING),
RMAppAttemptEventType.CONTAINER_FINISHED,
new ContainerFinishedTransition(
new AMContainerCrashedBeforeRunningTransition(),
RMAppAttemptState.LAUNCHED))
.addTransition(
RMAppAttemptState.LAUNCHED, RMAppAttemptState.FINAL_SAVING,
RMAppAttemptEventType.EXPIRE,
new FinalSavingTransition(EXPIRED_TRANSITION,
RMAppAttemptState.FAILED))
.addTransition(RMAppAttemptState.LAUNCHED, RMAppAttemptState.FINAL_SAVING,
RMAppAttemptEventType.KILL,
new FinalSavingTransition(new FinalTransition(
RMAppAttemptState.KILLED), RMAppAttemptState.KILLED)) // Transitions from RUNNING State
.addTransition(RMAppAttemptState.RUNNING,
EnumSet.of(RMAppAttemptState.FINAL_SAVING, RMAppAttemptState.FINISHED),
RMAppAttemptEventType.UNREGISTERED, new AMUnregisteredTransition())
.addTransition(RMAppAttemptState.RUNNING, RMAppAttemptState.RUNNING,
RMAppAttemptEventType.STATUS_UPDATE, new StatusUpdateTransition())
.addTransition(RMAppAttemptState.RUNNING, RMAppAttemptState.RUNNING,
RMAppAttemptEventType.CONTAINER_ALLOCATED)
.addTransition(
RMAppAttemptState.RUNNING,
EnumSet.of(RMAppAttemptState.RUNNING, RMAppAttemptState.FINAL_SAVING),
RMAppAttemptEventType.CONTAINER_FINISHED,
new ContainerFinishedTransition(
new AMContainerCrashedAtRunningTransition(),
RMAppAttemptState.RUNNING))
.addTransition(
RMAppAttemptState.RUNNING, RMAppAttemptState.FINAL_SAVING,
RMAppAttemptEventType.EXPIRE,
new FinalSavingTransition(EXPIRED_TRANSITION,
RMAppAttemptState.FAILED))
.addTransition(
RMAppAttemptState.RUNNING, RMAppAttemptState.FINAL_SAVING,
RMAppAttemptEventType.KILL,
new FinalSavingTransition(new FinalTransition(
RMAppAttemptState.KILLED), RMAppAttemptState.KILLED)) // Transitions from FINAL_SAVING State
.addTransition(RMAppAttemptState.FINAL_SAVING,
EnumSet.of(RMAppAttemptState.FINISHING, RMAppAttemptState.FAILED,
RMAppAttemptState.KILLED, RMAppAttemptState.FINISHED),
RMAppAttemptEventType.ATTEMPT_UPDATE_SAVED,
new FinalStateSavedTransition())
.addTransition(RMAppAttemptState.FINAL_SAVING, RMAppAttemptState.FINAL_SAVING,
RMAppAttemptEventType.CONTAINER_FINISHED,
new ContainerFinishedAtFinalSavingTransition())
.addTransition(RMAppAttemptState.FINAL_SAVING, RMAppAttemptState.FINAL_SAVING,
RMAppAttemptEventType.EXPIRE,
new AMExpiredAtFinalSavingTransition())
.addTransition(RMAppAttemptState.FINAL_SAVING, RMAppAttemptState.FINAL_SAVING,
EnumSet.of(
RMAppAttemptEventType.UNREGISTERED,
RMAppAttemptEventType.STATUS_UPDATE,
RMAppAttemptEventType.LAUNCHED,
RMAppAttemptEventType.LAUNCH_FAILED,
// should be fixed to reject container allocate request at Final
// Saving in scheduler
RMAppAttemptEventType.CONTAINER_ALLOCATED,
RMAppAttemptEventType.ATTEMPT_NEW_SAVED,
RMAppAttemptEventType.KILL)) // Transitions from FAILED State
// For work-preserving AM restart, failed attempt are still capturing
// CONTAINER_FINISHED event and record the finished containers for the
// use by the next new attempt.
.addTransition(RMAppAttemptState.FAILED, RMAppAttemptState.FAILED,
RMAppAttemptEventType.CONTAINER_FINISHED,
new ContainerFinishedAtFinalStateTransition())
.addTransition(
RMAppAttemptState.FAILED,
RMAppAttemptState.FAILED,
EnumSet.of(
RMAppAttemptEventType.EXPIRE,
RMAppAttemptEventType.KILL,
RMAppAttemptEventType.UNREGISTERED,
RMAppAttemptEventType.STATUS_UPDATE,
RMAppAttemptEventType.CONTAINER_ALLOCATED)) // Transitions from FINISHING State
.addTransition(RMAppAttemptState.FINISHING,
EnumSet.of(RMAppAttemptState.FINISHING, RMAppAttemptState.FINISHED),
RMAppAttemptEventType.CONTAINER_FINISHED,
new AMFinishingContainerFinishedTransition())
.addTransition(RMAppAttemptState.FINISHING, RMAppAttemptState.FINISHED,
RMAppAttemptEventType.EXPIRE,
new FinalTransition(RMAppAttemptState.FINISHED))
.addTransition(RMAppAttemptState.FINISHING, RMAppAttemptState.FINISHING,
EnumSet.of(
RMAppAttemptEventType.UNREGISTERED,
RMAppAttemptEventType.STATUS_UPDATE,
RMAppAttemptEventType.CONTAINER_ALLOCATED,
// ignore Kill as we have already saved the final Finished state in
// state store.
RMAppAttemptEventType.KILL)) // Transitions from FINISHED State
.addTransition(
RMAppAttemptState.FINISHED,
RMAppAttemptState.FINISHED,
EnumSet.of(
RMAppAttemptEventType.EXPIRE,
RMAppAttemptEventType.UNREGISTERED,
RMAppAttemptEventType.CONTAINER_ALLOCATED,
RMAppAttemptEventType.KILL))
.addTransition(RMAppAttemptState.FINISHED,
RMAppAttemptState.FINISHED,
RMAppAttemptEventType.CONTAINER_FINISHED,
new ContainerFinishedAtFinalStateTransition()) // Transitions from KILLED State
.addTransition(
RMAppAttemptState.KILLED,
RMAppAttemptState.KILLED,
EnumSet.of(RMAppAttemptEventType.ATTEMPT_ADDED,
RMAppAttemptEventType.LAUNCHED,
RMAppAttemptEventType.LAUNCH_FAILED,
RMAppAttemptEventType.EXPIRE,
RMAppAttemptEventType.REGISTERED,
RMAppAttemptEventType.CONTAINER_ALLOCATED,
RMAppAttemptEventType.UNREGISTERED,
RMAppAttemptEventType.KILL,
RMAppAttemptEventType.STATUS_UPDATE))
.addTransition(RMAppAttemptState.KILLED,
RMAppAttemptState.KILLED,
RMAppAttemptEventType.CONTAINER_FINISHED,
new ContainerFinishedAtFinalStateTransition())
.installTopology(); public RMAppAttemptImpl(ApplicationAttemptId appAttemptId,
RMContext rmContext, YarnScheduler scheduler,
ApplicationMasterService masterService,
ApplicationSubmissionContext submissionContext,
Configuration conf, boolean maybeLastAttempt, ResourceRequest amReq) {
this.conf = conf;
this.applicationAttemptId = appAttemptId;
this.rmContext = rmContext;
this.eventHandler = rmContext.getDispatcher().getEventHandler();
this.submissionContext = submissionContext;
this.scheduler = scheduler;
this.masterService = masterService; ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
this.readLock = lock.readLock();
this.writeLock = lock.writeLock(); this.proxiedTrackingUrl = generateProxyUriWithScheme();
this.maybeLastAttempt = maybeLastAttempt;
this.stateMachine = stateMachineFactory.make(this); this.attemptMetrics =
new RMAppAttemptMetrics(applicationAttemptId, rmContext); this.amReq = amReq;
} @Override
public ApplicationAttemptId getAppAttemptId() {
return this.applicationAttemptId;
} @Override
public ApplicationSubmissionContext getSubmissionContext() {
return this.submissionContext;
} @Override
public FinalApplicationStatus getFinalApplicationStatus() {
this.readLock.lock();
try {
return this.finalStatus;
} finally {
this.readLock.unlock();
}
} @Override
public RMAppAttemptState getAppAttemptState() {
this.readLock.lock();
try {
return this.stateMachine.getCurrentState();
} finally {
this.readLock.unlock();
}
} @Override
public String getHost() {
this.readLock.lock(); try {
return this.host;
} finally {
this.readLock.unlock();
}
} @Override
public int getRpcPort() {
this.readLock.lock(); try {
return this.rpcPort;
} finally {
this.readLock.unlock();
}
} @Override
public String getTrackingUrl() {
this.readLock.lock();
try {
return (getSubmissionContext().getUnmanagedAM()) ?
this.originalTrackingUrl : this.proxiedTrackingUrl;
} finally {
this.readLock.unlock();
}
} @Override
public String getOriginalTrackingUrl() {
this.readLock.lock();
try {
return this.originalTrackingUrl;
} finally {
this.readLock.unlock();
}
} @Override
public String getWebProxyBase() {
this.readLock.lock();
try {
return ProxyUriUtils.getPath(applicationAttemptId.getApplicationId());
} finally {
this.readLock.unlock();
}
} private String generateProxyUriWithScheme() {
this.readLock.lock();
try {
final String scheme = WebAppUtils.getHttpSchemePrefix(conf);
String proxy = WebAppUtils.getProxyHostAndPort(conf);
URI proxyUri = ProxyUriUtils.getUriFromAMUrl(scheme, proxy);
URI result = ProxyUriUtils.getProxyUri(null, proxyUri,
applicationAttemptId.getApplicationId());
return result.toASCIIString();
} catch (URISyntaxException e) {
LOG.warn("Could not proxify the uri for "
+ applicationAttemptId.getApplicationId(), e);
return null;
} finally {
this.readLock.unlock();
}
} private void setTrackingUrlToRMAppPage(RMAppAttemptState stateToBeStored) {
originalTrackingUrl = pjoin(
WebAppUtils.getResolvedRMWebAppURLWithScheme(conf),
"cluster", "app", getAppAttemptId().getApplicationId());
switch (stateToBeStored) {
case KILLED:
case FAILED:
proxiedTrackingUrl = originalTrackingUrl;
break;
default:
break;
}
} private void setTrackingUrlToAHSPage(RMAppAttemptState stateToBeStored) {
originalTrackingUrl = pjoin(
WebAppUtils.getHttpSchemePrefix(conf) +
WebAppUtils.getAHSWebAppURLWithoutScheme(conf),
"applicationhistory", "app", getAppAttemptId().getApplicationId());
switch (stateToBeStored) {
case KILLED:
case FAILED:
proxiedTrackingUrl = originalTrackingUrl;
break;
default:
break;
}
} private void invalidateAMHostAndPort() {
this.host = "N/A";
this.rpcPort = -1;
} // This is only used for RMStateStore. Normal operation must invoke the secret
// manager to get the key and not use the local key directly.
@Override
public SecretKey getClientTokenMasterKey() {
return this.clientTokenMasterKey;
} @Override
public Token<AMRMTokenIdentifier> getAMRMToken() {
this.readLock.lock();
try {
return this.amrmToken;
} finally {
this.readLock.unlock();
}
} @Private
public void setAMRMToken(Token<AMRMTokenIdentifier> lastToken) {
this.writeLock.lock();
try {
this.amrmToken = lastToken;
this.amrmTokenKeyId = null;
} finally {
this.writeLock.unlock();
}
} @Private
public int getAMRMTokenKeyId() {
Integer keyId = this.amrmTokenKeyId;
if (keyId == null) {
this.readLock.lock();
try {
if (this.amrmToken == null) {
throw new YarnRuntimeException("Missing AMRM token for "
+ this.applicationAttemptId);
}
keyId = this.amrmToken.decodeIdentifier().getKeyId();
this.amrmTokenKeyId = keyId;
} catch (IOException e) {
throw new YarnRuntimeException("AMRM token decode error for "
+ this.applicationAttemptId, e);
} finally {
this.readLock.unlock();
}
}
return keyId;
} @Override
public Token<ClientToAMTokenIdentifier> createClientToken(String client) {
this.readLock.lock(); try {
Token<ClientToAMTokenIdentifier> token = null;
ClientToAMTokenSecretManagerInRM secretMgr =
this.rmContext.getClientToAMTokenSecretManager();
if (client != null &&
secretMgr.getMasterKey(this.applicationAttemptId) != null) {
token = new Token<ClientToAMTokenIdentifier>(
new ClientToAMTokenIdentifier(this.applicationAttemptId, client),
secretMgr);
}
return token;
} finally {
this.readLock.unlock();
}
} @Override
public String getDiagnostics() {
this.readLock.lock(); try {
return this.diagnostics.toString();
} finally {
this.readLock.unlock();
}
} public int getAMContainerExitStatus() {
this.readLock.lock();
try {
return this.amContainerExitStatus;
} finally {
this.readLock.unlock();
}
} @Override
public float getProgress() {
this.readLock.lock(); try {
return this.progress;
} finally {
this.readLock.unlock();
}
} @VisibleForTesting
@Override
public List<ContainerStatus> getJustFinishedContainers() {
this.readLock.lock();
try {
List<ContainerStatus> returnList = new ArrayList<ContainerStatus>();
for (Collection<ContainerStatus> containerStatusList :
justFinishedContainers.values()) {
returnList.addAll(containerStatusList);
}
return returnList;
} finally {
this.readLock.unlock();
}
} @Override
public ConcurrentMap<NodeId, List<ContainerStatus>>
getJustFinishedContainersReference
() {
this.readLock.lock();
try {
return this.justFinishedContainers;
} finally {
this.readLock.unlock();
}
} @Override
public ConcurrentMap<NodeId, List<ContainerStatus>>
getFinishedContainersSentToAMReference() {
this.readLock.lock();
try {
return this.finishedContainersSentToAM;
} finally {
this.readLock.unlock();
}
} @Override
public List<ContainerStatus> pullJustFinishedContainers() {
this.writeLock.lock(); try {
List<ContainerStatus> returnList = new ArrayList<ContainerStatus>(); // A new allocate means the AM received the previously sent
// finishedContainers. We can ack this to NM now
sendFinishedContainersToNM(); // Mark every containerStatus as being sent to AM though we may return
// only the ones that belong to the current attempt
boolean keepContainersAcressAttempts = this.submissionContext
.getKeepContainersAcrossApplicationAttempts();
for (NodeId nodeId:justFinishedContainers.keySet()) { // Clear and get current values
List<ContainerStatus> finishedContainers = justFinishedContainers.put
(nodeId, new ArrayList<ContainerStatus>()); if (keepContainersAcressAttempts) {
returnList.addAll(finishedContainers);
} else {
// Filter out containers from previous attempt
for (ContainerStatus containerStatus: finishedContainers) {
if (containerStatus.getContainerId().getApplicationAttemptId()
.equals(this.getAppAttemptId())) {
returnList.add(containerStatus);
}
}
} finishedContainersSentToAM.putIfAbsent(nodeId, new ArrayList
<ContainerStatus>());
finishedContainersSentToAM.get(nodeId).addAll(finishedContainers);
} return returnList;
} finally {
this.writeLock.unlock();
}
} @Override
public Container getMasterContainer() {
this.readLock.lock(); try {
return this.masterContainer;
} finally {
this.readLock.unlock();
}
} @InterfaceAudience.Private
@VisibleForTesting
public void setMasterContainer(Container container) {
masterContainer = container;
} @Override
public void handle(RMAppAttemptEvent event) { this.writeLock.lock(); try {
ApplicationAttemptId appAttemptID = event.getApplicationAttemptId();
LOG.debug("Processing event for " + appAttemptID + " of type "
+ event.getType());
final RMAppAttemptState oldState = getAppAttemptState();
try {
/* keep the master in sync with the state machine */
this.stateMachine.doTransition(event.getType(), event);
} catch (InvalidStateTransitonException e) {
LOG.error("Can't handle this event at current state", e);
/* TODO fail the application on the failed transition */
} if (oldState != getAppAttemptState()) {
LOG.info(appAttemptID + " State change from " + oldState + " to "
+ getAppAttemptState());
}
} finally {
this.writeLock.unlock();
}
} @Override
public ApplicationResourceUsageReport getApplicationResourceUsageReport() {
this.readLock.lock();
try {
ApplicationResourceUsageReport report =
scheduler.getAppResourceUsageReport(this.getAppAttemptId());
if (report == null) {
report = RMServerUtils.DUMMY_APPLICATION_RESOURCE_USAGE_REPORT;
}
AggregateAppResourceUsage resUsage =
this.attemptMetrics.getAggregateAppResourceUsage();
report.setMemorySeconds(resUsage.getMemorySeconds());
report.setVcoreSeconds(resUsage.getVcoreSeconds());
return report;
} finally {
this.readLock.unlock();
}
} @Override
public void recover(RMState state) {
ApplicationStateData appState =
state.getApplicationState().get(getAppAttemptId().getApplicationId());
ApplicationAttemptStateData attemptState =
appState.getAttempt(getAppAttemptId());
assert attemptState != null;
LOG.info("Recovering attempt: " + getAppAttemptId() + " with final state: "
+ attemptState.getState());
diagnostics.append("Attempt recovered after RM restart");
diagnostics.append(attemptState.getDiagnostics());
this.amContainerExitStatus = attemptState.getAMContainerExitStatus();
if (amContainerExitStatus == ContainerExitStatus.PREEMPTED) {
this.attemptMetrics.setIsPreempted();
} Credentials credentials = attemptState.getAppAttemptTokens();
setMasterContainer(attemptState.getMasterContainer());
recoverAppAttemptCredentials(credentials, attemptState.getState());
this.recoveredFinalState = attemptState.getState();
this.originalTrackingUrl = attemptState.getFinalTrackingUrl();
this.finalStatus = attemptState.getFinalApplicationStatus();
this.startTime = attemptState.getStartTime();
this.finishTime = attemptState.getFinishTime();
this.attemptMetrics.updateAggregateAppResourceUsage(
attemptState.getMemorySeconds(),attemptState.getVcoreSeconds());
} public void transferStateFromPreviousAttempt(RMAppAttempt attempt) {
this.justFinishedContainers = attempt.getJustFinishedContainersReference();
this.finishedContainersSentToAM =
attempt.getFinishedContainersSentToAMReference();
} private void recoverAppAttemptCredentials(Credentials appAttemptTokens,
RMAppAttemptState state) {
if (appAttemptTokens == null || state == RMAppAttemptState.FAILED
|| state == RMAppAttemptState.FINISHED
|| state == RMAppAttemptState.KILLED) {
return;
} if (UserGroupInformation.isSecurityEnabled()) {
byte[] clientTokenMasterKeyBytes = appAttemptTokens.getSecretKey(
RMStateStore.AM_CLIENT_TOKEN_MASTER_KEY_NAME);
if (clientTokenMasterKeyBytes != null) {
clientTokenMasterKey = rmContext.getClientToAMTokenSecretManager()
.registerMasterKey(applicationAttemptId, clientTokenMasterKeyBytes);
}
} setAMRMToken(rmContext.getAMRMTokenSecretManager().createAndGetAMRMToken(
applicationAttemptId));
} private static class BaseTransition implements
SingleArcTransition<RMAppAttemptImpl, RMAppAttemptEvent> { @Override
public void transition(RMAppAttemptImpl appAttempt,
RMAppAttemptEvent event) {
} } private static final class AttemptStartedTransition extends BaseTransition {
@Override
public void transition(RMAppAttemptImpl appAttempt,
RMAppAttemptEvent event) { boolean transferStateFromPreviousAttempt = false;
if (event instanceof RMAppStartAttemptEvent) {
transferStateFromPreviousAttempt =
((RMAppStartAttemptEvent) event)
.getTransferStateFromPreviousAttempt();
}
appAttempt.startTime = System.currentTimeMillis(); // Register with the ApplicationMasterService
appAttempt.masterService
.registerAppAttempt(appAttempt.applicationAttemptId); if (UserGroupInformation.isSecurityEnabled()) {
appAttempt.clientTokenMasterKey =
appAttempt.rmContext.getClientToAMTokenSecretManager()
.createMasterKey(appAttempt.applicationAttemptId);
} // Add the applicationAttempt to the scheduler and inform the scheduler
// whether to transfer the state from previous attempt.
appAttempt.eventHandler.handle(new AppAttemptAddedSchedulerEvent(
appAttempt.applicationAttemptId, transferStateFromPreviousAttempt));
}
} private static final List<ContainerId> EMPTY_CONTAINER_RELEASE_LIST =
new ArrayList<ContainerId>(); private static final List<ResourceRequest> EMPTY_CONTAINER_REQUEST_LIST =
new ArrayList<ResourceRequest>(); @VisibleForTesting
public static final class ScheduleTransition
implements
MultipleArcTransition<RMAppAttemptImpl, RMAppAttemptEvent, RMAppAttemptState> {
@Override
public RMAppAttemptState transition(RMAppAttemptImpl appAttempt,
RMAppAttemptEvent event) {
ApplicationSubmissionContext subCtx = appAttempt.submissionContext;
if (!subCtx.getUnmanagedAM()) {
// Need reset #containers before create new attempt, because this request
// will be passed to scheduler, and scheduler will deduct the number after
// AM container allocated // Currently, following fields are all hard code,
// TODO: change these fields when we want to support
// priority/resource-name/relax-locality specification for AM containers
// allocation.
appAttempt.amReq.setNumContainers(1);
appAttempt.amReq.setPriority(AM_CONTAINER_PRIORITY);
appAttempt.amReq.setResourceName(ResourceRequest.ANY);
appAttempt.amReq.setRelaxLocality(true); // AM resource has been checked when submission
Allocation amContainerAllocation =
appAttempt.scheduler.allocate(appAttempt.applicationAttemptId,
Collections.singletonList(appAttempt.amReq),
EMPTY_CONTAINER_RELEASE_LIST, null, null);
if (amContainerAllocation != null
&& amContainerAllocation.getContainers() != null) {
assert (amContainerAllocation.getContainers().size() == 0);
}
return RMAppAttemptState.SCHEDULED;
} else {
// save state and then go to LAUNCHED state
appAttempt.storeAttempt();
return RMAppAttemptState.LAUNCHED_UNMANAGED_SAVING;
}
}
} private static final class AMContainerAllocatedTransition
implements
MultipleArcTransition<RMAppAttemptImpl, RMAppAttemptEvent, RMAppAttemptState> {
@Override
public RMAppAttemptState transition(RMAppAttemptImpl appAttempt,
RMAppAttemptEvent event) {
// Acquire the AM container from the scheduler.
Allocation amContainerAllocation =
appAttempt.scheduler.allocate(appAttempt.applicationAttemptId,
EMPTY_CONTAINER_REQUEST_LIST, EMPTY_CONTAINER_RELEASE_LIST, null,
null);
// There must be at least one container allocated, because a
// CONTAINER_ALLOCATED is emitted after an RMContainer is constructed,
// and is put in SchedulerApplication#newlyAllocatedContainers. // Note that YarnScheduler#allocate is not guaranteed to be able to
// fetch it since container may not be fetchable for some reason like
// DNS unavailable causing container token not generated. As such, we
// return to the previous state and keep retry until am container is
// fetched.
if (amContainerAllocation.getContainers().size() == 0) {
appAttempt.retryFetchingAMContainer(appAttempt);
return RMAppAttemptState.SCHEDULED;
} // Set the masterContainer
appAttempt.setMasterContainer(amContainerAllocation.getContainers()
.get(0));
RMContainerImpl rmMasterContainer = (RMContainerImpl)appAttempt.scheduler
.getRMContainer(appAttempt.getMasterContainer().getId());
rmMasterContainer.setAMContainer(true);
// The node set in NMTokenSecrentManager is used for marking whether the
// NMToken has been issued for this node to the AM.
// When AM container was allocated to RM itself, the node which allocates
// this AM container was marked as the NMToken already sent. Thus,
// clear this node set so that the following allocate requests from AM are
// able to retrieve the corresponding NMToken.
appAttempt.rmContext.getNMTokenSecretManager()
.clearNodeSetForAttempt(appAttempt.applicationAttemptId);
appAttempt.getSubmissionContext().setResource(
appAttempt.getMasterContainer().getResource());
appAttempt.storeAttempt();
return RMAppAttemptState.ALLOCATED_SAVING;
}
} private void retryFetchingAMContainer(final RMAppAttemptImpl appAttempt) {
// start a new thread so that we are not blocking main dispatcher thread.
new Thread() {
@Override
public void run() {
try {
Thread.sleep(500);
} catch (InterruptedException e) {
LOG.warn("Interrupted while waiting to resend the"
+ " ContainerAllocated Event.");
}
appAttempt.eventHandler.handle(
new RMAppAttemptEvent(appAttempt.applicationAttemptId,
RMAppAttemptEventType.CONTAINER_ALLOCATED));
}
}.start();
} private static final class AttemptStoredTransition extends BaseTransition {
@Override
public void transition(RMAppAttemptImpl appAttempt,
RMAppAttemptEvent event) {
appAttempt.launchAttempt();
}
} private static class AttemptRecoveredTransition
implements
MultipleArcTransition<RMAppAttemptImpl, RMAppAttemptEvent, RMAppAttemptState> {
@Override
public RMAppAttemptState transition(RMAppAttemptImpl appAttempt,
RMAppAttemptEvent event) {
RMApp rmApp = appAttempt.rmContext.getRMApps().get(
appAttempt.getAppAttemptId().getApplicationId()); /*
* If last attempt recovered final state is null .. it means attempt was
* started but AM container may or may not have started / finished.
* Therefore we should wait for it to finish.
*/
if (appAttempt.recoveredFinalState != null) {
appAttempt.progress = 1.0f;
// We will replay the final attempt only if last attempt is in final
// state but application is not in final state.
if (rmApp.getCurrentAppAttempt() == appAttempt
&& !RMAppImpl.isAppInFinalState(rmApp)) {
// Add the previous finished attempt to scheduler synchronously so
// that scheduler knows the previous attempt.
appAttempt.scheduler.handle(new AppAttemptAddedSchedulerEvent(
appAttempt.getAppAttemptId(), false, true));
(new BaseFinalTransition(appAttempt.recoveredFinalState)).transition(
appAttempt, event);
}
return appAttempt.recoveredFinalState;
} else if (RMAppImpl.isAppInFinalState(rmApp)) {
// Somehow attempt final state was not saved but app final state was saved.
// Skip adding the attempt into scheduler
RMAppState appState = ((RMAppImpl) rmApp).getRecoveredFinalState();
LOG.warn(rmApp.getApplicationId() + " final state (" + appState
+ ") was recorded, but " + appAttempt.applicationAttemptId
+ " final state (" + appAttempt.recoveredFinalState
+ ") was not recorded.");
switch (appState) {
case FINISHED:
return RMAppAttemptState.FINISHED;
case FAILED:
return RMAppAttemptState.FAILED;
case KILLED:
return RMAppAttemptState.KILLED;
}
return RMAppAttemptState.FAILED;
} else{
// Add the current attempt to the scheduler.
if (appAttempt.rmContext.isWorkPreservingRecoveryEnabled()) {
// Need to register an app attempt before AM can register
appAttempt.masterService
.registerAppAttempt(appAttempt.applicationAttemptId); // Add attempt to scheduler synchronously to guarantee scheduler
// knows attempts before AM or NM re-registers.
appAttempt.scheduler.handle(new AppAttemptAddedSchedulerEvent(
appAttempt.getAppAttemptId(), false, true));
} /*
* Since the application attempt's final state is not saved that means
* for AM container (previous attempt) state must be one of these.
* 1) AM container may not have been launched (RM failed right before
* this).
* 2) AM container was successfully launched but may or may not have
* registered / unregistered.
* In whichever case we will wait (by moving attempt into LAUNCHED
* state) and mark this attempt failed (assuming non work preserving
* restart) only after
* 1) Node manager during re-registration heart beats back saying
* am container finished.
* 2) OR AMLivelinessMonitor expires this attempt (when am doesn't
* heart beat back).
*/
(new AMLaunchedTransition()).transition(appAttempt, event);
return RMAppAttemptState.LAUNCHED;
}
}
} private void rememberTargetTransitions(RMAppAttemptEvent event,
Object transitionToDo, RMAppAttemptState targetFinalState) {
transitionTodo = transitionToDo;
targetedFinalState = targetFinalState;
eventCausingFinalSaving = event;
} private void rememberTargetTransitionsAndStoreState(RMAppAttemptEvent event,
Object transitionToDo, RMAppAttemptState targetFinalState,
RMAppAttemptState stateToBeStored) { rememberTargetTransitions(event, transitionToDo, targetFinalState);
stateBeforeFinalSaving = getState(); // As of today, finalState, diagnostics, final-tracking-url and
// finalAppStatus are the only things that we store into the StateStore
// AFTER the initial saving on app-attempt-start
// These fields can be visible from outside only after they are saved in
// StateStore
String diags = null; // don't leave the tracking URL pointing to a non-existent AM
if (conf.getBoolean(YarnConfiguration.APPLICATION_HISTORY_ENABLED,
YarnConfiguration.DEFAULT_APPLICATION_HISTORY_ENABLED)) {
setTrackingUrlToAHSPage(stateToBeStored);
} else {
setTrackingUrlToRMAppPage(stateToBeStored);
}
String finalTrackingUrl = getOriginalTrackingUrl();
FinalApplicationStatus finalStatus = null;
int exitStatus = ContainerExitStatus.INVALID;
switch (event.getType()) {
case LAUNCH_FAILED:
diags = event.getDiagnosticMsg();
break;
case REGISTERED:
diags = getUnexpectedAMRegisteredDiagnostics();
break;
case UNREGISTERED:
RMAppAttemptUnregistrationEvent unregisterEvent =
(RMAppAttemptUnregistrationEvent) event;
diags = unregisterEvent.getDiagnosticMsg();
// reset finalTrackingUrl to url sent by am
finalTrackingUrl = sanitizeTrackingUrl(unregisterEvent.getFinalTrackingUrl());
finalStatus = unregisterEvent.getFinalApplicationStatus();
break;
case CONTAINER_FINISHED:
RMAppAttemptContainerFinishedEvent finishEvent =
(RMAppAttemptContainerFinishedEvent) event;
diags = getAMContainerCrashedDiagnostics(finishEvent);
exitStatus = finishEvent.getContainerStatus().getExitStatus();
break;
case KILL:
break;
case EXPIRE:
diags = getAMExpiredDiagnostics(event);
break;
default:
break;
}
AggregateAppResourceUsage resUsage =
this.attemptMetrics.getAggregateAppResourceUsage();
RMStateStore rmStore = rmContext.getStateStore();
setFinishTime(System.currentTimeMillis()); ApplicationAttemptStateData attemptState =
ApplicationAttemptStateData.newInstance(
applicationAttemptId, getMasterContainer(),
rmStore.getCredentialsFromAppAttempt(this),
startTime, stateToBeStored, finalTrackingUrl, diags,
finalStatus, exitStatus,
getFinishTime(), resUsage.getMemorySeconds(),
resUsage.getVcoreSeconds());
LOG.info("Updating application attempt " + applicationAttemptId
+ " with final state: " + targetedFinalState + ", and exit status: "
+ exitStatus);
rmStore.updateApplicationAttemptState(attemptState);
} private static class FinalSavingTransition extends BaseTransition { Object transitionToDo;
RMAppAttemptState targetedFinalState; public FinalSavingTransition(Object transitionToDo,
RMAppAttemptState targetedFinalState) {
this.transitionToDo = transitionToDo;
this.targetedFinalState = targetedFinalState;
} @Override
public void transition(RMAppAttemptImpl appAttempt, RMAppAttemptEvent event) {
// For cases Killed/Failed, targetedFinalState is the same as the state to
// be stored
appAttempt.rememberTargetTransitionsAndStoreState(event, transitionToDo,
targetedFinalState, targetedFinalState);
}
} private static class FinalStateSavedTransition implements
MultipleArcTransition<RMAppAttemptImpl, RMAppAttemptEvent, RMAppAttemptState> {
@Override
public RMAppAttemptState transition(RMAppAttemptImpl appAttempt,
RMAppAttemptEvent event) {
RMAppAttemptEvent causeEvent = appAttempt.eventCausingFinalSaving; if (appAttempt.transitionTodo instanceof SingleArcTransition) {
((SingleArcTransition) appAttempt.transitionTodo).transition(
appAttempt, causeEvent);
} else if (appAttempt.transitionTodo instanceof MultipleArcTransition) {
((MultipleArcTransition) appAttempt.transitionTodo).transition(
appAttempt, causeEvent);
}
return appAttempt.targetedFinalState;
}
} private static class BaseFinalTransition extends BaseTransition { private final RMAppAttemptState finalAttemptState; public BaseFinalTransition(RMAppAttemptState finalAttemptState) {
this.finalAttemptState = finalAttemptState;
} @Override
public void transition(RMAppAttemptImpl appAttempt,
RMAppAttemptEvent event) {
ApplicationAttemptId appAttemptId = appAttempt.getAppAttemptId(); // Tell the AMS. Unregister from the ApplicationMasterService
appAttempt.masterService.unregisterAttempt(appAttemptId); // Tell the application and the scheduler
ApplicationId applicationId = appAttemptId.getApplicationId();
RMAppEvent appEvent = null;
boolean keepContainersAcrossAppAttempts = false;
switch (finalAttemptState) {
case FINISHED:
{
appEvent =
new RMAppEvent(applicationId, RMAppEventType.ATTEMPT_FINISHED,
appAttempt.getDiagnostics());
}
break;
case KILLED:
{
appAttempt.invalidateAMHostAndPort();
// Forward diagnostics received in attempt kill event.
appEvent =
new RMAppFailedAttemptEvent(applicationId,
RMAppEventType.ATTEMPT_KILLED,
event.getDiagnosticMsg(), false);
}
break;
case FAILED:
{
appAttempt.invalidateAMHostAndPort(); if (appAttempt.submissionContext
.getKeepContainersAcrossApplicationAttempts()
&& !appAttempt.submissionContext.getUnmanagedAM()) {
// See if we should retain containers for non-unmanaged applications
if (!appAttempt.shouldCountTowardsMaxAttemptRetry()) {
// Premption, hardware failures, NM resync doesn't count towards
// app-failures and so we should retain containers.
keepContainersAcrossAppAttempts = true;
} else if (!appAttempt.maybeLastAttempt) {
// Not preemption, hardware failures or NM resync.
// Not last-attempt too - keep containers.
keepContainersAcrossAppAttempts = true;
}
}
appEvent =
new RMAppFailedAttemptEvent(applicationId,
RMAppEventType.ATTEMPT_FAILED, appAttempt.getDiagnostics(),
keepContainersAcrossAppAttempts); }
break;
default:
{
LOG.error("Cannot get this state!! Error!!");
}
break;
} appAttempt.eventHandler.handle(appEvent);
appAttempt.eventHandler.handle(new AppAttemptRemovedSchedulerEvent(
appAttemptId, finalAttemptState, keepContainersAcrossAppAttempts));
appAttempt.removeCredentials(appAttempt); appAttempt.rmContext.getRMApplicationHistoryWriter()
.applicationAttemptFinished(appAttempt, finalAttemptState);
appAttempt.rmContext.getSystemMetricsPublisher()
.appAttemptFinished(appAttempt, finalAttemptState,
appAttempt.rmContext.getRMApps().get(
appAttempt.applicationAttemptId.getApplicationId()),
System.currentTimeMillis());
}
} private static class AMLaunchedTransition extends BaseTransition {
@Override
public void transition(RMAppAttemptImpl appAttempt,
RMAppAttemptEvent event) {
if (event.getType() == RMAppAttemptEventType.LAUNCHED) {
appAttempt.launchAMEndTime = System.currentTimeMillis();
long delay = appAttempt.launchAMEndTime -
appAttempt.launchAMStartTime;
ClusterMetrics.getMetrics().addAMLaunchDelay(delay);
}
// Register with AMLivelinessMonitor
appAttempt.attemptLaunched(); // register the ClientTokenMasterKey after it is saved in the store,
// otherwise client may hold an invalid ClientToken after RM restarts.
if (UserGroupInformation.isSecurityEnabled()) {
appAttempt.rmContext.getClientToAMTokenSecretManager()
.registerApplication(appAttempt.getAppAttemptId(),
appAttempt.getClientTokenMasterKey());
}
}
} @Override
public boolean shouldCountTowardsMaxAttemptRetry() {
try {
this.readLock.lock();
int exitStatus = getAMContainerExitStatus();
return !(exitStatus == ContainerExitStatus.PREEMPTED
|| exitStatus == ContainerExitStatus.ABORTED
|| exitStatus == ContainerExitStatus.DISKS_FAILED
|| exitStatus == ContainerExitStatus.KILLED_BY_RESOURCEMANAGER);
} finally {
this.readLock.unlock();
}
} private static final class UnmanagedAMAttemptSavedTransition
extends AMLaunchedTransition {
@Override
public void transition(RMAppAttemptImpl appAttempt,
RMAppAttemptEvent event) {
// create AMRMToken
appAttempt.amrmToken =
appAttempt.rmContext.getAMRMTokenSecretManager().createAndGetAMRMToken(
appAttempt.applicationAttemptId); super.transition(appAttempt, event);
}
} private static final class LaunchFailedTransition extends BaseFinalTransition { public LaunchFailedTransition() {
super(RMAppAttemptState.FAILED);
} @Override
public void transition(RMAppAttemptImpl appAttempt,
RMAppAttemptEvent event) { // Use diagnostic from launcher
appAttempt.diagnostics.append(event.getDiagnosticMsg()); // Tell the app, scheduler
super.transition(appAttempt, event); }
} private static final class KillAllocatedAMTransition extends
BaseFinalTransition {
public KillAllocatedAMTransition() {
super(RMAppAttemptState.KILLED);
} @Override
public void transition(RMAppAttemptImpl appAttempt,
RMAppAttemptEvent event) { // Tell the application and scheduler
super.transition(appAttempt, event); // Tell the launcher to cleanup.
appAttempt.eventHandler.handle(new AMLauncherEvent(
AMLauncherEventType.CLEANUP, appAttempt)); }
} private static final class AMRegisteredTransition extends BaseTransition {
@Override
public void transition(RMAppAttemptImpl appAttempt,
RMAppAttemptEvent event) {
long delay = System.currentTimeMillis() - appAttempt.launchAMEndTime;
ClusterMetrics.getMetrics().addAMRegisterDelay(delay);
RMAppAttemptRegistrationEvent registrationEvent
= (RMAppAttemptRegistrationEvent) event;
appAttempt.host = registrationEvent.getHost();
appAttempt.rpcPort = registrationEvent.getRpcport();
appAttempt.originalTrackingUrl =
sanitizeTrackingUrl(registrationEvent.getTrackingurl()); // Let the app know
appAttempt.eventHandler.handle(new RMAppEvent(appAttempt
.getAppAttemptId().getApplicationId(),
RMAppEventType.ATTEMPT_REGISTERED)); // TODO:FIXME: Note for future. Unfortunately we only do a state-store
// write at AM launch time, so we don't save the AM's tracking URL anywhere
// as that would mean an extra state-store write. For now, we hope that in
// work-preserving restart, AMs are forced to reregister. appAttempt.rmContext.getRMApplicationHistoryWriter()
.applicationAttemptStarted(appAttempt);
appAttempt.rmContext.getSystemMetricsPublisher()
.appAttemptRegistered(appAttempt, System.currentTimeMillis());
}
} private static final class AMContainerCrashedBeforeRunningTransition extends
BaseFinalTransition { public AMContainerCrashedBeforeRunningTransition() {
super(RMAppAttemptState.FAILED);
} @Override
public void transition(RMAppAttemptImpl appAttempt,
RMAppAttemptEvent event) {
RMAppAttemptContainerFinishedEvent finishEvent =
((RMAppAttemptContainerFinishedEvent)event); // UnRegister from AMLivelinessMonitor
appAttempt.rmContext.getAMLivelinessMonitor().unregister(
appAttempt.getAppAttemptId()); // Setup diagnostic message and exit status
appAttempt.setAMContainerCrashedDiagnosticsAndExitStatus(finishEvent); // Tell the app, scheduler
super.transition(appAttempt, finishEvent);
}
} private void setAMContainerCrashedDiagnosticsAndExitStatus(
RMAppAttemptContainerFinishedEvent finishEvent) {
ContainerStatus status = finishEvent.getContainerStatus();
String diagnostics = getAMContainerCrashedDiagnostics(finishEvent);
this.diagnostics.append(diagnostics);
this.amContainerExitStatus = status.getExitStatus();
} private String getAMContainerCrashedDiagnostics(
RMAppAttemptContainerFinishedEvent finishEvent) {
ContainerStatus status = finishEvent.getContainerStatus();
StringBuilder diagnosticsBuilder = new StringBuilder();
diagnosticsBuilder.append("AM Container for ").append(
finishEvent.getApplicationAttemptId()).append(
" exited with ").append(" exitCode: ").append(status.getExitStatus()).
append("\n");
if (this.getTrackingUrl() != null) {
diagnosticsBuilder.append("For more detailed output,").append(
" check application tracking page:").append(
this.getTrackingUrl()).append(
"Then, click on links to logs of each attempt.\n");
}
diagnosticsBuilder.append("Diagnostics: ").append(status.getDiagnostics())
.append("Failing this attempt");
return diagnosticsBuilder.toString();
} private static class FinalTransition extends BaseFinalTransition { public FinalTransition(RMAppAttemptState finalAttemptState) {
super(finalAttemptState);
} @Override
public void transition(RMAppAttemptImpl appAttempt,
RMAppAttemptEvent event) { appAttempt.progress = 1.0f; // Tell the app and the scheduler
super.transition(appAttempt, event); // UnRegister from AMLivelinessMonitor. Perhaps for
// FAILING/KILLED/UnManaged AMs
appAttempt.rmContext.getAMLivelinessMonitor().unregister(
appAttempt.getAppAttemptId());
appAttempt.rmContext.getAMFinishingMonitor().unregister(
appAttempt.getAppAttemptId()); if(!appAttempt.submissionContext.getUnmanagedAM()) {
// Tell the launcher to cleanup.
appAttempt.eventHandler.handle(new AMLauncherEvent(
AMLauncherEventType.CLEANUP, appAttempt));
}
}
} private static class ExpiredTransition extends FinalTransition { public ExpiredTransition() {
super(RMAppAttemptState.FAILED);
} @Override
public void transition(RMAppAttemptImpl appAttempt,
RMAppAttemptEvent event) {
appAttempt.diagnostics.append(getAMExpiredDiagnostics(event));
super.transition(appAttempt, event);
}
} private static String getAMExpiredDiagnostics(RMAppAttemptEvent event) {
String diag =
"ApplicationMaster for attempt " + event.getApplicationAttemptId()
+ " timed out";
return diag;
} private static class UnexpectedAMRegisteredTransition extends
BaseFinalTransition { public UnexpectedAMRegisteredTransition() {
super(RMAppAttemptState.FAILED);
} @Override
public void transition(RMAppAttemptImpl appAttempt, RMAppAttemptEvent event) {
assert appAttempt.submissionContext.getUnmanagedAM();
appAttempt.diagnostics.append(getUnexpectedAMRegisteredDiagnostics());
super.transition(appAttempt, event);
} } private static String getUnexpectedAMRegisteredDiagnostics() {
return "Unmanaged AM must register after AM attempt reaches LAUNCHED state.";
} private static final class StatusUpdateTransition extends
BaseTransition {
@Override
public void transition(RMAppAttemptImpl appAttempt,
RMAppAttemptEvent event) { RMAppAttemptStatusupdateEvent statusUpdateEvent
= (RMAppAttemptStatusupdateEvent) event; // Update progress
appAttempt.progress = statusUpdateEvent.getProgress(); // Ping to AMLivelinessMonitor
appAttempt.rmContext.getAMLivelinessMonitor().receivedPing(
statusUpdateEvent.getApplicationAttemptId());
}
} private static final class AMUnregisteredTransition implements
MultipleArcTransition<RMAppAttemptImpl, RMAppAttemptEvent, RMAppAttemptState> { @Override
public RMAppAttemptState transition(RMAppAttemptImpl appAttempt,
RMAppAttemptEvent event) {
// Tell the app
if (appAttempt.getSubmissionContext().getUnmanagedAM()) {
// Unmanaged AMs have no container to wait for, so they skip
// the FINISHING state and go straight to FINISHED.
appAttempt.updateInfoOnAMUnregister(event);
new FinalTransition(RMAppAttemptState.FINISHED).transition(
appAttempt, event);
return RMAppAttemptState.FINISHED;
}
// Saving the attempt final state
appAttempt.rememberTargetTransitionsAndStoreState(event,
new FinalStateSavedAfterAMUnregisterTransition(),
RMAppAttemptState.FINISHING, RMAppAttemptState.FINISHED);
ApplicationId applicationId =
appAttempt.getAppAttemptId().getApplicationId(); // Tell the app immediately that AM is unregistering so that app itself
// can save its state as soon as possible. Whether we do it like this, or
// we wait till AppAttempt is saved, it doesn't make any difference on the
// app side w.r.t failure conditions. The only event going out of
// AppAttempt to App after this point of time is AM/AppAttempt Finished.
appAttempt.eventHandler.handle(new RMAppEvent(applicationId,
RMAppEventType.ATTEMPT_UNREGISTERED));
return RMAppAttemptState.FINAL_SAVING;
}
} private static class FinalStateSavedAfterAMUnregisterTransition extends
BaseTransition {
@Override
public void
transition(RMAppAttemptImpl appAttempt, RMAppAttemptEvent event) {
// Unregister from the AMlivenessMonitor and register with AMFinishingMonitor
appAttempt.rmContext.getAMLivelinessMonitor().unregister(
appAttempt.applicationAttemptId);
appAttempt.rmContext.getAMFinishingMonitor().register(
appAttempt.applicationAttemptId); // Do not make any more changes to this transition code. Make all changes
// to the following method. Unless you are absolutely sure that you have
// stuff to do that shouldn't be used by the callers of the following
// method.
appAttempt.updateInfoOnAMUnregister(event);
}
} private void updateInfoOnAMUnregister(RMAppAttemptEvent event) {
progress = 1.0f;
RMAppAttemptUnregistrationEvent unregisterEvent =
(RMAppAttemptUnregistrationEvent) event;
diagnostics.append(unregisterEvent.getDiagnosticMsg());
originalTrackingUrl = sanitizeTrackingUrl(unregisterEvent.getFinalTrackingUrl());
finalStatus = unregisterEvent.getFinalApplicationStatus();
} private static final class ContainerFinishedTransition
implements
MultipleArcTransition<RMAppAttemptImpl, RMAppAttemptEvent, RMAppAttemptState> { // The transition To Do after attempt final state is saved.
private BaseTransition transitionToDo;
private RMAppAttemptState currentState; public ContainerFinishedTransition(BaseTransition transitionToDo,
RMAppAttemptState currentState) {
this.transitionToDo = transitionToDo;
this.currentState = currentState;
} @Override
public RMAppAttemptState transition(RMAppAttemptImpl appAttempt,
RMAppAttemptEvent event) { RMAppAttemptContainerFinishedEvent containerFinishedEvent =
(RMAppAttemptContainerFinishedEvent) event;
ContainerStatus containerStatus =
containerFinishedEvent.getContainerStatus(); // Is this container the AmContainer? If the finished container is same as
// the AMContainer, AppAttempt fails
if (appAttempt.masterContainer != null
&& appAttempt.masterContainer.getId().equals(
containerStatus.getContainerId())) {
appAttempt.sendAMContainerToNM(appAttempt, containerFinishedEvent); // Remember the follow up transition and save the final attempt state.
appAttempt.rememberTargetTransitionsAndStoreState(event,
transitionToDo, RMAppAttemptState.FAILED, RMAppAttemptState.FAILED);
return RMAppAttemptState.FINAL_SAVING;
} // Add all finished containers so that they can be acked to NM
addJustFinishedContainer(appAttempt, containerFinishedEvent);
return this.currentState;
}
} // Ack NM to remove finished containers from context.
private void sendFinishedContainersToNM() {
for (NodeId nodeId : finishedContainersSentToAM.keySet()) { // Clear and get current values
List<ContainerStatus> currentSentContainers =
finishedContainersSentToAM.put(nodeId,
new ArrayList<ContainerStatus>());
List<ContainerId> containerIdList =
new ArrayList<ContainerId>(currentSentContainers.size());
for (ContainerStatus containerStatus : currentSentContainers) {
containerIdList.add(containerStatus.getContainerId());
}
eventHandler.handle(new RMNodeFinishedContainersPulledByAMEvent(nodeId,
containerIdList));
}
} // Add am container to the list so that am container instance will be
// removed from NMContext.
private void sendAMContainerToNM(RMAppAttemptImpl appAttempt,
RMAppAttemptContainerFinishedEvent containerFinishedEvent) {
NodeId nodeId = containerFinishedEvent.getNodeId();
finishedContainersSentToAM.putIfAbsent(nodeId,
new ArrayList<ContainerStatus>());
appAttempt.finishedContainersSentToAM.get(nodeId).add(
containerFinishedEvent.getContainerStatus());
if (!appAttempt.getSubmissionContext()
.getKeepContainersAcrossApplicationAttempts()) {
appAttempt.sendFinishedContainersToNM();
}
} private static void addJustFinishedContainer(RMAppAttemptImpl appAttempt,
RMAppAttemptContainerFinishedEvent containerFinishedEvent) {
appAttempt.justFinishedContainers.putIfAbsent(containerFinishedEvent
.getNodeId(), new ArrayList<ContainerStatus>());
appAttempt.justFinishedContainers.get(containerFinishedEvent
.getNodeId()).add(containerFinishedEvent.getContainerStatus());
} private static final class ContainerFinishedAtFinalStateTransition
extends BaseTransition {
@Override
public void
transition(RMAppAttemptImpl appAttempt, RMAppAttemptEvent event) {
RMAppAttemptContainerFinishedEvent containerFinishedEvent =
(RMAppAttemptContainerFinishedEvent) event; // Normal container. Add it in completed containers list
addJustFinishedContainer(appAttempt, containerFinishedEvent);
}
} private static class AMContainerCrashedAtRunningTransition extends
BaseTransition {
@Override
public void
transition(RMAppAttemptImpl appAttempt, RMAppAttemptEvent event) {
RMAppAttemptContainerFinishedEvent finishEvent =
(RMAppAttemptContainerFinishedEvent) event;
// container associated with AM. must not be unmanaged
assert appAttempt.submissionContext.getUnmanagedAM() == false;
// Setup diagnostic message and exit status
appAttempt.setAMContainerCrashedDiagnosticsAndExitStatus(finishEvent);
new FinalTransition(RMAppAttemptState.FAILED).transition(appAttempt,
event);
}
} private static final class AMFinishingContainerFinishedTransition
implements
MultipleArcTransition<RMAppAttemptImpl, RMAppAttemptEvent, RMAppAttemptState> { @Override
public RMAppAttemptState transition(RMAppAttemptImpl appAttempt,
RMAppAttemptEvent event) { RMAppAttemptContainerFinishedEvent containerFinishedEvent
= (RMAppAttemptContainerFinishedEvent) event;
ContainerStatus containerStatus =
containerFinishedEvent.getContainerStatus(); // Is this container the ApplicationMaster container?
if (appAttempt.masterContainer.getId().equals(
containerStatus.getContainerId())) {
new FinalTransition(RMAppAttemptState.FINISHED).transition(
appAttempt, containerFinishedEvent);
appAttempt.sendAMContainerToNM(appAttempt, containerFinishedEvent);
return RMAppAttemptState.FINISHED;
}
// Add all finished containers so that they can be acked to NM.
addJustFinishedContainer(appAttempt, containerFinishedEvent); return RMAppAttemptState.FINISHING;
}
} private static class ContainerFinishedAtFinalSavingTransition extends
BaseTransition {
@Override
public void
transition(RMAppAttemptImpl appAttempt, RMAppAttemptEvent event) {
RMAppAttemptContainerFinishedEvent containerFinishedEvent =
(RMAppAttemptContainerFinishedEvent) event;
ContainerStatus containerStatus =
containerFinishedEvent.getContainerStatus(); // If this is the AM container, it means the AM container is finished,
// but we are not yet acknowledged that the final state has been saved.
// Thus, we still return FINAL_SAVING state here.
if (appAttempt.masterContainer.getId().equals(
containerStatus.getContainerId())) {
appAttempt.sendAMContainerToNM(appAttempt, containerFinishedEvent); if (appAttempt.targetedFinalState.equals(RMAppAttemptState.FAILED)
|| appAttempt.targetedFinalState.equals(RMAppAttemptState.KILLED)) {
// ignore Container_Finished Event if we were supposed to reach
// FAILED/KILLED state.
return;
} // pass in the earlier AMUnregistered Event also, as this is needed for
// AMFinishedAfterFinalSavingTransition later on
appAttempt.rememberTargetTransitions(event,
new AMFinishedAfterFinalSavingTransition(
appAttempt.eventCausingFinalSaving), RMAppAttemptState.FINISHED);
return;
} // Add all finished containers so that they can be acked to NM.
addJustFinishedContainer(appAttempt, containerFinishedEvent);
}
} private static class AMFinishedAfterFinalSavingTransition extends
BaseTransition {
RMAppAttemptEvent amUnregisteredEvent;
public AMFinishedAfterFinalSavingTransition(
RMAppAttemptEvent amUnregisteredEvent) {
this.amUnregisteredEvent = amUnregisteredEvent;
} @Override
public void
transition(RMAppAttemptImpl appAttempt, RMAppAttemptEvent event) {
appAttempt.updateInfoOnAMUnregister(amUnregisteredEvent);
new FinalTransition(RMAppAttemptState.FINISHED).transition(appAttempt,
event);
}
} private static class AMExpiredAtFinalSavingTransition extends
BaseTransition {
@Override
public void
transition(RMAppAttemptImpl appAttempt, RMAppAttemptEvent event) {
if (appAttempt.targetedFinalState.equals(RMAppAttemptState.FAILED)
|| appAttempt.targetedFinalState.equals(RMAppAttemptState.KILLED)) {
// ignore Container_Finished Event if we were supposed to reach
// FAILED/KILLED state.
return;
} // pass in the earlier AMUnregistered Event also, as this is needed for
// AMFinishedAfterFinalSavingTransition later on
appAttempt.rememberTargetTransitions(event,
new AMFinishedAfterFinalSavingTransition(
appAttempt.eventCausingFinalSaving), RMAppAttemptState.FINISHED);
}
} @Override
public long getStartTime() {
this.readLock.lock();
try {
return this.startTime;
} finally {
this.readLock.unlock();
}
} @Override
public RMAppAttemptState getState() {
this.readLock.lock(); try {
return this.stateMachine.getCurrentState();
} finally {
this.readLock.unlock();
}
} @Override
public YarnApplicationAttemptState createApplicationAttemptState() {
RMAppAttemptState state = getState();
// If AppAttempt is in FINAL_SAVING state, return its previous state.
if (state.equals(RMAppAttemptState.FINAL_SAVING)) {
state = stateBeforeFinalSaving;
}
return RMServerUtils.createApplicationAttemptState(state);
} private void launchAttempt(){
launchAMStartTime = System.currentTimeMillis();
// Send event to launch the AM Container
eventHandler.handle(new AMLauncherEvent(AMLauncherEventType.LAUNCH, this));
} private void attemptLaunched() {
// Register with AMLivelinessMonitor
rmContext.getAMLivelinessMonitor().register(getAppAttemptId());
} private void storeAttempt() {
// store attempt data in a non-blocking manner to prevent dispatcher
// thread starvation and wait for state to be saved
LOG.info("Storing attempt: AppId: " +
getAppAttemptId().getApplicationId()
+ " AttemptId: " +
getAppAttemptId()
+ " MasterContainer: " + masterContainer);
rmContext.getStateStore().storeNewApplicationAttempt(this);
} private void removeCredentials(RMAppAttemptImpl appAttempt) {
// Unregister from the ClientToAMTokenSecretManager
if (UserGroupInformation.isSecurityEnabled()) {
appAttempt.rmContext.getClientToAMTokenSecretManager()
.unRegisterApplication(appAttempt.getAppAttemptId());
} // Remove the AppAttempt from the AMRMTokenSecretManager
appAttempt.rmContext.getAMRMTokenSecretManager()
.applicationMasterFinished(appAttempt.getAppAttemptId());
} private static String sanitizeTrackingUrl(String url) {
return (url == null || url.trim().isEmpty()) ? "N/A" : url;
} @Override
public ApplicationAttemptReport createApplicationAttemptReport() {
this.readLock.lock();
ApplicationAttemptReport attemptReport = null;
try {
// AM container maybe not yet allocated. and also unmangedAM doesn't have
// am container.
ContainerId amId =
masterContainer == null ? null : masterContainer.getId();
attemptReport = ApplicationAttemptReport.newInstance(this
.getAppAttemptId(), this.getHost(), this.getRpcPort(), this
.getTrackingUrl(), this.getOriginalTrackingUrl(), this.getDiagnostics(),
YarnApplicationAttemptState .valueOf(this.getState().toString()), amId);
} finally {
this.readLock.unlock();
}
return attemptReport;
} // for testing
public boolean mayBeLastAttempt() {
return maybeLastAttempt;
} @Override
public RMAppAttemptMetrics getRMAppAttemptMetrics() {
// didn't use read/write lock here because RMAppAttemptMetrics has its own
// lock
return attemptMetrics;
} @Override
public long getFinishTime() {
try {
this.readLock.lock();
return this.finishTime;
} finally {
this.readLock.unlock();
}
} private void setFinishTime(long finishTime) {
try {
this.writeLock.lock();
this.finishTime = finishTime;
} finally {
this.writeLock.unlock();
}
}
}
RMAppAttemptImpl.java
RMAppAttemptImpl.java 在 hadoop-2.7.3-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
RMAppAttemptImpl接受RMAppAttemptEventType.START事件后,进行一系列初始化工作。 在类RMAppAttemptImpl中,也有状态机工厂。 同前面的分析类似, 在RMAppAttemptImpl类的handle()函数如下所示:
// RMAppAttemptImpl.java
@Override
public void handle(RMAppAttemptEvent event) { this.writeLock.lock(); try {
ApplicationAttemptId appAttemptID = event.getApplicationAttemptId();
LOG.debug("Processing event for " + appAttemptID + " of type "
+ event.getType());
final RMAppAttemptState oldState = getAppAttemptState();
try {
/* keep the master in sync with the state machine */
this.stateMachine.doTransition(event.getType(), event);
} catch (InvalidStateTransitonException e) {
LOG.error("Can't handle this event at current state", e);
/* TODO fail the application on the failed transition */
} if (oldState != getAppAttemptState()) {
LOG.info(appAttemptID + " State change from " + oldState + " to "
+ getAppAttemptState());
}
} finally {
this.writeLock.unlock();
}
}
handle()函数内部调用 this.stateMachine.doTransition(event.getType(), event), 其中this.stateMachine 在 RMAppAttemptImpl类的构造函数中进行初始化, 如下所示:
//RMAppAttemptImpl.java
public RMAppAttemptImpl(ApplicationAttemptId appAttemptId,
RMContext rmContext, YarnScheduler scheduler,
ApplicationMasterService masterService,
ApplicationSubmissionContext submissionContext,
Configuration conf, boolean maybeLastAttempt, ResourceRequest amReq) {
this.conf = conf;
this.applicationAttemptId = appAttemptId;
this.rmContext = rmContext;
this.eventHandler = rmContext.getDispatcher().getEventHandler();
this.submissionContext = submissionContext;
this.scheduler = scheduler;
this.masterService = masterService; ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
this.readLock = lock.readLock();
this.writeLock = lock.writeLock(); this.proxiedTrackingUrl = generateProxyUriWithScheme();
this.maybeLastAttempt = maybeLastAttempt;
this.stateMachine = stateMachineFactory.make(this); this.attemptMetrics =
new RMAppAttemptMetrics(applicationAttemptId, rmContext); this.amReq = amReq;
}
this.stateMachine = stateMachineFactory.make(this), 会触发 StateMachineFactory类进行状态转换, 如下所示:
//RMAppAttemptImpl.java
private static final StateMachineFactory<RMAppAttemptImpl,
RMAppAttemptState,
RMAppAttemptEventType,
RMAppAttemptEvent>
stateMachineFactory = new StateMachineFactory<RMAppAttemptImpl,
RMAppAttemptState,
RMAppAttemptEventType,
RMAppAttemptEvent>(RMAppAttemptState.NEW) // Transitions from NEW State
.addTransition(RMAppAttemptState.NEW, RMAppAttemptState.SUBMITTED,
RMAppAttemptEventType.START, new AttemptStartedTransition())
......
.installTopology();
接受RMAppAttemptEventType.START事件,将自身状态由RMAppAttemptState.NEW转换为RMAppAttemptState.SUBMITTED,并调用AttemptStartedTransition。 如下所示:
public enum RMAppAttemptState {
NEW, SUBMITTED, SCHEDULED, ALLOCATED, LAUNCHED, FAILED, RUNNING, FINISHING,
FINISHED, KILLED, ALLOCATED_SAVING, LAUNCHED_UNMANAGED_SAVING, FINAL_SAVING
}
RMAppAttemptState.java
RMAppAttemptState.java 在 hadoop-2.7.3-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptState.java
//RMAppAttemptImpl.java 的内部类 AttemptStartedTransition
private static final class AttemptStartedTransition extends BaseTransition {
@Override
public void transition(RMAppAttemptImpl appAttempt,
RMAppAttemptEvent event) { boolean transferStateFromPreviousAttempt = false;
if (event instanceof RMAppStartAttemptEvent) {
transferStateFromPreviousAttempt =
((RMAppStartAttemptEvent) event)
.getTransferStateFromPreviousAttempt();
}
appAttempt.startTime = System.currentTimeMillis(); // Register with the ApplicationMasterService
appAttempt.masterService
.registerAppAttempt(appAttempt.applicationAttemptId); if (UserGroupInformation.isSecurityEnabled()) {
appAttempt.clientTokenMasterKey =
appAttempt.rmContext.getClientToAMTokenSecretManager()
.createMasterKey(appAttempt.applicationAttemptId);
} // Add the applicationAttempt to the scheduler and inform the scheduler
// whether to transfer the state from previous attempt.
appAttempt.eventHandler.handle(new AppAttemptAddedSchedulerEvent(
appAttempt.applicationAttemptId, transferStateFromPreviousAttempt));
}
}
最后会调用appAttempt.eventHandler.handle(new AppAttemptAddedSchedulerEvent( appAttempt.applicationAttemptId, transferStateFromPreviousAttempt)), 会调用AppAttemptAddedSchedulerEvent类。如下所示:
//AppAttemptAddedSchedulerEvent.java
public AppAttemptAddedSchedulerEvent(
ApplicationAttemptId applicationAttemptId,
boolean transferStateFromPreviousAttempt) {
this(applicationAttemptId, transferStateFromPreviousAttempt, false);
} public AppAttemptAddedSchedulerEvent(
ApplicationAttemptId applicationAttemptId,
boolean transferStateFromPreviousAttempt,
boolean isAttemptRecovering) {
super(SchedulerEventType.APP_ATTEMPT_ADDED);
this.applicationAttemptId = applicationAttemptId;
this.transferStateFromPreviousAttempt = transferStateFromPreviousAttempt;
this.isAttemptRecovering = isAttemptRecovering;
}
这里最后发送了一个SchedulerEventType.APP_ATTEMPT_ADDED事件。 其中SchedulerEventType类型事件是在ResourceManager类的内部类RMActiveServices的serviceInit()函数中进行绑定的,将SchedulerEventType类型的事件绑定到了EventHandler<SchedulerEvent> 的对象schedulerDispatcher上, 同上面的分析一样, 最终CapacityScheduler收到SchedulerEventType.APP_ATTEMPT_ADDED事件后,走到默认调度器CapacityScheduler中的handle()函数。 如下所示:
//CapacityScheduler.java
public void handle(SchedulerEvent event) {
switch(event.getType()) {
......
case APP_ATTEMPT_ADDED:
{
AppAttemptAddedSchedulerEvent appAttemptAddedEvent =
(AppAttemptAddedSchedulerEvent) event;
addApplicationAttempt(appAttemptAddedEvent.getApplicationAttemptId(),
appAttemptAddedEvent.getTransferStateFromPreviousAttempt(),
appAttemptAddedEvent.getIsAttemptRecovering());
}
break;
......
}
}
我们可以看到, 调用了addApplicationAttempt() 函数, 进入函数addApplicationAttempt(), 如下所示:
//CapacityScheduler.java
private synchronized void addApplicationAttempt(
ApplicationAttemptId applicationAttemptId,
boolean transferStateFromPreviousAttempt,
boolean isAttemptRecovering) {
SchedulerApplication<FiCaSchedulerApp> application =
applications.get(applicationAttemptId.getApplicationId());
if (application == null) {
LOG.warn("Application " + applicationAttemptId.getApplicationId() +
" cannot be found in scheduler.");
return;
}
CSQueue queue = (CSQueue) application.getQueue(); FiCaSchedulerApp attempt =
new FiCaSchedulerApp(applicationAttemptId, application.getUser(),
queue, queue.getActiveUsersManager(), rmContext);
if (transferStateFromPreviousAttempt) {
attempt.transferStateFromPreviousAttempt(application
.getCurrentAppAttempt());
}
application.setCurrentAppAttempt(attempt); queue.submitApplicationAttempt(attempt, application.getUser());
LOG.info("Added Application Attempt " + applicationAttemptId
+ " to scheduler from user " + application.getUser() + " in queue "
+ queue.getQueueName());
if (isAttemptRecovering) {
if (LOG.isDebugEnabled()) {
LOG.debug(applicationAttemptId
+ " is recovering. Skipping notifying ATTEMPT_ADDED");
}
} else {
rmContext.getDispatcher().getEventHandler().handle(
new RMAppAttemptEvent(applicationAttemptId,
RMAppAttemptEventType.ATTEMPT_ADDED));
}
}
其中addApplicationAttempt() 函数的24和25行是将运行实例加入到队列中, 并打印:例如 Added Application Attempt appattempt_1487944669971_0001_000001 to scheduler from user root in queue default
34~36行是发送事件RMAppAttemptEventType.ATTEMPT_ADDED给RMAppAttemptImpl。 具体分析同上, RMAppAttemptEventType类型事件是在ResourceManager类的内部类RMActiveServices的serviceInit()函数中进行绑定的,将RMAppAttemptEventType类型的事件绑定到了内部类ApplicationAttemptEventDispatcher, 该类内部会调用rmAppAttempt.handle(event), 即RMAppAttemptImpl类的handle()函数, 函数内部会调用this.stateMachine.doTransition(event.getType(), event), 触发StateMachineFactory类的转换事件, 我们知道,上一步的状态是RMAppAttemptState.SUBMITTED, 如下所示:
//RMAppAttemptImpl.java
private static final StateMachineFactory<RMAppAttemptImpl,
RMAppAttemptState,
RMAppAttemptEventType,
RMAppAttemptEvent>
stateMachineFactory = new StateMachineFactory<RMAppAttemptImpl,
RMAppAttemptState,
RMAppAttemptEventType,
RMAppAttemptEvent>(RMAppAttemptState.NEW) ......
// Transitions from SUBMITTED state
.addTransition(RMAppAttemptState.SUBMITTED,
EnumSet.of(RMAppAttemptState.LAUNCHED_UNMANAGED_SAVING,
RMAppAttemptState.SCHEDULED),
RMAppAttemptEventType.ATTEMPT_ADDED,
new ScheduleTransition()) ......
.installTopology();
将自身状态由RMAppAttemptState.SUBMITTED转换为EnumSet.of(RMAppAttemptState.LAUNCHED_UNMANAGED_SAVING, RMAppAttemptState.SCHEDULED),并调用ScheduleTransition。 如下所示:
// RMAppAttemptImpl.java 的内部类 ScheduleTransition
@VisibleForTesting
public static final class ScheduleTransition
implements
MultipleArcTransition<RMAppAttemptImpl, RMAppAttemptEvent, RMAppAttemptState> {
@Override
public RMAppAttemptState transition(RMAppAttemptImpl appAttempt,
RMAppAttemptEvent event) {
ApplicationSubmissionContext subCtx = appAttempt.submissionContext;
if (!subCtx.getUnmanagedAM()) {
// Need reset #containers before create new attempt, because this request
// will be passed to scheduler, and scheduler will deduct the number after
// AM container allocated // Currently, following fields are all hard code,
// TODO: change these fields when we want to support
// priority/resource-name/relax-locality specification for AM containers
// allocation.
appAttempt.amReq.setNumContainers(1);
appAttempt.amReq.setPriority(AM_CONTAINER_PRIORITY);
appAttempt.amReq.setResourceName(ResourceRequest.ANY);
appAttempt.amReq.setRelaxLocality(true); // AM resource has been checked when submission
Allocation amContainerAllocation =
appAttempt.scheduler.allocate(appAttempt.applicationAttemptId,
Collections.singletonList(appAttempt.amReq),
EMPTY_CONTAINER_RELEASE_LIST, null, null);
if (amContainerAllocation != null
&& amContainerAllocation.getContainers() != null) {
assert (amContainerAllocation.getContainers().size() == 0);
}
return RMAppAttemptState.SCHEDULED;
} else {
// save state and then go to LAUNCHED state
appAttempt.storeAttempt();
return RMAppAttemptState.LAUNCHED_UNMANAGED_SAVING;
}
}
}
为AM申请Container资源,该资源描述如19~22行, 即一个优先级为AM_CONTAINER_PRIORITY(值为0),可在任意节点上ResourceRequest.ANY。
该函数内部调用 Allocation amContainerAllocation = appAttempt.scheduler.allocate(appAttempt.applicationAttemptId, Collections.singletonList(appAttempt.amReq), EMPTY_CONTAINER_RELEASE_LIST, null, null), 即去scheduler里面执行allocate 然后会返回一个Allocation对象,会等NodeManager去heartBeat的时候,ResourceManager发现这个NM还有资源, 然后就assign这个Allocation到这个NM上面, 再去Launch AM 。
这里的allocate()即是接口 YarnScheduler.java 的allocate()函数, 我们前面已经分析过, 默认调度器类是 CapacityScheduler, 所以最终调用的是 接口YarnScheduler 的 实现类CapacityScheduler.java 的allocate()函数。如下所示:
//CapacityScheduler.java
@Override
@Lock(Lock.NoLock.class)
public Allocation allocate(ApplicationAttemptId applicationAttemptId,
List<ResourceRequest> ask, List<ContainerId> release,
List<String> blacklistAdditions, List<String> blacklistRemovals) { //
FiCaSchedulerApp application = getApplicationAttempt(applicationAttemptId);
if (application == null) {
LOG.info("Calling allocate on removed " +
"or non existant application " + applicationAttemptId);
return EMPTY_ALLOCATION;
} // Sanity check
//完整性检查
SchedulerUtils.normalizeRequests(
ask, getResourceCalculator(), getClusterResource(),
getMinimumResourceCapability(), getMaximumResourceCapability()); // Release containers
//释放容器
releaseContainers(release, application); synchronized (application) { // make sure we aren't stopping/removing the application
// when the allocate comes in
if (application.isStopped()) {
LOG.info("Calling allocate on a stopped " +
"application " + applicationAttemptId);
return EMPTY_ALLOCATION;
} if (!ask.isEmpty()) { if(LOG.isDebugEnabled()) {
LOG.debug("allocate: pre-update" +
" applicationAttemptId=" + applicationAttemptId +
" application=" + application);
}
application.showRequests(); // Update application requests
application.updateResourceRequests(ask); LOG.debug("allocate: post-update");
application.showRequests();
} if(LOG.isDebugEnabled()) {
LOG.debug("allocate:" +
" applicationAttemptId=" + applicationAttemptId +
" #ask=" + ask.size());
} application.updateBlacklist(blacklistAdditions, blacklistRemovals); return application.getAllocation(getResourceCalculator(),
clusterResource, getMinimumResourceCapability());
}
}
最后会调用application.getAllocation(getResourceCalculator(), clusterResource, getMinimumResourceCapability()), 看getAllocation()函数,如下所示:
//FiCaSchedulerApp.java
/**
* This method produces an Allocation that includes the current view
* of the resources that will be allocated to and preempted from this
* application.
*
* @param rc
* @param clusterResource
* @param minimumAllocation
* @return an allocation
*/
public synchronized Allocation getAllocation(ResourceCalculator rc,
Resource clusterResource, Resource minimumAllocation) { Set<ContainerId> currentContPreemption = Collections.unmodifiableSet(
new HashSet<ContainerId>(containersToPreempt));
containersToPreempt.clear();
Resource tot = Resource.newInstance(0, 0);
for(ContainerId c : currentContPreemption){
Resources.addTo(tot,
liveContainers.get(c).getContainer().getResource());
}
int numCont = (int) Math.ceil(
Resources.divide(rc, clusterResource, tot, minimumAllocation));
ResourceRequest rr = ResourceRequest.newInstance(
Priority.UNDEFINED, ResourceRequest.ANY,
minimumAllocation, numCont);
ContainersAndNMTokensAllocation allocation =
pullNewlyAllocatedContainersAndNMTokens();
Resource headroom = getHeadroom();
setApplicationHeadroomForMetrics(headroom);
return new Allocation(allocation.getContainerList(), headroom, null,
currentContPreemption, Collections.singletonList(rr),
allocation.getNMTokenList());
}
该方法会继续调用 ContainersAndNMTokensAllocation allocation = pullNewlyAllocatedContainersAndNMTokens(), 如下所示:
//SchedulerApplicationAttempt.java
// Create container token and NMToken altogether, if either of them fails for
// some reason like DNS unavailable, do not return this container and keep it
// in the newlyAllocatedContainers waiting to be refetched.
public synchronized ContainersAndNMTokensAllocation
pullNewlyAllocatedContainersAndNMTokens() {
List<Container> returnContainerList =
new ArrayList<Container>(newlyAllocatedContainers.size());
List<NMToken> nmTokens = new ArrayList<NMToken>();
for (Iterator<RMContainer> i = newlyAllocatedContainers.iterator(); i
.hasNext();) {
RMContainer rmContainer = i.next();
Container container = rmContainer.getContainer();
try {
// create container token and NMToken altogether.
container.setContainerToken(rmContext.getContainerTokenSecretManager()
.createContainerToken(container.getId(), container.getNodeId(),
getUser(), container.getResource(), container.getPriority(),
rmContainer.getCreationTime(), this.logAggregationContext));
NMToken nmToken =
rmContext.getNMTokenSecretManager().createAndGetNMToken(getUser(),
getApplicationAttemptId(), container);
if (nmToken != null) {
nmTokens.add(nmToken);
}
} catch (IllegalArgumentException e) {
// DNS might be down, skip returning this container.
LOG.error("Error trying to assign container token and NM token to" +
" an allocated container " + container.getId(), e);
continue;
}
returnContainerList.add(container);
i.remove();
rmContainer.handle(new RMContainerEvent(rmContainer.getContainerId(),
RMContainerEventType.ACQUIRED));
}
return new ContainersAndNMTokensAllocation(returnContainerList, nmTokens);
}
到这一步就遇到瓶颈,追踪中断,参考 参考1 参考2 参考3 分析如下:
我们使用的是 Capacity 调度器,CapacityScheduler.allocate() 方法的主要做两件事情:
- 调用 FicaSchedulerApp.updateResourceRequests() 更新 APP (指从调度器角度看的 APP) 的资源需求。
- 通过 FicaSchedulerApp.pullNewlyAllocatedContainersAndNMTokens() 把 FicaSchedulerApp.newlyAllocatedContainers 这个 List 中的Container取出来,封装后返回。
FicaSchedulerApp.newlyAllocatedContainers 这个数据结构中存放的,正是最近申请到的 Container 。那么,这个 List 中的元素是怎么来的呢,这要从 NM 的心跳说起。
也即此刻,某个node(称为“AM-NODE”)正好通过heartbeat向ResourceManager.ResourceTrackerService汇报自己所在节点的资源使用情况。
所以去NodeManager.java 中分析。 参考1 参考2 参考3 参考4
以下的分析先暂停,因为任务要求要分析调度器的源码。