03_MapReduce Framework Internals_3.2 Job Submission Flow (Source Code)

Hadoop 2. Job Submission Flow (Source Code)

1. The client runs the main method of the Driver class.
2. var configuration = new Configuration
   Reads the configuration files. Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml
3. val bool: Boolean = job.waitForCompletion(true)
   Submits the job to the cluster and waits for it to complete.
4. submit()
5. connect()
   Establishes the connection and obtains the cluster proxy object used to submit the Job.
   return new Cluster(getConfiguration())
   Reads the configuration and creates the cluster proxy.
   initialize(jobTrackAddr, conf)
   Determines whether the job runs in the local environment or on a YARN cluster.
6. return submitter.submitJobInternal(Job.this, cluster)
   Submits the Job to the selected cluster.
   1. Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf)
      Creates the staging directory on the target cluster and returns its path.
      Example: file:/tmp/hadoop/mapred/staging/dxm1446706250/.staging
   2. JobID jobId = submitClient.getNewJobID()
      Obtains the JobID.
   3. Path submitJobDir = new Path(jobStagingArea, jobId.toString());
      Builds the Job submit path from the JobID.
      Example: file:/tmp/hadoop/mapred/staging/dxm870750042/.staging/job_local870750042_0001
   4. copyAndConfigureFiles(job, submitJobDir);
      Uploads the configure files, libjars, jobjars, and archives pertaining to the job to that path.
      rUploader.uploadResources(job, jobSubmitDir)
   5. int maps = writeSplits(job, submitJobDir)
      Computes the input splits from the input files, generates the split plan files, and uploads them to the staging path: job.split, job.splitmetainfo
   6. writeConf(Configuration conf, Path jobFile)
      Uploads job.xml to the staging path.
      conf.writeXml(out)
   7. status = submitClient.submitJob(jobId, submitJobDir.toString(), job.getCredentials())
      Submits the job and returns the submission status.
   8. return isSuccessful()
      Back in waitForCompletion: once the Job has finished, returns true if it succeeded.
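To make the result of sub-steps 3, 5 and 6 concrete, here is a minimal sketch of the submit directory layout. It is not Hadoop code: it uses java.nio.file.Path as a stand-in for Hadoop's own Path class, drops the file: scheme from the example path, and the class and method names (SubmitDirLayout, submitJobDir, stagedFiles) are hypothetical. The file names and the example staging path are the ones given in the steps above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the Hadoop API.
class SubmitDirLayout {

    // Mirrors "new Path(jobStagingArea, jobId.toString())":
    // the submit directory is the staging area plus the job ID.
    static Path submitJobDir(Path jobStagingArea, String jobId) {
        return jobStagingArea.resolve(jobId);
    }

    // The files the walkthrough says end up in the submit directory:
    // the split plan (job.split, job.splitmetainfo) and job.xml.
    static List<Path> stagedFiles(Path submitJobDir) {
        List<Path> files = new ArrayList<>();
        files.add(submitJobDir.resolve("job.split"));
        files.add(submitJobDir.resolve("job.splitmetainfo"));
        files.add(submitJobDir.resolve("job.xml"));
        return files;
    }

    public static void main(String[] args) {
        // Example values taken from the local-run paths shown in the steps.
        Path staging = Paths.get("/tmp/hadoop/mapred/staging/dxm870750042/.staging");
        Path dir = submitJobDir(staging, "job_local870750042_0001");
        for (Path file : stagedFiles(dir)) {
            System.out.println(file);
        }
    }
}
```

Running main prints the three staged file paths under the job's submit directory. In a real submission, any jars and archives uploaded by copyAndConfigureFiles would sit alongside them.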
