一、OutputFormat
OutputFormat描述的是MapReduce的输出格式,它主要的任务是:
1.验证job输出格式的有效性,如:检查输出的目录是否存在。
2.通过实现RecordWriter,将输出的结果写到文件系统的文件中。
OutputFormat的主要是由三个抽象方法组成,下面根据源代码介绍每个方法的功能,源代码详解如下:
public abstract class OutputFormat<K, V> { /**
* Get the {@link RecordWriter} for the given task.
* 得到给定任务的K-V对,即RecordWriter。
* @param context the information about the current task.
* @return a {@link RecordWriter} to write the output for the job.
* @throws IOException
*/
public abstract RecordWriter<K, V> getRecordWriter(TaskAttemptContext context)
throws IOException, InterruptedException; /**
* Check for validity of the output-specification for the job.
* 为job检查输出格式的有效性。
* <p>This is to validate the output specification for the job when it is
* a job is submitted. Typically checks that it does not already exist,
* throwing an exception when it already exists, so that output is not
* overwritten.</p>
* 这里,当job被提交时验证输出格式。实际上检查输出目录是否已经存在,当存在时抛出exception。
* 以至于原来的输出不会被覆盖。
* @param context information about the job
* @throws IOException when output should not be attempted
*/
public abstract void checkOutputSpecs(JobContext context) throws IOException, InterruptedException; /**
* Get the output committer for this output format. This is responsible
* for ensuring the output is committed correctly.
* 获得一个OutPutCommitter对象。这是用来确保输出被正确的提交。
* @param context the task context
* @return an output committer
* @throws IOException
* @throws InterruptedException
*/
public abstract OutputCommitter getOutputCommitter(TaskAttemptContext context)
throws IOException, InterruptedException;
}