Hadoop WordCount Example

Create the file

  • Under the java folder, create a new Java file in the com.syh package
  • Write the following into WordCountApp.java (the file name has to match the public class WordCountApp)
package com.syh;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;
/**
 * Word frequency count
 */
public class WordCountApp {
    /**
     * Map phase: emit (word, 1) for every word in the input line
     */
    public static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        LongWritable one = new LongWritable(1);
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // Convert the input line to a String
            String line = value.toString();
            // Split the line on single spaces
            String[] s = line.split(" ");
            for (String word : s) {
                // Emit (word, 1)
                context.write(new Text(word), one);
            }
        }
    }
    /**
     * Reduce phase: sum the counts emitted for each word
     */
    public static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
            long sum = 0;
            // Merge the counts for this word
            for (LongWritable value : values) {
                // Accumulate the sum
                sum += value.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration, "wordcount");
        job.setJarByClass(WordCountApp.class);
        // Map-side settings
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        // Reduce-side settings
        job.setReducerClass(MyReducer.class);
        // The output key type is Text (the word), not the reducer class
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        Path outPath = new Path(args[1]);
        FileSystem fileSystem = FileSystem.get(configuration);
        if (fileSystem.exists(outPath)) {
            // Delete the existing output directory recursively
            fileSystem.delete(outPath, true);
            System.out.println("Output path already existed and has been deleted");
        }
        FileOutputFormat.setOutputPath(job, outPath);
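        // waitForCompletion(true) prints job progress to the console (pass false to suppress it)
        // Exit with 0 if the job succeeded, 1 otherwise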
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
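
To make the data flow easier to follow, here is a minimal plain-Java sketch of what the job computes. It uses no Hadoop classes and runs in a single JVM; the class name LocalWordCountSketch and its methods are made up for illustration and are not part of WordCountApp. The tokenization mirrors the line.split(" ") call in MyMapper, and a sorted map stands in for the shuffle step that groups identical keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: an in-memory stand-in for the map, shuffle and reduce steps.
class LocalWordCountSketch {

    // "Map" step: one token per space-separated piece, like line.split(" ") in MyMapper.
    static List<String> mapLine(String line) {
        List<String> words = new ArrayList<>();
        for (String word : line.split(" ")) {
            words.add(word);
        }
        return words;
    }

    // "Shuffle + reduce" step: group identical words and add 1 per occurrence.
    // The TreeMap keeps the words in sorted order.
    static Map<String, Long> countWords(List<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : mapLine(line)) {
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello world", "hello hadoop");
        // Print one "word<TAB>count" line per distinct word.
        countWords(lines).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

One thing the sketch makes visible: because the split is on a single space, two consecutive spaces (or an empty line) produce an empty-string token, and that token is counted like any other word. If that is not wanted, splitting on a whitespace pattern and skipping empty tokens is one possible adjustment.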

Package the program

  • Maven -> hadoopdemo -> Lifecycle -> package
  • Click package to start the build

Upload the JAR

  • Drag the packaged JAR into the virtual machine
  • Delete the output folder from the shell
hadoop fs -rm -r /output/wc
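  • Upload the JAR to the lib folder under the user's home directory and run it from there
Syntax:
hadoop jar <jar file> <fully qualified main class> <input path> <output path>
Example: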
hadoop jar hadoopdemo-1.0-SNAPSHOT.jar com.syh.WordCountApp hdfs://hadoop000:8020/WordCount.txt hdfs://hadoop000:8020/output/wc
  • The job runs to completion
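
In the command above, the two paths after the class name arrive in main as args[0] (input) and args[1] (output), assuming the JAR's manifest does not declare its own Main-Class. Since the driver deletes the output path when it already exists, a small guard before that step can be useful. The following is only a sketch: DriverArgsSketch and checkArgs are hypothetical names, not part of the original WordCountApp.

```java
// Hypothetical helper, shown as a standalone class so it runs without Hadoop.
class DriverArgsSketch {

    // Returns null when the arguments look usable, otherwise a message describing the problem.
    static String checkArgs(String[] args) {
        if (args == null || args.length != 2) {
            return "Usage: hadoop jar <jar file> com.syh.WordCountApp <input path> <output path>";
        }
        // The driver deletes an existing output path, so it must never equal the input.
        if (args[0].equals(args[1])) {
            return "Input and output paths must differ";
        }
        return null;
    }

    public static void main(String[] args) {
        String[] good = {"hdfs://hadoop000:8020/WordCount.txt", "hdfs://hadoop000:8020/output/wc"};
        String[] bad = {"hdfs://hadoop000:8020/WordCount.txt"};
        String goodResult = checkArgs(good);
        String badResult = checkArgs(bad);
        System.out.println(goodResult == null ? "arguments accepted" : goodResult);
        System.out.println(badResult == null ? "arguments accepted" : badResult);
    }
}
```

In a real driver, a check like this would sit at the top of main, before args[0] and args[1] are turned into Path objects.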

View the results

  • View them from the shell
hadoop fs -cat /output/wc/part-r-00000
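
The job does not set an output format, so the result file should use Hadoop's default text output: one line per word, with the word and its count separated by a tab. As a rough sketch of post-processing such lines (for example after copying the file to the local disk), the plain-Java snippet below parses them and picks the most frequent words. ResultParserSketch, parse and topN are illustrative names, and the sample lines are invented rather than taken from a real run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: parses "word<TAB>count" lines as written by the default text output.
class ResultParserSketch {

    // Split each line at the last tab, because a word itself may contain a tab
    // (the mapper splits on spaces only). Lines without a tab are skipped.
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue;
            }
            counts.put(line.substring(0, tab), Long.parseLong(line.substring(tab + 1)));
        }
        return counts;
    }

    // Return up to n entries ordered by count, highest first.
    static List<Map.Entry<String, Long>> topN(Map<String, Long> counts, int n) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) {
        // Invented sample data in the expected format.
        List<String> lines = Arrays.asList("hadoop\t1", "hello\t2", "world\t1");
        for (Map.Entry<String, Long> e : topN(parse(lines), 2)) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```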
