使用流处理实现WordCount,代码如下:
1 package com.jy.bjz.wc; 2 3 import org.apache.flink.api.java.tuple.Tuple2; 4 import org.apache.flink.streaming.api.datastream.DataStream; 5 import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; 6 7 /** 8 * TODO 9 * 10 * @author baojiazhong 11 * @since 2021/9/8 14:30 12 */ 13 public class StreamWordCount { 14 public static void main(String[] args) throws Exception { 15 // 创建流处理的执行环境 16 StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); 17 18 DataStream<String> inputDataStream = env.socketTextStream("10.0.4.42", 9999); 19 20 DataStream<Tuple2<String, Integer>> resultStream = inputDataStream.flatMap(new WordCount.myFlatMapper()) 21 .keyBy(0) 22 .sum(1); 23 24 resultStream.print().setParallelism(1); 25 26 env.execute(); 27 } 28 }
流处理和批处理的存在一些区别:
1.执行环境不同(StreamExecutionEnvironment)
2.获取数据的方式不同,流数据是*流数据,一般不可能直接从文件中读取,通过 socket 或者mq之类
3.不能使用group by
4.setParallelism(1) 设置并行度为1,不然本地输出会出现多线程编号