I. An example
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
# create sc with two working threads
sc = SparkContext("local[2]","test")
# create local StreamingContext with batch interval of 1 second
ssc = StreamingContext(sc,1)
# create DStream that connects to localhost:9999
lines = ssc.socketTextStream("localhost",9999)
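# split every incoming line into words
words = lines.flatMap(lambda line: line.split(" "))
# pair each word with a count of 1
pairs = words.map(lambda x: (x,1))
# sum the counts per word within each batch
wordcount = pairs.reduceByKey(lambda x,y: x+y)
# print the first 10 elements of each RDD in the DStream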
wordcount.pprint()
ssc.start()
ssc.awaitTermination()
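To make the three transformations concrete, here is a minimal sketch in plain Python (no Spark needed) of what they compute for a single one-second batch. The function name count_words_in_batch and the sample batch are made up for illustration and are not part of the PySpark API.

```python
def count_words_in_batch(lines):
    """Mimic flatMap -> map -> reduceByKey on one batch of lines (illustrative only)."""
    # flatMap: split every line and flatten the results into one list of words
    words = [w for line in lines for w in line.split(" ")]
    # map: pair each word with a count of 1
    pairs = [(w, 1) for w in words]
    # reduceByKey: add up the values that share the same key
    counts = {}
    for key, value in pairs:
        counts[key] = counts.get(key, 0) + value
    return counts

# hypothetical batch: two lines that arrived within the same interval
batch = ["kaka tt", "kaka"]
print(sorted(count_words_in_batch(batch).items()))
# prints [('kaka', 2), ('tt', 1)]
```

Note that reduceByKey in the streaming script works on one batch at a time, so the counts start again from zero in every batch rather than accumulating over the whole stream.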
Running it:
1. On Linux, first check whether port 9999 is already in use:
netstat -ntpl | grep 9999
2. Start a listener on port 9999:
nc -lk 9999
On Windows 10, use:
nc -l -p 9999
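3. Run the script in a new window, type some text into the first window (the one running nc), and watch the output printed in the new window: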
-------------------------------------------
Time: 2021-10-21 15:49:17
-------------------------------------------
('kaka', 2)
('tt', 1)
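If nc is not available, step 2 can probably be replaced with a small Python listener. The following is only a sketch under stated assumptions: open_listener, serve_lines and read_all are hypothetical helper names, the listener serves a single client and then closes (unlike nc -lk, which keeps listening), and read_all merely stands in for the Spark receiver so that the example can check itself on a free port.

```python
import socket
import threading

def open_listener(host="localhost", port=9999):
    """Open a listening TCP socket (hypothetical helper, not part of Spark)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    return server

def serve_lines(server, lines):
    """Accept one client and send it each line, newline-terminated."""
    conn, _addr = server.accept()
    with conn:
        for line in lines:
            conn.sendall((line + "\n").encode("utf-8"))

def read_all(host, port):
    """Stand-in for the receiving side: read until the sender closes."""
    chunks = []
    with socket.create_connection((host, port)) as client:
        while True:
            data = client.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")

# Self-check on a free port chosen by the OS (port=0), so port 9999 stays untouched
server = open_listener(host="127.0.0.1", port=0)
port = server.getsockname()[1]
sender = threading.Thread(target=serve_lines, args=(server, ["kaka tt", "kaka"]))
sender.start()
received = read_all("127.0.0.1", port)
sender.join()
server.close()
print(repr(received))
# prints 'kaka tt\nkaka\n'
```

To feed the streaming script instead of the self-check, one would call open_listener() with its defaults (localhost, 9999), start the script, and then call serve_lines(server, [...]) with the text to send. Since all lines are sent at once, they will most likely land in the same batch.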