Real-Time Data Collection: Integrating Flume and Kafka

1. Environment

Two servers: CAD01-ubuntu and CAD02-ubuntu
1) CAD01-ubuntu
Flume (version 1.8.0)
Zookeeper (version 3.4.10)
Kafka (version 2.4.0): already configured and running, with an existing topic named hello_topic
2) CAD02-ubuntu
Flume (version 1.8.0)

2. Workflow

Real-time data collection: a Flume agent on CAD02-ubuntu tails a local file with an exec source and forwards the events over Avro to CAD01-ubuntu, where a second Flume agent delivers them to the Kafka topic hello_topic.

3. Flume Configuration

1) CAD02-ubuntu: exec-memory-avro.conf

# agent a1, with source r1, sink k1, channel c1
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# source: tail the test file
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/gxx/test.txt
# sink: forward events over avro to CAD01-ubuntu
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = CAD01-ubuntu
a1.sinks.k1.port = 12364
# channel: in-memory buffer
a1.channels.c1.type = memory
# wire source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

2) CAD01-ubuntu: avro-memory-kafka.conf

# agent a1, with source r1, sink k1, channel c1
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# source: listen for avro events from CAD02-ubuntu
a1.sources.r1.type = avro
a1.sources.r1.bind = CAD01-ubuntu
a1.sources.r1.port = 12364
# sink: write events to Kafka
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.bootstrap.servers = CAD01-ubuntu:9092
a1.sinks.k1.kafka.topic = hello_topic
a1.sinks.k1.flumeBatchSize = 5
# channel: in-memory buffer
a1.channels.c1.type = memory
# wire source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

4. Starting Flume and Kafka; Testing

1) Start Zookeeper and Kafka, and create the topic
See: Kafka environment setup – single node, single broker.
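Condensed, that setup amounts to roughly the following commands on CAD01-ubuntu. This is a sketch, not the referenced post: `$ZOOKEEPER_HOME` and `$KAFKA_HOME` are assumed environment variables pointing at the standalone Zookeeper 3.4.10 and Kafka 2.4.0 installs, and the config file paths are the defaults shipped with each distribution.

```shell
# Start the standalone Zookeeper that the Kafka broker registers with
$ZOOKEEPER_HOME/bin/zkServer.sh start

# Start the Kafka broker in the background
$KAFKA_HOME/bin/kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties

# Create hello_topic (single node, so one partition and one replica)
$KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper CAD01-ubuntu:2181 \
  --replication-factor 1 --partitions 1 --topic hello_topic

# Confirm the topic exists
$KAFKA_HOME/bin/kafka-topics.sh --list --zookeeper CAD01-ubuntu:2181
```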
2) Start Flume
Start the agent on CAD01-ubuntu first (so its avro source is listening), then the one on CAD02-ubuntu:

# on CAD01-ubuntu
flume-ng agent --name a1 --conf /home/gxx/apache-flume-1.8.0-bin/conf --conf-file /home/gxx/apache-flume-1.8.0-bin/conf/avro-memory-kafka.conf -Dflume.root.logger=INFO,console
# on CAD02-ubuntu
flume-ng agent --name a1 --conf /home/gxx/apache-flume-1.8.0-bin/conf --conf-file /home/gxx/apache-flume-1.8.0-bin/conf/exec-memory-avro.conf -Dflume.root.logger=INFO,console
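Before starting the second agent, you can check from CAD02-ubuntu that the avro source on CAD01-ubuntu is actually listening (assuming netcat is installed; port 12364 is the one configured above):

```shell
# Probe the avro source port; prints a message only if the port is open
nc -z CAD01-ubuntu 12364 && echo "avro source reachable"
```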

3) Start a Kafka console consumer

$KAFKA_HOME/bin/kafka-console-consumer.sh --bootstrap-server CAD01-ubuntu:9092 --topic hello_topic --from-beginning
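If nothing arrives later during testing, it can help to rule Flume out first by producing a message to the topic directly with the console producer that ships with Kafka (in 2.4 the broker flag is `--broker-list`):

```shell
# Type a line and press Enter; it should appear in the console consumer
$KAFKA_HOME/bin/kafka-console-producer.sh --broker-list CAD01-ubuntu:9092 --topic hello_topic
```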

4) Test
Write data to /home/gxx/test.txt on CAD02-ubuntu and check that the Kafka consumer receives it; write several lines to be sure:

echo spark >> /home/gxx/test.txt
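To write several distinct lines at once, a small loop works. A sketch; /home/gxx/test.txt is the file tailed by the exec source on CAD02-ubuntu (the `mkdir -p` is only there so the snippet runs even where the directory does not exist yet):

```shell
# Append five numbered records so they are easy to spot in the consumer output
mkdir -p /home/gxx
for i in $(seq 1 5); do
  echo "spark-$i" >> /home/gxx/test.txt
done
# Show what was just appended
tail -n 5 /home/gxx/test.txt
```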