We previously covered the Flume log-collection component, which is very simple to use. Here we only need to swap Flume's sink module for the Kafka sink. Let's find the sink configuration on the official Flume website.
After opening the Flume website, click Documentation -> Flume User Guide.
We just need to copy this sink configuration into our existing agent configuration file.
a1.sinks.k1.channel = c1
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.topic = mytopic
a1.sinks.k1.kafka.bootstrap.servers = localhost:9092
a1.sinks.k1.kafka.flumeBatchSize = 20
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.kafka.producer.linger.ms = 1
a1.sinks.k1.kafka.producer.compression.type = snappy
a1.conf
#bin/flume-ng agent -n a1 -f myagent/a1.conf -c conf -Dflume.root.logger=INFO,console
# Define the agent name and the names of its source, channel, and sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# Define the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /root/training/logs
# Define an interceptor that adds a timestamp to each event
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
# Define the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 100
a1.sinks.k1.channel = c1
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.topic = mytopic1
a1.sinks.k1.kafka.bootstrap.servers = bigdata111:9092,bigdata111:9093
a1.sinks.k1.kafka.flumeBatchSize = 20
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.kafka.producer.linger.ms = 1
a1.sinks.k1.kafka.producer.compression.type = snappy
# Wire the source, channel, and sink together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
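Before starting the agent, it is worth making sure the mytopic1 topic exists on the brokers (unless automatic topic creation is enabled). Here is a minimal sketch, assuming Kafka 2.2 or later where kafka-topics.sh accepts --bootstrap-server (older releases use --zookeeper instead); the partition count and replication factor are only illustrative:
# Create the topic the sink writes to; replication factor 2 matches the two brokers above
./kafka-topics.sh --create --bootstrap-server bigdata111:9092,bigdata111:9093 --topic mytopic1 --partitions 3 --replication-factor 2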
Put the a1.conf above into the flume/myagent directory under the Flume installation we extracted earlier, then start Flume with the command shown in the first line of the configuration file; the source will now monitor the file directory. Next, open a consumer client to consume the collected data:
./kafka-console-consumer.sh --bootstrap-server bigdata111:9092,bigdata111:9093 --topic mytopic1
Then add a file named data.txt under /root/training/logs with the following content (one safe way to drop the file in is sketched after the contents):
I love Beijing
I love china
Beijing is the capital of china
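Because the spooling directory source expects files to be complete and immutable once they appear in the monitored directory, a common pattern is to write the file somewhere else first and then move it in. A small sketch, where the temporary path /tmp/data.txt is just an example:
# Write the file outside the spool directory first, then move it in
cat > /tmp/data.txt <<EOF
I love Beijing
I love china
Beijing is the capital of china
EOF
mv /tmp/data.txt /root/training/logs/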
The consumer should now print the collected messages.
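As a quick sanity check, assuming the spooling directory source's default settings, Flume renames a file once it has finished reading it, so listing the directory should show the file with a .COMPLETED suffix:
# With the default fileSuffix, the ingested file is renamed to data.txt.COMPLETED
ls /root/training/logs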