Study notes for the Developer Academy course "Data Collection System Flume: Multi-Source Aggregation Case Implementation", closely aligned with the course so you can pick up the material quickly.
Course URL: https://developer.aliyun.com/learning/course/99/detail/1640
Multi-Source Aggregation Case Implementation
0. Preparation
Distribute Flume:
[atguigu@hadoop102 module]$ xsync flume
Create a group3 folder under the /opt/module/flume/job directory on hadoop102, hadoop103, and hadoop104.
[atguigu@hadoop102 job]$ mkdir group3
[atguigu@hadoop103 job]$ mkdir group3
[atguigu@hadoop104 job]$ mkdir group3
1. Create flume1-logger-flume.conf
Configure a Source to monitor the group.log file and a Sink to forward the data to the next-tier Flume agent.
Create the configuration file on hadoop103 and open it:
[atguigu@hadoop103 group3]$ touch flume1-logger-flume.conf
[atguigu@hadoop103 group3]$ vim flume1-logger-flume.conf
Add the following content:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/module/group.log
a1.sources.r1.shell = /bin/bash -c
# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop104
a1.sinks.k1.port = 4141
# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
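The exec source above just runs `tail -F` and turns each newly appended line into an event. A minimal Python sketch of that follow-the-file behavior (the file name and helper function are hypothetical; a real `tail -F` also handles log rotation, which this sketch skips):

```python
import os

def follow_new_lines(path, pos):
    """Read any lines appended to `path` since byte offset `pos`.

    Returns (new_lines, new_pos) -- roughly one polling pass of
    `tail -F`, minus log-rotation handling.
    """
    with open(path, "r") as f:
        f.seek(pos)
        lines = [line.rstrip("\n") for line in f.readlines()]
        return lines, f.tell()

# Demo with a throwaway file standing in for /opt/module/group.log
path = "group_demo.log"
with open(path, "w") as f:
    f.write("old line\n")
_, pos = follow_new_lines(path, 0)   # start "tailing" from the current end
with open(path, "a") as f:
    f.write("hello\n")               # what step 5's echo append does
new_lines, pos = follow_new_lines(path, pos)
print(new_lines)                     # -> ['hello']
os.remove(path)
```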
2. Create flume2-netcat-flume.conf
Configure a Source to monitor the data stream on port 44444 and a Sink to forward the data to the next-tier Flume agent.
Create the configuration file on hadoop102 and open it:
[atguigu@hadoop102 group3]$ touch flume2-netcat-flume.conf
[atguigu@hadoop102 group3]$ vim flume2-netcat-flume.conf
Add the following content:
# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1
# Describe/configure the source
a2.sources.r1.type = netcat
a2.sources.r1.bind = hadoop102
a2.sources.r1.port = 44444
# Describe the sink
a2.sinks.k1.type = avro
a2.sinks.k1.hostname = hadoop104
a2.sinks.k1.port = 4141
# Use a channel which buffers events in memory
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1
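The netcat source listens on a plain TCP port and treats each newline-terminated line as one event, which is exactly what step 6's telnet exercises. A hedged Python sketch of that exchange, with a throwaway local listener on an ephemeral port standing in for hadoop102:44444:

```python
import socket
import threading

received = []

# Throwaway listener standing in for the netcat source
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))          # ephemeral port instead of 44444
server.listen(1)
port = server.getsockname()[1]

def accept_one():
    conn, _ = server.accept()
    with conn:
        data = conn.recv(1024)
        received.append(data.decode().strip())

t = threading.Thread(target=accept_one)
t.start()

# Client side: roughly what `telnet hadoop102 44444` does when you type a line
with socket.create_connection(("127.0.0.1", port)) as c:
    c.sendall(b"hello\n")

t.join()
server.close()
print(received)  # -> ['hello']
```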
3. Create flume3-flume-logger.conf
Configure a Source to receive the data streams sent by flume1 and flume2, and a Sink to write the merged result to the console.
Create the configuration file on hadoop104 and open it:
[atguigu@hadoop104 group3]$ touch flume3-flume-logger.conf
[atguigu@hadoop104 group3]$ vim flume3-flume-logger.conf
Add the following content:
# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c1
# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.bind = hadoop104
a3.sources.r1.port = 4141
# Describe the sink
a3.sinks.k1.type = logger
# Describe the channel
a3.channels.c1.type = memory
a3.channels.c1.capacity = 1000
a3.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a3.sources.r1.channels = c1
a3.sinks.k1.channel = c1
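The topology is a fan-in: flume1's and flume2's avro sinks both point at hadoop104:4141, and a3's single avro source merges both streams onto one channel before the logger sink prints them. A minimal Python sketch of that merge, with threads standing in for the two upstream agents and a queue standing in for channel c1 (all names are illustrative):

```python
import queue
import threading

channel = queue.Queue()   # stands in for a3's memory channel c1

def upstream(agent, events):
    # Each upstream agent's avro sink just pushes into the shared source
    for e in events:
        channel.put((agent, e))

t1 = threading.Thread(target=upstream, args=("flume1", ["hello from group.log"]))
t2 = threading.Thread(target=upstream, args=("flume2", ["hello from port 44444"]))
t1.start(); t2.start()
t1.join(); t2.join()

# The logger sink: drain the merged stream to the console
merged = []
while not channel.empty():
    merged.append(channel.get())
for agent, event in sorted(merged):
    print(agent, "->", event)
```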
4. Run the configuration files
Start the agents with their configuration files, in this order: flume3-flume-logger.conf, then flume2-netcat-flume.conf, then flume1-logger-flume.conf. flume3 goes first so that its avro source is already listening when the upstream avro sinks try to connect.
[atguigu@hadoop104 flume]$ bin/flume-ng agent --conf conf/ --name a3 --conf-file job/group3/flume3-flume-logger.conf -Dflume.root.logger=INFO,console
[atguigu@hadoop102 flume]$ bin/flume-ng agent --conf conf/ --name a2 --conf-file job/group3/flume2-netcat-flume.conf
[atguigu@hadoop103 flume]$ bin/flume-ng agent --conf conf/ --name a1 --conf-file job/group3/flume1-logger-flume.conf
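The upstream agents' avro sinks can only deliver once a3's avro source is listening on hadoop104:4141; until then they log connection errors and retry. A small hedged Python helper to check that a port is up before launching a1 and a2 (the host and port are the ones from this case; the helper itself is an illustration, not part of Flume):

```python
import socket

def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        return s.connect_ex((host, port)) == 0
    finally:
        s.close()

# Usage: before starting a1/a2, confirm a3's avro source is up, e.g.
# print(port_open("hadoop104", 4141))
```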
5. On hadoop103, append content to group.log in the /opt/module directory
[atguigu@hadoop103 module]$ echo 'hello' >> group.log
6. On hadoop102, send data to port 44444
[atguigu@hadoop102 flume]$ telnet hadoop102 44444