Multi-Data-Source Aggregation Case Implementation | Study Notes

Study notes for the Developer Academy course [Data Collection System Flume: Multi-Data-Source Aggregation Case Implementation]. The notes follow the course closely so that users can pick up the material quickly.

Course link: https://developer.aliyun.com/learning/course/99/detail/1640


Multi-Data-Source Aggregation Case Implementation


0. Preparation

Distribute Flume:

[atguigu@hadoop102 module]$ xsync flume
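xsync is not a standard Linux command; it is the cluster-distribution helper script used throughout this course series. A minimal sketch of such a script, assuming rsync and passwordless SSH between hadoop102, hadoop103, and hadoop104 (the actual script from the course may differ):

#!/bin/bash
# Sync each argument to the same absolute path on every host in the cluster.
for host in hadoop102 hadoop103 hadoop104; do
  for file in "$@"; do
    if [ -e "$file" ]; then
      dir=$(cd -P "$(dirname "$file")" && pwd)   # resolve the parent directory
      name=$(basename "$file")
      ssh "$host" "mkdir -p $dir"                # ensure the target path exists
      rsync -av "$dir/$name" "$host:$dir"        # copy the file or directory
    fi
  done
done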

Create a group3 directory under /opt/module/flume/job on hadoop102, hadoop103, and hadoop104.

[atguigu@hadoop102 job]$ mkdir group3

[atguigu@hadoop103 job]$ mkdir group3

[atguigu@hadoop104 job]$ mkdir group3

 

1. Create flume1-logger-flume.conf

Configure a Source to monitor the group.log file and a Sink to forward the data to the next-level Flume agent.

Create and open the configuration file on hadoop103:

[atguigu@hadoop103 group3]$ touch flume1-logger-flume.conf

[atguigu@hadoop103 group3]$ vim flume1-logger-flume.conf

Add the following content:

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/module/group.log
a1.sources.r1.shell = /bin/bash -c

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop104
a1.sinks.k1.port = 4141

# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

 

2. Create flume2-netcat-flume.conf

Configure a Source to monitor the data stream on port 44444 and a Sink to forward the data to the next-level Flume agent.

Create and open the configuration file on hadoop102:

[atguigu@hadoop102 group3]$ touch flume2-netcat-flume.conf

[atguigu@hadoop102 group3]$ vim flume2-netcat-flume.conf

Add the following content:

# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# Describe/configure the source
a2.sources.r1.type = netcat
a2.sources.r1.bind = hadoop102
a2.sources.r1.port = 44444

# Describe the sink
a2.sinks.k1.type = avro
a2.sinks.k1.hostname = hadoop104
a2.sinks.k1.port = 4141

# Use a channel which buffers events in memory
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1

 

3. Create flume3-flume-logger.conf

Configure a Source to receive the data streams sent by flume1 and flume2, merge them, and Sink the combined result to the console, as sketched below.
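The three agents form a fan-in topology: the Avro sinks of flume1 and flume2 both point at the single Avro source of flume3, which merges the two streams.

flume1 (hadoop103, exec source: tail -F group.log) --avro--+
                                                           +--> flume3 (hadoop104, avro source :4141) --> logger sink (console)
flume2 (hadoop102, netcat source :44444) -----------avro--+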

Create and open the configuration file on hadoop104:

[atguigu@hadoop104 group3]$ touch flume3-flume-logger.conf

[atguigu@hadoop104 group3]$ vim flume3-flume-logger.conf

Add the following content:

# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c1

# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.bind = hadoop104
a3.sources.r1.port = 4141

# Describe the sink
a3.sinks.k1.type = logger

# Describe the channel
a3.channels.c1.type = memory
a3.channels.c1.capacity = 1000
a3.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r1.channels = c1
a3.sinks.k1.channel = c1

 

4. Run the configuration files

Start each agent with its configuration file, in this order: flume3-flume-logger.conf, then flume2-netcat-flume.conf, then flume1-logger-flume.conf. flume3 goes first because its Avro source is the server side of the connection; the Avro sinks in flume1 and flume2 can only connect once it is listening.

[atguigu@hadoop104 flume]$ bin/flume-ng agent --conf conf/ --name a3 --conf-file job/group3/flume3-flume-logger.conf -Dflume.root.logger=INFO,console

[atguigu@hadoop102 flume]$ bin/flume-ng agent --conf conf/ --name a2 --conf-file job/group3/flume2-netcat-flume.conf

[atguigu@hadoop103 flume]$ bin/flume-ng agent --conf conf/ --name a1 --conf-file job/group3/flume1-logger-flume.conf
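Before starting flume1 and flume2, it can help to confirm that the a3 agent is up and its Avro source is actually listening on port 4141. Two quick checks, assuming standard Linux tools (netstat and the JDK's jps) are available on hadoop104:

[atguigu@hadoop104 flume]$ sudo netstat -nltp | grep 4141    # Avro source should show a LISTEN socket
[atguigu@hadoop104 flume]$ jps -m | grep Application         # the Flume agent JVM (main class Application)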

 

5. Append content to group.log in the /opt/module directory on hadoop103

[atguigu@hadoop103 module]$ echo 'hello' >> group.log

 

6. Send data to port 44444 on hadoop102

[atguigu@hadoop102 flume]$ telnet hadoop102 44444
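Each line typed into the telnet session is sent as one Flume event. A session looks roughly like the following (the telnet banner varies by system, and the IP shown is a placeholder; the OK lines are the netcat source's per-event acknowledgement, enabled by default):

Trying 192.168.10.102...
Connected to hadoop102.
Escape character is '^]'.
hello flume
OK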


7. Check the data on hadoop104
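The exact output format depends on the Flume version, but events arriving from both paths (file appends via flume1, telnet input via flume2) typically appear in the a3 console as a headers map plus a hex-and-text dump of the event body, roughly:

Event: { headers:{} body: 68 65 6C 6C 6F                                  hello }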

 
