Displaying Hadoop Hot and Cold Data with ELK

Overall approach

The fsimage file is a persistent checkpoint of the Hadoop filesystem metadata; it contains the serialized inode information for every directory and file in the filesystem, and it is updated automatically on the HDFS master node.
The fsimage file can be parsed with the hdfs oiv command, and the parsed output can then be loaded into ELK for detailed analysis of the cluster metadata.
The main steps of this approach:
1. Parse the latest fsimage file into a CSV file with the hdfs oiv command
2. Import the CSV file into ELK with Logstash
3. Analyze the imported CSV data
4. Based on the analysis, contact the data owners to check whether their data can be deleted

0. Node preparation

Configure passwordless SSH from the node that will process the data to the Hadoop master node, so that the latest HDFS metadata can be pulled automatically later.
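A minimal sketch of the passwordless-login setup, assuming the analysis node works as root (as in the rest of this article) and the master node is reachable as hadoop-master:

# generate a key pair on the analysis node if one does not exist yet
ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa
# push the public key to the Hadoop master node
ssh-copy-id root@hadoop-master
# verify that passwordless login works
ssh hadoop-master hostname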

1. Install the latest Hadoop

Purpose: the hdfs oiv tool already on the cluster is too limited and does not support the -delimiter and -p Delimited options.

1. Download and extract Hadoop
# download hadoop (set up a proxy first if needed)
wget -c https://mirrors.cnnic.cn/apache/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
tar -zxvf hadoop-3.3.0.tar.gz
mv hadoop-3.3.0 /usr/local
2. Test the hdfs oiv command
export JAVA_HOME=/usr
/usr/local/hadoop-3.3.0/bin/hdfs oiv --help
3. Configure the PATH environment variable

Edit /root/.bashrc and add the following:

export JAVA_HOME=/usr
export PATH="/usr/local/hadoop-3.3.0/bin":$PATH
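Reload the shell configuration and confirm the new binary is picked up (a quick sanity check, not part of the original steps):

source /root/.bashrc
which hdfs          # should now resolve to /usr/local/hadoop-3.3.0/bin/hdfs
hdfs oiv --help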

2. Fetch the latest fsimage and convert it to CSV

1. Get the fsimage
FsImage=`ssh hadoop-master "ls -art /disk1/hadoop/dfs/hadoop-nn1/current/fsimage* |grep -v md5|tail -1"`
mkdir /root/fsimage-analysis/fsimages
scp 172.20.24.39:$FsImage /root/fsimage-analysis/fsimages/fsimage_`date +%F`
2. hdfs oiv
hdfs oiv -i ../fsimages/fsimage_2021-03-02 -o fsimage.csv -p Delimited -delimiter ","
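The first line of the Delimited output is a header row; with Hadoop 3.x it should contain the columns used in the Logstash configuration below (Path, Replication, ModificationTime, AccessTime, PreferredBlockSize, BlocksCount, FileSize, NSQUOTA, DSQUOTA, Permission, UserName, GroupName). A quick way to eyeball the result:

head -1 fsimage.csv           # column header
wc -l fsimage.csv             # roughly one line per file/directory in HDFS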
Once both steps work, wrap them in a script and schedule it via /etc/crontab.

analysis.sh

#!/bin/bash
# JAVA_HOME must be set explicitly because cron does not source /root/.bashrc
export JAVA_HOME=/usr
# get the latest fsimage from the master node
FsImage=`ssh hadoop-master "ls -art /disk1/hadoop/dfs/hadoop-nn1/current/fsimage* | grep -v md5 | tail -1"`
echo $FsImage
mkdir -p /root/fsimage-analysis/csv
scp hadoop-master:$FsImage /root/fsimage-analysis/fsimage_`date +%F`
# parse the fsimage into csv (the .csv suffix matches the path pattern used by Logstash below)
/usr/local/hadoop-3.3.0/bin/hdfs oiv -i /root/fsimage-analysis/fsimage_`date +%F` \
  -o /root/fsimage-analysis/csv/fsimage_`date +%F`.csv -p Delimited -delimiter "," > /root/fsimage-analysis/csv/log 2>&1

/etc/crontab

0 1 * * 1 root /bin/bash /root/fsimage-analysis/analysis.sh
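After the first scheduled run, confirm that the parse succeeded by checking the oiv log and the generated CSV (paths as used in analysis.sh above):

tail /root/fsimage-analysis/csv/log
ls -lh /root/fsimage-analysis/csv/fsimage_*.csv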

3. Import the CSV into ELK

1. Install Logstash

You could download the package and install it locally; here the rpm package from the offline mirror is used directly.

yum install http://mirrors.ayers.iflytek.cn/elk/7.10.1/rpm/logstash-7.10.1-x86_64.rpm
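To confirm the installation (a quick check, not part of the original article):

rpm -q logstash
/usr/share/logstash/bin/logstash --version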
2. Edit the Logstash conf file

vim /etc/logstash/conf.d/hadoop-logstash.conf

# 1. input: read the csv files
input {
	file{
		path => "/root/fsimage-analysis/csv/fsimage*.csv"
		start_position => "beginning"
		type => "csv_hadoop_fsimage"
		sincedb_path => "/dev/null"
	}
}
# 2. filter: parse and normalize the fields
filter {
	csv{
		separator => ","
		columns => ["Path","Replication","ModificationTime","AccessTime","PreferredBlockSize","BlocksCount","FileSize","NSQUOTA","DSQUOTA","Permission","UserName","GroupName"]
	}
	mutate {
		convert => {
			"Path" => "string"
			"Replication" => "string"
			"PreferredBlockSize" => "integer"
			"BlocksCount" => "integer"
			"FileSize" => "integer"
			"NSQUOTA" => "integer"
			"DSQUOTA" => "integer"
			"Permission" => "string"
			"UserName" => "string"
			"GroupName" => "string"
		}
	}
	date {
		locale => "en"
		match => ["ModificationTime", "yyyy-MM-dd HH:mm"]
		timezone => "Asia/Kolkata"
		target => "ModificationTime"
	}
	date {
		locale => "en"
		match => ["AccessTime", "yyyy-MM-dd HH:mm"]
		timezone => "Asia/Kolkata"
		target => "AccessTime"
	}
}
# 3. output: write the parsed documents into Elasticsearch
output {
	elasticsearch {
		hosts => ["http://elk-master:9200"]
		index => "hadoop-fsimages-ayers"
	}
}
3. Run Logstash to import the data
/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/hadoop-logstash.conf
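Once Logstash has been running for a while, the document count of the target index should roughly match the number of data lines in the CSV; a quick way to check, assuming elk-master is reachable from this node:

curl -s "http://elk-master:9200/hadoop-fsimages-ayers/_count?pretty"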

4. Analyze the data

Create Kibana dashboards for the analysis.
(dashboard screenshots)

5. Adding scripted fields

The FileSize values extracted by the oiv tool are in bytes, and there is no good way in Kibana to post-process values after aggregation, so scripted fields are used instead.

1. FileSizeTB

Divide FileSize by 1024.0 four times. The divisors must be floating point, otherwise small values are truncated straight to 0.

def FileSize = doc['FileSize'].value;
if (FileSize != 0) {
	def FileSizeTB = FileSize/1024.0/1024.0/1024.0/1024.0;
	return FileSizeTB;
} else {
	return 0.0;
}
2. DiskUsed

Total disk space used by a file: the file size multiplied by the replication factor, then divided by the four 1024.0 factors.
Note that Replication comes out of the CSV as a string and has to be converted to an int before the calculation.
Also, doc['Replication'] throws an error here, so doc['Replication.keyword'] is used instead.

def FileSize = doc['FileSize'].value;
def rep = doc['Replication.keyword'].value;
if (FileSize != 0) {
	def DiskUsed = Integer.parseInt(rep) * FileSize/1024.0/1024.0/1024.0/1024.0;
	return DiskUsed;
} else {
	return 0.0;
}

6. Custom legends in pie charts

In a pie chart like the one above, picking time ranges directly or using a Date Histogram makes each legend entry a verbose "timestamp1 to timestamp2" label, which is ugly.

To customize the legend:

1. In Buckets, choose Split slices and pick Filters as the aggregation
2. Click the filter and turn off the KQL option so that plain Lucene syntax is used
3. Enter the desired time range in the filter box:

{"range":{"AccessTime":{"lt":"2019-07-05T08:04:58.074Z","format":"strict_date_optional_time"}}}

Then click the label control at the top-left of the filter box and type in the custom label.
