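Because the business side does not log in a consistent format, malformed records regularly raise exceptions that stop Flume. Validating the format of every field would slow collection down, and the log data can tolerate some loss. So instead of validating, we write a script that keeps checking whether Flume is running and restarts it automatically when it is not.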
```bash
#!/bin/bash
# Watchdog: restart the Flume agent whenever it is not running.
export FLUME_HOME=/opt/flume

while true
do
    # Count processes whose command line mentions the agent's config file.
    pc=$(ps -ef | grep kafka-flume-hdfs.conf | grep -v "grep" | wc -l)
    if [[ $pc -lt 1 ]]
    then
        echo "detected no flume process.... preparing to launch flume agent...... "
        nohup "${FLUME_HOME}/bin/flume-ng" agent --conf-file "${FLUME_HOME}/conf/kafka-flume-hdfs.conf" --name a1 -Dflume.root.logger=INFO,LOGFILE >"${FLUME_HOME}/flume.log" 2>&1 &
    else
        echo "detected flume process number is : $pc "
    fi
    # Wait 30 minutes before the next check.
    sleep 30m
done
```
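One caveat of the `ps | grep` check is that it counts any process whose command line mentions `kafka-flume-hdfs.conf`, for example an editor that has the file open. A possible alternative, sketched here under assumptions, is to save the agent's PID when it is launched and test that PID with `kill -0`. The function names `is_alive` and `launch_and_record` and the PID-file argument are hypothetical and not part of the original script. The sketch also assumes the launched command keeps the PID it was started with, which is worth verifying for `flume-ng` in your environment.

```bash
#!/bin/bash
# Sketch only: PID-file liveness check. Function names and paths are assumptions.

# is_alive PID_FILE
# Succeeds (exit status 0) if PID_FILE holds the PID of a process that still exists.
is_alive() {
    [ -s "$1" ] || return 1            # nothing recorded yet
    kill -0 "$(cat "$1")" 2>/dev/null  # signal 0 only tests existence, nothing is sent
}

# launch_and_record PID_FILE LOG_FILE CMD [ARGS...]
# Starts CMD in the background, writes its output to LOG_FILE and its PID to PID_FILE.
launch_and_record() {
    local pid_file=$1
    local log_file=$2
    shift 2
    nohup "$@" >"$log_file" 2>&1 &
    echo $! >"$pid_file"
}
```

In the monitoring loop, `is_alive "$PID_FILE"` would take the place of the `ps | grep` count, and `launch_and_record "$PID_FILE" "${FLUME_HOME}/flume.log" "${FLUME_HOME}/bin/flume-ng" agent ...` would take the place of the `nohup` line. A PID can be reused by an unrelated process after a long time, so this check is not foolproof either.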
Start the script as a background process so that it keeps running after the terminal is closed:
```bash
# Run from the directory that contains check_flume.sh.
nohup bash check_flume.sh &
```
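Once the watchdog runs in the background, it helps to have a way to stop it again. A minimal sketch, assuming the watchdog is started through a small wrapper that records its PID: the helper names `start_watchdog` and `stop_watchdog` and the files `check_flume.pid` and `check_flume.log` are hypothetical choices for illustration.

```bash
#!/bin/bash
# Sketch only: start/stop helpers for the watchdog. File names are assumptions.
WATCHDOG_SCRIPT=${WATCHDOG_SCRIPT:-./check_flume.sh}
PID_FILE=${PID_FILE:-./check_flume.pid}
LOG_FILE=${LOG_FILE:-./check_flume.log}

# Start the watchdog in the background and remember its PID.
start_watchdog() {
    nohup bash "$WATCHDOG_SCRIPT" >>"$LOG_FILE" 2>&1 &
    echo $! >"$PID_FILE"
}

# Stop the watchdog recorded in PID_FILE; fails if none was recorded.
stop_watchdog() {
    [ -f "$PID_FILE" ] || return 1
    kill "$(cat "$PID_FILE")" 2>/dev/null || echo "watchdog was not running"
    rm -f "$PID_FILE"
}
```

Stopping the watchdog this way ends only the monitoring loop. A Flume agent that it launched earlier keeps running and has to be stopped separately.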