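I. Installing Nagios on the Monitoring Server
1. Introduction to Nagios
Nagios typically consists of a main program (Nagios), a plugin package (Nagios-plugins), and four optional add-ons (NRPE, NSCA, NSClient++, and NDOUtils). All of Nagios's monitoring work is carried out by plugins, so Nagios and Nagios-plugins are both required on the server side. Of the four add-ons, NRPE runs plugin scripts on monitored remote Linux/Unix hosts to monitor their resources; NSCA lets monitored remote Linux/Unix hosts push check results to the Nagios server on their own initiative (needed in particular for redundant monitoring setups); NSClient++ is the component installed on Windows hosts so that they can be monitored; and NDOUtils stores the Nagios configuration and the data generated by events in a database so that this data can be searched and processed quickly. NRPE and NSClient++ run on the client side, NDOUtils runs on the server side, and NSCA must be installed on both the server and the client.
Nagios itself can currently be installed only on Linux (and other Unix-like) hosts, and compiling it requires gcc. If you plan to use the web management interface, you also need the Apache server and the GD graphics library.
2. Preparation before installation
(1) Resolve the dependencies of Nagios
The basic Nagios components depend on httpd, gcc, and gd. The following command installs these packages, together with PHP and MySQL, if any of them are missing: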
# yum -y install httpd gcc glibc glibc-common gd gd-devel php php-mysql mysql mysql-devel mysql-server
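Note: if you build these dependencies from source instead, many of the file paths used later must be adjusted one by one to match the options chosen for that build. Also start the required services, such as httpd, as needed.
(2) Add the user and group that Nagios runs as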
# groupadd nagcmd
# useradd -G nagcmd nagios
# passwd nagios
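Add the user that the web server runs as to the nagcmd group, so that operations issued through the Nagios web interface have sufficient permissions. With the httpd package installed by yum above, that user is apache; a source-built httpd often runs as daemon instead:
# usermod -a -G nagcmd apache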
3. Compile and install Nagios
# wget https://assets.nagios.com/downloads/nagioscore/releases/nagios-3.5.1.tar.gz
# tar xvf nagios-3.5.1.tar.gz -C /usr/local/src/
# cd /usr/local/src/nagios/
# ./configure --prefix=/usr/local/nagios --sysconfdir=/etc/nagios --with-command-group=nagcmd --enable-event-broker
# make all
# make install
# make install-init
# make install-commandmode
# make install-config
Create the configuration file for the Nagios web application in httpd's configuration directory (conf.d):
# make install-webconf
Create a user for logging in to the Nagios web application; this account is used for authentication on later web logins:
# htpasswd -c /etc/nagios/htpasswd.users nagiosadmin
Once the steps above are complete, restart httpd:
# service httpd restart
4. Compile and install nagios-plugins
All of Nagios's monitoring work is done by plugins, so the official plugins must be installed before Nagios is started.
# wget http://www.nagios-plugins.org/download/nagios-plugins-2.1.1.tar.gz
# tar xvf nagios-plugins-2.1.1.tar.gz -C /usr/local/src/
# cd /usr/local/src/nagios-plugins-2.1.1
# ./configure --with-nagios-user=nagios --with-nagios-group=nagios
# make && make install
5. Configure and start Nagios
(1) Register nagios as a system service and enable it at boot
# chkconfig --add nagios
# chkconfig nagios on
(2) Check the syntax of the main configuration file
# /usr/local/nagios/bin/nagios -v /etc/nagios/nagios.cfg
(3) If the syntax check reports no problems, start the nagios service:
# service nagios start
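Steps (2) and (3) can be chained so that the service is only touched when the check succeeds. The following is a minimal sketch, not part of the Nagios distribution: restart_if_valid is a hypothetical helper name, and it assumes the verification command exits non-zero when it finds errors, as nagios -v does.

```shell
# restart_if_valid <verify command> <restart command>
# Hypothetical helper: runs the verify command and, only if it succeeds,
# runs the restart command. Both arguments are plain command strings.
restart_if_valid() {
    local verify_cmd="$1" restart_cmd="$2"
    # Unquoted on purpose so the string is split into command and arguments.
    if $verify_cmd >/dev/null 2>&1; then
        $restart_cmd
    else
        echo "configuration check failed; not restarting" >&2
        return 1
    fi
}

# Example call with the paths used in this guide (left commented out so
# that sourcing this snippet has no side effects):
# restart_if_valid "/usr/local/nagios/bin/nagios -v /etc/nagios/nagios.cfg" \
#                  "service nagios restart"
```

Discarding the verifier's output keeps the helper quiet; when it reports a failure, rerun the nagios -v command by hand to see the actual error messages.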
(4) Configure SELinux
If SELinux is enforcing, it blocks the Nagios web CGI programs by default. Check the current SELinux mode with:
# getenforce
If the output is Enforcing, switch SELinux to permissive mode temporarily with:
# setenforce 0
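To disable SELinux permanently, edit /etc/sysconfig/selinux and change SELINUX=enforcing to SELINUX=disabled; the change takes effect after a reboot.
Alternatively, keep SELinux enabled and relabel the Nagios CGI and web directories so that they can run under the SELinux targeted policy: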
# chcon -R -t httpd_sys_content_t /usr/local/nagios/sbin
# chcon -R -t httpd_sys_content_t /usr/local/nagios/share
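(5) View Nagios through the web interface: http://your_nagios_IP/nagios
After authenticating, you can see the status of the local host, which is monitored by default.
II. Monitoring Remote Linux Hosts with NRPE
1. Introduction to NRPE
Nagios can monitor remote hosts in several ways, including SNMP, NRPE, SSH, and NSCA. This section covers monitoring remote Linux hosts through NRPE.
NRPE (Nagios Remote Plugin Executor) is a daemon that runs check commands on a remote server. It lets the Nagios monitoring server trigger check commands on the remote host on demand and returns the results to the monitoring server. Its overhead is far lower than that of SSH-based checks, and because the checks need no system account credentials on the remote host, it also avoids the credential exposure that SSH-based checks involve.
2. Install and configure the monitored host
(1) Add the nagios user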
# useradd -s /sbin/nologin nagios
(2) NRPE depends on nagios-plugins, so install that first
# tar xvf nagios-plugins-2.1.1.tar.gz -C /usr/local/src/
# cd /usr/local/src/nagios-plugins-2.1.1/
# ./configure --with-nagios-user=nagios --with-nagios-group=nagios
# make all
# make install
(3) Install NRPE
# tar xvf nrpe-3.0.tar.gz -C /usr/local/src/
# cd /usr/local/src/nrpe-3.0
# ./configure --prefix=/usr/local/nrpe \
--sysconfdir=/etc/nrpe \
--with-nrpe-user=nagios \
--with-nrpe-group=nagios \
--with-nagios-user=nagios \
--with-nagios-group=nagios \
--enable-command-args \
--enable-ssl
# make all
# make install-plugin
# make install-daemon
# make install-config
(4) Configure NRPE
# vim /etc/nrpe/nrpe.cfg
log_facility=daemon
pid_file=/var/run/nrpe.pid
server_address=172.16.100.11
server_port=5666
nrpe_user=nagios
nrpe_group=nagios
allowed_hosts=172.16.100.1
command_timeout=60
connection_timeout=300
debug=0
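The directives above are largely self-explanatory; adjust them to your environment. Note in particular that allowed_hosts lists the IP addresses of the monitoring servers that are allowed to connect to this host.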
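Because a wrong allowed_hosts value is a common reason for refused connections, it can help to check the file before starting the daemon. The sketch below is an illustration only: host_allowed is a hypothetical helper, it reads the first allowed_hosts line it finds, and it matches literal addresses in the comma-separated list; it does not resolve hostnames or evaluate network masks, which the real daemon may also accept.

```shell
# host_allowed <path to nrpe.cfg> <ip address>
# Hypothetical helper: succeeds if the address appears literally in the
# comma-separated allowed_hosts list of the given file.
host_allowed() {
    local cfg="$1" ip="$2" hosts
    # Take the first allowed_hosts line, keep the value, drop whitespace.
    hosts=$(grep -E '^[[:space:]]*allowed_hosts[[:space:]]*=' "$cfg" \
            | head -n 1 | cut -d= -f2 | tr -d '[:space:]')
    # Wrap both sides in commas so 172.16.100.1 does not match 172.16.100.11.
    case ",$hosts," in
        *",$ip,"*) return 0 ;;
        *)         return 1 ;;
    esac
}

# Example (commented out): check the monitoring server from this guide.
# host_allowed /etc/nrpe/nrpe.cfg 172.16.100.1 && echo "allowed"
```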
(5) Start NRPE
# /usr/local/nrpe/bin/nrpe -c /etc/nrpe/nrpe.cfg -d
To make the NRPE service easier to start, save the following as the script /etc/init.d/nrped:
#!/bin/bash
# chkconfig: 2345 88 12
# description: NRPE DAEMON
NRPE=/usr/local/nrpe/bin/nrpe
NRPECONF=/etc/nrpe/nrpe.cfg
case "$1" in
start)
echo -n "Starting NRPE daemon..."
$NRPE -c $NRPECONF -d
echo " done."
;;
stop)
echo -n "Stopping NRPE daemon..."
pkill -u nagios nrpe
echo " done."
;;
restart)
$0 stop
sleep 2
$0 start
;;
*)
echo "Usage: $0 start|stop|restart"
;;
esac
exit 0
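Alternatively, create a file named nrpe in /etc/xinetd.d so that NRPE runs as an xinetd-managed (non-standalone) service; restarting xinetd then makes NRPE available. The file contents are: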
service nrpe
{
flags = REUSE
socket_type = stream
wait = no
user = nagios
group = nagios
server = /usr/local/nrpe/bin/nrpe
server_args = -c /etc/nrpe/nrpe.cfg -i
log_on_failure += USERID
disable = no
}
(6) Define the objects that remote hosts may monitor
On the monitored host, every service or resource that can be checked through NRPE must be defined as a command in nrpe.cfg. The syntax of a command definition is:
command[<command_name>]=<command_to_execute>
For example:
command[check_rootdisk]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 40% -c 20%
command[check_sensors]=/usr/local/nagios/libexec/check_sensors
command[check_users]=/usr/local/nagios/libexec/check_users -w 10 -c 20
command[check_load]=/usr/local/nagios/libexec/check_load -w 10,8,5 -c 20,18,15
command[check_zombies]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_all_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
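Each command entry simply names a plugin invocation, and the plugin reports its result through its exit status: 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN, plus one line of text on standard output. The sketch below illustrates that convention with a made-up check; check_threshold is a hypothetical function, not one of the official plugins, and its "greater than or equal" comparison is an assumption that individual plugins do not necessarily share.

```shell
# check_threshold <value> <warning> <critical>
# Hypothetical check that follows the plugin exit-status convention:
# 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
check_threshold() {
    local value="$1" warn="$2" crit="$3"
    # A non-integer value makes the numeric test fail, which maps to UNKNOWN.
    if ! [ "$value" -eq "$value" ] 2>/dev/null; then
        echo "UNKNOWN - '$value' is not an integer"
        return 3
    fi
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - value is $value"
        return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - value is $value"
        return 1
    fi
    echo "OK - value is $value"
    return 0
}

# Example (commented out): the thresholds used by check_users above.
# check_threshold "$(who | wc -l)" 10 20
```

A script written this way could be saved under the plugin directory and referenced from its own command[...] line, in the same way as the official plugins listed above.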
3. Configure the monitoring server
(1) Install NRPE
# tar xvf nrpe-3.0.tar.gz -C /usr/local/src/
# cd /usr/local/src/nrpe-3.0
# ./configure --with-nrpe-user=nagios \
--with-nrpe-group=nagios \
--with-nagios-user=nagios \
--with-nagios-group=nagios \
--enable-command-args \
--enable-ssl
# make all
# make install-plugin
# /usr/local/nagios/libexec/check_nrpe -H client_IP ## test whether the monitored host can be reached
NRPE vnrpe-3.0
(2) Define how the remote host and its services are monitored:
Remote Linux hosts are monitored through NRPE with the check_nrpe plugin, whose syntax is:
check_nrpe -H <host> [-n] [-u] [-p <port>] [-t <timeout>] [-c <command>] [-a <arglist...>]
Define a generic command for monitoring remote Linux host resources:
# vim /etc/nagios/objects/commands.cfg
define command {
command_name check_nrpe
command_line $USER1$/check_nrpe -H "$HOSTADDRESS$" -c $ARG1$
}
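When a service uses check_command check_nrpe!check_users, Nagios fills the macros in command_line before running it: $USER1$ becomes the plugin directory, $HOSTADDRESS$ the address of the host, and $ARG1$ the text after the first !. The sketch below imitates that substitution with sed to show the resulting command line. expand_check_command is a hypothetical helper, and /usr/local/nagios/libexec is assumed as the value of $USER1$ because that is where check_nrpe was installed above.

```shell
# expand_check_command <command_line template> <USER1> <HOSTADDRESS> <ARG1>
# Hypothetical helper that substitutes three macros the way Nagios would
# before executing a check. Values must not contain "|" or "&", which are
# special in the sed replacement used here.
expand_check_command() {
    local template="$1" user1="$2" host_address="$3" arg1="$4"
    printf '%s\n' "$template" | sed \
        -e 's|\$USER1\$|'"$user1"'|g' \
        -e 's|\$HOSTADDRESS\$|'"$host_address"'|g' \
        -e 's|\$ARG1\$|'"$arg1"'|g'
}

# Example (commented out): expand the definition above for one service.
# expand_check_command '$USER1$/check_nrpe -H "$HOSTADDRESS$" -c $ARG1$' \
#     /usr/local/nagios/libexec 192.168.1.72 check_users
```

Running the expanded command by hand on the monitoring server is a quick way to reproduce what a failing service check actually executes.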
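Define the resource configuration file for the remote Linux host, adapting it to your environment. The name after the ! in each check_command must match a command[...] entry in nrpe.cfg on the monitored host: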
# cat /etc/nagios/objects/linux.cfg |grep "^\s*[^#\t].*$"
define host{
use linux-server ; Inherit default values from a template
host_name webserver ; The name we're giving to this host
alias My Web Server ; A longer name associated with the host
address 192.168.1.72 ; IP address of the host
}
define hostgroup{
hostgroup_name web-servers ; The name of the hostgroup
alias Web Servers ; Long name of the group
}
define service{
use generic-service
host_name webserver
service_description CHECK USERS
check_command check_nrpe!check_users
}
define service{
use generic-service
host_name webserver
service_description Load
check_command check_nrpe!check_load
}
define service{
use generic-service
host_name webserver
service_description Disk
check_command check_nrpe!check_rootdisk
}
define service{
use generic-service
host_name webserver
service_description Total Processes
check_command check_nrpe!check_all_procs
}
define service{
use generic-service
host_name webserver
service_description Zombie Processes
check_command check_nrpe!check_zombies
}
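A service whose check_nrpe argument has no matching command[...] entry on the monitored host fails at run time, not at configuration-check time, so it is worth comparing the two files by hand. The following sketch lists the names used in an object file that are missing from a copy of nrpe.cfg. missing_nrpe_commands is a hypothetical helper, and it assumes that a copy of the remote nrpe.cfg is available locally and that command names contain only letters, digits, underscores, and hyphens.

```shell
# missing_nrpe_commands <object config file> <copy of nrpe.cfg>
# Hypothetical helper: prints every name used as check_nrpe!<name> that has
# no command[<name>]= line in the given nrpe.cfg.
missing_nrpe_commands() {
    local objects_cfg="$1" nrpe_cfg="$2" name
    grep -o 'check_nrpe![A-Za-z0-9_-]*' "$objects_cfg" \
        | cut -d'!' -f2 | sort -u \
        | while read -r name; do
              grep -q "^[[:space:]]*command\[$name\]=" "$nrpe_cfg" || echo "$name"
          done
}

# Example (commented out); ./nrpe.cfg.webserver is an assumed local copy
# of the monitored host's file:
# missing_nrpe_commands /etc/nagios/objects/linux.cfg ./nrpe.cfg.webserver
```

An empty result means every referenced name is defined; any printed name needs either a new command[...] line on the monitored host or a corrected check_command.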
Then edit the Nagios configuration file to include the linux.cfg just defined, verify the syntax, and restart the nagios service:
# vim /etc/nagios/nagios.cfg
cfg_file=/etc/nagios/objects/linux.cfg
# /usr/local/nagios/bin/nagios -v /etc/nagios/nagios.cfg
# service nagios restart
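III. Monitoring Windows Hosts with NSClient++
Once NSClient++ is installed on the Windows host, its service is started by default. Port 12489 is used for communication between check_nt and NSClient++, and port 5666 is the port NRPE listens on.
1. Using check_nt
Enable the following modules on the Windows side, and restart the service after changing the configuration: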
[modules]
CheckSystem.dll
CheckDisk.dll
FileLogger.dll
NSClientListener.dll
[settings]
allowed_hosts =
Test from the Nagios server with the following commands:
# /usr/local/nagios/libexec/check_nt -h ## show the help text
check_nt -H <client ip> -p <port> -v <command> ...
# check_nt -H 192.168.1.250 -p 12489 -v CPULOAD -w 80 -c 90 -l 5,80,90
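For CPULOAD, the -l value is read as triples of the form <minutes>,<warning>,<critical>, so 5,80,90 means "average over the last 5 minutes, warn at 80%, critical at 90%". The sketch below only spells out such a string in words; describe_cpuload_arg is a hypothetical helper for illustration and does not contact NSClient++.

```shell
# describe_cpuload_arg <minutes,warning,critical[,minutes,warning,critical...]>
# Hypothetical helper: prints one line per triple found in the -l value.
describe_cpuload_arg() {
    local spec="$1" oldifs="$IFS"
    # Split the value on commas into positional parameters.
    IFS=,
    set -- $spec
    IFS="$oldifs"
    while [ $# -ge 3 ]; do
        echo "average over $1 min: warn at $2%, critical at $3%"
        shift 3
    done
}

# Example (commented out): the value used in the test command above.
# describe_cpuload_arg 5,80,90
```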
Define a generic command for monitoring remote Windows host resources:
# vim /etc/nagios/objects/commands.cfg
define command{
command_name check_nt
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -s PASSWORD -v $ARG1$ $ARG2$
}
Define the resource configuration file for the remote Windows host according to your environment:
# cat /etc/nagios/objects/windows.cfg |grep "^\s*[^#\t].*$"
define host{
use windows-server ; Inherit default values from a template
host_name winserver ; The name we're giving to this host
alias My Windows Server ; A longer name associated with the host
address 192.168.1.2 ; IP address of the host
}
define hostgroup{
hostgroup_name windows-servers ; The name of the hostgroup
alias Windows Servers ; Long name of the group
}
define service{
use generic-service
host_name winserver
service_description NSClient++ Version
check_command check_nt!CLIENTVERSION
}
define service{
use generic-service
host_name winserver
service_description Uptime
check_command check_nt!UPTIME
}
define service{
use generic-service
host_name winserver
service_description CPU Load
check_command check_nt!CPULOAD!-l 5,80,90
}
define service{
use generic-service
host_name winserver
service_description Memory Usage
check_command check_nt!MEMUSE!-w 80 -c 90
}
define service{
use generic-service
host_name winserver
service_description C:\ Drive Space
check_command check_nt!USEDDISKSPACE!-l c -w 80 -c 90
}
define service{
use generic-service
host_name winserver
service_description W3SVC
check_command check_nt!SERVICESTATE!-d SHOWALL -l W3SVC
}
define service{
use generic-service
host_name winserver
service_description Explorer
check_command check_nt!PROCSTATE!-d SHOWALL -l Explorer.exe
}
Then edit the Nagios configuration file to include the windows.cfg just defined, verify the syntax, and restart the nagios service:
# vim /etc/nagios/nagios.cfg
cfg_file=/etc/nagios/objects/windows.cfg
# /usr/local/nagios/bin/nagios -v /etc/nagios/nagios.cfg
# service nagios restart