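I. Setting up the monitoring server
Example: deploy a Nagios monitoring server on the host with IP address 192.168.4.77.
1.1 Prepare the installation environment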
[root@server yum.repos.d]# yum repolist
[root@server yum.repos.d]# rpm -q gcc gcc-c++
[root@server yum.repos.d]# groupadd nagcmd
[root@server yum.repos.d]# useradd nagios
[root@server yum.repos.d]# usermod -G nagcmd nagios
[root@server yum.repos.d]# yum -y install php
[root@server yum.repos.d]# yum -y install httpd
[root@server yum.repos.d]# echo 77 > /var/www/html/index.html
[root@server yum.repos.d]# /etc/init.d/httpd start; chkconfig httpd on
[root@server yum.repos.d]# vim /var/www/html/test.php
<?php
echo "hello boy";
?>
:wq
[root@server yum.repos.d]# yum -y install elinks
[root@server yum.repos.d]# elinks --dump http://localhost
[root@server yum.repos.d]# elinks --dump http://localhost/test.php
[root@server yum.repos.d]# elinks -dump http://192.168.4.77
77
[root@server yum.repos.d]# elinks -dump http://192.168.4.77/test.php
hello boy
[root@server yum.repos.d]# rpm -qa gcc gcc-c++ php httpd
httpd-2.2.15-45.el6.x86_64
gcc-4.4.7-16.el6.x86_64
php-5.3.3-40.el6_6.x86_64
gcc-c++-4.4.7-16.el6.x86_64
1.2 Install Nagios
[root@server ~]# cd nagios
[root@server nagios]# tar -xf nagios-3.2.1.tar.gz
[root@server nagios]# cd nagios-3.2.1
[root@server nagios-3.2.1]# ./configure --help
[root@server nagios-3.2.1]# ./configure --with-nagios-user=nagios --with-nagios-group=nagcmd --with-command-user=nagios --with-command-group=nagcmd
[root@server nagios-3.2.1]# make all
[root@server nagios-3.2.1]# make install
[root@server nagios-3.2.1]# make install-init
[root@server nagios-3.2.1]# make install-commandmode
[root@server nagios-3.2.1]# make install-config
[root@server nagios-3.2.1]# make install-webconf
[root@server nagios-3.2.1]# ls /usr/local/nagios/
bin etc libexec sbin share var
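bin directory holding the Nagios executables
etc directory holding the configuration files
libexec plugin directory (the monitoring commands that Nagios calls)
sbin CGI programs (executables that implement the web interface functions)
share web pages of the Nagios service
var variable data (for example, log files)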
1.3 Install the Nagios monitoring plugins
[root@server nagios]# tar -xf nagios-plugins-1.4.14.tar.gz
[root@server nagios]# cd nagios-plugins-1.4.14
[root@server nagios-plugins-1.4.14]# ./configure
[root@server nagios-plugins-1.4.14]# make
[root@server nagios-plugins-1.4.14]# make install
[root@server nagios-plugins-1.4.14]# ls /usr/local/nagios/libexec/check_*
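How does the Nagios service monitor a host's resources?
While it runs, the Nagios service automatically calls the plugins in the plugin directory to check the resources of the specified servers. For each plugin call, the administrator can configure the threshold values the plugin checks against. There are two kinds of threshold:
A warning value: a number or a percentage
A critical (error) value: a number or a percentage
If the usage of the monitored object is below the warning value, the state is normal: OK
If the usage is above the warning value and below the critical value, the state is a warning: WARNING
If the usage is above the critical value, the state is an error: CRITICAL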
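The threshold rule above can be sketched as a small shell function. This is only an illustration and not how the bundled plugins are implemented; the name `classify_usage` and its integer arguments are hypothetical. The return codes 0, 1 and 2 follow the Nagios plugin convention for OK, WARNING and CRITICAL.

```shell
# classify_usage VALUE WARN CRIT   (hypothetical helper, integer arguments)
# Prints the state name and returns the matching plugin exit code.
classify_usage() {
    value=$1
    warn=$2
    crit=$3
    if [ "$value" -gt "$crit" ]; then
        echo "CRITICAL"; return 2      # above the critical value
    elif [ "$value" -gt "$warn" ]; then
        echo "WARNING"; return 1       # above warning, not above critical
    else
        echo "OK"; return 0            # at or below the warning value
    fi
}

classify_usage 3 5 10    # prints OK
```

A value exactly equal to a threshold is treated here as not exceeding it; real plugins document their own boundary behaviour in their `-h` output.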
1.4 Start the Nagios monitoring service
* By default, without any further configuration, Nagios already monitors the local host
[root@server nagios-plugins-1.4.14]# service httpd restart
[root@server nagios-plugins-1.4.14]# /etc/rc.d/init.d/nagios start
[root@server nagios-plugins-1.4.14]# /etc/rc.d/init.d/nagios status
1.5 Add an authenticated user for the Nagios monitoring page
[root@server nagios-plugins-1.4.14]# vim /etc/httpd/conf.d/nagios.conf
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd.users
Require valid-user
[root@server nagios-plugins-1.4.14]# htpasswd -c /usr/local/nagios/etc/htpasswd.users jim
New password:
Re-type new password:
Adding password for user jim
[root@server nagios-plugins-1.4.14]# cat /usr/local/nagios/etc/htpasswd.users
jim:UmR5.6.KoQcvY
Accessing the page then shows the following warning:
It appears as though you do not have permission to view information for any of the services you requested...
If you believe this is an error, check the HTTP server authentication requirements for accessing this CGI
and check the authorization options in your CGI configuration file.
The user name cannot be chosen arbitrarily: the authorized account is already defined in /usr/local/nagios/etc/cgi.cfg
main_config_file=/usr/local/nagios/etc/nagios.cfg
physical_html_path=/usr/local/nagios/share
url_html_path=/nagios
authorized_for_system_information=nagiosadmin
authorized_for_configuration_information=nagiosadmin
[root@server etc]# rm -f /usr/local/nagios/etc/htpasswd.users
[root@server etc]# htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
New password:
Re-type new password:
Adding password for user nagiosadmin
[root@server etc]# cat /usr/local/nagios/etc/htpasswd.users
nagiosadmin:xxFlLUqalWZGs
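An alternative to recreating the account as nagiosadmin is to add the existing user to the `authorized_for_*` directives in cgi.cfg, which accept comma-separated user lists. Before editing, it can help to see which directives already name a given user. The helper below is a sketch under that assumption; `list_authorizations` is a hypothetical name and is not part of Nagios.

```shell
# list_authorizations CGI_CFG USER   (hypothetical helper)
# Prints every authorized_for_* directive whose comma-separated
# value list contains USER as an exact entry.
list_authorizations() {
    cfg=$1
    user=$2
    sed -n '/^authorized_for_/p' "$cfg" | while IFS='=' read -r key value; do
        case ",$value," in
            *",$user,"*) echo "$key" ;;
        esac
    done
}
```

For example, `list_authorizations /usr/local/nagios/etc/cgi.cfg jim` prints nothing for the configuration shown above, which matches the permission warning.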
1.6 Open the Nagios web page to view the monitoring information.
ping 192.168.4.77
http://192.168.4.77/nagios
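1.7 Resources monitored by default
Current Load CPU load
Current Users number of logged-in users
HTTP web service
PING whether the host is online
Root Partition usage of the root partition
SSH remote login (SSH) service
Swap Usage usage of the swap partition
Total Processes total number of processes on the system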
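Nagios configuration files
/usr/local/nagios/etc
cgi.cfg defines the users allowed to access the CGI programs
nagios.cfg main configuration file of the Nagios service *
resource.cfg macro definition file (defines the variables used by Nagios)
htpasswd.users password file for the web interface users
/usr/local/nagios/etc/objects
commands.cfg defines the monitoring commands *
timeperiods.cfg defines the monitoring time period templates
contacts.cfg defines the email addresses that receive alert mail *
templates.cfg defines the monitoring templates
localhost.cfg monitoring configuration for the local host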
windows.cfg
switch.cfg
printer.cfg
1.8 Using the monitoring plugins
cd /usr/local/nagios/libexec
./<plugin name> -h show the help information
./check_users -h
./check_users
[root@server libexec]# ./check_users -w 3 -c 5
USERS OK - 3 users currently logged in |users=3;3;5;0
[root@server libexec]# ./check_users -w 1 -c 2
USERS CRITICAL - 3 users currently logged in |users=3;1;2;0
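The text after the `|` in the plugin output is performance data in the form `label=value;warn;crit;min`. A minimal sketch for pulling those fields apart in the shell follows; `perfdata_fields` is a hypothetical helper that handles only a single metric without a unit suffix, as in the `check_users` output above.

```shell
# perfdata_fields 'PLUGIN OUTPUT LINE'   (hypothetical helper)
# Splits the part after "|" into label, value and thresholds.
perfdata_fields() {
    perf=${1#*'|'}          # drop everything up to the first "|"
    label=${perf%%=*}       # text before "="
    rest=${perf#*=}         # value;warn;crit;min
    old_ifs=$IFS
    IFS=';'
    set -- $rest            # split on ";" into $1..$4
    IFS=$old_ifs
    echo "label=$label value=$1 warn=$2 crit=$3 min=$4"
}

perfdata_fields 'USERS OK - 3 users currently logged in |users=3;3;5;0'
# prints: label=users value=3 warn=3 crit=5 min=0
```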
./check_disk
[root@server libexec]# df -h
[root@server libexec]# ./check_disk -w 50% -c 30% -p /boot
[root@server libexec]# dd if=/dev/zero of=/boot/test.txt bs=1M count=400
[root@server libexec]# df -h
[root@server libexec]# ./check_disk -w 50% -c 30% -p /boot
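Note that the `check_disk` thresholds are compared against free space, not used space: `-w 50% -c 30%` warns once less than 50% is free and becomes critical below 30% free. The sketch below mirrors that direction with a hypothetical `classify_free` function taking integer percentages; it is illustrative only and does not reproduce the plugin's full logic, which also checks inodes.

```shell
# classify_free FREE_PCT WARN_PCT CRIT_PCT   (hypothetical helper)
# Lower free space is worse, so the comparisons use "less than".
classify_free() {
    free=$1
    warn=$2
    crit=$3
    if [ "$free" -lt "$crit" ]; then
        echo "CRITICAL"; return 2
    elif [ "$free" -lt "$warn" ]; then
        echo "WARNING"; return 1
    else
        echo "OK"; return 0
    fi
}

classify_free 92 50 30   # prints OK
```

Filling /boot with the `dd` command above lowers the free percentage, which is what moves the second `check_disk` run out of the OK state.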
./check_procs
[root@server libexec]# ./check_procs -w 90 -c 100 -s R
./check_http
[root@server libexec]# ./check_http -H 192.168.4.77 -p 80
./check_ssh
[root@server libexec]# ./check_ssh -H 192.168.4.77
./check_tcp
[root@server libexec]# ./check_tcp -H 127.0.0.1 -p 22
[root@server libexec]# ./check_tcp -H 172.40.58.140 -p 3128
+++++++++++++++++++++++++++++++++++++++++++
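II. Monitoring configuration steps
1. Define the monitoring command
define command{
command_name <command name>
command_line <plugin to run and its options>
}
2. Call the defined monitoring command in the monitored server's configuration file
define service{
use <monitoring template name>
host_name <host name>
service_description <description>
check_command <monitoring command to use>
}
3. Load the monitored server's configuration file in nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/<monitored server's configuration file>
4. Verify the nagios.cfg configuration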
[root@server objects]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Total Warnings: 0
Total Errors: 0
5. Restart the monitoring service
[root@server ~]# /etc/init.d/nagios restart
6. Open the monitoring page to view the monitoring information
http://monitor-ip/nagios/
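Since every service definition in step 2 has the same shape, the block can be generated rather than typed. The function below is a sketch of that idea; `emit_service` is a hypothetical helper, and the `local-service` template and the example host and command names are taken from the configurations used elsewhere in these notes.

```shell
# emit_service HOST DESCRIPTION COMMAND   (hypothetical helper)
# Prints a service definition that uses the local-service template.
emit_service() {
    cat <<EOF
define service{
    use                  local-service
    host_name            $1
    service_description  $2
    check_command        $3
}
EOF
}

emit_service ser67 ftp check_67_ftp
```

The output can be appended to the host's .cfg file under /usr/local/nagios/etc/objects and then checked with `nagios -v` as in step 4.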
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
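2.1 Monitoring a remote server
Monitor the web and FTP services, the number of logged-in users, the boot partition usage, and the process count on the server 192.168.4.67
1. Define the monitoring commands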
[root@server objects]# vim /usr/local/nagios/etc/objects/commands.cfg
define command {
command_name check_67_ftp
command_line $USER1$/check_ftp -H 192.168.4.67
}
define command {
command_name check_67_httpd
command_line $USER1$/check_http -H 192.168.4.67
}
2. Call the defined monitoring commands in the monitored server's configuration file
[root@server objects]# vim /usr/local/nagios/etc/objects/192.168.4.67.cfg
define host{
use linux-server
host_name ser67
alias This is my server
address 192.168.4.67
}
define service{
use local-service
host_name ser67
service_description ftp
check_command check_67_ftp
}
define service{
use local-service
host_name ser67
service_description httpd
check_command check_67_httpd
}
3. Load the monitored server's configuration file in nagios.cfg
[root@server objects]# vim /usr/local/nagios/etc/nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/192.168.4.67.cfg
4. Verify the nagios.cfg configuration
[root@server objects]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
5. Restart the monitoring service
[root@server ~]# /etc/init.d/nagios restart
6. Open the monitoring page to view the monitoring information
http://192.168.4.77/nagios/
+++++++++++++++++++++++++++++++++++++++++++++++++
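Configure the monitoring server to monitor private data on the remote server (number of logged-in users, boot partition usage, process count)
NRPE ----> private data
Configuration on the remote server (192.168.4.67)
1. Install the monitoring plugins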
[root@web1 nagios]# tar xf nagios-plugins-1.4.14.tar.gz
[root@web1 nagios]# cd nagios-plugins-1.4.14
[root@web1 nagios-plugins-1.4.14]# yum -y install gcc gcc-c++
[root@web1 nagios-plugins-1.4.14]# ./configure
[root@web1 nagios-plugins-1.4.14]# make&&make install
[root@web1 nagios-plugins-1.4.14]# ls /usr/local/nagios/libexec/check_*
[root@web1 ~]# /usr/local/nagios/libexec/check_users -w 3 -c 5
USERS OK - 3 users currently logged in |users=3;3;5;0
[root@web1 ~]# /usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /boot
DISK OK - free space: /boot 415 MB (92% inode=99%);| /boot=35MB;380;428;0;476
[root@web1 ~]# /usr/local/nagios/libexec/check_procs -w 50 -c 60
PROCS CRITICAL: 101 processes
2. Run the NRPE service
On 192.168.4.67
Install NRPE
[root@web1 nagios]# useradd nagios
[root@web1 nagios]# groupadd nagcmd
[root@web1 nagios]# usermod -G nagcmd nagios
[root@web1 nagios]# tar xf nrpe-2.12.tar.gz
[root@web1 nagios]# cd nrpe-2.12
[root@web1 nrpe-2.12]# yum -y install openssl-devel
[root@web1 nrpe-2.12]# ./configure&&make && make install
[root@web1 nrpe-2.12]# make install-plugin
[root@web1 nrpe-2.12]# make install-daemon
[root@web1 nrpe-2.12]# make install-daemon-config
[root@web1 nrpe-2.12]# make install-xinetd
[root@web1 nagios]# grep only_from /etc/xinetd.d/nrpe
only_from = 127.0.0.1 192.168.4.77
[root@web1 nagios]# tail -1 /etc/services
nrpe 5666/tcp # NRPE
[root@web1 nagios]# yum -y install xinetd
[root@web1 nagios]# /etc/init.d/xinetd restart
[root@web1 nagios]# chkconfig xinetd on
[root@web1 nagios]# netstat -anptu |grep :5666
tcp 0 0 :::5666 :::* LISTEN 2033/xinetd
Edit the NRPE configuration file nrpe.cfg to define the monitoring commands
command[<command name>]=<plugin to run and its options>
[root@web1 nagios]# sed -n '199,203'p /usr/local/nagios/etc/nrpe.cfg
command[check_67_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_67_cpu]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_67_boot]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /boot
command[check_67_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_67_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
[root@web1 libexec]# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c check_67_zombie_procs
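The names inside `command[...]` are exactly what a caller must pass to `check_nrpe -c`, so a misspelled name on the monitoring server fails even though the plugin itself works. One way to list the defined names for comparison is sketched below; `nrpe_command_names` is a hypothetical helper, not part of NRPE.

```shell
# nrpe_command_names NRPE_CFG   (hypothetical helper)
# Prints the NAME of every active "command[NAME]=..." line;
# commented-out definitions are skipped because they do not
# start with "command[".
nrpe_command_names() {
    sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "$1"
}
```

Running `nrpe_command_names /usr/local/nagios/etc/nrpe.cfg` on the remote server gives the list to copy into the `-c` arguments used on the monitoring server.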
Configuration on the monitoring server (192.168.4.77)
1. Install the check_nrpe plugin
[root@server nagios]# tar -xf nrpe-2.12.tar.gz
[root@server nagios]# cd nrpe-2.12
[root@server nrpe-2.12]# yum -y install openssl-devel
[root@server nrpe-2.12]# ./configure
[root@server nrpe-2.12]# make all
[root@server nrpe-2.12]# make install-plugin
[root@server nrpe-2.12]# ls /usr/local/nagios/libexec/check_nrpe
/usr/local/nagios/libexec/check_nrpe
2. Use the plugin to connect to the NRPE service on the monitored host and test the defined monitoring commands
[root@server nrpe-2.12]# /usr/local/nagios/libexec/check_nrpe -H 192.168.4.67 -c check_67_users
USERS OK - 2 users currently logged in |users=2;5;10;0
[root@server nrpe-2.12]# /usr/local/nagios/libexec/check_nrpe -H 192.168.4.67 -c check_67_boot
DISK OK - free space: /boot 415 MB (92% inode=99%);| /boot=35MB;380;428;0;476
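Nagios decides a service's state from the plugin's exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), not from the printed text. When testing by hand it can be useful to see that code as a name. The two helpers below are a sketch with hypothetical names; any exit code other than 0, 1 or 2 is reported as UNKNOWN here.

```shell
# state_name CODE   (hypothetical helper)
# Maps a plugin exit code to its Nagios state name.
state_name() {
    case $1 in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# run_check COMMAND [ARGS...]   (hypothetical helper)
# Runs the command, discards its output, and prints the state name.
run_check() {
    rc=0
    "$@" >/dev/null 2>&1 || rc=$?
    state_name "$rc"
}
```

For example, `run_check /usr/local/nagios/libexec/check_nrpe -H 192.168.4.67 -c check_67_users` prints the state that Nagios would record for that check.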
1. Define the monitoring commands
vim commands.cfg
define command{
command_name check_ser_67_users
command_line $USER1$/check_nrpe -H 192.168.4.67 -c check_67_users
}
define command{
command_name check_ser_67_boot
command_line $USER1$/check_nrpe -H 192.168.4.67 -c check_67_boot
}
define command{
command_name check_ser_67_total_process
command_line $USER1$/check_nrpe -H 192.168.4.67 -c check_67_total_procs
}
define command{
command_name check_ser_67_zombie_process
command_line $USER1$/check_nrpe -H 192.168.4.67 -c check_67_zombie_procs
}
2. Call the defined monitoring commands in the remote server's configuration file
vim 192.168.4.67.cfg
define service{
use local-service
host_name ser67
service_description user
check_command check_ser_67_users
}
define service{
use local-service
host_name ser67
service_description boot
check_command check_ser_67_boot
}
define service{
use local-service
host_name ser67
service_description totalprocess
check_command check_ser_67_total_process
}
define service{
use local-service
host_name ser67
service_description zombieprocess
check_command check_ser_67_zombie_process
}
3. Load the monitored server's configuration file in nagios.cfg
[root@server objects]# vim /usr/local/nagios/etc/nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/192.168.4.67.cfg
4. Verify the nagios.cfg configuration
[root@server objects]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
5. Restart the monitoring service
[root@server ~]# /etc/init.d/nagios restart
6. Open the monitoring page to view the monitoring information
http://192.168.4.77/nagios/
+++++++++++++++++++++++++++++++++++++++++++++++++++++++
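2.2 Configure monitoring of the local host: localhost.cfg
2.3 Customize monitoring of local resources
Monitor the usage of the local boot partition
Do not monitor the local swap partition
Monitor the status of the local FTP service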
+++++++++++++++++++++++++++++++++++++++++++++++++++++
1. Define the monitoring commands
define command {
command_name check_local_boot
command_line $USER1$/check_disk -w 20% -c 10% -p /boot
}
define command {
command_name check_local_ftp
command_line $USER1$/check_ftp -H localhost
}
2. Call the defined monitoring commands in the local host's configuration file
define service{
use local-service
host_name localhost
service_description ftp
check_command check_local_ftp
}
define service{
use local-service
host_name localhost
service_description boot
check_command check_local_boot
}
3. Load the local host's configuration file in nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
4. Verify the nagios.cfg configuration
[root@server objects]# alias checknagios='/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg'
[root@server objects]# checknagios
5. Restart the monitoring service
[root@server ~]# /etc/init.d/nagios restart
6. Open the monitoring page to view the monitoring information
http://192.168.4.77/nagios/
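Steps 4 and 5 can be chained so that the service is only restarted when the configuration check passes. The function below is a sketch of that pattern, assuming the verification command exits non-zero when it finds errors; `reload_if_valid` is a hypothetical name, and both arguments are command strings run through `sh -c`.

```shell
# reload_if_valid VERIFY_CMD RESTART_CMD   (hypothetical helper)
# Runs RESTART_CMD only if VERIFY_CMD succeeds.
reload_if_valid() {
    if sh -c "$1" >/dev/null 2>&1; then
        sh -c "$2"
    else
        echo "configuration check failed; not restarting" >&2
        return 1
    fi
}
```

With the paths used in these notes, a call would look like `reload_if_valid '/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg' '/etc/init.d/nagios restart'`.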
2.4 Configure monitoring alerts
[root@server objects]# grep email /usr/local/nagios/etc/objects/contacts.cfg
email nagios@localhost ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
lyd@163.com
grep nagios /etc/passwd
/etc/init.d/postfix start
++++++++++++++++++++++++++++++++++++++++++++++
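Further topics
1. Sending alert mail through a third-party mail server
sendmail
2. Host dependencies
3. Service dependencies
4. Changing the alert channel (WeChat alerts, SMS alerts)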