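Background:
With a high-performance combination like LVS + keepalived available, why use Nginx + keepalived at all? keepalived was originally designed for LVS. LVS is a layer-4 load balancer: it is fast, but it has no built-in health checking for backend servers. keepalived fills that gap with a set of health checkers (TCP_CHECK, UDP_CHECK, HTTP_GET and so on). You can also write your own health check scripts for LVS, or combine it with ldirectord to monitor the backends. Even so, LVS remains a layer-4 device and cannot parse upper-layer protocols. Nginx is different: it works at layer 7, so it can parse application protocols, filter requests and cache responses. These are the advantages Nginx has over LVS. keepalived, however, provides no health checker for Nginx, so you have to write a check script yourself. For building keepalived + LVS, see my other post: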
http://ljohn.blog.51cto.com/11932290/1980547
Update: 2017.11.28
I wrote an Ansible role that deploys the keepalived + nginx high-availability cluster in one step:
https://github.com/Ljohn001/Ansible-roles/tree/master/Ansible-keepalived-nginx-role
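Tested repeatedly on CentOS 6.9 and CentOS 7.4 without problems. It really is a one-command deployment; I am about ready to give up shell scripts for Ansible.
Enough talk, let's build the keepalived + nginx high-availability cluster.
1. Prepare the environment
Four machines (CentOS 7.3):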
192.168.0.56 ---> proxy1 (nginx)
192.168.0.57 ---> proxy2 (nginx)
192.168.0.58 ---> web1 (httpd)
192.168.0.59 ---> web2 (httpd)
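Topology: clients -> VIP 192.168.0.100 (held by proxy1 or proxy2) -> web1 / web2
Notes:
1. Keep the clocks of all four machines in sync (setting up the NTP server itself is not covered here):
    # ntpdate IP
   or as an /etc/crontab entry:
    */5 * * * * root /usr/sbin/ntpdate 192.168.1.99 &> /dev/null; hwclock -w
2. To make the later steps easier, set up passwordless SSH to every machine involved:
    # ssh-keygen -t rsa -P '' -f "/root/.ssh/id_rsa"
    # for i in 56 57 58 59; do ssh-copy-id -i .ssh/id_rsa.pub root@192.168.0.$i; done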
2. Configure the real servers (web1 and web2)
a. Install the web server
The real server can be any web container (tomcat, jetty, httpd, nginx, ...). Since this is a learning exercise, httpd is used here.
    # yum install httpd -y
    # systemctl restart httpd
    # netstat -nultp | grep httpd    ## make sure httpd is running
    tcp6 0 0 :::80 :::* LISTEN 4619/httpd
Create the test pages (both pages say "web1"; the number in parentheses identifies the backend).
web1:
    # echo "<h1>The page from web1(58)</h1>" > /var/www/html/index.html
web2:
    # echo "<h1>The page from web1(59)</h1>" > /var/www/html/index.html
b. Set the VIP and kernel parameters
This step is carried over from the LVS-DR setup. Because nginx proxies to the backends on their own addresses (as configured in step 3), the backends also work without it.
Write the following script:
    # cat setka.sh
    #!/bin/bash
    # start: tune ARP behaviour and bind the VIP to lo:0; stop: undo both.
    vip=192.168.0.100
    case $1 in
    start)
        echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
        echo 1 > /proc/sys/net/ipv4/conf/lo/arp_ignore
        echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
        echo 2 > /proc/sys/net/ipv4/conf/lo/arp_announce
        ifconfig lo:0 $vip broadcast $vip netmask 255.255.255.255 up
        ;;
    stop)
        ifconfig lo:0 down
        echo 0 > /proc/sys/net/ipv4/conf/all/arp_ignore
        echo 0 > /proc/sys/net/ipv4/conf/lo/arp_ignore
        echo 0 > /proc/sys/net/ipv4/conf/all/arp_announce
        echo 0 > /proc/sys/net/ipv4/conf/lo/arp_announce
        ;;
    esac
    # bash setka.sh start    # run the script to set the kernel parameters and the VIP
    # cat /proc/sys/net/ipv4/conf/lo/arp_ignore    # kernel parameter is set
    1
    # cat /proc/sys/net/ipv4/conf/all/arp_announce
    2
    # ifconfig lo:0    # VIP is set
    lo:0 Link encap:Local Loopback
         inet addr:192.168.137.10 Mask:255.255.255.255
         UP LOOPBACK RUNNING MTU:65536 Metric:1
    # scp setka.sh root@192.168.0.59:/root    # copy the script to web2 and run it there as well
    setka.sh 100% 547 0.5KB/s 00:00
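Values written under /proc/sys are lost on reboot. As a minimal sketch, the same four settings can be rendered in sysctl.conf syntax and placed under /etc/sysctl.d. The function name render_arp_sysctl and the file name 90-vip-arp.conf are made up for illustration.

```shell
#!/bin/bash
# Print the four ARP settings from setka.sh in sysctl.conf syntax.
render_arp_sysctl() {
    local iface
    for iface in all lo; do
        echo "net.ipv4.conf.$iface.arp_ignore = 1"
        echo "net.ipv4.conf.$iface.arp_announce = 2"
    done
}

# Example (as root; the target file name is an assumption):
#   render_arp_sysctl > /etc/sysctl.d/90-vip-arp.conf && sysctl --system
```

This only covers the kernel parameters; the VIP on lo:0 would still have to be restored separately after a reboot.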
3. Configure nginx load balancing on proxy1 and proxy2
a. Install nginx on both proxies
    # yum install -y nginx
b. Configure the nginx proxy
    # vim /etc/nginx/nginx.conf
Inside the http block:
    upstream backserver {
        server 192.168.0.58:80 weight=1 max_fails=3 fail_timeout=3s;
        server 192.168.0.59:80 weight=2 max_fails=3 fail_timeout=3s;
    }
    server {
        listen 80;
        server_name localhost;
        location / {
            root html;
            index index.html index.htm;
            proxy_pass http://backserver;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }
Copy the configuration to the other proxy and restart nginx on both:
    # scp /etc/nginx/nginx.conf root@192.168.0.57:/etc/nginx/nginx.conf
    # systemctl restart nginx; ssh 192.168.0.57 'systemctl restart nginx'
c. Test access
    # for i in {1..10};do curl http://192.168.0.56;done
    <h1>The page from web1(58)</h1>
    <h1>The page from web1(59)</h1>
    <h1>The page from web1(59)</h1>
    <h1>The page from web1(58)</h1>
    <h1>The page from web1(59)</h1>
    <h1>The page from web1(59)</h1>
    <h1>The page from web1(58)</h1>
    <h1>The page from web1(59)</h1>
    <h1>The page from web1(59)</h1>
    <h1>The page from web1(58)</h1>
    # for i in {1..10};do curl http://192.168.0.57;done
    <h1>The page from web1(59)</h1>
    <h1>The page from web1(58)</h1>
    <h1>The page from web1(59)</h1>
    <h1>The page from web1(59)</h1>
    <h1>The page from web1(58)</h1>
    <h1>The page from web1(59)</h1>
    <h1>The page from web1(59)</h1>
    <h1>The page from web1(58)</h1>
    <h1>The page from web1(59)</h1>
    <h1>The page from web1(59)</h1>
Note: both proxies are now configured and actually distribute requests. The backend weights are 1:2, and the responses come back in roughly that ratio.
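Counting the responses by eye gets tedious with more requests. A minimal sketch of a tally helper, assuming every page body contains the backend number in parentheses as the test pages above do; tally_backends is a hypothetical name:

```shell
#!/bin/bash
# Read page bodies on stdin (one per line) and print "<backend> <count>"
# for every "(NN)" marker found.
tally_backends() {
    grep -o '([0-9]*)' | sort | uniq -c | awk '{print $2, $1}'
}

# Example:
#   for i in {1..30}; do curl -s http://192.168.0.56; done | tally_backends
```

With weights 1 and 2 the two counts should approach a 1:2 ratio as the number of requests grows.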
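4. Configure keepalived to make nginx highly available
a. Install httpd and keepalived on each proxy
    # yum install -y httpd keepalived
Set up a Sorry page on each proxy. If both backend servers become unreachable, nginx falls back to the local httpd.
proxy1:
    # echo "<h1>Sorry, under maintenance(56).</h1>" > /var/www/html/index.html
proxy2:
    # echo "<h1>Sorry, under maintenance(57).</h1>" > /var/www/html/index.html
Edit nginx.conf on each proxy and add a backup server to the upstream block:
    server 127.0.0.1:8080 backup;
or do it with sed:
    # sed -i '/server 192.168.0.59:80 weight=2 max_fails=3 fail_timeout=3s;/a\server 127.0.0.1:8080 backup;' /etc/nginx/nginx.conf
The local httpd has to listen on port 8080 (the Listen directive in /etc/httpd/conf/httpd.conf), because nginx already occupies port 80.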
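The backup entry points at 127.0.0.1:8080, so the local httpd must not stay on its default port 80. A minimal sketch for rewriting the Listen directive, assuming the stock httpd.conf with a single uncommented Listen line; set_httpd_port is a hypothetical helper:

```shell
#!/bin/bash
# Rewrite the uncommented "Listen" line of an httpd config file.
# Usage: set_httpd_port <config-file> <port>
set_httpd_port() {
    local conf=$1 port=$2
    # Write to a temporary file first, then replace the original.
    sed "s/^Listen .*/Listen $port/" "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"
}

# Example (as root on each proxy):
#   set_httpd_port /etc/httpd/conf/httpd.conf 8080 && systemctl restart httpd
```

Commented-out lines such as "#Listen 12.34.56.78:80" are left alone because the pattern is anchored at the start of the line.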
b. Edit the keepalived configuration
proxy1:
    # cat /etc/keepalived/keepalived.conf
    ! Configuration File for keepalived
    global_defs {
        notification_email {
            root@localhost                      # address that receives alert mail
        }
        notification_email_from root@localhost  # sender address of alert mail
        smtp_server 127.0.0.1                   # mail server used to send alerts
        smtp_connect_timeout 30                 # SMTP connect timeout
        router_id ha_nginx                      # global identifier
    }
    vrrp_script chk_nginx {                     # script that checks the nginx service
        script "/etc/keepalived/chk_nginx.sh"   # restarts nginx; if the process is still missing, stops keepalived
        interval 2                              # seconds between checks
        weight -2                               # lower the priority by 2 when the check fails
        fall 2                                  # two failed checks count as a real failure
    }
    vrrp_instance VI_1 {
        state MASTER
        interface ens33
        virtual_router_id 51
        priority 100    # the backup's priority must be lower than this, and 100-2 must be lower than the backup's priority
        advert_int 1
        smtp_alert      # important: enables mail alerts on every state change; without it the mail settings above do nothing
        authentication {
            auth_type PASS
            auth_pass 1111
        }
        track_script {  # which check script to use
            chk_nginx
        }
        virtual_ipaddress {
            192.168.0.100/16 dev ens33 label ens33:1
        }
    }
Copy the configuration file to proxy2:
    # scp /etc/keepalived/keepalived.conf root@192.168.0.57:/etc/keepalived/keepalived.conf
proxy2:
Change these two parameters in keepalived.conf:
    state BACKUP
    priority 99
Here is the configuration from my test setup:
    # cat /etc/keepalived/keepalived.conf
    ! Configuration File for keepalived
    global_defs {
        notification_email {
            root@localhost                      # address that receives alert mail
        }
        notification_email_from root@localhost  # sender address of alert mail
        smtp_server 127.0.0.1                   # mail server used to send alerts
        smtp_connect_timeout 30                 # SMTP connect timeout
        router_id ha_nginx                      # global identifier
    }
    vrrp_script chk_nginx {                     # script that checks the nginx service
        script "/etc/keepalived/chk_nginx.sh"   # restarts nginx; if the process is still missing, stops keepalived
        interval 2                              # seconds between checks
        weight -2                               # lower the priority by 2 when the check fails
        fall 2                                  # two failed checks count as a real failure
    }
    vrrp_instance VI_1 {
        state BACKUP
        interface ens33
        virtual_router_id 51
        priority 99     # lower than the master's 100, but higher than 100-2
        advert_int 1
        smtp_alert      # important: enables mail alerts on every state change; without it the mail settings above do nothing
        authentication {
            auth_type PASS
            auth_pass 1111
        }
        track_script {  # which check script to use
            chk_nginx
        }
        virtual_ipaddress {
            192.168.0.100/16 dev ens33 label ens33:1
        }
    }
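The priority comments above boil down to a small piece of arithmetic. As a rough sketch (function names are hypothetical, and it only models a tracked script with a negative weight that reports failure through a non-zero exit status), the numbers 100, 99 and -2 work out like this:

```shell
#!/bin/bash
# Priority a node advertises, for a tracked script with a negative weight.
# Usage: effective_priority <base> <weight> <check_failed: 0|1>
effective_priority() {
    local base=$1 weight=$2 failed=$3
    if [ "$failed" -eq 1 ] && [ "$weight" -lt 0 ]; then
        echo $((base + weight))
    else
        echo "$base"
    fi
}

# Succeeds when the backup's priority beats the master's.
# Usage: backup_wins <master_effective> <backup_effective>
backup_wins() {
    [ "$2" -gt "$1" ]
}

# Example:
#   effective_priority 100 -2 1      # master with a failed check
#   backup_wins 98 99 && echo "backup takes over"
```

This is why the backup priority has to sit between 98 and 100: at 98 or below, a failed check on the master would not be enough to hand over the VIP.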
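c. nginx check script
The script first checks whether an nginx process exists. If not, it restarts nginx; if the process still cannot be found after that, it stops keepalived so that the backup node takes over as master.
This keeps a node that cannot run nginx from holding the VIP: it tries a restart first, and if that fails it shuts down keepalived and removes the node from the cluster.
Make the script executable (chmod +x /etc/keepalived/chk_nginx.sh) and place a copy on proxy2 as well.
    # cat /etc/keepalived/chk_nginx.sh
    #!/bin/bash
    # Count running nginx processes.
    N=`ps -C nginx --no-header | wc -l`
    if [ $N -eq 0 ]; then
        systemctl restart nginx         # nginx is down: try to bring it back first
        sleep 1
        if [ `ps -C nginx --no-header | wc -l` -eq 0 ]; then
            systemctl stop keepalived   # still down: stop keepalived to release the VIP
        fi
    fi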
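The decision logic of such a check can be separated from the commands it runs, which makes it possible to try it out without touching real services. A minimal sketch, not the script used above: check_and_recover and the RECOVER_WAIT variable are hypothetical names, and the probe, restart and give-up actions are passed in as commands.

```shell
#!/bin/bash
# Usage: check_and_recover <probe-cmd> <restart-cmd> <give-up-cmd>
# Returns 0 if the service is (or becomes) healthy, 1 after giving up.
check_and_recover() {
    local probe=$1 restart=$2 give_up=$3
    "$probe" && return 0            # service is up, nothing to do
    "$restart"                      # first failure: try to bring it back
    sleep "${RECOVER_WAIT:-1}"      # give the service a moment to start
    "$probe" && return 0            # the restart worked
    "$give_up"                      # still down: release the VIP
    return 1
}

# Example wiring for this setup:
#   nginx_up()      { [ "$(ps -C nginx --no-header | wc -l)" -gt 0 ]; }
#   nginx_restart() { systemctl restart nginx; }
#   stop_ka()       { systemctl stop keepalived; }
#   check_and_recover nginx_up nginx_restart stop_ka
```

Returning 1 after giving up also lets keepalived see the failure through the exit status, which is what the weight setting in vrrp_script acts on.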
d. Start keepalived and nginx
    # systemctl restart nginx; ssh 192.168.0.57 'systemctl restart nginx'
    # systemctl restart keepalived.service; ssh 192.168.0.57 'systemctl restart keepalived.service'
5. Testing
1. Check that the VIP is on proxy1:
    # ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
        link/ether 00:0c:29:d6:84:65 brd ff:ff:ff:ff:ff:ff
        inet 192.168.0.56/24 brd 192.168.0.255 scope global ens33
           valid_lft forever preferred_lft forever
        inet 192.168.0.100/16 scope global ens33:1
           valid_lft forever preferred_lft forever
        inet6 fe80::20c:29ff:fed6:8465/64 scope link
           valid_lft forever preferred_lft forever
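Scanning the full `ip a` listing for one address can be scripted. A minimal sketch that reads the one-line-per-address output of `ip -o -4 addr` on stdin; holds_vip is a hypothetical helper name:

```shell
#!/bin/bash
# Succeeds when the given IPv4 address appears in `ip -o -4 addr` output on stdin.
# Usage: holds_vip <address>
holds_vip() {
    # -F treats the dots literally; the trailing "/" stops 192.168.0.10
    # from matching 192.168.0.100.
    grep -qF "inet $1/"
}

# Example:
#   ip -o -4 addr show dev ens33 | holds_vip 192.168.0.100 && echo "VIP is on this node"
```

Running the same check on both proxies shows at a glance which node currently holds the VIP.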
Check that the page can be opened through the VIP:
    # for i in {1..5};do curl http://192.168.0.100;done
    <h1>The page from web1(58)</h1>
    <h1>The page from web1(59)</h1>
    <h1>The page from web1(59)</h1>
    <h1>The page from web1(58)</h1>
    <h1>The page from web1(59)</h1>
Access works, so this test passes.
2. Stop keepalived on the master node and check that the VIP moves to the backup node
Stop keepalived on proxy1 (systemctl stop keepalived), then look at the addresses on proxy2:
    [root@node2 keepalived]# ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
        link/ether 00:0c:29:7c:92:ae brd ff:ff:ff:ff:ff:ff
        inet 192.168.0.57/24 brd 192.168.0.255 scope global ens33
           valid_lft forever preferred_lft forever
        inet 192.168.0.100/16 scope global ens33:1
           valid_lft forever preferred_lft forever
        inet6 fe80::20c:29ff:fe7c:92ae/64 scope link
           valid_lft forever preferred_lft forever
Check that the VIP is still reachable:
    # for i in {1..5};do curl http://192.168.0.100;done
    <h1>The page from web1(59)</h1>
    <h1>The page from web1(59)</h1>
    <h1>The page from web1(58)</h1>
    <h1>The page from web1(59)</h1>
    <h1>The page from web1(59)</h1>
Access works, so this test passes.
3. Because keepalived runs in preemptive mode (the default), the VIP moves straight back once the master node recovers.
    [root@node1 ~]# systemctl start keepalived
    [root@node1 ~]# ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
        link/ether 00:0c:29:d6:84:65 brd ff:ff:ff:ff:ff:ff
        inet 192.168.0.56/24 brd 192.168.0.255 scope global ens33
           valid_lft forever preferred_lft forever
        inet 192.168.0.100/16 scope global ens33:1
           valid_lft forever preferred_lft forever
        inet6 fe80::20c:29ff:fed6:8465/64 scope link
           valid_lft forever preferred_lft forever
Test access through the VIP:
    # for i in {1..5};do curl http://192.168.0.100;done
    <h1>The page from web1(59)</h1>
    <h1>The page from web1(59)</h1>
    <h1>The page from web1(58)</h1>
    <h1>The page from web1(59)</h1>
    <h1>The page from web1(59)</h1>
The master node has taken the VIP back and the service is still reachable, so this test passes.
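Finally, let's look at what the backup node actually does when keepalived on the master node is stopped.
The log shows the whole VIP failover:
1. The instance transitions to MASTER state.
2. The node enters MASTER state.
3. It sets the VIP.
4. It sends gratuitous ARP for 192.168.0.100, telling the network that this node is now MASTER and serving requests.
/var/log/messages on the backup node: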
Nov 13 16:46:02 localhost Keepalived_vrrp[12348]: VRRP_Instance(VI_1) Transition to MASTER STATE
Nov 13 16:46:03 localhost Keepalived_vrrp[12348]: VRRP_Instance(VI_1) Entering MASTER STATE
Nov 13 16:46:03 localhost Keepalived_vrrp[12348]: VRRP_Instance(VI_1) setting protocol VIPs.
Nov 13 16:46:03 localhost Keepalived_vrrp[12348]: Sending gratuitous ARP on ens33 for 192.168.0.100
Nov 13 16:46:03 localhost Keepalived_vrrp[12348]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens33 for 192.168.0.100
Nov 13 16:46:03 localhost Keepalived_vrrp[12348]: Sending gratuitous ARP on ens33 for 192.168.0.100
Nov 13 16:46:03 localhost Keepalived_vrrp[12348]: Sending gratuitous ARP on ens33 for 192.168.0.100
Nov 13 16:46:03 localhost Keepalived_vrrp[12348]: Sending gratuitous ARP on ens33 for 192.168.0.100
Nov 13 16:46:03 localhost Keepalived_vrrp[12348]: Sending gratuitous ARP on ens33 for 192.168.0.100
Nov 13 16:46:03 localhost Keepalived_vrrp[12348]: Remote SMTP server [127.0.0.1]:25 connected.
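To pull only the state changes out of a busy log, the lines can be filtered by pattern. A minimal sketch, assuming the syslog layout shown above (timestamp in the third field, message starting at the seventh); vrrp_transitions is a hypothetical name:

```shell
#!/bin/bash
# Read syslog lines on stdin and print "<time> <state message>" for every
# VRRP instance state change.
vrrp_transitions() {
    awk '/VRRP_Instance.*STATE$/ {
        msg = $7
        for (i = 8; i <= NF; i++) msg = msg " " $i
        print $3, msg
    }'
}

# Example:
#   vrrp_transitions < /var/log/messages
```

The gratuitous ARP and SMTP lines are skipped because they do not end in "STATE".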
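FAQ:
Q: /var/log/messages shows: Unable to access script `killall`
A: A minimal CentOS 7 install does not include the killall command, so install it:
    # yum install psmisc -y
That completes keepalived high availability for nginx on CentOS 7.3. Give it a try if you are interested, and please point out any problems you find.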