最近项目上线部署,要求redis作高可用,由于redis cluster还不是特别成熟,就选择了redis sentinel做高可用。redis本身有replication,实现主从备份。结合sentinel可以做主、从自动切换。
生产环境中,一般要求有3个redis节点。但本文为了试验方便,只用了两个节点,一主一从。
部署规划
172.16.203.10 主节点
172.16.203.4 从节点
redis版本为3.0.1
主节点
redis采用源码编译的方式安装,非常简单,解压出来,进入解压目录,执行make就可以了,这里就不再详细介绍了。
下面来看redis.conf需要做的修改。
daemonize yes #让redis后台运行
pidfile /apps/run/redis/redis.pid #指定redis的pid文件存放位置
port 6379 #redis使用端口
logfile "/apps/logs/redis/redis.log" #log文件的位置。如果为空,则默认打印到/dev/null
requirepass 123456 #redis的密码,如果不需要密码验证,则可以不做修改
masterauth 123456 #如果上面设置了redis的密码,则这里必须设置,而且要和他一样。当该节点作为从节点连接主节点时,要用到这个密码和主节点做校验。
启动redis: src/redis-server redis.conf
查看当前主从状态: src/redis-cli -h 172.16.203.10 -a 123456 info Replication
# Replication
role:master
connected_slaves:0
master_repl_offset:544693
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:544692
可以看到,172.16.203.10为master,当前没有slave。
接下来,就该配置sentinel.conf了:
port 26379 #sentinel使用的端口
daemonize yes #sentinel后台运行。这行配置是添加的
logfile "/apps/logs/redis/sentinel.log" #log文件地址,这行配置是添加的
sentinel monitor mymaster 172.16.203.10 6379 1 #指定master。后面的数字表示,当有几个节点认为主节点down时才认为主节点进入ODOWN状态,就是真正挂了。
sentinel down-after-milliseconds mymaster 5000 #当多久,连接不上节点时,认为被连接节点进入S_DOWN(主观认为它down了);
sentinel failover-timeout mymaster 15000 #这个配置有很多作用。1、重新执行failover的时间是该值的2倍;2、取消一个没更改配置的failover3、failover中等待所有slave更改新的配置的最大时间。
sentinel auth-pass mymaster 123456 #设置校验的密码。如果redis设置了密码,这个一定要设置
要修改的就是上面几项,一定要特别注意sentinel auth-pass这一项,别忘记改 。修改好后,先拷贝一个备份。因为运行过程中,redis会自动修改这个配置。如果之后出了问题,可以通过备份恢复成最开始正确的状态。
启动sentinel
src/redis-sentine sentinel.conf
查看sentinel log:
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 3.0.1 (00000000/0) 64 bit
.-`` .-```. ```\/ _.,_ ''-._
( ' , .-` | `, ) Running in sentinel mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 26379
| `-._ `._ / _.-' | PID: 19957
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | http://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'
19957:X 12 Dec 13:13:36.746 # Sentinel runid is 6ab6f8abdc3dba4097da202954ecece7bc6d3215
19957:X 12 Dec 13:13:36.746 # +monitor master mymaster 172.16.203.10 6379 quorum 1
第一行表示当前Sentinel 的id,第二行显示当前的主节点是172.16.203.10 6379
查看下午Sentinel的状态: src/redis-cli -h 172.16.203.10 -a 123456 -p 26379 info Sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
master0:name=mymaster,status=ok,address=172.16.203.10:6379,slaves=0,sentinels=1
从节点
redis.conf的配置与主节点只有一点不同,增加下面一行:
slaveof 172.16.203.10 6379
启动redis
src/redis-server redis.conf
在从节点查看主、从状态:
src/redis-cli -h 172.16.203.4 -a 123456 info Replication
# Replication
role:slave
master_host:172.16.203.10
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:617956
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
可以看到当前节点为slave。
sentinel的配置和主节点保持一致就可以,启动sentinel: src/redis-sentine sentinel.conf
查看sentinel log:
12190:X 12 Dec 13:21:38.658 # Sentinel runid is 270f322d0f3f8605b92902417e499cedc8866163
12190:X 12 Dec 13:21:38.658 # +monitor master mymaster 172.16.203.10 6379 quorum 1
12190:X 12 Dec 13:21:38.659 * +slave slave 172.16.203.4:6379 172.16.203.4 6379 @ mymaster 172.16.203.10 6379
12190:X 12 Dec 13:21:39.609 * +sentinel sentinel 172.16.203.10:26379 172.16.203.10 26379 @ mymaster 172.16.203.10 6379
查看下sentinel状态: src/redis-cli -h 172.16.203.4 -a 123456 -p 26379 info Sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
master0:name=mymaster,status=ok,address=172.16.203.10:6379,slaves=1,sentinels=2
可以看出,当前有两个sentinels,一个slave。
到此,redis主从高可用就算配置结束了,下面开始验证
验证
1、从节点down机,redis、sentinel都挂了,关注主节点sentinel的log
+sdown sentinel 172.16.203.4:26379 172.16.203.4 26379 @ mymaster 172.16.203.10 6379
+sdown slave 172.16.203.4:6379 172.16.203.4 6379 @ mymaster 172.16.203.10 6379
2、重新启动从节点上的redis、sentinel
-sdown slave 172.16.203.4:6379 172.16.203.4 6379 @ mymaster 172.16.203.10 6379
-sdown sentinel 172.16.203.4:26379 172.16.203.4 26379 @ mymaster 172.16.203.10 6379
-dup-sentinel master mymaster 172.16.203.10 6379 #duplicate of 172.16.203.4:26379 or 0b0bf0cddcf7aa5b518a8a62c65188f9c4a1ecaf
+sentinel sentinel 172.16.203.4:26379 172.16.203.4 26379 @ mymaster 172.16.203.10 6379
可以看到,sentinel 的id变了,自动更新了sentinel配置文件中的相应配置。
查看主、从情况:
src/redis-cli -h 172.16.203.10 -a 123456 -p 6379 info Replication
# Replication
role:master
connected_slaves:1
slave0:ip=172.16.203.4,port=6379,state=online,offset=862487,lag=1
master_repl_offset:862642
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:862641
3、主节点down机
先停掉redis看主节点sentinel:
19957:X 12 Dec 15:10:19.207 # +sdown master mymaster 172.16.203.10 6379
19957:X 12 Dec 15:10:19.207 # +odown master mymaster 172.16.203.10 6379 #quorum 1/1
19957:X 12 Dec 15:10:19.207 # +new-epoch 1
19957:X 12 Dec 15:10:19.207 # +try-failover master mymaster 172.16.203.10 6379
19957:X 12 Dec 15:10:19.208 # +vote-for-leader 6ab6f8abdc3dba4097da202954ecece7bc6d3215 1
19957:X 12 Dec 15:10:19.211 # 172.16.203.4:26379 voted for 6ab6f8abdc3dba4097da202954ecece7bc6d3215 1
19957:X 12 Dec 15:10:19.275 # +elected-leader master mymaster 172.16.203.10 6379
19957:X 12 Dec 15:10:19.275 # +failover-state-select-slave master mymaster 172.16.203.10 6379
19957:X 12 Dec 15:10:19.375 # +selected-slave slave 172.16.203.4:6379 172.16.203.4 6379 @ mymaster 172.16.203.10 6379
19957:X 12 Dec 15:10:19.375 * +failover-state-send-slaveof-noone slave 172.16.203.4:6379 172.16.203.4 6379 @ mymaster 172.16.203.10 6379
19957:X 12 Dec 15:10:19.447 * +failover-state-wait-promotion slave 172.16.203.4:6379 172.16.203.4 6379 @ mymaster 172.16.203.10 6379
19957:X 12 Dec 15:10:20.216 # +promoted-slave slave 172.16.203.4:6379 172.16.203.4 6379 @ mymaster 172.16.203.10 6379
19957:X 12 Dec 15:10:20.216 # +failover-state-reconf-slaves master mymaster 172.16.203.10 6379
19957:X 12 Dec 15:10:20.297 # +failover-end master mymaster 172.16.203.10 6379
19957:X 12 Dec 15:10:20.297 # +switch-master mymaster 172.16.203.10 6379 172.16.203.4 6379
19957:X 12 Dec 15:10:20.298 * +slave slave 172.16.203.10:6379 172.16.203.10 6379 @ mymaster 172.16.203.4 6379
19957:X 12 Dec 15:10:25.350 # +sdown slave 172.16.203.10:6379 172.16.203.10 6379 @ mymaster 172.16.203.4 6379
redis主节点挂了后,首先重新选择leader(注意区分leader和master,leader对应sentinel,master对应redis),可以看到,leader选择为172.16.203.10,之后他开始选择master:
failover-state-select-slave
下面表示找到了合适的slave:172.16.203.4 6379
selected-slave 172.16.203.4 6379
然后更改选中的这个节点的配置文件
failover-state-send-slaveof-noone
等待其他sentinel的确认:
failover-state-wait-promotion
确认成功:
promoted-slave
开始对slaves进行reconfig操作。
failover-state-reconf-slaves
failover结束
failover-end
监听新的master
switch-master
看看从节点的sentinel日志:
24199:X 12 Dec 15:10:19.210 # +vote-for-leader 6ab6f8abdc3dba4097da202954ecece7bc6d3215 1
24199:X 12 Dec 15:10:19.249 # +sdown master mymaster 172.16.203.10 6379
24199:X 12 Dec 15:10:19.249 # +odown master mymaster 172.16.203.10 6379 #quorum 1/1
24199:X 12 Dec 15:10:19.249 # Next failover delay: I will not start a failover before Sat Dec 12 15:10:50 2015
24199:X 12 Dec 15:10:20.299 # +config-update-from sentinel 172.16.203.10:26379 172.16.203.10 26379 @ mymaster 172.16.203.10 6379
24199:X 12 Dec 15:10:20.299 # +switch-master mymaster 172.16.203.10 6379 172.16.203.4 6379
24199:X 12 Dec 15:10:20.299 * +slave slave 172.16.203.10:6379 172.16.203.10 6379 @ mymaster 172.16.203.4 6379
24199:X 12 Dec 15:10:25.315 # +sdown slave 172.16.203.10:6379 172.16.203.10 6379 @ mymaster 172.16.203.4 6379
再停掉master的sentinel
+sdown sentinel 172.16.203.10:26379 172.16.203.10 26379 @ mymaster 172.16.203.4 6379
问题
1、停掉一个sentinel,然后再停掉master,sentinel一直这个状态:
18430:X 12 Dec 11:36:37.949 # +new-epoch 68
18430:X 12 Dec 11:36:37.949 # +try-failover master mymaster 127.0.0.1 6380
18430:X 12 Dec 11:36:39.179 # +vote-for-leader 1c9ea5336e95283251d9e53dccf8f6dedd51536d 68
18430:X 12 Dec 11:36:48.077 # -failover-abort-not-elected master mymaster 127.0.0.1 6380
18430:X 12 Dec 11:36:48.177 # Next failover delay: I will not start a failover before Sat Dec 12 11:42:38 2015
18430:X 12 Dec 11:42:38.057 # +new-epoch 69
18430:X 12 Dec 11:42:38.057 # +try-failover master mymaster 127.0.0.1 6380
18430:X 12 Dec 11:42:38.106 # +vote-for-leader 1c9ea5336e95283251d9e53dccf8f6dedd51536d 69
18430:X 12 Dec 11:42:48.443 # -failover-abort-not-elected master mymaster 127.0.0.1 6380
18430:X 12 Dec 11:42:48.544 # Next failover delay: I will not start a failover before Sat Dec 12 11:48:38 2015
这里要提下sentinel的leader选举流程:每个发现主服务器进入客观下线的sentinel,在发送is-master-down-by-addr询问的时候,
会带上自己的run id,要求其他sentinel将自己设置为局部领头sentinel。局部领头sentinel是先到先得:只有第一个发送is-master-down-by-addr询问的sentinel被设为局部领头sentinel,后续的都会被拒绝。如果有某个sentinel被**半数以上**sentinel设置局部领头sentinel,则这个sentinel成为领头sentinel。
注意半数以上 ,虽然我们停掉了一个sentinel,但由于配置文件纪录了他,所以sentinel数量还是2。半数以上也就是2,但实际我们只有一个sentinel,因此永远也选不出leader,也就不会进行failover。
---------------------
作者:sdlyjzh
来源:CSDN
原文:https://blog.csdn.net/sdlyjzh/article/details/50274499
版权声明:本文为博主原创文章,转载请附上博文链接!