Redis Sentinel High Availability: Implementation Notes

2023-08-21 19:42:28

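Background:

An earlier article, "Redis Replication and Sentinel: Setup and Principles", introduced how Sentinel works, how it is implemented, and the basics of setting it up. This article walks through a Redis Sentinel deployment in detail.

Installation:

This section covers building from source. The examples use Ubuntu 16.04 and Redis 3.2.0. The steps are:

Download the source package: wget http://download.redis.io/releases/redis-3.2.0.tar.gz
Install the dependencies: sudo apt-get install gcc tcl
Extract and compile: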
```
#tar zxvf redis-3.2.0.tar.gz

...

...

#make

...

Hint: It's a good idea to run 'make test' ;)

#make test

...

\o/ All tests passed without errors!

...

#make install
```
Note: make test may well fail at this step with the following error:

[err]: Test replication partial resync: ok psync (diskless: yes, reconnect: 1) in tests/integration/replication-psync.tcl

Expected condition '[s -1 sync_partial_ok] > 0' to be true ([s -1 sync_partial_ok] > 0)

A likely cause is that the test times out on low-spec machines; the environment used here is an LXC virtual machine. There are two ways to work around it:
```
Method 1: in the extracted source directory, edit

# vi tests/integration/replication-psync.tcl

and increase the delay value given to after

Method 2: run make test under taskset

# taskset -c  make test
```
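This completes the Redis build and installation.
The build directory contains two configuration files:
redis.conf and sentinel.conf; a separate article describes their options.
Architecture of the test environment:
Three Redis instances (one master, two slaves) and three sentinels. Master: 10.0.3.110; slaves: 10.0.3.92 and 10.0.3.66. Each Redis host also runs one sentinel instance. The configuration files are modified as follows:
redis.conf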

# Redis configuration file example.

# ./redis-server /path/to/redis.conf

################################## INCLUDES ###################################

# include /path/to/local.conf

# include /path/to/other.conf

################################## NETWORK #####################################

bind 10.0.3.110

protected-mode yes

port 

tcp-backlog 

unixsocket "/tmp/redis.sock"

unixsocketperm 

timeout 

tcp-keepalive 

################################# GENERAL #####################################

daemonize yes

pidfile "/var/run/redis6379.pid"

loglevel notice

logfile "/var/log/redis/redis_6379.log"

# syslog-enabled no

# syslog-ident redis

# syslog-facility local0

databases

supervised no

################################ SNAPSHOTTING  ################################

save

save

save  

stop-writes-on-bgsave-error yes

rdbcompression yes

rdbchecksum yes

dbfilename "dump_6379.rdb"

dir "/var/lib/redis_6379"

################################# REPLICATION #################################

# slaveof <masterip> <masterport>

masterauth "dxydxy"

slave-serve-stale-data yes

slave-read-only yes

repl-diskless-sync no

repl-diskless-sync-delay 

# repl-ping-slave-period

# repl-timeout 

repl-disable-tcp-nodelay no

repl-backlog-size 5mb

repl-backlog-ttl 

slave-priority 

#min-slaves-to-write

#min-slaves-max-lag 

################################## SECURITY ###################################

requirepass "dxydxy"

# rename-command CONFIG b840fc02d524045429941cc15f59e41cb7be6c52

# rename-command CONFIG ""

################################### LIMITS ####################################

maxclients

#maxmemory <bytes>

maxmemory-policy noeviction

# maxmemory-samples 

############################## APPEND ONLY MODE ###############################

appendonly yes

appendfilename "appendonly_6379.aof"

# appendfsync always

appendfsync everysec

# appendfsync no

no-appendfsync-on-rewrite no

auto-aof-rewrite-percentage

auto-aof-rewrite-min-size 64mb

aof-load-truncated yes

################################ LUA SCRIPTING  ###############################

lua-time-limit 

################################ REDIS CLUSTER  ###############################

# cluster-enabled yes

# cluster-config-file nodes-.conf

# cluster-node-timeout

# cluster-slave-validity-factor

# cluster-migration-barrier

# cluster-require-full-coverage yes

################################## SLOW LOG ###################################

slowlog-log-slower-than

slowlog-max-len 

################################ LATENCY MONITOR ##############################

latency-monitor-threshold 

############################# EVENT NOTIFICATION ##############################

notify-keyspace-events ""

############################### ADVANCED CONFIG ###############################

hash-max-ziplist-entries

hash-max-ziplist-value 

list-max-ziplist-entries

list-max-ziplist-value 

list-compress-depth

set-max-intset-entries 

zset-max-ziplist-entries

zset-max-ziplist-value 

hll-sparse-max-bytes 

activerehashing yes

client-output-buffer-limit normal

client-output-buffer-limit slave 256mb 64mb

client-output-buffer-limit pubsub 32mb 8mb 

hz

aof-rewrite-incremental-fsync yes

list-max-ziplist-size -

sentinel.conf

port 

dir "/var/lib/sentinel_16379"

logfile "/var/log/redis/sentinel_16379.log"

daemonize yes

protected-mode no

sentinel monitor dxy 10.0.3.110  

sentinel auth-pass dxy dxydxy

sentinel down-after-milliseconds dxy 

sentinel failover-timeout dxy 

# Custom scripts to run after a failover, e.g. to send mail or move a VIP

#sentinel notification-script <master-name> <script-path>

#sentinel client-reconfig-script <master-name> <script-path>

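The configuration files are stored under /etc/redis/, and the directories they reference are created accordingly. Unlike the setup in "Redis Replication and Sentinel: Setup and Principles", every Redis instance here requires a password (requirepass).
Note: when a master requires a password, both clients and slaves must supply it when connecting. The master sets its own password with requirepass, and clients that do not supply it cannot use this master. A slave sets the password it uses to reach the master with masterauth. Clients authenticate with AUTH. With Sentinel, a master may become a slave and a slave may become a master, so both options must be set on every instance. Sentinel itself also connects to the master and the slaves, so it needs: sentinel auth-pass <master_name> xxxxx.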
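As a rough illustration of the rule above, the following sketch checks that a redis.conf-style text sets both requirepass and masterauth, and to the same value, which an instance needs if it may switch roles during a failover. The function names `parse_conf` and `check_auth_settings` and the simple line parser are assumptions made for this example; they are not part of Redis.

```python
def parse_conf(text):
    """Parse 'key value' lines of a redis.conf-style text into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(" ")
        settings[key] = value.strip().strip('"')
    return settings


def check_auth_settings(text):
    """Return a list of problems with the password settings (empty if fine)."""
    conf = parse_conf(text)
    problems = []
    if "requirepass" not in conf:
        problems.append("requirepass is not set")
    if "masterauth" not in conf:
        problems.append("masterauth is not set")
    if conf.get("requirepass") != conf.get("masterauth"):
        problems.append("requirepass and masterauth differ")
    return problems
```

Running `check_auth_settings(open("/etc/redis/redis.conf").read())` on each host before starting the instances would flag a node that could not replicate after a role change.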

Create the redis user and group, and give them ownership of the directories named in the configuration files.

# useradd redis

# groupadd redis

# chown -R redis.redis redis/

# chown -R redis.redis /etc/redis/

Start each Redis instance:
```
redis-server /etc/redis/redis.conf
```

Note: on startup, the Redis log reports several WARNINGs:

:M  Jun ::42.186 # WARNING: The TCP backlog setting of  cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of .
Fix: edit /etc/sysctl.conf and add a line that sets net.core.somaxconn to a value at least as large as the configured tcp-backlog, then run: sysctl -p

:M  Jun ::42.186 # WARNING overcommit_memory is set to ! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
Fix: echo 1 > /proc/sys/vm/overcommit_memory

:M  Jun ::42.187 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
Fix: echo never > /sys/kernel/mm/transparent_hugepage/enabled

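About the WARNINGs:

net.core.somaxconn is a Linux kernel parameter that caps the backlog of a listening socket.

The backlog is the socket's listen queue: a connection request that has not yet been handled or established waits in the backlog.

The socket server drains pending requests from the backlog; once handled, a request no longer occupies the listen queue.

When the server is slow enough that the listen queue fills up, new requests are rejected.

In other words, net.core.somaxconn limits the size of the listen queue for new TCP connections.

For a heavily loaded web service that constantly handles new connections, the default is too small; in most environments it is recommended to raise it.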
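The capping behaviour behind the first WARNING can be sketched as follows. The function names are hypothetical, and the numbers in the usage note are illustrative values only, not figures taken from the log above.

```python
def effective_backlog(requested, somaxconn):
    """The kernel caps a listen() backlog at net.core.somaxconn."""
    return min(requested, somaxconn)


def read_somaxconn(path="/proc/sys/net/core/somaxconn"):
    """Read the current limit on Linux; return None if it cannot be read."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None
```

For example, `effective_backlog(511, 128)` returns 128: a server asking for a backlog of 511 on a host whose limit is 128 only gets 128, which is the situation the WARNING reports.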

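About the overcommit_memory parameter:

It sets the memory allocation policy (optional; choose according to the server's actual situation).

/proc/sys/vm/overcommit_memory

Possible values: 0, 1, 2.

0: the kernel checks heuristically whether enough memory is available for the requesting process; if so, the allocation succeeds, otherwise it fails and an error is returned to the process.

1: the kernel always allows the allocation, regardless of the current memory state.

2: the kernel does not overcommit; the total committed memory is limited to swap plus a configurable share of physical memory (vm.overcommit_ratio).

Note: when Redis dumps its data it forks a child process. In theory the child occupies as much memory as the parent: if the parent uses 8G, another 8G has to be accounted for the child. If the system cannot provide it, the Redis server may go down or suffer heavy I/O load and reduced efficiency. The preferred allocation policy here is therefore 1 (the kernel always allows the allocation, regardless of the current memory state).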
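A small sketch of how the three policies could be inspected from a script. `describe_overcommit` is a hypothetical helper; the short descriptions paraphrase the Linux kernel documentation rather than quote it.

```python
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (kernel default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit",
}


def describe_overcommit(raw_value):
    """Return (value, description, matches_redis_advice) for the file content."""
    value = int(raw_value.strip())
    description = OVERCOMMIT_MODES.get(value, "unknown")
    # The Redis warning quoted earlier recommends vm.overcommit_memory = 1.
    return value, description, value == 1
```

On a Linux host, `describe_overcommit(open("/proc/sys/vm/overcommit_memory").read())` reports the active policy.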

After replication is established (slaveof), start each sentinel instance:

redis-sentinel /etc/redis/sentinel.conf

Note: a problem shows up here, and the culprit is the protected-mode parameter. Look at the log:

:X  Jun ::09.185 * +sentinel sentinel ebf9b1b4a5cc98bffead5d0996b8f43deb806641 10.0.3.92  @ dxy 10.0.3.110

:X  Jun ::24.234 # +sdown sentinel ebf9b1b4a5cc98bffead5d0996b8f43deb806641 10.0.3.92  @ dxy 10.0.3.110

:X  Jun ::18.888 * +sentinel sentinel 07e189ae6c30d4951d3eb48e9effd948de026c3b 10.0.3.66  @ dxy 10.0.3.110

:X  Jun ::33.962 # +sdown sentinel 07e189ae6c30d4951d3eb48e9effd948de026c3b 10.0.3.66  @ dxy 10.0.3.110

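The log shows that, apart from the local sentinel, the other two sentinels were marked subjectively down (SDOWN) after exactly 15 seconds (down-after-milliseconds 15000). A sentinel sends PING heartbeats to the master to check that it is alive. If the master does not answer with PONG within a given period, or answers with an error, the sentinel subjectively (unilaterally) considers the master unavailable (subjectively down, SDOWN for short). down-after-milliseconds sets that period, in milliseconds.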
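The timing rule just described can be written down as a simplified model. This is a sketch of the idea, not Sentinel's actual code: the function names are hypothetical, and the quorum check for objectively down (ODOWN) is reduced to a simple vote count.

```python
def is_sdown(now_ms, last_valid_reply_ms, down_after_ms):
    """Subjectively down: no valid reply for longer than down-after-milliseconds."""
    return now_ms - last_valid_reply_ms > down_after_ms


def is_odown(sdown_votes, quorum):
    """Objectively down: at least `quorum` sentinels agree on SDOWN."""
    return sdown_votes >= quorum
```

With down-after-milliseconds 15000, an instance whose last valid reply is 16 seconds old is flagged, while one that replied 14 seconds ago is not.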
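The timestamps show that the sentinels could not reach each other, which led to SDOWN (see the discovery mechanism described in "Redis Replication and Sentinel: Setup and Principles"). Because there was no error message, the cause was hard to find. Connecting to a sentinel directly finally revealed it: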

# redis-cli -h 10.0.3.110 -p

10.0.3.110:> info

DENIED Redis is running in protected mode because protected mode is enabled, no bind address was specified, no authentication password is requested to clients. In this mode connections are only accepted from the loopback interface. If you want to connect from external computers to Redis you may adopt one of the following solutions: 1) Just disable protected mode sending the command 'CONFIG SET protected-mode no' from the loopback interface by connecting to Redis from the same host the server is running, however MAKE SURE Redis is not publicly accessible from internet if you do so. Use CONFIG REWRITE to make this change permanent. 2) Alternatively you can just disable the protected mode by editing the Redis configuration file, and setting the protected mode option to 'no', and then restarting the server. 3) If you started the server manually just for testing, restart it with the '--protected-mode no' option. 4) Setup a bind address or an authentication password. NOTE: You only need to do one of the above things in order for the server to start accepting connections from the outside.

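This long message says that when Redis has neither a bind address nor a password configured, protected mode takes effect and Redis accepts connections only from the IPv4 and IPv6 loopback addresses. External connections are refused with this message, so that users get a clue about what went wrong rather than a bare refusal. See the Redis author's discussion for details; the suggestion given there is to turn protected mode off: --protected-mode no. This explains the errors above: the sentinels had no bind address and no password, so protected mode was active and each sentinel refused connections from the others, which put them in SDOWN. Adding the following to sentinel.conf: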

protected-mode no

solves the problem. protected-mode was introduced in 3.2 and is enabled by default. The relevant documentation reads:

# Protected mode is a layer of security protection, in order to avoid that

# Redis instances left open on the internet are accessed and exploited.

#

# When protected mode is on and if:

#

# 1) The server is not binding explicitly to a set of addresses using the

#    "bind" directive.

# 2) No password is configured.

#

# The server only accepts connections from clients connecting from the

# IPv4 and IPv6 loopback addresses 127.0.0.1 and ::1, and from Unix domain

# sockets.

#

# By default protected mode is enabled. You should disable it only if

# you are sure you want clients from other hosts to connect to Redis

# even if no authentication is configured, nor a specific set of interfaces

# are explicitly listed using the "bind" directive.

protected-mode yes

Start the sentinels and check the log (started successfully):

:X  Jun ::05.477 # Sentinel ID is 68fdb1e07c0998b119e4678f7aead7742a7b1f64

:X  Jun ::05.477 # +monitor master dxy 10.0.3.110  quorum

:X  Jun ::05.478 * +slave slave 10.0.3.92: 10.0.3.92  @ dxy 10.0.3.110

:X  Jun ::05.512 * +slave slave 10.0.3.66: 10.0.3.66  @ dxy 10.0.3.110

:X  Jun ::14.894 * +sentinel sentinel b2fb07a1cce853ddec86a993428fb09edf15b6c1 10.0.3.92  @ dxy 10.0.3.110

:X  Jun ::23.346 * +sentinel sentinel d9b198d75ede190fc63d95af8a7ca58e1a395c9b 10.0.3.66  @ dxy 10.0.3.110

Check the status to verify that Sentinel is set up correctly (log in to any sentinel):

10.0.3.92:> info sentinel

# Sentinel

sentinel_masters:

sentinel_tilt:

sentinel_running_scripts:

sentinel_scripts_queue_length:

sentinel_simulate_failure_flags:

master0:name=dxy,status=ok,address=10.0.3.110:,slaves=,sentinels=

The master0 line above (status=ok, with the slaves and sentinels counted) shows that Sentinel started successfully.
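The same check can be automated by parsing the master0 line of the info sentinel output. The sketch below assumes the line has the shape shown above; the helper names and the port in the usage note are made up for illustration.

```python
def parse_master_line(line):
    """Split 'master0:name=...,status=...,...' into a dict of fields."""
    label, _, fields = line.partition(":")
    info = {"id": label}
    for item in fields.split(","):
        key, _, value = item.partition("=")
        info[key] = value
    return info


def looks_healthy(info, expected_slaves, expected_sentinels):
    """True when the master is ok and all slaves and sentinels are seen."""
    return (
        info.get("status") == "ok"
        and int(info.get("slaves", 0)) == expected_slaves
        and int(info.get("sentinels", 0)) == expected_sentinels
    )
```

For the layout in this article, a line such as `master0:name=dxy,status=ok,address=10.0.3.110:6379,slaves=2,sentinels=3` would pass `looks_healthy(info, 2, 3)`.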

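Testing:

Note: the virtual machines above cannot reach a mail server, so the tests use a different environment. New environment: Redis 2.8.4, three Redis instances (one master, two slaves) and three sentinels. Master: 192.168.200.208<6379>; slaves: 192.168.200.199 and 192.168.200.73. Each Redis host runs one sentinel<7379> instance.

① Check the status: info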

192.168.200.208:6379> info replication

# Replication

role:master

connected_slaves:

slave0:ip=192.168.200.199,port=,state=online,offset=,lag=

slave1:ip=192.168.200.73,port=,state=online,offset=,lag=

master_repl_offset:

repl_backlog_active:

repl_backlog_size:

repl_backlog_first_byte_offset:

repl_backlog_histlen:

192.168.200.208:>

192.168.200.208:7379> info sentinel

# Sentinel

sentinel_masters:

sentinel_tilt:

sentinel_running_scripts:

sentinel_scripts_queue_length:

192.168.200.208:> sentinel master dxy

 ) "name"

 ) "dxy"

 ) "ip"

 ) "192.168.200.208"

 ) "port"

 ) ""

 ) "runid"

 ) "50ad7cfe6676fc1a1e671ead4a780958942879fc"

 ) "flags"

) "master"

) "pending-commands"

) ""

) "last-ok-ping-reply"

) ""

) "last-ping-reply"

) ""

) "info-refresh"

) ""

) "role-reported"

) "master"

) "role-reported-time"

) ""

) "config-epoch"

) ""

) "num-slaves"

) ""

) "num-other-sentinels"

) ""

) "quorum"

) ""

) "down-after-milliseconds"

) ""

) "failover-timeout"

) ""

) "parallel-syncs"

) ""

) "client-reconfig-script"

) "/opt/bin/notify.py"

192.168.200.208:> sentinel slaves dxy

)  ) "name"

    ) "192.168.200.199:6379"

    ) "ip"

    ) "192.168.200.199"

    ) "port"

    ) ""

    ) "runid"

    ) "c4e7bf53f7cee3c28bc369e1db656f879bf41947"

    ) "flags"

   ) "slave"

   ) "pending-commands"

   ) ""

   ) "last-ok-ping-reply"

   ) ""

   ) "last-ping-reply"

   ) ""

   ) "info-refresh"

   ) ""

   ) "role-reported"

   ) "slave"

   ) "role-reported-time"

   ) ""

   ) "master-link-down-time"

   ) ""

   ) "master-link-status"

   ) "ok"

   ) "master-host"

   ) "192.168.200.208"

   ) "master-port"

   ) ""

   ) "slave-priority"

   ) ""

   ) "slave-repl-offset"

   ) ""

)  ) "name"

    ) "192.168.200.73:6379"

    ) "ip"

    ) "192.168.200.73"

    ) "port"

    ) ""

    ) "runid"

    ) "64ad290c43bba2b062220029c4c91274bb4465b9"

    ) "flags"

   ) "slave"

   ) "pending-commands"

   ) ""

   ) "last-ok-ping-reply"

   ) ""

   ) "last-ping-reply"

   ) ""

   ) "info-refresh"

   ) ""

   ) "role-reported"

   ) "slave"

   ) "role-reported-time"

   ) ""

   ) "master-link-down-time"

   ) ""

   ) "master-link-status"

   ) "ok"

   ) "master-host"

   ) "192.168.200.208"

   ) "master-port"

   ) ""

   ) "slave-priority"

   ) ""

   ) "slave-repl-offset"

   ) ""

② Verify failover

Kill the master and follow the switchover in the sentinel log:

[]  Jun ::08.728 # +sdown master dxy 192.168.200.208    #entered subjectively down (SDOWN)

[]  Jun ::08.819 # +odown master dxy 192.168.200.208    #quorum / #after enough sentinels agree, entered objectively down (ODOWN)

[]  Jun ::08.819 # +new-epoch                              #new epoch (configuration version)

[]  Jun ::08.819 # +try-failover master dxy 192.168.200.208   #failover conditions met, waiting for the other sentinels to vote

[]  Jun ::08.819 # +vote-for-leader 38da843c4ad8baf95dcfdcd968ae6c2f05ab995c   #vote cast for a leader

[]  Jun ::08.820 # 192.168.200.199: voted for 38da843c4ad8baf95dcfdcd968ae6c2f05ab995c

[]  Jun ::08.820 # 192.168.200.73: voted for 38da843c4ad8baf95dcfdcd968ae6c2f05ab995c

[]  Jun ::08.909 # +elected-leader master dxy 192.168.200.208  #leader elected

[]  Jun ::08.909 # +failover-state-select-slave master dxy 192.168.200.208  #selecting a slave to become the new master

[]  Jun ::08.965 # +selected-slave slave 192.168.200.73: 192.168.200.73  @ dxy 192.168.200.208  #slave 73 selected as the new master

[]  Jun ::08.965 * +failover-state-send-slaveof-noone slave 192.168.200.73: 192.168.200.73  @ dxy 192.168.200.208  #switching the role of the selected slave (SLAVEOF NO ONE)

[]  Jun ::09.017 * +failover-state-wait-promotion slave 192.168.200.73: 192.168.200.73  @ dxy 192.168.200.208  #waiting for the promotion to be confirmed

[]  Jun ::09.867 # +promoted-slave slave 192.168.200.73: 192.168.200.73  @ dxy 192.168.200.208  #promotion confirmed

[]  Jun ::09.867 # +failover-state-reconf-slaves master dxy 192.168.200.208  #failover state changes to reconf-slaves

[]  Jun ::09.957 * +slave-reconf-sent slave 192.168.200.199: 192.168.200.199  @ dxy 192.168.200.208  #sentinel sent SLAVEOF to point this slave at the new master

[]  Jun ::10.887 * +slave-reconf-inprog slave 192.168.200.199: 192.168.200.199  @ dxy 192.168.200.208  #slave now follows the new master, but replication has not completed yet

[]  Jun ::10.887 * +slave-reconf-done slave 192.168.200.199: 192.168.200.199  @ dxy 192.168.200.208  #slave follows the new master and is in sync with it

[]  Jun ::10.946 # -odown master dxy 192.168.200.208  #old master leaves objectively down

[]  Jun ::10.946 # +failover-end master dxy 192.168.200.208  #failover completed successfully

[]  Jun ::10.946 # +switch-master dxy 192.168.200.208  192.168.200.73  #now monitoring the new master

[]  Jun ::10.946 * +slave slave 192.168.200.199: 192.168.200.199  @ dxy 192.168.200.73  #slave discovered

[]  Jun ::10.947 * +slave slave 192.168.200.208: 192.168.200.208  @ dxy 192.168.200.73

[]  Jun ::40.960 # +sdown slave 192.168.200.208: 192.168.200.208  @ dxy 192.168.200.73

Start the old master again and check the log:

[]  Jun ::01.856 # -sdown slave 192.168.200.208: 192.168.200.208  @ dxy 192.168.200.73

[]  Jun ::11.793 * +convert-to-slave slave 192.168.200.208: 192.168.200.208  @ dxy 192.168.200.73   #old master converted to a slave; failover succeeded!
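For monitoring, the one line that matters most in the sequence above is +switch-master, which carries the master name followed by the old and new addresses. The sketch below extracts them from a log line; `parse_switch_master` is a hypothetical helper, and the timestamp and ports in the usage note are illustrative.

```python
def parse_switch_master(line):
    """Return the master name and old/new (ip, port) pairs, or None."""
    marker = "+switch-master "
    position = line.find(marker)
    if position == -1:
        return None  # not a switch-master event
    parts = line[position + len(marker):].split()
    if len(parts) < 5:
        return None  # incomplete event line
    name, old_ip, old_port, new_ip, new_port = parts[:5]
    return {
        "master": name,
        "old": (old_ip, int(old_port)),
        "new": (new_ip, int(new_port)),
    }
```

Given `"[1] 17 Jun 12:00:10.946 # +switch-master dxy 192.168.200.208 6379 192.168.200.73 6379"`, the function reports 192.168.200.73:6379 as the new master of dxy.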

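See the previous article for more log events. Sentinel also has an option called client-reconfig-script, described next.

Failover script: for high availability, the client-reconfig-script option names a script that is executed when a failover happens.

The option is documented as follows: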

# When the master changed because of a failover a script can be called in

# order to perform application-specific tasks to notify the clients that the

# configuration has changed and the master is at a different address.

#

# The following arguments are passed to the script:

#

# <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>

#

# <state> is currently always "failover"

# <role> is either "leader" or "observer"

#

# The arguments from-ip, from-port, to-ip, to-port are used to communicate

# the old address of the master and the new address of the elected slave

# (now a master).

#

# This script should be resistant to multiple invocations.

Arguments passed to the script:

<master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>
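A minimal sketch of how a script might unpack these seven arguments, written for Python 3 and independent of the script shown later in this article. The names `FailoverEvent`, `parse_reconfig_args` and `should_act` are assumptions made for the example.

```python
from collections import namedtuple

FailoverEvent = namedtuple(
    "FailoverEvent",
    "master_name role state from_ip from_port to_ip to_port",
)


def parse_reconfig_args(argv):
    """Turn sys.argv-style input (script name + 7 arguments) into an event."""
    if len(argv) != 8:
        raise ValueError("expected 7 arguments, got %d" % (len(argv) - 1))
    event = FailoverEvent(*argv[1:])
    # Ports arrive as strings on the command line; convert them to integers.
    return event._replace(
        from_port=int(event.from_port), to_port=int(event.to_port)
    )


def should_act(event):
    """Only the sentinel leading the failover should move the VIP or send mail."""
    return event.role == "leader"
```

Checking the role first keeps the side effects to a single sentinel, which also helps with the documented requirement that the script tolerate multiple invocations.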

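The goal of the script is to send an alert email after a failover and to move the VIP to the new master, somewhat like MHA for MySQL. The script is simple and does no extra checks; it can be hardened for more complex setups. Implementation:

①: First set up SSH trust between the three Redis hosts so that they can log in to each other without a password.

Create a key pair with ssh-keygen, accepting the defaults; this produces id_rsa.pub under .ssh/

ssh-keygen -t rsa  

Copy id_rsa.pub to the other two machines and import the public key:

cat id_rsa.pub >> /root/.ssh/authorized_keys

Note: in this test the sentinel and Redis instances share hosts. For the local sentinel to operate on the local Redis host (bringing the VIP down or up), the host must also be able to SSH to itself, so the public key created by ssh-keygen also goes into the host's own authorized_keys. In the end every server's authorized_keys contains the keys of all three hosts (three lines).

②: Before the first run the VIP must already be set on the master; that is, once Redis Sentinel is set up, configure the VIP on the master.

③: Gather the failover information and extract the required IPs.

④: Send the mail, write the log, and remotely bring the VIP down and up.

Before that, install the paramiko module: easy_install paramiko, which needs these packages: apt-get install python-setuptools python-dev build-essential libffi-dev libssl-dev; alternatively, run: apt-get install python-paramiko.

The full script follows (see also the logging notes):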

#!/usr/bin/env python

#-*-encoding:utf8-*-

#------------------------------------------------

# Name:        notify.py

# Purpose:     actions to run after a failover

# Author:      zhoujy

# Created:     2016-06-17

#------------------------------------------------

import os

import sys

import time

import datetime

import smtplib

import subprocess

import fileinput

import logging

import paramiko

from email.mime.text import MIMEText

from email.mime.multipart import MIMEMultipart

from email.Utils import COMMASPACE, formatdate

reload(sys)

sys.setdefaultencoding('utf8')

def send_mail(to, subject, text, from_mail, server="localhost"):

    message = MIMEMultipart()

    message['From'] = from_mail

    message['To'] = COMMASPACE.join(to)

    message['Date'] = formatdate(localtime=True)

    message['Subject'] = subject

    message.attach(MIMEText(text,_charset='utf-8'))

    smtp = smtplib.SMTP(server)

    smtp.sendmail(from_mail, to, message.as_string())

    smtp.close()

# bring the VIP down

def down_vip(hostname,port):

    ssh = paramiko.SSHClient()

    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())

    ssh.connect(hostname=hostname,port=port)

    stdin, stdout, stderr = ssh.exec_command("ifconfig eth0:0 down")

#    print stdout.readlines()

    # read stderr once; a second readlines() call would return an empty list
    errors = stderr.readlines()

    if not errors:

        print "down vip ok..."

    else:

        print errors

    ssh.close()

# bring the VIP up

def up_vip(hostname,port,vip):

    ssh = paramiko.SSHClient()

    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())

    ssh.connect(hostname=hostname,port=port)

    stdin, stdout, stderr = ssh.exec_command("ifconfig eth0:0 %s;arping -c 3 -A %s;hash -r" %(vip,vip))

#    print stdout.readlines()

    # read stderr once; a second readlines() call would return an empty list
    errors = stderr.readlines()

    if not errors:

        print "up vip ok..."

    else:

        print errors

    ssh.close()

if __name__ == "__main__":

# SSH port of the servers

    ssh_port = 22

# the VIP to move

    vip      = '192.168.200.2'

# configure the log output format and destination with logging.basicConfig

    logging.basicConfig(level=logging.INFO,

                format=':::%(levelname)s::: \n%(message)s',

                datefmt='%a, %d %b %Y %H:%M:%S',

                filename='/var/log/redis/failover.txt',

                filemode='a')

# define a StreamHandler that prints INFO-level messages to standard error and add it to the root logger

    console = logging.StreamHandler()

    console.setLevel(logging.INFO)

    formatter = logging.Formatter('%(name)-12s: %(levelname)-8s %(message)s')

    console.setFormatter(formatter)

    logging.getLogger('').addHandler(console)

    time =  (datetime.datetime.now()).strftime("%Y-%m-%d %H:%M:%S")

    message = sys.argv[1:]

    master_name = sys.argv[1]

    role = sys.argv[2]

    stats = sys.argv[3]

    from_ip = sys.argv[4]

    from_port = sys.argv[5]

    to_ip = sys.argv[6]

    to_port = sys.argv[7]

    messages = "++++++++++++++++++++++++++"+time+" failover++++++++++++++++++++++++++"+'\n'+' '.join(message)

    subject = ''' Redis 【%s】 Failover ''' %master_name

    info = ''' %s : Redis Master %s failover %s(%s:%s) to %s(%s:%s) succeeded ! '''  %(time,master_name,from_ip,from_ip,from_port,to_ip,to_ip,to_port)

    mail_list =['zjy@dxyer.com']

    if role == 'leader':

        logging.info(messages)

        down_vip(from_ip,ssh_port)

        up_vip(to_ip,ssh_port,vip)

        send_mail(mail_list, subject.encode("utf8"), info +' and VIP do sucessed !!', "Redis_failover_report@ls.xxx.net", server="192.168.xxx.xxx")

When a failover happens, the alert email looks like this:

-- :: : Redis Master dxy failover 192.168.200.73(192.168.200.73:) to 192.168.200.208(192.168.200.208:) succeeded !  and VIP do sucessed !!

The log records the following:

:::INFO:::

++++++++++++++++++++++++++-- :: failover++++++++++++++++++++++++++

dxy leader start 192.168.200.73  192.168.200.208

:::INFO:::

Connected (version 2.0, client OpenSSH_6..1p1)

:::INFO:::

Authentication (publickey) successful!

:::INFO:::

Connected (version 2.0, client OpenSSH_6..1p1)

:::INFO:::

Authentication (publickey) successful!

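BTW: applications can connect to Redis through the VIP, which gives a degree of high availability. When the VIP moves, the service is interrupted; how long depends mainly on the detection time (down-after-milliseconds: 30 seconds by default, and it can be lowered, e.g. 5000 for 5 seconds) and on how quickly the application reconnects. Alternatively, a Sentinel-aware client such as Jedis for Java provides high availability directly: it asks Sentinel for the current master and then connects to that master.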
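The second approach, asking Sentinel for the master, boils down to the SENTINEL get-master-addr-by-name command. The sketch below speaks the Redis wire protocol directly with the standard library so that it has no dependencies. It is a simplified illustration: the function names are hypothetical, the reply parser only handles this one reply shape, and a single recv() is assumed to return the whole reply.

```python
import socket


def encode_command(*parts):
    """Encode a command as a RESP array of bulk strings."""
    chunks = [b"*%d\r\n" % len(parts)]
    for part in parts:
        data = part.encode("utf-8")
        chunks.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(chunks)


def parse_master_addr(reply):
    """Extract (ip, port) from the two-element reply; None if unavailable."""
    values = [
        item for item in reply.split(b"\r\n")
        if item and not item.startswith((b"*", b"$"))
    ]
    if len(values) != 2:
        return None  # null reply or error reply
    try:
        return values[0].decode("utf-8"), int(values[1])
    except ValueError:
        return None


def get_master_addr(sentinel_host, sentinel_port, master_name, timeout=1.0):
    """Ask one sentinel where the current master is."""
    address = (sentinel_host, sentinel_port)
    with socket.create_connection(address, timeout=timeout) as conn:
        conn.sendall(
            encode_command("SENTINEL", "get-master-addr-by-name", master_name)
        )
        return parse_master_addr(conn.recv(1024))
```

In the test environment above, `get_master_addr("192.168.200.208", 7379, "dxy")` would return the address of whichever instance is currently master; a real client would try each sentinel in turn and reconnect when the answer changes.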

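Summary:

Together with "Redis Replication and Sentinel: Setup and Principles", this article gives a working picture of how Redis Sentinel provides high availability. Sentinel is fairly simple; when the load is moderate and a single master can meet the demand, Redis Sentinel is a good choice.

References:

Redis Replication and Sentinel: Setup and Principles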
