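Installation:
This section covers building from source. The examples use Ubuntu 16.04 and Redis 3.2.0. The installation steps are:
-
Download the source package: wget http://download.redis.io/releases/redis-3.2.0.tar.gz
-
Install the dependencies: sudo apt-get install gcc tcl
-
Extract and compile: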
# tar zxvf redis-3.2.0.tar.gz
...
...
# make
...
Hint: It's a good idea to run 'make test' ;)
# make test
...
\o/ All tests passed without errors!
...
# make install
Note: the make test step may well fail with the following error:
[err]: Test replication partial resync: ok psync (diskless: yes, reconnect: 1) in tests/integration/replication-psync.tcl
Expected condition '[s -1 sync_partial_ok] > 0' to be true ([s -1 sync_partial_ok] > 0)
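The likely cause is that "on low-spec machines this test case can fail because it times out"; the environment used here is an LXC virtual machine. There are two ways to work around it:
1: Edit the test file in the extracted source directory:
# vi tests/integration/replication-psync.tcl
and change "after 100" to "after 500".
2: Run make test under taskset:
# taskset -c 1 make test
Redis is now compiled and installed.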
-
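The build directory contains two configuration files:
redis.conf and sentinel.conf. For a description of the configuration options, see this article (linked in the original post).
-
Architecture of the test environment:
Three redis instances (1 master, 2 slaves) and 3 sentinels. M: 10.0.3.110; S: 10.0.3.92 and 10.0.3.66. One sentinel instance is configured alongside each redis instance. Edit the configuration files:
redis.conf:
(listing collapsed in the original post)
sentinel.conf: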
port 16379
dir "/var/lib/sentinel_16379"
logfile "/var/log/redis/sentinel_16379.log"
daemonize yes
protected-mode no
sentinel monitor dxy 10.0.3.110 6379 2
sentinel auth-pass dxy dxydxy
sentinel down-after-milliseconds dxy 15000
sentinel failover-timeout dxy 120000
# Custom scripts executed after a failover, e.g. to send mail or switch a VIP:
#sentinel notification-script <master-name> <script-path>
#sentinel client-reconfig-script <master-name> <script-path>
The configuration files are stored under /etc/redis/; create the directories they reference. Unlike the setup in the earlier article "Redis 复制、Sentinel的搭建和原理说明" (Redis replication, Sentinel setup and principles), every redis instance here is password-protected (requirepass).
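Note: when a master requires a password, both clients and slaves must supply it when connecting. The master sets its own password with requirepass; without that password nothing can connect to it. A slave sets the password it uses to reach the master with masterauth. Clients supply the password with auth. With sentinel in use, a master can become a slave and a slave can become a master, so both options must be set on every instance. Sentinel also has to connect to the master and the slaves, so it needs the parameter: sentinel auth-pass <master_name> xxxxx.
-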
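A mismatch among these three settings is easy to miss, so it can help to compare them mechanically. The following Python 3 sketch is illustrative only: parse_conf and auth_problems are hypothetical helper names, and it assumes simple "directive value" lines without inline comments.

```python
def parse_conf(text):
    """Parse redis.conf-style text into {directive: value}; later lines win."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(" ")
        settings[key.lower()] = value.strip().strip('"')
    return settings


def auth_problems(redis_conf, sentinel_conf, master_name):
    """Return a list of problems; an empty list means the settings agree."""
    problems = []
    redis = parse_conf(redis_conf)
    requirepass = redis.get("requirepass")
    masterauth = redis.get("masterauth")
    # After a failover any instance may be master or slave, so both must match.
    if requirepass != masterauth:
        problems.append("requirepass and masterauth differ")
    # Sentinel lines look like: sentinel auth-pass <master_name> <password>
    auth_pass = None
    for line in sentinel_conf.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[:3] == ["sentinel", "auth-pass", master_name]:
            auth_pass = parts[3]
    if auth_pass != requirepass:
        problems.append("sentinel auth-pass does not match requirepass")
    return problems
```

For example, with the assumed redis.conf lines "requirepass dxydxy" and "masterauth dxydxy" plus the sentinel auth-pass line for dxy, auth_problems returns an empty list; dropping masterauth makes it report a problem.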
Create the redis user and group, and give them ownership of every directory named in the configuration files.
# useradd redis
# groupadd redis
# chown -R redis.redis redis/
# chown -R redis.redis /etc/redis/
-
Start each redis instance:
redis-server /etc/redis/redis.conf
Note: on startup the redis log reports several WARNINGs:
-
29407:M 14 Jun 14:36:42.186 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
Fix: add the line net.core.somaxconn = 1024 to /etc/sysctl.conf, then run: sysctl -p
29407:M 14 Jun 14:36:42.186 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
Fix: echo 1 > /proc/sys/vm/
29407:M 14 Jun 14:36:42.187 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
Fix: echo never > /sys/kernel/mm/transparent_hugepage/enabled
Explanation of the warnings:
(listing collapsed in the original post)
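The three kernel settings behind these warnings can be checked before Redis is started. The Python 3 sketch below is an assumption-laden illustration, not part of Redis: kernel_warnings and read_kernel_settings are hypothetical names, the default backlog of 511 is taken from the warning text, and treating any THP setting other than "[never]" as enabled is an assumption about how the check is made.

```python
def kernel_warnings(somaxconn, overcommit_memory, thp_enabled, tcp_backlog=511):
    """Return the warnings that the given kernel settings would trigger."""
    warnings = []
    if somaxconn < tcp_backlog:
        warnings.append(
            "net.core.somaxconn=%d is lower than tcp-backlog=%d"
            % (somaxconn, tcp_backlog))
    if overcommit_memory == 0:
        warnings.append("vm.overcommit_memory is 0")
    # The active THP mode is the bracketed word, e.g. "[always] madvise never".
    if "[never]" not in thp_enabled:
        warnings.append("Transparent Huge Pages are enabled")
    return warnings


def read_kernel_settings():
    """Read the three values from a Linux host (paths as named in the log)."""
    def read(path):
        with open(path) as handle:
            return handle.read().strip()
    return (int(read("/proc/sys/net/core/somaxconn")),
            int(read("/proc/sys/vm/overcommit_memory")),
            read("/sys/kernel/mm/transparent_hugepage/enabled"))
```

On a Linux host, kernel_warnings(*read_kernel_settings()) returns an empty list once all three fixes are in place.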
-
Once replication is established (slaveof), start each sentinel instance:
-
redis-sentinel /etc/redis/sentinel.conf
Note: a problem shows up here, and the culprit is the protected-mode parameter. Look at the log:
2208:X 14 Jun 23:13:09.185 * +sentinel sentinel ebf9b1b4a5cc98bffead5d0996b8f43deb806641 10.0.3.92 16379 @ dxy 10.0.3.110 6379
2208:X 14 Jun 23:13:24.234 # +sdown sentinel ebf9b1b4a5cc98bffead5d0996b8f43deb806641 10.0.3.92 16379 @ dxy 10.0.3.110 6379
2208:X 14 Jun 23:14:18.888 * +sentinel sentinel 07e189ae6c30d4951d3eb48e9effd948de026c3b 10.0.3.66 16379 @ dxy 10.0.3.110 6379
2208:X 14 Jun 23:14:33.962 # +sdown sentinel 07e189ae6c30d4951d3eb48e9effd948de026c3b 10.0.3.66 16379 @ dxy 10.0.3.110 6379
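The log shows that, apart from the local sentinel, the other two sentinels were both marked subjectively down (SDOWN), in each case about 15 seconds after being discovered (down-after-milliseconds 15000). A sentinel sends heartbeat PINGs to the master to check that it is alive; if the master does not answer with PONG within a certain period, or answers with an error, the sentinel subjectively (unilaterally) considers the master unavailable (subjectively down, SDOWN for short). The same check applies to the other sentinels it knows about. down-after-milliseconds specifies that period, in milliseconds.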
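The 15-second figure can be confirmed from the log timestamps themselves. This Python 3 sketch is only an illustration: log_time, elapsed_ms and is_sdown are hypothetical helpers, and is_sdown is a simplified reading of the rule described above (no valid reply for longer than down-after-milliseconds), not Sentinel's actual implementation.

```python
from datetime import datetime

DOWN_AFTER_MS = 15000  # sentinel down-after-milliseconds dxy 15000


def log_time(stamp):
    """Parse a sentinel log timestamp such as '14 Jun 23:13:09.185'."""
    return datetime.strptime(stamp, "%d %b %H:%M:%S.%f")


def elapsed_ms(first, second):
    """Milliseconds between two log timestamps from the same year."""
    delta = log_time(second) - log_time(first)
    return round(delta.total_seconds() * 1000)


def is_sdown(ms_since_last_valid_reply, down_after_ms=DOWN_AFTER_MS):
    """Simplified SDOWN rule: no valid reply for longer than the threshold."""
    return ms_since_last_valid_reply > down_after_ms


# +sentinel and +sdown events for 10.0.3.92 and 10.0.3.66 from the log above
gap_92 = elapsed_ms("14 Jun 23:13:09.185", "14 Jun 23:13:24.234")
gap_66 = elapsed_ms("14 Jun 23:14:18.888", "14 Jun 23:14:33.962")
print(gap_92, gap_66)  # 15049 15074
```

Both gaps are just over 15000 ms, which is consistent with the peers never answering a single PING after they were discovered.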
Judging by the timing, the sentinels could not reach one another, which led to SDOWN (see the discovery mechanism described in "Redis 复制、Sentinel的搭建和原理说明"). With no error message in the log, it took a long time to find the cause. Finally, connecting to a sentinel directly revealed it:
# redis-cli -h 10.0.3.110 -p 16379
10.0.3.110:16379> info
DENIED Redis is running in protected mode because protected mode is enabled, no bind address was specified, no authentication password is requested to clients. In this mode connections are only accepted from the loopback interface. If you want to connect from external computers to Redis you may adopt one of the following solutions: 1) Just disable protected mode sending the command 'CONFIG SET protected-mode no' from the loopback interface by connecting to Redis from the same host the server is running, however MAKE SURE Redis is not publicly accessible from internet if you do so. Use CONFIG REWRITE to make this change permanent. 2) Alternatively you can just disable the protected mode by editing the Redis configuration file, and setting the protected mode option to 'no', and then restarting the server. 3) If you started the server manually just for testing, restart it with the '--protected-mode no' option. 4) Setup a bind address or an authentication password. NOTE: You only need to do one of the above things in order for the server to start accepting connections from the outside.
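This long message boils down to the following: when redis has neither a bind address nor a password configured, protected mode is switched on, and Redis then accepts connections only from the IPv4 and IPv6 loopback addresses. External clients receive this error so that they know what went wrong; the idea is to give users a clue rather than simply refuse the connection. See the author's discussion for the details; the suggestion given there is to turn protected mode off: --protected-mode no. That explains the problem here: sentinel had neither a bind address nor a password, so protected-mode was active and connections from the other sentinels were rejected, which is why they were marked SDOWN. Add the following to sentinel.conf: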
protected-mode no
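This solved the problem. protected-mode was introduced in 3.2 and is enabled by default. The details are as follows:
(listing collapsed in the original post)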
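The rule stated in the DENIED message can be written down as a small decision function. This Python 3 sketch paraphrases that message only; accepts_connection is a hypothetical name, and comparing against exactly 127.0.0.1 and ::1 is an assumption about what counts as the loopback interface.

```python
LOOPBACK_ADDRESSES = ("127.0.0.1", "::1")  # assumed loopback addresses


def accepts_connection(client_ip, protected_mode=True,
                       bind_addresses=(), password=None):
    """Return True if a client at client_ip would be served, per the message."""
    # Protected mode only bites when it is enabled AND there is neither a
    # bind address nor a password configured.
    if not protected_mode or bind_addresses or password:
        return True
    return client_ip in LOOPBACK_ADDRESSES


# A sentinel started without bind or password rejects its remote peers:
print(accepts_connection("10.0.3.92"))                        # False
print(accepts_connection("127.0.0.1"))                        # True
print(accepts_connection("10.0.3.92", protected_mode=False))  # True
```

This matches what was observed: the local sentinel looked healthy while the two remote ones were refused until protected-mode no was set.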
-
Start sentinel and check the log (started successfully):
2253:X 14 Jun 23:48:05.477 # Sentinel ID is 68fdb1e07c0998b119e4678f7aead7742a7b1f64
2253:X 14 Jun 23:48:05.477 # +monitor master dxy 10.0.3.110 6379 quorum 2
2253:X 14 Jun 23:48:05.478 * +slave slave 10.0.3.92:6379 10.0.3.92 6379 @ dxy 10.0.3.110 6379
2253:X 14 Jun 23:48:05.512 * +slave slave 10.0.3.66:6379 10.0.3.66 6379 @ dxy 10.0.3.110 6379
2253:X 14 Jun 23:48:14.894 * +sentinel sentinel b2fb07a1cce853ddec86a993428fb09edf15b6c1 10.0.3.92 16379 @ dxy 10.0.3.110 6379
2253:X 14 Jun 23:48:23.346 * +sentinel sentinel d9b198d75ede190fc63d95af8a7ca58e1a395c9b 10.0.3.66 16379 @ dxy 10.0.3.110 6379
-
Check the status to verify that the sentinels are set up correctly (log in to any one of them):
10.0.3.92:16379> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=dxy,status=ok,address=10.0.3.110:6379,slaves=2,sentinels=3
The master0 line (status=ok, slaves=2, sentinels=3) shows that sentinel started successfully.
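The same check can be automated by parsing the master0 line of the info sentinel output. This is a minimal Python 3 sketch; parse_master_line and sentinel_healthy are hypothetical helpers, and the expected counts (2 slaves, 3 sentinels) are the ones of this test setup.

```python
def parse_master_line(line):
    """Split 'master0:name=dxy,status=ok,...' into ('master0', {field: value})."""
    label, _, fields = line.strip().partition(":")
    info = dict(field.split("=", 1) for field in fields.split(","))
    return label, info


def sentinel_healthy(line, expected_slaves, expected_sentinels):
    """True if the master is ok and all slaves and sentinels are known."""
    _, info = parse_master_line(line)
    return (info["status"] == "ok"
            and int(info["slaves"]) == expected_slaves
            and int(info["sentinels"]) == expected_sentinels)


line = "master0:name=dxy,status=ok,address=10.0.3.110:6379,slaves=2,sentinels=3"
print(sentinel_healthy(line, expected_slaves=2, expected_sentinels=3))  # True
```

With protected mode still active, each sentinel would not have registered its peers, so a sentinels count below 3 is the symptom to look for.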
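Testing:
Note: the virtual machines above cannot reach a mail server, so the tests use a different environment. New environment: Redis 2.8.4, three redis instances (1 master, 2 slaves) and 3 sentinels. M: 192.168.200.208 (port 6379); S: 192.168.200.199 and 192.168.200.73. One sentinel instance (port 7379) runs alongside each redis instance.
① Check with info: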
192.168.200.208:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.200.199,port=6379,state=online,offset=354835,lag=0
slave1:ip=192.168.200.73,port=6379,state=online,offset=354835,lag=0
master_repl_offset:354974
repl_backlog_active:1
repl_backlog_size:5242880
repl_backlog_first_byte_offset:2
repl_backlog_histlen:354973
192.168.200.208:6379>
192.168.200.208:7379> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
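② Verify failover
Kill the master and follow the switchover in the log:
(listing collapsed in the original post)
Start the old master again and check the log:
(listing collapsed in the original post)
More log details are in the previous article. Sentinel also has an option called client-reconfig-script, which is described next.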
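Failover script: for high availability, the client-reconfig-script parameter names a script that is executed when a failover happens.
Explanation of the parameter:
(listing collapsed in the original post)
Arguments passed to the script: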
<master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>
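A script receiving these seven positional arguments can unpack them into a named structure before acting on them. The Python 3 sketch below is illustrative; FailoverEvent and parse_event are hypothetical names, and the sample values are the ones that appear in the failover log of this test.

```python
from collections import namedtuple

FailoverEvent = namedtuple(
    "FailoverEvent",
    "master_name role state from_ip from_port to_ip to_port")


def parse_event(args):
    """Turn the seven script arguments (without argv[0]) into a FailoverEvent."""
    if len(args) != 7:
        raise ValueError("expected 7 arguments, got %d" % len(args))
    name, role, state, from_ip, from_port, to_ip, to_port = args
    return FailoverEvent(name, role, state,
                         from_ip, int(from_port), to_ip, int(to_port))


event = parse_event(
    "dxy leader start 192.168.200.73 6379 192.168.200.208 6379".split())
print(event.role, event.from_ip, event.to_ip)
# leader 192.168.200.73 192.168.200.208
```

In a real script the list would come from sys.argv[1:]; validating the argument count first gives a clear error instead of an IndexError.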
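The script's purpose is to send an alert e-mail after a failover and move the VIP to the new master, somewhat like MHA for MySQL. It is deliberately simple and performs no extra checks; it can be hardened for more complex situations. Implementation:
①: First set up SSH trust among the three redis servers so that they can log in to one another without a password.
Create a key pair with ssh-keygen, pressing Enter to accept every default; this produces id_rsa.pub under .ssh/:
ssh-keygen -t rsa
Copy id_rsa.pub to the other two machines and import the public key:
cat id_rsa.pub >> /root/.ssh/authorized_keys
Note: in this test the sentinel and redis instances share hosts. For the local sentinel to operate on the local redis host (bring the VIP down or up), the machine must also be able to SSH to itself, so the public key it generated with ssh-keygen must go into its own authorized_keys as well. In the end every server's authorized_keys contains all three keys (three lines).
②: The VIP must already be set on the master before the first run; that is, once redis sentinel is set up, configure the VIP on the master.
③: Obtain the required IPs from the collected log information.
④: Send the e-mail, write the log, and bring the VIP up and down remotely.
Before that, install the paramiko module: easy_install paramiko, which needs these dependencies: apt-get install python-setuptools python-dev build-essential libffi-dev libssl-dev; alternatively, just run: apt-get install python-paramiko.
The script is as follows: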
#!/usr/bin/env python
#-*-encoding:utf8-*-
#------------------------------------------------
# Name: notify.py
# Purpose: actions to run after a failover (Python 2)
# Author: zhoujy
# Created: 2016-06-17
#------------------------------------------------
import os
import sys
import time
import datetime
import smtplib
import subprocess
import fileinput
import logging
import paramiko
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from email.utils import COMMASPACE, formatdate

reload(sys)
sys.setdefaultencoding('utf8')

def send_mail(to, subject, text, from_mail, server="localhost"):
    message = MIMEMultipart()
    message['From'] = from_mail
    message['To'] = COMMASPACE.join(to)
    message['Date'] = formatdate(localtime=True)
    message['Subject'] = subject
    message.attach(MIMEText(text,_charset='utf-8'))
    smtp = smtplib.SMTP(server)
    smtp.sendmail(from_mail, to, message.as_string())
    smtp.close()

# Bring the VIP down
def down_vip(hostname,port):
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(hostname=hostname,port=port)
    stdin, stdout, stderr = ssh.exec_command("ifconfig eth0:0 down")
    # print stdout.readlines()
    # Read stderr once; a second readlines() call would return an empty list
    errors = stderr.readlines()
    if not errors : print "down vip ok..."
    else : print errors
    ssh.close()

# Bring the VIP up
def up_vip(hostname,port,vip):
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(hostname=hostname,port=port)
    stdin, stdout, stderr = ssh.exec_command("ifconfig eth0:0 %s;arping -c 3 -A %s;hash -r" %(vip,vip))
    # print stdout.readlines()
    errors = stderr.readlines()
    if not errors : print "up vip ok..."
    else : print errors
    ssh.close()

if __name__ == "__main__":
    # SSH port of the servers
    ssh_port = 22
    # The VIP
    vip = '192.168.200.2'
    # Configure the log output format and destination with logging.basicConfig
    logging.basicConfig(level=logging.INFO,
                        format=':::%(levelname)s::: \n%(message)s',
                        datefmt='%a, %d %b %Y %H:%M:%S',
                        filename='/var/log/redis/failover.txt',
                        filemode='a')
    # Add a StreamHandler that also prints INFO-level records to standard error
    console = logging.StreamHandler()
    console.setLevel(logging.INFO)
    formatter = logging.Formatter('%(name)-12s: %(levelname)-8s %(message)s')
    console.setFormatter(formatter)
    logging.getLogger('').addHandler(console)
    # Named "now" so that it does not shadow the imported time module
    now = (datetime.datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    message = sys.argv[1:]
    master_name = sys.argv[1]
    role = sys.argv[2]
    stats = sys.argv[3]
    from_ip = sys.argv[4]
    from_port = sys.argv[5]
    to_ip = sys.argv[6]
    to_port = sys.argv[7]
    messages = "++++++++++++++++++++++++++"+now+" failover++++++++++++++++++++++++++"+'\n'+' '.join(message)
    subject = ''' Redis 【%s】 Failover ''' %master_name
    info = ''' %s : Redis Master %s failover %s(%s:%s) to %s(%s:%s) succeeded ! ''' %(now,master_name,from_ip,from_ip,from_port,to_ip,to_ip,to_port)
    mail_list =['zjy@dxyer.com']
    # Only the sentinel that led the failover moves the VIP and sends the mail
    if role == 'leader':
        logging.info(messages)
        down_vip(from_ip,ssh_port)
        up_vip(to_ip,ssh_port,vip)
        send_mail(mail_list, subject.encode("utf8"), info +' and VIP do sucessed !!', "Redis_failover_report@ls.xxx.net", server="192.168.xxx.xxx")
When a failover happens, the resulting alert e-mail reads:
2016-06-17 19:06:42 : Redis Master dxy failover 192.168.200.73(192.168.200.73:6379) to 192.168.200.208(192.168.200.208:6379) succeeded ! and VIP do sucessed !!
The log records the following:
:::INFO:::
++++++++++++++++++++++++++2016-06-17 19:06:42 failover++++++++++++++++++++++++++
dxy leader start 192.168.200.73 6379 192.168.200.208 6379
:::INFO:::
Connected (version 2.0, client OpenSSH_6.6.1p1)
:::INFO:::
Authentication (publickey) successful!
:::INFO:::
Connected (version 2.0, client OpenSSH_6.6.1p1)
:::INFO:::
Authentication (publickey) successful!
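BTW: applications can reach Redis through the VIP, which gives a degree of high availability. When the VIP moves, the service is interrupted; how long it stays unavailable depends mainly on the detection time (down-after-milliseconds: 30 seconds by default, and it can be set lower, e.g. 5000 for 5 seconds) and on how long the application takes to reconnect. Alternatively, a Java application can use the Jedis client, which provides high availability directly: it gets the current master from sentinel and then connects to that master.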
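The sentinel lookup that such a client performs can be sketched with the Python 3 standard library alone. This is a hedged illustration, not the Jedis implementation: SENTINEL get-master-addr-by-name is the Sentinel command for this lookup, while encode_command, parse_master_addr and get_master_addr are hypothetical helpers, and the single recv call assumes the short reply arrives in one read.

```python
import socket


def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def parse_master_addr(reply):
    """Parse the two-element array reply into (host, port); None if unknown."""
    lines = reply.split(b"\r\n")
    if lines[0] != b"*2":
        return None
    return lines[2].decode("utf-8"), int(lines[4])


def get_master_addr(sentinel_host, sentinel_port, master_name, timeout=1.0):
    """Ask one sentinel for the current master address of master_name."""
    address = (sentinel_host, sentinel_port)
    with socket.create_connection(address, timeout=timeout) as conn:
        conn.sendall(encode_command(
            "SENTINEL", "get-master-addr-by-name", master_name))
        # Assumption: the whole (short) reply fits in a single read.
        return parse_master_addr(conn.recv(4096))
```

An application would call, for example, get_master_addr("192.168.200.208", 7379, "dxy"), try the next sentinel if the call fails, and reconnect to the returned address after each failover.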