1.1. MMM安装规划
在主主、主从搭建正常情况下,进行如下配置。
1.1.1. 主机规划
配置环境:
name |
Host |
Write/read |
Description |
mmm_monitor |
192.168.6.100 |
Mysql-mon |
|
mysql_master1 |
192.168.6.100 |
Mysql-db1 |
|
mysql_master2 |
192.168.6.101 |
Mysql-db2 |
|
mysql_slave1 |
192.168.6.102 |
Mysql-slave1 |
|
VIP_Write |
192.168.6.51 |
VIP |
|
VIP_Read1 |
192.168.6.52 |
VIP |
|
VIP_Read2 |
VIP |
192.168.6.100和192.168.6.101是主主复制;
192.168.6.100和192.168.6.102是主从复制;
注意:监控节点最好单独放在一*立的服务器上,也可以和mysql放在一起。
1.1.2. 设置hosts解析
三台服务器配置如下(分别配置):
192.168.6.100 db1 192.168.6.101 db2 192.168.6.102 slave1 |
1.1.3. 创建mysql账号
创建账号:在上述三台主机上分别创建。
repl |
mysql>grant replication slave,replication client on *.* to ‘repl_user‘@‘%‘ identified by ‘123456‘; mysql>flush privileges; |
复制账号,用于主主复制 |
mmm_agent |
mysql>grant process,super,replication client on *.* to ‘mmm_agent‘@‘%‘ identified by ‘123456‘; mysql>flush privileges; |
代理账号,是mmm代理用来变成只读模式和同步master等的 |
mmm_monitor |
mysql>grant replication client on *.* to ‘mmm_monitor‘@‘%‘ identified by ‘123456‘; mysql>flush privileges; |
监听账号,是mmm监控服务器用来对mysql服务器做健康检查的 |
1.2. MMM依赖安装1.2.1. epel包安装
在安装mmm之前要安装epel包,Epel是企业版Linux附加软件包(Extra Packages for Enterprise Linux)的缩写,是一个由特别兴趣小组创建、维护并管理的,针对红帽企业版Linux(RHEL)及其衍生发行版(比如CentOS、Scientific Linux)的一个高质量附加软件包项目。分为32bit和64bit。
在三台服务器上分别安装:
#wget http://download.fedoraproject.or ... ease-5-4.noarch.rpm
#rpm -ivh epel-release-5-4.noarch.rpm
#yum -y install mysql-mmm*
注意区分版本。
mmm软件是基于perl的。所以,需要安装许多perl的模块。
1.2.2. MMMperl依赖包安装
安装mmm之前需要安装以下几个必须的perl模块,依赖包名称、安装方法及下载地址省略。
1.3. MMM安装
各个数据库上只需要mmm_agent进程,管理服务器上需要监控进程。
1.3.1. 安装mysql-mmm
在三台服务器上分别安装:
#wget http://download.fedoraproject.or ... ease-5-4.noarch.rpm
#rpm -ivh epel-release-5-4.noarch.rpm
#yum -y install mysql-mmm*
注意区分版本。
下载地址:
http://mysql-mmm.org/downloads/mysql-mmm-2.2.1.tar.gz
http://mysql-mmm.org/_media/:mmm2:mysql-mmm-2.2.1.pdf
安装:
[root@gflinux home]# tar -zxvf mysql-mmm-2.2.1.tar.gz
[root@gflinux home]# cd mysql-mmm-2.2.1
[root@gflinux mysql-mmm-2.2.1]#make
[root@gflinux mysql-mmm-2.2.1]# make install
1.3.2. 配置mysql-mmm监控和代理服务1.3.2.1. mmm_common.conf配置
完成安装后,所有的配置文件都放到了/etc/mysql-mmm/下面。管理服务器和数据库服务器上都要包含一个共同的文件mmm_common.conf,所有服务器配置文件内容相同,内容如下:
[root@gflinux mysql-mmm]# vi mmm_common.conf
active_master_role writer <host default> cluster_interface eth0 pid_path /var/run/mmm_agentd.pid bin_path /usr/lib/mysql-mmm/ replication_user repl_user replication_password 123456 agent_user mmm_agent agent_password 123456 </host> <host db1> ip 192.168.6.100 mode master peer db2 </host> <host db2> ip 192.168.6.101 mode master peer db1 </host> #<host db3> <host slave1> ip 192.168.6.102 mode slave </host> <role writer> hosts db1, db2 #能够作为write的服务器 ips 192.168.6.51 #write节点的虚拟ip,应用的写请求将直接连接到这个ip mode exclusive #排他模式,在这种模式下任何时候只能有一个主机拥有该角色 </role> <role reader> hosts db1, db2 ,slave1 #作为read的服务器 ips 192.168.6.52 # 192.168.0.53 #read节点的虚拟ip,应用的读请求将直接连接到这些ip mode balanced #平衡模式,该模式下可以多个主机同时拥有此角色 </role> |
1.3.2.2. mmm_agent.conf代理配置
在所有数据库服务器上配置该文件:
(1)mysql-db1上配置
[root@gflinux mysql-mmm]# vi mmm_agent.conf
include mmm_common.conf
this db1
这里db1的名称一定要与mmm_common.conf文件中的<host db1>名称相同。
(2)mysql-db1上配置
include mmm_common.conf
this db2
(3)mysql-slave1上配置
[root@gflinux203 mysql-mmm]# vi mmm_agent.conf
include mmm_common.conf
this slave1
1.3.2.3. mmm_mon.conf监控配置
在mysql-mon服务器上配置mmm_mon.conf,这个只需要在监控服务器上配置即可:
[root@gflinux203 mysql-mmm]# vi mmm_mon.conf
include mmm_common.conf <monitor> # ip 127.0.0.1 ip 192.168.6.100 pid_path /var/run/mmm_mond.pid bin_path /usr/lib/mysql-mmm/ status_path /var/lib/misc/mmm_mond.status ping_ips 192.168.6.100,192.168.6.101,192.168.6.102 #可以ping的真实代理服务器的ip auto_set_online 10 #发现节点丢失10s进行切换 </monitor> <host default> monitor_user mmm_monitor monitor_password 123456 </host> #debug 0 debug 1 |
debug 1:有利于把错误信息打印出来,对排除错误提供参考。
1.3.2.4. 可能遇到的问题
(1)监控程序服务器无法启动
在管理服务器上,一切都完成后,通过mmm_control查看状态,得到下面的错误信息:ERROR: Can‘t connect to monitor daemon! 通过编辑/etc/mysql-mmm/mmm_mon.conf文件将debug 0改为debug 1,打开监控程序的debug状态。重新启动监控程序(service mysql-mmm-monitor restart >test.txt),就会看到详细的错误信息,找不到Perl Time HiRes库。执行yum -y install perl-Time-HiRes*就可以解决。
(2)防火墙问题导致Warning: agent on host db1 is not reachable.
控制台序正确启动后,再次执行mmm_control show,却看到下面的输出:
关闭db2、db3上的防火墙,再次重启监控程序,一切回到正常状态!
1.3.3. 启动各服务器的相关服务1.3.3.1. 从各个数据库节点上启动代理程序
在数据库服务器上启动代理程序(192.169.6.100 192.168.6.101 192.169.6.102)
cd /etc/init.d/
chkconfig mysql-mmm-agent on
service mysql-mmm-agent start[|stop]
1.3.3.2. 在管理服务器上启动监控程序
在管理服务器上启动监控程序(192.168.6.100)
cd /etc/init.d/
chkconfig mysql-mmm-monitor on
service mysql-mmm-monitor start
启动后,稍等几秒,可以通过mmm_control程序查看状态:
1.3.3.3. 检查监控程序
[root@gflinux mysql-mmm]# mmm_control --help
Invalid command ‘--help‘
Valid commands are:
help - show this message
ping - ping monitor
show - show status
checks [<host>|all [<check>|all]] - show checks status
set_online <host> - set host <host> online
set_offline <host> - set host <host> offline
mode - print current mode.
set_active - switch into active mode.
set_manual - switch into manual mode.
set_passive - switch into passive mode.
move_role [--force] <role> <host> - move exclusive role <role> to host <host>
(Only use --force if you know what you are doing!)
set_ip <ip> <host> - set role with ip <ip> to host <host>
查看状态信息
[root@gflinux DBD-mysql-4.023]# mmm_control show
db1(192.168.6.100) master/ONLINE. Roles: writer(192.168.6.51)
db2(192.168.6.101) master/ONLINE. Roles:
slave1(192.168.6.102) slave/ONLINE. Roles: reader(192.168.6.52 # 192.168.0.53)
[root@gflinux DBD-mysql-4.023]# mmm_control checks all
db2 ping [last change: 2014/09/10 11:30:03] OK
db2 mysql [last change: 2014/09/10 11:49:25] OK
db2 rep_threads [last change: 2014/09/10 11:49:25] OK
db2 rep_backlog [last change: 2014/09/10 11:49:25] OK: Backlog is null
slave1 ping [last change: 2014/09/10 11:30:03] OK
slave1 mysql [last change: 2014/09/10 11:30:03] OK
slave1 rep_threads [last change: 2014/09/10 11:30:03] OK
slave1 rep_backlog [last change: 2014/09/10 11:30:03] OK: Backlog is null
db1 ping [last change: 2014/09/10 11:30:03] OK
db1 mysql [last change: 2014/09/10 11:30:03] OK
db1 rep_threads [last change: 2014/09/10 11:38:03] OK
db1 rep_backlog [last change: 2014/09/10 11:30:03] OK: Backlog is null
[root@gflinux DBD-mysql-4.023]#
其中db1负责写(192.168.6.51),slave1负责读(192.168.6.52)。这里要特别说明,这个读写分离要配合前端程序应用,也就是说你的程序要支持读写分离,把这两个vip告诉开发人员,让他们在配置文件里指定,mmm并不具有读写分离的功能,这个与官方的mysql-proxy和淘宝的Amoeba是不同的。
节点的上线和脱机:
这样不必每次重新启动mmm_control。
[root@gflinux Net-ARP-1.0.8]# mmm_control set_offline db2
OK: State of ‘db2‘ changed to ADMIN_OFFLINE. Now you can wait some time and check all roles!
[root@gflinux Net-ARP-1.0.8]# mmm_control show
db1(192.168.6.100) master/ONLINE. Roles: writer(192.168.6.51)
db2(192.168.6.101) master/ADMIN_OFFLINE. Roles:
slave1(192.168.6.102) slave/ONLINE. Roles: reader(192.168.6.52)
[root@gflinux Net-ARP-1.0.8]# mmm_control show
db1(192.168.6.100) master/ONLINE. Roles: writer(192.168.6.51)
db2(192.168.6.101) master/ADMIN_OFFLINE. Roles:
slave1(192.168.6.102) slave/ONLINE. Roles: reader(192.168.6.52)
[root@gflinux Net-ARP-1.0.8]# mmm_control set_online db2
OK: State of ‘db2‘ changed to ONLINE. Now you can wait some time and check its new roles!
[root@gflinux Net-ARP-1.0.8]# mmm_control show
db1(192.168.6.100) master/ONLINE. Roles: writer(192.168.6.51)
db2(192.168.6.101) master/ONLINE. Roles:
slave1(192.168.6.102) slave/ONLINE. Roles: reader(192.168.6.52)
1.3.3.4. 查看agent服务器上的虚拟ip
命令为:ip -f inet addr
slave1:
[root@gflinux203 mysql-mmm]# ip -f inet addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
inet 127.0.0.1/8 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
inet 192.168.6.102/24 brd 192.168.6.255 scope global eth0
注意查看不到虚拟ip
使一台主机脱机后,再观察:
[root@gflinux Net-ARP-1.0.8]# mmm_control set_offline db1
OK: State of ‘db1‘ changed to ADMIN_OFFLINE. Now you can wait some time and check all roles!
[root@gflinux203 mysql-mmm]# ip -f inet addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
inet 127.0.0.1/8 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
inet 192.168.6.102/24 brd 192.168.6.255 scope global eth0
inet 192.168.6.52/32 scope global eth0
db1:
[root@gflinux Net-ARP-1.0.8]# ip -f inet addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
inet 127.0.0.1/8 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
inet 192.168.6.100/24 brd 192.168.6.255 scope global eth0
db2:
[root@gflinux mysql-mmm]# ip -f inet addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
inet 127.0.0.1/8 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
inet 192.168.6.101/24 brd 192.168.6.255 scope global eth0
inet 192.168.6.51/32 scope global eth0
1.3.3.5. 监控项目说明
mmm_mon在每台服务器上执行四种不同的检查,以确定相关服务是否OK:
(1)ping host is pingable
(2)mysql MySQL server on the host is alive
(3)rep_threads replication threads are running
(4)rep_backlog replication backlog is not too big
1.3.4. 查看服务器日志
服务器日志存放路径:/var/log/mysql-mmm/
代理服务器日志文件(启动及切换信息):mmm_agentd.log
监控服务器日志文件(集群状态信息):mmm_mond.log
[root@gflinux mysql-mmm]# tail -f mmm_mond.log
2014/09/10 11:41:30 INFO Removing all roles from host ‘slave1‘:
2014/09/10 11:41:30 INFO Removed role ‘reader(192.168.6.52)‘ from host ‘slave1‘
2014/09/10 11:41:30 INFO Orphaned role ‘reader(192.168.6.52)‘ has been assigned to ‘db2‘
2014/09/10 11:43:32 FATAL Admin changed state of ‘slave1‘ from ADMIN_OFFLINE to ONLINE
2014/09/10 11:46:50 FATAL Admin changed state of ‘db1‘ from ONLINE to ADMIN_OFFLINE
2014/09/10 11:46:50 INFO Removing all roles from host ‘db1‘:
2014/09/10 11:46:50 INFO Removed role ‘writer(192.168.6.51)‘ from host ‘db1‘
2014/09/10 11:46:50 INFO Orphaned role ‘writer(192.168.6.51)‘ has been assigned to ‘db2‘
2014/09/10 11:46:50 INFO Moving role ‘reader(192.168.6.52)‘ from host ‘db2‘ to host ‘slave1‘
2014/09/10 11:47:04 ERROR Check ‘rep_threads‘ on ‘db1‘ has failed for 10 seconds! Message: ERROR: Replication is broken
[root@gflinux mysql-mmm]# tail -f mmm_agentd.log
2014/09/10 11:41:29 INFO We have some new roles added or old rules deleted!
2014/09/10 11:41:29 INFO Added: reader(192.168.6.52)
2014/09/10 11:41:29 FATAL Couldn‘t configure IP ‘192.168.6.52‘ on interface ‘eth0‘: undef
2014/09/10 11:46:49 INFO We have some new roles added or old rules deleted!
2014/09/10 11:46:49 INFO Deleted: reader(192.168.6.52)
2014/09/10 11:46:49 INFO Added: writer(192.168.6.51)
2014/09/10 11:46:50 FATAL Couldn‘t configure IP ‘192.168.6.51‘ on interface ‘eth0‘: undef
[root@gflinux203 mysql-mmm]# tail -f mmm_agentd.log
2014/09/10 11:42:50 INFO Changing active master to ‘db1‘
1.3.5. 模拟宕机切换测试
服务器正常状态如下:
[root@gflinux DBD-mysql-4.023]# mmm_control show
db1(192.168.6.100) master/ONLINE. Roles: writer(192.168.6.51)
db2(192.168.6.101) master/ONLINE. Roles:
slave1(192.168.6.102) slave/ONLINE. Roles: reader(192.168.6.52 # 192.168.0.53)
1.3.5.1. master宕机切换
场景一:
master宕机,切换
模拟db2同步延时,slave1同步无延时。
[root@gflinux init.d]# service mysql stop
Shutting down MySQL.. [ OK ]
vip理论上切换到slave1上,但是这里是db2上,永远不会切换到比其身份低的slave1从机上,即使他没有延时。如果此时db2再挂掉,那么整个集群全部挂掉。
[root@gflinux init.d]# mmm_control show
db1(192.168.6.100) master/HARD_OFFLINE. Roles:
db2(192.168.6.101) master/ONLINE. Roles: writer(192.168.6.51)
slave1(192.168.6.102) slave/ONLINE. Roles: reader(192.168.6.52 # 192.168.0.53)
在漂移到db2上后,必须要等到同步追完才可以切换,否则就在那里等待。
mysql> show processlist;
+----+-------------+-----------+------+-------------+------+-----------------------------------------------------------------------+-------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-----------+------+-------------+------+-----------------------------------------------------------------------+-------------------------+
| 1 | system user | | NULL | Connect | 1131 | Reconnecting after a failed master event read | NULL |
| 2 | system user | | gf | Connect | 714 | Waiting for table level lock | insert into t1 select 1 |
| 13 | repl_user | db1:44093 | NULL | Binlog Dump | 1113 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL |
| 44 | root | localhost | gf | Query | 0 | init | show processlist |
+----+-------------+-----------+------+-------------+------+-----------------------------------------------------------------------+-------------------------+
4 rows in set (0.00 sec)
这里有可能用到master_pos_wait函数,它会等待slave执行完全部的中继日之后,在释放锁。如果延时很大,想要硬切换,只能人工杀掉select master_pos_wait()这个进程id,此时slave1会自动执行db2,自动执行change master命令与db2机器进行同步复制,并且这个软件会自动记录db2上的点
[root@gflinux init.d]# mmm_control show
db1(192.168.6.100) master/HARD_OFFLINE. Roles:
db2(192.168.6.101) master/ONLINE. Roles: writer(192.168.6.51)
slave1(192.168.6.102) slave/ONLINE. Roles: reader(192.168.6.52 # 192.168.0.53)
再把db1重启起来:
[root@gflinux DBD-mysql-4.023]# mmm_control show
db1(192.168.6.100) master/AWAITING_RECOVERY. Roles:
db2(192.168.6.101) master/ONLINE. Roles: writer(192.168.6.51)
slave1(192.168.6.102) slave/ONLINE. Roles: reader(192.168.6.52 # 192.168.0.53)
此时vip还在db2上,并不会因db1修复好了再漂移过去
mysql> unlock tables;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from t1;
+------+
| id |
+------+
| 1 |
+------+
1 row in set (0.00 sec)
mysql>
1.3.5.2. slave切换
场景二:slave切换
在slave上,只要同步挂掉,有一个线程不为yes,就会切换到另两台主机上:
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
mysql> stop slave;
Query OK, 0 rows affected (0.00 sec)
[root@gflinux DBD-mysql-4.023]# mmm_control show
db1(192.168.6.100) master/ONLINE. Roles:
db2(192.168.6.101) master/ONLINE. Roles: writer(192.168.6.51)
slave1(192.168.6.102) slave/REPLICATION_FAIL. Roles:
[root@gflinux DBD-mysql-4.023]# mmm_control show
db1(192.168.6.100) master/ONLINE. Roles: reader(192.168.6.52 # 192.168.0.53)
db2(192.168.6.101) master/ONLINE. Roles: writer(192.168.6.51)
slave1(192.168.6.102) slave/REPLICATION_FAIL. Roles:
结果slave VIP漂移到db1上。
在db1上再次执行stop slave:
[root@gflinux DBD-mysql-4.023]# mmm_control show
db1(192.168.6.100) master/REPLICATION_FAIL. Roles:
db2(192.168.6.101) master/ONLINE. Roles: reader(192.168.6.52 # 192.168.0.53), writer(192.168.6.51)
slave1(192.168.6.102) slave/REPLICATION_FAIL. Roles:
结果slave VIP漂移到db2上。
再在slave上执行start slave
[root@gflinux DBD-mysql-4.023]# mmm_control show
db1(192.168.6.100) master/REPLICATION_FAIL. Roles:
db2(192.168.6.101) master/ONLINE. Roles: writer(192.168.6.51)
slave1(192.168.6.102) slave/ONLINE. Roles: reader(192.168.6.52 # 192.168.0.53)
这时vip再次漂移到slave上。
优缺点:
优点 |
安全性、稳定性高,可扩展性好,高可用,当主服务器挂掉以后,另一个主立即接管,其他的从服务器能自动切换,不用人工干预 |
缺点 |
至少三个节点,对主机的数量有要求,需要实现读写分离,对程序来说是个挑战。 |