MySQL MHA高可用

目录


MHA高可用架构介绍

MHA(Master High Availability)是一款开源的 MySQL 的高可用解决方案,基于perl语言实现,由日本DeNA公司开发。它为 MySQL 主从复制架构提供了 Automating Master Failover 功能。在MySQL故障切换过程中,MHA能做到0-30s之内自动完成数据库的故障切换操作,并且在进行故障切换过程中,MHA能最大程度保证数据库的一致性,以达到真正意义上的高可用。

MHA 高可用方案由两个部分构成:

  • MHA Manager(管理节点):Manager可以独立部署在一*立的机器上管理多个Master-Slave集群,也可以部署在一台Slave上。当Master出现故障时,它可以自动将最新数据的Slave提升为Master,然后将所有其他的Slave重新指向新的Master。整个故障切换对应用程序是完全透明的。

    Manager工具包主要包括以下几个工具:
    masterha_manger             启动mamager的脚本
    masterha_stop	            关闭manager的脚本
    masterha_check_ssh          检查MHA的SSH配置状况
    masterha_check_repl         检查MySQL主从复制状况  
    masterha_check_status       检测当前MHA运行状态 
    masterha_master_monitor     检测master是否宕机 
    masterha_master_switch      控制故障转移(自动或者手动)
    masterha_conf_host          添加或删除配置的server信息
    
  • MHA Node(数据节点):所有节点都要安装

    Node工具包主要包括以下几个工具(这些工具通常由MHA Manager的脚本触发,无需人为操作):
    save_binary_logs            保存和复制master的二进制日志 
    apply_diff_relay_logs       识别差异的中继日志事件并将其差异的事件应用于其他的Slave
    purge_relay_logs            清除中继日志(不会阻塞SQL线程)
    

MHA适用场景:目前MHA主要支持一主多从的架构,要搭建MHA,要求一个复制集群必须最少有3台数据库服务器(一主二从,一台充当Master、一台充当备用Master、一台充当从库)。出于成本考虑,淘宝在此基础上进行了改造,目前淘宝开发的TMHA已经支持一主一从。

MHA高可用架构原理

  1. MHA监控节点 (通过配置文件获取所有节点信息)
    系统、网络、SSH连接性、主从状态(重点是主库)

  2. 如果主库宕机,MHA处理过程如下:
    2.1 选主:
        (1) 如果判断从库(position或者GTID),数据有差异,最接近于Master的slave,成为备选主。
        (2) 如果判断从库(position或者GTID),数据一致,按照配置文件顺序选主。
        (3) 如果设定有权重(candidate_master=1),按照权重强制指定备选主。
             a. 默认情况下如果一个slave落后master 100M的relay logs的话,即使有权重,也会失效。
             b. 如果check_repl_delay=0的话,即使落后很多日志,也强制选择其为备选主。
    2.2 数据补偿
        (1) 当SSH能连接时,从库对比主库GTID 或者position号,立即将二进制日志保存至各个从节点并且应用(save_binary_logs脚本 )
        (2) 当SSH不能连接时,MHA无法保存二进制文件,只能进行故障转移但是丢失最新数据,可以对比从库之间的relaylog的差异(apply_diff_relay_logs脚本)
    2.3 Failover
    将故障节点踢出集群,备选主进行身份切换,并对外提供服务;其余从库和新主库确认新的主从关系。

其他功能:

  1. 应用透明:使用VIP机制实现应用透明
  2. 故障切换通知(send_report)
  3. 二次数据补偿(binlog_server)
  4. 自动修复主库并加入集群(待开发)

MHA高可用环境搭建

MHA基础架构搭建

环境规划

MySQL MHA高可用

环境准备

配置一主双从环境

  • 主库:192.168.1.5

    db01 [(none)]>show processlist ;
    +----+------+-------------------+------+------------------+------+---------------------------------------------------------------+------------------+
    | Id | User | Host              | db   | Command          | Time | State                                                         | Info             |
    +----+------+-------------------+------+------------------+------+---------------------------------------------------------------+------------------+
    |  4 | root | localhost         | NULL | Query            |    0 | starting                                                      | show processlist |
    |  5 | repl | 192.168.1.6:41668 | NULL | Binlog Dump GTID |   31 | Master has sent all binlog to slave; waiting for more updates | NULL             |
    |  6 | repl | 192.168.1.7:38244 | NULL | Binlog Dump GTID |   27 | Master has sent all binlog to slave; waiting for more updates | NULL             |
    +----+------+-------------------+------+------------------+------+---------------------------------------------------------------+------------------+
    3 rows in set (0.00 sec)
    
  • 从库:192.168.1.6

    db02 [(none)]>show slave status\G ;
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.1.5
                      Master_User: repl
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: mysql-bin.000003
              Read_Master_Log_Pos: 194
                   Relay_Log_File: slave01-relay-bin.000002
                    Relay_Log_Pos: 367
            Relay_Master_Log_File: mysql-bin.000003
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes
    
    
  • 从库:192.168.1.7

    db03 [(none)]>show slave status \G;
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.1.5
                      Master_User: repl
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: mysql-bin.000003
              Read_Master_Log_Pos: 194
                   Relay_Log_File: slave02-relay-bin.000002
                    Relay_Log_Pos: 367
            Relay_Master_Log_File: mysql-bin.000003
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes
    

配置关键程序软链接

由于MHA软件脚本中写的是命令的绝对路径/usr/bin/mysqlbinlog与/usr/bin/mysql,因此为了使用MHA,就要为mysqlbinlog和mysql可执行文件建立软链接。

[root@localhost ~]# ln -s /data/mysql/bin/mysqlbinlog    /usr/bin/mysqlbinlog
[root@localhost ~]# ln -s /data/mysql/bin/mysql          /usr/bin/mysql

配置各节点互信

  1. 各节点生成密钥

    [root@master ~]# ssh-keygen -t rsa
    [root@slave01 ~]# ssh-keygen -t rsa
    [root@slave02 ~]# ssh-keygen -t rsa
    
  2. 将各节点的公钥文件汇总到一个总的认证文件authorized_keys中

    [root@slave01 ~]# scp authorized_keys 192.168.1.5:/root/a1_keys
    [root@slave02 ~]# scp authorized_keys 192.168.1.5:/root/a2_keys
    [root@master ~]# cat /root/a1_keys >> authorized_keys 
    [root@master ~]# cat /root/a2_keys >> authorized_keys 
    
  3. 分发汇总的公钥文件

    [root@master ~]# cp authorized_keys .ssh/
    [root@master ~]# scp authorized_keys 192.168.1.6:/root/.ssh/
    [root@master ~]# scp authorized_keys 192.168.1.7:/root/.ssh/
    
  4. 测试

    [root@master ~]# ssh 192.168.1.5 date
    Mon Oct  4 20:09:48 CST 2021
    [root@master ~]# ssh 192.168.1.6 date
    Mon Oct  4 20:09:51 CST 2021
    [root@master ~]# ssh 192.168.1.7 date
    Mon Oct  4 20:09:53 CST 2021
    
    [root@slave01 ~]# ssh 192.168.1.5 date
    Mon Oct  4 20:10:13 CST 2021
    [root@slave01 ~]# ssh 192.168.1.6 date
    Mon Oct  4 20:10:16 CST 2021
    [root@slave01 ~]# ssh 192.168.1.7 date
    Mon Oct  4 20:10:18 CST 2021
    
    [root@slave02 ~]# ssh 192.168.1.5 date
    Mon Oct  4 20:10:43 CST 2021
    [root@slave02 ~]# ssh 192.168.1.6 date
    Mon Oct  4 20:10:44 CST 2021
    [root@slave02 ~]# ssh 192.168.1.7 date
    Mon Oct  4 20:10:47 CST 2021
    

安装软件

  1. 下载MHA软件

    [root@slave02 ~]# wget https://github.com/yoshinorim/mha4mysql-manager/releases/download/v0.58/mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
    [root@slave02 ~]# wget https://github.com/yoshinorim/mha4mysql-node/releases/download/v0.58/mha4mysql-node-0.58-0.el7.centos.noarch.rpm
    
  2. 各节点安装node依赖包

    [root@slave01 ~]# yum install -y perl-DBD-MySQL
    
  3. 各节点安装node软件

    [root@master ~]# rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm 
    [root@slave01 ~]# rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm 
    [root@slave02 ~]# rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm
    
  4. db03节点安装manager依赖包

    [root@slave02 ~]# yum install -y perl-Config-Tiny epel-release perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes
    
  5. db03节点安装manager软件

    [root@slave02 ~]# rpm -ivh mha4mysql-manager-0.58-0.el7.centos.noarch.rpm 
    
  6. 创建MHA专用监控管理用户(主库)

    db01 [(none)]>grant all privileges on *.* to mha@'192.168.1.%' identified by 'mha';
    

配置文件

  • 准备manager配置文件

    # 创建配置文件目录
    [root@slave02 ~]# mkdir -p /etc/mha
    # 创建日志目录
    [root@slave02 ~]# mkdir -p /var/log/mha/mysql
    # 编辑MHA配置文件
    [root@slave02 ~]# vim /etc/mha/mysql.cnf
    [server default]                         # 总配置信息
    manager_log=/var/log/mha/mysql/manager   # 日志信息
    manager_workdir=/var/log/mha/mysql       # 存放日志的目录
    master_binlog_dir=/data/binlog           # 主库二进制日志的位置
    user=mha                                 # 专用mha监控用户   
    password=mha                             # mha密码
    ping_interval=2                          # 探测节点状态的时间间隔
    repl_user=repl                           # 主从复制用户  (构建新的主从环境change master to需要主从复制用户和密码)
    repl_password=123                        # 主从复制密码
    ssh_user=root                            # ssh连接用户    
    
    [server1]                                # 节点1
    hostname=192.168.1.5
    port=3306                                  
    
    [server2]                                # 节点2
    hostname=192.168.1.6
    port=3306
    
    [server3]                                # 节点3
    hostname=192.168.1.7
    port=3306
    

    主库宕机谁来接管?

    1. 所有从节点日志一致,则默认以配置文件的顺序选择新master;
    2. 从节点日志不一致,自动选择最接近主库的从库;
    3. 若某节点设置权重(candidate_master=1),权重节点优先选择。但如果该节点日志落后主库100M的话,也不会被选择;
    4. 通过设置check_repl_delay=0,关闭日志量的检查,强制选择候选节点。

    额外参数:
    ping_interval=1:设置监控主库发送ping包的时间间隔,尝试三次没有回应的时候自动进行failover


    candidate_master=1:设为候选master,设置该参数后发送主从切换会将此库提升为主库,即使该库不是集群中事件最新的slave


    check_repl_delay=0:默认情况下如果一个slave落后master 100M的relaylog的话,MHA不会选择该slave作为一个新的master(即使设置了candidate_master=1);通过设置check_repl_delay=0,MHA触发切换在选择一个新的master 的时候会忽略复制延时


    candidate_master=1 + check_repl_delay=0可以保证设置candidate_master=1一定是新的master


    应用场景:两地三中心、Keepalived实现VIP

状态检查

  • 互信状态检查

    [root@slave02 ~]# masterha_check_ssh  --conf=/etc/mha/mysql.cnf 
    Mon Oct  4 21:02:33 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Mon Oct  4 21:02:33 2021 - [info] Reading application default configuration from /etc/mha/mysql.cnf..
    Mon Oct  4 21:02:33 2021 - [info] Reading server configuration from /etc/mha/mysql.cnf..
    Mon Oct  4 21:02:33 2021 - [info] Starting SSH connection tests..
    Mon Oct  4 21:02:40 2021 - [debug] 
    Mon Oct  4 21:02:33 2021 - [debug]  Connecting via SSH from root@192.168.1.5(192.168.1.5:22) to root@192.168.1.6(192.168.1.6:22)..
    Mon Oct  4 21:02:39 2021 - [debug]   ok.
    Mon Oct  4 21:02:39 2021 - [debug]  Connecting via SSH from root@192.168.1.5(192.168.1.5:22) to root@192.168.1.7(192.168.1.7:22)..
    Mon Oct  4 21:02:40 2021 - [debug]   ok.
    Mon Oct  4 21:02:40 2021 - [debug] 
    Mon Oct  4 21:02:33 2021 - [debug]  Connecting via SSH from root@192.168.1.6(192.168.1.6:22) to root@192.168.1.5(192.168.1.5:22)..
    Mon Oct  4 21:02:39 2021 - [debug]   ok.
    Mon Oct  4 21:02:39 2021 - [debug]  Connecting via SSH from root@192.168.1.6(192.168.1.6:22) to root@192.168.1.7(192.168.1.7:22)..
    Mon Oct  4 21:02:40 2021 - [debug]   ok.
    Mon Oct  4 21:02:40 2021 - [debug] 
    Mon Oct  4 21:02:34 2021 - [debug]  Connecting via SSH from root@192.168.1.7(192.168.1.7:22) to root@192.168.1.5(192.168.1.5:22)..
    Mon Oct  4 21:02:39 2021 - [debug]   ok.
    Mon Oct  4 21:02:39 2021 - [debug]  Connecting via SSH from root@192.168.1.7(192.168.1.7:22) to root@192.168.1.6(192.168.1.6:22)..
    Mon Oct  4 21:02:40 2021 - [debug]   ok.
    Mon Oct  4 21:02:40 2021 - [info] All SSH connection tests passed successfully.
    
  • 主从状态检查

    [root@slave02 ~]# masterha_check_repl  --conf=/etc/mha/mysql.cnf
    Mon Oct  4 21:04:31 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Mon Oct  4 21:04:31 2021 - [info] Reading application default configuration from /etc/mha/mysql.cnf..
    Mon Oct  4 21:04:31 2021 - [info] Reading server configuration from /etc/mha/mysql.cnf..
    Mon Oct  4 21:04:31 2021 - [info] MHA::MasterMonitor version 0.58.
    Mon Oct  4 21:04:34 2021 - [info] GTID failover mode = 1
    Mon Oct  4 21:04:34 2021 - [info] Dead Servers:
    Mon Oct  4 21:04:34 2021 - [info] Alive Servers:
    Mon Oct  4 21:04:34 2021 - [info]   192.168.1.5(192.168.1.5:3306)
    Mon Oct  4 21:04:34 2021 - [info]   192.168.1.6(192.168.1.6:3306)
    Mon Oct  4 21:04:34 2021 - [info]   192.168.1.7(192.168.1.7:3306)
    Mon Oct  4 21:04:34 2021 - [info] Alive Slaves:
    Mon Oct  4 21:04:34 2021 - [info]   192.168.1.6(192.168.1.6:3306)  Version=5.7.20-log (oldest major version between slaves) log-bin:enabled
    Mon Oct  4 21:04:34 2021 - [info]     GTID ON
    Mon Oct  4 21:04:34 2021 - [info]     Replicating from 192.168.1.5(192.168.1.5:3306)
    Mon Oct  4 21:04:34 2021 - [info]   192.168.1.7(192.168.1.7:3306)  Version=5.7.20-log (oldest major version between slaves) log-bin:enabled
    Mon Oct  4 21:04:34 2021 - [info]     GTID ON
    Mon Oct  4 21:04:34 2021 - [info]     Replicating from 192.168.1.5(192.168.1.5:3306)
    Mon Oct  4 21:04:34 2021 - [info] Current Alive Master: 192.168.1.5(192.168.1.5:3306)
    Mon Oct  4 21:04:34 2021 - [info] Checking slave configurations..
    Mon Oct  4 21:04:34 2021 - [info]  read_only=1 is not set on slave 192.168.1.6(192.168.1.6:3306).
    Mon Oct  4 21:04:34 2021 - [info]  read_only=1 is not set on slave 192.168.1.7(192.168.1.7:3306).
    Mon Oct  4 21:04:34 2021 - [info] Checking replication filtering settings..
    Mon Oct  4 21:04:34 2021 - [info]  binlog_do_db= , binlog_ignore_db= 
    Mon Oct  4 21:04:34 2021 - [info]  Replication filtering check ok.
    Mon Oct  4 21:04:34 2021 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
    Mon Oct  4 21:04:34 2021 - [info] Checking SSH publickey authentication settings on the current master..
    Mon Oct  4 21:04:34 2021 - [info] HealthCheck: SSH to 192.168.1.5 is reachable.
    Mon Oct  4 21:04:34 2021 - [info] 
    192.168.1.5(192.168.1.5:3306) (current master)
     +--192.168.1.6(192.168.1.6:3306)
     +--192.168.1.7(192.168.1.7:3306)
    
    Mon Oct  4 21:04:34 2021 - [info] Checking replication health on 192.168.1.6..
    Mon Oct  4 21:04:34 2021 - [info]  ok.
    Mon Oct  4 21:04:34 2021 - [info] Checking replication health on 192.168.1.7..
    Mon Oct  4 21:04:34 2021 - [info]  ok.
    Mon Oct  4 21:04:34 2021 - [warning] master_ip_failover_script is not defined.
    Mon Oct  4 21:04:34 2021 - [warning] shutdown_script is not defined.
    Mon Oct  4 21:04:34 2021 - [info] Got exit code 0 (Not master dead).
    
    MySQL Replication Health is OK.
    

开启MHA

db03开启MHA

[root@slave02 ~]# nohup masterha_manager --conf=/etc/mha/mysql.cnf --remove_dead_master_conf --ignore_last_failover  < /dev/null> /var/log/mha/mysql/manager.log 2>&1 &
[1] 3035

查看MHA

[root@slave02 ~]# masterha_check_status --conf=/etc/mha/mysql.cnf
mysql (pid:3035) is running(0:PING_OK), master:192.168.1.5
[root@slave02 ~]# mysql -umha -pmha -h 192.168.1.5 -e "show variables like 'server_id'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id     | 5     |
+---------------+-------+
[root@slave02 ~]# mysql -umha -pmha -h 192.168.1.6 -e "show variables like 'server_id'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id     | 6     |
+---------------+-------+
[root@slave02 ~]# mysql -umha -pmha -h 192.168.1.7 -e "show variables like 'server_id'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id     | 7     |
+---------------+-------

MHA环境故障模拟及修复

故障模拟

模拟master宕机

[root@master ~]# systemctl stop mysqld

查看db02主从信息

db02 [(none)]>show slave status \G 
Empty set (0.00 sec)

db02 [(none)]>show processlist;
+----+------+-------------------+------+------------------+------+---------------------------------------------------------------+------------------+
| Id | User | Host              | db   | Command          | Time | State                                                         | Info             |
+----+------+-------------------+------+------------------+------+---------------------------------------------------------------+------------------+
| 35 | root | localhost         | NULL | Query            |    1 | starting                                                      | show processlist |
| 44 | repl | 192.168.1.7:53166 | NULL | Binlog Dump GTID |  150 | Master has sent all binlog to slave; waiting for more updates | NULL             |
+----+------+-------------------+------+------------------+------+---------------------------------------------------------------+------------------+
2 rows in set (0.12 sec)

查看MHA manager主机,manager停止

[root@slave02 ~]#  masterha_check_status --conf=/etc/mha/mysql.cnf
mysql is stopped(2:NOT_RUNNING).

查看MHA manager日志信息

[root@slave02 ~]# tail -f /var/log/mha/mysql/manager
...
----- Failover Report -----

mysql: MySQL Master failover 192.168.1.5(192.168.1.5:3306) to 192.168.1.6(192.168.1.6:3306) succeeded

Master 192.168.1.5(192.168.1.5:3306) is down!

Check MHA Manager logs at slave02:/var/log/mha/mysql/manager for details.

Started automated(non-interactive) failover.
Selected 192.168.1.6(192.168.1.6:3306) as a new master.
192.168.1.6(192.168.1.6:3306): OK: Applying all logs succeeded.
192.168.1.7(192.168.1.7:3306): OK: Slave started, replicating from 192.168.1.6(192.168.1.6:3306)
192.168.1.6(192.168.1.6:3306): Resetting slave info succeeded.
Master failover to 192.168.1.6(192.168.1.6:3306) completed successfully.

查看MHA manager配置文件,可以发现server1已经没了

[root@slave02 ~]# vim /etc/mha/mysql.cnf 
[server default]
manager_log=/var/log/mha/mysql/manager
manager_workdir=/var/log/mha/mysql
master_binlog_dir=/data/binlog
password=mha
ping_interval=2
repl_password=123
repl_user=repl
ssh_user=root
user=mha

[server2]
hostname=192.168.1.6
port=3306

[server3]
hostname=192.168.1.7
port=3306

可见MHA已经自动处理了故障(但是只是一次性处理).

故障修复

  1. 修复db01

    [root@master ~]# systemctl start mysqld
    
  2. 修复主从
    查看manager日志中change master to命令

    [root@slave02 ~]# vim /var/log/mha/mysql/manager
    CHANGE MASTER TO MASTER_HOST='192.168.1.6', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='123';
    

    MySQL MHA高可用
    在db01中执行该命令,并启动slave

    db01 [(none)]>CHANGE MASTER TO MASTER_HOST='192.168.1.6', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='123';
    Query OK, 0 rows affected, 2 warnings (0.16 sec)
    
    db01 [(none)]>start slave;
    Query OK, 0 rows affected (0.14 sec)
    
  3. 修改db03上的manager配置文件,将db01加入

    [root@slave02 ~]# vim /etc/mha/mysql.cnf 
    [server1]
    hostname=192.168.1.5
    port=3306
    
  4. 启动MHA

    [root@slave02 ~]# nohup masterha_manager --conf=/etc/mha/mysql.cnf --remove_dead_master_conf --ignore_last_failover  < /dev/null> /var/log/mha/mysql/manager.log 2>&1 &
    [1] 7635
    [root@slave02 ~]#  masterha_check_status --conf=/etc/mha/mysql.cnf
    mysql (pid:7635) is running(0:PING_OK), master:192.168.1.6
    

MHA的 VIP 功能

  1. 将脚本文件放置在/usr/local/bin/目录下,并增加执行权限(master_ip_failover事先准备)

    [root@slave02 ~]# mv master_ip_failover /usr/local/bin/
    [root@slave02 ~]# chmod +x /usr/local/bin/master_ip_failover 
    

    master_ip_failover脚本如下:

    #!/usr/bin/env perl
    
    #  Copyright (C) 2011 DeNA Co.,Ltd.
    #
    #  This program is free software; you can redistribute it and/or modify
    #  it under the terms of the GNU General Public License as published by
    #  the Free Software Foundation; either version 2 of the License, or
    #  (at your option) any later version.
    #
    #  This program is distributed in the hope that it will be useful,
    #  but WITHOUT ANY WARRANTY; without even the implied warranty of
    #  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    #  GNU General Public License for more details.
    #
    #  You should have received a copy of the GNU General Public License
    #   along with this program; if not, write to the Free Software
    #  Foundation, Inc.,
    #  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
    
    ## Note: This is a sample script and is not complete. Modify the script based on your environment.
    
    use strict;
    use warnings FATAL => 'all';
    
    use Getopt::Long;
    use MHA::DBHelper;
    
    my (
      $command,        $ssh_user,         $orig_master_host,
      $orig_master_ip, $orig_master_port, $new_master_host,
      $new_master_ip,  $new_master_port,  $new_master_user,
      $new_master_password
    );
    my $vip = '192.168.1.100/24';
    my $key = "1";
    my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
    my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";
    
    GetOptions(
      'command=s'             => \$command,
      'ssh_user=s'            => \$ssh_user,
      'orig_master_host=s'    => \$orig_master_host,
      'orig_master_ip=s'      => \$orig_master_ip,
      'orig_master_port=i'    => \$orig_master_port,
      'new_master_host=s'     => \$new_master_host,
      'new_master_ip=s'       => \$new_master_ip,
      'new_master_port=i'     => \$new_master_port,
      'new_master_user=s'     => \$new_master_user,
      'new_master_password=s' => \$new_master_password,
    );
    
    exit &main();
    
    sub main {
      if ( $command eq "stop" || $command eq "stopssh" ) {
    
        # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
        # If you manage master ip address at global catalog database,
        # invalidate orig_master_ip here.
        my $exit_code = 1;
        eval {
    
          # updating global catalog, etc
          $exit_code = 0;
        };
        if ($@) {
          warn "Got Error: $@\n";
          exit $exit_code;
        }
        exit $exit_code;
      }
        elsif ( $command eq "start" ) {
    
            # all arguments are passed.
            # If you manage master ip address at global catalog database,
            # activate new_master_ip here.
            # You can also grant write access (create user, set read_only=0, etc) here.
            my $exit_code = 10;
            eval {
                print "Enabling the VIP - $vip on the new master - $new_master_host \n";
                &start_vip();
                &stop_vip();
                $exit_code = 0;
            };
            if ($@) {
                warn $@;
                exit $exit_code;
            }
            exit $exit_code;
        }
        elsif ( $command eq "status" ) {
            print "Checking the Status of the script.. OK \n";
            `ssh $ssh_user\@$orig_master_host \" $ssh_start_vip \"`;
            exit 0;
        }
        else {
            &usage();
            exit 1;
        }
    }
    
    
    sub start_vip() {
        `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
    }
    # A simple system call that disable the VIP on the old_master 
    sub stop_vip() {
       `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
    }
    
    
    sub usage {
      print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
    }
    
  2. 修改master_ip_failover配置信息

    [root@slave02 ~]# vim /usr/local/bin/master_ip_failover 
    my $vip = '192.168.1.10/24';
    my $key = "1";
    my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
    my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";
    
  3. 编辑manager配置文件

    [root@slave02 ~]# vim /etc/mha/mysql.cnf
    master_ip_failover_script=/usr/local/bin/master_ip_failover
    
  4. 在主库手工生成第一个vip地址

    [root@slave01 ~]# ifconfig ens33:1 192.168.1.10/24
    

    注意:第一次配置VIP时,需要在主库手工生成VIP,因为只有在故障切换时才会自动生成VIP。

  5. 重启MHA

    [root@slave02 ~]# masterha_stop --conf=/etc/mha/mysql.cnf
    Stopped mysql successfully.
    [1]+  Exit 1                  nohup masterha_manager --conf=/etc/mha/mysql.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/mysql/manager.log 2>&1
    [root@slave02 ~]# nohup masterha_manager --conf=/etc/mha/mysql.cnf --remove_dead_master_conf --ignore_last_failover  < /dev/null> /var/log/mha/mysql/manager.log 2>&1 &
    [1] 8498
    [root@slave02 ~]#  masterha_check_status --conf=/etc/mha/mysql.cnf
    mysql (pid:8498) is running(0:PING_OK), master:192.168.1.6
    
  6. 测试VIP功能:将此时的主库db02关闭mysqld服务模拟故障,预想结果应是db01接管master,且vip跳转到db01

    # 模拟故障,关闭db02 mysqld服务
    [root@slave01 ~]# systemctl stop mysqld
    
    # 查看db01是否有vip信息
    [root@master ~]# ifconfig
    ens33:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
            inet 192.168.1.10  netmask 255.255.255.0  broadcast 192.168.1.255
            ether 00:0c:29:df:f3:fd  txqueuelen 1000  (Ethernet)
    
    # 查看db03的从库状态
    db03 [(none)]>show slave status \G
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.1.5
                      Master_User: repl
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: mysql-bin.000006
              Read_Master_Log_Pos: 194
                   Relay_Log_File: slave02-relay-bin.000002
                    Relay_Log_Pos: 367
            Relay_Master_Log_File: mysql-bin.000006
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes
    

MHA Send_Report

MHA是一次性的高可用解决方案,因此出现了宕机需要得到及时的通知。MHA提供了report_script参数进行邮件告警。

  1. 准备邮件脚本send_report放置在/usr/local/bin/目录下,并增加执行权限
    [root@slave02 ~]# mv send_report /usr/local/bin/
    [root@slave02 ~]# chmod +x /usr/local/bin/send_report 
    
  2. 编辑manager配置文件,调用邮件脚本
    [root@slave02 ~]# vim /etc/mha/mysql.cnf
    report_script=/usr/local/bin/send_report
    
  3. 重启MHA
    [root@slave02 ~]# masterha_stop --conf=/etc/mha/mysql.cnf
    [root@slave02 ~]# nohup masterha_manager --conf=/etc/mha/mysql.cnf --remove_dead_master_conf --ignore_last_failover  < /dev/null> /var/log/mha/mysql/manager.log 2>&1 &
    
  4. 关闭主库,模拟故障,查看警告邮件
    [root@master ~]# systemctl stop mysqld
    

MHA Binlog Server

额外配置一台机器,记录主库所有二进制日志文件,防止SSH无法连接的情况。
注意:额外机器必须与原库版本一致,本测试采用db03机器。

  1. 编辑mha配置文件,增加binlog信息
    [root@slave02 ~]# vim /etc/mha/mysql.cnf
    [binlog1]
    no_master=1
    hostname=192.168.1.7
    master_binlog_dir=/binlog
    
  2. 创建必要目录并授权
    [root@slave02 ~]# mkdir /binlog
    [root@slave02 ~]# chown -R mysql.mysql /binlog/
    
  3. 拉取主库binlog日志
    [root@slave02 ~]# cd /binlog/      # 必须进入创建好的目录
    [root@slave02 ~]# mysqlbinlog  -R --host=192.168.1.5 --user=mha --password=mha --raw  --stop-never mysql-bin.000006 &
    # 注意:拉取日志的起点应该按照该目前从库已经获得到的二进制日志点为起点
    
  4. 重启MHA
    [root@slave02 ~]# masterha_stop --conf=/etc/mha/mysql.cnf
    [root@slave02 ~]# nohup masterha_manager --conf=/etc/mha/mysql.cnf --remove_dead_master_conf --ignore_last_failover  < /dev/null> /var/log/mha/mysql/manager.log 2>&1 &
    

故障处理:主库宕机,binlog server自动停止,manager也自动停止
处理思路:

  • 重新获取新主库的binlog到binlog server中
  • 重新配置manager配置文件中binlog server信息
  • 启动MHA

管理员在高可用架构维护的职责

  1. 搭建:MHA+VIP+SendReport+BinlogServer
  2. 监控及故障处理
  3. 高可用架构的优化
           优化核心:尽可能降低主从的延时,让MHA花在数据补偿上的时间尽量减少;
           尽量使用MySQL 5.7 版本,开启GTID模式,开启从库SQL并发复制。
上一篇:利联科技:扬州BGP服务器的华为三层交换机和二层交换机的区别


下一篇:MySQL —MHA高可用配置及故障切换