MySQL Master High Available 实战篇

2021-07-05 03:21:21

测试背景

以下所有测试，全部基于以下复制结构完成

host_1(host_1:3306) (current master)
 +--host_2(host_2:3306)
 +--host_3(host_3:3306)

一、MHA安装

mha node 所有服务器节点都要安装
mha manager 只需要安装在manager节点

MHA NODE

1. yum install perl-DBD-MySQL
2. rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm

MHA Manager

1. yum install perl-DBD-MySQL
2. yum install perl-Config-Tiny
3. yum install perl-Log-Dispatch
4. yum install perl-Parallel-ForkManager
5. yum install perl-Time-HiRes
6. rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm

MHA rpm安装路径

* Node
Node> rpm -qpl mha4mysql-node-0.56-0.el6.noarch.rpm

/usr/bin/apply_diff_relay_logs
/usr/bin/filter_mysqlbinlog
/usr/bin/purge_relay_logs
/usr/bin/save_binary_logs
/usr/share/man/man1/apply_diff_relay_logs.1.gz
/usr/share/man/man1/filter_mysqlbinlog.1.gz
/usr/share/man/man1/purge_relay_logs.1.gz
/usr/share/man/man1/save_binary_logs.1.gz
/usr/share/perl5/vendor_perl/MHA/BinlogHeaderParser.pm
/usr/share/perl5/vendor_perl/MHA/BinlogManager.pm
/usr/share/perl5/vendor_perl/MHA/BinlogPosFindManager.pm
/usr/share/perl5/vendor_perl/MHA/BinlogPosFinder.pm
/usr/share/perl5/vendor_perl/MHA/BinlogPosFinderElp.pm
/usr/share/perl5/vendor_perl/MHA/BinlogPosFinderXid.pm
/usr/share/perl5/vendor_perl/MHA/NodeConst.pm
/usr/share/perl5/vendor_perl/MHA/NodeUtil.pm
/usr/share/perl5/vendor_perl/MHA/SlaveUtil.pm

* Manager
Manager> rpm -qpl mha4mysql-manager-0.56-0.el6.noarch.rpm

/usr/bin/masterha_check_repl
/usr/bin/masterha_check_ssh
/usr/bin/masterha_check_status
/usr/bin/masterha_conf_host
/usr/bin/masterha_manager
/usr/bin/masterha_master_monitor
/usr/bin/masterha_master_switch
/usr/bin/masterha_secondary_check
/usr/bin/masterha_stop
/usr/share/man/man1/masterha_check_repl.1.gz
/usr/share/man/man1/masterha_check_ssh.1.gz
/usr/share/man/man1/masterha_check_status.1.gz
/usr/share/man/man1/masterha_conf_host.1.gz
/usr/share/man/man1/masterha_manager.1.gz
/usr/share/man/man1/masterha_master_monitor.1.gz
/usr/share/man/man1/masterha_master_switch.1.gz
/usr/share/man/man1/masterha_secondary_check.1.gz
/usr/share/man/man1/masterha_stop.1.gz
/usr/share/perl5/vendor_perl/MHA/Config.pm
/usr/share/perl5/vendor_perl/MHA/DBHelper.pm
/usr/share/perl5/vendor_perl/MHA/FileStatus.pm
/usr/share/perl5/vendor_perl/MHA/HealthCheck.pm
/usr/share/perl5/vendor_perl/MHA/ManagerAdmin.pm
/usr/share/perl5/vendor_perl/MHA/ManagerAdminWrapper.pm
/usr/share/perl5/vendor_perl/MHA/ManagerConst.pm
/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm
/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm
/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm
/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm
/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm
/usr/share/perl5/vendor_perl/MHA/Server.pm
/usr/share/perl5/vendor_perl/MHA/ServerManager.pm

二、MHA配置文件

global scope

* /etc/masterha_default.cnf

[server default]  -- 这下面的都是全局配置，适用于所有app.cnf
user=dba
password=dba
ssh_user=root
master_binlog_dir= /data/mysql.bin
secondary_check_script= masterha_secondary_check -s remote_host1 -s remote_host2
ping_interval=3
master_ip_failover_script=/script/masterha/master_ip_failover
shutdown_script= /script/masterha/power_manager
report_script= /script/masterha/send_master_failover_mail

application scope

* /etc/app1.cnf

[server default]
remote_workdir=/var/log/masterha/app1
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/app1.log

[server default] 里面的这些配置，会影响到下面的N 个 server

local scope

* /etc/app1.cnf

[server1]
hostname=192.168.1.10
candidate_master=1

[server2]
hostname=192.168.1.11
candidate_master=1

[server3]
hostname=192.168.1.12
no_master=1

每个server下面的配置，专属于某个server

三、前提要求(必须满足)

3.1 SSH公钥认证

基本上MHA manager，MNA node，以及二次检测的节点，都需要互相信任

# masterha_check_ssh --conf=/etc/app1.cnf

  Sat May 14 14:42:19 2011 - [warn] Global configuration file /etc/masterha_default.cnf not found. Skipping.
  Sat May 14 14:42:19 2011 - [info] Reading application default configurations from /etc/app1.cnf..
  Sat May 14 14:42:19 2011 - [info] Reading server configurations from /etc/app1.cnf..
  Sat May 14 14:42:19 2011 - [info] Starting SSH connection tests..
  Sat May 14 14:42:19 2011 - [debug]  Connecting via SSH from root@host1(192.168.0.1) to root@host2(192.168.0.2)..
  Sat May 14 14:42:20 2011 - [debug]   ok.
  Sat May 14 14:42:20 2011 - [debug]  Connecting via SSH from root@host1(192.168.0.1) to root@host3(192.168.0.3)..
  Sat May 14 14:42:20 2011 - [debug]   ok.
  Sat May 14 14:42:21 2011 - [debug]  Connecting via SSH from root@host2(192.168.0.2) to root@host1(192.168.0.1)..
  Sat May 14 14:42:21 2011 - [debug]   ok.
  Sat May 14 14:42:21 2011 - [debug]  Connecting via SSH from root@host2(192.168.0.2) to root@host3(192.168.0.3)..
  Sat May 14 14:42:21 2011 - [debug]   ok.
  Sat May 14 14:42:22 2011 - [debug]  Connecting via SSH from root@host3(192.168.0.3) to root@host1(192.168.0.1)..
  Sat May 14 14:42:22 2011 - [debug]   ok.
  Sat May 14 14:42:22 2011 - [debug]  Connecting via SSH from root@host3(192.168.0.3) to root@host2(192.168.0.2)..
  Sat May 14 14:42:22 2011 - [debug]   ok.
  Sat May 14 14:42:22 2011 - [info] All SSH connection tests passed successfully.

如果slave比较多，实例比较多，最好提高下 /etc/ssh/sshd_config MaxStartups 的值（默认是10）

3.2 操作系统

仅在Linux上测试过

3.3 单写master和多slave或者只读master

打从一开始，MHA就是为了解决数据一致性而出生，所以，最好是多个slave
如果你只有一个slave，根本就碰不到数据一致性问题，也就不需要mha了
如果是一个slave，用版同步复制也能解决

从0.52开始，MHA就支持多master的复制架构了,下面列举了多master环境下注意点

多master，但是只允许单点写入
默认情况下，只支持2层复制架构

3.4 三层或者多层复制环境

默认情况下，MHA是不支持3层或多层复制架构的（Master1 -> Master2 -> Slave3）
MHA可以恢复Master2，但是不能恢复Slave3，因为Master2，Slave3有不同的master
为了让MHA支持以上架构，可以参考如下配置：

在配置文件中，只配置两层(master1 and master2)
使用 "multi_tier_slave=1" 参数，然后设置所有hosts

3.5 MySQL版本必须是5.0 或者高于 5.0

MySQL版本必须大于等于5.0
尽量使用高版本的MySQL

3.6 使用mysqlbinlog 5.1+ 支持MySQL5.1+

MHA使用mysqlbinlog来应用日志到目标slave上的
如果MySQL master设置的是row格式，那么MySQL必须是大于等于5.1版本，因为5.0不支持row
mysqlbinlog版本可以这样被检测：

[app@slave_host1]$  mysqlbinlog --version
mysqlbinlog Ver 3.3 for unknown-linux-gnu at x86_64

如果你使用的是MySQL5.1，那么mysqlbinlog必须大于等于3.3
如果mysqlbinlog的版本是3.2，而mysql的版本是5.1，那么mha manager会报错，且停止monitoring

3.7 log-bin必须在候选master上开启

如果当前slave没有设置log-bin，那么很显然它不能成为提升为new master
如果没有任何机器设置了log-bin，那么mha会报错且停止failover

3.8 binlog，relay-log 主从环境必须全部一致

复制过滤规则（binlog-do-db, replicate-ignore-db 等等）必须全部一致

3.9 复制用户必须在候选master上要存在

切换完成后，所有slave都必须执行change master 命令。在new master上复制用户必须有（REPLICATEION SLAVE权限）

3.10 使用purge_relay_logs来定期删除relay logs

默认情况下，如果SQL线程执行完relay-log，relay logs就会被自动删除。
但是这些relay-logs 也许还会用来恢复其他的slave，所以你需要关闭自动删除relay-logs的purge线程，然后自己阶段性的来删除
如果是你自己来删的话，必须考虑repl 延迟问题

最好让slave删除relay log不要在同一时间点，假如需要恢复，那么这个时间点所有relay logs都被删除了就不好了

3.11 不要在SBR的环境中使用load data infile

不管是SBR，还是RBR，最好不要使用load data

四、实战演练

[功能测试] 测试场景一、手工操作，手动完成，failover

[功能测试] 测试场景二、手工操作, 自动完成，failover

[功能测试] 测试场景三、手工操作，自动完成，online master switch

[功能测试] 测试场景四、自动监控，自动操作，自动完成，failover

[用例测试] MySQL master 服务 down掉，是否成功自动failover

[用例测试] MySQL master too many connection，无权限，响应慢，是否自动failover

[用例测试] MySQL master 服务器 down掉，且候选master落后的最多，是否自动failover，知否可以成功的做日志补偿

[用例测试] MySQL slave 服务 down掉，是否自动failover

[用例测试] MySQL 有一台slave，或者多台slave服务器延迟很大，是否自动failover

[用例测试] MySQL slave IO/SQL线程 stop，是否自动failover

[用例测试] MySQL slave IO/SQL线程报错，是否自动failover

[用例测试] MySQL master 有大事务超过100s再执行，是否可以online master switch

[用例测试] MySQL master 网络断掉，是否自动failover

[用例测试] MySQL master 网路瞬断（1~30秒），是否自动failover

[用例测试] MySQL master 和候选master 网络都挂掉的情况，是否自动failover

[用例测试] MHA manager 和 MySQL master之间的网络断掉,但是master和slave之间的网络是好的，是否自动failover

[用例测试] MHA manager 和 MySQL master,slave 网络都断掉的情况，是否自动failover

[用例测试] GTID模式下，还需要relay-log吗？是否能够成功的补齐日志

[用例测试] 多线程复制模式下，做failover 和 online master switch，会不会有问题呢？

[用例测试] 在一开始没有开启MHA的group中，如何做到日志补偿，然后change master呢？

4.0 简单的failover过程

step1：安装MHA node在所有节点上
step2：安装MHA manager
step3：创建配置文件

manager_host$ cat /etc/app1.cnf

  [server default]
  # mysql user and password
  user=root
  password=mysqlpass
  ssh_user=root
  # working directory on the manager
  manager_workdir=/var/log/masterha/app1
  # working directory on MySQL servers
  remote_workdir=/var/log/masterha/app1

  [server1]
  hostname=host1

  [server2]
  hostname=host2

  [server3]
  hostname=host3

step4: 检查SSH 互信

# masterha_check_ssh --conf=/etc/app1.cnf

step5: 检查复制配置

manager_host$ masterha_check_repl --conf=/etc/app1.cnf

step6：开启Manager

manager_host$ masterha_manager --conf=/etc/app1.cnf

step7: 检查manager的状态

manager_host$ masterha_check_status --conf=/etc/app1.cnf

step8: 测试关闭manager

manager_host$ masterha_stop --conf=/etc/app1.cnf

step9：测试master failover

host1$  killall -9 mysqld mysqld_safe

未完成的步骤(进阶)

* 二次检测： 检查master是否真的挂了，避免脑裂   secondary_network_script

* master_ip_failover: 默认的是空，什么都不做
    ip漂移
    新master赋予写
    日志保留
    报警
等等

4.2 测试场景一、手工操作，手动完成，failover

step1: 检查各种MHA manager的各种环境

* 检查SSH
    masterha_check_ssh --conf=/etc/app1.cnf

* 检查复制(如果是非GTID模式，slave如果没有设置relay_log_purge=0，那么会有警告)
    masterha_check_repl --conf=/etc/app1.cnf
        Thu Jul 28 14:03:33 2016 - [warning]  relay_log_purge=0 is not set on slave host_2(host_2:3306).
        Thu Jul 28 14:03:33 2016 - [warning]  relay_log_purge=0 is not set on slave host_3(host_3:3306).

* 检查mha状态
    masterha_check_status --conf=/etc/app1.cnf

* 开启mha 监控
    nohup masterha_manager --conf=/etc/app1.cnf &

* 关闭mha 监控
    masterha_stop --conf=/etc/app1.cnf

step2: 手工制造master mysql挂了

killall -9 mysqld mysqld_safe

step3: 人工交互式failover

masterha_master_switch --master_state=dead --conf=/etc/app1.cnf --dead_master_host=host_1 --interactive=1 --ignore_last_failover


* 切换日志如下：

--dead_master_ip=<dead_master_ip> is not set. Using host_1.
--dead_master_port=<dead_master_port> is not set. Using 3306.
Thu Jul 28 16:37:04 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Jul 28 16:37:04 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Thu Jul 28 16:37:04 2016 - [info] Reading server configuration from /etc/app1.cnf..
Thu Jul 28 16:37:04 2016 - [info] MHA::MasterFailover version 0.56.
Thu Jul 28 16:37:04 2016 - [info] Starting master failover.
Thu Jul 28 16:37:04 2016 - [info]
Thu Jul 28 16:37:04 2016 - [info] * Phase 1: Configuration Check Phase..
Thu Jul 28 16:37:04 2016 - [info]
Thu Jul 28 16:37:04 2016 - [info] GTID failover mode = 0
Thu Jul 28 16:37:04 2016 - [info] Dead Servers:
Thu Jul 28 16:37:04 2016 - [info]   host_1(host_1:3306)
Thu Jul 28 16:37:04 2016 - [info] Checking master reachability via MySQL(double check)...
Thu Jul 28 16:37:04 2016 - [info]  ok.
Thu Jul 28 16:37:04 2016 - [info] Alive Servers:
Thu Jul 28 16:37:04 2016 - [info]   host_2(host_2:3306)
Thu Jul 28 16:37:04 2016 - [info]   host_3(host_3:3306)
Thu Jul 28 16:37:04 2016 - [info] Alive Slaves:
Thu Jul 28 16:37:04 2016 - [info]   host_2(host_2:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:37:04 2016 - [info]     Replicating from host_1(host_1:3306)
Thu Jul 28 16:37:04 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 16:37:04 2016 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:37:04 2016 - [info]     Replicating from host_1(host_1:3306)
Thu Jul 28 16:37:04 2016 - [info]     Not candidate for the new Master (no_master is set)
Master host_1(host_1:3306) is dead. Proceed? (yes/NO): yes
Thu Jul 28 16:37:06 2016 - [info] Starting Non-GTID based failover.
Thu Jul 28 16:37:06 2016 - [info]
Thu Jul 28 16:37:06 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Thu Jul 28 16:37:06 2016 - [info]
Thu Jul 28 16:37:06 2016 - [info] * Phase 2: Dead Master Shutdown Phase..
Thu Jul 28 16:37:06 2016 - [info]
Thu Jul 28 16:37:06 2016 - [info] HealthCheck: SSH to host_1 is reachable.
Thu Jul 28 16:37:07 2016 - [info] Forcing shutdown so that applications never connect to the current master..
Thu Jul 28 16:37:07 2016 - [info] Executing master IP deactivation script:
Thu Jul 28 16:37:07 2016 - [info]   /home/mysql/MHA/masterha/master_ip_failover --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --command=stopssh --ssh_user=root
Thu Jul 28 16:37:07 2016 - [info]  done.
Thu Jul 28 16:37:07 2016 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Thu Jul 28 16:37:07 2016 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Thu Jul 28 16:37:07 2016 - [info]
Thu Jul 28 16:37:07 2016 - [info] * Phase 3: Master Recovery Phase..
Thu Jul 28 16:37:07 2016 - [info]
Thu Jul 28 16:37:07 2016 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Thu Jul 28 16:37:07 2016 - [info]
Thu Jul 28 16:37:07 2016 - [info] The latest binary log file/position on all slaves is host_1_name.000004:154
Thu Jul 28 16:37:07 2016 - [info] Latest slaves (Slaves that received relay log files to the latest):
Thu Jul 28 16:37:07 2016 - [info]   host_2(host_2:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:37:07 2016 - [info]     Replicating from host_1(host_1:3306)
Thu Jul 28 16:37:07 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 16:37:07 2016 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:37:07 2016 - [info]     Replicating from host_1(host_1:3306)
Thu Jul 28 16:37:07 2016 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 28 16:37:07 2016 - [info] The oldest binary log file/position on all slaves is host_1_name.000004:154
Thu Jul 28 16:37:07 2016 - [info] Oldest slaves:
Thu Jul 28 16:37:07 2016 - [info]   host_2(host_2:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:37:07 2016 - [info]     Replicating from host_1(host_1:3306)
Thu Jul 28 16:37:07 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 16:37:07 2016 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:37:07 2016 - [info]     Replicating from host_1(host_1:3306)
Thu Jul 28 16:37:07 2016 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 28 16:37:07 2016 - [info]
Thu Jul 28 16:37:07 2016 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Thu Jul 28 16:37:07 2016 - [info]
Thu Jul 28 16:37:07 2016 - [info] Fetching dead master's binary logs..
Thu Jul 28 16:37:07 2016 - [info] Executing command on the dead master host_1(host_1:3306): save_binary_logs --command=save --start_file=host_1_name.000004  --start_pos=154 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160728163704.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56
  Creating /var/log/masterha/app1 if not exists..    ok.
 Concat binary/relay logs from host_1_name.000004 pos 154 to host_1_name.000004 EOF into /var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160728163704.binlog ..
 Binlog Checksum enabled
  Dumping binlog format description event, from position 0 to 154.. ok.
  No need to dump effective binlog data from /data/mysql.bin/host_1_name.000004 (pos starts 154, filesize 154). Skipping.
 Binlog Checksum enabled
 /var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160728163704.binlog has no effective data events.
Event not exists.
Thu Jul 28 16:37:07 2016 - [info] Additional events were not found from the orig master. No need to save.
Thu Jul 28 16:37:07 2016 - [info]
Thu Jul 28 16:37:07 2016 - [info] * Phase 3.3: Determining New Master Phase..
Thu Jul 28 16:37:07 2016 - [info]
Thu Jul 28 16:37:07 2016 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Thu Jul 28 16:37:07 2016 - [info] All slaves received relay logs to the same position. No need to resync each other.
Thu Jul 28 16:37:07 2016 - [info] Searching new master from slaves..
Thu Jul 28 16:37:07 2016 - [info]  Candidate masters from the configuration file:
Thu Jul 28 16:37:07 2016 - [info]   host_2(host_2:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:37:07 2016 - [info]     Replicating from host_1(host_1:3306)
Thu Jul 28 16:37:07 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 16:37:07 2016 - [info]  Non-candidate masters:
Thu Jul 28 16:37:07 2016 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:37:07 2016 - [info]     Replicating from host_1(host_1:3306)
Thu Jul 28 16:37:07 2016 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 28 16:37:07 2016 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Thu Jul 28 16:37:07 2016 - [info] New master is host_2(host_2:3306)
Thu Jul 28 16:37:07 2016 - [info] Starting master failover..
Thu Jul 28 16:37:07 2016 - [info]
From:
host_1(host_1:3306) (current master)
 +--host_2(host_2:3306)
 +--host_3(host_3:3306)

To:
host_2(host_2:3306) (new master)
 +--host_3(host_3:3306)

Starting master switch from host_1(host_1:3306) to host_2(host_2:3306)? (yes/NO): yes
Thu Jul 28 16:37:09 2016 - [info] New master decided manually is host_2(host_2:3306)
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] * Phase 3.4: Master Log Apply Phase..
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Thu Jul 28 16:37:09 2016 - [info] Starting recovery on host_2(host_2:3306)..
Thu Jul 28 16:37:09 2016 - [info]  This server has all relay logs. Waiting all logs to be applied..
Thu Jul 28 16:37:09 2016 - [info]   done.
Thu Jul 28 16:37:09 2016 - [info]  All relay logs were successfully applied.
Thu Jul 28 16:37:09 2016 - [info] Getting new master's binlog name and position..
Thu Jul 28 16:37:09 2016 - [info]  host_2_name.000002:294
Thu Jul 28 16:37:09 2016 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host_2', MASTER_PORT=3306, MASTER_LOG_FILE='host_2_name.000002', MASTER_LOG_POS=294, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Thu Jul 28 16:37:09 2016 - [info] Executing master IP activate script:
Thu Jul 28 16:37:09 2016 - [info]   /home/mysql/MHA/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --new_master_host=host_2 --new_master_ip=host_2 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba'
Set read_only=0 on the new master.
No need to Creating app user on the new master..
Thu Jul 28 16:37:09 2016 - [info]  OK.
Thu Jul 28 16:37:09 2016 - [info] ** Finished master recovery successfully.
Thu Jul 28 16:37:09 2016 - [info] * Phase 3: Master Recovery Phase completed.
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] * Phase 4: Slaves Recovery Phase..
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] -- Slave diff file generation on host host_3(host_3:3306) started, pid: 6211. Check tmp log /var/log/masterha/app1/host_3_3306_20160728163704.log if it takes time..
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] Log messages from host_3 ...
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Thu Jul 28 16:37:09 2016 - [info] End of log messages from host_3.
Thu Jul 28 16:37:09 2016 - [info] -- host_3(host_3:3306) has the latest relay log events.
Thu Jul 28 16:37:09 2016 - [info] Generating relay diff files from the latest slave succeeded.
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Thu Jul 28 16:37:09 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] -- Slave recovery on host host_3(host_3:3306) started, pid: 6216. Check tmp log /var/log/masterha/app1/host_3_3306_20160728163704.log if it takes time..
Thu Jul 28 16:37:10 2016 - [info]
Thu Jul 28 16:37:10 2016 - [info] Log messages from host_3 ...
Thu Jul 28 16:37:10 2016 - [info]
Thu Jul 28 16:37:09 2016 - [info] Starting recovery on host_3(host_3:3306)..
Thu Jul 28 16:37:09 2016 - [info]  This server has all relay logs. Waiting all logs to be applied..
Thu Jul 28 16:37:09 2016 - [info]   done.
Thu Jul 28 16:37:09 2016 - [info]  All relay logs were successfully applied.
Thu Jul 28 16:37:09 2016 - [info]  Resetting slave host_3(host_3:3306) and starting replication from the new master host_2(host_2:3306)..
Thu Jul 28 16:37:09 2016 - [info]  Executed CHANGE MASTER.
Thu Jul 28 16:37:10 2016 - [info]  Slave started.
Thu Jul 28 16:37:10 2016 - [info] End of log messages from host_3.
Thu Jul 28 16:37:10 2016 - [info] -- Slave recovery on host host_3(host_3:3306) succeeded.
Thu Jul 28 16:37:10 2016 - [info] All new slave servers recovered successfully.
Thu Jul 28 16:37:10 2016 - [info]
Thu Jul 28 16:37:10 2016 - [info] * Phase 5: New master cleanup phase..
Thu Jul 28 16:37:10 2016 - [info]
Thu Jul 28 16:37:10 2016 - [info] Resetting slave info on the new master..
Thu Jul 28 16:37:10 2016 - [info]  host_2: Resetting slave info succeeded.
Thu Jul 28 16:37:10 2016 - [info] Master failover to host_2(host_2:3306) completed successfully.
Thu Jul 28 16:37:10 2016 - [info]

----- Failover Report -----

app1: MySQL Master failover host_1(host_1:3306) to host_2(host_2:3306) succeeded

Master host_1(host_1:3306) is down!

Check MHA Manager logs at host_manager_name for details.

Started manual(interactive) failover.
Invalidated master IP address on host_1(host_1:3306)
The latest slave host_2(host_2:3306) has all relay logs for recovery.
Selected host_2(host_2:3306) as a new master.
host_2(host_2:3306): OK: Applying all logs succeeded.
host_2(host_2:3306): OK: Activated master IP address.
host_3(host_3:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
host_3(host_3:3306): OK: Applying all logs succeeded. Slave started, replicating from host_2(host_2:3306)
host_2(host_2:3306): Resetting slave info succeeded.
Master failover to host_2(host_2:3306) completed successfully.

4.3 [功能测试] 测试场景二、手工操作, 自动完成，failover

step1: 检查各种MHA manager的各种环境

* 检查SSH
    masterha_check_ssh --conf=/etc/app1.cnf

* 检查复制(如果是非GTID模式，slave如果没有设置relay_log_purge=0，那么会有警告)
    masterha_check_repl --conf=/etc/app1.cnf
        Thu Jul 28 14:03:33 2016 - [warning]  relay_log_purge=0 is not set on slave host_2(host_2:3306).
        Thu Jul 28 14:03:33 2016 - [warning]  relay_log_purge=0 is not set on slave host_3(host_3:3306).

* 检查mha状态
    masterha_check_status --conf=/etc/app1.cnf

* 开启mha 监控
    nohup masterha_manager --conf=/etc/app1.cnf &

* 关闭mha 监控
    masterha_stop --conf=/etc/app1.cnf

step2: 手工制造master mysql挂了

killall -9 mysqld mysqld_safe

step3: 人工-非交互式 failover

masterha_master_switch --master_state=dead --conf=/etc/app1.cnf --dead_master_host=host_1 --interactive=0 --ignore_last_failover


* 切换日志如下：

--dead_master_ip=<dead_master_ip> is not set. Using host_1.
--dead_master_port=<dead_master_port> is not set. Using 3306.
Thu Jul 28 16:33:21 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Jul 28 16:33:21 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Thu Jul 28 16:33:21 2016 - [info] Reading server configuration from /etc/app1.cnf..

4.4 [功能测试] 测试场景三、手工操作，自动完成，online master switch

step1: 检查各种MHA manager的各种环境

* 检查SSH
    masterha_check_ssh --conf=/etc/app1.cnf

* 检查复制(如果是非GTID模式，slave如果没有设置relay_log_purge=0，那么会有警告)
    masterha_check_repl --conf=/etc/app1.cnf
        Thu Jul 28 14:03:33 2016 - [warning]  relay_log_purge=0 is not set on slave host_2(host_2:3306).
        Thu Jul 28 14:03:33 2016 - [warning]  relay_log_purge=0 is not set on slave host_3(host_3:3306).

* 检查mha状态
    masterha_check_status --conf=/etc/app1.cnf

* 开启mha 监控
    nohup masterha_manager --conf=/etc/app1.cnf &

step2: 手工制造master mysql挂了

killall -9 mysqld mysqld_safe

step3: 手工操作，自动完成，online master switch

masterha_master_switch --master_state=alive --conf=/etc/app1.cnf  --orig_master_is_new_slave --interactive=0

* 在online master切换过程中，如果MHA manager 开启，那么切换不会成功

Thu Jul 28 16:13:57 2016 - [info] Checking MHA is not monitoring or doing failover..
Thu Jul 28 16:13:57 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm, ln142] Getting advisory lock failed on the current master. MHA Monitor runs on the current master. Stop MHA Manager/Monitor and try again.
Thu Jul 28 16:13:57 2016 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR:  at /usr/bin/masterha_master_switch line 53

* 在online master切换过程中，如果设置--orig_master_is_new_slave, 但是没有设置repl_password，就会报错

Thu Jul 28 16:15:13 2016 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln784] Slave could not be started on host_1(host_1:3306)! Check slave status.
Thu Jul 28 16:15:13 2016 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln862] Starting slave IO/SQL thread on host_1(host_1:3306) failed!
Thu Jul 28 16:15:13 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm, ln573]  Failed!
Thu Jul 28 16:15:13 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm, ln602] Switching master to host_2(host_2:3306) done, but switching slaves partially failed.


* 切换成功的日志如下：

Thu Jul 28 16:23:25 2016 - [info] MHA::MasterRotate version 0.56.
Thu Jul 28 16:23:25 2016 - [info] Starting online master switch..
Thu Jul 28 16:23:25 2016 - [info]
Thu Jul 28 16:23:25 2016 - [info] * Phase 1: Configuration Check Phase..
Thu Jul 28 16:23:25 2016 - [info]
Thu Jul 28 16:23:25 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Jul 28 16:23:25 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Thu Jul 28 16:23:25 2016 - [info] Reading server configuration from /etc/app1.cnf..
Thu Jul 28 16:23:25 2016 - [info] GTID failover mode = 0
Thu Jul 28 16:23:25 2016 - [info] Current Alive Master: host_2(host_2:3306)
Thu Jul 28 16:23:25 2016 - [info] Alive Slaves:
Thu Jul 28 16:23:25 2016 - [info]   host_1(host_1:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:23:25 2016 - [info]     Replicating from host_2(host_2:3306)
Thu Jul 28 16:23:25 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 16:23:25 2016 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:23:25 2016 - [info]     Replicating from host_2(host_2:3306)
Thu Jul 28 16:23:25 2016 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 28 16:23:25 2016 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Thu Jul 28 16:23:25 2016 - [info]  ok.
Thu Jul 28 16:23:25 2016 - [info] Checking MHA is not monitoring or doing failover..
Thu Jul 28 16:23:25 2016 - [info] Checking replication health on host_1..
Thu Jul 28 16:23:25 2016 - [info]  ok.
Thu Jul 28 16:23:25 2016 - [info] Checking replication health on host_3..
Thu Jul 28 16:23:25 2016 - [info]  ok.
Thu Jul 28 16:23:25 2016 - [info] Searching new master from slaves..
Thu Jul 28 16:23:25 2016 - [info]  Candidate masters from the configuration file:
Thu Jul 28 16:23:25 2016 - [info]   host_1(host_1:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:23:25 2016 - [info]     Replicating from host_2(host_2:3306)
Thu Jul 28 16:23:25 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 16:23:25 2016 - [info]   host_2(host_2:3306)  Version=5.7.13-log log-bin:enabled
Thu Jul 28 16:23:25 2016 - [info]  Non-candidate masters:
Thu Jul 28 16:23:25 2016 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 16:23:25 2016 - [info]     Replicating from host_2(host_2:3306)
Thu Jul 28 16:23:25 2016 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 28 16:23:25 2016 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Thu Jul 28 16:23:25 2016 - [info]
From:
host_2(host_2:3306) (current master)
 +--host_1(host_1:3306)
 +--host_3(host_3:3306)

To:
host_1(host_1:3306) (new master)
 +--host_3(host_3:3306)
 +--host_2(host_2:3306)
Thu Jul 28 16:23:25 2016 - [info] Checking whether host_1(host_1:3306) is ok for the new master..
Thu Jul 28 16:23:25 2016 - [info]  ok.
Thu Jul 28 16:23:25 2016 - [info] host_2(host_2:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Thu Jul 28 16:23:25 2016 - [info] host_2(host_2:3306): Resetting slave pointing to the dummy host.
Thu Jul 28 16:23:25 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Thu Jul 28 16:23:25 2016 - [info]
Thu Jul 28 16:23:25 2016 - [info] * Phase 2: Rejecting updates Phase..
Thu Jul 28 16:23:25 2016 - [info]
Thu Jul 28 16:23:25 2016 - [warning] master_ip_online_change_script is not defined. Skipping disabling writes on the current master.
Thu Jul 28 16:23:25 2016 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Thu Jul 28 16:23:25 2016 - [info] Executing FLUSH TABLES WITH READ LOCK..
Thu Jul 28 16:23:25 2016 - [info]  ok.
Thu Jul 28 16:23:25 2016 - [info] Orig master binlog:pos is host_2_name.000002:294.
Thu Jul 28 16:23:25 2016 - [info]  Waiting to execute all relay logs on host_1(host_1:3306)..
Thu Jul 28 16:23:25 2016 - [info]  master_pos_wait(host_2_name.000002:294) completed on host_1(host_1:3306). Executed 0 events.
Thu Jul 28 16:23:25 2016 - [info]   done.
Thu Jul 28 16:23:25 2016 - [info] Getting new master's binlog name and position..
Thu Jul 28 16:23:25 2016 - [info]  host_1_name.000004:154
Thu Jul 28 16:23:25 2016 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host_1', MASTER_PORT=3306, MASTER_LOG_FILE='host_1_name.000004', MASTER_LOG_POS=154, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Thu Jul 28 16:23:25 2016 - [info] Setting read_only=0 on host_1(host_1:3306)..
Thu Jul 28 16:23:26 2016 - [info]  ok.
Thu Jul 28 16:23:26 2016 - [info]
Thu Jul 28 16:23:26 2016 - [info] * Switching slaves in parallel..
Thu Jul 28 16:23:26 2016 - [info]
Thu Jul 28 16:23:26 2016 - [info] -- Slave switch on host host_3(host_3:3306) started, pid: 32120
Thu Jul 28 16:23:26 2016 - [info]
Thu Jul 28 16:23:26 2016 - [info] Log messages from host_3 ...
Thu Jul 28 16:23:26 2016 - [info]
Thu Jul 28 16:23:26 2016 - [info]  Waiting to execute all relay logs on host_3(host_3:3306)..
Thu Jul 28 16:23:26 2016 - [info]  master_pos_wait(host_2_name.000002:294) completed on host_3(host_3:3306). Executed 0 events.
Thu Jul 28 16:23:26 2016 - [info]   done.
Thu Jul 28 16:23:26 2016 - [info]  Resetting slave host_3(host_3:3306) and starting replication from the new master host_1(host_1:3306)..
Thu Jul 28 16:23:26 2016 - [info]  Executed CHANGE MASTER.
Thu Jul 28 16:23:26 2016 - [info]  Slave started.
Thu Jul 28 16:23:26 2016 - [info] End of log messages from host_3 ...
Thu Jul 28 16:23:26 2016 - [info]
Thu Jul 28 16:23:26 2016 - [info] -- Slave switch on host host_3(host_3:3306) succeeded.
Thu Jul 28 16:23:26 2016 - [info] Unlocking all tables on the orig master:
Thu Jul 28 16:23:26 2016 - [info] Executing UNLOCK TABLES..
Thu Jul 28 16:23:26 2016 - [info]  ok.
Thu Jul 28 16:23:26 2016 - [info] Starting orig master as a new slave..
Thu Jul 28 16:23:26 2016 - [info]  Resetting slave host_2(host_2:3306) and starting replication from the new master host_1(host_1:3306)..
Thu Jul 28 16:23:26 2016 - [info]  Executed CHANGE MASTER.
Thu Jul 28 16:23:26 2016 - [info]  Slave started.
Thu Jul 28 16:23:26 2016 - [info] All new slave servers switched successfully.
Thu Jul 28 16:23:26 2016 - [info]
Thu Jul 28 16:23:26 2016 - [info] * Phase 5: New master cleanup phase..
Thu Jul 28 16:23:26 2016 - [info]
Thu Jul 28 16:23:26 2016 - [info]  host_1: Resetting slave info succeeded.
Thu Jul 28 16:23:26 2016 - [info] Switching master to host_1(host_1:3306) completed successfully.

4.5 测试场景四、自动监控，自动操作，自动完成，failover

step1: 检查各种MHA manager的各种环境

* 检查SSH
    masterha_check_ssh --conf=/etc/app1.cnf

* 检查复制(如果是非GTID模式，slave如果没有设置relay_log_purge=0，那么会有警告)
    masterha_check_repl --conf=/etc/app1.cnf
        Thu Jul 28 14:03:33 2016 - [warning]  relay_log_purge=0 is not set on slave host_2(host_2:3306).
        Thu Jul 28 14:03:33 2016 - [warning]  relay_log_purge=0 is not set on slave host_3(host_3:3306).

* 检查mha状态
    masterha_check_status --conf=/etc/app1.cnf

* 开启mha 监控
    nohup masterha_manager --conf=/etc/app1.cnf &

step2: 手工制造master mysql挂了

killall -9 mysqld mysqld_safe

step3: 自动监控，自动操作，自动完成，自动failover

Thu Jul 28 17:10:25 2016 - [info] MHA::MasterMonitor version 0.56.
Thu Jul 28 17:10:26 2016 - [info] GTID failover mode = 0
Thu Jul 28 17:10:26 2016 - [info] Dead Servers:
Thu Jul 28 17:10:26 2016 - [info] Alive Servers:
Thu Jul 28 17:10:26 2016 - [info]   host_1(host_1:3306)
Thu Jul 28 17:10:26 2016 - [info]   host_2(host_2:3306)
Thu Jul 28 17:10:26 2016 - [info]   host_3(host_3:3306)
Thu Jul 28 17:10:26 2016 - [info] Alive Slaves:
Thu Jul 28 17:10:26 2016 - [info]   host_2(host_2:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:10:26 2016 - [info]     Replicating from host_1(host_1:3306)
Thu Jul 28 17:10:26 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 17:10:26 2016 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:10:26 2016 - [info]     Replicating from host_1(host_1:3306)
Thu Jul 28 17:10:26 2016 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 28 17:10:26 2016 - [info] Current Alive Master: host_1(host_1:3306)
Thu Jul 28 17:10:26 2016 - [info] Checking slave configurations..
Thu Jul 28 17:10:26 2016 - [info]  read_only=1 is not set on slave host_2(host_2:3306).
Thu Jul 28 17:10:26 2016 - [warning]  relay_log_purge=0 is not set on slave host_3(host_3:3306).
Thu Jul 28 17:10:26 2016 - [info] Checking replication filtering settings..
Thu Jul 28 17:10:26 2016 - [info]  binlog_do_db= , binlog_ignore_db=
Thu Jul 28 17:10:26 2016 - [info]  Replication filtering check ok.
Thu Jul 28 17:10:26 2016 - [info] GTID (with auto-pos) is not supported
Thu Jul 28 17:10:26 2016 - [info] Starting SSH connection tests..
Thu Jul 28 17:10:27 2016 - [info] All SSH connection tests passed successfully.
Thu Jul 28 17:10:27 2016 - [info] Checking MHA Node version..
Thu Jul 28 17:10:28 2016 - [info]  Version check ok.
Thu Jul 28 17:10:28 2016 - [info] Checking SSH publickey authentication settings on the current master..
Thu Jul 28 17:10:28 2016 - [info] HealthCheck: SSH to host_1 is reachable.
Thu Jul 28 17:10:29 2016 - [info] Master MHA Node version is 0.56.
Thu Jul 28 17:10:29 2016 - [info] Checking recovery script configurations on host_1(host_1:3306)..
Thu Jul 28 17:10:29 2016 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --start_file=host_1_name.000004
Thu Jul 28 17:10:29 2016 - [info]   Connecting to root@host_1(host_1:22)..
  Creating /var/log/masterha/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /data/mysql.bin, up to host_1_name.000004
Thu Jul 28 17:10:29 2016 - [info] Binlog setting check done.
Thu Jul 28 17:10:29 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Thu Jul 28 17:10:29 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='dba' --slave_host=host_2 --slave_ip=host_2 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.7.13-log --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_2_name-relay-bin.000001  --slave_pass=xxx
Thu Jul 28 17:10:29 2016 - [info]   Connecting to root@host_2(host_2:22)..
  Checking slave recovery environment settings..
    Relay log found at /data/mysql_data, up to host_2_name-relay-bin.000002
    Temporary relay log file is /data/mysql_data/host_2_name-relay-bin.000002
    Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Thu Jul 28 17:10:29 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='dba' --slave_host=host_3 --slave_ip=host_3 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.7.13-log --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_3_name-relay-bin.000001  --slave_pass=xxx
Thu Jul 28 17:10:29 2016 - [info]   Connecting to root@host_3(host_3:22)..
  Checking slave recovery environment settings..
    Relay log found at /data/mysql_data, up to host_3_name-relay-bin.000002
    Temporary relay log file is /data/mysql_data/host_3_name-relay-bin.000002
    Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Thu Jul 28 17:10:29 2016 - [info] Slaves settings check done.
Thu Jul 28 17:10:29 2016 - [info]
host_1(host_1:3306) (current master)
 +--host_2(host_2:3306)
 +--host_3(host_3:3306)

Thu Jul 28 17:10:29 2016 - [info] Checking master_ip_failover_script status:
Thu Jul 28 17:10:29 2016 - [info]   /home/mysql/MHA/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306
Thu Jul 28 17:10:30 2016 - [info]  OK.
Thu Jul 28 17:10:30 2016 - [warning] shutdown_script is not defined.
Thu Jul 28 17:10:30 2016 - [info] Set master ping interval 3 seconds.
Thu Jul 28 17:10:30 2016 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Thu Jul 28 17:10:30 2016 - [info] Starting ping health check on host_1(host_1:3306)..
Thu Jul 28 17:10:30 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Thu Jul 28 17:11:21 2016 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Thu Jul 28 17:11:21 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Thu Jul 28 17:11:21 2016 - [info] HealthCheck: SSH to host_1 is reachable.
Thu Jul 28 17:11:24 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Thu Jul 28 17:11:24 2016 - [warning] Connection failed 2 time(s)..
Thu Jul 28 17:11:27 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Thu Jul 28 17:11:27 2016 - [warning] Connection failed 3 time(s)..
Thu Jul 28 17:11:30 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Thu Jul 28 17:11:30 2016 - [warning] Connection failed 4 time(s)..
Thu Jul 28 17:11:30 2016 - [warning] Master is not reachable from health checker!
Thu Jul 28 17:11:30 2016 - [warning] Master host_1(host_1:3306) is not reachable!
Thu Jul 28 17:11:30 2016 - [warning] SSH is reachable.
Thu Jul 28 17:11:30 2016 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status..
Thu Jul 28 17:11:30 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Jul 28 17:11:30 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Thu Jul 28 17:11:30 2016 - [info] Reading server configuration from /etc/app1.cnf..
Thu Jul 28 17:11:30 2016 - [info] GTID failover mode = 0
Thu Jul 28 17:11:30 2016 - [info] Dead Servers:
Thu Jul 28 17:11:30 2016 - [info]   host_1(host_1:3306)
Thu Jul 28 17:11:30 2016 - [info] Alive Servers:
Thu Jul 28 17:11:30 2016 - [info]   host_2(host_2:3306)
Thu Jul 28 17:11:30 2016 - [info]   host_3(host_3:3306)
Thu Jul 28 17:11:30 2016 - [info] Alive Slaves:
Thu Jul 28 17:11:30 2016 - [info]   host_2(host_2:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:11:30 2016 - [info]     Replicating from host_1(host_1:3306)
Thu Jul 28 17:11:30 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 17:11:30 2016 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:11:30 2016 - [info]     Replicating from host_1(host_1:3306)
Thu Jul 28 17:11:30 2016 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 28 17:11:30 2016 - [info] Checking slave configurations..
Thu Jul 28 17:11:30 2016 - [info]  read_only=1 is not set on slave host_2(host_2:3306).
Thu Jul 28 17:11:30 2016 - [warning]  relay_log_purge=0 is not set on slave host_3(host_3:3306).
Thu Jul 28 17:11:30 2016 - [info] Checking replication filtering settings..
Thu Jul 28 17:11:30 2016 - [info]  Replication filtering check ok.
Thu Jul 28 17:11:30 2016 - [info] Master is down!
Thu Jul 28 17:11:30 2016 - [info] Terminating monitoring script.
Thu Jul 28 17:11:30 2016 - [info] Got exit code 20 (Master dead).
Thu Jul 28 17:11:30 2016 - [info] MHA::MasterFailover version 0.56.
Thu Jul 28 17:11:30 2016 - [info] Starting master failover.
Thu Jul 28 17:11:30 2016 - [info]
Thu Jul 28 17:11:30 2016 - [info] * Phase 1: Configuration Check Phase..
Thu Jul 28 17:11:30 2016 - [info]
Thu Jul 28 17:11:30 2016 - [info] GTID failover mode = 0
Thu Jul 28 17:11:30 2016 - [info] Dead Servers:
Thu Jul 28 17:11:30 2016 - [info]   host_1(host_1:3306)
Thu Jul 28 17:11:30 2016 - [info] Checking master reachability via MySQL(double check)...
Thu Jul 28 17:11:30 2016 - [info]  ok.
Thu Jul 28 17:11:30 2016 - [info] Alive Servers:
Thu Jul 28 17:11:30 2016 - [info]   host_2(host_2:3306)
Thu Jul 28 17:11:30 2016 - [info]   host_3(host_3:3306)
Thu Jul 28 17:11:30 2016 - [info] Alive Slaves:
Thu Jul 28 17:11:30 2016 - [info]   host_2(host_2:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:11:30 2016 - [info]     Replicating from host_1(host_1:3306)
Thu Jul 28 17:11:30 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 17:11:30 2016 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:11:30 2016 - [info]     Replicating from host_1(host_1:3306)
Thu Jul 28 17:11:30 2016 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 28 17:11:30 2016 - [info] Starting Non-GTID based failover.
Thu Jul 28 17:11:30 2016 - [info]
Thu Jul 28 17:11:30 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Thu Jul 28 17:11:30 2016 - [info]
Thu Jul 28 17:11:30 2016 - [info] * Phase 2: Dead Master Shutdown Phase..
Thu Jul 28 17:11:30 2016 - [info]
Thu Jul 28 17:11:30 2016 - [info] Forcing shutdown so that applications never connect to the current master..
Thu Jul 28 17:11:30 2016 - [info] Executing master IP deactivation script:
Thu Jul 28 17:11:30 2016 - [info]   /home/mysql/MHA/masterha/master_ip_failover --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --command=stopssh --ssh_user=root
Thu Jul 28 17:11:30 2016 - [info]  done.
Thu Jul 28 17:11:30 2016 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Thu Jul 28 17:11:30 2016 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Thu Jul 28 17:11:30 2016 - [info]
Thu Jul 28 17:11:30 2016 - [info] * Phase 3: Master Recovery Phase..
Thu Jul 28 17:11:30 2016 - [info]
Thu Jul 28 17:11:30 2016 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Thu Jul 28 17:11:30 2016 - [info]
Thu Jul 28 17:11:30 2016 - [info] The latest binary log file/position on all slaves is host_1_name.000004:154
Thu Jul 28 17:11:30 2016 - [info] Latest slaves (Slaves that received relay log files to the latest):
Thu Jul 28 17:11:30 2016 - [info]   host_2(host_2:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:11:30 2016 - [info]     Replicating from host_1(host_1:3306)
Thu Jul 28 17:11:30 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 17:11:30 2016 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:11:30 2016 - [info]     Replicating from host_1(host_1:3306)
Thu Jul 28 17:11:30 2016 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 28 17:11:30 2016 - [info] The oldest binary log file/position on all slaves is host_1_name.000004:154
Thu Jul 28 17:11:30 2016 - [info] Oldest slaves:
Thu Jul 28 17:11:30 2016 - [info]   host_2(host_2:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:11:30 2016 - [info]     Replicating from host_1(host_1:3306)
Thu Jul 28 17:11:30 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 17:11:30 2016 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:11:30 2016 - [info]     Replicating from host_1(host_1:3306)
Thu Jul 28 17:11:30 2016 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 28 17:11:30 2016 - [info]
Thu Jul 28 17:11:30 2016 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Thu Jul 28 17:11:30 2016 - [info]
Thu Jul 28 17:11:30 2016 - [info] Fetching dead master's binary logs..
Thu Jul 28 17:11:30 2016 - [info] Executing command on the dead master host_1(host_1:3306): save_binary_logs --command=save --start_file=host_1_name.000004  --start_pos=154 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160728171130.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56
  Creating /var/log/masterha/app1 if not exists..    ok.
 Concat binary/relay logs from host_1_name.000004 pos 154 to host_1_name.000004 EOF into /var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160728171130.binlog ..
 Binlog Checksum enabled
  Dumping binlog format description event, from position 0 to 154.. ok.
  No need to dump effective binlog data from /data/mysql.bin/host_1_name.000004 (pos starts 154, filesize 154). Skipping.
 Binlog Checksum enabled
 /var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160728171130.binlog has no effective data events.
Event not exists.
Thu Jul 28 17:11:31 2016 - [info] Additional events were not found from the orig master. No need to save.
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] * Phase 3.3: Determining New Master Phase..
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Thu Jul 28 17:11:31 2016 - [info] All slaves received relay logs to the same position. No need to resync each other.
Thu Jul 28 17:11:31 2016 - [info] Searching new master from slaves..
Thu Jul 28 17:11:31 2016 - [info]  Candidate masters from the configuration file:
Thu Jul 28 17:11:31 2016 - [info]   host_2(host_2:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:11:31 2016 - [info]     Replicating from host_1(host_1:3306)
Thu Jul 28 17:11:31 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Jul 28 17:11:31 2016 - [info]  Non-candidate masters:
Thu Jul 28 17:11:31 2016 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 17:11:31 2016 - [info]     Replicating from host_1(host_1:3306)
Thu Jul 28 17:11:31 2016 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 28 17:11:31 2016 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Thu Jul 28 17:11:31 2016 - [info] New master is host_2(host_2:3306)
Thu Jul 28 17:11:31 2016 - [info] Starting master failover..
Thu Jul 28 17:11:31 2016 - [info]
From:
host_1(host_1:3306) (current master)
 +--host_2(host_2:3306)
 +--host_3(host_3:3306)

To:
host_2(host_2:3306) (new master)
 +--host_3(host_3:3306)
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] * Phase 3.4: Master Log Apply Phase..
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Thu Jul 28 17:11:31 2016 - [info] Starting recovery on host_2(host_2:3306)..
Thu Jul 28 17:11:31 2016 - [info]  This server has all relay logs. Waiting all logs to be applied..
Thu Jul 28 17:11:31 2016 - [info]   done.
Thu Jul 28 17:11:31 2016 - [info]  All relay logs were successfully applied.
Thu Jul 28 17:11:31 2016 - [info] Getting new master's binlog name and position..
Thu Jul 28 17:11:31 2016 - [info]  host_2_name.000002:294
Thu Jul 28 17:11:31 2016 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host_2', MASTER_PORT=3306, MASTER_LOG_FILE='host_2_name.000002', MASTER_LOG_POS=294, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Thu Jul 28 17:11:31 2016 - [info] Executing master IP activate script:
Thu Jul 28 17:11:31 2016 - [info]   /home/mysql/MHA/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --new_master_host=host_2 --new_master_ip=host_2 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba'
Set read_only=0 on the new master.
No need to Creating app user on the new master..
Thu Jul 28 17:11:31 2016 - [info]  OK.
Thu Jul 28 17:11:31 2016 - [info] ** Finished master recovery successfully.
Thu Jul 28 17:11:31 2016 - [info] * Phase 3: Master Recovery Phase completed.
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] * Phase 4: Slaves Recovery Phase..
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] -- Slave diff file generation on host host_3(host_3:3306) started, pid: 22427. Check tmp log /var/log/masterha/app1/host_3_3306_20160728171130.log if it takes time..
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] Log messages from host_3 ...
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Thu Jul 28 17:11:31 2016 - [info] End of log messages from host_3.
Thu Jul 28 17:11:31 2016 - [info] -- host_3(host_3:3306) has the latest relay log events.
Thu Jul 28 17:11:31 2016 - [info] Generating relay diff files from the latest slave succeeded.
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] -- Slave recovery on host host_3(host_3:3306) started, pid: 22429. Check tmp log /var/log/masterha/app1/host_3_3306_20160728171130.log if it takes time..
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] Log messages from host_3 ...
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] Starting recovery on host_3(host_3:3306)..
Thu Jul 28 17:11:31 2016 - [info]  This server has all relay logs. Waiting all logs to be applied..
Thu Jul 28 17:11:31 2016 - [info]   done.
Thu Jul 28 17:11:31 2016 - [info]  All relay logs were successfully applied.
Thu Jul 28 17:11:31 2016 - [info]  Resetting slave host_3(host_3:3306) and starting replication from the new master host_2(host_2:3306)..
Thu Jul 28 17:11:31 2016 - [info]  Executed CHANGE MASTER.
Thu Jul 28 17:11:31 2016 - [info]  Slave started.
Thu Jul 28 17:11:31 2016 - [info] End of log messages from host_3.
Thu Jul 28 17:11:31 2016 - [info] -- Slave recovery on host host_3(host_3:3306) succeeded.
Thu Jul 28 17:11:31 2016 - [info] All new slave servers recovered successfully.
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] * Phase 5: New master cleanup phase..
Thu Jul 28 17:11:31 2016 - [info]
Thu Jul 28 17:11:31 2016 - [info] Resetting slave info on the new master..
Thu Jul 28 17:11:31 2016 - [info]  host_2: Resetting slave info succeeded.
Thu Jul 28 17:11:31 2016 - [info] Master failover to host_2(host_2:3306) completed successfully.
Thu Jul 28 17:11:31 2016 - [info]

----- Failover Report -----

app1: MySQL Master failover host_1(host_1:3306) to host_2(host_2:3306) succeeded

Master host_1(host_1:3306) is down!

Check MHA Manager logs at host_manager_name:/var/log/masterha/app1/app1.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on host_1(host_1:3306)
The latest slave host_2(host_2:3306) has all relay logs for recovery.
Selected host_2(host_2:3306) as a new master.
host_2(host_2:3306): OK: Applying all logs succeeded.
host_2(host_2:3306): OK: Activated master IP address.
host_3(host_3:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
host_3(host_3:3306): OK: Applying all logs succeeded. Slave started, replicating from host_2(host_2:3306)
host_2(host_2:3306): Resetting slave info succeeded.
Master failover to host_2(host_2:3306) completed successfully.

4.6 [用例测试] MySQL master too many connection，无权限，响应慢

step1: 检查各种MHA manager的各种环境

* 检查SSH
    masterha_check_ssh --conf=/etc/app1.cnf

* 检查复制(如果是非GTID模式，slave如果没有设置relay_log_purge=0，那么会有警告)
    masterha_check_repl --conf=/etc/app1.cnf
        Thu Jul 28 14:03:33 2016 - [warning]  relay_log_purge=0 is not set on slave host_2(host_2:3306).
        Thu Jul 28 14:03:33 2016 - [warning]  relay_log_purge=0 is not set on slave host_3(host_3:3306).

* 检查mha状态
    masterha_check_status --conf=/etc/app1.cnf

* 开启mha 监控
    nohup masterha_manager --conf=/etc/app1.cnf &

step2: 手工制造master too many connection

mysqlslap  --concurrency=4000 --iterations=10  --query='insert into tb(id_2) select (1);'  --debug-info -hhost_1 -uxx -pxx --create-schema=lc

mysqlslap: Error when connecting to server: 1040 Too many connections
mysqlslap: Error when connecting to server: 1040 Too many connections
mysqlslap: Error when connecting to server: 1040 Too many connections

step3: 检查是否发生failover

* 1. 手动开启另一个session，链接报错tmc。 但是并没有发生failover，说明master 还活着
* 2. 检查了MHA的health check机制，原来是长链接

DIGEST_TEXT: SELECT ? AS VALUE
 COUNT_STAR: 46
 FIRST_SEEN: 2016-08-02 11:28:00
  LAST_SEEN: 2016-08-02 11:30:37

4.7 [用例测试] MySQL master 服务器 down掉，且候选master落后的最多，是否自动failover，知否可以成功的做日志补偿

step1: 检查各种MHA manager的各种环境

* 检查SSH
    masterha_check_ssh --conf=/etc/app1.cnf

* 检查复制(如果是非GTID模式，slave如果没有设置relay_log_purge=0，那么会有警告)
    masterha_check_repl --conf=/etc/app1.cnf
        Thu Jul 28 14:03:33 2016 - [warning]  relay_log_purge=0 is not set on slave host_2(host_2:3306).
        Thu Jul 28 14:03:33 2016 - [warning]  relay_log_purge=0 is not set on slave host_3(host_3:3306).

* 检查mha状态
    masterha_check_status --conf=/etc/app1.cnf

* 开启mha 监控
    nohup masterha_manager --conf=/etc/app1.cnf &

step2: 手动制造候选master落后的情况

候选master> stop slave； sleep 600s ； start slave；

step3: 手动制造master 服务器挂掉的情况

master > reboot

step4: 观察MHA 日志

Tue Aug  2 14:33:34 2016 - [warning]  Slave host_2(host_2:3306) SQL Thread delays too much. Latest log file:host_1_name.000002:576864, Current log file:host_1_name.000001:39731438. This server is not selected as a new master because recovery will take long time.
Tue Aug  2 14:33:34 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln970] None of existing slaves matches as a new master. Maybe preferred node is misconfigured or all slaves are too  far behind.
Tue Aug  2 14:33:34 2016 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR:  at /usr/bin/masterha_master_switch line 53

step5：重新修正配置文件

由于要指定特定的slave为候选master，而此slave还落后非常多
必须在每组服务器上都加上check_repl_delay=0

vi /etc/app1.cnf

[server default]
remote_workdir=/var/log/masterha/app1
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/app1.log

[server1]
hostname=host_1
candidate_master=1
check_repl_delay=0

[server2]
hostname=host_2
candidate_master=1
check_repl_delay=0

[server3]
hostname=host_3
no_master=1
check_repl_delay=0

step6：用手工failover再来模拟

masterha_master_switch --master_state=dead --conf=/etc/app1.cnf --dead_master_host=host_1 --interactive=1 --ignore_last_failover --new_master_host=host_2

step7：观察MHA 日志

Tue Aug  2 14:38:55 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Tue Aug  2 14:38:55 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Tue Aug  2 14:38:55 2016 - [info] Reading server configuration from /etc/app1.cnf..
Tue Aug  2 14:38:55 2016 - [info] MHA::MasterFailover version 0.56.
Tue Aug  2 14:38:55 2016 - [info] Starting master failover.
Tue Aug  2 14:38:55 2016 - [info]
Tue Aug  2 14:38:55 2016 - [info] * Phase 1: Configuration Check Phase..
Tue Aug  2 14:38:55 2016 - [info]
Tue Aug  2 14:38:55 2016 - [info] GTID failover mode = 0
Tue Aug  2 14:38:55 2016 - [info] Dead Servers:
Tue Aug  2 14:38:55 2016 - [info]   host_1(host_1:3306)
Tue Aug  2 14:38:55 2016 - [info] Checking master reachability via MySQL(double check)...
Tue Aug  2 14:38:55 2016 - [info]  ok.
Tue Aug  2 14:38:55 2016 - [info] Alive Servers:
Tue Aug  2 14:38:55 2016 - [info]   host_2(host_2:3306)
Tue Aug  2 14:38:55 2016 - [info]   host_3(host_3:3306)
Tue Aug  2 14:38:55 2016 - [info] Alive Slaves:
Tue Aug  2 14:38:55 2016 - [info]   host_2(host_2:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Tue Aug  2 14:38:55 2016 - [info]     Replicating from host_1(host_1:3306)
Tue Aug  2 14:38:55 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Aug  2 14:38:55 2016 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Tue Aug  2 14:38:55 2016 - [info]     Replicating from host_1(host_1:3306)
Tue Aug  2 14:38:55 2016 - [info]     Not candidate for the new Master (no_master is set)
Master host_1(host_1:3306) is dead. Proceed? (yes/NO): yes
Tue Aug  2 14:38:57 2016 - [info] Starting Non-GTID based failover.
Tue Aug  2 14:38:57 2016 - [info]
Tue Aug  2 14:38:57 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Aug  2 14:38:57 2016 - [info]
Tue Aug  2 14:38:57 2016 - [info] * Phase 2: Dead Master Shutdown Phase..
Tue Aug  2 14:38:57 2016 - [info]
Tue Aug  2 14:38:58 2016 - [info] HealthCheck: SSH to host_1 is reachable.
Tue Aug  2 14:38:58 2016 - [info] Forcing shutdown so that applications never connect to the current master..
Tue Aug  2 14:38:58 2016 - [info] Executing master IP deactivation script:
Tue Aug  2 14:38:58 2016 - [info]   /home/mysql/MHA/masterha/master_ip_failover --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --command=stopssh --ssh_user=root
Tue Aug  2 14:38:58 2016 - [info]  done.
Tue Aug  2 14:38:58 2016 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Tue Aug  2 14:38:58 2016 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Tue Aug  2 14:38:58 2016 - [info]
Tue Aug  2 14:38:58 2016 - [info] * Phase 3: Master Recovery Phase..
Tue Aug  2 14:38:58 2016 - [info]
Tue Aug  2 14:38:58 2016 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Tue Aug  2 14:38:58 2016 - [info]
Tue Aug  2 14:38:58 2016 - [info] The latest binary log file/position on all slaves is host_1_name.000002:576864
Tue Aug  2 14:38:58 2016 - [info] Latest slaves (Slaves that received relay log files to the latest):
Tue Aug  2 14:38:58 2016 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Tue Aug  2 14:38:58 2016 - [info]     Replicating from host_1(host_1:3306)
Tue Aug  2 14:38:58 2016 - [info]     Not candidate for the new Master (no_master is set)
Tue Aug  2 14:38:58 2016 - [info] The oldest binary log file/position on all slaves is host_1_name.000001:39731619
Tue Aug  2 14:38:58 2016 - [info] Oldest slaves:
Tue Aug  2 14:38:58 2016 - [info]   host_2(host_2:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Tue Aug  2 14:38:58 2016 - [info]     Replicating from host_1(host_1:3306)
Tue Aug  2 14:38:58 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Aug  2 14:38:58 2016 - [info]
Tue Aug  2 14:38:58 2016 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Tue Aug  2 14:38:58 2016 - [info]
Tue Aug  2 14:38:58 2016 - [info] Fetching dead master's binary logs..
Tue Aug  2 14:38:58 2016 - [info] Executing command on the dead master host_1(host_1:3306): save_binary_logs --command=save --start_file=host_1_name.000002  --start_pos=576864 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56
  Creating /var/log/masterha/app1 if not exists..    ok.
 Concat binary/relay logs from host_1_name.000002 pos 576864 to host_1_name.000002 EOF into /var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog ..
 Binlog Checksum enabled
  Dumping binlog format description event, from position 0 to 154.. ok.
  Dumping effective binlog data from /data/mysql.bin/host_1_name.000002 position 576864 to tail(577435).. ok.
 Binlog Checksum enabled
 Concat succeeded.
saved_master_binlog_from_host_1_3306_20160802143855.binlog                                                           100%  725     0.7KB/s   00:00
Tue Aug  2 14:38:59 2016 - [info] scp from root@host_1:/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog to local:/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog succeeded.
Tue Aug  2 14:38:59 2016 - [info] HealthCheck: SSH to host_2 is reachable.
Tue Aug  2 14:39:00 2016 - [info] HealthCheck: SSH to host_3 is reachable.
Tue Aug  2 14:39:00 2016 - [info]
Tue Aug  2 14:39:00 2016 - [info] * Phase 3.3: Determining New Master Phase..
Tue Aug  2 14:39:00 2016 - [info]
Tue Aug  2 14:39:00 2016 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Tue Aug  2 14:39:00 2016 - [info] Checking whether host_3 has relay logs from the oldest position..
Tue Aug  2 14:39:00 2016 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=host_1_name.000002 --latest_rmlp=576864 --target_mlf=host_1_name.000001 --target_rmlp=39731619 --server_id=12616606 --workdir=/var/log/masterha/app1 --timestamp=20160802143855 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_3_name-relay-bin.000004  :
    Relay log found at /data/mysql_data, up to host_3_name-relay-bin.000004
 Fast relay log position search failed. Reading relay logs to find..
Reading host_3_name-relay-bin.000004
 Binlog Checksum enabled
 Master Version is 5.7.13-log
 Binlog Checksum enabled
 host_3_name-relay-bin.000004 contains master host_1_name.000002 from position 4
Reading host_3_name-relay-bin.000003
 Binlog Checksum enabled
 host_3_name-relay-bin.000003 contains master host_1_name.000001 from position 218246073
Reading host_3_name-relay-bin.000002
 Binlog Checksum enabled
 host_3_name-relay-bin.000002 contains master host_1_name.000001 from position 154
Target relay log FOUND!
Tue Aug  2 14:39:00 2016 - [info] OK. host_3 has all relay logs.
Tue Aug  2 14:39:00 2016 - [info] Searching new master from slaves..
Tue Aug  2 14:39:00 2016 - [info]  Candidate masters from the configuration file:
Tue Aug  2 14:39:00 2016 - [info]   host_2(host_2:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Tue Aug  2 14:39:00 2016 - [info]     Replicating from host_1(host_1:3306)
Tue Aug  2 14:39:00 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Aug  2 14:39:00 2016 - [info]  Non-candidate masters:
Tue Aug  2 14:39:00 2016 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Tue Aug  2 14:39:00 2016 - [info]     Replicating from host_1(host_1:3306)
Tue Aug  2 14:39:00 2016 - [info]     Not candidate for the new Master (no_master is set)
Tue Aug  2 14:39:00 2016 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Tue Aug  2 14:39:00 2016 - [info]   Not found.
Tue Aug  2 14:39:00 2016 - [info]  Searching from all candidate_master slaves..
Tue Aug  2 14:39:00 2016 - [info] New master is host_2(host_2:3306)
Tue Aug  2 14:39:00 2016 - [info] Starting master failover..
Tue Aug  2 14:39:00 2016 - [info]
From:
host_1(host_1:3306) (current master)
 +--host_2(host_2:3306)
 +--host_3(host_3:3306)

To:
host_2(host_2:3306) (new master)
 +--host_3(host_3:3306)

Starting master switch from host_1(host_1:3306) to host_2(host_2:3306)? (yes/NO): yes
Tue Aug  2 14:39:05 2016 - [info] New master decided manually is host_2(host_2:3306)
Tue Aug  2 14:39:05 2016 - [info]
Tue Aug  2 14:39:05 2016 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Tue Aug  2 14:39:05 2016 - [info]
Tue Aug  2 14:39:05 2016 - [info] Server host_2 received relay logs up to: host_1_name.000001:39731619
Tue Aug  2 14:39:05 2016 - [info] Need to get diffs from the latest slave(host_3) up to: host_1_name.000002:576864 (using the latest slave's relay logs)
Tue Aug  2 14:39:05 2016 - [info] Connecting to the latest slave host host_3, generating diff relay log files..
Tue Aug  2 14:39:05 2016 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=host_2 --latest_mlf=host_1_name.000002 --latest_rmlp=576864 --target_mlf=host_1_name.000001 --target_rmlp=39731619 --server_id=12616606 --diff_file_readtolatest=/var/log/masterha/app1/relay_from_read_to_latest_host_2_3306_20160802143855.binlog --workdir=/var/log/masterha/app1 --timestamp=20160802143855 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_3_name-relay-bin.000004
Tue Aug  2 14:39:11 2016 - [info]
    Relay log found at /data/mysql_data, up to host_3_name-relay-bin.000004
 Fast relay log position search failed. Reading relay logs to find..
Reading host_3_name-relay-bin.000004
 Binlog Checksum enabled
 Master Version is 5.7.13-log
 Binlog Checksum enabled
 host_3_name-relay-bin.000004 contains master host_1_name.000002 from position 4
Reading host_3_name-relay-bin.000003
 Binlog Checksum enabled
 host_3_name-relay-bin.000003 contains master host_1_name.000001 from position 218246073
Reading host_3_name-relay-bin.000002
 Binlog Checksum enabled
 host_3_name-relay-bin.000002 contains master host_1_name.000001 from position 154
 Target relay log file/position found. start_file:host_3_name-relay-bin.000002, start_pos:39731788.
 Concat binary/relay logs from host_3_name-relay-bin.000002 pos 39731788 to host_3_name-relay-bin.000004 EOF into /var/log/masterha/app1/relay_from_read_to_latest_host_2_3306_20160802143855.binlog ..
 Binlog Checksum enabled
 Binlog Checksum enabled
  Dumping binlog format description event, from position 0 to 323.. ok.
  Dumping effective binlog data from /data/mysql_data/host_3_name-relay-bin.000002 position 39731788 to tail(218246252).. ok.
  Dumping binlog head events (rotate events), skipping format description events from /data/mysql_data/host_3_name-relay-bin.000003..  Binlog Checksum enabled
dumped up to pos 264. ok.
  No need to dump effective binlog data from /data/mysql_data/host_3_name-relay-bin.000003 (pos starts 264, filesize 264). Skipping.
  Dumping binlog head events (rotate events), skipping format description events from /data/mysql_data/host_3_name-relay-bin.000004..  Binlog Checksum enabled
 Binlog Checksum enabled
dumped up to pos 373. ok.
  Dumping effective binlog data from /data/mysql_data/host_3_name-relay-bin.000004 position 373 to tail(577083).. ok.
 Binlog Checksum enabled
 Binlog Checksum enabled
 Concat succeeded.
 Generating diff relay log succeeded. Saved at /var/log/masterha/app1/relay_from_read_to_latest_host_2_3306_20160802143855.binlog .
 scp host_3_name.58os.org:/var/log/masterha/app1/relay_from_read_to_latest_host_2_3306_20160802143855.binlog to root@host_2(22) succeeded.
Tue Aug  2 14:39:11 2016 - [info]  Generating diff files succeeded.
Tue Aug  2 14:39:11 2016 - [info] Sending binlog..
saved_master_binlog_from_host_1_3306_20160802143855.binlog                                                           100%  725     0.7KB/s   00:00
Tue Aug  2 14:39:12 2016 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog to root@host_2:/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog succeeded.
Tue Aug  2 14:39:12 2016 - [info]
Tue Aug  2 14:39:12 2016 - [info] * Phase 3.4: Master Log Apply Phase..
Tue Aug  2 14:39:12 2016 - [info]
Tue Aug  2 14:39:12 2016 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Tue Aug  2 14:39:12 2016 - [info] Starting recovery on host_2(host_2:3306)..
Tue Aug  2 14:39:12 2016 - [info]  Generating diffs succeeded.
Tue Aug  2 14:39:12 2016 - [info] Waiting until all relay logs are applied.
Tue Aug  2 14:39:12 2016 - [info]  done.
Tue Aug  2 14:39:12 2016 - [info] Getting slave status..
Tue Aug  2 14:39:12 2016 - [info] This slave(host_2)'s Exec_Master_Log_Pos(host_1_name.000001:39731438) does not equal to Read_Master_Log_Pos(host_1_name.000001:39731619). It is likely that relay log was cut during transaction. Need to recover from Exec_Master_Log_Pos.
Tue Aug  2 14:39:12 2016 - [info] Saving local relay logs from exec pos to read pos on host_2: from host_2_name-relay-bin.000003:39726380 to the end of the relay log..
Tue Aug  2 14:39:12 2016 - [info] Executing command : save_binary_logs --command=save --start_file=host_2_name-relay-bin.000003  --start_pos=39726380 --output_file=/var/log/masterha/app1/relay_from_exec_to_read_host_2_3306_20160802143855.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --binlog_dir=/data/mysql_data
Tue Aug  2 14:39:12 2016 - [info]
  Creating /var/log/masterha/app1 if not exists..    ok.
 Concat binary/relay logs from host_2_name-relay-bin.000003 pos 39726380 to host_2_name-relay-bin.000003 EOF into /var/log/masterha/app1/relay_from_exec_to_read_host_2_3306_20160802143855.binlog ..
 Binlog Checksum enabled
 Binlog Checksum enabled
  Dumping binlog format description event, from position 0 to 323.. ok.
  Dumping effective binlog data from /data/mysql_data/host_2_name-relay-bin.000003 position 39726380 to tail(39726561).. ok.
 Binlog Checksum enabled
 Binlog Checksum enabled
 Concat succeeded.
Tue Aug  2 14:39:12 2016 - [info] Connecting to the target slave host host_2, running recover script..
Tue Aug  2 14:39:12 2016 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='dba' --slave_host=host_2 --slave_ip=host_2  --slave_port=3306 --apply_files=/var/log/masterha/app1/relay_from_exec_to_read_host_2_3306_20160802143855.binlog,/var/log/masterha/app1/relay_from_read_to_latest_host_2_3306_20160802143855.binlog,/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog --workdir=/var/log/masterha/app1 --target_version=5.7.13-log --timestamp=20160802143855 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx
Tue Aug  2 14:42:11 2016 - [info]
 Concat all apply files to /var/log/masterha/app1/total_binlog_for_host_2_3306.20160802143855.binlog ..
 Copying the first binlog file /var/log/masterha/app1/relay_from_exec_to_read_host_2_3306_20160802143855.binlog to /var/log/masterha/app1/total_binlog_for_host_2_3306.20160802143855.binlog.. ok.
  Dumping binlog head events (rotate events), skipping format description events from /var/log/masterha/app1/relay_from_read_to_latest_host_2_3306_20160802143855.binlog..  Binlog Checksum enabled
 Binlog Checksum enabled
dumped up to pos 323. ok.
 /var/log/masterha/app1/relay_from_read_to_latest_host_2_3306_20160802143855.binlog has effective binlog events from pos 323.
  Dumping effective binlog data from /var/log/masterha/app1/relay_from_read_to_latest_host_2_3306_20160802143855.binlog position 323 to tail(179091707).. ok.
  Dumping binlog head events (rotate events), skipping format description events from /var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog..  Binlog Checksum enabled
dumped up to pos 154. ok.
 /var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog has effective binlog events from pos 154.
  Dumping effective binlog data from /var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog position 154 to tail(725).. ok.
 Concat succeeded.
All apply target binary logs are concatinated at /var/log/masterha/app1/total_binlog_for_host_2_3306.20160802143855.binlog .
MySQL client version is 5.7.13. Using --binary-mode.
Applying differential binary/relay log files /var/log/masterha/app1/relay_from_exec_to_read_host_2_3306_20160802143855.binlog,/var/log/masterha/app1/relay_from_read_to_latest_host_2_3306_20160802143855.binlog,/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog on host_2:3306. This may take long time...
Applying log files succeeded.
Tue Aug  2 14:42:11 2016 - [info]  All relay logs were successfully applied.
Tue Aug  2 14:42:11 2016 - [info] Getting new master's binlog name and position..
Tue Aug  2 14:42:11 2016 - [info]  host_2_name.000001:217647730
Tue Aug  2 14:42:11 2016 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host_2', MASTER_PORT=3306, MASTER_LOG_FILE='host_2_name.000001', MASTER_LOG_POS=217647730, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Tue Aug  2 14:42:11 2016 - [info] Executing master IP activate script:
Tue Aug  2 14:42:11 2016 - [info]   /home/mysql/MHA/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --new_master_host=host_2 --new_master_ip=host_2 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba'
Set read_only=0 on the new master.
No need to Creating app user on the new master..
Tue Aug  2 14:42:11 2016 - [info]  OK.
Tue Aug  2 14:42:11 2016 - [info] ** Finished master recovery successfully.
Tue Aug  2 14:42:11 2016 - [info] * Phase 3: Master Recovery Phase completed.
Tue Aug  2 14:42:11 2016 - [info]
Tue Aug  2 14:42:11 2016 - [info] * Phase 4: Slaves Recovery Phase..
Tue Aug  2 14:42:11 2016 - [info]
Tue Aug  2 14:42:11 2016 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Tue Aug  2 14:42:11 2016 - [info]
Tue Aug  2 14:42:11 2016 - [info] -- Slave diff file generation on host host_3(host_3:3306) started, pid: 22541. Check tmp log /var/log/masterha/app1/host_3_3306_20160802143855.log if it takes time..
Tue Aug  2 14:42:11 2016 - [info]
Tue Aug  2 14:42:11 2016 - [info] Log messages from host_3 ...
Tue Aug  2 14:42:11 2016 - [info]
Tue Aug  2 14:42:11 2016 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Tue Aug  2 14:42:11 2016 - [info] End of log messages from host_3.
Tue Aug  2 14:42:11 2016 - [info] -- host_3(host_3:3306) has the latest relay log events.
Tue Aug  2 14:42:11 2016 - [info] Generating relay diff files from the latest slave succeeded.
Tue Aug  2 14:42:11 2016 - [info]
Tue Aug  2 14:42:11 2016 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Tue Aug  2 14:42:11 2016 - [info]
Tue Aug  2 14:42:11 2016 - [info] -- Slave recovery on host host_3(host_3:3306) started, pid: 22543. Check tmp log /var/log/masterha/app1/host_3_3306_20160802143855.log if it takes time..
saved_master_binlog_from_host_1_3306_20160802143855.binlog                                                           100%  725     0.7KB/s   00:00
Tue Aug  2 14:42:12 2016 - [info]
Tue Aug  2 14:42:12 2016 - [info] Log messages from host_3 ...
Tue Aug  2 14:42:12 2016 - [info]
Tue Aug  2 14:42:11 2016 - [info] Sending binlog..
Tue Aug  2 14:42:12 2016 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog to root@host_3:/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog succeeded.
Tue Aug  2 14:42:12 2016 - [info] Starting recovery on host_3(host_3:3306)..
Tue Aug  2 14:42:12 2016 - [info]  Generating diffs succeeded.
Tue Aug  2 14:42:12 2016 - [info] Waiting until all relay logs are applied.
Tue Aug  2 14:42:12 2016 - [info]  done.
Tue Aug  2 14:42:12 2016 - [info] Getting slave status..
Tue Aug  2 14:42:12 2016 - [info] This slave(host_3)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(host_1_name.000002:576864). No need to recover from Exec_Master_Log_Pos.
Tue Aug  2 14:42:12 2016 - [info] Connecting to the target slave host host_3, running recover script..
Tue Aug  2 14:42:12 2016 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='dba' --slave_host=host_3 --slave_ip=host_3  --slave_port=3306 --apply_files=/var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog --workdir=/var/log/masterha/app1 --target_version=5.7.13-log --timestamp=20160802143855 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx
Tue Aug  2 14:42:12 2016 - [info]
MySQL client version is 5.7.13. Using --binary-mode.
Applying differential binary/relay log files /var/log/masterha/app1/saved_master_binlog_from_host_1_3306_20160802143855.binlog on host_3:3306. This may take long time...
Applying log files succeeded.
Tue Aug  2 14:42:12 2016 - [info]  All relay logs were successfully applied.
Tue Aug  2 14:42:12 2016 - [info]  Resetting slave host_3(host_3:3306) and starting replication from the new master host_2(host_2:3306)..
Tue Aug  2 14:42:12 2016 - [info]  Executed CHANGE MASTER.
Tue Aug  2 14:42:12 2016 - [info]  Slave started.
Tue Aug  2 14:42:12 2016 - [info] End of log messages from host_3.
Tue Aug  2 14:42:12 2016 - [info] -- Slave recovery on host host_3(host_3:3306) succeeded.
Tue Aug  2 14:42:12 2016 - [info] All new slave servers recovered successfully.
Tue Aug  2 14:42:12 2016 - [info]
Tue Aug  2 14:42:12 2016 - [info] * Phase 5: New master cleanup phase..
Tue Aug  2 14:42:12 2016 - [info]
Tue Aug  2 14:42:12 2016 - [info] Resetting slave info on the new master..
Tue Aug  2 14:42:12 2016 - [info]  host_2: Resetting slave info succeeded.
Tue Aug  2 14:42:12 2016 - [info] Master failover to host_2(host_2:3306) completed successfully.
Tue Aug  2 14:42:12 2016 - [info]

----- Failover Report -----

app1: MySQL Master failover host_1(host_1:3306) to host_2(host_2:3306) succeeded

Master host_1(host_1:3306) is down!

Check MHA Manager logs at host_manager_name for details.

Started manual(interactive) failover.
Invalidated master IP address on host_1(host_1:3306)
The latest slave host_3(host_3:3306) has all relay logs for recovery.
Selected host_2(host_2:3306) as a new master.
host_2(host_2:3306): OK: Applying all logs succeeded.
host_2(host_2:3306): OK: Activated master IP address.
host_3(host_3:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
host_3(host_3:3306): OK: Applying all logs succeeded. Slave started, replicating from host_2(host_2:3306)
host_2(host_2:3306): Resetting slave info succeeded.
Master failover to host_2(host_2:3306) completed successfully.

4.8 [用例测试] Master 没挂，但是MySQL slave 服务 down掉，是否自动failover

MySQL slave down掉，不会自动failover

* MHA 只会监控master的状态，只有master挂了，才会自动failover

[info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

master没挂，其他slave的错误，都不会导致自动failover

> 10. [用例测试] MySQL slave IO/SQL线程 stop，是否自动failover
> 11. [用例测试] MySQL slave IO/SQL线程 报错，是否自动failover

4.9 [用例测试] Master 挂了，但是MySQL slave 有问题，是否自动failover

MySQL slave IO/SQL线程 stop
MySQL slave IO/SQL线程延迟很多

step1: 检查各种MHA manager的各种环境

* 检查SSH
    masterha_check_ssh --conf=/etc/app1.cnf

* 检查复制(如果是非GTID模式，slave如果没有设置relay_log_purge=0，那么会有警告)
    masterha_check_repl --conf=/etc/app1.cnf
        Thu Jul 28 14:03:33 2016 - [warning]  relay_log_purge=0 is not set on slave host_2(host_2:3306).
        Thu Jul 28 14:03:33 2016 - [warning]  relay_log_purge=0 is not set on slave host_3(host_3:3306).

* 检查mha状态
    masterha_check_status --conf=/etc/app1.cnf

* 开启mha 监控
    nohup masterha_manager --conf=/etc/app1.cnf &

step2: 手动让某台slave stop slave ，然后再让master 挂掉

某台slave> stop slave;
master> shutdown mysql;

step3: 观察日志，观察情况

* 如果是slave 停止同步，然后master也挂了，任然会自动failover，不会报错
* 如果是slave 延迟严重，然后master也挂了，任然会自动failover，不会报错

4.10 [用例测试] MySQL master 有大事务超过100s再执行，是否online master switch

step1: 检查各种MHA manager的各种环境

* 检查SSH
    masterha_check_ssh --conf=/etc/app1.cnf

* 检查复制(如果是非GTID模式，slave如果没有设置relay_log_purge=0，那么会有警告)
    masterha_check_repl --conf=/etc/app1.cnf
        Thu Jul 28 14:03:33 2016 - [warning]  relay_log_purge=0 is not set on slave host_2(host_2:3306).
        Thu Jul 28 14:03:33 2016 - [warning]  relay_log_purge=0 is not set on slave host_3(host_3:3306).

* 检查mha状态
    masterha_check_status --conf=/etc/app1.cnf

step2: 手动在master上制造大事务

master> select id , sleep(20) from tb for update;

step3: 哪种情况MHA会报错

* slave 有问题的时候
     IO，sql thread有问题
     slave 挂了
     slave 延迟太多
    复制过滤规则不统一
     等等

以上情况，都会导致切换失败

step4：日志如下

如果有大事务，MHA会一直等待，直到事务结束

Wed Aug  3 15:33:04 2016 - [info] MHA::MasterRotate version 0.56.
Wed Aug  3 15:33:04 2016 - [info] Starting online master switch..
Wed Aug  3 15:33:04 2016 - [info]
Wed Aug  3 15:33:04 2016 - [info] * Phase 1: Configuration Check Phase..
Wed Aug  3 15:33:04 2016 - [info]
Wed Aug  3 15:33:04 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Wed Aug  3 15:33:04 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Wed Aug  3 15:33:04 2016 - [info] Reading server configuration from /etc/app1.cnf..
Wed Aug  3 15:33:04 2016 - [info] GTID failover mode = 0
Wed Aug  3 15:33:04 2016 - [info] Current Alive Master: host_1(host_1:3306)
Wed Aug  3 15:33:04 2016 - [info] Alive Slaves:
Wed Aug  3 15:33:04 2016 - [info]   host_2(host_2:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Wed Aug  3 15:33:04 2016 - [info]     Replicating from host_1(host_1:3306)
Wed Aug  3 15:33:04 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Aug  3 15:33:04 2016 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Wed Aug  3 15:33:04 2016 - [info]     Replicating from host_1(host_1:3306)
Wed Aug  3 15:33:04 2016 - [info]     Not candidate for the new Master (no_master is set)
Wed Aug  3 15:33:04 2016 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..

... 一直等待  ...

4.11 [用例测试] MySQL master 网络断掉

step1: 检查各种MHA manager的各种环境

* 检查SSH
    masterha_check_ssh --conf=/etc/app1.cnf

* 检查复制(如果是非GTID模式，slave如果没有设置relay_log_purge=0，那么会有警告)
    masterha_check_repl --conf=/etc/app1.cnf
        Thu Jul 28 14:03:33 2016 - [warning]  relay_log_purge=0 is not set on slave host_2(host_2:3306).
        Thu Jul 28 14:03:33 2016 - [warning]  relay_log_purge=0 is not set on slave host_3(host_3:3306).

* 检查mha状态
    masterha_check_status --conf=/etc/app1.cnf

* 开启mha 监控
    nohup masterha_manager --conf=/etc/app1.cnf &

step2: 手动制造master网络不通的情况

* MASTER 网络断掉，只允许host_monitor的22端口访问（便于恢复网络）
master> iptables -A INPUT -p tcp -s host_monitor --dprot 22 -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP

* MASTER 恢复网络
master> service iptables restart

step3: master的网络不通，观察MHA日志

Wed Aug  3 18:32:26 2016 - [warning] Got timeout on MySQL Ping(SELECT) child process and killed it! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 431.
Wed Aug  3 18:32:26 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Wed Aug  3 18:32:29 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Wed Aug  3 18:32:29 2016 - [warning] Connection failed 2 time(s)..
Wed Aug  3 18:32:31 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Wed Aug  3 18:32:32 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Wed Aug  3 18:32:32 2016 - [warning] Connection failed 3 time(s)..
Wed Aug  3 18:32:35 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Wed Aug  3 18:32:35 2016 - [warning] Connection failed 4 time(s)..
Wed Aug  3 18:32:35 2016 - [warning] Master is not reachable from health checker!
Wed Aug  3 18:32:35 2016 - [warning] Master host_1(host_1:3306) is not reachable!
Wed Aug  3 18:32:35 2016 - [warning] SSH is NOT reachable.
Wed Aug  3 18:32:35 2016 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status..
Wed Aug  3 18:32:35 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Wed Aug  3 18:32:35 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Wed Aug  3 18:32:35 2016 - [info] Reading server configuration from /etc/app1.cnf..
Wed Aug  3 18:32:35 2016 - [info] GTID failover mode = 0
Wed Aug  3 18:32:35 2016 - [info] Dead Servers:
Wed Aug  3 18:32:35 2016 - [info]   host_1(host_1:3306)
Wed Aug  3 18:32:35 2016 - [info] Alive Servers:
Wed Aug  3 18:32:35 2016 - [info]   host_2(host_2:3306)
Wed Aug  3 18:32:35 2016 - [info]   host_3(host_3:3306)
Wed Aug  3 18:32:35 2016 - [info] Alive Slaves:
Wed Aug  3 18:32:35 2016 - [info]   host_2(host_2:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Wed Aug  3 18:32:35 2016 - [info]     Replicating from host_1(host_1:3306)
Wed Aug  3 18:32:35 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Aug  3 18:32:35 2016 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Wed Aug  3 18:32:35 2016 - [info]     Replicating from host_1(host_1:3306)
Wed Aug  3 18:32:35 2016 - [info]     Not candidate for the new Master (no_master is set)
Wed Aug  3 18:32:35 2016 - [info] Checking slave configurations..
Wed Aug  3 18:32:35 2016 - [info]  read_only=1 is not set on slave host_2(host_2:3306).
Wed Aug  3 18:32:35 2016 - [info]  read_only=1 is not set on slave host_3(host_3:3306).
Wed Aug  3 18:32:35 2016 - [warning]  relay_log_purge=0 is not set on slave host_3(host_3:3306).
Wed Aug  3 18:32:35 2016 - [info] Checking replication filtering settings..
Wed Aug  3 18:32:35 2016 - [info]  Replication filtering check ok.
Wed Aug  3 18:32:35 2016 - [info] Master is down!
Wed Aug  3 18:32:35 2016 - [info] Terminating monitoring script.
Wed Aug  3 18:32:35 2016 - [info] Got exit code 20 (Master dead).
Wed Aug  3 18:32:35 2016 - [info] MHA::MasterFailover version 0.56.
Wed Aug  3 18:32:35 2016 - [info] Starting master failover.
Wed Aug  3 18:32:35 2016 - [info]
Wed Aug  3 18:32:35 2016 - [info] * Phase 1: Configuration Check Phase..
Wed Aug  3 18:32:35 2016 - [info]
Wed Aug  3 18:32:35 2016 - [info] GTID failover mode = 0
Wed Aug  3 18:32:35 2016 - [info] Dead Servers:
Wed Aug  3 18:32:35 2016 - [info]   host_1(host_1:3306)
Wed Aug  3 18:32:35 2016 - [info] Checking master reachability via MySQL(double check)...
Wed Aug  3 18:32:36 2016 - [info]  ok.
Wed Aug  3 18:32:36 2016 - [info] Alive Servers:
Wed Aug  3 18:32:36 2016 - [info]   host_2(host_2:3306)
Wed Aug  3 18:32:36 2016 - [info]   host_3(host_3:3306)
Wed Aug  3 18:32:36 2016 - [info] Alive Slaves:
Wed Aug  3 18:32:36 2016 - [info]   host_2(host_2:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Wed Aug  3 18:32:36 2016 - [info]     Replicating from host_1(host_1:3306)
Wed Aug  3 18:32:36 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Aug  3 18:32:36 2016 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Wed Aug  3 18:32:36 2016 - [info]     Replicating from host_1(host_1:3306)
Wed Aug  3 18:32:36 2016 - [info]     Not candidate for the new Master (no_master is set)
Wed Aug  3 18:32:36 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln309] Last failover was done at 2016/08/03 16:07:43. Current time is too early to do failover again. If you want to do failover, manually remove /var/log/masterha/app1/app1.failover.complete and run this script again.
Wed Aug  3 18:32:36 2016 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR:  at /usr/bin/masterha_manager line 65

* 由于之前做过多次切换，然后时间太短，所以failover终止，但是从这里可以看出，MHA已经检测到网络问题，进行failover动作了

4.12 [用例测试] MySQL master 网路瞬断（1~30秒），是否自动failover

step1: 检查各种MHA manager的各种环境

* 检查SSH
    masterha_check_ssh --conf=/etc/app1.cnf

* 检查复制(如果是非GTID模式，slave如果没有设置relay_log_purge=0，那么会有警告)
    masterha_check_repl --conf=/etc/app1.cnf
        Thu Jul 28 14:03:33 2016 - [warning]  relay_log_purge=0 is not set on slave host_2(host_2:3306).
        Thu Jul 28 14:03:33 2016 - [warning]  relay_log_purge=0 is not set on slave host_3(host_3:3306).

* 检查mha状态
    masterha_check_status --conf=/etc/app1.cnf

* 开启mha 监控
    nohup masterha_manager --conf=/etc/app1.cnf &

step2: 手动制造master网络不通的情况, 断掉1~10s左右

* MASTER 网络断掉，只允许host_monitor的22端口访问（便于恢复网络）
master> iptables -A INPUT -p tcp -s host_monitor --dprot 22 -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP


* 1~10秒左右 MASTER 恢复网络
master> service iptables restart

step3: 观察MHA日志

结论：如果时间非常短，比如1~2秒的网络瞬断，不会failover
结论：如果时间比较长，比如：> 5~10秒，那么如果MHA已经判定到Master is not reachable from health checker! 那么，就会进行failover阶段

Wed Aug  3 18:48:35 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..



Wed Aug  3 18:48:50 2016 - [warning] Got timeout on MySQL Ping(SELECT) child process and killed it! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 431.
Wed Aug  3 18:48:50 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Wed Aug  3 18:48:53 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Wed Aug  3 18:48:53 2016 - [warning] Connection failed 2 time(s)..
Wed Aug  3 18:48:55 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Wed Aug  3 18:48:56 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Wed Aug  3 18:48:56 2016 - [warning] Connection failed 3 time(s)..
Wed Aug  3 18:48:59 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Wed Aug  3 18:48:59 2016 - [warning] Connection failed 4 time(s)..
Wed Aug  3 18:48:59 2016 - [warning] Master is not reachable from health checker!
Wed Aug  3 18:48:59 2016 - [warning] Master host_1(host_1:3306) is not reachable!
Wed Aug  3 18:48:59 2016 - [warning] SSH is NOT reachable.

----- Failover Report -----

app1: MySQL Master failover host_1(host_1:3306) to host_2(host_2:3306) succeeded

Master host_1(host_1:3306) is down!

Check MHA Manager logs at host_manager_name:/var/log/masterha/app1/app1.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on host_1(host_1:3306)
The latest slave host_2(host_2:3306) has all relay logs for recovery.
Selected host_2(host_2:3306) as a new master.
host_2(host_2:3306): OK: Applying all logs succeeded.
host_2(host_2:3306): OK: Activated master IP address.
host_3(host_3:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
host_3(host_3:3306): OK: Applying all logs succeeded. Slave started, replicating from host_2(host_2:3306)
host_2(host_2:3306): Resetting slave info succeeded.
Master failover to host_2(host_2:3306) completed successfully.

4.14 [用例测试] 多段网络测试0 （没有二次检测的机制）

Manager <-- 不通 --> Master
Manager <-- 正常 --> S1 <-- 正常 --> master
Manager <-- 正常 --> S2 <-- 正常 --> master

step1: 检查各种MHA manager的各种环境

* 检查SSH
    masterha_check_ssh --conf=/etc/app1.cnf

* 检查复制(如果是非GTID模式，slave如果没有设置relay_log_purge=0，那么会有警告)
    masterha_check_repl --conf=/etc/app1.cnf
        Thu Jul 28 14:03:33 2016 - [warning]  relay_log_purge=0 is not set on slave host_2(host_2:3306).
        Thu Jul 28 14:03:33 2016 - [warning]  relay_log_purge=0 is not set on slave host_3(host_3:3306).

* 检查mha状态
    masterha_check_status --conf=/etc/app1.cnf

* 开启mha 监控
    nohup masterha_manager --conf=/etc/app1.cnf &

step2: 手动制造上述网络故障


master> iptables -A INPUT -p tcp -s host_monitor  -j ACCEPT
master> iptables -A INPUT -p tcp -s host_2  -j ACCEPT
master> iptables -A INPUT -p tcp -s host_3  -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP

step3: 检查日志

已经发生failover

----- Failover Report -----

app1: MySQL Master failover host_1(host_1:3306) to host_2(host_2:3306) succeeded

Master host_1(host_1:3306) is down!

Check MHA Manager logs at host_manager_name:/var/log/masterha/app1/app1.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on host_1(host_1:3306)
The latest slave host_2(host_2:3306) has all relay logs for recovery.
Selected host_2(host_2:3306) as a new master.
host_2(host_2:3306): OK: Applying all logs succeeded.
host_2(host_2:3306): OK: Activated master IP address.
host_3(host_3:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
host_3(host_3:3306): OK: Applying all logs succeeded. Slave started, replicating from host_2(host_2:3306)
host_2(host_2:3306): Resetting slave info succeeded.
Master failover to host_2(host_2:3306) completed successfully.

4.14 [用例测试] 多段网络测试1 （有二次检测的机制）

Manager <-- 不通 --> Master
Manager <-- 正常 --> S1 <-- 正常 --> master
Manager <-- 正常 --> S2 <-- 正常 --> master

step1: 检查各种MHA manager的各种环境

* 检查SSH
    masterha_check_ssh --conf=/etc/app1.cnf

* 检查复制(如果是非GTID模式，slave如果没有设置relay_log_purge=0，那么会有警告)
    masterha_check_repl --conf=/etc/app1.cnf
        Thu Jul 28 14:03:33 2016 - [warning]  relay_log_purge=0 is not set on slave host_2(host_2:3306).
        Thu Jul 28 14:03:33 2016 - [warning]  relay_log_purge=0 is not set on slave host_3(host_3:3306).

* 检查mha状态
    masterha_check_status --conf=/etc/app1.cnf

* 开启mha 监控
    nohup masterha_manager --conf=/etc/app1.cnf --last_failover_minute=1 &

* 二次检测
    secondary_check_script= masterha_secondary_check -s host_2 -s host_3

step2: 手动制造上述网络故障


master> iptables -A INPUT -p tcp -s host_monitor  -j ACCEPT
master> iptables -A INPUT -p tcp -s host_2  -j ACCEPT
master> iptables -A INPUT -p tcp -s host_3  -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP

step3: 观察日志

结论：因为二次检查可以通过，所以MHA认为master没有挂，没有failover，但是在一直循环检测中
结论：只要有二次检测，那么只要有一个别的server可以连通master，那么就会认为master没有挂，就不会failover

Fri Aug  5 10:16:24 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Fri Aug  5 10:16:24 2016 - [info] HealthCheck: SSH to host_1 is reachable.













Fri Aug  5 10:17:06 2016 - [warning] Got timeout on MySQL Ping(SELECT) child process and killed it! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 431.
Fri Aug  5 10:17:06 2016 - [info] Executing secondary network check script: masterha_secondary_check -s host_2 -s host_3  --user=root  --master_host=host_1  --master_ip=host_1  --master_port=3306 --master_user=dba --master_password=dba --ping_type=SELECT
Fri Aug  5 10:17:06 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Master is reachable from host_2!
Fri Aug  5 10:17:07 2016 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Aug  5 10:17:09 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug  5 10:17:09 2016 - [warning] Connection failed 2 time(s)..
Fri Aug  5 10:17:11 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Fri Aug  5 10:17:12 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug  5 10:17:12 2016 - [warning] Connection failed 3 time(s)..
Fri Aug  5 10:17:15 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug  5 10:17:15 2016 - [warning] Connection failed 4 time(s)..
Fri Aug  5 10:17:15 2016 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Aug  5 10:17:18 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug  5 10:17:18 2016 - [warning] Connection failed 1 time(s)..
Fri Aug  5 10:17:19 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Fri Aug  5 10:17:19 2016 - [info] Executing secondary network check script: masterha_secondary_check -s host_2 -s host_3  --user=root  --master_host=host_1  --master_ip=host_1  --master_port=3306 --master_user=dba --master_password=dba --ping_type=SELECT
Master is reachable from host_2!
Fri Aug  5 10:17:19 2016 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Aug  5 10:17:22 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug  5 10:17:22 2016 - [warning] Connection failed 2 time(s)..
Fri Aug  5 10:17:24 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Fri Aug  5 10:17:25 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug  5 10:17:25 2016 - [warning] Connection failed 3 time(s)..
Fri Aug  5 10:17:28 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug  5 10:17:28 2016 - [warning] Connection failed 4 time(s)..
Fri Aug  5 10:17:28 2016 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Aug  5 10:17:31 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug  5 10:17:31 2016 - [warning] Connection failed 1 time(s)..
Fri Aug  5 10:17:31 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Fri Aug  5 10:17:31 2016 - [info] Executing secondary network check script: masterha_secondary_check -s host_2 -s host_3  --user=root  --master_host=host_1  --master_ip=host_1  --master_port=3306 --master_user=dba --master_password=dba --ping_type=SELECT
Master is reachable from host_2!

4.14 [用例测试] 多段网络测试2 （有二次检测的机制）

Manager <-- 不通 --> Master
Manager <-- 正常 --> S1 <-- 不通 --> master
Manager <-- 正常 --> S2 <-- 正常 --> master

step1: 检查各种MHA manager的各种环境

* 检查SSH
    masterha_check_ssh --conf=/etc/app1.cnf

* 检查复制(如果是非GTID模式，slave如果没有设置relay_log_purge=0，那么会有警告)
    masterha_check_repl --conf=/etc/app1.cnf
        Thu Jul 28 14:03:33 2016 - [warning]  relay_log_purge=0 is not set on slave host_2(host_2:3306).
        Thu Jul 28 14:03:33 2016 - [warning]  relay_log_purge=0 is not set on slave host_3(host_3:3306).

* 检查mha状态
    masterha_check_status --conf=/etc/app1.cnf

* 开启mha 监控
    nohup masterha_manager --conf=/etc/app1.cnf &

step2: 手动制造上述网络故障


master> iptables -A INPUT -p tcp -s host_monitor  -j ACCEPT
master> iptables -A INPUT -p tcp -s host_3  -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP

step3: 观察日志

结论：因为二次检查可以通过，所以MHA认为master没有挂，没有failover，但是在一直循环检测中
结论：只要有二次检测，那么只要有一个别的server可以连通master，那么就会认为master没有挂，就不会failover

Fri Aug  5 10:32:27 2016 - [info] Starting ping health check on host_1(host_1:3306)..
Fri Aug  5 10:32:27 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..



















Fri Aug  5 10:33:30 2016 - [warning] Got timeout on MySQL Ping(SELECT) child process and killed it! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 431.
Fri Aug  5 10:33:30 2016 - [info] Executing secondary network check script: masterha_secondary_check -s host_2 -s host_3  --user=root  --master_host=host_1  --master_ip=host_1  --master_port=3306 --master_user=dba --master_password=dba --ping_type=SELECT
Fri Aug  5 10:33:30 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Fri Aug  5 10:33:33 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug  5 10:33:33 2016 - [warning] Connection failed 2 time(s)..
Fri Aug  5 10:33:35 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Monitoring server host_2 is reachable, Master is not reachable from host_2. OK.
Master is reachable from host_3!
Fri Aug  5 10:33:35 2016 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Aug  5 10:33:36 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug  5 10:33:36 2016 - [warning] Connection failed 3 time(s)..
Fri Aug  5 10:33:39 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug  5 10:33:39 2016 - [warning] Connection failed 4 time(s)..
Fri Aug  5 10:33:39 2016 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Aug  5 10:33:42 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug  5 10:33:42 2016 - [warning] Connection failed 1 time(s)..
Fri Aug  5 10:33:42 2016 - [info] Executing secondary network check script: masterha_secondary_check -s host_2 -s host_3  --user=root  --master_host=host_1  --master_ip=host_1  --master_port=3306 --master_user=dba --master_password=dba --ping_type=SELECT
Fri Aug  5 10:33:42 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Fri Aug  5 10:33:45 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug  5 10:33:45 2016 - [warning] Connection failed 2 time(s)..
Fri Aug  5 10:33:47 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Monitoring server host_2 is reachable, Master is not reachable from host_2. OK.
Master is reachable from host_3!
Fri Aug  5 10:33:47 2016 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Aug  5 10:33:48 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug  5 10:33:48 2016 - [warning] Connection failed 3 time(s)..
Fri Aug  5 10:33:51 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug  5 10:33:51 2016 - [warning] Connection failed 4 time(s)..
Fri Aug  5 10:33:51 2016 - [warning] Secondary network check script returned errors. Failover should not start so checking server status again. Check network settings for details.
Fri Aug  5 10:33:54 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug  5 10:33:54 2016 - [warning] Connection failed 1 time(s)..
Fri Aug  5 10:33:54 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Fri Aug  5 10:33:54 2016 - [info] Executing secondary network check script: masterha_secondary_check -s host_2 -s host_3  --user=root  --master_host=host_1  --master_ip=host_1  --master_port=3306 --master_user=dba --master_password=dba --ping_type=SELECT
Fri Aug  5 10:33:57 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug  5 10:33:57 2016 - [warning] Connection failed 2 time(s)..
Fri Aug  5 10:33:59 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Monitoring server host_2 is reachable, Master is not reachable from host_2. OK.
Master is reachable from host_3!
Fri Aug  5 10:34:00 2016 - [warning] Master is reachable from at least one of other monitoring servers. Failover should not happen.
Fri Aug  5 10:34:00 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug  5 10:34:00 2016 - [warning] Connection failed 3 time(s)..

4.14 [用例测试] 多段网络测试3 （有二次检测的机制）

Manager <-- 不通 --> Master
Manager <-- 正常 --> S1 <-- 不通 --> master
Manager <-- 正常 --> S2 <-- 不通 --> master

step1: 检查各种MHA manager的各种环境

* 检查SSH
    masterha_check_ssh --conf=/etc/app1.cnf

* 检查复制(如果是非GTID模式，slave如果没有设置relay_log_purge=0，那么会有警告)
    masterha_check_repl --conf=/etc/app1.cnf
        Thu Jul 28 14:03:33 2016 - [warning]  relay_log_purge=0 is not set on slave host_2(host_2:3306).
        Thu Jul 28 14:03:33 2016 - [warning]  relay_log_purge=0 is not set on slave host_3(host_3:3306).

* 检查mha状态
    masterha_check_status --conf=/etc/app1.cnf

* 开启mha 监控
    nohup masterha_manager --conf=/etc/app1.cnf &

step2: 手动制造上述网络故障


master> iptables -A INPUT -p tcp -s host_monitor  -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP

step3: 观察日志

由于都不通，那么会failover

Fri Aug  5 10:27:56 2016 - [warning] Got timeout on MySQL Ping(SELECT) child process and killed it! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 431.
Fri Aug  5 10:27:56 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Fri Aug  5 10:27:56 2016 - [info] Executing secondary network check script: masterha_secondary_check -s host_2 -s host_3  --user=root  --master_host=host_1  --master_ip=host_1  --master_port=3306 --master_user=dba --master_password=dba --ping_type=SELECT
Fri Aug  5 10:27:59 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug  5 10:27:59 2016 - [warning] Connection failed 2 time(s)..
Fri Aug  5 10:28:01 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Monitoring server host_2 is reachable, Master is not reachable from host_2. OK.
Fri Aug  5 10:28:02 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug  5 10:28:02 2016 - [warning] Connection failed 3 time(s)..
Fri Aug  5 10:28:05 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug  5 10:28:05 2016 - [warning] Connection failed 4 time(s)..
Monitoring server host_3 is reachable, Master is not reachable from host_3. OK.
Fri Aug  5 10:28:06 2016 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Fri Aug  5 10:28:06 2016 - [warning] Master is not reachable from health checker!
Fri Aug  5 10:28:06 2016 - [warning] Master host_1(host_1:3306) is not reachable!
Fri Aug  5 10:28:06 2016 - [warning] SSH is NOT reachable.
Fri Aug  5 10:28:06 2016 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status..
Fri Aug  5 10:28:06 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Fri Aug  5 10:28:06 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Fri Aug  5 10:28:06 2016 - [info] Reading server configuration from /etc/app1.cnf..
Fri Aug  5 10:28:06 2016 - [info] GTID failover mode = 0
Fri Aug  5 10:28:06 2016 - [info] Dead Servers:
Fri Aug  5 10:28:06 2016 - [info]   host_1(host_1:3306)
Fri Aug  5 10:28:06 2016 - [info] Alive Servers:
Fri Aug  5 10:28:06 2016 - [info]   host_2(host_2:3306)
Fri Aug  5 10:28:06 2016 - [info]   host_3(host_3:3306)
Fri Aug  5 10:28:06 2016 - [info] Alive Slaves:
Fri Aug  5 10:28:06 2016 - [info]   host_2(host_2:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug  5 10:28:06 2016 - [info]     Replicating from host_1(host_1:3306)
Fri Aug  5 10:28:06 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Aug  5 10:28:06 2016 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug  5 10:28:06 2016 - [info]     Replicating from host_1(host_1:3306)
Fri Aug  5 10:28:06 2016 - [info]     Not candidate for the new Master (no_master is set)
Fri Aug  5 10:28:06 2016 - [info] Checking slave configurations..
Fri Aug  5 10:28:06 2016 - [info]  read_only=1 is not set on slave host_2(host_2:3306).
Fri Aug  5 10:28:06 2016 - [info]  read_only=1 is not set on slave host_3(host_3:3306).
Fri Aug  5 10:28:06 2016 - [warning]  relay_log_purge=0 is not set on slave host_3(host_3:3306).
Fri Aug  5 10:28:06 2016 - [info] Checking replication filtering settings..
Fri Aug  5 10:28:06 2016 - [info]  Replication filtering check ok.
Fri Aug  5 10:28:06 2016 - [info] Master is down!
Fri Aug  5 10:28:06 2016 - [info] Terminating monitoring script.
Fri Aug  5 10:28:06 2016 - [info] Got exit code 20 (Master dead).
Fri Aug  5 10:28:06 2016 - [info] MHA::MasterFailover version 0.56.
Fri Aug  5 10:28:06 2016 - [info] Starting master failover.
Fri Aug  5 10:28:06 2016 - [info]
Fri Aug  5 10:28:06 2016 - [info] * Phase 1: Configuration Check Phase..
Fri Aug  5 10:28:06 2016 - [info]
Fri Aug  5 10:28:07 2016 - [info] GTID failover mode = 0
Fri Aug  5 10:28:07 2016 - [info] Dead Servers:
Fri Aug  5 10:28:07 2016 - [info]   host_1(host_1:3306)
Fri Aug  5 10:28:07 2016 - [info] Checking master reachability via MySQL(double check)...
Fri Aug  5 10:28:08 2016 - [info]  ok.
Fri Aug  5 10:28:08 2016 - [info] Alive Servers:
Fri Aug  5 10:28:08 2016 - [info]   host_2(host_2:3306)
Fri Aug  5 10:28:08 2016 - [info]   host_3(host_3:3306)
Fri Aug  5 10:28:08 2016 - [info] Alive Slaves:
Fri Aug  5 10:28:08 2016 - [info]   host_2(host_2:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug  5 10:28:08 2016 - [info]     Replicating from host_1(host_1:3306)
Fri Aug  5 10:28:08 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Aug  5 10:28:08 2016 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug  5 10:28:08 2016 - [info]     Replicating from host_1(host_1:3306)
Fri Aug  5 10:28:08 2016 - [info]     Not candidate for the new Master (no_master is set)
Fri Aug  5 10:28:08 2016 - [info] Starting Non-GTID based failover.
Fri Aug  5 10:28:08 2016 - [info]
Fri Aug  5 10:28:08 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Aug  5 10:28:08 2016 - [info]
Fri Aug  5 10:28:08 2016 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Aug  5 10:28:08 2016 - [info]
Fri Aug  5 10:28:08 2016 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Aug  5 10:28:08 2016 - [info] Executing master IP deactivation script:
Fri Aug  5 10:28:08 2016 - [info]   /home/mysql/MHA/masterha/master_ip_failover --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --command=stop
Fri Aug  5 10:28:08 2016 - [info]  done.
Fri Aug  5 10:28:08 2016 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Aug  5 10:28:08 2016 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Aug  5 10:28:08 2016 - [info]
Fri Aug  5 10:28:08 2016 - [info] * Phase 3: Master Recovery Phase..
Fri Aug  5 10:28:08 2016 - [info]
Fri Aug  5 10:28:08 2016 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Aug  5 10:28:08 2016 - [info]
Fri Aug  5 10:28:08 2016 - [info] The latest binary log file/position on all slaves is host_1_name.000001:154
Fri Aug  5 10:28:08 2016 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Aug  5 10:28:08 2016 - [info]   host_2(host_2:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug  5 10:28:08 2016 - [info]     Replicating from host_1(host_1:3306)
Fri Aug  5 10:28:08 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Aug  5 10:28:08 2016 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug  5 10:28:08 2016 - [info]     Replicating from host_1(host_1:3306)
Fri Aug  5 10:28:08 2016 - [info]     Not candidate for the new Master (no_master is set)
Fri Aug  5 10:28:08 2016 - [info] The oldest binary log file/position on all slaves is host_1_name.000001:154
Fri Aug  5 10:28:08 2016 - [info] Oldest slaves:
Fri Aug  5 10:28:08 2016 - [info]   host_2(host_2:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug  5 10:28:08 2016 - [info]     Replicating from host_1(host_1:3306)
Fri Aug  5 10:28:08 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Aug  5 10:28:08 2016 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug  5 10:28:08 2016 - [info]     Replicating from host_1(host_1:3306)
Fri Aug  5 10:28:08 2016 - [info]     Not candidate for the new Master (no_master is set)
Fri Aug  5 10:28:08 2016 - [info]
Fri Aug  5 10:28:08 2016 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Fri Aug  5 10:28:08 2016 - [info]
Fri Aug  5 10:28:08 2016 - [warning] Dead Master is not SSH reachable. Could not save it's binlogs. Transactions that were not sent to the latest slave (Read_Master_Log_Pos to the tail of the dead master's binlog) were lost.
Fri Aug  5 10:28:08 2016 - [info]
Fri Aug  5 10:28:08 2016 - [info] * Phase 3.3: Determining New Master Phase..
Fri Aug  5 10:28:08 2016 - [info]
Fri Aug  5 10:28:08 2016 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Fri Aug  5 10:28:08 2016 - [info] All slaves received relay logs to the same position. No need to resync each other.
Fri Aug  5 10:28:08 2016 - [info] Searching new master from slaves..
Fri Aug  5 10:28:08 2016 - [info]  Candidate masters from the configuration file:
Fri Aug  5 10:28:08 2016 - [info]   host_2(host_2:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug  5 10:28:08 2016 - [info]     Replicating from host_1(host_1:3306)
Fri Aug  5 10:28:08 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Aug  5 10:28:08 2016 - [info]  Non-candidate masters:
Fri Aug  5 10:28:08 2016 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Aug  5 10:28:08 2016 - [info]     Replicating from host_1(host_1:3306)
Fri Aug  5 10:28:08 2016 - [info]     Not candidate for the new Master (no_master is set)
Fri Aug  5 10:28:08 2016 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Fri Aug  5 10:28:08 2016 - [info] New master is host_2(host_2:3306)
Fri Aug  5 10:28:08 2016 - [info] Starting master failover..
Fri Aug  5 10:28:08 2016 - [info]
From:
host_1(host_1:3306) (current master)
 +--host_2(host_2:3306)
 +--host_3(host_3:3306)

To:
host_2(host_2:3306) (new master)
 +--host_3(host_3:3306)
Fri Aug  5 10:28:08 2016 - [info]
Fri Aug  5 10:28:08 2016 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Fri Aug  5 10:28:08 2016 - [info]
Fri Aug  5 10:28:08 2016 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Fri Aug  5 10:28:08 2016 - [info]
Fri Aug  5 10:28:08 2016 - [info] * Phase 3.4: Master Log Apply Phase..
Fri Aug  5 10:28:08 2016 - [info]
Fri Aug  5 10:28:08 2016 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Fri Aug  5 10:28:08 2016 - [info] Starting recovery on host_2(host_2:3306)..
Fri Aug  5 10:28:08 2016 - [info]  This server has all relay logs. Waiting all logs to be applied..
Fri Aug  5 10:28:08 2016 - [info]   done.
Fri Aug  5 10:28:08 2016 - [info]  All relay logs were successfully applied.
Fri Aug  5 10:28:08 2016 - [info] Getting new master's binlog name and position..
Fri Aug  5 10:28:08 2016 - [info]  host_2_name.000001:154
Fri Aug  5 10:28:08 2016 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host_2', MASTER_PORT=3306, MASTER_LOG_FILE='host_2_name.000001', MASTER_LOG_POS=154, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Fri Aug  5 10:28:08 2016 - [info] Executing master IP activate script:
Fri Aug  5 10:28:08 2016 - [info]   /home/mysql/MHA/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --new_master_host=host_2 --new_master_ip=host_2 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba'
Set read_only=0 on the new master.
No need to Creating app user on the new master..
Fri Aug  5 10:28:08 2016 - [info]  OK.
Fri Aug  5 10:28:08 2016 - [info] ** Finished master recovery successfully.
Fri Aug  5 10:28:08 2016 - [info] * Phase 3: Master Recovery Phase completed.
Fri Aug  5 10:28:08 2016 - [info]
Fri Aug  5 10:28:08 2016 - [info] * Phase 4: Slaves Recovery Phase..
Fri Aug  5 10:28:08 2016 - [info]
Fri Aug  5 10:28:08 2016 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Fri Aug  5 10:28:08 2016 - [info]
Fri Aug  5 10:28:08 2016 - [info] -- Slave diff file generation on host host_3(host_3:3306) started, pid: 3964. Check tmp log /var/log/masterha/app1/host_3_3306_20160805102806.log if it takes time..
Fri Aug  5 10:28:08 2016 - [info]
Fri Aug  5 10:28:08 2016 - [info] Log messages from host_3 ...
Fri Aug  5 10:28:08 2016 - [info]
Fri Aug  5 10:28:08 2016 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Fri Aug  5 10:28:08 2016 - [info] End of log messages from host_3.
Fri Aug  5 10:28:08 2016 - [info] -- host_3(host_3:3306) has the latest relay log events.
Fri Aug  5 10:28:08 2016 - [info] Generating relay diff files from the latest slave succeeded.
Fri Aug  5 10:28:08 2016 - [info]
Fri Aug  5 10:28:08 2016 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Fri Aug  5 10:28:08 2016 - [info]
Fri Aug  5 10:28:08 2016 - [info] -- Slave recovery on host host_3(host_3:3306) started, pid: 3966. Check tmp log /var/log/masterha/app1/host_3_3306_20160805102806.log if it takes time..
Fri Aug  5 10:28:08 2016 - [info]
Fri Aug  5 10:28:08 2016 - [info] Log messages from host_3 ...
Fri Aug  5 10:28:08 2016 - [info]
Fri Aug  5 10:28:08 2016 - [info] Starting recovery on host_3(host_3:3306)..
Fri Aug  5 10:28:08 2016 - [info]  This server has all relay logs. Waiting all logs to be applied..
Fri Aug  5 10:28:08 2016 - [info]   done.
Fri Aug  5 10:28:08 2016 - [info]  All relay logs were successfully applied.
Fri Aug  5 10:28:08 2016 - [info]  Resetting slave host_3(host_3:3306) and starting replication from the new master host_2(host_2:3306)..
Fri Aug  5 10:28:08 2016 - [info]  Executed CHANGE MASTER.
Fri Aug  5 10:28:08 2016 - [info]  Slave started.
Fri Aug  5 10:28:08 2016 - [info] End of log messages from host_3.
Fri Aug  5 10:28:08 2016 - [info] -- Slave recovery on host host_3(host_3:3306) succeeded.
Fri Aug  5 10:28:08 2016 - [info] All new slave servers recovered successfully.
Fri Aug  5 10:28:08 2016 - [info]
Fri Aug  5 10:28:08 2016 - [info] * Phase 5: New master cleanup phase..
Fri Aug  5 10:28:08 2016 - [info]
Fri Aug  5 10:28:08 2016 - [info] Resetting slave info on the new master..
Fri Aug  5 10:28:08 2016 - [info]  host_2: Resetting slave info succeeded.
Fri Aug  5 10:28:08 2016 - [info] Master failover to host_2(host_2:3306) completed successfully.
Fri Aug  5 10:28:08 2016 - [info]

----- Failover Report -----

app1: MySQL Master failover host_1(host_1:3306) to host_2(host_2:3306) succeeded

Master host_1(host_1:3306) is down!

Check MHA Manager logs at host_manager_name:/var/log/masterha/app1/app1.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on host_1(host_1:3306)
The latest slave host_2(host_2:3306) has all relay logs for recovery.
Selected host_2(host_2:3306) as a new master.
host_2(host_2:3306): OK: Applying all logs succeeded.
host_2(host_2:3306): OK: Activated master IP address.
host_3(host_3:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
host_3(host_3:3306): OK: Applying all logs succeeded. Slave started, replicating from host_2(host_2:3306)
host_2(host_2:3306): Resetting slave info succeeded.
Master failover to host_2(host_2:3306) completed successfully.

4.14 [用例测试] 多段网络测试4 （没有二次检测的机制）

Manager <-- 不通 --> Master
Manager <-- 不通 --> S1 <-- 正常 --> master
Manager <-- 不通 --> S2 <-- 正常 --> master

step1: 检查各种MHA manager的各种环境

* 检查SSH
    masterha_check_ssh --conf=/etc/app1.cnf

* 检查复制(如果是非GTID模式，slave如果没有设置relay_log_purge=0，那么会有警告)
    masterha_check_repl --conf=/etc/app1.cnf
        Thu Jul 28 14:03:33 2016 - [warning]  relay_log_purge=0 is not set on slave host_2(host_2:3306).
        Thu Jul 28 14:03:33 2016 - [warning]  relay_log_purge=0 is not set on slave host_3(host_3:3306).

* 检查mha状态
    masterha_check_status --conf=/etc/app1.cnf

* 开启mha 监控
    nohup masterha_manager --conf=/etc/app1.cnf &

step2: 手动制造上述网络故障


master> iptables -A INPUT -p tcp -s host_monitor  -j ACCEPT
master> iptables -A INPUT -p tcp -s host_2  -j ACCEPT
master> iptables -A INPUT -p tcp -s host_3  -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP

s1> iptables -A INPUT -p tcp -s host_monitor  -j ACCEPT
s1> iptables -A INPUT -p tcp -s host_1  -j ACCEPT
s1> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP

s2> iptables -A INPUT -p tcp -s host_monitor  -j ACCEPT
s2> iptables -A INPUT -p tcp -s host_1  -j ACCEPT
s2> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP

step3: 观察日志

结论：因为master不通，其他slave也不通，那么没有存活的slave，也就不会failover了

Fri Aug  5 10:39:39 2016 - [info] Checking master_ip_failover_script status:
Fri Aug  5 10:39:39 2016 - [info]   /home/mysql/MHA/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306
Fri Aug  5 10:39:39 2016 - [info]  OK.
Fri Aug  5 10:39:39 2016 - [warning] shutdown_script is not defined.
Fri Aug  5 10:39:39 2016 - [info] Set master ping interval 3 seconds.
Fri Aug  5 10:39:39 2016 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Fri Aug  5 10:39:39 2016 - [info] Starting ping health check on host_1(host_1:3306)..
Fri Aug  5 10:39:39 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Fri Aug  5 10:41:51 2016 - [warning] Got timeout on MySQL Ping(SELECT) child process and killed it! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 431.
Fri Aug  5 10:41:51 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=host_1_name
Fri Aug  5 10:41:54 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug  5 10:41:54 2016 - [warning] Connection failed 2 time(s)..
Fri Aug  5 10:41:56 2016 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Fri Aug  5 10:41:57 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug  5 10:41:57 2016 - [warning] Connection failed 3 time(s)..
Fri Aug  5 10:42:00 2016 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on 'host_1' (4))
Fri Aug  5 10:42:00 2016 - [warning] Connection failed 4 time(s)..
Fri Aug  5 10:42:00 2016 - [warning] Master is not reachable from health checker!
Fri Aug  5 10:42:00 2016 - [warning] Master host_1(host_1:3306) is not reachable!
Fri Aug  5 10:42:00 2016 - [warning] SSH is NOT reachable.
Fri Aug  5 10:42:00 2016 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status..
Fri Aug  5 10:42:00 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Fri Aug  5 10:42:00 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Fri Aug  5 10:42:00 2016 - [info] Reading server configuration from /etc/app1.cnf..
Fri Aug  5 10:42:08 2016 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln188] There is no alive server. We can't do failover
Fri Aug  5 10:42:08 2016 - [warning] Got Error:  at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 558
Fri Aug  5 10:42:08 2016 - [info] Got exit code 1 (Not master dead).

4.14 [用例测试] 多段网络测试5 （有二次检测的机制）