1. Failover process
[root@es3 ~]# masterha_check_repl --conf=/root/app1.cnf Tue Aug 20 10:22:41 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Tue Aug 20 10:22:41 2019 - [info] Reading application default configuration from /root/app1.cnf.. Tue Aug 20 10:22:41 2019 - [info] Reading server configuration from /root/app1.cnf.. Tue Aug 20 10:22:41 2019 - [info] MHA::MasterMonitor version 0.58. Tue Aug 20 10:22:42 2019 - [info] GTID failover mode = 1 Tue Aug 20 10:22:42 2019 - [info] Dead Servers: Tue Aug 20 10:22:42 2019 - [info] Alive Servers: Tue Aug 20 10:22:42 2019 - [info] es1(192.168.56.14:3306) Tue Aug 20 10:22:42 2019 - [info] es2(192.168.56.15:3306) Tue Aug 20 10:22:42 2019 - [info] es3(192.168.56.16:3306) Tue Aug 20 10:22:42 2019 - [info] Alive Slaves: Tue Aug 20 10:22:42 2019 - [info] es1(192.168.56.14:3306) Version=5.7.24-log (oldest major version between slaves) log-bin:enabled Tue Aug 20 10:22:42 2019 - [info] GTID ON Tue Aug 20 10:22:42 2019 - [info] Replicating from es3(192.168.56.16:3306) Tue Aug 20 10:22:42 2019 - [info] es2(192.168.56.15:3306) Version=5.7.24-log (oldest major version between slaves) log-bin:enabled Tue Aug 20 10:22:42 2019 - [info] GTID ON Tue Aug 20 10:22:42 2019 - [info] Replicating from 192.168.56.16(192.168.56.16:3306) Tue Aug 20 10:22:42 2019 - [info] Current Alive Master: es3(192.168.56.16:3306) Tue Aug 20 10:22:42 2019 - [info] Checking slave configurations.. Tue Aug 20 10:22:42 2019 - [info] read_only=1 is not set on slave es2(192.168.56.15:3306). Tue Aug 20 10:22:42 2019 - [info] Checking replication filtering settings.. Tue Aug 20 10:22:42 2019 - [info] binlog_do_db= , binlog_ignore_db= Tue Aug 20 10:22:42 2019 - [info] Replication filtering check ok. Tue Aug 20 10:22:42 2019 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking. Tue Aug 20 10:22:42 2019 - [info] Checking SSH publickey authentication settings on the current master.. 
Tue Aug 20 10:22:43 2019 - [info] HealthCheck: SSH to es3 is reachable. Tue Aug 20 10:22:43 2019 - [info] es3(192.168.56.16:3306) (current master) +--es1(192.168.56.14:3306) +--es2(192.168.56.15:3306) Tue Aug 20 10:22:43 2019 - [info] Checking replication health on es1.. Tue Aug 20 10:22:43 2019 - [info] ok. Tue Aug 20 10:22:43 2019 - [info] Checking replication health on es2.. Tue Aug 20 10:22:43 2019 - [info] ok. Tue Aug 20 10:22:43 2019 - [info] Checking master_ip_failover_script status: Tue Aug 20 10:22:43 2019 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=es3 --orig_master_ip=192.168.56.16 --orig_master_port=3306 IN SCRIPT TEST====/sbin/ifconfig enp0s8:1 down==/sbin/ifconfig enp0s8:1 192.168.56.191/24=== Checking the Status of the script.. OK Tue Aug 20 10:22:43 2019 - [info] OK. Tue Aug 20 10:22:43 2019 - [info] Checking shutdown script status: Tue Aug 20 10:22:43 2019 - [info] /usr/local/bin/stop_report --command=status --ssh_user=root --host=es3 --ip=192.168.56.16 Tue Aug 20 10:22:43 2019 - [info] OK. Tue Aug 20 10:22:43 2019 - [info] Got exit code 0 (Not master dead). MySQL Replication Health is OK. [root@es3 ~]# masterha_manager --conf=/root/app1.cnf Tue Aug 20 10:22:47 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Tue Aug 20 10:22:47 2019 - [info] Reading application default configuration from /root/app1.cnf.. Tue Aug 20 10:22:47 2019 - [info] Reading server configuration from /root/app1.cnf.. Tue Aug 20 10:22:47 2019 - [info] MHA::MasterMonitor version 0.58. 
Tue Aug 20 10:22:48 2019 - [info] GTID failover mode = 1 Tue Aug 20 10:22:48 2019 - [info] Dead Servers: Tue Aug 20 10:22:48 2019 - [info] Alive Servers: Tue Aug 20 10:22:48 2019 - [info] es1(192.168.56.14:3306) Tue Aug 20 10:22:48 2019 - [info] es2(192.168.56.15:3306) Tue Aug 20 10:22:48 2019 - [info] es3(192.168.56.16:3306) Tue Aug 20 10:22:48 2019 - [info] Alive Slaves: Tue Aug 20 10:22:48 2019 - [info] es1(192.168.56.14:3306) Version=5.7.24-log (oldest major version between slaves) log-bin:enabled Tue Aug 20 10:22:48 2019 - [info] GTID ON Tue Aug 20 10:22:48 2019 - [info] Replicating from es3(192.168.56.16:3306) Tue Aug 20 10:22:48 2019 - [info] es2(192.168.56.15:3306) Version=5.7.24-log (oldest major version between slaves) log-bin:enabled Tue Aug 20 10:22:48 2019 - [info] GTID ON Tue Aug 20 10:22:48 2019 - [info] Replicating from 192.168.56.16(192.168.56.16:3306) Tue Aug 20 10:22:48 2019 - [info] Current Alive Master: es3(192.168.56.16:3306) Tue Aug 20 10:22:48 2019 - [info] Checking slave configurations.. Tue Aug 20 10:22:48 2019 - [info] read_only=1 is not set on slave es2(192.168.56.15:3306). Tue Aug 20 10:22:48 2019 - [info] Checking replication filtering settings.. Tue Aug 20 10:22:48 2019 - [info] binlog_do_db= , binlog_ignore_db= Tue Aug 20 10:22:48 2019 - [info] Replication filtering check ok. Tue Aug 20 10:22:48 2019 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking. Tue Aug 20 10:22:48 2019 - [info] Checking SSH publickey authentication settings on the current master.. Tue Aug 20 10:22:48 2019 - [info] HealthCheck: SSH to es3 is reachable. 
Tue Aug 20 10:22:48 2019 - [info] es3(192.168.56.16:3306) (current master) +--es1(192.168.56.14:3306) +--es2(192.168.56.15:3306) Tue Aug 20 10:22:48 2019 - [info] Checking master_ip_failover_script status: Tue Aug 20 10:22:48 2019 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=es3 --orig_master_ip=192.168.56.16 --orig_master_port=3306 IN SCRIPT TEST====/sbin/ifconfig enp0s8:1 down==/sbin/ifconfig enp0s8:1 192.168.56.191/24=== Checking the Status of the script.. OK Tue Aug 20 10:22:48 2019 - [info] OK. Tue Aug 20 10:22:48 2019 - [info] Checking shutdown script status: Tue Aug 20 10:22:48 2019 - [info] /usr/local/bin/stop_report --command=status --ssh_user=root --host=es3 --ip=192.168.56.16 Tue Aug 20 10:22:48 2019 - [info] OK. Tue Aug 20 10:22:48 2019 - [info] Set master ping interval 3 seconds. Tue Aug 20 10:22:48 2019 - [info] Set secondary check script: masterha_secondary_check -s 192.168.56.16 -s 192.168.56.15 Tue Aug 20 10:22:48 2019 - [info] Starting ping health check on es3(192.168.56.16:3306).. Tue Aug 20 10:22:48 2019 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond.. Tue Aug 20 10:23:00 2019 - [warning] Got error on MySQL select ping: 2013 (Lost connection to MySQL server during query) Tue Aug 20 10:23:01 2019 - [info] Executing SSH check script: exit 0 Tue Aug 20 10:23:01 2019 - [info] Executing secondary network check script: masterha_secondary_check -s 192.168.56.16 -s 192.168.56.15 --user=root --master_host=es3 --master_ip=192.168.56.16 --master_port=3306 --master_user=repl --master_password=123456 --ping_type=SELECT Tue Aug 20 10:23:01 2019 - [info] HealthCheck: SSH to es3 is reachable. Monitoring server 192.168.56.16 is reachable, Master is not reachable from 192.168.56.16. OK. Monitoring server 192.168.56.15 is reachable, Master is not reachable from 192.168.56.15. OK. Tue Aug 20 10:23:02 2019 - [info] Master is not reachable from all other monitoring servers. Failover should start. 
Tue Aug 20 10:23:03 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.56.16' (111)) Tue Aug 20 10:23:03 2019 - [warning] Connection failed 2 time(s).. Tue Aug 20 10:23:06 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.56.16' (111)) Tue Aug 20 10:23:06 2019 - [warning] Connection failed 3 time(s).. Tue Aug 20 10:23:09 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.56.16' (111)) Tue Aug 20 10:23:09 2019 - [warning] Connection failed 4 time(s).. Tue Aug 20 10:23:09 2019 - [warning] Master is not reachable from health checker! Tue Aug 20 10:23:09 2019 - [warning] Master es3(192.168.56.16:3306) is not reachable! Tue Aug 20 10:23:09 2019 - [warning] SSH is reachable. Tue Aug 20 10:23:09 2019 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /root/app1.cnf again, and trying to connect to all servers to check server status.. Tue Aug 20 10:23:09 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Tue Aug 20 10:23:09 2019 - [info] Reading application default configuration from /root/app1.cnf.. Tue Aug 20 10:23:09 2019 - [info] Reading server configuration from /root/app1.cnf.. 
Tue Aug 20 10:23:11 2019 - [info] GTID failover mode = 1 Tue Aug 20 10:23:11 2019 - [info] Dead Servers: Tue Aug 20 10:23:11 2019 - [info] es3(192.168.56.16:3306) Tue Aug 20 10:23:11 2019 - [info] Alive Servers: Tue Aug 20 10:23:11 2019 - [info] es1(192.168.56.14:3306) Tue Aug 20 10:23:11 2019 - [info] es2(192.168.56.15:3306) Tue Aug 20 10:23:11 2019 - [info] Alive Slaves: Tue Aug 20 10:23:11 2019 - [info] es1(192.168.56.14:3306) Version=5.7.24-log (oldest major version between slaves) log-bin:enabled Tue Aug 20 10:23:11 2019 - [info] GTID ON Tue Aug 20 10:23:11 2019 - [info] Replicating from es3(192.168.56.16:3306) Tue Aug 20 10:23:11 2019 - [info] es2(192.168.56.15:3306) Version=5.7.24-log (oldest major version between slaves) log-bin:enabled Tue Aug 20 10:23:11 2019 - [info] GTID ON Tue Aug 20 10:23:11 2019 - [info] Replicating from 192.168.56.16(192.168.56.16:3306) Tue Aug 20 10:23:11 2019 - [info] Checking slave configurations.. Tue Aug 20 10:23:11 2019 - [info] read_only=1 is not set on slave es2(192.168.56.15:3306). Tue Aug 20 10:23:11 2019 - [info] Checking replication filtering settings.. Tue Aug 20 10:23:11 2019 - [info] Replication filtering check ok. Tue Aug 20 10:23:11 2019 - [info] Master is down! Tue Aug 20 10:23:11 2019 - [info] Terminating monitoring script. Tue Aug 20 10:23:11 2019 - [info] Got exit code 20 (Master dead). Tue Aug 20 10:23:11 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Tue Aug 20 10:23:11 2019 - [info] Reading application default configuration from /root/app1.cnf.. Tue Aug 20 10:23:11 2019 - [info] Reading server configuration from /root/app1.cnf.. Tue Aug 20 10:23:11 2019 - [info] MHA::MasterFailover version 0.58. Tue Aug 20 10:23:11 2019 - [info] Starting master failover. Tue Aug 20 10:23:11 2019 - [info] Tue Aug 20 10:23:11 2019 - [info] * Phase 1: Configuration Check Phase.. 
Tue Aug 20 10:23:11 2019 - [info] Tue Aug 20 10:23:12 2019 - [info] GTID failover mode = 1 Tue Aug 20 10:23:12 2019 - [info] Dead Servers: Tue Aug 20 10:23:12 2019 - [info] es3(192.168.56.16:3306) Tue Aug 20 10:23:12 2019 - [info] Checking master reachability via MySQL(double check)... Tue Aug 20 10:23:12 2019 - [info] ok. Tue Aug 20 10:23:12 2019 - [info] Alive Servers: Tue Aug 20 10:23:12 2019 - [info] es1(192.168.56.14:3306) Tue Aug 20 10:23:12 2019 - [info] es2(192.168.56.15:3306) Tue Aug 20 10:23:12 2019 - [info] Alive Slaves: Tue Aug 20 10:23:12 2019 - [info] es1(192.168.56.14:3306) Version=5.7.24-log (oldest major version between slaves) log-bin:enabled Tue Aug 20 10:23:12 2019 - [info] GTID ON Tue Aug 20 10:23:12 2019 - [info] Replicating from es3(192.168.56.16:3306) Tue Aug 20 10:23:12 2019 - [info] es2(192.168.56.15:3306) Version=5.7.24-log (oldest major version between slaves) log-bin:enabled Tue Aug 20 10:23:12 2019 - [info] GTID ON Tue Aug 20 10:23:12 2019 - [info] Replicating from 192.168.56.16(192.168.56.16:3306) Tue Aug 20 10:23:12 2019 - [info] Starting GTID based failover. Tue Aug 20 10:23:12 2019 - [info] Tue Aug 20 10:23:12 2019 - [info] ** Phase 1: Configuration Check Phase completed. Tue Aug 20 10:23:12 2019 - [info] Tue Aug 20 10:23:12 2019 - [info] * Phase 2: Dead Master Shutdown Phase.. Tue Aug 20 10:23:12 2019 - [info] Tue Aug 20 10:23:12 2019 - [info] Forcing shutdown so that applications never connect to the current master.. Tue Aug 20 10:23:12 2019 - [info] Executing master IP deactivation script: Tue Aug 20 10:23:12 2019 - [info] /usr/local/bin/master_ip_failover --orig_master_host=es3 --orig_master_ip=192.168.56.16 --orig_master_port=3306 --command=stopssh --ssh_user=root IN SCRIPT TEST====/sbin/ifconfig enp0s8:1 down==/sbin/ifconfig enp0s8:1 192.168.56.191/24=== Disabling the VIP on old master: es3 Tue Aug 20 10:23:13 2019 - [info] done. 
Tue Aug 20 10:23:13 2019 - [info] Executing SHUTDOWN script: Tue Aug 20 10:23:13 2019 - [info] /usr/local/bin/stop_report --command=stopssh --ssh_user=root --host=es3 --ip=192.168.56.16 --port=3306 Tue Aug 20 10:23:13 2019 - [info] Power off done. Tue Aug 20 10:23:13 2019 - [info] * Phase 2: Dead Master Shutdown Phase completed. Tue Aug 20 10:23:13 2019 - [info] Tue Aug 20 10:23:13 2019 - [info] * Phase 3: Master Recovery Phase.. Tue Aug 20 10:23:13 2019 - [info] Tue Aug 20 10:23:13 2019 - [info] * Phase 3.1: Getting Latest Slaves Phase.. Tue Aug 20 10:23:13 2019 - [info] Tue Aug 20 10:23:13 2019 - [info] The latest binary log file/position on all slaves is mysqlbin.000005:154 Tue Aug 20 10:23:13 2019 - [info] Latest slaves (Slaves that received relay log files to the latest): Tue Aug 20 10:23:13 2019 - [info] es1(192.168.56.14:3306) Version=5.7.24-log (oldest major version between slaves) log-bin:enabled Tue Aug 20 10:23:13 2019 - [info] GTID ON Tue Aug 20 10:23:13 2019 - [info] Replicating from es3(192.168.56.16:3306) Tue Aug 20 10:23:13 2019 - [info] es2(192.168.56.15:3306) Version=5.7.24-log (oldest major version between slaves) log-bin:enabled Tue Aug 20 10:23:13 2019 - [info] GTID ON Tue Aug 20 10:23:13 2019 - [info] Replicating from 192.168.56.16(192.168.56.16:3306) Tue Aug 20 10:23:13 2019 - [info] The oldest binary log file/position on all slaves is mysqlbin.000005:154 Tue Aug 20 10:23:13 2019 - [info] Oldest slaves: Tue Aug 20 10:23:13 2019 - [info] es1(192.168.56.14:3306) Version=5.7.24-log (oldest major version between slaves) log-bin:enabled Tue Aug 20 10:23:13 2019 - [info] GTID ON Tue Aug 20 10:23:13 2019 - [info] Replicating from es3(192.168.56.16:3306) Tue Aug 20 10:23:13 2019 - [info] es2(192.168.56.15:3306) Version=5.7.24-log (oldest major version between slaves) log-bin:enabled Tue Aug 20 10:23:13 2019 - [info] GTID ON Tue Aug 20 10:23:13 2019 - [info] Replicating from 192.168.56.16(192.168.56.16:3306) Tue Aug 20 10:23:13 2019 - [info] Tue Aug 
20 10:23:13 2019 - [info] * Phase 3.3: Determining New Master Phase.. Tue Aug 20 10:23:13 2019 - [info] Tue Aug 20 10:23:13 2019 - [info] Searching new master from slaves.. Tue Aug 20 10:23:13 2019 - [info] Candidate masters from the configuration file: Tue Aug 20 10:23:13 2019 - [info] Non-candidate masters: Tue Aug 20 10:23:13 2019 - [info] New master is es1(192.168.56.14:3306) Tue Aug 20 10:23:13 2019 - [info] Starting master failover.. Tue Aug 20 10:23:13 2019 - [info] From: es3(192.168.56.16:3306) (current master) +--es1(192.168.56.14:3306) +--es2(192.168.56.15:3306) To: es1(192.168.56.14:3306) (new master) +--es2(192.168.56.15:3306) Tue Aug 20 10:23:13 2019 - [info] Tue Aug 20 10:23:13 2019 - [info] * Phase 3.3: New Master Recovery Phase.. Tue Aug 20 10:23:13 2019 - [info] Tue Aug 20 10:23:13 2019 - [info] Waiting all logs to be applied.. Tue Aug 20 10:23:13 2019 - [info] done. Tue Aug 20 10:23:13 2019 - [info] Getting new master's binlog name and position.. Tue Aug 20 10:23:13 2019 - [info] mysqlbin.000005:194 Tue Aug 20 10:23:13 2019 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='es1 or 192.168.56.14', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Tue Aug 20 10:23:13 2019 - [info] Master Recovery succeeded. 
File:Pos:Exec_Gtid_Set: mysqlbin.000005, 194, 83c39775-c24a-11e9-999a-0800272f2bf4:1-4 Tue Aug 20 10:23:13 2019 - [info] Executing master IP activate script: Tue Aug 20 10:23:13 2019 - [info] /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=es3 --orig_master_ip=192.168.56.16 --orig_master_port=3306 --new_master_host=es1 --new_master_ip=192.168.56.14 --new_master_port=3306 --new_master_user='repl' --new_master_password=xxx Unknown option: new_master_user Unknown option: new_master_password IN SCRIPT TEST====/sbin/ifconfig enp0s8:1 down==/sbin/ifconfig enp0s8:1 192.168.56.191/24=== Enabling the VIP - 192.168.56.191/24 on the new master - es1 Tue Aug 20 10:23:13 2019 - [info] OK. Tue Aug 20 10:23:13 2019 - [info] Setting read_only=0 on es1(192.168.56.14:3306).. Tue Aug 20 10:23:13 2019 - [info] ok. Tue Aug 20 10:23:13 2019 - [info] ** Finished master recovery successfully. Tue Aug 20 10:23:13 2019 - [info] * Phase 3: Master Recovery Phase completed. Tue Aug 20 10:23:13 2019 - [info] Tue Aug 20 10:23:13 2019 - [info] * Phase 4: Slaves Recovery Phase.. Tue Aug 20 10:23:13 2019 - [info] Tue Aug 20 10:23:13 2019 - [info] Tue Aug 20 10:23:13 2019 - [info] * Phase 4.1: Starting Slaves in parallel.. Tue Aug 20 10:23:13 2019 - [info] Tue Aug 20 10:23:13 2019 - [info] -- Slave recovery on host es2(192.168.56.15:3306) started, pid: 4516. Check tmp log /data/manager/es2_3306_20190820102311.log if it takes time.. Tue Aug 20 10:23:14 2019 - [info] Tue Aug 20 10:23:14 2019 - [info] Log messages from es2 ... Tue Aug 20 10:23:14 2019 - [info] Tue Aug 20 10:23:13 2019 - [info] Resetting slave es2(192.168.56.15:3306) and starting replication from the new master es1(192.168.56.14:3306).. Tue Aug 20 10:23:13 2019 - [info] Executed CHANGE MASTER. Tue Aug 20 10:23:13 2019 - [info] Slave started. Tue Aug 20 10:23:13 2019 - [info] gtid_wait(83c39775-c24a-11e9-999a-0800272f2bf4:1-4) completed on es2(192.168.56.15:3306). Executed 0 events. 
Tue Aug 20 10:23:14 2019 - [info] End of log messages from es2.
Tue Aug 20 10:23:14 2019 - [info] -- Slave on host es2(192.168.56.15:3306) started.
Tue Aug 20 10:23:14 2019 - [info] All new slave servers recovered successfully.
Tue Aug 20 10:23:14 2019 - [info]
Tue Aug 20 10:23:14 2019 - [info] * Phase 5: New master cleanup phase..
Tue Aug 20 10:23:14 2019 - [info]
Tue Aug 20 10:23:14 2019 - [info] Resetting slave info on the new master..
Tue Aug 20 10:23:14 2019 - [info] es1: Resetting slave info succeeded.
Tue Aug 20 10:23:14 2019 - [info] Master failover to es1(192.168.56.14:3306) completed successfully.
Tue Aug 20 10:23:14 2019 - [info] ----- Failover Report -----
app1: MySQL Master failover es3(192.168.56.16:3306) to es1(192.168.56.14:3306) succeeded
Master es3(192.168.56.16:3306) is down!
Check MHA Manager logs at es3 for details.
Started automated(non-interactive) failover.
Invalidated master IP address on es3(192.168.56.16:3306)
Power off es3.
Selected es1(192.168.56.14:3306) as a new master.
es1(192.168.56.14:3306): OK: Applying all logs succeeded.
es1(192.168.56.14:3306): OK: Activated master IP address.
es2(192.168.56.15:3306): OK: Slave started, replicating from es1(192.168.56.14:3306)
es1(192.168.56.14:3306): Resetting slave info succeeded.
Master failover to es1(192.168.56.14:3306) completed successfully.
Tue Aug 20 10:23:14 2019 - [info] Sending mail..
[root@es3 ~]# ll
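2. Problem encountered: by default, if MHA detects another master failure less than eight hours after the previous failover, it refuses to fail over again. To allow the failover, delete the most recent app1.failover.complete file.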
[error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln310] Last failover was done at 2019/08/20 10:23:14. Current time is too early to do failover again. If you want to do failover, manually remove /data/manager/app1.failover.complete and run this script again.
Tue Aug 20 10:54:20 2019 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/bin/masterha_manager line 65.
Alternatively, start masterha_manager with the following option:
--ignore_last_failover
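
The eight-hour rule can be checked before starting the manager. The following is a minimal sketch, not part of MHA itself: the function name `failover_blocked` is hypothetical, and it assumes MHA judges the age of the marker by its modification time and that the window is 480 minutes (the eight hours mentioned above). The marker path is the one shown in the error message.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with MHA): returns 0 if a failover marker
# exists and is younger than the window, i.e. MHA would likely refuse to fail over.
# $1 = marker file path, $2 = window in minutes (assumed 480 = 8 hours)
failover_blocked() {
    local marker="$1" window_min="${2:-480}"
    # No marker file means there is no previous failover to block on.
    [ -f "$marker" ] || return 1
    # find prints the path only if it was modified less than window_min minutes ago.
    [ -n "$(find "$marker" -mmin "-$window_min" 2>/dev/null)" ]
}

# Marker path taken from the error message; override via the MARKER variable.
MARKER="${MARKER:-/data/manager/app1.failover.complete}"

if failover_blocked "$MARKER"; then
    echo "blocked: remove $MARKER, or run:"
    echo "  masterha_manager --conf=/root/app1.cnf --ignore_last_failover"
else
    echo "ok: no recent failover marker at $MARKER"
fi
```

The script only reports; it deliberately does not delete the marker, so the decision to permit a second failover stays with the operator.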