一、需求源:
仅作为实施方收到需求而搭建,不参与应用业务架构讨论。也仅作为构建测试的一点点记录。
1.MySQL 3台server 1主1从1备,keepalived保持vip在主库节点,在主节点和从节点均启动keepalived,3台机器均部署orchestrator。定义 mysqlA 为主,mysqlB为从,mysqlC为备。
2.MySQL节点故障:主节点失效,orchestrator自动切换新的主节点,保持高可用。
3.keepalived 监测本机是否为主节点,如果为主则获取vip。
4.失效节点恢复后作为从节点加入集群。(无要求自动加入,但为满足相对自动化,已写脚本通过keepalived检查并加入。)
二、部署过程:
node OS: centos7.8
mysql version: 5.7.31
keepalived version: 2.1.5
node A : mysqlA 192.168.127.142
node B: mysqlB 192.168.127.143
node C: mysqlC 192.168.127.124
VIP : 192.168.127.150
disable firewalld
setenforce 0
3台机器配置添加hosts
echo '192.168.127.142 mysqlA' >> /etc/hosts echo '192.168.127.143 mysqlB' >> /etc/hosts echo '192.168.127.144 mysqlC' >> /etc/hosts
3台机器互免密登录。
ssh-keygen #一直回车 ssh-copy-id mysqlA ssh-copy-id mysqlB ssh-copy-id mysqlC
因配置了Raft,有多个Orchestrator配置高可用,orchestrator-client会自动选择leader。
3台机器执行 vi /etc/profile,最后添加
export ORCHESTRATOR_API="mysqlA:3000/api mysqlB:3000/api mysqlC:3000/api"
软件列表,各node均需要
[root@mysqlA ~]# ll -rw-r--r--. 1 root root 1081559 Nov 25 06:14 keepalived-2.1.5.tar.gz -rw-r--r--. 1 root root 9116 Nov 25 06:14 mysql57-community-release-el7-8.noarch.rpm -rw-r--r--. 1 root root 10467081 Nov 25 06:14 orchestrator-3.2.3-1.x86_64.rpm -rw-r--r--. 1 root root 10043463 Nov 25 06:14 orchestrator-cli-3.2.3-1.x86_64.rpm -rw-r--r--. 1 root root 15153 Nov 25 06:14 orchestrator-client-3.2.3-1.x86_64.rpm
2.1 keepalived
解压并安装
[root@mysqlA ~]# tar -zxf keepalived-2.1.5.tar.gz [root@mysqlA ~]# cd keepalived-2.1.5 [root@mysqlA keepalived-2.1.5]# ./configure && make && make install
安装keepalive遇到错误1:
configure: error: !!! OpenSSL is not properly installed on your system. !!! !!! Can not include OpenSSL headers files. !!!
解决错误1:
[root@mysqlA keepalived-2.1.5]# yum -y install openssl-devel
编制keepalive.conf
[root@mysqlA ~]# mkdir /etc/keepalived [root@mysqlA ~]# cp /usr/local/etc/keepalived/keepalived.conf /etc/keepalived/ [root@mysqlA ~]# vi /etc/keepalived/keepalived.conf
! Configuration File for keepalived global_defs { # notification_email { # acassen@firewall.loc # failover@firewall.loc # sysadmin@firewall.loc # } # notification_email_from Alexandre.Cassen@firewall.loc # smtp_server 192.168.200.1 # smtp_connect_timeout 30 # router_id LVS_DEVEL # vrrp_skip_check_adv_addr # vrrp_strict # vrrp_garp_interval 0 # vrrp_gna_interval 0 #执行脚本的用户,默认是keepalived_script,无则创建,此处用root执行 script_user root #开启安全脚本 enable_script_security } #开启check_run,脚本是检测mysql集群中,谁是master,谁就占用vip。 #interval 脚本执行间隔时间 vrrp_script check_run { script "/usr/local/orchestrator/mysqlA_check_master_vip.sh" interval 2 } #state 为BACKUP,2台mysql一致; #interface:名称为网卡名称 #virtual_route_id:2台mysql一致; #priority:2台mysql一致 #track_script:对应上面vrrp_script的值 #notify_backup:当前keepalived状态为backup时,即不占用VIP时,通过脚本检查mysql状态,并自动以 从 身份加入到集群里 vrrp_instance mysql { state BACKUP interface ens33 virtual_router_id 51 priority 80 nopreempt advert_int 1 authentication { auth_type PASS auth_pass 1111 } virtual_ipaddress { 192.168.127.150/24 } track_script { check_run } notify_backup /usr/local/orchestrator/join_MysqlCluster.sh }View Code keepalived.conf
mysqlB 配置文件相同,注意修改vrrp_script check_run 中的脚本名字,改为:mysqlB_check_master_vip.sh 。
将keepalived添加到系统服务,AB机均执行。
[root@mysqlA keepalived-2.1.5]# pwd /root/keepalived-2.1.5 [root@mysqlA keepalived-2.1.5]# cp keepalived/etc/sysconfig/keepalived /etc/sysconfig/ [root@mysqlA init.d]# pwd /root/keepalived-2.1.5/keepalived/etc/init.d [root@mysqlA init.d]# cp keepalived /etc/init.d/
至此,keepalived暂告一段落。
2.2mysql
使用yum安装。3台机器均执行。仅做主从复制配置,详细优化参数未设置,要阅读官方天书补充。
rpm -Uvh mysql57-community-release-el7-8.noarch.rpm
#对应的yum安装mysql版本在http://repo.mysql.com查询,el为centos及rhel
yum clean all && yum makecache yum install gcc gcc-c++ openssl-devel mysql mysql-server mysql-devel -y [root@mysqlA ~]# mkdir /data/sqlite -p [root@mysqlA ~]# chown -R mysql:mysql /data/sqlite
mysql安装登录密码,查询:
[root@mysqlA ~]# systemctl start mysqld grep 'temporary password' /var/log/mysqld.log
登录mysql后,修改密码:
mysql> SET PASSWORD = PASSWORD('P@ssw0rd');
创建用户,并授权
mysql> CREATE USER 'orchestrator'@'%' IDENTIFIED BY 'P@ssw0rd'; mysql> grant SUPER,REPLICATION CLIENT,replication slave on *.* to 'orchestrator'@'192.168.127.%' identified by 'P@ssw0rd'; mysql> flush privileges;
退出mysql,编制mysql配置文件
[root@mysqlA ~]# cat /etc/my.cnf
[mysqld] symbolic-links=0 log-error=/var/log/mysqld.log pid-file=/var/run/mysqld/mysqld.pid #server-id 每台机器不一致,现在以IP后位设定 server-id = 142 log-bin=master-bin binlog_format=row binlog-ignore-db = mysql,information_schema,performance_schema #report_host 为主机名 report_host = mysqlA #开启GTID gtid-mode=ON enforce-gtid-consistency=ON #mysqlC的log-slave-update不配置。 log-slave-updates=1 skip_slave_start=1View Code my.cnf
#在A及B机必须开启的参数
log_slave_updates=1
所有主机重启mysql服务
[root@mysqlA ~]# systemctl start mysqld
所有主机各自添加mysql_config_editor,登录mysql时不用输入密码
mysql_config_editor set --login-path=local --user=root --port=3306 --password
#查看本机mysql_config_editor,注意local对应之前设置的--login-path
#my_print_defaults -s local
#重置所有
#mysql_config_editor reset
#测试
#mysql --login-path=local
设置主从复制,初定架构为:A--->B--->C
mysqlB:
[root@mysqlB ~]# mysql --login-path=local mysql> change master to master_host='mysqlA',master_port=3306,master_user='orchestrator',master_password='P@ssw0rd',master_auto_position=1; mysql> start slave; mysql> show slave status\G
关注如图两项
mysqlC:
[root@mysqlC ~]# mysql --login-path=local mysql> change master to master_host='mysqlB',master_port=3306,master_user='orchestrator',master_password='P@ssw0rd',master_auto_position=1; mysql> start slave; mysql> show slave status\G
查看下面两项值均为Yes,即表示设置从服务器成功。
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
没正式使用之前,验证可同步后,先停掉所有mysql的复制关系。等orchestrator和keepalived正式配置完整后,在启用复制。生产环境慎用 reset master;。
stop slave; reset slave all; reset master;
至此,MySQL暂告一段落。
2.3orcheatrator
3台机器均安装。
下载https://github.com/openark/orchestrator/releases
使用yum安装rpm包
[root@mysqlA ~]# yum install epel-release [root@mysqlA ~]# yum install orchestrator-3.2.3-1.x86_64.rpm orchestrator-client-3.2.3-1.x86_64.rpm orchestrator-cli-3.2.3-1.x86_64.rpm -y
配置
将/usr/local/orchestrator/orchestrator-sample.conf.json 复制到/etc目录下。
[root@mysqlA ~]# cp /usr/local/orchestrator/orchestrator-sample.conf.json /etc/orchestrator.conf.json
编辑 /etc/orchestrator.conf.json
{ "Debug": true, "EnableSyslog": false, "ListenAddress": ":3000", "MySQLTopologyUser": "orchestrator", *** ###与前面添加的mysql用户对应*** "MySQLTopologyPassword": "P@ssw0rd", *** ###密码对应*** "MySQLTopologyCredentialsConfigFile": "", "MySQLTopologySSLPrivateKeyFile": "", "MySQLTopologySSLCertFile": "", "MySQLTopologySSLCAFile": "", "MySQLTopologySSLSkipVerify": true, "MySQLTopologyUseMutualTLS": false, "BackendDB": "sqlite", *** ###使用sqlite,没有使用mysql库,可改用*** "SQLite3DataFile": "/data/sqlite/orchestrator.db", *** ###sqlite文件绝对路径,有文件夹即可。*** "MySQLConnectTimeoutSeconds": 1, "DefaultInstancePort": 3306, "DiscoverByShowSlaveHosts": true, "InstancePollSeconds": 5, "DiscoveryIgnoreReplicaHostnameFilters": [ "a_host_i_want_to_ignore[.]example[.]com", ".*[.]ignore_all_hosts_from_this_domain[.]example[.]com", "a_host_with_extra_port_i_want_to_ignore[.]example[.]com:3307" ], "UnseenInstanceForgetHours": 240, "SnapshotTopologiesIntervalHours": 0, "InstanceBulkOperationsWaitTimeoutSeconds": 10, "HostnameResolveMethod": "default", "MySQLHostnameResolveMethod": "@@hostname", "SkipBinlogServerUnresolveCheck": true, "ExpiryHostnameResolvesMinutes": 60, "RejectHostnameResolvePattern": "", "ReasonableReplicationLagSeconds": 10, "ProblemIgnoreHostnameFilters": [], "VerifyReplicationFilters": false, "ReasonableMaintenanceReplicationLagSeconds": 20, "CandidateInstanceExpireMinutes": 60, "AuditLogFile": "", "AuditToSyslog": false, "RemoveTextFromHostnameDisplay": ".mydomain.com:3306", "ReadOnly": false, "AuthenticationMethod": "basic", *** ###打开web时使用basic方式认证,避免免密打开*** "HTTPAuthUser": "admin", *** ###认证用户*** "HTTPAuthPassword": "P@ssw0rd", *** ###认证密码*** "AuthUserHeader": "", "PowerAuthUsers": [ "*" ], "ClusterNameToAlias": { "127.0.0.1": "test suite" }, "ReplicationLagQuery": "", "DetectClusterAliasQuery": "SELECT SUBSTRING_INDEX(@@hostname, '.', 1)", "DetectClusterDomainQuery": "", "DetectInstanceAliasQuery": "", "DetectPromotionRuleQuery": "", "DataCenterPattern": "[.]([^.]+)[.][^.]+[.]mydomain[.]com", "PhysicalEnvironmentPattern": "[.]([^.]+[.][^.]+)[.]mydomain[.]com", "PromotionIgnoreHostnameFilters": [], "DetectSemiSyncEnforcedQuery": "", "ServeAgentsHttp": false, "AgentsServerPort": ":3001", "AgentsUseSSL": false, "AgentsUseMutualTLS": false, "AgentSSLSkipVerify": false, "AgentSSLPrivateKeyFile": "", "AgentSSLCertFile": "", "AgentSSLCAFile": "", "AgentSSLValidOUs": [], "UseSSL": false, "UseMutualTLS": false, "SSLSkipVerify": false, "SSLPrivateKeyFile": "", "SSLCertFile": "", "SSLCAFile": "", "SSLValidOUs": [], "URLPrefix": "", "StatusEndpoint": "/api/status", "StatusSimpleHealth": true, "StatusOUVerify": false, "AgentPollMinutes": 60, "UnseenAgentForgetHours": 6, "StaleSeedFailMinutes": 60, "SeedAcceptableBytesDiff": 8192, "PseudoGTIDPattern": "", "PseudoGTIDPatternIsFixedSubstring": false, "PseudoGTIDMonotonicHint": "asc:", "DetectPseudoGTIDQuery": "", "BinlogEventsChunkSize": 10000, "SkipBinlogEventsContaining": [], "ReduceReplicationAnalysisCount": true, "FailureDetectionPeriodBlockMinutes": 5, "RecoveryPeriodBlockSeconds": 30, "RecoveryIgnoreHostnameFilters": [], "RecoverMasterClusterFilters": [ "*" ], "RecoverIntermediateMasterClusterFilters": [ "*" ], "OnFailureDetectionProcesses": [ "echo '`date +'%Y-%m-%d %T'` Detected {failureType} on {failureCluster}. Affected replicas: {countSlaves}' >> /tmp/recovery.log" ], "PreGracefulTakeoverProcesses": [ "echo '`date +'%Y-%m-%d %T'` Planned takeover about to take place on {failureCluster}. Master will switch to read_only' >> /tmp/recovery.log" ], "PreFailoverProcesses": [ "echo '`date +'%Y-%m-%d %T'` Will recover from {failureType} on {failureCluster}' >> /tmp/recovery.log" ], "PostFailoverProcesses": [ "echo '`date +'%Y-%m-%d %T'` (for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}; failureClusterAlias:{failureClusterAlias}' >> /tmp/recovery.log", "/usr/local/orchestrator/orch_hook.sh {failureType} {failureClusterAlias} {failedHost} {successorHost} >> /tmp/orch.log" ***###添加脚本,当机器master出现硬件故障,或mysql软件故障,即刻漂移VIP,并切换主*** ], "PostUnsuccessfulFailoverProcesses": [ "echo '`date +'%Y-%m-%d %T'` Unsuccessful Failover ' >> /tmp/recovery.log"], "PostMasterFailoverProcesses": [ "echo '`date +'%Y-%m-%d %T'` Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Promoted: {successorHost}:{successorPort}' >> /tmp/recovery.log" ], "PostIntermediateMasterFailoverProcesses": [ "echo '`date +'%Y-%m-%d %T'` Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log" ], "PostGracefulTakeoverProcesses": [ "echo '`date +'%Y-%m-%d %T'` Planned takeover complete' >> /tmp/recovery.log" ], "CoMasterRecoveryMustPromoteOtherCoMaster": true, "DetachLostSlavesAfterMasterFailover": true, "ApplyMySQLPromotionAfterMasterFailover": true, "PreventCrossDataCenterMasterFailover": false, "PreventCro***egionMasterFailover": false, "MasterFailoverDetachReplicaMasterHost": false, "MasterFailoverLostInstancesDowntimeMinutes": 0, "PostponeReplicaRecoveryOnLagMinutes": 0, "OSCIgnoreHostnameFilters": [], "GraphiteAddr": "", "GraphitePath": "", "GraphiteConvertHostnameDotsToUnderscores": true, "ConsulAddress": "", "ConsulAclToken": "", "PseudoGTIDPattern": "drop view if exists .*?`_pseudo_gtid_hint__", "PseudoGTIDMonotonicHint": "asc:", "DetectPseudoGTIDQuery": "select count(*) as pseudo_gtid_exists from meta.pseudo_gtid_status where anchor = 1 and time_generated > now() - interval 2 day", "RaftEnabled":true, ***###启用高可用*** "RaftDataDir":"/usr/local/orchestrator", *** ###orchestrator目录*** "RaftBind":"192.168.127.142", ***###本机的IP*** "DefaultRaftPort":10008, "RaftNodes":[ ***###3台机的IP*** "192.168.127.142", "192.168.127.143", "192.168.127.144" ] }View Code orchestrator.conf
复制到其他机器,注意修改文件的本机IP
[root@mysqlA ~]# scp /etc/orchestrator.conf.json root@mysqlB:/etc
脚本:
路径:/usr/local/orchestrator
mysqlA:orch_hook.sh、mysqlA_check_master_vip.sh、join_MysqlCluster.sh
mysqlB:orch_hook.sh、mysqlB_check_master_vip.sh、join_MysqlCluster.sh
orch_hook.sh用途:用于检查master状态,如master硬件故障或mysql软件故障,即切换VIP至从(orchestrator已经自动切换从为新主);
mysqlA_check_master_vip.sh用途:检查集群MysqlCluster中,谁是master,谁即占用VIP;虽然和orch_hook.sh用途相似,但是上一个脚本没有检查MysqlCluster集群master的功能,当主恢复故障后,是独立master出现,但不在集群里。
join_MysqlCluster.sh:检查本机是否在集群里。若不在,检查本机mysql状态,状态OK自动以 从 库身份加入MysqlCluster集群,并自动启用复制。
orch_hook.sh,在orchestrator配置文件上调用:
#!/bin/bash isitdead=$1 cluster=$2 oldmaster=$3 newmaster=$4 logfile="/var/log/orch_hook.log" # list of clusternames #clusternames=(rep blea lajos) if [[ $isitdead == "DeadMaster" ]]; then ssh $oldmaster "systemctl restart keepalived" elif [[ $isitdead == "DeadIntermediateMasterWithSingleSlaveFailingToConnect" ]]; then ssh $oldmaster "systemctl restart keepalived" elif [[ $isitdead == "DeadIntermediateMaster" ]]; then ssh $oldmaster "systemctl restart keepalived" fiView Code orch_hook.sh
mysqlA_check_master_vip.sh,在keepalived配置文件上调用,先本地执行验证无误:
#!/bin/bash master_node=`orchestrator-client -b admin:P@ssw0rd -c which-cluster-master -i MysqlCluster |awk -F ":" '{print $1}'` #echo "The cluster-master is $master_node" #mysqlB_check_master_vip.sh 的不同参数仅为下列两项,互换即可 local_node=mysqlA remote_node=mysqlB #echo $local_node vipadd=192.168.127.150 if [ $master_node == $local_node ];then local_ip=`ip a show dev ens33|grep -w inet|awk '{print $2}'|awk -F '/' '{print $1}'` if [[ "$local_ip" =~ "$vipadd" ]];then # echo "vip is in master" exit 0 else echo "master vip is down, restarting now" ssh $remote_node "systemctl restart keepalived" # ssh $remote_node "systemctl start keepalived" fi else local_ip=`ip a show dev ens33|grep -w inet|awk '{print $2}'|awk -F '/' '{print $1}'` if [[ "$local_ip" =~ "$vipadd" ]];then # echo "vip is in local,release..." systemctl restart keepalived # systemctl start keepalived else # echo "vip don't in local,exit..." exit 0 fi fiView Code check_master_vip.sh
join_MysqlCluster.sh:在keepalived配置文件上调用
#!/bin/bash mein=`orchestrator-client -b admin:P@ssw0rd -c which-cluster-instances -i MysqlCluster |awk -F ":" '{print $1}'` local_node=`hostname` if [[ "$mein" =~ "$local_node" ]];then echo "the node in cluster" >> /data/nodestatus.txt exit 0 eles echo "will be check status and " >> /data/nodestatus.txt fi CHECK_TIME=3 #mysql is working MYSQL_OK is 1 , mysql down MYSQL_OK is 0 MYSQL_OK=1 #function to check mysql status function check_mysql_helth (){ mysql --login-path=local -e "show status;" >/dev/null 2>&1 if [ $? = 0 ] ;then MYSQL_OK=1 else MYSQL_OK=0 fi return $MYSQL_OK } while [ $CHECK_TIME -ne 0 ] do let "CHECK_TIME -= 1" check_mysql_helth if [ $MYSQL_OK = 1 ] ; then CHECK_TIME=0 #mysql status is OK,then try join MysqlCluster #get MysqlCluster master_node master_node=`orchestrator-client -b admin:P@ssw0rd -c which-cluster-master -i MysqlCluster |awk -F":" '{print $1}'` #copy to master_node,set read_only,and start slave; mysql --login-path=local -e "change master to master_host='$master_node',master_port=3306,master_user='orchestrator',master_password='P@ssw0rd',master_auto_position=1;set global read_only=1;start slave;" #check slave status, shuoult be check Slave_IO_Running & Slave_SQL_Running Slave_IO_Run_status=`mysql --login-path=local -e "show slave status\G" |grep Slave_IO_Running |awk -F ":" '{print $2}'` if [ $Slave_IO_Run_status == Yes ]; then echo "Slave_IO_Runing is Yes" else echo "Slave_IO_Runing is Error" exit 1 fi Slave_SQL_Run_status=`mysql --login-path=local -e "show slave status\G" |grep Slave_SQL_Running |awk -F ":" '{print $2}'` if [ $Slave_SQL_Run_status == Yes ]; then echo "Slave_SQL_Running is Yes" else echo "Slave_SQL_Running is Error" exit 1 fi exit 0 fi if [ $MYSQL_OK -eq 0 ] && [ $CHECK_TIME -eq 0 ] then #pkill keepalived exit 1 fi sleep 10 doneView Code join_MysqlCluster.sh
3台机器启动 orchestrator服务
[root@mysqlA~]# systemctl start orchestrator [root@mysqlB~]# systemctl start orchestrator [root@mysqC~]# systemctl start orchestrator
登录web界面,由于未启用eepalived,所以用其中1台IP登录,并做发现:
Discover a new instance
输入mysql地址或主机名
发现完毕
刷新,回到集群,重命名集群一个别名
新的集群别名,要和脚本上的变量一致
回到集群
各节点状态及自动生成的复制拓扑。
直接在拓扑图上修改从库的复制关系:
拖动修改主库,官方翻译为优雅的进行主从切换,切换后,旧主为从,需要重新启动slave,手动,暂无写脚本检测并启动。
临时写了一个 优雅切换主从后的 新从 启动复制脚本
在 新从 服务器上 ./orch_hook2.sh
#!/bin/bash mysql --login-path=local -e "stop slave;" new_master_node=`orchestrator-client -b admin:P@ssw0rd -c which-cluster-master -i MysqlCluster |awk -F":" '{print $1}'` mysql --login-path=local -e "change master to master_host='$new_master_node',master_port=3306,master_user='orchestrator',master_password='P@ssw0rd',master_auto_position=1;set global read_only=1;start slave;"View Code orch_hook2.sh
结语
##测试过程排错让人迷失方向---必须读懂官方文档---感谢搜到的国内几篇好文##
参考资料:
https://github.com/openark/orchestrator
https://www.cnblogs.com/zhoujinyi/p/10394389.html
https://zhuanlan.zhihu.com/p/139294374
1、我在构建MySQL的时候,确实是门外汉,太多配置参数、优化参数都不知道是什么,有什么用。像GTID,不启用的话,主从切换复制每次要show master status,记录File、和Position。测试过程还遇到 数据库复制正常,但不同步、reset master后的恢复、导出导入恢复...一言难尽。
2、keepalived 回首一看,也有5-6年没碰过这玩意了,配置文件很多参数都不知道怎么用了。调用脚本那里栽了几个跟斗。
3、orchestrator,今年首次认识这个东西,然后一直思考这个东西有什么用,能自动切换,能恢复拓扑,能。。。还以为主库挂了之后能自动恢复,结果只是我以为。
4、脚本也是写的太烂了,本来脚本能力就差...