今天配置了一下group replication,第一次配置,不熟悉,所以是边配边学边解决问题的过程,供未配置过但准备配置group replication的朋友参考。
大致过程如下:
需要在配置文件中增加以下参数。例如,对于192.168.153.153机器,设置my.cnf中的有关group replication的参数如下:
# for use gtid
gtid_mode=ON
enforce_gtid_consistency=ON
log_slave_updates=ON
#add for group replication
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=NONE
transaction_write_set_extraction=XXHASH64
loose-group_replication_group_name="aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
loose-group_replication_start_on_boot=on
loose-group_replication_local_address= "192.168.153.153:33061"
loose-group_replication_group_seeds= "192.168.153.44:33061,192.168.153.153:33061,192.168.153.154:33061"
loose-group_replication_bootstrap_group=off
loose-group_replication_ip_whitelist="192.168.153.44/24"
对于192.168.153.154节点 则只需要将loose-group_replication_local_address设置为 "192.168.153.154:33061",其他参数均相同.
对于192.168.153.44节点 则只需要将loose-group_replication_local_address设置为 "192.168.153.44:33061",其他参数均相同.
第一次启动节点后,需要安装一个group_replicaiton 的plugin . 以及配置复制,进行初始配置。每个节点都需要执行如下两个命令:
INSTALL PLUGIN group_replication SONAME 'group_replication.so';
CHANGE MASTER TO MASTER_USER='rpl_user', MASTER_PASSWORD='rpl_pass' FOR CHANNEL 'group_replication_recovery';
这个复制跟普通的change master命令有区别,并不需要指定master是谁,但需要指定通道为'group_replication_recovery'。
配置完成后,启动集群中的第一个节点的group_replication时,需要设置boostrap参数。 其他的节点直接使用START GROUP_REPLICATION;即可。
SET GLOBAL group_replication_bootstrap_group=ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group=OFF;
至此,整个集群配置完毕之后,当某个节点宕机之后,使用常规的mysql启动命令进行重启即可。无需再手动配置,节点重启后将自动加入复制集群。
对于配置中遇到一些错误:
2017-05-03T15:06:01.854288+08:00 0 [Warning] Plugin group_replication reported: '[GCS] Connection attempt from IP address 197.3.153.44 refused. Address is no
t in the IP whitelist.'
2017-05-03T15:06:01.854395+08:00 0 [ERROR] Plugin group_replication reported: '[GCS] Error connecting to the local group communication engine instance.'
错误摘要:来源IP没有在白名单列表中,所以连接拒绝。
原因: 是因为之前没有设置group_replication_ip_whitelist这个参数,默认值为127.0.0.1/8, 因为集群中的所有的节点为同一网段,因此在这里设置为"192.168.153.44/24". 但实际上可以更严格,指定集群内的所有的IP地址。
2017-05-03T16:18:31.963751+08:00 12 [Note] Plugin group_replication reported: 'Establishing group recovery connection with a possible donor. Attempt 1/10'
2017-05-03T16:18:31.988010+08:00 12 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='', master_port=
3358, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='MYSQLTMP', master_port= 3358, master_log_file='', master_log_pos= 4, mast
er_bind=''.
2017-05-03T16:18:32.037211+08:00 12 [Note] Plugin group_replication reported: 'Establishing connection to a group replication recovery donor f7e2c834-2a5f-11
e7-b622-005056aa17e6 at MYSQLTMP port: 3358.'
2017-05-03T16:18:32.037714+08:00 14 [Note] Slave I/O thread: Start asynchronous replication to master '@MYSQLTMP:3358' in log 'FIRST' at position 4
2017-05-03T16:18:32.037796+08:00 14 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore no
t recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more info
rmation.
2017-05-03T16:18:32.037824+08:00 14 [ERROR] Slave I/O for channel 'group_replication_recovery': Fatal error: Invalid (empty) username when attempting to
connect to the master server. Connection attempt terminated. Error_code: 1593
错误摘要:使用空账号去连接master server.
原因:没有配置同步账号跟密码,使用的是空密码进行同步。 需要为复制通道group_replication_recovery设置同步信息。命令如下。
CHANGE MASTER TO MASTER_USER='mysqlsync', MASTER_PASSWORD='mysqlsync_password' FOR CHANNEL 'group_replication_recovery';
2017-05-03T15:45:02.582665+08:00 25 [Note] Slave I/O thread: Start asynchronous replication to master 'mysqlsync@MYSQLTMP:3358' in log 'FIRST' at position 4
2017-05-03T15:45:02.582726+08:00 25 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore no
t recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more info
rmation.
2017-05-03T15:45:02.611559+08:00 25 [ERROR] Slave I/O for channel 'group_replication_recovery': error connecting to master 'mysqlsync@MYSQLTMP:3358' - retry-
time: 60 retries: 1, Error_code: 2005
错误摘要: 没能连接master,即primary节点。
原因:是没有识别到MYSQLTMP机器的IP地址信息, 需要将集群中的所有节点的hostname加入/etc/host文件中。