一、演示环境:
2台阿里ECS服务器自建MySQL传统主从复制环境
MySQL版本都是5.7.22 二进制安装
此次演示主从复制环境搭建忽略。如有需要请参考下面的博文:
https://blog.51cto.com/wujianwei/1961247
主从库都要开启独立表空间innodb_file_per_table = 1,主从库默认的引擎都是INNODB的
主库stage01 172.16.0.247
主库stage02 172.16.0.246
主库stage01上创建测试表,建表语句如下:
二、准备测试数据环境
2.1主库创建演示的新表,然后禁止掉新表表空间
CREATE TABLE `t_rent_received` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT ‘自增ID‘,
`stat_time` datetime NOT NULL COMMENT ‘统计日‘,
`investor_id` int(4) NOT NULL DEFAULT ‘0‘ COMMENT ‘资产方id‘,
`investor_name` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT ‘‘ COMMENT ‘该字段存在查询主表‘,
`model` int(4) NOT NULL DEFAULT ‘0‘ COMMENT ‘模式‘,
`investor_balance` decimal(16,2) NOT NULL DEFAULT ‘0.00‘ COMMENT ‘余额‘,
`received_bank_yesterday` decimal(16,2) NOT NULL DEFAULT ‘0.00‘ COMMENT ‘银行入账金额‘,
`received_bank_month` decimal(16,2) NOT NULL DEFAULT ‘0.00‘ COMMENT ‘专属账户入账金额‘,
`plan_money` decimal(16,2) NOT NULL DEFAULT ‘0.00‘ COMMENT ‘当月应收‘,
`received_rate` decimal(16,2) NOT NULL DEFAULT ‘0.00‘ COMMENT ‘当月回款/应收百分比‘,
`overdue_rate_thirty` decimal(16,2) NOT NULL DEFAULT ‘0.00‘ COMMENT ‘30天逾期率‘,
`received_cover_rate` decimal(16,2) NOT NULL DEFAULT ‘0.00‘ COMMENT ‘回款覆盖率‘,
`received_money` decimal(16,2) NOT NULL DEFAULT ‘0.00‘ COMMENT ‘当月实收‘,
`received_yesterday` decimal(16,2) NOT NULL DEFAULT ‘0.00‘ COMMENT ‘昨日实收‘,
`create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT ‘创建时间‘,
`update_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT ‘更新时间‘,
`status` int(1) NOT NULL DEFAULT ‘1‘ COMMENT ‘数据状态 0删除 1 正常‘,
PRIMARY KEY (`id`) USING BTREE,
KEY `idx_investor_id_stat_time` (`investor_id`,`stat_time`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC COMMENT=‘回款监测统计表‘;
root@stage01 13:24: [test001]> ALTER TABLE test001.t_rent_received DISCARD TABLESPACE;
Query OK, 0 rows affected (0.06 sec)
把源端的t_rent_received.ibd 放到 主库服务器 /data1/mysql/data/test001/下
并且授权t_rent_received.ibd文件为MySQL权限:
[root@stage01 test001]# chown mysql.mysql t_rent_received.ibd
2.2主库操作导入t_rent_received表空间:
root@stage01 13:24: [test001]> ALTER TABLE test001.t_rent_received import TABLESPACE;
Query OK, 0 rows affected, 1 warning (0.45 sec)
主库执行成功,但是此时从库sql_thread线程复制报错
三、解决从库复制报错问题
主库导入test001.t_rent_received 表空间执行成功,但是此时从库sql_thread线程复制报错:
root@stage02 13:26: [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.16.0.247
Master_User: testrep
Master_Port: 3306
Connect_Retry: 10
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 7687
Relay_Log_File: stage02-relay-bin.000002
Relay_Log_Pos: 7654
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table: mysql.%
Last_Errno: 1812
Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction ‘ANONYMOUS‘ at master log mysql-bin.000001, end_log_pos 7687. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Skip_Counter: 0
Exec_Master_Log_Pos: 7488
Relay_Log_Space: 8061
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1812
Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction ‘ANONYMOUS‘ at master log mysql-bin.000001, end_log_pos 7687. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Replicate_Ignore_Server_Ids:
Master_Server_Id: 172160247
Master_UUID: f86f5b15-b91e-11eb-bb3b-00163e0eced4
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp: 210528 13:26:19
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
root@stage02 从库服务器MySQL error.log日志报错:
2021-05-28T12:49:57.679150+08:00 1002350 [Warning] ‘tables_priv‘ entry ‘sys_config mysql.sys@localhost‘ ignored in --skip-name-resolve mode.
2021-05-28T13:00:33.201101+08:00 1002099 [Note] Aborted connection 1002099 to db: ‘unconnected‘ user: ‘root‘ host: ‘localhost‘ (Got timeout reading communication packets)
2021-05-28T13:26:19.605381+08:00 1002350 [ERROR] InnoDB: Operating system error number 2 in a file operation.
2021-05-28T13:26:19.605422+08:00 1002350 [ERROR] InnoDB: The error means the system cannot find the path specified.
2021-05-28T13:26:19.605431+08:00 1002350 [ERROR] InnoDB: Trying to import a tablespace, but could not open the tablespace file ./test001/t_rent_received.ibd
2021-05-28T13:26:19.605448+08:00 1002350 [Note] InnoDB: Discarding tablespace of table `test001`.`t_rent_received`: Tablespace not found
2021-05-28T13:26:19.616360+08:00 1002350 [ERROR] Slave SQL for channel ‘‘: Worker 1 failed executing transaction ‘ANONYMOUS‘ at master log mysql-bin.000001, end_log_pos 7687; Error ‘Tablespace is missing for table `test001`.`t_rent_received`.‘ on query. Default database: ‘test001‘. Query: ‘ALTER TABLE test001.t_rent_received import TABLESPACE‘, Error_code: 1812
2021-05-28T13:26:19.616419+08:00 1002349 [Warning] Slave SQL for channel ‘‘: ... The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details). Error_code: 1756
2021-05-28T13:26:19.616441+08:00 1002349 [Note] Error reading relay log event for channel ‘‘: slave SQL thread was killed
分析主库binglog文件:
[root@stage01 binlog]# mysqlbinlog -vv --base64-output=decode-rows /data1/mysql/binlog/mysql-bin.000001 --start-position=7354 --stop-position=7687
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
#at 7354
#210528 13:24:52 server id 172160247 end_log_pos 7488 CRC32 0xf9c36596 Query thread_id=952836 exec_time=0 error_code=0
use `test001`/*!*/;
SET TIMESTAMP=1622179492/*!*/;
SET @@session.pseudo_thread_id=952836/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1411383296/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=224/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
**********ALTER TABLE test001.t_rent_received DISCARD TABLESPACE************
/*!*/;
#at 7488
#210528 13:26:19 server id 172160247 end_log_pos 7553 CRC32 0x12f5efcc Anonymous_GTID last_committed=7 sequence_number=8 rbr_only=no
SET @@SESSION.GTID_NEXT= ‘ANONYMOUS‘/*!*/;
#at 7553
#210528 13:26:19 server id 172160247 end_log_pos 7687 CRC32 0x49e819c8 Query thread_id=952836 exec_time=0 error_code=0
SET TIMESTAMP=1622179579/*!*/;
*******ALTER TABLE test001.t_rent_received import TABLESPACE*******
/*!*/;
SET @@SESSION.GTID_NEXT= ‘AUTOMATIC‘ /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
#End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
可以发现主库binlog文件记录只记录了:
ALTER TABLE test001.t_rent_received DISCARD TABLESPACE
ALTER TABLE test001.t_rent_received import TABLESPACE
并没有记录insert相关的任何语句
假如在源端flush table xxx for export后,只将xxx.{ibd,cfg}拷贝到主库,那么当主库alter table xxx discard tablespace时,从库也会执行discard tablespace
而当主库执行alter table xxx import tablespace时,从库也会执行import tablespace .由于主库有拷贝过来的xxx.{ibd,cfg},所以可以执行成功,而从库没有拷贝xxx.{ibd,cfg},所以从库sql_thread线程复制会失败
修复方法是,将xxx.{ibd,cfg}拷贝到从库:
[root@stage01 pump]# scp t_rent_received.ibd root@172.16.0.246:/data1/mysql/data/test001/
[root@stage02 ~]# cd /data1/mysql/data/test001/
授权MySQL权限:
[root@stage02 test001]# chown mysql.mysql t_rent_received.ibd
然后从库开启sql_thread 复制线程:
root@stage02 17:40: [test001]> start slave sql_thread;
此时查看从库的error.log日志如下:
2021-05-28T17:41:43.786181+08:00 1037405 [Note] Slave SQL thread for channel ‘‘ initialized, starting replication in log ‘mysql-bin.000002‘ at position 353, relay log ‘./stage02-relay-bin.000006‘ position: 566
2021-05-28T17:41:43.796212+08:00 1037406 [Note] InnoDB: Sync to disk
2021-05-28T17:41:43.867262+08:00 1037406 [Note] InnoDB: Sync to disk - done!
2021-05-28T17:41:43.867326+08:00 1037406 [Note] InnoDB: Phase I - Update all pages
2021-05-28T17:41:43.887770+08:00 1037406 [Note] InnoDB: Sync to disk
2021-05-28T17:41:43.941873+08:00 1037406 [Note] InnoDB: Sync to disk - done!
2021-05-28T17:41:43.944749+08:00 1037406 [Note] InnoDB: Phase III - Flush changes to disk
2021-05-28T17:41:43.951529+08:00 1037406 [Note] InnoDB: Phase IV - Flush complete
2021-05-28T17:41:43.951727+08:00 1037406 [Note] InnoDB: `test001`.`t_rent_received` autoinc value set to 0
2021-05-28T17:41:43.981738+08:00 1037406 [Note] InnoDB: AUTOINC next value generation is disabled for ‘`test001`.`t_rent_received`‘
从库复制恢复正常:
root@stage02 17:42: [test001]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.16.0.247
Master_User: testrep
Master_Port: 3306
Connect_Retry: 10
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 551
Relay_Log_File: stage02-relay-bin.000006
Relay_Log_Pos: 764
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
在stage01主库上删除刚才的利用传输表空间导入的表:
root@stage01 17:02: [test001]> drop table t_rent_received\G
Query OK, 0 rows affected (0.01 sec)
此时从库复制也是正常的:
root@stage02 18:00: [test001]> show slave status\G
特别说明下:
http://fuxkdb.com/2017/08/22/%E4%B8%BB%E4%BB%8E%E4%BC%A0%E8%BE%93%E8%A1%A8%E7%A9%BA%E9%97%B4%E7%9A%84%E5%9D%91/
此博文提到的修复方法是存在问题的,根本不需要跳过gtid事务id