mysql传统复制传输表空间遇到的坑

一、演示环境:

2台阿里ECS服务器自建MySQL传统主从复制环境
MySQL版本都是5.7.22 二进制安装
此次演示主从复制环境搭建忽略。如有需要请参考下面的博文:
https://blog.51cto.com/wujianwei/1961247

主从库都要开启独立表空间innodb_file_per_table = 1,主从库默认的引擎都是INNODB的

主库stage01 172.16.0.247
主库stage02 172.16.0.246

主库stage01上创建测试表,建表语句如下:

二、准备测试数据环境

2.1主库创建演示的新表,然后禁止掉新表表空间

CREATE TABLE `t_rent_received` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT ‘自增ID‘,
  `stat_time` datetime NOT NULL COMMENT ‘统计日‘,
  `investor_id` int(4) NOT NULL DEFAULT ‘0‘ COMMENT ‘资产方id‘,
  `investor_name` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT ‘‘ COMMENT ‘该字段存在查询主表‘,
  `model` int(4) NOT NULL DEFAULT ‘0‘ COMMENT ‘模式‘,
  `investor_balance` decimal(16,2) NOT NULL DEFAULT ‘0.00‘ COMMENT ‘余额‘,
  `received_bank_yesterday` decimal(16,2) NOT NULL DEFAULT ‘0.00‘ COMMENT ‘银行入账金额‘,
  `received_bank_month` decimal(16,2) NOT NULL DEFAULT ‘0.00‘ COMMENT ‘专属账户入账金额‘,
  `plan_money` decimal(16,2) NOT NULL DEFAULT ‘0.00‘ COMMENT ‘当月应收‘,
  `received_rate` decimal(16,2) NOT NULL DEFAULT ‘0.00‘ COMMENT ‘当月回款/应收百分比‘,
  `overdue_rate_thirty` decimal(16,2) NOT NULL DEFAULT ‘0.00‘ COMMENT ‘30天逾期率‘,
  `received_cover_rate` decimal(16,2) NOT NULL DEFAULT ‘0.00‘ COMMENT ‘回款覆盖率‘,
  `received_money` decimal(16,2) NOT NULL DEFAULT ‘0.00‘ COMMENT ‘当月实收‘,
  `received_yesterday` decimal(16,2) NOT NULL DEFAULT ‘0.00‘ COMMENT ‘昨日实收‘,
  `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT ‘创建时间‘,
  `update_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT ‘更新时间‘,
  `status` int(1) NOT NULL DEFAULT ‘1‘ COMMENT ‘数据状态 0删除 1 正常‘,
  PRIMARY KEY (`id`) USING BTREE,
  KEY `idx_investor_id_stat_time` (`investor_id`,`stat_time`) USING BTREE
) ENGINE=InnoDB  DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC COMMENT=‘回款监测统计表‘;

root@stage01 13:24:  [test001]>  ALTER TABLE test001.t_rent_received DISCARD TABLESPACE;
Query OK, 0 rows affected (0.06 sec)

把源端的t_rent_received.ibd 放到 主库服务器 /data1/mysql/data/test001/下
并且授权t_rent_received.ibd文件为MySQL权限:

[root@stage01 test001]# chown mysql.mysql t_rent_received.ibd

2.2主库操作导入t_rent_received表空间:

root@stage01 13:24:  [test001]>  ALTER TABLE test001.t_rent_received  import TABLESPACE;
Query OK, 0 rows affected, 1 warning (0.45 sec)

主库执行成功,但是此时从库sql_thread线程复制报错

三、解决从库复制报错问题

主库导入test001.t_rent_received 表空间执行成功,但是此时从库sql_thread线程复制报错:


root@stage02 13:26:  [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.16.0.247
                  Master_User: testrep
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 7687
               Relay_Log_File: stage02-relay-bin.000002
                Relay_Log_Pos: 7654
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: mysql.%
                   Last_Errno: 1812
                   Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction ‘ANONYMOUS‘ at master log mysql-bin.000001, end_log_pos 7687. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 7488
              Relay_Log_Space: 8061
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1812
               Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction ‘ANONYMOUS‘ at master log mysql-bin.000001, end_log_pos 7687. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 172160247
                  Master_UUID: f86f5b15-b91e-11eb-bb3b-00163e0eced4
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: 
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 210528 13:26:19
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

root@stage02 从库服务器MySQL error.log日志报错:

2021-05-28T12:49:57.679150+08:00 1002350 [Warning] ‘tables_priv‘ entry ‘sys_config mysql.sys@localhost‘ ignored in --skip-name-resolve mode.
2021-05-28T13:00:33.201101+08:00 1002099 [Note] Aborted connection 1002099 to db: ‘unconnected‘ user: ‘root‘ host: ‘localhost‘ (Got timeout reading communication packets)
2021-05-28T13:26:19.605381+08:00 1002350 [ERROR] InnoDB: Operating system error number 2 in a file operation.
2021-05-28T13:26:19.605422+08:00 1002350 [ERROR] InnoDB: The error means the system cannot find the path specified.
2021-05-28T13:26:19.605431+08:00 1002350 [ERROR] InnoDB: Trying to import a tablespace, but could not open the tablespace file ./test001/t_rent_received.ibd
2021-05-28T13:26:19.605448+08:00 1002350 [Note] InnoDB: Discarding tablespace of table `test001`.`t_rent_received`: Tablespace not found
2021-05-28T13:26:19.616360+08:00 1002350 [ERROR] Slave SQL for channel ‘‘: Worker 1 failed executing transaction ‘ANONYMOUS‘ at master log mysql-bin.000001, end_log_pos 7687; Error ‘Tablespace is missing for table `test001`.`t_rent_received`.‘ on query. Default database: ‘test001‘. Query: ‘ALTER TABLE test001.t_rent_received  import TABLESPACE‘, Error_code: 1812
2021-05-28T13:26:19.616419+08:00 1002349 [Warning] Slave SQL for channel ‘‘: ... The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details). Error_code: 1756
2021-05-28T13:26:19.616441+08:00 1002349 [Note] Error reading relay log event for channel ‘‘: slave SQL thread was killed

分析主库binglog文件:

[root@stage01 binlog]# mysqlbinlog -vv --base64-output=decode-rows /data1/mysql/binlog/mysql-bin.000001 --start-position=7354 --stop-position=7687
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
#at 7354
#210528 13:24:52 server id 172160247  end_log_pos 7488 CRC32 0xf9c36596     Query   thread_id=952836    exec_time=0 error_code=0
use `test001`/*!*/;
SET TIMESTAMP=1622179492/*!*/;
SET @@session.pseudo_thread_id=952836/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1411383296/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=224/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
**********ALTER TABLE test001.t_rent_received DISCARD TABLESPACE************
/*!*/;
#at 7488
#210528 13:26:19 server id 172160247  end_log_pos 7553 CRC32 0x12f5efcc     Anonymous_GTID  last_committed=7    sequence_number=8   rbr_only=no
SET @@SESSION.GTID_NEXT= ‘ANONYMOUS‘/*!*/;
#at 7553
#210528 13:26:19 server id 172160247  end_log_pos 7687 CRC32 0x49e819c8     Query   thread_id=952836    exec_time=0 error_code=0
SET TIMESTAMP=1622179579/*!*/;
*******ALTER TABLE test001.t_rent_received  import TABLESPACE*******
/*!*/;
SET @@SESSION.GTID_NEXT= ‘AUTOMATIC‘ /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
#End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

可以发现主库binlog文件记录只记录了

ALTER TABLE test001.t_rent_received DISCARD TABLESPACE
ALTER TABLE test001.t_rent_received  import TABLESPACE

并没有记录insert相关的任何语句

假如在源端flush table xxx for export后,只将xxx.{ibd,cfg}拷贝到主库,那么当主库alter table xxx discard tablespace时,从库也会执行discard tablespace
而当主库执行alter table xxx import tablespace时,从库也会执行import tablespace .由于主库有拷贝过来的xxx.{ibd,cfg},所以可以执行成功,而从库没有拷贝xxx.{ibd,cfg},所以从库sql_thread线程复制会失败

修复方法是,将xxx.{ibd,cfg}拷贝到从库:

[root@stage01 pump]# scp t_rent_received.ibd root@172.16.0.246:/data1/mysql/data/test001/
[root@stage02 ~]# cd /data1/mysql/data/test001/

授权MySQL权限:
[root@stage02 test001]# chown mysql.mysql t_rent_received.ibd

然后从库开启sql_thread 复制线程:

root@stage02 17:40: [test001]> start slave sql_thread;

此时查看从库的error.log日志如下:

2021-05-28T17:41:43.786181+08:00 1037405 [Note] Slave SQL thread for channel ‘‘ initialized, starting replication in log ‘mysql-bin.000002‘ at position 353, relay log ‘./stage02-relay-bin.000006‘ position: 566
2021-05-28T17:41:43.796212+08:00 1037406 [Note] InnoDB: Sync to disk
2021-05-28T17:41:43.867262+08:00 1037406 [Note] InnoDB: Sync to disk - done!
2021-05-28T17:41:43.867326+08:00 1037406 [Note] InnoDB: Phase I - Update all pages
2021-05-28T17:41:43.887770+08:00 1037406 [Note] InnoDB: Sync to disk
2021-05-28T17:41:43.941873+08:00 1037406 [Note] InnoDB: Sync to disk - done!
2021-05-28T17:41:43.944749+08:00 1037406 [Note] InnoDB: Phase III - Flush changes to disk
2021-05-28T17:41:43.951529+08:00 1037406 [Note] InnoDB: Phase IV - Flush complete
2021-05-28T17:41:43.951727+08:00 1037406 [Note] InnoDB: `test001`.`t_rent_received` autoinc value set to 0
2021-05-28T17:41:43.981738+08:00 1037406 [Note] InnoDB: AUTOINC next value generation is disabled for ‘`test001`.`t_rent_received`‘

从库复制恢复正常:


root@stage02 17:42:  [test001]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.16.0.247
                  Master_User: testrep
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 551
               Relay_Log_File: stage02-relay-bin.000006
                Relay_Log_Pos: 764
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

在stage01主库上删除刚才的利用传输表空间导入的表:

root@stage01 17:02:  [test001]> drop table t_rent_received\G
Query OK, 0 rows affected (0.01 sec)

此时从库复制也是正常的:
root@stage02 18:00: [test001]> show slave status\G

特别说明下:
http://fuxkdb.com/2017/08/22/%E4%B8%BB%E4%BB%8E%E4%BC%A0%E8%BE%93%E8%A1%A8%E7%A9%BA%E9%97%B4%E7%9A%84%E5%9D%91/
此博文提到的修复方法是存在问题的,根本不需要跳过gtid事务id

mysql传统复制传输表空间遇到的坑

上一篇:spring | 踩坑 | Could not get JDBC Connectionl nested exception is java.sql.SQLException...


下一篇:09 spark连接mysql数据库