Mysql页面crash问题复现&恢复方法

2021-11-18 05:19:29

title: mysql页面crash问题复现&恢复方法

date: 2019-07-25 11:33:06
categories: Mysql

看到一个MYSQL单节点CRASH的案例，这里用MYSQL5.7尝试复现一个页面损坏的场景，并尝试恢复数据！

构造数据

sysbench构造10测试表

/home/mingjie.gmj/bin/sysbench-1.0.16/bin/sysbench oltp_common --threads=64 --events=0 --mysql-socket=/home/mingjie.gmj/databases/data/mydata5405/mysql5405.sock --mysql-user=root --tables=10 --table_size=1000 prepare
sysbench 1.0.16 (using bundled LuaJIT 2.1.0-beta2)

Initializing worker threads...

Creating table 'sbtest4'...
Creating table 'sbtest8'...
Creating table 'sbtest6'...
Creating table 'sbtest2'...
Creating table 'sbtest10'...
Creating table 'sbtest7'...
Creating table 'sbtest1'...
Creating table 'sbtest5'...
Creating table 'sbtest9'...
Creating table 'sbtest3'...
Inserting 1000 records into 'sbtest10'
Inserting 1000 records into 'sbtest7'
Inserting 1000 records into 'sbtest1'
Inserting 1000 records into 'sbtest2'
Inserting 1000 records into 'sbtest4'
Inserting 1000 records into 'sbtest6'
Inserting 1000 records into 'sbtest3'
Inserting 1000 records into 'sbtest9'
Inserting 1000 records into 'sbtest8'
Inserting 1000 records into 'sbtest5'
Creating a secondary index on 'sbtest10'...
Creating a secondary index on 'sbtest6'...
Creating a secondary index on 'sbtest1'...
Creating a secondary index on 'sbtest5'...
Creating a secondary index on 'sbtest7'...
Creating a secondary index on 'sbtest3'...
Creating a secondary index on 'sbtest2'...
Creating a secondary index on 'sbtest9'...
Creating a secondary index on 'sbtest8'...
Creating a secondary index on 'sbtest4'...

查看表数据

mysql> show tables;
+------------------+
| Tables_in_sbtest |
+------------------+
| sbtest1          |
| sbtest10         |
| sbtest2          |
| sbtest3          |
| sbtest4          |
| sbtest5          |
| sbtest6          |
| sbtest7          |
| sbtest8          |
| sbtest9          |
+------------------+
10 rows in set (0.00 sec)
mysql> select * from sbtest9 limit 1\G
*************************** 1. row ***************************
 id: 1
  k: 505
  c: 92419600644-86829681637-42100581414-80298414140-81768158898-74430369956-50895721992-62087272403-75473465539-28369755814
pad: 65092491791-76928308446-68130154933-07155890946-00453047346
1 row in set (0.00 sec)

手动破坏表文件

对表sbtest9，模拟页面损坏，首先查看数据文件位置

mysql> show global variables like '%datadir%';
+---------------+----------------------------------------------+
| Variable_name | Value                                        |
+---------------+----------------------------------------------+
| datadir       | /home/mingjie.gmj/databases/data/mydata5405/ |
+---------------+----------------------------------------------+
1 row in set (0.01 sec)

vim直接删除前十个字符！

vi sbtest9.ibd

删除后查询还可以成功（表文件删了也能查）因为现在查的还是缓存。

重启数据库触发问题

正常重启

/home/mingjie.gmj/databases/mysql5405/bin/mysqladmin -S /home/mingjie.gmj/databases/data/mydata5405/mysql5405.sock -uroot -p shutdown

/bin/sh /home/mingjie.gmj/databases/mysql5405/bin/mysqld_safe --defaults-file=/home/mingjie.gmj/databases/data/mydata5405/my.cnf &

切换数据库时，数据库crash!

mysql> use sbtest
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> 190725 14:32:01 mysqld_safe Number of processes running now: 0
190725 14:32:01 mysqld_safe mysqld restarted
190725 14:32:02 mysqld_safe mysqld from pid file /home/mingjie.gmj/databases/data/mydata5405/iZbp1d4tisi44j6vxze02fZ.pid ended

mysql> use sbtest
No connection. Trying to reconnect...
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/home/mingjie.gmj/databases/data/mydata5405/mysql5405.sock' (2)
ERROR:
Can't connect to the server

先看下重启启动时的日志，已经有一些问题了

提示pageheader错误（因为我前面删的是文件头部的数据）

2019-07-25 14:32:02 6933 [Note] InnoDB: Database was not shutdown normally!
2019-07-25 14:32:02 6933 [Note] InnoDB: Starting crash recovery.
2019-07-25 14:32:02 6933 [Note] InnoDB: Reading tablespace information from the .ibd files...
2019-07-25 14:32:02 6933 [ERROR] InnoDB: Space id in fsp header 1441792,but in the page header 0
2019-07-25 14:32:02 6933 [ERROR] InnoDB: innodb-page-size mismatch in tablespace ./sbtest/sbtest9.ibd (table sbtest/sbtest9)
2019-07-25 14:32:02 6933 [Note] InnoDB: Page size:1024 Pages to analyze:64
2019-07-25 14:32:02 6933 [Note] InnoDB: Page size: 1024, Possible space_id count:0
2019-07-25 14:32:02 6933 [Note] InnoDB: Page size:2048 Pages to analyze:64
2019-07-25 14:32:02 6933 [Note] InnoDB: Page size: 2048, Possible space_id count:0
2019-07-25 14:32:02 6933 [Note] InnoDB: Page size:4096 Pages to analyze:64
2019-07-25 14:32:02 6933 [Note] InnoDB: Page size: 4096, Possible space_id count:0
2019-07-25 14:32:02 6933 [Note] InnoDB: Page size:8192 Pages to analyze:43
2019-07-25 14:32:02 6933 [Note] InnoDB: Page size: 8192, Possible space_id count:0
2019-07-25 14:32:02 6933 [Note] InnoDB: Page size:16384 Pages to analyze:21
2019-07-25 14:32:02 6933 [Note] InnoDB: Page size: 16384, Possible space_id count:0
2019-07-25 14:32:02 7f4e65962740  InnoDB: Operating system error number 2 in a file operation.

在看一下crash的报错

InnoDB: The error means the system cannot find the path specified.
InnoDB: If you are installing InnoDB, remember that you must create
InnoDB: directories yourself, InnoDB does not create them.
InnoDB: Error: could not open single-table tablespace file ./sbtest/sbtest9.ibd
InnoDB: We do not continue the crash recovery, because the table may become
InnoDB: corrupt if we cannot apply the log records in the InnoDB log to it.
InnoDB: To fix the problem and start mysqld:
InnoDB: 1) If there is a permission problem in the file and mysqld cannot
InnoDB: open the file, you should modify the permissions.
InnoDB: 2) If the table is not needed, or you can restore it from a backup,
InnoDB: then you can remove the .ibd file, and InnoDB will do a normal
InnoDB: crash recovery and ignore that table.
InnoDB: 3) If the file system or the disk is broken, and you cannot remove
InnoDB: the .ibd file, you can set innodb_force_recovery > 0 in my.cnf
InnoDB: and force InnoDB to continue crash recovery here.
190725 14:32:02 mysqld_safe mysqld from pid file /home/mingjie.gmj/databases/data/mydata5405/iZbp1d4tisi44j6vxze02fZ.pid ended

注意看下3)，已经建议怎么操作了！

InnoDB: 3) If the file system or the disk is broken, and you cannot remove
InnoDB: the .ibd file, you can set innodb_force_recovery > 0 in my.cnf
InnoDB: and force InnoDB to continue crash recovery here.

开始修复

尝试再次重启数据库报一样的错误

现在数据库已经不可服务了！需要尽快恢复！

增加参数force拉起数据库innodb_force_recovery = 1

2019-07-25 14:42:33 8563 [Note] InnoDB: innodb_force_recovery was set to 1. Continuing crash recovery even though we cannot access the .ibd file of this table.
2019-07-25 14:42:33 8563 [Note] InnoDB: Restoring possible half-written data pages
2019-07-25 14:42:33 8563 [Note] InnoDB: from the doublewrite buffer...
InnoDB: Last MySQL binlog file position 0 1914387, file name mysql-bin.000013
2019-07-25 14:42:33 8563 [ERROR] InnoDB: Table sbtest/sbtest9 in the InnoDB data dictionary has tablespace id 43, but tablespace with that id or name does not exist. Have you deleted or moved .ibd files? This may also be a table created with CREATE TEMPORARY TABLE whose .ibd and .frm files MySQL automatically removed, but the table still exists in the InnoDB internal data dictionary.
InnoDB: Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting-datadict.html
InnoDB: for how to resolve the issue.
2019-07-25 14:42:33 8563 [Note] InnoDB: 128 rollback segment(s) are active.
2019-07-25 14:42:33 8563 [Note] InnoDB: Waiting for purge to start
2019-07-25 14:42:33 8563 [Note] InnoDB: 5.6.44 started; log sequence number 4247920472
2019-07-25 14:42:33 8563 [Note] InnoDB: !!! innodb_force_recovery is set to 1 !!!
2019-07-25 14:42:33 8563 [Note] Recovering after a crash using /home/mingjie.gmj/databases/data/mydata5405/mysql-bin
2019-07-25 14:42:33 8563 [Note] Starting crash recovery...
2019-07-25 14:42:33 8563 [Note] Crash recovery finished.
2019-07-25 14:42:33 8563 [Note] Server hostname (bind-address): '*'; port: 5405
2019-07-25 14:42:33 8563 [Note] IPv6 is available.
2019-07-25 14:42:33 8563 [Note]   - '::' resolves to '::';
2019-07-25 14:42:33 8563 [Note] Server socket created on IP: '::'.
InnoDB: innodb_force_recovery is on: we do not allow
InnoDB: database modifications by the user. Shut down
InnoDB: mysqld and edit my.cnf so thatInnoDB: innodb_force_... is removed.
2019-07-25 14:42:33 8563 [ERROR] Error writing master configuration.
2019-07-25 14:42:33 8563 [ERROR] Error reading master configuration.
InnoDB: innodb_force_recovery is on: we do not allow
InnoDB: database modifications by the user. Shut down
InnoDB: mysqld and edit my.cnf so thatInnoDB: innodb_force_... is removed.
2019-07-25 14:42:33 8563 [ERROR] Error writing relay log configuration.
2019-07-25 14:42:33 8563 [ERROR] Error reading relay log configuration.
2019-07-25 14:42:33 8563 [ERROR] Failed to initialize the master info structure
2019-07-25 14:42:33 8563 [Note] Check error log for additional messages. You will not be able to start replication until the issue is resolved and the server restarted.
2019-07-25 14:42:33 8563 [Note] Event Scheduler: Loaded 0 events
2019-07-25 14:42:33 8563 [Note] /home/mingjie.gmj/databases/mysql5405/bin/mysqld: ready for connections.
Version: '5.6.44-log'  socket: '/home/mingjie.gmj/databases/data/mydata5405/mysql5405.sock'  port: 5405  Source distribution

连上去尝试修复数据

mysql> create table sbtest9_recovery like sbtest9;
ERROR 1146 (42S02): Table 'sbtest.sbtest9' doesn't exist

文件头被损坏了，数据库已经读不到这个表了！只能干掉这个表了

mysql> drop table sbtest9;
Query OK, 0 rows affected (0.01 sec)

注释参数重启后恢复正常，但是数据已经无法恢复了！

启动起来之后，需要尝试查询有问题的表，如果只是某个页面损坏了应该大部分数据是可以查出来的，可以用这样的SQL来试

insert ignore into test_recovery select * from test limit 10;
insert ignore into test_recovery select * from test limit 20;
insert ignore into test_recovery select * from test limit 30;

页面crash可以出现各种个样的情况，我在测试的过程中出现了很多种情况，有的情况改参数也拉不起来数据库，还有COREDUMP等等，具体问题需要具体分析！

附：参数说明

innodb_force_recovery

1 (SRV_FORCE_IGNORE_CORRUPT): 忽略检查到的 corrupt 页。尽管检测到了损坏的 page 仍强制服务运行。一般设置为该值即可，然后 dump 出库表进行重建。

2 (SRV_FORCE_NO_BACKGROUND): 阻止主线程的运行，如主线程需要执行 full purge 操作，会导致 crash。 阻止 master thread 和任何 purge thread 运行。若 crash 发生在 purge 环节则使用该值。

3 (SRV_FORCE_NO_TRX_UNDO): 不执行事务回滚操作。

4 (SRV_FORCE_NO_IBUF_MERGE): 不执行插入缓冲的合并操作。如果可能导致崩溃则不要做这些操作。不要进行统计操作。该值可能永久损坏数据文件。若使用了该值，则将来要删除和重建辅助索引。

5 (SRV_FORCE_NO_UNDO_LOG_SCAN): 不查看重做日志，InnoDB 存储引擎会将未提交的事务视为已提交。此时 InnoDB 甚至把未完成的事务按照提交处理。该值可能永久性的损坏数据文件。

6 (SRV_FORCE_NO_LOG_REDO): 不执行前滚的操作。恢复时不做 redo log roll-forward。使数据库页处于废止状态，继而可能引起 B 树或者其他数据库结构更多的损坏。

官方文档：Forcing InnoDB Recovery

1 (SRV_FORCE_IGNORE_CORRUPT)

Lets the server run even if it detects a corrupt page. Tries to make SELECT * FROM *tbl_name* jump over corrupt index records and pages, which helps in dumping tables.
2 (SRV_FORCE_NO_BACKGROUND)

Prevents the master thread and any purge threads from running. If a crash would occur during the purge operation, this recovery value prevents it.
3 (SRV_FORCE_NO_TRX_UNDO)

Does not run transaction rollbacks after crash recovery.
4 (SRV_FORCE_NO_IBUF_MERGE)

Prevents insert buffer merge operations. If they would cause a crash, does not do them. Does not calculate tablestatistics. This value can permanently corrupt data files. After using this value, be prepared to drop and recreate all secondary indexes. Sets InnoDB to read-only.
5 (SRV_FORCE_NO_UNDO_LOG_SCAN)

Does not look at undo logs when starting the database: InnoDB treats even incomplete transactions as committed. This value can permanently corrupt data files. Sets InnoDB to read-only.
6 (SRV_FORCE_NO_LOG_REDO)

Does not do the redo log roll-forward in connection with recovery. This value can permanently corrupt data files. Leaves database pages in an obsolete state, which in turn may introduce more corruption into B-trees and other database structures. Sets InnoDB to read-only.

码农公寓

构造数据

手动破坏表文件

重启数据库触发问题

开始修复

附：参数说明

相关文章