故障描述
前几天,一个mysql数据库运维同事,在生产上用insert into select from语句,在生产上备份了一张表,结果将备份表全表锁住了,业务影响了大约10分钟。
看到这个语句,我第一反应就是select语句也能锁表,可是生产上的故障,证明确实锁表了。所以,需要将insert into select from获取锁的情况彻底研究明白。
故障复盘
创建模拟表和模拟记录
[root@localhost] 17:39:55 [testdb1]>show create table t_test_1\G;
*************************** 1. row ***************************
Table: t_test_1
Create Table: CREATE TABLE `t_test_1` (
`id` int(11) NOT NULL,
`name` char(10) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
1 row in set (0.00 sec)
[root@localhost] 17:40:30 [testdb1]>select * from t_test_1;
+----+-------+
| id | name |
+----+-------+
| 1 | trest |
| 2 | e99e |
| 3 | test |
| 4 | fresd |
| 5 | fsfa |
+----+-------+
5 rows in set (0.00 sec)
[root@localhost] 17:40:17 [testdb1]>show create table t_test_2\G;
*************************** 1. row ***************************
Table: t_test_2
Create Table: CREATE TABLE `t_test_2` (
`id` int(11) NOT NULL,
`name` char(10) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
1 row in set (0.00 sec)
模拟insert into select操作
[root@localhost] 17:41:32 [testdb1]>begin;
Query OK, 0 rows affected (0.00 sec)
[root@localhost] 17:41:33 [testdb1]>insert into t_test_2 select * from t_test_1 where name like 'trest';
Query OK, 1 row affected (0.02 sec)
Records: 1 Duplicates: 0 Warnings: 0
获取innodb的lock信息
[root@localhost] 17:42:00 [(none)]>show engine innodb status\G;
TRANSACTIONS
------------
Trx id counter 182551
Purge done for trx's n:o < 182551 undo n:o < 0 state: running but idle
History list length 0
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421524582451936, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 182546, ACTIVE 20 sec
3 lock struct(s), heap size 1136, 6 row lock(s), undo log entries 1
MySQL thread id 7, OS thread handle 140049254979328, query id 82 localhost root
从innodb引擎获取的lock信息,太少了,只能看到有3 lock struct(s),6 row lock(s),不清楚那表申请的锁,申请什么类型的锁,不知道这些信息,就研究不明白故障到底怎么发生的。
幸运的是,mysql数据库提供一个参数innodb_status_output_locks,可以打印更详细的lock信息。
启用innodb_status_output_locks参数
启用innodb_status_output_locks参数,默认是不开启,所以需要开启。
[root@localhost] 17:31:12 [(none)]>show variables like 'innodb_status_output_locks';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| innodb_status_output_locks | OFF |
+----------------------------+-------+
1 row in set (0.00 sec)
[root@localhost] 17:47:41 [(none)]>set global innodb_status_output_locks=on;
Query OK, 0 rows affected (0.00 sec)
[root@localhost] 17:47:41 [(none)]>show variables like 'innodb_status_output_locks';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| innodb_status_output_locks | ON |
+----------------------------+-------+
1 row in set (0.00 sec)
获取innodb的lock详细信息
下面是开启innodb_status_output_locks参数之后,获取的详细lock信息
[root@localhost] 17:48:28 [(none)]>show engine innodb status\G;
TRANSACTIONS
------------
Trx id counter 182552
Purge done for trx's n:o < 182551 undo n:o < 0 state: running but idle
History list length 0
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421524582451936, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 182551, ACTIVE 5 sec
3 lock struct(s), heap size 1136, 6 row lock(s), undo log entries 1
MySQL thread id 11, OS thread handle 140049254979328, query id 100 localhost root
TABLE LOCK table `testdb1`.`t_test_1` trx id 182551 lock mode IS
RECORD LOCKS space id 97 page no 3 n bits 72 index PRIMARY of table `testdb1`.`t_test_1` trx id 182551 lock mode S
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 4; hex 80000001; asc ;;
1: len 6; hex 00000002c710; asc ;;
2: len 7; hex af000000310110; asc 1 ;;
3: len 10; hex 74726573742020202020; asc trest ;;
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 4; hex 80000002; asc ;;
1: len 6; hex 00000002c710; asc ;;
2: len 7; hex af00000031011c; asc 1 ;;
3: len 10; hex 65393965202020202020; asc e99e ;;
Record lock, heap no 4 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 4; hex 80000003; asc ;;
1: len 6; hex 00000002c710; asc ;;
2: len 7; hex af000000310128; asc 1 (;;
3: len 10; hex 74657374202020202020; asc test ;;
Record lock, heap no 5 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 4; hex 80000004; asc ;;
1: len 6; hex 00000002c710; asc ;;
2: len 7; hex af000000310134; asc 1 4;;
3: len 10; hex 66726573642020202020; asc fresd ;;
Record lock, heap no 6 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 4; hex 80000005; asc ;;
1: len 6; hex 00000002c710; asc ;;
2: len 7; hex af000000310140; asc 1 @;;
3: len 10; hex 66736661202020202020; asc fsfa ;;
TABLE LOCK table `testdb1`.`t_test_2` trx id 182551 lock mode IX
从上面的信息,可以很清晰看到,t_test_1获取到IS锁,并且有5个Record lock信息,即锁了5条记录,而此表只有5条记录,所以锁全表。
TABLE LOCK table testdb1.t_test_1 trx id 182551 lock mode IS
锁全表解决方案
insert into t_test_2 select * from t_test_1 where name like 'trest';这个sql语句中,t_test_1表的name字段没有索引,索引走了全表扫描,如果在name字段创建索引呢,会有什么变化呢
创建索引
[root@localhost] 17:54:33 [testdb1]>alter table t_test_1 add index idx_t_test_1_name (name);
Query OK, 0 rows affected (0.24 sec)
Records: 0 Duplicates: 0 Warnings: 0
[root@localhost] 17:54:52 [testdb1]>begin;
Query OK, 0 rows affected (0.00 sec)
[root@localhost] 17:54:55 [testdb1]>insert into t_test_2 select * from t_test_1 where name like 'trest';
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
重新获取innodb的lock详细信息
TRANSACTIONS
------------
Trx id counter 182565
Purge done for trx's n:o < 182565 undo n:o < 0 state: running but idle
History list length 0
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421524582451936, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 182560, ACTIVE 3 sec
3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 11, OS thread handle 140049254979328, query id 105 localhost root
TABLE LOCK table `testdb1`.`t_test_1` trx id 182560 lock mode IS
RECORD LOCKS space id 97 page no 4 n bits 72 index idx_t_test_1_name of table `testdb1`.`t_test_1` trx id 182560 lock mode S
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
Record lock, heap no 6 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 10; hex 74726573742020202020; asc trest ;;
1: len 4; hex 80000001; asc ;;
TABLE LOCK table `testdb1`.`t_test_2` trx id 182560 lock mode IX
看到没有,在这里,现在只有一个Record lock,不再是锁全表了。
此故障分析未完,待续。