一、问题描述:
同事反馈线上一个表有其中一条数据无法删除,其他都正常,我拿到删数据的sql,尝试执行,报错如下:
mysql> delete from facebook_posts where id = 7048962;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
二、问题处理
从报错信息看,应该是关于这条数据有事物未提交,锁等待超时了,下面我们就开始验证并解决问题
1、在sql执行期间,通过information_schema.innodb_trx表找到这个sql的事物ID(5316933097 )
mysql> select trx_id,trx_started,trx_requested_lock_id,trx_mysql_thread_id,trx_query from information_schema.innodb_trx where trx_query='delete from facebook_posts where id = 7048962';
+------------+---------------------+------------------------+---------------------+-----------------------------------------------+
| trx_id | trx_started | t rx_requested_lock_id | trx_mysql_thread_id | trx_query
+------------+---------------------+------------------------+---------------------+-----------------------------------------------+
| 5316933097 | 2017-08-15 07:31:57 | 5316933097:923:24693:6 | 1798850878 | delete from facebook_posts where id = 7048962 |
+------------+---------------------+------------------------+---------------------+-----------------------------------------------+
1 row in set (0.00 sec)
关于innodb_trx表字段含义的解释:
mysql> desc information_schema.innodb_trx;
+----------------------------+---------------------+------+-----+---------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------------------+---------------------+------+-----+---------------------+-------+
| trx_id | varchar(18) | NO | | | |#事务ID
| trx_state | varchar(13) | NO | | | |#事物状态
| trx_started | datetime | NO | | 0000-00-00 00:00:00 | |#事物开始时间
| trx_requested_lock_id | varchar(81) | YES | | NULL | |#事物请求锁ID
| trx_wait_started | datetime | YES | | NULL | |#事物开始等待时间
| trx_weight | bigint(21) unsigned | NO | | 0 | |#
| trx_mysql_thread_id | bigint(21) unsigned | NO | | 0 | |#事物线程ID,即show processlist看到ID
| trx_query | varchar(1024) | YES | | NULL | |#具体SQL
| trx_operation_state | varchar(64) | YES | | NULL | |#事物当前操作状态
| trx_tables_in_use | bigint(21) unsigned | NO | | 0 | |#事物中有多少个表被使用
| trx_tables_locked | bigint(21) unsigned | NO | | 0 | |#使用拥有多少个锁
| trx_lock_structs | bigint(21) unsigned | NO | | 0 | |#
| trx_lock_memory_bytes | bigint(21) unsigned | NO | | 0 | |#事物锁住的内存大小
| trx_rows_locked | bigint(21) unsigned | NO | | 0 | |#事物锁住的行数
| trx_rows_modified | bigint(21) unsigned | NO | | 0 | |#使用修改的行数
| trx_concurrency_tickets | bigint(21) unsigned | NO | | 0 | |#事物并发票数
| trx_isolation_level | varchar(16) | NO | | | |#事物隔离级别
| trx_unique_checks | int(1) | NO | | 0 | |#是否唯一性检查
| trx_foreign_key_checks | int(1) | NO | | 0 | |#是否外键检查
| trx_last_foreign_key_error | varchar(256) | YES | | NULL | |#最后的外键错误
| trx_adaptive_hash_latched | int(1) | NO | | 0 | |#
| trx_adaptive_hash_timeout | bigint(21) unsigned | NO | | 0 | |#
| trx_is_read_only | int(1) | NO | | 0 | |#
| trx_autocommit_non_locking | int(1) | NO | | 0 | |#
+----------------------------+---------------------+------+-----+---------------------+-------+
24 rows in set (0.00 sec)
2、通过上面步骤1找到的事物ID,找到占有锁的事物ID(5316888834 )
mysql> select * from information_schema.innodb_lock_waits where requesting_trx_id=5316933097;
+-------------------+------------------------+-----------------+------------------------+
| requesting_trx_id | requested_lock_id | blocking_trx_id | blocking_lock_id |
+-------------------+------------------------+-----------------+------------------------+
| 5316933097 | 5316933097:923:24693:6 | 5316888834 | 5316888834:923:24693:6 |
+-------------------+------------------------+-----------------+------------------------+
1 row in set (0.00 sec)
关于innodb_lock_waits 表的字段含义的解释:
mysql> desc information_schema.innodb_lock_waits;
+-------------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+-------------+------+-----+---------+-------+
| requesting_trx_id | varchar(18) | NO | | | |#请求锁的事物ID
| requested_lock_id | varchar(81) | NO | | | |#请求锁的锁ID
| blocking_trx_id | varchar(18) | NO | | | |#当前拥有锁的事物ID
| blocking_lock_id | varchar(81) | NO | | | |#当前拥有锁的锁ID
+-------------------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
3、通过步骤2找到的占有锁的事物ID,找到占有锁的事物线程ID(1790259884 )
mysql> select * from information_schema.innodb_trx where trx_id=5316888834 \G
*************************** 1. row ***************************
trx_id: 5316888834
trx_state: RUNNING
trx_started: 2017-08-15 06:00:21
trx_requested_lock_id: NULL
trx_wait_started: NULL
trx_weight: 6
trx_mysql_thread_id: 1790259884
trx_query: NULL
trx_operation_state: NULL
trx_tables_in_use: 0
trx_tables_locked: 0
trx_lock_structs: 6
trx_lock_memory_bytes: 1184
trx_rows_locked: 10
trx_rows_modified: 0
trx_concurrency_tickets: 0
trx_isolation_level: REPEATABLE READ
trx_unique_checks: 1
trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
trx_adaptive_hash_latched: 0
trx_adaptive_hash_timeout: 10000
trx_is_read_only: 0
trx_autocommit_non_locking: 0
1 row in set (0.00 sec)
4、通过步骤3找的事物ID,可以查看下这个事物发起的账号和主机信息,提供给开发人员查找异常的真正原因,并kill这个事物ID,这条数据就可以正常删除了
#查看下这个事物发起的账号和主机信息
mysql> select * from information_schema.processlist where ID=1790259884;
+------------+----------+---------------------+--------+---------+------+-------+------+
| ID | USER | HOST | DB | COMMAND | TIME | STATE | INFO |
+------------+----------+---------------------+--------+---------+------+-------+------+
| 1790259884 | spider_w | 172.31.11.143:46120 | db_mta | Sleep | 1319 | | NULL |
+------------+----------+---------------------+--------+---------+------+-------+------+
1 row in set (0.01 sec)
#kill 这个未提交的事物线程ID
mysql> CALL mysql.rds_kill(1790259884);
Query OK, 0 rows affected (0.00 sec)
#删除数据
mysql> delete from facebook_posts where id = 7041232;
Query OK, 1 row affected (0.02 sec)