最近遇到一个案例,很多查询被阻塞没有返回结果,使用show processlist查看,发现不少MySQL线程处于Waiting for table flush状态,查询语句一直被阻塞,只能通过Kill进程来解决。那么我们先来看看Waiting for table flush的官方解释:https://dev.mysql.com/doc/refman/5.6/en/general-thread-states.html
Waiting for table flush
The thread is executing FLUSH TABLES and is waiting for all threads to close their tables, or the thread got a notification that the underlying structure for a table has changed and it needs to reopen the table to get the new structure. However, to reopen the table, it must wait until all other threads have closed the table in question.
This notification takes place if another thread has used FLUSH TABLES or one of the following statements on the table in question: FLUSH TABLES tbl_name, ALTER TABLE, RENAME TABLE, REPAIR TABLE, ANALYZE TABLE, or OPTIMIZE TABLE.
那么我们接下来模拟一下线程处于Waiting for table flush状态的情况,如所示:
在第一个会话连接(connection id=13)中,我们使用lock table 锁定表test。
mysql> use MyDB;
Database changed
mysql> select connection_id();
+-----------------+
| connection_id() |
+-----------------+
| 13 |
+-----------------+
1 row in set (0.00 sec)
mysql> lock table test read;
Query OK, 0 rows affected (0.00 sec)
mysql>
在第二个会话连接(connection id=17)中,我们执行flush table 或 flush table test 皆可。此时你会发现flush table处于阻塞状态。
mysql> use MyDB;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> select connection_id();
+-----------------+
| connection_id() |
+-----------------+
| 17 |
+-----------------+
1 row in set (0.00 sec)
mysql> flush table test;
在第三个会话/连接中,当你切换到MyDB时,就会提示“You can turn off this feature to get a quicker startup with -A” ,此时处于阻塞状态。此时你退出会话,使用参数-A登录数据库后,你如果查询test表,就会处于阻塞状态(当然查询其它表不会被阻塞)。如下所示:
mysql> use MyDB;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
mysql> use MyDB;
Database changed
mysql> select * from test;
在第四个会话/连接,我们用show processlist查看到当前数据库所有连接线程状态,你会看到17、18都处于Waiting for table flush的状态。如下截图所示:
mysql> show processlist;
+----+------+-----------+------+---------+------+-------------------------+--------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+------+-----------+------+---------+------+-------------------------+--------------------+
| 13 | root | localhost | MyDB | Sleep | 90 | | NULL |
| 14 | root | localhost | NULL | Query | 0 | init | show processlist |
| 17 | root | localhost | MyDB | Query | 52 | Waiting for table flush | flush table test |
| 18 | root | localhost | MyDB | Query | 9 | Waiting for table flush | select * from test |
+----+------+-----------+------+---------+------+-------------------------+--------------------+
4 rows in set (0.00 sec)
mysql>
mysql> show processlist;
+----+------+-----------+------+---------+------+-------------------------+--------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+------+-----------+------+---------+------+-------------------------+--------------------+
| 13 | root | localhost | MyDB | Sleep | 90 | | NULL |
| 14 | root | localhost | NULL | Query | 0 | init | show processlist |
| 17 | root | localhost | MyDB | Query | 52 | Waiting for table flush | flush table test |
| 18 | root | localhost | MyDB | Query | 9 | Waiting for table flush | select * from test |
+----+------+-----------+------+---------+------+-------------------------+--------------------+
4 rows in set (0.00 sec)
mysql>
mysql>
mysql>
mysql>
mysql> show open tables where in_use >=1;
+----------+-------+--------+-------------+
| Database | Table | In_use | Name_locked |
+----------+-------+--------+-------------+
| MyDB | test | 1 | 0 |
+----------+-------+--------+-------------+
1 row in set (0.00 sec)
mysql> kill 17;
Query OK, 0 rows affected (0.00 sec)
mysql> show processlist;
+----+------+-----------+------+---------+------+-------------------------+--------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+------+-----------+------+---------+------+-------------------------+--------------------+
| 13 | root | localhost | MyDB | Sleep | 442 | | NULL |
| 14 | root | localhost | NULL | Query | 0 | init | show processlist |
| 18 | root | localhost | MyDB | Query | 361 | Waiting for table flush | select * from test |
+----+------+-----------+------+---------+------+-------------------------+--------------------+
3 rows in set (0.00 sec)
mysql> kill 13;
Query OK, 0 rows affected (0.00 sec)
mysql> show processlist;
+----+------+-----------+------+---------+------+-------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+------+-----------+------+---------+------+-------+------------------+
| 14 | root | localhost | NULL | Query | 0 | init | show processlist |
| 18 | root | localhost | MyDB | Sleep | 427 | | NULL |
+----+------+-----------+------+---------+------+-------+------------------+
2 rows in set (0.00 sec)
mysql>
注意:我们需要Kill线程13, Kill掉线程17是解决不了问题的。
生产环境中,很多时候可能不是lock table read引起的阻塞,而是由于慢查询,导致flush table一直无法关闭该表而一直处于等待状态,例如下面测试案例中,我使用同一张大表做笛卡尔积模拟一个慢查询,其它操作相同,如下所示,你会看到同样产生了Waiting for table flush
mysql> SELECT T.* FROM TEST1 T, TEST1 L;
另外,网上有个案例,mysqldump备份时,如果没有使用参数—single-transaction 或由于同时使用了flush-logs与—single-transaction两个参数也可能引起这样的等待场景,这个两个参数放在一起,会在开始dump数据之前先执行一个FLUSH TABLES操作。
解决方案:
出现Waiting for table flush时,我们一般需要找到那些表被lock住或那些慢查询导致flush table一直在等待而无法关闭该表。然后Kill掉对应的线程即可,但是如何精准定位是一个挑战,尤其是生产环境,你使用show processlist会看到大量的线程。让你眼花缭乱的,怎么一下子定位问题呢?
对于慢查询引起的其它线程处于Waiting for table flush状态的情形:
可以查看show processlist中Time值很大的线程。然后甄别确认后Kill掉,如上截图所示,会话连接14就是引起阻塞的源头SQL。有种规律就是这个线程的Time列值必定比被阻塞的线程要高。这个就能过滤很多记录。
对于lock table read引起的其它线程处于Waiting for table flush状态的情形:
对于实验中使用lock table read这种情况,这种会话可能处于Sleep状态,而且它也不会出现在show engine innodb status \G命令的输出信息中。 即使show open tables where in_use >=1;能找到是那张表被lock住了,但是无法定位到具体的线程(连接),其实这个是一个头痛的问题。但是inntop这款利器就可以定位到,如下所示,线程17锁住了表test,在innotop里面就能定位到是线程17。所谓工欲善其事必先利其器!
另外,在官方文档中ALTER TABLE, RENAME TABLE, REPAIR TABLE, ANALYZE TABLE, or OPTIMIZE TABLE都能引起这类等待,下面也做了一些简单测试,如下所示:
Waiting for table flush的另外一个场景
会话连接(connection id=18)执行下面SQL语句,模拟一个慢查询SQL
mysql> select connection_id();
+-----------------+
| connection_id() |
+-----------------+
| 18 |
+-----------------+
1 row in set (0.00 sec)
mysql> select name, sleep(64) from test;
会话连接(connection id=6)执行下面SQL语句,分析表test
mysql> select connection_id();
+-----------------+
| connection_id() |
+-----------------+
| 6 |
+-----------------+
1 row in set (0.00 sec)
mysql> analyze table test;
+-----------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+-----------+---------+----------+----------+
| MyDB.test | analyze | status | OK |
+-----------+---------+----------+----------+
1 row in set (0.04 sec)
mysql>
会话连接(connection id=8)执行下面SQL语句
mysql> select connection_id();
+-----------------+
| connection_id() |
+-----------------+
| 8 |
+-----------------+
1 row in set (0.00 sec)
mysql> select * from test;
查看线程的状态,你会发现被阻塞的会话处于 Waiting for table flush状态。 因为当对表做了ANALYZE TABLE后,后台针对该表的查询需要等待,因为MySQL已经检测到该表内部变化,需要使用FLUSH TABLE关闭然后重新打开该表,所以当你查询该表时,就会处于 Waiting for table flush
mysql> show processlist;
+----+------+-----------+------+---------+------+-------------------------+----------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+------+-----------+------+---------+------+-------------------------+----------------------------------+
| 6 | root | localhost | MyDB | Sleep | 22 | | NULL |
| 8 | root | localhost | MyDB | Query | 14 | Waiting for table flush | select * from test |
| 15 | root | localhost | NULL | Sleep | 3 | | NULL |
| 16 | root | localhost | NULL | Query | 0 | init | show processlist |
| 18 | root | localhost | MyDB | Query | 46 | User sleep | select name, sleep(64) from test |
+----+------+-----------+------+---------+------+-------------------------+----------------------------------+
5 rows in set (0.00 sec)
mysql>
Waiting for table metadata lock
会话连接(connection id=17)执行下面SQL语句,模拟一个慢查询SQL
mysql> select connection_id();
+-----------------+
| connection_id() |
+-----------------+
| 17 |
+-----------------+
1 row in set (0.00 sec)
mysql> select name, sleep(100) from test;
会话连接(connection id=6)执行下面SQL语句, 修改表结构操作
mysql> select connection_id();
+-----------------+
| connection_id() |
+-----------------+
| 6 |
+-----------------+
1 row in set (0.00 sec)
mysql> alter table test add tname varchar(10); // rename table test to kkk 同样会引起Waiting for table metadata lock
会话连接(connection id=8)执行下面SQL语句,查询表test
mysql> select connection_id();
+-----------------+
| connection_id() |
+-----------------+
| 8 |
+-----------------+
1 row in set (0.00 sec)
mysql> select * from test;
查看线程的状态,你会发现被阻塞的会话处于 Waiting for table metadata lock状态。
mysql> show processlist;
+----+------+-----------+------+---------+------+---------------------------------+----------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+------+-----------+------+---------+------+---------------------------------+----------------------------------------+
| 6 | root | localhost | MyDB | Query | 19 | Waiting for table metadata lock | alter table test add tname varchar(10) |
| 8 | root | localhost | MyDB | Query | 6 | Waiting for table metadata lock | select * from test |
| 15 | root | localhost | NULL | Sleep | 8 | | NULL |
| 16 | root | localhost | NULL | Query | 0 | init | show processlist |
| 17 | root | localhost | MyDB | Query | 55 | User sleep | select name, sleep(100) from test |
+----+------+-----------+------+---------+------+---------------------------------+----------------------------------------+
5 rows in set (0.00 sec)
mysql>
参考资料:
https://www.percona.com/blog/2013/02/27/mysql-optimizer-analyze-table-and-waiting-for-table-flush/
http://www.cnblogs.com/jackhub/p/3841004.html
http://myrock.github.io/2014/11/20/mysql-waiting-for-table-flush/