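Recently I ran into a new case: a SQL statement returned nothing after being executed, and the packet capture did not appear to show the query being sent at all (when I actually reproduced it, the packets were captured). The mysql client's startup speed also differed with and without -A. Without -A, the client builds a local cache of database, table and column names for name completion. With -A (--no-auto-rehash), it skips that step.
The analysis itself turned out not to be hard. The main purpose of this post is to share a few small techniques for analyzing this kind of problem.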
1. The SQL has to run in a loop, so first set up password-free login. With a ~/.my.cnf like the one below, a plain mysql -A logs in directly:
[root@Ad****s-143 ~]# cat .my.cnf
[client]
host=rm-t********e3eo.mysql.si****re.rds.aliyuncs.com
user='p***b'
password='vk7m*****x%ta'
[root@Ad****s-143 ~]# mysql -A
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 12571135
Server version: 5.6.16-log Source distribution
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
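If you would rather script step 1, here is a minimal sketch. `write_my_cnf` is a hypothetical helper, not part of the MySQL tooling. It writes a `[client]` section like the one above and restricts the file to its owner, because on Unix the mysql client ignores option files that are world-writable. Quoting the values, as the original file does, keeps characters such as `%` or `#` intact; a password that itself contains a single quote would need different handling.

```shell
# write_my_cnf PATH HOST USER PASSWORD  (hypothetical helper)
# Writes a [client] option file for password-free mysql logins.
write_my_cnf() {
  local cnf_path="$1" host="$2" user="$3" password="$4"
  (
    # umask 077 inside a subshell: a newly created file is owner-only,
    # and the caller's umask is left untouched
    umask 077
    cat > "$cnf_path" <<EOF
[client]
host=$host
user='$user'
password='$password'
EOF
  )
  # umask only affects new files, so also tighten an existing one
  chmod 600 "$cnf_path"
}

# Example (placeholder values):
#   write_my_cnf ~/.my.cnf rm-example.mysql.rds.aliyuncs.com app_user 'secret'
```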
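2. Set up the SQL loop and capture packets (construct a distinctive SQL statement and trace what the mysql command-line client does).
For the packet capture, this loop is enough: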
# for i in {1..100};do echo $i;mysql -A -e "select guid, name, 0 from pa.industry limit $i ;";sleep 1s;done
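This runs 100 rounds. Each round prints its round number, runs the SQL through mysql with LIMIT set to the loop variable i (which makes each round's statement unique and easy to find in a capture), and then waits 1 second.
For strace tracing, use this instead (with -ff and -o m.out, strace writes one output file per process, m.out.<pid>):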
# for i in {1..100};do echo $i;strace -F -ff -t -tt -s 4096 -o m.out mysql -A -e "select guid, name, 0 from pa.industry limit $i ;";sleep 1s;done
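The loops above stop only when you notice a hang and press Ctrl-C. A small variation, sketched below under the assumption that GNU coreutils `timeout` is available, gives each run a time budget and reports the first iteration that exceeded it. `find_hang` is a hypothetical helper name: it appends the loop counter as the last argument of whatever command you pass, and treats exit status 124 (what `timeout` returns when it had to kill the command) as a hang. Unlike the original loop, it discards the query output and does not pause between rounds.

```shell
# find_hang MAX SECONDS CMD [ARGS...]  (hypothetical helper)
# Runs "CMD ARGS... i" for i = 1..MAX, allowing each run SECONDS to finish,
# and prints the first i whose run had to be killed by timeout(1).
find_hang() {
  local max="$1" secs="$2" i=1 rc
  shift 2
  while [ "$i" -le "$max" ]; do
    rc=0
    # timeout(1) exits with 124 when the command did not finish in time
    timeout "$secs" "$@" "$i" >/dev/null 2>&1 || rc=$?
    if [ "$rc" -eq 124 ]; then
      echo "$i"
      return 0
    fi
    i=$((i + 1))
  done
  # no run exceeded the budget
  return 1
}

# Example (same query as the loop above; the 10 s budget is arbitrary):
#   find_hang 100 10 sh -c 'mysql -A -e "select guid, name, 0 from pa.industry limit $1 ;"' sh
```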
3. Reproduce the problem and analyze it
Output:
[root@Ad*****s-143 ~]# for i in {1..100};do echo $i;mysql -A -e "select guid, name, 0 from pa.industry limit $i ;";sleep 1s;done
1
+--------------------------------------+------+---+
| guid | name | 0 |
+--------------------------------------+------+---+
| 965EADB8-C88E-83B2-325C-0DD04D5612DA | ???? | 0 |
+--------------------------------------+------+---+
2
+--------------------------------------+------+---+
| guid | name | 0 |
+--------------------------------------+------+---+
| 965EADB8-C88E-83B2-325C-0DD04D5612DA | ???? | 0 |
| 0DFECEC8-9348-75F1-5B7C-2EE050FB0186 | ???? | 0 |
+--------------------------------------+------+---+
...... (output of the intermediate rounds omitted)
29
+--------------------------------------+---------+---+
| guid | name | 0 |
+--------------------------------------+---------+---+
| 965EADB8-C88E-83B2-325C-0DD04D5612DA | ???? | 0 |
| 0DFECEC8-9348-75F1-5B7C-2EE050FB0186 | ???? | 0 |
| 4184E44C-E829-E3F3-5D75-1B488B3953A6 | ?? | 0 |
| 7FB961D6-EE07-2E67-447B-E1DDB2C2A2E0 | ?? | 0 |
| F7D1588D-9961-E01E-4C3E-228752504C0C | ???? | 0 |
| 07AD88F0-4546-10B1-040F-89C1773E2C52 | ?? | 0 |
| 451DB2F6-170B-7BEC-5DF9-2C3643204CA8 | ?? | 0 |
| FC67E100-F77B-165A-3202-23DF76BB1120 | ?? | 0 |
| 4C5AEA5F-AD45-E04B-59A8-22AC3CC6BDF9 | ???? | 0 |
| BB3EF523-762E-D38A-7ACC-CBAEA027A2E2 | ???? | 0 |
| 0C331B61-A650-4178-9153-2FAD8402492B | ???? | 0 |
| A85AD107-8A6D-22DD-4854-86DB0AD5A0E1 | ???? | 0 |
| 758D3D24-8725-8FD2-21DC-E58CE7F790B0 | ???? | 0 |
| 1D0DE78E-DE69-8DEA-187F-80762F918CAF | ?????? | 0 |
| 86820362-D8FB-BF3E-D1AE-BEA9F22DF131 | ???? | 0 |
| 4561EF83-C1F9-3EC3-EB5B-B829D8E1B652 | ???? | 0 |
| F552AF90-792C-39CC-E201-CD97C9681A38 | ???? | 0 |
| A8B0CEDA-5B2B-A231-4414-5EA41E37B680 | ???? | 0 |
| AA5E6908-9DF6-17CD-EEB8-4EB877A65F80 | ???? | 0 |
| 220B5BDD-019B-B13D-4518-259A9BF33A84 | ???? | 0 |
| 00F55DEA-15D1-BFC0-EF96-8FA57464A036 | ????? | 0 |
| 9877703F-B2C1-E5BE-2A85-CF76CE944FC8 | ???? | 0 |
| BCB3CB52-37FB-2F99-C330-FBDC4C7E5949 | ???? | 0 |
| 5CF8E33B-0936-C2C4-0763-3880943D1461 | ???? | 0 |
| AF5941E7-56D5-4D56-B24D-8DA026A76B49 | ???? | 0 |
| E204689F-A318-E53E-FDF2-7FB942CA4D80 | ???? | 0 |
| BAB81C78-BC1C-33F5-81A5-CE9811F4F4E6 | ??????? | 0 |
| 6BD47A46-0CDE-5B95-1701-560BF2A8BBB7 | ???? | 0 |
| C098145A-F21B-033B-52D6-35A4BD7C83A4 | ???? | 0 |
+--------------------------------------+---------+---+
30
^CCtrl-C -- sending "KILL QUERY 12569784" to server ...
Ctrl-C -- query aborted.
^CCtrl-C -- sending "KILL 12569784" to server ...
Ctrl-C -- query aborted.
^CCtrl-C -- exit!
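4. The strace output shows that the statement sent was the one with limit 33 (the problem was reproduced several times and the evidence kept comes from different runs, so don't worry about the round numbers not lining up).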
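Picking the right trace out of dozens of `m.out.<pid>` files by hand is tedious. A minimal sketch, assuming GNU grep and that the client sends the query with `sendto()` or `write()` (a given client build may differ, so adjust the pattern if nothing matches): `last_sent_sql` is a hypothetical helper that prints the most recent such call carrying a SELECT across all trace files. It relies on the timestamp prefix from `-t`/`-tt` being fixed-width, so a plain text sort is chronological as long as all traces come from the same day.

```shell
# last_sent_sql FILE...  (hypothetical helper)
# Prints the latest sendto()/write() line whose payload contains a SELECT.
last_sent_sql() {
  # -h: no file-name prefix, -i: match SELECT or select
  # sort orders the lines by their leading timestamp, tail keeps the newest
  grep -hiE '(sendto|write)\([0-9]+, ".*select ' "$@" | sort | tail -n 1
}

# Example:
#   last_sent_sql m.out.*
```

In the hung run, the last line of that same `m.out.<pid>` file (`tail -n 1`) is typically a `poll()` or `recvfrom()` on the MySQL socket that never completed, i.e. the client is still waiting for the result.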
5. In the Wireshark capture, the server had acknowledged (ACKed) the packet carrying the client's query, but no response came back.
(Screenshot: capture of the query that got no response)
(Screenshot: capture of a query that returned normally)
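6. Logging into MySQL and checking the processlist turned up another clue. The server had ACKed the query and the client never received a response, yet the server shows that session in the Sleep state. So the server had already executed the query, written the response, and gone back to waiting for the next command, and from its point of view the client had not closed the connection. Sleep only tells you the server handed the result to its TCP stack, not that the client received it, so the response was most likely lost somewhere on the path in between.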
MySQL sessions:
mysql> select * from INFORMATION_SCHEMA.PROCESSLIST where HOST like '101.*.*.143%';
+----------+--------+-------------------+------+---------+------+-------+------+
| ID       | USER   | HOST              | DB   | COMMAND | TIME | STATE | INFO |
+----------+--------+-------------------+------+---------+------+-------+------+
| 12569201 | pa_web | 101.*.*.143:41580 | NULL | Sleep   |  734 |       | NULL |
| 12569784 | pa_web | 101.*.*.143:53000 | NULL | Sleep   |  161 |       | NULL |  <- this one (same ID as "KILL QUERY 12569784" above)
+----------+--------+-------------------+------+---------+------+-------+------+
(Screenshot: TCP connections on the ECS instance)
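To match the hung client to its server-side session without eyeballing the table, you can pull the processlist in batch mode and filter it. The sketch below assumes the tab-separated output of `mysql -N -B` (no column names, one row per line) with the columns selected explicitly in the order id, user, host, db, command, time, state, info; `stale_sessions` is a hypothetical helper and the idle threshold is your choice. The rows it prints are sessions from the client host that the server considers idle, which is the state session 12569784 was in.

```shell
# stale_sessions IP MIN_SECONDS  (hypothetical helper)
# Reads tab-separated processlist rows (id, user, host, db, command, time,
# state, info) on stdin and prints "id host time" for every session from IP
# that has been in the Sleep state for at least MIN_SECONDS.
stale_sessions() {
  awk -F '\t' -v ip="$1" -v min="$2" '
    # HOST is "ip:port"; requiring it to start with "IP:" keeps
    # 10.0.0.14 from matching 10.0.0.143
    index($3, ip ":") == 1 && $5 == "Sleep" && $6 + 0 >= min + 0 {
      print $1, $3, $6
    }'
}

# Example (CLIENT_IP is a placeholder for the client's public address):
#   mysql -A -N -B -e "select id, user, host, db, command, time, state, info
#     from information_schema.processlist" | stale_sessions "$CLIENT_IP" 60
```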
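7. The client is in mainland China and the RDS instance is in Singapore. The suspicion is that some hop on the cross-border international link is faulty and dropped the packets. For cross-border access, consider using Express Connect (高速通道) to connect the two networks privately and call RDS over the internal network.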