DBA如何利用strace/pstack/gdb来定位问题

参考:https://database.51cto.com/art/202006/617923.htm

strace简介

strace是Linux环境下的一款程序调试工具,用来监察一个应用程序所使用的系统调用。

Strace是一个简单的跟踪系统调用执行的工具。在其最简单的形式中,它可以从开始到结束跟踪二进制的执行,并在进程的生命周期中输出一行具有系统调用名称,每个系统调用的参数和返回值的文本行。

常用选项

  •  -T:strace输出显示时间
  •  -t:可以在每行的输出之前添加时间戳,-tt可以精确到微妙级别,-ttt也可以精确到微妙级,但是它并不是打印当前时间,而是显示自从epoch以来的所经过的秒数
  •  -s:指定输出的字符串的最大长度,默认为32,如果输出到文件的话会全部输出
  •  -o:指定将strace输出到文件
  •  -ff:如果提供-o filename,则所有进程的跟踪结果输出到相应的filename.pid中,pid是各进程的进程号.
  •  -e:指定跟踪某个行为,例如-e trace=open指定只跟踪open行为
  •  -y:将文件句柄用文件路径代替显示

正确姿势

简单使用

  1. strace -T -tt -o /tmp/strace.log CMD  
  2. strace -T -tt CMD 2>&1 |tee /tmp/strace.log  
  3. strace -T -tt -s 100 -o /tmp/strace.log CMD  
  4. strace -T -tt -s 100 -ff -o /tmp/strace.log CMD  
  5. strace -T -tt -s 100 -e trace=XXXX -o /tmp/strace.log CMD 

使用案例

利用strace观察客户端client执行SQL

通过sys.processlist表中pid可以知道客户端连接pid  

mysql> select  thd_id,conn_id,user,pid,command,current_statement,program_name from  sys.processlist  where conn_id>0 and  pid >0;
+--------+---------+----------------+--------+---------+-------------------------------------------------------------------+--------------+
| thd_id | conn_id | user           | pid    | command | current_statement                                                 | program_name |
+--------+---------+----------------+--------+---------+-------------------------------------------------------------------+--------------+
| 218265 |  218238 | root@localhost | 103440 | Query   | select  thd_id,conn_id,user,pi ... t  where conn_id>0 and  pid >0 | mysql        |
+--------+---------+----------------+--------+---------+-------------------------------------------------------------------+--------------+
1 row in set (0.33 sec)
#session1
[root@localhost scripts]# strace -T -tt -s 100 -o /tmp/strace.log -p103440
strace: Process 103440 attached

#session2
mysql> use test
Database changed
mysql> select *from t1 limit 2;
+----+--------------------------------------+
| id | name                                 |
+----+--------------------------------------+
|  1 | 137b97ba-6de1-11eb-91cf-000c29a5b3f5 |
|  2 | 13d6586d-6de1-11eb-91cf-000c29a5b3f5 |
+----+--------------------------------------+
2 rows in set (0.00 sec)

查看strace文件

[root@localhost tmp]# vi /tmp/strace.log

11:37:35.114324 write(1, "\r\33[15G*from t1 limit 2;\33[K", 26) = 26 <0.000846>
11:37:35.115354 read(0, "\n", 1)        = 1 <0.624789>
11:37:35.740309 write(1, "\n", 1)       = 1 <0.000850>
11:37:35.741471 ioctl(0, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig icanon echo ...}) = 0 <0.000025>
11:37:35.741701 rt_sigprocmask(SIG_BLOCK, [HUP INT QUIT TERM CONT TSTP WINCH], [], 8) = 0 <0.000022>
11:37:35.741814 rt_sigaction(SIGINT, {0x409933, [INT], SA_RESTORER|SA_RESTART, 0x7f5d55d0c5f0}, NULL, 8) = 0 <0.000019>
11:37:35.741886 rt_sigaction(SIGTSTP, {SIG_DFL, [], SA_RESTORER, 0x7f5d55d0c5f0}, NULL, 8) = 0 <0.000059>
11:37:35.742227 rt_sigaction(SIGTERM, {SIG_DFL, [], SA_RESTORER, 0x7f5d55d0c5f0}, NULL, 8) = 0 <0.000017>
11:37:35.742423 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000018>
11:37:35.742500 ioctl(0, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig icanon echo ...}) = 0 <0.000022>
11:37:35.742888 times({tms_utime=29, tms_stime=17, tms_cutime=0, tms_cstime=0}) = 436033838 <0.000022>
11:37:35.742987 ioctl(3, FIONREAD, [0]) = 0 <0.000018>
11:37:35.743055 sendto(3, "\30\0\0\0\3select *from t1 limit 2", 28, 0, NULL, 0) = 28 <0.001870>
11:37:35.745629 recvfrom(3, "\1\0\0\1\2\"\0\0\2\3def\4test\2t1\2t1\2id\2id\f?\0\v\0\0\0\3\3B\0\0\0&\0\0\3\3def\4test\2t1\2t1\4name\4name\f!\0\375\377\2\0\374\20\0\0\0\0'\0\0\4\0011$137b97ba"..., 16384, 0, NULL, NULL) = 182 <0.000031>
11:37:35.745963 times({tms_utime=29, tms_stime=17, tms_cutime=0, tms_cstime=0}) = 436033838 <0.000014>
11:37:35.746031 write(1, "+----+--------------------------------------+\n", 46) = 46 <0.000447>
11:37:35.746524 write(1, "| id | name                                 |\n", 46) = 46 <0.000205>
11:37:35.746796 write(1, "+----+--------------------------------------+\n", 46) = 46 <0.000205>
11:37:35.747051 write(1, "|  1 | 137b97ba-6de1-11eb-91cf-000c29a5b3f5 |\n", 46) = 46 <0.000153>
11:37:35.747251 write(1, "|  2 | 13d6586d-6de1-11eb-91cf-000c29a5b3f5 |\n", 46) = 46 <0.000186>
11:37:35.747480 write(1, "+----+--------------------------------------+\n", 46) = 46 <0.000158>
11:37:35.747692 write(1, "2 rows in set (0.00 sec)\n", 25) = 25 <0.000210>
11:37:35.747943 write(1, "\n", 1)       = 1 <0.000146>
11:37:35.748142 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1207, ...}) = 0 <0.000022>
11:37:35.748238 ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0 <0.000018>
11:37:35.748295 ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0 <0.000013>
11:37:35.748348 rt_sigprocmask(SIG_BLOCK, [HUP INT QUIT TERM CONT TSTP WINCH], [], 8) = 0 <0.000017>
11:37:35.748413 rt_sigaction(SIGINT, {0x4b8097, [], SA_RESTORER, 0x7f5d55d0c5f0}, {0x409933, [INT], SA_RESTORER|SA_RESTART, 0x7f5d55d0c5f0}, 8) = 0 <0.000017>

利用strace观察server端执行


#命令  

[root@localhost tmp]# strace -o  /tmp/strace.log  -T -tt -f -p `pidof mysqld`
strace: Process 2075 attached with 40 threads

strace: Process 105607 attached
strace: Process 105612 attached
strace: Process 105619 attached
strace: Process 105624 attached
strace: Process 105631 attached
^Cstrace: Process 2075 detached


#假设你想看跟IO相关的行为,可以用-e trace来指定  
# mysql相关的IO操作:  
# mysql: read, write, open  
# innodb: pread64, pwrite64  
strace -o /tmp/strace.log -T -tt -f -e trace=read,open,write,pwrite64,pread64 -p `pidof mysqld` 

strace会在/tmp文件夹下生成strace.log.xxx,xxx对应的就是THREAD_OS_ID,我们直接看strace.log.8208文件

pstack简介

pstack用来跟踪进程栈,这个命令在排查进程问题时非常有用,比如我们发现一个服务一直处于work状态(如假死状态,好似死循环),使用这个命令就能轻松定位问题所在;

可以在一段时间内,多执行几次pstack,若发现代码栈总是停在同一个位置,那个位置就需要重点关注,很可能就是出问题的地方;

正确使用姿势

pstack使用起来非常简单,直接pstack -pid即可

这里分析下mysql夯住,登录不了的案例

pstack    2075  >1.log

举例如下:

#0  0x00007fc7c2b57a4d in recv () from /lib64/libpthread.so.0
#1  0x000000000047a3ae in inline_mysql_socket_recv (src_file=0x782550 "/dbfiles/soft/mysql-5.7.31/vio/viosocket.c", src_line=134, mysql_socket=..., buf=0xc0f460, n=16384, flags=0) at /dbfiles/soft/mysql-5.7.31/include/mysql/psi/mysql_socket.h:829
#2  0x000000000047a896 in vio_read (vio=0xc0aad0, buf=0xc0f460 "T", size=16384) at /dbfiles/soft/mysql-5.7.31/vio/viosocket.c:134
#3  0x000000000047aa0c in vio_read_buff (vio=0xc0aad0, buf=0xc13490 "\270", size=4) at /dbfiles/soft/mysql-5.7.31/vio/viosocket.c:177
#4  0x0000000000480103 in net_read_raw_loop (net=0xa2a200 <mysql>, count=4) at /dbfiles/soft/mysql-5.7.31/sql/net_serv.cc:694
#5  0x000000000048020b in net_read_packet_header (net=0xa2a200 <mysql>) at /dbfiles/soft/mysql-5.7.31/sql/net_serv.cc:784
#6  0x00000000004802c1 in net_read_packet (net=0xa2a200 <mysql>, complen=0x7ffdfaa4c550) at /dbfiles/soft/mysql-5.7.31/sql/net_serv.cc:844
#7  0x0000000000480451 in my_net_read (net=0xa2a200 <mysql>) at /dbfiles/soft/mysql-5.7.31/sql/net_serv.cc:921
#8  0x000000000045b0c2 in cli_safe_read_with_ok (mysql=0xa2a200 <mysql>, parse_ok=0 '\000', is_data_packet=0x0) at /dbfiles/soft/mysql-5.7.31/sql-common/client.c:1050
#9  0x000000000045b558 in cli_safe_read (mysql=0xa2a200 <mysql>, is_data_packet=0x0) at /dbfiles/soft/mysql-5.7.31/sql-common/client.c:1169
#10 0x000000000045f98f in cli_read_change_user_result (mysql=0xa2a200 <mysql>) at /dbfiles/soft/mysql-5.7.31/sql-common/client.c:2732
#11 0x0000000000462080 in run_plugin_auth (mysql=0xa2a200 <mysql>, data=0xc134b9 "\024f\022\320e~\f\023~ \213\026~\233\027\231\210p\264\201\066mysql_native_password", data_len=21, data_plugin=0xc134ce "mysql_native_password", db=0x0) at /dbfiles/soft/mysql-5.7.31/sql-common/client.c:3934
#12 0x000000000046494b in mysql_real_connect (mysql=0xa2a200 <mysql>, host=0x77e9d4 "localhost", user=0xbe71f0 "root", passwd=0xbe7220 "123456", db=0x0, port=0, unix_socket=0xbe71a0 "/dbfiles/mysql_home/data/mysql.sock", client_flag=66560) at /dbfiles/soft/mysql-5.7.31/sql-common/client.c:4620
#13 0x0000000000411884 in sql_real_connect (host=0x0, database=0x0, user=0xbe71f0 "root", password=0xbe7220 "123456", silent=0) at /dbfiles/soft/mysql-5.7.31/client/mysql.cc:5044
#14 0x0000000000411c4e in sql_connect (host=0x0, database=0x0, user=0xbe71f0 "root", password=0xbe7220 "123456", silent=0) at /dbfiles/soft/mysql-5.7.31/client/mysql.cc:5183
#15 0x000000000040904c in main (argc=5, argv=0xbe7408) at /dbfiles/soft/mysql-5.7.31/client/mysql.cc:1330

#15 0x000000000040904c in main (argc=5, argv=0xbe7408) at /dbfiles/soft/mysql-5.7.31/client/mysql.cc:1330

我们打开源码mysql.cc 对应的1330行:

 glob_buffer.mem_realloc(512);
  completion_hash_init(&ht, 128);
  init_alloc_root(PSI_NOT_INSTRUMENTED, &hash_mem_root, 16384, 0);
  memset(&mysql, 0, sizeof(mysql));
  if (sql_connect(current_host,current_db,current_user,opt_password,
		  opt_silent))
  {
    quick= 1;					// Avoid history
    status.exit_status= 1;
    mysql_end(-1);
  }
  if (!status.batch)
    ignore_errors=1;				// Don't abort monitor

1330行: if (sql_connect(current_host,current_db,current_user,opt_password,
          opt_silent))

接着查看sql_connect函数

sql_connect(char *host,char *database,char *user,char *password,uint silent)
{
  bool message=0;
  uint count=0;
  int error;
  for (;;)
  {
    if ((error=sql_real_connect(host,database,user,password,wait_flag)) >= 0)
    {
      if (count)
      {
	tee_fputs("\n", stderr);
	(void) fflush(stderr);
      }
      return error;
    }
    if (!wait_flag)
      return ignore_errors ? -1 : 1;
    if (!message && !silent)
    {
      message=1;
      tee_fputs("Waiting",stderr); (void) fflush(stderr);
    }
    (void) sleep(wait_time);
    if (!silent)
    {
      putc('.',stderr); (void) fflush(stderr);
      count++;
    }
  }
}

可发现目前mysql 客户端正在进行登录验证。

 

正常情况下,mysql登录如下:

#0  0x00007efe3b0b4700 in __read_nocancel () from /lib64/libpthread.so.0
#1  0x00000000004b645e in read_char (el=0x212ef40, cp=0x7ffc117afc00 L"t\x9000000") at /dbfiles/soft/mysql-5.7.31/cmd-line-utils/libedit/libedit-20190324-3.1/src/read.c:288
#2  0x00000000004b6703 in el_wgetc (el=0x212ef40, cp=0x7ffc117afc00 L"t\x9000000") at /dbfiles/soft/mysql-5.7.31/cmd-line-utils/libedit/libedit-20190324-3.1/src/read.c:410
#3  0x00000000004b6279 in read_getcmd (el=0x212ef40, cmdnum=0x7ffc117afc07 "\t", ch=0x7ffc117afc00 L"t\x9000000") at /dbfiles/soft/mysql-5.7.31/cmd-line-utils/libedit/libedit-20190324-3.1/src/read.c:234
#4  0x00000000004b6a4f in el_wgets (el=0x212ef40, nread=0x7ffc117afc74) at /dbfiles/soft/mysql-5.7.31/cmd-line-utils/libedit/libedit-20190324-3.1/src/read.c:523
#5  0x00000000004aaa45 in el_gets (el=0x212ef40, nread=0x7ffc117afc74) at /dbfiles/soft/mysql-5.7.31/cmd-line-utils/libedit/libedit-20190324-3.1/src/eln.c:75
#6  0x0000000000499fef in readline (p=0x2154090 "mysql> ") at /dbfiles/soft/mysql-5.7.31/cmd-line-utils/libedit/libedit-20190324-3.1/src/readline.c:449
#7  0x000000000040a973 in read_and_execute (interactive=true) at /dbfiles/soft/mysql-5.7.31/client/mysql.cc:2322
#8  0x0000000000409617 in main (argc=5, argv=0x210b408) at /dbfiles/soft/mysql-5.7.31/client/mysql.cc:1454
 if (!mysql_get_option(&mysql, MYSQL_OPT_PROTOCOL, &protocol) &&
      !mysql_get_option(&mysql, MYSQL_OPT_SSL_MODE, &ssl_mode))
  {
    if (protocol == MYSQL_PROTOCOL_SOCKET &&
        ssl_mode >= SSL_MODE_REQUIRED)
      put_info("You are enforcing ssl conection via unix socket. Please consider\n"
               "switching ssl off as it does not make connection via unix socket\n"
               "any more secure.", INFO_INFO);
  }
  status.exit_status= read_and_execute(!status.batch);
  if (opt_outfile)
    end_tee();
  mysql_end(0);
  DBUG_RETURN(0);				// Keep compiler happy
}

  status.exit_status= read_and_execute(!status.batch);
等待输入命令。

gdb简介

GDB 是 linux 环境下的一般功能强大的调试器,用来调试 C 或 C++ 写的程序。它可以做这些事情

  •  Start your program, specifying anything that might affect its behavior.
  •  Make your program stop on specified conditions.
  •  Examine what has happened, when your program has stopped.
  •  Change things in your program, so you can experiment with correcting the effects of one bug and go on to learn about another.

常用命令

  •  info threads:查看全部线程
  •  thread n:指定某个线程
  •  b:在某处打断点
  •  c:继续往下走
  •  s:执行一行代码,如果代码函数调用,则进入函数
  •  n:执行一行代码,函数调用不进入
  •  p:打印某个变量值
  •  list:打印代码的文本信息
  •  bt:查看某个线程的栈帧
  •  info b:查看当前所有断点信息

实际案例

进入gdb界面

[root@localhost mysql_home]# cat /dbfiles/source/debug.file

break main

run --defaults-file=/etc/my.cnf --user=mysql --gdb

gdb   -x /dbfiles/source/debug.file    /usr/local/bin/mysqld

(gdb) info thread
  Id   Target Id         Frame 
  13   Thread 0x7fffb0ff9700 (LWP 1818) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  12   Thread 0x7fffb17fa700 (LWP 1817) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  11   Thread 0x7fffb1ffb700 (LWP 1816) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  10   Thread 0x7fffb27fc700 (LWP 1815) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  9    Thread 0x7fffb2ffd700 (LWP 1814) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  8    Thread 0x7fffb37fe700 (LWP 1813) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  7    Thread 0x7fffb3fff700 (LWP 1812) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  6    Thread 0x7fffc0e85700 (LWP 1811) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5    Thread 0x7fffc1686700 (LWP 1810) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4    Thread 0x7fffc1e87700 (LWP 1809) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3    Thread 0x7fffc2688700 (LWP 1808) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2    Thread 0x7fffece5d700 (LWP 1807) "mysqld" 0x00007ffff62f153a in sigwaitinfo () from /lib64/libc.so.6
* 1    Thread 0x7ffff7fe3880 (LWP 1803) "mysqld" 0x00007ffff7bcde5d in nanosleep () from /lib64/libpthread.so.0

 

上一篇:栈(C语言版,尾插法,使用头节点)


下一篇:算法笔记--栈