1 现象,监控显示某从库存在从库延时,登录现场查看发现 从库的io和sql线程状态为NO,从而导致从库延时报警,start slave后等待延时追上,则拓扑恢复正常
2 排查
2.1 查看error log
2021-08-31T09:14:03.272155+08:00 0 [System] [MY-010116] [Server] /usr/local/mysql-8.0.20/bin/mysqld (mysqld 8.0.20) starting as process 21742 2021-08-31T09:14:03.311179+08:00 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started. 2021-08-31T09:14:16.080741+08:00 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended. 2021-08-31T09:14:22.503344+08:00 0 [System] [MY-010229] [Server] Starting XA crash recovery... 2021-08-31T09:14:22.518475+08:00 0 [System] [MY-010232] [Server] XA crash recovery finished. 2021-08-31T09:14:25.770677+08:00 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed. 2021-08-31T09:14:29.231424+08:00 0 [System] [MY-010931] [Server] /usr/local/mysql-8.0.20/bin/mysqld: ready for connections. Version: ‘8.0.20‘ socket: ‘/tmp/mysql5520.sock‘ port: 5520 MySQL Community Server - GPL. 2021-08-31T10:26:38.687781+08:00 7390 [Warning] [MY-010897] [Repl] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the ‘START SLAVE Syntax‘ in the MySQL Manual for more information. 2021-08-31T10:26:38.693608+08:00 7390 [System] [MY-010562] [Repl] Slave I/O thread for channel ‘‘: connected to master ‘replica@10.79.48.27:5520‘,replication started in log ‘mysql-bin.007688‘ at position 702634674
可以发现 mysqld进程重启了,重启后不会自动start slave,从而导致的从库延时,那为什么会重启呢
2.2 查看 系统日志 /var/log/message
Aug 31 09:13:58 dbl-10-190-1-153 dockerd: time="2021-08-31T09:13:58+08:00" level=info msg="TaskOOM event &TaskOOM{ContainerID:1900b6d67507689306ebf3c49f9f02fb8b6d6e6da82d810f75b6d1112ceb029b,}" Aug 31 09:13:58 dbl-10-190-1-153 dockerd: time="2021-08-31T09:13:58+08:00" level=info msg="TaskOOM event &TaskOOM{ContainerID:c57ea4e777e3ccdac098b018d7bf828477ca88822e2ec6f3b26a05ad450f1ffd,}" Aug 31 09:13:58 dbl-10-190-1-153 kernel: mysqld invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=999 Aug 31 09:13:58 dbl-10-190-1-153 kernel: mysqld cpuset=docker-c57ea4e777e3ccdac098b018d7bf828477ca88822e2ec6f3b26a05ad450f1ffd.scope mems_allowed=0 Aug 31 09:13:58 dbl-10-190-1-153 kernel: CPU: 1 PID: 14616 Comm: mysqld Kdump: loaded Tainted: G ------------ T 3.10.0-957.21.3.el7.x86_64 #1 Aug 31 09:13:58 dbl-10-190-1-153 kernel: Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014 Aug 31 09:13:58 dbl-10-190-1-153 kernel: Call Trace: Aug 31 09:13:58 dbl-10-190-1-153 kernel: [<ffffffff8e163107>] dump_stack+0x19/0x1b Aug 31 09:13:58 dbl-10-190-1-153 kernel: [<ffffffff8e15db2a>] dump_header+0x90/0x229 Aug 31 09:13:58 dbl-10-190-1-153 kernel: [<ffffffff8dbba386>] ? find_lock_task_mm+0x56/0xc0 Aug 31 09:13:58 dbl-10-190-1-153 kernel: [<ffffffff8dbba834>] oom_kill_process+0x254/0x3d0 Aug 31 09:13:58 dbl-10-190-1-153 kernel: [<ffffffff8dc35586>] mem_cgroup_oom_synchronize+0x546/0x570 Aug 31 09:13:58 dbl-10-190-1-153 kernel: [<ffffffff8dc34a00>] ? mem_cgroup_charge_common+0xc0/0xc0 Aug 31 09:13:58 dbl-10-190-1-153 kernel: [<ffffffff8dbbb0c4>] pagefault_out_of_memory+0x14/0x90 Aug 31 09:13:58 dbl-10-190-1-153 kernel: [<ffffffff8e15c032>] mm_fault_error+0x6a/0x157 Aug 31 09:13:58 dbl-10-190-1-153 kernel: [<ffffffff8e1707c8>] __do_page_fault+0x3c8/0x4f0 Aug 31 09:13:58 dbl-10-190-1-153 kernel: [<ffffffff8e1709d6>] trace_do_page_fault+0x56/0x150 Aug 31 09:13:58 dbl-10-190-1-153 kernel: [<ffffffff8e16ff62>] do_async_page_fault+0x22/0xf0 Aug 31 09:13:58 dbl-10-190-1-153 kernel: [<ffffffff8e16c798>] async_page_fault+0x28/0x30 Aug 31 09:13:58 dbl-10-190-1-153 kernel: Task in /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0f6af85c_ba33_11eb_82a0_246e96b7eb9c.slice/docker-c57ea4e777e3ccdac098b018d7bf828477ca88822e2ec6f3b26a05ad450f1ffd.scope killed as a result of limit of /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0f6af85c_ba33_11eb_82a0_246e96b7eb9c.slice Aug 31 09:13:58 dbl-10-190-1-153 kernel: memory: usage 27525120kB, limit 27525120kB, failcnt 1442269476 Aug 31 09:13:58 dbl-10-190-1-153 kernel: memory+swap: usage 27525120kB, limit 9007199254740988kB, failcnt 0 Aug 31 09:13:58 dbl-10-190-1-153 kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0 Aug 31 09:13:58 dbl-10-190-1-153 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0f6af85c_ba33_11eb_82a0_246e96b7eb9c.slice: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB Aug 31 09:13:58 dbl-10-190-1-153 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0f6af85c_ba33_11eb_82a0_246e96b7eb9c.slice/docker-1900b6d67507689306ebf3c49f9f02fb8b6d6e6da82d810f75b6d1112ceb029b.scope: cache:0KB rss:392KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:392KB inactive_file:0KB active_file:0KB unevictable:0KB Aug 31 09:13:58 dbl-10-190-1-153 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0f6af85c_ba33_11eb_82a0_246e96b7eb9c.slice/docker-c57ea4e777e3ccdac098b018d7bf828477ca88822e2ec6f3b26a05ad450f1ffd.scope: cache:384KB rss:27524344KB rss_huge:0KB mapped_file:32KB swap:0KB inactive_anon:0KB active_anon:27524268KB inactive_file:132KB active_file:176KB unevictable:0KB Aug 31 09:13:58 dbl-10-190-1-153 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name Aug 31 09:13:58 dbl-10-190-1-153 kernel: [14214] 1001 14214 667 81 6 0 -998 pod Aug 31 09:13:58 dbl-10-190-1-153 kernel: [29009] 0 29009 2906 156 11 0 999 mysqlha_start.s Aug 31 09:13:58 dbl-10-190-1-153 kernel: [29113] 0 29113 2906 94 10 0 999 mysqlha_start.s Aug 31 09:13:58 dbl-10-190-1-153 kernel: [29114] 0 29114 2941 170 12 0 999 mysqld_safe Aug 31 09:13:58 dbl-10-190-1-153 kernel: [ 5686] 1001 5686 8262680 6878033 13929 0 999 mysqld Aug 31 09:13:58 dbl-10-190-1-153 kernel: Memory cgroup out of memory: Kill process 16021 (mysqld) score 2000 or sacrifice child Aug 31 09:13:58 dbl-10-190-1-153 kernel: Killed process 5686 (mysqld) total-vm:33050720kB, anon-rss:27508392kB, file-rss:3740kB, shmem-rss:0kB
通过系统日志可以发现,是发生了OOM导致,Memory cgroup out of memory: Kill process 16021 (mysqld) score 2000 or sacrifice child,oom导致系统主动kill了分值最高的mysqld进程,检测到mysqld进程挂掉后,mysqld_safe将mysqld重新拉起,
2.3 为何会发生oom
此mysql实例是运行在pod中的,从
Task in /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0f6af85c_ba33_11eb_82a0_246e96b7eb9c.slice/docker-c57ea4e777e3ccdac098b018d7bf828477ca88822e2ec6f3b26a05ad450f1ffd.scope killed as a result of limit of /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0f6af85c_ba33_11eb_82a0_246e96b7eb9c.slice memory: usage 27525120kB, limit 27525120kB, failcnt 1442269476
可以看出,oom的原因是由于此pod达到了 cgroup的limit 引发的oom
[root@dbl-xx-xx-xx-xx.dbnorth-1.node.kubernetes /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0f6af85c_ba33_11eb_82a0_246e96b7eb9c.slice]# cat memory.limit_in_bytes 28185722880 [root@dbl-xx-xx-xx-xx.dbnorth-1.node.kubernetes /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0f6af85c_ba33_11eb_82a0_246e96b7eb9c.slice/docker-c57ea4e777e3ccdac098b018d7bf828477ca88822e2ec6f3b26a05ad450f1ffd.scope]# cat memory.limit_in_bytes 28185722880
可以看到pod整体和pod内的一个容器的mem limit 都是26.25G,message中的limit 27525120kB 也是26.25G,基本确定就是cgroup limit值太小导致的oom了
2.4 为什么limit这么低,实际来说此limit值不应该是26.25G,线上的配置的46G左右,这个值的同步是通过脚本实时写入cgroup的,
原因是cgroup中mem设置有以下两个限制,不满足限制的配置会无效
1,对于pod而言,存在pod级别的mem限制,具体到例子里是在 /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0f6af85c_ba33_11eb_82a0_246e96b7eb9c.slice/memory.limit_in_bytes文件里,对于容器而言,存在容器级别的mem限制,具体到例子里是/sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0f6af85c_ba33_11eb_82a0_246e96b7eb9c.slice/docker-c57ea4e777e3ccdac098b018d7bf828477ca88822e2ec6f3b26a05ad450f1ffd.scope/memory.limit_in_bytes 文件里,显然pod级别的mem limit要大于容器级别的mem limit 。
2,对于linux而言,cgroup中存在 memory.limit_in_bytes 和 memory.memsw.limit_in_bytes 两个文件,其中memory.limit_in_bytes 指的是物理内存的限制,而有memsw的文件指的是物理内存加交换区(即swap)的限制,所有又有memory.memsw.limit_in_bytes值必须大于memory.limit_in_bytes值
由于存在以上两个规则的限制,原始的同步脚本没有去修改memory.memsw.limit_in_bytes的值,这就导致了如果需要调大mem limit值,会出现失败,具体报错是invalid argument ,如果手动echo会报错-bash: echo: write error: Invalid argument
3 解决,修改同步脚本,修改cgroup中的值时 先修改 memory.memsw.limit_in_bytes 再修改memory.limit_in_bytes
4 总结,简单的问题部深入追查,可能就会漏过一个大bug,长久下去迟早会引发故障
ps顺便看下mem cgroup几个重要文件的含义(其余的遇到再说)
root root 0 May 22 02:50 cgroup.clone_children root root 0 May 22 02:50 cgroup.event_control root root 0 Aug 8 17:22 cgroup.procs root root 0 May 22 02:50 memory.failcnt 统计量-内存发生缺页的次数 root root 0 May 22 02:50 memory.force_empty 配置文件,不可读文件,echo 0进入此文件时,立即触发内存回收 root root 0 May 22 02:50 memory.kmem.failcnt kmem 意味着这是内核mem参数 root root 0 May 22 02:50 memory.kmem.limit_in_bytes root root 0 May 22 02:50 memory.kmem.max_usage_in_bytes root root 0 May 22 02:50 memory.kmem.slabinfo root root 0 May 22 02:50 memory.kmem.tcp.failcnt root root 0 May 22 02:50 memory.kmem.tcp.limit_in_bytes root root 0 May 22 02:50 memory.kmem.tcp.max_usage_in_bytes root root 0 May 22 02:50 memory.kmem.tcp.usage_in_bytes root root 0 May 22 02:50 memory.kmem.usage_in_bytes root root 0 Aug 31 11:56 memory.limit_in_bytes 参数-内存使用量的限制(物理内存) root root 0 Aug 31 11:52 memory.max_usage_in_bytes 统计值-内存最大使用量 root root 0 May 22 02:50 memory.memsw.failcnt 统计值-内存发生缺页的次数(物理内存+swap) root root 0 Aug 31 11:56 memory.memsw.limit_in_bytes 参数-内存使用量的限制(物理内存+swap) root root 0 May 22 02:50 memory.memsw.max_usage_in_bytes 统计值-内存最大使用量(物理内存+swap) root root 0 May 22 02:50 memory.memsw.usage_in_bytes 统计值-内存当前使用量(物理内存+swap) root root 0 May 22 02:50 memory.move_charge_at_immigrate 参数-进程迁移cgroup后,内存资源是否迁移,为1,代表进程所占资源迁移至新cgroup;为0,已用mem资源不迁移,新增mem资源在新cgroup中申请 root root 0 May 22 02:50 memory.numa_stat root root 0 May 22 02:50 memory.oom_control 参数-oom控制器,为1,禁用oom ;为0,可以oom root root 0 May 22 02:50 memory.pressure_level 配置文件-此文件不可读,用于接受时间,与内存压力等级有关 root root 0 May 22 02:50 memory.soft_limit_in_bytes 参数-内存使用量的软限制,超限后不回收,整体内存不足时优先回收超限的线程 root root 0 May 22 02:50 memory.stat 统计值-内存使用报告 root root 0 May 22 02:50 memory.swappiness 参数-内核使用swap的倾向,范围在0~100之间 root root 0 May 22 02:50 memory.usage_in_bytes 统计值-内存当前使用量 root root 0 May 22 02:50 memory.use_hierarchy 参数-为0时,代表不同层次资源限制和使用均独立,为1时,子系统使用和限制计入父系统 root root 0 May 22 02:50 notify_on_release root root 0 May 22 02:50 tasks