故障现象
2016.1.1号早上4点左右,zabbi数据库服务器报警,写入数据失败。登陆机器后检查发现磁盘空间使用95%没有用满,进去zabbix数据库,执行insert命令提示错误“errir 1030(HY000):got error 28 from storage engine”.
前提
zabbix数据库由于超大的写入量,我们使用tokudb存储引擎来存储,此引擎有强大的压缩比,写入性能也非常不错,适合zabbix数据库场景。
故障调查
1)检查错误日志,发现有如下
Version: '5.6.22-72.0-log' socket: '/tmp/mysql.socket' port: 3306 Percona Server (GPL), Release 72.0, Revision 738
Sun Dec 27 06:18:58 2015 TokuFT file system space is low
Sun Dec 27 06:22:58 2015 TokuFT file system space is low
Sun Dec 27 06:26:43 2015 TokuFT file system space is low
Sun Dec 27 06:30:48 2015 TokuFT file system space is low
Sun Dec 27 06:34:48 2015 TokuFT file system space is low
Sun Dec 27 06:38:43 2015 TokuFT file system space is low
Fri Jan 1 03:57:56 2016 TokuFT file system space is really low and access is restricted
Fri Jan 1 04:25:56 2016 TokuFT file system space is really low and access is restricted
Fri Jan 1 05:52:07 2016 TokuFT file system space is really low and access is restricted
Fri Jan 1 07:33:47 2016 TokuFT file system space is really low and access is restricted
在3.57的时候开始报“Fri Jan 1 03:57:56 2016 TokuFT file system space is really low and access is restricted”错误。翻译一下就是说系统磁盘空间不足了,请求被拒绝。这个时间和DB写入失败时间一致。
2)查看percona官方文档,发现有一个变量是控制磁盘剩余空间检查的
variable tokudb_fs_reserve_percent This variable controls the percentage of the file system that must be available for inserts to be allowed. By default, this is set to 5. We recommend that this reserve be at least half the size of your physical memory. See Full Disks for more information.
看到默认设置是5,也就是说磁盘剩余可用空间低于5%的时候,拒绝写入,直到释放出更多的空间
3)进一步查看full disk information,得到一个信息,“TokuDB polls the file system every five seconds to determine how much free space is available”每5秒钟去检测一次磁盘空间。
Details about the disk system:
There is a free-space reserve requirement, which is a user-configurable parameter given as a percentage of the total space in the file system. The default reserve is five percent. This value is available in the global variable tokudb_fs_reserve_percent. We recommend that this reserve be at least half the size of your physical memory.
TokuDB polls the file system every five seconds to determine how much free space is available. If the free space dips below the reserve, then further table inserts are prohibited. Any transaction that attempts to insert rows will be aborted. Inserts are re-enabled when twice the reserve is available in the file system (so freeing a small amount of disk storage will not be sufficient to resume inserts). Warning messages are sent to the system error log when free space dips below twice the reserve and again when free space dips below the reserve.
Even with inserts prohibited it is still possible for the file system to become completely full. For example this can happen because another storage engine or another application consumes disk space.
If the file system becomes completely full, then TokuDB will freeze. It will not crash, but it will not respond to most SQL commands until some disk space is made available. When TokuDB is frozen in this state, it will still respond to the following command:
4)尝试动态设置这个参数,发现是个只读参数,需要重启服务
mysql> set global tokudb_fs_reserve_percent=4;
ERROR 1238 (HY000): Variable 'tokudb_fs_reserve_percent' is a read only variable
结论
tokudb为了保障数据库服务正常,每5秒检测一次磁盘剩余空间,默认剩余5%的时候阻塞写入,直到释放更多的空间再恢复正常。通过tokudb_fs_reserve_percent变量控制剩余百分比,这是个只读变量。在INNODB,MYISAM等引擎上没有这个参数可配置,磁盘可以写到100%。大家在使用tokudb的时候不要忘记这个参数,磁盘到95%之前就要准备扩容了。