Coinciding with the new native data dictionary in MySQL 8.0, we have made a number of useful enhancements to our
INFORMATION_SCHEMA
subsystem design in MySQL 8.0. In this post I will first go through our legacy implementation as it has stood since MySQL 5.1, and then cover what’s changed.与MySQL 8.0原生数据字典一致,在MySQL 8.0的
INFORMATION_SCHEMA
子系统设计中,我们做了一些很有用的增强。在这篇文章中,我将会介绍自MySQL 5.1以来的旧的实现方式,然后介绍我们做了什么改变。Background
INFORMATION_SCHEMA
was first introduced into MySQL 5.0, as a standards compliant way of retrieving meta data from a running MySQL server. When we look at the history of INFORMATION_SCHEMA
there have been a number of complaints about the performance of certain queries, particularly in the case that there are many database objects (schemas, tables etc).INFORMATION_SCHEMA
首次引入MySQL 5.0,作为一种从正在运行的MySQL服务器检索元数据的标准兼容方式。当我们回顾INFORMATION_SCHEMA
的历史时,对于某些特定查询性能总是有很多的抱怨,特别是在有许多数据库对象(schema,表等)的情况下。In an effort to address these reported issues, since MySQL 5.1 we have made a number of performance optimizations to speed up the execution of
INFORMATION_SCHEMA
queries. The optimizations are described in the MySQL manual, and apply when the user provides an explicit schema name or table name in the query.为了解决这些上报的问题,从MySQL 5.1开始,我们进行了许多性能优化来加快
INFORMATION_SCHEMA
查询的执行速度。MySQL手册<链接1>中描述了这些优化,当用户在查询中提供显式schema名称或表名时,将会应用这些。Alas, despite these improvements
INFORMATION_SCHEMA
performance is still a major pain point for many of our users. The key reason behind these performance issues in the current INFORMATION_SCHEMA
implementation is that INFORMATION_SCHEMA
tables are implemented as temporary tables that are created on-the-fly during query execution. These temporary tables are populated via:www.weixiu3721.com
-
Meta data from files, e.g. table definitions from .FRM files.
-
Details from storage engines, e.g. dynamic table statistics.
-
Data from global data structures in the MySQL server.
尽管有这些改进,
INFORMATION_SCHEMA的
性能仍然是我们许多用户的主要痛点。在当前INFORMATION_SCHEMA
实现方式下产生的性能问题背后的关键原因是,INFORMATION_SCHEMA
表的查询实现方式是在查询执行期间创建临时表。这些临时表通过以下方式填充:-
元数据来自文件,例如:表定义来自FRM文件
-
细节来自于存储引擎,例如:动态表的统计信息
-
来自MySQL server层中全局数据结构的数据
For a MySQL server having hundreds of database, each with hundreds of tables within them, the
INFORMATION_SCHEMA
query would end-up doing lot of I/O reading each individual FRM files from the file system. And it would also end-up using more CPU cycles in effort to open the table and prepare related in-memory data structures. It does attempt to use the MySQL server table cache (the system variable ‘table_definition_cache
‘), however in large server instances it’s very rare to have a table cache that is large enough to accommodate all of these tables.对于一个MySQL实例来说可能有上百个库,每个库又有上百张表,
INFORMATION_SCHEMA
查询最终会从文件系统中读取每个单独的FRM文件,造成很多I/O读取。并且最终还会消耗更多的CPU来打开表并准备相关的内存数据结构。它确实尝试使用MySQL server层的表缓存(系统变量table_definition_cache
),但是在大型实例中,很少有一个足够大的表缓存来容纳所有的表。One can easily face the above mentioned performance issue if the optimization is not used by the
INFORMATION_SCHEMA
query. For example, let us consider the two queries below