SQL执行的时候,SQL经过编译、预处理、逻辑解析和物理解析最终生成执行计划;TiDB的物理解析的时候,会根据统计信息来决定如何进行扫描,采用全表扫描还是走索引,走哪个索引等等方式; EXPLAIN查看执行计划的时候,返回的行数也是根据统计信息进行统计的;
统计信息的基本组成
- 表级别统计信息
- 总行数
- 表的健康情况--统计信息的时效性
- 列级别统计信息
- 直方图--等深直方图
- show stats_buckets where db_name = 'test' and table_name='student' ;
- Count-Min Sketch
- 对列值进行HASH,统计某几位的数量,从而判断某个列值出现的频次
- 不同值的分布和数量
- 空值数量
- 直方图--等深直方图
统计信息的搜集方法
手动更新统计信息:
统计信息搜集的时候会进行全表扫描,然后更新统计信息
analyze table t1;
analyze table t1 index idx_a;
analyze table t1 index idx_a with 50 samples;
自动更新统计信息:
默认60s会做一次自动更新
查看统计信息搜集情况
SHOW ANALYZE STATUS;
mysql> show analyze status;
+--------------+------------+----------------+--------------------+----------------+---------------------+---------------------+----------+
| Table_schema | Table_name | Partition_name | Job_info | Processed_rows | Start_time | End_time | State |
+--------------+------------+----------------+--------------------+----------------+---------------------+---------------------+----------+
| test | student | | auto analyze table | 4258 | 2022-01-19 21:14:41 | 2022-01-19 21:15:54 | finished |
| test | student | | auto analyze table | 8790 | 2022-01-19 21:17:44 | 2022-01-19 21:18:57 | finished |
| test | student | | auto analyze table | 18047 | 2022-01-19 21:23:44 | 2022-01-19 21:25:04 | finished |
| test | student2 | | auto analyze table | 100100 | 2022-01-19 21:32:44 | 2022-01-19 21:33:47 | finished |
| test | student | | auto analyze table | 37008 | 2022-01-19 21:36:44 | 2022-01-19 21:37:57 | finished |
+--------------+------------+----------------+--------------------+----------------+---------------------+---------------------+----------+
查看表的元信息
mysql> show stats_meta where db_name = 'test';
+---------+------------+----------------+---------------------+--------------+-----------+
| Db_name | Table_name | Partition_name | Update_time | Modify_count | Row_count |
+---------+------------+----------------+---------------------+--------------+-----------+
| test | hero | | 2022-01-11 21:28:44 | 0 | 0 |
| test | student2 | | 2022-01-20 17:37:04 | 0 | 100100 |
| test | student | | 2022-01-23 16:16:34 | 0 | 100098 |
| test | t1 | | 2022-01-23 18:17:59 | 0 | 10000 |
| test | t2 | | 2022-01-23 17:45:27 | 0 | 10000 |
+---------+------------+----------------+---------------------+--------------+-----------+
5 rows in set (0.00 sec)
mysql>
查看表的健康度
健康度表示表中未被修改的行数/总行数;如果健康度低于tidb_auto_analyze_ratio的时候就会触发自动搜集
mysql> show stats_healthy where db_name ='test';
+---------+------------+----------------+---------+
| Db_name | Table_name | Partition_name | Healthy |
+---------+------------+----------------+---------+
| test | hero | | 100 |
| test | student2 | | 100 |
| test | student | | 100 |
| test | t1 | | 100 |
| test | t2 | | 100 |
+---------+------------+----------------+---------+
5 rows in set (0.00 sec)
查看列的元信息
mysql> show stats_histograms where db_name = 'test' and table_name='student' ;
+---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+----------------------+
| Db_name | Table_name | Partition_name | Column_name | Is_index | Update_time | Distinct_count | Null_count | Avg_col_size | Correlation |
+---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+----------------------+
| test | student | | id | 0 | 2022-01-23 16:16:34 | 98608 | 0 | 9 | 0.06518354865014864 |
| test | student | | name | 0 | 2022-01-23 16:16:34 | 100098 | 0 | 14.89 | 0.056038873957110595 |
| test | student | | age | 0 | 2022-01-23 16:16:34 | 99360 | 0 | 9 | 0.06518321062603816 |
| test | student | | PRIMARY | 1 | 2022-01-23 16:16:34 | 98608 | 0 | 0 | 0 |
| test | student | | idx_age | 1 | 2022-01-23 16:16:34 | 99360 | 0 | 0 | 0 |
| test | student | | idx_name | 1 | 2022-01-23 16:16:34 | 100098 | 0 | 0 | 0 |
+---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+----------------------+
6 rows in set (0.00 sec)
导入导出统计信息
如果实验环境需要模拟生产环境中的执行计划,来进行调优,因为生产环境数据量太大,可以考虑导出统计信息;
导出当前统计信息
导出某一个时间的统计信息
导入统计信息
删除统计信息:
drop stats dbname.tbname