TiDB统计信息

SQL执行的时候,SQL经过编译、预处理、逻辑解析和物理解析最终生成执行计划;TiDB的物理解析的时候,会根据统计信息来决定如何进行扫描,采用全表扫描还是走索引,走哪个索引等等方式; EXPLAIN查看执行计划的时候,返回的行数也是根据统计信息进行统计的;

统计信息的基本组成

  • 表级别统计信息
    • 总行数
    • 表的健康情况--统计信息的时效性
  • 列级别统计信息
    • 直方图--等深直方图
      • TiDB统计信息
      • show stats_buckets where db_name = 'test' and table_name='student' ;
    • Count-Min Sketch
      • 对列值进行HASH,统计某几位的数量,从而判断某个列值出现的频次
      • TiDB统计信息

       

    • 不同值的分布和数量
    • 空值数量

     

统计信息的搜集方法

手动更新统计信息:

统计信息搜集的时候会进行全表扫描,然后更新统计信息

TiDB统计信息

TiDB统计信息

analyze table t1;
analyze table t1 index idx_a; 
analyze table t1 index idx_a with 50 samples;

自动更新统计信息:

默认60s会做一次自动更新

TiDB统计信息

 查看统计信息搜集情况

SHOW ANALYZE STATUS;

mysql> show analyze status;
+--------------+------------+----------------+--------------------+----------------+---------------------+---------------------+----------+
| Table_schema | Table_name | Partition_name | Job_info           | Processed_rows | Start_time          | End_time            | State    |
+--------------+------------+----------------+--------------------+----------------+---------------------+---------------------+----------+
| test         | student    |                | auto analyze table |           4258 | 2022-01-19 21:14:41 | 2022-01-19 21:15:54 | finished |
| test         | student    |                | auto analyze table |           8790 | 2022-01-19 21:17:44 | 2022-01-19 21:18:57 | finished |
| test         | student    |                | auto analyze table |          18047 | 2022-01-19 21:23:44 | 2022-01-19 21:25:04 | finished |
| test         | student2   |                | auto analyze table |         100100 | 2022-01-19 21:32:44 | 2022-01-19 21:33:47 | finished |
| test         | student    |                | auto analyze table |          37008 | 2022-01-19 21:36:44 | 2022-01-19 21:37:57 | finished |
+--------------+------------+----------------+--------------------+----------------+---------------------+---------------------+----------+

查看表的元信息

mysql> show stats_meta where db_name = 'test';
+---------+------------+----------------+---------------------+--------------+-----------+
| Db_name | Table_name | Partition_name | Update_time         | Modify_count | Row_count |
+---------+------------+----------------+---------------------+--------------+-----------+
| test    | hero       |                | 2022-01-11 21:28:44 |            0 |         0 |
| test    | student2   |                | 2022-01-20 17:37:04 |            0 |    100100 |
| test    | student    |                | 2022-01-23 16:16:34 |            0 |    100098 |
| test    | t1         |                | 2022-01-23 18:17:59 |            0 |     10000 |
| test    | t2         |                | 2022-01-23 17:45:27 |            0 |     10000 |
+---------+------------+----------------+---------------------+--------------+-----------+
5 rows in set (0.00 sec)

mysql> 

查看表的健康度

健康度表示表中未被修改的行数/总行数;如果健康度低于tidb_auto_analyze_ratio的时候就会触发自动搜集

mysql> show stats_healthy where db_name ='test';
+---------+------------+----------------+---------+
| Db_name | Table_name | Partition_name | Healthy |
+---------+------------+----------------+---------+
| test    | hero       |                |     100 |
| test    | student2   |                |     100 |
| test    | student    |                |     100 |
| test    | t1         |                |     100 |
| test    | t2         |                |     100 |
+---------+------------+----------------+---------+
5 rows in set (0.00 sec)

查看列的元信息

mysql> show stats_histograms where db_name = 'test' and table_name='student' ;
+---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+----------------------+
| Db_name | Table_name | Partition_name | Column_name | Is_index | Update_time         | Distinct_count | Null_count | Avg_col_size | Correlation          |
+---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+----------------------+
| test    | student    |                | id          |        0 | 2022-01-23 16:16:34 |          98608 |          0 |            9 |  0.06518354865014864 |
| test    | student    |                | name        |        0 | 2022-01-23 16:16:34 |         100098 |          0 |        14.89 | 0.056038873957110595 |
| test    | student    |                | age         |        0 | 2022-01-23 16:16:34 |          99360 |          0 |            9 |  0.06518321062603816 |
| test    | student    |                | PRIMARY     |        1 | 2022-01-23 16:16:34 |          98608 |          0 |            0 |                    0 |
| test    | student    |                | idx_age     |        1 | 2022-01-23 16:16:34 |          99360 |          0 |            0 |                    0 |
| test    | student    |                | idx_name    |        1 | 2022-01-23 16:16:34 |         100098 |          0 |            0 |                    0 |
+---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+----------------------+
6 rows in set (0.00 sec)

导入导出统计信息

如果实验环境需要模拟生产环境中的执行计划,来进行调优,因为生产环境数据量太大,可以考虑导出统计信息;

导出当前统计信息

导出某一个时间的统计信息

导入统计信息

TiDB统计信息

 删除统计信息:

drop stats dbname.tbname

上一篇:蓝桥杯单片机——数码管的静态显示(3)


下一篇:MD5 加密(MessageDigest)