Innodb索引优化管理
- 索引优化的实验效果需要在一张表中有海量数据的情况下,实验效果才会比较明显,我们在实验前需要先准备一张300万条数据的表。
# 创建表格
create table t1(
id int,
name varchar(20),
gender char(6),
email varchar(50)
);
# 创建存储过程,插入3000000条数据
delimiter $$
create procedure auto_insert1()
BEGIN
declare i int default 1;
while(i<3000000)do
insert into t1 values(i,‘nana‘,‘male‘,concat(‘nana‘,i,‘@haha‘));
set i=i+1;
end while;
END$$
delimiter ;
# 查看存储过程
show create procedure auto_insert1\G
# 调用存储过程
call auto_insert1();
查看数据文件大小
# 插入三百万条数据,大概大小为188M
[root@db01 db01]# ll -h /service/mysql/data/db01
total 189M
-rw-rw---- 1 mysql mysql 67 Jul 14 15:38 db.opt
-rw-rw---- 1 mysql mysql 8.5K Jul 14 15:39 t1.frm
-rw-rw---- 1 mysql mysql 188M Jul 14 16:06 t1.ibd
# 查看表格式内容
mysql> select * from t1 where id<3;
+------+------+--------+------------+
| id | name | gender | email |
+------+------+--------+------------+
| 1 | nana | male | nana1@haha |
| 2 | nana | male | nana2@haha |
+------+------+--------+------------+
2 rows in set (0.95 sec)
# 查看表属性
mysql> desc t1;
+--------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
| name | varchar(20) | YES | | NULL | |
| gender | char(6) | YES | | NULL | |
| email | varchar(50) | YES | | NULL | |
+--------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
索引命中也未必会加速
关于等值查询
1. 以重复度低的字段为基础创建索引,加速效果明显
mysql> select count(id) from t1 where id = 33;
+-----------+
| count(id) |
+-----------+
| 1 |
+-----------+
1 row in set (0.83 sec)
# explain(查询优化神器)的rows值越小,查询效率越高
# explain查询sql语句的执行计划,rows为2990010,速度慢
mysql> explain select count(id) from t1 where id = 33;
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 2990010 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)
# 创建主键索引
mysql> create index xxx on t1(id);
Query OK, 0 rows affected (3.78 sec)
Records: 0 Duplicates: 0 Warnings: 0
# explain查询sql语句的执行计划,rows为1,速度快
mysql> explain select count(id) from t1 where id = 33;
+----+-------------+-------+------+---------------+------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+-------+------+-------------+
| 1 | SIMPLE | t1 | ref | xxx | xxx | 5 | const | 1 | Using index |
+----+-------------+-------+------+---------------+------+---------+-------+------+-------------+
1 row in set (0.00 sec)
mysql> desc t1;
+--------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| id | int(11) | YES | MUL | NULL | |
| name | varchar(20) | YES | | NULL | |
| gender | char(6) | YES | | NULL | |
| email | varchar(50) | YES | | NULL | |
+--------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
# 命中主键索引后查询速度得到了优化
mysql> select count(id) from t1 where id = 33;
+-----------+
| count(id) |
+-----------+
| 1 |
+-----------+
1 row in set (0.00 sec)
2. 以重复度高的字段为基础创建索引,加速效果不明显
# 查看无索引字段name的sql语句执行计划,rows为2990010,速度慢
mysql> explain select count(id) from t1 where name="nana";
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 2990010 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)
# 创建name字段的辅助索引
mysql> create index yyy on t1(name);
Query OK, 0 rows affected (5.14 sec)
Records: 0 Duplicates: 0 Warnings: 0
# 给name创建了普通索引,我们发现查看name字段的执行计划查询速度并没有得到很好的优化,rows为1495005,速度慢
mysql> explain select count(id) from t1 where name="nana";
+----+-------------+-------+------+---------------+------+---------+-------+---------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+-------+---------+-----------------------+
| 1 | SIMPLE | t1 | ref | yyy | yyy | 83 | const | 1495005 | Using index condition |
+----+-------------+-------+------+---------------+------+---------+-------+---------+-----------------------+
1 row in set (0.00 sec)
# 我们发现命中了辅助索引name字段后,查询的速度依然很慢。
# 原因是因为我们的name字段,数据全部都是nana。
# 这时候我们给name字段添加普通索引,该辅助索引的key值全部都是nana,不但没有优化查询效率,反而使查询效率变得更低了。
mysql> select count(id) from t1 where name="nana";
+-----------+
| count(id) |
+-----------+
| 2999999 |
+-----------+
1 row in set (3.73 sec)
# 我们通过explain查询sql语句的执行计划,查看name字段不等于nana的值,查询速度得到了优化,rows为2,速度快
mysql> explain select count(id) from t1 where name!="nana";
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
| 1 | SIMPLE | t1 | range | yyy | yyy | 83 | NULL | 2 | Using index condition |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
1 row in set (0.00 sec)
# 命中索引重复度高的字段,取反的情况下,查询效率的得到了优化
mysql> select count(id) from t1 where name!="nana";
+-----------+
| count(id) |
+-----------+
| 0 |
+-----------+
1 row in set (0.00 sec)
3. 以占用空间大的字段为基础创建索引,加速效果不明显
mysql> select count(id) from t1 where email="nana1562562@haha";
+-----------+
| count(id) |
+-----------+
| 1 |
+-----------+
1 row in set (1.16 sec)
# 创建普通索引
mysql> create index zzz on t1(email);
Query OK, 0 rows affected (6.48 sec)
Records: 0 Duplicates: 0 Warnings: 0
# 以占用空间大的字段为基础创建索引,理论上加速效果是不明显的。
# 但是我们email字段的数据占用的空间不够大,所以实验效果不佳。
mysql> select count(id) from t1 where email="nana1562562@haha";
+-----------+
| count(id) |
+-----------+
| 1 |
+-----------+
1 row in set (0.00 sec)
关于范围查询
mysql> desc t1;
+--------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| id | int(11) | YES | MUL | NULL | |
| name | varchar(20) | YES | MUL | NULL | |
| gender | char(6) | YES | | NULL | |
| email | varchar(50) | YES | MUL | NULL | |
+--------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
1. innodb存储引擎能够加速范围查询,但是查询范围越大,加速效果越不明显
# id>33的数据范围很大,速度慢
mysql> explain select count(id) from t1 where id>33;
+----+-------------+-------+-------+---------------+------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+---------+--------------------------+
| 1 | SIMPLE | t1 | range | xxx | xxx | 5 | NULL | 1372191 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+---------+--------------------------+
1 row in set (0.00 sec)
# id>33 and id<50的数据范围不大,速度快
mysql> explain select count(id) from t1 where id>33 and id<50;
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| 1 | SIMPLE | t1 | range | xxx | xxx | 5 | NULL | 15 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.00 sec)
# id between 33 and 50 等同于33<=id<=50数据
# 范围不大,速度快
mysql> explain select count(id) from t1 where id between 33 and 50;
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| 1 | SIMPLE | t1 | range | xxx | xxx | 5 | NULL | 18 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.00 sec)
# id>33 and id<100000数据范围大,速度慢
mysql> explain select count(id) from t1 where id>33 and id<100000;
+----+-------------+-------+-------+---------------+------+---------+------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+--------+--------------------------+
| 1 | SIMPLE | t1 | range | xxx | xxx | 5 | NULL | 192494 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+--------+--------------------------+
1 row in set (0.00 sec)
# 查看命中id字段的sql语句查询计划,id!=1000数据,是一个很大的范围,rows为1373188
mysql> explain select count(id) from t1 where id!=1000;
+----+-------------+-------+-------+---------------+------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+---------+--------------------------+
| 1 | SIMPLE | t1 | range | xxx | xxx | 5 | NULL | 1373188 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+---------+--------------------------+
1 row in set (0.00 sec)
# like模糊查询
# like指定一个明确的值,速度依然很快
mysql> explain select count(*) from t1 where email like "xxx";
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| 1 | SIMPLE | t1 | range | zzz | zzz | 203 | NULL | 1 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.00 sec)
# like匹配字符中有%,但是处于末尾,速度依然很快
mysql> explain select count(*) from t1 where email like "xxx%";
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| 1 | SIMPLE | t1 | range | zzz | zzz | 203 | NULL | 1 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.00 sec)
# like匹配字符中有%,但是处于开头,速度慢
mysql> explain select count(*) from t1 where email like "%xxx";
+----+-------------+-------+-------+---------------+------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+---------+--------------------------+
| 1 | SIMPLE | t1 | index | NULL | zzz | 203 | NULL | 2744383 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+---------+--------------------------+
1 row in set (0.00 sec)
# 结论
> >=
< <=
!=
between and ( 等同于>= and <= )
like 后的内容应该尽量将%往右放,并且左半部分的内容应该尽量精确
关于条件字段参与运算
mysql> desc t1;
+--------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| id | int(11) | YES | MUL | NULL | |
| name | varchar(20) | YES | MUL | NULL | |
| gender | char(6) | YES | | NULL | |
| email | varchar(50) | YES | MUL | NULL | |
+--------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
1. 关于条件字段参与运算
# 该sql语句是将id字段的数据当成条件遍历出来,依次乘12,再打印等值(命中主键索引)。速度慢
# 大量的时间花在了运算上
mysql> explain select count(id) from t1 where id*12=10000;
+----+-------------+-------+-------+---------------+------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+---------+--------------------------+
| 1 | SIMPLE | t1 | index | NULL | xxx | 5 | NULL | 2744383 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+---------+--------------------------+
1 row in set (0.00 sec)
# 调整上一条sql语句的逻辑,先运算,再做等值查询(命中主键索引)。速度快
mysql> explain select count(id) from t1 where id=10000/12;
+----+-------------+-------+------+---------------+------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+-------+------+-------------+
| 1 | SIMPLE | t1 | ref | xxx | xxx | 5 | const | 1 | Using index |
+----+-------------+-------+------+---------------+------+---------+-------+------+-------------+
1 row in set (0.00 sec)
# 先将sql语句id字段进行遍历当成条件,传递给函数进行代码逻辑的运行,再命中主键索引
# 大量的时间花在了代码逻辑上
select count(id) from t1 where func(id) = 10000/12;
# 结论
关于条件字段参与运算的查询:不要让条件字段参与运算,或者说传递给函数
如果生产环境中碰到需要将条件字段参与运算或者传递给其他函数的情况,尽量让开发换一种逻辑去写代码。
因为出现让条件字段参与运算这种情况,会直接导致数据库查询效率变慢。
索引下推技术
索引下推技术默认是开启的
# 删除索引xxx
mysql> drop index xxx on t1;
Query OK, 0 rows affected (0.01 sec)
Records: 0 Duplicates: 0 Warnings: 0
# 删除索引yyy
mysql> drop index yyy on t1;
Query OK, 0 rows affected (0.00 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> desc t1;
+--------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
| name | varchar(20) | YES | | NULL | |
| gender | char(6) | YES | | NULL | |
| email | varchar(50) | YES | MUL | NULL | |
+--------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
# and查询
# 对于连续多个and: mysql会按照联合索引,从左到右的顺序找一个区分度高的索引字段(这样便可以快速锁定很小的范围),加速查询。
# 区分度最高的是email(命中普通索引),所以制作联合索引为(email,name,gender)
mysql> explain select count(id) from t1 where name="nana" and email="nana666@haha" and gender="male";
+----+-------------+-------+------+---------------+------+---------+-------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+-------+------+------------------------------------+
| 1 | SIMPLE | t1 | ref | zzz | zzz | 203 | const | 1 | Using index condition; Using where |
+----+-------------+-------+------+---------------+------+---------+-------+------+------------------------------------+
# or查询
# 对于连续多个or: mysql会按照条件的顺序,从左到右依次判断,即联合索引为(name,email,gender),并不会加速查询
mysql> explain select count(id) from t1 where name="nana" or email="nana666@haha" or gender="male";
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| 1 | SIMPLE | t1 | ALL | zzz | NULL | NULL | NULL | 2744383 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)
# and 与 or的逻辑
and连接的多个条件属于小范围,or连接的多个条件属于大范围。
对于连续多个and:
mysql会按照联合索引,从左到右的顺序找一个区分度高的索引字段(这样可以快速锁定很小的范围),加速查询。
对于连续多个or:
mysql会按照条件的顺序,从左到右依次判断,并不会加速查询
联合索引与最左前缀匹配规则
条件中需要用到多个字段,并且多次查询中的多个字段都包含某一个字段,可以创建联合索引
mysql> drop index zzz on t1;
Query OK, 0 rows affected (0.00 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> desc t1;
+--------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
| name | varchar(20) | YES | | NULL | |
| gender | char(6) | YES | | NULL | |
| email | varchar(50) | YES | | NULL | |
+--------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
# 重复度低且占用空间较小的字段应该尽量往左放,让其成为最左前缀
# 创建联合索引,索引创建为最左边的字段email
mysql> create index xxx on t1(email,name,gender);
Query OK, 0 rows affected (13.74 sec)
Records: 0 Duplicates: 0 Warnings: 0
# 查询多个字段,命中普通索引email字段,加速查询
mysql> explain select count(id) from t1 where name="nana" and email="nana666@haha" and gender="male";
+----+-------------+-------+------+---------------+------+---------+-------------------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+-------------------+------+-----------------------+
| 1 | SIMPLE | t1 | ref | xxx | xxx | 311 | const,const,const | 1 | Using index condition |
+----+-------------+-------+------+---------------+------+---------+-------------------+------+-----------------------+
1 row in set (0.00 sec)
# 查询多个字段,命中普通索引email字段,加速查询
mysql> explain select count(id) from t1 where name="nana" and email="nana666@haha";
+----+-------------+-------+------+---------------+------+---------+-------------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+-------------+------+-----------------------+
| 1 | SIMPLE | t1 | ref | xxx | xxx | 286 | const,const | 1 | Using index condition |
+----+-------------+-------+------+---------------+------+---------+-------------+------+-----------------------+
1 row in set (0.00 sec)
# 查询单个字段,命中普通索引email字段,加速查询
mysql> explain select count(id) from t1 where email="nana666@haha";
+----+-------------+-------+------+---------------+------+---------+-------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+-------+------+-----------------------+
| 1 | SIMPLE | t1 | ref | xxx | xxx | 203 | const | 1 | Using index condition |
+----+-------------+-------+------+---------------+------+---------+-------+------+-----------------------+
1 row in set (0.00 sec)
# 查看单个字段name,没有命中索引email,查询不加速
mysql> explain select count(id) from t1 where name="nana";
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 2744383 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)
# 总结
什么时候需要创建联合索引?
条件中需要用到多个字段,并且多次查询中的多个字段都包含某一个字段
创建联合索引需要注意的问题:
重复度低且占用空间比较小的字段应该尽量往左放,让其称为最左前缀
使用联合索引,满足加速条件的规则
使用select查询数据的时候,where条件中必须包含最左前缀的字段,命中索引后才会实现加速
总结
1、在创建索引的时候,会把该列所有的数据按照b+tree的方式进行排序
2、为常作为查询条件的字段建立索引
3、限制索引的数目,不要每列都创建索引 每个索引都需要占用磁盘空间 索引越多,需要的磁盘空间就会越大,修改表的时候,对索引的重构和更新会很麻烦,越多的索引,会让更新表变得浪费时间。
4、在同一字段上,尽量避免创建多个索引,可以创建多个但是它们是有优先级的,先走一个就不会再走另一个索引了
5、避免很大的字段建索引,在数据很长的列上创建前缀索引
6、如果可以创建唯一索引,就创建唯一索引,查询速度更快
7、不要对重复度高的字段创建索引
8、索引不要参与计算
9、为经常要排序,分组,联合操作的列,创建联合索引经常需要order by、group by、distinct和union等操作的字段,排序操作会浪费很多时间。 如果为其建立索引,可以有效地避免排序操作
10、尽量使用前缀来索引创建索引的时候,可以给该列所有数据进行排序
11、删除不再使用或者很少使用的索引 表中的数据被大量更新,或者数据的使用方式被改变后,原有的一些索引可能不再需要。数据库管理员应当定期找出这些索引,将它们删除,从而减少索引对更新操作的影响。
12、避免使用select *
13、count(1) 或 count(列) 代替 count(*), ps:mysql中没有差别了
14、创建表时尽量时 varchar 代替 char
15、表的字段顺序固定长度的字段优先
16、使用连接(JOIN)来代替子查询(Sub-Queries)
17、连表时注意条件类型需一致