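A couple of days ago I ran into a problem. A MySQL table held 12.2 million rows, and we needed to delete 12 million of them and keep only 200,000. We discussed three options:
1. At 00:00, run truncate directly and keep only the new data.
2. Use CTAS to copy the 200,000 rows we want to keep (out of the 12.2 million) into an intermediate table, then swap the two tables with rename.
3. Use delete to remove the 12 million rows.
After weighing everything we went with option 3. I won't go into how we made the choice; instead, let's talk about what happens when you remove data with delete.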
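If you think the Oracle way, heap tables have a high-water mark (HWM) problem. Picture storage as a reservoir and data as the water in it: the level the water reaches is the water mark. When a table is first created it holds no data, so the HWM is at its lowest value; as rows are inserted, the HWM rises. The key property is this: when you delete data with a delete statement, the rows are gone but the HWM does not come down. It stays at the level it had before the delete. In other words, during everyday inserts and deletes the HWM only ever rises; it never falls.
P.S. Strictly speaking, under Automatic Segment Space Management (ASSM) there are two water marks, the low HWM and the HWM.
The HWM hurts full table scans most, because a full table scan reads every data block below the HWM. In the example above, after deleting 12 million of 12.2 million rows, only 200,000 remain, yet a full table scan does not read just the blocks holding those 200,000 rows; it still reads all the blocks that once held 12.2 million.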
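To make that cost concrete, here is a toy Python model of a heap segment whose HWM only moves up. It is a hypothetical simplification, not Oracle's segment code: it assumes a fixed 100 rows per block and scales the example down by 1,000 (12,200 rows, of which 200 are kept).

```python
class HeapSegment:
    """Toy Oracle-style heap segment: rows live in fixed-size blocks, and the
    high-water mark (HWM) counts every block that has ever been formatted."""

    def __init__(self, rows_per_block: int) -> None:
        self.rows_per_block = rows_per_block
        self.blocks = []  # list of blocks, each a list of row ids
        self.hwm = 0      # number of blocks below the high-water mark

    def insert(self, row: int) -> None:
        # Fill the newest block; format a new block (raising the HWM) when it is full.
        if not self.blocks or len(self.blocks[-1]) >= self.rows_per_block:
            self.blocks.append([])
            self.hwm = len(self.blocks)
        self.blocks[-1].append(row)

    def delete_where(self, predicate) -> None:
        # Rows disappear, but no block is released and the HWM stays put.
        for block in self.blocks:
            block[:] = [r for r in block if not predicate(r)]

    def row_count(self) -> int:
        return sum(len(block) for block in self.blocks)

    def full_scan_blocks_read(self) -> int:
        # A full table scan reads every block below the HWM, empty or not.
        return self.hwm


# Scaled down 1:1000 from the example: 12,200 rows, keep 200.
seg = HeapSegment(rows_per_block=100)
for row_id in range(12_200):
    seg.insert(row_id)
seg.delete_where(lambda r: r >= 200)
print(seg.row_count(), seg.full_scan_blocks_read())  # 200 122
```

Only 200 rows survive, but the scan still visits all 122 blocks that the HWM covers.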
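The data is gone, yet the scan is no faster. Infuriating, isn't it?
In an OLTP system you should avoid full table scans anyway and go through indexes, which sidesteps the problems the HWM causes.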
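Back to today's topic: does Oracle's high-water mark exist in MySQL at all?
I had never run into this before and assumed MySQL behaved the same way as Oracle, but this time I learned that the two differ in some important ways.
Let's start with an experiment. Create a test table in MySQL; at this point it holds no data: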
mysql> create table test_delete (
    -> id int not null,
    -> col varchar(60) not null,
    -> primary key(id)
    -> );
Query OK, 0 rows affected (0.02 sec)
All of the table's size statistics are 0:
mysql> select table_schema, table_name, ENGINE,
         round(DATA_LENGTH/1024/1024 + INDEX_LENGTH/1024/1024) total_mb,
         TABLE_ROWS,
         round(DATA_LENGTH/1024/1024) data_mb,
         round(INDEX_LENGTH/1024/1024) index_mb,
         round(DATA_FREE/1024/1024) free_mb,
         round(DATA_FREE/DATA_LENGTH*100,2) free_ratio
       from information_schema.TABLES
       where TABLE_SCHEMA='bisal' and TABLE_NAME='test_delete';
+--------------+-------------+--------+----------+------------+---------+----------+---------+------------+
| table_schema | table_name  | ENGINE | total_mb | TABLE_ROWS | data_mb | index_mb | free_mb | free_ratio |
+--------------+-------------+--------+----------+------------+---------+----------+---------+------------+
| bisal        | test_delete | InnoDB |        0 |          0 |       0 |        0 |       0 |       0.00 |
+--------------+-------------+--------+----------+------------+---------+----------+---------+------------+
1 row in set (0.00 sec)
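In Oracle, ordinary tables are heap tables, whereas InnoDB tables in MySQL are always stored in primary-key order, which is why they are also called "index-organized tables": the data is the index, and the index is the data.
Using the tool provided by Mr. Jiang (姜老师), let's look at the initial data file (test_delete.ibd). A B-tree Node is a node of the B-tree and is also a data page: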
[mysql@bisal py_innodb_page_info-master]$ python py_innodb_page_info.py /mysql/3306/data/bisal/test_delete.ibd -v
page offset 00000000, page type <File Space Header>
page offset 00000001, page type <Insert Buffer Bitmap>
page offset 00000002, page type <File Segment inode>
page offset 00000003, page type <B-tree Node>, page level <0000>   // data page; level 0 means a leaf, so the B+tree has a single level and holds no data yet
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
Total number of page: 6:     // total pages allocated
Freshly Allocated Page: 2    // pages still available for use
Insert Buffer Bitmap: 1      // insert buffer bitmap page
File Space Header: 1         // file space header
B-tree Node: 1               // data page
File Segment inode: 1        // file segment inode
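Reading these dumps by eye gets tedious once the table grows. Below is a small Python sketch that tallies page types and reads the level of the root page from `-v` output like the above. It assumes the line format shown here, that offsets and levels are printed as hex, and that the root is page 3 (the PAGE_NO reported for the clustered index); `summarize_pages` is a hypothetical helper, not part of py_innodb_page_info.

```python
import re
from collections import Counter
from typing import Optional, Tuple

# One page entry; "page level <....>" only appears on B-tree pages.
PAGE_RE = re.compile(
    r"page offset ([0-9a-f]{8}), page type <([^>]+)>"
    r"(?:, page level <([0-9a-f]{4})>)?"
)


def summarize_pages(dump: str, root_page_no: int = 3) -> Tuple[Counter, Optional[int]]:
    """Count page types and return the page level of the root page."""
    types: Counter = Counter()
    root_level: Optional[int] = None
    for match in PAGE_RE.finditer(dump):
        offset, page_type, level = match.groups()
        types[page_type] += 1
        if level is not None and int(offset, 16) == root_page_no:
            root_level = int(level, 16)
    return types, root_level


sample = """page offset 00000000, page type <File Space Header>
page offset 00000003, page type <B-tree Node>, page level <0002>
page offset 00000004, page type <B-tree Node>, page level <0000>"""
types, root_level = summarize_pages(sample)
print(types["B-tree Node"], root_level + 1)  # 2 3  (two B-tree pages, height 3)
```

The B-tree height is the root's level plus one. The root is used rather than the maximum level seen, because stale pages can keep showing old levels in the dump.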
Next I inserted 1.1 million rows. The select * took 0.27 seconds:
mysql> select count(*) from test_delete;
+----------+
| count(*) |
+----------+
|  1100000 |
+----------+
1 row in set (0.24 sec)

mysql> select * from test_delete where name = 'X';
Empty set (0.27 sec)
The data file is 44M:
-rw-r-----. 1 mysql mysql 8.0K 18 17:46 test_delete.frm
-rw-r-----. 1 mysql mysql  44M 18 18:18 test_delete.ibd
From INNODB_SYS_TABLES and INNODB_SYS_INDEXES we learn that:
(1) TBL_SPACEID is 27.
(2) TABLE_ID is 43.
(3) INDEX_ID is 42.
(4) PAGE_NO is 3.
(5) INDEX_TYPE is 3.
PAGE_NO=3 means the root page of the B-tree is page 3, and INDEX_TYPE=3 means a clustered index. INDEX_TYPE takes the following values:
0 = nonunique secondary index;
1 = automatically generated clustered index (GEN_CLUST_INDEX);
2 = unique nonclustered index;
3 = clustered index;
32 = full-text index;
mysql> SELECT A.SPACE AS TBL_SPACEID, A.TABLE_ID, A.NAME AS TABLE_NAME,
         FILE_FORMAT, ROW_FORMAT, SPACE_TYPE,
         B.INDEX_ID, B.NAME AS INDEX_NAME, PAGE_NO, B.TYPE AS INDEX_TYPE
       FROM INNODB_SYS_TABLES A
       LEFT JOIN INNODB_SYS_INDEXES B ON A.TABLE_ID = B.TABLE_ID
       WHERE A.NAME = 'bisal/test_delete';
+-------------+----------+-------------------+-------------+------------+------------+----------+------------+---------+------------+
| TBL_SPACEID | TABLE_ID | TABLE_NAME        | FILE_FORMAT | ROW_FORMAT | SPACE_TYPE | INDEX_ID | INDEX_NAME | PAGE_NO | INDEX_TYPE |
+-------------+----------+-------------------+-------------+------------+------------+----------+------------+---------+------------+
|          27 |       43 | bisal/test_delete | Barracuda   | Dynamic    | Single     |       42 | PRIMARY    |       3 |          3 |
+-------------+----------+-------------------+-------------+------------+------------+----------+------------+---------+------------+
1 row in set (0.00 sec)
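The INDEX_TYPE codes listed above read naturally as bit flags: 3 is "clustered" plus "unique", 1 is clustered only, 2 is unique only. The Python sketch below decodes them that way. The bit values (1 = clustered, 2 = unique, 32 = full-text) are an assumption based on InnoDB's internal DICT_CLUSTERED, DICT_UNIQUE and DICT_FTS flags, and `describe_index_type` is a hypothetical helper covering only the values in the list.

```python
# Assumed bit values behind the TYPE codes listed above
# (DICT_CLUSTERED, DICT_UNIQUE and DICT_FTS in InnoDB's source).
CLUSTERED, UNIQUE, FTS = 1, 2, 32


def describe_index_type(index_type: int) -> str:
    """Map an INNODB_SYS_INDEXES.TYPE value to one of the descriptions above."""
    if index_type & FTS:
        return "full-text index"
    if index_type & CLUSTERED:
        # Clustered without UNIQUE: the hidden, row-id based GEN_CLUST_INDEX.
        if index_type & UNIQUE:
            return "clustered index"
        return "automatically generated clustered index (GEN_CLUST_INDEX)"
    if index_type & UNIQUE:
        return "unique nonclustered index"
    return "nonunique secondary index"


print(describe_index_type(3))  # clustered index
```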
At this point the table's total_mb and data_mb are 64M and free_mb is 5M:
mysql> select table_schema, table_name, ENGINE,
         round(DATA_LENGTH/1024/1024 + INDEX_LENGTH/1024/1024) total_mb,
         TABLE_ROWS,
         round(DATA_LENGTH/1024/1024) data_mb,
         round(INDEX_LENGTH/1024/1024) index_mb,
         round(DATA_FREE/1024/1024) free_mb,
         round(DATA_FREE/DATA_LENGTH*100,2) free_ratio
       from information_schema.TABLES
       where TABLE_SCHEMA='bisal' and TABLE_NAME='test_delete';
+--------------+-------------+--------+----------+------------+---------+----------+---------+------------+
| table_schema | table_name  | ENGINE | total_mb | TABLE_ROWS | data_mb | index_mb | free_mb | free_ratio |
+--------------+-------------+--------+----------+------------+---------+----------+---------+------------+
| bisal        | test_delete | InnoDB |       64 |    1096368 |      64 |        0 |       5 |       7.86 |
+--------------+-------------+--------+----------+------------+---------+----------+---------+------------+
1 row in set (0.00 sec)
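Now look at the data file. After inserting 1.1 million rows, 4,608 pages have been allocated in total, 4,021 of them B-tree Node pages. The original root (page offset 3) has been promoted to a page that stores directory entries (node pointers), and the leaf data pages start at page offset 4. The root's page level is 2, which means the B-tree now has 3 levels: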
[mysql@bisal py_innodb_page_info-master]$ python py_innodb_page_info.py /mysql/3306/data/bisal/test_delete.ibd -v | less
page offset 00000000, page type <File Space Header>
page offset 00000001, page type <Insert Buffer Bitmap>
page offset 00000002, page type <File Segment inode>
page offset 00000003, page type <B-tree Node>, page level <0002>
page offset 00000004, page type <B-tree Node>, page level <0000>
page offset 00000005, page type <B-tree Node>, page level <0000>
page offset 00000006, page type <B-tree Node>, page level <0000>
page offset 00000007, page type <B-tree Node>, page level <0000>
page offset 00000008, page type <B-tree Node>, page level <0000>
page offset 00000009, page type <B-tree Node>, page level <0000>
page offset 0000000a, page type <B-tree Node>, page level <0000>
page offset 0000000b, page type <B-tree Node>, page level <0000>
page offset 0000000c, page type <B-tree Node>, page level <0000>
page offset 0000000d, page type <B-tree Node>, page level <0000>
page offset 0000000e, page type <B-tree Node>, page level <0000>
page offset 0000000f, page type <B-tree Node>, page level <0000>
page offset 00000010, page type <B-tree Node>, page level <0000>
page offset 00000011, page type <B-tree Node>, page level <0000>
page offset 00000012, page type <B-tree Node>, page level <0000>
page offset 00000013, page type <B-tree Node>, page level <0000>
page offset 00000014, page type <B-tree Node>, page level <0000>
page offset 00000015, page type <B-tree Node>, page level <0000>
page offset 00000016, page type <B-tree Node>, page level <0000>
page offset 00000017, page type <B-tree Node>, page level <0000>
page offset 00000018, page type <B-tree Node>, page level <0000>
page offset 00000019, page type <B-tree Node>, page level <0000>
page offset 0000001a, page type <B-tree Node>, page level <0000>
page offset 0000001b, page type <B-tree Node>, page level <0000>
page offset 0000001c, page type <B-tree Node>, page level <0000>
page offset 0000001d, page type <B-tree Node>, page level <0000>
page offset 0000001e, page type <B-tree Node>, page level <0000>
page offset 0000001f, page type <B-tree Node>, page level <0000>
page offset 00000020, page type <B-tree Node>, page level <0000>
page offset 00000021, page type <B-tree Node>, page level <0000>
page offset 00000022, page type <B-tree Node>, page level <0000>
page offset 00000023, page type <B-tree Node>, page level <0000>
page offset 00000024, page type <B-tree Node>, page level <0001>
page offset 00000025, page type <B-tree Node>, page level <0001>
page offset 00000026, page type <B-tree Node>, page level <0001>
page offset 00000027, page type <B-tree Node>, page level <0001>
page offset 00000000, page type <Freshly Allocated Page>
...
page offset 00000000, page type <Freshly Allocated Page>
Total number of page: 4608:
Freshly Allocated Page: 585
Insert Buffer Bitmap: 1
File Space Header: 1
B-tree Node: 4021
File Segment inode: 1
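The jump from one level to three can be sanity-checked with back-of-the-envelope arithmetic. The Python sketch below assumes a fan-out of about 1,200 node pointers per 16 KB non-leaf page (roughly 13 bytes per entry for an INT key: key, child page number and record header; an estimate, not a measured value) and takes 4,016 leaves, i.e. the 4,021 B-tree Node pages minus the 5 non-leaf pages in the dump. `pages_per_level` is a hypothetical helper.

```python
import math


def pages_per_level(leaf_pages: int, fanout: int) -> list:
    """Page counts from the leaf level (level 0) up to the single root page."""
    levels = [leaf_pages]
    while levels[-1] > 1:
        # Each page on the next level up can point to `fanout` pages below it.
        levels.append(math.ceil(levels[-1] / fanout))
    return levels


levels = pages_per_level(4016, fanout=1200)
print(levels)           # [4016, 4, 1]
print(len(levels) - 1)  # 2  -> root page level, i.e. a 3-level B-tree
```

Under these assumptions the estimate matches the dump: four level-1 pages (offsets 0x24 to 0x27) under a single root at level 2.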
Now run the delete and remove all 1.1 million rows:
mysql> delete from test_delete;
Query OK, 1100000 rows affected (3.18 sec)
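The select * now returns instantly. What we observe is exactly the opposite of what Oracle thinking would predict, namely that after a delete the so-called "high-water mark" stays where it was and keeps slowing down retrieval:
mysql> select count(*) from test_delete;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

mysql> select * from test_delete where name = 'X';
Empty set (0.00 sec)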
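The data file on the operating system is still 44M. In this respect MySQL matches Oracle: delete does not, by itself, give file storage back to the operating system:
-rw-r-----. 1 mysql mysql 8.0K 18 17:46 test_delete.frm
-rw-r-----. 1 mysql mysql  44M 19 18:20 test_delete.ibd
TBL_SPACEID, TABLE_ID and the other values are all unchanged: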
mysql> SELECT A.SPACE AS TBL_SPACEID, A.TABLE_ID, A.NAME AS TABLE_NAME,
         FILE_FORMAT, ROW_FORMAT, SPACE_TYPE,
         B.INDEX_ID, B.NAME AS INDEX_NAME, PAGE_NO, B.TYPE AS INDEX_TYPE
       FROM INNODB_SYS_TABLES A
       LEFT JOIN INNODB_SYS_INDEXES B ON A.TABLE_ID = B.TABLE_ID
       WHERE A.NAME = 'bisal/test_delete';
+-------------+----------+-------------------+-------------+------------+------------+----------+------------+---------+------------+
| TBL_SPACEID | TABLE_ID | TABLE_NAME        | FILE_FORMAT | ROW_FORMAT | SPACE_TYPE | INDEX_ID | INDEX_NAME | PAGE_NO | INDEX_TYPE |
+-------------+----------+-------------------+-------------+------------+------------+----------+------------+---------+------------+
|          27 |       43 | bisal/test_delete | Barracuda   | Dynamic    | Single     |       42 | PRIMARY    |       3 |          3 |
+-------------+----------+-------------------+-------------+------------+------------+----------+------------+---------+------------+
1 row in set (0.01 sec)
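The table's size is still 64M (TABLE_ROWS shows 44857 even though the table is empty, because for InnoDB this column is only a sampled estimate):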
mysql> select table_schema, table_name, ENGINE,
         round(DATA_LENGTH/1024/1024 + INDEX_LENGTH/1024/1024) total_mb,
         TABLE_ROWS,
         round(DATA_LENGTH/1024/1024) data_mb,
         round(INDEX_LENGTH/1024/1024) index_mb,
         round(DATA_FREE/1024/1024) free_mb,
         round(DATA_FREE/DATA_LENGTH*100,2) free_ratio
       from information_schema.TABLES
       where TABLE_SCHEMA='bisal' and TABLE_NAME='test_delete';
+--------------+-------------+--------+----------+------------+---------+----------+---------+------------+
| table_schema | table_name  | ENGINE | total_mb | TABLE_ROWS | data_mb | index_mb | free_mb | free_ratio |
+--------------+-------------+--------+----------+------------+---------+----------+---------+------------+
| bisal        | test_delete | InnoDB |       64 |      44857 |      64 |        0 |      26 |      40.89 |
+--------------+-------------+--------+----------+------------+---------+----------+---------+------------+
1 row in set (0.00 sec)
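The free_ratio column divides raw byte counts, not the rounded MB values, which is why 26 / 64 does not give exactly 40.89. The Python sketch below mirrors the query's arithmetic. The byte values in the example are illustrative, not taken from this table. Python's round() rounds exact halves to even, so on a tie the last digit can differ from MySQL's ROUND().

```python
MIB = 1024 * 1024


def space_report(data_length: int, index_length: int, data_free: int) -> dict:
    """Mirror the information_schema.TABLES query above on raw byte counts."""
    return {
        "total_mb": round((data_length + index_length) / MIB),
        "data_mb": round(data_length / MIB),
        "index_mb": round(index_length / MIB),
        "free_mb": round(data_free / MIB),
        # MySQL would return NULL when dividing by zero; here we return 0.0.
        "free_ratio": round(data_free / data_length * 100, 2) if data_length else 0.0,
    }


report = space_report(64 * MIB, 0, 16 * MIB)
print(report)
# {'total_mb': 64, 'data_mb': 64, 'index_mb': 0, 'free_mb': 16, 'free_ratio': 25.0}
```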
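The data file, however, tells a different story. The number of allocated pages is still 4,608, but as I understand it, although these pages still exist, they no longer hold any data. The root page (page offset 3), which used to store directory entries, has dropped back to page level 0: the non-leaf levels are gone and the B-tree has collapsed into a single empty leaf, so there is nothing left to scan (pages 0x24 to 0x27 still show level 1 in the dump, but a level-0 root cannot point to them). When a SQL statement actually runs, reading this B-tree costs next to nothing. This is the most important difference from an Oracle heap table affected by the "high-water mark":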
[mysql@bisal py_innodb_page_info-master]$ python py_innodb_page_info.py /mysql/3306/data/bisal/test_delete.ibd -v | less
page offset 00000000, page type <File Space Header>
page offset 00000001, page type <Insert Buffer Bitmap>
page offset 00000002, page type <File Segment inode>
page offset 00000003, page type <B-tree Node>, page level <0000>
page offset 00000004, page type <B-tree Node>, page level <0000>
page offset 00000005, page type <B-tree Node>, page level <0000>
page offset 00000006, page type <B-tree Node>, page level <0000>
page offset 00000007, page type <B-tree Node>, page level <0000>
page offset 00000008, page type <B-tree Node>, page level <0000>
page offset 00000009, page type <B-tree Node>, page level <0000>
page offset 0000000a, page type <B-tree Node>, page level <0000>
page offset 0000000b, page type <B-tree Node>, page level <0000>
page offset 0000000c, page type <B-tree Node>, page level <0000>
page offset 0000000d, page type <B-tree Node>, page level <0000>
page offset 0000000e, page type <B-tree Node>, page level <0000>
page offset 0000000f, page type <B-tree Node>, page level <0000>
page offset 00000010, page type <B-tree Node>, page level <0000>
page offset 00000011, page type <B-tree Node>, page level <0000>
page offset 00000012, page type <B-tree Node>, page level <0000>
page offset 00000013, page type <B-tree Node>, page level <0000>
page offset 00000014, page type <B-tree Node>, page level <0000>
page offset 00000015, page type <B-tree Node>, page level <0000>
page offset 00000016, page type <B-tree Node>, page level <0000>
page offset 00000017, page type <B-tree Node>, page level <0000>
page offset 00000018, page type <B-tree Node>, page level <0000>
page offset 00000019, page type <B-tree Node>, page level <0000>
page offset 0000001a, page type <B-tree Node>, page level <0000>
page offset 0000001b, page type <B-tree Node>, page level <0000>
page offset 0000001c, page type <B-tree Node>, page level <0000>
page offset 0000001d, page type <B-tree Node>, page level <0000>
page offset 0000001e, page type <B-tree Node>, page level <0000>
page offset 0000001f, page type <B-tree Node>, page level <0000>
page offset 00000020, page type <B-tree Node>, page level <0000>
page offset 00000021, page type <B-tree Node>, page level <0000>
page offset 00000022, page type <B-tree Node>, page level <0000>
page offset 00000023, page type <B-tree Node>, page level <0000>
page offset 00000024, page type <B-tree Node>, page level <0001>
page offset 00000025, page type <B-tree Node>, page level <0001>
page offset 00000026, page type <B-tree Node>, page level <0001>
page offset 00000027, page type <B-tree Node>, page level <0001>
page offset 00000000, page type <Freshly Allocated Page>
...
page offset 00000000, page type <Freshly Allocated Page>
Total number of page: 4608:
Freshly Allocated Page: 585
Insert Buffer Bitmap: 1
File Space Header: 1
B-tree Node: 4021
File Segment inode: 1
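The contrast with a heap can be put as a rough cost model. This hypothetical Python sketch counts the pages a full scan of a clustered B+tree touches: one non-leaf page per level on the way down to the leftmost leaf, then every leaf in the chain. It assumes emptied leaves are unlinked from the tree, as the root dropping back to level 0 suggests, and it takes about 273 rows per leaf (1.1 million rows over roughly 4,016 leaves) and a fan-out of 1,200; all of these figures are assumptions.

```python
import math


def btree_full_scan_pages(live_rows: int, rows_per_leaf: int, fanout: int) -> int:
    """Pages touched by a full scan of a toy clustered B+tree."""
    if live_rows == 0:
        return 1  # only the root, which is now an empty leaf
    leaves = math.ceil(live_rows / rows_per_leaf)
    height, nodes = 1, leaves
    while nodes > 1:  # count levels up to a single root
        nodes = math.ceil(nodes / fanout)
        height += 1
    # One non-leaf page per level on the way down, then every live leaf.
    return (height - 1) + leaves


print(btree_full_scan_pages(1_100_000, rows_per_leaf=273, fanout=1200))  # 4032
print(btree_full_scan_pages(0, rows_per_leaf=273, fanout=1200))          # 1
```

Unlike a heap scanned up to its HWM, the cost here follows the rows that are still there, not the rows that used to be.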
Oracle also has index-organized tables (Index Organized Table, IOT). If you delete data from an IOT, does it behave the same way as MySQL?
Let's create an IOT and likewise insert 1.1 million rows:
SQL> create table t_iot(
       ID varchar2(10),
       NAME varchar2(20),
       constraint pk_id primary key (ID)
     ) organization index;
SQL> create index idx_iot_01 on t_iot(name);
Running a query takes 110 milliseconds:
SQL> select * from test_delete where name='X';

no rows selected

Elapsed: 00:00:00.11
Delete all the rows:
SQL> delete from test_delete;

1100000 rows deleted.

Elapsed: 00:00:07.42
Running the query again takes 30 milliseconds, the same behavior as in MySQL:
SQL> select * from test_delete where name='x';

no rows selected

Elapsed: 00:00:00.03
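Admittedly, execution time depends on the data and is not necessarily precise, but it at least shows that for an IOT, select execution time after a delete is not affected by a "high-water mark". Strictly speaking, though, that statement, and comparing heap tables with IOTs at all, is not quite accurate, because an index-organized table has no heap-style "full table scan". It is not really a "table"; it is an "index", and every scan walks a B-tree. The execution plan of the IOT in the example above bears this out: it uses an index fast full scan. Even if I specify /*+ full(test_delete) */, the note "FULL hint is same as INDEX_FFS for IOT" appears, meaning that for an index-organized table a full table scan is the same thing as an index fast full scan. All of this follows from how an index-organized table stores its data: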
------------------------------------------------------------------------------
| Id  | Operation            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |     1 |   779 |  1028   (1)| 00:00:01 |
|*  1 |  INDEX FAST FULL SCAN| PK_ID |     1 |   779 |  1028   (1)| 00:00:01 |
------------------------------------------------------------------------------
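That said, even though a delete on such an index-organized table seems to do no harm, it is still worth rebuilding the table when you can. The rows are deleted but the file on disk has not shrunk. Deletes and similar operations leave "holes" in pages, and random inserts cause page splits (which lower page utilization). In addition, inserts, updates and deletes on the table cause matching random inserts, updates and deletes in its secondary indexes, which also leave holes in the index's data pages. These holes may be reused, but some physical space can still end up unused, which is fragmentation, and retrieving data may then consume more I/O, CPU and other resources.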
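The effect of random inserts on page utilization can be illustrated with a toy simulation. The Python sketch below is a hypothetical model, not InnoDB's actual split logic: a full page is split at its midpoint, except when a key lands past the end of the last page, in which case a fresh page is started and the full one stays full (roughly what an append-friendly split does). Page capacity (100 rows), key count and the random seed are arbitrary choices.

```python
import bisect
import random


def average_leaf_fill(keys, capacity: int) -> float:
    """Insert keys into a toy chain of sorted leaf pages and return the
    average fill ratio (rows stored / slots allocated)."""
    leaves = [[]]
    for key in keys:
        # leaves[1:] are never empty, so their first keys delimit the ranges.
        firsts = [leaf[0] for leaf in leaves[1:]]
        i = bisect.bisect_right(firsts, key)
        page = leaves[i]
        if len(page) < capacity:
            bisect.insort(page, key)
        elif i == len(leaves) - 1 and key > page[-1]:
            # Appending past the end: start a new page, keep the full one full.
            leaves.append([key])
        else:
            # Midpoint split: move the upper half into a new page.
            new_page = page[capacity // 2:]
            del page[capacity // 2:]
            leaves.insert(i + 1, new_page)
            bisect.insort(page if key < new_page[0] else new_page, key)
    stored = sum(len(leaf) for leaf in leaves)
    return stored / (len(leaves) * capacity)


sequential = average_leaf_fill(range(10_000), capacity=100)
shuffled = list(range(10_000))
random.Random(42).shuffle(shuffled)
scattered = average_leaf_fill(shuffled, capacity=100)
print(f"sequential={sequential:.2f} random={scattered:.2f}")
```

Sequential keys pack every page completely (1.00), while random order leaves pages noticeably emptier; the classic figure for random insertion with midpoint splits is roughly 70%.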
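There are probably many ways to reclaim space in MySQL; I have just learned two of them.
Option 1: optimize table. It locks the table, but in practice it took just over 1 second for 12 million rows in production, which is acceptable. The actual time depends on the data and the environment, so test it first to decide how to proceed: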
mysql> optimize table test_delete;
+-------------------+----------+----------+-------------------------------------------------------------------+
| Table             | Op       | Msg_type | Msg_text                                                          |
+-------------------+----------+----------+-------------------------------------------------------------------+
| bisal.test_delete | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| bisal.test_delete | optimize | status   | OK                                                                |
+-------------------+----------+----------+-------------------------------------------------------------------+
2 rows in set (0.04 sec)
The file is now down to 96K:
-rw-r-----. 1 mysql mysql 8.0K 19 18:22 test_delete.frm
-rw-r-----. 1 mysql mysql  96K 19 18:22 test_delete.ibd
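Option 2: alter table. This is an online DDL operation, but it consumes more space and other resources; in effect it creates an intermediate table automatically and then swaps it in: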
mysql> alter table test_delete engine=Innodb;
Query OK, 0 rows affected (0.05 sec)
Records: 0  Duplicates: 0  Warnings: 0
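Conceptually, both options copy the surviving rows into a new tablespace and swap it in, much like the CTAS-plus-rename approach from the start of this post. The toy Python sketch below shows why that removes the holes. It is a hypothetical illustration of compaction, not how InnoDB implements the rebuild; unlike InnoDB, which always keeps at least a root page (hence the 96K file), it returns no pages at all for an empty table.

```python
from typing import List, Optional

# A page is a fixed number of slots; None marks a hole left by a deleted row.
Page = List[Optional[int]]


def rebuild(pages: List[Page], capacity: int) -> List[List[int]]:
    """Toy 'recreate': copy the live rows, in primary-key order, into freshly
    packed pages of `capacity` rows; the old pages are simply discarded."""
    live = sorted(row for page in pages for row in page if row is not None)
    return [live[i:i + capacity] for i in range(0, len(live), capacity)]


fragmented = [[1, None, None], [None, None, None], [7, 8, None]]
print(rebuild(fragmented, capacity=3))  # [[1, 7, 8]]
```

Three mostly empty pages become one full page, which is the same kind of shrinkage the 44M to 96K file shows.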
Options 1 and 2 both amount to rebuilding the table. Afterwards, TBL_SPACEID, TABLE_ID and INDEX_ID have all changed:
mysql> SELECT A.SPACE AS TBL_SPACEID, A.TABLE_ID, A.NAME AS TABLE_NAME,
         FILE_FORMAT, ROW_FORMAT, SPACE_TYPE,
         B.INDEX_ID, B.NAME AS INDEX_NAME, PAGE_NO, B.TYPE AS INDEX_TYPE
       FROM INNODB_SYS_TABLES A
       LEFT JOIN INNODB_SYS_INDEXES B ON A.TABLE_ID = B.TABLE_ID
       WHERE A.NAME = 'bisal/test_delete';
+-------------+----------+-------------------+-------------+------------+------------+----------+------------+---------+------------+
| TBL_SPACEID | TABLE_ID | TABLE_NAME        | FILE_FORMAT | ROW_FORMAT | SPACE_TYPE | INDEX_ID | INDEX_NAME | PAGE_NO | INDEX_TYPE |
+-------------+----------+-------------------+-------------+------------+------------+----------+------------+---------+------------+
|          32 |       44 | bisal/test_delete | Barracuda   | Dynamic    | Single     |       44 | PRIMARY    |       3 |          3 |
+-------------+----------+-------------------+-------------+------------+------------+----------+------------+---------+------------+
1 row in set (0.00 sec)
The space has indeed been reclaimed:
mysql> select table_schema, table_name, ENGINE,
         round(DATA_LENGTH/1024/1024 + INDEX_LENGTH/1024/1024) total_mb,
         TABLE_ROWS,
         round(DATA_LENGTH/1024/1024) data_mb,
         round(INDEX_LENGTH/1024/1024) index_mb,
         round(DATA_FREE/1024/1024) free_mb,
         round(DATA_FREE/DATA_LENGTH*100,2) free_ratio
       from information_schema.TABLES
       where TABLE_SCHEMA='bisal' and TABLE_NAME='test_delete';
+--------------+-------------+--------+----------+------------+---------+----------+---------+------------+
| table_schema | table_name  | ENGINE | total_mb | TABLE_ROWS | data_mb | index_mb | free_mb | free_ratio |
+--------------+-------------+--------+----------+------------+---------+----------+---------+------------+
| bisal        | test_delete | InnoDB |        0 |          0 |       0 |        0 |       0 |       0.00 |
+--------------+-------------+--------+----------+------------+---------+----------+---------+------------+
1 row in set (0.00 sec)
The data file is also back to its initial state:
[mysql@bisal py_innodb_page_info-master]$ python py_innodb_page_info.py /mysql/3306/data/bisal/test_delete.ibd -v
page offset 00000000, page type <File Space Header>
page offset 00000001, page type <Insert Buffer Bitmap>
page offset 00000002, page type <File Segment inode>
page offset 00000003, page type <B-tree Node>, page level <0000>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
Total number of page: 6:
Freshly Allocated Page: 2
Insert Buffer Bitmap: 1
File Space Header: 1
B-tree Node: 1
File Segment inode: 1
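Other approaches, such as export/import or truncate, achieve the same effect. Which one to use depends on the actual requirements: is there a downtime window? Does incremental data need to be appended afterwards?
This problem does not look very complicated, but if you dig into it you find plenty of fuzzy knowledge. Perhaps this is what eygle once called "from a point to a plane" (由点及面): one problem often leads to others. If you don't know, aren't sure, are ambiguous, or can't give a self-consistent explanation, you haven't really understood the problem. We need to calm down and study it properly. The process is hard and may not even yield an answer, but if you keep at it and the direction is right, you will improve over time. Let's keep encouraging each other.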