课程目标
完成本课程的学习后,您应该能够:
索引的种类很多,目前只关注四种:
B*树的构造类似于二叉树,能根据键提供一行或一个行集的快速访问,通常只需很少的读操作就能找到正确的行。不过,需要注意重要的一点,”B*树“中的”B“不代表二叉(binary),而代表平衡(balanced)。B*树索引并不是一颗二叉树,这一点在介绍如何在磁盘上物理地存储B*树时就会了解到。
位图索引
在常见的OLAP环境中,我们有大量使用位图索引的例子。但是在OLTP中,位图索引就是应用系统的噩梦。
分区索引
Oracle使用平衡树(B-tree)存储索引以便提升数据访问速度。当不使用索引时,用户必须对数据进行顺序扫描(sequential scan)来查找指定的值。如果有 n 行数据,那么平均需要扫描的行为 n/2。因此当数据量增长时,这种方法的开销将显著增长。
如果将一个已排序的值列(list of the values)划分为多个区间(range),每个区间的末尾包含指向下个区间的指针(pointer),而搜索树(search tree)中则保存指向每个区间的指针。此时在 n 行数据中查询一个值所需的时间为 log(n)。这就是 Oracle索引的基本原理。
下图显示了平衡树索引(B-tree index)的结构:
在一个平衡树索引(B-tree index)中,最底层的索引块(叶块(leaf block))存储了被索引的数据值,以及对应的 rowid。叶块之间以双向链表的形式相互连接。位于叶块之上的索引块被称为分支块,分枝块中包含了指向下层索引块的指针。如果被索引的列存储的是字符数据,那么索引值为这些字符数据在当前数据库字符集中的二进制值。
对于唯一索引,每个索引值对应着唯一的一个 rowid。对于非唯一索引,每个索引值对应着多个已排序的 rowid。因此在非唯一索引中,索引数据是按照索引键(index key)及 rowid 共同排序的。键值(key value)全部为 NULL 的行不会被索引,只有位图索引(bitmap index)和簇索引(cluster index)例外。在数据表中,如果两个数据行的全部键值都为 NULL,也不会与唯一索引相冲突。
有两种类型的索引块:
1、用于搜索的分支块(branch block)
2、用于存储索引数据的叶块(leaf block)
分支块中存储以下信息:
1、最小的键值前缀,用于在(本块的)两个键值之间做出分支选择。
2、指向包含所查找键值的子块的指针。
包含 n 个键值的分支块含有 n+1 个指针。键值及指针的数量同时还受索引块容量的限制。
所有叶块相对于其根分支块的深度是相同的。
叶块用于存储以下信息:
1、数据行的键值(key value) 。
2、键值对应数据行的 ROWID 。
所有的 键值-ROWID 对都与其左右的兄弟节点向链接,并按照(key,ROWID)的顺序排序。
平衡树数据结构(B-tree structure)具有以下优势:
1、平衡树(B-tree)内所有叶块的深度相同,因此获取索引内任何位置的数据所需的时间大致相同。
2、平衡树索引(B-tree index)能够自动保持平。
3、平衡树内的所有块的使用容量平均在块总容量的 3/4 左右。
4、在大区间范围内进行查询时,无论匹配个别值还是搜索一个区间,平衡树都能提供较好的查询性能。
5、数据插入(insert),更新(update),及删除(delete)的效率较高,且易于维护键值的顺序。
6、大型表,小型表利用平衡树进行搜索的效率都较好,且搜索效率不会因数据增长而降低。
2.1 B*tree索引的物理结构
SQL>create table test as select * from dba_objects; SQL>create index ind_object_name on test(object_name); SQL>select index_name,s.blevel from user_indexes s where s.index_name=‘IND_OBJECT_NAME‘; INDEX_NAME BLEVEL ------------------------------ ---------- IND_OBJECT_NAME 2 SQL>select object_id from dba_objects s where s.object_name=‘IND_OBJECT_NAME‘; OBJECT_ID ---------- 73522 SQL>alter session set events ‘immediate trace name treedump level 73522‘; 会话已更改。 在trace的目录中找到对应的dump文件
转摘:http://blog.csdn.net/stevendbaguo/article/details/9037949
第一个0的意思是position within previous level block(starting at -1 except root starting
at 0),翻译root是从0开始,其他的从-1开始;
nrow:
number of all index entries
总的索引的数量;
rrows: number of current
index entries 当前索引的数量;
level: branch block level(leaf block implicitly 0)
枝节点数据块的层次,叶子节点暗含是0;
16788147(八进制)
可以转换为数据块的位置,如下所示;
0x1002ab3(十六进制),可以与16788147相互转换。
--begin tree dump branch: 0x1002ab3 16788147 (0: nrow: 3, level: 2) branch: 0x1002f04 16789252 (-1: nrow: 244, level: 1) leaf: 0x1002ab4 16788148 (-1: nrow: 187 rrow: 187) leaf: 0x1002ab5 16788149 (0: nrow: 181 rrow: 181) leaf: 0x1002ab6 16788150 (1: nrow: 183 rrow: 183) leaf: 0x1002ab7 16788151 (2: nrow: 185 rrow: 185) leaf: 0x1002ab8 16788152 (3: nrow: 187 rrow: 187) leaf: 0x1002ab9 16788153 (4: nrow: 189 rrow: 189) leaf: 0x1002aba 16788154 (5: nrow: 190 rrow: 190) ..................................省略部分内容..................................... ----- end tree dump select dbms_utility.data_block_address_file(16788154) "file", dbms_utility.data_block_address_block(16788154) "block" from dual; file block ---------- ---------- 4 10938 alter system dump datafile 4 block 10938; Dump的内容: row#0[7996] flag: ------, lock: 0, len=40 col 0; len 30; (30): 2f 31 30 30 30 33 32 33 64 5f 44 65 6c 65 67 61 74 65 49 6e 76 6f 63 61 74 69 6f 6e 48 61 col 1; len 6; (6): 01 00 26 65 00 3e row#1[7956] flag: ------, lock: 0, len=40 col 0; len 30; (30): 2f 31 30 30 30 33 32 33 64 5f 44 65 6c 65 67 61 74 65 49 6e 76 6f 63 61 74 69 6f 6e 48 61 col 1; len 6; (6): 01 00 26 65 00 3f row#2[7916] flag: ------, lock: 0, len=40 col 0; len 30; (30): 2f 31 30 30 30 33 32 33 64 5f 44 65 6c 65 67 61 74 65 49 6e 76 6f 63 61 74 69 6f 6e 48 61 col 1; len 6; (6): 01 00 2b 69 00 0c row#3[7876] flag: ------, lock: 0, len=40 col 0; len 30; (30): 2f 31 30 30 30 33 32 33 64 5f 44 65 6c 65 67 61 74 65 49 6e 76 6f 63 61 74 69 6f 6e 48 61 col 1; len 6; (6): 01 00 2b 69 00 0d row#4[7836] flag: ------, lock: 0, len=40 col 0; len 30; (30): 2f 31 30 30 30 65 38 64 31 5f 4c 69 6e 6b 65 64 48 61 73 68 4d 61 70 56 61 6c 75 65 49 74 col 1; len 6; (6): 01 00 27 08 00 32 select chr(to_number(‘2f‘,‘xx‘)) from dual; select chr(to_number(‘31‘,‘xx‘)) from dual; select chr(to_number(‘30‘,‘xx‘)) from dual; select chr(to_number(‘30‘,‘xx‘)) from dual; select chr(to_number(‘30‘,‘xx‘)) from dual; select chr(to_number(‘33‘,‘xx‘)) from dual; select chr(to_number(‘32‘,‘xx‘)) from dual; select chr(to_number(‘33‘,‘xx‘)) from dual; select chr(to_number(‘64‘,‘xx‘)) from dual; select chr(to_number(‘5f‘,‘xx‘)) from dual; select chr(to_number(‘44‘,‘xx‘)) from dual; select chr(to_number(‘65‘,‘xx‘)) from dual; 01 00 27 08 00 32 如何转换为rowid请见: http://blog.csdn.net/stevendbaguo/article/details/8215225
2.2索引统计信息
SQL> select s.index_name, s.clustering_factor, s.num_rows, s.blevel,s.leaf_blocks from user_indexes s 2 where s.table_name = ‘DIM_DEPT‘; INDEX_NAME CLUSTERING_FACTOR NUM_ROWS BLEVEL ------------------------------ ----------------- ---------- ---------- LEAF_BLOCKS ----------- IDX_DIM_DEPT3 111 889 1 3
2.3 B*Tree索引的三大特征
a.索引树的高度一般都比较低;
b.索引由索引列存储的值及rowid组成;
c.索引本身是有序的;
这些特点将会给数据库的开发设计优化的相关工作带来意想不到的好处。
索引的高度:
Drop table t1 purge; Drop table t2 purge; Drop table t3 purge; Drop table t4 purge; Drop table t5 purge; Drop table t6 purge; Drop table t7 purge; create table t1 as select rownum as id,rownum+1 as id2 from dual connect by level <=5; create table t2 as select rownum as id,rownum+1 as id2 from dual connect by level <=50; create table t3 as select rownum as id,rownum+1 as id2 from dual connect by level <=500; create table t4 as select rownum as id,rownum+1 as id2 from dual connect by level <=5000; create table t5 as select rownum as id,rownum+1 as id2 from dual connect by level <=50000; create table t6 as select rownum as id,rownum+1 as id2 from dual connect by level <=500000; create table t7 as select rownum as id,rownum+1 as id2 from dual connect by level <=5000000; Create unique index ind_id_t1 on t1(id); Create unique index ind_id_t2 on t2(id); Create unique index ind_id_t3 on t3(id); Create unique index ind_id_t4 on t4(id); Create unique index ind_id_t5 on t5(id); Create unique index ind_id_t6 on t6(id); Create unique index ind_id_t7 on t7(id); Select segment_name,bytes/1024||’K’ from user_segments where segment_name like ‘IND_ID_T%‘; Select index_name,blevel,leaf_blocks,num_rows,distinct_keys,clustering_factor from user_indexes where index_name like ‘IND_ID_T%‘;
思考题:在t4和t7用索引取一条数据,是t4快很多吗?
Set autotrace traceonly Select * from t4 where id=100; Select * from t7 where id=100; Drop index ind_id_t4; Drop index ind_id_t7; Select * from t4 where id=100; Select * from t7 where id=100;
2.4索引的存储
- Count(*)的优化
Drop table test purge; Create table test as select * from dba_objects; Create index ind_object_id on test(object_id); Set autotrace traceonly Select count(*) from test; Select count(*) from test where object_id is not null; Alter table test modify object_id not null; Select count(*) from test;
- Sum/avg的优化
Drop table test purge; Create table test as select * from dba_objects; Create index ind_object_id on test(object_id); Set autotrace traceonly Select sum(object_id) from test; Select sum(object_id)from test where object_id is not null; Alter table test modify object_id not null; Select sum(object_id)from test;
- Max/min的优化及陷阱
Drop table test purge; Create table test as select * from dba_objects; Create index ind_object_id on test(object_id); Set autotrace traceonly Select max(object_id) from test; Drop table t_max purge; Create table t_max as select * from dba_objects; Insert into t_max select * from dba_objects; Insert into t_max select * from dba_objects; Insert into t_max select * from dba_objects; Insert into t_max select * from dba_objects; Insert into t_max select * from dba_objects; Commit; Create index ind_t_max on t_max(object_id); Select count(*) from t_max; Select max(object_id) from test; Set autotrace traceonly Select min(object_id),max(object_id) from t_max; Alter table t_max modify object_id not null; Select min(object_id),max(object_id) from t_max; Select (Select min(object_id) from t_max), (Select max(object_id) from t_max) from dual;
- 索引回表与优化
drop table test purge; create table test as select *from dba_objects; create index ind_object_id on test(object_id); set autotrace traceonly select * from test where object_id <100; select object_id from test where object_id <100;
2.5索引的有序性
- Order by 排序优化
Drop table test purge; create table test as select *from dba_objects; select * from test where object_id <100 order by object_id; create index ind_object_id on test(object_id); select * from test where object_id <100 order by object_id;
- Distinct 排重优化
Drop table test purge; create table test as select *from dba_objects; Alter table test modify object_id not null; select distinct object_id from test; create index ind_object_id on test(object_id); select /*+index(test)*/distinct object_id from test;
大多数情况下用到distinct是因为表记录有重复,我们首先要考虑的是为什么要重复。
- 索引全扫描和快速扫描
Drop table test purge; create table test as select *from dba_objects; Alter table test modify object_id not null; create index ind_object_id on test(object_id); Select count(*) from test; Select object_id from test order by object_id; Select max(object_id) from test order by object_id;
- Union 合并的优化(union和union all的区别)
drop table test1 purge; Drop table test2 purge; create table test1 as select *from dba_objects where object_type=‘SYNONYM‘; create table test2 as select *from dba_objects where object_type=‘JAVA CLASS‘; Alter table test1 modify object_id not null; Alter table test2 modify object_id not null; select object_id from test1 union select object_id from test2; create index ind_object_id1 on test1(object_id); create index ind_object_id2 on test2(object_id); select object_id from test1 union select object_id from test2; select /*+index(test1)*/object_id from test1 union select /*+index(test2)*/object_id from test2; select object_id from test1 union all select object_id from test2;
在有些场景下,几个结果集是没有重复的集合进行union,此时用union all可以消除排序。