这是来自一个真实案例的测试,该库由于频繁的全表扫描导致直接路径读,由于DML的操作导致大量的Enquiry:KO-Fast Object Checkpoint等待事件,导致系统缓慢,问题时段会持续一个小时左右,导致系统整个缓慢,需要调优处理,我们将等待时间最长的SQL都拉出来,逐个分析,发现等待最严重的一个SQL的全表扫描无法消除,但是从字段分布看,只需要返回3行记录,但是Oracle依然选择全扫,我们通过模拟这个过程,重现问题根源,最后通过本文的最后的方法消除全表扫描,使得Oracle找到正确的执行计划。
创建测试表
create table a_t1(id number,district varchar2(70));
插入数据
begin
for i in 1..100000 loop
insert into a_t1 values(i,'ABCDEFGHIJKLMNOPQRSTABCDEFGHIJKLMNOPQRST'||i);
end loop;
commit;
end;
/
SCOTT@orcl1>select column_name,num_distinct,num_buckets,avg_col_len,histogram from user_tab_columns where table_name='A_T1';
COLUMN_NAME NUM_DISTINCT NUM_BUCKETS AVG_COL_LEN HISTOGRAM
------------------------------ ------------ ----------- ----------- ---------------
ID NONE
DISTRINCT NONE
SCOTT@orcl1>select count(distinct DISTRINCT) from a_t1;
COUNT(DISTINCT DISTRINCT)
------------------------
100000
Elapsed: 00:00:00.05
在字段DISTRINCT创建索引
SCOTT@orcl1> create index idx_at1_DISTRINCT on a_t1(DISTRINCT);
Index created.
Elapsed: 00:00:00.43
收集表统计信息
SCOTT@orcl1>select column_name,num_distinct,num_buckets,avg_col_len,histogram from user_tab_columns where table_name='A_T1';
COLUMN_NAME NUM_DISTINCT NUM_BUCKETS AVG_COL_LEN HISTOGRAM
------------------------------ ------------ ----------- ----------- ---------------
ID 100000 1 5 NONE
DISTRINCT 99568 1 46 NONE
Elapsed: 00:00:00.01
SCOTT@orcl1>select table_name,column_name,avg_col_len,num_buckets,histogram from user_tab_col_statistics where table_name='A_T1';
TABLE_NAME COLUMN_NAME AVG_COL_LEN NUM_BUCKETS HISTOGRAM
------------------------------ ------------------------------ ----------- ----------- ---------------
A_T1 ID 5 1 NONE
A_T1 DISTRINCT 46 1 NONE
Elapsed: 00:00:00.02
此时没有直方图信息,我们查询SQL
SCOTT@orcl1>select * from a_t1 where DISTRINCT='ABCDEFGHIJKLMNOPQRSTABCDEFGHIJKLMNOPQRST552';
ID DISTRINCT
---------- ----------------------------------------------------------------------
552 ABCDEFGHIJKLMNOPQRSTABCDEFGHIJKLMNOPQRST552
Elapsed: 00:00:00.02
Execution Plan
----------------------------------------------------------
Plan hash value: 1710520156
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 51 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| A_T1 | 1 | 51 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_AT1_DISTRINCT | 1 | | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("DISTRINCT"='ABCDEFGHIJKLMNOPQRSTABCDEFGHIJKLMNOPQRST552')
Statistics
----------------------------------------------------------
1 recursive calls
db block gets
5 consistent gets
physical reads
redo size
640 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
sorts (memory)
sorts (disk)
1 rows processed
这是最优的执行计划,因为谓词字段值只有一个。下面我们收集索引列的直方图。
SCOTT@orcl1>exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'A_T1',estimate_percent=>10,method_opt=>'for all indexed columns' , cascade => true);
PL/SQL procedure successfully completed.
Elapsed: 00:00:01.55
SCOTT@orcl1> select table_name,column_name,avg_col_len,num_buckets,histogram from user_tab_col_statistics where table_name='A_T1';
TABLE_NAME COLUMN_NAME AVG_COL_LEN NUM_BUCKETS HISTOGRAM
------------------------------ ------------------------------ ----------- ----------- ---------------
A_T1 ID 5 1 NONE
A_T1 DISTRINCT 46 1 FREQUENCY
Elapsed: 00:00:00.07
此时是频率直方图,看直方图的桶数据统计
SCOTT@orcl1>select table_name,column_name,endpoint_number,endpoint_value from user_histograms where table_name='A_T1';
TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE
------------------------------ ------------------------------ --------------- --------------
A_T1 DISTRINCT 9829 3.3884E+35 <<<<都在一个桶中
A_T1 ID 1
A_T1 ID 1 100000
再看执行计划
SCOTT@orcl1>/
ID DISTRINCT
---------- ----------------------------------------------------------------------
552 ABCDEFGHIJKLMNOPQRSTABCDEFGHIJKLMNOPQRST552
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
Plan hash value: 2172434975
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100K| 4980K| 241 (2)| 00:00:03 |
|* 1 | TABLE ACCESS FULL| A_T1 | 100K| 4980K| 241 (2)| 00:00:03 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("DISTRINCT"='ABCDEFGHIJKLMNOPQRSTABCDEFGHIJKLMNOPQRST552')
Statistics
----------------------------------------------------------
recursive calls
db block gets
822 consistent gets
physical reads
redo size
636 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
sorts (memory)
sorts (disk)
1 rows processed
此时是错误的执行计划,CBO选择了全表扫描。
这里我们来梳理下结果,我们有一个表a_t1 字段DISTRINCT都是唯一值,通过游戏买卖地图索引找数据应该最快,谓词是"DISTRINCT"='ABCDEFGHIJKLMNOPQRSTABCDEFGHIJKLMNOPQRST552' ,所以,因为业务知道数据情况并创建了索引,这是正确的方法,但是Orale确没有走索引,而是直接选择了全表扫,这里需要分析Oracle为了不走索引,也就是索引提供的信息Oracle认为成本更高。下面我们看强制走索引的执行计划。
SCOTT@orcl1>select /*+ index(a_t1,IDX_AT1_DISTRINCT) */ * from a_t1 where DISTRINCT='ABCDEFGHIJKLMNOPQRSTABCDEFGHIJKLMNOPQRST552';
ID DISTRINCT
---------- ----------------------------------------------------------------------
552 ABCDEFGHIJKLMNOPQRSTABCDEFGHIJKLMNOPQRST552
Elapsed: 00:00:00.00
Execution Plan
----------------------------------------------------------
Plan hash value: 1710520156
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100K| 4980K| 20940 (1)| 00:04:12 |
| 1 | TABLE ACCESS BY INDEX ROWID| A_T1 | 100K| 4980K| 20940 (1)| 00:04:12 |
|* 2 | INDEX RANGE SCAN | IDX_AT1_DISTRINCT | 100K| | 1309 (1)| 00:00:16 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("DISTRINCT"='ABCDEFGHIJKLMNOPQRSTABCDEFGHIJKLMNOPQRST552')
Statistics
----------------------------------------------------------
recursive calls
db block gets
5 consistent gets
physical reads
redo size
640 bytes sent via SQL*Net to client
523 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
sorts (memory)
sorts (disk)
1 rows processed
我们看第2步骤,|* 2 | INDEX RANGE SCAN | IDX_AT1_DISTRINCT | 100K| | 1309 (1)| 00:00:16 |,也就是Oracle认为要是走索引找一行数据需要返回10万行数据,显然这里跟我们想象的有出入,我们认为索引针对该值"DISTRINCT"='ABCDEFGHIJKLMNOPQRSTABCDEFGHIJKLMNOPQRST552’只有一条符合条件记录,但是Oracle认为有10万条记录满足条件,走索引成本太高,这里我们就有理由相信Oracle这里有问题,通过MOS查找我们找到一篇文章。
Histogram of Character Columns Longer Than 32 Characters Causes Incorrect SQL Plan of Table Full Scan (Doc ID 2413826.1)
也就是当字段平均长度超过32字节,Oracle直方图只会存储前32个字节,这样的索引列当该列因为统计生成了直方图时,这个直方图只会根据字典的前32个字节生成,导致Oracle根据直方图计算执行计划时选择走全表扫描
解决方法:删除列上的直方图(与我们刚创建该表没有收集统计信息一样)
SCOTT@orcl1>exec DBMS_STATS.DELETE_COLUMN_STATS(ownname=>'SCOTT',tabname=>'A_T1',colname=>'DISTRINCT',col_stat_type=>'HISTOGRAM');
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.07
SCOTT@orcl1>select table_name,column_name,avg_col_len,num_buckets,histogram from user_tab_col_statistics where table_name='A_T1'
TABLE_NAME COLUMN_NAME AVG_COL_LEN NUM_BUCKETS HISTOGRAM
------------------------------ ------------------------------ ----------- ----------- ---------------
A_T1 ID 5 1 NONE
A_T1 DISTRINCT 46 1 NONE
再次执行该语句看执行计划,此时Oracle不再基于直方图,因为该字段的良好分布特性,走索引是Oracle最优选择,这次删除直方图后,走对了执行计划。
SCOTT@orcl1>select * from a_t1 where DISTRINCT='ABCDEFGHIJKLMNOPQRSTABCDEFGHIJKLMNOPQRST552' ;
ID DISTRINCT
---------- ----------------------------------------------------------------------
552 ABCDEFGHIJKLMNOPQRSTABCDEFGHIJKLMNOPQRST552
Elapsed: 00:00:00.00
Execution Plan
----------------------------------------------------------
Plan hash value: 1710520156
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 51 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| A_T1 | 1 | 51 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_AT1_DISTRINCT | 1 | | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("DISTRINCT"='ABCDEFGHIJKLMNOPQRSTABCDEFGHIJKLMNOPQRST552')
Statistics
----------------------------------------------------------
1 recursive calls
db block gets
5 consistent gets
physical reads
redo size
640 bytes sent via SQL*Net to client
523 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
sorts (memory)
sorts (disk)
1 rows processed