♣
题目 部分
在Oracle中,什么是多列统计信息(Extended Statistics)?
♣
答案部分
Oracle优化器对于基数值的估算是否准确关系到能否生成最优的执行计划,而基数值估算的准确性又取决于SQL中各个对象的统计信息是否完整、是否能真实反映出对象的数据分布情况。因此使用何种方法收集统计信息是很有讲究的:对于数据倾斜度较大的表需要收集直方图,在此基础上如果有多个列存在相关性,那么多列统计信息(也叫扩展统计信息)收集又是一个更好的选择。
在一般情况下,SQL语句的WHERE子句后面针对单张表都有多个条件,也就是根据多列的条件筛选得到数据。默认情况下,Oracle会把多列的选择率(Selectivity)相乘从而得到WHERE语句的选择率,但是这样有可能造成选择率不准确,从而导致优化器做出错误的判断。为了能够让优化器做出准确的判断,从而生成准确的执行计划,Oracle在11g数据库中引入了收集多列统计信息。多列统计信息包含列组统计信息(Column Group Statistics)和表达式的统计信息(Expression Statistics)。
使用程序包DBMS_STATS中的新函数CREATE_EXTENDED_STATS创建一个虚拟列,然后对表收集统计信息。如下所示,定义了两个扩展列:
1SELECT DBMS_STATS.CREATE_EXTENDED_STATS(OWNNAME => 'TEST', 2 TABNAME => 'T', 3 EXTENSION => '(UPPER(PAD))'), 4 DBMS_STATS.CREATE_EXTENDED_STATS(OWNNAME => 'TEST', 5 TABNAME => 'T', 6 EXTENSION => '(VAL2,VAL3)') 7 FROM DUAL;
以上SQL是对TEST用户下的T表,分别基于表达式和基于多列创建虚拟列,下次再收集表的统计信息时,将会自动收集到多列统计信息。需要注意的是,不能对SYS用户下的表创建扩展的统计信息,否则会报错“ORA-20000: Unable to create extension: not supported for SYS owned table”。
使用Oracle自带的DBMS_STATS包提供的存储过程DROP_EXTENDED_STATS来删除扩展统计信息:
1EXEC DBMS_STATS.DROP_EXTENDED_STATS(OWNNAME => 'TEST',TABNAME => 'T',EXTENSION => '(UPPER(PAD))'); 2EXEC DBMS_STATS.DROP_EXTENDED_STATS(OWNNAME => 'TEST',TABNAME => 'T',EXTENSION => '(VAL2,VAL3)');
定义扩展统计信息也可以直接在包DBMS_STATS中指定METHOD_OPT,收集统计信息时,把列组合作为单独列使用,如下所示:
1BEGIN 2 DBMS_STATS.GATHER_TABLE_STATS ( 3 OWNNAME => 'SCOTT', 4 TABNAME => 'BOOKS', 5 ESTIMATE_PERCENT=> 100, 6 METHOD_OPT => 'FOR ALL COLUMNS SIZE SKEWONLY FOR COLUMNS (HOTEL_ID,RATE_CATEGORY)', 7 CASCADE => TRUE 8 ); 9END;
在视图DBA_STAT_EXTENSIONS中,可以看到在数据库中定义的扩展统计信息:
1SQL> SELECT EXTENSION_NAME, EXTENSION 2 2 FROM DBA_STAT_EXTENSIONS 3 3 WHERE TABLE_NAME='BOOKS'; 4EXTENSION_NAME EXTENSION 5------------------------------ ------------------------------ 6SYS_STUW3MXAI1XLZHCHDYKJ9E4K90 ("HOTEL_ID","RATE_CATEGORY")
当不清楚需要创建哪些列的扩展统计信息时,可以针对一个表,基于特定的工作负荷,通过使用DBMS_STATS.SEED_COL_USAGE和REPORT_COL_USAGE来确定需要哪些列组。需要注意的是,这种技术不适用于包含表达式列的统计工作。主要过程如下所示:
1EXEC DBMS_STATS.SEED_COL_USAGE(NULL,NULL,TIME_LIMIT=>100); 2EXPLAIN PLAN FOR SQL语句; 3SELECT DBMS_STATS.REPORT_COL_USAGE(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR') FROM DUAL; 4SELECT DBMS_STATS.CREATE_EXTENDED_STATS(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR') FROM DUAL;
多列统计信息的一个使用示例如下所示:
首先,创建测试表:
1DROP TABLE T_ES_20170601_LHR; 2CREATE TABLE T_ES_20170601_LHR (C1 NUMBER,C2 VARCHAR2(2),C3 VARCHAR2(20)); 3DECLARE 4BEGIN 5 FOR I IN 1 .. 5000 LOOP 6 INSERT INTO T_ES_20170601_LHR VALUES (1, 'AA', DBMS_RANDOM.STRING('l', 20)); 7 INSERT INTO T_ES_20170601_LHR VALUES (2, 'BB', DBMS_RANDOM.STRING('l', 20)); 8 INSERT INTO T_ES_20170601_LHR VALUES (3, 'CC', DBMS_RANDOM.STRING('l', 20)); 9 INSERT INTO T_ES_20170601_LHR VALUES (4, 'DD', DBMS_RANDOM.STRING('l', 20)); 10 END LOOP; 11 COMMIT; 12 END; 13 / 14INSERT INTO T_ES_20170601_LHR VALUES(11,'A','AAAAAAA'); 15INSERT INTO T_ES_20170601_LHR VALUES(22,'B','BBBBBBB'); 16INSERT INTO T_ES_20170601_LHR VALUES(33,'C','CCCCCCC'); 17INSERT INTO T_ES_20170601_LHR VALUES(44,'D','DDDDDDD'); 18COMMIT;
数据分布如下所示:
1LHR@orclasm > SELECT COUNT(1) FROM T_ES_20170601_LHR; 2 COUNT(1) 3---------- 4 20004 5LHR@orclasm > SELECT C1,C2,COUNT(1) FROM T_ES_20170601_LHR GROUP BY C1,C2 ORDER BY C1; 6 C1 C2 COUNT(1) 7---------- -- ---------- 8 1 AA 5000 9 2 BB 5000 10 3 CC 5000 11 4 DD 5000 12 11 A 1 13 22 B 1 14 33 C 1 15 44 D 1 168 rows selected.
接下来收集T_ES_20170601_LHR表的统计信息,但不收集直方图的信息(收集前确认默认的ESTIMATE_PERCENT为AUTO_SAMPLE_SIZE):
1LHR@orclasm > SELECT DBMS_STATS.GET_PREFS('ESTIMATE_PERCENT',NULL,NULL) FROM DUAL; 2DBMS_STATS.GET_PREFS('ESTIMATE_PERCENT',NULL,NULL) 3----------------------------------- 4DBMS_STATS.AUTO_SAMPLE_SIZE 5 6LHR@orclasm > EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR',METHOD_OPT=>'FOR ALL COLUMNS SIZE 1'); 7PL/SQL procedure successfully completed. 8 9LHR@orclasm > SET LINESIZE 200 10LHR@orclasm > SELECT OWNER,TABLE_NAME,NUM_DISTINCT,SAMPLE_SIZE,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COL_STATISTICS WHERE OWNER='LHR' AND TABLE_NAME='T_ES_20170601_LHR'; 11OWNER TABLE_NAME NUM_DISTINCT SAMPLE_SIZE COLUMN_NAME HISTOGRAM 12------------------------------ ------------------------------ ------------ ----------- ------------------------------ --------------- 13LHR T_ES_20170601_LHR 8 20004 C1 NONE 14LHR T_ES_20170601_LHR 8 20004 C2 NONE 15LHR T_ES_20170601_LHR 20004 20004 C3 NONE
下面分别执行如下2条SQL语句,然后查看预估行数:
SELECT * FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA';
SELECT * FROM T_ES_20170601_LHR WHERE C1=11 AND C2='A';
1LHR@orclasm > SELECT COUNT(*) FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA'; 2 COUNT(*) 3---------- 4 5000 5LHR@orclasm > EXPLAIN PLAN FOR SELECT COUNT(*) FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA'; 6Explained. 7LHR@orclasm > SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY()); 8PLAN_TABLE_OUTPUT 9-------------------------------------------------------------------------------- 10Plan hash value: 3668985715 11---------------------------------------------------------------------------------------- 12| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 13---------------------------------------------------------------------------------------- 14| 0 | SELECT STATEMENT | | 1 | 6 | 27 (0)| 00:00:01 | 15| 1 | SORT AGGREGATE | | 1 | 6 | | | 16|* 2 | TABLE ACCESS FULL| T_ES_20170601_LHR | 313 | 1878 | 27 (0)| 00:00:01 | 17---------------------------------------------------------------------------------------- 18Predicate Information (identified by operation id): 19--------------------------------------------------- 20 2 - filter("C1"=1 AND "C2"='AA') 21LHR@orclasm > SELECT COUNT(*) FROM T_ES_20170601_LHR WHERE C1=11 AND C2='A'; 22 COUNT(*) 23---------- 24 1 25LHR@orclasm > EXPLAIN PLAN FOR SELECT COUNT(*) FROM T_ES_20170601_LHR WHERE C1=11 AND C2='A'; 26Explained. 27LHR@orclasm > SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY()); 28PLAN_TABLE_OUTPUT 29-------------------------------------------------------------------------------- 30Plan hash value: 3668985715 31---------------------------------------------------------------------------------------- 32| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 33---------------------------------------------------------------------------------------- 34| 0 | SELECT STATEMENT | | 1 | 6 | 27 (0)| 00:00:01 | 35| 1 | SORT AGGREGATE | | 1 | 6 | | | 36|* 2 | TABLE ACCESS FULL| T_ES_20170601_LHR | 313 | 1878 | 27 (0)| 00:00:01 | 37---------------------------------------------------------------------------------------- 38Predicate Information (identified by operation id): 39--------------------------------------------------- 40 2 - filter("C1"=11 AND "C2"='A')
可以看到有如下的结果:
SELECT * FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA';--实际返回5000条,预估313条
SELECT * FROM T_ES_20170601_LHR WHERE C1=11 AND C2='A';--实际返回1条,预估313条
在上面的两个查询中Cardinality的计算方法为:ROUND(NUM_ROWS*(1/NUM_DISTINCT_C1)*(1/NUM_DISTINCT_C2))=ROUND(20004*(1/8)*(1/8))=313,和执行计划里的313相吻合,因为没有收集列的直方图信息,所以优化器估算返回行数和实际返回行数还是有不少差距。
下面对C1、C2列收集直方图后重新执行上面两个查询:
1LHR@orclasm > EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR',METHOD_OPT=>'FOR COLUMNS C1 SIZE SKEWONLY,C2 SIZE SKEWONLY'); 2 3PL/SQL procedure successfully completed. 4 5LHR@orclasm > SELECT OWNER,TABLE_NAME,NUM_DISTINCT,DENSITY,NUM_BUCKETS,SAMPLE_SIZE,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COL_STATISTICS WHERE OWNER='LHR' AND TABLE_NAME='T_ES_20170601_LHR'; 6 7OWNER TABLE_NAME NUM_DISTINCT DENSITY NUM_BUCKETS SAMPLE_SIZE COLUMN_NAME HISTOGRAM 8------- ------------------ ------------ ---------- ----------- ----------- ------------ --------------- 9LHR T_ES_20170601_LHR 8 .000024995 8 20004 C1 FREQUENCY 10LHR T_ES_20170601_LHR 8 .000024995 8 20004 C2 FREQUENCY 11LHR T_ES_20170601_LHR 20004 .00004999 1 20004 C3 NONE
对于C1、C2列DENSITY值的计算:1/(NUM_ROWS*2)=1/(20004*2)=0.000024995
对于c2列因为没有直方图,density值是这样计算出来的:1/num_distinct_c3=0.000050155
1LHR@orclasm > COL COLUMN_NAME FORMAT A30 2LHR@orclasm > COL ENDPOINT_ACTUAL_VALUE FORMAT A50 3LHR@orclasm > SET LINESIZE 170 4LHR@orclasm > SET PAGESIZE 100 5LHR@orclasm > SELECT OWNER,TABLE_NAME,COLUMN_NAME,ENDPOINT_NUMBER,ENDPOINT_VALUE FROM DBA_TAB_HISTOGRAMS WHERE TABLE_NAME='T_ES_20170601_LHR'; 6 7OWNER TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE 8------------------------------ ------------------------------ ------------------------------ --------------- -------------- 9LHR T_ES_20170601_LHR C1 5000 1 10LHR T_ES_20170601_LHR C1 10000 2 11LHR T_ES_20170601_LHR C1 15000 3 12LHR T_ES_20170601_LHR C1 20000 4 13LHR T_ES_20170601_LHR C1 20001 11 14LHR T_ES_20170601_LHR C1 20002 22 15LHR T_ES_20170601_LHR C1 20003 33 16LHR T_ES_20170601_LHR C1 20004 44 17LHR T_ES_20170601_LHR C2 1 3.3750E+35 18LHR T_ES_20170601_LHR C2 5001 3.3882E+35 19LHR T_ES_20170601_LHR C2 5002 3.4269E+35 20LHR T_ES_20170601_LHR C2 10002 3.4403E+35 21LHR T_ES_20170601_LHR C2 10003 3.4788E+35 22LHR T_ES_20170601_LHR C2 15003 3.4924E+35 23LHR T_ES_20170601_LHR C2 15004 3.5308E+35 24LHR T_ES_20170601_LHR C2 20004 3.5446E+35 25LHR T_ES_20170601_LHR C3 0 3.3882E+35 26LHR T_ES_20170601_LHR C3 1 6.3594E+35 27 2818 rows selected.
“C1=1 AND C2='AA'”作为PREDICATE执行查询,看下这次是否CARDINALITY值会更加接近真实返回值:
1LHR@orclasm > explain plan for select count(*) from T_ES_20170601_LHR where c1=1 and c2='AA'; 2Explained. 3LHR@orclasm > select * from table(dbms_xplan.display()); 4PLAN_TABLE_OUTPUT 5------------------------------------------------------------- 6Plan hash value: 3668985715 7---------------------------------------------------------------------------------------- 8| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 9---------------------------------------------------------------------------------------- 10| 0 | SELECT STATEMENT | | 1 | 6 | 27 (0)| 00:00:01 | 11| 1 | SORT AGGREGATE | | 1 | 6 | | | 12|* 2 | TABLE ACCESS FULL| T_ES_20170601_LHR | 1250 | 7500 | 27 (0)| 00:00:01 | 13---------------------------------------------------------------------------------------- 14Predicate Information (identified by operation id): 15--------------------------------------------------- 16 2 - filter("C1"=1 AND "C2"='AA')
执行计划里的Rows预估方法为:ROUND(NUM_ROWS*(5000/20004)*(5000/20004))=ROUND(20004*0.0624)=1250,相比未收集直方图之前的313更接近于真实值5000,可见有了直方图之后的估算更加准确了。
C1=11 AND C2='A'作为PREDICATE执行查询,看下这次是否CARDINALITY值会更加接近真实返回值:
1LHR@orclasm > explain plan for select count(*) from T_ES_20170601_LHR where c1=11 and c2='A'; 2Explained. 3LHR@orclasm > select * from table(dbms_xplan.display()); 4PLAN_TABLE_OUTPUT 5------------------------------------------------------------------- 6Plan hash value: 3668985715 7---------------------------------------------------------------------------------------- 8| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 9---------------------------------------------------------------------------------------- 10| 0 | SELECT STATEMENT | | 1 | 6 | 27 (0)| 00:00:01 | 11| 1 | SORT AGGREGATE | | 1 | 6 | | | 12|* 2 | TABLE ACCESS FULL| T_ES_20170601_LHR | 1 | 6 | 27 (0)| 00:00:01 | 13---------------------------------------------------------------------------------------- 14Predicate Information (identified by operation id): 15--------------------------------------------------- 16 2 - filter("C1"=11 AND "C2"='A')
执行计划里的Rows预估方法为:NUM_ROWS*(1/20004)*(1/20004)=0.00005,近似取值为1。
可见在收集了直方图后的Cardinality值比没有直方图的情况虽然更接近真实值,但还是有不少差距,下面收集多列统计信息。多列统计信息可以根据列与列之间的相关性将相关程度高的几列划入Column Group,之后的统计信息就是基于这个Column Group进行收集。本例T_ES_20170601_LHR表里的C1、C2两个字段就具有一定的相关性,例如C1=1的字段只和C2='AA'的字段组合成一行,C1=1的字段不会和除了C2='AA'以外的值组合成一行,这就是C1、C2之间存在明显的相关性,所以C1和C2可以构成一个COLUMN GROUP来形成更精确的统计信息,对Column Group收集统计信息的方法有两种:
1、采纳系统检测工作负载后给出的建议值后收集统计,如果DBA对表里数据构成情况及表中哪些列具有相关性事先不知道的情况下可以采用这种方法,Oracle会根据当前的负载给出哪些表里的哪几个列之间存在相关性的建议,DBA如果采纳这个建议就可以在这几个列上创建出Column Group。
2、手动创建Column Group后再收集统计信息,对表中具有相关性的列心知肚明,就可以使用手动创建的方法。
下面简要介绍一下这两种方法:
方法1:采纳系统检测工作负载后给出的建议值来生成column group
这个方法里又有两种选择,既可以让Oracle针对特定的SQL语句来评估是否有创建Column Groups的必要,也可以从sql cursor cache、auto workload repository等已经生成的负载里兜取已经执行过的SQL语句来评估是否可以创建column groups。可以针对一个表,基于特定的工作负荷,通过使用DBMS_STATS.SEED_COL_USAGE和REPORT_COL_USAGE来确定需要哪些列组。当不清楚需要创建哪些列的扩展统计信息时,这个技术是非常有用的。需要注意的是,这种技术不适用于包含表达式列的统计工作。
针对“SELECT * FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA'”让Oracle生成创建Column Group的建议。
1LHR@orclasm > EXEC DBMS_STATS.SEED_COL_USAGE(NULL,NULL,TIME_LIMIT=>100); 2PL/SQL procedure successfully completed. 3LHR@orclasm > EXPLAIN PLAN FOR SELECT * FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA'; 4Explained. 5LHR@orclasm > SET LONG 20000 6LHR@orclasm > SET PAGESIZE 100 7LHR@orclasm > SELECT DBMS_STATS.REPORT_COL_USAGE(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR') FROM DUAL; 8DBMS_STATS.REPORT_COL_USAGE(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR') 9-------------------------------------------------------------------------------- 10LEGEND: 11....... 12 13EQ : Used in single table EQuality predicate 14RANGE : Used in single table RANGE predicate 15LIKE : Used in single table LIKE predicate 16NULL : Used in single table is (not) NULL predicate 17EQ_JOIN : Used in EQuality JOIN predicate 18NONEQ_JOIN : Used in NON EQuality JOIN predicate 19FILTER : Used in single table FILTER predicate 20JOIN : Used in JOIN predicate 21GROUP_BY : Used in GROUP BY expression 22............................................................................... 23 24############################################################################### 25 26COLUMN USAGE REPORT FOR LHR.T_ES_20170601_LHR 27............................................. 28 291. C1 : EQ 302. C2 : EQ 313. (C1, C2) : FILTER 32###############################################################################
根据上面(C1, C2):filter的建议,生成Column Group:
1LHR@orclasm > SELECT DBMS_STATS.CREATE_EXTENDED_STATS(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR') FROM DUAL; 2 3DBMS_STATS.CREATE_EXTENDED_STATS(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR') 4-------------------------------------------------------------------------------- 5############################################################################### 6 7EXTENSIONS FOR LHR.T_ES_20170601_LHR 8.................................... 9 101. (C1, C2) : SYS_STUF3GLKIOP5F4B0BTTCFTMX0W created 11###############################################################################
DBA_STAT_EXTENSIONS查询Column Group信息:
1LHR@orclasm > COL EXTENSION FORMAT A50 2LHR@orclasm > SET LINESIZE 170 3LHR@orclasm > SELECT * FROM DBA_STAT_EXTENSIONS WHERE TABLE_NAME='T_ES_20170601_LHR'; 4OWNER TABLE_NAME EXTENSION_NAME EXTENSION CREATOR DROPPABLE 5------ ------------------- ------------------------------ --------------- ------- ----------- 6LHR T_ES_20170601_LHR SYS_STUF3GLKIOP5F4B0BTTCFTMX0W ("C1","C2") USER YES
“SYS_STUF3GLKIOP5F4B0BTTCFTMX0W”是系统为Column Group自动生成的名称,可以把它看作表中的一个列,针对“SYS_STUF3GLKIOP5F4B0BTTCFTMX0W”列生成统计信息:
1LHR@orclasm > SET LINESIZE 170 2LHR@orclasm > COL EXTENSION FORMAT A15 3LHR@orclasm > SELECT T1.OWNER,T1.TABLE_NAME,T1.COLUMN_NAME,T2.EXTENSION,NUM_DISTINCT,SAMPLE_SIZE,HISTOGRAM FROM DBA_TAB_COL_STATISTICS T1,DBA_STAT_EXTENSIONS T2 WHERE T1.OWNER='LHR' AND T1.TABLE_NAME='T_ES_20170601_LHR' AND T1.OWNER=T2.OWNER AND T1.TABLE_NAME=T2.TABLE_NAME AND T1.COLUMN_NAME=T2.EXTENSION_NAME; 4 5no rows selected 6 7LHR@orclasm > EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR',METHOD_OPT=>'FOR COLUMNS SYS_STUF3GLKIOP5F4B0BTTCFTMX0W SIZE SKEWONLY'); 8 9PL/SQL procedure successfully completed. 10 11LHR@orclasm > SELECT T1.OWNER,T1.TABLE_NAME,T1.COLUMN_NAME,T2.EXTENSION,NUM_DISTINCT,SAMPLE_SIZE,HISTOGRAM FROM DBA_TAB_COL_STATISTICS T1,DBA_STAT_EXTENSIONS T2 WHERE T1.OWNER='LHR' AND T1.TABLE_NAME='T_ES_20170601_LHR' AND T1.OWNER=T2.OWNER AND T1.TABLE_NAME=T2.TABLE_NAME AND T1.COLUMN_NAME=T2.EXTENSION_NAME; 12 13OWNER TABLE_NAME COLUMN_NAME EXTENSION NUM_DISTINCT SAMPLE_SIZE HISTOGRAM 14------- ------------------- ------------------------------ --------------- ------------ ----------- --------------- 15LHR T_ES_20170601_LHR SYS_STUF3GLKIOP5F4B0BTTCFTMX0W ("C1","C2") 8 20004 FREQUENCY
可以看到已经为SYS_STUF3GLKIOP5F4B0BTTCFTMX0W生成了统计信息,这个统计就是多列统计(Multicolumns Statistics)或者列组统计(Column Group Statistics)
方法2:手动创建Column Group,手动创建Column Group后再通过DBMS_STATS.GATHER_TABLE_STATS收集统计
1SELECT DBMS_STATS.CREATE_EXTENDED_STATS(ownname=>'LHR',tabname=>'T_ES_20170601_LHR',extension=>'(c1,c2)') 2FROM DUAL; 3DBMS_STATS.CREATE_EXTENDED_STATS(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR',EXTENSION=>'(C1,C2)') 4------------------------------------------------------------------- 5SYS_STU3RTXGYOX7NS$MIUDXQDMQ0C 6EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR',METHOD_OPT=>'FOR COLUMNS SYS_STU3RTXGYOX7NS$MIUDXQDMQ0C SIZE SKEWONLY');)
或者一步到位,直接对C1、C2列执行统计信息收集,同时也会生成Column Group
1EXEC DBMS_STATS.gather_table_stats('LHR','T_ES_20170601_LHR',method_opt=>'for columns (c1,c2) size skewonly');
先来看看对于代表组合列(c1,c2)的SYS_STUF3GLKIOP5F4B0BTTCFTMX0W列在DBA_TAB_HISTOGRAM里的数据分布情况
1LHR@orclasm > COL COLUMN_NAME FORMAT A30 2LHR@orclasm > COL ENDPOINT_ACTUAL_VALUE FORMAT A50 3LHR@orclasm > SET LINESIZE 170 4LHR@orclasm > SET PAGESIZE 100 5LHR@orclasm > SELECT OWNER,TABLE_NAME,COLUMN_NAME,ENDPOINT_NUMBER,ENDPOINT_VALUE FROM DBA_TAB_HISTOGRAMS WHERE TABLE_NAME='T_ES_20170601_LHR' AND COLUMN_NAME='SYS_STUF3GLKIOP5F4B0BTTCFTMX0W'; 6 7OWNER TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE 8------------------------------ ------------------------------ ------------------------------ --------------- -------------- 9LHR T_ES_20170601_LHR SYS_STUF3GLKIOP5F4B0BTTCFTMX0W 1 716089956 10LHR T_ES_20170601_LHR SYS_STUF3GLKIOP5F4B0BTTCFTMX0W 5001 2693090364 11LHR T_ES_20170601_LHR SYS_STUF3GLKIOP5F4B0BTTCFTMX0W 5002 3718690277 12LHR T_ES_20170601_LHR SYS_STUF3GLKIOP5F4B0BTTCFTMX0W 10002 3926166024 13LHR T_ES_20170601_LHR SYS_STUF3GLKIOP5F4B0BTTCFTMX0W 10003 5232674306 14LHR T_ES_20170601_LHR SYS_STUF3GLKIOP5F4B0BTTCFTMX0W 15003 5561960012 15LHR T_ES_20170601_LHR SYS_STUF3GLKIOP5F4B0BTTCFTMX0W 20003 5832235708 16LHR T_ES_20170601_LHR SYS_STUF3GLKIOP5F4B0BTTCFTMX0W 20004 6322890850 17 188 rows selected.
预测一下有了基于(c1、c2)的Column Groups后,SQL语句“SELECT * FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA';”的Cardinality返回值会变成:
Cardinality=NUM_ROWS*5000/20004=20004*5000/20004=5000
生成了Column Group Statistics之后再次执行一开始的那句SQL:“SELECT * FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA';”,看看是否能帮助优化器算出更精确的Cardinality:
1LHR@orclasm > EXPLAIN PLAN FOR SELECT COUNT(*) FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA'; 2Explained. 3LHR@orclasm > SET LINESIZE 150 4LHR@orclasm > SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY()); 5PLAN_TABLE_OUTPUT 6----------------------------------------------------------------------------------------------- 7Plan hash value: 3668985715 8---------------------------------------------------------------------------------------- 9| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 10---------------------------------------------------------------------------------------- 11| 0 | SELECT STATEMENT | | 1 | 6 | 27 (0)| 00:00:01 | 12| 1 | SORT AGGREGATE | | 1 | 6 | | | 13|* 2 | TABLE ACCESS FULL| T_ES_20170601_LHR | 5000 | 30000 | 27 (0)| 00:00:01 | 14---------------------------------------------------------------------------------------- 15Predicate Information (identified by operation id): 16--------------------------------------------------- 17 2 - filter("C1"=1 AND "C2"='AA')
总结:如果表中的数据倾斜度较大,那么收集直方图能最大程度的帮助优化器计算出准确的Cardinality,从而避免产生差的执行计划;再进一步,如果存在倾斜的多个列共同构成了Predicate里的等值连接且这些列间存在较强的列相关性的话,那么生成带有直方图的多列统计信息是一个上佳的选择,能够最大程度的帮助优化器准确预测出Cardinality。
& 说明:
有关多列统计信息的更多内容可以参考我的BLOG:http://blog.itpub.net/26736162/viewspace-2139297/
本文选自《Oracle程序员面试笔试宝典》,作者:小麦苗
---------------优质麦课------------
详细内容可以添加麦老师微信或QQ私聊。
About Me:小麦苗
● 本文作者:小麦苗,只专注于数据库的技术,更注重技术的运用
● 作者博客地址:http://blog.itpub.net/26736162/abstract/1/
● 本系列题目来源于作者的学习笔记,部分整理自网络,若有侵权或不当之处还请谅解
● 版权所有,欢迎分享本文,转载请保留出处
● QQ:646634621 QQ群:618766405
● 提供OCP、OCM和高可用部分最实用的技能培训
● 题目解答若有不当之处,还望各位朋友批评指正,共同进步
长按下图识别二维码或微信扫描下图二维码来关注小麦苗的微信公众号:xiaomaimiaolhr,学习最实用的数据库技术。