rac数据库默认sql tuning advisor,导致大量library cache lock

rac数据库默认sql tuning advisor,导致大量library cache lock

问题现象:
客户反映周六周日固定十点钟,一个程序会特别慢(大概10分钟),平时1到2秒。
查看当时的日志发现:
DBMS_STATS: GATHER_STATS_JOB encountered errors.  Check the trace file.
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_j002_51847.trc:
ORA-04021: timeout occurred while waiting to lock object 
查看trc文件发现:
ORA-04021: timeout occurred while waiting to lock object 
*** 2015-09-20 10:17:51.129
DBMS_STATS: GATHER_STATS_JOB: GATHER_TABLE_STATS('"CUSTOMERUSER"','"CUSTOM_TABLE"','"CUSTOMER_201509"', ...)
DBMS_STATS: ORA-04021: timeout occurred while waiting to lock object

工程师根据经验判断是锁导致,一下是解决方法和分析过程:
根据分析分析,该问题的原因是Automatic SQL Tuning Advisor Job在尝试调优 
一个较为复杂的SQL "6ru8vx9kjbv1k" 时,花了太多的时间在CPU上,并且它以S模式持有了 
CUSTOMERUSER.CUSTOM_TABLE上的library cache lock,而另外一个收集统计信息的job需要以X模式来 
获取CUSTOMERUSER.CUSTOM_TABLE上的library cache lock,S和X不兼容导致阻塞,直到收集统计信息的job超时 
在此期间,所有的需要在CUSTOMERUSER.CUSTOM_TABLE的hard parse由于都需要以S模式申请表上的library cache lock 
而被收集统计信息的那个job阻塞,因此您的数据库会出现大量的library cache lock而拖慢了应用。

解决办法是关闭Automatic SQL Tuning Advisor Job,这不会对您的数据库产生任何性能影响,因为 
Automatic SQL Tuning Advisor Job只是给出优化建议,并没有执行任何实质性的操作,除非您接受优化建议。      
            
 
 
To implement the solution, please execute the following steps:

Disable the Automatic SQL Tuning Advisor Job, for example:

connect / as sysdba

BEGIN 
DBMS_AUTO_TASK_ADMIN.DISABLE( 
client_name => 'sql tuning advisor', 
operation => NULL, 
window_name => NULL); 
END; 
/

If you need to re-enable in future:

connect / as sysdba

BEGIN 
DBMS_AUTO_TASK_ADMIN.ENABLE( 
client_name => 'sql tuning advisor', 
operation => NULL, 
window_name => NULL); 
END; 
/

Note: Disable Automatic SQL Tuning Advisor Job has NO performance impacts on the database.

Thus the Automatic SQL Tuning Advisor Job will not ON CPU for a long time, 
the GATHER_STATS_JOB which need to request S mode library cache lock 
will not be blocked, thus the issue will not happen.

The following justifies how the issue is related to this issue:

1. An Automatic SQL Tuning Advisor Job (Instance 1, J001, OS id=1864, pid: 75, sid: 2178, sess_ser: 58349) 
is tuning SQL "6ru8vx9kjbv1k" and ON CPU for too long time, it held the library cache lock on table object CUSTOMERUSER.CUSTOM_TABLE 
in S mode.

O/S info: user: oracle, term: UNKNOWN, ospid: 1864 
OSD pid info: Unix process pid: 1864, image: oracle@rac1 (J001) 
(session) sid: 2178 ser: 58349 trans: (nil), creator: 0x127112e890 
edition#: 100 oct: 3, prv: 0, sql: 0x131ce187b0, psql: 0x131c16a008, user: 85/CUSTOMERUSER

Current Wait Stack: 
Not in wait; last wait ended 18 min 58 sec ago

LibraryObjectPin: Address=0x12ff8bd270 Handle=0x130db54d18 Mode=S Lock=0x12fd5fa510 ..... <<< hold in S mode 
LibraryHandle: Address=0x130db54d18 Hash=37d0ddf6 LockMode=S PinMode=S LoadLockMode=0 Status=VALD 
ObjectName: Name=CUSTOMERUSER.CUSTOM_TABLE <<<< 
FullHashValue=edc24808d2cf3f125643190137d0ddf6 Namespace=TABLE/PROCEDURE(01) Type=TABLE(02) Identifier=87584 OwnerIdn=85

Current SQL:

LibraryHandle: Address=0x131ce187b0 Hash=6515ec32 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD 
ObjectName: Name=SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring optimizer_features_enable(default) opt_param('parallel_execution_enabled', 'false') result_cache */ COUNT(C1) FROM (SELECT /*+ qb_name("innerQuery") NO_INDEX_FFS( "POINT") */ 1 AS C1 FROM "MONITOR_POINT" "POINT" WHERE ("POINT"."POINTNAME"=U'\6D77\6EE8\8DEF-\6606\660E\8DEF')) innerQuery

2. And then the GATHER_STATS_JOB(Instance 1, J000, OS id=1862, pid: 66, sid: 1920, sess_ser: 19755) 
attempted to gather table stats on table CUSTOMERUSER.CUSTOM_TABLE and want to acquire the library cache lock(CUSTOMERUSER.CUSTOM_TABLE) 
in X mode, so it was blocked by the Automatic SQL Tuning Advisor Job until time out (ORA-04021: timeout occurred while waiting to lock object)

------------------------------------------------------------------------------- 
Chain 1: 
------------------------------------------------------------------------------- 
Oracle session identified by: 

instance: 1 (orcl.orcl1) 
os id: 1862 >>>>> DBMS_STATS: GATHER_STATS_JOB: GATHER_TABLE_STATS('"CUSTOMERUSER"','"CUSTOM_TABLE"','"CUSTOMER_201509"', ...) 
process id: 66, oracle@rac1 (J000) 
session id: 1920 
session serial #: 19755 

is waiting for 'library cache lock' with wait info: 

p1: 'handle address'=0x130db54d18 
p2: 'lock address'=0x133cd134c8 
p3: '100*mode+namespace'=0x1562000010003 <<< the last is 3 which is in X mode. 
time in wait: 4 min 49 sec 
... 

and is blocked by 
=> Oracle session identified by: 

instance: 1 (orcl.orcl1) 
os id: 1864 
process id: 75, oracle@rac1 (J001) 
session id: 2178 
session serial #: 58349 

which is not in a wait: 

last wait: 5 min 37 sec ago 
...... 
}

3. Any other sessions which need to acquire the library cache lock(CUSTOMERUSER.CUSTOM_TABLE) in S mode during hard parse will be blocked 
by the GATHER_STATS_JOB (because it request X mode library cache lock), thus the issue happened.

This is explained in the following note: 
WAITEVENT: "library cache lock" Reference Note ( Doc ID 34578.1 )

The root cause of this issue is:

An Automatic SQL Tuning Advisor Job (Instance 1, J001, OS id=1864, pid: 75, sid: 2178, sess_ser: 58349) 
is tuning SQL "6ru8vx9kjbv1k" and ON CPU for too long time, it held the library cache lock on table object CUSTOMERUSER.CUSTOM_TABLE 
in S mode.

And then the GATHER_STATS_JOB(Instance 1, J000, OS id=1862, pid: 66, sid: 1920, sess_ser: 19755) 
attempted to gather table stats on table CUSTOMERUSER.CUSTOM_TABLE and want to acquire the library cache lock(CUSTOMERUSER.CUSTOM_TABLE) 
in X mode, so it was blocked by the Automatic SQL Tuning Advisor Job.

Any other sessions which need to acquire the library cache lock(CUSTOMERUSER.CUSTOM_TABLE) in S mode during hard parse will be blocked 
by the GATHER_STATS_JOB (because it request X mode library cache lock), thus the issue happened.

-- Trace File --

FileName 
------------ 
orcl1_diag_10396_140.trc

FileComment 
-----------------

*** 2015-09-20 10:17:51.501 
Process diagnostic dump for oracle@rac1 (J001), OS id=1864, 
pid: 75, proc_ser: 123, sid: 2178, sess_ser: 58349 
------------------------------------------------------------------------------- 
os thread scheduling delay history: (sampling every 1.000000 secs) 
 0.000000 secs at [ 10:17:50 ] 
   NOTE: scheduling delay has not been sampled for 0.915705 secs  0.000000 secs from [ 10:17:46 - 10:17:51 ], 5 sec avg 
 0.000000 secs from [ 10:16:52 - 10:17:51 ], 1 min avg 
 0.000000 secs from [ 10:12:52 - 10:17:51 ], 5 min avg 
loadavg : 1.76 1.77 1.20 
Memory (Avail / Total) = 141536.16M / 258301.68M 
Swap (Avail / Total) = 64000.99M /  64000.99M 
F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD 
0 R oracle    1864     1 99  80   0 - 19929533 ?   10:01 ?        00:15:55 ora_j001_orcl1 
Short stack dump:

ksedsts <- ksdxfstk <- ksdxcb <- sspuser <- __sighandler 
       <- kpp_concatq <- qsmett_tokenize_text <- qsmett_tokenize_text <- qknrcAllocate <- qkadrv 
        <- opitca <- kksFullTypeCheck <- rpiswu2 <- kksLoadChild <- kxsGetRuntimeLock 
         <- kksfbc <- kkspsc0 <- kksParseCursor <- opiosq0 <- opiall0 
          <- opikpr <- opiodr <- rpidrus <- skgmstack <- rpiswu2 
           <- kprball <- qksdsExeStmt <- qksdsExecute <- kkoatVerifyEst <- kkeAdjSingTabCard 
            <- kkecdn <- kkotap <- kkoiqb <- kkooqb <- kkoqbc 
             <- apakkoqb <- apaqbdDescendents <- apaqbdList <- apaqbdDescendents <- apaqbdList 
              <- apaqbd <- kkqctCostTransfQB <- kkqctdrvCVM <- kkqvmTrMrg <- kkqvmdrv2 
               <- kkqctdrvTD <- kkqdrv <- kkqctdrvIT <- apadrv <- opitca 
                <- kksFullTypeCheck <- rpiswu2 <- kksLoadChild <- kxsGetRuntimeLock <- kksfbc 
                 <- kkspsc0 <- kksParseCursor <- opiosq0 <- kpooprx <- kpoal8 
                  <- opiodr <- kpoodrc <- rpiswu2 <- kpoodr <- upirtrc 
                   <- kpurcsc <- kpuexec <- OCIStmtExecute <- qksanExecSql <- qksanAnalyzeSql 
                    <- qksanAnalyzeSegSql <- kestsaDoRounds <- kestsaAutoTuneSql <- kestsaAutoTuneDrv <- kestsTuneSqlDrv 
                     <- kesaiExecAction <- kesaiTuneSqlDrv <- spefcifa <- spefmccallstd <- pextproc 
                      <- peftrusted <- psdexsp <- rpiswu2 <- psdextp <- pefccal 
                       <- pefcal <- pevm_FCAL <- pfrinstr_FCAL <- pfrrun_no_tool <- pfrrun 
                        <- plsql_run <- peicnt <- kkxexe <- opiexe <- kpoal8 
                         <- opiodr <- kpoodr <- upirtrc <- kpurcsc <- kpuexec 
                          <- OCIStmtExecute <- jslvec_execcb <- jslvswu <- jslve_execute0 <- jslve_execute 
                           <- rpiswu2 <- kkjex1e <- kkjsexe <- kkjrdp <- opirip 
                            <- opidrv <- sou2o <- opimai_real <- ssthrdmain <- main

SO: 0x127112e890, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3 
  proc=0x127112e890, name=process, file=ksu.h LINE:12721, pg=0 
 (process) Oracle pid:75, ser:123, calls cur/top: 0x11c582d260/0x11c5828d20 
           flags : (0x0) - 
           flags2: (0x10),  flags3: (0x10) 
           intr error: 0, call error: 0, sess error: 0, txn error 0 
           intr queue: empty 
   ksudlp FALSE at location: 0 
 (post info) last post received: 0 0 240 
             last post received-location: kqr.h LINE:2245 ID:kqrbtm 
             last process to post me: 0x13411b8788 2 6 
             last post sent: 0 0 35 
             last post sent-location: k sr2. h LINE:629 ID:ksrpublish 
             last process posted by me: 0x13411b8788 2 6 
   (latch info) wait_event=0 bits=0x0 
   Process Group: DEFAULT, pseudo proc: 0x1341292dd0 
   O/S info: user: oracle, term: UNKNOWN, ospid: 1864 
   OSD pid info: Unix process pid: 1864, image: oracle@rac1 (J001)

SO: 0x12b15ca2f8, type: 4, owner: 0x127112e890, flag: INIT/-/-/0x00 if: 0x3 c: 0x3 
    proc=0x127112e890, name=session, file=ksu.h LINE:12729, pg=0 
   (session) sid: 2178 ser: 58349 trans: (nil), creator: 0x127112e890 
             flags: (0x8010041) USR/- flags_idl: (0x1) BSY/-/-/-/-/- 
             flags2: (0x40009) -/-/INC 
             DID: , short-term DID: 
             txn branch: (nil) 
             edition#: 100              oct: 3, prv: 0, sql: 0x131ce187b0, psql: 0x131c16a008, user: 85/CUSTOMERUSER 
   ksuxds FALSE at location: 0 
   service name: SYS$USERS 
   client details: 
     O/S info: user: oracle, term: UNKNOWN, ospid: 1864 
     machine: rac1 program: oracle@rac1 (J001) 
     application name: DBMS_SCHEDULER, hash value=2478762354 
     action name: ORA$AT_SQ_SQL_SW_5829, hash value=3238241282 
   Current Wait Stack: 
     Not in wait; last wait ended 18 min 58 sec ago 
   There are 1 sessions blocked by this session. 
   Dumping one waiter: 
     inst: 1, sid: 1920, ser: 19755 
     wait event: 'library cache lock' 
       p1: 'handle address'=0x130db54d18 
       p2: 'lock address'=0x133cd134c8 
       p3: '100*mode+namespace'=0x1562000010003 
     row_wait_obj#: 4294967295, block#: 0, row#: 0, file# 0 
     min_blocked_time: 891 secs, waiter_cache_ver: 5529

LibraryObjectLock:  Address=0x12fcdf1338 Handle=0x131ce187b0 Mode=N CanBeBrokenCount=1 Incarnation=1 ExecutionCount=0

LibraryHandle:  Address=0x131ce187b0 Hash=6515ec32 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD 
       ObjectName:  Name=SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring optimizer_features_enable(default) opt_param('parallel_execution_enabled', 'false')  result_cache */ COUNT(C1) FROM (SELECT /*+ qb_name("innerQuery") NO_INDEX_FFS( "POINT")  */ 1 AS C1 FROM "MONITOR_POINT" "POINT" WHERE ("POINT"."POINTNAME"=U'\6D77\6EE8\8DEF-\6606\660E\8DEF')) innerQuery

NamespaceDump:   
         Parent Cursor:  sql_id=6ru8vx9kjbv1k parent=0x112a35bdb0 maxchild=1 plk=y ppn=n

LibraryObjectPin: Address=0x12ff8bd270 Handle=0x130db54d18 Mode=S Lock=0x12fd5fa510 ClusterLock=0x12a210d508 User=0x12b15ca2f8 Session=0x12b15ca2f8 Count=1 Mask=0741 Flags=[00] SavepointNum=0x4a8a 
LibraryHandle: Address=0x130db54d18 Hash=37d0ddf6 LockMode=S PinMode=S LoadLockMode=0 Status=VALD 
ObjectName: Name=CUSTOMERUSER.CUSTOM_TABLE 
FullHashValue=edc24808d2cf3f125643190137d0ddf6 Namespace=TABLE/PROCEDURE(01) Type=TABLE(02) Identifier=87584 OwnerIdn=85

-- Script Output --

FileName 
------------ 
m_ash.dmp

FileComment 
----------------- 
Get the holder session information: instance 1 session 2178 serial# 58349

select t.dbid, 
t.sample_id, 
t.sample_time, 
t.instance_number, 
t.session_id, 
t.sql_opcode, 
t.sql_id, 
t.top_level_sql_id, 
t.session_type, 
t.session_state, 
t.program, 
t.action 
from m_ash_11426874721 t 
where instance_number = 1 
and session_id =2178 
and session_serial# =58349 
order by dbid, instance_number, sample_time;

SAMPLE_ID SAMPLE_TIME INSTANCE_NUMBER SESSION_ID SQL_OPCODE SQL_ID TOP_LEVEL_SQL_ID SESSION_TYPE SESSION_STATE PROGRAM ACTION 
19169400 2015-09-20 10:02:06.492 1 2178 3 6ru8vx9kjbv1k 5zruc4v6y32f9 FOREGROUND ON CPU oracle@rac1 (J001) ORA$AT_SQ_SQL_SW_5829 
19169410 2015-09-20 10:02:16.532 1 2178 3 6ru8vx9kjbv1k 5zruc4v6y32f9 FOREGROUND ON CPU oracle@rac1 (J001) ORA$AT_SQ_SQL_SW_5829 
19169420 2015-09-20 10:02:26.562 1 2178 3 6ru8vx9kjbv1k 5zruc4v6y32f9 FOREGROUND ON CPU oracle@rac1 (J001) ORA$AT_SQ_SQL_SW_5829 
19169430 2015-09-20 10:02:36.602 1 2178 3 6ru8vx9kjbv1k 5zruc4v6y32f9 FOREGROUND ON CPU oracle@rac1 (J001) ORA$AT_SQ_SQL_SW_5829 
19169440 2015-09-20 10:02:46.642 1 2178 3 6ru8vx9kjbv1k 5zruc4v6y32f9 FOREGROUND ON CPU oracle@rac1 (J001) ORA$AT_SQ_SQL_SW_5829 
19169450 2015-09-20 10:02:56.662 1 2178 3 6ru8vx9kjbv1k 5zruc4v6y32f9 FOREGROUND ON CPU oracle@rac1 (J001) ORA$AT_SQ_SQL_SW_5829

19172820 2015-09-20 10:59:18.029 1 2178 3 6ru8vx9kjbv1k 5zruc4v6y32f9 FOREGROUND ON CPU oracle@rac1 (J001) ORA$AT_SQ_SQL_SW_5829 
19172830 2015-09-20 10:59:28.049 1 2178 3 6ru8vx9kjbv1k 5zruc4v6y32f9 FOREGROUND ON CPU oracle@rac1 (J001) ORA$AT_SQ_SQL_SW_5829 
19172840 2015-09-20 10:59:38.089 1 2178 3 6ru8vx9kjbv1k 5zruc4v6y32f9 FOREGROUND ON CPU oracle@rac1 (J001) ORA$AT_SQ_SQL_SW_5829 
19172850 2015-09-20 10:59:48.119 1 2178 3 6ru8vx9kjbv1k 5zruc4v6y32f9 FOREGROUND ON CPU oracle@rac1 (J001) ORA$AT_SQ_SQL_SW_5829 
19172860 2015-09-20 10:59:58.149 1 2178 3 6ru8vx9kjbv1k 5zruc4v6y32f9 FOREGROUND ON CPU oracle@rac1 (J001) ORA$AT_SQ_SQL_SW_5829

Summary 
======= 
The holder is instance 1 session 2178 serial# 58349, it is running SQL Tuning Advisor on '6ru8vx9kjbv1k' and hang on CPU.

-- Trace File --

FileName 
------------ 
orcl1_dia0_10404_315.trc

FileComment 
----------------- 
*** 2015-09-20 10:04:49.209 
Verified Hangs in the System 
Root Chain Total Hang 
Hang Hang Inst Root #hung #hung Hang Hang Resolution 
ID Type Status Num Sess Sess Sess Conf Span Action 
----- ---- -------- ---- ----- ----- ----- ------ ------ ------------------- 
385 HANG VICSELTD 1 2178 2 2 HIGH LOCAL IGNRD:LongOp:Parse 
Hang Ignored Reason: Automatic hang resolution is not performed on a session 
that is parsing SQL.

inst# SessId Ser# OSPID PrcNm Event 
----- ------ ----- --------- ----- ----- 
1 1920 19755 1862 J000 library cache lock 
1 2178 58349 1864 J001 not in wait

------------------------------------------------------------------------------- 
Chain 1: 
------------------------------------------------------------------------------- 
Oracle session identified by: 

instance: 1 (orcl.orcl1) 
os id: 1862 >>>>> DBMS_STATS: GATHER_STATS_JOB: GATHER_TABLE_STATS('"CUSTOMERUSER"','"CUSTOM_TABLE"','"CUSTOMER_201509"', ...) 
process id: 66, oracle@rac1 (J000) 
session id: 1920 
session serial #: 19755 

is waiting for 'library cache lock' with wait info: 

p1: 'handle address'=0x130db54d18 
p2: 'lock address'=0x133cd134c8 
p3: '100*mode+namespace'=0x1562000010003 
time in wait: 4 min 49 sec 
timeout after: 10 min 10 sec 
wait id: 24853 
blocking: 0 sessions 
wait history: 
* time between current wait and wait #1: 0.001207 sec 
1. event: 'enq: IV - contention' 
time waited: 0.000616 sec 
wait id: 24852 p1: 'type|mode'=0x49560005 
p2: 'id1'=0x53594e43 
p3: 'id2'=0xf 
* time between wait #1 and #2: 0.000107 sec 
2. event: 'enq: IV - contention' 
time waited: 0.001168 sec 
wait id: 24851 p1: 'type|mode'=0x49560005 
p2: 'id1'=0x4c4f434b 
p3: 'id2'=0xf 
* time between wait #2 and #3: 0.009759 sec 
3. event: 'row cache lock' 
time waited: 0.000081 sec 
wait id: 24850 p1: 'cache id'=0x8 
p2: 'mode'=0x0 
p3: 'request'=0x5 

and is blocked by 
=> Oracle session identified by: 

instance: 1 (orcl.orcl1) 
os id: 1864 
process id: 75, oracle@rac1 (J001) 
session id: 2178 
session serial #: 58349 

which is not in a wait: 

last wait: 5 min 37 sec ago 
blocking: 1 session 
wait history: 
1. event: 'gc cr multi block request' 
time waited: 0.000884 sec 
wait id: 9858 p1: 'file#'=0x7e 
p2: 'block#'=0x6426 
p3: 'class#'=0x1 
* time between wait #1 and #2: 0.000335 sec 
2. event: 'gc cr multi block request' 
time waited: 0.001178 sec 
wait id: 9857 p1: 'file#'=0x7e 
p2: 'block#'=0x6422 
p3: 'class#'=0x1 
* time between wait #2 and #3: 0.000343 sec 
3. event: 'gc cr block 2-way' 
time waited: 0.000560 sec 
wait id: 9856 p1: ''=0x7e 
p2: ''=0x5726 
p3: ''=0x1 
}

Chain 1 Signature: <='library cache lock' 
Chain 1 Signature Hash: 0x24734cf

上一篇:library cache lock和cursor: pin S wait on X等待


下一篇:python中的GIL(全局解释锁)多线程能够提升效率