一.1 BLOG文档结构图
一.2 前言部分
一.2.1 导读和注意事项
各位技术爱好者,看完本文后,你可以掌握如下的技能,也可以学到一些其它你所不知道的知识,~O(∩_∩)O~:
① 等待事件 enq: PS - contention的解决办法
② 一般等待事件的解决办法
Tips:
① 若文章代码格式有错乱,推荐使用QQ或360浏览器,也可以下载pdf格式的文档来查看,pdf文档下载地址:http://yunpan.cn/cdEQedhCs2kFz(提取码:ed9b)
② 本篇BLOG中代码部分需要特别关注的地方我都用黄色背景和红色字体来表示,比如下边的例子中,thread 1的最大归档日志号为33,thread 2的最大归档日志号为43是需要特别关注的地方,命令一般使用粉红颜色标注,注释一般采用蓝色字体表示。
List of Archived Logs in backup set 11
Thrd Seq Low SCN Low Time Next SCN Next Time
---- ------- ---------- ------------------- ---------- ---------
1 32 1621589 2015-05-29 11:09:52 1625242 2015-05-29 11:15:48
1 33 1625242 2015-05-29 11:15:48 1625293 2015-05-29 11:15:58
2 42 1613951 2015-05-29 10:41:18 1625245 2015-05-29 11:15:49
2 43 1625245 2015-05-29 11:15:49 1625253 2015-05-29 11:15:53
[ZFXDESKDB1:root]:/>lsvg -o
T_XDESK_APP1_vg
rootvg
[ZFXDESKDB1:root]:/>
[ZFXDESKDB1:root]:/>lsvg rootvg
====》2097152*512/1024/1024/1024=1G
本文如有错误或不完善的地方请大家多多指正,ITPUB留言或QQ皆可,您的批评指正是我写作的最大动力。
一.2.2 相关参考文章链接
一.2.3 本文简介
一.3 相关知识点扫盲
一.4 故障分析及解决过程
一.4.1 故障环境介绍
项目 |
source db |
db 类型 |
RAC |
db version |
10.2.0.4.0 |
db 存储 |
RAW |
ORACLE_SID |
XXX |
db_name |
XXX |
主机IP地址: |
XXX |
OS版本及kernel版本 |
AIX 5.3.0.0 |
OS hostname |
XXX |
一.4.2 故障发生现象及报错信息
开发人员反馈数据库很慢,让帮忙查查原因,那首当其冲的就是看主机的情况了,主机是AIX系统,采用TOPAS查看主机的情况,如下图,从图中可以看出的确有一个oracle的进程非常占用CPU资源:
一.4.3 故障分析及解决过程
根据os的进程号到数据库中查看相关的会话:
SELECT a.INST_ID, a.SQL_ID, a.EVENT, a.PREV_SQL_ID, a.STATUS,a.USERNAME,a.OSUSER
FROM gv$session a, gv$process b
WHERE a.PADDR = b.ADDR
and b.SPID = 3109012;
可以看到该会话的等待事件是enq: PS - contention,并且有相关的SQL和OSUSER,可以联系到当时的开发人员,据说已经跑了1个小时了,我们先来看看具体的sql内容:
SELECT *
FROM gv$sqlarea a
WHERE a.SQL_ID = 'cg7q9tn7u5vyx'
and a.INST_ID = 1;
SQL文本copy出来:
SELECT t.*, s.sid, s.serial#, s.machine, s.program, s.osuser
FROM (SELECT b.INST_ID,
c.USERNAME,
a.event,
to_char(a.cnt) AS seconds,
a.sql_id,
dbms_lob.substr(b.sql_fulltext, 100, 1) sqltext
FROM (SELECT rownum rn, t.*
FROM (SELECT s.INST_ID,
decode(s.session_state,
'WAITING',
s.event,
'Cpu + Wait For Cpu') Event,
s.sql_id,
s.user_id,
COUNT(*) CNT
FROM gv$active_session_history s
WHERE sample_time > SYSDATE - 30 / 1440
GROUP BY INST_ID,
s.user_id,
decode(s.session_state,
'WAITING',
s.event,
'Cpu + Wait For Cpu'),
s.sql_id
ORDER BY CNT DESC) t
WHERE rownum < 20) a,
gv$sqlarea b,
dba_users c
WHERE a.sql_id = b.sql_id
AND a.user_id = c.user_id
AND a.INST_ID = b.INST_ID
ORDER BY CNT DESC) t,
gv$session s
WHERE t.sql_id = s.sql_id(+)
AND t.INST_ID = s.INST_ID(+)
ORDER BY t.INST_ID
从文本中可以看出该sql查询的是数据字典,估计是从网上copy过来的,以哥多年的开发经验瞅了一眼就发现一个特殊的地方dbms_lob.substr(b.sql_fulltext, 100, 1),
这类clob类型的都比较耗费资源,因为比较忙就不深入的分析了,简单看了下把该句修改为b.SQL_TEXT,满足要求即可,没有必要去查询clob。
简单修改后:
SELECT t.*, s.sid, s.serial#, s.machine, s.program, s.osuser
FROM (SELECT b.INST_ID,
c.USERNAME,
a.event,
to_char(a.cnt) AS seconds,
a.sql_id,
--dbms_lob.substr(b.sql_fulltext, 100, 1) sqltext ,
b.SQL_TEXT
FROM (SELECT rownum rn, t.*
FROM (SELECT s.INST_ID,
decode(s.session_state,
'WAITING',
s.event,
'Cpu + Wait For Cpu') Event,
s.sql_id,
s.user_id,
COUNT(*) CNT
FROM gv$active_session_history s
WHERE sample_time > SYSDATE - 30 / 1440
GROUP BY INST_ID,
s.user_id,
decode(s.session_state,
'WAITING',
s.event,
'Cpu + Wait For Cpu'),
s.sql_id
ORDER BY CNT DESC) t
WHERE rownum < 20) a,
gv$sqlarea b,
dba_users c
WHERE a.sql_id = b.sql_id
AND a.user_id = c.user_id
AND a.INST_ID = b.INST_ID
ORDER BY CNT DESC) t,
gv$session s
WHERE t.sql_id = s.sql_id(+)
AND t.INST_ID = s.INST_ID(+)
ORDER BY t.INST_ID;
执行一下:
效率还是可以的,从之前的1个小时没有跑出来到现在的6秒,还是很不错的,主要是需要找出SQL中的瓶颈部分,这个就需要经验和多读书、多看报。少吃零食多睡觉了。^_^
下来问了下开发人员说可以停掉的,那我就kill掉了,kill掉后主机的情况如下:
一.4.3.1 metalink解释
参考:Metalink: Bug 5476091
Description
If a session is waiting on a mutex wait (eg: 'cursor: pin X')
then interrupts to the session are ignored.
eg: Ctrl-C does not have any effect.
This issue can show up as a deadlock in a Parallel Query
between the QC (Query coordinator) and one of its slaves
with the QC waiting on "enq: PS - contention" deadlocked
against the slave holding the requested PS enqueue.
Bug 5476091 - Ctrl-C ignored for sessions waiting for mutexes / Deadlock with "enq: PS" - superceded (文档 ID 5476091.8)
Bug 5476091 Ctrl-C ignored for sessions waiting for mutexes / Deadlock with "enq: PS" - superceded
This note gives a brief overview of bug 5476091.
The content was last updated on: 21-JUL-2015
Click here for details of each of the sections below.
Affects:
Product (Component) |
Oracle Server (Rdbms) |
Range of versions believed to be affected |
Versions BELOW 11.2 |
Versions confirmed as being affected |
· 10.2.0.4 · 10.2.0.3 |
Platforms affected |
Generic (all / most platforms affected) |
Note that this fix has been superseded by the fix in Bug:10214450
Fixed:
This fix has been superseded - please see the fixed version information for Bug:10214450 . The box below only shows versions where the code change/s for 5476091 are first included - those versions may not contain the later improved fix.
The fix for 5476091 is first included in |
· 11.2.0.1 (Base Release) · 11.1.0.7 (Server Patch Set) · 10.2.0.5 (Server Patch Set) |
Symptoms: |
Related To: |
· Deadlock · Hang (Process Hang) · Mutex Contention · Waits for "cursor: pin X" · Waits for "enq: PS - contention" |
· Parallel Query (PQO) |
Description
If a session is waiting on a mutex wait (eg: 'cursor: pin X')
then interrupts to the session are ignored.
eg: Ctrl-C does not have any effect.
This issue can show up as a deadlock in a Parallel Query
between the QC (Query coordinator) and one of its slaves
with the QC waiting on "enq: PS - contention" deadlocked
against the slave holding the requested PS enqueue.
Note:
This fix is superceded by the fix in bug 10214450
Please note: The above is a summary description only. Actual symptoms can vary. Matching to any symptoms here does not confirm that you are encountering this problem. For questions about this bug please consult Oracle Support. |
References
Bug:5476091 (This link will only work for PUBLISHED bugs)
Note:245840.1 Information on the sections in this article
一.5 故障处理总结
到此所有的处理算是基本完毕,过程很简单,但是不同的场景处理方式有很多种,我们应该学会灵活变通。