坚持真实者的最大问题在于,他们是基于过去来定义真实性,因此他们认为改变就意味着失去了真实性。
作为dba,在使用oracle数据库过程中,因版本问题,难免会遇到各种bug。部分bug对于数据库影响较小,使用过程中可以忽略,但是有些bug就可能导致数据库宕机,不得不进行处理。
1,基础环境
操作系统:AIX
数据库:三节点rac,版本11.2.0.1
2,问题
数据库运行中突然宕机,后台提示:ORA-00600: 内部错误代码, 参数: [kjbrref:pkey], [1504744], [4], [259923], [0], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [kjbrref:pkey], [1504744], [4], [259923], [0], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/ffprod/ffprod1/incident/incdir_920129/ffprod1_lms3_54235_i920129.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Jul 24 02:02:06 2019
Errors in file /u01/app/oracle/diag/rdbms/ffprod/ffprod1/trace/ffprod1_lms1_54227.trc (incident=920113):
ORA-00600: 内部错误代码, 参数: [kjbrref:pkey], [1506092], [4], [259923], [0], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/ffprod/ffprod1/incident/incdir_920113/ffprod1_lms1_54227_i920113.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Jul 24 02:02:06 2019
Errors in file /u01/app/oracle/diag/rdbms/ffprod/ffprod1/trace/ffprod1_lms2_54231.trc (incident=920121):
ORA-00600: 内部错误代码, 参数: [kjbrref:pkey], [1507650], [4], [259923], [0], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/ffprod/ffprod1/incident/incdir_920121/ffprod1_lms2_54231_i920121.trc
Wed Jul 24 02:02:06 2019
Errors in file /u01/app/oracle/diag/rdbms/ffprod/ffprod1/trace/ffprod1_lms0_54223.trc (incident=920105):
ORA-00600: 内部错误代码, 参数: [kjbrref:pkey], [1506697], [4], [259923], [0], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Incident details in: /u01/app/oracle/diag/rdbms/ffprod/ffprod1/incident/incdir_920105/ffprod1_lms0_54223_i920105.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Jul 24 02:02:07 2019
Dumping diagnostic data in directory=[cdmp_20190724020207], requested by (instance=1, osid=54227 (LMS1)), summary=[incident=920113].
Wed Jul 24 02:02:08 2019
Sweep [inc][920129]: completed
Sweep [inc][920121]: completed
Sweep [inc][920113]: completed
Sweep [inc][920105]: completed
Sweep [inc2][920129]: completed
Sweep [inc2][920121]: completed
Sweep [inc2][920113]: completed
Sweep [inc2][920105]: completed
Errors in file /u01/app/oracle/diag/rdbms/ffprod/ffprod1/trace/ffprod1_lms1_54227.trc:
ORA-00600: 内部错误代码, 参数: [kjbrref:pkey], [1506092], [4], [259923], [0], [], [], [], [], [], [], []
System state dump requested by (instance=1, osid=54227 (LMS1)), summary=[abnormal instance termination].
LMS1 (ospid: 54227): terminating the instance due to error 484
System State dumped to trace file /u01/app/oracle/diag/rdbms/ffprod/ffprod1/trace/ffprod1_diag_54209.trc
Wed Jul 24 02:02:13 2019
ORA-1092 : opitsk aborting process
Wed Jul 24 02:02:16 2019
ORA-1092 : opitsk aborting process
Wed Jul 24 02:02:17 2019
ORA-1092 : opitsk aborting process
Wed Jul 24 02:02:17 2019
License high water mark = 328
Wed Jul 24 02:02:22 2019
Termination issued to instance processes. Waiting for the processes to exit
Instance termination failed to kill one or more processes
Instance terminated by LMS1, pid = 54227
Wed Jul 24 02:02:32 2019
USER (ospid: 160588): terminating the instance
Termination issued to instance processes. Waiting for the processes to exit
Wed Jul 24 02:02:42 2019
Instance termination failed to kill one or more processes
Instance terminated by USER, pid = 160588
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1
System name: Linux
Node name: ffrac1
Release: 2.6.32-431.el6.x86_64
Version: #1 SMP Fri Nov 22 03:15:09 UTC 2013
Machine: x86_64
Instance name: ffprod1
Redo thread mounted by this instance: 1
Oracle process number: 13
Unix process pid: 54223, image: oracle@ffrac1 (LMS0)
*** 2019-07-24 02:02:06.415
*** SESSION ID:(1236.1) 2019-07-24 02:02:06.415
*** CLIENT ID:() 2019-07-24 02:02:06.415
*** SERVICE NAME:(SYS$BACKGROUND) 2019-07-24 02:02:06.415
*** MODULE NAME:() 2019-07-24 02:02:06.415
*** ACTION NAME:() 2019-07-24 02:02:06.415
Dump continued from file: /u01/app/oracle/diag/rdbms/ffprod/ffprod1/trace/ffprod1_lms0_54223.trc
ORA-00600: 内部错误代码, 参数: [kjbrref:pkey], [1506697], [4], [259923], [0], [], [], [], [], [], [], []
========= Dump for incident 920105 (ORA 600 [kjbrref:pkey]) ========
----- Beginning of Customized Incident Dump(s) -----
GCS SHADOW 0x8bfd29b30,101 resp[(nil),0x16fd89.4] pkey 259923.0
grant 1 cvt 0 mdrole 0x1 st 0x100 lst 0x40 GRANTQ rl LOCAL
master 1 owner 1 sid 0 remote[(nil),0] hist 0x1163408b1a04319
history 0x19.0x6.0x1.0xd.0xb.0x1.0xd.0xb.0x1.0x0.
cflag 0x0 sender 0 flags 0x0 replay# 0 abast (nil).x0.1 dbmap (nil)
disk: 0x0000.00000000 write request: 0x0000.00000000
pi scn: 0x0000.00000000 sq[0x8bfd29b30,0x8bfd29b30]
msgseq 0x0 updseq 0x0 reqids[124,0,0] infop (nil) lockseq x4d1
pkey 259923.0 undo 0 stat 5 masters[32768, 2->32768] reminc 104 RM# 8270
flg x6 type x0 afftime xfd9470ea, acquire time 5615459
nreplays by lms 0 = 224
nreplays by lms 1 = 219
nreplays by lms 2 = 113
nreplays by lms 3 = 233
benefit 0, total 0, remote 0, cr benefit 0, cr total 0, cr remote 0
hv 1 [stat 0x0, 1->1, wm 32768, RMno 0, reminc 102, dom 0]
kjga st 0x4, step 0.35.0, cinc 104, rmno 8270, flags 0x20
lb 0, hb 65535, myb 16305, drmb 16305, apifrz 1
GCS SHADOW END
GLOBAL CACHE ELEMENT DUMP (address: 0x8bfd29ab8):
id1: 0x16fd89 id2: 0x4 pkey: OBJ#259923 block: (4/1506697)
lock: S rls: 0x0 acq: 0x0 latch: 0
flags: 0x20 fair: 0 recovery: 0 fpin: 'kdswh11: kdst_fetch'
bscn: 0x0.0 bctx: (nil) write: 0 scan: 0xd0032e4
lcp: (nil) lnk: [NULL] lch: [0x92f3f9df0,0x92f3f9df0]
seq: 869 hist: 66 144:0 7 352 329 144:6 14 7 352 329
LIST OF BUFFERS LINKED TO THIS GLOBAL CACHE ELEMENT:
flg: 0x00080000 state: SCURRENT tsn: 4 tsh: 2
addr: 0x92f3f9cc0 obj: 259923 cls: DATA bscn: 0xec7.b6036570
BH (0x92f3f9cc0) file#: 4 rdba: 0x0116fd89 (4/1506697) class: 1 ba: 0x92a6f8000
set: 166 pool: 3 bsz: 8192 bsi: 0 sflg: 1 pwc: 0,25
dbwrid: 5 obj: 259923 objn: 110091 tsn: 4 afn: 4 hint: f
hash: [0x64ee27fa0,0x12c50cd458] lru: [0xb2f6531f8,0x3df4d6fe0]
ckptq: [NULL] fileq: [NULL] objq: [0xb2f653220,0x3df4d7008] objaq: [0xb2f653230,0x3df4d7018]
st: SCURRENT md: NULL fpin: 'kdswh11: kdst_fetch' tch: 2 le: 0x8bfd29ab8
flags: only_sequential_access
LRBA: [0x0.0.0] LSCN: [0x0.0] HSCN: [0xffff.ffffffff] HSUB: [65535]
buffer tsn: 4 rdba: 0x0116fd89 (4/1506697)
scn: 0x0ec7.b6036570 seq: 0x01 flg: 0x06 tail: 0x65700601
frmt: 0x02 chkval: 0x57d6 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x000000092A6F8000 to 0x000000092A6FA000
92A6F8000 0000A206 0116FD89 B6036570 06010EC7 [........pe......]
92A6F8010 000057D6 00000001 0003F753 B6034EC7 [.W......S....N..]
92A6F8020 00000EC7 00320002 0116FC01 00160013 [......2.........]
92A6F8030 0002931B 014E9FE1 001EB0B6 00002026 [......N.....& ..]
92A6F8040 B6036570 00000000 00000000 00000000 [pe..............]
92A6F8050 00000000 00000000 00000000 00000000 [................]
92A6F8060 00000000 00260100 005EFFFF 036303C1 [......&...^...c.]
92A6F8070 00000363 15480026 16AF15F5 181A1760 [c...&.H.....`...]
92A6F8080 19A418E0 1B161A60 1C821BCA 1E1F1D51 [....`.......Q...]
92A6F8090 03C11EDF 0530047B 06C105FD 0844076F [....{.0.....o.D.]
92A6F80A0 09A808FB 0B240A6D 0C970BDD 0E240D58 [....m.$.....X.$.]
92A6F80B0 0F870ED6 1108104A 127311BF 13D41323 [....J.....s.#...]
92A6F80C0 00001487 00000000 00000000 00000000 [................]
92A6F80D0 00000000 00000000 00000000 00000000 [................]
3,分析
该问题是由于bug导致lms进程出现问题,导致数据库宕机。通过查看oracle metalink相关资料,解决思路为:
a,建议打相应patch补丁,该问题在11.2.0.3.1PSU中被fixed
b,暂时解决办法:修改隐藏参数_gc_read_mostly_locking
4,解决
因生产系统,我通过修改隐藏参数解决问题:
SQL> alter system set "_gc_read_mostly_locking"=FALSE scope=spfile;
欢迎大家关注以下公众号进行数据库方面知识探讨: