[20170104]ORA-08103错误的模拟和诊断.txt

[20170104]ORA-08103错误的模拟和诊断.txt

参考连接http://www.oratea.com/2016/12/10/ora-08103%e9%94%99%e8%af%af%e7%9a%84%e6%a8%a1%e6%8b%9f%e5%92%8c%e8%af%8a%e6%96%ad/,重复测试:

$ oerr ora 8103
08103, 00000, "object no longer exists"
// *Cause:  The object has been deleted by another user since the operation
//          began, or a prior incomplete recovery restored the database to
//          a point in time during the deletion of the object.
// *Action: Delete the object if this is the result of an incomplete
//          recovery.

在 Master Note for Handling Oracle Database Corruption Issues (文档 ID 1088018.1)中对ORA-8103错误的描述如下:

The object has been deleted by another user since the operation began.
If the error is reproducible, following may be the reasons:-
a.) The header block has an invalid block type.
b.) The data_object_id (seg/obj) stored in the block is different than the data_object_id stored in the segment header.

See dba_objects.data_object_id and compare it to the decimal value stored in the block (field seg/obj).

文档解释为:ORA-8103错误如果只发生1次,那表明该会话操作前,对象被其它会话删除了。
如果ORA-8103错误重复发生,可能有两个原因:

a) 段头是坏块
b) 数据块上存储的data_object_id和段头上存储的data_object_id不一致。

1.环境:
SCOTT@book> @ &r/ver1
PORT_STRING                    VERSION        BANNER
------------------------------ -------------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx            11.2.0.4.0     Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

SCOTT@book> create table t1 tablespace users as select * from all_objects;
Table created.

SCOTT@book> select object_id,data_object_id from dba_objects where object_name = 'T1' and owner = user;
OBJECT_ID DATA_OBJECT_ID
---------- --------------
     89097          89097

--89097 = 0x15c09

SCOTT@book> select segment_type, HEADER_FILE, HEADER_BLOCK from dba_segments where owner = user and segment_name = 'T1';
SEGMENT_TYPE       HEADER_FILE HEADER_BLOCK
------------------ ----------- ------------
TABLE                        4          522

SCOTT@book> select FILE_ID, block_id, blocks from dba_extents where owner = user and segment_name = 'T1';
   FILE_ID   BLOCK_ID     BLOCKS
---------- ---------- ----------
         4        520          8
         4        528          8
         4        536          8
         4        544          8
         4        552          8
         4        560          8
         4        568          8
         4        576          8
         4        584          8
         4        592          8
         4        600          8
         4        608          8
         4        616          8
         4        624          8
         4        632          8
         4        640          8
         4        768        128
         4        896        128
         4       1024        128
         4       1152        128
         4       1280        128
         4       1408        128
         4       1536        128
         4       1664        128
         4       1792        128

25 rows selected.

2.模拟段头损坏:
SCOTT@book> alter system checkpoint ;
System altered.

BBED> set dba 4,522
        DBA             0x0100020a (16777738 4,522)

BBED> dump /v count 32
File: /mnt/ramdisk/book/users01.dbf (4)
Block: 522                               Offsets:    0 to   31                            Dba:0x0100020a
-----------------------------------------------------------------------------------------------------------
23a20000 0a020001 cf113400 00000104 35f20000 00000000 00000000 00000000 l #.........4.....5...............
<32 bytes per line>

BBED> m /x 24a2 offset 0
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
File: /mnt/ramdisk/book/users01.dbf (4)
Block: 522                                                  Offsets:    0 to   31  Dba:0x0100020a
---------------------------------------------------------------------------------------------------
24a20000 0a020001 cf113400 00000104 35f20000 00000000 00000000 00000000
<64 bytes per line>

SCOTT@book> alter system flush buffer_cache;
System altered.

SCOTT@book> select count(*) from t1;
select count(*) from t1
                     *
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 4, block # 522)
ORA-01110: data file 4: '/mnt/ramdisk/book/users01.dbf'

--//如果段头为坏块,则读取直接报坏块。

3.修改段头上的data_object_id:
--//还原
BBED> m /x 23a2 offset 0
File: /mnt/ramdisk/book/users01.dbf (4)
Block: 522                                                  Offsets:    0 to   31  Dba:0x0100020a
---------------------------------------------------------------------------------------------------
23a20000 0a020001 cf113400 00000104 32f20000 00000000 00000000 00000000
<64 bytes per line>

BBED> sum apply
Check value for File 4, Block 522:
current = 0xf235, required = 0xf235

SCOTT@book> alter system flush buffer_cache;
System altered.

SCOTT@book> select count(*) from t1;
  COUNT(*)
----------
     84752

SCOTT@book> @ &r/10to16 89097
10 to 16 HEX   REVERSE16
-------------- ------------------
0000000015c09 0x095c0100

SCOTT@book> @ &r/bbvi 4 522
BVI_COMMAND
----------------------------------------------------
bvi -b 4276224 -s 8192 /mnt/ramdisk/book/users01.dbf

--//检索095c0100,09换成0a.
BBED> set dba 4,522
        DBA             0x0100020a (16777738 4,522)

BBED> sum apply
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
Check value for File 4, Block 522:
current = 0xf236, required = 0xf236

SCOTT@book> alter system flush buffer_cache;
System altered.

SCOTT@book> select count(*) from t1;
select count(*) from t1
*
ERROR at line 1:
ORA-08103: object no longer exists

--总结:段头上的data_object_id出现问题,导致ORA-08103错误发生

4.模拟数据块data_object_id与段头不一致:
--还原略.

select owner, object_name,
       dbms_rowid.rowid_relative_fno(rowid) as fno,
       dbms_rowid.rowid_block_number(rowid) as bno,
       dbms_rowid.rowid_row_number(rowid) as rno,
       to_char(dbms_utility.make_data_block_address(dbms_rowid.rowid_relative_fno(rowid),
               dbms_rowid.rowid_block_number(rowid)), 'xxxxxxxxxx') as dba
  7   from t1 where object_id = 7840;
OWNER  OBJECT_NAME                 FNO        BNO        RNO DBA
------ -------------------- ---------- ---------- ---------- -----------
SYS    V_$DIAG_EM_TARGET_IN          4        625         46     1000271
       FO

BBED> set dba 4,625
        DBA             0x01000271 (16777841 4,625)

BBED> p /d ktbbh.ktbbhsid
union ktbbhsid, 4 bytes                     @24
   ub4 ktbbhsg1                             @24       89097 
   ub4 ktbbhod1                             @24       89097

--查询相关文档,ktbbhsg1=>对象段号,ktbbhod1==>对象号.感觉不对,这个类型union.

BBED> assign ktbbh.ktbbhsid.ktbbhsg1=89098
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
ub4 ktbbhsg1                                @24       0x00015c0a

BBED> p /d ktbbh.ktbbhsid
union ktbbhsid, 4 bytes                     @24
   ub4 ktbbhsg1                             @24       89098
   ub4 ktbbhod1                             @24       89098

--噢,一起修改,也验证了前面的判断.

BBED> sum apply
Check value for File 4, Block 625:
current = 0xee92, required = 0xee92

SCOTT@book> alter system flush buffer_cache;
System altered.

SCOTT@book> select count(*) from t1;
select count(*) from t1
*
ERROR at line 1:
ORA-08103: object no longer exists

SCOTT@book> select count(*) from t1 where rownum<=10 ;
  COUNT(*)
----------
        10

--//问题来了如何判断.那个块存在问题.

$ dbv file=/mnt/ramdisk/book/users01.dbf
DBVERIFY: Release 11.2.0.4.0 - Production on Wed Jan 4 09:41:27 2017
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
DBVERIFY - Verification starting : FILE = /mnt/ramdisk/book/users01.dbf
DBVERIFY - Verification complete
Total Pages Examined         : 16000
Total Pages Processed (Data) : 11459
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 3585
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 690
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 266
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 3412431 (0.3412431)

RMAN> validate datafile 4;
Starting validate at 2017-01-04 09:42:19
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=35 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=46 device type=DISK
allocated channel: ORA_DISK_3
channel ORA_DISK_3: SID=56 device type=DISK
channel ORA_DISK_1: starting validation of datafile
channel ORA_DISK_1: specifying datafile(s) for validation
input datafile file number=00004 name=/mnt/ramdisk/book/users01.dbf
channel ORA_DISK_1: validation complete, elapsed time: 00:00:01
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
4    OK     0              266          16027           3412431
  File Name: /mnt/ramdisk/book/users01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              11459
  Index      0              3585
  Other      0              690
Finished validate at 2017-01-04 09:42:21

--这样检查根本看不出来那块存在问题.另外backup check logical database;也不用.

SCOTT@book> alter session set events '8103 trace name errorstack level 3';
Session altered.

--//感觉监测8103就足够了.

SCOTT@book> select count(*) from t1;
select count(*) from t1
*
ERROR at line 1:
ORA-08103: object no longer exists

dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=3, mask=0x0)
----- Error Stack Dump -----
ORA-08103: object no longer exists
----- Current SQL Statement for this session (sql_id=5bc0v4my7dvr5) -----
select count(*) from t1
...

kcbzibmlt: dump suspect buffer, err=8103
buffer tsn: 4 rdba: 0x01000271 (4/625)
scn: 0x0000.003411ae seq: 0x02 flg: 0x04 tail: 0x11ae0602
frmt: 0x02 chkval: 0xee92 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x0000000062E16000 to 0x0000000062E18000
...

Block header dump:  0x01000271
Object id on Block? Y
seg/obj: 0x15c0a  csc: 0x00.34118b  itc: 3  flg: E  typ: 1 - DATA
--obj不对.15c0a = 89098
     brn: 1  bdba: 0x1000268 ver: 0x01 opc: 0
     inc: 0  exflg: 0

Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0xffff.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.0034118b
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x03   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
bdba: 0x01000271
data_block_dump,data header at 0x62e1607c
===============
tsiz: 0x1f80
hsiz: 0xa6
pbl: 0x62e1607c
     76543210
flag=--------
ntab=1
nrow=74
frre=-1
fsbo=0xa6
fseo=0x408
avsp=0x362
tosp=0x362
...
Dump of buffer cache at level 8 for tsn=4 rdba=16777841
BH (0x62fea448) file#: 4 rdba: 0x01000271 (4/625) class: 1 ba: 0x62e16000
  set: 69 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 2,19
  dbwrid: 2 obj: 89097 objn: 89097 tsn: 4 afn: 4 hint: f
  hash: [0x73fef7e0,0x849e1bd0] lru: [0x62fea670,0x842000a8]
  ckptq: [NULL] fileq: [NULL] objq: [0x633e7ab8,0x7d249880] objaq: [0x633e7ac8,0x7d249870]
  use: [0x8448b790,0x8448b790] wait: [NULL]
  st: READING md: EXCL tch: 0
  flags: only_sequential_access
  Using State Objects
    ----------------------------------------
    SO: 0x8448b710, type: 38, owner: 0x855656e8, flag: INIT/-/-/0x00 if: 0x1 c: 0x1
     proc=0x8591f618, name=buffer handle, file=kcb2.h LINE:2761, pg=0
    (buffer) (CR) PR: 0x8591f618 FLG: 0x0
    class bit: 0x0
     cr[0]:
     sh[0]:
    kcbbfbp: [BH: 0x62fea448, LINK: 0x8448b790]
    type: normal pin
    multiblock read ptr: 0x7da5dc80, count: 127
    where: kdswh11: kdst_fetch, why: 0
BH (0x73fef728) file#: 4 rdba: 0x01000271 (4/625) class: 1 ba: 0x73e9e000
  set: 64 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 4,19
  dbwrid: 0 obj: 89075 objn: 89075 tsn: 4 afn: 4 hint: f
  hash: [0x849e1bd0,0x62fea500] lru: [0x73fef950,0x73fef6e0]
  lru-flags: on_auxiliary_list
  ckptq: [NULL] fileq: [NULL] objq: [NULL] objaq: [NULL]
  st: FREE md: NULL fpin: 'kdswh02: kdsgrp' tch: 0 lfb: 33
  flags:

--//在数据缓存是对的.

5.换一种方法测试(利用10231事件):
$ oerr ora 10231
10231, 00000, "skip corrupted blocks on _table_scans_"
// *Cause:
// *Action: such blocks are skipped in table scans, and listed in trace files

--alter session set db_file_multiblock_read_count=1 ;
ALTER SESSION SET EVENTS '10231 TRACE NAME CONTEXT FOREVER, LEVEL 10';
ALTER SESSION SET events '10046 trace name context forever, level 12';

--退出重新登录.补充我感觉不需要设置设置db_file_multiblock_read_count=1 ; 估计10046事件也不需要,看上面的信息.

ALTER SESSION SET EVENTS '10231 TRACE NAME CONTEXT FOREVER, LEVEL 10';
ALTER SESSION SET events '10046 trace name context forever, level 12';

SCOTT@book> select count(*) from t1;
select count(*) from t1
                     *
ERROR at line 1:
ORA-08103: object no longer exists

--有点意外,10231事件并没有跳过这个有问题的块,直接报错.想想也正常,这个块通过了dbv以及rman检查测试.

select count(*) from t1
END OF STMT
PARSE #139737249221224:c=67990,e=68161,p=24,cr=440,cu=0,mis=1,r=0,dep=0,og=1,plh=3724264953,tim=1483494987390074
EXEC #139737249221224:c=0,e=26,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3724264953,tim=1483494987390185
WAIT #139737249221224: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1483494987390254
WAIT #139737249221224: nam='Disk file operations I/O' ela= 29 FileOperation=2 fileno=4 filetype=2 obj#=89097 tim=1483494987390566
WAIT #139737249221224: nam='direct path read' ela= 145 file number=4 first dba=523 block cnt=13 obj#=89097 tim=1483494987390762
WAIT #139737249221224: nam='direct path read' ela= 134 file number=4 first dba=537 block cnt=15 obj#=89097 tim=1483494987390977
WAIT #139737249221224: nam='direct path read' ela= 66 file number=4 first dba=553 block cnt=15 obj#=89097 tim=1483494987391204
WAIT #139737249221224: nam='direct path read' ela= 43 file number=4 first dba=569 block cnt=15 obj#=89097 tim=1483494987391402
WAIT #139737249221224: nam='direct path read' ela= 43 file number=4 first dba=585 block cnt=15 obj#=89097 tim=1483494987391606
WAIT #139737249221224: nam='direct path read' ela= 44 file number=4 first dba=601 block cnt=15 obj#=89097 tim=1483494987391818
WAIT #139737249221224: nam='direct path read' ela= 43 file number=4 first dba=617 block cnt=15 obj#=89097 tim=1483494987392017
WAIT #139737249221224: nam='direct path read' ela= 43 file number=4 first dba=633 block cnt=15 obj#=89097 tim=1483494987392208
WAIT #139737249221224: nam='db file sequential read' ela= 9 file#=4 block#=625 blocks=1 obj#=89097 tim=1483494987392416
kcbzibmlt: dump suspect buffer, err=8103
buffer tsn: 4 rdba: 0x01000271 (4/625)
scn: 0x0000.003411ae seq: 0x02 flg: 0x04 tail: 0x11ae0602
frmt: 0x02 chkval: 0xee92 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x0000000061B84000 to 0x0000000061B86000

--我后面测试仅仅10046也能发现问题. 单独使用10231也可以,不在另行测试.

6.如何跳过,这种情况在真正的生产环境很难出现:
--应该损坏的块对于的表结构跟原来的块对于的表结构一样.要通过扫描,先破坏这个块.设置10231事件.

BBED> set dba 4,625
        DBA             0x01000271 (16777841 4,625)

BBED> assign kcbh.type_kcbh=0x07
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
ub1 type_kcbh                               @0        0x07

BBED> sum apply
Check value for File 4, Block 625:
current = 0xee93, required = 0xee93

SCOTT@book> ALTER SESSION SET EVENTS '10231 TRACE NAME CONTEXT FOREVER, LEVEL 10';
Session altered.

SCOTT@book> alter system flush buffer_cache;
System altered.

SCOTT@book> select count(*) from t1;
COUNT(*)
----------
84678

--检查跟踪可以发现坏块.dba=4,625.

7.补充学习bbed修复:
BBED> set dba 4,625
        DBA             0x01000271 (16777841 4,625)

BBED> assign kcbh.type_kcbh=0x06
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
ub1 type_kcbh                               @0        0x06

BBED> assign ktbbh.ktbbhsid.ktbbhsg1=89097
ub4 ktbbhsg1                                @24       0x00015c09

BBED> p /d ktbbh.ktbbhsid
union ktbbhsid, 4 bytes                     @24
   ub4 ktbbhsg1                             @24       89097
   ub4 ktbbhod1                             @24       89097

BBED> sum apply
Check value for File 4, Block 625:
current = 0xee91, required = 0xee91

SCOTT@book> alter system flush buffer_cache;
System altered.

SCOTT@book> select count(*) from t1;
  COUNT(*)
----------
     84752

--对比前面比原来的数据多.

上一篇:安卓开发之ScrollView


下一篇:innodb表空间加密