[20140507]实例crash恢复.txt

[20140507]实例crash恢复.txt

如果发生了实例崩溃,只需要在日志文件中找到检查点位置(low cache rba),从此开始应用所有的重做日志文件, 就完成了前滚操作。
实例崩溃后,再次启动数据库,oracle会到控制文件中读取low cache rba,这就是检查点位置。 从此处开始应用重做日志,应用到on
disk rba的位置。on disk rba是磁盘中重做日志文件的最后一条重做记录的rba。 加快恢复速度,确定恢复日志的起点。

不管redo记录的事务提交还是非提交,都会应用,这个相当于前滚Roll forward (crash recovery),然后再根据undo做事务的恢复,这个叫
后滚Roll backward (transaction recovery)

In the roll backward phase, the data within the undo tablespace will be applied to roll back the uncommitted changes.The
SMON process will initiate the rollback of the uncommitted buffers.

--自己做一个测试,理解这些问题:

1.建立测试环境:
SCOTT@test> @ver
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

SCOTT@test> create table t ( id number,name varchar2(10));
Table created.

SCOTT@test> insert into t select rownum,'ABCDEF' from dual connect by level10 rows created.

SCOTT@test> commit ;
Commit complete.

SCOTT@test> select rowid,t.* from t where rownumROWID                      ID NAME
------------------ ---------- --------------------
AABF02AAEAAAAJ1AAA          1 ABCDEF

SCOTT@test> @lookup_rowid AABF0+AAEAAAAJ1AAA
    OBJECT       FILE      BLOCK        ROW DBA                  TEXT
---------- ---------- ---------- ---------- -------------------- ----------------------------------------
    286014          4        629          0 4,629                alter system dump datafile 4 block 629 ;

SCOTT@test> alter system checkpoint ;
System altered.
--将脏块写盘.

SCOTT@test> update t set name='1234' ;
10 rows updated.
--注意name的字符串长度与原来不同,不提交,打开另外的会话.执行
@checkpoint
shutdown abort.

SYS@test> @checkpoint
                                                       检查点队列
  检查点队列                                           on disk rba                                        检查点队列
    脏块数量                                           时间戳              当前时间                       on disk rba scn    检查点心跳
       CPDRT low_rba              on_disk_rba          CPODT               SYSDATE              DIFF_DATE CPODS                   CPHBT  CURRENT_SCN     DIFF_SCN         INDX
------------ -------------------- -------------------- ------------------- ------------------- ---------- ---------------- ------------ ------------ ------------ ------------
          24 143.1242.0           143.1358.0           2014-05-07 15:35:48 2014-05-07 15:35:49       1.00 3270385840          846926007   3270385842            2            0

DATAFILE:

rtckp_rba            RTCKP_SCN         CURRENT_SCN     DIFF_SCN RTCKP_TIM           SYSDATE              DIFF_DATE
-------------------- ---------------- ------------ ------------ ------------------- ------------------- ----------
143.1242.16          3270385727         3270385843          116 2014-05-07 15:34:13 2014-05-07 15:35:49      96.00


v$instance_recovery:

     INST_ID ACTUAL_REDO_BLKS TARGET_REDO_BLKS     90%_blks TIMEOUT_BLKS  TARGET_MTTR ESTIMATED_MTTR
------------ ---------------- ---------------- ------------ ------------ ------------ --------------
           1              116           165888       165888                         0             12

SYS@test> shutdown abort
ORACLE instance shut down.

==使用bbed观察:
BBED> set dba 4,629
        DBA             0x01000275 (16777845 4,629)

BBED> p *kdbr[0]
rowdata[0]
----------
ub1 rowdata[0]                              @8058     0x2c

BBED> x /rnc
rowdata[0]                                  @8058
----------
flag@8058: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@8059: 0x01
cols@8060:    2

col    0[2] @8061: 1
col    1[6] @8064: ABCDEF

BBED> dump /v
File: /u01/app/oracle11g/oradata/test/users01.dbf (4)
Block: 629                                                         Offsets: 8058 to 8191                                                      Dba:0x01000275
---------------------------------------------------------------------------------------------------------------------------------------------------------------
2c010202 c1020641 42434445 462c0102 02c10306 41424344 45462c01 0202c104 06414243 4445462c 010202c1 05064142 l ,......ABCDEF,......ABCDEF,......ABCDEF,......AB
43444546 2c010202 c1060641 42434445 462c0102 02c10706 41424344 45462c01 0202c108 06414243 4445462c 010202c1 l CDEF,......ABCDEF,......ABCDEF,......ABCDEF,....
09064142 43444546 2c010202 c10a0641 42434445 462c0102 02c10b06 41424344 45460106 f31f                       l ..ABCDEF,......ABCDEF,......ABCDEF....

--可以发现里面的信息是第1次insert的记录.

2.启动数据库到mount状态:
SYS@test> startup mount
ORACLE instance started.

Total System Global Area 1603411968 bytes
Fixed Size                  2228784 bytes
Variable Size            1006636496 bytes
Database Buffers          587202560 bytes
Redo Buffers                7344128 bytes
Database mounted.
--数据库在mount状态.

SYS@test> select * from v$log ;
    GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARC STATUS           FIRST_CHANGE# FIRST_TIME          NEXT_CHANGE# NEXT_TIME
---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ------------------- ------------ -------------------
         1          1        142   52428800        512          1 YES INACTIVE            3270384619 2014-05-07 15:24:16   3270384751 2014-05-07 15:24:18
         3          1        141   52428800        512          1 YES INACTIVE            3270344812 2014-05-07 11:26:32   3270384619 2014-05-07 15:24:16
         2          1        143   52428800        512          1 NO  CURRENT             3270384751 2014-05-07 15:24:18   2.8147E+14

SYS@test> @checkpoint

REDO:
                                                       检查点队列
  检查点队列                                           on disk rba                                        检查点队列
    脏块数量                                           时间戳              当前时间                       on disk rba scn    检查点心跳
       CPDRT low_rba              on_disk_rba          CPODT               SYSDATE              DIFF_DATE CPODS                   CPHBT  CURRENT_SCN     DIFF_SCN         INDX
------------ -------------------- -------------------- ------------------- ------------------- ---------- ---------------- ------------ ------------ ------------ ------------
          24 143.1242.0           143.1358.0           2014-05-07 15:35:48 2014-05-07 15:43:08     440.00 3270385840          846965749            0  -3270385840            0


DATAFILE:

rtckp_rba            RTCKP_SCN         CURRENT_SCN     DIFF_SCN RTCKP_TIM           SYSDATE              DIFF_DATE
-------------------- ---------------- ------------ ------------ ------------------- ------------------- ----------
143.1242.16          3270385727                  0  -3270385727 2014-05-07 15:34:13 2014-05-07 15:43:08     535.00


v$instance_recovery:

     INST_ID ACTUAL_REDO_BLKS TARGET_REDO_BLKS     90%_blks TIMEOUT_BLKS  TARGET_MTTR ESTIMATED_MTTR
------------ ---------------- ---------------- ------------ ------------ ------------ --------------
           1                0                                                       0              0


=>恢复就是从low_rba(143.1242.0)=>on_disk_rba(143.1358.0).

$ strace -fo /tmp/xx1.txt rlsql
SYS@test> alter database open ;
Database altered.

3.检查验证:

--首先检查alert*.log
Beginning crash recovery of 1 threads
parallel recovery started with 3 processes
Started redo scan
Completed redo scan
read 58 KB redo, 24 data blocks need recovery
Started redo application at
Thread 1: logseq 143, block 1242, scn 3270385727
Recovery of Online Redo Log: Thread 1 Group 2 Seq 143 Reading mem 0
  Mem# 0: /u01/app/oracle11g/oradata/test/redo02.log
Completed redo application of 0.01MB
Completed crash recovery at
Thread 1: logseq 143, block 1359, scn 3270405841
24 data blocks read, 24 data blocks written, 58 redo k-bytes read
Wed May 07 15:44:51 2014
LGWR: STARTING ARCH PROCESSES

--注意看提示:
Started redo application at
Thread 1: logseq 143, block 1242, scn 3270385727
...
Completed crash recovery at
Thread 1: logseq 143, block 1359, scn 3270405841

--这些与前面查询的low_rba=143.1242.0,on_disk_rba=143.1359.0基本对上,仅仅结束多读1块.
--检查/tmp/xx1.txt文件. redo的1块=512byte.
--1242*512=635904
--1359*512=695808
--695808-635904=59904
--59904/1024=58.5

$ grep 635904 /tmp/xx1.txt
11984 pread(269, "\1\"\0\0\332\4\0\0\217\0\0\0\20\200R(`\4\0\0\6\0\0\0? "..., 4194304, 635904) = 4194304

$ grep 695808 /tmp/xx1.txt
11984 pread(269, "\1\"\0\0O\5\0\0\0\0\0\0\0\200N\247\0\0\0\0\0\0\0\0\0\0"..., 4096, 695808) = 4096
--从跟踪文件可以看出这些信息.

4.查看信息:
SCOTT@test> select * from t where rownum        ID NAME
---------- --------------------
         1 ABCDEF


--使用bbed观察:
BBED> set dba 4,629
        DBA             0x01000275 (16777845 4,629)

BBED> p *kdbr[0]
rowdata[0]
----------
ub1 rowdata[0]                              @7818     0x2c

BBED> x /rnc
rowdata[0]                                  @7818
----------
flag@7818: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@7819: 0x00
cols@7820:    2

col    0[2] @7821: 1
col    1[6] @7824: ABCDEF


BBED> dump /v
File: /u01/app/oracle11g/oradata/test/users01.dbf (4)
Block: 629                                                         Offsets: 7818 to 8191                                                      Dba:0x01000275
---------------------------------------------------------------------------------------------------------------------------------------------------------------
2c000202 c1020641 42434445 462c0002 02c10306 41424344 45462c00 0202c104 06414243 4445462c 000202c1 05064142 l ,......ABCDEF,......ABCDEF,......ABCDEF,......AB
43444546 2c000202 c1060641 42434445 462c0002 02c10706 41424344 45462c00 0202c108 06414243 4445462c 000202c1 l CDEF,......ABCDEF,......ABCDEF,......ABCDEF,....
09064142 43444546 2c000202 c10a0641 42434445 462c0002 02c10b06 41424344 45462c02 0202c10b 04313233 342c0202 l ..ABCDEF,......ABCDEF,......ABCDEF,......1234,..
02c10a04 31323334 2c020202 c1090431 3233342c 020202c1 08043132 33342c02 0202c107 04313233 342c0202 02c10604 l ....1234,......1234,......1234,......1234,......
31323334 2c020202 c1050431 3233342c 020202c1 04043132 33342c02 0202c103 04313233 342c0202 02c10204 31323334 l 1234,......1234,......1234,......1234,......1234
2c000202 c1020641 42434445 462c0002 02c10306 41424344 45462c00 0202c104 06414243 4445462c 000202c1 05064142 l ,......ABCDEF,......ABCDEF,......ABCDEF,......AB
43444546 2c000202 c1060641 42434445 462c0002 02c10706 41424344 45462c00 0202c108 06414243 4445462c 000202c1 l CDEF,......ABCDEF,......ABCDEF,......ABCDEF,....
09064142 43444546 2c000202 c10a0641 42434445 462c0002 02c10b06 41424344 45460a06 016f                       l ..ABCDEF,......ABCDEF,......ABCDEF...o

--看注意看里面1234,如果对比前面的应用,可以发现,update没有提交的信息已经写入磁盘,然后rollback回滚,另外*kdbr[0]的内容,与前面看到的不同,那是回滚后的位置.
BBED> set offset 8058          OFFSET          8058

BBED> x /rnc
rowdata[240]                                @8058
------------
flag@8058: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@8059: 0x00
cols@8060:    2

col    0[2] @8061: 1
col    1[6] @8064: ABCDEF

BBED> set offset 7948
        OFFSET          7948

BBED> x /rnc
rowdata[130]                                @7948
------------
flag@7948: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@7949: 0x02
cols@7950:    2

col    0[2] @7951: 10
col    1[4] @7954: 1234

 

--附录checkpoint.sql脚本:
$ cat sqllaji/checkpoint.sql
column low_rba format a20
column on_disk_rba format a20
column rtckp_rba format a20
column diff_date format 999999.99
column CPOSD_ono_disk_rba_scn format 99999999999999999999999999999999
column cpdrt heading "检查点队列|脏块数量|CPDRT"
column cpodt_on_disk_rba heading "检查点队列|on disk rba|时间戳|CPODT"
column cpods heading "检查点队列|on disk rba scn|CPODS"
column cphbt heading "检查点心跳|CPHBT"
column current_sysdate heading "当前时间|SYSDATE"
set num 12
PROMPT
PROMPT REDO:
PROMPT
SELECT cpdrt ,
       cplrba_seq || '.' || cplrba_bno || '.' || cplrba_bof "low_rba",
       cpodr_seq || '.' || cpodr_bno || '.' || cpodr_bof "on_disk_rba",
       TO_DATE (CPODT, 'MM-DD-YYYY HH24:MI:SS') cpodt_on_disk_rba,
       SYSDATE current_sysdate,
       ROUND ( (SYSDATE - TO_DATE (CPODT, 'MM-DD-YYYY HH24:MI:SS')) * 86400,
              2)
          diff_date,
       CPODS ,
           CPHBT,
       current_scn,
       current_scn - cpods diff_scn,
       indx
  FROM x$kcccp, v$database
WHERE CPLRBA_SEQ 0;

PROMPT
PROMPT DATAFILE:
PROMPT
SELECT rtckp_rba_seq || '.' || rtckp_rba_bno || '.' || rtckp_rba_bof
          "rtckp_rba",
       rtckp_scn,
       current_scn,
       current_scn - rtckp_scn diff_scn,
       TO_DATE (rtckp_tim, 'MM-DD-YYYY HH24:MI:SS') rtckp_tim,
       SYSDATE,
       ROUND (
          (SYSDATE - TO_DATE (rtckp_tim, 'MM-DD-YYYY HH24:MI:SS')) * 86400,
          2) diff_date
  FROM x$kccrt, v$database;


PROMPT
PROMPT v$instance_recovery:
PROMPT
SELECT inst_id,
       actual_redo_blks,
       target_redo_blks,
       LOG_FILE_SIZE_REDO_BLKS AS "90%_blks",
       LOG_CHKPT_TIMEOUT_REDO_BLKS AS timeout_blks,
       target_mttr,
       estimated_mttr
  FROM gv$instance_recovery;

上一篇:Android 中 View的类关系图


下一篇:HA高可用特殊概念仲裁机制介绍