转载于:等待事件ARCH wait on ATTACH_ITPUB博客
在检查测试数据库的备份日志文件时发现备份多个备份任务没有结束。
从操作系统中检查rman进程,发现系统中存在多个没有完成的备份任务:
[oracle@demo2 oracle]$ ps -ef|grep rman
oracle 2053 2045 0 Jul11 ? 00:00:00 bash -c ?ORACLE_HOME=/opt/ora9/product/9.2?export ORACLE_HOME?ORACLE_SID=testdata?export ORACLE_SID?/opt/ora9/product/9.2/bin/rman target / nocatalog msglog rman_backup_070711.out append << EOF?RUN {?CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 3 DAYS;?CONFIGURE CONTR
oracle 2054 2053 0 Jul11 ? 00:00:00 rman ora9/product/9.2/bin/rman target / nocatalog msglog rman_backup_070711.out append
oracle 3105 3097 0 Jul12 ? 00:00:00 bash -c ?ORACLE_HOME=/opt/ora9/product/9.2?export ORACLE_HOME?ORACLE_SID=testdata?export ORACLE_SID?/opt/ora9/product/9.2/bin/rman target / nocatalog msglog rman_backup_070712.out append << EOF?RUN {?CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 3 DAYS;?CONFIGURE CONTR
oracle 3106 3105 0 Jul12 ? 00:00:00 rman ora9/product/9.2/bin/rman target / nocatalog msglog rman_backup_070712.out append
oracle 4397 4388 0 09:00 ? 00:00:00 bash -c ?ORACLE_HOME=/opt/ora9/product/9.2?export ORACLE_HOME?ORACLE_SID=testdata?export ORACLE_SID?/opt/ora9/product/9.2/bin/rman target / nocatalog msglog rman_backup_070713.out append << EOF?RUN {?CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 3 DAYS;?CONFIGURE CONTR
oracle 4398 4397 0 09:00 ? 00:00:00 rman ora9/product/9.2/bin/rman target / nocatalog msglog rman_backup_070713.out append
oracle 4700 4668 0 15:48 pts/2 00:00:00 rman target /
oracle 5051 5027 0 17:02 pts/3 00:00:00 grep rman
首先检查数据库中和rman相关的会话的等待事件:
SQL> select a.sid, a.event from v$session_wait a, v$session b where a.sid = b.sid and b.program like 'rman%';
SID EVENT
---------- ----------------------------------------------------------------
14 enqueue
28 enqueue
31 enqueue
17 SQL*Net message from client
19 SQL*Net message from client
24 SQL*Net message from client
34 SQL*Net message from client
35 SQL*Net message from client
32 SQL*Net message from client
30 SQL*Net message from client
25 SQL*Net message from client
26 SQL*Net message from client
27 SQL*Net message from client
13 rows selected.
有三个会话处于enqueue状态,于是检查v$lock视图中的锁信息和相应的会话信息:
SQL> select * from v$lock where sid > 8;
ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
-------- -------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
8939254C 8939255C 9 WL 1 2302 5 0 285806 1
893924B4 893924C4 14 WL 1 2302 0 4 226191 0
89392468 89392478 14 IS 0 0 4 0 226191 0
89392F28 89392F38 28 WL 1 2302 0 4 39084 0
89392EDC 89392EEC 28 IS 0 0 4 0 39084 0
89392E90 89392EA0 31 WL 1 2302 0 4 131947 0
89392E44 89392E54 31 IS 0 0 4 0 131947 0
7 rows selected.
SQL> select sid, program from v$session where sid in (select sid from v$lock where sid > 8);
SID PROGRAM
---------- ------------------------------------------------
9 oracle@demo2 (ARC0)
14 rman@demo2 (TNS V1-V3)
28 rman@demo2 (TNS V1-V3)
31 rman@demo2 (TNS V1-V3)
从检查结果可以看到,三个会话都被archive进程锁住了。V$LOCK视图中的ID2列包含的是V$LOG视图中的SEQUENCE#列。
SQL> select group#, thread#, sequence#, archived, status from v$log;
GROUP# THREAD# SEQUENCE# ARC STATUS
---------- ---------- ---------- --- ----------------
1 1 2310 YES INACTIVE
2 1 2311 NO CURRENT
3 1 2302 NO INACTIVE
显然序号为2302的日志并没有完成归档,下面从V$ARCHIVED_LOG视图验证一下:
SQL> select name, thread#, sequence# from v$archived_log where sequence# between 2301 and 2305;
NAME THREAD# SEQUENCE#
------------------------------------------------------------ ---------- ----------
/data/oradata/testdata/archive/1_2301.dbf 1 2301
/data/oradata/testdata/archive/1_2303.dbf 1 2303
/data/oradata/testdata/archive/1_2304.dbf 1 2304
/data/oradata/testdata/archive/1_2305.dbf 1 2305
看看Oracle的ARCHIVE进程在等待什么:
SQL> select event from v$session_wait where sid = 9;
EVENT
----------------------------------------------------------------
ARCH wait on ATTACH
在metalink上检查了一下这个问题,感觉可能和操作系统的bug有关。
不过Oracle并没有给出解决方法。
检查数据库设置了两个归档进程,那么人工杀掉错误的归档进程,应该不会造成太多的影响:
SQL> show parameter log_archive_max
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
log_archive_max_processes integer 2
SQL> select spid from v$process where addr = (select paddr from v$session where sid = 9);
SPID
------------
938
SQL> host
[oracle@demo2 oracle]$ kill -9 938
下面检查V$LOG视图问题已经解决。
SQL> select group#, thread#, sequence#, archived, status from v$log;
GROUP# THREAD# SEQUENCE# ARC STATUS
---------- ---------- ---------- --- ----------------
1 1 2310 YES INACTIVE
2 1 2311 NO CURRENT
3 1 2302 YES INACTIVE