1.在检查rman删除归档日志的crontab时,发现日志里有ora-15028的报错,rman意外终止
- RMAN-00571: ===========================================================
- RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
- RMAN-00571: ===========================================================
- RMAN-03009: failure of delete command on default channel at 06/22/2021 07:33:02
- ORA-15028: ASM file '+ARCH/zystudio/archivelog/2021_02_24/thread_2_seq_252744.6807.1065318313' not dropped; currently being accessed
-
复制代码
先来看一下15028的错误描述
- [oracle@zystudio2 ~]$ oerr ora 15028
- 15028, 00000, "ASM file '%s' not dropped; currently being accessed"
- // *Cause: An attempt was made to drop an ASM file, but the file was
- // being accessed by one or more clients and therefore could
- // not be dropped.
- // *Action: Stop all clients that are using this file and retry the drop
- // command. Query the V$ASM_CLIENT fixed view in an ASM instance
- // or use ASMCMD LSOF to list active clients.
-
复制代码
2.尝试手动删除,依然报错
- ASMCMD> cp thread_2_seq_252744.6807.1065318313 /tmp/thread_2_seq_252744.6807.1065318313
- copying +arch/zystudio/archivelog/2021_02_24/thread_2_seq_252744.6807.1065318313 -> /tmp/thread_2_seq_252744.6807.1065318313
- ASMCMD> rm thread_2_seq_252744.6807.1065318313
- ORA-15032: not all alterations performed
- ORA-15028: ASM file '+arch/zystudio/archivelog/2021_02_24/thread_2_seq_252744.6807.1065318313' not dropped; currently being accessed (DBD ERROR: OCIStmtExecute)
复制代码
3.asm里,lsof看一下文件被哪个实例占用
- ASMCMD> lsof -G arch
- DB_Name Instance_Name Path
- zystudio zystudio2 +arch/zystudio/archivelog/2021_02_24/thread_2_seq_252744.6807.1065318313
复制代码
查询一下这个归档文件的状态
- SQL> select NAME,FIRST_TIME,CREATOR,REGISTRAR,STATUS,COMPLETION_TIME,archived,deleted from v$archived_log where name like '%2527%'
- 2 /
- NAME FIRST_TIME CREATOR REGISTR S COMPLETION_T ARC DEL
- -------------------------------------------------------------------------------- ------------ ------- ------- - ------------ --- ---
- +ARCH/zystudio/archivelog/2021_02_24/thread_2_seq_252744.6807.1065318313 24-FEB-21 RMAN RMAN A 16-JUN-21 YES NO
-
复制代码
4.设置15028事件,在ASM实例下sqlplus / as sysasm
- SQL>alter system set events '15028 trace name systemstate_global level 267';
-
- then,manual delete the archivelog in asmcmd
-
- SQL>alter system set events '15028 trace name systemstate_global off';
复制代码
但是,发现trace目录没有生成diag*.trc的文件,所以需要重置一下
- First identify the process id of the background process in question. In this example I will use the LGWR background process:
- SQL> select pid, program from v$process where program like '%DIAG%';
-
- PID PROGRAM
- ---------- ------------------------------------------------
- 6 oracle@zystudio2 (DIAG)
- OR
- SQL> select pid, spid, pname from v$process where pname like '%DIAG%';
-
- PID SPID PNAME
- --------------- ------------------------ -----
- 39 105458 DIAG
- Secondly, use oradebug to set the orapid and thereby attach to the background process:
- SQL> oradebug setorapid 6
- Unix process pid: 21955, image: oracle@zystudio2 (DIAG)
-
- SQL> oradebug close_trace
复制代码
确认trace目录有diag.trc文件后,再做一遍15028事件
5.再asm/db上分别跑如下脚本,会分别生生asm/db的会话和文件打开情况,已被后续排查(可选)
- spool <ASM|db>_<#>_lock_session_process_details.html
- SET MARKUP HTML ON
- set echo on
- set pagesize 200
- alter session set nls_date_format='DD-MON-YYYY HH24:MI:SS';
- select 'THIS REPORT WAS GENERATED AT: ==)> ' , sysdate " " from dual;
- select 'HOSTNAME ASSOCIATED WITH THIS INSTANCE: ==)> ' , MACHINE " " from v$session where program like '%SMON%';
- select * from gv$lock ;
- select * from v$session;
- select sw.sid, l.inst_id, l.block from gv$lock l, gv$session_wait sw where l.sid = sw.sid and l.inst_id = sw.inst_id;
- select * from v$process;
- select * from v$version;
- show parameter all
- spool off
- exit
复制代码
6.查看15028事件生成的diag.trc文件,直接搜索有问题的文件,找到打开该文件的pid,ospid信息
- opennm: +ARCH/zystudio/archivelog/2021_02_24/thread_2_seq_252744.6807.1065318313
- openflags: 0x80000000
- fullnm: +arch/zystudio/archivelog/2021_02_24/thread_2_seq_252744.6807.1065318313
- redun: 0x11, fdflg: 0x0, blksiz: 512, fsiz: 906972
- ftype: 4, extsz: [4294967295,0,0]
- extents: 443, start: 60, count: 383, xp: 0x0
- base: 0, lxcnt: 0
- enq: 0x700000131107bb8, cod: 0x0
- unlock[0].au: 0, unlock[0].disk: 0, unlock[0].flags 0x0
- unlock[1].au: 0, unlock[1].disk: 0, unlock[1].flags 0x0
- unlock[2].au: 0, unlock[2].disk: 0, unlock[2].flags 0x0
- unlock[3].au: 0, unlock[3].disk: 0, unlock[3].flags 0x0
- unlock[4].au: 0, unlock[4].disk: 0, unlock[4].flags 0x0
- unlock[5].au: 0, unlock[5].disk: 0, unlock[5].flags 0x0
- strpsz: 1048576, strpwdth: 1
- lnk: 0x0, 0x0
- aba: 0, 0
- <font color="Red">client pid: 361 osid: 20972400</font>
复制代码
7.通过v$process,v$session查询client pid: 361 osid: 20972400的session信息,或者从第五步生成的html文件里直接搜索,得到的信息如下
- select sid,serial#,osuser,username,PROCESS,program,LOGON_TIME,PREV_EXEC_START,event,status from v$session where program like '%rman%';
-
- SID SERIAL# OSUSER USERNAME PROGRAM LOGON_TIME PREV_EXEC_ST EVENT STATUS
- ---------- ---------- --------------- ----------- ------------------------------ ------------ ------------ ---------------------------------------- ----
- 1634 20363 oracle SYS rman@zystudio2 (TNS V1-V3) 02-DEC-20 02-DEC-20 Backup: MML write backup piece ACTIVE
- 2503 13129 oracle SYS rman@zystudio2 (TNS V1-V3) 02-DEC-20 02-DEC-20 Backup: MML write backup piece ACTIVE
- 2840 42185 oracle SYS rman@zystudio2 (TNS V1-V3) 24-FEB-21 24-FEB-21 Backup: MML write backup piece ACTIVE
- 4058 19839 oracle SYS rman@zystudio2 (TNS V1-V3) 24-FEB-21 24-FEB-21 Backup: MML write backup piece KILLED
- 4490 385 oracle SYS rman@zystudio2 (TNS V1-V3) 29-NOV-20 29-NOV-20 Backup: MML write backup piece ACTIVE
- 4514 1279 oracle SYS rman@zystudio2 (TNS V1-V3) 29-NOV-20 29-NOV-20 Backup: MML write backup piece ACTIVE
复制代码
8.原来是被rman会话占用,从日期可以看到这个会话已经很久了,确认杀掉没有问题
- SQL> alter system disconnect session '4058,19839' immediate;
- SQL> alter system disconnect session '2840,42185' immediate;
- SQL> alter system disconnect session '1634,20363' immediate;
- SQL> alter system disconnect session '2503,13129' immediate;
- SQL> alter system disconnect session '4490,385' immediate;
- SQL> alter system disconnect session '4514,1279' immediate;
复制代码
9.再次调用删除归档的rman脚本,问题解决。
|