Hit a bug again: ORA-01148, datafile suddenly changes to RECOVER status

Symptom:
In a RAC environment, a datafile's status changed to RECOVER. The alert log shows the following errors:

Wed Jun 26 02:31:03 2013

Thread 1 advanced to log sequence 33187

 Current log# 1 seq# 33187 mem# 0: +TJDISK/tj/onlinelog/group_1.257.757797483

Wed Jun 26 10:10:03 2013

Errors in file /opt/app/diag/rdbms/tj/tj1/trace/tj1_dbw0_6145.trc:

ORA-01148: cannot refresh file size for datafile 17

ORA-01110: data file 17: '+TJDISK/tj/datafile/ntj_index03.301.757894747'

ORA-01031: insufficient privileges

Automatic datafile offline due to media error on

file 17: +TJDISK/tj/datafile/ntj_index03.301.757894747

Unexpected communication failure with ASM instance:

 error 1031 (ORA-01031: insufficient privileges

)


Analysis:

1. The messages system logs and the ASM logs on all nodes show no errors.

2. The permissions on the disk group's raw devices are also normal.

/dev/raw/raw6

/dev/raw/raw7

3. The datafile is in autoextend mode.

SQL> select file_name,autoextensible from dba_data_files where file_name like '+TJDISK/tj/datafile/ntj_index03.301.757894747';

FILE_NAME                                          AUT
-------------------------------------------------- ---
+TJDISK/tj/datafile/ntj_index03.301.757894747      YES
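
As an additional check that is not part of the original investigation, the file's state could be confirmed from the control-file and dictionary views. A minimal sketch, assuming file number 17 from the alert log above and a session on a database instance with access to the V$ views:

```sql
-- Status recorded in the control file; RECOVER would be expected for file 17 here
select file#, status, enabled, name from v$datafile where file# = 17;

-- Files that currently need media recovery, with the reason if any
select file#, online_status, error, change#, time from v$recover_file;

-- The same file as seen through the data dictionary
select file_id, online_status, autoextensible from dba_data_files where file_id = 17;
```

If the file appears in v$recover_file, it has to be recovered before it can be brought back online.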

Finally, a search on Metalink suggests this hit an Oracle bug: Bug 16734525 or Bug 9357097 (Bug 16734525 is the duplicate of Bug 9357097).

Bug 16734525 : ORA-1148: CANNOT REFRESH FILE SIZE FOR DATAFILE

Hdr: 16734525 10.2.0.5 RDBMS 11.1.0.7 ASM PRODID-5 PORTID-23 ORA-1148 9357097

Abstract: ORA-1148: CANNOT REFRESH FILE SIZE FOR DATAFILE

*** 04/27/13 02:21 am ***

PROBLEM:--------

Fri Apr 26 11:31:28 EDT 2013

Redo Shipping Client Connected as PUBLIC

-- Connected User is Valid

Redo Shipping Client Connected as PUBLIC

-- Connected User is Valid

Fri Apr 26 11:44:55 EDT 2013

Errors in file /home/oracle/admin/ctopprul/bdump/ctopprul1_dbw0_20315.trc:

ORA-1148: cannot refresh file size for datafile 340

ORA-1110: data file 340: '+DATA/ctopprul_rdc/datafile/wires_data.1968.789654733'

ORA-1031: insufficient privileges

Fri Apr 26 11:44:55 EDT 2013

Automatic datafile offline due to media error on

file 340: +DATA/ctopprul_rdc/datafile/wires_data.1968.789654733

Fri Apr 26 11:44:59 EDT 2013

Unexpected communication failure with ASM instance:

 error 1031 (ORA-1031: insufficient privileges)

NOTE: ASMB process state dumped to trace file /home/oracle/admin/ctopprul/bdump/ctopprul1_dbw0_20315.trc

NOTE: force a map free for map id 345

DIAGNOSTIC ANALYSIS:--------------------

1. Matches the bug 9357097: SMALL BEEHIVE: FAILURE TO REFRESH FILE SIZE DUE TO SPACE OFFLINES DATAFILE

Need to confirm from DEV as audit file space issues were not there

2. Not using role separation and oracle executable is with correct permissions

3. CT is not sure if dbv or rman validate was run on the problematic datafile due to media error

ORA-1148: cannot refresh file size for datafile 340

ORA-1110: data file 340: '+DATA/ctopprul_rdc/datafile/wires_data.1968.789654733'

ORA-1031: insufficient privileges

Fri Apr 26 11:44:55 EDT 2013

Automatic datafile offline due to media error on  >>>>>>>>>>>> Media error

4. Ulimit was showing nofiles of low value

- Checked if there was any space issues on the server and nothing found, as the above bug is hit when audit files are not able to write

- OS watcher logs shows normal

WORKAROUND:-----------

RELATED BUGS:-------------

REPRODUCIBILITY:----------------

TEST CASE:----------

STACK TRACE:------------

SUPPORTING INFORMATION:-----------------------

Uploaded all the relevant info to the bug

24 HOUR CONTACT INFORMATION FOR P1 BUGS:----------------------------------------

DIAL-IN INFORMATION:--------------------

IMPACT DATE:------------

Bug 9357097  ORA-1148 Failure to refresh file size offlines datafile producing ORA-372 ORA-376

Symptoms:

Related To:

1 Error May Occur

2 ORA-1148 / ORA-372 / ORA-376

Range of versions believed to be affected   <-- any version below 12.1 may hit this bug

Versions BELOW 12.1    

Versions confirmed as being affected

• 11.2.0.1

• 11.1.0.7

• 10.2.0.5

• 10.2.0.4

Platforms affected

 Generic (all / most platforms affected)

Fixed:

This issue is fixed in                  <-- fixed in 12.1.0.1 and 11.2.0.2

• 12.1.0.1 (Base Release)

• 11.2.0.2 (Server Patch Set)

DBWR can offline the datafile with message "Automatic datafile offline due to media error"

if file size refresh fails with error ORA-1148.

As the file is offline, subsequent attempts to read the affected file produce

error ORA-372 or ORA-376 requiring media recovery.

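Solution:

The temporary workaround is to bring the file back online manually.

Oracle does not provide a dedicated patch for this; a complete fix requires upgrading to a version that contains the fix (11.2.0.2).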
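
The exact commands used for the workaround are not given above, so the following is only a minimal sketch of the usual sequence for a file in RECOVER status. It assumes file number 17 from the alert log, a database in ARCHIVELOG mode with the required redo still available, and a SYSDBA session in SQL*Plus on one of the database instances:

```sql
-- Apply the redo generated since DBWR took the file offline (SQL*Plus command)
recover datafile 17;

-- Bring the recovered file back online
alter database datafile 17 online;

-- Confirm the status is ONLINE again
select file#, status from v$datafile where file# = 17;
```

Since the underlying bug remains until the upgrade, the file can be taken offline again the next time a file size refresh fails, so the alert log is worth watching afterwards.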

During diagnosis, the following script was run in the ASM instance.

SPOOL ASM_FIRST<instance#>.HTML

SET MARKUP HTML ON

set echo on

set pagesize 200

alter session set nls_date_format='DD-MON-YYYY HH24:MI:SS';

select 'THIS ASM REPORT WAS GENERATED AT: ==)> ' , sysdate " " from dual;

select 'HOSTNAME ASSOCIATED WITH THIS ASM INSTANCE: ==)> ' , MACHINE " " from v$session where program like '%SMON%';

select * from v$asm_diskgroup;

SELECT * FROM V$ASM_DISK ORDER BY GROUP_NUMBER,DISK_NUMBER;

SELECT * FROM V$ASM_CLIENT;

select * from V$ASM_ATTRIBUTE;

select * from gv$asm_operation;

select * from v$version;

show parameter asm

show parameter cluster

show parameter instance_type

show parameter instance_name

show parameter spfile

show sga

spool off

exit
