Symptom:
In a RAC environment, a datafile's status changed to RECOVER. The alert log shows the following errors:
Wed Jun 26 02:31:03 2013
Thread 1 advanced to log sequence 33187
Current log# 1 seq# 33187 mem# 0: +TJDISK/tj/onlinelog/group_1.257.757797483
Wed Jun 26 10:10:03 2013
Errors in file /opt/app/diag/rdbms/tj/tj1/trace/tj1_dbw0_6145.trc:
ORA-01148: cannot refresh file size for datafile 17
ORA-01110: data file 17: '+TJDISK/tj/datafile/ntj_index03.301.757894747'
ORA-01031: insufficient privileges
Automatic datafile offline due to media error on
file 17: +TJDISK/tj/datafile/ntj_index03.301.757894747
Unexpected communication failure with ASM instance:
error 1031 (ORA-01031: insufficient privileges
)
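The state described above can be confirmed from any database instance. The queries below are a minimal sketch, assuming file number 17 as reported in the alert log; V$DATAFILE and V$RECOVER_FILE are standard views, but the actual output depends on the environment.

```sql
-- Status of the affected datafile (RECOVER after the automatic offline)
SELECT file#, status, name
  FROM v$datafile
 WHERE file# = 17;

-- Files that currently need media recovery, and the SCN to recover from
SELECT file#, online_status, error, change#, time
  FROM v$recover_file;
```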
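Analysis:
1. The messages system logs and the ASM logs on all nodes show no errors.
2. The permissions on the raw devices used by the disk group (DG) are also normal.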
/dev/raw/raw6
/dev/raw/raw7
3. The datafile is in autoextend mode.
SQL> select file_name,autoextensible from dba_data_files where file_name like '+TJDISK/tj/datafile/ntj_index03.301.757894747';
FILE_NAME
--------------------------------------------------------------------------------
AUT
---
+TJDISK/tj/datafile/ntj_index03.301.757894747
YES
Finally, a search on Metalink suggested that this is a known Oracle bug: Bug 16734525 or Bug 9357097 (Bug 16734525 is a duplicate of Bug 9357097).
Bug 16734525 : ORA-1148: CANNOT REFRESH FILE SIZE FOR DATAFILE
Hdr: 16734525 10.2.0.5 RDBMS 11.1.0.7 ASM PRODID-5 PORTID-23 ORA-1148 9357097
Abstract: ORA-1148: CANNOT REFRESH FILE SIZE FOR DATAFILE
*** 04/27/13 02:21 am ***
PROBLEM:
--------
Fri Apr 26 11:31:28 EDT 2013
Redo Shipping Client Connected as PUBLIC
-- Connected User is Valid
Redo Shipping Client Connected as PUBLIC
-- Connected User is Valid
Fri Apr 26 11:44:55 EDT 2013
Errors in file /home/oracle/admin/ctopprul/bdump/ctopprul1_dbw0_20315.trc:
ORA-1148: cannot refresh file size for datafile 340
ORA-1110: data file 340: '+DATA/ctopprul_rdc/datafile/wires_data.1968.789654733'
ORA-1031: insufficient privileges
Fri Apr 26 11:44:55 EDT 2013
Automatic datafile offline due to media error on
file 340: +DATA/ctopprul_rdc/datafile/wires_data.1968.789654733
Fri Apr 26 11:44:59 EDT 2013
Unexpected communication failure with ASM instance: error 1031
(ORA-1031: insufficient privileges)
NOTE: ASMB process state dumped to trace file /home/oracle/admin/ctopprul/bdump/ctopprul1_dbw0_20315.trc
NOTE: force a map free for map id 345
DIAGNOSTIC ANALYSIS:
--------------------
1. Matches the bug 9357097: SMALL BEEHIVE: FAILURE TO REFRESH FILE SIZE DUE TO SPACE OFFLINES DATAFILE
Need to confirm from DEV as audit file space issues were not there
2. Not using role separation and oracle executable is with correct permissions
3. CT is not sure if dbv or rman validate was run on the problematic datafile due to media error
ORA-1148: cannot refresh file size for datafile 340
ORA-1110: data file 340: '+DATA/ctopprul_rdc/datafile/wires_data.1968.789654733'
ORA-1031: insufficient privileges
Fri Apr 26 11:44:55 EDT 2013
Automatic datafile offline due to media error on >>>>>>>>>>>> Media error
4. Ulimit was showing nofiles of low value
Hi team,
Oracle:
-----------
- Checked if there were any space issues on the server and found nothing, as the above bug is hit when audit files cannot be written
- OS Watcher logs show normal
WORKAROUND:
-----------
RELATED BUGS:
-------------
REPRODUCIBILITY:
----------------
TEST CASE:
----------
STACK TRACE:
------------
SUPPORTING INFORMATION:
-----------------------
Uploaded all the relevant info to the bug
24 HOUR CONTACT INFORMATION FOR P1 BUGS:
----------------------------------------
DIAL-IN INFORMATION:
--------------------
IMPACT DATE:
------------
Bug 9357097 ORA-1148 Failure to refresh file size offlines datafile producing ORA-372 ORA-376
Symptoms:
Related To:
1 Error May Occur
2 ORA-1148 / ORA-372 / ORA-376
Range of versions believed to be affected <-- any version below 12.1 may hit this bug
Versions BELOW 12.1
Versions confirmed as being affected
- 11.2.0.1
- 11.1.0.7
- 10.2.0.5
- 10.2.0.4
Platforms affected
Generic (all / most platforms affected)
Fixed:
This issue is fixed in <-- fixed in 12.1.0.1 and 11.2.0.2
- 12.1.0.1 (Base Release)
- 11.2.0.2 (Server Patch Set)
DBWR can offline the datafile with message "Automatic datafile offline due to media error"
if file size refresh fails with error ORA-1148.
As the file is offline, subsequent attempts to read the affected file produce
error ORA-372 or ORA-376 requiring media recovery.
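Solution:
The temporary workaround is to bring the datafile back online manually.
Oracle does not provide a dedicated patch for this; a permanent fix requires upgrading to a version that contains the fix (11.2.0.2).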
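The manual online step can be sketched as follows. This is only an illustration, run in SQL*Plus as SYSDBA. It assumes the database is in ARCHIVELOG mode and that the redo generated since the file went offline is still available. File number 17 is the one from the alert log above. A datafile in RECOVER status needs media recovery before it can be brought online.

```sql
-- Apply the redo generated since the file was taken offline
RECOVER DATAFILE 17;

-- Bring the recovered file back online
ALTER DATABASE DATAFILE 17 ONLINE;

-- Verify the result: STATUS should no longer be RECOVER
SELECT file#, status
  FROM v$datafile
 WHERE file# = 17;
```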
During diagnosis, the following script was run in the ASM instance.
SPOOL ASM_FIRST<instance#>.HTML
SET MARKUP HTML ON
set echo on
set pagesize 200
alter session set nls_date_format='DD-MON-YYYY HH24:MI:SS';
select 'THIS ASM REPORT WAS GENERATED AT: ==)> ' , sysdate " " from dual;
select 'HOSTNAME ASSOCIATED WITH THIS ASM INSTANCE: ==)> ' , MACHINE " " from v$session where program like '%SMON%';
select * from v$asm_diskgroup;
SELECT * FROM V$ASM_DISK ORDER BY GROUP_NUMBER,DISK_NUMBER;
SELECT * FROM V$ASM_CLIENT;
select * from V$ASM_ATTRIBUTE;
select * from gv$asm_operation;
select * from v$version;
show parameter asm
show parameter cluster
show parameter instance_type
show parameter instance_name
show parameter spfile
show sga
spool off
exit