监视DG状态
物理备库(Physical Standby)
检查进程状态
SELECT CLIENT_PROCESS, PROCESS, THREAD#, SEQUENCE#, STATUS
FROM V$MANAGED_STANDBY
WHERE CLIENT_PROCESS='LGWR' OR PROCESS='MRP0';
-- oracle 12.2+
select name,pid,role,action,group#,thread#,sequence#,delay_mins from v$dataguard_process;
V$DATAGUARD_PROCESS视图
显示正在运行的Data Guard 进程,从12.2.0.1开始使用,不建议使用V$MANAGED_STANDBY
视图。
name: 进程名称
- ARCn - Archiver process
- DTS - Data transport process
- FAL - File/announce process
- LGWR - Log Writer Process
- MRP0 - Detached recovery server process
- NSSn - SYNC Redo Transport process
- ORA - Foreground process
- RFS - Remote file server
- RMI - Remote message process
- TMON - Redo Transport Process monitor
- TTnn - Redo Transport Slave Process
ROLE:进程对应的角色
- async ORL multi
- async ORL single
- async SRL multi
- async SRL single
- log writer
- sync
- archive redo
- archive local
- archive gap
- RFS async
- RFS sync
- RFS archive
- RFS gap
- RFS SMON
- data transport
- data receive
- redo transport monitor
- heartbeat redo informer
- process kill
- post role transition
- gap manager
- update TMI
- RFS ping
- FAL GAP
- FAL announce
- failover
- switchover
- remote failover
- remote switchover
- redo transport timer
- announce request
- managed recovery
- recovery
- controlfile update
- UNKNOWN
ACTION:当前进程执行的操作
- UNUSED
- STARTING
- CONNECTED
- ATTACHED
- IDLE
- ERROR
- OPENING
- CLOSING
- WRITING
- RECEIVING
- ANNOUNCING
- REGISTERING
- WAIT_FOR_LOG
- WAIT_FOR_GAP
- APPLYING_LOG
- TERMINATING
- PROCESSING
- UNKNOWN
CLIENT_ROLE:和备库上RFS或DTS进程通信的进程角色
- none
- async ORL multi
- async ORL single
- async SRL multi
- async SRL single
- log writer
- sync
- archive redo
- archive gap
- data transport
- gap manager
- failover
- switchover
- announce request
- managed recovery
- recovery
- UNKNOWN
DELAY_MINS:已分钟为单位归档日志延迟时间间隔
延迟检查
监控adg延迟时间(单位/s)
select inst_id, NAME,(substr(value,5,2)*3600+substr(value,8,2)*60+substr(value,11,2)) lag from GV$DATAGUARD_STATS where name like '%lag';
select gap.inst_id, gap.thread#, gap.low_sequence#, gap.high_sequence# from gv$archive_gap gap;
监视v$database.switchover_status
值是否为UNRESOLVABLE GAP
select SWITCHOVER_STATUS from v$database;
日志传输不到备库
set lines 168 pages 99
col dest_name for a32
col destination for a18
col error for a32
SELECT inst_id, dest_name, destination, status, error
FROM gV$ARCHIVE_DEST WHERE TARGET='STANDBY';
检查同步状态
select sequence#, first_time, next_time, applied from v$archived_log;
select archived_thread#, archived_seq#, applied_thread#, applied_seq# from v$archive_dest_status;
select thread#, max (sequence#) from v$log_history group by thread#;
select thread#, max (sequence#) from v$archived_log where APPLIED='YES' group by thread#;
col name for a13
col value for a13
col unit for a30
set lines 132
select name, value, unit, time_computed
from v$dataguard_stats where name in ('transport lag', 'apply lag');
set linesize 300
col start_time format a20
col item format a20
select to_char(start_time, 'yyyy-mm-dd hh24:mi:ss') start_time, item , sofar, units
from v$recovery_progress
where item in ('Active Apply Rate', 'Average Apply Rate', 'Redo Applied');
健康检查
sqlplus -S "/ as sysdba"<<EOF
-- STATUS: 进程状态值
-- ### ALLOCATED : 正准备连接Primary数据库
-- ### ATTACHED : 正在连接Primary数据库
-- ### CONNECTED : 已连接至Primary数据库
-- ### IDLE : 空闲中
-- ### RECEIVING : 归档文件接收中
-- ### OPENING : 归档文件处理中
-- ### CLOSING : 归档文件处理完,收尾中
-- ### WRITING : REDO数据库写向归档文件中
-- ### WAIT_FOR_LOG: 等待新的REDO数据中
-- ### WAIT_FOR_GAP: 归档有中断,正等待中断的那部分REDO数据
-- ### APPLYING_LOG: 应用REDO数据中
set feedback off;
set lines 200 pages 99
ttitle 'dg process info'
select inst_id, process, status, client_process, sequence# from gv\$managed_standby;
ttitle 'dg event info'
select inst_id, to_char(timestamp, 'yyyy-mm-dd hh24:mi:ss') ctime, message
from gv\$dataguard_status
where timestamp > sysdate - 2
;
col value for a15
col ctime for a20
col NAME for a18
col datum_time for a20
ttitle 'standby: archive log lag info'
select inst_id,
to_char(sysdate, 'yyyy-mm-dd hh24:mi:ss') ctime,
name,
value,
datum_time,
(substr(value, 5, 2) * 3600 + substr(value, 8, 2) * 60 +
substr(value, 11, 2)) lag_sec
from gv\$dataguard_stats
where 1 = 1
-- and name like '%lag'
;
ttitle 'archive dest status'
col dest_name format a20
col error format a32
col db_unique_name format a15
col status for a8
col process for a8
col type for a8
col valid_type for a12
col valid_role for a12
col compression for a10
select inst_id,
dest_id,
dest_name,
status,
delay_mins,
process,
error,
type,
valid_type,
valid_role,
compression,
db_unique_name
from gv\$archive_dest
where status <> 'INACTIVE';
col database_mode for a16
col recovery_mode for a16
col protection_mode for a20
col synchronization_status for a24
col gap_status for a12
select inst_id,
dest_id,
dest_name,
status,
type,
database_mode,
recovery_mode,
protection_mode,
standby_logfile_count srl_cnt,
standby_logfile_active srl_id#,
-- archived_thread#,
archived_seq#,
-- applied_thread#,
applied_seq#,
error,
db_unique_name,
synchronization_status,
-- synchronized,
gap_status
from gv\$archive_dest_status
where status <> 'INACTIVE';
ttitle 'apply logfile info'
col name for a64
select
name,
registrar,
creator,
thread#,
sequence#,
standby_dest,
archived,
applied,
deleted,
status,
first_time,
next_time
from v\$archived_log;
ttitle off;
quit;
EOF
对比ARCH和LGWR进程同步redo数据
ARCH进程传输
主库仅看到ARCH进程
set lines 168 pages 99
col client_process for a16
select process, client_process, thread#, sequence#, status from v$managed_standby;
备库
set lines 168 pages 99
col client_process for a16
select process, client_process, thread#, sequence#, status from v$managed_standby;
select group#, thread#, SEQUENCE#, bytes, blocksize, used, archived, status from v$standby_log;
与RFS进程通信的进程只有ARCH进程,SRLs也没有使用。
LGWR进程传输
主库
set lines 168 pages 99
col client_process for a16
select process, client_process, thread#, sequence#, status from v$managed_standby;
可以看到比之前ARCH传输多个一个LGWR的子进程LNS。
备库
set lines 168 pages 99
col client_process for a16
select process, client_process, thread#, sequence#, status from v$managed_standby;
select group#, dbid, thread#, SEQUENCE#, bytes, blocksize, used, archived, status from v$standby_log;
使用LGWR进程传输时,对比之前ARCH进程传输,与RFS进程通信进程(client_process)中不仅仅有ARCH,也有LGWR进程;SRLs也被正常使用。
逻辑备库(Logical Standby)
查看逻辑备库当前配置参数
SELECT * FROM DBA_LOGSTDBY_PARAMETERS;
逻辑Standby操作日志(DBA_LOGSTDBY_EVENTS)
默认只保留最近100条事件的记录
SELECT EVENT_TIME,STATUS,EVENT FROM DBA_LOGSTDBY_EVENTS
ORDER BY EVENT_TIMESTAMP;
记录当前的重做日志的应用情况(DBA_LOGSTDBY_LOG)
SELECT SEQUENCE#,FIRST_CHANGE#,NEXT_CHANGE#,
TIMESTAMP,APPLIED FROM DBA_LOGSTDBY_LOG;
显示LogMiner的状态等相关信息(V$LOGSTDBY_STATS)
SELECT *FROM V$LOGSTDBY_STATS;
显示当前日志应用服务的相关信息(V$LOGSTDBY_PROCESS)
显示当前日志应用服务的相关信息。常用于诊断归档日志逻辑应用的性能问题
SELECT SID,SERIAL#,SPID,TYPE,STATUS,HIGH_SCN FROM V$LOGSTDBY_PROCESS;
TYPE列表示SQL应用进程信息,其值有:
- COORDINATOR
- READER
- BUILDER
- PREPARER
- ANALYZER
- APPLIER
显示SQL应用的大致状态(V$LOGSTDBY_STATE)
STATE列的状态信息
-
INITIALIZING 初始化状态:
ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;
执行后,启用SQL应用时,首先进入初始状态,该状态存在时间非常短暂。SELECT SESSION_ID, STATE FROM V$LOGSTDBY_STATE;
-
WAITING FOR DICTIONARY LOGS等待数据字典日志:指第一次初始化时的状态,如刚从物理Standby转换成逻辑Standby,需要首先应用来自Primary端生成的数据字典,在等待Primary数据字典信息时,就会处于这一状态
-
LOADING DICTIONARY 加载并分析:说明处于加载数据字典的状态
-
查询V$lOGMNR_DICTIONARY_LOAD视图获取关于加载的更详细的信息
SELECT PERCENT_DONE, COMMAND FROM V$LOGMNR_DICTIONARY_LOAD WHERE SESSION_ID =(SELECT SESSION_ID FROM V$LOGSTDBY_STATE);
-
-
WAITING ON GAP中断等待状态:SQL应用挖掘并应用了所有可用的REDO数据,正等待新的日志文件,也有可能是由于归档文件有中断造成的。
- 如果查询
V$LOGSTDBY_STATE
视图时发现处于这一状态,应该同时查询V$ARCHIVE_GAP
视图,检查是否有中断的归档。
- 如果查询
-
IDLE空闲状态:处于这一状态也有可能不是好现象,一方面可能是逻辑Standby处理能力优秀,所有活都干完了;也可能是Primary数据库发送日志或逻辑Standby日志出现了问题,导致SQL应用无活可干,因此处于空闲状态。
- 如果你发现你的逻辑Standby数据库长期处于这一状态,建议查询DBA_LOGSTDBY_LOG视图,确认Primary端产生的日志文件能被逻辑Standby数据库正常接收
-
SQL APPLY NOT ON:说明逻辑Standby数据库根本没启动SQL应用
管理
物理备库(Physical Standby)
重启进程
primary端(LNS进程)
alter system set log_archive_dest_state_2='defer' scope=both;
alter system set log_archive_dest_state_2='enable' scope=both;
standby端(MRP进程)
alter database recover managed standby database cancel;
-- 11g
alter database recover managed standby database using current logfile disconnect from session;
-- 12c 默认实时应用
alter database recover managed standby database disconnect from session;
归档日志管理
归档删除策略
CONFIGURE ARCHIVELOG DELETION POLICY TO NONE; # 默认归档删除策略
- NONE:表示不启用归档文件的删除策略,默认值
- APPLIED ON STANDBY:表示强制检查待删除的log 是否已经在备库apply,只有apply后的log才能删除
- 当通过附加的 DELETE INPUT 子句删除Standby数据库仍需要的日志时,会提示RMAN-08137错误而无法删除。 不过仍然可以手动地通过 DELETE ARCHIVELOG 方式删除,但由于网络问题导致没有传给standby时,DELETE ARCHIVELOG 方式也无法删除。
- SHIPPED TO ALL STANDBY:当归档传送到备库就可以删除
备库配置归档删除策略
rman target / <<EOF
CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON STANDBY;
EOF
备库上rman配置:当应用完的日志自动删除
删除归档日志
归档文件已不存在(即更新控制文件中记录信息)
rman target / <<EOF
crosscheck archivelog all;
delete noprompt expired archivelog all;
EOF
删除超过恢复策略的备份文件
delete noprompt obsolete; #不仅仅删除过期的备份,相关的归档也会删除
删除n天前的归档
delete noprompt archivelog all completed before 'sysdate-N';
使用FRA(fast recovery area)快速恢复区存储归档
在 Oracle 11.2 及以后版本中,当 SPACE_USED 达到 SPACE_LIMIT 的 80% 时,Oracle 将开始清除 FRA 中的文件。使用FRA可以方便管理维护归档日志。
查看FRA空间使用情况
-- 10g
select * from V$FLASH_RECOVERY_AREA_USAGE;
-- 11g+
select * from V$RECOVERY_AREA_USAGE;
SELECT substr(name, 1, 30) name
, space_limit/(1073741824) AS Quota_GB
, space_used/(1073741824) AS Used_GB
, space_reclaimable/(1073741824) AS Reclaimable_GB
, number_of_files AS files
FROM V$RECOVERY_FILE_DEST ;
调整FRA中归档日志删除占比
通过事件来设置,事件号为: 19823 。如下图
-
创建测试数据
create table t1 ( id number, name char(1000)) tablespace TBS_USERS; insert into t1 values (1,'a'); begin for i in 1..300000 loop update t1 set name=to_char(i); end loop; commit; end; / -- 监视产生的日志量 select * from v$sesstat where sid=155 and statistic#=178;
-
调整FRA的使用空间限制占比
-- 将默认值80%,改成95% alter system set event='19823 trace name context forever,level 95' scope=spfile sid='*'; -- 调整为50% alter system set event='19823 trace name context forever,level 50' scope=spfile sid='*';
dgmgrl工具
Oracle Data Guard command-line interface (DGMGRL) :用来管理维护Data Guard命令行工具。
语法
逻辑备库(Logical Standby)
指定对象跳过应用
在默认情况下,接收自Primary的REDO数据中,所有能够被逻辑Standby数据库支持的操作都会在逻辑Standby端执行。如果你希望跳过对某些对象的某些操作的话,DBMS_LOGSTDBY.SKIP就能派上用场了
DBMS_LOGSTDBY.SKIP的语法
DBMS_LOGSTDBY.SKIP (
stmt IN VARCHAR2,
schema_name IN VARCHAR2 DEFAULT NULL,
object_name IN VARCHAR2 DEFAULT NULL,
proc_name IN VARCHAR2 DEFAULT NULL,
use_like IN BOOLEAN DEFAULT TRUE,
esc IN CHAR1 DEFAULT NULL);
跳过SCOTT用户下对dept表的DML操作
-- 需要先停止REDO应用
ALTER DATABASE STOP LOGICAL STANDBY APPLY;
-- 跳过应用
EXEC DBMS_LOGSTDBY.SKIP('DML', 'SCOTT', 'DEPT');
-- 启动redo应用
ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;
恢复对象同步
如果逻辑Standby中的某些表取消了与Primary的同步维护,想再恢复同步,可以通过DBMS_LOGSTDBY.UNSKIP实现。
DBMS_LOGSTDBY.UNSKIP的语法
DBMS_LOGSTDBY.UNSKIP (
stmt IN VARCHAR2,
schema_name IN VARCHAR2,
object_name IN VARCHAR2);
查看当前逻辑Standby都有哪些对象处于不同步状态
select * from dba_logstdby_skip;
恢复同步
-- 停止当前的SQL应用状态
ALTER DATABASE STOP LOGICAL STANDBY APPLY;
--恢复前面停止的scott.tmp1表的应用
execute dbms_logstdby.unskip('DML', 'SCOTT', 'DEPT');
-- 启动redo应用
ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;
添加或重建对象
指定对象跳过应用虽然被取消,但是有可能在此期间由于Primary数据库做过数据修改,两端此时已经不同步,如果Standby端继续应用极有可能导致应用错误的数据
对于这类情况,Oracle也早有预见,DBMS_LOGSTDBY包中还有一个过程叫INSTANTIATE_TABLE,专门用来同步一下跳过的对象,以保持与Primary数据库的一致。
DBMS_LOGSTDBY.INSTANTIATE_TABLE的调用语法
DBMS_LOGSTDBY.INSTANTIATE_TABLE (
schema_name IN VARCHAR2,
table_name IN VARCHAR2,
dblink IN VARCHAR2);
需先创建dblink
-- 在逻辑Standby端创建一个连接Primary数据库dblink
CREATE DATABASE LINK PRE_TBL_DATA CONNECT TO SYSTEM IDENTIFIED BY oracle
USING 'ORCL_PD';
重建对象
-- 停止当前的SQL应用状态
ALTER DATABASE STOP LOGICAL STANDBY APPLY;
EXEC DBMS_LOGSTDBY.INSTANTIATE_TABLE('SCOTT', 'DEPT', 'PRE_TBL_DATA');
-- 启动redo应用
ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;
仅在逻辑备库修改数据
应用Redo数据到Standby端
逻辑Standby应用REDO数据
SQL应用的原理是将接收到的REDO数据转换成SQL语句在逻辑Standby数据库端执行,因此逻辑Standby需要启动至OPEN状态
启动SQL应用
ALTER DATABASE START LOGICAL STANDBY APPLY;
启动实时应用
ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;
停止SQL应用
ALTER DATABASE STOP LOGICAL STANDBY APPLY;
需要等待当前执行的SQL触发的事务结束,才能真正停止REDO应用的状态
若不考虑事务执行情况,马上停止REDO应用
ALTER DATABASE ABORT LOGICAL STANDBY APPLY;
附录
参考文档
Files being deleted in the flash recovery area, messages in the alert log Deleted Oracle managed file filename (Doc ID 1369341.1)