OGG Study Notes 03: Simple Troubleshooting for One-Way Replication

Environment: see OGG Study Notes 02: One-Way Replication Configuration Example.

Goal: understand the basic approach to handling simple OGG failures.

1. Symptoms

Symptom: the extract process and the data pump process on the OGG source were started, but some time later both were found to have terminated.

GGSCI (oradb30) 1> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     ABENDED     LPJY1       00:00:00      47:39:54
EXTRACT     ABENDED     LXJY1       00:00:00      47:40:00

GGSCI (oradb30) 2> start extract lxjy1

Sending START request to MANAGER ...
EXTRACT LXJY1 starting

GGSCI (oradb30) 3> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     ABENDED     LPJY1       00:00:00      47:40:50
EXTRACT     RUNNING     LXJY1       00:00:00      47:40:55

GGSCI (oradb30) 4> start extract lpjy1

Sending START request to MANAGER ...
EXTRACT LPJY1 starting

GGSCI (oradb30) 5> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     LPJY1       00:00:00      47:40:58
EXTRACT     RUNNING     LXJY1       00:00:00      47:41:04

GGSCI (oradb30) 6> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     ABENDED     LPJY1       00:00:00      47:41:15
EXTRACT     RUNNING     LXJY1       00:00:00      47:41:21

GGSCI (oradb30) 7> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     ABENDED     LPJY1       00:00:00      47:41:19
EXTRACT     RUNNING     LXJY1       00:00:00      47:41:25

GGSCI (oradb30) 8> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     ABENDED     LPJY1       00:00:00      47:41:41
EXTRACT     ABENDED     LXJY1       00:00:00      47:41:47
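
Watching for the ABENDED status by eye, as in the session above, can be automated. The following is a minimal sketch, not part of the original setup: the function name list_abended is hypothetical, and it assumes only that the data rows of `info all` carry the status in the second column and the group name in the third.

```shell
#!/usr/bin/env bash
# Sketch: print "<program> <group>" for every ABENDED process found in
# `info all` output read from stdin. list_abended is a hypothetical helper.
list_abended() {
    # A data row looks like: EXTRACT  ABENDED  LPJY1  00:00:00  47:39:54
    awk '$2 == "ABENDED" { print $1, $3 }'
}

# Possible usage against a live installation (assumes GG_HOME points at the
# OGG home and that ggsci accepts commands on stdin):
#   echo "info all" | "$GG_HOME/ggsci" | list_abended
```

An empty result means no process is currently abended, so the function is easy to wire into a cron job or a monitoring check.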

2. Checking the Logs

Check the OGG log file ggserr.log to find out why the processes terminated.

[ogg@oradb30 ogg]$ cd $GG_HOME

[ogg@oradb30 ogg]$ tail -200f ggserr.log

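The log shows that the data pump process lpjy1 abended because it could not connect to the target OGG, while the extract process lxjy1 abended because it could not find the archived log for sequence 160, thread 1.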

2017-01-19 14:51:46  INFO    OGG-00993  Oracle GoldenGate Capture for Oracle, lpjy1.prm:  EXTRACT LPJY1 started.
2017-01-19 14:51:49 ERROR OGG-01224 Oracle GoldenGate Capture for Oracle, lpjy1.prm: TCP/IP error 113 (No route to host).
2017-01-19 14:51:49 ERROR OGG-01668 Oracle GoldenGate Capture for Oracle, lpjy1.prm: PROCESS ABENDING.
2017-01-19 14:52:28 ERROR OGG-00446 Oracle GoldenGate Capture for Oracle, lxjy1.prm: Could not find archived log for sequence 160 thread 1 under default destinations SQL <SELECT name FROM v$archived_log WHERE sequence# = :ora_seq_no AND thread# = :ora_thread AND resetlogs_id = :ora_resetlog_id AND archived = 'YES' AND deleted = 'NO' AND name not like '+%' AND standby_dest = 'NO' >, error retrieving redo file name for sequence 160, archived = 1, use_alternate = 0Not able to establish initial position for sequence 160, rba 7758352.
2017-01-19 14:52:28 ERROR OGG-01668 Oracle GoldenGate Capture for Oracle, lxjy1.prm: PROCESS ABENDING.
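
On a busy system the ERROR entries are easy to lose among the INFO lines. The following is a small sketch for pulling them out of ggserr.log; the function name summarize_errors is hypothetical, and it assumes the field layout of the excerpt above (date, time, severity, message code, then free text that contains the parameter file name followed by a colon).

```shell
#!/usr/bin/env bash
# Sketch: print "<date> <time> <code> <param file>" for each ERROR entry
# in a ggserr.log-style file. summarize_errors is a hypothetical helper.
summarize_errors() {
    awk '$3 == "ERROR" {
        prm = ""
        # The parameter file appears as a field such as "lpjy1.prm:".
        for (i = 5; i <= NF; i++) {
            if ($i ~ /\.prm:$/) { prm = $i; sub(/:$/, "", prm); break }
        }
        print $1, $2, $4, prm
    }' "$1"
}

# Possible usage (assumes GG_HOME is set as in the commands above):
#   summarize_errors "$GG_HOME/ggserr.log"
```

Grouping the result by parameter file makes it clear at a glance that lpjy1 and lxjy1 failed for different reasons.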

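Further investigation showed that the archived logs had been deleted after the RMAN backup policy finished backing them up. Since a backup exists, the next step is simply to restore sequence 160 and all later logs from the backup set.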

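This also shows why the source database should run in archivelog mode when OGG is configured: otherwise, in a case like this one where the changes were not captured from the source online redo logs in time, there would be no way to resume replication.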

3. Resolving the Problem

For the lxjy1 process (Extract), restore the archived logs from sequence 160 onward from the RMAN backup set:

$ rman target /
RMAN> restore archivelog from logseq 160;

Then start the lxjy1 process again.
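
The sequence and thread numbers for the restore can be read straight from the OGG-00446 message instead of being copied by hand. The following is a sketch under assumptions: the function name suggest_restore is hypothetical, the `thread` clause is an addition to the command used in this note, and the function only prints RMAN commands without running them.

```shell
#!/usr/bin/env bash
# Sketch: read ggserr.log-style text on stdin and print one RMAN restore
# command per distinct "Could not find archived log" message.
# suggest_restore is a hypothetical helper; it does not invoke RMAN.
suggest_restore() {
    sed -n 's/.*Could not find archived log for sequence \([0-9][0-9]*\) thread \([0-9][0-9]*\).*/restore archivelog from logseq \1 thread \2;/p' | sort -u
}

# Possible usage (assumes GG_HOME is set); review the output before
# pasting it into an RMAN session:
#   suggest_restore < "$GG_HOME/ggserr.log"
```

Printing the command rather than executing it keeps the restore a deliberate, reviewed step.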

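For the lpjy1 process (Data Pump), confirm that the host running the target OGG is up and reachable over the network, then start the target database and the target OGG, including its mgr and replicat processes. Once the target is reachable, lpjy1 can be started again on the source.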
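
Before restarting the data pump, it can help to confirm that the target manager port actually accepts connections, since "No route to host" points at the network rather than at OGG itself. The following is a minimal sketch: the function name can_connect is hypothetical, it relies on bash's /dev/tcp redirection and the coreutils `timeout` command, and port 7809 in the usage line is an assumption (the actual value is whatever PORT is set to in the target mgr parameter file).

```shell
#!/usr/bin/env bash
# Sketch: return 0 if a TCP connection to host:port succeeds within
# 5 seconds, non-zero otherwise. can_connect is a hypothetical helper.
can_connect() {
    local host=$1 port=$2
    # The inner bash receives host as $0 and port as $1.
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Possible usage from the source host (7809 is an assumed manager port):
#   can_connect oradb31 7809 && echo "target manager reachable"
```

A failure here means the host, the firewall, or the target mgr process still needs attention before lpjy1 has any chance of staying up.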

Finally, confirm that all OGG processes on both the source and the target are RUNNING:

Source OGG:

GGSCI (oradb30) 1> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     LPJY1       00:00:00      00:00:03
EXTRACT     RUNNING     LXJY1       00:00:00      00:00:00

Target OGG:

GGSCI (oradb31) 1> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
REPLICAT    RUNNING     RJY1        00:00:00      00:00:01
