[20140422]使用dgmgrl管理dataguard(12)

[20140422]使用dgmgrl管理dataguard(12).txt

参考链接:
http://blog.itpub.net/267265/viewspace-1142649/
http://blog.itpub.net/267265/viewspace-1143027/
http://blog.itpub.net/267265/viewspace-1143058/
http://blog.itpub.net/267265/viewspace-1143126/
http://blog.itpub.net/267265/viewspace-1143480/
http://blog.itpub.net/267265/viewspace-1144742/
http://blog.itpub.net/267265/viewspace-1145573/
http://blog.itpub.net/267265/viewspace-1145697/
http://blog.itpub.net/267265/viewspace-1145727/
http://blog.itpub.net/267265/viewspace-1146558/
http://blog.itpub.net/267265/viewspace-1146575/

-- 我的测试环境:primary的tnsnames别名test,standby主机的tnsnames别名testdg。数据库版本:11GR2.
-- 昨天测试了FSFO,但是我遇到1个问题,如何触发failover,正常shutdown immediate测试无法触发FSFO切换.

--晚上看了一些文档,google一些blog,链接如下:
-- http://www.databasejournal.com/features/oracle/article.php/3849106/Fast-Start-Failover-in-Oracle-11g-Data-Guard.htm
-- 摘要如下:
Table 7-1. Fast-Start Failover: Triggering Events

Event Class                        Triggering Event
------------------------------------------------------------------------------------------------------------------
Connectivity Loss                  Network connectivity is lost simultaneously between the primary database and:
                                       the FSFO itself; and
                                       the standby database designated as the FSF target
                                   and the connectivity time lost exceeds the FSF threshold

Database Health Check Failure      A database health check detects any of the following optionally-configured failures:
                                       Any datafile has gone offline due to a write error
                                       A critical database object has dictionary corruption
                                       A control file is permanently destroyed because a disk has failed
                                       Log Writer (LGWR) cannot write to any member of a log group due to an I/O failure
                                       Archiver (ARCn) cannot archive a redo log because the destination is full or unavailable

Instance Crash (Single Instance)     The primary database's instance has crashed

Instances Crash (RAC)                All instances for a RAC primary database have crashed

Shutdown Abort on Primary            The primary database is shut down with the SHUTDOWN ABORT command
------------------------------------------------------------------------------------------------------------------
--shutdown abort 才会触发.


Table 7-3. Fast-Start Failover: Advanced Failover Triggers

Trigger                         Active By Default?   Explanation
-----------------------------------------------------------------------------------------------------------------------
Datafile Offline                Yes                  Tells FSFO to perform a failover whenever a datafile is offline on
                                                     the primary database

Corrupted Controlfile           Yes                  Tells FSFO to perform a failover whenever a corrupted control file
                                                     is detected on the primary database

Corrupted Dictionary            Yes                  Requests FSFO to perform a failover whenever corruption is detected
                                                     within the primary database's data dictionary

Inaccessible Logfile            No                   Forces FSFO to perform a failover whenever any one online redo log
                                                     member cannot be accessed on the primary database

Stuck Archiver                  No                   Tells FSFO to perform a failover whenever archived redo logging
                                                     cannot proceed on the primary database because there is no more
                                                     space to write out archived redo logs on that server
-----------------------------------------------------------------------------------------------------------------------
--还有一些高级的Failover Triggers.

--看看我的fast_start failover配置:

DGMGRL> show fast_start failover
Fast-Start Failover: ENABLED
  Threshold:        30 seconds
  Target:           testdg
  Observer:         testdb
  Lag Limit:        30 seconds
  Shutdown Primary: TRUE
  Auto-reinstate:   TRUE

Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile            NO
    Stuck Archiver                  NO
    Datafile Offline               YES

  Oracle Error Conditions:
    (none)


--如果想打开Stuck Archiver 触发条件.执行如下:
DGMGRL> enable fast_start failover Condition "Stuck Archiver"
Succeeded.

DGMGRL> show fast_start failover

Fast-Start Failover: ENABLED

  Threshold:        30 seconds
  Target:           testdg
  Observer:         testdb
  Lag Limit:        30 seconds
  Shutdown Primary: TRUE
  Auto-reinstate:   TRUE

Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile            NO
    Stuck Archiver                 YES
    Datafile Offline               YES

  Oracle Error Conditions:
    (none)

DGMGRL> disable fast_start failover Condition "Stuck Archiver"
Succeeded.

--文档中还提到,I could configure FSFO to initiate automatic failover whenever the ORA-55623 error is detected on the
--primary database with thefollowing command:
$ oerr ora 55623
55623, 00000, "Flashback Archive \"%s\" is blocking and tracking on all tables is suspended"
// *Cause: Flashback archive tablespace has run out of space.
// *Action: Add tablespace or increase tablespace quota for the flashback archive.
//

DGMGRL> ENABLE FAST_START FAILOVER CONDITION 55623;
Succeeded.
DGMGRL> show fast_start failover

Fast-Start Failover: ENABLED

  Threshold:        30 seconds
  Target:           testdg
  Observer:         testdb
  Lag Limit:        30 seconds
  Shutdown Primary: TRUE
  Auto-reinstate:   TRUE

Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile            NO
    Stuck Archiver                  NO
    Datafile Offline               YES

  Oracle Error Conditions:
    ORA-55623: Flashback Archive "%s" is blocking and tracking on all tables is suspended

--如果Flashback archive磁盘空间满的话,会触发failover.自己做一个flashrecovery满的情况看看.

SYS@test> show parameter recover

NAME                         TYPE         VALUE
---------------------------- ------------ ---------------------------------------
db_recovery_file_dest        string       /u01/app/oracle11g/flash_recovery_area
db_recovery_file_dest_size   big integer  20G

SYS@test> alter system set db_recovery_file_dest_size=200M;
System altered.

RMAN> backup datafile 1;

Starting backup at 2014-04-23 09:13:01
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=12 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=/u01/app/oracle11g/oradata/test/system01.dbf
channel ORA_DISK_1: starting piece 1 at 2014-04-23 09:13:07
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 04/23/2014 09:13:10
ORA-19809: limit exceeded for recovery files
ORA-19804: cannot reclaim 81920 bytes disk space from 209715200 limit
continuing other job steps, job failed will not be re-run
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
including current SPFILE in backup set
channel ORA_DISK_1: starting piece 1 at 2014-04-23 09:13:13
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 04/23/2014 09:13:14
ORA-19809: limit exceeded for recovery files
ORA-19804: cannot reclaim 20201472 bytes disk space from 209715200 limit


--可以发现出现的错误是ORA-19809: limit exceeded for recovery files.加入如下触发条件.

DGMGRL> ENABLE FAST_START FAILOVER CONDITION 19809;
Succeeded.
DGMGRL> show fast_start failover

Fast-Start Failover: ENABLED

  Threshold:        30 seconds
  Target:           testdg
  Observer:         testdb
  Lag Limit:        30 seconds
  Shutdown Primary: TRUE
  Auto-reinstate:   TRUE

Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile            NO
    Stuck Archiver                  NO
    Datafile Offline               YES

  Oracle Error Conditions:
    ORA-55623: Flashback Archive "%s" is blocking and tracking on all tables is suspended
    ORA-19809: limit exceeded for recovery files

--再次执行backup datafile 1;


DGMGRL> show fast_start failover
ORA-03113: end-of-file on communication channel
Process ID: 28425
Session ID: 209 Serial number: 45

Configuration details cannot be determined by DGMGRL

--可以发现已经触发,查看observer窗口:

09:17:45.93  Wednesday, April 23, 2014
Initiating Fast-Start Failover to database "testdg"...
Performing failover NOW, please wait...
Failover succeeded, new primary is "testdg"
09:17:53.59  Wednesday, April 23, 2014

DGMGRL> connect sys/btbtms@testdg
Connected.
DGMGRL> show fast_start failover

Fast-Start Failover: ENABLED

  Threshold:        30 seconds
  Target:           test
  Observer:         testdb
  Lag Limit:        30 seconds
  Shutdown Primary: TRUE
  Auto-reinstate:   TRUE

Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile            NO
    Stuck Archiver                  NO
    Datafile Offline               YES

  Oracle Error Conditions:
    ORA-55623: Flashback Archive "%s" is blocking and tracking on all tables is suspended
    ORA-19809: limit exceeded for recovery files

DGMGRL> show configuration
Configuration - study
  Protection Mode: MaxPerformance
  Databases:
    testdg - Primary database
      Warning: ORA-16829: fast-start failover configuration is lagging
    test   - (*) Physical standby database (disabled)
      ORA-16661: the standby database needs to be reinstated

Fast-Start Failover: ENABLED

Configuration Status:
WARNING

--可以发现failover已经到testdg. 还原现场:
SYS@test> startup mount
ORACLE instance started.
Total System Global Area 1603411968 bytes
Fixed Size                  2228784 bytes
Variable Size            1006636496 bytes
Database Buffers          587202560 bytes
Redo Buffers                7344128 bytes
Database mounted.

SYS@test> alter system set db_recovery_file_dest_size=20G;
System altered.

DGMGRL> show configuration
Configuration - study
  Protection Mode: MaxPerformance
  Databases:
    testdg - Primary database
      Warning: ORA-16829: fast-start failover configuration is lagging
    test   - (*) Physical standby database (disabled)
      ORA-16661: the standby database needs to be reinstated
Fast-Start Failover: ENABLED
Configuration Status:
WARNING

DGMGRL> reinstate database test;
Reinstating database "test", please wait...
Operation requires shutdown of instance "test" on database "test"
Shutting down instance "test"...
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance "test" on database "test"
Starting instance "test"...
ORACLE instance started.
Database mounted.
Continuing to reinstate database "test" ...
Reinstatement of database "test" succeeded


DGMGRL> DGMGRL> show configuration
Configuration - study
  Protection Mode: MaxPerformance
  Databases:
    testdg - Primary database
    test   - (*) Physical standby database
Fast-Start Failover: ENABLED
Configuration Status:
SUCCESS

DGMGRL> switchover to test;
Performing switchover NOW, please wait...
New primary database "test" is opening...
Operation requires shutdown of instance "testdg" on database "testdg"
Shutting down instance "testdg"...
ORACLE instance shut down.
Operation requires startup of instance "testdg" on database "testdg"
Starting instance "testdg"...
ORACLE instance started.
Database mounted.
Database opened.
Switchover succeeded, new primary is "test"

DGMGRL> disable fast_start failover Condition 55623
Succeeded.
DGMGRL> disable fast_start failover Condition 19809
Succeeded.

总结:
FSFO确实很好东西,如果做的好,能很好提高HA,减少待机时间,问题是我总觉得有多少人敢在生产系统使用.测试学习还行.
在生产系统使用自己总有点担心^_^.

上一篇:openstack 管理二十五 - rpm 方式部署 openstack(架构说明)


下一篇:wordpress 添加自定义菜单到管理面板(wp-admin)