跨NAT,防火墙(firewall)的RAC监听配置(ORA-12545)

    对于存在NAT或防火墙的RAC数据库,在启用了服务器端的load balance后,经常会碰到ORA-12545连接错误,这是因为服务器端转发客户端连接请求到其它节点后,客户端使用返回的IP再次发出连接请求而出现不可识别的IP地址或主机名而造成的。本文描述了这个问题并给出了解决方案。

  有关RAC监听配置请参考
    ORACLE RAC 监听配置 (listener.ora tnsnames.ora)
    ORACLE RAC 下非缺省端口监听配置(listener.ora tnsnames.ora)

 

一、NAT下监听描述图

    跨NAT,防火墙(firewall)的RAC监听配置(ORA-12545)


二、配置情况

1、服务器端的配置情况
 ################Server node 1 listener.ora########################
 LISTENER_VMDB01P =
   (DESCRIPTION_LIST =
     (DESCRIPTION =
       (ADDRESS = (PROTOCOL = TCP)(HOST = vmdb01pvip)(PORT = 1314)(IP = FIRST))
       (ADDRESS = (PROTOCOL = TCP)(HOST = vmdb01p)(PORT = 1314)(IP = FIRST))
     )
   )
 
 SID_LIST_LISTENER_VMDB01P =
   (SID_LIST =
     (SID_DESC =
       (SID_NAME = PLSExtProc)
       (ORACLE_HOME = /u01/oracle/db)
       (PROGRAM = extproc)
     )
   )
 
 ################Server node 2 listener.ora########################
 LISTENER_VMDB02P =
   (DESCRIPTION_LIST =
     (DESCRIPTION =
       (ADDRESS = (PROTOCOL = TCP)(HOST = vmdb02pvip)(PORT = 1314)(IP = FIRST))
       (ADDRESS = (PROTOCOL = TCP)(HOST = vmdb02p)(PORT = 1314)(IP = FIRST))
     )
   )
 
 SID_LIST_LISTENER_VMDB02P =
   (SID_LIST =
     (SID_DESC =
       (SID_NAME = PLSExtProc)
       (ORACLE_HOME = /u01/oracle/db)
       (PROGRAM = extproc)
     )
   )
 
 ################Server side tnsnames.ora, same for both nodes########################
 remote_lsnr_lm5330 =
   (ADDRESS_LIST =
     (ADDRESS = (PROTOCOL = TCP)(HOST = vmdb01pvip)(PORT = 1314))
     (ADDRESS = (PROTOCOL = TCP)(HOST = vmdb02pvip)(PORT = 1314))
   )
 
 local_lsnr_lm5330a =
   (ADDRESS_LIST =
     (ADDRESS = (PROTOCOL = TCP)(HOST = vmdb01pvip)(PORT = 1314))
   )
 
 local_lsnr_lm5330b =
   (ADDRESS_LIST =
     (ADDRESS = (PROTOCOL = TCP)(HOST = vmdb02pvip)(PORT = 1314))
   )
 
 ################Server side host table, same for both nodes######################## 
 127.0.0.1          localhost
 10.200.48.17    vmdb01p.oradb.com   vmdb01p
 10.200.48.18    vmdb02p.oradb.com   vmdb02p
 10.200.48.15    vmdb01pvip.oradb.com        vmdb01pvip
 10.200.48.16    vmdb02pvip.oradb.com        vmdb02pvip
 192.168.48.17   vmdb01pph.oradb.com vmdb01pph
 192.168.48.18   vmdb02pph.oradb.com vmdb02pph
 
 SQL> show parameter instance_name
 
 NAME                                 TYPE                              VALUE
 ------------------------------------ ------------------- ------------------
 instance_name                        string              LM5330A
 SQL> show parameter listener
 
 NAME                                 TYPE                VALUE
 ------------------------------------ ------------------- -------------------
 local_listener                       string              local_lsnr_lm5330a
 remote_listener                      string              remote_lsnr_lm5330
 
 SQL> show parameter instance_name
 
 NAME                                 TYPE                VALUE
 ------------------------------------ ------------------- ------------------
 instance_name                        string              LM5330B
 SQL> show parameter listener
 
 NAME                                 TYPE                VALUE
 ------------------------------------ ------------------- ------------------
 local_listener                       string              local_lsnr_lm5330b
 remote_listener                      string              remote_lsnr_lm5330

2、客户端tnsnames.ora配置 
 #For NAT client 
 LM5330TKO =
   (DESCRIPTION =
     (ADDRESS = (PROTOCOL = TCP)(HOST = 10.255.48.15)(PORT = 1314))
     (ADDRESS = (PROTOCOL = TCP)(HOST = 10.255.48.16)(PORT = 1314))
     (LOAD_BALANCE = yes)
     (CONNECT_DATA =
       (SERVER = DEDICATED)
       (SERVICE_NAME = LM5330)
        (FAILOVER_MODE =                    
         (TYPE = SELECT)                                                           
         (METHOD = BASIC)                            
         (RETRIES = 180)                                                           
         (DELAY = 5) 
          ) 
     )
   )
 
 #For non NAT client
 LM5330VIP =
   (DESCRIPTION =
     (ADDRESS = (PROTOCOL = TCP)(HOST = 10.200.48.15)(PORT = 1314))
     (ADDRESS = (PROTOCOL = TCP)(HOST = 10.200.48.16)(PORT = 1314))
     (LOAD_BALANCE = yes)
     (CONNECT_DATA =
       (SERVER = DEDICATED)
       (SERVICE_NAME = LM5330)
        (FAILOVER_MODE =                    
         (TYPE = SELECT)                                                           
         (METHOD = BASIC)                            
         (RETRIES = 180)                                                           
         (DELAY = 5) 
          ) 
     )
   )

3、客户端hosts文件配置
 c:\windows\system32\drivers\etc\hosts
 10.200.48.15    vmdb01pvip.oradb.com        vmdb01pvip
 10.200.48.16    vmdb02pvip.oradb.com        vmdb02pvip


三、ORA-12545错误信息 

C:\Users\Administrator>sqlplus system/system@lm5330tko
 
 SQL*Plus: Release 10.2.0.3.0 - Production on 星期二 11月 27 16:01:23 2012
 
 Copyright (c) 1982, 2006, Oracle.  All Rights Reserved.
 
 ERROR:
 ORA-12545: Connect failed because target host or object does not exist
 
 对于客户端发起的连接请求,有时候可以成功建立连接,有时候无法建立连接并伴随给出ORA-12545错误
 
 oracle@bo2dbp:~> oerr ora 12545
 12545, 00000, "Connect failed because target host or object does not exist"
 // *Cause: The address specified is not valid, or the program being 
 // connected to does not exist.
 // *Action: Ensure the ADDRESS parameters have been entered correctly; the
 // most likely incorrect parameter is the node name.  Ensure that the 
 // executable for the server exists (perhaps "oracle" is missing.)
 // If the protocol is TCP/IP, edit the TNSNAMES.ORA file to change the
 // host name to a numeric IP address and try again.
 
 从ORA-12545错误描述来看,主要是由于解析的问题产生该类错误


四、分析与解决

1、启用sqlnet trace
 为客户端的sqlnet.ora文件添加下列项来启用sqlnet trace,trace后,建议移除这些选项以避免额外的网络开销
 Trace_level_client=16
 Trace_directory_client=<path_to_the_trace_directory> # use the full path to the trace directory
 Trace_unique_client=on
 Trace_timestamp_client=on
 Diag_adr_enabled=off
 
 #Author : Robinson
 #Blog : http://blog.csdn.net/robinson_0612
 
 下面是trace文件提供的信息,关于如何做 oracle net trace 请参考:http://blog.csdn.net/robinson_0612/article/details/8254720
 文件名:cli_10648_1.trc
 
 [27-11-2012 17:59:01:694] nsc2addr: (ADDRESS=(PROTOCOL=TCP)(HOST=vmdb02pvip)(PORT=1314))
 [27-11-2012 17:59:01:694] nttbnd2addr: entry
 [27-11-2012 17:59:01:694] snlinGetAddrInfo: entry
 [27-11-2012 17:59:01:694] snlinGetAddrInfo: Invalid IP address string vmdb02pvip
 [27-11-2012 17:59:01:694] snlinFreeAddrInfo: entry
 [27-11-2012 17:59:01:694] snlinFreeAddrInfo: exit
 [27-11-2012 17:59:01:694] snlinGetAddrInfo: exit
 [27-11-2012 17:59:01:694] nttbnd2addr: looking up IP addr for host: vmdb02pvip
 [27-11-2012 17:59:01:694] snlinGetAddrInfo: entry
 [27-11-2012 17:59:04:259] snlinGetAddrInfo: Name resolution failed for vmdb02pvip
 [27-11-2012 17:59:04:259] snlinFreeAddrInfo: entry
 [27-11-2012 17:59:04:259] snlinFreeAddrInfo: exit
 [27-11-2012 17:59:04:259] snlinGetAddrInfo: exit
 [27-11-2012 17:59:04:259] nttbnd2addr: *** hostname lookup failure! ***
 [27-11-2012 17:59:04:259] nttbnd2addr: exit
 [27-11-2012 17:59:04:259] nserror: entry
 [27-11-2012 17:59:04:259] nserror: nsres: id=0, op=77, ns=12545, ns2=12560; nt[0]=515, nt[1]=1004, nt[2]=0; 
    ora[0]=0, ora[1]=0, ora[2]=0
 [27-11-2012 17:59:04:259] nsc2addr: error exit

2、故障分析
 #分析上面的trace信息,trace信息中指出了无法解析主机vmdb02pvip
 #客户端显示的12545错误的发生正式由于这个vmdb02pvip无法解析。
 #那有时候可以连接成功是怎么一回事呢? 
 #当客户端发起连接请求,由于客户端配置了load balance,因此客户端会随机从ADDRESS列表中挑选一个IP进行连接。
 #假定当前选择从NAT IP 10.255.48.15发出连接请求,10.255.48.15会被自动映射到相应的virtual ip,即10.200.48.15。
 #当10.200.48.15上的监听器捕获到该请求,服务器端的load balance生效,pmon进程根据监听信息发现当前节点并不繁忙,直接建立连接。
 #上面的描述就是可以成功建立连接的情形。
 
 #当服务器的监听发现其余的节点空闲时,而当前节点繁忙,监听器将实现转发。
 #也就是说服务器端返回一个新的IP地址给客户端,说我比较忙,你连接到10.200.48.16吧。
 #于是客户端根据所提供的IP地址重新发起连接请求,而此时客户端到10.200.48.16根本不可达,即使将ADDRESS列表改为该IP。
 #所以上面的分析是产生ORA-12545的原因。

3、解决
 #下面通过tracert命令来检查节点是否可达

 C:\Users\Administrator>tracert vmdb02pvip
 
 Tracing route to vmdb02pvip [10.200.48.16]
 over a maximum of 30 hops:
 
   1     1 ms     1 ms     1 ms  192.168.9.254
   2    <1 ms    <1 ms    <1 ms  10.97.5.1
   3     *        *        *     Request timed out.   #超时,节点不可达
   4     *        *        *     Request timed out.
   5     *        *        *     Request timed out.
   6     *        *        *     Request timed out.

 #考虑将客户端hosts文件作下列更改,即将映射关系的IP直接改为NAT的IP    
 c:\windows\system32\drivers\etc\hosts
 #10.200.48.15    vmdb01pvip.oradb.com        vmdb01pvip
 #10.200.48.16    vmdb02pvip.oradb.com        vmdb02pvip
 
 10.255.48.15    vmdb01pvip.oradb.com        vmdb01pvip
 10.255.48.16    vmdb02pvip.oradb.com        vmdb02pvip

 #再次tracert
 C:\Users\Administrator>tracert vmdb02pvip
 
 Tracing route to v2048db02pvip [10.255.48.16]
 over a maximum of 30 hops:
 
   1    <1 ms     1 ms     1 ms  192.168.9.254
   2    <1 ms    <1 ms    <1 ms  10.97.5.1
   3     8 ms     5 ms     4 ms  10.100.34.4
   4     6 ms     4 ms     4 ms  vmdb02pvip [10.255.48.16]
   5     8 ms     4 ms     4 ms  vmdb02pvip [10.255.48.16]
 
 Trace complete.
 
 #再次建立连接时,不再出现ORA-12545 
 C:\Users\Administrator>netstat
 
 Active Connections
 
   Proto  Local Address          Foreign Address        State
   TCP    192.168.9.12:59541     vmdb01pvip:1314     ESTABLISHED
   TCP    192.168.9.12:59558     vmdb02pvip:1314     ESTABLISHED

 其次也可以通过禁用服务器端的load balance来避免该错误,当然此方法不被推荐。
  alter system set remote_listener='' sid='*';

 也可以通过配置CMAN(Oracle Connection Manager)来解决该问题    

 

五、更多参考

有关Oracle RAC请参考
     使用crs_setperm修改RAC资源的所有者及权限
     使用crs_profile管理RAC资源配置文件
     RAC 数据库的启动与关闭
     再说 Oracle RAC services
     Services in Oracle Database 10g
     Migrate datbase from single instance to Oracle RAC
     Oracle RAC 连接到指定实例
     Oracle RAC 负载均衡测试(结合服务器端与客户端)
     Oracle RAC 服务器端连接负载均衡(Load Balance)
     Oracle RAC 客户端连接负载均衡(Load Balance)
     ORACLE RAC 下非缺省端口监听配置(listener.ora tnsnames.ora)
     ORACLE RAC 监听配置 (listener.ora tnsnames.ora)
     配置 RAC 负载均衡与故障转移
     CRS-1006 , CRS-0215 故障一例 
     基于Linux (RHEL 5.5) 安装Oracle 10g RAC
     使用 runcluvfy 校验Oracle RAC安装环境

有关Oracle 网络配置相关基础以及概念性的问题请参考:
     配置非默认端口的动态服务注册
     配置sqlnet.ora限制IP访问Oracle
     Oracle 监听器日志配置与管理
     设置 Oracle 监听器密码(LISTENER)
     配置ORACLE 客户端连接到数据库

有关基于用户管理的备份和备份恢复的概念请参考
     Oracle 冷备份
     Oracle 热备份
     Oracle 备份恢复概念
     Oracle 实例恢复
     Oracle 基于用户管理恢复的处理
     SYSTEM 表空间管理及备份恢复
     SYSAUX表空间管理及恢复
     Oracle 基于备份控制文件的恢复(unsing backup controlfile)

有关RMAN的备份恢复与管理请参考
     RMAN 概述及其体系结构
     RMAN 配置、监控与管理
     RMAN 备份详解
     RMAN 还原与恢复
     RMAN catalog 的创建和使用
     基于catalog 创建RMAN存储脚本
     基于catalog 的RMAN 备份与恢复
     RMAN 备份路径困惑
     使用RMAN实现异机备份恢复(WIN平台)
     使用RMAN迁移文件系统数据库到ASM
     linux 下RMAN备份shell脚本
     使用RMAN迁移数据库到异机

有关ORACLE体系结构请参考
     Oracle 表空间与数据文件
     Oracle 密码文件
     Oracle 参数文件
     Oracle 联机重做日志文件(ONLINE LOG FILE)
     Oracle 控制文件(CONTROLFILE)
     Oracle 归档日志
     Oracle 回滚(ROLLBACK)和撤销(UNDO)
     Oracle 数据库实例启动关闭过程
     Oracle 10g SGA 的自动化管理
     Oracle 实例和Oracle数据库(Oracle体系结构) 

上一篇:影响架构决策的非功能性需求


下一篇:Django访问静态资源