[RAC] The CLUSTER_INTERCONNECTS Parameter
The CLUSTER_INTERCONNECTS parameter specifies a private network and influences the network interface selection for the Global Cache Service (GCS) and Global Enqueue Service (GES).
The parameter is mainly used to:
1. Override the default interconnect network.
2. Add bandwidth when a single network cannot meet the bandwidth requirements of the RAC database.
CLUSTER_INTERCONNECTS overrides the interconnect information stored in the cluster registry; specifically, it overrides:
1. The network classifications stored in the OCR, as shown by the oifcfg command.
2. The default interconnect chosen by Oracle.
The parameter defaults to null and can contain one or more IP addresses separated by colons.
CLUSTER_INTERCONNECTS is a static parameter; when changing it, you must set it for every instance:
alter system set cluster_interconnects = '192.168.100.2' scope=spfile sid = 'rac1';
alter system set cluster_interconnects = '192.168.100.3' scope=spfile sid = 'rac2';
Multiple network interfaces can be specified for the interconnect:
alter system set cluster_interconnects = '192.168.100.2:192.168.101.2' scope=spfile sid = 'rac1';
alter system set cluster_interconnects = '192.168.100.3:192.168.101.3' scope=spfile sid = 'rac2';
However, for overall RAC availability, using HAIP or OS-level NIC bonding is a better choice.
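As an illustration of the OS-level bonding alternative, a minimal active-backup configuration on RHEL/OEL might look like the sketch below; the device names, addresses, and bonding options are assumptions for illustration only, and HAIP needs no such manual setup:
/etc/sysconfig/network-scripts/ifcfg-bond0 (bonded private interface, hypothetical):
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.100.2
NETMASK=255.255.255.0
BONDING_OPTS="mode=active-backup miimon=100"
/etc/sysconfig/network-scripts/ifcfg-eth1 (one slave NIC, hypothetical; repeat for each member):
DEVICE=eth1
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes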
Set Cluster Interconnects in Oracle RAC
To set the cluster interconnects in RAC:
- Delete any cluster_interconnect references on the interfaces.
Before
host1$ oifcfg getif
ce0 192.168.1.0 global cluster_interconnect
ce4 10.0.102.0 global public
Delete cluster interconnects using oifcfg.
host1$ oifcfg delif -global ce0
After
host1$ oifcfg getif
ce4 10.0.102.0 global public
- The cluster_interconnects initialization parameter must then be set manually to override the default value taken from the OCR.
Before
SQL> select * from gv$cluster_interconnects;

   INST_ID NAME            IP_ADDRESS       IS_ SOURCE
---------- --------------- ---------------- --- -------------------------------
         1 ce0             192.168.1.50     NO  Oracle Cluster Repository
         2 ce0             192.168.1.51     NO  Oracle Cluster Repository
Update the initialization parameter in both the ASM and RAC database instances.
alter system set cluster_interconnects = '192.168.1.50' scope=spfile sid='RAC1';
alter system set cluster_interconnects = '192.168.1.51' scope=spfile sid='RAC2';
After
SQL> select * from gv$cluster_interconnects;

   INST_ID NAME            IP_ADDRESS       IS_ SOURCE
---------- --------------- ---------------- --- -------------------------------
         1 ce0             192.168.1.50     NO  cluster_interconnects parameter <== Source is changed
         2 ce0             192.168.1.51     NO  cluster_interconnects parameter <== Source is changed
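The statements above cover only the database instances. Since the procedure says to update both ASM and the database, a hedged sketch for the ASM side would be the following (the instance names +ASM1/+ASM2 are assumptions):
alter system set cluster_interconnects = '192.168.1.50' scope=spfile sid='+ASM1';
alter system set cluster_interconnects = '192.168.1.51' scope=spfile sid='+ASM2';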
A Detailed Look at the Oracle CLUSTER_INTERCONNECTS Parameter
This note attempts to clarify the cluster_interconnects parameter and the
platforms on which the implementation has been made. A brief explanation on
the workings of the parameter has also been presented in this note.
This is also one of the most frequently asked questions related to cluster and RAC
installations at most sites and forms part of the prerequisites as well.
ORACLE 9I RAC – Parameter CLUSTER_INTERCONNECTS
———————————————–
FREQUENTLY ASKED QUESTIONS
————————–
November 2002
CONTENTS
——–
1. What is the parameter CLUSTER_INTERCONNECTS for ?
2. Is the parameter CLUSTER_INTERCONNECTS available for all platforms ?
3. How is the Interconnect recognized on Linux ?
4. Where could I find more information on this parameter ?
5. How to detect which interconnect is used ?
6. Cluster_Interconnects is mentioned in the 9i RAC administration
guide as a Solaris specific parameter, is this the only platform
where this parameter is available ?
7. Are there any side effects for this parameter, namely affecting normal
operations ?
8. Is the parameter OPS_INTERCONNECTS which was available in 8i similar
to this parameter ?
9. Does Cluster_interconnect allow failover from one Interconnect to another
Interconnect ?
10. Is the size of messages limited on the Interconnect ?
11. How can you see which protocol is being used by the instances ?
12. Can the parameter CLUSTER_INTERCONNECTS be changed dynamically during runtime ?
QUESTIONS & ANSWERS
——————-
1. What is the parameter CLUSTER_INTERCONNECTS for ?
Answer
——
This parameter is used to influence the selection of the network interface
for Global Cache Service (GCS) and Global Enqueue Service (GES) processing.
This note does not compare the other elements of 8i OPS with 9i RAC
because of substantial differences in the behaviour of both architectures.
Oracle 9i RAC has certain optimizations which attempt to transfer most of
the information required via the interconnects so that the number of disk
reads is minimized. This behaviour, known as Cache Fusion phase 2, is summarised
in Note 139436.1.
The interconnect is defined as a private network that is used to transfer
cluster traffic, Oracle resource directory information, and blocks needed to
satisfy queries. The technical term for this is Cache Fusion.
CLUSTER_INTERCONNECTS should be used when
- you want to override the default network selection
- the bandwidth of a single interconnect does not meet the bandwidth requirements of
a Real Application Clusters database
The syntax of the parameter is:
CLUSTER_INTERCONNECTS = if1:if2:…:ifn
Where if is an IP address in standard dotted-decimal format, for example,
144.25.16.214. Subsequent platform implementations may specify interconnects
with different syntaxes.
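As a sketch, a pfile (init.ora) entry using this syntax could look as follows; the first address is the documented example and the second is a made-up placeholder:
cluster_interconnects = "144.25.16.214:144.25.17.214"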
2. Is the parameter CLUSTER_INTERCONNECTS available for all platforms ?
Answer
——
This parameter is configurable on most platforms.
Originally it could not be used on Linux (see question 3 for the release in which Linux support was added).
The following matrix shows the release in which the parameter was introduced on each platform:
Operating System Available since
AIX 9.2.0
HP/UX 9.0.1
HP Tru64 9.0.1
HP OpenVMS 9.0.1
Sun Solaris 9.0.1
References
———-
Bug <2119403> ORACLE9I RAC ADMINISTRATION SAYS CLUSTER_INTERCONNECTS IS SOLARIS ONLY.
Bug <2359300> ENHANCE CLUSTER_INTERCONNECTS TO WORK WITH 9I RAC ON IBM
3. How is the Interconnect recognized on Linux ?
Answer
——
Since Oracle9i 9.2.0.8, CLUSTER_INTERCONNECTS can be used to change the interconnect.
A patch is also available for 9.2.0.7 under Patch 4751660.
Before 9.2.0.8 the Oracle implementation for interface selection reads the 'private hostname'
in the cmcfg.ora file and uses the corresponding IP address for the interconnect.
If no private hostname is available, the public hostname is used.
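For illustration, a cmcfg.ora fragment of the kind the text refers to might look like the following; the node names are hypothetical and the exact key names depend on the oracm release, so treat this purely as a sketch:
PublicNodeNames=rac1 rac2
PrivateNodeNames=rac1-priv rac2-priv
HostName=rac1-priv
Here HostName would carry the private hostname of the local node, which is the value the interface selection described above resolves to an IP address.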
4. Where could I find information on this parameter ?
Answer
——
The parameter is documented in the following books:
Oracle9i Database Reference Release 2 (9.2)
Oracle9i Release 1 (9.0.1) New Features in Oracle9i Database Reference -
What’s New in Oracle9i Database Reference?
Oracle9i Real Application Clusters Administration Release 2 (9.2)
Oracle9i Real Application Clusters Deployment and Performance Release 2 (9.2)
Also port specific documentation may contain information about the usage of
the cluster_interconnects parameter.
Documentation can be viewed on
http://tahiti.oracle.com
http://otn.oracle.com/documentation/content.html
References:
———–
Note 162725.1: OPS/RAC VMS: Using alternate TCP Interconnects on 8i OPS
and 9i RAC on OpenVMS
Note 151051.1: Init.ora Parameter “CLUSTER_INTERCONNECTS” Reference Note
5. How to detect which interconnect is used ?
The following commands show which interconnect is used for UDP or TCP:
sqlplus> connect / as sysdba
oradebug setmypid
oradebug ipc
exit
The corresponding trace can be found in the user_dump_dest directory and for
example contains the following information in the last couple of lines:
SKGXPCTX: 0x32911a8 ctx
admno 0x12f7150d admport:
SSKGXPT 0x3291db8 flags SSKGXPT_READPENDING info for network 0
socket no 9 IP 172.16.193.1 UDP 43307
sflags SSKGXPT_WRITESSKGXPT_UP
info for network 1
socket no 0 IP 0.0.0.0 UDP 0
sflags SSKGXPT_DOWN
context timestamp 0x1ca5
no ports
Please note that on some platforms and versions (Oracle9i 9.2.0.1 on Windows)
you might see an ORA-70 when the command oradebug ipc has not been
implemented.
When other protocols such as LLT, HMP or RDG are used, then the trace file will not
reveal an IP address.
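A hedged sketch for locating that trace on the server; the trace file path is a placeholder that oradebug prints for you:
SQL> oradebug setmypid
SQL> oradebug ipc
SQL> oradebug tracefile_name          <<<< prints the full path of the trace file just written
$ grep -A3 "info for network" /path/printed/by/previous/step.trc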
6. Cluster_Interconnects is mentioned in the 9i RAC administration
guide as a Solaris specific parameter, is this the only platform
where this parameter is available ?
Answer
—–
The information that this parameter works on Solaris only is incorrect. Please
check the answer to question 2 for the complete list of platforms.
References:
———–
bug <2119403> ORACLE9I RAC ADMINISTRATION SAYS CLUSTER_INTERCONNECTS IS SOLARIS ONLY.
7. Are there any side effects for this parameter, namely affecting normal
operations ?
Answer
—–
When you set CLUSTER_INTERCONNECTS in cluster configurations, the
interconnect high availability features are not available. In other words,
an interconnect failure that is normally unnoticeable would instead cause
an Oracle cluster failure as Oracle still attempts to access the network
interface which has gone down. Using this parameter you are explicitly
specifying the interface or list of interfaces to be used.
8. Is the parameter OPS_INTERCONNECTS which was available in 8i similar
to this parameter ?
Answer
——
Yes, the parameter OPS_INTERCONNECTS was used to influence the network selection
for the Oracle 8i Parallel Server.
Reference
———
Note <120650.1> Init.ora Parameter “OPS_INTERCONNECTS” Reference Note
9. Does Cluster_interconnect allow failover from one Interconnect to another
Interconnect ?
Answer
——
Failover capability is not implemented at the Oracle level. In general this
functionality is delivered by the hardware and/or software of the operating system.
For platform details please see the Oracle platform-specific documentation
and the operating system documentation.
10. Is the size of messages limited on the Interconnect ?
Answer
——
The message size depends on the protocol and platform.
UDP: In Oracle9i Release 2 (9.2.0.1) the message size for UDP was limited to 32K.
Oracle9i 9.2.0.2 allows larger UDP message sizes, depending on the
platform. To increase throughput on an interconnect you have to adjust
the UDP kernel parameters (see the sysctl sketch after this list).
TCP: There is no need to set the message size for TCP.
RDG: The recommendations for RDG are documented in
Oracle9i Administrator’s Reference – Part No. A97297-01
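As an illustration only, on Linux the UDP socket buffer limits are typically raised through sysctl; the parameter names below are standard Linux kernel settings, but the values are placeholders, not tuning recommendations:
# excerpt of /etc/sysctl.conf -- raise UDP socket buffer limits (example values only)
net.core.rmem_default = 262144
net.core.rmem_max = 2097152
net.core.wmem_default = 262144
net.core.wmem_max = 2097152
# apply the new values without a reboot (run as root)
sysctl -p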
References
———-
Bug <2475236> RAC multiblock read performance issue using UDP IPC
11. How can you see which protocol is being used by the instances ?
Answer
——
Please see the alert file(s) of your RAC instances. During startup you’ll
find a message in the alert file that shows the protocol being used.
Wed Oct 30 05:28:55 2002
cluster interconnect IPC version:Oracle UDP/IP with Sun RSM disabled
IPC Vendor 1 proto 2 Version 1.0
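A quick way to pull that line out of an alert log; the path below assumes the classic background_dump_dest layout, and <dbname>/<sid> are placeholders:
$ grep -i "cluster interconnect IPC version" $ORACLE_BASE/admin/<dbname>/bdump/alert_<sid>.log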
12. Can the parameter CLUSTER_INTERCONNECTS be changed dynamically during runtime ?
Answer
——
No. CLUSTER_INTERCONNECTS is a static parameter and can only be set in the
spfile or pfile (init.ora).
An Experiment on How the CLUSTER_INTERCONNECTS Parameter Affects the Instances
In an Oracle RAC environment, Cache Fusion between RAC instances normally runs over the Clusterware private heartbeat network. Since 11.2.0.2 in particular, HAIP is widely used: it increases bandwidth (up to 4 heartbeat networks) while also giving the heartbeat network fault tolerance. For example, if a RAC node has 4 heartbeat networks, losing 3 of them at the same time will not bring down Oracle RAC or Clusterware.
However, when several databases are deployed in one RAC environment, the Cache Fusion traffic of the different database instances can interfere; some databases need more heartbeat bandwidth than others. To keep the heartbeat traffic of multiple databases in the same RAC environment from affecting each other, Oracle provides the cluster_interconnects parameter at the database level. It overrides the default heartbeat network so that the instance uses the specified network(s) for Cache Fusion, but it provides no fault tolerance. The experiment below demonstrates this.
Oracle RAC environment: 12.1.0.2.0 standard cluster on Oracle Linux 5.9 x64.
1. Network configuration.
Node 1:
[root@rhel1 ~]# ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:50:56:A8:16:15 <<<< eth0: management network.
inet addr:172.168.4.20 Bcast:172.168.4.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:13701 errors:0 dropped:522 overruns:0 frame:0
TX packets:3852 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1122408 (1.0 MiB) TX bytes:468021 (457.0 KiB)
eth1 Link encap:Ethernet HWaddr 00:50:56:A8:25:6B <<<< eth1: public network.
inet addr:10.168.4.20 Bcast:10.168.4.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:23074 errors:0 dropped:520 overruns:0 frame:0
TX packets:7779 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:15974971 (15.2 MiB) TX bytes:2980403 (2.8 MiB)
eth1:1 Link encap:Ethernet HWaddr 00:50:56:A8:25:6B
inet addr:10.168.4.22 Bcast:10.168.4.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth1:2 Link encap:Ethernet HWaddr 00:50:56:A8:25:6B
inet addr:10.168.4.24 Bcast:10.168.4.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth2 Link encap:Ethernet HWaddr 00:50:56:A8:21:0A <<<< eth2: heartbeat network, one of the Clusterware HAIP interfaces.
inet addr:10.0.1.20 Bcast:10.0.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:11322 errors:0 dropped:500 overruns:0 frame:0
TX packets:10279 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:6765147 (6.4 MiB) TX bytes:5384321 (5.1 MiB)
eth2:1 Link encap:Ethernet HWaddr 00:50:56:A8:21:0A
inet addr:169.254.10.239 Bcast:169.254.127.255 Mask:255.255.128.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth3 Link encap:Ethernet HWaddr 00:50:56:A8:F7:F7 <<<< eth3: heartbeat network, one of the Clusterware HAIP interfaces.
inet addr:10.0.2.20 Bcast:10.0.2.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:347096 errors:0 dropped:500 overruns:0 frame:0
TX packets:306170 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:210885992 (201.1 MiB) TX bytes:173504069 (165.4 MiB)
eth3:1 Link encap:Ethernet HWaddr 00:50:56:A8:F7:F7
inet addr:169.254.245.28 Bcast:169.254.255.255 Mask:255.255.128.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth4 Link encap:Ethernet HWaddr 00:50:56:A8:DC:CC <<<< eth4-eth9: heartbeat networks, but not part of Clusterware HAIP.
inet addr:10.0.3.20 Bcast:10.0.3.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:7247 errors:0 dropped:478 overruns:0 frame:0
TX packets:6048 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3525191 (3.3 MiB) TX bytes:2754275 (2.6 MiB)
eth5 Link encap:Ethernet HWaddr 00:50:56:A8:A1:86
inet addr:10.0.4.20 Bcast:10.0.4.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:40028 errors:0 dropped:480 overruns:0 frame:0
TX packets:23700 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:15139172 (14.4 MiB) TX bytes:9318750 (8.8 MiB)
eth6 Link encap:Ethernet HWaddr 00:50:56:A8:F7:53
inet addr:10.0.5.20 Bcast:10.0.5.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:13324 errors:0 dropped:470 overruns:0 frame:0
TX packets:128 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1075873 (1.0 MiB) TX bytes:16151 (15.7 KiB)
eth7 Link encap:Ethernet HWaddr 00:50:56:A8:E4:78
inet addr:10.0.6.20 Bcast:10.0.6.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:13504 errors:0 dropped:457 overruns:0 frame:0
TX packets:120 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1158553 (1.1 MiB) TX bytes:14643 (14.2 KiB)
eth8 Link encap:Ethernet HWaddr 00:50:56:A8:C0:B0
inet addr:10.0.7.20 Bcast:10.0.7.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:13272 errors:0 dropped:442 overruns:0 frame:0
TX packets:126 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1072609 (1.0 MiB) TX bytes:15999 (15.6 KiB)
eth9 Link encap:Ethernet HWaddr 00:50:56:A8:5E:F6
inet addr:10.0.8.20 Bcast:10.0.8.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:14316 errors:0 dropped:431 overruns:0 frame:0
TX packets:127 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1169023 (1.1 MiB) TX bytes:15293 (14.9 KiB)
Node 2:
[root@rhel2 ~]# ifconfig -a <<<< network configuration matches node 1.
eth0 Link encap:Ethernet HWaddr 00:50:56:A8:C2:66
inet addr:172.168.4.21 Bcast:172.168.4.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:19156 errors:0 dropped:530 overruns:0 frame:0
TX packets:278 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:4628107 (4.4 MiB) TX bytes:37558 (36.6 KiB)
eth1 Link encap:Ethernet HWaddr 00:50:56:A8:18:1A
inet addr:10.168.4.21 Bcast:10.168.4.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:21732 errors:0 dropped:531 overruns:0 frame:0
TX packets:7918 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:4110335 (3.9 MiB) TX bytes:14783715 (14.0 MiB)
eth1:2 Link encap:Ethernet HWaddr 00:50:56:A8:18:1A
inet addr:10.168.4.23 Bcast:10.168.4.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth2 Link encap:Ethernet HWaddr 00:50:56:A8:1B:DD
inet addr:10.0.1.21 Bcast:10.0.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:410244 errors:0 dropped:524 overruns:0 frame:0
TX packets:433865 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:206461212 (196.8 MiB) TX bytes:283858870 (270.7 MiB)
eth2:1 Link encap:Ethernet HWaddr 00:50:56:A8:1B:DD
inet addr:169.254.89.158 Bcast:169.254.127.255 Mask:255.255.128.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth3 Link encap:Ethernet HWaddr 00:50:56:A8:2B:68
inet addr:10.0.2.21 Bcast:10.0.2.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:323060 errors:0 dropped:512 overruns:0 frame:0
TX packets:337911 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:176652414 (168.4 MiB) TX bytes:212347379 (202.5 MiB)
eth3:1 Link encap:Ethernet HWaddr 00:50:56:A8:2B:68
inet addr:169.254.151.103 Bcast:169.254.255.255 Mask:255.255.128.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth4 Link encap:Ethernet HWaddr 00:50:56:A8:81:DB
inet addr:10.0.3.21 Bcast:10.0.3.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:37308 errors:0 dropped:507 overruns:0 frame:0
TX packets:27565 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:10836885 (10.3 MiB) TX bytes:14973305 (14.2 MiB)
eth5 Link encap:Ethernet HWaddr 00:50:56:A8:43:EA
inet addr:10.0.4.21 Bcast:10.0.4.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:38506 errors:0 dropped:496 overruns:0 frame:0
TX packets:27985 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:10940661 (10.4 MiB) TX bytes:14859794 (14.1 MiB)
eth6 Link encap:Ethernet HWaddr 00:50:56:A8:84:76
inet addr:10.0.5.21 Bcast:10.0.5.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:13653 errors:0 dropped:484 overruns:0 frame:0
TX packets:114 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1102617 (1.0 MiB) TX bytes:14161 (13.8 KiB)
eth7 Link encap:Ethernet HWaddr 00:50:56:A8:B6:4F
inet addr:10.0.6.21 Bcast:10.255.255.255 Mask:255.0.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:13633 errors:0 dropped:474 overruns:0 frame:0
TX packets:115 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1101251 (1.0 MiB) TX bytes:14343 (14.0 KiB)
eth8 Link encap:Ethernet HWaddr 00:50:56:A8:97:62
inet addr:10.0.7.21 Bcast:10.0.7.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:13633 errors:0 dropped:459 overruns:0 frame:0
TX packets:115 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1102065 (1.0 MiB) TX bytes:14343 (14.0 KiB)
eth9 Link encap:Ethernet HWaddr 00:50:56:A8:28:10
inet addr:10.0.8.21 Bcast:10.0.8.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:13764 errors:0 dropped:446 overruns:0 frame:0
TX packets:115 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1159479 (1.1 MiB) TX bytes:14687 (14.3 KiB)
2. Current heartbeat network configuration of the cluster.
[grid@rhel1 ~]$ oifcfg getif
eth1 10.168.4.0 global public
eth2 10.0.1.0 global cluster_interconnect
eth3 10.0.2.0 global cluster_interconnect
3. Before adjusting the cluster_interconnects parameter.
SQL> show parameter cluster_interconnect
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
cluster_interconnects string
cluster_interconnects defaults to null.
SQL> select * from v$cluster_interconnects;
NAME IP_ADDRESS IS_ SOURCE CON_ID
--------------- ---------------- --- ------------------------------- ----------
eth2:1 169.254.10.239 NO 0
eth3:1 169.254.245.28 NO 0
V$CLUSTER_INTERCONNECTS displays one or more interconnects that are being used for cluster communication.
Querying v$cluster_interconnects shows that this RAC environment currently uses HAIP. Note that the addresses shown here are the HAIP addresses, not the addresses configured on the OS; this differs from the output shown later.
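To see the interconnects of both instances at once, the same information can be pulled from the GV$ view on either node (a plain query against the columns already shown above):
SQL> select inst_id, name, ip_address, source from gv$cluster_interconnects order by inst_id;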
4. Adjusting the cluster_interconnects parameter.
To raise the heartbeat bandwidth as much as possible, we specified 9 heartbeat addresses per node:
SQL> alter system set cluster_interconnects="10.0.1.20:10.0.2.20:10.0.3.20:10.0.4.20:10.0.5.20:10.0.6.20:10.0.7.20:10.0.8.20:10.0.9.20" scope=spfile sid='orcl1'; <<<< note that the IPs are separated by colons and wrapped in double quotes; setting cluster_interconnects overrides the Clusterware heartbeat network shown by oifcfg getif, which is otherwise the default network for RAC heartbeat traffic.
System altered.
SQL> alter system set cluster_interconnects="10.0.1.21:10.0.2.21:10.0.3.21:10.0.4.21:10.0.5.21:10.0.6.21:10.0.7.21:10.0.8.21:10.0.9.21" scope=spfile sid='orcl2';
System altered.
Restarting the database instances produced the following errors:
[oracle@rhel1 ~]$ srvctl stop database -d orcl
[oracle@rhel1 ~]$ srvctl start database -d orcl
PRCR-1079 : Failed to start resource ora.orcl.db
CRS-5017: The resource action "ora.orcl.db start" encountered the following error:
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:ip_list failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpcini
ORA-27303: additional information: Too many IPs specified to SKGXP. Max supported is 4, given 9.
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/rhel2/crs/trace/crsd_oraagent_oracle.trc".
CRS-2674: Start of 'ora.orcl.db' on 'rhel2' failed
CRS-5017: The resource action "ora.orcl.db start" encountered the following error:
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:ip_list failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpcini
ORA-27303: additional information: Too many IPs specified to SKGXP. Max supported is 4, given 9.
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/rhel1/crs/trace/crsd_oraagent_oracle.trc".
CRS-2674: Start of 'ora.orcl.db' on 'rhel1' failed
CRS-2632: There are no more servers to try to place resource 'ora.orcl.db' on that would satisfy its placement policy
So even when using cluster_interconnects, no more than 4 network addresses can be specified, which matches the HAIP limit.
We therefore dropped the last 5 IPs and kept the first 4 for the heartbeat network (the corrected statements are sketched after the list below):
Node 1: 10.0.1.20:10.0.2.20:10.0.3.20:10.0.4.20
Node 2: 10.0.1.21:10.0.2.21:10.0.3.21:10.0.4.21
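A sketch of the corrected statements, using only the first four addresses with the same per-instance syntax as before:
SQL> alter system set cluster_interconnects="10.0.1.20:10.0.2.20:10.0.3.20:10.0.4.20" scope=spfile sid='orcl1';
SQL> alter system set cluster_interconnects="10.0.1.21:10.0.2.21:10.0.3.21:10.0.4.21" scope=spfile sid='orcl2';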
5. Testing the fault tolerance of the cluster_interconnects parameter.
Now let us test whether cluster_interconnects provides any fault tolerance:
SQL> set linesize 200
SQL> select * from v$cluster_interconnects;
NAME IP_ADDRESS IS_ SOURCE CON_ID
--------------- ---------------- --- ------------------------------- ----------
eth2 10.0.1.20 NO cluster_interconnects parameter 0
eth3 10.0.2.20 NO cluster_interconnects parameter 0
eth4 10.0.3.20 NO cluster_interconnects parameter 0
eth5 10.0.4.20 NO cluster_interconnects parameter 0
After restarting the instances, RAC now uses the 4 previously specified IPs for the heartbeat network.
Both RAC instances are running normally:
[oracle@rhel1 ~]$ srvctl status database -d orcl
Instance orcl1 is running on node rhel1
Instance orcl2 is running on node rhel2
Manually bring down one of the heartbeat NICs on node 1:
[root@rhel1 ~]# ifdown eth4 <<<< this NIC is not one of the HAIP interfaces.
[oracle@rhel1 ~]$ srvctl status database -d orcl
Instance orcl1 is running on node rhel1
Instance orcl2 is running on node rhel2
The srvctl tool still shows both instances as running.
Log in locally with sqlplus:
[oracle@rhel1 ~]$ sql
SQL*Plus: Release 12.1.0.2.0 Production on Tue Oct 20 18:11:35 2015
Copyright (c) 1982, 2014, Oracle. All rights reserved.
Connected.
SQL>
Clearly, something is wrong at this point.
Checking the alert log shows the following errors:
2015-10-20 18:10:22.996000 +08:00
SKGXP: ospid 32107: network interface query failed for IP address 10.0.3.20.
SKGXP: [error 32607]
2015-10-20 18:10:31.600000 +08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_qm03_453.trc (incident=29265) (PDBNAME=CDB$ROOT):
ORA-00603: ORACLE server session terminated by fatal error
ORA-27501: IPC error creating a port
ORA-27300: OS system dependent operation:bind failed with status: 99
ORA-27301: OS failure message: Cannot assign requested address
ORA-27302: failure occurred at: sskgxpsock
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_29265/orcl1_qm03_453_i29265.trc
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_cjq0_561.trc (incident=29297) (PDBNAME=CDB$ROOT):
ORA-00603: ORACLE server session terminated by fatal error
ORA-27544: Failed to map memory region for export
ORA-27300: OS system dependent operation:bind failed with status: 99
ORA-27301: OS failure message: Cannot assign requested address
ORA-27302: failure occurred at: sskgxpsock
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_29297/orcl1_cjq0_561_i29297.trc
2015-10-20 18:10:34.724000 +08:00
Dumping diagnostic data in directory=[cdmp_20151020181034], requested by (instance=1, osid=561 (CJQ0)), summary=[incident=29297].
2015-10-20 18:10:35.819000 +08:00
Dumping diagnostic data in directory=[cdmp_20151020181035], requested by (instance=1, osid=453 (QM03)), summary=[incident=29265].
Judging from the log, the instance did not go down; it simply hung. Checking the alert log of the database instance on the other node showed no errors, so the other RAC instance was unaffected.
Manually bring the NIC back up:
[root@rhel1 ~]# ifup eth4
The instance immediately returned to normal; it never actually went down during the whole process.
Would bringing down a NIC that belongs to HAIP affect the instance? Let us bring eth2 down:
[root@rhel1 ~]# ifdown eth2
The test shows the instance hangs in exactly the same way as with a non-HAIP NIC, and it returns to normal as soon as the NIC is restored.
Summary: whether the specified interfaces belong to HAIP or not, setting cluster_interconnects leaves the heartbeat network without fault tolerance. If any specified interface fails, the instance hangs until that interface is restored, and only then does it recover. In addition, cluster_interconnects supports at most 4 IP addresses.
Although in a RAC environment hosting multiple databases the cluster_interconnects initialization parameter can override the default Clusterware heartbeat network and isolate the heartbeat traffic of the different database instances from one another, a failure of any specified NIC will hang the instance, so high availability is no longer guaranteed.
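If that trade-off is unacceptable, the parameter can be removed again so that the instances fall back to the Clusterware/HAIP interconnect; since the parameter is static, a restart is still required (the instance names follow this experiment):
SQL> alter system reset cluster_interconnects scope=spfile sid='orcl1';
SQL> alter system reset cluster_interconnects scope=spfile sid='orcl2';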