Dear customer,
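1. After how many failed start attempts does the gpnpd process stop trying to restart and remain in the OFFLINE state?
This is determined by the RESTART_ATTEMPTS attribute of the ora.gpnpd resource. The default is 10. You can check the current value with: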
crsctl stat res ora.gpnpd -p -init | grep RESTART_ATTEMPTS
RESTART_ATTEMPTS=10
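The attribute can also be read from captured command output. This is a minimal Python sketch. It assumes the output of `crsctl stat res ora.gpnpd -p -init` has been saved as text and consists of KEY=VALUE lines, as the RESTART_ATTEMPTS=10 line above suggests. The helper names are hypothetical. The fallback of 10 is simply the default stated above.

```python
# Hypothetical helpers: parse captured `crsctl stat res <name> -p -init` output.
# Assumption: the profile is printed as KEY=VALUE lines, e.g. RESTART_ATTEMPTS=10.


def parse_profile(text: str) -> dict:
    """Turn KEY=VALUE lines into a dict; lines without '=' are ignored."""
    attrs = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key:
            attrs[key] = value
    return attrs


def restart_attempts(text: str, default: int = 10) -> int:
    """Return RESTART_ATTEMPTS, or the default of 10 if it is missing or not numeric."""
    value = parse_profile(text).get("RESTART_ATTEMPTS", "")
    return int(value) if value.isdigit() else default


# Illustrative input only; substitute the real command output.
sample_output = "NAME=ora.gpnpd\nRESTART_ATTEMPTS=10\n"
print(restart_attempts(sample_output))  # prints 10
```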
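2. While gpnpd remains OFFLINE, where can this OFFLINE state be seen? At the time, crsctl start crs on rac1 could not start the stack at all, and crsctl status res -t -init also failed with an error saying it could not communicate with the cluster. crsctl status res -t -init worked only on rac2, which was healthy.
Normally the state can be seen with crsctl status res -t -init.
If crsctl status res -t -init cannot be used, check the ohasd trace file (ohasd.trc) covering the period when the problem occurred.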
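When the -init resources cannot be queried, the ohasd.trc review can be narrowed to the incident window. This is a minimal Python sketch. It assumes trace lines begin with a timestamp in the same format as the gpnpd.trc excerpts in this ticket (for example 2021-12-28 15:05:01.247). The keyword filter is a hypothetical choice, because the exact wording of state-change lines in ohasd.trc is not shown here. The sample lines are copied from the trace excerpts in this thread.

```python
from datetime import datetime

# Assumption: timestamps look like "2021-12-28 15:05:01.247" (23 characters),
# as in the gpnpd.trc and crsd.trc excerpts quoted in this ticket.
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TS_LEN = 23


def lines_in_window(lines, start, end, keyword=None):
    """Yield lines whose leading timestamp is within [start, end] and that contain keyword."""
    for line in lines:
        try:
            ts = datetime.strptime(line[:TS_LEN], TS_FORMAT)
        except ValueError:
            continue  # lines without a leading timestamp are skipped
        if start <= ts <= end and (keyword is None or keyword in line):
            yield line.rstrip("\n")


sample = [
    "2021-12-28 13:13:17.574 : CRSD:1870640896: [ INFO] {1:39877:29678} Done.",
    "2021-12-28 15:05:01.248 : GPNP:4160576128: clsgpnpd_term: [at clsgpnpd.c:1180] STOP GPnPD terminating. Closing connections...",
    "[at clsgpnpd.c:1719] Result: (1) CLSGPNP_ERR.",
]
start, end = datetime(2021, 12, 28, 14, 50), datetime(2021, 12, 28, 15, 10)
for hit in lines_in_window(sample, start, end, keyword="gpnpd"):
    print(hit)  # only the 15:05:01.248 line is printed
```

For a real file, pass an open file object such as `open(path, encoding="utf-8", errors="replace")` in place of `sample`.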
Thank you
Oracle Support - 21 days ago [Notes]
Dear customer,
We have received your update and will review it as soon as possible. Thank you for your patience.
Thank you
ZUPENG_LI@YMTC.COM - 21 days ago [Update from Customer]
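Hello,
"After repeated failed start attempts, the gpnpd process stopped trying to restart after 12/28 15:05 and remained in the OFFLINE state.
Subsequently, when you ran crsctl commands on 12/30, they also failed because gpnpd was still OFFLINE, which prevented ocssd and ASM from starting."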
<<<<<<
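1. After how many failed start attempts does the gpnpd process stop trying to restart and remain in the OFFLINE state?
2. While gpnpd remains OFFLINE, where can this OFFLINE state be seen? At the time, crsctl start crs on rac1 could not start the stack at all, and crsctl status res -t -init also failed with an error saying it could not communicate with the cluster. crsctl status res -t -init worked only on rac2, which was healthy.
Thank you!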
Oracle Support - 25 days ago [ODM Answer]
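Dear customer,
On December 28, the network on host 1 was offline.
Understood.
Once the network is back to normal, how should this situation be handled so that CRS can be started?
In this case, the resource has to be started manually: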
crsctl start res ora.gpnpd -init
Then check the resource status:
crsctl stat res -t -init
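The two commands above can be wrapped so that the status check runs only if the start succeeded. This is a minimal Python sketch. It assumes crsctl is on PATH, that the script runs on rac1 as a user permitted to manage the -init resources, and that the network has already been restored. `run_recovery` and its `runner` parameter are hypothetical; `runner` exists only so the sequence can be dry-run without a cluster.

```python
import subprocess

# The two commands recommended above, in order.
RECOVERY_STEPS = [
    ["crsctl", "start", "res", "ora.gpnpd", "-init"],  # start gpnpd manually
    ["crsctl", "stat", "res", "-t", "-init"],          # then review the -init resources
]


def run_recovery(runner=subprocess.run):
    """Run each step in order; return the first failing command, or None if all succeed."""
    for cmd in RECOVERY_STEPS:
        result = runner(cmd, capture_output=True, text=True)
        print(" ".join(cmd))
        print(result.stdout)
        if result.returncode != 0:
            return cmd  # stop here so a failed start is not followed by a misleading status check
    return None
```

Stopping at the first non-zero exit keeps the output focused on the step that actually failed.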
Thank you
Oracle Support - 25 days ago [ODM Question]
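On December 28, the network on host 1 was offline.
"Subsequently, when you ran crsctl commands on 12/30, they also failed because gpnpd was still OFFLINE, which prevented ocssd and ASM from starting."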
<<<<<<<<
Once the network is back to normal, how should this situation be handled so that CRS can be started?
ZUPENG_LI@YMTC.COM - 25 days ago [Update from Customer]
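Hello,
On December 28, the network on host 1 was offline.
"Subsequently, when you ran crsctl commands on 12/30, they also failed because gpnpd was still OFFLINE, which prevented ocssd and ASM from starting."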
<<<<<<<<
Once the network is back to normal, how should this situation be handled so that CRS can be started?
Thanks!
Best Regards
Oracle Support - 28 days ago [ODM Action Plan]
-------------------- ACTION PLAN DETAILS BELOW---------------------
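Dear customer,
Thank you for your patience. Here is an update on the progress of our investigation.
The gpnpd trace file shows that on 12/28 the gpnpd process failed several times
with the error "no interfaces to filter in net data":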
<gpnpd.trc>
2021-12-28 15:05:01.247 : CLSINET:4160576128: no interfaces to filter in net data <<<<<<<
2021-12-28 15:05:01.247 : GPNP:4160576128: clsgpnpd_lCheckIpTypes:
[at clsgpnpd.c:1719] Result: (1) CLSGPNP_ERR. (:GPNPD00120:) clsinet_ProfileGetNetData() failed, crv=1. <<<<<
2021-12-28 15:05:01.248 : GPNP:4160576128: clsgpnpd_term: [at clsgpnpd.c:1180] STOP GPnPD terminating. Closing connections...
2021-12-28 15:05:01.250 : default:4160576128: clsgpnpd_term STOP terminating.
2021-12-28 15:05:01.250 : GPNP:4160576128: clsgpnp_Term: [at clsgpnp0.c:1512] GPnP cli=gpnpd
2021-12-28 15:05:01.250 : GPNP:4160576128: clsgpnp_Term: [at clsgpnp0.c:1512] GPnP cli=clsinet
2021-12-28 15:05:01.251 : GPNP:4160576128: [at clsgpnp0.c:1443] Glob "gpnpd" ref dec (1) from "clsinet"
2021-12-28 15:05:01.252 : GPNP:4160576128: [at clsgpnp0.c:1430] Glob "gpnpd" terminated from "gpnpd" <<<<<<
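After repeated failed start attempts, the gpnpd process stopped trying to restart after 12/28 15:05 and remained in the OFFLINE state.
Subsequently, when you ran crsctl commands on 12/30, they also failed because gpnpd was still OFFLINE, which prevented ocssd and ASM from starting.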
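The failure pattern described above can be checked directly in gpnpd.trc. This is a minimal Python sketch. It assumes each failure is logged with the text "no interfaces to filter in net data" and a leading timestamp, as in the excerpt above. The sample lines are copied from that excerpt, and the function name is hypothetical.

```python
# Hypothetical helper; the error text and timestamp layout come from the gpnpd.trc excerpt above.
ERROR_TEXT = "no interfaces to filter in net data"
TS_LEN = 23  # length of "2021-12-28 15:05:01.247"


def gpnpd_failures(lines):
    """Return the leading timestamp of every line that carries the gpnpd network error."""
    return [line[:TS_LEN] for line in lines if ERROR_TEXT in line]


sample = [
    "2021-12-28 15:05:01.247 : CLSINET:4160576128: no interfaces to filter in net data",
    "2021-12-28 15:05:01.248 : GPNP:4160576128: clsgpnpd_term: [at clsgpnpd.c:1180] STOP GPnPD terminating. Closing connections...",
]
failures = gpnpd_failures(sample)
print(len(failures), failures[-1] if failures else "none")  # prints: 1 2021-12-28 15:05:01.247
```

Run on the complete gpnpd.trc, the last timestamp returned should correspond to the 12/28 15:05 point after which no further restarts were attempted.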
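In summary, we suspect that the private network interface failed around 12/28 15:05. To understand the details of what happened, please provide the OSWatcher data from around 12/28 15:05. If no data from that period is available, please work with your OS and network administrators to check whether there were problems with the network interfaces or network communication at that time.
Best Regards, 高 健, Oracle Customer Services - China Database Group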
Oracle Support - 28 days ago [ODM Data Collection]
=== Data Collection ===
Filename = gpnpd.trc
2021-12-28 15:05:01.247 : CLSINET:4160576128: no interfaces to filter in net data
2021-12-28 15:05:01.247 : GPNP:4160576128: clsgpnpd_lCheckIpTypes:
[at clsgpnpd.c:1719] Result: (1) CLSGPNP_ERR. (:GPNPD00120:) clsinet_ProfileGetNetData() failed, crv=1. <<<<<
2021-12-28 15:05:01.248 : GPNP:4160576128: clsgpnpd_term: [at clsgpnpd.c:1180] STOP GPnPD terminating. Closing connections...
2021-12-28 15:05:01.250 : default:4160576128: clsgpnpd_term STOP terminating.
2021-12-28 15:05:01.250 : GPNP:4160576128: clsgpnp_Term: [at clsgpnp0.c:1512] GPnP cli=gpnpd
2021-12-28 15:05:01.250 : GPNP:4160576128: clsgpnp_Term: [at clsgpnp0.c:1512] GPnP cli=clsinet
2021-12-28 15:05:01.251 : GPNP:4160576128: [at clsgpnp0.c:1443] Glob "gpnpd" ref dec (1) from "clsinet"
2021-12-28 15:05:01.252 : GPNP:4160576128: [at clsgpnp0.c:1430] Glob "gpnpd" terminated from "gpnpd"
Filename = gpnpd.trc
Oracle Support - 29 days ago [ODM Data Collection]
=== Data Collection ===
Filename = crsd.trc
2021-12-28 13:13:17.563 : AGFW:2628663040: [ INFO] {1:39877:29678} Agfw Proxy Server received the message: CMD_COMPLETED[Proxy] ID 20482:8770684
2021-12-28 13:13:17.563 : AGFW:2628663040: [ INFO] {1:39877:29678} Agfw Proxy Server replying to the message: CMD_COMPLETED[Proxy] ID 20482:8770684
2021-12-28 13:13:17.573 :UiServer:1870640896: [ INFO] {1:39877:29678} Done for ctx=0x7fff1c062d40
2021-12-28 13:13:17.573 :UiServer:1870640896: [ INFO] {1:39877:29678} Informing CSS of successful CRS shutdown...
2021-12-28 13:13:17.574 :UiServer:1870640896: [ INFO] {1:39877:29678} Flushing repository write requests...
2021-12-28 13:13:17.574 : CRSD:1870640896: [ INFO] {1:39877:29678} Exiting on request of the Policy Engine...
2021-12-28 13:13:17.574 : CRSD:1870640896: [ INFO] {1:39877:29678} Done. <<<< last line
Filename = crsd.trc
Oracle Support - 29 days ago [ODM Data Collection]
=== Data Collection ===
Filename = alert_+ASM1.log
2021-12-28T13:13:28.533348+08:00
freeing rdom 4
freeing the fusion rht of pdb 4
freeing rdom 3
freeing the fusion rht of pdb 3
freeing rdom 2
freeing the fusion rht of pdb 2
freeing rdom 1
freeing the fusion rht of pdb 1
freeing rdom 0
freeing the fusion rht of pdb 0
2021-12-28T13:13:33.788148+08:00
Instance shutdown complete (OS id: 71392) <<<<<< last line
Filename = alert_+ASM1.log
Oracle Support - 29 days ago [ODM Issue Verification]
Verified the issue in the log file as noted below:
LOG FILE
Filename = node#1\alert.log
See the following error:
2021-12-30 03:10:08.902 [CRSCTL(120577)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_120577.trc.
2021-12-30 03:10:15.737 [CRSCTL(120720)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_120720.trc.
Oracle Support - 29 days ago [ODM Data Collection]
=== Data Collection ===
Filename = crsctl_120577.trc
Trace file /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_120577.trc
Oracle Database 19c Clusterware Release 19.0.0.0.0 - Production
Version 19.10.1.0.0 Copyright 1996, 2021 Oracle. All rights reserved.
default:4160564992: u_set_comp_error: comptype '103' : error '29' <<<<<<<<<<<
2021-12-30 03:10:02.479 : OCRRAW:4160564992: kgfnInitEnv env=0x7ffffffefef8 flags=0x0
2021-12-30 03:10:02.479 : OCRRAW:4160564992: kgfoCreateCtxExt2 trcflg: 0 [trclvl_in:3] ctx:0x5555562d16b0
2021-12-30 03:10:02.725 : OCRRAW:4160564992: kgxgncin: clsssinit: CLSS init failed with status 3
2021-12-30 03:10:02.725 : OCRRAW:4160564992: kgxgncin: clsssinit: return status 3 (0 SKGXN not av) from CLSS
2021-12-30 03:10:02.725 : OCRRAW:4160564992: kgfnFindLocalNode01: ORA-29701
2021-12-30 03:10:02.725*:kgfn.c@1381: kgfnFindLocalNode: ORA-29701 nmret=2
2021-12-30 03:10:02.725 : OCRRAW:4160564992: kgfnFindLocalNode: not ok
2021-12-30 03:10:02.725*:kgfn.c@1485: kgfnFindLocalNode: not ok
2021-12-30 03:10:02.725 : OCRRAW:4160564992: kgfnTgtInit: local node not found, free kgfnpds
2021-12-30 03:10:02.725*:kgfn.c@2271: kgfnTgtInit: not found
2021-12-30 03:10:02.725 : OCRRAW:4160564992: kgfnGetBeqData failed init target; inst=(null) flags=0x6000
2021-12-30 03:10:02.725*:kgfn.c@5993: kgfnGetBeqData: kgfnTgtInit failed, inst=NULL flags=0x6000
2021-12-30 03:10:02.729 : CLSNS:4160564992: clsns_SetTraceLevel:trace level set to 1.
2021-12-30 03:10:02.847 : OCRRAW:4160564992: 9607 Error 4 querying length of attr ASM_DISCOVERY_ADDRESS <<<
2021-12-30 03:10:02.851 : OCRRAW:4160564992: 9607 Error 4 querying length of attr ASM_STATIC_DISCOVERY_ADDRESS
2021-12-30 03:10:02.885 : OCRRAW:4160564992: 9325 Error 4 opening dom root in 0x555556559a50
......
2021-12-30 03:10:08.902*:kgfn.c@5513: kgfnConnect2: failed to connect
2021-12-30 03:10:08.902 : OCRRAW:4160564992: kgfnConnect2Retry: failed to connect connect after 2 attempts, 151s elapsed
2021-12-30 03:10:08.902 : OCRRAW:4160564992: kgfo_kge2slos error stack at kgfoAl06: ORA-15077: could not locate ASM instance serving a required diskgroup <<<<<<<<<<<
2021-12-30 03:10:08.902*:kgfo.c@1014: kgfo_kge2slos error stack at kgfoAl06: ORA-15077: could not locate ASM instance serving a required diskgroup
2021-12-30 03:10:08.902 : OCRRAW:4160564992: -- trace dump on error exit --
2021-12-30 03:10:08.902 : OCRRAW:4160564992: Error [kgfoAl06] in [kgfokge] at kgfo.c:3180
2021-12-30 03:10:08.902 : OCRRAW:4160564992: ORA-15077: could not locate ASM instance serving a required diskgroup
2021-12-30 03:10:08.902 : OCRRAW:4160564992: Category: 7
2021-12-30 03:10:08.902 : OCRRAW:4160564992: DepInfo: 15077
2021-12-30 03:10:08.902 : OCRRAW:4160564992: -- trace dump end --
OCRASM:4160564992: SLOS : SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
2021-12-30 03:10:08.902 : OCRASM:4160564992: ASM Error Stack : ORA-15077: could not locate ASM instance serving a required diskgroup
2021-12-30 03:10:08.902 : OCRASM:4160564992: proprasmo: kgfoCheckMount returned [7]
2021-12-30 03:10:08.902 : OCRASM:4160564992: proprasmo: The ASM instance is down
2021-12-30 03:10:08.980 : OCRRAW:4160564992: proprioo: Failed to open [+DG_CRS_FEFL/p-rac/OCRFILE/registry.255.1078051179]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2021-12-30 03:10:08.980 : OCRRAW:4160564992: proprioo: No OCR/OLR devices are usable
OCRUTL:4160564992: u_fill_errorbuf: Error Info : [Insufficient quorum to open OCR devices]
default:4160564992: u_set_gbl_comp_error: comptype '107' : error '0'
2021-12-30 03:10:08.980 : OCRRAW:4160564992: proprinit: Could not open raw device
2021-12-30 03:10:08.980 : default:4160564992: a_init:7!: Backend init unsuccessful : [26]
2021-12-30 03:10:08.982 : default:4160564992: clsvactversion:4: Retrieving Active Version from local storage.
Filename = crsctl_120577.trc
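The crsctl trace reads as a chain: CSS is unavailable, so the ASM instance serving the OCR disk group cannot be located, so the OCR cannot be opened and crsctl reports CRS-1013. The following is a minimal Python sketch that locates these milestones in a trace file. The marker strings are copied from the excerpt above; the short descriptions and the function name are illustrative labels, not Oracle terminology.

```python
# Marker strings copied from crsctl_120577.trc; descriptions are illustrative labels.
MILESTONES = [
    ("CLSS init failed", "CSS could not be initialized"),
    ("ORA-29701", "local node could not be found through CSS"),
    ("ORA-15077", "no ASM instance serving the OCR disk group"),
    ("No OCR/OLR devices are usable", "OCR cannot be opened (reported as CRS-1013)"),
]


def trace_milestones(lines):
    """Return (line_number, marker, description) for the first hit of each marker, in file order."""
    lines = list(lines)  # allow a file object to be scanned once per marker
    hits = []
    for marker, description in MILESTONES:
        for lineno, line in enumerate(lines, start=1):
            if marker in line:
                hits.append((lineno, marker, description))
                break
    return sorted(hits)


sample = [
    "2021-12-30 03:10:02.725 : OCRRAW:4160564992: kgxgncin: clsssinit: CLSS init failed with status 3",
    "2021-12-30 03:10:02.725 : OCRRAW:4160564992: kgfnFindLocalNode01: ORA-29701",
    "2021-12-30 03:10:08.902 : OCRASM:4160564992: ASM Error Stack : ORA-15077: could not locate ASM instance serving a required diskgroup",
    "2021-12-30 03:10:08.980 : OCRRAW:4160564992: proprioo: No OCR/OLR devices are usable",
]
for lineno, marker, description in trace_milestones(sample):
    print(f"{lineno}: {marker} -> {description}")
```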
Oracle Support - 29 days ago [ODM Data Collection]
=== Data Collection ===
Filename = node#1\alert.log
2021-12-29 15:17:26.195 [GIPCD(22619)]CRS-42216: No interfaces are configured on the local node for interface definition bond1(:.)?:20.20.88.0: available interface definitions are [eno1(:.)?:10.131.12.0][bond0(:.)?:10.20.28.0].
2021-12-29 15:17:26.221 [GIPCD(22619)]CRS-42216: No interfaces are configured on the local node for interface definition bond1(:.)?:20.20.88.0: available interface definitions are [eno1(:.)?:10.131.12.0][bond0(:.)?:10.20.28.0].
2021-12-30 03:10:08.902 [CRSCTL(120577)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_120577.trc.
2021-12-30 03:10:15.737 [CRSCTL(120720)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_120720.trc.
2021-12-30 03:10:22.969 [CRSCTL(120872)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_120872.trc.
2021-12-31 14:02:53.207 [CRSCTL(119476)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_119476.trc.
2021-12-31 14:03:00.371 [CRSCTL(119668)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_119668.trc.
2021-12-31 14:03:06.842 [OCRCONFIG(119799)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/ocrconfig_119799.trc.
2021-12-31 14:03:18.172 [OCRDUMP(121314)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/ocrdump_121314.trc.
2021-12-31 14:04:23.398 [CRSCTL(129904)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_129904.trc.
2021-12-31 14:04:30.579 [CRSCTL(130934)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_130934.trc.
2021-12-31 14:04:37.047 [OCRCONFIG(131208)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/ocrconfig_131208.trc.
2021-12-31 14:04:48.418 [OCRDUMP(132888)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/ocrdump_132888.trc.
2021-12-31 15:10:09.557 [CRSCTL(47630)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_47630.trc.
Filename = node#1\alert.log
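The alert.log excerpt above contains two kinds of evidence. First, CRS-42216 reports that the interface definition bond1 on subnet 20.20.88.0 has no matching interface on the node (only eno1 and bond0 are available). Given Support's suspicion of a private NIC failure, bond1 is presumably the private interconnect, although this excerpt alone does not confirm it. Second, crsctl, ocrconfig and ocrdump report CRS-1013 repeatedly. This is a minimal Python sketch that extracts both. The regular expressions are written against the lines quoted above and may need adjusting for other message variants; the helper names are hypothetical.

```python
import re
from collections import Counter

# Patterns written against the node#1 alert.log lines quoted above.
MISSING_IF = re.compile(
    r"CRS-42216: .*?interface definition (\w+)\(:\.\)\?:([\d.]+):"
    r" available interface definitions are (.*)\.$"
)
AVAILABLE_IF = re.compile(r"\[(\w+)\(:\.\)\?:([\d.]+)\]")
CRS_1013 = re.compile(r"\[(\w+)\(\d+\)\]CRS-1013:")


def missing_interfaces(lines):
    """Map each (interface, subnet) from CRS-42216 to the (interface, subnet) pairs that were available."""
    result = {}
    for line in lines:
        m = MISSING_IF.search(line.rstrip())
        if m:
            result[(m.group(1), m.group(2))] = AVAILABLE_IF.findall(m.group(3))
    return result


def crs_1013_by_tool(lines):
    """Count CRS-1013 (OCR location inaccessible) messages per reporting tool."""
    counts = Counter()
    for line in lines:
        m = CRS_1013.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


alert = [
    "2021-12-29 15:17:26.195 [GIPCD(22619)]CRS-42216: No interfaces are configured on the local node for interface definition bond1(:.)?:20.20.88.0: available interface definitions are [eno1(:.)?:10.131.12.0][bond0(:.)?:10.20.28.0].",
    "2021-12-30 03:10:08.902 [CRSCTL(120577)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/crsctl_120577.trc.",
    "2021-12-31 14:03:06.842 [OCRCONFIG(119799)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /grid/app/grid/diag/crs/p1rac1/crs/trace/ocrconfig_119799.trc.",
]
print(missing_interfaces(alert))
print(crs_1013_by_tool(alert))
```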
Oracle Support - 29 days ago [Notes]
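Dear customer,
Regarding the error messages found in the host's messages log, we cannot yet determine whether they are related to CRS failing to start.
These messages resemble those described in the following document:
Error 'Multipathd: Asm!.Asm_ctl_spec: Failed To Store Path Info' found In /var/log/messages ( Doc ID 1268895.1 )
You can try the method in that document to see whether it makes the messages disappear.
I will continue investigating why CRS cannot start and will report back to you when there is progress.
Best Regards, 高 健, Oracle Customer Services - China Database Group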