Time synchronization on both nodes is normal:
[root@rac2 trace]# chronyc sources
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* 1.1.1.1                      11   4   377     4  -9474ns[  -33us] +/- 1543ms
[root@rac2 trace]# chronyc sourcestats
210 Number of sources = 1
Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==============================================================================
1.1.1.1                    10   6   146     -0.003      0.535     -9ns    15us
[root@rac2 trace]# systemctl status chronyd
* chronyd.service - NTP client/server
   Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2020-12-26 01:59:53 CST; 1 months 10 days ago
     Docs: man:chronyd(8)
           man:chrony.conf(5)
 Main PID: 13693 (chronyd)
   CGroup: /system.slice/chronyd.service
           `-13693 /usr/sbin/chronyd
Dec 26 01:59:53 rac2.cywszx.com systemd[1]: Starting NTP client/server...
Dec 26 01:59:53 rac2.cywszx.com chronyd[13693]: chronyd version 3.4 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +SECHASH +IPV6 +DEBUG)
Dec 26 01:59:53 rac2.cywszx.com chronyd[13693]: Enabled HW timestamping on eth0
Dec 26 01:59:53 rac2.cywszx.com chronyd[13693]: Frequency 7.111 +/- 0.183 ppm read from /var/lib/chrony/drift
Dec 26 01:59:53 rac2.cywszx.com systemd[1]: Started NTP client/server.
Dec 28 16:26:07 rac2.cywszx.com chronyd[13693]: Selected source 1.1.1.1
Dec 28 16:26:07 rac2.cywszx.com chronyd[13693]: System clock wrong by -35.524387 seconds, adjustment started
Dec 28 16:25:32 rac2.cywszx.com chronyd[13693]: System clock was stepped by -35.524387 seconds
Jan 11 03:12:17 rac2.cywszx.com chronyd[13693]: Can't synchronise: no selectable sources
Jan 18 14:15:38 rac2.cywszx.com chronyd[13693]: Selected source 1.1.1.1
# timedatectl
      Local time: Fri 2021-02-05 10:54:45 CST
  Universal time: Fri 2021-02-05 02:54:45 UTC
        RTC time: Fri 2021-02-05 02:54:45
       Time zone: Asia/Shanghai (CST, +0800)
     NTP enabled: yes
NTP synchronized: yes
 RTC in local TZ: no
      DST active: n/a
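As an additional check (not captured during this incident), chronyc tracking summarizes the current reference, measured offset, and leap status; "Leap status : Normal" together with a small "System time" offset confirms that chronyd considers the clock synchronized:
[root@rac2 ~]# chronyc tracking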
Node 1's cluster services are normal:
[grid@rac1 trace]$ crsctl stat res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rac1                     Started,STABLE
ora.crf
      1        ONLINE  ONLINE       rac1                     STABLE
ora.crsd
      1        ONLINE  ONLINE       rac1                     STABLE
ora.cssd
      1        ONLINE  ONLINE       rac1                     STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       rac1                     STABLE
ora.ctssd
      1        ONLINE  ONLINE       rac1                     OBSERVER,STABLE
ora.diskmon
      1        ONLINE  ONLINE       rac1                     STABLE
ora.drivers.acfs
      1        ONLINE  ONLINE       rac1                     STABLE
ora.evmd
      1        ONLINE  ONLINE       rac1                     STABLE
ora.gipcd
      1        ONLINE  ONLINE       rac1                     STABLE
ora.gpnpd
      1        ONLINE  ONLINE       rac1                     STABLE
ora.mdnsd
      1        ONLINE  ONLINE       rac1                     STABLE
ora.storage
      1        ONLINE  ONLINE       rac1                     STABLE
--------------------------------------------------------------------------------
[grid@rac1 trace]$ ps -ef|grep ctssd
root 175876 1 0 2020 ? 03:39:51 /u01/app/19.0.0.0/grid/bin/octssd.bin reboot
grid 281553 215330 0 10:51 pts/2 00:00:00 grep --color=auto ctssd
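The OBSERVER flag on ora.ctssd is expected here: when an OS-level NTP service such as chronyd is active, CTSS runs in observer mode and only monitors clock offsets instead of adjusting them. The mode can also be confirmed directly; CRS-4700 indicates observer mode:
[grid@rac1 trace]$ crsctl check ctss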
Node 2's status is abnormal:
[grid@rac2 trace]$ crsctl stat res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rac2                     STABLE
ora.crf
      1        ONLINE  ONLINE       rac2                     STABLE
ora.crsd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.cssd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       rac2                     STABLE
ora.ctssd
      1        OFFLINE OFFLINE                               Wrong check return.,
                                                             STABLE
ora.diskmon
      1        ONLINE  ONLINE       rac2                     STABLE
ora.drivers.acfs
      1        ONLINE  ONLINE       rac2                     STABLE
ora.evmd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.gipcd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.gpnpd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.mdnsd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.storage
      1        ONLINE  ONLINE       rac2                     STABLE
--------------------------------------------------------------------------------
[grid@rac2 trace]$ ps -ef | grep octssd.bin
root 374327 187266 0 10:36 pts/0 00:00:00 grep --color=auto octssd.bin
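The "Wrong check return." state detail means the agent's check action for the resource is failing, rather than the daemon itself having crashed. As a first step, the resource's full attribute list can be dumped for clues (standard crsctl flags; output omitted here):
[grid@rac2 trace]$ crsctl stat res ora.ctssd -init -f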
Checking the logs turned up nothing obviously abnormal:
[grid@rac2 trace]$ pwd
/u01/app/grid/diag/crs/rac2/crs/trace
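The CTSS daemon traces to octssd.trc under this directory, and cluster-wide messages go to alert.log in the same place; a quick scan for recent errors (file names assume the standard ADR layout):
[grid@rac2 trace]$ grep -iE "error|fail|abort" octssd.trc | tail -20
[grid@rac2 trace]$ grep -i ctss alert.log | tail -20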
Trying to restart the service:
[grid@rac2 trace]$ crsctl start res ora.ctssd -init
CRS-2672: Attempting to start 'ora.ctssd' on 'rac2'
CRS-2676: Start of 'ora.ctssd' on 'rac2' succeeded
[grid@rac2 trace]$ crsctl stat res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rac2                     STABLE
ora.crf
      1        ONLINE  ONLINE       rac2                     STABLE
ora.crsd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.cssd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       rac2                     STABLE
ora.ctssd
      1        ONLINE  ONLINE       rac2                     OBSERVER,STABLE
ora.diskmon
      1        ONLINE  ONLINE       rac2                     STABLE
ora.drivers.acfs
      1        ONLINE  ONLINE       rac2                     STABLE
ora.evmd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.gipcd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.gpnpd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.mdnsd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.storage
      1        ONLINE  ONLINE       rac2                     STABLE
--------------------------------------------------------------------------------
[grid@rac2 trace]$ ps -ef|grep ctssd.bin
root 21320 1 0 10:46 ? 00:00:00 /u01/app/19.0.0.0/grid/bin/octssd.bin reboot
grid 23276 389208 0 10:46 pts/0 00:00:00 grep --color=auto ctssd.bin
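With ora.ctssd back ONLINE in observer mode, cluster-wide clock synchronization can be re-verified end to end, for example with (standard tools; output omitted):
[grid@rac2 trace]$ crsctl check ctss
[grid@rac2 trace]$ cluvfy comp clocksync -n all -verbose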
Related references; none of the following applied to this case:
http://www.dbaref.com/troubleshooting-rac-issues/whattodoif11gr2clusterwareisunhealthy
Grid Infrastructure Does not Start after Node Reboot as Master octssd.bin Stuck (Doc ID 1215893.1)
SOLUTION
1. Kill the master octssd.bin process so that it can be respawned with a new communication port (as root):
   ps -ef | grep octssd.bin
   kill -9 <pid of octssd.bin>
2. Start CRS on node 3. If processes from the previous attempt are still running, stop them first (as root):
   crsctl stop crs -f
   crsctl start crs
After this, octssd.bin should start and the whole CRS stack should come up.