Time synchronization on both nodes is normal:
[root@rac2 trace]# chronyc sources
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* 1.1.1.1                      11   4   377     4  -9474ns[  -33us] +/- 1543ms
[root@rac2 trace]# chronyc sourcestats
210 Number of sources = 1
Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==============================================================================
1.1.1.1                    10   6   146     -0.003      0.535     -9ns    15us
[root@rac2 trace]# systemctl status chronyd
* chronyd.service - NTP client/server
   Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2020-12-26 01:59:53 CST; 1 months 10 days ago
     Docs: man:chronyd(8)
           man:chrony.conf(5)
 Main PID: 13693 (chronyd)
   CGroup: /system.slice/chronyd.service
           `-13693 /usr/sbin/chronyd
Dec 26 01:59:53 rac2.cywszx.com systemd[1]: Starting NTP client/server...
Dec 26 01:59:53 rac2.cywszx.com chronyd[13693]: chronyd version 3.4 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +SECHASH +IPV6 +DEBUG)
Dec 26 01:59:53 rac2.cywszx.com chronyd[13693]: Enabled HW timestamping on eth0
Dec 26 01:59:53 rac2.cywszx.com chronyd[13693]: Frequency 7.111 +/- 0.183 ppm read from /var/lib/chrony/drift
Dec 26 01:59:53 rac2.cywszx.com systemd[1]: Started NTP client/server.
Dec 28 16:26:07 rac2.cywszx.com chronyd[13693]: Selected source 1.1.1.1
Dec 28 16:26:07 rac2.cywszx.com chronyd[13693]: System clock wrong by -35.524387 seconds, adjustment started
Dec 28 16:25:32 rac2.cywszx.com chronyd[13693]: System clock was stepped by -35.524387 seconds
Jan 11 03:12:17 rac2.cywszx.com chronyd[13693]: Can't synchronise: no selectable sources
Jan 18 14:15:38 rac2.cywszx.com chronyd[13693]: Selected source 1.1.1.1
# timedatectl
      Local time: Fri 2021-02-05 10:54:45 CST
  Universal time: Fri 2021-02-05 02:54:45 UTC
        RTC time: Fri 2021-02-05 02:54:45
       Time zone: Asia/Shanghai (CST, +0800)
     NTP enabled: yes
NTP synchronized: yes
 RTC in local TZ: no
      DST active: n/a
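As an additional check (not captured during this incident), chronyc tracking summarizes the current reference, measured offset, and leap status; "Leap status : Normal" together with a small "System time" offset confirms that chronyd considers the clock synchronized:
[root@rac2 ~]# chronyc tracking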
Node 1's cluster services are normal:
[grid@rac1 trace]$ crsctl stat res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rac1                     Started,STABLE
ora.crf
      1        ONLINE  ONLINE       rac1                     STABLE
ora.crsd
      1        ONLINE  ONLINE       rac1                     STABLE
ora.cssd
      1        ONLINE  ONLINE       rac1                     STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       rac1                     STABLE
ora.ctssd
      1        ONLINE  ONLINE       rac1                     OBSERVER,STABLE
ora.diskmon
      1        ONLINE  ONLINE       rac1                     STABLE
ora.drivers.acfs
      1        ONLINE  ONLINE       rac1                     STABLE
ora.evmd
      1        ONLINE  ONLINE       rac1                     STABLE
ora.gipcd
      1        ONLINE  ONLINE       rac1                     STABLE
ora.gpnpd
      1        ONLINE  ONLINE       rac1                     STABLE
ora.mdnsd
      1        ONLINE  ONLINE       rac1                     STABLE
ora.storage
      1        ONLINE  ONLINE       rac1                     STABLE
--------------------------------------------------------------------------------
[grid@rac1 trace]$ ps -ef|grep ctssd
root 175876 1 0 2020 ? 03:39:51 /u01/app/19.0.0.0/grid/bin/octssd.bin reboot
grid 281553 215330 0 10:51 pts/2 00:00:00 grep --color=auto ctssd
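The OBSERVER flag on ora.ctssd is expected here: when an OS-level NTP service such as chronyd is active, CTSS runs in observer mode and only monitors clock offsets instead of adjusting them. The mode can also be confirmed directly; CRS-4700 indicates observer mode:
[grid@rac1 trace]$ crsctl check ctss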
Node 2's status is abnormal:
[grid@rac2 trace]$ crsctl stat res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rac2                     STABLE
ora.crf
      1        ONLINE  ONLINE       rac2                     STABLE
ora.crsd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.cssd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       rac2                     STABLE
ora.ctssd
      1        OFFLINE OFFLINE                               Wrong check return.,
                                                             STABLE
ora.diskmon
      1        ONLINE  ONLINE       rac2                     STABLE
ora.drivers.acfs
      1        ONLINE  ONLINE       rac2                     STABLE
ora.evmd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.gipcd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.gpnpd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.mdnsd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.storage
      1        ONLINE  ONLINE       rac2                     STABLE
--------------------------------------------------------------------------------
[grid@rac2 trace]$ ps -ef | grep octssd.bin
root 374327 187266 0 10:36 pts/0 00:00:00 grep --color=auto octssd.bin
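The "Wrong check return." state detail means the agent's check action for the resource is failing, rather than the daemon itself having crashed. As a first step, the resource's full attribute list can be dumped for clues (standard crsctl flags; output omitted here):
[grid@rac2 trace]$ crsctl stat res ora.ctssd -init -f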
Checking the logs turned up nothing obviously abnormal:
[grid@rac2 trace]$ pwd
/u01/app/grid/diag/crs/rac2/crs/trace
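The CTSS daemon traces to octssd.trc under this directory, and cluster-wide messages go to alert.log in the same place; a quick scan for recent errors (file names assume the standard ADR layout):
[grid@rac2 trace]$ grep -iE "error|fail|abort" octssd.trc | tail -20
[grid@rac2 trace]$ grep -i ctss alert.log | tail -20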
Trying to restart the service:
[grid@rac2 trace]$ crsctl start res ora.ctssd -init
CRS-2672: Attempting to start 'ora.ctssd' on 'rac2'
CRS-2676: Start of 'ora.ctssd' on 'rac2' succeeded
[grid@rac2 trace]$ crsctl stat res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rac2                     STABLE
ora.crf
      1        ONLINE  ONLINE       rac2                     STABLE
ora.crsd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.cssd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       rac2                     STABLE
ora.ctssd
      1        ONLINE  ONLINE       rac2                     OBSERVER,STABLE
ora.diskmon
      1        ONLINE  ONLINE       rac2                     STABLE
ora.drivers.acfs
      1        ONLINE  ONLINE       rac2                     STABLE
ora.evmd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.gipcd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.gpnpd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.mdnsd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.storage
      1        ONLINE  ONLINE       rac2                     STABLE
--------------------------------------------------------------------------------
[grid@rac2 trace]$ ps -ef|grep ctssd.bin
root 21320 1 0 10:46 ? 00:00:00 /u01/app/19.0.0.0/grid/bin/octssd.bin reboot
grid 23276 389208 0 10:46 pts/0 00:00:00 grep --color=auto ctssd.bin
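With ora.ctssd back ONLINE in observer mode, cluster-wide clock synchronization can be re-verified end to end, for example with (standard tools; output omitted):
[grid@rac2 trace]$ crsctl check ctss
[grid@rac2 trace]$ cluvfy comp clocksync -n all -verbose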
Related references; none of the following applied to this case:
http://www.dbaref.com/troubleshooting-rac-issues/whattodoif11gr2clusterwareisunhealthy
Grid Infrastructure Does not Start after Node Reboot as Master octssd.bin Stuck (Doc ID 1215893.1)
SOLUTION
1. Kill the master octssd.bin process so that it can be respawned with a new communication port (as root):
   ps -ef | grep octssd.bin
   kill -9 <pid of octssd.bin>
2. Start CRS on node 3. If processes from the previous attempt are still running, stop them first (as root):
   crsctl stop crs -f
   crsctl start crs
After this, octssd.bin should start and the whole CRS stack should come up.