[Ceph study notes] ceph -s shows health HEALTH_ERR, pg remapped

Background

While checking the status of the Ceph cluster, a HEALTH_ERR error appeared. The output was as follows:

[root@node1 ceph]# ceph -s
    cluster 8eaa3f15-0946-4500-b018-6d31d1cc69f6
     health HEALTH_ERR
            clock skew detected on mon.node2, mon.node3
            54 pgs are stuck inactive for more than 300 seconds
            121 pgs peering
            54 pgs stuck inactive
            85 pgs stuck unclean
            Monitor clock skew detected 
     monmap e1: 3 mons at {node1=192.168.209.100:6789/0,node2=192.168.209.101:6789/0,node3=192.168.209.102:6789/0}
            election epoch 266, quorum 0,1,2 node1,node2,node3
     osdmap e5602: 12 osds: 11 up, 11 in; 120 remapped pgs
            flags sortbitwise,require_jewel_osds
      pgmap v16259: 128 pgs, 1 pools, 0 bytes data, 0 objects
            1421 MB used, 54777 MB / 56198 MB avail
                 120 remapped+peering
                   7 active+clean
                   1 peering

As you can see, the pgmap state is abnormal; in a healthy cluster every PG should be active+clean.
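
To see exactly which PGs are stuck and why, the standard health and PG queries can be used. This is only a minimal sketch; the exact output depends on the cluster:

[root@node1 ceph]# ceph health detail            # lists each stuck PG plus the per-monitor clock skew warning
[root@node1 ceph]# ceph pg dump_stuck inactive   # PGs that have been inactive longer than the threshold
[root@node1 ceph]# ceph pg dump_stuck unclean    # PGs that are not yet active+clean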

Resolution

After some investigation, it turned out that the clocks on the three Ceph nodes were out of sync (the NTP server was not set to start on boot), which threw the pgmap into disarray. The fix is simply to correct the time, or to configure the NTP service properly.
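
A quick way to confirm this is to check whether the NTP daemon is running and enabled on each node. The sketch below assumes CentOS 7 with ntpd; adjust accordingly if chronyd is used instead:

[root@node1 ~]# systemctl status ntpd    # is the NTP daemon running?
[root@node1 ~]# systemctl enable ntpd    # make it start automatically on boot
[root@node1 ~]# systemctl start ntpd     # start it now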

Check the time on each node

[root@node1 ceph]# date
Sun Sep  9 21:44:39 EDT 2018

[root@node2 ~]# date
Tue Sep  4 21:37:10 EDT 2018

[root@node3 ~]# date
Sun Sep  9 21:51:39 EDT 2018
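
node2 is several days behind the other two nodes. One way to bring it back in line immediately is a one-shot sync with ntpdate before restarting ntpd. This is only a sketch: the choice of node1 as the NTP server is an assumption about this particular setup, not something stated in the notes:

[root@node2 ~]# systemctl stop ntpd    # ntpdate cannot run while ntpd is holding the NTP port
[root@node2 ~]# ntpdate node1          # one-shot sync against the NTP server (assumed to be node1 here)
[root@node2 ~]# systemctl start ntpd   # resume continuous synchronization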

After enabling and starting the NTP service, the clocks synchronized (a quick way to verify this is sketched below). Checking the Ceph status again shows the cluster back to normal.
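
Before re-checking Ceph, the synchronization state can be verified, for example (again assuming ntpd is the time service):

[root@node2 ~]# ntpq -p              # peer list; a '*' in front of a server means it is selected and synced
[root@node2 ~]# ceph health detail   # the clock skew warning should clear once the monitors agree on the time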

[root@node3 ~]# ceph -s
    cluster 8eaa3f15-0946-4500-b018-6d31d1cc69f6
     health HEALTH_OK
     monmap e1: 3 mons at {node1=192.168.209.100:6789/0,node2=192.168.209.101:6789/0,node3=192.168.209.102:6789/0}
            election epoch 278, quorum 0,1,2 node1,node2,node3
     osdmap e5647: 12 osds: 11 up, 11 in
            flags sortbitwise,require_jewel_osds
      pgmap v16402: 128 pgs, 1 pools, 0 bytes data, 0 objects
            1374 MB used, 54824 MB / 56198 MB avail
                 128 active+clean
