MariaDB Galera Cluster: reading the log when a node goes down, and how to keep DDL from hanging the whole cluster

2016-07-07 11:05:49 140011799914240 [Note] WSREP: (fb55d9c9, 'tcp://') turning message relay requesting on, nonlive peers: tcp:// 


2016-07-07 11:05:51 140011799914240 [Note] WSREP: (fb55d9c9, 'tcp://') reconnecting to 51f4c07a (tcp://, attempt 0


2016-07-07 11:05:55 140011799914240 [Note] WSREP: evs::proto(fb55d9c9, GATHER, view_id(REG,128a493c,122))suspecting node: 51f4c07a


2016-07-07 11:05:55 140011799914240 [Note] WSREP: evs::proto(fb55d9c9, GATHER, view_id(REG,128a493c,122))suspected node without join message, declaring inactive

  The suspected node never sent a join message, so it is declared down ("declaring inactive").

2016-07-07 11:05:55 140011799914240 [Note] WSREP: declaring 128a493c at tcp:// stable


2016-07-07 11:05:55 140011799914240 [Note] WSREP: Node 128a493c state prim


2016-07-07 11:05:55 140011799914240 [Note] WSREP: view(view_id(PRIM,128a493c,123) memb {



} joined {

} left {

} partitioned {




2016-07-07 11:05:55 140011799914240 [Note] WSREP: save pc into disk

2016-07-07 11:05:55 140011799914240 [Note] WSREP:forgetting 51f4c07a (tcp://

2016-07-07 11:05:55 140011791521536 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2

 The node prints the details of the new primary component.

2016-07-07 11:05:55 140011791521536 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.

2016-07-07 11:05:55 140011799914240 [Note] WSREP: (fb55d9c9, 'tcp://') turning message relay requesting off

2016-07-07 11:05:55 140011791521536 [Note] WSREP: STATE EXCHANGE: sent state msg: b8a08882-43ef-11e6-884e-2beaea270cdb

2016-07-07 11:05:55 140011791521536 [Note] WSREP: STATE EXCHANGE: got state msg: b8a08882-43ef-11e6-884e-2beaea270cdb from 0 (node3)

2016-07-07 11:05:55 140011791521536 [Note] WSREP: STATE EXCHANGE: got state msg: b8a08882-43ef-11e6-884e-2beaea270cdb from 1 (node2)

2016-07-07 11:05:55 140011791521536 [Note] WSREP: Quorum results:

        version    = 4,

        component  = PRIMARY,

        conf_id    = 33,

        members    = 2/2 (joined/total),

        act_id     = 128561512,

        last_appl. = 0,

        protocols  = 0/7/3 (gcs/repl/appl),

        group UUID = afbfeed1-38e8-11e6-84b1-d20f5b484535

The nodes inside the new primary component exchange the new cluster state information and confirm it with each other.
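To cross-check the quorum result from a client session, the queries below are a minimal sketch. It assumes a MariaDB node with the Galera provider loaded, and the mapping to the log fields in the comments is my reading, not something the log states.

```sql
-- Component status of this node: 'Primary' after a successful quorum.
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';

-- Number of members in the current component (2 in the log above).
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';

-- Cluster state UUID; assumed to correspond to "group UUID" in the quorum results.
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_state_uuid';

-- State of this node as seen by the provider, for example 'Synced'.
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';
```

Running the same queries on every surviving node and comparing the values is a quick way to confirm that they all agree on the new view.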

2016-07-07 11:05:55 140011791521536 [Note] WSREP: Flow-control interval: [23, 23]

2016-07-07 11:05:55 140011908541184 [Note] WSREP: New cluster view: global state: afbfeed1-38e8-11e6-84b1-d20f5b484535:128561512, view# 34: Primary, number of nodes: 2, my index: 1, protocol version 3

2016-07-07 11:05:55 140011908541184 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

2016-07-07 11:05:55 140011908541184 [Note] WSREP: REPL Protocols: 7 (3, 2)

2016-07-07 11:05:55 140011808306944 [Note] WSREP: Service thread queue flushed.

2016-07-07 11:05:55 140011908541184 [Note] WSREP: Assign initial position for certification: 128561512, protocol version: 3

2016-07-07 11:05:55 140011808306944 [Note] WSREP: Service thread queue flushed.

2016-07-07 11:05:58 140011799914240 [Note] WSREP:  cleaning up 51f4c07a (tcp://


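  The log above is driven by the timeouts Galera uses to detect a failed node. See the official documentation for the details; the original post included a screenshot of the relevant section here.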
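The timeouts in effect on a node can be inspected as sketched below. The option names in the comments are the EVS settings I assume the referenced documentation covers; the original post does not name them.

```sql
-- Returns all Galera provider options as one long semicolon-separated string.
SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options';

-- Options in that string that relate to failure detection (assumed relevant here):
--   evs.keepalive_period       how often keepalive messages are sent
--   evs.suspect_timeout        silence after which a peer is suspected
--   evs.inactive_timeout       hard limit after which a peer is declared inactive
--   evs.inactive_check_period  how often peers are checked for inactivity
```

In the log above, about six seconds pass between the first "nonlive peers" message (11:05:49) and "declaring inactive" (11:05:55), which is the kind of delay these settings control.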



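Incidentally, running DDL in a Galera cluster under the default Total Order Isolation (TOI) method makes the whole cluster hang while the statement executes. If the DDL takes a long time on a system with a high transaction rate, the result is painful. The approach below avoids this.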

To run an ALTER statement in this manner, on each node run the following queries:


1. Change the Schema Upgrade method to Rolling Schema Upgrade.

  SET wsrep_OSU_method='RSU';


2. Run the ALTER statement.


3. Reset the Schema Upgrade method back to Total Order Isolation.

   SET wsrep_OSU_method='TOI';
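Put together, the three steps might look like the sketch below, run on one node at a time. The table `orders` and column `note` are hypothetical and only there for illustration; the status checks at the end are an addition of mine, not part of the quoted procedure.

```sql
-- Step 1: switch this session to Rolling Schema Upgrade.
SET SESSION wsrep_OSU_method = 'RSU';

-- Step 2: run the DDL locally. Under RSU it is not replicated to other nodes.
-- Hypothetical table and column.
ALTER TABLE orders ADD COLUMN note VARCHAR(255) NULL;

-- Step 3: switch back to Total Order Isolation.
SET SESSION wsrep_OSU_method = 'TOI';

-- Optional check before moving to the next node: the node should report
-- 'Synced' again and its receive queue should have drained.
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';
SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue';
```

Because the nodes briefly run different schemas, this only works safely for changes that stay compatible with writes replicated from nodes that have not been altered yet, such as adding a nullable column.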


