Redis Cluster Analysis (29)

1. Subjective and Objective Down

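Parts (27) and (28) analyzed how a Sentinel discovers replicas and other Sentinels. The remaining three features (subjective/objective down detection, leader election, and failover) are closely related: down detection comes first and leads step by step into the other two.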

Subjective and Objective Down

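Subjective down and objective down are states that a Sentinel assigns to the other servers it monitors. Subjective down (SDOWN) applies to every monitored instance, while objective down (ODOWN) applies only to masters. In the sentinelHandleRedisInstance function mentioned in (26), subjective down is handled by sentinelCheckSubjectivelyDown, whose body is as follows: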

/* Is this instance down from our point of view? */
void sentinelCheckSubjectivelyDown(sentinelRedisInstance *ri) {
    mstime_t elapsed = 0;

    if (ri->link->act_ping_time)
        elapsed = mstime() - ri->link->act_ping_time;
    else if (ri->link->disconnected)
        elapsed = mstime() - ri->link->last_avail_time;

    /* Check if we are in need for a reconnection of one of the
     * links, because we are detecting low activity.
     *
     * 1) Check if the command link seems connected, was connected not less
     *    than SENTINEL_MIN_LINK_RECONNECT_PERIOD, but still we have a
     *    pending ping for more than half the timeout. */
    if (ri->link->cc &&
        (mstime() - ri->link->cc_conn_time) >
        SENTINEL_MIN_LINK_RECONNECT_PERIOD &&
        ri->link->act_ping_time != 0 && /* There is a pending ping... */
        /* The pending ping is delayed, and we did not receive
         * error replies as well. */
        (mstime() - ri->link->act_ping_time) > (ri->down_after_period/2) &&
        (mstime() - ri->link->last_pong_time) > (ri->down_after_period/2))
    {
        instanceLinkCloseConnection(ri->link,ri->link->cc);
    }

    /* 2) Check if the pubsub link seems connected, was connected not less
     *    than SENTINEL_MIN_LINK_RECONNECT_PERIOD, but still we have no
     *    activity in the Pub/Sub channel for more than
     *    SENTINEL_PUBLISH_PERIOD * 3.
     */
    if (ri->link->pc &&
        (mstime() - ri->link->pc_conn_time) >
         SENTINEL_MIN_LINK_RECONNECT_PERIOD &&
        (mstime() - ri->link->pc_last_activity) > (SENTINEL_PUBLISH_PERIOD*3))
    {
        instanceLinkCloseConnection(ri->link,ri->link->pc);
    }

    /* Update the SDOWN flag. We believe the instance is SDOWN if:
     *
     * 1) It is not replying.
     * 2) We believe it is a master, it reports to be a slave for enough time
     *    to meet the down_after_period, plus enough time to get two times
     *    INFO report from the instance. */
    if (elapsed > ri->down_after_period ||
        (ri->flags & SRI_MASTER &&
         ri->role_reported == SRI_SLAVE &&
         mstime() - ri->role_reported_time >
          (ri->down_after_period+SENTINEL_INFO_PERIOD*2)))
    {
        /* Is subjectively down */
        if ((ri->flags & SRI_S_DOWN) == 0) {
            sentinelEvent(LL_WARNING,"+sdown",ri,"%@");
            ri->s_down_since_time = mstime();
            ri->flags |= SRI_S_DOWN;
        }
    } else {
        /* Is subjectively up */
        if (ri->flags & SRI_S_DOWN) {
            sentinelEvent(LL_WARNING,"-sdown",ri,"%@");
            ri->flags &= ~(SRI_S_DOWN|SRI_SCRIPT_KILL_SENT);
        }
    }
}

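The function looks long but is actually simple. It has two parts. The first part consists of the two if statements that test ri->link->cc and ri->link->pc; they check whether the two links created in (27) (the command link and the Pub/Sub link) are healthy, and close a link that is not so that it can be reconnected. The second part is the final if statement, which decides whether the instance is subjectively down.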

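Note that the subjective-down check itself sends no commands to the server. Commands are sent by the functions analyzed in (28); this check only draws a conclusion from the replies already received.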

With subjective down covered, we move on to objective down.

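Before reading the source, it helps to understand the mechanism behind objective down. In Sentinel mode the Sentinels themselves form a group. When one of them considers the master down, the master has not necessarily failed: the network between that particular Sentinel and the master may be broken while the master remains reachable from everyone else and keeps serving requests normally. To guard against this, the group requires a certain number of Sentinels to consider the master subjectively down before the master is treated as really down, that is, objectively down. That number comes from the configuration file: the last argument of the sentinel monitor directive, quorum, specifies it.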

sentinel monitor <master-name> <ip> <redis-port> <quorum>

With the mechanism in mind, let us look at the Redis source. Still inside sentinelHandleRedisInstance, objective down is handled by sentinelCheckObjectivelyDown, which is implemented as follows:

/* Is this instance down according to the configured quorum?
 *
 * Note that ODOWN is a weak quorum, it only means that enough Sentinels
 * reported in a given time range that the instance was not reachable.
 * However messages can be delayed so there are no strong guarantees about
 * N instances agreeing at the same time about the down state. */
void sentinelCheckObjectivelyDown(sentinelRedisInstance *master) {
    dictIterator *di;
    dictEntry *de;
    unsigned int quorum = 0, odown = 0;

    if (master->flags & SRI_S_DOWN) {
        /* Is down for enough sentinels? */
        quorum = 1; /* the current sentinel. */
        /* Count all the other sentinels. */
        di = dictGetIterator(master->sentinels);
        while((de = dictNext(di)) != NULL) {
            sentinelRedisInstance *ri = dictGetVal(de);

            if (ri->flags & SRI_MASTER_DOWN) quorum++;
        }
        dictReleaseIterator(di);
        if (quorum >= master->quorum) odown = 1;
    }

    /* Set the flag accordingly to the outcome. */
    if (odown) {
        if ((master->flags & SRI_O_DOWN) == 0) {
            sentinelEvent(LL_WARNING,"+odown",master,"%@ #quorum %d/%d",
                quorum, master->quorum);
            master->flags |= SRI_O_DOWN;
            master->o_down_since_time = mstime();
        }
    } else {
        if (master->flags & SRI_O_DOWN) {
            sentinelEvent(LL_WARNING,"-odown",master,"%@");
            master->flags &= ~SRI_O_DOWN;
        }
    }
}

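Like the subjective-down function, this one only checks whether the conditions for objective down are met. It does not communicate with other servers; that happens elsewhere.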

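The code has two parts. The first is the if (master->flags & SRI_S_DOWN) block, which runs only when this Sentinel already considers the master subjectively down and counts whether enough Sentinels agree. The second is the if (odown) block, which sets or clears the SRI_O_DOWN flag according to that result.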

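The quorum check inside the first block is straightforward. It first obtains an iterator over master->sentinels. As described in (28), each Sentinel discovered for this master is instantiated and stored in a dictionary, and master->sentinels is that dictionary. The while loop then walks the dictionary, and for every Sentinel whose SRI_MASTER_DOWN flag is set (meaning it has also reported the master as down) it increments quorum by 1. Finally, the counted quorum is compared with the configured master->quorum: if it is greater than or equal to the configured value, the master is objectively down.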
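The counting rule can be summarised in a few lines. This is a simplified sketch, not the Redis code: toy_is_odown is a hypothetical helper, and the dictionary of Sentinel instances is replaced by a plain array of booleans in which each entry plays the role of one Sentinel's SRI_MASTER_DOWN flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Decide ODOWN from this Sentinel's own SDOWN verdict and the
 * "master is down" votes remembered for the other Sentinels. */
bool toy_is_odown(bool self_sdown, const bool *peer_votes, size_t n_peers,
                  unsigned int configured_quorum) {
    assert(peer_votes != NULL || n_peers == 0);
    if (!self_sdown) return false;  /* no ODOWN without our own SDOWN */
    unsigned int votes = 1;         /* the current Sentinel counts itself */
    for (size_t i = 0; i < n_peers; i++)
        if (peer_votes[i]) votes++;
    return votes >= configured_quorum;  /* ">=", not ">" */
}
```

For example, with two other Sentinels of which one agrees, the count is 2: a configured quorum of 2 yields objective down, while a quorum of 3 does not. If the current Sentinel does not itself see the master as subjectively down, the result is false regardless of the other votes.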
