引言

Sentinel 是 Redis 的高可用解决方案。

Redis Sentinel文档里介绍了 Sentinel 在 Redis 系统中的几大功能：

监控（Monitoring）：定期检查 Redis 实例是否工作正常
通知（Notification）：当发现 Redis 实例工作异常时，通知系统管理员，主要是通过发布订阅与配置脚本两种方式
自动故障切换（Automatic failover）：当 Master 出现问题时，能够自动从 slaves 中选择一个成为新的 Master，并且使用 Redis 的应用也会被通知新 Master 的地址
配置提供者（Configuration provider）：类似服务发现的功能，客户端可以先连接它，通过它获得 Master 或者 Slave 的地址，然后再去连接相应的 Redis 实例

因为内容比较多，所以分两篇来介绍。

Sentinel 的启动

Sentinel 服务支持的命令和普通 Redis 实例不太一样，所以他会先请空 Redis 原本自己的命令表，将自己的命令表插入进行，sentinel.c: 645：

    // 清空 Redis 服务器的命令表（该表用于普通模式）
    dictEmpty(server.commands,NULL);
    // 将 SENTINEL 模式所用的命令添加进命令表
    for (j = 0; j < sizeof(sentinelcmds)/sizeof(sentinelcmds[0]); j++) {
        int retval;
        struct redisCommand *cmd = sentinelcmds+j;

        retval = dictAdd(server.commands, sdsnew(cmd->name), cmd);
        redisAssert(retval == DICT_OK);
    }

Sentinel 模式下命令表的内容如下：

// 服务器在 sentinel 模式下可执行的命令
struct redisCommand sentinelcmds[] = {
    {"ping",pingCommand,1,"",0,NULL,0,0,0,0,0},
    {"sentinel",sentinelCommand,-2,"",0,NULL,0,0,0,0,0},
    {"subscribe",subscribeCommand,-2,"",0,NULL,0,0,0,0,0},
    {"unsubscribe",unsubscribeCommand,-1,"",0,NULL,0,0,0,0,0},
    {"psubscribe",psubscribeCommand,-2,"",0,NULL,0,0,0,0,0},
    {"punsubscribe",punsubscribeCommand,-1,"",0,NULL,0,0,0,0,0},
    {"publish",sentinelPublishCommand,3,"",0,NULL,0,0,0,0,0},
    {"info",sentinelInfoCommand,-1,"",0,NULL,0,0,0,0,0},
    {"shutdown",shutdownCommand,-1,"",0,NULL,0,0,0,0,0}
};

可以发现支持的大多都是发布订阅相关的命令，主要用于当监控的 Redis 实例发现异常时，通知订阅者。

Sentinel 默认监听的端口号是 26379，也普通 Redis 监听的端口号哦 6379 也不太一样。

Sentinel 的配置文件

Sentinel 的启动要求必须给定一个配置文件，而且还比如对这个文件有写权限：

redis-server /path/to/sentinel.conf --sentinel

Sentinel 基本上是把这个配置文件当成数据库用了，有什么状态变化（比如新发现了一个 Slave）都会调用sentinelFlushConfig函数写进这个文件中，所以必须具有写权限：

void sentinelFlushConfig(void) {
    //...
}

配置文件的解析函数位于，sentinel.c:1842：

// Sentinel 配置文件分析器
char *sentinelHandleConfiguration(char **argv, int argc) {
     //...
}

Sentinel 的主函数

Sentinel 相比普通的 Redis Server 要更主动一点，它必须要主动监控 Redis 实例的当前状态。它的主动行为主要是依靠在之前提到过的 serverCron 中每 100ms 执行一次的 sentinelTimer函数实现的,redis.c:1549：

    run_with_period(100) {
        if (server.sentinel_mode) sentinelTimer();
    }

Redis 实例在 Sentinel 中的内存表示

除了本 Sentinel 以外，和本 Sentinel 相关的所有 Master, Slaves 以及其他 Sentinels 都会在内存被表示为一个 createSentinelRedisInstance 结构体：

typedef struct sentinelRedisInstance {
    //...
    
    // 每个 RedisInstance 都有一个名字
    char *name;

    // 其他同样监控这个主服务器的所有 sentinel
    dict *sentinels;

    // 如果这个实例代表的是一个主服务器
    // 那么这个字典保存着主服务器属下的从服务器
    // 字典的键是从服务器的名字，字典的值是从服务器对应的 sentinelRedisInstance 结构
    dict *slaves;


    // 主服务器的实例（在本实例为从服务器时使用）
    struct sentinelRedisInstance *master;
    
    //...

} sentinelRedisInstance;

本 Sentinel 相关的信息则保存在 sentinel 结构体中：

struct sentinelState {

    //...

    // 保存了所有被这个 sentinel 监视的主服务器
    // 字典的键是主服务器的名字
    // 字典的值则是一个指向 sentinelRedisInstance 结构的指针
    dict *masters;
    
    //...

} sentinel;

在 sentinel 结构体中保存了它监控的所有 Master。总体的引用关系看起来如下：

sentinel 的 RedisInstances

总得来说就是 sentinel 维护了所有的 Master，然后 Master 维护了它所有的 slave 以及所有正在监控该 Master 的 sentinels（除了本 Sentinel），这些结构体也反过来维护者一个到它自己的 Master 的指针。虽然我画得像数组，其实上面这些存储结构全都是字典，key 就是该 Redis 实例的名字（就是 sentinelRedisInstance 结构体的 name 字段），Redis 实例的命名规则如下：

Master：在配置文件中用 sentinel monitor xxxx 配置的名字
Slave 和 Sentinel：使用 <ip>:<port> 作为名字，比如 192.168.0.1:6380，如果 ipv6 地址的话，则是 [::1]:port

所有的 sentinelRedisInstance 都是通过 createSentinelRedisInstance 函数创建的，给 slave 和 sentinel 生成一个名字，同时还会顺便以名字为 key 将其放到它所应该在字典中：

sentinelRedisInstance *createSentinelRedisInstance(char *name, int flags, char *hostname,
        int port, int quorum, sentinelRedisInstance *master) {
    //...
    // 选择要添加的字典
    // 注意主服务会被添加到 sentinel.masters 字典
    // 而从服务器和 sentinel 则会被添加到 master 所属的 slaves 字典和 sentinels 字典中
    if (flags & SRI_MASTER) table = sentinel.masters;
    else if (flags & SRI_SLAVE) table = master->slaves;
    else if (flags & SRI_SENTINEL) table = master->sentinels;
    
    //...
}

自动发现 Slaves 和其他 Sentinels

Sentinel 的配置文件写起来非常简单，只要配置 Master 的地址即可：

sentinel monitor mymaster 127.0.0.1 6379 2

之后 Redis 就能自行发现该 Master 的 Slaves 以及正在监听该 Master 的其它 Sentinel 了。

下面我们具体分析其自动发现的流程。

在解析上面 monitor 命令时，会根据配置创建一个 sentinelRedisInstance 结构体存入全局的 sentinel.masters 字典中，key 就是它的名字，比如上面的配置就是 mymaster，sentinel.c:1856：

        // 创建主服务器实例
        if (createSentinelRedisInstance(argv[1],SRI_MASTER,argv[2],
                                        atoi(argv[3]),quorum,NULL) == NULL)

之后在 sentinelTimer 中会遍历所有配置的 master，sentinel.c:5039：

    sentinelHandleDictOfRedisInstances(sentinel.masters);

然后会给每个 master 创建两条连接，一条是 pc，一条是 cc：

pc: pubsub connect 的简称，用于订阅 __sentinel__:hello 频道，自动发现 Sentinel。所有监听同一个 Redis 实例的 Sentinel 都会在这个频道定时发布，让别的 Sentinel 发现自己。
cc：command connect 的简称，用于给 Redis 发送 Redis 命令，之所以要开一个新的连接发送命令，是因为 Redis 不允许处在订阅状态的连接执行命令

创建连接的源码如下，sentinel.c:2298：

void sentinelReconnectInstance(sentinelRedisInstance *ri) {
    //...
    if (ri->cc == NULL) {
        // 异步创建 cc
        //...
    }
    
    // 对主服务器和从服务器，创建一个用于订阅频道的连接
    if ((ri->flags & (SRI_MASTER|SRI_SLAVE)) && ri->pc == NULL) {
        // 异步创建 pc
        //...
        // 在 pc 创建成功后会紧接着订阅 __sentinel__:hello
        // 不用担心 pc 还没有异步创建成功，这里不会真的发送命令
        // 只是将命令暂存到了缓冲中，等到 pc 建立成功才会发送过去
        retval = redisAsyncCommand(ri->pc,
            sentinelReceiveHelloMessages, NULL, "SUBSCRIBE %s",
                SENTINEL_HELLO_CHANNEL);
    }
    // 如果实例是主服务器或者从服务器，那么当 cc 和 pc 两个连接都创建成功时，关闭 DISCONNECTED 标识
    // 如果实例是 Sentinel ，那么当 cc 连接创建成功时，关闭 DISCONNECTED 标识
    if (ri->cc && (ri->flags & SRI_SENTINEL || ri->pc))
        ri->flags &= ~SRI_DISCONNECTED;
}

pc 和 cc 都建立成功后就会关闭该 Master 的 SRI_DISCONNECTED 标识。在上面的代码，虽然我没有贴出来，Redis 给 pc 和 cc 的异步连接成功都设置了回调函数，一旦连接失败，会再次将 SRI_DISCONNECTED 置位。并且还会使用 CLIENT SETNAME 命令，将自己的客户端名字设置为 sentinel-<runid>-pubsub 这种形式，可以在 master 上执行 CLIENT LIST，并且在 name 字段看到相应的名字。

在使用 SUBSCRIBE 命令订阅 __sentinel__:hello 频道时还设置了一个回调函数，通过这个回调函数接收其他 Sentinel 发送的 hello 信息就可以发现正在监控该 Master 的其他 Sentinel，sentinel.c:2868：

void sentinelProcessHelloMessage(char *hello, int hello_len) {
    if (numtokens == 8) {
        // 查看该 sentinel 是否已知，避免 sentinel 重复添加
        // 遍历查找, 必须要 runid ip port 完全一样才行
        // runid 不同也不算
        si = getSentinelRedisInstanceByAddrAndRunID(
                        master->sentinels,token[0],port,token[2]);
        //...
        if (!si) {
            // 创建新发现的 Sentinel 的 RedisInstance
            si = createSentinelRedisInstance(NULL,SRI_SENTINEL,
                            token[0],port,master->quorum,master);
           if (si) {
               //...
               // 将最新的配置刷到硬盘
               sentinelFlushConfig();
           }
           //...
        }
        //...
    }
    //...
}

Sentinel 每隔 1s 会向 Master 发送一条 INFO 命令，询问情况，sentinel.c:3097：

        retval = redisAsyncCommand(ri->cc,
            sentinelInfoReplyCallback, NULL, "INFO");

它注册了一个回调函数，就在这个回调函数里，它通过 INFO 的返回信息自动发现 Slaves：

            // 如果发现有新的从服务器出现，则添加进 master.slaves
            if (sentinelRedisInstanceLookupSlave(ri,ip,atoi(port)) == NULL) {
                // 在创建 slave 的同时会将其加入 ri->slaves 中
                if ((slave = createSentinelRedisInstance(NULL,SRI_SLAVE,ip,
                            atoi(port), ri->quorum, ri)) != NULL)
                {
                    sentinelEvent(REDIS_NOTICE,"+slave",slave,"%@");
                }
            }

在 Sentinel 的定时函数中，会顺着 Master 对所有相关的 Slaves，Sentinels，sentinel.c:3830：

void sentinelHandleDictOfRedisInstances(dict *instances) {
    di = dictGetIterator(instances);
    while((de = dictNext(di)) != NULL) {
    
        //...
        if (ri->flags & SRI_MASTER) {

            // 所有从服务器
            sentinelHandleDictOfRedisInstances(ri->slaves);

            // 所有 sentinel
            sentinelHandleDictOfRedisInstances(ri->sentinels);
    }
}

通知

我们可以用一个普通的 Redis 客户端对着 Sentinel 上执行 subscribe +slave，这样每当发现新的 slave 就能够收到消息。除了 +slave 还有其他很多可以订阅的事件，具体见官方文档。

之前我们在看源码的时候，已经看到一个 sentinelEvent 函数调用，其实这个函数就是用来发布事件的，比如 sentinelEvent(REDIS_NOTICE,"+slave",slave,"%@");，就是发布一个 +slave 事件：

void sentinelEvent(int level, char *type, sentinelRedisInstance *ri,
                   const char *fmt, ...) {
    //...
}

稍微浏览一下就发现，这个函数主要做的事情就是打印 log，发布消息，执行通知脚本。

所谓通知脚本，就是在 conf 中通过 sentinel notification-script mymaster /var/redis/notify.sh 配置的脚本，如果事件的 level 达到 REDIS_WARNING 级别，就会执行配置的脚本，达到这个级别的基本都是一些异常事件，比如 +sdown，+odown 这种表示服务器下线的事件。

监控

监控 Master

监控的方式很简单，无非就是每秒发送一次 Redis 的 Ping 命令，sentinel.c:2378：

        /* Send PING to all the three kinds of instances. */
        sentinelSendPing(ri);

点进去上面的方法，就能找到 Ping 的回调函数，sentinel.c:2052：

void sentinelPingReplyCallback(redisAsyncContext *c, void *reply, void *privdata) {
        //...

        
        if (strncmp(r->str,"PONG",4) == 0 ||
            strncmp(r->str,"LOADING",7) == 0 ||
            strncmp(r->str,"MASTERDOWN",10) == 0)
        {
            // PING 返回 PONG LOADING 或者 MASTERDOWN 都认为正常
            ri->last_avail_time = mstime();
            ri->last_ping_time = 0;
        } else {
            // 运行异常
            //...
            
            if (strncmp(r->str,"BUSY",4) == 0 &&
                (ri->flags & SRI_S_DOWN) &&
                !(ri->flags & SRI_SCRIPT_KILL_SENT))
            {
                // 如果发现是因为脚本执行时间过长导致
                // 还会发送 SCRIPT KILL 命令帮助停止
                //...
            }
        }
        
        // last_pong_time 无论如何都会刷新
        // 每隔 1s 也是以 last_pong_time 为间隔的
        ri->last_pong_time = mstime();
}

三种认为是正常的 PING 响应及含义如下：

+PONG：正常执行的响应
-MASTERDOWN Link with MASTER is down and slave-serve-stale-data is set to 'no'.：所谓 MASTERDOWN，只有 Slave 有可能返回这个，因为如果你给 Slave 的 slave-serve-stale-data 的配置设置为 no 的话，一旦和主服务器连接超时，Slave 就不允许执行命令了（除了 INFO 和 SLAVEOF），执行任何命令都会返回这个错误信息。
-LOADING Redis is loading the dataset in memory：Redis 正在从 rdb 初始化数据库，此时无法执行任何命令，执行任何命令都会返回这个错误。

返回其他值则认为有异常，如果发现 PING 返回了 -BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.，Sentinel 还会帮忙停止脚本。

发现运行正常则直接将对应的 RedisInstance 的 last_avail_time 更新，同时把 last_ping_time 置 0，为什么要将 last_ping_time 置 0 呢？其实是让下一次发送 PING 命令时将其更新，sentinel.c:3035：

int sentinelSendPing(sentinelRedisInstance *ri) {
      //....
        // 只有上次 PING 返回成功了
        // 才会将 last_ping_time 改成这一次发送 PING 的时间
        if (ri->last_ping_time == 0) ri->last_ping_time = mstime();
        //....
}

所以 last_ping_time 没有表面上的那么好理解，它并不是上一次发送 PING 命令的时间，而是，而是最近一次收到正确的 PING 回复后，紧接着的一次发送 PING 命令的时间。

除了发送 PING 以外，还要进行超时检测，主要在 sentinelCheckSubjectivelyDown 函数：

void sentinelCheckSubjectivelyDown(sentinelRedisInstance *ri) {

    mstime_t elapsed = 0;

    if (ri->last_ping_time)
        // elapsed 下面会被用于超时判断
        elapsed = mstime() - ri->last_ping_time;
    
    // 如果 cc 长时间没有反应，则将其 kill 掉，下一轮重连
    if (ri->cc &&
        //...
    {
        //...
    }
    
    // 如果 pc 长时间没有反应，则将其 kill 掉，下一轮重连
    if (ri->pc && //...
    {
        if ((ri->flags & SRI_S_DOWN) == 0) { // S_DOWN 还没有被设置
            //...
            // 打开 SDOWN 标志
            ri->flags |= SRI_S_DOWN;
        }
    }
    
    // 超时
    if (elapsed > ri->down_after_period || //...
    {  
        //...
    }
}

超时的标准是距离 last_ping_time 时间已经超过了 down_after_period。其中 down_after_period ，就是在 conf 中配置的 sentinel down-after-milliseconds mymaster 60000 的值，如果不配置的话，默认是 30000，也就是 30s，由该 Master 自动发现的 Slaves 和 Sentinels 也会继承这个配置。

超时后就会被打开 SRI_S_DOWN 标志，这个 S_DOWN 是 subjective down（主观下线）的意思，即本 Sentinel 认为该 Redis 实例已经下线了。

如果是 Master 的话，还需要进一步判断是否进入 O_ODWN 状态，即客观下线（objective down）,必须要超过一定数目（quorum，即配置文件中的 sentinel monitor mymaster 127.0.0.1 6380 2 里面的 2）的 Sentinel 认为其已经下线，实例才能进入 O_DOWN 状态，sentinel.c:3798：

    // 如果当前 Sentinel 将主服务器判断为主观下线
    // 那么检查是否有其他 Sentinel 同意这一判断
    // 当同意的数量足够时，将主服务器判断为客观下线
    if (master->flags & SRI_S_DOWN) {
        /* Is down for enough sentinels? */

        // 统计同意的 Sentinel 数量（起始的 1 代表本 Sentinel）
        quorum = 1; /* the current sentinel. */

        /* Count all the other sentinels. */
        // 统计其他认为 master 进入下线状态的 Sentinel 的数量
        di = dictGetIterator(master->sentinels);
        while((de = dictNext(di)) != NULL) {
            sentinelRedisInstance *ri = dictGetVal(de);
                
            // 该 SENTINEL 也认为 master 已下线
            if (ri->flags & SRI_MASTER_DOWN) quorum++;
        }
        dictReleaseIterator(di);
        
        // 如果投票得出的支持数目大于等于判断 ODOWN 所需的票数
        // 那么进入 ODOWN 状态
        if (quorum >= master->quorum) odown = 1;
    }

从上面代码可以看出，当前 Sentinel 并没有通过网络去询问其他 Sentinel 是否认为 Master 已经下线，然后直接用了内存中缓存的结构，那又是什么时候询问的其他 Sentinel 的意见呢？它会在 sentinelAskMasterStateToOtherSentinels 函数调用中向其他每一个 Sentinel 发送一条 SENTINEL is-master-down-by-addr <ip> <port> * 来询问其他 Sentinel 对该 Master 的看法：

        // 如果有需要的话，向其他 Sentinel 发送 SENTINEL is-master-down-by-addr 命令(目的是刷新其他 Sentinel 状态)
        // 刷新其他 Sentinel 关于主服务器的状态
        sentinelAskMasterStateToOtherSentinels(ri,SENTINEL_NO_FLAGS);

监控 Sentinel

虽然在代码上对 Sentinel 也会有 PING 检测，也会进入 S_DOWN 状态，但是 Sentinel 的掉线却不是通过这个状态来判断，这可能也是 Redis 代码书写上有些不一致的地方。

Sentinel 的掉线是通过计算距离上一次它回复 SENTINEL is-master-down-by-addr 的时间，如果超过 5s 则认为其掉线，清楚它的一些信息（并不会删除它，Sentinel 的数目是永远不会减少的），sentinel.c:3920：

        mstime_t elapsed = mstime() - ri->last_master_down_reply_time;

        //...

        /* If the master state from other sentinel is too old, we clear it. */
        // 如果目标 Sentinel 关于主服务器的信息已经太久（5s）没更新，那么我们清除它
        if (elapsed > SENTINEL_ASK_PERIOD*5) {
            ri->flags &= ~SRI_MASTER_DOWN;
            sdsfree(ri->leader);
            ri->leader = NULL;
        }

之后这个 Sentinel 虽然还在内存中，但是它已经没有任何自己的观点了，比如它不会给 Master 下线投票的。

监控 Slave

PING 检测，如果进入 S_DOWN 状态，那么在故障切换的过程中，就不会再考虑这个 Slave了，sentinel.c:4393s：

        // 忽略所有 SDOWN 、ODOWN 或者已断线的从服务器
        if (slave->flags & (SRI_S_DOWN|SRI_O_DOWN|SRI_DISCONNECTED)) continue;