Chapter 18 Sentinel: Raft Leader Election and the Failover State Machine

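Redis Sentinel is the official high-availability solution for Redis: it monitors a master-replica deployment, detects failures automatically, and performs failover. This chapter examines Sentinel's three periodic tasks, the subjective and objective down checks, the Raft-style leader election, and each stage of the failover state machine.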

18.1 Sentinel Architecture Overview

A standard deployment has three Sentinels, one master, and two replicas:

┌──────────────────────────────────────────────────────┐
│  Sentinel nodes (odd count recommended, minimum 3)   │
│  [Sentinel-1]  [Sentinel-2]  [Sentinel-3]            │
└──────┬──────────────┬──────────────┬─────────────────┘
       │              │              │
       ▼              ▼              ▼
┌──────────────────────────────────────────────────────┐
│  Redis nodes                                         │
│  [Master:6379] ──replication──► [Slave-1:6380]       │
│                ──replication──► [Slave-2:6381]       │
└──────────────────────────────────────────────────────┘

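A Sentinel is itself a Redis process (redis-sentinel) that listens on port 26379 by default. It stores no application data; its only job is monitoring and coordination.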

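18.2 The Three Periodic Tasks (sentinel.c)

Sentinel relies on three periodic tasks for automatic discovery and health checking.

Task 1: PING every second (liveness check)

Sentinel sends PING to every instance it knows about:

The monitored master
All discovered replicas
All other known Sentinels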

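/* sentinel.c - sentinelSendPing(), lightly simplified */
int sentinelSendPing(sentinelRedisInstance *ri) {
    /* Queue an asynchronous PING on the command connection. */
    int retval = redisAsyncCommand(ri->link->cc, sentinelPingReplyCallback,
        ri, "%s", "PING");
    if (retval != C_OK) return 0;
    ri->link->pending_commands++;
    ri->link->last_ping_time = mstime();
    /* act_ping_time marks the oldest PING still awaiting a reply, so it
     * is only set when no earlier PING is outstanding. */
    if (ri->link->act_ping_time == 0)
        ri->link->act_ping_time = ri->link->last_ping_time;
    return 1;
}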

If an instance sends no valid reply within down-after-milliseconds (30 seconds by default), Sentinel marks it as subjectively down (sdown).

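Task 2: INFO every 10 seconds (topology discovery)

Sentinel sends INFO to the master and its replicas to obtain:

The replica list (slave0:ip=...,port=...,offset=...)
The master's replid and repl_offset
The instance role (role: master/slave)

Sample INFO response (master):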
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.1.2,port=6380,state=online,offset=1234560,lag=0
slave1:ip=192.168.1.3,port=6381,state=online,offset=1234100,lag=1

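Why this matters: when a new replica joins, Sentinel discovers it from the master's INFO output, with no manual configuration. Sentinel then opens its own connections to the new replica and starts pinging it, bringing it under monitoring.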
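
To make the discovery step concrete, here is a minimal Python sketch that pulls the replica entries out of an INFO replication section like the one shown above. The function name parse_replicas is hypothetical, and this is not how sentinel.c itself parses INFO; it only illustrates the line format.

```python
def parse_replicas(info_text):
    """Return one dict per slaveN line in an INFO replication section."""
    replicas = []
    for line in info_text.splitlines():
        line = line.strip()
        # Replica entries look like: slave0:ip=...,port=...,state=...,offset=...,lag=...
        if not line.startswith("slave") or ":" not in line:
            continue
        name, _, fields = line.partition(":")
        if not name[len("slave"):].isdigit():
            continue  # skip keys such as slave_repl_offset
        entry = dict(pair.split("=", 1) for pair in fields.split(","))
        entry["port"] = int(entry["port"])
        entry["offset"] = int(entry["offset"])
        replicas.append(entry)
    return replicas


sample = """# Replication
role:master
connected_slaves:2
slave0:ip=192.168.1.2,port=6380,state=online,offset=1234560,lag=0
slave1:ip=192.168.1.3,port=6381,state=online,offset=1234100,lag=1
"""

for replica in parse_replicas(sample):
    print(replica["ip"], replica["port"], replica["offset"])
```

Each returned entry gives a monitor the address it would need in order to start pinging the newly discovered replica.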

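Task 3: Pub/Sub hello every 2 seconds (Sentinel discovery)

Each Sentinel publishes its own details to the __sentinel__:hello channel of the master:

Message format (comma-separated):
<sentinel_ip>,<sentinel_port>,<sentinel_runid>,<current_epoch>,
<master_name>,<master_ip>,<master_port>,<master_config_epoch>

Example: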

192.168.1.10,26379,abc123...,5,mymaster,192.168.1.1,6379,5

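Every Sentinel also subscribes to the same channel. When it receives another Sentinel's message, it adds that Sentinel to its list of known Sentinels and opens a direct connection to it. This lets Sentinels discover one another automatically, so the configuration file does not need to list every Sentinel address.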
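
The eight-field hello format can be decoded with a few lines of Python. This is a sketch under stated assumptions: the Hello tuple and parse_hello are hypothetical names, and the run ID "abc123" is a shortened placeholder rather than a real 40-character run ID.

```python
from typing import NamedTuple


class Hello(NamedTuple):
    sentinel_ip: str
    sentinel_port: int
    sentinel_runid: str
    current_epoch: int
    master_name: str
    master_ip: str
    master_port: int
    master_config_epoch: int


def parse_hello(payload):
    """Split a __sentinel__:hello payload into its eight fields."""
    parts = payload.split(",")
    if len(parts) != 8:
        raise ValueError("expected 8 comma-separated fields, got %d" % len(parts))
    ip, port, runid, epoch, name, m_ip, m_port, m_epoch = parts
    return Hello(ip, int(port), runid, int(epoch), name, m_ip, int(m_port), int(m_epoch))


# Placeholder run ID; real run IDs are much longer.
hello = parse_hello("192.168.1.10,26379,abc123,5,mymaster,192.168.1.1,6379,5")
print(hello.sentinel_runid, hello.master_name, hello.master_config_epoch)
```

A receiver would use sentinel_ip and sentinel_port to connect to the peer, and master_config_epoch to judge whether the announced master address is newer than its own.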

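18.3 Subjectively Down (sdown) and Objectively Down (odown)

Subjectively Down (sdown)

A single Sentinel considers an instance unreachable:

Condition:
- Within down-after-milliseconds of sending a PING, the Sentinel:
  - receives no reply at all
  - or receives only invalid replies (+PONG, -LOADING, and -MASTERDOWN count as valid; anything else does not)

State transitions:
active → sdown (no valid reply before the timeout)
sdown → active (a valid reply arrives)

sdown is only one Sentinel's local opinion and does not trigger a failover by itself.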
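
The sdown rule reduces to a timestamp comparison. The following Python sketch assumes that +PONG, -LOADING, and -MASTERDOWN are the replies accepted as valid; the function names and the sample timeline are hypothetical.

```python
VALID_REPLY_PREFIXES = ("+PONG", "-LOADING", "-MASTERDOWN")


def is_valid_ping_reply(reply):
    """True for replies that prove the instance is reachable."""
    return reply.startswith(VALID_REPLY_PREFIXES)


def is_subjectively_down(now_ms, last_valid_reply_ms, down_after_ms=30000):
    """sdown once no valid reply has been seen for longer than down_after_ms."""
    return now_ms - last_valid_reply_ms > down_after_ms


# Hypothetical timeline: (time in ms, reply text; empty string means no reply)
last_valid = 0
for t_ms, reply in [(1000, "+PONG"), (2000, "-ERR unknown"), (40000, "")]:
    if reply and is_valid_ping_reply(reply):
        last_valid = t_ms
    print(t_ms, is_subjectively_down(t_ms, last_valid))
```

Only the last sample reports sdown, because 39 seconds have passed since the last valid reply at 1000 ms.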

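Objectively Down (odown)

Several Sentinels agree that the instance is unreachable:

Flow:
1. Sentinel S1 marks the master as sdown
2. S1 asks the other Sentinels for their view (SENTINEL is-master-down-by-addr):
   SENTINEL is-master-down-by-addr <ip> <port> <epoch> <runid/*>
3. Each Sentinel replies with its own judgment:
   *3\r\n
   :1\r\n          ← 1 means this Sentinel also considers the master down
   $1\r\n*\r\n     ← leader runid (* when the reply carries no vote)
   :0\r\n          ← leader epoch
4. S1 counts the replies that report down=1
5. Once quorum Sentinels agree (S1 included), the master is marked objectively down (odown)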
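
The counting in steps 4 and 5 can be sketched as follows. This is a simplified model, not Sentinel's code: is_objectively_down is a hypothetical name, the peer names are made up, and each value stands for the first element of a peer's three-element reply.

```python
def is_objectively_down(self_sdown, peer_replies, quorum):
    """Count this Sentinel's own sdown plus peers that answered down=1."""
    if not self_sdown:
        return False
    votes = 1 + sum(1 for down_flag in peer_replies.values() if down_flag == 1)
    return votes >= quorum


# Hypothetical replies from two peers (1 = "I also see it down").
replies = {"sentinel-2": 1, "sentinel-3": 0}
print(is_objectively_down(True, replies, quorum=2))
print(is_objectively_down(True, {"sentinel-2": 0, "sentinel-3": 0}, quorum=2))
```

With quorum=2, one agreeing peer is enough, because the asking Sentinel's own sdown verdict counts as the first vote.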

Recommended quorum: floor(number of Sentinels / 2) + 1

Sentinels	Recommended quorum	Sentinel failures tolerated
3	2	1
5	3	2
7	4	3

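Why sdown → odown requires a quorum

The quorum guards against false positives caused by network partitions. If S1 alone could decide, a fault in S1's own network link would wrongly trigger a failover. Requiring agreement from several Sentinels rules out a single mistaken observer.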

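18.4 Raft Leader Election: Choosing the Failover Leader

Once the master is odown, one Sentinel must be elected leader to carry out the failover, so that several Sentinels do not run conflicting failovers at the same time.

Election preconditions

The master is marked odown
At least 2 × failover-timeout has passed since the last failover attempt for this master (prevents back-to-back failovers)
No other leader is currently running a failover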

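Raft voting flow

1. The candidate Sentinel (usually the one that detected odown) increments current_epoch (its current epoch)
2. The candidate asks the other Sentinels for their vote (is-master-down-by-addr carrying its own runid):
   SENTINEL is-master-down-by-addr <ip> <port> <epoch> <my_runid>
3. Each Sentinel that receives the request:
   - has not voted in this epoch → votes for the candidate and replies leader=<candidate_runid>
   - has already voted → replies with the runid it voted for
4. The candidate counts its votes; with more than half of all Sentinels (floor(N/2) + 1) and at least quorum votes, it becomes leader
5. The leader starts the failover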
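
The voting rules above can be simulated in a few lines. This is a toy model under stated assumptions: Voter and run_election are hypothetical names, messages are delivered instantly, and the candidate always votes for itself.

```python
class Voter:
    """A Sentinel that grants at most one vote per epoch."""

    def __init__(self, runid):
        self.runid = runid
        self.voted_epoch = 0
        self.voted_for = None

    def request_vote(self, epoch, candidate_runid):
        # A request with a newer epoch gets the vote; otherwise repeat the earlier choice.
        if epoch > self.voted_epoch:
            self.voted_epoch = epoch
            self.voted_for = candidate_runid
        return self.voted_for


def run_election(candidate_runid, epoch, voters, quorum):
    """Return True if the candidate wins both a majority and the quorum."""
    total = len(voters) + 1            # peers plus the candidate itself
    votes = 1                          # the candidate votes for itself
    for voter in voters:
        if voter.request_vote(epoch, candidate_runid) == candidate_runid:
            votes += 1
    majority = total // 2 + 1
    return votes >= max(majority, quorum)


peers = [Voter("s2"), Voter("s3")]
print(run_election("s1", 6, peers, quorum=2))   # first request in epoch 6
print(run_election("s4", 6, peers, quorum=2))   # same epoch: peers already voted for s1
```

The second candidate loses because both peers have already spent their epoch-6 vote; it would have to retry with a higher epoch.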

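How epochs prevent confusion

An epoch is a monotonically increasing integer. It guarantees that:
- One epoch never yields two leaders (each Sentinel votes at most once per epoch, and every new election uses a higher epoch)
- A stale leader's decisions cannot override a newer leader's (the higher epoch wins)
- Configuration updates carry an epoch, and the configuration with the highest epoch is the one that is adopted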
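
The "highest epoch wins" rule is a single comparison. In this sketch the dictionaries and the adopt_if_newer helper are hypothetical; they stand in for whatever structure a Sentinel uses to hold a master's address and config epoch.

```python
def adopt_if_newer(current, announced):
    """Keep whichever master configuration carries the higher config epoch."""
    if announced["config_epoch"] > current["config_epoch"]:
        return announced
    return current


current = {"master": ("192.168.1.1", 6379), "config_epoch": 5}
stale = {"master": ("192.168.1.3", 6381), "config_epoch": 4}
fresh = {"master": ("192.168.1.2", 6380), "config_epoch": 6}

current = adopt_if_newer(current, stale)   # ignored: epoch 4 is older than 5
current = adopt_if_newer(current, fresh)   # adopted: epoch 6 is newer than 5
print(current["master"])
```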

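18.5 The Failover State Machine

The leader Sentinel drives the failover through the following state machine (sentinelFailoverStateMachine() in sentinel.c):

State 1: SENTINEL_FAILOVER_STATE_WAIT_START

Waits until the failover may start (the epoch is correct and there is no competing leader).

State 2: SENTINEL_FAILOVER_STATE_SELECT_SLAVE

Selects the best replica. The criteria, from highest to lowest precedence:

replica-priority (slave-priority): lower values win; 0 means the replica is never promoted
Replication offset: a larger offset (more up-to-date data) wins
runid order: if everything else is equal, the lexicographically smaller runid wins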

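/* Pseudocode: order replicas for promotion (best candidate sorts first) */
int compareSlavesForPromotion(sentinelRedisInstance *a, sentinelRedisInstance *b) {
    // 1. Priority (lower is better)
    if (a->slave_priority != b->slave_priority)
        return a->slave_priority - b->slave_priority;
    // 2. Replication offset (larger is better). Compare explicitly: the
    //    offsets are 64-bit, so returning their difference as an int
    //    could truncate and flip the sign.
    if (a->slave_repl_offset > b->slave_repl_offset) return -1;
    if (a->slave_repl_offset < b->slave_repl_offset) return 1;
    // 3. runid in lexicographic order
    return strcmp(a->runid, b->runid);
}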
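
The same ordering can be tried out on sample data in Python. The dictionaries, run IDs, and the select_replica helper are hypothetical; the sketch only applies the three criteria listed above and skips the reachability checks a real Sentinel would also perform.

```python
def promotion_key(replica):
    """Sort key: lower priority first, then larger offset, then smaller runid."""
    return (replica["priority"], -replica["offset"], replica["runid"])


def select_replica(replicas):
    # Priority 0 means "never promote", so those replicas are excluded up front.
    candidates = [r for r in replicas if r["priority"] != 0]
    return min(candidates, key=promotion_key) if candidates else None


replicas = [
    {"runid": "bbb", "priority": 100, "offset": 1234560},
    {"runid": "aaa", "priority": 100, "offset": 1234100},
    {"runid": "ccc", "priority": 0, "offset": 9999999},
]
print(select_replica(replicas)["runid"])
```

Here "bbb" is chosen: "ccc" is excluded by its priority of 0, and between the two remaining replicas with equal priority the larger offset wins before the runid is ever compared.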

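State 3: SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE

Sends SLAVEOF NO ONE (REPLICAOF NO ONE from Redis 5.0 onward) to the selected replica to detach it from the old master:

SLAVEOF NO ONE
+OK

The replica stops replicating and takes on the master role, although the other replicas do not follow it yet.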

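State 4: SENTINEL_FAILOVER_STATE_WAIT_PROMOTION

Waits for the selected replica to report the master role (confirmed by role:master in its INFO output).

Time limit: failover-timeout (on timeout the failover is aborted and a new election is needed).

# During a failover Sentinel polls the replica being promoted with INFO every second:
redis-cli -h selected_slave_ip -p 6380 info replication
# Wait for: role:master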

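State 5: SENTINEL_FAILOVER_STATE_RECONF_SLAVES

Sends SLAVEOF new_master_ip new_master_port to the remaining replicas so that they follow the new master:

SLAVEOF 192.168.1.2 6380
+OK

parallel-syncs limits how many replicas are reconfigured at the same time:

sentinel parallel-syncs mymaster 1
# Reconfigure one replica at a time to limit load on the new master
# With a value of 2, two replicas resynchronize with the new master at once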
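
The effect of parallel-syncs can be pictured as splitting the replicas into groups. This sketch is a simplification: it uses fixed batches, whereas a real Sentinel may start the next replica as soon as one finishes. The function name and the replica labels are hypothetical.

```python
def reconfigure_in_batches(replicas, parallel_syncs):
    """Yield groups of replicas that may resynchronize at the same time."""
    if parallel_syncs < 1:
        raise ValueError("parallel_syncs must be at least 1")
    for start in range(0, len(replicas), parallel_syncs):
        yield replicas[start:start + parallel_syncs]


for batch in reconfigure_in_batches(["replica-1", "replica-2", "replica-3"], 2):
    print(batch)
```

A smaller value keeps more replicas available for reads during the failover, at the cost of a longer total reconfiguration time.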

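State 6: SENTINEL_FAILOVER_STATE_UPDATE_CONFIG

Updates the Sentinel's own configuration file (sentinel.conf) to record the new master address:

# sentinel.conf after the update:
sentinel monitor mymaster 192.168.1.2 6380 2    ← new master address
sentinel config-epoch mymaster 6                ← new epoch
sentinel leader-epoch mymaster 6

At the same time, clients subscribed to the +switch-master channel are notified over Pub/Sub.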

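Step 7: Demoting the old master when it returns

This step is not a SENTINEL_FAILOVER_STATE_* constant in sentinel.c; it happens after the state machine has finished. When the old master comes back online, Sentinel sends it SLAVEOF new_master_ip new_master_port, demoting it to a replica of the new master.

Flow:
Old master restarts → Sentinel detects it (PING replies resume)
→ Sentinel sends SLAVEOF 192.168.1.2 6380 to the old master
→ The old master resynchronizes with the new master as a replica (usually a full resync)
→ The deployment is back to normal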
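
The progression through the failover states can be modelled as a small enum. This is a sketch, assuming the SENTINEL_FAILOVER_STATE_* constants are numbered 0 to 6 as in sentinel.c; it models only states 1 through 6 plus the idle state, and treats the demotion of the old master as a separate follow-up. The next_state helper is hypothetical.

```python
from enum import IntEnum


class FailoverState(IntEnum):
    NONE = 0
    WAIT_START = 1
    SELECT_SLAVE = 2
    SEND_SLAVEOF_NOONE = 3
    WAIT_PROMOTION = 4
    RECONF_SLAVES = 5
    UPDATE_CONFIG = 6


def next_state(state, step_succeeded):
    """Advance one step on success; fall back to NONE when a step fails."""
    if not step_succeeded:
        return FailoverState.NONE      # failover aborted
    if state == FailoverState.UPDATE_CONFIG:
        return FailoverState.NONE      # failover finished
    return FailoverState(state + 1)


state = FailoverState.WAIT_START
trace = [state.name]
while state != FailoverState.NONE:
    state = next_state(state, step_succeeded=True)
    trace.append(state.name)
print(" -> ".join(trace))
```

Modelling an aborted step as a return to NONE mirrors the idea that a failed failover must start again from a fresh election.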

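18.6 Client Connection Switching

Notification mechanism: Pub/Sub

During and after a failover, each Sentinel publishes events on its own Pub/Sub channels (clients subscribe on the Sentinel port, not on the master):

Channel                  Payload
+switch-master           <master_name> <old_ip> <old_port> <new_ip> <new_port>
+slave-reconf-sent       replica reconfiguration command sent
+slave-reconf-done       replica reconfiguration finished
+failover-end            failover finished

Clients can subscribe to these channels to learn about a master switch, or use a Sentinel-aware client library that asks the Sentinels for the current master address: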

import redis

sentinel = redis.Sentinel([
    ('192.168.1.10', 26379),
    ('192.168.1.11', 26379),
    ('192.168.1.12', 26379)
], socket_timeout=0.1)

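# Sentinel-aware client: the master address is looked up through the Sentinels
master = sentinel.master_for('mymaster', socket_timeout=0.1)
slave = sentinel.slave_for('mymaster', socket_timeout=0.1)

# After a failover the existing master object keeps working: when its
# connection is re-established, the pool asks the Sentinels for the new master address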
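
A client that subscribes to +switch-master itself has to decode the payload. The sketch below handles only the parsing step and assumes the five-field, space-separated format listed in the channel table; parse_switch_master is a hypothetical helper, and the subscription code is omitted.

```python
def parse_switch_master(payload):
    """Split '<master_name> <old_ip> <old_port> <new_ip> <new_port>'."""
    name, old_ip, old_port, new_ip, new_port = payload.split()
    return {
        "master_name": name,
        "old": (old_ip, int(old_port)),
        "new": (new_ip, int(new_port)),
    }


event = parse_switch_master("mymaster 192.168.1.1 6379 192.168.1.2 6380")
print(event["master_name"], "now at", event["new"])
```

On receiving such an event, a client would drop its connections to the old address and reconnect to the new one.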

Sentinel commands

# Look up the master address
redis-cli -h sentinel_ip -p 26379 SENTINEL get-master-addr-by-name mymaster
# 1) "192.168.1.2"
# 2) "6380"

# List all monitored masters
redis-cli -h sentinel_ip -p 26379 SENTINEL masters

# List the replicas of a given master
redis-cli -h sentinel_ip -p 26379 SENTINEL slaves mymaster

# List the other Sentinels
redis-cli -h sentinel_ip -p 26379 SENTINEL sentinels mymaster

# Trigger a failover manually (for testing)
redis-cli -h sentinel_ip -p 26379 SENTINEL failover mymaster

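18.7 Complete Configuration Example

sentinel.conf

# Port the Sentinel listens on
port 26379

# Run in the background
daemonize yes

# Log file
logfile "/var/log/redis/sentinel.log"

# Working directory
dir "/var/run/redis"

# What to monitor: mymaster is a name you choose; quorum=2
sentinel monitor mymaster 192.168.1.1 6379 2

# Master password (if any)
sentinel auth-pass mymaster your_redis_password

# How long the master must be unreachable before it is marked sdown (milliseconds)
sentinel down-after-milliseconds mymaster 30000

# Failover timeout (milliseconds)
# A new failover of the same master is not retried until 2x this value has passed
sentinel failover-timeout mymaster 180000

# Number of replicas reconfigured in parallel
sentinel parallel-syncs mymaster 1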

# Password used to authenticate with other Sentinels (Redis 6.2+)
sentinel sentinel-pass sentinel_password

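# Notification script (invoked for every WARNING-level event, such as +sdown and +odown)
sentinel notification-script mymaster /etc/redis/notify.sh

# Client reconfiguration script (e.g. to update a load balancer after a failover)
sentinel client-reconfig-script mymaster /etc/redis/reconfig.sh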

Starting Sentinel

# Option 1: the redis-sentinel binary
redis-sentinel /etc/redis/sentinel.conf

# Option 2: redis-server in Sentinel mode
redis-server /etc/redis/sentinel.conf --sentinel

# Check Sentinel status
redis-cli -p 26379 sentinel masters
redis-cli -p 26379 info sentinel

Notification script example

#!/bin/bash
# /etc/redis/notify.sh
# Arguments: <event-type> <event-description>

EVENT_TYPE=$1
EVENT_DESC=$2

case "$EVENT_TYPE" in
    "+sdown")
        echo "ALERT: Redis instance subjectively down: $EVENT_DESC"
        ;;
    "+odown")
        echo "CRITICAL: Redis instance objectively down: $EVENT_DESC"
        # Send an alert (DingTalk/PagerDuty/SMS)
        curl -X POST "https://alert.example.com/redis" \
            -d "{\"event\": \"$EVENT_TYPE\", \"desc\": \"$EVENT_DESC\"}"
        ;;
    "+switch-master")
        echo "INFO: Master switched: $EVENT_DESC"
        ;;
esac

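18.8 Common Sentinel Problems and Tuning

Problem 1: False positives (frequent sdown)

Symptom: the master is healthy but sdown alerts fire frequently
Cause: down-after-milliseconds is too small, so network jitter triggers false positives

Diagnosis:
redis-cli -p 26379 sentinel master mymaster | grep -A1 "ping-reply"
# Check last-ping-reply and last-ok-ping-reply

Tuning:
sentinel down-after-milliseconds mymaster 60000  # raise to 60 seconds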

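Problem 2: Failover timeout

Symptom: a failover starts but takes a long time to finish
Cause: replica resynchronization takes longer than failover-timeout

Tuning:
sentinel failover-timeout mymaster 600000  # raise to 10 minutes

# Also make sure the replicas have enough resources to complete a full resync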

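Problem 3: The Sentinel configuration file gets overwritten

Note: Sentinel rewrites sentinel.conf automatically (after every failover)
Do not edit sentinel.conf by hand while Sentinel is running
To change a setting, use the SENTINEL SET command:
redis-cli -p 26379 SENTINEL set mymaster down-after-milliseconds 60000
redis-cli -p 26379 SENTINEL set mymaster failover-timeout 300000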

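Problem 4: Odd versus even numbers of Sentinels

3 Sentinels (recommended minimum):
- Tolerates 1 Sentinel failure
- quorum=2: two Sentinels must agree

5 Sentinels:
- Tolerates 2 Sentinel failures
- quorum=3
- Suits large production environments

Avoid deploying 2 or 4 Sentinels (even counts):
- 2: if either one fails, the remaining Sentinel cannot reach the majority needed to authorize a failover
- 4: the majority is 3, so it still tolerates only 1 failure, the same as 3 Sentinels, and a 2/2 network split leaves neither side able to fail over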
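
The arithmetic behind the odd-count recommendation is easy to check. This sketch assumes a failover must be authorized by a strict majority of all Sentinels, floor(N/2) + 1; the helper names are hypothetical.

```python
def majority(sentinels):
    """Votes needed to authorize a failover."""
    return sentinels // 2 + 1


def failures_tolerated(sentinels):
    """How many Sentinels can be lost while a majority is still reachable."""
    return sentinels - majority(sentinels)


print("sentinels majority tolerated")
for n in (2, 3, 4, 5, 7):
    print(n, majority(n), failures_tolerated(n))
```

Going from 3 to 4 Sentinels raises the majority from 2 to 3 without raising the number of tolerated failures, which is why the extra even-numbered node buys nothing.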

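18.9 Where Sentinel Fits and Its Limitations

Good fits

✓ Small to mid-sized deployments with one master and several replicas (data set under a few hundred GB)
✓ Read/write splitting (writes to the master, reads from replicas)
✓ Automatic failover is needed but horizontal scaling is not
✓ The data set fits in the memory of a single machine

Limitations

✗ No data sharding (all data lives on one master)
✗ Capacity is bounded by the memory of a single machine
✗ Writes are briefly unavailable during a failover (roughly 30 to 60 seconds, dominated by down-after-milliseconds)
✗ Clients must support the Sentinel protocol (look up the master address through Sentinel)
✗ Cannot be combined with Redis Cluster

Once the data outgrows a single machine's memory, or write throughput must scale horizontally, migrate to Redis Cluster (Chapters 19-20).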

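Chapter Summary

Component	Core mechanism	Key configuration
Liveness check	PING every second	down-after-milliseconds
Topology discovery	INFO every 10 seconds	-
Sentinel discovery	Pub/Sub every 2 seconds	sentinel monitor
sdown	Single-Sentinel timeout	down-after-milliseconds
odown	Quorum agreement	quorum value
Leader election	Raft voting	-
failover	State machine	failover-timeout, parallel-syncs
Client notification	Pub/Sub +switch-master	-

Sentinel is the core component of Redis high availability. Understanding how it works internally is what lets you configure it correctly in production, troubleshoot quickly, and migrate smoothly to Cluster once Sentinel no longer meets your needs.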
