Sentinel: Raft Leader Election and Failover State Machine
Chapter 18. Sentinel: Raft-Based Leader Election and the Failover State Machine
Redis Sentinel is the official high-availability companion to the primary-replica architecture. It provides autonomous failure detection, leader election, and failover coordination. This chapter dissects the three periodic tasks that form Sentinel's heartbeat, the distinction between subjective and objective down states, Raft-inspired voting, and the nine-state failover machine, with exact protocol messages, tuning parameters, and production pitfalls.
18.1 Topology Overview
Standard deployment: three Sentinels + one primary + two replicas:
+--------------------------------------------------------------+
| Sentinel Layer (always deploy an odd number, minimum 3)      |
| [Sentinel-1:26379]  [Sentinel-2:26379]  [Sentinel-3:26379]   |
+--------+--------------------+--------------------+-----------+
         |  monitor + Pub/Sub |                    |
         v                    v                    v
+--------------------------------------------------------------+
| Redis Data Layer                                             |
| [Primary:6379] --replication--> [Replica-1:6380]             |
|                --replication--> [Replica-2:6381]             |
+--------------------------------------------------------------+
Sentinels are themselves Redis processes (running in sentinel mode). They listen on port 26379 by default, store no business data, and communicate with each other via the primary's Pub/Sub system during discovery.
18.2 The Three Periodic Tasks
Sentinel's core behavior emerges from three timer-driven loops running inside sentinelTimer().
Task 1: PING Every 1 Second (Liveness Detection)
Sentinel sends PING to every monitored instance:
- The primary
- Every known replica
- Every known sibling Sentinel
/* sentinel.c: sentinelSendPing() */
void sentinelSendPing(sentinelRedisInstance *ri) {
    int retval = redisAsyncCommand(ri->link->cc, sentinelReceivePong,
                                   ri, "%s", "PING");
    if (retval == REDIS_OK) {
        ri->link->pending_commands++;
        ri->link->last_ping_time = mstime();
        if (ri->link->act_ping_time == 0)
            ri->link->act_ping_time = ri->link->last_ping_time;
    }
}
If an instance does not reply with a valid response within down-after-milliseconds (default 30,000 ms), it is flagged subjectively down (sdown) by this Sentinel.
Task 2: INFO Every 10 Seconds (Topology Discovery)
Sentinel sends INFO to the primary and all replicas to:
- Discover newly added replicas (they appear in INFO replication output automatically).
- Track replication offsets, roles, and the current replid.
- Detect role changes (e.g., a replica that has become a primary after a manual promotion).
Sample INFO fragment from primary:
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.1.2,port=6380,state=online,offset=1234560,lag=0
slave1:ip=192.168.1.3,port=6381,state=online,offset=1234100,lag=1
When a new replica appears, Sentinel immediately begins monitoring it (PING + INFO) without any configuration change.
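The discovery step can be illustrated with a minimal Python sketch that extracts replica addresses from the INFO fragment shown above. The function name `discover_replicas` is hypothetical, and the parser only assumes the `slaveN:ip=...,port=...` line layout from the sample; it is not Sentinel's actual C parser.

```python
from typing import List, Tuple


def discover_replicas(info_text: str) -> List[Tuple[str, int]]:
    """Return (ip, port) pairs found on the slaveN lines of INFO replication output."""
    replicas = []
    for line in info_text.splitlines():
        key, sep, value = line.partition(":")
        # Only lines such as "slave0:..." describe a replica; skip everything else.
        if sep and key.startswith("slave") and key[5:].isdigit():
            fields = dict(item.split("=", 1) for item in value.split(","))
            replicas.append((fields["ip"], int(fields["port"])))
    return replicas


sample = """# Replication
role:master
connected_slaves:2
slave0:ip=192.168.1.2,port=6380,state=online,offset=1234560,lag=0
slave1:ip=192.168.1.3,port=6381,state=online,offset=1234100,lag=1"""

for ip, port in discover_replicas(sample):
    print(ip, port)
```

Each address returned this way is a new instance to monitor, which is why no configuration change is needed when a replica joins.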
Task 3: Pub/Sub Hello Every 2 Seconds (Sentinel Mutual Discovery)
Each Sentinel publishes a hello message to the __sentinel__:hello channel on the primary:
Message format (comma-separated):
<sentinel_ip>,<sentinel_port>,<sentinel_runid>,<current_epoch>,
<master_name>,<master_ip>,<master_port>,<master_config_epoch>
Example:
192.168.1.10,26379,abc1234...,5,mymaster,192.168.1.1,6379,5
All Sentinels subscribe to this channel. When Sentinel-1 receives Sentinel-2's hello message, it adds Sentinel-2 to its known Sentinel list and establishes a direct connection. No static Sentinel peer list is required in the configuration file.
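To make the eight-field layout concrete, here is a small Python sketch that parses a hello payload. The `SentinelHello` class and `parse_hello` function are illustrative names, and the run ID in the demo is a shortened placeholder rather than a real 40-character run ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentinelHello:
    sentinel_ip: str
    sentinel_port: int
    sentinel_runid: str
    current_epoch: int
    master_name: str
    master_ip: str
    master_port: int
    master_config_epoch: int


def parse_hello(payload: str) -> SentinelHello:
    """Split the eight comma-separated hello fields and convert the numeric ones."""
    fields = payload.split(",")
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    ip, port, runid, epoch, name, m_ip, m_port, m_epoch = fields
    return SentinelHello(ip, int(port), runid, int(epoch),
                         name, m_ip, int(m_port), int(m_epoch))


# Placeholder run ID; real run IDs are much longer.
hello = parse_hello("192.168.1.10,26379,abc1234,5,mymaster,192.168.1.1,6379,5")
print(hello.sentinel_ip, hello.sentinel_port, hello.master_name)
```

The first three fields identify the sender, which is all a receiving Sentinel needs in order to open a direct connection to a previously unknown peer.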
18.3 Subjective Down (sdown) and Objective Down (odown)
Subjective Down
A single Sentinel declares a node sdown when it does not receive a valid response within down-after-milliseconds:
Valid responses: +PONG, -LOADING, -MASTERDOWN (the last two are error replies that still prove the instance is reachable)
Invalid / missing: timeout, connection refused, unexpected data
State transitions:
active --(timeout)--> sdown
sdown --(valid response)--> active
sdown is a local judgment; it never directly triggers failover.
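The two transitions reduce to a single time comparison. The following Python sketch models that rule; `InstanceLink` and `update_sdown` are hypothetical names, and the model tracks only the timestamp of the last valid reply rather than Sentinel's full link state.

```python
from dataclasses import dataclass


@dataclass
class InstanceLink:
    last_valid_reply_ms: int  # time of the last acceptable PING reply
    sdown: bool = False


def update_sdown(link: InstanceLink, now_ms: int, down_after_ms: int = 30_000) -> bool:
    """Set sdown when no valid reply arrived within down_after_ms; clear it otherwise."""
    link.sdown = (now_ms - link.last_valid_reply_ms) > down_after_ms
    return link.sdown


link = InstanceLink(last_valid_reply_ms=1_000)
print(update_sdown(link, now_ms=20_000))  # False: only 19 s of silence
print(update_sdown(link, now_ms=31_001))  # True: more than 30 s of silence
link.last_valid_reply_ms = 32_000         # a valid reply arrives
print(update_sdown(link, now_ms=32_500))  # False: back to active
```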
Objective Down
Multiple Sentinels independently confirm the node is unreachable:
Protocol sequence:
1. Sentinel-1 marks primary M as sdown
2. Sentinel-1 queries other Sentinels:
SENTINEL is-master-down-by-addr <ip> <port> <epoch> *
(a literal * in place of a runid = no vote request yet, just asking for down status)
3. Each Sentinel replies:
*3\r\n
:1\r\n         <- 1 = "I also consider it down"
$1\r\n*\r\n    <- leader runid is "*" (not voting yet)
:0\r\n         <- leader epoch = 0
4. Sentinel-1 counts replies where down=1
5. When count >= quorum, it marks the primary as odown
Quorum recommendation: floor(sentinel_count / 2) + 1
| Sentinel Count | Recommended Quorum | Tolerated Sentinel Failures |
|---|---|---|
| 3 | 2 | 1 |
| 5 | 3 | 2 |
| 7 | 4 | 3 |
Why quorum prevents split-brain: if only one Sentinel could declare odown, a Sentinel with a faulty network link could erroneously trigger failover on a healthy primary.
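The odown decision in steps 4 and 5 can be sketched in a few lines of Python. The function name `is_odown` is hypothetical, and the sketch assumes the querying Sentinel counts its own sdown judgment as one of the agreeing opinions.

```python
from typing import Sequence


def is_odown(local_sdown: bool, peer_down_flags: Sequence[bool], quorum: int) -> bool:
    """Return True when this Sentinel plus the agreeing peers reach the quorum."""
    if not local_sdown:
        # A Sentinel never escalates a node it does not itself consider down.
        return False
    agreeing = 1 + sum(peer_down_flags)  # ourselves plus peers that replied down=1
    return agreeing >= quorum


# Three Sentinels, quorum 2: one agreeing peer is enough.
print(is_odown(True, [True, False], quorum=2))
# A Sentinel with a faulty link stands alone and cannot reach the quorum.
print(is_odown(True, [False, False], quorum=2))
```

The second call is the faulty-link case described above: the isolated Sentinel's opinion stays at sdown because no peer confirms it.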
18.4 Raft-Inspired Leader Election
After a primary is declared odown, the Sentinels must agree on a single Leader Sentinel to execute the failover, because concurrent failover attempts from multiple Sentinels would produce conflicting configurations.
Election Trigger Conditions
- Primary is in odown state.
- No recent failover for this primary (guard interval based on failover-timeout).
- No other Sentinel is currently executing a failover.
Voting Protocol
1. Candidate Sentinel (usually the first to declare odown) increments currentEpoch:
new_epoch = currentEpoch + 1
2. Candidate sends vote-request to all peer Sentinels:
SENTINEL is-master-down-by-addr <ip> <port> <new_epoch> <candidate_runid>
3. Each peer Sentinel evaluates:
If new_epoch > own currentEpoch (valid epoch)
and it has not voted in this epoch yet (each epoch gets one vote):
-> Votes for candidate; updates own currentEpoch
-> Replies with: leader = <candidate_runid>
If it has already voted in this epoch:
-> Replies with: leader = <other_runid> (whoever it already voted for)
4. Candidate collects votes:
- Receives > (total_sentinels / 2) votes AND >= quorum: becomes Leader
- Timeout without majority: wait and retry with epoch+1
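The voting rules can be simulated with a short Python sketch. `Voter` and `wins_election` are hypothetical names, the run IDs are placeholders, and the sketch assumes each candidate also counts a vote for itself; it models only the one-vote-per-epoch rule, not the network exchange.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Voter:
    current_epoch: int = 0
    voted_epoch: int = 0
    voted_for: Optional[str] = None

    def request_vote(self, req_epoch: int, candidate_runid: str) -> Optional[str]:
        """Grant at most one vote per epoch; return the runid this voter backs."""
        if req_epoch > self.current_epoch:
            self.current_epoch = req_epoch
        if req_epoch > self.voted_epoch:  # first request seen for this epoch
            self.voted_epoch = req_epoch
            self.voted_for = candidate_runid
        return self.voted_for


def wins_election(votes: int, total_sentinels: int, quorum: int) -> bool:
    """A Leader needs a strict majority of all Sentinels and at least quorum votes."""
    return votes > total_sentinels // 2 and votes >= quorum


# Two candidates race in epoch 6; the peers hear from sentinel-a first.
peers = [Voter(current_epoch=5), Voter(current_epoch=5)]
replies_a = [p.request_vote(6, "sentinel-a") for p in peers]
replies_b = [p.request_vote(6, "sentinel-b") for p in peers]
votes_a = 1 + replies_a.count("sentinel-a")  # own vote plus granted votes
votes_b = 1 + replies_b.count("sentinel-b")
print(wins_election(votes_a, total_sentinels=3, quorum=2))
print(wins_election(votes_b, total_sentinels=3, quorum=2))
```

Because both peers already voted in epoch 6, the second candidate is told who they backed and falls short of a majority; its only option is to retry in a later epoch.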
Epoch as Conflict Prevention
The epoch is a monotonically increasing integer. Its properties:
- Only one Leader can be elected per epoch (each Sentinel votes at most once per epoch).
- If two candidates race, only one achieves majority; the other must increment epoch and retry.
- Configuration updates carry the epoch; downstream nodes always defer to the highest epoch, preventing a stale Leader from overwriting a newer configuration.
18.5 The Nine-State Failover Machine
The Leader Sentinel executes failover through the sentinelFailoverStateMachine() function:
State 1: WAIT_START
Verify that this Sentinel is legitimately the Leader for the current epoch. If another Sentinel has already started a newer-epoch failover, abort.
State 2: SELECT_SLAVE
Choose the best candidate replica using a three-level comparator:
/* Simplified priority; a negative result means a is promoted ahead of b. */
int compareSlavesForPromotion(sentinelRedisInstance *a, sentinelRedisInstance *b) {
    /* 1. replica-priority: lower is better (0 = never promote) */
    if (a->slave_priority != b->slave_priority)
        return a->slave_priority - b->slave_priority;
    /* 2. replication offset: higher is better (more up-to-date).
     *    Compare rather than subtract: the offsets are 64-bit, so the
     *    difference can be truncated when cast to int. */
    if (a->slave_repl_offset > b->slave_repl_offset) return -1;
    if (a->slave_repl_offset < b->slave_repl_offset) return 1;
    /* 3. runid: lexicographically smaller wins */
    return strcmp(a->runid, b->runid);
}
Replicas with replica-priority 0 are never selected.
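The same three-level ordering can be expressed as a sort key. This Python sketch is an illustration only: the `Replica` class, the `select_replica` function, and the run IDs are hypothetical, and it ignores the other health filters a real selection would apply.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Replica:
    runid: str
    priority: int
    repl_offset: int


def select_replica(replicas: List[Replica]) -> Optional[Replica]:
    """Lowest non-zero priority first, then highest offset, then smallest runid."""
    eligible = [r for r in replicas if r.priority != 0]  # priority 0 is never promoted
    if not eligible:
        return None
    return min(eligible, key=lambda r: (r.priority, -r.repl_offset, r.runid))


candidates = [
    Replica("bbb", priority=100, repl_offset=1234560),
    Replica("aaa", priority=100, repl_offset=1234100),
    Replica("ccc", priority=0, repl_offset=9999999),  # excluded despite best offset
]
print(select_replica(candidates).runid)
```

With equal priorities, the replica with the larger offset wins even though its run ID sorts later; the run ID only breaks exact ties.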
State 3: SEND_SLAVEOF_NOONE
The Leader sends SLAVEOF NO ONE to the selected replica:
Leader -> selected_replica: SLAVEOF NO ONE
selected_replica -> Leader: +OK
The replica stops replicating, switches to a new replication ID, and begins operating as a standalone primary, though the other replicas don't know this yet.
State 4: WAIT_PROMOTION
The Leader polls the promoted replica every second with INFO until role:master appears in the response. Time limit: failover-timeout / 2. If promotion is not confirmed within this window, the failover is marked as failed and a new election cycle starts.
State 5: RECONF_SLAVES
The Leader sends SLAVEOF <new_primary_ip> <new_primary_port> to each remaining replica:
Leader -> Replica-2: SLAVEOF 192.168.1.2 6380
Replica-2 -> Leader: +OK
parallel-syncs controls how many replicas are reconfigured simultaneously:
sentinel parallel-syncs mymaster 1
Setting this to 1 prevents all replicas from simultaneously issuing full sync requests to the new primary (which could overwhelm it). Set higher only if the new primary has ample bandwidth and the replicas can perform partial resync (PSYNC2).
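As a rough model of the trade-off, the Python sketch below groups replicas into waves of at most parallel-syncs members. The `reconfig_waves` function is hypothetical, and strict waves are a simplification: a real Leader limits the number of reconfigurations in flight rather than waiting for a whole wave to finish.

```python
from typing import Iterator, List


def reconfig_waves(replicas: List[str], parallel_syncs: int) -> Iterator[List[str]]:
    """Yield groups of replicas, each no larger than parallel_syncs."""
    if parallel_syncs < 1:
        raise ValueError("parallel_syncs must be at least 1")
    for start in range(0, len(replicas), parallel_syncs):
        yield replicas[start:start + parallel_syncs]


replicas = ["replica-1", "replica-2", "replica-3"]
print(list(reconfig_waves(replicas, parallel_syncs=1)))  # three waves of one
print(list(reconfig_waves(replicas, parallel_syncs=2)))  # two waves
```

A smaller value lengthens the total reconfiguration time but keeps more replicas serving reads (possibly with stale data) while the others resynchronize.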
State 6: UPDATE_CONFIG
The Leader rewrites its sentinel.conf with the new primary address and the new config epoch:
# Updated sentinel.conf
sentinel monitor mymaster 192.168.1.2 6380 2
sentinel config-epoch mymaster 7
sentinel leader-epoch mymaster 7
The Leader also publishes the topology change on the +switch-master Pub/Sub channel.
States 7-9: RESET (Waiting for Old Primary)
When the old primary comes back online, Sentinel detects it via PING and immediately sends SLAVEOF <new_primary> to demote it to a replica. The cluster returns to full redundancy.
Old primary restarts
-> Sentinel detects via PING recovery
-> Sends: SLAVEOF 192.168.1.2 6380
-> Old primary performs full sync with new primary
-> Cluster restored to primary + 2 replicas
18.6 Client Failover Integration
Pub/Sub Notification Channels
During and after a failover, Sentinel publishes events on its own Pub/Sub interface; clients receive them by subscribing on a Sentinel connection, not on the primary:
| Channel | Payload |
|---|---|
| +switch-master | <name> <old_ip> <old_port> <new_ip> <new_port> |
| +slave-reconf-sent | replica being reconfigured |
| +slave-reconf-inprog | replica sync in progress |
| +slave-reconf-done | replica reconfiguration complete |
| +failover-end | failover fully complete |
| -failover-abort | failover was aborted |
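A client that handles these events itself mainly needs to decode the +switch-master payload. The Python sketch below shows one way to do that; `SwitchMaster` and `parse_switch_master` are hypothetical names, and the payload in the demo is built from the addresses used elsewhere in this chapter.

```python
from typing import NamedTuple


class SwitchMaster(NamedTuple):
    name: str
    old_ip: str
    old_port: int
    new_ip: str
    new_port: int


def parse_switch_master(payload: str) -> SwitchMaster:
    """Decode '<name> <old_ip> <old_port> <new_ip> <new_port>'."""
    parts = payload.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    name, old_ip, old_port, new_ip, new_port = parts
    return SwitchMaster(name, old_ip, int(old_port), new_ip, int(new_port))


event = parse_switch_master("mymaster 192.168.1.1 6379 192.168.1.2 6380")
print(f"{event.name}: reconnect to {event.new_ip}:{event.new_port}")
```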
Sentinel-Aware Client Usage (Python)
import redis
from redis.sentinel import Sentinel
sentinel = Sentinel(
    sentinels=[
        ('192.168.1.10', 26379),
        ('192.168.1.11', 26379),
        ('192.168.1.12', 26379),
    ],
    socket_timeout=0.1,
    password='redis_password',
)
# Sentinel client auto-discovers primary address
master = sentinel.master_for('mymaster', socket_timeout=0.1, decode_responses=True)
replica = sentinel.slave_for('mymaster', socket_timeout=0.1, decode_responses=True)
# After failover, the next call to master_for() returns the new primary address
# Connection errors during failover are retried automatically
Sentinel Query Commands
# Get current primary address
redis-cli -h sentinel1 -p 26379 SENTINEL get-master-addr-by-name mymaster
# 1) "192.168.1.2"
# 2) "6380"
# List all monitored primaries
redis-cli -h sentinel1 -p 26379 SENTINEL masters
# List replicas for a primary
redis-cli -h sentinel1 -p 26379 SENTINEL slaves mymaster
# List sibling Sentinels
redis-cli -h sentinel1 -p 26379 SENTINEL sentinels mymaster
# Manually trigger a failover (for testing)
redis-cli -h sentinel1 -p 26379 SENTINEL failover mymaster
18.7 Complete Sentinel Configuration
# sentinel.conf
port 26379
daemonize yes
logfile "/var/log/redis/sentinel.log"
dir "/var/run/redis"
# Monitor declaration: name, primary IP, primary port, quorum
sentinel monitor mymaster 192.168.1.1 6379 2
# Primary authentication password
sentinel auth-pass mymaster your_redis_password
# Milliseconds before declaring sdown
sentinel down-after-milliseconds mymaster 30000
# Milliseconds before a failover is considered timed out
# (actual timeout is 2x this value for retries)
sentinel failover-timeout mymaster 180000
# Number of replicas to reconfigure in parallel
sentinel parallel-syncs mymaster 1
# Sentinel-to-Sentinel authentication (Redis 6.2+)
sentinel sentinel-pass sentinel_shared_password
# Execute script on key events (sdown, odown, switch-master, etc.)
sentinel notification-script mymaster /etc/redis/notify.sh
# Execute script when a client needs to reconfigure after failover
sentinel client-reconfig-script mymaster /etc/redis/reconfig.sh
Starting Sentinel
# Method 1: Sentinel binary
redis-sentinel /etc/redis/sentinel.conf
# Method 2: redis-server in sentinel mode
redis-server /etc/redis/sentinel.conf --sentinel
# Verify
redis-cli -p 26379 INFO sentinel
# sentinel_masters:1
# sentinel_tilt:0
# sentinel_running_scripts:0
# sentinel_scripts_queue_length:0
# sentinel_simulate_failure_flags:0
# master0:name=mymaster,status=ok,address=192.168.1.1:6379,slaves=2,sentinels=3
Notification Script
#!/bin/bash
# /etc/redis/notify.sh
EVENT_TYPE=$1
EVENT_DESCRIPTION=$2
case "$EVENT_TYPE" in
  "+sdown")
    logger "WARN: Redis sdown detected: $EVENT_DESCRIPTION"
    ;;
  "+odown")
    logger "CRIT: Redis odown, failover imminent: $EVENT_DESCRIPTION"
    curl -s -X POST "https://alerts.example.com/redis" \
      -H "Content-Type: application/json" \
      -d "{\"event\":\"$EVENT_TYPE\",\"details\":\"$EVENT_DESCRIPTION\"}"
    ;;
  "+switch-master")
    logger "INFO: Primary switched: $EVENT_DESCRIPTION"
    # Optionally update load balancer or service discovery
    ;;
esac
18.8 Tuning and Common Problems
Problem 1: Frequent sdown False Positives
Symptom: sdown events appear even when the primary is healthy
Cause: down-after-milliseconds too small; transient network jitter exceeds it
Diagnosis:
redis-cli -p 26379 SENTINEL masters
# Check last-ping-reply vs last-ok-ping-reply fields
Fix (no restart needed):
redis-cli -p 26379 SENTINEL SET mymaster down-after-milliseconds 60000
Problem 2: Failover Times Out
Symptom: failover starts but never completes
Cause: Full sync to the promoted replica takes longer than failover-timeout/2
Fix:
redis-cli -p 26379 SENTINEL SET mymaster failover-timeout 600000 # 10 minutes
Problem 3: Manual sentinel.conf Edits Are Overwritten
Sentinel rewrites sentinel.conf after every failover and configuration change.
Never edit sentinel.conf while Sentinel is running.
Instead, use SENTINEL SET:
redis-cli -p 26379 SENTINEL SET mymaster down-after-milliseconds 60000
redis-cli -p 26379 SENTINEL SET mymaster failover-timeout 300000
Problem 4: Choosing the Right Sentinel Count
3 Sentinels (minimum recommended):
Quorum = 2; tolerates 1 Sentinel failure
Minimum for production
5 Sentinels:
Quorum = 3; tolerates 2 Sentinel failures
For high-traffic, mission-critical environments
Never use 2 or 4:
2: Any single failure breaks quorum
4: Tolerates only 1 failure, the same as 3, and a 2-2 partition leaves neither side with a majority
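The arithmetic behind these recommendations fits in one function. This Python sketch assumes a failover must be authorized by a strict majority of all known Sentinels, as in the voting protocol of Section 18.4; `tolerated_failures` is a hypothetical helper name.

```python
def tolerated_failures(sentinel_count: int) -> int:
    """Sentinels that can fail while a strict majority can still authorize a failover."""
    majority = sentinel_count // 2 + 1
    return sentinel_count - majority


for n in (2, 3, 4, 5):
    print(n, "Sentinels tolerate", tolerated_failures(n), "failure(s)")
```

Going from 3 to 4 Sentinels adds a process to operate without adding any fault tolerance, which is why odd counts are preferred.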
18.9 When Sentinel Is the Right Choice (and When It Isn't)
Appropriate Scenarios
- Single primary + multiple replicas (vertical scale)
- Read-heavy workloads using read-from-replica pattern
- Data volume fits comfortably in single-node memory (<200-500 GB)
- Automatic failover without Cluster complexity overhead
- Existing single-node Redis migration path
Limitations
- No horizontal write scale: all writes go to one primary
- Memory limited by single machine (or network interface)
- 30-60 second write unavailability window during failover
- Clients must use a Sentinel-aware library
- Incompatible with Redis Cluster (cannot mix modes)
When data volume exceeds single-machine capacity or write throughput requires horizontal scaling, migrate to Redis Cluster (Chapters 19-20).
Chapter Summary
| Component | Mechanism | Key Configuration |
|---|---|---|
| Liveness detection | 1s PING loop | down-after-milliseconds |
| Topology discovery | 10s INFO | none |
| Sentinel peer discovery | 2s Pub/Sub hello | sentinel monitor |
| sdown | Single-Sentinel timeout | down-after-milliseconds |
| odown | Quorum agreement | quorum value |
| Leader election | Raft-inspired voting, epoch-gated | none |
| Failover | 9-state machine | failover-timeout, parallel-syncs |
| Client notification | Pub/Sub +switch-master | none |
Understanding Sentinel's internal machinery enables faster incident response, confident configuration decisions, and a clear mental model for when to upgrade to Cluster. Chapter 19 covers the 16,384-slot hash slot design and the Gossip protocol that powers cluster state propagation.