Chapter 18

Sentinel: Raft Leader Election and Failover State Machine

Redis Sentinel is the official high-availability companion to the primary-replica architecture. It provides autonomous failure detection, leader election, and failover coordination. This chapter dissects the three periodic tasks that form Sentinel's heartbeat, the distinction between subjective and objective down states, Raft-inspired voting, and the nine-state failover machine, with exact protocol messages, tuning parameters, and production pitfalls.


18.1 Topology Overview

Standard deployment: three Sentinels + one primary + two replicas:

┌──────────────────────────────────────────────────────────────┐
│  Sentinel Layer (always deploy an odd number, minimum 3)     │
│  [Sentinel-1:26379]  [Sentinel-2:26379]  [Sentinel-3:26379]  │
└──────┬────────────────────┬──────────────────────┬───────────┘
       │  monitor + Pub/Sub │                      │
       ▼                    ▼                      ▼
┌──────────────────────────────────────────────────────────────┐
│  Redis Data Layer                                            │
│  [Primary:6379] ──replication──► [Replica-1:6380]            │
│                 ──replication──► [Replica-2:6381]            │
└──────────────────────────────────────────────────────────────┘

Sentinels are themselves Redis processes (running in sentinel mode). They listen on port 26379 by default, store no business data, and communicate with each other via the primary's Pub/Sub system during discovery.


18.2 The Three Periodic Tasks

Sentinel's core behavior emerges from three timer-driven loops running inside sentinelTimer().

Task 1: PING Every 1 Second (Liveness Detection)

Sentinel sends PING to every monitored instance:

/* sentinel.c: sentinelSendPing() */
void sentinelSendPing(sentinelRedisInstance *ri) {
    int retval = redisAsyncCommand(ri->link->cc, sentinelReceivePong,
        ri, "%s", "PING");
    if (retval == REDIS_OK) {
        ri->link->pending_commands++;
        ri->link->last_ping_time = mstime();
        if (ri->link->act_ping_time == 0)
            ri->link->act_ping_time = ri->link->last_ping_time;
    }
}

If an instance does not reply with a valid response within down-after-milliseconds (default 30,000 ms), it is flagged subjectively down (sdown) by this Sentinel.

Task 2: INFO Every 10 Seconds (Topology Discovery)

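Sentinel sends INFO to the primary and all replicas to learn each instance's current role and, from the primary's Replication section, the list of attached replicas.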

Sample INFO fragment from primary:
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.1.2,port=6380,state=online,offset=1234560,lag=0
slave1:ip=192.168.1.3,port=6381,state=online,offset=1234100,lag=1

When a new replica appears, Sentinel immediately begins monitoring it (PING + INFO) without any configuration change.
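
The replica lines in that fragment are straightforward to parse. A minimal Python sketch, assuming the `slaveN:key=value,...` layout shown in the sample; the function name `parse_replicas` is hypothetical and is not part of Sentinel, which does this work in C:

```python
def parse_replicas(info_text: str) -> list[dict]:
    """Extract replica entries from the Replication section of INFO output."""
    replicas = []
    for line in info_text.splitlines():
        line = line.strip()
        # Replica lines look like: slave0:ip=...,port=...,state=...,offset=...,lag=...
        if not line.startswith("slave") or ":" not in line:
            continue
        name, _, fields = line.partition(":")
        if not name[5:].isdigit():
            continue  # skip unrelated keys such as slave_repl_offset
        entry = dict(pair.split("=", 1) for pair in fields.split(","))
        for key in ("port", "offset", "lag"):
            entry[key] = int(entry[key])
        replicas.append(entry)
    return replicas


sample = """# Replication
role:master
connected_slaves:2
slave0:ip=192.168.1.2,port=6380,state=online,offset=1234560,lag=0
slave1:ip=192.168.1.3,port=6381,state=online,offset=1234100,lag=1
"""
for replica in parse_replicas(sample):
    print(replica["ip"], replica["port"], replica["offset"])
```

Any address in the result that is not already monitored would be treated as a newly discovered replica.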

Task 3: Pub/Sub Hello Every 2 Seconds (Sentinel Mutual Discovery)

Each Sentinel publishes a hello message to the __sentinel__:hello channel on the primary:

Message format (comma-separated):
<sentinel_ip>,<sentinel_port>,<sentinel_runid>,<current_epoch>,
<master_name>,<master_ip>,<master_port>,<master_config_epoch>

Example:
192.168.1.10,26379,abc1234...,5,mymaster,192.168.1.1,6379,5

All Sentinels subscribe to this channel. When Sentinel-1 receives Sentinel-2's hello message, it adds Sentinel-2 to its known Sentinel list and establishes a direct connection. No static Sentinel peer list is required in the configuration file.
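
To make the eight-field layout concrete, here is a small Python sketch that splits a hello payload into named fields. The `HelloMessage` class and `parse_hello` function are illustrative names, and the run ID `abc1234` is a shortened placeholder rather than a real 40-character run ID:

```python
from dataclasses import dataclass


@dataclass
class HelloMessage:
    sentinel_ip: str
    sentinel_port: int
    sentinel_runid: str
    current_epoch: int
    master_name: str
    master_ip: str
    master_port: int
    master_config_epoch: int


def parse_hello(payload: str) -> HelloMessage:
    """Split the eight comma-separated fields of a hello payload."""
    parts = payload.split(",")
    if len(parts) != 8:
        raise ValueError(f"expected 8 fields, got {len(parts)}")
    ip, port, runid, epoch, name, m_ip, m_port, m_epoch = parts
    return HelloMessage(ip, int(port), runid, int(epoch),
                        name, m_ip, int(m_port), int(m_epoch))


msg = parse_hello("192.168.1.10,26379,abc1234,5,mymaster,192.168.1.1,6379,5")
print(msg.sentinel_runid, msg.master_name, msg.master_config_epoch)
```

A receiver would compare `sentinel_runid` against its known peers to decide whether the sender is new.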


18.3 Subjective Down (sdown) and Objective Down (odown)

Subjective Down

A single Sentinel declares a node sdown when it does not receive a valid response within down-after-milliseconds:

Valid responses:  +PONG, -LOADING, -MASTERDOWN
Invalid / missing:  timeout, connection refused, unexpected data

State transitions:
  active ──(timeout)──► sdown
  sdown  ──(valid response)──► active

sdown is a local judgment; it never directly triggers failover.
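
The sdown rule reduces to two checks: is a reply valid, and how long ago was the last valid one? A minimal Python sketch, assuming replies arrive as raw strings with their RESP type marker; the function names are hypothetical, and the 30,000 ms default mirrors the down-after-milliseconds default mentioned earlier:

```python
VALID_REPLY_PREFIXES = ("PONG", "LOADING", "MASTERDOWN")


def is_valid_ping_reply(reply: str) -> bool:
    """Return True if a PING reply counts as proof of liveness."""
    # Drop the RESP type marker (+ status, - error) before matching the keyword.
    return reply.lstrip("+-").startswith(VALID_REPLY_PREFIXES)


def is_subjectively_down(last_valid_reply_ms: int, now_ms: int,
                         down_after_ms: int = 30_000) -> bool:
    """sdown: no valid reply for longer than down-after-milliseconds."""
    return now_ms - last_valid_reply_ms > down_after_ms


print(is_valid_ping_reply("+PONG"))                  # True
print(is_valid_ping_reply("-ERR unknown command"))   # False
print(is_subjectively_down(last_valid_reply_ms=0, now_ms=31_000))  # True
```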

Objective Down

Multiple Sentinels independently confirm the node is unreachable:

Protocol sequence:
1. Sentinel-1 marks primary M as sdown
2. Sentinel-1 queries other Sentinels:
   SENTINEL is-master-down-by-addr <ip> <port> <epoch> *
   (* in place of a runid = no vote request yet, just asking for down status)

3. Each Sentinel replies:
   *3\r\n
   :1\r\n          ← 1 = "I also consider it down"
   $1\r\n*\r\n     ← "*" = no leader runid (not voting yet)
   :0\r\n          ← leader epoch = 0

4. Sentinel-1 counts replies where down=1
5. When count >= quorum → marks primary as odown

Quorum recommendation: floor(sentinel_count / 2) + 1

Sentinel Count   Recommended Quorum   Tolerated Sentinel Failures
3                2                    1
5                3                    2
7                4                    3

Why quorum prevents split-brain: if only one Sentinel could declare odown, a Sentinel with a faulty network link could erroneously trigger failover on a healthy primary.
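
The counting step can be sketched in a few lines of Python. This is an illustration rather than Sentinel's code: `is_objectively_down` is a hypothetical name, and the sketch assumes the asking Sentinel counts its own sdown judgment toward the quorum alongside the peers' replies:

```python
def is_objectively_down(peer_replies: list[int], quorum: int,
                        self_sdown: bool = True) -> bool:
    """peer_replies holds the down flag (1 or 0) from each peer's reply."""
    agreeing = sum(1 for flag in peer_replies if flag == 1)
    if self_sdown:
        agreeing += 1  # assumption: the asking Sentinel counts its own judgment
    return agreeing >= quorum


# Three Sentinels, quorum 2: one peer agrees, one peer still sees the primary.
print(is_objectively_down([1, 0], quorum=2))   # True
# A Sentinel with a faulty link is alone in its judgment: no odown.
print(is_objectively_down([0, 0], quorum=2))   # False
```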


18.4 Raft-Inspired Leader Election

After a primary is declared odown, the Sentinels must agree on a single Leader Sentinel to execute the failover, because concurrent failover attempts from multiple Sentinels would produce conflicting configurations.

Election Trigger Conditions

Voting Protocol

1. Candidate Sentinel (usually the first to declare odown) increments currentEpoch:
   new_epoch = currentEpoch + 1

2. Candidate sends vote-request to all peer Sentinels:
   SENTINEL is-master-down-by-addr <ip> <port> <new_epoch> <candidate_runid>

3. Each peer Sentinel evaluates:
   ✓ new_epoch > own currentEpoch          (valid epoch)
   ✓ Has not voted in this epoch yet       (each epoch gets one vote)
   → Votes for candidate; updates own currentEpoch
   → Replies with: leader = <candidate_runid>

   ✗ Already voted in this epoch
   → Replies with: leader = <other_runid>  (whoever it already voted for)

4. Candidate collects votes:
   - Receives > (total_sentinels / 2) votes AND >= quorum → becomes Leader
   - Timeout without majority → wait and retry with epoch+1
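
The voting rules above can be modeled in a short Python sketch. It is a simplification for illustration only: `PeerSentinel`, `handle_vote_request`, and `wins_election` are hypothetical names, and the sketch tracks just the epoch of the last vote granted rather than the full per-primary state a real Sentinel keeps:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeerSentinel:
    current_epoch: int = 0
    voted_epoch: int = 0               # epoch of the last vote granted
    voted_for: Optional[str] = None    # runid that received that vote

    def handle_vote_request(self, req_epoch: int,
                            candidate_runid: str) -> Optional[str]:
        """Return the runid this peer has voted for in the latest epoch."""
        if req_epoch > self.current_epoch:
            self.current_epoch = req_epoch
        # One vote per epoch: grant only if this epoch is newer than the last vote.
        if req_epoch > self.voted_epoch:
            self.voted_epoch = req_epoch
            self.voted_for = candidate_runid
        return self.voted_for


def wins_election(votes: int, total_sentinels: int, quorum: int) -> bool:
    """A candidate needs a strict majority and at least quorum votes."""
    return votes > total_sentinels // 2 and votes >= quorum


peer = PeerSentinel(current_epoch=5)
print(peer.handle_vote_request(6, "sentinel-1"))  # sentinel-1 (vote granted)
print(peer.handle_vote_request(6, "sentinel-2"))  # sentinel-1 (already voted)
print(wins_election(votes=2, total_sentinels=3, quorum=2))  # True
```

The second request shows why two candidates cannot both collect a majority in the same epoch: the peer simply repeats its earlier vote.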

Epoch as Conflict Prevention

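The epoch is a monotonically increasing integer. Each Sentinel grants at most one vote per epoch, so two Leaders cannot be elected in the same epoch, and a configuration carrying a higher epoch supersedes one carrying a lower epoch.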


18.5 The Nine-State Failover Machine

The Leader Sentinel executes failover through the sentinelFailoverStateMachine() function:

State 1: WAIT_START

Verify that this Sentinel is legitimately the Leader for the current epoch. If another Sentinel has already started a newer-epoch failover, abort.

State 2: SELECT_SLAVE

Choose the best candidate replica using a three-level comparator:

/* Simplified comparator: a negative return value means `a` is preferred. */
int compareSlavesForPromotion(sentinelRedisInstance *a, sentinelRedisInstance *b) {
    /* 1. replica-priority: lower is better (0 = never promote) */
    if (a->slave_priority != b->slave_priority)
        return a->slave_priority - b->slave_priority;

    /* 2. replication offset: higher is better (more up-to-date).
     *    Compare explicitly: casting a 64-bit difference to int can overflow. */
    if (a->slave_repl_offset > b->slave_repl_offset) return -1;
    if (a->slave_repl_offset < b->slave_repl_offset) return 1;

    /* 3. runid: lexicographically smaller wins */
    return strcmp(a->runid, b->runid);
}

Replicas with replica-priority 0 are never selected.
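
The same three-level ordering can be written as a sort key. A minimal Python sketch, with a hypothetical `Replica` record and `select_replica` helper standing in for Sentinel's internal structures:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    runid: str
    priority: int      # replica-priority; 0 means "never promote"
    repl_offset: int   # replication offset; higher means more up-to-date


def select_replica(replicas: list[Replica]) -> Optional[Replica]:
    """Pick the promotion candidate: priority, then offset, then runid."""
    eligible = [r for r in replicas if r.priority != 0]
    if not eligible:
        return None
    # Lower priority first, higher offset first (negated), smaller runid first.
    return min(eligible, key=lambda r: (r.priority, -r.repl_offset, r.runid))


candidates = [
    Replica("bbb", 100, 1234560),
    Replica("aaa", 100, 1234100),
    Replica("ccc", 0, 9999999),   # excluded despite the highest offset
]
print(select_replica(candidates).runid)  # bbb
```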

State 3: SEND_SLAVEOF_NOONE

The Leader sends SLAVEOF NO ONE to the selected replica:

Leader → selected_replica:  SLAVEOF NO ONE
selected_replica → Leader:  +OK

The replica stops replicating, switches to a new replication ID, and begins operating as a standalone primary, although the other replicas do not know this yet.

State 4: WAIT_PROMOTION

The Leader polls the promoted replica every second with INFO until role:master appears in the response. If promotion is not confirmed within failover-timeout, the failover is aborted and a new election cycle can start.
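
The wait-for-promotion step is a poll loop with a deadline. The Python sketch below illustrates the pattern only: `wait_for_promotion` is a hypothetical helper, `get_role` stands in for "send INFO and read the role field", the time limit is passed in as `timeout_s`, and the clock and sleep functions are injectable so the example runs instantly:

```python
import time
from typing import Callable


def wait_for_promotion(get_role: Callable[[], str], timeout_s: float,
                       poll_interval_s: float = 1.0,
                       clock: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the replica reports role 'master' or the deadline passes."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        if get_role() == "master":
            return True
        sleep(poll_interval_s)
    return False  # caller would abort the failover


# Demo with a fake clock: the replica reports 'master' on the third poll.
fake_now = [0.0]
roles = iter(["slave", "slave", "master"])
promoted = wait_for_promotion(
    get_role=lambda: next(roles),
    timeout_s=10,
    clock=lambda: fake_now[0],
    sleep=lambda s: fake_now.__setitem__(0, fake_now[0] + s),
)
print(promoted, fake_now[0])
```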

State 5: RECONF_SLAVES

The Leader sends SLAVEOF <new_primary_ip> <new_primary_port> to each remaining replica:

Leader → Replica-2:  SLAVEOF 192.168.1.2 6380
Replica-2 → Leader:  +OK

parallel-syncs controls how many replicas are reconfigured simultaneously:

sentinel parallel-syncs mymaster 1

Setting this to 1 prevents all replicas from simultaneously issuing full sync requests to the new primary (which could overwhelm it). Set higher only if the new primary has ample bandwidth and the replicas can perform partial resync (PSYNC2).
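
One way to picture parallel-syncs is as a cap on how many replicas may be mid-reconfiguration at any moment. A minimal Python sketch under that assumption; `next_to_reconfigure` and the replica names are hypothetical and this is not Sentinel's own scheduling code:

```python
def next_to_reconfigure(pending: list[str], in_progress: list[str],
                        parallel_syncs: int) -> list[str]:
    """Return the pending replicas that may be sent SLAVEOF right now."""
    free_slots = max(0, parallel_syncs - len(in_progress))
    return pending[:free_slots]


# parallel-syncs 1: only one replica is redirected at a time.
print(next_to_reconfigure(["replica-2", "replica-3"], [], 1))
# While replica-2 is still syncing, replica-3 has to wait.
print(next_to_reconfigure(["replica-3"], ["replica-2"], 1))
```

With a cap of 1, at least one replica keeps serving reads from its existing dataset while another resynchronizes.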

State 6: UPDATE_CONFIG

The Leader rewrites its sentinel.conf with the new primary address and the new config epoch:

# Updated sentinel.conf
sentinel monitor mymaster 192.168.1.2 6380 2
sentinel config-epoch mymaster 7
sentinel leader-epoch mymaster 7

The Leader also publishes the topology change on the +switch-master Pub/Sub channel.

States 7-9: RESET (Waiting for Old Primary)

When the old primary comes back online, Sentinel detects it via PING and immediately sends SLAVEOF <new_primary> to demote it to a replica. The cluster returns to full redundancy.

Old primary restarts
  → Sentinel detects via PING recovery
  → Sends: SLAVEOF 192.168.1.2 6380
  → Old primary performs full sync with new primary
  → Cluster restored to primary + 2 replicas

18.6 Client Failover Integration

Pub/Sub Notification Channels

During and after a failover, Sentinel publishes events on the channels below. Clients receive them by subscribing on a connection to a Sentinel, not to the primary:

Channel                     Payload
+switch-master              <name> <old_ip> <old_port> <new_ip> <new_port>
+slave-reconf-sent          replica being reconfigured
+slave-reconf-inprog        replica sync in progress
+slave-reconf-done          replica reconfiguration complete
+failover-end               failover fully complete
-failover-abort             failover was aborted
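
A client that listens for +switch-master needs to split the payload into its five fields. A small Python sketch, assuming the space-separated layout listed in the table; `SwitchMaster` and `parse_switch_master` are illustrative names:

```python
from typing import NamedTuple


class SwitchMaster(NamedTuple):
    name: str
    old_ip: str
    old_port: int
    new_ip: str
    new_port: int


def parse_switch_master(payload: str) -> SwitchMaster:
    """Parse '<name> <old_ip> <old_port> <new_ip> <new_port>'."""
    name, old_ip, old_port, new_ip, new_port = payload.split()
    return SwitchMaster(name, old_ip, int(old_port), new_ip, int(new_port))


event = parse_switch_master("mymaster 192.168.1.1 6379 192.168.1.2 6380")
print(f"{event.name}: {event.old_ip}:{event.old_port} -> "
      f"{event.new_ip}:{event.new_port}")
```

With redis-py, such a payload would typically arrive as the `data` field of a message from a `pubsub()` object subscribed to `+switch-master` on a Sentinel connection.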

Sentinel-Aware Client Usage (Python)

import redis
from redis.sentinel import Sentinel

sentinel = Sentinel(
    sentinels=[
        ('192.168.1.10', 26379),
        ('192.168.1.11', 26379),
        ('192.168.1.12', 26379),
    ],
    socket_timeout=0.1,
    password='redis_password',
)

# Sentinel client auto-discovers primary address
master = sentinel.master_for('mymaster', socket_timeout=0.1, decode_responses=True)
replica = sentinel.slave_for('mymaster', socket_timeout=0.1, decode_responses=True)

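# After failover, the client returned by master_for() re-resolves the primary
# address through Sentinel when it reconnects. Depending on the redis-py version
# and retry settings, commands issued mid-failover may still raise
# redis.exceptions.ConnectionError, so application code should be ready to retry.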

Sentinel Query Commands

# Get current primary address
redis-cli -h sentinel1 -p 26379 SENTINEL get-master-addr-by-name mymaster
# 1) "192.168.1.2"
# 2) "6380"

# List all monitored primaries
redis-cli -h sentinel1 -p 26379 SENTINEL masters

# List replicas for a primary
redis-cli -h sentinel1 -p 26379 SENTINEL slaves mymaster

# List sibling Sentinels
redis-cli -h sentinel1 -p 26379 SENTINEL sentinels mymaster

# Manually trigger a failover (for testing)
redis-cli -h sentinel1 -p 26379 SENTINEL failover mymaster

18.7 Complete Sentinel Configuration

# sentinel.conf

port 26379
daemonize yes
logfile "/var/log/redis/sentinel.log"
dir "/var/run/redis"

# Monitor declaration: name, primary IP, primary port, quorum
sentinel monitor mymaster 192.168.1.1 6379 2

# Primary authentication password
sentinel auth-pass mymaster your_redis_password

# Milliseconds before declaring sdown
sentinel down-after-milliseconds mymaster 30000

# Milliseconds before a failover is considered timed out
# (a Sentinel waits 2x this value before retrying a failover of the same primary)
sentinel failover-timeout mymaster 180000

# Number of replicas to reconfigure in parallel
sentinel parallel-syncs mymaster 1

# Password this Sentinel uses to authenticate to other Sentinels (Redis 6.2+)
sentinel sentinel-pass sentinel_shared_password

# Execute script on key events (sdown, odown, switch-master, etc.)
sentinel notification-script mymaster /etc/redis/notify.sh

# Execute script when a client needs to reconfigure after failover
sentinel client-reconfig-script mymaster /etc/redis/reconfig.sh

Starting Sentinel

# Method 1: Sentinel binary
redis-sentinel /etc/redis/sentinel.conf

# Method 2: redis-server in sentinel mode
redis-server /etc/redis/sentinel.conf --sentinel

# Verify
redis-cli -p 26379 INFO sentinel
# sentinel_masters:1
# sentinel_tilt:0
# sentinel_running_scripts:0
# sentinel_scripts_queue_length:0
# sentinel_simulate_failure_flags:0
# master0:name=mymaster,status=ok,address=192.168.1.1:6379,slaves=2,sentinels=3

Notification Script

#!/bin/bash
# /etc/redis/notify.sh
EVENT_TYPE=$1
EVENT_DESCRIPTION=$2

case "$EVENT_TYPE" in
    "+sdown")
        logger "WARN: Redis sdown detected: $EVENT_DESCRIPTION"
        ;;
    "+odown")
        logger "CRIT: Redis odown, failover imminent: $EVENT_DESCRIPTION"
        curl -s -X POST "https://alerts.example.com/redis" \
             -H "Content-Type: application/json" \
             -d "{\"event\":\"$EVENT_TYPE\",\"details\":\"$EVENT_DESCRIPTION\"}"
        ;;
    "+switch-master")
        logger "INFO: Primary switched: $EVENT_DESCRIPTION"
        # Optionally update load balancer or service discovery
        ;;
esac

18.8 Tuning and Common Problems

Problem 1: Frequent sdown False Positives

Symptom: sdown events appear even when the primary is healthy
Cause:   down-after-milliseconds too small; transient network jitter exceeds it

Diagnosis:
redis-cli -p 26379 SENTINEL masters
# Check last-ping-reply vs last-ok-ping-reply fields

Fix (no restart needed):
redis-cli -p 26379 SENTINEL SET mymaster down-after-milliseconds 60000

Problem 2: Failover Times Out

Symptom: failover starts but never completes
Cause:   Full syncs from the promoted replica take longer than failover-timeout allows

Fix:
redis-cli -p 26379 SENTINEL SET mymaster failover-timeout 600000  # 10 minutes

Problem 3: Manual sentinel.conf Edits Are Overwritten

Sentinel rewrites sentinel.conf after every failover and configuration change.
Never edit sentinel.conf while Sentinel is running.
Instead, use SENTINEL SET:
redis-cli -p 26379 SENTINEL SET mymaster down-after-milliseconds 60000
redis-cli -p 26379 SENTINEL SET mymaster failover-timeout 300000

Problem 4: Choosing the Right Sentinel Count

3 Sentinels (minimum recommended):
  Quorum = 2; tolerates 1 Sentinel failure
  Minimum for production

5 Sentinels:
  Quorum = 3; tolerates 2 Sentinel failures
  For high-traffic, mission-critical environments

Never use 2 or 4:
  2: Losing either Sentinel leaves no majority to authorize a failover
  4: Tolerates only 1 failure, the same as 3, and a 2/2 partition leaves neither side with a majority
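
The arithmetic behind these recommendations is short enough to check directly. A minimal Python sketch, assuming a failover must be authorized by a strict majority of all known Sentinels; `majority` and `tolerated_failures` are hypothetical helper names:

```python
def majority(sentinel_count: int) -> int:
    """Smallest number of Sentinels that forms a strict majority."""
    return sentinel_count // 2 + 1


def tolerated_failures(sentinel_count: int) -> int:
    """How many Sentinels can fail while a majority is still reachable."""
    return sentinel_count - majority(sentinel_count)


for n in range(2, 8):
    print(f"{n} Sentinels: majority={majority(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

Each even count tolerates no more failures than the odd count just below it, which is why the extra Sentinel adds cost without adding resilience.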

18.9 When Sentinel Is the Right Choice (and When It Isn't)

Appropriate Scenarios

✓ Single primary + multiple replicas (vertical scale)
✓ Read-heavy workloads using read-from-replica pattern
✓ Data volume fits comfortably in single-node memory (<200-500 GB)
✓ Automatic failover without Cluster complexity overhead
✓ Existing single-node Redis migration path

Limitations

✗ No horizontal write scale: all writes go to one primary
✗ Memory limited by single machine (or network interface)
✗ 30-60 second write unavailability window during failover
✗ Clients must use a Sentinel-aware library
✗ Incompatible with Redis Cluster (cannot mix modes)

When data volume exceeds single-machine capacity or write throughput requires horizontal scaling, migrate to Redis Cluster (Chapters 19-20).


Chapter Summary

Component                 Mechanism                   Key Configuration
Liveness detection        1s PING loop                down-after-milliseconds
Topology discovery        10s INFO                    -
Sentinel peer discovery   2s Pub/Sub hello            sentinel monitor
sdown                     Single-Sentinel timeout     down-after-milliseconds
odown                     Quorum agreement            quorum value
Leader election           Raft voting, epoch-gated    -
Failover                  9-state machine             failover-timeout, parallel-syncs
Client notification       Pub/Sub +switch-master      -

Understanding Sentinel's internal machinery enables faster incident response, confident configuration decisions, and a clear mental model for when to upgrade to Cluster. Chapter 19 covers the 16,384-slot hash slot design and the Gossip protocol that powers cluster state propagation.
