Sentinel Guide
1. Architecture Overview
Redis Sentinel provides automatic failover, service discovery, and configuration management. At least 3 Sentinel instances are recommended: a quorum of Sentinels must agree that the master is down, and a majority must then vote to elect the Sentinel that leads the failover. With only 2 Sentinels, losing either one (or a network partition between them) leaves no majority, so a failover cannot be authorized.
2. Quick Setup
Step 1: Install Redis
Step 2: Configure Master
Step 3: Configure Replicas
Step 4: Configure Sentinels
Step 5: Start Everything
3. sentinel.conf Complete Reference
| Directive | Default | Description |
|---|---|---|
| sentinel monitor <name> <ip> <port> <quorum> | - | Monitor a master. Quorum is the minimum number of Sentinels that must agree before the master is marked ODOWN. Recommended: a majority, N/2+1 rounded down (2 for 3 Sentinels, 3 for 5) |
| sentinel down-after-milliseconds <name> <ms> | 30000 | Time without a valid reply before the instance is marked SDOWN. Production: 5000-10000 ms. Values that are too low cause false positives |
| sentinel parallel-syncs <name> <count> | 1 | Number of replicas reconfigured simultaneously after failover. Higher values speed up recovery, but replicas may be unavailable while they sync |
| sentinel failover-timeout <name> <ms> | 180000 | Failover timeout in ms. After this, Sentinel considers the failover failed and allows another Sentinel to retry. Set to at least 2x the replication timeout |
| sentinel auth-pass <name> <password> | - | Master password. Sentinel uses it to authenticate to the master and its replicas |
| sentinel auth-user <name> <username> | - | Redis 6+ ACL username, used together with auth-pass |
| sentinel deny-scripts-reconfig | yes | Prevents changing notification script paths at runtime via SENTINEL SET. Keep enabled in production for security |
| sentinel notification-script <name> <path> | - | Script executed on state changes. Receives the event type and description as arguments |
| sentinel client-reconfig-script <name> <path> | - | Script executed after failover with the old and new master addresses. Useful for updating DNS or load balancers |
| sentinel resolve-hostnames | no | Redis 6.2+. Allows hostnames instead of IPs, useful for containers and DNS-based environments |
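The directives above can be combined into a small configuration file. The following is a minimal sketch, not a complete production config: `render_sentinel_conf` and its parameter names are hypothetical helpers introduced here for illustration, and the master name, address, and password in the usage note are placeholders.

```python
def render_sentinel_conf(name, ip, port, sentinels=3, down_after_ms=5000,
                         failover_timeout_ms=180000, parallel_syncs=1,
                         auth_pass=None):
    """Render a minimal sentinel.conf for one monitored master (illustrative helper)."""
    # Quorum follows the N/2+1 recommendation: 2 of 3, 3 of 5.
    quorum = sentinels // 2 + 1
    lines = [
        "port 26379",
        f"sentinel monitor {name} {ip} {port} {quorum}",
        f"sentinel down-after-milliseconds {name} {down_after_ms}",
        f"sentinel parallel-syncs {name} {parallel_syncs}",
        f"sentinel failover-timeout {name} {failover_timeout_ms}",
        "sentinel deny-scripts-reconfig yes",
    ]
    # Per-master directives must come after the matching "sentinel monitor" line.
    if auth_pass is not None:
        lines.append(f"sentinel auth-pass {name} {auth_pass}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values; replace with your own master address.
    print(render_sentinel_conf("mymaster", "10.0.0.1", 6379), end="")
```

Generating the file only for the initial deployment fits how Sentinel works, since Sentinel rewrites sentinel.conf itself once it is running.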
4. Sentinel API Commands
| Command | Description |
|---|---|
| SENTINEL masters | List all monitored masters and their state |
| SENTINEL master <name> | Return detailed info for a specific master (IP, port, flags, replica count, etc.) |
| SENTINEL replicas <name> | List all replicas of a master |
| SENTINEL sentinels <name> | List other Sentinel instances monitoring this master |
| SENTINEL get-master-addr-by-name <name> | Return current master IP and port (used by clients for service discovery) |
| SENTINEL ckquorum <name> | Check if enough Sentinels are online to reach quorum |
| SENTINEL failover <name> | Manually trigger failover (no agreement from other Sentinels needed) |
| SENTINEL reset <pattern> | Reset state for matching masters, clearing known replicas and Sentinels |
| SENTINEL flushconfig | Force write running configuration to sentinel.conf |
| SENTINEL pending-scripts | List pending notification scripts in queue |
| SENTINEL myid | Return the unique ID of this Sentinel instance |
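Over the classic (RESP2) protocol, SENTINEL master <name> replies with a flat list of alternating field names and values. The sketch below shows one way to turn such a reply into a dictionary and check the flags field. The helper names are hypothetical, and the sample reply in the usage section is illustrative rather than captured output; most client libraries do this parsing for you.

```python
def pairs_to_dict(flat_reply):
    """Turn ['name', 'mymaster', 'ip', '10.0.0.1', ...] into a dict."""
    if len(flat_reply) % 2 != 0:
        raise ValueError("expected an even number of elements")
    return dict(zip(flat_reply[0::2], flat_reply[1::2]))


def master_is_healthy(flat_reply):
    """True if the flags field marks a master with no s_down/o_down flag."""
    info = pairs_to_dict(flat_reply)
    flags = set(info.get("flags", "").split(","))
    return "master" in flags and not ({"s_down", "o_down"} & flags)


if __name__ == "__main__":
    # Illustrative reply shape only; real replies contain many more fields.
    sample = ["name", "mymaster", "ip", "10.0.0.1", "port", "6379",
              "flags", "master", "num-slaves", "2", "quorum", "2"]
    print(pairs_to_dict(sample)["ip"], master_is_healthy(sample))
```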
5. Failover Process
| Phase | Details |
|---|---|
| SDOWN | Subjective Down: a single Sentinel has not received a valid PING reply from the master within down-after-milliseconds. This is a unilateral judgment and does not trigger failover. |
| ODOWN | Objective Down: when quorum Sentinels report SDOWN, the state escalates to ODOWN. Sentinels confirm via SENTINEL is-master-down-by-addr messages. |
| Leader Election | Using a Raft-like algorithm, Sentinels vote to elect a leader to perform failover. Requires majority (N/2+1) votes. Each Sentinel votes only once per epoch to prevent duplicate elections. |
| Replica Selection | The leader Sentinel selects the best replica by: priority (replica-priority; a lower value means higher priority, and 0 means never promote), then replication offset (most up-to-date data), then Run ID (lexicographically smallest) as the tiebreaker. |
| Promotion | Sends REPLICAOF NO ONE to the chosen replica, making it a standalone master. Waits for confirmation that role switch is complete. |
| Reconfiguration | All other replicas execute REPLICAOF <new-master> to follow the new master. Sentinel updates its config and publishes +switch-master via Pub/Sub to notify clients. |
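The replica selection order in the table can be sketched as a single sort key. This is a simplified model, not Sentinel's implementation: the `Replica` class and `select_replica` function are hypothetical, and the sketch skips the preliminary filtering Sentinel applies to disconnected or stale replicas.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    run_id: str
    priority: int      # replica-priority; 0 means never promote
    repl_offset: int   # higher means more up-to-date data


def select_replica(replicas):
    """Pick the promotion candidate, or None if no replica is eligible."""
    candidates = [r for r in replicas if r.priority != 0]
    if not candidates:
        return None
    # Lowest priority value first, then highest offset, then smallest run ID.
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


if __name__ == "__main__":
    pool = [Replica("bbb", 100, 500), Replica("aaa", 100, 500),
            Replica("ccc", 100, 900), Replica("ddd", 0, 9999)]
    print(select_replica(pool).run_id)
```

In the usage example, the replica with priority 0 is excluded even though it has the highest offset, and the remaining tie on priority is broken by offset.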
Set min-replicas-to-write and min-replicas-max-lag to limit the data loss window.
6. Client Connection
Node.js (ioredis)
Python (redis-py)
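A minimal sketch of a Sentinel-aware connection with redis-py, assuming the `redis` package is installed and the master is registered under the name `mymaster`. The `parse_sentinel_nodes` and `connect` helpers, the host names, and the timeout values are assumptions made for illustration; `Sentinel`, `master_for`, and `slave_for` are part of redis-py.

```python
def parse_sentinel_nodes(spec, default_port=26379):
    """Parse 'host1:26379,host2' into [(host, port), ...] (IPv4/hostnames only)."""
    nodes = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        host, sep, port = item.partition(":")
        nodes.append((host, int(port) if sep else default_port))
    return nodes


def connect(spec, master_name="mymaster"):
    """Return (master_client, replica_client) discovered through Sentinel."""
    # Imported lazily so the parsing helper works without redis-py installed.
    from redis.sentinel import Sentinel

    sentinel = Sentinel(parse_sentinel_nodes(spec), socket_timeout=0.5)
    # Writes go to the current master; it is re-resolved after a failover.
    master = sentinel.master_for(master_name, socket_timeout=0.5)
    # Reads can be served by a replica.
    replica = sentinel.slave_for(master_name, socket_timeout=0.5)
    return master, replica


if __name__ == "__main__":
    # Placeholder Sentinel addresses; requires a running deployment.
    master, replica = connect("sentinel-1:26379,sentinel-2:26379,sentinel-3:26379")
    master.set("greeting", "hello")
    print(replica.get("greeting"))
```

Listing every Sentinel address lets the client keep discovering the master when one Sentinel is unreachable. Because replication is asynchronous, the read from the replica may briefly lag behind the write.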
Java (Lettuce)
Go (go-redis)
7. Sentinel vs Redis Cluster
| Feature | Sentinel | Redis Cluster |
|---|---|---|
| Primary Use | High availability (HA) | HA + data sharding |
| Data Sharding | No, single master holds all data | Auto-sharded across 16384 hash slots |
| Data Capacity | Limited to single-node memory | Scales linearly |
| Min Nodes | 3 Sentinel + 1M + 2R = 6 | 3 masters + 3 replicas = 6 |
| Failover Time | ~5-15s | ~1-5s |
| Client Complexity | Requires Sentinel-aware client | Requires Cluster-aware client, handles MOVED/ASK redirects |
| Multi-key Ops | Fully supported (single master) | Only for same-slot keys, use {hash_tag} |
| Best For | Datasets < 25GB, simple HA needed | Large datasets, horizontal scaling |
8. Production Best Practices
Network & Deployment Topology
Deploy 3 or 5 Sentinels across different availability zones or racks. Avoid co-locating Sentinel with Redis data nodes on the same machine -- if that machine dies, you lose both a voter and a monitored instance. Use an odd number of Sentinels (3 or 5); even numbers add complexity without improving fault tolerance.
Quorum Sizing
Quorum controls the ODOWN detection threshold, but electing the failover leader always requires a majority (N/2+1). With 3 Sentinels and quorum=2, the deployment tolerates 1 Sentinel failure. With 5 Sentinels and quorum=3, it tolerates 2 failures, which suits 3-datacenter deployments.
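The arithmetic behind these sizing recommendations is short enough to write out. The sketch below uses hypothetical function names and only encodes the N/2+1 majority rule stated above.

```python
def majority(n):
    """Votes needed to elect a failover leader among n Sentinels."""
    return n // 2 + 1


def tolerated_failures(n):
    """Sentinels that can fail while a majority is still reachable."""
    return n - majority(n)


def sizing_table(sizes=(3, 4, 5)):
    """Return (sentinels, majority, tolerated_failures) rows."""
    return [(n, majority(n), tolerated_failures(n)) for n in sizes]


if __name__ == "__main__":
    for n, m, t in sizing_table():
        print(f"{n} Sentinels: majority={m}, tolerates {t} failure(s)")
```

The row for 4 Sentinels shows why even counts are discouraged: the majority rises to 3, but the deployment still tolerates only 1 failure, the same as with 3 Sentinels.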
Monitoring & Alerting
Security: TLS + ACL
Data Safety Settings
9. Troubleshooting
| Issue | Cause & Solution |
|---|---|
| Split-Brain | Network partition causes two masters to coexist. Prevention: set min-replicas-to-write 1 so the isolated old master rejects writes when it cannot reach replicas. Ensure Sentinels are deployed across at least 3 network zones. |
| SDOWN Flapping | Sentinel repeatedly toggles between SDOWN and normal. Usually caused by down-after-milliseconds set too low or network latency jitter. Fix: increase timeout to 10000ms+, check network quality between Sentinel and master. |
| Failover Not Triggering | Possible causes: (1) not enough Sentinels online for quorum -- verify with SENTINEL ckquorum. (2) A failed failover within failover-timeout cooldown period. (3) No eligible replicas (all have priority=0 or are disconnected). |
| Replication Lag After Failover | Replicas that cannot partially resync with the new master need a full sync (RDB transfer). Mitigation: enable repl-diskless-sync yes (the default since Redis 7.0), increase repl-backlog-size so that partial resync is more likely to succeed, and keep parallel-syncs at 1 so replicas do not all sync at once and saturate bandwidth. |
| Sentinel Config Drift | Sentinel auto-rewrites sentinel.conf (adding discovered replicas and Sentinels). If config management tools (Ansible/Puppet) overwrite this file, state becomes inconsistent. Fix: only manage initial config, let Sentinel self-govern after startup. |
10. FAQ
Can Sentinel monitor multiple Redis masters?
Yes. Add multiple sentinel monitor lines in sentinel.conf, each with a different name. Each master group can have its own quorum, timeout, and password settings. This is useful when providing HA for multiple independent applications.
What are the considerations for Sentinel in Docker/Kubernetes?
The main challenge in containers is NAT and port mapping. Sentinel broadcasts its discovered IP to other Sentinels, but in Docker this may be an internal container IP. Solutions: (1) Use sentinel announce-ip and sentinel announce-port to declare externally reachable addresses. (2) Redis 6.2+ supports sentinel resolve-hostnames yes for DNS names. (3) In Kubernetes, use StatefulSet + Headless Service.
Can data be lost during Sentinel failover?
Yes, because Redis uses asynchronous replication. The master accepts writes and syncs to replicas in the background. If the master crashes before sync completes, un-propagated writes are lost. Use min-replicas-to-write + min-replicas-max-lag to shrink the loss window: the master rejects writes when not enough low-lag replicas are connected, limiting maximum data loss to max-lag seconds.
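The write-rejection rule can be modeled in a few lines. This is a simplified sketch of the behavior described above, not Redis source code: `accepts_writes` is a hypothetical function, and lag is expressed as seconds since each replica last acknowledged the master.

```python
def accepts_writes(replica_lags, min_replicas_to_write, min_replicas_max_lag):
    """Model whether a master would accept a write under the two settings.

    replica_lags: seconds since each connected replica last acknowledged.
    """
    if min_replicas_to_write == 0:
        return True  # a value of 0 disables the check
    good = sum(1 for lag in replica_lags if lag <= min_replicas_max_lag)
    return good >= min_replicas_to_write


if __name__ == "__main__":
    # An isolated old master sees no replicas and stops accepting writes.
    print(accepts_writes([], 1, 10))
    # One replica within the lag limit is enough when the minimum is 1.
    print(accepts_writes([2, 30], 1, 10))
```

The first call in the usage example corresponds to the partition case: a master cut off from all replicas refuses writes, so it cannot accumulate data that would be discarded once it rejoins as a replica.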
What is the difference between quorum and majority?
Quorum is the minimum Sentinels needed to detect ODOWN, configurable in sentinel monitor. Majority is N/2+1, used for leader election. Even if quorum is set to 1, failover still requires a majority of Sentinels online to elect a leader. Example with 5 Sentinels + quorum=2: 2 agreements mark ODOWN, but 3 must be online to elect a leader and execute failover.
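The two thresholds can be checked separately. The sketch below is a simplified model with hypothetical names: `reachable` stands for the Sentinels that are up, can talk to each other, and see the master as down. It also assumes that when quorum is set higher than the majority, the leader needs at least quorum votes, which matches my understanding of Sentinel's behavior but is not stated in the text above.

```python
def failover_outcome(total, quorum, reachable):
    """Return (odown_reached, failover_authorized) for a Sentinel group."""
    # ODOWN needs at least `quorum` Sentinels agreeing the master is down.
    odown = reachable >= quorum
    # Leader election needs a majority, and (assumption) never fewer than quorum.
    votes_needed = max(total // 2 + 1, quorum)
    authorized = odown and reachable >= votes_needed
    return odown, authorized


if __name__ == "__main__":
    # 5 Sentinels, quorum=2: two reachable Sentinels detect ODOWN but cannot fail over.
    print(failover_outcome(total=5, quorum=2, reachable=2))
    # A third reachable Sentinel provides the majority needed for election.
    print(failover_outcome(total=5, quorum=2, reachable=3))
```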
How to achieve zero-downtime Redis version upgrades?
Recommended rolling upgrade: (1) Upgrade all replicas first (restart one at a time). (2) Manually trigger SENTINEL failover mymaster to switch master to an upgraded replica. (3) Upgrade the old master (now a replica). (4) Finally, upgrade Sentinel nodes one by one. Service remains available throughout, with only 1-3 seconds write interruption during failover.
How large a Redis deployment can Sentinel handle?
Sentinel is designed for single-master scenarios where the data fits in one machine's memory (typically 25-100GB). If you need more capacity than a single machine provides, or need to scale writes horizontally, use Redis Cluster. Sentinel can monitor multiple master groups simultaneously (each fails over independently), but each group remains a single-master architecture.