Chapter 16. Master-Replica Replication: The PSYNC2 Protocol and Replication Backlog
Redis replication is the foundation of every high-availability deployment. This chapter dissects the seven-step PSYNC2 handshake, the dual-replid mechanism, the full-sync vs. partial-resync decision tree, and the replication backlog ring buffer, in enough depth to tune, troubleshoot, and reason about correctness guarantees in production.
16.1 The Seven-Step Handshake
When a replica starts, it executes the following sequence to establish replication with a primary.
Step 1: PING (liveness probe)
The replica opens a TCP connection to the primary's port and immediately sends a PING to verify the primary is alive and responsive:
Replica → Primary: *1\r\n$4\r\nPING\r\n
Primary → Replica: +PONG\r\n
If no valid response arrives within repl-timeout (default 60 s), the replica disconnects and retries. This is also a connectivity sanity check before investing in a full handshake.
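The PING exchange above is written in RESP, where a command is an array of bulk strings. As a minimal sketch of that framing (the helper name `encode_command` is hypothetical, not part of any Redis client library), the bytes can be produced like this:

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    # Array header: *<number of elements>\r\n
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        # Bulk string: $<byte length>\r\n<payload>\r\n
        parts.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(parts)


print(encode_command("PING"))  # b'*1\r\n$4\r\nPING\r\n'
```

The same helper covers every command the replica sends during the handshake, since each one is a plain array of string arguments.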
Step 2: AUTH (password authentication)
If the primary has requirepass set, the replica sends AUTH using the masterauth credential:
Replica → Primary: *2\r\n$4\r\nAUTH\r\n$6\r\nsecret\r\n
Primary → Replica: +OK\r\n
Replica-side configuration:
masterauth <password>
Redis 6.0+ supports ACL-based authentication with a dedicated replication user:
# redis.conf (replica side)
masteruser replication_user
masterauth replication_password
Step 3: REPLCONF listening-port
The replica announces its own listening port so the primary can record it for INFO replication output:
Replica → Primary: REPLCONF listening-port 6380
Primary → Replica: +OK\r\n
Step 4: REPLCONF capa (capability negotiation)
The replica declares which protocol features it supports:
Replica → Primary: REPLCONF capa eof capa psync2
Primary → Replica: +OK\r\n
- eof: supports diskless replication, where RDB data arrives as a socket stream terminated by an EOF marker rather than a file.
- psync2: supports the dual-replid PSYNC2 protocol introduced in Redis 4.0, enabling partial resynchronization after failover.
Step 5: PSYNC (synchronization request)
# First connection, no history:
Replica → Primary: PSYNC ? -1
# Reconnection attempting partial resync:
Replica → Primary: PSYNC <replid> <offset>
Fields:
- replid: the 40-character replication ID the replica last received from its primary.
- offset: the next replication offset the replica expects, that is, the last offset it has processed plus one.
Step 6: Primary Response
The primary evaluates the request and returns one of two responses:
# Full synchronization required:
+FULLRESYNC <replid> <master_repl_offset>\r\n
# Partial resynchronization possible:
+CONTINUE <replid>\r\n
On FULLRESYNC the replica must flush all local data before accepting the incoming RDB.
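On the replica side, the two replies can be distinguished by their first token. The following is a small sketch of such a parser; the names `PsyncReply` and `parse_psync_reply` are illustrative and do not correspond to functions in the Redis source. It treats the ID on +CONTINUE as optional, since the pre-PSYNC2 form of the reply carries no ID.

```python
from typing import NamedTuple, Optional


class PsyncReply(NamedTuple):
    kind: str               # "FULLRESYNC" or "CONTINUE"
    replid: Optional[str]   # replication ID announced by the primary, if any
    offset: Optional[int]   # only present on FULLRESYNC


def parse_psync_reply(line: bytes) -> PsyncReply:
    """Parse the simple-string reply a primary sends in response to PSYNC."""
    text = line.decode().rstrip("\r\n")
    if not text.startswith("+"):
        raise ValueError(f"unexpected reply: {text!r}")
    fields = text[1:].split()
    if len(fields) == 3 and fields[0] == "FULLRESYNC":
        return PsyncReply("FULLRESYNC", fields[1], int(fields[2]))
    if fields and fields[0] == "CONTINUE":
        # The replication ID is optional here (assumption: tolerate both forms).
        replid = fields[1] if len(fields) > 1 else None
        return PsyncReply("CONTINUE", replid, None)
    raise ValueError(f"unexpected reply: {text!r}")
```

A caller would branch on `kind`: discard local data and wait for the RDB on "FULLRESYNC", or keep its dataset and start applying the command stream on "CONTINUE".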
Step 7: RDB Transfer + Backlog Commands
Full sync path:
- Primary triggers BGSAVE (or reuses an ongoing one).
- New write commands during BGSAVE are accumulated in repl_backlog and the replica's client output buffer simultaneously.
- RDB is delivered:
  - Disk mode: $<size>\r\n<rdb_bytes>
  - Diskless mode: $EOF:<40-byte-delimiter>\r\n<rdb_stream><delimiter>
- After RDB delivery, buffered commands are streamed to the replica.
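The two delivery framings listed above differ only in the line that precedes the payload, so a receiver can pick the mode from that line alone. A sketch of that check follows; `parse_rdb_preamble` is a hypothetical helper written for illustration, not code from Redis.

```python
from typing import Optional, Tuple


def parse_rdb_preamble(line: bytes) -> Tuple[str, Optional[int], Optional[bytes]]:
    """Return (mode, size, delimiter) for the line that precedes the RDB payload."""
    if not line.startswith(b"$") or not line.endswith(b"\r\n"):
        raise ValueError("not an RDB preamble")
    body = line[1:-2]  # strip the leading '$' and the trailing CRLF
    if body.startswith(b"EOF:"):
        delimiter = body[4:]
        if len(delimiter) != 40:
            raise ValueError("expected a 40-byte delimiter")
        # Diskless mode: read the stream until the delimiter reappears.
        return ("diskless", None, delimiter)
    # Disk mode: the payload is exactly <size> bytes long.
    return ("disk", int(body.decode()), None)
```

In disk mode the receiver knows up front how many bytes to read; in diskless mode it has to scan the tail of the stream for the delimiter, which is why the delimiter is a long random string.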
Partial resync path:
The primary reads commands from the repl_backlog ring buffer in the range [replica_offset, master_repl_offset] and streams them directly over the existing connection.
16.2 replid and replid2: The Dual-ID Mechanism
PSYNC2 (Redis 4.0+) introduced two replication IDs to support partial resynchronization after failover, the single biggest improvement over PSYNC1.
replid Generation
Every instance generates a new replid when it becomes a primary:
/* replication.c */
void changeReplicationId(void) {
    /* Fill replid with CONFIG_RUN_ID_SIZE (40) random hex characters. */
    getRandomHexChars(server.replid, CONFIG_RUN_ID_SIZE);
    server.replid[CONFIG_RUN_ID_SIZE] = '\0';
}
Example replid: 8371b4fb1155b71f4a04d3e1bc3e18c4a990aeeb
Promotion and ID Rotation
When a replica is promoted to primary (failover or REPLICAOF NO ONE):
Before promotion (replica):
replid = A (its old primary's ID)
offset = 10000
After promotion (new primary):
replid = B (freshly generated)
replid2 = A (old primary's ID)
second_replid_offset = 10001 (the promotion point)
offset = 10000
The new primary remembers: "I was following ID A up to offset 10000."
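The rotation described above can be modelled in a few lines. This is a toy model rather than Redis code: the class `ReplicationState` and its `promote` method are made up for illustration, and only the field names mirror the ones in the example.

```python
import secrets
from dataclasses import dataclass

EMPTY_REPLID = "0" * 40  # placeholder for "no previous ID"


@dataclass
class ReplicationState:
    replid: str
    offset: int
    replid2: str = EMPTY_REPLID
    second_replid_offset: int = -1

    def promote(self) -> None:
        """Rotate IDs when this replica becomes a primary."""
        # Remember the ID we were following and where we stopped following it.
        self.replid2 = self.replid
        self.second_replid_offset = self.offset + 1
        # Start a new history under a fresh 40-character hex ID.
        self.replid = secrets.token_hex(20)


state = ReplicationState(replid="a" * 40, offset=10000)
state.promote()
print(state.replid2 == "a" * 40, state.second_replid_offset)  # True 10001
```

Note that the offset itself does not change on promotion; only the IDs and the recorded promotion point do.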
Partial Resync Decision Logic
When a replica sends PSYNC <rid> <offset>, the primary evaluates:
/* replication.c: masterTryPartialResynchronization() */
if (strcasecmp(rid, server.replid) &&
    (strcasecmp(rid, server.replid2) ||
     psync_offset > server.second_replid_offset))
{
    goto need_full_resync;
}
Partial resync is possible when:
1. rid == replid: the replica is reconnecting to its current primary (normal reconnect), OR
2. rid == replid2 AND offset <= second_replid_offset: the replica was following the old primary and its offset predates the promotion point.
Condition 2 is the key innovation: after a failover, sibling replicas can partial-resync against the new primary without a full RDB transfer.
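To make the two conditions concrete, here is the ID check restated as a small predicate. It is a sketch with an invented function name (`ids_allow_partial_resync`), and it covers only the ID comparison: the requested offset must also still be present in the backlog before a partial resync can actually proceed.

```python
def ids_allow_partial_resync(
    rid: str,
    psync_offset: int,
    replid: str,
    replid2: str,
    second_replid_offset: int,
) -> bool:
    """Return True when the replica's ID and offset pass the PSYNC2 ID check."""
    # Condition 1: the replica followed this primary's current history.
    if rid.lower() == replid.lower():
        return True
    # Condition 2: the replica followed the previous history, and only up to
    # the point where this instance switched to its new ID.
    return rid.lower() == replid2.lower() and psync_offset <= second_replid_offset


# Values from the promotion example: new ID "B", old ID "A", promotion point 10001.
print(ids_allow_partial_resync("A", 10001, "B", "A", 10001))  # True
print(ids_allow_partial_resync("A", 10002, "B", "A", 10001))  # False
```

The second call fails because an offset past the promotion point would refer to data the old primary wrote after the two histories diverged.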
Inspecting replid State
127.0.0.1:6379> INFO replication
# Replication
role:master
connected_slaves:2
master_replid:8371b4fb1155b71f4a04d3e1bc3e18c4a990aeeb
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1234567
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:104857600
repl_backlog_first_byte_offset:134568
repl_backlog_histlen:1100000
second_repl_offset: -1 means this instance has never been promoted from replica.
16.3 Full Sync vs. Partial Resync
When Full Sync Is Triggered
Any of the following causes a FULLRESYNC:
- Replica's first connection (offset = -1 / PSYNC ? -1)
- Replica's rid matches neither replid nor replid2
- Replica's offset has fallen behind the repl_backlog window
- repl_backlog is disabled (repl-backlog-size 0)
- The primary's backlog hasn't been initialized yet (no replica has ever connected)
Full sync is expensive: BGSAVE CPU + I/O, full network bandwidth, replica downtime during RDB load.
When Partial Resync Is Triggered
All conditions must hold simultaneously:
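- slave_replid matches replid or replid2 (for replid2, slave_offset must also be <= second_replid_offset)
- slave_offset is within the backlog window: repl_backlog_first_byte_offset <= slave_offset <= master_repl_offset. Once the backlog has filled, the lower bound is roughly master_repl_offset - repl_backlog_size.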
Estimating Missed Data Volume
disconnect_seconds = 30
write_rate_bytes_per_sec = 50 * 1024 * 1024 # 50 MB/s
missed_bytes = disconnect_seconds * write_rate_bytes_per_sec
# = 1,500 MB
# If repl_backlog_size = 100 MB → full sync required
# If repl_backlog_size = 2000 MB → partial resync succeeds
16.4 The repl_backlog Ring Buffer
Data Structure
The backlog is a fixed-size circular byte buffer:
/* server.h */
typedef struct {
    char *buf;          /* circular buffer pointer */
    long long histlen;  /* bytes of valid data currently stored */
    long long idx;      /* next write position */
    long long offset;   /* replication offset of the oldest byte still stored */
    size_t size;        /* total capacity (repl-backlog-size) */
} replicationBacklog;
When idx reaches the end it wraps around, overwriting the oldest data. The invariant: bytes in range [offset, offset + histlen) are always available.
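The wrap-around behaviour and the invariant are easier to see in a toy model. The class below is a simplified sketch, not a port of the Redis implementation: the names `ReplBacklog`, `feed`, and `read_from` are invented, and offsets start at 0 for the first byte ever written, which is a simplification chosen for readability.

```python
class ReplBacklog:
    """Toy circular backlog that keeps only the most recent `size` bytes."""

    def __init__(self, size: int) -> None:
        self.buf = bytearray(size)
        self.size = size
        self.idx = 0      # next write position inside buf
        self.histlen = 0  # bytes of valid data currently stored
        self.offset = 0   # replication offset of the oldest valid byte

    def feed(self, data: bytes) -> None:
        for byte in data:
            self.buf[self.idx] = byte
            self.idx = (self.idx + 1) % self.size  # wrap at the end
            if self.histlen < self.size:
                self.histlen += 1
            else:
                self.offset += 1  # the oldest byte was just overwritten

    def read_from(self, start: int) -> bytes:
        """Return all bytes in [start, offset + histlen), or fail if unavailable."""
        end = self.offset + self.histlen
        if not (self.offset <= start <= end):
            raise LookupError("offset outside backlog window: full sync needed")
        oldest = (self.idx - self.histlen) % self.size  # buf index of `offset`
        pos = (oldest + (start - self.offset)) % self.size
        out = bytearray()
        for _ in range(end - start):
            out.append(self.buf[pos])
            pos = (pos + 1) % self.size
        return bytes(out)


backlog = ReplBacklog(8)
backlog.feed(b"abcdefghij")   # 10 bytes into an 8-byte buffer
print(backlog.offset, backlog.read_from(8))  # 2 b'ij'
```

After ten bytes have passed through an eight-byte buffer, offsets 0 and 1 are gone, so a reader asking for offset 1 gets the error that corresponds to a forced full sync.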
Sizing Formula
repl_backlog_size = max_tolerable_disconnect_seconds
                    × peak_write_rate_bytes_per_sec
                    × 2
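The factor of 2 is a safety margin for write bursts above the measured peak and for disconnects that run longer than planned. Copy-on-write during BGSAVE increases memory usage but does not add bytes to the replication stream.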
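The sizing formula above is simple enough to wrap in a helper for capacity planning. The function name `recommended_backlog_bytes` and its parameter names are illustrative; the safety factor defaults to the 2 used in the formula.

```python
MB = 1024 * 1024


def recommended_backlog_bytes(
    max_disconnect_seconds: float,
    peak_write_bytes_per_sec: float,
    safety_factor: float = 2.0,
) -> int:
    """Backlog size that covers the given outage at the given write rate."""
    return int(max_disconnect_seconds * peak_write_bytes_per_sec * safety_factor)


# 60 s outage at a 50 MB/s peak write rate.
print(recommended_backlog_bytes(60, 50 * MB) // MB)  # 6000
```

The result is in bytes, so it can be passed to repl-backlog-size directly or rounded up to a convenient unit.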
Production Configuration
# Default: almost always insufficient
repl-backlog-size 1mb
# Moderate write load (~10 MB/s), tolerate 60s disconnect
repl-backlog-size 1gb
# High write load (~50 MB/s), tolerate 60s disconnect
repl-backlog-size 6gb
# TTL after the last replica disconnects before releasing the backlog
repl-backlog-ttl 3600
Monitoring Backlog Utilization
redis-cli info replication | grep -E 'repl_backlog|repl_offset'
# repl_backlog_size:536870912
# repl_backlog_first_byte_offset:1610612736
# repl_backlog_histlen:536870912 ← histlen == size: backlog is full (wrapping)
# master_repl_offset:2147483647
When histlen == size, the oldest data is being continuously overwritten. A slow replica whose offset falls behind the window will be forced into a full sync.
16.5 Diskless Replication
Disk Mode vs. Diskless Mode
| Dimension | Disk Mode | Diskless Mode |
|---|---|---|
| RDB path | BGSAVE → disk file → read → send | BGSAVE → direct socket write |
| Disk I/O | Yes (write + read) | None |
| Best for | Fast disk, slow network | Slow disk (e.g., EBS), fast network |
| Multiple replicas | Share single RDB file | Replicas that arrive within the delay window share one fork; later ones wait for the next transfer |
Configuration
repl-diskless-sync yes
# Wait N seconds after first replica connects before sending
# Allows additional replicas to join the same RDB transfer
repl-diskless-sync-delay 5
# Redis 7.0+: start the transfer early once this many replicas are waiting
# 0 = no early start; always wait the full delay
repl-diskless-sync-max-replicas 0
Batching Multiple Replicas
T=0: Replica-A connects; primary starts delay timer
T=3: Replica-B connects (within delay window)
T=5: Delay expires; primary forks once, streams RDB to A and B simultaneously
T=10: Replica-C connects (missed the window); waits for next round
16.6 Client Output Buffer for Replicas
Distinction from repl_backlog
These are two separate buffers with completely different purposes:
| Buffer | repl_backlog | client output buffer (replica) |
|---|---|---|
| Scope | Global (per primary) | Per replica connection |
| Purpose | Enable partial resync after disconnect | Stream live commands to replica |
| When full | Old data overwritten (no immediate harm) | Replica connection forcibly closed |
| Config key | repl-backlog-size | client-output-buffer-limit replica |
The Three Output Buffer Limits
client-output-buffer-limit replica 256mb 64mb 60
Format: <hard_limit> <soft_limit> <soft_seconds>
- Hard limit (256 MB): replica is disconnected immediately.
- Soft limit (64 MB for 60 s): if the buffer exceeds 64 MB continuously for 60 seconds, the replica is disconnected.
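The interaction between the hard limit, the soft limit, and the soft timer can be sketched as a small state check. This is a simplified model under stated assumptions, not the Redis implementation: the function name `check_output_buffer` is invented, the caller is assumed to track when the soft limit was first exceeded, and the exact boundary comparisons (`>=`) are a choice made for the sketch.

```python
from typing import Optional, Tuple

MB = 1024 * 1024


def check_output_buffer(
    used: int,
    now: float,
    over_soft_since: Optional[float],
    hard: int = 256 * MB,
    soft: int = 64 * MB,
    soft_seconds: float = 60,
) -> Tuple[bool, Optional[float]]:
    """Return (disconnect, updated over_soft_since) for one buffer sample."""
    if used >= hard:
        return True, over_soft_since      # hard limit: disconnect immediately
    if used < soft:
        return False, None                # back under the soft limit: reset timer
    if over_soft_since is None:
        return False, now                 # soft limit just crossed: start timer
    # Still over the soft limit: disconnect once the timer has run long enough.
    return (now - over_soft_since) >= soft_seconds, over_soft_since


print(check_output_buffer(100 * MB, now=60.0, over_soft_since=0.0))  # (True, 0.0)
```

The key property is that the timer resets whenever usage drops below the soft limit, so short spikes above 64 MB do not accumulate toward the 60-second threshold.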
The Full-Sync Death Spiral
Large write burst during BGSAVE (COW doubles memory pressure)
→ Replica output buffer grows at 100 MB/s
→ Hits 256 MB hard limit after ~2.5 s
→ Primary closes replica connection
→ Replica reconnects, attempts PSYNC
→ offset is outside backlog (backlog also overflowed)
→ FULLRESYNC triggered
→ Primary forks again for BGSAVE (additional load)
→ Output buffer fills again
→ Repeat
Breaking the cycle:
# 1. Increase output buffer limits
client-output-buffer-limit replica 1gb 256mb 300
# 2. Use diskless replication to reduce disk pressure during BGSAVE
repl-diskless-sync yes
# 3. Increase backlog so partial resync works after disconnect
repl-backlog-size 2gb
# 4. Disable automatic RDB saves if using AOF
save ""
16.7 Complete Configuration Reference
# ===== Primary side =====
repl-backlog-size 512mb
repl-backlog-ttl 3600
repl-diskless-sync yes
repl-diskless-sync-delay 5
client-output-buffer-limit replica 256mb 64mb 60
# Refuse writes when fewer than N replicas have lag < M seconds
min-replicas-to-write 1
min-replicas-max-lag 10
# ===== Replica side =====
replicaof 192.168.1.1 6379
masterauth your_password
# Strongly recommended (redis.conf does not allow trailing comments on a directive line)
replica-read-only yes
repl-timeout 60
# Sentinel priority (lower = higher priority for promotion)
replica-priority 100
# ===== Both sides =====
tcp-keepalive 300
16.8 Production Diagnostic Procedure
Step 1: Check Replica Replication State
redis-cli -h replica_host info replication
# master_link_status:down ← replication is broken
# master_last_io_seconds_ago:120 ← 120 s since last data
# master_sync_in_progress:1 ← full sync underway
Step 2: Verify Primary's View of Replicas
redis-cli -h primary_host info replication
# slave0:ip=192.168.1.2,port=6380,state=online,offset=1234567,lag=0
# slave1:ip=192.168.1.3,port=6381,state=send_bulk,offset=0,lag=0
# ↑ state=wait_bgsave or state=send_bulk: full sync in progress
Step 3: Check Whether Partial Resync Is Possible
slave_offset=$(redis-cli -h replica info replication \
| grep master_repl_offset | cut -d: -f2 | tr -d '\r')
backlog_start=$(redis-cli -h primary info replication \
| grep repl_backlog_first_byte_offset | cut -d: -f2 | tr -d '\r')
echo "slave_offset=$slave_offset, backlog_start=$backlog_start"
# If slave_offset >= backlog_start → partial resync possible
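The same comparison can be done from captured INFO output, which is handy when the two outputs were saved during an incident rather than queried live. The sketch below assumes the raw text of `INFO replication` from each side is available as a string; the helper names `parse_info` and `partial_resync_possible` are hypothetical.

```python
from typing import Dict


def parse_info(text: str) -> Dict[str, str]:
    """Turn INFO output ("key:value" lines, "#" section headers) into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():   # also handles \r\n line endings
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value
    return fields


def partial_resync_possible(replica_info: str, primary_info: str) -> bool:
    """Apply the offset check from the shell snippet above to saved INFO text."""
    replica_offset = int(parse_info(replica_info)["master_repl_offset"])
    backlog_start = int(parse_info(primary_info)["repl_backlog_first_byte_offset"])
    return replica_offset >= backlog_start
```

Like the shell version, this checks only the offset against the start of the backlog; the replication ID must match as well for the primary to accept the request.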
Step 4: Inspect Output Buffer Usage
redis-cli -h primary client list type replica
# ... omem=67108864 ... ← 64 MB, at the soft limit
16.9 Redis 7.x: Shared Replication Buffer
Redis 7.0 introduced a shared replication backlog that eliminates the per-replica output buffer duplication:
Pre-7.0:
  Primary → repl_backlog (global)
  Primary → output_buffer_replica1 (per-replica copy)
  Primary → output_buffer_replica2 (per-replica copy)
  Memory: O(N × size)
Redis 7.0+:
  Primary → shared_replication_buf (global)
  replica1 and replica2 hold read positions into the same buffer
  Memory: O(size)
With 10 replicas and a 512 MB limit, this change reduces worst-case replication-related memory from ~5 GB to ~512 MB.
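The arithmetic behind that comparison is shown below as a deliberately simplified worst-case model: every replica is assumed to fill its buffer to the limit at the same time, and the backlog itself is ignored. The function name `replication_buffer_mb` is made up for this sketch.

```python
def replication_buffer_mb(replicas: int, limit_mb: int, shared: bool) -> int:
    """Worst-case replication buffer memory in MB under the simplified model."""
    # Shared buffer: one copy regardless of replica count.
    # Per-replica buffers: one full copy per replica.
    return limit_mb if shared else replicas * limit_mb


print(replication_buffer_mb(10, 512, shared=False))  # 5120
print(replication_buffer_mb(10, 512, shared=True))   # 512
```

Real usage is usually well below the worst case, because healthy replicas drain their buffers continuously; the model only bounds what can happen when all of them stall at once.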
Chapter Summary
| Concept | Key Config / Command | Production Advice |
|---|---|---|
| Handshake | PING/AUTH/REPLCONF/PSYNC | Ensure masterauth is set correctly |
| replid2 | INFO replication | Confirm ID rotation after failover |
| Full sync | FULLRESYNC | Monitor frequency; avoid triggering repeatedly |
| repl_backlog | repl-backlog-size | At least 512 MB; high write loads: 1 GB+ |
| Diskless sync | repl-diskless-sync | Strongly recommended on cloud storage |
| Output buffer | client-output-buffer-limit | Tune based on replica count and write rate |
A thorough understanding of the PSYNC2 protocol is the prerequisite for reasoning about Redis high-availability guarantees. Chapter 17 digs into replication lag, consistency models, and concrete data-loss scenarios.