Chapter 16 — Master-Replica Replication: The PSYNC2 Protocol and Replication Backlog

Redis replication is the foundation of every high-availability deployment. This chapter dissects the seven-step PSYNC2 handshake, the dual-replid mechanism, the full-sync vs. partial-resync decision tree, and the replication backlog ring buffer — with the depth needed to tune, troubleshoot, and reason about correctness guarantees in production.


16.1 The Seven-Step Handshake

When a replica starts, it executes the following sequence to establish replication with a primary.

Step 1 — PING (liveness probe)

The replica opens a TCP connection to the primary's port and immediately sends a PING to verify the primary is alive and responsive:

Replica → Primary:  *1\r\n$4\r\nPING\r\n
Primary → Replica:  +PONG\r\n

If no valid response arrives within repl-timeout (default 60 s), the replica disconnects and retries. This is also a connectivity sanity check before investing in a full handshake.
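
The wire format in the exchange above is a RESP array of bulk strings. As a minimal sketch (the helper name encode_command is hypothetical, not part of Redis or any client library), the framing can be reproduced like this:

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    # Array header: '*' followed by the number of elements.
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        # Each element: '$' + byte length, CRLF, payload, CRLF.
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


if __name__ == "__main__":
    print(encode_command("PING"))
```

The same helper produces the frames for the later handshake commands, since they all use the same array-of-bulk-strings encoding.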

Step 2 — AUTH (password authentication)

If the primary has requirepass set, the replica sends AUTH using the masterauth credential:

Replica → Primary:  *2\r\n$4\r\nAUTH\r\n$6\r\nsecret\r\n
Primary → Replica:  +OK\r\n

Replica-side configuration:

masterauth <password>

Redis 6.0+ supports ACL-based authentication with a dedicated replication user:

# redis.conf (replica side)
masteruser replication_user
masterauth replication_password

Step 3 — REPLCONF listening-port

The replica announces its own listening port so the primary can record it for INFO replication output:

Replica → Primary:  REPLCONF listening-port 6380
Primary → Replica:  +OK\r\n

Step 4 — REPLCONF capa (capability negotiation)

The replica declares which protocol features it supports:

Replica → Primary:  REPLCONF capa eof capa psync2
Primary → Replica:  +OK\r\n

Step 5 — PSYNC (synchronization request)

# First connection — no history:
Replica → Primary:  PSYNC ? -1

# Reconnection attempting partial resync:
Replica → Primary:  PSYNC <replid> <offset>

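Fields:

  • <replid>: the replication ID of the primary the replica last synchronized with; ? when the replica has no history.
  • <offset>: the replication offset the replica wants to resume from (the next byte it expects); -1 when the replica has no history.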

Step 6 — Primary Response

The primary evaluates the request and returns one of two responses:

# Full synchronization required:
+FULLRESYNC <replid> <master_repl_offset>\r\n

# Partial resynchronization possible:
+CONTINUE <replid>\r\n

On FULLRESYNC the replica must flush all local data before accepting the incoming RDB.
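
A replica-side parser for these two reply lines might look like the sketch below. PsyncReply and parse_psync_reply are hypothetical names introduced for illustration, and the sketch handles only the two success replies shown above; any other reply is reported as an error.

```python
from typing import NamedTuple, Optional


class PsyncReply(NamedTuple):
    kind: str                 # "FULLRESYNC" or "CONTINUE"
    replid: Optional[str]     # replication ID announced by the primary, if any
    offset: Optional[int]     # only present for FULLRESYNC


def parse_psync_reply(line: bytes) -> PsyncReply:
    """Parse a '+FULLRESYNC <replid> <offset>' or '+CONTINUE [<replid>]' line."""
    text = line.decode("ascii").rstrip("\r\n")
    if not text.startswith("+"):
        raise ValueError(f"unexpected reply: {text!r}")
    fields = text[1:].split()
    if len(fields) == 3 and fields[0] == "FULLRESYNC":
        return PsyncReply("FULLRESYNC", fields[1], int(fields[2]))
    if 1 <= len(fields) <= 2 and fields[0] == "CONTINUE":
        # The replid is optional in this sketch; it is simply absent if omitted.
        replid = fields[1] if len(fields) == 2 else None
        return PsyncReply("CONTINUE", replid, None)
    raise ValueError(f"unexpected reply: {text!r}")
```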

Step 7 — RDB Transfer + Backlog Commands

Full sync path:

  1. Primary triggers BGSAVE (or reuses an ongoing one).
  2. New write commands during BGSAVE are accumulated in repl_backlog and the replica's client output buffer simultaneously.
  3. RDB is delivered:
    • Disk mode: $<size>\r\n<rdb_bytes>
    • Diskless mode: $EOF:<40-byte-delimiter>\r\n<rdb_stream><delimiter>
  4. After RDB delivery, buffered commands are streamed to the replica.

Partial resync path:

The primary reads commands from the repl_backlog ring buffer in the range [replica_offset, master_repl_offset] and streams them directly over the existing connection.


16.2 replid and replid2 — The Dual-ID Mechanism

PSYNC2 (Redis 4.0+) introduced two replication IDs to support partial resynchronization after failover — the single biggest improvement over PSYNC1.

replid Generation

Every instance generates a new replid when it becomes a primary:

/* replication.c */
void changeReplicationId(void) {
    getRandomHexChars(server.replid, CONFIG_RUN_ID_SIZE);
    server.replid[CONFIG_RUN_ID_SIZE] = '\0';
}

Example replid: 8371b4fb1155b71f4a04d3e1bc3e18c4a990aeeb

Promotion and ID Rotation

When a replica is promoted to primary (failover or REPLICAOF NO ONE):

Before promotion (replica):
  replid  = A  (its old primary's ID)
  offset  = 10000

After promotion (new primary):
  replid  = B  (freshly generated)
  replid2 = A  (old primary's ID)
  second_replid_offset = 10001   (the promotion point)
  offset  = 10000

The new primary remembers: "I was following ID A up to offset 10000."
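
The rotation can be modelled in a few lines. This is an illustrative sketch, not the Redis implementation: ReplState and promote are hypothetical names, and secrets.token_hex stands in for Redis's own random hex generator.

```python
import secrets
from dataclasses import dataclass


@dataclass
class ReplState:
    replid: str
    offset: int
    replid2: str = "0" * 40          # all zeros: no previous ID recorded
    second_replid_offset: int = -1   # -1: no previous ID recorded


def promote(state: ReplState) -> ReplState:
    """Return the state a replica would hold right after promotion."""
    return ReplState(
        replid=secrets.token_hex(20),            # fresh 40-character hex ID
        offset=state.offset,                     # the offset itself does not change
        replid2=state.replid,                    # remember the old primary's ID
        second_replid_offset=state.offset + 1,   # first offset NOT valid under the old ID
    )


if __name__ == "__main__":
    print(promote(ReplState(replid="a" * 40, offset=10000)))
```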

Partial Resync Decision Logic

When a replica sends PSYNC <rid> <offset>, the primary evaluates:

/* replication.c — masterTryPartialResynchronization() */
if (strcasecmp(rid, server.replid) &&
    (strcasecmp(rid, server.replid2) ||
     psync_offset > server.second_replid_offset))
{
    goto need_full_resync;
}

Partial resync is possible when:

  1. rid == replid — replica is reconnecting to its current primary (normal reconnect), OR
  2. rid == replid2 AND offset <= second_replid_offset — replica was following the old primary and its offset predates the promotion point.

Condition 2 is the key innovation: after a failover, sibling replicas can partial-resync against the new primary without a full RDB transfer.
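
The two conditions translate directly into a small predicate. The sketch below mirrors only the ID check; the function name is hypothetical, and a real primary additionally verifies that the requested offset is still present in its backlog.

```python
def ids_allow_partial_resync(rid: str, psync_offset: int,
                             replid: str, replid2: str,
                             second_replid_offset: int) -> bool:
    """True if the replication-ID check alone permits a partial resync."""
    # Condition 1: the replica was following the current replication ID.
    if rid.lower() == replid.lower():
        return True
    # Condition 2: the replica was following the previous ID and has not
    # advanced past the point where this instance switched IDs.
    return rid.lower() == replid2.lower() and psync_offset <= second_replid_offset


if __name__ == "__main__":
    # A sibling replica of old primary "A" reconnects to new primary "B".
    print(ids_allow_partial_resync("A", 10001, "B", "A", 10001))
```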

Inspecting replid State

127.0.0.1:6379> INFO replication
# Replication
role:master
connected_slaves:2
master_replid:8371b4fb1155b71f4a04d3e1bc3e18c4a990aeeb
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1234567
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:104857600
repl_backlog_first_byte_offset:134568
repl_backlog_histlen:1100000

second_repl_offset: -1 means this instance has never been promoted from replica.
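
For scripted checks, the INFO output can be turned into a dictionary. The following is a minimal sketch that assumes the text has already been fetched (for example with redis-cli); parse_info and has_previous_replid are hypothetical helper names.

```python
def parse_info(text: str) -> dict:
    """Parse 'key:value' lines from INFO output, skipping section headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def has_previous_replid(fields: dict) -> bool:
    """True if the instance remembers an earlier replication ID (replid2)."""
    return fields.get("master_replid2", "0" * 40) != "0" * 40


SAMPLE = """# Replication
role:master
master_replid:8371b4fb1155b71f4a04d3e1bc3e18c4a990aeeb
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1234567
second_repl_offset:-1
"""

if __name__ == "__main__":
    info = parse_info(SAMPLE)
    print(info["role"], has_previous_replid(info))
```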


16.3 Full Sync vs. Partial Resync

When Full Sync Is Triggered

Any of the following causes a FULLRESYNC:

  1. Replica's first connection (offset = -1 / PSYNC ? -1)
  2. Replica's rid matches neither replid nor replid2
  3. Replica's offset has fallen behind the repl_backlog window
  4. repl_backlog is disabled (repl-backlog-size 0)
  5. The primary's backlog hasn't been initialized yet (no replica has ever connected)

Full sync is expensive: BGSAVE CPU + I/O, full network bandwidth, replica downtime during RDB load.

When Partial Resync Is Triggered

All conditions must hold simultaneously:

  1. slave_replid matches replid or replid2
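  2. slave_offset is within the backlog window, which is bounded by the data actually stored (histlen), not by the configured size:
    repl_backlog_first_byte_offset <= slave_offset <= repl_backlog_first_byte_offset + repl_backlog_histlen
    Once the backlog has filled, the lower bound is approximately master_repl_offset - repl_backlog_size.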
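
The window test can be sketched in terms of the two INFO fields that describe the stored history, repl_backlog_first_byte_offset and repl_backlog_histlen. The function name is hypothetical; the comparison follows the bounds check Redis applies to the offset sent in PSYNC.

```python
def offset_in_backlog(psync_offset: int,
                      backlog_first_byte_offset: int,
                      backlog_histlen: int) -> bool:
    """True if the requested offset can still be served from the backlog."""
    lower = backlog_first_byte_offset
    # The upper bound is one past the newest stored byte: a replica that is
    # fully caught up asks for the next byte, which has not been written yet.
    upper = backlog_first_byte_offset + backlog_histlen
    return lower <= psync_offset <= upper
```

A replica asking for an offset below the lower bound has missed data that was already overwritten, so it falls back to a full sync.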
    

Estimating Missed Data Volume

disconnect_seconds = 30
write_rate_bytes_per_sec = 50 * 1024 * 1024  # 50 MB/s

missed_bytes = disconnect_seconds * write_rate_bytes_per_sec
# = 1,500 MB

# If repl_backlog_size = 100 MB → full sync required
# If repl_backlog_size = 2000 MB → partial resync succeeds

16.4 The repl_backlog Ring Buffer

Data Structure

The backlog is a fixed-size circular byte buffer:

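/* server.h (simplified view). Before Redis 7.0 these are fields of
 * struct redisServer (repl_backlog, repl_backlog_histlen, repl_backlog_idx,
 * repl_backlog_off, repl_backlog_size) rather than a standalone struct. */
typedef struct {
    char *buf;              /* circular buffer storage */
    long long histlen;      /* bytes of valid data currently stored */
    long long idx;          /* next write position; wraps to 0 at size */
    long long offset;       /* replication offset of the oldest valid byte */
    size_t size;            /* total capacity (repl-backlog-size) */
} replicationBacklog;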

When idx reaches the end it wraps around, overwriting the oldest data. The invariant: bytes in range [offset, offset + histlen) are always available.
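
The wrap-around behaviour and the invariant are easiest to see in a toy model. The class below is an illustrative sketch, not the Redis implementation: it copies byte by byte for clarity, and the class and method names are hypothetical.

```python
class ReplBacklog:
    """Toy circular backlog addressed by replication offset."""

    def __init__(self, size: int, start_offset: int = 1):
        self.buf = bytearray(size)
        self.size = size
        self.idx = 0                  # next write position
        self.histlen = 0              # bytes of valid data
        self.offset = start_offset    # replication offset of the oldest valid byte

    def feed(self, data: bytes) -> None:
        for byte in data:
            self.buf[self.idx] = byte
            self.idx = (self.idx + 1) % self.size
            if self.histlen < self.size:
                self.histlen += 1
            else:
                self.offset += 1      # the oldest byte was just overwritten

    def read_from(self, repl_offset: int) -> bytes:
        """Return every stored byte from repl_offset to the newest byte."""
        if not (self.offset <= repl_offset <= self.offset + self.histlen):
            raise LookupError("offset outside backlog: full resync needed")
        skip = repl_offset - self.offset
        start = (self.idx - self.histlen + skip) % self.size
        length = self.histlen - skip
        return bytes(self.buf[(start + i) % self.size] for i in range(length))


if __name__ == "__main__":
    backlog = ReplBacklog(size=8)
    backlog.feed(b"abcdefghij")       # 10 bytes into an 8-byte buffer
    print(backlog.offset, backlog.read_from(backlog.offset))
```

After writing 10 bytes into the 8-byte buffer, the two oldest bytes are gone and offset has advanced by 2, which is exactly the situation that forces a lagging replica into a full sync.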

Sizing Formula

repl_backlog_size = max_tolerable_disconnect_seconds
                    × peak_write_rate_bytes_per_sec
                    × 2

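The factor of 2 is a safety margin: it covers write bursts above the measured peak and disconnects that last longer than planned. Measure the write rate as the growth of master_repl_offset per second, since that is the byte stream that fills the backlog.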
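
As a quick calculator, the formula can be written as below. The function and parameter names are hypothetical, and safety_factor defaults to the factor of 2 used in the formula.

```python
MB = 1024 * 1024


def backlog_size_bytes(max_disconnect_seconds: float,
                       peak_write_rate_bytes_per_sec: float,
                       safety_factor: float = 2.0) -> int:
    """Suggested repl-backlog-size in bytes for a given outage tolerance."""
    return int(max_disconnect_seconds * peak_write_rate_bytes_per_sec * safety_factor)


if __name__ == "__main__":
    # 60 s tolerated disconnect at 50 MB/s of replication traffic.
    print(backlog_size_bytes(60, 50 * MB) // MB, "MB")
```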

Production Configuration

# Default — almost always insufficient
repl-backlog-size 1mb

# Moderate write load (~10 MB/s), tolerate 60s disconnect: 10 × 60 × 2 = 1200 MB
repl-backlog-size 1200mb

# High write load (~50 MB/s), tolerate 60s disconnect
repl-backlog-size 6gb

# TTL after the last replica disconnects before releasing the backlog
repl-backlog-ttl 3600

Monitoring Backlog Utilization

redis-cli info replication | grep -E 'repl_backlog|repl_offset'
# repl_backlog_size:536870912
# repl_backlog_first_byte_offset:1610612736
# repl_backlog_histlen:536870912   ← histlen == size: backlog is full (wrapping)
# master_repl_offset:2147483647

When histlen == size, the oldest data is being continuously overwritten. A slow replica whose offset falls behind the window will be forced into a full sync.
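
One way to judge whether the window is large enough is to convert it into seconds of history. The sketch below assumes two samples of master_repl_offset taken a known interval apart; the function name is hypothetical.

```python
def backlog_window_seconds(offset_t0: int, offset_t1: int,
                           interval_seconds: float,
                           backlog_histlen: int) -> float:
    """Approximate seconds of write history the backlog currently holds."""
    rate = (offset_t1 - offset_t0) / interval_seconds   # bytes per second
    if rate <= 0:
        return float("inf")   # no writes observed: nothing is being overwritten
    return backlog_histlen / rate


if __name__ == "__main__":
    MB = 1024 * 1024
    # 100 MB of replication traffic in 10 s against a full 512 MB backlog.
    print(backlog_window_seconds(0, 100 * MB, 10, 512 * MB))
```

If the result is shorter than the longest disconnect you expect to ride out, the backlog is too small for the current write rate.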


16.5 Diskless Replication

Disk Mode vs. Diskless Mode

| Dimension | Disk Mode | Diskless Mode |
|---|---|---|
| RDB path | BGSAVE → disk file → read → send | BGSAVE → direct socket write |
| Disk I/O | Yes (write + read) | None |
| Best for | Fast disk, slow network | Slow disk (e.g., EBS), fast network |
| Multiple replicas | Share a single RDB file | Replicas that arrive within the delay window share one fork; later arrivals wait for the next round |

Configuration

repl-diskless-sync yes

# Wait N seconds after first replica connects before sending
# Allows additional replicas to join the same RDB transfer
repl-diskless-sync-delay 5

# Redis 7.0+: start the transfer early once this many replicas are waiting.
# 0 = no threshold; the primary always waits for the full delay.
repl-diskless-sync-max-replicas 0

Batching Multiple Replicas

T=0:  Replica-A connects; primary starts delay timer
T=3:  Replica-B connects (within delay window)
T=5:  Delay expires; primary forks once, streams RDB to A and B simultaneously
T=10: Replica-C connects (missed the window); waits for next round
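
The timeline can be reproduced with a small simulation. This is a simplified model under stated assumptions: the primary forks once the delay has elapsed since the first waiting replica arrived, never forks while a transfer is running, and every transfer takes a fixed transfer_seconds. The function name and the transfer_seconds parameter are hypothetical.

```python
from typing import Dict, List, Tuple


def batch_rounds(arrivals: Dict[str, float], delay: float,
                 transfer_seconds: float) -> List[Tuple[float, List[str]]]:
    """Group replicas into diskless transfer rounds as (fork_time, members)."""
    pending = sorted(arrivals.items(), key=lambda item: item[1])
    rounds = []
    busy_until = 0.0
    while pending:
        first_arrival = pending[0][1]
        # Fork after the delay, but not before the previous transfer has ended.
        fork_time = max(first_arrival + delay, busy_until)
        members = [name for name, t in pending if t <= fork_time]
        pending = [(name, t) for name, t in pending if t > fork_time]
        rounds.append((fork_time, members))
        busy_until = fork_time + transfer_seconds
    return rounds


if __name__ == "__main__":
    print(batch_rounds({"A": 0, "B": 3, "C": 10}, delay=5, transfer_seconds=20))
```

With the arrival times from the timeline, A and B share the first fork and C is served by a second one after the first transfer finishes.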

16.6 Client Output Buffer for Replicas

Distinction from repl_backlog

These are two separate buffers with completely different purposes:

| Buffer | repl_backlog | client output buffer (replica) |
|---|---|---|
| Scope | Global (per primary) | Per replica connection |
| Purpose | Enable partial resync after disconnect | Stream live commands to replica |
| When full | Old data overwritten (no immediate harm) | Replica connection forcibly closed |
| Config key | repl-backlog-size | client-output-buffer-limit replica |

The Three Output Buffer Limits

client-output-buffer-limit replica 256mb 64mb 60

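Format: client-output-buffer-limit <class> <hard_limit> <soft_limit> <soft_seconds>

The primary closes the replica connection immediately when the buffer reaches the hard limit, or when it stays at or above the soft limit continuously for longer than soft_seconds. Setting a limit to 0 disables it.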
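
How the three values interact can be sketched as a small state machine. This is an illustrative model with hypothetical names, not the Redis code: the hard limit trips at once, while the soft limit trips only after the buffer has stayed above it for more than soft_seconds without dropping back.

```python
from typing import Optional


class OutputBufferLimit:
    def __init__(self, hard_limit: int, soft_limit: int, soft_seconds: float):
        self.hard_limit = hard_limit
        self.soft_limit = soft_limit
        self.soft_seconds = soft_seconds
        self.soft_since: Optional[float] = None   # when the soft limit was first exceeded

    def should_close(self, used_bytes: int, now: float) -> bool:
        """Return True if the connection should be closed at time `now`."""
        if self.hard_limit and used_bytes >= self.hard_limit:
            return True
        if self.soft_limit and used_bytes >= self.soft_limit:
            if self.soft_since is None:
                self.soft_since = now             # start the soft-limit timer
                return False
            return now - self.soft_since > self.soft_seconds
        self.soft_since = None                    # dropped below: reset the timer
        return False


if __name__ == "__main__":
    MB = 1024 * 1024
    limit = OutputBufferLimit(256 * MB, 64 * MB, 60)
    for t, used in [(0, 100 * MB), (30, 100 * MB), (61, 100 * MB)]:
        print(t, limit.should_close(used, t))
```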

The Full-Sync Death Spiral

Large write burst during BGSAVE (COW doubles memory pressure)
→ Replica output buffer grows at 100 MB/s
→ Hits 256 MB hard limit after ~2.5 s
→ Primary closes replica connection
→ Replica reconnects, attempts PSYNC
→ offset is outside backlog (backlog also overflowed)
→ FULLRESYNC triggered
→ Primary forks again for BGSAVE (additional load)
→ Output buffer fills again
→ Repeat
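
The arithmetic behind the spiral is simple enough to check before a full sync is attempted. The sketch below uses hypothetical function names and considers only the hard limit; it assumes the buffer grows at a constant rate for the whole duration of the sync (BGSAVE plus RDB transfer).

```python
MB = 1024 * 1024


def seconds_until_hard_limit(hard_limit_bytes: int,
                             growth_bytes_per_sec: float) -> float:
    """Time before a replica's output buffer reaches the hard limit."""
    return hard_limit_bytes / growth_bytes_per_sec


def full_sync_survives(hard_limit_bytes: int,
                       growth_bytes_per_sec: float,
                       sync_seconds: float) -> bool:
    """True if the bytes buffered during the sync stay below the hard limit."""
    return growth_bytes_per_sec * sync_seconds < hard_limit_bytes


if __name__ == "__main__":
    print(seconds_until_hard_limit(256 * MB, 100 * MB))
    print(full_sync_survives(256 * MB, 100 * MB, sync_seconds=30))
```

If full_sync_survives returns False for your measured write rate and typical sync duration, the sync cannot complete with the current limit, and retrying only repeats the cycle.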

Breaking the cycle:

# 1. Increase output buffer limits
client-output-buffer-limit replica 1gb 256mb 300

# 2. Use diskless replication to reduce disk pressure during BGSAVE
repl-diskless-sync yes

# 3. Increase backlog so partial resync works after disconnect
repl-backlog-size 2gb

# 4. Disable automatic RDB saves if using AOF
save ""

16.7 Complete Configuration Reference

# ===== Primary side =====
repl-backlog-size 512mb
repl-backlog-ttl 3600

repl-diskless-sync yes
repl-diskless-sync-delay 5

client-output-buffer-limit replica 256mb 64mb 60

# Refuse writes when fewer than N replicas have lag < M seconds
min-replicas-to-write 1
min-replicas-max-lag 10

# ===== Replica side =====
replicaof 192.168.1.1 6379
masterauth your_password

replica-read-only yes     # strongly recommended
repl-timeout 60

# Sentinel priority (lower = higher priority for promotion)
replica-priority 100

# ===== Both sides =====
tcp-keepalive 300

16.8 Production Diagnostic Procedure

Step 1 — Check Replica Replication State

redis-cli -h replica_host info replication
# master_link_status:down          ← replication is broken
# master_last_io_seconds_ago:120   ← 120 s since last data
# master_sync_in_progress:1        ← full sync underway

Step 2 — Verify Primary's View of Replicas

redis-cli -h primary_host info replication
# slave0:ip=192.168.1.2,port=6380,state=online,offset=1234567,lag=0
# slave1:ip=192.168.1.3,port=6381,state=send_bulk,offset=0,lag=0
#                                   ↑ state=wait_bgsave or send_bulk: full sync in progress

Step 3 — Check Whether Partial Resync Is Possible

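# Offset the replica has processed so far
slave_offset=$(redis-cli -h replica_host info replication \
    | grep '^slave_repl_offset:' | cut -d: -f2 | tr -d '\r')
# Oldest offset still held in the primary's backlog
backlog_start=$(redis-cli -h primary_host info replication \
    | grep '^repl_backlog_first_byte_offset:' | cut -d: -f2 | tr -d '\r')

echo "slave_offset=$slave_offset, backlog_start=$backlog_start"
# The replica resumes from slave_offset + 1. If that is >= backlog_start
# and its replid matches the primary's replid or replid2, partial resync is possible.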

Step 4 — Inspect Output Buffer Usage

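# Replica connections carry the S flag in CLIENT LIST output
redis-cli -h primary_host client list | grep 'flags=S'
# ... omem=67108864 ...   ← 64 MB: at the soft limit, so the soft_seconds timer is running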

16.9 Redis 7.x: Shared Replication Buffer

Redis 7.0 introduced a shared replication backlog that eliminates the per-replica output buffer duplication:

Pre-7.0:
  Primary → repl_backlog (global)
  Primary → output_buffer_replica1 (per-replica copy)
  Primary → output_buffer_replica2 (per-replica copy)
  Memory: O(N × size)

Redis 7.0+:
  Primary → shared_replication_buf (global)
  replica1 and replica2 hold read positions into the same buffer
  Memory: O(size)

With 10 replicas each allowed to buffer up to 512 MB, the worst-case replication memory drops from about 5 GB (10 × 512 MB) to about 512 MB.
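
The comparison is a one-line calculation. The sketch below uses a hypothetical function name and treats the per-replica limit as the worst-case amount of buffered data.

```python
MB = 1024 * 1024


def worst_case_replication_memory(replicas: int, limit_bytes: int,
                                  shared: bool) -> int:
    """Worst-case bytes held for replicas, with or without a shared buffer."""
    # Pre-7.0: every replica keeps its own copy, so memory scales with N.
    # 7.0+: replicas reference one shared buffer, so memory is independent of N.
    return limit_bytes if shared else replicas * limit_bytes


if __name__ == "__main__":
    print(worst_case_replication_memory(10, 512 * MB, shared=False) // MB, "MB")
    print(worst_case_replication_memory(10, 512 * MB, shared=True) // MB, "MB")
```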


Chapter Summary

| Concept | Key Config / Command | Production Advice |
|---|---|---|
| Handshake | PING/AUTH/REPLCONF/PSYNC | Ensure masterauth is set correctly |
| replid2 | INFO replication | Confirm ID rotation after failover |
| Full sync | FULLRESYNC | Monitor frequency; avoid triggering repeatedly |
| repl_backlog | repl-backlog-size | At least 512 MB; high write loads: 1 GB+ |
| Diskless sync | repl-diskless-sync | Strongly recommended on cloud storage |
| Output buffer | client-output-buffer-limit | Tune based on replica count and write rate |

A thorough understanding of the PSYNC2 protocol is the prerequisite for reasoning about Redis high-availability guarantees. Chapter 17 digs into replication lag, consistency models, and concrete data-loss scenarios.
