Chapter 26

Multi-Thread I/O: Redis 6+ Architecture Evolution


26.1 Background: The Single-Thread Ceiling

Redis was born single-threaded, and for good reason. In 2009 that design was brilliant: no lock contention, simple code, excellent CPU cache locality. As hardware scaled out—more cores, faster NICs, higher memory bandwidth—the single-thread ceiling became visible.

26.1.1 Where Single-Thread Breaks Down

In Redis 5 and earlier, a single main thread owned everything:

Listening on sockets (epoll/kqueue)
Reading raw bytes from client sockets
Parsing RESP protocol frames
Executing command logic
Writing response bytes back to sockets
Coordinating persistence (partially async)

All network I/O and command execution ran serially on one thread. Typical throughput ceilings (indicative; actual numbers depend on hardware and workload):

Workload	Max QPS	Bottleneck
Small values (< 100B) GET/SET	80K–100K	CPU frequency; network I/O > 60% of time
Large values (> 10KB) GET/SET	20K–40K	Network I/O bandwidth
Pipelined batch operations	500K–800K	Command execution logic

As client count rises and value size grows, network I/O dominates. The actual GET command logic takes a few microseconds. Reading the request and writing the response takes several times longer, because of system calls plus buffer copies. Most of the main thread's time therefore goes to moving bytes rather than executing commands.
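Amdahl's law makes the skew concrete. Treat this as a rough idealization that ignores dispatch and spin-wait overhead. Let f be the fraction of main-thread time spent on parallelizable network I/O, and N the number of threads sharing it. Taking f = 0.6 from the "> 60% of time" figure in the table above as an illustrative value:

```latex
S(N) = \frac{1}{(1 - f) + \dfrac{f}{N}}, \qquad
S(4)\big|_{f=0.6} = \frac{1}{0.4 + 0.15} \approx 1.82, \qquad
\lim_{N \to \infty} S(N)\big|_{f=0.6} = \frac{1}{0.4} = 2.5
```

The serial remainder is command execution. It caps the gain no matter how many I/O threads are added.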

26.1.2 Why Not Fully Parallelize Command Execution

The obvious question: why not also run command execution on multiple threads?

Lock contention: Redis core data structures (dict, skiplist, listpack) are not thread-safe. Concurrent ZADD calls from multiple threads would need locking, and at high concurrency the lock overhead can erase most of the multi-thread gain.

Atomic command semantics: INCR, LPUSH, and ZADD get their atomic guarantees for free from single-threaded execution. Reproducing those guarantees with fine-grained locking in a multi-threaded model is far harder to implement correctly.

Transactions and Lua: MULTI/EXEC and EVAL rely on uninterrupted serial execution. Parallelizing execution would break these fundamental guarantees.

Maintainability: Single-threaded code has no deadlocks, no race conditions, easy debugging. Multi-threaded bugs are notoriously hard to reproduce and diagnose.

The conclusion Redis reached: parallelize only the I/O work; keep command execution single-threaded.

26.2 The Redis 6 Multi-Threaded I/O Design

Redis 6.0 (released April 2020) introduced multi-threaded I/O. The architecture separates the fast path (command execution) from the slow path (network I/O):

26.2.1 Thread Model Architecture

                    ┌────────────────────────────────────────┐
                    │              Main Thread               │
                    │                                        │
                    │  epoll_wait() → detect readable fds    │
                    │  Distribute reads to I/O threads       │
                    │  Handle own read slice, spin-wait      │
                    │  Execute ALL commands (single thread)  │
                    │  Distribute writes to I/O threads      │
                    │  Handle own write slice, spin-wait     │
                    └────────────────────────────────────────┘
                       ↕                         ↕
            ┌─────────────────────┐   ┌─────────────────────┐
            │    I/O Thread 1     │   │    I/O Thread 2     │
            │                     │   │                     │
            │ readQueryFromClient │   │ readQueryFromClient │
            │ writeToClient       │   │ writeToClient       │
            └─────────────────────┘   └─────────────────────┘

26.2.2 Step-by-Step Execution Flow

Phase 1 — Read Phase

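Main thread calls epoll_wait and detects multiple readable sockets. When threaded reads are active, it queues those clients instead of reading them on the spot.
Main thread distributes the queued clients round-robin into io_threads_list[tid] queues; list 0 belongs to the main thread itself.
Main thread sets io_threads_op = IO_THREADS_OP_READ. It then stores each helper thread's queue length in that thread's atomic counter io_threads_pending[tid], which is the thread's start signal.
I/O threads concurrently execute readQueryFromClient. They read raw bytes from the socket into client->querybuf and parse them into command arguments, without executing anything. The main thread handles list 0 the same way.
Main thread spin-waits until every helper thread has reset its io_threads_pending[tid] counter to zero.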

/* Main thread waits for helper threads 1..N-1 (slot 0 is the main thread) */
while(1) {
    unsigned long pending = 0;
    for (int j = 1; j < server.io_threads_num; j++)
        pending += io_threads_pending[j];
    if (pending == 0) break;
}

Phase 2 — Execute Phase

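Main thread iterates all ready clients and calls processCommandAndResetClient on each.
The I/O threads have already parsed the first command into client->argv, so the main thread executes it. It then runs processInputBuffer to parse and execute any further pipelined commands left in client->querybuf. Replies go into the client's output buffer (client->buf, spilling into the client->reply list for large replies).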
This phase is strictly single-threaded—no concurrent data structure access.

Phase 3 — Write Phase

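Main thread takes the clients that have replies queued (server.clients_pending_write).
Distributes them round-robin to the I/O threads, again keeping list 0 for itself.
I/O threads concurrently call writeToClient, flushing each client's output buffer to the socket.
Main thread spin-waits for all writes to complete, then installs a write handler for any client whose output could not be fully flushed.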
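All three phases share one fan-out/fan-in pattern. The sketch below models it with POSIX threads and C11 atomics under simplifying assumptions:

- clients are plain integers;
- `do_io` is a hypothetical stand-in for readQueryFromClient/writeToClient;
- names such as `threaded_io_round` and `pending` are illustrative, not Redis identifiers.

As in Redis, slot 0 belongs to the main thread, which processes its own slice before spinning. Unlike Redis, idle helpers here just busy-poll and are never parked.

```c
#include <pthread.h>
#include <stdatomic.h>

#define NTHREADS   4   /* like io-threads 4: slot 0 is the main thread */
#define MAXCLIENTS 64

typedef struct {
    int items[MAXCLIENTS];  /* client indices assigned to this thread */
    int count;
} io_list;

static io_list lists[NTHREADS];
static _Atomic unsigned long pending[NTHREADS];  /* nonzero = work to do */
static _Atomic int shutting_down;
static pthread_t tids[NTHREADS];
static int clients[MAXCLIENTS];  /* stand-in for per-client buffers */

/* Hypothetical stand-in for readQueryFromClient()/writeToClient(). */
static void do_io(int client) { clients[client] *= 2; }

static void process_list(int id) {
    for (int i = 0; i < lists[id].count; i++) do_io(lists[id].items[i]);
}

/* Helper thread: busy-poll its own counter, do the work, reset it to 0. */
static void *io_thread_main(void *arg) {
    int id = (int)(long)arg;
    while (!atomic_load(&shutting_down)) {
        if (atomic_load(&pending[id]) == 0) continue;
        process_list(id);
        atomic_store(&pending[id], 0);  /* tells the main thread we're done */
    }
    return NULL;
}

static void start_io_threads(void) {
    for (long t = 1; t < NTHREADS; t++)
        pthread_create(&tids[t], NULL, io_thread_main, (void *)t);
}

static void stop_io_threads(void) {
    atomic_store(&shutting_down, 1);
    for (int t = 1; t < NTHREADS; t++) pthread_join(tids[t], NULL);
}

/* One fan-out/fan-in round over clients 0..n-1. */
static void threaded_io_round(int n) {
    if (n > MAXCLIENTS) n = MAXCLIENTS;
    for (int t = 0; t < NTHREADS; t++) lists[t].count = 0;
    for (int c = 0; c < n; c++) {             /* round-robin distribution */
        io_list *l = &lists[c % NTHREADS];
        l->items[l->count++] = c;
    }
    for (int t = 1; t < NTHREADS; t++)        /* per-helper start signal */
        atomic_store(&pending[t], (unsigned long)lists[t].count);
    process_list(0);                          /* main thread's own slice */
    for (;;) {                                /* spin until helpers finish */
        unsigned long left = 0;
        for (int t = 1; t < NTHREADS; t++) left += atomic_load(&pending[t]);
        if (left == 0) break;
    }
}
```

With 10 clients and 4 slots, the round-robin gives lists of 3, 3, 2 and 2 clients. The main thread's slice is clients 0, 4 and 8.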

26.2.3 Key Implementation Details

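Spin-wait over blocking wait: After dispatching work, the main thread busy-waits on the pending counters rather than sleeping on a condition variable or semaphore. This avoids the latency of a sleep/wake-up cycle, so the main thread resumes the moment the I/O threads finish. The I/O threads poll for work the same way, each spinning on its own counter. The trade-off is CPU: threads burn cycles while they wait, which is why Redis parks them when load is low.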

Minimum threshold: When fewer than io_threads_num * 2 clients have pending writes, Redis stops the I/O threads and falls back to single-threaded processing. Each stopped thread is parked on a per-thread mutex. For such small batches the dispatch overhead would outweigh the benefit.

/* networking.c (Redis 6.0): returns 1 when threaded I/O should be skipped
 * this cycle (parking the I/O threads if they are running), 0 otherwise. */
int stopThreadedIOIfNeeded(void) {
    int pending = listLength(server.clients_pending_write);
    if (server.io_threads_num == 1) return 1;
    if (pending < (server.io_threads_num * 2)) {
        if (server.io_threads_active) stopThreadedIO();
        return 1;
    }
    return 0;
}

26.3 Configuration Reference

26.3.1 Core Settings

# redis.conf

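# Number of I/O threads. The count includes the main thread,
# so io-threads 4 means the main thread plus 3 helper threads.
# This chapter's rule of thumb: about half the physical CPU cores.
# Upstream redis.conf suggests 2 or 3 threads on a 4-core box and 6 on
# an 8-core box; more than 8 threads is unlikely to help.
# Setting io-threads=1 disables multi-threading entirely (legacy behavior).
io-threads 4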

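# Also use the I/O threads for reading and parsing requests.
# Default: no (only writing responses uses I/O threads).
# This chapter recommends yes for throughput; upstream notes that threaded
# reads usually help less than threaded writes, so benchmark both.
io-threads-do-reads yes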

26.3.2 Performance Benchmark by Configuration

Tested on a 4-core/8GB server using redis-benchmark -n 1000000 -c 200 -t get,set:

Configuration	GET QPS	SET QPS	CPU Usage
io-threads=1 (default)	95,000	88,000	Single core at 100%
io-threads=2	145,000	138,000	2 cores at ~80%
io-threads=4	195,000	182,000	4 cores at ~70%
io-threads=8 (excessive)	190,000	178,000	Thread switching overhead

Key finding: More threads is not always better. Beyond the physical core count, thread scheduling overhead dominates.
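Amdahl's law, S = 1 / ((1 - f) + f / N), gives a rough consistency check on the table. The sketch below inverts it to estimate the parallelizable fraction f implied by each measured GET speedup. It treats the table's numbers as exact and ignores benchmark noise, so read the results as indicative. The function names are illustrative.

```c
#include <stdio.h>

/* Invert Amdahl's law S = 1 / ((1 - f) + f / N) to get f (requires N > 1). */
static double implied_parallel_fraction(double speedup, int threads) {
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / threads);
}

/* GET QPS from the table above; io-threads=1 is the baseline. */
static void report(void) {
    const double baseline = 95000.0;
    const int threads[] = {2, 4, 8};
    const double qps[] = {145000.0, 195000.0, 190000.0};
    for (int i = 0; i < 3; i++) {
        double s = qps[i] / baseline;
        printf("io-threads=%d  speedup=%.2fx  efficiency=%.0f%%  implied f=%.2f\n",
               threads[i], s, 100.0 * s / threads[i],
               implied_parallel_fraction(s, threads[i]));
    }
}
```

Calling `report()` prints speedups of 1.53x, 2.05x and 2.00x, with implied f of 0.69, 0.68 and 0.57. At 2 and 4 threads, f stays near 0.68, in line with the "> 60% of time" figure from 26.1.1. At 8 threads it drops to about 0.57, which is the oversubscription overhead showing up as lost parallel work.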

26.3.3 Verifying Multi-Threaded I/O at Runtime

# Check current I/O thread settings
CONFIG GET io-threads
CONFIG GET io-threads-do-reads

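# Both settings are read only at startup; CONFIG SET rejects them.
# Change redis.conf (or pass --io-threads 4 on the command line) and restart.

# Confirm threads are active (Server section of INFO)
redis-cli INFO server | grep io_threads_active

# Reads/writes actually handled by I/O threads (Stats section)
redis-cli INFO stats | grep io_threaded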

When multi-threaded I/O is NOT worth enabling:

Pure intranet deployments with sub-0.1ms latency (network is not the bottleneck)
QPS under 50K (single-thread headroom is ample)
Very small values (< 100 bytes per operation)
Environments where CPU cores are scarce

26.4 Comparison: Redis vs. High-Performance Alternatives

26.4.1 KeyDB

KeyDB is a Redis fork that takes multi-threading further—command execution itself runs on multiple threads:

# KeyDB configuration
# Worker threads, each running a full event loop
server-threads 4
# Pin threads to CPU cores
server-thread-affinity true

How it differs:

Each thread runs an independent event loop, and each connection is bound to one thread when it is accepted
Network I/O and query parsing run in parallel across threads
Access to the core hash table is guarded by a spinlock, per KeyDB's documentation; hash table operations are short, so contention is usually low

Trade-offs:

Transactions hold the lock for the whole EXEC, stalling data access on the other threads
Lua script and MULTI/EXEC semantics require special handling
As a fork, it can lag behind upstream Redis features as the two code bases diverge

26.4.2 Dragonfly

Dragonfly (released 2022) uses a shared-nothing architecture:

# Number of threads; each thread owns one shard of the keyspace
--proactor_threads=8
# No state is shared between threads, so single-shard operations need no locks

How it differs:

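The keyspace is partitioned into shards by key hash; each thread owns one shard
Multi-key operations and transactions that span shards are coordinated by a transactional framework based on the VLL (Very Lightweight Locking) paper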
Uses io_uring (Linux 5.1+) instead of epoll, reducing syscall overhead
Claims 25x throughput over Redis on multi-core machines (controversial benchmarks)

Trade-offs:

Multi-key commands whose keys hash to different shards (e.g., MGET) pay a coordination cost
Ecosystem maturity is far below Redis
Some command semantics differ from Redis
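The shard-per-thread idea, and why multi-key commands cost more, can be sketched in a few lines of C. This sketch rests on two assumptions:

- FNV-1a is a stand-in hash chosen for illustration; Dragonfly's real key-to-shard function differs.
- `shard_of` and `shards_touched` are hypothetical helpers.

When `shards_touched` returns more than 1, the command cannot run locally on one thread. It needs a coordinated hop across several threads.

```c
#include <stdint.h>

/* 32-bit FNV-1a hash (illustrative choice, not Dragonfly's). */
static uint32_t fnv1a(const char *s) {
    uint32_t h = 2166136261u;
    for (; *s; s++) {
        h ^= (uint8_t)*s;
        h *= 16777619u;
    }
    return h;
}

/* Each key is owned by exactly one shard (one thread). */
static int shard_of(const char *key, int nshards) {
    return (int)(fnv1a(key) % (uint32_t)nshards);
}

/* Count distinct shards a multi-key command over `keys` would involve.
 * Uses a 64-bit mask, so nshards must be <= 64. */
static int shards_touched(const char **keys, int nkeys, int nshards) {
    uint64_t mask = 0;
    for (int i = 0; i < nkeys; i++)
        mask |= (uint64_t)1 << shard_of(keys[i], nshards);
    int count = 0;
    for (; mask; mask &= mask - 1) count++;   /* clear lowest set bit */
    return count;
}
```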

26.4.3 Comparison Matrix

Dimension	Redis 6+	KeyDB	Dragonfly
Command execution	Single-threaded	Multi-threaded (with locks)	Multi-threaded (shared-nothing)
Indicative max QPS (hardware-dependent)	200K–300K	300K–500K	1M+
API compatibility	Official standard	High (fork)	Medium (some differences)
Production stability	Excellent	Good	Moderate (newer)
Community/ecosystem	Largest	Small	Small
Recommended for	Most production use	Extreme throughput needs	Experimental/specific cases

26.5 RESP3 Protocol

Redis 6 also introduced RESP3 (Redis Serialization Protocol v3). Clients negotiate the protocol version with HELLO:

# Switch to RESP3
HELLO 3
# Server returns a Map with server metadata

# Switch back to RESP2
HELLO 2
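Whatever protocol version is negotiated, clients send commands to the server as a RESP array of bulk strings. `HELLO 3` therefore goes over the wire as `*2\r\n$5\r\nHELLO\r\n$1\r\n3\r\n`. The sketch below shows that encoding. `resp_encode_command` is a hypothetical helper, not part of any client library. Because it uses `%s`, it assumes arguments contain no NUL bytes.

```c
#include <stdio.h>
#include <string.h>

/* Encode argv as a RESP array of bulk strings into out.
 * Returns the number of bytes written, or -1 if out is too small. */
static int resp_encode_command(char *out, size_t cap, int argc, const char **argv) {
    int n = snprintf(out, cap, "*%d\r\n", argc);          /* array header */
    if (n < 0 || (size_t)n >= cap) return -1;
    size_t used = (size_t)n;
    for (int i = 0; i < argc; i++) {                      /* $<len>\r\n<arg>\r\n */
        n = snprintf(out + used, cap - used, "$%zu\r\n%s\r\n",
                     strlen(argv[i]), argv[i]);
        if (n < 0 || (size_t)n >= cap - used) return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```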

26.5.1 New Data Types in RESP3

Type	Wire Prefix	Description
Map	`%`	Key-value pairs; client doesn't need to guess array layout
Set	`~`	Unordered collection
Double	`,`	Native float (vs. string encoding in RESP2)
Boolean	`#`	True/false (vs. integer 0/1 in RESP2)
BigNumber	`(`	Arbitrary-precision integer
Blob Error	`!`	Length-prefixed error message
Verbatim String	`=`	String with MIME-type hint
Push	`>`	Server-initiated push (subscription messages, invalidation)

26.5.2 Push Type: Cleaner Pub/Sub

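In RESP2, pub/sub messages arrive as ordinary arrays that look just like command replies. A connection in subscribe mode is essentially limited to the subscribe/unsubscribe family, PING, and QUIT, so clients usually dedicate a separate connection to it. RESP3's Push type (> prefix) explicitly marks server-initiated data, so pushes and replies can share one connection: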

# RESP3 subscription message wire format
>3\r\n
$7\r\nmessage\r\n
$12\r\nchannel-name\r\n
$12\r\nmessage-data\r\n

Clients can now handle push data in a dedicated path without ambiguity.
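That dedicated path can be sketched as a small parser. It checks the first byte and, for `>` frames, splits out the string elements of a message such as a pub/sub delivery. This is a minimal illustration, not any real client library's API:

- `parse_push` and `push_frame` are hypothetical names;
- input must be one complete, NUL-terminated frame;
- only flat arrays of bulk or simple strings are handled.

```c
#include <stdlib.h>
#include <string.h>

#define PUSH_MAX_ELEMS 8

typedef struct {
    int count;
    const char *elem[PUSH_MAX_ELEMS];  /* pointers into the input buffer */
    size_t len[PUSH_MAX_ELEMS];        /* element lengths (not NUL-terminated) */
} push_frame;

/* Parse the decimal number after a type byte and require CRLF.
 * Returns the position after CRLF, or NULL on malformed input. */
static const char *read_len(const char *p, long *value) {
    char *q;
    *value = strtol(p + 1, &q, 10);
    if (q == p + 1 || q[0] != '\r' || q[1] != '\n') return NULL;
    return q + 2;
}

/* Returns 1 for a parsed Push frame, 0 if buf is some other reply type,
 * -1 if malformed or unsupported (nested or non-string elements). */
static int parse_push(const char *buf, push_frame *out) {
    const char *end = buf + strlen(buf);
    long count;
    if (buf[0] != '>') return 0;                 /* ordinary reply, not a push */
    const char *p = read_len(buf, &count);
    if (!p || count < 0 || count > PUSH_MAX_ELEMS) return -1;
    for (long i = 0; i < count; i++) {
        if (*p == '$') {                         /* bulk: $<len>\r\n<data>\r\n */
            long len;
            p = read_len(p, &len);
            if (!p || len < 0 || (size_t)(end - p) < (size_t)len + 2) return -1;
            if (p[len] != '\r' || p[len + 1] != '\n') return -1;
            out->elem[i] = p;
            out->len[i] = (size_t)len;
            p += len + 2;
        } else if (*p == '+') {                  /* simple: +<text>\r\n */
            const char *crlf = strstr(p, "\r\n");
            if (!crlf) return -1;
            out->elem[i] = p + 1;
            out->len[i] = (size_t)(crlf - p - 1);
            p = crlf + 2;
        } else {
            return -1;
        }
    }
    out->count = (int)count;
    return 1;
}
```

A client's read loop would route frames returning 1 to its push handler (pub/sub callback, cache invalidation). All other frames go to the pending-reply queue.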

26.5.3 Client-Side Caching with Tracking

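Redis 6 also added server-assisted client-side caching. The server tracks which keys a client has read and sends invalidation messages when those keys change. It pairs naturally with RESP3, which delivers invalidations as Push messages on the same connection. RESP2 clients instead redirect them to a second connection subscribed to the __redis__:invalidate channel: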

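# RESP2 style: redirect invalidation notices to connection ID 1234,
# which has subscribed to __redis__:invalidate
CLIENT TRACKING ON REDIRECT 1234

# RESP3 style: invalidations arrive as Push messages on this connection
CLIENT TRACKING ON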

# BCAST mode — broadcast all changes under a key prefix
CLIENT TRACKING ON BCAST PREFIX user:

# When a tracked key changes, the server sends a Push message:
# >2\r\n$10\r\ninvalidate\r\n*1\r\n$8\r\nuser:123\r\n

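Reads served from the local cache skip the network round trip entirely, while the invalidation messages keep the cache coherent. For read-heavy workloads this removes most redundant round trips. Invalidation is asynchronous, so a client can briefly serve a stale value between the write and the arrival of the Push message.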
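On the client, tracking reduces to a local map plus an invalidation handler. The sketch below makes these assumptions:

- a fixed 16-slot table with string keys and values;
- hypothetical function names (`cache_get`, `cache_put`, `cache_invalidate`, `cache_flush`);
- wiring to a real connection (sending CLIENT TRACKING, parsing the Push frames) is left out.

`cache_flush` covers the case where Redis sends a null in place of the key list, which it does after FLUSHALL/FLUSHDB.

```c
#include <stdlib.h>
#include <string.h>

#define CACHE_SLOTS 16

typedef struct {
    char *key;    /* NULL means the slot is empty */
    char *value;
} cache_entry;

static cache_entry cache[CACHE_SLOTS];

static char *dup_str(const char *s) {
    size_t n = strlen(s) + 1;
    char *d = malloc(n);
    if (d) memcpy(d, s, n);
    return d;
}

static void drop(cache_entry *e) {
    free(e->key);
    free(e->value);
    e->key = e->value = NULL;
}

/* Return the cached value, or NULL on a miss (the caller then asks Redis). */
static const char *cache_get(const char *key) {
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].key && strcmp(cache[i].key, key) == 0) return cache[i].value;
    return NULL;
}

/* Store a value read from the server; returns 0 if full or out of memory. */
static int cache_put(const char *key, const char *value) {
    cache_entry *slot = NULL;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].key && strcmp(cache[i].key, key) == 0) { slot = &cache[i]; break; }
        if (!cache[i].key && !slot) slot = &cache[i];   /* first free slot */
    }
    if (!slot) return 0;
    drop(slot);                                          /* replace any old value */
    slot->key = dup_str(key);
    slot->value = dup_str(value);
    if (!slot->key || !slot->value) { drop(slot); return 0; }
    return 1;
}

/* Handle one key named in an "invalidate" Push message. */
static void cache_invalidate(const char *key) {
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].key && strcmp(cache[i].key, key) == 0) drop(&cache[i]);
}

/* Handle an invalidation with a null key list: drop everything. */
static void cache_flush(void) {
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].key) drop(&cache[i]);
}
```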

26.6 Production Tuning Guide

26.6.1 Benchmarking Correctly

# Baseline: single-threaded benchmark client
redis-benchmark -h 127.0.0.1 -p 6379 \
  -n 1000000 -c 200 -t get,set -d 100

# Multi-threaded benchmark (Redis 6+ benchmark client also supports threads)
redis-benchmark -h 127.0.0.1 -p 6379 \
  -n 1000000 -c 200 --threads 4 -t get,set -d 100

# Large value test to stress I/O path
redis-benchmark -n 100000 -c 100 -d 10240 -t get,set

# Pipeline test (reduces per-command I/O overhead)
redis-benchmark -n 1000000 -c 200 -P 16 -t get,set

26.6.2 Diagnosing Multi-Thread Issues

Symptom: Higher CPU usage after enabling multi-threading, but QPS barely improved

# 1. Verify multi-threading is actually active
redis-cli INFO server | grep io_threads_active

# 2. Check client count: threads engage only when at least io_threads_num * 2 clients have replies pending in one event-loop iteration
redis-cli INFO clients | grep connected_clients

# 3. Profile with perf to find actual hotspot
perf top -p $(pgrep redis-server)
# If little time shows up in socket read/write paths (e.g., tcp_sendmsg, tcp_recvmsg), I/O is not the bottleneck

# 4. Check whether network bandwidth is saturated
sar -n DEV 1 10
ifstat -i eth0 1

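Symptom: io_threads_active stays at 0 (or keeps flipping between 0 and 1) despite io-threads=4

Root cause: fewer clients with pending replies per event-loop iteration than the activation threshold (io_threads_num * 2 = 8), so Redis keeps the I/O threads parked.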

Fix: Increase benchmark concurrency (-c 20 or higher), or verify that production traffic has sufficient concurrent connections.

26.6.3 Recommended Production Configuration

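# redis.conf for an 8-core production server
# (redis.conf has no inline comments: keep each comment on its own line)

# Multi-threaded I/O
# About half of the physical cores (upstream redis.conf suggests up to 6 on 8 cores)
io-threads 4
# Parallelize both reads and writes
io-threads-do-reads yes

# Connection handling
# Enlarge the TCP accept queue (also raise net.core.somaxconn, or the kernel caps it)
tcp-backlog 1024
# 10000 is already the default; raise it (and the open-files limit) if you need more
maxclients 10000
# Send TCP keepalive ACKs every 300s (the default); dead peers are detected after about twice that
tcp-keepalive 300

# Slow log: catch commands taking > 1ms (value is in microseconds)
slowlog-log-slower-than 1000
slowlog-max-len 1000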

# Disable transparent huge pages (OS-level, do in system config)
# echo never > /sys/kernel/mm/transparent_hugepage/enabled

26.7 Summary

Redis 6's multi-threaded I/O is a carefully scoped optimization: it parallelizes the slow part (network I/O) while preserving the guarantees that make Redis so reliable (single-threaded command execution, no locks, no race conditions).

Key takeaways:

Command execution is always single-threaded—data structure atomicity is never compromised
I/O thread count: start around physical_cores / 2; more than 8 rarely helps (8 is a practical ceiling, not a hard limit)
Enable io-threads-do-reads yes to parallelize both reads and writes
Multi-threading activates automatically only when enough clients are present
Below roughly 50K QPS, single-threaded I/O usually has ample headroom, so threading adds CPU cost for little gain
RESP3 is a companion protocol upgrade: richer types, native push messages, and client-side caching support
For most production deployments, Redis 6+ with io-threads=4 is the right answer; KeyDB and Dragonfly are only warranted in extreme edge cases
