Chapter 26: Multi-Threaded I/O — Redis 6+ Architecture Evolution
26.1 Background: The Single-Thread Ceiling
Redis was born single-threaded, and for good reason. In 2009 that design was brilliant: no lock contention, simple code, excellent CPU cache locality. As hardware scaled out—more cores, faster NICs, higher memory bandwidth—the single-thread ceiling became visible.
26.1.1 Where Single-Thread Breaks Down
In Redis 5 and earlier, a single main thread owned everything:
- Listening on sockets (epoll/kqueue)
- Reading raw bytes from client sockets
- Parsing RESP protocol frames
- Executing command logic
- Writing response bytes back to sockets
- Coordinating persistence (partially async)
All network I/O and command execution ran serially on one thread. Measured performance caps:
| Workload | Max QPS | Bottleneck |
|---|---|---|
| Small values (< 100B) GET/SET | 80K–100K | CPU frequency; network I/O > 60% of time |
| Large values (> 10KB) GET/SET | 20K–40K | Network I/O bandwidth |
| Pipelined batch operations | 500K–800K | Command execution logic |
As client count rises and value size grows, network I/O dominates. The actual GET command logic takes a few microseconds; reading the request and writing the response takes tens of microseconds. The work ratio is badly skewed.
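One rough way to quantify that skew is an Amdahl-style estimate. This model is an assumption added for illustration, not something taken from the Redis source: it supposes that only the I/O share of per-request time can be spread across threads, while command execution stays serial. The function name is hypothetical.

```c
/* Amdahl-style estimate (illustrative assumption, not Redis code):
 * only the I/O fraction of per-request time is divided across threads;
 * the remainder (command execution) stays serial.
 * io_fraction is in [0, 1]; io_threads must be >= 1. */
double estimated_speedup(double io_fraction, int io_threads) {
    double serial = 1.0 - io_fraction;                  /* command execution */
    double parallel = io_fraction / (double)io_threads; /* network I/O share */
    return 1.0 / (serial + parallel);
}
```

With an I/O share of 0.6 and 4 threads, the estimate is 1 / (0.4 + 0.15), or roughly 1.8x. Under the same assumption the ceiling is 1 / 0.4 = 2.5x no matter how many threads are added, because the serial part never shrinks.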
26.1.2 Why Not Fully Parallelize Command Execution
The obvious question: why not also run command execution on multiple threads?
Lock contention: Redis core data structures—dict, skiplist, listpack—are not thread-safe. Concurrent ZADD from multiple threads requires locking; at high concurrency the lock overhead erases all multi-thread gains.
Atomic command semantics: INCR, LPUSH, ZADD derive their atomic guarantees from single-threaded execution. Reproducing those guarantees with fine-grained locks in a multi-threaded model is far harder to implement correctly.
Transactions and Lua: MULTI/EXEC and EVAL rely on uninterrupted serial execution. Parallelizing execution would break these fundamental guarantees.
Maintainability: Single-threaded code has no deadlocks, no race conditions, easy debugging. Multi-threaded bugs are notoriously hard to reproduce and diagnose.
The conclusion Redis reached: parallelize only the I/O work; keep command execution single-threaded.
26.2 The Redis 6 Multi-Threaded I/O Design
Redis 6.0 (GA at the end of April 2020) introduced multi-threaded I/O. The design keeps command execution on the main thread and hands the slow path, network reads and writes, to a pool of I/O threads.
26.2.1 Thread Model Architecture
┌──────────────────────────────────────┐
│             Main Thread              │
│                                      │
│  epoll_wait() → detect readable fds  │
│  Distribute clients to I/O threads   │
│  Spin-wait for reads to complete     │
│  Execute ALL commands (single-thread)│
│  Distribute clients to write threads │
│  Spin-wait for writes to complete    │
└──────────────────────────────────────┘
         ↕                   ↕
┌─────────────────┐ ┌─────────────────┐
│ I/O Thread 1    │ │ I/O Thread 2    │
│                 │ │                 │
│ readQueryFrom   │ │ readQueryFrom   │
│   Client()      │ │   Client()      │
│ writeToClient() │ │ writeToClient() │
└─────────────────┘ └─────────────────┘
26.2.2 Step-by-Step Execution Flow
Phase 1 — Read Phase
- Main thread calls epoll_wait, detects multiple readable sockets.
- Main thread distributes clients round-robin into io_threads_list[tid] queues.
- Main thread sets io_threads_op = IO_THREADS_OP_READ.
- I/O threads concurrently execute readQueryFromClient: read raw bytes from socket into client->querybuf.
- Main thread spin-waits until all I/O threads finish, detected via atomic counter io_threads_pending[tid].
/* Main thread waits for all I/O threads to complete */
while(1) {
unsigned long pending = 0;
for (int j = 1; j < server.io_threads_num; j++)
pending += io_threads_pending[j];
if (pending == 0) break;
}
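The round-robin distribution step of the read phase can be sketched as a modulo over each client's position in the pending list. The function and parameter names below are illustrative rather than Redis's own, and the sketch assumes slot 0 is the main thread's share, which is consistent with the wait loop above starting at j = 1.

```c
/* Assign each pending client to an I/O thread slot in round-robin order.
 * assignment[i] receives the slot (0 .. io_threads_num - 1) for client i.
 * Hypothetical helper: the caller provides an array of num_clients ints. */
void assign_round_robin(int num_clients, int io_threads_num, int *assignment) {
    for (int item_id = 0; item_id < num_clients; item_id++)
        assignment[item_id] = item_id % io_threads_num;
}
```

With 5 clients and 4 slots, the clients land on slots 0, 1, 2, 3, 0, so no slot is more than one client ahead of any other.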
Phase 2 — Execute Phase
- Main thread iterates all ready clients, calls processCommandAndResetClient on each.
- Parses client->querybuf, executes command, writes result to client->buf (output buffer).
- This phase is strictly single-threaded: no concurrent data structure access.
Phase 3 — Write Phase
- Main thread enqueues clients that have data to send.
- Distributes them round-robin to I/O threads.
- I/O threads concurrently call writeToClient, flushing client->buf to the socket.
- Main thread spin-waits for all writes to complete.
26.2.3 Key Implementation Details
Spin-wait over blocking wait: After dispatching to I/O threads, the main thread busy-waits rather than using a condition variable or semaphore. This avoids thread context switch latency—the main thread must resume immediately when I/O threads finish. The trade-off: CPU burns cycles during the wait window.
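The spin-wait handoff can be illustrated with a minimal, self-contained sketch using C11 atomics and POSIX threads. It is not the Redis implementation: every name prefixed with sketch_ is hypothetical, the "I/O" is a counter increment, and worker threads are created per phase for brevity, whereas a real server would keep long-lived threads.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_IO_THREADS 4   /* hypothetical; slot 0 is the main thread */

static atomic_ulong sketch_pending[SKETCH_IO_THREADS];
static unsigned long sketch_done[SKETCH_IO_THREADS];

static void *sketch_io_worker(void *arg) {
    int id = (int)(intptr_t)arg;
    unsigned long n = atomic_load_explicit(&sketch_pending[id], memory_order_acquire);
    for (unsigned long i = 0; i < n; i++)
        sketch_done[id]++;                 /* stand-in for one read or write */
    /* Publish completion; this release pairs with the acquire in the spin loop. */
    atomic_store_explicit(&sketch_pending[id], 0, memory_order_release);
    return NULL;
}

/* Runs one I/O phase and returns the total number of jobs completed. */
unsigned long sketch_run_io_phase(unsigned long jobs_per_thread) {
    pthread_t tids[SKETCH_IO_THREADS];
    int started[SKETCH_IO_THREADS] = {0};

    for (int j = 1; j < SKETCH_IO_THREADS; j++) {
        sketch_done[j] = 0;
        atomic_store(&sketch_pending[j], jobs_per_thread);
        if (pthread_create(&tids[j], NULL, sketch_io_worker, (void *)(intptr_t)j) == 0)
            started[j] = 1;
        else
            sketch_io_worker((void *)(intptr_t)j);  /* fall back to inline work */
    }
    sketch_done[0] = jobs_per_thread;      /* main thread's own slice */

    for (;;) {                             /* busy-wait: no mutex, no condvar */
        unsigned long pending = 0;
        for (int j = 1; j < SKETCH_IO_THREADS; j++)
            pending += atomic_load_explicit(&sketch_pending[j], memory_order_acquire);
        if (pending == 0) break;
    }

    unsigned long total = 0;
    for (int j = 0; j < SKETCH_IO_THREADS; j++) total += sketch_done[j];
    for (int j = 1; j < SKETCH_IO_THREADS; j++)
        if (started[j]) pthread_join(tids[j], NULL);
    return total;
}
```

The main thread never sleeps between dispatch and completion, so it resumes as soon as the last counter reaches zero; the cost is a core spinning at full utilization for the duration of the phase.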
Minimum threshold: When pending clients < io_threads_num * 2, multi-threading is skipped entirely, falling back to single-threaded processing. Multi-thread dispatch overhead would outweigh the benefit for small batches.
/* networking.c — should we use multi-threaded I/O? */
int stopThreadedIOIfNeeded(void) {
int pending = listLength(server.clients_pending_write);
if (server.io_threads_num == 1) return 1;
if (pending < (server.io_threads_num * 2)) {
if (server.io_threads_active) stopThreadedIO();
return 1;
}
return 0;
}
26.3 Configuration Reference
26.3.1 Core Settings
# redis.conf
# Number of I/O threads.
# Recommended: half of physical CPU cores, max 8.
# Setting io-threads=1 disables multi-threading entirely (legacy behavior).
io-threads 4
# Apply multi-threading to read operations as well.
# Default: no (only write responses use I/O threads).
# Recommendation: set to yes for better throughput.
io-threads-do-reads yes
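The sizing rule in the comments above (half the physical cores, at most 8) can be written as a small helper. This is only a sketch of this chapter's rule of thumb; the function name is hypothetical and the result is a starting point for benchmarking, not a guarantee.

```c
/* Rule-of-thumb io-threads value: half the physical cores,
 * clamped to the range [1, 8]. Hypothetical helper. */
int recommended_io_threads(int physical_cores) {
    int n = physical_cores / 2;
    if (n < 1) n = 1;   /* io-threads 1 means single-threaded I/O */
    if (n > 8) n = 8;   /* the chapter's suggested upper bound */
    return n;
}
```

For an 8-core machine this yields 4, and for anything with 16 or more cores it stays at 8.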
26.3.2 Performance Benchmark by Configuration
Tested on a 4-core/8GB server using redis-benchmark -n 1000000 -c 200 -t get,set:
| Configuration | GET QPS | SET QPS | CPU Usage |
|---|---|---|---|
| io-threads=1 (default) | 95,000 | 88,000 | Single core at 100% |
| io-threads=2 | 145,000 | 138,000 | 2 cores at ~80% |
| io-threads=4 | 195,000 | 182,000 | 4 cores at ~70% |
| io-threads=8 (excessive) | 190,000 | 178,000 | Thread switching overhead |
Key finding: More threads is not always better. Beyond the physical core count, thread scheduling overhead dominates.
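The diminishing returns in the table are easier to see as a per-thread scaling efficiency: the speedup over the single-threaded baseline divided by the thread count. The helper below is an illustrative calculation with a hypothetical name; it uses only the figures from the table.

```c
/* Scaling efficiency: (qps / baseline_qps) / io_threads.
 * 1.0 would mean perfect linear scaling. Hypothetical helper. */
double scaling_efficiency(double baseline_qps, double qps, int io_threads) {
    return (qps / baseline_qps) / (double)io_threads;
}
```

For the GET column, 2 threads give about 0.76, 4 threads about 0.51, and 8 threads exactly 0.25: each added thread buys less throughput, and past the core count it buys none.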
26.3.3 Verifying Multi-Threaded I/O at Runtime
# Check current I/O thread settings
CONFIG GET io-threads
CONFIG GET io-threads-do-reads
# Both settings are immutable: CONFIG SET io-threads and
# CONFIG SET io-threads-do-reads are rejected at runtime.
# Change them in redis.conf and restart the server.
# Confirm threads are active (check server info)
redis-cli INFO server | grep io_threads_active
When multi-threaded I/O is NOT worth enabling:
- Pure intranet deployments with sub-0.1ms latency (network is not the bottleneck)
- QPS under 50K (single-thread headroom is ample)
- Very small values (< 100 bytes per operation)
- Environments where CPU cores are scarce
26.4 Comparison: Redis vs. High-Performance Alternatives
26.4.1 KeyDB
KeyDB is a Redis fork that takes multi-threading further—command execution itself runs on multiple threads:
# KeyDB configuration
# One full event loop per thread
server-threads 4
# Pin threads to CPU cores
server-thread-affinity true
How it differs:
- Each thread runs an independent event loop; commands can execute in parallel
- Per-key mutex (read-write lock) protects data structure access
- No global lock—only the specific key being modified is locked
Trade-offs:
- Lock contention on hot keys is severe
- Lua script and MULTI/EXEC semantics require special handling
- Compatibility with the official Redis client ecosystem degrades over time
26.4.2 Dragonfly
Dragonfly (released 2022) uses a shared-nothing architecture:
--threads=8 # Each thread owns independent hash slot shards
# No shared state between threads — zero lock contention
How it differs:
- Memory is partitioned by hash slots; each thread owns a subset
- Cross-shard operations use 2PC (two-phase commit) coordination
- Uses io_uring (Linux 5.1+) instead of epoll, reducing syscall overhead
- Claims 25x throughput over Redis on multi-core machines (controversial benchmarks)
Trade-offs:
- Cross-slot commands (e.g., MGET across multiple slots) have coordination cost
- Ecosystem maturity is far below Redis
- Some command semantics differ from Redis
26.4.3 Comparison Matrix
| Dimension | Redis 6+ | KeyDB | Dragonfly |
|---|---|---|---|
| Command execution | Single-threaded | Multi-threaded (with locks) | Multi-threaded (shared-nothing) |
| Theoretical max QPS | 200K–300K | 300K–500K | 1M+ |
| API compatibility | Official standard | High (fork) | Medium (some differences) |
| Production stability | Excellent | Good | Moderate (newer) |
| Community/ecosystem | Largest | Small | Small |
| Recommended for | Most production use | Extreme throughput needs | Experimental/specific cases |
26.5 RESP3 Protocol
Redis 6 also introduced RESP3 (Redis Serialization Protocol v3). Clients negotiate the protocol version with HELLO:
# Switch to RESP3
HELLO 3
# Server returns a Map with server metadata
# Switch back to RESP2
HELLO 2
26.5.1 New Data Types in RESP3
| Type | Wire Prefix | Description |
|---|---|---|
| Map | % | Key-value pairs; client doesn't need to guess array layout |
| Set | ~ | Unordered collection |
| Double | , | Native float (vs. string encoding in RESP2) |
| Boolean | # | True/false (vs. integer 0/1 in RESP2) |
| BigNumber | ( | Arbitrary-precision integer |
| Blob Error | ! | Length-prefixed error message |
| Verbatim String | = | String with a three-character format hint (e.g. txt, mkd) |
| Push | > | Server-initiated push (subscription messages, invalidation) |
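A client-side decoder can dispatch on the first byte of each frame. The sketch below covers only the eight prefixes listed in the table; the function name is hypothetical, and a real parser would also handle the RESP2 types and the remaining RESP3 types.

```c
#include <stddef.h>

/* Map a RESP3 type prefix byte to the type name used in the table above.
 * Returns NULL for any byte that is not one of those eight prefixes. */
const char *resp3_type_name(char prefix) {
    switch (prefix) {
    case '%': return "Map";
    case '~': return "Set";
    case ',': return "Double";
    case '#': return "Boolean";
    case '(': return "BigNumber";
    case '!': return "Blob Error";
    case '=': return "Verbatim String";
    case '>': return "Push";
    default:  return NULL;   /* RESP2 type or unknown prefix */
    }
}
```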
26.5.2 Push Type: Cleaner Pub/Sub
In RESP2, subscription messages and command responses share the same connection stream. The client needs a state machine to distinguish them. RESP3's Push type (> prefix) explicitly marks server-initiated data:
# RESP3 subscription message wire format
>3\r\n
$7\r\nmessage\r\n
$12\r\nchannel-name\r\n
$12\r\nmessage-data\r\n
Clients can now handle push data in a dedicated path without ambiguity.
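As a minimal sketch of that dedicated path, a reader can check the first byte and pull out the element count before deciding where to route the frame. The function name is hypothetical, and the sketch validates only the header line, not the elements that follow.

```c
#include <stdlib.h>

/* Returns the element count if buf starts with a RESP3 Push header
 * such as ">3\r\n", or -1 if it is not a well-formed Push header.
 * buf must be a NUL-terminated string. Hypothetical helper. */
long resp3_push_count(const char *buf) {
    if (buf == NULL || buf[0] != '>') return -1;
    if (buf[1] < '0' || buf[1] > '9') return -1;   /* count must start with a digit */
    char *end = NULL;
    long count = strtol(buf + 1, &end, 10);
    if (end[0] != '\r' || end[1] != '\n') return -1; /* header must end in CRLF */
    return count;
}
```

A frame for which this returns a non-negative count goes to the push handler; everything else is treated as the reply to a pending command.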
26.5.3 Client-Side Caching with Tracking
RESP3 enables server-assisted client-side caching. The server tracks which keys a client has read and sends invalidation messages when those keys change:
# Enable tracking — redirect invalidation notices to connection ID 1234
CLIENT TRACKING ON REDIRECT 1234
# BCAST mode — broadcast all changes under a key prefix
CLIENT TRACKING ON BCAST PREFIX user:
# When a tracked key changes, the server sends a Push message:
# >2\r\n$10\r\ninvalidate\r\n*1\r\n$8\r\nuser:123\r\n
This enables near-zero latency client-local caches with automatic invalidation—a significant step toward eliminating redundant round trips for read-heavy workloads.
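The client half of this scheme is a local map plus an invalidation hook. The sketch below is deliberately simple and entirely hypothetical: a fixed-size array with linear search, no eviction, and names chosen for illustration. It shows only the lifecycle of fill, hit, and invalidate, not how any particular client library implements it.

```c
#include <string.h>

#define LOCAL_CACHE_SLOTS 64    /* hypothetical fixed capacity */
#define LOCAL_CACHE_STRLEN 64   /* max key/value length including NUL */

struct local_entry {
    int used;
    char key[LOCAL_CACHE_STRLEN];
    char value[LOCAL_CACHE_STRLEN];
};

static struct local_entry local_cache[LOCAL_CACHE_SLOTS];

static struct local_entry *local_find(const char *key) {
    for (int i = 0; i < LOCAL_CACHE_SLOTS; i++)
        if (local_cache[i].used && strcmp(local_cache[i].key, key) == 0)
            return &local_cache[i];
    return NULL;
}

/* Store a value fetched from the server.
 * Returns 0 on success, -1 if the cache is full or a string is too long. */
int local_cache_put(const char *key, const char *value) {
    if (strlen(key) >= LOCAL_CACHE_STRLEN || strlen(value) >= LOCAL_CACHE_STRLEN)
        return -1;
    struct local_entry *e = local_find(key);
    for (int i = 0; e == NULL && i < LOCAL_CACHE_SLOTS; i++)
        if (!local_cache[i].used) e = &local_cache[i];   /* first free slot */
    if (e == NULL) return -1;
    e->used = 1;
    strcpy(e->key, key);
    strcpy(e->value, value);
    return 0;
}

/* Returns the cached value, or NULL on a miss (caller then asks the server). */
const char *local_cache_get(const char *key) {
    struct local_entry *e = local_find(key);
    return e ? e->value : NULL;
}

/* Call when an "invalidate" Push message names this key. */
void local_cache_invalidate(const char *key) {
    struct local_entry *e = local_find(key);
    if (e) e->used = 0;
}
```

A read first calls local_cache_get; on a miss it issues GET, then local_cache_put. When the push handler receives an invalidate message for user:123, it calls local_cache_invalidate("user:123"), so the next read goes back to the server.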
26.6 Production Tuning Guide
26.6.1 Benchmarking Correctly
# Baseline: single-threaded
redis-benchmark -h 127.0.0.1 -p 6379 \
-n 1000000 -c 200 -t get,set -d 100
# Multi-threaded benchmark (Redis 6+ benchmark client also supports threads)
redis-benchmark -h 127.0.0.1 -p 6379 \
-n 1000000 -c 200 --threads 4 -t get,set -d 100
# Large value test to stress I/O path
redis-benchmark -n 100000 -c 100 -d 10240 -t get,set
# Pipeline test (reduces per-command I/O overhead)
redis-benchmark -n 1000000 -c 200 -P 16 -t get,set
26.6.2 Diagnosing Multi-Thread Issues
Symptom: Higher CPU usage after enabling multi-threading, but QPS barely improved
# 1. Verify multi-threading is actually active
redis-cli INFO server | grep io_threads_active
# 2. Check connected client count (needs to exceed io_threads_num * 2)
redis-cli INFO clients | grep connected_clients
# 3. Profile with perf to find actual hotspot
perf top -p $(pgrep redis-server)
# If the network paths (readQueryFromClient, writeToClient, read/write syscalls)
# account for < 20% of CPU time, I/O is not the bottleneck
# 4. Check whether network bandwidth is saturated
sar -n DEV 1 10
ifstat -i eth0 1
Symptom: I/O threads stay idle despite io-threads=4
Root cause: the number of clients with pending writes in an event-loop iteration is below the activation threshold (io_threads_num * 2 = 8), so Redis falls back to handling them on the main thread.
Fix: Increase benchmark concurrency (-c 20 or higher), or verify that production traffic has sufficient concurrent connections.
26.6.3 Recommended Production Configuration
# redis.conf — 8-core production server
# Multi-threaded I/O
# Half of physical cores
io-threads 4
# Read and write both parallelized
io-threads-do-reads yes
# Connection handling
# Enlarge TCP accept queue
tcp-backlog 1024
# Allow more concurrent connections
maxclients 10000
# Detect dead connections after 300s
tcp-keepalive 300
# Slow log — catch commands taking > 1ms
slowlog-log-slower-than 1000
slowlog-max-len 1000
# Disable transparent huge pages (OS-level, do in system config)
# echo never > /sys/kernel/mm/transparent_hugepage/enabled
26.7 Summary
Redis 6's multi-threaded I/O is a carefully scoped optimization: it parallelizes the slow part (network I/O) while preserving the guarantees that make Redis so reliable (single-threaded command execution, no locks, no race conditions).
Key takeaways:
- Command execution is always single-threaded—data structure atomicity is never compromised
- I/O thread count: recommend physical_cores / 2, capped at 8
- Set io-threads-do-reads to yes to parallelize both reads and writes
- Multi-threading activates automatically only when enough clients are present
- Below 50K QPS, the overhead of multi-threading outweighs its benefit
- RESP3 is a companion protocol upgrade: richer types, native push messages, and client-side caching support
- For most production deployments, Redis 6+ with io-threads=4 is the right answer; KeyDB and Dragonfly are only warranted in extreme edge cases