Chapter 26

Multi-Thread I/O: Redis 6+ Architecture Evolution

26.1 Background: The Single-Thread Ceiling

Redis was born single-threaded, and for good reason. In 2009 that design was brilliant: no lock contention, simple code, excellent CPU cache locality. As hardware scaled out (more cores, faster NICs, higher memory bandwidth), the single-thread ceiling became visible.

26.1.1 Where Single-Thread Breaks Down

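In Redis 5 and earlier, a single main thread handled all client-facing work: every socket read, command execution, and response write ran serially on that one thread. Background threads existed only for housekeeping such as closing files, fsync, and lazy freeing. Measured performance caps: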

Workload                          Max QPS      Bottleneck
Small values (< 100 B) GET/SET    80K–100K     CPU frequency; network I/O > 60% of time
Large values (> 10 KB) GET/SET    20K–40K      Network I/O bandwidth
Pipelined batch operations        500K–800K    Command execution logic

As client count rises and value size grows, network I/O dominates. Executing a GET takes a few microseconds, while reading the request and writing the response can take tens of microseconds. Most of each request's cost is therefore I/O, not command logic.

26.1.2 Why Not Fully Parallelize Command Execution

The obvious question: why not also run command execution on multiple threads?

Lock contention: Redis core data structures (dict, skiplist, listpack) are not thread-safe. Concurrent ZADD calls from multiple threads would require locking, and at high concurrency the lock overhead can erase most of the multi-thread gains.

Atomic command semantics: INCR, LPUSH, and ZADD get their atomicity for free from single-threaded execution. Reproducing those guarantees with fine-grained locking, including for commands that touch several keys, is far harder to implement correctly.

Transactions and Lua: MULTI/EXEC and EVAL rely on uninterrupted serial execution. Parallelizing execution would break these fundamental guarantees.

Maintainability: Single-threaded code cannot deadlock or race, and it is easy to debug. Multi-threaded bugs are notoriously hard to reproduce and diagnose.

The conclusion Redis reached: parallelize only the I/O work; keep command execution single-threaded.


26.2 The Redis 6 Multi-Threaded I/O Design

Redis 6.0 (released April 2020) introduced multi-threaded I/O. The architecture separates the fast path (command execution) from the slow path (network I/O):

26.2.1 Thread Model Architecture

                    ┌───────────────────────────────────────┐
                    │              Main Thread              │
                    │                                       │
                    │  epoll_wait() → detect readable fds   │
                    │  Distribute clients to I/O threads    │
                    │  Spin-wait for reads to complete      │
                    │  Execute ALL commands (single-thread) │
                    │  Distribute writes to I/O threads     │
                    │  Spin-wait for writes to complete     │
                    └───────────────────────────────────────┘
                     ↕                     ↕
            ┌─────────────────┐   ┌─────────────────┐
            │  I/O Thread 1   │   │  I/O Thread 2   │
            │                 │   │                 │
            │ readQueryFrom   │   │ readQueryFrom   │
            │   Client()      │   │   Client()      │
            │ writeToClient() │   │ writeToClient() │
            └─────────────────┘   └─────────────────┘

26.2.2 Step-by-Step Execution Flow

Phase 1: Read Phase

  1. Main thread calls epoll_wait, detects multiple readable sockets.
  2. Main thread distributes clients round-robin into io_threads_list[tid] queues.
  3. Main thread sets io_threads_op = IO_THREADS_OP_READ.
  4. I/O threads concurrently execute readQueryFromClient: each reads raw bytes from its sockets into client->querybuf and parses the first complete command, but does not execute it.
  5. Main thread spin-waits until all I/O threads finish, detected via atomic counter io_threads_pending[tid].
/* Main thread waits for I/O threads 1..N-1; slot 0 is the main thread
 * itself, which has already processed its own share of clients */
while(1) {
    unsigned long pending = 0;
    for (int j = 1; j < server.io_threads_num; j++)
        pending += io_threads_pending[j];
    if (pending == 0) break;
}

Phase 2: Execute Phase

  1. Main thread iterates all ready clients, calls processCommandAndResetClient on each.
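  2. For each client, executes the command the I/O thread already parsed, then parses and executes any further pipelined commands left in client->querybuf. Replies go to client->buf (the output buffer).
  3. This phase is strictly single-threaded: no concurrent data structure access.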

Phase 3: Write Phase

  1. Main thread enqueues clients that have data to send.
  2. Distributes them round-robin to I/O threads.
  3. I/O threads concurrently call writeToClient, flushing client->buf to the socket.
  4. Main thread spin-waits for all writes to complete.

26.2.3 Key Implementation Details

Spin-wait over blocking wait: After dispatching to I/O threads, the main thread busy-waits rather than using a condition variable or semaphore. This avoids thread context-switch latency, so the main thread resumes the moment the I/O threads finish. The trade-off: CPU burns cycles during the wait window.

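Minimum threshold: When fewer than io_threads_num * 2 clients have pending writes, Redis pauses the I/O threads (stopThreadedIO) and handles the batch single-threaded, because dispatch overhead would outweigh the benefit for small batches. Reads are offloaded only while the threads are active, so this write-side check effectively governs both phases.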

/* networking.c: should we use multi-threaded I/O? */
int stopThreadedIOIfNeeded(void) {
    int pending = listLength(server.clients_pending_write);
    if (server.io_threads_num == 1) return 1;
    if (pending < (server.io_threads_num * 2)) {
        if (server.io_threads_active) stopThreadedIO();
        return 1;
    }
    return 0;
}

26.3 Configuration Reference

26.3.1 Core Settings

# redis.conf

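# Number of I/O threads (the main thread counts as one).
# redis.conf suggests enabling threads only on machines with 4+ cores,
# leaving at least one core spare: try 2-3 threads on a 4-core box and 6 on
# an 8-core box. More than 8 threads is unlikely to help much.
# Setting io-threads=1 disables multi-threading entirely (legacy behavior).
io-threads 4

# Apply multi-threading to reads and protocol parsing as well.
# Default: no (only write responses use I/O threads).
# redis.conf notes that threading reads usually helps less than threading
# writes, so benchmark before and after enabling it.
io-threads-do-reads yes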

26.3.2 Performance Benchmark by Configuration

Tested on a 4-core/8GB server using redis-benchmark -n 1000000 -c 200 -t get,set:

Configuration               GET QPS    SET QPS    CPU usage
io-threads=1 (default)      95,000     88,000     Single core at 100%
io-threads=2                145,000    138,000    2 cores at ~80%
io-threads=4                195,000    182,000    4 cores at ~70%
io-threads=8 (excessive)    190,000    178,000    Thread-switching overhead

Key finding: More threads are not always better. On this 4-core machine, 8 threads were slightly slower than 4. Once io-threads exceeds the physical core count, scheduling overhead outweighs the extra parallelism.

26.3.3 Verifying Multi-Threaded I/O at Runtime

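# Check current I/O thread settings
redis-cli CONFIG GET io-threads
redis-cli CONFIG GET io-threads-do-reads

# In Redis 6.x and 7.x neither setting can be changed with CONFIG SET:
# both are read only at startup, so edit redis.conf (or start the server
# with --io-threads 4) and restart.

# Confirm threads are active (check server info)
redis-cli INFO server | grep io_threads_active

# Count reads/writes actually handled by I/O threads
redis-cli INFO stats | grep io_threaded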

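When multi-threaded I/O is NOT worth enabling:

  1. Few concurrent clients: with fewer than io_threads_num * 2 clients pending writes per iteration, the threads stay paused anyway (see 26.2.3).
  2. Small machines: redis.conf advises enabling threads only with 4 or more cores, leaving at least one spare.
  3. Execution-bound workloads: when slow commands or Lua scripts dominate CPU time rather than socket I/O, offloading I/O cannot help, because execution stays single-threaded.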


26.4 Comparison: Redis vs. High-Performance Alternatives

26.4.1 KeyDB

KeyDB is a Redis fork that takes multi-threading further, running command execution itself on multiple threads:

# KeyDB configuration (the config parser does not accept trailing comments)
# One full event loop per thread
server-threads 4
# Pin threads to CPU cores
server-thread-affinity true

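How it differs: each thread runs a complete event loop that reads requests, executes commands, and writes replies, so the shared dataset must be protected by locks.

Trade-offs: throughput can scale beyond Redis's single execution thread, but lock overhead limits that scaling. As a fork, KeyDB also has a much smaller community than upstream Redis.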

26.4.2 Dragonfly

Dragonfly (released 2022) uses a shared-nothing architecture:

# Each thread owns an independent shard of the keyspace;
# no shared state between threads means no lock contention
--proactor_threads=8

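How it differs: the keyspace is partitioned across threads and each thread exclusively owns its shard, so threads do not contend for shared data structures.

Trade-offs: API compatibility with Redis is not complete. As a newer project, Dragonfly also has a shorter production track record and a smaller community.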

26.4.3 Comparison Matrix

Dimension              Redis 6+             KeyDB                         Dragonfly
Command execution      Single-threaded      Multi-threaded (with locks)   Multi-threaded (shared-nothing)
Theoretical max QPS    200K–300K            300K–500K                     1M+
API compatibility      Official standard    High (fork)                   Medium (some differences)
Production stability   Excellent            Good                          Moderate (newer)
Community/ecosystem    Largest              Small                         Small
Recommended for        Most production use  Extreme throughput needs      Experimental/specific cases

26.5 RESP3 Protocol

Redis 6 also introduced RESP3 (Redis Serialization Protocol v3). Clients negotiate the protocol version with HELLO:

# Switch to RESP3
HELLO 3
# Server returns a Map with server metadata

# Switch back to RESP2
HELLO 2

26.5.1 New Data Types in RESP3

Type             Wire prefix   Description
Map              %             Key-value pairs; client doesn't need to guess array layout
Set              ~             Unordered collection
Double           ,             Native float (vs. string encoding in RESP2)
Boolean          #             True/false (vs. integer 0/1 in RESP2)
BigNumber        (             Arbitrary-precision integer
Blob Error       !             Length-prefixed error message
Verbatim String  =             String plus a three-character format hint (e.g. txt, mkd)
Push             >             Server-initiated push (subscription messages, invalidation)

26.5.2 Push Type: Cleaner Pub/Sub

In RESP2, subscription messages and command responses share the same connection stream. The client needs a state machine to distinguish them. RESP3's Push type (> prefix) explicitly marks server-initiated data:

# RESP3 subscription message wire format (each element is a blob string)
>3\r\n
$7\r\nmessage\r\n
$12\r\nchannel-name\r\n
$12\r\nmessage-data\r\n

Clients can now handle push data in a dedicated path without ambiguity.

26.5.3 Client-Side Caching with Tracking

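RESP3 makes server-assisted client-side caching straightforward. The server tracks which keys a client has read and sends invalidation messages when those keys change. Under RESP3 these arrive as Push messages on the same connection. RESP2 clients must instead redirect them to a second connection subscribed to the __redis__:invalidate channel: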

# Enable tracking and redirect invalidation notices to connection ID 1234
CLIENT TRACKING ON REDIRECT 1234

# BCAST mode: broadcast all changes under a key prefix
CLIENT TRACKING ON BCAST PREFIX user:

# When a tracked key changes, the server sends a Push message:
# >2\r\n$10\r\ninvalidate\r\n*1\r\n$8\r\nuser:123\r\n

Reads served from the local cache skip the network round trip entirely, and invalidation messages evict stale entries shortly after a key changes. For read-heavy workloads, this removes most redundant round trips.


26.6 Production Tuning Guide

26.6.1 Benchmarking Correctly

# Baseline: single-threaded benchmark client
redis-benchmark -h 127.0.0.1 -p 6379 \
  -n 1000000 -c 200 -t get,set -d 100

# Multi-threaded benchmark (Redis 6+ benchmark client also supports threads)
redis-benchmark -h 127.0.0.1 -p 6379 \
  -n 1000000 -c 200 --threads 4 -t get,set -d 100

# Large value test to stress I/O path
redis-benchmark -n 100000 -c 100 -d 10240 -t get,set

# Pipeline test (reduces per-command I/O overhead)
redis-benchmark -n 1000000 -c 200 -P 16 -t get,set

26.6.2 Diagnosing Multi-Thread Issues

Symptom: Higher CPU usage after enabling multi-threading, but QPS barely improved

# 1. Verify multi-threading is actually active
redis-cli INFO server | grep io_threads_active

# 2. Check connected client count (at least io_threads_num * 2 clients must have pending writes at once)
redis-cli INFO clients | grep connected_clients

# 3. Profile with perf to find the actual hotspot
perf top -p "$(pgrep -d, redis-server)"
# If socket read/write paths (read, write, tcp_sendmsg) account for only a
# small share of samples, I/O is not the bottleneck

# 4. Check whether network bandwidth is saturated
sar -n DEV 1 10
ifstat -i eth0 1

Symptom: I/O threads stay idle (io_threads_active:0) despite io-threads=4

Root cause: too few clients with pending writes per event-loop iteration. Below the activation threshold (io_threads_num * 2 = 8), Redis pauses all I/O threads and handles every client on the main thread.

Fix: Increase benchmark concurrency (-c 20 or higher), or verify that production traffic has sufficient concurrent connections.

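# redis.conf for an 8-core production server
# (comments must sit on their own lines; Redis rejects trailing comments)

# Multi-threaded I/O
# 4 is conservative; redis.conf suggests up to 6 threads on 8 cores
io-threads 4
# Parallelize reads as well as writes
io-threads-do-reads yes

# Connection handling
# Enlarge the TCP accept queue (also raise net.core.somaxconn to match)
tcp-backlog 1024
# 10000 is also the default; the usable value is capped by the open-file limit
maxclients 10000
# Send TCP keepalive probes every 300 s (the default); a dead peer is
# detected after roughly twice that interval
tcp-keepalive 300

# Slow log: catch commands taking > 1 ms (value in microseconds)
slowlog-log-slower-than 1000
slowlog-max-len 1000

# Disable transparent huge pages (OS-level, do in system config)
# echo never > /sys/kernel/mm/transparent_hugepage/enabled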

26.7 Summary

Redis 6's multi-threaded I/O is a carefully scoped optimization: it parallelizes the slow part (network I/O) while preserving the guarantees that make Redis so reliable (single-threaded command execution, no locks, no race conditions).

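Key takeaways:

  1. Redis 6+ offloads socket reads, writes, and (optionally) protocol parsing to I/O threads. Every command still executes on the main thread, so atomicity, MULTI/EXEC, and Lua semantics are unchanged.
  2. Keep io-threads below the physical core count, and benchmark io-threads-do-reads before enabling it.
  3. The threads activate only with enough concurrent clients (io_threads_num * 2 pending writes). Verify with INFO before assuming they help.
  4. RESP3 adds richer reply types and Push messages, which make Pub/Sub handling and server-assisted client-side caching simpler.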
