Chapter 38: Slow Query Analysis and Big Key Diagnosis
Performance problems hide in details: a forgotten KEYS *, a Hash that silently grew to 100,000 fields, a single key absorbing 80% of all traffic. This chapter covers the complete toolkit for discovering, diagnosing, and fixing these issues without impacting production.
1. Slowlog Configuration and Usage
1.1 Configuration Parameters
# redis.conf (comments must sit on their own lines; Redis does not parse trailing comments)
# Threshold in microseconds (μs); 10000 μs = 10 ms.
# Set to 0 to log every command (debug use only); set to a negative value to disable logging.
slowlog-log-slower-than 10000
# The log keeps at most 128 entries; the oldest entry is dropped when a new one arrives.
slowlog-max-len 128
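The interaction between the two settings can be illustrated with a toy model. This is only a sketch of the semantics described above (threshold check plus a bounded buffer), not Redis's implementation; `SlowlogModel` is a hypothetical name introduced for illustration.

```python
from collections import deque


class SlowlogModel:
    """Toy model of slowlog-log-slower-than and slowlog-max-len."""

    def __init__(self, slower_than_us=10000, max_len=128):
        self.slower_than_us = slower_than_us
        # A bounded deque discards the oldest entry once max_len is reached.
        self.entries = deque(maxlen=max_len)

    def record(self, command, duration_us):
        """Store the command if it meets the threshold; return True when logged."""
        if self.slower_than_us < 0:  # negative threshold disables logging
            return False
        if duration_us < self.slower_than_us:
            return False
        self.entries.append((command, duration_us))
        return True
```

With `max_len=2`, a third slow command pushes the first one out, which is why a small `slowlog-max-len` can hide the original culprit during an incident.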
Dynamic changes (no restart needed):
CONFIG SET slowlog-log-slower-than 5000 # lower to 5 ms to catch more
CONFIG SET slowlog-max-len 256
CONFIG REWRITE # persist to redis.conf
1.2 Querying the Slowlog
SLOWLOG GET # retrieve the 10 most recent entries (the default count)
SLOWLOG GET 10 # retrieve the 10 most recent entries
SLOWLOG GET -1 # retrieve all entries (Redis 7.0+)
SLOWLOG LEN # how many entries are currently stored
SLOWLOG RESET # clear all entries
# Each entry contains:
# 1) 1) (integer) 14 ← auto-increment ID
#    2) (integer) 1699000000 ← Unix timestamp when the command started
#    3) (integer) 28000 ← execution time in microseconds (28 ms)
#    4) 1) "KEYS" ← command name
#       2) "*" ← argument (arguments longer than 128 bytes are truncated)
#    5) "127.0.0.1:54321" ← client address (Redis 4.0+)
#    6) "myapp" ← client name (set via CLIENT SETNAME)
1.3 Parsing the Slowlog in Python
import redis
from datetime import datetime

r = redis.Redis()

def print_slowlog(count: int = 50):
    """Print the most recent slowlog entries, one per line."""
    for entry in r.slowlog_get(count):
        ts = datetime.fromtimestamp(entry['start_time'])
        ms = entry['duration'] / 1000  # microseconds to milliseconds
        # redis-py returns the command as one space-joined value, not a list of arguments
        cmd = entry['command']
        if isinstance(cmd, bytes):
            cmd = cmd.decode(errors='replace')
        print(f"[{ts}] {ms:.1f} ms | {cmd[:120]}")

print_slowlog()
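Beyond printing raw entries, grouping them by command name shows which command dominates the log. The following is a minimal sketch, assuming entries shaped like the dicts redis-py returns from `slowlog_get()` (`duration` in microseconds, `command` as a single space-joined bytes or str value); `summarize_slowlog` is a hypothetical helper name.

```python
from collections import defaultdict


def summarize_slowlog(entries):
    """Group slowlog entries by command name.

    Returns (name, count, total_ms, max_ms) tuples sorted by total_ms, largest first.
    """
    stats = defaultdict(lambda: [0, 0.0, 0.0])  # name -> [count, total_ms, max_ms]
    for entry in entries:
        cmd = entry["command"]
        if isinstance(cmd, bytes):
            cmd = cmd.decode(errors="replace")
        parts = cmd.split(None, 1)
        name = parts[0].upper() if parts else "(empty)"
        ms = entry["duration"] / 1000  # microseconds to milliseconds
        bucket = stats[name]
        bucket[0] += 1
        bucket[1] += ms
        bucket[2] = max(bucket[2], ms)
    rows = [(name, count, total, peak) for name, (count, total, peak) in stats.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

Typical usage would be `summarize_slowlog(r.slowlog_get(128))`, reading the top rows as the first candidates for replacement.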
2. Commands Known to Be Slow
| Command | Complexity | Blocking Risk | Replacement |
|---|---|---|---|
| KEYS * | O(N) | Critical | SCAN |
| HGETALL | O(N) | High (large Hashes) | HSCAN iteratively |
| SMEMBERS | O(N) | High (large Sets) | SSCAN iteratively |
| SORT | O(N + M·log M) | High | Pre-sort into ZSET |
| LRANGE 0 -1 | O(N) | High (long Lists) | Paginate with LRANGE |
| SUNIONSTORE | O(N) | High | Split into smaller operations |
| SINTERSTORE | O(N·M) | Critical | Step-by-step intersections |
| DEL (large key) | O(N) | High | UNLINK (async) |
| FLUSHDB (sync) | O(N) | Critical | FLUSHDB ASYNC |
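The SCAN replacement for KEYS comes down to a cursor loop. A minimal sketch follows, assuming a client whose `scan(cursor=..., match=..., count=...)` method returns a `(next_cursor, keys)` pair the way redis-py's does; `scan_keys` is a hypothetical name, and redis-py's own `scan_iter()` wraps the same loop.

```python
def scan_keys(client, pattern="*", count=500):
    """Yield keys matching pattern using incremental SCAN calls instead of KEYS."""
    cursor = 0
    while True:
        # Each call does a bounded amount of work, so other clients are served in between.
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=count)
        yield from keys
        if int(cursor) == 0:  # a returned cursor of 0 means the iteration is complete
            break
```

SCAN may return the same key more than once during a rehash, so callers that need uniqueness should deduplicate.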
2.1 UNLINK: Asynchronous Deletion of Large Keys
# DEL is synchronous — freeing a 100,000-field Hash blocks the main thread
DEL big_hash # dangerous: may take hundreds of milliseconds
# UNLINK detaches the key in O(1) on the main thread; a background thread frees the memory
UNLINK big_hash # safe
# Production policy: always prefer UNLINK for keys that could be large
def safe_delete(*keys):
    return r.unlink(*keys)
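When a cleanup job removes many keys at once, passing them all to a single UNLINK produces one very large command. The sketch below batches the calls instead; it assumes a client with an `unlink(*keys)` method that returns the number of keys removed, as redis-py's does, and `unlink_in_batches` is a hypothetical name.

```python
from itertools import islice


def unlink_in_batches(client, keys, batch_size=500):
    """UNLINK keys in fixed-size batches; return the total number of keys removed."""
    iterator = iter(keys)
    removed = 0
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        removed += client.unlink(*batch)
    return removed
```

Because `keys` can be any iterable, the helper can consume a SCAN-based generator directly without materializing the full key list.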
3. Big Key Scanning
3.1 redis-cli --bigkeys
redis-cli --bigkeys
# Uses SCAN internally — does NOT block the main thread
# Reports the largest key for each data type
# Throttle scan rate to reduce production impact
redis-cli --bigkeys -i 0.1 # sleep 0.1 s after every 100 SCAN commands
Sample output:
# Scanning the entire keyspace to find biggest keys as well as
# average sizes per key type.
[00.00%] Biggest string found so far 'config:blob' with 512000 bytes
[23.45%] Biggest hash found so far 'user:profile:9999' with 18432 fields
[67.89%] Biggest zset found so far 'leaderboard:all' with 2048321 members
-------- summary -------
Sampled 1000000 keys in the keyspace!
Total key length in bytes is 24000000 (avg len 24.00)
Biggest string found 'config:blob' has 512000 bytes
Biggest hash found 'user:profile:9999' has 18432 fields
Biggest zset found 'leaderboard:all' has 2048321 members
Biggest set found 'tags:popular' has 10240 members
Biggest list found 'queue:jobs' has 50000 items
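For scheduled inspections it can be handy to turn the summary into structured data. A minimal sketch, assuming the summary lines have exactly the format shown in the sample above (newer redis-cli versions may quote key names differently); `parse_bigkeys_summary` is a hypothetical helper.

```python
import re

# Matches lines such as: Biggest hash found 'user:profile:9999' has 18432 fields
_SUMMARY = re.compile(r"^Biggest\s+(\w+)\s+found\s+'(.+)'\s+has\s+(\d+)\s+(\w+)$")


def parse_bigkeys_summary(output):
    """Map each data type to (key, size, unit) from --bigkeys summary lines."""
    result = {}
    for line in output.splitlines():
        match = _SUMMARY.match(line.strip())
        if match:
            dtype, key, size, unit = match.groups()
            result[dtype] = (key, int(size), unit)
    return result
```

Progress lines ("found so far") are ignored because they carry a percentage prefix and different wording.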
3.2 Per-Key Memory Usage
# Recursive memory calculation including encoding overhead
MEMORY USAGE user:profile:9999
# returns bytes used by the key and its value (including Redis metadata)
MEMORY USAGE user:profile:9999 SAMPLES 0 # exact (slow) — samples all nested elements
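Since --bigkeys ranks by element count rather than bytes, a byte-based ranking can be built from SCAN plus MEMORY USAGE. This is a sketch, assuming a client exposing `scan_iter(count=...)` and `memory_usage(key, samples=...)` as redis-py does; `top_keys_by_memory` is a hypothetical name. It issues one round trip per key, so throttle it or run it against a replica.

```python
import heapq


def top_keys_by_memory(client, n=20, scan_count=500, samples=5):
    """Return the n largest keys as (bytes, key) pairs, biggest first."""

    def sizes():
        for key in client.scan_iter(count=scan_count):
            size = client.memory_usage(key, samples=samples)
            if size is not None:  # None: the key vanished between SCAN and MEMORY USAGE
                yield (size, key)

    # nlargest keeps only n candidates in memory while consuming the generator.
    return heapq.nlargest(n, sizes())
```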
4. Hot Key Detection
4.1 Enabling LFU Policy
Hot key detection requires an LFU eviction policy so Redis tracks access frequency:
# redis.conf (volatile-lfu also works; keep comments on their own line)
maxmemory-policy allkeys-lfu
# or dynamically
CONFIG SET maxmemory-policy allkeys-lfu
redis-cli --hotkeys
# Sample output (the counter is Redis's 8-bit logarithmic LFU counter, so it never exceeds 255):
# [67.89%] Hot key 'product:detail:10086' found so far with counter 231
# [89.12%] Hot key 'user:session:abc123' found so far with counter 187
#
# -------- summary -------
# hot key found with counter: 231 keyname: product:detail:10086
# hot key found with counter: 187 keyname: user:session:abc123
4.2 Real-Time Hot Key Detection via MONITOR
# WARNING: MONITOR streams every command; a single MONITOR client can cut throughput by 50% or more
# Use only briefly for urgent investigation, never in steady state
# Field 5 of each MONITOR line is the first argument, which is the key for most commands
redis-cli monitor | head -n 50000 \
| awk 'NF >= 5 { print $5 }' \
| sort | uniq -c | sort -rn | head -20
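The same counting can be done in Python on a captured MONITOR sample, which handles quoted arguments more robustly than text tools. This is a sketch, assuming lines in the usual MONITOR format (`timestamp [db client] "CMD" "arg" ...`) and treating the first argument as the key, which holds for most but not all commands; `count_keys` is a hypothetical name.

```python
import re
import shlex
from collections import Counter

# Example line: 1699000000.123456 [0 127.0.0.1:54321] "GET" "product:detail:10086"
_MONITOR_LINE = re.compile(r"^\d+\.\d+ \[\d+ \S+\] (.+)$")


def count_keys(monitor_lines, top=20):
    """Count the first argument of each command in captured MONITOR output."""
    counts = Counter()
    for line in monitor_lines:
        match = _MONITOR_LINE.match(line.strip())
        if not match:
            continue  # skips the initial "OK" and any malformed lines
        try:
            parts = shlex.split(match.group(1))
        except ValueError:  # unbalanced quotes, e.g. a line cut off by head
            continue
        if len(parts) >= 2:
            counts[parts[1]] += 1
    return counts.most_common(top)
```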
4.3 Mitigating Hot Keys
# Strategy 1: Local in-process cache (LRU, short TTL)
import time
from functools import lru_cache

LOCAL_TTL = 5  # seconds; lru_cache has no expiry, so a time bucket is made part of the cache key

@lru_cache(maxsize=1000)
def _get_product(product_id: str, _bucket: int) -> dict:
    data = r.hgetall(f"product:{product_id}")
    return {k.decode(): v.decode() for k, v in data.items()}

def get_product(product_id: str) -> dict:
    """Cached at process level; a cached entry is reused for at most LOCAL_TTL seconds."""
    return dict(_get_product(product_id, int(time.time() // LOCAL_TTL)))

# Strategy 2: Key replication across multiple Redis keys
import random
REPLICAS = 10

def get_hot_key(base_key: str) -> bytes | None:
    """Distribute reads across N replicated copies."""
    replica = random.randrange(REPLICAS)
    return r.get(f"{base_key}:r{replica}")

def set_hot_key(base_key: str, value: str, ttl: int = 300):
    """Write all replicas in one MULTI/EXEC pipeline (atomic on a single node, not across cluster slots)."""
    pipe = r.pipeline()
    for i in range(REPLICAS):
        pipe.setex(f"{base_key}:r{i}", ttl, value)
    pipe.execute()
5. Memory Analysis Tools
5.1 Online Analysis
# Memory usage of a single key
redis-cli memory usage mykey
# Aggregated memory statistics
redis-cli memory stats
# Key fields:
# peak.allocated: highest memory allocation ever seen
# total.allocated: current total allocated
# fragmentation: RSS / used_memory (same value as INFO's mem_fragmentation_ratio)
# Memory doctor: plain-language diagnosis with recommendations
redis-cli memory doctor
# Example (healthy instance): "Hi Sam, I can't find any memory issue in your instance. ..."
5.2 Offline RDB Analysis
# Option 1: redis-rdb-tools (Python)
pip install rdbtools python-lzf
rdb --command memory dump.rdb > memory.csv
# CSV columns: database, type, key, size_in_bytes, encoding, num_elements, len_largest_element
# Find the 20 largest keys (skip the header row, sort numerically on column 4 only)
tail -n +2 memory.csv | sort -t',' -k4,4nr | head -20
# Option 2: rdb-cli (Go; much faster on large RDB files)
# Both options install a command named rdb, so check which one comes first on your PATH
go install github.com/hdt3213/rdb@latest
rdb -c memory -o memory.csv dump.rdb # CSV memory report
rdb -c flamegraph -port 7379 dump.rdb # spawns an HTTP server; visualize in your browser
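Sorting the report with text tools breaks when key names contain commas; a CSV parser avoids that. The following is a sketch, assuming the report is valid CSV with the column names listed above; `largest_keys` is a hypothetical helper.

```python
import csv
import io


def largest_keys(csv_text, n=20):
    """Return the n largest rows of a memory report as (size_in_bytes, type, key)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(int(row["size_in_bytes"]), row["type"], row["key"]) for row in reader]
    return sorted(rows, reverse=True)[:n]
```

Usage would be along the lines of `largest_keys(open("memory.csv", newline="").read())`.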
5.3 Generating an RDB Snapshot
# Trigger a background save (brief fork overhead)
BGSAVE
LASTSAVE # Unix timestamp of the most recent successful save
# Monitor progress
redis-cli info persistence | grep rdb_bgsave_in_progress
# 0 = save complete; 1 = in progress
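A script that copies the RDB file for offline analysis needs to wait until the save has finished. This is a minimal polling sketch, assuming a client whose `info("persistence")` returns a dict with `rdb_bgsave_in_progress` and `rdb_last_bgsave_status`, as redis-py's does; `wait_for_bgsave` is a hypothetical name, and the `sleep` parameter exists only so the loop can be tested without real delays.

```python
import time


def wait_for_bgsave(client, timeout=300.0, poll_interval=1.0, sleep=time.sleep):
    """Poll INFO persistence until the background save finishes.

    Returns True if the last save succeeded and False if it failed.
    Raises TimeoutError if the save is still running after `timeout` seconds.
    """
    waited = 0.0
    while True:
        info = client.info("persistence")
        if int(info["rdb_bgsave_in_progress"]) == 0:
            return info.get("rdb_last_bgsave_status") == "ok"
        if waited >= timeout:
            raise TimeoutError("BGSAVE still in progress")
        sleep(poll_interval)
        waited += poll_interval
```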
6. Big Key Refactoring Playbook
6.1 Scenario: Oversized Hash (100,000 fields)
# Problem: HGETALL user:1000 returns 100,000 fields → 500 ms+
HGETALL user:1000
Solution: shard fields across multiple Hashes
import hashlib
SHARD_COUNT = 16

def _shard(field: str) -> int:
    # MD5 is used only for a stable, even distribution, not for security
    return int(hashlib.md5(field.encode()).hexdigest(), 16) % SHARD_COUNT

def hset_sharded(base_key: str, field: str, value: str):
    r.hset(f"{base_key}:s{_shard(field)}", field, value)

def hget_sharded(base_key: str, field: str) -> str | None:
    raw = r.hget(f"{base_key}:s{_shard(field)}", field)
    # "is not None" keeps an empty string distinct from a missing field
    return raw.decode() if raw is not None else None

def hgetall_sharded(base_key: str) -> dict:
    pipe = r.pipeline()
    for i in range(SHARD_COUNT):
        pipe.hgetall(f"{base_key}:s{i}")
    result = {}
    for shard_data in pipe.execute():
        result.update({k.decode(): v.decode() for k, v in shard_data.items()})
    return result
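An existing oversized Hash also has to be moved into the `base_key:s<N>` layout without a blocking HGETALL. The sketch below copies it incrementally with HSCAN, assuming a redis-py-like client (`hscan` returning `(cursor, dict)` and `hset(name, mapping=...)`) and the same MD5-modulo mapping as above; `shard_of` and `migrate_hash` are hypothetical names.

```python
import hashlib
from collections import defaultdict


def shard_of(field, shard_count=16):
    """MD5-modulo shard index; accepts bytes or str field names."""
    raw = field if isinstance(field, bytes) else field.encode()
    return int(hashlib.md5(raw).hexdigest(), 16) % shard_count


def migrate_hash(client, base_key, shard_count=16, batch=500):
    """Copy the hash at base_key into base_key:s<N> shards; return fields processed."""
    cursor, processed = 0, 0
    while True:
        cursor, chunk = client.hscan(base_key, cursor=cursor, count=batch)
        per_shard = defaultdict(dict)
        for field, value in chunk.items():
            per_shard[shard_of(field, shard_count)][field] = value
        for shard, mapping in per_shard.items():
            client.hset(f"{base_key}:s{shard}", mapping=mapping)
        processed += len(chunk)
        if int(cursor) == 0:  # HSCAN finished
            return processed
```

The sketch leaves the original key in place and does not handle writes that arrive during the copy; a real migration would dual-write or pause writers, then UNLINK the old key once reads have switched over.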
6.2 Scenario: Oversized Set (Millions of Members)
import zlib
SET_SHARD_COUNT = 100

def _set_shard(member: str) -> int:
    # Built-in hash() is salted per process for str, so the same member would land
    # in a different shard after a restart or in another worker; CRC32 is stable.
    return zlib.crc32(member.encode()) % SET_SHARD_COUNT

def sadd_sharded(base_key: str, member: str):
    r.sadd(f"{base_key}:s{_set_shard(member)}", member)

def sismember_sharded(base_key: str, member: str) -> bool:
    return bool(r.sismember(f"{base_key}:s{_set_shard(member)}", member))

def scard_sharded(base_key: str) -> int:
    pipe = r.pipeline()
    for i in range(SET_SHARD_COUNT):
        pipe.scard(f"{base_key}:s{i}")
    return sum(pipe.execute())
6.3 Scenario: Oversized String Value (> 1 MB)
import gzip, json

def set_compressed(key: str, data: dict, ttl: int = 3600):
    """Compress large JSON payloads before storing."""
    raw = json.dumps(data, ensure_ascii=False).encode()
    compressed = gzip.compress(raw, compresslevel=6)
    compression_ratio = len(raw) / len(compressed)
    print(f"Stored {len(compressed)} bytes (ratio {compression_ratio:.1f}x)")
    r.setex(key, ttl, compressed)

def get_compressed(key: str) -> dict | None:
    compressed = r.get(key)
    if compressed is None:  # key missing or expired
        return None
    return json.loads(gzip.decompress(compressed).decode())
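Compressing every value wastes CPU on small payloads, where gzip can even increase the size. One possible variation, sketched here with an assumed 1 KB cutoff, compresses only large payloads and detects compression on read from the two-byte gzip magic header; `encode_value`, `decode_value`, and `MIN_COMPRESS_BYTES` are hypothetical names.

```python
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of every gzip stream
MIN_COMPRESS_BYTES = 1024  # assumed cutoff; tune for your payloads


def encode_value(data: dict) -> bytes:
    """Serialize to JSON and gzip only when the payload reaches the cutoff."""
    raw = json.dumps(data, ensure_ascii=False).encode()
    if len(raw) < MIN_COMPRESS_BYTES:
        return raw
    return gzip.compress(raw, compresslevel=6)


def decode_value(blob: bytes) -> dict:
    """Inverse of encode_value: decompress only if the gzip header is present."""
    if blob[:2] == GZIP_MAGIC:
        blob = gzip.decompress(blob)
    return json.loads(blob.decode())
```

The header check is safe here because a JSON object always starts with `{`, so an uncompressed payload can never be mistaken for a gzip stream.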
Guideline for values over 100 KB: store the object in object storage (S3/OSS) and keep only a URL + metadata in Redis.
7. Memory Fragmentation
INFO memory | grep mem_fragmentation_ratio
# mem_fragmentation_ratio: 1.89
# = RSS memory / used_memory
# Healthy range: 1.0 – 1.5
# > 1.5: significant fragmentation — consider active defrag or restart
# < 1.0: memory has been swapped to disk — investigate immediately
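The thresholds above translate directly into a small check suitable for a monitoring script. This is a sketch that takes the `used_memory` and `used_memory_rss` values from INFO memory as inputs; `assess_fragmentation` and its verdict labels are hypothetical. On very small instances the ratio is naturally inflated, so treat the verdict as a hint rather than an alarm.

```python
def assess_fragmentation(used_memory: int, used_memory_rss: int) -> tuple[float, str]:
    """Return (ratio, verdict) using the 1.0 and 1.5 thresholds listed above."""
    if used_memory <= 0:
        raise ValueError("used_memory must be positive")
    ratio = used_memory_rss / used_memory
    if ratio < 1.0:
        verdict = "swapping"    # RSS below logical usage: pages were swapped out
    elif ratio <= 1.5:
        verdict = "healthy"
    else:
        verdict = "fragmented"  # consider active defrag or a restart
    return round(ratio, 2), verdict
```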
7.1 Active Defragmentation (Redis 4.0+)
CONFIG SET activedefrag yes
CONFIG SET active-defrag-ignore-bytes 100mb # start only if waste > 100 MB
CONFIG SET active-defrag-threshold-lower 10 # start defrag above 10% fragmentation
CONFIG SET active-defrag-threshold-upper 100 # run at full speed above 100%
CONFIG SET active-defrag-cycle-min 1 # minimum CPU % allocated to defrag
CONFIG SET active-defrag-cycle-max 25 # maximum CPU % allocated to defrag
# One-shot manual purge: asks the allocator (jemalloc only) to return dirty pages to the OS; it does not defragment
MEMORY PURGE
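How the six settings interact is easier to see as arithmetic. The following is a rough model based on the parameter descriptions above: defrag stays idle until both the byte and percentage thresholds are met, then scales CPU effort linearly between cycle-min and cycle-max. It is an approximation for intuition only; the real scheduler measures allocator fragmentation and may behave differently between Redis versions. `defrag_cpu_percent` is a hypothetical name.

```python
def defrag_cpu_percent(frag_pct, frag_bytes, *, ignore_bytes=100 * 1024 * 1024,
                       threshold_lower=10, threshold_upper=100,
                       cycle_min=1, cycle_max=25):
    """Approximate CPU share active defrag would target; 0.0 means it stays idle."""
    if frag_bytes < ignore_bytes or frag_pct < threshold_lower:
        return 0.0
    span = threshold_upper - threshold_lower
    # Linear interpolation between the lower and upper thresholds, capped at 100%.
    fraction = min(1.0, (frag_pct - threshold_lower) / span) if span > 0 else 1.0
    return cycle_min + fraction * (cycle_max - cycle_min)
```

With the values configured above, 55% fragmentation on 200 MB of waste sits halfway between the thresholds and yields a target of 13% CPU.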
7.2 Restart-Based Defragmentation
For severe fragmentation (ratio > 2.0), the fastest remedy is a controlled failover:
- Promote a replica to primary.
- Restart the old primary — RDB reload produces a compact, unfragmented layout.
- Rejoin as a replica.
8. Production Diagnosis Checklist
When Redis response times spike, work through this list in order:
□ 1. SLOWLOG GET 20 — what commands are slow?
□ 2. INFO stats | grep instantaneous — current QPS
□ 3. INFO memory — used_memory, maxmemory, fragmentation
□ 4. INFO clients — connected_clients, blocked_clients
□ 5. redis-cli --bigkeys -i 0.1 — any surprise big keys? (run off-peak)
□ 6. redis-cli --hotkeys — concentrated traffic on a few keys?
□ 7. INFO persistence — bgsave / AOF rewrite in progress?
□ 8. INFO replication — repl_backlog overflow?
□ 9. top -p $(pgrep redis-server) — CPU saturation?
□ 10. ss -tnp | grep 6379 | wc -l — TCP connection count normal?
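The server-side items of the checklist can be gathered in one pass so that the numbers come from the same moment. This is a minimal sketch covering items 1 to 4 and 7, assuming a redis-py-like client with `info(section)` and `slowlog_get(count)`; `collect_snapshot` and the result keys are hypothetical names.

```python
def collect_snapshot(client, slowlog_count=20):
    """Collect checklist items 1 to 4 and 7 into a single dict."""
    stats = client.info("stats")
    memory = client.info("memory")
    clients = client.info("clients")
    persistence = client.info("persistence")
    return {
        "slow_commands": len(client.slowlog_get(slowlog_count)),
        "ops_per_sec": stats.get("instantaneous_ops_per_sec"),
        "used_memory": memory.get("used_memory"),
        "maxmemory": memory.get("maxmemory"),
        "fragmentation_ratio": memory.get("mem_fragmentation_ratio"),
        "connected_clients": clients.get("connected_clients"),
        "blocked_clients": clients.get("blocked_clients"),
        "bgsave_in_progress": bool(persistence.get("rdb_bgsave_in_progress")),
        "aof_rewrite_in_progress": bool(persistence.get("aof_rewrite_in_progress")),
    }
```

Logging this dict at the start of an incident gives a baseline to compare against once mitigation begins.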
Chapter Summary
- SLOWLOG is the first tool for diagnosing slow commands; set the threshold to 5–10 ms in production.
- KEYS * must be completely eliminated, from application code and from DBA tooling alike.
- Delete large keys with UNLINK; never use DEL on a key of unknown size.
- redis-cli --bigkeys and --hotkeys are routine inspection tools; schedule them weekly during low-traffic hours.
- Refactor big Hashes and Sets using field-hash sharding with parallel Pipeline reads.
- Memory fragmentation above 1.5: enable activedefrag, or plan a failover-then-restart sequence during a maintenance window.