Chapter 30: Key Design and Data Modeling — Avoiding Every Common Pitfall
30.1 Why Key Design Matters
In Redis, a key is not just an identifier. Its length affects memory consumption; its structure affects readability and operability; its slot determines which cluster node owns it; its naming convention determines whether you can efficiently scan, monitor, and maintain the system at scale.
Poor key design is insidious: it looks harmless at 10K keys and becomes catastrophic at 10 billion. This chapter covers five dimensions of production key design: naming, serialization, hot keys, big keys, and TTL strategy.
30.2 Key Naming Conventions
30.2.1 Hierarchical Naming
The broadly accepted convention is domain:entity_type:id[:subfield]:
# Recommended patterns
user:1000:profile # User 1000's profile (Hash)
user:1000:orders # User 1000's order list (List or ZSet)
order:20240501:12345 # Specific order (String or Hash)
cache:product:SKU-9527 # Product cache (String)
lock:payment:order-12345 # Distributed lock (String)
rate:api:user:1000:v2 # Rate limiter counter (String)
leaderboard:game:101:2024W20 # Weekly leaderboard (ZSet)
session:abc123def456 # Session data (Hash)
Separator choice:
- `:` is the de facto Redis community standard. Tools like RedisInsight parse `:` to build a tree view of your keyspace
- `_` is an acceptable alternative in codebases where the colon conflicts with language conventions
- Never mix: pick one separator and enforce it project-wide
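One way to enforce a single separator is to route every key through a small builder. The sketch below is illustrative only: `make_key`, `SEPARATOR`, and the validation rules are hypothetical names chosen for this example, not part of any Redis client library.

```python
SEPARATOR = ":"  # assumed project-wide separator


def make_key(*parts: object) -> str:
    """Join key segments with the project separator, rejecting unsafe segments."""
    if not parts:
        raise ValueError("a key needs at least one segment")
    segments = []
    for part in parts:
        text = str(part)
        if not text:
            raise ValueError("empty key segment")
        # A separator or whitespace inside a segment would break the hierarchy.
        if SEPARATOR in text or any(ch.isspace() for ch in text):
            raise ValueError(f"segment {text!r} contains a separator or whitespace")
        segments.append(text)
    return SEPARATOR.join(segments)


print(make_key("user", 1000, "profile"))  # user:1000:profile
```

Centralizing key construction also gives you one place to add length checks or prefixes later.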
30.2.2 Key Length
# Memory cost of a key in Redis internals
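# - dictEntry struct: 24 bytes on 64-bit builds in the classic layout (typically 32 after allocator rounding)
# - SDS (key string): 3-byte header for keys under 256 bytes (Redis 3.2+) + key content + 1-byte terminator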
# - Total overhead: ~90–200 bytes per key
# Too long — redundant verbosity
user_profile_data_for_the_user_with_id_12345_and_type_premium # 61 chars
# Too short — unreadable
u:1:p # What does this mean without documentation?
# Good — clear hierarchy, reasonable length
user:12345:profile # 18 chars, self-documenting
Guidelines:
- Target < 60 bytes; anything over 100 bytes needs justification
- Don't sacrifice readability for brevity
- At scale: 100 million keys × 90 bytes base overhead = 9 GB just for key metadata. Short keys reduce this meaningfully
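The at-scale arithmetic can be reproduced with a few lines. This is a rough estimate, assuming a flat 90-byte per-key overhead (the figure used in the guideline) plus the key string itself; `keyspace_overhead_bytes` is a hypothetical helper, and real numbers depend on Redis version, allocator, and value encoding.

```python
def keyspace_overhead_bytes(num_keys: int, key_len: int, base_overhead: int = 90) -> int:
    """Rough estimate: per-key bookkeeping plus the bytes of the key string."""
    return num_keys * (base_overhead + key_len)


n = 100_000_000
long_keys = keyspace_overhead_bytes(n, 61)   # a verbose 61-byte key
short_keys = keyspace_overhead_bytes(n, 18)  # e.g. "user:12345:profile"

print(f"base overhead only: {n * 90 / 1e9:.1f} GB")
print(f"with 61-byte keys:  {long_keys / 1e9:.1f} GB")
print(f"with 18-byte keys:  {short_keys / 1e9:.1f} GB")
print(f"saved by shortening: {(long_keys - short_keys) / 1e9:.1f} GB")
```

Under these assumptions, trimming 43 bytes from each key saves about 4.3 GB across 100 million keys.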
30.2.3 Special Characters and Encoding
# Redis keys support arbitrary binary content, but avoid:
# - Spaces: redis-cli will misparse them
# - Newlines: invisible in logs and monitoring tools
# - Control chars: difficult to debug
# Non-ASCII keys (valid, but not recommended)
SET 用户:1000:档案 value # Legal; each CJK character = 3 bytes, making keys longer
# Prefer ASCII with numeric IDs
# Hash tags in Cluster mode
# The substring inside {} determines the hash slot
SET {user:1000}:profile value
SET {user:1000}:session value
# Both keys land on the same slot → can be used in the same Pipeline or MULTI/EXEC
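To check slot placement offline, the hash-slot rule can be sketched in Python. Redis Cluster maps a key to CRC16 (XMODEM variant) modulo 16384, hashing only the first non-empty `{...}` substring if one exists. The function name `key_slot` is this example's own; on a live cluster, `CLUSTER KEYSLOT` gives the authoritative answer.

```python
from binascii import crc_hqx

HASH_SLOTS = 16384


def key_slot(key: str) -> int:
    """Compute the Redis Cluster hash slot for a key, honoring hash tags."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag counts; with "{}" the whole key is hashed.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    # crc_hqx with an initial value of 0 is the CRC16/XMODEM variant.
    return crc_hqx(data, 0) % HASH_SLOTS


for k in ("{user:1000}:profile", "{user:1000}:session", "user:1000:profile"):
    print(k, key_slot(k))
```

The first two keys print the same slot because only `user:1000` is hashed; the third is hashed in full and will usually land elsewhere.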
30.2.4 Namespace Planning for Multi-Tenant Redis
When multiple systems share a Redis instance, prefix isolation is mandatory:
ecommerce:product:123 # E-commerce system
crm:user:456 # CRM system
bi:report:cache:20240501 # BI cache
analytics:event:20240501 # Analytics system
Better architecture: separate Redis instances per domain. The databases configuration (0–15) is often misused as namespacing—it provides no performance isolation, no independent maxmemory, and no independent monitoring:
# Avoid using SELECT 1, SELECT 2 for namespace isolation
# Prefer: one Redis instance per major business domain
# Or: Redis Cluster with dedicated key prefix per service
30.3 Serialization Format Selection
30.3.1 Format Comparison
Redis values are byte strings. The application chooses how to serialize objects:
| Format | Size vs. JSON | Speed | Cross-language | Readability | Best For |
|---|---|---|---|---|---|
| JSON | Baseline | Slow (text parse) | Excellent | High | Debugging, small objects, API response caches |
| Protobuf | ~1/3 of JSON | Very fast (binary) | Good (needs IDL) | Low | High-frequency large objects, cross-language microservices |
| MessagePack | ~1/2 of JSON | Fast (binary) | Good (no IDL needed) | Low | General use; drop-in JSON replacement |
| Avro | Similar to Protobuf | Fast | Good (Schema Registry) | Low | Kafka + Redis pipelines |
| Kryo/Hessian | ~1/2 of JSON | Fast | Poor (JVM-only) | Low | Java monolith |
30.3.2 Benchmark: 100-Field User Object
| Format | Payload Size | Serialize (μs) | Deserialize (μs) |
|---|---|---|---|
| JSON | 2.1 KB | 45 | 62 |
| MessagePack | 1.0 KB | 18 | 22 |
| Protobuf | 650 B | 8 | 11 |
| Custom string | 500 B | 3 | 12 |
Takeaway: For objects larger than a few hundred bytes accessed thousands of times per second, Protobuf's bandwidth and CPU savings are meaningful. For objects needing human inspection (debugging, small configs), JSON's readability is genuinely valuable.
30.3.3 Compressing Large Values
For values exceeding 10 KB, compress after serializing:
import gzip, json, redis
r = redis.Redis()
def set_compressed(key: str, obj: dict, ttl: int = 3600):
    """Serialize → gzip compress → store in Redis."""
    serialized = json.dumps(obj).encode('utf-8')
    compressed = gzip.compress(serialized, compresslevel=6)
    r.setex(key, ttl, compressed)
def get_compressed(key: str) -> dict | None:
    """Fetch from Redis → decompress → deserialize."""
    raw = r.get(key)
    if raw is None:
        return None
    return json.loads(gzip.decompress(raw).decode('utf-8'))
# Typical result: 10 KB JSON → 2 KB compressed (5x ratio)
# CPU cost: ~0.3–0.5ms additional latency per operation
Enable compression when:
- Value size > 10 KB and access frequency < 1K QPS (at low request rates the per-operation CPU cost stays affordable)
- Network bandwidth is billed (cloud environments)
- Memory is the bottleneck (compression reduces memory 50–80%)
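Because only some values cross the size threshold, readers need to tell compressed payloads from raw ones. A minimal sketch, assuming a one-byte marker prefix (a hypothetical framing invented for this example, not a Redis feature) and the 10 KB threshold from above:

```python
import gzip
import json

COMPRESS_THRESHOLD = 10 * 1024  # bytes; mirrors the 10 KB guideline
RAW, GZIP = b"\x00", b"\x01"    # 1-byte marker (assumed framing)


def encode_value(obj) -> bytes:
    """Serialize to JSON and gzip only when the payload exceeds the threshold."""
    payload = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    if len(payload) > COMPRESS_THRESHOLD:
        return GZIP + gzip.compress(payload, compresslevel=6)
    return RAW + payload


def decode_value(blob: bytes):
    """Inverse of encode_value: inspect the marker, then decompress if needed."""
    marker, body = blob[:1], blob[1:]
    if marker == GZIP:
        body = gzip.decompress(body)
    elif marker != RAW:
        raise ValueError("unknown payload marker")
    return json.loads(body.decode("utf-8"))


small = encode_value({"id": 1})
large = encode_value({"text": "lorem ipsum " * 5000})
print(small[:1] == RAW, large[:1] == GZIP, len(large))
```

The resulting bytes can be passed straight to `SETEX`; small values skip the compression cost entirely.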
30.4 Hot Keys
30.4.1 Defining a Hot Key
A key becomes a "hot key" when it receives a disproportionate share of total QPS—typically 10–30%+ of a single node's request volume:
Typical hot key scenarios:
- Top trending topic on social media (GET hotSearch:rank:1)
- Flash sale product inventory (GET/DECR stock:product:SKU-9527)
- Homepage configuration (GET config:homepage)
- Global page view counter (INCR global:pageview)
- Popular discount coupon (GET coupon:activity:999)
Consequences:
- Single node CPU hits 100%; other keys on that node suffer latency
- Cluster data skew: hot node handles 10x the load of neighboring nodes
- Network saturation on large-value hot keys under high QPS
30.4.2 Detecting Hot Keys
# Method 1: redis-cli --hotkeys (requires maxmemory-policy = *-lfu)
redis-cli --hotkeys -h redis-host -p 6379
# Output:
# hot key found with counter: 9842 keyname: hotSearch:rank:1
# hot key found with counter: 7234 keyname: stock:product:SKU-9527
# Method 2: MONITOR sampling (use sparingly — halves server throughput)
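# MONITOR lines look like: 1716000000.123456 [0 10.0.0.5:51234] "GET" "some:key"
# so field 4 is the quoted command and field 5 is the quoted key
redis-cli monitor | head -5000 \
| awk 'tolower($4) == "\"get\"" {print $5}' | tr -d '"' \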
| sort | uniq -c | sort -rn | head -20
# Method 3: LFU frequency counter (Redis 4.0+ with LFU policy)
OBJECT FREQ hotSearch:rank:1 # Returns LFU access frequency estimate
# Method 4: Client-side instrumentation (recommended for production)
# Intercept all Redis calls in your client wrapper, record key access counts,
# export to Prometheus/Grafana without impacting Redis performance
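Method 4 can be sketched as a thin wrapper around the client. The class below is a hypothetical illustration: `KeyAccessCounter`, the list of counted commands, and the 10% share threshold are assumptions, and the export to a metrics system is left out. It works with any object exposing redis-py-style methods whose first argument is the key.

```python
import threading
from collections import Counter


class KeyAccessCounter:
    """Wraps a Redis-like client and counts accesses per key for selected commands."""

    def __init__(self, client, commands=("get", "set", "incr", "decr", "hgetall")):
        self._client = client
        self._commands = set(commands)
        self._counts = Counter()
        self._lock = threading.Lock()

    def __getattr__(self, name):
        # Only called for attributes not found on the wrapper itself.
        attr = getattr(self._client, name)
        if name not in self._commands:
            return attr

        def counted(key, *args, **kwargs):
            with self._lock:
                self._counts[key] += 1
            return attr(key, *args, **kwargs)

        return counted

    def hot_keys(self, min_share: float = 0.10):
        """Return (key, share) pairs whose share of counted calls is >= min_share."""
        with self._lock:
            total = sum(self._counts.values())
            snapshot = self._counts.most_common()
        if total == 0:
            return []
        return [(k, c / total) for k, c in snapshot if c / total >= min_share]
```

In an application you would wrap the real client once, for example `client = KeyAccessCounter(redis.Redis())`, and periodically publish `client.hot_keys()` to your metrics pipeline.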
30.4.3 Hot Key Solutions
Solution 1: Local (L1) Cache
from cachetools import TTLCache
import threading
import redis
r = redis.Redis()
_local = TTLCache(maxsize=500, ttl=3)  # 3-second local cache
_lock = threading.Lock()  # TTLCache is not thread-safe; guard reads as well as writes
def get_with_local_cache(key: str):
    """Check local cache first, fall back to Redis."""
    with _lock:
        value = _local.get(key)
    if value is not None:
        return value
    value = r.get(key)
    if value is not None:
        with _lock:
            _local[key] = value
    return value
def invalidate_local(key: str):
    """Call on writes to keep local cache fresh."""
    with _lock:
        _local.pop(key, None)
Pros: zero Redis traffic for hot reads, sub-0.1 ms latency.
Cons: reads can be stale for up to the TTL window, and invalidate_local only clears the cache of the instance that performed the write; other app instances stay stale until their entries expire.
Solution 2: Key Sharding (Read Replicas in Key Form)
import random, redis
r = redis.Redis()
SHARD_COUNT = 10
def read_sharded(base_key: str) -> bytes | None:
    """Read from a random shard; distributes hot reads across 10 keys."""
    shard_idx = random.randrange(SHARD_COUNT)
    return r.get(f"{base_key}:shard:{shard_idx}")
def write_sharded(base_key: str, value: str, ttl: int = 60):
    """Write to ALL shards to keep them consistent."""
    pipe = r.pipeline()
    for i in range(SHARD_COUNT):
        pipe.setex(f"{base_key}:shard:{i}", ttl, value)
    pipe.execute()
# Usage:
# Write (updates all 10 copies)
write_sharded("hotSearch:rank:1", "Redis 8.0 Released")
# Read (randomly picks 1 of 10, so QPS is distributed across 10 keys/nodes)
val = read_sharded("hotSearch:rank:1")
Solution 3: Read-Write Splitting
from redis.sentinel import Sentinel
sentinel = Sentinel(
    [('sentinel1', 26379), ('sentinel2', 26379), ('sentinel3', 26379)],
    socket_timeout=0.1
)
master = sentinel.master_for('mymaster', socket_timeout=0.1)
replica = sentinel.slave_for('mymaster', socket_timeout=0.1)
# Writes go to primary
master.set('hotkey', 'value')
# Reads are spread across replicas; replication is asynchronous, so a read may briefly return stale data
val = replica.get('hotkey')
30.5 Big Keys
30.5.1 Defining a Big Key
Commonly used rule-of-thumb thresholds (tune them to your latency budget and value sizes):
| Data Type | Big Key Threshold |
|---|---|
| String | > 10 KB |
| Hash | > 5,000 fields |
| List | > 5,000 elements |
| Set | > 5,000 members |
| ZSet | > 5,000 members |
| Stream | > 10,000 entries |
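The table translates directly into a lookup that a scheduled audit job could apply. This is a sketch: `BIG_KEY_THRESHOLDS` and `is_big_key` are hypothetical names, and the size argument is assumed to come from the matching length command (`STRLEN`, `HLEN`, `LLEN`, `SCARD`, `ZCARD`, `XLEN`).

```python
BIG_KEY_THRESHOLDS = {
    "string": 10 * 1024,  # bytes
    "hash": 5_000,        # fields
    "list": 5_000,        # elements
    "set": 5_000,         # members
    "zset": 5_000,        # members
    "stream": 10_000,     # entries
}


def is_big_key(key_type: str, size: int) -> bool:
    """size is a byte count for strings and an element count for every other type."""
    try:
        return size > BIG_KEY_THRESHOLDS[key_type]
    except KeyError:
        raise ValueError(f"unsupported type: {key_type}") from None


print(is_big_key("hash", 125_432))  # True
print(is_big_key("string", 2_048))  # False
```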
30.5.2 The Damage Big Keys Cause
Slow network transfers:
Fetching a 1 MB String value:
- 10 Gbps intranet: ~0.8 ms network time
- 1 Gbps intranet: ~8 ms network time
- Client deserialize: +1–5 ms
Compare: a 100-byte GET typically completes in 0.1 ms
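The network figures follow from size divided by link speed. A small sketch of that arithmetic, assuming 1 MB = 10^6 bytes and ignoring round-trip latency and protocol overhead (`transfer_ms` is a name made up for this example):

```python
def transfer_ms(size_bytes: int, link_gbps: float) -> float:
    """Time on the wire in milliseconds, ignoring latency and protocol overhead."""
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9) * 1000


for gbps in (10, 1):
    print(f"1 MB over {gbps} Gbps: {transfer_ms(1_000_000, gbps):.1f} ms")
```

At 10,000 such reads per second, a 1 MB value would need 80 Gbps, which is why large hot values saturate the network long before the CPU.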
Main thread blocking during deletion:
# DEL on a Hash with 100,000 fields can block for tens of milliseconds
DEL big_hash # Blocks main thread during memory release!
# Use UNLINK instead — returns immediately, releases memory in background thread
UNLINK big_hash
# Same applies to flush operations
FLUSHDB ASYNC
FLUSHALL ASYNC
Cluster data skew: a 10 MB key on one node causes that node's memory, CPU, and network utilization to far exceed its neighbors, breaking load balance.
RDB and AOF impact: large keys serialize slowly during RDB snapshots, extending the fork copy-on-write window and increasing memory pressure.
30.5.3 Detecting Big Keys
# Method 1: redis-cli --bigkeys (uses SCAN, non-blocking)
redis-cli --bigkeys -h redis-host -p 6379
# Output:
# Biggest string found 'user:1000:bio' has 52428 bytes
# Biggest hash found 'user:events:all' has 125432 fields
# Method 2: Check a specific key's memory footprint
redis-cli MEMORY USAGE user:1000:bio # Returns bytes
redis-cli DEBUG OBJECT user:1000:bio # Encoding, serialization length
# Method 3: RDB offline analysis
# rdb-tools (Python)
pip install rdbtools
rdb --command memory dump.rdb | sort -t, -k4 -rn | head -20
# redis-rdb-cli (Java, richer output)
rct -c memory -s /var/lib/redis/dump.rdb -o big_keys_report.csv -t string,hash,list,set,zset
30.5.4 Big Key Remediation Strategies
String — large values:
import json, redis, boto3
r = redis.Redis()
s3 = boto3.client('s3')
OFFLOAD_THRESHOLD = 100 * 1024  # bytes; matches the 100 KB offload rule in the chapter checklist
def store_large_content(content_id: str, content: dict, ttl: int = 3600):
    content_json = json.dumps(content)
    # Measure encoded bytes, not characters: non-ASCII text is larger on the wire
    if len(content_json.encode('utf-8')) > OFFLOAD_THRESHOLD:
        s3_key = f"content/{content_id}.json"
        s3.put_object(Bucket='my-cache-bucket', Key=s3_key, Body=content_json)
        # Store only the reference URL in Redis
        r.setex(f"article:{content_id}:content", ttl,
                f"s3://my-cache-bucket/{s3_key}")
    else:
        # Small enough to store directly
        r.setex(f"article:{content_id}:content", ttl, content_json)
Hash — too many fields:
# Before: single Hash with 500+ fields
# HSET user:1000 name Alice age 30 email ... [500 fields]
# After: split by field category
HSET user:1000:basic name Alice age 30 gender F
HSET user:1000:contact email alice@example.com phone 13800138000
HSET user:1000:prefs language zh timezone Asia/Shanghai theme dark
HSET user:1000:stats login_count 150 last_login_ts 1716000000
# Benefit: most operations only need one sub-Hash
HGETALL user:1000:basic # Fast — only 3-5 fields
HGET user:1000:contact email # Even faster — one field
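Once a Hash is split, application code needs a consistent way to find the sub-Hash that owns a field. A minimal sketch, assuming the four categories shown above; `FIELD_GROUPS`, `sub_hash_key`, and `group_fields` are hypothetical helpers, not part of any client library.

```python
FIELD_GROUPS = {
    "basic": {"name", "age", "gender"},
    "contact": {"email", "phone"},
    "prefs": {"language", "timezone", "theme"},
    "stats": {"login_count", "last_login_ts"},
}
# Reverse index: field name -> category
_FIELD_TO_GROUP = {f: g for g, fields in FIELD_GROUPS.items() for f in fields}


def sub_hash_key(user_id: int, field: str) -> str:
    """Return the key of the sub-Hash that stores the given field."""
    group = _FIELD_TO_GROUP.get(field)
    if group is None:
        raise KeyError(f"no category registered for field {field!r}")
    return f"user:{user_id}:{group}"


def group_fields(user_id: int, values: dict) -> dict:
    """Bucket a flat field -> value mapping by destination sub-Hash key."""
    grouped = {}
    for field, value in values.items():
        grouped.setdefault(sub_hash_key(user_id, field), {})[field] = value
    return grouped


print(group_fields(1000, {"name": "Alice", "email": "alice@example.com", "theme": "dark"}))
```

Each entry of the result can then be written with one `HSET key field value [field value ...]` call, ideally batched in a pipeline.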
List/ZSet — too many elements:
from datetime import datetime
import redis
r = redis.Redis()
# Time-based sharding: split history ZSet by month
def add_user_event(user_id: int, event: str, score: float):
    month = datetime.now().strftime("%Y%m")
    key = f"user:{user_id}:events:{month}"
    r.zadd(key, {event: score})
    r.expire(key, 86400 * 90)  # Keep for 90 days after the last write
def get_user_events(user_id: int, months: int = 3):
    now = datetime.now()
    year, month = now.year, now.month
    results = []
    for _ in range(months):
        key = f"user:{user_id}:events:{year}{month:02d}"
        results.extend(r.zrevrangebyscore(key, '+inf', '-inf', withscores=True))
        # Step back one calendar month; fixed 30-day steps can skip or repeat a month
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return sorted(results, key=lambda x: x[1], reverse=True)[:1000]
# Fixed-size capped list: keep only the most recent N items
def append_capped(key: str, value: str, max_size: int = 1000):
    pipe = r.pipeline()
    pipe.lpush(key, value)
    pipe.ltrim(key, 0, max_size - 1)
    pipe.execute()
30.6 TTL Design Principles
30.6.1 Why Cache Keys Must Have TTLs
Without TTLs, memory grows unboundedly until maxmemory is hit. If eviction policy is noeviction, Redis starts rejecting writes—a production outage. Even with LRU/LFU eviction, keys without TTLs compete unfairly with keys that have natural expiry.
Rule: every cache key must have a TTL. Even "permanent" data should refresh periodically.
30.6.2 Preventing Cache Stampede with TTL Jitter
When many keys expire simultaneously (common after a cache warm-up or deployment), all requests hit the database at once—a "thundering herd" or cache stampede:
import random, redis
r = redis.Redis()
def set_with_jitter(key: str, value, base_ttl: int = 3600):
    """Add random jitter to TTL to prevent synchronized expiry."""
    jitter = random.randint(0, base_ttl // 10)  # adds 0 to +10% of the base TTL
    actual_ttl = base_ttl + jitter
    r.setex(key, actual_ttl, value)
# 1000 keys with base TTL 3600s will now expire
# spread across 3600–3960s, so there is no synchronized cache-miss spike
# Symmetric variant: base ± 10% (a 20% total spread)
def set_with_wide_jitter(key: str, value, base_ttl: int = 3600):
    spread = base_ttl // 5  # total width of the window: 20% of base
    half = spread // 2
    actual_ttl = base_ttl + random.randint(-half, half)
    r.setex(key, max(1, actual_ttl), value)
30.6.3 Aligning TTL with Data Volatility
# Rule: Cache TTL must be shorter than the meaningful freshness window,
# but long enough to provide a cache hit rate worth having.
# Anti-pattern: product price cached for 1 second
# (prices change hourly, so almost every refetch returns an unchanged value)
r.setex("product:price:123", 1, "99.9") # Pointless!
# Correct: align TTL to actual change frequency
import fnmatch
# Glob patterns, checked in insertion order; the first match wins
TTL_POLICY = {
    "user:*:profile": 86400,    # Changes rarely: 1 day
    "session:*": 7200,          # Session timeout: 2 hours
    "product:detail:*": 300,    # Updated every few minutes: 5 min
    "product:stock:*": 30,      # Volatile: 30 seconds
    "config:homepage": 600,     # Config reloads: 10 minutes
    "rate:api:*": 60,           # Rate limiter window: 1 minute
    "lock:*": 30,               # Lock timeout + safety margin
}
def set_with_policy(key: str, value):
    for pattern, ttl in TTL_POLICY.items():
        if fnmatch.fnmatchcase(key, pattern):
            r.setex(key, ttl, value)
            return
    r.setex(key, 300, value)  # Default: 5 minutes
30.6.4 Sliding Window TTL and Background Refresh
import threading
def get_sliding_ttl(key: str, ttl: int = 3600):
    """Reset TTL on each access; useful for session-like data."""
    pipe = r.pipeline()
    pipe.get(key)
    pipe.expire(key, ttl)  # Refresh TTL on read
    value, _ = pipe.execute()
    return value
class ProactiveCache:
    """Refresh cache before it expires to avoid a miss on expiry."""
    def __init__(self, refresh_fn, base_ttl: int = 3600, refresh_threshold: int = 60):
        self.refresh_fn = refresh_fn
        self.base_ttl = base_ttl
        self.refresh_threshold = refresh_threshold
        self._refreshing = set()
        self._lock = threading.Lock()  # Guards _refreshing across request threads
    def get(self, key: str):
        # Fetch value and TTL in one round trip
        pipe = r.pipeline()
        pipe.get(key)
        pipe.ttl(key)
        value, remaining = pipe.execute()
        # -1 means no expiry; -2 (key missing) also passes this check and triggers a refresh
        if remaining != -1 and remaining < self.refresh_threshold:
            with self._lock:
                already_running = key in self._refreshing
                self._refreshing.add(key)
            if not already_running:
                thread = threading.Thread(
                    target=self._background_refresh,
                    args=(key,),
                    daemon=True
                )
                thread.start()
        return value
    def _background_refresh(self, key: str):
        try:
            new_value = self.refresh_fn(key)
            r.setex(key, self.base_ttl, new_value)
        finally:
            with self._lock:
                self._refreshing.discard(key)
30.7 Data Modeling — Translating RDBMS Patterns to Redis
30.7.1 Common Relational Pattern Translations
One-to-many (User → Orders):
# RDBMS: orders table with user_id foreign key + query by user_id
# Redis Option A: ZSet (score = timestamp → range queries by time)
ZADD user:1000:orders 1716000000 "order:20240501:12345"
ZADD user:1000:orders 1716003600 "order:20240501:12346"
# Query: all orders in a time range
ZRANGEBYSCORE user:1000:orders 1716000000 1716086400
# Redis Option B: List (insertion order → fast access to recent N)
LPUSH user:1000:orders "order:20240501:12346"
LTRIM user:1000:orders 0 999 # Keep latest 1000
# Fetch recent 10
LRANGE user:1000:orders 0 9
Many-to-many (Users ↔ Tags):
# Bidirectional Sets: tag→users and user→tags
SADD tag:redis:users "user:1000" "user:1001" "user:1002"
SADD tag:python:users "user:1001" "user:1003"
SADD user:1000:tags "redis" "distributed"
SADD user:1001:tags "redis" "python"
# Find users with BOTH redis AND python tags (set intersection)
SINTERSTORE result:redis_and_python tag:redis:users tag:python:users
SMEMBERS result:redis_and_python # → user:1001
# Find users with redis OR python (set union)
SUNIONSTORE result:redis_or_python tag:redis:users tag:python:users
SCARD result:redis_or_python # Count of union
Sorted pagination:
# ZSet provides natural ordered pagination
ZADD articles 1716000000 "article:1"
ZADD articles 1716003600 "article:2"
ZADD articles 1716007200 "article:3"
# Page 1 (newest first): items 1–10
ZREVRANGEBYSCORE articles +inf -inf LIMIT 0 10
# Page 2: items 11–20
ZREVRANGEBYSCORE articles +inf -inf LIMIT 10 10
# Redis 6.2+ unified ZRANGE syntax
ZRANGE articles 0 9 REV # Newest 10
ZRANGE articles "(last_seen_score" "-inf" BYSCORE LIMIT 0 10 REV # Cursor-based
30.7.2 Eliminating N+1 Queries
import json
import redis
r = redis.Redis(decode_responses=True)  # Return str instead of bytes
# WRONG: N+1 pattern (each order is a separate round trip)
# List entries are full order keys, e.g. "order:20240501:12346"
order_keys = r.lrange("user:1000:orders", 0, 49)
orders = []
for order_key in order_keys:
    orders.append(r.hgetall(order_key))  # 50 round trips!
# CORRECT: Pipeline batches all requests into one network round trip
order_keys = r.lrange("user:1000:orders", 0, 49)
pipe = r.pipeline()
for order_key in order_keys:
    pipe.hgetall(order_key)
orders = pipe.execute()  # 1 round trip for 50 orders!
# Even more efficient with MGET for String values
product_ids = ["SKU-9527", "SKU-9528"]  # Example IDs
product_keys = [f"product:{pid}" for pid in product_ids]
products_raw = r.mget(product_keys)
products = [json.loads(p) for p in products_raw if p is not None]
30.8 Production Monitoring and Operations
30.8.1 Key Space Monitoring
# Keyspace overview — O(1)
redis-cli INFO keyspace
# db0:keys=1500000,expires=800000,avg_ttl=3541000
# db1:keys=50000,expires=50000,avg_ttl=7200000
# Memory overview
redis-cli INFO memory
# used_memory_human: 2.50G
# mem_fragmentation_ratio: 1.15 ← ideal: 1.0–1.5
# maxmemory_human: 8.00G
# Total key count — O(1)
redis-cli DBSIZE
# Count keys matching a pattern — uses SCAN (non-blocking)
redis-cli --scan --pattern "user:*" | wc -l
redis-cli --scan --pattern "cache:*" | wc -l
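The `INFO keyspace` line is easy to turn into a TTL-coverage metric. A small parsing sketch, assuming the `dbN:name=value,...` format shown above with integer values; `parse_keyspace_line` and the derived fields `no_ttl` and `ttl_coverage` are names invented for this example.

```python
def parse_keyspace_line(line: str) -> dict:
    """Parse one 'dbN:keys=...,expires=...,avg_ttl=...' line from INFO keyspace."""
    db, _, fields = line.strip().partition(":")
    stats = {}
    for pair in fields.split(","):
        name, _, value = pair.partition("=")
        stats[name] = int(value)
    stats["db"] = db
    # Keys that will never expire on their own
    stats["no_ttl"] = stats["keys"] - stats["expires"]
    stats["ttl_coverage"] = stats["expires"] / stats["keys"] if stats["keys"] else 1.0
    return stats


info = parse_keyspace_line("db0:keys=1500000,expires=800000,avg_ttl=3541000")
print(info["no_ttl"], f"{info['ttl_coverage']:.0%}")
```

For the sample line, 700,000 keys (about 47%) have no TTL, which is worth investigating in a cache-only deployment.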
30.8.2 Safe Key Scanning
# NEVER use KEYS in production — O(N), blocks all clients
KEYS user:* # DANGEROUS: blocks Redis for seconds on 10M+ keys
# ALWAYS use SCAN — non-blocking, iterative
SCAN 0 MATCH "user:*" COUNT 100
# Returns: [next_cursor, [key1, key2, ...]]
# Repeat until cursor returns 0
# Python: lazy iteration over all matching keys
for key in r.scan_iter("user:*", count=100):
    process(key)  # COUNT is a hint for keys examined per call; a batch may return fewer matches
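A common use of SCAN is bulk cleanup by prefix. The sketch below combines `scan_iter` with batched `UNLINK` calls so no single command touches more than `batch_size` keys. It assumes a redis-py-style client on a single instance (multi-key `UNLINK` across cluster slots needs extra care); `unlink_by_pattern` is a hypothetical helper.

```python
def unlink_by_pattern(client, pattern: str, batch_size: int = 500) -> int:
    """Delete matching keys incrementally; returns the number of keys removed."""
    removed = 0
    batch = []
    for key in client.scan_iter(match=pattern, count=batch_size):
        batch.append(key)
        if len(batch) >= batch_size:
            removed += client.unlink(*batch)  # memory is reclaimed in the background
            batch.clear()
    if batch:
        removed += client.unlink(*batch)
    return removed
```

Usage would look like `unlink_by_pattern(redis.Redis(), "cache:tmp:*")`; run it off-peak, since a full SCAN pass still costs CPU proportional to the keyspace size.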
30.8.3 Key Expiry Monitoring
# Monitor expiry events (enable keyspace notifications first)
CONFIG SET notify-keyspace-events Ex
SUBSCRIBE __keyevent@0__:expired
# Check if a key's TTL is set
TTL mykey # Returns -1 (no TTL), -2 (doesn't exist), or seconds remaining
PTTL mykey # Same, in milliseconds
# Find keys WITHOUT a TTL (potential memory leak)
# Note: This is a full scan — only run during off-peak hours
redis-cli --scan | while IFS= read -r key; do
    ttl=$(redis-cli TTL "$key")
    if [ "$ttl" = "-1" ]; then
        echo "No TTL: $key"
    fi
done | head -100
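The shell loop above pays one round trip per key. A Python sketch of the same audit that batches the `TTL` calls through a pipeline follows; it assumes a redis-py-style client, and `keys_without_ttl` with its parameters is a hypothetical helper.

```python
def keys_without_ttl(client, pattern: str = "*", batch_size: int = 500, limit: int = 100):
    """Return up to `limit` keys that exist but have no expiry (TTL == -1)."""
    found = []
    batch = []

    def flush():
        # One round trip for the whole batch; no MULTI/EXEC needed for reads.
        pipe = client.pipeline(transaction=False)
        for key in batch:
            pipe.ttl(key)
        for key, ttl in zip(batch, pipe.execute()):
            if ttl == -1 and len(found) < limit:
                found.append(key)
        batch.clear()

    for key in client.scan_iter(match=pattern, count=batch_size):
        batch.append(key)
        if len(batch) >= batch_size:
            flush()
            if len(found) >= limit:
                return found
    if batch:
        flush()
    return found
```

The same off-peak caveat applies: this is still a full keyspace scan, only with far fewer network round trips.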
30.9 Summary and Production Checklist
Key design and data modeling underpin every production Redis deployment. Here is the distilled checklist from this chapter:
Naming conventions:
- Use the `domain:entity:id[:subfield]` hierarchy with `:` as the separator
- Keep key length under 60 bytes
- Isolate namespaces with prefixes when multiple systems share a Redis instance
- In Cluster mode, use hash tags `{}` to control slot placement for co-located keys
Serialization:
- Use Protobuf or MessagePack for large, high-frequency objects instead of JSON
- Apply gzip compression for values exceeding 10 KB
- Offload values larger than 100 KB to object storage (S3/OSS); store the URL in Redis
Hot keys:
- Monitor key access frequency via LFU counters or client-side instrumentation
- Apply in-process local caching for extreme hot keys (for example cachetools in Python, Caffeine or Guava in Java)
- Shard hot reads across N replica keys to distribute load
- Use read-write splitting to route reads to replicas
Big keys:
- Run `redis-cli --bigkeys` periodically (ideally in CI/CD or scheduled jobs)
- Always use `UNLINK` instead of `DEL` for large keys
- Split large Hashes by field category; split large ZSets by time window
- Move large String values to object storage
TTL design:
- Every cache key must have a TTL — no exceptions
- Add random jitter (±10%) to prevent synchronized cache expiry storms
- Align TTL with actual data change frequency (not too short, not too long)
- Use sliding TTL for session-like data; use proactive background refresh for critical hot keys