Persistence Strategy and Disaster Recovery
Chapter 15: Persistence Strategy Selection and Disaster Recovery
15.1 The Data Safety Matrix
Before choosing a persistence strategy, answer one foundational question: How much data loss is acceptable?
| Strategy | Max data loss | Recovery speed | File size | CPU overhead | Memory overhead |
|---|---|---|---|---|---|
| No persistence | Everything (restart = empty) | N/A | 0 | Minimal | Minimal |
| RDB hourly | Up to 1 hour | Fastest (5–10 min/10 GB) | Smallest | Low (fork) | Low |
| RDB every 5 min | Up to 5 min | Fastest | Small | Medium (frequent forks) | Medium |
| AOF everysec | ~1 second | Slow (replay all commands) | Large (grows until rewritten) | Medium | Medium |
| AOF always | ~1 command | Slowest | Largest | High (fsync blocks) | Medium |
| Mixed persistence | ~1 second | Fast (RDB + small AOF replay) | Medium | Medium | Medium |
| RDB + AOF + replica | Near zero | Fastest (promote replica) | Medium | Higher | Higher |
15.2 Strategy Selection Guide by Use Case
15.2.1 Case 1: Pure Cache (Full Data Loss Acceptable)
# redis.conf
# (redis.conf does not support trailing comments, so each comment gets its own line)
# Disable RDB
save ""
# Disable AOF
appendonly no
# Appropriate for:
# - Session store (users can re-authenticate)
# - Page/API response cache
# - CDN hot data prefetch
# - Temporary computation scratchpad
Benefits: Zero persistence overhead, maximum throughput, no fork() latency spikes.
Important: Disabling persistence on the master does not disable replication. Replicas hold an in-memory copy, but if the master restarts with an empty dataset they will resync from it and discard that copy; see Section 15.6.3 for this dangerous edge case.
Expected throughput gain: roughly 15–20% more ops/s versus mixed persistence, because no command is appended to aof_buf and there is no periodic fork() stall.
15.2.2 Case 2: Cache with Fast Rebuild (Minutes of Loss OK)
save 3600 1
save 300 100
save 60 10000
appendonly no
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir /var/lib/redis
# Appropriate for:
# - Product catalog cache (rebuildable from SQL DB)
# - Leaderboards (minor regression acceptable)
# - Real-time counters (small error margin acceptable)
Recovery time estimates:
Dataset: 10 GB RDB on NVMe SSD
Disk read @ 1 GB/s: 10 seconds
Dict rebuild + pointer init: ~30 seconds
Total restart time: ~40 seconds
Comparison: rebuild from MySQL (10M rows JOIN):
Query execution: 5–20 minutes
Network transfer: additional minutes
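The restart estimate above is plain arithmetic, so it can be re-run for other dataset sizes. The following is a minimal sketch, not a benchmark: the function name is hypothetical, and the 3 s/GB rebuild cost is an assumption derived from the ~30 seconds quoted above for 10 GB.

```python
def estimate_rdb_restart_seconds(dataset_gb: float,
                                 disk_gb_per_s: float = 1.0,
                                 rebuild_s_per_gb: float = 3.0) -> float:
    """Rough restart time: sequential disk read plus in-memory rebuild.

    rebuild_s_per_gb is an assumed constant (30 s / 10 GB from the text);
    measure it on your own hardware before relying on the result.
    """
    if dataset_gb < 0 or disk_gb_per_s <= 0 or rebuild_s_per_gb < 0:
        raise ValueError("sizes and rates must be non-negative, disk rate > 0")
    read_s = dataset_gb / disk_gb_per_s        # time to read the RDB file
    rebuild_s = dataset_gb * rebuild_s_per_gb  # dict rebuild + pointer init
    return read_s + rebuild_s


if __name__ == "__main__":
    # Same inputs as the estimate in the text: 10 GB RDB, 1 GB/s NVMe read
    print(f"Estimated restart: {estimate_rdb_restart_seconds(10):.0f} s")
```

Changing `disk_gb_per_s` to a SATA-class value shows quickly whether the disk read or the in-memory rebuild dominates for a given dataset.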
15.2.3 Case 3: Business Data (No More Than 1 Second of Loss)
# Mixed persistence: recommended default for production
save 3600 1
save 300 100
save 60 10000
appendonly yes
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-use-rdb-preamble yes
# Appropriate for:
# - User points and balances (1-second loss tolerated)
# - Order status cache (non-payment critical path)
# - Message queues (Stream)
# - Inventory with compensating transactions
Why mixed persistence is the sweet spot:
Restart recovery sequence:
1. Load RDB preamble (snapshot at last rewrite): 40 seconds
2. Replay incremental AOF since rewrite: 1–30 seconds (typically)
Total: < 1 minute for 10 GB
versus pure AOF:
Replay entire command history: 10–20 minutes for same dataset
15.2.4 Case 4: Financial / Orders (Near-Zero Loss)
appendonly yes
# fsync after every write
appendfsync always
# Still useful for fast restart
aof-use-rdb-preamble yes
# Optional: disable RDB (AOF is the primary safety net)
save ""
# Combine with replica acknowledgement:
# WAIT is a client command, not a redis.conf directive. Issue it from the application after critical writes:
# WAIT 1 100   (block until at least 1 replica acknowledges, timeout 100 ms)
# Appropriate for:
# - Payment transaction logs
# - Financial ledger balances
# - Auction / flash-sale inventory (strict correctness)
Throughput with appendfsync always:
Hardware: NVMe SSD (Samsung 990 Pro)
fsync latency: ~60–80 µs
Max TPS: 1000 ms / 75 µs ≈ 13,000 TPS
Hardware: SATA SSD
fsync latency: ~200–500 µs
Max TPS: ~2,000–5,000 TPS
Hardware: Spinning HDD
fsync latency: ~5–10 ms
Max TPS: ~100–200 TPS
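The ceilings above come from dividing one second by the fsync latency. A small sketch of that calculation follows; the function name is hypothetical, and `commands_per_fsync` is an assumed knob for the case where several pipelined commands happen to share a single fsync, which would raise the ceiling proportionally.

```python
def max_tps_for_fsync_latency(fsync_latency_us: float,
                              commands_per_fsync: int = 1) -> float:
    """Upper bound on writes per second when fsyncs run strictly one after another."""
    if fsync_latency_us <= 0 or commands_per_fsync < 1:
        raise ValueError("latency must be > 0 and commands_per_fsync >= 1")
    fsyncs_per_second = 1_000_000 / fsync_latency_us  # microseconds in one second
    return fsyncs_per_second * commands_per_fsync


if __name__ == "__main__":
    # Latencies are midpoints of the ranges quoted in the text (assumed values)
    for name, latency_us in [("NVMe SSD", 75), ("SATA SSD", 350), ("Spinning HDD", 7500)]:
        print(f"{name}: ~{max_tps_for_fsync_latency(latency_us):,.0f} TPS")
```

The result is a ceiling, not a forecast: network round trips and command execution time reduce real throughput further.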
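Adding WAIT 1 100 can roughly halve effective write throughput again, because each critical write also pays a replica round trip. When WAIT returns at least 1, a replica has received the write, so it survives a master crash. If WAIT times out it returns fewer acknowledgements, and the application must treat the write as unconfirmed.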
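A minimal sketch of the write-then-WAIT pattern is shown here. It assumes a redis-py style client exposing `set()` and `wait(num_replicas, timeout_ms)`; the function and exception names are hypothetical. Note that the write has already been applied on the master when WAIT times out, so the caller needs a compensation path rather than a simple retry.

```python
class ReplicationNotConfirmed(RuntimeError):
    """Raised when fewer replicas than required acknowledged the write in time."""


def write_with_wait(client, key, value, min_replicas: int = 1, timeout_ms: int = 100) -> int:
    """SET a key, then block until min_replicas acknowledge or timeout_ms elapses."""
    client.set(key, value)
    # WAIT returns the number of replicas that acknowledged all previous writes
    acked = client.wait(min_replicas, timeout_ms)
    if acked < min_replicas:
        raise ReplicationNotConfirmed(
            f"{key!r}: {acked} of {min_replicas} replica(s) acknowledged within {timeout_ms} ms"
        )
    return acked
```

With redis-py this would be called as `write_with_wait(redis.Redis(), "order:42", "paid")`; the key name is only an illustration.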
15.3 Production Backup Architecture
15.3.1 Layered Backup Strategy
Layer 1 - Live AOF (continuous):
  Maximum exposure window: 1 second (with everysec)
  Storage: local disk, same server
  RPO: ~1 second
Layer 2 - Hourly RDB snapshots:
  Triggered: cron at :00 past each hour
  Retained: 7 days local
  RPO: up to 1 hour (if AOF also corrupted)
Layer 3 - Remote object storage:
  Uploaded: immediately after each hourly RDB
  Retained: 30 days (S3/GCS/OSS)
  RPO: same as Layer 2, but geographically redundant
Layer 4 - Cross-datacenter replica:
  Replication lag: typically < 100 ms (intra-region)
  RPO: milliseconds
  RTO: seconds (manual or Sentinel-automated failover)
15.3.2 Production Backup Script
#!/usr/bin/env bash
# /etc/cron.hourly/redis-backup
# Runs as root or redis user with S3 IAM role attached
set -euo pipefail
REDIS_CLI="redis-cli"
REDIS_DATA_DIR="/var/lib/redis"
BACKUP_DIR="/backup/redis"
S3_BUCKET="s3://acme-redis-backups"
HOST=$(hostname -s)
RETAIN_LOCAL_DAYS=7
RETAIN_S3_DAYS=30
ALERT_ENDPOINT="https://hooks.slack.com/services/..."
log() { echo "[$(date -u '+%Y-%m-%dT%H:%M:%SZ')] [INFO] $*" | tee -a /var/log/redis-backup.log; }
warn() { echo "[$(date -u '+%Y-%m-%dT%H:%M:%SZ')] [WARN] $*" | tee -a /var/log/redis-backup.log; }
die() { echo "[$(date -u '+%Y-%m-%dT%H:%M:%SZ')] [ERROR] $*" | tee -a /var/log/redis-backup.log
curl -s -X POST "$ALERT_ENDPOINT" -d "{\"text\": \"Redis backup FAILED on $HOST: $*\"}" || true
exit 1; }
TIMESTAMP=$(date -u '+%Y%m%d_%H%M%S')
DEST_FILE="$BACKUP_DIR/dump_${TIMESTAMP}.rdb"
# Step 1: Trigger background save
log "Triggering BGSAVE on $HOST"
$REDIS_CLI BGSAVE || die "BGSAVE command failed"
# Step 2: Poll until complete (max 10 minutes)
log "Waiting for BGSAVE to complete..."
for i in $(seq 1 120); do
  IN_PROG=$($REDIS_CLI INFO persistence | grep rdb_bgsave_in_progress | awk -F: '{print $2}' | tr -d $'\r')
  STATUS=$($REDIS_CLI INFO persistence | grep rdb_last_bgsave_status | awk -F: '{print $2}' | tr -d $'\r')
  if [ "$IN_PROG" = "0" ]; then
    [ "$STATUS" = "ok" ] || die "BGSAVE failed with status: $STATUS"
    # i-1 sleeps of 5 seconds have elapsed at this point
    log "BGSAVE completed after ~$(( (i - 1) * 5 )) seconds."
    break
  fi
  [ "$i" -eq 120 ] && die "BGSAVE did not complete within 10 minutes"
  sleep 5
done
# Step 3: Copy and verify
mkdir -p "$BACKUP_DIR"
cp "$REDIS_DATA_DIR/dump.rdb" "$DEST_FILE"
SIZE=$(stat -c%s "$DEST_FILE")
log "Copied to $DEST_FILE (${SIZE} bytes)"
redis-check-rdb "$DEST_FILE" > /dev/null 2>&1 || die "RDB integrity check failed for $DEST_FILE"
log "RDB integrity check passed."
# Step 4: Upload to S3 with server-side encryption
S3_URI="${S3_BUCKET}/${HOST}/$(basename "$DEST_FILE")"
aws s3 cp "$DEST_FILE" "$S3_URI" \
  --storage-class STANDARD_IA \
  --server-side-encryption AES256 \
  --only-show-errors
log "Uploaded to $S3_URI"
# Step 5: Prune local backups
find "$BACKUP_DIR" -name "dump_*.rdb" -mtime "+${RETAIN_LOCAL_DAYS}" -print -delete \
  | while IFS= read -r f; do log "Deleted local: $f"; done
# Step 6: Prune S3 backups (keys listed with --recursive are relative to the bucket root)
aws s3 ls "${S3_BUCKET}/${HOST}/" --recursive | awk '{print $4}' | while IFS= read -r key; do
  FILE_DATE=$(basename "$key" | grep -oP '\d{8}' | head -1 || true)
  [ -z "$FILE_DATE" ] && continue
  # Skip keys whose date cannot be parsed; falling back to epoch 0 would make
  # them look decades old and delete them
  FILE_EPOCH=$(date -d "${FILE_DATE}" +%s 2>/dev/null) || { warn "Skipping $key: unparsable date"; continue; }
  AGE=$(( ( $(date +%s) - FILE_EPOCH ) / 86400 ))
  if [ "$AGE" -gt "$RETAIN_S3_DAYS" ]; then
    aws s3 rm "${S3_BUCKET}/${key}"
    log "Deleted S3: $key (${AGE} days old)"
  fi
done
log "Backup cycle completed successfully."
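The retention rule in Steps 5 and 6 is easy to get wrong when a file name does not carry a valid date. The sketch below isolates that decision so it can be tested without touching S3. It assumes the `dump_YYYYMMDD_HHMMSS.rdb` naming used by the script and UTC timestamps; the function names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional

_NAME_RE = re.compile(r"dump_(\d{8}_\d{6})\.rdb$")


def backup_timestamp(name: str) -> Optional[datetime]:
    """Parse the UTC timestamp embedded in a backup name, or None if absent/invalid."""
    match = _NAME_RE.search(name)
    if match is None:
        return None
    try:
        parsed = datetime.strptime(match.group(1), "%Y%m%d_%H%M%S")
    except ValueError:  # e.g. month 13: digits present but not a real date
        return None
    return parsed.replace(tzinfo=timezone.utc)


def expired_backups(names: Iterable[str], now: datetime, retain_days: int) -> List[str]:
    """Return only names that are provably older than the retention window."""
    cutoff = now - timedelta(days=retain_days)
    expired = []
    for name in names:
        taken = backup_timestamp(name)
        if taken is not None and taken < cutoff:  # unparsable names are never deleted
            expired.append(name)
    return expired
```

The design choice is to fail safe: anything that cannot be dated is kept and left for a human to inspect.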
15.3.3 Weekly Backup Validation
#!/usr/bin/env bash
# Run every Monday at 02:00 via cron
set -euo pipefail
ALERT_ENDPOINT="https://hooks.slack.com/services/..."
PREFIX="s3://acme-redis-backups/$(hostname -s)"
LATEST_REMOTE=$(aws s3 ls "${PREFIX}/" | sort | tail -1 | awk '{print $4}')
[ -n "$LATEST_REMOTE" ] || { echo "FAIL: no remote backups found under ${PREFIX}/"; exit 1; }
TMPFILE=$(mktemp /tmp/redis-verify-XXXXXX.rdb)
# Remove the temporary copy on every exit path
trap 'rm -f "$TMPFILE"' EXIT
aws s3 cp "${PREFIX}/${LATEST_REMOTE}" "$TMPFILE" --only-show-errors
if redis-check-rdb "$TMPFILE" > /dev/null 2>&1; then
  echo "PASS: Remote backup ${LATEST_REMOTE} is valid"
else
  echo "FAIL: Remote backup ${LATEST_REMOTE} is corrupted!"
  curl -s -X POST "$ALERT_ENDPOINT" -d "{\"text\": \"Redis remote backup corrupted: ${LATEST_REMOTE}\"}" || true
fi
15.4 Disaster Recovery Playbooks
15.4.1 Playbook 1: Process Crash, Data on Disk
Trigger: redis-server process disappears; clients report connection refused.
Detection:
redis-cli PING # Connection refused
systemctl status redis # Active: failed
Recovery steps:
# 1. Verify data files exist and are intact
ls -la /var/lib/redis/
redis-check-rdb /var/lib/redis/dump.rdb
redis-check-aof /var/lib/redis/appendonly.aof # if AOF enabled
# 2. If AOF is truncated (crash during write), fix it
redis-check-aof --fix /var/lib/redis/appendonly.aof
# 3. Restart. With AOF enabled, Redis loads the AOF (RDB preamble first, then the incremental tail); dump.rdb is loaded only when AOF is disabled
systemctl start redis
systemctl status redis
# 4. Validate data integrity
redis-cli PING # PONG
redis-cli DBSIZE # compare with expected count
redis-cli INFO keyspace # verify per-DB key counts
redis-cli DEBUG SLEEP 0 # quick responsiveness check
Expected recovery time (10 GB, mixed persistence):
RDB load: ~40 seconds
AOF tail replay (typically < 1 second of commands): ~1 second
Total: ~45 seconds from process start to first client response
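Step 4 only makes sense once the dataset has finished loading. The sketch below polls the `loading` field of `INFO persistence` until it reaches 0. It assumes a redis-py style client with an `info(section)` method; the function name, the injectable `sleep` and `clock` parameters, and the broad exception handling are illustrative choices, not a prescribed implementation.

```python
import time


def wait_until_loaded(client, timeout_s: float = 300.0, poll_s: float = 1.0,
                      sleep=time.sleep, clock=time.monotonic) -> bool:
    """Return True once the server reports loading:0, False if timeout_s passes first."""
    deadline = clock() + timeout_s
    while True:
        try:
            info = client.info("persistence")
            if int(info.get("loading", 0)) == 0:
                return True
        except Exception:
            # Broad on purpose for this sketch: the server may still be refusing
            # connections. Narrow this to your client's connection errors.
            pass
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

A caller would run the DBSIZE and keyspace checks from Step 4 only after this returns True.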
15.4.2 Playbook 2: Disk Failure, Restore from Remote Backup
Trigger: Storage array failure, NVMe device failure, or accidental rm.
# 1. Stop Redis if still running
systemctl stop redis || true
# 2. Mount replacement disk or provision new volume
# (OS-level operation, varies by environment)
# 3. List available remote backups
aws s3 ls s3://acme-redis-backups/$(hostname -s)/ | sort | tail -10
# 4. Choose recovery point: latest backup before incident
TARGET="dump_20240115_130000.rdb"
aws s3 cp "s3://acme-redis-backups/$(hostname -s)/${TARGET}" \
/var/lib/redis/dump.rdb
# 5. Set correct ownership and permissions
chown redis:redis /var/lib/redis/dump.rdb
chmod 640 /var/lib/redis/dump.rdb
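# 6. Remove stale AOF (it references data that no longer exists)
rm -f /var/lib/redis/appendonly.aof
rm -rf /var/lib/redis/appendonlydir/
# 7. Verify and start
# Caution: with appendonly yes and no AOF on disk, Redis may start empty and ignore dump.rdb.
# Set appendonly no in redis.conf for this first start, then re-enable AOF at runtime
# and restore appendonly yes in redis.conf afterwards.
redis-check-rdb /var/lib/redis/dump.rdb
systemctl start redis
redis-cli INFO keyspace
redis-cli CONFIG SET appendonly yes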
RTO calculation example:
Scenario: 10 GB RDB in S3 Standard-IA
Download @ 1 Gbps (125 MB/s): ~80 seconds
File copy + permissions: 2 seconds
Redis startup + RDB load: ~45 seconds
Total RTO: ~127 seconds (about 2 minutes)
Optimization strategies:
- Keep a warm standby Redis with replica enabled (RTO: ~5 seconds)
- Use S3 Transfer Acceleration for cross-region recovery
- Pre-stage backups on local disk of standby host
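The RTO line items above can be recomputed for other backup sizes and link speeds. This is a rough sketch under stated assumptions: decimal units (1 Gbps = 125 MB/s, 1 GB = 1000 MB), fixed copy and load times taken from the example, and a hypothetical function name. It ignores operator reaction time and S3 first-byte latency.

```python
def estimate_rto_seconds(backup_gb: float,
                         link_gbps: float = 1.0,
                         copy_s: float = 2.0,
                         redis_load_s: float = 45.0) -> float:
    """Download time plus fixed copy and Redis load times, in seconds."""
    if backup_gb < 0 or link_gbps <= 0:
        raise ValueError("backup_gb must be >= 0 and link_gbps > 0")
    link_mb_per_s = link_gbps * 1000 / 8           # 1 Gbps -> 125 MB/s
    download_s = backup_gb * 1000 / link_mb_per_s  # 10 GB at 125 MB/s -> 80 s
    return download_s + copy_s + redis_load_s


if __name__ == "__main__":
    total = estimate_rto_seconds(10)
    print(f"Estimated RTO: {total:.0f} s ({total / 60:.1f} min)")
```

In practice `redis_load_s` grows with dataset size, so it should be replaced with a measured value for backups much larger than the 10 GB example.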
15.4.3 Playbook 3: Accidental FLUSHALL
Trigger: redis-cli FLUSHALL executed (operator error, runaway script).
Critical first step: Stop Redis immediately with SHUTDOWN NOSAVE to prevent an empty RDB from overwriting the backup.
# IMMEDIATE ACTION: do this within seconds of the accident
redis-cli SHUTDOWN NOSAVE
# NOSAVE ensures no empty RDB is written to disk
Recovery from AOF (if enabled):
# 1. Backup the current AOF before modifying it
cp /var/lib/redis/appendonly.aof /var/lib/redis/appendonly.aof.bak.$(date +%s)
# 2. Remove the FLUSHALL command from the AOF file
python3 << 'PYEOF'
import re, sys
aof_path = '/var/lib/redis/appendonly.aof'
with open(aof_path, 'rb') as f:
    data = f.read()
original_size = len(data)
# Match only complete FLUSHALL commands (case-insensitive), so that a key or
# value that happens to be "FLUSHALL" inside another command is left alone:
# *1\r\n$8\r\nFLUSHALL\r\n
# *2\r\n$8\r\nFLUSHALL\r\n$5\r\nASYNC\r\n (Redis 4.0+ FLUSHALL ASYNC)
# *2\r\n$8\r\nFLUSHALL\r\n$4\r\nSYNC\r\n
pattern = re.compile(
    rb'\*1\r\n\$8\r\nFLUSHALL\r\n'
    rb'|\*2\r\n\$8\r\nFLUSHALL\r\n\$\d+\r\n(?:ASYNC|SYNC)\r\n',
    re.IGNORECASE
)
matches = list(pattern.finditer(data))
if not matches:
    print("ERROR: No FLUSHALL found in AOF. Wrong file?")
    sys.exit(1)
print(f"Found {len(matches)} FLUSHALL command(s):")
for m in matches:
    print(f"  Offset {m.start()}: {m.group()[:60]!r}")
# Remove all occurrences
cleaned = pattern.sub(b'', data)
print(f"Cleaned: {original_size} -> {len(cleaned)} bytes ({original_size - len(cleaned)} bytes removed)")
with open(aof_path, 'wb') as f:
    f.write(cleaned)
print("Done. AOF written.")
PYEOF
# 3. Verify the modified AOF
redis-check-aof /var/lib/redis/appendonly.aof
# Should report: "AOF analyzed: size=N, ok_up_to=N, ok_up_to_line=N"
# 4. Move the empty RDB aside if it exists
# (dump.rdb may have been rewritten empty before SHUTDOWN NOSAVE, for example by an
# automatic save or by FLUSHALL itself when save points are configured)
# With AOF enabled, Redis uses the AOF as its primary source at startup
mv /var/lib/redis/dump.rdb /var/lib/redis/dump.rdb.post-flushall 2>/dev/null || true
# 5. Restart and validate
systemctl start redis
redis-cli DBSIZE # should be > 0 if recovery succeeded
redis-cli RANDOMKEY
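Before editing a real AOF, it helps to rehearse the removal on synthetic data. The sketch below builds a few RESP-encoded commands in memory and strips only standalone FLUSHALL commands. The helper names are hypothetical, and the pattern assumes the plain RESP command format shown in the comments above; it is not a general AOF parser.

```python
import re
from typing import Tuple

# Matches FLUSHALL with no argument, or with a single ASYNC/SYNC argument
FLUSHALL_RE = re.compile(
    rb"\*1\r\n\$8\r\nFLUSHALL\r\n"
    rb"|\*2\r\n\$8\r\nFLUSHALL\r\n\$\d+\r\n(?:ASYNC|SYNC)\r\n",
    re.IGNORECASE,
)


def resp(*args: bytes) -> bytes:
    """Encode one command as a RESP array of bulk strings."""
    out = b"*%d\r\n" % len(args)
    for arg in args:
        out += b"$%d\r\n%s\r\n" % (len(arg), arg)
    return out


def strip_flushall(aof_bytes: bytes) -> Tuple[bytes, int]:
    """Return the data without FLUSHALL commands and the number removed."""
    return FLUSHALL_RE.subn(b"", aof_bytes)


if __name__ == "__main__":
    sample = (resp(b"SET", b"k", b"FLUSHALL")   # value only, must be kept
              + resp(b"flushall")
              + resp(b"FLUSHALL", b"ASYNC")
              + resp(b"SET", b"a", b"1"))
    cleaned, removed = strip_flushall(sample)
    print(f"removed {removed} command(s), {len(sample) - len(cleaned)} bytes")
```

The first SET in the sample is the interesting case: its value is the literal string FLUSHALL, and it has to survive the cleanup untouched.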
Recovery from RDB backup (if AOF not enabled or AOF is corrupted):
# Find most recent pre-incident RDB
ls -lt /backup/redis/dump_*.rdb | head -5
# Choose the latest one that predates the FLUSHALL
cp /backup/redis/dump_20240115_130000.rdb /var/lib/redis/dump.rdb
chown redis:redis /var/lib/redis/dump.rdb
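# If appendonly yes is configured, set it to no for this start so Redis loads dump.rdb rather than the AOF
systemctl start redis
redis-cli DBSIZE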
15.4.4 Playbook 4: Point-in-Time Recovery
Scenario: Application bug deployed at 14:00 wrote corrupted data for 30 minutes. Roll back to 13:55.
# Step 1: Identify candidate backup
ls /backup/redis/ | grep "20240115_1[23]"
# dump_20240115_120000.rdb  <- 12:00 snapshot
# dump_20240115_130000.rdb  <- 13:00 snapshot, best starting point
# Step 2: Spin up a recovery instance (don't touch production yet)
mkdir -p /tmp/redis-recovery
cp /backup/redis/dump_20240115_130000.rdb /tmp/redis-recovery/dump.rdb
redis-server --port 6399 \
--dir /tmp/redis-recovery \
--dbfilename dump.rdb \
--save "" \
--appendonly no \
--daemonize yes \
--logfile /tmp/redis-recovery/redis-recovery.log
redis-cli -p 6399 DBSIZE # confirm data loaded
# Step 3: If the AOF was written with aof-timestamp-enabled yes (Redis 7.0+),
# truncate a COPY of the AOF at 13:55 with redis-check-aof, then load that copy:
# redis-check-aof --truncate-to-timestamp "$(date -u -d '2024-01-15 13:55:00' +%s)" <copied AOF manifest>
# (the example assumes 13:55 UTC; adjust for your time zone)
# This keeps only AOF entries with timestamp <= 13:55
# Step 4 (without timestamps): manual AOF replay up to target time
# Search AOF for approximate position near 13:55 using known key patterns
# that should/shouldn't exist, then truncate at that offset
# Step 5: Validate recovered state
redis-cli -p 6399 RANDOMKEY
redis-cli -p 6399 TYPE <key>
redis-cli -p 6399 DEBUG OBJECT <key>
# Compare with expected state from application logs
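# Step 6: Promote recovery instance to production
# Persist the validated state first: the recovery instance runs with save "" and never snapshots on its own
redis-cli -p 6399 SAVE
redis-cli SHUTDOWN NOSAVE # stop production
cp /tmp/redis-recovery/dump.rdb /var/lib/redis/dump.rdb
chown redis:redis /var/lib/redis/dump.rdb
systemctl start redis
redis-cli DBSIZE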
# Step 7: Clean up recovery instance
redis-cli -p 6399 SHUTDOWN NOSAVE
rm -rf /tmp/redis-recovery
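Step 1 of this playbook, choosing the newest snapshot taken at or before the target time, can be automated. The sketch below assumes the `dump_YYYYMMDD_HHMMSS.rdb` naming used in this chapter and that file names and the target time share one time zone; the function name is hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable, Optional

_NAME_RE = re.compile(r"dump_(\d{8}_\d{6})\.rdb$")


def latest_backup_before(names: Iterable[str], target: datetime) -> Optional[str]:
    """Return the newest backup name with a timestamp <= target, or None."""
    best = None  # (timestamp, name) of the best candidate so far
    for name in names:
        match = _NAME_RE.search(name)
        if match is None:
            continue
        try:
            taken = datetime.strptime(match.group(1), "%Y%m%d_%H%M%S")
        except ValueError:
            continue  # digits present but not a valid date
        if taken <= target and (best is None or taken > best[0]):
            best = (taken, name)
    return best[1] if best else None


if __name__ == "__main__":
    candidates = ["dump_20240115_120000.rdb",
                  "dump_20240115_130000.rdb",
                  "dump_20240115_140000.rdb"]
    print(latest_backup_before(candidates, datetime(2024, 1, 15, 13, 55)))
```

Returning None instead of the oldest file makes a missing pre-incident backup an explicit condition that the operator has to handle.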
15.5 redis-check-rdb and redis-check-aof Reference
15.5.1 redis-check-rdb
# Basic check
redis-check-rdb dump.rdb
# Healthy: "\o/ RDB looks OK!"
# Corrupt: "CRITICAL: RDB CRC error" or "Wrong type ..."
# Show the first 200 lines of diagnostic output
redis-check-rdb dump.rdb 2>&1 | head -200
# Exit codes: 0 = OK, non-zero = corrupted
# Common errors:
| Error message | Cause | Fix |
|---|---|---|
| "Wrong RDB checksum" | Stored CRC64 (last 8 bytes) does not match the file content | Restore from backup |
| "FATAL: short read or OOM ..." | File truncated | Recover from earlier backup |
| "RDB version N is not supported" | Newer Redis wrote it | Upgrade Redis version |
| "DB load failed" | Mid-file corruption | Restore from backup |
| "Unexpected EOF reading..." | Crash during BGSAVE | Use earlier backup or accept partial loss |
15.5.2 redis-check-aof
# Check integrity
redis-check-aof appendonly.aof
# Output includes: "AOF analyzed: size=N, ok_up_to=N, ok_up_to_line=N"
# If ok_up_to < size: file is truncated or damaged; bytes after ok_up_to are incomplete
# Repair: truncate to last complete command
redis-check-aof --fix appendonly.aof
# "Successfully truncated AOF appendonly.aof to offset N"
# Data after offset N is permanently lost
# For Multi-Part AOF
redis-check-aof --fix appendonlydir/appendonly.aof.2.incr.aof
# Check mixed persistence AOF (has RDB preamble)
redis-check-aof appendonly.aof
# Tool auto-detects RDB header and validates both sections
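Because `--fix` discards everything after `ok_up_to`, it is worth computing how many bytes would be lost before running it. The sketch below parses the summary line in the format quoted above. That format is an assumption taken from this section; other Redis versions may print additional or different fields, in which case the function returns None. The function name is hypothetical.

```python
import re
from typing import Optional

_ANALYZED_RE = re.compile(r"AOF analyzed: size=(\d+), ok_up_to=(\d+)")


def bytes_lost_by_fix(check_output: str) -> Optional[int]:
    """Bytes that truncating at ok_up_to would discard, or None if no summary line."""
    match = _ANALYZED_RE.search(check_output)
    if match is None:
        return None
    size, ok_up_to = int(match.group(1)), int(match.group(2))
    return size - ok_up_to


if __name__ == "__main__":
    # Illustrative numbers, not output captured from a real run
    line = "AOF analyzed: size=1048576, ok_up_to=1048000, ok_up_to_line=5210"
    print(f"--fix would discard {bytes_lost_by_fix(line)} bytes")
```

A result of a few hundred bytes usually means one torn command at the tail; a much larger number suggests mid-file damage, where restoring from backup may be the safer route.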
15.6 Master-Replica Persistence Combinations
15.6.1 Recommended: Persistence on Replica, Not Master
# Master (redis-master.conf)
# No scheduled RDB snapshots: avoids periodic BGSAVE fork() spikes (AOF rewrites still fork)
save ""
# AOF for real-time safety
appendonly yes
appendfsync everysec
aof-use-rdb-preamble yes
# Replica (redis-replica.conf)
# Hourly RDB snapshot
save 3600 1
save 300 100
appendonly yes
appendfsync everysec
aof-use-rdb-preamble yes
replicaof 192.168.1.10 6379
Rationale: BGSAVE on the master calls fork(), which can cause latency spikes of 10–200 ms for large datasets (the time the kernel needs to copy the page tables and mark pages copy-on-write). Offloading snapshots to the replica keeps master latency predictable.
15.6.2 Delayed Replica as Accidental Deletion Guard
# Redis has no built-in replication delay setting; the delay must come from outside Redis
replicaof 192.168.1.10 6379
# replica-lazy-flush only frees the old dataset asynchronously during a full resync; it does not delay replication
replica-lazy-flush yes
# External tool: Delphix or custom proxy to delay replay by 30 minutes
A 30-minute delayed replica gives you a 30-minute window after a FLUSHALL to stop the replica before it applies the deletion.
15.6.3 The Deadly Trap: No Persistence + Auto-Restart
Sequence of events:
1. Master has no persistence (save "", appendonly no)
2. Master crashes
3. systemd restarts master (Restart=on-failure)
4. Master starts with empty dataset (no files to load)
5. Replica detects master restarted (new replication ID)
6. Replica initiates full resync with master
7. Replica replaces ALL its data with master's empty dataset
8. BOTH master and replica now have zero data
Prevention:
Option A: Never disable persistence on master (accept slight overhead)
Option B: Set Restart=no in systemd and require manual intervention after a master crash
Option C: Let Sentinel handle failover, so that a replica is promoted to master BEFORE the old master restarts
# Safe systemd unit for Redis master with no persistence
# (systemd does not support trailing comments on a setting line)
[Service]
ExecStart=/usr/bin/redis-server /etc/redis/redis.conf
# DO NOT auto-restart: the operator must intervene
Restart=no
# If Sentinel is controlling this instance, let Sentinel manage failover
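The trap above can also be caught by monitoring before it fires. The sketch below flags a master that has replicas attached but no persistence configured. It assumes a redis-py style client where `info("replication")` and `config_get(name)` return dictionaries of strings; the function name is hypothetical.

```python
def empty_master_risk(client) -> bool:
    """True when this node is a master with replicas attached and no RDB or AOF."""
    replication = client.info("replication")
    if replication.get("role") != "master":
        return False
    if int(replication.get("connected_slaves", 0)) == 0:
        return False  # nothing downstream to wipe
    save = client.config_get("save").get("save", "")
    appendonly = client.config_get("appendonly").get("appendonly", "no")
    # An empty save string means no RDB save points are configured
    return str(save).strip() == "" and str(appendonly) == "no"
```

A check like this belongs next to the service configuration review: if it returns True and the unit file allows automatic restarts, the eight-step sequence above is possible.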
15.7 Persistence Monitoring and Alerting
import redis
import time
import requests

def check_redis_persistence(host='localhost', port=6379,
                            alert_webhook=None):
    r = redis.Redis(host=host, port=port, decode_responses=True)
    info = r.info('persistence')
    alerts = []
    # --- RDB checks ---
    last_save_age = time.time() - info['rdb_last_save_time']
    if last_save_age > 3600:  # no save in 1 hour
        alerts.append(('warn', f"No RDB save in {last_save_age/3600:.1f}h"))
    if info['rdb_last_bgsave_status'] != 'ok':
        alerts.append(('crit', f"BGSAVE failed: {info['rdb_last_bgsave_status']}"))
    if info.get('rdb_current_bgsave_time_sec', -1) > 600:
        t = info['rdb_current_bgsave_time_sec']
        alerts.append(('warn', f"BGSAVE running for {t}s; possible fork() issue"))
    # --- AOF checks ---
    if info.get('aof_enabled') == 1:
        if info.get('aof_last_write_status') != 'ok':
            alerts.append(('crit', "AOF write failed: WRITES BEING REJECTED"))
        cow = info.get('aof_last_cow_size', 0)
        if cow > 1 * 1024**3:  # > 1 GB
            alerts.append(('warn', f"AOF rewrite COW = {cow/1024**2:.0f} MB; check THP"))
        pending = info.get('aof_pending_bio_fsync', 0)
        if pending > 500:
            alerts.append(('warn', f"AOF pending fsync queue: {pending}"))
        delayed = info.get('aof_delayed_fsync', 0)
        if delayed > 0:
            # This counter is cumulative; alert on rate increase
            alerts.append(('info', f"Cumulative delayed fsyncs: {delayed}"))
    # Send alerts
    for level, msg in alerts:
        print(f"[{level.upper()}] {host}:{port} - {msg}")
        if alert_webhook and level in ('warn', 'crit'):
            # Bounded timeout so a slow webhook cannot stall the check
            requests.post(alert_webhook, json={'text': f"Redis {level}: {msg}"}, timeout=5)
    return len([a for a in alerts if a[0] == 'crit']) == 0

# Run every minute from monitoring infrastructure
if not check_redis_persistence(alert_webhook='https://hooks.slack.com/...'):
    print("CRITICAL issues detected; paging on-call")
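Since `aof_delayed_fsync` is cumulative, a useful alert needs the rate of change between samples rather than the raw value. The sketch below keeps the previous sample and returns a per-second rate. The class name is hypothetical, and treating a decreasing counter as a server restart is an assumption.

```python
from typing import Optional


class CounterRate:
    """Turns a cumulative counter into a per-second rate between samples."""

    def __init__(self) -> None:
        self._last_value: Optional[int] = None
        self._last_time: Optional[float] = None

    def update(self, value: int, now: float) -> Optional[float]:
        """Record a sample; return the rate since the previous one, or None."""
        previous_value, previous_time = self._last_value, self._last_time
        self._last_value, self._last_time = value, now
        if previous_value is None or previous_time is None:
            return None  # first sample: nothing to compare against
        if value < previous_value or now <= previous_time:
            return None  # counter reset (e.g. restart) or non-advancing clock
        return (value - previous_value) / (now - previous_time)
```

The monitoring loop would call `update(info['aof_delayed_fsync'], time.time())` once per run and alert only when the returned rate exceeds a threshold chosen for the workload.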
Alerting thresholds summary:
| Metric | Warning | Critical | Action |
|---|---|---|---|
| Last RDB save age | > 1 hour | > 3 hours | Check save config, disk space |
| BGSAVE status | err | err | Check disk space, ulimits |
| BGSAVE duration | > 5 min | > 15 min | Check COW, THP, memory |
| AOF write status | N/A | err | Disk full; CRITICAL |
| AOF rewrite duration | > 5 min | > 20 min | Check disk I/O |
| AOF COW size | > 500 MB | > 2 GB | Disable THP immediately |
| AOF pending fsyncs | > 100 | > 1000 | Disk overloaded |
15.8 Persistence Decision Tree
START: What is your data loss tolerance?
│
├─ FULL (restart = empty is fine)
│   └─▶ NO PERSISTENCE
│        save ""
│        appendonly no
│        [Maximum performance, zero disk overhead]
│
├─ MINUTES (5–60 minutes acceptable)
│   └─▶ RDB ONLY
│        save 3600 1
│        save 300 100
│        save 60 10000
│        appendonly no
│        [Fast restarts, simple operation]
│
├─ SECONDS (~1 second acceptable)  ← MOST PRODUCTION WORKLOADS
│   └─▶ MIXED PERSISTENCE
│        appendonly yes
│        appendfsync everysec
│        aof-use-rdb-preamble yes
│        save 3600 1
│        [Recommended default: fast restart + 1-second RPO]
│
└─ NEAR-ZERO (1 command or 0 acceptable)
    └─▶ AOF ALWAYS + REPLICATION
         appendfsync always
         + WAIT 1 100 on critical writes
         + Sentinel/Redis Cluster for automatic failover
         [Highest durability; ~10K–15K TPS ceiling on NVMe]
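The tree can be expressed as a small lookup for use in provisioning scripts or design reviews. In this sketch the function name is hypothetical, and the numeric cut-offs (300 seconds and 1 second) are assumptions read off the branch labels; the tree itself does not say how to classify values that fall between branches.

```python
from typing import Optional


def choose_strategy(max_loss_seconds: Optional[float]) -> str:
    """Map a tolerated data-loss window to one of the four branches of the tree.

    None means that losing everything on restart is acceptable.
    """
    if max_loss_seconds is None:
        return "no persistence"
    if max_loss_seconds < 0:
        raise ValueError("max_loss_seconds must be >= 0 or None")
    if max_loss_seconds >= 300:   # MINUTES branch: 5 minutes or more
        return "RDB only"
    if max_loss_seconds >= 1:     # SECONDS branch: about 1 second
        return "mixed persistence"
    return "AOF always + replication"  # NEAR-ZERO branch


if __name__ == "__main__":
    for tolerance in (None, 3600, 1, 0):
        print(tolerance, "->", choose_strategy(tolerance))
```

Tolerances between one second and five minutes are mapped to mixed persistence here, on the reasoning that the stricter branch is the safer default.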