Chapter 42: Redis in Kubernetes: StatefulSet and Persistent Storage
Running Redis on Kubernetes is now a common production pattern, yet a fundamental tension exists between Redis's stateful nature and the stateless workloads that Kubernetes's default abstractions assume. This chapter explains why StatefulSet is non-negotiable, compares mainstream deployment options, covers persistent storage configuration, and details the operational nuances and failure-prevention measures specific to the Kubernetes environment.
42.1 Why StatefulSet Is Mandatory
42.1.1 Fatal Flaws of Deployment for Redis
Kubernetes Deployment is designed for stateless applications. Its Pods have the following characteristics:
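- Random Pod names: for example redis-7d4f9b-abc12; every replacement Pod gets a new name
- Unstable network identity: Pod IPs change with every rebuild, and there is no stable per-Pod DNS name
- No per-Pod storage: every replica references the same PVC named in the Pod template; with a ReadWriteOnce volume, replicas scheduled on other nodes cannot mount it, and replicas that do share it write to the same files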
Dependencies of Redis master-replica and Cluster:
- Redis Cluster stores each node's ID in nodes.conf, bound to the persistent data on disk
- Sentinel tracks masters and replicas by address (IP by default, hostname when hostname support is enabled in Redis 6.2 or later); addresses that change on every rebuild confuse Sentinel
- Replication is configured with REPLICAOF <master-host> <master-port> (SLAVEOF before Redis 5); the master host must be a stable DNS name
When Pod names change, node IDs no longer match their data. The cluster enters an inconsistent state requiring manual CLUSTER RESET at minimum, and data loss at worst.
42.1.2 Core Advantages of StatefulSet
StatefulSet is designed for stateful applications:
| Feature | Deployment | StatefulSet |
|---|---|---|
| Pod naming | Random (deploy-pod-xyz) | Ordered (redis-0, redis-1) |
| Network identity | Pod IP (unstable) | Stable DNS name |
| Storage binding | Competes for shared PVC (dangerous) | Independent PVC per Pod (volumeClaimTemplate) |
| Start/stop order | Parallel (random) | Ordered (0→1→2 start; 2→1→0 stop) |
| Rolling update | Parallel replacement | One-at-a-time replacement |
Stable DNS names (with Headless Service):
redis-0.redis-headless.namespace.svc.cluster.local
redis-1.redis-headless.namespace.svc.cluster.local
redis-2.redis-headless.namespace.svc.cluster.local
After a Pod rebuild, the DNS name is unchanged. Other Pods reconnect using the same fixed address.
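Because the naming scheme is mechanical, a client or init script can derive every peer address from the ordinal alone. The following is a minimal Python sketch, assuming the headless Service name from the example above and the default cluster.local DNS domain; the function name is illustrative, not part of any Kubernetes API.

```python
def statefulset_pod_dns(statefulset: str, service: str, namespace: str,
                        replicas: int, cluster_domain: str = "cluster.local") -> list[str]:
    """Return the stable per-Pod DNS names published by a headless Service."""
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


if __name__ == "__main__":
    # "namespace" is the placeholder namespace used in the listing above.
    for name in statefulset_pod_dns("redis", "redis-headless", "namespace", 3):
        print(name)
```

Clusters installed with a custom DNS domain would pass a different cluster_domain value.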
42.2 Redis Operator vs Helm Chart Comparison
42.2.1 Mainstream Option Comparison
| Option | Type | Supported Modes | Maturity | Best For |
|---|---|---|---|---|
| spotahome/redis-operator | Operator | Sentinel | Medium | Simple HA requirements |
| ot-container-kit/redis-operator | Operator | Standalone / Cluster / Sentinel | High | CRD-based management |
| Redis Stack Operator (official) | Operator | Standalone + modules | Lower (newer) | RediSearch / RedisJSON |
| Bitnami Redis Helm Chart | Helm Chart | Standalone / Replication (optionally with Sentinel) | Very high (most popular) | Most production scenarios |
| Bitnami Redis Cluster Helm Chart | Helm Chart | Cluster | Very high | Large-scale Cluster |
Operator vs Helm Chart, the essential difference:
- Operator: Extends the Kubernetes API with CRDs and encodes domain knowledge (automatic failover, scaling); actively manages the deployment lifecycle.
- Helm Chart: Renders and applies Kubernetes manifests; Helm itself does not manage anything after deployment. Failover works only if the chart deploys a component that performs it (such as Sentinel), and scaling or recovery requires manual intervention or scripts.
42.2.2 Full Bitnami Helm Chart Deployment Example
# values.yaml: replication + Sentinel mode
architecture: replication   # standalone | replication (Cluster uses the separate redis-cluster chart)
auth:
  enabled: true
  password: "your-strong-password-here"
  # Production recommendation: read from a K8s Secret
  # (when existingSecret is set, the chart ignores the password value above)
  existingSecret: redis-auth-secret
  existingSecretPasswordKey: redis-password
master:
  count: 1
  persistence:
    enabled: true
    storageClass: "redis-ssd"
    accessModes:
      - ReadWriteOnce
    size: 20Gi
    annotations:
      "helm.sh/resource-policy": keep   # prevent accidental deletion during helm uninstall
  resources:
    requests:
      cpu: "500m"
      memory: "2Gi"
    limits:
      cpu: "2000m"
      memory: "4Gi"
  configuration: |
    maxmemory 3gb
    maxmemory-policy allkeys-lru
    activedefrag yes
    lazyfree-lazy-eviction yes
    lazyfree-lazy-expire yes
replica:
  replicaCount: 2
  persistence:
    enabled: true
    storageClass: "redis-ssd"
    size: 20Gi
  resources:
    requests:
      cpu: "250m"
      memory: "2Gi"
    limits:
      cpu: "1000m"
      memory: "4Gi"
sentinel:
  enabled: true
  masterSet: "mymaster"
  quorum: 2
  downAfterMilliseconds: 5000
  failoverTimeout: 60000
  resources:
    requests:
      cpu: "100m"
      memory: "128Mi"
    limits:
      cpu: "200m"
      memory: "256Mi"
metrics:
  enabled: true   # deploy redis-exporter sidecar
  serviceMonitor:
    enabled: true   # create Prometheus ServiceMonitor
# Note: the chart may expect the two blocks below per role (master.* and replica.*);
# verify the key paths against the values reference of the pinned chart version
podSecurityContext:
  fsGroup: 1001
  runAsUser: 1001
  runAsNonRoot: true
# Anti-affinity: spread Pods across different physical nodes
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/name: redis
        topologyKey: kubernetes.io/hostname
# Install
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install redis bitnami/redis \
  -f values.yaml \
  -n redis \
  --create-namespace \
  --version 18.x.x   # pin the version to prevent automatic upgrades
# Check status
kubectl get pods -n redis -w
kubectl get pvc -n redis
# Connectivity test. With sentinel.enabled=true the chart typically names the Pods
# redis-node-N rather than redis-master-0; use the name that "kubectl get pods" shows
kubectl exec -it redis-master-0 -n redis -- \
  redis-cli -a your-strong-password-here ping
42.3 Persistent Storage: StorageClass Design
42.3.1 StorageClass Selection Principles
| Storage Type | IOPS | Latency | Use Case |
|---|---|---|---|
| AWS gp2 EBS | 3 IOPS/GiB baseline (minimum 100), burst to 3000 | Low | Development / testing |
| AWS gp3 EBS | 3000–16000 (configurable) | Low | Production recommended |
| AWS io2 EBS | Up to 64000 | Very low | High-IOPS production |
| GCP pd-ssd | Up to 30 IOPS/GB | Low | GCP production |
| Local NVMe (local-path) | Extremely high | Extremely low | Max performance, no HA |
Key selection rules:
- reclaimPolicy: Retain: Preserve the underlying storage volume when the PVC is deleted, preventing accidental data loss.
- volumeBindingMode: WaitForFirstConsumer: Delay volume binding until the Pod is scheduled, ensuring the PV is created in the same Availability Zone as the Pod.
- allowVolumeExpansion: true: Always enable volume expansion for production; storage needs grow over time.
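The gap between gp2 and gp3 is largest for the small volumes Redis typically uses, because gp2 baseline IOPS scale with volume size (3 IOPS per GiB, with a floor of 100 and a cap of 16,000) while gp3 starts at 3,000 regardless of size. A minimal Python sketch of that arithmetic, with illustrative function and constant names:

```python
GP3_BASELINE_IOPS = 3_000  # gp3 baseline, independent of volume size


def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline: 3 IOPS per GiB, floored at 100 and capped at 16,000."""
    return max(100, min(16_000, 3 * size_gib))


if __name__ == "__main__":
    for size in (20, 100, 1_000):
        print(f"{size:>5} GiB  gp2 baseline={gp2_baseline_iops(size):>5}  "
              f"gp3 baseline={GP3_BASELINE_IOPS}")
```

For a 20 GiB volume, the sketch shows gp2 sitting at the 100 IOPS floor once burst credits are exhausted, which is why gp3 is the safer default for persistence-heavy Redis workloads.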
42.3.2 AWS gp3 StorageClass Configuration
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: redis-ssd
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "6000"                  # gp3 allows independent IOPS configuration (3000–16000)
  throughput: "250"             # MiB/s (125–1000)
  encrypted: "true"             # encryption at rest
  kmsKeyId: "arn:aws:kms:..."   # customer-managed key
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
42.3.3 VolumeClaimTemplate (StatefulSet-Only Feature)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis-headless
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7.2
          command: ["redis-server", "/etc/redis/redis.conf"]
          ports:
            - containerPort: 6379
          volumeMounts:
            - name: redis-data
              mountPath: /data
            - name: redis-config
              mountPath: /etc/redis
      volumes:
        - name: redis-config
          configMap:
            name: redis-config
  # volumeClaimTemplates: automatically creates an independent PVC for each Pod
  volumeClaimTemplates:
    - metadata:
        name: redis-data
        annotations:
          "helm.sh/resource-policy": keep
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: redis-ssd
        resources:
          requests:
            storage: 20Gi
# Generated PVC names: redis-data-redis-0, redis-data-redis-1, redis-data-redis-2
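The generated PVC names follow the pattern template name, StatefulSet name, ordinal. A minimal Python sketch of that rule (the helper name is illustrative) can be handy in backup or restore scripts that need to locate the claim for a given Pod:

```python
def pvc_names(template: str, statefulset: str, replicas: int) -> list[str]:
    """PVC names created from a volumeClaimTemplate: <template>-<statefulset>-<ordinal>."""
    return [f"{template}-{statefulset}-{ordinal}" for ordinal in range(replicas)]


if __name__ == "__main__":
    print(pvc_names("redis-data", "redis", 3))
```

Because the names are deterministic, a Pod that is rescheduled or scaled back up re-attaches to the claim with the same ordinal instead of starting with an empty volume.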
42.4 Service Design and Client Routing
42.4.1 Three Types of Services and Their Roles
# 1. Headless Service: foundation for StatefulSet stable DNS
apiVersion: v1
kind: Service
metadata:
  name: redis-headless
spec:
  clusterIP: None                  # no ClusterIP = headless
  publishNotReadyAddresses: true   # publish DNS even when Pod is not Ready
  selector:
    app: redis
  ports:
    - name: redis
      port: 6379
      targetPort: 6379
---
# 2. Master Service: write entry point (or unified entry for non-split clients)
apiVersion: v1
kind: Service
metadata:
  name: redis-master
spec:
  selector:
    app: redis
    role: master
  ports:
    - port: 6379
      targetPort: 6379
---
# 3. Replica Service: read entry point (for read-write split clients)
apiVersion: v1
kind: Service
metadata:
  name: redis-replica
spec:
  selector:
    app: redis
    role: replica
  ports:
    - port: 6379
      targetPort: 6379
Sentinel mode caveat: After a failover, the former master Pod becomes a replica, but the Pod's labels are not automatically updated. Use a Sentinel-aware client that connects to Sentinel directly to dynamically discover the current master address, rather than using a static K8s Service label.
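A Sentinel-aware client can be sketched in Python as follows. This is a hedged illustration rather than the chart's prescribed method: it assumes the third-party redis-py package, Sentinel listening on its default port 26379 inside each Pod, a Sentinel password identical to the Redis password, and hypothetical StatefulSet and Service names passed in by the caller.

```python
def sentinel_endpoints(statefulset: str, service: str, namespace: str,
                       replicas: int, port: int = 26379) -> list[tuple[str, int]]:
    """Build (host, port) pairs for every Sentinel from the stable Pod DNS names."""
    return [
        (f"{statefulset}-{i}.{service}.{namespace}.svc.cluster.local", port)
        for i in range(replicas)
    ]


def connect_to_master(endpoints, master_set: str = "mymaster", password=None):
    """Ask Sentinel for the current master and return a client bound to it."""
    # Imported lazily so the address helper above works without redis-py installed.
    from redis.sentinel import Sentinel

    sentinel = Sentinel(
        endpoints,
        socket_timeout=0.5,
        sentinel_kwargs={"password": password},  # assumption: Sentinel shares the password
    )
    # master_for() re-resolves the master after a failover instead of pinning one address.
    return sentinel.master_for(master_set, password=password, socket_timeout=0.5)


if __name__ == "__main__":
    # Hypothetical names; substitute the StatefulSet and Service of your release.
    print(sentinel_endpoints("redis-node", "redis-headless", "redis", 3))
```

The point of the design is that the client never depends on a role label: it queries Sentinel by master set name (mymaster in the values file above) and follows whatever address Sentinel currently reports.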
42.4.2 Service Configuration for Redis Cluster
Redis Cluster requires clients to connect directly to individual nodes (to follow MOVED redirections):
# Cluster mode: this ClusterIP Service is only a bootstrap entry point; every node must
# also stay individually addressable (Pod IP or headless-Service DNS name)
# Client connects to any node for initial handshake and topology discovery
apiVersion: v1
kind: Service
metadata:
  name: redis-cluster-entry
spec:
  selector:
    app: redis-cluster
  ports:
    - port: 6379
      targetPort: 6379
# Do not use LoadBalancer: MOVED redirections require direct node connectivity
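The reason redirections occur is that every key maps to one of 16384 hash slots, and each slot is owned by exactly one master. A minimal Python sketch of the slot calculation (CRC16 of the key, or of its hash tag, modulo 16384) is shown below; the function name is illustrative, and the standard library's binascii.crc_hqx with an initial value of 0 is used as the CRC16 variant.

```python
import binascii

SLOT_COUNT = 16384


def key_slot(key: str) -> int:
    """Compute the Redis Cluster hash slot for a key, honoring {hash tags}."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:  # only a non-empty tag between the braces counts
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % SLOT_COUNT


if __name__ == "__main__":
    for key in ("foo", "{user1000}.following", "{user1000}.followers"):
        print(key, key_slot(key))
```

A client that sends a command to a node that does not own the key's slot receives MOVED with the owning node's address, which is why that address must be reachable directly rather than only through a load-balanced virtual IP.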
Cluster nodes communicate on the Bus port (6379 + 10000 = 16379). Ensure port 16379 is open between all cluster nodes:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: redis-cluster-internal
spec:
  podSelector:
    matchLabels:
      app: redis-cluster
  ingress:
    # Node-to-node traffic only; client Pods need their own rule for port 6379,
    # otherwise this policy blocks them
    - from:
        - podSelector:
            matchLabels:
              app: redis-cluster
      ports:
        - port: 6379
        - port: 16379
42.5 Resource Limits and OOM Prevention
42.5.1 The Three-Layer Memory Model
maxmemory (Redis configuration)
  ↓ × 1.2–1.3
resources.requests.memory (K8s scheduling baseline)
  ↓ × 1.3–1.5
resources.limits.memory (K8s OOM Kill threshold)
Practical example:
- maxmemory 8gb: Redis proactively limits itself to 8 GB
- requests.memory: 10Gi: Reserves headroom for COW (copy-on-write) during BGSAVE / BGREWRITEAOF; parent and child processes share pages, and dirty pages create physical memory copies
- limits.memory: 12Gi: Buffer for memory fragmentation (mem_fragmentation_ratio can reach 1.3–1.5)
Never set limits.memory below maxmemory. maxmemory caps only the dataset: when it is reached, Redis evicts keys or rejects writes, but replica output buffers, allocator fragmentation, and copy-on-write pages during fork all come on top of it. If limits < maxmemory, the kernel can OOM-kill the Redis process during normal operation, before eviction ever starts.
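The sizing arithmetic can be sketched in Python as below. It rests on an assumption that is not spelled out in the text: both multipliers are applied to maxmemory (1.25 for requests, 1.5 for limits), which is the reading that reproduces the 8 / 10 / 12 example. The function name and default factors are illustrative.

```python
import math


def memory_plan(maxmemory_gib: float, request_factor: float = 1.25,
                limit_factor: float = 1.5) -> dict:
    """Derive K8s memory requests/limits from maxmemory (factors are assumptions)."""
    if not 1.0 <= request_factor <= limit_factor:
        raise ValueError("expected 1.0 <= request_factor <= limit_factor")
    return {
        "maxmemory": f"{maxmemory_gib:g}gb",
        # Round up to whole Gi so the headroom is never smaller than planned.
        "requests.memory": f"{math.ceil(maxmemory_gib * request_factor)}Gi",
        "limits.memory": f"{math.ceil(maxmemory_gib * limit_factor)}Gi",
    }


if __name__ == "__main__":
    print(memory_plan(8))
```

Write-heavy workloads dirty more pages during a fork, so they may warrant a larger request_factor than the default used here.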
42.5.2 Linux Kernel Parameter Tuning (via initContainers)
initContainers:
  # 1. Disable Transparent Huge Pages (THP)
  - name: disable-thp
    image: busybox:1.35
    command:
      - /bin/sh
      - -c
      - |
        if [ -f /sys/kernel/mm/transparent_hugepage/enabled ]; then
          echo never > /sys/kernel/mm/transparent_hugepage/enabled
          echo never > /sys/kernel/mm/transparent_hugepage/defrag
        fi
        echo "THP disabled"
    securityContext:
      privileged: true
    volumeMounts:
      - name: sys
        mountPath: /sys
  # 2. Set vm.overcommit_memory (prevent fork failures)
  - name: set-overcommit
    image: busybox:1.35
    command:
      - /bin/sh
      - -c
      - |
        sysctl -w vm.overcommit_memory=1
        sysctl -w net.core.somaxconn=65535
    securityContext:
      privileged: true
volumes:
  - name: sys
    hostPath:
      path: /sys
Why vm.overcommit_memory=1 is required:
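Redis BGSAVE creates a child process via fork(). In Linux's default overcommit mode (0, heuristic), the kernel can refuse the fork when it judges that the child's virtual address space (equal in size to the parent's) cannot be backed by available memory. If Redis uses 8 GB but only 4 GB is free, fork() can fail with "Cannot allocate memory"; the background save is aborted and, with the default stop-writes-on-bgsave-error yes, subsequent write commands are rejected with: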
MISCONF Redis is configured to save RDB snapshots,
but it's currently unable to persist on disk
Setting overcommit_memory=1 allows the kernel to grant virtual memory optimistically; physical pages are only allocated when actually written (COW). Fork succeeds, and only the dirty pages consume extra physical memory.
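Whether the init containers actually took effect can be checked from inside the Pod. The sketch below is a minimal Python illustration with hypothetical function names: it parses the THP file, whose active mode is the bracketed word (for example "always madvise [never]"), and reads the overcommit setting from /proc.

```python
from pathlib import Path


def active_thp_mode(text: str) -> str:
    """Return the bracketed (active) mode from the THP 'enabled' file contents."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {text!r}")


def check_node_settings(sys_root: str = "/sys", proc_root: str = "/proc") -> dict:
    """Report whether THP is disabled and vm.overcommit_memory is 1."""
    thp = Path(sys_root, "kernel/mm/transparent_hugepage/enabled").read_text()
    overcommit = Path(proc_root, "sys/vm/overcommit_memory").read_text().strip()
    return {
        "thp_disabled": active_thp_mode(thp) == "never",
        "overcommit_always": overcommit == "1",
    }


if __name__ == "__main__":
    print(check_node_settings())
```

Running the check as a startup probe or in a smoke test catches nodes where the privileged init containers were blocked by a security policy.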
42.5.3 Pod QoS Class and OOM Kill Priority
# Best practice: set Guaranteed QoS (requests == limits) for Redis Pods
# K8s OOM kill priority: BestEffort > Burstable > Guaranteed (Guaranteed is last)
resources:
  requests:
    memory: "10Gi"
    cpu: "2000m"
  limits:
    memory: "10Gi"   # requests == limits → Guaranteed QoS
    cpu: "2000m"
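With Guaranteed QoS, the Pod receives the lowest OOM score adjustment on the node, so under node memory pressure the Linux OOM killer prefers BestEffort and Burstable Pods first. The container's own memory limit still applies: if Redis exceeds limits.memory it is OOM-killed regardless of QoS class, so the single requests == limits value must already include the COW and fragmentation headroom described in Section 42.5.1.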
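The classification rule itself is simple enough to sketch in Python. This is a simplified illustration with a hypothetical function name: it compares resource quantities as written strings (so "2000m" and "2" would count as different) and treats an omitted request as equal to the limit, as Kubernetes does.

```python
def qos_class(containers: list[dict]) -> str:
    """Classify a Pod from its containers' resources (simplified string comparison)."""
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            request = requests.get(resource, limit)  # omitted request defaults to limit
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    redis = {"requests": {"cpu": "2000m", "memory": "10Gi"},
             "limits": {"cpu": "2000m", "memory": "10Gi"}}
    print(qos_class([redis]))
```

Every container in the Pod counts, so a metrics sidecar without matching requests and limits downgrades the whole Pod to Burstable.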
42.6 Backup and Recovery
42.6.1 Scheduled Backup to S3 via CronJob
apiVersion: batch/v1
kind: CronJob
metadata:
  name: redis-backup
  namespace: redis
spec:
  schedule: "0 2 * * *"        # 02:00 UTC daily
  concurrencyPolicy: Forbid    # prevent concurrent runs
  successfulJobsHistoryLimit: 7
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      activeDeadlineSeconds: 3600
      template:
        spec:
          restartPolicy: OnFailure
          # The ServiceAccount needs RBAC permission for pods/exec (used by kubectl cp)
          serviceAccountName: redis-backup-sa
          containers:
            - name: backup
              # The script calls aws, redis-cli, and kubectl; the stock aws-cli image
              # ships only aws, so substitute an image that bundles all three
              image: amazon/aws-cli:2.x
              env:
                - name: REDIS_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: redis-auth-secret
                      key: redis-password
                - name: S3_BUCKET
                  value: "my-redis-backups"
              command:
                - /bin/sh
                - -c
                - |
                  set -e
                  DATE=$(date +%Y%m%d_%H%M%S)
                  PREV_LASTSAVE=$(redis-cli -h redis-master -a "$REDIS_PASSWORD" LASTSAVE)
                  redis-cli -h redis-master -a "$REDIS_PASSWORD" BGSAVE
                  echo "Waiting for BGSAVE to complete..."
                  # LASTSAVE advances only on a successful save; if BGSAVE fails, this loop
                  # runs until activeDeadlineSeconds terminates the Job
                  while [ "$(redis-cli -h redis-master -a "$REDIS_PASSWORD" LASTSAVE)" = "$PREV_LASTSAVE" ]; do
                    sleep 2
                  done
                  echo "BGSAVE complete"
                  kubectl cp redis/redis-master-0:/data/dump.rdb /tmp/dump_${DATE}.rdb
                  aws s3 cp /tmp/dump_${DATE}.rdb \
                    s3://${S3_BUCKET}/redis/$(date +%Y/%m/%d)/dump_${DATE}.rdb \
                    --storage-class STANDARD_IA \
                    --sse aws:kms
                  echo "Backup complete"
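The LASTSAVE polling loop in the script has no exit condition of its own when BGSAVE fails. A bounded variant can be sketched in Python as below; the function name is illustrative, and get_lastsave stands in for whatever call returns the LASTSAVE timestamp (for example a redis-py client's lastsave method, which is an assumption about the caller's setup).

```python
import time
from typing import Callable


def wait_for_bgsave(get_lastsave: Callable[[], object], previous: object,
                    timeout: float = 600.0, interval: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> object:
    """Poll LASTSAVE until it differs from `previous`, or raise after `timeout` seconds."""
    deadline = clock() + timeout
    while True:
        current = get_lastsave()
        if current != previous:
            return current  # a new successful save has been recorded
        if clock() >= deadline:
            raise TimeoutError("BGSAVE did not complete before the deadline")
        sleep(interval)


if __name__ == "__main__":
    # Simulated LASTSAVE values: unchanged twice, then advanced.
    samples = iter([1700000000, 1700000000, 1700000042])
    print(wait_for_bgsave(lambda: next(samples), 1700000000, sleep=lambda _: None))
```

Failing fast with a clear error makes a broken backup visible in the Job status instead of leaving it to the Job deadline.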
42.6.2 Restoring from S3
# 1. Download the backup
aws s3 cp s3://my-redis-backups/redis/2024/01/15/dump_20240115_020005.rdb /tmp/dump.rdb
# 2. Stop Redis (scale StatefulSet to 0)
kubectl scale statefulset redis-master -n redis --replicas=0
# 3. Copy the RDB file into the PVC
kubectl run restore-helper --image=busybox --restart=Never \
  --overrides='{"spec":{"volumes":[{"name":"data","persistentVolumeClaim":{"claimName":"redis-data-redis-master-0"}}],"containers":[{"name":"restore-helper","image":"busybox","volumeMounts":[{"name":"data","mountPath":"/data"}],"command":["sleep","3600"]}]}}' \
  -n redis
# Wait until the helper Pod is running before copying into it
kubectl wait --for=condition=Ready pod/restore-helper -n redis --timeout=120s
kubectl cp /tmp/dump.rdb redis/restore-helper:/data/dump.rdb
# If AOF is enabled (appendonly yes), Redis loads the AOF at startup and ignores dump.rdb;
# disable AOF or move the existing AOF files aside before restarting
kubectl delete pod restore-helper -n redis
# 4. Restart Redis
kubectl scale statefulset redis-master -n redis --replicas=1
# 5. Verify
kubectl exec -it redis-master-0 -n redis -- redis-cli -a "$REDIS_PASSWORD" DBSIZE
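Before copying a downloaded file into the PVC, a cheap sanity check can catch truncated downloads or error pages saved under an .rdb name. RDB files begin with the ASCII magic REDIS followed by a four-digit format version; the Python sketch below (illustrative function names) checks only that header and is not a substitute for a full restore drill.

```python
import re

RDB_HEADER = re.compile(rb"REDIS\d{4}")


def is_rdb_header(head: bytes) -> bool:
    """True if the first nine bytes look like an RDB magic string plus version."""
    return RDB_HEADER.fullmatch(head[:9]) is not None


def looks_like_rdb(path: str) -> bool:
    """Read the first nine bytes of a file and check them against the RDB header."""
    with open(path, "rb") as handle:
        return is_rdb_header(handle.read(9))


if __name__ == "__main__":
    print(is_rdb_header(b"REDIS0011\xfa\tredis-ver"))
```

For a deeper check, the redis-check-rdb utility that ships with Redis walks the whole file rather than just the header.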
42.7 Monitoring and Alerting
42.7.1 Key Prometheus Metrics from redis-exporter
# Memory
redis_memory_used_bytes # actual memory in use
redis_memory_max_bytes # configured maxmemory
redis_mem_fragmentation_ratio # fragmentation ratio (alert > 1.5)
# Connections
redis_connected_clients # current client count
redis_connected_slaves # replica connections
# Persistence
redis_rdb_last_bgsave_status # last BGSAVE result (exported as 1 = ok, 0 = err)
redis_aof_last_rewrite_duration_sec # last AOF rewrite duration
# Replication
redis_replication_offset # master replication offset
redis_slave_repl_offset # replica offset (alert if gap is large)
redis_replication_backlog_histlen # current backlog utilization
# Performance
redis_commands_duration_seconds_total # total command execution time
redis_keyspace_hits_total # cache hits
redis_keyspace_misses_total # cache misses
Sample alerting rules (PrometheusRule):
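apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: redis-alerts
spec:
  groups:
    - name: redis
      rules:
        - alert: RedisMemoryHigh
          # The second clause skips instances with maxmemory 0 (unlimited), where the ratio is +Inf
          expr: redis_memory_used_bytes / redis_memory_max_bytes > 0.85 and redis_memory_max_bytes > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Redis memory usage exceeds 85%"
        - alert: RedisReplicationLag
          # Both series must carry identical labels for the subtraction to match; offsets scraped
          # from different instances need on()/ignoring() or a per-replica offset metric.
          # Verify the metric names against your exporter's /metrics output
          expr: redis_replication_offset - redis_slave_repl_offset > 104857600
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "Redis replica lag exceeds 100 MB"
        - alert: RedisFragmentationHigh
          expr: redis_mem_fragmentation_ratio > 1.5
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Redis memory fragmentation ratio exceeds 1.5"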
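The thresholds in these rules can be exercised offline against a sample of metric values. The Python sketch below is an illustration only: it uses the metric names listed in this section as dictionary keys, ignores the `for:` durations, and skips the memory check when maxmemory is 0 (unlimited).

```python
REPLICATION_LAG_BYTES = 100 * 1024 * 1024  # 104857600, as in the RedisReplicationLag rule


def firing_alerts(sample: dict) -> list[str]:
    """Return the names of the alert conditions that the sample satisfies."""
    alerts = []
    max_bytes = sample["redis_memory_max_bytes"]
    if max_bytes > 0 and sample["redis_memory_used_bytes"] / max_bytes > 0.85:
        alerts.append("RedisMemoryHigh")
    lag = sample["redis_replication_offset"] - sample["redis_slave_repl_offset"]
    if lag > REPLICATION_LAG_BYTES:
        alerts.append("RedisReplicationLag")
    if sample["redis_mem_fragmentation_ratio"] > 1.5:
        alerts.append("RedisFragmentationHigh")
    return alerts


if __name__ == "__main__":
    print(firing_alerts({
        "redis_memory_used_bytes": 9 * 1024**3,
        "redis_memory_max_bytes": 10 * 1024**3,
        "redis_replication_offset": 500_000_000,
        "redis_slave_repl_offset": 499_000_000,
        "redis_mem_fragmentation_ratio": 1.2,
    }))
```

A helper like this is useful for unit-testing threshold changes before they are rolled out to Prometheus.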
Chapter Summary
Core principles for running Redis on Kubernetes:
- StatefulSet is mandatory: Stable Pod names and network identities are prerequisites for Redis master-replica and Cluster operation. Deployment must never be used for stateful Redis.
- Storage selection: Use reclaimPolicy: Retain to prevent accidental data deletion; cloud SSD (gp3 EBS or equivalent) provides adequate IOPS; volumeClaimTemplate ensures each Pod has its own reusable PVC.
- Three-layer memory planning: maxmemory → requests (COW headroom) → limits (fragmentation headroom). Each layer must have a meaningful buffer above the previous one.
- Kernel parameters: vm.overcommit_memory=1 and THP disabling must be configured via initContainers at Pod startup; these are prerequisites for stable Redis operation.
- Backup validation: Periodic backups alone do not guarantee safety. Run a full restore drill monthly to verify that backup files are valid and the recovery procedure works end to end.