Chapter 42

Redis in Kubernetes: StatefulSet and Persistent Storage

Deploying Redis on Kubernetes is now a common production pattern, yet there is a fundamental tension between Redis's stateful nature and the stateless assumptions behind Kubernetes's default workload abstractions. This chapter explains why StatefulSet is non-negotiable, compares mainstream deployment options, covers persistent storage configuration, and details the operational pitfalls and failure-prevention measures specific to the Kubernetes environment.


42.1 Why StatefulSet Is Mandatory

42.1.1 Fatal Flaws of Deployment for Redis

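Kubernetes Deployment is designed for stateless applications. Its Pods get random names (for example deploy-pod-xyz), receive a new IP address whenever they are rebuilt, start and stop in parallel, and, if they mount a PVC at all, all share the same claim.

Redis master-replica and Cluster depend on exactly what a Deployment does not guarantee: each replica must reach its master at a stable address, and each Cluster node persists its node ID, slot assignments, and peer addresses in nodes.conf on its own volume.

When a Pod comes back under a new name and identity, possibly with a different volume, the node ID it starts with no longer matches the data it holds. The cluster enters an inconsistent state that requires a manual CLUSTER RESET at minimum and causes data loss at worst.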

42.1.2 Core Advantages of StatefulSet

StatefulSet is designed for stateful applications:

Feature            Deployment                             StatefulSet
Pod naming         Random (deploy-pod-xyz)                Ordered (redis-0, redis-1)
Network identity   Pod IP (unstable)                      Stable DNS name
Storage binding    Competes for a shared PVC (dangerous)  Independent PVC per Pod (volumeClaimTemplates)
Start/stop order   Parallel (random)                      Ordered (0→1→2 start; 2→1→0 stop)
Rolling update     Batches (maxSurge/maxUnavailable)      One Pod at a time, highest ordinal first

Stable DNS names (with Headless Service):

redis-0.redis-headless.namespace.svc.cluster.local
redis-1.redis-headless.namespace.svc.cluster.local
redis-2.redis-headless.namespace.svc.cluster.local

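After a Pod is rebuilt, its DNS name is unchanged even though the Pod IP behind it may change, so other Pods reconnect using the same address (provided they re-resolve the name rather than caching the old IP).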
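The naming scheme above follows a fixed pattern: <statefulset>-<ordinal>.<headless-service>.<namespace>.svc.<cluster-domain>. A minimal shell sketch that prints these addresses, for example to build a seed list for replica or Sentinel configuration; the function name statefulset_dns is hypothetical, and the default cluster domain cluster.local is an assumption (clusters can be configured with a different one):

```shell
# Print the stable DNS name of each Pod in a StatefulSet, one per line.
# Usage: statefulset_dns <statefulset> <replicas> <headless-service> <namespace> [cluster-domain]
# Assumption: the cluster domain defaults to cluster.local.
statefulset_dns() {
  sts_name=$1
  replica_count=$2
  svc_name=$3
  ns_name=$4
  cluster_domain=${5:-cluster.local}
  ordinal=0
  while [ "$ordinal" -lt "$replica_count" ]; do
    echo "${sts_name}-${ordinal}.${svc_name}.${ns_name}.svc.${cluster_domain}"
    ordinal=$((ordinal + 1))
  done
}

statefulset_dns redis 3 redis-headless namespace
```

Called as shown, it reproduces the three names listed above.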


42.2 Redis Operator vs Helm Chart Comparison

42.2.1 Mainstream Option Comparison

Option                            Type        Supported Modes                      Maturity                  Best For
spotahome/redis-operator          Operator    Sentinel                             Medium                    Simple HA requirements
ot-container-kit/redis-operator   Operator    Standalone / Cluster / Sentinel      High                      CRD-based management
Redis Stack Operator (official)   Operator    Standalone + modules                 Lower (newer)             RediSearch / RedisJSON
Bitnami Redis Helm Chart          Helm Chart  Standalone / Replication / Sentinel  Very high (most popular)  Most production scenarios
Bitnami Redis Cluster Helm Chart  Helm Chart  Cluster                              Very high                 Large-scale Cluster

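Operator vs Helm Chart: the essential difference is what happens after installation. A Helm Chart renders templates into Kubernetes manifests and applies them at install or upgrade time; nothing in the chart watches the running Redis topology afterward. An Operator runs a controller that continuously reconciles custom resources (CRDs) against the actual cluster state, so it can react to events such as a failed Pod or a changed replica count.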

42.2.2 Full Bitnami Helm Chart Deployment Example

# values.yaml: replication + Sentinel mode
architecture: replication       # standalone | replication (Redis Cluster uses the separate bitnami/redis-cluster chart)

auth:
  enabled: true
  # password: "your-strong-password-here"   # ignored by the chart when existingSecret is set
  # Production recommendation: read the password from a K8s Secret
  existingSecret: redis-auth-secret
  existingSecretPasswordKey: redis-password

master:
  count: 1
  persistence:
    enabled: true
    storageClass: "redis-ssd"
    accessModes:
    - ReadWriteOnce
    size: 20Gi
    annotations:
      "helm.sh/resource-policy": keep   # extra safeguard; StatefulSet PVCs already survive helm uninstall by default
  resources:
    requests:
      cpu: "500m"
      memory: "4Gi"      # ~1.3 x maxmemory (three-layer model, 42.5.1)
    limits:
      cpu: "2000m"
      memory: "6Gi"      # 1.5 x requests
  configuration: |
    maxmemory 3gb
    maxmemory-policy allkeys-lru
    activedefrag yes
    lazyfree-lazy-eviction yes
    lazyfree-lazy-expire yes
  # In the Bitnami chart, security contexts and affinity are per component
  # (master.*, replica.*); top-level keys with these names are ignored
  podSecurityContext:
    fsGroup: 1001
    runAsUser: 1001
    runAsNonRoot: true
  # Anti-affinity: spread Pods across different physical nodes
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/name: redis
        topologyKey: kubernetes.io/hostname

replica:
  replicaCount: 2
  persistence:
    enabled: true
    storageClass: "redis-ssd"
    size: 20Gi
  resources:
    requests:
      cpu: "250m"
      memory: "4Gi"      # replicas hold a full copy of the dataset
    limits:
      cpu: "1000m"
      memory: "6Gi"
  podSecurityContext:
    fsGroup: 1001
    runAsUser: 1001
    runAsNonRoot: true
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/name: redis
        topologyKey: kubernetes.io/hostname

sentinel:
  enabled: true
  masterSet: "mymaster"
  quorum: 2
  downAfterMilliseconds: 5000
  failoverTimeout: 60000
  resources:
    requests:
      cpu: "100m"
      memory: "128Mi"
    limits:
      cpu: "200m"
      memory: "256Mi"

metrics:
  enabled: true                # deploy redis-exporter sidecar
  serviceMonitor:
    enabled: true              # create Prometheus ServiceMonitor

# Install
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install redis bitnami/redis \
  -f values.yaml \
  -n redis \
  --create-namespace \
  --version 18.x.x       # placeholder: pin an exact chart version so upgrades are deliberate

# Check status
kubectl get pods -n redis -w
kubectl get pvc -n redis

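# Connectivity test (password read from the Secret referenced in values.yaml)
REDIS_PASSWORD=$(kubectl get secret redis-auth-secret -n redis \
  -o jsonpath='{.data.redis-password}' | base64 -d)
# Pod names depend on the chart mode; confirm them with kubectl get pods
kubectl exec -it redis-master-0 -n redis -- \
  redis-cli -a "$REDIS_PASSWORD" ping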

42.3 Persistent Storage: StorageClass Design

42.3.1 StorageClass Selection Principles

Storage Type              IOPS                                    Latency         Use Case
AWS gp2 EBS               3 IOPS/GiB (min 100), bursts to 3000    Low             Development / testing
AWS gp3 EBS               3000–16000 (configurable)               Low             Production (recommended)
AWS io2 EBS               Up to 64000                             Very low        High-IOPS production
GCP pd-ssd                Up to 30 IOPS/GB                        Low             GCP production
Local NVMe (local-path)   Extremely high                          Extremely low   Maximum performance, no HA

Key selection rules:

  1. reclaimPolicy: Retain: Preserve the underlying storage volume when the PVC is deleted, preventing accidental data loss.
  2. volumeBindingMode: WaitForFirstConsumer: Delay volume binding until the Pod is scheduled, ensuring the PV is created in the same Availability Zone as the Pod.
  3. allowVolumeExpansion: true: Always enable volume expansion for production; storage needs grow over time.
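The three rules above can be checked before a StorageClass is applied. A rough sketch, assuming one StorageClass per file with unindented top-level keys as in the gp3 manifest in 42.3.2; it uses grep rather than a YAML parser, so treat it as a quick lint, not full validation. check_storageclass is a hypothetical helper:

```shell
# Rough lint of a StorageClass manifest against the three selection rules.
# Assumption: one StorageClass per file, top-level keys unindented (plain grep, no YAML parsing).
check_storageclass() {
  manifest=$1
  status=0
  grep -Eq '^reclaimPolicy:[[:space:]]*Retain[[:space:]]*$' "$manifest" \
    || { echo "FAIL: reclaimPolicy is not Retain"; status=1; }
  grep -Eq '^volumeBindingMode:[[:space:]]*WaitForFirstConsumer[[:space:]]*$' "$manifest" \
    || { echo "FAIL: volumeBindingMode is not WaitForFirstConsumer"; status=1; }
  grep -Eq '^allowVolumeExpansion:[[:space:]]*true[[:space:]]*$' "$manifest" \
    || { echo "FAIL: allowVolumeExpansion is not true"; status=1; }
  [ "$status" -eq 0 ] && echo "OK: $manifest follows all three rules"
  return "$status"
}
```

For example, check_storageclass redis-ssd.yaml (hypothetical file name) prints one FAIL line per violated rule and returns non-zero, which makes it usable as a CI gate.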

42.3.2 AWS gp3 StorageClass Configuration

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: redis-ssd
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "6000"               # gp3 allows independent IOPS configuration (3000–16000)
  throughput: "250"          # MB/s (125โ€“1000)
  encrypted: "true"          # encryption at rest
  kmsKeyId: "arn:aws:kms:..."  # customer-managed key
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
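Mistyped gp3 numbers are only rejected at provisioning time, when a Pod is already waiting for its volume. A small pre-flight sketch: the IOPS and throughput ranges come from the comments in the manifest above, while the ratio limits (500 IOPS per GiB above the 3000 baseline, 0.25 MiB/s of throughput per provisioned IOPS) are AWS-documented values at the time of writing and should be checked against current EBS documentation. check_gp3 is a hypothetical helper:

```shell
# Sanity-check gp3 parameters before writing them into a StorageClass.
# Usage: check_gp3 <iops> <throughput_mibps> <size_gib>
# Ranges follow the StorageClass comments (IOPS 3000-16000, throughput 125-1000).
# Ratio limits are AWS-documented values at the time of writing: verify before relying on them.
check_gp3() {
  iops=$1
  tput=$2
  size_gib=$3
  if [ "$iops" -lt 3000 ] || [ "$iops" -gt 16000 ]; then
    echo "FAIL: iops $iops is outside 3000-16000"; return 1
  fi
  if [ "$tput" -lt 125 ] || [ "$tput" -gt 1000 ]; then
    echo "FAIL: throughput $tput is outside 125-1000"; return 1
  fi
  # Above the free 3000 IOPS baseline, IOPS may not exceed 500 per GiB of volume size.
  if [ "$iops" -gt 3000 ] && [ "$iops" -gt $((size_gib * 500)) ]; then
    echo "FAIL: $iops IOPS needs a volume of at least $(( (iops + 499) / 500 )) GiB"; return 1
  fi
  # Throughput may not exceed 0.25 MiB/s per provisioned IOPS (i.e. iops >= 4 x throughput).
  if [ $((tput * 4)) -gt "$iops" ]; then
    echo "FAIL: $tput MiB/s needs at least $((tput * 4)) IOPS"; return 1
  fi
  echo "OK"
}

check_gp3 6000 250 20
```

The example call checks the values used in this chapter: 6000 IOPS and 250 MiB/s on a 20Gi volume.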

42.3.3 VolumeClaimTemplate (StatefulSet-Only Feature)

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis-headless
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:7.2
        command: ["redis-server", "/etc/redis/redis.conf"]
        ports:
        - containerPort: 6379
        volumeMounts:
        - name: redis-data
          mountPath: /data
        - name: redis-config
          mountPath: /etc/redis
      volumes:
      - name: redis-config
        configMap:
          name: redis-config
  # volumeClaimTemplates: automatically creates an independent PVC for each Pod
  volumeClaimTemplates:
  - metadata:
      name: redis-data
      annotations:
        "helm.sh/resource-policy": keep   # honored only for Helm-managed resources; PVCs from volumeClaimTemplates are retained by default
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: redis-ssd
      resources:
        requests:
          storage: 20Gi
  # Generated PVC names: redis-data-redis-0, redis-data-redis-1, redis-data-redis-2
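Because the PVC name is derived mechanically as <claim-template>-<statefulset>-<ordinal>, scripts can compute it instead of listing PVCs; the restore procedure in 42.6.2 relies on exactly this to target redis-data-redis-master-0. A sketch assuming ordinals start at 0 (the StatefulSet default); pvc_names is a hypothetical helper:

```shell
# Derive the PVC names a StatefulSet creates from one volumeClaimTemplate.
# Pattern: <claim-template-name>-<statefulset-name>-<ordinal>, ordinals starting at 0.
# Usage: pvc_names <claim-template-name> <statefulset-name> <replicas>
pvc_names() {
  claim_tpl=$1
  sts_name=$2
  replica_count=$3
  ordinal=0
  while [ "$ordinal" -lt "$replica_count" ]; do
    echo "${claim_tpl}-${sts_name}-${ordinal}"
    ordinal=$((ordinal + 1))
  done
}

pvc_names redis-data redis 3
```

For instance, kubectl get pvc $(pvc_names redis-data redis 3) -n namespace checks that every Pod's claim exists and is Bound.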

42.4 Service Design and Client Routing

42.4.1 Three Types of Services and Their Roles

# 1. Headless Service: foundation for StatefulSet stable DNS
apiVersion: v1
kind: Service
metadata:
  name: redis-headless
spec:
  clusterIP: None          # no ClusterIP = headless
  publishNotReadyAddresses: true   # publish DNS even when Pod is not Ready
  selector:
    app: redis
  ports:
  - name: redis
    port: 6379
    targetPort: 6379
---
# 2. Master Service: write entry point (or unified entry for non-split clients)
apiVersion: v1
kind: Service
metadata:
  name: redis-master
spec:
  selector:
    app: redis
    role: master
  ports:
  - port: 6379
    targetPort: 6379
---
# 3. Replica Service: read entry point (for read-write split clients)
apiVersion: v1
kind: Service
metadata:
  name: redis-replica
spec:
  selector:
    app: redis
    role: replica
  ports:
  - port: 6379
    targetPort: 6379

Sentinel mode caveat: after a failover, a replica Pod becomes the master and the former master Pod becomes a replica, but Pod labels such as role: master are not updated automatically, so the redis-master Service can keep routing writes to a Pod that is now a read-only replica. Use a Sentinel-aware client that queries Sentinel to discover the current master address dynamically, rather than relying on a Service built on static labels.
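A Sentinel-aware client essentially asks Sentinel SENTINEL get-master-addr-by-name <master-set>, which returns the current master's host and port as two lines. A minimal shell sketch of the same lookup, for example for a maintenance Job; parse_master_addr is hypothetical, and the Sentinel host redis-headless and port 26379 (Sentinel's default port) in the commented call are assumptions:

```shell
# Turn the two-line reply of SENTINEL get-master-addr-by-name into host:port.
# Returns non-zero when the reply is empty (e.g. unknown master set name).
parse_master_addr() {
  reply=$(tr -d '\r')                       # strip CRs in case the reply was captured raw
  host=$(printf '%s\n' "$reply" | sed -n '1p')
  port=$(printf '%s\n' "$reply" | sed -n '2p')
  if [ -z "$host" ] || [ -z "$port" ]; then
    return 1
  fi
  echo "${host}:${port}"
}

# Example against a live Sentinel (host and port are assumptions for this sketch):
# redis-cli -h redis-headless -p 26379 SENTINEL get-master-addr-by-name mymaster | parse_master_addr
```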

42.4.2 Service Configuration for Redis Cluster

Redis Cluster requires clients to connect directly to individual nodes (to follow MOVED redirections):

# Cluster mode: use a Headless Service so each node is individually addressable
# Client connects to any node for initial handshake and topology discovery
apiVersion: v1
kind: Service
metadata:
  name: redis-cluster-entry
spec:
  clusterIP: None          # headless: DNS returns every node's Pod IP
  selector:
    app: redis-cluster
  ports:
  - port: 6379
    targetPort: 6379
  # Do not use a LoadBalancer: MOVED/ASK redirections point to individual node
  # addresses, which the client must be able to reach directly

Cluster nodes communicate over the cluster bus, which by default listens on the client port + 10000 (6379 + 10000 = 16379). Ensure port 16379 is open between all cluster nodes:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: redis-cluster-internal
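spec:
  # Selecting these Pods denies all other ingress to them: application clients on 6379
  # need an additional ingress rule or NetworkPolicy to stay reachable
  podSelector: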
    matchLabels:
      app: redis-cluster
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: redis-cluster
    ports:
    - port: 6379
    - port: 16379
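The bus port rule is simple arithmetic, but Redis 7.0 and later also accept a cluster-port setting that overrides it (0, the default, keeps port + 10000). A sketch for deriving the port to open in the NetworkPolicy; cluster_bus_port is a hypothetical helper:

```shell
# Cluster bus port: client port + 10000, unless cluster-port (Redis 7.0+) overrides it.
# Usage: cluster_bus_port <client_port> [cluster_port]   (cluster_port 0 = default rule)
cluster_bus_port() {
  client_port=$1
  override=${2:-0}
  if [ "$override" -ne 0 ]; then
    echo "$override"
  else
    echo $((client_port + 10000))
  fi
}

cluster_bus_port 6379
```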

42.5 Resource Limits and OOM Prevention

42.5.1 The Three-Layer Memory Model

maxmemory (Redis configuration)
    ↓ × 1.2–1.3
resources.requests.memory (K8s scheduling baseline)
    ↓ × 1.3–1.5
resources.limits.memory (K8s OOM Kill threshold)

Practical example: with maxmemory 8gb, requests.memory ≈ 8 × 1.25 = 10Gi and limits.memory ≈ 10 × 1.4 = 14Gi.

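Never set limits.memory below maxmemory. Redis only starts evicting keys (or rejecting writes) once its own memory accounting reaches maxmemory, and the process's real footprint (RSS) is higher still because of fragmentation, client and replication buffers, and copy-on-write pages during BGSAVE. If limits < maxmemory, the kernel OOM-kills the Redis process during normal operation, before eviction ever begins.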
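The three-layer model translates directly into arithmetic. A sketch that uses 1.25 and 1.4, values inside the two ranges given above, and works in MiB (Redis mb and Kubernetes Mi are both 1024² bytes); plan_memory and its default factors are assumptions to adjust per workload:

```shell
# Derive K8s memory requests/limits from Redis maxmemory (three-layer model).
# Factors are percentages: requests = maxmemory x 1.25, limits = requests x 1.4 by default.
# Usage: plan_memory <maxmemory_mib> [requests_pct] [limits_pct]
plan_memory() {
  maxmem_mib=$1
  req_pct=${2:-125}
  lim_pct=${3:-140}
  # Round up so integer division never leaves a layer below its target ratio.
  req_mib=$(( (maxmem_mib * req_pct + 99) / 100 ))
  lim_mib=$(( (req_mib * lim_pct + 99) / 100 ))
  echo "maxmemory ${maxmem_mib}mb"
  echo "requests.memory ${req_mib}Mi"
  echo "limits.memory ${lim_mib}Mi"
}

plan_memory 8192
```

For maxmemory 8gb (8192 MiB) it prints requests.memory 10240Mi (10Gi) and limits.memory 14336Mi (14Gi).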

42.5.2 Linux Kernel Parameter Tuning (via initContainers)

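# Pod spec fragment (e.g. spec.template.spec of the StatefulSet).
# The THP and vm.overcommit_memory changes apply to the whole node, not just this Pod;
# net.core.somaxconn is per network namespace, so it only affects this Pod.
initContainers: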
# 1. Disable Transparent Huge Pages (THP)
- name: disable-thp
  image: busybox:1.35
  command:
  - /bin/sh
  - -c
  - |
    if [ -f /sys/kernel/mm/transparent_hugepage/enabled ]; then
      echo never > /sys/kernel/mm/transparent_hugepage/enabled
      echo never > /sys/kernel/mm/transparent_hugepage/defrag
    fi
    echo "THP disabled"
  securityContext:
    privileged: true
  volumeMounts:
  - name: sys
    mountPath: /sys

# 2. Set vm.overcommit_memory (prevent fork failures)
- name: set-overcommit
  image: busybox:1.35
  command:
  - /bin/sh
  - -c
  - |
    sysctl -w vm.overcommit_memory=1
    sysctl -w net.core.somaxconn=65535
  securityContext:
    privileged: true

volumes:
- name: sys
  hostPath:
    path: /sys

Why vm.overcommit_memory=1 is required: Redis BGSAVE creates a child process via fork(). The child initially shares the parent's memory, but its virtual address space is as large as the parent's. Linux's default overcommit mode (0) uses a heuristic that can refuse such an allocation when memory looks insufficient, so if Redis uses 8 GB while only 4 GB is free, fork() can fail. Redis then logs "Can't save in background: fork: Cannot allocate memory", and with the default stop-writes-on-bgsave-error yes, clients start receiving errors on writes:

MISCONF Redis is configured to save RDB snapshots,
but it's currently unable to persist on disk

Setting overcommit_memory=1 lets the kernel grant virtual memory optimistically; physical pages are only allocated when they are actually written (copy-on-write). The fork succeeds, and only pages modified while the child is running consume extra physical memory.
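Both settings can be verified from inside any container, because /proc/sys/vm/overcommit_memory and /sys/kernel/mm/transparent_hugepage/enabled reflect the node's values. A sketch that evaluates the two values passed in as arguments, so it can be fed from kubectl exec output; check_redis_kernel is hypothetical. The THP file lists all modes and brackets the active one, e.g. always madvise [never]:

```shell
# Check the two kernel settings Redis depends on.
# Usage: check_redis_kernel <overcommit_memory> <thp_enabled_line>
# e.g. check_redis_kernel "$(cat /proc/sys/vm/overcommit_memory)" \
#                         "$(cat /sys/kernel/mm/transparent_hugepage/enabled)"
check_redis_kernel() {
  overcommit=$1
  # The active THP mode is the bracketed word, e.g. "always madvise [never]".
  thp_mode=$(printf '%s\n' "$2" | sed -n 's/.*\[\([a-z]*\)\].*/\1/p')
  status=0
  if [ "$overcommit" != "1" ]; then
    echo "WARN: vm.overcommit_memory=$overcommit (want 1): BGSAVE fork may fail"
    status=1
  fi
  if [ "$thp_mode" != "never" ]; then
    echo "WARN: transparent_hugepage=$thp_mode (want never)"
    status=1
  fi
  [ "$status" -eq 0 ] && echo "OK"
  return "$status"
}
```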

42.5.3 Pod QoS Class and OOM Kill Priority

# Best practice: set Guaranteed QoS (requests == limits) for Redis Pods
# K8s OOM kill priority: BestEffort > Burstable > Guaranteed (Guaranteed is last)
resources:
  requests:
    memory: "10Gi"
    cpu: "2000m"
  limits:
    memory: "10Gi"    # requests == limits โ†’ Guaranteed QoS
    cpu: "2000m"

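With Guaranteed QoS, the kubelet gives the Redis container the lowest oom_score_adj (-997), so when the node runs out of memory the kernel OOM killer picks BestEffort and Burstable processes first. QoS does not protect a container that exceeds its own limit: the cgroup OOM killer terminates it regardless of class. When using Guaranteed QoS, size the single requests/limits value to the limits layer of the three-layer model, since there is no separate burst headroom.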
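The QoS class follows from the resource fields alone. A simplified sketch for a single-container Pod; it mirrors the Kubernetes rules (no requests or limits: BestEffort; CPU and memory limits set with requests equal to them: Guaranteed; anything else: Burstable), including the defaulting of a missing request to its limit, but it compares quantities as strings, so write them in the same form (2000m and 2 would count as different). qos_class is hypothetical:

```shell
# Classify a single-container Pod's QoS class from its resource settings.
# Usage: qos_class <cpu_request> <cpu_limit> <mem_request> <mem_limit>   ("" = unset)
# Simplification: quantities are compared as strings, not parsed.
qos_class() {
  cpu_req=$1 cpu_lim=$2 mem_req=$3 mem_lim=$4
  # Kubernetes defaults an unset request to the limit when only the limit is given.
  cpu_req=${cpu_req:-$cpu_lim}
  mem_req=${mem_req:-$mem_lim}
  if [ -z "$cpu_req$cpu_lim$mem_req$mem_lim" ]; then
    echo BestEffort
  elif [ -n "$cpu_lim" ] && [ -n "$mem_lim" ] \
       && [ "$cpu_req" = "$cpu_lim" ] && [ "$mem_req" = "$mem_lim" ]; then
    echo Guaranteed
  else
    echo Burstable
  fi
}

qos_class 2000m 2000m 10Gi 10Gi
```

The example call uses the resource block above and classifies it as Guaranteed.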


42.6 Backup and Recovery

42.6.1 Scheduled Backup to S3 via CronJob

apiVersion: batch/v1
kind: CronJob
metadata:
  name: redis-backup
  namespace: redis
spec:
  schedule: "0 2 * * *"        # 02:00 daily in the controller-manager's time zone (usually UTC; spec.timeZone makes it explicit)
  concurrencyPolicy: Forbid    # prevent concurrent runs
  successfulJobsHistoryLimit: 7
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      activeDeadlineSeconds: 3600
      template:
        spec:
          restartPolicy: OnFailure
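          serviceAccountName: redis-backup-sa   # needs RBAC for pods/exec (kubectl cp) and S3 write access
          containers:
          - name: backup
            # amazon/aws-cli ships only the AWS CLI, but this script also needs redis-cli and kubectl.
            # Hypothetical custom image bundling all three tools:
            image: my-registry/redis-backup-tools:1.0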
            env:
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: redis-auth-secret
                  key: redis-password
            - name: S3_BUCKET
              value: "my-redis-backups"
            command:
            - /bin/sh
            - -c
            - |
              set -e
              DATE=$(date +%Y%m%d_%H%M%S)
              PREV_LASTSAVE=$(redis-cli -h redis-master -a "$REDIS_PASSWORD" LASTSAVE)

              redis-cli -h redis-master -a "$REDIS_PASSWORD" BGSAVE
              echo "Waiting for BGSAVE to complete..."
              while [ "$(redis-cli -h redis-master -a "$REDIS_PASSWORD" LASTSAVE)" = "$PREV_LASTSAVE" ]; do
                sleep 2
              done
              echo "BGSAVE complete"

              kubectl cp redis/redis-master-0:/data/dump.rdb /tmp/dump_${DATE}.rdb

              aws s3 cp /tmp/dump_${DATE}.rdb \
                s3://${S3_BUCKET}/redis/$(date +%Y/%m/%d)/dump_${DATE}.rdb \
                --storage-class STANDARD_IA \
                --sse aws:kms

              echo "Backup complete"

42.6.2 Restoring from S3

# 1. Download the backup
aws s3 cp s3://my-redis-backups/redis/2024/01/15/dump_20240115_020005.rdb /tmp/dump.rdb

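# 2. Stop Redis (scale StatefulSet to 0)
#    With Sentinel enabled, stopping only the master triggers a failover; stop the replicas
#    too, or the promoted replica's dataset will replace the restored one after resync
kubectl scale statefulset redis-master -n redis --replicas=0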

# 3. Copy the RDB file into the PVC
kubectl run restore-helper --image=busybox --restart=Never \
  --overrides='{"spec":{"volumes":[{"name":"data","persistentVolumeClaim":{"claimName":"redis-data-redis-master-0"}}],"containers":[{"name":"restore-helper","image":"busybox","volumeMounts":[{"name":"data","mountPath":"/data"}],"command":["sleep","3600"]}]}}' \
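  -n redis
# wait until the helper Pod is running before copying into it
kubectl wait --for=condition=Ready pod/restore-helper -n redis --timeout=120s
kubectl cp /tmp/dump.rdb redis/restore-helper:/data/dump.rdb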
kubectl delete pod restore-helper -n redis

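# 4. Restart Redis
#    If AOF is enabled (appendonly yes), Redis loads the AOF at startup and ignores dump.rdb:
#    disable AOF or remove the AOF files from the PVC before restarting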
kubectl scale statefulset redis-master -n redis --replicas=1

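# 5. Verify (password read from the Kubernetes Secret)
REDIS_PASSWORD=$(kubectl get secret redis-auth-secret -n redis \
  -o jsonpath='{.data.redis-password}' | base64 -d)
kubectl exec -it redis-master-0 -n redis -- redis-cli -a "$REDIS_PASSWORD" DBSIZE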
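Step 1 needs the exact key the CronJob wrote, which is derived from its DATE stamp. A small helper that rebuilds the URI from a stamp, mirroring the CronJob's layout; backup_uri is hypothetical, and it assumes both date calls in the CronJob fell on the same day:

```shell
# Rebuild the S3 URI the backup CronJob writes for a DATE stamp (YYYYmmdd_HHMMSS).
# Layout: s3://<bucket>/redis/<YYYY>/<mm>/<dd>/dump_<stamp>.rdb
backup_uri() {
  bucket=$1
  stamp=$2
  day=${stamp%%_*}        # YYYYmmdd
  yyyy=${day%????}        # first 4 characters
  mmdd=${day#????}        # last 4 characters
  mm=${mmdd%??}
  dd=${mmdd#??}
  echo "s3://${bucket}/redis/${yyyy}/${mm}/${dd}/dump_${stamp}.rdb"
}

backup_uri my-redis-backups 20240115_020005
```

When the exact stamp is unknown, aws s3 ls s3://my-redis-backups/redis/2024/01/15/ lists the candidates for that day.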

42.7 Monitoring and Alerting

42.7.1 Key Prometheus Metrics from redis-exporter

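# Exact names can vary between exporter versions; confirm them against the exporter's /metrics output

# Memory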
redis_memory_used_bytes                  # actual memory in use
redis_memory_max_bytes                   # configured maxmemory
redis_mem_fragmentation_ratio            # fragmentation ratio (alert > 1.5)

# Connections
redis_connected_clients                  # current client count
redis_connected_slaves                   # replica connections

# Persistence
redis_rdb_last_bgsave_status             # last BGSAVE result (1 = ok, 0 = err)
redis_aof_last_rewrite_duration_sec      # last AOF rewrite duration

# Replication
redis_replication_offset                 # master replication offset
redis_slave_repl_offset                  # replica offset (alert if gap is large)
redis_replication_backlog_histlen        # current backlog utilization

# Performance
redis_commands_duration_seconds_total    # total command execution time
redis_keyspace_hits_total               # cache hits
redis_keyspace_misses_total             # cache misses
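Cache efficiency is the ratio of two counters, hits / (hits + misses); in PromQL it is usually computed over rates, e.g. rate(redis_keyspace_hits_total[5m]) divided by the sum of the hits and misses rates. For a quick check without Prometheus, a sketch that computes the lifetime ratio from redis-cli INFO stats output (keyspace_hits and keyspace_misses fields); hit_ratio is hypothetical:

```shell
# Cache hit ratio = keyspace_hits / (keyspace_hits + keyspace_misses),
# read from `redis-cli INFO stats` output on stdin. Prints "n/a" before any lookups.
hit_ratio() {
  tr -d '\r' | awk -F: '
    $1 == "keyspace_hits"   { hits = $2 + 0 }
    $1 == "keyspace_misses" { misses = $2 + 0 }
    END {
      total = hits + misses
      if (total == 0) print "n/a"
      else printf "%.2f\n", hits / total
    }'
}
```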

Sample alerting rules (PrometheusRule):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: redis-alerts
spec:
  groups:
  - name: redis
    rules:
    - alert: RedisMemoryHigh
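      # the max_bytes > 0 guard skips instances without maxmemory, where the ratio is +Inf
      expr: redis_memory_used_bytes / redis_memory_max_bytes > 0.85 and redis_memory_max_bytes > 0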
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Redis memory usage exceeds 85%"

    - alert: RedisReplicationLag
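      # Master and replica offsets come from different Pods, so aggregate (here per namespace
      # label, as added by the Prometheus Operator) before subtracting; a direct subtraction
      # never matches labels and returns nothing
      expr: max by (namespace) (redis_replication_offset) - min by (namespace) (redis_slave_repl_offset) > 104857600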
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: "Redis replica lag exceeds 100 MB"

    - alert: RedisFragmentationHigh
      expr: redis_mem_fragmentation_ratio > 1.5
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Redis memory fragmentation ratio exceeds 1.5"
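The same two memory thresholds can be spot-checked by hand, for example before the exporter is deployed. A sketch reading redis-cli INFO memory output (used_memory, maxmemory and mem_fragmentation_ratio fields); check_memory_alerts is hypothetical and ignores the rules' for: durations, so it reports a single point in time:

```shell
# Spot-check the RedisMemoryHigh (> 0.85 of maxmemory) and RedisFragmentationHigh (> 1.5)
# thresholds from `redis-cli INFO memory` output on stdin. Exits non-zero if any would fire.
check_memory_alerts() {
  tr -d '\r' | awk -F: '
    $1 == "used_memory"             { used = $2 + 0 }
    $1 == "maxmemory"               { max = $2 + 0 }
    $1 == "mem_fragmentation_ratio" { frag = $2 + 0 }
    END {
      alerts = 0
      # Skip the usage check when maxmemory is 0 (unlimited).
      if (max > 0 && used / max > 0.85) { print "RedisMemoryHigh"; alerts++ }
      if (frag > 1.5) { print "RedisFragmentationHigh"; alerts++ }
      if (alerts == 0) print "OK"
      exit (alerts > 0)
    }'
}
```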

Chapter Summary

Core principles for running Redis on Kubernetes:

  1. StatefulSet is mandatory: Stable Pod names and network identities are prerequisites for Redis master-replica and Cluster operation. Deployment must never be used for stateful Redis.

  2. Storage selection: Use reclaimPolicy: Retain to prevent accidental data deletion; cloud SSD (gp3 EBS or equivalent) provides adequate IOPS; volumeClaimTemplate ensures each Pod has its own reusable PVC.

  3. Three-layer memory planning: maxmemory → requests (COW headroom) → limits (fragmentation headroom). Each layer must have a meaningful buffer above the previous one.

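  4. Kernel parameters: vm.overcommit_memory=1 and THP disabling must be in place before Redis starts, for example via privileged initContainers at Pod startup (which change node-wide settings) or directly in the node configuration; these are prerequisites for stable Redis operation.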

  5. Backup validation: Periodic backups alone do not guarantee safety. Run a full restore drill monthly to verify that backup files are valid and the recovery procedure works end to end.
