Chapter 32

MySQL on Kubernetes


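Running MySQL on Kubernetes is a double-edged sword: you gain the elasticity, standardization, and automated operations of a cloud-native platform, but you also take on the particular challenges that stateful applications face under container orchestration. This chapter covers production-grade deployment approaches.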

Why run MySQL on K8s

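Advantages | Challenges
Unified infrastructure (same platform as the application services) | Scheduling stateful applications is complex
Standardized configuration management (ConfigMap/Secret) | Extra overhead on network I/O
Automatic restart of failed Pods (liveness probe) | Primary/replica failover needs an Operator to coordinate it
Horizontal scaling of read-only replicas | PVC binding restricts where Pods can be scheduled
Standardized deployment with Helm Charts | Data safety demands stricter operational discipline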

When should you not run MySQL on K8s?

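Use a managed cloud database (RDS/PolarDB) or self-managed bare-metal servers instead when the workload is extremely latency-sensitive (for example, a P99 write latency requirement under 1 ms), when the primary is very large (more than 5 TB of data), or when the team lacks experience operating stateful applications on K8s.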

Choosing a MySQL Operator

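Managing MySQL primary/replica replication directly with a StatefulSet is tedious. The Operator pattern encodes that operational knowledge in an automated controller.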

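Operator | Maintainer | Supported topologies | Rating
MySQL InnoDB Cluster Operator | Oracle | InnoDB Cluster / ReplicaSet | ⭐⭐⭐⭐
Percona Operator for MySQL | Percona | Primary/replica (asynchronous) / Group Replication | ⭐⭐⭐⭐⭐
Bitnami MySQL Chart | VMware | Primary/replica | ⭐⭐⭐ (development environments)
Vitess Operator | PlanetScale (Vitess itself is a CNCF project) | Sharding + primary/replica | ⭐⭐⭐⭐ (large scale)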

Deploying a Group Replication cluster with the Percona Operator

# Install Percona Operator for MySQL (pinned to the release that matches crVersion below)
kubectl apply --server-side -f https://raw.githubusercontent.com/percona/percona-server-mysql-operator/v0.7.0/deploy/bundle.yaml

# Create the PerconaServerMySQL resource
cat > mysql-cluster.yaml <<'EOF'
apiVersion: ps.percona.com/v1alpha1
kind: PerconaServerMySQL
metadata:
  name: cluster1
spec:
  crVersion: 0.7.0
  secretsName: cluster1-secrets
  mysql:
    clusterType: group-replication
    image: percona/percona-server:8.0.36
    size: 3          # 3-node Group Replication
    resources:
      requests:
        memory: 2G
        cpu: "1"
      limits:
        memory: 4G
        cpu: "2"
    volumeSpec:
      persistentVolumeClaim:
        storageClassName: fast-ssd
        accessModes: [ReadWriteOnce]
        resources:
          requests:
            storage: 200Gi
  proxy:
    haproxy:
      enabled: true
      size: 2
      image: percona/haproxy:2.8
EOF
kubectl apply -f mysql-cluster.yaml
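
After applying the manifest, it helps to wait until the Operator reports the cluster as ready before pointing applications at it. The following is a minimal sketch rather than a documented procedure. It assumes the custom resource publishes its state at `.status.state` with the value `ready`; `cluster_state`, `wait_until_ready`, and the `KUBECTL` override are hypothetical names introduced here.

```shell
# Command used to talk to the cluster; override KUBECTL to stub it out.
KUBECTL="${KUBECTL:-kubectl}"

# Print the state the Operator reports for a PerconaServerMySQL resource.
# Assumption: the state is published at .status.state.
cluster_state() {
  "$KUBECTL" get perconaservermysql "$1" -o jsonpath='{.status.state}'
}

# Poll until the state is "ready".
#   $1 = resource name, $2 = max attempts (default 60), $3 = seconds between attempts (default 5)
wait_until_ready() {
  name=$1 attempts=${2:-60} delay=${3:-5}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ "$(cluster_state "$name" 2>/dev/null)" = "ready" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

A typical call would be `wait_until_ready cluster1 60 5 && echo "cluster1 is ready"`, which gives the cluster up to five minutes to form.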

Manual StatefulSet deployment (single primary, multiple replicas)

Core YAML

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      initContainers:
      - name: init-mysql
        image: mysql:8.0.36
        command:
        - bash
        - "-c"
        - |
          # Derive server-id from the Pod ordinal
          [[ $(hostname) =~ -([0-9]+)$ ]] || exit 1
          ordinal=${BASH_REMATCH[1]}
          echo [mysqld] > /mnt/conf.d/server-id.cnf
          echo server-id=$((100 + $ordinal)) >> /mnt/conf.d/server-id.cnf
          # The primary (ordinal=0) gets the binlog configuration
          if [[ $ordinal -eq 0 ]]; then
            cp /mnt/config-map/primary.cnf /mnt/conf.d/
          else
            cp /mnt/config-map/replica.cnf /mnt/conf.d/
          fi
        volumeMounts:
        - name: conf
          mountPath: /mnt/conf.d
        - name: config-map
          mountPath: /mnt/config-map

      containers:
      - name: mysql
        image: mysql:8.0.36
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        ports:
        - containerPort: 3306
        livenessProbe:
          exec:
            command: ["mysqladmin", "ping"]
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
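            # Probe commands are not run through a shell and $(VAR) is not
            # expanded here, so wrap the check in bash -c.
            command:
            - bash
            - -c
            - mysql -uroot -p"${MYSQL_ROOT_PASSWORD}" -e 'SELECT 1'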
          initialDelaySeconds: 5
          periodSeconds: 2
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
        resources:
          requests:
            cpu: "500m"
            memory: 1Gi
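          limits:
            cpu: "2"
            memory: 4Gi
      # Volumes referenced by the volumeMounts above
      volumes:
      - name: conf
        emptyDir: {}
      - name: config-map
        configMap:
          name: mysql-config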

  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ReadWriteOnce]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 100Gi
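
The init container derives each instance's `server-id` from the Pod ordinal, which works because StatefulSet Pods get stable names of the form `<name>-<ordinal>`. The mapping can be sketched as a standalone shell function. `server_id_for` is a hypothetical helper name, and the offset of 100 is the one used in the init script above.

```shell
# Print the MySQL server-id for a StatefulSet Pod hostname such as "mysql-2".
# Fails if the hostname does not end in "-<digits>".
server_id_for() (
  host=$1
  case $host in
    *-*) ;;                                  # must contain a hyphen
    *) echo "no ordinal in '$host'" >&2; exit 1 ;;
  esac
  ordinal=${host##*-}                        # text after the last hyphen
  case $ordinal in
    ''|*[!0-9]*) echo "no ordinal in '$host'" >&2; exit 1 ;;
  esac
  echo $((100 + ordinal))
)
```

Inside a Pod this would be called as `server_id_for "$(hostname)"`. Starting at 100 rather than 0 matters because MySQL does not allow replication with `server-id=0`.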

Service configuration (read/write split)

# Write Service: targets the primary (mysql-0)
apiVersion: v1
kind: Service
metadata:
  name: mysql-write
spec:
  selector:
    app: mysql
    statefulset.kubernetes.io/pod-name: mysql-0
  ports:
  - port: 3306

---
# Read Service: load-balances across all Pods (including the primary)
apiVersion: v1
kind: Service
metadata:
  name: mysql-read
spec:
  selector:
    app: mysql
  ports:
  - port: 3306
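
With two Services in place, the application (or a proxy in front of it) has to decide which one each statement goes to. The function below is a deliberately naive sketch of that routing decision; `endpoint_for` is a hypothetical name. It only looks at the first keyword, so it would misroute statements such as `SELECT ... FOR UPDATE`, and it ignores the fact that a read issued right after a write may land on a replica that has not caught up yet.

```shell
# Print the Service endpoint a SQL statement should be sent to.
# Plain reads go to mysql-read; everything else goes to mysql-write.
endpoint_for() (
  # First word of the statement, upper-cased
  verb=$(printf '%s\n' "$1" | awk 'NR == 1 { print toupper($1) }')
  case $verb in
    SELECT|SHOW) echo "mysql-read:3306" ;;
    *)           echo "mysql-write:3306" ;;
  esac
)
```

For example, `endpoint_for "select * from orders"` prints the read endpoint, while any `INSERT`, `UPDATE`, or DDL statement is sent to the write endpoint.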

Persistent storage best practices

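Storage type | Use case | Notes
Local PV (local disk) | Highest I/O performance | Data may be lost if the node fails; needs an Operator to coordinate recovery
AWS EBS gp3/io2 | MySQL primary on AWS | Volume lives in the same AZ as the Pod; supports dynamic expansion
AWS EFS | Shared files such as backup staging | NFS-based with high latency; unsuitable for a MySQL data directory, which instances cannot share
Ceph RBD (Rook) | Self-managed K8s clusters | Multiple replicas, moderate performance
OpenEBS LVM | Self-managed, high performance | Uses the host's LVM directly

# StorageClass example (AWS EBS gp3)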
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "16000"
  throughput: "1000"
  encrypted: "true"
reclaimPolicy: Retain          # Important: keep the volume after its PVC is deleted
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer  # provision the volume in the AZ where the Pod is scheduled

reclaimPolicy must be set to Retain!

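With the default Delete policy, the EBS volume is deleted automatically as soon as its PVC is deleted, and the data is gone for good. A StorageClass used for production MySQL must set reclaimPolicy: Retain.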
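
A StorageClass only sets the policy for volumes provisioned after it is created, so volumes that already exist may still carry Delete. The sketch below finds them. `non_retained_pvs` is a hypothetical helper name; the `kubectl patch pv` command shown in the comment is the standard way to change the reclaim policy of an existing PersistentVolume.

```shell
# Read "NAME POLICY" pairs on stdin and print the names of the volumes
# whose reclaim policy is anything other than Retain.
non_retained_pvs() {
  awk 'NF >= 2 && $2 != "Retain" { print $1 }'
}

# Intended use against a live cluster (not executed here):
#   kubectl get pv --no-headers \
#     -o custom-columns=NAME:.metadata.name,POLICY:.spec.persistentVolumeReclaimPolicy \
#     | non_retained_pvs \
#     | while read -r pv; do
#         kubectl patch pv "$pv" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
#       done
```

Running the listing step on its own first is a safe way to audit a cluster before changing anything.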

High availability configuration

Pod anti-affinity (different nodes)

spec:   # Pod template spec (spec.template.spec of the StatefulSet)
  affinity:
    podAntiAffinity:
      # Hard rule: never place two MySQL Pods on the same host
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: mysql
        topologyKey: kubernetes.io/hostname
      # Soft rule: prefer spreading across availability zones
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: mysql
          topologyKey: topology.kubernetes.io/zone

Pod Disruption Budget (limiting disruptions)

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: mysql-pdb
spec:
  minAvailable: 2  # at least 2 of the 3 nodes must stay available
  selector:
    matchLabels:
      app: mysql
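
The value `minAvailable: 2` is not arbitrary: a three-member group needs a majority of two to keep accepting writes, and the budget allows a voluntary eviction only while that majority would survive. The arithmetic can be sketched as follows; `quorum` and `allowed_disruptions` are hypothetical helper names.

```shell
# Smallest number of members that still forms a majority in a group of $1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Voluntary evictions a PDB permits right now:
#   $1 = currently healthy Pods, $2 = minAvailable. Never below zero.
allowed_disruptions() {
  d=$(( $1 - $2 ))
  if [ "$d" -lt 0 ]; then
    d=0
  fi
  echo "$d"
}
```

With all three Pods healthy, one eviction is allowed (for example during a node drain). If one Pod is already down, the budget drops to zero and the drain waits.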

Configuration management

# ConfigMap holding the my.cnf fragments
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql-config
data:
  primary.cnf: |
    [mysqld]
    log_bin                     = mysql-bin
    binlog_format               = ROW
    gtid_mode                   = ON
    enforce_gtid_consistency    = ON
    sync_binlog                 = 1
    innodb_flush_log_at_trx_commit = 1

  replica.cnf: |
    [mysqld]
    super_read_only             = ON
    relay_log                   = relay-bin
    log_replica_updates         = ON   # replaces the deprecated log_slave_updates (8.0.26+)
    gtid_mode                   = ON
    enforce_gtid_consistency    = ON

---
# Secret holding the passwords (in production, use Vault / External Secrets)
apiVersion: v1
kind: Secret
metadata:
  name: mysql-secret
type: Opaque
stringData:
  root-password: "StrongRootPass!123"
  replication-password: "ReplPass!456"
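
Literal passwords in a manifest tend to end up in version control. One alternative is to generate them at creation time and hand them straight to `kubectl create secret`. The following is a minimal sketch under that assumption; `gen_password` is a hypothetical helper name.

```shell
# Generate a 32-character alphanumeric password from /dev/urandom.
# 48 random bytes become 64 base64 characters; after dropping "+", "/"
# and "=", the first 32 characters are kept.
gen_password() {
  head -c 48 /dev/urandom | base64 | tr -d '\n=+/' | cut -c1-32
}

# Intended use against a live cluster (not executed here):
#   kubectl create secret generic mysql-secret \
#     --from-literal=root-password="$(gen_password)" \
#     --from-literal=replication-password="$(gen_password)"
```

The key names match the ones the StatefulSet reads through `secretKeyRef`, so the Pods need no changes.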

Backup and restore

# CronJob: periodic backup to S3
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mysql-backup
spec:
  schedule: "0 2 * * *"   # every day at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: percona/percona-xtrabackup:8.0
            command:
            - sh
            - -c
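            - |
              # NOTE: xtrabackup takes physical backups and must read the MySQL
              # data directory directly, so this container needs access to the
              # datadir volume; the image must also provide the aws CLI.
              set -e   # do not upload anything if the backup fails
              xtrabackup --backup --user=root \
                --password="$MYSQL_ROOT_PASSWORD" \
                --host=mysql-write \
                --target-dir=/backup/$(date +%Y%m%d%H%M%S)
              # Upload to S3
              aws s3 sync /backup/ s3://my-mysql-backups/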
            env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: root-password
          restartPolicy: OnFailure
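
The CronJob above writes each backup to a directory named after its timestamp, so the number of backups grows by one every night. Because the names sort chronologically, choosing what to delete is a short pipeline. This is a sketch of one possible retention rule (keep the newest N), not something the chapter prescribes; `backups_to_prune` is a hypothetical helper name.

```shell
# Read backup directory names (YYYYmmddHHMMSS) on stdin and print those
# that fall outside the newest $1 backups, newest first.
backups_to_prune() {
  keep=$1
  sort -r | tail -n +"$((keep + 1))"
}

# Intended use on a backup volume (not executed here):
#   ls /backup | backups_to_prune 7 | while read -r dir; do
#     rm -rf "/backup/$dir"
#   done
```

Printing the candidates before deleting them makes it easy to review the list first.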

Observability

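# Deploy MySQL Exporter
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mysql-exporter
  template:
    metadata:
      labels:
        app: mysql-exporter
    spec:
      containers:
      - name: exporter
        image: prom/mysqld-exporter:v0.15.0
        args:
        - --mysqld.username=$(MYSQL_USER)
        - --mysqld.address=mysql-write:3306
        - --collect.info_schema.innodb_metrics
        - --collect.info_schema.processlist
        - --collect.slave_status
        env:
        - name: MYSQL_USER
          value: exporter              # assumed monitoring account; create it in MySQL first
        - name: MYSQLD_EXPORTER_PASSWORD   # v0.15.0 reads the password from this variable
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: exporter-password   # assumed key; add it to mysql-secret
        ports:
        - name: metrics
          containerPort: 9104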

---
# Prometheus ServiceMonitor (selects a Service labelled app: mysql-exporter with a port named "metrics")
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mysql-monitor
spec:
  selector:
    matchLabels:
      app: mysql-exporter
  endpoints:
  - port: metrics
    interval: 30s
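
Before relying on dashboards, it is worth confirming that the exporter can actually reach MySQL. The exporter publishes a `mysql_up` gauge that is 1 when its last connection attempt succeeded. The check below is a small sketch that reads the metrics text on stdin; `exporter_reports_up` is a hypothetical helper name.

```shell
# Read Prometheus exposition text on stdin; succeed only if mysql_up is 1.
exporter_reports_up() {
  awk '$1 == "mysql_up" { found = ($2 == 1) } END { exit found ? 0 : 1 }'
}

# Intended use (not executed here), with the exporter port forwarded locally:
#   kubectl port-forward deploy/mysql-exporter 9104:9104 &
#   curl -s http://localhost:9104/metrics | exporter_reports_up \
#     && echo "exporter can reach MySQL"
```

A failing check usually points at the credentials or the `--mysqld.address` value rather than at Prometheus itself.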

Key monitoring metrics

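Metric | Alert threshold | Meaning
mysql_global_status_threads_connected | > 80% of max_connections | Connection count is close to the limit
mysql_slave_status_seconds_behind_master | > 30s | Replication lag
mysql_global_status_innodb_buffer_pool_reads | Rate keeps rising | Buffer pool hit ratio is falling (more reads go to disk)
mysql_global_status_slow_queries | Abnormal growth rate | Slow queries are increasing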
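
Two of these thresholds are derived values rather than raw counters. The sketch below shows the arithmetic; both function names are hypothetical. The hit ratio uses the standard InnoDB counters `Innodb_buffer_pool_reads` and `Innodb_buffer_pool_read_requests`. Both are cumulative since server start, so in practice the ratio is more informative when computed over the change between two samples.

```shell
# 80% of max_connections, rounded down: the alert threshold for threads_connected.
connection_alert_threshold() {
  echo $(( $1 * 80 / 100 ))
}

# Buffer pool hit ratio in percent.
#   $1 = Innodb_buffer_pool_reads          (reads that had to go to disk)
#   $2 = Innodb_buffer_pool_read_requests  (logical read requests)
buffer_pool_hit_ratio() {
  awk -v reads="$1" -v requests="$2" 'BEGIN {
    if (requests == 0) { print "n/a"; exit }
    printf "%.2f\n", (1 - reads / requests) * 100
  }'
}
```

With MySQL's default `max_connections` of 151, the connection alert would fire above 120 connected threads.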

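Production recommendation: for MySQL on K8s, use an Operator rather than a hand-written StatefulSet. An Operator automates complex operations such as failover, backup scheduling, and scaling up or down, which greatly reduces the risk of human error. The Percona Operator is among the most mature open-source options.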
