Chapter 12

Model Configuration in Depth: Key Rotation, Failover, and Reasoning-Depth Control

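Overview

In production, a single API key with a fixed model configuration is rarely robust enough. OpenClaw provides three advanced mechanisms: Key Rotation, Failover, and reasoning-depth control (Think mode). Together they form the basis for reliability and cost efficiency in a production AI application.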

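12.1 API Key Rotation Configuration

Why Key Rotation is needed

Under high concurrency, a single API key quickly hits the provider's rate limits (TPM/RPM, that is, tokens and requests per minute). Key Rotation spreads requests across several keys, so usable throughput grows roughly in proportion to the number of keys, provided each key carries its own quota.

Configuration format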

{
  "providers": {
    "anthropic": {
      "api_keys": [
        {
          "key": "${ANTHROPIC_API_KEY_1}",
          "priority": 1,
          "weight": 3,
          "label": "primary-team-a"
        },
        {
          "key": "${ANTHROPIC_API_KEY_2}",
          "priority": 1,
          "weight": 2,
          "label": "primary-team-b"
        },
        {
          "key": "${ANTHROPIC_API_KEY_BACKUP}",
          "priority": 2,
          "weight": 1,
          "label": "backup"
        }
      ],
      "rotation_strategy": "weighted_round_robin"
    }
  }
}

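Priority logic

OpenClaw selects a key according to the following rules:

A smaller priority value means higher priority (priority=1 is chosen before priority=2).
Within the same priority, keys are rotated in proportion to their weight.
When every key at a higher priority is unavailable, selection automatically falls back to the next priority group.

Priority 1: [Key-A weight=3, Key-B weight=2]  -> requests split 3:2
Priority 2: [Key-Backup weight=1]              -> used only when all Priority 1 keys have failed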
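
The selection rules above can be sketched in a few lines of Python. This is an illustration only, not OpenClaw's implementation: the `ApiKey` class, its `available` flag, and the `rotation_cycle` helper are hypothetical names introduced here, and the naive expansion used below sends consecutive requests to the same key, whereas a real scheduler would likely interleave them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiKey:
    label: str
    priority: int          # lower number = higher priority
    weight: int
    available: bool = True  # hypothetical health flag


def rotation_cycle(keys):
    """Return the labels used in one weighted round-robin cycle.

    Only the best (lowest-numbered) priority group that still has an
    available key takes part; lower groups act as fallbacks.
    """
    usable = [k for k in keys if k.available]
    if not usable:
        raise RuntimeError("no API key available")
    best = min(k.priority for k in usable)
    group = [k for k in usable if k.priority == best]
    # Naive expansion: a key with weight 3 appears three times per cycle.
    return [k.label for k in group for _ in range(k.weight)]


keys = [
    ApiKey("primary-team-a", priority=1, weight=3),
    ApiKey("primary-team-b", priority=1, weight=2),
    ApiKey("backup", priority=2, weight=1),
]
print(rotation_cycle(keys))
```

With both primary keys marked unavailable, the same function returns only the backup key, which mirrors the fallback rule.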

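Rotation strategy options

Strategy	Description	When to use
`round_robin`	Equal-weight rotation	Keys with identical quotas
`weighted_round_robin`	Weighted rotation	Keys with different quotas
`random`	Random selection	Spreading load
`least_used`	Picks the key with the lowest usage	Fine-grained balancing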

12.2 Automatic Retry on 429 / Quota Errors

Error classification and response

{
  "retry": {
    "enabled": true,
    "max_attempts": 5,
    "backoff_strategy": "exponential",
    "base_delay_ms": 1000,
    "max_delay_ms": 30000,
    "retryable_errors": [429, 500, 502, 503, 504],
    "non_retryable_errors": [400, 401, 403, 404]
  }
}

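Retry flow

Request sent
   |
   v
API response
   |
 Failed?
   |
   +-- No --> return result
   |
   +-- Yes --> Status code is 429?
               |
               +-- Yes --> read the Retry-After header
               |           wait for the indicated time
               |           switch to the next key (same priority)
               |           retry (one fewer attempt left out of max_attempts)
               |
               +-- No --> Retryable status code?
                          |
                          +-- Yes --> wait with exponential backoff
                          |           retry
                          |
                          +-- No --> raise Non-retryable Error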
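
A minimal Python sketch of the flow above may help make the branches concrete. It is not OpenClaw code: `send` is a hypothetical callable that returns `(status, retry_after_seconds, body)`, the `NonRetryableError` class is defined here for illustration, and falling back to the base delay when no Retry-After value is present is an assumption.

```python
import time

RETRYABLE = {429, 500, 502, 503, 504}


class NonRetryableError(Exception):
    """Raised for status codes that should not be retried (400, 401, ...)."""


def send_with_retry(send, keys, max_attempts=5, base_delay_s=1.0,
                    max_delay_s=30.0, sleep=time.sleep):
    """Call send(key) until it succeeds or the attempts run out."""
    key_index = 0
    for attempt in range(max_attempts):
        status, retry_after_s, body = send(keys[key_index])
        if status < 400:
            return body
        if status not in RETRYABLE:
            raise NonRetryableError(f"HTTP {status}")
        if attempt == max_attempts - 1:
            break  # out of attempts, no point in waiting again
        if status == 429:
            # Honor Retry-After when present, then rotate to the next key.
            delay = retry_after_s if retry_after_s is not None else base_delay_s
            key_index = (key_index + 1) % len(keys)
        else:
            # Other retryable errors: plain exponential backoff.
            delay = min(base_delay_s * 2 ** attempt, max_delay_s)
        sleep(delay)
    raise RuntimeError(f"gave up after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the function testable: a test can record the requested delays instead of actually waiting.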

Exponential backoff configuration in detail

{
  "retry": {
    "backoff_strategy": "exponential_jitter",
    "base_delay_ms": 1000,
    "multiplier": 2.0,
    "jitter": 0.3,
    "max_delay_ms": 30000
  }
}

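Delay calculation: delay = min(base_delay_ms * multiplier^(n-1) * (1 ± jitter), max_delay_ms), where n is the retry number, starting at 1.

Retry	Base delay	With jitter (illustrative)
1st retry	1000ms	700~1300ms
2nd retry	2000ms	1400~2600ms
3rd retry	4000ms	2800~5200ms
4th retry	8000ms	5600~10400ms
5th retry	16000ms	11200~20800ms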
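
The table can be reproduced with a short Python function. This is a sketch under stated assumptions: the retry number is taken to be 1-based (so the first retry waits about `base_delay_ms`), the jitter is drawn uniformly from ±`jitter`, and the function name `backoff_delay_ms` is hypothetical.

```python
import random


def backoff_delay_ms(retry, base_delay_ms=1000, multiplier=2.0,
                     jitter=0.3, max_delay_ms=30000, rng=random):
    """Delay in milliseconds before the given retry (1-based)."""
    raw = base_delay_ms * multiplier ** (retry - 1)
    # Symmetric jitter: scale by a random factor in [1 - jitter, 1 + jitter].
    factor = 1 + rng.uniform(-jitter, jitter)
    return min(raw * factor, max_delay_ms)


for retry in range(1, 6):
    print(retry, round(backoff_delay_ms(retry)), "ms")
```

Because the jitter is random, each run prints different values, but every value stays inside the range listed for that retry. From the sixth retry on, the cap of `max_delay_ms` starts to apply.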

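12.3 The FailoverError Trigger Chain

What is FailoverError

FailoverError is a special error type defined by OpenClaw. It is raised when a model or provider cannot complete a request, and the system then switches automatically to the next model in the preconfigured failover chain.

Failover chain configuration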

{
  "failover": {
    "enabled": true,
    "chain": [
      {
        "model": "anthropic/claude-opus-4-6",
        "timeout_ms": 30000,
        "triggers": ["FailoverError", "timeout", "context_length_exceeded"]
      },
      {
        "model": "anthropic/claude-sonnet-4-6",
        "timeout_ms": 20000,
        "triggers": ["FailoverError", "timeout"]
      },
      {
        "model": "openai/gpt-5.5",
        "timeout_ms": 25000,
        "triggers": ["FailoverError", "timeout"]
      },
      {
        "model": "ollama/llama3.2",
        "timeout_ms": 60000,
        "triggers": ["FailoverError"]
      }
    ],
    "on_failover_log": true,
    "on_failover_notify_webhook": "${ALERT_WEBHOOK_URL}"
  }
}

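Trigger types

Trigger	Description
`FailoverError`	The model returned an explicit failure
`timeout`	No response within timeout_ms
`context_length_exceeded`	The input exceeds the model's context limit
`rate_limit_exhausted`	Every key is rate-limited
`content_filtered`	The content was filtered by the provider
`model_overloaded`	The model service is overloaded

Sample failover log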

[2026-04-26T10:23:41Z] INFO  Primary model request started: anthropic/claude-opus-4-6
[2026-04-26T10:24:11Z] WARN  Timeout after 30000ms, triggering failover
[2026-04-26T10:24:11Z] INFO  Failover to: anthropic/claude-sonnet-4-6 (attempt 2/4)
[2026-04-26T10:24:25Z] INFO  Request completed successfully on fallback model
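
The behavior shown in the log can be approximated with the following Python sketch. It is illustrative only: `call_model` is a hypothetical client function, the `FailoverError` class here is a stand-in for OpenClaw's own type, and treating a missing `triggers` list as "FailoverError and timeout" is an assumption. Only two of the trigger types are modeled.

```python
class FailoverError(Exception):
    """Stand-in for the error raised when a model cannot complete a request."""


def run_with_failover(chain, call_model, prompt):
    """Try each chain entry in order; return (model, result) on first success."""
    errors = []
    for position, entry in enumerate(chain, start=1):
        triggers = set(entry.get("triggers", ["FailoverError", "timeout"]))
        try:
            result = call_model(entry["model"], prompt, entry["timeout_ms"])
            return entry["model"], result
        except (FailoverError, TimeoutError) as exc:
            kind = "timeout" if isinstance(exc, TimeoutError) else "FailoverError"
            if kind not in triggers:
                raise  # this entry is not configured to fail over on this error
            errors.append(f"{position}/{len(chain)} {entry['model']}: {kind}")
    raise FailoverError("all models failed: " + "; ".join(errors))


chain = [
    {"model": "anthropic/claude-opus-4-6", "timeout_ms": 30000,
     "triggers": ["FailoverError", "timeout"]},
    {"model": "anthropic/claude-sonnet-4-6", "timeout_ms": 20000,
     "triggers": ["FailoverError", "timeout"]},
]


def fake_call(model, prompt, timeout_ms):
    # Stand-in for a real client: the first model always times out.
    if model == "anthropic/claude-opus-4-6":
        raise TimeoutError("no response")
    return f"answer from {model}"


print(run_with_failover(chain, fake_call, "hello"))
```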

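12.4 Profile Cooling Tracking

A cooling mechanism that prevents frequent switching

Switching between providers too often adds unnecessary latency and can leave state inconsistent. OpenClaw's Profile cooling mechanism ensures that a provider or key that has just failed repeatedly is not selected again right away.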

{
  "profile_cooling": {
    "enabled": true,
    "error_threshold": 3,
    "cooling_period_seconds": 300,
    "recovery_check_interval_seconds": 60,
    "metrics": {
      "track_per_key": true,
      "track_per_model": true,
      "track_per_provider": false
    }
  }
}

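Cooling state machine

Key available (Active)
    |
    | error count >= error_threshold
    v
Enters cooling (Cooling)
    |
    | wait cooling_period_seconds
    v
Attempts recovery check (Recovery Check)
    |
    +-- health check passes --> Active
    |
    +-- health check fails --> reset cooling timer --> Cooling

Checking the current Profile status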
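
The state machine above can be modeled with a small Python class. This is a sketch, not OpenClaw's implementation: the `KeyProfile` class and its method names are hypothetical, time is passed in explicitly as seconds so the logic is easy to test, and clearing the error count on recovery is an assumption.

```python
from dataclasses import dataclass


@dataclass
class KeyProfile:
    error_threshold: int = 3
    cooling_period_s: float = 300.0
    state: str = "ACTIVE"
    errors: int = 0
    cooling_until: float = 0.0

    def record_error(self, now):
        """Count one failure; start cooling once the threshold is reached."""
        self.errors += 1
        if self.state == "ACTIVE" and self.errors >= self.error_threshold:
            self.state = "COOLING"
            self.cooling_until = now + self.cooling_period_s

    def recovery_check(self, now, healthy):
        """Run a health check once the cooling period has elapsed."""
        if self.state != "COOLING" or now < self.cooling_until:
            return self.state
        if healthy:
            # Assumption: a successful check also clears the error count.
            self.state, self.errors = "ACTIVE", 0
        else:
            # Failed check: restart the cooling timer.
            self.cooling_until = now + self.cooling_period_s
        return self.state


profile = KeyProfile()
for _ in range(3):
    profile.record_error(now=0.0)
print(profile.state)
print(profile.recovery_check(now=300.0, healthy=True))
```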

# CLI command
openclaw profile status

# Sample output
Provider Profile Status:
  anthropic/claude-opus-4-6
    Key: sk-ant-...xxx1  Status: ACTIVE    Errors: 0    Last used: 2s ago
    Key: sk-ant-...xxx2  Status: COOLING   Errors: 3    Cooling until: 14:28:41
    Key: sk-ant-...xxx3  Status: ACTIVE    Errors: 1    Last used: 45s ago

  openai/gpt-5.5
    Key: sk-proj-...yyy1 Status: ACTIVE    Errors: 0    Last used: 1m ago

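12.5 Reasoning-Depth Control: the /think Command

Three reasoning modes

OpenClaw lets you control the model's reasoning depth dynamically with the /think command. This is especially useful when you need to trade response speed against reasoning quality.

# Adaptive mode (default): picks the reasoning depth automatically based on question complexity
/think adaptive

# High-depth reasoning: enables CoT / Extended Thinking
/think high

# Reasoning off: fastest response, direct output
/think off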

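When to use each mode

Mode	Token usage	Response latency	Suitable for
`/think off`	Lowest	Shortest	Simple Q&A, classification, summarization, formatting
`/think adaptive`	Medium	Medium	General-purpose use; recommended default
`/think high`	Highest	Longest	Mathematical derivation, code debugging, complex planning

Setting the default reasoning mode in the configuration file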

{
  "inference": {
    "default_think_mode": "adaptive",
    "per_model_overrides": {
      "anthropic/claude-opus-4-6": "high",
      "anthropic/claude-haiku-4-5": "off",
      "openai/o3": "high",
      "openai/gpt-5.4-mini": "off"
    }
  }
}
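
One plausible reading of this configuration is that a per-model override wins over the default. The Python sketch below shows that lookup; the `resolve_think_mode` helper is hypothetical, and the precedence it implements is an assumption about how the two settings interact.

```python
inference_config = {
    "default_think_mode": "adaptive",
    "per_model_overrides": {
        "anthropic/claude-opus-4-6": "high",
        "anthropic/claude-haiku-4-5": "off",
    },
}


def resolve_think_mode(config, model):
    """Return the per-model override if one exists, else the default mode."""
    overrides = config.get("per_model_overrides", {})
    return overrides.get(model, config.get("default_think_mode", "adaptive"))


print(resolve_think_mode(inference_config, "anthropic/claude-opus-4-6"))
print(resolve_think_mode(inference_config, "anthropic/claude-sonnet-4-6"))
```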

Reasoning token budget control

{
  "inference": {
    "think_budget": {
      "adaptive_min_tokens": 500,
      "adaptive_max_tokens": 4000,
      "high_max_tokens": 16000
    }
  }
}

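12.6 Binding Different Models per Channel

Business scenario design

Different user-facing channels place different demands on response speed and quality:

WhatsApp: users expect quick replies, so use a lightweight, fast model
Telegram: conversations of moderate complexity, so use a balanced model
Web API: professional users who accept longer waits, so use the flagship model
Internal tools: can use a local model to save cost

Channel-to-model binding configuration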

{
  "channel_model_bindings": {
    "whatsapp": {
      "model": "anthropic/claude-haiku-4-5",
      "think_mode": "off",
      "max_tokens": 1024,
      "temperature": 0.7
    },
    "telegram": {
      "model": "anthropic/claude-sonnet-4-6",
      "think_mode": "adaptive",
      "max_tokens": 4096,
      "temperature": 0.7
    },
    "web_api": {
      "model": "anthropic/claude-opus-4-6",
      "think_mode": "high",
      "max_tokens": 8192,
      "temperature": 0.5
    },
    "internal_tool": {
      "model": "ollama/llama3.2",
      "think_mode": "off",
      "max_tokens": 2048
    }
  }
}

Referencing channel bindings in the Agent definition

{
  "agent": {
    "name": "customer-support",
    "channel": "${INCOMING_CHANNEL}",
    "fallback_channel": "telegram"
  }
}
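
How the incoming channel might be resolved to a binding can be sketched in Python. The `resolve_binding` helper is hypothetical, and the sketch assumes that `fallback_channel` is used whenever the incoming channel has no binding of its own; the text does not spell out that rule.

```python
bindings = {
    "whatsapp": {"model": "anthropic/claude-haiku-4-5", "think_mode": "off"},
    "telegram": {"model": "anthropic/claude-sonnet-4-6", "think_mode": "adaptive"},
    "web_api": {"model": "anthropic/claude-opus-4-6", "think_mode": "high"},
}


def resolve_binding(bindings, channel, fallback_channel):
    """Return the binding for `channel`, or the fallback channel's binding."""
    if channel in bindings:
        return bindings[channel]
    # Unknown channel: fall back (raises KeyError if the fallback is missing too).
    return bindings[fallback_channel]


print(resolve_binding(bindings, "whatsapp", "telegram")["model"])
print(resolve_binding(bindings, "sms", "telegram")["model"])
```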

12.7 Cost Control: Per-Provider Model Cost Comparison

Cost-aware configuration

{
  "cost_control": {
    "enabled": true,
    "budget": {
      "daily_usd": 100.0,
      "monthly_usd": 2000.0,
      "alert_threshold_pct": 80
    },
    "auto_downgrade": {
      "enabled": true,
      "trigger_pct": 90,
      "downgrade_to": "anthropic/claude-sonnet-4-6"
    }
  }
}
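
The two thresholds in this configuration suggest a simple decision: alert at 80% of the budget, downgrade at 90%. The Python sketch below expresses that reading. The `budget_action` function and its return values are hypothetical, and it assumes both percentages are measured against the same budget figure.

```python
def budget_action(spent_usd, budget_usd,
                  alert_threshold_pct=80, downgrade_trigger_pct=90):
    """Return "ok", "alert", or "downgrade" for the current spend."""
    used_pct = 100 * spent_usd / budget_usd
    if used_pct >= downgrade_trigger_pct:
        return "downgrade"   # e.g. switch to the configured downgrade_to model
    if used_pct >= alert_threshold_pct:
        return "alert"
    return "ok"


for spent in (50.0, 85.0, 95.0):
    print(spent, budget_action(spent, budget_usd=100.0))
```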

Cost comparison of mainstream models (reference prices, Q1 2026)

Model	Input price ($/M tokens)	Output price ($/M tokens)	Cost-effectiveness
anthropic/claude-haiku-4-5	0.25	1.25	Very high
anthropic/claude-sonnet-4-6	3.00	15.00	High
anthropic/claude-opus-4-6	15.00	75.00	Low (most capable)
openai/gpt-5.4-mini	0.15	0.60	Very high
openai/gpt-5.5	10.00	30.00	Medium
openai/o3	10.00	40.00	Low (strongest reasoning)
deepseek/deepseek-r1	0.55	2.19	Very high (reasoning)
google/gemini-3-flash	0.075	0.30	Highest
ollama/any model	0.00	0.00	Unbounded (local, no per-token fee)
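
To see what these per-million-token prices mean for a single request, the following Python sketch computes the cost of one call. The prices are copied from three rows of the table and are reference values only; the `request_cost_usd` helper and the example token counts are hypothetical.

```python
# (input $/M tokens, output $/M tokens), taken from the table above
PRICES = {
    "anthropic/claude-haiku-4-5": (0.25, 1.25),
    "anthropic/claude-sonnet-4-6": (3.00, 15.00),
    "anthropic/claude-opus-4-6": (15.00, 75.00),
}


def request_cost_usd(model, input_tokens, output_tokens):
    """Cost of one request in USD at the table's reference prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


for model in PRICES:
    cost = request_cost_usd(model, input_tokens=2000, output_tokens=500)
    print(f"{model}: ${cost:.6f}")
```

For a request with 2,000 input tokens and 500 output tokens, the Sonnet row works out to (2000 × 3.00 + 500 × 15.00) / 1,000,000 = $0.0135.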

Cost optimization strategy

{
  "cost_optimization": {
    "routing_rules": [
      {
        "condition": "token_estimate < 500",
        "route_to": "anthropic/claude-haiku-4-5"
      },
      {
        "condition": "token_estimate >= 500 AND complexity == 'low'",
        "route_to": "openai/gpt-5.4-mini"
      },
      {
        "condition": "complexity == 'high' OR requires_reasoning == true",
        "route_to": "anthropic/claude-opus-4-6"
      }
    ]
  }
}
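
The routing rules above can be read as a first-match list. The Python sketch below evaluates them in the order given; that ordering, the `route` function itself, and the default model returned when no rule matches are all assumptions, since the configuration does not state what happens in either case.

```python
def route(token_estimate, complexity, requires_reasoning=False,
          default="anthropic/claude-sonnet-4-6"):
    """Return the model chosen by the first matching rule."""
    if token_estimate < 500:
        return "anthropic/claude-haiku-4-5"
    if token_estimate >= 500 and complexity == "low":
        return "openai/gpt-5.4-mini"
    if complexity == "high" or requires_reasoning:
        return "anthropic/claude-opus-4-6"
    # No rule matched (for example a long, medium-complexity request).
    return default


print(route(200, "low"))
print(route(1200, "low"))
print(route(1200, "high"))
print(route(1200, "medium"))
```

Under first-match evaluation, a short request is sent to the lightweight model even when it is marked as high complexity, because the first rule matches before the third is reached. If that is not the intent, the reasoning rule would need to come first.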

12.8 Putting It Together: A Complete Production-Grade Configuration

{
  "providers": {
    "anthropic": {
      "api_keys": [
        {"key": "${ANT_KEY_1}", "priority": 1, "weight": 3},
        {"key": "${ANT_KEY_2}", "priority": 1, "weight": 2},
        {"key": "${ANT_KEY_BACKUP}", "priority": 2, "weight": 1}
      ],
      "rotation_strategy": "weighted_round_robin"
    },
    "openai": {
      "api_keys": [
        {"key": "${OAI_KEY_1}", "priority": 1, "weight": 1}
      ]
    },
    "ollama": {
      "base_url": "http://localhost:11434"
    }
  },
  "failover": {
    "enabled": true,
    "chain": [
      {"model": "anthropic/claude-sonnet-4-6", "timeout_ms": 30000},
      {"model": "openai/gpt-5.4-mini", "timeout_ms": 20000},
      {"model": "ollama/llama3.2", "timeout_ms": 60000}
    ]
  },
  "profile_cooling": {
    "enabled": true,
    "error_threshold": 3,
    "cooling_period_seconds": 300
  },
  "retry": {
    "enabled": true,
    "max_attempts": 3,
    "backoff_strategy": "exponential_jitter"
  },
  "inference": {
    "default_think_mode": "adaptive"
  },
  "cost_control": {
    "enabled": true,
    "budget": {
      "daily_usd": 50.0,
      "alert_threshold_pct": 80
    }
  }
}

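Chapter summary

Key Rotation uses priorities and weights to treat multiple API keys as a single resource pool.
429/quota retry uses exponential backoff with jitter to avoid the thundering-herd effect.
The FailoverError chain switches models automatically when one fails, protecting the request completion rate.
Profile cooling stops failed keys and models from being retried over and over, avoiding wasted requests.
/think modes let you control reasoning depth on demand, balancing quality against cost.
Channel bindings allow a fine-grained policy in which each entry point uses a different model.
Cost-aware routing automatically downgrades to an economical model under budget pressure.

The next chapter presents a complete hands-on approach to deploying local models.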
