Chapter 12

Advanced Model Config: Key Rotation, Failover and Inference Depth Control


Overview

In production environments, a single API Key and a fixed model configuration are rarely robust enough. OpenClaw provides three advanced mechanisms: Key Rotation, Failover, and inference depth control (Think mode). Together they form the reliability and cost foundation of a production-grade AI application.


12.1 API Key Rotation Configuration

Why Key Rotation Is Needed

A single API Key quickly hits Provider rate limits (TPM/RPM) under high concurrency. Key Rotation spreads requests across multiple Keys, multiplying effective throughput.

Configuration Format

{
  "providers": {
    "anthropic": {
      "api_keys": [
        {
          "key": "${ANTHROPIC_API_KEY_1}",
          "priority": 1,
          "weight": 3,
          "label": "primary-team-a"
        },
        {
          "key": "${ANTHROPIC_API_KEY_2}",
          "priority": 1,
          "weight": 2,
          "label": "primary-team-b"
        },
        {
          "key": "${ANTHROPIC_API_KEY_BACKUP}",
          "priority": 2,
          "weight": 1,
          "label": "backup"
        }
      ],
      "rotation_strategy": "weighted_round_robin"
    }
  }
}

Priority Logic

OpenClaw's Key selection follows these rules:

  1. Lower priority number = higher priority (priority=1 is tried before priority=2)
  2. Within the same priority group, Keys rotate via weighted round-robin according to their weight
  3. When all Keys in a higher-priority group are unavailable, the system automatically falls back to the next group

Priority 1: [Key-A weight=3, Key-B weight=2]  -> requests distributed 3:2
Priority 2: [Key-Backup weight=1]             -> activated only when Priority 1 fully fails
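The selection rules above can be sketched in a few lines of Python. This is a minimal illustration of weighted round-robin with priority fallback, not OpenClaw's actual implementation; the `select_key` function and its `counters` bookkeeping are assumptions.

```python
from collections import defaultdict

def select_key(keys, counters, unavailable=frozenset()):
    """Pick the next API Key: lowest priority number first, weighted
    round-robin within a priority group, fall back to the next group
    only when every Key in the higher group is unavailable.

    keys:     list of {"key", "priority", "weight"} dicts, as in the config
    counters: mutable dict tracking how often each Key has been chosen
    """
    by_priority = defaultdict(list)
    for k in keys:
        if k["key"] not in unavailable:
            by_priority[k["priority"]].append(k)
    for priority in sorted(by_priority):  # lower number = tried first
        group = by_priority[priority]
        # weighted fairness: choose the Key with the lowest uses/weight ratio
        chosen = min(group, key=lambda k: counters.get(k["key"], 0) / k["weight"])
        counters[chosen["key"]] = counters.get(chosen["key"], 0) + 1
        return chosen["key"]
    raise RuntimeError("no available API Key in any priority group")
```

With the three-Key config above, five consecutive selections distribute 3:2 between the two priority-1 Keys, and the backup Key is only chosen when both are marked unavailable.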

Rotation Strategy Options

Strategy               Description                 Best For
round_robin            Equal-weight rotation       Keys with identical quotas
weighted_round_robin   Weighted rotation           Keys with different quotas
random                 Random selection            Simple load spreading
least_used             Select the least-used Key   Fine-grained balancing

12.2 429 / Quota Auto-Retry Mechanism

Error Classification and Response

{
  "retry": {
    "enabled": true,
    "max_attempts": 5,
    "backoff_strategy": "exponential",
    "base_delay_ms": 1000,
    "max_delay_ms": 30000,
    "retryable_errors": [429, 500, 502, 503, 504],
    "non_retryable_errors": [400, 401, 403, 404]
  }
}

Retry Flow Diagram

Request sent
   |
   v
API response
   |
  Failed?
   |
   +-- No --> Return result
   |
   +-- Yes --> Is error 429?
               |
               +-- Yes --> Read Retry-After header
               |           Wait specified time
               |           Switch to next Key (same priority)
               |           Retry (decrement max_attempts)
               |
               +-- No --> Is it a retryable error code?
                          |
                          +-- Yes --> Wait with exponential backoff
                          |           Retry
                          |
                          +-- No --> Throw Non-retryable Error

Exponential Backoff Configuration Details

{
  "retry": {
    "backoff_strategy": "exponential_jitter",
    "base_delay_ms": 1000,
    "multiplier": 2.0,
    "jitter": 0.3,
    "max_delay_ms": 30000
  }
}

Delay calculation: delay = min(base * multiplier^(attempt - 1) * (1 ± jitter), max_delay), where attempt counts from 1

Attempt   Base Wait   With Jitter (approximate)
Retry 1   1000ms      700~1300ms
Retry 2   2000ms      1400~2600ms
Retry 3   4000ms      2800~5200ms
Retry 4   8000ms      5600~10400ms
Retry 5   16000ms     11200~20800ms
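The delay formula and the table above can be reproduced with a short Python sketch. The function name is hypothetical; the parameter names mirror the retry config keys.

```python
import random

def backoff_delay_ms(attempt, base_ms=1000, multiplier=2.0,
                     jitter=0.3, max_ms=30000):
    """Delay before retry `attempt` (1-based), following
    delay = min(base * multiplier^(attempt-1) * (1 ± jitter), max_delay)."""
    raw = base_ms * multiplier ** (attempt - 1)
    # spread concurrent retries out so they do not re-collide ("thundering herd")
    jittered = raw * random.uniform(1.0 - jitter, 1.0 + jitter)
    return min(jittered, max_ms)
```

For example, retry 1 lands in the 700~1300ms band and retry 3 in 2800~5200ms, matching the table; by retry 7 the raw delay (64000ms) is fully clamped to max_delay_ms.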

12.3 FailoverError Trigger Chain

What Is FailoverError

FailoverError is a special error type defined by OpenClaw. It triggers when a model/Provider cannot complete a request, and the system automatically switches to the next model in the pre-configured Failover chain.

Failover Chain Configuration

{
  "failover": {
    "enabled": true,
    "chain": [
      {
        "model": "anthropic/claude-opus-4-6",
        "timeout_ms": 30000,
        "triggers": ["FailoverError", "timeout", "context_length_exceeded"]
      },
      {
        "model": "anthropic/claude-sonnet-4-6",
        "timeout_ms": 20000,
        "triggers": ["FailoverError", "timeout"]
      },
      {
        "model": "openai/gpt-5.5",
        "timeout_ms": 25000,
        "triggers": ["FailoverError", "timeout"]
      },
      {
        "model": "ollama/llama3.2",
        "timeout_ms": 60000,
        "triggers": ["FailoverError"]
      }
    ],
    "on_failover_log": true,
    "on_failover_notify_webhook": "${ALERT_WEBHOOK_URL}"
  }
}

Trigger Condition Types

Trigger Condition         Description
FailoverError             Model returns an explicit failure
timeout                   No response within timeout_ms
context_length_exceeded   Input exceeds the model's context limit
rate_limit_exhausted      All Keys are rate-limited
content_filtered          Content filtered by the Provider
model_overloaded          Model service overloaded
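The chain-walking behavior can be sketched as follows. This is an illustration only: the real engine also consults each entry's `triggers` list and fires the configured webhook; here `run_with_failover` and the `call` callback are assumptions.

```python
class FailoverError(Exception):
    """Explicit model failure, as described in this section."""

def run_with_failover(chain, call):
    """Try each chain entry in order. `call(model, timeout_ms)` must either
    return a result or raise FailoverError / TimeoutError; the next entry
    is tried until the chain is exhausted."""
    failures = []
    for entry in chain:
        try:
            return call(entry["model"], entry["timeout_ms"])
        except (FailoverError, TimeoutError) as exc:
            # a production engine would log and notify the webhook here
            failures.append((entry["model"], type(exc).__name__))
    raise FailoverError(f"all models failed: {failures}")
```

If the first model raises, the request transparently lands on the next model in the chain; only when every entry fails does the caller see an error.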

Real Failover Log Example

[2026-04-26T10:23:41Z] INFO  Primary model request started: anthropic/claude-opus-4-6
[2026-04-26T10:24:11Z] WARN  Timeout after 30000ms, triggering failover
[2026-04-26T10:24:11Z] INFO  Failover to: anthropic/claude-sonnet-4-6 (attempt 2/4)
[2026-04-26T10:24:25Z] INFO  Request completed successfully on fallback model

12.4 Profile Cooling Tracking

Cooling Mechanism to Prevent Frequent Switching

Frequent switching between Providers introduces unnecessary latency and state inconsistency. OpenClaw's Profile cooling mechanism ensures a Provider/Key that recently failed is not immediately re-selected.

{
  "profile_cooling": {
    "enabled": true,
    "error_threshold": 3,
    "cooling_period_seconds": 300,
    "recovery_check_interval_seconds": 60,
    "metrics": {
      "track_per_key": true,
      "track_per_model": true,
      "track_per_provider": false
    }
  }
}

Cooling State Machine

Key is normally available (Active)
    |
    | Errors >= error_threshold
    v
Enter cooling state (Cooling)
    |
    | Wait cooling_period_seconds
    v
Attempt recovery check (Recovery Check)
    |
    +-- Health check passes --> Active
    |
    +-- Health check fails --> Reset cooling timer --> Cooling
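The Active / Cooling / Recovery cycle above maps cleanly onto a small state machine. Below is a minimal Python sketch under the config's names (error_threshold, cooling period); the `KeyProfile` class and the injectable `clock` are assumptions for testability, not OpenClaw internals.

```python
import time

class KeyProfile:
    """Tracks one Key's cooling state per the diagram above."""

    def __init__(self, error_threshold=3, cooling_period_s=300,
                 clock=time.monotonic):
        self.error_threshold = error_threshold
        self.cooling_period_s = cooling_period_s
        self.clock = clock
        self.errors = 0
        self.cooling_until = None  # None => Active

    def record_error(self):
        self.errors += 1
        if self.errors >= self.error_threshold:
            # Errors >= threshold: enter Cooling
            self.cooling_until = self.clock() + self.cooling_period_s

    def record_success(self):
        self.errors = 0

    def status(self):
        if self.cooling_until is None:
            return "ACTIVE"
        if self.clock() < self.cooling_until:
            return "COOLING"
        return "RECOVERY_CHECK"  # eligible for a health probe

    def recovery_check(self, healthy):
        if healthy:
            self.errors, self.cooling_until = 0, None      # back to Active
        else:
            self.cooling_until = self.clock() + self.cooling_period_s  # reset timer
```

Three consecutive errors push the Key into COOLING; once the period elapses it becomes eligible for a recovery check, and a passing health check returns it to ACTIVE.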

View Current Profile Status

# CLI command
openclaw profile status

# Example output
Provider Profile Status:
  anthropic/claude-opus-4-6
    Key: sk-ant-...xxx1  Status: ACTIVE    Errors: 0    Last used: 2s ago
    Key: sk-ant-...xxx2  Status: COOLING   Errors: 3    Cooling until: 14:28:41
    Key: sk-ant-...xxx3  Status: ACTIVE    Errors: 1    Last used: 45s ago

  openai/gpt-5.5
    Key: sk-proj-...yyy1 Status: ACTIVE    Errors: 0    Last used: 1m ago

12.5 Inference Depth Control: The /think Command

Three Inference Modes

OpenClaw supports dynamically controlling the model's inference depth via the /think command. This is especially useful for scenarios requiring trade-offs between response speed and reasoning quality.

# Adaptive mode (default): automatically selects inference depth based on problem complexity
/think adaptive

# High-depth reasoning: enables CoT/Extended Thinking
/think high

# Disable reasoning: fastest response speed, direct output
/think off

Applicable Scenarios for Each Mode

Mode             Token Consumption   Response Latency   Applicable Scenarios
/think off       Lowest              Shortest           Simple Q&A, classification, summarization, formatting
/think adaptive  Medium              Medium             General purpose; recommended default
/think high      Highest             Longest            Mathematical derivation, code debugging, complex planning

Setting Default Inference Mode in Config File

{
  "inference": {
    "default_think_mode": "adaptive",
    "per_model_overrides": {
      "anthropic/claude-opus-4-6": "high",
      "anthropic/claude-haiku-4-5": "off",
      "openai/o3": "high",
      "openai/gpt-5.4-mini": "off"
    }
  }
}

Inference Token Budget Control

{
  "inference": {
    "think_budget": {
      "adaptive_min_tokens": 500,
      "adaptive_max_tokens": 4000,
      "high_max_tokens": 16000
    }
  }
}
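How the budget config could resolve to a concrete token budget per request is sketched below. The `think_budget` function and the `complexity_score` input are assumptions; the chapter does not document how OpenClaw scores complexity, only the configured bounds.

```python
def think_budget(mode, complexity_score, budget_cfg):
    """Resolve a thinking-token budget from a think_budget config block.

    mode:             "off" | "adaptive" | "high"
    complexity_score: hypothetical 0.0-1.0 complexity estimate
    budget_cfg:       the "think_budget" dict from the config above
    """
    if mode == "off":
        return 0
    if mode == "high":
        return budget_cfg["high_max_tokens"]
    # adaptive: interpolate between the configured min and max
    lo = budget_cfg["adaptive_min_tokens"]
    hi = budget_cfg["adaptive_max_tokens"]
    return int(lo + (hi - lo) * complexity_score)
```

With the values above, adaptive mode spends between 500 tokens (trivial input) and 4000 tokens (maximally complex input), while /think high always gets the full 16000.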

12.6 Binding Different Models per Channel

Business Scenario Design

Different user access channels have different requirements for response speed and quality: an instant-messaging channel like WhatsApp favors fast, short replies from a cheap model, while a web API serving demanding requests can justify a slower, higher-quality model with deep reasoning enabled.

Channel Model Binding Configuration

{
  "channel_model_bindings": {
    "whatsapp": {
      "model": "anthropic/claude-haiku-4-5",
      "think_mode": "off",
      "max_tokens": 1024,
      "temperature": 0.7
    },
    "telegram": {
      "model": "anthropic/claude-sonnet-4-6",
      "think_mode": "adaptive",
      "max_tokens": 4096,
      "temperature": 0.7
    },
    "web_api": {
      "model": "anthropic/claude-opus-4-6",
      "think_mode": "high",
      "max_tokens": 8192,
      "temperature": 0.5
    },
    "internal_tool": {
      "model": "ollama/llama3.2",
      "think_mode": "off",
      "max_tokens": 2048
    }
  }
}

Referencing Channel Bindings in Agent Definitions

{
  "agent": {
    "name": "customer-support",
    "channel": "${INCOMING_CHANNEL}",
    "fallback_channel": "telegram"
  }
}
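The lookup-with-fallback behavior implied by `fallback_channel` can be sketched in one small function. This is an assumption about how the binding is resolved, not documented OpenClaw behavior.

```python
def resolve_binding(bindings, channel, fallback_channel):
    """Return the model settings for an incoming channel; if the channel
    has no explicit binding, use the agent's fallback_channel instead."""
    return bindings.get(channel, bindings[fallback_channel])
```

An unknown channel (say, an unconfigured "sms" source) would then transparently receive the telegram binding configured above.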

12.7 Cost Control: Per-Provider Model Cost Comparison

Cost-Aware Configuration

{
  "cost_control": {
    "enabled": true,
    "budget": {
      "daily_usd": 100.0,
      "monthly_usd": 2000.0,
      "alert_threshold_pct": 80
    },
    "auto_downgrade": {
      "enabled": true,
      "trigger_pct": 90,
      "downgrade_to": "anthropic/claude-sonnet-4-6"
    }
  }
}

Mainstream Model Cost Comparison (Reference Prices, Q1 2026)

Model                        Input Price ($/M tokens)   Output Price ($/M tokens)   Cost-Effectiveness
anthropic/claude-haiku-4-5   0.25                       1.25                        Very High
anthropic/claude-sonnet-4-6  3.00                       15.00                       High
anthropic/claude-opus-4-6    15.00                      75.00                       Low (strongest capability)
openai/gpt-5.4-mini          0.15                       0.60                        Very High
openai/gpt-5.5               10.00                      30.00                       Medium
openai/o3                    10.00                      40.00                       Low (strongest reasoning)
deepseek/deepseek-r1         0.55                       2.19                        Very High (reasoning)
google/gemini-3-flash        0.075                      0.30                        Highest
ollama/any model             0.00                       0.00                        Free (local; hardware cost only)
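Per-request cost follows directly from the table: cost = input_tokens / 1,000,000 × input_price + output_tokens / 1,000,000 × output_price. A quick sketch (prices copied from the table above; the function name is illustrative):

```python
# (input, output) prices in $ per million tokens, from the comparison table
PRICES_PER_MTOK = {
    "anthropic/claude-haiku-4-5": (0.25, 1.25),
    "anthropic/claude-sonnet-4-6": (3.00, 15.00),
    "anthropic/claude-opus-4-6": (15.00, 75.00),
}

def request_cost_usd(model, input_tokens, output_tokens):
    """Dollar cost of one request at the listed per-million-token prices."""
    in_price, out_price = PRICES_PER_MTOK[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price
```

For example, a Sonnet request with 10,000 input and 2,000 output tokens costs $0.03 + $0.03 = $0.06; the same request on Opus costs five times as much, $0.30.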

Cost Optimization Strategy

{
  "cost_optimization": {
    "routing_rules": [
      {
        "condition": "token_estimate < 500",
        "route_to": "anthropic/claude-haiku-4-5"
      },
      {
        "condition": "token_estimate >= 500 AND complexity == 'low'",
        "route_to": "openai/gpt-5.4-mini"
      },
      {
        "condition": "complexity == 'high' OR requires_reasoning == true",
        "route_to": "anthropic/claude-opus-4-6"
      }
    ]
  }
}
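The routing rules above evaluate in order with first-match-wins semantics. A Python sketch of that evaluation follows; the final fallback model is an assumption, since the rules do not name a default for requests matching none of the conditions.

```python
def route_request(token_estimate, complexity, requires_reasoning=False):
    """Evaluate the cost-optimization routing rules in order;
    the first matching condition determines the target model."""
    if token_estimate < 500:
        return "anthropic/claude-haiku-4-5"
    if token_estimate >= 500 and complexity == "low":
        return "openai/gpt-5.4-mini"
    if complexity == "high" or requires_reasoning:
        return "anthropic/claude-opus-4-6"
    return "anthropic/claude-sonnet-4-6"  # assumed default; not in the rules
```

Note the ordering effect: a short but complex request (under 500 tokens) still routes to Haiku, because the first rule matches before the complexity rule is reached.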

12.8 Comprehensive Example: Full Production-Grade Configuration

{
  "providers": {
    "anthropic": {
      "api_keys": [
        {"key": "${ANT_KEY_1}", "priority": 1, "weight": 3},
        {"key": "${ANT_KEY_2}", "priority": 1, "weight": 2},
        {"key": "${ANT_KEY_BACKUP}", "priority": 2, "weight": 1}
      ],
      "rotation_strategy": "weighted_round_robin"
    },
    "openai": {
      "api_keys": [
        {"key": "${OAI_KEY_1}", "priority": 1, "weight": 1}
      ]
    },
    "ollama": {
      "base_url": "http://localhost:11434"
    }
  },
  "failover": {
    "enabled": true,
    "chain": [
      {"model": "anthropic/claude-sonnet-4-6", "timeout_ms": 30000},
      {"model": "openai/gpt-5.4-mini", "timeout_ms": 20000},
      {"model": "ollama/llama3.2", "timeout_ms": 60000}
    ]
  },
  "profile_cooling": {
    "enabled": true,
    "error_threshold": 3,
    "cooling_period_seconds": 300
  },
  "retry": {
    "enabled": true,
    "max_attempts": 3,
    "backoff_strategy": "exponential_jitter"
  },
  "inference": {
    "default_think_mode": "adaptive"
  },
  "cost_control": {
    "daily_usd": 50.0,
    "alert_threshold_pct": 80
  }
}

Chapter Summary

This chapter covered OpenClaw's reliability and cost stack for production deployments: weighted API Key rotation with priority fallback, automatic retry with exponential backoff, Failover chains, Profile cooling, inference depth control via /think, per-channel model bindings, and cost-aware routing with budget guards.

The next chapter is a complete practical guide to local model deployment.
