Description

Detect AI API/provider/model failures and route requests to healthy fallback providers or downgraded models. Use when creating or maintaining automatic failo...

README (SKILL.md)

API Failover

Name: API Failover
Author: zqh2333

Create or improve a lightweight failover layer for AI APIs.

Goals

Build systems that:

detect unavailable or degraded providers/models
classify failures before retrying blindly
switch to a safe fallback chain
avoid hammering broken endpoints
recover back to preferred providers after cooldown

Workflow

Identify the call path.
Classify failure modes.
Define a fallback policy.
Add health memory.
Implement guarded retries.
Emit observable logs.
Validate with forced-failure tests.

Use the detailed rules below and the bundled scripts instead of re-inventing routing logic each time.

Practical defaults

Error classes

Use these normalized categories:

AUTH_ERROR
BAD_REQUEST
RATE_LIMIT
TIMEOUT
SERVER_ERROR
NETWORK_ERROR
MODEL_UNAVAILABLE
QUOTA_EXCEEDED
UNKNOWN_TRANSIENT

Suggested routing behavior

AUTH_ERROR, BAD_REQUEST: fail fast; do not retry other providers unless config explicitly maps to another credential set.
RATE_LIMIT: short backoff, then fallback.
TIMEOUT, SERVER_ERROR, NETWORK_ERROR, MODEL_UNAVAILABLE, UNKNOWN_TRANSIENT: retry briefly, then fallback.
QUOTA_EXCEEDED: mark provider unavailable for a longer cooldown and fallback immediately.

Circuit breaker defaults

Start with:

open after 3 consecutive transient failures
cooldown 60-180s
half-open with 1 probe
close after 1-2 successful probes

Configuration pattern

Keep policy in config, not hard-coded logic.

Recommended shape:

provider registry
task profiles with ordered fallback chains
retry policy
circuit-breaker policy
per-provider overrides

Design guidance

Prefer fewer, well-understood providers over large fallback chains.
Keep the fallback chain semantically compatible when possible.
Separate "best quality" from "must return something" behavior.
Keep downgrade rules explicit; avoid silent huge capability drops for critical tasks.
For tool-using agents, treat provider switching as a reliability event and report it when user-visible quality may change.

Semi-automatic deployment model

Use this skill to discover the environment, generate a production-ish config, run a local HTTP failover proxy, and verify health.

Do not claim full autonomous takeover unless the environment-specific integration is actually completed.

References

Read these only when needed:

references/config-example.yaml for a compact policy example
references/config-realworld-example.yaml for a more practical multi-provider template
references/config-production.yaml for a ready-to-edit production template
references/test-scenarios.md for failure-injection and validation cases
references/realworld-notes.md for local proxy deployment and environment-variable setup
references/api-failover.service for a user-systemd service example

Bundled scripts

`scripts/discover_env.py`

Inspect the current environment.

`scripts/generate_config.py`

Generate a production-ish YAML config from simple defaults.

`scripts/failover_proxy.py`

Run a minimal CLI failover call path.

`scripts/http_proxy.py`

Expose a single local OpenAI-compatible entrypoint.

Endpoints:

POST /v1/chat/completions
GET /health

Optional request header:

X-Failover-Profile: cheap|default|critical|local-first

`scripts/selfcheck.py`

Validate that the local proxy is reachable and can process a minimal chat request.

`scripts/bootstrap_failover.py`

Run the semi-automatic bootstrap flow:

discover environment
generate config
optionally start the proxy
run self-check
print next actions

Example:

python3 scripts/bootstrap_failover.py \
  --default-model custom-ai-td-ee/gpt-5.4 \
  --start-proxy

Keep these scripts small and inspectable. Extend them instead of turning SKILL.md into code-heavy instructions.

Usage Guidance

This skill appears to implement an AI failover proxy and includes runnable scripts — that usage is plausible but a few things don't add up and you should inspect before running: - Review the included Python scripts (they're small and readable) before executing. They read /root/.openclaw/openclaw.json and ~/.config/api-failover.env, and will use environment variables like PRIMARY_API_KEY, ANTHROPIC_API_KEY, OPENROUTER_API_KEY, and OLLAMA_DUMMY_KEY if present. - The registry metadata does not list these env vars or the config path; treat that as a red flag. If you plan to use this skill, provide only the provider keys you intend to enable and keep them in a secure env file. - activate_secondary.py and bootstrap_failover.py call systemctl --user to (re)start api-failover.service. Do not run those scripts unless you understand/consent to restarting user services on the host. Prefer running the proxy in a container or isolated test user first. - If you have sensitive /root/.openclaw/openclaw.json data, back it up and confirm what the script will read/inherit. Consider copying necessary provider info into a dedicated minimal config rather than letting the scripts read root-scoped files. - For safer testing, run scripts with limited privileges (non-root user or inside a sandbox/container) and verify network endpoints they call. If you need higher confidence, ask the publisher to: (1) declare required env vars and config paths in the registry, (2) avoid hard-coded root paths or allow configurable paths, and (3) provide an explicit 'dry-run' mode that does not call systemctl or start processes. If you want, I can point out the exact lines in the scripts that read each config/env and the commands that will be executed so you can make a checklist before running anything.

Capability Analysis

Type: OpenClaw Skill Name: api-failover Version: 1.0.1 The api-failover skill bundle provides a framework for AI API reliability by implementing a local routing proxy. It is classified as suspicious because it performs several high-risk operations: it reads sensitive API credentials from environment variables and the master OpenClaw configuration file at /root/.openclaw/openclaw.json (discover_env.py), executes system-level service management commands via systemctl (activate_secondary.py), and starts a local HTTP server to intercept and route LLM request payloads (http_proxy.py). While these capabilities are plausibly required for the stated purpose of a failover proxy, the broad access to system secrets and execution of shell commands represent a significant attack surface without clear evidence of malicious intent.

Capability Assessment

⚠ Purpose & Capability

The name/description (AI API failover) aligns with the code and instructions, but the registry metadata declares no required env vars or config paths while the bundled scripts clearly expect provider API keys (PRIMARY_API_KEY, ANTHROPIC_API_KEY, OPENROUTER_API_KEY, OLLAMA_DUMMY_KEY), read /root/.openclaw/openclaw.json, and reference ~/.config/api-failover.env and other root-scoped paths. Required binaries are listed as none, yet the scripts assume system utilities (systemctl) and python3; these environment & path assumptions are not declared in the manifest and may not be appropriate for all users.

⚠ Instruction Scope

SKILL.md directs the agent/operator to run included scripts that inspect environment, read local config files, start/stop a user systemd service, spawn a local HTTP proxy, and perform upstream API calls. Those instructions go beyond passive guidance: they read system files, restart services, and run network calls. While these actions are coherent for deploying a failover proxy, they are intrusive and should be clearly declared; the instructions also rely on control fields in request bodies and remove them before forwarding (expected for proxy behavior). No hidden remote exfil endpoints were found, but the instructions give the skill broad discretion to read local credentials and config.

ℹ Install Mechanism

No install spec is provided (instruction-only), which minimizes installer risk. However, the skill bundles multiple executable Python scripts that will be run directly. The absence of an install step means files are simply present in the workspace — verify their contents before executing. No external downloads or archive extraction are used.

⚠ Credentials

The manifest lists no required environment variables or primary credential, but the code expects multiple API keys and may inherit credentials from /root/.openclaw/openclaw.json. That mismatch is notable: the skill needs provider credentials (reasonable for failover) but fails to declare them, and it will read config from root-scoped locations. Environment access requests are therefore under-declared and could lead to accidental exposure of unrelated secrets if run in a privileged environment.

⚠ Persistence & Privilege

The skill will restart a user systemd service (systemctl --user restart api-failover.service), create/read files under /tmp and hard-coded /root paths, and can start long-running local proxy processes. 'always' is false (good), but the scripts assume permission to control user services and to read root-scoped config. This level of operational privilege is significant and should be explicitly documented and approved before running.

Version History

v1.0.1

Add model-aware downgrade routing, auto/hinted/body controls, cleaner failure UX, secondary activation flow, delivery docs

v1.0.0

Initial delivered version: intelligent routing, model downgrade, failure UX, activation flow

Metadata

Slug api-failover

Version 1.0.1

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 2

Frequently Asked Questions

What is API Failover?

Detect AI API/provider/model failures and route requests to healthy fallback providers or downgraded models. Use when creating or maintaining automatic failo... It is an AI Agent Skill for Claude Code / OpenClaw, with 93 downloads so far.

How do I install API Failover?

Run "/install api-failover" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is API Failover free?

Yes, API Failover is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does API Failover support?

API Failover is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created API Failover?

It is built and maintained by Qihong (@zqh2333); the current version is v1.0.1.

More Skills

API Failover

API Failover

Goals

Workflow

Practical defaults

Error classes

Suggested routing behavior

Circuit breaker defaults

Configuration pattern

Design guidance

Semi-automatic deployment model

References

Bundled scripts

scripts/discover_env.py

scripts/generate_config.py

scripts/failover_proxy.py

scripts/http_proxy.py

scripts/selfcheck.py

scripts/bootstrap_failover.py

What is API Failover?

How do I install API Failover?

Is API Failover free?

Which platforms does API Failover support?

Who created API Failover?

💬 Comments

`scripts/discover_env.py`

`scripts/generate_config.py`

`scripts/failover_proxy.py`

`scripts/http_proxy.py`

`scripts/selfcheck.py`

`scripts/bootstrap_failover.py`