← Back to Skills Marketplace
zqh2333

API Failover

by Qihong · GitHub ↗ · v1.0.1 · MIT-0
cross-platform ⚠ suspicious
93
Downloads
0
Stars
0
Active Installs
2
Versions
Install in OpenClaw
/install api-failover
Description
Detect AI API/provider/model failures and route requests to healthy fallback providers or downgraded models. Use when creating or maintaining automatic failo...
README (SKILL.md)

API Failover

Create or improve a lightweight failover layer for AI APIs.

Goals

Build systems that:

  • detect unavailable or degraded providers/models
  • classify failures before retrying blindly
  • switch to a safe fallback chain
  • avoid hammering broken endpoints
  • recover back to preferred providers after cooldown

Workflow

  1. Identify the call path.
  2. Classify failure modes.
  3. Define a fallback policy.
  4. Add health memory.
  5. Implement guarded retries.
  6. Emit observable logs.
  7. Validate with forced-failure tests.

Use the detailed rules below and the bundled scripts instead of re-inventing routing logic each time.

Practical defaults

Error classes

Use these normalized categories:

  • AUTH_ERROR
  • BAD_REQUEST
  • RATE_LIMIT
  • TIMEOUT
  • SERVER_ERROR
  • NETWORK_ERROR
  • MODEL_UNAVAILABLE
  • QUOTA_EXCEEDED
  • UNKNOWN_TRANSIENT

Suggested routing behavior

  • AUTH_ERROR, BAD_REQUEST: fail fast; do not retry other providers unless config explicitly maps to another credential set.
  • RATE_LIMIT: short backoff, then fallback.
  • TIMEOUT, SERVER_ERROR, NETWORK_ERROR, MODEL_UNAVAILABLE, UNKNOWN_TRANSIENT: retry briefly, then fallback.
  • QUOTA_EXCEEDED: mark provider unavailable for a longer cooldown and fallback immediately.

Circuit breaker defaults

Start with:

  • open after 3 consecutive transient failures
  • cooldown 60-180s
  • half-open with 1 probe
  • close after 1-2 successful probes

Configuration pattern

Keep policy in config, not hard-coded logic.

Recommended shape:

  • provider registry
  • task profiles with ordered fallback chains
  • retry policy
  • circuit-breaker policy
  • per-provider overrides

Design guidance

  • Prefer fewer, well-understood providers over large fallback chains.
  • Keep the fallback chain semantically compatible when possible.
  • Separate "best quality" from "must return something" behavior.
  • Keep downgrade rules explicit; avoid silent huge capability drops for critical tasks.
  • For tool-using agents, treat provider switching as a reliability event and report it when user-visible quality may change.

Semi-automatic deployment model

Use this skill to discover the environment, generate a production-ish config, run a local HTTP failover proxy, and verify health.

Do not claim full autonomous takeover unless the environment-specific integration is actually completed.

References

Read these only when needed:

  • references/config-example.yaml for a compact policy example
  • references/config-realworld-example.yaml for a more practical multi-provider template
  • references/config-production.yaml for a ready-to-edit production template
  • references/test-scenarios.md for failure-injection and validation cases
  • references/realworld-notes.md for local proxy deployment and environment-variable setup
  • references/api-failover.service for a user-systemd service example

Bundled scripts

scripts/discover_env.py

Inspect the current environment.

scripts/generate_config.py

Generate a production-ish YAML config from simple defaults.

scripts/failover_proxy.py

Run a minimal CLI failover call path.

scripts/http_proxy.py

Expose a single local OpenAI-compatible entrypoint.

Endpoints:

  • POST /v1/chat/completions
  • GET /health

Optional request header:

  • X-Failover-Profile: cheap|default|critical|local-first

scripts/selfcheck.py

Validate that the local proxy is reachable and can process a minimal chat request.

scripts/bootstrap_failover.py

Run the semi-automatic bootstrap flow:

  • discover environment
  • generate config
  • optionally start the proxy
  • run self-check
  • print next actions

Example:

python3 scripts/bootstrap_failover.py \
  --default-model custom-ai-td-ee/gpt-5.4 \
  --start-proxy

Keep these scripts small and inspectable. Extend them instead of turning SKILL.md into code-heavy instructions.

Usage Guidance
This skill appears to implement an AI failover proxy and includes runnable scripts — that usage is plausible but a few things don't add up and you should inspect before running: - Review the included Python scripts (they're small and readable) before executing. They read /root/.openclaw/openclaw.json and ~/.config/api-failover.env, and will use environment variables like PRIMARY_API_KEY, ANTHROPIC_API_KEY, OPENROUTER_API_KEY, and OLLAMA_DUMMY_KEY if present. - The registry metadata does not list these env vars or the config path; treat that as a red flag. If you plan to use this skill, provide only the provider keys you intend to enable and keep them in a secure env file. - activate_secondary.py and bootstrap_failover.py call systemctl --user to (re)start api-failover.service. Do not run those scripts unless you understand/consent to restarting user services on the host. Prefer running the proxy in a container or isolated test user first. - If you have sensitive /root/.openclaw/openclaw.json data, back it up and confirm what the script will read/inherit. Consider copying necessary provider info into a dedicated minimal config rather than letting the scripts read root-scoped files. - For safer testing, run scripts with limited privileges (non-root user or inside a sandbox/container) and verify network endpoints they call. If you need higher confidence, ask the publisher to: (1) declare required env vars and config paths in the registry, (2) avoid hard-coded root paths or allow configurable paths, and (3) provide an explicit 'dry-run' mode that does not call systemctl or start processes. If you want, I can point out the exact lines in the scripts that read each config/env and the commands that will be executed so you can make a checklist before running anything.
Capability Analysis
Type: OpenClaw Skill Name: api-failover Version: 1.0.1 The api-failover skill bundle provides a framework for AI API reliability by implementing a local routing proxy. It is classified as suspicious because it performs several high-risk operations: it reads sensitive API credentials from environment variables and the master OpenClaw configuration file at /root/.openclaw/openclaw.json (discover_env.py), executes system-level service management commands via systemctl (activate_secondary.py), and starts a local HTTP server to intercept and route LLM request payloads (http_proxy.py). While these capabilities are plausibly required for the stated purpose of a failover proxy, the broad access to system secrets and execution of shell commands represent a significant attack surface without clear evidence of malicious intent.
Capability Assessment
Purpose & Capability
The name/description (AI API failover) aligns with the code and instructions, but the registry metadata declares no required env vars or config paths while the bundled scripts clearly expect provider API keys (PRIMARY_API_KEY, ANTHROPIC_API_KEY, OPENROUTER_API_KEY, OLLAMA_DUMMY_KEY), read /root/.openclaw/openclaw.json, and reference ~/.config/api-failover.env and other root-scoped paths. Required binaries are listed as none, yet the scripts assume system utilities (systemctl) and python3; these environment & path assumptions are not declared in the manifest and may not be appropriate for all users.
Instruction Scope
SKILL.md directs the agent/operator to run included scripts that inspect environment, read local config files, start/stop a user systemd service, spawn a local HTTP proxy, and perform upstream API calls. Those instructions go beyond passive guidance: they read system files, restart services, and run network calls. While these actions are coherent for deploying a failover proxy, they are intrusive and should be clearly declared; the instructions also rely on control fields in request bodies and remove them before forwarding (expected for proxy behavior). No hidden remote exfil endpoints were found, but the instructions give the skill broad discretion to read local credentials and config.
Install Mechanism
No install spec is provided (instruction-only), which minimizes installer risk. However, the skill bundles multiple executable Python scripts that will be run directly. The absence of an install step means files are simply present in the workspace — verify their contents before executing. No external downloads or archive extraction are used.
Credentials
The manifest lists no required environment variables or primary credential, but the code expects multiple API keys and may inherit credentials from /root/.openclaw/openclaw.json. That mismatch is notable: the skill needs provider credentials (reasonable for failover) but fails to declare them, and it will read config from root-scoped locations. Environment access requests are therefore under-declared and could lead to accidental exposure of unrelated secrets if run in a privileged environment.
Persistence & Privilege
The skill will restart a user systemd service (systemctl --user restart api-failover.service), create/read files under /tmp and hard-coded /root paths, and can start long-running local proxy processes. 'always' is false (good), but the scripts assume permission to control user services and to read root-scoped config. This level of operational privilege is significant and should be explicitly documented and approved before running.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install api-failover
  3. After installation, invoke the skill by name or use /api-failover
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.1
Add model-aware downgrade routing, auto/hinted/body controls, cleaner failure UX, secondary activation flow, delivery docs
v1.0.0
Initial delivered version: intelligent routing, model downgrade, failure UX, activation flow
Metadata
Slug api-failover
Version 1.0.1
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 2
Frequently Asked Questions

What is API Failover?

Detect AI API/provider/model failures and route requests to healthy fallback providers or downgraded models. Use when creating or maintaining automatic failo... It is an AI Agent Skill for Claude Code / OpenClaw, with 93 downloads so far.

How do I install API Failover?

Run "/install api-failover" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is API Failover free?

Yes, API Failover is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does API Failover support?

API Failover is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created API Failover?

It is built and maintained by Qihong (@zqh2333); the current version is v1.0.1.

💬 Comments