leijack-lo

Resilience Monitor

by leiJack-lo · GitHub · v0.3.0 · MIT-0
cross-platform · ✓ Security Clean
16 Downloads · 0 Stars · 0 Active Installs · 1 Version
Install in OpenClaw
/install resilience-monitor
Description
Monitor and manage OpenClaw API errors, track model performance, configure retry strategies, generate reports, and oversee task recovery status.
README (SKILL.md)

Resilience Skill

LLM API error tracking, classification, retry, and task recovery for OpenClaw.

Overview

This skill provides visibility into API call health and automated retry management. Use it to:

  • Monitor API error rates and patterns
  • View per-model performance statistics
  • Configure retry strategies
  • Generate error reports
  • Track task recovery status

Tools

resilience_dashboard

Open the live web dashboard in your browser for real-time error stats and retry strategy management.

Parameters:

  • action: "open" (default) | "status" | "stop"

Features:

  • Live error overview (today / hour / active retries)
  • Model breakdown table
  • Recent errors feed
  • Retry strategy cards — set default, adjust max retries
  • Auto-refresh: 5s, 60s, 5min, 1h, or off

URL: http://127.0.0.1:18765/ (default port, configurable via dashboardPort)
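The sketch below shows one way the dashboard address could follow from the config keys. It makes two assumptions: `dashboardEnabled` defaults to true, which is consistent with the auto-start behaviour described below, and the dashboard always binds to 127.0.0.1. The `DashboardConfig` interface and `dashboardUrl` function are hypothetical names, not part of the skill's API.

```typescript
// Hypothetical shape of the dashboard-related config keys named in this README.
interface DashboardConfig {
  dashboardPort?: number;
  dashboardEnabled?: boolean;
}

// Default port taken from the URL above.
const DEFAULT_DASHBOARD_PORT = 18765;

// Returns the local dashboard URL, or null when the dashboard is disabled.
// Assumption: an omitted dashboardEnabled means "enabled".
function dashboardUrl(cfg: DashboardConfig): string | null {
  if (cfg.dashboardEnabled === false) return null;
  const port = cfg.dashboardPort ?? DEFAULT_DASHBOARD_PORT;
  // Reject values that cannot be TCP ports.
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`invalid dashboardPort: ${port}`);
  }
  return `http://127.0.0.1:${port}/`;
}
```

For example, `dashboardUrl({})` yields the default http://127.0.0.1:18765/, and `dashboardUrl({ dashboardEnabled: false })` yields null.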

Voice / natural language examples:

  • "Open the error statistics page" → resilience_dashboard({ action: "open" })
  • "Open the monitoring panel" → resilience_dashboard({ action: "open" })
  • "Open the resilience panel" → resilience_dashboard({ action: "open" })

The dashboard starts automatically when OpenClaw Gateway starts (unless dashboardEnabled: false).

Configuration is read from ~/.openclaw/openclaw.json under plugins.entries.resilience.config, not only from api.pluginConfig at hook time. Example:

{
  "plugins": {
    "entries": {
      "resilience": {
        "enabled": true,
        "config": {
          "dashboardPort": 18765,
          "dashboardEnabled": true,
          "instanceLabel": "my-workspace"
        }
      }
    }
  }
}

At gateway_start, the config is read from ctx.config together with ctx.workspaceDir.
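Here is a TypeScript sketch of the lookup along plugins.entries.resilience.config. The `OpenClawJson` and `ResilienceConfig` types and the `readResilienceConfig` helper are hypothetical; they model only the keys shown in the example. The function takes the file contents as a string, so on Node you might pass it the result of `fs.readFileSync(path, "utf8")`.

```typescript
// Hypothetical typing of the resilience keys shown in the example config.
interface ResilienceConfig {
  dashboardPort?: number;
  dashboardEnabled?: boolean;
  instanceLabel?: string;
}

// Only the parts of openclaw.json needed to reach plugins.entries.resilience.config.
interface OpenClawJson {
  plugins?: {
    entries?: Record<string, { enabled?: boolean; config?: ResilienceConfig }>;
  };
}

// Parses openclaw.json text and returns the resilience config,
// or an empty object when any level of the path is missing.
function readResilienceConfig(raw: string): ResilienceConfig {
  const parsed = JSON.parse(raw) as OpenClawJson;
  return parsed.plugins?.entries?.["resilience"]?.config ?? {};
}
```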

Multi-instance: Use the instance dropdown to view all instances (aggregated) or a single Gateway. Each instance stores data under ~/.openclaw/plugins/resilience/instances/<id>/. Strategy edits apply only to the local Gateway instance.
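To show what the aggregated view could involve, this sketch sums error counts across instances. The `InstanceStats` shape and the `aggregate` helper are assumptions made for this example. The actual stats.json schema is not documented here.

```typescript
// Hypothetical per-instance summary; not the real stats.json schema.
interface InstanceStats {
  instanceId: string;
  totalErrors: number;
  byCategory: Record<string, number>;
}

interface AggregatedStats {
  totalErrors: number;
  byCategory: Record<string, number>;
}

// Sums totals and per-category counts across all instances.
function aggregate(stats: InstanceStats[]): AggregatedStats {
  const byCategory: Record<string, number> = {};
  let totalErrors = 0;
  for (const s of stats) {
    totalErrors += s.totalErrors;
    for (const [cat, n] of Object.entries(s.byCategory)) {
      byCategory[cat] = (byCategory[cat] ?? 0) + n;
    }
  }
  return { totalErrors, byCategory };
}
```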

resilience_stats

View API error statistics by time period or model.

Parameters:

  • query (optional): Natural language query
    • "today" or empty — today's full summary
    • "hour" — current hour stats
    • "week" — current week stats
    • Any model name (e.g., "mimo-v2.5") — model-specific stats
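The query cases above amount to a small dispatch on the input string. This sketch assumes matching ignores case and surrounding whitespace, which the README does not state. `StatsScope` and `parseStatsQuery` are illustrative names.

```typescript
// Hypothetical interpretation of the resilience_stats `query` parameter.
type StatsScope =
  | { kind: "today" }
  | { kind: "hour" }
  | { kind: "week" }
  | { kind: "model"; model: string };

function parseStatsQuery(query?: string): StatsScope {
  const q = (query ?? "").trim();
  const lower = q.toLowerCase();
  // Empty input or "today" both mean today's full summary.
  if (lower === "" || lower === "today") return { kind: "today" };
  if (lower === "hour") return { kind: "hour" };
  if (lower === "week") return { kind: "week" };
  // Anything else is treated as a model name, e.g. "mimo-v2.5".
  return { kind: "model", model: q };
}
```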

Examples:

  • "Show today's error statistics" → resilience_stats({ query: "today" })
  • "Show the error rate for mimo-v2.5" → resilience_stats({ query: "mimo-v2.5" })
  • "Show this week's error rate" → resilience_stats({ query: "week" })

resilience_strategies

View, add, update, or reset retry strategies.

Parameters:

  • action: "list" (default) | "add" | "update" | "reset"
  • strategyName: Strategy name (required for add/update)
  • updates: Fields to update (for add/update)

Examples:

  • "Show all current strategy configurations" → resilience_strategies({ action: "list" })
  • "Change the timeout retry strategy to exponential backoff" → resilience_strategies({ action: "update", strategyName: "default-exponential", updates: { type: "exponential" } })
  • "Add a custom retry strategy" → resilience_strategies({ action: "add", strategyName: "my-strategy", updates: { type: "custom", maxRetries: 3, intervals: [60000, 300000, 600000] } })
  • "Reset strategies to defaults" → resilience_strategies({ action: "reset" })
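This is a hypothetical TypeScript typing of the `resilience_strategies` arguments, with a check that mirrors the "required for add/update" rule for `strategyName`. Treating `intervals` as milliseconds is an assumption based on the `add` example, where 60000 would be 1 min. The range checks on `maxRetries` and `intervals` are illustrative, not documented behaviour.

```typescript
type StrategyAction = "list" | "add" | "update" | "reset";

interface StrategyUpdates {
  type?: "fixed" | "exponential" | "custom";
  maxRetries?: number;
  intervals?: number[]; // assumed to be milliseconds
}

interface StrategiesArgs {
  action?: StrategyAction;
  strategyName?: string;
  updates?: StrategyUpdates;
}

// Returns a list of problems; an empty list means the call looks well-formed.
function validateStrategiesArgs(args: StrategiesArgs): string[] {
  const action = args.action ?? "list"; // "list" is the documented default
  const problems: string[] = [];
  if ((action === "add" || action === "update") && !args.strategyName) {
    problems.push(`strategyName is required for ${action}`);
  }
  const u: StrategyUpdates = args.updates ?? {};
  if (u.maxRetries !== undefined && (!Number.isInteger(u.maxRetries) || u.maxRetries < 0)) {
    problems.push("maxRetries must be a non-negative integer");
  }
  // !(ms > 0) also catches NaN.
  if (u.intervals && u.intervals.some((ms) => !(ms > 0))) {
    problems.push("intervals must be positive millisecond values");
  }
  return problems;
}
```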

resilience_report

Generate detailed error reports.

Parameters:

  • reportType: "daily" (default) | "model" | "recovery" | "full"
  • target: Model name or date (YYYY-MM-DD)

Examples:

  • "Generate today's daily error report" → resilience_report({ reportType: "daily" })
  • "Show the detailed report for mimo-v2.5" → resilience_report({ reportType: "model", target: "mimo-v2.5" })
  • "Show task recovery status" → resilience_report({ reportType: "recovery" })
  • "Generate a full status report" → resilience_report({ reportType: "full" })
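Because `target` accepts either a model name or a YYYY-MM-DD date, a caller may want to tell the two apart before calling the tool. The sketch below shows one assumed way to do that, using a hypothetical `classifyTarget` helper. It also rejects strings that match the date pattern but name an impossible day.

```typescript
type ReportTarget = { kind: "date"; date: string } | { kind: "model"; model: string };

function classifyTarget(target: string): ReportTarget {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(target);
  // Anything not shaped like YYYY-MM-DD is treated as a model name.
  if (!m) return { kind: "model", model: target };
  const y = Number(m[1]);
  const mo = Number(m[2]);
  const d = Number(m[3]);
  const dt = new Date(Date.UTC(y, mo - 1, d));
  // Date.UTC rolls impossible dates over (2025-02-30 becomes March 2), so compare back.
  if (dt.getUTCFullYear() !== y || dt.getUTCMonth() !== mo - 1 || dt.getUTCDate() !== d) {
    throw new RangeError(`not a valid date: ${target}`);
  }
  return { kind: "date", date: target };
}
```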

Error Categories

Category Description Retryable
rate_limit 429 Too Many Requests
server_overload 503 Service Unavailable
timeout Request timeout
auth_failed 401/403 Authentication failed
network_error Connection errors
model_unavailable Model not found or offline
context_too_long Context length exceeded
unknown Unclassified errors
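The categories map naturally onto HTTP statuses and low-level error codes. The classifier below is a sketch under assumed rules, not the skill's actual classification logic. The error codes such as `ECONNRESET` and the message keywords are guesses for illustration.

```typescript
type ErrorCategory =
  | "rate_limit" | "server_overload" | "timeout" | "auth_failed"
  | "network_error" | "model_unavailable" | "context_too_long" | "unknown";

// Hypothetical description of a failed API call.
interface FailedCall {
  status?: number;  // HTTP status, if a response was received
  code?: string;    // low-level error code, e.g. "ECONNRESET"
  message?: string;
}

function classify(err: FailedCall): ErrorCategory {
  const msg = (err.message ?? "").toLowerCase();
  if (err.status === 429) return "rate_limit";
  if (err.status === 503) return "server_overload";
  if (err.status === 401 || err.status === 403) return "auth_failed";
  if (err.code === "ETIMEDOUT" || msg.includes("timeout") || msg.includes("timed out")) return "timeout";
  if (err.code === "ECONNRESET" || err.code === "ECONNREFUSED" || err.code === "ENOTFOUND") {
    return "network_error";
  }
  if (msg.includes("context length") || msg.includes("too many tokens")) return "context_too_long";
  if (err.status === 404 || msg.includes("model not found")) return "model_unavailable";
  return "unknown";
}
```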

Retry Strategies

Strategy Types

  • fixed: Fixed interval between retries (e.g., every 30s)
  • exponential: Exponential backoff (1min → 2min → 4min → 8min...)
  • custom: User-defined interval schedule (e.g., [1min, 3min, 5min, 15min])
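The three strategy types can be modelled as functions from attempt number to delay. In this sketch the field names (`intervalMs`, `baseMs`, `capMs`) are assumptions, as is the choice to reuse the last custom interval. The doubling matches the exponential example above.

```typescript
type Strategy =
  | { type: "fixed"; maxRetries: number; intervalMs: number }
  | { type: "exponential"; maxRetries: number; baseMs: number; capMs?: number }
  | { type: "custom"; maxRetries: number; intervals: number[] };

// Delay in ms before retry `attempt` (0-based), or null once retries are exhausted.
function retryDelay(s: Strategy, attempt: number): number | null {
  if (attempt < 0 || attempt >= s.maxRetries) return null;
  if (s.type === "fixed") return s.intervalMs;
  if (s.type === "exponential") {
    const d = s.baseMs * 2 ** attempt; // base, 2x base, 4x base, ...
    return s.capMs !== undefined ? Math.min(d, s.capMs) : d;
  }
  // custom: reuse the last interval if the list is shorter than maxRetries (assumption).
  return s.intervals[Math.min(attempt, s.intervals.length - 1)] ?? null;
}
```

With `baseMs: 60000` and `capMs: 900000`, five attempts give 1m, 2m, 4m, 8m, and 15m. That is one plausible reading of the 1m→15m range listed for default-exponential, but the README does not specify the exact schedule.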

Default Strategies

Name Type Max Retries Intervals Error Types
default-exponential exponential 5 1m→15m rate_limit, server_overload, timeout, network_error
rate-limit-fixed fixed 3 30s rate_limit
model-backoff custom 6 1m→2h server_overload, model_unavailable
network-retry exponential 4 5s→1m network_error
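The default table can also be written as data, which makes it easy to see which strategies cover a given error type. The `StrategySummary` type and `strategiesFor` helper are hypothetical. Intervals are left out because the table only gives their ranges. The README also does not document how the skill chooses between overlapping strategies; for example, both default-exponential and rate-limit-fixed cover rate_limit.

```typescript
interface StrategySummary {
  name: string;
  type: "fixed" | "exponential" | "custom";
  maxRetries: number;
  errorTypes: string[];
}

// The default strategies table above, encoded as data.
const DEFAULT_STRATEGIES: StrategySummary[] = [
  { name: "default-exponential", type: "exponential", maxRetries: 5,
    errorTypes: ["rate_limit", "server_overload", "timeout", "network_error"] },
  { name: "rate-limit-fixed", type: "fixed", maxRetries: 3, errorTypes: ["rate_limit"] },
  { name: "model-backoff", type: "custom", maxRetries: 6,
    errorTypes: ["server_overload", "model_unavailable"] },
  { name: "network-retry", type: "exponential", maxRetries: 4, errorTypes: ["network_error"] },
];

// Lists every default strategy that covers an error type, in table order.
// This only reports candidates; it does not pick one.
function strategiesFor(errorType: string): string[] {
  return DEFAULT_STRATEGIES.filter((s) => s.errorTypes.includes(errorType)).map((s) => s.name);
}
```

Both `strategiesFor("auth_failed")` and `strategiesFor("context_too_long")` return empty lists, because no default strategy lists those error types.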

Data Storage

Per-instance data: ~/.openclaw/plugins/resilience/instances/<instance-id>/ (stats, logs, strategies, tasks). The legacy root layout is still read as the default.

~/.openclaw/plugins/resilience/instances/<instance-id>/
├── meta.json
├── stats.json
├── strategies.json
├── active-retries.json
├── logs/YYYY-MM-DD.jsonl
└── tasks/
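Here is a small sketch of helpers for this layout. The function names are hypothetical. The home directory is passed in explicitly (on Node, `os.homedir()` returns it), and paths are joined with "/" for simplicity. Taking the log date in UTC is an assumption, since the README does not say which time zone the daily files use.

```typescript
// Directory for one instance, following the layout above.
function instanceDir(home: string, instanceId: string): string {
  return `${home}/.openclaw/plugins/resilience/instances/${instanceId}`;
}

// Daily log file logs/YYYY-MM-DD.jsonl for a given moment (date taken in UTC).
function logFile(home: string, instanceId: string, when: Date): string {
  const day = when.toISOString().slice(0, 10); // "YYYY-MM-DD"
  return `${instanceDir(home, instanceId)}/logs/${day}.jsonl`;
}

// Parses JSONL text: one JSON value per non-empty line.
// Record fields are not documented, so entries are returned as unknown.
function parseJsonl(text: string): unknown[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as unknown);
}
```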

Usage Guidance
Install only if you want a local dashboard for resilience and error monitoring. Avoid opening it during screen sharing or on shared machines, and prefer explicit commands such as "open the resilience dashboard" over generic monitoring phrases.
Capability Assessment
Purpose & Capability
The reported dashboard capability fits the apparent purpose of monitoring operational errors, retries, and recovery state.
Instruction Scope
Some example phrases appear broad enough to open the dashboard from generic monitoring requests, so users may see the UI invoked unexpectedly.
Install Mechanism
No risky installer, package script, or automatic privileged setup was identified from the supplied telemetry.
Credentials
Opening a localhost dashboard may display operational metadata such as error logs, model names, retry status, or recovery details; this is purpose-aligned but should be disclosed clearly.
Persistence & Privilege
No evidence of persistence, privilege escalation, credential harvesting, destructive actions, or external exfiltration was provided.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install resilience-monitor
  3. After installation, invoke the skill by name or use /resilience-monitor
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.3.0
Web dashboard, multi-instance aggregation, gateway_start config fix
Metadata
Slug resilience-monitor
Version 0.3.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Resilience Monitor?

Monitor and manage OpenClaw API errors, track model performance, configure retry strategies, generate reports, and oversee task recovery status. It is an AI Agent Skill for Claude Code / OpenClaw, with 16 downloads so far.

How do I install Resilience Monitor?

Run "/install resilience-monitor" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Resilience Monitor free?

Yes, Resilience Monitor is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Resilience Monitor support?

Resilience Monitor is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Resilience Monitor?

It is built and maintained by leiJack-lo (@leijack-lo); the current version is v0.3.0.
