Resilience Monitor

Name: Resilience Monitor
Author: leijack-lo

by leiJack-lo · v0.3.0 · MIT-0

cross-platform ✓ Security Clean

Install in OpenClaw

/install resilience-monitor

Description

Monitor and manage OpenClaw API errors, track model performance, configure retry strategies, generate reports, and oversee task recovery status.

README (SKILL.md)

Resilience Skill

LLM API error tracking, classification, retry, and task recovery for OpenClaw.

Overview

This skill provides visibility into API call health and automated retry management. Use it to:

Monitor API error rates and patterns
View per-model performance statistics
Configure retry strategies
Generate error reports
Track task recovery status

Tools

resilience_dashboard

Open the live web dashboard in your browser for real-time error stats and retry strategy management.

Parameters:

action: "open" (default) | "status" | "stop"

Features:

Live error overview (today / hour / active retries)
Model breakdown table
Recent errors feed
Retry strategy cards — set default, adjust max retries
Auto-refresh: 5s, 60s, 5min, 1h, or off

URL: http://127.0.0.1:18765/ (default port, configurable via dashboardPort)
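The parameter and URL above can be sketched as types plus a small URL builder. This is only an illustration: `DashboardCall`, `resolveDashboardAction`, and `dashboardUrl` are hypothetical names, not part of the skill's API. Only the `action` values, the default port, and the `dashboardPort` key come from the README.

```typescript
// Parameter shape for resilience_dashboard; the `action` values are from the README.
type DashboardAction = "open" | "status" | "stop";

interface DashboardCall {
  action?: DashboardAction; // omitted means "open", the documented default
}

// Documented default port; overridable via the `dashboardPort` config key.
const DEFAULT_DASHBOARD_PORT = 18765;

// Hypothetical helper: resolve the effective action for a call.
function resolveDashboardAction(call: DashboardCall): DashboardAction {
  return call.action ?? "open";
}

// Hypothetical helper: build the loopback URL the dashboard is served on.
function dashboardUrl(port: number = DEFAULT_DASHBOARD_PORT): string {
  if (Math.floor(port) !== port || port < 1 || port > 65535) {
    throw new RangeError("invalid dashboard port: " + port);
  }
  return "http://127.0.0.1:" + port + "/";
}

const defaultUrl = dashboardUrl();     // "http://127.0.0.1:18765/"
const customUrl = dashboardUrl(19000); // "http://127.0.0.1:19000/"
```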

Voice / natural language examples:

"Open the error statistics page" → resilience_dashboard({ action: "open" })
"Open the monitoring dashboard" → resilience_dashboard({ action: "open" })
"Open the resilience dashboard" → resilience_dashboard({ action: "open" })

The dashboard starts automatically when OpenClaw Gateway starts (unless dashboardEnabled: false).

Configuration lives in ~/.openclaw/openclaw.json under plugins.entries.resilience.config (not only api.pluginConfig at hook time). Example:

"resilience": {
  "enabled": true,
  "config": {
    "dashboardPort": 18765,
    "dashboardEnabled": true,
    "instanceLabel": "my-workspace"
  }
}

At gateway_start, config is read from ctx.config + ctx.workspaceDir.
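As a rough sketch of how the config block above might be resolved, the following applies the documented defaults (port 18765, dashboard enabled unless `dashboardEnabled` is explicitly `false`). The interfaces and `resolveResilienceConfig` are hypothetical names for illustration, and the file shape is assumed from the README's example rather than from the plugin's source.

```typescript
// Fields shown in the README's example; all are treated as optional here.
interface ResilienceConfig {
  dashboardPort?: number;
  dashboardEnabled?: boolean;
  instanceLabel?: string;
}

// Assumed shape of the relevant slice of ~/.openclaw/openclaw.json.
interface OpenClawConfigFile {
  plugins?: {
    entries?: {
      resilience?: { enabled?: boolean; config?: ResilienceConfig };
    };
  };
}

interface ResolvedDashboardConfig {
  port: number;
  enabled: boolean;
  label: string | undefined;
}

// Hypothetical helper: read plugins.entries.resilience.config and fill in defaults.
function resolveResilienceConfig(file: OpenClawConfigFile): ResolvedDashboardConfig {
  const cfg: ResilienceConfig = file.plugins?.entries?.resilience?.config ?? {};
  return {
    port: cfg.dashboardPort ?? 18765,         // documented default port
    enabled: cfg.dashboardEnabled !== false,  // on unless explicitly disabled
    label: cfg.instanceLabel,
  };
}

const exampleFile: OpenClawConfigFile = {
  plugins: {
    entries: {
      resilience: {
        enabled: true,
        config: { dashboardPort: 18765, dashboardEnabled: true, instanceLabel: "my-workspace" },
      },
    },
  },
};
const resolvedExample = resolveResilienceConfig(exampleFile);
// resolvedExample.port === 18765, resolvedExample.enabled === true,
// resolvedExample.label === "my-workspace"
```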

Multi-instance: Use the instance dropdown to view all instances (aggregated) or a single Gateway. Each instance stores data under ~/.openclaw/plugins/resilience/instances/<id>/. Strategy edits apply only to the local Gateway instance.

resilience_stats

View API error statistics by time period or model.

Parameters:

query (optional): Natural language query
- "today" or empty — today's full summary
- "hour" — current hour stats
- "week" — current week stats
- Any model name (e.g., "mimo-v2.5") — model-specific stats
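One way to read the `query` rules above is as a small parser that maps the string to either a time period or a model name. This is a sketch under assumptions: `StatsQuery` and `parseStatsQuery` are hypothetical names, and exact matching after trimming is a guess at how the keywords are recognized.

```typescript
// Result of interpreting the `query` parameter of resilience_stats.
type StatsQuery =
  | { kind: "period"; period: "today" | "hour" | "week" }
  | { kind: "model"; model: string };

// Hypothetical parser: empty or "today" means today's summary, "hour" and
// "week" select those periods, and anything else is treated as a model name.
function parseStatsQuery(query?: string): StatsQuery {
  const q = (query ?? "").trim();
  if (q === "" || q === "today") return { kind: "period", period: "today" };
  if (q === "hour") return { kind: "period", period: "hour" };
  if (q === "week") return { kind: "period", period: "week" };
  return { kind: "model", model: q };
}

const weekQuery = parseStatsQuery("week");       // { kind: "period", period: "week" }
const modelQuery = parseStatsQuery("mimo-v2.5"); // { kind: "model", model: "mimo-v2.5" }
```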

Examples:

"Show today's error statistics" → resilience_stats({ query: "today" })
"Show the error rate for mimo-v2.5" → resilience_stats({ query: "mimo-v2.5" })
"Show this week's error rate" → resilience_stats({ query: "week" })

resilience_strategies

View, add, update, or reset retry strategies.

Parameters:

action: "list" (default) | "add" | "update" | "reset"
strategyName: Strategy name (required for add/update)
updates: Fields to update (for add/update)
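The parameter rules above could be checked before a call is issued. The validator below is a hypothetical sketch, not the skill's own validation. It encodes the one documented rule (`strategyName` is required for add/update) plus one assumed rule, labeled in the code, that `intervals` values are positive millisecond counts.

```typescript
type StrategyAction = "list" | "add" | "update" | "reset";

interface StrategyUpdates {
  type?: "fixed" | "exponential" | "custom";
  maxRetries?: number;
  intervals?: number[]; // appear to be milliseconds, judging by the README's custom example
}

interface StrategyCall {
  action?: StrategyAction; // omitted means "list"
  strategyName?: string;
  updates?: StrategyUpdates;
}

// Hypothetical validator: returns a list of problems, empty when the call looks well-formed.
function validateStrategyCall(call: StrategyCall): string[] {
  const problems: string[] = [];
  const action: StrategyAction = call.action ?? "list";
  // Documented rule: strategyName is required for add/update.
  if ((action === "add" || action === "update") && !call.strategyName) {
    problems.push("strategyName is required for " + action);
  }
  // Assumed rule (not stated in the README): intervals must be positive numbers.
  const intervals: number[] = call.updates?.intervals ?? [];
  for (const ms of intervals) {
    if (!(ms > 0)) problems.push("interval must be a positive number of milliseconds: " + ms);
  }
  return problems;
}

const missingName = validateStrategyCall({ action: "update" });
// missingName.length === 1
```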

Examples:

"Show all current strategy configurations" → resilience_strategies({ action: "list" })
"Change the timeout retry strategy to exponential backoff" → resilience_strategies({ action: "update", strategyName: "default-exponential", updates: { type: "exponential" } })
"Add a custom retry strategy" → resilience_strategies({ action: "add", strategyName: "my-strategy", updates: { type: "custom", maxRetries: 3, intervals: [60000, 300000, 600000] } })
"Reset strategies to defaults" → resilience_strategies({ action: "reset" })

resilience_report

Generate detailed error reports.

Parameters:

reportType: "daily" (default) | "model" | "recovery" | "full"
target: Model name or date (YYYY-MM-DD)
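Because `target` accepts either a model name or a `YYYY-MM-DD` date, a caller may want to tell the two apart. The sketch below is one possible approach, with hypothetical names (`ReportTarget`, `classifyReportTarget`). Treating anything that is not a valid calendar date as a model name is an assumption, not documented behavior.

```typescript
type ReportTarget =
  | { kind: "date"; year: number; month: number; day: number }
  | { kind: "model"; model: string };

// Hypothetical helper: a well-formed, real calendar date is classified as a
// date; every other string is treated as a model name.
function classifyReportTarget(target: string): ReportTarget {
  const text = target.trim();
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(text);
  if (m) {
    const year = Number(m[1]);
    const month = Number(m[2]);
    const day = Number(m[3]);
    // Round-trip through Date to reject impossible dates such as 2024-02-30.
    const d = new Date(Date.UTC(year, month - 1, day));
    if (d.getUTCFullYear() === year && d.getUTCMonth() === month - 1 && d.getUTCDate() === day) {
      return { kind: "date", year: year, month: month, day: day };
    }
  }
  return { kind: "model", model: text };
}

const dateTarget = classifyReportTarget("2024-02-29");  // kind: "date"
const modelTarget = classifyReportTarget("mimo-v2.5");  // kind: "model"
```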

Examples:

"Generate today's daily error report" → resilience_report({ reportType: "daily" })
"Show the detailed report for mimo-v2.5" → resilience_report({ reportType: "model", target: "mimo-v2.5" })
"Show task recovery status" → resilience_report({ reportType: "recovery" })
"Generate a full status report" → resilience_report({ reportType: "full" })

Error Categories

Category	Description	Retryable
`rate_limit`	429 Too Many Requests	✅
`server_overload`	503 Service Unavailable	✅
`timeout`	Request timeout	✅
`auth_failed`	401/403 Authentication failed	❌
`network_error`	Connection errors	✅
`model_unavailable`	Model not found or offline	✅
`context_too_long`	Context length exceeded	❌
`unknown`	Unclassified errors	❌
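The table above can be expressed as a lookup. In this sketch the category names, retryable flags, and the HTTP status mappings (429, 503, 401/403) come from the table, while `categoryFromStatus` and `isRetryable` are hypothetical helper names. Categories that depend on signals other than an HTTP status (timeouts, connection errors, context length) are not detected here and fall through to `unknown`.

```typescript
type ErrorCategory =
  | "rate_limit"
  | "server_overload"
  | "timeout"
  | "auth_failed"
  | "network_error"
  | "model_unavailable"
  | "context_too_long"
  | "unknown";

// Retryable column of the table, one entry per category.
const RETRYABLE: Record<ErrorCategory, boolean> = {
  rate_limit: true,
  server_overload: true,
  timeout: true,
  auth_failed: false,
  network_error: true,
  model_unavailable: true,
  context_too_long: false,
  unknown: false,
};

// Hypothetical helper: map only the HTTP statuses the table names explicitly.
function categoryFromStatus(httpStatus: number): ErrorCategory {
  switch (httpStatus) {
    case 429:
      return "rate_limit";
    case 503:
      return "server_overload";
    case 401:
    case 403:
      return "auth_failed";
    default:
      return "unknown";
  }
}

function isRetryable(category: ErrorCategory): boolean {
  return RETRYABLE[category];
}

const retry429 = isRetryable(categoryFromStatus(429)); // true
const retry401 = isRetryable(categoryFromStatus(401)); // false
```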

Retry Strategies

Strategy Types

fixed: Fixed interval between retries (e.g., every 30s)
exponential: Exponential backoff (1min → 2min → 4min → 8min...)
custom: User-defined interval schedule (e.g., [1min, 3min, 5min, 15min])
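To make the three strategy types concrete, here is a minimal sketch of how a delay might be chosen for a given retry attempt. It is not the skill's implementation. The field names `intervalMs`, `baseMs`, and `maxMs` are hypothetical, and both capping the exponential delay and reusing the last custom interval once the schedule runs out are assumptions.

```typescript
type RetryStrategy =
  | { type: "fixed"; maxRetries: number; intervalMs: number }
  | { type: "exponential"; maxRetries: number; baseMs: number; maxMs: number }
  | { type: "custom"; maxRetries: number; intervals: number[] };

// Delay before retry number `attempt` (1-based), or undefined once the
// strategy's retries are exhausted.
function retryDelayMs(strategy: RetryStrategy, attempt: number): number | undefined {
  if (attempt < 1 || attempt > strategy.maxRetries) return undefined;
  switch (strategy.type) {
    case "fixed":
      return strategy.intervalMs;
    case "exponential":
      // Doubles each attempt: base, 2x, 4x, 8x, ... capped at maxMs (assumed cap).
      return Math.min(strategy.baseMs * Math.pow(2, attempt - 1), strategy.maxMs);
    case "custom": {
      // Assumption: past the end of the schedule, keep using the last interval.
      const index = Math.min(attempt, strategy.intervals.length) - 1;
      return strategy.intervals[index];
    }
  }
}

const backoff: RetryStrategy = { type: "exponential", maxRetries: 5, baseMs: 60000, maxMs: 900000 };
const firstDelay = retryDelayMs(backoff, 1); // 60000 (1 min)
const fifthDelay = retryDelayMs(backoff, 5); // 900000 (16 min capped to 15 min)
const sixthDelay = retryDelayMs(backoff, 6); // undefined (retries exhausted)
```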

Default Strategies

Name	Type	Max Retries	Intervals	Error Types
default-exponential	exponential	5	1m→15m	rate_limit, server_overload, timeout, network_error
rate-limit-fixed	fixed	3	30s	rate_limit
model-backoff	custom	6	1m→2h	server_overload, model_unavailable
network-retry	exponential	4	5s→1m	network_error
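The default strategies can also be held as plain data, which makes it easy to see which strategies cover a given error category. The sketch below copies names, types, retry counts, and error types from the table and leaves out the interval ranges, since the table gives only their endpoints. `DefaultStrategy` and `strategiesCovering` are hypothetical names, and the README does not say which strategy takes precedence when several match, so the helper returns all of them.

```typescript
type StrategyType = "fixed" | "exponential" | "custom";

interface DefaultStrategy {
  name: string;
  type: StrategyType;
  maxRetries: number;
  errorTypes: string[];
}

// Rows of the Default Strategies table (interval ranges omitted).
const DEFAULT_STRATEGIES: DefaultStrategy[] = [
  { name: "default-exponential", type: "exponential", maxRetries: 5,
    errorTypes: ["rate_limit", "server_overload", "timeout", "network_error"] },
  { name: "rate-limit-fixed", type: "fixed", maxRetries: 3, errorTypes: ["rate_limit"] },
  { name: "model-backoff", type: "custom", maxRetries: 6,
    errorTypes: ["server_overload", "model_unavailable"] },
  { name: "network-retry", type: "exponential", maxRetries: 4, errorTypes: ["network_error"] },
];

// Hypothetical helper: names of all default strategies that list the category.
function strategiesCovering(category: string): string[] {
  const names: string[] = [];
  for (const strategy of DEFAULT_STRATEGIES) {
    if (strategy.errorTypes.indexOf(category) !== -1) names.push(strategy.name);
  }
  return names;
}

const forRateLimit = strategiesCovering("rate_limit");
// ["default-exponential", "rate-limit-fixed"]
```

Note that `auth_failed`, `context_too_long`, and `unknown` match no default strategy, which is consistent with their non-retryable status in the Error Categories table.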

Data Storage

Per-instance data: ~/.openclaw/plugins/resilience/instances/<instance-id>/ (stats, logs, strategies, tasks). Legacy root layout is still read as default.

~/.openclaw/plugins/resilience/instances/<instance-id>/
├── meta.json
├── stats.json
├── strategies.json
├── active-retries.json
├── logs/YYYY-MM-DD.jsonl
└── tasks/
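For scripts that inspect this layout, the paths can be built as in the sketch below. `instanceDir` and `dailyLogPath` are hypothetical helpers. The restriction on instance-id characters and the use of the UTC date for log file names are assumptions, since the README does not say which characters an id may contain or which time zone the `YYYY-MM-DD` names use.

```typescript
// Hypothetical helper: directory holding one instance's data.
function instanceDir(homeDir: string, instanceId: string): string {
  // Assumed restriction, to keep the id from escaping the instances directory.
  if (!/^[A-Za-z0-9._-]+$/.test(instanceId) || instanceId === "." || instanceId === "..") {
    throw new Error("unexpected instance id: " + instanceId);
  }
  return homeDir + "/.openclaw/plugins/resilience/instances/" + instanceId;
}

// Hypothetical helper: path of the JSONL log for a given day (UTC date assumed).
function dailyLogPath(dir: string, date: Date): string {
  const day = date.toISOString().slice(0, 10); // "YYYY-MM-DD"
  return dir + "/logs/" + day + ".jsonl";
}

const exampleDir = instanceDir("/home/alice", "gateway-1");
// "/home/alice/.openclaw/plugins/resilience/instances/gateway-1"
const exampleLog = dailyLogPath(exampleDir, new Date(Date.UTC(2025, 0, 5)));
// "/home/alice/.openclaw/plugins/resilience/instances/gateway-1/logs/2025-01-05.jsonl"
```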

Usage Guidance

Install only if you want a local dashboard for resilience/error monitoring. Avoid opening it during screen sharing or on shared machines, and prefer explicit commands such as opening the resilience dashboard rather than generic monitoring phrases.

Capability Assessment

✓ Purpose & Capability

The reported dashboard capability fits the apparent purpose of monitoring operational errors, retries, and recovery state.

ℹ Instruction Scope

Some example phrases appear broad enough to open the dashboard from generic monitoring requests, so users may see the UI invoked unexpectedly.

✓ Install Mechanism

No risky installer, package script, or automatic privileged setup was identified from the supplied telemetry.

ℹ Credentials

Opening a localhost dashboard may display operational metadata such as error logs, model names, retry status, or recovery details; this is purpose-aligned but should be disclosed clearly.

✓ Persistence & Privilege

No evidence of persistence, privilege escalation, credential harvesting, destructive actions, or external exfiltration was provided.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install resilience-monitor
After installation, invoke the skill by name or use /resilience-monitor
Provide required inputs per the skill's parameter spec and get structured output

Version History

v0.3.0

Web dashboard, multi-instance aggregation, gateway_start config fix

Metadata

Slug resilience-monitor

Version 0.3.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Resilience Monitor?

Resilience Monitor is an AI agent skill for Claude Code / OpenClaw that monitors and manages OpenClaw API errors, tracks model performance, configures retry strategies, generates reports, and oversees task recovery status. It has 16 downloads so far.

How do I install Resilience Monitor?

Run "/install resilience-monitor" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Resilience Monitor free?

Yes, Resilience Monitor is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Resilience Monitor support?

Resilience Monitor is cross-platform and runs anywhere OpenClaw or Claude Code is available.

Who created Resilience Monitor?

It is built and maintained by leiJack-lo (@leijack-lo); the current version is v0.3.0.
