Resilience Monitor

Name: Resilience Monitor
Author: leijack-lo

Author leiJack-lo · v0.3.0 · MIT-0

cross-platform ✓ Security check passed

Install in OpenClaw

/install resilience-monitor

Description

Monitor and manage OpenClaw API errors, track model performance, configure retry strategies, generate reports, and oversee task recovery status.

Usage (SKILL.md)

Resilience Skill

LLM API error tracking, classification, retry, and task recovery for OpenClaw.

Overview

This skill provides visibility into API call health and automated retry management. Use it to:

- Monitor API error rates and patterns
- View per-model performance statistics
- Configure retry strategies
- Generate error reports
- Track task recovery status

Tools

resilience_dashboard

Open the live web dashboard in your browser for real-time error stats and retry strategy management.

Parameters:

action: "open" (default) | "status" | "stop"

Features:

- Live error overview (today / hour / active retries)
- Model breakdown table
- Recent errors feed
- Retry strategy cards: set the default strategy, adjust max retries
- Auto-refresh: 5s, 60s, 5min, 1h, or off

URL: http://127.0.0.1:18765/ (default port, configurable via dashboardPort)

Voice / natural language examples:

"Open the error statistics page" → resilience_dashboard({ action: "open" })
"Open the monitoring panel" → resilience_dashboard({ action: "open" })
"Open the resilience panel" → resilience_dashboard({ action: "open" })

The dashboard starts automatically when OpenClaw Gateway starts (unless dashboardEnabled: false).

Configuration lives in ~/.openclaw/openclaw.json under plugins.entries.resilience.config; it is not read only from api.pluginConfig at hook time. Example:

"resilience": {
  "enabled": true,
  "config": {
    "dashboardPort": 18765,
    "dashboardEnabled": true,
    "instanceLabel": "my-workspace"
  }
}

At gateway_start, config is read from ctx.config + ctx.workspaceDir.
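
As a rough illustration of the lookup described above, the following TypeScript sketch extracts plugins.entries.resilience.config from an already-parsed openclaw.json and applies the two defaults this document states (port 18765, dashboard enabled unless set to false). The helper names resolveResilienceConfig and dashboardUrl are hypothetical and not part of the plugin; how the plugin actually merges ctx.config is not documented here.

```typescript
// Resolved settings; field names follow the config example above.
interface ResolvedConfig {
  dashboardPort: number;
  dashboardEnabled: boolean;
  instanceLabel: string | undefined;
}

const DEFAULT_DASHBOARD_PORT = 18765; // default port stated in this document

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical helper (not the plugin's API): walk
// plugins -> entries -> resilience -> config, tolerating missing levels.
function resolveResilienceConfig(openclawJson: unknown): ResolvedConfig {
  let node: unknown = openclawJson;
  for (const key of ["plugins", "entries", "resilience", "config"]) {
    node = isRecord(node) ? node[key] : undefined;
  }
  const raw: Record<string, unknown> = isRecord(node) ? node : {};
  const port = raw["dashboardPort"];
  const enabled = raw["dashboardEnabled"];
  const label = raw["instanceLabel"];
  return {
    dashboardPort: typeof port === "number" ? port : DEFAULT_DASHBOARD_PORT,
    // The dashboard is described as on unless explicitly set to false.
    dashboardEnabled: enabled !== false,
    instanceLabel: typeof label === "string" ? label : undefined,
  };
}

// Hypothetical helper: the local dashboard URL for a resolved config.
function dashboardUrl(config: ResolvedConfig): string {
  return `http://127.0.0.1:${config.dashboardPort}/`;
}
```

Passing an empty object yields the documented default URL, http://127.0.0.1:18765/.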

Multi-instance: Use the instance dropdown to view all instances (aggregated) or a single Gateway. Each instance stores data under ~/.openclaw/plugins/resilience/instances/<id>/. Strategy edits apply only to the local Gateway instance.

resilience_stats

View API error statistics by time period or model.

Parameters:

query (optional): Natural language query
- "today" or empty — today's full summary
- "hour" — current hour stats
- "week" — current week stats
- Any model name (e.g., "mimo-v2.5") — model-specific stats
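
One way to read the query rules above is as a small dispatch: the three period keywords are matched first and anything else is treated as a model name. The sketch below assumes exactly that; interpretStatsQuery and the StatsQuery type are hypothetical names, and the tool's real natural-language handling may be more permissive.

```typescript
// Hypothetical result type: either a time period or a model filter.
type StatsQuery =
  | { kind: "period"; period: "today" | "hour" | "week" }
  | { kind: "model"; model: string };

// Hypothetical helper (not the plugin's API) mirroring the listed rules.
function interpretStatsQuery(query?: string): StatsQuery {
  const trimmed = (query ?? "").trim();
  const lower = trimmed.toLowerCase();
  // Empty or "today" maps to today's full summary.
  if (lower === "" || lower === "today") return { kind: "period", period: "today" };
  if (lower === "hour") return { kind: "period", period: "hour" };
  if (lower === "week") return { kind: "period", period: "week" };
  // Anything else is assumed to be a model name, kept in its original casing.
  return { kind: "model", model: trimmed };
}
```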

Examples:

"Show today's error statistics" → resilience_stats({ query: "today" })
"Show the error rate for mimo-v2.5" → resilience_stats({ query: "mimo-v2.5" })
"Show this week's error rate" → resilience_stats({ query: "week" })

resilience_strategies

View, add, update, or reset retry strategies.

Parameters:

action: "list" (default) | "add" | "update" | "reset"
strategyName: Strategy name (required for add/update)
updates: Fields to update (for add/update)
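
The parameter rules above can be expressed as a typed argument object plus a pre-flight check. In this sketch only the strategyName requirement comes from the parameter list; the remaining checks (non-negative integer maxRetries, positive intervals read as milliseconds, at least one interval for a custom strategy) are assumptions added for illustration, and validateStrategiesArgs is a hypothetical name.

```typescript
type StrategyType = "fixed" | "exponential" | "custom";

interface StrategyUpdates {
  type?: StrategyType;
  maxRetries?: number;
  intervals?: number[]; // assumed to be milliseconds
}

interface StrategiesArgs {
  action?: "list" | "add" | "update" | "reset";
  strategyName?: string;
  updates?: StrategyUpdates;
}

// Hypothetical helper: returns a list of problems; empty means the args look usable.
function validateStrategiesArgs(args: StrategiesArgs): string[] {
  const problems: string[] = [];
  const action = args.action ?? "list"; // "list" is the documented default
  if ((action === "add" || action === "update") && !args.strategyName) {
    problems.push(`strategyName is required for "${action}"`);
  }
  const u = args.updates;
  // The checks below are assumptions, not documented plugin behavior.
  if (u !== undefined && u.maxRetries !== undefined) {
    if (!Number.isInteger(u.maxRetries) || u.maxRetries < 0) {
      problems.push("maxRetries must be a non-negative integer");
    }
  }
  if (u !== undefined && u.type === "custom") {
    if (u.intervals === undefined || u.intervals.length === 0) {
      problems.push("a custom strategy needs at least one interval");
    }
  }
  if (u !== undefined && u.intervals !== undefined) {
    if (u.intervals.some((ms) => !(ms > 0))) {
      problems.push("intervals must be positive millisecond values");
    }
  }
  return problems;
}
```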

Examples:

"Show all current strategy configurations" → resilience_strategies({ action: "list" })
"Change the timeout retry strategy to exponential backoff" → resilience_strategies({ action: "update", strategyName: "default-exponential", updates: { type: "exponential" } })
"Add a custom retry strategy" → resilience_strategies({ action: "add", strategyName: "my-strategy", updates: { type: "custom", maxRetries: 3, intervals: [60000, 300000, 600000] } })
"Reset strategies to defaults" → resilience_strategies({ action: "reset" })

resilience_report

Generate detailed error reports.

Parameters:

reportType: "daily" (default) | "model" | "recovery" | "full"
target: Model name or date (YYYY-MM-DD)

Examples:

"Generate today's daily error report" → resilience_report({ reportType: "daily" })
"Show the detailed report for mimo-v2.5" → resilience_report({ reportType: "model", target: "mimo-v2.5" })
"Show task recovery status" → resilience_report({ reportType: "recovery" })
"Generate a full status report" → resilience_report({ reportType: "full" })

Error Categories

Category	Description	Retryable
`rate_limit`	429 Too Many Requests	✅
`server_overload`	503 Service Unavailable	✅
`timeout`	Request timeout	✅
`auth_failed`	401/403 Authentication failed	❌
`network_error`	Connection errors	✅
`model_unavailable`	Model not found or offline	✅
`context_too_long`	Context length exceeded	❌
`unknown`	Unclassified errors	❌
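
The table above maps naturally onto a lookup. The sketch below copies the retryable flags from the table and classifies only the HTTP status codes the table names; classifyHttpStatus and isRetryable are hypothetical helpers, and the plugin presumably also inspects error messages and network failures, which this sketch does not attempt.

```typescript
type ErrorCategory =
  | "rate_limit"
  | "server_overload"
  | "timeout"
  | "auth_failed"
  | "network_error"
  | "model_unavailable"
  | "context_too_long"
  | "unknown";

// Retryable flags copied from the table above.
const RETRYABLE: Record<ErrorCategory, boolean> = {
  rate_limit: true,
  server_overload: true,
  timeout: true,
  auth_failed: false,
  network_error: true,
  model_unavailable: true,
  context_too_long: false,
  unknown: false,
};

// Hypothetical classifier covering only the status codes the table names.
function classifyHttpStatus(status: number): ErrorCategory {
  switch (status) {
    case 429:
      return "rate_limit";
    case 503:
      return "server_overload";
    case 401:
    case 403:
      return "auth_failed";
    default:
      return "unknown";
  }
}

function isRetryable(category: ErrorCategory): boolean {
  return RETRYABLE[category];
}
```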

Retry Strategies

Strategy Types

fixed: Fixed interval between retries (e.g., every 30s)
exponential: Exponential backoff (1min → 2min → 4min → 8min...)
custom: User-defined interval schedule (e.g., [1min, 3min, 5min, 15min])
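
To make the three schedules concrete, here is a minimal sketch of how the delay before each retry could be computed. The field names intervalMs, baseMs, and capMs are hypothetical (only type, maxRetries, and intervals appear in this document), the cap on exponential growth is an assumption, and reusing the last custom interval when the schedule is shorter than maxRetries is a design choice of this sketch rather than documented behavior.

```typescript
type RetryStrategy =
  | { type: "fixed"; maxRetries: number; intervalMs: number }
  | { type: "exponential"; maxRetries: number; baseMs: number; capMs: number }
  | { type: "custom"; maxRetries: number; intervals: number[] };

// Delay before retry number `attempt` (1-based), or null once retries are exhausted.
function nextDelayMs(strategy: RetryStrategy, attempt: number): number | null {
  if (attempt < 1 || attempt > strategy.maxRetries) return null;
  switch (strategy.type) {
    case "fixed":
      return strategy.intervalMs;
    case "exponential":
      // Doubles on each attempt (base, 2x, 4x, 8x, ...), limited by capMs.
      return Math.min(strategy.baseMs * Math.pow(2, attempt - 1), strategy.capMs);
    case "custom": {
      if (strategy.intervals.length === 0) return null;
      // Reuse the last entry if the schedule is shorter than maxRetries.
      const index = Math.min(attempt, strategy.intervals.length) - 1;
      return strategy.intervals[index] ?? null;
    }
  }
}

// Example: 1 min base, 15 min cap, 5 retries.
const exampleBackoff: RetryStrategy = {
  type: "exponential",
  maxRetries: 5,
  baseMs: 60_000,
  capMs: 900_000,
};
const exampleDelays = [1, 2, 3, 4, 5].map((n) => nextDelayMs(exampleBackoff, n));
// exampleDelays is [60000, 120000, 240000, 480000, 900000]
```

The fifth delay would be 16 minutes uncapped; the assumed 15-minute cap trims it to 900000 ms.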

Default Strategies

Name	Type	Max Retries	Intervals	Error Types
default-exponential	exponential	5	1m→15m	rate_limit, server_overload, timeout, network_error
rate-limit-fixed	fixed	3	30s	rate_limit
model-backoff	custom	6	1m→2h	server_overload, model_unavailable
network-retry	exponential	4	5s→1m	network_error
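
Several default strategies cover the same error type (rate_limit, for example, appears under both default-exponential and rate-limit-fixed). The sketch below encodes the table as data and lists every strategy that covers a given category. It deliberately does not pick a winner, because this document does not say how the plugin resolves overlaps. The Intervals column is omitted since the table gives only ranges; strategiesFor is a hypothetical helper.

```typescript
interface DefaultStrategy {
  name: string;
  type: "fixed" | "exponential" | "custom";
  maxRetries: number;
  errorTypes: string[];
}

// Rows copied from the Default Strategies table (Intervals column omitted).
const DEFAULT_STRATEGIES: DefaultStrategy[] = [
  {
    name: "default-exponential",
    type: "exponential",
    maxRetries: 5,
    errorTypes: ["rate_limit", "server_overload", "timeout", "network_error"],
  },
  { name: "rate-limit-fixed", type: "fixed", maxRetries: 3, errorTypes: ["rate_limit"] },
  {
    name: "model-backoff",
    type: "custom",
    maxRetries: 6,
    errorTypes: ["server_overload", "model_unavailable"],
  },
  { name: "network-retry", type: "exponential", maxRetries: 4, errorTypes: ["network_error"] },
];

// Hypothetical helper: names of all default strategies that list the category.
function strategiesFor(category: string): string[] {
  return DEFAULT_STRATEGIES
    .filter((s) => s.errorTypes.indexOf(category) !== -1)
    .map((s) => s.name);
}
```

For auth_failed and context_too_long the result is empty, which matches their non-retryable status in the Error Categories table.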

Data Storage

Per-instance data: ~/.openclaw/plugins/resilience/instances/<instance-id>/ (stats, logs, strategies, tasks). Legacy root layout is still read as default.

~/.openclaw/plugins/resilience/instances/<instance-id>/
├── meta.json
├── stats.json
├── strategies.json
├── active-retries.json
├── logs/YYYY-MM-DD.jsonl
└── tasks/
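
For scripts that inspect these files directly, the layout above can be turned into concrete paths. This sketch assumes POSIX-style separators and assumes the log file date is taken in UTC; the document does not say which time zone the plugin uses. instancePaths is a hypothetical helper.

```typescript
interface InstancePaths {
  root: string;
  meta: string;
  stats: string;
  strategies: string;
  activeRetries: string;
  log: string;
  tasks: string;
}

// Hypothetical helper: build per-instance paths from the documented layout.
function instancePaths(homeDir: string, instanceId: string, date: Date): InstancePaths {
  const root = `${homeDir}/.openclaw/plugins/resilience/instances/${instanceId}`;
  // YYYY-MM-DD taken from the ISO timestamp, i.e. the UTC calendar date (assumption).
  const day = date.toISOString().slice(0, 10);
  return {
    root,
    meta: `${root}/meta.json`,
    stats: `${root}/stats.json`,
    strategies: `${root}/strategies.json`,
    activeRetries: `${root}/active-retries.json`,
    log: `${root}/logs/${day}.jsonl`,
    tasks: `${root}/tasks`,
  };
}
```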

Safe usage recommendations

Install only if you want a local dashboard for resilience/error monitoring. Avoid opening it during screen sharing or on shared machines, and prefer explicit commands such as opening the resilience dashboard rather than generic monitoring phrases.

Capability assessment

✓ Purpose & Capability

The reported dashboard capability fits the apparent purpose of monitoring operational errors, retries, and recovery state.

ℹ Instruction Scope

Some example phrases appear broad enough to open the dashboard from generic monitoring requests, so users may see the UI invoked unexpectedly.

✓ Install Mechanism

No risky installer, package script, or automatic privileged setup was identified from the supplied telemetry.

ℹ Credentials

Opening a localhost dashboard may display operational metadata such as error logs, model names, retry status, or recovery details; this is purpose-aligned but should be disclosed clearly.

✓ Persistence & Privilege

No evidence of persistence, privilege escalation, credential harvesting, destructive actions, or external exfiltration was provided.

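How to use

Make sure OpenClaw is installed (local or Docker deployment).
Enter the install command in the chat box: /install resilience-monitor
Once installed, invoke the skill by name or trigger it with /resilience-monitor
Provide the inputs described in the skill's parameter documentation to get structured output.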

Version history

v0.3.0

Web dashboard, multi-instance aggregation, gateway_start config fix

Metadata

Slug resilience-monitor

Version 0.3.0

License MIT-0

Total installs 0

Current installs 0

Versions 1

FAQ