
Resilience Monitor

Author: leiJack-lo · v0.3.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 16 · Favorites: 0 · Current installs: 0 · Versions: 1
Install in OpenClaw
/install resilience-monitor
Description
Monitor and manage OpenClaw API errors, track model performance, configure retry strategies, generate reports, and oversee task recovery status.
Usage (SKILL.md)

Resilience Skill

LLM API error tracking, classification, retry, and task recovery for OpenClaw.

Overview

This skill provides visibility into API call health and automated retry management. Use it to:

  • Monitor API error rates and patterns
  • View per-model performance statistics
  • Configure retry strategies
  • Generate error reports
  • Track task recovery status

Tools

resilience_dashboard

Open the live web dashboard in your browser for real-time error stats and retry strategy management.

Parameters:

  • action: "open" (default) | "status" | "stop"

Features:

  • Live error overview (today / hour / active retries)
  • Model breakdown table
  • Recent errors feed
  • Retry strategy cards — set default, adjust max retries
  • Auto-refresh: 5s, 60s, 5min, 1h, or off

URL: http://127.0.0.1:18765/ (default port, configurable via dashboardPort)

Voice / natural language examples:

  • "Open the error statistics page" → resilience_dashboard({ action: "open" })
  • "Open the monitoring panel" → resilience_dashboard({ action: "open" })
  • "Open the resilience panel" → resilience_dashboard({ action: "open" })

The dashboard starts automatically when OpenClaw Gateway starts (unless dashboardEnabled: false).

Configuration lives in ~/.openclaw/openclaw.json under plugins.entries.resilience.config (not only api.pluginConfig at hook time). Example:

"resilience": {
  "enabled": true,
  "config": {
    "dashboardPort": 18765,
    "dashboardEnabled": true,
    "instanceLabel": "my-workspace"
  }
}

At gateway_start, config is read from ctx.config + ctx.workspaceDir.
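
As a minimal sketch of how the settings above might be resolved, the following TypeScript fills in defaults and derives the dashboard URL. The ResilienceConfig type and the resolveDashboardUrl function are illustrative names, not part of the plugin's API. The only values taken from this page are the default port 18765, the 127.0.0.1 host, and the rule that the dashboard runs unless dashboardEnabled is false.

```typescript
// Hypothetical shape of the "config" block shown above (names of the
// fields come from the example; the interface itself is an assumption).
interface ResilienceConfig {
  dashboardPort?: number;
  dashboardEnabled?: boolean;
  instanceLabel?: string;
}

// Default port documented on this page.
const DEFAULT_DASHBOARD_PORT = 18765;

// Returns the local dashboard URL, or null when the dashboard is disabled.
function resolveDashboardUrl(config: ResilienceConfig = {}): string | null {
  if (config.dashboardEnabled === false) {
    return null;
  }
  const port = config.dashboardPort ?? DEFAULT_DASHBOARD_PORT;
  return `http://127.0.0.1:${port}/`;
}
```

Calling resolveDashboardUrl({ dashboardPort: 9000 }) would yield http://127.0.0.1:9000/ under these assumptions.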

Multi-instance: Use the instance dropdown to view all instances (aggregated) or a single Gateway. Each instance stores data under ~/.openclaw/plugins/resilience/instances/<id>/. Strategy edits apply only to the local Gateway instance.

resilience_stats

View API error statistics by time period or model.

Parameters:

  • query (optional): Natural language query
    • "today" or empty — today's full summary
    • "hour" — current hour stats
    • "week" — current week stats
    • Any model name (e.g., "mimo-v2.5") — model-specific stats
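
The query rules above could be interpreted roughly as follows. This is a sketch only: the StatsQuery type and parseStatsQuery function are hypothetical, and trimming and lowercasing the input is an assumption rather than documented behavior.

```typescript
// Hypothetical result type: either a time period or a model filter.
type StatsQuery =
  | { kind: "period"; period: "today" | "hour" | "week" }
  | { kind: "model"; model: string };

function parseStatsQuery(query?: string): StatsQuery {
  const raw = (query ?? "").trim();
  const lower = raw.toLowerCase(); // assumption: keywords are case-insensitive

  // An empty query or "today" means today's full summary.
  if (lower === "" || lower === "today") return { kind: "period", period: "today" };
  if (lower === "hour") return { kind: "period", period: "hour" };
  if (lower === "week") return { kind: "period", period: "week" };

  // Anything else is treated as a model name, e.g. "mimo-v2.5".
  return { kind: "model", model: raw };
}
```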

Examples:

  • "Show today's error statistics" → resilience_stats({ query: "today" })
  • "Show the error rate for mimo-v2.5" → resilience_stats({ query: "mimo-v2.5" })
  • "Show this week's error rate" → resilience_stats({ query: "week" })

resilience_strategies

View, add, update, or reset retry strategies.

Parameters:

  • action: "list" (default) | "add" | "update" | "reset"
  • strategyName: Strategy name (required for add/update)
  • updates: Fields to update (for add/update)
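
To make the parameter rules concrete, here is a small validation sketch. The StrategyArgs type and validateStrategyArgs function are hypothetical helpers written for illustration; only the action values, the "list" default, and the requirement that strategyName accompany add/update come from the parameter list above.

```typescript
type StrategyAction = "list" | "add" | "update" | "reset";

// Hypothetical argument shape mirroring the documented parameters.
interface StrategyArgs {
  action?: StrategyAction;
  strategyName?: string;
  updates?: Record<string, unknown>;
}

// Returns a list of problems; an empty list means the arguments look valid.
function validateStrategyArgs(args: StrategyArgs): string[] {
  const action = args.action ?? "list"; // "list" is the documented default
  const problems: string[] = [];
  if ((action === "add" || action === "update") && !args.strategyName) {
    problems.push(`strategyName is required for action "${action}"`);
  }
  return problems;
}
```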

Examples:

  • "Show all current strategy configurations" → resilience_strategies({ action: "list" })
  • "Change the timeout retry strategy to exponential backoff" → resilience_strategies({ action: "update", strategyName: "default-exponential", updates: { type: "exponential" } })
  • "Add a custom retry strategy" → resilience_strategies({ action: "add", strategyName: "my-strategy", updates: { type: "custom", maxRetries: 3, intervals: [60000, 300000, 600000] } })
  • "Reset strategies to defaults" → resilience_strategies({ action: "reset" })

resilience_report

Generate detailed error reports.

Parameters:

  • reportType: "daily" (default) | "model" | "recovery" | "full"
  • target: Model name or date (YYYY-MM-DD)

Examples:

  • "Generate today's daily error report" → resilience_report({ reportType: "daily" })
  • "Show the detailed report for mimo-v2.5" → resilience_report({ reportType: "model", target: "mimo-v2.5" })
  • "Show task recovery status" → resilience_report({ reportType: "recovery" })
  • "Generate a full status report" → resilience_report({ reportType: "full" })

Error Categories

| Category | Description | Retryable |
| --- | --- | --- |
| rate_limit | 429 Too Many Requests | |
| server_overload | 503 Service Unavailable | |
| timeout | Request timeout | |
| auth_failed | 401/403 Authentication failed | |
| network_error | Connection errors | |
| model_unavailable | Model not found or offline | |
| context_too_long | Context length exceeded | |
| unknown | Unclassified errors | |
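
The status-code rows of the table can be sketched as a simple lookup. This is an illustrative classifier, not the plugin's implementation: classifyByStatus is a hypothetical name, and it covers only the categories that the table ties to an HTTP status. Categories such as timeout or network_error would need signals that this page does not describe.

```typescript
type ErrorCategory =
  | "rate_limit"
  | "server_overload"
  | "timeout"
  | "auth_failed"
  | "network_error"
  | "model_unavailable"
  | "context_too_long"
  | "unknown";

// Maps an HTTP status code to a category using only the codes listed above.
function classifyByStatus(status: number): ErrorCategory {
  switch (status) {
    case 429:
      return "rate_limit";
    case 503:
      return "server_overload";
    case 401:
    case 403:
      return "auth_failed";
    default:
      // Everything else is left unclassified in this sketch.
      return "unknown";
  }
}
```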

Retry Strategies

Strategy Types

  • fixed: Fixed interval between retries (e.g., every 30s)
  • exponential: Exponential backoff (1min → 2min → 4min → 8min...)
  • custom: User-defined interval schedule (e.g., [1min, 3min, 5min, 15min])
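
The three strategy types can be illustrated with a short delay calculator. This is a sketch under stated assumptions: the field names intervalMs, baseMs, and maxMs are hypothetical, the optional cap on exponential growth is an assumption, and reusing the last interval once a custom schedule runs out is a design choice made here, not documented behavior. Intervals are in milliseconds, matching the intervals array in the resilience_strategies example.

```typescript
type RetryStrategy =
  | { type: "fixed"; intervalMs: number }
  | { type: "exponential"; baseMs: number; maxMs?: number }
  | { type: "custom"; intervals: number[] };

// Returns the wait before a retry. `attempt` is 1-based: 1 is the first retry.
function retryDelayMs(strategy: RetryStrategy, attempt: number): number {
  switch (strategy.type) {
    case "fixed":
      // Same interval every time, e.g. every 30s.
      return strategy.intervalMs;
    case "exponential": {
      // Doubles each attempt: base, 2x base, 4x base, 8x base, ...
      const delay = strategy.baseMs * Math.pow(2, attempt - 1);
      return strategy.maxMs === undefined ? delay : Math.min(delay, strategy.maxMs);
    }
    case "custom": {
      // Assumption: past the end of the schedule, reuse the last interval.
      const index = Math.min(attempt, strategy.intervals.length) - 1;
      return strategy.intervals[index] ?? 0; // empty schedule: no delay
    }
  }
}
```

With baseMs set to 60000, attempts 1 through 4 give 1 min, 2 min, 4 min, and 8 min, matching the exponential example in the list above.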

Default Strategies

| Name | Type | Max Retries | Intervals | Error Types |
| --- | --- | --- | --- | --- |
| default-exponential | exponential | 5 | 1m→15m | rate_limit, server_overload, timeout, network_error |
| rate-limit-fixed | fixed | 3 | 30s | rate_limit |
| model-backoff | custom | 6 | 1m→2h | server_overload, model_unavailable |
| network-retry | exponential | 4 | 5s→1m | network_error |
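
The defaults above can be encoded as data to see which strategies cover a given error category. This is only an illustration: the StrategySummary type and strategiesFor function are hypothetical, and how the plugin chooses between several matching strategies is not described on this page, so the sketch simply lists every match.

```typescript
interface StrategySummary {
  name: string;
  type: "fixed" | "exponential" | "custom";
  maxRetries: number;
  errorTypes: string[];
}

// Values copied from the Default Strategies table.
const DEFAULT_STRATEGIES: StrategySummary[] = [
  {
    name: "default-exponential",
    type: "exponential",
    maxRetries: 5,
    errorTypes: ["rate_limit", "server_overload", "timeout", "network_error"],
  },
  { name: "rate-limit-fixed", type: "fixed", maxRetries: 3, errorTypes: ["rate_limit"] },
  {
    name: "model-backoff",
    type: "custom",
    maxRetries: 6,
    errorTypes: ["server_overload", "model_unavailable"],
  },
  { name: "network-retry", type: "exponential", maxRetries: 4, errorTypes: ["network_error"] },
];

// Returns the names of all default strategies that list the given category.
function strategiesFor(category: string): string[] {
  return DEFAULT_STRATEGIES
    .filter((s) => s.errorTypes.indexOf(category) !== -1)
    .map((s) => s.name);
}
```

For example, strategiesFor("rate_limit") returns both default-exponential and rate-limit-fixed, while a category with no default strategy, such as auth_failed, returns an empty list.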

Data Storage

Per-instance data: ~/.openclaw/plugins/resilience/instances/<instance-id>/ (stats, logs, strategies, tasks). Legacy root layout is still read as default.

~/.openclaw/plugins/resilience/instances/<instance-id>/
├── meta.json
├── stats.json
├── strategies.json
├── active-retries.json
├── logs/YYYY-MM-DD.jsonl
└── tasks/
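
Following the layout above, a daily log path could be built like this. This is a sketch: dailyLogPath is a hypothetical helper, the home directory is passed in explicitly to keep the example self-contained, and deriving the date in UTC is an assumption, since this page does not say which time zone the plugin uses for file names.

```typescript
// Builds the per-day JSONL log path for one instance.
function dailyLogPath(homeDir: string, instanceId: string, date: Date): string {
  // toISOString() is always UTC, so the slice gives YYYY-MM-DD in UTC (assumption).
  const day = date.toISOString().slice(0, 10);
  return `${homeDir}/.openclaw/plugins/resilience/instances/${instanceId}/logs/${day}.jsonl`;
}
```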
Security Recommendations
Install only if you want a local dashboard for resilience/error monitoring. Avoid opening it during screen sharing or on shared machines, and prefer explicit commands such as opening the resilience dashboard rather than generic monitoring phrases.
Capability Assessment
Purpose & Capability
The reported dashboard capability fits the apparent purpose of monitoring operational errors, retries, and recovery state.
Instruction Scope
Some example phrases appear broad enough to open the dashboard from generic monitoring requests, so users may see the UI invoked unexpectedly.
Install Mechanism
No risky installer, package script, or automatic privileged setup was identified from the supplied telemetry.
Credentials
Opening a localhost dashboard may display operational metadata such as error logs, model names, retry status, or recovery details; this is purpose-aligned but should be disclosed clearly.
Persistence & Privilege
No evidence of persistence, privilege escalation, credential harvesting, destructive actions, or external exfiltration was provided.
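How to Use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install resilience-monitor
  3. Once installation finishes, invoke the Skill by name or trigger it with /resilience-monitor
  4. Provide the inputs described in the Skill's parameter documentation to get structured output.
Version History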
v0.3.0
Web dashboard, multi-instance aggregation, gateway_start config fix
Metadata
Slug resilience-monitor
Version 0.3.0
License MIT-0
Total installs 0
Current installs 0
Version count 1
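Frequently Asked Questions

What is Resilience Monitor?

Resilience Monitor monitors and manages OpenClaw API errors, tracks model performance, configures retry strategies, generates reports, and oversees task recovery status. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 16 total downloads to date.

How do I install Resilience Monitor?

Run the command /install resilience-monitor in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is required.

Is Resilience Monitor free?

Yes. Resilience Monitor is free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does Resilience Monitor support?

Resilience Monitor is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Resilience Monitor?

It is developed and maintained by leiJack-lo (@leijack-lo); the current version is v0.3.0.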
