Agent Health Diagnostics
/install agent-health-diagnostics
Scripts available in the Collective Skills repo
Overview
When an OpenClaw agent misbehaves (spamming messages, going dark, burning API credits, or looping on dead channels), this skill provides the diagnostic playbook. It covers the 4 most common failure modes, with exact commands to diagnose and fix each one.
Battle-tested across a 6-agent deployment spanning 3 hosts (Windows + Linux + Proxmox).
When to Use This Skill
Use when you observe any of these symptoms:
- Agent sending repeated heartbeat/status messages to Telegram/Discord/etc.
- Agent goes silent despite gateway showing "active"
- Logs show "429 Too many tokens" or "rate_limit" errors
- Channel connection loops: "auto-restart attempt 1/10", "2/10", etc.
- Memory search errors: "input length exceeds context length"
- Gateway says "active" but agent doesn't respond to messages
The 4 Failure Modes
1. Heartbeat Spam
Symptom: Agent sends repeated messages every N minutes.
Root cause: Heartbeat interval too low (10m = 144 messages/day) plus a verbose prompt that always generates output instead of HEARTBEAT_OK.
Quick fix:
# Check interval
grep -A5 heartbeat ~/.openclaw/openclaw.json
# Fix: set to 30m minimum, simplify prompt to checklist + HEARTBEAT_OK default
# Then restart gateway
openclaw gateway restart
Prevention: Never set heartbeat below 20 minutes. Heartbeat prompts should CHECK things, not CREATE things.
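The 144 messages/day figure is just the minutes in a day divided by the interval. A minimal sketch of that arithmetic follows; the function name `heartbeats_per_day` is hypothetical and not part of OpenClaw, and the result is a worst case that assumes every heartbeat produces a message rather than HEARTBEAT_OK.

```shell
# Hypothetical helper (not an OpenClaw command): heartbeats fired per day
# for a given interval in minutes. Worst case, each one becomes a message.
heartbeats_per_day() {
  echo $(( 24 * 60 / $1 ))
}

heartbeats_per_day 10   # 144
heartbeats_per_day 30   # 48
```

Moving from 10 to 30 minutes cuts the worst-case message count to a third, before any prompt simplification.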
2. API Rate Limit Cascade
Symptom: All models fail, agent goes dark.
Root cause: Heartbeat + N crons = (N+1) API calls per interval. This exceeds the provider's TPM (tokens per minute) limit, so all fallbacks are exhausted simultaneously.
Quick fix:
# Check for rate limits
journalctl -u <service> --since '1h ago' | grep '429\|rate_limit'
# Count your crons (each burns tokens)
openclaw cron list
# Fix: reduce heartbeat to 30-60m, disable non-essential crons, stagger schedules
Prevention: Calculate token budget before adding crons. Each run ≈ 2K-10K tokens. Route heartbeats to cheap/local models.
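A rough sketch of the budget calculation, using the formula heartbeat frequency × (crons + 1) × average tokens. The function names are hypothetical. The example values (5 crons, 5,000 tokens per run) are illustrative picks within the 2K-10K range, not measurements. Like the formula itself, the sketch assumes every cron runs once per heartbeat interval.

```shell
# Hypothetical helpers (not OpenClaw commands).

# Tokens consumed when the heartbeat and every cron fire in the same window.
# $1 = number of crons, $2 = average tokens per run
tokens_per_burst() {
  echo $(( ($1 + 1) * $2 ))
}

# Daily burn rate: heartbeat frequency x (crons + 1) x average tokens.
# $1 = heartbeat interval in minutes, $2 = number of crons, $3 = average tokens per run
tokens_per_day() {
  echo $(( (24 * 60 / $1) * ($2 + 1) * $3 ))
}

tokens_per_burst 5 5000     # 30000
tokens_per_day 10 5 5000    # 4320000
tokens_per_day 30 5 5000    # 1440000
```

Compare the burst figure against the provider's TPM limit and the daily figure against your budget. Staggering schedules lowers the burst without changing the daily total.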
3. Channel Death Loop
Symptom: Logs show repeated auto-restart attempt N/10 for IRC/Discord/etc.
Root cause: Target server unreachable → health monitor restarts → fails again → loop. Each restart may trigger model calls, burning API tokens.
Quick fix:
# Check for loops
journalctl -u <service> --since '1h ago' | grep 'auto-restart\|timed out'
# Test connectivity
nc -zv <target-ip> <target-port> -w 5
# Fix: disable the broken channel in openclaw.json
# channels.<name>.enabled = false
openclaw gateway restart
Prevention: Test connectivity BEFORE enabling channels. Disable channels you can't reach.
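To gauge how bad a loop is, it can help to count the restart attempts rather than eyeball them. The sketch below is one way to do that, assuming the log lines contain the phrase "auto-restart attempt" as quoted above. The function name is hypothetical, and the sample lines are made up for the demo rather than real OpenClaw output.

```shell
# Hypothetical helper: count lines on stdin that mention an auto-restart attempt.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the status clean.
count_restart_attempts() {
  grep -c 'auto-restart attempt' || true
}

# Demo with made-up sample lines (only the quoted phrase comes from the text):
printf 'auto-restart attempt 1/10\nconnected\nauto-restart attempt 2/10\n' \
  | count_restart_attempts   # prints 2

# Real use, with your own service name substituted:
#   journalctl -u <service> --since '1h ago' | count_restart_attempts
```

A count that keeps climbing between checks suggests the channel is still looping and should be disabled.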
4. Memory/Embedding Overflow
Symptom: memory sync failed or input length exceeds context length errors.
Root cause: File too large for embedding model's context window (mxbai-embed-large = 8K tokens).
Quick fix: Archive old sections of large files (MEMORY.md → memory/archive/). Keep active files under 8K tokens.
Prevention: Don't let MEMORY.md grow unbounded. Archive quarterly.
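A quick way to check whether a file is near the limit is to estimate its token count from its size. The sketch below assumes roughly 4 characters per token. That ratio is a common rule of thumb for English text and does not come from this skill, so treat the result as approximate. The function names and the 8000 default (the 8K figure above) are illustrative.

```shell
# Hypothetical helpers. The 4-characters-per-token ratio is an assumed heuristic.

# Rough token estimate for a file: byte count divided by 4.
estimate_tokens() {
  echo $(( $(wc -c < "$1") / 4 ))
}

# Succeeds when the estimate exceeds the limit ($2, default 8000).
needs_archiving() {
  [ "$(estimate_tokens "$1")" -gt "${2:-8000}" ]
}

# Usage:
#   needs_archiving MEMORY.md && echo "MEMORY.md is over budget, archive old sections"
```

Because the estimate is coarse, it is safer to archive well before the estimate reaches the limit.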
Remote Diagnostic Quick Reference
| What | Command |
|---|---|
| Service status | systemctl is-active <service> |
| Recent logs | journalctl -u <service> --since '1h ago' --no-pager \| tail -40 |
| Live tail | journalctl -u <service> -f |
| Rate limits | journalctl -u <service> --since '1h ago' \| grep '429' |
| Cron list | openclaw cron list |
| Port test | nc -zv <ip> <port> -w 5 |
| Config backup | cp ~/.openclaw/openclaw.json ~/.openclaw/openclaw.json.bak |
Golden Rules
- Always back up config before editing: cp openclaw.json openclaw.json.bak
- Always restart gateway after config changes. Hot reload doesn't catch everything.
- Check logs before guessing. journalctl tells you what's wrong 90% of the time.
- Calculate your API budget. Heartbeat freq × (crons + 1) × avg tokens = burn rate.
- Disable what you can't reach. Dead channels create loops that waste resources.
- "Configured" ≠ "working." Verify with actual output after every change.
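The first and last rules can be combined into one step: back up the config, then confirm the copy matches before editing. The sketch below uses a hypothetical function name and only `cp` and `cmp`.

```shell
# Hypothetical helper: copy a config file to <file>.bak and verify the copy
# is byte-identical. Fails if the source is missing or the copy differs.
backup_config() {
  cp -p "$1" "$1.bak" && cmp -s "$1" "$1.bak"
}

# Usage:
#   backup_config ~/.openclaw/openclaw.json && echo "backup verified"
```

The function returns non-zero on any failure, so it can gate an edit script with `&&`.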
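Installation
- Make sure OpenClaw is installed (local or Docker deployment).
- Enter the install command in the chat box: /install agent-health-diagnostics
- Once installed, call the Skill by name or trigger it with /agent-health-diagnostics
- Provide the inputs described in the Skill's parameter notes to get structured output.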
What is Agent Health Diagnostics?
Diagnose and fix the 4 most common OpenClaw agent failures: heartbeat spam, API rate limit cascades, channel death loops, and memory/embedding errors. Battl... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 111 downloads to date.
How do I install Agent Health Diagnostics?
Run the command "/install agent-health-diagnostics" in the OpenClaw or Claude Code chat box for a one-step install; no extra configuration is needed.
Is Agent Health Diagnostics free?
Yes. Agent Health Diagnostics is completely free and released under the MIT-0 license, so you can download, install, and use it freely.
Which platforms does Agent Health Diagnostics support?
Agent Health Diagnostics is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who develops Agent Health Diagnostics?
It is developed and maintained by agenthyjack (@agenthyjack). The current version is v1.0.1.