Agent Health Diagnostics

by agenthyjack · GitHub ↗ · v1.0.1 · MIT-0
cross-platform ✓ Security Clean
111
Downloads
0
Stars
0
Active Installs
2
Versions
Install in OpenClaw
/install agent-health-diagnostics
Description
Diagnose and fix the 4 most common OpenClaw agent failures — heartbeat spam, API rate limit cascades, channel death loops, and memory/embedding errors. Battl...
README (SKILL.md)

Agent Health Diagnostics

Scripts available in the Collective Skills repo

Overview

When an OpenClaw agent misbehaves — spamming messages, going dark, burning API credits, or looping on dead channels — this skill provides the diagnostic playbook. Covers the 4 most common failure modes with exact commands to diagnose and fix each one.

Battle-tested across a 6-agent deployment spanning 3 hosts (Windows + Linux + Proxmox).

When to Use This Skill

Use when you observe any of these symptoms:

  • Agent sending repeated heartbeat/status messages to Telegram/Discord/etc.
  • Agent goes silent despite gateway showing "active"
  • Logs show 429 Too many tokens or rate_limit errors
  • Channel connection loops: auto-restart attempt 1/10, 2/10, etc.
  • Memory search errors: input length exceeds context length
  • Gateway says "active" but agent doesn't respond to messages

The 4 Failure Modes

1. Heartbeat Spam

Symptom: Agent sends repeated messages every N minutes.
Root cause: Heartbeat interval too low (10m = 144 messages/day) + verbose prompt that always generates output instead of HEARTBEAT_OK.
Quick fix:

# Check interval
grep -A5 heartbeat ~/.openclaw/openclaw.json

# Fix: set to 30m minimum, simplify prompt to checklist + HEARTBEAT_OK default
# Then restart gateway
openclaw gateway restart

Prevention: Never set heartbeat below 20 minutes. Heartbeat prompts should CHECK things, not CREATE things.
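
The 144 messages/day figure is the 1,440 minutes in a day divided by the interval. A minimal sketch for checking other intervals follows; `heartbeats_per_day` is an illustrative name, not part of the OpenClaw CLI, and it assumes every heartbeat produces a message (the worst case described above).

```shell
#!/bin/sh
# heartbeats_per_day: hypothetical helper, not an OpenClaw command.
# Takes a heartbeat interval in minutes and prints how many heartbeats
# fire in 24 hours (integer division).
heartbeats_per_day() {
    interval_min=$1
    echo $(( 1440 / interval_min ))
}

heartbeats_per_day 10   # 144
heartbeats_per_day 30   # 48
heartbeats_per_day 60   # 24
```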

2. API Rate Limit Cascade

Symptom: All models fail, agent goes dark.
Root cause: Heartbeat + N crons = (N+1) API calls per interval. Exceeds provider TPM limit → all fallbacks exhausted simultaneously.
Quick fix:

# Check for rate limits
journalctl -u <service> --since '1h ago' | grep '429\|rate_limit'

# Count your crons (each burns tokens)
openclaw cron list

# Fix: reduce heartbeat to 30-60m, disable non-essential crons, stagger schedules

Prevention: Calculate token budget before adding crons. Each run ≈ 2K-10K tokens. Route heartbeats to cheap/local models.
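
As a rough worked example of that budget check, the sketch below estimates the tokens consumed when the heartbeat and every cron fire in the same minute, then compares the result with a TPM limit. The function names and the 40000 limit are assumptions for illustration only; substitute the limit your provider documents and your own per-run token average.

```shell
#!/bin/sh
# burst_tokens: hypothetical helper. Estimates tokens consumed when the
# heartbeat and every cron fire in the same minute:
# (crons + 1) * tokens per run.
burst_tokens() {
    crons=$1
    tokens_per_run=$2
    echo $(( (crons + 1) * tokens_per_run ))
}

# check_budget: compares the burst estimate against a TPM limit.
# Arguments: <crons> <tokens_per_run> <tpm_limit>
check_budget() {
    burst=$(burst_tokens "$1" "$2")
    tpm_limit=$3
    if [ "$burst" -gt "$tpm_limit" ]; then
        echo "over: $burst > $tpm_limit"
    else
        echo "ok: $burst <= $tpm_limit"
    fi
}

# 40000 is a made-up limit, not a real provider value.
check_budget 5 10000 40000   # over: 60000 > 40000
check_budget 2 5000 40000    # ok: 15000 <= 40000
```

Staggering schedules lowers the number of calls that land in the same minute, which is why it helps even when the daily total stays the same.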

3. Channel Death Loop

Symptom: Logs show repeated auto-restart attempt N/10 for IRC/Discord/etc.
Root cause: Target server unreachable → health monitor restarts → fails again → loop. Each restart may trigger model calls, burning API tokens.
Quick fix:

# Check for loops
journalctl -u <service> --since '1h ago' | grep 'auto-restart\|timed out'

# Test connectivity
nc -zv -w 5 <target-ip> <target-port>

# Fix: disable the broken channel in openclaw.json
# channels.<name>.enabled = false
openclaw gateway restart

Prevention: Test connectivity BEFORE enabling channels. Disable channels you can't reach.
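
To judge how severe a loop is, it can help to count the restart attempts rather than eyeball them. The sketch below is a hypothetical helper that reads log text on stdin; the sample lines only imitate the "auto-restart attempt N/10" wording quoted above, and real log lines may look different.

```shell
#!/bin/sh
# count_restart_attempts: hypothetical helper. Reads log text on stdin and
# prints the number of lines containing "auto-restart attempt".
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# function's exit status clean while still printing 0.
count_restart_attempts() {
    grep -c 'auto-restart attempt' || true
}

# Inline sample lines (format assumed from the symptom description).
printf '%s\n' \
    'irc: auto-restart attempt 1/10' \
    'gateway: heartbeat ok' \
    'irc: auto-restart attempt 2/10' | count_restart_attempts   # 2
```

Against a live host, the same function can be fed from the log command used above: `journalctl -u <service> --since '1h ago' --no-pager | count_restart_attempts`.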

4. Memory/Embedding Overflow

Symptom: memory sync failed or input length exceeds context length errors.
Root cause: File too large for embedding model's context window (mxbai-embed-large = 8K tokens).
Quick fix: Archive old sections of large files (MEMORY.md → memory/archive/). Keep active files under 8K tokens.
Prevention: Don't let MEMORY.md grow unbounded. Archive quarterly.
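
A quick way to spot files that are getting close to the limit is to estimate their token count from their size. The sketch below assumes roughly 4 bytes per token, a common rule of thumb for English text and not the embedding model's actual tokenizer, so treat the result as a rough early warning. Both function names are hypothetical, and the default limit of 8000 simply mirrors the figure quoted above.

```shell
#!/bin/sh
# estimate_tokens: hypothetical helper. Approximates a file's token count
# as bytes / 4 (rule of thumb, not a real tokenizer).
estimate_tokens() {
    echo $(( $(wc -c < "$1") / 4 ))
}

# needs_archiving: prints "archive" if the estimate exceeds the limit
# (second argument, default 8000), otherwise prints "ok".
needs_archiving() {
    limit=${2:-8000}
    if [ "$(estimate_tokens "$1")" -gt "$limit" ]; then
        echo "archive"
    else
        echo "ok"
    fi
}
```

For example, `needs_archiving MEMORY.md` prints `archive` once the file grows past roughly 32,000 bytes under this estimate.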

Remote Diagnostic Quick Reference

What            Command
Service status  systemctl is-active <service>
Recent logs     journalctl -u <service> --since '1h ago' --no-pager | tail -40
Live tail       journalctl -u <service> -f
Rate limits     journalctl -u <service> --since '1h ago' | grep '429'
Cron list       openclaw cron list
Port test       nc -zv -w 5 <ip> <port>
Config backup   cp ~/.openclaw/openclaw.json ~/.openclaw/openclaw.json.bak

Golden Rules

  1. Always back up config before editing. cp openclaw.json openclaw.json.bak
  2. Always restart gateway after config changes. Hot reload doesn't catch everything.
  3. Check logs before guessing. journalctl tells you what's wrong 90% of the time.
  4. Calculate your API budget. Heartbeat freq × (crons + 1) × avg tokens = burn rate.
  5. Disable what you can't reach. Dead channels create loops that waste resources.
  6. "Configured" ≠ "working." Verify with actual output after every change.
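
Rule 1 is easier to follow when each backup gets its own name, so a second edit does not overwrite the first backup. The sketch below is one possible approach; `backup_config` is a hypothetical helper and the timestamped naming scheme is an assumption, not something OpenClaw prescribes.

```shell
#!/bin/sh
# backup_config: hypothetical helper. Copies a file to
# <file>.<UTC timestamp>.bak and prints the backup path.
# Returns non-zero (and writes to stderr) if the source file is missing.
backup_config() {
    src=$1
    [ -f "$src" ] || { echo "missing: $src" >&2; return 1; }
    dest="$src.$(date -u +%Y%m%dT%H%M%SZ).bak"
    cp -p "$src" "$dest" && echo "$dest"
}
```

Typical use before an edit would be `backup_config ~/.openclaw/openclaw.json`, followed by the config change and `openclaw gateway restart`.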
Usage Guidance
This skill is a coherent, instruction-only troubleshooting playbook for OpenClaw agents. Before using it: (1) review and understand each shell command (journalctl/systemctl/nc) because they operate on the host and may require sudo; (2) back up your actual config file (use the full ~/.openclaw/openclaw.json path to avoid mistakes) and, if possible, test changes on a non-production agent first; (3) prefer interactive/manual runs rather than granting autonomous execution to the agent unless you trust it fully; (4) if you need more confidence, inspect the linked Collective Skills repository or run the listed commands yourself to validate effects. There are no signs of secret exfiltration or unexpected external endpoints in the skill.
Capability Assessment
Purpose & Capability
Name/description match the instructions: the SKILL.md contains direct diagnostic commands (journalctl, systemctl, openclaw CLI, nc, editing ~/.openclaw/openclaw.json) that are exactly what an on-host OpenClaw operator would need. No unrelated credentials, external endpoints, or unrelated binaries are requested.
Instruction Scope
Instructions stay within the scope of diagnosing and fixing agent issues: they read the OpenClaw config, check service logs, test network connectivity, and advise editing config and restarting gateways. These actions require host-level access (journalctl/systemctl), which is appropriate for an ops troubleshooting skill. Minor inconsistency: some example backup commands use a full path (~/.openclaw/openclaw.json) while a Golden Rule example uses a relative path (`cp openclaw.json openclaw.json.bak`) — this is a small usability note, not a security mismatch.
Install Mechanism
There is no install spec and no code files to write to disk. This instruction-only format minimizes install risk; nothing is downloaded or installed.
Credentials
No environment variables, credentials, or config paths outside the agent's own config (~/.openclaw/openclaw.json) are requested. The commands reference system logs and service control which are expected for this purpose and do not request unrelated secrets.
Persistence & Privilege
Skill is not always-enabled and does not request persistent presence or elevated platform privileges. It does instruct operators to edit agent config and restart services (expected for diagnostics). Note: running the commands requires appropriate host permissions (may need sudo), so grant execution carefully.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install agent-health-diagnostics
  3. After installation, invoke the skill by name or use /agent-health-diagnostics
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.1
  • Added reference link to Collective Skills repo scripts at the top of the documentation.
  • No changes made to code or version number; documentation update only.
v1.0.0
agent-health-diagnostics 1.0.0, initial release
  • Provides detailed diagnostics and exact fixes for the 4 most common OpenClaw agent failures: heartbeat spam, API rate limit cascades, channel death loops, and memory/embedding errors.
  • Includes battle-tested troubleshooting steps and commands, validated across multi-host and multi-platform deployments.
  • Offers clear prevention guidelines and operational best practices for ongoing agent health.
  • Features a quick reference table for remote diagnostics and recovery commands.
Metadata
Slug agent-health-diagnostics
Version 1.0.1
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 2
Frequently Asked Questions

What is Agent Health Diagnostics?

Agent Health Diagnostics is an AI agent skill for Claude Code / OpenClaw that diagnoses and fixes the 4 most common OpenClaw agent failures: heartbeat spam, API rate limit cascades, channel death loops, and memory/embedding errors. It has 111 downloads so far.

How do I install Agent Health Diagnostics?

Run "/install agent-health-diagnostics" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Agent Health Diagnostics free?

Yes, Agent Health Diagnostics is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Agent Health Diagnostics support?

Agent Health Diagnostics is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Agent Health Diagnostics?

It is built and maintained by agenthyjack (@agenthyjack); the current version is v1.0.1.