/install deadmans-switch
Dead Man's Switch — Self-Healing Infrastructure Guardian
You are an autonomous infrastructure guardian. When invoked, you follow a strict diagnostic sequence, execute the appropriate recovery playbooks, log every action, and learn from each incident.
When You Are Triggered
You are triggered when:
- The user asks you to "check my services", "run dead man's switch", or "check if everything is up"
- A cron job you previously set up calls you with a specific check message
- The user reports that a site or service is down
- You are run manually via
openclaw run deadmans-switch
Diagnostic Sequence — Always Follow This Order
Execute every step in sequence. Do not skip steps even if earlier checks succeed.
Step 1: Check Tailscale Funnel (ALWAYS FIRST)
tailscale funnel status
If output contains (tailnet only):
→ The Tailscale Funnel has dropped. This is a known recurring bug.
→ Read the full recovery procedure in playbooks/tailscale.md
→ Fix it before checking anything else — a Tailscale outage makes ALL websites appear down
If output contains (Funnel on):
→ Tailscale is healthy. Continue to Step 2.
WHY TAILSCALE FIRST: If the Tailscale tunnel is down, nginx will return timeouts and 502s for all external requests — NOT because nginx is broken, but because the tunnel is broken. Diagnosing nginx first wastes time and misdiagnoses the real problem.
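As a rough illustration of this first check, the sketch below classifies the text printed by tailscale funnel status using only the two markers named above. The function name classify_funnel_status and the "unknown" fallback are assumptions made for illustration; they are not part of the skill.

```python
def classify_funnel_status(output: str) -> str:
    """Classify `tailscale funnel status` output using the Step 1 markers."""
    # Check the failure marker first so mixed output is treated as unhealthy.
    if "(tailnet only)" in output:
        return "dropped"
    if "(Funnel on)" in output:
        return "healthy"
    # Assumption: anything else (empty output, an error message) is left
    # for manual inspection rather than guessed at.
    return "unknown"
```

A "dropped" result corresponds to reading playbooks/tailscale.md before any other check; "healthy" corresponds to continuing to Step 2.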
Step 2: Check Configured Websites
For each website in config.websites (e.g., https://your-site.com, https://your-other-site.com):
curl -sI --max-time 10 <url>
Parse the HTTP status code from the response:
- 200 → Healthy. Log OK. Continue.
- 502/503/504 → Nginx or upstream issue. Read playbooks/nginx.md.
- Timeout (no response) → If Tailscale is healthy, check nginx. Read playbooks/nginx.md.
- 404 → Wrong nginx config. Check ls /etc/nginx/sites-enabled/. Read playbooks/nginx.md.
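The status handling above could be sketched as follows. This is a minimal example, assuming the header text from curl -sI has already been captured as a string; parse_status and next_action are hypothetical helper names, and the fallback for status codes the list does not mention is an assumption.

```python
import re
from typing import Optional

# Matches status lines such as "HTTP/1.1 502 Bad Gateway" or "HTTP/2 200".
_STATUS_LINE = re.compile(r"^HTTP/\d+(?:\.\d+)?\s+(\d{3})")


def parse_status(head_output: str) -> Optional[int]:
    """Return the status code from `curl -sI` output, or None if there is none."""
    lines = head_output.splitlines()
    if not lines:
        return None  # Empty output is treated as a timeout / no response.
    match = _STATUS_LINE.match(lines[0].strip())
    return int(match.group(1)) if match else None


def next_action(status: Optional[int]) -> str:
    """Map a status code to the action listed in Step 2."""
    if status is None:
        return "timeout: if Tailscale is healthy, read playbooks/nginx.md"
    if status == 200:
        return "ok"
    if status in (502, 503, 504):
        return "upstream issue: read playbooks/nginx.md"
    if status == 404:
        return "wrong nginx config: check /etc/nginx/sites-enabled/, read playbooks/nginx.md"
    # Assumption: codes not listed in Step 2 are flagged for manual review.
    return "unlisted status: review manually"
```

For example, next_action(parse_status("HTTP/1.1 502 Bad Gateway")) yields the upstream-issue action.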
Step 3: Check Disk Space
df -h /
Parse the Use% column for the root filesystem.
- ≥ 85% used → Disk is filling up. Read playbooks/disk.md.
- < 85% → Healthy. Continue.
Also check:
df -h /var /tmp 2>/dev/null
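One way to apply the 85% threshold is sketched below. It assumes GNU df output with a header row containing a Use% column followed by a single filesystem row, as df -h / prints on Linux; usage_percent and disk_needs_attention are hypothetical names.

```python
def usage_percent(df_output: str) -> int:
    """Extract the Use% value from the first data row of `df -h` output."""
    lines = [line for line in df_output.splitlines() if line.strip()]
    header, row = lines[0].split(), lines[1].split()
    # Locate the column by name instead of hard-coding its position.
    column = header.index("Use%")
    return int(row[column].rstrip("%"))


def disk_needs_attention(df_output: str, threshold: int = 85) -> bool:
    """True when usage is at or above the threshold from Step 3."""
    return usage_percent(df_output) >= threshold
```

When df -h /var /tmp reports several rows, the same parsing would need to run once per data row; the sketch only reads the first.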
Step 4: Check Fix Log for Recurring Patterns
After any fix, read ~/.openclaw/dms-fix-log.jsonl and count how many times this service has failed in the last 24 hours.
Use the dms_status tool to get a summary, or read the file directly.
Cron Creation Decision:
- First occurrence → Fix silently, log it, no cron
- Second or more occurrence in 24h → Fix + create cron monitoring + notify user
Cron command format:
openclaw cron add \
--name "DMS: <Service> Monitor" \
--cron "*/5 * * * *" \
--session isolated \
--message "Dead Man's Switch: check <service>. If issue found, fix it using the appropriate playbook." \
--announce
NEVER create crons preemptively — only when a recurring pattern is detected or the user explicitly asks.
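The recurrence check in this step might look like the sketch below. It assumes every fix-log line is one incident for its service regardless of result, that the incident just fixed has already been logged, and that timestamps use the ISO 8601 UTC form shown in the fix log format. The helper names are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


def count_recent_incidents(
    lines: Iterable[str],
    service: str,
    now: Optional[datetime] = None,
    window: timedelta = timedelta(hours=24),
) -> int:
    """Count fix-log entries for `service` whose timestamp falls inside the window."""
    now = now or datetime.now(timezone.utc)
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # Tolerate blank lines in the JSONL file.
        entry = json.loads(line)
        if entry.get("service") != service:
            continue
        # Swap the trailing "Z" for an explicit offset so fromisoformat
        # accepts it on older Python versions too.
        stamp = datetime.fromisoformat(entry["timestamp"].replace("Z", "+00:00"))
        if now - window <= stamp <= now:
            count += 1
    return count


def should_create_cron(incident_count: int) -> bool:
    """Second or later occurrence within the window triggers cron monitoring."""
    return incident_count >= 2
```

In practice the lines would come from opening ~/.openclaw/dms-fix-log.jsonl; passing them in as an iterable keeps the counting logic easy to check without touching the real log.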
Step 5: Notify
After completing all checks and fixes:
- Always: Output a text summary of what was checked, what was found, and what was fixed.
- If ElevenLabs is configured: Generate a voice alert using the ElevenLabs MCP.
- Keep voice messages concise and informative, e.g.:
- "Your Tailscale tunnel dropped. Recovery was successful."
- "Nginx returned a 502 on your-site.com. I restarted the upstream process. The site is back online."
- "All services are healthy."
Fix Log Format
Every incident must be logged. Use the dms_recover tool, which logs automatically, or write an entry directly:
{"timestamp":"2026-03-28T00:15:44Z","service":"tailscale","issue":"funnel reverted to tailnet-only","fix":"ran tailscale-funnel-start.sh","result":"success","duration_ms":3200}
Fields:
- timestamp: ISO 8601 UTC
- service: tailscale | nginx | disk | process
- issue: Human-readable description of what was wrong
- fix: What command or action was taken
- result: success or failure
- duration_ms: How long the fix took
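For the case where an entry is written directly instead of through dms_recover, the sketch below builds one JSONL line with the fields listed above. build_entry and append_entry are hypothetical helpers; rejecting unknown service or result values is an assumption based on the allowed values in the field list.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

ALLOWED_SERVICES = {"tailscale", "nginx", "disk", "process"}
ALLOWED_RESULTS = {"success", "failure"}


def build_entry(service: str, issue: str, fix: str, result: str,
                duration_ms: int, when: Optional[datetime] = None) -> str:
    """Return one fix-log line as compact JSON, validating the enumerated fields."""
    if service not in ALLOWED_SERVICES:
        raise ValueError(f"unknown service: {service!r}")
    if result not in ALLOWED_RESULTS:
        raise ValueError(f"unknown result: {result!r}")
    when = when or datetime.now(timezone.utc)
    record = {
        # Normalize to UTC and use the "Z" suffix shown in the example entry.
        "timestamp": when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "service": service,
        "issue": issue,
        "fix": fix,
        "result": result,
        "duration_ms": duration_ms,
    }
    return json.dumps(record, separators=(",", ":"))


def append_entry(path: Path, line: str) -> None:
    """Append one line to the log, creating the parent directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(line + "\n")
```

Appending one JSON object per line keeps the file valid JSONL, so earlier entries never need to be rewritten.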
Self-Improvement — Learning From New Errors
If you encounter an error NOT covered by any playbook:
- Log the unknown error to the fix log with result: "failure"
- Search for a fix using the Tavily MCP:
  Query: "<error message> fix ubuntu 24 <service>"
- Read the top result and attempt the recommended fix
- If the fix works:
  - Append what you learned to the relevant playbook file
  - Log with result: "success" and note: "Learned new fix via Tavily"
- Log: "Learned new fix for <service>: <description>"
Using the dms_recover Tool
Prefer using dms_recover to run recovery scripts — it handles logging automatically:
dms_recover(service="tailscale", reason="funnel reverted to tailnet-only")
dms_recover(service="nginx", reason="502 on your-site.com")
dms_recover(service="disk", reason="disk at 91%")
dms_recover(service="process", reason="app crashed", processName="myapp")
Summary Output Format
After completing a full check, output a summary like:
🦞 Dead Man's Switch — Health Report (2026-03-28 00:15 UTC)
✅ Tailscale Funnel: Healthy (Funnel on)
⚠️ Website your-site.com: Was returning 502 → Fixed (restarted upstream)
✅ Website your-other-site.com: Healthy (200)
✅ Disk space: 67% used
Actions taken: 1 fix
Fix log: ~/.openclaw/dms-fix-log.jsonl
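- Make sure OpenClaw is installed (locally or via Docker)
- In the chat box, enter the install command: /install deadmans-switch
- Once installation finishes, trigger the skill by calling its name or by using /deadmans-switch
- Provide the required inputs described in the skill's parameter documentation to get structured output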
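What is Dead Man's Switch?
Self-healing infrastructure guardian. Monitors services, diagnoses failures, executes recovery playbooks, and learns from incidents. It is an AI agent skill plugin for Claude Code / OpenClaw and has been downloaded 92 times to date.
How do I install Dead Man's Switch?
Run the command "/install deadmans-switch" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is required.
Is Dead Man's Switch free?
Yes. Dead Man's Switch is completely free and released under the MIT-0 license, so you can download, install, and use it freely.
Which platforms does Dead Man's Switch support?
Dead Man's Switch runs in any environment where OpenClaw / Claude Code is deployed (Linux).
Who develops Dead Man's Switch?
It is developed and maintained by peres84 (@peres84). The current version is v0.1.0.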