/install dyagil-services-watchdog
Services Watchdog
Problem
Long-running Node services launched from a parent shell (or as children of an agent runtime) die when the parent exits. Runtime restarts are especially aggressive — they tend to take down everything they spawned as collateral damage. Manual nohup/setsid rituals survive an ssh disconnect but not a reboot.
Architecture
my-watchdog.timer (systemd --user; OnUnitActiveSec=2min)
↓
my-watchdog.service (Type=oneshot; KillMode=process)
↓
services-watchdog.sh
↓
for each service: check → if down → systemd-run --user --scope → exec node
Two non-obvious details make this actually work:
- KillMode=process + systemd-run --user --scope. Without this, systemd kills the children of a Type=oneshot service as soon as the service exits. The combination puts each restarted service in its own transient scope, outside the watchdog's cgroup.
- .env is loaded INSIDE the new scope. The watchdog wraps the start command in bash -c 'cd <project> && set -a && . ./.env; set +a && exec node <entry>'. This propagates every env var without the watchdog having to know which ones the service needs (TELEGRAM_BOT_TOKEN, OPENAI_API_KEY, …).
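To see why set -a matters in that wrapper, here is a minimal sketch. It assumes bash and uses a throwaway project directory; the temp directory and the DEMO_* variables are hypothetical stand-ins, not part of the skill. The child process is `env` rather than node, so its inherited environment can be inspected directly.

```shell
#!/usr/bin/env bash
# Shows that `set -a` exports everything sourced from .env to the exec'd child.
# The temp project and DEMO_* variables are hypothetical stand-ins.
set -u

project=$(mktemp -d)
printf 'DEMO_TOKEN=abc123\nDEMO_MODE=prod\n' > "$project/.env"

# Same shape as the watchdog wrapper, but exec'ing `env` instead of node so the
# child's inherited environment can be inspected. The dir arrives as $1.
with_set_a=$(bash -c 'cd "$1" || exit 1; set -a; [ -f .env ] && . ./.env; set +a; exec env' _ "$project")

# Same thing without `set -a`: the assignments stay shell-local, so the child never sees them.
without_set_a=$(bash -c 'cd "$1" || exit 1; . ./.env; exec env' _ "$project")

printf 'with set -a:    %s\n' "$(grep -c '^DEMO_' <<< "$with_set_a")"
printf 'without set -a: %s\n' "$(grep -c '^DEMO_' <<< "$without_set_a")"

rm -rf "$project"
```

Running it should report 2 DEMO_ variables with set -a and 0 without, which is why the wrapper never has to list the service's variables by name.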
Files
- scripts/services-watchdog.sh: the script. Customize the per-service check_* / restart_* blocks.
- scripts/sahi-watchdog.service: systemd unit template.
- scripts/sahi-watchdog.timer: runs every 2 minutes.
Rename the unit files to match your own prefix (e.g. mybot-watchdog.*) when adopting.
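The unit files themselves aren't reproduced in this document. Below is a minimal sketch of what the pair could contain, written with heredocs so it can be tried in a scratch directory first. Type=oneshot, KillMode=process and OnUnitActiveSec=2min come from the architecture above. The script path under %h, the OnActiveSec=1min first-run delay and the PREFIX variable are assumptions, so compare against the shipped templates before using it.

```shell
#!/usr/bin/env bash
# Writes sketch unit files for the watchdog. UNIT_DIR defaults to a scratch dir;
# point it at ~/.config/systemd/user only after reviewing the output.
set -u

UNIT_DIR=${UNIT_DIR:-$(mktemp -d)}
PREFIX=${PREFIX:-my}   # hypothetical prefix; rename to match your own

# Quoted heredoc: %h is left for systemd to expand (user's home directory).
cat > "$UNIT_DIR/$PREFIX-watchdog.service" <<'EOF'
[Unit]
Description=Restart long-running services that have died

[Service]
# oneshot: the unit counts as finished once the script exits
Type=oneshot
# Only the main process is killed on stop, so leftover children are not swept up
KillMode=process
# Assumed install location of the script
ExecStart=%h/.openclaw/workspace/scripts/services-watchdog.sh
EOF

# Unquoted heredoc: $PREFIX is expanded by this script.
cat > "$UNIT_DIR/$PREFIX-watchdog.timer" <<EOF
[Unit]
Description=Run $PREFIX-watchdog.service every 2 minutes

[Timer]
# First run 1 min after the timer starts, then 2 min after each service activation
OnActiveSec=1min
OnUnitActiveSec=2min
Unit=$PREFIX-watchdog.service

[Install]
WantedBy=timers.target
EOF

echo "wrote units to $UNIT_DIR"
```

The first-run trigger is there because a timer with only OnUnitActiveSec= has no starting point until the service has run at least once.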
Install
WORKSPACE="$HOME/.openclaw/workspace" # or wherever your projects live
mkdir -p "$WORKSPACE/scripts" "$WORKSPACE/logs" ~/.config/systemd/user
cp scripts/services-watchdog.sh "$WORKSPACE/scripts/"
cp scripts/sahi-watchdog.service ~/.config/systemd/user/
cp scripts/sahi-watchdog.timer ~/.config/systemd/user/
chmod +x "$WORKSPACE/scripts/services-watchdog.sh"
systemctl --user daemon-reload
systemctl --user enable --now sahi-watchdog.timer
loginctl enable-linger "$USER" # keeps the timer running when not logged in
Verify
# State after most recent run:
cat ~/.openclaw/workspace/memory/watchdog-state.json
# Recent recoveries / failures:
tail ~/.openclaw/workspace/logs/watchdog.log
# Schedule:
systemctl --user list-timers sahi-watchdog.timer --no-pager
End-to-end test (replace 4321 with the port your service listens on):
PID=$(ss -tlnp 2>/dev/null | awk '/:4321 /{print $NF}' | grep -oP 'pid=\K[0-9]+' | head -1)
[ -n "$PID" ] && kill "$PID" # skip the kill if nothing was listening
systemctl --user start sahi-watchdog.service # don't wait 2 min
ss -tln | grep ':4321 ' # should be listening again
(Do NOT use pkill -f "myservice/server.js" to kill the test target. The shell that runs that command, e.g. an agent's exec shell, carries the same string in its own command line, so it matches the regex and gets SIGTERM'd too.)
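The test above relies on a one-shot ss | grep after the watchdog run. A small sketch of two helpers that make it a little sturdier follows; pid_on_port and wait_until are hypothetical names, not part of the skill. pid_on_port reads `ss -tlnp` output on stdin, so it can be checked against captured text, and anchors the port at the end of the local-address column so 4321 does not also match 14321.

```shell
#!/usr/bin/env bash
# Helpers for the end-to-end test (names are hypothetical, not part of the skill).
set -u

# pid_on_port PORT: read `ss -tlnp` output on stdin and print the first PID
# whose local address (column 4) ends in :PORT.
pid_on_port() {
  awk -v p=":$1" '$4 ~ (p "$") {print $NF}' \
    | grep -o 'pid=[0-9]*' | head -n1 | cut -d= -f2
}

# wait_until TRIES CMD...: run CMD up to TRIES times, 1 s apart; succeed on first success.
wait_until() {
  local tries=$1 i
  shift
  for ((i = 1; i <= tries; i++)); do
    "$@" && return 0
    (( i < tries )) && sleep 1
  done
  return 1
}

# Offline demo against captured `ss -tlnp` text.
sample='State  Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0      511          0.0.0.0:4321      0.0.0.0:*     users:(("node",pid=1234,fd=19))'
printf '%s\n' "$sample" | pid_on_port 4321   # prints 1234

# On a live host (needs a real listener, so left commented out):
#   PID=$(ss -tlnp 2>/dev/null | pid_on_port 4321)
#   [ -n "$PID" ] && kill "$PID"
#   systemctl --user start sahi-watchdog.service
#   wait_until 10 sh -c 'ss -tln | grep -q ":4321 "'
```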
Adapt to a New Service
In services-watchdog.sh, add three things (a check_<name> function, a restart_<name> function and a labels_<name> variable) and append the service name to the services=() array:
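check_myservice() {
  # Match something unique to this service's command line
  pgrep -f "<unique-marker-in-cmdline>" >/dev/null 2>&1
}
restart_myservice() {
  local dir="$WORKSPACE/projects/myservice"
  cd "$dir" || return 1
  mkdir -p logs   # the redirect below fails if logs/ is missing
  # Own transient scope (outside the watchdog's cgroup); .env is sourced inside it.
  # The project dir is passed as $1 instead of being spliced into the quoted string.
  systemd-run --user --scope --quiet --unit="myservice-$(date +%s%N)" \
    --setenv=PATH="$PATH" --setenv=HOME="$HOME" \
    bash -c 'cd "$1" || exit 1; set -a; [ -f .env ] && . ./.env; set +a; exec nohup node src/index.js >> logs/svc.log 2>&1 < /dev/null' myservice "$dir" &
  disown 2>/dev/null || true
  sleep 3   # give node time to start before re-checking
  check_myservice
}
labels_myservice="My Service"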
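The loop that walks services=() isn't shown here. The following is a minimal sketch of how such a pass might dispatch to check_* / restart_* and use labels_* in the log, assuming bash indirect expansion. The real loop in services-watchdog.sh may differ, and the "demo" service, DEMO_FLAG and LOG path are hypothetical stand-ins so the sketch runs without systemd.

```shell
#!/usr/bin/env bash
# Sketch of one watchdog pass over services=(). The "demo" service, DEMO_FLAG and
# LOG are hypothetical; real entries would call pgrep / systemd-run instead.
set -u

LOG=${LOG:-$(mktemp)}
DEMO_FLAG=$(mktemp -u)          # path that doesn't exist yet = "service down"

check_demo()   { [ -e "$DEMO_FLAG" ]; }
restart_demo() { touch "$DEMO_FLAG"; check_demo; }
labels_demo="Demo Service"

services=(demo)

log() { printf '%s %s\n' "$(date '+%F %T')" "$*" >> "$LOG"; }

run_pass() {
  local svc label_var label
  for svc in "${services[@]}"; do
    label_var="labels_$svc"
    label=${!label_var:-$svc}   # fall back to the raw name if no label is set
    if "check_$svc"; then
      continue                  # healthy: nothing to do
    fi
    if "restart_$svc"; then
      log "RECOVERED $label"
    else
      log "FAILED $label"
    fi
  done
}

run_pass
cat "$LOG"
```

The first pass logs one RECOVERED Demo Service line; calling run_pass again adds nothing, because check_demo now succeeds.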
Gotchas (Learned the Hard Way)
- Don't use Type=simple for the systemd service. That keeps the watchdog itself alive long after it should have exited, and it re-enters every 2 minutes.
- PATH inside systemd-run --user --scope is minimal. Always pass --setenv=PATH="$PATH" if a child relies on ~/.npm-global/bin or similar, or call binaries by absolute path.
- pgrep -f matches the watchdog shell itself. Use a unique marker (file path) when defining check_*, e.g. pgrep -f "myservice/src/index", not just pgrep -f "node src/index.js", which can collide with other projects.
- Type=oneshot with the default KillMode=control-group kills the children you just spawned. Always set KillMode=process AND launch via systemd-run --user --scope so the new process lives outside the watchdog's cgroup.
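For the pgrep -f self-match gotcha, one common workaround (a swapped-in technique, not something the skill itself uses) is to wrap the first character of the marker in a bracket expression. "[m]yservice/src/index" still matches the real process, but any command line that merely contains the pattern text holds "[m]yservice...", which the regex does not match. A minimal sketch, with is_running as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# is_running MARKER: succeed if another process's command line contains MARKER.
# Hypothetical helper; uses the "bracket the first character" regex trick.
set -u

is_running() {
  local marker=$1
  # "myservice/src/index" -> "[m]yservice/src/index": the regex still matches the
  # real process, but text containing the literal pattern ("[m]yservice...") doesn't.
  # Assumes MARKER starts with a plain character (not ^, ], or \).
  local pattern="[${marker:0:1}]${marker:1}"
  pgrep -f "$pattern" >/dev/null 2>&1
}

# Demo: a throwaway sleep with a random duration stands in for a service.
n=$((RANDOM + 100000))
sleep "$n" &
demo_pid=$!
sleep 1                          # let the background child exec `sleep` first
is_running "sleep $n" && echo "running"
kill "$demo_pid"
wait "$demo_pid" 2>/dev/null
is_running "sleep $n" || echo "stopped"
```

The rest of the marker is still a regex, so a "." in a path matches any character. That is usually harmless for file-path markers but worth keeping in mind.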
See Also
- A taskflow or cron skill for one-shot scheduled tasks. The watchdog is for "always-on" services, not periodic jobs.
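- Make sure OpenClaw is installed (local or Docker deployment).
- Enter the install command in the chat box: /install dyagil-services-watchdog
- Once installed, call the skill by name or trigger it with /dyagil-services-watchdog.
- Provide the inputs described in the skill's parameter documentation to get structured output.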
What is Services Watchdog?
Set up a systemd-based watchdog that keeps long-running Node.js services (Telegram bots, Express dashboards, etc.) alive across shell exits, ssh disconnects,... It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 100 times to date.
How do I install Services Watchdog?
Run the command "/install dyagil-services-watchdog" in the OpenClaw or Claude Code chat box to install it in one step, with no extra configuration.
Is Services Watchdog free?
Yes. Services Watchdog is completely free under the MIT-0 license and can be freely downloaded, installed and used.
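Which platforms does Services Watchdog support?
The skill is listed as cross-platform and can be installed wherever OpenClaw / Claude Code is deployed. The watchdog it sets up relies on systemd user units (systemctl --user, systemd-run, loginctl), so the host that runs the services must be Linux with systemd.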
Who develops Services Watchdog?
It is developed and maintained by dyagil (@dyagil); the current version is v1.0.0.