/install dyagil-services-watchdog
Services Watchdog
Problem
Long-running Node services launched from a parent shell (or as children of an agent runtime) die when the parent exits. Runtime restarts are especially aggressive — they tend to take down everything they spawned as collateral damage. Manual nohup/setsid rituals survive an ssh disconnect but not a reboot.
Architecture
my-watchdog.timer (systemd --user; OnUnitActiveSec=2min)
↓
my-watchdog.service (Type=oneshot; KillMode=process)
↓
services-watchdog.sh
↓
for each service: check → if down → systemd-run --user --scope → exec node
Two non-obvious details make this actually work:
- KillMode=process + systemd-run --user --scope: without this, systemd kills the children of a Type=oneshot service as soon as the service exits. The combination puts each restarted service in its own transient scope, outside the watchdog's cgroup.
- .env is loaded INSIDE the new scope. The watchdog wraps the start command in bash -c 'cd <project> && set -a && . ./.env; set +a && exec node <entry>'. This propagates every env var without the watchdog having to know which ones the service needs (TELEGRAM_BOT_TOKEN, OPENAI_API_KEY, …).
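The .env loading step can be tried on its own, outside systemd. The sketch below is a minimal illustration, not the shipped script: run_with_dotenv is a hypothetical helper name, and it assumes the .env file uses plain shell-compatible KEY=value lines.

```shell
# Hypothetical helper: run a command with every variable from <dir>/.env
# exported into its environment. Assumes .env is valid shell syntax.
run_with_dotenv() {
    project_dir=$1
    shift
    (
        cd "$project_dir" || exit 1
        set -a                          # auto-export everything assigned from here on
        if [ -f .env ]; then
            . ./.env                    # a missing .env is not treated as an error
        fi
        set +a                          # stop auto-exporting
        exec "$@"                       # replace the subshell with the real command
    )
}
```

For example, run_with_dotenv "$WORKSPACE/projects/myservice" node src/index.js starts node with the project's variables set, while the calling shell's own environment stays untouched because the work happens in a subshell.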
Files
- scripts/services-watchdog.sh: the script. Customize the per-service check_*/restart_* blocks.
- scripts/sahi-watchdog.service: systemd unit template.
- scripts/sahi-watchdog.timer: runs every 2 minutes.
Rename the unit files to match your own prefix (e.g. mybot-watchdog.*) when adopting.
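The unit files themselves are not reproduced on this page, so the following is only a guess at their minimal shape, built from the settings named in the Architecture section (Type=oneshot, KillMode=process, OnUnitActiveSec=2min). The write_watchdog_units helper, the Description lines, the OnStartupSec=1min first-run trigger, and the ExecStart path are all assumptions; compare against the shipped templates before relying on it.

```shell
# Hypothetical helper: write a service/timer pair under the given prefix.
# Usage: write_watchdog_units <unit-dir> [prefix]
write_watchdog_units() {
    unit_dir=$1
    prefix=${2:-mybot-watchdog}
    mkdir -p "$unit_dir" || return 1

    # One watchdog pass per activation; KillMode=process as described above.
    cat > "$unit_dir/$prefix.service" <<'EOF'
[Unit]
Description=Services watchdog (single pass)

[Service]
Type=oneshot
KillMode=process
# Assumed install location; %h expands to the user's home directory.
ExecStart=%h/.openclaw/workspace/scripts/services-watchdog.sh
EOF

    # A timer activates the service with the same base name by default.
    cat > "$unit_dir/$prefix.timer" <<'EOF'
[Unit]
Description=Run the services watchdog every 2 minutes

[Timer]
# Assumption: a first run shortly after the user manager starts.
OnStartupSec=1min
OnUnitActiveSec=2min

[Install]
WantedBy=timers.target
EOF
}
```

Calling write_watchdog_units ~/.config/systemd/user mybot-watchdog and then systemctl --user daemon-reload would register the pair under the new prefix.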
Install
WORKSPACE="$HOME/.openclaw/workspace" # or wherever your projects live
mkdir -p "$WORKSPACE/scripts" "$WORKSPACE/logs" ~/.config/systemd/user
cp scripts/services-watchdog.sh "$WORKSPACE/scripts/"
cp scripts/sahi-watchdog.service ~/.config/systemd/user/
cp scripts/sahi-watchdog.timer ~/.config/systemd/user/
chmod +x "$WORKSPACE/scripts/services-watchdog.sh"
systemctl --user daemon-reload
systemctl --user enable --now sahi-watchdog.timer
loginctl enable-linger "$USER" # keeps the timer running when not logged in
Verify
# State after most recent run:
cat ~/.openclaw/workspace/memory/watchdog-state.json
# Recent recoveries / failures:
tail ~/.openclaw/workspace/logs/watchdog.log
# Schedule:
systemctl --user list-timers sahi-watchdog.timer --no-pager
End-to-end test (replace 4321 with the port your service listens on):
PID=$(ss -tlnp 2>/dev/null | awk '/:4321 /{print $NF}' | grep -oP 'pid=\K[0-9]+' | head -1)
kill "$PID"
systemctl --user start sahi-watchdog.service # don't wait 2 min
ss -tln | grep 4321 # should be listening again
(Do NOT use pkill -f "myservice/server.js" to kill the test target — your own exec shell often matches the same regex and gets SIGTERM'd.)
Adapt to a New Service
In services-watchdog.sh, add three things and append the service name to the services=() array:
check_myservice() {
    pgrep -f "<unique-marker-in-cmdline>" >/dev/null 2>&1
}
restart_myservice() {
    cd "$WORKSPACE/projects/myservice" || return 1
    systemd-run --user --scope --quiet --unit="myservice-$(date +%s%N)" \
        --setenv=PATH="$PATH" --setenv=HOME="$HOME" \
        bash -c 'cd '"$WORKSPACE"'/projects/myservice && set -a && [ -f .env ] && . ./.env; set +a && exec nohup node src/index.js >> logs/svc.log 2>&1 < /dev/null' &
    disown 2>/dev/null || true
    sleep 3
    check_myservice
}
labels_myservice="My Service"
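The loop that consumes these three pieces is not shown here. A minimal sketch of how one pass might tie check_*, restart_* and labels_* together follows; run_watchdog_pass, log_line and the WATCHDOG_LOG variable are hypothetical names, and the real script may log and record state differently.

```shell
# Hypothetical logger: append to $WATCHDOG_LOG if it is set, else write to stderr.
log_line() {
    if [ -n "${WATCHDOG_LOG:-}" ]; then
        printf '%s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$*" >> "$WATCHDOG_LOG"
    else
        printf '%s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$*" >&2
    fi
}

# One watchdog pass over the service names given as arguments.
# Returns the number of services that could not be brought back up.
run_watchdog_pass() {
    local failures=0 svc label
    for svc in "$@"; do
        # Resolve labels_<svc>, falling back to the bare service name.
        # eval is acceptable only because the names come from our own list.
        eval "label=\${labels_${svc}:-\$svc}"
        if "check_${svc}"; then
            continue                    # healthy, nothing to do
        fi
        if "restart_${svc}"; then
            log_line "$label was down; restarted"
        else
            log_line "$label is down; restart FAILED"
            failures=$((failures + 1))
        fi
    done
    return "$failures"
}
```

With the array from the script, a pass would be invoked as run_watchdog_pass "${services[@]}". Healthy services produce no log line, so the log only records recoveries and failures.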
Gotchas (Learned the Hard Way)
- Don't use Type=simple for the systemd service: that keeps the watchdog itself alive long after it should have exited, and it re-enters every 2 minutes.
- PATH inside systemd-run --user --scope is minimal. Always pass --setenv=PATH="$PATH" if a child relies on ~/.npm-global/bin or similar; or call binaries by absolute path.
- pgrep -f matches the watchdog shell itself. Use a unique marker (file path) when defining check_*, e.g. pgrep -f "myservice/src/index", not just pgrep -f "node src/index.js" which can collide with other projects.
- Type=oneshot with default KillMode=control-group kills the children you just spawned. Always set KillMode=process AND launch via systemd-run --user --scope so the new process lives outside the watchdog's cgroup.
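A common complementary trick for the pgrep -f self-match, not something this skill is confirmed to use, is to wrap one character of the marker in a bracket expression. The regex still matches the service's command line, but it no longer matches a shell whose own command line contains the pattern text. The sketch below assumes a hypothetical marker of myservice/src/index; MYSERVICE_PATTERN is an illustrative variable name.

```shell
# "[m]yservice/src/index" matches the text "myservice/src/index",
# but not the literal text "[m]yservice/src/index" itself.
MYSERVICE_PATTERN='[m]yservice/src/index'

check_myservice() {
    pgrep -f "$MYSERVICE_PATTERN" >/dev/null 2>&1
}
```

The bracketed form has to be written out by hand wherever it is used: building it from the plain marker at run time would put the plain marker back into the invoking command line. The same pattern style applies to pkill -f.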
See Also
- A taskflow or cron skill for one-shot scheduled tasks. The watchdog is for "always-on" services, not periodic jobs.
What is Services Watchdog?
Set up a systemd-based watchdog that keeps long-running Node.js services (Telegram bots, Express dashboards, etc.) alive across shell exits, ssh disconnects,... It is an AI Agent Skill for Claude Code / OpenClaw, with 100 downloads so far.
How do I install Services Watchdog?
Run "/install dyagil-services-watchdog" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Services Watchdog free?
Yes, Services Watchdog is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Services Watchdog support?
Services Watchdog is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created Services Watchdog?
It is built and maintained by dyagil (@dyagil); the current version is v1.0.0.