ember-claw

Failover Gateway Pub

by ember-claw · GitHub ↗ · v1.0.0
cross-platform ⚠ suspicious
749 Downloads · 0 Stars · 1 Active Installs · 1 Versions
Install in OpenClaw
/install failover-gateway-pub
Description
Set up an active-passive OpenClaw failover gateway with health monitoring, auto-promotion/demotion, channel splitting, and git workspace sync for seamless re...
README (SKILL.md)

Failover Gateway for OpenClaw

Deploy a standby OpenClaw gateway that automatically takes over when your primary goes down. Active-passive design with auto-promotion and auto-demotion.

What You Get

  • ~30 second detection — health monitor sees the primary down after three failed checks and promotes the standby; you're reachable again in about 60 seconds
  • Auto-recovery — when primary comes back, standby demotes itself
  • Zero split-brain — primary and standby use different channels (no duplicate messages)
  • Git-synced workspace — standby pulls latest workspace on promotion
  • $6-12/month — runs on a minimal VPS

Architecture

PRIMARY (your main VPS)          STANDBY (failover VPS)
├─ Full stack (all channels)     ├─ Single channel only (e.g., Discord DM)
├─ All cron jobs                 ├─ No crons (recovery mode)
├─ Gateway active ✅              ├─ Gateway stopped 💤
└─ Pushes workspace to git       └─ Health monitor watches primary
                                      │
                                      ├─ Primary healthy → sleep
                                      ├─ Primary down 30s → PROMOTE
                                      └─ Primary back → DEMOTE

The key insight: split your channels between primary and standby. Don't share credentials — give each node exclusive ownership of different channels. This eliminates split-brain entirely.

Channel Split Examples

Setup                Primary               Standby
RC + Discord         Rocket.Chat (full)    Discord DM only
Discord + Telegram   Discord (full)        Telegram DM only
Slack + Discord      Slack (full)          Discord DM only

Your primary handles everything. The standby is minimal recovery — just enough to stay reachable.

Prerequisites

  • Primary OpenClaw instance running on a VPS
  • A second VPS for the standby ($6-12/mo, any provider)
  • Tailscale mesh network (or any VPN/private network)
  • Git repository for workspace sync (GitHub, GitLab, etc.)
  • A second messaging channel for the standby (different from primary)

Step-by-Step Deployment

Phase 1: Provision the Standby VPS

Any cheap VPS works. Recommended: 2GB RAM, Ubuntu 24.04.

# Harden the box
ufw allow 22/tcp
ufw enable
apt install -y fail2ban unattended-upgrades

# Create openclaw user
adduser openclaw --disabled-password
usermod -aG sudo openclaw
# Copy your SSH key to openclaw user

# Install Tailscale
curl -fsSL https://tailscale.com/install.sh | sh
tailscale up --hostname=your-failover-name
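
If you would rather read an installer before running it than pipe it straight into a shell, one option is to download it to disk first. The sketch below is not part of the skill; `fetch_for_review` is a hypothetical helper name, and it assumes `curl` and `sha256sum` are installed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: save an installer to disk for review instead of piping it to sh.
fetch_for_review() {
  local url=$1 dest=$2
  # -f: fail on HTTP errors, -sS: quiet but still report errors, -L: follow redirects
  curl -fsSL "$url" -o "$dest" || return 1
  # Print a checksum so the copy you reviewed can be matched to the copy you run
  sha256sum "$dest"
}

# Example (run manually):
#   fetch_for_review https://tailscale.com/install.sh /tmp/tailscale-install.sh
#   less /tmp/tailscale-install.sh     # read it first
#   sh /tmp/tailscale-install.sh       # run only after review
```

The same pattern applies to the nvm installer in Phase 2.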

Phase 2: Install OpenClaw

# As openclaw user
curl -fsSL https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.3/install.sh | bash
source ~/.bashrc
nvm install --lts
npm install -g openclaw

# Clone workspace
git clone <your-workspace-repo> ~/.openclaw/workspace
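
Before moving on, it can help to confirm that the tools installed above are actually on the PATH of the openclaw user. A minimal sketch, not part of the skill; `require_cmds` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which of the given commands are missing from PATH.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  # 0 when everything was found, 1 when at least one command is missing
  return "$missing"
}

# Example (run as the openclaw user):
#   require_cmds node npm git openclaw && echo "toolchain looks ready"
```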

Phase 3: Failover Config

Create a minimal OpenClaw config on the standby. Only enable the standby channel:

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-opus-4-6",
        "fallbacks": ["anthropic/claude-sonnet-4-5"]
      },
      "workspace": "/home/openclaw/.openclaw/workspace"
    },
    "list": [{ "id": "main", "default": true }]
  },
  "channels": {
    "discord": {
      "enabled": true,
      "token": "<YOUR_DISCORD_BOT_TOKEN>",
      "dm": {
        "policy": "allowlist",
        "allowFrom": ["<YOUR_DISCORD_USER_ID>"]
      }
    }
  },
  "gateway": {
    "port": 18789,
    "mode": "local",
    "bind": "tailnet"
  }
}

Important: Disable this channel on your primary to avoid conflicts.

Test it works: openclaw gateway run — verify the bot connects and responds, then stop it.
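
A common slip is starting the gateway with a placeholder still in the config. The sketch below scans a file for leftover `<YOUR_...>` tokens of the kind used in the example above. It is an illustration only: `check_placeholders` is a hypothetical helper, and the config path is passed in because this guide does not state where the file lives.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fail if the config still contains <YOUR_...> placeholders.
check_placeholders() {
  local file=$1
  # grep prints each offending line with its line number and exits 0 on a match
  if grep -nE '<YOUR_[A-Z_]+>' "$file"; then
    echo "unfilled placeholders in $file" >&2
    return 1
  fi
  return 0
}

# Example:
#   check_placeholders /path/to/your/standby-config.json && openclaw gateway run
```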

Phase 4: Deploy Health Monitor

Copy the included scripts/health-monitor.sh to the standby:

sudo cp health-monitor.sh /usr/local/bin/openclaw-health-monitor.sh
sudo chmod +x /usr/local/bin/openclaw-health-monitor.sh

Edit the variables at the top:

  • PRIMARY_IP — your primary's Tailscale IP
  • PRIMARY_PORT — your primary's gateway port (default: 18789)
  • SECRETS_HOST — (optional) host to rsync secrets from on promotion
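
To sanity-check the values you set for PRIMARY_IP and PRIMARY_PORT, you can probe the primary by hand from the standby. The sketch below only tests whether a TCP connection opens; that is an assumption about what "healthy" means, and the shipped health-monitor.sh may check something different. `primary_reachable` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical probe: return 0 if a TCP connection to host:port opens in time.
primary_reachable() {
  local host=$1 port=$2 timeout_s=${3:-3}
  # bash opens a TCP socket when redirecting to /dev/tcp/<host>/<port>;
  # timeout(1) bounds the attempt so a silently dropped packet cannot hang the check.
  timeout "$timeout_s" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example:
#   primary_reachable "$PRIMARY_IP" "${PRIMARY_PORT:-18789}" || echo "primary unreachable"
```

A refused or timed-out connection yields a non-zero status, which is what a monitor loop would count as a failed check.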

Create the systemd services:

/etc/systemd/system/openclaw-health-monitor.service

[Unit]
Description=OpenClaw Failover Health Monitor
After=network-online.target tailscaled.service
Wants=network-online.target

[Service]
Type=simple
ExecStart=/usr/local/bin/openclaw-health-monitor.sh
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

/etc/systemd/system/openclaw.service

[Unit]
Description=OpenClaw Gateway (Failover)
After=network-online.target tailscaled.service
Wants=network-online.target

[Service]
Type=simple
User=openclaw
Group=openclaw
WorkingDirectory=/home/openclaw/.openclaw/workspace
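# Adjust this path to the output of `command -v openclaw` run as the openclaw user.
# With the nvm install from Phase 2 the binary lives under ~/.nvm/versions/node/, not /usr/bin,
# and node usually needs to be on the service's PATH as well.
ExecStart=/usr/bin/openclaw gateway run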
Restart=on-failure
RestartSec=5
Environment=HOME=/home/openclaw
Environment=NODE_ENV=production

[Install]
WantedBy=multi-user.target

Enable the monitor (but NOT the gateway — the monitor starts it on promotion):

sudo systemctl daemon-reload
sudo systemctl enable openclaw-health-monitor
sudo systemctl start openclaw-health-monitor
# Do NOT enable openclaw.service — the monitor controls it
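
To confirm the units ended up in the intended state (monitor enabled, gateway not), you can compare what `systemctl is-enabled` reports against what you expect. A small sketch; `unit_enabled_state_is` is a hypothetical helper, not something the skill ships.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare a unit's enablement state with the expected one.
unit_enabled_state_is() {
  local unit=$1 want=$2 got
  # is-enabled prints the state (enabled, disabled, ...) and exits non-zero
  # for anything that is not enabled, so the exit status is ignored here.
  got=$(systemctl is-enabled "$unit" 2>/dev/null) || true
  if [ "$got" != "$want" ]; then
    echo "$unit: expected '$want', got '${got:-unknown}'" >&2
    return 1
  fi
}

# Example:
#   unit_enabled_state_is openclaw-health-monitor.service enabled
#   unit_enabled_state_is openclaw.service disabled
```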

Phase 5: Disable Standby Channel on Primary

This is critical. Remove or disable the standby's channel from your primary config:

{
  "channels": {
    "discord": { "enabled": false }
  }
}

Each node owns its channels exclusively. No sharing, no conflicts.
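
The exclusivity rule is easy to check mechanically once you write down which channels each node has enabled. The sketch below takes two space-separated lists that you supply by hand; it does not read OpenClaw configs, and `assert_exclusive_channels` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical check: succeed only if no channel appears in both lists.
assert_exclusive_channels() {
  local primary=$1 standby=$2 ch shared=0
  for ch in $standby; do
    # Pad with spaces so "discord" does not match inside "discord-dm"
    case " $primary " in
      *" $ch "*) echo "shared channel: $ch"; shared=1 ;;
    esac
  done
  return "$shared"
}

# Example:
#   assert_exclusive_channels "rocketchat" "discord" && echo "no overlap"
```

If the primary still listed discord, the function would print `shared channel: discord` and return 1.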

Phase 6: Test

# On primary — simulate failure
sudo systemctl stop openclaw-gateway  # or kill the process

# Watch the standby logs
journalctl -u openclaw-health-monitor -f

# Expected: 3 failed checks → PROMOTE → gateway starts → standby channel live

# On primary — recover
sudo systemctl start openclaw-gateway

# Expected: standby detects primary → DEMOTE → gateway stops

Failover Timeline

Time   Event
0s     Primary goes down
10s    First health check fails
20s    Second check fails
30s    Third check fails → PROMOTE
35s    Git pull, secrets sync
40s    Gateway starting
45s    Standby channel active
~60s   You're reachable again
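
The promote/demote decisions in this timeline can be read as a small state machine driven by a consecutive-failure counter. The sketch below is an illustration of that logic under stated assumptions, not the contents of the shipped health-monitor.sh: the function name, the role labels, and the default threshold of 3 (taken from the three failed checks above) are all hypothetical.

```bash
#!/usr/bin/env bash
# Assumed default: promote after 3 consecutive failed checks, as in the timeline.
FAIL_THRESHOLD=${FAIL_THRESHOLD:-3}

# Hypothetical decision function.
#   role:  standby | active      (what this node is doing now)
#   fails: consecutive failed checks so far
#   check: ok | fail             (result of the latest probe of the primary)
# Prints "<action> <new_fail_count>", where action is none, promote, or demote.
next_action() {
  local role=$1 fails=$2 check=$3
  if [ "$check" = ok ]; then
    # Primary is healthy: reset the counter, and step down if we had taken over
    if [ "$role" = active ]; then echo "demote 0"; else echo "none 0"; fi
    return
  fi
  fails=$((fails + 1))
  if [ "$role" = standby ] && [ "$fails" -ge "$FAIL_THRESHOLD" ]; then
    echo "promote $fails"
  else
    echo "none $fails"
  fi
}
```

Tracing the timeline: `next_action standby 0 fail` prints `none 1`, `next_action standby 1 fail` prints `none 2`, and `next_action standby 2 fail` prints `promote 3`. Once the primary answers again, `next_action active 3 ok` prints `demote 0`.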

Edge Cases

Scenario                Result
Primary dies            Standby promotes in ~30-60s
Primary + standby die   You're offline (add a third node?)
Network partition       Standby may promote while primary is still running; since they use different channels, no conflicts
Standby reboots         Health monitor auto-restarts (systemd), resumes watching
Primary flaps           Promote/demote cycles; the health monitor handles them, but consider increasing FAIL_THRESHOLD
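
Raising FAIL_THRESHOLD trades flap resistance for slower promotion. As a rough guide, the delay before promotion is at most the threshold multiplied by the check interval. The 10-second interval below is inferred from the failover timeline, and `promotion_delay_seconds` is a hypothetical helper used only to show the arithmetic.

```bash
#!/usr/bin/env bash
# Hypothetical helper: upper bound on seconds from primary failure to PROMOTE.
promotion_delay_seconds() {
  local threshold=$1 interval=$2
  echo $((threshold * interval))
}

promotion_delay_seconds 3 10   # prints 30
promotion_delay_seconds 5 10   # prints 50
```

Going from a threshold of 3 to 5 rides out outages of up to about 40 seconds without promoting, at the cost of roughly 20 extra seconds before the standby takes over in a real failure.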

Failback

Recovery is automatic. When the primary comes back:

  1. Health monitor detects primary healthy
  2. Stops the standby gateway
  3. Primary resumes all channels
  4. Standby returns to watching

No manual intervention needed.

Cost

Component       Cost
VPS (2GB RAM)   $6-12/mo
Tailscale       Free (personal)
Git repo        Free
Total           $6-12/mo

Tips

  • Test monthly. Kill your primary, verify failover works. Trust but verify.
  • Keep the standby minimal. No crons, no extra channels. It's recovery mode.
  • Git push frequently. The standby's workspace is only as fresh as your last push.
  • Use Tailscale. It makes cross-VPS networking trivial. No firewall rules, no port forwarding.
  • Different bot tokens. If using Discord on both, you need two bot applications. Same bot token = last-connect-wins.
  • Monitor the monitor. Check journalctl -u openclaw-health-monitor occasionally to make sure it's running.
Usage Guidance
This reads like an admin playbook rather than malicious code, but take precautions before following it:

  • Do not blindly run curl | sh commands; inspect the installation scripts (Tailscale, nvm) before executing.
  • Before enabling the health monitor, edit the health-monitor.sh variables: set PRIMARY_IP to your primary's IP and clear any default addresses. Leaving the default IP will cause the monitor to probe and react to a third-party host.
  • Treat SECRETS_HOST as highly sensitive: the monitor can rsync secrets from that host. Only set it to a trusted, secured host and ensure SSH keys and rsync access are limited. If you don't need remote sync, leave SECRETS_HOST empty.
  • Ensure the standby's OpenClaw channel tokens are separate from primary tokens (the guide recommends this); never copy primary channel tokens to the standby.
  • Review the included health-monitor.sh and systemd unit files to confirm the commands run as the intended user and that file permissions are correct.
  • Test this entire flow in an isolated staging environment before deploying to production to verify promotion/demotion behavior and avoid accidental failovers.

Given the hardcoded default IP and the optional secret-sync step, treat this skill as potentially risky until you manually audit and adapt it to your environment.
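
One way to make the PRIMARY_IP precaution hard to forget is a guard that refuses to proceed while the address is empty or still the shipped default. This is a sketch, not part of the skill: `primary_ip_is_customized` is a hypothetical helper, and the default address is the one reported in the Capability Assessment on this page.

```bash
#!/usr/bin/env bash
# Default PRIMARY_IP reported in this page's Capability Assessment.
SHIPPED_DEFAULT_IP="100.99.118.75"

# Hypothetical guard: succeed only if the IP is set and differs from the default.
primary_ip_is_customized() {
  local ip=$1
  [ -n "$ip" ] && [ "$ip" != "$SHIPPED_DEFAULT_IP" ]
}

# Example (before enabling the monitor):
#   primary_ip_is_customized "$PRIMARY_IP" || { echo "set PRIMARY_IP first" >&2; exit 1; }
```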
Capability Analysis
Type: OpenClaw Skill
Name: failover-gateway-pub
Version: 1.0.0

The skill is classified as suspicious for two main reasons. First, the `SKILL.md` instructions use `curl | sh` for installing Tailscale and NVM, which introduces a supply chain vulnerability where a compromised source could lead to arbitrary code execution. Second, the `scripts/health-monitor.sh` script includes functionality to `rsync` the `~/.secrets/` directory from a user-defined `SECRETS_HOST` during a failover event. While intended for legitimate failover, this is a high-risk operation involving the transfer of sensitive data, posing a significant data exposure vulnerability if the `SECRETS_HOST` is compromised or misconfigured.
Capability Assessment
Purpose & Capability
The name/description (failover gateway) aligns with the included files and steps: a health monitor, systemd units, git sync, and promotion/demotion commands. However the health-monitor.sh contains a hardcoded default PRIMARY_IP (100.99.118.75) which is unexpected for a generic skill and could cause accidental monitoring/promotion if left unchanged.
Instruction Scope
SKILL.md instructs the operator to run remote install scripts (curl | sh for Tailscale and nvm) and to copy SSH keys and tokens; the health monitor can call git pull as another user and optionally rsync secrets from SECRETS_HOST. These actions go beyond simple orchestration and can move or pull secrets and run code from remote endpoints — acceptable for an admin guide but high-impact if done without review.
Install Mechanism
There is no automated install spec (instruction-only), which reduces automated attack surface. Still, the instructions ask you to run curl | sh against remote URLs (tailscale install and nvm install). That is a manual action but is a known risk pattern and should be audited before executing.
Credentials
The skill metadata declares no required env vars, but the guide expects channel tokens and secrets in configs and the monitor script can rsync secrets from a remote SECRETS_HOST. The ability to transfer secrets is built in but not surfaced in metadata, so sensitive credentials are in-scope for the deployment even though the registry shows none — this mismatch is misleading and risky.
Persistence & Privilege
The deployment creates persistent systemd units and places a script in /usr/local/bin: normal for a failover agent, but the service will have permission to start/stop the OpenClaw gateway and perform git/rsync operations. No 'always' privilege is requested, but the system-level persistence and sudo/systemctl operations are powerful and should be intentionally authorized.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install failover-gateway-pub
  3. After installation, invoke the skill by name or use /failover-gateway-pub
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release: deploy active-passive failover gateway for OpenClaw.

  • Provides automated failover with standby node auto-promotion/demotion when primary changes state.
  • Includes health monitor script, systemd services, and a channel splitting strategy to avoid split-brain.
  • Workspace kept in sync via Git on promotion.
  • Full deployment guide covers prerequisites, system hardening, OpenClaw configuration, and testing.
  • Ensures high availability and redundancy for OpenClaw, with typical failover in ~30-60 seconds.
Metadata
Slug failover-gateway-pub
Version 1.0.0
License
All-time Installs 1
Active Installs 1
Total Versions 1
Frequently Asked Questions

What is Failover Gateway Pub?

Set up an active-passive OpenClaw failover gateway with health monitoring, auto-promotion/demotion, channel splitting, and git workspace sync for seamless re... It is an AI Agent Skill for Claude Code / OpenClaw, with 749 downloads so far.

How do I install Failover Gateway Pub?

Run "/install failover-gateway-pub" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Failover Gateway Pub free?

Yes, Failover Gateway Pub is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Failover Gateway Pub support?

Failover Gateway Pub is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Failover Gateway Pub?

It is built and maintained by ember-claw (@ember-claw); the current version is v1.0.0.
