Chapter 39

Production Deployment: AWS ARM64 + systemd + Tailscale Reference Architecture

Overview

Deploying OpenClaw to a production environment means it needs to run stably around the clock, have automatic recovery capabilities, and maintain data and access security. This chapter provides a complete reference architecture: AWS t4g.xlarge ARM64 Graviton instances as the compute platform, systemd managing service lifecycle, Tailscale enabling zero-public-exposure secure access, and a complete backup and update strategy.


39.1 Why AWS t4g.xlarge ARM64 Graviton?

The Case for ARM64 Graviton

AWS Graviton processors are AWS's in-house ARM64 chips; t4g instances run on Graviton2, while c7g instances run on Graviton3. In the OpenClaw context, they offer three core advantages:

Price-Performance Ratio

Equivalent configuration comparison (4 vCPU / 16GB RAM):
  t4g.xlarge (ARM64):    $0.1344/hour  → ~$98/month
  t3.xlarge  (x86_64):   $0.1664/hour  → ~$121/month
  Cost savings: approximately 19%
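
The monthly figures follow from multiplying each hourly rate by roughly 730 hours, the length of an average month. A minimal sketch of that arithmetic, using only the two rates quoted above; the function names are illustrative and the 730-hour month is an assumption:

```shell
# Monthly cost in USD for an hourly on-demand rate, assuming a 730-hour month
monthly_cost() {
  awk -v rate="$1" 'BEGIN { printf "%.2f\n", rate * 730 }'
}

# Percentage saved by the ARM64 rate ($1) relative to the x86_64 rate ($2)
savings_pct() {
  awk -v arm="$1" -v x86="$2" 'BEGIN { printf "%.1f\n", (x86 - arm) / x86 * 100 }'
}

monthly_cost 0.1344          # t4g.xlarge
monthly_cost 0.1664          # t3.xlarge
savings_pct 0.1344 0.1664    # relative savings
```

Rates vary by region and change over time, so substitute the current prices for your region before relying on the result.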

Node.js Performance

Node.js publishes official ARM64 builds, and the V8 engine's JIT compiler emits native ARM64 code, so it runs well on Graviton. OpenClaw Gateway's core workloads (JSON serialization/deserialization, WebSocket frame processing, Promise chain scheduling) perform on par with x86, or slightly better, on ARM64.

Power and Thermal Efficiency

The ARM64 architecture consumes less power for equivalent performance. In an AWS data center, this translates to better thermal density and higher instance stability.

Scenario               Instance Type   vCPU   RAM     Monthly (On-Demand)
Personal/small team    t4g.medium      2      4 GB    ~$24
Standard production    t4g.xlarge      4      16 GB   ~$98
High concurrency       t4g.2xlarge     8      32 GB   ~$196
Multi-Agent instances  c7g.2xlarge     8      16 GB   ~$232

A 1-year Reserved Instance saves a further ~38% relative to On-Demand pricing, which brings the annual cost of a standard production setup to approximately $728.


39.2 Complete Production Deployment Steps

Step 1: Create the EC2 Instance

# Create instance using AWS CLI (or via the console)
# --image-id must be the Ubuntu 24.04 LTS ARM64 AMI for your region
aws ec2 run-instances \
  --image-id ami-0xxxxxxxxxxxxxxxxx \
  --instance-type t4g.xlarge \
  --key-name your-key-pair \
  --security-group-ids sg-xxxxxxxxx \
  --subnet-id subnet-xxxxxxxxx \
  --block-device-mappings '[{
    "DeviceName": "/dev/sda1",
    "Ebs": {"VolumeSize": 50, "VolumeType": "gp3"}
  }]' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=openclaw-gateway}]'

Security group rules (principle of minimum exposure):

Inbound rules:
  SSH (22):    Your IP address/32 (not 0.0.0.0/0)
  No other inbound ports

Outbound rules:
  All traffic:   0.0.0.0/0  (allow LLM API calls)

Port 18789 is not exposed to the public internet; access is via Tailscale.

Step 2: Base System Configuration

# SSH into the instance
ssh -i ~/.ssh/your-key.pem ubuntu@<instance-ip>

# Update the system
sudo apt update && sudo apt upgrade -y

# Install basic tools (jq is used by the Secrets Manager and update scripts later in this chapter)
sudo apt install -y \
  git curl wget unzip jq \
  htop iotop nethogs \
  fail2ban ufw \
  logrotate

# Set timezone (UTC recommended for log consistency)
sudo timedatectl set-timezone UTC

# Set hostname
sudo hostnamectl set-hostname openclaw-gateway

# Configure fail2ban (SSH brute-force protection)
sudo systemctl enable fail2ban
sudo systemctl start fail2ban

Step 3: Firewall Configuration

# Configure UFW
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow from <your-ip>/32 to any port 22  # SSH only from your IP
sudo ufw enable

# Verify rules
sudo ufw status verbose

Step 4: Install Node.js 24

# Option 1: NodeSource (recommended for production)
curl -fsSL https://deb.nodesource.com/setup_24.x | sudo -E bash -
sudo apt install -y nodejs

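# Option 2: nvm (for multi-version management)
# Note: nvm installs node under the current user's home directory, so "sudo npm"
# and the /usr/bin/openclaw path used by the systemd unit in Step 9 will not find it
# without adjustment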
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.0/install.sh | bash
source ~/.bashrc
nvm install 24
nvm alias default 24

# Verify
node --version  # v24.x.x
npm --version   # 11.x.x
node -e "console.log(process.arch)"  # arm64
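
On Linux, uname -m reports this architecture as aarch64, while Node.js reports it as arm64 in process.arch. A small sketch that maps one naming to the other, which can help when a preflight script compares the two; the function name is hypothetical:

```shell
# Map the kernel's machine name (as printed by uname -m) to the name Node.js uses in process.arch
node_arch_name() {
  case "$1" in
    aarch64|arm64) echo "arm64" ;;
    x86_64|amd64)  echo "x64" ;;
    *)             echo "unknown" ;;
  esac
}

# Report the architecture of the current host in Node.js terms
node_arch_name "$(uname -m)"
```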

Step 5: Install OpenClaw

# Install OpenClaw CLI globally
sudo npm install -g @openclaw/cli

# Verify
openclaw --version

# Run the production initialization wizard
openclaw onboard --mode production

openclaw onboard --mode production performs the following:

  1. Creates the ~/.openclaw/ directory structure
  2. Guides configuration of the primary LLM Provider and API Key
  3. Sets Gateway port and authentication
  4. Generates an Agent ID (agentId, globally unique)
  5. Initializes the memory directory structure
  6. Runs an environment health check

Step 6: Configure Environment Variables

It is not recommended to write API Keys directly into configuration files. Use environment variables or AWS Secrets Manager instead.

# Create an environment variable file (mode 600)
sudo mkdir -p /etc/openclaw
sudo tee /etc/openclaw/gateway.env > /dev/null << 'EOF'
ANTHROPIC_API_KEY=sk-ant-api03-xxxxxxxxxxxxxxxxxx
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxxxxxxx
OPENCLAW_GATEWAY_SECRET=your-random-secret-here
NODE_ENV=production
LOG_LEVEL=info
EOF

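# Only root needs to read this file: systemd loads EnvironmentFile as root
# before it drops privileges to the service user (which is created in Step 8)
sudo chmod 600 /etc/openclaw/gateway.env
sudo chown root:root /etc/openclaw/gateway.env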
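
Because the file holds API keys, it is worth confirming its permissions after every edit. A minimal sketch of such a check, assuming GNU stat (as shipped with Ubuntu); the function name is hypothetical:

```shell
# Succeed only when the given file has exactly mode 600 (owner read/write, nobody else)
check_env_file_mode() {
  local file="$1" mode
  mode=$(stat -c '%a' "$file") || return 2
  if [ "$mode" = "600" ]; then
    echo "OK: $file is mode 600"
  else
    echo "WARN: $file is mode $mode, expected 600"
    return 1
  fi
}

# Example (run as root, since the directory may not be readable otherwise):
#   check_env_file_mode /etc/openclaw/gateway.env
```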

Using AWS Secrets Manager (recommended for enterprise environments):

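# Fetch secrets from Secrets Manager and write them as KEY=value lines directly
# into the root-only environment file, avoiding a world-readable file in /tmp.
# This assumes the secret is a JSON object holding every variable the service needs.
aws secretsmanager get-secret-value \
  --secret-id openclaw/production/api-keys \
  --query SecretString \
  --output text | \
  jq -r 'to_entries[] | "\(.key)=\(.value)"' | \
  sudo tee /etc/openclaw/gateway.env > /dev/null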

Step 7: Production openclaw.json Configuration

cat > ~/.openclaw/openclaw.json << 'EOF'
{
  "model": "anthropic/claude-sonnet-4-6",
  "fallbackModel": "openai/gpt-4.1-mini",
  "agentId": "prod-gateway-01",
  "gateway": {
    "port": 18789,
    "host": "127.0.0.1",
    "authRequired": true
  },
  "thinking": {
    "enabled": true,
    "budget": 8000
  },
  "skills": {
    "lazy": true,
    "enabled": ["web-search", "code-exec"]
  },
  "context": {
    "reserveFloor": 8000,
    "softThreshold": 0.85
  },
  "lanes": {
    "global": 4,
    "subAgent": 8
  },
  "logging": {
    "level": "info",
    "output": "file",
    "file": "/var/log/openclaw/gateway.log",
    "maxSize": "100m",
    "maxFiles": 14
  },
  "memory": {
    "dir": "/var/lib/openclaw/memory",
    "compaction": {
      "enabled": true,
      "threshold": 50000
    }
  }
}
EOF

Step 8: Create Dedicated User and Directories

# Create a system user
sudo useradd --system \
  --home-dir /var/lib/openclaw \
  --create-home \
  --shell /usr/sbin/nologin \
  openclaw-gateway

# Create necessary directories
sudo mkdir -p /var/log/openclaw /etc/openclaw
sudo chown openclaw-gateway:openclaw-gateway /var/log/openclaw
sudo chown -R openclaw-gateway:openclaw-gateway /var/lib/openclaw

# Move configuration to system path
sudo cp -r ~/.openclaw/* /var/lib/openclaw/
sudo chown -R openclaw-gateway:openclaw-gateway /var/lib/openclaw

Step 9: Create the systemd Service Unit File

sudo tee /etc/systemd/system/openclaw-gateway.service > /dev/null << 'EOF'
[Unit]
Description=OpenClaw AI Gateway
Documentation=https://docs.openclaw.ai
After=network-online.target
Wants=network-online.target
StartLimitIntervalSec=300
StartLimitBurst=5

[Service]
Type=simple
User=openclaw-gateway
Group=openclaw-gateway

# Execution command
ExecStart=/usr/bin/openclaw gateway start \
  --config /var/lib/openclaw/openclaw.json

# Working directory
WorkingDirectory=/var/lib/openclaw

# Environment variables (no secrets here; secrets are in EnvironmentFile)
Environment=NODE_ENV=production
EnvironmentFile=/etc/openclaw/gateway.env

# Restart policy
Restart=always
RestartSec=10s
TimeoutStopSec=30s

# Logging
StandardOutput=append:/var/log/openclaw/gateway.log
StandardError=append:/var/log/openclaw/gateway-error.log

# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=read-only
ReadWritePaths=/var/lib/openclaw /var/log/openclaw
CapabilityBoundingSet=
AmbientCapabilities=

# Resource limits
LimitNOFILE=65536
LimitNPROC=512
MemoryMax=8G
CPUWeight=100

[Install]
WantedBy=multi-user.target
EOF

Step 10: Start and Verify the Service

# Reload systemd configuration
sudo systemctl daemon-reload

# Enable auto-start on boot
sudo systemctl enable openclaw-gateway

# Start the service
sudo systemctl start openclaw-gateway

# Check status
sudo systemctl status openclaw-gateway

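# Stream systemd's start/stop/restart messages for the unit
sudo journalctl -u openclaw-gateway -f

# Stream application output (the unit appends stdout to this file rather than to the journal)
sudo tail -f /var/log/openclaw/gateway.log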

# Verify Gateway health check
curl http://127.0.0.1:18789/health
# {"status": "ok", "version": "4.2.1", "uptime": 45, "sessions": 0}
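
The Gateway may need a few seconds after start before it answers on /health, so a single curl can fail even when the service is fine. A minimal retry sketch; the function name and its arguments are hypothetical, and the probe command is passed in so the loop itself stays generic:

```shell
# Run a probe command up to <attempts> times, sleeping <delay> seconds between tries.
# Usage: wait_for_healthy <attempts> <delay> <command...>
wait_for_healthy() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@" > /dev/null 2>&1; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    # Sleep between attempts, but not after the last one
    if ((i < attempts)); then sleep "$delay"; fi
  done
  echo "not healthy after $attempts attempt(s)" >&2
  return 1
}

# Example: poll the health endpoint for up to about 20 seconds
#   wait_for_healthy 10 2 curl -sf http://127.0.0.1:18789/health
```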

39.3 Tailscale Installation and Serve Configuration

Why Use Tailscale Instead of Public Exposure?

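Opening port 18789 directly in the security group would expose the Gateway to the entire internet, including port scanners and credential brute-forcing, and would leave TLS for you to set up yourself.

Tailscale, based on the WireGuard protocol, provides encrypted connections that are reachable only from devices in your own Tailscale network, with no inbound port opened to the public.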

Installing Tailscale

# Install
curl -fsSL https://tailscale.com/install.sh | sh

# Start and authenticate (using your Tailscale account)
sudo tailscale up --authkey=tskey-auth-xxxxxxxx

# Get the Tailscale IP address
tailscale ip -4
# 100.x.x.x

Configuring Tailscale Serve

Tailscale Serve exposes the Gateway's local port through the Tailscale network, with automatic TLS — no manual TLS configuration needed:

# Expose local port 18789 via Tailscale HTTPS proxy (automatic TLS);
# with the current Tailscale CLI, a bare port number proxies to http://127.0.0.1:<port>
sudo tailscale serve --bg 18789

# Check Serve status
tailscale serve status

# Example output:
# https://openclaw-gateway.tail-xxxx.ts.net/
# |-- / proxy http://127.0.0.1:18789

After this, the Control UI is accessible at:

https://openclaw-gateway.tail-xxxx.ts.net/

Only devices joined to the same Tailscale network can access this address.

Node Device Connection Configuration

Each Node device (iPhone/Android/Raspberry Pi) connects to the Gateway as follows:

# Raspberry Pi Node connection
openclaw node run \
  --host openclaw-gateway.tail-xxxx.ts.net \
  --port 443 \
  --tls \
  --display-name "Pi Node"

When configuring the Gateway address in the iOS/Android app, enter the Tailscale address as well.

ACL Access Control (Optional)

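Configure ACLs in the Tailscale admin console to restrict which devices can access the Gateway. Clients reach the Gateway through Tailscale Serve on port 443 (the Gateway itself listens only on 127.0.0.1:18789), so the rule allows port 443:

{
  "acls": [
    {
      "action": "accept",
      "src": ["tag:openclaw-client"],
      "dst": ["tag:openclaw-gateway:443"]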
    }
  ],
  "tagOwners": {
    "tag:openclaw-gateway": ["autogroup:admin"],
    "tag:openclaw-client":  ["autogroup:admin"]
  }
}

39.4 Git-Based Backup Strategy for ~/.openclaw

OpenClaw's core data (configuration, memories, skills) is stored in ~/.openclaw/ (or /var/lib/openclaw/ in production). Memories are stored in Markdown format, making them naturally well-suited to git version control.

Initialize the Git Repository

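# Work as the service user so the repository is owned by the account that
# runs the backup job (exit this shell when you are done)
sudo -u openclaw-gateway bash
cd /var/lib/openclaw

# Initialize git repository with "main" as the initial branch (matches the push commands)
git init -b main
git config user.email "<your-email>"
git config user.name "OpenClaw Gateway"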

# Create .gitignore
cat > .gitignore << 'EOF'
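# Do not commit API keys and sensitive data
# (.ssh/ matters if the service user's deploy key lives in its home directory, which is this repository)
*.env
.env*
.ssh/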
logs/
*.log

# Do not commit temporary files
tmp/
cache/
.tmp/

# Do not commit binary files
uploads/
media/
*.mp4
*.jpg
*.png
EOF

# Initial commit
git add .
git commit -m "initial: openclaw production config"

Configure Remote Repository (Private)

# Use a GitHub private repository
git remote add origin git@github.com:your-org/openclaw-config.git
git push -u origin main

Automated Backup Script

Create /usr/local/bin/openclaw-backup.sh:

#!/bin/bash
set -euo pipefail

OPENCLAW_DIR="/var/lib/openclaw"
LOG_FILE="/var/log/openclaw/backup.log"
DATE=$(date '+%Y-%m-%d %H:%M:%S')

cd "$OPENCLAW_DIR"

# Check for changes, including untracked files such as newly written memories
# (git diff alone does not report untracked files)
if [ -z "$(git status --porcelain)" ]; then
  echo "[$DATE] No changes to commit" >> "$LOG_FILE"
  exit 0
fi

# Commit changes
git add -A
git commit -m "auto: backup $(date '+%Y-%m-%d %H:%M')" >> "$LOG_FILE" 2>&1

# Push to remote (fail silently; do not affect Gateway operation)
git push origin main >> "$LOG_FILE" 2>&1 || \
  echo "[$DATE] Push failed (will retry next cycle)" >> "$LOG_FILE"

echo "[$DATE] Backup completed" >> "$LOG_FILE"

Make the script executable:

sudo chmod +x /usr/local/bin/openclaw-backup.sh
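
Change detection deserves care in a backup script, because new memory files start out untracked. The sketch below shows a check based on git status --porcelain, which lists modified, staged, and untracked paths alike, whereas git diff --quiet reports only changes to files git already tracks. The function name is hypothetical:

```shell
# Succeed when the repository at the given path has anything to commit:
# non-empty porcelain output means modified, staged, or untracked paths exist
repo_has_changes() {
  [ -n "$(git -C "$1" status --porcelain)" ]
}

# Example:
#   if repo_has_changes /var/lib/openclaw; then echo "backup needed"; fi
```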

Automated Backup systemd Timer

Create /etc/systemd/system/openclaw-backup.timer:

[Unit]
Description=OpenClaw Config Backup Timer

[Timer]
# Run every 30 minutes (comments must be on their own line in unit files)
OnCalendar=*:0/30
Persistent=true

[Install]
WantedBy=timers.target

Create /etc/systemd/system/openclaw-backup.service:

[Unit]
Description=OpenClaw Config Backup
After=openclaw-gateway.service

[Service]
Type=oneshot
User=openclaw-gateway
ExecStart=/usr/local/bin/openclaw-backup.sh

Enable and start the timer:

sudo systemctl daemon-reload
sudo systemctl enable openclaw-backup.timer
sudo systemctl start openclaw-backup.timer

39.5 Multi-Instance Deployment Isolation

When you need to run multiple OpenClaw instances on the same server (e.g., separate production/staging environments, or isolated instances for different users):

Isolation Principles

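Each instance requires its own system user (openclaw-<name>), data directory (/var/lib/openclaw-<name>), environment file (/etc/openclaw/<name>.env), and a distinct Gateway port in its openclaw.json.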

Multi-Instance systemd Services (Using Template Units)

Create /etc/systemd/system/openclaw-gateway@.service:

[Unit]
Description=OpenClaw Gateway (%i instance)
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=openclaw-%i
EnvironmentFile=/etc/openclaw/%i.env
ExecStart=/usr/bin/openclaw gateway start \
  --config /var/lib/openclaw-%i/openclaw.json
WorkingDirectory=/var/lib/openclaw-%i
Restart=always
RestartSec=10s

[Install]
WantedBy=multi-user.target

Start the instances:

# Start the production instance
sudo systemctl enable --now openclaw-gateway@prod

# Start the staging instance
sudo systemctl enable --now openclaw-gateway@staging
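
In a template unit, systemd replaces %i with the text after the @ in the unit name, so openclaw-gateway@prod runs as openclaw-prod and reads /etc/openclaw/prod.env. A small sketch that lists the resources that must exist before an instance is enabled; the function name and the naming rule it enforces are assumptions, not part of OpenClaw:

```shell
# Print the per-instance resources implied by the template unit for a given instance name
instance_layout() {
  local name="$1"
  # Accept only lowercase letters, digits, and hyphens so the name is safe in user names and paths
  if [[ ! "$name" =~ ^[a-z][a-z0-9-]*$ ]]; then
    echo "invalid instance name: $name" >&2
    return 1
  fi
  printf 'unit=openclaw-gateway@%s.service\n' "$name"
  printf 'user=openclaw-%s\n' "$name"
  printf 'env_file=/etc/openclaw/%s.env\n' "$name"
  printf 'data_dir=/var/lib/openclaw-%s\n' "$name"
  printf 'config=/var/lib/openclaw-%s/openclaw.json\n' "$name"
}

instance_layout staging
```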

39.6 Update Strategy

Update Channel Descriptions

Channel   Stability         Update Frequency   Recommended For
stable    Most stable       Every 4-8 weeks    Production (recommended)
beta      Fairly stable     Every 1-2 weeks    Staging, early adopters
dev       May be unstable   Daily              Developers and contributors

Configuring the Update Channel

# Check the current channel
openclaw update --channel

# Switch to the stable channel (recommended for production)
openclaw update --channel stable

# Check for available updates (without applying them)
openclaw update --check
# OpenClaw 4.2.1 → 4.2.2 (stable)
# Changelog: [Fix] Memory compaction edge case, [Perf] Tool routing 15% faster

Automated Update Script (Run in Maintenance Window)

Create /usr/local/bin/openclaw-update.sh:

#!/bin/bash
set -euo pipefail

LOG="/var/log/openclaw/update.log"
DATE=$(date '+%Y-%m-%d %H:%M:%S')

echo "[$DATE] Checking for updates..." >> "$LOG"

# Check for updates
UPDATE_AVAILABLE=$(openclaw update --check --json | jq -r '.available')

if [ "$UPDATE_AVAILABLE" != "true" ]; then
  echo "[$DATE] No update available" >> "$LOG"
  exit 0
fi

NEW_VERSION=$(openclaw update --check --json | jq -r '.newVersion')
echo "[$DATE] Updating to $NEW_VERSION..." >> "$LOG"

# Back up current configuration (as the service user that owns the git repository)
sudo -u openclaw-gateway /usr/local/bin/openclaw-backup.sh

# Apply update
sudo npm update -g @openclaw/cli >> "$LOG" 2>&1

# Restart service
sudo systemctl restart openclaw-gateway

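# Verify ("|| true" keeps "set -e" from aborting on a failed request,
# so the error branch below can still log the failure)
sleep 5
HEALTH=$(curl -sf http://127.0.0.1:18789/health | jq -r '.status' || true)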
if [ "$HEALTH" = "ok" ]; then
  echo "[$DATE] Update to $NEW_VERSION completed successfully" >> "$LOG"
else
  echo "[$DATE] ERROR: Health check failed after update!" >> "$LOG"
  exit 1
fi

Use a systemd Timer to run this every Sunday at 2 AM:

# /etc/systemd/system/openclaw-update.timer
[Timer]
OnCalendar=Sun 02:00:00
Persistent=true
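
The timer fragment above omits the [Unit] and [Install] sections and the service it triggers. A sketch of a complete pair, modeled on the backup timer and service from section 39.4; the unit descriptions and the helper function are assumptions, and the target directory is a parameter so the files can be reviewed before they are placed in /etc/systemd/system:

```shell
# Write openclaw-update.timer and openclaw-update.service into the given directory
write_update_units() {
  local dir="$1"
  cat > "$dir/openclaw-update.timer" << 'EOF'
[Unit]
Description=OpenClaw Weekly Update Timer

[Timer]
OnCalendar=Sun 02:00:00
Persistent=true

[Install]
WantedBy=timers.target
EOF
  cat > "$dir/openclaw-update.service" << 'EOF'
[Unit]
Description=OpenClaw Weekly Update

[Service]
Type=oneshot
ExecStart=/usr/local/bin/openclaw-update.sh
EOF
}

# Example (from a root shell):
#   write_update_units /etc/systemd/system
#   systemctl daemon-reload
#   systemctl enable --now openclaw-update.timer
```

The service deliberately has no User= line, because the update script installs a global npm package and restarts the Gateway, both of which need root.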

39.7 Failure Recovery Procedures

Failure Scenario 1: Gateway Service Crash

# Check service status
sudo systemctl status openclaw-gateway

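# View recent error logs (the journal holds systemd's own messages;
# application stderr is appended to gateway-error.log by the unit)
sudo journalctl -u openclaw-gateway --since "10 minutes ago" -p err
sudo tail -n 100 /var/log/openclaw/gateway-error.log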

# Restart the service
sudo systemctl restart openclaw-gateway

# If it keeps crashing, check for OOM
sudo dmesg | grep -i "oom\|killed"

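# Temporarily increase the memory limit (--runtime makes the change last only until the next reboot;
# without it, set-property persists the new value)
sudo systemctl set-property --runtime openclaw-gateway MemoryMax=12G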

Failure Scenario 2: Configuration Corruption

# Roll back to the previous git commit
cd /var/lib/openclaw
git log --oneline -10  # View history
git checkout HEAD~1 -- openclaw.json  # Roll back a single file

# Validate configuration
openclaw config validate /var/lib/openclaw/openclaw.json

# Restart the service
sudo systemctl restart openclaw-gateway

Failure Scenario 3: Disk Space Exhaustion

# Check disk usage
df -h /

# Clean up logs
sudo journalctl --vacuum-size=500M
sudo find /var/log/openclaw -name "*.log.gz" -mtime +30 -delete

# Identify large files in memory
du -sh /var/lib/openclaw/memory/*
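
Disk exhaustion is easier to prevent than to recover from. A minimal sketch of a threshold check that could run from cron or a systemd timer; the function names are hypothetical and the threshold is whatever you choose:

```shell
# Print the used-space percentage (integer, no % sign) of the filesystem holding a path
disk_usage_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn and return non-zero when usage reaches the given threshold (percent)
check_disk() {
  local path="$1" threshold="$2" pct
  pct=$(disk_usage_pct "$path")
  if [ "$pct" -ge "$threshold" ]; then
    echo "WARN: $path is ${pct}% full (threshold ${threshold}%)"
    return 1
  fi
  echo "OK: $path is ${pct}% full"
}

# Example:
#   check_disk / 80
```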

Failure Scenario 4: API Key Compromise Emergency Response

# Immediately stop the Gateway (disconnect all connections)
sudo systemctl stop openclaw-gateway

# Revoke the old key and generate a new one in the LLM Provider console

# Update the environment variable file
sudo vi /etc/openclaw/gateway.env
# Update ANTHROPIC_API_KEY and others

# Restart the Gateway
sudo systemctl start openclaw-gateway

# Audit recent call logs
grep "api_key" /var/log/openclaw/gateway.log | tail -100

39.8 Monitoring and Health Checks

Gateway Health Check Endpoint

# Basic health check
curl http://127.0.0.1:18789/health
# {
#   "status": "ok",
#   "version": "4.2.1",
#   "uptime": 86400,
#   "sessions": {
#     "active": 2,
#     "total": 47
#   },
#   "nodes": {
#     "online": 3,
#     "offline": 1
#   }
# }

# Detailed status (command line)
openclaw gateway status

Key Monitoring Metrics

Metric                 Normal Range   Alert Threshold
Session success rate   > 98%          < 95%
Tool error rate        < 2%           > 5%
Compaction frequency   < 5/hour       > 20/hour
Node offline rate      < 5%           > 20%
Memory usage           < 70%          > 90%
API response time      < 2s           > 10s
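
Each row leaves a band between the normal range and the alert threshold. A sketch of how the memory usage row could be turned into a three-way classification; the function names and the "elevated" label for the middle band are assumptions, and the usage calculation reads MemTotal and MemAvailable from /proc/meminfo on Linux:

```shell
# Classify a memory usage percentage using the table's thresholds:
# below 70 is normal, above 90 is an alert, anything in between is elevated
classify_memory_usage() {
  local pct="$1"
  if ((pct > 90)); then
    echo "alert"
  elif ((pct < 70)); then
    echo "normal"
  else
    echo "elevated"
  fi
}

# Current memory usage as an integer percentage (Linux only)
memory_usage_pct() {
  awk '/^MemTotal:/ { t = $2 } /^MemAvailable:/ { a = $2 } END { printf "%d\n", (t - a) * 100 / t }' /proc/meminfo
}

# Example:
#   classify_memory_usage "$(memory_usage_pct)"
```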

Integration with AWS CloudWatch

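# Install CloudWatch Agent (not in Ubuntu's default apt repositories; AWS publishes a .deb for ARM64)
wget https://amazoncloudwatch-agent.s3.amazonaws.com/ubuntu/arm64/latest/amazon-cloudwatch-agent.deb
sudo dpkg -i -E ./amazon-cloudwatch-agent.deb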

# Configure custom metric push
# Create /etc/openclaw/cloudwatch-config.json

39.9 Summary

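With the complete deployment steps covered in this chapter, you have established an OpenClaw Gateway on AWS ARM64 that is supervised and restarted automatically by systemd, reachable only over Tailscale rather than the public internet, backed up to git every 30 minutes, and updated in a scheduled maintenance window.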

The next chapter dives into performance tuning — from token cost reduction to concurrency control — to further unlock the system's performance potential.


Next Chapter: Chapter 40 — Performance Tuning: Token Cost Control, Context Budget Management, and Concurrent Lane Configuration
