Chapter 39: Production Deployment — AWS ARM64 + systemd + Tailscale Reference Architecture
Overview
Deploying OpenClaw to a production environment means it needs to run stably around the clock, have automatic recovery capabilities, and maintain data and access security. This chapter provides a complete reference architecture: AWS t4g.xlarge ARM64 Graviton instances as the compute platform, systemd managing service lifecycle, Tailscale enabling zero-public-exposure secure access, and a complete backup and update strategy.
39.1 Why AWS t4g.xlarge ARM64 Graviton?
The Case for ARM64 Graviton
AWS Graviton processors are AWS's custom-designed ARM64 chips; the t4g family runs on Graviton2, while the newer c7g family runs on Graviton3. In the OpenClaw context, they offer three core advantages:
Price-Performance Ratio
Equivalent configuration comparison (4 vCPU / 16GB RAM):
t4g.xlarge (ARM64): $0.1344/hour → ~$98/month
t3.xlarge (x86_64): $0.1664/hour → ~$121/month
Cost savings: approximately 19%
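The monthly figures above follow directly from the hourly on-demand rates (assuming ~730 billable hours per month):

```shell
# Reproduce the cost comparison from the quoted us-east-1 hourly rates
awk 'BEGIN {
  hours = 730                       # ~hours per month
  t4g = 0.1344 * hours              # t4g.xlarge (ARM64)
  t3  = 0.1664 * hours              # t3.xlarge (x86_64)
  printf "t4g.xlarge: $%.0f/mo  t3.xlarge: $%.0f/mo  savings: %.0f%%\n",
         t4g, t3, (1 - t4g / t3) * 100
}'
# → t4g.xlarge: $98/mo  t3.xlarge: $121/mo  savings: 19%
```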
Node.js Performance
Node.js 22 and later ship well-optimized ARM64 builds, and the V8 engine's JIT compiler performs well on Graviton. OpenClaw Gateway's core workloads (JSON serialization/deserialization, WebSocket frame processing, Promise chain scheduling) perform on par with, or slightly better than, x86.
Power and Thermal Efficiency
The ARM64 architecture consumes less power for equivalent performance. In an AWS data center, this translates to better thermal density and higher instance stability.
Recommended Instance Specifications
| Scenario | Instance Type | vCPU | RAM | Monthly (On-Demand) |
|---|---|---|---|---|
| Personal/small team | t4g.medium | 2 | 4 GB | ~$24 |
| Standard production | t4g.xlarge | 4 | 16 GB | ~$98 |
| High concurrency | t4g.2xlarge | 8 | 32 GB | ~$196 |
| Multi-Agent instances | c7g.2xlarge | 8 | 16 GB | ~$232 |
Using a Reserved Instance (1-year term) saves an additional ~38%; the annual cost for a standard production setup is approximately $728.
39.2 Complete Production Deployment Steps
Step 1: Create the EC2 Instance
# Create instance using the AWS CLI (or via the console)
# --image-id: an Ubuntu 24.04 LTS ARM64 AMI for your region
aws ec2 run-instances \
--image-id ami-0xxxxxxxxxxxxxxxxx \
--instance-type t4g.xlarge \
--key-name your-key-pair \
--security-group-ids sg-xxxxxxxxx \
--subnet-id subnet-xxxxxxxxx \
--block-device-mappings '[{
"DeviceName": "/dev/sda1",
"Ebs": {"VolumeSize": 50, "VolumeType": "gp3"}
}]' \
--tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=openclaw-gateway}]'
Security group rules (principle of least exposure):
Inbound rules:
SSH (22): Your IP address/32 (not 0.0.0.0/0)
No other inbound ports
Outbound rules:
All traffic: 0.0.0.0/0 (allow LLM API calls)
Port 18789 is not exposed to the public internet; access is via Tailscale.
Step 2: Base System Configuration
# SSH into the instance
ssh -i ~/.ssh/your-key.pem ubuntu@<instance-ip>
# Update the system
sudo apt update && sudo apt upgrade -y
# Install basic tools
sudo apt install -y \
git curl wget unzip \
htop iotop nethogs \
fail2ban ufw \
logrotate
# Set timezone (UTC recommended for log consistency)
sudo timedatectl set-timezone UTC
# Set hostname
sudo hostnamectl set-hostname openclaw-gateway
# Configure fail2ban (SSH brute-force protection)
sudo systemctl enable fail2ban
sudo systemctl start fail2ban
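logrotate is installed above but still needs a drop-in configuration for the Gateway's log directory: the systemd unit in Step 9 appends to files under /var/log/openclaw, and systemd does not rotate those itself. A minimal sketch (the file name is illustrative; the log path matches the configuration used later in this chapter):

```
# /etc/logrotate.d/openclaw
/var/log/openclaw/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
```

copytruncate avoids restarting the Gateway on rotation, at the cost of possibly losing a few log lines during the copy. If you rely solely on the application's built-in rotation (maxSize/maxFiles in Step 7), this drop-in covers only the files the application does not manage.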
Step 3: Firewall Configuration
# Configure UFW
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow from <your-ip>/32 to any port 22 # SSH only from your IP
sudo ufw enable
# Verify rules
sudo ufw status verbose
Step 4: Install Node.js 24
# Option 1: NodeSource (recommended for production)
curl -fsSL https://deb.nodesource.com/setup_24.x | sudo -E bash -
sudo apt install -y nodejs
# Option 2: nvm (for multi-version management)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.0/install.sh | bash
source ~/.bashrc
nvm install 24
nvm alias default 24
# Verify
node --version # v24.x.x
npm --version # 11.x.x
node -e "console.log(process.arch)" # arm64
Step 5: Install OpenClaw
# Install OpenClaw CLI globally
sudo npm install -g @openclaw/cli
# Verify
openclaw --version
# Run the production initialization wizard
openclaw onboard --mode production
openclaw onboard --mode production performs the following:
- Creates the ~/.openclaw/ directory structure
- Guides configuration of the primary LLM Provider and API Key
- Sets the Gateway port and authentication
- Generates a globally unique Agent ID (agentId)
- Initializes the memory directory structure
- Runs an environment health check
Step 6: Configure Environment Variables
It is not recommended to write API Keys directly into configuration files. Use environment variables or AWS Secrets Manager instead.
# Create an environment variable file (mode 600)
sudo mkdir -p /etc/openclaw
sudo tee /etc/openclaw/gateway.env > /dev/null << 'EOF'
ANTHROPIC_API_KEY=sk-ant-api03-xxxxxxxxxxxxxxxxxx
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxxxxxxx
OPENCLAW_GATEWAY_SECRET=your-random-secret-here
NODE_ENV=production
LOG_LEVEL=info
EOF
sudo chmod 600 /etc/openclaw/gateway.env
sudo chown root:root /etc/openclaw/gateway.env  # systemd reads EnvironmentFile as root
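The your-random-secret-here placeholder should be replaced with a high-entropy value rather than something hand-typed. One way to generate it (openssl ships with Ubuntu by default):

```shell
# 32 random bytes, hex-encoded → a 64-character secret for OPENCLAW_GATEWAY_SECRET
openssl rand -hex 32
```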
Using AWS Secrets Manager (recommended for enterprise environments):
# Render secrets from Secrets Manager directly into the protected env file
# (avoid staging secrets in world-readable locations such as /tmp)
aws secretsmanager get-secret-value \
--secret-id openclaw/production/api-keys \
--query SecretString \
--output text | \
jq -r 'to_entries[] | "\(.key)=\(.value)"' | \
sudo tee /etc/openclaw/gateway.env > /dev/null
Step 7: Production openclaw.json Configuration
cat > ~/.openclaw/openclaw.json << 'EOF'
{
"model": "anthropic/claude-sonnet-4-6",
"fallbackModel": "openai/gpt-4.1-mini",
"agentId": "prod-gateway-01",
"gateway": {
"port": 18789,
"host": "127.0.0.1",
"authRequired": true
},
"thinking": {
"enabled": true,
"budget": 8000
},
"skills": {
"lazy": true,
"enabled": ["web-search", "code-exec"]
},
"context": {
"reserveFloor": 8000,
"softThreshold": 0.85
},
"lanes": {
"global": 4,
"subAgent": 8
},
"logging": {
"level": "info",
"output": "file",
"file": "/var/log/openclaw/gateway.log",
"maxSize": "100m",
"maxFiles": 14
},
"memory": {
"dir": "/var/lib/openclaw/memory",
"compaction": {
"enabled": true,
"threshold": 50000
}
}
}
EOF
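Before starting the service, it is worth sanity-checking the file's JSON syntax; `jq empty` exits non-zero on malformed JSON (jq is not in the base tooling list above, so install it with `sudo apt install -y jq` — the update script later in this chapter needs it anyway):

```shell
# Validate JSON syntax of the production config
CONFIG=~/.openclaw/openclaw.json
if jq empty "$CONFIG" 2>/dev/null; then
  echo "config OK"
else
  echo "config invalid or missing: $CONFIG" >&2
fi
```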
Step 8: Create Dedicated User and Directories
# Create a system user
sudo useradd --system \
--home-dir /var/lib/openclaw \
--create-home \
--shell /usr/sbin/nologin \
openclaw-gateway
# Create necessary directories
sudo mkdir -p /var/log/openclaw /etc/openclaw
sudo chown openclaw-gateway:openclaw-gateway /var/log/openclaw
sudo chown -R openclaw-gateway:openclaw-gateway /var/lib/openclaw
# Move configuration to system path
sudo cp -r ~/.openclaw/* /var/lib/openclaw/
sudo chown -R openclaw-gateway:openclaw-gateway /var/lib/openclaw
Step 9: Create the systemd Service Unit File
sudo tee /etc/systemd/system/openclaw-gateway.service > /dev/null << 'EOF'
[Unit]
Description=OpenClaw AI Gateway
Documentation=https://docs.openclaw.ai
After=network-online.target
Wants=network-online.target
StartLimitIntervalSec=300
StartLimitBurst=5
[Service]
Type=simple
User=openclaw-gateway
Group=openclaw-gateway
# Execution command
ExecStart=/usr/bin/openclaw gateway start \
--config /var/lib/openclaw/openclaw.json
# Working directory
WorkingDirectory=/var/lib/openclaw
# Environment variables (no secrets here; secrets are in EnvironmentFile)
Environment=NODE_ENV=production
EnvironmentFile=/etc/openclaw/gateway.env
# Restart policy
Restart=always
RestartSec=10s
TimeoutStopSec=30s
# Logging
StandardOutput=append:/var/log/openclaw/gateway.log
StandardError=append:/var/log/openclaw/gateway-error.log
# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=read-only
ReadWritePaths=/var/lib/openclaw /var/log/openclaw
CapabilityBoundingSet=
AmbientCapabilities=
# Resource limits
LimitNOFILE=65536
LimitNPROC=512
MemoryMax=8G
CPUWeight=100
[Install]
WantedBy=multi-user.target
EOF
Step 10: Start and Verify the Service
# Reload systemd configuration
sudo systemctl daemon-reload
# Enable auto-start on boot
sudo systemctl enable openclaw-gateway
# Start the service
sudo systemctl start openclaw-gateway
# Check status
sudo systemctl status openclaw-gateway
# Stream live logs
sudo journalctl -u openclaw-gateway -f
# Verify Gateway health check
curl http://127.0.0.1:18789/health
# {"status": "ok", "version": "4.2.1", "uptime": 45, "sessions": 0}
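The health endpoint is most useful when scripts can block on it, for example after a restart. A small sketch (assuming jq is installed; the .status field follows the sample response above):

```shell
# Reusable readiness check: poll /health until the Gateway reports ok
wait_for_gateway() {
  local timeout="${1:-30}"
  for _ in $(seq 1 "$timeout"); do
    status=$(curl -sf http://127.0.0.1:18789/health | jq -r '.status' 2>/dev/null || true)
    if [ "$status" = "ok" ]; then
      echo "Gateway healthy"
      return 0
    fi
    sleep 1
  done
  echo "Gateway did not become healthy within ${timeout}s" >&2
  return 1
}

# Example: wait_for_gateway 60   (after `sudo systemctl restart openclaw-gateway`)
```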
39.3 Tailscale Installation and Serve Configuration
Why Use Tailscale Instead of Public Exposure?
Directly opening port 18789 in the security group carries the following risks:
- Port scanning attacks
- Brute-force attacks against WebSocket authentication
- DDoS attacks overloading the Gateway
- Unencrypted WebSocket (if TLS is not configured)
Tailscale, based on the WireGuard protocol, provides:
- Zero public exposure: Port 18789 is only accessible within the Tailscale virtual network
- Authentication: All clients must authenticate with Tailscale, preventing unauthorized access
- End-to-end WireGuard encryption: More fundamental than application-layer TLS
- NAT traversal: No network configuration required for Node devices
Installing Tailscale
# Install
curl -fsSL https://tailscale.com/install.sh | sh
# Start and authenticate (using your Tailscale account)
sudo tailscale up --authkey=tskey-auth-xxxxxxxx
# Get the Tailscale IP address
tailscale ip -4
# 100.x.x.x
Configuring Tailscale Serve
Tailscale Serve exposes the Gateway's local port through the Tailscale network, with automatic TLS — no manual TLS configuration needed:
# Expose local port 18789 via Tailscale HTTPS proxy (automatic TLS)
# (syntax for recent Tailscale releases; older versions used `tailscale serve https / <target>`)
sudo tailscale serve --bg 18789
# Check Serve status
tailscale serve status
# Example output:
# https://openclaw-gateway.tail-xxxx.ts.net/
# |-- / proxy http://127.0.0.1:18789
After this, the Control UI is accessible at:
https://openclaw-gateway.tail-xxxx.ts.net/
Only devices joined to the same Tailscale network can access this address.
Node Device Connection Configuration
Each Node device (iPhone/Android/Raspberry Pi) connects to the Gateway as follows:
# Raspberry Pi Node connection
openclaw node run \
--host openclaw-gateway.tail-xxxx.ts.net \
--port 443 \
--tls \
--display-name "Pi Node"
When configuring the Gateway address in the iOS/Android app, enter the Tailscale address as well.
ACL Access Control (Optional)
Configure ACLs in the Tailscale admin console to restrict which devices can access the Gateway:
{
"acls": [
{
"action": "accept",
"src": ["tag:openclaw-client"],
"dst": ["tag:openclaw-gateway:18789"]
}
],
"tagOwners": {
"tag:openclaw-gateway": ["autogroup:admin"],
"tag:openclaw-client": ["autogroup:admin"]
}
}
39.4 Git-Based Backup Strategy for ~/.openclaw
OpenClaw's core data (configuration, memories, skills) is stored in ~/.openclaw/ (or /var/lib/openclaw/ in production). Memories are stored in Markdown format, making them naturally well-suited to git version control.
Initialize the Git Repository
cd /var/lib/openclaw
# Initialize git repository
git init
git config user.email "[email protected]"
git config user.name "OpenClaw Gateway"
# Create .gitignore
cat > .gitignore << 'EOF'
# Do not commit API keys and sensitive data
*.env
.env*
logs/
*.log
# Do not commit temporary files
tmp/
cache/
.tmp/
# Do not commit binary files
uploads/
media/
*.mp4
*.jpg
*.png
EOF
# Initial commit
git add .
git commit -m "initial: openclaw production config"
Configure Remote Repository (Private)
# Use a GitHub private repository
git remote add origin git@github.com:your-org/openclaw-config.git
git push -u origin main
Automated Backup Script
Create /usr/local/bin/openclaw-backup.sh:
#!/bin/bash
set -euo pipefail
OPENCLAW_DIR="/var/lib/openclaw"
LOG_FILE="/var/log/openclaw/backup.log"
DATE=$(date '+%Y-%m-%d %H:%M:%S')
cd "$OPENCLAW_DIR"
# Check for changes (`git status --porcelain` also reports untracked files,
# e.g. newly created memory Markdown; `git diff` alone would miss them)
if [ -z "$(git status --porcelain)" ]; then
echo "[$DATE] No changes to commit" >> "$LOG_FILE"
exit 0
fi
# Commit changes
git add -A
git commit -m "auto: backup $(date '+%Y-%m-%d %H:%M')" >> "$LOG_FILE" 2>&1
# Push to remote (fail silently; do not affect Gateway operation)
git push origin main >> "$LOG_FILE" 2>&1 || \
echo "[$DATE] Push failed (will retry next cycle)" >> "$LOG_FILE"
echo "[$DATE] Backup completed" >> "$LOG_FILE"
sudo chmod +x /usr/local/bin/openclaw-backup.sh
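A note on the change-detection step: `git diff --quiet` only inspects tracked files, so a brand-new memory note would be invisible to it, while `git status --porcelain` reports untracked files too. A quick demonstration in a throwaway repository:

```shell
# Show the difference between `git diff` and `git status --porcelain`
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email "demo@example.invalid"
git config user.name "demo"

[ -z "$(git status --porcelain)" ] && echo "clean"

touch new-memory.md                      # untracked file
git diff --quiet && echo "git diff sees no change"
[ -n "$(git status --porcelain)" ] && echo "porcelain sees the new file"
```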
Automated Backup systemd Timer
Create /etc/systemd/system/openclaw-backup.timer:
[Unit]
Description=OpenClaw Config Backup Timer
[Timer]
# Run every 30 minutes (systemd does not allow inline comments after values)
OnCalendar=*:0/30
Persistent=true
[Install]
WantedBy=timers.target
Create /etc/systemd/system/openclaw-backup.service:
[Unit]
Description=OpenClaw Config Backup
After=openclaw-gateway.service
[Service]
Type=oneshot
User=openclaw-gateway
ExecStart=/usr/local/bin/openclaw-backup.sh
sudo systemctl daemon-reload
sudo systemctl enable openclaw-backup.timer
sudo systemctl start openclaw-backup.timer
39.5 Multi-Instance Deployment Isolation
When you need to run multiple OpenClaw instances on the same server (e.g., separate production/staging environments, or isolated instances for different users):
Isolation Principles
Each instance requires:
- Independent agentId: prevents memory and configuration cross-contamination
- Independent working directory: /var/lib/openclaw-prod/, /var/lib/openclaw-staging/
- Independent port: 18789 (production), 18790 (staging)
- Independent systemd service: openclaw-gateway-prod, openclaw-gateway-staging
Multi-Instance systemd Services (Using Template Units)
Create /etc/systemd/system/[email protected]:
[Unit]
Description=OpenClaw Gateway (%i instance)
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=openclaw-%i
EnvironmentFile=/etc/openclaw/%i.env
ExecStart=/usr/bin/openclaw gateway start \
--config /var/lib/openclaw-%i/openclaw.json
WorkingDirectory=/var/lib/openclaw-%i
Restart=always
RestartSec=10s
[Install]
WantedBy=multi-user.target
Start the instances:
# Start the production instance
sudo systemctl enable --now openclaw-gateway@prod
# Start the staging instance
sudo systemctl enable --now openclaw-gateway@staging
39.6 Update Strategy
Update Channel Descriptions
| Channel | Stability | Update Frequency | Recommended For |
|---|---|---|---|
| stable | Most stable | Every 4-8 weeks | Production (recommended) |
| beta | Fairly stable | Every 1-2 weeks | Staging, early adopters |
| dev | May be unstable | Daily | Developers and contributors |
Configuring the Update Channel
# Check the current channel
openclaw update --channel
# Switch to the stable channel (recommended for production)
openclaw update --channel stable
# Check for available updates (without applying them)
openclaw update --check
# OpenClaw 4.2.1 → 4.2.2 (stable)
# Changelog: [Fix] Memory compaction edge case, [Perf] Tool routing 15% faster
Automated Update Script (Run in Maintenance Window)
Create /usr/local/bin/openclaw-update.sh:
#!/bin/bash
set -euo pipefail
LOG="/var/log/openclaw/update.log"
DATE=$(date '+%Y-%m-%d %H:%M:%S')
echo "[$DATE] Checking for updates..." >> "$LOG"
# Check for updates (query once; parse both fields from the same response)
UPDATE_JSON=$(openclaw update --check --json)
UPDATE_AVAILABLE=$(echo "$UPDATE_JSON" | jq -r '.available')
if [ "$UPDATE_AVAILABLE" != "true" ]; then
echo "[$DATE] No update available" >> "$LOG"
exit 0
fi
NEW_VERSION=$(echo "$UPDATE_JSON" | jq -r '.newVersion')
echo "[$DATE] Updating to $NEW_VERSION..." >> "$LOG"
# Back up current configuration
/usr/local/bin/openclaw-backup.sh
# Apply update
sudo npm update -g @openclaw/cli >> "$LOG" 2>&1
# Restart service
sudo systemctl restart openclaw-gateway
# Verify
sleep 5
HEALTH=$(curl -sf http://127.0.0.1:18789/health | jq -r '.status')
if [ "$HEALTH" = "ok" ]; then
echo "[$DATE] Update to $NEW_VERSION completed successfully" >> "$LOG"
else
echo "[$DATE] ERROR: Health check failed after update!" >> "$LOG"
exit 1
fi
Use a systemd timer (with a matching oneshot openclaw-update.service, analogous to the backup service above) to run this every Sunday at 2 AM:
# /etc/systemd/system/openclaw-update.timer
[Unit]
Description=OpenClaw Weekly Update Timer
[Timer]
OnCalendar=Sun 02:00:00
Persistent=true
[Install]
WantedBy=timers.target
39.7 Failure Recovery Procedures
Failure Scenario 1: Gateway Service Crash
# Check service status
sudo systemctl status openclaw-gateway
# View recent error logs
sudo journalctl -u openclaw-gateway --since "10 minutes ago" -p err
# Restart the service
sudo systemctl restart openclaw-gateway
# If it keeps crashing, check for OOM
sudo dmesg | grep -i "oom\|killed"
# Temporarily increase the memory limit
sudo systemctl set-property openclaw-gateway MemoryMax=12G
Failure Scenario 2: Configuration Corruption
# Roll back to the previous git commit
cd /var/lib/openclaw
git log --oneline -10 # View history
git checkout HEAD~1 -- openclaw.json # Roll back a single file
# Validate configuration
openclaw config validate /var/lib/openclaw/openclaw.json
# Restart the service
sudo systemctl restart openclaw-gateway
Failure Scenario 3: Disk Space Exhaustion
# Check disk usage
df -h /
# Clean up logs
sudo journalctl --vacuum-size=500M
sudo find /var/log/openclaw -name "*.log.gz" -mtime +30 -delete
# Identify large files in the memory directory
du -sh /var/lib/openclaw/memory/*
Failure Scenario 4: API Key Compromise Emergency Response
# Immediately stop the Gateway (disconnect all connections)
sudo systemctl stop openclaw-gateway
# Revoke the old key and generate a new one in the LLM Provider console
# Update the environment variable file
sudo vi /etc/openclaw/gateway.env
# Update ANTHROPIC_API_KEY and others
# Restart the Gateway
sudo systemctl start openclaw-gateway
# Audit recent call logs
grep "api_key" /var/log/openclaw/gateway.log | tail -100
39.8 Monitoring and Health Checks
Gateway Health Check Endpoint
# Basic health check
curl http://127.0.0.1:18789/health
# {
# "status": "ok",
# "version": "4.2.1",
# "uptime": 86400,
# "sessions": {
# "active": 2,
# "total": 47
# },
# "nodes": {
# "online": 3,
# "offline": 1
# }
# }
# Detailed status (command line)
openclaw gateway status
Key Monitoring Metrics
| Metric | Normal Range | Alert Threshold |
|---|---|---|
| Session success rate | > 98% | < 95% |
| Tool error rate | < 2% | > 5% |
| Compaction frequency | < 5/hour | > 20/hour |
| Node offline rate | < 5% | > 20% |
| Memory usage | < 70% | > 90% |
| API response time | < 2s | > 10s |
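These thresholds can be wired into a simple timer-driven check. A sketch of the comparison logic with awk (metric retrieval itself is left out; the 95% success-rate threshold comes from the table above, and the function name is illustrative):

```shell
# check_threshold VALUE THRESHOLD MODE
#   MODE "lt": alert when value falls below the threshold (e.g. success rate)
#   MODE "gt": alert when value rises above the threshold (e.g. error rate)
check_threshold() {
  local value=$1 threshold=$2 mode=$3
  if [ "$mode" = "lt" ]; then
    awk -v v="$value" -v t="$threshold" 'BEGIN { exit !(v < t) }' \
      && echo "ALERT" || echo "OK"
  else
    awk -v v="$value" -v t="$threshold" 'BEGIN { exit !(v > t) }' \
      && echo "ALERT" || echo "OK"
  fi
}

check_threshold 97.2 95 lt   # session success rate above threshold → OK
check_threshold 93.0 95 lt   # below threshold → ALERT
check_threshold 6.1  5  gt   # tool error rate above threshold → ALERT
```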
Integration with AWS CloudWatch
# Install the CloudWatch Agent (ARM64 .deb from the official download bucket;
# the agent is not in Ubuntu's apt repositories)
wget https://amazoncloudwatch-agent.s3.amazonaws.com/ubuntu/arm64/latest/amazon-cloudwatch-agent.deb
sudo dpkg -i amazon-cloudwatch-agent.deb
# Configure custom metric push
# Create /etc/openclaw/cloudwatch-config.json
39.9 Summary
With the complete deployment steps covered in this chapter, you have established an OpenClaw Gateway on AWS ARM64 that is:
- Highly available: systemd Restart=always + automatic health checks
- Secure: Tailscale zero-public-exposure + minimum-privilege system user
- Maintainable: git-versioned memories + automated update scripts
- Observable: Structured logs + health check endpoint
The next chapter dives into performance tuning — from token cost reduction to concurrency control — to further unlock the system's performance potential.
Next Chapter: Chapter 40 — Performance Tuning: Token Cost Control, Context Budget Management, and Concurrent Lane Configuration