Chapter 6

Gateway Control Plane: WebSocket Protocol, Three-Step Handshake and Session Resolution

6.1 Why Gateway Binds to localhost

In distributed systems, minimizing the network attack surface is the first line of defense in security design. By default, OpenClaw Gateway binds its WebSocket service to localhost:18789 rather than 0.0.0.0:18789. This seemingly minor configuration decision reflects a deliberate security philosophy.

6.1.1 The Security Foundation of localhost Binding

Binding to 0.0.0.0 means the service listens on all network interfaces, including public IP addresses, VPN interfaces, and container bridges. Any process or remote node that can access the host's network can attempt a connection. For a service holding control over AI Agents, this level of exposure is unacceptable.

Binding to 127.0.0.1 (localhost) restricts connection sources to processes on the same host. External network traffic is dropped at the OS TCP/IP stack level before ever reaching the Gateway process. This provides several layers of protection:

Network Isolation Layers (outside in):

[External Internet]  ──×──→  [Router/Firewall]
                                    │
[VPN/Container Network] ──×──→  [Network Interface]
                                    │
[Other LAN Hosts] ──×──→  [OS TCP/IP Stack]
                                    │
[Same-host Processes] ──────→  [127.0.0.1:18789]  ← Gateway actually listens here
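The effect of the bind address can be sketched with Node's built-in net module. This is an illustration only, not the Gateway's actual server code (which this chapter does not show), and the helper name listenLoopbackOnly is hypothetical.

```typescript
import { createServer, Server } from "node:net";

// Hypothetical helper: start a TCP listener reachable only from the same host.
// Passing "127.0.0.1" as the host keeps every other interface out;
// "0.0.0.0" would accept connections on all IPv4 interfaces.
function listenLoopbackOnly(port: number): Promise<Server> {
  return new Promise((resolve, reject) => {
    const server = createServer((socket) => socket.end());
    server.once("error", reject);
    server.listen(port, "127.0.0.1", () => resolve(server));
  });
}
```

Calling listenLoopbackOnly(18789) would mirror the default described above; port 0 asks the OS for any free port, which is convenient in tests.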

6.1.2 Explicit Trust Boundary Definition

The localhost binding draws a clear trust boundary at the OS process isolation layer: only processes running on the same host, such as local clients or a locally running reverse proxy, can connect.

This design leaves the decision to expose the Gateway to the administrator rather than defaulting to open access. Administrators who need external access must explicitly configure a reverse proxy and accept the associated security responsibility, which is the "secure by default" principle in practice.

6.1.3 Collaborative Defense with dmPolicy

Even if Gateway is exposed to the public internet through a reverse proxy, dmPolicy (Device Management Policy) provides a second line of defense. This defense-in-depth approach ensures that a single configuration mistake does not result in complete system compromise.


6.2 Three WebSocket Message Formats

The Gateway protocol defines three message types that cover all client-server interaction scenarios. All messages are UTF-8 encoded JSON text frames.

6.2.1 Request Message (req)

Format of client-initiated request messages:

{
  "type": "req",
  "id": "7f3a9b2c-4e1d-4f8a-b6e5-2c8d9f0a1b3c",
  "method": "sessions.list",
  "params": {
    "filter": "active",
    "limit": 20
  }
}

Field descriptions:

Field    Type            Description
type     "req"           Fixed identifier marking this as a request message
id       UUID v4 string  Unique request identifier, echoed back in the server response
method   string          RPC method name, using dot-separated hierarchical naming (namespace.action)
params   object          Method parameters, may be an empty object {}

Common method list:

sessions.list          - List all sessions
sessions.create        - Create a new session
sessions.get           - Get details of a single session
sessions.terminate     - Terminate a session
messages.send          - Send a message to a session
agent.interrupt        - Interrupt current Agent execution
config.get             - Read configuration values
config.set             - Write configuration values

6.2.2 Response Message (res)

Server responses to requests:

Success response:

{
  "type": "res",
  "id": "7f3a9b2c-4e1d-4f8a-b6e5-2c8d9f0a1b3c",
  "ok": true,
  "payload": {
    "sessions": [
      {
        "id": "sess_01HXYZ",
        "contextKey": "default",
        "agentId": "agent_claude",
        "status": "idle",
        "createdAt": "2024-01-15T08:30:00Z"
      }
    ],
    "total": 1
  }
}

Error response:

{
  "type": "res",
  "id": "7f3a9b2c-4e1d-4f8a-b6e5-2c8d9f0a1b3c",
  "ok": false,
  "error": {
    "code": "SESSION_NOT_FOUND",
    "message": "Session sess_NOTEXIST does not exist",
    "details": {
      "sessionId": "sess_NOTEXIST"
    }
  }
}

Field descriptions:

Field    Type     Description
type     "res"    Fixed identifier
id       UUID v4  Same as the corresponding request's id
ok       boolean  true on success, false on failure
payload  object   Response data on success (present when ok=true)
error    object   Error information (present when ok=false)
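Because the response echoes the request's id, a client can keep a map of in-flight requests and settle each one when its res arrives. The sketch below is one possible client-side helper; the class name PendingRequests and its methods are hypothetical and not part of the Gateway API.

```typescript
import { randomUUID } from "node:crypto";

type ResMessage =
  | { type: "res"; id: string; ok: true; payload: unknown }
  | { type: "res"; id: string; ok: false; error: { code: string; message: string; details?: unknown } };

// Hypothetical client-side helper: matches each "res" to its "req" by id.
class PendingRequests {
  private pending = new Map<string, { resolve: (p: unknown) => void; reject: (e: Error) => void }>();

  // Builds a req frame and returns it with a promise that settles on the matching res.
  create(method: string, params: object = {}) {
    const id = randomUUID();
    const result = new Promise<unknown>((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
    });
    return { id, frame: JSON.stringify({ type: "req", id, method, params }), result };
  }

  // Feeds a parsed res message in; returns false if no request with that id is waiting.
  settle(res: ResMessage): boolean {
    const entry = this.pending.get(res.id);
    if (!entry) return false;
    this.pending.delete(res.id);
    if (res.ok) entry.resolve(res.payload);
    else entry.reject(new Error(`${res.error.code}: ${res.error.message}`));
    return true;
  }
}
```

A caller would send frame over the socket, pass every incoming res to settle, and await result.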

6.2.3 Event Message (event)

Server-initiated pushes that do not correspond to any request:

{
  "type": "event",
  "event": "session.message",
  "payload": {
    "sessionId": "sess_01HXYZ",
    "messageId": "msg_001",
    "role": "assistant",
    "content": [
      {
        "type": "text",
        "text": "I've analyzed your code and found three issues..."
      }
    ]
  },
  "seq": 42,
  "stateVersion": 7
}

Field descriptions:

Field         Type     Description
type          "event"  Fixed identifier
event         string   Event type name
payload       object   Event data
seq           integer  Globally incrementing sequence number; clients can detect message loss
stateVersion  integer  Gateway internal state version; supports cache validation after reconnection

The distinction between seq and stateVersion: seq is a monotonically increasing counter for message delivery order; stateVersion tracks how many times the Gateway's internal state has changed, allowing clients to determine whether their cached state is still valid.
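A client might use the two counters as sketched below. The sketch assumes that seq increases by exactly 1 for each event delivered on a connection, which the chapter implies but does not state outright; the EventTracker class is hypothetical.

```typescript
interface EventMessage {
  type: "event";
  event: string;
  payload: unknown;
  seq: number;
  stateVersion: number;
}

// Hypothetical client-side tracker for seq gaps and cached-state validity.
class EventTracker {
  private lastSeq: number | null = null;
  private cachedStateVersion: number | null = null;

  // Returns how many events were missed before this one (0 means none).
  observe(ev: EventMessage): number {
    const missed = this.lastSeq === null ? 0 : Math.max(0, ev.seq - this.lastSeq - 1);
    this.lastSeq = ev.seq;
    return missed;
  }

  // Records the stateVersion that the local cache was built from.
  markCached(stateVersion: number): void {
    this.cachedStateVersion = stateVersion;
  }

  // After reconnecting, the cache is reusable only if the Gateway's state has not moved on.
  isCacheValid(currentStateVersion: number): boolean {
    return this.cachedStateVersion === currentStateVersion;
  }
}
```

A non-zero result from observe tells the client to refetch state instead of trusting its local copy.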

Common event types:

session.message         - Agent produced a new message (text/tool call)
session.message.delta   - Streaming message increment
session.status.changed  - Session status change (idle/running/error)
session.tool.started    - Tool began execution
session.tool.completed  - Tool execution completed
gateway.reload          - Gateway configuration hot-reload notification

6.3 Three-Step Handshake Sequence Diagram

Connection establishment is not a simple WebSocket upgrade — it is an explicit three-step authentication handshake.

Client                                          Gateway
  │                                                │
  │  ─── HTTP Upgrade (WebSocket) ──────────→     │
  │                                                │
  │  ←── 101 Switching Protocols ──────────────   │
  │                                                │
  │         ┌──── Step 1: Challenge ───┐           │
  │         │                          │           │
  │  ←── {"type":"challenge",          │           │
  │         "nonce":"a3f8...",          │           │
  │         "timestamp":1705123456}     │           │
  │         └──────────────────────────┘           │
  │                                                │
  │         ┌──── Step 2: Auth Request ──┐         │
  │         │                            │         │
  │  ─── {"type":"connect",              │         │
  │         "role":"client",             │         │
  │         "scopes":["read","write"],   │         │
  │         "credentials":{...},         │         │
  │         "signedNonce":"b9c2..."}     │         │
  │  ─────────────────────────────────────────→   │
  │         └────────────────────────────┘         │
  │                                                │
  │         ┌──── Step 3: Confirmation ──┐         │
  │         │                            │         │
  │  ←── {"type":"hello-ok",             │         │
  │         "protocolVersion":"1.2",      │         │
  │         "features":["streaming",      │         │
  │           "sub-agents"],              │         │
  │         "payloadPolicy":{...}}        │         │
  │         └────────────────────────────┘         │
  │                                                │
  │  ════ Normal Communication Phase ════════════ │

6.3.1 Step 1: Nonce Challenge

Immediately after the WebSocket handshake completes (without waiting for any client data), the Gateway pushes a challenge message:

{
  "type": "challenge",
  "nonce": "a3f8b9c2d1e4f5a6b7c8d9e0f1a2b3c4",
  "timestamp": 1705123456789
}

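The nonce is a random value issued for this connection, and timestamp records when the challenge was issued (milliseconds since the Unix epoch in this example). The Gateway accepts a connect message only within a limited time window after that timestamp. This restriction prevents replay attacks: even if an attacker intercepts an old authentication message, they cannot reuse it after the window expires.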
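A server-side sketch of issuing a challenge and checking its age follows. The 30-second window is an assumed value chosen for illustration, since the chapter does not state the real one, and the function names are hypothetical.

```typescript
import { randomBytes } from "node:crypto";

interface Challenge {
  type: "challenge";
  nonce: string;
  timestamp: number; // milliseconds since the Unix epoch
}

// Assumed value for illustration; the real window is not given in this chapter.
const CHALLENGE_WINDOW_MS = 30_000;

// Creates a challenge with a 128-bit random nonce, hex-encoded to 32 characters.
function newChallenge(nowMs: number = Date.now()): Challenge {
  return { type: "challenge", nonce: randomBytes(16).toString("hex"), timestamp: nowMs };
}

// Returns true if a response arriving at nowMs is still inside the window.
function isChallengeFresh(challenge: Challenge, nowMs: number, windowMs: number = CHALLENGE_WINDOW_MS): boolean {
  const age = nowMs - challenge.timestamp;
  return age >= 0 && age <= windowMs;
}
```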

6.3.2 Step 2: Authentication Request

After receiving the challenge, the client sends a connect message:

{
  "type": "connect",
  "role": "client",
  "scopes": ["sessions.read", "sessions.write", "messages.send"],
  "credentials": {
    "method": "shared-secret",
    "token": "sk-openclaw-..."
  },
  "signedNonce": "<base64url HMAC-SHA256 value, computed as shown below>"
}

The signedNonce calculation:

signedNonce = base64url(HMAC-SHA256(
  key   = credentials.token,
  data  = nonce + ":" + timestamp
))

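This design ensures:

  1. Each connection produces a different signature value, so a captured signedNonce cannot be replayed against a later challenge
  2. The server can verify that the client holds the correct key

Note that the connect message shown above still carries the token in credentials, so the signature by itself does not keep the token off the wire. Connections that leave the host (for example through a reverse proxy) should be protected with TLS.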
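The signedNonce formula maps directly onto Node's crypto module. The sketch below shows both sides: the client signing and the server verifying with a constant-time comparison. The function names are hypothetical, and the use of timingSafeEqual is a general good practice, not something the chapter says the Gateway does.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Client side: base64url(HMAC-SHA256(key = token, data = nonce + ":" + timestamp)).
function signNonce(token: string, nonce: string, timestamp: number): string {
  return createHmac("sha256", token).update(`${nonce}:${timestamp}`).digest("base64url");
}

// Server side: recompute the signature and compare without leaking timing information.
function verifySignedNonce(token: string, nonce: string, timestamp: number, signedNonce: string): boolean {
  const expected = Buffer.from(signNonce(token, nonce, timestamp));
  const actual = Buffer.from(signedNonce);
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}
```

Because the nonce differs per connection, a signature captured from one handshake fails verification against any other challenge.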

6.3.3 Step 3: Hello-OK Confirmation

After successful authentication, the Gateway sends the handshake completion message:

{
  "type": "hello-ok",
  "protocolVersion": "1.2",
  "features": [
    "streaming",
    "sub-agents",
    "tool-interruption",
    "session-branching"
  ],
  "payloadPolicy": {
    "maxMessageSize": 4194304,
    "compressionEnabled": true,
    "streamingChunkSize": 1024
  },
  "sessionId": "conn_01HXYZ",
  "serverTime": 1705123457123
}

Clients should use the features list to determine which advanced features to enable, ensuring backward compatibility.
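Feature checks of this kind can be as small as the sketch below. The HelloOk type lists only the fields used here, and the helper names and the "stream"/"whole" labels are hypothetical.

```typescript
interface HelloOk {
  type: "hello-ok";
  protocolVersion: string;
  features: string[];
}

// Returns true only if the Gateway advertised the feature in hello-ok.
function supportsFeature(hello: HelloOk, feature: string): boolean {
  return hello.features.includes(feature);
}

// Example decision: consume streamed deltas when available, else wait for whole messages.
function pickDeliveryMode(hello: HelloOk): "stream" | "whole" {
  return supportsFeature(hello, "streaming") ? "stream" : "whole";
}
```

Treating an absent feature as "not supported" keeps a newer client usable against an older Gateway.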


6.4 Four Authentication Paths

Gateway supports four authentication methods, suited to different deployment scenarios.

6.4.1 Shared Secret

The simplest authentication method, suitable for single-user or trusted environments:

{
  "credentials": {
    "method": "shared-secret",
    "token": "sk-openclaw-a1b2c3d4e5f6..."
  }
}

Configuration file config/gateway.yaml:

auth:
  sharedSecret:
    enabled: true
    token: "${OPENCLAW_SECRET}"  # Read from environment variable, avoid plaintext storage
    minLength: 32                 # Enforce minimum length

Use cases: Local development, single-user deployment, trusted intranet environments

Security warning: Tokens must be rotated immediately upon compromise; use a key management system (KMS) where possible

6.4.2 Identity Proxy

Pass identity information through a trusted reverse proxy:

{
  "credentials": {
    "method": "trusted-proxy",
    "proxyHeader": "X-Tailscale-User"
  }
}

Client ──→ [Tailscale/nginx] ──X-Tailscale-User: <user email>──→ Gateway

Configuration:

auth:
  trustedProxy:
    enabled: true
    trustedNetworks:
      - "100.64.0.0/10"   # Tailscale CGNAT range
      - "127.0.0.1/32"
    userHeader: "X-Tailscale-User"
    groupHeader: "X-Tailscale-Groups"

Use cases: Enterprise intranet, Tailscale overlay networks, SSO integration

Security requirement: Must ensure Gateway only accepts connections from trusted proxies; otherwise attackers can forge headers
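The security requirement amounts to checking the peer address against trustedNetworks before reading any identity header. The following is a minimal IPv4-only sketch; the function names and the TrustedProxyConfig shape (modelled on the YAML above) are hypothetical, and header names are assumed to arrive lowercased, as Node's HTTP parser delivers them.

```typescript
// Converts a dotted IPv4 address to an unsigned 32-bit number; null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// True if ip falls inside the CIDR block, e.g. inCidr("100.64.1.2", "100.64.0.0/10").
function inCidr(ip: string, cidr: string): boolean {
  const [base, bitsText] = cidr.split("/");
  const bits = Number(bitsText);
  const ipInt = ipv4ToInt(ip);
  const baseInt = ipv4ToInt(base);
  if (ipInt === null || baseInt === null || !Number.isInteger(bits) || bits < 0 || bits > 32) return false;
  const blockSize = 2 ** (32 - bits);
  return Math.floor(ipInt / blockSize) === Math.floor(baseInt / blockSize);
}

interface TrustedProxyConfig {
  trustedNetworks: string[];
  userHeader: string;
}

// Returns the proxied user only when the peer is a trusted proxy; otherwise the header is ignored.
function resolveProxyUser(
  remoteAddress: string,
  headers: Record<string, string | undefined>,
  config: TrustedProxyConfig
): string | null {
  if (!config.trustedNetworks.some((cidr) => inCidr(remoteAddress, cidr))) return null;
  return headers[config.userHeader.toLowerCase()] ?? null;
}
```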

6.4.3 Device Token

Devices obtain a long-lived token after initial pairing and reuse it for subsequent connections:

{
  "credentials": {
    "method": "device-token",
    "deviceId": "dev_01HXYZ",
    "token": "dt-a1b2c3..."
  }
}

Pairing flow:

1. Device initiates pairing request → Gateway generates verification code (6 digits)
2. Administrator confirms the code in the console
3. Gateway issues a device token, stored in devices.json
4. Subsequent connections use the device token, no re-pairing needed

Use cases: IoT devices, unattended automation nodes, mobile clients

Security feature: Device tokens can be individually revoked without affecting other devices
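The two secrets in the pairing flow can be generated with Node's cryptographic random source, as in the sketch below. The token length and the function names are assumptions; only the 6-digit code and the "dt-" prefix come from the examples above.

```typescript
import { randomBytes, randomInt } from "node:crypto";

// 6-digit verification code, zero-padded so that values below 100000 keep their length.
function newPairingCode(): string {
  return randomInt(0, 1_000_000).toString().padStart(6, "0");
}

// Long-lived device token; 24 random bytes is an assumed size, hex-encoded after the "dt-" prefix.
function newDeviceToken(): string {
  return "dt-" + randomBytes(24).toString("hex");
}
```

Using randomInt rather than Math.random matters here because the code gates device access and should not be predictable.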

6.4.4 Bootstrap Token

A one-time, high-privilege token for initial configuration or emergency recovery:

# Set only in configuration file, never exposed through the API
auth:
  bootstrapToken:
    token: "${OPENCLAW_BOOTSTRAP}"
    allowedScopes:
      - "admin.config"
      - "devices.pair"
    expiresAt: "2024-02-01T00:00:00Z"  # Expiration time is mandatory

Use cases: Initial deployment, admin password reset, emergency access

Security warning: Disable immediately after use, or set an extremely short expiration time


6.5 dmPolicy: Four Modes

Device Management Policy controls which clients can connect to the Gateway.

6.5.1 pairing (Default Mode)

gateway:
  dmPolicy: pairing

Workflow:

New device attempts to connect
           │
           ▼
Gateway generates 6-digit verification code
           │
           ▼
Displayed in Gateway console/logs
           │
           ▼
Admin manually confirms ──No──→ Connection rejected
           │
          Yes
           ▼
Device added to allowlist, device token issued

Characteristics: Human review of every new device, prevents unauthorized device access

Use cases: Production environments, multi-user teams, security-sensitive deployments

6.5.2 allowlist (Whitelist Mode)

gateway:
  dmPolicy: allowlist
  allowedDevices:
    - deviceId: "dev_laptop_alice"
      name: "Alice's Work Laptop"
      publicKey: "ssh-ed25519 AAAA..."
    - deviceId: "dev_ci_runner"
      name: "CI/CD Build Machine"
      publicKey: "ssh-ed25519 BBBB..."

Only explicitly listed devices can connect. Unlisted devices are rejected immediately without generating verification codes.

Use cases: Fixed device sets, pre-configured enterprise environments

6.5.3 open (Open Mode)

gateway:
  dmPolicy: open
  # Device pre-registration is skipped, but valid credentials are still required
  requireValidCredentials: true

Any client with valid credentials can connect without pre-registration.

Security trade-off: Convenient for development and testing, but loses device-level access control. Must be used in conjunction with strong credential authentication (shared secret or identity proxy).

Use cases: Development and test environments, one-off scripting tools

6.5.4 disabled (Completely Disabled)

gateway:
  dmPolicy: disabled

Completely disables the device management feature. All device management APIs (pairing, revocation, listing) return a FEATURE_DISABLED error.

Use cases: Deployments that rely entirely on external IAM systems (such as Tailscale ACLs) for access control
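The four modes can be summarized as one decision function. This is a sketch of the behaviour as described in this section, not the Gateway's implementation: the names are hypothetical, and treating disabled as "skip device checks and leave access control to the external system" is an interpretation of the use case above. Credential validation is assumed to happen separately.

```typescript
type DmPolicy = "pairing" | "allowlist" | "open" | "disabled";
type DeviceDecision = "accept" | "reject" | "start-pairing";

// Decides what happens to a connecting device, given the set of already registered device IDs.
function decideDeviceAccess(policy: DmPolicy, deviceId: string, knownDevices: Set<string>): DeviceDecision {
  switch (policy) {
    case "pairing":
      // Unknown devices enter the verification-code flow instead of being rejected outright.
      return knownDevices.has(deviceId) ? "accept" : "start-pairing";
    case "allowlist":
      // Unknown devices are rejected immediately; no verification code is generated.
      return knownDevices.has(deviceId) ? "accept" : "reject";
    case "open":
    case "disabled":
      // No device-level check; valid credentials or external IAM are the only gate.
      return "accept";
  }
}
```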


6.6 Session Key Generation Algorithm and sessions.json Storage

6.6.1 The Three-Segment Structure of Session Keys

A Session Key is the composite key that uniquely identifies a conversation session in OpenClaw, in the format:

agent:<agentId>:<contextKey>

Examples:

agent:claude-3-opus:workspace-alice
agent:gpt-4:project-backend
agent:claude-3-sonnet:default

The three components:

Component   Description                                           Example
agent       Fixed prefix indicating this is an Agent session key  agent
agentId     Unique ID defined in the Agent configuration          claude-3-opus
contextKey  Distinguishes different contexts for the same Agent   workspace-alice
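One detail worth handling when parsing: a contextKey may itself contain colons (for example dm:user123), so a key should be split at the first two colons only. The sketch below assumes that agentId never contains a colon; both function names are hypothetical.

```typescript
// Builds "agent:<agentId>:<contextKey>". Assumes agentId contains no ":".
function buildSessionKey(agentId: string, contextKey: string): string {
  if (agentId.includes(":")) throw new Error("agentId must not contain ':'");
  return `agent:${agentId}:${contextKey}`;
}

// Splits a Session Key back into its parts; everything after the second colon is the contextKey.
function parseSessionKey(key: string): { agentId: string; contextKey: string } | null {
  const match = /^agent:([^:]+):(.+)$/.exec(key);
  return match ? { agentId: match[1], contextKey: match[2] } : null;
}
```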

6.6.2 contextKey Generation Rules

contextKey is automatically generated based on the channel source:

// Pseudocode: contextKey generation logic
function generateContextKey(source) {
  switch (source.type) {
    case "discord-dm":
      return `dm:${source.userId}`;
    case "discord-channel":
      return `channel:${source.channelId}`;
    case "slack-thread":
      return `thread:${source.threadTs}`;
    case "api-explicit":
      return source.contextKey;  // Client specifies explicitly
    default:
      return "default";
  }
}

6.6.3 sessions.json Storage Format

Session state is persisted in data/sessions.json:

{
  "version": 2,
  "sessions": {
    "agent:claude-3-opus:dm:user123": {
      "id": "sess_01HXYZ",
      "key": "agent:claude-3-opus:dm:user123",
      "agentId": "claude-3-opus",
      "contextKey": "dm:user123",
      "status": "idle",
      "createdAt": "2024-01-15T08:00:00Z",
      "lastActiveAt": "2024-01-15T09:30:00Z",
      "messageCount": 47,
      "transcriptPath": "data/transcripts/sess_01HXYZ.jsonl",
      "metadata": {
        "userId": "user123",
        "channelType": "discord-dm",
        "guildId": null
      }
    }
  },
  "updatedAt": "2024-01-15T09:30:00Z"
}

6.6.4 Session Lookup Algorithm

Input: agentId + source information
           │
           ▼
Compute contextKey (based on source type)
           │
           ▼
Construct Session Key = "agent:" + agentId + ":" + contextKey
           │
           ▼
Look up the key in sessions.json
           │
     ┌─────┴─────┐
  Found         Not Found
     │               │
     ▼               ▼
Check freshness  Create new Session
(see ch10)       and persist it
     │
     ▼
Return existing Session
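The flow above can be sketched as a find-or-create function over an in-memory map that stands in for the "sessions" object of sessions.json. The function name, the sess_ plus UUID identifier format, and the omission of the freshness check and of disk persistence are all simplifications made for this illustration.

```typescript
import { randomUUID } from "node:crypto";

interface Session {
  id: string;
  key: string;
  agentId: string;
  contextKey: string;
  status: "idle" | "running" | "error";
  createdAt: string;
  lastActiveAt: string;
  messageCount: number;
}

// In-memory stand-in for the "sessions" object in sessions.json, keyed by Session Key.
type SessionStore = Map<string, Session>;

function resolveSession(store: SessionStore, agentId: string, contextKey: string, now: Date = new Date()) {
  const key = `agent:${agentId}:${contextKey}`;
  const existing = store.get(key);
  if (existing) {
    // The freshness check from the flow is omitted in this sketch.
    return { session: existing, created: false };
  }
  const timestamp = now.toISOString();
  const session: Session = {
    id: `sess_${randomUUID()}`, // hypothetical ID scheme
    key,
    agentId,
    contextKey,
    status: "idle",
    createdAt: timestamp,
    lastActiveAt: timestamp,
    messageCount: 0,
  };
  store.set(key, session); // a real implementation would also persist to disk
  return { session, created: true };
}
```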

6.7 Single-Writer Mode: Eliminating Distributed Consistency Problems

6.7.1 The Core Problem

In traditional distributed AI systems, multiple nodes may simultaneously hold session state and attempt to modify it:

[Node A] ──read session state──→ [Shared Storage]
[Node B] ──read session state──→ [Shared Storage]
[Node A] ──write (append msg)──→ [Shared Storage]  ← Overwrites Node B's view!
[Node B] ──write (append msg)──→ [Shared Storage]  ← State inconsistency!

This requires introducing distributed locks, CAS (compare-and-swap) operations, version vectors, and other complex mechanisms.

6.7.2 The Single-Writer Solution

OpenClaw Gateway adopts a simpler strategy: each Session has exactly one writer — the Gateway itself.

[Client A]  ──req──→ [Gateway] ──write──→ [sessions.json]
[Client B]  ──req──→ [Gateway] ──write──→ [sessions.json]

Inside Gateway: Command Queue's Session Lane (serial)
                 ensures operations on the same Session execute sequentially

All modifications to a Session are serialized through the Gateway's Session Lane (a serial queue). Clients only send requests; the Gateway is the sole state writer.
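A serial lane per session can be approximated by chaining promises: each new task starts only after the previous task for the same Session Key has settled. This is a minimal sketch of the idea, not the Command Queue's actual implementation; the SessionLanes class is hypothetical and never evicts idle lanes.

```typescript
// Hypothetical per-session serial queue: tasks for one key run strictly one after another,
// while tasks for different keys may interleave.
class SessionLanes {
  private tails = new Map<string, Promise<unknown>>();

  run<T>(sessionKey: string, task: () => Promise<T>): Promise<T> {
    const tail = this.tails.get(sessionKey) ?? Promise.resolve();
    const next = tail.then(() => task());
    // Store a tail that never rejects, so one failed task does not block the lane.
    this.tails.set(sessionKey, next.catch(() => undefined));
    return next;
  }
}
```

With this in place, two clients appending to the same session cannot interleave their writes, because the second append does not begin until the first has finished.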

6.7.3 Practical Outcomes

Problem                     Traditional Distributed Approach         Single-Writer Approach
Concurrent write conflicts  Requires distributed locks               Non-existent (serialized)
State inconsistency         Requires eventual consistency protocols  Non-existent (single storage)
Read-write consistency      Requires memory barriers                 Naturally guaranteed
Failure recovery            Requires log replay                      sessions.json is the source of truth
Complexity                  Extremely high                           Extremely low

The trade-off: the Gateway becomes a single point of failure. For an AI Agent control plane this is reasonable: the Gateway typically runs on the same machine as the user, and high availability is handled at a higher layer (such as Kubernetes Pod restarts) rather than through distribution inside the Gateway.


6.8 Complete Connection Example

The following is a complete WebSocket interaction example showing the full flow from connection to session creation:

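// Client implementation example (Node.js / TypeScript)
import { WebSocket } from "ws";
import { createHmac, randomUUID } from "node:crypto";

// Fail fast if the shared secret is missing instead of signing with undefined.
const secret = process.env.OPENCLAW_SECRET;
if (!secret) throw new Error("OPENCLAW_SECRET is not set");

const ws = new WebSocket("ws://localhost:18789");

ws.on("message", (data) => {
  const msg = JSON.parse(data.toString());

  if (msg.type === "challenge") {
    // Step 2: sign "<nonce>:<timestamp>" with the shared secret and send the connect message
    const signedNonce = createHmac("sha256", secret)
      .update(`${msg.nonce}:${msg.timestamp}`)
      .digest("base64url");

    ws.send(JSON.stringify({
      type: "connect",
      role: "client",
      scopes: ["sessions.read", "sessions.write", "messages.send"],
      credentials: {
        method: "shared-secret",
        token: secret
      },
      signedNonce
    }));
  } else if (msg.type === "hello-ok") {
    // Step 3 completed: the handshake is done and requests may be sent
    console.log(`Protocol version: ${msg.protocolVersion}`);
    console.log(`Supported features: ${msg.features.join(", ")}`);

    // Send first request; randomUUID() produces the UUID v4 that the response will echo
    ws.send(JSON.stringify({
      type: "req",
      id: randomUUID(),
      method: "sessions.list",
      params: {}
    }));
  } else if (msg.type === "res") {
    if (msg.ok) {
      console.log("Session list:", msg.payload.sessions);
    } else {
      // Error responses carry code and message instead of payload
      console.error(`Request failed: ${msg.error.code}: ${msg.error.message}`);
    }
  } else if (msg.type === "event" && msg.event === "session.message") {
    console.log(`[seq:${msg.seq}] Agent message:`, msg.payload.content);
  }
});

// Surface connection failures (for example, the Gateway is not running)
ws.on("error", (err) => console.error("Connection error:", err));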

Chapter Summary

This chapter explored the core mechanisms of the OpenClaw Gateway control plane in depth:

  1. localhost binding is the first line of security defense, delegating network exposure decisions to the administrator's proxy configuration
  2. Three message formats (req/res/event) cover all interaction scenarios; seq lets clients detect lost events, and stateVersion lets them validate cached state after reconnection
  3. Three-step handshake (nonce challenge → auth request → hello-ok) combined with HMAC-SHA256 signing prevents replay attacks
  4. Four authentication paths accommodate different security requirements from local development to enterprise deployment
  5. dmPolicy four modes provide flexible device access control, from fully open to strict allowlist
  6. Session Key three-segment composite structure (agent:ID:context) supports precise identification across multiple Agents and contexts
  7. Single-writer mode eliminates distributed consistency problems entirely in exchange for architectural simplicity

The next chapter dives into Command Queue implementation, understanding how Lane isolation ensures correct concurrency semantics for different types of tasks.
