Chapter 6: Gateway Control Plane: WebSocket Protocol, Three-Step Handshake, and Session Resolution Algorithm
6.1 Why Gateway Binds to localhost
In distributed systems, minimizing the network attack surface is the first line of defense in security design. OpenClaw Gateway binds its WebSocket service to localhost:18789 by default rather than 0.0.0.0:18789. This seemingly minor configuration decision reflects a deliberate security philosophy.
6.1.1 The Security Foundation of localhost Binding
Binding to 0.0.0.0 means the service listens on all network interfaces, including public IP addresses, VPN interfaces, and container bridges. Any process or remote node that can access the host's network can attempt a connection. For a service holding control over AI Agents, this level of exposure is unacceptable.
Binding to 127.0.0.1 (localhost) restricts connection sources to processes on the same host. External network traffic is dropped at the OS TCP/IP stack level before ever reaching the Gateway process. This provides several layers of protection:
Network Isolation Layers (outside in):
[External Internet] ──×──→ [Router/Firewall]
│
[VPN/Container Network] ──×──→ [Network Interface]
│
[Other LAN Hosts] ──×──→ [OS TCP/IP Stack]
│
[Same-host Processes] ──────→ [127.0.0.1:18789] ← Gateway actually listens here
6.1.2 Explicit Trust Boundary Definition
The localhost binding draws a clear trust boundary at the OS process isolation layer. Only the following entities can legitimately connect:
- CLI clients on the same host (openclaw command-line tool)
- Web UI on the same host (proxied through a local HTTP server)
- Reverse proxies (nginx, caddy) explicitly configured by an administrator
This design shifts network exposure responsibility to the administrator's proxy configuration rather than defaulting to open access. Administrators who need to expose the Gateway externally must explicitly configure a reverse proxy and accept the associated security responsibility — a manifestation of the "secure by default" principle.
6.1.3 Collaborative Defense with dmPolicy
Even if Gateway is exposed to the public internet through a reverse proxy, dmPolicy (Device Management Policy) provides a second line of defense. This defense-in-depth approach ensures that a single configuration mistake does not result in complete system compromise.
6.2 Three WebSocket Message Formats
The Gateway protocol defines three message types that cover all client-server interaction scenarios. All messages are UTF-8 encoded JSON text frames.
6.2.1 Request Message (req)
Format of client-initiated request messages:
{
"type": "req",
"id": "7f3a9b2c-4e1d-4f8a-b6e5-2c8d9f0a1b3c",
"method": "sessions.list",
"params": {
"filter": "active",
"limit": 20
}
}
Field descriptions:
| Field | Type | Description |
|---|---|---|
| type | "req" | Fixed identifier marking this as a request message |
| id | UUID v4 string | Unique request identifier, echoed back in the server response |
| method | string | RPC method name, using dot-separated hierarchical naming (namespace.action) |
| params | object | Method parameters, may be an empty object {} |
Common method list:
sessions.list - List all sessions
sessions.create - Create a new session
sessions.get - Get details of a single session
sessions.terminate - Terminate a session
messages.send - Send a message to a session
agent.interrupt - Interrupt current Agent execution
config.get - Read configuration values
config.set - Write configuration values
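A request frame can be assembled with a small helper. The sketch below is illustrative only: `buildRequest` and the `RequestMessage` interface are hypothetical names, not part of any published OpenClaw client, and the method-name check simply encodes the dot-separated convention described above.

```typescript
import { randomUUID } from "node:crypto";

// Shape of a request frame, following the field table above.
interface RequestMessage {
  type: "req";
  id: string;
  method: string;
  params: Record<string, unknown>;
}

// Hypothetical builder: fills in the fixed type and a fresh UUID v4.
function buildRequest(
  method: string,
  params: Record<string, unknown> = {}
): RequestMessage {
  // Require at least two dot-separated segments (namespace.action).
  if (!/^\w+(\.\w+)+$/.test(method)) {
    throw new Error(`method must look like namespace.action, got "${method}"`);
  }
  return { type: "req", id: randomUUID(), method, params };
}

// Serialize to the JSON text frame that would be sent over the socket.
const frame = JSON.stringify(
  buildRequest("sessions.list", { filter: "active", limit: 20 })
);
console.log(frame);
```

The caller would keep the generated id so that the matching res message can be recognized when it arrives.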
6.2.2 Response Message (res)
Server responses to requests:
Success response:
{
"type": "res",
"id": "7f3a9b2c-4e1d-4f8a-b6e5-2c8d9f0a1b3c",
"ok": true,
"payload": {
"sessions": [
{
"id": "sess_01HXYZ",
"contextKey": "default",
"agentId": "agent_claude",
"status": "idle",
"createdAt": "2024-01-15T08:30:00Z"
}
],
"total": 1
}
}
Error response:
{
"type": "res",
"id": "7f3a9b2c-4e1d-4f8a-b6e5-2c8d9f0a1b3c",
"ok": false,
"error": {
"code": "SESSION_NOT_FOUND",
"message": "Session sess_NOTEXIST does not exist",
"details": {
"sessionId": "sess_NOTEXIST"
}
}
}
| Field | Type | Description |
|---|---|---|
| type | "res" | Fixed identifier |
| id | UUID v4 | Same as the corresponding request's id |
| ok | boolean | true on success, false on failure |
| payload | object | Response data on success (present when ok=true) |
| error | object | Error information (present when ok=false) |
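Because ok decides which of payload and error is present, a client can model the response as a discriminated union. The following is a minimal sketch under that reading; `unwrapResponse` and `GatewayRequestError` are hypothetical names introduced for illustration.

```typescript
interface GatewayError {
  code: string;
  message: string;
  details?: Record<string, unknown>;
}

// ok=true carries payload, ok=false carries error (see the table above).
type ResponseMessage =
  | { type: "res"; id: string; ok: true; payload: Record<string, unknown> }
  | { type: "res"; id: string; ok: false; error: GatewayError };

// Hypothetical error class that preserves the machine-readable code.
class GatewayRequestError extends Error {
  readonly code: string;
  constructor(code: string, message: string) {
    super(message);
    this.name = "GatewayRequestError";
    this.code = code;
  }
}

// Returns the payload on success; throws on failure or on an id mismatch.
function unwrapResponse(
  res: ResponseMessage,
  expectedId: string
): Record<string, unknown> {
  if (res.id !== expectedId) {
    throw new Error(`response id ${res.id} does not match request ${expectedId}`);
  }
  if (res.ok) return res.payload;
  throw new GatewayRequestError(res.error.code, res.error.message);
}
```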
6.2.3 Event Message (event)
Server-initiated pushes that do not correspond to any request:
{
"type": "event",
"event": "session.message",
"payload": {
"sessionId": "sess_01HXYZ",
"messageId": "msg_001",
"role": "assistant",
"content": [
{
"type": "text",
"text": "I've analyzed your code and found three issues..."
}
]
},
"seq": 42,
"stateVersion": 7
}
| Field | Type | Description |
|---|---|---|
| type | "event" | Fixed identifier |
| event | string | Event type name |
| payload | object | Event data |
| seq | integer | Globally incrementing sequence number; clients can detect message loss |
| stateVersion | integer | Gateway internal state version; supports cache validation after reconnection |
The distinction between seq and stateVersion: seq is a monotonically increasing counter for message delivery order; stateVersion tracks how many times the Gateway's internal state has changed, allowing clients to determine whether their cached state is still valid.
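A client might track both counters as sketched below. This is an illustration, not the Gateway's own logic: `EventTracker` is a hypothetical name, and the gap calculation assumes that every event delivered to a connection carries a seq exactly one higher than the previous one, which the text above does not state explicitly.

```typescript
interface EventMeta {
  seq: number;
  stateVersion: number;
}

class EventTracker {
  private lastSeq: number | null = null;
  private lastStateVersion: number | null = null;

  // Reports how many events appear to have been missed since the previous
  // one, and whether the Gateway state changed (cached state may be stale).
  observe(ev: EventMeta): { missed: number; stateChanged: boolean } {
    const missed =
      this.lastSeq === null ? 0 : Math.max(0, ev.seq - this.lastSeq - 1);
    const stateChanged =
      this.lastStateVersion !== null &&
      ev.stateVersion !== this.lastStateVersion;
    this.lastSeq = ev.seq;
    this.lastStateVersion = ev.stateVersion;
    return { missed, stateChanged };
  }
}
```

A non-zero missed count suggests the client should resynchronize, while stateChanged alone only means cached views should be revalidated.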
Common event types:
session.message - Agent produced a new message (text/tool call)
session.message.delta - Streaming message increment
session.status.changed - Session status change (idle/running/error)
session.tool.started - Tool began execution
session.tool.completed - Tool execution completed
gateway.reload - Gateway configuration hot-reload notification
6.3 Three-Step Handshake Sequence Diagram
Connection establishment is not a simple WebSocket upgrade — it is an explicit three-step authentication handshake.
Client Gateway
│ │
│ ─── HTTP Upgrade (WebSocket) ──────────→ │
│ │
│ ←── 101 Switching Protocols ────────────── │
│ │
│ ┌──── Step 1: Challenge ───┐ │
│ │ │ │
│ ←── {"type":"challenge", │ │
│ "nonce":"a3f8...", │ │
│ "timestamp":1705123456} │ │
│ └──────────────────────────┘ │
│ │
│ ┌──── Step 2: Auth Request ──┐ │
│ │ │ │
│ ─── {"type":"connect", │ │
│ "role":"client", │ │
│ "scopes":["read","write"], │ │
│ "credentials":{...}, │ │
│ "signedNonce":"b9c2..."} │ │
│ ─────────────────────────────────────────→ │
│ └────────────────────────────┘ │
│ │
│ ┌──── Step 3: Confirmation ──┐ │
│ │ │ │
│ ←── {"type":"hello-ok", │ │
│ "protocolVersion":"1.2", │ │
│ "features":["streaming", │ │
│ "sub-agents"], │ │
│ "payloadPolicy":{...}} │ │
│ └────────────────────────────┘ │
│ │
│ ════ Normal Communication Phase ════════════ │
6.3.1 Step 1: Nonce Challenge
Immediately after the WebSocket handshake completes (without waiting for any client data), the Gateway pushes a challenge message:
{
"type": "challenge",
"nonce": "a3f8b9c2d1e4f5a6b7c8d9e0f1a2b3c4",
"timestamp": 1705123456789
}
- nonce: random data, hex-encoded, uniquely generated per connection (the example above shows 32 hex characters, which corresponds to 16 bytes)
- timestamp: Server's current Unix timestamp in milliseconds; clients must complete authentication within 30 seconds
The time window restriction prevents replay attacks: even if an attacker intercepts an old authentication message, they cannot reuse it after the timeout expires.
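The challenge step can be sketched as follows. The function names are hypothetical, and the 16-byte nonce length is an assumption chosen to match the length of the example nonce; the 30-second window comes from the description above.

```typescript
import { randomBytes } from "node:crypto";

const AUTH_WINDOW_MS = 30_000; // clients must authenticate within 30 seconds

interface Challenge {
  type: "challenge";
  nonce: string;
  timestamp: number; // Unix time in milliseconds
}

// Creates a fresh challenge; 16 random bytes give 32 hex characters.
function createChallenge(now: number = Date.now()): Challenge {
  return {
    type: "challenge",
    nonce: randomBytes(16).toString("hex"),
    timestamp: now,
  };
}

// True while the authentication window for this challenge is still open.
function isWithinWindow(challenge: Challenge, now: number = Date.now()): boolean {
  const elapsed = now - challenge.timestamp;
  return elapsed >= 0 && elapsed <= AUTH_WINDOW_MS;
}
```

A server following this sketch would reject any connect message that arrives after isWithinWindow returns false, regardless of whether the signature is valid.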
6.3.2 Step 2: Authentication Request
After receiving the challenge, the client sends a connect message:
{
"type": "connect",
"role": "client",
"scopes": ["sessions.read", "sessions.write", "messages.send"],
"credentials": {
"method": "shared-secret",
"token": "sk-openclaw-..."
},
"signedNonce": "<base64url HMAC-SHA256 of nonce:timestamp, keyed with token>"
}
The signedNonce calculation:
signedNonce = base64url(HMAC-SHA256(
key = credentials.token,
data = nonce + ":" + timestamp
))
This design ensures:
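- The signature never reveals the token: an eavesdropper who sees only signedNonce cannot recover the key. Note that the shared-secret example above also carries credentials.token in the message, so this property holds only when that field is omitted or the transport is encrypted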
- Each connection produces a different signature value (prevents replay attacks)
- The server can verify that the client holds the correct key
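The calculation and its server-side check can be sketched with Node's built-in crypto module. The function names are hypothetical; the HMAC input and base64url encoding follow the formula above, and the constant-time comparison is an added precaution rather than something the text specifies.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Client side: sign "nonce:timestamp" with the token as the HMAC key.
function signNonce(token: string, nonce: string, timestamp: number): string {
  return createHmac("sha256", token)
    .update(`${nonce}:${timestamp}`)
    .digest("base64url");
}

// Server side: recompute the signature and compare in constant time.
function verifySignedNonce(
  token: string,
  nonce: string,
  timestamp: number,
  signedNonce: string
): boolean {
  const expected = Buffer.from(signNonce(token, nonce, timestamp));
  const received = Buffer.from(signedNonce);
  // timingSafeEqual throws on unequal lengths, so check the length first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Because the nonce changes on every connection, a signature captured from one handshake does not verify against the challenge of another.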
6.3.3 Step 3: Hello-OK Confirmation
After successful authentication, the Gateway sends the handshake completion message:
{
"type": "hello-ok",
"protocolVersion": "1.2",
"features": [
"streaming",
"sub-agents",
"tool-interruption",
"session-branching"
],
"payloadPolicy": {
"maxMessageSize": 4194304,
"compressionEnabled": true,
"streamingChunkSize": 1024
},
"sessionId": "conn_01HXYZ",
"serverTime": 1705123457123
}
Clients should use the features list to determine which advanced features to enable, ensuring backward compatibility.
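As a rough illustration of feature negotiation, a client could intersect the features it wants with those the Gateway advertises. The helper names are hypothetical, and treating maxMessageSize as a byte count is an assumption.

```typescript
interface HelloOk {
  type: "hello-ok";
  protocolVersion: string;
  features: string[];
  payloadPolicy: {
    maxMessageSize: number;
    compressionEnabled: boolean;
    streamingChunkSize: number;
  };
}

// Keep only the wanted features that the Gateway actually advertises.
function enabledFeatures(hello: HelloOk, wanted: string[]): string[] {
  const offered = new Set(hello.features);
  return wanted.filter((f) => offered.has(f));
}

// Assumption: maxMessageSize limits the UTF-8 byte length of one frame.
function fitsPayloadPolicy(hello: HelloOk, frame: string): boolean {
  return Buffer.byteLength(frame, "utf8") <= hello.payloadPolicy.maxMessageSize;
}
```

A client that wants streaming but finds it missing from the result would fall back to waiting for complete session.message events.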
6.4 Four Authentication Paths
Gateway supports four authentication methods, suited to different deployment scenarios.
6.4.1 Shared Secret
The simplest authentication method, suitable for single-user or trusted environments:
{
"credentials": {
"method": "shared-secret",
"token": "sk-openclaw-a1b2c3d4e5f6..."
}
}
Configuration file config/gateway.yaml:
auth:
  sharedSecret:
    enabled: true
    token: "${OPENCLAW_SECRET}" # Read from environment variable, avoid plaintext storage
    minLength: 32 # Enforce minimum length
Use cases: Local development, single-user deployment, trusted intranet environments.
Security warning: Tokens must be rotated immediately upon compromise; use a key management system (KMS) where possible.
6.4.2 Identity Proxy
Pass identity information through a trusted reverse proxy:
{
"credentials": {
"method": "trusted-proxy",
"proxyHeader": "X-Tailscale-User"
}
}
Client ──→ [Tailscale/nginx] ──X-Tailscale-User: <user email>──→ Gateway
Configuration:
auth:
  trustedProxy:
    enabled: true
    trustedNetworks:
      - "100.64.0.0/10" # Tailscale CGNAT range
      - "127.0.0.1/32"
    userHeader: "X-Tailscale-User"
    groupHeader: "X-Tailscale-Groups"
Use cases: Enterprise intranet, Tailscale overlay networks, SSO integration.
Security requirement: The Gateway must accept connections only from trusted proxies; otherwise attackers can forge the identity headers.
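The trustedNetworks check amounts to testing whether the peer address falls inside one of the configured CIDR blocks. The sketch below handles IPv4 only and uses hypothetical helper names; it shows the kind of check implied by the configuration, not the Gateway's actual code.

```typescript
// Convert a dotted-quad IPv4 address to an unsigned 32-bit number.
function ipv4ToInt(ip: string): number {
  const parts = ip.split(".");
  if (parts.length !== 4) throw new Error(`not an IPv4 address: ${ip}`);
  return parts.reduce((acc, p) => {
    const n = Number(p);
    if (!/^\d{1,3}$/.test(p) || n > 255) {
      throw new Error(`not an IPv4 address: ${ip}`);
    }
    return acc * 256 + n;
  }, 0);
}

// True if ip lies inside the block written as "base/prefixLength".
function inCidr(ip: string, cidr: string): boolean {
  const [base, bitsText] = cidr.split("/");
  const bits = Number(bitsText);
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) {
    throw new Error(`invalid CIDR: ${cidr}`);
  }
  const blockSize = 2 ** (32 - bits); // number of addresses in the block
  return (
    Math.floor(ipv4ToInt(ip) / blockSize) ===
    Math.floor(ipv4ToInt(base) / blockSize)
  );
}

// Identity headers should be honored only when the peer is a trusted proxy.
function isTrustedPeer(remoteAddress: string, trustedNetworks: string[]): boolean {
  return trustedNetworks.some((cidr) => inCidr(remoteAddress, cidr));
}
```

With the configuration above, a connection from 100.100.1.2 would be trusted, while one from an address outside both ranges would have its X-Tailscale-User header ignored.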
6.4.3 Device Token
Devices obtain a long-lived token after initial pairing and reuse it for subsequent connections:
{
"credentials": {
"method": "device-token",
"deviceId": "dev_01HXYZ",
"token": "dt-a1b2c3..."
}
}
Pairing flow:
1. Device initiates pairing request → Gateway generates verification code (6 digits)
2. Administrator confirms the code in the console
3. Gateway issues a device token, stored in devices.json
4. Subsequent connections use the device token, no re-pairing needed
Use cases: IoT devices, unattended automation nodes, mobile clients.
Security feature: Device tokens can be individually revoked without affecting other devices.
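The pairing flow above can be modeled with a small in-memory registry. Everything here is a sketch: `PairingRegistry` is a hypothetical class, the token length is an assumption (only the dt- prefix appears in the example), and persistence to devices.json is omitted.

```typescript
import { randomBytes, randomInt } from "node:crypto";

class PairingRegistry {
  private pending = new Map<string, string>(); // deviceId -> verification code
  private tokens = new Map<string, string>(); // deviceId -> device token

  // Step 1: the device asks to pair and a 6-digit code is generated.
  requestPairing(deviceId: string): string {
    const code = randomInt(0, 1_000_000).toString().padStart(6, "0");
    this.pending.set(deviceId, code);
    return code;
  }

  // Steps 2-3: the administrator confirms the code and a token is issued.
  confirm(deviceId: string, code: string): string | null {
    if (this.pending.get(deviceId) !== code) return null;
    this.pending.delete(deviceId);
    const token = "dt-" + randomBytes(24).toString("hex");
    this.tokens.set(deviceId, token);
    return token;
  }

  // Step 4: later connections present the stored token.
  isValid(deviceId: string, token: string): boolean {
    return this.tokens.get(deviceId) === token;
  }

  // Revoking one device leaves all other tokens untouched.
  revoke(deviceId: string): boolean {
    return this.tokens.delete(deviceId);
  }
}
```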
6.4.4 Bootstrap Token
A one-time, high-privilege token for initial configuration or emergency recovery:
# Set only in configuration file, never exposed through the API
auth:
  bootstrapToken:
    token: "${OPENCLAW_BOOTSTRAP}"
    allowedScopes:
      - "admin.config"
      - "devices.pair"
    expiresAt: "2024-02-01T00:00:00Z" # Expiration time is mandatory
Use cases: Initial deployment, admin password reset, emergency access.
Security warning: Disable immediately after use, or set an extremely short expiration time.
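A check against this configuration has to consider the token value, the expiry, and the requested scopes. The sketch below uses hypothetical names and result codes; in particular, treating an unparseable expiresAt as expired is an assumption that follows from expiration being mandatory.

```typescript
interface BootstrapTokenConfig {
  token: string;
  allowedScopes: string[];
  expiresAt: string; // ISO 8601 timestamp
}

type BootstrapResult = "ok" | "bad-token" | "expired" | "scope-denied";

function checkBootstrapToken(
  cfg: BootstrapTokenConfig,
  presented: string,
  requestedScopes: string[],
  now: Date = new Date()
): BootstrapResult {
  // A production check would compare in constant time; === keeps this short.
  if (presented !== cfg.token) return "bad-token";
  const expiry = Date.parse(cfg.expiresAt);
  // Missing or malformed expiry is treated as already expired.
  if (Number.isNaN(expiry) || now.getTime() >= expiry) return "expired";
  const allowed = new Set(cfg.allowedScopes);
  return requestedScopes.every((s) => allowed.has(s)) ? "ok" : "scope-denied";
}
```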
6.5 dmPolicy: Four Modes
Device Management Policy controls which clients can connect to the Gateway.
6.5.1 pairing (Default Mode)
gateway:
  dmPolicy: pairing
Workflow:
New device attempts to connect
│
▼
Gateway generates 6-digit verification code
│
▼
Displayed in Gateway console/logs
│
▼
Admin manually confirms ──No──→ Connection rejected
│
Yes
▼
Device added to allowlist, device token issued
Characteristics: Human review of every new device, prevents unauthorized device access.
Use cases: Production environments, multi-user teams, security-sensitive deployments.
6.5.2 allowlist (Whitelist Mode)
gateway:
  dmPolicy: allowlist
  allowedDevices:
    - deviceId: "dev_laptop_alice"
      name: "Alice's Work Laptop"
      publicKey: "ssh-ed25519 AAAA..."
    - deviceId: "dev_ci_runner"
      name: "CI/CD Build Machine"
      publicKey: "ssh-ed25519 BBBB..."
Only explicitly listed devices can connect. Unlisted devices are rejected immediately without generating verification codes.
Use cases: Fixed device sets, pre-configured enterprise environments
6.5.3 open (Open Mode)
gateway:
  dmPolicy: open
  # Credentials are still required; only device pre-registration is skipped
  requireValidCredentials: true
Any client with valid credentials can connect without pre-registration.
Security trade-off: Convenient for development and testing, but loses device-level access control. Must be used in conjunction with strong credential authentication (shared secret or identity proxy).
Use cases: Development and test environments, one-off scripting tools
6.5.4 disabled (Completely Disabled)
gateway:
  dmPolicy: disabled
Completely disables the device management feature. All device management APIs (pairing, revocation, listing) return a FEATURE_DISABLED error.
Use cases: Deployments that rely entirely on external IAM systems (such as Tailscale ACLs) for access control
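The four modes can be summarized as a single admission decision. This is one interpretation of the descriptions above, not the Gateway's implementation: the `admit` function and its types are hypothetical, and knownDevices stands for devices that are already paired or listed in allowedDevices.

```typescript
type DmPolicy = "pairing" | "allowlist" | "open" | "disabled";
type Admission = "accept" | "reject" | "await-pairing";

interface ConnectionAttempt {
  deviceId: string;
  credentialsValid: boolean; // outcome of the authentication step
}

function admit(
  policy: DmPolicy,
  attempt: ConnectionAttempt,
  knownDevices: ReadonlySet<string>
): Admission {
  const known = knownDevices.has(attempt.deviceId);
  switch (policy) {
    case "pairing":
      // Unknown devices wait for an administrator to confirm the code.
      return known ? "accept" : "await-pairing";
    case "allowlist":
      // Unknown devices are rejected without generating a code.
      return known ? "accept" : "reject";
    case "open":
    case "disabled":
      // No device-level check; only the credentials decide. In this sketch
      // the two modes differ only in whether device APIs are available.
      return attempt.credentialsValid ? "accept" : "reject";
  }
}
```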
6.6 Session Key Generation Algorithm and sessions.json Storage
6.6.1 The Three-Segment Structure of Session Keys
A Session Key is the composite key that uniquely identifies a conversation session in OpenClaw, in the format:
agent:<agentId>:<contextKey>
Examples:
agent:claude-3-opus:workspace-alice
agent:gpt-4:project-backend
agent:claude-3-sonnet:default
The three components:
| Component | Description | Example |
|---|---|---|
| agent | Fixed prefix indicating this is an Agent session key | agent |
| agentId | Unique ID defined in the Agent configuration | claude-3-opus |
| contextKey | Distinguishes different contexts for the same Agent | workspace-alice |
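Building and parsing such keys is straightforward, with one subtlety: a contextKey may itself contain colons (for example dm:user123), so a parser should split only on the first two. The helpers below are a hypothetical sketch of that idea; rejecting colons in agentId is an assumption needed to keep parsing unambiguous.

```typescript
interface SessionKeyParts {
  agentId: string;
  contextKey: string;
}

function buildSessionKey(agentId: string, contextKey: string): string {
  // Assumption: agentId never contains ":", otherwise parsing is ambiguous.
  if (agentId === "" || agentId.includes(":")) {
    throw new Error(`invalid agentId: "${agentId}"`);
  }
  return `agent:${agentId}:${contextKey}`;
}

function parseSessionKey(key: string): SessionKeyParts {
  const first = key.indexOf(":");
  const second = key.indexOf(":", first + 1);
  if (first === -1 || second === -1 || key.slice(0, first) !== "agent") {
    throw new Error(`not a session key: "${key}"`);
  }
  return {
    agentId: key.slice(first + 1, second),
    // Everything after the second colon belongs to the contextKey.
    contextKey: key.slice(second + 1),
  };
}
```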
6.6.2 contextKey Generation Rules
contextKey is automatically generated based on the channel source:
// Pseudocode: contextKey generation logic
function generateContextKey(source) {
  switch (source.type) {
    case "discord-dm":
      return `dm:${source.userId}`;
    case "discord-channel":
      return `channel:${source.channelId}`;
    case "slack-thread":
      return `thread:${source.threadTs}`;
    case "api-explicit":
      return source.contextKey; // Client specifies explicitly
    default:
      return "default";
  }
}
6.6.3 sessions.json Storage Format
Session state is persisted in data/sessions.json:
{
"version": 2,
"sessions": {
"agent:claude-3-opus:dm:user123": {
"id": "sess_01HXYZ",
"key": "agent:claude-3-opus:dm:user123",
"agentId": "claude-3-opus",
"contextKey": "dm:user123",
"status": "idle",
"createdAt": "2024-01-15T08:00:00Z",
"lastActiveAt": "2024-01-15T09:30:00Z",
"messageCount": 47,
"transcriptPath": "data/transcripts/sess_01HXYZ.jsonl",
"metadata": {
"userId": "user123",
"channelType": "discord-dm",
"guildId": null
}
}
},
"updatedAt": "2024-01-15T09:30:00Z"
}
6.6.4 Session Lookup Algorithm
Input: agentId + source information
│
▼
Compute contextKey (based on source type)
│
▼
Construct Session Key = "agent:" + agentId + ":" + contextKey
│
▼
Look up the key in sessions.json
│
┌─────┴─────┐
Found Not Found
│ │
▼ ▼
Check freshness Create new Session
(see ch10) and persist it
│
▼
Return existing Session
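The lookup flow is a get-or-create operation keyed by the Session Key. The sketch below uses an in-memory Map in place of sessions.json and omits the freshness check; `resolveSession` is a hypothetical name and the generated id format is a placeholder, not the format the Gateway uses.

```typescript
import { randomUUID } from "node:crypto";

interface Session {
  id: string;
  key: string;
  agentId: string;
  contextKey: string;
  status: "idle" | "running" | "error";
  createdAt: string;
}

// Stand-in for the "sessions" object persisted in sessions.json.
type SessionStore = Map<string, Session>;

function resolveSession(
  store: SessionStore,
  agentId: string,
  contextKey: string,
  now: Date = new Date()
): { session: Session; created: boolean } {
  const key = `agent:${agentId}:${contextKey}`;
  const existing = store.get(key);
  if (existing) {
    // The freshness check mentioned in the diagram would go here.
    return { session: existing, created: false };
  }
  const session: Session = {
    id: `sess_${randomUUID()}`, // placeholder id format
    key,
    agentId,
    contextKey,
    status: "idle",
    createdAt: now.toISOString(),
  };
  store.set(key, session); // a real Gateway would also persist to disk
  return { session, created: true };
}
```

Calling resolveSession twice with the same agentId and contextKey returns the same session, which is what lets a returning Discord user continue an earlier conversation.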
6.7 Single-Writer Mode: Eliminating Distributed Consistency Problems
6.7.1 The Core Problem
In traditional distributed AI systems, multiple nodes may simultaneously hold session state and attempt to modify it:
[Node A] ──read session state──→ [Shared Storage]
[Node B] ──read session state──→ [Shared Storage]
[Node A] ──write (append msg)──→ [Shared Storage] ← Overwrites Node B's view!
[Node B] ──write (append msg)──→ [Shared Storage] ← State inconsistency!
This requires introducing distributed locks, CAS (compare-and-swap) operations, version vectors, and other complex mechanisms.
6.7.2 The Single-Writer Solution
OpenClaw Gateway adopts a simpler strategy: each Session has exactly one writer — the Gateway itself.
[Client A] ──req──→ [Gateway] ──write──→ [sessions.json]
[Client B] ──req──→ [Gateway] ──write──→ [sessions.json]
Inside Gateway: Command Queue's Session Lane (serial)
ensures operations on the same Session execute sequentially
All modifications to a Session are serialized through the Gateway's Session Lane (a serial queue). Clients only send requests; the Gateway is the sole state writer.
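One common way to build such a serial lane is to chain promises per session key, so that each operation starts only after the previous one for the same key has settled. The following is a minimal sketch of that pattern; `SessionLanes` is a hypothetical class, and the actual Command Queue implementation may differ.

```typescript
class SessionLanes {
  // Tail of the promise chain for each session key.
  private tails = new Map<string, Promise<void>>();

  // Runs task after all earlier tasks for the same session have settled.
  enqueue<T>(sessionKey: string, task: () => Promise<T>): Promise<T> {
    const tail = this.tails.get(sessionKey) ?? Promise.resolve();
    const result = tail.then(task);
    // Swallow failures on the stored tail so one failed task does not
    // block later tasks; the caller still sees the failure via result.
    this.tails.set(
      sessionKey,
      result.then(
        () => undefined,
        () => undefined
      )
    );
    return result;
  }
}
```

Operations on different session keys use separate chains and can interleave freely, while two writes to the same session always run in submission order. This sketch never removes idle chains from the map, which a long-running process would need to handle.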
6.7.3 Practical Outcomes
| Problem | Traditional Distributed Approach | Single-Writer Approach |
|---|---|---|
| Concurrent write conflicts | Requires distributed locks | Non-existent (serialized) |
| State inconsistency | Requires eventual consistency protocols | Non-existent (single storage) |
| Read-write consistency | Requires memory barriers | Naturally guaranteed |
| Failure recovery | Requires log replay | sessions.json is the source of truth |
| Complexity | Extremely high | Extremely low |
The trade-off: the Gateway becomes a single point of failure. For an AI Agent control plane this is reasonable: the Gateway typically runs on the same machine as the user, and high-availability needs are handled at a higher layer (such as Kubernetes Pod restarts) rather than through distribution inside the Gateway.
6.8 Complete Connection Example
The following is a complete WebSocket interaction example showing the full flow from connection to session creation:
// Client implementation example (Node.js / TypeScript)
import { WebSocket } from "ws";
import { createHmac, randomUUID } from "crypto";

// Fail fast if the shared secret is missing instead of signing with undefined
const secret = process.env.OPENCLAW_SECRET;
if (!secret) throw new Error("OPENCLAW_SECRET is not set");

const ws = new WebSocket("ws://localhost:18789");

ws.on("message", (data) => {
  const msg = JSON.parse(data.toString());

  if (msg.type === "challenge") {
    // Step 2: sign "nonce:timestamp" with the shared secret and send the authentication request
    const signedNonce = createHmac("sha256", secret)
      .update(`${msg.nonce}:${msg.timestamp}`)
      .digest("base64url");
    ws.send(JSON.stringify({
      type: "connect",
      role: "client",
      scopes: ["sessions.read", "sessions.write", "messages.send"],
      credentials: {
        method: "shared-secret",
        token: secret
      },
      signedNonce
    }));
  }

  if (msg.type === "hello-ok") {
    // Step 3 complete: the connection is authenticated
    console.log(`Protocol version: ${msg.protocolVersion}`);
    console.log(`Supported features: ${msg.features.join(", ")}`);
    // Send first request
    ws.send(JSON.stringify({
      type: "req",
      id: randomUUID(),
      method: "sessions.list",
      params: {}
    }));
  }

  if (msg.type === "res" && msg.ok) {
    console.log("Session list:", msg.payload.sessions);
  }
  if (msg.type === "res" && !msg.ok) {
    console.error(`Request failed: ${msg.error.code}: ${msg.error.message}`);
  }

  if (msg.type === "event" && msg.event === "session.message") {
    console.log(`[seq:${msg.seq}] Agent message:`, msg.payload.content);
  }
});
Chapter Summary
This chapter explored the core mechanisms of the OpenClaw Gateway control plane in depth:
- localhost binding is the first line of security defense, delegating network exposure decisions to the administrator's proxy configuration
- Three message formats (req/res/event) cover all interaction scenarios; seq and stateVersion support reliable delivery
- Three-step handshake (nonce challenge → auth request → hello-ok) combined with HMAC-SHA256 signing prevents replay attacks
- Four authentication paths accommodate different security requirements from local development to enterprise deployment
- dmPolicy four modes provide flexible device access control, from fully open to strict allowlist
- Session Key three-segment composite structure (agent:ID:context) supports precise identification across multiple Agents and contexts
- Single-writer mode eliminates distributed consistency problems and keeps the architecture simple, at the cost of making the Gateway a single point of failure
The next chapter dives into the Command Queue implementation and shows how Lane isolation ensures correct concurrency semantics for different types of tasks.