Chapter 36

Nodes Architecture: Physical Device Integration (macOS, iOS, Android and Raspberry Pi)

Overview

OpenClaw's Node system is one of the most distinctive designs in the entire platform. It brings real physical devices — your desktop Mac, the iPhone in your pocket, an Android phone, or a Raspberry Pi quietly running in the corner — fully into the perception and execution scope of the Agent. This chapter starts from architectural principles and systematically covers the design intent, connection mechanism, capability declaration system, and per-platform capability matrix for Nodes, helping you truly understand what it means to give an AI Agent a physical presence.


36.1 Node Design Intent: Why Physical Device Nodes Are Needed

In a purely cloud-based AI Agent architecture, the Agent's perception boundary ends at the API — it can call search engines, read files, and execute code, but it has no reach into the physical world. It doesn't know the temperature in your room, cannot capture the current scene on camera, and cannot send a text message from your phone.

OpenClaw's Node architecture solves this boundary problem. The core idea is:

Physical devices are themselves the Agent's sensory organs and executing limbs.

A Node is not a standalone AI — it is a Capability Provider. After connecting to the Gateway, it declares what it can do and then waits for Agent invocations. When the Agent needs to "take a photo," it routes the tool call through the Gateway to the iOS Node that holds camera.* capabilities. When the Agent needs to "run a shell command on a server," it routes to the Headless Node that holds system.run.

This design delivers three core advantages:

  1. Composable capabilities: Capabilities from different devices can be combined within a single Agent session.
  2. Clear security boundaries: Each Node only declares and exposes the capabilities it explicitly supports, with no implicit over-privileged access.
  3. Horizontal scalability: Adding a new device Node requires no changes to the Gateway or Agent core — only the pairing flow needs to be completed.

36.2 WebSocket Connection Mechanism: role:"node"

Nodes connect to the Gateway via the WebSocket protocol and identify themselves with a special role field in the connection message.

Connection Handshake Flow

Node initiates WebSocket connection
  → URL: ws://<gateway-host>:18789/ws
  → Handshake message includes:
      {
        "role": "node",
        "nodeId": "<unique identifier>",
        "displayName": "Hexin's iPhone 15 Pro",
        "platform": "ios",
        "capabilities": [ ... ]
      }

Gateway responds:
  → Validates whether nodeId is authorized
  → Returns { "status": "approved" } or { "status": "pending" }
  → If pending, Node waits until an admin runs approve

Connection State Machine

[Unpaired] ──approve──→ [Paired/Online]
     ↑                        ↓
     └────disconnect timeout── [Offline/Reconnecting]

Nodes maintain a heartbeat with the Gateway (default 30-second interval). When the Gateway detects a Node has disconnected, it marks its status as offline. Tool calls routed to that Node will return an error or trigger fallback logic.
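
A minimal sketch of how offline detection could work on the Gateway side. The 30-second interval is the default stated above; the number of missed beats tolerated before a Node counts as offline is not specified in this chapter, so MISSED_BEATS_ALLOWED is an assumption, as is the HeartbeatMonitor class itself.

```python
import time

HEARTBEAT_INTERVAL = 30.0   # seconds; the default stated in the text
MISSED_BEATS_ALLOWED = 2    # assumption: the chapter does not give this value


class HeartbeatMonitor:
    """Tracks the last heartbeat per Node and derives an online/offline status."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock                      # injectable for testing
        self._last_seen: dict[str, float] = {}

    def beat(self, node_id: str) -> None:
        """Record that a heartbeat just arrived from node_id."""
        self._last_seen[node_id] = self._clock()

    def status(self, node_id: str) -> str:
        last = self._last_seen.get(node_id)
        if last is None:
            return "unknown"
        deadline = HEARTBEAT_INTERVAL * MISSED_BEATS_ALLOWED
        return "online" if self._clock() - last <= deadline else "offline"
```

Injecting the clock keeps the class testable without sleeping through real heartbeat intervals.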

Node ID Generation Rules

Each Node generates a persistent UUID as its nodeId on first connection, stored locally on the device (iOS Keychain / Android Keystore / macOS Keychain) and preserved across restarts. This ensures the Gateway can recognize "the same device reconnecting" rather than "a brand new unknown device."
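
The generate-once, reuse-forever behavior can be illustrated with a short sketch. It stores the ID in a plain file as a stand-in for the platform Keychain or Keystore, which is an assumption made purely to keep the example self-contained; load_or_create_node_id is a hypothetical name.

```python
import uuid
from pathlib import Path


def load_or_create_node_id(path: Path) -> str:
    """Return the stored nodeId, generating and saving a UUID on first use."""
    if path.exists():
        return path.read_text(encoding="utf-8").strip()
    node_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(node_id, encoding="utf-8")
    return node_id
```

Calling the function twice with the same path returns the same value, which is what lets the Gateway treat a reconnect as a known device.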


36.3 Capability Declaration System

The Capability Declaration System is the core mechanism of Node architecture. It solves a fundamental problem: How does the Gateway know what a given Node can do?

Declaration Format

Each Capability consists of a namespace and an action, using the namespace.action format:

{
  "capabilities": [
    "camera.snap",
    "camera.record",
    "camera.screenshot",
    "location.get",
    "screen.record"
  ]
}

Full Lifecycle of a Capability Declaration

1. Node establishes WebSocket connection
2. Node sends capabilities list (static declaration)
3. Gateway writes the declaration into its in-memory registry
4. Agent initiates a tool call specifying a required capability
5. Gateway queries the registry, finds online Nodes holding that capability
6. Gateway routes the tool call to the target Node
7. Node executes and returns the result
8. When Node disconnects, Gateway removes all its capability entries from the registry
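
Steps 3, 5, and 8 of the lifecycle above amount to maintaining an in-memory index from capability to Node. The following is one possible shape for such a registry; the class and method names are hypothetical, not OpenClaw's actual implementation.

```python
from collections import defaultdict


class CapabilityRegistry:
    """In-memory index of which online Nodes hold which capabilities."""

    def __init__(self) -> None:
        self._by_capability: dict[str, set[str]] = defaultdict(set)
        self._by_node: dict[str, set[str]] = {}

    def register(self, node_id: str, capabilities: list[str]) -> None:
        """Step 3: store a Node's declaration, replacing any earlier one."""
        self.unregister(node_id)
        self._by_node[node_id] = set(capabilities)
        for cap in capabilities:
            self._by_capability[cap].add(node_id)

    def find_nodes(self, capability: str) -> list[str]:
        """Step 5: list the Nodes currently holding a capability."""
        return sorted(self._by_capability.get(capability, ()))

    def unregister(self, node_id: str) -> None:
        """Step 8: drop every entry belonging to a disconnected Node."""
        for cap in self._by_node.pop(node_id, set()):
            holders = self._by_capability[cap]
            holders.discard(node_id)
            if not holders:
                del self._by_capability[cap]
```

Keeping a second index by Node makes removal on disconnect proportional to that Node's capability count rather than to the size of the whole registry.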

Routing Selection Strategy

When multiple Nodes declare the same capability (e.g., two iPhones are both connected), the Gateway selects according to:

Dynamic Capability Updates

Nodes can dynamically modify their declared capability list at runtime by sending a capability_update message. A common scenario: the camera permission is revoked by the user in system settings, and the iOS Node proactively removes camera.* related capabilities.

{
  "type": "capability_update",
  "add": [],
  "remove": ["camera.snap", "camera.record"]
}
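
Applying such a message on the receiving side is a pair of set operations. This sketch assumes additions are applied before removals, which the chapter does not specify; apply_capability_update is a hypothetical helper name.

```python
def apply_capability_update(current: set[str], message: dict) -> set[str]:
    """Return a new capability set with a capability_update message applied."""
    if message.get("type") != "capability_update":
        raise ValueError("not a capability_update message")
    updated = set(current)                       # leave the input untouched
    updated.update(message.get("add", []))       # assumption: add first
    updated.difference_update(message.get("remove", []))
    return updated
```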

36.4 Complete Device Pairing Flow

Step 1: Node Initiates a Connection Request

Install and launch the OpenClaw Node client on the target device (iOS/Android App or macOS menu bar app). The Node automatically attempts to connect to the configured Gateway address. On first connection, it enters the pending state.

Step 2: List Pending Devices

Run on the Gateway host:

openclaw devices list

Sample output:

ID          DISPLAY NAME              PLATFORM   STATUS    REQUESTED
req-a1b2    Hexin's iPhone 15 Pro     ios        pending   2026-04-26 09:15:33
req-c3d4    Raspberry Pi 4B           headless   pending   2026-04-26 09:16:01
req-e5f6    Hexin's MacBook Pro       macos      pending   2026-04-26 09:16:45

Step 3: Approve a Device

openclaw devices approve req-a1b2

Output:

✓ Device approved: Hexin's iPhone 15 Pro (req-a1b2)
  Node ID assigned: node-7f8a9b0c
  Capabilities registered: 5

Batch approval is also supported:

openclaw devices approve req-a1b2 req-c3d4 req-e5f6

Step 4: Verify Node Status

openclaw nodes status

Sample output:

NODE ID         DISPLAY NAME              PLATFORM   STATUS   CAPABILITIES   LAST SEEN
node-7f8a9b0c   Hexin's iPhone 15 Pro     ios        online   5              just now
node-2d3e4f50   Raspberry Pi 4B           headless   online   2              just now
node-9a8b7c6d   Hexin's MacBook Pro       macos      online   6              just now

Step 5 (Optional): Inspect Specific Node Capabilities

openclaw nodes status --node node-7f8a9b0c --verbose

Sample output:

Node: Hexin's iPhone 15 Pro (node-7f8a9b0c)
Platform: ios | Status: online | Uptime: 2m 34s

Capabilities:
  ✓ camera.snap         Capture a static photo
  ✓ camera.record       Record video (≤60s mp4)
  ✓ camera.screenshot   Capture a screen screenshot
  ✓ location.get        Retrieve GPS coordinates
  ✓ screen.record       Screen recording (≤60s)

36.5 Per-Platform Capability Matrix

macOS Node Capabilities

Capability ID    Description
system.run       Execute shell commands on macOS
system.which     Locate executable file paths
system.notify    Send system notifications (Notification Center)
execApprovals    Command execution approval (requires human confirmation)
canvas.*         Control the OpenClaw Canvas UI

The macOS Node is the core of "local automation." system.run gives the Agent the ability to execute arbitrary shell commands on your Mac (subject to the execApprovals mechanism), while canvas.* allows the Agent to render content visually on the canvas.

iOS Node Capabilities

Capability ID       Description
canvas.*            Canvas UI control
camera.snap         Capture a static photo
camera.record       Record video (up to 60 seconds, mp4 format)
camera.screenshot   Capture the current screen screenshot
screen.record       Screen recording (up to 60 seconds)
location.get        Retrieve current GPS coordinates

iOS's sandboxing makes system.run impossible, but camera and location capabilities make the iOS Node the ideal "on-site perception device."

Android Node Capabilities

The Android Node inherits all iOS capabilities and additionally provides:

Capability ID           Description
device.status           Get device status (battery/signal/storage)
device.info             Get device hardware information
device.permissions      Query runtime permission status
device.health           Device health report
notifications.list      List current notification bar notifications
notifications.actions   Perform actions on notifications (read/delete)
photos.latest           Retrieve the latest photos
contacts.search         Search contacts
contacts.add            Add a contact
calendar.events         Read calendar events
calendar.add            Create a calendar event
callLog.search          Search call logs
sms.send                Send an SMS message
sms.search              Search SMS message records
motion.activity         Get motion state (walking/cycling/driving)
motion.pedometer        Get step count data

Android's openness makes it the most capability-rich Node platform.

Headless Node Capabilities (Raspberry Pi / Server)

Capability ID    Description
system.run       Remote shell command execution
system.which     Locate executables

The power of the Headless Node lies not in the number of capabilities but in the unlimited reach of system.run — through the shell, it can indirectly control GPIO pins, read sensors, trigger scripts, and bring any programmable hardware behavior into the Agent's execution scope.


36.6 Android-Specific Capability Use Cases

Android's additional capabilities open up a range of scenarios that are difficult to achieve with traditional AI Agents:

Scenario 1: Smart Assistant Schedule Integration

User: "Add next Wednesday's dentist appointment to my calendar and send the clinic a confirmation text"
Agent:
  1. calendar.add → Create calendar event "Dentist Appointment Wed 14:00"
  2. contacts.search → Search "Dental Clinic" to get phone number
  3. sms.send → Send "Hello, confirming the appointment for next Wednesday at 14:00, thank you"

Scenario 2: Health Data Tracking

User: "How many steps did I take today? Compare with this week's average"
Agent:
  1. motion.pedometer → Get today's step count: 8,432
  2. motion.pedometer (batch) → Get daily steps for this week
  3. Calculate average and generate comparison report

Scenario 3: Device Health Monitoring

Scheduled task (every 6 hours):
  1. device.health → Get health report
  2. device.status → Check battery/storage
  3. If storage < 10% → Notify user to free space
  4. If battery < 20% → Send reminder
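
The threshold logic in steps 3 and 4 can be sketched as a pure function. The shape of the device.status result is not documented in this chapter, so the keys storage_free_pct and battery_pct are hypothetical, and "storage < 10%" is read here as free storage below 10%.

```python
def check_device(status: dict) -> list[str]:
    """Return the alerts a scheduled health check would raise."""
    alerts = []
    # Assumption: the percentage fields below are illustrative names only.
    if status["storage_free_pct"] < 10:
        alerts.append("free_space")
    if status["battery_pct"] < 20:
        alerts.append("battery_reminder")
    return alerts
```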

Scenario 4: Intelligent Notification Filtering

User: "Help me organize all my bank notifications from today"
Agent:
  1. notifications.list → Retrieve all notifications
  2. Filter entries where sender is a banking app
  3. Extract amount/time/type information
  4. Compile into a structured report

36.7 Node and Agent Interaction Patterns

Direct Tool Call Pattern

This is the most common pattern. The Agent decides during reasoning that it needs a capability and invokes it directly via the Tool Use mechanism:

{
  "tool": "node_invoke",
  "parameters": {
    "capability": "camera.snap",
    "nodeId": "node-7f8a9b0c",
    "options": {
      "quality": "high",
      "flash": "auto"
    }
  }
}

Streaming Result Transfer Pattern

For capabilities that produce large files, such as camera.record and screen.record, the Node uses chunked streaming:

Node → Gateway → Agent:
  chunk_1: { "type": "media_chunk", "index": 0, "data": "<base64>" }
  chunk_2: { "type": "media_chunk", "index": 1, "data": "<base64>" }
  ...
  final:   { "type": "media_complete", "totalSize": 2048576, "format": "mp4" }
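
On the receiving end, the chunks have to be decoded and put back in order. The sketch below assumes that totalSize counts decoded bytes and that indices start at 0 with no gaps; neither detail is confirmed by this chapter, and reassemble is a hypothetical name.

```python
import base64


def reassemble(messages: list[dict]) -> bytes:
    """Rebuild a media payload from media_chunk / media_complete messages."""
    chunks: dict[int, bytes] = {}
    total_size = None
    for msg in messages:
        if msg["type"] == "media_chunk":
            chunks[msg["index"]] = base64.b64decode(msg["data"])
        elif msg["type"] == "media_complete":
            total_size = msg["totalSize"]
    if total_size is None:
        raise ValueError("stream ended without media_complete")
    if sorted(chunks) != list(range(len(chunks))):
        raise ValueError("missing or duplicate chunk index")
    payload = b"".join(chunks[i] for i in range(len(chunks)))
    if len(payload) != total_size:   # assumption: totalSize is decoded bytes
        raise ValueError("size mismatch")
    return payload
```

Indexing by the index field rather than by arrival order means the function tolerates chunks that arrive out of sequence.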

Event Push Pattern (Node → Agent)

Nodes can proactively push events to the Agent without the Agent needing to poll:

{
  "type": "node_event",
  "source": "node-7f8a9b0c",
  "event": "location_update",
  "data": {
    "lat": 31.2304,
    "lng": 121.4737,
    "accuracy": 5.0
  }
}

This pattern suits continuous location tracking: rather than having the Agent call location.get repeatedly, the Node pushes location_update events as they occur.
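
On the Agent side, pushed events need to reach whatever code is interested in them. A minimal dispatcher sketch follows; the message fields match the example above, while the EventDispatcher class and its methods are hypothetical.

```python
from collections import defaultdict
from typing import Callable


class EventDispatcher:
    """Routes node_event messages to handlers registered per event name."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[str, dict], None]]] = defaultdict(list)

    def on(self, event: str, handler: Callable[[str, dict], None]) -> None:
        """Register handler(source, data) for one event name."""
        self._handlers[event].append(handler)

    def dispatch(self, message: dict) -> int:
        """Deliver a message; return how many handlers were called."""
        if message.get("type") != "node_event":
            return 0
        handlers = self._handlers.get(message["event"], [])
        for handler in handlers:
            handler(message["source"], message["data"])
        return len(handlers)
```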

Sub-Agent Delegation Pattern

In complex tasks, the primary Agent can delegate sub-tasks involving Node operations to Sub-Agents:

Primary Agent: "Take three photos from different angles and merge them into a panorama"
  → Delegate Sub-Agent-1: camera.snap (angle A)
  → Delegate Sub-Agent-2: camera.snap (angle B)
  → Delegate Sub-Agent-3: camera.snap (angle C)
  → Primary Agent aggregates results and performs the merge
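
The fan-out and aggregate structure of this pattern maps naturally onto asyncio. In the sketch below, snap is a stand-in that returns a label instead of invoking a real Sub-Agent or camera, and the merge step is reduced to string concatenation; both are placeholders for illustration only.

```python
import asyncio


async def snap(angle: str) -> str:
    """Stand-in for a Sub-Agent running camera.snap; returns a fake label."""
    await asyncio.sleep(0)           # yield control, as real I/O would
    return f"photo-{angle}"


async def panorama(angles: list[str]) -> str:
    """Delegate one snap per angle, then aggregate the results in order."""
    photos = await asyncio.gather(*(snap(a) for a in angles))
    return "+".join(photos)          # placeholder for the real merge step
```

asyncio.gather returns results in the order the tasks were passed in, so the primary Agent can rely on angle order when merging.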

36.8 Node Security Considerations

Pairing Verification Mechanism

Every step of the pairing flow has security guarantees:

  1. No automatic trust on first connection: All new Nodes enter the pending state and must be explicitly approved by an administrator.
  2. nodeId binding: After approval, the nodeId is bound to device hardware (via system Keychain/Keystore); impersonation requires physical access to the device.
  3. TLS-encrypted transport: All WebSocket communications are TLS-encrypted (WireGuard-encrypted tunnel when used with Tailscale).

Principle of Minimal Capability Declaration

Nodes should follow the principle of least privilege and only declare capabilities that are actually needed:

// Bad: declaring all capabilities
{ "capabilities": ["camera.*", "location.*", "sms.*", "contacts.*"] }

// Good: declare on demand
{ "capabilities": ["camera.snap", "location.get"] }
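
A declaration can be checked for over-broad entries before it is sent. This sketch flags wildcard patterns such as those in the "bad" example above; overly_broad is a hypothetical helper, not part of OpenClaw.

```python
def overly_broad(capabilities: list[str]) -> list[str]:
    """Return the entries that use a wildcard instead of a specific action."""
    return [cap for cap in capabilities if cap == "*" or cap.endswith(".*")]
```

An empty result means every entry names a specific namespace.action pair.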

execApprovals: Human Approval for Command Execution

For high-risk capabilities like system.run, OpenClaw provides the execApprovals mechanism:

{
  "execApprovals": {
    "enabled": true,
    "requireApproval": ["rm", "sudo", "curl", "wget", "chmod"],
    "autoApprove": ["ls", "pwd", "echo", "cat"]
  }
}

When the Agent attempts to execute a command in the requireApproval list, execution pauses and a pending confirmation item appears in the Control UI's approval queue, waiting for human approval before continuing.
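
The decision described above can be sketched as a lookup on the command's program name. The policy keys match the configuration shown; how OpenClaw treats commands that appear in neither list is not stated in this chapter, so routing them to approval is an assumption, as are the function name and return labels.

```python
import shlex


def classify(command: str, policy: dict) -> str:
    """Decide whether a shell command runs directly or waits for approval."""
    if not policy.get("enabled", False):
        return "run"
    tokens = shlex.split(command)
    if not tokens:
        return "reject"              # nothing to execute
    program = tokens[0]
    if program in policy.get("requireApproval", []):
        return "needs_approval"
    if program in policy.get("autoApprove", []):
        return "run"
    # Assumption: commands in neither list default to the safer path.
    return "needs_approval"
```

This only inspects the first token, so a compound command such as echo ok && sudo reboot would be classified by echo alone; a production check would need to parse the full shell syntax.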

Capability Revocation

Administrators can revoke specific capabilities from specific Nodes at any time:

# Temporarily disable SMS sending for a Node
openclaw nodes capability revoke node-xxxx sms.send

# Fully disconnect a Node
openclaw nodes disconnect node-xxxx

36.9 Summary

The Node architecture is the pivotal design that evolves OpenClaw from a "pure software Agent" to a "physical-world-aware Agent." Through the WebSocket role:"node" connection mechanism, structured Capability Declaration System, and strict pairing approval flow, OpenClaw seamlessly integrates the perception and execution capabilities of physical devices into the Agent's tool-calling framework while maintaining clear security boundaries.

The next chapter moves into hands-on practice: how to configure a Raspberry Pi as a 24/7 always-on Agent Node.


Next Chapter: Chapter 37 — Edge Computing in Practice: Complete Setup of a Raspberry Pi 24/7 Always-On Agent Node
