Chapter 36: Nodes Architecture — Physical Device Integration (macOS / iOS / Android / Raspberry Pi)
Overview
OpenClaw's Node system is one of the most distinctive designs in the entire platform. It brings real physical devices — your desktop Mac, the iPhone in your pocket, an Android phone, or a Raspberry Pi quietly running in the corner — fully into the perception and execution scope of the Agent. This chapter starts from architectural principles and systematically covers the design intent, connection mechanism, capability declaration system, and per-platform capability matrix for Nodes, helping you truly understand what it means to give an AI Agent a physical presence.
36.1 Node Design Intent: Why Physical Device Nodes Are Needed
In a purely cloud-based AI Agent architecture, the Agent's perception boundary ends at the API — it can call search engines, read files, and execute code, but it has no reach into the physical world. It doesn't know the temperature in your room, cannot capture the current scene on camera, and cannot send a text message from your phone.
OpenClaw's Node architecture solves this boundary problem. The core idea is:
Physical devices are themselves the Agent's sensory organs and executing limbs.
A Node is not a standalone AI — it is a Capability Provider. After connecting to the Gateway, it declares what it can do and then waits for Agent invocations. When the Agent needs to "take a photo," it routes the tool call through the Gateway to the iOS Node that holds camera.* capabilities. When the Agent needs to "run a shell command on a server," it routes to the Headless Node that holds system.run.
This design delivers three core advantages:
- Composable capabilities: Capabilities from different devices can be combined within a single Agent session.
- Clear security boundaries: Each Node only declares and exposes the capabilities it explicitly supports, with no implicit over-privileged access.
- Horizontal scalability: Adding a new device Node requires no changes to the Gateway or Agent core — only the pairing flow needs to be completed.
36.2 WebSocket Connection Mechanism: role:"node"
Nodes connect to the Gateway over WebSocket and identify themselves by including a role field set to "node" in the handshake message.
Connection Handshake Flow
Node initiates WebSocket connection
→ URL: ws://<gateway-host>:18789/ws
→ Handshake message includes:
{
"role": "node",
"nodeId": "<unique identifier>",
"displayName": "Hexin's iPhone 15 Pro",
"platform": "ios",
"capabilities": [ ... ]
}
Gateway responds:
→ Validates whether nodeId is authorized
→ Returns { "status": "approved" } or { "status": "pending" }
→ If pending, Node waits until an admin runs approve
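The handshake above can be sketched from the Node's side as follows. This is a minimal illustration, not the actual client: the helper names `build_handshake` and `is_approved` are hypothetical, and only the field names and the two status values come from the flow shown above.

```python
import json

def build_handshake(node_id, display_name, platform, capabilities):
    """Serialize the first message a Node sends after the socket opens."""
    return json.dumps({
        "role": "node",               # marks this connection as a Node
        "nodeId": node_id,
        "displayName": display_name,
        "platform": platform,
        "capabilities": list(capabilities),
    })

def is_approved(response_text):
    """Return True if approved, False if pending; reject anything else."""
    status = json.loads(response_text).get("status")
    if status not in ("approved", "pending"):
        raise ValueError(f"unexpected handshake status: {status!r}")
    return status == "approved"

# Hypothetical values, for illustration only.
msg = build_handshake(
    "0b9d3c1e-5a55-4c0e-9a57-2f6f1f0c8d11",
    "Hexin's iPhone 15 Pro",
    "ios",
    ["camera.snap", "location.get"],
)
```

A pending result would normally put the Node into a wait loop until the administrator approves it; that loop is omitted here.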
Connection State Machine
[Unpaired] ──approve──→ [Paired/Online]
    ↑                          ↓
    └──disconnect timeout── [Offline/Reconnecting]
Nodes maintain a heartbeat with the Gateway (default 30-second interval). When the Gateway detects a Node has disconnected, it marks its status as offline. Tool calls routed to that Node will return an error or trigger fallback logic.
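One way the Gateway could derive online/offline status from heartbeats is sketched below. The 30-second interval comes from the text; the tolerance of two missed heartbeats and the class name `HeartbeatMonitor` are assumptions made for illustration.

```python
HEARTBEAT_INTERVAL_S = 30.0   # default interval stated in the text
MISSED_BEATS_ALLOWED = 2      # assumption: tolerance before marking offline

class HeartbeatMonitor:
    """Tracks the last heartbeat per Node and reports a coarse status."""

    def __init__(self):
        self.last_seen = {}   # nodeId -> timestamp (seconds) of last heartbeat

    def beat(self, node_id, now):
        """Record a heartbeat received at time `now`."""
        self.last_seen[node_id] = now

    def status(self, node_id, now):
        """Return 'online', 'offline', or 'unknown' for a Node at time `now`."""
        seen = self.last_seen.get(node_id)
        if seen is None:
            return "unknown"
        limit = HEARTBEAT_INTERVAL_S * MISSED_BEATS_ALLOWED
        return "online" if now - seen <= limit else "offline"

monitor = HeartbeatMonitor()
monitor.beat("node-7f8a9b0c", now=0.0)
```

Passing the clock in as `now` keeps the logic deterministic and easy to test; a real Gateway would likely read a monotonic clock instead.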
Node ID Generation Rules
Each Node generates a persistent UUID as its nodeId on first connection, stored locally on the device (iOS Keychain / Android Keystore / macOS Keychain) and preserved across restarts. This ensures the Gateway can recognize "the same device reconnecting" rather than "a brand new unknown device."
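The "generate once, reuse forever" behavior can be sketched as below. A plain file stands in for the platform Keychain/Keystore here, which is an assumption made only to keep the example runnable; `load_or_create_node_id` is a hypothetical helper name.

```python
import tempfile
import uuid
from pathlib import Path

def load_or_create_node_id(path):
    """Return the stored nodeId, creating and persisting a UUID on first use."""
    path = Path(path)
    if path.exists():
        return path.read_text(encoding="utf-8").strip()
    node_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(node_id, encoding="utf-8")
    return node_id

# Simulate a first launch followed by a restart on the same device.
with tempfile.TemporaryDirectory() as tmp:
    id_file = Path(tmp) / "node_id"
    first = load_or_create_node_id(id_file)
    second = load_or_create_node_id(id_file)
```

Both calls return the same value, which is what lets the Gateway treat the second connection as a known device.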
36.3 Capability Declaration System
The Capability Declaration System is the core mechanism of Node architecture. It solves a fundamental problem: How does the Gateway know what a given Node can do?
Declaration Format
Each Capability consists of a namespace and an action, using the namespace.action format:
{
"capabilities": [
"camera.snap",
"camera.record",
"camera.screenshot",
"location.get",
"screen.record"
]
}
Full Lifecycle of a Capability Declaration
1. Node establishes WebSocket connection
2. Node sends capabilities list (static declaration)
3. Gateway writes the declaration into its in-memory registry
4. Agent initiates a tool call specifying a required capability
5. Gateway queries the registry, finds online Nodes holding that capability
6. Gateway routes the tool call to the target Node
7. Node executes and returns the result
8. When Node disconnects, Gateway removes all its capability entries from the registry
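Steps 3, 5, and 8 of this lifecycle amount to maintaining an index from capability to Nodes. The sketch below shows one possible shape for that in-memory registry; the class and method names are hypothetical and are not taken from the Gateway's code.

```python
from collections import defaultdict

class CapabilityRegistry:
    """In-memory index: capability name -> set of nodeIds that declare it."""

    def __init__(self):
        self._by_capability = defaultdict(set)

    def register(self, node_id, capabilities):
        """Step 3: record a Node's declared capabilities."""
        for cap in capabilities:
            self._by_capability[cap].add(node_id)

    def unregister(self, node_id):
        """Step 8: drop every entry belonging to a disconnected Node."""
        for cap in list(self._by_capability):
            self._by_capability[cap].discard(node_id)
            if not self._by_capability[cap]:
                del self._by_capability[cap]

    def nodes_for(self, capability):
        """Step 5: list Nodes that currently hold a capability."""
        return sorted(self._by_capability.get(capability, ()))

registry = CapabilityRegistry()
registry.register("node-7f8a9b0c", ["camera.snap", "location.get"])
registry.register("node-2d3e4f50", ["system.run", "system.which"])
```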
Routing Selection Strategy
When multiple Nodes declare the same capability (e.g., two iPhones are both connected), the Gateway selects according to:
- Specified nodeId: The Agent explicitly sets targetNodeId in the call for exact routing.
- Priority weight: Nodes can declare a priority field on connection; higher values take precedence.
- Most recently active: The default strategy, which selects the Node with the most recent communication activity.
- Round-Robin: Used when routing: "round-robin" is enabled in configuration.
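The selection rules can be sketched as a single function. This is one interpretation under stated assumptions: an explicit target wins outright, then higher priority, with ties broken by most recent activity. Round-robin is left out, and `Candidate` and `select_node` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    node_id: str
    priority: int = 0          # higher values take precedence
    last_active: float = 0.0   # timestamp of most recent communication

def select_node(candidates, target_node_id=None):
    """Pick the Node that should receive a tool call."""
    if not candidates:
        raise LookupError("no online Node declares this capability")
    if target_node_id is not None:
        # Exact routing requested by the Agent.
        for candidate in candidates:
            if candidate.node_id == target_node_id:
                return candidate
        raise LookupError(f"target Node {target_node_id!r} is not available")
    # Assumed precedence: priority first, then most recently active.
    return max(candidates, key=lambda c: (c.priority, c.last_active))

a = Candidate("node-a", priority=1, last_active=10.0)
b = Candidate("node-b", priority=1, last_active=20.0)
c = Candidate("node-c", priority=0, last_active=99.0)
```

With these three candidates, `select_node([a, b, c])` returns `node-b`, while passing `target_node_id="node-c"` overrides both priority and recency.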
Dynamic Capability Updates
Nodes can dynamically modify their declared capability list at runtime by sending a capability_update message. A common scenario: the camera permission is revoked by the user in system settings, and the iOS Node proactively removes camera.* related capabilities.
{
"type": "capability_update",
"add": [],
"remove": ["camera.snap", "camera.record"]
}
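On the receiving side, applying such a message is a small set operation. The sketch below assumes removals are applied after additions, so a capability listed in both ends up removed; that ordering and the function name `apply_capability_update` are assumptions.

```python
def apply_capability_update(current, message):
    """Return the new capability set after a capability_update message."""
    if message.get("type") != "capability_update":
        raise ValueError("not a capability_update message")
    updated = set(current)
    updated |= set(message.get("add", []))
    updated -= set(message.get("remove", []))   # assumption: removal wins
    return updated

before = {"camera.snap", "camera.record", "location.get"}
update = {
    "type": "capability_update",
    "add": [],
    "remove": ["camera.snap", "camera.record"],
}
after = apply_capability_update(before, update)
```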
36.4 Complete Device Pairing Flow
Step 1: Node Initiates a Connection Request
Install and launch the OpenClaw Node client on the target device (iOS/Android App or macOS menu bar app). The Node automatically attempts to connect to the configured Gateway address. On first connection, it enters the pending state.
Step 2: List Pending Devices
Run on the Gateway host:
openclaw devices list
Sample output:
ID DISPLAY NAME PLATFORM STATUS REQUESTED
req-a1b2 Hexin's iPhone 15 Pro ios pending 2026-04-26 09:15:33
req-c3d4 Raspberry Pi 4B headless pending 2026-04-26 09:16:01
req-e5f6 Hexin's MacBook Pro macos pending 2026-04-26 09:16:45
Step 3: Approve a Device
openclaw devices approve req-a1b2
Output:
✓ Device approved: Hexin's iPhone 15 Pro (req-a1b2)
Node ID assigned: node-7f8a9b0c
Capabilities registered: 5
Batch approval is also supported:
openclaw devices approve req-a1b2 req-c3d4 req-e5f6
Step 4: Verify Node Status
openclaw nodes status
Sample output:
NODE ID DISPLAY NAME PLATFORM STATUS CAPABILITIES LAST SEEN
node-7f8a9b0c Hexin's iPhone 15 Pro ios online 5 just now
node-2d3e4f50 Raspberry Pi 4B headless online 2 just now
node-9a8b7c6d Hexin's MacBook Pro macos online 6 just now
Step 5 (Optional): Inspect Specific Node Capabilities
openclaw nodes status --node node-7f8a9b0c --verbose
Node: Hexin's iPhone 15 Pro (node-7f8a9b0c)
Platform: ios | Status: online | Uptime: 2m 34s
Capabilities:
✓ camera.snap Capture a static photo
✓ camera.record Record video (≤60s mp4)
✓ camera.screenshot Capture a screen screenshot
✓ location.get Retrieve GPS coordinates
✓ screen.record Screen recording (≤60s)
36.5 Per-Platform Capability Matrix
macOS Node Capabilities
| Capability ID | Description |
|---|---|
| system.run | Execute shell commands on macOS |
| system.which | Locate executable file paths |
| system.notify | Send system notifications (Notification Center) |
| execApprovals | Command execution approval (requires human confirmation) |
| canvas.* | Control the OpenClaw Canvas UI |
The macOS Node is the core of "local automation." system.run gives the Agent the ability to execute arbitrary shell commands on your Mac (subject to the execApprovals mechanism), while canvas.* allows the Agent to render content visually on the canvas.
iOS Node Capabilities
| Capability ID | Description |
|---|---|
| canvas.* | Canvas UI control |
| camera.snap | Capture a static photo |
| camera.record | Record video (up to 60 seconds, mp4 format) |
| camera.screenshot | Capture the current screen screenshot |
| screen.record | Screen recording (up to 60 seconds) |
| location.get | Retrieve current GPS coordinates |
iOS's sandboxing makes system.run impossible, but camera and location capabilities make the iOS Node the ideal "on-site perception device."
Android Node Capabilities
The Android Node inherits all iOS capabilities and additionally provides:
| Capability ID | Description |
|---|---|
| device.status | Get device status (battery/signal/storage) |
| device.info | Get device hardware information |
| device.permissions | Query runtime permission status |
| device.health | Device health report |
| notifications.list | List current notification bar notifications |
| notifications.actions | Perform actions on notifications (read/delete) |
| photos.latest | Retrieve the latest photos |
| contacts.search | Search contacts |
| contacts.add | Add a contact |
| calendar.events | Read calendar events |
| calendar.add | Create a calendar event |
| callLog.search | Search call logs |
| sms.send | Send an SMS message |
| sms.search | Search SMS message records |
| motion.activity | Get motion state (walking/cycling/driving) |
| motion.pedometer | Get step count data |
Android's openness makes it the most capability-rich Node platform.
Headless Node Capabilities (Raspberry Pi / Server)
| Capability ID | Description |
|---|---|
| system.run | Remote shell command execution |
| system.which | Locate executables |
The power of the Headless Node lies not in the number of capabilities but in the unlimited reach of system.run — through the shell, it can indirectly control GPIO pins, read sensors, trigger scripts, and bring any programmable hardware behavior into the Agent's execution scope.
36.6 Android-Specific Capability Use Cases
Android's additional capabilities open up a range of scenarios that are difficult to achieve with traditional AI Agents:
Scenario 1: Smart Assistant Schedule Integration
User: "Add next Wednesday's dentist appointment to my calendar and send the clinic a confirmation text"
Agent:
1. calendar.add → Create calendar event "Dentist Appointment Wed 14:00"
2. contacts.search → Search "Dental Clinic" to get phone number
3. sms.send → Send "Hello, confirming the appointment for next Wednesday at 14:00, thank you"
Scenario 2: Health Data Tracking
User: "How many steps did I take today? Compare with this week's average"
Agent:
1. motion.pedometer → Get today's step count: 8,432
2. motion.pedometer (batch) → Get daily steps for this week
3. Calculate average and generate comparison report
Scenario 3: Device Health Monitoring
Scheduled task (every 6 hours):
1. device.health → Get health report
2. device.status → Check battery/storage
3. If free storage < 10% → Notify user to free space
4. If battery < 20% → Send reminder
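The threshold checks in this scenario could look like the sketch below. The shape of the device.status result is not specified in this chapter, so the field names `battery_pct` and `storage_free_pct` are hypothetical; only the 10% and 20% thresholds come from the scenario.

```python
STORAGE_FREE_MIN_PCT = 10   # from the scenario
BATTERY_MIN_PCT = 20        # from the scenario

def health_alerts(status):
    """Return the list of user-facing alerts for one status snapshot."""
    alerts = []
    if status["storage_free_pct"] < STORAGE_FREE_MIN_PCT:
        alerts.append("Storage is almost full: free up space")
    if status["battery_pct"] < BATTERY_MIN_PCT:
        alerts.append("Battery is low: charge the device")
    return alerts

# Hypothetical snapshots, for illustration only.
healthy = {"battery_pct": 80, "storage_free_pct": 45}
degraded = {"battery_pct": 15, "storage_free_pct": 8}
```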
Scenario 4: Intelligent Notification Filtering
User: "Help me organize all my bank notifications from today"
Agent:
1. notifications.list → Retrieve all notifications
2. Filter entries where sender is a banking app
3. Extract amount/time/type information
4. Compile into a structured report
36.7 Node and Agent Interaction Patterns
Direct Tool Call Pattern
This is the most common pattern. The Agent decides during reasoning that it needs a capability and invokes it directly via the Tool Use mechanism:
{
"tool": "node_invoke",
"parameters": {
"capability": "camera.snap",
"nodeId": "node-7f8a9b0c",
"options": {
"quality": "high",
"flash": "auto"
}
}
}
Streaming Result Transfer Pattern
For capabilities that produce large files, such as camera.record and screen.record, the Node uses chunked streaming:
Node → Gateway → Agent:
chunk_1: { "type": "media_chunk", "index": 0, "data": "<base64>" }
chunk_2: { "type": "media_chunk", "index": 1, "data": "<base64>" }
...
final: { "type": "media_complete", "totalSize": 2048576, "format": "mp4" }
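A receiver for this stream needs to order the chunks, decode them, and check the result against the final message. The sketch below assumes that `index` values are contiguous from 0 and that `totalSize` counts decoded bytes; neither is confirmed by the text, and `reassemble` is a hypothetical name.

```python
import base64

def reassemble(messages):
    """Rebuild the media payload from media_chunk / media_complete messages."""
    chunks = {}
    final = None
    for msg in messages:
        if msg["type"] == "media_chunk":
            chunks[msg["index"]] = base64.b64decode(msg["data"])
        elif msg["type"] == "media_complete":
            final = msg
    if final is None:
        raise ValueError("stream ended without media_complete")
    if sorted(chunks) != list(range(len(chunks))):
        raise ValueError("one or more chunks are missing")
    payload = b"".join(chunks[i] for i in sorted(chunks))
    if len(payload) != final["totalSize"]:
        raise ValueError("payload size does not match totalSize")
    return payload

# Tiny stand-in payload, delivered out of order on purpose.
data = b"hello world"
stream = [
    {"type": "media_chunk", "index": 1, "data": base64.b64encode(data[5:]).decode()},
    {"type": "media_chunk", "index": 0, "data": base64.b64encode(data[:5]).decode()},
    {"type": "media_complete", "totalSize": len(data), "format": "mp4"},
]
```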
Event Push Pattern (Node → Agent)
Nodes can proactively push events to the Agent without the Agent needing to poll:
{
"type": "node_event",
"source": "node-7f8a9b0c",
"event": "location_update",
"data": {
"lat": 31.2304,
"lng": 121.4737,
"accuracy": 5.0
}
}
This pattern is used for continuous tracking with location.get.
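On the consuming side, pushed events are typically fanned out to whichever handlers have subscribed to them. The sketch below is a generic dispatcher under that assumption; `EventBus` and its methods are hypothetical and only the message fields come from the example above.

```python
from collections import defaultdict

class EventBus:
    """Routes node_event messages to handlers registered per event name."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_name, handler):
        """Register handler(source, data) for one event name."""
        self._handlers[event_name].append(handler)

    def dispatch(self, message):
        """Deliver a message; return how many handlers received it."""
        if message.get("type") != "node_event":
            return 0
        handlers = self._handlers.get(message["event"], [])
        for handler in handlers:
            handler(message["source"], message["data"])
        return len(handlers)

received = []
bus = EventBus()
bus.subscribe("location_update", lambda source, data: received.append((source, data["lat"], data["lng"])))

delivered = bus.dispatch({
    "type": "node_event",
    "source": "node-7f8a9b0c",
    "event": "location_update",
    "data": {"lat": 31.2304, "lng": 121.4737, "accuracy": 5.0},
})
```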
Sub-Agent Delegation Pattern
In complex tasks, the primary Agent can delegate sub-tasks involving Node operations to Sub-Agents:
Primary Agent: "Take three photos from different angles and merge them into a panorama"
→ Delegate Sub-Agent-1: camera.snap (angle A)
→ Delegate Sub-Agent-2: camera.snap (angle B)
→ Delegate Sub-Agent-3: camera.snap (angle C)
→ Primary Agent aggregates results and performs the merge
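The fan-out and aggregation shape of this pattern can be sketched with a thread pool. Here `snap` is only a stand-in for a Sub-Agent invoking camera.snap, and both function names are hypothetical; whether a single device can actually serve the three captures concurrently is outside what this sketch models.

```python
from concurrent.futures import ThreadPoolExecutor

def snap(angle):
    """Stand-in for a Sub-Agent that performs camera.snap at one angle."""
    return f"photo-{angle}"

def capture_all(angles):
    """Delegate one capture per angle and collect results in input order."""
    if not angles:
        return []
    with ThreadPoolExecutor(max_workers=len(angles)) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(snap, angles))

photos = capture_all(["A", "B", "C"])
```

The primary Agent would then pass `photos` to whatever merge step produces the panorama.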
36.8 Node Security Considerations
Pairing Verification Mechanism
Every step of the pairing flow has security guarantees:
- No automatic trust on first connection: All new Nodes enter the pending state and must be explicitly approved by an administrator.
- nodeId binding: After approval, the nodeId is bound to device hardware (via system Keychain/Keystore); impersonation requires physical access to the device.
- TLS-encrypted transport: All WebSocket communications are TLS-encrypted (WireGuard-encrypted tunnel when used with Tailscale).
Principle of Minimal Capability Declaration
Nodes should follow the principle of least privilege and only declare capabilities that are actually needed:
// Bad: declaring all capabilities
{ "capabilities": ["camera.*", "location.*", "sms.*", "contacts.*"] }
// Good: declare on demand
{ "capabilities": ["camera.snap", "location.get"] }
execApprovals: Human Approval for Command Execution
For high-risk capabilities like system.run, OpenClaw provides the execApprovals mechanism:
{
"execApprovals": {
"enabled": true,
"requireApproval": ["rm", "sudo", "curl", "wget", "chmod"],
"autoApprove": ["ls", "pwd", "echo", "cat"]
}
}
When the Agent attempts to execute a command in the requireApproval list, execution pauses and a pending confirmation item appears in the Control UI's approval queue, waiting for human approval before continuing.
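How a command might be matched against this configuration is sketched below. The matching rules are assumptions, not documented behavior: the program name is taken from the first token, any shell metacharacter forces approval, commands on neither list default to approval, and a disabled policy approves everything. `classify` is a hypothetical name.

```python
import shlex

POLICY = {
    "enabled": True,
    "requireApproval": ["rm", "sudo", "curl", "wget", "chmod"],
    "autoApprove": ["ls", "pwd", "echo", "cat"],
}

# Characters that can chain or substitute commands in a POSIX shell.
SHELL_META = set(";|&$`<>()\n")

def classify(command, policy):
    """Return 'auto' or 'needs_approval' for a shell command string."""
    if not policy.get("enabled", False):
        return "auto"                      # assumption: disabled means no gating
    if SHELL_META & set(command):
        return "needs_approval"            # assumption: never auto-approve compound commands
    tokens = shlex.split(command)
    if not tokens:
        raise ValueError("empty command")
    program = tokens[0].rsplit("/", 1)[-1]  # compare on the executable's basename
    if program in policy["requireApproval"]:
        return "needs_approval"
    if program in policy["autoApprove"]:
        return "auto"
    return "needs_approval"                # assumption: unknown commands are gated
```

Checking only the first token would let `ls; rm -rf /` through as `ls`, which is why the sketch treats any metacharacter as requiring approval.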
Capability Revocation
Administrators can revoke specific capabilities from specific Nodes at any time:
# Temporarily disable SMS sending for a Node
openclaw nodes capability revoke node-xxxx sms.send
# Fully disconnect a Node
openclaw nodes disconnect node-xxxx
36.9 Summary
The Node architecture is the pivotal design that evolves OpenClaw from a "pure software Agent" to a "physical-world-aware Agent." Through the WebSocket role:"node" connection mechanism, structured Capability Declaration System, and strict pairing approval flow, OpenClaw seamlessly integrates the perception and execution capabilities of physical devices into the Agent's tool-calling framework while maintaining clear security boundaries.
The next chapter moves into hands-on practice: how to configure a Raspberry Pi as a 24/7 always-on Agent Node.
Next Chapter: Chapter 37 — Edge Computing in Practice: Complete Setup of a Raspberry Pi 24/7 Always-On Agent Node