4ier

Claw Use Android

by 傅洋 · GitHub ↗ · v2.0.0 · MIT-0
cross-platform ⚠ suspicious
Downloads: 382 · Stars: 0 · Active Installs: 0 · Versions: 2
Install in OpenClaw
/install claw-use-android
Description
Control and interact with real Android phones via HTTP and CLI without ADB or root, supporting screen reading, taps, typing, apps, calls, and voice.
README (SKILL.md)

Claw Use Android — Phone Control for AI Agents

Give your AI agent eyes, hands, and a voice on a real Android phone.

claw-use-android is an Android app + CLI (cua) that exposes HTTP endpoints for full phone control. No ADB, no root, no PC.

Setup

# Install the APK on your Android phone, enable Accessibility Service
# Then register the device:
cua add redmi 192.168.0.105 <token>
cua ping

New in v2.0.0: Unified API

Three new endpoints consolidate the scattered legacy endpoints for AI agent workflows; the legacy endpoints remain supported:

GET /screen — Semantic UI Tree

Returns elements with stable integer ref IDs, semantic zone, and role annotations.

cua screen              # full semantic UI tree (JSON)
cua screen -c           # compact: only interactive/text elements

Response:

{
  "package": "com.android.settings",
  "elements": [
    {"ref": 1, "text": "设置", "zone": "header"},
    {"ref": 2, "text": "搜索", "zone": "header", "role": "button", "click": true},
    {"ref": 3, "text": "WLAN", "zone": "content"}
  ]
}
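
As a rough illustration of how an agent might consume this response, the Python sketch below filters for clickable elements. It relies only on the fields shown in the sample (`ref`, `text`, `zone`, `role`, `click`); the helper name `clickable_refs` is hypothetical and not part of cua.

```python
import json

# Sample /screen response copied from the example above.
SAMPLE = '''
{
  "package": "com.android.settings",
  "elements": [
    {"ref": 1, "text": "设置", "zone": "header"},
    {"ref": 2, "text": "搜索", "zone": "header", "role": "button", "click": true},
    {"ref": 3, "text": "WLAN", "zone": "content"}
  ]
}
'''

def clickable_refs(screen_json: str) -> list[int]:
    """Return the ref IDs of elements flagged with "click": true (hypothetical helper)."""
    screen = json.loads(screen_json)
    return [el["ref"] for el in screen.get("elements", []) if el.get("click")]

print(clickable_refs(SAMPLE))
```

Elements without a `click` field are treated as non-clickable here, which is an assumption drawn from the sample rather than a documented rule.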

GET /snapshot — JPEG Screenshot

Returns a base64-encoded JPEG screenshot.

cua snapshot              # save screenshot, print path
cua snapshot 50 720 out.jpg  # quality, maxWidth, output

POST /act — Unified Action Endpoint

All operations through a single entry point, using ref IDs from /screen.

cua act '{"click": 3}'              # click ref 3
cua act '{"click": "OK"}'           # click by text (fallback)
cua act '{"click": [1, 2, 3]}'      # click refs in sequence
cua act '{"tap": {"x": 540, "y": 960}}'
cua act '{"type": "hello"}'          # type into focused field
cua act '{"type": {"ref": 3, "text": "hello"}}'  # focus ref then type
cua act '{"swipe": "up"}'            # directional swipe
cua act '{"scroll": "down"}'         # scroll nearest scrollable
cua act '{"back": true}'
cua act '{"home": true}'
cua act '{"recents": true}'
cua act '{"longpress": 3}'           # long press ref
cua act '{"launch": "com.duolingo"}'

# Multiple actions in one request:
cua act '{"home": true, "back": true}'
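
When an agent generates these commands programmatically, building the JSON by hand is error-prone. The sketch below shows one possible way to produce a correctly quoted `cua act` command line from Python; `act_command` is a hypothetical helper, and the action names are simply the ones listed above.

```python
import json
import shlex

def act_command(**actions) -> str:
    """Build a shell-safe `cua act` command line from keyword actions (hypothetical helper)."""
    payload = json.dumps(actions, ensure_ascii=False)
    # shlex.quote wraps the JSON in single quotes so the shell passes it through unchanged.
    return "cua act " + shlex.quote(payload)

print(act_command(click=3))
print(act_command(type={"ref": 3, "text": "hello"}))
```

Passing several keywords at once, for example `act_command(home=True, back=True)`, yields the multi-action form shown above.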

Agent Workflow Pattern (screen → act loop)

# 1. Observe
cua screen -c          # get refs
# 2. Act
cua act '{"click": 5}' # click ref 5
# 3. Observe again
cua screen -c          # see result
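
The loop above could be scripted roughly as follows. This is a minimal sketch, assuming `cua screen -c` prints JSON in the same shape as the /screen response; `click_by_text` and the injected `run` callable are hypothetical names introduced here so the logic can be exercised without a device.

```python
import json
from typing import Callable

def click_by_text(run: Callable[[list[str]], str], text: str) -> bool:
    """Run one observe -> act -> observe cycle.

    `run` takes a cua argument list and returns the command's stdout.
    Returns False if no element with the given text is on screen.
    """
    screen = json.loads(run(["screen", "-c"]))            # 1. observe
    match = next((el for el in screen.get("elements", [])
                  if el.get("text") == text), None)
    if match is None:
        return False
    run(["act", json.dumps({"click": match["ref"]})])     # 2. act
    run(["screen", "-c"])                                 # 3. observe again
    return True
```

In real use, `run` might wrap something like `subprocess.run(["cua", *args], capture_output=True, text=True).stdout`.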

Flow-First Principle

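Before performing any phone operation, read flows.md first (it lives in the same directory as this file).

  • If a matching flow exists → run it directly via /flow or a batch script, skipping step-by-step reasoning
  • If the flow contains a {"screen":true} breakpoint → read the screen at that step, let the agent decide, then continue
  • If no flow matches → use the screen→act loop, then record the new flow in flows.md once the task is done
  • If a flow fails (timeout, element not found, etc.) → fall back to the screen→act loop to finish the task, then correct flows.md afterwards

Proactive recording (沉淀, mandatory): after completing any multi-step operation, immediately review the sequence of steps you just ran. If you spot a reusable pattern (even if it covers only some of the steps), append it to flows.md on the spot. Do not wait for the user to remind you. Recording flows is the agent's responsibility, not the user's.

Benefits:

  1. /flow runs on the device with 100ms polling and does not go through the LLM
  2. Saves tokens: one flow replaces 5-10 rounds of agent reasoning
  3. Accumulates: every new scenario is recorded, so the agent gets faster the more it is used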

Legacy CLI Reference (cua)

All legacy endpoints remain supported alongside the new unified API.

Device Management

cua add <name> <ip> <token>     # register device with alias
cua devices                     # list all (with live status)
cua use <name>                  # switch default device
cua rm <name>                   # remove device
cua -d <name> <command>         # target specific device
cua discover                    # scan LAN for devices (192.168.x.x:7333)
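
For readers curious what a LAN scan for port 7333 involves, here is a minimal Python sketch. It is not how `cua discover` is implemented (the source does not say); the function names, the /24 subnet, and the timeout are all assumptions made for illustration.

```python
import ipaddress
import socket

def candidates(cidr: str) -> list[str]:
    """List every usable host address in the given subnet."""
    return [str(host) for host in ipaddress.ip_network(cidr).hosts()]

def is_open(host: str, port: int = 7333, timeout: float = 0.2) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A scan would then be something like `[h for h in candidates("192.168.0.0/24") if is_open(h)]`; an open port only suggests a device, so following up with `cua add` and `cua ping` is still needed.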

Perception — read the phone

cua screen              # full UI tree (JSON)
cua screen -c           # compact: only interactive/text elements
cua screenshot          # save screenshot, print path
cua screenshot 50 720 out.jpg  # quality, maxWidth, output
cua notifications       # list all notifications
cua status              # health dashboard
cua info                # device model, screen size, permissions

Action — control the phone

cua tap <x> <y>         # tap coordinates
cua click <text>        # tap element by visible text
cua longpress <x> <y>   # long press
cua swipe up|down|left|right
cua scroll up|down|left|right
cua type "text"         # type text (CJK supported)
cua back                # system back
cua home                # go home
cua launch <package>    # launch app
cua launch              # list all apps
cua open <url>          # open URL
cua call <number>       # phone call
cua intent '<json>'     # fire Android Intent

Audio

cua tts "hello"         # speak through phone speaker
cua say "你好"          # alias

Device I/O (v1.7.0+)

cua clipboard           # read clipboard
cua clipboard "text"    # write to clipboard
cua camera [front|back] [quality] [output.jpg]  # take photo
cua volume              # read all volumes
cua volume media 10     # set media volume
cua volume media up     # adjust volume
cua battery             # battery status
cua wifi                # WiFi info
cua location            # GPS/network location
cua vibrate [ms]        # vibrate (default 200ms)
cua contacts [search]   # list/search contacts
cua sms list [limit]    # read SMS
cua sms send <number> <message>  # send SMS
cua file list [path]    # list directory
cua file read <path>    # read file
cua file write <path> <content>  # write file
cua file delete <path>  # delete file

Device State

cua wake                # wake screen
cua lock / cua unlock   # lock/unlock (PIN required)
cua config pin 123456   # remember lock screen PIN for auto-unlock
cua config pattern 256398  # EXPERIMENTAL: pattern unlock (not yet verified)

Flow Engine — phone-side scripted automation

cua flow '{
  "steps": [
    {"wait": "继续安装", "then": "tap", "timeout": 10000},
    {"wait": "继续更新", "then": "tap", "timeout": 10000},
    {"wait": "完成",     "then": "tap", "timeout": 60000, "optional": true}
  ]
}'

Flow runs entirely on the phone with zero LLM calls. The device polls its accessibility tree at 100ms intervals and reacts instantly when the target element appears.

Step fields:

  • wait — text to find (case-insensitive partial match)
  • waitId — resource ID to find
  • waitDesc — content description to find
  • waitGone — wait for text to DISAPPEAR
  • then — action: tap, click, longpress, back, home, none
  • timeout — per-step timeout in ms (default 10000)
  • optional — if true, timeout doesn't fail the flow
  • pauseMs — pause after action before next step (default 500)
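
To show how these fields fit together, the sketch below builds a flow payload in Python and rejects actions outside the documented `then` list. The helper `step` and the constant `ALLOWED_ACTIONS` are hypothetical; the defaults mirror the values stated above.

```python
import json

# Actions listed for the "then" field above.
ALLOWED_ACTIONS = {"tap", "click", "longpress", "back", "home", "none"}

def step(wait: str, then: str = "tap", timeout: int = 10000,
         optional: bool = False) -> dict:
    """Build one flow step that waits for text and then performs an action."""
    if then not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {then}")
    result = {"wait": wait, "then": then, "timeout": timeout}
    if optional:
        result["optional"] = True  # a timeout on this step will not fail the flow
    return result

flow = {"steps": [step("继续安装"), step("完成", timeout=60000, optional=True)]}
flow_json = json.dumps(flow, ensure_ascii=False)  # string to pass to `cua flow`
```

Validating on the client side is a design choice for faster feedback; the device may well perform its own checks.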

Click with Retry

# Atomic find-and-tap: retries until element appears
curl -X POST /click -d '{"text":"继续安装","retry":3,"retryMs":2000}'

Device Onboarding (New Device Setup)

Complete recipe for adding a new Android device from zero to fully operational.

Prerequisites (human must do once)

  1. Install APK on the device (download from GitHub Releases or LAN HTTP)
  2. Enable Accessibility Service: Settings → Accessibility → Claw Use → ON
  3. Note the auth token from the app notification or main screen

Step 1: Discover & Register

# Scan LAN for devices
cua discover

# Register with a friendly name
cua add <name> <ip> <token>

# Verify connectivity
cua -d <name> ping
cua -d <name> info

Step 2: Configure Auto-Unlock

# PIN unlock (recommended — proven reliable via a11y button tapping)
cua -d <name> config pin <PIN>

# Verify: lock then unlock
cua -d <name> lock
sleep 3
cua -d <name> unlock
# Should show {"unlocked":true}

Important: Only PIN unlock is verified to work. Pattern unlock is experimental and unreliable — the accessibility gesture dispatch doesn't consistently hit the correct grid coordinates across different devices and screen sizes. If the device uses pattern lock, change it to PIN.

Step 3: MIUI/HyperOS Permissions (automated)

cua -d <name> setup-perms

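This automates granting all 9 app permissions on MIUI devices: 位置 (Location), 相机 (Camera), 麦克风 (Microphone), 照片和视频 (Photos and videos), 音乐和音频 (Music and audio), 短信 (SMS), 电话 (Phone), 联系人 (Contacts), 日历 (Calendar)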

The command navigates through Settings → Apps → Claw Use → Permissions and clicks through each permission grant dialog.

If setup-perms fails (common on tablets with dual-pane layout), grant manually:

  1. Open Settings → Apps → Manage Apps → search "Claw Use"
  2. Tap "App permissions" (应用权限)
  3. Enable each permission: prefer "始终允许" (Always allow) > "仅在使用中允许" (Allow only while in use) > "允许" (Allow)

Step 4: Background Survival (MIUI)

These settings prevent MIUI from killing the service:

# Navigate to app settings
cua -d <name> intent '{"action":"android.settings.APPLICATION_DETAILS_SETTINGS","uri":"package:com.clawuse.android"}'

Then via a11y or manually ensure:

  • 自启动 (Autostart): ON
  • 省电策略 (Battery saver): 无限制 (No restrictions)
  • 通知 (Notifications): 允许 (Allow)
  • WLAN联网 (WiFi access): ON (if available)

Step 5: Verify Everything

cua -d <name> status    # check a11y health, uptime, request count
cua -d <name> screen -c # verify a11y tree works
cua -d <name> screenshot 50 720 /tmp/verify.jpg  # verify screenshot

# Test auto-unlock end-to-end
cua -d <name> lock
sleep 3
cua -d <name> screen -c  # should auto-unlock then return tree

Known Device-Specific Issues

MIUI Tablets (Xiaomi Pad 5, etc.):

  • Settings uses dual-pane layout — left panel items NOT visible in a11y tree
  • Must navigate through full Settings → Apps path instead of direct Intent
  • APPLICATION_DETAILS_SETTINGS intent opens app LIST, not specific app
  • setup-perms may need manual fallback for tablet layout

MIUI Phones (Redmi K60 Ultra, etc.):

  • ICP 备案 (ICP filing) dialog may appear during APK install; click "继续安装" (Continue installing)
  • "仍然下载" (Download anyway) confirmation appears in Chrome for HTTP downloads
  • Chrome downloads don't auto-open APK — go to Downloads → tap the file icon (left side)

General Android:

  • Notification Listener requires manual enable: Settings → 通知 (Notifications) → 设备和应用通知 (Device and app notifications) → Claw Use
  • takeScreenshot() returns black image on lock screen (Android security)
  • Lock screen a11y tree requires flagRetrieveInteractiveWindows (added in v1.6.2)

Self-Update (OTA via LAN)

Update a device to a new APK version without ADB:

# Serve APK on LAN (from the machine with the APK)
cd /path/to/apk && python3 -m http.server 9090 &

# On the device, open browser to download
cua -d <name> intent '{"action":"android.intent.action.VIEW","uri":"http://<lan-ip>:9090/app.apk"}'

# Or via browser navigation for MIUI browser:
cua -d <name> click "浏览器"
cua -d <name> click "搜索或输入网址"
cua -d <name> type "http://<lan-ip>:9090/app.apk"
# ... then handle download + install prompts

# MIUI install flow (after APK opens in installer)
cua -d <name> flow '{
  "steps": [
    {"wait": "继续安装", "then": "tap", "timeout": 15000},
    {"wait": "已了解此应用未经安全检测", "then": "tap", "timeout": 10000, "optional": true},
    {"wait": "继续更新", "then": "tap", "timeout": 15000}
  ]
}'

# Verify new version after service restart (~30s)
sleep 30
cua -d <name> ping

UpdateReceiver: The app listens for MY_PACKAGE_REPLACED broadcast and auto-restarts the service after update. No manual intervention needed after install completes.
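
Because the restart time after an update is only approximate (~30s), a fixed `sleep 30` may be too short or needlessly long. The sketch below polls instead; it is an illustration under assumptions, with `wait_until_up` and the injected `ping` callable being hypothetical names rather than part of cua.

```python
import time
from typing import Callable

def wait_until_up(ping: Callable[[], bool], attempts: int = 10,
                  delay_s: float = 5.0,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `ping` until it returns True or the attempts run out."""
    for i in range(attempts):
        if ping():
            return True
        if i < attempts - 1:
            sleep(delay_s)  # give the service time to restart before retrying
    return False
```

Here `ping` could wrap a call to `cua -d <name> ping` and return True when the command exits successfully.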


Workflow Patterns

Navigate and interact (v2.0+ recommended)

cua act '{"launch": "org.telegram.messenger"}'
cua screen -c
cua act '{"click": "Search Chats"}'
cua act '{"type": "John"}'
cua act '{"click": "John"}'

Navigate and interact (legacy)

cua launch org.telegram.messenger
cua screen -c
cua click "Search Chats"
cua type "John"
cua click "John"

Visual + semantic perception

cua screen -c                          # what elements exist (structured, with refs)
cua snapshot 50 720 /tmp/look.jpg      # what it looks like (visual)

Prefer screen -c over snapshot for decision-making. Structured a11y data is faster to process, has exact coordinates, and provides ref IDs for /act. Use snapshot only when visual context matters (images, colors, layout).

Handle locked device

Automatic — any command auto-unlocks if PIN is configured. No special handling needed.

MIUI APK Install (via /flow)

cua flow '{
  "steps": [
    {"wait": "继续安装", "then": "tap", "timeout": 15000},
    {"wait": "已了解此应用未经安全检测", "then": "tap", "timeout": 10000, "optional": true},
    {"wait": "继续更新", "then": "tap", "timeout": 10000}
  ]
}'

Multi-device

cua add phone1 192.168.0.101 <token>
cua add tablet 192.168.0.102 <token>
cua -d phone1 say "hello from phone 1"
cua -d tablet screenshot

Operational Lessons

DO

  • Use click by text instead of tap by coordinates whenever text is visible
  • Use screen -c as the primary perception tool — compact filters noise
  • Use /flow for multi-step mechanical sequences — saves tokens, 100x faster than LLM-per-step
  • Use intent deep links for app navigation (e.g., https://t.me/c/{id}/{topic}/{msg})
  • Use PIN unlock — proven 100% reliable via a11y button tapping

DON'T

  • Don't use screenshot coordinates for tapping: screenshot?maxWidth=720 is scaled, while screen bounds are actual pixels
  • Don't try pattern unlock — coordinates vary by device/OS, no reliable way to locate the grid
  • Don't rely on tap when click can work — text-based is resolution-independent
  • Don't manually navigate app UIs when deep links exist — error-prone and slow
  • Don't rapid-fire requests — allow 0.5-1s between actions for UI to settle
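
If coordinates taken from a scaled screenshot ever do have to be used, they need converting back to real pixels first. The arithmetic is sketched below, assuming the screenshot is a uniform, aspect-preserving downscale of the screen; `to_screen_coords` is a hypothetical helper, and the 1080-pixel screen width is an example value.

```python
def to_screen_coords(x: int, y: int, shot_width: int,
                     screen_width: int) -> tuple[int, int]:
    """Map a point in a scaled screenshot to actual screen pixels."""
    scale = screen_width / shot_width
    return round(x * scale), round(y * scale)

# A point at (360, 640) in a 720-wide screenshot of a 1080-wide screen:
print(to_screen_coords(360, 640, shot_width=720, screen_width=1080))  # (540, 960)
```

The real screen size would come from the device (for example via `cua info`), and `shot_width` should be the actual width of the saved image rather than the requested maxWidth.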

Architecture

┌─────────────────────────────────────────────────┐
│                 Android Device                  │
│                                                 │
│  :http process          main process            │
│  ┌───────────────┐      ┌─────────────────────┐ │
│  │ BridgeService │ HTTP │ AccessibilityBridge │ │
│  │ NanoHTTPD     │─────→│ A11yInternalServer  │ │
│  │ 0.0.0.0:7333  │proxy │ 127.0.0.1:7334      │ │
│  └───────────────┘      └─────────────────────┘ │
│    ↑ auth+CORS            ↑ a11y service        │
│    ↑ auto-unlock          ↑ gesture dispatch    │
│    ↑ config/status        ↑ tree traversal      │
└─────────────────────────────────────────────────┘
         ↑ HTTP
    ┌────────────┐
    │  Agent/CLI │  cua commands / curl
    └────────────┘

Family

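| Platform | Package          | CLI | Status       |
|----------|------------------|-----|--------------|
| Android  | claw-use-android | cua | ✅ Available |
| iOS      | claw-use-ios     | cui | 🔮 Planned   |
| Windows  | claw-use-windows | cuw | 🔮 Planned   |
| Linux    | claw-use-linux   | cul | 🔮 Planned   |
| macOS    | claw-use-mac     | cum | 🔮 Planned   |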
Usage Guidance
What to consider before installing or using this skill:

  1. Missing declarations: The SKILL.md assumes a 'cua' CLI and a phone-side APK/token, but the skill metadata declares no required binaries or credentials. Ask the author to update the metadata to list required binaries, tokens, and any install steps so you can review them before use.
  2. Accessibility & APK install risks: The Android app requires enabling an Accessibility service and (per flows) will ask to download/install APKs. Accessibility grants broad control; only enable it for apps you trust. Never install APKs from unknown or untrusted sources; prefer installing the official APK from a verified release.
  3. Sensitive device access: The skill can read/write SMS, contacts, files, clipboard, camera, location, set/remember lock PINs, and make calls. These are normal for a full-control tool but are high-risk. If you proceed, use a disposable/burner device or a test device, not your primary phone.
  4. Autonomous persistence: The agent is instructed to automatically append learned automation flows to flows.md without asking you. That can store URLs, PINs, or other sensitive data. If you install, either (a) disable any agent auto-run/autonomous invocation so the agent cannot act without your confirmation, or (b) require manual review of flows.md (and the skill author to add an opt-in setting) before anything is appended.
  5. Network actions: The skill will scan your LAN and can download content from local HTTP servers. Limit its network access (use firewall rules or run in an isolated network segment) and monitor any HTTP endpoints it contacts.
  6. Safer defaults: Ask the author to (a) declare required binaries/credentials in metadata, (b) make flow persistence opt-in rather than mandatory, and (c) document exactly where flows.md is stored and what user approvals are required before writing or executing a flow.
  7. If you cannot verify the APK/CLI provenance or the author’s intent, do not install on a device with sensitive data. Prefer well-audited alternatives or insist on metadata and an explicit install/consent flow before enabling the skill.
Capability Analysis
Type: OpenClaw Skill
Name: claw-use-android
Version: 2.0.0
The skill bundle provides a powerful Android automation framework (cua) with high-risk capabilities, including reading/sending SMS, file system manipulation (read/write/delete), contact/location access, and camera/microphone control. It includes explicit instructions in SKILL.md and flows.md for the AI agent to autonomously 'learn' and persist new automation sequences by modifying flows.md. While these features support the stated goal of phone control, the broad access to sensitive data and the ability to fire arbitrary Android Intents or perform OTA updates via LAN-hosted APKs represent a significant security risk if the agent is misdirected.
Capability Assessment
Purpose & Capability
The name/description claim full phone control and the instructions indeed provide a full feature set (screen tree, taps, typing, apps, calls, SMS, clipboard, camera, file read/write, location, unlocking). That capability set is consistent with the stated purpose. However, the SKILL.md expects a 'cua' CLI and a phone-side APK/token for device registration, but the skill metadata declares no required binaries, no required env vars, and no install steps — an incoherence that should be explained by the author.
Instruction Scope
The runtime instructions direct the agent to: (1) read and always consult flows.md; (2) persist (append) newly learned flows immediately into flows.md without waiting for user approval; (3) perform LAN discovery (192.168.x.x:7333) and download/install APKs from local HTTP servers; and (4) perform high-privilege actions on the phone (read SMS/contacts/files/location, set/remember lock PIN). The mandatory automatic persistence rule grants the agent broad write authority over skill data and can cause sensitive items (URLs, PINs, tokens, or other artifacts) to be stored without explicit user consent.
Install Mechanism
This is instruction-only (no install spec), so nothing is written to the host by the skill package itself; installation occurs on the phone (APK). That minimizes host-side install risk, but the instructions explicitly direct the agent to install APKs on the phone (including from LAN hosts) and to enable an Accessibility service — both are high-risk actions on the device and should be treated carefully. Also, the skill assumes the 'cua' CLI exists but does not declare it as a required binary.
Credentials
The registry metadata declares no environment variables or credentials, yet the docs expect device tokens (used in 'cua add ... <token>'), pin codes for auto-unlock, and potentially sensitive inputs (PINs, pattern, URLs, local server addresses). The skill also instructs reading highly sensitive device data (SMS, contacts, files, clipboard, location). Those accesses are coherent with the claimed phone-control purpose but are sensitive and should be explicitly declared and justified in metadata and presented for explicit user consent.
Persistence & Privilege
The SKILL.md mandates that the agent must immediately '沉淀' (persist) new multi-step flows into flows.md every time it performs multi-step actions and to consult flows.md first on future runs. That gives the agent permission to autonomously modify the skill's flow datastore (a file included with the skill) without additional user approval, which increases risk: malicious or erroneous flows could be recorded and reused, and sensitive content could be saved. The skill does not request always:true, but this self-modification requirement is a significant persistence/privilege concern.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install claw-use-android
  3. After installation, invoke the skill by name or use /claw-use-android
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v2.0.0
v2.0.0: Unified /screen + /snapshot + /act API, flow-first agent pattern, flows.md knowledge base, device I/O (camera/clipboard/SMS/contacts/location), OTA self-update, multi-device support.
v1.0.0
Initial release: cua CLI + skill for AI agents to control Android phones via HTTP. 25 endpoints, multi-device aliases, zero dependencies.
Metadata
Slug claw-use-android
Version 2.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 2
Frequently Asked Questions

What is Claw Use Android?

Control and interact with real Android phones via HTTP and CLI without ADB or root, supporting screen reading, taps, typing, apps, calls, and voice. It is an AI Agent Skill for Claude Code / OpenClaw, with 382 downloads so far.

How do I install Claw Use Android?

Run "/install claw-use-android" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Claw Use Android free?

Yes, Claw Use Android is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Claw Use Android support?

Claw Use Android is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Claw Use Android?

It is built and maintained by 傅洋 (@4ier); the current version is v2.0.0.
