/install adb-phone-control
ADB Phone Control
Control Android devices through ADB with a structured observe-locate-act-verify loop.
Requirements
- adb — Android Debug Bridge, must be in PATH
- python3 — Required for
app_explorer.py - ADB_OUTPUT_DIR (optional env var) — Directory for saving screenshots and UI dumps; defaults to current working directory
Permissions Used
This skill executes the following on the connected Android device:
adb shell input— tap, swipe, text inputadb shell uiautomator dump— UI hierarchy extractionadb shell screencap— screen captureadb shell am broadcast— ADBKeyboard IME input (for CJK text)adb shell service call clipboard— clipboard-based text input fallback
Prerequisites
Before any operation, verify device connection:
adb devices
If no device found, instruct the user to:
- Connect via USB and enable USB Debugging
- Or connect wirelessly:
adb connect \x3Cip>:5555
Core Principle
NEVER guess coordinates from screenshots. ALWAYS use UI hierarchy as the primary locator.
Screenshots are for human-readable context and visual verification. UI dumps give exact pixel bounds.
Operation Loop
Every interaction follows this cycle:
┌─────────────────────────────────────────┐
│ 1. OBSERVE — dump UI + screenshot │
│ 2. LOCATE — find element by text/id │
│ 3. ACT — tap / swipe / type │
│ 4. VERIFY — screenshot + dump again │
│ 5. REPEAT — next action or done │
└─────────────────────────────────────────┘
Do NOT skip the VERIFY step. UI transitions may take time; always confirm before proceeding.
Helper Functions
Source the helper script before starting any operation session:
source "$(dirname "${BASH_SOURCE[0]:-$0}")/adb-helpers.sh" 2>/dev/null || source ./adb-helpers.sh
Available Functions
| Function | Usage | Description |
|---|---|---|
adb_dump |
adb_dump |
Dump UI hierarchy to /tmp/ui_dump.xml |
adb_screenshot |
adb_screenshot |
Capture screen to /tmp/adb_screen.png |
adb_observe |
adb_observe |
Dump UI + screenshot in one call |
adb_tap_text "Submit" |
Find element by text, tap center | |
adb_tap_id "btn_send" |
Find element by resource-id, tap center | |
adb_tap_xy 540 1200 |
Tap exact coordinates | |
adb_swipe x1 y1 x2 y2 [ms] |
Swipe between points (default 300ms) | |
adb_input_text "hello" |
Type text (supports spaces and CJK) | |
adb_key \x3Ckeycode> |
Send keyevent (BACK, HOME, ENTER, etc.) | |
adb_hide_keyboard |
Press BACK to dismiss keyboard | |
adb_scroll_down |
Swipe up to scroll content down | |
adb_scroll_up |
Swipe down to scroll content up | |
adb_long_press x y [ms] |
Long press at coordinates (default 1000ms) | |
adb_wait [seconds] |
Sleep before next action (default 1s) | |
adb_screen_size |
Get device screen resolution | |
adb_launch_app \x3Cpackage> |
Launch app by package name | |
adb_find_package \x3Ckeyword> |
Search installed packages by keyword | |
adb_bounds_center "bounds_string" |
Parse "[x1,y1][x2,y2]" → center x y |
Element Lookup Details
adb_tap_text and adb_tap_id work by:
- Running
adb_dumpto get fresh UI hierarchy - Parsing the XML for matching
text=orresource-id=attributes - Extracting the
bounds="[x1,y1][x2,y2]"attribute - Computing center point:
((x1+x2)/2, (y1+y2)/2) - Executing
adb shell input tap \x3Ccx> \x3Ccy>
If multiple matches are found, the function taps the first match and prints a warning.
If no match is found, the function prints an error — fall back to adb_screenshot + Read tool for visual inspection.
Standard Operating Procedure
Phase 1: Setup
# Source helpers
source "$(dirname "${BASH_SOURCE[0]:-$0}")/adb-helpers.sh" 2>/dev/null || source ./adb-helpers.sh
# Verify connection
adb devices
# Get screen resolution (important for swipe calculations)
adb_screen_size
Phase 2: Navigate & Operate
For each interaction step:
# 1. Observe current state
adb_observe
# Then read /tmp/adb_screen.png with the Read tool to see the screen
# 2. Locate and act (prefer text/id over raw coordinates)
adb_tap_text "Create"
# or: adb_tap_id "iv_send"
# or as last resort: adb_tap_xy 540 2009
# 3. Wait for transition
adb_wait 2
# 4. Verify result
adb_screenshot
# Then read /tmp/adb_screen.png to confirm the action worked
Phase 3: Text Input
# Tap the input field first
adb_tap_text "Search..."
adb_wait 1
# Type text
adb_input_text "Hello World"
# Hide keyboard before tapping other elements
adb_hide_keyboard
adb_wait 1
# Now safe to tap other buttons
adb_tap_text "Send"
Critical Rules
1. UI Dump First, Screenshot Second
uiautomator dumpgives exact bounds, element states (enabled/focused/clickable), text content, and resource IDs- Screenshots only for: visual verification, understanding layout context, or when UI dump fails (e.g., animations, WebView content)
- When UI dump returns elements with
NAF="true", the element has No Accessible Framework info — use screenshot + coordinates as fallback
2. Keyboard Awareness
- Always hide keyboard before tapping non-input elements. The keyboard shifts the layout, making UI dump bounds stale.
- After typing, call
adb_hide_keyboardthenadb_dumpbefore tapping anything else. - If
uiautomator dumpreturnsERROR: could not get idle state, the keyboard animation may still be running — wait 1s and retry.
3. Wait Strategy
- After tap: wait 1s before next dump/screenshot
- After launching app: wait 2-3s
- After page navigation: wait 2s
- After typing: wait 0.5s
- If UI hasn't changed after action: wait longer, up to 5s, then re-check
- Never blindly chain actions without verification
4. Chinese / CJK Text Input
adb shell input text does not support CJK characters natively. The helper adb_input_text handles this by:
- Using
adb shell am broadcastwith ADBKeyboard if available - Falling back to clipboard-based input: copy to clipboard via
adb shell service call clipboard, then paste
If ADB Keyboard IME is installed (com.android.adbkeyboard), enable it:
adb shell ime set com.android.adbkeyboard/.AdbIME
5. Coordinate System
- All coordinates are in physical pixels matching the device resolution
adb shell wm sizereturns the canonical resolution (e.g., 1080x2340)- Screenshot pixel dimensions may differ from device resolution — never estimate coordinates from screenshot pixel positions
- Always derive coordinates from
uiautomator dumpbounds
6. Handling Failures
If an action doesn't produce the expected result:
- Re-dump UI hierarchy — the element may have moved or state changed
- Take a screenshot — visual context may reveal popups, loading states, or errors
- Check if the element is
enabled="true"andclickable="true"before tapping - If element is not found by text, try partial match or search by resource-id
- If the app is in a WebView, UI dump may not capture web elements — use screenshot + coordinate estimation as fallback
7. App Launch
Prefer adb_find_package + adb_launch_app over monkey command:
# Find the app
adb_find_package "wechat"
# Launch it
adb_launch_app "com.tencent.mm"
Limitations
uiautomator dumpdoesn't work during animations — wait for idle state- WebView/Flutter/game content may not appear in UI hierarchy — use screenshot-based approach
- Some custom views may have empty text and no resource-id — use bounds + screenshot cross-reference
- Maximum ~100 actions per task is a reasonable limit to avoid infinite loops
- 确保已安装 OpenClaw(本地或 Docker 部署)
- 在对话框中输入安装命令:
/install adb-phone-control - 安装完成后,直接呼叫该 Skill 的名称或使用
/adb-phone-control触发 - 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
adb-phone-control 是什么?
Use when the user asks to control, operate, or automate an Android phone via ADB — tapping, swiping, typing, launching apps, or any UI interaction on a conne... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 143 次。
如何安装 adb-phone-control?
在 OpenClaw 或 Claude Code 对话框中运行命令「/install adb-phone-control」即可一键安装,无需额外配置。
adb-phone-control 是免费的吗?
是的,adb-phone-control 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。
adb-phone-control 支持哪些平台?
adb-phone-control 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。
谁开发了 adb-phone-control?
由 txmonkey(@txmonkey)开发并维护,当前版本 v1.0.1。