
adb-phone-control

by txmonkey · GitHub ↗ · v1.0.1 · MIT-0
cross-platform ✓ Security Clean
143 Downloads · 0 Stars · 0 Active Installs · 2 Versions
Install in OpenClaw
/install adb-phone-control
Description
Use when the user asks to control, operate, or automate an Android phone via ADB — tapping, swiping, typing, launching apps, or any UI interaction on a conne...
README (SKILL.md)

ADB Phone Control

Control Android devices through ADB with a structured observe-locate-act-verify loop.

Requirements

  • adb — Android Debug Bridge, must be in PATH
  • python3 — Required for app_explorer.py
  • ADB_OUTPUT_DIR (optional env var) — Directory for saving screenshots and UI dumps; defaults to current working directory

Permissions Used

This skill executes the following on the connected Android device:

  • adb shell input — tap, swipe, text input
  • adb shell uiautomator dump — UI hierarchy extraction
  • adb shell screencap — screen capture
  • adb shell am broadcast — ADBKeyboard IME input (for CJK text)
  • adb shell service call clipboard — clipboard-based text input fallback

Prerequisites

Before any operation, verify device connection:

adb devices

If no device found, instruct the user to:

  1. Connect via USB and enable USB Debugging
  2. Or connect wirelessly: adb connect <ip>:5555
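The connection check can also be scripted. The sketch below parses the text that adb devices prints and keeps only serials whose state is "device" (as opposed to "unauthorized" or "offline"). The function names and the sample output are illustrative and are not part of the skill's helper scripts.

```python
from typing import List, Tuple


def parse_adb_devices(output: str) -> List[Tuple[str, str]]:
    """Parse `adb devices` output into (serial, state) pairs."""
    devices = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header, and adb daemon startup notices.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices.append((parts[0], parts[1]))
    return devices


def ready_devices(output: str) -> List[str]:
    """Return serials that are connected and authorized."""
    return [serial for serial, state in parse_adb_devices(output) if state == "device"]


# Hypothetical sample output; in practice this would come from running `adb devices`.
sample = (
    "List of devices attached\n"
    "emulator-5554\tdevice\n"
    "192.168.1.20:5555\tunauthorized\n"
    "\n"
)
print(ready_devices(sample))
```

An empty result means no usable device is attached, which is the point to ask the user to connect one.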

Core Principle

NEVER guess coordinates from screenshots. ALWAYS use UI hierarchy as the primary locator.

Screenshots are for human-readable context and visual verification. UI dumps give exact pixel bounds.

Operation Loop

Every interaction follows this cycle:

┌─────────────────────────────────────────┐
│  1. OBSERVE  — dump UI + screenshot     │
│  2. LOCATE   — find element by text/id  │
│  3. ACT      — tap / swipe / type       │
│  4. VERIFY   — screenshot + dump again  │
│  5. REPEAT   — next action or done      │
└─────────────────────────────────────────┘

Do NOT skip the VERIFY step. UI transitions may take time; always confirm before proceeding.

Helper Functions

Source the helper script before starting any operation session:

source "$(dirname "${BASH_SOURCE[0]:-$0}")/adb-helpers.sh" 2>/dev/null || source ./adb-helpers.sh

Available Functions

| Function | Usage | Description |
| --- | --- | --- |
| adb_dump | adb_dump | Dump UI hierarchy to /tmp/ui_dump.xml |
| adb_screenshot | adb_screenshot | Capture screen to /tmp/adb_screen.png |
| adb_observe | adb_observe | Dump UI + screenshot in one call |
| adb_tap_text | adb_tap_text "Submit" | Find element by text, tap center |
| adb_tap_id | adb_tap_id "btn_send" | Find element by resource-id, tap center |
| adb_tap_xy | adb_tap_xy 540 1200 | Tap exact coordinates |
| adb_swipe | adb_swipe x1 y1 x2 y2 [ms] | Swipe between points (default 300ms) |
| adb_input_text | adb_input_text "hello" | Type text (supports spaces and CJK) |
| adb_key | adb_key <keycode> | Send keyevent (BACK, HOME, ENTER, etc.) |
| adb_hide_keyboard | adb_hide_keyboard | Press BACK to dismiss keyboard |
| adb_scroll_down | adb_scroll_down | Swipe up to scroll content down |
| adb_scroll_up | adb_scroll_up | Swipe down to scroll content up |
| adb_long_press | adb_long_press x y [ms] | Long press at coordinates (default 1000ms) |
| adb_wait | adb_wait [seconds] | Sleep before next action (default 1s) |
| adb_screen_size | adb_screen_size | Get device screen resolution |
| adb_launch_app | adb_launch_app <package> | Launch app by package name |
| adb_find_package | adb_find_package <keyword> | Search installed packages by keyword |
| adb_bounds_center | adb_bounds_center "bounds_string" | Parse "[x1,y1][x2,y2]" → center x y |

Element Lookup Details

adb_tap_text and adb_tap_id work by:

  1. Running adb_dump to get fresh UI hierarchy
  2. Parsing the XML for matching text= or resource-id= attributes
  3. Extracting the bounds="[x1,y1][x2,y2]" attribute
  4. Computing center point: ((x1+x2)/2, (y1+y2)/2)
  5. Executing adb shell input tap <cx> <cy>

If multiple matches are found, the function taps the first match and prints a warning. If no match is found, the function prints an error — fall back to adb_screenshot + Read tool for visual inspection.
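The lookup steps above can be sketched in Python. This is not the code in adb-helpers.sh; it is a minimal illustration that assumes the dump is the usual uiautomator XML with node elements. Matching a short id such as "btn_send" against the suffix of a full resource-id is also an assumption made for this example.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Tuple

BOUNDS_RE = re.compile(r"\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]")


def bounds_center(bounds: str) -> Tuple[int, int]:
    """Turn "[x1,y1][x2,y2]" into the integer center point (cx, cy)."""
    match = BOUNDS_RE.fullmatch(bounds.strip())
    if match is None:
        raise ValueError(f"unrecognized bounds string: {bounds!r}")
    x1, y1, x2, y2 = map(int, match.groups())
    return (x1 + x2) // 2, (y1 + y2) // 2


def _matches(node: ET.Element, attr: str, value: str) -> bool:
    actual = node.get(attr, "")
    if attr == "resource-id":
        # Assumption: accept either the full id or the part after "/".
        return actual == value or actual.endswith("/" + value)
    return actual == value


def find_centers(dump_xml: str, attr: str, value: str) -> List[Tuple[int, int]]:
    """Return the center of every node whose attribute matches, in document order."""
    root = ET.fromstring(dump_xml)
    return [
        bounds_center(node.get("bounds"))
        for node in root.iter("node")
        if node.get("bounds") and _matches(node, attr, value)
    ]


def tap_command(cx: int, cy: int) -> List[str]:
    """Build the argv for tapping a point; pass it to subprocess.run to execute."""
    return ["adb", "shell", "input", "tap", str(cx), str(cy)]


# Hypothetical dump fragment for illustration.
SAMPLE = """<hierarchy rotation="0">
  <node text="" resource-id="" bounds="[0,0][1080,2340]">
    <node text="Submit" resource-id="com.example:id/btn_send" bounds="[400,1100][680,1300]" />
  </node>
</hierarchy>"""

centers = find_centers(SAMPLE, "text", "Submit")
print(centers, tap_command(*centers[0]))
```

Returning every match lets the caller warn when more than one element qualifies and still tap the first.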

Standard Operating Procedure

Phase 1: Setup

# Source helpers
source "$(dirname "${BASH_SOURCE[0]:-$0}")/adb-helpers.sh" 2>/dev/null || source ./adb-helpers.sh

# Verify connection
adb devices

# Get screen resolution (important for swipe calculations)
adb_screen_size

Phase 2: Navigate & Operate

For each interaction step:

# 1. Observe current state
adb_observe
# Then read /tmp/adb_screen.png with the Read tool to see the screen

# 2. Locate and act (prefer text/id over raw coordinates)
adb_tap_text "Create"
# or: adb_tap_id "iv_send"
# or as last resort: adb_tap_xy 540 2009

# 3. Wait for transition
adb_wait 2

# 4. Verify result
adb_screenshot
# Then read /tmp/adb_screen.png to confirm the action worked

Phase 3: Text Input

# Tap the input field first
adb_tap_text "Search..."
adb_wait 1

# Type text
adb_input_text "Hello World"

# Hide keyboard before tapping other elements
adb_hide_keyboard
adb_wait 1

# Now safe to tap other buttons
adb_tap_text "Send"

Critical Rules

1. UI Dump First, Screenshot Second

  • uiautomator dump gives exact bounds, element states (enabled/focused/clickable), text content, and resource IDs
  • Screenshots only for: visual verification, understanding layout context, or when UI dump fails (e.g., animations, WebView content)
  • When UI dump returns elements with NAF="true", uiautomator has flagged the element as Not Accessibility Friendly (typically clickable but with no text or content description); use screenshot + coordinates as fallback

2. Keyboard Awareness

  • Always hide keyboard before tapping non-input elements. The keyboard shifts the layout, making UI dump bounds stale.
  • After typing, call adb_hide_keyboard then adb_dump before tapping anything else.
  • If uiautomator dump returns ERROR: could not get idle state, the keyboard animation may still be running — wait 1s and retry.

3. Wait Strategy

  • After tap: wait 1s before next dump/screenshot
  • After launching app: wait 2-3s
  • After page navigation: wait 2s
  • After typing: wait 0.5s
  • If UI hasn't changed after action: wait longer, up to 5s, then re-check
  • Never blindly chain actions without verification
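One way to apply the "wait longer, up to 5s, then re-check" rule is to poll the UI dump until it differs from the pre-action dump. In the sketch below, dump is a caller-supplied function that returns the current hierarchy as a string; it is a placeholder, not a helper shipped with the skill. The sleep function is injectable so the logic can be exercised without real delays.

```python
import time
from typing import Callable, Optional


def wait_for_change(
    dump: Callable[[], str],
    before: str,
    timeout: float = 5.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll until the dump differs from `before`; return the new dump or None on timeout."""
    waited = 0.0
    while waited < timeout:
        sleep(interval)
        waited += interval
        current = dump()
        if current != before:
            return current
    return None


# Simulated device: the UI changes on the third poll.
states = iter(["screen A", "screen A", "screen B"])
result = wait_for_change(lambda: next(states), "screen A", sleep=lambda s: None)
print(result)
```

A None result means the action had no visible effect within the timeout, so the failure-handling steps apply rather than the next action.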

4. Chinese / CJK Text Input

adb shell input text does not support CJK characters natively. The helper adb_input_text handles this by:

  • Using adb shell am broadcast with ADBKeyboard if available
  • Falling back to clipboard-based input: copy to clipboard via adb shell service call clipboard, then paste

If ADB Keyboard IME is installed (com.android.adbkeyboard), enable it:

adb shell ime set com.android.adbkeyboard/.AdbIME
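The choice between input paths can be sketched as follows. This is an assumption about how such a dispatcher might look, not the logic inside adb_input_text. It uses the ADB_INPUT_TEXT broadcast action documented by the ADBKeyboard project and the %s convention that adb shell input text uses for spaces. The clipboard fallback is left out: the function returns None to signal that it is needed.

```python
import shlex
from typing import List, Optional


def build_input_command(text: str, adbkeyboard_active: bool) -> Optional[List[str]]:
    """Return an adb argv that types `text`, or None if a clipboard fallback is needed."""
    if text.isascii():
        # `input text` interprets %s as a space; quote the rest for the device shell.
        return ["adb", "shell", "input", "text", shlex.quote(text.replace(" ", "%s"))]
    if adbkeyboard_active:
        # Non-ASCII text goes through the ADBKeyboard IME broadcast.
        return [
            "adb", "shell", "am", "broadcast",
            "-a", "ADB_INPUT_TEXT", "--es", "msg", shlex.quote(text),
        ]
    return None


print(build_input_command("Hello World", adbkeyboard_active=False))
print(build_input_command("你好", adbkeyboard_active=True))
```

Quoting with shlex.quote matters because adb shell hands the joined arguments to a shell on the device, where unquoted characters such as & or ; would be interpreted.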

5. Coordinate System

  • All coordinates are in physical pixels matching the device resolution
  • adb shell wm size returns the canonical resolution (e.g., 1080x2340)
  • Screenshot pixel dimensions may differ from device resolution — never estimate coordinates from screenshot pixel positions
  • Always derive coordinates from uiautomator dump bounds
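The arithmetic behind these rules is shown below. The sketch parses adb shell wm size output, derives a vertical swipe from the resolution, and rescales a screenshot pixel to device coordinates for the rare case where a screenshot is the only source. The function names are illustrative. Preferring an "Override size" line over "Physical size" is an assumption about which value input coordinates follow, and the swipe fractions are not taken from adb-helpers.sh.

```python
import re
from typing import Tuple


def parse_wm_size(output: str) -> Tuple[int, int]:
    """Extract (width, height) from `adb shell wm size` output."""
    sizes = dict(re.findall(r"(Physical|Override) size:\s*(\d+x\d+)", output))
    # Assumption: when an override is set, it is the size the UI is laid out in.
    raw = sizes.get("Override") or sizes.get("Physical")
    if raw is None:
        raise ValueError(f"no size found in: {output!r}")
    width, height = raw.split("x")
    return int(width), int(height)


def scroll_down_swipe(width: int, height: int, fraction: float = 0.5) -> Tuple[int, int, int, int]:
    """Swipe upward along the vertical center line, covering `fraction` of the height."""
    x = width // 2
    start_y = int(height * (0.5 + fraction / 2))
    end_y = int(height * (0.5 - fraction / 2))
    return x, start_y, x, end_y


def screenshot_to_device(
    px: int, py: int, shot_size: Tuple[int, int], device_size: Tuple[int, int]
) -> Tuple[int, int]:
    """Rescale a pixel position in a (possibly resized) screenshot to device pixels."""
    shot_w, shot_h = shot_size
    dev_w, dev_h = device_size
    return round(px * dev_w / shot_w), round(py * dev_h / shot_h)


size = parse_wm_size("Physical size: 1080x2340\n")
print(size, scroll_down_swipe(*size))
print(screenshot_to_device(270, 600, (540, 1170), size))
```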

6. Handling Failures

If an action doesn't produce the expected result:

  1. Re-dump UI hierarchy — the element may have moved or state changed
  2. Take a screenshot — visual context may reveal popups, loading states, or errors
  3. Check if the element is enabled="true" and clickable="true" before tapping
  4. If element is not found by text, try partial match or search by resource-id
  5. If the app renders its content in a WebView, the UI dump may not capture web elements; as a last resort, tap by coordinates taken from a screenshot, scaled to the device resolution
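Steps 3 and 4 of the checklist above can be expressed as one lookup that supports partial text matches and reports whether each hit is both enabled and clickable. This is a sketch with hypothetical names and sample data, assuming the standard uiautomator dump attributes; the skill's helper script may handle this differently.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def find_by_text(dump_xml: str, text: str, partial: bool = False) -> List[Dict[str, object]]:
    """Find nodes by exact or partial text and report whether each is tappable."""
    hits = []
    for node in ET.fromstring(dump_xml).iter("node"):
        label = node.get("text", "")
        matched = (text in label) if partial else (label == text)
        if not text or not matched:
            continue
        hits.append({
            "text": label,
            "resource_id": node.get("resource-id", ""),
            "bounds": node.get("bounds", ""),
            # Tappable only when both state flags are "true" in the dump.
            "tappable": node.get("enabled") == "true" and node.get("clickable") == "true",
        })
    return hits


# Hypothetical dump fragment for illustration.
DUMP = """<hierarchy rotation="0">
  <node text="Send message" resource-id="com.example:id/iv_send" enabled="false" clickable="true" bounds="[900,2000][1040,2140]" />
  <node text="Create" resource-id="com.example:id/create" enabled="true" clickable="true" bounds="[40,2000][300,2140]" />
</hierarchy>"""

for hit in find_by_text(DUMP, "Send", partial=True):
    print(hit["text"], "tappable" if hit["tappable"] else "not tappable yet")
```

An element that is found but not tappable usually means the screen is still loading or a required field is empty, so waiting and re-dumping is more useful than tapping.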

7. App Launch

Prefer adb_find_package + adb_launch_app over the monkey command:

# Find the app
adb_find_package "wechat"
# Launch it
adb_launch_app "com.tencent.mm"

Limitations

  • uiautomator dump doesn't work during animations — wait for idle state
  • WebView/Flutter/game content may not appear in UI hierarchy — use screenshot-based approach
  • Some custom views may have empty text and no resource-id — use bounds + screenshot cross-reference
  • Cap each task at roughly 100 actions to avoid infinite loops
Usage Guidance
This skill appears to do what it claims (ADB-based UI automation). However:

  1. Automated exploration will click/tap elements and can perform destructive or privacy-sensitive actions on the device (send messages, change settings, make purchases). Only run it on a test device or after explicitly reviewing which package and actions will be targeted; prefer manual observe-first runs.
  2. Review the included scripts (adb-helpers.sh and app_explorer.py) before use to confirm no unexpected commands for your environment.
  3. Keep ADB enabled only when needed and disconnect when done.
  4. If you intend to let an agent run this autonomously, restrict it to devices you control and consider disabling autonomous invocation until you’ve tested behavior interactively.
Capability Analysis
Type: OpenClaw Skill
Name: adb-phone-control
Version: 1.0.1

The adb-phone-control skill bundle provides a legitimate set of tools for automating and exploring Android devices via the Android Debug Bridge (ADB). It includes a bash helper script (adb-helpers.sh) for UI interactions like tapping elements by text/ID and a Python script (app_explorer.py) for recursively crawling an app's UI to generate a navigation tree. The code uses standard ADB commands for screen capture, UI dumping, and input events, with no evidence of data exfiltration, malicious persistence, or prompt injection.
Capability Assessment
Purpose & Capability
Name/description (ADB control) matches the code and SKILL.md: both require adb and python3 and implement UI dump, screenshot, tap/swipe/input, and an app explorer. No unrelated binaries or credentials are requested.
Instruction Scope
SKILL.md and helper scripts instruct the agent to dump UI, pull screenshots/dumps to local path, and run automated taps/swipes/inputs. This is expected for device automation, but the app_explorer recursively clicks elements which can trigger side effects (sending messages, purchases, or destructive actions). The instructions also tell the agent to 'read' screenshots for verification — that will expose device screen content to whatever model/tool is used to view images.
Install Mechanism
Instruction-only install (no download/extract). Two local code files are included and executed by sourcing/running them. No external installers or remote downloads are used by the skill itself.
Credentials
No secret environment variables or credentials are requested; ADB_OUTPUT_DIR is optional and reasonable. The skill does not ask for cloud keys or unrelated secrets.
Persistence & Privilege
always is false and the skill does not request persistent system-wide privileges. It operates only by invoking adb commands against a connected device and writing output into a local output directory.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install adb-phone-control
  3. After installation, invoke the skill by name or use /adb-phone-control
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.1
Changelog for adb-phone-control v1.0.1

  • Added explicit requirements section: now documents dependencies on adb, python3, and optional ADB_OUTPUT_DIR environment variable.
  • Detailed permission usage for device operations via ADB, covering input events, UI dump, screencap, IME broadcast, and clipboard.
  • Minor documentation edits: new "Requirements" and "Permissions Used" sections, clarifications for user setup.
  • No code or functional changes; this is a documentation enhancement for improved clarity and onboarding.
v1.0.0
Initial release of ADB Phone Control.

  • Provides a structured approach for automating Android devices via ADB using an observe-locate-act-verify loop.
  • Emphasizes using UI hierarchy (via uiautomator dump) for locating elements; screenshots are only for verification or fallback.
  • Includes a comprehensive Bash helper toolset for tapping by text/id, swiping, typing (including CJK characters), scrolling, launching apps, and more.
  • Documents critical rules for reliable automation: waiting strategies, keyboard management, safe use of coordinates, and robust error handling.
  • Details standard operating procedures for setup, navigation, text input, and failure recovery.
Metadata
Slug adb-phone-control
Version 1.0.1
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 2
Frequently Asked Questions

What is adb-phone-control?

Use when the user asks to control, operate, or automate an Android phone via ADB — tapping, swiping, typing, launching apps, or any UI interaction on a conne... It is an AI Agent Skill for Claude Code / OpenClaw, with 143 downloads so far.

How do I install adb-phone-control?

Run "/install adb-phone-control" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is adb-phone-control free?

Yes, adb-phone-control is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does adb-phone-control support?

adb-phone-control is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created adb-phone-control?

It is built and maintained by txmonkey (@txmonkey); the current version is v1.0.1.
