quanru

Midscene Automations Skills for Android

by Leyang · GitHub ↗ · v1.0.2
cross-platform ⚠ suspicious
Downloads 1740
Stars 0
Active Installs 8
Versions 3
Install in OpenClaw
/install midscene-android-automation
Description
Vision-driven Android device automation using Midscene. Operates entirely from screenshots — no DOM or accessibility labels required. Can interact with all v...
Usage Guidance
What to consider before installing or using this skill:
- Metadata mismatch: The registry claims no required binaries or environment variables, but the SKILL.md requires Node (npx), ADB, and multiple model API keys/BASE_URLs. Ask the publisher to correct the metadata before trusting the skill.
- Sensitive data exposure: The workflow captures screenshots of your Android device and (by design) sends them to a model endpoint or Midscene service configured by MIDSCENE_MODEL_BASE_URL. Those screenshots can contain passwords, 2FA codes, messages, or other sensitive data. Only use with providers and endpoints whose privacy/security policies you trust.
- Dynamic code execution: npx will fetch and run @midscene/android from npm at runtime. If you want to proceed, inspect the package source (or run in an isolated environment) to verify behavior.
- Secrets handling: The skill suggests storing API keys in a .env file which Midscene will load. Ensure your .env contains only the intended keys and is not shared. Prefer provider-scoped API keys with minimal privileges and short lifetimes when possible.
- Test safely: If you must use the skill, test on an emulator or a disposable device to avoid leaking personal data. Monitor network traffic and limit which model endpoints you configure.
- Ask for provenance: There is no homepage or source listed. Prefer skills with a verifiable publisher, source repository, and documentation. If you cannot verify origin, exercise caution.
If you want help: I can extract the exact env vars and commands the SKILL.md requires, suggest safer configuration choices (e.g., local/private model endpoints, scoped API keys), or draft questions to ask the publisher to clarify metadata and data handling.
Capability Analysis
Type: OpenClaw Skill
Name: midscene-android-automation
Version: 1.0.2
The skill facilitates Android automation by executing shell commands via `npx @midscene/android@1`, which downloads and runs remote code at runtime. It requires users to provide sensitive LLM API keys through environment variables and grants the agent broad UI control over connected devices. While these capabilities are necessary for the stated purpose of vision-driven automation, the reliance on `Bash` and remote package execution represents a significant attack surface (SKILL.md).
Capability Assessment
Purpose & Capability
The SKILL.md describes vision-driven Android automation via Midscene and ADB, which is internally coherent for the stated purpose. However, the registry metadata claims no required binaries or env vars, while the instructions clearly require Node (npx @midscene/android@1), ADB (adb shell ...), and model credentials. This omission is a mismatch that reduces transparency and is unexpected for a skill with these capabilities.
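Because the registry metadata does not declare these prerequisites, a user may want to check them by hand before running the skill. The following is a minimal sketch of such a preflight check, not part of the skill itself. The binary and variable names are taken from this review; the real SKILL.md may require more or fewer, and the function name `missing_prerequisites` is hypothetical.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional

# Names taken from the review text above; treat the lists as assumptions,
# since the SKILL.md may require additional variables ("etc." in the review).
REQUIRED_BINARIES = ("npx", "adb")
REQUIRED_ENV_VARS = (
    "MIDSCENE_MODEL_API_KEY",
    "MIDSCENE_MODEL_NAME",
    "MIDSCENE_MODEL_BASE_URL",
    "MIDSCENE_MODEL_FAMILY",
)


def missing_prerequisites(
    env: Optional[Mapping[str, str]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return labels for every binary or env var that is not available."""
    env = os.environ if env is None else env
    # A binary counts as missing when it cannot be found on PATH.
    missing = [f"binary:{name}" for name in REQUIRED_BINARIES if which(name) is None]
    # An env var counts as missing when it is unset or empty.
    missing += [f"env:{name}" for name in REQUIRED_ENV_VARS if not env.get(name)]
    return missing
```

Calling `missing_prerequisites()` with no arguments inspects the current process environment and PATH; an empty list means every prerequisite named above was found.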
Instruction Scope
Instructions direct the agent to run npx commands, take screenshots, read saved image files, and supply model configuration (MIDSCENE_MODEL_*) including a BASE_URL. That implies screenshots and device UI content will be sent to remote model endpoints or Midscene services. Exfiltration of potentially sensitive screen contents to external providers is not called out in the registry metadata and is material to risk. The instructions also advise using ADB (powerful device control), which is consistent with purpose but increases the threat surface.
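Since screenshots go to whatever MIDSCENE_MODEL_BASE_URL points at, one cautious approach is to compare that URL against a short list of hosts you trust before any automation runs. The sketch below illustrates the idea under stated assumptions: the helper name `endpoint_is_trusted` and the example host are hypothetical, and requiring HTTPS is a design choice made here, not something the skill enforces.

```python
from typing import Iterable
from urllib.parse import urlparse


def endpoint_is_trusted(base_url: str, trusted_hosts: Iterable[str]) -> bool:
    """Return True only for an HTTPS URL whose host is explicitly trusted."""
    parsed = urlparse(base_url)
    host = (parsed.hostname or "").lower()
    allowed = {h.lower() for h in trusted_hosts}
    # Reject plain HTTP so screenshots are never sent unencrypted.
    return parsed.scheme == "https" and host in allowed
```

For example, `endpoint_is_trusted(os.environ.get("MIDSCENE_MODEL_BASE_URL", ""), {"models.example.internal"})` would gate a run on a single private endpoint (the host name is a placeholder).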
Install Mechanism
There is no install spec in the registry (the skill is instruction-only), which lowers friction. However, the runtime uses npx to fetch @midscene/android at invocation time, which downloads and runs code from npm dynamically. The metadata does not list Node/npm as a required binary. Pulling code at runtime is normal for npx, but it is worth noting because it executes third-party code on demand.
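The spec `@midscene/android@1` names a major-version range rather than one exact release, so the code that npx fetches can change between invocations. A reviewer who wants reproducible runs could flag such floating specs. The following sketch does that; the helper names are hypothetical, and the regular expression is a simplified approximation of an exact semver string rather than a full semver parser.

```python
import re
from typing import Optional, Tuple

# Simplified pattern for an exact version such as 1.2.3 or 1.2.3-beta.1.
_EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?$")


def split_package_spec(spec: str) -> Tuple[str, Optional[str]]:
    """Split 'name@version' into (name, version); version is None if absent."""
    at = spec.rfind("@")
    # Scoped package names start with '@', so index 0 is not a version separator.
    if at <= 0:
        return spec, None
    return spec[:at], spec[at + 1:]


def is_exactly_pinned(spec: str) -> bool:
    """Return True when the spec names one exact version, not a range or tag."""
    _, version = split_package_spec(spec)
    return version is not None and bool(_EXACT_VERSION.match(version))
```

With this check, `is_exactly_pinned("@midscene/android@1")` is false, while a fully specified version string (for instance a hypothetical `@midscene/android@1.2.3`) passes.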
Credentials
The SKILL.md requires multiple environment variables (MIDSCENE_MODEL_API_KEY, MIDSCENE_MODEL_NAME, MIDSCENE_MODEL_BASE_URL, MIDSCENE_MODEL_FAMILY, etc.) and suggests provider-specific keys (Google, Alibaba, OpenRouter, Doubao). These are appropriate for remote-model driven automation, but the skill registry declared 'none' for required env vars/primary credential. In addition, placing keys in a .env file (as recommended) means the tool will read local secret files; that access is not declared in metadata and could expose unrelated secrets if present.
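To reduce the chance of unrelated secrets sitting in the same .env file, the file can be audited before use. The sketch below lists keys that fall outside an allowed set of prefixes. It assumes a simple `KEY=value` file format and that the intended keys start with `MIDSCENE_`; provider-specific keys may use other names, so the prefix tuple is meant to be extended. The function name is hypothetical.

```python
from typing import List, Tuple


def unexpected_env_keys(
    dotenv_text: str,
    allowed_prefixes: Tuple[str, ...] = ("MIDSCENE_",),
) -> List[str]:
    """Return keys in .env-style text that match none of the allowed prefixes."""
    unexpected = []
    for raw in dotenv_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Tolerate the optional shell-style 'export ' prefix.
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key = line.split("=", 1)[0].strip()
        if not key.startswith(allowed_prefixes):
            unexpected.append(key)
    return unexpected
```

A typical call reads the file first, for example `unexpected_env_keys(open(".env").read())`, and treats a non-empty result as a prompt to move those keys elsewhere.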
Persistence & Privilege
The skill is instruction-only, has no install spec, always:false, and does not request to modify other skills or system-wide settings. It does require ADB access at runtime but does not request forced persistent inclusion or elevated platform privileges.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install midscene-android-automation
  3. After installation, invoke the skill by name or use /midscene-android-automation
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.2
- Added rule: always summarize and report automation results to the user after completing tasks; never end silently.
- Updated model configuration section with new examples for Qwen 3.5 and Doubao Seed 2.0 Lite, replacing older Qwen3-VL and Doubao Seed 1.6 references.
- Enhanced workflow pattern and best practices to mandate proactive user reporting, specifying what information must be included in the summary.
- Minor clarifications and improvements to existing documentation for prerequisites and expected output.
v1.0.1
**Major update: Moves to vision-first, prompt-driven UI automation and enhances setup instructions.**
- Introduced vision-based automation with no reliance on DOM or accessibility labels; all UI interaction is screenshot-driven.
- Expanded and clarified environment variable setup for multiple AI model providers (Doubao, Gemini, Qwen3-VL, Zhipu, etc.).
- Simplified the recommended workflow: focus actions into high-level `act` prompts instead of step-by-step commands.
- Added best practices for speed and reliability, such as pre-launching target apps via ADB and batching actions in single prompts.
- Updated troubleshooting and usage documentation for clarity, reflecting current CLI command usage and model setup conventions.
v1.0.0
Initial release of Android Device Automation skill using Midscene.
- Enables automation of Android devices via natural language commands using Midscene and adb.
- Supports taps, swipes, text input, app launches, screenshots, and more.
- Provides strict workflow and CLI usage guidelines to ensure reliable operation.
- Emphasizes best practices for screenshot-driven automation.
- Includes troubleshooting section for common device and connectivity issues.
Metadata
Slug midscene-android-automation
Version 1.0.2
License
All-time Installs 8
Active Installs 8
Total Versions 3
Frequently Asked Questions

What is Midscene Automations Skills for Android?

Vision-driven Android device automation using Midscene. Operates entirely from screenshots — no DOM or accessibility labels required. Can interact with all v... It is an AI Agent Skill for Claude Code / OpenClaw, with 1740 downloads so far.

How do I install Midscene Automations Skills for Android?

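Run "/install midscene-android-automation" in the OpenClaw or Claude Code chat to install it in one step. Installation itself needs no extra setup, but according to the Capability Assessment, running the skill requires Node (npx), ADB, and model API credentials.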

Is Midscene Automations Skills for Android free?

Yes, Midscene Automations Skills for Android is free to download, install, and use. The listing describes it as open-source, although the License field in its metadata is empty.

Which platforms does Midscene Automations Skills for Android support?

Midscene Automations Skills for Android is listed as cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Midscene Automations Skills for Android?

It is built and maintained by Leyang (@quanru); the current version is v1.0.2.
