Downloads: 1152
Stars: 0
Active Installs: 4
Versions: 5
Install in OpenClaw
/install midscene-ios-automation
Description
Vision-driven iOS device automation using Midscene CLI. Operates entirely from screenshots — no DOM or accessibility labels required. Can interact with all v...
Usage Guidance
This skill's instructions require Node/npm (npx), an iOS device connection (WebDriverAgent), and API keys for external model providers, but the registry metadata does not declare any of those requirements. Before installing or using it:
(1) Confirm you have Node/npm and understand that `npx` will download and run @midscene/ios from npm at runtime.
(2) Only provide model API keys for trusted providers and avoid using high-privilege production keys while testing. Screenshots captured by the skill will be sent to the configured MIDSCENE_MODEL_BASE_URL, so do not use it with sensitive apps or data unless you trust the endpoint.
(3) Verify how the CLI connects to your iOS device (WebDriverAgent, network/USB proxy) and whether additional local tooling or permissions are needed.
(4) Ask the publisher for the source/homepage and a manifest that correctly declares required binaries and environment variables; do not proceed if you cannot validate the origin.
(5) If you must try it, test in a controlled environment with disposable API keys and non-sensitive apps.
Capability Analysis
Type: OpenClaw Skill
Name: midscene-ios-automation
Version: 1.0.4
The skill provides a legitimate interface for iOS device automation using the Midscene.js framework via the `@midscene/ios` CLI. It follows standard automation patterns using WebDriverAgent and requires environment variables for AI model integration (e.g., Gemini, Qwen) as documented in SKILL.md. There is no evidence of malicious intent, data exfiltration, or unauthorized persistence.
Capability Assessment
Purpose & Capability
The skill's description and SKILL.md expect use of the Midscene CLI via `npx @midscene/ios@1`, visual models, and connection to iOS devices (WebDriverAgent). However, the registry metadata declares no required binaries or config paths. In practice the agent will need Node/npm (npx) and access to an iOS device/agent — none of which are declared, which is inconsistent with the stated purpose.
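Because the registry metadata does not declare the binaries, a preflight check on the host can catch the gap before the skill runs. The following is a minimal sketch, not part of the skill: the function name `missing_binaries` is hypothetical, and the binary list (`node`, `npx`) is inferred from the `npx @midscene/ios@1` invocation described above.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_binaries(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names that cannot be resolved on PATH.

    `which` is injectable so the check can be exercised without
    depending on what is installed on the current machine.
    """
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    # Assumed requirement list: the skill shells out to `npx`,
    # which in turn needs a Node.js runtime.
    missing = missing_binaries(["node", "npx"])
    if missing:
        print("Missing required binaries: " + ", ".join(missing))
    else:
        print("All assumed binaries are available.")
```

This only confirms that the executables exist; it says nothing about the iOS device connection, which still has to be verified separately.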
Instruction Scope
Runtime instructions direct the agent to take screenshots and read the saved image files, and to send them to the configured model endpoint (MIDSCENE_MODEL_BASE_URL) for visual analysis. That is coherent with the described functionality but implies transmitting potentially sensitive screen contents to external model providers. The SKILL.md also instructs the agent to load .env in the current working directory and rely on environment variables for API keys — this expands the agent's access to secrets and local files.
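Since screenshots go to whatever MIDSCENE_MODEL_BASE_URL points at, one possible safeguard is to compare that endpoint against a list of hosts you trust before running any automation. This is a sketch under stated assumptions: `endpoint_host_allowed` is a hypothetical helper, and `models.example.com` is a placeholder host, not a real provider.

```python
from typing import Iterable
from urllib.parse import urlparse


def endpoint_host_allowed(base_url: str, allowed_hosts: Iterable[str]) -> bool:
    """Return True only for HTTPS URLs whose host is in the allowlist."""
    parsed = urlparse(base_url)
    # Reject plain HTTP (and anything unparsable): screenshots may
    # contain sensitive screen contents.
    if parsed.scheme != "https":
        return False
    host = (parsed.hostname or "").lower()
    return host in {h.lower() for h in allowed_hosts}


if __name__ == "__main__":
    # Placeholder allowlist; replace with the providers you actually trust.
    trusted = {"models.example.com"}
    print(endpoint_host_allowed("https://models.example.com/v1", trusted))
```

An exact-host match is deliberately strict; subdomain wildcards would need extra handling.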
Install Mechanism
There is no install spec (instruction-only), which minimizes direct disk writes. However the runtime use of `npx @midscene/ios@1` means code will be fetched from the npm registry at runtime. This is expected for an npm-based CLI but is not declared in the manifest; there are no direct download URLs or extract steps in the skill itself.
Credentials
SKILL.md requires several model-related environment variables (MIDSCENE_MODEL_API_KEY, MIDSCENE_MODEL_NAME, MIDSCENE_MODEL_BASE_URL, MIDSCENE_MODEL_FAMILY and optional flags). Those are reasonable for a vision model-driven CLI, but the registry metadata declares no required env vars — an important mismatch. The variables are sensitive (API keys) and will be used to send screenshots to third-party endpoints, so requesting them should have been declared explicitly in the registry metadata.
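The four variables named above can be checked up front so that a missing key fails fast rather than partway through a run. The sketch below assumes all four are mandatory (the optional flags are not covered), and `missing_env_vars` is a hypothetical helper, not something shipped with the skill.

```python
import os
from typing import List, Mapping

# Variable names as listed in the Credentials assessment.
REQUIRED_ENV_VARS = (
    "MIDSCENE_MODEL_API_KEY",
    "MIDSCENE_MODEL_NAME",
    "MIDSCENE_MODEL_BASE_URL",
    "MIDSCENE_MODEL_FAMILY",
)


def missing_env_vars(env: Mapping[str, str]) -> List[str]:
    """Return required variable names that are unset or blank in `env`."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars(os.environ)
    if missing:
        # Report names only; never print the values of secrets.
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All assumed model variables are set.")
```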
Persistence & Privilege
The skill does not request 'always: true' and does not declare system-wide config changes. It does instruct the agent to run synchronous CLI commands; autonomous invocation is allowed by default but not excessive here. There is no sign the skill requests persistent elevated system privileges.
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install midscene-ios-automation
- After installation, invoke the skill by name or use /midscene-ios-automation
- Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.4
midscene-ios-automation 1.0.4
- Added explicit documentation of required environment variables in the skill manifest with an env section.
- Included security guidance for protecting API keys and recommended adding `.env` to `.gitignore`.
- No functional changes; documentation update only.
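The `.gitignore` recommendation in this release can be verified with a small check. This is a rough sketch only: `gitignore_covers_env` is a hypothetical helper that looks for a handful of common literal patterns and does not implement full gitignore matching (negation, nested directories, and other globs are ignored).

```python
def gitignore_covers_env(gitignore_text: str) -> bool:
    """Return True if a common pattern that ignores `.env` is present."""
    # Assumed set of patterns; real gitignore semantics are richer.
    patterns = {".env", "/.env", ".env*", "*.env"}
    for line in gitignore_text.splitlines():
        entry = line.strip()
        # Skip blank lines and comments.
        if not entry or entry.startswith("#"):
            continue
        if entry in patterns:
            return True
    return False


if __name__ == "__main__":
    sample = "# secrets\n.env\nnode_modules/\n"
    print(gitignore_covers_env(sample))
```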
v1.0.3
- Version 1.0.3 released.
- No file changes were detected in this update.
- All workflow rules, best practices, and prerequisites remain unchanged.
- Documentation, usage instructions, and troubleshooting steps are fully preserved from the previous version.
v1.0.2
- Enforced a new rule: after automation completes, always summarize and present task results to the user, including key data, actions, screenshots, and findings.
- Updated model configuration examples (added Qwen 3.5, Doubao Seed 2.0 Lite; removed outdated Qwen3-VL and Doubao 1.6).
- Clarified workflow and best practices to require proactive reporting of results after each task.
- Minor corrections to naming and environment variable descriptions.
v1.0.1
**Summary:** This update refines the skill to focus on end-to-end, vision-driven iOS automation with clearer setup, streamlined workflows, and improved best practices.
- Simplifies workflow by promoting the use of a single, high-level `act` command for multi-step UI interactions rather than step-by-step CLI commands.
- Updates environment variable requirements for modern visual grounding AI models, with explicit setup examples and stronger prerequisite checks.
- Revises best practices: batch related actions into one prompt, describe UI elements clearly, and always summarize generated output files for users.
- Removes references to running commands in the background and tool misuse for a more robust, synchronous automation process.
- Documentation is streamlined to emphasize screenshot-driven, technology-agnostic interactions for all visible elements.
- Adds troubleshooting and model configuration resources for easier onboarding and debugging.
v1.0.0
Initial release of iOS Device Automation using Midscene CLI:
- Automate iOS devices and simulators with natural language commands via WebDriverAgent.
- Use Bash tool calls to execute Midscene CLI actions like tap, scroll, input, screenshots, and more.
- Strict workflow: connect, take screenshot, analyze, perform single action, repeat, then disconnect.
- Enforced rules: do not use background execution, only one CLI command per Bash call, 60s timeout.
- Includes best practices for UI targeting, transient UI, and troubleshooting connectivity and API key issues.
Frequently Asked Questions
What is Midscene Automations Skills for iOS?
Vision-driven iOS device automation using Midscene CLI. Operates entirely from screenshots — no DOM or accessibility labels required. Can interact with all v... It is an AI Agent Skill for Claude Code / OpenClaw, with 1152 downloads so far.
How do I install Midscene Automations Skills for iOS?
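Run "/install midscene-ios-automation" in the OpenClaw or Claude Code chat to install it. Using the skill additionally requires Node/npm (npx), an iOS device connection (WebDriverAgent), and model API keys, as described under Usage Guidance.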
Is Midscene Automations Skills for iOS free?
Yes, Midscene Automations Skills for iOS is completely free (open-source). You can download, install and use it at no cost.
Which platforms does Midscene Automations Skills for iOS support?
Midscene Automations Skills for iOS runs anywhere OpenClaw / Claude Code is available.
Who created Midscene Automations Skills for iOS?
It is built and maintained by Leyang (@quanru); the current version is v1.0.4.