Midscene Automations Skills for iOS

Name: Midscene Automations Skills for iOS
Author: quanru

Author: Leyang · GitHub · v1.0.4

cross-platform ⚠ suspicious

Total downloads: 1152

Install in OpenClaw

/install midscene-ios-automation

Description

Vision-driven iOS device automation using Midscene CLI. Operates entirely from screenshots — no DOM or accessibility labels required. Can interact with all v...

Security recommendations

This skill's instructions require Node/npm (npx), an iOS device connection (WebDriverAgent), and API keys for external model providers, but the registry metadata does not declare any of those requirements. Before installing or using it:
(1) Confirm you have Node/npm and understand that `npx` will download and run @midscene/ios from npm at runtime.
(2) Only provide model API keys for trusted providers and avoid using high-privilege production keys while testing. Screenshots captured by the skill will be sent to the configured MIDSCENE_MODEL_BASE_URL, so do not use it with sensitive apps or data unless you trust the endpoint.
(3) Verify how the CLI connects to your iOS device (WebDriverAgent, network/USB proxy) and whether additional local tooling or permissions are needed.
(4) Ask the publisher for the source/homepage and a manifest that correctly declares required binaries and environment variables; do not proceed if you cannot validate the origin.
(5) If you must try it, test in a controlled environment with disposable API keys and non-sensitive apps.

Functional analysis

Type: OpenClaw Skill
Name: midscene-ios-automation
Version: 1.0.4

The skill provides a legitimate interface for iOS device automation using the Midscene.js framework via the `@midscene/ios` CLI. It follows standard automation patterns using WebDriverAgent and requires environment variables for AI model integration (e.g., Gemini, Qwen) as documented in SKILL.md. There is no evidence of malicious intent, data exfiltration, or unauthorized persistence.

Capability assessment

⚠ Purpose & Capability

The skill's description and SKILL.md expect use of the Midscene CLI via `npx @midscene/ios@1`, visual models, and connection to iOS devices (WebDriverAgent). However, the registry metadata declares no required binaries or config paths. In practice the agent will need Node/npm (npx) and access to an iOS device/agent — none of which are declared, which is inconsistent with the stated purpose.
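As a rough illustration of the undeclared prerequisite described above, a preflight check can confirm that the binaries are on PATH before the skill is invoked. This is a minimal sketch, not part of the skill: the `REQUIRED_BINARIES` tuple and the `missing_binaries` helper are hypothetical names chosen for this example.

```python
import shutil

# Binaries the assessment says are needed at runtime but are not declared
# in the registry metadata. This tuple is an assumption for illustration.
REQUIRED_BINARIES = ("node", "npx")


def missing_binaries(names=REQUIRED_BINARIES, which=shutil.which):
    """Return the binary names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("Missing required binaries:", ", ".join(missing))
    else:
        print("node and npx are available on PATH")
```

The check covers only the Node tooling; whether an iOS device or WebDriverAgent is reachable would need a separate, setup-specific test.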

⚠ Instruction Scope

Runtime instructions direct the agent to take screenshots and read the saved image files, and to send them to the configured model endpoint (MIDSCENE_MODEL_BASE_URL) for visual analysis. That is coherent with the described functionality but implies transmitting potentially sensitive screen contents to external model providers. The SKILL.md also instructs the agent to load .env in the current working directory and rely on environment variables for API keys — this expands the agent's access to secrets and local files.
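Because screenshots go to whatever MIDSCENE_MODEL_BASE_URL points at, one possible safeguard is to compare that URL against a personal allowlist before running the skill. The sketch below assumes such an allowlist exists; `endpoint_is_trusted` and the host `models.example.com` are hypothetical and not taken from the skill or from Midscene.

```python
import os
from urllib.parse import urlparse


def endpoint_is_trusted(base_url, trusted_hosts):
    """Return True only for https URLs whose host is in the allowlist."""
    parsed = urlparse(base_url)
    host = (parsed.hostname or "").lower()
    allowed = {h.lower() for h in trusted_hosts}
    return parsed.scheme == "https" and host in allowed


if __name__ == "__main__":
    # Hypothetical allowlist; replace with the providers you actually trust.
    trusted = {"models.example.com"}
    base_url = os.environ.get("MIDSCENE_MODEL_BASE_URL", "")
    if endpoint_is_trusted(base_url, trusted):
        print("Model endpoint is on the allowlist")
    else:
        print("Model endpoint is not on the allowlist; screenshots would leave trusted hosts")
```

An exact host match is deliberately strict: a subdomain or a plain-http URL fails the check and has to be added or fixed explicitly.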

ℹ Install Mechanism

There is no install spec (instruction-only), which minimizes direct disk writes. However, the runtime use of `npx @midscene/ios@1` means code will be fetched from the npm registry at runtime. This is expected for an npm-based CLI but is not declared in the manifest; there are no direct download URLs or extract steps in the skill itself.

⚠ Credentials

SKILL.md requires several model-related environment variables (MIDSCENE_MODEL_API_KEY, MIDSCENE_MODEL_NAME, MIDSCENE_MODEL_BASE_URL, MIDSCENE_MODEL_FAMILY and optional flags). Those are reasonable for a vision model-driven CLI, but the registry metadata declares no required env vars — an important mismatch. The variables are sensitive (API keys) and will be used to send screenshots to third-party endpoints, so requesting them should have been declared explicitly in the registry metadata.
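The four variable names listed above can be checked before a run, and the key can be masked so it never appears in full in logs. This is a sketch under assumptions: the helper names `missing_env_vars` and `mask_secret` are made up for this example, and the optional flags mentioned in the assessment are left out because their names are not given here.

```python
import os

# Variable names as listed in the assessment of SKILL.md.
REQUIRED_ENV_VARS = (
    "MIDSCENE_MODEL_API_KEY",
    "MIDSCENE_MODEL_NAME",
    "MIDSCENE_MODEL_BASE_URL",
    "MIDSCENE_MODEL_FAMILY",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def mask_secret(value, visible=4):
    """Hide all but the last few characters of a secret."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Unset variables:", ", ".join(missing))
    else:
        print("API key in use:", mask_secret(os.environ["MIDSCENE_MODEL_API_KEY"]))
```

Passing a plain dictionary as `env` makes it easy to try the check with a disposable key set before exporting anything into the real environment.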

✓ Persistence & Privilege

The skill does not request 'always: true' and does not declare system-wide config changes. It does instruct the agent to run synchronous CLI commands; autonomous invocation is allowed by default but not excessive here. There is no sign the skill requests persistent elevated system privileges.

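How to use

1. Make sure OpenClaw is installed (locally or via Docker).
2. Enter the install command in the chat box: /install midscene-ios-automation
3. Once installation completes, invoke the Skill by name or trigger it with /midscene-ios-automation
4. Provide the inputs described in the Skill's parameter documentation to get structured output.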

Version history

v1.0.4

midscene-ios-automation 1.0.4
- Added explicit documentation of required environment variables in the skill manifest with an env section.
- Included security guidance for protecting API keys and recommended adding `.env` to `.gitignore`.
- No functional changes; documentation update only.
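The `.gitignore` recommendation in this release can be verified with a few lines of code. The sketch below is a deliberately rough check: `ignores_dotenv` is a hypothetical helper that only looks for a literal `.env` or `/.env` entry and does not evaluate glob patterns or negation rules the way Git does.

```python
from pathlib import Path


def ignores_dotenv(gitignore_text):
    """Return True if the text has a line that is exactly .env or /.env."""
    entries = {line.strip() for line in gitignore_text.splitlines()}
    return bool(entries & {".env", "/.env"})


if __name__ == "__main__":
    gitignore = Path(".gitignore")
    text = gitignore.read_text(encoding="utf-8") if gitignore.is_file() else ""
    if ignores_dotenv(text):
        print(".env is listed in .gitignore")
    else:
        print(".env is not listed in .gitignore; API keys could be committed")
```

For an authoritative answer inside a real repository, `git check-ignore .env` applies Git's full matching rules.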

v1.0.3

- Version 1.0.3 released.
- No file changes were detected in this update.
- All workflow rules, best practices, and prerequisites remain unchanged.
- Documentation, usage instructions, and troubleshooting steps are fully preserved from the previous version.

v1.0.2

- Enforced a new rule: after automation completes, always summarize and present task results to the user, including key data, actions, screenshots, and findings.
- Updated model configuration examples (added Qwen 3.5, Doubao Seed 2.0 Lite; removed outdated Qwen3-VL and Doubao 1.6).
- Clarified workflow and best practices to require proactive reporting of results after each task.
- Minor corrections to naming and environment variable descriptions.

v1.0.1

**Summary:** This update refines the skill to focus on end-to-end, vision-driven iOS automation with clearer setup, streamlined workflows, and improved best practices.
- Simplifies workflow by promoting the use of a single, high-level `act` command for multi-step UI interactions rather than step-by-step CLI commands.
- Updates environment variable requirements for modern visual grounding AI models, with explicit setup examples and stronger prerequisite checks.
- Revises best practices: batch related actions into one prompt, describe UI elements clearly, and always summarize generated output files for users.
- Removes references to running commands in the background and tool misuse for a more robust, synchronous automation process.
- Documentation is streamlined to emphasize screenshot-driven, technology-agnostic interactions for all visible elements.
- Adds troubleshooting and model configuration resources for easier onboarding and debugging.

v1.0.0

Initial release of iOS Device Automation using Midscene CLI:
- Automate iOS devices and simulators with natural language commands via WebDriverAgent.
- Use Bash tool calls to execute Midscene CLI actions like tap, scroll, input, screenshots, and more.
- Strict workflow: connect, take screenshot, analyze, perform single action, repeat, then disconnect.
- Enforced rules: do not use background execution, only one CLI command per Bash call, 60s timeout.
- Includes best practices for UI targeting, transient UI, and troubleshooting connectivity and API key issues.
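The enforced rules in this release (no background execution, one command per call, 60s timeout) amount to a synchronous runner with a hard time limit. The sketch below shows that pattern in general terms; `run_one_command` is a hypothetical helper, and the demo runs the Python interpreter as a stand-in because the exact Midscene CLI arguments are not given on this page.

```python
import subprocess
import sys

TIMEOUT_SECONDS = 60  # the 60s limit named in the v1.0.0 rules


def run_one_command(argv, timeout=TIMEOUT_SECONDS):
    """Run exactly one command in the foreground and wait for it to finish.

    Returns (returncode, stdout); returncode is None if the timeout expired.
    """
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is
        # left running in the background.
        return None, ""
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Stand-in command; a real call would pass the CLI's own arguments.
    code, out = run_one_command([sys.executable, "-c", "print('ok')"])
    print(code, out.strip())
```

Passing the command as a list rather than a shell string keeps each call to a single process, which matches the one-command-per-call rule.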

Metadata

Slug: midscene-ios-automation

Version: 1.0.4

License: not specified

Total installs: 4

Current installs: 4

Versions: 5

FAQ

What is Midscene Automations Skills for iOS?

Vision-driven iOS device automation using Midscene CLI. Operates entirely from screenshots: no DOM or accessibility labels required. Can interact with all v... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 1152 total downloads so far.

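How do I install Midscene Automations Skills for iOS?

Run the command "/install midscene-ios-automation" in the OpenClaw or Claude Code chat box to install it in one step. Installation itself needs no extra configuration, although the security notes on this page indicate that running the skill requires Node/npm (npx), an iOS device connection (WebDriverAgent), and model API keys.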

Is Midscene Automations Skills for iOS free?

Yes. Midscene Automations Skills for iOS is completely free (free and open source) and can be freely downloaded, installed, and used.

Which platforms does Midscene Automations Skills for iOS support?

Midscene Automations Skills for iOS is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Midscene Automations Skills for iOS?

It is developed and maintained by Leyang (@quanru); the current version is v1.0.4.