Midscene Automations Skills for Browser

Name: Midscene Automations Skills for Browser
Author: quanru

By Leyang · GitHub · v1.0.3

cross-platform ⚠ suspicious

Total downloads: 554
Current installs: 1
Versions: 3

Install in OpenClaw

/install midscene-computer-browser

Description

Vision-driven browser automation using Midscene. Operates entirely from screenshots — no DOM or accessibility labels required. Can interact with all visible...

Safety recommendations

This skill runs an npm CLI (npx @midscene/web@1) and requires model API keys, but the registry metadata does not declare those secrets and lists no homepage or source. Before installing or running it: (1) ask for the package repository or official homepage and verify the npm package contents (or prefer a published GitHub release); (2) do not supply high-privilege API keys, and use limited-scope keys or a quota-limited test project instead; (3) run the skill first in an isolated, disposable environment; (4) ask the developer to add the required env vars to the registry metadata and to include integrity and pinned-package information; and (5) monitor network and process activity while the skill runs. If you cannot verify the package source, treat it as higher risk and avoid providing real credentials.

Functional analysis

Type: OpenClaw Skill
Name: midscene-computer-browser
Version: 1.0.3

The skill bundle provides a legitimate interface for vision-driven browser automation using the Midscene.js framework. It uses the official `@midscene/web` package via `npx` to perform web interactions, screenshots, and data extraction. The instructions in `SKILL.md` are consistent with the stated purpose of the tool, and while it requires LLM API keys for operation, there is no evidence of malicious intent, data exfiltration, or unauthorized system access.

Capability assessment

ℹ Purpose & Capability

The described purpose (vision-driven browser automation using Midscene) aligns with the instructions to run npx @midscene/web and drive a headless Chrome via screenshots. However, the skill metadata declares no required environment variables or credentials, while SKILL.md explicitly requires MIDSCENE_MODEL_API_KEY, MIDSCENE_MODEL_NAME, MIDSCENE_MODEL_BASE_URL, and MIDSCENE_MODEL_FAMILY. The metadata and the instructions contradict each other on this point.
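Because the registry metadata does not declare these variables, a user may only discover them at run time. A minimal sketch of a pre-flight check follows, assuming the four names listed in this assessment are the complete required set (SKILL.md may list more); the function name `missing_env_vars` and the sample values are hypothetical.

```python
import os

# Variable names taken from the assessment above; SKILL.md may require others.
REQUIRED_VARS = (
    "MIDSCENE_MODEL_API_KEY",
    "MIDSCENE_MODEL_NAME",
    "MIDSCENE_MODEL_BASE_URL",
    "MIDSCENE_MODEL_FAMILY",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or blank in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


# Example with a hypothetical, partially configured environment.
example_env = {"MIDSCENE_MODEL_API_KEY": "placeholder-key", "MIDSCENE_MODEL_NAME": ""}
print(missing_env_vars(example_env))
```

Calling `missing_env_vars()` with no argument checks the current process environment instead of the sample mapping.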

⚠ Instruction Scope

SKILL.md instructs the agent to run npx CLI commands, take and read screenshots, and rely on a .env file (or system env vars) for model credentials. While reading screenshots is expected, the instructions implicitly expect access to .env and to secrets (API keys) and to execute network-fetched code via npx. The document does not instruct explicit exfiltration, but it gives the agent broad runtime powers (running arbitrary CLI commands from npm, persisting a browser process) which expands its attack surface.
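One way to narrow what a fetched CLI can see is to hand it a reduced environment rather than the full one. The sketch below is an illustration under assumptions, not part of the skill: the helper name `scrubbed_env`, the choice to keep `PATH` and `HOME`, and the `MIDSCENE_` prefix filter are all hypothetical defaults, and some platforms need additional variables to launch a process.

```python
import os


def scrubbed_env(env=None, keep=("PATH", "HOME"), keep_prefixes=("MIDSCENE_",)):
    """Return a copy of env containing only allow-listed variables.

    A variable survives if its name is in `keep` or starts with one of
    `keep_prefixes`; everything else (for example unrelated cloud
    credentials) is dropped.
    """
    env = os.environ if env is None else env
    return {
        name: value
        for name, value in env.items()
        if name in keep or name.startswith(keep_prefixes)
    }


# Example with a hypothetical environment that holds an unrelated secret.
sample = {
    "PATH": "/usr/bin",
    "AWS_SECRET_ACCESS_KEY": "unrelated-secret",
    "MIDSCENE_MODEL_NAME": "example-model",
}
print(sorted(scrubbed_env(sample)))
```

The returned mapping could be passed as the `env` argument of `subprocess.run` when launching the CLI, so the child process never receives the dropped variables. This limits exposure of environment secrets only; it does not sandbox file or network access.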

⚠ Install Mechanism

There is no install spec in the registry (instruction-only), but the runtime relies on npx @midscene/web@1 which will fetch and run code from the npm registry at runtime. The skill package source and homepage are unknown in registry metadata, increasing risk: running npx pulls arbitrary remote code unless you verify the package/release. This is moderate-to-high risk compared with a pinned, verifiable install source.
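The spec `@midscene/web@1` names a major version rather than one exact release, so the code that npx fetches can change between runs. The following sketch shows one way to tell an exact pin from a range; the helper names are hypothetical, the version `1.2.3` is a placeholder rather than a real release number, and the check covers only plain `name@version` specs (not tags, URLs, or range operators).

```python
import re

# Matches a full semantic version such as 1.2.3, 1.2.3-beta.1 or 1.2.3+build.5.
_EXACT_VERSION = re.compile(
    r"^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$"
)


def split_spec(spec):
    """Split 'name@version' into (name, version); version is '' if absent."""
    # A scoped name starts with '@', so only an '@' after index 0 is a separator.
    at = spec.rfind("@")
    if at <= 0:
        return spec, ""
    return spec[:at], spec[at + 1:]


def is_exact_pin(spec):
    """Return True only when the spec names one full major.minor.patch version."""
    _, version = split_spec(spec)
    return bool(_EXACT_VERSION.match(version))


print(is_exact_pin("@midscene/web@1"))      # major-only spec from this listing
print(is_exact_pin("@midscene/web@1.2.3"))  # placeholder exact version
```

An exact version narrows what gets fetched but does not by itself prove the package contents; the integrity information mentioned in the safety recommendations addresses that separately.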

⚠ Credentials

SKILL.md requires multiple API-related environment variables (MIDSCENE_MODEL_API_KEY, MIDSCENE_MODEL_NAME, MIDSCENE_MODEL_BASE_URL, MIDSCENE_MODEL_FAMILY, etc.) for external LLM providers, which is reasonable for a vision/LLM-backed tool. The registry metadata, however, lists no required env vars and no primary credential, so a user cannot see which secrets are needed before installing. The skill also suggests storing keys in a local .env file (which the agent may read indirectly via the CLI), so secret handling should be clarified and minimized.
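When reviewing what a local .env file would expose, it helps to list the keys it defines without printing their values. This is a simplified sketch, assuming plain `KEY=VALUE` lines; it is not the parser the CLI uses, it ignores features such as `export` prefixes and multi-line values, and the function name `masked_dotenv` is hypothetical.

```python
def masked_dotenv(text):
    """Parse simple KEY=VALUE lines and return {KEY: masked description}."""
    result = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip().strip("'\"")
        # Report only the length so the output never contains the secret.
        result[key.strip()] = "<empty>" if not value else f"<set, {len(value)} chars>"
    return result


# Example with hypothetical file contents.
sample = "# model settings\nMIDSCENE_MODEL_API_KEY=abc123\nMIDSCENE_MODEL_NAME=\n"
print(masked_dotenv(sample))
```

The output can be shared in a bug report or audit log, since it shows which variables are set but not what they contain.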

✓ Persistence & Privilege

The skill does not request always:true or other elevated platform privileges. It runs CLI commands that spawn a persistent headless Chrome process across CLI calls (as part of the automation flow), but that is local process behavior, not elevated registry-level privilege.

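How to use

Make sure OpenClaw is installed (locally or via Docker).
Enter the install command in the chat box: /install midscene-computer-browser
Once installed, call the skill by name or trigger it with /midscene-computer-browser
Provide the inputs described in the skill's parameter notes to receive structured output.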

Version history

v1.0.3

**User-facing changelog for midscene-computer-browser v1.0.3:**
- Enforces proactive result reporting: skill now requires a clear summary of actions taken, key data found, and files generated after every automation task.
- Adds a new critical rule: always report task results before finishing, with no silent endings.
- Updates model configuration examples for latest supported models (Qwen 3.5, Doubao Seed 2.0).
- Clarifies that summary reporting (results, findings, file paths) must be included after automation tasks.
- Renames skill metadata from "Browser Automation" to "browser-automation".

v1.0.1

- Updated skill to emphasize vision-based browser automation: now operates solely from screenshots with no DOM or accessibility label requirements.
- Simplified workflow and command usage: recommend using high-level natural language `act` commands instead of separate step-by-step operations.
- Added explicit instructions for environment variables and model configuration with practical model examples.
- Revised best practices: batch consecutive actions into a single `act` prompt for speed and reliability.
- Added troubleshooting section for connection issues, API key errors, timeouts, and screenshot file handling.
- Clarified synchronous command execution: never run Midscene commands in the background or chain commands together.

v1.0.0

Initial release: AI-powered browser automation using Midscene.
- Automate web browsing, data extraction, and frontend UI testing via headless Chrome (Puppeteer).
- Supports actions like navigating, form filling, clicking, scrolling, keyboard input, and complex workflows.
- Take and analyze screenshots to guide step-by-step interactions.
- Persistent browser session across CLI calls; allows multi-step workflows without losing state.
- Includes critical usage rules, example workflows, best practices, and detailed command references.
- Requires `.env` file with API key for operation; explicit workflow for transient and persistent UI.

Metadata

Slug midscene-computer-browser

Version 1.0.3

License not specified

Total installs 1

Current installs 1

Versions 3

FAQ

What is Midscene Automations Skills for Browser?

Vision-driven browser automation using Midscene. Operates entirely from screenshots; no DOM or accessibility labels required. Can interact with all visible... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 554 total downloads to date.

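How do I install Midscene Automations Skills for Browser?

Run the command "/install midscene-computer-browser" in the OpenClaw or Claude Code chat box to install it in one step. The install itself needs no extra configuration, but SKILL.md requires model environment variables (such as MIDSCENE_MODEL_API_KEY) before the skill can run.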

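Is Midscene Automations Skills for Browser free?

Yes. The listing describes Midscene Automations Skills for Browser as completely free (free and open source) to download, install, and use. The metadata on this page does not specify a license.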

Which platforms does Midscene Automations Skills for Browser support?

Midscene Automations Skills for Browser is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Midscene Automations Skills for Browser?

It is developed and maintained by Leyang (@quanru). The current version is v1.0.3.