
Midscene Automations Skills for Browser

Name: Midscene Automations Skills for Browser
Author: quanru

by Leyang · GitHub ↗ · v1.0.3

cross-platform ⚠ suspicious

554 downloads

Install in OpenClaw

/install midscene-computer-browser

Description

Vision-driven browser automation using Midscene. Operates entirely from screenshots — no DOM or accessibility labels required. Can interact with all visible...

Usage Guidance

This skill runs an npm CLI (npx @midscene/web@1) and requires model API keys, but the registry metadata does not declare those secrets and lists no homepage or source. Before installing or running it: (1) ask for the package repository or official homepage and verify the npm package contents (or prefer a published GitHub release); (2) do not supply high-privilege API keys; use limited-scope keys or a quota-limited test project; (3) run the skill first in an isolated, disposable environment; (4) ask the developer to declare the required environment variables in the registry metadata and to publish pinned package versions with integrity information; and (5) monitor network and process activity while the skill runs. If you cannot verify the package source, treat the skill as higher risk and do not provide real credentials.

Capability Analysis

Type: OpenClaw Skill
Name: midscene-computer-browser
Version: 1.0.3

The skill bundle provides a legitimate interface for vision-driven browser automation using the Midscene.js framework. It utilizes the official `@midscene/web` package via `npx` to perform web interactions, screenshots, and data extraction. The instructions in `SKILL.md` are consistent with the stated purpose of the tool, and while it requires LLM API keys for operation, there is no evidence of malicious intent, data exfiltration, or unauthorized system access.

Capability Assessment

ℹ Purpose & Capability

The described purpose (vision-driven browser automation using Midscene) aligns with the instructions to run npx @midscene/web and drive headless Chrome via screenshots. However, the skill metadata declares no required environment variables or credentials, while the SKILL.md explicitly requires MIDSCENE_MODEL_API_KEY, MIDSCENE_MODEL_NAME, MIDSCENE_MODEL_BASE_URL, and MIDSCENE_MODEL_FAMILY. The metadata and the instructions are therefore inconsistent with each other.

⚠ Instruction Scope

SKILL.md instructs the agent to run npx CLI commands, take and read screenshots, and rely on a .env file (or system env vars) for model credentials. While reading screenshots is expected, the instructions implicitly expect access to .env and to secrets (API keys) and to execute network-fetched code via npx. The document does not instruct explicit exfiltration, but it gives the agent broad runtime powers (running arbitrary CLI commands from npm, persisting a browser process) which expands its attack surface.
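One way to narrow the runtime exposure described above is to launch the CLI with a reduced environment, so that unrelated secrets in the parent shell are not visible to the fetched code. The following is a minimal sketch under stated assumptions: `build_child_env` is a hypothetical helper (it is not part of Midscene or OpenClaw), and the passthrough list is a guess that may need more entries on some systems.

```python
import os

# Hypothetical helper, not part of Midscene or OpenClaw.
# Keeps only variables the CLI plausibly needs: the MIDSCENE_ settings
# plus a few basics (an assumption; adjust the passthrough list per OS).
def build_child_env(parent_env, prefix="MIDSCENE_", passthrough=("PATH", "HOME")):
    """Return a copy of parent_env restricted to prefixed and passthrough keys."""
    return {
        key: value
        for key, value in parent_env.items()
        if key.startswith(prefix) or key in passthrough
    }

if __name__ == "__main__":
    child_env = build_child_env(os.environ)
    # Print names only, never values, so secrets do not end up in logs.
    print(sorted(child_env))
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when starting the CLI. This limits what the child process inherits; it does not replace running the skill in an isolated environment.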

⚠ Install Mechanism

There is no install spec in the registry (instruction-only), but the runtime relies on npx @midscene/web@1 which will fetch and run code from the npm registry at runtime. The skill package source and homepage are unknown in registry metadata, increasing risk: running npx pulls arbitrary remote code unless you verify the package/release. This is moderate-to-high risk compared with a pinned, verifiable install source.
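Note that a spec such as `@midscene/web@1` names a major-version range, so npm may resolve it to a different 1.x release on each run. The sketch below shows one way to flag specs that are not pinned to an exact version. The helper names are hypothetical, the regular expression is a simplified semver check rather than npm's own parser, and the version `1.2.3` used in the test is a made-up placeholder, not a known release.

```python
import re

# Simplified exact-version pattern such as 1.2.3 or 1.2.3-beta.1 (no ranges or tags).
EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$")

def split_spec(spec):
    """Split 'name@version' into (name, version); scoped names keep their leading '@'."""
    name, sep, version = spec.rpartition("@")
    if not sep or not name:
        # No version part: either 'pkg' or a bare scoped name like '@scope/pkg'.
        return spec, None
    return name, version

def is_exact_pin(spec):
    """Return True only when the spec carries a full exact version."""
    _, version = split_spec(spec)
    return version is not None and EXACT_VERSION.match(version) is not None

if __name__ == "__main__":
    for spec in ("@midscene/web@1", "@midscene/web"):
        print(spec, "pinned" if is_exact_pin(spec) else "not pinned")
```

An exact pin narrows which release is fetched, but it is still worth checking the package contents and integrity data before trusting it.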

⚠ Credentials

The SKILL.md requires multiple API-related environment variables (MIDSCENE_MODEL_API_KEY, MIDSCENE_MODEL_NAME, MIDSCENE_MODEL_BASE_URL, MIDSCENE_MODEL_FAMILY, etc.) for external LLM providers, which is reasonable for a vision/LLM-backed tool. However, the registry metadata lists no required environment variables and no primary credential, so a user has no way to see which secrets are needed before installing. The skill also suggests storing keys in a local .env file (which the agent may read indirectly via the CLI), so secret handling should be clarified and minimized.
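Because the metadata does not declare these variables, a user may want to confirm them before the first run. A minimal sketch, assuming only the four variable names quoted in this assessment (the SKILL.md may require others, as the "etc." suggests); `missing_vars` is a hypothetical helper, not part of the skill.

```python
import os

# Variable names are taken from the assessment above; the list may be incomplete.
REQUIRED_VARS = (
    "MIDSCENE_MODEL_API_KEY",
    "MIDSCENE_MODEL_NAME",
    "MIDSCENE_MODEL_BASE_URL",
    "MIDSCENE_MODEL_FAMILY",
)

def missing_vars(env, required=REQUIRED_VARS):
    """Return the required variable names that are unset or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]

if __name__ == "__main__":
    missing = missing_vars(os.environ)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required variables are set.")
```

The check reports names only and never prints values, which keeps API keys out of terminal history and logs.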

✓ Persistence & Privilege

The skill does not request always:true or other elevated platform privileges. It runs CLI commands that spawn a persistent headless Chrome process across CLI calls (as part of the automation flow), but that is local process behavior, not elevated registry-level privilege.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install midscene-computer-browser
After installation, invoke the skill by name or use /midscene-computer-browser
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.3

**User-facing changelog for midscene-computer-browser v1.0.3:**
- Enforces proactive result reporting: skill now requires a clear summary of actions taken, key data found, and files generated after every automation task.
- Adds a new critical rule: always report task results before finishing (no silent endings).
- Updates model configuration examples for latest supported models (Qwen 3.5, Doubao Seed 2.0).
- Clarifies that summary reporting (results, findings, file paths) must be included after automation tasks.
- Renames skill metadata from "Browser Automation" to "browser-automation".

v1.0.1

- Updated skill to emphasize vision-based browser automation: now operates solely from screenshots with no DOM or accessibility label requirements.
- Simplified workflow and command usage: recommend using high-level natural language `act` commands instead of separate step-by-step operations.
- Added explicit instructions for environment variables and model configuration with practical model examples.
- Revised best practices: batch consecutive actions into a single `act` prompt for speed and reliability.
- Added troubleshooting section for connection issues, API key errors, timeouts, and screenshot file handling.
- Clarified synchronous command execution: never run Midscene commands in the background or chain commands together.

v1.0.0

Initial release: AI-powered browser automation using Midscene.
- Automate web browsing, data extraction, and frontend UI testing via headless Chrome (Puppeteer).
- Supports actions like navigating, form filling, clicking, scrolling, keyboard input, and complex workflows.
- Take and analyze screenshots to guide step-by-step interactions.
- Persistent browser session across CLI calls; allows multi-step workflows without losing state.
- Includes critical usage rules, example workflows, best practices, and detailed command references.
- Requires `.env` file with API key for operation; explicit workflow for transient and persistent UI.

Metadata

Slug midscene-computer-browser

Version 1.0.3

License —

All-time Installs 1

Active Installs 1

Total Versions 3

Frequently Asked Questions

What is Midscene Automations Skills for Browser?

Vision-driven browser automation using Midscene. Operates entirely from screenshots — no DOM or accessibility labels required. Can interact with all visible... It is an AI Agent Skill for Claude Code / OpenClaw, with 554 downloads so far.

How do I install Midscene Automations Skills for Browser?

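Run "/install midscene-computer-browser" in the OpenClaw or Claude Code chat to install it in one step. Before the skill can run, the model environment variables required by its SKILL.md (such as MIDSCENE_MODEL_API_KEY) must be set.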

Is Midscene Automations Skills for Browser free?

Midscene Automations Skills for Browser is free to download and install. Note that the listing shows no license, and the skill requires API keys for an external LLM provider, which may carry its own usage costs.

Which platforms does Midscene Automations Skills for Browser support?

Midscene Automations Skills for Browser is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Midscene Automations Skills for Browser?

It is built and maintained by Leyang (@quanru); the current version is v1.0.3.
