
Midscene Automations Skills for Browser with Bridge

Name: Midscene Automations Skills for Browser with Bridge
Author: quanru

by Leyang · GitHub ↗ · v1.0.3

cross-platform ⚠ suspicious

Downloads: 910

Install in OpenClaw

/install midscene-computer-chrome-bridge

Description

Vision-driven browser automation using Midscene Bridge mode. Operates entirely from screenshots — no DOM or accessibility labels required. Can interact with...

Usage Guidance

This skill will drive your real Chrome and send screenshots and interactions to external model endpoints. Before installing or using it:

1) Ask the publisher for a source repository or homepage and a clear provenance for the Midscene Chrome extension and the @midscene/web package. Do not proceed if there is no trusted source.
2) Recognize that model API keys configured for this skill will allow third-party services to receive screenshots. Do NOT use it on accounts or sites containing sensitive information (banking, healthcare, private messages, MFA tokens) unless you fully trust the endpoint.
3) Verify the npm package @midscene/web@1 (check the code, release signatures, and publisher) or request a pinned tarball/sha to avoid unexpected remote code execution via npx.
4) Prefer running in an isolated environment: a disposable browser profile or VM, and limit the scope of pages the skill may access.
5) Require explicit user confirmation before interacting with any sensitive page, and consider using a private/local model endpoint (set MIDSCENE_MODEL_BASE_URL to a trusted internal host) so screenshots are not sent to third parties.
6) Ask the author to update registry metadata to declare required env vars and runtime binaries (npx/node), and to publish source/homepage; until then, treat this skill as untrusted.

If you cannot verify these items, do not install or use it on sensitive systems.
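The advice to point MIDSCENE_MODEL_BASE_URL at a trusted internal host can be enforced with a small pre-flight check. The following is a minimal sketch, not part of the skill: the allowlist contents and the function name `is_trusted_model_endpoint` are assumptions made for illustration, and only the environment variable name comes from the listing.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: replace with hosts you actually operate and trust.
TRUSTED_MODEL_HOSTS = {"localhost", "127.0.0.1", "models.internal.example"}


def is_trusted_model_endpoint(base_url, trusted_hosts=TRUSTED_MODEL_HOSTS):
    """Return True only if base_url is http(s) and its hostname is allowlisted."""
    parsed = urlparse(base_url)
    if parsed.scheme not in ("http", "https"):
        return False
    # parsed.hostname is lowercased and excludes any port or userinfo,
    # so "https://localhost@evil.example" resolves to "evil.example".
    return parsed.hostname in trusted_hosts
```

A wrapper script could call `is_trusted_model_endpoint(os.environ.get("MIDSCENE_MODEL_BASE_URL", ""))` and refuse to start the skill when it returns False.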

Capability Analysis

Type: OpenClaw Skill
Name: midscene-computer-chrome-bridge
Version: 1.0.3

The skill bundle provides a legitimate interface for Midscene.js, a vision-driven browser automation tool. It uses the official `@midscene/web` npm package to interact with a user's Chrome browser via a dedicated extension. The instructions in `SKILL.md` are well-structured, focusing on operational reliability (synchronous execution, result reporting) and standard AI model configuration (Gemini, Qwen, Doubao). No evidence of data exfiltration, malicious command execution, or deceptive prompt injection was found.

Capability Assessment

⚠ Purpose & Capability

The SKILL.md describes vision-driven automation of the user's real Chrome via a Midscene Chrome extension and requires model credentials to run. However, the registry metadata lists no required environment variables or binaries, and there is no source/homepage. The SKILL.md implicitly depends on node/npx and a remote @midscene/web package (via npx), none of which are declared. The required capabilities (access to browser state and external model endpoints) are plausible for the stated purpose, but the missing declarations and absent source/homepage are inconsistent with a well-documented skill and are concerning.

⚠ Instruction Scope

The instructions explicitly tell the agent to connect to the user's real Chrome, preserve cookies/sessions, take screenshots, read the saved image files, and send high-level prompts to the Midscene tool. That means screenshots (and therefore potentially passwords, 2FA, private messages, bank details, etc.) will be seen by downstream model endpoints. The SKILL.md also tells the agent not to verify extension status and to connect directly, which removes a safety/check step. The agent is given broad discretion to interact with any visible element and to scrape data, which is consistent with the stated purpose but increases privacy risk.
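Because every visible page can end up in a screenshot, one mitigation is to require explicit confirmation before the agent touches a sensitive site. The sketch below is illustrative only: the suffix list and the function name `needs_confirmation` are assumptions, not anything the skill provides.

```python
from urllib.parse import urlparse

# Hypothetical examples of sensitive host suffixes; tailor to your own accounts.
SENSITIVE_HOST_SUFFIXES = ("bank.example", "health.example", "mail.example")


def needs_confirmation(page_url, suffixes=SENSITIVE_HOST_SUFFIXES):
    """Return True if the page should not be automated without user approval."""
    host = urlparse(page_url).hostname or ""
    if not host:
        # Fail closed: a URL we cannot parse is treated as sensitive.
        return True
    # Match the suffix itself or any subdomain of it, but not lookalikes
    # such as "notbank.example".
    return any(host == s or host.endswith("." + s) for s in suffixes)
```

A denylist like this only covers sites you thought of in advance, so it complements rather than replaces a disposable browser profile.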

ℹ Install Mechanism

There is no declared install spec (instruction-only), which lowers static install risk. However, the runtime commands use 'npx @midscene/web@1', meaning npx will download and execute code from the npm registry at runtime. That is an implicit install/execute step that is not represented in the registry metadata, and it carries risk if the package or its release source is untrusted.
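One way to reduce the risk of the implicit download is to compare a package tarball against a pinned hash before running it. npm records integrity values as an algorithm prefix followed by a base64 digest (for example `sha512-...`), and the sketch below computes that form. It assumes you have already obtained the tarball bytes and a pinned value from a source you trust; the listing publishes neither, and the function names are hypothetical.

```python
import base64
import hashlib


def sri_sha512(data):
    """Return the 'sha512-<base64 digest>' integrity string for the given bytes."""
    digest = hashlib.sha512(data).digest()
    return "sha512-" + base64.b64encode(digest).decode("ascii")


def matches_pinned_integrity(tarball_bytes, pinned):
    """Return True if the tarball hashes to exactly the pinned integrity string."""
    return sri_sha512(tarball_bytes) == pinned
```

If the check fails, the safe response is to stop rather than fall back to an unpinned `npx` run.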

⚠ Credentials

SKILL.md requires multiple environment variables (MIDSCENE_MODEL_API_KEY, MIDSCENE_MODEL_NAME, MIDSCENE_MODEL_BASE_URL, MIDSCENE_MODEL_FAMILY, etc.) for external model providers (Google, OpenRouter, Aliyun, Doubao examples). None of these required env vars appear in the registry metadata. These credentials would allow external services to receive screenshots and page content, which is a high-sensitivity capability. Requesting model API keys is consistent with the skill's function, but the absence of these requirements from the declared metadata is an inconsistency and increases risk.
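Since the registry metadata does not declare these variables, a caller can check for them explicitly before invoking the skill. This is a minimal sketch: the four names are the ones listed above, the "etc." in the listing means the set may be incomplete, and `missing_model_vars` is a hypothetical helper.

```python
import os

# Names taken from the listing; SKILL.md may require others as well.
REQUIRED_MODEL_VARS = (
    "MIDSCENE_MODEL_API_KEY",
    "MIDSCENE_MODEL_NAME",
    "MIDSCENE_MODEL_BASE_URL",
    "MIDSCENE_MODEL_FAMILY",
)


def missing_model_vars(env=None):
    """Return the required variable names that are unset or blank in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_MODEL_VARS if not env.get(name, "").strip()]
```

Calling `missing_model_vars()` with no argument inspects the current process environment; passing a dict makes the check easy to test without touching real credentials.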

✓ Persistence & Privilege

The skill is not always-enabled and does not declare config paths or persistent system-wide changes. Autonomous model invocation is allowed (platform default) but not combined with 'always: true'. The skill does instruct storing a .env in the working directory (local only), which is normal for credentials but should be treated carefully.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install midscene-computer-chrome-bridge
After installation, invoke the skill by name or use /midscene-computer-chrome-bridge
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.3

**Adds strict result reporting requirements and updates model guidance.**
- Now mandates a clear summary of results after every automation task, including data found, actions completed, screenshots taken, and relevant findings.
- Updates environment variable examples and recommended models (adds Qwen 3.5, Doubao Seed 2.0 Lite, and related provider instructions).
- Clarifies workflow: do not disconnect from Bridge mode unless the user's entire task is fully complete; keep sessions available for continued use.
- Revises best practices and workflow pattern for improved clarity and efficiency.

v1.0.1

- Updated documentation to emphasize "vision-driven" automation: all actions operate from screenshots, no DOM/selector logic required.
- Expanded model setup instructions: now includes required environment variables and detailed configuration examples for Gemini, Qwen3-VL, Doubao, and others.
- Changed command usage examples to prefer high-level, natural language `act` commands for batching multi-step UI operations, improving efficiency.
- Clarified best practices: batch related actions into single `act` commands, summarize report files for the user, and never run commands in background.
- Updated troubleshooting guides for connection and screenshot issues; added direct Chrome Extension store link.
- Minor formatting and clarity improvements throughout for easier onboarding.

v1.0.0

Chrome Bridge Automation skill initial release:
- Enables AI-powered automation of the user's real Chrome browser via the Midscene Extension in Bridge mode.
- Supports browsing, navigation, interaction with authenticated pages, scraping, form filling, UI testing, and screenshot capture.
- Follows strict command formats and workflow patterns for reliable operation.
- Preserves browser context, sessions, and cookies for seamless web automation.
- Includes troubleshooting guidance and best practices for stable multi-step workflows.

Metadata

Slug midscene-computer-chrome-bridge

Version 1.0.3

License —

All-time Installs 3

Active Installs 3

Total Versions 3

Frequently Asked Questions

What is Midscene Automations Skills for Browser with Bridge?

Vision-driven browser automation using Midscene Bridge mode. Operates entirely from screenshots — no DOM or accessibility labels required. Can interact with... It is an AI Agent Skill for Claude Code / OpenClaw, with 910 downloads so far.

How do I install Midscene Automations Skills for Browser with Bridge?

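Run "/install midscene-computer-chrome-bridge" in the OpenClaw or Claude Code chat to install it in one step. At runtime the skill still needs Node.js/npx, the Midscene Chrome extension, and the model environment variables described in SKILL.md.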

Is Midscene Automations Skills for Browser with Bridge free?

The listing advertises Midscene Automations Skills for Browser with Bridge as free to download, install, and use. It shows no license and no source repository, so its open-source status cannot be confirmed from this page.

Which platforms does Midscene Automations Skills for Browser with Bridge support?

Midscene Automations Skills for Browser with Bridge is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Midscene Automations Skills for Browser with Bridge?

It is built and maintained by Leyang (@quanru); the current version is v1.0.3.
