Description

Build and run Gemini 2.5 Computer Use browser-control agents with Playwright. Use when a user wants to automate web browser tasks via the Gemini Computer Use model, needs an agent loop (screenshot → function_call → action → function_response), or asks to integrate safety confirmation for risky UI actions.

Usage instructions (SKILL.md)

Gemini Computer Use

Name: Gemini Computer Use
Author: am-will

Quick start

Copy the env template, set your API key in it, then source it:

cp env.example env.sh
$EDITOR env.sh
source env.sh

Create a virtual environment and install dependencies:

python -m venv .venv
source .venv/bin/activate
pip install google-genai playwright
playwright install chromium

Run the agent script with a prompt:

python scripts/computer_use_agent.py \
  --prompt "Find the latest blog post title on example.com" \
  --start-url "https://example.com" \
  --turn-limit 6
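The flags in the command above could be parsed roughly as follows. This is a minimal sketch, not the shipped script: the defaults, the help strings, and the choice to make --exclude repeatable are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags named in SKILL.md (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Gemini Computer Use agent (sketch)")
    parser.add_argument("--prompt", required=True, help="Goal for the agent")
    # Assumption: start on a blank page when no URL is given.
    parser.add_argument("--start-url", default="about:blank", help="Page to open first")
    # Assumption: 6 turns, matching the quick-start example.
    parser.add_argument("--turn-limit", type=int, default=6, help="Maximum model turns")
    # Assumption: --exclude may be passed several times, once per action name.
    parser.add_argument("--exclude", action="append", default=[],
                        help="Action name the model must not use (repeatable)")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args([
    "--prompt", "Find the latest blog post title on example.com",
    "--start-url", "https://example.com",
])
print(args.prompt, args.start_url, args.turn_limit, args.exclude)
```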

Browser selection

Default: Playwright's bundled Chromium (no env vars required).
Choose a channel (Chrome/Edge) with COMPUTER_USE_BROWSER_CHANNEL.
Use a custom Chromium-based executable (e.g., Brave) with COMPUTER_USE_BROWSER_EXECUTABLE.

If both are set, COMPUTER_USE_BROWSER_EXECUTABLE takes precedence.
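The precedence rule can be sketched as a small helper that turns the two environment variables into keyword arguments for Playwright's chromium.launch(). The helper name browser_launch_options is hypothetical; channel and executable_path are existing launch parameters in Playwright for Python, but how the shipped script wires them up is not shown in the source.

```python
import os


def browser_launch_options(env=None) -> dict:
    """Return kwargs for chromium.launch() based on the two env vars.

    COMPUTER_USE_BROWSER_EXECUTABLE wins when both are set; an empty
    dict means "use Playwright's bundled Chromium".
    """
    env = os.environ if env is None else env
    executable = env.get("COMPUTER_USE_BROWSER_EXECUTABLE", "").strip()
    channel = env.get("COMPUTER_USE_BROWSER_CHANNEL", "").strip()
    if executable:
        return {"executable_path": executable}
    if channel:
        return {"channel": channel}
    return {}
```

With Playwright installed, the result would be passed as playwright.chromium.launch(**browser_launch_options()).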

Core workflow (agent loop)

Capture a screenshot and send the user goal + screenshot to the model.
Parse function_call actions in the response.
Execute each action in Playwright.
If a safety_decision is require_confirmation, prompt the user before executing.
Send function_response objects containing the latest URL + screenshot.
Repeat until the model returns only text (no actions) or you hit the turn limit.
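The six steps above can be sketched as a loop that is independent of any particular SDK. Everything here is hypothetical scaffolding: Action, ModelTurn, and the four callables (ask_model, execute, observe, confirm) stand in for the Gemini call, the Playwright action dispatcher, the URL + screenshot capture, and the input() prompt. Stopping the loop when the user declines is an assumption; the source does not say what the script does in that case.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Action:
    name: str
    args: dict = field(default_factory=dict)
    safety_decision: Optional[str] = None  # e.g. "require_confirmation"


@dataclass
class ModelTurn:
    text: str = ""
    actions: list = field(default_factory=list)


def run_agent_loop(ask_model: Callable, execute: Callable, observe: Callable,
                   confirm: Callable, turn_limit: int = 6) -> Optional[str]:
    """Run the screenshot -> function_call -> action -> function_response loop."""
    observation = observe()   # step 1: current URL + screenshot
    responses = []            # function_response payloads for the next turn
    for _ in range(turn_limit):
        turn = ask_model(observation, responses)   # step 2: get function_calls
        if not turn.actions:
            return turn.text                       # step 6: text only, so done
        responses = []
        for action in turn.actions:
            # Step 4: ask the user first when the model flags the action.
            if action.safety_decision == "require_confirmation" and not confirm(action):
                return "stopped: user declined " + action.name  # assumed policy
            execute(action)                        # step 3: run it in the browser
            observation = observe()                # step 5: fresh URL + screenshot
            responses.append({"name": action.name, "response": observation})
    return None  # turn limit reached without a final text answer
```

Passing the collaborators in as callables keeps the control flow testable with fakes before a real browser or API key is involved.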

Operational guidance

Run in a sandboxed browser profile or container.
Use --exclude to block risky actions you do not want the model to take.
Keep the viewport at 1440x900 unless you have a reason to change it.
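One reason to keep the viewport fixed is that model coordinates must be mapped onto real pixels. The sketch below assumes the model reports positions on a normalized 0-999 grid, which should be checked against references/google-computer-use.md before relying on it. The helper names denormalize and is_blocked are hypothetical, and is_blocked shows only a local guard for names passed via --exclude, not how the shipped script implements that flag.

```python
VIEWPORT = {"width": 1440, "height": 900}  # size recommended in SKILL.md


def denormalize(x: int, y: int, viewport: dict = VIEWPORT, grid: int = 1000) -> tuple:
    """Map grid coordinates (assumed 0-999) to pixel coordinates in the viewport."""
    return int(x / grid * viewport["width"]), int(y / grid * viewport["height"])


def is_blocked(action_name: str, excluded: list) -> bool:
    """Return True when the action was excluded and must not be executed."""
    return action_name in set(excluded)


print(denormalize(500, 500))                    # centre of the grid in pixels
print(is_blocked("navigate", ["navigate"]))     # excluded action is rejected
```

In Playwright, the same dictionary could be passed as browser.new_context(viewport=VIEWPORT).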

Resources

Script: scripts/computer_use_agent.py
Reference notes: references/google-computer-use.md
Env template: env.example

Safe usage recommendations

Before installing or running this skill:

- Expect to set GEMINI_API_KEY (the code will exit if GEMINI_API_KEY is not set). The registry metadata incorrectly claimed no env vars; don't trust that field alone.
- Screenshots of the browser are sent to the Gemini/Google GenAI endpoint as part of normal operation. Those screenshots can contain sensitive data (credentials, personal info, 2FA codes). Only run this against pages you are comfortable sending to an external API.
- Run the agent in a sandboxed environment or container and avoid pointing COMPUTER_USE_BROWSER_EXECUTABLE at a browser that uses your real profile (bookmarks/cookies/sessions); otherwise the agent could act using your authenticated sessions.
- The included Python script appears to be truncated near the model invocation (a fragment referencing 'MOD' was cut off). That may be a bug or hide additional behavior. Inspect the full script locally before running; fix the apparent variable name and ensure the model call and loop are readable.
- Verify the safety confirmation flow: the script will prompt via input() only when the model provides 'safety_decision: require_confirmation'; many actions will execute without prompting. If you need stricter controls, modify the code to enforce confirmation or block lists before running.
- If you are uncertain about network exposure, run the script in an isolated VM/container and review network traffic to confirm only expected calls to Google GenAI occur.
- If you want to proceed, obtain the env.example referenced in SKILL.md, set GEMINI_API_KEY, inspect and possibly patch the script, and test on non-sensitive sites first.
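Two of the recommendations above, failing fast without an API key and enforcing stricter confirmation, could look roughly like this. It is a sketch under stated assumptions: require_api_key and needs_confirmation are hypothetical helpers, and the action names in ALWAYS_CONFIRM are illustrative placeholders to be replaced with whatever names the script actually dispatches.

```python
import os
import sys


def require_api_key(env=None) -> str:
    """Return GEMINI_API_KEY or exit with a message when it is missing."""
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        sys.exit("GEMINI_API_KEY is not set; edit and source env.sh first.")
    return key


# Assumption: illustrative action names that should always be confirmed locally.
ALWAYS_CONFIRM = frozenset({"navigate", "type_text_at"})


def needs_confirmation(action_name: str, safety_decision=None,
                       always_confirm=ALWAYS_CONFIRM) -> bool:
    """Confirm when the model asks for it OR when local policy demands it."""
    return safety_decision == "require_confirmation" or action_name in always_confirm
```

The point of the second helper is that the local policy does not depend on the model volunteering a safety_decision, so an action on the list is gated even when the model flags nothing.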

Functional analysis

Type: OpenClaw Skill
Name: gemini-computer-use
Version: 1.0.0

The skill is designed for browser automation using the Gemini Computer Use model and Playwright. It implements robust security features, including whitelisting supported actions, requiring user confirmation for potentially risky actions, and allowing users to exclude specific actions. The `SKILL.md` provides clear, benign instructions without any prompt injection attempts against the agent. The Python script's capabilities (browser navigation, interaction, screenshot capture) are directly aligned with its stated purpose, and there is no evidence of intentional malicious behavior such as data exfiltration, unauthorized execution, or persistence.

Capability assessment

ℹ Purpose & Capability

The name/description (Gemini Computer Use browser-control agents) matches the included script and instructions: it uses Playwright and the Google GenAI client to run a screenshot → function_call → action → function_response loop. However, the registry metadata claims 'Required env vars: none' while both the SKILL.md quick start and the script require GEMINI_API_KEY (and optionally COMPUTER_USE_BROWSER_CHANNEL / COMPUTER_USE_BROWSER_EXECUTABLE). This mismatch between the registry and the implementation should be corrected.

⚠ Instruction Scope

SKILL.md tells the user to set an API key and run the provided script. The runtime instructions and code capture full-page screenshots and send them (as inline image/png parts) along with the user prompt to the external Gemini model (Google GenAI). This is expected for the skill's purpose, but it means screenshots, which may contain sensitive information, are transmitted off-host. The instructions also allow the model to emit function_call actions that the script executes directly in Playwright; while the script supports a user confirmation flow for 'require_confirmation', most actions will execute without prompting. Additionally, the script included in the package is truncated near the model call (the text breaks off at what looks like a cut-off variable name), which could hide additional behavior or mean the shipped script will fail or behave unexpectedly.

✓ Install Mechanism

There is no automated install spec (instruction-only install). The SKILL.md instructs the user to create a virtualenv and pip install google-genai and playwright, then run 'playwright install chromium'. This is a standard, low-risk approach compared to bundled downloads from arbitrary URLs. The package includes a Python script; no external downloads or extract/install steps are declared in the skill bundle itself.

⚠ Credentials

The code legitimately requires GEMINI_API_KEY to call the Gemini Computer Use model and optionally COMPUTER_USE_BROWSER_CHANNEL and COMPUTER_USE_BROWSER_EXECUTABLE to control browser selection. Those env vars are proportionate to the stated purpose. However, the public registry metadata incorrectly lists no required env vars, which is misleading. Transmitting screenshots to the external API is intrinsic to the functionality but privacy-sensitive: the skill sends image data to Google's API, and users should consider whether that exposure is acceptable for the pages/screens they automate.

ℹ Persistence & Privilege

The skill is not always-enabled and does not request special platform privileges. The skill is allowed to be invoked autonomously (disable-model-invocation is false), which is the platform default; combined with broad browser control capabilities, autonomous invocation increases the blast radius (the agent could autonomously navigate, click, and type). SKILL.md does recommend running in a sandboxed profile or container. There is no evidence the skill modifies other skills or system settings.

Version history

v1.0.0

Initial release - Gemini 2.5 Computer Use browser-control agents with Playwright

Metadata

Slug gemini-computer-use

Version 1.0.0

License (none listed)

Total installs 13

Current installs 13

Number of versions 1

FAQ