← 返回 Skills 市场
am-will

Gemini Computer Use

作者 am-will · GitHub ↗ · v1.0.0
cross-platform ⚠ suspicious
3867
总下载
5
收藏
13
当前安装
1
版本数
在 OpenClaw 中安装
/install gemini-computer-use
功能描述
Build and run Gemini 2.5 Computer Use browser-control agents with Playwright. Use when a user wants to automate web browser tasks via the Gemini Computer Use model, needs an agent loop (screenshot → function_call → action → function_response), or asks to integrate safety confirmation for risky UI actions.
使用说明 (SKILL.md)

Gemini Computer Use

Quick start

  1. Source the env file and set your API key:

    cp env.example env.sh
    $EDITOR env.sh
    source env.sh
    
  2. Create a virtual environment and install dependencies:

    python -m venv .venv
    source .venv/bin/activate
    pip install google-genai playwright
    playwright install chromium
    
  3. Run the agent script with a prompt:

    python scripts/computer_use_agent.py \
      --prompt "Find the latest blog post title on example.com" \
      --start-url "https://example.com" \
      --turn-limit 6
    

Browser selection

  • Default: Playwright's bundled Chromium (no env vars required).
  • Choose a channel (Chrome/Edge) with COMPUTER_USE_BROWSER_CHANNEL.
  • Use a custom Chromium-based executable (e.g., Brave) with COMPUTER_USE_BROWSER_EXECUTABLE.

If both are set, COMPUTER_USE_BROWSER_EXECUTABLE takes precedence.

Core workflow (agent loop)

  1. Capture a screenshot and send the user goal + screenshot to the model.
  2. Parse function_call actions in the response.
  3. Execute each action in Playwright.
  4. If a safety_decision is require_confirmation, prompt the user before executing.
  5. Send function_response objects containing the latest URL + screenshot.
  6. Repeat until the model returns only text (no actions) or you hit the turn limit.

Operational guidance

  • Run in a sandboxed browser profile or container.
  • Use --exclude to block risky actions you do not want the model to take.
  • Keep the viewport at 1440x900 unless you have a reason to change it.

Resources

  • Script: scripts/computer_use_agent.py
  • Reference notes: references/google-computer-use.md
  • Env template: env.example
安全使用建议
Before installing or running this skill: - Expect to set GEMINI_API_KEY (the code will exit if GEMINI_API_KEY is not set). The registry metadata incorrectly claimed no env vars — don't trust that field alone. - Screenshots of the browser are sent to the Gemini/Google GenAI endpoint as part of normal operation. Those screenshots can contain sensitive data (credentials, personal info, 2FA codes). Only run this against pages you are comfortable sending to an external API. - Run the agent in a sandboxed environment or container and avoid pointing COMPUTER_USE_BROWSER_EXECUTABLE at a browser that uses your real profile (bookmarks/cookies/sessions) — otherwise the agent could act using your authenticated sessions. - The included Python script appears to be truncated near the model invocation (a fragment referencing 'MOD' was cut off). That may be a bug or hide additional behavior. Inspect the full script locally before running; fix the apparent variable name and ensure the model call and loop are readable. - Verify the safety confirmation flow: the script will prompt via input() only when the model provides 'safety_decision: require_confirmation'; many actions will execute without prompting. If you need stricter controls, modify the code to enforce confirmation or block lists before running. - If you are uncertain about network exposure, run the script in an isolated VM/container and review network traffic to confirm only expected calls to Google GenAI occur. - If you want to proceed, obtain the env.example referenced in SKILL.md, set GEMINI_API_KEY, inspect and possibly patch the script, and test on non-sensitive sites first.
功能分析
Type: OpenClaw Skill Name: gemini-computer-use Version: 1.0.0 The skill is designed for browser automation using the Gemini Computer Use model and Playwright. It implements robust security features, including whitelisting supported actions, requiring user confirmation for potentially risky actions, and allowing users to exclude specific actions. The `SKILL.md` provides clear, benign instructions without any prompt injection attempts against the agent. The Python script's capabilities (browser navigation, interaction, screenshot capture) are directly aligned with its stated purpose, and there is no evidence of intentional malicious behavior such as data exfiltration, unauthorized execution, or persistence.
能力评估
Purpose & Capability
The name/description (Gemini Computer Use browser-control agents) matches the included script and instructions: it uses Playwright and the Google GenAI client to run a screenshot → function_call → action → function_response loop. However the registry metadata claims 'Required env vars: none' while both the SKILL.md quickstart and the script require a GEMINI_API_KEY (and optionally COMPUTER_USE_BROWSER_CHANNEL / COMPUTER_USE_BROWSER_EXECUTABLE). That registry vs implementation mismatch is inconsistent and should be corrected/clarified.
Instruction Scope
SKILL.md tells the user to set an API key and run the provided script. The runtime instructions and code capture full-page screenshots and send them (inline image/png parts) along with the user prompt to the external Gemini model (Google GenAI). This is expected for the skill's purpose, but it means screenshots (which may contain sensitive information) are transmitted off-host. The instructions also allow the model to emit function_call actions that the script executes directly in Playwright; while the script supports a user confirmation flow for 'require_confirmation', most actions will execute without prompting. Additionally, the script included in the package is truncated near the model call (it references an apparent variable/modification error), which could hide additional behavior or indicate the shipped script will fail or behave unexpectedly.
Install Mechanism
There is no automated install spec (instruction-only install). The SKILL.md instructs the user to create a virtualenv and pip install google-genai and playwright, then run 'playwright install chromium'. This is a standard, low-risk approach compared to bundled downloads from arbitrary URLs. The package includes a Python script; no external downloads or extract/install steps are declared in the skill bundle itself.
Credentials
The code legitimately requires GEMINI_API_KEY to call the Gemini Computer Use model and optionally COMPUTER_USE_BROWSER_CHANNEL and COMPUTER_USE_BROWSER_EXECUTABLE to control browser selection. Those env vars are proportional to the stated purpose. However the public registry metadata incorrectly lists no required env vars, which is misleading. Also note that transmitting screenshots to the external API is intrinsic to functionality but is a privacy-sensitive operation — the skill will send image data to Google's API, and users should consider whether that exposure is acceptable for the pages/screens they automate.
Persistence & Privilege
The skill is not always-enabled and does not request special platform privileges. The skill is allowed to be invoked autonomously (disable-model-invocation is false), which is the platform default; combined with broad browser control capabilities, autonomous invocation increases the blast radius (the agent could autonomously navigate, click, and type). SKILL.md does recommend running in a sandboxed profile or container. There is no evidence the skill modifies other skills or system settings.
如何使用
  1. 确保已安装 OpenClaw(本地或 Docker 部署)
  2. 在对话框中输入安装命令:/install gemini-computer-use
  3. 安装完成后,直接呼叫该 Skill 的名称或使用 /gemini-computer-use 触发
  4. 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
版本历史
v1.0.0
Initial release - Gemini 2.5 Computer Use browser-control agents with Playwright
元数据
Slug gemini-computer-use
版本 1.0.0
许可证
累计安装 13
当前安装数 13
历史版本数 1
常见问题

Gemini Computer Use 是什么?

Build and run Gemini 2.5 Computer Use browser-control agents with Playwright. Use when a user wants to automate web browser tasks via the Gemini Computer Use model, needs an agent loop (screenshot → function_call → action → function_response), or asks to integrate safety confirmation for risky UI actions. 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 3867 次。

如何安装 Gemini Computer Use?

在 OpenClaw 或 Claude Code 对话框中运行命令「/install gemini-computer-use」即可一键安装,无需额外配置。

Gemini Computer Use 是免费的吗?

是的,Gemini Computer Use 完全免费(开源免费),可自由下载、安装和使用。

Gemini Computer Use 支持哪些平台?

Gemini Computer Use 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。

谁开发了 Gemini Computer Use?

由 am-will(@am-will)开发并维护,当前版本 v1.0.0。

💬 留言讨论