mehdi149

browseanything ai browser agent

Author: MEHDI BAHRA · GitHub · v1.0.0 · MIT-0
cross-platform · ✓ Security check passed
Total downloads: 89 · Favorites: 0 · Current installs: 0 · Versions: 1
Install in OpenClaw:
/install browseanything
Description
Drive a real Chromium browser with an autonomous AI agent to do anything on the web — book flights, scrape sites, fill forms, log into apps, extract data beh...
Usage (SKILL.md)

Browse Anything

This skill lets you delegate any web task to a real browser driven by an autonomous AI agent. You give a natural-language prompt; BrowseAnything opens Chromium, navigates, clicks, types, solves CAPTCHAs, and returns the result — including a screenshot.

When to use

Trigger this skill whenever the task requires the live web, e.g.:

  • "Find the cheapest flight from X to Y next month"
  • "Log into my Notion and pull the latest entries from this database"
  • "Fill out this Google Form with the following answers"
  • "Check whether <SaaS app> is down right now"
  • "Buy item Z if it's under $50"
  • "Scrape the top 20 results for query Q from <site>"
  • "Take a screenshot of <URL> after clicking Accept"

Do not use it for tasks the model can answer from internal knowledge, or for tasks that have a dedicated MCP/API the user already configured (prefer the more specific tool when available).

One-time setup

  1. The user must have a BrowseAnything API key (ba_live_...). Direct them to <https://platform.browseanything.io> → Settings → API Keys to create one.

  2. They export it once:

    export BROWSEANYTHING_API_KEY=ba_live_...
    
  3. (Optional self-host) Set BROWSEANYTHING_API_URL=https://your-host to point at a self-hosted engine. Default is the hosted platform.

If BROWSEANYTHING_API_KEY is missing, the scripts exit with code 2 and a clear message; surface that message to the user verbatim.
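A caller can run the same check before launching any script. The sketch below is a minimal illustration, not the skill's own code: `load_config` and `preflight` are hypothetical helper names, and the only facts taken from the setup steps above are the two environment variable names and exit code 2 for a missing key.

```python
import os
import sys


def load_config(env=None):
    """Return (api_key, api_url); api_url is None when the hosted default applies."""
    env = os.environ if env is None else env
    api_key = env.get("BROWSEANYTHING_API_KEY", "").strip()
    if not api_key:
        raise KeyError("BROWSEANYTHING_API_KEY is not set")
    # Optional self-host override; left as None so the scripts use their default.
    api_url = env.get("BROWSEANYTHING_API_URL", "").strip() or None
    return api_key, api_url


def preflight(env=None):
    """Return 0 when the key is present, 2 otherwise (mirrors the scripts' exit code)."""
    try:
        load_config(env)
    except KeyError as exc:
        print(f"error: {exc.args[0]}", file=sys.stderr)
        return 2
    return 0
```

Passing a plain dict as `env` keeps the check easy to exercise without touching the real environment.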

Default workflow (high-level)

For 95% of requests use the one-shot browse.py script. It creates a task, polls until done, and prints the result.

python3 {baseDir}/scripts/browse.py "Find the cheapest direct flight from CDG to NRT in May, return airline + price + booking URL."

Useful flags:

  • --model <name>: override the LLM (e.g. gpt-5.2, kimi-k2.6)
  • --max-steps <n>: cap agent steps (default 80)
  • --proxy <region>: e.g. us, eu
  • --metadata '{"key":"value"}': attach JSON metadata
  • --timeout <seconds>: max wait (default 900)
  • --json: emit the full task object instead of a friendly summary
  • --json: emit the full task object instead of a friendly summary
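When browse.py is launched from another program, building the argument list explicitly avoids shell-quoting problems with the prompt and the JSON metadata. This is a sketch under assumptions: `build_browse_command` is a hypothetical helper, `base_dir` stands in for the `{baseDir}` placeholder, and only the flags listed above are used.

```python
import json


def build_browse_command(base_dir, prompt, model=None, max_steps=None,
                         proxy=None, metadata=None, timeout=None, as_json=False):
    """Assemble an argv list for browse.py from the documented flags."""
    cmd = ["python3", f"{base_dir}/scripts/browse.py", prompt]
    if model is not None:
        cmd += ["--model", model]          # must be an exact model string
    if max_steps is not None:
        cmd += ["--max-steps", str(max_steps)]
    if proxy is not None:
        cmd += ["--proxy", proxy]
    if metadata is not None:
        cmd += ["--metadata", json.dumps(metadata)]
    if timeout is not None:
        cmd += ["--timeout", str(timeout)]
    if as_json:
        cmd.append("--json")
    return cmd
```

The returned list can be handed to `subprocess.run` as-is, with no shell involved.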

Exit codes:

Code  Meaning
0     Task completed successfully
1     Task failed (read stderr / error_message)
2     Auth/usage problem (missing key, insufficient credits, bad input)
3     Network unreachable
4     Local timeout (task may still be running on server)
5     Task is paused waiting for human input; see below
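A wrapper can branch on these exit codes instead of parsing output. The following is a rough sketch, assuming browse.py is invoked through `subprocess.run`; `describe_exit` and `run_browse` are hypothetical names, and the meanings are copied from the table above.

```python
import subprocess

EXIT_MEANINGS = {
    0: "Task completed successfully",
    1: "Task failed (read stderr / error_message)",
    2: "Auth/usage problem (missing key, insufficient credits, bad input)",
    3: "Network unreachable",
    4: "Local timeout (task may still be running on server)",
    5: "Task is paused waiting for human input",
}


def describe_exit(code):
    """Translate a browse.py exit code into its documented meaning."""
    return EXIT_MEANINGS.get(code, f"Unexpected exit code {code}")


def run_browse(cmd):
    """Run a browse.py argv list; return (exit_code, meaning, stdout, stderr)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, describe_exit(proc.returncode), proc.stdout, proc.stderr
```

Codes 4 and 5 are not failures: the task may still be alive on the server, so the caller should keep the task id and continue with the low-level scripts.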

Low-level workflow (manual control)

Use these when you need to fire-and-forget, run many tasks in parallel, fetch screenshots mid-execution, or react to requires_input.

ID=$(python3 {baseDir}/scripts/create_task.py "Prompt...")
python3 {baseDir}/scripts/get_task.py "$ID" --field status
python3 {baseDir}/scripts/get_task.py "$ID"                # full JSON
python3 {baseDir}/scripts/get_screenshot.py "$ID" --out latest.png
python3 {baseDir}/scripts/list_tasks.py --limit 20
python3 {baseDir}/scripts/cancel_task.py "$ID"
python3 {baseDir}/scripts/status.py                        # backend capacity
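The polling step that browse.py performs automatically can be reproduced around get_task.py. The sketch below rests on several assumptions: that `get_task.py --field status` prints the bare status on stdout, and that the status strings other than `requires_input` (which this document names) match REFERENCE.md. `fetch_status` and `wait_until_settled` are hypothetical helpers.

```python
import subprocess
import time

# Only "requires_input" is named in this document; the other strings are
# assumptions. Check REFERENCE.md for the real status enum.
SETTLED_STATUSES = {"completed", "failed", "cancelled", "requires_input"}


def fetch_status(base_dir, task_id):
    """Read the task status via get_task.py (assumes bare status on stdout)."""
    proc = subprocess.run(
        ["python3", f"{base_dir}/scripts/get_task.py", task_id, "--field", "status"],
        capture_output=True, text=True, check=True,
    )
    return proc.stdout.strip()


def wait_until_settled(get_status, interval=5.0, timeout=900.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until the task settles or the local timeout elapses."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in SETTLED_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"task still '{status}' after {timeout} s")
        sleep(interval)
```

A typical call would be `wait_until_settled(lambda: fetch_status(base_dir, task_id))`. Injecting `sleep` and `clock` makes the loop testable without real waiting.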

Handling human-in-the-loop

If a task can't proceed without information only the user has (a 2FA code, a clarification, a confirmation), it transitions to status requires_input. The high-level browse.py exits with code 5 and prints the question. To answer:

python3 {baseDir}/scripts/submit_input.py <task_id> "the user's answer"

Then resume polling with get_task.py, which reproduces the browse.py flow manually on the same task id. Never invent an answer to a requires_input prompt; always ask the user first.
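The ask-then-submit step can be sketched as follows. This is an illustration under assumptions: `build_submit_command` and `answer_requires_input` are hypothetical helpers, and the only script interface relied on is the submit_input.py call shown above (task id, then the answer).

```python
import subprocess


def build_submit_command(base_dir, task_id, answer):
    """Build the submit_input.py argv; refuse empty answers rather than guess."""
    if not answer.strip():
        raise ValueError("refusing to submit an empty answer; ask the user first")
    return ["python3", f"{base_dir}/scripts/submit_input.py", task_id, answer]


def answer_requires_input(base_dir, task_id, question,
                          ask_user=input, run=subprocess.run):
    """Show the agent's question to the user and forward their reply."""
    answer = ask_user(f"{question}\n> ")
    cmd = build_submit_command(base_dir, task_id, answer)
    return run(cmd, capture_output=True, text=True).returncode
```

The answer always comes from `ask_user`, never from the calling model, which matches the rule above.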

Authoring great prompts

The agent works best with prompts that are concrete and verifiable.

  • ✅ "On amazon.fr, search 'Sony WH-1000XM5', open the cheapest new listing shipped from Amazon, return seller + price + ETA."
  • ❌ "find me good headphones"

Tips:

  • Name the website explicitly when you know it
  • State the success criterion ("return X, Y, Z")
  • Mention any login state ("I'm already logged in, my session is in the saved profile"). Never pass credentials in plain text; prefer pre-saved sessions in the BrowseAnything dashboard
  • Cap scope: one task, one outcome

Cost & limits

  • Tasks consume credits; tier-dependent step/concurrency caps apply
  • Default per-task hard cap: 80 steps, 20 minutes
  • Rate limit: 100 API requests/min/key
  • Supported models include gpt-5.2, gpt-5.4, kimi-k2.6, anthropic/claude-haiku-4.5, gemini-3-flash-preview, gpt-4.1, llama-4, openai/gpt-oss-120b, plus mini variants. The available set depends on your tier; unsupported values return a hard error rather than falling back. Copy the exact string from the API error message when retrying.
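When many tasks are polled in parallel, a client-side throttle helps stay under the 100 requests/min/key limit. The sketch below uses a sliding-window counter; it is one possible approach, not something the skill ships, and `RequestThrottle` is a hypothetical class.

```python
import time
from collections import deque


class RequestThrottle:
    """Sliding-window limiter: at most max_requests per window seconds."""

    def __init__(self, max_requests=100, window=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._max = max_requests
        self._window = window
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # timestamps of requests still inside the window

    def _evict(self, now):
        while self._stamps and now - self._stamps[0] >= self._window:
            self._stamps.popleft()

    def acquire(self):
        """Block until a slot is free, record the request, return seconds waited."""
        waited = 0.0
        while True:
            now = self._clock()
            self._evict(now)
            if len(self._stamps) < self._max:
                break
            # Wait until the oldest recorded request leaves the window.
            delay = self._stamps[0] + self._window - now
            self._sleep(delay)
            waited += delay
        self._stamps.append(now)
        return waited
```

Calling `throttle.acquire()` before each script invocation that hits the API keeps a single process within the limit; several processes sharing one key would need a shared counter instead.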

Pitfalls & troubleshooting

  • Model names are exact strings. The API validates the --model value strictly (e.g. gpt-5.2 works, gpt5.4 without a hyphen does not). If you get Invalid model, retry with the exact name from the API error message.
  • Cancel only works on tasks that have not finished. cancel_task.py returns "Task not found or cannot be cancelled" for tasks that have already failed or completed. Check the status with get_task.py --field status first.
  • Human-in-the-loop tasks hold a concurrency slot. A task waiting on requires_input consumes concurrency but not steps; answer promptly or cancel to free the slot.
  • Foreground timeouts may be clamped by the host environment. If the terminal tool rejects a 900 s wait, run browse.py in the background (background=true, notify_on_complete=true) and poll with get_task.py until it finishes.
  • Inspect requires_input messages before replying. The agent sometimes embeds the completed answer inside its question (e.g. a table of flight results). If the task is effectively done, cancel it rather than submitting unnecessary input.
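The check-before-cancel advice above can be wrapped in a small guard. This is a sketch under assumptions: the status strings "completed" and "failed" are inferred from the wording in this list rather than taken from REFERENCE.md, and `cancel_if_active` is a hypothetical helper that receives the status lookup and the cancel call as functions.

```python
# Assumed status strings; REFERENCE.md holds the real enum.
FINISHED_STATUSES = {"completed", "failed"}


def cancel_if_active(task_id, get_status, cancel):
    """Cancel only when the task has not already finished.

    Returns True if a cancel was issued, False if it was skipped.
    """
    status = get_status(task_id)
    if status in FINISHED_STATUSES:
        return False  # cancel_task.py would report "cannot be cancelled"
    cancel(task_id)
    return True
```

In practice `get_status` and `cancel` would wrap get_task.py and cancel_task.py respectively.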

More

  • REFERENCE.md — full API surface, request/response shapes, status enum
  • EXAMPLES.md — copy-paste prompt patterns for common scenarios
  • README.md — install instructions for Claude Code, OpenClaw, Cursor, Codex, Gemini, Windsurf
  • references/recurring-scraping-pipeline.md — architecture for daily automated scraping, deduplication, enrichment, and dashboard reporting (real estate, price monitoring, job boards, etc.)
Security guidance
Treat this as an incomplete review and retry when the workspace artifacts can be read before installing or approving the skill.
Capability tags
requires-sensitive-credentials
Capability assessment
Purpose & Capability
Workspace inspection failed before SKILL.md or metadata content could be read; no purpose/capability mismatch is evidenced.
Instruction Scope
Instruction scope could not be evaluated from artifact text; no unsupported risky instruction is evidenced.
Install Mechanism
Install artifacts could not be inspected; no unsafe install behavior is evidenced.
Credentials
Environment access could not be evaluated from artifact text; no overbroad access is evidenced.
Persistence & Privilege
Persistence and privilege behavior could not be evaluated from artifact text; no persistence or privilege abuse is evidenced.
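How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install browseanything
  3. Once installed, invoke the Skill by name or trigger it with /browseanything
  4. Provide the inputs described in the Skill's parameter notes to get structured output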
Version history
v1.0.0
- Initial release of the "browse-anything" skill.
- Allows users to delegate any web task to a real Chromium browser controlled by an autonomous agent.
- Supports booking, scraping, form filling, live lookups, login flows, data extraction behind authentication, and automated checkouts.
- Requires a BrowseAnything API key for operation; guided setup instructions provided.
- Offers both high-level (single-command) and low-level (manual control) workflows, including human-in-the-loop handling.
- Provides clear troubleshooting, usage tips, and guidance on prompt authoring for optimal results.
Metadata
Slug browseanything
Version 1.0.0
License MIT-0
Total installs 0
Current installs 0
Versions 1
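FAQ

What is browseanything ai browser agent?

Drive a real Chromium browser with an autonomous AI agent to do anything on the web: book flights, scrape sites, fill forms, log into apps, extract data beh... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 89 total downloads so far.

How do I install browseanything ai browser agent?

Run the command "/install browseanything" in the OpenClaw or Claude Code chat box for a one-step install. No further installation steps are needed, but the skill requires a BrowseAnything API key (see One-time setup).

Is browseanything ai browser agent free?

The skill itself is free: it is released under the MIT-0 license and can be downloaded, installed, and used freely. Tasks run through the BrowseAnything platform consume credits (see Cost & limits).

Which platforms does browseanything ai browser agent support?

browseanything ai browser agent is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed browseanything ai browser agent?

It is developed and maintained by MEHDI BAHRA (@mehdi149); the current version is v1.0.0.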
