browseanything ai browser agent
/install browseanything
Browse Anything
This skill lets you delegate any web task to a real browser driven by an autonomous AI agent. You give a natural-language prompt; BrowseAnything opens Chromium, navigates, clicks, types, solves CAPTCHAs, and returns the result — including a screenshot.
When to use
Trigger this skill whenever the task requires the live web, e.g.:
- "Find the cheapest flight from X to Y next month"
- "Log into my Notion and pull the latest entries from this database"
- "Fill out this Google Form with the following answers"
- "Check whether <SaaS app> is down right now"
- "Buy item Z if it's under $50"
- "Scrape the top 20 results for query Q from <site>"
- "Take a screenshot of <URL> after clicking Accept"
Do not use it for tasks the model can answer from internal knowledge, or for tasks that have a dedicated MCP/API the user already configured (prefer the more specific tool when available).
One-time setup
- The user must have a BrowseAnything API key (ba_live_...). Direct them to <https://platform.browseanything.io> → Settings → API Keys to create one.
- They export it once:
  export BROWSEANYTHING_API_KEY=ba_live_...
- (Optional self-host) Set BROWSEANYTHING_API_URL=https://your-host to point at a self-hosted engine. Default is the hosted platform.
If BROWSEANYTHING_API_KEY is missing, the scripts exit with code 2 and a
clear message; surface that message to the user verbatim.
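A caller can run the same check before launching any script. The following is a minimal sketch, not the skill's own code: `load_config` and `DEFAULT_API_URL` are hypothetical names, and the fallback URL is a placeholder because this document does not state the hosted API address.

```python
import os
import sys

# Placeholder only: the hosted API URL is not given in this document.
DEFAULT_API_URL = "https://api.example.invalid"


def load_config(env=None):
    """Return (api_key, api_url); exit with code 2 when the key is missing."""
    env = os.environ if env is None else env
    api_key = env.get("BROWSEANYTHING_API_KEY", "").strip()
    if not api_key:
        # Mirror the documented behaviour: clear message plus exit code 2.
        print("BROWSEANYTHING_API_KEY is not set", file=sys.stderr)
        raise SystemExit(2)
    # Optional self-host override; trailing slash removed for clean joins.
    api_url = env.get("BROWSEANYTHING_API_URL", DEFAULT_API_URL).rstrip("/")
    return api_key, api_url
```

Passing a plain dict as `env` keeps the check easy to exercise without touching the real environment.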
Default workflow (high-level)
For 95% of requests use the one-shot browse.py script. It creates a
task, polls until done, and prints the result.
python3 {baseDir}/scripts/browse.py "Find the cheapest direct flight from CDG to NRT in May, return airline + price + booking URL."
Useful flags:
- --model <name>: override the LLM (e.g. gpt-5.2, kimi-k2.6)
- --max-steps <n>: cap agent steps (default 80)
- --proxy <region>: e.g. us, eu
- --metadata '{"key":"value"}': attach JSON metadata
- --timeout <seconds>: max wait (default 900)
- --json: emit the full task object instead of a friendly summary
Exit codes:
| Code | Meaning |
|---|---|
| 0 | Task completed successfully |
| 1 | Task failed (read stderr / error_message) |
| 2 | Auth/usage problem (missing key, insufficient credits, bad input) |
| 3 | Network unreachable |
| 4 | Local timeout (task may still be running on server) |
| 5 | Task is paused waiting for human input — see below |
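When browse.py is driven from another program, the exit codes in the table can be turned into named outcomes. This is a rough sketch under stated assumptions: `classify_exit` and `run_browse` are hypothetical helpers, the category labels are illustrative, and `script_path` stands for wherever `{baseDir}/scripts/browse.py` resolves on your machine.

```python
import subprocess
import sys

# Labels are illustrative; the numeric codes come from the table above.
EXIT_MEANINGS = {
    0: "completed",
    1: "failed",
    2: "auth_or_usage",
    3: "network_unreachable",
    4: "local_timeout",
    5: "requires_input",
}


def classify_exit(code):
    """Map a browse.py exit code to a short label."""
    return EXIT_MEANINGS.get(code, "unknown")


def run_browse(script_path, prompt, extra_args=()):
    """Run the script once and return (label, stdout, stderr)."""
    proc = subprocess.run(
        [sys.executable, script_path, prompt, *extra_args],
        capture_output=True,
        text=True,
    )
    return classify_exit(proc.returncode), proc.stdout, proc.stderr
```

A caller would typically branch on the label: report stderr for `failed`, surface the message verbatim for `auth_or_usage`, and hand the printed question to the user for `requires_input`.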
Low-level workflow (manual control)
Use these when you need to fire-and-forget, run many tasks in parallel,
fetch screenshots mid-execution, or react to requires_input.
ID=$(python3 {baseDir}/scripts/create_task.py "Prompt...")
python3 {baseDir}/scripts/get_task.py "$ID" --field status
python3 {baseDir}/scripts/get_task.py "$ID" # full JSON
python3 {baseDir}/scripts/get_screenshot.py "$ID" --out latest.png
python3 {baseDir}/scripts/list_tasks.py --limit 20
python3 {baseDir}/scripts/cancel_task.py "$ID"
python3 {baseDir}/scripts/status.py # backend capacity
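The create-then-poll pattern above can be wrapped in a small loop. The sketch below makes several assumptions that should be checked against REFERENCE.md: the status strings other than requires_input are guesses, `get_task.py --field status` is assumed to print the bare status on stdout, and `wait_for_task` and `status_via_cli` are hypothetical names.

```python
import subprocess
import sys
import time

# Assumed values: only requires_input is named in this document.
STOP_STATUSES = {"completed", "failed", "cancelled", "requires_input"}


def status_via_cli(script_path, task_id):
    """Read the task status by shelling out to get_task.py (output format assumed)."""
    proc = subprocess.run(
        [sys.executable, script_path, task_id, "--field", "status"],
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout.strip()


def wait_for_task(get_status, timeout=900.0, interval=5.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until the task stops or the local timeout expires."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in STOP_STATUSES:
            return status
        if clock() >= deadline:
            # Like exit code 4: the task may still be running on the server.
            raise TimeoutError("local timeout reached before the task stopped")
        sleep(interval)
```

For example, `wait_for_task(lambda: status_via_cli(path, task_id))` would poll one task; running several such loops in threads is one way to watch many tasks in parallel.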
Handling human-in-the-loop
If a task can't proceed without information only the user has (a 2FA
code, a clarification, a confirmation), it transitions to status
requires_input. The high-level browse.py exits with code 5 and
prints the question. To answer:
python3 {baseDir}/scripts/submit_input.py <task_id> "the user's answer"
Then resume polling the same task id with get_task.py, which repeats the
browse.py flow manually. Always ask the user rather than inventing an
answer for a requires_input prompt.
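The rule about never inventing an answer can be enforced in code. This is a minimal sketch, assuming submit_input.py takes the task id and the answer as two positional arguments as shown above; `answer_requires_input`, `ask_user`, and `runner` are hypothetical names introduced for illustration.

```python
import subprocess
import sys


def answer_requires_input(task_id, question, ask_user, submit_script,
                          runner=subprocess.run):
    """Relay the agent's question to the user and submit the reply.

    Returns True if an answer was submitted, False if the user gave none.
    """
    answer = ask_user(question).strip()
    if not answer:
        # No reply from the user: do not make one up.
        return False
    runner([sys.executable, submit_script, task_id, answer], check=True)
    return True
```

In an interactive shell, `ask_user` could simply be the built-in `input`; a chat agent would instead route the question back to the conversation.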
Authoring great prompts
The agent works best with prompts that are concrete and verifiable.
- ✅ "On amazon.fr, search 'Sony WH-1000XM5', open the cheapest new listing shipped from Amazon, return seller + price + ETA."
- ❌ "find me good headphones"
Tips:
- Name the website explicitly when you know it
- State the success criterion ("return X, Y, Z")
- Mention any login state ("I'm already logged in, my session is in the saved profile") — though credentials should never be passed in plain text; prefer pre-saved sessions in the BrowseAnything dashboard
- Cap scope: one task, one outcome
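These tips can be applied mechanically when prompts are generated by a program. The helper below is only an illustration: `build_prompt` is a hypothetical function, and the sentence template is one possible phrasing, not a format the agent requires.

```python
def build_prompt(site, action, return_fields, logged_in=False):
    """Compose a concrete prompt: named site, one action, explicit outputs."""
    if not return_fields:
        # A prompt without a success criterion is hard to verify.
        raise ValueError("state at least one field to return")
    parts = [f"On {site}, {action}."]
    if logged_in:
        # Mention login state only; never embed credentials in the prompt.
        parts.append("I'm already logged in; my session is in the saved profile.")
    parts.append("Return " + " + ".join(return_fields) + ".")
    return " ".join(parts)
```

Calling it with `"amazon.fr"`, a single search action, and `["seller", "price", "ETA"]` yields a prompt in the same shape as the good example above.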
Cost & limits
- Tasks consume credits; tier-dependent step/concurrency caps apply
- Default per-task hard cap: 80 steps, 20 minutes
- Rate limit: 100 API requests/min/key
- Supported models include gpt-5.2, gpt-5.4, kimi-k2.6, anthropic/claude-haiku-4.5, gemini-3-flash-preview, gpt-4.1, llama-4, openai/gpt-oss-120b, plus mini variants. The available set depends on your tier; unsupported values return a hard error rather than falling back. Copy the exact string from the API error message when retrying.
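When many tasks are polled in parallel, a client-side throttle helps stay under the 100 requests per minute limit. The sketch below is one possible sliding-window limiter, not part of the skill; `RequestThrottle` is a hypothetical class, and how the server actually counts requests is not described here.

```python
import time
from collections import deque


class RequestThrottle:
    """Block until another request fits inside the sliding window."""

    def __init__(self, max_requests=100, window=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window = window
        self._clock = clock
        self._sleep = sleep
        self._sent = deque()  # timestamps of recent requests, oldest first

    def _evict(self, now):
        # Drop timestamps that have left the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()

    def acquire(self):
        """Call once before every API request or script invocation."""
        now = self._clock()
        self._evict(now)
        if len(self._sent) >= self.max_requests:
            # Wait until the oldest request falls out of the window.
            self._sleep(self.window - (now - self._sent[0]))
            now = self._clock()
            self._evict(now)
        self._sent.append(now)
```

Each call to get_task.py or another script that hits the API would be preceded by `throttle.acquire()`; a single shared instance is needed per API key for the count to be meaningful.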
Pitfalls & troubleshooting
- Model names are exact strings. The API validates the --model value strictly (e.g. gpt-5.2 works, gpt5.4 without a hyphen does not). If you get Invalid model, retry with the exact name from the API error message.
- Cancel only works on running tasks. cancel_task.py returns Task not found or cannot be cancelled for tasks that have already failed or completed. Check status with get_task.py --field status first.
- Human-in-the-loop blocks billing. A task stuck on requires_input consumes concurrency but not steps; answer promptly or cancel to free the slot.
- Foreground timeouts may be clamped by the host environment. If the terminal tool rejects a 900 s wait, run browse.py in the background (background=true, notify_on_complete=true) and poll with get_task.py until it finishes.
- Inspect requires_input messages before replying. The agent sometimes embeds the completed answer inside its question (e.g. a table of flight results). If the task is effectively done, cancel it rather than submitting unnecessary input.
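The cancel pitfall above suggests checking status before cancelling. A small guard might look like the following; the status names are assumptions (this document only says that failed or completed tasks cannot be cancelled), and `cancel_if_active`, `get_status`, and `cancel` are hypothetical callables that would wrap get_task.py and cancel_task.py.

```python
# Assumed status names; confirm against the status enum in REFERENCE.md.
FINAL_STATUSES = {"completed", "failed", "cancelled"}


def cancel_if_active(task_id, get_status, cancel):
    """Cancel the task only if it has not already finished.

    Returns True when a cancel request was sent, False otherwise.
    """
    status = get_status(task_id)
    if status in FINAL_STATUSES:
        # Cancelling here would only produce the "cannot be cancelled" error.
        return False
    cancel(task_id)
    return True
```

This also covers a task parked on requires_input whose question already contains the finished answer: the guard lets the cancel through and frees the concurrency slot.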
More
- REFERENCE.md: full API surface, request/response shapes, status enum
- EXAMPLES.md: copy-paste prompt patterns for common scenarios
- README.md: install instructions for Claude Code, OpenClaw, Cursor, Codex, Gemini, Windsurf
- references/recurring-scraping-pipeline.md: architecture for daily automated scraping, deduplication, enrichment, and dashboard reporting (real estate, price monitoring, job boards, etc.)
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install browseanything
- After installation, invoke the skill by name or use /browseanything
- Provide required inputs per the skill's parameter spec and get structured output
What is browseanything ai browser agent?
Drive a real Chromium browser with an autonomous AI agent to do anything on the web — book flights, scrape sites, fill forms, log into apps, extract data beh... It is an AI Agent Skill for Claude Code / OpenClaw, with 89 downloads so far.
How do I install browseanything ai browser agent?
Run "/install browseanything" in the OpenClaw or Claude Code chat to install it in one step. The install itself needs no extra setup; running tasks requires the BROWSEANYTHING_API_KEY described under One-time setup.
Is browseanything ai browser agent free?
Yes, the skill itself is free and licensed under MIT-0, so you can download, install and use it at no cost. Tasks run through the BrowseAnything platform consume credits (see Cost & limits).
Which platforms does browseanything ai browser agent support?
browseanything ai browser agent is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created browseanything ai browser agent?
It is built and maintained by MEHDI BAHRA (@mehdi149); the current version is v1.0.0.