AgentBrowse

Name: AgentBrowse
Author: xor777

Description

Browser automation workflows through the agentbrowse CLI for launch, attach, observe, act, extract, navigation, and screenshots.

README (SKILL.md)

AgentBrowse is the browser layer for agent tasks that happen on a real website.

Use this skill when the agent needs to:

launch a browser or attach to an existing one;
inspect the current page and decide from visible state;
click, type, select, and otherwise act on returned target refs;
navigate directly to a known URL;
extract structured data from the page;
capture screenshots or recover a stuck browser session.

AgentBrowse works well on its own for browser automation. It can also be paired with MagicPay later when a broader flow reaches an approved login, identity, or payment step.

Open source:

Browser library and docs: https://github.com/MercuryoAI/agentbrowse
CLI package: @mercuryo-ai/agentbrowse-cli

Setup

agentbrowse must be available on PATH. If it is missing or outdated, run npm i -g @mercuryo-ai/agentbrowse-cli@latest, then verify with agentbrowse --version.
agentbrowse launch needs an environment that can start a browser. agentbrowse attach <cdp-url> needs a reachable CDP endpoint.
Core browser commands such as launch, attach, navigate, act, browser-status, screenshot, and close do not need any API key.
AI-assisted features (observe with a natural-language goal, and extract) call an LLM through the gateway. Configure API access with agentbrowse init <apiKey> before using them. Pass a non-default API URL during init if needed.
agentbrowse doctor inspects the local config. Use it after init when AI-assisted observe or extract still fails.
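
The PATH check above can be sketched as a small preflight helper. This is only an illustration: check_cli is a hypothetical helper name, not part of agentbrowse, and it only reports whether a command is on PATH.

```shell
#!/bin/sh
# check_cli NAME
# Hypothetical helper: prints "found: NAME" and returns 0 when NAME is on PATH,
# otherwise prints "missing: NAME" and returns 1.
check_cli() {
  cli="${1:-agentbrowse}"
  if command -v "$cli" >/dev/null 2>&1; then
    echo "found: $cli"
    return 0
  fi
  echo "missing: $cli"
  return 1
}

# Typical use before any browser work (commands taken from the Setup notes):
#   check_cli agentbrowse || npm i -g @mercuryo-ai/agentbrowse-cli@latest
#   agentbrowse --version
```

Keeping the check separate from the install step lets an agent report a missing CLI before it decides whether a global npm install is acceptable in the current environment.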

Core Loop

Start or connect to a browser with agentbrowse launch [url] or agentbrowse attach <cdp-url>.
Read the page with agentbrowse observe.
Act on the returned refs with agentbrowse act <targetRef> <action> [value].
Re-run agentbrowse observe after navigation or meaningful UI changes.
Use agentbrowse navigate <url> when the destination is already known.
Use agentbrowse extract '<schema-json>' [scopeRef] when you need structured output instead of another page action.
Use agentbrowse screenshot or agentbrowse browser-status only for evidence and debugging.
Finish with agentbrowse close when the browser session is no longer needed.
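
The loop above might be wrapped as follows. This is a minimal sketch under stated assumptions: ab and act_then_observe are hypothetical helper names, AB_DRY_RUN is a variable invented for this example, and the ref t1 and the action name click are placeholders. Real refs come from observe output, and the exact action names should be checked in the command guide.

```shell
#!/bin/sh
# ab: hypothetical wrapper, not part of the CLI. With AB_DRY_RUN=1 it prints
# the command it would run, so the sequence can be reviewed without a browser.
ab() {
  if [ "${AB_DRY_RUN:-0}" = "1" ]; then
    echo "agentbrowse $*"
  else
    agentbrowse "$@"
  fi
}

# act_then_observe REF ACTION [VALUE]
# Pairs every action with a fresh observe so later steps do not reuse stale refs.
act_then_observe() {
  if [ "$#" -ge 3 ]; then
    ab act "$1" "$2" "$3"
  else
    ab act "$1" "$2"
  fi
  ab observe
}

# Example session (t1 and click are placeholders, not documented values):
#   ab launch https://example.com
#   ab observe
#   act_then_observe t1 click
#   ab close
```

Bundling the re-observe into the helper makes the "observe after meaningful UI changes" step hard to forget, at the cost of an extra observe after actions that change nothing.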

When To Bring In Another Tool

Bring in a companion protected-flow tool when the site reaches:

a login step that needs approved protected values;
an identity form with protected personal data;
a payment step with protected card details or approval flow.

At that point AgentBrowse can stay the browsing layer around the protected step, but it should not invent its own secret-handling flow.

Ask-User Boundary

Ask the user only when:

the correct next step is still ambiguous after re-observing the page;
the environment cannot launch or attach to a browser;
the task crosses into a protected approval or payment boundary.

Operating Rules

Trust the visible page state, not assumptions about what should have happened.
Re-observe after meaningful page changes instead of reusing stale refs.
Keep browser work and protected-step handling separated.
close is only teardown or recovery. Never treat close as a success signal — task success comes from the visible page state before close.
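
The rule about close can be illustrated with a small sketch. Here finish_task is a hypothetical helper and AB_BIN is a variable introduced only for this example (it defaults to agentbrowse). The verdict is assumed to have been decided from the last observed page state before teardown.

```shell
#!/bin/sh
# finish_task VERDICT
# VERDICT is "ok" or "failed", decided from the visible page state BEFORE close.
# The exit status of close is reported but never changes the task result.
finish_task() {
  verdict="$1"
  if "${AB_BIN:-agentbrowse}" close >/dev/null 2>&1; then
    echo "closed"
  else
    echo "close failed"
  fi
  [ "$verdict" = "ok" ]
}

# Usage:
#   finish_task ok      # returns 0 even if teardown itself fails
#   finish_task failed  # returns 1 even if teardown succeeds
```

The function returns the pre-close verdict, so a clean teardown cannot turn a failed task into a reported success.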

More Detail

Open an extra reference only when it helps:

Operating guide for resume and recovery.
Command guide for every CLI command.
Failure recovery for common runtime states.
Boundaries and escalation for safety rules.

If a term (session, ref, targetRef, scopeRef, fillRef, pageRef) is unfamiliar, check the AgentBrowse API reference glossary.

Usage Guidance

This skill appears to do what it says: automate and interact with a real browser using the agentbrowse CLI. Before installing or using it, verify the npm package owner and version on the npm/GitHub pages, and inspect the CLI source if you can. Do not enable or use AI-assisted features (observe with free-text goals or extract) on pages that contain credentials, payment details, or other sensitive personal data — those features will send visible page content to whatever LLM gateway you configure. If your flow reaches login/identity/payment, follow the skill's guidance to switch to a protected flow tool that is designed to handle secrets. Finally, be aware the CLI stores the API key in local config (use a dedicated key with least privilege and inspect/secure the config file).
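
One way to apply this guidance is to classify a command before running it on a sensitive page. The sketch below is an assumption-laden illustration: uses_llm_gateway is a hypothetical helper, and it assumes that any extra argument to observe is a natural-language goal, which the listing does not confirm.

```shell
#!/bin/sh
# uses_llm_gateway SUBCOMMAND [ARGS...]
# Returns 0 when the invocation is expected to send page content to the LLM
# gateway: extract always, observe only when extra arguments are given
# (assumed here to be a free-text goal). All other subcommands return 1.
uses_llm_gateway() {
  case "$1" in
    extract) return 0 ;;
    observe) [ "$#" -gt 1 ] ;;
    *) return 1 ;;
  esac
}

# Usage: skip AI-assisted calls on pages with credentials or payment data.
#   if uses_llm_gateway observe "find the checkout button"; then
#     echo "would send page content to the gateway"
#   fi
```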

Capability Analysis

Type: OpenClaw Skill
Name: agentbrowse
Version: 0.1.22

The agentbrowse skill provides a structured interface for browser automation using the @mercuryo-ai/agentbrowse-cli tool. The documentation (SKILL.md and references/) includes clear operating rules and safety guardrails, specifically instructing the agent to avoid handling sensitive data like credentials or payments directly and instead defer to protected-flow tools or user intervention. No indicators of malicious intent, such as data exfiltration or unauthorized persistence, were found in the code or instructions.

Capability Tags

crypto
can-make-purchases
requires-sensitive-credentials

Capability Assessment

✓ Purpose & Capability

Name/description (browser automation) align with requirements: the skill only requires the agentbrowse CLI and installs via an npm package that provides the 'agentbrowse' binary. There are no unrelated credentials, binaries, or config paths requested.

ℹ Instruction Scope

SKILL.md instructs the agent to run agentbrowse CLI commands (launch, attach, observe, act, extract, screenshot, close). This stays within the browsing domain. Important caveat: AI-assisted commands (observe with natural-language goal and extract) call an LLM gateway and will send page content to that gateway; the CLI stores an API key locally via 'agentbrowse init <apiKey>'. The skill explicitly warns to switch to a protected flow for logins/payments, but the runtime behavior can transmit visible page data to an external LLM if used — review before running on pages with sensitive content.

✓ Install Mechanism

Install is a single npm package (@mercuryo-ai/agentbrowse-cli) that provides the 'agentbrowse' binary. This is a typical mechanism for CLIs and not an arbitrary download; risk is moderate (npm registry), so verify package provenance and version before installing.

✓ Credentials

The skill does not declare required environment variables or credentials. AI-assisted features require an API key provided at runtime via 'agentbrowse init' (stored in local config) — this is proportional to the advertised LLM features and is optional for core browsing commands.

✓ Persistence & Privilege

always is false and the skill does not request elevated/persistent platform privileges or to modify other skills. Default autonomous invocation is allowed (normal). The skill may store an API key in its own local config via the CLI, which is expected behavior for LLM integration.

Version History

v0.1.22

Release agentbrowse-v0.1.22

v0.1.21

Release agentbrowse-v0.1.21

v0.1.20

Release agentbrowse-v0.1.20

v0.1.19

Release agentbrowse-v0.1.19

v0.1.18

Release agentbrowse-v0.1.18

v0.1.17

Release agentbrowse-v0.1.17

v0.1.16

Release agentbrowse-v0.1.16

v0.1.15

Release agentbrowse-v0.1.15

v0.1.14

Release agentbrowse-v0.1.14

v0.1.13

Release agentbrowse-v0.1.13

v0.1.12

Release agentbrowse-v0.1.12

v0.1.11

Release agentbrowse-v0.1.11

v0.1.10

Release agentbrowse-v0.1.10

v0.1.9

Release agentbrowse-v0.1.9

v0.1.8

Release agentbrowse-v0.1.8

v0.1.7

Release agentbrowse-v0.1.7

v0.1.6

Release agentbrowse-v0.1.6

v0.1.5

Release agentbrowse-v0.1.5

v0.1.4

Release agentbrowse-v0.1.4

v0.1.3

Release agentbrowse-v0.1.3

Metadata

Slug agentbrowse

Version 0.1.22

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 22

Frequently Asked Questions

What is AgentBrowse?

Browser automation workflows through the agentbrowse CLI for launch, attach, observe, act, extract, navigation, and screenshots. It is an AI Agent Skill for Claude Code / OpenClaw, with 248 downloads so far.

How do I install AgentBrowse?

Run "/install agentbrowse" in the OpenClaw or Claude Code chat to install the skill in one step. The agentbrowse CLI itself must still be available on PATH, as described under Setup.

Is AgentBrowse free?

Yes, AgentBrowse is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does AgentBrowse support?

AgentBrowse is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created AgentBrowse?

It is built and maintained by Dmitry Ukhanov (@xor777); the current version is v0.1.22.
