Description

Browser automation for AI agents via inference.sh. Navigate web pages, interact with elements using @e refs, take screenshots, record video. Capabilities: we...

Usage (SKILL.md)

Agentic Browser

Name: Agent Browser
Author: okaris

Browser automation for AI agents via inference.sh. Uses Playwright under the hood with a simple @e ref system for element interaction.


Quick Start

# Install CLI
curl -fsSL https://cli.inference.sh | sh && infsh login

# Open a page and get interactive elements
infsh app run agent-browser --function open --input '{"url": "https://example.com"}' --session new

Install note: The install script only detects your OS/architecture, downloads the matching binary from dist.inference.sh, and verifies its SHA-256 checksum. No elevated permissions or background processes. Manual install & verification available.

Core Workflow

Every browser automation follows this pattern:

Open - Navigate to URL, get @e refs for elements
Interact - Use refs to click, fill, drag, etc.
Re-snapshot - After navigation/changes, get fresh refs
Close - End session (returns video if recording)

# 1. Start session
RESULT=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com/login"
}')
SESSION_ID=$(echo "$RESULT" | jq -r '.session_id')
# Elements: @e1 [input] "Email", @e2 [input] "Password", @e3 [button] "Sign In"

# 2. Fill and submit
infsh app run agent-browser --function interact --session $SESSION_ID --input '{
  "action": "fill", "ref": "@e1", "text": "user@example.com"
}'
infsh app run agent-browser --function interact --session $SESSION_ID --input '{
  "action": "fill", "ref": "@e2", "text": "password123"
}'
infsh app run agent-browser --function interact --session $SESSION_ID --input '{
  "action": "click", "ref": "@e3"
}'

# 3. Re-snapshot after navigation
infsh app run agent-browser --function snapshot --session $SESSION_ID --input '{}'

# 4. Close when done
infsh app run agent-browser --function close --session $SESSION_ID --input '{}'
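Every call in the workflow above repeats the same `infsh app run agent-browser --function ... --session ... --input ...` shape, so a small wrapper can cut the repetition. The following is a minimal sketch, not part of the skill: `ab_call` and `INFSH_BIN` are hypothetical names introduced here, and the only CLI flags used are the ones already shown above.

```shell
# Hypothetical helper (not shipped with the skill).
# INFSH_BIN lets you swap in another binary, e.g. `echo` for a dry run.
INFSH_BIN="${INFSH_BIN:-infsh}"

# ab_call FUNCTION SESSION [JSON_INPUT]
# Falls back to an empty JSON object when no input is given.
ab_call() {
  ab_input="${3:-}"
  [ -n "$ab_input" ] || ab_input='{}'
  "$INFSH_BIN" app run agent-browser --function "$1" --session "$2" --input "$ab_input"
}

# Example usage (commented out so that sourcing this file runs nothing):
# ab_call open new '{"url": "https://example.com/login"}'
# ab_call snapshot "$SESSION_ID"
# ab_call close "$SESSION_ID"
```

Setting `INFSH_BIN=echo` prints the command that would be run instead of executing it, which is a cheap way to check quoting before touching a real session.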

Functions

Function	Description
`open`	Navigate to URL, configure browser (viewport, proxy, video recording)
`snapshot`	Re-fetch page state with `@e` refs after DOM changes
`interact`	Perform actions using `@e` refs (click, fill, drag, upload, etc.)
`screenshot`	Take page screenshot (viewport or full page)
`execute`	Run JavaScript code on the page
`close`	Close session, returns video if recording was enabled

Interact Actions

Action	Description	Required Fields
`click`	Click element	`ref`
`dblclick`	Double-click element	`ref`
`fill`	Clear and type text	`ref`, `text`
`type`	Type text (no clear)	`text`
`press`	Press key (Enter, Tab, etc.)	`text`
`select`	Select dropdown option	`ref`, `text`
`hover`	Hover over element	`ref`
`check`	Check checkbox	`ref`
`uncheck`	Uncheck checkbox	`ref`
`drag`	Drag and drop	`ref`, `target_ref`
`upload`	Upload file(s)	`ref`, `file_paths`
`scroll`	Scroll page	`direction` (up/down/left/right), `scroll_amount`
`back`	Go back in history	-
`wait`	Wait milliseconds	`wait_ms`
`goto`	Navigate to URL	`url`
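The required fields in the table above can be checked locally before a request is sent. This is a sketch only: `required_fields` is a hypothetical helper name, and the mapping is copied from the table rather than taken from the service itself.

```shell
# Hypothetical lookup: prints the required input fields for an interact action,
# as listed in the Interact Actions table. Returns non-zero for unknown actions.
required_fields() {
  case "$1" in
    click|dblclick|hover|check|uncheck) echo "ref" ;;
    fill|select)                        echo "ref text" ;;
    type|press)                         echo "text" ;;
    drag)                               echo "ref target_ref" ;;
    upload)                             echo "ref file_paths" ;;
    scroll)                             echo "direction scroll_amount" ;;
    wait)                               echo "wait_ms" ;;
    goto)                               echo "url" ;;
    back)                               echo "" ;;
    *) echo "unknown action: $1" >&2; return 1 ;;
  esac
}
```

A script could call `required_fields drag` and confirm that each listed field is present in its JSON input before invoking `interact`.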

Element Refs

Elements are returned with @e refs:

@e1 [a] "Home" href="/"
@e2 [input type="text"] placeholder="Search"
@e3 [button] "Submit"
@e4 [select] "Choose option"
@e5 [input type="checkbox"] name="agree"

Important: Refs are invalidated after navigation. Always re-snapshot after:

Clicking links/buttons that navigate
Form submissions
Dynamic content loading
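Because ref numbers can change between snapshots, a script may prefer to look a ref up by its visible label after each snapshot instead of hard-coding `@e3`. The sketch below assumes the element listing is available as plain text, one element per line, in the shape shown above; how to extract that listing from the JSON response is not covered here, and `find_ref` is a hypothetical name.

```shell
# Hypothetical helper: reads an element listing on stdin and prints the ref
# of the first line containing the quoted label. Prints nothing if no match.
find_ref() {
  grep -F -- "\"$1\"" | head -n 1 | cut -d ' ' -f 1
}

# Example usage with a listing saved in a variable:
# SUBMIT_REF=$(printf '%s\n' "$ELEMENTS" | find_ref "Submit")
```

The match is a plain substring search for the quoted label, so attribute values such as placeholder="Search" also match; treat the result as a convenience, not a guarantee.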

Features

Video Recording

Record browser sessions for debugging or documentation:

# Start with recording enabled (optionally show cursor indicator)
SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "record_video": true,
  "show_cursor": true
}' | jq -r '.session_id')

# ... perform actions ...

# Close to get the video file
infsh app run agent-browser --function close --session $SESSION --input '{}'
# Returns: {"success": true, "video": <File>}

Cursor Indicator

Show a visible cursor in screenshots and video (useful for demos):

infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "show_cursor": true,
  "record_video": true
}'

The cursor appears as a red dot that follows mouse movements and shows click feedback.

Proxy Support

Route traffic through a proxy server:

infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "proxy_url": "http://proxy.example.com:8080",
  "proxy_username": "user",
  "proxy_password": "pass"
}'

File Upload

Upload files to file inputs:

infsh app run agent-browser --function interact --session $SESSION --input '{
  "action": "upload",
  "ref": "@e5",
  "file_paths": ["/path/to/file.pdf"]
}'

Drag and Drop

Drag elements to targets:

infsh app run agent-browser --function interact --session $SESSION --input '{
  "action": "drag",
  "ref": "@e1",
  "target_ref": "@e2"
}'

JavaScript Execution

Run custom JavaScript:

infsh app run agent-browser --function execute --session $SESSION --input '{
  "code": "document.querySelectorAll(\"h2\").length"
}'
# Returns: {"result": "5", "screenshot": <File>}

Deep-Dive Documentation

Reference	Description
references/commands.md	Full function reference with all options
references/snapshot-refs.md	Ref lifecycle, invalidation rules, troubleshooting
references/session-management.md	Session persistence, parallel sessions
references/authentication.md	Login flows, OAuth, 2FA handling
references/video-recording.md	Recording workflows for debugging
references/proxy-support.md	Proxy configuration, geo-testing

Ready-to-Use Templates

Template	Description
templates/form-automation.sh	Form filling with validation
templates/authenticated-session.sh	Login once, reuse session
templates/capture-workflow.sh	Content extraction with screenshots

Examples

Form Submission

SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com/contact"
}' | jq -r '.session_id')

# Get elements: @e1 [input] "Name", @e2 [input] "Email", @e3 [textarea], @e4 [button] "Send"

infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e1", "text": "John Doe"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e2", "text": "john.doe@example.com"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e3", "text": "Hello!"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "click", "ref": "@e4"}'

infsh app run agent-browser --function snapshot --session $SESSION --input '{}'
infsh app run agent-browser --function close --session $SESSION --input '{}'

Search and Extract

SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://google.com"
}' | jq -r '.session_id')

infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e1", "text": "weather today"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "press", "text": "Enter"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "wait", "wait_ms": 2000}'

infsh app run agent-browser --function snapshot --session $SESSION --input '{}'
infsh app run agent-browser --function close --session $SESSION --input '{}'

Screenshot with Video

SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "record_video": true
}' | jq -r '.session_id')

# Take full page screenshot
infsh app run agent-browser --function screenshot --session $SESSION --input '{
  "full_page": true
}'

# Close and get video
RESULT=$(infsh app run agent-browser --function close --session $SESSION --input '{}')
echo "$RESULT" | jq '.video'

Sessions

Browser state persists within a session. Always:

Start with --session new on first call
Use returned session_id for subsequent calls
Close session when done
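The third rule is easy to miss when a script fails midway. One way to make it harder to forget is an exit trap, sketched below. `close_session` and `INFSH_BIN` are hypothetical names, and the sketch assumes the script stores the returned session id in `SESSION_ID`.

```shell
# Hypothetical cleanup sketch: close the session whenever the script exits.
INFSH_BIN="${INFSH_BIN:-infsh}"
SESSION_ID=""

close_session() {
  # Nothing to do if no session was opened (or it was already closed).
  [ -n "$SESSION_ID" ] || return 0
  "$INFSH_BIN" app run agent-browser --function close --session "$SESSION_ID" --input '{}'
  SESSION_ID=""
}

# Runs on normal exit and when the script stops after an error.
trap close_session EXIT

# Example usage: assign SESSION_ID after the open call, then work as usual.
# SESSION_ID=$(infsh app run agent-browser --function open --session new \
#   --input '{"url": "https://example.com"}' | jq -r '.session_id')
```

Clearing `SESSION_ID` after closing keeps the trap from issuing a second close if the script also calls `close_session` explicitly.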

Related Skills

# Web search (for research + browse)
npx skills add inference-sh/skills@web-search

# LLM models (analyze extracted content)
npx skills add inference-sh/skills@llm-models

Documentation

inference.sh Sessions - Session management
Multi-function Apps - How functions work

Security Recommendations

This skill appears to be a legitimate browser-automation wrapper, but take these precautions before installing or using it:

- Understand remote execution: The instructions use the infsh CLI to run sessions on inference.sh, so page content, cookies, screenshots, recorded video, and any files you upload will be sent to that service. Do not use it with accounts or pages that contain secrets you cannot share.
- Avoid piping unknown installers into sh: The Quick Start recommends curl | sh from cli.inference.sh and downloads from dist.inference.sh. Manually review the installer, verify checksums from a trusted source, or prefer installing known, auditable clients.
- Be careful with credentials and local files: Templates show passing APP_PASSWORD, TOTP secrets, proxy credentials, and absolute local file paths. Only provide secrets when you understand where they go and are comfortable they will be handled securely.
- Recording/video: Enabling video will capture sensitive on-screen information. Do not record sessions with credentials or PII unless you control the destination and storage.
- Proxy & scraping guidance: The skill includes examples for rotating proxies and scraping; ensure you comply with site terms of service and legal/privacy requirements.
- If you need more assurance: Ask the publisher for a homepage, source repo, and reproducible installer steps; prefer self-hosted Playwright or a local CLI you control if you must automate sensitive sites.

If you decide to proceed, verify the infsh CLI's authenticity and read its privacy/hosting policy so you know how and where captured data is stored and for how long.

Functional Analysis

Type: OpenClaw Skill
Name: agentic-browser
Version: 0.1.5

The 'agentic-browser' skill is classified as suspicious due to its broad `Bash(infsh *)` permissions, which allow the AI agent to execute arbitrary `infsh` commands. While designed for legitimate web automation, the skill's capabilities present significant attack surfaces: running arbitrary JavaScript code (`execute` function in `SKILL.md`, `references/commands.md`), uploading local files (`upload` action in `SKILL.md`, `references/commands.md`), and routing traffic through arbitrary proxies (`proxy_url` in `SKILL.md`, `references/proxy-support.md`). Furthermore, the shell scripts (`templates/*.sh`) directly interpolate user-provided URLs into JSON inputs for `infsh`, creating a potential shell injection vulnerability if a malicious URL containing special shell characters is provided. There is no clear evidence of intentional malicious behavior, but the powerful and potentially exploitable capabilities warrant a 'suspicious' classification.
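The interpolation risk described above can be reduced by escaping a value before placing it inside the JSON input. The sketch below is one possible approach, not what the templates currently do: `json_escape` and `open_input` are hypothetical names, and only backslashes and double quotes are handled, so values containing newlines or other control characters still need a real JSON encoder (for example `jq -n --arg url "$URL" '{url: $url}'`).

```shell
# Hypothetical helper: escape backslashes and double quotes so a value can be
# placed inside a JSON string. Control characters are NOT handled here.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the input for the `open` function from an untrusted URL.
open_input() {
  printf '{"url": "%s"}' "$(json_escape "$1")"
}

# Example usage (commented out; requires the infsh CLI):
# infsh app run agent-browser --function open --session new --input "$(open_input "$URL")"
```

Passing the result as a single double-quoted argument keeps the shell from splitting or re-interpreting the URL, while the escaping keeps a stray quote from closing the JSON string early.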

Capability Assessment

✓ Purpose & Capability

Name/description match the provided assets: the SKILL.md, command reference, and templates all implement a Playwright-style browser automation flow (open, snapshot, interact, screenshot, execute, close), proxies, file upload, video, and session management. The scripts and examples are consistent with a web-automation/scraping/browser-automation tool.

⚠ Instruction Scope

SKILL.md and the templates instruct callers to install and use the external infsh CLI and to run commands that will (by design) fetch page HTML/text, execute arbitrary JS, extract cookies, upload local files, and request session video. Those instructions do not restrict or warn strongly enough that page content, cookies, uploaded files, or recorded video will be transmitted to the inference.sh service. The templates show workflows that handle credentials, TOTP, and cookie extraction (including examples to put passwords into env vars and to extract cookies), which increases the chance of sensitive data being exposed to the remote service or being stored in its sessions.

⚠ Install Mechanism

There is no install spec in the skill bundle, but the Quick Start explicitly tells users/agents to run a remote installer: curl -fsSL https://cli.inference.sh | sh and to download binaries from dist.inference.sh. 'curl | sh' is a high-risk pattern because it executes a remote script. The domains used (cli.inference.sh, dist.inference.sh) are not standard well-known installer hosts like GitHub releases; while checksums are referenced, the installer pattern and remote binary download are notable risks and deserve manual verification before use.

ℹ Credentials

The registry metadata declares no required environment variables or credentials, which is accurate for the skill package itself. However the included templates and references routinely show using environment variables for APP_USERNAME, APP_PASSWORD, TOTP secrets, proxy usernames/passwords, and passing local file paths (for upload). Those examples imply the skill will accept and transmit sensitive secrets and local files to the remote inference.sh service if provided — this is proportionate to a remote browser automation service but users should be aware that sensitive env vars and local files may leave their machine.

ℹ Persistence & Privilege

The skill does not request 'always: true' and has no special platform privileges; autonomous invocation is allowed but that is the platform default. The real-world risk is that an agent invoking this skill autonomously could run remote browser sessions and transmit data without the user noticing. Combined with the ability to capture cookies, page content, files, and video, autonomous invocation increases the blast radius. There is no evidence the skill modifies other skills or system-level config.

Version History

v0.1.5

- Initial release of agent-browser v0.1.5.
- Provides browser automation for AI agents via inference.sh, with Playwright backend.
- Supports navigation, web interaction (click, fill, upload, drag/drop), screenshots, and video recording.
- Element referencing via simple `@e` system; elements must be re-snapshotted after navigation.
- Features include proxy support, JavaScript execution, cursor highlights, file upload, and session management.

v0.1.2

- Added a banner image to the top of the documentation for improved visual presentation.
- No changes to functionality or CLI usage; documentation-only update.

v0.1.1

- Renamed the CLI and API references from agentic-browser to agent-browser throughout documentation and examples.
- Updated command examples and template calls to use the new agent-browser naming.
- Adjusted references in function usage, quick start, and workflow sections for consistency.
- Made corresponding reference and template documentation updates to reflect the new naming convention.

v0.1.0

Agentic Browser 0.1.0 – Initial Release

- Introduces browser automation for AI agents through inference.sh, using Playwright with `@e` element ref system.
- Supports a wide range of actions: navigation, clicking, typing, form filling, drag and drop, file upload, JavaScript execution, screenshot capture, and video recording.
- Session-based workflow with functions for open, interact, snapshot, execute, screenshot, and close (with video download).
- Features include element ref lifecycle management, proxy support, visible cursor indicator, and persistent session management.
- Provides detailed documentation and ready-to-use shell script templates for common automation scenarios.

Metadata

Slug agentic-browser

Version 0.1.5

License -

Total installs 2

Current installs 2

Number of versions 4

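FAQ

What is Agent Browser?

Browser automation for AI agents via inference.sh. Navigate web pages, interact with elements using @e refs, take screenshots, record video. Capabilities: we... It is an AI agent skill plugin for Claude Code / OpenClaw, with 1571 cumulative downloads to date.

How do I install Agent Browser?

Run the command "/install agentic-browser" in an OpenClaw or Claude Code chat to install it in one step; no additional configuration is required.

Is Agent Browser free?

Yes. Agent Browser is completely free (free and open source) and can be freely downloaded, installed, and used.

Which platforms does Agent Browser support?

Agent Browser is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Agent Browser?

It is developed and maintained by Ömer Karışman (@okaris); the current version is v0.1.5.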
