Agent Browser

Author: Ömer Karışman · v0.1.5
Tags: cross-platform · ⚠ suspicious
Total downloads: 1571 · Favorites: 2 · Current installs: 2 · Versions: 4

Install in OpenClaw
/install agentic-browser

Description
Browser automation for AI agents via inference.sh. Navigate web pages, interact with elements using @e refs, take screenshots, record video. Capabilities: we...
Usage (SKILL.md)

Agentic Browser

Browser automation for AI agents via inference.sh. Uses Playwright under the hood with a simple @e ref system for element interaction.

Quick Start

# Install CLI
curl -fsSL https://cli.inference.sh | sh && infsh login

# Open a page and get interactive elements
infsh app run agent-browser --function open --input '{"url": "https://example.com"}' --session new

Install note: The install script only detects your OS/architecture, downloads the matching binary from dist.inference.sh, and verifies its SHA-256 checksum. No elevated permissions or background processes. Manual install & verification available.
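
For readers who would rather not pipe the installer straight into sh, one option is to download it to a file, read it, and compare checksums before running anything. The helper below is a minimal sketch of the checksum step only. The function name verify_sha256 and the file name install.sh are illustrative, and the expected checksum has to come from the publisher's manual-install instructions, which are not reproduced here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the infsh CLI): compare a file's SHA-256
# digest against an expected hex string and fail on mismatch.
verify_sha256() {
  local file=$1 expected=$2 actual
  if command -v sha256sum >/dev/null 2>&1; then
    actual=$(sha256sum "$file" | cut -d' ' -f1)      # GNU coreutils
  else
    actual=$(shasum -a 256 "$file" | cut -d' ' -f1)  # macOS / Perl fallback
  fi
  if [ "$actual" != "$expected" ]; then
    echo "checksum mismatch for $file" >&2
    return 1
  fi
}

# Possible usage (EXPECTED_SHA256 is a value you obtain yourself):
#   curl -fsSL https://cli.inference.sh -o install.sh
#   less install.sh
#   verify_sha256 install.sh "$EXPECTED_SHA256" && sh install.sh
```

The comparison is case-sensitive, so the expected value should be lowercase hex, which is what both tools print.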

Core Workflow

Every browser automation follows this pattern:

  1. Open - Navigate to URL, get @e refs for elements
  2. Interact - Use refs to click, fill, drag, etc.
  3. Re-snapshot - After navigation/changes, get fresh refs
  4. Close - End session (returns video if recording)

# 1. Start session
RESULT=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com/login"
}')
SESSION_ID=$(echo "$RESULT" | jq -r '.session_id')
# Elements: @e1 [input] "Email", @e2 [input] "Password", @e3 [button] "Sign In"

# 2. Fill and submit
infsh app run agent-browser --function interact --session "$SESSION_ID" --input '{
  "action": "fill", "ref": "@e1", "text": "user@example.com"
}'
infsh app run agent-browser --function interact --session "$SESSION_ID" --input '{
  "action": "fill", "ref": "@e2", "text": "password123"
}'
infsh app run agent-browser --function interact --session "$SESSION_ID" --input '{
  "action": "click", "ref": "@e3"
}'

# 3. Re-snapshot after navigation
infsh app run agent-browser --function snapshot --session "$SESSION_ID" --input '{}'

# 4. Close when done
infsh app run agent-browser --function close --session "$SESSION_ID" --input '{}'
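
One detail worth guarding against in the workflow above: `jq -r '.session_id'` prints the literal string `null` when the key is missing, so a failed open can silently produce a bogus session id. A minimal sketch of a guard follows; the helper name require_session_id is hypothetical and not part of the skill.

```shell
#!/usr/bin/env bash
# Hypothetical helper: echo the session id back if it looks usable,
# otherwise print an error and return non-zero.
require_session_id() {
  local id=$1
  if [ -z "$id" ] || [ "$id" = "null" ]; then
    echo "error: open did not return a session_id" >&2
    return 1
  fi
  printf '%s\n' "$id"
}

# Possible usage with the RESULT variable from step 1:
#   SESSION_ID=$(require_session_id "$(echo "$RESULT" | jq -r '.session_id')") || exit 1
```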

Functions

| Function | Description |
| --- | --- |
| open | Navigate to URL, configure browser (viewport, proxy, video recording) |
| snapshot | Re-fetch page state with @e refs after DOM changes |
| interact | Perform actions using @e refs (click, fill, drag, upload, etc.) |
| screenshot | Take page screenshot (viewport or full page) |
| execute | Run JavaScript code on the page |
| close | Close session, returns video if recording was enabled |

Interact Actions

| Action | Description | Required Fields |
| --- | --- | --- |
| click | Click element | ref |
| dblclick | Double-click element | ref |
| fill | Clear and type text | ref, text |
| type | Type text (no clear) | text |
| press | Press key (Enter, Tab, etc.) | text |
| select | Select dropdown option | ref, text |
| hover | Hover over element | ref |
| check | Check checkbox | ref |
| uncheck | Uncheck checkbox | ref |
| drag | Drag and drop | ref, target_ref |
| upload | Upload file(s) | ref, file_paths |
| scroll | Scroll page | direction (up/down/left/right), scroll_amount |
| back | Go back in history | - |
| wait | Wait milliseconds | wait_ms |
| goto | Navigate to URL | url |
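
The required fields listed in the table above can be checked locally before a call is sent. The sketch below encodes the table as a lookup and does a rough textual check that each required key appears in the JSON input. Both function names are hypothetical, and the check is a plain grep for the key name, not a real JSON parse.

```shell
#!/usr/bin/env bash
# Hypothetical lookup mirroring the Interact Actions table.
required_fields() {
  case $1 in
    click|dblclick|hover|check|uncheck) echo "ref" ;;
    fill|select) echo "ref text" ;;
    type|press) echo "text" ;;
    drag) echo "ref target_ref" ;;
    upload) echo "ref file_paths" ;;
    scroll) echo "direction scroll_amount" ;;
    wait) echo "wait_ms" ;;
    goto) echo "url" ;;
    back) : ;;  # no required fields
    *) echo "unknown action: $1" >&2; return 1 ;;
  esac
}

# Rough check: every required key must appear as "key": somewhere in the JSON text.
check_interact_input() {
  local action=$1 json=$2 fields field
  fields=$(required_fields "$action") || return 1
  for field in $fields; do
    if ! printf '%s' "$json" | grep -q "\"$field\"[[:space:]]*:"; then
      echo "missing field for $action: $field" >&2
      return 1
    fi
  done
}
```

For example, `check_interact_input fill '{"action": "fill", "ref": "@e1"}'` fails because `text` is absent.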

Element Refs

Elements are returned with @e refs:

@e1 [a] "Home" href="/"
@e2 [input type="text"] placeholder="Search"
@e3 [button] "Submit"
@e4 [select] "Choose option"
@e5 [input type="checkbox"] name="agree"

Important: Refs are invalidated after navigation. Always re-snapshot after:

  • Clicking links/buttons that navigate
  • Form submissions
  • Dynamic content loading
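
Because ref numbers can change between snapshots, it can be safer to look a ref up by tag and label after each re-snapshot than to hardcode `@e3`. The sketch below assumes you already have the element listing as plain text in the `@eN [tag ...] "Label"` layout shown above. How to extract that text from the snapshot response is not covered here, and the name find_ref is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read an element listing on stdin and print the ref of
# the first line whose tag starts with $1 and that contains "$2" in quotes.
# Returns non-zero when nothing matches.
find_ref() {
  local tag=$1 label=$2
  awk -v tag="$tag" -v label="$label" '
    index($0, "[" tag) && index($0, "\"" label "\"") { print $1; found = 1; exit }
    END { exit found ? 0 : 1 }
  '
}

# Possible usage, with the listing saved in a file named elements.txt:
#   SUBMIT_REF=$(find_ref button "Submit" < elements.txt) || exit 1
```

The match is a simple substring test, so a label that also appears as an attribute value (for example a placeholder) will match too.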

Features

Video Recording

Record browser sessions for debugging or documentation:

# Start with recording enabled (optionally show cursor indicator)
SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "record_video": true,
  "show_cursor": true
}' | jq -r '.session_id')

# ... perform actions ...

# Close to get the video file
infsh app run agent-browser --function close --session "$SESSION" --input '{}'
# Returns: {"success": true, "video": <File>}

Cursor Indicator

Show a visible cursor in screenshots and video (useful for demos):

infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "show_cursor": true,
  "record_video": true
}'

The cursor appears as a red dot that follows mouse movements and shows click feedback.

Proxy Support

Route traffic through a proxy server:

infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "proxy_url": "http://proxy.example.com:8080",
  "proxy_username": "user",
  "proxy_password": "pass"
}'

File Upload

Upload files to file inputs:

infsh app run agent-browser --function interact --session "$SESSION" --input '{
  "action": "upload",
  "ref": "@e5",
  "file_paths": ["/path/to/file.pdf"]
}'

Drag and Drop

Drag elements to targets:

infsh app run agent-browser --function interact --session "$SESSION" --input '{
  "action": "drag",
  "ref": "@e1",
  "target_ref": "@e2"
}'

JavaScript Execution

Run custom JavaScript:

infsh app run agent-browser --function execute --session "$SESSION" --input '{
  "code": "document.querySelectorAll(\"h2\").length"
}'
# Returns: {"result": "5", "screenshot": <File>}

Deep-Dive Documentation

| Reference | Description |
| --- | --- |
| references/commands.md | Full function reference with all options |
| references/snapshot-refs.md | Ref lifecycle, invalidation rules, troubleshooting |
| references/session-management.md | Session persistence, parallel sessions |
| references/authentication.md | Login flows, OAuth, 2FA handling |
| references/video-recording.md | Recording workflows for debugging |
| references/proxy-support.md | Proxy configuration, geo-testing |

Ready-to-Use Templates

| Template | Description |
| --- | --- |
| templates/form-automation.sh | Form filling with validation |
| templates/authenticated-session.sh | Login once, reuse session |
| templates/capture-workflow.sh | Content extraction with screenshots |

Examples

Form Submission

SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com/contact"
}' | jq -r '.session_id')

# Get elements: @e1 [input] "Name", @e2 [input] "Email", @e3 [textarea], @e4 [button] "Send"

infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "fill", "ref": "@e1", "text": "John Doe"}'
infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "fill", "ref": "@e2", "text": "john@example.com"}'
infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "fill", "ref": "@e3", "text": "Hello!"}'
infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "click", "ref": "@e4"}'

# Re-snapshot after the form submits, then close
infsh app run agent-browser --function snapshot --session "$SESSION" --input '{}'
infsh app run agent-browser --function close --session "$SESSION" --input '{}'

Search and Extract

SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://google.com"
}' | jq -r '.session_id')

infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "fill", "ref": "@e1", "text": "weather today"}'
infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "press", "text": "Enter"}'
infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "wait", "wait_ms": 2000}'

# Re-snapshot to read the results page, then close
infsh app run agent-browser --function snapshot --session "$SESSION" --input '{}'
infsh app run agent-browser --function close --session "$SESSION" --input '{}'

Screenshot with Video

SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "record_video": true
}' | jq -r '.session_id')

# Take full page screenshot
infsh app run agent-browser --function screenshot --session "$SESSION" --input '{
  "full_page": true
}'

# Close and get video
RESULT=$(infsh app run agent-browser --function close --session "$SESSION" --input '{}')
echo "$RESULT" | jq '.video'

Sessions

Browser state persists within a session. Always:

  1. Start with --session new on first call
  2. Use returned session_id for subsequent calls
  3. Close session when done
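
Step 3 is easy to miss when a script fails partway through. One way to make the close call unconditional is a small wrapper that runs a command and then closes the session regardless of the outcome. This is a sketch, and the wrapper name run_then_close is hypothetical; only the `infsh app run agent-browser --function close` call is taken from the examples in this document.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run "$@", then close the given session even if the
# command failed. Returns the command's exit status, not the close call's.
run_then_close() {
  local session_id=$1
  shift
  local status=0
  "$@" || status=$?
  infsh app run agent-browser --function close \
    --session "$session_id" --input '{}' >/dev/null || true
  return "$status"
}

# Possible usage, where do_checkout is your own function that uses $SESSION:
#   run_then_close "$SESSION" do_checkout
```

The close output is discarded here, so this shape does not suit sessions where you need the returned video.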

Related Skills

# Web search (for research + browse)
npx skills add inference-sh/skills@web-search

# LLM models (analyze extracted content)
npx skills add inference-sh/skills@llm-models

Security recommendations
This skill appears to be a legitimate browser-automation wrapper, but take these precautions before installing or using it:

  • Understand remote execution: The instructions use the infsh CLI to run sessions on inference.sh, so page content, cookies, screenshots, recorded video, and any files you upload will be sent to that service. Do not use it with accounts or pages that contain secrets you cannot share.
  • Avoid piping unknown installers into sh: The Quick Start recommends curl | sh from cli.inference.sh and downloads from dist.inference.sh. Manually review the installer, verify checksums from a trusted source, or prefer installing known, auditable clients.
  • Be careful with credentials and local files: Templates show passing APP_PASSWORD, TOTP secrets, proxy credentials, and absolute local file paths. Only provide secrets when you understand where they go and are comfortable they will be handled securely.
  • Recording/video: Enabling video will capture on-screen sensitive information. Don’t record sessions with credentials or PII unless you control the destination and storage.
  • Proxy & scraping guidance: The skill includes examples for rotating proxies and scraping; ensure you comply with site terms of service and legal/privacy requirements.
  • If you need more assurance: Ask the publisher for a homepage, source repo, and reproducible installer steps; prefer self-hosted Playwright or a local CLI you control if you must automate sensitive sites.

If you decide to proceed, verify the infsh CLI's authenticity and read its privacy/hosting policy so you know how and where captured data is stored and for how long.
Functional analysis
Type: OpenClaw Skill
Name: agentic-browser
Version: 0.1.5

The 'agentic-browser' skill is classified as suspicious because of its broad `Bash(infsh *)` permissions, which allow the AI agent to execute arbitrary `infsh` commands. While designed for legitimate web automation, the skill's capabilities present significant attack surfaces:

  • Running arbitrary JavaScript code (`execute` function in `SKILL.md`, `references/commands.md`)
  • Uploading local files (`upload` action in `SKILL.md`, `references/commands.md`)
  • Routing traffic through arbitrary proxies (`proxy_url` in `SKILL.md`, `references/proxy-support.md`)

Furthermore, the shell scripts (`templates/*.sh`) directly interpolate user-provided URLs into JSON inputs for `infsh`, creating a potential shell injection vulnerability if a malicious URL containing special shell characters is provided. There is no clear evidence of intentional malicious behavior, but the powerful and potentially exploitable capabilities warrant a 'suspicious' classification.
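
To illustrate the interpolation concern raised in this analysis, here is a minimal sketch of building the open input from a variable with JSON string escaping, so the URL is never pasted raw into the JSON text. It is not the templates' actual code. Both function names are hypothetical, the escaper requires bash, and it covers only backslash, double quote, newline, carriage return, and tab.

```shell
#!/usr/bin/env bash
# Hypothetical helper: escape a string for use inside a JSON string literal.
# Covers \ " newline, carriage return, and tab; other control characters are
# not handled.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}      # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  s=${s//$'\r'/\\r}
  s=${s//$'\t'/\\t}
  printf '%s' "$s"
}

# Hypothetical helper: build the JSON input for the open function from a URL.
build_open_input() {
  printf '{"url": "%s"}' "$(json_escape "$1")"
}

# Possible usage; the payload is passed as one quoted argument:
#   infsh app run agent-browser --function open --session new \
#     --input "$(build_open_input "$TARGET_URL")"
```

Where jq is available, `jq -n --arg url "$TARGET_URL" '{url: $url}'` produces an equivalent payload with complete JSON escaping.
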
Capability assessment
Purpose & Capability
Name/description match the provided assets: the SKILL.md, command reference, and templates all implement a Playwright-style browser automation flow (open, snapshot, interact, screenshot, execute, close), proxies, file upload, video, and session management. The scripts and examples are consistent with a web-automation/scraping/browser-automation tool.
Instruction Scope
SKILL.md and the templates instruct callers to install and use the external infsh CLI and to run commands that will (by design) fetch page HTML/text, execute arbitrary JS, extract cookies, upload local files, and request session video. Those instructions do not restrict or warn strongly enough that page content, cookies, uploaded files, or recorded video will be transmitted to the inference.sh service. The templates show workflows that handle credentials, TOTP, and cookie extraction (including examples to put passwords into env vars and to extract cookies), which increases the chance of sensitive data being exposed to the remote service or being stored in its sessions.
Install Mechanism
There is no install spec in the skill bundle, but the Quick Start explicitly tells users/agents to run a remote installer: curl -fsSL https://cli.inference.sh | sh and to download binaries from dist.inference.sh. 'curl | sh' is a high-risk pattern because it executes a remote script. The domains used (cli.inference.sh, dist.inference.sh) are not standard well-known installer hosts like GitHub releases; while checksums are referenced, the installer pattern and remote binary download are notable risks and deserve manual verification before use.
Credentials
The registry metadata declares no required environment variables or credentials, which is accurate for the skill package itself. However the included templates and references routinely show using environment variables for APP_USERNAME, APP_PASSWORD, TOTP secrets, proxy usernames/passwords, and passing local file paths (for upload). Those examples imply the skill will accept and transmit sensitive secrets and local files to the remote inference.sh service if provided — this is proportionate to a remote browser automation service but users should be aware that sensitive env vars and local files may leave their machine.
Persistence & Privilege
The skill does not request 'always: true' and has no special platform privileges; autonomous invocation is allowed but that is the platform default. The real-world risk is that if the agent invokes this skill autonomously it could perform remote browser sessions and transmit data without the user noticing — combine autonomous invocation with the ability to capture cookies, page content, files, and video and the blast radius increases. There is no evidence the skill modifies other skills or system-level config.
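How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install agentic-browser
  3. Once installed, invoke the Skill by name or trigger it with /agentic-browser
  4. Provide the required inputs per the Skill's parameter documentation to get structured output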
Version history
v0.1.5
  • Initial release of agent-browser v0.1.5.
  • Provides browser automation for AI agents via inference.sh, with Playwright backend.
  • Supports navigation, web interaction (click, fill, upload, drag/drop), screenshots, and video recording.
  • Element referencing via simple `@e` system; elements must be re-snapshotted after navigation.
  • Features include proxy support, JavaScript execution, cursor highlights, file upload, and session management.
v0.1.2
  • Added a banner image to the top of the documentation for improved visual presentation.
  • No changes to functionality or CLI usage; documentation-only update.
v0.1.1
  • Renamed the CLI and API references from agentic-browser to agent-browser throughout documentation and examples.
  • Updated command examples and template calls to use the new agent-browser naming.
  • Adjusted references in function usage, quick start, and workflow sections for consistency.
  • Made corresponding reference and template documentation updates to reflect the new naming convention.
v0.1.0
Agentic Browser 0.1.0: Initial Release
  • Introduces browser automation for AI agents through inference.sh, using Playwright with `@e` element ref system.
  • Supports a wide range of actions: navigation, clicking, typing, form filling, drag and drop, file upload, JavaScript execution, screenshot capture, and video recording.
  • Session-based workflow with functions for open, interact, snapshot, execute, screenshot, and close (with video download).
  • Features include element ref lifecycle management, proxy support, visible cursor indicator, and persistent session management.
  • Provides detailed documentation and ready-to-use shell script templates for common automation scenarios.
Metadata
Slug agentic-browser
Version 0.1.5
License
Total installs 2
Current installs 2
Versions 4
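FAQ

What is Agent Browser?

Browser automation for AI agents via inference.sh. Navigate web pages, interact with elements using @e refs, take screenshots, record video. Capabilities: we... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 1571 total downloads to date.

How do I install Agent Browser?

Run the command "/install agentic-browser" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Agent Browser free?

Yes. Agent Browser is completely free (free and open source) and can be downloaded, installed, and used freely.

Which platforms does Agent Browser support?

Agent Browser is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Agent Browser?

Developed and maintained by Ömer Karışman (@okaris); the current version is v0.1.5.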
