Description

Browser automation for AI agents via inference.sh. Navigate web pages, interact with elements using @e refs, take screenshots, record video. Capabilities: web scraping, form filling, clicking, typing, drag-drop, file upload, JavaScript execution. Use for: web automation, data extraction, testing, agent browsing, research. Triggers: browser, web automation, scrape, navigate, click, fill form, screenshot, browse web, playwright, headless browser, web agent, surf internet, record video

Usage (SKILL.md)

Agentic Browser

Name: Agentic Browser 0.1.2
Author: xyny89

Browser automation for AI agents via inference.sh. Uses Playwright under the hood with a simple @e ref system for element interaction.

Quick Start

# Install CLI
curl -fsSL https://cli.inference.sh | sh && infsh login

# Open a page and get interactive elements
infsh app run agent-browser --function open --input '{"url": "https://example.com"}' --session new

Core Workflow

Every browser automation follows this pattern:

1. Open - Navigate to URL, get @e refs for elements
2. Interact - Use refs to click, fill, drag, etc.
3. Re-snapshot - After navigation/changes, get fresh refs
4. Close - End session (returns video if recording)

# 1. Start session
RESULT=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com/login"
}')
SESSION_ID=$(echo "$RESULT" | jq -r '.session_id')
# Elements: @e1 [input] "Email", @e2 [input] "Password", @e3 [button] "Sign In"

# 2. Fill and submit
infsh app run agent-browser --function interact --session $SESSION_ID --input '{
  "action": "fill", "ref": "@e1", "text": "user@example.com"
}'
infsh app run agent-browser --function interact --session $SESSION_ID --input '{
  "action": "fill", "ref": "@e2", "text": "password123"
}'
infsh app run agent-browser --function interact --session $SESSION_ID --input '{
  "action": "click", "ref": "@e3"
}'

# 3. Re-snapshot after navigation
infsh app run agent-browser --function snapshot --session $SESSION_ID --input '{}'

# 4. Close when done
infsh app run agent-browser --function close --session $SESSION_ID --input '{}'
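
Every call in the workflow above repeats the same `infsh app run agent-browser` prefix, so a small wrapper can cut the repetition. This is a minimal sketch for convenience only: the function name `ab` is hypothetical and not part of the inference.sh CLI, and it assumes the `--function`, `--session`, and `--input` flags behave as shown above.

```shell
# ab FUNCTION SESSION [JSON_INPUT]
# Hypothetical convenience wrapper (not provided by infsh itself).
# Forwards to the agent-browser app and defaults the input to an empty object.
ab() {
  local fn="$1" session="$2" input="${3:-}"
  [ -n "$input" ] || input='{}'
  infsh app run agent-browser --function "$fn" --session "$session" --input "$input"
}
```

With this in place, step 3 becomes `ab snapshot "$SESSION_ID"` and step 4 becomes `ab close "$SESSION_ID"`.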

Functions

Function	Description
`open`	Navigate to URL, configure browser (viewport, proxy, video recording)
`snapshot`	Re-fetch page state with `@e` refs after DOM changes
`interact`	Perform actions using `@e` refs (click, fill, drag, upload, etc.)
`screenshot`	Take page screenshot (viewport or full page)
`execute`	Run JavaScript code on the page
`close`	Close session, returns video if recording was enabled

Interact Actions

Action	Description	Required Fields
`click`	Click element	`ref`
`dblclick`	Double-click element	`ref`
`fill`	Clear and type text	`ref`, `text`
`type`	Type text (no clear)	`text`
`press`	Press key (Enter, Tab, etc.)	`text`
`select`	Select dropdown option	`ref`, `text`
`hover`	Hover over element	`ref`
`check`	Check checkbox	`ref`
`uncheck`	Uncheck checkbox	`ref`
`drag`	Drag and drop	`ref`, `target_ref`
`upload`	Upload file(s)	`ref`, `file_paths`
`scroll`	Scroll page	`direction` (up/down/left/right), `scroll_amount`
`back`	Go back in history	-
`wait`	Wait milliseconds	`wait_ms`
`goto`	Navigate to URL	`url`
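
The required-fields column above can be checked before a request is sent. The sketch below encodes the table as a `case` statement and reports which required keys are absent from a payload. The function names `required_fields` and `missing_fields` are hypothetical, and the key check is a crude text match on the JSON string, not a real JSON parse.

```shell
# required_fields ACTION
# Prints the required input fields for an interact action, per the table above.
required_fields() {
  case "$1" in
    click|dblclick|hover|check|uncheck) echo "ref" ;;
    fill|select) echo "ref text" ;;
    type|press) echo "text" ;;
    drag) echo "ref target_ref" ;;
    upload) echo "ref file_paths" ;;
    scroll) echo "direction scroll_amount" ;;
    wait) echo "wait_ms" ;;
    goto) echo "url" ;;
    back) echo "" ;;
    *) echo "unknown action: $1" >&2; return 1 ;;
  esac
}

# missing_fields ACTION JSON
# Prints each required field whose quoted key does not appear in JSON.
# Assumption: a plain substring match is good enough for a pre-flight check.
missing_fields() {
  local action="$1" json="$2" field fields
  fields=$(required_fields "$action") || return 1
  for field in $fields; do
    case "$json" in
      *"\"$field\""*) ;;
      *) printf '%s\n' "$field" ;;
    esac
  done
}
```

For example, `missing_fields fill '{"action": "fill", "ref": "@e1"}'` reports that `text` is absent.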

Element Refs

Elements are returned with @e refs:

@e1 [a] "Home" href="/"
@e2 [input type="text"] placeholder="Search"
@e3 [button] "Submit"
@e4 [select] "Choose option"
@e5 [input type="checkbox"] name="agree"

Important: Refs are invalidated after navigation. Always re-snapshot after:

Clicking links/buttons that navigate
Form submissions
Dynamic content loading
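
One way to avoid acting on stale refs is to pair each navigating interaction with an immediate re-snapshot. The helper below is a sketch under that assumption; `interact_and_refresh` is a hypothetical name, and it relies only on the `interact` and `snapshot` functions described above.

```shell
# interact_and_refresh SESSION JSON_INPUT
# Performs one interaction, then re-snapshots so the caller sees fresh @e refs.
# The interaction's own output is discarded; only the new snapshot is printed.
interact_and_refresh() {
  local session="$1" input="$2"
  infsh app run agent-browser --function interact --session "$session" --input "$input" >/dev/null || return 1
  infsh app run agent-browser --function snapshot --session "$session" --input '{}'
}
```

Usage: `interact_and_refresh "$SESSION" '{"action": "click", "ref": "@e3"}'` prints the post-click snapshot, whose refs replace any held from before the click.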

Features

Video Recording

Record browser sessions for debugging or documentation:

# Start with recording enabled (optionally show cursor indicator)
SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "record_video": true,
  "show_cursor": true
}' | jq -r '.session_id')

# ... perform actions ...

# Close to get the video file
infsh app run agent-browser --function close --session $SESSION --input '{}'
# Returns: {"success": true, "video": <File>}

Cursor Indicator

Show a visible cursor in screenshots and video (useful for demos):

infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "show_cursor": true,
  "record_video": true
}'

The cursor appears as a red dot that follows mouse movements and shows click feedback.

Proxy Support

Route traffic through a proxy server:

infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "proxy_url": "http://proxy.example.com:8080",
  "proxy_username": "user",
  "proxy_password": "pass"
}'
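
To keep proxy credentials out of scripts and shell history, the `open` payload can be assembled from environment variables. This is a sketch only: the variable names `PROXY_URL`, `PROXY_USERNAME`, and `PROXY_PASSWORD` and the function `proxy_open_input` are assumptions, and it treats all three values as mandatory even though the service may accept a proxy without credentials.

```shell
# proxy_open_input URL
# Prints a JSON payload for the `open` function using proxy settings taken
# from the environment (hypothetical variable names).
# Rejects empty values and values containing " or \, since no escaping is done.
proxy_open_input() {
  local url="$1" v
  for v in "$url" "${PROXY_URL:-}" "${PROXY_USERNAME:-}" "${PROXY_PASSWORD:-}"; do
    case "$v" in
      ''|*'"'*|*\\*)
        echo "proxy_open_input: empty or unescapable value" >&2
        return 1 ;;
    esac
  done
  printf '{"url": "%s", "proxy_url": "%s", "proxy_username": "%s", "proxy_password": "%s"}\n' \
    "$url" "$PROXY_URL" "$PROXY_USERNAME" "$PROXY_PASSWORD"
}
```

The result can be passed directly: `infsh app run agent-browser --function open --session new --input "$(proxy_open_input https://example.com)"`.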

File Upload

Upload files to file inputs:

infsh app run agent-browser --function interact --session $SESSION --input '{
  "action": "upload",
  "ref": "@e5",
  "file_paths": ["/path/to/file.pdf"]
}'

Drag and Drop

Drag elements to targets:

infsh app run agent-browser --function interact --session $SESSION --input '{
  "action": "drag",
  "ref": "@e1",
  "target_ref": "@e2"
}'

JavaScript Execution

Run custom JavaScript:

infsh app run agent-browser --function execute --session $SESSION --input '{
  "code": "document.querySelectorAll(\"h2\").length"
}'
# Returns: {"result": "5", "screenshot": <File>}
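
Hand-escaping quotes inside the `code` string, as in the example above, gets error-prone for longer snippets. The sketch below builds the payload with `sed`; the function names `json_escape` and `execute_input` are hypothetical, and it handles only backslashes and double quotes, so it assumes single-line JavaScript with no control characters.

```shell
# json_escape STRING
# Escapes backslashes and double quotes so STRING can sit inside a JSON string.
# Assumption: STRING is a single line without control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# execute_input JS_CODE
# Prints the JSON payload for the `execute` function.
execute_input() {
  printf '{"code": "%s"}' "$(json_escape "$1")"
}
```

Usage: `infsh app run agent-browser --function execute --session $SESSION --input "$(execute_input 'document.querySelectorAll("h2").length')"`.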

Deep-Dive Documentation

Reference	Description
references/commands.md	Full function reference with all options
references/snapshot-refs.md	Ref lifecycle, invalidation rules, troubleshooting
references/session-management.md	Session persistence, parallel sessions
references/authentication.md	Login flows, OAuth, 2FA handling
references/video-recording.md	Recording workflows for debugging
references/proxy-support.md	Proxy configuration, geo-testing

Ready-to-Use Templates

Template	Description
templates/form-automation.sh	Form filling with validation
templates/authenticated-session.sh	Login once, reuse session
templates/capture-workflow.sh	Content extraction with screenshots

Examples

Form Submission

SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com/contact"
}' | jq -r '.session_id')

# Get elements: @e1 [input] "Name", @e2 [input] "Email", @e3 [textarea], @e4 [button] "Send"

infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e1", "text": "John Doe"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e2", "text": "john@example.com"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e3", "text": "Hello!"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "click", "ref": "@e4"}'

infsh app run agent-browser --function snapshot --session $SESSION --input '{}'
infsh app run agent-browser --function close --session $SESSION --input '{}'

Search and Extract

SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://google.com"
}' | jq -r '.session_id')

infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e1", "text": "weather today"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "press", "text": "Enter"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "wait", "wait_ms": 2000}'

infsh app run agent-browser --function snapshot --session $SESSION --input '{}'
infsh app run agent-browser --function close --session $SESSION --input '{}'

Screenshot with Video

SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "record_video": true
}' | jq -r '.session_id')

# Take full page screenshot
infsh app run agent-browser --function screenshot --session $SESSION --input '{
  "full_page": true
}'

# Close and get video
RESULT=$(infsh app run agent-browser --function close --session $SESSION --input '{}')
echo "$RESULT" | jq '.video'

Sessions

Browser state persists within a session. Always:

Start with --session new on first call
Use returned session_id for subsequent calls
Close session when done
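
The three rules above can be bundled so that a session is always closed, even when a step in the middle fails. This is a sketch, not part of the CLI: `session_id_from` and `with_session` are hypothetical helpers, the `sed` extraction is a jq-free fallback that assumes the `open` response contains `"session_id": "<value>"` on one line, and the URL is assumed to contain no double quotes.

```shell
# session_id_from JSON
# Extracts the session_id value from an `open` response without jq.
session_id_from() {
  printf '%s' "$1" | sed -n 's/.*"session_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# with_session URL COMMAND [ARGS...]
# Opens a new session, runs COMMAND with the session id appended as its last
# argument, then closes the session and returns COMMAND's exit status.
with_session() {
  local url="$1" result sid status
  shift
  result=$(infsh app run agent-browser --function open --session new --input "{\"url\": \"$url\"}") || return 1
  sid=$(session_id_from "$result")
  [ -n "$sid" ] || return 1
  "$@" "$sid"
  status=$?
  infsh app run agent-browser --function close --session "$sid" --input '{}' >/dev/null
  return "$status"
}
```

A caller defines a function that takes the session id as its final argument, for example `my_steps() { infsh app run agent-browser --function snapshot --session "$1" --input '{}'; }`, and runs it with `with_session https://example.com my_steps`.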

Related Skills

# Web search (for research + browse)
npx skills add inferencesh/skills@web-search

# LLM models (analyze extracted content)
npx skills add inferencesh/skills@llm-models

Documentation

inference.sh Sessions - Session management
Multi-function Apps - How functions work

Security Recommendations

This skill appears to implement a remote browser automation client for inference.sh and is internally consistent, but there are several things to consider before installing or using it:

- Remote execution and data exposure: The workflows run a remote CLI and perform page interactions on inference.sh infrastructure. Any page content, credentials you enter into automated sessions, uploaded files, and recorded videos will be handled by that remote service. If you will be automating login flows, private dashboards, or uploading private files, assume that data is accessible to the remote provider unless you verify otherwise.
- Inspect the installer: The Quick Start recommends `curl https://cli.inference.sh | sh`, which downloads and executes code from the internet. Fetch the script and inspect it locally before running it; do not run it blindly.
- Misleading cookie guidance: The docs claim ways to extract "all cookies including httpOnly" via page JavaScript, but httpOnly cookies are not accessible to document.cookie. Treat that section as incorrect and do not assume it can expose server-only cookies.
- Sensitive features: Video recording, screenshots, execute (arbitrary JS), cookie extraction, file uploads, and proxy credentials can all leak secrets. The skill includes guidance to avoid recording sensitive sessions, but it does not explicitly state that data will be transmitted to the provider. Use disposable/test accounts, redact secrets, or run only non-sensitive automation if you cannot verify the provider's data handling.
- Recommended precautions: (1) Review the inference.sh CLI install script before executing it; (2) test with throwaway accounts and non-sensitive targets; (3) avoid uploading or automating private credentials unless you trust the service and have reviewed its privacy/security policy; (4) if you need local-only execution, prefer tools you can run entirely on your host (local Playwright/Playwright CLI) rather than a remote service.

Capability Analysis

Type: OpenClaw Skill
Name: agentic-browser-0-1-2
Version: 1.0.0

The skill is classified as suspicious due to several high-risk capabilities inherent to its web automation purpose, which could be leveraged for malicious activities if misused. Specifically, the `execute` function allows arbitrary JavaScript execution within the browser context, enabling sensitive data extraction (e.g., `document.cookie` as demonstrated in `references/authentication.md`). The `interact` function supports file uploads, and the `open` function allows proxy configuration, both of which are powerful features. While the `SKILL.md` grants broad `Bash(infsh *)` permissions, this is scoped to the platform's CLI. The initial `curl | sh` command for CLI installation is also a generally risky practice. Although the documentation includes security best practices, the inherent power of these features without strict sandboxing or explicit user consent for each sensitive action warrants a 'suspicious' classification.

Capability Assessment

✓ Purpose & Capability

Name/description, functions, templates and docs consistently describe a Playwright-backed browser automation workflow via the inference.sh CLI (open, interact, snapshot, execute, screenshot, record video). Requested capabilities (form filling, proxy, uploads, recording) are coherent with the stated purpose.

⚠ Instruction Scope

Runtime instructions and templates instruct the agent/user to (a) run the remote inference.sh CLI (curl | sh), (b) execute arbitrary JavaScript in pages (execute), (c) extract cookies and page text, (d) upload files and record video. These actions can expose sensitive credentials/page content to the remote service. The docs also contain inaccurate/misleading guidance (e.g., suggesting retrieval of httpOnly cookies via JS and an odd use of performance.getEntriesByType to 'get all cookies'), which is incorrect and could mislead users into unsafe practices.

⚠ Install Mechanism

The Quick Start shows installing the CLI via `curl -fsSL https://cli.inference.sh | sh` (pipe-to-sh) — an instruction to download and execute a remote install script. Although the skill package itself has no formal install spec, the recommended install method is high-risk because it runs arbitrary code hosted on an external domain; users should inspect that script before running it.

ℹ Credentials

requires.env is empty (no secrets declared), which aligns with being a generic automation skill. Templates and docs do reference environment variables (APP_USERNAME, APP_PASSWORD, TOTP_SECRET, proxy credentials) and encourage using env-vars/secrets managers. Requesting proxy credentials, upload paths and encouraging use of stored passwords is proportionate for automation, but these are sensitive and the docs do not explicitly state that data will be sent to the inference.sh servers.

✓ Persistence & Privilege

Skill does not request always:true and has no unusual persistence or cross-skill config modification. Sessions are managed via the remote service and session IDs; that behavior is consistent with its purpose.

Version History

v1.0.0

Initial release of agentic-browser-0-1-2.

- Provides browser automation for AI agents using inference.sh and Playwright.
- Supports navigation, element interaction via `@e` references, screenshots, and video recording.
- Includes full web automation actions: clicking, filling forms, dragging, uploading files, and executing JavaScript.
- Offers reusable session management, proxy support, and visible cursor for video demos.
- Extensive documentation, templates, and ready-to-use automation workflows.

Metadata

Slug agentic-browser-0-1-2

Version 1.0.0

License -

Total installs 9

Current installs 9

Version count 1

FAQ

What is Agentic Browser 0.1.2?

Browser automation for AI agents via inference.sh. Navigate web pages, interact with elements using @e refs, take screenshots, record video. Capabilities: web scraping, form filling, clicking, typing, drag-drop, file upload, JavaScript execution. Use for: web automation, data extraction, testing, agent browsing, research. Triggers: browser, web automation, scrape, navigate, click, fill form, screenshot, browse web, playwright, headless browser, web agent, surf internet, record video. It is an AI agent skill plugin for Claude Code / OpenClaw, with 1269 cumulative downloads to date.

How do I install Agentic Browser 0.1.2?

Run the command `/install agentic-browser-0-1-2` in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.

Is Agentic Browser 0.1.2 free?

Yes. Agentic Browser 0.1.2 is completely free (free and open source) and can be freely downloaded, installed, and used.

Which platforms does Agentic Browser 0.1.2 support?

Agentic Browser 0.1.2 is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Agentic Browser 0.1.2?

It is developed and maintained by xyny89 (@xyny89); the current version is v1.0.0.
