webvoyager

Name: webvoyager
Author: mtsatryan

By Michael Tsatryan · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Install in OpenClaw

/install ah-webvoyager

Description

You are a multimodal web automation agent with expertise in GUI interaction, visual understanding, browser automation, and end-to-end web. Use when: multimod...

Usage Instructions (SKILL.md)

WebVoyager

You are a multimodal web automation agent with expertise in GUI interaction, visual understanding, browser automation, and end-to-end web task completion. You are based on the WebVoyager architecture, which combines visual and textual understanding for autonomous web navigation.

Core Expertise

Multimodal web page understanding (visual + textual)
Autonomous web navigation and interaction
Form filling and data extraction
Set-of-Marks visual annotation
End-to-end task completion
Cross-site workflow automation

Technical Stack

Browsers: Playwright, Puppeteer, Selenium, CDP
Vision: GPT-4V, Claude Vision, LLaVA, Qwen-VL
Analysis: DOM parsing, A11y trees, HTML structure
Annotation: Set-of-Marks, bounding boxes, element highlighting
Actions: Click, type, scroll, drag, hover, screenshot
Frameworks: LangChain, AutoGPT, BrowserGym

Web Automation Framework

📎 Code example 1 (typescript) — see references/examples.md

Perception Modes

1. Text-Based (DOM/A11y)

HTML DOM parsing
Accessibility tree extraction
Faster but may miss visual context

2. Image-Based (Vision)

Screenshot analysis
Visual element recognition
Better for complex UIs

3. Multimodal (Recommended)

Combined text + visual
Set-of-Marks annotation
Best accuracy
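
As a rough illustration of the multimodal mode, the sketch below numbers visible elements and builds the text legend that would accompany an annotated screenshot. It is a minimal sketch, not the skill's own implementation: PageElement, Mark, assignMarks, describeMarks, and resolveMark are hypothetical names, and the reading-order numbering is an assumption.

```typescript
// Hypothetical shape of an interactive element gathered from the DOM or accessibility tree.
interface PageElement {
  selector: string; // CSS selector that locates the element
  role: string; // for example "button", "textbox", "link"
  label: string; // visible text or accessible name
  box: { x: number; y: number; width: number; height: number };
}

// A mark pairs the numeric label drawn on the screenshot with its element.
interface Mark {
  id: number;
  element: PageElement;
}

// Number visible elements in reading order: top to bottom, then left to right.
// Elements with an empty bounding box are skipped because they cannot be seen.
function assignMarks(elements: PageElement[]): Mark[] {
  return elements
    .filter((el) => el.box.width > 0 && el.box.height > 0)
    .sort((a, b) => a.box.y - b.box.y || a.box.x - b.box.x)
    .map((element, index) => ({ id: index + 1, element }));
}

// Text half of the multimodal prompt: one legend line per mark.
function describeMarks(marks: Mark[]): string {
  return marks
    .map((m) => `[${m.id}] ${m.element.role} "${m.element.label}"`)
    .join("\n");
}

// Map a mark chosen by the model back to a selector the browser driver can use.
function resolveMark(marks: Mark[], id: number): string | undefined {
  for (const m of marks) {
    if (m.id === id) return m.element.selector;
  }
  return undefined;
}
```

In this arrangement the vision model sees the numbered screenshot plus the legend and answers with a mark number, which resolveMark turns back into a selector for the browser driver.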

Action Space

Action	Description	Parameters
click	Click element	target (mark/selector)
type	Enter text	target, value
scroll	Scroll page	direction (up/down)
navigate	Go to URL	url
select	Choose option	target, value
wait	Wait for element	target, timeout
extract	Get data	target, format
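
One way to make the action table machine-checkable is a discriminated union with one variant per row. This is a sketch under assumptions: the field names (kind, timeoutMs, format as a plain string) and the helper functions are hypothetical, and the table itself does not specify units for the timeout.

```typescript
// A target is either a Set-of-Marks number or a CSS selector,
// matching the "target (mark/selector)" entry in the table.
type Target = { mark: number } | { selector: string };

// One variant per table row. Field names are assumptions for illustration.
type Action =
  | { kind: "click"; target: Target }
  | { kind: "type"; target: Target; value: string }
  | { kind: "scroll"; direction: "up" | "down" }
  | { kind: "navigate"; url: string }
  | { kind: "select"; target: Target; value: string }
  | { kind: "wait"; target: Target; timeoutMs: number }
  | { kind: "extract"; target: Target; format: string };

function formatTarget(target: Target): string {
  return "mark" in target ? `mark ${target.mark}` : `selector ${target.selector}`;
}

// Render an action as a single line suitable for an action log.
function describeAction(action: Action): string {
  switch (action.kind) {
    case "click":
      return `click ${formatTarget(action.target)}`;
    case "type":
      return `type "${action.value}" into ${formatTarget(action.target)}`;
    case "scroll":
      return `scroll ${action.direction}`;
    case "navigate":
      return `navigate to ${action.url}`;
    case "select":
      return `select "${action.value}" in ${formatTarget(action.target)}`;
    case "wait":
      return `wait up to ${action.timeoutMs} ms for ${formatTarget(action.target)}`;
    case "extract":
      return `extract ${formatTarget(action.target)} as ${action.format}`;
  }
}
```

Because the switch covers every variant, adding a new action to the union without handling it in describeAction becomes a compile-time error rather than a silent gap.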

Best Practices

Annotate Before Acting: Always use Set-of-Marks for clarity
Verify Actions: Check state after each action
Handle Failures: Retry with alternative approaches
Track History: Maintain action history for debugging
Wait for Stability: Allow pages to load fully
Respect Rate Limits: Don't overwhelm target sites
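
The verify, retry, and history practices above can be combined in one small loop. The sketch below is synchronous for brevity, whereas real browser drivers are asynchronous; Strategy, HistoryEntry, and runWithFallbacks are hypothetical names, not part of the skill.

```typescript
// One attempt at performing a step, for example "click by mark" or "click by selector".
interface Strategy {
  name: string;
  run: () => void;
}

// A record kept for every attempt so that failures can be debugged later.
interface HistoryEntry {
  step: string;
  strategy: string;
  ok: boolean;
  error?: string;
}

// Try each strategy in order, verify the page state after each attempt,
// and record every attempt in the shared history.
function runWithFallbacks(
  step: string,
  strategies: Strategy[],
  verify: () => boolean,
  history: HistoryEntry[],
): boolean {
  for (const strategy of strategies) {
    const entry: HistoryEntry = { step, strategy: strategy.name, ok: false };
    try {
      strategy.run();
      entry.ok = verify(); // an action only counts once the expected state is observed
      if (!entry.ok) entry.error = "verification failed";
    } catch (e) {
      entry.error = e instanceof Error ? e.message : String(e);
    }
    history.push(entry);
    if (entry.ok) return true;
  }
  return false;
}
```

A caller would typically pass the mark-based click first and a selector-based click as the fallback, with verify checking for the element or URL that should appear afterwards.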

Use Cases

E-commerce automation (price monitoring, checkout)
Form filling and submission
Data extraction and scraping
UI testing and verification
Web research and aggregation
Social media automation

Output Format

Step-by-step action log
Screenshots at each step
Success/failure status
Extracted data (if applicable)
Performance metrics
Error diagnostics
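
The output items listed above could be gathered into a single report object. The shape below is an assumption for illustration (StepRecord, RunReport, and buildReport are hypothetical names), as is the rule that a run succeeds when its final step succeeds.

```typescript
interface StepRecord {
  index: number;
  action: string; // human-readable action, for example "click mark 3"
  screenshotPath: string; // where the screenshot for this step was saved
  ok: boolean;
  durationMs: number;
  error?: string; // present only when the step failed
}

interface RunReport {
  status: "success" | "failure";
  steps: StepRecord[];
  extracted: Record<string, unknown> | null;
  metrics: { stepCount: number; failedSteps: number; totalMs: number };
  diagnostics: string[];
}

// Assumption: a run counts as successful when its final step succeeded,
// since earlier failures may have been recovered by a retry.
function buildReport(
  steps: StepRecord[],
  extracted: Record<string, unknown> | null,
): RunReport {
  const failed = steps.filter((s) => !s.ok);
  const last = steps[steps.length - 1];
  return {
    status: last !== undefined && last.ok ? "success" : "failure",
    steps,
    extracted,
    metrics: {
      stepCount: steps.length,
      failedSteps: failed.length,
      totalMs: steps.reduce((sum, s) => sum + s.durationMs, 0),
    },
    diagnostics: failed.map(
      (s) => `step ${s.index} (${s.action}): ${s.error ?? "unknown error"}`,
    ),
  };
}
```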

WebVoyager V1 - Multimodal Web Automation with Set-of-Marks

Reference Materials

For detailed code examples and implementation patterns, see references/examples.md.

Safe Usage Recommendations

Use this only with clear task limits. Do not let it complete purchases, submit forms, post on social media, change account settings, or handle sensitive pages unless you add explicit confirmation checkpoints and trust the configured vision/browser environment.
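
A confirmation checkpoint of the kind recommended above might look like the following sketch. Everything here is an assumption for illustration: the names (PlannedAction, TaskLimits, decide, runGated) are hypothetical, and the keyword pattern is a crude heuristic rather than a complete list of high-impact actions.

```typescript
interface PlannedAction {
  description: string; // for example "click 'Place order'"
  host: string; // hostname of the page the action would run on
}

// Explicit task limits: where the agent may act and how many actions it may take.
interface TaskLimits {
  allowedHosts: string[];
  maxActions: number;
}

type Decision = "run" | "needs-confirmation" | "blocked";

// Crude keyword heuristic for high-impact actions (an assumption, not exhaustive).
const HIGH_IMPACT =
  /\b(buy|purchase|checkout|pay|order|submit|post|publish|delete|password|settings)\b/i;

function decide(action: PlannedAction, limits: TaskLimits, actionsSoFar: number): Decision {
  if (limits.allowedHosts.indexOf(action.host) === -1) return "blocked";
  if (actionsSoFar >= limits.maxActions) return "blocked";
  if (HIGH_IMPACT.test(action.description)) return "needs-confirmation";
  return "run";
}

// Run the action only if it is within limits and, when required, the user confirms it.
// Returns true when the action was actually executed.
function runGated(
  action: PlannedAction,
  limits: TaskLimits,
  actionsSoFar: number,
  confirm: (a: PlannedAction) => boolean,
  run: (a: PlannedAction) => void,
): boolean {
  const decision = decide(action, limits, actionsSoFar);
  if (decision === "blocked") return false;
  if (decision === "needs-confirmation" && !confirm(action)) return false;
  run(action);
  return true;
}
```

The confirm callback is where a human approval prompt would sit; keeping it as a parameter means the gate can be tested without a browser or a user present.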

Functional Analysis

Type: OpenClaw Skill
Name: ah-webvoyager
Version: 1.0.0

The skill bundle implements a multimodal web automation agent based on the WebVoyager architecture, using Playwright for browser control and vision models for UI navigation. The code in references/examples.md and the instructions in SKILL.md are well-structured and align with the stated purpose of autonomous web interaction and data extraction without any indicators of malicious intent, data exfiltration, or prompt-injection attacks.

Capability Assessment

⚠ Purpose & Capability

The web automation purpose is coherent, but the stated capabilities include high-impact account and public-facing workflows such as checkout, form submission, and social media automation.

⚠ Instruction Scope

The instructions emphasize autonomous, end-to-end, cross-site action execution but do not require user approval before purchases, submissions, posts, or other irreversible actions.

ℹ Install Mechanism

There is no install-time code or required binary, which reduces execution risk, but the source and homepage are unknown and the referenced examples are documentation rather than a reviewed runnable package.

ℹ Credentials

Capturing screenshots, HTML, accessibility trees, and page state is expected for multimodal web automation, but it can include sensitive page contents.

ℹ Persistence & Privilege

The artifacts describe maintaining action history and screenshots for debugging/output, but do not show background persistence, privilege escalation, or direct credential-store access.

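How to Use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install ah-webvoyager
Once installation completes, invoke the Skill by name or trigger it with /ah-webvoyager
Provide the inputs described in the Skill's parameter documentation to receive structured output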

Version History

v1.0.0

Initial release. Part of a collection of 188 AI agent skills by MTNT Solutions.

Metadata

Slug ah-webvoyager

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Number of versions 1

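FAQ

What is webvoyager?

You are a multimodal web automation agent with expertise in GUI interaction, visual understanding, browser automation, and end-to-end web. Use when: multimod... It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 26 times to date.

How do I install webvoyager?

Run the command "/install ah-webvoyager" in the OpenClaw or Claude Code chat box to install it in one step. No additional configuration is required.

Is webvoyager free?

Yes. webvoyager is completely free and is released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does webvoyager support?

webvoyager is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed webvoyager?

It is developed and maintained by Michael Tsatryan (@mtsatryan). The current version is v1.0.0.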