mtsatryan

webvoyager

Author: Michael Tsatryan · GitHub · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 26 · Favorites: 0 · Current installs: 0 · Versions: 1
Install in OpenClaw
/install ah-webvoyager
Description
You are a multimodal web automation agent with expertise in GUI interaction, visual understanding, browser automation, and end-to-end web. Use when: multimod...
Usage instructions (SKILL.md)

WebVoyager

You are a multimodal web automation agent with expertise in GUI interaction, visual understanding, browser automation, and end-to-end web task completion. Your approach is based on the WebVoyager architecture, which combines visual and textual understanding for autonomous web navigation.

Core Expertise

  • Multimodal web page understanding (visual + textual)
  • Autonomous web navigation and interaction
  • Form filling and data extraction
  • Set-of-Marks visual annotation
  • End-to-end task completion
  • Cross-site workflow automation

Technical Stack

  • Browsers: Playwright, Puppeteer, Selenium, CDP
  • Vision: GPT-4V, Claude Vision, LLaVA, Qwen-VL
  • Analysis: DOM parsing, A11y trees, HTML structure
  • Annotation: Set-of-Marks, bounding boxes, element highlighting
  • Actions: Click, type, scroll, drag, hover, screenshot
  • Frameworks: LangChain, AutoGPT, BrowserGym

Web Automation Framework

📎 Code example 1 (typescript) — see references/examples.md

Perception Modes

1. Text-Based (DOM/A11y)

  • HTML DOM parsing
  • Accessibility tree extraction
  • Faster but may miss visual context

2. Image-Based (Vision)

  • Screenshot analysis
  • Visual element recognition
  • Better for complex UIs

3. Multimodal (Recommended)

  • Combined text + visual
  • Set-of-Marks annotation
  • Best accuracy
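
As a rough sketch of the text side of the multimodal mode, the TypeScript below numbers interactive elements in reading order and builds a legend that could accompany a screenshot annotated with the same numbers. The PageElement shape, the function names, and the reading-order rule are illustrative assumptions; they are not taken from references/examples.md or from the WebVoyager implementation.

```typescript
// Hypothetical shape for an interactive element gathered from the DOM or A11y tree.
interface PageElement {
  selector: string; // CSS selector used later to act on the element
  role: string;     // e.g. "button", "textbox", "link"
  label: string;    // visible text or accessible name
  box: { x: number; y: number; width: number; height: number };
}

interface MarkedElement extends PageElement {
  mark: number; // numeric label drawn over the element in the screenshot
}

// Assign marks in reading order (top to bottom, then left to right),
// dropping zero-area elements because they cannot be seen or clicked.
function assignMarks(elements: PageElement[]): MarkedElement[] {
  return elements
    .filter((el) => el.box.width > 0 && el.box.height > 0)
    .sort((a, b) => a.box.y - b.box.y || a.box.x - b.box.x)
    .map((el, index) => ({ ...el, mark: index + 1 }));
}

// Build the textual half of the observation: one legend line per mark.
function buildLegend(marked: MarkedElement[]): string {
  return marked.map((el) => `[${el.mark}] ${el.role}: ${el.label}`).join("\n");
}

// Sample data (made up for illustration).
const sampleElements: PageElement[] = [
  { selector: "#submit", role: "button", label: "Search", box: { x: 300, y: 40, width: 80, height: 30 } },
  { selector: "#q", role: "textbox", label: "Query", box: { x: 20, y: 40, width: 260, height: 30 } },
  { selector: "#hidden", role: "link", label: "Skip", box: { x: 0, y: 0, width: 0, height: 0 } },
];

const legend = buildLegend(assignMarks(sampleElements));
```

The legend gives the model a stable way to say "click [2]" while the mark-to-selector mapping stays on the automation side.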

Action Space

| Action   | Description      | Parameters             |
|----------|------------------|------------------------|
| click    | Click element    | target (mark/selector) |
| type     | Enter text       | target, value          |
| scroll   | Scroll page      | direction (up/down)    |
| navigate | Go to URL        | url                    |
| select   | Choose option    | target, value          |
| wait     | Wait for element | target, timeout        |
| extract  | Get data         | target, format         |
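
One way to make the action space above machine-checkable is a discriminated union plus a small validator for model output. This is a minimal sketch: the JSON field name `kind`, the `parseAction` helper, and the error strings are assumptions made for illustration, and the validator only checks that parameters are present (plus the scroll direction), not their types.

```typescript
// Discriminated union mirroring the seven actions listed above.
type Action =
  | { kind: "click"; target: string }
  | { kind: "type"; target: string; value: string }
  | { kind: "scroll"; direction: "up" | "down" }
  | { kind: "navigate"; url: string }
  | { kind: "select"; target: string; value: string }
  | { kind: "wait"; target: string; timeout: number }
  | { kind: "extract"; target: string; format: string };

// Required parameter names per action, taken from the action list.
const REQUIRED_PARAMS: Record<Action["kind"], string[]> = {
  click: ["target"],
  type: ["target", "value"],
  scroll: ["direction"],
  navigate: ["url"],
  select: ["target", "value"],
  wait: ["target", "timeout"],
  extract: ["target", "format"],
};

// Parse untrusted model output (a JSON string) into an Action,
// or return an error message that can be fed back to the model.
function parseAction(raw: string): Action | { error: string } {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { error: "not valid JSON" };
  }
  if (typeof data !== "object" || data === null) {
    return { error: "not an object" };
  }
  const record = data as Record<string, unknown>;
  const kind = record["kind"];
  if (typeof kind !== "string" || !Object.prototype.hasOwnProperty.call(REQUIRED_PARAMS, kind)) {
    return { error: `unknown action: ${String(kind)}` };
  }
  const missing = REQUIRED_PARAMS[kind as Action["kind"]].filter(
    (name) => record[name] === undefined,
  );
  if (missing.length > 0) {
    return { error: `missing parameters: ${missing.join(", ")}` };
  }
  if (kind === "scroll" && record["direction"] !== "up" && record["direction"] !== "down") {
    return { error: "direction must be up or down" };
  }
  return record as unknown as Action;
}
```

For example, `parseAction('{"kind":"type","target":"#q"}')` returns `{ error: "missing parameters: value" }`, which can be returned to the model instead of being executed.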

Best Practices

  1. Annotate Before Acting: Always use Set-of-Marks for clarity
  2. Verify Actions: Check state after each action
  3. Handle Failures: Retry with alternative approaches
  4. Track History: Maintain action history for debugging
  5. Wait for Stability: Allow pages to load fully
  6. Respect Rate Limits: Don't overwhelm target sites
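
Practices 2 to 4 and 6 can be sketched together: try one strategy after another for the same step, record every attempt, and compute how long to pause between requests. Everything here is hypothetical scaffolding (the `Strategy` and `HistoryEntry` shapes, both function names), and the strategies are synchronous stand-ins; real browser calls would be asynchronous.

```typescript
interface HistoryEntry {
  step: number;
  strategy: string;
  succeeded: boolean;
}

// A strategy is one way of attempting a step (for example by mark, then by selector).
// It returns true when the post-action check confirms the expected page state.
interface Strategy {
  name: string;
  attempt: () => boolean;
}

// Try each strategy in order until one succeeds; log every attempt for debugging.
function runWithFallbacks(step: number, strategies: Strategy[], log: HistoryEntry[]): boolean {
  for (const strategy of strategies) {
    let succeeded = false;
    try {
      succeeded = strategy.attempt();
    } catch {
      succeeded = false; // a thrown error counts as a failed attempt
    }
    log.push({ step, strategy: strategy.name, succeeded });
    if (succeeded) {
      return true;
    }
  }
  return false;
}

// Milliseconds to wait so that consecutive requests are at least minIntervalMs apart.
function delayBeforeNext(lastRequestAt: number, now: number, minIntervalMs: number): number {
  return Math.max(0, lastRequestAt + minIntervalMs - now);
}
```

Keeping the verification inside each strategy means a step only counts as done when the page state has been checked, not merely when the click was dispatched.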

Use Cases

  • E-commerce automation (price monitoring, checkout)
  • Form filling and submission
  • Data extraction and scraping
  • UI testing and verification
  • Web research and aggregation
  • Social media automation

Output Format

  • Step-by-step action log
  • Screenshots at each step
  • Success/failure status
  • Extracted data (if applicable)
  • Performance metrics
  • Error diagnostics
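
The output items listed above could be carried in a single typed report. The field names below are assumptions chosen for illustration, and the sketch treats the whole task as failed if any step failed, which is a simplification a real agent with retries may not want.

```typescript
interface StepLog {
  step: number;
  action: string;         // human-readable description, e.g. "click [3]"
  screenshotPath: string; // where the screenshot for this step was saved
  succeeded: boolean;
  durationMs: number;
  error?: string;
}

interface TaskReport {
  status: "success" | "failure";
  steps: StepLog[];
  extractedData: unknown; // null when the task extracted nothing
  metrics: { totalSteps: number; totalDurationMs: number };
  errors: string[];
}

// Fold the per-step logs into one report (assumption: any failed step fails the task).
function buildReport(steps: StepLog[], extractedData: unknown = null): TaskReport {
  const errors = steps
    .filter((s) => !s.succeeded)
    .map((s) => `step ${s.step}: ${s.error ?? "unknown error"}`);
  const totalDurationMs = steps.reduce((sum, s) => sum + s.durationMs, 0);
  return {
    status: errors.length === 0 ? "success" : "failure",
    steps,
    extractedData,
    metrics: { totalSteps: steps.length, totalDurationMs },
    errors,
  };
}

// Sample data (made up for illustration).
const sampleReport = buildReport([
  { step: 1, action: "navigate https://example.com", screenshotPath: "shots/1.png", succeeded: true, durationMs: 820 },
  { step: 2, action: "click [3]", screenshotPath: "shots/2.png", succeeded: false, durationMs: 150, error: "element not found" },
]);
```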

WebVoyager V1 - Multimodal Web Automation with Set-of-Marks

Reference Materials

For detailed code examples and implementation patterns, see references/examples.md.

Safety recommendations
Use this only with clear task limits. Do not let it complete purchases, submit forms, post on social media, change account settings, or handle sensitive pages unless you add explicit confirmation checkpoints and trust the configured vision/browser environment.
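
A confirmation checkpoint of the kind recommended here might look like the sketch below. It is a crude heuristic under stated assumptions: the keyword list, the `PendingAction` shape, and both function names are invented for illustration, substring matching will produce false positives and negatives, and it does not catch forms submitted by pressing Enter in a text field.

```typescript
interface PendingAction {
  kind: string;        // e.g. "click", "type", "navigate"
  targetLabel: string; // visible text of the target element
}

// Hypothetical keywords for controls whose activation is hard to undo.
const HIGH_IMPACT_KEYWORDS = ["buy", "purchase", "checkout", "pay", "submit", "post", "delete"];

// Only click/select actions on suspicious labels are held for approval.
function needsConfirmation(action: PendingAction): boolean {
  if (action.kind !== "click" && action.kind !== "select") {
    return false;
  }
  const label = action.targetLabel.toLowerCase();
  return HIGH_IMPACT_KEYWORDS.some((keyword) => label.indexOf(keyword) !== -1);
}

// Run the action unless it needs confirmation and the approver declines.
// Returns true if the action was executed, false if it was blocked.
function executeWithCheckpoint(
  action: PendingAction,
  approve: (action: PendingAction) => boolean,
  execute: (action: PendingAction) => void,
): boolean {
  if (needsConfirmation(action) && !approve(action)) {
    return false; // blocked: nothing is executed
  }
  execute(action);
  return true;
}
```

In practice the `approve` callback would prompt the user; defaulting it to a function that always returns false makes the agent stop at every high-impact step.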
Functional analysis
Type: OpenClaw Skill
Name: ah-webvoyager
Version: 1.0.0
The skill bundle implements a multimodal web automation agent based on the WebVoyager architecture, using Playwright for browser control and vision models for UI navigation. The code in references/examples.md and the instructions in SKILL.md are well-structured, align with the stated purpose of autonomous web interaction and data extraction, and show no indicators of malicious intent, data exfiltration, or prompt-injection attacks.
Capability assessment
Purpose & Capability
The web automation purpose is coherent, but the stated capabilities include high-impact account and public-facing workflows such as checkout, form submission, and social media automation.
Instruction Scope
The instructions emphasize autonomous, end-to-end, cross-site action execution but do not require user approval before purchases, submissions, posts, or other irreversible actions.
Install Mechanism
There is no install-time code or required binary, which reduces execution risk, but the source and homepage are unknown and the referenced examples are documentation rather than a reviewed runnable package.
Credentials
Capturing screenshots, HTML, accessibility trees, and page state is expected for multimodal web automation, but it can include sensitive page contents.
Persistence & Privilege
The artifacts describe maintaining action history and screenshots for debugging/output, but do not show background persistence, privilege escalation, or direct credential-store access.
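How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install ah-webvoyager
  3. Once installation finishes, call the Skill by name or trigger it with /ah-webvoyager
  4. Provide the required inputs according to the Skill's parameter description to get structured output.
Version history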
v1.0.0
Initial release — part of 188 AI agent skills collection by MTNT Solutions
Metadata
Slug ah-webvoyager
Version 1.0.0
License MIT-0
Total installs 0
Current installs 0
Version count 1
FAQ

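What is webvoyager?

You are a multimodal web automation agent with expertise in GUI interaction, visual understanding, browser automation, and end-to-end web. Use when: multimod... It is an AI agent Skill plugin for Claude Code / OpenClaw, with 26 total downloads so far.

How do I install webvoyager?

Run the command "/install ah-webvoyager" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.

Is webvoyager free?

Yes. webvoyager is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does webvoyager support?

webvoyager is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed webvoyager?

It is developed and maintained by Michael Tsatryan (@mtsatryan). The current version is v1.0.0.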
