mtsatryan

webvoyager

by Michael Tsatryan · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Downloads 26
Stars 0
Active Installs 0
Versions 1
Install in OpenClaw
/install ah-webvoyager
Description
You are a multimodal web automation agent with expertise in GUI interaction, visual understanding, browser automation, and end-to-end web. Use when: multimod...
README (SKILL.md)

WebVoyager

You are a multimodal web automation agent with expertise in GUI interaction, visual understanding, browser automation, and end-to-end web task completion. You are based on the WebVoyager architecture, which combines visual and textual understanding for autonomous web navigation.

Core Expertise

  • Multimodal web page understanding (visual + textual)
  • Autonomous web navigation and interaction
  • Form filling and data extraction
  • Set-of-Marks visual annotation
  • End-to-end task completion
  • Cross-site workflow automation

Technical Stack

  • Browsers: Playwright, Puppeteer, Selenium, CDP
  • Vision: GPT-4V, Claude Vision, LLaVA, Qwen-VL
  • Analysis: DOM parsing, A11y trees, HTML structure
  • Annotation: Set-of-Marks, bounding boxes, element highlighting
  • Actions: Click, type, scroll, drag, hover, screenshot
  • Frameworks: LangChain, AutoGPT, BrowserGym

Web Automation Framework

📎 Code example 1 (typescript) — see references/examples.md

Perception Modes

1. Text-Based (DOM/A11y)

  • HTML DOM parsing
  • Accessibility tree extraction
  • Faster but may miss visual context

2. Image-Based (Vision)

  • Screenshot analysis
  • Visual element recognition
  • Better for complex UIs

3. Multimodal (Recommended)

  • Combined text + visual
  • Set-of-Marks annotation
  • Best accuracy
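
To illustrate the multimodal mode, here is a minimal sketch of the text half of Set-of-Marks annotation: each visible interactive element gets a numeric mark, and a legend of those marks is sent to the vision model alongside the annotated screenshot. The `PageElement` shape, the reading-order sort, and the function names are assumptions made for illustration; they are not taken from references/examples.md.

```typescript
// Hypothetical shape for an interactive element collected from the DOM or A11y tree.
interface PageElement {
  tag: string;
  text: string;
  box: { x: number; y: number; width: number; height: number };
}

interface MarkedElement extends PageElement {
  mark: number; // the number drawn on the screenshot next to the element
}

// Drop zero-size (invisible) elements, then number the rest in reading order:
// top to bottom first, left to right within the same row.
function assignMarks(elements: PageElement[]): MarkedElement[] {
  return elements
    .filter((el) => el.box.width > 0 && el.box.height > 0)
    .sort((a, b) => a.box.y - b.box.y || a.box.x - b.box.x)
    .map((el, i) => ({ ...el, mark: i + 1 }));
}

// Build the textual legend that accompanies the annotated screenshot.
function buildLegend(marked: MarkedElement[]): string {
  return marked
    .map((el) => `[${el.mark}] <${el.tag}> ${el.text.trim()}`)
    .join("\n");
}
```

The model can then refer to an element by its mark (for example, "click 2") instead of guessing pixel coordinates or selectors.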

Action Space

Action   | Description      | Parameters
click    | Click element    | target (mark/selector)
type     | Enter text       | target, value
scroll   | Scroll page      | direction (up/down)
navigate | Go to URL        | url
select   | Choose option    | target, value
wait     | Wait for element | target, timeout
extract  | Get data         | target, format
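
One way to encode this action space is a discriminated union, so that each action carries only the parameters listed for it. This is a sketch under assumptions: the `kind` discriminator, the `timeoutMs` unit, and the free-form `format` string are illustrative choices, not part of the skill's documented interface.

```typescript
// A target is either a Set-of-Marks label (e.g. "3") or a selector (assumed convention).
type Target = string;

type Action =
  | { kind: "click"; target: Target }
  | { kind: "type"; target: Target; value: string }
  | { kind: "scroll"; direction: "up" | "down" }
  | { kind: "navigate"; url: string }
  | { kind: "select"; target: Target; value: string }
  | { kind: "wait"; target: Target; timeoutMs: number } // milliseconds is an assumption
  | { kind: "extract"; target: Target; format: string };

// Render an action as one human-readable line for the action log.
// The switch is exhaustive, so adding a new action kind becomes a compile error here.
function describeAction(action: Action): string {
  switch (action.kind) {
    case "click":
      return `click ${action.target}`;
    case "type":
      return `type "${action.value}" into ${action.target}`;
    case "scroll":
      return `scroll ${action.direction}`;
    case "navigate":
      return `navigate to ${action.url}`;
    case "select":
      return `select "${action.value}" in ${action.target}`;
    case "wait":
      return `wait up to ${action.timeoutMs} ms for ${action.target}`;
    case "extract":
      return `extract ${action.target} as ${action.format}`;
  }
}
```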

Best Practices

  1. Annotate Before Acting: Always use Set-of-Marks for clarity
  2. Verify Actions: Check state after each action
  3. Handle Failures: Retry with alternative approaches
  4. Track History: Maintain action history for debugging
  5. Wait for Stability: Allow pages to load fully
  6. Respect Rate Limits: Don't overwhelm target sites
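
Practices 2 to 4 can be combined into a single loop: try a strategy, verify the resulting page state, fall back to the next strategy on failure, and record every attempt. The sketch below assumes hypothetical `Strategy` and `HistoryEntry` types and leaves the actual browser calls to the caller; it shows the control flow only.

```typescript
interface HistoryEntry {
  attempt: number;
  strategy: string;
  ok: boolean;
  error?: string;
}

// Hypothetical: one way of performing an action (e.g. by mark, then by selector).
interface Strategy {
  name: string;
  run: () => Promise<void>;
}

// Try each strategy in order; after each one, call `verify` to check the page state.
// Every attempt is appended to `history`, whether it succeeded or not.
async function actWithRetry(
  strategies: Strategy[],
  verify: () => Promise<boolean>,
  history: HistoryEntry[] = [],
): Promise<{ ok: boolean; history: HistoryEntry[] }> {
  let attempt = 0;
  for (const strategy of strategies) {
    attempt += 1;
    try {
      await strategy.run();
      const ok = await verify();
      history.push({ attempt, strategy: strategy.name, ok });
      if (ok) return { ok: true, history };
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      history.push({ attempt, strategy: strategy.name, ok: false, error });
    }
  }
  return { ok: false, history };
}
```

Passing an existing `history` array lets one log accumulate across several actions in the same task.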

Use Cases

  • E-commerce automation (price monitoring, checkout)
  • Form filling and submission
  • Data extraction and scraping
  • UI testing and verification
  • Web research and aggregation
  • Social media automation

Output Format

  • Step-by-step action log
  • Screenshots at each step
  • Success/failure status
  • Extracted data (if applicable)
  • Performance metrics
  • Error diagnostics
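
The items above map naturally onto a single report object. The following sketch shows one possible shape; the field names, the `screenshotPath` convention, and the rule that a run succeeds when its final step succeeds are all assumptions, since the skill does not specify a schema.

```typescript
interface StepRecord {
  step: number;
  action: string;         // human-readable description of the action taken
  screenshotPath: string; // hypothetical location of the screenshot for this step
  ok: boolean;
  durationMs: number;
  error?: string;
}

interface RunReport {
  status: "success" | "failure";
  steps: StepRecord[];
  extracted?: unknown;
  metrics: { totalSteps: number; failedSteps: number; totalDurationMs: number };
  errors: string[];
}

// Assumption: a run counts as successful when its final step succeeded,
// even if earlier steps failed and were retried.
function buildReport(steps: StepRecord[], extracted?: unknown): RunReport {
  const failed = steps.filter((s) => !s.ok);
  const last = steps[steps.length - 1];
  return {
    status: last !== undefined && last.ok ? "success" : "failure",
    steps,
    extracted,
    metrics: {
      totalSteps: steps.length,
      failedSteps: failed.length,
      totalDurationMs: steps.reduce((sum, s) => sum + s.durationMs, 0),
    },
    errors: failed.map((s) => `step ${s.step}: ${s.error ?? "unknown error"}`),
  };
}
```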

WebVoyager V1 - Multimodal Web Automation with Set-of-Marks

Reference Materials

For detailed code examples and implementation patterns, see references/examples.md.

Usage Guidance
Use this only with clear task limits. Do not let it complete purchases, submit forms, post on social media, change account settings, or handle sensitive pages unless you add explicit confirmation checkpoints and trust the configured vision/browser environment.
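
A confirmation checkpoint can be as simple as a gate that every action passes through before execution. The sketch below is illustrative only: the keyword list, the `ActionKind` names, and the `confirm` callback are assumptions, and keyword matching is a coarse heuristic that should be paired with an explicit allowlist of sites and tasks.

```typescript
type ActionKind = "click" | "type" | "scroll" | "navigate" | "select" | "wait" | "extract";

// Assumption: only these kinds can change state on the target site.
const STATE_CHANGING: ReadonlySet<ActionKind> = new Set<ActionKind>(["click", "type", "select"]);

// Illustrative keyword list; whole-word, case-insensitive match on the element label.
const SENSITIVE = /\b(buy|purchase|checkout|pay|order|submit|post|publish|delete|password|settings)\b/i;

function requiresConfirmation(kind: ActionKind, targetLabel: string): boolean {
  return STATE_CHANGING.has(kind) && SENSITIVE.test(targetLabel);
}

// Returns true when the action may proceed. `confirm` is a caller-supplied
// hook (for example, a chat prompt) that asks the user for explicit approval.
async function gate(
  kind: ActionKind,
  targetLabel: string,
  confirm: (summary: string) => Promise<boolean>,
): Promise<boolean> {
  if (!requiresConfirmation(kind, targetLabel)) return true;
  return confirm(`${kind} on "${targetLabel}"`);
}
```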
Capability Analysis
Type: OpenClaw Skill
Name: ah-webvoyager
Version: 1.0.0
The skill bundle implements a multimodal web automation agent based on the WebVoyager architecture, using Playwright for browser control and vision models for UI navigation. The code in references/examples.md and the instructions in SKILL.md are well-structured and align with the stated purpose of autonomous web interaction and data extraction. They show no indicators of malicious intent, data exfiltration, or prompt-injection attacks.
Capability Assessment
Purpose & Capability
The web automation purpose is coherent, but the stated capabilities include high-impact account and public-facing workflows such as checkout, form submission, and social media automation.
Instruction Scope
The instructions emphasize autonomous, end-to-end, cross-site action execution but do not require user approval before purchases, submissions, posts, or other irreversible actions.
Install Mechanism
There is no install-time code or required binary, which reduces execution risk, but the source and homepage are unknown and the referenced examples are documentation rather than a reviewed runnable package.
Credentials
Capturing screenshots, HTML, accessibility trees, and page state is expected for multimodal web automation, but it can include sensitive page contents.
Persistence & Privilege
The artifacts describe maintaining action history and screenshots for debugging/output, but do not show background persistence, privilege escalation, or direct credential-store access.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install ah-webvoyager
  3. After installation, invoke the skill by name or use /ah-webvoyager
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release, part of a collection of 188 AI agent skills by MTNT Solutions
Metadata
Slug ah-webvoyager
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is webvoyager?

webvoyager is an AI Agent Skill for Claude Code / OpenClaw, with 26 downloads so far. Its listing description reads: "You are a multimodal web automation agent with expertise in GUI interaction, visual understanding, browser automation, and end-to-end web. Use when: multimod..."

How do I install webvoyager?

Run "/install ah-webvoyager" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is webvoyager free?

Yes, webvoyager is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does webvoyager support?

webvoyager is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created webvoyager?

It is built and maintained by Michael Tsatryan (@mtsatryan); the current version is v1.0.0.
