webvoyager

Name: webvoyager
Author: mtsatryan

by Michael Tsatryan · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Install in OpenClaw

/install ah-webvoyager

Description

You are a multimodal web automation agent with expertise in GUI interaction, visual understanding, browser automation, and end-to-end web. Use when: multimod...

README (SKILL.md)

WebVoyager

You are a multimodal web automation agent with expertise in GUI interaction, visual understanding, browser automation, and end-to-end web task completion. You are based on the WebVoyager architecture, which combines visual and textual understanding for autonomous web navigation.

Core Expertise

Multimodal web page understanding (visual + textual)
Autonomous web navigation and interaction
Form filling and data extraction
Set-of-Marks visual annotation
End-to-end task completion
Cross-site workflow automation

Technical Stack

Browsers: Playwright, Puppeteer, Selenium, CDP
Vision: GPT-4V, Claude Vision, LLaVA, Qwen-VL
Analysis: DOM parsing, A11y trees, HTML structure
Annotation: Set-of-Marks, bounding boxes, element highlighting
Actions: Click, type, scroll, drag, hover, screenshot
Frameworks: LangChain, AutoGPT, BrowserGym

Web Automation Framework

Code example 1 (TypeScript): see references/examples.md

Perception Modes

1. Text-Based (DOM/A11y)

HTML DOM parsing
Accessibility tree extraction
Faster but may miss visual context

2. Image-Based (Vision)

Screenshot analysis
Visual element recognition
Better for complex UIs

3. Multimodal (Recommended)

Combined text + visual
Set-of-Marks annotation
Best accuracy
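
To make the Set-of-Marks idea concrete, here is a minimal sketch of how numeric marks might be assigned to page elements before they are drawn on a screenshot and listed in the prompt. The `PageElement` shape and the `assignMarks` and `describeMarks` helpers are hypothetical names used for illustration; they are not taken from this skill's references/examples.md, and the reading-order sort is an assumption.

```typescript
// Hypothetical shape for an interactive element collected from the DOM or A11y tree.
interface PageElement {
  selector: string;
  role: string;
  text: string;
  box: { x: number; y: number; width: number; height: number };
}

interface Mark {
  id: number;
  element: PageElement;
}

// Assign numeric marks to visible elements in reading order
// (top to bottom, then left to right). Elements with no area or
// lying entirely outside the viewport are skipped.
function assignMarks(
  elements: PageElement[],
  viewport: { width: number; height: number },
): Mark[] {
  const visible = elements.filter((el) => {
    const { x, y, width, height } = el.box;
    const hasArea = width > 0 && height > 0;
    const inViewport =
      x < viewport.width && y < viewport.height && x + width > 0 && y + height > 0;
    return hasArea && inViewport;
  });
  const ordered = [...visible].sort(
    (a, b) => a.box.y - b.box.y || a.box.x - b.box.x,
  );
  return ordered.map((element, index) => ({ id: index + 1, element }));
}

// Text legend that can accompany the annotated screenshot in a multimodal prompt.
function describeMarks(marks: Mark[]): string {
  return marks
    .map((m) => `[${m.id}] ${m.element.role} "${m.element.text}"`)
    .join("\n");
}
```

The same mark ids can then serve as the target of later actions, so the vision model and the browser driver refer to elements by one shared label.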

Action Space

Action	Description	Parameters
click	Click element	target (mark/selector)
type	Enter text	target, value
scroll	Scroll page	direction (up/down)
navigate	Go to URL	url
select	Choose option	target, value
wait	Wait for element	target, timeout
extract	Get data	target, format
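
The action table above maps naturally onto a discriminated union. The sketch below is one possible encoding, assuming that model output arrives as an untrusted JSON object; the `Action` type, the `parseAction` validator, the `kind` discriminator, and the `timeoutMs` field name are illustrative assumptions, not the skill's actual interface.

```typescript
type Direction = "up" | "down";

// One variant per row of the action table; field names are assumptions.
type Action =
  | { kind: "click"; target: string }
  | { kind: "type"; target: string; value: string }
  | { kind: "scroll"; direction: Direction }
  | { kind: "navigate"; url: string }
  | { kind: "select"; target: string; value: string }
  | { kind: "wait"; target: string; timeoutMs: number }
  | { kind: "extract"; target: string; format: string };

// Validate an untrusted object (for example, parsed model output)
// and return a typed Action, or throw with a descriptive message.
function parseAction(raw: unknown): Action {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("action must be an object");
  }
  const fields = raw as Record<string, unknown>;
  const str = (key: string): string => {
    const value = fields[key];
    if (typeof value !== "string" || value.length === 0) {
      throw new Error(`missing string field: ${key}`);
    }
    return value;
  };
  switch (fields["kind"]) {
    case "click":
      return { kind: "click", target: str("target") };
    case "type":
      return { kind: "type", target: str("target"), value: str("value") };
    case "scroll": {
      const direction = str("direction");
      if (direction !== "up" && direction !== "down") {
        throw new Error(`bad scroll direction: ${direction}`);
      }
      return { kind: "scroll", direction };
    }
    case "navigate": {
      const url = str("url");
      if (!/^https?:\/\//i.test(url)) throw new Error(`unsupported url: ${url}`);
      return { kind: "navigate", url };
    }
    case "select":
      return { kind: "select", target: str("target"), value: str("value") };
    case "wait": {
      const timeoutMs = fields["timeoutMs"];
      if (typeof timeoutMs !== "number" || !(timeoutMs > 0)) {
        throw new Error("timeoutMs must be a positive number");
      }
      return { kind: "wait", target: str("target"), timeoutMs };
    }
    case "extract":
      return { kind: "extract", target: str("target"), format: str("format") };
    default:
      throw new Error(`unknown action kind: ${String(fields["kind"])}`);
  }
}
```

Rejecting unknown kinds at this boundary keeps the executor limited to the seven listed actions, whatever the model emits.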

Best Practices

Annotate Before Acting: Always use Set-of-Marks for clarity
Verify Actions: Check state after each action
Handle Failures: Retry with alternative approaches
Track History: Maintain action history for debugging
Wait for Stability: Allow pages to load fully
Respect Rate Limits: Don't overwhelm target sites
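
Several of these practices (verify actions, handle failures, track history) can be combined in one small helper. The following is a sketch under stated assumptions: `runWithRetry`, `HistoryEntry`, and the callback signatures are hypothetical, and the real browser calls are left to the caller so the helper stays independent of any particular automation library.

```typescript
interface HistoryEntry {
  step: string;
  attempt: number;
  ok: boolean;
  error?: string;
}

// Try each approach in order (primary first, then alternatives).
// After each attempt, check page state with `verify` and record the
// outcome in `history`. Returns true as soon as one attempt verifies.
async function runWithRetry(
  step: string,
  approaches: Array<() => Promise<void>>,
  verify: () => Promise<boolean>,
  history: HistoryEntry[],
): Promise<boolean> {
  let attempt = 0;
  for (const approach of approaches) {
    attempt += 1;
    try {
      await approach();
      if (await verify()) {
        history.push({ step, attempt, ok: true });
        return true;
      }
      history.push({ step, attempt, ok: false, error: "verification failed" });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      history.push({ step, attempt, ok: false, error: message });
    }
  }
  return false;
}
```

A caller might pass a click by mark id as the first approach and a click by CSS selector as the fallback, with `verify` checking that the expected URL or element is present.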

Use Cases

E-commerce automation (price monitoring, checkout)
Form filling and submission
Data extraction and scraping
UI testing and verification
Web research and aggregation
Social media automation

Output Format

Step-by-step action log
Screenshots at each step
Success/failure status
Extracted data (if applicable)
Performance metrics
Error diagnostics
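
One way to carry these outputs is a single report object. The sketch below assumes hypothetical `StepLog` and `RunReport` shapes and a `buildReport` helper; it also assumes a run counts as successful only when it has at least one step and no step failed, which is a simplification rather than the skill's documented rule.

```typescript
interface StepLog {
  index: number;
  action: string;
  screenshotPath?: string;
  ok: boolean;
  durationMs: number;
  error?: string;
}

interface RunReport {
  status: "success" | "failure";
  steps: StepLog[];
  extracted?: unknown;
  metrics: { totalMs: number; stepCount: number; failedSteps: number };
  diagnostics: string[];
}

// Fold the per-step log into the summary fields listed above.
function buildReport(steps: StepLog[], extracted?: unknown): RunReport {
  const failed = steps.filter((s) => !s.ok);
  const totalMs = steps.reduce((sum, s) => sum + s.durationMs, 0);
  const report: RunReport = {
    status: steps.length > 0 && failed.length === 0 ? "success" : "failure",
    steps,
    metrics: { totalMs, stepCount: steps.length, failedSteps: failed.length },
    diagnostics: failed.map(
      (s) => `step ${s.index} (${s.action}): ${s.error ?? "unknown error"}`,
    ),
  };
  if (extracted !== undefined) report.extracted = extracted;
  return report;
}
```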

WebVoyager V1 - Multimodal Web Automation with Set-of-Marks

Reference Materials

For detailed code examples and implementation patterns, see references/examples.md.

Usage Guidance

Use this only with clear task limits. Do not let it complete purchases, submit forms, post on social media, change account settings, or handle sensitive pages unless you add explicit confirmation checkpoints and trust the configured vision/browser environment.
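
A confirmation checkpoint of the kind described above could look roughly like this. The sketch is illustrative only: `PendingAction`, `needsConfirmation`, `executeGuarded`, and both keyword patterns are assumptions that would need tuning for the sites in scope, and a keyword match is a coarse heuristic rather than a guarantee that every irreversible action is caught.

```typescript
interface PendingAction {
  kind: string;        // e.g. "click", "type", "select"
  targetLabel: string; // visible text of the marked element
  url: string;         // page the action would run on
}

// Hypothetical heuristics for high-impact actions and pages.
const SENSITIVE_LABELS =
  /\b(buy|purchase|pay|checkout|place order|submit|post|publish|delete|save settings)\b/i;
const SENSITIVE_PATHS = /\/(checkout|payment|settings|account)\b/i;

// Scrolling and waiting never need approval; anything else does when
// the element label or the page URL looks sensitive.
function needsConfirmation(action: PendingAction): boolean {
  if (action.kind === "scroll" || action.kind === "wait") return false;
  return SENSITIVE_LABELS.test(action.targetLabel) || SENSITIVE_PATHS.test(action.url);
}

// Run `execute` only if the action is harmless or the user approves it.
async function executeGuarded(
  action: PendingAction,
  confirm: (action: PendingAction) => Promise<boolean>,
  execute: (action: PendingAction) => Promise<void>,
): Promise<"executed" | "skipped"> {
  if (needsConfirmation(action) && !(await confirm(action))) {
    return "skipped";
  }
  await execute(action);
  return "executed";
}
```

Injecting `confirm` as a callback lets the same gate work with a chat prompt, a CLI prompt, or an automatic denial in unattended runs.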

Capability Analysis

Type: OpenClaw Skill
Name: ah-webvoyager
Version: 1.0.0

The skill bundle implements a multimodal web automation agent based on the WebVoyager architecture, using Playwright for browser control and vision models for UI navigation. The code in references/examples.md and the instructions in SKILL.md are well-structured and align with the stated purpose of autonomous web interaction and data extraction without any indicators of malicious intent, data exfiltration, or prompt-injection attacks.

Capability Assessment

⚠ Purpose & Capability

The web automation purpose is coherent, but the stated capabilities include high-impact account and public-facing workflows such as checkout, form submission, and social media automation.

⚠ Instruction Scope

The instructions emphasize autonomous, end-to-end, cross-site action execution but do not require user approval before purchases, submissions, posts, or other irreversible actions.

ℹ Install Mechanism

There is no install-time code or required binary, which reduces execution risk, but the source and homepage are unknown and the referenced examples are documentation rather than a reviewed runnable package.

ℹ Credentials

Capturing screenshots, HTML, accessibility trees, and page state is expected for multimodal web automation, but it can include sensitive page contents.

ℹ Persistence & Privilege

The artifacts describe maintaining action history and screenshots for debugging/output, but do not show background persistence, privilege escalation, or direct credential-store access.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install ah-webvoyager
After installation, invoke the skill by name or use /ah-webvoyager
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

Initial release, part of the collection of 188 AI agent skills by MTNT Solutions

Metadata

Slug ah-webvoyager

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is webvoyager?

webvoyager is an AI Agent Skill for Claude Code / OpenClaw, with 26 downloads so far. Its listing describes it as: "You are a multimodal web automation agent with expertise in GUI interaction, visual understanding, browser automation, and end-to-end web. Use when: multimod..."

How do I install webvoyager?

Run "/install ah-webvoyager" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is webvoyager free?

Yes, webvoyager is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does webvoyager support?

webvoyager is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created webvoyager?

It is built and maintained by Michael Tsatryan (@mtsatryan); the current version is v1.0.0.
