← Back to Skills Marketplace
tedtalk

Agent Vision Scraper

by Tedtalk · GitHub ↗ · v1.0.0
cross-platform ⚠ suspicious
602
Downloads
0
Stars
8
Active Installs
1
Versions
Install in OpenClaw
/install agent-vision-scraper
Description
Dockerized AI-powered web scraper using Playwright with virtual display and vision-based captcha solving, no third-party captcha services needed.
README (SKILL.md)

Dockerized Vision Agent Scraper

Description

这是一个运行在独立 Docker 容器中的拟人化网页操作与抓取工具。它内置了虚拟屏幕 (Xvfb) 和 Playwright 隐身指纹,完全依靠 AI 的视觉能力 (Vision) 来读取和破解图形验证码,不依赖任何第三方打码服务。

Base Image: mcr.microsoft.com/playwright:v1.58.2-jammy

Usage Instructions

当用户要求进行复杂网页自动化、数据抓取、或遇到具备反爬/图形验证码的网站时,请调用此工具。

重要说明:必须通过 docker run 命令来启动它,以确保它运行在配置好的虚拟屏幕环境中。

Command Format:

docker run --rm --env-file .env -p 5900:5900 agent-scraper-image node agent-scraper.js

Examples:

docker run --rm --env-file .env -p 5900:5900 agent-scraper-image node agent-scraper.js "https://example.com/login" "查看页面,如果发现图形验证码,读出图片上的字母,并与账号 admin 和密码 123456 一起填入表单,点击登录。"

Build Image

cd ~/.openclaw/skills/agent-vision-scraper
docker build -t agent-scraper-image .

Features

  • ✅ 虚拟屏幕 (Xvfb) - 无头服务器也能运行有头浏览器
  • ✅ VNC 远程查看 - 端口 5900 可连接查看实时操作
  • ✅ Playwright Stealth - 绕过常见反爬检测
  • ✅ Vision 验证码识别 - 利用 LLM 视觉能力破解图形验证码
  • ✅ 最新 Playwright v1.58.2 - 与当前环境保持一致

最后更新:2026-03-03

Usage Guidance
Do not run this image blindly. Key things to check before installing or executing: - Ask for the missing Dockerfile and inspect it: the registry files reference a Dockerfile but none is provided — do not build or run a container from an unknown Dockerfile or from untrusted sources. - Understand where screenshots and page data are sent: the code delegates to AgentBrowser.execute which will call an external model service; confirm which endpoints are used and whether you control the API key. Treat page screenshots as sensitive (they may contain credentials or PII). - Avoid handing over sensitive .env values: the run examples mount an .env into the container. Do not place secrets (AWS, DB, SSH keys, etc.) into .env that is passed to untrusted containers. - Harden VNC: the examples expose port 5900 with no password. If you must run, bind VNC only to an isolated network, require authentication, or avoid exposing it to host network. - Consider offline/air-gapped testing: run the container in an isolated network or VM without network access to verify behavior before exposing it to the internet or production data. - If you need the capability, prefer a vetted implementation: request the missing Dockerfile, a signed release, and clear documentation about which LLM endpoints are used and how long data is retained. If the skill will see login pages, do not provide real credentials during testing. Given the inconsistencies and potential for sensitive data leakage, treat this skill as suspicious and require additional transparency and controls before using it with real data or secrets.
Capability Analysis
Type: OpenClaw Skill Name: agent-vision-scraper Version: 1.0.0 The skill is classified as suspicious due to two primary security vulnerabilities: 1) The `skill.md` and `README.md` explicitly state that VNC access is enabled on port 5900 without a password, allowing anyone with network access to view and potentially control the browser session, which could expose sensitive data. 2) The `agent-scraper.js` script directly embeds the user-provided instruction (`userInstruction`) into the LLM's prompt (`augmentedInstruction`), creating a significant prompt injection vulnerability. While the skill's stated purpose is legitimate web scraping, these flaws could be exploited by a malicious user or external attacker to compromise data or control the agent.
Capability Assessment
Purpose & Capability
The skill claims to be a Dockerized Playwright agent, with a Docker base image noted in SKILL.md/README, VNC, and vision-based captcha solving — which aligns with the included code that launches Playwright with stealth plugins. However, SKILL.md and README instruct building a Docker image (mentioning a Dockerfile and base image) but the file manifest does not include a Dockerfile. Metadata declares no required env vars or credentials while README/SKILL.md explicitly reference optional LLM API keys (OPENAI_API_KEY / ANTHROPIC_API_KEY). These omissions are inconsistent with the declared requirements.
Instruction Scope
Runtime instructions direct the agent to perform arbitrary web automation and to 'crack' graphical captchas by sending screenshots to a vision-enabled LLM via AgentBrowser. The code takes screenshots and delegates actions to AgentBrowser.execute, which likely transmits page images/content to external model endpoints. The SKILL.md examples explicitly show submitting credentials (e.g., admin/123456) into login forms — the tool is designed to bypass anti-bot measures and interact with login forms, which can be used for legitimate automation but also for account takeover or scraping protected content. The instructions also recommend running with VNC port 5900 exposed and no VNC password, increasing risk of session observation. There is no explicit description of what external endpoints receive screenshots or what data is logged or retained.
Install Mechanism
There is no install spec in registry metadata (instruction-only), but the skill includes code and a package.json which imply npm dependencies. SKILL.md/README instruct building a Docker image (docker build -t agent-scraper-image .) yet the repository does not include the Dockerfile referenced in the docs — that's an inconsistency. The absence of an explicit, provided Dockerfile means users may create or obtain a separate Dockerfile to run this code, increasing risk if they follow undocumented build steps from elsewhere. The included dependencies (playwright-extra, puppeteer stealth) are expected for the stated purpose.
Credentials
Metadata lists no required environment variables, but README/SKILL.md instruct creating a .env and mention OPENAI_API_KEY and ANTHROPIC_API_KEY as optional for calling external vision models. The code itself does dotenv.config() and uses AgentBrowser which will likely require or use API keys; those env vars are therefore effectively required for the tool's full behavior. The docker run examples mount an .env into the container (--env-file .env) which can expose any host secrets placed there to the container; that is disproportionate relative to the metadata that declared no secrets. In addition, binding host port 5900 with no password exposes the running session to the network. Together, these practices create opportunities for accidental or intentional exfiltration of sensitive data (page screenshots, credentials entered into pages, secrets from .env).
Persistence & Privilege
The skill does not request always:true and does not claim to modify other skills or system-wide config. It is user-invocable and allowed to be invoked autonomously (default), which is normal. However, the recommended runtime (docker run -p 5900:5900 --env-file .env) requires network and file exposure that increases blast radius when the skill executes (open VNC port, injected environment file). This is not a permissions-plane privilege request in metadata, but it materially increases operational risk when run.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install agent-vision-scraper
  3. After installation, invoke the skill by name or use /agent-vision-scraper
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release of Dockerized Vision Agent Scraper. - Runs autonomously in a pre-configured Docker container with virtual screen support. - Uses Playwright with stealth capabilities to bypass anti-scraping measures. - Vision-based CAPTCHA solving—no third-party services required. - VNC support (port 5900) allows real-time operation viewing. - Easy startup and build instructions included in documentation.
Metadata
Slug agent-vision-scraper
Version 1.0.0
License
All-time Installs 9
Active Installs 8
Total Versions 1
Frequently Asked Questions

What is Agent Vision Scraper?

Dockerized AI-powered web scraper using Playwright with virtual display and vision-based captcha solving, no third-party captcha services needed. It is an AI Agent Skill for Claude Code / OpenClaw, with 602 downloads so far.

How do I install Agent Vision Scraper?

Run "/install agent-vision-scraper" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Agent Vision Scraper free?

Yes, Agent Vision Scraper is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Agent Vision Scraper support?

Agent Vision Scraper is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created Agent Vision Scraper?

It is built and maintained by Tedtalk (@tedtalk); the current version is v1.0.0.

💬 Comments