Playwright Scraper

by sdT328606 · GitHub ↗ · v2.0.0 · MIT-0
cross-platform · ⚠ suspicious
39 Downloads · 0 Stars · 0 Active Installs · 1 Version
Install in OpenClaw
/install stitch-playwright-scraper
Description
Uses Playwright with the Stealth plugin to bypass anti-scraping mechanisms and fetch pages.
README (SKILL.md)

Playwright Stealth Scraper 🕷️

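Use this skill when the web_fetch tool cannot retrieve the target page content (403 responses, JS-rendered pages, anti-scraping blocks).

Prerequisites

Playwright and Chromium must be installed locally.

# Install inside the skill directory
cd ~/.openclaw/workspace/skills/playwright-scraper
# playwright-extra is required by the example script below
npm install playwright playwright-extra puppeteer-extra-plugin-stealth
npx playwright install chromium

Note: this downloads a Chromium binary of about 300MB.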

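When to use this

Scenario | What to use
Ordinary web pages, API responses | web_fetch (built-in, lightweight)
Blocked by Cloudflare or other anti-scraping measures | playwright-scraper
Page needs JS rendering before content appears | playwright-scraper
Content must be fetched after login | playwright-scraper + manual cookie handling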
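
The choice in the table above can be sketched as a small check run on a plain HTTP response. This is only an illustration: `needsBrowserFallback` is a hypothetical helper, the status codes and the 200-character threshold are assumptions, and short API responses (such as small JSON bodies) would need a separate path.

```javascript
// Hypothetical helper: decide whether a plain HTTP response looks unusable
// and the browser-based scraper is worth trying instead.
function needsBrowserFallback(status, body, minTextLength = 200) {
  // 403/429/503 are common responses from anti-bot layers
  // (an assumption, not an exhaustive list).
  if ([403, 429, 503].includes(status)) return true;

  // Strip scripts, styles and tags to estimate how much visible text exists.
  const visibleText = String(body || '')
    .replace(/<script[\s\S]*?<\/script>/gi, '')
    .replace(/<style[\s\S]*?<\/style>/gi, '')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();

  // Very little visible text suggests the page is rendered by JavaScript.
  return visibleText.length < minTextLength;
}
```

A caller would try web_fetch first and only reach for the browser when this returns true, which keeps the heavier tool for the cases the table lists.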

Usage

Option 1: write a standalone script (recommended)

const { chromium } = require('playwright-extra');
const stealth = require('puppeteer-extra-plugin-stealth')();
chromium.use(stealth);

(async () => {
  const browser = await chromium.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // Replace TARGET_URL with the page to fetch
    await page.goto('TARGET_URL', { waitUntil: 'networkidle' });
    const content = await page.content();
    // Extract the content you need
    console.log(content);
  } finally {
    // Close the browser even if navigation fails
    await browser.close();
  }
})();

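Option 2: use OpenClaw's browser tool directly

# Start the browser
browser action=start profile=openclaw
# Open the page
browser action=open url=TARGET_URL
# Take a snapshot
browser action=snapshot targetId=XXX

The built-in browser tool is usually enough; playwright-scraper is mainly for special anti-scraping scenarios.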

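Notes

  • Do not scrape at high frequency; respect robots.txt
  • If the target site has a login wall, do not fill in credentials automatically; ask the user for instructions first
  • Save scraped data to the matching directory in the workspace; do not leave it in /tmp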
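
As a rough sketch of the first and third notes, the following shows a simplified robots.txt check and a helper that builds an output path inside the workspace. Both function names are hypothetical, the parser handles only prefix rules (no wildcards or `$` anchors), and the `scraped/<hostname>/` layout is an assumption rather than anything the skill prescribes.

```javascript
const path = require('node:path');

// Simplified robots.txt check: prefix matching only, no wildcard support.
// A group naming the user agent takes precedence over the "*" group.
function isPathAllowed(robotsTxt, urlPath, userAgent = '*') {
  const ua = userAgent.toLowerCase();
  let groupAgents = [];     // user-agents of the group currently being read
  let readingRules = false; // true once the current group has rule lines
  let sawSpecific = false;  // true if any group names this user agent
  const specific = [];
  const wildcard = [];

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim();
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === 'user-agent') {
      // A User-agent line after rule lines starts a new group.
      if (readingRules) { groupAgents = []; readingRules = false; }
      groupAgents.push(value.toLowerCase());
    } else if (field === 'allow' || field === 'disallow') {
      readingRules = true;
      const matchesSpecific =
        ua !== '*' && groupAgents.some((a) => a !== '*' && ua.includes(a));
      if (matchesSpecific) sawSpecific = true;
      if (value === '') continue; // an empty rule restricts nothing
      const rule = { allow: field === 'allow', prefix: value };
      if (matchesSpecific) specific.push(rule);
      if (groupAgents.includes('*')) wildcard.push(rule);
    }
  }

  // Longest matching prefix wins; on a tie, Allow wins.
  let best = null;
  for (const rule of sawSpecific ? specific : wildcard) {
    if (!urlPath.startsWith(rule.prefix)) continue;
    if (!best || rule.prefix.length > best.prefix.length ||
        (rule.prefix.length === best.prefix.length && rule.allow)) {
      best = rule;
    }
  }
  return best ? best.allow : true;
}

// Build an output path under the workspace instead of /tmp.
// The "scraped/<hostname>/" layout is an assumed convention.
function outputPathFor(workspaceDir, targetUrl, now = new Date()) {
  const host = new URL(targetUrl).hostname;
  const stamp = now.toISOString().replace(/[:.]/g, '-');
  return path.join(workspaceDir, 'scraped', host, `${stamp}.html`);
}
```

A script would fetch `/robots.txt` once per host, skip any path for which `isPathAllowed` returns false, and pause between requests; the delay length is left to the caller.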
Usage Guidance
Review before installing. Use only in an isolated browser/profile, do not attach it to your normal logged-in Chrome session, and fix or remove the execSync-based tool entry point. Remove the bundled Goofish search scripts if you only need a general scraper, and avoid using the skill to bypass access controls, scrape against site rules, or search for cheating/fraud services.
Capability Assessment
Purpose & Capability
The stated purpose is Playwright plus stealth scraping for pages blocked by normal fetch tools, which fits some dependencies and examples. However, bundled scripts are Goofish-specific, hard-code marketplace searches including academic/writing-service terms, and read content from a live browser session, which is broader and less clearly aligned than the general scraper description.
Instruction Scope
The user-facing instructions mention ethical scraping and manual cookie handling, but they do not disclose the fixed Chrome DevTools endpoint, existing-tab enumeration, live-session reuse, or the unsafe MCP command path. There is no clear consent, domain allowlist, or isolation requirement for the higher-risk scripts.
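
One way the missing domain allowlist could look is sketched below. `isHostAllowed` is a hypothetical helper and not part of the skill; the list of permitted hosts is assumed to come from explicit user configuration.

```javascript
// Hypothetical allowlist check: only http(s) URLs whose host equals an
// allowed entry, or is a subdomain of one, are accepted.
function isHostAllowed(targetUrl, allowedHosts) {
  let parsed;
  try {
    parsed = new URL(targetUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') return false;

  const host = parsed.hostname.toLowerCase();
  return allowedHosts.some((allowed) => {
    const entry = allowed.toLowerCase();
    // The leading dot prevents "evil-example.com" from matching "example.com".
    return host === entry || host.endsWith('.' + entry);
  });
}
```

Rejecting by default means a scrape only runs against hosts the user has named, which addresses the consent gap described above.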
Install Mechanism
Installation uses ordinary npm dependencies and Chromium setup, and no install-time persistence or install scripts were found. The runtime entry point is still problematic because it builds a shell command from the user-supplied URL and references a missing scripts/playwright-stealth.js file.
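
A possible replacement for building a shell command from the URL is sketched here, assuming the entry point only needs to start a Node script with the URL as an argument. `normalizeTargetUrl` and `runScraper` are hypothetical names, and `scriptPath` stands in for whichever script the skill actually ships.

```javascript
const { execFile } = require('node:child_process');

// Validate and normalise a user-supplied URL; throws on anything unexpected.
function normalizeTargetUrl(input) {
  const parsed = new URL(String(input)); // throws TypeError on invalid input
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  return parsed.href;
}

// Run a scraper script with the URL passed as a single argv entry.
// execFile does not spawn a shell by default, so shell metacharacters
// in the URL are never interpreted as commands.
function runScraper(scriptPath, targetUrl, callback) {
  const safeUrl = normalizeTargetUrl(targetUrl);
  return execFile(
    process.execPath,            // the current Node binary
    [scriptPath, safeUrl],
    { timeout: 60000 },          // assumed limit: stop after 60 seconds
    callback
  );
}
```

Passing arguments as an array avoids string concatenation entirely, so quoting mistakes cannot turn part of the URL into a command.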
Credentials
Stealth browser automation is expected for this kind of scraper, but attaching to a hard-coded existing Chrome DevTools session can inherit cookies, authenticated tabs, and unrelated browsing state. That level of access is not proportionate without explicit user control and isolation.
Persistence & Privilege
No background service, self-starting persistence, destructive file operation, or credential exfiltration was found. The concern is privilege: live browser-session access and shell command execution can affect or expose much more than a single requested scrape.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install stitch-playwright-scraper
  3. After installation, invoke the skill by name or use /stitch-playwright-scraper
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v2.0.0
playwright-scraper v2.0.0 (major update):
  • Adds Playwright + Stealth support to bypass anti-bot mechanisms and handle JS-rendered pages.
  • Guides users through setup, including required dependencies and Chromium installation.
  • Clarifies when to use the skill versus standard fetch tools.
  • Provides example usage scripts and OpenClaw integration steps.
  • Emphasizes ethical scraping practices and storage recommendations.
Metadata
Slug stitch-playwright-scraper
Version 2.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Playwright Scraper?

Playwright Scraper uses Playwright with the Stealth plugin to bypass anti-scraping mechanisms and fetch pages. It is an AI Agent Skill for Claude Code / OpenClaw, with 39 downloads so far.

How do I install Playwright Scraper?

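Run "/install stitch-playwright-scraper" in the OpenClaw or Claude Code chat to install the skill. The README additionally requires Playwright and Chromium to be installed locally, which downloads a Chromium binary of about 300MB.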

Is Playwright Scraper free?

Yes, Playwright Scraper is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Playwright Scraper support?

Playwright Scraper is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Playwright Scraper?

It is built and maintained by sdT328606 (@sdt328606); the current version is v2.0.0.
