Playwright Scraper Skill

Name: Playwright Scraper Skill
Author: hongjiahao371-pixel

by hongjiahao371-pixel · v1.0.0 · MIT-0

cross-platform ✓ Security Clean

Downloads: 200

Install in OpenClaw

/install coco-playwright-stealth

Description

A Playwright-based web scraping skill for OpenClaw with anti-bot evasion. Tested successfully on complex sites such as Discuss.com.hk.

README (SKILL.md)

Playwright Scraper Skill

A Playwright-based web scraping skill for OpenClaw with anti-bot evasion. Choose the approach that matches the target website's anti-bot level.

🎯 Use Case Matrix

Target Website	Anti-Bot Level	Recommended Method	Script
Regular Sites	Low	web_fetch tool	N/A (built-in)
Dynamic Sites	Medium	Playwright Simple	`scripts/playwright-simple.js`
Cloudflare Protected	High	Playwright Stealth ⭐	`scripts/playwright-stealth.js`
YouTube	Special	deep-scraper	Install separately
Reddit	Special	reddit-scraper	Install separately
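
The matrix above can be read as a lookup from anti-bot level to method. A minimal sketch, assuming hypothetical level keys (`low`, `medium`, `high`) and a hypothetical helper name `recommendMethod`; neither is part of the skill itself:

```javascript
// Hypothetical lookup mirroring the Use Case Matrix; keys are illustrative.
const METHODS = new Map([
  ['low', { method: 'web_fetch tool', script: null }],
  ['medium', { method: 'Playwright Simple', script: 'scripts/playwright-simple.js' }],
  ['high', { method: 'Playwright Stealth', script: 'scripts/playwright-stealth.js' }],
]);

// Return the recommended method for a given anti-bot level.
function recommendMethod(level) {
  const entry = METHODS.get(String(level).toLowerCase());
  if (!entry) {
    throw new Error(`Unknown anti-bot level: ${level}`);
  }
  return entry;
}

console.log(recommendMethod('High').script);
```

Sites marked Special (YouTube, Reddit) are left out on purpose, since they are handled by separately installed skills.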

📦 Installation

cd playwright-scraper-skill
npm install
npx playwright install chromium

🚀 Quick Start

1️⃣ Simple Sites (No Anti-Bot)

Use OpenClaw's built-in web_fetch tool:

# Invoke directly in OpenClaw
Hey, fetch me the content from https://example.com

2️⃣ Dynamic Sites (Requires JavaScript)

Use Playwright Simple:

node scripts/playwright-simple.js "https://example.com"

Example output:

{
  "url": "https://example.com",
  "title": "Example Domain",
  "content": "...",
  "elapsedSeconds": "3.45"
}
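
Because the script prints JSON, another Node program can run it and parse the result. The sketch below assumes that stdout contains only the JSON object shown above (no extra log lines); `parseScraperOutput` and `runSimpleScraper` are hypothetical helper names, not part of the skill:

```javascript
const { execFile } = require('node:child_process');

// Parse the JSON printed by the script. elapsedSeconds is a string in the
// example output, so it is converted to a number for convenience.
function parseScraperOutput(stdout) {
  const result = JSON.parse(stdout);
  return { ...result, elapsedSeconds: Number(result.elapsedSeconds) };
}

// Run playwright-simple.js as a child process and resolve with the parsed
// result. Assumes the current directory is the skill's root folder.
function runSimpleScraper(url) {
  return new Promise((resolve, reject) => {
    execFile('node', ['scripts/playwright-simple.js', url], (error, stdout) => {
      if (error) {
        reject(error);
        return;
      }
      try {
        resolve(parseScraperOutput(stdout));
      } catch (parseError) {
        reject(parseError);
      }
    });
  });
}
```

Passing the URL as a separate argument to `execFile` avoids shell quoting problems with characters such as `#` or `&`.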

3️⃣ Anti-Bot Protected Sites (Cloudflare etc.)

Use Playwright Stealth:

node scripts/playwright-stealth.js "https://m.discuss.com.hk/#hot"

Features:

Hide automation markers (navigator.webdriver = false)
Realistic User-Agent (iPhone, Android)
Random delays to mimic human behavior
Screenshot and HTML saving support
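
The source of playwright-stealth.js is not reproduced here, so the following is only a sketch of how the listed features are commonly combined with Playwright's public API. The User-Agent string, the delay range, and the names `MOBILE_UA`, `randomDelayMs`, and `fetchWithStealth` are assumptions for illustration:

```javascript
// Illustrative iPhone User-Agent (an assumption; substitute a current one).
const MOBILE_UA =
  'Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 ' +
  '(KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1';

// Random integer delay in [minMs, maxMs], used to avoid fixed timing.
function randomDelayMs(minMs, maxMs) {
  return minMs + Math.floor(Math.random() * (maxMs - minMs + 1));
}

async function fetchWithStealth(url) {
  // Loaded lazily so the helper above works without Playwright installed.
  const { chromium } = require('playwright');
  const browser = await chromium.launch({ headless: true });
  try {
    const context = await browser.newContext({ userAgent: MOBILE_UA });
    // Runs before any page script, so the site reads webdriver as false.
    await context.addInitScript(() => {
      Object.defineProperty(navigator, 'webdriver', { get: () => false });
    });
    const page = await context.newPage();
    const response = await page.goto(url, { waitUntil: 'domcontentloaded' });
    // Pause for a random interval instead of a fixed one.
    await page.waitForTimeout(randomDelayMs(1000, 3000));
    return {
      status: response ? response.status() : null,
      title: await page.title(),
    };
  } finally {
    await browser.close();
  }
}
```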

4️⃣ YouTube Video Transcripts

Use deep-scraper (install separately):

# Install deep-scraper skill
npx clawhub install deep-scraper

# Use it
cd skills/deep-scraper
node assets/youtube_handler.js "https://www.youtube.com/watch?v=VIDEO_ID"

📖 Script Descriptions

`scripts/playwright-simple.js`

Use Case: Regular dynamic websites
Speed: Fast (3-5 seconds)
Anti-Bot: None
Output: JSON (title, content, URL)

`scripts/playwright-stealth.js` ⭐

Use Case: Sites with Cloudflare or anti-bot protection
Speed: Medium (5-20 seconds)
Anti-Bot: Medium-High (hides automation, realistic UA)
Output: JSON + Screenshot + HTML file
Verified: 100% success on Discuss.com.hk

🎓 Best Practices

1. Try web_fetch First

If the site doesn't have dynamic loading, use OpenClaw's web_fetch tool—it's fastest.

2. Need JavaScript? Use Playwright Simple

If you need to wait for JavaScript rendering, use playwright-simple.js.

3. Getting Blocked? Use Stealth

If you encounter 403 or Cloudflare challenges, use playwright-stealth.js.

4. Special Sites Need Specialized Skills

YouTube → deep-scraper
Reddit → reddit-scraper
Twitter → bird skill
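
Practices 1 to 3 amount to an escalation ladder: start with the cheapest method and move up only when blocked. A minimal sketch, assuming each strategy is a hypothetical `{ name, run }` pair whose `run(url)` resolves to `{ status, content }`; these shapes are not defined by the skill:

```javascript
// Try each strategy in order and stop at the first one that is not blocked.
async function scrapeWithEscalation(url, strategies) {
  const attempts = [];
  for (const { name, run } of strategies) {
    try {
      const result = await run(url);
      attempts.push({ name, status: result.status });
      // Treat 403 or empty content as "blocked" and escalate.
      if (result.status !== 403 && result.content) {
        return { method: name, result, attempts };
      }
    } catch (error) {
      attempts.push({ name, error: error.message });
    }
  }
  throw new Error(`All methods failed: ${JSON.stringify(attempts)}`);
}
```

In practice the `run` functions would wrap web_fetch, playwright-simple.js, and playwright-stealth.js in that order.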

🔧 Customization

All scripts support environment variables:

# Set screenshot path
SCREENSHOT_PATH=/path/to/screenshot.png node scripts/playwright-stealth.js URL

# Set wait time (milliseconds)
WAIT_TIME=10000 node scripts/playwright-simple.js URL

# Enable headful mode (show browser)
HEADLESS=false node scripts/playwright-stealth.js URL

# Save HTML
SAVE_HTML=true node scripts/playwright-stealth.js URL

# Custom User-Agent
USER_AGENT="Mozilla/5.0 ..." node scripts/playwright-stealth.js URL
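
For orientation, this is one way a script could read these variables. It is a sketch, not the skill's actual code: the default values (headless on, 5000 ms wait) and the helper name `readConfig` are assumptions, since the scripts' real defaults are not stated in this README.

```javascript
// Read the documented environment variables into a plain config object.
// Defaults are assumptions for illustration only.
function readConfig(env = process.env) {
  const waitTime = Number.parseInt(env.WAIT_TIME ?? '', 10);
  return {
    headless: env.HEADLESS !== 'false',
    waitTime: Number.isNaN(waitTime) ? 5000 : waitTime,
    screenshotPath: env.SCREENSHOT_PATH ?? null,
    saveHtml: env.SAVE_HTML === 'true',
    userAgent: env.USER_AGENT ?? null,
  };
}

console.log(readConfig({ HEADLESS: 'false', WAIT_TIME: '10000' }));
```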

📊 Performance Comparison

Method	Speed	Anti-Bot	Success Rate (Discuss.com.hk)
web_fetch	⚡ Fastest	❌ None	0%
Playwright Simple	🚀 Fast	⚠️ Low	20%
Playwright Stealth	⏱️ Medium	✅ Medium	100% ✅
Puppeteer Stealth	⏱️ Medium	✅ Medium-High	~80%
Crawlee (deep-scraper)	🐢 Slow	❌ Detected	0%
Chaser (Rust)	⏱️ Medium	❌ Detected	0%

🛡️ Anti-Bot Techniques Summary

Lessons learned from our testing:

✅ Effective Anti-Bot Measures

Hide navigator.webdriver — Essential
Realistic User-Agent — Use real devices (iPhone, Android)
Mimic Human Behavior — Random delays, scrolling
Avoid Framework Signatures — Crawlee, Selenium are easily detected
Use addInitScript (Playwright) — Inject before page load
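
The "Mimic Human Behavior" item can be sketched as uneven scrolling with random pauses. The step sizes, pause ranges, and the helper name `scrollLikeHuman` are illustrative assumptions; `page` is expected to be a Playwright page, of which only `page.mouse.wheel` and `page.waitForTimeout` are used:

```javascript
// Scroll down in a few uneven steps with random pauses in between.
async function scrollLikeHuman(page, steps = 4) {
  for (let i = 0; i < steps; i++) {
    const deltaY = 200 + Math.floor(Math.random() * 400); // 200-599 px
    await page.mouse.wheel(0, deltaY);
    const pauseMs = 300 + Math.floor(Math.random() * 900); // 300-1199 ms
    await page.waitForTimeout(pauseMs);
  }
}
```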

❌ Ineffective Anti-Bot Measures

Only changing User-Agent — Not enough
Using high-level frameworks (Crawlee) — More easily detected
Docker isolation — Doesn't help with Cloudflare

🔍 Troubleshooting

Issue: 403 Forbidden

Solution: Use playwright-stealth.js

Issue: Cloudflare Challenge Page

Solution:

Increase wait time (10-15 seconds)
Try headful mode (HEADLESS=false), which sometimes has a higher success rate
Consider using proxy IPs
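
The first suggestion can be automated by retrying with longer waits. This is a sketch under assumptions: the detection heuristic (a 403/503 status or a "Just a moment" page title) is a common rule of thumb rather than something the skill defines, and `scrape(url, waitMs)` is a hypothetical function resolving to `{ status, title }`:

```javascript
// Heuristic check for a challenge page; an assumption, not an exact test.
function looksLikeChallenge({ status, title }) {
  return status === 403 || status === 503 || /just a moment/i.test(title || '');
}

// Retry with progressively longer waits and return the first result that
// does not look like a challenge page.
async function retryWithLongerWaits(url, scrape, waitsMs = [5000, 10000, 15000]) {
  let last = null;
  for (const waitMs of waitsMs) {
    last = await scrape(url, waitMs);
    if (!looksLikeChallenge(last)) {
      return { ...last, waitMs, blocked: false };
    }
  }
  return { ...last, blocked: true };
}
```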

Issue: Blank Page

Solution:

Increase waitForTimeout
Use waitUntil: 'networkidle' or 'domcontentloaded'
Check if login is required
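
The two `waitUntil` values can be combined: wait for `networkidle` first and fall back to `domcontentloaded` if the page never goes idle. A minimal sketch, where `gotoWithFallback` is a hypothetical helper and the 30000 ms timeout is an assumed value:

```javascript
// Try 'networkidle' first; on a Playwright timeout, retry the navigation
// with the less strict 'domcontentloaded'. Returns the mode that worked.
async function gotoWithFallback(page, url, timeout = 30000) {
  try {
    await page.goto(url, { waitUntil: 'networkidle', timeout });
    return 'networkidle';
  } catch (error) {
    // Only fall back on timeouts; rethrow anything else (DNS errors, etc.).
    if (error.name !== 'TimeoutError') {
      throw error;
    }
    await page.goto(url, { waitUntil: 'domcontentloaded', timeout });
    return 'domcontentloaded';
  }
}
```

Pages that keep long-lived connections open may never reach `networkidle`, which is why the fallback can help with pages that otherwise appear blank.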

📝 Memory & Experience

2026-02-07 Discuss.com.hk Test Conclusions

✅ Pure Playwright + Stealth succeeded (5s, 200 OK)
❌ Crawlee (deep-scraper) failed (403)
❌ Chaser (Rust) failed (Cloudflare)
❌ Puppeteer standard failed (403)

Best Solution: Pure Playwright + anti-bot techniques (framework-independent)

🚧 Future Improvements

Add proxy IP rotation
Implement cookie management (maintain login state)
Add CAPTCHA handling (2captcha / Anti-Captcha)
Batch scraping (parallel URLs)
Integration with OpenClaw's browser tool

Usage Guidance

This skill appears internally coherent and does what it claims: it runs Playwright-based scrapers and includes explicit stealth techniques to evade anti-bot measures. Before installing or running it: (1) audit and run npm install in an isolated/ephemeral environment (container or VM) and inspect package-lock.json for unexpected dependencies; (2) avoid running it on hosts containing sensitive credentials or cookies (it writes screenshots/HTML to disk); (3) be aware that using anti-bot evasion may violate target sites' terms of service or local laws — use responsibly; (4) if you later enable CAPTCHA solving or proxy rotation, expect to provide API keys (which would increase risk and require secret-handling review); and (5) if you want extra safety, run the scripts with limited filesystem permissions and network access or manually review the scripts before use.

Capability Analysis

Type: OpenClaw Skill
Name: coco-playwright-stealth
Version: 1.0.0

The skill bundle provides a legitimate set of web scraping tools based on Playwright, including scripts for basic and stealthy scraping (playwright-simple.js, playwright-stealth.js). The code is well-documented and implements standard anti-bot bypass techniques, such as hiding the automation flag and mimicking human browser behavior. No evidence of malicious intent, data exfiltration, or unauthorized system access was found; the scripts function exactly as described for their stated purpose of web data extraction.

Capability Assessment

✓ Purpose & Capability

The name/description match the included files and scripts (playwright-simple.js, playwright-stealth.js, smzdm-scraper.js). package.json lists Playwright as a dependency and SKILL.md documents running npm install and npx playwright install — all consistent with a Playwright scraper.

✓ Instruction Scope

Runtime instructions tell the agent to run local Node scripts and set local environment variables. The scripts read the URL argument, interact with pages, save screenshots/HTML locally, and print JSON to stdout. There are no instructions to read unrelated system files or to POST data to remote endpoints. The stealth code intentionally alters browser properties to evade detection (navigator.webdriver, userAgent, etc.), which is consistent with the stated anti-bot goal.

ℹ Install Mechanism

There is no platform 'install' spec, but SKILL.md and package.json instruct npm install and npx playwright install chromium. Using npm/playwright from the official registry is expected for this functionality; verify you run npm install from a trusted source or review package-lock.json before installing.

✓ Credentials

The skill does not declare required secrets or credentials. Scripts accept optional env vars (HEADLESS, WAIT_TIME, SCREENSHOT_PATH, SAVE_HTML, USER_AGENT) which are reasonable for customization and do not require providing sensitive tokens. Future-noted features (CAPTCHA services, proxy rotation) would require credentials, but those are not present in the current code.

✓ Persistence & Privilege

The skill does not request always: true and does not modify other skills or system-wide configs. It writes only to local files (screenshots/HTML) in the working directory or paths you provide; there is no evidence it alters agent settings or other installed skills.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install coco-playwright-stealth
After installation, invoke the skill by name or use /coco-playwright-stealth
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

Initial release: a Playwright-based stealth scraping tool with anti-detection, screenshots, and content extraction.

Metadata

Slug coco-playwright-stealth

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Playwright Scraper Skill?

Playwright Scraper Skill is a Playwright-based web scraping skill with anti-bot evasion, tested successfully on complex sites such as Discuss.com.hk. It is an AI Agent Skill for Claude Code / OpenClaw, with 200 downloads so far.

How do I install Playwright Scraper Skill?

Run "/install coco-playwright-stealth" in the OpenClaw or Claude Code chat to install it. The README's Installation section additionally lists npm install and npx playwright install chromium to set up the script dependencies.

Is Playwright Scraper Skill free?

Yes, Playwright Scraper Skill is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Playwright Scraper Skill support?

Playwright Scraper Skill is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Playwright Scraper Skill?

It is built and maintained by hongjiahao371-pixel (@hongjiahao371-pixel); the current version is v1.0.0.
