Total downloads: 1 · Favorites: 0 · Current installs: 0 · Versions: 1
Install in OpenClaw
/install browser-automation-puppeteer
Description
Web scraping and browser automation using Puppeteer. Use when the user wants to extract data from websites, crawl pages, scrape dynamic content rendered by J...
Usage Instructions (SKILL.md)
Browser Automation
Web scraping and browser automation powered by Puppeteer.
When to Use
✅ USE this skill when:
- "Scrape data from [URL]"
- "Extract all [products/listings/items] from [website]"
- "Take a screenshot of [page]"
- "Crawl [website] and collect [info]"
- "Fill and submit [form]"
- Any JavaScript-rendered content that won't load without a browser
❌ DON'T use this skill when:
- Simple static pages → use web_fetch instead
- APIs available → fetch the API directly
- Rate-limited sites → respect robots.txt
Quick Start
# Install Puppeteer
npm install puppeteer
# Basic scraping
node scripts/scrape.js https://example.com
Core Patterns
Launch Browser
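const puppeteer = require('puppeteer');
async function scrape(url) {
  const browser = await puppeteer.launch({
    headless: 'new',
    // Commonly needed when Chromium runs as root in Docker/CI; it disables the sandbox
    args: ['--no-sandbox', '--disable-setuid-sandbox']
  });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    // ... extract data ...
  } finally {
    // Close even if navigation or extraction throws, so Chromium doesn't linger
    await browser.close();
  }
}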
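The try/finally pattern can be factored into a reusable wrapper so each scraping task doesn't have to repeat it. The sketch below uses a hypothetical helper name, `withBrowser`, which is not part of Puppeteer. It also assumes you may want to inject a different `launch` function, for example a stub in tests.

```javascript
// Hypothetical helper (not part of Puppeteer): launch a browser, run `task`,
// and always close the browser, even if `task` throws.
// `launch` defaults to puppeteer.launch but can be replaced, e.g. by a stub.
async function withBrowser(task, launch) {
  if (!launch) {
    // Required lazily so this file loads even where Puppeteer isn't installed
    const puppeteer = require('puppeteer');
    launch = () => puppeteer.launch({ headless: 'new' });
  }
  const browser = await launch();
  try {
    return await task(browser);
  } finally {
    await browser.close();
  }
}

// Usage (requires Puppeteer):
// const title = await withBrowser(async browser => {
//   const page = await browser.newPage();
//   await page.goto('https://example.com', { waitUntil: 'networkidle2' });
//   return page.title();
// });
```

The task's return value is passed through, so the caller gets the extracted data back once the browser is closed.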
Extract Text Content
// Get all text from a selector (returns [] when nothing matches)
const titles = await page.$$eval('h2', els => els.map(el => el.textContent.trim()));
// Get text from the first matching element (throws if nothing matches)
const price = await page.$eval('.price', el => el.textContent.trim());
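page.$eval rejects when no element matches, so an optional field can abort the whole scrape. A minimal sketch below, using a hypothetical helper name `textOrNull`, gets the first match's text through page.$$eval. That call always receives an array, possibly empty, so the helper can resolve to null instead of throwing.

```javascript
// Hypothetical helper: like page.$eval(selector, el => el.textContent.trim())
// but resolves to null instead of throwing when no element matches.
async function textOrNull(page, selector) {
  // $$eval passes an array of matches (possibly empty), so "not found" is not an error
  return page.$$eval(selector, els =>
    els.length > 0 ? els[0].textContent.trim() : null
  );
}

// Usage: const price = await textOrNull(page, '.price'); // a string, or null if .price is missing
```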
Extract HTML
const html = await page.$eval('.product-list', el => el.innerHTML);
Extract Attributes
const links = await page.$$eval('a', els => els.map(el => ({
text: el.textContent.trim(),
href: el.getAttribute('href')
})));
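el.getAttribute('href') returns the attribute exactly as written, so values like "/about", "#top" or "mailto:..." come back unchanged. Reading el.href inside the page gives the browser-resolved absolute URL instead. If you keep the raw attributes, one option is to clean them up in Node with the standard URL class. The helper name `normalizeLinks` is hypothetical. The sketch resolves each href against the page URL, keeps only http(s) links, drops #fragments and removes duplicates.

```javascript
// Hypothetical helper: resolve, filter and de-duplicate {text, href} pairs
// produced by the $$eval('a', ...) snippet above.
function normalizeLinks(links, pageUrl) {
  const seen = new Set();
  const out = [];
  for (const { text, href } of links) {
    if (!href) continue; // <a> without an href attribute
    let url;
    try {
      url = new URL(href, pageUrl); // resolves "/a", "../b", etc. against the page
    } catch {
      continue; // malformed href
    }
    if (url.protocol !== 'http:' && url.protocol !== 'https:') continue; // skip mailto:, javascript:, tel:
    url.hash = ''; // "/a" and "/a#top" point to the same document
    if (seen.has(url.href)) continue;
    seen.add(url.href);
    out.push({ text, href: url.href });
  }
  return out;
}

// Usage: const clean = normalizeLinks(links, page.url());
```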
Wait for Content
// Wait for selector
await page.waitForSelector('.results', { timeout: 10000 });
// Wait for network idle
await page.goto(url, { waitUntil: 'networkidle2' });
// Wait for function
await page.waitForFunction(() => document.querySelectorAll('.item').length > 10);
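The waitForFunction example above hardcodes '.item' and 10 inside the function. That function is serialized and evaluated inside the page, so it cannot read Node-side variables through a closure. Values have to be passed as extra arguments after the options object. A small sketch with a hypothetical helper name, `waitForMinCount`:

```javascript
// Hypothetical helper: wait until at least `min` elements match `selector`.
// The predicate runs in the browser, so selector and min are passed as
// arguments (after the options object) rather than captured from Node scope.
async function waitForMinCount(page, selector, min, timeoutMs = 10000) {
  await page.waitForFunction(
    (sel, n) => document.querySelectorAll(sel).length >= n,
    { timeout: timeoutMs },
    selector,
    min
  );
}

// Usage: await waitForMinCount(page, '.item', 11); // same condition as "> 10" above
```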
Pagination
async function scrapeWithPagination(baseUrl, maxPages = 5) {
  const browser = await puppeteer.launch({ headless: 'new' });
  const results = [];
  try {
    const page = await browser.newPage();
    for (let i = 1; i <= maxPages; i++) {
      // Set ?page=N, keeping any query parameters already in baseUrl
      const url = new URL(baseUrl);
      url.searchParams.set('page', String(i));
      await page.goto(url.href, { waitUntil: 'networkidle2' });
      const items = await page.$$eval('.item', els =>
        els.map(el => el.textContent.trim())
      );
      if (items.length === 0) break; // ran past the last page
      results.push(...items);
    }
  } finally {
    await browser.close();
  }
  return results;
}
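Stopping on an empty page assumes the site returns no items past the last page. Some sites instead re-serve the last page, which would make the loop collect duplicates until maxPages. A sketch under that assumption separates the paging logic from Puppeteer. The hypothetical helper `collectPages` takes a `fetchPage(n)` callback and also stops when a page repeats the previous one.

```javascript
// Hypothetical helper: drive pagination through an injected fetchPage(n)
// that returns an array of items, and stop when a page is empty OR
// identical to the previous page (some sites repeat the last page).
async function collectPages(fetchPage, maxPages = 5) {
  const results = [];
  let previousKey = null;
  for (let n = 1; n <= maxPages; n++) {
    const items = await fetchPage(n);
    if (items.length === 0) break; // no more results
    const key = JSON.stringify(items);
    if (key === previousKey) break; // same content as the previous page
    previousKey = key;
    results.push(...items);
  }
  return results;
}

// Usage with Puppeteer (hypothetical wiring):
// const items = await collectPages(async n => {
//   const url = new URL(baseUrl);
//   url.searchParams.set('page', String(n));
//   await page.goto(url.href, { waitUntil: 'networkidle2' });
//   return page.$$eval('.item', els => els.map(el => el.textContent.trim()));
// }, 5);
```

Injecting fetchPage also makes the stop conditions testable without launching a browser.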
Screenshots
// Full page screenshot
await page.screenshot({ path: 'screenshot.png', fullPage: true });
// Element screenshot (page.$ returns null when nothing matches)
const element = await page.$('.chart');
if (element) {
  await element.screenshot({ path: 'chart.png' });
}
Block Resources (Speed Up)
// Enable before page.goto() so the page's own requests are intercepted
await page.setRequestInterception(true);
page.on('request', req => {
if (['image', 'stylesheet', 'font'].includes(req.resourceType())) {
req.abort();
} else {
req.continue();
}
});
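The request handler above can be turned into a configurable factory that also blocks requests by hostname, for example ad or analytics domains. `createRequestFilter` and its options are hypothetical names. The sketch uses only the request methods already shown (resourceType, abort, continue) plus req.url().

```javascript
// Hypothetical factory: returns a handler for page.on('request', ...) that
// aborts requests by resource type and, optionally, by hostname.
function createRequestFilter({ blockTypes = ['image', 'stylesheet', 'font'], blockHosts = [] } = {}) {
  const types = new Set(blockTypes);
  return req => {
    let host = '';
    try {
      host = new URL(req.url()).hostname;
    } catch {
      // Unparseable URL: fall back to the resource-type check only
    }
    // Match the host itself and any subdomain of it
    const hostBlocked = blockHosts.some(h => host === h || host.endsWith('.' + h));
    if (types.has(req.resourceType()) || hostBlocked) {
      req.abort();
    } else {
      req.continue();
    }
  };
}

// Usage (before page.goto):
// await page.setRequestInterception(true);
// page.on('request', createRequestFilter({ blockHosts: ['ads.example.com'] }));
```

With interception enabled, every request must be either aborted or continued, so the handler always takes exactly one of the two branches.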
Scripts
scrape.js — Basic Scraping
// Usage: node scripts/scrape.js <url> [selector]
const puppeteer = require('puppeteer');
const url = process.argv[2];
const selector = process.argv[3] || 'body';
if (!url) {
  console.error('Usage: node scrape.js <url> [selector]');
  process.exit(1);
}
(async () => {
  const browser = await puppeteer.launch({ headless: 'new' });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    // Text of every element matching the selector, printed as a JSON array
    const content = await page.$$eval(selector, els =>
      els.map(el => el.textContent.trim())
    );
    console.log(JSON.stringify(content, null, 2));
  } finally {
    await browser.close();
  }
})().catch(err => {
  // Report failures (bad URL, timeout, ...) and exit non-zero
  console.error(err.message);
  process.exit(1);
});
screenshot.js — Page Screenshots
// Usage: node scripts/screenshot.js <url> [output.png]
const puppeteer = require('puppeteer');
const url = process.argv[2];
const output = process.argv[3] || 'screenshot.png';
if (!url) {
  console.error('Usage: node screenshot.js <url> [output.png]');
  process.exit(1);
}
(async () => {
  const browser = await puppeteer.launch({ headless: 'new' });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    // fullPage captures the whole scrollable page, not just the viewport
    await page.screenshot({ path: output, fullPage: true });
    console.log(`Screenshot saved to ${output}`);
  } finally {
    await browser.close();
  }
})().catch(err => {
  // Report failures (bad URL, timeout, ...) and exit non-zero
  console.error(err.message);
  process.exit(1);
});
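screenshot.js writes to screenshot.png by default, so consecutive runs against different pages overwrite each other. One option is to derive a default file name from the URL. The sketch below uses a hypothetical helper, `screenshotNameFor`, which keeps the host and path, drops the query string, and replaces characters that are awkward in file names.

```javascript
// Hypothetical helper: derive a filesystem-safe default file name from a URL
// (host + path; query string and fragment are dropped).
function screenshotNameFor(pageUrl, ext = '.png') {
  const { hostname, pathname } = new URL(pageUrl);
  const base = `${hostname}${pathname}`
    .replace(/[^a-z0-9.-]+/gi, '_') // "/", "%", spaces, etc. become "_"
    .replace(/^_+|_+$/g, '');       // trim leading/trailing underscores
  return base + ext;
}

// Usage in screenshot.js: const output = process.argv[3] || screenshotNameFor(url);
```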
crawl.js — Multi-Page Crawler
// Usage: node crawl.js <url> <selector> [maxPages]
const puppeteer = require('puppeteer');
const url = process.argv[2];
const selector = process.argv[3];
const maxPages = parseInt(process.argv[4], 10) || 10;
if (!url || !selector) {
  console.error('Usage: node crawl.js <url> <selector> [maxPages]');
  process.exit(1);
}
(async () => {
  const browser = await puppeteer.launch({ headless: 'new' });
  const allData = [];
  try {
    const page = await browser.newPage();
    for (let i = 1; i <= maxPages; i++) {
      // Set ?page=N, keeping (or replacing) any query parameters already in the URL
      const pageUrl = new URL(url);
      pageUrl.searchParams.set('page', String(i));
      console.error(`Crawling: ${pageUrl.href}`); // progress on stderr, data on stdout
      await page.goto(pageUrl.href, { waitUntil: 'networkidle2' });
      const data = await page.$$eval(selector, els =>
        els.map(el => el.textContent.trim())
      );
      if (data.length === 0) break; // no more results
      allData.push(...data);
    }
    console.log(JSON.stringify(allData, null, 2));
  } finally {
    await browser.close();
  }
})().catch(err => {
  console.error(err.message);
  process.exit(1);
});
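crawl.js requests pages back to back, while the Tips recommend pausing between requests. A fixed pause still produces a perfectly regular request pattern. A sketch with two hypothetical helpers adds random jitter: `sleep` is the promise-based one-liner from the Tips, and `jitteredDelay` picks a delay between baseMs and baseMs + jitterMs.

```javascript
// Hypothetical helpers: promise-based sleep plus a randomized delay so
// requests don't arrive at a fixed interval.
function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Returns an integer in [baseMs, baseMs + jitterMs); `random` is injectable for testing
function jitteredDelay(baseMs, jitterMs, random = Math.random) {
  return baseMs + Math.floor(random() * jitterMs);
}

// Usage inside the crawl loop, after allData.push(...data):
// await sleep(jitteredDelay(2000, 1000)); // waits 2-3 seconds
```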
Common Selectors
| Target | Selector |
|---|---|
| All links | a |
| All images | img |
| Headings | h1, h2, h3 |
| Lists | ul li, ol li |
| Tables | table tr |
| Cards/Items | .item, .card, .product |
| Prices | .price, [class*="price"] |
| Descriptions | .description, .summary |
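Selectors such as .price or [class*="price"] return display text like "$1,299.99" rather than a number. A minimal sketch with a hypothetical helper, `parsePrice`, extracts the first number from such text. It assumes "," is a thousands separator and "." is the decimal point. European formats such as "1.299,99" would need locale-specific handling.

```javascript
// Hypothetical helper: convert scraped price text to a number, or null.
// Assumes "," = thousands separator and "." = decimal point.
function parsePrice(text) {
  const match = String(text).replace(/,/g, '').match(/\d+(?:\.\d+)?/);
  return match ? Number(match[0]) : null;
}

// Usage:
// const texts = await page.$$eval('[class*="price"]', els => els.map(el => el.textContent));
// const prices = texts.map(parsePrice);
```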
Tips
- Check robots.txt before scraping: curl example.com/robots.txt
- Add delays between requests to avoid bans: await new Promise(r => setTimeout(r, 2000))
- Use networkidle2 for SPAs (single-page apps)
- Debug with screenshots when selectors fail
- Set a user agent for sites that block bots:
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36');
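Checking robots.txt with curl works for a one-off job. A crawler can check it programmatically instead. The sketch below uses a hypothetical helper, `isAllowedByRobots`, and is deliberately simplified. It only applies groups addressed to "User-agent: *", treats rules as plain path prefixes (no "*" or "$" wildcards), and resolves conflicts the way RFC 9309 recommends: the longest matching rule wins, and Allow wins a tie. A production crawler should use a complete robots.txt parser.

```javascript
// Hypothetical, simplified robots.txt check for the "*" user-agent group.
function isAllowedByRobots(robotsTxt, path) {
  const rules = [];    // rules from groups that apply to "*"
  let agents = [];     // user-agents of the current group
  let inRules = false; // true once the current group has seen a rule line
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*/, '').trim(); // drop comments
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === 'user-agent') {
      if (inRules) { agents = []; inRules = false; } // a new group starts
      agents.push(value);
    } else if (field === 'allow' || field === 'disallow') {
      inRules = true;
      // An empty Disallow means "allow everything", so it adds no rule
      if (agents.includes('*') && value !== '') {
        rules.push({ allow: field === 'allow', prefix: value });
      }
    }
  }
  // Longest matching prefix wins; on a tie, Allow wins
  let best = null;
  for (const rule of rules) {
    if (!path.startsWith(rule.prefix)) continue;
    if (!best || rule.prefix.length > best.prefix.length ||
        (rule.prefix.length === best.prefix.length && rule.allow)) {
      best = rule;
    }
  }
  return best ? best.allow : true;
}

// Usage: fetch https://<host>/robots.txt once, then check each path before page.goto
```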
Reference
For the detailed Puppeteer API reference, see puppeteer/docs/api.md.
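How to Use
- Make sure OpenClaw is installed (local or Docker deployment).
- Enter the install command in the chat box: /install browser-automation-puppeteer
- After installation, call the skill by name or trigger it with /browser-automation-puppeteer.
- Provide the inputs described in the skill's parameter notes to get structured output.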
Version History
v1.0.0
- Initial release of browser-automation skill powered by Puppeteer.
- Enables web scraping, crawling, form automation, and screenshots for JavaScript-rendered and dynamic content.
- Includes usage guidelines, common code patterns, and ready-to-use scripts for scraping, screenshots, and crawling.
- Provides selector reference and tips for effective, responsible browser automation.
FAQ
What is Browser Automation?
Web scraping and browser automation using Puppeteer. Use when the user wants to extract data from websites, crawl pages, scrape dynamic content rendered by J... It is an AI agent skill plugin for Claude Code / OpenClaw and has been downloaded 1 time so far.
How do I install Browser Automation?
Run the command "/install browser-automation-puppeteer" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.
Is Browser Automation free?
Yes. Browser Automation is completely free and licensed under MIT-0, so it can be freely downloaded, installed, and used.
Which platforms does Browser Automation support?
Browser Automation is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who developed Browser Automation?
It is developed and maintained by fasjdas (@fasjdas). Current version: v1.0.0.