Browserbase Scraper
/install browserbase-scraper-skill
Browserbase Scraper
Bypass Cloudflare and bot protection using Stagehand + Browserbase cloud browsers with AI-powered extraction.
When to Use
- Website blocks curl/fetch with Cloudflare "Just a moment..." page
- Playwright headless gets detected and blocked
- Need structured data extraction from dynamic content
- Scraping auction sites, marketplaces, or other protected pages
Prerequisites
npm install @browserbasehq/stagehand zod
Required environment variables:
BROWSERBASE_API_KEY— from browserbase.com dashboardBROWSERBASE_PROJECT_ID— from browserbase.comGOOGLE_GENERATIVE_AI_API_KEY— for Gemini extraction (or use OpenAI)
Quick Start
import { Stagehand } from '@browserbasehq/stagehand';
const stagehand = new Stagehand({
env: 'BROWSERBASE',
apiKey: process.env.BROWSERBASE_API_KEY,
projectId: process.env.BROWSERBASE_PROJECT_ID,
model: {
modelName: 'google/gemini-3-flash-preview',
apiKey: process.env.GOOGLE_GENERATIVE_AI_API_KEY,
},
});
await stagehand.init();
const page = stagehand.context.pages()[0];
// Navigate (Cloudflare bypass is automatic)
await page.goto('https://protected-site.com/search?q=term');
await page.waitForTimeout(5000); // Let page fully load
// AI-powered extraction (instruction-only works best)
const data = await stagehand.extract(`
Extract all product listings as JSON array:
[{ "title": "...", "price": 123, "url": "..." }]
Return ONLY the JSON array.
`);
await stagehand.close();
Key Patterns
1. Instruction-Only Extraction (Recommended)
Schema-based extraction often returns empty. Use natural language instructions instead:
const extraction = await stagehand.extract(`
Look at this page and extract:
- All item titles
- Prices as numbers
- URLs
Return as JSON array.
`);
2. Handle Cloudflare Delays
Sometimes the challenge takes longer:
const title = await page.title();
if (title.toLowerCase().includes('moment')) {
await page.waitForTimeout(10000); // Wait for challenge
}
3. Scroll to Load More
Many sites lazy-load content:
for (let i = 0; i \x3C 5; i++) {
await page.evaluate(() => window.scrollBy(0, window.innerHeight));
await page.waitForTimeout(800);
}
4. Parse Extraction Results
The extraction returns a string that needs parsing:
let listings = [];
try {
const jsonMatch = extraction?.extraction?.match(/\[[\s\S]*\]/);
if (jsonMatch) listings = JSON.parse(jsonMatch[0]);
} catch (e) {
console.log('Parse error:', e.message);
}
Browserbase Free Tier Limits
- 1 concurrent session — cron jobs can conflict with interactive use
- Sessions auto-close after inactivity
- Use
stagehand.close()to release session immediately
Cron Integration
For scheduled scraping, use OpenClaw cron with isolated sessions:
openclaw cron add \
--name "Daily Scrape" \
--cron "0 6 * * *" \
--session isolated \
--message "Run: node ~/scripts/scraper.js"
Troubleshooting
| Issue | Solution |
|---|---|
| Empty extraction | Use instruction-only (no schema), increase wait time |
| Cloudflare loop | Wait 10-15s, check if title contains "moment" |
| Session limit | Close other Browserbase sessions, check dashboard |
| 429 errors | Wait for session to complete, don't retry immediately |
Example: Full Scraper
See scripts/example_scraper.js for a complete working example.
- 确保已安装 OpenClaw(本地或 Docker 部署)
- 在对话框中输入安装命令:
/install browserbase-scraper-skill - 安装完成后,直接呼叫该 Skill 的名称或使用
/browserbase-scraper-skill触发 - 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
Browserbase Scraper 是什么?
Scrape Cloudflare-protected websites using Stagehand + Browserbase cloud browsers. Use when the user needs to extract data from websites with bot protection,... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 627 次。
如何安装 Browserbase Scraper?
在 OpenClaw 或 Claude Code 对话框中运行命令「/install browserbase-scraper-skill」即可一键安装,无需额外配置。
Browserbase Scraper 是免费的吗?
是的,Browserbase Scraper 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。
Browserbase Scraper 支持哪些平台?
Browserbase Scraper 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。
谁开发了 Browserbase Scraper?
由 Joe Alicata(@wirelessjoe)开发并维护,当前版本 v0.1.0。