wirelessjoe

Browserbase Scraper

Author: Joe Alicata · GitHub · v0.1.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 627
Favorites: 0
Current installs: 1
Versions: 1
Install in OpenClaw
/install browserbase-scraper-skill
Description
Scrape Cloudflare-protected websites using Stagehand + Browserbase cloud browsers. Use when the user needs to extract data from websites with bot protection,...
Usage instructions (SKILL.md)

Browserbase Scraper

Bypass Cloudflare and bot protection using Stagehand + Browserbase cloud browsers with AI-powered extraction.

When to Use

  • Website blocks curl/fetch with Cloudflare "Just a moment..." page
  • Playwright headless gets detected and blocked
  • Need structured data extraction from dynamic content
  • Scraping auction sites, marketplaces, or other protected pages

Prerequisites

npm install @browserbasehq/stagehand zod

Required environment variables:

  • BROWSERBASE_API_KEY — from browserbase.com dashboard
  • BROWSERBASE_PROJECT_ID — from browserbase.com
  • GOOGLE_GENERATIVE_AI_API_KEY — for Gemini extraction (or use OpenAI)
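
A missing key otherwise tends to surface only once a session is being created, so it can help to check the variables up front. The sketch below is one way to do that; `missingEnv` and `REQUIRED_ENV` are hypothetical names, not part of Stagehand or Browserbase, and the Google key is only relevant when Gemini is the extraction model.

```javascript
// Variable names are taken from the list above.
const REQUIRED_ENV = [
  'BROWSERBASE_API_KEY',
  'BROWSERBASE_PROJECT_ID',
  'GOOGLE_GENERATIVE_AI_API_KEY', // drop this entry if you use another model provider
];

// Hypothetical helper: returns the names that are unset or blank in `env`.
function missingEnv(env, names = REQUIRED_ENV) {
  return names.filter((name) => !env[name] || env[name].trim() === '');
}

// Hypothetical helper: throws a single readable error before any session is opened.
function assertEnv(env, names = REQUIRED_ENV) {
  const missing = missingEnv(env, names);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
}
```

Calling `assertEnv(process.env)` at the top of a scraper script would stop it before a Browserbase session is requested.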

Quick Start

import { Stagehand } from '@browserbasehq/stagehand';

const stagehand = new Stagehand({
  env: 'BROWSERBASE',
  apiKey: process.env.BROWSERBASE_API_KEY,
  projectId: process.env.BROWSERBASE_PROJECT_ID,
  model: {
    modelName: 'google/gemini-3-flash-preview',
    apiKey: process.env.GOOGLE_GENERATIVE_AI_API_KEY,
  },
});

await stagehand.init();
const page = stagehand.context.pages()[0];

// Navigate (Cloudflare bypass is automatic)
await page.goto('https://protected-site.com/search?q=term');
await page.waitForTimeout(5000); // Let page fully load

// AI-powered extraction (instruction-only works best)
const data = await stagehand.extract(`
  Extract all product listings as JSON array:
  [{ "title": "...", "price": 123, "url": "..." }]
  Return ONLY the JSON array.
`);

await stagehand.close();

Key Patterns

1. Instruction-Only Extraction (Recommended)

Schema-based extraction often returns empty. Use natural language instructions instead:

const extraction = await stagehand.extract(`
  Look at this page and extract:
  - All item titles
  - Prices as numbers
  - URLs
  Return as JSON array.
`);

2. Handle Cloudflare Delays

Sometimes the challenge takes longer:

const title = await page.title();
if (title.toLowerCase().includes('moment')) {
  await page.waitForTimeout(10000); // Wait for challenge
}
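
A single fixed wait can be either too short or needlessly long. As a possible refinement, the same title check can be polled until the challenge clears or a time budget runs out. This is a minimal sketch: `waitForChallenge` is a hypothetical helper, and it assumes only that `page` exposes `title()` and `waitForTimeout()` as in the snippet above.

```javascript
// Hypothetical helper: polls the page title until the Cloudflare interstitial
// ("Just a moment...") is gone, or gives up once timeoutMs has been waited.
// Resolves to true if the challenge cleared, false if it was still showing.
async function waitForChallenge(page, { timeoutMs = 15000, intervalMs = 1000 } = {}) {
  for (let waited = 0; ; waited += intervalMs) {
    const title = await page.title();
    if (!title.toLowerCase().includes('moment')) return true; // challenge cleared
    if (waited >= timeoutMs) return false; // budget spent, still challenged
    await page.waitForTimeout(intervalMs);
  }
}
```

Note that the check is a heuristic: a legitimate page whose title happens to contain "moment" would be treated as a challenge page until the budget is spent.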

3. Scroll to Load More

Many sites lazy-load content:

for (let i = 0; i < 5; i++) {
  await page.evaluate(() => window.scrollBy(0, window.innerHeight));
  await page.waitForTimeout(800);
}
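
A fixed count of five scrolls may stop too early on long listings or waste time on short ones. One alternative, sketched here under the assumption that `page.evaluate()` behaves as in the loop above, is to jump to the bottom repeatedly and stop once the document height no longer grows. `scrollUntilStable` is a hypothetical helper name.

```javascript
// Hypothetical helper: scrolls to the bottom until the page height stops
// growing or maxScrolls is reached. Resolves to the number of scrolls performed.
async function scrollUntilStable(page, { maxScrolls = 20, pauseMs = 800 } = {}) {
  let lastHeight = await page.evaluate(() => document.body.scrollHeight);
  for (let scrolls = 1; scrolls <= maxScrolls; scrolls++) {
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    await page.waitForTimeout(pauseMs); // give lazy-loaded items time to render
    const height = await page.evaluate(() => document.body.scrollHeight);
    if (height === lastHeight) return scrolls; // no new content appeared
    lastHeight = height;
  }
  return maxScrolls;
}
```

The `maxScrolls` cap matters on infinite feeds, where the height would otherwise keep growing for as long as the session stays open.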

4. Parse Extraction Results

The result's extraction field holds a string that needs parsing:

let listings = [];
try {
  const jsonMatch = extraction?.extraction?.match(/\[[\s\S]*\]/);
  if (jsonMatch) listings = JSON.parse(jsonMatch[0]);
} catch (e) {
  console.log('Parse error:', e.message);
}
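
Model output is not guaranteed to match the requested shape, so it may be worth validating entries after parsing. The following sketch wraps the same regex-and-parse step in a hypothetical `parseListings` helper and keeps only entries with a string title and a usable numeric price; the field names follow the Quick Start instruction and are otherwise an assumption.

```javascript
// Hypothetical helper: extracts the first JSON array from a model reply and
// normalises each entry to { title, price, url }. Returns [] on any failure.
function parseListings(text) {
  if (typeof text !== 'string') return [];
  const match = text.match(/\[[\s\S]*\]/);
  if (!match) return [];
  let items;
  try {
    items = JSON.parse(match[0]);
  } catch {
    return []; // the bracketed span was not valid JSON
  }
  if (!Array.isArray(items)) return [];
  return items
    .filter((item) => item && typeof item.title === 'string')
    .map((item) => {
      // Strip currency symbols and thousands separators, e.g. "$1,299.50".
      const cleaned = String(item.price ?? '').replace(/[^0-9.]/g, '');
      return {
        title: item.title.trim(),
        price: cleaned === '' ? NaN : Number(cleaned),
        url: typeof item.url === 'string' ? item.url : null,
      };
    })
    .filter((item) => Number.isFinite(item.price));
}
```

With the result object from the snippet above, this would be called as `parseListings(extraction?.extraction)`.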

Browserbase Free Tier Limits

  • 1 concurrent session — cron jobs can conflict with interactive use
  • Sessions auto-close after inactivity
  • Use stagehand.close() to release session immediately
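
With only one concurrent session available, a script that throws before reaching `stagehand.close()` can leave the session occupied until it times out. A try/finally wrapper is one way to guard against that. The sketch below uses a hypothetical `withSession` helper and assumes only that the client exposes `init()` and `close()`, as the Stagehand instance in Quick Start does.

```javascript
// Hypothetical wrapper: initialises the client, runs `task`, and always calls
// close() afterwards, even when the task throws.
async function withSession(client, task) {
  await client.init();
  try {
    return await task(client);
  } finally {
    await client.close(); // release the session right away
  }
}
```

A scraper would then pass its navigation and extraction steps as the `task` callback, for example `await withSession(stagehand, async (sh) => sh.extract('...'))`.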

Cron Integration

For scheduled scraping, use OpenClaw cron with isolated sessions:

openclaw cron add \
  --name "Daily Scrape" \
  --cron "0 6 * * *" \
  --session isolated \
  --message "Run: node ~/scripts/scraper.js"

Troubleshooting

Issue            | Solution
Empty extraction | Use instruction-only (no schema), increase wait time
Cloudflare loop  | Wait 10-15s, check if title contains "moment"
Session limit    | Close other Browserbase sessions, check dashboard
429 errors       | Wait for session to complete, don't retry immediately
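
The advice not to retry immediately on 429 errors can be expressed as a retry with increasing delays. This is only a sketch: `retryOn429` is a hypothetical helper, and the way a 429 is detected (a `status` property of 429, or "429" in the error message) is an assumption about the error shape, not documented Stagehand or Browserbase behaviour.

```javascript
// Hypothetical helper: re-runs `task` after a growing delay when it fails with
// what looks like a 429. Any other error, or a 429 after the last retry, is rethrown.
async function retryOn429(task, options = {}) {
  const {
    retries = 3,
    baseDelayMs = 5000,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      // Assumed error shape: adjust this check to what your client actually throws.
      const is429 = err?.status === 429 || String(err?.message).includes('429');
      if (!is429 || attempt >= retries) throw err;
      await sleep(baseDelayMs * 2 ** attempt); // 5s, 10s, 20s with the defaults
    }
  }
}
```

The `sleep` option exists so the delay can be replaced in tests; in normal use the default timer is enough.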

Example: Full Scraper

See scripts/example_scraper.js for a complete working example.

Security recommendations
Do not install blindly. Key points to consider before using:

  1. The SKILL.md requires BROWSERBASE_API_KEY, BROWSERBASE_PROJECT_ID and GOOGLE_GENERATIVE_AI_API_KEY, but the registry metadata lists none. Ask the publisher to correct the metadata so the required credentials are visible.
  2. Use scoped or disposable API keys (test account) rather than production credentials.
  3. Confirm the source and owner (the skill has no homepage and an unknown source). The absence of code files means you rely entirely on the instructions; request the referenced example script (scripts/example_scraper.js) before running.
  4. Be aware that scraping Cloudflare-protected sites can violate terms of service or laws; ensure you have permission.
  5. Run initial tests in an isolated environment and rotate keys if you expose them during testing.

If the publisher responds and the metadata is fixed (or example scripts are provided), this looks coherent; until then treat it cautiously.
Functional analysis
Type: OpenClaw Skill
Name: browserbase-scraper-skill
Version: 0.1.0
The skill is a legitimate tool for scraping Cloudflare-protected websites using the Browserbase Stagehand library and Google Gemini AI. The instructions in SKILL.md provide standard implementation patterns for these services and do not contain any evidence of data exfiltration, malicious execution, or prompt injection.
Capability assessment
Purpose & Capability
The SKILL.md describes scraping Cloudflare-protected sites with Browserbase/Stagehand and optionally Gemini — those env vars (BROWSERBASE_API_KEY, BROWSERBASE_PROJECT_ID, GOOGLE_GENERATIVE_AI_API_KEY) are coherent with the purpose. However, the registry metadata claims no required env vars or primary credential, which is inconsistent with the instructions and could mislead users about what secrets are needed.
Instruction Scope
The instructions stay within scraping/scraper operation (npm install, Stagehand init, page navigation, waiting, scrolling, extracting and parsing). They do not request unrelated system data. Minor issues: the docs reference a local file (scripts/example_scraper.js) that is not present in the package, and the SKILL.md suggests using 'OpenClaw cron' without providing the example script — this leaves gaps a user would need to fill.
Install Mechanism
This is instruction-only (no install spec) and recommends installing @browserbasehq/stagehand via npm. That is a proportionate, standard install recommendation for the described functionality; nothing in the SKILL.md instructs downloading arbitrary executables or third-party archives.
Credentials
The SKILL.md requires two Browserbase credentials and an LLM API key — reasonable for a cloud-browser + AI extraction flow — but the published registry metadata declares no required environment variables or primary credential. The omission is a material mismatch: users may not realize they must provide API keys. Also the skill example uses process.env directly; verify you will supply only scoped/test keys and not high-privilege production credentials.
Persistence & Privilege
The skill is not always-enabled and does not request system config paths or persistent privileges. There are no install hooks or indications it will modify other skills or system-wide settings.
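How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the dialog: /install browserbase-scraper-skill
  3. Once installed, invoke the Skill by name or trigger it with /browserbase-scraper-skill
  4. Provide the required inputs according to the Skill's parameter description to get structured output
Version history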
v0.1.0
- Initial release of browserbase-scraper skill.
- Enables scraping of Cloudflare-protected and JavaScript-heavy websites using Stagehand and Browserbase cloud browsers.
- Supports AI-powered data extraction via Gemini models.
- Bypasses bot protection automatically, with instructions for handling dynamic content and Cloudflare delays.
- Provides setup, usage examples, and troubleshooting tips for structured data scraping.
Metadata
Slug browserbase-scraper-skill
Version 0.1.0
License MIT-0
Total installs 1
Current installs 1
Versions 1
FAQ

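What is Browserbase Scraper?

Scrape Cloudflare-protected websites using Stagehand + Browserbase cloud browsers. Use when the user needs to extract data from websites with bot protection,... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 627 total downloads to date.

How do I install Browserbase Scraper?

Run the command "/install browserbase-scraper-skill" in the OpenClaw or Claude Code dialog to install it in one step. Installation itself needs no extra configuration, but running the skill requires the environment variables listed under Prerequisites.

Is Browserbase Scraper free?

Yes. Browserbase Scraper is completely free under the MIT-0 license and can be freely downloaded, installed, and used.

Which platforms does Browserbase Scraper support?

Browserbase Scraper is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Browserbase Scraper?

It is developed and maintained by Joe Alicata (@wirelessjoe); the current version is v0.1.0.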
