kirkraman

scrape

By KirkRaman · GitHub · v1.0.1 · MIT-0
cross-platform ⚠ suspicious
128 total downloads · 0 favorites · 0 current installs · 2 versions
Install in OpenClaw
/install jx-scrape
Description
Legal web scraping with robots.txt compliance, rate limiting, and GDPR/CCPA-aware data handling. Supports both direct HTTP scraping and managed scraping via...
Usage (SKILL.md)

Pre-Scrape Compliance Checklist

Before writing any scraping code:

  1. robots.txt — Fetch {domain}/robots.txt and check whether the target path is disallowed. If it is, stop.
  2. Terms of Service — Check /terms, /tos, and /legal. If they explicitly prohibit scraping, you need permission.
  3. Data type — Public factual data (prices, listings) is safer. Personal data triggers GDPR/CCPA.
  4. Authentication — Data behind login is off-limits without authorization. Never scrape protected content.
  5. API available? — If the site offers an API, use it. Always. Scraping when an API exists often violates the ToS.
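Checklist item 1 can be sketched with Python's standard-library `urllib.robotparser`. This is a minimal illustration, not the parser shipped in code.md. The helper names `robots_url`, `robots_allows`, and `robots_allows_live` are hypothetical.

```python
import urllib.robotparser
from urllib.parse import urlsplit


def robots_url(page_url: str) -> str:
    """robots.txt always lives at the root of the page's scheme and host."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def robots_allows(robots_txt: str, page_url: str, user_agent: str = "*") -> bool:
    """Check already-downloaded robots.txt text against a target URL."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)


def robots_allows_live(page_url: str, user_agent: str = "*") -> bool:
    """Fetch {domain}/robots.txt over the network, then check the target URL."""
    parser = urllib.robotparser.RobotFileParser(robots_url(page_url))
    parser.read()  # network request
    return parser.can_fetch(user_agent, page_url)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    print(robots_allows(rules, "https://example.com/private/report"))  # False: stop here
    print(robots_allows(rules, "https://example.com/products/42"))     # True
```

The standard library's `read()` has a relevant default. A robots.txt that answers 401 or 403 is treated as "disallow everything". Other 4xx responses, such as 404, are treated as "allow everything".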

Legal Boundaries

  • Public data, no login — Generally legal (hiQ v. LinkedIn 2022)
  • Bypassing barriers — CFAA violation risk (Van Buren v. US 2021)
  • Ignoring robots.txt — Gray area, often breaches ToS (Meta v. Bright Data 2024)
  • Personal data without consent — GDPR/CCPA violation
  • Republishing copyrighted content — Copyright infringement

Request Discipline

  • Rate limit: Wait at least 2-3 seconds between requests. Faster requests strain the server and increase legal exposure.
  • User-Agent: Use a real browser string plus a contact email: Mozilla/5.0 ... (contact: [email protected])
  • Respect 429: Back off exponentially. Ignoring 429 responses signals intent to harm.
  • Session reuse: Keep connections open to reduce server load.
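The request rules above could look like the following stdlib-only sketch. The 3-second default interval follows the 2-3 second guidance. Several details are illustrative assumptions and do not come from code.md: the placeholder User-Agent, the 120-second backoff cap, and the names `RateLimiter`, `backoff_delay`, and `polite_get`. The clock and sleep functions are injectable, so the limiter can be tested without real waiting.

```python
import random
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

# Placeholder: substitute a full browser UA string and a contact address you monitor.
USER_AGENT = "Mozilla/5.0 (compatible; ExampleScraper/0.1) (contact: scraper-admin@example.com)"


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 120.0) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


class RateLimiter:
    """Enforces a minimum gap between consecutive requests."""

    def __init__(self, min_interval: float = 3.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        # Sleep only for whatever part of the interval has not already elapsed.
        if self._last is not None:
            remaining = self.min_interval - (self.clock() - self._last)
            if remaining > 0:
                self.sleep(remaining)
        self._last = self.clock()


def polite_get(url: str, limiter: RateLimiter, max_retries: int = 5) -> bytes:
    """GET url at a limited rate, backing off exponentially on HTTP 429."""
    for attempt in range(max_retries + 1):
        limiter.wait()
        req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        try:
            with urllib.request.urlopen(req, timeout=30) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_retries:
                raise
            # Honor Retry-After when it is given in seconds; HTTP-date values
            # fall back to the computed backoff.
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after and retry_after.isdigit():
                delay = float(retry_after)
            else:
                delay = backoff_delay(attempt)
            limiter.sleep(delay)
    raise RuntimeError("max_retries must be >= 0")
```

Usage would be along the lines of `polite_get("https://example.com/products", RateLimiter())`. On session reuse, `urllib.request.urlopen` opens a new connection for every call, so this sketch does not keep connections alive. With the third-party `requests` library, a shared `requests.Session` pools connections and covers that point.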

Data Handling

  • Strip PII immediately — Don't collect names, emails, or phone numbers unless legally justified.
  • No fingerprinting — Don't combine data to identify individuals indirectly.
  • Minimize storage — Cache only what you need, delete what you don't.
  • Audit trail — Log what, when, where. Evidence of good faith if challenged.
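Here is a rough sketch of the data-handling rules, assuming scraped records arrive as Python dicts. The regexes are deliberately simple and only illustrative. They will miss some PII (names cannot be caught this way) and may over-match long digit runs such as dates. The helpers `strip_pii`, `minimize`, and `audit` are hypothetical, and the JSON-lines audit format is an assumption.

```python
import json
import re
from datetime import datetime, timezone
from typing import Any, Dict, Iterable

# Crude, illustrative patterns: not a complete PII detector.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def strip_pii(text: str) -> str:
    """Redact email addresses first, then phone-like digit runs."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def minimize(record: Dict[str, Any], keep: Iterable[str]) -> Dict[str, Any]:
    """Keep only allow-listed fields; everything else is dropped before storage."""
    allowed = set(keep)
    return {k: v for k, v in record.items() if k in allowed}


def audit(log_path: str, url: str, status: int, fields: Iterable[str]) -> None:
    """Append one JSON line recording what was fetched, when, and from where."""
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),
        "url": url,
        "status": status,
        "fields": sorted(fields),
    }
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


if __name__ == "__main__":
    raw = {"title": "Widget", "price": "9.99", "seller_email": "jane@example.com",
           "notes": "Call +1 555-123-4567 for bulk orders"}
    kept = minimize(raw, ["title", "price", "notes"])
    kept["notes"] = strip_pii(kept["notes"])
    print(kept)  # {'title': 'Widget', 'price': '9.99', 'notes': 'Call [PHONE] for bulk orders'}
    audit("audit.jsonl", "https://example.com/item/1", 200, kept.keys())
```

Minimizing before redacting means fields you never needed, such as `seller_email` here, are dropped outright instead of being stored in redacted form.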

SkillBoss API Hub Scraping

For managed content extraction with automatic anti-bot handling, use SkillBoss API Hub's scraping capability (type: "scraper"). Authenticate with SKILLBOSS_API_KEY and call https://api.skillbossai.com/v1/pilot. The result is returned at result.data.markdown.
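Below is a hedged, stdlib-only sketch of the managed path. Some details come from the paragraph above: the endpoint, the `SKILLBOSS_API_KEY` variable, `"type": "scraper"`, and the `result.data.markdown` location. Others are assumptions, so check code.md for the real schema. The assumed details are the `"url"` request field, the `Bearer` authorization scheme, and reading `result` as the parsed JSON response. Returning `None` when the key is unset keeps the API key optional, so callers can fall back to direct scraping.

```python
import json
import os
import urllib.request
from typing import Any, Dict, Optional

SKILLBOSS_ENDPOINT = "https://api.skillbossai.com/v1/pilot"


def build_scrape_request(url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the SkillBoss pilot endpoint.

    ASSUMPTION: the "url" field name and the Bearer scheme are guesses;
    only "type": "scraper" is stated in SKILL.md.
    """
    body = json.dumps({"type": "scraper", "url": url}).encode("utf-8")
    return urllib.request.Request(
        SKILLBOSS_ENDPOINT,
        data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def extract_markdown(result: Dict[str, Any]) -> str:
    """Read the extracted page from result.data.markdown, as SKILL.md describes."""
    return result["data"]["markdown"]


def scrape_managed(url: str) -> Optional[str]:
    """Return Markdown via SkillBoss, or None when no API key is configured
    (the caller then falls back to direct, rate-limited scraping)."""
    api_key = os.environ.get("SKILLBOSS_API_KEY")
    if not api_key:
        return None
    with urllib.request.urlopen(build_scrape_request(url, api_key), timeout=60) as resp:
        return extract_markdown(json.load(resp))
```

Anything passed to `scrape_managed` is sent to a third-party service. Keep pages containing sensitive or personal data on the direct path.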

For code patterns, a robots.txt parser, and the SkillBoss API Hub scraping integration, see code.md.

Security recommendations
Before installing: (1) Ask the publisher to correct the registry metadata to declare SKILLBOSS_API_KEY if the managed-scraping path is required, or to state explicitly that the API key is optional. (2) Verify the SkillBoss API host (https://api.skillbossai.com) and the trustworthiness of its operator: the key authorizes calls to that external service, and any scraped content you send through it becomes accessible to that service. (3) If you must use this skill, supply a scoped, revocable API key and avoid sending raw PII to the external API; prefer local/direct scraping code when dealing with sensitive data. (4) Test the skill in a sandboxed environment and review logs to ensure no unexpected endpoints receive scraped data. (5) If the publisher cannot clarify the env/metadata mismatch or the identity of the SkillBoss operator, treat the skill with caution or avoid installing it.
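Recommendation (4) can be partly automated. Given the outbound URLs recorded during a sandboxed test run, the helper below flags any host that is not on an expected list. It is a hypothetical sketch; how you capture the outbound URLs (proxy logs, an audit file, and so on) depends on your environment.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def unexpected_hosts(urls: Iterable[str], allowed: Iterable[str]) -> List[str]:
    """Return each contacted host that is not on the allowlist, in first-seen order."""
    allow = {h.lower() for h in allowed}
    flagged: List[str] = []
    for url in urls:
        host = urlsplit(url).hostname or ""  # .hostname is already lowercased
        if host not in allow and host not in flagged:
            flagged.append(host)
    return flagged


if __name__ == "__main__":
    observed = [
        "https://example.com/robots.txt",
        "https://api.skillbossai.com/v1/pilot",
        "https://unknown.example.net/upload",
    ]
    print(unexpected_hosts(observed, ["example.com", "api.skillbossai.com"]))
    # ['unknown.example.net']
```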
Analysis
Type: OpenClaw Skill · Name: jx-scrape · Version: 1.0.1
The jx-scrape skill bundle provides a well-documented framework for ethical web scraping, emphasizing robots.txt compliance, rate limiting, and GDPR/CCPA awareness. The code in code.md implements standard best practices such as exponential backoff, jitter, and session management, while integrating with the SkillBoss API Hub (api.skillbossai.com) for managed extraction. No malicious patterns, data exfiltration, or prompt-injection risks were identified.
Capability tags
requires-sensitive-credentials
Capability assessment
Purpose & Capability
Name, description, and runtime instructions match: the skill provides polite/legal scraping patterns and an optional managed-scraping path via a SkillBoss API. However, the registry metadata lists no required env vars, while SKILL.md and code.md explicitly require SKILLBOSS_API_KEY. This inconsistency should be resolved: either declare the env var or update the docs to make the API optional.
Instruction Scope
SKILL.md and code.md limit actions to fetching robots.txt, site pages, ToS, and using an external API; they prescribe rate limits, PII-stripping, backoff, and audit logging. There is no instruction to read local files, other env vars, or system state outside the declared scraping flow. The only external network targets are site domains being scraped and api.skillbossai.com (for managed scraping).
Install Mechanism
Instruction-only skill with no install spec and no binaries to fetch or write to disk. Lowest install risk.
Credentials
SKILL.md and code.md require SKILLBOSS_API_KEY (os.environ usage and HTTP Authorization header to api.skillbossai.com), but the registry metadata lists no required env vars — this mismatch is a red flag. Requesting a single API key for an external scraping service is plausible, but the missing declaration and the unknown provenance of SkillBoss increase risk: that key would permit external API calls and potential data exfiltration to that service.
Persistence & Privilege
The skill does not request persistent presence or elevated platform privileges (always:false). It does not modify other skills or system settings in the provided instructions.
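How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. In the chat box, enter the install command: /install jx-scrape
  3. Once installed, invoke the skill by name or trigger it with /jx-scrape.
  4. Provide the required inputs as described in the skill's parameter documentation to get structured output.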
Version history
v1.0.1
- Documentation update only: the code.md file was changed in this release.
- No changes were made to SKILL.md or the functional code.
v1.0.0
Scrape 1.0.0: Initial release
- Enables legal web scraping with strict robots.txt compliance and rate limiting.
- Incorporates GDPR/CCPA-aware data handling, protecting personal information.
- Supports both direct HTTP scraping and managed scraping via SkillBoss API Hub.
- Provides a step-by-step pre-scrape compliance checklist and legal boundaries.
- Provides guidance on responsible request handling and best-practice data management.
Metadata
Slug jx-scrape
Version 1.0.1
License MIT-0
Total installs 0
Current installs 0
Versions 2
FAQ

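What is scrape?

Legal web scraping with robots.txt compliance, rate limiting, and GDPR/CCPA-aware data handling. Supports both direct HTTP scraping and managed scraping via... It is an AI agent skill plugin for Claude Code / OpenClaw and has been downloaded 128 times to date.

How do I install scrape?

Run the command /install jx-scrape in an OpenClaw or Claude Code chat to install it in one step; no additional configuration is needed.

Is scrape free?

Yes. scrape is completely free and released under the MIT-0 license, so you can download, install, and use it freely.

Which platforms does scrape support?

scrape is cross-platform and works in any environment where OpenClaw / Claude Code is deployed.

Who developed scrape?

It is developed and maintained by KirkRaman (@kirkraman); the current version is v1.0.1.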
