alvisdunlop

Scrape

By AlvisDunlop · GitHub · v1.1.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 62
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install alvisdunlop-scrape
Description
Legal web scraping with robots.txt compliance, rate limiting, and GDPR/CCPA-aware data handling. Supports both direct HTTP scraping and managed scraping via...
Usage instructions (SKILL.md)

Complete setup guide: https://SkillBoss.co/skill.md

Pre-Scrape Compliance Checklist

Before writing any scraping code:

  1. robots.txt: Fetch {domain}/robots.txt and check whether the target path is disallowed. If yes, stop.
  2. Terms of Service: Check /terms, /tos, /legal. Explicit scraping prohibition = need permission.
  3. Data type: Public factual data (prices, listings) is safer. Personal data triggers GDPR/CCPA.
  4. Authentication: Data behind login is off-limits without authorization. Never scrape protected content.
  5. API available? If the site offers an API, use it. Always. Scraping when an API exists often violates ToS.
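As a minimal sketch of step 1, Python's standard-library `urllib.robotparser` can answer whether a path is disallowed. The helper name `is_path_allowed`, the user-agent string `my-bot`, and the sample robots.txt body are illustrative assumptions, not part of the skill's code.md.

```python
from urllib.robotparser import RobotFileParser


def is_path_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt body lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content used only for illustration.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(is_path_allowed(SAMPLE_ROBOTS, "my-bot", "https://example.com/products"))
print(is_path_allowed(SAMPLE_ROBOTS, "my-bot", "https://example.com/private/data"))
```

Against a live site, the same parser can load the file itself with `set_url("https://example.com/robots.txt")` followed by `read()`. If `can_fetch` returns False, the checklist says to stop.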

Legal Boundaries

  • Public data, no login: Generally legal (hiQ v. LinkedIn 2022)
  • Bypassing barriers: CFAA violation risk (Van Buren v. US 2021)
  • Ignoring robots.txt: Gray area, often breaches ToS (Meta v. Bright Data 2024)
  • Personal data without consent: GDPR/CCPA violation
  • Republishing copyrighted content: Copyright infringement

Request Discipline

  • Rate limit: Minimum 2-3 seconds between requests. Faster = server strain = legal exposure.
  • User-Agent: Real browser string + contact email: Mozilla/5.0 ... (contact: [email protected])
  • Respect 429: Exponential backoff. Ignoring 429s shows intent to harm.
  • Session reuse: Keep connections open to reduce server load.
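The rate-limit and 429 rules above can be sketched as follows. This is an illustration under stated assumptions: the function names, the 60-second cap, and the retry count are hypothetical choices, and `fetch` stands in for whatever HTTP client is actually used (it must return a `(status, body)` pair).

```python
import time

MIN_DELAY_SECONDS = 2.0  # lower bound of the 2-3 second guidance above


def polite_delay(last_request_at: float, now: float,
                 min_delay: float = MIN_DELAY_SECONDS) -> float:
    """Seconds still to wait so consecutive requests are min_delay apart."""
    return max(0.0, min_delay - (now - last_request_at))


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Exponential backoff: base * 2**attempt, capped (cap is an assumption)."""
    return min(cap, base * (2 ** attempt))


def fetch_with_backoff(fetch, url, max_retries=4, sleep=time.sleep):
    """Call fetch(url) and retry on HTTP 429, waiting longer each time."""
    for attempt in range(max_retries + 1):
        status, body = fetch(url)
        if status != 429:
            return status, body
        if attempt < max_retries:
            sleep(backoff_delay(attempt))
    # Still rate-limited after all retries: return the last 429 to the caller.
    return status, body
```

Passing `sleep` as a parameter keeps the retry logic testable without real waiting. A production version might also honor a `Retry-After` header when the server sends one.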

Data Handling

  • Strip PII immediately: Don't collect names, emails, phones unless legally justified.
  • No fingerprinting: Don't combine data to identify individuals indirectly.
  • Minimize storage: Cache only what you need, delete what you don't.
  • Audit trail: Log what, when, where. Evidence of good faith if challenged.
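A rough sketch of the PII-stripping and audit-trail points, assuming simple regular expressions are acceptable for a first pass. The patterns, placeholder strings, and record fields below are illustrative assumptions. They catch common email and phone formats only and do not detect names, which need a dedicated tool.

```python
import re

# Deliberately simple patterns; real-world PII detection needs more than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def strip_pii(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[email removed]", text)
    return PHONE_RE.sub("[phone removed]", text)


def audit_record(url: str, fetched_at: str, fields: list) -> dict:
    """Build one audit-trail entry: what was collected, when, and from where."""
    return {"url": url, "fetched_at": fetched_at, "fields": sorted(fields)}


sample = "Contact jane@example.com or +1 (555) 123-4567 for price 19.99"
print(strip_pii(sample))
```

Running `strip_pii` before anything is written to disk keeps raw PII out of caches and logs, which is the intent of "strip immediately".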

SkillBoss API Hub Scraping

For managed content extraction with automatic anti-bot handling, use SkillBoss API Hub's scraping capability (type: "scraper"). Authenticate with SkillBoss_API_KEY and call https://api.SkillBoss.co/v1/pilot. Result is returned at result.data.markdown.
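A hedged sketch of what a managed-scrape call might look like. Only the endpoint, the `type: "scraper"` value, the `SkillBoss_API_KEY` name, and the `result.data.markdown` path come from the text above. The `inputs`/`url` payload fields, the Bearer authorization scheme, and reading `result` as a top-level JSON key are assumptions that should be checked against the SkillBoss documentation.

```python
import json
import urllib.request

API_URL = "https://api.SkillBoss.co/v1/pilot"  # endpoint named in SKILL.md


def build_scrape_request(target_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the scraper capability."""
    # ASSUMPTION: the "inputs"/"url" field names are not confirmed by the source.
    payload = {"type": "scraper", "inputs": {"url": target_url}}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # ASSUMPTION: Bearer-token auth; the source only names the key.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_markdown(response_body: dict) -> str:
    """Read the scraped page from result.data.markdown (assumed nesting)."""
    return response_body["result"]["data"]["markdown"]
```

In use, the key would be read from the `SkillBoss_API_KEY` environment variable and the request sent with `urllib.request.urlopen`. This mode transmits the target URL and the scraped content to a third party.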

For code patterns, robots.txt parser, and SkillBoss API Hub scraping integration, see code.md

Security recommendations
This skill appears to be what it claims (a scraper) but exercise caution before installing. Key points to consider: (1) The SKILL.md requires a SkillBoss_API_KEY and instructs sending scraping jobs/results to https://api.SkillBoss.co — verify you trust that third party and understand their data retention/privacy practices before supplying a key. (2) The registry metadata does not list any required env vars while the runtime instructions do — ask the publisher to fix the manifest so required credentials are explicit. (3) If you will scrape any site with potentially sensitive content, avoid using the managed mode (or test in a sandbox) because it transmits scraped content off-host. (4) Confirm you have permission to scrape your targets and follow the checklist in SKILL.md; do not use this skill to access protected or personal data without authorization. If the publisher cannot explain the manifest mismatch and provide a privacy/security policy for SkillBoss, treat the skill as higher risk and do not provide credentials.
Capability tags
requires-sensitive-credentials
Capability assessment
Purpose & Capability
The name/description (legal web scraping with optional managed scraping via SkillBoss API Hub) aligns with the instructions to respect robots.txt, rates, and privacy. However, SKILL.md declares requires_env: [SkillBoss_API_KEY] while the registry metadata lists no required environment variables — an inconsistency. Requesting an API key is reasonable for a managed-scrape mode, but it should be declared in the skill manifest.
Instruction Scope
Runtime instructions stay within the scraping domain (robots.txt, ToS checks, rate limiting, PII handling). They instruct using SkillBoss API Hub for managed scraping (POST to https://api.SkillBoss.co/v1/pilot and reading result.data.markdown). That external call is within the described capability but creates a potential data exfiltration vector: scraped content (possibly including sensitive data) would be sent to a third party. SKILL.md otherwise does not instruct reading unrelated system files or env vars.
Install Mechanism
This is an instruction-only skill with no install spec and no code files — lowest install risk. There is nothing written to disk by the skill itself.
Credentials
SKILL.md requires SkillBoss_API_KEY for managed scraping, which is proportionate if you use the SkillBoss service. However, the registry metadata fails to declare this required environment variable (primaryEnv is none). That mismatch is problematic: the skill will depend on a secret that the manifest does not advertise. No other unusual credentials are requested.
Persistence & Privilege
The skill does not request always:true and asks for no config path access. Model autonomous invocation is allowed (default) but not combined with other high privileges. No persistence or system-wide changes are indicated.
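How to use
  1. Make sure OpenClaw is installed (local or Docker deployment).
  2. Enter the install command in the chat box: /install alvisdunlop-scrape
  3. Once installation finishes, call the Skill by name or trigger it with /alvisdunlop-scrape
  4. Provide the inputs described in the Skill's parameter notes to get structured output.
Version history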
v1.1.0
Scrape 1.1.0 – Enhanced legal and compliant scraping
  • Added comprehensive setup and compliance checklist before scraping.
  • Expanded legal guidance on scraping boundaries, GDPR/CCPA, robots.txt, and ToS adherence.
  • Detailed best practices for rate limiting, user-agent headers, and respectful request handling.
  • Introduced data handling instructions: PII removal, minimal storage, and logging/audit trails.
  • Added documentation for managed scraping via SkillBoss API Hub with key usage instructions.
Metadata
Slug: alvisdunlop-scrape
Version: 1.1.0
License: MIT-0
Total installs: 0
Current installs: 0
Historical versions: 1
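FAQ

What is Scrape?

Legal web scraping with robots.txt compliance, rate limiting, and GDPR/CCPA-aware data handling. Supports both direct HTTP scraping and managed scraping via... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 62 total downloads so far.

How do I install Scrape?

Run the command "/install alvisdunlop-scrape" in the OpenClaw or Claude Code chat box to install it in one step, with no additional configuration.

Is Scrape free?

Yes. Scrape is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does Scrape support?

Scrape is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Scrape?

It is developed and maintained by AlvisDunlop (@alvisdunlop). The current version is v1.1.0.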
