
Scrape

Author: AlvisDunlop · GitHub · v2.0.0 · MIT-0
cross-platform ⚠ suspicious
68 total downloads · 0 favorites · 0 current installs · 1 version
Install in OpenClaw
/install alvis2-scrape
Description
Legal web scraping with robots.txt compliance, rate limiting, and GDPR/CCPA-aware data handling. Supports both direct HTTP scraping and managed scraping via...
Usage instructions (SKILL.md)

Complete setup guide: https://SkillBoss.co/skill.md

Pre-Scrape Compliance Checklist

Before writing any scraping code:

  1. robots.txt → Fetch {domain}/robots.txt and check whether the target path is disallowed. If it is, stop.
  2. Terms of Service → Check /terms, /tos, /legal. An explicit scraping prohibition means you need permission.
  3. Data type → Public factual data (prices, listings) is safer. Personal data triggers GDPR/CCPA.
  4. Authentication → Data behind a login is off-limits without authorization. Never scrape protected content.
  5. API available? → If the site offers an API, use it, always. Scraping when an API exists often violates the ToS.
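Step 1 can be automated with Python's standard-library robots.txt parser. The sketch below assumes Python 3.9+. The function names robots_url_for, is_allowed and fetch_and_check are hypothetical and not part of this skill. The other four steps (ToS, data type, authentication, API availability) still need a human decision.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url_for(target_url: str) -> str:
    """Build {scheme}://{host}/robots.txt for the page we intend to scrape."""
    parts = urlsplit(target_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def is_allowed(target_url: str, user_agent: str, robots_txt: str) -> bool:
    """Return False if the given robots.txt text disallows target_url for this agent."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so rules can be checked without network access.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, target_url)


def fetch_and_check(target_url: str, user_agent: str) -> bool:
    """Live variant: download robots.txt over HTTP, then check the target path."""
    parser = RobotFileParser()
    parser.set_url(robots_url_for(target_url))
    # In CPython, a 401/403 marks everything disallowed; other 4xx allow everything.
    parser.read()
    return parser.can_fetch(user_agent, target_url)
```

If is_allowed or fetch_and_check returns False, the checklist says to stop rather than look for a workaround.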

Legal Boundaries

  • Public data, no login → Generally legal (hiQ v. LinkedIn 2022)
  • Bypassing barriers → CFAA violation risk (Van Buren v. US 2021)
  • Ignoring robots.txt → Gray area, often breaches ToS (Meta v. Bright Data 2024)
  • Personal data without consent → GDPR/CCPA violation
  • Republishing copyrighted content → Copyright infringement

Request Discipline

  • Rate limit: Minimum 2-3 seconds between requests. Faster = server strain = legal exposure.
  • User-Agent: Real browser string + contact email: Mozilla/5.0 ... (contact: [email protected])
  • Respect 429: Exponential backoff. Ignoring 429s shows intent to harm.
  • Session reuse: Keep connections open to reduce server load.
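The rules above can be combined into a single fetcher. The sketch below uses only the standard library and makes several assumptions:

  • A fixed minimum gap between requests (2.5 s, the midpoint of the 2-3 s guidance).
  • Exponential backoff with jitter on HTTP 429, honoring a numeric Retry-After header when the server sends one.
  • A placeholder User-Agent carrying a contact address.

PoliteFetcher, the backoff constants and the contact address are all hypothetical.

```python
import random
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

# Hypothetical identity string: replace the bot name and contact address with your own.
USER_AGENT = "Mozilla/5.0 (compatible; ExampleBot/1.0; contact: you@example.com)"


class PoliteFetcher:
    """Sequential fetcher: minimum gap between requests, backoff on HTTP 429."""

    def __init__(
        self,
        min_delay: float = 2.5,  # assumption: midpoint of the 2-3 s guidance
        max_retries: int = 5,
        base_backoff: float = 2.0,
        user_agent: str = USER_AGENT,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_delay = min_delay
        self.max_retries = max_retries
        self.base_backoff = base_backoff
        self.user_agent = user_agent
        self._clock = clock
        self._sleep = sleep
        self._last_request: Optional[float] = None

    def wait_for_slot(self) -> None:
        """Sleep until at least min_delay seconds have passed since the last request."""
        if self._last_request is not None:
            elapsed = self._clock() - self._last_request
            if elapsed < self.min_delay:
                self._sleep(self.min_delay - elapsed)
        self._last_request = self._clock()

    def backoff_delay(self, attempt: int, retry_after: Optional[float] = None) -> float:
        """Use Retry-After if given (never below min_delay), else base * 2**attempt + jitter."""
        if retry_after is not None:
            return max(retry_after, self.min_delay)
        return self.base_backoff * (2 ** attempt) + random.uniform(0.0, 1.0)

    def fetch(self, url: str) -> bytes:
        request = urllib.request.Request(url, headers={"User-Agent": self.user_agent})
        for attempt in range(self.max_retries + 1):
            self.wait_for_slot()
            try:
                with urllib.request.urlopen(request, timeout=30) as response:
                    return response.read()
            except urllib.error.HTTPError as err:
                # Only 429 is retried; any other error, or the final 429, propagates.
                if err.code != 429 or attempt == self.max_retries:
                    raise
                header = err.headers.get("Retry-After")
                retry_after = float(header) if header and header.isdigit() else None
                self._sleep(self.backoff_delay(attempt, retry_after))
        raise RuntimeError("unreachable: the loop always returns or raises")
```

The clock and sleep parameters exist so the timing logic can be tested without real waits. This sketch does not cover session reuse: urllib opens a new connection for each request. A pooling client such as requests.Session is the usual way to keep connections open.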

Data Handling

  • Strip PII immediately → Don't collect names, emails, or phone numbers unless legally justified.
  • No fingerprinting → Don't combine data to identify individuals indirectly.
  • Minimize storage → Cache only what you need; delete what you don't.
  • Audit trail → Log what, when, and where. It is evidence of good faith if challenged.
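A rough sketch of three of these rules: regex redaction, a field allow-list, and a JSON-lines audit entry. The function names, patterns and log fields are hypothetical. Regexes only catch email addresses and phone-like digit runs. They miss names entirely and will also redact some dates or IDs, so this is a first filter, not GDPR/CCPA compliance.

```python
import json
import re
from datetime import datetime, timezone
from typing import Optional

# Heuristic patterns (assumptions): crude on purpose, tune for your data.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def strip_pii(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def keep_fields(record: dict, allowed: set) -> dict:
    """Minimize storage: drop every field not explicitly allow-listed."""
    return {k: v for k, v in record.items() if k in allowed}


def audit_entry(url: str, fields: list, when: Optional[datetime] = None) -> str:
    """One JSON line recording what was collected, when, and from where."""
    when = when or datetime.now(timezone.utc)
    return json.dumps(
        {"timestamp": when.isoformat(), "url": url, "fields": sorted(fields)},
        sort_keys=True,
    )
```

Appending each audit_entry line to a log file gives the "what, when, where" trail described above.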

SkillBoss API Hub Scraping

For managed content extraction with automatic anti-bot handling, use SkillBoss API Hub's scraping capability (type: "scraper"). Authenticate with SkillBoss_API_KEY and call https://api.SkillBoss.co/v1/pilot. The result is returned at result.data.markdown.
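The listing gives only the endpoint, the "scraper" type, the key name and the result path. It does not give the request schema, so everything else in this sketch is an assumption:

  • the bearer-token header;
  • the "inputs"/"url" body fields;
  • the response being a JSON object with a top-level "result" key.

The helper names are hypothetical. Also note that the target URL and the scraped content pass through SkillBoss's service.

```python
import json
import os
import urllib.request
from typing import Optional

# Endpoint as named in SKILL.md; request and response shapes below are assumptions.
PILOT_URL = "https://api.SkillBoss.co/v1/pilot"


def build_scrape_request(target_url: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST to the managed scraper."""
    key = api_key or os.environ["SkillBoss_API_KEY"]
    # Only "type": "scraper" comes from SKILL.md; "inputs"/"url" are hypothetical field names.
    body = json.dumps({"type": "scraper", "inputs": {"url": target_url}}).encode("utf-8")
    return urllib.request.Request(
        PILOT_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {key}",  # assumption: bearer-token auth
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_markdown(response_json: dict) -> str:
    """Read result.data.markdown, the output location SKILL.md describes."""
    return response_json["result"]["data"]["markdown"]
```

To actually send the request, pass it to urllib.request.urlopen and hand json.load(response) to extract_markdown. Check the real schema in the missing code.md or the setup guide first.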

For code patterns, a robots.txt parser, and the SkillBoss API Hub scraping integration, see code.md.

Security recommendations
Before installing, get clarification from the publisher: (1) Why does the registry metadata omit SkillBoss_API_KEY while SKILL.md requires it? (2) What exactly is sent to https://api.SkillBoss.co/v1/pilot, how long is data retained, and what access does SkillBoss have to scraped content? (3) Ask for the missing code.md and the setup guide contents to verify no additional hidden steps. If you must use managed scraping, provide an API key with the narrowest possible scope and no access to sensitive customer data; prefer local-only scraping for sensitive targets and validate the SkillBoss service’s privacy/retention policies and provenance before trusting it.
Capability analysis
Type: OpenClaw Skill · Name: alvis2-scrape · Version: 2.0.0. The skill bundle provides a framework for legal web scraping with a strong emphasis on compliance, including robots.txt adherence, rate limiting, and GDPR/CCPA awareness. The instructions in SKILL.md act as safety constraints for the AI agent rather than malicious injections. It uses a documented API endpoint (api.SkillBoss.co) for managed scraping, which is consistent with its stated purpose.
Capability tags
requires-sensitive-credentials
Capability assessment
Purpose & Capability
The skill's stated purpose (legal scraping with optional managed scraping via SkillBoss) matches the SKILL.md content: it describes robots.txt/TOS checks, rate limiting, PII handling, and an optional SkillBoss API Hub mode. However, the registry metadata claims no required env vars or primary credential while SKILL.md explicitly declares requires_env: [SkillBoss_API_KEY]. That mismatch is unexpected and unexplained.
Instruction Scope
The runtime instructions are mostly scoped to scraping best practices (robots.txt, ToS, rate limits, PII stripping). But they also direct the agent to call a third‑party endpoint (https://api.SkillBoss.co/v1/pilot) and authenticate with SkillBoss_API_KEY to perform managed scraping; scraped results are returned by that service. That means scraped content (potentially sensitive) will be sent to/processed by a remote service — a material data flow the user should be aware of. The SKILL.md also references a `code.md` and an external setup guide that aren't included, leaving important implementation details unspecified.
Install Mechanism
This is an instruction-only skill with no install spec and no code files, so nothing is written to disk by an installer. That limits installer-related risk.
Credentials
SKILL.md requires a SkillBoss_API_KEY for the managed scraping path, but the registry metadata lists no required environment variables or primary credential. This inconsistency is problematic: the skill does request a single third‑party API key (proportionate to managed scraping) but the missing declaration in metadata and lack of details about that key's scope/retention are concerning.
Persistence & Privilege
The skill is not always-enabled, is user-invocable, and does not request system config paths or other skills' credentials. It does not request persistent system presence or elevated privileges.
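How to use
  1. Make sure OpenClaw is installed (local or Docker deployment).
  2. In the chat box, enter the install command: /install alvis2-scrape
  3. After installation, call the skill by name or trigger it with /alvis2-scrape.
  4. Provide the required inputs as described in the skill's parameter notes to get structured output.
Version history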
v2.0.0
Fixed API to api.skillboss.co
Metadata
Slug alvis2-scrape
Version 2.0.0
License MIT-0
Total installs 0
Current installs 0
Version count 1
FAQ

What is Scrape?

Legal web scraping with robots.txt compliance, rate limiting, and GDPR/CCPA-aware data handling. Supports both direct HTTP scraping and managed scraping via... It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 68 times so far.

How do I install Scrape?

Run the command "/install alvis2-scrape" in the OpenClaw or Claude Code chat box to install it in one step, with no additional configuration.

Is Scrape free?

Yes. Scrape is completely free under the MIT-0 license and can be freely downloaded, installed, and used.

Which platforms does Scrape support?

Scrape is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Scrape?

It is developed and maintained by AlvisDunlop (@alvisdunlop). The current version is v2.0.0.
