alvisdunlop

Scrape

Author: AlvisDunlop · GitHub ↗ · v2.0.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 85
Favorites: 0
Current installs: 0
Versions: 2
Install in OpenClaw
/install alvis-scrape-v2
Description
Legal web scraping with robots.txt compliance, rate limiting, and GDPR/CCPA-aware data handling. Supports both direct HTTP scraping and managed scraping via...
Usage instructions (SKILL.md)

Complete setup guide: https://SkillBoss.co/skill.md

Pre-Scrape Compliance Checklist

Before writing any scraping code:

  1. robots.txt - Fetch {domain}/robots.txt, check if target path is disallowed. If yes, stop.
  2. Terms of Service - Check /terms, /tos, /legal. Explicit scraping prohibition = need permission.
  3. Data type - Public factual data (prices, listings) is safer. Personal data triggers GDPR/CCPA.
  4. Authentication - Data behind login is off-limits without authorization. Never scrape protected content.
  5. API available? - If site offers an API, use it. Always. Scraping when API exists often violates ToS.
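
The robots.txt check in step 1 can be sketched with Python's standard-library urllib.robotparser. This is a minimal illustration, not the skill's own parser (which lives in code.md); the helper name is_allowed, the user agent "example-bot", and the sample robots.txt body are all assumptions made for the example.

```python
from urllib import robotparser

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules already in memory
    return parser.can_fetch(user_agent, url)

# Hypothetical robots.txt body, used only for this illustration.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

blocked = is_allowed(SAMPLE_ROBOTS, "example-bot", "https://example.com/private/page")
allowed = is_allowed(SAMPLE_ROBOTS, "example-bot", "https://example.com/products")
print(blocked, allowed)
```

Against a live site, the same parser can load the file itself via set_url("{domain}/robots.txt") followed by read(). If can_fetch returns False for the target path, the checklist says to stop.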

Legal Boundaries

  • Public data, no login - Generally legal (hiQ v. LinkedIn 2022)
  • Bypassing barriers - CFAA violation risk (Van Buren v. US 2021)
  • Ignoring robots.txt - Gray area, often breaches ToS (Meta v. Bright Data 2024)
  • Personal data without consent - GDPR/CCPA violation
  • Republishing copyrighted content - Copyright infringement

Request Discipline

  • Rate limit: Minimum 2-3 seconds between requests. Faster = server strain = legal exposure.
  • User-Agent: Real browser string + contact email: Mozilla/5.0 ... (contact: [email protected])
  • Respect 429: Exponential backoff. Ignoring 429s shows intent to harm.
  • Session reuse: Keep connections open to reduce server load.
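
The rate-limit and 429 rules above can be combined into one small wrapper. The sketch below is an assumption-laden illustration rather than the skill's implementation: fetch_politely is a hypothetical helper, and the fetch callable stands in for whatever HTTP client is used (ideally one backed by a reused session, per the last bullet). It pauses before every attempt and doubles the pause after each 429.

```python
import time
from typing import Callable, Tuple

def fetch_politely(
    fetch: Callable[[str], Tuple[int, str]],
    url: str,
    *,
    min_interval: float = 2.0,   # lower bound from the rate-limit rule
    max_retries: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call fetch(url) with a pause before each attempt; back off on HTTP 429."""
    delay = min_interval
    status, body = 0, ""
    for _ in range(max_retries + 1):
        sleep(delay)             # wait before every request, including the first
        status, body = fetch(url)
        if status != 429:
            return status, body
        delay *= 2               # exponential backoff: 2s, 4s, 8s, ...
    return status, body          # still 429 after all retries; caller should stop

# Simulated server (hypothetical): answers 429 twice, then 200.
responses = iter([(429, ""), (429, ""), (200, "ok")])
waits = []
result = fetch_politely(lambda u: next(responses), "https://example.com/page",
                        sleep=waits.append)
print(result, waits)
```

Injecting sleep keeps the example testable without real delays. A production version would also honor a Retry-After header when the server sends one.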

Data Handling

  • Strip PII immediately - Don't collect names, emails, phones unless legally justified.
  • No fingerprinting - Don't combine data to identify individuals indirectly.
  • Minimize storage - Cache only what you need, delete what you don't.
  • Audit trail - Log what, when, where. Evidence of good faith if challenged.
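
As a rough illustration of the first and last points, the sketch below redacts emails and phone-like numbers from scraped text and builds a one-line audit record. The regular expressions are deliberately crude assumptions (they will miss many formats and cannot detect names), and the function names redact_pii and audit_record are hypothetical.

```python
import json
import re
from datetime import datetime, timezone

# Crude patterns for illustration only; real PII detection needs more care.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact_pii(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    return PHONE_RE.sub("[REDACTED_PHONE]", text)

def audit_record(url: str, status: int) -> str:
    """Build a JSON log line recording what was fetched, from where, and when."""
    return json.dumps({
        "url": url,
        "status": status,
        "fetched_at": datetime.now(timezone.utc).isoformat(),
    })

sample = "Contact jane@example.com or +1 (555) 010-9999 for the $19.99 plan"
clean = redact_pii(sample)
print(clean)
print(audit_record("https://example.com/pricing", 200))
```

Redacting before anything is written to disk keeps the stored data minimal, while the audit line records only the request itself, not the scraped content.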

SkillBoss API Hub Scraping

For managed content extraction with automatic anti-bot handling, use SkillBoss API Hub's scraping capability (type: "scraper"). Authenticate with SkillBoss_API_KEY and call https://api.SkillBoss.co/v1/pilot. Result is returned at result.data.markdown.
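
The call described above might be assembled roughly as follows. Only the endpoint, the type value "scraper", the SkillBoss_API_KEY name, and the result.data.markdown path come from the text; the Bearer authorization scheme, the "inputs" and "url" payload fields, and the reading of result as the parsed JSON response body are assumptions that should be checked against the SkillBoss documentation before use. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

API_URL = "https://api.SkillBoss.co/v1/pilot"  # endpoint named in the text

def build_scrape_request(target_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the managed scraper."""
    # ASSUMPTION: the "inputs"/"url" field names are guesses, not documented here.
    payload = {"type": "scraper", "inputs": {"url": target_url}}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # ASSUMPTION: Bearer-token auth; the source only names the key.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def extract_markdown(result: dict) -> str:
    """Read the markdown at result.data.markdown.

    ASSUMPTION: `result` is the parsed JSON body of the response.
    """
    return result["data"]["markdown"]

request = build_scrape_request("https://example.com", "test-key")
print(request.get_method(), json.loads(request.data)["type"])
print(extract_markdown({"data": {"markdown": "# Example"}}))
```

In practice the key would be read from the environment (for example os.environ["SkillBoss_API_KEY"]) rather than hard-coded. Note that anything sent to this endpoint, including the target URL and the scraped content, leaves the local machine.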

For code patterns, robots.txt parser, and SkillBoss API Hub scraping integration, see code.md

Security recommendations
This skill looks like a compliance-minded scraping checklist plus an option to use a managed scraping service (SkillBoss). Before installing or using it: (1) ask the publisher to correct the registry metadata to declare SkillBoss_API_KEY (and any primary credential) so you can make an informed decision; (2) confirm what data is sent to https://api.SkillBoss.co/v1/pilot, whether SkillBoss stores or shares scraped content, and whether they retain PII — treat the API key as sensitive and restrict/rotate it; (3) verify you have legal authorization to scrape your target and prefer official APIs when available; (4) if you won't use the SkillBoss-managed path, confirm that the skill will not attempt to call that endpoint or require the API key at runtime. If you cannot get clear answers, treat the skill cautiously or avoid installing it.
Functional analysis
Type: OpenClaw Skill Name: alvis-scrape-v2 Version: 2.0.0 The skill bundle provides metadata and instructions for performing ethical and legal web scraping. The SKILL.md file contains extensive guidelines on robots.txt compliance, rate limiting, and data privacy (GDPR/CCPA), and it directs the agent to use a managed scraping service via the SkillBoss API (api.SkillBoss.co). No malicious code, data exfiltration logic, or deceptive prompt injections were found in the provided files.
Capability tags
requires-sensitive-credentials
Capability assessment
Purpose & Capability
The skill claims to support both direct HTTP scraping and managed scraping via SkillBoss. Requiring a SkillBoss API key for the managed path is coherent, but the published registry metadata lists no required environment variables or primary credential while the SKILL.md frontmatter explicitly lists SkillBoss_API_KEY. That metadata omission is an inconsistency that should be explained.
Instruction Scope
SKILL.md stays on-topic (robots.txt checks, rate limits, PII handling). However it explicitly instructs using the SkillBoss API Hub (https://api.SkillBoss.co/v1/pilot) for managed scraping, which means scraped content (potentially including PII) will be transmitted to an external service. That external data flow is reasonable for a managed-scrape feature but should be disclosed in metadata and privacy review before use.
Install Mechanism
Instruction-only skill with no install spec or code files; nothing is written to disk by the skill itself. This is the lowest install risk.
Credentials
SKILL.md requires SkillBoss_API_KEY but the skill metadata declares no required env vars or primary credential. Requesting an API key for an external managed service is proportionate to the described capability, but the registry should list that credential explicitly. Also consider that providing an API key will allow an external service to receive scraped data — ensure that is acceptable and that the key's scope and storage are limited.
Persistence & Privilege
always is false and the skill is user-invocable. There is no indication the skill requires permanent presence or modifies other skills/configs.
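How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install alvis-scrape-v2
  3. Once installed, invoke the Skill by name or trigger it with /alvis-scrape-v2
  4. Provide the required inputs according to the Skill's parameter description to get structured output
Version history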
v2.0.0
No changes in functionality or documentation.
  • Version 2.0.0 released with no detected file changes.
  • SKILL.md and all related content remain identical to the previous version.
  • No updates or new features introduced in this release.
v1.0.0
alvis-scrape-v2 1.0.0
  • Initial release of the Scrape skill with a strong legal compliance focus.
  • Enforces robots.txt checking, rate limiting, and GDPR/CCPA-friendly data handling.
  • Supports both direct HTTP scraping and managed scraping via SkillBoss API Hub.
  • Includes a detailed pre-scrape compliance checklist and best practices for data privacy.
  • Requires SkillBoss API key for API-based managed scraping.
Metadata
Slug alvis-scrape-v2
Version 2.0.0
License MIT-0
Total installs 0
Current installs 0
Versions 2
FAQ

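What is Scrape?

Legal web scraping with robots.txt compliance, rate limiting, and GDPR/CCPA-aware data handling. Supports both direct HTTP scraping and managed scraping via... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 85 total downloads to date.

How do I install Scrape?

Run the command "/install alvis-scrape-v2" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Scrape free?

Yes. Scrape is completely free under the MIT-0 license and can be freely downloaded, installed, and used.

Which platforms does Scrape support?

Scrape is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Scrape?

Developed and maintained by AlvisDunlop (@alvisdunlop); the current version is v2.0.0.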
