alvisdunlop

Alvis Web Scrape

Author: AlvisDunlop · GitHub · v1.0.5 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 93
Favorites: 0
Current installs: 0
Versions: 2
Install in OpenClaw
/install alvis-web-scrape
Description
Legal web scraping with robots.txt compliance, rate limiting, and GDPR/CCPA-aware data handling. Supports both direct HTTP scraping and managed scraping via...
Usage instructions (SKILL.md)

Complete setup guide: https://SkillBoss.co/skill.md

Pre-Scrape Compliance Checklist

Before writing any scraping code:

  1. robots.txt: Fetch {domain}/robots.txt and check whether the target path is disallowed. If it is, stop.
  2. Terms of Service: Check /terms, /tos, /legal. An explicit scraping prohibition means you need permission.
  3. Data type: Public factual data (prices, listings) is safer. Personal data triggers GDPR/CCPA.
  4. Authentication: Data behind a login is off-limits without authorization. Never scrape protected content.
  5. API available? If the site offers an API, use it. Always. Scraping when an API exists often violates the ToS.
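
As an illustration of step 1, here is a minimal sketch of a robots.txt check using Python's standard `urllib.robotparser`. The helper name `is_path_allowed`, the sample rules, the `example-bot` user agent, and the `example.com` URLs are all hypothetical; the referenced code.md is not part of this listing, so this is not necessarily the parser the skill itself uses.

```python
from urllib.robotparser import RobotFileParser


def is_path_allowed(robots_lines, user_agent, url):
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse robots.txt content that was already fetched
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content for an example domain.
sample_robots = [
    "User-agent: *",
    "Disallow: /private/",
]

allowed = is_path_allowed(sample_robots, "example-bot", "https://example.com/products")
blocked = is_path_allowed(sample_robots, "example-bot", "https://example.com/private/data")
print(allowed, blocked)
```

Against a live site, `RobotFileParser.set_url()` followed by `read()` fetches and parses {domain}/robots.txt directly. The sketch parses in-memory lines so it can be run without network access.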

Legal Boundaries

  • Public data, no login: Generally legal (hiQ v. LinkedIn 2022)
  • Bypassing barriers: CFAA violation risk (Van Buren v. US 2021)
  • Ignoring robots.txt: Gray area, often breaches ToS (Meta v. Bright Data 2024)
  • Personal data without consent: GDPR/CCPA violation
  • Republishing copyrighted content: Copyright infringement

Request Discipline

  • Rate limit: Minimum 2-3 seconds between requests. Faster = server strain = legal exposure.
  • User-Agent: Real browser string + contact email: Mozilla/5.0 ... (contact: [email protected])
  • Respect 429: Exponential backoff. Ignoring 429s shows intent to harm.
  • Session reuse: Keep connections open to reduce server load.
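
The 429 rule above can be sketched as follows. This is a minimal illustration, not the skill's own code: `backoff_delay`, `polite_fetch`, the 2-second base (chosen to match the minimum interval above), and the 60-second cap are assumptions, and `fetch` is a hypothetical callable returning a `(status, body)` pair so that any HTTP client can be plugged in.

```python
import time


def backoff_delay(attempt, base=2.0, cap=60.0):
    """Exponential backoff: base * 2**attempt seconds, capped at cap."""
    return min(cap, base * (2 ** attempt))


def polite_fetch(fetch, url, max_retries=4, sleep=time.sleep):
    """Call fetch(url) -> (status, body); wait and retry while the server says 429."""
    for attempt in range(max_retries + 1):
        status, body = fetch(url)
        if status != 429:
            return status, body
        if attempt < max_retries:
            sleep(backoff_delay(attempt))  # wait longer after each 429
    return status, body  # still rate limited after all retries
```

Passing `sleep` as a parameter keeps the function testable without real waiting. A production version would likely also honor a `Retry-After` header when the server sends one.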

Data Handling

  • Strip PII immediately: Don't collect names, emails, phones unless legally justified.
  • No fingerprinting: Don't combine data to identify individuals indirectly.
  • Minimize storage: Cache only what you need, delete what you don't.
  • Audit trail: Log what, when, where. Evidence of good faith if challenged.
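
A rough sketch of the "strip PII immediately" step, assuming simple regular expressions are acceptable for a first pass. The patterns, the `strip_pii` name, and the replacement markers are illustrative assumptions; they catch common email and phone formats only and do not detect names.

```python
import re

# Illustrative patterns only; real-world PII detection needs broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def strip_pii(text):
    """Replace email addresses and phone-like digit runs before storing text."""
    text = EMAIL_RE.sub("[email removed]", text)
    return PHONE_RE.sub("[phone removed]", text)


cleaned = strip_pii("Contact jane@example.com or +1 (555) 123-4567 for price 19.99")
print(cleaned)
```

Running the redaction before anything is written to disk means the raw values never enter storage, which also supports the storage-minimization point above.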

SkillBoss API Hub Scraping

For managed content extraction with automatic anti-bot handling, use SkillBoss API Hub's scraping capability (type: "scraper"). Authenticate with SkillBoss_API_KEY and call https://api.SkillBoss.co/v1/pilot. Result is returned at result.data.markdown.

For code patterns, robots.txt parser, and SkillBoss API Hub scraping integration, see code.md

Security recommendations
Before installing: (1) Confirm the SkillBoss_API_KEY requirement — registry metadata omits it but SKILL.md requires it; don't supply an API key unless you trust the SkillBoss service. (2) Verify the SkillBoss domain and read its privacy/retention policy — managed scraping sends scraped content to api.SkillBoss.co, so PII or copyrighted content could leave your environment. (3) Ask the publisher for the missing code.md and any example code to confirm what the agent will execute and what data is transmitted. (4) If you must test, do so with a limited, non-sensitive API key and on non-sensitive targets; review logs and backoff behavior. (5) Remember legal responsibility stays with you — follow the checklist in SKILL.md and obtain permission before scraping protected content.
Functional analysis
Type: OpenClaw Skill Name: alvis-web-scrape Version: 1.0.5 The skill bundle provides documentation and instructions for performing ethical and compliant web scraping. It emphasizes adherence to robots.txt, rate limiting, and legal frameworks like GDPR/CCPA, while referencing an external managed scraping service (api.SkillBoss.co). No malicious code, data exfiltration logic, or harmful prompt injections were found in the provided files.
Capability tags
requires-sensitive-credentials
Capability assessment
Purpose & Capability
Name/description describe a legal web-scraping helper and the SKILL.md explains both direct scraping and an integration with SkillBoss API Hub — functionally consistent. However the registry metadata claims no required env vars while SKILL.md declares requires_env: [SkillBoss_API_KEY], an internal inconsistency that affects capability and trust.
Instruction Scope
Runtime instructions are focused on expected scraping tasks (robots.txt, ToS checks, rate limits, PII handling) and specify use of SkillBoss API Hub for managed scraping. Nothing in SKILL.md instructs the agent to read unrelated files or system credentials. Missing referenced artefacts: SKILL.md points to code.md and an external setup guide (https://SkillBoss.co/skill.md) but no code files or code.md are included in the package — that reduces transparency.
Install Mechanism
There is no install spec and no code files (instruction-only), so nothing will be downloaded or written by an installer. This minimizes install-time risk.
Credentials
SKILL.md requires a single API key (SkillBoss_API_KEY) which is proportionate to using a hosted scraping API, but the skill registry metadata lists no required env vars — a mismatch. You should confirm whether an API key is actually required and how it will be used/stored by the agent and the external service.
Persistence & Privilege
Skill does not request always:true and has no special OS or config path requirements. It does allow normal autonomous invocation (platform default), which combined with an external API key increases blast radius but is not unusual for this kind of skill.
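How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install alvis-web-scrape
  3. Once installation finishes, invoke the Skill by name or trigger it with /alvis-web-scrape
  4. Provide the inputs required by the Skill's parameter description to get structured output.
Version history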
v1.0.5
Version 1.0.5 of alvis-web-scrape
  • No file changes detected in this release.
  • No updates to code or documentation content.
v1.0.0
Scrape 1.0.0 Changelog:
  • Initial release with robust, legal web scraping capabilities.
  • Enforces robots.txt compliance, rate limiting, and GDPR/CCPA-aware data handling.
  • Supports both direct HTTP scraping and SkillBoss API Hub managed scraping.
  • Includes a comprehensive pre-scrape compliance checklist and legal boundaries guide.
  • Requires SkillBoss_API_KEY for API Hub integration.
Metadata
Slug: alvis-web-scrape
Version: 1.0.5
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 2
FAQ

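What is Alvis Web Scrape?

Legal web scraping with robots.txt compliance, rate limiting, and GDPR/CCPA-aware data handling. Supports both direct HTTP scraping and managed scraping via... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 93 total downloads so far.

How do I install Alvis Web Scrape?

Run the command /install alvis-web-scrape in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Alvis Web Scrape free?

Yes. Alvis Web Scrape is completely free and released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does Alvis Web Scrape support?

Alvis Web Scrape is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Alvis Web Scrape?

It is developed and maintained by AlvisDunlop (@alvisdunlop); the current version is v1.0.5.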
