kirkraman

scrape

Author: KirkRaman · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 72
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install kirk-scrape
Description
Performs compliant web scraping with robots.txt checks, rate limiting, GDPR/CCPA-aware data handling, using direct HTTP or SkillBoss API Hub integration.
Usage instructions (SKILL.md)

name: Scrape
description: Legal web scraping with robots.txt compliance, rate limiting, and GDPR/CCPA-aware data handling. Supports both direct HTTP scraping and managed scraping via SkillBoss API Hub.
requires_env: [SKILLBOSS_API_KEY]

Pre-Scrape Compliance Checklist

Before writing any scraping code:

  1. robots.txt — Fetch {domain}/robots.txt, check if target path is disallowed. If yes, stop.
  2. Terms of Service — Check /terms, /tos, /legal. Explicit scraping prohibition = need permission.
  3. Data type — Public factual data (prices, listings) is safer. Personal data triggers GDPR/CCPA.
  4. Authentication — Data behind login is off-limits without authorization. Never scrape protected content.
  5. API available? — If site offers an API, use it. Always. Scraping when API exists often violates ToS.
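
Step 1 of the checklist can be sketched with Python's standard-library `urllib.robotparser`. This is a minimal illustration, not the skill's own parser (which lives in code.md): the helper name `is_allowed`, the user-agent string, and the sample robots.txt body are all assumptions made for the example. In real use you would first download `{domain}/robots.txt` and pass its text in.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access happens here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, used only for illustration.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for target in ("https://example.com/products", "https://example.com/private/data"):
        verdict = "ok to fetch" if is_allowed(SAMPLE_ROBOTS, "my-bot", target) else "disallowed, stop"
        print(target, "->", verdict)
```

Keeping the check as a pure function over the robots.txt text makes it easy to run before any scraping request is issued, and to log the verdict for the audit trail.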

Legal Boundaries

  • Public data, no login — Generally legal (hiQ v. LinkedIn 2022)
  • Bypassing barriers — CFAA violation risk (Van Buren v. US 2021)
  • Ignoring robots.txt — Gray area, often breaches ToS (Meta v. Bright Data 2024)
  • Personal data without consent — GDPR/CCPA violation
  • Republishing copyrighted content — Copyright infringement

Request Discipline

  • Rate limit: Minimum 2-3 seconds between requests. Faster = server strain = legal exposure.
  • User-Agent: Real browser string + contact email: Mozilla/5.0 ... (contact: [email protected])
  • Respect 429: Exponential backoff. Ignoring 429s shows intent to harm.
  • Session reuse: Keep connections open to reduce server load.
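
The rate-limit and 429 rules above might be implemented along these lines. This is a sketch under stated assumptions: the names `Throttle` and `backoff_delay`, the 2-second base, and the 60-second cap are illustrative choices, not values taken from the skill's code.md.

```python
import time

MIN_DELAY_SECONDS = 2.0  # lower bound of the "2-3 seconds" guideline


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after a 429.

    Doubles on every attempt (exponential backoff) and never exceeds `cap`.
    """
    return min(cap, base * (2 ** attempt))


class Throttle:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval=MIN_DELAY_SECONDS, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous request, None before the first

    def wait(self) -> float:
        """Sleep until min_interval has elapsed since the last call; return seconds slept."""
        slept = 0.0
        if self._last is not None:
            remaining = self._min_interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept
```

Call `throttle.wait()` immediately before each request, and on a 429 response sleep for `backoff_delay(attempt)` before retrying with `attempt` incremented.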

Data Handling

  • Strip PII immediately — Don't collect names, emails, phones unless legally justified.
  • No fingerprinting — Don't combine data to identify individuals indirectly.
  • Minimize storage — Cache only what you need, delete what you don't.
  • Audit trail — Log what, when, where. Evidence of good faith if challenged.
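
As one way to act on "Strip PII immediately", scraped text can be passed through a redaction step before it is stored. The sketch below is an assumption-laden illustration: the regular expressions are deliberately simple, will miss many real-world formats, and do not detect names at all, so treat them as a starting point rather than a compliance guarantee. The function name `strip_pii` is hypothetical.

```python
import re

# Simplified patterns (assumptions): real PII detection needs broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def strip_pii(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[email removed]", text)
    return PHONE_RE.sub("[phone removed]", text)


if __name__ == "__main__":
    raw = "Contact jane@example.com or +1 (555) 010-9999 for price 19.99"
    print(strip_pii(raw))
```

Short numeric values such as prices are left alone because the phone pattern requires at least nine characters of digits and separators.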

SkillBoss API Hub Scraping

For managed content extraction with automatic anti-bot handling, use SkillBoss API Hub's scraping capability (type: "scraper"). Authenticate with SKILLBOSS_API_KEY and call https://api.skillbossai.com/v1/pilot. Result is returned at result.data.markdown.
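
The paragraph above could translate into something like the following sketch. Only three details come from SKILL.md: the endpoint URL, the `type: "scraper"` field, and the `result.data.markdown` response path. Everything else is an assumption made for illustration, including the `inputs`/`url` body fields, the `Bearer` authorization scheme, and the helper names. Check code.md and the publisher's documentation (and confirm which API domain is authoritative) before sending a real key.

```python
import json
import urllib.request

# Endpoint as named in SKILL.md; other documents reportedly use different domains.
SKILLBOSS_ENDPOINT = "https://api.skillbossai.com/v1/pilot"


def build_scrape_request(target_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a managed scrape."""
    # ASSUMPTION: the "inputs"/"url" fields are hypothetical; only "type" is documented.
    payload = {"type": "scraper", "inputs": {"url": target_url}}
    return urllib.request.Request(
        SKILLBOSS_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # ASSUMPTION: Bearer scheme; SKILL.md only says to authenticate with the key.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_markdown(response: dict) -> str:
    """Read the scraped page from result.data.markdown, the path SKILL.md gives."""
    return response["result"]["data"]["markdown"]
```

In practice the key would be read from the `SKILLBOSS_API_KEY` environment variable, and the built request passed to `urllib.request.urlopen` only after the target has cleared the pre-scrape checklist.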

For code patterns, robots.txt parser, and SkillBoss API Hub scraping integration, see code.md

Security recommendations
This skill's behavior (direct HTTP scraping and optional managed scraping via a third party) is consistent with its description, but there are several red flags you should resolve before installing:

  • Metadata mismatch: the registry lists no required env vars but the SKILL.md and code examples require SKILLBOSS_API_KEY. Confirm the required environment variables in the published metadata.
  • Inconsistent endpoints: documentation references multiple domains (api.skillbossai.com, api.heybossai.com, and skillboss.co). Ask the publisher which domain is authoritative and verify TLS certs and ownership of the endpoint before providing an API key.
  • Data exfiltration: using the managed-scraping path will send scraped page content to the third-party service. Do not provide sensitive or private data unless you trust that service and its privacy/compliance posture.
  • Ask for provenance: request a source repository or official homepage from the publisher so you can review real source code or confirm the vendor.

If the publisher updates the registry to declare SKILLBOSS_API_KEY, consolidates and documents the single correct API domain, and provides a verifiable homepage or source repo, this would increase confidence and likely make the skill benign for typical use cases. Until then, treat it cautiously and avoid supplying API keys or sensitive targets.
Functional analysis
Type: OpenClaw Skill
Name: kirk-scrape
Version: 1.0.0

The 'kirk-scrape' skill provides tools and guidelines for ethical web scraping, including robots.txt compliance, rate limiting, and PII handling. The code in code.md implements standard scraping patterns and integrates with the SkillBoss API Hub (api.skillbossai.com) for managed extraction, showing no signs of malicious intent or data exfiltration.
Capability tags
requires-sensitive-credentials
Capability assessment
Purpose & Capability
The skill claims to perform compliant web scraping and to support a managed 'SkillBoss API Hub' integration; that capability reasonably requires an API key. However the registry metadata lists no required env vars while SKILL.md and code.md both reference SKILLBOSS_API_KEY — a clear mismatch between declared requirements and the runtime instructions.
Instruction Scope
SKILL.md and code.md stay focused on scraping best-practices (robots.txt, rate-limiting, PII handling). They also include patterns to call a managed scraping API and show sending scraped page content to a third party. That data-transmission is expected for a managed-scraping mode but is an important privacy/third-party-exfiltration behavior the user should be aware of.
Install Mechanism
This is an instruction-only skill with no install spec or code files to fetch or execute, which minimizes install-time risk.
Credentials
The only credential referenced is SKILLBOSS_API_KEY, which is proportionate for a managed-scraping integration. However the skill manifest/registry incorrectly lists no required env vars while SKILL.md/code.md require the API key. Additionally, the documentation uses multiple domains for the same service (SKILL.md: api.skillbossai.com, code.md: api.heybossai.com, README: skillboss.co), creating uncertainty about which external endpoint will receive data and which service will hold the API key.
Persistence & Privilege
The skill does not request always:true, does not declare system-wide config changes, and is not requesting elevated or permanent privileges.
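How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install kirk-scrape
  3. Once installation finishes, call the skill by name or trigger it with /kirk-scrape
  4. Provide the inputs described in the skill's parameter documentation to get structured output.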
Version history
v1.0.0
  • Initial release of the "Scrape" skill focused on compliant web scraping.
  • Features built-in robots.txt checking and pre-scrape compliance checklist.
  • Supports both direct HTTP scraping and managed scraping via SkillBoss API Hub.
  • Enforces rate limiting, legal guidelines (GDPR/CCPA), and ethical data handling practices.
  • Documentation included for legal boundaries, request discipline, and data minimization.
  • Requires `SKILLBOSS_API_KEY` for API Hub integration.
Metadata
Slug: kirk-scrape
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Version count: 1
FAQ

What is scrape?

Performs compliant web scraping with robots.txt checks, rate limiting, GDPR/CCPA-aware data handling, using direct HTTP or SkillBoss API Hub integration. It is an AI agent skill plugin for Claude Code / OpenClaw, with 72 total downloads to date.

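How do I install scrape?

Run the command "/install kirk-scrape" in the OpenClaw or Claude Code chat box to install it in one step. Installation itself needs no extra configuration, although the managed SkillBoss API Hub path requires SKILLBOSS_API_KEY to be set.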

Is scrape free?

Yes. scrape is completely free under the MIT-0 license and can be downloaded, installed, and used without restriction.

Which platforms does scrape support?

scrape is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed scrape?

It is developed and maintained by KirkRaman (@kirkraman). The current version is v1.0.0.
