arthasking123

AI Data Scraper

Author: ZhangYang · GitHub · v1.0.0
cross-platform ⚠ suspicious
Total downloads: 1685
Favorites: 1
Current installs: 7
Versions: 1
Install in OpenClaw
/install ai-data-scraper
Description
Automates web and API data extraction with cleaning, formatting, scheduling, proxy support, retries, deduplication, and real-time monitoring.
Usage instructions (SKILL.md)

SKILL.md

Data Scraping Service

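An automated data scraping and cleaning service.

Capabilities

  • Web page scraping
  • API data extraction
  • Data cleaning and formatting
  • Batch scraping jobs
  • Scheduled monitoring

Usage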

# Scrape data from a web page
openclaw run scraper --url "https://example.com" --format "json"

# Scrape an API
openclaw run scraper --api "https://api.example.com/data" --output "data.json"

# Scheduled scraping
openclaw run scraper --cron "0 */6 * * *" --target "stocks"

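Pricing

  • One-off scrape: $5-20
  • Monthly subscription: $50-200
  • API integration: priced per project

Features

  • ✅ HTML/JSON/XML support
  • ✅ Proxy pool support
  • ✅ Automatic retries
  • ✅ Data deduplication
  • ✅ Real-time monitoring

Developer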

OpenClaw AI Agent
License: MIT
Version: 1.0.0

Safe usage recommendations
This package looks sloppy rather than actively malicious: the README and marketing promise many advanced features that are not implemented in the shipped script. Before installing or using it, consider:
  1. Don't expect proxy pools, retries, deduplication, scheduling, or monitoring; they are not implemented.
  2. Test in an isolated directory or sandbox (not your home or repo root) because the script will write files to ./output.
  3. Run it manually with a safe public URL to confirm behavior and network calls (it uses curl to fetch whatever URL you supply).
  4. If you need the advertised features, ask the author for an explanation or implementation, or inspect/modify the script to add proper flag parsing, retries, proxy usage, and safe path handling.
  5. Because the SKILL.md examples use flag syntax but the script uses positional args, avoid automated/production use until the interface is fixed.
If you require higher assurance (e.g., for sensitive data), do not install this skill until the mismatches are resolved and the author provides audited code.
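The recommendation to add retries can be illustrated with a small wrapper. This is only a sketch under assumptions: the helper name `retry` and its argument order are hypothetical and are not part of the shipped script. curl also provides its own `--retry` and `--proxy` options, which may be enough on their own.

```shell
#!/bin/sh
# Hypothetical helper (not in the shipped main.sh): run a command up to
# max_attempts times, sleeping delay seconds after each failed attempt.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # Stop as soon as the wrapped command succeeds.
        "$@" && return 0
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example (not executed here): three attempts, two seconds apart.
# retry 3 2 curl -fsSL --compressed "https://example.com" -o page.html
```

The wrapper returns 0 on the first success and 1 once every attempt has failed; it does not preserve the exit status of the wrapped command.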
Functional analysis
Type: OpenClaw Skill
Name: ai-data-scraper
Version: 1.0.0
The `main.sh` script contains a critical shell injection vulnerability. The `$URL` and `$API_URL` variables are directly used within `curl -sL "$URL" --compressed` without proper sanitization, allowing an attacker to inject arbitrary shell commands if they can control the input URL. While the script's stated purpose is benign (data scraping), this flaw enables remote code execution. Additionally, the script has functional bugs, including a mismatch in argument parsing between `main.sh` and `SKILL.md`/`package.json`, and calls to undefined `log_info`/`log_error` functions, which would cause it to fail.
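Two of the issues reported above, unsanitized URL input and the undefined `log_info`/`log_error` functions, could be addressed along the following lines. This is a hedged sketch, not the author's code: `is_safe_url` and `fetch_url` are hypothetical names, the log format is an assumption, and the shipped `main.sh` may be structured differently.

```shell
#!/bin/sh
# Hypothetical logging helpers; the analysis reports that main.sh calls
# log_info/log_error without defining them. Messages go to stderr.
log_info()  { printf '[INFO] %s\n' "$*" >&2; }
log_error() { printf '[ERROR] %s\n' "$*" >&2; }

# Accept only http(s) URLs that contain no whitespace or control characters.
# Anything else (other schemes, values starting with "-") is rejected.
is_safe_url() {
    case $1 in
        *[[:space:]]*|*[[:cntrl:]]*) return 1 ;;
        http://?*|https://?*) return 0 ;;
        *) return 1 ;;
    esac
}

# Fetch a validated URL. The URL is passed to curl as one quoted argument
# and is never handed to eval or an unquoted expansion.
fetch_url() {
    if ! is_safe_url "$1"; then
        log_error "refusing to fetch: $1"
        return 2
    fi
    log_info "fetching $1"
    curl -sL --compressed "$1"
}
```

Restricting input to the `http://` and `https://` prefixes also prevents a value from being interpreted as a curl option, since such a value cannot begin with `-`.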
Capability assessment
Purpose & Capability
The skill description and SKILL.md advertise advanced scraping capabilities (proxy pool support, retries, deduplication, real-time monitoring, scheduling, billing tiers). The included code (main.sh and package.json) implements only a minimal curl-based fetcher that writes to ./output and does not implement proxies, retries, deduplication, monitoring, cron scheduling, or payment integration. This is an overclaim / mismatch between stated purpose and actual capability.
Instruction Scope
SKILL.md shows example invocations using flag-style commands (openclaw run scraper --url <...> --cron <...>) but the provided main.sh expects positional arguments and does not parse --url/--api/--format/--cron flags. SKILL.md promises features (cron scheduling, API integration) that are not present in the instructions or script. The instructions do not ask the agent to read unrelated credentials or files (good), but they are inconsistent with the shipped code.
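To make the mismatch concrete, a flag parser that accepts the options shown in the SKILL.md examples might look like this. It is a minimal sketch under assumptions: the function name `parse_args`, the variable names, and the `json` default for `--format` are hypothetical, and nothing here reflects how the shipped `main.sh` reads its positional arguments.

```shell
#!/bin/sh
# Hypothetical flag parser for the options used in the SKILL.md examples.
# Results are stored in global variables; "json" as the default format is
# an assumption, not documented behavior.
parse_args() {
    URL="" API_URL="" FORMAT="json" OUTPUT="" CRON="" TARGET=""
    while [ "$#" -gt 0 ]; do
        case $1 in
            --url|--api|--format|--output|--cron|--target)
                # Every supported flag takes exactly one value.
                if [ "$#" -lt 2 ]; then
                    echo "missing value for $1" >&2
                    return 2
                fi
                case $1 in
                    --url)    URL=$2 ;;
                    --api)    API_URL=$2 ;;
                    --format) FORMAT=$2 ;;
                    --output) OUTPUT=$2 ;;
                    --cron)   CRON=$2 ;;
                    --target) TARGET=$2 ;;
                esac
                shift 2 ;;
            *)
                echo "unknown argument: $1" >&2
                return 2 ;;
        esac
    done
}

# Example: parse_args --url "https://example.com" --format "json"
```

Parsing the flags only stores their values; scheduling via `--cron` would still need a separate implementation.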
Install Mechanism
There is no install spec (instruction-only), which is low-risk. However, the skill bundles code files (main.sh and package.json) despite claiming to be instruction-only; that's not itself malicious but is inconsistent and means code will be present on disk when installed. The code is plain shell and only depends on curl being present.
Credentials
The skill requests no environment variables, no credentials, and specifies no config paths. That is proportionate to the minimal behavior of the script (it simply calls curl and writes files).
Persistence & Privilege
The skill sets always:false and uses normal invocation flags. It does not request persistent or system-wide privileges, and it does not modify other skills or system config. It writes files to a local './output' directory (relative), which could overwrite files if run in a sensitive working directory; this is a normal file I/O concern rather than an elevated-privilege issue.
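The overwrite concern can be reduced by choosing an output path that does not already exist. The sketch below is an illustration under assumptions: `unique_output_path` is a hypothetical helper, the numeric-suffix naming scheme is invented for the example, and the shipped script is not known to do anything similar.

```shell
#!/bin/sh
# Hypothetical helper: create the output directory if needed and print a
# path inside it that does not exist yet, adding a numeric suffix on clash.
unique_output_path() {
    out_dir=$1
    base_name=$2
    # Reject empty names and anything that could escape the directory.
    case $base_name in
        ""|.|..|*/*) return 1 ;;
    esac
    mkdir -p "$out_dir" || return 1
    candidate=$out_dir/$base_name
    n=1
    while [ -e "$candidate" ]; do
        candidate=$out_dir/$base_name.$n
        n=$((n + 1))
    done
    printf '%s\n' "$candidate"
}

# Example: work in a throwaway directory instead of the current one.
# work_dir=$(mktemp -d)
# out_file=$(unique_output_path "$work_dir/output" "data.json")
```

The existence check is not atomic, so two concurrent runs could still pick the same path; enabling the shell's noclobber option (`set -C`) before redirecting output is a stricter alternative.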
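How to use
  1. Make sure OpenClaw is installed (local or Docker deployment)
  2. Enter the install command in the chat box: /install ai-data-scraper
  3. Once installation completes, call the Skill by name or trigger it with /ai-data-scraper
  4. Provide the required inputs according to the Skill's parameter description to get structured output
Version history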
v1.0.0
Initial release
Metadata
Slug ai-data-scraper
Version 1.0.0
License
Total installs 7
Current installs 7
Number of versions 1
FAQ

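What is AI Data Scraper?

Automates web and API data extraction with cleaning, formatting, scheduling, proxy support, retries, deduplication, and real-time monitoring. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 1685 total downloads to date.

How do I install AI Data Scraper?

Run the command "/install ai-data-scraper" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.

Is AI Data Scraper free?

Yes, AI Data Scraper is completely free (free and open source) and can be freely downloaded, installed, and used.

Which platforms does AI Data Scraper support?

AI Data Scraper is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed AI Data Scraper?

It is developed and maintained by ZhangYang (@arthasking123); the current version is v1.0.0.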
