Downloads: 1685
Stars: 1
Active Installs: 7
Versions: 1
Install in OpenClaw
/install ai-data-scraper
Description
Automates web and API data extraction with cleaning, formatting, scheduling, proxy support, retries, deduplication, and real-time monitoring.
README (SKILL.md)
Data Scraping Service
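Automated data scraping and cleaning service.
Capabilities
- Web page scraping
- API data extraction
- Data cleaning and formatting
- Batch scraping jobs
- Scheduled monitoring
Usage
# Scrape web page data
openclaw run scraper --url "https://example.com" --format "json"
# Scrape an API
openclaw run scraper --api "https://api.example.com/data" --output "data.json"
# Scheduled scraping
openclaw run scraper --cron "0 */6 * * *" --target "stocks"
Pricing
- Single scrape: $5-20
- Monthly subscription: $50-200
- API integration: priced per project
Features
- ✅ Supports HTML/JSON/XML
- ✅ Proxy pool support
- ✅ Automatic retries
- ✅ Data deduplication
- ✅ Real-time monitoring
Developer
OpenClaw AI Agent
License: MIT
Version: 1.0.0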
Usage Guidance
This package looks sloppy rather than actively malicious: the README and marketing promise many advanced features that are not implemented in the shipped script. Before installing or using it, consider:
1) Don't expect proxy pools, retries, dedupe, scheduling, or monitoring: they are not implemented.
2) Test in an isolated directory or sandbox (not your home or repo root) because the script will write files to ./output.
3) Run it manually with a safe public URL to confirm behavior and network calls (it uses curl to fetch whatever URL you supply).
4) If you need the advertised features, ask the author for an explanation or implementation, or inspect/modify the script to add proper flag parsing, retries, proxy usage, and safe path handling.
5) Because the SKILL.md examples use flag syntax but the script uses positional args, avoid automated/production use until the interface is fixed.
If you require higher assurance (e.g., for sensitive data), do not install this skill until the mismatches are resolved and the author provides audited code.
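The advice to test in an isolated directory can be sketched as a small bash helper. `run_in_sandbox` is a hypothetical name, not part of the skill. It runs any command inside a fresh temporary directory, so whatever the command writes to ./output ends up there rather than in the caller's working directory.

```shell
# Hypothetical helper (not part of the skill): run a command inside a
# throwaway directory so that relative writes such as ./output stay contained.
run_in_sandbox() {
  local sandbox rc
  sandbox="$(mktemp -d)" || return 1
  # The subshell keeps the caller's working directory unchanged.
  # The command's stdout is sent to stderr so that this function's
  # stdout carries only the sandbox path.
  ( cd "$sandbox" && "$@" ) >&2
  rc=$?
  printf '%s\n' "$sandbox"
  return "$rc"
}
```

A call such as `dir="$(run_in_sandbox bash /path/to/main.sh ARGS...)"` would leave any generated files under `$dir/output` for inspection. The path to main.sh and its arguments are placeholders here, since the source only says that the script takes positional arguments.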
Capability Analysis
Type: OpenClaw Skill
Name: ai-data-scraper
Version: 1.0.0
The `main.sh` script contains a critical shell injection vulnerability. The `$URL` and `$API_URL` variables are directly used within `curl -sL "$URL" --compressed` without proper sanitization, allowing an attacker to inject arbitrary shell commands if they can control the input URL. While the script's stated purpose is benign (data scraping), this flaw enables remote code execution. Additionally, the script has functional bugs, including a mismatch in argument parsing between `main.sh` and `SKILL.md`/`package.json`, and calls to undefined `log_info`/`log_error` functions, which would cause it to fail.
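One way to harden the fetch path described above is to validate the URL before it reaches curl and to restrict curl to HTTP(S). The following is a minimal bash sketch under stated assumptions, not the author's code. `is_safe_url` and `fetch_url` are hypothetical names. `log_info` and `log_error` use the names the analysis reports as undefined, but their bodies here are assumptions.

```shell
# Assumed implementations of the logging helpers the analysis reports as undefined.
log_info()  { printf '[INFO] %s\n'  "$*" >&2; }
log_error() { printf '[ERROR] %s\n' "$*" >&2; }

# Hypothetical validator: accept only http:// or https:// URLs that contain
# no whitespace or control characters. This also rejects values that start
# with "-" and would otherwise be read by curl as options.
is_safe_url() {
  local url="$1"
  case "$url" in
    http://?*|https://?*) ;;
    *) return 1 ;;
  esac
  case "$url" in
    *[[:space:]]*|*[[:cntrl:]]*) return 1 ;;
  esac
  return 0
}

# Hypothetical fetcher: the variable stays quoted, and --proto/--proto-redir
# limit curl to HTTP(S) for both the initial request and any redirects.
fetch_url() {
  local url="$1"
  if ! is_safe_url "$url"; then
    log_error "rejected URL: $url"
    return 2
  fi
  log_info "fetching $url"
  curl -sL --compressed --proto '=http,https' --proto-redir '=http,https' "$url"
}
```

With this shape, `fetch_url "file:///etc/passwd"` returns 2 without invoking curl, while `fetch_url "https://example.com"` performs the request.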
Capability Assessment
Purpose & Capability
The skill description and SKILL.md advertise advanced scraping capabilities (proxy pool support, retries, deduplication, real-time monitoring, scheduling, billing tiers). The included code (main.sh and package.json) implements only a minimal curl-based fetcher that writes to ./output and does not implement proxies, retries, deduplication, monitoring, cron scheduling, or payment integration. This is an overclaim / mismatch between stated purpose and actual capability.
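Two of the missing features named above, retries and deduplication, are small enough to sketch in bash. This is an illustration of what an implementation could look like, not a description of the shipped script. `retry` and `dedupe_lines` are hypothetical names, and the fixed delay between attempts is an assumption.

```shell
# Hypothetical retry wrapper: run a command up to max_attempts times,
# sleeping `delay` seconds between failed attempts.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS command [args...]
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  while true; do
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Hypothetical line-level deduplication: keep the first occurrence of each
# line and preserve the original order.
dedupe_lines() {
  awk '!seen[$0]++'
}
```

For plain HTTP fetches, curl's own `--retry` option may be enough. The wrapper is useful when the retried step is more than a single curl call.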
Instruction Scope
SKILL.md shows example invocations using flag-style commands (openclaw run scraper --url <...> --cron <...>) but the provided main.sh expects positional arguments and does not parse --url/--api/--format/--cron flags. SKILL.md promises features (cron scheduling, API integration) that are not present in the instructions or script. The instructions do not ask the agent to read unrelated credentials or files (good), but they are inconsistent with the shipped code.
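The flag mismatch described above could be closed with a parser along these lines. This is a bash sketch only. `parse_flags` is a hypothetical function, the flag names are taken from the SKILL.md examples, and the default format of json plus the rule that one of --url, --api, or --target is required are assumptions.

```shell
# Hypothetical parser for the flags shown in SKILL.md.
# Results are stored in the globals URL, API_URL, FORMAT, OUTPUT, CRON, TARGET.
parse_flags() {
  URL="" API_URL="" FORMAT="json" OUTPUT="" CRON="" TARGET=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --url|--api|--format|--output|--cron|--target)
        # Every supported flag takes exactly one value.
        if [ "$#" -lt 2 ]; then
          printf 'missing value for %s\n' "$1" >&2
          return 2
        fi
        case "$1" in
          --url)    URL="$2" ;;
          --api)    API_URL="$2" ;;
          --format) FORMAT="$2" ;;
          --output) OUTPUT="$2" ;;
          --cron)   CRON="$2" ;;
          --target) TARGET="$2" ;;
        esac
        shift 2
        ;;
      *)
        printf 'unknown argument: %s\n' "$1" >&2
        return 2
        ;;
    esac
  done
  # Assumption: at least one source must be named.
  if [ -z "$URL" ] && [ -z "$API_URL" ] && [ -z "$TARGET" ]; then
    printf 'one of --url, --api, or --target is required\n' >&2
    return 2
  fi
}
```

Calling `parse_flags "$@"` at the top of a script would make the documented invocations parse as written. Acting on `CRON` would still require a scheduler, which this sketch does not provide.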
Install Mechanism
There is no install spec (instruction-only), which is low-risk. However, the skill bundles code files (main.sh and package.json) despite claiming to be instruction-only; that's not itself malicious but is inconsistent and means code will be present on disk when installed. The code is plain shell and only depends on curl being present.
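Since the code depends only on curl being present, a script like this could check for that dependency up front and fail with a clear message. A minimal bash sketch follows. `require_cmd` is a hypothetical helper, not something the skill ships.

```shell
# Hypothetical helper: verify that each named command is available on PATH.
require_cmd() {
  local cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf 'missing required command: %s\n' "$cmd" >&2
      return 1
    fi
  done
}
```

A script would typically call `require_cmd curl || exit 1` before doing any work.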
Credentials
The skill requests no environment variables, no credentials, and specifies no config paths. That is proportionate to the minimal behavior of the script (it simply calls curl and writes files).
Persistence & Privilege
The skill sets always:false and uses normal invocation flags. It does not request persistent or system-wide privileges, and it does not modify other skills or system config. It writes files to a relative './output' directory, which could overwrite files if run in a sensitive working directory. This is a normal file I/O concern rather than elevated privilege.
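The overwrite concern can be reduced by refusing path separators in output names and by never clobbering an existing file. The sketch below is one possible approach in bash, not the script's current behavior. `save_output` is a hypothetical name, and the default directory ./output follows the analysis above.

```shell
# Hypothetical helper: write stdin to OUT_DIR/NAME without overwriting.
# Usage: some_command | save_output NAME [OUT_DIR]
save_output() {
  local name="$1" out_dir="${2:-./output}"
  # Reject empty names, names containing "/", and the special entries . and ..
  case "$name" in
    ""|*/*|.|..)
      printf 'invalid output name: %s\n' "$name" >&2
      return 2
      ;;
  esac
  mkdir -p "$out_dir" || return 1
  local dest="$out_dir/$name"
  # noclobber (set -C) makes the redirection fail if the file already exists.
  if ! ( set -C; cat > "$dest" ) 2>/dev/null; then
    printf 'could not create %s (it may already exist)\n' "$dest" >&2
    return 3
  fi
}
```

Using noclobber rather than a separate existence test avoids a gap between checking for the file and writing it.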
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install ai-data-scraper
- After installation, invoke the skill by name or use /ai-data-scraper
- Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release
Frequently Asked Questions
What is AI Data Scraper?
Automates web and API data extraction with cleaning, formatting, scheduling, proxy support, retries, deduplication, and real-time monitoring. It is an AI Agent Skill for Claude Code / OpenClaw, with 1685 downloads so far.
How do I install AI Data Scraper?
Run "/install ai-data-scraper" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is AI Data Scraper free?
Yes, the AI Data Scraper skill is free and open source (MIT license). You can download, install, and use it at no cost. The README separately lists paid tiers for the author's scraping service.
Which platforms does AI Data Scraper support?
AI Data Scraper is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created AI Data Scraper?
It is built and maintained by ZhangYang (@arthasking123); the current version is v1.0.0.