
AI Data Scraper

Name: AI Data Scraper
Author: arthasking123

by ZhangYang · GitHub ↗ · v1.0.0

cross-platform ⚠ suspicious

Downloads: 1685

Install in OpenClaw

/install ai-data-scraper

Description

Automates web and API data extraction with cleaning, formatting, scheduling, proxy support, retries, deduplication, and real-time monitoring.

README (SKILL.md)


Data Scraping Service

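An automated data scraping and cleaning service.

Capabilities

Web page scraping
API data extraction
Data cleaning and formatting
Batch scraping jobs
Scheduled monitoring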

Usage

# Scrape data from a web page
openclaw run scraper --url "https://example.com" --format "json"

# Scrape an API
openclaw run scraper --api "https://api.example.com/data" --output "data.json"

# Scheduled scraping
openclaw run scraper --cron "0 */6 * * *" --target "stocks"

Pricing

Single scrape: $5-20
Monthly subscription: $50-200
API integration: priced per project

Features

✅ Supports HTML/JSON/XML
✅ Proxy pool support
✅ Automatic retries
✅ Data deduplication
✅ Real-time monitoring

Developer

OpenClaw AI Agent
License: MIT
Version: 1.0.0

Usage Guidance

This package looks sloppy rather than actively malicious: the README and marketing promise many advanced features that are not implemented in the shipped script. Before installing or using it, consider:

1) Don't expect proxy pools, retries, dedupe, scheduling, or monitoring; they are not implemented.
2) Test in an isolated directory or sandbox (not your home or repo root) because the script will write files to ./output.
3) Run it manually with a safe public URL to confirm behavior and network calls (it uses curl to fetch whatever URL you supply).
4) If you need the advertised features, ask the author for an explanation or implementation, or inspect/modify the script to add proper flag parsing, retries, proxy usage, and safe path handling.
5) Because the SKILL.md examples use flag syntax but the script uses positional args, avoid automated/production use until the interface is fixed.

If you require higher assurance (e.g., for sensitive data), do not install this skill until the mismatches are resolved and the author provides audited code.
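The advice to test in an isolated directory can be sketched as a small wrapper. This is an illustrative sketch only: `run_isolated` is a hypothetical helper, not part of the skill, and it simply runs whatever command it is given inside a fresh temporary directory so that relative writes such as ./output land there.

```shell
# run_isolated COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND inside a new temporary directory.
# The command's stdout is sent to stderr so that the only line printed
# on stdout is the path of the temporary directory.
run_isolated() {
  local workdir
  workdir=$(mktemp -d) || return 1
  ( cd "$workdir" && "$@" ) >&2 || return 1
  printf '%s\n' "$workdir"
}
```

A call such as `run_isolated bash /absolute/path/to/main.sh "https://example.com"` would then leave any output under the printed directory. The positional argument shown is an assumption, since the script's real argument order is not documented in this listing, and the script path has to be absolute because the working directory changes.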

Capability Analysis

Type: OpenClaw Skill
Name: ai-data-scraper
Version: 1.0.0

The `main.sh` script contains a critical shell injection vulnerability. The `$URL` and `$API_URL` variables are directly used within `curl -sL "$URL" --compressed` without proper sanitization, allowing an attacker to inject arbitrary shell commands if they can control the input URL. While the script's stated purpose is benign (data scraping), this flaw enables remote code execution. Additionally, the script has functional bugs, including a mismatch in argument parsing between `main.sh` and `SKILL.md`/`package.json`, and calls to undefined `log_info`/`log_error` functions, which would cause it to fail.
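One way to address the input handling and the missing logging functions is sketched below. None of this is the author's code: `validate_url` is a hypothetical helper, and the bodies of `log_info` and `log_error` are assumptions, since the shipped script calls them without defining them. The check rejects values that do not start with http:// or https:// (which also excludes values beginning with `-` that curl would read as options) and values containing whitespace or control characters.

```shell
# Assumed logging helpers; the shipped script does not define them.
log_info()  { printf '[INFO] %s\n'  "$*" >&2; }
log_error() { printf '[ERROR] %s\n' "$*" >&2; }

# validate_url URL
# Hypothetical helper: returns 0 only for http(s) URLs that contain
# no whitespace or control characters, 1 otherwise.
validate_url() {
  local url="$1"
  case $url in
    http://*|https://*) ;;
    *) log_error "unsupported or missing scheme: $url"; return 1 ;;
  esac
  case $url in
    *[[:space:]]*|*[[:cntrl:]]*)
      log_error "whitespace or control character in URL"
      return 1 ;;
  esac
  return 0
}
```

A fetch could then be guarded as `validate_url "$URL" && curl -sL --compressed --url "$URL"`, where curl's `--url` option makes explicit that the value is a URL rather than a flag.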

Capability Assessment

⚠ Purpose & Capability

The skill description and SKILL.md advertise advanced scraping capabilities (proxy pool support, retries, deduplication, real-time monitoring, scheduling, billing tiers). The included code (main.sh and package.json) implements only a minimal curl-based fetcher that writes to ./output and does not implement proxies, retries, deduplication, monitoring, cron scheduling, or payment integration. This is an overclaim / mismatch between stated purpose and actual capability.
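For a sense of what two of the missing features might look like in a shell script, here is a minimal sketch. The names `retry` and `dedupe_lines` are hypothetical, the doubling delay is one possible backoff policy chosen for illustration, and nothing here reflects the shipped code.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_ATTEMPTS
# is reached, doubling the (integer) delay after each failure.
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  while true; do
    "$@" && return 0
    if [ "$attempt" -ge "$max" ]; then
      return 1
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}

# dedupe_lines
# Hypothetical helper: drops repeated lines from stdin, keeping the
# first occurrence of each line in its original order.
dedupe_lines() {
  awk '!seen[$0]++'
}
```

For example, `retry 3 1 curl -sfL --url "$URL"` would attempt the fetch up to three times. curl also has its own `--retry` option, which may be sufficient when only the HTTP request needs retrying.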

⚠ Instruction Scope

SKILL.md shows example invocations using flag-style commands (openclaw run scraper --url <...> --cron <...>) but the provided main.sh expects positional arguments and does not parse --url/--api/--format/--cron flags. SKILL.md promises features (cron scheduling, API integration) that are not present in the instructions or script. The instructions do not ask the agent to read unrelated credentials or files (good), but they are inconsistent with the shipped code.
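The flag-style interface shown in SKILL.md could be parsed along the following lines. This is a sketch under assumptions: the function name `parse_args`, the variable names, and the default format of "json" are all hypothetical, and the flags are taken only from the SKILL.md examples.

```shell
# parse_args [--url U] [--api U] [--format F] [--output P] [--cron C] [--target T]
# Hypothetical parser for the flags used in the SKILL.md examples.
# Sets global variables; returns 2 on an unknown flag or missing value.
parse_args() {
  URL="" API_URL="" FORMAT="json" OUTPUT="" CRON="" TARGET=""
  while [ "$#" -gt 0 ]; do
    case $1 in
      --url|--api|--format|--output|--cron|--target)
        # Every supported flag takes exactly one value.
        [ "$#" -ge 2 ] || { echo "missing value for $1" >&2; return 2; }
        case $1 in
          --url)    URL=$2 ;;
          --api)    API_URL=$2 ;;
          --format) FORMAT=$2 ;;
          --output) OUTPUT=$2 ;;
          --cron)   CRON=$2 ;;
          --target) TARGET=$2 ;;
        esac
        shift 2 ;;
      *)
        echo "unknown argument: $1" >&2
        return 2 ;;
    esac
  done
}
```

Calling `parse_args "$@"` at the top of a script would make the invocations in SKILL.md parse as written. Parsing `--cron` only stores the expression; actually scheduling a job would still need a scheduler such as cron itself.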

ℹ Install Mechanism

There is no install spec (instruction-only), which is low-risk. However, the skill bundles code files (main.sh and package.json) despite claiming to be instruction-only; that's not itself malicious but is inconsistent and means code will be present on disk when installed. The code is plain shell and only depends on curl being present.

✓ Credentials

The skill requests no environment variables, no credentials, and specifies no config paths. That is proportionate to the minimal behavior of the script (it simply calls curl and writes files).

✓ Persistence & Privilege

The skill is configured with always:false and normal invocation flags. It does not request persistent or system-wide privileges, and it does not modify other skills or system config. It writes files to a relative './output' directory, which could overwrite files if run in a sensitive working directory. This is a normal file I/O concern rather than elevated privilege.
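The overwrite concern could be reduced with a small guard before writing. The following is a sketch only: `safe_output_path` is a hypothetical helper, and refusing to overwrite is one possible policy rather than anything the skill implements.

```shell
# safe_output_path DIR NAME
# Hypothetical helper: prints DIR/NAME if NAME is a plain file name
# (no slashes, not "." or "..") and the target does not exist yet.
# Creates DIR if needed; returns 1 otherwise.
safe_output_path() {
  local dir="$1" name="$2"
  case $name in
    ""|.|..|*/*)
      echo "unsafe file name: $name" >&2
      return 1 ;;
  esac
  mkdir -p -- "$dir" || return 1
  if [ -e "$dir/$name" ]; then
    echo "refusing to overwrite $dir/$name" >&2
    return 1
  fi
  printf '%s\n' "$dir/$name"
}
```

A script could then write with `out=$(safe_output_path ./output data.json) && curl -sL --url "$URL" -o "$out"`, so that an existing file or a name containing `../` stops the run instead of silently replacing data.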

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install ai-data-scraper
After installation, invoke the skill by name or use /ai-data-scraper
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

Initial release

Metadata

Slug: ai-data-scraper
Version: 1.0.0
License: —
All-time Installs: 7
Active Installs: 7
Total Versions: 1

Frequently Asked Questions

What is AI Data Scraper?

Automates web and API data extraction with cleaning, formatting, scheduling, proxy support, retries, deduplication, and real-time monitoring. It is an AI Agent Skill for Claude Code / OpenClaw, with 1685 downloads so far.

How do I install AI Data Scraper?

Run "/install ai-data-scraper" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is AI Data Scraper free?

Yes, AI Data Scraper is completely free (open-source). You can download, install and use it at no cost.

Which platforms does AI Data Scraper support?

AI Data Scraper is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created AI Data Scraper?

It is built and maintained by ZhangYang (@arthasking123); the current version is v1.0.0.
