unixlamadev-spec

Data Spider

Author: unixlamadev-spec · GitHub · v1.1.0 · MIT-0
cross-platform ✓ Security check passed
597 total downloads · 1 favorite · 4 current installs · 3 versions
Install in OpenClaw
/install data-spider
Description
Scrape any webpage and extract structured data as JSON, table, or list. Supports schema-guided extraction.
Usage Instructions (SKILL.md)

Data Spider

Scrape and extract structured data from any webpage. Supports schema-guided extraction to match a specific data shape, or auto-detection of structure when no schema is given. Returns data as a JSON object, a table (columns + rows), or a flat list, depending on the format you choose.

When to Use

  • Extracting product information or pricing from pages
  • Gathering statistics and figures from articles
  • Building datasets from web sources
  • Schema-guided extraction to match your data model
  • Research and competitive analysis

Usage Flow

  1. Provide a webpage url
  2. Optionally provide a schema object — data will be extracted to match that exact shape
  3. Optionally set format: json (default), table, or list
  4. AIProx routes to the data-spider agent
  5. Returns structured data in the requested format, plus summary and source URL
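
The first three steps amount to assembling a small request body. A minimal Python sketch, assuming only the three fields named above (`url`, `schema`, `format`); the helper name `build_payload` is hypothetical and not part of the skill:

```python
import json

# The three output formats listed in the usage flow.
ALLOWED_FORMATS = ("json", "table", "list")


def build_payload(url, schema=None, fmt="json"):
    """Assemble the request body described in the usage flow (hypothetical helper)."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {ALLOWED_FORMATS}, got {fmt!r}")
    payload = {"url": url, "format": fmt}
    if schema is not None:
        # The schema is optional; when given, its keys define the output shape.
        payload["schema"] = schema
    return payload


body = build_payload(
    "https://example.com/pricing",
    schema={"free_tier": None, "pro_price": None, "enterprise": None},
)
print(json.dumps(body, indent=2))
```

Rejecting an unknown format locally avoids spending a paid API call on a request that cannot succeed as intended.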

Security Manifest

| Permission | Scope | Reason |
|---|---|---|
| Network | aiprox.dev | API calls to orchestration endpoint |
| Env Read | AIPROX_SPEND_TOKEN | Authentication for paid API |

Make Request — JSON with Schema

curl -X POST https://aiprox.dev/api/orchestrate \
  -H "Content-Type: application/json" \
  -H "X-Spend-Token: $AIPROX_SPEND_TOKEN" \
  -d '{
    "url": "https://example.com/pricing",
    "schema": {"free_tier": null, "pro_price": null, "enterprise": null},
    "format": "json"
  }'
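
For callers who prefer a script over curl, the same request can be sketched with Python's standard library. The endpoint and header names are copied from the curl example; the helper names `make_request` and `send` and the 30-second timeout are assumptions made for illustration:

```python
import json
import os
import urllib.request

ENDPOINT = "https://aiprox.dev/api/orchestrate"  # taken from the curl example


def make_request(payload, token):
    """Build (but do not send) the POST request shown in the curl example."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Spend-Token": token,
        },
        method="POST",
    )


def send(payload, timeout=30):
    """Send the request and decode the JSON reply. The timeout is an assumed value."""
    token = os.environ["AIPROX_SPEND_TOKEN"]  # raises KeyError if the token is unset
    req = make_request(payload, token)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `send({"url": "https://example.com/pricing", "format": "json"})` would issue a billed request, so the sketch keeps building and sending as separate steps.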

Response — JSON

{
  "data": {"free_tier": "$0/month, 1000 API calls", "pro_price": "$29/month", "enterprise": "custom pricing"},
  "summary": "SaaS pricing page with three tiers.",
  "source": "https://example.com/pricing",
  "format": "json"
}
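
Because the schema is meant to fix the shape of `data`, a client may want to confirm that the returned keys line up with what it asked for. A minimal sketch; the helper `schema_mismatch` is hypothetical, and the source does not say whether the service ever omits or adds keys:

```python
def schema_mismatch(schema, data):
    """Compare top-level keys only; return (missing, extra) as sorted lists."""
    expected, actual = set(schema), set(data)
    return sorted(expected - actual), sorted(actual - expected)


# Values copied from the request and response examples above.
schema = {"free_tier": None, "pro_price": None, "enterprise": None}
data = {
    "free_tier": "$0/month, 1000 API calls",
    "pro_price": "$29/month",
    "enterprise": "custom pricing",
}
missing, extra = schema_mismatch(schema, data)
print(missing, extra)
```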

Make Request — Table

curl -X POST https://aiprox.dev/api/orchestrate \
  -H "Content-Type: application/json" \
  -H "X-Spend-Token: $AIPROX_SPEND_TOKEN" \
  -d '{
    "task": "extract pricing tiers as a table",
    "url": "https://example.com/pricing",
    "format": "table"
  }'

Response — Table

{
  "columns": ["Plan", "Price", "API Calls"],
  "rows": [
    ["Free", "$0/month", "1,000"],
    ["Pro", "$29/month", "50,000"],
    ["Enterprise", "Custom", "Unlimited"]
  ],
  "summary": "Three-tier SaaS pricing.",
  "source": "https://example.com/pricing",
  "format": "table"
}
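
The `columns` and `rows` arrays are convenient to turn into one record per row before further processing. A small sketch, assuming every row has the same length as `columns`; the helper name `table_to_records` is hypothetical:

```python
def table_to_records(response):
    """Pair each row with the column names to get a list of dicts."""
    columns = response["columns"]
    records = []
    for row in response["rows"]:
        if len(row) != len(columns):
            raise ValueError(f"row has {len(row)} cells, expected {len(columns)}")
        records.append(dict(zip(columns, row)))
    return records


# Values copied from the table response above.
response = {
    "columns": ["Plan", "Price", "API Calls"],
    "rows": [
        ["Free", "$0/month", "1,000"],
        ["Pro", "$29/month", "50,000"],
        ["Enterprise", "Custom", "Unlimited"],
    ],
}
records = table_to_records(response)
print(records[1])
```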

Response — List

{
  "items": ["$0/month — Free tier, 1000 API calls", "$29/month — Pro, 50,000 calls", "Enterprise — custom pricing"],
  "summary": "SaaS pricing tiers extracted as flat list.",
  "source": "https://example.com/pricing",
  "format": "list"
}
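
All three response shapes carry a `format` field, so a client can branch on it to find the extracted content. A minimal sketch based only on the fields shown in the examples; the helper name `extract_payload` is hypothetical:

```python
def extract_payload(response):
    """Return the extracted content for whichever format the response declares."""
    fmt = response.get("format")
    if fmt == "json":
        return response["data"]
    if fmt == "table":
        return {"columns": response["columns"], "rows": response["rows"]}
    if fmt == "list":
        return response["items"]
    raise ValueError(f"unexpected format: {fmt!r}")


list_response = {
    "items": ["Enterprise (custom pricing)"],  # shortened illustrative item
    "summary": "SaaS pricing tiers extracted as flat list.",
    "source": "https://example.com/pricing",
    "format": "list",
}
print(extract_payload(list_response))
```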

Trust Statement

Data Spider fetches and analyzes webpage contents via URL. Content is processed transiently and not stored. Analysis is performed by Claude via LightningProx. Respects robots.txt and rate limits. Your spend token is used for payment only.

Security Recommendations
This skill is coherent but relies on an external service (aiprox.dev) to fetch and analyze webpage content. Before installing or using it:
  1. Only send URLs that do not contain sensitive data (logins, internal docs, PII).
  2. Review aiprox.dev's privacy, storage, and billing policies; the SKILL.md claims transient processing, but you must trust the vendor.
  3. Treat AIPROX_SPEND_TOKEN as a secret: rotate it if leaked and limit its scope if possible.
  4. If you need to scrape sensitive or internal sites, prefer a self-hosted scraper or run tools locally instead of routing data to a third party.
  5. Test with non-sensitive pages first and monitor billing usage.
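
One way to act on the first recommendation is a local pre-flight check that flags obviously risky URLs before anything is sent to the third party. This is a rough sketch, not part of the skill; the helper `url_risk_flags` and its heuristics (embedded credentials, localhost, `.local`/`.internal` suffixes, private IP ranges) are assumptions and will not catch every sensitive URL:

```python
import ipaddress
from urllib.parse import urlsplit


def url_risk_flags(url):
    """Return a list of reasons the URL looks unsafe to send to an external service."""
    parts = urlsplit(url)
    flags = []
    if parts.scheme not in ("http", "https"):
        flags.append("non-http scheme")
    if parts.username or parts.password:
        flags.append("embedded credentials")
    host = parts.hostname or ""
    if host == "localhost" or host.endswith((".local", ".internal")):
        flags.append("internal hostname")
    else:
        try:
            ip = ipaddress.ip_address(host)
        except ValueError:
            ip = None  # host is a DNS name, not an IP literal
        if ip is not None and (ip.is_private or ip.is_loopback or ip.is_link_local):
            flags.append("private address")
    return flags


print(url_risk_flags("https://example.com/pricing"))
print(url_risk_flags("http://192.168.1.10/admin"))
```

An empty list means only that none of these heuristics fired; pages behind a public hostname can still contain sensitive content.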
Functional Analysis
Type: OpenClaw Skill
Name: data-spider
Version: 1.1.0
The 'data-spider' skill is a standard API wrapper for a web scraping service hosted at aiprox.dev. It requires the AIPROX_SPEND_TOKEN environment variable for authentication, which is explicitly declared in the SKILL.md security manifest. The instructions and examples provided are consistent with the stated purpose of extracting structured data from URLs, and no malicious behaviors, obfuscation, or prompt-injection attacks were identified.
Capability Assessment
Purpose & Capability
Name/description (web scraping, schema extraction) match the runtime instructions: the SKILL.md explicitly calls aiprox.dev orchestrate endpoints to perform scraping. Requiring a single spend token for a hosted service is expected.
Instruction Scope
Instructions are focused on calling the external API and do not ask the agent to read local files or extra environment variables. However, the runtime flow sends the full target webpage content (the thing being scraped) to a third-party orchestration service (aiprox.dev), which can disclose any sensitive content present on the scraped pages.
Install Mechanism
No install spec and no code files—this is instruction-only, so nothing will be written to disk or fetched at install time by the skill itself.
Credentials
Only one required environment variable (AIPROX_SPEND_TOKEN) is declared and documented in the SKILL.md as the payment/auth token for the external API. That is proportionate, but the token is sensitive (used for billing/auth) and grants the service the ability to accept requests on your behalf.
Persistence & Privilege
always is false and the skill does not request persistent system privileges or any config paths. It does not attempt to modify other skills or agent-wide settings.
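How to Use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install data-spider
  3. Once installed, invoke the Skill by name or trigger it with /data-spider
  4. Provide the inputs required by the Skill's parameters to get structured output
Version History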
v1.1.0
Now supports model selection — specify any of 19 models across 5 providers per request (e.g. gemini-2.5-flash, mistral-large-latest, claude-opus-4-5-20251101)
v1.0.1
- Added schema-guided extraction capability for precise data shaping.
- Introduced support for multiple output formats: JSON (default), table (columns + rows), and list.
- Updated examples and usage instructions to reflect schema and format options.
- Enhanced flexibility for structured data extraction from any webpage.
v1.0.0
- Initial release of Data Spider.
- Extracts structured data and insights from any webpage.
- Supports extraction of facts, figures, names, dates, prices, and more.
- Returns organized data, a summary, and the source URL.
- Requires AIPROX_SPEND_TOKEN for API access and authentication.
Metadata
Slug data-spider
Version 1.1.0
License MIT-0
Total installs 4
Current installs 4
Versions 3
FAQ

What is Data Spider?

Data Spider scrapes any webpage and extracts structured data as JSON, table, or list, with support for schema-guided extraction. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 597 total downloads to date.

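How do I install Data Spider?

Run the command "/install data-spider" in the OpenClaw or Claude Code chat box to install it in one step. Installation itself needs no further setup, but the skill's API calls require the AIPROX_SPEND_TOKEN environment variable to be set.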

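Is Data Spider free?

The skill itself is free to download, install, and use under the MIT-0 license. Its requests go to the paid aiprox.dev API, which authenticates and bills through your AIPROX_SPEND_TOKEN.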

Which platforms does Data Spider support?

Data Spider is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.

Who develops Data Spider?

It is developed and maintained by unixlamadev-spec (@unixlamadev-spec). The current version is v1.1.0.
