Data Spider

by unixlamadev-spec · GitHub ↗ · v1.1.0 · MIT-0
Cross-platform · ✓ Security Clean · 597 downloads · 1 star · 4 active installs · 3 versions
Install in OpenClaw: /install data-spider
Description
Scrape any webpage and extract structured data as JSON, table, or list. Supports schema-guided extraction.
README (SKILL.md)

Data Spider

Scrape and extract structured data from any webpage. Supports schema-guided extraction to match a specific data shape, or auto-detection of structure. Returns data as a JSON object, a table (columns + rows), or a flat list, depending on the format you choose.

When to Use

  • Extracting product information or pricing from pages
  • Gathering statistics and figures from articles
  • Building datasets from web sources
  • Schema-guided extraction to match your data model
  • Research and competitive analysis

Usage Flow

  1. Provide a webpage url
  2. Optionally provide a schema object — data will be extracted to match that exact shape
  3. Optionally set format: json (default), table, or list
  4. AIProx routes to the data-spider agent
  5. Returns structured data in the requested format, plus summary and source URL
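The flow above can be sketched in Python using only the standard library. This is a minimal sketch, not the skill's own implementation: the endpoint, header name, and body fields are taken from the curl examples in this README, while the function names (build_payload, build_request, extract) and the timeout are hypothetical choices made for illustration.

```python
import json
import os
import urllib.request

ENDPOINT = "https://aiprox.dev/api/orchestrate"  # endpoint used by the curl examples
FORMATS = ("json", "table", "list")              # formats listed in the usage flow


def build_payload(url, schema=None, fmt="json"):
    """Assemble the request body from the url, optional schema, and format."""
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {FORMATS}, got {fmt!r}")
    payload = {"url": url, "format": fmt}
    if schema is not None:
        payload["schema"] = schema
    return payload


def build_request(payload, token):
    """Create (but do not send) the POST request carrying the spend token."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Spend-Token": token},
        method="POST",
    )


def extract(url, schema=None, fmt="json", timeout=60):
    """Send the request and decode the JSON reply. This makes a paid network call."""
    token = os.environ["AIPROX_SPEND_TOKEN"]  # raises KeyError if the variable is unset
    request = build_request(build_payload(url, schema, fmt), token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping payload construction separate from the network call makes it possible to inspect or log the exact body before any billable request is sent.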

Security Manifest

Permission | Scope              | Reason
---------- | ------------------ | -----------------------------------
Network    | aiprox.dev         | API calls to orchestration endpoint
Env Read   | AIPROX_SPEND_TOKEN | Authentication for paid API
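A caller that wants to enforce the manifest on its own side could check both entries before sending anything. The sketch below is only an illustration: the check_manifest helper and its error messages are hypothetical, and only the host name and variable name come from the manifest.

```python
import os
from urllib.parse import urlsplit

ALLOWED_HOST = "aiprox.dev"       # the only network scope in the manifest
TOKEN_VAR = "AIPROX_SPEND_TOKEN"  # the only environment variable in the manifest


def check_manifest(endpoint, environ=None):
    """Return the spend token if the endpoint host and token match the manifest."""
    environ = os.environ if environ is None else environ
    host = urlsplit(endpoint).hostname
    if host != ALLOWED_HOST:
        raise ValueError(f"endpoint host {host!r} is outside the declared scope")
    token = environ.get(TOKEN_VAR)
    if not token:
        raise RuntimeError(f"{TOKEN_VAR} is not set")
    return token
```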

Make Request — JSON with Schema

curl -X POST https://aiprox.dev/api/orchestrate \
  -H "Content-Type: application/json" \
  -H "X-Spend-Token: $AIPROX_SPEND_TOKEN" \
  -d '{
    "url": "https://example.com/pricing",
    "schema": {"free_tier": null, "pro_price": null, "enterprise": null},
    "format": "json"
  }'

Response — JSON

{
  "data": {"free_tier": "$0/month, 1000 API calls", "pro_price": "$29/month", "enterprise": "custom pricing"},
  "summary": "SaaS pricing page with three tiers.",
  "source": "https://example.com/pricing",
  "format": "json"
}
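With schema-guided extraction it can be useful to confirm that every requested key actually came back with a value. The helper below is a hedged sketch: missing_fields is a hypothetical name, and it assumes that a field the service could not fill is either absent or null in data, which this README does not state.

```python
def missing_fields(schema, response):
    """List schema keys that are absent or null in the response's data object."""
    data = response.get("data") or {}
    return [key for key in schema if data.get(key) is None]


schema = {"free_tier": None, "pro_price": None, "enterprise": None}
response = {
    "data": {
        "free_tier": "$0/month, 1000 API calls",
        "pro_price": "$29/month",
        "enterprise": "custom pricing",
    },
    "summary": "SaaS pricing page with three tiers.",
    "source": "https://example.com/pricing",
    "format": "json",
}
print(missing_fields(schema, response))  # prints []
```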

Make Request — Table

curl -X POST https://aiprox.dev/api/orchestrate \
  -H "Content-Type: application/json" \
  -H "X-Spend-Token: $AIPROX_SPEND_TOKEN" \
  -d '{
    "task": "extract pricing tiers as a table",
    "url": "https://example.com/pricing",
    "format": "table"
  }'

Response — Table

{
  "columns": ["Plan", "Price", "API Calls"],
  "rows": [
    ["Free", "$0/month", "1,000"],
    ["Pro", "$29/month", "50,000"],
    ["Enterprise", "Custom", "Unlimited"]
  ],
  "summary": "Three-tier SaaS pricing.",
  "source": "https://example.com/pricing",
  "format": "table"
}
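The columns + rows shape converts directly into one record per row. A minimal sketch, assuming every row has the same length as columns (the table_to_records name is hypothetical):

```python
def table_to_records(response):
    """Turn a table-format response into a list of dicts keyed by column name."""
    columns = response["columns"]
    records = []
    for row in response["rows"]:
        if len(row) != len(columns):
            raise ValueError(f"row {row!r} does not match columns {columns!r}")
        records.append(dict(zip(columns, row)))
    return records


table = {
    "columns": ["Plan", "Price", "API Calls"],
    "rows": [
        ["Free", "$0/month", "1,000"],
        ["Pro", "$29/month", "50,000"],
        ["Enterprise", "Custom", "Unlimited"],
    ],
    "format": "table",
}
records = table_to_records(table)
print(records[1]["Price"])  # prints $29/month
```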

Response — List

{
  "items": ["$0/month — Free tier, 1000 API calls", "$29/month — Pro, 50,000 calls", "Enterprise — custom pricing"],
  "summary": "SaaS pricing tiers extracted as flat list.",
  "source": "https://example.com/pricing",
  "format": "list"
}
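Because the three response shapes carry their content under different keys, client code may want to dispatch on the format field. The sketch below is an illustration only: unpack is a hypothetical helper, and the key names per format are inferred from the sample responses in this README.

```python
# Content keys per format, as seen in the sample responses.
PAYLOAD_KEYS = {
    "json": ("data",),
    "table": ("columns", "rows"),
    "list": ("items",),
}


def unpack(response):
    """Return only the content keys that belong to the response's format."""
    fmt = response.get("format")
    if fmt not in PAYLOAD_KEYS:
        raise ValueError(f"unknown format: {fmt!r}")
    missing = [key for key in PAYLOAD_KEYS[fmt] if key not in response]
    if missing:
        raise KeyError(f"{fmt} response is missing {missing}")
    return {key: response[key] for key in PAYLOAD_KEYS[fmt]}
```

The summary and source fields appear in every sample response, so they can be read directly without going through the dispatcher.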

Trust Statement

Data Spider fetches and analyzes webpage contents via URL. Content is processed transiently and not stored. Analysis is performed by Claude via LightningProx. Respects robots.txt and rate limits. Your spend token is used for payment only.

Usage Guidance
This skill is coherent but relies on an external service (aiprox.dev) to fetch and analyze webpage content. Before installing or using it:
  1. Only send URLs that do not contain sensitive data (logins, internal docs, PII).
  2. Review aiprox.dev's privacy, storage, and billing policies. The SKILL.md claims transient processing, but you must trust the vendor.
  3. Treat AIPROX_SPEND_TOKEN as a secret: rotate it if leaked and limit its scope if possible.
  4. If you need scraping of sensitive or internal sites, prefer a self-hosted scraper or run tools locally instead of routing data to a third party.
  5. Test with non-sensitive pages first and monitor billing usage.
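Part of this guidance can be automated with a pre-flight check on the URL. The sketch below is one possible heuristic, not a complete safeguard: is_safe_to_send is a hypothetical helper, the blocked host suffixes are assumptions, and it does not resolve DNS, so a public-looking hostname that points at an internal address will still pass.

```python
import ipaddress
from urllib.parse import urlsplit


def is_safe_to_send(url):
    """Heuristically reject URLs that look internal or carry credentials."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    if parts.username or parts.password:
        return False  # credentials embedded in the URL would reach the third party
    host = parts.hostname
    if host == "localhost" or host.endswith((".local", ".internal")):
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return True  # an ordinary hostname; DNS is not resolved here
    return address.is_global  # False for private, loopback, and link-local ranges
```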
Capability Analysis
Type: OpenClaw Skill
Name: data-spider
Version: 1.1.0
The 'data-spider' skill is a standard API wrapper for a web scraping service hosted at aiprox.dev. It requires the AIPROX_SPEND_TOKEN environment variable for authentication, which is explicitly declared in the SKILL.md security manifest. The instructions and examples provided are consistent with the stated purpose of extracting structured data from URLs, and no malicious behaviors, obfuscation, or prompt-injection attacks were identified.
Capability Assessment
Purpose & Capability
Name/description (web scraping, schema extraction) match the runtime instructions: the SKILL.md explicitly calls aiprox.dev orchestrate endpoints to perform scraping. Requiring a single spend token for a hosted service is expected.
Instruction Scope
Instructions are focused on calling the external API and do not ask the agent to read local files or extra environment variables. However, the runtime flow sends the full target webpage content (the thing being scraped) to a third-party orchestration service (aiprox.dev), which can disclose any sensitive content present on the scraped pages.
Install Mechanism
No install spec and no code files—this is instruction-only, so nothing will be written to disk or fetched at install time by the skill itself.
Credentials
Only one required environment variable (AIPROX_SPEND_TOKEN) is declared and documented in the SKILL.md as the payment/auth token for the external API. That is proportionate, but the token is sensitive (used for billing/auth) and grants the service the ability to accept requests on your behalf.
Persistence & Privilege
The skill's always setting is false, and it does not request persistent system privileges or any config paths. It does not attempt to modify other skills or agent-wide settings.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install data-spider
  3. After installation, invoke the skill by name or use /data-spider
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.1.0
Now supports model selection — specify any of 19 models across 5 providers per request (e.g. gemini-2.5-flash, mistral-large-latest, claude-opus-4-5-20251101)
v1.0.1
  • Added schema-guided extraction capability for precise data shaping.
  • Introduced support for multiple output formats: JSON (default), table (columns + rows), and list.
  • Updated examples and usage instructions to reflect schema and format options.
  • Enhanced flexibility for structured data extraction from any webpage.
v1.0.0
  • Initial release of Data Spider.
  • Extracts structured data and insights from any webpage.
  • Supports extraction of facts, figures, names, dates, prices, and more.
  • Returns organized data, a summary, and the source URL.
  • Requires AIPROX_SPEND_TOKEN for API access and authentication.
Metadata
Slug: data-spider
Version: 1.1.0
License: MIT-0
All-time Installs: 4
Active Installs: 4
Total Versions: 3
Frequently Asked Questions

What is Data Spider?

Data Spider is an AI agent skill for Claude Code / OpenClaw that scrapes any webpage and extracts structured data as JSON, a table, or a list, with optional schema-guided extraction. It has 597 downloads so far.

How do I install Data Spider?

Run "/install data-spider" in the OpenClaw or Claude Code chat to install it in one step. To make requests, the AIPROX_SPEND_TOKEN environment variable must also be set.

Is Data Spider free?

The skill itself is free and licensed under MIT-0, so you can download and install it at no cost. Requests, however, go to the paid aiprox.dev API and are billed against your AIPROX_SPEND_TOKEN.

Which platforms does Data Spider support?

Data Spider is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Data Spider?

It is built and maintained by unixlamadev-spec (@unixlamadev-spec); the current version is v1.1.0.