Data Spider

by unixlamadev-spec · v1.1.0 · MIT-0

cross-platform ✓ Security Clean

Downloads: 597

Install in OpenClaw

/install data-spider

Description

Scrape any webpage and extract structured data as JSON, table, or list. Supports schema-guided extraction.

README (SKILL.md)

Data Spider

Scrape and extract structured data from any webpage. Supports schema-guided extraction to match a specific data shape, or auto-detection of structure when no schema is given. Returns data as a JSON object, a table (columns + rows), or a flat list, depending on the format you choose.

When to Use

Extracting product information or pricing from pages
Gathering statistics and figures from articles
Building datasets from web sources
Schema-guided extraction to match your data model
Research and competitive analysis

Usage Flow

Provide the webpage URL as url
Optionally provide a schema object; data will be extracted to match that exact shape
Optionally set format: json (default), table, or list
AIProx routes the request to the data-spider agent
The agent returns structured data in the requested format, plus a summary and the source URL
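
The first three steps of the flow can be sketched as a small payload builder. This is a minimal illustration, not part of the skill: build_payload and ALLOWED_FORMATS are hypothetical names, and the sketch assumes the request body carries only the url, schema, and format fields listed above (the table example further down also sends a task field, which is omitted here).

```python
import json

# The three output formats named in the usage flow.
ALLOWED_FORMATS = ("json", "table", "list")


def build_payload(url, schema=None, fmt="json"):
    """Assemble a request body from a URL, an optional schema, and a format."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {ALLOWED_FORMATS}, got {fmt!r}")
    payload = {"url": url, "format": fmt}
    if schema is not None:
        # Keys name the fields to extract; values stay as null placeholders.
        payload["schema"] = schema
    return payload


# Serialize the same body that the JSON-with-schema example sends.
body = json.dumps(build_payload(
    "https://example.com/pricing",
    schema={"free_tier": None, "pro_price": None, "enterprise": None},
))
```

Rejecting an unknown format locally avoids spending a paid request on a call that is likely to fail.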

Security Manifest

Permission	Scope	Reason
Network	aiprox.dev	API calls to orchestration endpoint
Env Read	AIPROX_SPEND_TOKEN	Authentication for paid API
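
A minimal Python sketch of what the manifest implies: one environment variable is read and one host is contacted. The function name build_request is hypothetical. The endpoint and header names are taken from the curl examples in this README, and the request is only constructed here, not sent.

```python
import json
import os
import urllib.request

# Endpoint taken from the curl examples in this README.
ENDPOINT = "https://aiprox.dev/api/orchestrate"


def build_request(payload):
    """Build (but do not send) a POST request carrying the spend token."""
    token = os.environ.get("AIPROX_SPEND_TOKEN")
    if not token:
        raise RuntimeError("AIPROX_SPEND_TOKEN is not set")
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Spend-Token": token,
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would be the only network call, and it targets aiprox.dev only, which matches the declared Network scope.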

Make Request — JSON with Schema

curl -X POST https://aiprox.dev/api/orchestrate \
  -H "Content-Type: application/json" \
  -H "X-Spend-Token: $AIPROX_SPEND_TOKEN" \
  -d '{
    "url": "https://example.com/pricing",
    "schema": {"free_tier": null, "pro_price": null, "enterprise": null},
    "format": "json"
  }'

Response — JSON

{
  "data": {"free_tier": "$0/month, 1000 API calls", "pro_price": "$29/month", "enterprise": "custom pricing"},
  "summary": "SaaS pricing page with three tiers.",
  "source": "https://example.com/pricing",
  "format": "json"
}
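
Because schema-guided extraction is meant to return data in the requested shape, a caller may want to confirm that before using the result. The following is a hedged sketch only: check_shape is a hypothetical helper, and it assumes the response is a parsed JSON object with a top-level data field, as in the sample above.

```python
def check_shape(schema, data):
    """Compare extracted data against the schema keys.

    Returns a (missing, unexpected) pair of sorted key lists.
    """
    missing = sorted(key for key in schema if key not in data)
    unexpected = sorted(key for key in data if key not in schema)
    return missing, unexpected


schema = {"free_tier": None, "pro_price": None, "enterprise": None}
response = {
    "data": {
        "free_tier": "$0/month, 1000 API calls",
        "pro_price": "$29/month",
        "enterprise": "custom pricing",
    },
    "summary": "SaaS pricing page with three tiers.",
    "source": "https://example.com/pricing",
    "format": "json",
}

missing, unexpected = check_shape(schema, response["data"])
```

For the sample response both lists are empty; a non-empty missing list would indicate fields the extraction did not fill.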

Make Request — Table

curl -X POST https://aiprox.dev/api/orchestrate \
  -H "Content-Type: application/json" \
  -H "X-Spend-Token: $AIPROX_SPEND_TOKEN" \
  -d '{
    "task": "extract pricing tiers as a table",
    "url": "https://example.com/pricing",
    "format": "table"
  }'

Response — Table

{
  "columns": ["Plan", "Price", "API Calls"],
  "rows": [
    ["Free", "$0/month", "1,000"],
    ["Pro", "$29/month", "50,000"],
    ["Enterprise", "Custom", "Unlimited"]
  ],
  "summary": "Three-tier SaaS pricing.",
  "source": "https://example.com/pricing",
  "format": "table"
}
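
A table response pairs a columns list with positional rows, so a common next step is turning each row into a dictionary keyed by column name. This is a small sketch under the assumption that the response has the columns and rows fields shown above; table_to_records is a hypothetical name.

```python
def table_to_records(response):
    """Convert a table-format response into a list of per-row dictionaries."""
    columns = response["columns"]
    records = []
    for row in response["rows"]:
        if len(row) != len(columns):
            # Guard against ragged rows rather than silently dropping cells.
            raise ValueError(f"row {row!r} does not match columns {columns!r}")
        records.append(dict(zip(columns, row)))
    return records


table = {
    "columns": ["Plan", "Price", "API Calls"],
    "rows": [
        ["Free", "$0/month", "1,000"],
        ["Pro", "$29/month", "50,000"],
        ["Enterprise", "Custom", "Unlimited"],
    ],
}
records = table_to_records(table)
```

Each record can then be read by column name, for example records[1]["Price"] for the Pro plan.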

Response — List

{
  "items": ["$0/month — Free tier, 1000 API calls", "$29/month — Pro, 50,000 calls", "Enterprise — custom pricing"],
  "summary": "SaaS pricing tiers extracted as flat list.",
  "source": "https://example.com/pricing",
  "format": "list"
}
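
The three sample responses place the extracted content under different keys (data, columns and rows, or items). A caller handling all three formats might dispatch on the format field, roughly as sketched below. The helper name extract_body is hypothetical, and falling back to json when format is absent is an assumption based on json being the documented default.

```python
def extract_body(response):
    """Return the extracted content from a response, whatever its format."""
    fmt = response.get("format", "json")  # assumption: json when absent
    if fmt == "json":
        return response["data"]
    if fmt == "table":
        return {"columns": response["columns"], "rows": response["rows"]}
    if fmt == "list":
        return response["items"]
    raise ValueError(f"unknown format: {fmt!r}")


list_response = {
    "items": ["Free tier", "Pro", "Enterprise"],
    "summary": "SaaS pricing tiers extracted as flat list.",
    "source": "https://example.com/pricing",
    "format": "list",
}
items = extract_body(list_response)
```

The summary and source fields appear in all three samples, so they can be read directly without dispatching.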

Trust Statement

Data Spider fetches and analyzes webpage content from the URL you provide. Content is processed transiently and not stored. Analysis is performed by Claude via LightningProx. The service respects robots.txt and rate limits. Your spend token is used for payment only.

Usage Guidance

This skill is coherent but relies on an external service (aiprox.dev) to fetch and analyze webpage content. Before installing or using it:

1) Only send URLs that do not contain sensitive data (logins, internal docs, PII).
2) Review aiprox.dev’s privacy, storage, and billing policies: the SKILL.md claims transient processing, but you must trust the vendor.
3) Treat AIPROX_SPEND_TOKEN as a secret: rotate it if leaked and limit its scope if possible.
4) If you need scraping of sensitive or internal sites, prefer a self-hosted scraper or run tools locally instead of routing data to a third party.
5) Test with non-sensitive pages first and monitor billing usage.
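
One way to act on the first and fourth points is a local pre-flight check that refuses obviously internal or credential-bearing URLs before anything is sent. The sketch below is illustrative only: is_safe_to_send is a hypothetical helper, and the rules it applies (embedded credentials, localhost, .local and .internal suffixes, private IP literals) are assumed heuristics, not something the skill or aiprox.dev enforces. It cannot tell whether a public page contains PII.

```python
import ipaddress
from urllib.parse import urlsplit


def is_safe_to_send(url):
    """Heuristic check that a URL does not point at an obviously internal target."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    if parts.username or parts.password:
        # Credentials embedded in the URL would be disclosed to the service.
        return False
    host = parts.hostname or ""
    if not host or host == "localhost" or host.endswith((".local", ".internal")):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal; the hostname is assumed to be public.
        return True
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)
```

A caller would run this check first and skip the request when it returns False.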

Capability Analysis

Type: OpenClaw Skill
Name: data-spider
Version: 1.1.0

The 'data-spider' skill is a standard API wrapper for a web scraping service hosted at aiprox.dev. It requires the AIPROX_SPEND_TOKEN environment variable for authentication, which is explicitly declared in the SKILL.md security manifest. The instructions and examples provided are consistent with the stated purpose of extracting structured data from URLs, and no malicious behaviors, obfuscation, or prompt-injection attacks were identified.

Capability Assessment

✓ Purpose & Capability

Name/description (web scraping, schema extraction) match the runtime instructions: the SKILL.md explicitly calls aiprox.dev orchestrate endpoints to perform scraping. Requiring a single spend token for a hosted service is expected.

ℹ Instruction Scope

Instructions are focused on calling the external API and do not ask the agent to read local files or extra environment variables. However, the runtime flow sends the full target webpage content (the thing being scraped) to a third-party orchestration service (aiprox.dev), which can disclose any sensitive content present on the scraped pages.

✓ Install Mechanism

No install spec and no code files—this is instruction-only, so nothing will be written to disk or fetched at install time by the skill itself.

✓ Credentials

Only one required environment variable (AIPROX_SPEND_TOKEN) is declared and documented in the SKILL.md as the payment/auth token for the external API. That is proportionate, but the token is sensitive (used for billing/auth) and grants the service the ability to accept requests on your behalf.

✓ Persistence & Privilege

The skill's always setting is false, and the skill does not request persistent system privileges or any config paths. It does not attempt to modify other skills or agent-wide settings.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install data-spider
After installation, invoke the skill by name or use /data-spider
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.1.0

Now supports model selection — specify any of 19 models across 5 providers per request (e.g. gemini-2.5-flash, mistral-large-latest, claude-opus-4-5-20251101)

v1.0.1

- Added schema-guided extraction capability for precise data shaping.
- Introduced support for multiple output formats: JSON (default), table (columns + rows), and list.
- Updated examples and usage instructions to reflect schema and format options.
- Enhanced flexibility for structured data extraction from any webpage.

v1.0.0

- Initial release of Data Spider.
- Extracts structured data and insights from any webpage.
- Supports extraction of facts, figures, names, dates, prices, and more.
- Returns organized data, a summary, and the source URL.
- Requires AIPROX_SPEND_TOKEN for API access and authentication.

Metadata

Slug: data-spider
Version: 1.1.0
License: MIT-0
All-time Installs: 4
Active Installs: 4
Total Versions: 3

Frequently Asked Questions

What is Data Spider?

Data Spider scrapes any webpage and extracts structured data as JSON, a table, or a list, with optional schema-guided extraction. It is an AI agent skill for Claude Code / OpenClaw, with 597 downloads so far.

How do I install Data Spider?

Run "/install data-spider" in the OpenClaw or Claude Code chat to install it in one step. To use the skill, the AIPROX_SPEND_TOKEN environment variable must also be set.

Is Data Spider free?

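The skill itself is free and licensed under MIT-0, so you can download and install it at no cost. Its requests go through the paid aiprox.dev API, which is authenticated and billed with your AIPROX_SPEND_TOKEN.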

Which platforms does Data Spider support?

Data Spider is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Data Spider?

It is built and maintained by unixlamadev-spec (@unixlamadev-spec); the current version is v1.1.0.
