
Data Pipelines

Name: Data Pipelines
Author: mike47512

by mike47512 · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ✓ Security Clean

Downloads: 138

Install in OpenClaw

/install data-pipelines

Description

Deep data pipeline workflow—ingestion, orchestration, idempotency, data quality, SLAs, observability, and lineage. Use when building batch/stream pipelines,...

README (SKILL.md)

Data Pipelines

Pipelines fail on silent schema drift, partial writes, and unclear ownership. Design for at-least-once delivery, idempotent sinks, and observable stages.

When to Offer This Workflow

Trigger conditions:

Batch or streaming ingestion (Kafka, Fivetran, Airflow, Dagster, Spark, etc.)
Late data, backfills, or schema changes breaking jobs
SLA misses on freshness or row counts

Initial offer:

Use six stages: (1) requirements & SLAs, (2) source contracts, (3) transforms & idempotency, (4) orchestration & dependencies, (5) quality & monitoring, (6) lineage & operations. Confirm batch vs stream and the cloud stack before starting.

Stage 1: Requirements & SLAs

Goal: Freshness (latency), completeness expectations, cost ceiling, failure tolerance (quarantine vs stop-the-line).

Exit condition: SLA table: pipeline → metric → threshold.
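
The SLA table from the exit condition can be kept as data and checked mechanically. The following is a minimal sketch, not part of the skill itself: the `Sla` class, the `orders_daily` pipeline, the metric names, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sla:
    pipeline: str
    metric: str       # e.g. "freshness_minutes" or "row_count" (hypothetical names)
    threshold: float
    direction: str    # "max": observed must be <= threshold; "min": observed must be >= threshold


# Hypothetical SLA table: pipeline -> metric -> threshold.
SLA_TABLE = [
    Sla("orders_daily", "freshness_minutes", 60, "max"),
    Sla("orders_daily", "row_count", 1000, "min"),
]


def sla_breaches(observed: dict) -> list:
    """Return the SLAs that are breached or have no observation at all.

    `observed` maps (pipeline, metric) to the latest measured value.
    """
    breaches = []
    for sla in SLA_TABLE:
        value = observed.get((sla.pipeline, sla.metric))
        if value is None:
            breaches.append(sla)  # a missing metric is treated as a breach
        elif sla.direction == "max" and value > sla.threshold:
            breaches.append(sla)
        elif sla.direction == "min" and value < sla.threshold:
            breaches.append(sla)
    return breaches
```

Treating a missing observation as a breach is a design choice in this sketch: a pipeline that stops reporting should alert rather than pass silently.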

Stage 2: Source Contracts

Goal: Schema versioning; CDC vs snapshot pulls; API rate limits.

Practices

Raw landing zone immutable; curated layers downstream
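
A source contract can be enforced at the boundary between the raw landing zone and the curated layers. The sketch below assumes a contract expressed as a plain mapping of field name to Python type; `CONTRACT_V1` and its fields are illustrative, and a real pipeline would more likely use a schema registry or a validation library.

```python
# Hypothetical contract for one source; the version and fields are illustrative.
CONTRACT_V1 = {"order_id": int, "amount": float, "currency": str}


def schema_drift(record: dict, contract: dict) -> dict:
    """Compare one raw record against the contract and report the differences."""
    missing = sorted(k for k in contract if k not in record)
    unexpected = sorted(k for k in record if k not in contract)
    wrong_type = sorted(
        k for k, expected in contract.items()
        if k in record and not isinstance(record[k], expected)
    )
    return {"missing": missing, "unexpected": unexpected, "wrong_type": wrong_type}


def conforms(record: dict, contract: dict) -> bool:
    """True when the record has exactly the contracted fields with the contracted types."""
    return not any(schema_drift(record, contract).values())
```

Records that fail `conforms` can be quarantined or can stop the run, depending on the failure policy agreed in Stage 1.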

Stage 3: Transforms & Idempotency

Goal: Deterministic transforms; upsert keys; partition strategy for rewinds.

Practices

Watermark progress for incremental loads

Stage 4: Orchestration & Dependencies

Goal: Clear DAG; retry policy; backfill without double counting; SLA miss alerts.
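
The retry policy and the "backfill without double counting" goal can be illustrated without any particular orchestrator. The sketch below is an assumption-laden example, not the API of Airflow, Dagster, or any other tool: `run_with_retries` and `backfill` are hypothetical helpers, and the sink is a plain dictionary keyed by partition.

```python
import time
from typing import Callable


def run_with_retries(task: Callable[[], object], attempts: int = 3,
                     base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep):
    """Run task, retrying on any exception with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == attempts:
                raise  # out of retries: surface the failure to the caller
            sleep(base_delay * 2 ** (attempt - 1))  # 1x, 2x, 4x, ... base_delay


def backfill(sink: dict, partitions: list, compute: Callable[[str], list],
             sleep: Callable[[float], None] = time.sleep) -> None:
    """Recompute each partition and overwrite it, so reruns never double count."""
    for partition in partitions:
        rows = run_with_retries(lambda: compute(partition), sleep=sleep)
        sink[partition] = rows  # overwrite the whole partition, never append
```

Overwriting a whole partition is what makes the backfill safe to rerun: running it twice for the same date yields the same sink contents as running it once.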

Stage 5: Quality & Monitoring

Goal: Data quality checks (null spikes, row bounds, referential checks); metrics on lag, duration, error rate.
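
The three kinds of checks named above (null spikes, row bounds, referential checks) can be sketched as plain functions over a batch of rows. The function names, parameters, and failure labels below are hypothetical; dedicated data quality tools offer richer versions of the same ideas.

```python
def null_rate(rows: list, column: str) -> float:
    """Fraction of rows where `column` is missing or None (0.0 for an empty batch)."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is None) / len(rows)


def quality_failures(rows: list, parents: set, *, min_rows: int, max_rows: int,
                     max_null_rate: float, null_column: str, fk_column: str) -> list:
    """Return the names of the checks that failed for this batch."""
    failures = []
    if not (min_rows <= len(rows) <= max_rows):
        failures.append("row_bounds")
    if null_rate(rows, null_column) > max_null_rate:
        failures.append("null_spike")
    # Referential check: every non-null foreign key must exist in the parent key set.
    orphans = [r for r in rows
               if r.get(fk_column) is not None and r[fk_column] not in parents]
    if orphans:
        failures.append("referential")
    return failures
```

An empty result means the batch passed; a non-empty one can feed the alerting and the quarantine or stop-the-line policy chosen in Stage 1.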

Stage 6: Lineage & Operations

Goal: Column-level lineage where valuable; on-call runbook; ownership per pipeline.
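
Column-level lineage can start as a simple mapping from each column to the columns it is derived from. The sketch below assumes such a hand-maintained mapping; the `LINEAGE` contents and the column names are invented for illustration, and lineage tooling would normally generate this graph rather than have it typed in.

```python
# Hypothetical lineage map: each column -> the columns it is derived from.
LINEAGE = {
    "mart.revenue.total": ["staging.orders.amount", "staging.orders.currency"],
    "staging.orders.amount": ["raw.orders.amount"],
    "staging.orders.currency": ["raw.orders.currency"],
}


def upstream(column: str, lineage: dict) -> set:
    """Return every column that `column` depends on, directly or transitively."""
    seen = set()
    stack = list(lineage.get(column, []))
    while stack:
        current = stack.pop()
        if current not in seen:  # guards against cycles and repeated work
            seen.add(current)
            stack.extend(lineage.get(current, []))
    return seen
```

During an incident, `upstream("mart.revenue.total", LINEAGE)` lists the columns to inspect, which pairs naturally with the per-pipeline ownership record when deciding whom to page.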

Final Review Checklist

SLAs and failure policy explicit
Source contracts and schema evolution path
Idempotent writes and checkpointing
Orchestration with retries and safe backfill
Data quality checks and alerts
Lineage and ownership documented

Tips for Effective Guidance

Track compute cost separately from storage cost, especially for jobs with large shuffles.
Pair with etl-design for batch patterns and message-queues for streaming handoffs.

Handling Deviations

Single-script pipelines: still document inputs, outputs, and schedule.

Usage Guidance

This skill is high-level documentation for designing and operating data pipelines and appears internally consistent. Because it's instruction-only and requests no credentials, it carries low direct risk. Before you use it in an agent that can act autonomously, consider: (1) do not provision cloud/database credentials to the agent unless you want it to run pipeline actions; (2) if you combine this with other skills (etl connectors, cloud deployers), review those skills for credential requests and install behaviors; and (3) treat the guidance as advisory — it won't execute code itself, so verify any automated playbooks you create from it before running against production. If you want higher confidence, ask the publisher for a homepage or source repo to confirm provenance.

Capability Analysis

Type: OpenClaw Skill Name: data-pipelines Version: 1.0.0 The skill bundle contains only architectural documentation and procedural guidance for designing data pipelines (SKILL.md). It lacks any executable code, network requests, or instructions that could be used for prompt injection or malicious activity.

Capability Assessment

✓ Purpose & Capability

Name/description (deep data pipeline workflow) match the content of SKILL.md: stage-by-stage guidance for ingestion, orchestration, idempotency, quality, SLAs, lineage. There are no unrelated requirements (no env vars, binaries, or config paths).

✓ Instruction Scope

The SKILL.md contains only design and operational guidance for pipelines (six-stage workflow, checklist, tips). It does not instruct the agent to read local files, access credentials, call external endpoints, or perform system operations beyond giving advice.

✓ Install Mechanism

No install spec and no code files — instruction-only. This minimizes write/execute risk; nothing will be downloaded or installed by the skill itself.

✓ Credentials

The skill declares no required environment variables, credentials, or config paths. Its guidance is conceptual and does not ask for secrets or unrelated credentials.

✓ Persistence & Privilege

Skill is user-invocable and not always-enabled; it does not request elevated persistence or modify other skills. Autonomous invocation is allowed by platform default but is not combined with other concerning privileges here.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install data-pipelines
After installation, invoke the skill by name or use /data-pipelines
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

- Initial release of the "data-pipelines" skill.
- Provides a comprehensive workflow covering ingestion, orchestration, idempotency, data quality, SLAs, observability, and lineage.
- Includes six structured stages: requirements & SLAs, source contracts, transforms & idempotency, orchestration & dependencies, quality & monitoring, and lineage & operations.
- Offers trigger conditions for when the workflow is relevant and a detailed checklist for final review.
- Contains practical tips and guidance for both batch and streaming pipelines, with emphasis on reliability and clarity.

Metadata

Slug data-pipelines

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Data Pipelines?

Deep data pipeline workflow—ingestion, orchestration, idempotency, data quality, SLAs, observability, and lineage. Use when building batch/stream pipelines,... It is an AI Agent Skill for Claude Code / OpenClaw, with 138 downloads so far.

How do I install Data Pipelines?

Run "/install data-pipelines" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Data Pipelines free?

Yes, Data Pipelines is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Data Pipelines support?

Data Pipelines is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Data Pipelines?

It is built and maintained by mike47512 (@mike47512); the current version is v1.0.0.
