/install data-pipelines
Data Pipelines
Pipelines fail on silent schema drift, partial writes, and unclear ownership. Design for at-least-once delivery, idempotent sinks, and observable stages.
When to Offer This Workflow
Trigger conditions:
- Batch or streaming ingestion (Kafka, Fivetran, Airflow, Dagster, Spark, etc.)
- Late data, backfills, or schema changes breaking jobs
- SLA misses on freshness or row counts
Initial offer:
Use six stages: (1) requirements & SLAs, (2) source contracts, (3) transforms & idempotency, (4) orchestration & dependencies, (5) quality & monitoring, (6) lineage & operations). Confirm batch vs stream and cloud stack.
Stage 1: Requirements & SLAs
Goal: Freshness (latency), completeness expectations, cost ceiling, failure tolerance (quarantine vs stop-the-line).
Exit condition: SLA table: pipeline → metric → threshold.
Stage 2: Source Contracts
Goal: Schema versioning; CDC vs snapshot pulls; API rate limits.
Practices
- Raw landing zone immutable; curated layers downstream
Stage 3: Transforms & Idempotency
Goal: Deterministic transforms; upsert keys; partition strategy for rewinds.
Practices
- Watermark progress for incremental loads
Stage 4: Orchestration & Dependencies
Goal: Clear DAG; retry policy; backfill without double counting; SLA miss alerts.
Stage 5: Quality & Monitoring
Goal: Data quality checks (null spikes, row bounds, referential checks); metrics on lag, duration, error rate.
Stage 6: Lineage & Operations
Goal: Column-level lineage where valuable; on-call runbook; ownership per pipeline.
Final Review Checklist
- SLAs and failure policy explicit
- Source contracts and schema evolution path
- Idempotent writes and checkpointing
- Orchestration with retries and safe backfill
- Data quality checks and alerts
- Lineage and ownership documented
Tips for Effective Guidance
- Separate compute from storage cost awareness for large shuffles.
- Pair with etl-design for batch patterns and message-queues for streaming handoffs.
Handling Deviations
- Single-script pipelines: still document inputs, outputs, and schedule.
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat:
/install data-pipelines - After installation, invoke the skill by name or use
/data-pipelines - Provide required inputs per the skill's parameter spec and get structured output
What is Data Pipelines?
Deep data pipeline workflow—ingestion, orchestration, idempotency, data quality, SLAs, observability, and lineage. Use when building batch/stream pipelines,... It is an AI Agent Skill for Claude Code / OpenClaw, with 138 downloads so far.
How do I install Data Pipelines?
Run "/install data-pipelines" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Data Pipelines free?
Yes, Data Pipelines is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Data Pipelines support?
Data Pipelines is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).
Who created Data Pipelines?
It is built and maintained by mike47512 (@mike47512); the current version is v1.0.0.