/install data-pipelines
Data Pipelines
Pipelines fail on silent schema drift, partial writes, and unclear ownership. Design for at-least-once delivery, idempotent sinks, and observable stages.
When to Offer This Workflow
Trigger conditions:
- Batch or streaming ingestion (Kafka, Fivetran, Airflow, Dagster, Spark, etc.)
- Late data, backfills, or schema changes breaking jobs
- SLA misses on freshness or row counts
Initial offer:
Use six stages: (1) requirements & SLAs, (2) source contracts, (3) transforms & idempotency, (4) orchestration & dependencies, (5) quality & monitoring, (6) lineage & operations). Confirm batch vs stream and cloud stack.
Stage 1: Requirements & SLAs
Goal: Freshness (latency), completeness expectations, cost ceiling, failure tolerance (quarantine vs stop-the-line).
Exit condition: SLA table: pipeline → metric → threshold.
Stage 2: Source Contracts
Goal: Schema versioning; CDC vs snapshot pulls; API rate limits.
Practices
- Raw landing zone immutable; curated layers downstream
Stage 3: Transforms & Idempotency
Goal: Deterministic transforms; upsert keys; partition strategy for rewinds.
Practices
- Watermark progress for incremental loads
Stage 4: Orchestration & Dependencies
Goal: Clear DAG; retry policy; backfill without double counting; SLA miss alerts.
Stage 5: Quality & Monitoring
Goal: Data quality checks (null spikes, row bounds, referential checks); metrics on lag, duration, error rate.
Stage 6: Lineage & Operations
Goal: Column-level lineage where valuable; on-call runbook; ownership per pipeline.
Final Review Checklist
- SLAs and failure policy explicit
- Source contracts and schema evolution path
- Idempotent writes and checkpointing
- Orchestration with retries and safe backfill
- Data quality checks and alerts
- Lineage and ownership documented
Tips for Effective Guidance
- Separate compute from storage cost awareness for large shuffles.
- Pair with etl-design for batch patterns and message-queues for streaming handoffs.
Handling Deviations
- Single-script pipelines: still document inputs, outputs, and schedule.
- 确保已安装 OpenClaw(本地或 Docker 部署)
- 在对话框中输入安装命令:
/install data-pipelines - 安装完成后,直接呼叫该 Skill 的名称或使用
/data-pipelines触发 - 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
Data Pipelines 是什么?
Deep data pipeline workflow—ingestion, orchestration, idempotency, data quality, SLAs, observability, and lineage. Use when building batch/stream pipelines,... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 138 次。
如何安装 Data Pipelines?
在 OpenClaw 或 Claude Code 对话框中运行命令「/install data-pipelines」即可一键安装,无需额外配置。
Data Pipelines 是免费的吗?
是的,Data Pipelines 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。
Data Pipelines 支持哪些平台?
Data Pipelines 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。
谁开发了 Data Pipelines?
由 mike47512(@mike47512)开发并维护,当前版本 v1.0.0。