mike47512

Data Pipelines

Author: mike47512 · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 138
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install data-pipelines
Description
Deep data pipeline workflow—ingestion, orchestration, idempotency, data quality, SLAs, observability, and lineage. Use when building batch/stream pipelines,...
Usage instructions (SKILL.md)

Data Pipelines

Pipelines fail on silent schema drift, partial writes, and unclear ownership. Design for at-least-once delivery, idempotent sinks, and observable stages.

When to Offer This Workflow

Trigger conditions:

  • Batch or streaming ingestion (Kafka, Fivetran, Airflow, Dagster, Spark, etc.)
  • Late data, backfills, or schema changes breaking jobs
  • SLA misses on freshness or row counts

Initial offer:

Use six stages: (1) requirements & SLAs, (2) source contracts, (3) transforms & idempotency, (4) orchestration & dependencies, (5) quality & monitoring, (6) lineage & operations. Confirm batch vs stream and cloud stack.


Stage 1: Requirements & SLAs

Goal: Freshness (latency), completeness expectations, cost ceiling, failure tolerance (quarantine vs stop-the-line).

Exit condition: SLA table: pipeline → metric → threshold.
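
One way to make the SLA table concrete is to hold it as data and evaluate observed metrics against it. The sketch below is illustrative only: the `Sla` class, the `breaches` helper, the `orders_daily` pipeline, and the metric names are all hypothetical, not part of the skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sla:
    pipeline: str
    metric: str       # hypothetical metric name, e.g. "freshness_minutes"
    threshold: float
    direction: str    # "max": value must not exceed; "min": value must not fall below


def breaches(slas, observed):
    """Return the SLAs violated by observed[(pipeline, metric)] values.

    A missing observation is treated as a breach, since silence is a failure too.
    """
    violated = []
    for sla in slas:
        value = observed.get((sla.pipeline, sla.metric))
        if value is None:
            violated.append(sla)
        elif sla.direction == "max" and value > sla.threshold:
            violated.append(sla)
        elif sla.direction == "min" and value < sla.threshold:
            violated.append(sla)
    return violated


# Hypothetical SLA table: pipeline -> metric -> threshold
SLA_TABLE = [
    Sla("orders_daily", "freshness_minutes", 60, "max"),
    Sla("orders_daily", "row_count", 1000, "min"),
]
```

Whether a breach quarantines the batch or stops the line is the failure-tolerance decision named in the goal; this sketch only detects the breach.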


Stage 2: Source Contracts

Goal: Schema versioning; CDC vs snapshot pulls; API rate limits.

Practices

  • Raw landing zone immutable; curated layers downstream
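
A source contract can be checked at the boundary between the raw landing zone and the curated layers. The following is a minimal sketch assuming a hypothetical `orders` source with three fields; the contract layout and the `check_record` helper are invented for illustration, and real pipelines often use a schema registry or a validation library instead.

```python
# Hypothetical contract, version 2, for an "orders" source.
EXPECTED_SCHEMA = {
    "version": 2,
    "fields": {"order_id": int, "amount": float, "currency": str},
}


def check_record(record, contract=EXPECTED_SCHEMA):
    """Return a list of contract violations for one record (empty means it conforms)."""
    problems = []
    fields = contract["fields"]
    # Missing fields and wrong types, in contract order.
    for name, expected_type in fields.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"wrong type for {name}: {type(record[name]).__name__}")
    # Fields the contract does not know about signal schema drift.
    for name in record:
        if name not in fields:
            problems.append(f"unexpected field: {name}")
    return problems
```

Records that fail the check can stay untouched in the raw zone while the curated load is quarantined or halted, which keeps the landing data immutable.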

Stage 3: Transforms & Idempotency

Goal: Deterministic transforms; upsert keys; partition strategy for rewinds.

Practices

  • Watermark progress for incremental loads
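
Upsert keys and watermarks can work together as in the sketch below. It assumes each row carries a hypothetical `id` key and a monotonically comparable `updated_at` value, and it uses an in-memory dict as a stand-in for the sink; a real sink would use its own merge or upsert operation.

```python
def incremental_load(source_rows, sink, watermark):
    """Upsert rows newer than the watermark into sink; return the new watermark.

    sink is a dict keyed by row id, standing in for a table with an upsert key.
    """
    new_watermark = watermark
    for row in source_rows:
        if row["updated_at"] <= watermark:
            continue  # already covered by an earlier run
        # Upsert by key: replaying a row overwrites it instead of duplicating it.
        sink[row["id"]] = row
        new_watermark = max(new_watermark, row["updated_at"])
    return new_watermark
```

Because writes are keyed, rerunning with a stale watermark (for example after a crash before the watermark was saved) leaves the sink unchanged. One caveat of this simple form: rows that arrive later with an `updated_at` equal to the saved watermark would be skipped, so late data may need a lookback window.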

Stage 4: Orchestration & Dependencies

Goal: Clear DAG; retry policy; backfill without double counting; SLA miss alerts.
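
Orchestrators such as Airflow and Dagster provide their own retry and backfill mechanisms; the sketch below only illustrates the two ideas in plain Python. The helper names are hypothetical, and the table is modeled as a dict from partition key to rows.

```python
import time


def run_with_retries(task, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call task() until it succeeds or max_attempts is reached.

    Waits base_delay, then 2x, 4x, ... between attempts (exponential backoff).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to alerting
            sleep(base_delay * 2 ** (attempt - 1))


def backfill_partition(table, partition, rows):
    """Replace one partition wholesale so a rerun never double counts."""
    table[partition] = list(rows)
```

Overwriting a whole partition, rather than appending to it, is what makes the backfill safe to retry: running it twice for the same date yields the same table.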


Stage 5: Quality & Monitoring

Goal: Data quality checks (null spikes, row bounds, referential checks); metrics on lag, duration, error rate.
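
The three kinds of checks named above can be sketched as simple functions over a batch of rows. Column names, thresholds, and the `quality_report` helper are assumptions for illustration; dedicated data quality tools cover the same ground with more features.

```python
def null_rate(rows, column):
    """Fraction of rows where column is missing or None (0.0 for an empty batch)."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is None) / len(rows)


def quality_report(rows, *, column, max_null_rate, min_rows, max_rows, key, valid_keys):
    """Return the names of failed checks for one batch (empty list means it passed)."""
    failures = []
    # Row bounds: batch size must fall inside the expected range.
    if not (min_rows <= len(rows) <= max_rows):
        failures.append("row_bounds")
    # Null spike: too many missing values in a monitored column.
    if null_rate(rows, column) > max_null_rate:
        failures.append("null_spike")
    # Referential check: every foreign key must exist in the reference set.
    if any(r.get(key) not in valid_keys for r in rows):
        failures.append("referential")
    return failures
```

The returned failure names can feed the same alerting path as the lag, duration, and error-rate metrics.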


Stage 6: Lineage & Operations

Goal: Column-level lineage where valuable; on-call runbook; ownership per pipeline.
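
Column-level lineage can be represented as a mapping from each column to the columns it is derived from. The sketch below uses invented table and column names and a hypothetical `upstream` helper to show how such a mapping answers "what feeds this column?".

```python
# Hypothetical lineage: each column maps to the columns it is derived from.
LINEAGE = {
    "mart.revenue.total": ["staging.orders.amount", "staging.orders.currency"],
    "staging.orders.amount": ["raw.orders.amount"],
    "staging.orders.currency": ["raw.orders.currency"],
}


def upstream(column, lineage=LINEAGE):
    """Return every transitive upstream column of `column`, sorted by name."""
    seen = set()
    stack = [column]
    while stack:
        current = stack.pop()
        for parent in lineage.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return sorted(seen)
```

During an incident, the same traversal run in the opposite direction would list the downstream columns affected by a broken source, which is useful input for the on-call runbook.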


Final Review Checklist

  • SLAs and failure policy explicit
  • Source contracts and schema evolution path
  • Idempotent writes and checkpointing
  • Orchestration with retries and safe backfill
  • Data quality checks and alerts
  • Lineage and ownership documented

Tips for Effective Guidance

  • Track compute and storage costs separately, especially for jobs with large shuffles.
  • Pair with etl-design for batch patterns and message-queues for streaming handoffs.

Handling Deviations

  • Single-script pipelines: still document inputs, outputs, and schedule.
Safe usage recommendations
This skill is high-level documentation for designing and operating data pipelines and appears internally consistent. Because it's instruction-only and requests no credentials, it carries low direct risk. Before you use it in an agent that can act autonomously, consider: (1) do not provision cloud/database credentials to the agent unless you want it to run pipeline actions; (2) if you combine this with other skills (etl connectors, cloud deployers), review those skills for credential requests and install behaviors; and (3) treat the guidance as advisory — it won't execute code itself, so verify any automated playbooks you create from it before running against production. If you want higher confidence, ask the publisher for a homepage or source repo to confirm provenance.
Functional analysis
Type: OpenClaw Skill
Name: data-pipelines
Version: 1.0.0

The skill bundle contains only architectural documentation and procedural guidance for designing data pipelines (SKILL.md). It lacks any executable code, network requests, or instructions that could be used for prompt injection or malicious activity.
Capability assessment
Purpose & Capability
Name/description (deep data pipeline workflow) match the content of SKILL.md: stage-by-stage guidance for ingestion, orchestration, idempotency, quality, SLAs, lineage. There are no unrelated requirements (no env vars, binaries, or config paths).
Instruction Scope
The SKILL.md contains only design and operational guidance for pipelines (six-stage workflow, checklist, tips). It does not instruct the agent to read local files, access credentials, call external endpoints, or perform system operations beyond giving advice.
Install Mechanism
No install spec and no code files — instruction-only. This minimizes write/execute risk; nothing will be downloaded or installed by the skill itself.
Credentials
The skill declares no required environment variables, credentials, or config paths. Its guidance is conceptual and does not ask for secrets or unrelated credentials.
Persistence & Privilege
Skill is user-invocable and not always-enabled; it does not request elevated persistence or modify other skills. Autonomous invocation is allowed by platform default but is not combined with other concerning privileges here.
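How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install data-pipelines
  3. Once installed, invoke the Skill by name or trigger it with /data-pipelines
  4. Provide the required inputs as described in the Skill's parameter notes to get structured output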
Version history
v1.0.0
  • Initial release of the "data-pipelines" skill.
  • Provides a comprehensive workflow covering ingestion, orchestration, idempotency, data quality, SLAs, observability, and lineage.
  • Includes six structured stages: requirements & SLAs, source contracts, transforms & idempotency, orchestration & dependencies, quality & monitoring, and lineage & operations.
  • Offers trigger conditions for when the workflow is relevant and a detailed checklist for final review.
  • Contains practical tips and guidance for both batch and streaming pipelines, with emphasis on reliability and clarity.
Metadata
Slug: data-pipelines
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Version count: 1
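Frequently asked questions

What is Data Pipelines?

Deep data pipeline workflow: ingestion, orchestration, idempotency, data quality, SLAs, observability, and lineage. Use when building batch/stream pipelines,... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 138 total downloads so far.

How do I install Data Pipelines?

Run the command "/install data-pipelines" in the OpenClaw or Claude Code chat box to install it in one step; no extra configuration is needed.

Is Data Pipelines free?

Yes. Data Pipelines is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does Data Pipelines support?

Data Pipelines is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Data Pipelines?

It is developed and maintained by mike47512 (@mike47512); the current version is v1.0.0.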
