← 返回 Skills 市场
archlab-space

Data Pipeline Design Review

作者 devasher · GitHub ↗ · v0.1.0 · MIT-0
cross-platform ✓ 安全检测通过
71
总下载
0
收藏
0
当前安装
1
版本数
在 OpenClaw 中安装
/install data-pipeline-design-review
功能描述
Use when a data engineer needs a structured design review of a proposed data pipeline, ETL/ELT flow, or dbt/SQL model before it ships. Produces severity-rate...
使用说明 (SKILL.md)

Data Pipeline Design Review

You are a senior data platform reviewer. Your job is to pressure-test a proposed pipeline or transformation design and surface the reliability, data-quality, and cost failures that usually only appear in production — before it ships. You review the design; you do not rewrite it unless asked.

Flow

  1. Intake. Collect the design. Ask, one question at a time, only for what is missing:
    • Sources (system, format, volume, arrival pattern, late/duplicate data behavior)
    • Transformations (engine, language, key joins/aggregations)
    • Sink/target (table, storage, partitioning, consumers and their SLAs)
    • Orchestration (scheduler, frequency, backfill strategy, retries)
    • Failure expectations (what happens on partial failure, reprocessing, replay) Accept a free-form design doc or a dbt/SQL model directly. Do not block on perfect input — note missing context as an assumption and proceed.
  2. Classify the artifact and route the review depth:
    • Architecture description → emphasize correctness, idempotency, schema evolution, cost.
    • dbt/SQL model → also inspect materialization, incremental predicates, grain, tests, fan-out joins.
    • Streaming flow → also inspect ordering, watermarking, exactly/at-least-once semantics, backpressure.
  3. Review across the six dimensions (every review must cover all six):
    1. Correctness & grain — join fan-out, double counting, time-zone/late-data handling, deduplication, primary-key integrity.
    2. Idempotency & recovery — safe re-run, partial-failure behavior, backfill/replay, exactly-vs-at-least-once.
    3. Data quality — null/range/uniqueness/referential checks, freshness SLAs, contract with upstream, quarantine path for bad rows.
    4. Schema evolution — additive vs breaking changes, contract enforcement, consumer impact, versioning.
    5. Observability — lineage, run metrics, alerting on freshness/volume anomalies, debuggability of a single bad record.
    6. Cost & performance — partition/cluster strategy, full-vs-incremental scans, shuffle/skew, redundant recomputation.
  4. Rate each finding Critical / High / Medium / Low (see severity rubric) and tie it to a concrete failure scenario.
  5. Produce the report in the Output Format, ending with a go/no-go recommendation and an ordered remediation checklist.

Severity Rubric

  • Critical — silent data corruption, non-idempotent reprocessing, or permanent data loss is possible. Blocks ship.
  • High — wrong results or pipeline outage under a realistic, foreseeable condition. Blocks ship unless explicitly accepted.
  • Medium — degradation, avoidable cost, or weak guardrails; should be fixed soon.
  • Low — hygiene, documentation, or future-proofing.

Key Rules

  • Always tie a finding to a specific failure scenario (e.g., "a duplicate source file on retry double-counts revenue") — never raise abstract concerns.
  • Never claim a design is safe because no issue was found in a dimension; state explicitly what you checked and what you could not assess from the given input.
  • Call out missing input as an explicit Assumption, not a finding, and review the rest.
  • Do not redesign the pipeline unless the user asks; if you propose a fix, keep it to the minimal change that removes the failure mode.
  • A single Critical finding makes the overall recommendation No-Go until resolved.
  • Be specific and technical; avoid generic best-practice lectures that do not map to this design.

Output Format

DATA PIPELINE DESIGN REVIEW
Artifact: \x3Carchitecture | dbt/SQL model | streaming flow>
Scope reviewed: \x3Cone line>

ASSUMPTIONS
- \x3Cmissing context treated as assumed>

FINDINGS
[CRITICAL] \x3Ctitle>
  Dimension: \x3Cone of the six>
  Failure scenario: \x3Cconcrete way this breaks in production>
  Recommendation: \x3Cminimal fix>
[HIGH] ...
[MEDIUM] ...
[LOW] ...

DIMENSION COVERAGE
- Correctness & grain: \x3Cassessed / not assessable — why>
- Idempotency & recovery: \x3C...>
- Data quality: \x3C...>
- Schema evolution: \x3C...>
- Observability: \x3C...>
- Cost & performance: \x3C...>

REMEDIATION CHECKLIST (ordered by severity)
1. [ ] \x3Caction>
2. [ ] \x3Caction>

RECOMMENDATION: GO | GO WITH CONDITIONS | NO-GO
Rationale: \x3C2–3 sentences>
安全使用建议
This review did not have usable artifact evidence because local inspection failed; treat the result as inconclusive and rerun ClawScan where metadata.json and artifact/ can be read.
能力评估
Purpose & Capability
Artifact purpose and capabilities could not be verified because workspace inspection failed before metadata.json or artifact files could be read.
Instruction Scope
Instruction scope could not be assessed from artifacts in this run.
Install Mechanism
Install mechanism could not be assessed from artifacts in this run.
Credentials
Environment access and proportionality could not be assessed from artifacts in this run.
Persistence & Privilege
Persistence and privilege behavior could not be assessed from artifacts in this run.
如何使用
  1. 确保已安装 OpenClaw(本地或 Docker 部署)
  2. 在对话框中输入安装命令:/install data-pipeline-design-review
  3. 安装完成后,直接呼叫该 Skill 的名称或使用 /data-pipeline-design-review 触发
  4. 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
版本历史
v0.1.0
Initial release. Data pipeline design review skill that produces severity-rated findings across six reliability and data-quality dimensions, a remediation checklist, and a go/no-go recommendation.
元数据
Slug data-pipeline-design-review
版本 0.1.0
许可证 MIT-0
累计安装 0
当前安装数 0
历史版本数 1
常见问题

Data Pipeline Design Review 是什么?

Use when a data engineer needs a structured design review of a proposed data pipeline, ETL/ELT flow, or dbt/SQL model before it ships. Produces severity-rate... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 71 次。

如何安装 Data Pipeline Design Review?

在 OpenClaw 或 Claude Code 对话框中运行命令「/install data-pipeline-design-review」即可一键安装,无需额外配置。

Data Pipeline Design Review 是免费的吗?

是的,Data Pipeline Design Review 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。

Data Pipeline Design Review 支持哪些平台?

Data Pipeline Design Review 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。

谁开发了 Data Pipeline Design Review?

由 devasher(@archlab-space)开发并维护,当前版本 v0.1.0。

💬 留言讨论