Ingeniero de datos
Author: felix-antonio-sl · GitHub · v1.0.0 · MIT-0
Total downloads: 95
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install kv-senior-data-engineering
Description
Design and build scalable data pipelines, ETL/ELT systems, and data infrastructure. Use when designing data architectures, choosing between batch and streami...
Usage instructions (SKILL.md)
Senior Data Engineer
Production-grade data engineering: pipelines, modeling, quality, and DataOps.
Activation
Use this skill when the user asks to:
- design a data pipeline (batch, streaming, or hybrid)
- choose between Lambda and Kappa architecture, or batch vs streaming
- build ETL/ELT with Airflow, Prefect, Dagster, dbt, or Spark
- implement data quality checks or data contracts
- model data (star schema, snowflake, SCD, Data Vault)
- optimize a slow Spark job, DAG, or warehouse query
- set up data observability, lineage, or incident response
Workflow
- Classify the request: pipeline|model|quality|optimize|architecture.
- Load the relevant reference:
  - batch/streaming patterns, Lambda vs Kappa, CDC → {baseDir}/references/data_pipeline_architecture.md
  - dimensional modeling, SCD, dbt, Data Vault → {baseDir}/references/data_modeling_patterns.md
  - data testing, contracts, CI/CD, observability → {baseDir}/references/dataops_best_practices.md
  - end-to-end workflow walkthroughs → {baseDir}/references/workflows.md
  - slow queries, DAG failures, Spark tuning → {baseDir}/references/troubleshooting.md
- Run the appropriate script when artifacts are provided:

    # Generate pipeline orchestration config (airflow | prefect | dagster)
    python {baseDir}/scripts/pipeline_orchestrator.py generate \
      --type airflow --source postgres --destination snowflake --schedule "0 5 * * *"

    # Validate data quality (freshness, completeness, uniqueness, schema)
    python {baseDir}/scripts/data_quality_validator.py validate \
      --input data/file.parquet --schema schemas/file.json \
      --checks freshness,completeness,uniqueness

    # Analyze and optimize ETL performance
    python {baseDir}/scripts/etl_performance_optimizer.py analyze \
      --query queries/aggregation.sql --engine spark --recommend

- Emit the artifact: pipeline config, dbt model, schema DDL, quality rules, or architecture diagram.
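The three checks named in the validator command (freshness, completeness, uniqueness) can be illustrated with a minimal sketch in plain Python. This is not the bundled data_quality_validator.py, whose internals are not shown here. The function names, the list-of-dicts row format, and the thresholds are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helpers; names and signatures are illustrative only.

def check_freshness(rows, ts_field, max_age, now=None):
    """Pass if the newest timestamp in ts_field is no older than max_age."""
    if not rows:
        return False
    now = now or datetime.now(timezone.utc)
    newest = max(row[ts_field] for row in rows)
    return now - newest <= max_age

def check_completeness(rows, field, min_ratio=1.0):
    """Pass if the share of non-null values in field is at least min_ratio."""
    if not rows:
        return False
    non_null = sum(1 for row in rows if row.get(field) is not None)
    return non_null / len(rows) >= min_ratio

def check_uniqueness(rows, key):
    """Pass if no two rows share the same value for key."""
    values = [row[key] for row in rows]
    return len(values) == len(set(values))

# Made-up sample data, used only to exercise the checks.
utc = timezone.utc
sample_rows = [
    {"id": 1, "email": "a@example.com", "loaded_at": datetime(2024, 1, 1, 12, tzinfo=utc)},
    {"id": 2, "email": None, "loaded_at": datetime(2024, 1, 1, 23, tzinfo=utc)},
]
```

A real validator would typically read Parquet and compare against a schema file, as the command above suggests; the sketch only shows what each check is asserting.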
Output Contract
- Open with the pipeline classification and dominant bottleneck or design decision.
- Emit one primary artifact per response (DAG, dbt model, schema, quality config).
- For architecture decisions: state the trade-offs of each option before recommending.
- Declare data loss risk explicitly when a pipeline design cannot guarantee exactly-once semantics.
- Close with observability recommendation (what to monitor and at what threshold).
Key Rules
- Default to batch unless sub-minute latency is a stated requirement.
- Default to dbt + warehouse compute for <1 TB daily; recommend Spark only when justified by volume or complexity.
- Every pipeline must declare: idempotency strategy, error handling, and dead-letter queue approach.
- Data quality checks are non-optional — include them in every pipeline design.
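The first two defaults above can be read as a small decision procedure. The sketch below is one possible encoding, not part of the skill. The function names are hypothetical, "sub-minute" is taken to mean under 60 seconds, and treating 1 TB per day as a hard cutoff for Spark is an assumption (the rule only says Spark must be justified by volume or complexity).

```python
from typing import Optional

def choose_mode(max_latency_seconds: Optional[float]) -> str:
    """Batch by default; streaming only for a stated sub-minute latency need."""
    # None means the user stated no latency requirement at all.
    if max_latency_seconds is not None and max_latency_seconds < 60:
        return "streaming"
    return "batch"

def choose_compute(daily_volume_tb: float, complex_processing: bool = False) -> str:
    """dbt + warehouse below 1 TB/day; Spark when volume or complexity justifies it."""
    # Assumption: >= 1 TB/day counts as "justified by volume".
    # complex_processing is a hypothetical flag standing in for "complexity".
    if daily_volume_tb >= 1.0 or complex_processing:
        return "spark"
    return "dbt + warehouse"
```

For example, `choose_mode(300)` stays on batch because a five-minute latency target does not require streaming, while `choose_compute(4.0)` moves to Spark on volume alone.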
Guardrails
- Do not generate application-layer code (APIs, web services) — stay within data pipeline scope.
- Do not recommend streaming when batch satisfies the latency requirement; streaming adds operational cost.
- Flag missing idempotency as a HIGH issue; flag missing data quality checks as MEDIUM.
- For cross-engine migration, refer to migration-architect.
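The severity guardrail could be applied mechanically to a design summary. The following is a minimal sketch, assuming the design is represented as a dict with hypothetical keys `idempotency_strategy` and `data_quality_checks`. The skill itself does not define such a structure.

```python
from typing import Dict, List, Tuple

def flag_issues(design: Dict[str, object]) -> List[Tuple[str, str]]:
    """Return (severity, message) pairs for the two gaps the guardrails name."""
    issues = []
    # An absent or empty value is treated as "missing".
    if not design.get("idempotency_strategy"):
        issues.append(("HIGH", "missing idempotency strategy"))
    if not design.get("data_quality_checks"):
        issues.append(("MEDIUM", "missing data quality checks"))
    return issues

# Made-up design summary for illustration.
example_design = {
    "idempotency_strategy": "",
    "data_quality_checks": ["freshness", "uniqueness"],
}
```

Only the two severities stated in the guardrails are encoded; no severity is assumed for other omissions such as a missing dead-letter queue.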
Self Check
Before emitting any artifact, verify:
- idempotency strategy is stated;
- error handling and retry logic are addressed;
- data quality checks are included or explicitly deferred with a reason;
- the chosen architecture (batch vs stream) matches the stated latency requirement.
Safe usage recommendations
This bundle appears coherent for designing and generating data pipelines, but review generated artifacts before running them in production. Specific points to consider:
1) Generated DAGs and tasks will reference connection IDs (e.g., postgres_conn_id, snowflake_conn_id). You must configure those connections securely in your Airflow/secret manager rather than embedding secrets in generated code.
2) The generator can embed arbitrary bash commands from task parameters; inspect any generated BashOperator commands for accidental injection of secrets or destructive commands.
3) The validators and patterns include detectors for PII (emails, credit-card-like patterns); avoid feeding sensitive production data into the skill without appropriate controls.
4) Test generated code in an isolated or staging environment first, and supply credentials via your normal secret-store mechanism.
If you want to reduce risk, disable autonomous invocation for this skill or review its outputs manually before execution.
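Part of the manual review described above can be supported by a simple text scan of the generated code. The sketch below is a rough heuristic, not a security tool and not something the skill ships. The two regular expressions, the function name, and the sample snippet are assumptions; they will miss many cases and may raise false positives.

```python
import re

# Heuristic: a name containing a secret-like word, assigned a quoted literal.
SECRET_ASSIGNMENT = re.compile(
    r"""(password|passwd|secret|token|api_key)\w*\s*=\s*["'][^"']+["']""",
    re.IGNORECASE,
)
# Heuristic: a couple of obviously destructive shell/SQL fragments.
DESTRUCTIVE_COMMAND = re.compile(
    r"\brm\s+-rf\b|\bDROP\s+(TABLE|DATABASE)\b",
    re.IGNORECASE,
)

def review_generated_code(source):
    """Return (line_number, finding) pairs for lines worth a human look."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if SECRET_ASSIGNMENT.search(line):
            findings.append((lineno, "possible hardcoded secret"))
        if DESTRUCTIVE_COMMAND.search(line):
            findings.append((lineno, "possibly destructive command"))
    return findings

# Made-up snippet; it is only scanned as text and never executed.
sample_source = '''extract = PostgresOperator(task_id="extract", postgres_conn_id="warehouse_pg")
cleanup = BashOperator(task_id="cleanup", bash_command="rm -rf /tmp/stage")
conn = connect(user="etl", password="hunter2")
'''
```

Line 1 of the sample passes because it references a connection ID rather than a credential, which is the pattern the recommendations call for.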
Functional analysis
Type: OpenClaw Skill
Name: kv-senior-data-engineering
Version: 1.0.0
The skill bundle provides a comprehensive and legitimate set of tools for senior data engineering tasks, including pipeline orchestration, data quality validation, and performance optimization. The Python scripts (pipeline_orchestrator.py, data_quality_validator.py, and etl_performance_optimizer.py) are well-documented and implement their stated features using standard libraries without any evidence of malicious intent, data exfiltration, or unauthorized execution. While pipeline_orchestrator.py uses the compile() function for syntax validation of generated or provided code, it does not execute the code, and the overall behavior is strictly aligned with the professional data engineering purpose described in SKILL.md.
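The distinction between compiling and executing can be shown with Python's built-in `compile()`. This is a sketch of how the built-in behaves in general, not a reproduction of pipeline_orchestrator.py; the helper name is hypothetical.

```python
def syntax_error_or_none(source, filename="<generated>"):
    """Return None if source parses as Python, else a short error description."""
    try:
        # compile() parses the text and builds a code object.
        # Nothing runs unless that object is later passed to exec() or eval().
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return f"{exc.msg} (line {exc.lineno})"
    return None

# Valid syntax that would raise if it were executed; compile() alone does not run it.
never_executed = 'raise RuntimeError("this line is never run")\n'
# Invalid syntax: the parameter list is malformed.
broken = "def build_dag(:\n    pass\n"
```

Here `syntax_error_or_none(never_executed)` returns `None` without raising, which confirms the code was only parsed, while `syntax_error_or_none(broken)` returns an error description.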
Capability assessment
Purpose & Capability
Name and description (designing pipelines, ETL/ELT, quality, Airflow/dbt/Spark/Kafka guidance) match the provided reference docs and three scripts (pipeline generator, quality validator, ETL optimizer). There are no unrelated binaries, credentials, or config paths declared that would be inconsistent with the stated purpose.
Instruction Scope
SKILL.md instructs the agent to load local reference files under {baseDir}/references and to run the packaged scripts under {baseDir}/scripts. The instructions do not ask the agent to read arbitrary system files, environment variables, or remote endpoints beyond what is normal for data-engineering artifacts. It does generate DAGs and commands that will later require environment-specific connections (Airflow conn IDs, Snowflake/Postgres connection names), which is expected for this purpose.
Install Mechanism
No install spec or external downloads are present; this is an instruction-only skill with bundled Python scripts and docs. Nothing is fetched from external URLs or installed at runtime by the skill itself.
Credentials
The skill declares no required environment variables or credentials (none in requires.env). However, generated artifacts (Airflow DAGs, SnowflakeOperator/PostgresOperator usage, Kafka examples) will expect platform-specific connection identifiers and secrets to be present in the target environment (Airflow connections, cloud credentials) — this is normal but users should not assume this skill will auto-supply or manage those credentials.
Persistence & Privilege
always is false and model invocation is allowed (default). The skill does not request permanent system presence or attempt to modify other skills or system-wide agent settings.
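How to use
- Make sure OpenClaw is installed (local or Docker deployment).
- Enter the install command in the chat box: /install kv-senior-data-engineering
- Once installed, invoke the Skill by name or trigger it with /kv-senior-data-engineering.
- Provide the inputs described in the Skill's parameter notes to get structured output.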
Version history
v1.0.0
Initial release of Senior Data Engineer skill.
- Provides guidance on designing data pipelines, data modeling, quality frameworks, and DataOps best practices.
- Includes workflow for classifying requests and referencing detailed guides for architecture, modeling, testing, workflow, and troubleshooting.
- Supports generating artifacts such as pipeline configs, dbt models, schema DDL, and quality rules.
- Enforces key engineering guardrails and self-checks for idempotency, error handling, and data quality coverage.
- Recommends technologies and architectures based on data volume, latency, and complexity requirements.
FAQ
What is Ingeniero de datos?
Design and build scalable data pipelines, ETL/ELT systems, and data infrastructure. Use when designing data architectures, choosing between batch and streami... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 95 total downloads to date.
How do I install Ingeniero de datos?
Run the command "/install kv-senior-data-engineering" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.
Is Ingeniero de datos free?
Yes. Ingeniero de datos is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.
Which platforms does Ingeniero de datos support?
Ingeniero de datos is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who developed Ingeniero de datos?
It is developed and maintained by felix-antonio-sl (@felix-antonio-sl); the current version is v1.0.0.