Ingeniero de datos
/install kv-senior-data-engineering
Senior Data Engineer
Production-grade data engineering: pipelines, modeling, quality, and DataOps.
Activation
Use this skill when the user asks to:
- design a data pipeline (batch, streaming, or hybrid)
- choose between Lambda and Kappa architecture, or batch vs streaming
- build ETL/ELT with Airflow, Prefect, Dagster, dbt, or Spark
- implement data quality checks or data contracts
- model data (star schema, snowflake, SCD, Data Vault)
- optimize a slow Spark job, DAG, or warehouse query
- set up data observability, lineage, or incident response
Workflow
- Classify the request: pipeline|model|quality|optimize|architecture.
- Load the relevant reference:
  - batch/streaming patterns, Lambda vs Kappa, CDC → {baseDir}/references/data_pipeline_architecture.md
  - dimensional modeling, SCD, dbt, Data Vault → {baseDir}/references/data_modeling_patterns.md
  - data testing, contracts, CI/CD, observability → {baseDir}/references/dataops_best_practices.md
  - end-to-end workflow walkthroughs → {baseDir}/references/workflows.md
  - slow queries, DAG failures, Spark tuning → {baseDir}/references/troubleshooting.md
- Run the appropriate script when artifacts are provided:
    # Generate pipeline orchestration config (airflow | prefect | dagster)
    python {baseDir}/scripts/pipeline_orchestrator.py generate \
      --type airflow --source postgres --destination snowflake --schedule "0 5 * * *"

    # Validate data quality (freshness, completeness, uniqueness, schema)
    python {baseDir}/scripts/data_quality_validator.py validate \
      --input data/file.parquet --schema schemas/file.json \
      --checks freshness,completeness,uniqueness

    # Analyze and optimize ETL performance
    python {baseDir}/scripts/etl_performance_optimizer.py analyze \
      --query queries/aggregation.sql --engine spark --recommend
- Emit the artifact: pipeline config, dbt model, schema DDL, quality rules, or architecture diagram.
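The `--checks freshness,completeness,uniqueness` option above names three checks. As a rough illustration of what such checks can mean, and not a description of how `data_quality_validator.py` is implemented, here is a minimal sketch in plain Python. The field names (`id`, `amount`, `updated_at`), the 24-hour freshness window, and all function names are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def check_freshness(rows, ts_field, max_age, now=None):
    """Pass if the newest timestamp is no older than max_age."""
    now = now or datetime.now(timezone.utc)
    timestamps = [r[ts_field] for r in rows if r.get(ts_field) is not None]
    return bool(timestamps) and now - max(timestamps) <= max_age


def check_completeness(rows, required_fields, min_ratio=1.0):
    """Pass if each required field is non-null in at least min_ratio of rows."""
    if not rows:
        return False
    for field in required_fields:
        filled = sum(1 for r in rows if r.get(field) is not None)
        if filled / len(rows) < min_ratio:
            return False
    return True


def check_uniqueness(rows, key_fields):
    """Pass if no two rows share the same key tuple."""
    keys = [tuple(r.get(f) for f in key_fields) for r in rows]
    return len(keys) == len(set(keys))


def run_checks(rows, now=None):
    # Hypothetical rule set: thresholds and field names are illustrative only.
    return {
        "freshness": check_freshness(rows, "updated_at", timedelta(hours=24), now),
        "completeness": check_completeness(rows, ["id", "amount"]),
        "uniqueness": check_uniqueness(rows, ["id"]),
    }
```

In a real pipeline the thresholds would come from the data contract rather than being hard-coded, and a failed check would typically block the downstream load.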
Output Contract
- Open with the pipeline classification and dominant bottleneck or design decision.
- Emit one primary artifact per response (DAG, dbt model, schema, quality config).
- For architecture decisions: state the trade-offs of each option before recommending.
- Declare data loss risk explicitly when a pipeline design cannot guarantee exactly-once semantics.
- Close with an observability recommendation (what to monitor and at what threshold).
Key Rules
- Default to batch unless sub-minute latency is a stated requirement.
- Default to dbt + warehouse compute for <1 TB daily; recommend Spark only when justified by volume or complexity.
- Every pipeline must declare: idempotency strategy, error handling, and dead-letter queue approach.
- Data quality checks are non-optional — include them in every pipeline design.
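The rule that every pipeline must declare an idempotency strategy, error handling, and a dead-letter queue approach can be sketched as follows. This is one possible shape under stated assumptions, not a prescribed implementation: the in-memory `target` dict stands in for a warehouse table, `TransientError` is a hypothetical marker for retryable failures, and keyed upsert on `id` is just one of several idempotency strategies.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (for example, a timeout)."""


def load_batch(records, target, dead_letters, write=None, max_attempts=3, backoff_s=0.0):
    """Upsert records into `target`, keyed by record["id"].

    Idempotency: writes are keyed upserts, so replaying a batch leaves
    `target` unchanged. Error handling: transient failures are retried with
    exponential backoff; records that cannot be loaded are appended to
    `dead_letters` together with the reason.
    """
    if write is None:
        def write(rec):
            target[rec["id"]] = rec  # upsert: last write for a key wins

    for rec in records:
        for attempt in range(1, max_attempts + 1):
            try:
                if "id" not in rec:
                    raise ValueError("missing primary key 'id'")
                write(rec)
                break
            except TransientError as exc:
                if attempt == max_attempts:
                    dead_letters.append({"record": rec, "error": repr(exc)})
                else:
                    time.sleep(backoff_s * 2 ** (attempt - 1))
            except ValueError as exc:
                # Bad data will not improve on retry: dead-letter immediately.
                dead_letters.append({"record": rec, "error": str(exc)})
                break
    return target
```

Running the same batch twice produces the same `target`, which is the property a rerun after a partial failure depends on.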
Guardrails
- Do not generate application-layer code (APIs, web services) — stay within data pipeline scope.
- Do not recommend streaming when batch satisfies the latency requirement; streaming adds operational cost.
- Flag missing idempotency as a HIGH issue; flag missing data quality checks as MEDIUM.
- For cross-engine migration, refer to migration-architect.
Self Check
Before emitting any artifact, verify:
- idempotency strategy is stated;
- error handling and retry logic are addressed;
- data quality checks are included or explicitly deferred with a reason;
- the chosen architecture (batch vs stream) matches the stated latency requirement.
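The four self-check items, combined with the severities from Guardrails, could be expressed as a small review function. This is a sketch under assumptions: the `PipelineDesign` fields and the `self_check` name are hypothetical, "sub-minute" from Key Rules is read as under 60 seconds, and findings the source does not rate are labelled `UNRATED` rather than given an invented severity.

```python
from dataclasses import dataclass

SUB_MINUTE_S = 60  # "sub-minute latency" from Key Rules, read as < 60 seconds


@dataclass
class PipelineDesign:
    mode: str                          # "batch" or "streaming"
    required_latency_s: float          # latency the user actually asked for
    idempotency_strategy: str = ""     # e.g. "MERGE on order_id"
    error_handling: str = ""           # retries, dead-letter queue, alerts
    quality_checks: tuple = ()         # e.g. ("freshness", "uniqueness")
    quality_deferred_reason: str = ""  # set when checks are deliberately omitted


def self_check(design):
    """Return (severity, message) findings; an empty list means ready to emit."""
    findings = []
    if not design.idempotency_strategy:
        findings.append(("HIGH", "idempotency strategy is not stated"))
    if not design.error_handling:
        findings.append(("UNRATED", "error handling and retry logic are not addressed"))
    if not design.quality_checks and not design.quality_deferred_reason:
        findings.append(("MEDIUM", "data quality checks are missing and not deferred with a reason"))
    needs_streaming = design.required_latency_s < SUB_MINUTE_S
    if design.mode == "streaming" and not needs_streaming:
        findings.append(("UNRATED", "streaming chosen although batch satisfies the latency requirement"))
    if design.mode == "batch" and needs_streaming:
        findings.append(("UNRATED", "batch chosen although sub-minute latency is required"))
    return findings
```

A design with an hourly latency requirement, a stated idempotency strategy, error handling, and at least one quality check returns no findings in batch mode.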
Installation
- Make sure OpenClaw is installed (local or Docker).
- Run the install command in chat: /install kv-senior-data-engineering
- After installation, invoke the skill by name or use /kv-senior-data-engineering
- Provide the required inputs per the skill's parameter spec and get structured output.
What is Ingeniero de datos?
Design and build scalable data pipelines, ETL/ELT systems, and data infrastructure. Use when designing data architectures, choosing between batch and streami... It is an AI Agent Skill for Claude Code / OpenClaw, with 95 downloads so far.
How do I install Ingeniero de datos?
Run "/install kv-senior-data-engineering" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Ingeniero de datos free?
Yes, Ingeniero de datos is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Ingeniero de datos support?
Ingeniero de datos is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created Ingeniero de datos?
It is built and maintained by felix-antonio-sl (@felix-antonio-sl); the current version is v1.0.0.