
Ingeniero de datos

Name: Ingeniero de datos
Author: felix-antonio-sl

by felix-antonio-sl · GitHub · v1.0.0 · MIT-0

cross-platform ✓ Security Clean

Install in OpenClaw

/install kv-senior-data-engineering

Description

Design and build scalable data pipelines, ETL/ELT systems, and data infrastructure. Use when designing data architectures, choosing between batch and streami...

README (SKILL.md)

Senior Data Engineer

Production-grade data engineering: pipelines, modeling, quality, and DataOps.

Activation

Use this skill when the user asks to:

design a data pipeline (batch, streaming, or hybrid)
choose between Lambda and Kappa architecture, or batch vs streaming
build ETL/ELT with Airflow, Prefect, Dagster, dbt, or Spark
implement data quality checks or data contracts
model data (star schema, snowflake, SCD, Data Vault)
optimize a slow Spark job, DAG, or warehouse query
set up data observability, lineage, or incident response

Workflow

Classify the request: pipeline | model | quality | optimize | architecture.
Load the relevant reference:
- batch/streaming patterns, Lambda vs Kappa, CDC → {baseDir}/references/data_pipeline_architecture.md
- dimensional modeling, SCD, dbt, Data Vault → {baseDir}/references/data_modeling_patterns.md
- data testing, contracts, CI/CD, observability → {baseDir}/references/dataops_best_practices.md
- end-to-end workflow walkthroughs → {baseDir}/references/workflows.md
- slow queries, DAG failures, Spark tuning → {baseDir}/references/troubleshooting.md
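The classification and reference-loading steps above can be sketched as a keyword router. This is a minimal illustration, not the skill's actual logic: the keyword lists, the class-to-file mapping (including sending both pipeline and architecture requests to data_pipeline_architecture.md), the fallback class, and the function names classify and reference_for are all assumptions.

```python
# Hypothetical keyword router for the five request classes named in the workflow.
# Keyword lists and the class -> reference mapping are illustrative assumptions.
KEYWORDS = {
    # Checked in this order; the first class with a matching keyword wins.
    "optimize": ("slow", "optimiz", "tuning", "performance"),
    "quality": ("quality", "contract", "freshness", "observability", "lineage"),
    "model": ("star schema", "snowflake schema", "scd", "data vault", "dimensional"),
    "architecture": ("lambda", "kappa", "architecture", "batch vs"),
    "pipeline": ("pipeline", "etl", "elt", "dag", "airflow", "dbt", "cdc"),
}

REFERENCES = {
    "pipeline": "references/data_pipeline_architecture.md",
    "architecture": "references/data_pipeline_architecture.md",
    "model": "references/data_modeling_patterns.md",
    "quality": "references/dataops_best_practices.md",
    "optimize": "references/troubleshooting.md",
}


def classify(request: str) -> str:
    """Return the first request class whose keywords appear in the text."""
    text = request.lower()
    for category, words in KEYWORDS.items():
        if any(word in text for word in words):
            return category
    return "pipeline"  # assumed fallback when nothing matches


def reference_for(request: str, base_dir: str) -> str:
    """Map a request to the reference file to load (mapping is assumed)."""
    return f"{base_dir}/{REFERENCES[classify(request)]}"
```

For example, "My Spark job is slow" is classified as optimize and routed to troubleshooting.md. Plain substring matching is crude; it is meant only to show the shape of the routing step.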

Run the appropriate script when artifacts are provided:

# Generate pipeline orchestration config (airflow | prefect | dagster)
python {baseDir}/scripts/pipeline_orchestrator.py generate \
  --type airflow --source postgres --destination snowflake --schedule "0 5 * * *"

# Validate data quality (freshness, completeness, uniqueness, schema)
python {baseDir}/scripts/data_quality_validator.py validate \
  --input data/file.parquet --schema schemas/file.json \
  --checks freshness,completeness,uniqueness

# Analyze and optimize ETL performance
python {baseDir}/scripts/etl_performance_optimizer.py analyze \
  --query queries/aggregation.sql --engine spark --recommend
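The three checks passed to data_quality_validator.py can be illustrated on in-memory rows. This sketch does not reproduce the bundled script, whose internals are not shown here; the row format, the field names, and the helpers check_freshness, check_completeness, check_uniqueness, and validate are hypothetical, and the schema check is omitted.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def check_freshness(rows, ts_field: str, max_age: timedelta,
                    now: Optional[datetime] = None) -> bool:
    """Fresh if the newest timestamp is no older than max_age."""
    if not rows:
        return False
    now = now or datetime.now(timezone.utc)
    latest = max(row[ts_field] for row in rows)
    return now - latest <= max_age


def check_completeness(rows, required, threshold: float = 1.0) -> bool:
    """Pass if the share of rows with all required fields non-null meets threshold."""
    if not rows:
        return False
    complete = sum(all(row.get(f) is not None for f in required) for row in rows)
    return complete / len(rows) >= threshold


def check_uniqueness(rows, key) -> bool:
    """Pass if no two rows share the same (possibly composite) key."""
    keys = [tuple(row[k] for k in key) for row in rows]
    return len(keys) == len(set(keys))


def validate(rows, checks: str, *, ts_field, max_age, required, key, now=None):
    """Run a comma-separated list of checks, mirroring the --checks flag's format."""
    available = {
        "freshness": lambda: check_freshness(rows, ts_field, max_age, now),
        "completeness": lambda: check_completeness(rows, required),
        "uniqueness": lambda: check_uniqueness(rows, key),
    }
    return {name: available[name]() for name in checks.split(",")}
```

With two rows where the newest is one hour old and one row lacks an email, validate(..., "freshness,completeness,uniqueness", max_age=timedelta(hours=2), required=["id", "email"], key=["id"]) reports freshness and uniqueness as passing and completeness as failing.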

Emit the artifact: pipeline config, dbt model, schema DDL, quality rules, or architecture diagram.

Output Contract

Open with the pipeline classification and dominant bottleneck or design decision.
Emit one primary artifact per response (DAG, dbt model, schema, quality config).
For architecture decisions: state the trade-offs of each option before recommending.
Declare data loss risk explicitly when a pipeline design cannot guarantee exactly-once semantics.
Close with an observability recommendation (what to monitor and at what threshold).
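The closing observability recommendation can be expressed as metric thresholds that a monitor evaluates after each run. The metric names and threshold values below are placeholders, not recommendations from the skill, and Threshold and evaluate are hypothetical names. A missing metric is treated as an alert, on the assumption that a pipeline that stops reporting deserves attention.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Threshold:
    metric: str
    max_value: float  # alert when the observed value is strictly greater


# Placeholder thresholds; tune them to each pipeline's SLA.
THRESHOLDS = [
    Threshold("freshness_lag_minutes", 90),
    Threshold("null_rate_pct", 1.0),
    Threshold("dead_letter_rows", 0),
    Threshold("row_count_drop_pct", 20),
]


def evaluate(observed: Dict[str, float], thresholds: List[Threshold]) -> List[str]:
    """Return one alert message per breached or missing metric."""
    alerts = []
    for t in thresholds:
        value = observed.get(t.metric)
        if value is None:
            alerts.append(f"{t.metric}: no data reported")
        elif value > t.max_value:
            alerts.append(f"{t.metric}: {value} exceeds {t.max_value}")
    return alerts
```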

Key Rules

Default to batch unless sub-minute latency is a stated requirement.
Default to dbt + warehouse compute for <1 TB/day; recommend Spark only when justified by volume or complexity.
Every pipeline must declare: idempotency strategy, error handling, and dead-letter queue approach.
Data quality checks are non-optional — include them in every pipeline design.
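The two defaults above (batch unless sub-minute latency is required; dbt plus warehouse compute below 1 TB/day) can be encoded as a small decision helper. This is a sketch under stated assumptions: "sub-minute" is read as a required latency under 60 seconds, the complex_transforms flag stands in for whatever complexity criterion justifies Spark, and the function name and return shape are hypothetical. The engine rule is applied only to batch designs, since the rules do not say which engine to use for streaming.

```python
from typing import Optional


def choose_architecture(required_latency_s: Optional[float],
                        daily_volume_tb: float,
                        complex_transforms: bool = False) -> dict:
    """Apply the batch/streaming and dbt/Spark defaults (sketch)."""
    # Rule 1: batch unless sub-minute latency is a stated requirement.
    sub_minute = required_latency_s is not None and required_latency_s < 60
    mode = "streaming" if sub_minute else "batch"

    # Rule 2 (batch only here): dbt + warehouse compute under 1 TB/day;
    # Spark only when volume or complexity justifies it.
    if mode == "streaming":
        engine = None  # streaming engine choice is outside these two rules
    elif daily_volume_tb >= 1 or complex_transforms:
        engine = "spark"
    else:
        engine = "dbt+warehouse"
    return {"mode": mode, "engine": engine}
```

For example, a nightly 0.2 TB load with no latency requirement returns {"mode": "batch", "engine": "dbt+warehouse"}.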

Guardrails

Do not generate application-layer code (APIs, web services) — stay within data pipeline scope.
Do not recommend streaming when batch satisfies the latency requirement; streaming adds operational cost.
Flag missing idempotency as a HIGH issue; flag missing data quality checks as MEDIUM.
For cross-engine migrations, refer to migration-architect.
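The severity guardrails can be sketched as a review over a dictionary describing the proposed pipeline. The design keys (idempotency, quality_checks, quality_deferred_reason, mode, required_latency_s) and the function review are hypothetical. HIGH and MEDIUM come from the guardrails; the INFO level for an unnecessary streaming proposal is an assumption, since the skill only says not to recommend streaming in that case.

```python
from typing import List, Tuple


def review(design: dict) -> List[Tuple[str, str]]:
    """Return (severity, finding) pairs for a proposed pipeline design."""
    findings = []
    # Guardrail: missing idempotency is a HIGH issue.
    if not design.get("idempotency"):
        findings.append(("HIGH", "no idempotency strategy declared"))
    # Guardrail: missing data quality checks is a MEDIUM issue,
    # unless they were explicitly deferred with a reason.
    if not design.get("quality_checks") and not design.get("quality_deferred_reason"):
        findings.append(("MEDIUM", "no data quality checks"))
    # Guardrail: no streaming when batch meets the latency need
    # (sub-minute is read as < 60 s; INFO severity is an assumption).
    latency = design.get("required_latency_s")
    if design.get("mode") == "streaming" and (latency is None or latency >= 60):
        findings.append(("INFO", "streaming proposed but batch meets the latency requirement"))
    return findings
```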

Self Check

Before emitting any artifact, verify:

idempotency strategy is stated;
error handling and retry logic are addressed;
data quality checks are included or explicitly deferred with a reason;
the chosen architecture (batch vs stream) matches the stated latency requirement.
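The idempotency and error-handling items in this checklist can be illustrated together: retries with exponential backoff for transient failures, a dead-letter list for records that still fail, and a keyed upsert so that re-running the same batch does not create duplicates. This is a minimal in-memory sketch; the dict standing in for a warehouse table, the id business key, the choice of ConnectionError and TimeoutError as transient errors, and the function names with_retry and load are all assumptions.

```python
import time
from typing import Callable, Dict, List


def with_retry(fn: Callable, *args, retries: int = 3, base_delay_s: float = 0.5,
               retry_on=(ConnectionError, TimeoutError)):
    """Call fn(*args), retrying transient errors with exponential backoff."""
    for attempt in range(retries):
        try:
            return fn(*args)
        except retry_on:
            if attempt == retries - 1:
                raise  # out of retries: let the caller dead-letter the record
            time.sleep(base_delay_s * 2 ** attempt)


def load(records: List[dict], target: Dict[object, dict], transform: Callable,
         retries: int = 3, base_delay_s: float = 0.5) -> List[dict]:
    """Transform and upsert records into target keyed by 'id'; return dead letters."""
    dead_letter = []
    for record in records:
        try:
            row = with_retry(transform, record, retries=retries,
                             base_delay_s=base_delay_s)
        except Exception as exc:  # permanent or exhausted: park it, never drop it
            dead_letter.append({"record": record, "error": repr(exc)})
            continue
        target[row["id"]] = row  # upsert by key: re-runs overwrite, never duplicate
    return dead_letter
```

A ValueError from bad data is not in retry_on, so it goes straight to the dead-letter list; only the listed transient errors are retried.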

Usage Guidance

This bundle appears coherent for designing and generating data pipelines, but review generated artifacts before running them in production. Specific points to consider:

1) Generated DAGs and tasks will reference connection IDs (e.g., postgres_conn_id, snowflake_conn_id); you must configure those connections securely in your Airflow connection store or secret manager rather than embedding secrets in generated code.
2) The generator can embed arbitrary bash commands from task parameters; inspect any generated BashOperator commands for accidental injection of secrets or destructive commands.
3) The validators and patterns include detectors for PII (emails, credit-card-like patterns); avoid feeding sensitive production data into the skill without appropriate controls.
4) Test generated code in an isolated or staging environment first, and supply credentials via your normal secret-store mechanism.

If you want to reduce risk, disable autonomous invocation for this skill or review its outputs manually before execution.
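Point 3 refers to PII detectors for emails and credit-card-like patterns. The bundled validator's exact rules are not shown here, so the following is an independent sketch of one common approach: a regular expression for email addresses, plus a digit-run pattern whose candidates are kept only if they pass the Luhn checksum, which filters out many order IDs and similar numbers. The patterns and the function names luhn_ok and find_pii are assumptions.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13-19 digits, optionally separated by single spaces or dashes.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pii(text: str) -> dict:
    """Return email addresses and Luhn-valid card-like numbers found in text."""
    found = {"emails": EMAIL_RE.findall(text), "cards": []}
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_ok(digits):
            found["cards"].append(digits)
    return found
```

In "contact jane.doe@example.com, card 4111 1111 1111 1111, order 1234567890123", the email and the Visa test number are reported, while the 13-digit order number fails the checksum and is ignored.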

Capability Analysis

Type: OpenClaw Skill
Name: kv-senior-data-engineering
Version: 1.0.0

The skill bundle provides a comprehensive and legitimate set of tools for senior data engineering tasks, including pipeline orchestration, data quality validation, and performance optimization. The Python scripts (pipeline_orchestrator.py, data_quality_validator.py, and etl_performance_optimizer.py) are well-documented and implement their stated features using standard libraries without any evidence of malicious intent, data exfiltration, or unauthorized execution. While pipeline_orchestrator.py uses the compile() function for syntax validation of generated or provided code, it does not execute the code, and the overall behavior is strictly aligned with the professional data engineering purpose described in SKILL.md.
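The note about compile() describes a standard Python pattern: compiling source to a code object parses it and reports syntax errors without running it, so a generated DAG that imports Airflow can be checked even where Airflow is not installed. A minimal sketch of that pattern follows; the helper name syntax_check and the sample DAG text (its dag_id, schedule, and start date) are hypothetical and are not output from pipeline_orchestrator.py.

```python
from typing import Optional


def syntax_check(source: str, filename: str = "<generated_dag>") -> Optional[str]:
    """Return an error message if source does not compile, else None.

    compile() only parses and compiles to bytecode; nothing is executed,
    so imports inside the source (e.g. airflow) are never resolved.
    """
    try:
        compile(source, filename, "exec")
    except (SyntaxError, ValueError) as exc:  # ValueError: null bytes on some Pythons
        line = getattr(exc, "lineno", None)
        return f"{filename}:{line}: {exc}"
    return None


GOOD_DAG = '''
import pendulum
from airflow import DAG

with DAG(
    dag_id="postgres_to_snowflake",
    schedule="0 5 * * *",
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
) as dag:
    pass
'''

# Same DAG with the colon after "as dag" removed: a syntax error.
BAD_DAG = GOOD_DAG.replace(") as dag:", ") as dag")
```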

Capability Assessment

✓ Purpose & Capability

Name and description (designing pipelines, ETL/ELT, quality, Airflow/dbt/Spark/Kafka guidance) match the provided reference docs and three scripts (pipeline generator, quality validator, ETL optimizer). There are no unrelated binaries, credentials, or config paths declared that would be inconsistent with the stated purpose.

✓ Instruction Scope

SKILL.md instructs the agent to load local reference files under {baseDir}/references and to run the packaged scripts under {baseDir}/scripts. The instructions do not ask the agent to read arbitrary system files, environment variables, or remote endpoints beyond what is normal for data-engineering artifacts. It does generate DAGs and commands that will later require environment-specific connections (Airflow conn IDs, Snowflake/Postgres connection names), which is expected for this purpose.

✓ Install Mechanism

No install spec or external downloads are present; this is an instruction-only skill with bundled Python scripts and docs. Nothing is fetched from external URLs or installed at runtime by the skill itself.

ℹ Credentials

The skill declares no required environment variables or credentials (none in requires.env). However, generated artifacts (Airflow DAGs, SnowflakeOperator/PostgresOperator usage, Kafka examples) will expect platform-specific connection identifiers and secrets to be present in the target environment (Airflow connections, cloud credentials) — this is normal but users should not assume this skill will auto-supply or manage those credentials.
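As an example of supplying a connection outside the generated code: Airflow can read a connection from an environment variable named AIRFLOW_CONN_ followed by the upper-cased connection ID, with the connection encoded as a URI. The sketch below only parses such a URI with the standard library to show which fields a generated DAG relies on; it is not Airflow's own parser, conn_from_env is a hypothetical helper, and it deliberately does not return the password so the secret is less likely to end up in logs.

```python
import os
from typing import Mapping, Optional
from urllib.parse import unquote, urlsplit


def conn_from_env(conn_id: str, environ: Optional[Mapping[str, str]] = None) -> dict:
    """Parse the URI in AIRFLOW_CONN_<CONN_ID> into non-secret fields."""
    env = os.environ if environ is None else environ
    uri = env.get(f"AIRFLOW_CONN_{conn_id.upper()}")
    if uri is None:
        raise KeyError(f"connection {conn_id!r} is not configured")
    parts = urlsplit(uri)
    return {
        "conn_type": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "login": unquote(parts.username) if parts.username else None,
        "schema": parts.path.lstrip("/") or None,
        # password intentionally omitted; read it only where the connection is opened
    }
```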

✓ Persistence & Privilege

The always flag is false, and model invocation is allowed (the default). The skill does not request permanent system presence or attempt to modify other skills or system-wide agent settings.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install kv-senior-data-engineering
After installation, invoke the skill by name or use /kv-senior-data-engineering
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

Initial release of the Senior Data Engineer skill.
- Provides guidance on designing data pipelines, data modeling, quality frameworks, and DataOps best practices.
- Includes a workflow for classifying requests and referencing detailed guides for architecture, modeling, testing, workflows, and troubleshooting.
- Supports generating artifacts such as pipeline configs, dbt models, schema DDL, and quality rules.
- Enforces key engineering guardrails and self-checks for idempotency, error handling, and data quality coverage.
- Recommends technologies and architectures based on data volume, latency, and complexity requirements.

Metadata

Slug kv-senior-data-engineering

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Ingeniero de datos?

Design and build scalable data pipelines, ETL/ELT systems, and data infrastructure. Use when designing data architectures, choosing between batch and streami... It is an AI Agent Skill for Claude Code / OpenClaw, with 95 downloads so far.

How do I install Ingeniero de datos?

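Run "/install kv-senior-data-engineering" in the OpenClaw or Claude Code chat to install it in one step. The skill itself needs no extra setup, but pipelines it generates still need connections and credentials configured in your target environment.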

Is Ingeniero de datos free?

Yes, Ingeniero de datos is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Ingeniero de datos support?

Ingeniero de datos is cross-platform and runs anywhere OpenClaw or Claude Code is available.

Who created Ingeniero de datos?

It is built and maintained by felix-antonio-sl (@felix-antonio-sl); the current version is v1.0.0.
