Description

Build ETL pipelines with data ingestion, cleaning, and validation steps. Use when ingesting sources, transforming formats, validating data, or scheduling loads.

README (SKILL.md)

ETL

Name: Etl
Author: bytesagain-lab

Extract-Transform-Load data toolkit (v2.0.0). Record and manage data pipeline activities across the full ETL lifecycle — ingest, transform, query, filter, aggregate, visualize, export, sample, schema definition, validation, pipeline orchestration, and data profiling. Each command logs timestamped entries to its own log file, giving you a structured record of all data operations.

Commands

Command	Description
`etl ingest <input>`	Record a data ingestion event (source, format, row count, etc.). Without args, shows recent ingest entries.
`etl transform <input>`	Log a transformation step (column rename, type cast, normalization, etc.). Without args, shows recent transforms.
`etl query <input>`	Record a query operation or SQL statement. Without args, shows recent queries.
`etl filter <input>`	Log a filtering rule or condition applied to data. Without args, shows recent filters.
`etl aggregate <input>`	Record an aggregation step (GROUP BY, SUM, AVG, etc.). Without args, shows recent aggregations.
`etl visualize <input>`	Log a visualization request or chart configuration. Without args, shows recent visualizations.
`etl export <input>`	Record an export operation (destination, format, row count). Without args, shows recent exports.
`etl sample <input>`	Log a data sampling step (sample size, method, seed). Without args, shows recent samples.
`etl schema <input>`	Record a schema definition or schema change. Without args, shows recent schema entries.
`etl validate <input>`	Log a data validation rule or result. Without args, shows recent validations.
`etl pipeline <input>`	Record a pipeline configuration or execution step. Without args, shows recent pipeline entries.
`etl profile <input>`	Log a data profiling result (null counts, distributions, anomalies). Without args, shows recent profiles.
`etl stats`	Show summary statistics: entry counts per category, total entries, data size, and earliest record date.
`etl export <fmt>`	Export all logged data to a file. Supported formats: `json`, `csv`, `txt`. (Note: this is a different code path from the `export` log command; it exports the tool's own data.)
`etl search <term>`	Search across all log files for a keyword (case-insensitive).
`etl recent`	Show the 20 most recent entries from the activity history log.
`etl status`	Health check: version, data directory, total entries, disk usage, last activity.
`etl help`	Show the built-in help with all available commands.
`etl version`	Print the current version (v2.0.0).

Data Storage

All data is stored as plain-text log files in ~/.local/share/etl/:

Per-command logs — Each command (ingest, transform, query, etc.) writes to its own .log file (e.g., ingest.log, transform.log).
History log — Every operation is also appended to history.log with a timestamp and command name.
Export files — Generated in the same directory as export.json, export.csv, or export.txt.

Entries are stored in timestamp|value format, making them easy to grep, parse, or pipe into downstream tools.
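As an illustration of the timestamp|value layout, the sketch below writes a few entries to a throwaway directory and reads them back with cut and grep. The log_entry helper, the timestamp format, and the exact history.log line layout are assumptions made for this example; the bundled script may format entries differently.

```shell
#!/usr/bin/env bash
# Sketch only: mimics the timestamp|value layout in a temporary directory.
# log_entry is a hypothetical helper, and the timestamp format is an assumption.
demo_dir="$(mktemp -d)"

log_entry() {
  category="$1"
  value="$2"
  ts="$(date '+%Y-%m-%d %H:%M:%S')"
  # Per-command log: one timestamp|value line per entry.
  printf '%s|%s\n' "$ts" "$value" >> "$demo_dir/$category.log"
  # History log: timestamp plus command name (assumed layout).
  printf '%s|%s|%s\n' "$ts" "$category" "$value" >> "$demo_dir/history.log"
}

log_entry ingest "users.csv, 1200 rows"
log_entry ingest "orders.csv, 300 rows"
log_entry validate "NOT NULL check on user_id: 0 violations"

# Everything after the first '|' is the logged value (the value may itself contain '|').
values="$(cut -d'|' -f2- "$demo_dir/ingest.log")"
printf '%s\n' "$values"

# Count ingest entries that mention "csv", ignoring case.
csv_count="$(grep -ci 'csv' "$demo_dir/ingest.log")"
echo "ingest entries mentioning csv: $csv_count"
```

Using `cut -f2-` rather than `-f2` keeps values intact when they contain a pipe character of their own.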

Requirements

Bash 4.0+ (uses set -euo pipefail)
coreutils — date, wc, du, head, tail, grep, basename, cut
No external dependencies, API keys, or network access required
Works fully offline on any Unix-like system that provides Bash 4.0+ and the coreutils listed above
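A quick pre-flight check for the requirements listed above might look like the following sketch. The missing_tools helper is hypothetical and not part of the etl script; the tool list is copied from the requirements.

```shell
#!/usr/bin/env bash
# Sketch: check the stated requirements before first use.
# missing_tools is a hypothetical helper, not part of the etl script.
missing_tools() {
  for tool in "$@"; do
    # Print the name of every tool that cannot be found on PATH.
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# In Bash, $BASH_VERSINFO expands to the major version; it is unset in other shells.
bash_major="${BASH_VERSINFO:-0}"
if [ "$bash_major" -lt 4 ]; then
  echo "warning: Bash 4.0+ is required (detected major version: $bash_major)"
fi

missing="$(missing_tools date wc du head tail grep basename cut)"
if [ -n "$missing" ]; then
  echo "missing tools:" $missing
else
  echo "all required tools found"
fi
```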

When to Use

Logging data pipeline steps — Record each stage of your ETL process (ingest → transform → validate → export) with timestamps, creating a complete audit trail of data movements.
Schema management and validation — Use schema to document table structures and validate to log data quality rules and their pass/fail results.
Data profiling and exploration — Use profile to record column statistics, null rates, and distribution anomalies; use sample to log sampling parameters for reproducibility.
Pipeline orchestration tracking — Use pipeline to record multi-step workflow configurations, execution order, and dependencies between ETL stages.
Cross-team data operations review — Run stats for aggregate counts, search to find specific operations by keyword, and export json to share pipeline logs with team members or load into dashboards.
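Because the logs are plain text, per-category counts similar to those reported by stats can also be derived directly from the files. The sketch below is one way to do that; count_entries is a hypothetical helper that assumes one entry per line, and it runs against a throwaway directory rather than the real data directory.

```shell
#!/usr/bin/env bash
# Sketch: per-category entry counts computed directly from the log files.
# count_entries is a hypothetical helper; it assumes one entry per line.
count_entries() {
  dir="$1"
  for f in "$dir"/*.log; do
    # Skip the unexpanded glob when the directory has no .log files.
    [ -e "$f" ] || continue
    name="$(basename "$f" .log)"
    # history.log duplicates the per-command logs, so leave it out.
    if [ "$name" != "history" ]; then
      printf '%s %s\n' "$name" "$(wc -l < "$f" | tr -d ' ')"
    fi
  done
}

# Demo on a throwaway directory instead of the real data directory.
demo_dir="$(mktemp -d)"
printf '%s\n' '2024-01-01 02:00:00|users.csv' '2024-01-02 02:00:00|orders.csv' > "$demo_dir/ingest.log"
printf '%s\n' '2024-01-01 02:05:00|NOT NULL check passed' > "$demo_dir/validate.log"
: > "$demo_dir/history.log"

count_entries "$demo_dir"
```

To run it against real data, pass the data directory instead, for example `count_entries "$HOME/.local/share/etl"`.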

Examples

# Log a data ingestion from S3
etl ingest "s3://data-lake/raw/users_2024.csv — 1.2M rows, CSV format"

# Record a transformation step
etl transform "Normalize email to lowercase, cast created_at to UTC timestamp"

# Log a validation rule
etl validate "NOT NULL check on user_id: 0 violations out of 1,200,000 rows"

# Record schema for a new table
etl schema "users_dim: id INT PK, email VARCHAR(255), created_at TIMESTAMP, country CHAR(2)"

# Define a pipeline
etl pipeline "daily_user_load: ingest(s3) -> dedupe -> validate -> load(postgres)"

# Search for anything related to 'users'
etl search users

# Export all ETL logs to CSV for analysis
etl export csv

# View summary statistics
etl stats

# Check system health
etl status

Tips

Run any data command without arguments to see recent entries (e.g., etl ingest shows the last 20 ingest entries).
Use etl recent for a quick overview of all activity across all categories.
Combine with cron to auto-log pipeline runs: 0 2 * * * etl pipeline "nightly_load completed at $(date)"
Back up your data by copying ~/.local/share/etl/ to your preferred backup location.
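The backup tip can be scripted. The following is a minimal sketch that archives a directory into a timestamped tarball; backup_etl and the destination path are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch: archive the data directory into a timestamped tarball.
# backup_etl and the backup location are hypothetical.
backup_etl() {
  src="$1"
  dest_dir="$2"
  mkdir -p "$dest_dir" || return 1
  archive="$dest_dir/etl-backup-$(date '+%Y%m%d-%H%M%S').tar.gz"
  # -C switches to the parent directory so the archive holds a relative path.
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  # Print the archive path so callers can capture it.
  printf '%s\n' "$archive"
}

# Typical call (paths are examples):
# backup_etl "$HOME/.local/share/etl" "$HOME/backups"
```

If a date format is used directly inside a crontab line, note that cron treats an unescaped `%` as a newline, so it must be written as `\%` there.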

Powered by BytesAgain | bytesagain.com

Usage Guidance

This skill appears to do exactly what it says: it is a local logger for ETL activity. Before installing or running: (1) inspect the script and make sure you are comfortable adding it to your PATH; (2) be aware that all entries are stored as plain text under ~/.local/share/etl, so do not log secrets (API keys, passwords, sensitive data) into these files; (3) exported JSON/CSV may be malformed if entries contain quotes or newlines, so sanitize inputs and avoid logging raw payloads; (4) set appropriate filesystem permissions on the data directory if others share the machine; and (5) if you plan to automate with cron, make sure the cron environment and output handling meet your security needs.
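Two of these precautions, sanitizing inputs and restricting permissions, can be sketched as follows. sanitize_entry is a hypothetical helper that is not part of the etl script, and the choice to replace double quotes with single quotes is just one possible policy.

```shell
#!/usr/bin/env bash
# Sketch: flatten an entry to a single line before logging it.
# sanitize_entry is a hypothetical helper, not part of the etl script.
sanitize_entry() {
  # Turn newlines, carriage returns, and tabs into spaces,
  # and double quotes into single quotes.
  printf '%s' "$1" | tr '\n\r\t' '   ' | tr '"' "'"
}

# Restrict the data directory to the current user (path taken from the README).
# chmod 700 "$HOME/.local/share/etl"

clean="$(sanitize_entry 'loaded "users"
table')"
printf '%s\n' "$clean"
```

A sanitized value could then be passed on as `etl ingest "$(sanitize_entry "$raw")"`, where `raw` is whatever text you intended to log.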

Capability Analysis

Type: OpenClaw Skill
Name: etl
Version: 2.0.1

The 'etl' skill is a straightforward Bash-based logging utility designed to record and track ETL pipeline activities in local text files. Analysis of 'scripts/script.sh' and 'SKILL.md' shows that the tool only interacts with its own data directory (~/.local/share/etl/) and lacks any network access, credential harvesting, or unauthorized file system operations. While the export functions in 'scripts/script.sh' lack rigorous input escaping for JSON/CSV formats and the search function is potentially susceptible to minor argument injection, these are unintentional functional bugs rather than malicious features or high-risk vulnerabilities.
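For readers who want to search the logs without the argument-injection caveat, a defensive grep call might look like the sketch below. safe_search is a hypothetical helper; how the bundled script actually invokes grep is not shown in this listing.

```shell
#!/usr/bin/env bash
# Sketch: keyword search that treats the term as literal text.
# safe_search is a hypothetical helper; the real script's grep call may differ.
safe_search() {
  term="$1"
  dir="$2"
  # -F: fixed string, -i: case-insensitive,
  # --: stop option parsing so a term like "-v" is not read as a flag.
  grep -i -F -- "$term" "$dir"/*.log
}

# Demo on a throwaway directory.
demo_dir="$(mktemp -d)"
printf '%s\n' '2024-01-01 02:00:00|ran with -v flag' > "$demo_dir/pipeline.log"
printf '%s\n' '2024-01-01 02:01:00|Users table loaded' > "$demo_dir/ingest.log"

safe_search "-v" "$demo_dir"
safe_search "users" "$demo_dir"
```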

Capability Assessment

✓ Purpose & Capability

Name/description match what the code does: a lightweight CLI for recording ETL steps to per-command log files. No unexpected credentials, network access, or unrelated binaries are requested.

✓ Instruction Scope

SKILL.md and the included script limit operations to logging and local exports under $HOME/.local/share/etl. Commands only read and write those files and use standard coreutils. Note: the exported JSON/CSV construction does not escape special characters in user-supplied values, so entries containing quotes or newlines could produce invalid output. Search passes the raw term to grep, so a term that starts with a hyphen could be read as a grep option; this affects only the local search and does not exfiltrate data.
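If you post-process the logs into JSON yourself, values can be escaped first. The sketch below shows one way to do that with sed and awk; json_escape is a hypothetical helper that covers backslashes, double quotes, and newlines only, and it is not how the bundled export is implemented.

```shell
#!/usr/bin/env bash
# Sketch: escape a log value before placing it inside a JSON string.
# json_escape is a hypothetical helper; it handles backslashes, double quotes,
# and newlines only (other control characters are left as-is).
json_escape() {
  printf '%s' "$1" \
    | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' \
    | awk 'NR > 1 { printf "\\n" } { printf "%s", $0 }'
}

entry='cast "created_at" to UTC'
printf '{"value": "%s"}\n' "$(json_escape "$entry")"
```

Backslashes are escaped before quotes so that the backslashes introduced for the quotes are not doubled a second time.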

✓ Install Mechanism

Instruction-only skill with a bundled script; there is no remote download or install step. Nothing is fetched from arbitrary URLs and no archives are extracted.

✓ Credentials

No required environment variables or credentials are declared. The script uses HOME to determine the data directory (expected). It does not read other environment secrets or external config paths.

✓ Persistence & Privilege

Does not request permanent/always-on inclusion. It stores logs under ~/.local/share/etl (user-writable area) and does not modify other skills or global agent configuration.

Version History

v2.0.1

update

v2.0.0

v2.5 standard: Use-when desc, homepage, source, security fix

v1.0.5

old template -> domain-specific v2.0.0

v1.0.4

old template -> domain-specific v2.0.0

v1.0.3

Quality upgrade

v1.0.2

Quality upgrade: custom functionality

v1.0.1

Added feedback link

v1.0.0

Initial release

Metadata

Slug etl

Version 2.0.1

License MIT-0

All-time Installs 1

Active Installs 1

Total Versions 8

Frequently Asked Questions

What is Etl?

Etl is an AI Agent Skill for Claude Code / OpenClaw, with 386 downloads so far. It helps you build ETL pipelines with data ingestion, cleaning, and validation steps. Use it when ingesting sources, transforming formats, validating data, or scheduling loads.

How do I install Etl?

Run "/install etl" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Etl free?

Yes, Etl is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Etl support?

Etl is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Etl?

It is built and maintained by bytesagain-lab (@bytesagain-lab); the current version is v2.0.1.
