Description

Build ETL pipelines with data ingestion, cleaning, and validation steps. Use when ingesting sources, transforming formats, validating data, or scheduling loads.

Usage (SKILL.md)

ETL

Name: Etl
Author: bytesagain-lab

Extract-Transform-Load data toolkit (v2.0.0). Record and manage data pipeline activities across the full ETL lifecycle — ingest, transform, query, filter, aggregate, visualize, export, sample, schema definition, validation, pipeline orchestration, and data profiling. Each command logs timestamped entries to its own log file, giving you a structured record of all data operations.

Commands

Command	Description
`etl ingest <input>`	Record a data ingestion event (source, format, row count, etc.). Without args, shows recent ingest entries.
`etl transform <input>`	Log a transformation step (column rename, type cast, normalization, etc.). Without args, shows recent transforms.
`etl query <input>`	Record a query operation or SQL statement. Without args, shows recent queries.
`etl filter <input>`	Log a filtering rule or condition applied to data. Without args, shows recent filters.
`etl aggregate <input>`	Record an aggregation step (GROUP BY, SUM, AVG, etc.). Without args, shows recent aggregations.
`etl visualize <input>`	Log a visualization request or chart configuration. Without args, shows recent visualizations.
`etl export <input>`	Record an export operation (destination, format, row count). Without args, shows recent exports.
`etl sample <input>`	Log a data sampling step (sample size, method, seed). Without args, shows recent samples.
`etl schema <input>`	Record a schema definition or schema change. Without args, shows recent schema entries.
`etl validate <input>`	Log a data validation rule or result. Without args, shows recent validations.
`etl pipeline <input>`	Record a pipeline configuration or execution step. Without args, shows recent pipeline entries.
`etl profile <input>`	Log a data profiling result (null counts, distributions, anomalies). Without args, shows recent profiles.
`etl stats`	Show summary statistics: entry counts per category, total entries, data size, and earliest record date.
`etl export <fmt>`	Export all logged data to a file. Supported formats: `json`, `csv`, `txt`. (Note: this shares the `export` name with the log command above but runs a different code path and exports the tool's own logged data.)
`etl search <term>`	Search across all log files for a keyword (case-insensitive).
`etl recent`	Show the 20 most recent entries from the activity history log.
`etl status`	Health check: version, data directory, total entries, disk usage, last activity.
`etl help`	Show the built-in help with all available commands.
`etl version`	Print the current version (v2.0.0).
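The per-category counts that `etl stats` reports can be approximated with plain coreutils, which is handy when checking the logs from another script. This is a minimal sketch, not the tool's own implementation: it assumes one entry per line in each `<command>.log` file, it deliberately skips history.log because that file repeats every entry, and `etl_counts` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: approximate the per-category counts shown by `etl stats`.
# Assumes one entry per line in each <command>.log (an entry containing a raw
# newline would be counted twice).
etl_counts() {
  local dir="${1:-$HOME/.local/share/etl}"
  local total=0 f name n
  for f in "$dir"/*.log; do
    # If no .log files exist, the glob stays literal; skip it.
    if [ ! -e "$f" ]; then
      continue
    fi
    name="$(basename "$f" .log)"
    # history.log duplicates every entry, so leave it out of the totals.
    if [ "$name" = "history" ]; then
      continue
    fi
    # Arithmetic expansion drops the padding some wc implementations add.
    n=$(( $(wc -l < "$f") ))
    printf '%s\t%s\n' "$name" "$n"
    total=$(( total + n ))
  done
  printf 'total\t%s\n' "$total"
}
```

Called as `etl_counts` it reads the default data directory; pass another directory as the first argument to inspect a backup copy.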

Data Storage

All data is stored as plain-text log files in ~/.local/share/etl/:

Per-command logs — Each command (ingest, transform, query, etc.) writes to its own .log file (e.g., ingest.log, transform.log).
History log — Every operation is also appended to history.log with a timestamp and command name.
Export files — Generated in the same directory as export.json, export.csv, or export.txt.

Entries are stored in timestamp|value format, making them easy to grep, parse, or pipe into downstream tools.
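Because each line is a timestamp, a `|`, and the logged text, a downstream tool only needs to split on the first `|`. The sketch below converts a log file into tab-separated output for `sort`, `awk`, or a spreadsheet. `etl_tsv` is a hypothetical name; since the timestamp format is not documented here, the first field is passed through unchanged, and lines without any `|` (for example, the tail of an entry that contained a raw newline) are skipped.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn `timestamp|value` log lines into `timestamp<TAB>value`.
# Only the first "|" is treated as the separator, so a "|" inside the value survives.
etl_tsv() {
  local line
  # `|| [ -n "$line" ]` also processes a final line that lacks a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      *'|'*) ;;          # well-formed entry
      *) continue ;;     # no separator: not a timestamp|value line
    esac
    printf '%s\t%s\n' "${line%%|*}" "${line#*|}"
  done < "${1:-/dev/stdin}"
}
```

For example, `etl_tsv ~/.local/share/etl/ingest.log | cut -f2` prints only the logged values; with no argument the function reads standard input.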

Requirements

Bash 4.0+ (uses set -euo pipefail)
coreutils — date, wc, du, head, tail, grep, basename, cut
No external dependencies, API keys, or network access required
Works fully offline on any POSIX-compatible system that has Bash 4.0+

When to Use

Logging data pipeline steps — Record each stage of your ETL process (ingest → transform → validate → export) with timestamps, creating a complete audit trail of data movements.
Schema management and validation — Use schema to document table structures and validate to log data quality rules and their pass/fail results.
Data profiling and exploration — Use profile to record column statistics, null rates, and distribution anomalies; use sample to log sampling parameters for reproducibility.
Pipeline orchestration tracking — Use pipeline to record multi-step workflow configurations, execution order, and dependencies between ETL stages.
Cross-team data operations review — Run stats for aggregate counts, search to find specific operations by keyword, and export json to share pipeline logs with team members or load into dashboards.

Examples

# Log a data ingestion from S3
etl ingest "s3://data-lake/raw/users_2024.csv — 1.2M rows, CSV format"

# Record a transformation step
etl transform "Normalize email to lowercase, cast created_at to UTC timestamp"

# Log a validation rule
etl validate "NOT NULL check on user_id: 0 violations out of 1,200,000 rows"

# Record schema for a new table
etl schema "users_dim: id INT PK, email VARCHAR(255), created_at TIMESTAMP, country CHAR(2)"

# Define a pipeline
etl pipeline "daily_user_load: ingest(s3) -> dedupe -> validate -> load(postgres)"

# Search for anything related to 'users'
etl search users

# Export all ETL logs to CSV for analysis
etl export csv

# View summary statistics
etl stats

# Check system health
etl status

Tips

Run any data command without arguments to see recent entries (e.g., etl ingest shows the last 20 ingest entries).
Use etl recent for a quick overview of all activity across all categories.
Combine with cron to auto-log pipeline runs: 0 2 * * * etl pipeline "nightly_load completed at $(date)" (cron runs with a minimal PATH, so you may need to give the full path to etl).
Back up your data by copying ~/.local/share/etl/ to your preferred backup location.
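The cron tip above writes its entry whenever cron fires, even if the load itself never ran or failed. A hedged alternative is a small wrapper that runs the job first and then logs its real exit status with `etl pipeline`. `run_and_log` and the `ETL_BIN` variable are hypothetical; `ETL_BIN` exists only so that cron, with its minimal PATH, can be pointed at the full path of the etl script.

```bash
#!/usr/bin/env bash
# Hypothetical cron wrapper: run a job, then log its actual outcome with etl.
# ETL_BIN (assumed, not part of the skill) lets cron use the full path to etl.
run_and_log() {
  local name="$1"
  shift
  local status=0 msg
  "$@" || status=$?               # run the job; capture a non-zero exit code
  if [ "$status" -eq 0 ]; then
    msg="$name succeeded at $(date)"
  else
    msg="$name failed (exit $status) at $(date)"
  fi
  "${ETL_BIN:-etl}" pipeline "$msg"   # same `etl pipeline <input>` command as the tip
  return "$status"                # propagate the job's status to cron
}
```

A crontab entry might then look like `0 2 * * * bash -c '. /path/to/run_and_log.sh && ETL_BIN=/path/to/etl run_and_log nightly_load /path/to/load.sh'`, where every path is a placeholder.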

Powered by BytesAgain | bytesagain.com

Security Recommendations

This skill appears to do exactly what it says: it is a local logger for ETL activity. Before installing or running it: (1) inspect the script and make sure you are comfortable adding it to your PATH; (2) be aware that all entries are stored as plain text under ~/.local/share/etl, so do not log secrets (API keys, passwords, or other sensitive data) into these files; (3) exported JSON/CSV may be malformed if entries contain quotes or newlines, so sanitize inputs and avoid logging raw sensitive payloads; (4) set appropriate filesystem permissions on the data directory if others share the machine; and (5) if you plan to automate it with cron, make sure the cron environment and output handling meet your security needs.
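The quoting problem noted in point (3) can be avoided by escaping each value before it is placed in a JSON string or CSV field. This is a minimal Bash sketch, not code from the skill's script: `json_escape` handles backslash, double quote, newline, carriage return, and tab (other control characters would still need `\u00XX` escapes for strictly valid JSON), and `csv_field` follows the RFC 4180 convention of wrapping a field in double quotes and doubling any embedded quote. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for building safe JSON/CSV exports from logged values.
# Replacement strings are quoted variables so Bash inserts them literally.

json_escape() {
  local bs='\' dq='"' s="$1"
  s=${s//"$bs"/"$bs$bs"}          # backslash first, so later escapes are not doubled
  s=${s//"$dq"/"$bs$dq"}          # "  ->  \"
  s=${s//$'\n'/"${bs}n"}          # newline -> \n
  s=${s//$'\r'/"${bs}r"}          # carriage return -> \r
  s=${s//$'\t'/"${bs}t"}          # tab -> \t
  printf '%s' "$s"
}

csv_field() {
  local dq='"' s="$1"
  s=${s//"$dq"/"$dq$dq"}          # "  ->  ""
  printf '"%s"' "$s"
}
```

With these helpers, one record could be written as `printf '{"timestamp":"%s","value":"%s"}\n' "$(json_escape "$ts")" "$(json_escape "$value")"` for JSON, or `printf '%s,%s\n' "$(csv_field "$ts")" "$(csv_field "$value")"` for CSV.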

Functional Analysis

Type: OpenClaw Skill
Name: etl
Version: 2.0.1

The 'etl' skill is a straightforward Bash-based logging utility designed to record and track ETL pipeline activities in local text files. Analysis of 'scripts/script.sh' and 'SKILL.md' shows that the tool only interacts with its own data directory (~/.local/share/etl/) and lacks any network access, credential harvesting, or unauthorized file system operations. While the export functions in 'scripts/script.sh' lack rigorous input escaping for the JSON/CSV formats and the search function is potentially susceptible to minor argument injection, these are unintentional functional bugs rather than malicious features or high-risk vulnerabilities.

Capability Assessment

✓ Purpose & Capability

Name/description match what the code does: a lightweight CLI for recording ETL steps to per-command log files. No unexpected credentials, network access, or unrelated binaries are requested.

✓ Instruction Scope

SKILL.md and the included script limit operations to logging and local exports under $HOME/.local/share/etl. Commands only read/write those files and use standard coreutils. Note: exported JSON/CSV construction does not escape special characters in user-supplied values, so entries containing quotes or newlines can produce invalid output. Search passes the raw term to grep: it behaves as a local search, not exfiltration, but a term containing regex metacharacters or starting with '-' may be interpreted by grep as a pattern or option rather than matched literally.
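One way to make keyword search robust against the raw-term issue is to pass the term to grep as a fixed string (`-F`) after `--`, so regex metacharacters and a leading `-` are matched literally. A sketch of such a search, assuming the same ~/.local/share/etl/*.log layout; `etl_search` is a hypothetical function, not the skill's own command.

```bash
#!/usr/bin/env bash
# Hypothetical case-insensitive, literal search across the etl log files.
etl_search() {
  local term="$1" dir="${2:-$HOME/.local/share/etl}"
  local files=() f
  for f in "$dir"/*.log; do
    if [ -e "$f" ]; then          # skip the literal glob when no logs exist
      files+=("$f")
    fi
  done
  # With no files, grep would fall back to reading stdin, so stop here.
  if [ "${#files[@]}" -eq 0 ]; then
    return 1
  fi
  # -i: ignore case, -F: fixed string (no regex), --: term is never an option
  grep -i -F -- "$term" "${files[@]}"
}
```

As with plain grep, the function returns a non-zero status when nothing matches, so scripts running under `set -e` should test its result explicitly.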

✓ Install Mechanism

Instruction-only skill with a bundled script; there is no remote download or install step. Nothing is fetched from arbitrary URLs and no archives are extracted.

✓ Credentials

No required environment variables or credentials are declared. The script uses HOME to determine the data directory (expected). It does not read other environment secrets or external config paths.

✓ Persistence & Privilege

Does not request permanent/always-on inclusion. It stores logs under ~/.local/share/etl (user-writable area) and does not modify other skills or global agent configuration.

Version History

v2.0.1

update

v2.0.0

v2.5 standard: Use-when desc, homepage, source, security fix

v1.0.5

old template -> domain-specific v2.0.0

v1.0.4

old template -> domain-specific v2.0.0

v1.0.3

Quality upgrade

v1.0.2

Quality upgrade: custom functionality

v1.0.1

Added feedback link

v1.0.0

Initial release

Metadata

Slug etl

Version 2.0.1

License MIT-0

Total installs 1

Current installs 1

Versions published 8

FAQ

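What is Etl?

Build ETL pipelines with data ingestion, cleaning, and validation steps. Use when ingesting sources, transforming formats, validating data, or scheduling loads. It is an AI agent skill plugin for Claude Code / OpenClaw and has been downloaded 386 times to date.

How do I install Etl?

Run the command "/install etl" in an OpenClaw or Claude Code chat to install it in one step; no additional configuration is required.

Is Etl free?

Yes. Etl is completely free under the MIT-0 license and can be freely downloaded, installed, and used.

Which platforms does Etl support?

Etl is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Etl?

It is developed and maintained by bytesagain-lab (@bytesagain-lab). The current version is v2.0.1.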
