bytesagain-lab

Etl

Author: bytesagain-lab · GitHub · v2.0.1 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 386
Favorites: 0
Current installs: 1
Versions: 8
Install in OpenClaw:
/install etl
Description
Build ETL pipelines with data ingestion, cleaning, and validation steps. Use when ingesting sources, transforming formats, validating data, or scheduling loads.
Usage (SKILL.md)

ETL

Extract-Transform-Load data toolkit (v2.0.0). Record and manage data pipeline activities across the full ETL lifecycle — ingest, transform, query, filter, aggregate, visualize, export, sample, schema definition, validation, pipeline orchestration, and data profiling. Each command logs timestamped entries to its own log file, giving you a structured record of all data operations.

Commands

| Command | Description |
| --- | --- |
| etl ingest <input> | Record a data ingestion event (source, format, row count, etc.). Without args, shows recent ingest entries. |
| etl transform <input> | Log a transformation step (column rename, type cast, normalization, etc.). Without args, shows recent transforms. |
| etl query <input> | Record a query operation or SQL statement. Without args, shows recent queries. |
| etl filter <input> | Log a filtering rule or condition applied to data. Without args, shows recent filters. |
| etl aggregate <input> | Record an aggregation step (GROUP BY, SUM, AVG, etc.). Without args, shows recent aggregations. |
| etl visualize <input> | Log a visualization request or chart configuration. Without args, shows recent visualizations. |
| etl export <input> | Record an export operation (destination, format, row count). Without args, shows recent exports. |
| etl sample <input> | Log a data sampling step (sample size, method, seed). Without args, shows recent samples. |
| etl schema <input> | Record a schema definition or schema change. Without args, shows recent schema entries. |
| etl validate <input> | Log a data validation rule or result. Without args, shows recent validations. |
| etl pipeline <input> | Record a pipeline configuration or execution step. Without args, shows recent pipeline entries. |
| etl profile <input> | Log a data profiling result (null counts, distributions, anomalies). Without args, shows recent profiles. |
| etl stats | Show summary statistics: entry counts per category, total entries, data size, and earliest record date. |
| etl export <fmt> | Export all logged data to a file. Supported formats: json, csv, txt. (Note: this is a different code path from the export log command; it exports the tool's own data.) |
| etl search <term> | Search across all log files for a keyword (case-insensitive). |
| etl recent | Show the 20 most recent entries from the activity history log. |
| etl status | Health check: version, data directory, total entries, disk usage, last activity. |
| etl help | Show the built-in help with all available commands. |
| etl version | Print the current version (v2.0.0). |

Data Storage

All data is stored as plain-text log files in ~/.local/share/etl/:

  • Per-command logs — Each command (ingest, transform, query, etc.) writes to its own .log file (e.g., ingest.log, transform.log).
  • History log — Every operation is also appended to history.log with a timestamp and command name.
  • Export files — Generated in the same directory as export.json, export.csv, or export.txt.
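Because each category lives in its own .log file, per-category entry counts can be approximated straight from the files. The sketch below is not the tool's own stats implementation; the function name etl_counts is hypothetical, and it assumes one entry per line.

```shell
# Sketch: count entries per category by counting lines in each .log file.
# Assumption: one entry per line; etl_counts is a hypothetical helper name.
etl_counts() {
  local data_dir="${1:-$HOME/.local/share/etl}"
  local f
  for f in "$data_dir"/*.log; do
    # Skip the unexpanded glob when the directory holds no .log files.
    [ -e "$f" ] || continue
    # Print "<category><TAB><line count>"; tr strips padding some wc builds add.
    printf '%s\t%s\n' "$(basename "$f" .log)" "$(wc -l < "$f" | tr -d ' ')"
  done
}
```

Calling etl_counts with no argument reads the default data directory; history.log is counted like any other file, so its total covers all categories.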

Entries are stored in timestamp|value format, making them easy to grep, parse, or pipe into downstream tools.
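As an illustration of the timestamp|value layout, the following sketch splits entries on the first pipe with cut and counts keyword matches with grep. The helper names and the sample entries (including their timestamp format) are made up for the demo; the only assumption about real logs is that the timestamp itself contains no "|".

```shell
# Sketch: split "timestamp|value" lines with standard tools.
# Assumption: the timestamp never contains "|"; sample data below is invented.

# Print only the value part of each entry (everything after the first "|").
etl_values() {
  cut -d'|' -f2- "$1"
}

# Count entries whose value contains a literal keyword (case-insensitive).
etl_count_matching() {
  local file="$1" term="$2"
  # grep -c exits non-zero when the count is 0, so ignore its status.
  cut -d'|' -f2- "$file" | grep -i -F -c -- "$term" || true
}

# Demo on a throwaway file with hypothetical entries.
demo_log=$(mktemp)
printf '%s\n' \
  '2024-05-01 02:00:00|s3://data-lake/raw/users_2024.csv, CSV format' \
  '2024-05-01 02:05:00|orders.parquet | 40k rows' > "$demo_log"

etl_values "$demo_log"
etl_count_matching "$demo_log" "csv"
```

Using -f2- rather than -f2 keeps any further "|" characters that appear inside the value itself, as in the second sample entry.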

Requirements

  • Bash 4.0+ (uses set -euo pipefail)
  • coreutils (date, wc, du, head, tail, basename, cut) and grep
  • No external dependencies, API keys, or network access required
  • Works fully offline on any POSIX-compatible system where Bash 4.0+ is available

When to Use

  1. Logging data pipeline steps — Record each stage of your ETL process (ingest → transform → validate → export) with timestamps, creating a complete audit trail of data movements.
  2. Schema management and validation — Use schema to document table structures and validate to log data quality rules and their pass/fail results.
  3. Data profiling and exploration — Use profile to record column statistics, null rates, and distribution anomalies; use sample to log sampling parameters for reproducibility.
  4. Pipeline orchestration tracking — Use pipeline to record multi-step workflow configurations, execution order, and dependencies between ETL stages.
  5. Cross-team data operations review — Run stats for aggregate counts, search to find specific operations by keyword, and export json to share pipeline logs with team members or load into dashboards.
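One way to build the audit trail described in item 1 is to wrap each pipeline stage so that its outcome is logged automatically. This is a minimal sketch, not part of the skill: run_stage and the ETL_CMD variable are hypothetical names, and the logged message format is an arbitrary choice.

```shell
# Sketch: run a stage command, then log its exit code and duration via etl.
# run_stage and ETL_CMD are hypothetical; ETL_CMD defaults to the etl command.
ETL_CMD="${ETL_CMD:-etl}"

run_stage() {
  local category="$1" label="$2"
  shift 2
  local start status=0
  start=$(date +%s)
  # Run the remaining arguments as the stage command, capturing failure.
  "$@" || status=$?
  "$ETL_CMD" "$category" "$label: exit=$status, duration=$(( $(date +%s) - start ))s"
  # Propagate the stage's exit code so callers can stop on failure.
  return "$status"
}
```

For example, run_stage ingest "copy users file" cp raw/users.csv staging/ would record one ingest entry whether or not the copy succeeds (the paths here are placeholders).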

Examples

# Log a data ingestion from S3
etl ingest "s3://data-lake/raw/users_2024.csv — 1.2M rows, CSV format"

# Record a transformation step
etl transform "Normalize email to lowercase, cast created_at to UTC timestamp"

# Log a validation rule
etl validate "NOT NULL check on user_id: 0 violations out of 1,200,000 rows"

# Record schema for a new table
etl schema "users_dim: id INT PK, email VARCHAR(255), created_at TIMESTAMP, country CHAR(2)"

# Define a pipeline
etl pipeline "daily_user_load: ingest(s3) -> dedupe -> validate -> load(postgres)"

# Search for anything related to 'users'
etl search users

# Export all ETL logs to CSV for analysis
etl export csv

# View summary statistics
etl stats

# Check system health
etl status

Tips

  • Run any data command without arguments to see recent entries (e.g., etl ingest shows the last 20 ingest entries).
  • Use etl recent for a quick overview of all activity across all categories.
  • Combine with cron to auto-log pipeline runs: 0 2 * * * etl pipeline "nightly_load completed at $(date)"
  • Back up your data by copying ~/.local/share/etl/ to your preferred backup location.
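The backup tip can be sketched as a small function that archives the data directory into a timestamped tarball. The function name etl_backup and the default destination ~/backups are assumptions for illustration only.

```shell
# Sketch: archive the etl data directory into a dated .tar.gz file.
# etl_backup and the default destination directory are hypothetical.
etl_backup() {
  local data_dir="${1:-$HOME/.local/share/etl}"
  local dest_dir="${2:-$HOME/backups}"
  local archive
  [ -d "$data_dir" ] || { echo "no data directory: $data_dir" >&2; return 1; }
  mkdir -p "$dest_dir"
  archive="$dest_dir/etl-$(date +%Y%m%d-%H%M%S).tar.gz"
  # -C keeps paths in the archive relative to the data directory's parent.
  tar -czf "$archive" -C "$(dirname "$data_dir")" "$(basename "$data_dir")"
  # Print the archive path so callers can copy or upload it.
  echo "$archive"
}
```

If this is scheduled from a crontab line directly rather than from a script, note that cron treats unescaped % characters specially, so wrapping the call in a script file is the simpler route.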

Powered by BytesAgain | bytesagain.com | [email protected]

Security recommendations
This skill appears to do exactly what it says: a local logger for ETL activity. Before installing or running: (1) inspect the script and ensure you are comfortable adding it to your PATH; (2) be aware that all entries are stored as plain text under ~/.local/share/etl, so do not log secrets (API keys, passwords, sensitive data) into these files; (3) exported JSON/CSV may be malformed if entries contain quotes or newlines, so avoid logging raw sensitive payloads or sanitize inputs; (4) set appropriate filesystem permissions on the data directory if others share the machine; and (5) if you plan to automate (cron), ensure the cron environment and output handling meet your security needs.
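For point (4), a minimal sketch of restricting the data directory to the current user follows. The function name etl_lock_down is hypothetical, and the modes 700 and 600 are one reasonable choice rather than something the skill prescribes.

```shell
# Sketch: make the data directory and its files readable by the owner only.
# etl_lock_down is a hypothetical helper; 700/600 are illustrative modes.
etl_lock_down() {
  local data_dir="${1:-$HOME/.local/share/etl}"
  mkdir -p "$data_dir"
  chmod 700 "$data_dir"
  # Only touch regular files directly inside the directory.
  find "$data_dir" -maxdepth 1 -type f -exec chmod 600 {} +
}
```

Files created later inherit the process umask rather than mode 600, but the 700 directory mode still blocks other users from reaching them.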
Functional analysis
Type: OpenClaw Skill
Name: etl
Version: 2.0.1
The 'etl' skill is a straightforward Bash-based logging utility designed to record and track ETL pipeline activities in local text files. Analysis of 'scripts/script.sh' and 'SKILL.md' shows that the tool only interacts with its own data directory (~/.local/share/etl/) and lacks any network access, credential harvesting, or unauthorized file system operations. While the export functions in 'scripts/script.sh' lack rigorous input escaping for JSON/CSV formats and the search function is potentially susceptible to minor argument injection, these are unintentional functional bugs rather than malicious features or high-risk vulnerabilities.
Capability assessment
Purpose & Capability
Name/description match what the code does: a lightweight CLI for recording ETL steps to per-command log files. No unexpected credentials, network access, or unrelated binaries are requested.
Instruction Scope
SKILL.md and the included script limit operations to logging and local exports under $HOME/.local/share/etl. Commands only read/write those files and use standard coreutils. Note: the exported JSON/CSV is built without escaping special characters in user-supplied values (which could produce invalid output if entries contain quotes or newlines), and search passes the raw term to grep (it behaves as a local search, not exfiltration).
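Until the built-in export escapes values, one workaround is to convert a log file to JSON Lines yourself. The sketch below is not the skill's export code: json_escape and log_to_jsonl are hypothetical helpers, the output field names are arbitrary, and only backslashes, double quotes, and tabs are escaped (entries are read line by line, so a value cannot contain a newline here).

```shell
# Sketch: emit one JSON object per "timestamp|value" line, with escaping.
# json_escape and log_to_jsonl are hypothetical; other control characters
# are not handled.
json_escape() {
  # Escape backslashes first, then double quotes, then literal tabs.
  printf '%s' "$1" |
    sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' -e "s/$(printf '\t')/\\\\t/g"
}

log_to_jsonl() {
  local ts value
  # IFS='|' splits on the first pipe only; the rest stays in $value.
  while IFS='|' read -r ts value || [ -n "$ts" ]; do
    printf '{"timestamp":"%s","value":"%s"}\n' \
      "$(json_escape "$ts")" "$(json_escape "$value")"
  done < "$1"
}
```

Running log_to_jsonl on a file such as ~/.local/share/etl/ingest.log prints one object per entry, which most JSON tooling can consume line by line.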
Install Mechanism
Instruction-only skill with a bundled script; there is no remote download or install step. Nothing is fetched from arbitrary URLs and no archives are extracted.
Credentials
No required environment variables or credentials are declared. The script uses HOME to determine the data directory (expected). It does not read other environment secrets or external config paths.
Persistence & Privilege
Does not request permanent/always-on inclusion. It stores logs under ~/.local/share/etl (user-writable area) and does not modify other skills or global agent configuration.
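How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install etl
  3. Once installation finishes, invoke the Skill by name or trigger it with /etl.
  4. Provide the inputs described in the Skill's parameter documentation to get structured output.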
Version history
v2.0.1: update
v2.0.0: v2.5 standard: Use-when desc, homepage, source, security fix
v1.0.5: old template -> domain-specific v2.0.0
v1.0.4: old template -> domain-specific v2.0.0
v1.0.3: Quality upgrade
v1.0.2: Quality upgrade: custom functionality
v1.0.1: Added feedback link
v1.0.0: Initial release
Metadata
Slug: etl
Version: 2.0.1
License: MIT-0
Total installs: 1
Current installs: 1
Versions: 8
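FAQ

What is Etl?

Build ETL pipelines with data ingestion, cleaning, and validation steps. Use when ingesting sources, transforming formats, validating data, or scheduling loads. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 386 total downloads so far.

How do I install Etl?

Run the command "/install etl" in the OpenClaw or Claude Code chat box to install it in one step; no extra configuration is needed.

Is Etl free?

Yes. Etl is completely free and released under the MIT-0 license, so you can download, install, and use it freely.

Which platforms does Etl support?

Etl is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Etl?

It is developed and maintained by bytesagain-lab (@bytesagain-lab); the current version is v2.0.1.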
