
Bigdata

Author: bytesagain3 · GitHub ↗ · v2.0.1 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 304
Favorites: 0
Current installs: 1
Versions: 7
Install in OpenClaw
/install bigdata
Description
Split large files, run parallel processing, and stream batch analysis. Use when sampling datasets, aggregating logs, or transforming bulk data.
Usage (SKILL.md)

BigData

A comprehensive data processing toolkit for ingesting, transforming, querying, filtering, aggregating, and managing data workflows — all from the command line with local timestamped log storage.

Commands

Command                      Description
bigdata ingest <input>       Ingest raw data into the system. Without args, shows recent ingest entries
bigdata transform <input>    Record a data transformation step. Without args, shows recent transforms
bigdata query <input>        Log and track data queries. Without args, shows recent queries
bigdata filter <input>       Apply and record data filters. Without args, shows recent filters
bigdata aggregate <input>    Record aggregation operations. Without args, shows recent aggregations
bigdata visualize <input>    Log visualization tasks. Without args, shows recent visualizations
bigdata export <input>       Log export operations. Without args, shows recent exports
bigdata sample <input>       Record data sampling operations. Without args, shows recent samples
bigdata schema <input>       Track schema definitions and changes. Without args, shows recent schemas
bigdata validate <input>     Log data validation checks. Without args, shows recent validations
bigdata pipeline <input>     Record pipeline configurations. Without args, shows recent pipelines
bigdata profile <input>      Log data profiling operations. Without args, shows recent profiles
bigdata stats                Show summary statistics across all entry types
bigdata search <term>        Search across all log entries for a keyword
bigdata recent               Show the 20 most recent activity entries from the history log
bigdata status               Health check: version, data dir, total entries, disk usage, last activity
bigdata help                 Show all available commands
bigdata version              Print version (v2.0.0)

Each data command (ingest, transform, query, etc.) works the same way:

  • With arguments: saves the entry with a timestamp to its dedicated .log file and records it in the activity history
  • Without arguments: displays the 20 most recent entries from that command's log
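
The two modes above can be sketched as a single shell function. This is a hypothetical illustration, not the skill's actual script: the function name record_or_list, the BIGDATA_DIR override, and the layout of the history line are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the "record or list" pattern; not the skill's real code.

# record_or_list <type> [value...]
#   With a value:    append "YYYY-MM-DD HH:MM|<value>" to <type>.log and to history.log.
#   Without a value: print the 20 most recent entries from <type>.log.
record_or_list() {
  # BIGDATA_DIR is an assumed override so the sketch can run against a scratch directory.
  local dir="${BIGDATA_DIR:-$HOME/.local/share/bigdata}"
  local type="$1"; shift
  local file="$dir/$type.log"
  local stamp
  mkdir -p "$dir"
  if [ "$#" -gt 0 ]; then
    stamp="$(date '+%Y-%m-%d %H:%M')"
    printf '%s|%s\n' "$stamp" "$*" >> "$file"
    # The history line layout below is an assumption for illustration.
    printf '%s|%s: %s\n' "$stamp" "$type" "$*" >> "$dir/history.log"
    printf 'saved to %s\n' "$file"
  elif [ -f "$file" ]; then
    tail -n 20 "$file"
  fi
}
```

Calling record_or_list ingest "orders.csv loaded" would append one line, and record_or_list ingest with no further arguments would list the most recent entries.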

Data Storage

All data is stored locally in plain-text log files:

~/.local/share/bigdata/
├── ingest.log          # Ingested data entries
├── transform.log       # Transformation records
├── query.log           # Query log
├── filter.log          # Filter operations
├── aggregate.log       # Aggregation records
├── visualize.log       # Visualization tasks
├── export.log          # Export operations
├── sample.log          # Sampling records
├── schema.log          # Schema definitions
├── validate.log        # Validation checks
├── pipeline.log        # Pipeline configurations
├── profile.log         # Profiling results
└── history.log         # Unified activity log with timestamps

Each entry is stored as YYYY-MM-DD HH:MM|<value> for easy parsing and export.
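
Because the timestamp and the value are separated by a single pipe, the logs can be read with standard tools. The following is a minimal sketch, assuming the documented layout holds for the file being read; the helper name entries_on_date is hypothetical and not part of the skill.

```shell
#!/usr/bin/env bash
# Hypothetical helper; assumes lines shaped like "YYYY-MM-DD HH:MM|<value>".

# entries_on_date <log-file> <YYYY-MM-DD>
# Print only the <value> part of the entries recorded on the given date.
entries_on_date() {
  local file="$1" day="$2"
  # Split on the first "|" only, so values that themselves contain "|" stay intact.
  awk -v day="$day" '
    index($0, "|") > 0 {
      stamp = substr($0, 1, index($0, "|") - 1)
      value = substr($0, index($0, "|") + 1)
      if (substr(stamp, 1, 10) == day) print value
    }' "$file"
}
```

For example, entries_on_date ~/.local/share/bigdata/ingest.log 2024-03-01 would print the values logged on that day, one per line.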

Requirements

  • Bash 4.0+ (uses set -euo pipefail)
  • Standard UNIX utilities: date, wc, du, grep, head, tail, cat
  • No external dependencies or API keys required
  • Works offline — all data stays on your machine

When to Use

  1. Data pipeline tracking — Record each step of a multi-stage data workflow (ingest → transform → validate → export) with full timestamps for audit trails
  2. Quick data logging — Capture observations, measurements, or notes about datasets directly from the terminal without opening a separate app
  3. Schema management — Keep track of schema definitions, changes, and validation rules as your data evolves over time
  4. Data quality monitoring — Log validation checks and profiling results to build a history of data quality metrics
  5. Workflow documentation — Use search and recent commands to review what data operations were performed, when, and in what order

Examples

Log a complete data workflow

# Ingest raw data
bigdata ingest "customer_orders_2024.csv — 1.2M rows loaded"

# Transform it
bigdata transform "normalize dates to ISO-8601, trim whitespace, deduplicate"

# Validate the output
bigdata validate "all required fields present, no nulls in customer_id"

# Record the schema
bigdata schema "orders: id(int), customer_id(int), amount(decimal), date(date)"

# Export when ready
bigdata export "final dataset pushed to analytics warehouse"

Search and review activity

# Search across all logs for a keyword
bigdata search "customer"

# Check overall statistics
bigdata stats

# View recent activity across all commands
bigdata recent

# Health check
bigdata status

Pipeline and profiling

# Define a pipeline
bigdata pipeline "daily-etl: ingest → clean → validate → load — runs at 02:00 UTC"

# Profile a dataset
bigdata profile "users table: 500K rows, 12 columns, 0.3% nulls in email field"

# Sample data for testing
bigdata sample "random 10% sample from transactions for QA testing"

# Record an aggregation
bigdata aggregate "monthly revenue by region — Q1 totals computed"

Filter and query tracking

# Log a filter operation
bigdata filter "removed records older than 2020-01-01, kept 850K of 1.2M rows"

# Track a query
bigdata query "SELECT region, SUM(revenue) FROM orders GROUP BY region"

# Log a visualization
bigdata visualize "bar chart: monthly revenue trend, exported as PNG"

Output

All commands print confirmation to stdout. Data is persisted in ~/.local/share/bigdata/. Use bigdata stats for a summary or bigdata search <term> to find specific entries across all logs.
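
Since the logs are plain text, the same data can also be inspected without the CLI. The sketch below shows one way to search every log file directly with grep. It is an assumption-laden illustration rather than the implementation of bigdata search: the function name search_logs, the BIGDATA_DIR override, and the choice of case-insensitive literal matching are all made up for the example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; not the implementation of "bigdata search".

# search_logs <term>
# Case-insensitive, fixed-string search over every *.log file in the data directory.
search_logs() {
  # BIGDATA_DIR is an assumed override for pointing the sketch at another directory.
  local dir="${BIGDATA_DIR:-$HOME/.local/share/bigdata}"
  local term="$1"
  # -H prints the file name, -i ignores case, -F treats the term literally.
  # A missing directory or zero matches yields empty output instead of an error.
  grep -HiF -- "$term" "$dir"/*.log 2>/dev/null || true
}
```

Each match is printed as the file path, a colon, and the matching log line.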


Powered by BytesAgain | bytesagain.com

Security recommendations
This skill is coherent and runs locally: it writes plaintext logs to ~/.local/share/bigdata and does not contact the network or request secrets. Before installing, note that the script has minor implementation inconsistencies (history.log timestamp format vs SKILL.md expectations) that may affect 'status' or 'since' reads — these are bugs, not malicious code. If you will store sensitive data, avoid logging secrets in free-text entries or change the storage location/permissions; otherwise it's safe to try in a normal user account or a sandboxed environment. If you need certainty, review the script (scripts/script.sh) yourself or run it with test entries to verify behavior.
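
One way to act on the permissions advice is to restrict the log directory to its owner. This is a minimal sketch under the assumption that the logs live in the default location named above; the function lock_down_logs is hypothetical and not shipped with the skill.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of the skill.

# lock_down_logs [dir]
# Make the log directory and its *.log files accessible to the owner only.
lock_down_logs() {
  local dir="${1:-$HOME/.local/share/bigdata}"
  [ -d "$dir" ] || return 0   # nothing to do before the first entry is written
  chmod 700 "$dir"
  find "$dir" -type f -name '*.log' -exec chmod 600 {} +
}
```

Files created later inherit the process umask, so the helper would need to be rerun (or a stricter umask set) after new log files appear.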
Functional analysis
Type: OpenClaw Skill
Name: bigdata
Version: 2.0.1
The 'bigdata' skill is a local activity logging utility that records metadata about data processing workflows into plain-text files in ~/.local/share/bigdata/. The script (scripts/script.sh) uses standard UNIX utilities like grep, tail, and wc to manage these logs and contains no network activity, credential access, or unauthorized execution logic.
Capability assessment
Purpose & Capability
Name, description, SKILL.md, and the included bash script all implement a local CLI for recording and querying timestamped log entries; no external services, credentials, or unrelated capabilities are requested.
Instruction Scope
Instructions and the script mostly align, but there are small inconsistencies/bugs: SKILL.md claims entries are stored as 'YYYY-MM-DD HH:MM|<value>', and many command implementations use that format, yet the internal _log function writes history entries as 'MM-DD HH:MM <type>: <value>' (no '|' delimiter). Some status/statistics code expects pipe-delimited timestamps when reading history.log. These are implementation bugs (scope mismatch) but not evidence of malicious behavior.
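
A reader who wants to process both layouts described above can normalize them before further parsing. The sketch below assumes the two formats are exactly as reported here ('YYYY-MM-DD HH:MM|<value>' and 'MM-DD HH:MM <type>: <value>'); the function name split_entry and the tab-separated output are choices made for this example, not behavior of the skill.

```shell
#!/usr/bin/env bash
# Hypothetical normalizer; the two input layouts are taken from the assessment above.

# split_entry
# Read log lines on stdin and print "<timestamp><TAB><rest>" for either layout.
split_entry() {
  awk '
    # Documented layout: "YYYY-MM-DD HH:MM|<value>"
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9][|]/ {
      print substr($0, 1, 16) "\t" substr($0, 18); next
    }
    # Layout reported for history.log: "MM-DD HH:MM <type>: <value>"
    /^[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9] / {
      print substr($0, 1, 11) "\t" substr($0, 13); next
    }
    # Anything else is passed through with an empty timestamp.
    { print "\t" $0 }
  '
}
```

Piping a log file through split_entry gives a uniform two-column stream, although the history-style timestamps still lack a year and cannot be compared directly with the full dates.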
Install Mechanism
No install spec; the skill is a script plus documentation — nothing is downloaded or installed automatically. This minimizes installer-related risk.
Credentials
The skill requests no environment variables, no credentials, and uses only standard shell utilities. Its local file writes are limited to ~/.local/share/bigdata, which is reasonable for a logging tool.
Persistence & Privilege
The skill writes its own data under the user's home directory only. It does not request always:true, does not modify other skills or system-wide settings, and does not persist credentials. This level of persistence is proportional for a CLI logger.
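How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install bigdata
  3. Once installation finishes, invoke the Skill by name or trigger it with /bigdata
  4. Provide the required input according to the Skill's parameter description to get structured output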
Version history
v2.0.1
update
v2.0.0
v2.5 standard: Use-when desc, homepage, source, security fix
v1.0.4
old template -> domain-specific v2.0.0
v1.0.3
old template -> domain-specific v2.0.0
v1.0.2
Quality upgrade
v1.0.1
De-template, unique content, script cleanup
v1.0.0
Initial release
Metadata
Slug: bigdata
Version: 2.0.1
License: MIT-0
Total installs: 1
Current installs: 1
Versions released: 7
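Frequently asked questions

What is Bigdata?

Split large files, run parallel processing, and stream batch analysis. Use when sampling datasets, aggregating logs, or transforming bulk data. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 304 total downloads to date.

How do I install Bigdata?

Run the command "/install bigdata" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.

Is Bigdata free?

Yes. Bigdata is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does Bigdata support?

Bigdata is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Bigdata?

It is developed and maintained by bytesagain3 (@bytesagain3); the current version is v2.0.1.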
