ckchzh

Datasets

by BytesAgain2 · GitHub ↗ · v2.0.0 · MIT-0
cross-platform ✓ Security Clean
203 Downloads · 0 Stars · 1 Active Install · 1 Version
Install in OpenClaw
/install datasets
Description
Browse and load ready-to-use AI/ML datasets with fast manipulation. Use when searching datasets, loading training data, transforming formats.
README (SKILL.md)

Datasets

A data processing toolkit for ingesting, transforming, querying, and managing dataset entries from the command line. All operations are logged with timestamps and stored locally.

Commands

Data Operations

Each data command works in two modes: run without arguments to view recent entries, or pass input to record a new entry.

Command                       Description
datasets ingest <input>       Ingest data: record a new ingest entry or view recent ones
datasets transform <input>    Transform data: record a transformation or view recent ones
datasets query <input>        Query data: record a query or view recent ones
datasets filter <input>       Filter data: record a filter operation or view recent ones
datasets aggregate <input>    Aggregate data: record an aggregation or view recent ones
datasets visualize <input>    Visualize data: record a visualization or view recent ones
datasets export <input>       Export data: record an export entry or view recent ones
datasets sample <input>       Sample data: record a sample or view recent ones
datasets schema <input>       Schema management: record a schema entry or view recent ones
datasets validate <input>     Validate data: record a validation or view recent ones
datasets pipeline <input>     Pipeline management: record a pipeline step or view recent ones
datasets profile <input>      Profile data: record a profile or view recent ones
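
The two-mode behavior described above can be sketched roughly as follows. This is an illustration only, not the skill's actual script: the function name record_or_view, the timestamp format, the history.log line layout, and the choice of showing the last 20 entries are all assumptions, and the sketch writes to a temporary directory rather than ~/.local/share/datasets.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory for this sketch; the real tool stores data
# under ~/.local/share/datasets.
DATA_DIR="$(mktemp -d)"

# record_or_view CATEGORY [INPUT...]   (hypothetical helper)
# No input: print recent entries for CATEGORY.
# With input: append a timestamp|value line to CATEGORY.log.
record_or_view() {
  local category="$1"
  shift
  local log="$DATA_DIR/$category.log"

  if [ "$#" -eq 0 ]; then
    if [ -f "$log" ]; then
      tail -n 20 "$log"   # entry count is an assumption
    else
      echo "No $category entries yet."
    fi
    return 0
  fi

  local ts
  ts="$(date '+%Y-%m-%d %H:%M:%S')"   # timestamp format is an assumption
  printf '%s|%s\n' "$ts" "$*" >> "$log"
  # history.log layout is an assumption
  printf '%s|%s|%s\n' "$ts" "$category" "$*" >> "$DATA_DIR/history.log"
  echo "Recorded $category entry."
}

record_or_view ingest "loaded training_data.csv 10000 rows"   # record mode
record_or_view ingest                                         # view mode
```

Dispatching on the argument count keeps one code path per category, which matches the "same command, two modes" description.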

Utility Commands

Command                  Description
datasets stats           Show summary statistics: entry counts per category, total entries, disk usage
datasets export <fmt>    Export all data to a file (formats: json, csv, txt)
datasets search <term>   Search all log files for a term (case-insensitive)
datasets recent          Show last 20 entries from activity history
datasets status          Health check: version, data directory, entry count, disk usage, last activity
datasets help            Show available commands
datasets version         Show version (v2.0.0)
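
For readers who want to inspect the logs without the CLI, the search and per-category counts can be approximated with standard utilities. The sketch below is not taken from the skill's script; the helper names search_logs and count_entries are hypothetical, and it assumes one entry per line in each .log file.

```shell
#!/usr/bin/env bash
set -euo pipefail

# search_logs DIR TERM   (hypothetical helper)
# Case-insensitive, literal search across every .log file in DIR.
search_logs() {
  local dir="$1" term="$2"
  # -i: ignore case, -F: treat TERM as a fixed string, -H: show file names.
  # "|| true" keeps a no-match result from aborting under set -e.
  grep -iFH -- "$term" "$dir"/*.log 2>/dev/null || true
}

# count_entries DIR   (hypothetical helper)
# Print "category count" for each log file, assuming one entry per line.
count_entries() {
  local dir="$1" f
  for f in "$dir"/*.log; do
    [ -e "$f" ] || continue   # skip the unexpanded glob when DIR has no logs
    printf '%s %s\n' "$(basename "$f" .log)" "$(wc -l < "$f" | tr -d ' ')"
  done
}
```

A call such as search_logs ~/.local/share/datasets training would then list matching lines prefixed with the log file they came from.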

Data Storage

All data is stored locally at ~/.local/share/datasets/:

  • Each data command writes to its own log file (e.g., ingest.log, transform.log)
  • Entries are stored as timestamp|value pairs (pipe-delimited)
  • All actions are tracked in history.log with timestamps
  • Export generates files in the data directory (export.json, export.csv, or export.txt)
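
Because entries are pipe-delimited, a log file can be read back with plain Bash. The following sketch assumes each line has the shape timestamp|value and splits at the first pipe only, so a value that itself contains a pipe stays intact. The helper name print_entries and the sample timestamp format are assumptions, not part of the tool.

```shell
#!/usr/bin/env bash
set -euo pipefail

# print_entries FILE   (hypothetical helper)
# Print each timestamp|value line as "[timestamp] value".
print_entries() {
  local file="$1" line ts value
  # "|| [ -n "$line" ]" also handles a final line without a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    ts="${line%%|*}"     # everything before the first pipe
    value="${line#*|}"   # everything after the first pipe
    printf '[%s] %s\n' "$ts" "$value"
  done < "$file"
}
```

Lines that contain no pipe at all would yield the same text for both fields, so a stricter reader might skip or flag them.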

Requirements

  • Bash (the script runs in strict mode: set -euo pipefail)
  • Standard Unix utilities: date, wc, du, grep, tail, cat, sed
  • No external dependencies or API keys required

When to Use

  • To log and track data processing operations (ingest, transform, query, etc.)
  • To maintain a searchable history of data pipeline activities
  • To export accumulated records in JSON, CSV, or plain text format
  • As part of larger automation or data-pipeline workflows
  • When you need a lightweight, local-only dataset operation tracker
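
As one way to use the tracker inside a larger workflow, a small wrapper can run a pipeline step and record its outcome with datasets pipeline. This is a sketch under assumptions: the wrapper name run_step and the message wording are hypothetical, and it expects the datasets command to be on PATH when the function is called.

```shell
#!/usr/bin/env bash
set -euo pipefail

# run_step LABEL COMMAND [ARGS...]   (hypothetical wrapper)
# Run COMMAND, then record success or failure as a pipeline entry.
run_step() {
  local label="$1"
  shift
  if "$@"; then
    datasets pipeline "$label: ok"
  else
    local rc=$?   # exit status of the failed command
    datasets pipeline "$label: failed (exit $rc)"
    return "$rc"
  fi
}
```

A call might look like run_step "dedupe" sort -u -o rows.txt rows.txt, where the file name is only an example. Returning the original exit status lets the surrounding script decide whether to stop.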

Examples

# Record a new ingest entry
datasets ingest "loaded training_data.csv 10000 rows"

# View recent transform entries
datasets transform

# Record a query
datasets query "filter by date > 2026-01-01"

# Search across all logs
datasets search "training"

# Export everything as JSON
datasets export json

# Check overall statistics
datasets stats

# View recent activity
datasets recent

# Health check
datasets status

Powered by BytesAgain | bytesagain.com
Feedback & Feature Requests: https://bytesagain.com/feedback

Usage Guidance
This tool is local-only and appears safe to use, but review the script before installing. It creates and appends to logs under ~/.local/share/datasets (history.log and per-command .log files), so do not record secrets in entries. The JSON export code does not escape values and may produce invalid JSON if entries contain quotes or newlines; treat exports as potentially containing raw user data. Also confirm you have the full, untruncated script (the provided snippet looked cut off mid-function in the listing). If you plan to allow autonomous invocation, be aware that it can write files to your home directory whenever invoked.
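
If valid JSON matters, the escaping gap noted here can be worked around by converting a log file yourself. The sketch below is not the skill's export code: json_escape and log_to_json are hypothetical helpers, the "timestamp" and "value" keys are assumed names, and only backslash, double quote, newline, carriage return, and tab are escaped (other control characters are left as-is).

```shell
#!/usr/bin/env bash
set -euo pipefail

# json_escape STRING   (hypothetical helper)
# Escape the characters that most often break hand-built JSON strings.
json_escape() {
  local s="$1"
  s=${s//\\/\\\\}       # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}       # double quote
  s=${s//$'\n'/\\n}     # newline
  s=${s//$'\r'/\\r}     # carriage return
  s=${s//$'\t'/\\t}     # tab
  printf '%s' "$s"
}

# log_to_json FILE   (hypothetical helper)
# Emit a JSON array of {"timestamp","value"} objects from timestamp|value lines.
log_to_json() {
  local file="$1" line first=1
  printf '['
  while IFS= read -r line || [ -n "$line" ]; do
    [ "$first" -eq 1 ] || printf ','
    first=0
    printf '{"timestamp":"%s","value":"%s"}' \
      "$(json_escape "${line%%|*}")" "$(json_escape "${line#*|}")"
  done < "$file"
  printf ']\n'
}
```

Running log_to_json ~/.local/share/datasets/ingest.log would print one array for that log. A dedicated JSON tool is a safer choice when entries may contain arbitrary bytes.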
Capability Analysis
Type: OpenClaw Skill
Name: datasets
Version: 2.0.0
The Datasets skill is a local logging and tracking utility designed to record metadata about data processing operations. The implementation in scripts/script.sh uses standard Unix utilities to manage log files within a dedicated local directory (~/.local/share/datasets/) and lacks any indicators of malicious behavior such as network exfiltration, credential theft, or unauthorized command execution.
Capability Assessment
Purpose & Capability
Name/description (dataset ingestion, transforms, query, export) align with what is provided: a bash CLI that logs operations into per-command .log files under ~/.local/share/datasets. There are no extra binaries, cloud credentials, or unrelated capabilities requested.
Instruction Scope
SKILL.md instructs the agent to run local CLI commands and explains local storage and exports. The included script only reads/writes files inside the declared data directory and uses common Unix utilities; it does not read unrelated system files, access network endpoints, or attempt to exfiltrate environment variables.
Install Mechanism
There is no install spec (instruction-only skill) and the provided code is a single bash script. Nothing is downloaded at install time and no archives or remote installers are used.
Credentials
The skill declares no required env vars, credentials, or config paths. The script uses HOME to determine a local data directory (expected) and does not request or use secrets or external service keys.
Persistence & Privilege
The skill is not always-enabled and is user-invocable. It does not modify other skills or global agent settings. Its persistence is limited to creating and updating files under the user's ~/.local/share/datasets directory.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install datasets
  3. After installation, invoke the skill by name or use /datasets
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v2.0.0
publish v2.0.0
Metadata
Slug: datasets
Version: 2.0.0
License: MIT-0
All-time Installs: 1
Active Installs: 1
Total Versions: 1
Frequently Asked Questions

What is Datasets?

Browse and load ready-to-use AI/ML datasets with fast manipulation. Use when searching datasets, loading training data, transforming formats. It is an AI Agent Skill for Claude Code / OpenClaw, with 203 downloads so far.

How do I install Datasets?

Run "/install datasets" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Datasets free?

Yes, Datasets is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Datasets support?

Datasets is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Datasets?

It is built and maintained by BytesAgain2 (@ckchzh); the current version is v2.0.0.
