mingo-318

Dataset Splitter

by Mingo_318 · GitHub ↗ · v1.0.0
cross-platform · ✓ Security Clean
Downloads: 276 · Stars: 0 · Active Installs: 0 · Versions: 1
Install in OpenClaw
/install dataset-splitter
Description
Split image datasets into train, validation, and test sets with options for random or stratified splits, custom ratios, and annotation support.
README (SKILL.md)

Dataset Splitter

Split image datasets into train/val/test sets. Supports random splits, stratified splits, and custom ratios. Use this skill when the user needs to split a dataset for machine learning training.

Features

  • Random Split: Randomly shuffle and split
  • Stratified Split: Maintain class distribution
  • Custom Ratios: Configurable train/val/test ratios
  • Annotation Support: Split images and corresponding annotations together
  • YOLO Format: Generate YOLO format dataset structure
  • Reproducible: Set random seed for reproducibility

Usage

# Simple split (80/10/10)
python scripts/splitter.py split /path/to/images/ --ratios 80 10 10

# With annotations
python scripts/splitter.py split /path/to/images/ --annotations /path/to/labels/

# YOLO format output
python scripts/splitter.py split /path/to/images/ --output /path/to/dataset/ --yolo

# Stratified by class
python scripts/splitter.py split /path/to/images/ --annotations labels/ --stratify

Examples

$ python scripts/splitter.py split ./images --ratios 80 10 10

Splitting dataset...
Total images: 1000
Train: 800 (80%)
Val: 100 (10%)
Test: 100 (10%)

✓ Created train/ (800 images)
✓ Created val/ (100 images)
✓ Created test/ (100 images)
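
The counts in the example output follow directly from the ratios. As a rough sketch of how a seeded random split can produce them (this is not the code in scripts/splitter.py; the function name `random_split` and the choice to give any rounding remainder to the test set are assumptions made for illustration):

```python
import random


def random_split(items, ratios=(80, 10, 10), seed=None):
    """Shuffle items and cut them into train/val/test lists by ratio."""
    if len(ratios) != 3 or any(r < 0 for r in ratios) or sum(ratios) <= 0:
        raise ValueError("ratios must be three non-negative numbers with a positive sum")
    # Sort first so the result depends only on the seed, not on listing order.
    shuffled = sorted(items)
    random.Random(seed).shuffle(shuffled)
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    # Assumption: any rounding remainder ends up in the test split.
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }


if __name__ == "__main__":
    names = [f"img_{i:04d}.jpg" for i in range(1000)]
    parts = random_split(names, (80, 10, 10), seed=42)
    print({name: len(members) for name, members in parts.items()})
```

With 1000 file names and ratios 80 10 10 this yields 800, 100, and 100 items. Passing the same seed again returns the same assignment, which is the property the --seed option is described as providing.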

Installation

pip install pillow

Options

  • --ratios: Split ratios (train val test), default: 80 10 10
  • --seed: Random seed for reproducibility
  • --annotations: Path to annotations (will be split together)
  • --output: Output directory
  • --yolo: Output in YOLO dataset format
  • --stratify: Maintain class distribution
  • --copy: Copy files instead of moving
Usage Guidance
This skill appears to do exactly what it says: split image datasets into train/val/test sets. Before running it, back up your dataset or use --copy to avoid losing original files (the script moves files by default). Ensure your annotations are YOLO-style .txt files (the script reads the first token as the class id for stratified splits). Install Pillow only if you need the 'stats' command. No network calls or credential access were found, but review and test on a small dataset first if you're unsure.
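
To make the move-versus-copy distinction concrete, here is a minimal sketch of placing one image and its label into a split folder. The helper name `place_pair` and the rule that a label shares the image's file stem with a .txt extension are assumptions, and unlike the script this sketch defaults to copying:

```python
import shutil
from pathlib import Path


def place_pair(image, split_dir, annotations_dir=None, copy=True):
    """Put one image, and its same-named .txt label if present, into split_dir."""
    image = Path(image)
    split_dir = Path(split_dir)
    split_dir.mkdir(parents=True, exist_ok=True)
    # copy2 keeps the source file in place; move removes it from the source.
    transfer = shutil.copy2 if copy else shutil.move
    placed = [Path(transfer(str(image), str(split_dir / image.name)))]
    if annotations_dir is not None:
        # Assumption: the label file is named <image stem>.txt.
        label = Path(annotations_dir) / (image.stem + ".txt")
        if label.exists():
            placed.append(Path(transfer(str(label), str(split_dir / label.name))))
    return placed
```

Calling `place_pair("images/a.jpg", "out/train", "labels", copy=False)` would leave neither `images/a.jpg` nor `labels/a.txt` behind, which is why a backup or --copy is advised above.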
Capability Analysis
Type: OpenClaw Skill
Name: dataset-splitter
Version: 1.0.0

The dataset-splitter skill is a standard utility for organizing machine learning image datasets into train/val/test splits. The core logic in scripts/splitter.py uses safe file operations (shutil.move, shutil.copy2) and standard libraries to process images and annotations, with no evidence of malicious intent, data exfiltration, or prompt injection.
Capability Assessment
Purpose & Capability
Name/description match the included script: splitter.py implements random and stratified splits, annotation handling, YOLO output structure, and stats. Required dependencies (Pillow referenced in SKILL.md) align with the script's 'stats' feature.
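
This page does not show the exact directory tree that --yolo writes. For orientation only, the sketch below builds a layout that many YOLO training setups use, with parallel images/ and labels/ folders per split. The folder names and the helper `make_yolo_dirs` are assumptions, not taken from splitter.py:

```python
from pathlib import Path

SPLITS = ("train", "val", "test")


def make_yolo_dirs(root):
    """Create images/<split> and labels/<split> under root and return their paths."""
    root = Path(root)
    layout = {}
    for split in SPLITS:
        # Assumed convention: images and labels live in mirrored subfolders.
        images = root / "images" / split
        labels = root / "labels" / split
        images.mkdir(parents=True, exist_ok=True)
        labels.mkdir(parents=True, exist_ok=True)
        layout[split] = (images, labels)
    return layout
```

Check the script's actual output on a small dataset before pointing a training configuration at it.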
Instruction Scope
SKILL.md usage matches the script, but the script's default behavior is to move files (shutil.move) unless --copy is provided — this is potentially destructive and should be highlighted to users. Stratified splitting infers the class from the first token of a .txt annotation file (YOLO-style); that assumption isn't fully documented in SKILL.md and may not match all annotation formats.
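
The first-token assumption can be illustrated as follows. This is a sketch of one way to stratify, not the script's implementation; `read_class_id` and `stratified_split` are hypothetical names, and an image with several objects is assigned whatever class appears on the first non-empty line of its label file:

```python
import random
from collections import defaultdict
from pathlib import Path


def read_class_id(label_path):
    """Return the first token of the first non-empty line, or None if there is none."""
    for line in Path(label_path).read_text().splitlines():
        tokens = line.split()
        if tokens:
            return tokens[0]
    return None


def stratified_split(class_by_item, ratios=(80, 10, 10), seed=None):
    """Split each class separately so every split keeps roughly the same class mix."""
    groups = defaultdict(list)
    for item, class_id in sorted(class_by_item.items()):
        groups[class_id].append(item)
    rng = random.Random(seed)
    total = sum(ratios)
    parts = {"train": [], "val": [], "test": []}
    for class_id in sorted(groups, key=str):
        members = groups[class_id]
        rng.shuffle(members)
        n_train = len(members) * ratios[0] // total
        n_val = len(members) * ratios[1] // total
        # Assumption: per-class rounding remainders go to the test split.
        parts["train"] += members[:n_train]
        parts["val"] += members[n_train:n_train + n_val]
        parts["test"] += members[n_train + n_val:]
    return parts
```

A label file in another format (for example XML or JSON) would yield a meaningless "class" here, which is the mismatch the note above warns about.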
Install Mechanism
There is no install spec (this is an instruction-only skill). SKILL.md recommends 'pip install pillow', which is a standard package; nothing is downloaded from arbitrary URLs or written to disk by an installer.
Credentials
The skill requests no environment variables, no credentials, and touches only paths the user supplies. There are no attempts to read unrelated config or secrets.
Persistence & Privilege
always is false and the skill does not request elevated or persistent platform privileges. It does not modify other skills or global agent settings.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install dataset-splitter
  3. After installation, invoke the skill by name or use /dataset-splitter
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release of Dataset Splitter:
  • Split image datasets into train/val/test sets with configurable ratios.
  • Supports random and stratified splitting to maintain class balance.
  • Handles corresponding annotation files for each image.
  • Optionally outputs in YOLO dataset format.
  • Ensures reproducible splits by setting a random seed.
  • Can copy or move files during splitting.
Metadata
Slug dataset-splitter
Version 1.0.0
License
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Dataset Splitter?

Dataset Splitter splits image datasets into train, validation, and test sets, with options for random or stratified splits, custom ratios, and annotation support. It is an AI Agent Skill for Claude Code / OpenClaw, with 276 downloads so far.

How do I install Dataset Splitter?

Run "/install dataset-splitter" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Dataset Splitter free?

Yes, Dataset Splitter is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Dataset Splitter support?

Dataset Splitter is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Dataset Splitter?

It is built and maintained by Mingo_318 (@mingo-318); the current version is v1.0.0.
