
Dataset Splitter

Name: Dataset Splitter
Author: mingo-318

by Mingo_318 · GitHub ↗ · v1.0.0

cross-platform ✓ Security Clean

Downloads: 276

Install in OpenClaw

/install dataset-splitter

Description

Split image datasets into train, validation, and test sets with options for random or stratified splits, custom ratios, and annotation support.

README (SKILL.md)

Dataset Splitter

Split image datasets into train/val/test sets. Supports random splits, stratified splits, and custom ratios. Use it when a user needs to split a dataset for machine learning training.

Features

Random Split: Randomly shuffle and split
Stratified Split: Maintain class distribution
Custom Ratios: Configurable train/val/test ratios
Annotation Support: Split images and corresponding annotations together
YOLO Format: Generate YOLO format dataset structure
Reproducible: Set random seed for reproducibility

Usage

# Simple split (80/10/10)
python scripts/splitter.py split /path/to/images/ --ratios 80 10 10

# With annotations
python scripts/splitter.py split /path/to/images/ --annotations /path/to/labels/

# YOLO format output
python scripts/splitter.py split /path/to/images/ --output /path/to/dataset/ --yolo

# Stratified by class
python scripts/splitter.py split /path/to/images/ --annotations labels/ --stratify
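
The README does not spell out the directory tree that --yolo produces. As a rough sketch, the layout below is the one commonly used by YOLO training tools (images/ and labels/ folders, each with one subfolder per split); whether splitter.py creates exactly this tree is an assumption, and make_yolo_layout is a hypothetical helper, not a function from the script.

```python
import tempfile
from pathlib import Path


def make_yolo_layout(root: Path, splits=("train", "val", "test")) -> list[Path]:
    """Create images/<split> and labels/<split> directories under root.

    Assumed layout; the actual output of splitter.py --yolo may differ.
    """
    created = []
    for kind in ("images", "labels"):
        for split in splits:
            directory = root / kind / split
            directory.mkdir(parents=True, exist_ok=True)
            created.append(directory)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for directory in make_yolo_layout(Path(tmp)):
            # Prints the six relative paths, e.g. images/train, labels/test
            print(directory.relative_to(tmp))
```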

Examples

$ python scripts/splitter.py split ./images --ratios 80 10 10

Splitting dataset...
Total images: 1000
Train: 800 (80%)
Val: 100 (10%)
Test: 100 (10%)

✓ Created train/ (800 images)
✓ Created val/ (100 images)
✓ Created test/ (100 images)
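
The counts in the output above follow directly from the ratios. A minimal sketch of a seeded random split is shown here; split_counts and random_split are hypothetical names, and handing any rounding remainder to the test set is an assumption rather than the script's documented behavior.

```python
import random


def split_counts(total: int, ratios: tuple[int, int, int]) -> tuple[int, int, int]:
    """Turn ratios such as (80, 10, 10) into counts that sum to total."""
    ratio_sum = sum(ratios)
    train = total * ratios[0] // ratio_sum
    val = total * ratios[1] // ratio_sum
    test = total - train - val  # assumption: the remainder goes to test
    return train, val, test


def random_split(items, ratios=(80, 10, 10), seed=None):
    """Shuffle a copy of items with a seeded RNG, then slice it into three lists."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train, n_val, _ = split_counts(len(shuffled), ratios)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


names = [f"img_{i:04d}.jpg" for i in range(1000)]
train, val, test = random_split(names, seed=42)
print(len(train), len(val), len(test))  # 800 100 100
```

Using a dedicated random.Random(seed) instance keeps the split reproducible without touching the global random state.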

Installation

pip install pillow

Options

--ratios: Split ratios (train val test), default: 80 10 10
--seed: Random seed for reproducibility
--annotations: Path to annotations (will be split together)
--output: Output directory
--yolo: Output in YOLO dataset format
--stratify: Maintain class distribution
--copy: Copy files instead of moving them (files are moved by default)
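
For readers who want to see how these options fit together, here is a hypothetical argparse parser that mirrors the flags listed above. The flag names and the 80 10 10 default come from this README; the argument types, the metavar names, and the parser structure are assumptions and may not match scripts/splitter.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not the real one)."""
    parser = argparse.ArgumentParser(prog="splitter.py")
    sub = parser.add_subparsers(dest="command", required=True)

    split = sub.add_parser("split", help="split an image directory")
    split.add_argument("images", help="directory containing the images")
    split.add_argument("--ratios", nargs=3, type=int, default=[80, 10, 10],
                       metavar=("TRAIN", "VAL", "TEST"))
    split.add_argument("--seed", type=int, default=None)
    split.add_argument("--annotations", help="directory containing annotations")
    split.add_argument("--output", help="output directory")
    split.add_argument("--yolo", action="store_true")
    split.add_argument("--stratify", action="store_true")
    split.add_argument("--copy", action="store_true")
    return parser


args = build_parser().parse_args(
    ["split", "./images", "--ratios", "70", "20", "10", "--copy"]
)
print(args.ratios, args.copy)  # [70, 20, 10] True
```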

Usage Guidance

This skill appears to do exactly what it says: split image datasets into train/val/test sets. Before running it, back up your dataset or use --copy to avoid losing original files (the script moves files by default). Ensure your annotations are YOLO-style .txt files (the script reads the first token as the class id for stratified splits). Install Pillow only if you need the 'stats' command. No network calls or credential access were found, but review and test on a small dataset first if you're unsure.
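
To illustrate the stratification rule described above (first token of a YOLO-style .txt file as the class id), here is a minimal sketch. class_of and stratified_split are hypothetical names, the "unlabeled" fallback for empty files is an assumption, and the real script may group or round differently.

```python
import random
from collections import defaultdict


def class_of(label_text: str) -> str:
    """Return the first token of the first non-empty line as the class id."""
    for line in label_text.splitlines():
        tokens = line.split()
        if tokens:
            return tokens[0]
    return "unlabeled"  # assumption: empty label files form their own group


def stratified_split(labels: dict[str, str], ratios=(80, 10, 10), seed=None):
    """Split each class separately so every split keeps the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for name, text in sorted(labels.items()):
        by_class[class_of(text)].append(name)

    splits = {"train": [], "val": [], "test": []}
    total = sum(ratios)
    for members in by_class.values():
        rng.shuffle(members)
        n_train = len(members) * ratios[0] // total
        n_val = len(members) * ratios[1] // total
        splits["train"] += members[:n_train]
        splits["val"] += members[n_train:n_train + n_val]
        splits["test"] += members[n_train + n_val:]
    return splits


# 100 fake label files, alternating between class 0 and class 1
labels = {f"img_{i:03d}": f"{i % 2} 0.5 0.5 0.2 0.2\n" for i in range(100)}
splits = stratified_split(labels, seed=0)
print({k: len(v) for k, v in splits.items()})  # {'train': 80, 'val': 10, 'test': 10}
```

Note that an image with several objects is assigned to the class of its first annotation line only, which is why this rule may not suit multi-class images or non-YOLO annotation formats.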

Capability Analysis

Type: OpenClaw Skill
Name: dataset-splitter
Version: 1.0.0

The dataset-splitter skill is a standard utility for organizing machine learning image datasets into train/val/test splits. The core logic in scripts/splitter.py uses safe file operations (shutil.move, shutil.copy2) and standard libraries to process images and annotations, with no evidence of malicious intent, data exfiltration, or prompt injection.

Capability Assessment

✓ Purpose & Capability

Name/description match the included script: splitter.py implements random and stratified splits, annotation handling, YOLO output structure, and stats. Required dependencies (Pillow referenced in SKILL.md) align with the script's 'stats' feature.

ℹ Instruction Scope

SKILL.md usage matches the script, but the script's default behavior is to move files (shutil.move) unless --copy is provided — this is potentially destructive and should be highlighted to users. Stratified splitting infers the class from the first token of a .txt annotation file (YOLO-style); that assumption isn't fully documented in SKILL.md and may not match all annotation formats.
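
The difference between the two modes can be sketched with the same standard-library calls the review names (shutil.move and shutil.copy2). The transfer helper below is hypothetical, and it defaults to copying, which is the opposite of the script's reported default. Placing the label next to the image is a simplification made for this example.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def transfer(image: Path, dest_dir: Path,
             annotations_dir: Optional[Path] = None, copy: bool = True) -> None:
    """Copy (or move) an image and, if present, its same-named .txt label."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    op = shutil.copy2 if copy else shutil.move  # move removes the source file
    op(str(image), str(dest_dir / image.name))
    if annotations_dir is not None:
        label = annotations_dir / (image.stem + ".txt")
        if label.exists():
            op(str(label), str(dest_dir / label.name))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "images").mkdir()
    (root / "labels").mkdir()
    img = root / "images" / "a.jpg"
    img.write_bytes(b"fake image bytes")
    (root / "labels" / "a.txt").write_text("0 0.5 0.5 0.2 0.2\n")

    transfer(img, root / "train", root / "labels", copy=True)
    print(img.exists())  # True: the original survives a copy

    transfer(img, root / "val", root / "labels", copy=False)
    print(img.exists())  # False: a move removes the original
```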

✓ Install Mechanism

There is no install spec (this is an instruction-only skill). SKILL.md recommends 'pip install pillow', which is a standard package; nothing is downloaded from arbitrary URLs or written to disk by an installer.

✓ Credentials

The skill requests no environment variables, no credentials, and touches only paths the user supplies. There are no attempts to read unrelated config or secrets.

✓ Persistence & Privilege

The skill's always setting is false, and it does not request elevated or persistent platform privileges. It does not modify other skills or global agent settings.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install dataset-splitter
After installation, invoke the skill by name or use /dataset-splitter
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

Initial release of Dataset Splitter:
- Split image datasets into train/val/test sets with configurable ratios.
- Supports random and stratified splitting to maintain class balance.
- Handles corresponding annotation files for each image.
- Optionally outputs in YOLO dataset format.
- Ensures reproducible splits by setting a random seed.
- Can copy or move files during splitting.

Metadata

Slug dataset-splitter

Version 1.0.0

License —

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Dataset Splitter?

Dataset Splitter splits image datasets into train, validation, and test sets, with options for random or stratified splits, custom ratios, and annotation support. It is an AI Agent Skill for Claude Code / OpenClaw, with 276 downloads so far.

How do I install Dataset Splitter?

Run "/install dataset-splitter" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Dataset Splitter free?

Yes, Dataset Splitter is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Dataset Splitter support?

Dataset Splitter is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Dataset Splitter?

It is built and maintained by Mingo_318 (@mingo-318); the current version is v1.0.0.
