
Dataset Splitter

Name: Dataset Splitter
Author: mingo-318

By Mingo_318 · GitHub · v1.0.0

cross-platform ✓ Security check passed

Total downloads: 276

Install in OpenClaw

/install dataset-splitter

Description

Split image datasets into train, validation, and test sets with options for random or stratified splits, custom ratios, and annotation support.

Usage instructions (SKILL.md)

Dataset Splitter

Split image datasets into train/val/test sets. Supports random split, stratified split, and custom ratios. Use when user needs to split dataset for machine learning training.

Features

Random Split: Randomly shuffle and split
Stratified Split: Maintain class distribution
Custom Ratios: Configurable train/val/test ratios
Annotation Support: Split images and corresponding annotations together
YOLO Format: Generate YOLO format dataset structure
Reproducible: Set random seed for reproducibility
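The random split, custom ratios, and seed features listed above can be illustrated with a minimal sketch. The function name `split_items` and its rounding rule (the test split absorbs the remainder) are assumptions made for illustration; the actual logic in scripts/splitter.py may differ.

```python
import random

def split_items(items, ratios=(80, 10, 10), seed=None):
    """Shuffle items and cut them into train/val/test lists by ratio.

    Hypothetical helper, not taken from scripts/splitter.py.
    """
    if len(ratios) != 3 or any(r < 0 for r in ratios) or sum(ratios) <= 0:
        raise ValueError("ratios must be three non-negative numbers with a positive sum")
    # Sort first so the result depends only on the seed, not on input order.
    shuffled = sorted(items)
    random.Random(seed).shuffle(shuffled)
    total = sum(ratios)
    n = len(shuffled)
    n_train = int(n * ratios[0] // total)
    n_val = int(n * ratios[1] // total)
    # The test split takes the remainder so that rounding never drops an item.
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }

splits = split_items([f"img_{i:04d}.jpg" for i in range(1000)], seed=42)
print({name: len(files) for name, files in splits.items()})
# prints {'train': 800, 'val': 100, 'test': 100}
```

Using a dedicated `random.Random(seed)` instance rather than the global generator keeps the split reproducible even if other code also draws random numbers.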

Usage

# Simple split (80/10/10)
python scripts/splitter.py split /path/to/images/ --ratios 80 10 10

# With annotations
python scripts/splitter.py split /path/to/images/ --annotations /path/to/labels/

# YOLO format output
python scripts/splitter.py split /path/to/images/ --output /path/to/dataset/ --yolo

# Stratified by class
python scripts/splitter.py split /path/to/images/ --annotations labels/ --stratify
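Splitting images together with their annotations requires pairing each image with its label file. The sketch below assumes that a label shares the image's file stem and has a .txt extension (as in YOLO-style datasets); the helper name `pair_images_with_labels` and the extension set are hypothetical, and the script's own matching rule is not documented here.

```python
from pathlib import Path

# Assumed set of image extensions; the real script may accept others.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}

def pair_images_with_labels(image_dir, label_dir):
    """Return (image, label-or-None) pairs, matching files by stem.

    Hypothetical helper, not taken from scripts/splitter.py.
    """
    label_dir = Path(label_dir)
    pairs = []
    for image in sorted(Path(image_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_EXTS:
            continue  # skip non-image files such as notes or caches
        label = label_dir / f"{image.stem}.txt"
        # Images without a label are kept and paired with None.
        pairs.append((image, label if label.is_file() else None))
    return pairs
```

Keeping each pair together as one unit before shuffling means an image and its annotation always land in the same split.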

Examples

$ python scripts/splitter.py split ./images --ratios 80 10 10

Splitting dataset...
Total images: 1000
Train: 800 (80%)
Val: 100 (10%)
Test: 100 (10%)

✓ Created train/ (800 images)
✓ Created val/ (100 images)
✓ Created test/ (100 images)

Installation

pip install pillow

Options

--ratios: Split ratios (train val test), default: 80 10 10
--seed: Random seed for reproducibility
--annotations: Path to annotations (will be split together)
--output: Output directory
--yolo: Output in YOLO dataset format
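--stratify: Maintain class distribution (the class is read from the first token of each YOLO-style .txt annotation)
--copy: Copy files instead of moving (files are moved by default)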
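As a rough illustration of how the options above fit together, here is one way such a command line could be declared with `argparse`. This is a sketch only: the flag names and the 80 10 10 default come from the list above, while the types, the `images` positional name, and the parser layout are assumptions rather than the contents of scripts/splitter.py.

```python
import argparse

def build_parser():
    """Build a parser mirroring the documented flags (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="splitter.py")
    sub = parser.add_subparsers(dest="command", required=True)
    split = sub.add_parser("split", help="split a dataset into train/val/test")
    split.add_argument("images", help="directory containing the images")
    split.add_argument("--ratios", type=int, nargs=3, default=[80, 10, 10],
                       metavar=("TRAIN", "VAL", "TEST"))
    split.add_argument("--seed", type=int, default=None)
    split.add_argument("--annotations", help="directory containing the labels")
    split.add_argument("--output", help="output directory")
    split.add_argument("--yolo", action="store_true")
    split.add_argument("--stratify", action="store_true")
    split.add_argument("--copy", action="store_true")
    return parser

args = build_parser().parse_args(
    ["split", "./images", "--ratios", "70", "20", "10", "--copy"]
)
print(args.ratios, args.copy)
# prints [70, 20, 10] True
```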

Safe usage recommendations

This skill appears to do exactly what it says: split image datasets into train/val/test sets. Before running it, back up your dataset or use --copy to avoid losing original files (the script moves files by default). Ensure your annotations are YOLO-style .txt files (the script reads the first token as the class id for stratified splits). Install Pillow only if you need the 'stats' command. No network calls or credential access were found, but review and test on a small dataset first if you're unsure.
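The stratified behaviour described above (class id taken from the first token of a YOLO-style .txt file) can be sketched as follows. The function names are hypothetical, and two details are assumptions: the class is read from the first non-empty line only, and each class is split separately with the test split absorbing the remainder.

```python
import random
from collections import defaultdict
from pathlib import Path

def read_class_id(label_path):
    """Return the first token of the first non-empty line, or None if there is none."""
    for line in Path(label_path).read_text().splitlines():
        tokens = line.split()
        if tokens:
            return tokens[0]
    return None

def stratified_split(class_by_item, ratios=(80, 10, 10), seed=None):
    """Split each class separately so every split keeps roughly the same class mix.

    Hypothetical helper, not taken from scripts/splitter.py.
    """
    groups = defaultdict(list)
    for item in sorted(class_by_item):
        groups[class_by_item[item]].append(item)
    rng = random.Random(seed)
    total = sum(ratios)
    splits = {"train": [], "val": [], "test": []}
    for class_id in sorted(groups, key=str):
        members = groups[class_id]
        rng.shuffle(members)
        n_train = int(len(members) * ratios[0] // total)
        n_val = int(len(members) * ratios[1] // total)
        splits["train"] += members[:n_train]
        splits["val"] += members[n_train:n_train + n_val]
        splits["test"] += members[n_train + n_val:]  # remainder goes to test
    return splits
```

Note that an image containing objects of several classes is assigned to a single class under this rule, so the resulting balance is only approximate for multi-object datasets.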

Functional analysis

Type: OpenClaw Skill
Name: dataset-splitter
Version: 1.0.0

The dataset-splitter skill is a standard utility for organizing machine learning image datasets into train/val/test splits. The core logic in scripts/splitter.py uses safe file operations (shutil.move, shutil.copy2) and standard libraries to process images and annotations, with no evidence of malicious intent, data exfiltration, or prompt injection.

Capability assessment

✓ Purpose & Capability

Name/description match the included script: splitter.py implements random and stratified splits, annotation handling, YOLO output structure, and stats. Required dependencies (Pillow referenced in SKILL.md) align with the script's 'stats' feature.
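The exact directory tree produced by the YOLO output mode is not documented here. As an assumption, the sketch below uses the layout common in YOLO tooling, with parallel images/<split> and labels/<split> directories; the helper name `make_yolo_layout` is hypothetical.

```python
from pathlib import Path

def make_yolo_layout(output_dir, splits=("train", "val", "test")):
    """Create images/<split> and labels/<split> directories under output_dir.

    Assumed layout for illustration; scripts/splitter.py may produce a different tree.
    """
    root = Path(output_dir)
    created = []
    for kind in ("images", "labels"):
        for split in splits:
            path = root / kind / split
            path.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
            created.append(path)
    return created
```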

ℹ Instruction Scope

SKILL.md usage matches the script, but the script's default behavior is to move files (shutil.move) unless --copy is provided — this is potentially destructive and should be highlighted to users. Stratified splitting infers the class from the first token of a .txt annotation file (YOLO-style); that assumption isn't fully documented in SKILL.md and may not match all annotation formats.
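The difference between the default move behaviour and --copy comes down to which `shutil` call places each file. A minimal sketch, assuming a hypothetical helper `place_file` (the script's own structure may differ):

```python
import shutil
from pathlib import Path

def place_file(src, dest_dir, copy=False):
    """Put src into dest_dir, copying when copy=True and moving otherwise.

    Hypothetical helper, not taken from scripts/splitter.py.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / Path(src).name
    if copy:
        # The original stays in place; copy2 also preserves file metadata.
        shutil.copy2(src, target)
    else:
        # The original no longer exists in the source directory afterwards.
        shutil.move(str(src), str(target))
    return target
```

With the move path, a failed or mistaken run leaves the source directory partially emptied, which is why backing up the dataset or passing --copy is the safer choice.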

✓ Install Mechanism

There is no install spec (instruction-only skill). SKILL.md recommends 'pip install pillow', which is a standard package; nothing is downloaded from arbitrary URLs or written to disk by an installer.

✓ Credentials

The skill requests no environment variables, no credentials, and touches only paths the user supplies. There are no attempts to read unrelated config or secrets.

✓ Persistence & Privilege

The 'always' setting is false, and the skill does not request elevated or persistent platform privileges. It does not modify other skills or global agent settings.

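How to use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install dataset-splitter
Once installation completes, invoke the Skill by name or trigger it with /dataset-splitter
Provide the required inputs according to the Skill's parameter description to get structured output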

Version history

v1.0.0

Initial release of Dataset Splitter:
- Split image datasets into train/val/test sets with configurable ratios.
- Supports random and stratified splitting to maintain class balance.
- Handles corresponding annotation files for each image.
- Optionally outputs in YOLO dataset format.
- Ensures reproducible splits by setting a random seed.
- Can copy or move files during splitting.

Metadata

Slug dataset-splitter

Version 1.0.0

License —

Total installs 0

Current installs 0

Version count 1

FAQ