Dataset Splitter

Author: Mingo_318 · GitHub · v1.0.0
cross-platform ✓ Security check passed
Total downloads: 276 · Favorites: 0 · Current installs: 0 · Versions: 1
Install in OpenClaw
/install dataset-splitter
Description
Split image datasets into train, validation, and test sets with options for random or stratified splits, custom ratios, and annotation support.
Usage instructions (SKILL.md)

Dataset Splitter

Split image datasets into train/val/test sets. Supports random splits, stratified splits, and custom ratios. Use it when the user needs to split a dataset for machine learning training.

Features

  • Random Split: Randomly shuffle and split
  • Stratified Split: Maintain class distribution
  • Custom Ratios: Configurable train/val/test ratios
  • Annotation Support: Split images and corresponding annotations together
  • YOLO Format: Generate YOLO format dataset structure
  • Reproducible: Set random seed for reproducibility
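As a rough illustration of what a reproducible random split involves, the sketch below shuffles a list of file names with a seeded generator and cuts it by ratio. The function name `random_split` and its parameters are hypothetical; they are not taken from scripts/splitter.py, whose internals are not shown here.

```python
import random


def random_split(items, ratios=(80, 10, 10), seed=42):
    """Shuffle items with a fixed seed and cut them into train/val/test lists.

    Hypothetical helper for illustration; not the skill's actual code.
    """
    if len(ratios) != 3 or sum(ratios) <= 0:
        raise ValueError("ratios must be three numbers with a positive sum")
    shuffled = sorted(items)                 # sort first so input order does not matter
    random.Random(seed).shuffle(shuffled)    # local RNG, global random state is untouched
    total = sum(ratios)
    n_train = int(len(shuffled) * ratios[0] // total)
    n_val = int(len(shuffled) * ratios[1] // total)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]        # whatever is left goes to test
    return train, val, test
```

Sorting before shuffling is a design choice in this sketch: it makes the result depend only on the set of file names and the seed, not on the order in which the file system happens to list them.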

Usage

# Simple split (80/10/10)
python scripts/splitter.py split /path/to/images/ --ratios 80 10 10

# With annotations
python scripts/splitter.py split /path/to/images/ --annotations /path/to/labels/

# YOLO format output
python scripts/splitter.py split /path/to/images/ --output /path/to/dataset/ --yolo

# Stratified by class
python scripts/splitter.py split /path/to/images/ --annotations labels/ --stratify
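Splitting images and annotations together requires matching each image to its label file. The sketch below assumes the common YOLO convention that a label shares the image's file stem and has a .txt extension; the helper name `pair_with_labels` and the extension set are assumptions made for illustration, not the script's own logic.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def pair_with_labels(image_dir, label_dir):
    """Return (image_path, label_path_or_None) pairs matched by file stem."""
    image_dir, label_dir = Path(image_dir), Path(label_dir)
    pairs = []
    for image in sorted(image_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_EXTENSIONS:
            continue                          # skip non-image files
        label = label_dir / (image.stem + ".txt")
        pairs.append((image, label if label.is_file() else None))
    return pairs
```

Images without a matching label are kept with `None` so the caller can decide whether to skip them or treat them as background images.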

Examples

$ python scripts/splitter.py split ./images --ratios 80 10 10

Splitting dataset...
Total images: 1000
Train: 800 (80%)
Val: 100 (10%)
Test: 100 (10%)

✓ Created train/ (800 images)
✓ Created val/ (100 images)
✓ Created test/ (100 images)
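The counts in the example follow directly from the ratios: 1000 × 80/100 = 800, and 1000 × 10/100 = 100 for each of the other two sets. When the total does not divide evenly, some rounding rule is needed. The sketch below uses largest-remainder rounding so the counts always add up to the total; this is one possible choice and not necessarily what the script does, and `split_counts` is a hypothetical name.

```python
def split_counts(total, ratios=(80, 10, 10)):
    """Turn integer ratios into per-split counts that sum exactly to total."""
    weight = sum(ratios)
    if weight <= 0:
        raise ValueError("ratios must have a positive sum")
    counts = [total * r // weight for r in ratios]       # floor of each exact share
    remainders = [total * r % weight for r in ratios]    # what the floor discarded
    leftover = total - sum(counts)
    # Hand leftover items to the splits with the largest remainders (ties: earlier split).
    order = sorted(range(len(ratios)), key=lambda i: (-remainders[i], i))
    for i in order[:leftover]:
        counts[i] += 1
    return counts
```

With this rule, `split_counts(1000, (80, 10, 10))` gives 800, 100 and 100, matching the example output above.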

Installation

pip install pillow

Options

  • --ratios: Split ratios (train val test), default: 80 10 10
  • --seed: Random seed for reproducibility
  • --annotations: Path to the annotations directory (annotation files are split together with their images)
  • --output: Output directory
  • --yolo: Output in YOLO dataset format
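  • --stratify: Maintain class distribution (the class is read from the first token of each YOLO-style .txt annotation)
  • --copy: Copy files instead of moving them (files are moved by default)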
Safe usage recommendations
This skill appears to do exactly what it says: split image datasets into train/val/test sets. Before running it, back up your dataset or use --copy to avoid losing original files (the script moves files by default). Ensure your annotations are YOLO-style .txt files (the script reads the first token as the class id for stratified splits). Install Pillow only if you need the 'stats' command. No network calls or credential access were found, but review and test on a small dataset first if you're unsure.
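The difference between moving and copying can be sketched with the two standard-library calls the analysis names, shutil.move and shutil.copy2. The helper `place_files` is hypothetical. Unlike the script, which reportedly moves by default, this sketch defaults to copying so the originals stay in place.

```python
import shutil
from pathlib import Path


def place_files(files, destination, copy=True):
    """Copy (the default in this sketch) or move files into destination."""
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    operation = shutil.copy2 if copy else shutil.move   # copy2 also preserves metadata
    placed = []
    for source in files:
        target = destination / Path(source).name
        operation(str(source), str(target))
        placed.append(target)
    return placed
```

After a copy the source files still exist; after a move they do not, which is why a backup or --copy is advisable before the first run.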
Functional analysis
Type: OpenClaw Skill
Name: dataset-splitter
Version: 1.0.0
The dataset-splitter skill is a standard utility for organizing machine learning image datasets into train/val/test splits. The core logic in scripts/splitter.py uses safe file operations (shutil.move, shutil.copy2) and standard libraries to process images and annotations, with no evidence of malicious intent, data exfiltration, or prompt injection.
Capability assessment
Purpose & Capability
Name/description match the included script: splitter.py implements random and stratified splits, annotation handling, YOLO output structure, and stats. Required dependencies (Pillow referenced in SKILL.md) align with the script's 'stats' feature.
Instruction Scope
SKILL.md usage matches the script, but the script's default behavior is to move files (shutil.move) unless --copy is provided — this is potentially destructive and should be highlighted to users. Stratified splitting infers the class from the first token of a .txt annotation file (YOLO-style); that assumption isn't fully documented in SKILL.md and may not match all annotation formats.
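To make the first-token assumption concrete, here is a minimal sketch of a stratified split driven by YOLO-style labels. Both function names are hypothetical and the code is an illustration of the idea, not the contents of scripts/splitter.py. Note that an image with several objects is assigned only the class of its first annotation line.

```python
import random
from collections import defaultdict
from pathlib import Path


def read_class_id(label_path):
    """Return the first whitespace-separated token of a label file, or None if empty."""
    for line in Path(label_path).read_text().splitlines():
        tokens = line.split()
        if tokens:
            return tokens[0]
    return None


def stratified_split(class_by_item, ratios=(80, 10, 10), seed=42):
    """Split each class separately so every subset keeps a similar class mix."""
    groups = defaultdict(list)
    for item, class_id in class_by_item.items():
        groups[class_id].append(item)
    rng = random.Random(seed)
    total = sum(ratios)
    train, val, test = [], [], []
    for class_id in sorted(groups, key=str):      # key=str also orders a None class
        members = sorted(groups[class_id])
        rng.shuffle(members)
        n_train = len(members) * ratios[0] // total
        n_val = len(members) * ratios[1] // total
        train += members[:n_train]
        val += members[n_train:n_train + n_val]
        test += members[n_train + n_val:]         # remainder of this class
    return train, val, test
```

Annotation formats that do not start each line with a class id (for example COCO JSON or Pascal VOC XML) would need a different `read_class_id`.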
Install Mechanism
No install spec (instruction-only skill) and SKILL.md recommends 'pip install pillow' which is a standard package; nothing is downloaded from arbitrary URLs or written to disk by an installer.
Credentials
The skill requests no environment variables, no credentials, and touches only paths the user supplies. There are no attempts to read unrelated config or secrets.
Persistence & Privilege
always is false and the skill does not request elevated or persistent platform privileges. It does not modify other skills or global agent settings.
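How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install dataset-splitter
  3. Once installation finishes, invoke the Skill by name or trigger it with /dataset-splitter
  4. Provide the inputs described in the Skill's parameter documentation to get structured output
Version history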
v1.0.0
Initial release of Dataset Splitter:
  • Split image datasets into train/val/test sets with configurable ratios.
  • Supports random and stratified splitting to maintain class balance.
  • Handles corresponding annotation files for each image.
  • Optionally outputs in YOLO dataset format.
  • Ensures reproducible splits by setting a random seed.
  • Can copy or move files during splitting.
Metadata
Slug: dataset-splitter
Version: 1.0.0
License
Total installs: 0
Current installs: 0
Version count: 1
FAQ

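What is Dataset Splitter?

Dataset Splitter splits image datasets into train, validation, and test sets, with options for random or stratified splits, custom ratios, and annotation support. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 276 total downloads to date.

How do I install Dataset Splitter?

Run the command "/install dataset-splitter" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.

Is Dataset Splitter free?

Yes, Dataset Splitter is completely free (free and open source) and can be freely downloaded, installed, and used.

Which platforms does Dataset Splitter support?

Dataset Splitter is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Dataset Splitter?

It is developed and maintained by Mingo_318 (@mingo-318); the current version is v1.0.0.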
