expeditionhub

File Splitter

by ExpeditionHub · GitHub ↗ · v1.1.0 · MIT-0
cross-platform ✓ Security Clean
116 Downloads · 1 Star · 0 Active Installs · 2 Versions
Install in OpenClaw
/install file-splitter
Description
Split large files into smaller chunks with semantic boundary detection. Supports JSON, Markdown, and TXT formats. Preserves data integrity by splitting at na...
README (SKILL.md)

File Splitter - Universal File Splitting Tool

Split large files into smaller, manageable chunks while preserving semantic structure.

Quick Start

python <skill_dir>/scripts/split_files.py --input <input_folder> --output <output_folder> [options]

Parameters

| Parameter | Required | Default | Description |
|---|---|---|---|
| --input | Yes | - | Source folder containing files to split |
| --output | Yes | - | Output folder for split chunks |
| --max-size | No | 512000 (500KB) | Maximum bytes per chunk |
| --min-size | No | 409600 (400KB) | Minimum bytes per chunk |
| --seq-digits | No | 9 | Number of digits in sequence numbers |
| --formats | No | json,md,txt | File formats to process (comma-separated) |
| --dry-run | No | false | Preview mode: show what would be split without executing |

Examples

# Default 500KB split
python split_files.py --input "./corpus" --output "./corpus/chunks"

# Custom 200KB chunks
python split_files.py --input "./notes" --output "./notes/chunks" --max-size 204800 --min-size 153600

# JSON files only
python split_files.py --input "./data" --output "./data/out" --formats json

# Preview mode
python split_files.py --input "./data" --output "./data/out" --dry-run

Splitting Rules

JSON Files

  • Splits at JSON array element boundaries
  • Each chunk is a valid JSON array [...]
  • Automatically extracts list values if top-level is an object
  • Never cuts individual records in half
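
The JSON rules above can be sketched roughly as follows. This is a minimal illustration, not the code in split_files.py: the function names `extract_records` and `chunk_records` are hypothetical, and the size accounting is only an estimate of the serialized chunk size. The "longest list value" rule for top-level objects follows the behavior described in the Usage Guidance section of this page.

```python
import json

def extract_records(data):
    """Return the list to split (hypothetical helper).

    A top-level list is used as-is; for a top-level object, the longest
    list-valued entry is chosen.
    """
    if isinstance(data, list):
        return data
    if isinstance(data, dict):
        lists = [v for v in data.values() if isinstance(v, list)]
        if lists:
            return max(lists, key=len)
    raise ValueError("no JSON array found to split")

def chunk_records(records, max_size):
    """Group whole records into chunks of roughly max_size bytes each.

    Records are never cut; a single record larger than max_size
    becomes its own chunk.
    """
    chunks, current, current_size = [], [], 2  # 2 bytes for the enclosing "[]"
    for record in records:
        encoded = json.dumps(record, ensure_ascii=False).encode("utf-8")
        size = len(encoded) + 2  # estimate: record plus a ", " separator
        if current and current_size + size > max_size:
            chunks.append(current)
            current, current_size = [], 2
        current.append(record)
        current_size += size
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk is a plain Python list, so writing it with `json.dump` yields a valid JSON array, and concatenating the chunks in order restores the original record sequence.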

Markdown Files

  • Splits at heading boundaries (# through ######)
  • Each chunk maintains complete heading structure
  • Never cuts content within a heading section
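
As a rough sketch of heading-based splitting, the text can first be cut into sections that each start at a heading line and then packed greedily up to the size limit. The names `split_sections` and `pack` are hypothetical, and this simplified version assumes headings are ATX-style lines; it would also treat a `# comment` line inside a fenced code block as a heading, which the actual script may or may not handle.

```python
import re

# One to six "#" characters followed by whitespace at the start of a line.
HEADING = re.compile(r"^#{1,6}\s")

def split_sections(text):
    """Split Markdown into sections, each beginning at a heading line.

    Any text before the first heading forms its own leading section.
    """
    sections, current = [], []
    for line in text.splitlines(keepends=True):
        if HEADING.match(line) and current:
            sections.append("".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("".join(current))
    return sections

def pack(units, max_size):
    """Greedily join whole units into chunks of at most max_size UTF-8 bytes.

    A unit that alone exceeds max_size is emitted as its own chunk.
    """
    chunks, current, size = [], [], 0
    for unit in units:
        n = len(unit.encode("utf-8"))
        if current and size + n > max_size:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(unit)
        size += n
    if current:
        chunks.append("".join(current))
    return chunks
```

Because sections are only ever joined whole, every chunk boundary falls immediately before a heading.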

TXT Files

  • Prefers splitting at paragraph boundaries (empty lines)
  • Falls back to line-by-line splitting if no paragraphs exist
  • Never cuts within a paragraph
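
The paragraph-first rule with a line-by-line fallback might look like the sketch below. `split_units` is a hypothetical name, and treating any run of blank or whitespace-only lines as a paragraph separator is an assumption about how "empty lines" are detected.

```python
import re

def split_units(text):
    """Return the units a TXT file may be split between.

    Paragraphs (separated by blank lines) are preferred; if the text
    contains no paragraph break, individual lines are used instead.
    Separators stay attached to the preceding unit, so joining the
    units reproduces the original text exactly.
    """
    # The capture group keeps the blank-line separators in the result:
    # [para0, sep0, para1, sep1, ..., paraN]
    parts = re.split(r"(\n\s*\n)", text)
    units = [
        parts[i] + (parts[i + 1] if i + 1 < len(parts) else "")
        for i in range(0, len(parts), 2)
    ]
    units = [u for u in units if u]
    if len(units) <= 1:
        # Fallback: no paragraph boundaries, split line by line.
        units = text.splitlines(keepends=True)
    return units
```

The resulting units can then be grouped into chunks up to the size limit, with an oversized paragraph kept whole as described in the Notes section.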

Output Naming Convention

Format: {source_filename_without_extension}{sequence_number}{extension}, where the sequence number is zero-padded to --seq-digits digits (9 by default)

Examples:

  • dataset000000001.json
  • dataset000000002.json
  • notes000000001.md
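
A small sketch of how such names could be built. The helper `chunk_name` is hypothetical; numbering from 1 is inferred from the examples above.

```python
from pathlib import Path

def chunk_name(source, index, seq_digits=9):
    """Build an output filename: stem + zero-padded sequence number + extension."""
    p = Path(source)
    return f"{p.stem}{index:0{seq_digits}d}{p.suffix}"

print(chunk_name("dataset.json", 1))
print(chunk_name("notes.md", 12, seq_digits=4))
```

With the default width this reproduces the names listed above, for example dataset000000001.json for the first chunk of dataset.json.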

Safety Features

  1. Source File Preservation: Read-only access to source files; never deletes or modifies originals
  2. Duplicate Detection: Automatically skips files that already have N-digit sequence suffixes to avoid re-splitting
  3. Small File Skip: Files ≤ max-size are automatically skipped (no need to split)
  4. Sequential Processing: Processes files one at a time to ensure stability
  5. Data Validation: Compares total size/record count before and after splitting; reports verification results
  6. UTF-8 Encoding: Forces UTF-8 for all read/write operations to avoid encoding issues on Windows
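
Duplicate detection and the small-file skip (items 2 and 3) could be approximated as shown below. Both function names are hypothetical, and matching on "the stem ends in exactly N or more digits" is an assumption about how the script recognizes its own output. Under that assumption, a source file whose name happens to end in that many digits would also be skipped.

```python
import re
from pathlib import Path

def looks_like_chunk(path, seq_digits=9):
    """True if the filename stem ends with a seq_digits-long run of digits."""
    return re.search(rf"\d{{{seq_digits}}}$", Path(path).stem) is not None

def needs_split(path, size_bytes, max_size=512000, seq_digits=9):
    """Decide whether a file should be split.

    Files that already carry a sequence suffix, or that are no larger
    than max_size, are skipped.
    """
    if looks_like_chunk(path, seq_digits):
        return False
    return size_bytes > max_size
```

In practice `size_bytes` would come from something like `Path(path).stat().st_size`; it is passed in here so the decision logic can be checked without touching the filesystem.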

Notes

  • Console may display garbled Chinese characters on Windows, but functionality is unaffected
  • If a single data block/paragraph exceeds max-size, it becomes its own chunk (integrity takes priority over size limits)
  • Output folder is automatically created if it doesn't exist
  • License: MIT-0
Usage Guidance
This skill appears coherent and limited to local file processing. Before installing/using it: (1) inspect the full scripts/split_files.py file (the provided view was truncated) to confirm no unexpected behavior later in the file; (2) run with --dry-run first to verify which files would be split; (3) back up important source files and pick an isolated output folder to avoid accidental overwrite/collisions; (4) note the script is non-recursive and will only process formats listed — check the JSON handling behavior (it will pick the longest list value from an object) to ensure that matches your data; (5) ensure you have a local Python environment and sufficient disk space. If you expect recursive traversal, networked storage, or large-scale automation, review/modify the script accordingly.
Capability Analysis
Type: OpenClaw Skill Name: file-splitter Version: 1.1.0 The file-splitter skill is a utility for segmenting large JSON, Markdown, and text files into smaller chunks based on semantic boundaries. The Python script (split_files.py) uses standard libraries to perform file I/O and data validation without any network access, shell execution, or access to sensitive system directories.
Capability Assessment
Purpose & Capability
Name/description (split JSON/MD/TXT into semantic chunks) align with the included Python script. The script implements JSON array splitting, markdown heading-based splitting, and text-paragraph splitting — all expected for this purpose. No unrelated binaries, env vars, or config paths are requested.
Instruction Scope
SKILL.md describes invoking the bundled Python script with input/output folders and options; the script operates only on files in the provided input folder (non-recursive), writes to the specified output folder, and documents safety features (read-only originals, dry-run). The instructions do not ask the agent to read other system files, credentials, or send data externally.
Install Mechanism
No install spec is present (instruction-only + included script). No downloads or package installs are required. Running the script requires only a Python runtime which is appropriate for a Python utility.
Credentials
The skill requests no environment variables, credentials, or config paths. All file operations are performed on user-supplied input/output directories; there is no use of secrets or external service credentials.
Persistence & Privilege
The skill is not always-included and does not request persistent privileges. It does create output files in the specified folder but does not modify or delete source files according to the code and SKILL.md.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install file-splitter
  3. After installation, invoke the skill by name or use /file-splitter
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.1.0
English version - Universal file splitter with semantic boundary detection for global users
v1.0.0
Initial release - JSON/MD/TXT file splitter with semantic boundary detection for JSON arrays, Markdown headings, and TXT paragraphs
Metadata
Slug file-splitter
Version 1.1.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 2
Frequently Asked Questions

What is File Splitter?

Split large files into smaller chunks with semantic boundary detection. Supports JSON, Markdown, and TXT formats. Preserves data integrity by splitting at na... It is an AI Agent Skill for Claude Code / OpenClaw, with 116 downloads so far.

How do I install File Splitter?

Run "/install file-splitter" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is File Splitter free?

Yes, File Splitter is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does File Splitter support?

File Splitter is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created File Splitter?

It is built and maintained by ExpeditionHub (@expeditionhub); the current version is v1.1.0.
