expeditionhub

File Splitter

Author: ExpeditionHub · GitHub · v1.1.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 116 · Favorites: 1 · Current installs: 0 · Versions: 2
Install in OpenClaw
/install file-splitter
Description
Split large files into smaller chunks with semantic boundary detection. Supports JSON, Markdown, and TXT formats. Preserves data integrity by splitting at na...
Usage (SKILL.md)

File Splitter - Universal File Splitting Tool

Split large files into smaller, manageable chunks while preserving semantic structure.

Quick Start

python <skill_dir>/scripts/split_files.py --input <input_folder> --output <output_folder> [options]

Parameters

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| --input | Yes | - | Source folder containing files to split |
| --output | Yes | - | Output folder for split chunks |
| --max-size | No | 512000 (500KB) | Maximum bytes per chunk |
| --min-size | No | 409600 (400KB) | Minimum bytes per chunk |
| --seq-digits | No | 9 | Number of digits in sequence numbers |
| --formats | No | json,md,txt | File formats to process (comma-separated) |
| --dry-run | No | false | Preview mode - show what would be split without executing |

Examples

# Default 500KB split
python split_files.py --input "./corpus" --output "./corpus/chunks"

# Custom 200KB chunks
python split_files.py --input "./notes" --output "./notes/chunks" --max-size 204800 --min-size 153600

# JSON files only
python split_files.py --input "./data" --output "./data/out" --formats json

# Preview mode
python split_files.py --input "./data" --output "./data/out" --dry-run

Splitting Rules

JSON Files

  • Splits at JSON array element boundaries
  • Each chunk is a valid JSON array [...]
  • Automatically extracts list values if top-level is an object
  • Never cuts individual records in half
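
The JSON rules above can be illustrated with a minimal sketch. It is not the code from split_files.py: the function names `extract_records` and `chunk_records` are hypothetical, and picking the longest list value from a top-level object is an assumption based on the behavior described in the safety recommendations further down this page.

```python
import json

def extract_records(data):
    """Return the list to split (hypothetical helper).

    A top-level list is used as-is; for a top-level object, the longest
    list value is chosen (assumed behavior).
    """
    if isinstance(data, list):
        return data
    if isinstance(data, dict):
        lists = [v for v in data.values() if isinstance(v, list)]
        if lists:
            return max(lists, key=len)
    raise ValueError("no JSON array found to split")

def chunk_records(records, max_bytes):
    """Group whole records so each chunk serializes to a valid JSON array."""
    chunks, current, size = [], [], 2  # 2 bytes for the enclosing "[]"
    for record in records:
        # Size of the record plus a ", " separator (a slight over-estimate).
        encoded = len(json.dumps(record, ensure_ascii=False).encode("utf-8")) + 2
        if current and size + encoded > max_bytes:
            chunks.append(current)
            current, size = [], 2
        current.append(record)  # a record is never split across chunks
        size += encoded
    if current:
        chunks.append(current)
    return [json.dumps(chunk, ensure_ascii=False) for chunk in chunks]

data = {"meta": [1], "items": [{"id": i} for i in range(10)]}
records = extract_records(data)
chunks = chunk_records(records, max_bytes=40)
```

Because records are only ever moved whole, parsing every chunk and concatenating the results reproduces the original list, which gives a simple record-count check.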

Markdown Files

  • Splits at heading boundaries (# through ######)
  • Each chunk maintains complete heading structure
  • Never cuts content within a heading section
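
As a rough illustration of heading-based splitting, the sketch below cuts a Markdown document into sections that each start at an ATX heading. The name `split_markdown_sections` is hypothetical and the regular expression is an assumption, not the pattern used by the actual script; in particular, this simple version would also treat a line starting with "# " inside a fenced code block as a heading.

```python
import re

# One to six '#' characters followed by whitespace at the start of a line.
HEADING = re.compile(r"^#{1,6}\s")

def split_markdown_sections(text):
    """Split text into sections, each beginning at a heading line.

    Any content before the first heading becomes its own leading section.
    """
    sections, current = [], []
    for line in text.splitlines(keepends=True):
        if HEADING.match(line) and current:
            sections.append("".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("".join(current))
    return sections

doc = "intro\n# A\nbody a\n## B\nbody b\n"
sections = split_markdown_sections(doc)
```

Sections produced this way can then be grouped into chunks up to the size limit without ever cutting inside a section.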

TXT Files

  • Prefers splitting at paragraph boundaries (empty lines)
  • Falls back to line-by-line splitting if no paragraphs exist
  • Never cuts within a paragraph
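
The paragraph-first, line-fallback behavior might look like the following sketch. `split_text_units` is a hypothetical name, and treating "no paragraphs exist" as "no blank-line separator was found" is an assumption about how the script decides to fall back.

```python
import re

def split_text_units(text):
    """Return paragraphs if blank lines separate them, otherwise lines."""
    # A paragraph boundary is one or more blank (or whitespace-only) lines.
    paragraphs = [p.strip("\n") for p in re.split(r"\n\s*\n", text) if p.strip()]
    if len(paragraphs) > 1:
        return paragraphs
    # Fallback: no paragraph boundaries found, so split line by line.
    return [line for line in text.splitlines() if line.strip()]

with_paragraphs = split_text_units("a\nb\n\nc\n")
without_paragraphs = split_text_units("x\ny\nz")
```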

Output Naming Convention

Format: {source_filename_without_extension}{N-digit_sequence_number}{extension}, where N is set by --seq-digits (default 9)

Examples:

  • dataset000000001.json
  • dataset000000002.json
  • notes000000001.md
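
The naming convention can be reproduced with a short helper. `chunk_name` is a hypothetical function written for illustration; only the resulting file-name pattern comes from the documentation above.

```python
from pathlib import Path

def chunk_name(source, seq, digits=9):
    """Build '<stem><zero-padded sequence><extension>' for a source file."""
    path = Path(source)
    return f"{path.stem}{seq:0{digits}d}{path.suffix}"

first = chunk_name("dataset.json", 1)
second = chunk_name("./data/dataset.json", 2)
short = chunk_name("notes.md", 12, digits=4)
```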

Safety Features

  1. Source File Preservation: Read-only access to source files; never deletes or modifies originals
  2. Duplicate Detection: Automatically skips files that already have N-digit sequence suffixes to avoid re-splitting
  3. Small File Skip: Files ≤ max-size are automatically skipped (no need to split)
  4. Sequential Processing: Processes files one at a time to ensure stability
  5. Data Validation: Compares total size/record count before and after splitting; reports verification results
  6. UTF-8 Encoding: Forces UTF-8 for all read/write operations to avoid encoding issues on Windows
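
Features 2 and 3 amount to a pre-check on each candidate file. The sketch below is one way such a check could be written; `skip_reason` and its return strings are hypothetical, and matching the suffix with a trailing-digits regular expression is an assumption rather than the script's confirmed logic.

```python
import re
from pathlib import Path

def skip_reason(filename, size_bytes, max_size=512000, seq_digits=9):
    """Return why a file would be skipped, or None if it should be split."""
    stem = Path(filename).stem
    # Duplicate detection: the stem already ends in an N-digit sequence number.
    if re.search(rf"\d{{{seq_digits}}}$", stem):
        return "already split"
    # Small file skip: files at or below max-size need no splitting.
    if size_bytes <= max_size:
        return "small enough"
    return None

already_split = skip_reason("dataset000000001.json", 1_000_000)
small = skip_reason("notes.md", 512000)
needs_split = skip_reason("notes.md", 512001)
```

Note that a name such as log2024.txt ends in only four digits, so with the default of nine it would not be mistaken for an existing chunk.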

Notes

  • Console may display garbled Chinese characters on Windows, but functionality is unaffected
  • If a single data block/paragraph exceeds max-size, it becomes its own chunk (integrity takes priority over size limits)
  • Output folder is automatically created if it doesn't exist
  • License: MIT-0
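
The note about oversized blocks can be made concrete with a greedy packing sketch. `pack_units` is a hypothetical name, and the sketch deliberately ignores --min-size because the documentation does not say how that option influences chunk boundaries.

```python
def pack_units(units, max_bytes):
    """Greedily group whole units (paragraphs, sections) into chunks.

    A unit larger than max_bytes is never cut; it ends up alone in a chunk.
    """
    chunks, current, size = [], [], 0
    for unit in units:
        unit_size = len(unit.encode("utf-8"))
        if current and size + unit_size > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(unit)
        size += unit_size
    if current:
        chunks.append(current)
    return chunks

units = ["aa", "bbbbbbbbbbbb", "cc", "dd"]
packed = pack_units(units, max_bytes=5)

# Simple integrity check: total bytes are the same before and after packing.
bytes_before = sum(len(u.encode("utf-8")) for u in units)
bytes_after = sum(len(u.encode("utf-8")) for chunk in packed for u in chunk)
```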
Safety Recommendations
This skill appears coherent and limited to local file processing. Before installing/using it: (1) inspect the full scripts/split_files.py file (the provided view was truncated) to confirm no unexpected behavior later in the file; (2) run with --dry-run first to verify which files would be split; (3) back up important source files and pick an isolated output folder to avoid accidental overwrite/collisions; (4) note the script is non-recursive and will only process formats listed — check the JSON handling behavior (it will pick the longest list value from an object) to ensure that matches your data; (5) ensure you have a local Python environment and sufficient disk space. If you expect recursive traversal, networked storage, or large-scale automation, review/modify the script accordingly.
Functional Analysis
Type: OpenClaw Skill
Name: file-splitter
Version: 1.1.0
The file-splitter skill is a utility for segmenting large JSON, Markdown, and text files into smaller chunks based on semantic boundaries. The Python script (split_files.py) uses standard libraries to perform file I/O and data validation without any network access, shell execution, or access to sensitive system directories.
Capability Assessment
Purpose & Capability
Name/description (split JSON/MD/TXT into semantic chunks) align with the included Python script. The script implements JSON array splitting, markdown heading-based splitting, and text-paragraph splitting — all expected for this purpose. No unrelated binaries, env vars, or config paths are requested.
Instruction Scope
SKILL.md describes invoking the bundled Python script with input/output folders and options; the script operates only on files in the provided input folder (non-recursive), writes to the specified output folder, and documents safety features (read-only originals, dry-run). The instructions do not ask the agent to read other system files, credentials, or send data externally.
Install Mechanism
No install spec is present (instruction-only + included script). No downloads or package installs are required. Running the script requires only a Python runtime which is appropriate for a Python utility.
Credentials
The skill requests no environment variables, credentials, or config paths. All file operations are performed on user-supplied input/output directories; there is no use of secrets or external service credentials.
Persistence & Privilege
The skill is not always-included and does not request persistent privileges. It does create output files in the specified folder but does not modify or delete source files according to the code and SKILL.md.
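How to Use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install file-splitter
  3. Once installed, invoke the Skill by name or trigger it with /file-splitter
  4. Provide the required inputs as described in the Skill's parameter documentation to get structured output
Version History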
v1.1.0
English version - Universal file splitter with semantic boundary detection for global users
v1.0.0
Initial release - JSON/MD/TXT file splitter with semantic boundary detection for JSON arrays, Markdown headings, and TXT paragraphs
Metadata
Slug: file-splitter
Version: 1.1.0
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 2
FAQ

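What is File Splitter?

Split large files into smaller chunks with semantic boundary detection. Supports JSON, Markdown, and TXT formats. Preserves data integrity by splitting at na... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 116 total downloads to date.

How do I install File Splitter?

Run the command "/install file-splitter" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.

Is File Splitter free?

Yes. File Splitter is completely free and released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does File Splitter support?

File Splitter is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed File Splitter?

It is developed and maintained by ExpeditionHub (@expeditionhub); the current version is v1.1.0.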
