Doc to JSON
Convert office documents to structured JSON using MinerU as the extraction engine.
Supported Formats
- .doc / .docx: Word documents
- .pdf: PDF files
- .xlsx / .xls: Excel spreadsheets
Prerequisites
- mineru-open-api CLI must be installed (v0.5+)
- MINERU_TOKEN environment variable must be set
- Check: mineru-open-api version
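The two prerequisites above can be checked from Python before the pipeline is started. This is a minimal sketch, not part of the Skill: the helper name check_prerequisites is hypothetical, and it does not verify the v0.5+ requirement because the output format of the version command is not described here.

```python
import os
import shutil


def check_prerequisites(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    # The CLI must be discoverable on PATH.
    if which("mineru-open-api") is None:
        problems.append("mineru-open-api CLI not found on PATH")
    # The token must be set and non-empty.
    if not env.get("MINERU_TOKEN"):
        problems.append("MINERU_TOKEN environment variable is not set")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing:", problem)
```

The env and which parameters exist only so the check can be exercised without touching the real environment.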
Quick Usage
# Full pipeline: document -> MinerU Markdown -> JSON
python3 scripts/doc_to_json.py /path/to/file.docx -o output.json
# Keep temp files for debugging
python3 scripts/doc_to_json.py /path/to/file.pdf -o out.json --keep-temp
Manual Two-Step Pipeline
If the full pipeline script fails, run the two steps manually:
Step 1: MinerU Extract
export MINERU_TOKEN="your_token"
mineru-open-api extract input_file.pdf -o /tmp/mineru_out/
Output: a .md file in the output directory (/tmp/mineru_out/ in the command above).
Step 2: Markdown -> JSON
python3 scripts/markdown_to_json.py /tmp/mineru_out/output.md -o output.json
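The two manual steps can also be chained from Python. The sketch below reuses only the commands shown above. The function names are hypothetical, and it assumes that the first .md file found under the output directory is the one to convert, since the exact file name MinerU writes is not stated here.

```python
import subprocess
from pathlib import Path


def extract_command(input_file, out_dir):
    """Step 1: the MinerU extract command, as a list for subprocess."""
    return ["mineru-open-api", "extract", str(input_file), "-o", str(out_dir)]


def convert_command(markdown_file, json_out):
    """Step 2: the Markdown -> JSON command, as a list for subprocess."""
    return ["python3", "scripts/markdown_to_json.py", str(markdown_file), "-o", str(json_out)]


def find_markdown(out_dir):
    """Return the first .md file under out_dir (assumption: recursive search)."""
    matches = sorted(Path(out_dir).rglob("*.md"))
    if not matches:
        raise FileNotFoundError(f"no .md file found under {out_dir}")
    return matches[0]


def run_pipeline(input_file, out_dir, json_out):
    """Run both steps; MINERU_TOKEN must already be exported, as in Step 1."""
    subprocess.run(extract_command(input_file, out_dir), check=True)
    markdown_file = find_markdown(out_dir)
    subprocess.run(convert_command(markdown_file, json_out), check=True)
    return markdown_file
```

With check=True, a failure in either step raises subprocess.CalledProcessError, so the step that failed is visible immediately.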
JSON Structure
The output JSON preserves:
- Metadata fields: course name, code, credits, hours, etc. (extracted from plain text)
- Heading hierarchy: 一、二、三... sections become nested keys
- Tables: stored as array of arrays (row cells), keyed as "表格"
- Numbered lists: stored as array of strings under section title
- Paragraph text: merged into "text" field per section
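To make the structure described above concrete, here is a hypothetical illustration built in Python. Every key and value is invented; only the "表格" and "text" key names come from the description. The placement of metadata at the top level and of the numbered list under its own heading key are assumptions, and the real converter output may differ.

```python
import json

# Hypothetical document: every key and value below is invented for illustration.
sample = {
    "course_name": "Example Course",  # metadata field (key name assumed)
    "course_code": "EX101",  # metadata field (key name assumed)
    "一、课程目标": {  # a top-level heading becomes a key
        "text": "Merged paragraph text for this section.",
        "表格": [  # table: array of row arrays
            ["Objective", "Weight"],
            ["Understand the basics", "40%"],
        ],
        "（一）教学要求": [  # numbered list: array of strings (placement assumed)
            "First requirement",
            "Second requirement",
        ],
    },
}

# ensure_ascii=False keeps the Chinese keys readable in the serialized output.
text = json.dumps(sample, ensure_ascii=False, indent=2)
print(text)
```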
For Knowledge Base Preparation
After JSON conversion, common next steps:
- Chunk by section: split the JSON into per-section documents for embedding
- Table extraction: convert "表格" arrays to flattened rows for database import
- Metadata extraction: pull course code, name, etc. as document metadata
- Embedding: feed cleaned text chunks into vector database
See references/kb-prep.md for detailed KB preparation patterns.
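The first two steps in the list above (chunking by section and flattening tables) can be sketched as follows. This is an illustration under stated assumptions, not the content of references/kb-prep.md: the function names are hypothetical, sections are assumed to be nested dicts carrying a "text" field, and the first row of each "表格" array is assumed to hold the column headers.

```python
def iter_section_chunks(node, path=()):
    """Yield (section_path, text) for every section that has a non-empty "text" field."""
    if not isinstance(node, dict):
        return
    text = node.get("text")
    if isinstance(text, str) and text.strip():
        yield " / ".join(path), text.strip()
    # Recurse only into nested sections (dict values); lists and strings are skipped.
    for key, value in node.items():
        if isinstance(value, dict):
            yield from iter_section_chunks(value, path + (key,))


def flatten_table(rows):
    """Turn an array-of-arrays table into a list of row dicts.

    Assumption: the first row holds the column headers.
    """
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]
```

Each yielded pair can be embedded as its own document, with the section path kept as metadata alongside the fields pulled out in the metadata extraction step.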
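- Make sure OpenClaw is installed (locally or via Docker)
- Enter the install command in the chat box: /install doc-to-json
- After installation, invoke the Skill by name or trigger it with /doc-to-json
- Provide the required inputs described in the Skill's parameters to get structured output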
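What is Doc to JSON?
Convert documents (docx, doc, PDF, xlsx, xls) to structured JSON via MinerU. Full pipeline: file to mineru-open-api extract to Markdown then to JSON. Use whe... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 57 downloads to date.
How do I install Doc to JSON?
Run the command "/install doc-to-json" in the OpenClaw or Claude Code chat box for a one-click install; no additional configuration is required.
Is Doc to JSON free?
Yes. Doc to JSON is completely free under the MIT-0 license and can be freely downloaded, installed, and used.
Which platforms does Doc to JSON support?
Doc to JSON is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who developed Doc to JSON?
It is developed and maintained by 梁辉盛 (@kounlong); the current version is v1.0.0.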