/install doc-to-json
Doc to JSON
Convert office documents to structured JSON using MinerU as the extraction engine.
Supported Formats
.doc/.docx— Word documents.pdf— PDF files.xlsx/.xls— Excel spreadsheets
Prerequisites
- mineru-open-api CLI must be installed (v0.5+)
- MINERU_TOKEN environment variable must be set
- Check:
mineru-open-api version
Quick Usage
# Full pipeline: document -> MinerU Markdown -> JSON
python3 scripts/doc_to_json.py /path/to/file.docx -o output.json
# Keep temp files for debugging
python3 scripts/doc_to_json.py /path/to/file.pdf -o out.json --keep-temp
Manual Two-Step Pipeline
If the full pipeline script fails, run steps manually:
Step 1: MinerU Extract
export MINERU_TOKEN="your_token"
mineru-open-api extract input_file.pdf -o /tmp/mineru_out/
Output: .md file in the output directory.
Step 2: Markdown -> JSON
python3 scripts/markdown_to_json.py /tmp/mineru_out/output.md -o output.json
JSON Structure
The output JSON preserves:
- Metadata fields — course name, code, credits, hours, etc. (extracted from plain text)
- Heading hierarchy — 一、二、三... sections become nested keys
- Tables — stored as array of arrays (row cells), keyed as
"表格" - Numbered lists — stored as array of strings under section title
- Paragraph text — merged into
"text"field per section
For Knowledge Base Preparation
After JSON conversion, common next steps:
- Chunk by section — split the JSON into per-section documents for embedding
- Table extraction — convert
"表格"arrays to flattened rows for database import - Metadata extraction — pull course code, name, etc. as document metadata
- Embedding — feed cleaned text chunks into vector database
See references/kb-prep.md for detailed KB preparation patterns.
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat:
/install doc-to-json - After installation, invoke the skill by name or use
/doc-to-json - Provide required inputs per the skill's parameter spec and get structured output
What is Doc to JSON?
Convert documents (docx, doc, PDF, xlsx, xls) to structured JSON via MinerU. Full pipeline: file to mineru-open-api extract to Markdown then to JSON. Use whe... It is an AI Agent Skill for Claude Code / OpenClaw, with 57 downloads so far.
How do I install Doc to JSON?
Run "/install doc-to-json" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Doc to JSON free?
Yes, Doc to JSON is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Doc to JSON support?
Doc to JSON is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).
Who created Doc to JSON?
It is built and maintained by 梁辉盛 (@kounlong); the current version is v1.0.0.