Universal Document Ingestion Router
/install universal-document-ingestion-router
Use this skill whenever a task involves document parsing, document ingestion, knowledge-base import preparation, or routing files to suitable parsers.
Short mental name: doc-router.
Chinese trigger phrases:
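- 文档解析 (document parsing)
- 文件解析 (file parsing)
- 知识库入库前处理 (pre-processing before knowledge-base import)
- 把文件放进知识库 (put files into the knowledge base)
- 研报解析 (research report parsing)
- PDF/Word/PPT/Excel/图片解析 (PDF/Word/PPT/Excel/image parsing)
- 文档摄取 (document ingestion)
- 文档转知识库格式 (convert documents to knowledge-base format)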
Strict Scope
This skill only does:
- Classify the file or document unit.
- Choose or recommend the right local parser.
- Run parser adapters when available.
- Emit standardized parsed output.
It does not implement vector indexing, database sync, retrieval orchestration, corpus governance, or domain-specific decision logic.
When Agents Should Remember This Skill
Agents should consider this skill automatically when building or modifying systems that need to ingest files into a knowledge base, including:
- investment post-deal document management systems
- research report retrieval systems
- investment decision support systems
- file upload pipelines
- document search features
- RAG corpus construction workflows
- batch parsing jobs for PDF, Word, PPT, Excel, CSV, Markdown, text, HTML, or images
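If the user says anything like "把这些文件集成到知识库" (integrate these files into the knowledge base), "解析这些文件" (parse these files), "做文档入库" (ingest documents), "研报内容检索" (search research report content), or "系统需要读取上传的文档" (the system needs to read uploaded documents), use this skill as the front-end classifier/router before downstream indexing.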
CLI
Run from this skill directory or use the script path directly:
python scripts/document_classifier_router.py capabilities
python scripts/document_classifier_router.py classify --input path/to/file.pdf
python scripts/document_classifier_router.py parse --input path/to/file.pdf --output out/parsed
python scripts/document_classifier_router.py batch --input-dir path/to/files --output out/batch --copy-sources
Outputs
- document.json: canonical parsed manifest, always emitted for parse attempts.
- document.md: readable normalized content when extraction succeeds.
- chunks.jsonl: retrieval-ready chunks when chunking is enabled.
- tables/: only when reliable tables are extracted.
- batch_summary.json: emitted by batch mode.
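Since only document.json is guaranteed, a caller may want to check which outputs a run produced before reading them. The following is a minimal sketch, not part of the skill: the helper name present_outputs is hypothetical, and it assumes the files sit directly in the directory you pass in, which the text above does not confirm.

```python
from pathlib import Path

# Output names as listed in this section; the helper itself is illustrative.
EXPECTED_OUTPUTS = (
    "document.json",
    "document.md",
    "chunks.jsonl",
    "tables",
    "batch_summary.json",
)


def present_outputs(output_dir):
    """Return the documented output names that exist under output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_OUTPUTS if (root / name).exists()]
```

A missing document.md or chunks.jsonl is then treated as "extraction or chunking did not happen" rather than as an error in the caller.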
Parser Routing
- Text PDF: markitdown, fallback pymupdf, fallback pypdf.
- Scanned PDF or image: PaddleOCR, else dependency recommendation.
- DOCX: markitdown, fallback python-docx.
- PPTX: markitdown, fallback python-pptx.
- XLSX/CSV: openpyxl or built-in CSV extraction.
- Legacy .doc/.ppt/.xls: recommend LibreOffice when unavailable.
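The fallback order in the routing list can be pictured as a preference table that is walked until an installed parser is found. This is a rough sketch under assumptions, not the skill's actual implementation: the kind labels, the function names, and the import-name mapping are all hypothetical, and CSV and legacy formats are left out because they do not depend on a Python package in the list above.

```python
import importlib.util

# Preference order per document kind, copied from the routing list.
# The kind labels used as keys are assumptions made for this sketch.
ROUTES = {
    "text_pdf": ["markitdown", "pymupdf", "pypdf"],
    "scanned_pdf_or_image": ["paddleocr"],
    "docx": ["markitdown", "python-docx"],
    "pptx": ["markitdown", "python-pptx"],
    "xlsx": ["openpyxl"],
}

# Some packages are imported under a different name than they are installed.
# Older PyMuPDF releases may only expose the legacy module name "fitz".
IMPORT_NAMES = {"python-docx": "docx", "python-pptx": "pptx"}


def is_installed(parser):
    """True if the parser's module can be located without importing it."""
    return importlib.util.find_spec(IMPORT_NAMES.get(parser, parser)) is not None


def choose_parser(kind, available=is_installed):
    """Return the first available parser for kind, or None if none is usable."""
    for parser in ROUTES.get(kind, []):
        if available(parser):
            return parser
    return None
```

A None result corresponds to the "dependency recommendation" case: nothing suitable is installed, so the caller reports what to install instead of attempting a parse. The available parameter is injectable so the selection logic can be exercised without installing any parser.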
Safety
- Never overwrite or modify source files.
- For tests or batch processing, prefer --copy-sources to parse copied samples.
- Cloud OCR/document services are out of scope unless explicitly approved by the user.
- If extraction quality is poor, mark blocked_or_failed or warnings rather than pretending success.
Cross-Agent Use
This skill is intentionally a plain CLI script with JSON output so OpenClaw, Hermes, Codex, Claude Code, or any other agent can call it through a shell/process runner without OpenClaw-specific APIs.
For agents that do not load skills by name, use the short alias doc-router and point them to:
skills/universal-document-ingestion-router/scripts/document_classifier_router.py
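For an agent that drives the router through a process runner, the call can be wrapped in a few lines of Python. This is a sketch under assumptions: build_command and run_router are hypothetical helper names, the subcommands and flags are the ones shown in the CLI section, and the wrapper assumes the JSON is written to stdout, which the text does not state explicitly.

```python
import subprocess
import sys

# Path taken from the text above; adjust it to where the skill is installed.
SCRIPT = "skills/universal-document-ingestion-router/scripts/document_classifier_router.py"


def build_command(subcommand, **options):
    """Build the argv list for one router call from keyword options."""
    argv = [sys.executable, SCRIPT, subcommand]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            argv.append(flag)  # boolean switch such as --copy-sources
        elif value is not None and value is not False:
            argv.extend([flag, str(value)])
    return argv


def run_router(subcommand, **options):
    """Run the router and return (exit code, raw stdout text)."""
    done = subprocess.run(
        build_command(subcommand, **options),
        capture_output=True,
        text=True,
    )
    return done.returncode, done.stdout
```

For example, run_router("classify", input="path/to/file.pdf") mirrors the classify command from the CLI section. The stdout text is returned unparsed so the caller can decide how to handle output that is not valid JSON.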
References
Read references/development-report.md for implementation/test results and references/architecture.md for the boundary and adapter model.
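- Make sure OpenClaw is installed (locally or via Docker).
- Enter the install command in the chat box: /install universal-document-ingestion-router
- After installation, invoke the Skill by name or trigger it with /universal-document-ingestion-router
- Provide the required inputs according to the Skill's parameter description to get structured output.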
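What is Universal Document Ingestion Router?
Document parsing and knowledge-base import router. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 31 cumulative downloads to date.
How do I install Universal Document Ingestion Router?
Run the command "/install universal-document-ingestion-router" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.
Is Universal Document Ingestion Router free?
Yes. Universal Document Ingestion Router is completely free, is released under the MIT-0 license, and may be freely downloaded, installed, and used.
Which platforms does Universal Document Ingestion Router support?
Universal Document Ingestion Router is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who develops Universal Document Ingestion Router?
It is developed and maintained by hollis9087 (@hollis9087); the current version is v0.1.1.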