/install docx-pdf-knowledge-parser
name: docx-pdf-knowledge-parser
description: Parse local .docx and .pdf files into report-first knowledge artifacts. Use when ChatGPT needs to extract text from uploaded or locally available attachments and generate ingest-report.md, kb-items.jsonl, failed-items.jsonl, and MEMORY.candidate.md without directly writing MEMORY.md.
Docx PDF Knowledge Parser
Use this skill to turn local or uploaded .docx and .pdf files into structured, reviewable knowledge outputs.
What this skill does
- Accept local or already-available `.docx` and `.pdf` files.
- Classify files into parseable, manual-review, or failed.
- Parse `.docx` and `.pdf` in v1.0.
- Produce report-first outputs instead of writing `MEMORY.md` directly.
- Preserve failures and uncertainty instead of guessing content.
Supported v1.0 scope
Inputs
- Local `.docx` file path
- Local `.pdf` file path
- A batch of local `.docx` and `.pdf` files in one directory
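The two input forms above can be resolved into a file list with a sketch like the following. It assumes Python 3.9+, a non-recursive directory scan, and the hypothetical names `resolve_inputs` and `SUPPORTED_EXTENSIONS`; the skill's own `run.py` may behave differently.

```python
from pathlib import Path

# Hypothetical constant: extensions the v1.0 scope treats as parseable.
SUPPORTED_EXTENSIONS = {".docx", ".pdf"}


def resolve_inputs(target: str) -> list[Path]:
    """Return the files to process for a single file path or a directory path.

    A single file is returned as-is so routing can still classify it
    (for example, a .pptx goes to manual-review). A directory is scanned
    non-recursively (an assumption) and filtered to supported extensions.
    """
    path = Path(target)
    if path.is_file():
        return [path]
    if path.is_dir():
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
        )
    raise FileNotFoundError(f"input not found: {target}")
```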
Parsing
- `.docx`
- `.pdf`
Outputs
- `ingest-report.md`
- `kb-items.jsonl`
- `failed-items.jsonl`
- `MEMORY.candidate.md`
Required behavior
- Only process files that are already available locally or have already been provided to the runtime.
- Do not claim file content was learned unless text was actually extracted.
- Default to report-first. Do not write `MEMORY.md` in v1.0.
- Record every failed file with a concrete reason.
- Prefer plain-text summaries over complex cards when reporting progress.
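One way to enforce the report-first rule is to check every write target against the four v1.0 outputs before opening it. This is a minimal sketch; `check_output_target` and `ALLOWED_OUTPUTS` are hypothetical names, not part of the skill's documented files.

```python
from pathlib import Path

# Hypothetical allow-list: the only files v1.0 may write (report-first outputs).
ALLOWED_OUTPUTS = frozenset({
    "ingest-report.md",
    "kb-items.jsonl",
    "failed-items.jsonl",
    "MEMORY.candidate.md",
})


def check_output_target(path: str) -> Path:
    """Return the path if it names a report-first output; otherwise raise.

    MEMORY.md is not in the allow-list, so any attempt to write it fails loudly.
    """
    target = Path(path)
    if target.name not in ALLOWED_OUTPUTS:
        raise PermissionError(f"refusing to write {target.name}: not a v1.0 output")
    return target
```

Using an allow-list rather than a single check for `MEMORY.md` means near-misses such as `memory.md` are refused as well.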
File routing rules
Parseable
Treat these as parseable in v1.0:
- `.docx`
- `.pdf`
Manual-review
Route here when the file is out of scope or low-confidence in v1.0:
- `.pptx`
- images
- scans with no extractable text
- archives
- unusual file types
Failed
Route here when the file cannot be opened, parsed, or extracted successfully.
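The routing rules can be sketched as two small checks: an initial route based on whether the file opens and on its extension, and a downgrade to manual-review when a parseable file yields no text (the scanned-PDF case). The function names and the use of "can it be opened for reading" as the test for "cannot be opened" are assumptions for illustration.

```python
from pathlib import Path

# Hypothetical constant: extensions routed as parseable in v1.0.
PARSEABLE = {".docx", ".pdf"}


def route_file(path: Path) -> str:
    """Initial route: 'parseable', 'manual-review', or 'failed'.

    A file that cannot be opened is 'failed'. Anything that is not .docx/.pdf
    (pptx, images, archives, unusual types) goes to 'manual-review'.
    """
    try:
        with path.open("rb"):
            pass
    except OSError:
        return "failed"
    if path.suffix.lower() in PARSEABLE:
        return "parseable"
    return "manual-review"


def route_after_extraction(text: str) -> str:
    """Downgrade a parseable file to manual-review when no text came out,
    so an empty extraction is never reported as learned content."""
    return "parseable" if text.strip() else "manual-review"
```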
Standard workflow
- Resolve input type.
  - Single file path -> process one file
  - Directory path -> enumerate supported files
- Create a batch record.
  - Generate `batch_id`
  - Record `started_at`
- Build a manifest.
  - File name
  - File path
  - File type
  - Route decision
- Attempt extraction.
  - `.docx` -> use `parsers/parse_docx.py`
  - `.pdf` -> use `parsers/parse_pdf.py`
- Produce structured outputs.
  - success -> append to `kb-items.jsonl`
  - failure -> append to `failed-items.jsonl`
- Summarize the batch.
  - Write `ingest-report.md`
  - Write `MEMORY.candidate.md`
- Finish the batch.
  - Record `finished_at`
  - Never auto-write `MEMORY.md`
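The workflow steps can be sketched end to end for an already-resolved file list. This is a hypothetical runner, not the skill's `run.py`: the `extractors` mapping stands in for `parsers/parse_docx.py` and `parsers/parse_pdf.py`, whose real interfaces are not documented here, and the summarize step (writing `ingest-report.md` and `MEMORY.candidate.md`) is left out.

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable


def utc_now() -> str:
    """ISO 8601 timestamp in UTC (the timestamp format is an assumption)."""
    return datetime.now(timezone.utc).isoformat()


def run_batch(files: list[Path], out_dir: Path,
              extractors: dict[str, Callable[[Path], str]]) -> dict:
    """Hypothetical runner: extractors maps "docx"/"pdf" to a text extractor."""
    # Create a batch record.
    batch = {"batch_id": uuid.uuid4().hex, "started_at": utc_now()}
    # Build a manifest: name, path, type, and route decision.
    manifest = []
    for p in files:
        ftype = p.suffix.lower().lstrip(".")
        manifest.append({"file_name": p.name, "file_path": str(p), "file_type": ftype,
                         "route": "parseable" if ftype in extractors else "manual-review"})
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(out_dir / "kb-items.jsonl", "a", encoding="utf-8") as kb, \
         open(out_dir / "failed-items.jsonl", "a", encoding="utf-8") as failed:
        for entry in manifest:
            if entry["route"] != "parseable":
                continue
            common = {"batch_id": batch["batch_id"], "source_file": entry["file_name"],
                      "source_path": entry["file_path"], "file_type": entry["file_type"]}
            try:
                # Attempt extraction with the extractor registered for this type.
                text = extractors[entry["file_type"]](Path(entry["file_path"]))
            except Exception as exc:
                # Failure: log a concrete reason instead of guessing content.
                entry["route"] = "failed"
                failed.write(json.dumps({
                    **common,
                    "failure_reason": type(exc).__name__,
                    "error_detail": str(exc),
                    "suggested_action": "verify the file opens, then retry or review manually",
                    "failed_at": utc_now(),
                }, ensure_ascii=False) + "\n")
                continue
            if not text.strip():
                # No extractable text (for example, a scan): nothing was learned.
                entry["route"] = "manual-review"
                continue
            # Success: topic, content_type and confidence are left for a later
            # summarization step; the summary here is only a text preview.
            kb.write(json.dumps({
                **common, "topic": None, "content_type": None,
                "summary": text.strip()[:200], "extracted_at": utc_now(),
                "confidence": None,
            }, ensure_ascii=False) + "\n")
    # Finish the batch. MEMORY.md is never opened or written.
    batch["finished_at"] = utc_now()
    batch["manifest"] = manifest
    return batch
```

Passing extractors in as plain functions keeps the batch logic independent of any particular parsing library, so a different PDF extractor can be swapped in without touching the runner.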
Output contracts
kb-items.jsonl
Write one JSON object per successfully extracted knowledge item with at least:
- `batch_id`
- `source_file`
- `source_path`
- `file_type`
- `topic`
- `content_type`
- `summary`
- `extracted_at`
- `confidence`
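A typed record makes the minimum field set explicit. The sketch below assumes an ISO 8601 string for `extracted_at` and a 0.0 to 1.0 float for `confidence`, neither of which the contract specifies; `KbItem` and `missing_fields` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class KbItem:
    """One kb-items.jsonl record with the minimum fields from the contract."""
    batch_id: str
    source_file: str
    source_path: str
    file_type: str
    topic: str
    content_type: str
    summary: str
    extracted_at: str   # assumed: ISO 8601 timestamp
    confidence: float   # assumed: score between 0.0 and 1.0

    def to_jsonl(self) -> str:
        """Serialize as one JSON line; ensure_ascii=False keeps non-ASCII readable."""
        return json.dumps(asdict(self), ensure_ascii=False) + "\n"


REQUIRED_KB_FIELDS = [f.name for f in fields(KbItem)]


def missing_fields(line: str) -> list[str]:
    """List required fields absent from one JSONL line (extra keys are allowed)."""
    record = json.loads(line)
    return [name for name in REQUIRED_KB_FIELDS if name not in record]
```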
failed-items.jsonl
Write one JSON object per failed file with at least:
- `batch_id`
- `source_file`
- `source_path`
- `file_type`
- `failure_reason`
- `error_detail`
- `suggested_action`
- `failed_at`
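A failed-items record can be built directly from the caught exception, which keeps the concrete error text instead of a generic message. This sketch assumes the exception class name is an acceptable `failure_reason` and that `file_type` is the lowercase extension without the dot; `failure_record` and `append_jsonl` are hypothetical helpers.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def failure_record(batch_id: str, path: Path, exc: Exception,
                   suggested_action: str) -> dict:
    """Build one failed-items.jsonl object from a caught exception."""
    return {
        "batch_id": batch_id,
        "source_file": path.name,
        "source_path": str(path),
        "file_type": path.suffix.lower().lstrip("."),
        "failure_reason": type(exc).__name__,   # short category (assumption)
        "error_detail": str(exc) or repr(exc),  # concrete message, never blank
        "suggested_action": suggested_action,
        "failed_at": datetime.now(timezone.utc).isoformat(),
    }


def append_jsonl(out_path: Path, record: dict) -> None:
    """Append one record as a JSON line; earlier lines are left intact."""
    with open(out_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
```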
MEMORY.candidate.md
Include:
- batch header (`batch_id`, `started_at`, `finished_at`, `source_directory` or `source_file`)
- grouped knowledge summaries
- source references
- confidence notes
- items needing review
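A minimal renderer for `MEMORY.candidate.md` might look like this. It assumes kb-items records are grouped by their `topic` field and that review items arrive as plain strings; the heading layout and the name `render_memory_candidate` are illustrative, not a documented format.

```python
from collections import defaultdict


def render_memory_candidate(batch: dict, items: list[dict], review: list[str]) -> str:
    """Render MEMORY.candidate.md: header, grouped summaries, review list."""
    source = batch.get("source_directory") or batch.get("source_file", "unknown")
    lines = [f"# MEMORY candidate: batch {batch['batch_id']}",
             f"- started_at: {batch['started_at']}",
             f"- finished_at: {batch['finished_at']}",
             f"- source: {source}", ""]
    # Group summaries by topic; records without a topic go to "uncategorized".
    by_topic = defaultdict(list)
    for item in items:
        by_topic[item.get("topic") or "uncategorized"].append(item)
    for topic in sorted(by_topic):
        lines.append(f"## {topic}")
        for item in by_topic[topic]:
            # Each summary carries its source reference and a confidence note.
            confidence = item.get("confidence")
            note = "unknown" if confidence is None else confidence
            lines.append(f"- {item['summary']} (source: {item['source_path']}; "
                         f"confidence: {note})")
        lines.append("")
    lines.append("## Needs review")
    lines += [f"- {entry}" for entry in review] or ["- none"]
    return "\n".join(lines) + "\n"
```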
ingest-report.md
Include:
- Batch summary
- Input scope
- File counts and routing counts
- Successful extraction summary
- Failures and risks
- Recommended next actions
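The count sections of `ingest-report.md` can be derived mechanically from the batch manifest and the failed records. This sketch covers only "File counts and routing counts" and "Failures and risks"; the other sections need written summaries. `render_ingest_report` is a hypothetical name, and manifest entries are assumed to carry `file_type` and `route` keys.

```python
from collections import Counter


def render_ingest_report(batch: dict, manifest: list[dict], failed: list[dict]) -> str:
    """Render the counting sections of ingest-report.md."""
    routes = Counter(entry["route"] for entry in manifest)
    types = Counter(entry["file_type"] for entry in manifest)
    lines = [f"# Ingest report: batch {batch['batch_id']}", "",
             "## File counts and routing counts",
             f"- total files: {len(manifest)}"]
    lines += [f"- {ftype} files: {count}" for ftype, count in sorted(types.items())]
    # Always list all three routes, including those with a zero count.
    lines += [f"- {route}: {routes[route]}" for route in ("parseable", "manual-review", "failed")]
    lines += ["", "## Failures and risks"]
    lines += [f"- {item['source_file']}: {item['failure_reason']} ({item['error_detail']})"
              for item in failed] or ["- none"]
    return "\n".join(lines) + "\n"
```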
Safety rules
- Never invent text that was not extracted.
- If parsing fails, say so plainly and log it.
- Treat filenames as hints only, never as proof of document contents.
- Keep sensitive data out of `MEMORY.candidate.md` unless the workflow explicitly allows it.
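A simple pre-write filter illustrates the sensitive-data rule. The two patterns below (email addresses and long digit runs) are examples only and will miss many kinds of sensitive data; `redact_for_candidate` and its `allow_sensitive` flag are hypothetical.

```python
import re

# Example patterns only; real sensitive-data detection needs a broader policy.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "long_number": re.compile(r"\b\d{9,}\b"),  # e.g. account or ID numbers
}


def redact_for_candidate(text: str, allow_sensitive: bool = False) -> str:
    """Mask the example patterns unless the workflow explicitly allows them."""
    if allow_sensitive:
        return text
    for label, pattern in SENSITIVE_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label}]", text)
    return text
```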
Included files
- `run.py`: minimal batch runner for local testing
- `parsers/parse_docx.py`: docx text extraction helper
- `parsers/parse_pdf.py`: pdf text extraction helper
- `references/output_examples.md`: sample output shapes and field guidance
- `README.md`: setup and usage notes
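For reference, `.docx` text can be extracted with the Python standard library alone: a `.docx` file is a ZIP package whose main body is `word/document.xml`, where `w:p` elements are paragraphs and `w:t` elements hold text runs. This is a sketch, not the contents of `parsers/parse_docx.py`. It reads only the main document part, so headers, footers, footnotes and comments (stored in separate parts) are skipped, and text inside text boxes may be repeated.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(path: str) -> str:
    """Return one line per non-empty paragraph from the main document part."""
    with zipfile.ZipFile(path) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{W_NS}p"):
        # Join all text runs in the paragraph; runs split words arbitrarily.
        text = "".join(node.text or "" for node in para.iter(f"{W_NS}t"))
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)
```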
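- Make sure OpenClaw is installed (local or Docker deployment).
- Enter the install command in the chat box:
  `/install docx-pdf-knowledge-parser`
- After installation, invoke the skill by name or trigger it with `/docx-pdf-knowledge-parser`.
- Provide the required inputs as described in the skill's parameter documentation to get structured output.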
What is docx-pdf-knowledge-parser?
Parse local `.docx` and `.pdf` files into structured knowledge artifacts with detailed reports, tracking successes, failures, and summaries without auto-writ... It is an AI agent skill plugin for Claude Code / OpenClaw and has been downloaded 153 times so far.
How do I install docx-pdf-knowledge-parser?
Run the command `/install docx-pdf-knowledge-parser` in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.
Is docx-pdf-knowledge-parser free?
Yes. docx-pdf-knowledge-parser is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used without restriction.
Which platforms does docx-pdf-knowledge-parser support?
docx-pdf-knowledge-parser is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who develops docx-pdf-knowledge-parser?
It is developed and maintained by kaiasdobi (@kaiasdobi). The current version is v1.0.1.