PDF
Total downloads: 360 · Favorites: 0 · Current installs: 2 · Versions: 1
Install in OpenClaw
/install document-handler
Description
Read, extract text and metadata, and convert documents in formats like PDF, DOCX, XLSX, PPTX, EPUB, RTF, and OpenDocument.
Usage (SKILL.md)
Document Handler
Extract text, metadata, and content from the supported document formats listed below.
Supported Formats
| Format | Extensions | Text Extract | Metadata | Convert |
|---|---|---|---|---|
| PDF | .pdf | ✅ pdftotext | ✅ pdfinfo | ✅ pdftoppm |
| Word | .docx | ✅ unzip + xml | ✅ | ✅ |
| Excel | .xlsx | ✅ unzip + xml | ✅ | ✅ |
| PowerPoint | .pptx | ✅ unzip + xml | ✅ | ✅ |
| EPUB | .epub | ✅ unzip + html | ✅ | ✅ |
| RTF | .rtf | ✅ textutil | ✅ | ✅ |
| OpenDocument | .odt, .ods, .odp | ✅ unzip + xml | ✅ | ✅ |
Quick Commands
PDF
# Extract text
pdftotext -layout input.pdf output.txt
# Get metadata
pdfinfo input.pdf
# Convert to images (for OCR or viewing)
pdftoppm -png input.pdf output_prefix
# Extract specific pages
pdftotext -f 5 -l 10 -layout input.pdf output.txt
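When only one metadata field is needed, the output of pdfinfo can be filtered rather than read in full. The following is a minimal sketch, assuming the usual `Key:   value` layout that pdfinfo prints; the helper name `pdfinfo_field` is hypothetical and not part of this skill.

```shell
# Hypothetical helper: print the value of one field from pdfinfo output on stdin.
# Assumes one "Key:   value" pair per line, as pdfinfo normally prints.
pdfinfo_field() {
  awk -v key="$1" -F': *' '
    $1 == key {
      sub(/^[^:]*: */, "")   # drop the key and the padding after the colon
      print
      exit
    }'
}

# Usage (requires poppler-utils):
#   pdfinfo input.pdf | pdfinfo_field Pages
#   pdfinfo input.pdf | pdfinfo_field Title
```

Reading from stdin keeps the helper independent of pdfinfo itself, so it can be tried against saved output as well.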
DOCX/XLSX/PPTX (Office Open XML)
# Extract text from DOCX
unzip -p input.docx word/document.xml | sed 's/<[^>]*>//g' | tr -s ' \n'
# Extract text from XLSX (shared strings only; numeric cell values live in xl/worksheets/*.xml)
unzip -p input.xlsx xl/sharedStrings.xml | sed 's/<[^>]*>//g' | tr -s '\n'
# Extract text from PPTX (pattern quoted so unzip, not the shell, expands it)
unzip -p input.pptx 'ppt/slides/*.xml' | sed 's/<[^>]*>//g' | tr -s ' \n'
# Get metadata
unzip -p input.docx docProps/core.xml
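Stripping every tag in one pass runs adjacent paragraphs together, because word/document.xml is usually a single long line. A possible refinement, sketched under the assumption that paragraphs close with `</w:p>` (as they do in WordprocessingML), is to turn those closing tags into line breaks first. The function name `docx_xml_to_text` is hypothetical, and XML entities such as `&amp;` are left undecoded.

```shell
# Hypothetical helper: convert WordprocessingML on stdin to one line of text per paragraph.
docx_xml_to_text() {
  awk '{
    gsub(/<\/w:p>/, "\n")   # paragraph end becomes a line break
    gsub(/<[^>]*>/, "")     # drop every remaining tag
    print
  }' | sed '/^[[:space:]]*$/d'   # remove the empty lines left behind
}

# Usage:
#   unzip -p input.docx word/document.xml | docx_xml_to_text
```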
RTF (macOS)
# Convert RTF to plain text
textutil -convert txt input.rtf -output output.txt
# Convert RTF to HTML
textutil -convert html input.rtf -output output.html
EPUB
# Extract and read EPUB content
unzip -l input.epub # List contents
unzip -p input.epub "*.html" | lynx -stdin -dump # Text via lynx
unzip -p input.epub "*.xhtml" | sed 's/<[^>]*>//g' # Raw text
OpenDocument (ODT/ODS/ODP)
# Extract text from ODT
unzip -p input.odt content.xml | sed 's/<[^>]*>//g' | tr -s ' \n'
# Extract from ODS
unzip -p input.ods content.xml | sed 's/<[^>]*>//g'
# Get metadata
unzip -p input.odt meta.xml
Scripts
extract_document.sh
Extracts text and metadata from any supported document format.
~/Dropbox/jarvis/skills/document-handler/scripts/extract_document.sh <file>
Output:
- Text content to stdout
- Metadata as JSON comments
pdf_to_images.sh
Converts PDF pages to images for OCR or visual processing.
~/Dropbox/jarvis/skills/document-handler/scripts/pdf_to_images.sh <pdf> <output_dir> [dpi]
Workflow
- Identify format — Check file extension
- Extract text — Use appropriate tool
- Get metadata — Author, date, pages, etc.
- Process content — Summarize, search, transform
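The first workflow step, identifying the format from the file extension, could be sketched as below. This is an illustration only: `detect_format` and its labels are hypothetical and are not taken from the bundled scripts.

```shell
# Hypothetical helper: map a file name to a format label by its extension.
detect_format() {
  # Lower-case the extension so "REPORT.PDF" and "report.pdf" are treated alike.
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    pdf)            echo pdf ;;
    docx|xlsx|pptx) echo ooxml ;;
    epub)           echo epub ;;
    rtf)            echo rtf ;;
    odt|ods|odp)    echo opendocument ;;
    *)              echo unknown; return 1 ;;
  esac
}

# Usage: choose the extraction command from the label.
#   case "$(detect_format "$file")" in
#     pdf) pdftotext -layout "$file" - ;;
#     rtf) textutil -convert txt "$file" -stdout ;;   # macOS only
#   esac
```

An extension check is cheap but trusts the file name; a renamed file would be dispatched to the wrong tool.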
Notes
- PDFs with scanned images need OCR (pdftoppm + tesseract)
- Encrypted PDFs require a password (pdftotext and pdfinfo accept it via -upw or -opw)
- Complex formatting may be lost in text extraction
- For tables in PDFs, consider tabula or camelot
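The OCR note above can be sketched as a fallback: use the text layer when there is one, otherwise rasterize the pages and run them through tesseract. This assumes pdftotext, pdftoppm, and tesseract are installed; the function names and the 300 dpi setting are illustrative choices, not part of this skill.

```shell
# Hypothetical helper: succeed if stdin contains no non-whitespace characters.
# An image-only PDF typically yields only whitespace and form feeds from pdftotext.
is_blank_text() {
  ! grep -q '[^[:space:]]'
}

# Sketch: print the text layer if present, otherwise OCR each page in order.
extract_or_ocr() {
  pdf=$1
  if ! pdftotext -layout "$pdf" - | is_blank_text; then
    pdftotext -layout "$pdf" -
    return
  fi
  tmp=$(mktemp -d) || return 1
  pdftoppm -r 300 -png "$pdf" "$tmp/page"      # writes page-1.png, page-2.png, ...
  for img in "$tmp"/page-*.png; do
    [ -e "$img" ] || continue                  # skip if no pages were produced
    tesseract "$img" stdout 2>/dev/null        # OCR result goes to stdout
  done
  rm -rf "$tmp"
}

# Usage:
#   extract_or_ocr scanned.pdf > scanned.txt
```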
Security Recommendations
This skill appears to do what it says: extract text/metadata and convert documents. Before installing, be aware of the following: (1) it relies on many external CLI tools (pdftotext, pdfinfo, pdftoppm, unzip, textutil, lynx, tesseract, etc.) which are not declared — make sure those tools are available on your system or the commands will fail; (2) textutil is macOS-specific and some examples assume tools that may not exist on Linux/Windows; (3) extracted metadata can contain sensitive info (author, timestamps) — avoid passing files with secrets unless you trust the runtime; (4) the SKILL.md states it triggers on mentions of file paths, so consider whether you want automatic invocation in your agent. If you need higher assurance, review and run the two included scripts locally in a safe environment to confirm behavior.
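Because the skill does not declare the CLI tools it calls, a preflight check can confirm they are on PATH before any command runs. A minimal sketch, where `require_tools` is a hypothetical name and the tool list should be trimmed to the formats actually in use:

```shell
# Hypothetical preflight check: report every named tool that is not on PATH.
# Returns non-zero if at least one tool is missing.
require_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage:
#   require_tools pdftotext pdfinfo pdftoppm unzip lynx tesseract || exit 1
#   require_tools textutil || echo "RTF conversion needs macOS" >&2
```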
Functional Analysis
Type: OpenClaw Skill
Name: document-handler
Version: 1.0.0
The document-handler skill bundle provides legitimate tools for extracting text and metadata from various document formats (PDF, Office, EPUB, etc.) using standard command-line utilities like pdftotext, unzip, and pdftoppm. The included scripts (extract_document.sh and pdf_to_images.sh) perform local file processing as described, with no evidence of network exfiltration, credential theft, or malicious intent.
Capability Assessment
Purpose & Capability
The name/description (document extraction and conversion) aligns with the included scripts and SKILL.md examples. However, the skill references many external CLI tools (pdftotext, pdfinfo, pdftoppm, unzip, textutil, lynx, tesseract, etc.) but declares no required binaries; the absence of declared required binaries is a documentation/packaging omission rather than a functional mismatch.
Instruction Scope
SKILL.md and scripts explicitly instruct the agent to read local files, extract metadata and text, and convert PDFs to images. These actions are within the stated purpose. The README triggers on mentions of file paths which could cause frequent activations, but that behavior is consistent with a document-handler skill.
Install Mechanism
There is no install spec (instruction-only plus two local scripts). Nothing is downloaded or written by an installer. The scripts only call local command-line tools; no remote code fetch or archive extraction from external URLs is present.
Credentials
The skill requests no environment variables or credentials and the scripts do not read any env vars or config paths. This is proportionate to the document-processing purpose.
Persistence & Privilege
The skill is not always-enabled and does not request elevated persistence. It does include a trigger definition (activate on mentions of document files) which is normal for an invocable skill; nothing in the files attempts to modify other skills or system-wide settings.
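How to Use
- Make sure OpenClaw is installed (locally or via Docker)
- Enter the install command in the chat box: /install document-handler
- Once installation finishes, call the Skill by name or use /document-handler to trigger it
- Provide the inputs described in the Skill's parameter notes to get structured output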
Version History
v1.0.0
Initial release - extract text/metadata from PDF, DOCX, XLSX, PPTX, EPUB, RTF, ODT/ODS/ODP
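FAQ
What is Document Handler?
Read, extract text and metadata, and convert documents in formats like PDF, DOCX, XLSX, PPTX, EPUB, RTF, and OpenDocument. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 360 downloads to date.
How do I install Document Handler?
Run the command "/install document-handler" in the OpenClaw or Claude Code chat box for a one-step install; no extra configuration is required.
Is Document Handler free?
Yes. Document Handler is completely free (free and open source) and can be downloaded, installed, and used without restriction.
Which platforms does Document Handler support?
Document Handler is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who developed Document Handler?
It is developed and maintained by Neckr0ik (@neckr0ik); the current version is v1.0.0.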