/install document-handler
Document Handler
Extract text and metadata from common document formats, with basic conversion support.
Supported Formats
| Format | Extensions | Text Extract | Metadata | Convert |
|---|---|---|---|---|
| PDF | .pdf | ✅ pdftotext | ✅ pdfinfo | ✅ pdftoppm |
| Word | .docx | ✅ unzip + xml | ✅ | ✅ |
| Excel | .xlsx | ✅ unzip + xml | ✅ | ✅ |
| PowerPoint | .pptx | ✅ unzip + xml | ✅ | ✅ |
| EPUB | .epub | ✅ unzip + html | ✅ | ✅ |
| RTF | .rtf | ✅ textutil | ✅ | ✅ |
| OpenDocument | .odt, .ods, .odp | ✅ unzip + xml | ✅ | ✅ |
Quick Commands
PDF
# Extract text
pdftotext -layout input.pdf output.txt
# Get metadata
pdfinfo input.pdf
# Convert to images (for OCR or viewing)
pdftoppm -png input.pdf output_prefix
# Extract specific pages
pdftotext -f 5 -l 10 -layout input.pdf output.txt
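pdfinfo prints one `Key: value` pair per line, so a single field can be picked out with a small filter. The following is a minimal sketch, assuming awk is available; the function name `pdfinfo_field` is hypothetical and not part of the skill.

```shell
# Hypothetical helper: read pdfinfo output on stdin and print the value
# of the first field whose key matches $1 (for example Pages or Title).
pdfinfo_field() {
    awk -v key="$1" '
        !seen && index($0, key ":") == 1 {
            sub(/^[^:]*:[ \t]*/, "")   # drop the key, the colon and the padding
            print
            seen = 1
        }'
}
```

Usage: `pdfinfo input.pdf | pdfinfo_field Pages`. The function prints nothing when the field is absent.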
DOCX/XLSX/PPTX (Office Open XML)
# Extract text from DOCX
unzip -p input.docx word/document.xml | sed 's/<[^>]*>//g' | tr -s ' \n'
# Extract text from XLSX (shared strings across all sheets; numeric cells are not included)
unzip -p input.xlsx xl/sharedStrings.xml | sed 's/<[^>]*>//g' | tr -s '\n'
# Extract text from PPTX (quote the pattern so unzip, not the shell, expands it)
unzip -p input.pptx 'ppt/slides/*.xml' | sed 's/<[^>]*>//g' | tr -s ' \n'
# Get metadata
unzip -p input.docx docProps/core.xml
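Stripping tags alone runs neighboring paragraphs together, because Office Open XML keeps each paragraph in its own element. The sketch below turns paragraph boundaries into line breaks before removing the markup. It assumes awk is available. The function name `ooxml_to_text` is hypothetical, and only the three most common XML entities are decoded.

```shell
# Hypothetical helper: convert Office Open XML read from stdin to plain text.
# Paragraph ends (w:p in DOCX, a:p in PPTX) and shared-string entries
# (si in XLSX) become line breaks before the remaining tags are removed.
ooxml_to_text() {
    awk '{
        gsub(/<\/(w:p|a:p|si)>/, "\n")   # paragraph and string boundaries
        gsub(/<[^>]*>/, "")              # drop all remaining tags
        gsub(/&lt;/, "<")
        gsub(/&gt;/, ">")
        gsub(/&amp;/, "\\&")             # decode last to avoid double decoding
        print
    }'
}
```

Usage: `unzip -p input.docx word/document.xml | ooxml_to_text`. This is a text filter, not an XML parser, so unusual markup may still need a dedicated tool.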
RTF (macOS)
# Convert RTF to plain text
textutil -convert txt input.rtf -output output.txt
# Convert RTF to HTML
textutil -convert html input.rtf -output output.html
EPUB
# Extract and read EPUB content
unzip -l input.epub # List contents
unzip -p input.epub "*.html" | lynx -stdin -dump # Text via lynx
unzip -p input.epub "*.xhtml" | sed 's/<[^>]*>//g' # Raw text
OpenDocument (ODT/ODS/ODP)
# Extract text from ODT
unzip -p input.odt content.xml | sed 's/<[^>]*>//g' | tr -s ' \n'
# Extract from ODS
unzip -p input.ods content.xml | sed 's/<[^>]*>//g'
# Get metadata
unzip -p input.odt meta.xml
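The metadata files are raw XML, and both `meta.xml` (OpenDocument) and `docProps/core.xml` (Office Open XML) store Dublin Core elements such as `dc:title` and `dc:creator`. A minimal sketch for reading one element follows. The function name `xml_field` is hypothetical, and the pattern assumes the element holds plain text on a single line.

```shell
# Hypothetical helper: print the text content of the XML element named $1
# from the document on stdin. Simple pattern match, not a real XML parser.
xml_field() {
    sed -n "s|.*<$1[^>]*>\([^<]*\)</$1>.*|\1|p"
}
```

Usage: `unzip -p input.odt meta.xml | xml_field dc:creator`.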
Scripts
extract_document.sh
Extracts text and metadata from any supported document format.
~/Dropbox/jarvis/skills/document-handler/scripts/extract_document.sh <file>
Output:
- Text content to stdout
- Metadata as JSON comments
pdf_to_images.sh
Converts PDF pages to images for OCR or visual processing.
~/Dropbox/jarvis/skills/document-handler/scripts/pdf_to_images.sh <pdf> <output_dir> [dpi]
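Page images are the usual input for OCR on scanned PDFs. The sketch below shows one way to chain pdftoppm and tesseract. It is not the content of pdf_to_images.sh. The function name `ocr_pdf` and the 300 dpi default are assumptions, and both tools must be installed.

```shell
# Hypothetical helper: OCR a scanned PDF by rendering each page to PNG
# with pdftoppm and passing the images to tesseract. Text goes to stdout.
ocr_pdf() {
    pdf=$1
    dpi=${2:-300}   # assumed default resolution

    if [ ! -f "$pdf" ]; then
        echo "ocr_pdf: no such file: $pdf" >&2
        return 1
    fi
    for tool in pdftoppm tesseract; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "ocr_pdf: missing required tool: $tool" >&2
            return 1
        fi
    done

    tmp=$(mktemp -d) || return 1
    # pdftoppm writes page-1.png, page-2.png, ... (zero-padded for longer files)
    pdftoppm -r "$dpi" -png "$pdf" "$tmp/page" || { rm -rf "$tmp"; return 1; }
    for img in "$tmp"/page-*.png; do
        # "stdout" as the output base makes tesseract print the text
        tesseract "$img" stdout 2>/dev/null
    done
    rm -rf "$tmp"
}
```

Usage: `ocr_pdf scanned.pdf > scanned.txt`, or `ocr_pdf scanned.pdf 400` for a higher resolution.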
Workflow
- Identify format — Check file extension
- Extract text — Use appropriate tool
- Get metadata — Author, date, pages, etc.
- Process content — Summarize, search, transform
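The first workflow step can be sketched as a lookup on the file extension. This is a minimal illustration. The function name `detect_format` and the group labels it prints are hypothetical, and it inspects only the file name, not the file contents.

```shell
# Hypothetical helper: map a file name to one of the format groups
# handled above, based on its (case-insensitive) extension.
detect_format() {
    name=${1##*/}                 # drop any directory part
    case $name in
        *.*) ext=${name##*.} ;;
        *)   echo unknown; return 1 ;;
    esac
    ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')
    case $ext in
        pdf)            echo pdf ;;
        docx|xlsx|pptx) echo ooxml ;;
        epub)           echo epub ;;
        rtf)            echo rtf ;;
        odt|ods|odp)    echo opendocument ;;
        *)              echo unknown; return 1 ;;
    esac
}
```

A caller can then branch on the result, for example running `pdftotext -layout "$file" -` when `detect_format "$file"` prints `pdf`.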
Notes
- PDFs with scanned images need OCR (pdftoppm + tesseract)
- Encrypted PDFs require password
- Complex formatting may be lost in text extraction
- For tables in PDFs, consider tabula or camelot
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install document-handler
- After installation, invoke the skill by name or use /document-handler
- Provide required inputs per the skill's parameter spec and get structured output
What is Document Handler?
Read, extract text and metadata, and convert documents in formats like PDF, DOCX, XLSX, PPTX, EPUB, RTF, and OpenDocument. It is an AI Agent Skill for Claude Code / OpenClaw, with 360 downloads so far.
How do I install Document Handler?
Run "/install document-handler" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Document Handler free?
Yes. Document Handler is free and open source: you can download, install, and use it at no cost.
Which platforms does Document Handler support?
Document Handler runs anywhere OpenClaw or Claude Code is available. The RTF commands use textutil, which ships only with macOS.
Who created Document Handler?
It is built and maintained by Neckr0ik (@neckr0ik); the current version is v1.0.0.