
Document Handler

Name: Document Handler
Author: neckr0ik

by Neckr0ik · v1.0.0

cross-platform ✓ Security Clean

Downloads: 360

Install in OpenClaw

/install document-handler

Description

Read, extract text and metadata, and convert documents in formats like PDF, DOCX, XLSX, PPTX, EPUB, RTF, and OpenDocument.

README (SKILL.md)

Document Handler

Extract text and metadata from the document formats listed under Supported Formats.

Supported Formats

Format	Extensions	Text Extract	Metadata	Convert
PDF	.pdf	✅ pdftotext	✅ pdfinfo	✅ pdftoppm
Word	.docx	✅ unzip + xml	✅	✅
Excel	.xlsx	✅ unzip + xml	✅	✅
PowerPoint	.pptx	✅ unzip + xml	✅	✅
EPUB	.epub	✅ unzip + html	✅	✅
RTF	.rtf	✅ textutil	✅	✅
OpenDocument	.odt, .ods, .odp	✅ unzip + xml	✅	✅

Quick Commands

PDF

# Extract text
pdftotext -layout input.pdf output.txt

# Get metadata
pdfinfo input.pdf

# Convert to images (for OCR or viewing)
pdftoppm -png input.pdf output_prefix

# Extract specific pages
pdftotext -f 5 -l 10 -layout input.pdf output.txt
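
When scripting around these commands, the page count is often needed on its own. A minimal sketch, assuming the usual `Pages:` line in the pdfinfo report; the helper name `pages_from_pdfinfo` is hypothetical and not part of the skill:

```shell
# Hypothetical helper (not part of the skill): read a pdfinfo report on stdin
# and print the value of its "Pages:" line.
pages_from_pdfinfo() {
  awk -F': *' '/^Pages:/ { print $2 }'
}

# Usage (needs pdfinfo from poppler-utils):
#   pdfinfo input.pdf | pages_from_pdfinfo
```

The `^Pages:` anchor keeps the match away from the `Page size:` line that pdfinfo also prints.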

DOCX/XLSX/PPTX (Office Open XML)

# Extract text from DOCX
unzip -p input.docx word/document.xml | sed 's/<[^>]*>//g' | tr -s ' \n'

# Extract text from XLSX (shared strings across all sheets)
unzip -p input.xlsx xl/sharedStrings.xml | sed 's/<[^>]*>//g' | tr -s '\n'

# Extract text from PPTX (quote the pattern so unzip, not the shell, expands it)
unzip -p input.pptx 'ppt/slides/*.xml' | sed 's/<[^>]*>//g' | tr -s ' \n'

# Get metadata
unzip -p input.docx docProps/core.xml
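
Stripping every tag in one pass also removes paragraph boundaries, so text from adjacent paragraphs can run together. The sketch below is one possible refinement rather than the skill's own method: it turns each closing `</w:p>` into a line break before removing the remaining tags. The function name `docx_xml_to_text` is hypothetical.

```shell
# Hypothetical filter (not part of the skill): turn WordprocessingML read from
# stdin into plain text, one paragraph per line.
docx_xml_to_text() {
  awk '{
    gsub(/<\/w:p>/, "\n")   # end of a paragraph becomes a line break
    gsub(/<[^>]*>/, "")     # drop every remaining tag
    print
  }'
}

# Usage:
#   unzip -p input.docx word/document.xml | docx_xml_to_text
```

XML entities such as `&amp;` are left undecoded in this sketch.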

RTF (macOS)

# Convert RTF to plain text
textutil -convert txt input.rtf -output output.txt

# Convert RTF to HTML
textutil -convert html input.rtf -output output.html

EPUB

# Extract and read EPUB content
unzip -l input.epub                    # List contents
unzip -p input.epub "*.html" | lynx -stdin -dump  # Text via lynx
unzip -p input.epub "*.xhtml" | sed 's/<[^>]*>//g'  # Raw text

OpenDocument (ODT/ODS/ODP)

# Extract text from ODT
unzip -p input.odt content.xml | sed 's/<[^>]*>//g' | tr -s ' \n'

# Extract from ODS
unzip -p input.ods content.xml | sed 's/<[^>]*>//g'

# Get metadata
unzip -p input.odt meta.xml
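
The metadata files are raw XML, so a single field usually has to be picked out before it is useful. A minimal sketch, assuming the field is a simple element without attributes such as `dc:creator`; the helper name `xml_element` is hypothetical:

```shell
# Hypothetical helper (not part of the skill): print the text content of the
# named element from XML read on stdin. Elements carrying attributes are not
# matched by this simple pattern.
xml_element() {
  sed -n "s/.*<$1>\([^<]*\)<\/$1>.*/\1/p"
}

# Usage:
#   unzip -p input.odt meta.xml | xml_element dc:creator
#   unzip -p input.docx docProps/core.xml | xml_element dc:creator
```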

Scripts

extract_document.sh

Extracts text and metadata from any supported document format.

~/Dropbox/jarvis/skills/document-handler/scripts/extract_document.sh <file>

Output:

Text content to stdout
Metadata as JSON comments

pdf_to_images.sh

Converts PDF pages to images for OCR or visual processing.

~/Dropbox/jarvis/skills/document-handler/scripts/pdf_to_images.sh <pdf> <output_dir> [dpi]
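
The script source is not reproduced here. As a rough idea of what a wrapper with this argument order could look like, the sketch below is an illustrative stand-in built on pdftoppm. The function name, the default of 150 DPI, and the `page` file prefix are all assumptions, not details taken from the shipped script.

```shell
# Illustrative stand-in, NOT the script shipped with the skill.
# Usage: pdf_to_images <pdf> <output_dir> [dpi]
pdf_to_images() {
  if [ "$#" -lt 2 ]; then
    echo "usage: pdf_to_images <pdf> <output_dir> [dpi]" >&2
    return 2
  fi
  pdf=$1
  out_dir=$2
  dpi=${3:-150}   # assumed default; the real script may use another value
  mkdir -p "$out_dir" || return 1
  # -r sets the resolution in DPI; images are written as <out_dir>/page-<n>.png
  pdftoppm -r "$dpi" -png "$pdf" "$out_dir/page"
}
```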

Workflow

Identify format — Check file extension
Extract text — Use appropriate tool
Get metadata — Author, date, pages, etc.
Process content — Summarize, search, transform
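
The first workflow step can be sketched as a small lookup on the file extension. This is a minimal illustration, not code from the skill: the function name `detect_format` and the group labels it prints are hypothetical.

```shell
# Hypothetical helper (not part of the skill): map a file name to the group
# of commands that applies to it, based on the extension alone.
detect_format() {
  # Lowercase the extension so that REPORT.PDF and report.pdf match alike.
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    pdf)              echo pdf ;;
    docx|xlsx|pptx)   echo ooxml ;;
    epub)             echo epub ;;
    rtf)              echo rtf ;;
    odt|ods|odp)      echo opendocument ;;
    *)                echo unknown; return 1 ;;
  esac
}

# Usage:
#   detect_format report.PDF    # prints "pdf"
```

Checking only the extension is cheap, but it will misclassify renamed files.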

Notes

PDFs with scanned images need OCR (pdftoppm + tesseract)
Encrypted PDFs require password
Complex formatting may be lost in text extraction
For tables in PDFs, consider tabula or camelot
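
The OCR route mentioned in the first note can be sketched as follows. This is a minimal illustration under stated assumptions, not part of the skill: the function name `ocr_pdf`, the 300 DPI setting, and the default language `eng` are all choices made for the example.

```shell
# Sketch only (not part of the skill). Usage: ocr_pdf <pdf> [lang]
ocr_pdf() {
  if [ "$#" -lt 1 ]; then
    echo "usage: ocr_pdf <pdf> [lang]" >&2
    return 2
  fi
  tmp_dir=$(mktemp -d) || return 1
  # Render each page to PNG; 300 DPI is an assumed setting for this example.
  pdftoppm -r 300 -png "$1" "$tmp_dir/page" || { rm -rf "$tmp_dir"; return 1; }
  for img in "$tmp_dir"/page-*.png; do
    # Passing "stdout" as the output base makes tesseract print the text.
    tesseract "$img" stdout -l "${2:-eng}" 2>/dev/null
  done
  rm -rf "$tmp_dir"
}

# Usage:
#   ocr_pdf scanned.pdf > scanned.txt
```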

Usage Guidance

This skill appears to do what it says: extract text/metadata and convert documents. Before installing, be aware of the following: (1) it relies on many external CLI tools (pdftotext, pdfinfo, pdftoppm, unzip, textutil, lynx, tesseract, etc.) which are not declared — make sure those tools are available on your system or the commands will fail; (2) textutil is macOS-specific and some examples assume tools that may not exist on Linux/Windows; (3) extracted metadata can contain sensitive info (author, timestamps) — avoid passing files with secrets unless you trust the runtime; (4) the SKILL.md states it triggers on mentions of file paths, so consider whether you want automatic invocation in your agent. If you need higher assurance, review and run the two included scripts locally in a safe environment to confirm behavior.
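
Point (1) can be checked before installing. A minimal sketch, assuming a POSIX shell; the function name `missing_tools` is hypothetical and the tool list is simply the one named above:

```shell
# Hypothetical helper (not part of the skill): print each named tool that
# cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage:
#   missing_tools pdftotext pdfinfo pdftoppm unzip textutil lynx tesseract
```

Empty output means every listed tool was found. On Linux, textutil is expected to show up as missing because it is macOS-specific.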

Capability Analysis

Type: OpenClaw Skill

Name: document-handler

Version: 1.0.0

The document-handler skill bundle provides legitimate tools for extracting text and metadata from various document formats (PDF, Office, EPUB, etc.) using standard command-line utilities like pdftotext, unzip, and pdftoppm. The included scripts (extract_document.sh and pdf_to_images.sh) perform local file processing as described, with no evidence of network exfiltration, credential theft, or malicious intent.

Capability Assessment

ℹ Purpose & Capability

The name/description (document extraction and conversion) aligns with the included scripts and SKILL.md examples. However, the skill references many external CLI tools (pdftotext, pdfinfo, pdftoppm, unzip, textutil, lynx, tesseract, etc.) but declares no required binaries; the absence of declared required binaries is a documentation/packaging omission rather than a functional mismatch.

✓ Instruction Scope

SKILL.md and scripts explicitly instruct the agent to read local files, extract metadata and text, and convert PDFs to images. These actions are within the stated purpose. The README triggers on mentions of file paths which could cause frequent activations, but that behavior is consistent with a document-handler skill.

✓ Install Mechanism

There is no install spec (instruction-only plus two local scripts). Nothing is downloaded or written by an installer. The scripts only call local command-line tools; no remote code fetch or archive extraction from external URLs is present.

✓ Credentials

The skill requests no environment variables or credentials and the scripts do not read any env vars or config paths. This is proportionate to the document-processing purpose.

✓ Persistence & Privilege

The skill is not always-enabled and does not request elevated persistence. It does include a trigger definition (activate on mentions of document files) which is normal for an invocable skill; nothing in the files attempts to modify other skills or system-wide settings.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install document-handler
After installation, invoke the skill by name or use /document-handler
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

Initial release - extract text/metadata from PDF, DOCX, XLSX, PPTX, EPUB, RTF, ODT/ODS/ODP

Metadata

Slug document-handler

Version 1.0.0

License —

All-time Installs 2

Active Installs 2

Total Versions 1

Frequently Asked Questions

What is Document Handler?

Read, extract text and metadata, and convert documents in formats like PDF, DOCX, XLSX, PPTX, EPUB, RTF, and OpenDocument. It is an AI Agent Skill for Claude Code / OpenClaw, with 360 downloads so far.

How do I install Document Handler?

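Run "/install document-handler" in the OpenClaw or Claude Code chat to install it in one step. The skill itself needs no further setup, but its commands rely on external CLI tools (pdftotext, pdfinfo, pdftoppm, unzip, textutil, lynx, tesseract) that must already be present on the system.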

Is Document Handler free?

Yes, Document Handler is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Document Handler support?

Document Handler is listed as cross-platform and runs anywhere OpenClaw / Claude Code is available. The RTF commands use textutil, which is macOS-specific, so that part will not work on Linux or Windows.

Who created Document Handler?

It is built and maintained by Neckr0ik (@neckr0ik); the current version is v1.0.0.
