Office Document Extractor
/install office-doc-extractor
Zero-dependency converter for Microsoft Office documents. Extracts text and structure from DOCX, XLSX, and PPTX files into clean Markdown.
Quick Start
# Single file
python3 scripts/main.py report.docx -o report.md
# Batch convert a directory
python3 scripts/main.py ./documents --batch -o ./markdown
Supported Formats
| Format | Extension | Output |
|---|---|---|
| Word | .docx | Headings, paragraphs |
| Excel | .xlsx | Tables (one per sheet) |
| PowerPoint | .pptx | Slides as sections |
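The Excel row above maps each sheet to one Markdown table. The following is a minimal sketch of that rendering step only, assuming the sheet has already been read into a list of rows and that the first row is the header. `rows_to_markdown` is a hypothetical helper, not a function taken from this skill's scripts.

```python
def rows_to_markdown(rows):
    """Render a list of rows (first row = header) as a Markdown table.

    Hypothetical helper for illustration; not part of the skill's code.
    """
    # Normalise every cell to a string; empty cells become "".
    cells = [["" if c is None else str(c) for c in row] for row in rows]
    if not cells:
        return ""

    # Pad short rows so every row has the same number of columns.
    width = max(len(row) for row in cells)
    padded = [row + [""] * (width - len(row)) for row in cells]

    def fmt(row):
        # Escape pipes and flatten newlines so a cell cannot break the table.
        safe = [c.replace("|", "\\|").replace("\n", " ") for c in row]
        return "| " + " | ".join(safe) + " |"

    header, *body = padded
    lines = [fmt(header), "|" + "---|" * width]
    lines.extend(fmt(row) for row in body)
    return "\n".join(lines)


if __name__ == "__main__":
    print(rows_to_markdown([["Name", "Qty"], ["Widget", 3], ["Gadget", None]]))
```

Treating the first row as the header is a simplification; a sheet without a header row would need a synthetic one.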
How It Works
- DOCX: Parses the ZIP archive's XML directly using Python's zipfile and xml.etree
- XLSX: Uses bundled openpyxl (pure Python, no C extensions)
- PPTX: Parses the ZIP archive's slide XML directly
No external commands, no network calls, no pip install required.
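To make the DOCX approach concrete, here is a minimal sketch of reading word/document.xml out of the archive with only the standard library. It is not the code in docx_extractor.py. The function name `docx_to_markdown` is hypothetical, and the sketch assumes headings use the default English style IDs (Heading1, Heading2, ...), which can differ by template or locale.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def docx_to_markdown(source):
    """Return Markdown for a .docx given a path or a binary file-like object.

    Hypothetical helper for illustration; handles headings and paragraphs only.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    blocks = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is spread across one or more <w:t> runs.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        if not text.strip():
            continue

        # Assumption: heading paragraphs carry a style ID like "Heading2".
        style = para.find("w:pPr/w:pStyle", NS)
        style_id = style.get(f"{{{W_NS}}}val", "") if style is not None else ""
        if style_id.startswith("Heading") and style_id[7:].isdigit():
            level = max(1, min(int(style_id[7:]), 6))
            blocks.append("#" * level + " " + text)
        else:
            blocks.append(text)
    return "\n\n".join(blocks) + "\n"
```

The PPTX path can follow the same pattern with the slide parts of the archive and the DrawingML text elements instead of `w:t`.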
Usage
Single File
python3 scripts/main.py <input_file> [-o <output.md>]
Auto-detects format from file extension. If -o is omitted, outputs to <input>.md.
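Extension-based detection can be as small as a lookup table. The sketch below is an illustration, not the logic in main.py: `detect_format` and `FORMATS` are hypothetical names, and matching extensions case-insensitively is an assumption.

```python
from pathlib import Path

# Hypothetical mapping from file extension to extractor name.
FORMATS = {".docx": "docx", ".xlsx": "xlsx", ".pptx": "pptx"}


def detect_format(path):
    """Return "docx", "xlsx", or "pptx" based on the file extension."""
    suffix = Path(path).suffix.lower()  # assumption: case-insensitive match
    if suffix not in FORMATS:
        raise ValueError(f"Unsupported file type: {suffix or '(no extension)'}")
    return FORMATS[suffix]
```

Because detection relies on the extension alone, a renamed or mislabeled file would be routed to the wrong extractor.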
Batch Conversion
python3 scripts/main.py <input_directory> --batch [-o <output_directory>]
Converts all .docx, .xlsx, .pptx files in the directory. Results saved to markdown_output/ by default.
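A batch run boils down to picking the supported files and pairing each with an output path. The sketch below shows one way to plan that. It assumes a non-recursive scan and output names formed from the file stem plus .md, neither of which is confirmed by the text above, and `plan_batch` is a hypothetical name.

```python
from pathlib import Path

SUPPORTED = {".docx", ".xlsx", ".pptx"}


def plan_batch(input_dir, output_dir="markdown_output"):
    """Return (source, target) path pairs for supported files in input_dir.

    Hypothetical helper: scans only the top level of input_dir (assumption)
    and names each target <stem>.md inside output_dir (assumption).
    """
    out = Path(output_dir)
    pairs = []
    for src in sorted(Path(input_dir).iterdir()):
        if src.is_file() and src.suffix.lower() in SUPPORTED:
            pairs.append((src, out / (src.stem + ".md")))
    return pairs
```

With stem-based naming, report.docx and report.xlsx would map to the same target, so a real implementation would need a rule for such collisions.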
Resources
scripts/
- main.py — Unified CLI for single-file and batch conversion
- docx_extractor.py — DOCX → Markdown (standard library only)
- xlsx_extractor.py — XLSX → Markdown tables (bundled openpyxl)
- pptx_extractor.py — PPTX → Markdown (standard library only)
Bundled Dependencies
- openpyxl/ — Pure Python Excel library (v3.1.5)
- et_xmlfile/ — openpyxl dependency (pure Python)
Limitations
- Does not extract images or embedded objects (text only)
- Does not preserve complex formatting (colors, fonts, layouts)
- Does not handle encrypted/password-protected files
- No OCR for scanned documents (use OpenClaw's native pdf tool for that)
Why This Skill?
Existing markitdown-based skills require pip install or external CLI tools, which triggers ClawHub security warnings. This skill is 100% self-contained — install it and use it immediately, even offline.
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install office-doc-extractor
- After installation, invoke the skill by name or use /office-doc-extractor
- Provide required inputs per the skill's parameter spec and get structured output
What is Office Document Extractor?
Convert Microsoft Office documents (DOCX, XLSX, PPTX) to Markdown without any external dependencies. Use when the user needs to extract text from Word docume... It is an AI Agent Skill for Claude Code / OpenClaw, with 79 downloads so far.
How do I install Office Document Extractor?
Run "/install office-doc-extractor" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Office Document Extractor free?
Yes, Office Document Extractor is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Office Document Extractor support?
Office Document Extractor is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created Office Document Extractor?
It is built and maintained by michealxie001 (@michealxie001); the current version is v1.0.1.