
Office Document Extractor

by michealxie001 · GitHub ↗ · v1.0.1 · MIT-0
cross-platform ✓ Security Clean
79 Downloads · 0 Stars · 0 Active Installs · 2 Versions
Install in OpenClaw
/install office-doc-extractor
Description
Convert Microsoft Office documents (DOCX, XLSX, PPTX) to Markdown without any external dependencies. Use when the user needs to extract text from Word docume...
README (SKILL.md)

Office Document Extractor

Zero-dependency converter for Microsoft Office documents. Extracts text and structure from DOCX, XLSX, and PPTX files into clean Markdown.

Quick Start

# Single file
python3 scripts/main.py report.docx -o report.md

# Batch convert a directory
python3 scripts/main.py ./documents --batch -o ./markdown

Supported Formats

| Format     | Extension | Output                 |
| ---------- | --------- | ---------------------- |
| Word       | .docx     | Headings, paragraphs   |
| Excel      | .xlsx     | Tables (one per sheet) |
| PowerPoint | .pptx     | Slides as sections     |
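
As an illustration of the "one table per sheet" output, the sketch below renders a list of rows as a Markdown table. It is not taken from the skill's source; `rows_to_markdown` is a hypothetical helper, and treating the first row as the header is an assumption.

```python
def rows_to_markdown(rows):
    """Render rows (each a list of cell values) as a Markdown table.

    Hypothetical helper: the first row is assumed to be the header.
    """
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def fmt(row):
        # Empty cells (None) become empty strings; short rows are padded.
        cells = ["" if c is None else str(c) for c in row]
        cells += [""] * (width - len(cells))
        # Escape pipes and flatten newlines so cell text cannot break the table.
        cells = [c.replace("|", "\\|").replace("\n", " ") for c in cells]
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    lines = [fmt(header), "| " + " | ".join(["---"] * width) + " |"]
    lines.extend(fmt(row) for row in body)
    return "\n".join(lines)


print(rows_to_markdown([["Name", "Qty"], ["Widget", 3], ["Gadget", None]]))
```

Padding short rows keeps every line at the same column count, which Markdown renderers generally expect.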

How It Works

  • DOCX: Parses the ZIP archive's XML directly using Python's zipfile and xml.etree
  • XLSX: Uses bundled openpyxl (pure Python, no C extensions)
  • PPTX: Parses the ZIP archive's slide XML directly

No external commands, no network calls, no pip install required.
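
A minimal sketch of the DOCX approach described above, using only `zipfile` and `xml.etree`. It is not the skill's actual `docx_extractor.py`; the function name `docx_to_markdown` is hypothetical, and it assumes headings use the default English style IDs (`Heading1`, `Heading2`, ...).

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_markdown(source):
    """Extract headings and paragraphs from a DOCX path or file-like object."""
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    lines = []
    for para in root.iter(W + "p"):
        # A paragraph's text is split across runs; join every <w:t> node.
        text = "".join(t.text or "" for t in para.iter(W + "t"))
        if not text.strip():
            continue
        style = para.find(f"{W}pPr/{W}pStyle")
        name = style.get(W + "val", "") if style is not None else ""
        # Assumption: heading styles are named "Heading<level>".
        if name.startswith("Heading") and name[7:].isdigit():
            level = min(int(name[7:]), 6)
            lines.append("#" * level + " " + text)
        else:
            lines.append(text)
    return "\n\n".join(lines)
```

PPTX files could presumably be handled the same way by reading the slide parts inside the archive, although the element names and namespaces differ.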

Usage

Single File

python3 scripts/main.py <input_file> [-o <output.md>]

The format is detected from the file extension. If -o is omitted, output is written to <input>.md.
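
Extension-based detection and the default output path might look like the sketch below. The names `detect_format` and `default_output` are hypothetical, and it assumes that the default output replaces the input's extension with `.md`, as in the Quick Start example (`report.docx` to `report.md`).

```python
from pathlib import Path

# Supported extensions mapped to a format key (keys are illustrative).
SUPPORTED = {".docx": "docx", ".xlsx": "xlsx", ".pptx": "pptx"}


def detect_format(path):
    """Return the format key for a path, based only on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    return SUPPORTED[suffix]


def default_output(path):
    """Assumed default: same location and name, with a .md extension."""
    return Path(path).with_suffix(".md")
```

Lower-casing the suffix lets files such as `Report.DOCX` be recognized as well.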

Batch Conversion

python3 scripts/main.py <input_directory> --batch [-o <output_directory>]

Converts all .docx, .xlsx, and .pptx files in the directory. Results are saved to markdown_output/ by default.
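
The batch step can be pictured as pairing each supported file with a target path. The sketch below is an assumption about how that could work, not the skill's code: `plan_batch` is a hypothetical name, the scan is non-recursive, and Office lock files (names starting with `~$`) are skipped.

```python
from pathlib import Path


def plan_batch(input_dir, output_dir="markdown_output"):
    """Return (source, target) pairs for supported files directly in input_dir."""
    exts = {".docx", ".xlsx", ".pptx"}
    out = Path(output_dir)
    pairs = []
    for src in sorted(Path(input_dir).iterdir()):
        if not src.is_file() or src.suffix.lower() not in exts:
            continue
        if src.name.startswith("~$"):
            # Temporary lock file created while a document is open in Office.
            continue
        pairs.append((src, out / (src.stem + ".md")))
    return pairs
```

With this naming scheme, `report.docx` and `report.xlsx` would map to the same `report.md`, so a real implementation would need some way to resolve such collisions.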

Resources

scripts/

  • main.py — Unified CLI for single-file and batch conversion
  • docx_extractor.py — DOCX → Markdown (standard library only)
  • xlsx_extractor.py — XLSX → Markdown tables (bundled openpyxl)
  • pptx_extractor.py — PPTX → Markdown (standard library only)

Bundled Dependencies

  • openpyxl/ — Pure Python Excel library (v3.1.5)
  • et_xmlfile/ — openpyxl dependency (pure Python)

Limitations

  • Does not extract images or embedded objects (text only)
  • Does not preserve complex formatting (colors, fonts, layouts)
  • Does not handle encrypted/password-protected files
  • No OCR for scanned documents (use OpenClaw's native pdf tool for that)
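
Password-protected Office files are stored in an OLE compound container rather than a plain ZIP archive, so a converter can detect them up front and fail with a clear message. The sketch below shows one way to do that check. `classify_container` is a hypothetical helper, not part of the skill, and it only inspects the container type.

```python
import zipfile

# Signature of an OLE compound file (legacy .doc/.xls/.ppt or encrypted OOXML).
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def classify_container(fileobj):
    """Return "ole", "zip", or "unknown" for a seekable binary file object."""
    head = fileobj.read(8)
    fileobj.seek(0)
    if head == OLE_MAGIC:
        return "ole"
    is_zip = zipfile.is_zipfile(fileobj)
    fileobj.seek(0)  # leave the stream rewound for the caller
    return "zip" if is_zip else "unknown"
```

A result of `"ole"` for a file named `.docx`, `.xlsx`, or `.pptx` suggests the file is encrypted or is really a legacy binary document. In both cases the ZIP-based extractors cannot read it.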

Why This Skill?

Existing markitdown-based skills require pip install or external CLI tools, which trigger ClawHub security warnings. This skill is fully self-contained: install it and use it immediately, even offline.

Usage Guidance
This looks consistent with an offline document converter. Before installing, be comfortable running the bundled Python code, use it only on documents you intend to extract, and remember that the generated Markdown may contain sensitive or untrusted text.
Capability Analysis
Type: OpenClaw Skill
Name: office-doc-extractor
Version: 1.0.1
The office-doc-extractor skill is a functional tool designed to convert DOCX, XLSX, and PPTX files into Markdown. It uses the Python standard library (zipfile, xml.etree) for Word and PowerPoint files and includes a bundled version of the openpyxl library for Excel files to maintain its 'zero-dependency' claim. The code logic in scripts/main.py, scripts/docx_extractor.py, and scripts/xlsx_extractor.py is transparent and strictly aligned with the stated purpose. No evidence of data exfiltration, network activity, or malicious prompt injection was found.
Capability Assessment
Purpose & Capability
The documented purpose, CLI, and visible source align: it converts DOCX/XLSX/PPTX files to Markdown and writes local output files. Users should notice that converted Markdown can contain the full text of private documents.
Instruction Scope
The instructions are user-directed examples for running the converter; there is no evidence of hidden goal changes, forced autonomous execution, or prompt-injection style instructions.
Install Mechanism
There is no install spec or network download, but the skill is executed as local Python code and bundles openpyxl/et_xmlfile. The registry source is unknown and no homepage is provided, so dependency provenance is less transparent.
Credentials
Local file reads and Markdown writes are proportionate to document conversion. Batch mode can process all supported files in a selected directory, so users should scope input and output paths carefully.
Persistence & Privilege
No credentials, privileged APIs, background workers, or ongoing persistence are shown. The only persistence evidenced is user-directed creation of Markdown output files.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install office-doc-extractor
  3. After installation, invoke the skill by name or use /office-doc-extractor
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.1
Fix: Removed pycache, repackaged clean build
v1.0.0
- Initial release of office-doc-extractor: convert DOCX, XLSX, and PPTX files to Markdown using a pure Python, zero-dependency approach.
- Supports extraction of text and structure: Word headings/paragraphs, Excel tables, and PowerPoint slides.
- Works offline: no pip installs, subprocess calls, or network access required.
- Includes unified CLI for both single-file and batch directory conversion.
- Bundles pure Python openpyxl and et_xmlfile for Excel support.
Metadata
Slug office-doc-extractor
Version 1.0.1
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 2
Frequently Asked Questions

What is Office Document Extractor?

Convert Microsoft Office documents (DOCX, XLSX, PPTX) to Markdown without any external dependencies. Use when the user needs to extract text from Word docume... It is an AI Agent Skill for Claude Code / OpenClaw, with 79 downloads so far.

How do I install Office Document Extractor?

Run "/install office-doc-extractor" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Office Document Extractor free?

Yes, Office Document Extractor is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Office Document Extractor support?

Office Document Extractor is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Office Document Extractor?

It is built and maintained by michealxie001 (@michealxie001); the current version is v1.0.1.
