michealxie001

Office Document Extractor

Author: michealxie001 · GitHub · v1.0.1 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 79
Favorites: 0
Current installs: 0
Versions: 2
Install in OpenClaw
/install office-doc-extractor
Description
Convert Microsoft Office documents (DOCX, XLSX, PPTX) to Markdown without any external dependencies. Use when the user needs to extract text from Word docume...
Usage instructions (SKILL.md)

Office Document Extractor

Zero-dependency converter for Microsoft Office documents. Extracts text and structure from DOCX, XLSX, and PPTX files into clean Markdown.

Quick Start

# Single file
python3 scripts/main.py report.docx -o report.md

# Batch convert a directory
python3 scripts/main.py ./documents --batch -o ./markdown

Supported Formats

| Format | Extension | Output |
| --- | --- | --- |
| Word | .docx | Headings, paragraphs |
| Excel | .xlsx | Tables (one per sheet) |
| PowerPoint | .pptx | Slides as sections |
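
The "Tables (one per sheet)" output for Excel can be pictured with a small helper. This is a minimal sketch, not the skill's actual code: `rows_to_markdown_table` is a hypothetical name, and it assumes the rows have already been read from a sheet (the skill delegates that step to its bundled openpyxl), with the first row treated as the header.

```python
def rows_to_markdown_table(rows):
    """Render rows (first row = header) as a Markdown table string.

    Hypothetical helper for illustration; not taken from the skill's source.
    """
    rows = [list(row) for row in rows]
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def fmt(row):
        # Empty cells become empty strings; short rows are padded to full width.
        cells = ["" if cell is None else str(cell) for cell in row]
        cells += [""] * (width - len(cells))
        # Pipes and newlines would break the table layout, so neutralise them.
        cells = [c.replace("|", "\\|").replace("\n", " ") for c in cells]
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    lines = [fmt(header), "| " + " | ".join(["---"] * width) + " |"]
    lines.extend(fmt(row) for row in body)
    return "\n".join(lines)


if __name__ == "__main__":
    print(rows_to_markdown_table([["Name", "Qty"], ["Widget", 3], ["Gadget", None]]))
```

Escaping `|` inside cells matters in practice, because spreadsheet text often contains it and an unescaped pipe would split one cell into two columns.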

How It Works

  • DOCX: Parses the ZIP archive's XML directly using Python's zipfile and xml.etree
  • XLSX: Uses bundled openpyxl (pure Python, no C extensions)
  • PPTX: Parses the ZIP archive's slide XML directly

No external commands, no network calls, no pip install required.
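
To make the DOCX path concrete, here is a minimal sketch of reading `word/document.xml` with only `zipfile` and `xml.etree`. It is an illustration under assumptions, not the contents of `docx_extractor.py`: the function names are hypothetical, headings are assumed to use style ids of the form `Heading1`, `Heading2`, and so on, and the sample document is built in memory so the block runs on its own.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def docx_to_markdown(source):
    """Return Markdown for the paragraphs of a .docx (path or file object)."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    lines = []
    for para in root.iter(W + "p"):
        # A paragraph's text is spread over one or more <w:t> runs.
        text = "".join(t.text or "" for t in para.iter(W + "t")).strip()
        if not text:
            continue
        # Assumption: heading paragraphs carry a style id such as "Heading1".
        style = para.find(W + "pPr/" + W + "pStyle")
        style_id = style.get(W + "val", "") if style is not None else ""
        suffix = style_id[len("Heading"):]
        if style_id.startswith("Heading") and suffix.isdigit():
            level = max(1, min(int(suffix), 6))
            lines.append("#" * level + " " + text)
        else:
            lines.append(text)
    return "\n\n".join(lines)


def make_sample_docx():
    """Build a tiny in-memory archive holding only word/document.xml."""
    xml = (
        '<w:document xmlns:w="%s"><w:body>'
        '<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr>'
        "<w:r><w:t>Title</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
        "</w:body></w:document>" % W_NS
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(docx_to_markdown(make_sample_docx()))
```

Real documents may name heading styles differently (for example in localized templates), so a production extractor would likely need to consult `word/styles.xml` as well.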

Usage

Single File

python3 scripts/main.py <input_file> [-o <output.md>]

Auto-detects the format from the file extension. If -o is omitted, output is written to <input>.md.
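
Extension-based detection and the default output name could look roughly like the sketch below. The names `detect_format` and `default_output_path` are hypothetical, and the block assumes that the default output replaces the input's extension with `.md` (so `report.docx` becomes `report.md`); the skill's real behaviour may differ.

```python
from pathlib import Path

# Extensions listed under "Supported Formats", mapped to a format label.
SUPPORTED = {".docx": "docx", ".xlsx": "xlsx", ".pptx": "pptx"}


def detect_format(path):
    """Return the format label for a path, based only on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED:
        raise ValueError("Unsupported file type: %r" % suffix)
    return SUPPORTED[suffix]


def default_output_path(path):
    """Assumed default when -o is omitted: same location, .md extension."""
    return Path(path).with_suffix(".md")


if __name__ == "__main__":
    for name in ["report.docx", "Budget.XLSX", "deck.pptx"]:
        print(name, detect_format(name), default_output_path(name))
```

Lower-casing the suffix keeps files such as `Budget.XLSX` from being rejected on case-sensitive file systems.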

Batch Conversion

python3 scripts/main.py <input_directory> --batch [-o <output_directory>]

Converts all .docx, .xlsx, .pptx files in the directory. Results saved to markdown_output/ by default.
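
Batch mode amounts to collecting the supported files and pairing each with an output path. The sketch below shows one way to plan that work; `plan_batch` is a hypothetical name, and the block assumes a non-recursive scan and output names derived from the input file's stem, neither of which is confirmed by the description above.

```python
import os
import tempfile
from pathlib import Path

EXTENSIONS = {".docx", ".xlsx", ".pptx"}


def plan_batch(input_dir, output_dir="markdown_output"):
    """Return (source, target) path pairs for supported files in input_dir.

    Assumption: only files directly inside input_dir are considered.
    """
    pairs = []
    for source in sorted(Path(input_dir).iterdir()):
        if source.is_file() and source.suffix.lower() in EXTENSIONS:
            pairs.append((source, Path(output_dir) / (source.stem + ".md")))
    return pairs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        for name in ["a.docx", "b.xlsx", "notes.txt"]:
            open(os.path.join(folder, name), "w").close()
        for source, target in plan_batch(folder):
            print(source.name, "->", target)
```

With stem-based names, `report.docx` and `report.xlsx` would both map to `report.md`, so a real implementation needs some rule for such collisions.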

Resources

scripts/

  • main.py — Unified CLI for single-file and batch conversion
  • docx_extractor.py — DOCX → Markdown (standard library only)
  • xlsx_extractor.py — XLSX → Markdown tables (bundled openpyxl)
  • pptx_extractor.py — PPTX → Markdown (standard library only)
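
As an illustration of what a standard-library-only PPTX extractor involves, the sketch below reads `ppt/slides/slideN.xml` entries and emits one Markdown section per slide. It is not the code in `pptx_extractor.py`: the function names and the `## Slide N` heading format are assumptions, and slides are ordered by file number, which is a simplification (the presentation's own ordering is recorded in `ppt/presentation.xml`).

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace: slide text lives in <a:p> paragraphs and <a:t> runs.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"
SLIDE_RE = re.compile(r"ppt/slides/slide(\d+)\.xml$")


def pptx_to_markdown(source):
    """Return Markdown with one section per slide (path or file object)."""
    sections = []
    with zipfile.ZipFile(source) as archive:
        slides = []
        for name in archive.namelist():
            match = SLIDE_RE.match(name)
            if match:
                slides.append((int(match.group(1)), name))
        # Sort numerically so slide10 comes after slide2.
        for number, name in sorted(slides):
            root = ET.fromstring(archive.read(name))
            parts = ["## Slide %d" % number]
            for para in root.iter("{%s}p" % A_NS):
                runs = para.iter("{%s}t" % A_NS)
                text = "".join(t.text or "" for t in runs).strip()
                if text:
                    parts.append(text)
            sections.append("\n\n".join(parts))
    return "\n\n".join(sections)


def make_sample_pptx(texts_by_number):
    """Build a tiny in-memory archive with one text paragraph per slide."""
    template = (
        '<p:sld xmlns:p="%s" xmlns:a="%s"><p:cSld><p:spTree><p:sp><p:txBody>'
        "<a:p><a:r><a:t>%s</a:t></a:r></a:p>"
        "</p:txBody></p:sp></p:spTree></p:cSld></p:sld>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for number, text in texts_by_number.items():
            archive.writestr(
                "ppt/slides/slide%d.xml" % number, template % (P_NS, A_NS, text)
            )
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(pptx_to_markdown(make_sample_pptx({1: "Intro", 2: "Details"})))
```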

Bundled Dependencies

  • openpyxl/ — Pure Python Excel library (v3.1.5)
  • et_xmlfile/ — openpyxl dependency (pure Python)

Limitations

  • Does not extract images or embedded objects (text only)
  • Does not preserve complex formatting (colors, fonts, layouts)
  • Does not handle encrypted/password-protected files
  • No OCR for scanned documents (use OpenClaw's native pdf tool for that)
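
The encrypted-file limitation can often be detected before parsing. A normal DOCX, XLSX, or PPTX file is a ZIP archive, whereas a password-protected one is typically wrapped in an OLE compound file, which starts with a different signature. The sketch below is a hedged pre-check, not something the skill is known to do; `classify_container` is a hypothetical name, and an "ole" result can also mean a legacy `.doc`, `.xls`, or `.ppt` file.

```python
import io
import zipfile

# Signature at the start of every OLE compound file.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def classify_container(data):
    """Classify raw file bytes as 'zip', 'ole', or 'unknown'."""
    if data.startswith(OLE_MAGIC):
        return "ole"  # possibly encrypted, or a legacy binary Office file
    if zipfile.is_zipfile(io.BytesIO(data)):
        return "zip"  # the container used by unencrypted OOXML files
    return "unknown"


def make_zip_bytes():
    """Build a minimal ZIP archive in memory for the demo."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", "<doc/>")
    return buffer.getvalue()


if __name__ == "__main__":
    print(classify_container(make_zip_bytes()))
    print(classify_container(OLE_MAGIC + b"\x00" * 64))
    print(classify_container(b"plain text"))
```

A check like this lets a converter report "file appears to be encrypted or in a legacy format" instead of failing with a generic ZIP error.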

Why This Skill?

Existing markitdown-based skills require pip install or external CLI tools, which triggers ClawHub security warnings. This skill is 100% self-contained — install it and use it immediately, even offline.

Security recommendations
This looks consistent with an offline document converter. Before installing, be comfortable running the bundled Python code, use it only on documents you intend to extract, and remember that the generated Markdown may contain sensitive or untrusted text.
Functional analysis
Type: OpenClaw Skill
Name: office-doc-extractor
Version: 1.0.1
The office-doc-extractor skill is a functional tool designed to convert DOCX, XLSX, and PPTX files into Markdown. It uses the Python standard library (zipfile, xml.etree) for Word and PowerPoint files and includes a bundled version of the openpyxl library for Excel files to maintain its 'zero-dependency' claim. The code logic in scripts/main.py, scripts/docx_extractor.py, and scripts/xlsx_extractor.py is transparent and strictly aligned with the stated purpose. No evidence of data exfiltration, network activity, or malicious prompt injection was found.
Capability assessment
Purpose & Capability
The documented purpose, CLI, and visible source align: it converts DOCX/XLSX/PPTX files to Markdown and writes local output files. Users should notice that converted Markdown can contain the full text of private documents.
Instruction Scope
The instructions are user-directed examples for running the converter; there is no evidence of hidden goal changes, forced autonomous execution, or prompt-injection style instructions.
Install Mechanism
There is no install spec or network download, but the skill is executed as local Python code and bundles openpyxl/et_xmlfile. The registry source is unknown and no homepage is provided, so dependency provenance is less transparent.
Credentials
Local file reads and Markdown writes are proportionate to document conversion. Batch mode can process all supported files in a selected directory, so users should scope input and output paths carefully.
Persistence & Privilege
No credentials, privileged APIs, background workers, or ongoing persistence are shown. The only persistence evidenced is user-directed creation of Markdown output files.
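How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install office-doc-extractor
  3. Once installation finishes, invoke the skill by name or trigger it with /office-doc-extractor.
  4. Provide the required inputs according to the skill's parameter description to get structured output.
Version history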
v1.0.1
Fix: Removed pycache, repackaged clean build
v1.0.0
  • Initial release of office-doc-extractor: convert DOCX, XLSX, and PPTX files to Markdown using a pure Python, zero-dependency approach.
  • Supports extraction of text and structure: Word headings/paragraphs, Excel tables, and PowerPoint slides.
  • Works offline: no pip installs, subprocess calls, or network access required.
  • Includes unified CLI for both single-file and batch directory conversion.
  • Bundles pure Python openpyxl and et_xmlfile for Excel support.
Metadata
Slug: office-doc-extractor
Version: 1.0.1
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 2
FAQ

What is Office Document Extractor?

Convert Microsoft Office documents (DOCX, XLSX, PPTX) to Markdown without any external dependencies. Use when the user needs to extract text from Word docume... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 79 total downloads so far.

How do I install Office Document Extractor?

Run the command "/install office-doc-extractor" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.

Is Office Document Extractor free?

Yes. Office Document Extractor is completely free and released under the MIT-0 license, so it can be freely downloaded, installed, and used.

Which platforms does Office Document Extractor support?

Office Document Extractor is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Office Document Extractor?

It is developed and maintained by michealxie001 (@michealxie001). The current version is v1.0.1.
