xiaoyaliu00

document-reader

by xiaoya · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ Security Clean
257 Downloads · 0 Stars · 0 Active Installs · 1 Version
Install in OpenClaw
/install document-reader
Description
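A general-purpose document reader that supports PDF, DOCX, XLSX, PPTX, RTF, ODT and other document formats, and can also read documents directly from ZIP, TAR.GZ, RAR, 7Z and other common archives.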
README (SKILL.md)

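Document Reader - General-Purpose Document Reading Skill

Reads the content of documents in many formats and outputs plain text for AI analysis. Documents inside archives can be read directly, with no manual extraction.

📦 Supported Formats

Documents

Format | Extension | Notes
PDF | .pdf | Text extraction; depends on poppler-utils
Microsoft Word | .docx | Extracts the text of all paragraphs
Microsoft Excel | .xlsx | Output per sheet; each sheet is rendered as table text
Microsoft PowerPoint | .pptx | Output in blocks, one per slide
Rich Text Format | .rtf |
OpenDocument Text | .odt |
HTML | .html/.htm | Extracts the body text
Plain text | .txt/.md/.json/.xml/.py/.js | Read as-is

Archives

Format | Extension | Function
ZIP | .zip | List files ➜ read a specified document
TAR | .tar/.tar.gz/.tgz/.tar.bz2 | List files ➜ read a specified document
RAR | .rar | List files ➜ read a specified document
7-Zip | .7z | List files ➜ read a specified document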
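
The two tables above imply that a file is first recognized as a document or an archive from its extension. The following is a minimal sketch of such a check, not the skill's actual code: the extension sets are copied from the tables, while the `classify` function and its return values are hypothetical names chosen for illustration.

```python
from pathlib import Path

# Extension sets copied from the tables above; everything else here is hypothetical.
DOCUMENT_EXTS = {".pdf", ".docx", ".xlsx", ".pptx", ".rtf", ".odt",
                 ".html", ".htm", ".txt", ".md", ".json", ".xml", ".py", ".js"}
ARCHIVE_EXTS = {".zip", ".tar", ".tar.gz", ".tgz", ".tar.bz2", ".rar", ".7z"}


def classify(path: str) -> str:
    """Return 'archive', 'document', or 'unknown' based on the file name."""
    name = Path(path).name.lower()
    # Check archives first so that multi-part suffixes such as .tar.gz match.
    if any(name.endswith(ext) for ext in ARCHIVE_EXTS):
        return "archive"
    if Path(name).suffix in DOCUMENT_EXTS:
        return "document"
    return "unknown"
```

Matching on the lowercased name keeps the check case-insensitive, so `Report.PDF` and `report.pdf` are treated alike.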

🚀 Quick Start

Installing Dependencies

Python packages:

pip install textract python-docx openpyxl python-pptx rarfile py7zr --break-system-packages

System dependencies (Ubuntu/Debian):

apt-get install -y poppler-utils antiword unrtf tidy libxml2-dev libxslt1-dev
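
After installing, it can help to confirm that the dependencies are actually visible to Python and on the PATH. The sketch below is one possible check and is not part of the skill. It assumes the usual import names for the pip packages (python-docx imports as `docx`, python-pptx as `pptx`) and checks only the executables that the apt packages are known to provide (`pdftotext` comes from poppler-utils); the function name is hypothetical.

```python
import importlib.util
import shutil

# Import names for the pip packages listed above (assumed: python-docx -> "docx",
# python-pptx -> "pptx").
PYTHON_MODULES = ["textract", "docx", "openpyxl", "pptx", "rarfile", "py7zr"]
# Executables shipped by some of the apt packages (pdftotext is in poppler-utils).
SYSTEM_TOOLS = ["pdftotext", "antiword", "unrtf", "tidy"]


def missing_dependencies(modules=PYTHON_MODULES, tools=SYSTEM_TOOLS):
    """Return (missing_modules, missing_tools) without importing anything."""
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    missing_tools = [t for t in tools if shutil.which(t) is None]
    return missing_modules, missing_tools


if __name__ == "__main__":
    mods, tools = missing_dependencies()
    print("Missing Python modules:", mods or "none")
    print("Missing system tools:", tools or "none")
```

`importlib.util.find_spec` only locates a module, so the check does not trigger any import side effects.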

💡 Usage Examples

1. Reading a local document directly

# Read a PDF file
python {baseDir}/scripts/document_reader.py --file /path/to/document.pdf

# Read a Word document
python {baseDir}/scripts/document_reader.py --file /path/to/report.docx

# Read an Excel file (outputs table text separated by sheet)
python {baseDir}/scripts/document_reader.py --file /path/to/data.xlsx

# JSON output (convenient for programmatic processing)
python {baseDir}/scripts/document_reader.py --file /path/to/data.xlsx --format json
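
For programmatic use, the JSON mode can be driven from another Python process. The sketch below uses only the flags shown in this README (`--file`, `--inner-path`, `--format json`). It assumes, without confirmation from the source, that the script writes the JSON to standard output and that `--inner-path` can be combined with `--format json`; the JSON schema is not documented here, so the result is returned undecoded beyond `json.loads`. Both helper names are hypothetical.

```python
import json
import subprocess
import sys


def build_command(script, file, inner_path=None, as_json=False):
    """Assemble the argument list using only the flags shown in the examples."""
    cmd = [sys.executable, script, "--file", file]
    if inner_path is not None:
        cmd += ["--inner-path", inner_path]
    if as_json:
        cmd += ["--format", "json"]
    return cmd


def read_as_json(script, file, inner_path=None):
    """Run the reader and decode its stdout as JSON (assumes JSON goes to stdout)."""
    result = subprocess.run(
        build_command(script, file, inner_path, as_json=True),
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.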

2. Working with archives

First, list the files in the archive:

# List the contents of a ZIP archive
python {baseDir}/scripts/document_reader.py --list /path/to/archive.zip

# List the contents of a RAR archive
python {baseDir}/scripts/document_reader.py --list /path/to/archive.rar

# List the contents of a 7z archive
python {baseDir}/scripts/document_reader.py --list /path/to/archive.7z

Then read a specific document inside the archive:

# Read a Word document inside a ZIP archive
python {baseDir}/scripts/document_reader.py --file /path/to/archive.zip --inner-path document.docx

# Read a PDF inside a 7z archive
python {baseDir}/scripts/document_reader.py --file /path/to/archive.7z --inner-path report.pdf
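
The list-then-read workflow above can be illustrated for the ZIP case with Python's standard `zipfile` module. This is a sketch of the general technique, not the skill's implementation (which also covers TAR, RAR, and 7z through other libraries); the function names are hypothetical.

```python
import zipfile


def list_members(archive) -> list:
    """Return the names of regular files (not directories) in a ZIP archive."""
    with zipfile.ZipFile(archive) as zf:
        return [info.filename for info in zf.infolist() if not info.is_dir()]


def read_member(archive, inner_path: str) -> bytes:
    """Read one member fully into memory, without extracting it to disk."""
    with zipfile.ZipFile(archive) as zf:
        return zf.read(inner_path)
```

Reading the member into memory is adequate for small documents; a parser that needs a real file path would require the bytes to be written out first.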

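Sample Output

Reading a document:

=== report.pdf ===

# Project Progress Report

## Completed This Week

1. Finished front-end UI development
2. Fixed three bugs
3. Wrote the API documentation

...

Listing an archive: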

Archive: data.zip
Found 3 file(s):

  readme.txt
  docs/report.pdf
  data/sheet.xlsx
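
If the plain-text listing needs to be consumed by another program, it can be split back into member paths. The sketch below infers the format purely from the sample listing above (an `Archive:` line, a `Found N file(s):` line, then one path per line), so it is an assumption about the real output; `parse_listing` is a hypothetical helper, and the JSON output mode is likely the more robust choice where available.

```python
def parse_listing(output: str) -> list:
    """Extract member paths from listing text shaped like the sample above."""
    members = []
    for line in output.splitlines():
        stripped = line.strip()
        # Skip blank lines and the two header lines seen in the sample.
        if not stripped or stripped.startswith(("Archive:", "Found ")):
            continue
        members.append(stripped)
    return members
```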

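✨ Features

  • 🎯 Works out of the box: install the dependencies and use it directly, with no complex configuration
  • 📦 Archive support: list and read internal files directly, with no manual extraction
  • 🔍 Fuzzy matching: case-insensitive file name matching, tried automatically when no exact match is found
  • 🎨 Multiple output formats: human-readable text and JSON for programmatic use
  • 🧩 Full coverage of common formats: office documents and archives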
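
The fuzzy-matching behavior in the feature list (exact match first, then a case-insensitive fallback) can be sketched as follows. This illustrates the idea only: whether the skill compares full inner paths or only base names is not stated in the README, and rejecting ambiguous fallbacks is a design choice made here, not a documented behavior. `resolve_member` is a hypothetical name.

```python
from typing import List, Optional


def resolve_member(requested: str, members: List[str]) -> Optional[str]:
    """Return the exact match if present, else a unique case-insensitive match."""
    if requested in members:
        return requested
    wanted = requested.lower()
    candidates = [m for m in members if m.lower() == wanted]
    # Accept the fallback only when it is unambiguous.
    return candidates[0] if len(candidates) == 1 else None
```

Returning `None` for ambiguous matches avoids silently reading the wrong file when an archive contains names that differ only by case.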

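📝 Use Cases

  • AI analysis of office documents
  • Batch reading of documents inside archives
  • Quick inspection of attachments
  • Data extraction and preprocessing

Author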

Created by xiaoya Liu with OpenClaw

Usage Guidance
This skill is coherent for reading local documents and archives, but take these precautions before installing or running it:
  1. Install the documented system packages (apt-get) and Python packages in an isolated environment (venv or container), and be aware that apt-get requires root.
  2. Avoid running it against untrusted archives on multi-user systems, because the script writes predictable temp files in /tmp. Use a sandbox, or modify the script to use tempfile, to prevent symlink and race issues.
  3. textract and some format handlers rely on external binaries (e.g., poppler/pdftotext, unrar); ensure those are installed from trusted sources.
  4. If you need stronger safety, review the full script locally (it contains no network calls or credential access), or adapt it to use secure temporary files (tempfile.NamedTemporaryFile) and stricter path handling before use.
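
The suggestion to use tempfile.NamedTemporaryFile instead of predictable names under /tmp could look roughly like the sketch below. It is an illustration for the ZIP case only, not a patch to the actual script, whose internals are not shown here; `extract_member_to_temp` is a hypothetical helper.

```python
import os
import shutil
import tempfile
import zipfile


def extract_member_to_temp(archive, inner_path: str) -> str:
    """Copy one ZIP member into a freshly created, unpredictably named temp file."""
    # Keep the extension so downstream format detection still works.
    suffix = os.path.splitext(inner_path)[1]
    with zipfile.ZipFile(archive) as zf, zf.open(inner_path) as src:
        # delete=False lets another parser open the path; the caller must remove it.
        with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as dst:
            shutil.copyfileobj(src, dst)
            return dst.name
```

Because the file is created by `tempfile` with a random name, another local user cannot pre-create a symlink at a guessable path. The caller is responsible for deleting the file (for example with `os.remove`) once it has been processed.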
Capability Analysis
Type: OpenClaw Skill
Name: document-reader
Version: 1.0.0
The document-reader skill is a legitimate utility designed to extract text from various document formats (PDF, DOCX, XLSX, etc.) and compressed archives. The core logic in scripts/document_reader.py uses standard Python libraries and reputable third-party packages like openpyxl and textract to process files, with no evidence of data exfiltration, unauthorized network access, or malicious command execution.
Capability Assessment
Purpose & Capability
Name and description (reading many document formats and archives) match the included script and declared dependencies (textract, python-docx, openpyxl, python-pptx, rarfile, py7zr, plus system tools like poppler-utils). No extraneous credentials, binaries, or unrelated capabilities are requested.
Instruction Scope
Runtime instructions are narrowly scoped to listing and reading local files and archive contents. The implementation reads archive members into temporary files under /tmp (predictable names) then processes them; this is expected but creates a minor risk (race/symlink) if run in multi-user or untrusted environments. The SKILL.md correctly documents required system and Python dependencies; it does not instruct any network calls or exfiltration.
Install Mechanism
No install spec is provided (instruction-only behavior plus an included script). There is no download-from-URL or archive extraction at install time, so no high-risk install mechanism is present. The script does depend on third-party Python packages and some system packages documented in SKILL.md.
Credentials
The skill requests no environment variables, credentials, or config paths. All required permissions are local filesystem access to the files/archives the user asks it to read, which is proportionate to the stated purpose.
Persistence & Privilege
The skill does not require persistent presence (always is false). It does not modify other skills or system-wide settings based on the provided files. Autonomous invocation is allowed by default but is not combined with other red flags.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install document-reader
  3. After installation, invoke the skill by name or use /document-reader
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release: support PDF/DOCX/XLSX/PPTX/RTF/ODT documents and ZIP/TAR/RAR/7Z compressed files
Metadata
Slug document-reader
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is document-reader?

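A general-purpose document reader that supports PDF, DOCX, XLSX, PPTX, RTF, ODT and other document formats, and can also read documents directly from ZIP, TAR.GZ, RAR, 7Z and other common archives. It is an AI Agent Skill for Claude Code / OpenClaw, with 257 downloads so far.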

How do I install document-reader?

Run "/install document-reader" in the OpenClaw or Claude Code chat to install the skill in one step. The Python packages and system dependencies listed in the README must be installed separately.

Is document-reader free?

Yes, document-reader is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does document-reader support?

document-reader is listed as cross-platform and runs anywhere OpenClaw / Claude Code is available. The README documents system dependencies for Ubuntu/Debian only.

Who created document-reader?

It is built and maintained by xiaoya (@xiaoyaliu00); the current version is v1.0.0.
