Total downloads: 257
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install document-reader
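Description
A general-purpose document reader that supports PDF, DOCX, XLSX, PPTX, RTF, ODT, and other document formats, and can read documents directly from inside ZIP, TAR.GZ, RAR, 7Z, and other common archives.
Usage (SKILL.md)
Document Reader - General-Purpose Document Reading Skill
Reads the content of documents in many formats and outputs plain text for AI analysis. Documents inside archives can be read directly, with no manual extraction needed.
📦 Supported Formats
Documents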
| Format | Extension | Notes |
|---|---|---|
| PDF | .pdf | Text extraction; depends on poppler-utils |
| Microsoft Word | .docx | Extracts the text of all paragraphs |
| Microsoft Excel | .xlsx | Output per sheet; each sheet is rendered as table text |
| Microsoft PowerPoint | .pptx | Output in per-slide chunks |
| Rich Text Format | .rtf | |
| OpenDocument Text | .odt | |
| HTML | .html/.htm | Extracts the body text |
| Plain text | .txt/.md/.json/.xml/.py/.js, etc. | Read directly |
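Archives
| Format | Extension | Function |
|---|---|---|
| ZIP | .zip | List files ➜ read a specified document |
| TAR | .tar/.tar.gz/.tgz/.tar.bz2 | List files ➜ read a specified document |
| RAR | .rar | List files ➜ read a specified document |
| 7-Zip | .7z | List files ➜ read a specified document |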
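The two tables above amount to a lookup from file extension to handling strategy. The following is a minimal sketch of such a lookup, not the skill's actual code: the names `classify`, `DOCUMENT_EXTS`, `TEXT_EXTS`, and `ARCHIVE_SUFFIXES` are hypothetical, and the extension sets simply mirror the tables (the "etc." in the plain-text row is not expanded).

```python
from pathlib import Path

# Extension sets copied from the tables above (hypothetical names).
DOCUMENT_EXTS = {".pdf", ".docx", ".xlsx", ".pptx", ".rtf", ".odt", ".html", ".htm"}
TEXT_EXTS = {".txt", ".md", ".json", ".xml", ".py", ".js"}
# Multi-part suffixes such as .tar.gz need a full-name check, not Path.suffix.
ARCHIVE_SUFFIXES = (".tar.gz", ".tar.bz2", ".tgz", ".tar", ".zip", ".rar", ".7z")


def classify(path: str) -> str:
    """Return 'archive', 'document', 'text', or 'unknown' for a file path."""
    name = Path(path).name.lower()
    if name.endswith(ARCHIVE_SUFFIXES):
        return "archive"
    ext = Path(name).suffix
    if ext in DOCUMENT_EXTS:
        return "document"
    if ext in TEXT_EXTS:
        return "text"
    return "unknown"
```

Checking archive suffixes first keeps a name like `backup.tar.gz` from being judged only by its final `.gz` component.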
🚀 Quick Start
Install Dependencies
Python packages:
pip install textract python-docx openpyxl python-pptx rarfile py7zr --break-system-packages
System dependencies (Ubuntu/Debian):
apt-get install -y poppler-utils antiword unrtf tidy libxml2-dev libxslt1-dev
💡 Usage Examples
1. Read a local document directly
# Read a PDF file
python {baseDir}/scripts/document_reader.py --file /path/to/document.pdf
# Read a Word document
python {baseDir}/scripts/document_reader.py --file /path/to/report.docx
# Read an Excel file (outputs table text with sheet separators)
python {baseDir}/scripts/document_reader.py --file /path/to/data.xlsx
# JSON output (convenient for programmatic processing)
python {baseDir}/scripts/document_reader.py --file /path/to/data.xlsx --format json
2. Work with archives
First list the files inside the archive:
# List the contents of a ZIP archive
python {baseDir}/scripts/document_reader.py --list /path/to/archive.zip
# List the contents of a RAR archive
python {baseDir}/scripts/document_reader.py --list /path/to/archive.rar
# List the contents of a 7z archive
python {baseDir}/scripts/document_reader.py --list /path/to/archive.7z
Then read a specific document inside the archive:
# Read a Word document inside a ZIP archive
python {baseDir}/scripts/document_reader.py --file /path/to/archive.zip --inner-path document.docx
# Read a PDF inside a 7z archive
python {baseDir}/scripts/document_reader.py --file /path/to/archive.7z --inner-path report.pdf
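When the script is driven from another Python program rather than a shell, the same invocations can be assembled as an argument list. The sketch below uses only the flags shown in the examples (`--file`, `--list`, `--inner-path`, `--format`). The helper names `build_reader_command` and `run_reader` are hypothetical, and whether `--format` can be combined with `--inner-path` is an assumption the source does not confirm.

```python
import subprocess
import sys
from typing import List, Optional


def build_reader_command(
    script: str,
    target: str,
    *,
    list_only: bool = False,
    inner_path: Optional[str] = None,
    output_format: Optional[str] = None,
) -> List[str]:
    """Build an argv list using only the flags shown in the examples above."""
    if list_only:
        # --list takes the archive path directly.
        return [sys.executable, script, "--list", target]
    cmd = [sys.executable, script, "--file", target]
    if inner_path is not None:
        cmd += ["--inner-path", inner_path]
    if output_format is not None:
        # Assumption: --format may be combined with the other flags.
        cmd += ["--format", output_format]
    return cmd


def run_reader(cmd: List[str]) -> str:
    """Run the command and return its standard output as text."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing an argument list instead of a shell string means paths containing spaces need no extra quoting.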
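Sample Output
Reading a document:
=== report.pdf ===
# Project Progress Report
## Completed This Week
1. Finished front-end UI development
2. Fixed three bugs
3. Wrote the API documentation
...
Listing an archive: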
Archive: data.zip
Found 3 file(s):
readme.txt
docs/report.pdf
data/sheet.xlsx
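For ZIP archives, a listing in this shape can be produced with the standard library alone. The following is a sketch under that assumption, not the skill's actual implementation: `format_zip_listing` is a hypothetical name, directory entries are skipped by choice, and the exact spacing of the real output is not known from the sample.

```python
import zipfile
from pathlib import Path


def format_zip_listing(archive_path: str) -> str:
    """Render a ZIP archive's file entries in the listing layout shown above."""
    with zipfile.ZipFile(archive_path) as zf:
        # Skip directory entries so the count reflects files only.
        names = [info.filename for info in zf.infolist() if not info.is_dir()]
    lines = [
        f"Archive: {Path(archive_path).name}",
        f"Found {len(names)} file(s):",
    ]
    lines.extend(names)
    return "\n".join(lines)
```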
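✨ Features
- 🎯 Works out of the box: once the dependencies are installed it is ready to use, with no complex configuration
- 📦 Archive support: list and read files inside an archive without extracting it manually
- 🔍 Fuzzy matching: file names are matched case-insensitively, tried automatically when no exact match is found
- 🎨 Multiple output formats: human-readable text and JSON for programmatic use
- 🧩 Full support for common formats: office documents and archives are both covered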
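The fuzzy-matching behaviour described in the list above (exact match first, case-insensitive match as a fallback) can be sketched as follows. This is an illustration only: `resolve_inner_path` is a hypothetical name, and whether the skill also matches on the bare file name without its directory is not stated in the source, so that case is left out.

```python
from typing import Iterable, Optional


def resolve_inner_path(names: Iterable[str], wanted: str) -> Optional[str]:
    """Return the archive member matching `wanted`, or None if there is none."""
    members = list(names)
    # 1. Prefer an exact match.
    if wanted in members:
        return wanted
    # 2. Fall back to a case-insensitive comparison of the full member path.
    wanted_lower = wanted.lower()
    for name in members:
        if name.lower() == wanted_lower:
            return name
    return None
```

Returning the member name as stored in the archive, rather than the user's spelling, lets the caller pass it straight to the archive library.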
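📝 Use Cases
- AI analysis of office documents
- Batch reading of documents inside archives
- Quickly viewing attachment contents
- Data extraction and preprocessing
Author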
Created by xiaoya Liu with OpenClaw
Security Recommendations
This skill is coherent for reading local documents and archives, but take these precautions before installing or running it:
1. Install the documented system packages (apt-get) and Python packages in an isolated environment (venv or container), and be aware that apt-get requires root.
2. Avoid running against untrusted archives on multi-user systems, because the script writes predictable temp files in /tmp. Use a sandbox, or modify the script to use tempfile, to prevent symlink and race issues.
3. textract and some format handlers rely on external binaries (e.g., poppler/pdftotext, unrar). Ensure those are installed from trusted sources.
4. If you need stronger safety, review the full script locally (it contains no network calls or credential access), or adapt it to use secure temporary files (tempfile.NamedTemporaryFile) and stricter path handling before use.
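As an illustration of the `tempfile.NamedTemporaryFile` suggestion, the sketch below extracts one ZIP member to a temporary file with an unpredictable name and removes it afterwards. It is a sketch under stated assumptions, not a patch for the actual script: `extracted_member` is a hypothetical helper, only ZIP is handled, and only the member's suffix is carried into the temporary file name so that downstream format detection by extension still works.

```python
import os
import shutil
import tempfile
import zipfile
from contextlib import contextmanager
from pathlib import PurePosixPath


@contextmanager
def extracted_member(archive_path: str, inner_path: str):
    """Yield the path of a private temp copy of one ZIP member, then delete it."""
    suffix = PurePosixPath(inner_path).suffix
    # NamedTemporaryFile picks a random name and creates the file exclusively,
    # which avoids the predictable-name race described above.
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        with tmp, zipfile.ZipFile(archive_path) as zf, zf.open(inner_path) as src:
            shutil.copyfileobj(src, tmp)
        # The file is closed here, so other readers can open it by name.
        yield tmp.name
    finally:
        os.unlink(tmp.name)
```

A caller would wrap its processing in `with extracted_member(archive, member) as path:` so the temporary copy is removed even if parsing fails.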
Functional Analysis
Type: OpenClaw Skill
Name: document-reader
Version: 1.0.0
The document-reader skill is a legitimate utility designed to extract text from various document formats (PDF, DOCX, XLSX, etc.) and compressed archives. The core logic in scripts/document_reader.py uses standard Python libraries and reputable third-party packages like openpyxl and textract to process files, with no evidence of data exfiltration, unauthorized network access, or malicious command execution.
Capability Assessment
Purpose & Capability
Name and description (reading many document formats and archives) match the included script and declared dependencies (textract, python-docx, openpyxl, python-pptx, rarfile, py7zr, plus system tools like poppler-utils). No extraneous credentials, binaries, or unrelated capabilities are requested.
Instruction Scope
Runtime instructions are narrowly scoped to listing and reading local files and archive contents. The implementation reads archive members into temporary files under /tmp (predictable names) then processes them; this is expected but creates a minor risk (race/symlink) if run in multi-user or untrusted environments. The SKILL.md correctly documents required system and Python dependencies; it does not instruct any network calls or exfiltration.
Install Mechanism
No install spec is provided (instruction-only behavior plus an included script). There is no download-from-URL or archive extraction at install time, so no high-risk install mechanism is present. The script does depend on third-party Python packages and some system packages documented in SKILL.md.
Credentials
The skill requests no environment variables, credentials, or config paths. All required permissions are local filesystem access to the files/archives the user asks it to read, which is proportionate to the stated purpose.
Persistence & Privilege
The skill does not require persistent presence (always is false). It does not modify other skills or system-wide settings based on the provided files. Autonomous invocation is allowed by default but is not combined with other red flags.
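How to Use
- Make sure OpenClaw is installed (locally or via Docker)
- Enter the install command in the chat box: /install document-reader
- Once installed, invoke the skill by name or trigger it with /document-reader
- Provide the required inputs according to the skill's parameter documentation to get structured output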
Version History
v1.0.0
Initial release: support PDF/DOCX/XLSX/PPTX/RTF/ODT documents and ZIP/TAR/RAR/7Z compressed files
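FAQ
What is document-reader?
A general-purpose document reader that supports PDF, DOCX, XLSX, PPTX, RTF, ODT, and other document formats, and can read documents directly from inside ZIP, TAR.GZ, RAR, 7Z, and other common archives. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 257 total downloads so far.
How do I install document-reader?
Run the command "/install document-reader" in the OpenClaw or Claude Code chat box for a one-step install. The skill itself needs no further configuration, but the Python packages and system dependencies listed under Quick Start must be installed before it can read documents.
Is document-reader free?
Yes. document-reader is completely free and is released under the MIT-0 license, so it can be downloaded, installed, and used freely.
Which platforms does document-reader support?
document-reader is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who develops document-reader?
It is developed and maintained by xiaoya (@xiaoyaliu00). The current version is v1.0.0.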