xiongrui-xr

Document Processing

Author: xiongrui-xr · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 219
Favorites: 0
Current installs: 2
Versions: 1
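Install in OpenClaw
/install document-processing
Description
Supports format conversion, content extraction, and batch processing for PDF, Word, Excel, and PPT files, with automatic sync to Feishu cloud documents and push delivery of processing results.
Usage instructions (SKILL.md)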

Document Processing Skill

Description

A document-processing skill that supports multi-format document conversion, content extraction, and batch processing.

Features

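  • 📄 Format conversion (PDF/Word/Excel/PPT)
  • 🔍 Content extraction (text, tables, images)
  • ✏️ Batch processing (renaming, format normalization)
  • 🤝 Feishu integration (cloud document sync)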

Dependencies

  • Python 3.13+
  • PyPDF2, python-docx, openpyxl
  • pdfplumber, python-docx2txt

Usage

from document_processing import DocumentProcessor

# Initialize the document processor
processor = DocumentProcessor()

# Convert a PDF to Word
processor.convert_format("input.pdf", "output.docx")

# Extract the content of a PDF
content = processor.extract_content("document.pdf")

# Batch-process documents
processor.batch_process("./docs", ".docx", ".pdf")
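
The batch call above is not documented beyond this one example. As a rough sketch of what a batch step of this kind usually involves, and assuming the three arguments mean (directory, source extension, target extension), the helper below only plans the conversions. `plan_batch` is a hypothetical name for illustration and is not part of the skill's API.

```python
from pathlib import Path


def plan_batch(directory: str, src_ext: str, dst_ext: str) -> list[tuple[Path, Path]]:
    """Return (source, target) path pairs for every file ending in src_ext.

    Hypothetical helper: it only plans the work and never reads or writes
    document contents.
    """
    root = Path(directory)
    pairs = []
    for source in sorted(root.glob(f"*{src_ext}")):
        if source.is_file():
            # Same file name, new extension, same directory.
            pairs.append((source, source.with_suffix(dst_ext)))
    return pairs
```

Listing the planned pairs before converting anything makes it easy to review which files a batch run would touch.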

Integration with Feishu

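  • Documents are synced automatically to Feishu cloud documents
  • Content extraction results are saved automatically to Feishu Bitable (multidimensional tables)
  • Batch processing results are pushed to Feishu messages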

Configuration

document_processing:
  temp_dir: "/tmp/document_processing"
  default_format: "docx"
  ocr_enabled: true
  ocr_api_key: "your_ocr_api_key"

feishu:
  doc_folder_token: "your_folder_token"
  bitable_app_token: "your_bitable_token"
  bitable_table_id: "your_table_id"
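
The sample configuration ships with placeholder values such as "your_ocr_api_key". A check along the following lines could confirm which entries are still placeholders, that is, where no real credential has been entered yet. This is a minimal sketch: it assumes the configuration has already been parsed into a nested dict, it relies on the "your_" prefix used in the sample, and `find_placeholders` is a hypothetical helper rather than anything the skill provides.

```python
def find_placeholders(config: dict, prefix: str = "your_") -> list[str]:
    """Return dotted key paths whose string value still starts with the prefix."""
    found = []

    def walk(node: dict, path: str) -> None:
        for key, value in node.items():
            dotted = f"{path}.{key}" if path else str(key)
            if isinstance(value, dict):
                walk(value, dotted)
            elif isinstance(value, str) and value.startswith(prefix):
                found.append(dotted)

    walk(config, "")
    return found


# The sample configuration from above, written as a Python dict.
sample = {
    "document_processing": {
        "temp_dir": "/tmp/document_processing",
        "default_format": "docx",
        "ocr_enabled": True,
        "ocr_api_key": "your_ocr_api_key",
    },
    "feishu": {
        "doc_folder_token": "your_folder_token",
        "bitable_app_token": "your_bitable_token",
        "bitable_table_id": "your_table_id",
    },
}

print(find_placeholders(sample))
```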
Security recommendations
This package appears to be a local document-processing utility, but its documentation promises Feishu sync and cloud OCR keys that are not implemented in the code. Before installing or providing any secrets:
  1. Do not paste Feishu tokens or OCR API keys into configuration or environment variables for this skill until you verify the code actually uses them.
  2. If you need cloud sync, request or wait for a version where sync_to_feishu is implemented and network endpoints are explicit.
  3. Note that pytesseract requires a system Tesseract binary; install and test that separately.
  4. Run the module in a sandbox with non-sensitive documents to confirm behavior and watch for unexpected network activity.
  5. Prefer a version with explicit network calls (requests or httpx) and documented endpoints before providing production credentials.
Additional information that would raise confidence: a completed implementation of Feishu integration showing exact endpoints and auth flows, or an updated SKILL.md that drops the cloud/credential requirements if they aren't needed.
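
For recommendation 4, one lightweight way to notice unexpected network activity from Python code is to intercept outgoing socket connections while the module runs. The sketch below is an illustration only: `forbid_network` and `NetworkAccessError` are hypothetical names, the guard covers only `socket.socket.connect` in the current Python process (not `connect_ex`, subprocesses, or native libraries), and it does not replace a real sandbox.

```python
import contextlib
import socket


class NetworkAccessError(RuntimeError):
    """Raised when code tries to open a connection inside the guard."""


@contextlib.contextmanager
def forbid_network():
    """Make every socket.socket.connect() call fail loudly while active."""
    original = socket.socket.connect

    def guarded_connect(self, address):
        raise NetworkAccessError(f"unexpected connection attempt to {address!r}")

    socket.socket.connect = guarded_connect
    try:
        yield
    finally:
        # Restore normal behaviour even if the guarded code raised.
        socket.socket.connect = original
```

Wrapping a trial run on non-sensitive documents in `with forbid_network():` turns any connection attempt made through Python's socket layer into a visible exception.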
Functional analysis
Type: OpenClaw Skill
Name: document-processing
Version: 1.0.0
The document-processing skill is a standard utility for converting and extracting content from PDF, Word, and Excel files using common libraries like pdfplumber and openpyxl. The code in document_processing.py aligns perfectly with the features described in SKILL.md, and while it mentions Feishu integration, the implementation is currently a harmless placeholder (pass).
Capability assessment
Purpose & Capability
The SKILL.md claims automatic Feishu sync and an OCR API key in configuration, but the Python implementation contains no network calls, does not read environment variables or config for Feishu/remote OCR, and the sync_to_feishu method is a stub. SKILL.md also lists OCR and conversion dependencies but does not mention the external Tesseract binary that pytesseract requires. These inconsistencies mean the credentials and configuration described in the docs are not aligned with the actual code.
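
Findings like "no network calls" and "the method is a stub" can be reproduced with a short static check using Python's `ast` module. The following is a sketch under stated assumptions: `audit_source` and the `NETWORK_MODULES` list are hypothetical and far from exhaustive, and the `example` string is invented for illustration; it is not the contents of document_processing.py.

```python
import ast

# Assumed, non-exhaustive list of modules that usually imply network access.
NETWORK_MODULES = {"socket", "requests", "httpx", "urllib", "http", "aiohttp"}


def audit_source(source: str) -> dict[str, list[str]]:
    """Report network-related imports and functions whose body is only `pass`."""
    tree = ast.parse(source)
    imports, stubs = [], []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            names = []
        imports += [n for n in names if n.split(".")[0] in NETWORK_MODULES]
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            body = node.body
            # Skip a leading docstring before checking for a bare `pass`.
            if (body and isinstance(body[0], ast.Expr)
                    and isinstance(body[0].value, ast.Constant)
                    and isinstance(body[0].value.value, str)):
                body = body[1:]
            # A function left with only `pass` (or nothing) counts as a stub.
            if all(isinstance(stmt, ast.Pass) for stmt in body):
                stubs.append(node.name)
    return {"network_imports": imports, "stub_functions": stubs}


# Invented example source, used only to exercise the check.
example = '''
import os
import urllib.request

class DocumentProcessor:
    def sync_to_feishu(self, path):
        pass
'''
report = audit_source(example)
print(report)
```

A static scan of this kind misses dynamic imports and calls made through subprocesses, so it complements a sandboxed trial run rather than replacing one.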
Instruction Scope
The instructions and examples operate on user-supplied file paths and a temp_dir; they do not instruct the agent to read unrelated system files or credentials. However, SKILL.md promotes saving results to Feishu and using an ocr_api_key, which the runtime instructions do not show how to use — giving the agent open discretion about syncing/uploading would be problematic if implemented later.
Install Mechanism
There is no install specification (instruction-only plus a single Python module). Nothing will be automatically downloaded or executed by an installer. Dependencies are Python packages listed in SKILL.md; installing them is standard but the skill does not provide an automated install script.
Credentials
The SKILL.md suggests configuration values like ocr_api_key, feishu doc_folder_token, and bitable_app_token but the code does not accept or read these credentials. Conversely, the code requires a Tesseract binary (pytesseract) which SKILL.md doesn't declare as an external requirement. Asking for cloud credentials in docs without using them is disproportionate and risks encouraging users to provide secrets unnecessarily.
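
Since the Tesseract requirement is not declared, it can be checked up front. A minimal sketch, assuming the executable is installed under its usual name `tesseract` and is expected on PATH; `tesseract_available` is a hypothetical helper, not part of the skill:

```python
import shutil


def tesseract_available(binary: str = "tesseract") -> bool:
    """Return True if the named executable can be found on PATH."""
    return shutil.which(binary) is not None


if not tesseract_available():
    print("Tesseract binary not found on PATH; OCR via pytesseract will not work.")
```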
Persistence & Privilege
The skill does not request always:true and does not modify other skills or system-wide settings. It writes to a local temp_dir under its own control, which is normal for a document-processing utility.
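How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install document-processing
  3. Once installed, invoke the Skill by name or trigger it with /document-processing
  4. Provide the inputs described in the Skill's parameter documentation to get structured output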
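Version history
v1.0.0
Initial release of the document-processing skill.
  • Supports multi-format document conversion (PDF/Word/Excel/PPT).
  • Supports content extraction (text, tables, images).
  • Provides batch processing features such as renaming and format normalization.
  • Integrates with Feishu for cloud document sync, content push, and Bitable storage.
  • Includes a configuration example and a list of the main dependencies.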
Metadata
Slug: document-processing
Version: 1.0.0
License: MIT-0
Total installs: 2
Current installs: 2
Versions: 1
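FAQ

What is Document Processing?

It supports format conversion, content extraction, and batch processing for PDF, Word, Excel, and PPT files, with automatic sync to Feishu cloud documents and push delivery of processing results. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 219 total downloads so far.

How do I install Document Processing?

Run the command "/install document-processing" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is needed.

Is Document Processing free?

Yes. Document Processing is completely free and released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does Document Processing support?

Document Processing is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Document Processing?

It is developed and maintained by xiongrui-xr (@xiongrui-xr); the current version is v1.0.0.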
