Document Processing

Name: Document Processing
Author: xiongrui-xr

By xiongrui-xr · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Total downloads: 219

Install in OpenClaw

/install document-processing

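Feature Description

Supports format conversion for PDF, Word, Excel, and PPT files, content extraction, and batch processing, with automatic sync to Feishu cloud documents and push delivery of processing results.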

Usage Instructions (SKILL.md)

Document Processing Skill

Description

A document-processing skill that supports multi-format document conversion, content extraction, and batch processing.

Features

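📄 Format conversion (PDF/Word/Excel/PPT)
🔍 Content extraction (text, tables, images)
✏️ Batch processing (renaming, format normalization)
🤝 Feishu integration (cloud document sync)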

Dependencies

Python 3.13+
PyPDF2, python-docx, openpyxl
pdfplumber, python-docx2txt

Usage

from document_processing import DocumentProcessor

# Initialize the document processor
processor = DocumentProcessor()

# Convert PDF to Word
processor.convert_format("input.pdf", "output.docx")

# Extract the content of a PDF
content = processor.extract_content("document.pdf")

# Batch-process the documents in ./docs
processor.batch_process("./docs", ".docx", ".pdf")
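The listing does not document the arguments of `batch_process`. Assuming the second and third arguments are the source and target extensions, a minimal stdlib sketch of how such a batch step might select its inputs and name its outputs could look like this; `plan_batch` is a hypothetical helper, not part of the skill's API.

```python
from pathlib import Path


def plan_batch(folder: str, src_ext: str, dst_ext: str) -> list[tuple[Path, Path]]:
    """Hypothetical helper: pair each file in `folder` whose suffix matches
    `src_ext` with the output path it would get after conversion to `dst_ext`.

    Assumes batch_process(folder, src_ext, dst_ext) means "convert every
    src_ext file in folder to dst_ext"; the skill's docs do not confirm this.
    """
    src_ext = src_ext.lower()
    pairs = []
    for path in sorted(Path(folder).iterdir()):
        # Match on the (case-insensitive) suffix; subfolders are skipped.
        if path.is_file() and path.suffix.lower() == src_ext:
            pairs.append((path, path.with_suffix(dst_ext)))
    return pairs
```

Calling `plan_batch("./docs", ".docx", ".pdf")` only lists the planned conversions; it reads directory entries and writes nothing, which makes it a cheap dry run before any real batch job.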

Integration with Feishu

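Documents are automatically synced to Feishu cloud documents
Content-extraction results are automatically saved to a Feishu Bitable
Batch-processing results are pushed as Feishu messages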
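The listing does not say how results reach Feishu messages, and the analysis below notes that the code contains no network calls. Purely as an illustration, assuming a Feishu custom-bot webhook that accepts a text message shaped like `{"msg_type": "text", "content": {"text": ...}}`, a batch summary could be packaged as follows; `build_batch_message` is hypothetical and nothing is sent.

```python
import json


def build_batch_message(results: dict[str, bool]) -> str:
    """Hypothetical helper: summarize batch results as a Feishu text message.

    `results` maps each input file name to True (converted) or False (failed).
    Assumes the custom-bot text payload shape; this only builds JSON, it does
    not send any request.
    """
    ok = sorted(name for name, success in results.items() if success)
    failed = sorted(name for name, success in results.items() if not success)
    lines = [f"Batch processing finished: {len(ok)} succeeded, {len(failed)} failed."]
    lines += [f"FAILED: {name}" for name in failed]
    payload = {"msg_type": "text", "content": {"text": "\n".join(lines)}}
    # ensure_ascii=False keeps non-ASCII (e.g. Chinese) file names readable.
    return json.dumps(payload, ensure_ascii=False)
```

Keeping payload construction separate from delivery makes the message testable offline and leaves the endpoint and credentials an explicit, reviewable choice.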

Configuration

document_processing:
  temp_dir: "/tmp/document_processing"
  default_format: "docx"
  ocr_enabled: true
  ocr_api_key: "your_ocr_api_key"

feishu:
  doc_folder_token: "your_folder_token"
  bitable_app_token: "your_bitable_token"
  bitable_table_id: "your_table_id"
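Every credential in the sample configuration is a placeholder. A minimal stdlib sketch, assuming the YAML above has already been parsed into a nested dict (for example by a YAML library), that lists which keys still hold `your_...` placeholder values; `find_placeholders` and `SAMPLE_CONFIG` are hypothetical names.

```python
def find_placeholders(config: dict, prefix: str = "") -> list[str]:
    """Hypothetical helper: return dotted keys whose string value still
    starts with "your_", i.e. was never filled in."""
    found = []
    for key, value in config.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested sections such as document_processing / feishu.
            found += find_placeholders(value, dotted + ".")
        elif isinstance(value, str) and value.startswith("your_"):
            found.append(dotted)
    return found


# The sample configuration above, written as the dict a YAML parser would return.
SAMPLE_CONFIG = {
    "document_processing": {
        "temp_dir": "/tmp/document_processing",
        "default_format": "docx",
        "ocr_enabled": True,
        "ocr_api_key": "your_ocr_api_key",
    },
    "feishu": {
        "doc_folder_token": "your_folder_token",
        "bitable_app_token": "your_bitable_token",
        "bitable_table_id": "your_table_id",
    },
}
```

For the sample above, every OCR and Feishu credential key is reported, which is a quick way to see exactly which secrets the configuration asks for.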

Security Recommendations

This package appears to be a local document-processing utility, but its documentation promises Feishu sync and cloud OCR keys that the code does not implement. Before installing it or providing any secrets:
(1) Do not put Feishu tokens or OCR API keys into this skill's configuration or environment variables until you have verified that the code actually uses them.
(2) If you need cloud sync, wait for (or request) a version in which sync_to_feishu is implemented and the network endpoints are explicit.
(3) Note that pytesseract requires a system Tesseract binary; install and test it separately.
(4) Run the module in a sandbox with non-sensitive documents to confirm its behavior, and watch for unexpected network activity.
(5) Prefer a version with explicit network calls (requests or httpx) and documented endpoints before providing production credentials.
Additional information that would raise confidence: a completed Feishu integration showing the exact endpoints and auth flows, or an updated SKILL.md that drops the cloud/credential requirements if they are not needed.
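A lightweight complement to running in a sandbox (recommendation 4) is to make any Python-level connection attempt fail loudly while you exercise the module on test documents. This sketch patches `socket.socket.connect` and `connect_ex` with `unittest.mock`; `block_network` and `NetworkAccessBlocked` are hypothetical names, and the guard cannot see traffic from native extensions or subprocesses, so it is no substitute for an OS-level sandbox.

```python
import socket
from contextlib import contextmanager
from unittest import mock


class NetworkAccessBlocked(RuntimeError):
    """Raised when code under test tries to open an outbound connection."""


@contextmanager
def block_network(attempts: list):
    """Hypothetical guard: record and refuse every connect()/connect_ex()
    call made through Python's socket class while the block is active."""

    def _deny(self, address):
        attempts.append(address)
        raise NetworkAccessBlocked(f"blocked connection to {address!r}")

    # Patch both connect variants on the Python-level socket class;
    # mock.patch.object restores the originals when the block exits.
    with mock.patch.object(socket.socket, "connect", _deny), \
            mock.patch.object(socket.socket, "connect_ex", _deny):
        yield attempts
```

Wrapping a call such as `processor.extract_content("sample.pdf")` in `with block_network([]) as attempts:` would raise on the first connection attempt and leave the target address in `attempts`. Because the exception is not an `OSError`, libraries built on `socket.create_connection` cannot silently retry past it.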

Functional Analysis

Type: OpenClaw Skill
Name: document-processing
Version: 1.0.0

The document-processing skill is a standard utility for converting and extracting content from PDF, Word, and Excel files using common libraries such as pdfplumber and openpyxl. The code in document_processing.py aligns with the conversion and extraction features described in SKILL.md; although it mentions Feishu integration, that implementation is currently a harmless placeholder (a bare pass).

Capability Assessment

⚠ Purpose & Capability

SKILL.md claims automatic Feishu sync and lists an OCR API key in its configuration, but the Python implementation contains no network calls, does not read environment variables or config for Feishu or remote OCR, and the sync_to_feishu method is a stub. SKILL.md also lists OCR and conversion dependencies but does not mention the external Tesseract binary that pytesseract requires. Because of these inconsistencies, the credentials and configuration values the documentation asks for do not match what the code actually uses.
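Since pytesseract is only a wrapper around an external executable, a small stdlib preflight can confirm that both halves are present before `ocr_enabled` is switched on. This is a sketch; `ocr_preflight` is a hypothetical name.

```python
import importlib.util
import shutil


def ocr_preflight() -> dict[str, bool]:
    """Report whether both halves of pytesseract-based OCR are available:
    the Python wrapper module and the external Tesseract executable."""
    return {
        # Installing pytesseract does not install Tesseract itself.
        "pytesseract_module": importlib.util.find_spec("pytesseract") is not None,
        # The wrapper shells out to a `tesseract` binary looked up on PATH.
        "tesseract_binary": shutil.which("tesseract") is not None,
    }
```

If Tesseract is installed outside PATH, pytesseract can be pointed at it through `pytesseract.pytesseract.tesseract_cmd`; this check would then report the binary as missing even though OCR could still work.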

ℹ Instruction Scope

The instructions and examples operate on user-supplied file paths and a temp_dir; they do not instruct the agent to read unrelated system files or credentials. However, SKILL.md promotes saving results to Feishu and using an ocr_api_key without showing how the runtime would use them. If syncing or uploading were implemented later, leaving it to the agent's open discretion would be problematic.
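Keeping the skill inside user-supplied paths and its temp_dir is easier to trust when it is enforced. If you wrap the module yourself, one option is to resolve every output path and refuse anything that lands outside the configured directory. A minimal sketch; `safe_output_path` is a hypothetical helper, and `Path.is_relative_to` needs Python 3.9+.

```python
from pathlib import Path


def safe_output_path(temp_dir: str, name: str) -> Path:
    """Hypothetical helper: join `name` onto `temp_dir` and reject results
    that escape it (via "../" segments, an absolute `name`, or symlinks)."""
    base = Path(temp_dir).resolve()
    # resolve() collapses ".." and follows symlinks before the check.
    candidate = (base / name).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"{name!r} resolves outside {base}")
    return candidate
```

Resolving both sides before comparing matters: on systems where the temp directory is itself a symlink (for example `/var` on macOS), comparing unresolved paths would reject legitimate outputs.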

✓ Install Mechanism

There is no install specification (the skill consists of instructions plus a single Python module). Nothing will be automatically downloaded or executed by an installer. Dependencies are Python packages listed in SKILL.md; installing them is standard, but the skill does not provide an automated install script.
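Because no install script is provided, a quick check that the packages listed under Dependencies are importable can save a confusing first run. A sketch using `importlib.util.find_spec`; python-docx being imported as `docx` is standard, while the `docx2txt` module name for python-docx2txt is an assumption. `missing_dependencies` is a hypothetical helper.

```python
import importlib.util

# Distribution name (as listed in SKILL.md) -> module name used in `import`.
# python-docx is imported as `docx`; the docx2txt mapping is an assumption.
REQUIRED_MODULES = {
    "PyPDF2": "PyPDF2",
    "python-docx": "docx",
    "openpyxl": "openpyxl",
    "pdfplumber": "pdfplumber",
    "python-docx2txt": "docx2txt",
}


def missing_dependencies(required: dict[str, str] = REQUIRED_MODULES) -> list[str]:
    """Return the distribution names whose import module cannot be found.
    find_spec only locates modules; it does not import or execute them."""
    return [dist for dist, module in required.items()
            if importlib.util.find_spec(module) is None]
```

An empty list means every listed package can be found; it says nothing about the Tesseract binary, which lives outside Python.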

⚠ Credentials

The SKILL.md suggests configuration values like ocr_api_key, feishu doc_folder_token, and bitable_app_token but the code does not accept or read these credentials. Conversely, the code requires a Tesseract binary (pytesseract) which SKILL.md doesn't declare as an external requirement. Asking for cloud credentials in docs without using them is disproportionate and risks encouraging users to provide secrets unnecessarily.

✓ Persistence & Privilege

The skill does not request always:true and does not modify other skills or system-wide settings. It writes to a local temp_dir under its own control, which is normal for a document-processing utility.

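How to Use

Make sure OpenClaw is installed (local or Docker deployment).
Enter the install command in the chat box: /install document-processing
After installation, invoke the skill by name or trigger it with /document-processing.
Provide the required inputs as described in the skill's parameter documentation to get structured output.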

Version History

v1.0.0

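Initial release of the document-processing skill.
- Supports multi-format document conversion (PDF/Word/Excel/PPT).
- Supports content extraction (text, tables, images).
- Provides batch processing, such as renaming and format normalization.
- Integrates with Feishu for cloud document sync, content push, and Bitable storage.
- Includes a configuration example and a list of the main dependencies.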

Metadata

Slug document-processing

Version 1.0.0

License MIT-0

Total installs 2

Current installs 2

Version count 1

FAQ