pdf-processor

Name: pdf-processor
Author: pengsc1994

By pengsc1994 · GitHub · v1.0.0 · MIT-0

cross-platform ✓ Security check passed

Install in OpenClaw

/install free-pdf-processor

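Description

An all-in-one PDF processing skill. It supports extracting text, images, and tables from PDFs, format conversion (PDF↔Word/Excel), merging and splitting, OCR, batch processing, watermarking, encryption and decryption, compression, and more. Use cases: (1) extract text from a PDF for data analysis; (2) convert a PDF to Word/Excel for easier editing; (3) merge or split PDF ...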

Usage (SKILL.md)

PDF Processing Skill

Quick Start

Install dependencies

cd D:\PDF.skill\pdf-processor
pip install -r requirements.txt

Core Features

Feature	Command	Description
Extract text	`python scripts/extract_text.py <pdf_path>`	Extract the text content of a PDF
Extract images	`python scripts/extract_images.py <pdf_path> <output_dir>`	Extract the images embedded in a PDF
Extract tables	`python scripts/extract_tables.py <pdf_path>`	Extract the tables in a PDF
PDF to Word	`python scripts/pdf_to_word.py <pdf_path> <output_path>`	Convert to an editable Word document
PDF to Excel	`python scripts/pdf_to_excel.py <pdf_path> <output_path>`	Extract tables into Excel
Merge PDFs	`python scripts/merge_pdfs.py <output_path> <file1> <file2> ...`	Merge multiple PDFs
Split PDF	`python scripts/split_pdf.py <pdf_path> <output_dir>`	Split a PDF page by page
Add watermark	`python scripts/add_watermark.py <pdf_path> <output_path> <text>`	Add a text watermark
OCR	`python scripts/ocr_pdf.py <pdf_path> <output_path>`	Run OCR on scanned documents
Encrypt PDF	`python scripts/encrypt_pdf.py <input> <output> <password>`	AES-256 encryption
Decrypt PDF	`python scripts/decrypt_pdf.py <input> <output> <password>`	Decrypt a PDF
Compress PDF	`python scripts/compress_pdf.py <input> <output>`	Compress a PDF file
Batch processing	`python scripts/batch_process.py <input_dir> <output_dir> --operation <op>`	Process a whole directory
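
When the scripts are driven from another program rather than typed by hand, the table above can be turned into a small lookup. The following is a minimal sketch, not part of the skill: `build_command`, `SCRIPTS`, and `SCRIPTS_DIR` are hypothetical names, and the argument order simply mirrors the placeholders in the table.

```python
import sys
from pathlib import Path

# Assumed location of the scripts relative to the working directory.
SCRIPTS_DIR = Path("scripts")

# Operation keys are illustrative; the file names come from the table above.
SCRIPTS = {
    "extract_text": "extract_text.py",
    "extract_images": "extract_images.py",
    "extract_tables": "extract_tables.py",
    "pdf_to_word": "pdf_to_word.py",
    "pdf_to_excel": "pdf_to_excel.py",
    "merge": "merge_pdfs.py",
    "split": "split_pdf.py",
    "watermark": "add_watermark.py",
    "ocr": "ocr_pdf.py",
    "encrypt": "encrypt_pdf.py",
    "decrypt": "decrypt_pdf.py",
    "compress": "compress_pdf.py",
}


def build_command(operation: str, *args: str) -> list[str]:
    """Return an argv list that runs one script with the current interpreter."""
    try:
        script = SCRIPTS[operation]
    except KeyError:
        raise ValueError(f"unknown operation: {operation!r}") from None
    return [sys.executable, str(SCRIPTS_DIR / script), *map(str, args)]
```

The resulting list can be passed to `subprocess.run(..., check=True)`, for example `build_command("merge", "all.pdf", "a.pdf", "b.pdf")`. Passing a list instead of a shell string avoids quoting problems with file names that contain spaces.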

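Feature Details

extract_text.py

Extracts text from a PDF. Supports:

Plain-text extraction
Preserving paragraph structure
Extracting metadata (title, author, creation date)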

python scripts/extract_text.py input.pdf -o output.txt --metadata
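
The script's source is not reproduced on this page, so the following is only a sketch of how a command line like the one above could be wired up with `argparse` and PyMuPDF. The long option name `--output` and the helper names `build_parser`, `extract`, and `main` are assumptions; only `-o` and `--metadata` come from the example command.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    # Mirrors the flags in the example command; the real parser may differ.
    parser = argparse.ArgumentParser(description="Extract text from a PDF")
    parser.add_argument("pdf_path")
    parser.add_argument("-o", "--output", help="write the text to this file")
    parser.add_argument("--metadata", action="store_true",
                        help="also print title, author and creation date")
    return parser


def extract(pdf_path: str, with_metadata: bool = False) -> tuple[str, dict]:
    import fitz  # PyMuPDF, imported lazily so the parser works without it

    doc = fitz.open(pdf_path)
    try:
        # A blank line between pages keeps page boundaries visible.
        text = "\n\n".join(page.get_text() for page in doc)
        meta = {}
        if with_metadata:
            info = doc.metadata or {}
            meta = {k: info.get(k) for k in ("title", "author", "creationDate")}
        return text, meta
    finally:
        doc.close()


def main(argv=None) -> None:
    args = build_parser().parse_args(argv)
    text, meta = extract(args.pdf_path, args.metadata)
    if meta:
        print(json.dumps(meta, ensure_ascii=False))
    if args.output:
        with open(args.output, "w", encoding="utf-8") as fh:
            fh.write(text)
    else:
        print(text)
```

Calling `main(["input.pdf", "-o", "output.txt", "--metadata"])` would then behave like the command shown above, assuming PyMuPDF is installed.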

extract_tables.py

Extracts table data from a PDF:

Detects table borders automatically
Supports merged cells
Outputs an Excel file

pdf_to_word.py

Converts PDF to Word:

Preserves the original formatting
Carries images over into the Word document
Converts tables to Word tables

pdf_to_excel.py

Converts PDF to Excel:

Extracts tables to separate sheets
Preserves text content
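
As an illustration of the "one table per sheet" idea, here is a minimal sketch built on pdfplumber and openpyxl, both of which appear in the dependency list. It is not the skill's actual code: the function names and the `p<page>_t<table>` sheet-naming scheme are assumptions.

```python
def clean_row(row):
    """Replace pdfplumber's None cells with empty strings."""
    return ["" if cell is None else str(cell) for cell in row]


def sheet_title(page_no: int, table_no: int) -> str:
    # Excel limits sheet names to 31 characters; this pattern stays well below.
    return f"p{page_no}_t{table_no}"


def tables_to_excel(pdf_path: str, xlsx_path: str) -> int:
    """Write every detected table to its own sheet and return the table count."""
    import pdfplumber
    from openpyxl import Workbook

    wb = Workbook()
    wb.remove(wb.active)  # drop the default empty sheet
    count = 0
    with pdfplumber.open(pdf_path) as pdf:
        for page_no, page in enumerate(pdf.pages, start=1):
            for table_no, table in enumerate(page.extract_tables(), start=1):
                ws = wb.create_sheet(title=sheet_title(page_no, table_no))
                for row in table:
                    ws.append(clean_row(row))
                count += 1
    if count == 0:
        # A workbook needs at least one sheet before it can be saved.
        wb.create_sheet(title="no_tables")
    wb.save(xlsx_path)
    return count
```

Cells are written as strings, so numeric columns would need a separate conversion step if they are meant to be used in formulas.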

add_watermark.py

Watermarking:

Supports text watermarks
Configurable opacity, rotation angle, and font size
Supports batch watermarking

ocr_pdf.py

OCR (requires Tesseract to be installed):

Uses Tesseract for Chinese text recognition
Supports mixed-language recognition
Preserves the original PDF layout

encrypt_pdf.py / decrypt_pdf.py

Encryption and decryption:

AES-256 encryption
Supports user and owner passwords
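
A sketch of AES-256 encryption with separate user and owner passwords, using PyMuPDF's `Document.save`. This is an illustration under stated assumptions, not the skill's script: `redact`, `read_password`, and `encrypt_pdf` are hypothetical helpers, and falling back to the user password when no owner password is given is a choice made here for brevity. The password is read with `getpass` and masked in the status message so that it does not end up in a terminal log.

```python
import getpass
from typing import Optional


def redact(secret: str) -> str:
    """Return a fixed-width mask so logs reveal neither content nor length."""
    return "********" if secret else ""


def read_password(prompt: str = "PDF password: ") -> str:
    # getpass reads from the terminal without echoing the input.
    return getpass.getpass(prompt)


def encrypt_pdf(src: str, dst: str, user_pw: str,
                owner_pw: Optional[str] = None) -> None:
    import fitz  # PyMuPDF, imported lazily

    doc = fitz.open(src)
    try:
        doc.save(
            dst,
            encryption=fitz.PDF_ENCRYPT_AES_256,
            user_pw=user_pw,
            owner_pw=owner_pw or user_pw,  # assumption: reuse if not given
        )
    finally:
        doc.close()
    print(f"encrypted {src} -> {dst} (password {redact(user_pw)})")
```

A typical call would be `encrypt_pdf("in.pdf", "out.pdf", read_password())`. Prompting for the password also keeps it out of the shell history, which a command-line argument would not.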

compress_pdf.py

Compression:

Removes unused objects
Compresses images
Five compression levels to choose from

batch_process.py

Batch processing:

Supports every single-file operation
Processes all PDFs in a directory automatically
Generates a processing report
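
The loop-and-report pattern described above can be sketched with the standard library alone. The function below is hypothetical and does not reflect how batch_process.py is actually written: it takes any callable `operation(src, dst)`, applies it to each `*.pdf` file, and records successes and failures instead of stopping at the first error.

```python
from pathlib import Path
from typing import Callable


def batch_process(input_dir, output_dir,
                  operation: Callable[[Path, Path], None]) -> dict:
    """Apply operation(src, dst) to every PDF in input_dir and build a report."""
    in_dir, out_dir = Path(input_dir), Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    report = {"succeeded": [], "failed": {}}
    # Note: the pattern is case-sensitive on most non-Windows systems.
    for src in sorted(in_dir.glob("*.pdf")):
        dst = out_dir / src.name
        try:
            operation(src, dst)
            report["succeeded"].append(src.name)
        except Exception as exc:  # keep going and record the failure
            report["failed"][src.name] = str(exc)
    return report
```

The returned dictionary can be dumped with `json.dumps` to produce a simple processing report.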

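Usage Examples

Extract text from a PDF

User: Please extract the text from this contract.
AI: Uses the extract_text.py script to extract the text.

PDF to Word

User: Convert this PDF to a Word document.
AI: Uses pdf_to_word.py to perform the conversion.

Batch watermarking

User: Add an "Internal Document" watermark to every PDF in this folder.
AI: Uses batch_process.py to process the folder.

Encrypt a PDF

User: This file needs to be encrypted.
AI: Uses encrypt_pdf.py to apply AES-256 encryption.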

Dependencies

Base dependencies

pip install pymupdf pdfplumber python-docx openpyxl pillow
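
To confirm that the packages above are importable before running any script, a small check along these lines may help. It is a sketch rather than part of the skill; `REQUIRED` and `missing_packages` are hypothetical names. The mapping is needed because several pip package names differ from the module names used in `import` statements.

```python
from importlib.util import find_spec

# pip package name -> top-level module name used in `import` statements.
REQUIRED = {
    "pymupdf": "fitz",
    "pdfplumber": "pdfplumber",
    "python-docx": "docx",
    "openpyxl": "openpyxl",
    "pillow": "PIL",
}


def missing_packages(required=REQUIRED) -> list[str]:
    """Return the pip names of packages whose module cannot be found."""
    return sorted(pkg for pkg, module in required.items()
                  if find_spec(module) is None)
```

An empty list from `missing_packages()` means all base dependencies were found in the current environment.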

OCR support (optional)

# Install Tesseract OCR
# Windows: https://github.com/UB-Mannheim/tesseract/wiki
# macOS: brew install tesseract
# Linux: sudo apt install tesseract-ocr

pip install pytesseract
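
pytesseract is only a wrapper, so the Tesseract executable itself has to be reachable. The sketch below checks for it on `PATH` and lists the installed language packs with `tesseract --list-langs`. The helper names are hypothetical, and a Windows installation may place the executable outside `PATH`, in which case this lookup returns `None` even though Tesseract is installed.

```python
import shutil
import subprocess


def find_tesseract():
    """Return the path of the tesseract executable on PATH, or None."""
    return shutil.which("tesseract")


def installed_languages(exe: str) -> list[str]:
    # The command prints a header line followed by one language code per line;
    # some older builds write the list to stderr instead of stdout.
    result = subprocess.run([exe, "--list-langs"],
                            capture_output=True, text=True, check=True)
    output = result.stdout or result.stderr
    return [line.strip() for line in output.splitlines()[1:] if line.strip()]
```

For Chinese recognition, the list would need to include a Chinese language pack such as `chi_sim` (Simplified Chinese).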

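Notes

Encrypted PDFs require a password
OCR requires the Tesseract engine to be installed
Large files may take a long time to process
Conversion quality depends on the quality of the source PDF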

Security Recommendations

This package appears internally consistent with a local PDF utility. Before installing or running it: (1) review the scripts yourself or run them in a sandbox/VM—these are local Python scripts and will read and write files you pass to them; (2) pip will install packages from PyPI—only proceed if you trust those packages or inspect requirements.txt (they are common PDF libraries here); (3) OCR requires the Tesseract engine (you must install it separately); (4) note minor privacy issue: encrypt_pdf prints the password to stdout, so avoid exposing sensitive passwords to shared consoles or logs; (5) some scripts have small implementation issues (e.g., missing imports in places) but those are functional bugs, not signs of malicious intent. If you process sensitive documents, run the tools on an isolated machine and verify outputs before sharing.

Functional Analysis

Type: OpenClaw Skill
Name: free-pdf-processor
Version: 1.0.0

The 'free-pdf-processor' skill bundle is a legitimate collection of utility scripts for PDF manipulation, including text/image extraction, format conversion (Word/Excel), OCR, and encryption. The implementation uses standard, well-known libraries such as PyMuPDF (fitz), pdfplumber, and python-docx. All scripts in the 'scripts/' directory align strictly with the functionalities described in SKILL.md, and there is no evidence of data exfiltration, malicious execution, or prompt-injection attempts.

Capability Assessment

✓ Purpose & Capability

The name/description describe PDF extraction, conversion, OCR, watermarking, encryption, compression, merging/splitting and the repository contains scripts that implement those features. Declared dependencies (pymupdf, pdfplumber, python-docx, openpyxl, Pillow, optional pytesseract) align with the functionality; there are no unrelated credentials, binaries, or config paths requested.

✓ Instruction Scope

SKILL.md only instructs installing Python deps and running the included scripts. The runtime instructions do not direct the agent to read unrelated system files, environment variables, or to send data to external endpoints. Note: SKILL.md uses a Windows-like example path (D:\PDF.skill\...) which is just an example and not a request for system-wide access.

✓ Install Mechanism

There is no automated install spec (instruction-only). Dependencies are installed via pip from requirements.txt (PyPI). The listed packages are common, well-known libraries for PDF/image/office processing. No downloads from arbitrary URLs or archive extraction are present.

✓ Credentials

The skill requests no environment variables or credentials. The code performs only local file I/O and local dependency checks (e.g., searching for Tesseract executable paths); it does not attempt to read unrelated environment secrets or external config.

✓ Persistence & Privilege

The skill is not set to always:true and does not modify other skills or global agent settings. It has no mechanism to persistently install itself into a platform or exfiltrate configuration.

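How to Use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install free-pdf-processor
Once installed, invoke the Skill by name or trigger it with /free-pdf-processor
Provide the inputs described in the Skill's parameter documentation to get structured output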

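Version History

v1.0.0

Initial public release: fully upgraded into a multi-purpose, all-in-one PDF processing tool
- Added 14 standalone PDF scripts covering text/image/table extraction, OCR, format conversion (PDF↔Word/Excel), merging and splitting, watermarking, encryption and decryption, compression, and batch processing
- Supports one-command handling of common PDF tasks from the command line (content extraction, batch watermarking, encryption/decryption, format conversion, and so on)
- Removed the former academic-only workflow and its documentation to focus on general-purpose PDF tooling
- Improved modularity and extensibility: each feature is implemented as its own script, so it can be called and integrated as needed
- Fully updated documentation, adding a core-feature quick-reference table and detailed usage examples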

Metadata

Slug free-pdf-processor

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Versions 1

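FAQ

What is pdf-processor?

An all-in-one PDF processing skill. It supports extracting text, images, and tables from PDFs, format conversion (PDF↔Word/Excel), merging and splitting, OCR, batch processing, watermarking, encryption and decryption, compression, and more. Use cases: (1) extract text from a PDF for data analysis; (2) convert a PDF to Word/Excel for easier editing; (3) merge or split PDF ... It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 17 times so far.

How do I install pdf-processor?

Run the command "/install free-pdf-processor" in the OpenClaw or Claude Code chat box for a one-step install; no extra configuration is needed.

Is pdf-processor free?

Yes. pdf-processor is completely free and released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does pdf-processor support?

pdf-processor is cross-platform and works in any environment where OpenClaw / Claude Code is deployed.

Who develops pdf-processor?

It is developed and maintained by pengsc1994 (@pengsc1994); the current version is v1.0.0.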