pdf-processor

Name: pdf-processor
Author: pengsc1994

by pengsc1994 · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ✓ Security Clean

Install in OpenClaw

/install free-pdf-processor

Description

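One-stop PDF processing skill. Supports extracting text, images, and tables from PDFs, format conversion (PDF↔Word/Excel), merging and splitting, OCR, batch processing, watermarking, encryption and decryption, compression, and more. Use cases: (1) extract text from a PDF for data analysis (2) convert a PDF to Word/Excel for easier editing (3) merge or split PDF ...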

README (SKILL.md)

PDF Processing Skill

Quick Start

Install Dependencies

cd D:\PDF.skill\pdf-processor
pip install -r requirements.txt

Core Features

Feature	Command	Description
Extract text	`python scripts/extract_text.py <pdf_path>`	Extract the text content of a PDF
Extract images	`python scripts/extract_images.py <pdf_path> <output_dir>`	Extract the images embedded in a PDF
Extract tables	`python scripts/extract_tables.py <pdf_path>`	Extract the tables in a PDF
PDF to Word	`python scripts/pdf_to_word.py <pdf_path> <output_path>`	Convert to an editable Word document
PDF to Excel	`python scripts/pdf_to_excel.py <pdf_path> <output_path>`	Extract tables into Excel
Merge PDFs	`python scripts/merge_pdfs.py <output_path> <file1> <file2> ...`	Merge multiple PDFs
Split PDF	`python scripts/split_pdf.py <pdf_path> <output_dir>`	Split a PDF by page
Add watermark	`python scripts/add_watermark.py <pdf_path> <output_path> <text>`	Add a text watermark
OCR	`python scripts/ocr_pdf.py <pdf_path> <output_path>`	Run OCR on scanned documents
Encrypt PDF	`python scripts/encrypt_pdf.py <input> <output> <password>`	AES-256 encryption
Decrypt PDF	`python scripts/decrypt_pdf.py <input> <output> <password>`	Decrypt a PDF
Compress PDF	`python scripts/compress_pdf.py <input> <output>`	Compress a PDF file
Batch processing	`python scripts/batch_process.py <input_dir> <output_dir> --operation <op>`	Process many files at once

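Feature Details

extract_text.py

Extracts the text content of a PDF. Supports:

Plain-text extraction
Preserving paragraph structure
Extracting metadata (title, author, creation date)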

python scripts/extract_text.py input.pdf -o output.txt --metadata
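When the script is called from other Python code rather than typed by hand, the command above can be assembled as an argument list. The following is a minimal sketch, assuming the working directory is the skill folder so that `scripts/extract_text.py` resolves. The helper names `build_extract_text_cmd` and `run_extract_text` are hypothetical, and only the `-o` and `--metadata` flags shown above are used.

```python
import subprocess
import sys
from pathlib import Path


def build_extract_text_cmd(pdf_path, output_path=None, metadata=False):
    """Assemble the argument list for scripts/extract_text.py (hypothetical helper)."""
    # sys.executable reuses the interpreter that has the dependencies installed.
    cmd = [sys.executable, str(Path("scripts") / "extract_text.py"), str(pdf_path)]
    if output_path is not None:
        cmd += ["-o", str(output_path)]
    if metadata:
        cmd.append("--metadata")
    return cmd


def run_extract_text(pdf_path, output_path=None, metadata=False):
    """Run the script; raises CalledProcessError if it exits with a non-zero status."""
    cmd = build_extract_text_cmd(pdf_path, output_path, metadata)
    return subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.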

extract_tables.py

Extracts table data from a PDF:

Automatically detects table borders
Supports merged cells
Outputs an Excel file

pdf_to_word.py

Converts a PDF to Word:

Preserves the original formatting
Carries images over into the Word document
Converts tables into Word tables

pdf_to_excel.py

Converts a PDF to Excel:

Extracts each table to a separate sheet
Preserves text content
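Writing each table to its own sheet needs a naming scheme, because Excel limits worksheet names to 31 characters and rejects the characters `[ ] : * ? / \`. The sketch below shows one possible scheme. The function name `sheet_name_for` and the `Page<N>_T<M>` pattern are hypothetical illustrations, not necessarily what pdf_to_excel.py does.

```python
import re

# Characters Excel does not allow in worksheet names.
_INVALID_SHEET_CHARS = re.compile(r"[\[\]:*?/\\]")
_MAX_SHEET_NAME_LEN = 31  # Excel's limit on worksheet name length


def sheet_name_for(page_number, table_index, prefix="Page"):
    """Return a worksheet name such as 'Page3_T2' that Excel will accept."""
    # Keep the page/table suffix intact so names stay unique; trim the prefix instead.
    suffix = f"{page_number}_T{table_index}"
    cleaned_prefix = _INVALID_SHEET_CHARS.sub("_", prefix)
    keep = max(0, _MAX_SHEET_NAME_LEN - len(suffix))
    return cleaned_prefix[:keep] + suffix
```

Trimming the prefix rather than the whole name means two tables never collapse onto the same sheet name after truncation.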

add_watermark.py

Watermarking:

Supports text watermarks
Opacity, rotation angle, and font size are configurable
Supports adding watermarks in batches

ocr_pdf.py

OCR (requires Tesseract to be installed):

Uses Tesseract for Chinese text recognition
Supports recognition of mixed languages
Preserves the original PDF layout
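Before running OCR it can help to confirm that the Tesseract engine is reachable and to build the language argument. The sketch below uses only the standard library. `chi_sim` (Simplified Chinese) and `eng` are standard Tesseract language codes, and Tesseract accepts several codes joined with `+`. How ocr_pdf.py itself selects languages is not documented here, so the helper names and the way they would be wired into the script are assumptions.

```python
import shutil


def find_tesseract():
    """Return the path of the `tesseract` executable on PATH, or None if it is absent."""
    return shutil.which("tesseract")


def tesseract_lang(codes):
    """Join Tesseract language codes with '+', e.g. ['chi_sim', 'eng'] -> 'chi_sim+eng'."""
    cleaned = [code.strip() for code in codes if code and code.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)
```

A joined value such as `chi_sim+eng` is the form that both the `tesseract -l` option and the `lang` argument of pytesseract expect. The matching language data files must be installed alongside the engine.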

encrypt_pdf.py / decrypt_pdf.py

Encryption and decryption:

AES-256 encryption
Supports both a user password and an owner password

compress_pdf.py

Compression:

Removes unused objects
Compresses images
Five compression levels to choose from

batch_process.py

Batch processing:

Supports every single-file operation
Automatically processes all PDFs in a directory
Generates a processing report
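The batch behaviour described above (walk a directory, apply one operation per PDF, collect a report) can be sketched with the standard library alone. This is an illustration under assumptions, not the code of batch_process.py: the operation is modelled as a Python callable `operation(src, dst)`, and the report format is hypothetical.

```python
from pathlib import Path


def batch_process(input_dir, output_dir, operation):
    """Apply operation(src, dst) to every *.pdf in input_dir and return a report."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    report = []
    # Note: the "*.pdf" pattern is case-sensitive on most Linux file systems.
    for src in sorted(input_dir.glob("*.pdf")):
        dst = output_dir / src.name
        try:
            operation(src, dst)
            report.append({"file": src.name, "status": "ok", "error": None})
        except Exception as exc:  # record the failure and continue with the next file
            report.append({"file": src.name, "status": "failed", "error": str(exc)})
    return report


def summarize(report):
    """Return a one-line summary of a report produced by batch_process."""
    ok = sum(1 for entry in report if entry["status"] == "ok")
    return f"{ok} succeeded, {len(report) - ok} failed, {len(report)} total"
```

Catching the exception per file keeps one damaged PDF from aborting the whole run, and the report still shows which files need attention.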

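Usage Examples

Extract text from a PDF

User: Please extract the text content of this contract
AI: Uses the extract_text.py script to extract the text

PDF to Word

User: Convert this PDF into a Word document
AI: Uses pdf_to_word.py to convert it

Batch watermarking

User: Add an "内部资料" (internal use only) watermark to every PDF in this folder
AI: Uses batch_process.py to process the folder

Encrypt a PDF

User: This file needs to be encrypted
AI: Uses encrypt_pdf.py to apply AES-256 encryption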

Dependency Installation

Base Dependencies

pip install pymupdf pdfplumber python-docx openpyxl pillow

OCR Support (Optional)

# Install Tesseract OCR
# Windows: https://github.com/UB-Mannheim/tesseract/wiki
# macOS: brew install tesseract
# Linux: sudo apt install tesseract-ocr

pip install pytesseract
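After installing, a quick check can confirm that the packages are importable. Several of them are imported under a different name than the one given to pip: PyMuPDF is imported as `fitz`, python-docx as `docx`, and Pillow as `PIL`. The sketch below uses only the standard library, and the function and dictionary names are hypothetical.

```python
from importlib.util import find_spec

# pip package name -> module name used in `import` statements
REQUIRED = {
    "pymupdf": "fitz",
    "pdfplumber": "pdfplumber",
    "python-docx": "docx",
    "openpyxl": "openpyxl",
    "pillow": "PIL",
}
OPTIONAL = {"pytesseract": "pytesseract"}


def missing_packages(packages):
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in packages.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    print("Missing required packages:", missing_packages(REQUIRED) or "none")
    print("Missing optional packages:", missing_packages(OPTIONAL) or "none")
```

`find_spec` only locates the module without importing it, so the check stays fast and has no side effects.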

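Notes

A password must be supplied when working with encrypted PDFs
OCR requires the Tesseract engine to be installed
Processing large files may take a long time
Conversion quality depends on the quality of the original PDF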

Usage Guidance

This package appears internally consistent with a local PDF utility. Before installing or running it: (1) review the scripts yourself or run them in a sandbox/VM—these are local Python scripts and will read and write files you pass to them; (2) pip will install packages from PyPI—only proceed if you trust those packages or inspect requirements.txt (they are common PDF libraries here); (3) OCR requires the Tesseract engine (you must install it separately); (4) note minor privacy issue: encrypt_pdf prints the password to stdout, so avoid exposing sensitive passwords to shared consoles or logs; (5) some scripts have small implementation issues (e.g., missing imports in places) but those are functional bugs, not signs of malicious intent. If you process sensitive documents, run the tools on an isolated machine and verify outputs before sharing.
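Point (4) above can be mitigated from the calling side by capturing the script's output and masking the password before it reaches a shared console or log. The following is a minimal sketch, assuming the script is run from the skill folder. The function names are hypothetical. The password is still passed as a command-line argument, because that is the interface the script exposes, so it remains visible in the process list while the script runs.

```python
import subprocess
import sys


def run_redacted(cmd, secret):
    """Run cmd and return (returncode, stdout, stderr) with secret masked as '***'."""
    if not secret:
        raise ValueError("secret must be a non-empty string")
    result = subprocess.run(cmd, capture_output=True, text=True)
    return (
        result.returncode,
        result.stdout.replace(secret, "***"),
        result.stderr.replace(secret, "***"),
    )


def encrypt_pdf_redacted(input_pdf, output_pdf, password):
    """Call scripts/encrypt_pdf.py (path assumed relative to the skill folder)."""
    cmd = [sys.executable, "scripts/encrypt_pdf.py", str(input_pdf), str(output_pdf), password]
    return run_redacted(cmd, password)
```

In interactive use the password could be read with `getpass.getpass()` from the standard library, so that it is not echoed or stored in shell history.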

Capability Analysis

Type: OpenClaw Skill
Name: free-pdf-processor
Version: 1.0.0

The 'free-pdf-processor' skill bundle is a legitimate collection of utility scripts for PDF manipulation, including text/image extraction, format conversion (Word/Excel), OCR, and encryption. The implementation uses standard, well-known libraries such as PyMuPDF (fitz), pdfplumber, and python-docx. All scripts in the 'scripts/' directory align strictly with the functionalities described in SKILL.md, and there is no evidence of data exfiltration, malicious execution, or prompt-injection attempts.

Capability Assessment

✓ Purpose & Capability

The name/description describe PDF extraction, conversion, OCR, watermarking, encryption, compression, merging/splitting and the repository contains scripts that implement those features. Declared dependencies (pymupdf, pdfplumber, python-docx, openpyxl, Pillow, optional pytesseract) align with the functionality; there are no unrelated credentials, binaries, or config paths requested.

✓ Instruction Scope

SKILL.md only instructs installing Python deps and running the included scripts. The runtime instructions do not direct the agent to read unrelated system files, environment variables, or to send data to external endpoints. Note: SKILL.md uses a Windows-like example path (D:\PDF.skill\...) which is just an example and not a request for system-wide access.

✓ Install Mechanism

There is no automated install spec (instruction-only). Dependencies are installed via pip from requirements.txt (PyPI). The listed packages are common, well-known libraries for PDF/image/office processing. No downloads from arbitrary URLs or archive extraction are present.

✓ Credentials

The skill requests no environment variables or credentials. The code performs only local file I/O and local dependency checks (e.g., searching for Tesseract executable paths); it does not attempt to read unrelated environment secrets or external config.

✓ Persistence & Privilege

The skill is not set to always:true and does not modify other skills or global agent settings. It has no mechanism to persistently install itself into a platform or exfiltrate configuration.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install free-pdf-processor
After installation, invoke the skill by name or use /free-pdf-processor
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

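Initial public release: fully upgraded into a multi-purpose, one-stop PDF processing tool
- Added 14 standalone PDF scripts covering text/image/table extraction, OCR, format conversion (PDF↔Word/Excel), merging and splitting, watermarking, encryption and decryption, compression, batch processing, and more
- Supports one-step command-line handling of common PDF tasks (content extraction, batch watermarking, encryption and decryption, format conversion, and so on)
- Removed the earlier academic-only workflow and its documentation to focus on general-purpose PDF tooling
- Improved modularity and extensibility: each feature is implemented as its own script, so it can be called and integrated as needed
- Fully updated the documentation, adding a core-feature quick-reference table and detailed usage examples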

Metadata

Slug free-pdf-processor

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is pdf-processor?

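One-stop PDF processing skill. Supports extracting text, images, and tables from PDFs, format conversion (PDF↔Word/Excel), merging and splitting, OCR, batch processing, watermarking, encryption and decryption, compression, and more. Use cases: (1) extract text from a PDF for data analysis (2) convert a PDF to Word/Excel for easier editing (3) merge or split PDF ... It is an AI Agent Skill for Claude Code / OpenClaw, with 17 downloads so far.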

How do I install pdf-processor?

Run "/install free-pdf-processor" in the OpenClaw or Claude Code chat to install it in one step. The Python dependencies listed in the README, and the Tesseract engine if OCR is needed, are installed separately.

Is pdf-processor free?

Yes, pdf-processor is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does pdf-processor support?

pdf-processor is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created pdf-processor?

It is built and maintained by pengsc1994 (@pengsc1994); the current version is v1.0.0.
