Description

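PDF-to-Word conversion skill with precise format preservation. Converts PDF documents to Word documents while preserving the original font names, font sizes (pt), paragraph formatting (1.5× line spacing, spacing before and after paragraphs, 2-character first-line indent), text alignment (centered/justified), text formatting (bold, italic, underline, color), tables, and images. Supports intelligent Chinese font mapping (宋体→宋体, 黑体→黑体, 楷体→楷体...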

README (SKILL.md)

# PDF带格式精确转换成Word (PDF to Word with Format)

Name: PDF带格式转换成Word
Author: yechao1995

## Overview

This skill provides high-precision PDF-to-Word conversion that preserves as much of the original document's formatting as possible.

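## Core Features

### Precise font mapping
- Identifies PDF font names
- Maps them to Chinese and English fonts available in Word
- Supported fonts include 宋体 (SimSun), 黑体 (SimHei), 楷体 (KaiTi), 仿宋_GB2312, Times New Roman, and Arial
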
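### Precise font size conversion
- Preserves the original font size (pt)
- Typical values: 22pt titles, with 16pt and 14pt for lower levels (in the standard Chinese size scale, 22pt, 16pt, and 14pt are 二号, 三号, and 四号)
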
### Paragraph formatting
- Line spacing: 1.5×
- First-line indent: 2 characters (0.74cm)
- Spacing before and after paragraphs

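
The 0.74cm figure can be reproduced with simple arithmetic. The sketch below assumes a full-width character is as wide as the font size, so a 2-character indent is twice the font size; under that assumption 0.74cm corresponds to 10.5pt text (五号). The function name is hypothetical and not taken from `convert.py`.

```python
PT_PER_INCH = 72      # points per inch
CM_PER_INCH = 2.54    # centimetres per inch


def first_line_indent_cm(font_size_pt: float, chars: int = 2) -> float:
    """Width of `chars` full-width characters at the given font size, in cm.

    Assumption: one full-width (CJK) character is as wide as the font size.
    """
    return chars * font_size_pt / PT_PER_INCH * CM_PER_INCH


if __name__ == "__main__":
    print(round(first_line_indent_cm(10.5), 2))  # 0.74
    print(round(first_line_indent_cm(12), 2))    # 0.85
```

If the converter uses a fixed 0.74cm regardless of size, text set larger than 10.5pt would be indented by slightly less than two characters.
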
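### Text alignment
- Left, center, right, and justified

### Text formatting
- Bold
- Italic
- Underline
- Text color

### Table support
- Full table structure
- Cell contents

### Image support
- Extracts images and preserves their positions
- Automatically resizes images

---
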

## Usage

### Basic conversion

```bash
python convert.py <input.pdf> --output <output.docx>
```

### Batch conversion

```bash
python convert.py <PDF folder> --batch --output <output folder>
```
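
Batch mode presumably pairs each PDF in the input folder with a `.docx` file of the same name in the output folder. A minimal sketch of that pairing, using a hypothetical `plan_batch` helper (not taken from `convert.py`) and assuming subfolders are not searched:

```python
from pathlib import Path


def plan_batch(input_dir, output_dir):
    """Return (pdf_path, docx_path) pairs for every PDF directly in input_dir.

    Assumptions: no recursion into subfolders, and the output keeps the
    input file's base name with a .docx extension.
    """
    in_dir = Path(input_dir)
    out_dir = Path(output_dir)
    pdfs = sorted(
        p for p in in_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    return [(pdf, out_dir / (pdf.stem + ".docx")) for pdf in pdfs]


if __name__ == "__main__":
    for pdf, docx in plan_batch("./pdfs", "./words"):
        print(f"{pdf} -> {docx}")
```

Computing the plan before converting makes it easy to check for name collisions or to skip files whose output already exists.
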

### Convert specific pages

```bash
python convert.py document.pdf --pages 0-5 --output document.docx
```
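
The Examples section uses `--pages 0-9` for the first 10 pages, which suggests the range is zero-based and inclusive at both ends. A sketch of how such a value could be parsed follows; `parse_page_range` is a hypothetical helper, and accepting a single page number or clamping to the document length are assumptions rather than documented behavior.

```python
def parse_page_range(spec: str, page_count: int) -> list[int]:
    """Expand a 'start-end' spec into zero-based page indices (inclusive)."""
    start_text, sep, end_text = spec.partition("-")
    start = int(start_text)
    # Assumption: a bare number such as "3" selects that single page.
    end = int(end_text) if sep else start
    if start < 0 or end < start:
        raise ValueError(f"invalid page range: {spec!r}")
    # Assumption: ranges running past the last page are clamped.
    end = min(end, page_count - 1)
    return list(range(start, end + 1))


if __name__ == "__main__":
    print(parse_page_range("0-5", page_count=20))  # [0, 1, 2, 3, 4, 5]
```
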

---

## Installing Dependencies

Install the dependencies before first use:

```bash
pip install pymupdf python-docx
```

---

## Examples

```bash
# Basic conversion
python convert.py report.pdf --output report.docx

# Convert every PDF in a folder
python convert.py ./pdfs/ --batch --output ./words/

# Convert the first 10 pages
python convert.py document.pdf --pages 0-9 --output document.docx

# Specify the start and end pages
python convert.py long-document.pdf --start 5 --end 15 --output part.docx
```

---

## Output

- The output file is in `.docx` format and opens in Microsoft Word or WPS
- The converted document retains most of the original formatting
- Results may differ slightly for PDFs with unusual layouts

---
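
## How It Works

This skill is built on the following libraries:

1. **PyMuPDF (fitz)** - extracts PDF content and formatting information
2. **python-docx** - builds the Word document

The extracted formatting information includes:
- Font name and font size
- Text alignment
- Bold, italic, and underline flags
- Text color (RGB)
- Paragraph position coordinates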
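
As an illustration of the extraction step, PyMuPDF's `page.get_text("dict")` returns text spans as dictionaries with `font`, `size`, `flags`, and `color` keys. The sketch below decodes one such span; `decode_span` is a hypothetical helper and not the code in `convert.py`. The `flags` bits used here cover italic and bold only, so underline would need a separate detection method that this README does not describe.

```python
# Bit values of the span "flags" field as documented by PyMuPDF.
FLAG_ITALIC = 1 << 1
FLAG_BOLD = 1 << 4


def decode_span(span: dict) -> dict:
    """Turn a PyMuPDF-style span dict into plain formatting attributes."""
    flags = span.get("flags", 0)
    color = span.get("color", 0)  # sRGB packed as a 0xRRGGBB integer
    return {
        "font": span.get("font", ""),
        "size": span.get("size", 0.0),
        "italic": bool(flags & FLAG_ITALIC),
        "bold": bool(flags & FLAG_BOLD),
        "rgb": ((color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF),
    }


if __name__ == "__main__":
    sample = {"font": "SimHei", "size": 16.0, "flags": 16, "color": 0xFF0000}
    print(decode_span(sample))
```

The function takes a plain dictionary, so it can be exercised without opening a PDF; in real use the spans would come from iterating blocks, lines, and spans of the dictionary PyMuPDF returns.
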

---

## Font Mapping Table

| PDF font | Word font |
|--------|---------|
| 宋体, SimSun | 宋体 |
| 黑体, SimHei | 黑体 |
| 楷体, SimKai | 楷体_GB2312 |
| 仿宋, SimFang | 仿宋_GB2312 |
| Times New Roman | Times New Roman |
| Arial, Helvetica | Arial |
| 微软雅黑, Microsoft YaHei | 微软雅黑 |
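
The table above can be expressed as a lookup. This is a minimal sketch rather than the mapping code in `convert.py`: the names `FONT_MAP`, `DEFAULT_FONT`, and `map_font` are hypothetical, and both the substring matching and the 宋体 fallback for unknown fonts are assumptions.

```python
# Keys are lower-cased PDF font names; values are Word font names from the table.
FONT_MAP = {
    "宋体": "宋体", "simsun": "宋体",
    "黑体": "黑体", "simhei": "黑体",
    "楷体": "楷体_GB2312", "simkai": "楷体_GB2312",
    "仿宋": "仿宋_GB2312", "simfang": "仿宋_GB2312",
    "times new roman": "Times New Roman",
    "arial": "Arial", "helvetica": "Arial",
    "微软雅黑": "微软雅黑", "microsoft yahei": "微软雅黑",
}

DEFAULT_FONT = "宋体"  # assumed fallback, not stated in the README


def map_font(pdf_font_name: str) -> str:
    """Map a PDF font name to a Word font name."""
    # Embedded subset fonts carry a tag prefix such as "ABCDEF+SimSun".
    name = pdf_font_name.split("+", 1)[-1].strip().lower()
    if name in FONT_MAP:
        return FONT_MAP[name]
    # Style suffixes ("Helvetica-Bold") are handled by substring matching.
    for known, word_font in FONT_MAP.items():
        if known in name:
            return word_font
    return DEFAULT_FONT


if __name__ == "__main__":
    for pdf_font in ("SimSun", "ABCDEF+SimHei", "Helvetica-Bold"):
        print(pdf_font, "->", map_font(pdf_font))
```
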

Usage Guidance

This skill is internally consistent and runs locally, but review and test before use on sensitive documents. Practical notes: 1) install dependencies in a virtualenv (pip install pymupdf python-docx) and run in an isolated directory to avoid leftover temp images (_temp_img_*) if a run fails; 2) the included test script references the pdf2docx package even though convert.py uses PyMuPDF/python-docx (an inconsistency that may be a leftover test helper—expect minor runtime issues); 3) there are small coding bugs (missing/incorrect imports in some scopes) that could cause crashes — you may need to fix or patch the script; 4) because the tool reads and writes local files, avoid running it on confidential documents unless you audit the code or run in a controlled environment.
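
One way to address the leftover `_temp_img_*` files noted above is to wrap each temporary image in a context manager, so the file is deleted even when a later step raises. The sketch below uses only the standard library; `temp_image` is a hypothetical helper, not part of the skill, and it writes to the system temp directory rather than the current working directory.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temp_image(data: bytes, ext: str = "png"):
    """Write image bytes to a temp file and always delete it afterwards."""
    fd, path = tempfile.mkstemp(prefix="_temp_img_", suffix="." + ext)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        yield path  # caller inserts the image into the document here
    finally:
        if os.path.exists(path):
            os.remove(path)


if __name__ == "__main__":
    with temp_image(b"\x89PNG placeholder bytes") as image_path:
        print("temporary image at", image_path)
    print("still exists:", os.path.exists(image_path))
```
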

Capability Analysis

Type: OpenClaw Skill
Name: pdf-to-word-with-format
Version: 1.1.0

The skill is a legitimate PDF-to-Word conversion utility that uses the PyMuPDF (fitz) and python-docx libraries to extract text, tables, and images while preserving formatting. The code in `scripts/convert.py` follows the stated purpose, implementing font mapping and paragraph styling without any signs of data exfiltration, malicious execution, or prompt injection. While there is a minor coding bug (a potential NameError for 'Cm' in a helper function), it is clearly unintentional and does not pose a security risk.

Capability Assessment

✓ Purpose & Capability

Name/description, SKILL.md, and the included convert.py all describe and implement local PDF→.docx conversion, with reasonable dependencies (PyMuPDF, python-docx). Required capabilities match the stated goal.

ℹ Instruction Scope

Runtime instructions and usage are limited to reading local PDFs, creating temporary image files (_temp_img_*.ext) and writing .docx output. The SKILL.md does not instruct collection/transmission of unrelated data. Note: temp images are written to the current working directory and removed after use; if the process crashes some temp files may remain.

✓ Install Mechanism

No install spec in registry; SKILL.md recommends pip install pymupdf python-docx. This is an expected, low-risk instruction-only install method (standard PyPI packages).

✓ Credentials

No environment variables, secrets, or external credentials are requested. The tool operates on local files only.

✓ Persistence & Privilege

always is false and the skill does not request persistent/system-wide privileges or modify other skills. It runs as a user-invoked local script.

Version History

v1.1.0

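Added precise format extraction: fonts, font sizes, alignment, indentation, bold, italic, underline, color, tables, and images.

v1.0.2

Added a PyMuPDF fallback.

v1.0.1

Added a PyMuPDF fallback to resolve version compatibility issues.

v1.0.0

Supports conversion that preserves fonts, paragraph formatting, tables, and images.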

Metadata

Slug pdf-to-word-with-format

Version 1.1.0

License MIT-0

All-time Installs 1

Active Installs 1

Total Versions 4

Frequently Asked Questions

What is PDF带格式转换成Word?

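PDF-to-Word conversion skill with precise format preservation. Converts PDF documents to Word documents while preserving the original font names, font sizes (pt), paragraph formatting (1.5× line spacing, spacing before and after paragraphs, 2-character first-line indent), text alignment (centered/justified), text formatting (bold, italic, underline, color), tables, and images. Supports intelligent Chinese font mapping (宋体→宋体, 黑体→黑体, 楷体→楷体... It is an AI Agent Skill for Claude Code / OpenClaw, with 428 downloads so far.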

How do I install PDF带格式转换成Word?

Run "/install pdf-to-word-with-format" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is PDF带格式转换成Word free?

Yes, PDF带格式转换成Word is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does PDF带格式转换成Word support?

PDF带格式转换成Word is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created PDF带格式转换成Word?

It is built and maintained by yechao1995 (@yechao1995); the current version is v1.1.0.
