yechao1995

PDF带格式转换成Word

by yechao1995 · GitHub ↗ · v1.1.0 · MIT-0
cross-platform · ✓ Security Clean · 428 Downloads · 0 Stars · 1 Active Install · 4 Versions
Install in OpenClaw
/install pdf-to-word-with-format
Description
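A skill for precise, format-preserving PDF-to-Word conversion. It converts PDF documents to Word documents while retaining the original font names, font sizes (pt), paragraph formatting (1.5× line spacing, spacing before and after paragraphs, 2-character first-line indent), text alignment (centered/justified), text formatting (bold, italic, underline, color), tables, and images. It supports smart mapping of Chinese fonts (宋体→宋体, 黑体→黑体, 楷体→楷体...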
README (SKILL.md)

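# PDF带格式精确转换成Word (PDF to Word with Format)

## Overview

This skill converts PDF files to Word documents with high fidelity, preserving as much of the original document's formatting as possible.

## Core Features

1. Precise font mapping
   - Detects the font names used in the PDF
   - Maps them to Chinese and Latin fonts available in Word
   - Supports 宋体, 黑体, 楷体, 仿宋_GB2312, Times New Roman, Arial, and others

2. Exact font sizes
   - Preserves the original font size in points (pt)
   - For example, 22 pt titles (二号), 16 pt (三号), and 14 pt (四号)

3. Paragraph formatting
   - Line spacing: 1.5 lines
   - First-line indent: 2 characters (0.74 cm)
   - Spacing before and after paragraphs

4. Text alignment
   - Left, centered, right, and justified

5. Text formatting
   - Bold
   - Italic
   - Underline
   - Text color

6. Tables
   - Full table structure
   - Cell contents

7. Images
   - Extracts images and keeps their position
   - Resizes images automatically

---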
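The point sizes and the 0.74 cm indent quoted in the feature list can be checked with a little unit arithmetic. The sketch below is only an illustration: the table holds a few of the standard Chinese named sizes, and `first_line_indent_cm` is a hypothetical helper, not a function from `convert.py`. It assumes the indent is defined as a number of characters at a given font size, in which case 0.74 cm corresponds to two characters at 10.5 pt (五号).

```python
# 72 points per inch and 2.54 cm per inch give the points-per-centimetre factor.
PT_PER_CM = 72 / 2.54

# A subset of the standard Chinese named font sizes, in points.
NAMED_SIZES_PT = {
    "二号": 22.0,
    "三号": 16.0,
    "四号": 14.0,
    "小四": 12.0,
    "五号": 10.5,
}


def first_line_indent_cm(chars: int, font_size_pt: float) -> float:
    """Width of `chars` full-width characters at `font_size_pt`, in centimetres.

    Hypothetical helper: assumes each CJK character is one em (= font size) wide.
    """
    return chars * font_size_pt / PT_PER_CM


if __name__ == "__main__":
    for name, size in NAMED_SIZES_PT.items():
        print(f"{name} ({size} pt): 2-char indent = {first_line_indent_cm(2, size):.2f} cm")
```

A fixed 0.74 cm indent therefore matches body text set at 10.5 pt; larger body text would need a proportionally larger indent to stay at two characters.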

## Usage

### Basic conversion

```bash
python convert.py <input.pdf> --output <output.docx>
```

### Batch conversion

```bash
python convert.py <pdf_folder> --batch --output <output_folder>
```
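
How `--batch` pairs input and output files is not documented here. One plausible reading, sketched below, is that every `.pdf` in the input folder becomes a `.docx` with the same base name in the output folder. `plan_batch` is a hypothetical helper written for this illustration, not part of `convert.py`.

```python
from pathlib import Path


def plan_batch(input_dir, output_dir):
    """Pair each PDF in input_dir with a same-named .docx path in output_dir.

    Hypothetical helper: only plans the work, it does not convert anything.
    """
    in_dir, out_dir = Path(input_dir), Path(output_dir)
    pdfs = sorted(
        p for p in in_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"  # accept .pdf and .PDF
    )
    return [(pdf, out_dir / (pdf.stem + ".docx")) for pdf in pdfs]


if __name__ == "__main__":
    for src, dst in plan_batch(".", "./words"):
        print(f"{src} -> {dst}")
```

Planning the pairs first makes it easy to spot name collisions or an unexpected input folder before any file is written.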

### Converting specific pages

```bash
python convert.py document.pdf --pages 0-5 --output document.docx
```
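
The README's examples use `--pages 0-9` for "the first 10 pages", which suggests the range is zero-based and inclusive at both ends. Under that assumption, a range string could be parsed roughly as follows. `parse_page_range` is a hypothetical function for illustration, and the real script may validate its input differently.

```python
def parse_page_range(spec: str, page_count: int) -> range:
    """Parse "A-B" (zero-based, inclusive) or a single "A" into a range.

    Hypothetical helper: the end is clamped to the last page of the document.
    """
    start_text, sep, end_text = spec.strip().partition("-")
    start = int(start_text)
    end = int(end_text) if sep else start
    if start < 0 or end < start:
        raise ValueError(f"invalid page range: {spec!r}")
    last_page = page_count - 1
    return range(start, min(end, last_page) + 1)


if __name__ == "__main__":
    print(list(parse_page_range("0-5", page_count=20)))
```

Returning a `range` keeps the result cheap to build and lets the caller iterate over page indices directly.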

---

## Installing dependencies

Install the dependencies before first use:

```bash
pip install pymupdf python-docx
```

---

## Examples

```bash
# Basic conversion
python convert.py report.pdf --output report.docx

# Convert every PDF in a folder
python convert.py ./pdfs/ --batch --output ./words/

# Convert the first 10 pages
python convert.py document.pdf --pages 0-9 --output document.docx

# Specify the start and end pages
python convert.py long_document.pdf --start 5 --end 15 --output part.docx
```
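
---

## Output

- The output is a `.docx` file that opens in Microsoft Word or WPS
- The converted document retains most of the original formatting
- PDFs with unusual layouts may convert with some differences

---

## How it works

This skill is built on:

1. **PyMuPDF (fitz)** - extracts content and formatting information from the PDF
2. **python-docx** - builds the Word document

The extracted formatting information includes:
- Font name and font size
- Text alignment
- Bold, italic, and underline flags
- Text color (RGB)
- Paragraph position coordinates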
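
As a rough illustration of the extraction step listed above: PyMuPDF's `page.get_text("dict")` returns text spans as dictionaries with `font`, `size`, `flags`, and `color` keys, where `flags` is a bit field (bit 1 = italic, bit 4 = bold) and `color` is a packed sRGB integer. The sketch below decodes one such span using only the standard library. `describe_span` is a hypothetical name, and how `convert.py` actually does this is not shown in the listing. Underline is not one of these flag bits, so the skill presumably detects it some other way.

```python
# Bit values of the span "flags" field in PyMuPDF's text dictionary output.
FLAG_ITALIC = 1 << 1  # 2
FLAG_BOLD = 1 << 4    # 16


def describe_span(span: dict) -> dict:
    """Decode the formatting fields of one PyMuPDF-style span dictionary."""
    flags = span.get("flags", 0)
    color = span.get("color", 0)  # packed sRGB integer, 0xRRGGBB
    return {
        "font": span.get("font", ""),
        "size_pt": float(span.get("size", 0)),
        "italic": bool(flags & FLAG_ITALIC),
        "bold": bool(flags & FLAG_BOLD),
        "rgb": ((color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF),
    }


if __name__ == "__main__":
    # Hand-written sample span; real spans come from PyMuPDF.
    sample = {"font": "SimSun", "size": 16.0, "flags": 18, "color": 0xFF0000, "text": "标题"}
    print(describe_span(sample))
```

With PyMuPDF installed, the span dictionaries would come from iterating the blocks, then lines, then spans of `page.get_text("dict")`.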

---

## Font mapping table

| PDF font | Word font |
|--------|---------|
| 宋体, SimSun | 宋体 |
| 黑体, SimHei | 黑体 |
| 楷体, SimKai | 楷体_GB2312 |
| 仿宋, SimFang | 仿宋_GB2312 |
| Times New Roman | Times New Roman |
| Arial, Helvetica | Arial |
| 微软雅黑, Microsoft YaHei | 微软雅黑 |
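
The table above could be applied with a small lookup like the one below. This is a sketch under assumptions rather than the skill's actual code: the substring matching, the handling of the `ABCDEF+` subset prefix that embedded PDF fonts often carry, and the fallback to 宋体 for unknown fonts are all choices made for this illustration, and `map_font` is a hypothetical name.

```python
# Keys are lower-cased with spaces removed so that PDF names such as
# "TimesNewRomanPSMT" or "Arial-BoldMT" still match by substring.
FONT_MAP = {
    "宋体": "宋体", "simsun": "宋体",
    "黑体": "黑体", "simhei": "黑体",
    "楷体": "楷体_GB2312", "simkai": "楷体_GB2312",
    "仿宋": "仿宋_GB2312", "simfang": "仿宋_GB2312",
    "timesnewroman": "Times New Roman",
    "arial": "Arial", "helvetica": "Arial",
    "微软雅黑": "微软雅黑", "microsoftyahei": "微软雅黑",
}


def map_font(pdf_font: str, default: str = "宋体") -> str:
    """Map a PDF font name to a Word font name using FONT_MAP.

    Hypothetical helper; `default` is an assumed fallback, not documented behavior.
    """
    name = pdf_font.split("+", 1)[-1]          # drop a subset prefix like "ABCDEF+"
    key = name.replace(" ", "").lower()
    for known, word_font in FONT_MAP.items():
        if known in key:
            return word_font
    return default


if __name__ == "__main__":
    for name in ("ABCDEF+SimSun", "Arial-BoldMT", "TimesNewRomanPSMT", "SimKai"):
        print(name, "->", map_font(name))
```

Substring matching is forgiving about style suffixes such as `-Bold`, at the cost of occasionally matching a font that merely contains a known name.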
Usage Guidance
This skill is internally consistent and runs locally, but review and test it before using it on sensitive documents. Practical notes:
  1. Install the dependencies in a virtualenv (pip install pymupdf python-docx) and run the script in an isolated directory, so that any temp images (_temp_img_*) left behind by a failed run are easy to find and remove.
  2. The included test script references the pdf2docx package even though convert.py uses PyMuPDF and python-docx. This inconsistency may be a leftover test helper; expect minor runtime issues.
  3. There are small coding bugs (missing or incorrect imports in some scopes) that could cause crashes; you may need to patch the script.
  4. Because the tool reads and writes local files, avoid running it on confidential documents unless you audit the code or run it in a controlled environment.
Capability Analysis
Type: OpenClaw Skill · Name: pdf-to-word-with-format · Version: 1.1.0
The skill is a legitimate PDF-to-Word conversion utility that uses the PyMuPDF (fitz) and python-docx libraries to extract text, tables, and images while preserving formatting. The code in `scripts/convert.py` follows the stated purpose, implementing font mapping and paragraph styling without any signs of data exfiltration, malicious execution, or prompt injection. There is a minor coding bug (a potential NameError for 'Cm' in a helper function), but it is clearly unintentional and does not pose a security risk.
Capability Assessment
Purpose & Capability
Name/description, SKILL.md, and the included convert.py all describe and implement local PDF→.docx conversion, with reasonable dependencies (PyMuPDF, python-docx). Required capabilities match the stated goal.
Instruction Scope
Runtime instructions and usage are limited to reading local PDFs, creating temporary image files (_temp_img_*.ext) and writing .docx output. The SKILL.md does not instruct collection/transmission of unrelated data. Note: temp images are written to the current working directory and removed after use; if the process crashes some temp files may remain.
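One way to avoid stray temp images after a crash is to write them into a private temporary directory that is removed even when an exception is raised. The sketch below uses only the standard library. `with_temp_images` and its callback argument are hypothetical names chosen for this illustration, and the skill's own script writes to the current working directory instead.

```python
import tempfile
from pathlib import Path


def with_temp_images(image_blobs, consume):
    """Write (extension, bytes) pairs to temp files, call consume(paths), then clean up.

    The directory and everything in it is deleted when the `with` block exits,
    whether consume() returns normally or raises.
    """
    with tempfile.TemporaryDirectory(prefix="_temp_img_") as tmp:
        paths = []
        for index, (ext, data) in enumerate(image_blobs):
            path = Path(tmp) / f"img_{index}.{ext}"
            path.write_bytes(data)
            paths.append(path)
        return consume(paths)


if __name__ == "__main__":
    count = with_temp_images([("png", b"\x89PNG"), ("jpg", b"\xff\xd8")], len)
    print(f"processed {count} temporary images")
```

Keeping the files outside the working directory also means a failed run cannot leave image fragments next to the source documents.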
Install Mechanism
No install spec in registry; SKILL.md recommends pip install pymupdf python-docx. This is an expected, low-risk instruction-only install method (standard PyPI packages).
Credentials
No environment variables, secrets, or external credentials are requested. The tool operates on local files only.
Persistence & Privilege
`always` is false, and the skill neither requests persistent or system-wide privileges nor modifies other skills. It runs as a user-invoked local script.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install pdf-to-word-with-format
  3. After installation, invoke the skill by name or use /pdf-to-word-with-format
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
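v1.1.0
Added precise format extraction: fonts, font sizes, alignment, indentation, bold, italic, underline, color, tables, and images
v1.0.2
Added a PyMuPDF fallback
v1.0.1
Added a PyMuPDF fallback to resolve version compatibility issues
v1.0.0
Supports conversion that preserves fonts, paragraph formatting, tables, and images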
Metadata
Slug pdf-to-word-with-format
Version 1.1.0
License MIT-0
All-time Installs 1
Active Installs 1
Total Versions 4
Frequently Asked Questions

What is PDF带格式转换成Word?

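A skill for precise, format-preserving PDF-to-Word conversion. It converts PDF documents to Word documents while retaining the original font names, font sizes (pt), paragraph formatting (1.5× line spacing, spacing before and after paragraphs, 2-character first-line indent), text alignment (centered/justified), text formatting (bold, italic, underline, color), tables, and images. It supports smart mapping of Chinese fonts (宋体→宋体, 黑体→黑体, 楷体→楷体... It is an AI Agent Skill for Claude Code / OpenClaw, with 428 downloads so far.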

How do I install PDF带格式转换成Word?

Run "/install pdf-to-word-with-format" in the OpenClaw or Claude Code chat to install the skill. Before the first conversion, install the Python dependencies with pip install pymupdf python-docx.

Is PDF带格式转换成Word free?

Yes, PDF带格式转换成Word is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does PDF带格式转换成Word support?

PDF带格式转换成Word is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created PDF带格式转换成Word?

It is built and maintained by yechao1995 (@yechao1995); the current version is v1.1.0.
