
pdf2ofd

Author: xzw · GitHub · v1.0.2 · MIT-0
cross-platform · ✓ Security check passed
Total downloads: 238 · Favorites: 0 · Current installs: 0 · Versions: 3
Install in OpenClaw
/install pdf2ofd
Description
Converts PDF documents (invoices, reports) to High-Fidelity OFD format with pixel-perfect precision.
Usage Instructions (SKILL.md)

PDF to OFD High-Fidelity Converter

🎯 Purpose

A specialized skill for converting PDF documents into the Chinese National Standard OFD (GB/T 33190-2016) format. Optimized for Electronic Invoices (OFD版式发票) with advanced rendering capabilities that exceed standard conversion libraries.

✨ Key Features

  • High-Fidelity Text Placement: Uses character-level positioning (DeltaX arrays) and baseline origin data extracted via rawdict to ensure text layout is 100% identical to the source PDF.
  • Advanced Vector Graphics: Directly extracts original stroke colors, fill colors, and line widths. Supports complex path types and fill instructions.
  • Transparency Preservation: Fully supports Alpha and FillOpacity for vector paths and SMask transparency for images (e.g., electronic seals and signatures).
  • Cross-Platform Font Mapping: Intelligent mapping of macOS-specific (STSong, STKaiti) and Windows-specific font names to standardized OFD font names (宋体, 楷体, 黑体).
  • In-Memory Packaging: Generates the final OFD zip structure entirely in memory to avoid temporary file clutter and ensure security.
  • Color Snapping: Heuristic "Invoice Red" correction (128 0 0) for financial documents while preserving non-standard colors.
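
As a rough illustration of the character-level positioning described above, the sketch below turns per-character origins into an OFD-style start position plus a DeltaX list. It assumes input shaped like the `chars` entries of PyMuPDF's `page.get_text("rawdict")` output (each with a `"c"` character and an `"origin"` point in PDF points) and converts to millimetres. The function name `span_to_text_code` and the returned dict layout are hypothetical, not taken from pdf2ofd.py.

```python
PT_TO_MM = 25.4 / 72  # PDF points -> millimetres


def span_to_text_code(chars):
    """Build start position and DeltaX offsets for one run of characters.

    `chars` is a list of dicts like {"c": "A", "origin": (x, y)}, with
    coordinates in PDF points (the shape of rawdict character entries).
    The returned dict is an illustrative structure, not easyofd's API.
    """
    if not chars:
        raise ValueError("a span needs at least one character")

    # Baseline origin of the first glyph gives the starting X/Y.
    x0, y0 = chars[0]["origin"]

    # DeltaX holds the horizontal advance between consecutive glyph origins.
    xs = [ch["origin"][0] * PT_TO_MM for ch in chars]
    deltas = [round(b - a, 4) for a, b in zip(xs, xs[1:])]

    return {
        "text": "".join(ch["c"] for ch in chars),
        "X": round(x0 * PT_TO_MM, 4),
        "Y": round(y0 * PT_TO_MM, 4),
        "DeltaX": " ".join(f"{d:g}" for d in deltas),
    }


if __name__ == "__main__":
    demo = [
        {"c": "A", "origin": (72.0, 72.0)},
        {"c": "B", "origin": (144.0, 72.0)},
        {"c": "C", "origin": (216.0, 72.0)},
    ]
    print(span_to_text_code(demo))
```

Recording the advance between origins, rather than relying on font metrics at render time, is what lets a viewer reproduce the source spacing even when it substitutes a different font.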

🛠️ Usage Instructions

When a user asks to convert a PDF or a "High-Fidelity" invoice to OFD:

  1. Direct Execution:

    python3 pdf2ofd.py <input_path.pdf> [output_path.ofd]
    
  2. Plugin Integration: The script implements a PDF2OFDConverter class that can be easily imported and used in other Python workflows.

Example Output

Success: /path/to/invoice.ofd
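
For callers that prefer to drive the command-line form from Python, a minimal wrapper might look like the sketch below. It relies only on the documented invocation and the `Success: <path>` line shown above. It assumes that line is printed to standard output and that `pdf2ofd.py` sits in the working directory; the helper names `build_command`, `parse_success` and `convert` are hypothetical.

```python
import subprocess
import sys


def build_command(pdf_path, ofd_path=None, script="pdf2ofd.py"):
    """Assemble the documented CLI call; the output path is optional."""
    cmd = [sys.executable, script, str(pdf_path)]
    if ofd_path is not None:
        cmd.append(str(ofd_path))
    return cmd


def parse_success(stdout):
    """Return the path from a 'Success: <path>' line, or None if absent."""
    for line in stdout.splitlines():
        if line.startswith("Success:"):
            return line[len("Success:"):].strip()
    return None


def convert(pdf_path, ofd_path=None):
    """Run the converter and return the reported output path.

    Assumption: the script writes the 'Success:' line to stdout.
    """
    result = subprocess.run(
        build_command(pdf_path, ofd_path),
        capture_output=True,
        text=True,
    )
    out_path = parse_success(result.stdout)
    if out_path is None:
        raise RuntimeError(f"conversion did not report success: {result.stderr}")
    return out_path
```

A call such as `convert("invoice.pdf", "invoice.ofd")` would then return the path the script reports, or raise if no success line appears.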

📦 Requirements

Dependencies required in the environment:

  • PyMuPDF (fitz): For advanced PDF parsing and raw character data extraction.
  • Pillow: For image processing and transparency handling.
  • easyofd: The base library for OFD structure (extended via internal monkey patches).
  • xmltodict: For XML manipulation.
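
Before running the converter, it can help to confirm that the four dependencies are importable. The sketch below checks this with the standard library only. The import names (`fitz` for PyMuPDF, `PIL` for Pillow) are the usual ones for those packages; the names for easyofd and xmltodict are assumed to match their package names, and `missing_packages` is a hypothetical helper.

```python
from importlib.util import find_spec

# Distribution name -> top-level module expected to be importable.
REQUIRED = {
    "PyMuPDF": "fitz",
    "Pillow": "PIL",
    "easyofd": "easyofd",      # assumed import name
    "xmltodict": "xmltodict",  # assumed import name
}


def missing_packages(required=REQUIRED):
    """Return the distribution names whose module cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```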

💡 Notes

  • This skill uses deep monkey-patching on easyofd to fix known library limitations regarding character positioning and resource ID tracking.
  • The conversion process assumes standard Chinese fonts (SimSun, KaiTi, SimHei) are available on the viewing system.
  • Zero-copy resource handling: Images are extracted and re-compressed as PNG/JPG only when necessary to preserve quality.
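
The font assumption in the notes above pairs with the cross-platform mapping listed under Key Features. One plausible shape for such a mapping is sketched below. The table entries for STSong, STKaiti, SimSun, KaiTi and SimHei come from the names mentioned in this document; the STHeiti entry, the fallback to 宋体, and the function name `map_font` are assumptions for illustration and are not taken from pdf2ofd.py.

```python
# Lower-cased font-name prefix -> standardized OFD font name.
FONT_MAP = {
    "stsong": "宋体",   # macOS
    "simsun": "宋体",   # Windows
    "stkaiti": "楷体",  # macOS
    "kaiti": "楷体",    # Windows
    "stheiti": "黑体",  # macOS (assumed entry)
    "simhei": "黑体",   # Windows
}


def map_font(pdf_font_name, default="宋体"):
    """Map a PDF font name to an OFD font name by prefix match.

    Embedded subset fonts carry a tag such as 'ABCDEF+' before the real
    name, so that tag is stripped first. Unknown names fall back to
    `default` (an assumed policy).
    """
    name = pdf_font_name.split("+", 1)[-1].lower()
    for prefix, ofd_name in FONT_MAP.items():
        if name.startswith(prefix):
            return ofd_name
    return default
```

For example, `map_font("ABCDEF+STSong-Light")` and `map_font("SimSun")` both resolve to 宋体 under this table.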
Security Recommendations
This skill appears to do what it says: convert PDFs to OFD using PyMuPDF, Pillow, and easyofd. Before installing or running:
  • Install dependencies in an isolated environment (virtualenv, venv, or container) and pin package versions from a trusted source.
  • Review the remainder of pdf2ofd.py (the file was truncated in the provided excerpt) for any network/socket usage or subprocess calls before running it on sensitive documents.
  • Consider changing uuid1 usage to uuid4 if you will share produced OFD files and want to avoid embedding a host MAC/time-based identifier.
  • Because the skill monkey-patches easyofd, test conversion results and failure modes on non-sensitive sample files to ensure the patches behave correctly with the specific easyofd version you install.
  • If you require stronger assurance, request the upstream source repository or an author/homepage so you can verify version history and maintainers.
Functional Analysis
Type: OpenClaw Skill
Name: pdf2ofd
Version: 1.0.2
The skill implements a PDF to OFD (Chinese National Standard) converter by extending the 'easyofd' library through monkey-patching. The code focuses on high-fidelity rendering of text, images, and vector graphics using PyMuPDF (fitz) and Pillow. No indicators of data exfiltration, malicious execution, or prompt injection were found; all operations are local and consistent with the stated document conversion purpose in pdf2ofd.py and SKILL.md.
Capability Assessment
Purpose & Capability
Name/description, SKILL.md, requirements.txt, and pdf2ofd.py align: PDF parsing (PyMuPDF), image handling (Pillow), OFD generation (easyofd + xmltodict) and monkey-patching of easyofd are coherent for a high-fidelity converter.
Instruction Scope
SKILL.md instructs only to run the Python converter or import its class; the script's logic operates on the provided PDF bytes and builds OFD in memory. Minor note: the code imports uuid1 (which embeds host MAC/time) — if those UUIDs are written into output artifacts they could leak a host identifier when the OFD is shared; consider using uuid4 or another non-MAC-based ID if privacy is a concern. No evidence in the shown code of reading unrelated system files, environment variables, or contacting external endpoints.
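
The difference between the two UUID variants can be seen with the standard library alone. The sketch below is a generic illustration, not code from pdf2ofd.py: `new_doc_id` is a hypothetical helper, and whether the script actually writes such identifiers into its output is not shown in the reviewed excerpt.

```python
import uuid


def new_doc_id(private=True):
    """Return a 32-character hex identifier.

    With private=True the ID is a random version-4 UUID. With
    private=False it is a version-1 UUID, which is built from a
    timestamp and a node value that is commonly the host MAC address.
    """
    u = uuid.uuid4() if private else uuid.uuid1()
    return u.hex


if __name__ == "__main__":
    print("uuid4-based:", new_doc_id())
    print("uuid1-based:", new_doc_id(private=False))
    # The node field of a version-1 UUID is what can identify the host.
    print("uuid1 node :", hex(uuid.uuid1().node))
```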
Install Mechanism
No install spec is provided (instruction-only with bundled source). This is low-risk for automatic install, but the skill requires several Python packages (requirements.txt). Users should install dependencies in a controlled Python environment (virtualenv/container) and pin versions before installing.
Credentials
The skill requests no environment variables or credentials. Required runtime packages match the stated task; there are no unrelated secret accesses or config path requirements.
Persistence & Privilege
Skill is not always-enabled and does not request elevated/persistent platform privileges. It monkey-patches the local easyofd library at runtime (explained in SKILL.md) but does not modify other skills or system-wide configuration files in the provided code.
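
To make the runtime-patching point concrete, the sketch below shows the general monkey-patching pattern on a stand-in class. `StubWriter` and its `render_text` method are hypothetical and do not reflect easyofd's real API. The point is that the replacement lives only in the running Python process and can be undone, leaving the installed library files untouched.

```python
from contextlib import contextmanager


class StubWriter:
    """Hypothetical stand-in for a third-party class (not easyofd)."""

    def render_text(self, text):
        return f"<TextCode>{text}</TextCode>"


@contextmanager
def patched(obj, name, replacement):
    """Temporarily replace attribute `name` on `obj`, then restore it."""
    original = getattr(obj, name)
    setattr(obj, name, replacement)
    try:
        yield
    finally:
        setattr(obj, name, original)


def render_text_with_x(self, text):
    """Replacement method that adds a position attribute."""
    return f'<TextCode X="0">{text}</TextCode>'


if __name__ == "__main__":
    with patched(StubWriter, "render_text", render_text_with_x):
        print(StubWriter().render_text("patched"))
    print(StubWriter().render_text("original"))
```

Because such patches depend on the internals of the library being patched, they are sensitive to its version, which is why testing against the installed easyofd release is advisable.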
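How to Use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install pdf2ofd
  3. Once installation finishes, invoke the Skill by name or trigger it with /pdf2ofd.
  4. Provide the inputs described in the Skill's parameter documentation to receive structured output.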
Version History
v1.0.2
Version 1.1.0 (High-Fidelity Update)
  • Greatly improved text matching with precise character-level placement and accurate baseline extraction.
  • Enhanced preservation of original vector graphics, including exact stroke and fill colors, complex paths, and line styles.
  • Full support for image and vector transparency, including alpha channels and SMask overlays.
  • Intelligent, cross-platform mapping of Windows and macOS font names to OFD standard fonts.
  • Now generates OFD output entirely in memory, avoiding temporary files for better performance and security.
  • Added advanced color snapping for financial documents to ensure compliance with "Invoice Red" standards.
v1.0.1
  • Initial release of the pdf2ofd skill.
  • Converts PDF files, especially electronic invoices, to the OFD format with accurate rendering.
  • Handles invoice stamp transparency, consistent dark-red border coloring, and path correction for proper OFD viewing.
  • Provides command-line usage instructions and font requirements for best results.
v1.0.0
Initial release of pdf2ofd skill.
  • Converts PDF documents, especially Chinese Electronic Invoices, to the Chinese National Standard OFD format.
  • Accurately extracts stamp alpha-masks and applies solid dark-red (`128 0 0`) mapping to graphics elements.
  • Handles complex layout issues like path closure and vector graphic transformations to ensure viewer compatibility.
  • Includes CLI script for straightforward PDF-to-OFD conversion.
  • Provides guidance on required dependencies and font setup.
Metadata
Slug: pdf2ofd
Version: 1.0.2
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 3
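FAQ

What is pdf2ofd?

It converts PDF documents (invoices, reports) to High-Fidelity OFD format with pixel-perfect precision. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 238 times to date.

How do I install pdf2ofd?

Run the command "/install pdf2ofd" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.

Is pdf2ofd free?

Yes. pdf2ofd is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does pdf2ofd support?

pdf2ofd is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed pdf2ofd?

It is developed and maintained by xzw (@xzw). The current version is v1.0.2.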
