← 返回 Skills 市场

Image To Text Pdf

Name: Image To Text Pdf
Author: zjsxply

作者 Linyue Pan · GitHub ↗ · v0.1.0 · MIT-0

cross-platform ⚠ pending

总下载

当前安装

版本数

在 OpenClaw 中安装

/install image-to-text-pdf

功能描述

Convert a finished raster image, especially a generated poster or visual resume, into an image-based PDF with an additional selectable, copyable, searchable...

使用说明 (SKILL.md)

Image to Text PDF

Turn a finished raster image into a PDF while preserving the image exactly and adding a transparent text layer for selection, copying, and search.

Workflow

Get the complete image.

Treat the image as the visual source of truth.
Keep the original image dimensions; the layout coordinates should use the same pixel coordinate space whenever possible.

Build a text-layer layout JSON.

Prefer OCR or a vision model that returns text boxes over trying to infer positions manually.
Use the user's source text to correct OCR transcription. OCR boxes determine position; the source text determines final copyable content.
Read references/ocr-alignment.md when you need guidance for extracting, correcting, or prompting for box-level text.
Read references/layout-json.md for the exact JSON schema.

Generate both PDFs.

python scripts/compose_image_text_pdf.py \
  --image /path/to/image.png \
  --layout /path/to/layout.json \
  --output /path/to/image-text.pdf \
  --debug-output /path/to/image-text-check.pdf

For CJK or other non-Latin text, pass a Unicode font:

python scripts/compose_image_text_pdf.py \
  --image /path/to/image.png \
  --layout /path/to/layout.json \
  --output /path/to/image-text.pdf \
  --debug-output /path/to/image-text-check.pdf \
  --font-file /path/to/NotoSansCJK-Regular.ttc

Inspect the debug PDF before delivering.

The final PDF should look like the image-only source.
The debug PDF should show highlighted text boxes and visible text where the hidden layer will be placed.
If a highlighted box is shifted, fix the layout JSON, not the image.
If copy/paste text is wrong, fix the text field in layout JSON, not the OCR image.

OCR Word Conversion

When an OCR tool returns word-level boxes, convert them to line-level layout items:

python scripts/ocr_words_to_layout.py \
  --ocr /path/to/ocr.json \
  --output /path/to/layout.json \
  --image-width 1536 \
  --image-height 2048 \
  --source-text /path/to/source.txt

The converter accepts common JSON shapes containing words, items, textAnnotations, or nested page/line/word objects. It groups nearby words into lines and can replace OCR text with the closest line from the source text when the match is strong.

Practical Rules

Keep text boxes line-level unless a paragraph must be selected as one unit. Line-level boxes are easier to position and debug.
Do not try to match the rasterized font exactly. The hidden layer only needs close geometry and correct text.
Use the visible inspection PDF as the validation artifact. It should make every embedded text span obvious.
For generated images, ask the image-generation step to keep text in large, separated blocks. Dense tiny text is harder for OCR and manual correction.
Preserve the image as the background instead of rebuilding the layout in presentation or web formats.

能力标签

crypto

如何使用

确保已安装 OpenClaw（本地或 Docker 部署）
在对话框中输入安装命令：/install image-to-text-pdf
安装完成后，直接呼叫该 Skill 的名称或使用 /image-to-text-pdf 触发
根据 Skill 的参数说明提供必要输入，即可获得结构化输出

版本历史

v0.1.0

- Initial release of image-to-text-pdf. - Converts finished raster images (such as posters or visual resumes) into PDFs with an added selectable, searchable invisible text layer. - Provides a workflow that aligns OCR-detected text boxes with user-provided source text for accurate, copyable content. - Generates both a final (invisible-text) PDF and a visible inspection/debug PDF for validating text-layer placement. - Supports CJK and non-Latin text via user-specified Unicode fonts. - Includes a utility to convert word-level OCR output into line-level text boxes, matching them to source text.

元数据

Slug image-to-text-pdf

版本 0.1.0

许可证 MIT-0

累计安装 0

当前安装数 0

历史版本数 1

常见问题

Image To Text Pdf 是什么？

Convert a finished raster image, especially a generated poster or visual resume, into an image-based PDF with an additional selectable, copyable, searchable... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件，目前累计下载 9 次。

如何安装 Image To Text Pdf？

在 OpenClaw 或 Claude Code 对话框中运行命令「/install image-to-text-pdf」即可一键安装，无需额外配置。

Image To Text Pdf 是免费的吗？

是的，Image To Text Pdf 完全免费，采用 MIT-0 许可证，可自由下载、安装和使用。

Image To Text Pdf 支持哪些平台？

Image To Text Pdf 跨平台运行，可在任意部署了 OpenClaw / Claude Code 的环境中使用（cross-platform）。

谁开发了 Image To Text Pdf？

由 Linyue Pan（@zjsxply）开发并维护，当前版本 v0.1.0。