Image To Text Pdf
/install image-to-text-pdf
Image to Text PDF
Turn a finished raster image into a PDF while preserving the image exactly and adding a transparent text layer for selection, copying, and search.
Workflow
- Get the complete image.
- Treat the image as the visual source of truth.
- Keep the original image dimensions; the layout coordinates should use the same pixel coordinate space whenever possible.
- Build a text-layer layout JSON.
- Prefer OCR or a vision model that returns text boxes over trying to infer positions manually.
- Use the user's source text to correct OCR transcription. OCR boxes determine position; the source text determines final copyable content.
- Read
references/ocr-alignment.mdwhen you need guidance for extracting, correcting, or prompting for box-level text. - Read
references/layout-json.mdfor the exact JSON schema.
- Generate both PDFs.
python scripts/compose_image_text_pdf.py \
--image /path/to/image.png \
--layout /path/to/layout.json \
--output /path/to/image-text.pdf \
--debug-output /path/to/image-text-check.pdf
For CJK or other non-Latin text, pass a Unicode font:
python scripts/compose_image_text_pdf.py \
--image /path/to/image.png \
--layout /path/to/layout.json \
--output /path/to/image-text.pdf \
--debug-output /path/to/image-text-check.pdf \
--font-file /path/to/NotoSansCJK-Regular.ttc
- Inspect the debug PDF before delivering.
- The final PDF should look like the image-only source.
- The debug PDF should show highlighted text boxes and visible text where the hidden layer will be placed.
- If a highlighted box is shifted, fix the layout JSON, not the image.
- If copy/paste text is wrong, fix the
textfield in layout JSON, not the OCR image.
OCR Word Conversion
When an OCR tool returns word-level boxes, convert them to line-level layout items:
python scripts/ocr_words_to_layout.py \
--ocr /path/to/ocr.json \
--output /path/to/layout.json \
--image-width 1536 \
--image-height 2048 \
--source-text /path/to/source.txt
The converter accepts common JSON shapes containing words, items, textAnnotations, or nested page/line/word objects. It groups nearby words into lines and can replace OCR text with the closest line from the source text when the match is strong.
Practical Rules
- Keep text boxes line-level unless a paragraph must be selected as one unit. Line-level boxes are easier to position and debug.
- Do not try to match the rasterized font exactly. The hidden layer only needs close geometry and correct text.
- Use the visible inspection PDF as the validation artifact. It should make every embedded text span obvious.
- For generated images, ask the image-generation step to keep text in large, separated blocks. Dense tiny text is harder for OCR and manual correction.
- Preserve the image as the background instead of rebuilding the layout in presentation or web formats.
- 确保已安装 OpenClaw(本地或 Docker 部署)
- 在对话框中输入安装命令:
/install image-to-text-pdf - 安装完成后,直接呼叫该 Skill 的名称或使用
/image-to-text-pdf触发 - 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
Image To Text Pdf 是什么?
Convert a finished raster image, especially a generated poster or visual resume, into an image-based PDF with an additional selectable, copyable, searchable... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 9 次。
如何安装 Image To Text Pdf?
在 OpenClaw 或 Claude Code 对话框中运行命令「/install image-to-text-pdf」即可一键安装,无需额外配置。
Image To Text Pdf 是免费的吗?
是的,Image To Text Pdf 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。
Image To Text Pdf 支持哪些平台?
Image To Text Pdf 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。
谁开发了 Image To Text Pdf?
由 Linyue Pan(@zjsxply)开发并维护,当前版本 v0.1.0。