Image To Text Pdf
Turn a finished raster image into a PDF while preserving the image exactly and adding a transparent text layer for selection, copying, and search.
Workflow
- Get the complete image.
- Treat the image as the visual source of truth.
- Keep the original image dimensions; the layout coordinates should use the same pixel coordinate space whenever possible.
- Build a text-layer layout JSON.
- Prefer OCR or a vision model that returns text boxes over trying to infer positions manually.
- Use the user's source text to correct OCR transcription. OCR boxes determine position; the source text determines final copyable content.
- Read references/ocr-alignment.md when you need guidance for extracting, correcting, or prompting for box-level text.
- Read references/layout-json.md for the exact JSON schema.
- Generate both PDFs: the final PDF (--output) and the debug PDF (--debug-output).
python scripts/compose_image_text_pdf.py \
--image /path/to/image.png \
--layout /path/to/layout.json \
--output /path/to/image-text.pdf \
--debug-output /path/to/image-text-check.pdf
For CJK or other non-Latin text, pass a Unicode font:
python scripts/compose_image_text_pdf.py \
--image /path/to/image.png \
--layout /path/to/layout.json \
--output /path/to/image-text.pdf \
--debug-output /path/to/image-text-check.pdf \
--font-file /path/to/NotoSansCJK-Regular.ttc
- Inspect the debug PDF before delivering.
- The final PDF should look like the image-only source.
- The debug PDF should show highlighted text boxes and visible text where the hidden layer will be placed.
- If a highlighted box is shifted, fix the layout JSON, not the image.
- If copy/paste text is wrong, fix the text field in the layout JSON, not the OCR image.
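The workflow keeps layout coordinates in the image's pixel space, while PDF pages place the origin at the bottom-left with y growing upward. The following is a minimal sketch of that conversion, not the code used by compose_image_text_pdf.py. The function name, the top-left x/y/width/height box shape, and the points_per_pixel parameter are assumptions made for illustration; references/layout-json.md remains the authority on the real schema.

```python
def pixel_box_to_pdf_rect(x, y, width, height, image_height, points_per_pixel=1.0):
    """Convert a top-left-origin pixel box to a bottom-left-origin PDF rectangle.

    Hypothetical helper: returns (left, bottom, width, height) in PDF points.
    """
    left = x * points_per_pixel
    # PDF's y axis grows upward, so measure the box's bottom edge
    # from the bottom of the page instead of from the top of the image.
    bottom = (image_height - (y + height)) * points_per_pixel
    return (left, bottom, width * points_per_pixel, height * points_per_pixel)


# A box 100 px from the left and 50 px from the top of a 2048 px tall image.
rect = pixel_box_to_pdf_rect(100, 50, 400, 60, image_height=2048)
print(rect)
```

If a highlighted box in the debug PDF appears mirrored vertically or shifted by roughly its own height, a mix-up between these two coordinate conventions is a likely cause worth checking in the layout JSON.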
OCR Word Conversion
When an OCR tool returns word-level boxes, convert them to line-level layout items:
python scripts/ocr_words_to_layout.py \
--ocr /path/to/ocr.json \
--output /path/to/layout.json \
--image-width 1536 \
--image-height 2048 \
--source-text /path/to/source.txt
The converter accepts common JSON shapes containing words, items, textAnnotations, or nested page/line/word objects. It groups nearby words into lines and can replace OCR text with the closest line from the source text when the match is strong.
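The grouping and correction steps described above can be sketched roughly as follows. This is an illustration of the idea, not the implementation in ocr_words_to_layout.py: the word dictionary keys (text, x, y, width, height), the function names, and the y_tolerance and min_ratio thresholds are all assumptions, and the similarity measure swapped in here is difflib's SequenceMatcher from the Python standard library.

```python
from difflib import SequenceMatcher


def group_words_into_lines(words, y_tolerance=0.5):
    """Group word boxes into lines when their vertical centers are close.

    Each word is a dict with hypothetical keys: text, x, y, width, height
    (pixel units, top-left origin).
    """
    lines = []
    for word in sorted(words, key=lambda w: (w["y"] + w["height"] / 2, w["x"])):
        center = word["y"] + word["height"] / 2
        for line in lines:
            # Same line if the centers differ by less than a fraction of the word height.
            if abs(center - line["center"]) <= y_tolerance * word["height"]:
                line["words"].append(word)
                break
        else:
            lines.append({"center": center, "words": [word]})

    result = []
    for line in lines:
        ws = sorted(line["words"], key=lambda w: w["x"])  # reading order, left to right
        left = min(w["x"] for w in ws)
        top = min(w["y"] for w in ws)
        right = max(w["x"] + w["width"] for w in ws)
        bottom = max(w["y"] + w["height"] for w in ws)
        result.append({
            "text": " ".join(w["text"] for w in ws),
            "x": left, "y": top, "width": right - left, "height": bottom - top,
        })
    return result


def correct_with_source(line_text, source_lines, min_ratio=0.8):
    """Return the closest source line if the match is strong, else the OCR text."""
    best, best_ratio = line_text, 0.0
    for candidate in source_lines:
        ratio = SequenceMatcher(None, line_text, candidate).ratio()
        if ratio > best_ratio:
            best, best_ratio = candidate, ratio
    return best if best_ratio >= min_ratio else line_text


words = [
    {"text": "World", "x": 120, "y": 11, "width": 60, "height": 20},
    {"text": "He11o", "x": 50, "y": 10, "width": 60, "height": 20},
    {"text": "Second", "x": 50, "y": 50, "width": 70, "height": 20},
]
source = ["Hello World", "Another heading"]
for item in group_words_into_lines(words):
    item["text"] = correct_with_source(item["text"], source)
    print(item)
```

In this sketch the OCR boxes still decide position, and the source text only overrides the string when the similarity clears the threshold, which mirrors the rule that boxes determine placement and source text determines copyable content.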
Practical Rules
- Keep text boxes line-level unless a paragraph must be selected as one unit. Line-level boxes are easier to position and debug.
- Do not try to match the rasterized font exactly. The hidden layer only needs close geometry and correct text.
- Use the visible inspection PDF as the validation artifact. It should make every embedded text span obvious.
- For generated images, ask the image-generation step to keep text in large, separated blocks. Dense tiny text is harder for OCR and manual correction.
- Preserve the image as the background instead of rebuilding the layout in presentation or web formats.
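To illustrate why the hidden layer needs only close geometry rather than a matching font, here is a small sketch of one way to fit text to a box: choose a font size from the box height, then stretch or squeeze the text horizontally until it spans the box width. PDF supports this through horizontal text scaling, and invisible text is typically drawn with text rendering mode 3. Whether the skill's script works this way is an assumption; the function name, the width_per_point input, and the 0.8 height ratio are hypothetical.

```python
def fit_hidden_text(box_width, box_height, width_per_point, height_ratio=0.8):
    """Pick a font size and horizontal scale so text roughly fills a box.

    width_per_point is the text's natural width at font size 1, measured with
    whatever font the PDF library embeds (a caller-supplied number here).
    Returns (font_size, horizontal_scale_percent).
    """
    if width_per_point <= 0:
        raise ValueError("width_per_point must be positive")
    # Font size follows the box height; the ratio leaves room for ascenders/descenders.
    font_size = box_height * height_ratio
    natural_width = width_per_point * font_size
    # Scale horizontally so the selectable span covers the rasterized text's width.
    horizontal_scale = 100.0 * box_width / natural_width
    return font_size, horizontal_scale


font_size, scale = fit_hidden_text(box_width=200, box_height=20, width_per_point=5.0)
print(font_size, scale)
```

Because the scale absorbs the difference between the embedded font and the rasterized one, selection highlights can line up with the image even when the typefaces differ.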
- Make sure OpenClaw is installed (local or Docker).
- Run the install command in chat: /install image-to-text-pdf
- After installation, invoke the skill by name or use /image-to-text-pdf
- Provide required inputs per the skill's parameter spec and get structured output.
What is Image To Text Pdf?
Convert a finished raster image, especially a generated poster or visual resume, into an image-based PDF with an additional selectable, copyable, searchable... It is an AI Agent Skill for Claude Code / OpenClaw, with 9 downloads so far.
How do I install Image To Text Pdf?
Run "/install image-to-text-pdf" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Image To Text Pdf free?
Yes, Image To Text Pdf is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Image To Text Pdf support?
Image To Text Pdf is cross-platform and runs anywhere OpenClaw or Claude Code is available.
Who created Image To Text Pdf?
It is built and maintained by Linyue Pan (@zjsxply); the current version is v0.1.0.