← 返回 Skills 市场
zjsxply

Image To Text Pdf

作者 Linyue Pan · GitHub ↗ · v0.1.0 · MIT-0
cross-platform ⚠ pending
9
总下载
0
收藏
0
当前安装
1
版本数
在 OpenClaw 中安装
/install image-to-text-pdf
功能描述
Convert a finished raster image, especially a generated poster or visual resume, into an image-based PDF with an additional selectable, copyable, searchable...
使用说明 (SKILL.md)

Image to Text PDF

Turn a finished raster image into a PDF while preserving the image exactly and adding a transparent text layer for selection, copying, and search.

Workflow

  1. Get the complete image.
  • Treat the image as the visual source of truth.
  • Keep the original image dimensions; the layout coordinates should use the same pixel coordinate space whenever possible.
  1. Build a text-layer layout JSON.
  • Prefer OCR or a vision model that returns text boxes over trying to infer positions manually.
  • Use the user's source text to correct OCR transcription. OCR boxes determine position; the source text determines final copyable content.
  • Read references/ocr-alignment.md when you need guidance for extracting, correcting, or prompting for box-level text.
  • Read references/layout-json.md for the exact JSON schema.
  1. Generate both PDFs.
python scripts/compose_image_text_pdf.py \
  --image /path/to/image.png \
  --layout /path/to/layout.json \
  --output /path/to/image-text.pdf \
  --debug-output /path/to/image-text-check.pdf

For CJK or other non-Latin text, pass a Unicode font:

python scripts/compose_image_text_pdf.py \
  --image /path/to/image.png \
  --layout /path/to/layout.json \
  --output /path/to/image-text.pdf \
  --debug-output /path/to/image-text-check.pdf \
  --font-file /path/to/NotoSansCJK-Regular.ttc
  1. Inspect the debug PDF before delivering.
  • The final PDF should look like the image-only source.
  • The debug PDF should show highlighted text boxes and visible text where the hidden layer will be placed.
  • If a highlighted box is shifted, fix the layout JSON, not the image.
  • If copy/paste text is wrong, fix the text field in layout JSON, not the OCR image.

OCR Word Conversion

When an OCR tool returns word-level boxes, convert them to line-level layout items:

python scripts/ocr_words_to_layout.py \
  --ocr /path/to/ocr.json \
  --output /path/to/layout.json \
  --image-width 1536 \
  --image-height 2048 \
  --source-text /path/to/source.txt

The converter accepts common JSON shapes containing words, items, textAnnotations, or nested page/line/word objects. It groups nearby words into lines and can replace OCR text with the closest line from the source text when the match is strong.

Practical Rules

  • Keep text boxes line-level unless a paragraph must be selected as one unit. Line-level boxes are easier to position and debug.
  • Do not try to match the rasterized font exactly. The hidden layer only needs close geometry and correct text.
  • Use the visible inspection PDF as the validation artifact. It should make every embedded text span obvious.
  • For generated images, ask the image-generation step to keep text in large, separated blocks. Dense tiny text is harder for OCR and manual correction.
  • Preserve the image as the background instead of rebuilding the layout in presentation or web formats.
能力标签
crypto
如何使用
  1. 确保已安装 OpenClaw(本地或 Docker 部署)
  2. 在对话框中输入安装命令:/install image-to-text-pdf
  3. 安装完成后,直接呼叫该 Skill 的名称或使用 /image-to-text-pdf 触发
  4. 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
版本历史
v0.1.0
- Initial release of image-to-text-pdf. - Converts finished raster images (such as posters or visual resumes) into PDFs with an added selectable, searchable invisible text layer. - Provides a workflow that aligns OCR-detected text boxes with user-provided source text for accurate, copyable content. - Generates both a final (invisible-text) PDF and a visible inspection/debug PDF for validating text-layer placement. - Supports CJK and non-Latin text via user-specified Unicode fonts. - Includes a utility to convert word-level OCR output into line-level text boxes, matching them to source text.
元数据
Slug image-to-text-pdf
版本 0.1.0
许可证 MIT-0
累计安装 0
当前安装数 0
历史版本数 1
常见问题

Image To Text Pdf 是什么?

Convert a finished raster image, especially a generated poster or visual resume, into an image-based PDF with an additional selectable, copyable, searchable... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 9 次。

如何安装 Image To Text Pdf?

在 OpenClaw 或 Claude Code 对话框中运行命令「/install image-to-text-pdf」即可一键安装,无需额外配置。

Image To Text Pdf 是免费的吗?

是的,Image To Text Pdf 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。

Image To Text Pdf 支持哪些平台?

Image To Text Pdf 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。

谁开发了 Image To Text Pdf?

由 Linyue Pan(@zjsxply)开发并维护,当前版本 v0.1.0。

💬 留言讨论