minutemighty

Image to Editable PowerPoint

Author: Jade Liu · GitHub · v1.0.1 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 121 · Favorites: 2 · Current installs: 0 · Versions: 2
Install in OpenClaw
/install image2pptx
Description
Convert static images (slides, posters, infographics) to editable PowerPoint files. OCR detects text, classical CV textmask detects ink pixels, mask-clip pre...
Usage instructions (SKILL.md)

image2pptx: Image to Editable PowerPoint

What It Does

Converts a static image into an editable .pptx file where every text element is a selectable, editable text box over a clean inpainted background.

  1. OCR (PaddleOCR PP-OCRv5) — detects text regions with bounding boxes and content
  2. Textmask (classical CV) — finds text ink pixels via adaptive thresholding
  3. Mask-clip — ANDs textmask with OCR bboxes to preserve non-text elements
  4. Inpaint (LAMA) — reconstructs masked regions with neural inpainting
  5. Assemble — places editable text boxes with auto-scaled fonts and detected colors
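
Step 3 (mask-clip) can be illustrated in isolation. The following is a minimal pure-Python sketch, not the project's actual code. It assumes the textmask is a 2D list of 0/1 values and that OCR boxes are (x0, y0, x1, y1) tuples with exclusive right and bottom edges; the function name `clip_mask_to_boxes` is hypothetical.

```python
from typing import List, Sequence, Tuple

# (x0, y0, x1, y1) with right/bottom edges exclusive (assumed convention)
Box = Tuple[int, int, int, int]


def clip_mask_to_boxes(mask: Sequence[Sequence[int]], boxes: Sequence[Box]) -> List[List[int]]:
    """Keep ink pixels only where they fall inside at least one OCR box."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    # Rasterize the union of all OCR boxes into a binary "allowed" mask.
    allowed = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                allowed[y][x] = 1
    # Logical AND: an ink pixel survives only inside a detected text region.
    return [[mask[y][x] & allowed[y][x] for x in range(width)] for y in range(height)]


# Hypothetical 4x6 textmask: the left blob is text, the right column is a graphic edge.
textmask = [
    [0, 1, 1, 0, 0, 1],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
]
clipped = clip_mask_to_boxes(textmask, [(1, 0, 3, 2)])
```

In this toy input the right-hand column looks like ink to the thresholding step, but it lies outside every OCR box, so it is dropped from the mask and would not be inpainted away.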

When to Use

| Scenario | Recommendation |
|---|---|
| Slide with text on solid/flat background | Best results |
| Slide with photo background | Good: uses inpainting (warn about overlap areas) |
| Slide with solid background | Good: use --skip-inpaint for speed |
| Chinese/multilingual slide | Good: ch OCR handles both Chinese and English |
| Poster or infographic with text | Good: works well if text is separate from graphics |
| Dense chart with axis labels on bars | Caution: line grouping may over-merge crowded labels |
| Very thick/large decorative fonts | Caution: may exceed standard mask dilation range |
| Extract individual assets as PNGs | No: use px-asset-extract |
| Read text without creating PPTX | No: use OCR directly |
| Edit an existing .pptx file | No: use the pptx skill |

Installation

git clone https://github.com/JadeLiu-tech/px-image2pptx.git
cd px-image2pptx
pip install -e ".[all]"

Usage

CLI

px-image2pptx slide.png -o output.pptx
px-image2pptx slide.png -o output.pptx --lang ch
px-image2pptx slide.png -o output.pptx --skip-inpaint
px-image2pptx slide.png -o output.pptx --ocr-json text_regions.json
px-image2pptx slide.png -o output.pptx --work-dir ./debug/

Python API

from px_image2pptx import image_to_pptx

report = image_to_pptx("slide.png", "output.pptx")

# With options
report = image_to_pptx(
    "slide.png", "output.pptx",
    lang="ch",
    skip_inpaint=False,
    work_dir="./debug/",
)
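
For batch use, the call above can be wrapped in a small loop. This is a sketch under assumptions: it relies only on the `image_to_pptx(input_path, output_path, **options)` call shape shown above, treats the returned report as opaque, and the helper name `convert_folder`, its `convert` parameter, and the `IMAGE_SUFFIXES` set are hypothetical.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Optional

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}  # assumed set of inputs; adjust as needed


def convert_folder(
    src_dir: str,
    out_dir: str,
    convert: Optional[Callable[..., Any]] = None,
    **options: Any,
) -> Dict[str, Any]:
    """Convert every image in src_dir, returning {image name: report or exception}."""
    if convert is None:
        # Imported lazily so the helper can be exercised with a stub converter.
        from px_image2pptx import image_to_pptx as convert
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    results: Dict[str, Any] = {}
    for image in sorted(Path(src_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        target = out / (image.stem + ".pptx")
        try:
            results[image.name] = convert(str(image), str(target), **options)
        except Exception as exc:  # keep going; one bad slide should not stop the batch
            results[image.name] = exc
    return results
```

A call such as `convert_folder("slides/", "decks/", lang="ch")` would forward `lang` to every conversion. Passing a stub as `convert` makes it possible to dry-run the loop without downloading any models.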

CLI Options

| Option | Default | Description |
|---|---|---|
| -o, --output | output.pptx | Output PPTX path |
| --ocr-json | | Pre-computed OCR JSON (skips OCR) |
| --lang | auto | OCR language: auto, en, ch |
| --sensitivity | 16 | Textmask sensitivity (lower = more) |
| --dilation | 12 | Textmask dilation pixels |
| --min-font | 8 | Min font size in points |
| --max-font | 72 | Max font size in points |
| --skip-inpaint | | Skip LAMA inpainting |
| --work-dir | | Save intermediate files |
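
When the CLI is driven from a script, it can help to build the argument list programmatically. The sketch below uses only flags documented in the options table; the helper name `build_command` and its keyword names are hypothetical, and it omits any flag left unset so the CLI's own defaults apply.

```python
import shlex
from typing import List, Optional


def build_command(
    image: str,
    output: str = "output.pptx",
    lang: Optional[str] = None,
    dilation: Optional[int] = None,
    sensitivity: Optional[int] = None,
    ocr_json: Optional[str] = None,
    skip_inpaint: bool = False,
    work_dir: Optional[str] = None,
) -> List[str]:
    """Build an argv list for px-image2pptx using only the documented flags."""
    argv = ["px-image2pptx", image, "-o", output]
    for flag, value in (
        ("--lang", lang),
        ("--sensitivity", sensitivity),
        ("--dilation", dilation),
        ("--ocr-json", ocr_json),
        ("--work-dir", work_dir),
    ):
        if value is not None:  # omit the flag so the CLI default applies
            argv += [flag, str(value)]
    if skip_inpaint:
        argv.append("--skip-inpaint")
    return argv


cmd = build_command("slide.png", "deck.pptx", lang="ch", dilation=20, skip_inpaint=True)
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with file names that contain spaces.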

Models

Downloaded automatically on first use (~370 MB total). All models are from official open-source repositories.

| Model | Size | License | Source |
|---|---|---|---|
| PP-OCRv5_server_det | 84 MB | Apache 2.0 | PaddlePaddle/PaddleOCR |
| PP-OCRv5_server_rec | 81 MB | Apache 2.0 | PaddlePaddle/PaddleOCR |
| big-lama | 196 MB | Apache 2.0 | advimman/lama |

Models are cached locally after first download (~/.paddlex/official_models/ for OCR, ~/.cache/torch/hub/checkpoints/ for LAMA). To skip model downloads entirely, use --ocr-json with pre-computed OCR and --skip-inpaint.
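
Before running offline, it may be useful to check whether the caches are already populated. The sketch below is a heuristic built only on the two directories named above: it reports whether each directory exists and is non-empty, and does not verify that the specific model files are present or intact. The names `CACHE_DIRS` and `cached_models` are hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional

# Cache locations as documented above, relative to the user's home directory.
CACHE_DIRS = {
    "ocr": Path(".paddlex") / "official_models",
    "lama": Path(".cache") / "torch" / "hub" / "checkpoints",
}


def cached_models(home: Optional[Path] = None) -> Dict[str, bool]:
    """Report, per model family, whether its cache directory exists and is non-empty."""
    base = Path(home) if home is not None else Path.home()
    status = {}
    for name, rel in CACHE_DIRS.items():
        directory = base / rel
        # any() stops at the first entry, so large caches are not fully listed.
        status[name] = directory.is_dir() and any(directory.iterdir())
    return status


if __name__ == "__main__":
    for name, present in cached_models().items():
        print(f"{name}: {'cached' if present else 'will download on first use'}")
```

If either entry reports as missing and network access is not allowed, the documented fallback is `--ocr-json` together with `--skip-inpaint`.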

Limitations — When to Warn the User

| Input | Impact | What to tell the user |
|---|---|---|
| Text on solid/flat background | Best results | No caveats needed |
| Text on textured background | Good results | LAMA handles repeating textures well |
| Text overlapping photos | Inpainting artifacts likely | "Areas where text covers photos may show blurring" |
| Dense chart with many labels | Over-merged labels | "Crowded labels may be grouped incorrectly" |
| Very thick/large fonts | Incomplete mask coverage | "Large fonts may exceed dilation range; try increasing --dilation" |
| Light text on dark background | Blockier inpainting | "White-on-dark text uses box masks instead of tight ink masks" |
| WebP image | OCR fails (0 regions) | Convert to PNG first: Image.open("in.webp").save("in.png") |
| Very large image (>4000px) | Slow inpainting | Suggest --skip-inpaint or downscaling |
| Decorative/handwritten fonts | Typeface won't match | "Fonts are reconstructed as Arial/Helvetica" |
| Centered/justified text | Left-aligned output | "Text alignment is not preserved" |
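
Two of the limitations above (WebP input and very large images) can be detected before conversion starts. The following is a minimal sketch, assuming the caller already knows the image's pixel dimensions and that ">4000px" refers to the longer side; the names `downscaled_size`, `preflight_warnings`, and `MAX_SIDE` are hypothetical and not part of the package.

```python
from pathlib import Path
from typing import List, Tuple

MAX_SIDE = 4000  # threshold taken from the limitations table (assumed to mean the longer side)


def downscaled_size(width: int, height: int, max_side: int = MAX_SIDE) -> Tuple[int, int]:
    """Return dimensions scaled so the longer side is at most max_side, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def preflight_warnings(path: str, width: int, height: int) -> List[str]:
    """Collect warnings for the WebP and large-image cases."""
    warnings = []
    if Path(path).suffix.lower() == ".webp":
        warnings.append("WebP input: convert to PNG first")
    if max(width, height) > MAX_SIDE:
        w, h = downscaled_size(width, height)
        warnings.append(f"Large image: consider --skip-inpaint or downscaling to {w}x{h}")
    return warnings
```

With Pillow installed, the dimensions can be read from `Image.open(path).size`, and the same library performs the PNG conversion shown in the table.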
Security recommendations
This skill appears coherent and implements what it claims. Before installing: (1) expect large downloads (~370 MB) and heavy Python dependencies (PyTorch, PaddleOCR, simple-lama-inpainting) which may take time, disk space, and may access GPUs; (2) models are fetched from open-source repos on first run — verify the upstream GitHub (SKILL.md points to github.com/JadeLiu-tech/px-image2pptx) if you need to trust the source; (3) the skill executes Python code on your machine, so install it in an isolated environment (virtualenv/conda) and review the repository if you have strict security policies; (4) if you cannot allow network downloads, use --ocr-json and --skip-inpaint to avoid model downloads and heavy inpainting, or prepopulate the caches from a trusted source.
Functional analysis
Type: OpenClaw Skill Name: image2pptx Version: 1.0.1 The image2pptx skill is a legitimate tool for converting static images into editable PowerPoint presentations using a pipeline of OCR (PaddleOCR), classical computer vision for text masking, and neural inpainting (LAMA). The code in assemble.py, textmask.py, and pipeline.py is well-structured and aligns perfectly with the stated purpose, utilizing standard libraries like OpenCV, PIL, and python-pptx. Model downloads for OCR and inpainting are transparently documented and sourced from established open-source repositories. No evidence of data exfiltration, malicious execution, or harmful prompt injection was found.
Capability assessment
Purpose & Capability
Name/description (image → editable PPTX) align with the provided Python modules (OCR, textmask, inpaint, assemble). The listed dependencies (PaddleOCR, Torch, LAMA) and code behavior are appropriate for OCR + inpainting + PPTX assembly. No unrelated credentials, binaries, or paths are requested.
Instruction Scope
SKILL.md and the code describe and implement the exact pipeline: run OCR, compute ink masks, clip to OCR bboxes, optionally inpaint, assemble PPTX. The instructions and CLI do not ask the agent to read unrelated files, environment variables, or post data to unknown endpoints. Intermediate files and model cache locations are local and documented.
Install Mechanism
The registry entry has no formal install spec (instruction-only), but SKILL.md/README instruct a git clone + pip install -e and the package includes Python source. Models are downloaded on first use (~370 MB) from cited open-source repos (PaddleOCR, advimman/lama). This is expected but noteworthy: automatic model downloads and heavy native packages (PyTorch) will fetch data from the network and consume disk space.
Credentials
No environment variables, credentials, or config paths are required. The code caches models under standard user-cache directories. There are no requests for unrelated secrets or system tokens.
Persistence & Privilege
The skill does not request always:true and does not modify other skills or system-wide agent configs. It runs as a normal skill and writes only its own temp/intermediate files and model caches.
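How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install image2pptx
  3. Once installation finishes, invoke the Skill by name or trigger it with /image2pptx
  4. Provide the required inputs according to the Skill's parameter documentation to get structured output.
Version history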
v1.0.1
- Clarified open-source model origins, listing source repositories for all models.
- Added details on local model cache directories.
- Included instructions to bypass model downloads using `--ocr-json` and `--skip-inpaint`.
- Minor text and formatting updates for clarity.
v1.0.0
image2pptx 1.0.0: Convert images of slides, posters, and infographics into editable PowerPoint files.
- Converts static images to .pptx with editable, selectable text boxes over reconstructed backgrounds.
- Uses OCR, classical computer vision, inpainting, and font/color detection for high-accuracy slide reconstruction.
- Supports CLI and Python API with options for language, inpainting, font size, and debug outputs.
- Handles various scenarios: solid/photo backgrounds, multiple languages, posters, infographics.
- Includes clear limitations and user guidance for challenging input cases (e.g., crowded charts, thick fonts, large images).
Metadata
Slug: image2pptx
Version: 1.0.1
License: MIT-0
Total installs: 0
Current installs: 0
Versions released: 2
FAQ

What is Image to Editable PowerPoint?

Convert static images (slides, posters, infographics) to editable PowerPoint files. OCR detects text, classical CV textmask detects ink pixels, mask-clip pre... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 121 total downloads so far.

How do I install Image to Editable PowerPoint?

Run the command /install image2pptx in the OpenClaw or Claude Code chat box to install it in one step; no extra configuration is needed.

Is Image to Editable PowerPoint free?

Yes. Image to Editable PowerPoint is completely free under the MIT-0 license and can be freely downloaded, installed, and used.

Which platforms does Image to Editable PowerPoint support?

Image to Editable PowerPoint is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.

Who developed Image to Editable PowerPoint?

It is developed and maintained by Jade Liu (@minutemighty); the current version is v1.0.1.
