minutemighty

Image to Editable PowerPoint

by Jade Liu · GitHub ↗ · v1.0.1 · MIT-0
cross-platform ⚠ suspicious
121 downloads · 2 stars · 0 active installs · 2 versions
Install in OpenClaw
/install image2pptx
Description
Convert static images (slides, posters, infographics) to editable PowerPoint files. OCR detects text, classical CV textmask detects ink pixels, mask-clip pre...
README (SKILL.md)

image2pptx: Image to Editable PowerPoint

What It Does

Converts a static image into an editable .pptx file where every text element is a selectable, editable text box over a clean inpainted background.

  1. OCR (PaddleOCR PP-OCRv5) — detects text regions with bounding boxes and content
  2. Textmask (classical CV) — finds text ink pixels via adaptive thresholding
  3. Mask-clip — ANDs textmask with OCR bboxes to preserve non-text elements
  4. Inpaint (LAMA) — reconstructs masked regions with neural inpainting
  5. Assemble — places editable text boxes with auto-scaled fonts and detected colors
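
Step 3 (mask-clip) can be illustrated with a minimal sketch in plain Python. It assumes the textmask is a 2D grid of 0/1 values and that each OCR box is an (x0, y0, x1, y1) tuple with exclusive right and bottom edges. The name `clip_mask_to_boxes` is hypothetical and is not taken from the package, which may implement this step differently.

```python
from typing import List, Sequence, Tuple

# (x0, y0, x1, y1) with exclusive right/bottom edges (assumed convention)
Box = Tuple[int, int, int, int]


def clip_mask_to_boxes(
    mask: Sequence[Sequence[int]], boxes: Sequence[Box]
) -> List[List[int]]:
    """Keep ink pixels only where they fall inside at least one OCR box."""
    height = len(mask)
    width = len(mask[0]) if height else 0

    # Rasterize the OCR boxes into a binary "allowed" grid.
    allowed = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                allowed[y][x] = 1

    # Pixel-wise AND: ink outside every box (icons, lines, charts) is dropped.
    return [
        [mask[y][x] & allowed[y][x] for x in range(width)]
        for y in range(height)
    ]


textmask = [
    [1, 1, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
clipped = clip_mask_to_boxes(textmask, [(0, 0, 2, 2)])
```

In this toy input, the right-hand column of ink is not covered by any OCR box, so it is removed from the mask and would be left untouched by inpainting.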

When to Use

| Scenario | Recommendation |
| --- | --- |
| Slide with text on solid/flat background | Best results |
| Slide with photo background | Good: uses inpainting (warn about overlap areas) |
| Slide with solid background | Good: use --skip-inpaint for speed |
| Chinese/multilingual slide | Good: ch OCR handles both Chinese and English |
| Poster or infographic with text | Good: works well if text is separate from graphics |
| Dense chart with axis labels on bars | Caution: line grouping may over-merge crowded labels |
| Very thick/large decorative fonts | Caution: may exceed standard mask dilation range |
| Extract individual assets as PNGs | No: use px-asset-extract |
| Read text without creating PPTX | No: use OCR directly |
| Edit an existing .pptx file | No: use the pptx skill |

Installation

git clone https://github.com/JadeLiu-tech/px-image2pptx.git
cd px-image2pptx
pip install -e ".[all]"

Usage

CLI

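px-image2pptx slide.png -o output.pptx                              # basic conversion
px-image2pptx slide.png -o output.pptx --lang ch                    # Chinese/multilingual OCR
px-image2pptx slide.png -o output.pptx --skip-inpaint               # skip LAMA inpainting (faster on solid backgrounds)
px-image2pptx slide.png -o output.pptx --ocr-json text_regions.json # reuse pre-computed OCR (skips OCR)
px-image2pptx slide.png -o output.pptx --work-dir ./debug/          # save intermediate files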
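
For scripted runs, the invocations above can be assembled programmatically. The sketch below uses only the flags shown in this section. `build_command` is a hypothetical helper, not part of the package, and the last comment shows how the result could be passed to `subprocess.run` once the CLI is installed.

```python
from typing import List, Optional


def build_command(
    image: str,
    output: str = "output.pptx",
    lang: Optional[str] = None,
    ocr_json: Optional[str] = None,
    skip_inpaint: bool = False,
    work_dir: Optional[str] = None,
) -> List[str]:
    """Build an argv list for px-image2pptx from the documented flags."""
    cmd = ["px-image2pptx", image, "-o", output]
    if lang is not None:
        cmd += ["--lang", lang]
    if ocr_json is not None:
        cmd += ["--ocr-json", ocr_json]
    if skip_inpaint:
        cmd.append("--skip-inpaint")
    if work_dir is not None:
        cmd += ["--work-dir", work_dir]
    return cmd


# Combination that avoids both OCR and inpainting model use.
offline_cmd = build_command(
    "slide.png", ocr_json="text_regions.json", skip_inpaint=True
)
# To execute it (requires the package to be installed):
#   import subprocess; subprocess.run(offline_cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.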

Python API

from px_image2pptx import image_to_pptx

report = image_to_pptx("slide.png", "output.pptx")

# With options
report = image_to_pptx(
    "slide.png", "output.pptx",
    lang="ch",
    skip_inpaint=False,
    work_dir="./debug/",
)
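
A folder of slides could be converted with a small wrapper around the API call above. This is a sketch under stated assumptions: `convert_folder` is a hypothetical helper, the converter is passed in as an argument so the wrapper does not depend on the package being importable, and the structure of the returned report is not documented here, so it is stored without being inspected.

```python
from pathlib import Path
from typing import Any, Callable, Dict


def convert_folder(
    folder: str,
    convert: Callable[..., Any],
    **options: Any,
) -> Dict[str, Any]:
    """Convert every PNG in `folder`, writing a .pptx next to each image.

    `convert` is expected to accept the call shape shown above:
    convert(image_path, output_path, **options).
    """
    reports: Dict[str, Any] = {}
    for image in sorted(Path(folder).glob("*.png")):
        output = image.with_suffix(".pptx")
        # Store whatever the converter returns, keyed by the image file name.
        reports[image.name] = convert(str(image), str(output), **options)
    return reports
```

With the package installed, a call such as `convert_folder("./slides", image_to_pptx, lang="ch")` would process each PNG in turn with the same options.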

CLI Options

| Option | Default | Description |
| --- | --- | --- |
| -o, --output | output.pptx | Output PPTX path |
| --ocr-json | | Pre-computed OCR JSON (skips OCR) |
| --lang | auto | OCR language: auto, en, ch |
| --sensitivity | 16 | Textmask sensitivity (lower = more) |
| --dilation | 12 | Textmask dilation pixels |
| --min-font | 8 | Min font size in points |
| --max-font | 72 | Max font size in points |
| --skip-inpaint | | Skip LAMA inpainting |
| --work-dir | | Save intermediate files |
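
The `--min-font` and `--max-font` bounds suggest that estimated font sizes are clamped to a range. The sketch below shows one way such a clamp could work. It is an illustration only: the linear pixel-to-point mapping, the 540 pt slide height (7.5 in at 72 pt per inch), and the function name `estimate_font_pt` are all assumptions and are not taken from the package.

```python
def estimate_font_pt(
    box_height_px: float,
    image_height_px: float,
    slide_height_pt: float = 540.0,  # assumed 7.5 in slide at 72 pt per inch
    min_font: float = 8.0,           # documented --min-font default
    max_font: float = 72.0,          # documented --max-font default
) -> float:
    """Map a text line's pixel height to a point size, then clamp it."""
    if image_height_px <= 0:
        raise ValueError("image_height_px must be positive")
    # Linear mapping from image pixels to slide points (illustrative assumption).
    raw_pt = box_height_px * slide_height_pt / image_height_px
    return max(min_font, min(max_font, raw_pt))


body = estimate_font_pt(40, 1080)    # 40 px line on a 1080 px tall image
tiny = estimate_font_pt(6, 1080)     # raw value 3 pt, clamped up to min_font
title = estimate_font_pt(300, 1080)  # raw value 150 pt, clamped down to max_font
```

Under these assumptions, raising `--max-font` would matter mainly for posters or title slides where a single line spans a large share of the image height.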

Models

Downloaded automatically on first use (~370 MB total). All models are from official open-source repositories.

| Model | Size | License | Source |
| --- | --- | --- | --- |
| PP-OCRv5_server_det | 84 MB | Apache 2.0 | PaddlePaddle/PaddleOCR |
| PP-OCRv5_server_rec | 81 MB | Apache 2.0 | PaddlePaddle/PaddleOCR |
| big-lama | 196 MB | Apache 2.0 | advimman/lama |

Models are cached locally after first download (~/.paddlex/official_models/ for OCR, ~/.cache/torch/hub/checkpoints/ for LAMA). To skip model downloads entirely, use --ocr-json with pre-computed OCR and --skip-inpaint.
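
Before an offline run, it can help to check whether the cache directories named above are already populated. The sketch below only tests that each directory exists and is non-empty. It does not verify specific model files, whose exact names are not documented here, and the torch hub directory may also hold unrelated checkpoints. `cached_models` is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict, Optional

# Cache locations as documented above, relative to the home directory.
CACHE_DIRS = {
    "ocr": Path(".paddlex") / "official_models",
    "lama": Path(".cache") / "torch" / "hub" / "checkpoints",
}


def cached_models(home: Optional[Path] = None) -> Dict[str, bool]:
    """Report which cache directories exist and contain at least one entry."""
    home = Path.home() if home is None else Path(home)
    status: Dict[str, bool] = {}
    for name, rel in CACHE_DIRS.items():
        path = home / rel
        status[name] = path.is_dir() and any(path.iterdir())
    return status
```

If either entry is False and network access is not allowed, the documented fallback is to pass --ocr-json and --skip-inpaint.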

Limitations — When to Warn the User

| Input | Impact | What to tell the user |
| --- | --- | --- |
| Text on solid/flat background | Best results | No caveats needed |
| Text on textured background | Good results | LAMA handles repeating textures well |
| Text overlapping photos | Inpainting artifacts likely | "Areas where text covers photos may show blurring" |
| Dense chart with many labels | Over-merged labels | "Crowded labels may be grouped incorrectly" |
| Very thick/large fonts | Incomplete mask coverage | "Large fonts may exceed dilation range; try increasing --dilation" |
| Light text on dark background | Blockier inpainting | "White-on-dark text uses box masks instead of tight ink masks" |
| WebP image | OCR fails (0 regions) | Convert to PNG first: Image.open("in.webp").save("in.png") |
| Very large image (>4000px) | Slow inpainting | Suggest --skip-inpaint or downscaling |
| Decorative/handwritten fonts | Typeface won't match | "Fonts are reconstructed as Arial/Helvetica" |
| Centered/justified text | Left-aligned output | "Text alignment is not preserved" |
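
Two of the limitations listed above can be detected before running the pipeline. The sketch below is a hypothetical pre-flight check, not part of the package. It assumes that ">4000px" refers to the longer side of the image and takes the dimensions as arguments so that it needs no imaging library. With Pillow installed, they could be read from `Image.open(path).size` after `from PIL import Image`.

```python
from pathlib import Path
from typing import List


def preflight_warnings(image_path: str, width: int, height: int) -> List[str]:
    """Return user-facing warnings for inputs with known limitations."""
    warnings: List[str] = []
    if Path(image_path).suffix.lower() == ".webp":
        warnings.append(
            "WebP input: convert to PNG first, otherwise OCR may find 0 regions."
        )
    # Assumption: the >4000px limit applies to the longer side.
    if max(width, height) > 4000:
        warnings.append(
            "Very large image: inpainting will be slow; "
            "consider --skip-inpaint or downscaling."
        )
    return warnings


notes = preflight_warnings("poster.webp", 6000, 3000)
```
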
Usage Guidance
This skill appears coherent and implements what it claims. Before installing:

  1. Expect large downloads (~370 MB) and heavy Python dependencies (PyTorch, PaddleOCR, simple-lama-inpainting). Installation may take time and disk space, and the dependencies may access GPUs.
  2. Models are fetched from open-source repos on first run. Verify the upstream GitHub repository (SKILL.md points to github.com/JadeLiu-tech/px-image2pptx) if you need to trust the source.
  3. The skill executes Python code on your machine, so install it in an isolated environment (virtualenv/conda) and review the repository if you have strict security policies.
  4. If you cannot allow network downloads, use --ocr-json and --skip-inpaint to avoid model downloads and heavy inpainting, or prepopulate the caches from a trusted source.
Capability Analysis
Type: OpenClaw Skill
Name: image2pptx
Version: 1.0.1

The image2pptx skill is a legitimate tool for converting static images into editable PowerPoint presentations using a pipeline of OCR (PaddleOCR), classical computer vision for text masking, and neural inpainting (LAMA). The code in assemble.py, textmask.py, and pipeline.py is well-structured and aligns perfectly with the stated purpose, utilizing standard libraries like OpenCV, PIL, and python-pptx. Model downloads for OCR and inpainting are transparently documented and sourced from established open-source repositories. No evidence of data exfiltration, malicious execution, or harmful prompt injection was found.
Capability Assessment
Purpose & Capability
Name/description (image → editable PPTX) align with the provided Python modules (OCR, textmask, inpaint, assemble). The listed dependencies (PaddleOCR, Torch, LAMA) and code behavior are appropriate for OCR + inpainting + PPTX assembly. No unrelated credentials, binaries, or paths are requested.
Instruction Scope
SKILL.md and the code describe and implement the exact pipeline: run OCR, compute ink masks, clip to OCR bboxes, optionally inpaint, assemble PPTX. The instructions and CLI do not ask the agent to read unrelated files, environment variables, or post data to unknown endpoints. Intermediate files and model cache locations are local and documented.
Install Mechanism
The registry entry has no formal install spec (instruction-only), but SKILL.md/README instruct a git clone + pip install -e and the package includes Python source. Models are downloaded on first use (~370 MB) from cited open-source repos (PaddleOCR, advimman/lama). This is expected but noteworthy: automatic model downloads and heavy native packages (PyTorch) will fetch data from the network and consume disk space.
Credentials
No environment variables, credentials, or config paths are required. The code caches models under standard user-cache directories. There are no requests for unrelated secrets or system tokens.
Persistence & Privilege
The skill does not request always:true and does not modify other skills or system-wide agent configs. It runs as a normal skill and writes only its own temp/intermediate files and model caches.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install image2pptx
  3. After installation, invoke the skill by name or use /image2pptx
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.1
- Clarified open-source model origins, listing source repositories for all models.
- Added details on local model cache directories.
- Included instructions to bypass model downloads using `--ocr-json` and `--skip-inpaint`.
- Minor text and formatting updates for clarity.
v1.0.0
image2pptx 1.0.0: Convert images of slides, posters, and infographics into editable PowerPoint files.
- Converts static images to .pptx with editable, selectable text boxes over reconstructed backgrounds.
- Uses OCR, classical computer vision, inpainting, and font/color detection for high-accuracy slide reconstruction.
- Supports CLI and Python API with options for language, inpainting, font size, and debug outputs.
- Handles various scenarios: solid/photo backgrounds, multiple languages, posters, infographics.
- Includes clear limitations and user guidance for challenging input cases (e.g., crowded charts, thick fonts, large images).
Metadata
Slug: image2pptx
Version: 1.0.1
License: MIT-0
All-time Installs: 0
Active Installs: 0
Total Versions: 2
Frequently Asked Questions

What is Image to Editable PowerPoint?

Convert static images (slides, posters, infographics) to editable PowerPoint files. OCR detects text, classical CV textmask detects ink pixels, mask-clip pre... It is an AI Agent Skill for Claude Code / OpenClaw, with 121 downloads so far.

How do I install Image to Editable PowerPoint?

Run "/install image2pptx" in the OpenClaw or Claude Code chat to install it in one step. The OCR and inpainting models (~370 MB total) are downloaded automatically on first use.

Is Image to Editable PowerPoint free?

Yes, Image to Editable PowerPoint is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Image to Editable PowerPoint support?

Image to Editable PowerPoint is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Image to Editable PowerPoint?

It is built and maintained by Jade Liu (@minutemighty); the current version is v1.0.1.
