
Image to Editable PowerPoint

Author: minutemighty

by Jade Liu · GitHub ↗ · v1.0.1 · MIT-0

cross-platform ⚠ suspicious

Downloads: 121

Install in OpenClaw

/install image2pptx

Description

Convert static images (slides, posters, infographics) to editable PowerPoint files. OCR detects text, classical CV textmask detects ink pixels, mask-clip pre...

README (SKILL.md)

image2pptx: Image to Editable PowerPoint

What It Does

Converts a static image into an editable .pptx file where every text element is a selectable, editable text box over a clean inpainted background.

1. OCR (PaddleOCR PP-OCRv5): detects text regions with bounding boxes and content
2. Textmask (classical CV): finds text ink pixels via adaptive thresholding
3. Mask-clip: ANDs textmask with OCR bboxes to preserve non-text elements
4. Inpaint (LAMA): reconstructs masked regions with neural inpainting
5. Assemble: places editable text boxes with auto-scaled fonts and detected colors
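
The mask-clip step can be illustrated with a minimal sketch. This is not the code from px-image2pptx: the function name `clip_mask_to_bboxes`, the nested-list mask representation, and the `(x0, y0, x1, y1)` box layout are all assumptions chosen to keep the example dependency-free. It only shows the idea of ANDing an ink mask with rasterized OCR boxes so that ink outside every box (for example a chart line or logo) is left untouched.

```python
from typing import List, Tuple

Mask = List[List[int]]            # 1 = ink pixel, 0 = background
BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1), x1/y1 exclusive (assumed layout)


def clip_mask_to_bboxes(textmask: Mask, bboxes: List[BBox]) -> Mask:
    """Keep ink pixels only where they fall inside at least one OCR box."""
    height = len(textmask)
    width = len(textmask[0]) if height else 0

    # Rasterize the OCR boxes into a binary "allowed" mask.
    allowed = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in bboxes:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                allowed[y][x] = 1

    # Pixel-wise AND: ink that lies outside every box is dropped.
    return [
        [ink & ok for ink, ok in zip(mask_row, allowed_row)]
        for mask_row, allowed_row in zip(textmask, allowed)
    ]


# The right-hand column is "ink" (say, a vertical rule) that OCR did not box.
textmask = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
clipped = clip_mask_to_bboxes(textmask, [(0, 0, 2, 2)])
```

After clipping, only the 2×2 block inside the OCR box remains in the mask, so only that region would be handed to the inpainting step.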

When to Use

Scenario	Recommendation
Slide with text on solid/flat background	Best results
Slide with photo background	Good — uses inpainting (warn about overlap areas)
Slide with solid background	Good — use `--skip-inpaint` for speed
Chinese/multilingual slide	Good — `ch` OCR handles both Chinese and English
Poster or infographic with text	Good — works well if text is separate from graphics
Dense chart with axis labels on bars	Caution — line grouping may over-merge crowded labels
Very thick/large decorative fonts	Caution — may exceed standard mask dilation range
Extract individual assets as PNGs	No — use px-asset-extract
Read text without creating PPTX	No — use OCR directly
Edit an existing .pptx file	No — use the pptx skill

Installation

git clone https://github.com/JadeLiu-tech/px-image2pptx.git
cd px-image2pptx
pip install -e ".[all]"

Usage

CLI

px-image2pptx slide.png -o output.pptx
px-image2pptx slide.png -o output.pptx --lang ch
px-image2pptx slide.png -o output.pptx --skip-inpaint
px-image2pptx slide.png -o output.pptx --ocr-json text_regions.json
px-image2pptx slide.png -o output.pptx --work-dir ./debug/
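
When the CLI is driven from a script, the flags shown above can be assembled programmatically. The following is a sketch only: `build_command` is a hypothetical helper, not part of px-image2pptx, and it uses only the flags listed in this README.

```python
import shlex
from typing import List, Optional


def build_command(
    image: str,
    output: str = "output.pptx",
    lang: Optional[str] = None,
    skip_inpaint: bool = False,
    ocr_json: Optional[str] = None,
    work_dir: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for px-image2pptx from the documented flags."""
    cmd = ["px-image2pptx", image, "-o", output]
    if lang is not None:
        cmd += ["--lang", lang]
    if ocr_json is not None:
        cmd += ["--ocr-json", ocr_json]
    if skip_inpaint:
        cmd.append("--skip-inpaint")
    if work_dir is not None:
        cmd += ["--work-dir", work_dir]
    return cmd


# Offline-style invocation: pre-computed OCR and no inpainting.
offline = build_command("slide.png", ocr_json="text_regions.json", skip_inpaint=True)
print(shlex.join(offline))
# To actually run it: subprocess.run(offline, check=True)
```

Building an argv list rather than a shell string avoids quoting problems when file names contain spaces.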

Python API

from px_image2pptx import image_to_pptx

report = image_to_pptx("slide.png", "output.pptx")

# With options
report = image_to_pptx(
    "slide.png", "output.pptx",
    lang="ch",
    skip_inpaint=False,
    work_dir="./debug/",
)

CLI Options

Option	Default	Description
`-o`, `--output`	`output.pptx`	Output PPTX path
`--ocr-json`		Pre-computed OCR JSON (skips OCR)
`--lang`	`auto`	OCR language: `auto`, `en`, `ch`
`--sensitivity`	`16`	Textmask sensitivity (lower = more sensitive)
`--dilation`	`12`	Textmask dilation pixels
`--min-font`	`8`	Min font size in points
`--max-font`	`72`	Max font size in points
`--skip-inpaint`		Skip LAMA inpainting
`--work-dir`		Save intermediate files
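
To illustrate how `--min-font` and `--max-font` bound the auto-scaled font size, here is a minimal sketch. The scaling rule itself is an assumption (text box height as a fraction of image height, mapped onto a 540 pt tall slide); the actual rule used by px-image2pptx may differ. Only the clamping to the documented defaults of 8 and 72 points comes from the table above.

```python
def estimate_font_pt(
    bbox_height_px: float,
    image_height_px: float,
    slide_height_pt: float = 540.0,  # assumed: 7.5 in slide at 72 pt/in
    min_font: float = 8.0,           # documented --min-font default
    max_font: float = 72.0,          # documented --max-font default
) -> float:
    """Scale a text box height to points, then clamp to [min_font, max_font]."""
    if image_height_px <= 0:
        raise ValueError("image_height_px must be positive")
    # Hypothetical scaling: same fraction of the slide as of the source image.
    raw = bbox_height_px / image_height_px * slide_height_pt
    return max(min_font, min(max_font, raw))


body_text = estimate_font_pt(54, 1080)    # mid-sized line, stays within range
footnote = estimate_font_pt(4, 1080)      # tiny text, raised to min_font
headline = estimate_font_pt(400, 1080)    # huge text, capped at max_font
```

Under these assumptions, very small labels never drop below the minimum and oversized titles never exceed the maximum, which is the behavior the two options describe.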

Models

Downloaded automatically on first use (~370 MB total). All models are from official open-source repositories.

Model	Size	License	Source
PP-OCRv5_server_det	84 MB	Apache 2.0	PaddlePaddle/PaddleOCR
PP-OCRv5_server_rec	81 MB	Apache 2.0	PaddlePaddle/PaddleOCR
big-lama	196 MB	Apache 2.0	advimman/lama

Models are cached locally after first download (~/.paddlex/official_models/ for OCR, ~/.cache/torch/hub/checkpoints/ for LAMA). To skip model downloads entirely, use --ocr-json with pre-computed OCR and --skip-inpaint.
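
Before running in a restricted environment, it can help to check whether the caches are already populated. The sketch below is an assumption-laden helper (`cache_status` is hypothetical): it only tests that the two documented cache directories exist and are non-empty, and does not verify that the specific model files are present or intact.

```python
from pathlib import Path
from typing import Dict, Optional

# Cache locations as documented above, relative to the user's home directory.
CACHE_DIRS = {
    "ocr": Path(".paddlex") / "official_models",
    "lama": Path(".cache") / "torch" / "hub" / "checkpoints",
}


def cache_status(home: Optional[Path] = None) -> Dict[str, bool]:
    """Report, per model family, whether its cache directory exists and is non-empty."""
    home = home or Path.home()
    status = {}
    for name, rel in CACHE_DIRS.items():
        path = home / rel
        status[name] = path.is_dir() and any(path.iterdir())
    return status


for name, ready in cache_status().items():
    print(f"{name}: {'cached' if ready else 'will download on first use'}")
```

If either entry reports a pending download and network access is not allowed, the `--ocr-json` and `--skip-inpaint` options described above avoid the corresponding model.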

Limitations — When to Warn the User

Input	Impact	What to tell the user
Text on solid/flat background	Best results	No caveats needed
Text on textured background	Good results	LAMA handles repeating textures well
Text overlapping photos	Inpainting artifacts likely	"Areas where text covers photos may show blurring"
Dense chart with many labels	Over-merged labels	"Crowded labels may be grouped incorrectly"
Very thick/large fonts	Incomplete mask coverage	"Large fonts may exceed dilation range — try increasing `--dilation`"
Light text on dark background	Blockier inpainting	"White-on-dark text uses box masks instead of tight ink masks"
WebP image	OCR fails (0 regions)	Convert to PNG first: `Image.open("in.webp").save("in.png")`
Very large image (>4000px)	Slow inpainting	Suggest `--skip-inpaint` or downscaling
Decorative/handwritten fonts	Typeface won't match	"Fonts are reconstructed as Arial/Helvetica"
Centered/justified text	Left-aligned output	"Text alignment is not preserved"
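
Two of the rows above (WebP input and very large images) can be detected from the file name and pixel dimensions alone. The following sketch shows one way to turn them into warnings before conversion. `preflight_warnings` is a hypothetical helper, and applying the 4000 px threshold to the longest side is an assumption, since the table does not say which dimension it refers to.

```python
from pathlib import PurePath
from typing import List

LARGE_IMAGE_PX = 4000  # threshold from the table above


def preflight_warnings(filename: str, width: int, height: int) -> List[str]:
    """Return user-facing warnings for inputs the table flags as problematic."""
    warnings = []
    if PurePath(filename).suffix.lower() == ".webp":
        warnings.append("WebP input: convert to PNG first, OCR may find 0 regions.")
    # Assumption: the limit applies to the longest side of the image.
    if max(width, height) > LARGE_IMAGE_PX:
        warnings.append(
            "Very large image: inpainting will be slow; "
            "consider --skip-inpaint or downscaling."
        )
    return warnings


for message in preflight_warnings("deck.webp", 5000, 3000):
    print(message)
```

The remaining rows (photo backgrounds, dense charts, decorative fonts) depend on image content and would need inspection of the OCR output or the image itself.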

Usage Guidance

This skill appears coherent and implements what it claims. Before installing:

(1) Expect large downloads (~370 MB) and heavy Python dependencies (PyTorch, PaddleOCR, simple-lama-inpainting), which take time and disk space and may use a GPU.
(2) Models are fetched from open-source repositories on first run. Verify the upstream GitHub repository (SKILL.md points to github.com/JadeLiu-tech/px-image2pptx) if you need to trust the source.
(3) The skill executes Python code on your machine, so install it in an isolated environment (virtualenv/conda) and review the repository if you have strict security policies.
(4) If you cannot allow network downloads, use --ocr-json and --skip-inpaint to avoid model downloads and heavy inpainting, or prepopulate the caches from a trusted source.

Capability Analysis

Type: OpenClaw Skill
Name: image2pptx
Version: 1.0.1

The image2pptx skill is a legitimate tool for converting static images into editable PowerPoint presentations using a pipeline of OCR (PaddleOCR), classical computer vision for text masking, and neural inpainting (LAMA). The code in assemble.py, textmask.py, and pipeline.py is well-structured and aligns perfectly with the stated purpose, utilizing standard libraries like OpenCV, PIL, and python-pptx. Model downloads for OCR and inpainting are transparently documented and sourced from established open-source repositories. No evidence of data exfiltration, malicious execution, or harmful prompt injection was found.

Capability Assessment

✓ Purpose & Capability

Name/description (image → editable PPTX) align with the provided Python modules (OCR, textmask, inpaint, assemble). The listed dependencies (PaddleOCR, Torch, LAMA) and code behavior are appropriate for OCR + inpainting + PPTX assembly. No unrelated credentials, binaries, or paths are requested.

✓ Instruction Scope

SKILL.md and the code describe and implement the exact pipeline: run OCR, compute ink masks, clip to OCR bboxes, optionally inpaint, assemble PPTX. The instructions and CLI do not ask the agent to read unrelated files, environment variables, or post data to unknown endpoints. Intermediate files and model cache locations are local and documented.

ℹ Install Mechanism

The registry entry has no formal install spec (instruction-only), but SKILL.md/README instruct a git clone + pip install -e and the package includes Python source. Models are downloaded on first use (~370 MB) from cited open-source repos (PaddleOCR, advimman/lama). This is expected but noteworthy: automatic model downloads and heavy native packages (PyTorch) will fetch data from the network and consume disk space.

✓ Credentials

No environment variables, credentials, or config paths are required. The code caches models under standard user-cache directories. There are no requests for unrelated secrets or system tokens.

✓ Persistence & Privilege

The skill does not request always:true and does not modify other skills or system-wide agent configs. It runs as a normal skill and writes only its own temp/intermediate files and model caches.

How to Use

1. Make sure OpenClaw is installed (local or Docker)
2. Run the install command in chat: /install image2pptx
3. After installation, invoke the skill by name or use /image2pptx
4. Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.1

- Clarified open-source model origins, listing source repositories for all models.
- Added details on local model cache directories.
- Included instructions to bypass model downloads using `--ocr-json` and `--skip-inpaint`.
- Minor text and formatting updates for clarity.

v1.0.0

image2pptx 1.0.0: Convert images of slides, posters, and infographics into editable PowerPoint files.
- Converts static images to .pptx with editable, selectable text boxes over reconstructed backgrounds.
- Uses OCR, classical computer vision, inpainting, and font/color detection for high-accuracy slide reconstruction.
- Supports CLI and Python API with options for language, inpainting, font size, and debug outputs.
- Handles various scenarios: solid/photo backgrounds, multiple languages, posters, infographics.
- Includes clear limitations and user guidance for challenging input cases (e.g., crowded charts, thick fonts, large images).

Metadata

Slug image2pptx

Version 1.0.1

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 2

Frequently Asked Questions

What is Image to Editable PowerPoint?

Convert static images (slides, posters, infographics) to editable PowerPoint files. OCR detects text, classical CV textmask detects ink pixels, mask-clip pre... It is an AI Agent Skill for Claude Code / OpenClaw, with 121 downloads so far.

How do I install Image to Editable PowerPoint?

Run "/install image2pptx" in the OpenClaw or Claude Code chat to install it in one step. The OCR and inpainting models (~370 MB total) are then downloaded automatically on first use.

Is Image to Editable PowerPoint free?

Yes, Image to Editable PowerPoint is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Image to Editable PowerPoint support?

Image to Editable PowerPoint is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Image to Editable PowerPoint?

It is built and maintained by Jade Liu (@minutemighty); the current version is v1.0.1.
