
Image to Editable PowerPoint

Name: Image to Editable PowerPoint
Author: minutemighty

Author: Jade Liu · GitHub · v1.0.1 · MIT-0

cross-platform ⚠ suspicious

121 total downloads

Install in OpenClaw

/install image2pptx

Description

Convert static images (slides, posters, infographics) to editable PowerPoint files. OCR detects text, classical CV textmask detects ink pixels, mask-clip pre...

Usage Instructions (SKILL.md)

image2pptx: Image to Editable PowerPoint

What It Does

Converts a static image into an editable .pptx file where every text element is a selectable, editable text box over a clean inpainted background.

OCR (PaddleOCR PP-OCRv5) — detects text regions with bounding boxes and content
Textmask (classical CV) — finds text ink pixels via adaptive thresholding
Mask-clip — ANDs textmask with OCR bboxes to preserve non-text elements
Inpaint (LAMA) — reconstructs masked regions with neural inpainting
Assemble — places editable text boxes with auto-scaled fonts and detected colors
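The textmask and mask-clip steps can be illustrated with a minimal NumPy-only sketch. It assumes dark text on a light background, flags a pixel as ink when it is darker than its local mean by more than `sensitivity`, dilates that mask, and then keeps only pixels inside OCR boxes so that graphics outside the boxes are never sent to inpainting. The function names, the `(x0, y0, x1, y1)` box format, the 15 px window, and the choice to dilate before clipping are illustrative assumptions, not the package's actual code.

```python
import numpy as np


def box_mean(gray: np.ndarray, k: int) -> np.ndarray:
    """Mean of each k x k neighbourhood (k odd), computed with an integral image."""
    r = k // 2
    padded = np.pad(gray.astype(np.float64), r, mode="edge")
    # Integral image with a leading zero row and column, so each window sum is 4 lookups.
    ii = np.pad(padded.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    h, w = gray.shape
    total = ii[k:k + h, k:k + w] - ii[:h, k:k + w] - ii[k:k + h, :w] + ii[:h, :w]
    return total / (k * k)


def text_ink_mask(gray: np.ndarray, block: int = 15, sensitivity: float = 16) -> np.ndarray:
    """True where a pixel is darker than its local mean by more than `sensitivity`.

    A lower sensitivity flags more pixels. Assumes dark text on a light background.
    """
    return gray.astype(np.float64) < box_mean(gray, block) - sensitivity


def dilate(mask: np.ndarray, radius: int) -> np.ndarray:
    """Binary dilation with a (2 * radius + 1) square structuring element."""
    h, w = mask.shape
    padded = np.pad(mask, radius, constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def clip_to_boxes(mask: np.ndarray, boxes) -> np.ndarray:
    """AND the mask with the union of OCR boxes given as (x0, y0, x1, y1), end-exclusive."""
    box_mask = np.zeros(mask.shape, dtype=bool)
    for x0, y0, x1, y1 in boxes:
        box_mask[y0:y1, x0:x1] = True
    return mask & box_mask


def inpaint_mask(gray, boxes, sensitivity=16, dilation=2):
    """Textmask, then dilation, then clip: pixels outside every OCR box are never masked."""
    ink = text_ink_mask(gray, sensitivity=sensitivity)
    return clip_to_boxes(dilate(ink, dilation), boxes)


if __name__ == "__main__":
    gray = np.full((20, 20), 255, dtype=np.uint8)
    gray[5:8, 3:10] = 0      # a "text" stroke inside the OCR box
    gray[15:18, 15:18] = 0   # a dark graphic outside any OCR box
    mask = inpaint_mask(gray, boxes=[(2, 4, 12, 9)], dilation=1)
    print(int(mask.sum()))   # 45: only the dilated stroke survives the clip
```

The clip is what keeps the dark square in the example out of the inpainting mask even though the textmask flags it as ink.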

When to Use

Scenario	Recommendation
Slide with text on solid/flat background	Best results
Slide with photo background	Good — uses inpainting (warn about overlap areas)
Slide with solid background	Good — use `--skip-inpaint` for speed
Chinese/multilingual slide	Good — `ch` OCR handles both Chinese and English
Poster or infographic with text	Good — works well if text is separate from graphics
Dense chart with axis labels on bars	Caution — line grouping may over-merge crowded labels
Very thick/large decorative fonts	Caution — may exceed standard mask dilation range
Extract individual assets as PNGs	No — use px-asset-extract
Read text without creating PPTX	No — use OCR directly
Edit an existing .pptx file	No — use the pptx skill

Installation

git clone https://github.com/JadeLiu-tech/px-image2pptx.git
cd px-image2pptx
pip install -e ".[all]"

Usage

CLI

px-image2pptx slide.png -o output.pptx
px-image2pptx slide.png -o output.pptx --lang ch
px-image2pptx slide.png -o output.pptx --skip-inpaint
px-image2pptx slide.png -o output.pptx --ocr-json text_regions.json
px-image2pptx slide.png -o output.pptx --work-dir ./debug/
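To convert a whole folder from a script, one option is to call the CLI once per image. The sketch below assumes `px-image2pptx` is on `PATH` and uses only the flags shown above; `build_command` and `convert_folder` are hypothetical helper names, and only `.png` files are picked up because WebP input is listed as unsupported further down.

```python
import subprocess
from pathlib import Path


def build_command(image, output, lang=None, skip_inpaint=False, work_dir=None):
    """Assemble an argv list for px-image2pptx using only the documented flags."""
    cmd = ["px-image2pptx", str(image), "-o", str(output)]
    if lang is not None:
        cmd += ["--lang", lang]
    if skip_inpaint:
        cmd.append("--skip-inpaint")
    if work_dir is not None:
        cmd += ["--work-dir", str(work_dir)]
    return cmd


def convert_folder(src_dir, out_dir, **options):
    """Run the CLI once per PNG in src_dir, writing <stem>.pptx into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for image in sorted(Path(src_dir).glob("*.png")):
        cmd = build_command(image, out / f"{image.stem}.pptx", **options)
        # check=True raises CalledProcessError if a conversion fails.
        subprocess.run(cmd, check=True)
```

For example, `convert_folder("slides", "decks", lang="ch", skip_inpaint=True)` would convert every PNG in `slides/` with Chinese OCR and no inpainting.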

Python API

from px_image2pptx import image_to_pptx

report = image_to_pptx("slide.png", "output.pptx")

# With options
report = image_to_pptx(
    "slide.png", "output.pptx",
    lang="ch",
    skip_inpaint=False,
    work_dir="./debug/",
)

CLI Options

Option	Default	Description
`-o`, `--output`	`output.pptx`	Output PPTX path
`--ocr-json`		Pre-computed OCR JSON (skips OCR)
`--lang`	`auto`	OCR language: `auto`, `en`, `ch`
`--sensitivity`	`16`	Textmask sensitivity (lower values flag more pixels as text ink)
`--dilation`	`12`	Textmask dilation, in pixels
`--min-font`	`8`	Min font size in points
`--max-font`	`72`	Max font size in points
`--skip-inpaint`		Skip LAMA inpainting
`--work-dir`		Save intermediate files
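The `--min-font` and `--max-font` options bound the auto-scaled font sizes. How the tool actually derives a size is not documented here; one plausible sketch converts the OCR box height from image pixels to slide points and clamps the result to that range. The 540 pt slide height (a 7.5 in slide), the single-line-box assumption, the 0.8 fill factor, and the function name are all assumptions for illustration.

```python
def estimate_font_pt(box_height_px, image_height_px, slide_height_pt=540.0,
                     fill=0.8, min_font=8, max_font=72):
    """Rough font size in points for a single-line OCR box, clamped to [min_font, max_font].

    Assumes the image maps onto the full slide height and that glyphs occupy
    about `fill` of the box height.
    """
    points_per_px = slide_height_pt / image_height_px
    size = box_height_px * points_per_px * fill
    return max(min_font, min(max_font, round(size)))


print(estimate_font_pt(60, 1080))   # 24
print(estimate_font_pt(10, 1080))   # 8 (clamped up to min_font)
print(estimate_font_pt(400, 1080))  # 72 (clamped down to max_font)
```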

Models

Downloaded automatically on first use (~370 MB total). All models are from official open-source repositories.

Model	Size	License	Source
PP-OCRv5_server_det	84 MB	Apache 2.0	PaddlePaddle/PaddleOCR
PP-OCRv5_server_rec	81 MB	Apache 2.0	PaddlePaddle/PaddleOCR
big-lama	196 MB	Apache 2.0	advimman/lama

Models are cached locally after first download (~/.paddlex/official_models/ for OCR, ~/.cache/torch/hub/checkpoints/ for LAMA). To skip model downloads entirely, use --ocr-json with pre-computed OCR and --skip-inpaint.

Limitations — When to Warn the User

Input	Impact	What to tell the user
Text on solid/flat background	Best results	No caveats needed
Text on textured background	Good results	LAMA handles repeating textures well
Text overlapping photos	Inpainting artifacts likely	"Areas where text covers photos may show blurring"
Dense chart with many labels	Over-merged labels	"Crowded labels may be grouped incorrectly"
Very thick/large fonts	Incomplete mask coverage	"Large fonts may exceed dilation range — try increasing `--dilation`"
Light text on dark background	Blockier inpainting	"White-on-dark text uses box masks instead of tight ink masks"
WebP image	OCR fails (0 regions)	Convert to PNG first: `Image.open("in.webp").save("in.png")`
Very large image (>4000px)	Slow inpainting	Suggest `--skip-inpaint` or downscaling
Decorative/handwritten fonts	Typeface won't match	"Fonts are reconstructed as Arial/Helvetica"
Centered/justified text	Left-aligned output	"Text alignment is not preserved"
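Two of the rows above (WebP input and images larger than 4000 px) can be handled with a small Pillow preprocessing step before conversion. The sketch assumes Pillow is installed, re-encodes any input as PNG, and downscales so the longest side is at most 4000 px; the exact cut-off, the RGB conversion (which drops any alpha channel), and the helper names are assumptions.

```python
from pathlib import Path

MAX_SIDE = 4000  # pixels; the table above flags larger images as slow to inpaint


def target_size(width, height, max_side=MAX_SIDE):
    """Scale (width, height) down so the longest side is at most max_side, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def prepare_image(src, dst_dir="."):
    """Re-encode src (for example a WebP file) as PNG, downscaling very large images."""
    from PIL import Image  # Pillow, imported lazily so target_size works without it

    src = Path(src)
    dst = Path(dst_dir) / f"{src.stem}.png"
    with Image.open(src) as im:
        rgb = im.convert("RGB")  # assumption: a uniform RGB input; drops alpha/palette
        size = target_size(*rgb.size)
        if size != rgb.size:
            rgb = rgb.resize(size, Image.LANCZOS)
        rgb.save(dst)  # output format inferred from the .png suffix
    return dst
```

The returned path can then be passed to `px-image2pptx` as usual.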

Security Recommendations

This skill appears coherent and implements what it claims. Before installing: (1) expect large downloads (~370 MB) and heavy Python dependencies (PyTorch, PaddleOCR, simple-lama-inpainting) which may take time, disk space, and may access GPUs; (2) models are fetched from open-source repos on first run — verify the upstream GitHub (SKILL.md points to github.com/JadeLiu-tech/px-image2pptx) if you need to trust the source; (3) the skill executes Python code on your machine, so install it in an isolated environment (virtualenv/conda) and review the repository if you have strict security policies; (4) if you cannot allow network downloads, use --ocr-json and --skip-inpaint to avoid model downloads and heavy inpainting, or prepopulate the caches from a trusted source.

Functional Analysis

Type: OpenClaw Skill · Name: image2pptx · Version: 1.0.1

The image2pptx skill is a legitimate tool for converting static images into editable PowerPoint presentations using a pipeline of OCR (PaddleOCR), classical computer vision for text masking, and neural inpainting (LAMA). The code in assemble.py, textmask.py, and pipeline.py is well-structured and aligns perfectly with the stated purpose, utilizing standard libraries like OpenCV, PIL, and python-pptx. Model downloads for OCR and inpainting are transparently documented and sourced from established open-source repositories. No evidence of data exfiltration, malicious execution, or harmful prompt injection was found.

Capability Assessment

✓ Purpose & Capability

Name/description (image → editable PPTX) align with the provided Python modules (OCR, textmask, inpaint, assemble). The listed dependencies (PaddleOCR, Torch, LAMA) and code behavior are appropriate for OCR + inpainting + PPTX assembly. No unrelated credentials, binaries, or paths are requested.

✓ Instruction Scope

SKILL.md and the code describe and implement the exact pipeline: run OCR, compute ink masks, clip to OCR bboxes, optionally inpaint, assemble PPTX. The instructions and CLI do not ask the agent to read unrelated files, environment variables, or post data to unknown endpoints. Intermediate files and model cache locations are local and documented.

ℹ Install Mechanism

The registry entry has no formal install spec (instruction-only), but SKILL.md/README instruct a git clone + pip install -e and the package includes Python source. Models are downloaded on first use (~370 MB) from cited open-source repos (PaddleOCR, advimman/lama). This is expected but noteworthy: automatic model downloads and heavy native packages (PyTorch) will fetch data from the network and consume disk space.

✓ Credentials

No environment variables, credentials, or config paths are required. The code caches models under standard user-cache directories. There are no requests for unrelated secrets or system tokens.

✓ Persistence & Privilege

The skill does not request always:true and does not modify other skills or system-wide agent configs. It runs as a normal skill and writes only its own temp/intermediate files and model caches.

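How to Use

Make sure OpenClaw is installed (local or Docker deployment).
In the chat box, enter the install command: /install image2pptx
After installation, call the skill by name or trigger it with /image2pptx.
Provide the required inputs as described in the skill's parameter documentation to get structured output.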

Version History

v1.0.1

- Clarified open-source model origins, listing source repositories for all models.
- Added details on local model cache directories.
- Included instructions to bypass model downloads using `--ocr-json` and `--skip-inpaint`.
- Minor text and formatting updates for clarity.

v1.0.0

image2pptx 1.0.0: Convert images of slides, posters, and infographics into editable PowerPoint files.
- Converts static images to .pptx with editable, selectable text boxes over reconstructed backgrounds.
- Uses OCR, classical computer vision, inpainting, and font/color detection for high-accuracy slide reconstruction.
- Supports CLI and Python API with options for language, inpainting, font size, and debug outputs.
- Handles various scenarios: solid/photo backgrounds, multiple languages, posters, infographics.
- Includes clear limitations and user guidance for challenging input cases (e.g., crowded charts, thick fonts, large images).

Metadata

Slug image2pptx

Version 1.0.1

License MIT-0

Total installs 0

Current installs 0

Version count 2
