PPT 视觉复刻

Name: PPT 视觉复刻
Author: allergro

Description

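Converts PPT screenshots or infographics into editable PPTX files. This skill is based on the Images2Slides paper (arXiv:2602.07645): it uses a vision-language model (VLM) to understand each region and a coordinate mapping algorithm to convert pixel coordinates into PPTX coordinates. It supports a fallback strategy for complex shapes (shapes that cannot be restored directly, such as custGeom, gradients, and transparency, are converted to embedded PNGs). ...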
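The pixel-to-slide coordinate mapping mentioned above can be illustrated with a minimal sketch. It is not the skill's `coordinate_mapper.js`; the names `Box`, `px_to_inches`, and `inches_to_emu` are hypothetical, and the sketch assumes a 16:9 slide of 10 x 5.625 inches (the size of PptxGenJS's default 16:9 layout) and a screenshot that covers the whole slide.

```python
from dataclasses import dataclass

# Assumption: the target slide is 10 x 5.625 inches (16:9).
SLIDE_W_IN = 10.0
SLIDE_H_IN = 5.625
EMU_PER_INCH = 914400  # English Metric Units per inch in Office Open XML


@dataclass(frozen=True)
class Box:
    """Axis-aligned box: (x, y) is the top-left corner, (w, h) the size."""
    x: float
    y: float
    w: float
    h: float


def px_to_inches(box: Box, img_w: int, img_h: int) -> Box:
    """Scale a pixel-space box to slide inches (hypothetical helper)."""
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image dimensions must be positive")
    sx = SLIDE_W_IN / img_w  # inches per pixel, horizontal
    sy = SLIDE_H_IN / img_h  # inches per pixel, vertical
    return Box(box.x * sx, box.y * sy, box.w * sx, box.h * sy)


def inches_to_emu(value: float) -> int:
    """Convert inches to integer EMUs, the unit used inside PPTX XML."""
    return round(value * EMU_PER_INCH)


# A region reported by the VLM in a 1920x1080 screenshot (illustrative values).
region = Box(x=480, y=270, w=960, h=540)
slide_box = px_to_inches(region, 1920, 1080)
print(slide_box, inches_to_emu(slide_box.w))
```

Scaling each axis independently keeps the sketch simple; if the screenshot's aspect ratio differs from the slide's, a real mapper would also need to decide between stretching and letterboxing.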

Usage Guidance

This skill appears coherent for converting PPT screenshots to editable PPTX: it includes the JS/Python code you'd expect and warns that images are sent to an external VLM. Before installing, ensure you (1) do not process sensitive/confidential slides because images are transmitted to the configured VLM endpoint, (2) verify which VLM/service the agent will use and that its privacy/security posture is acceptable, (3) install the optional dependencies (npm pptxgenjs and Python Pillow) in a controlled environment, and (4) allow filesystem read/write only in a safe temp directory. If you need assurance about where data is sent, ask the skill author how the VLM endpoint is configured or inspect the agent/platform image tool settings.
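One way to follow point (4) is to keep every intermediate file inside a throwaway working directory and reject paths that escape it. The sketch below is an assumption about how a user might wrap the skill's file output, not part of the skill itself; `is_inside` and `safe_output_path` are hypothetical helper names.

```python
import tempfile
from pathlib import Path


def is_inside(base: Path, candidate: Path) -> bool:
    """Return True if candidate resolves to base or a location under it."""
    base = base.resolve()
    candidate = candidate.resolve()
    return candidate == base or base in candidate.parents


def safe_output_path(workdir: Path, name: str) -> Path:
    """Build an output path and refuse anything that escapes workdir."""
    target = workdir / name
    if not is_inside(workdir, target):
        raise ValueError(f"refusing to write outside {workdir}: {name}")
    return target


# The directory and everything in it is deleted when the block exits.
with tempfile.TemporaryDirectory(prefix="ppt-replica-") as tmp:
    workdir = Path(tmp)
    out = safe_output_path(workdir, "slide_01.png")
    out.write_bytes(b"")  # placeholder for a rendered PNG
    print(out.name)
```

Resolving both paths before comparing them means that `..` segments and absolute names are rejected rather than silently followed.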

Capability Analysis

Type: OpenClaw Skill

Name: ppt-vision-replica

Version: 1.4.1

The skill bundle is a legitimate implementation of a PPT reconstruction tool based on the Images2Slides methodology. It uses a Vision-Language Model (VLM) to analyze screenshots and generates editable PPTX files using standard libraries like 'pptxgenjs' (Node.js) and 'Pillow' (Python). The code in 'scripts/complex_shape.py', 'scripts/coordinate_mapper.js', and 'scripts/ppt_generator.js' is well-documented, functional, and lacks any indicators of malicious intent such as data exfiltration, unauthorized command execution, or obfuscation. The instructions in 'SKILL.md' are strictly task-aligned and include clear disclosures regarding data transmission to VLM endpoints and local filesystem requirements.

Capability Assessment

✓ Purpose & Capability

Name/description align with included code (coordinate_mapper.js, ppt_generator.js, complex_shape.py) and the declared runtime needs (pptxgenjs, Pillow, filesystem, network). There are no unrelated environment variables, binaries, or config paths requested that would be disproportionate to the conversion task.

ℹ Instruction Scope

SKILL.md explicitly instructs sending images to an external VLM service for region analysis (OpenClaw image tool / MiniMax / GPT-4V). The code files themselves do not contain network calls; the network usage is part of the architecture described in the instructions. The instructions also require writing temporary PNGs and output PPTX files and include optional unzipping of original PPTX for XML extraction—these are proportional but mean image data and (optionally) PPTX contents will be transmitted/processed.

✓ Install Mechanism

There is no automated install spec (instruction-only install), which is lowest-risk. The SKILL.md asks users to install pptxgenjs via npm and Pillow via pip for optional PNG rendering; these are standard, known packages. No remote archive downloads or obscure installers are present.

ℹ Credentials

The skill does not declare any required environment variables or credentials, which is consistent with shipping as agent-side code that uses the platform's image/VLM tooling. However, runtime image analysis requires a configured VLM endpoint (implicit credentials/auth are expected to be provided by the platform or agent configuration). Users should confirm where images will be sent and what credentials the agent/platform uses.

✓ Persistence & Privilege

The skill does not request always:true or any special persistent privileges. It reads and writes temporary files in the filesystem (explicitly declared) and does not modify other skills or global agent config.

Version History

v1.4.1

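ppt-vision-replica v1.4.1 Changelog

- Clarified the privacy notice about image data being sent to an external VLM, the filesystem read/write notice, and the runtime dependencies.
- Added the script template `scripts/complex_shape.py` for generating the complex-shape PNGs used by the fallback handling (a templated Python/Pillow implementation).
- Provides reusable PNG drawing and embedding for shapes that PPT cannot restore directly, such as complex custom paths and multi-stop gradients.
- No other changes.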
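The fallback decision behind this changelog entry can be sketched as a small classifier: regions that map to a basic shape stay editable, and anything with an unsupported feature is rendered to PNG. This is only an illustration under assumed names; `Region`, `render_mode`, and the feature labels are hypothetical and are not taken from `scripts/complex_shape.py`.

```python
from dataclasses import dataclass

# Assumption: illustrative labels for what the VLM analysis might report.
UNSUPPORTED_FEATURES = frozenset({"custGeom", "gradient", "transparency"})
NATIVE_SHAPES = frozenset({"rect", "roundRect", "line", "image", "text"})


@dataclass(frozen=True)
class Region:
    """One analyzed region: its shape type and any special features."""
    shape: str
    features: frozenset = frozenset()


def render_mode(region: Region) -> str:
    """Return "native" for directly editable shapes, otherwise "png"."""
    if region.shape not in NATIVE_SHAPES:
        return "png"  # unknown or custom geometry
    if region.features & UNSUPPORTED_FEATURES:
        return "png"  # basic shape, but with a fill that cannot be restored
    return "native"


regions = [
    Region("rect"),
    Region("roundRect", frozenset({"gradient"})),
    Region("freeform"),
]
print([render_mode(r) for r in regions])
```

Defaulting to PNG for anything unrecognized trades editability for visual fidelity, which matches the downgrade strategy the description outlines.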

v1.4.0

Convert PPT screenshots or infographics into editable PPTX files. Built on the Images2Slides paper (arXiv:2602.07645), the skill calls a Vision Language Model (VLM) to analyze every region in a screenshot (shape type, position, colors, fonts, and layering), then maps pixel coordinates to PPT coordinates with a coordinate mapping algorithm to rebuild a complete, editable PPTX.

Core capabilities:

- Precisely restore text content, position, font size, color, and alignment
- Support for bullet symbols (■ ✓ - 1. and so on), line spacing, and character spacing
- Basic shapes: rectangles, rounded rectangles, lines, and images
- Complex shapes (gradient fills / transparency / custom paths / custGeom) automatically downgraded to embedded PNGs
- Polygon approximation and Bezier curve drawing in Python

Trigger scenarios:

- Recreate an editable PPT file from a screenshot or template
- Only exported PPT images are available and they need to be edited again
- Analyze the design structure and layout of a competitor's PPT
- Convert a paper or screenshot PPT into an editable version

Source: Based on the Images2Slides academic paper, implemented with PptxGenJS + Python Pillow drawing.
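Polygon approximation of a Bezier curve, as listed among the capabilities, usually means sampling the curve at evenly spaced parameter values and connecting the samples. A minimal pure-Python sketch follows; the function names are hypothetical, uniform sampling is an assumption, and the skill's own script may use a different scheme.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cubic_bezier_point(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    # Bernstein basis weights for the four control points.
    a, b, c, d = u * u * u, 3 * u * u * t, 3 * u * t * t, t * t * t
    x = a * p0[0] + b * p1[0] + c * p2[0] + d * p3[0]
    y = a * p0[1] + b * p1[1] + c * p2[1] + d * p3[1]
    return (x, y)


def flatten_cubic_bezier(
    p0: Point, p1: Point, p2: Point, p3: Point, segments: int = 16
) -> List[Point]:
    """Approximate the curve with segments + 1 points, endpoints included."""
    if segments < 1:
        raise ValueError("segments must be at least 1")
    return [
        cubic_bezier_point(p0, p1, p2, p3, i / segments)
        for i in range(segments + 1)
    ]


# An arch from (0, 0) to (100, 0), in pixel coordinates.
outline = flatten_cubic_bezier((0, 0), (0, 100), (100, 100), (100, 0), segments=8)
print(len(outline), outline[0], outline[-1])
```

The resulting point list is in the form that Pillow's `ImageDraw.polygon` accepts, so it could be filled and saved as the PNG that gets embedded in the slide. More segments give a smoother outline at the cost of more points.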

Metadata

Slug ppt-vision-replica

Version 1.4.1

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 2

Frequently Asked Questions

What is PPT 视觉复刻?

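PPT 视觉复刻 converts PPT screenshots or infographics into editable PPTX files. The skill is based on the Images2Slides paper (arXiv:2602.07645): it uses a vision-language model (VLM) to understand each region and a coordinate mapping algorithm to convert pixel coordinates into PPTX coordinates. It supports a fallback strategy for complex shapes (shapes that cannot be restored directly, such as custGeom, gradients, and transparency, are converted to embedded PNGs). ... It is an AI Agent Skill for Claude Code / OpenClaw, with 147 downloads so far.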

How do I install PPT 视觉复刻?

Run "/install ppt-vision-replica" in the OpenClaw or Claude Code chat to install it in one step. The optional dependencies (pptxgenjs via npm and Pillow via pip) are installed separately.

Is PPT 视觉复刻 free?

Yes, PPT 视觉复刻 is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does PPT 视觉复刻 support?

PPT 视觉复刻 is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created PPT 视觉复刻?

It is built and maintained by Allegro (@allergro); the current version is v1.4.1.
