PPT Vision Replica (PPT 视觉复刻)

Name: PPT Vision Replica (PPT 视觉复刻)
Author: allergro

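Description

Converts PPT screenshots or infographics into editable PPTX files. This skill is based on the Images2Slides paper (arXiv:2602.07645): it uses a vision-language model (VLM) to understand each region, then maps pixel coordinates to PPTX coordinates with a coordinate-mapping algorithm. It supports a fallback strategy for complex shapes: shapes that cannot be reproduced directly (custGeom, gradients, transparency, and so on) are embedded as PNGs. ...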
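
The pixel-to-slide coordinate mapping mentioned above can be illustrated with a minimal sketch. It assumes plain proportional scaling onto a 10 x 5.625 inch slide (the default 16:9 layout size in PptxGenJS, which takes positions in inches). `Box`, `map_box`, and `inches_to_emu` are hypothetical names chosen for this example; the skill's own `scripts/coordinate_mapper.js` may work differently.

```python
from dataclasses import dataclass

EMU_PER_INCH = 914400  # Office Open XML uses 914,400 EMU per inch


@dataclass(frozen=True)
class Box:
    """A rectangle: top-left corner plus width and height."""
    x: float
    y: float
    w: float
    h: float


def map_box(px: Box, image_w: int, image_h: int,
            slide_w_in: float = 10.0, slide_h_in: float = 5.625) -> Box:
    """Scale a pixel-space box to slide inches (hypothetical helper).

    Assumes the screenshot covers the whole slide with no cropping or
    letterboxing, so each axis is scaled independently.
    """
    if image_w <= 0 or image_h <= 0:
        raise ValueError("image dimensions must be positive")
    return Box(
        x=px.x * slide_w_in / image_w,
        y=px.y * slide_h_in / image_h,
        w=px.w * slide_w_in / image_w,
        h=px.h * slide_h_in / image_h,
    )


def inches_to_emu(value_in: float) -> int:
    """Convert inches to EMU, the unit stored in PPTX XML."""
    return round(value_in * EMU_PER_INCH)
```

For a 1920 x 1080 screenshot, a region whose top-left corner sits at the image center maps to the slide center, that is x = 5 in and y = 2.8125 in.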

Security Recommendations

This skill appears coherent for converting PPT screenshots to editable PPTX: it includes the JS/Python code you'd expect and warns that images are sent to an external VLM. Before installing, ensure you (1) do not process sensitive/confidential slides because images are transmitted to the configured VLM endpoint, (2) verify which VLM/service the agent will use and that its privacy/security posture is acceptable, (3) install the optional dependencies (npm pptxgenjs and Python Pillow) in a controlled environment, and (4) allow filesystem read/write only in a safe temp directory. If you need assurance about where data is sent, ask the skill author how the VLM endpoint is configured or inspect the agent/platform image tool settings.

Analysis

Type: OpenClaw Skill
Name: ppt-vision-replica
Version: 1.4.1

The skill bundle is a legitimate implementation of a PPT reconstruction tool based on the Images2Slides methodology. It uses a Vision-Language Model (VLM) to analyze screenshots and generates editable PPTX files using standard libraries like 'pptxgenjs' (Node.js) and 'Pillow' (Python). The code in 'scripts/complex_shape.py', 'scripts/coordinate_mapper.js', and 'scripts/ppt_generator.js' is well-documented, functional, and lacks any indicators of malicious intent such as data exfiltration, unauthorized command execution, or obfuscation. The instructions in 'SKILL.md' are strictly task-aligned and include clear disclosures regarding data transmission to VLM endpoints and local filesystem requirements.

Capability Assessment

✓ Purpose & Capability

Name/description align with included code (coordinate_mapper.js, ppt_generator.js, complex_shape.py) and the declared runtime needs (pptxgenjs, Pillow, filesystem, network). There are no unrelated environment variables, binaries, or config paths requested that would be disproportionate to the conversion task.

ℹ Instruction Scope

SKILL.md explicitly instructs sending images to an external VLM service for region analysis (OpenClaw image tool / MiniMax / GPT-4V). The code files themselves do not contain network calls; the network usage is part of the architecture described in the instructions. The instructions also require writing temporary PNGs and output PPTX files and include optional unzipping of original PPTX for XML extraction—these are proportional but mean image data and (optionally) PPTX contents will be transmitted/processed.

✓ Install Mechanism

There is no automated install spec (instruction-only install), which is lowest-risk. The SKILL.md asks users to install pptxgenjs via npm and Pillow via pip for optional PNG rendering; these are standard, known packages. No remote archive downloads or obscure installers are present.

ℹ Credentials

The skill does not declare any required environment variables or credentials, which is consistent with shipping as agent-side code that uses the platform's image/VLM tooling. However, runtime image analysis requires a configured VLM endpoint (implicit credentials/auth are expected to be provided by the platform or agent configuration). Users should confirm where images will be sent and what credentials the agent/platform uses.

✓ Persistence & Privilege

The skill does not request always:true or any special persistent privileges. It reads and writes temporary files in the filesystem (explicitly declared) and does not modify other skills or global agent config.

Version History

v1.4.1

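ppt-vision-replica v1.4.1 Changelog

- Clarified the privacy notice that image data is sent to an external VLM, the filesystem read/write notice, and the runtime dependencies.
- Added the script template `scripts/complex_shape.py` for generating the complex-shape PNGs used by the fallback path (a templated Python/Pillow implementation).
- Provides reusable PNG drawing and embedding for shapes that PPT cannot reproduce directly, such as complex custom paths and multi-stop gradients.
- No other changes.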
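
The fallback decision behind this changelog entry can be sketched as a small rule: draw a region natively when it is a basic shape with no unsupported styling, otherwise render it to a PNG. The region dictionary layout, the feature names, and `choose_render_mode` are all assumptions made for illustration; they are not taken from `scripts/complex_shape.py` or `scripts/ppt_generator.js`.

```python
# Shape types the listing names as natively supported (labels are hypothetical).
NATIVE_SHAPES = {"text", "rect", "roundRect", "line", "image"}

# Styling features the listing says are converted to embedded PNGs.
FALLBACK_FEATURES = {"gradient", "transparency", "custom_path", "custGeom"}


def choose_render_mode(region: dict) -> str:
    """Return "native" or "png" for one analyzed region (hypothetical schema).

    `region` is assumed to look like:
        {"type": "rect", "features": ["gradient"]}
    """
    features = set(region.get("features", []))
    if features & FALLBACK_FEATURES:
        return "png"  # styling cannot be reproduced as an editable shape
    if region.get("type") in NATIVE_SHAPES:
        return "native"
    return "png"  # assumption: unknown shape types are also rasterized


regions = [
    {"type": "rect"},
    {"type": "rect", "features": ["gradient"]},
    {"type": "star"},
]
modes = [choose_render_mode(r) for r in regions]
```

Treating unknown shape types as PNG is a conservative choice in this sketch: the result stays visually faithful, at the cost of that one region no longer being editable.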

v1.4.0

Convert PPT screenshots or infographics into editable PPTX files. Built on the Images2Slides paper (arXiv:2602.07645), the skill calls a Vision Language Model (VLM) to analyze every region in a screenshot (shape types, coordinates, colors, fonts, and z-index layering), then maps pixel coordinates to PPT coordinates with a coordinate transformation algorithm to reconstruct a complete, editable PPTX.

Core capabilities:

- Precisely restore text content, position, font size, color, and alignment
- Support for bullet symbols (■ ✓ - 1. and so on), line spacing, and character spacing
- Basic shapes: rectangles, rounded rectangles, lines, and images
- Complex shapes (gradient fills / transparency / custom paths / custGeom) automatically fall back to embedded PNGs
- Polygon approximation and Bezier curve rendering via Python Pillow

Trigger scenarios:

- Recreate an editable PPT file from a screenshot or template
- Only exported PPT images are available and they need to be edited again
- Analyze the design structure and layout of a competitor's PPT
- Convert a paper or screenshot PPT into an editable version

Source: Based on the Images2Slides academic paper, implemented with PptxGenJS + Python Pillow
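
As an illustration of the polygon approximation of Bezier curves listed above, the sketch below samples a cubic Bezier curve at evenly spaced parameter values and returns the resulting polyline. The function names and the fixed segment count are assumptions for this example, not the skill's actual code; the point list is in the form that Pillow's `ImageDraw.polygon` accepts, but Pillow itself is not needed to run it.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cubic_bezier_point(p0: Point, p1: Point, p2: Point, p3: Point,
                       t: float) -> Point:
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    # Bernstein basis weights for the four control points.
    b0, b1, b2, b3 = u * u * u, 3 * u * u * t, 3 * u * t * t, t * t * t
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return (x, y)


def flatten_cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point,
                         segments: int = 16) -> List[Point]:
    """Approximate the curve with `segments` straight pieces.

    Returns segments + 1 points, starting at p0 and ending at p3.
    """
    if segments < 1:
        raise ValueError("segments must be at least 1")
    return [cubic_bezier_point(p0, p1, p2, p3, i / segments)
            for i in range(segments + 1)]


# An arch from (0, 0) to (100, 0) flattened into four straight pieces.
arch = flatten_cubic_bezier((0, 0), (0, 100), (100, 100), (100, 0), segments=4)
```

Uniform sampling is the simplest choice; more segments give a smoother outline at the cost of more points, and an adaptive subdivision would be a reasonable refinement for tightly curved paths.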

Metadata

Slug ppt-vision-replica

Version 1.4.1

License MIT-0

Total installs 0

Current installs 0

Released versions 2

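FAQ

What is PPT Vision Replica?

It converts PPT screenshots or infographics into editable PPTX files. The skill is based on the Images2Slides paper (arXiv:2602.07645): it uses a vision-language model (VLM) to understand each region, then maps pixel coordinates to PPTX coordinates with a coordinate-mapping algorithm. It supports a fallback strategy for complex shapes: shapes that cannot be reproduced directly (custGeom, gradients, transparency, and so on) are embedded as PNGs. ... It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 147 times so far.

How do I install PPT Vision Replica?

Run the command "/install ppt-vision-replica" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is PPT Vision Replica free?

Yes. PPT Vision Replica is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does PPT Vision Replica support?

PPT Vision Replica is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.

Who developed PPT Vision Replica?

It is developed and maintained by Allegro (@allergro). The current version is v1.4.1.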
