Total downloads: 147
Favorites: 0
Current installs: 0
Versions: 2
Install in OpenClaw
/install ppt-vision-replica
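Description
Converts PPT screenshots or infographics into editable PPTX files. This skill is built on the Images2Slides paper (arXiv:2602.07645): it uses a vision-language model (VLM) for region understanding and a coordinate mapping algorithm to convert pixel coordinates into PPTX coordinates. It supports a fallback strategy for complex shapes: shapes that cannot be restored directly (custGeom, gradients, transparency, and so on) are converted to embedded PNGs. ...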
Safe usage recommendations
This skill appears coherent for converting PPT screenshots to editable PPTX: it includes the JS/Python code you'd expect and warns that images are sent to an external VLM. Before installing, ensure you (1) do not process sensitive/confidential slides because images are transmitted to the configured VLM endpoint, (2) verify which VLM/service the agent will use and that its privacy/security posture is acceptable, (3) install the optional dependencies (npm pptxgenjs and Python Pillow) in a controlled environment, and (4) allow filesystem read/write only in a safe temp directory. If you need assurance about where data is sent, ask the skill author how the VLM endpoint is configured or inspect the agent/platform image tool settings.
Functional analysis
Type: OpenClaw Skill
Name: ppt-vision-replica
Version: 1.4.1
The skill bundle is a legitimate implementation of a PPT reconstruction tool based on the Images2Slides methodology. It uses a Vision-Language Model (VLM) to analyze screenshots and generates editable PPTX files using standard libraries like 'pptxgenjs' (Node.js) and 'Pillow' (Python). The code in 'scripts/complex_shape.py', 'scripts/coordinate_mapper.js', and 'scripts/ppt_generator.js' is well-documented, functional, and lacks any indicators of malicious intent such as data exfiltration, unauthorized command execution, or obfuscation. The instructions in 'SKILL.md' are strictly task-aligned and include clear disclosures regarding data transmission to VLM endpoints and local filesystem requirements.
Capability assessment
Purpose & Capability
Name/description align with included code (coordinate_mapper.js, ppt_generator.js, complex_shape.py) and the declared runtime needs (pptxgenjs, Pillow, filesystem, network). There are no unrelated environment variables, binaries, or config paths requested that would be disproportionate to the conversion task.
Instruction Scope
SKILL.md explicitly instructs sending images to an external VLM service for region analysis (OpenClaw image tool / MiniMax / GPT-4V). The code files themselves do not contain network calls; the network usage is part of the architecture described in the instructions. The instructions also require writing temporary PNGs and output PPTX files and include optional unzipping of original PPTX for XML extraction—these are proportional but mean image data and (optionally) PPTX contents will be transmitted/processed.
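As a rough illustration of the optional XML extraction step, the sketch below copies only the slide XML parts out of a PPTX into a caller-supplied directory. It relies on the fact that a PPTX file is a ZIP package whose slides conventionally live at `ppt/slides/slideN.xml`. The function name `extract_slide_xml` is hypothetical and is not taken from the skill's scripts.

```python
import re
import zipfile
from pathlib import Path
from typing import List

# Conventional location of slide parts inside a PPTX (OOXML) package.
SLIDE_PART = re.compile(r"ppt/slides/slide\d+\.xml")


def extract_slide_xml(pptx_path: str, dest_dir: str) -> List[Path]:
    """Copy slide XML parts from a PPTX into dest_dir (hypothetical helper).

    Only members whose full name matches SLIDE_PART are read, and each is
    written under its bare file name, so nothing lands outside dest_dir.
    """
    dest = Path(dest_dir)
    written: List[Path] = []
    with zipfile.ZipFile(pptx_path) as archive:
        for name in sorted(archive.namelist()):
            if SLIDE_PART.fullmatch(name):
                target = dest / Path(name).name
                target.write_bytes(archive.read(name))
                written.append(target)
    return written
```

Pointing `dest_dir` at a directory created with `tempfile.TemporaryDirectory()` keeps the extracted files inside a scratch location that is removed when the context exits, which matches the advice to confine reads and writes to a safe temp directory.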
Install Mechanism
There is no automated install spec (instruction-only install), which is lowest-risk. The SKILL.md asks users to install pptxgenjs via npm and Pillow via pip for optional PNG rendering; these are standard, known packages. No remote archive downloads or obscure installers are present.
Credentials
The skill does not declare any required environment variables or credentials, which is consistent with shipping as agent-side code that uses the platform's image/VLM tooling. However, runtime image analysis requires a configured VLM endpoint (implicit credentials/auth are expected to be provided by the platform or agent configuration). Users should confirm where images will be sent and what credentials the agent/platform uses.
Persistence & Privilege
The skill does not request always:true or any special persistent privileges. It reads and writes temporary files in the filesystem (explicitly declared) and does not modify other skills or global agent config.
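How to use
- Make sure OpenClaw is installed (local or Docker deployment)
- Enter the install command in the chat box: /install ppt-vision-replica
- After installation, invoke the Skill by name or trigger it with /ppt-vision-replica
- Provide the required inputs according to the Skill's parameter description to get structured output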
Version history
v1.4.1
ppt-vision-replica v1.4.1 Changelog
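- Clarified the privacy notice that image data is sent to an external VLM, the filesystem read/write notice, and the runtime dependencies.
- Added the script template `scripts/complex_shape.py` for generating the complex-shape PNGs used by the fallback path (a templated Python/Pillow implementation).
- Provides reusable PNG drawing and embedding for shapes that PPT cannot restore directly, such as complex custom paths and multi-stop gradients.
- No other changes.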
v1.4.0
Convert PPT screenshots or infographics into editable PPTX files.
Built on the Images2Slides paper (arXiv:2602.07645), the skill calls a Vision Language Model (VLM)
to analyze every region in a screenshot (shape type, position, color, font, and z-order layering),
then maps pixel coordinates to PPT coordinates via a coordinate mapping algorithm to rebuild a fully editable PPTX.
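The listing does not spell out the coordinate mapping itself, so the following is only a minimal sketch of one plausible version: a linear scale from screenshot pixels to slide inches, plus a conversion to EMU (OOXML uses 914,400 EMU per inch). The `Box` type, the function names, and the default 10 x 5.625 inch slide size are assumptions for illustration, not the skill's actual `coordinate_mapper.js` logic.

```python
from dataclasses import dataclass

EMU_PER_INCH = 914_400  # OOXML constant: English Metric Units per inch


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle: top-left corner plus width and height."""
    x: float
    y: float
    w: float
    h: float


def pixel_box_to_inches(box: Box, image_w: int, image_h: int,
                        slide_w_in: float = 10.0,
                        slide_h_in: float = 5.625) -> Box:
    """Scale a pixel-space box to slide inches.

    Assumes the screenshot covers the whole slide with no cropping or
    letterboxing; the 16:9 default slide size is an assumption.
    """
    if image_w <= 0 or image_h <= 0:
        raise ValueError("image dimensions must be positive")
    sx = slide_w_in / image_w  # inches per pixel, horizontal
    sy = slide_h_in / image_h  # inches per pixel, vertical
    return Box(box.x * sx, box.y * sy, box.w * sx, box.h * sy)


def inches_to_emu(value_in: float) -> int:
    """Convert inches to whole EMU, the unit stored in PPTX XML."""
    return round(value_in * EMU_PER_INCH)
```

For example, a region at pixel (480, 270) with size 960 x 540 in a 1920 x 1080 screenshot would land at 2.5 in, 1.40625 in with size 5 in x 2.8125 in under these assumptions. Independent horizontal and vertical scale factors keep the mapping correct even when the screenshot's aspect ratio differs slightly from the slide's, at the cost of mild stretching.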
Core capabilities:
- Precisely restore text content, position, font size, color, and alignment
- Support for bullet symbols (■ ✓ - 1.), line spacing, and character spacing
- Basic shapes: rectangles, rounded rectangles, lines, and images
- Complex shapes (gradient fills / transparency / custom paths / custGeom) auto-degraded to embedded PNGs
- Polygon approximation and Bezier curve rendering via Python Pillow
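The fallback behavior in the list above could be sketched roughly as follows: decide per region whether it needs the PNG path, and for curved outlines, approximate a cubic Bezier segment with a polyline. The region dictionary layout, the `features` key, and both function names are hypothetical; the skill's real region schema is not shown in this listing.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical feature tags an analysis step might attach to a region.
# Any of these routes the region to the PNG fallback path.
FALLBACK_FEATURES = {"gradient", "transparency", "custom_path", "custGeom"}


def needs_png_fallback(region: Dict) -> bool:
    """Return True when a region carries a feature that needs PNG rendering."""
    return bool(FALLBACK_FEATURES & set(region.get("features", [])))


def flatten_cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point,
                         segments: int = 16) -> List[Point]:
    """Approximate a cubic Bezier curve with segments + 1 polyline points.

    Evaluates the Bernstein form at evenly spaced parameter values.
    """
    if segments < 1:
        raise ValueError("segments must be at least 1")
    points: List[Point] = []
    for i in range(segments + 1):
        t = i / segments
        u = 1.0 - t
        b0, b1, b2, b3 = u * u * u, 3 * u * u * t, 3 * u * t * t, t * t * t
        x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
        y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
        points.append((x, y))
    return points
```

The resulting point list is in the form that Pillow's `ImageDraw.Draw(...).polygon()` or `.line()` accepts, so a curved outline can be drawn without a native curve primitive. Uniform parameter sampling is the simplest choice; more segments give a smoother outline at the cost of more points.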
Trigger scenarios:
- Recreate an editable PPT file from a screenshot or template
- Re-edit a deck when only exported PPT images are available
- Analyze the design structure and layout of a competitor's PPT
- Convert a printed or screenshotted PPT into an editable version
Source: Based on Images2Slides academic paper, implemented with PptxGenJS + Python Pillow
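FAQ
What is PPT Vision Replica (PPT 视觉复刻)?
It converts PPT screenshots or infographics into editable PPTX files. The skill is built on the Images2Slides paper (arXiv:2602.07645): it uses a vision-language model (VLM) for region understanding and a coordinate mapping algorithm to convert pixel coordinates into PPTX coordinates. It supports a fallback strategy for complex shapes: shapes that cannot be restored directly (custGeom, gradients, transparency, and so on) are converted to embedded PNGs. ... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 147 total downloads so far.
How do I install PPT Vision Replica?
Run the command "/install ppt-vision-replica" in the OpenClaw or Claude Code chat box for a one-step install. The install itself needs no extra configuration, although SKILL.md asks users to install pptxgenjs (npm) and Pillow (pip) separately.
Is PPT Vision Replica free?
Yes. PPT Vision Replica is completely free under the MIT-0 license and can be freely downloaded, installed, and used.
Which platforms does PPT Vision Replica support?
PPT Vision Replica is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who develops PPT Vision Replica?
It is developed and maintained by Allegro (@allergro). The current version is v1.4.1.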