mm-output
by yuriYanZeXuan · v1.0.1 · MIT-0
243 Downloads · 1 Star · 0 Active Installs · 2 Versions
Install in OpenClaw
/install mm-output
Description
Parse PDF/Markdown files into structured HTML posters with multi-modal output (PDF, PNG, DOCX, PPTX), or generate poster/slides images via Gemini image gener...
Usage Guidance
This package looks functionally consistent with a poster/slide generator that uses LLMs and Gemini-style image APIs, but there are several red flags you should check before installing or running:
1) The registry lists no required environment variables, yet the code and SKILL.md expect API keys (OpenAI / RUNWAY / IMAGE_GEN / TEXT_MODEL). Do not supply keys until you confirm which are actually required.
2) Inspect the files that call external services (paper2slides/image_generator.py, mm_output/integrate.py, renderer_unit.py) to see the exact HTTP endpoints and what data is sent. The example .env contains non-public domains (runway.devops.rednote.life, runway.devops.xiaohongshu.com); confirm those endpoints are trusted.
3) install.sh will run apt-get and download/install tools (UV, Python 3.12, Playwright). Run it only in an isolated environment (container or VM) and review the script first.
4) The repo contains hard-coded example test paths referencing internal network mounts. Ignore or update those before running tests.
5) If you must try it, run in a disposable container, do not expose real API keys (use limited-scope/test keys), and review the network calls (e.g., with a proxy) to ensure no unexpected exfiltration.
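The second check above (finding which external endpoints the bundle references) can be partly automated. The following is a minimal sketch, not part of the skill: the function names `hosts_in_text` and `hosts_in_tree` and the list of file suffixes are assumptions made for illustration. It only finds URLs written literally in the files, so endpoints assembled at runtime would still need a manual read or a proxy.

```python
import re
from pathlib import Path
from urllib.parse import urlparse

# Literal http(s) URLs; stops at whitespace, quotes, and closing brackets.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>)\]]+")


def hosts_in_text(text: str) -> set[str]:
    """Return the hostnames of all http(s) URLs that appear literally in text."""
    hosts = set()
    for match in URL_PATTERN.findall(text):
        try:
            host = urlparse(match.rstrip(".,;")).hostname
        except ValueError:
            # Malformed URL (for example an unbalanced IPv6 bracket); skip it.
            continue
        if host:
            hosts.add(host.rstrip("."))
    return hosts


def hosts_in_tree(root: str, suffixes=(".py", ".sh", ".md", ".txt")) -> dict[str, set[str]]:
    """Map each file under root (filtered by suffix) to the hostnames it mentions."""
    found = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            hosts = hosts_in_text(text)
            if hosts:
                found[str(path)] = hosts
    return found
```

Calling `hosts_in_tree(".")` from the unpacked bundle directory returns a per-file map of hostnames, which gives a short list of endpoints to verify by hand before any key is supplied.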
Capability Analysis
Type: OpenClaw Skill
Name: mm-output
Version: 1.0.1
The skill bundle is a legitimate and highly functional tool for converting PDF and Markdown documents into multi-modal outputs like HTML posters, PDFs, PNGs, and PPTX slides. It utilizes standard industry libraries such as marker-pdf for parsing, Playwright for browser-based rendering, and python-docx/python-pptx for document generation. While the installation script (install.sh) requires root privileges to install system dependencies (fonts and Chromium libraries) and the code interacts with external LLM APIs (OpenAI and Gemini), all behaviors are strictly aligned with the stated purpose. No evidence of data exfiltration, credential theft, or malicious prompt injection was found.
Capability Assessment
Purpose & Capability
The code, README, and SKILL.md all describe LLM-driven rendering and Gemini image generation, which legitimately require API keys and network access. However, the registry metadata claims no required environment variables or credentials, while the project clearly expects OPENAI_API_KEY, RUNWAY_API_KEY, IMAGE_GEN_API_KEY, TEXT_MODEL, and others. That mismatch between what is declared (none) and what is actually required is an inconsistency that should be resolved before trusting the skill.
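One way to make the declared-versus-actual mismatch concrete is to list the environment variable names a source file reads and subtract whatever the registry declares. The sketch below assumes the code reads variables through `os.getenv`, `os.environ[...]`, or `os.environ.get`, which has not been confirmed for this skill; variables loaded through a settings library would not be detected. The helper names are hypothetical.

```python
import re

# Matches os.getenv("NAME"), os.environ["NAME"], and os.environ.get("NAME").
ENV_PATTERN = re.compile(
    r"os\.(?:getenv\(|environ\[|environ\.get\()\s*[\"']([A-Za-z_][A-Za-z0-9_]*)[\"']"
)


def referenced_env_vars(source: str) -> set[str]:
    """Return the environment variable names read literally in Python source."""
    return set(ENV_PATTERN.findall(source))


def undeclared_env_vars(source: str, declared: set[str]) -> set[str]:
    """Return variables the source reads that the registry metadata omits."""
    return referenced_env_vars(source) - declared
```

For this skill the registry declares nothing, so `declared` would be an empty set and every name the scan finds counts as undeclared.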
Instruction Scope
SKILL.md and run.py instruct the agent to install system packages, create a local .env file, and call LLM/image generation backends. The included .env.txt points at non-standard endpoints (e.g., runway.devops.rednote.life and runway.devops.xiaohongshu.com). The instructions do not try to read unrelated host credentials, but they do direct potentially sensitive content (parsed document text and images) to external LLM/image endpoints — expected for this tool but worth verifying which endpoints and with what keys.
Install Mechanism
There is no platform 'install' metadata, but the bundle includes install.sh, which runs apt-get to install system libraries (requires root), downloads the UV installer via curl, installs Python 3.12 with UV, and installs Playwright browsers. Those are standard but invasive operations (system package installs, network downloads). The curl installer comes from astral.sh (the UV project), which is a known source; the pip/uv dependencies also include a GitHub package (git+https://github.com/Hadlay-Zhang/marker.git). The install script contains no obviously obfuscated or random download URLs, but running install.sh will materially change the host system.
Credentials
The skill needs multiple API keys (OpenAI/Gemini/Qwen/RUNWAY/IMAGE_GEN) according to SKILL.md and .env.txt, but the registry did not declare them. The example .env includes IMAGE_GEN_ENDPOINT pointing to a non-public domain (runway.devops.rednote.life) and README references runway.devops.xiaohongshu.com — both look like internal/service endpoints rather than a public vendor endpoint. Requiring multiple secrets and an internal endpoint without declaring them or explaining trust boundaries is disproportionate and raises exfiltration/third-party trust concerns.
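The trust-boundary concern can be checked mechanically before a real key is placed in the file: parse the example .env and flag every URL-valued entry whose host is not on a list you have explicitly chosen to trust. This is a rough sketch under stated assumptions: the file is taken to use plain KEY=VALUE lines, and `parse_env`, `untrusted_endpoints`, and the trusted-host set are illustrative names, not part of the skill.

```python
from urllib.parse import urlparse


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def untrusted_endpoints(env: dict[str, str], trusted_hosts: set[str]) -> dict[str, str]:
    """Return {variable: host} for URL values whose host is not in trusted_hosts."""
    flagged = {}
    for key, value in env.items():
        if not value.lower().startswith(("http://", "https://")):
            continue
        try:
            host = urlparse(value).hostname
        except ValueError:
            continue
        if host and host not in trusted_hosts:
            flagged[key] = host
    return flagged
```

Starting with an empty trusted set and adding hosts only after verifying them keeps the default conservative: anything unreviewed, such as the IMAGE_GEN_ENDPOINT host mentioned above, is flagged.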
Persistence & Privilege
The skill is not marked always:true and does not request system-wide privileges in metadata. It will create a virtualenv and a .env file inside the project directory (normal). The included install.sh does require root to install packages with apt-get, but that is a user action and not an automatic privilege escalation by the skill itself.
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install mm-output
- After installation, invoke the skill by name or use /mm-output
- Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.1
Version 1.0.1: Major update — local Python skill, multi-LLM, slides & poster image generation.
- Renamed and refactored skill to "postergen-parser" for local Python usage (not Docker-only).
- Added support for multiple LLM APIs: OpenAI, Gemini, and Qwen (MAAS).
- New features: generate poster images and slides images using Gemini's image generation API.
- Added PPTX export and improved multi-modal conversion options (PDF/PNG/DOCX/PPTX).
- New comprehensive setup scripts and dependency management (UV, install.sh, playwright).
- Expanded documentation: detailed configuration, usage examples, and troubleshooting instructions.
v1.0.0
mm-output 1.0.0 initial release
- Run the Docker image yuriyzx/mm-output:latest to convert PDF or Markdown into HTML and multi-modal files (PDF, PNG, DOCX).
- Supports OpenAI-compatible API configuration for LLM-powered rendering.
- Provides Docker-based usage instructions and environment setup.
- Input options: PDF, Markdown; output: HTML posters, PDF, PNG, DOCX.
- Includes troubleshooting tips and advice for image naming and local aliases.
Frequently Asked Questions
What is mm-output?
Parse PDF/Markdown files into structured HTML posters with multi-modal output (PDF, PNG, DOCX, PPTX), or generate poster/slides images via Gemini image gener... It is an AI Agent Skill for Claude Code / OpenClaw, with 243 downloads so far.
How do I install mm-output?
Run "/install mm-output" in the OpenClaw or Claude Code chat to install it. The skill's own setup (install.sh and the API keys that SKILL.md expects) still has to be completed before it can run.
Is mm-output free?
Yes, mm-output is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does mm-output support?
mm-output is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created mm-output?
It is built and maintained by yuriYanZeXuan (@yuriyanzexuan); the current version is v1.0.1.