MiniMax PDF OCR
by chongjie-ran · v1.0.0 · MIT-0
308 downloads · 0 stars · 1 active install · 1 version
Install in OpenClaw
/install minimax-pdf-ocr
Description
Recognizes text in PDFs and images using the MiniMax Vision API.
Usage Guidance
This skill's code does what its name says: it converts PDF pages to images, uploads those images to the MiniMax Vision API to get OCR results, then writes a Markdown file. Before installing or using it, consider:
1) Privacy: images (full page content) are sent to https://api.minimax.chat. Do not process sensitive or confidential documents unless you trust that service and its privacy policy.
2) Credentials: the code requires MINIMAX_API_KEY (set in the environment). The registry metadata incorrectly stated that no env vars are needed, so verify you are comfortable providing that API key.
3) System dependency: pdftoppm (poppler) must be installed. SKILL.md mentions it, but the registry metadata omits it.
4) Inconsistencies: SKILL.md recommends npm packages (openai, pdf2image) that the shipped code does not use. This suggests sloppy packaging, so inspect and run the script in a sandbox first.
5) Safety checks: check the API endpoint and the publisher before using real secrets, and test on non-sensitive sample documents.
If you want to proceed, run it locally in an isolated environment and verify network endpoints and outputs yourself. If you require higher assurance, ask the publisher to correct the metadata and provide provenance/hosting information.
Capability Analysis
Type: OpenClaw Skill
Name: minimax-pdf-ocr
Version: 1.0.0
The skill contains a potential shell injection vulnerability in `pdf-ocr-minimax.js` due to the use of `child_process.spawn` with `shell: true` on unsanitized file paths (`pdfPath`). There is also a discrepancy between the documentation in `SKILL.md`, which instructs users to install unused dependencies (`openai`, `pdf2image`), and the actual implementation which uses native `fetch` and system calls. While the script correctly targets the legitimate MiniMax API endpoint (`api.minimax.chat`), the insecure execution pattern poses a risk.
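One way to avoid the shell injection risk described above is to pass the file path as a separate argument and never hand it to a shell. The following is a minimal sketch, not the shipped code: `buildPdftoppmArgs` and `convertPdfToPng` are hypothetical names, and it assumes the script only needs `pdftoppm -png <pdf> <output-prefix>`.

```javascript
const { execFile } = require('node:child_process');
const path = require('node:path');

// Hypothetical helper: build the argument vector for pdftoppm.
// Each value becomes its own argv entry, so no shell parses the path.
function buildPdftoppmArgs(pdfPath, outPrefix) {
  if (typeof pdfPath !== 'string' || pdfPath.length === 0) {
    throw new TypeError('pdfPath must be a non-empty string');
  }
  // Resolving to an absolute path keeps a file name that starts
  // with "-" from being read as a command-line option.
  return ['-png', path.resolve(pdfPath), path.resolve(outPrefix)];
}

// Hypothetical wrapper: execFile does not start a shell by default,
// unlike spawn(..., { shell: true }).
function convertPdfToPng(pdfPath, outPrefix) {
  return new Promise((resolve, reject) => {
    execFile('pdftoppm', buildPdftoppmArgs(pdfPath, outPrefix), (err) => {
      if (err) reject(err);
      else resolve();
    });
  });
}
```

With this pattern, a path such as `a; rm -rf x.pdf` reaches pdftoppm as a single literal file name instead of being interpreted as two commands.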
Capability Assessment
Purpose & Capability
The code and SKILL.md implement PDF→PNG conversion (pdftoppm/poppler) and send images to a MiniMax Vision API for OCR — this aligns with the skill name/description. However, the registry metadata (which claimed no required env vars or binaries) is inconsistent with the SKILL.md and code that require an API key (MINIMAX_API_KEY) and rely on a system binary (pdftoppm).
Instruction Scope
Runtime instructions are focused: convert PDF to images, base64-encode images, and POST them to https://api.minimax.chat/v1/text/chatcompletion_v2 for OCR, then save Markdown. The instructions do send image data (embedded as data URLs) to an external API — expected for an OCR skill but important for privacy. SKILL.md also instructs installing npm packages (openai, pdf2image) that the shipped code does not use; this is inconsistent but not directly harmful.
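The base64 and data URL step can be sketched as follows, which also shows exactly what leaves the machine. The endpoint comes from the analysis above. Everything else is an assumption for illustration: the helper names, the Bearer authorization header, the placeholder model name, the prompt text, and the OpenAI-style `messages` body shape are not confirmed by the shipped code or by MiniMax documentation.

```javascript
// Endpoint as reported in the analysis above.
const MINIMAX_URL = 'https://api.minimax.chat/v1/text/chatcompletion_v2';

// Encode raw PNG bytes as a data URL (the full page image is embedded).
function toDataUrl(pngBytes) {
  return `data:image/png;base64,${Buffer.from(pngBytes).toString('base64')}`;
}

// ASSUMPTION: header, body shape, and model name are illustrative only.
function buildOcrRequest(pngBytes, apiKey, model = 'MODEL_NAME_PLACEHOLDER') {
  return {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model,
      messages: [{
        role: 'user',
        content: [
          { type: 'text', text: 'Transcribe all text on this page as Markdown.' },
          { type: 'image_url', image_url: { url: toDataUrl(pngBytes) } },
        ],
      }],
    }),
  };
}
```

A caller would pass the result to the built-in `fetch`, for example `fetch(MINIMAX_URL, buildOcrRequest(bytes, key))`. Building the request separately from sending it makes it easy to log or inspect the payload in a sandbox before any network call is made.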
Install Mechanism
No install spec (instruction-only) lowers risk. The only non-JS install guidance is to install poppler (provides pdftoppm) via brew — a standard system package. There are no remote download/extract steps or obscure URLs in the install path.
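Because the skill depends on a system binary that nothing installs automatically, a preflight check can report a missing pdftoppm before any document is processed. This is a minimal sketch under stated assumptions: `hasBinary` is a hypothetical helper, and it probes with the `-v` flag, which pdftoppm accepts for printing its version.

```javascript
const { spawnSync } = require('node:child_process');

// Hypothetical helper: returns false only when the executable cannot be found.
// A non-zero exit code still counts as "present", since the binary did run.
function hasBinary(name) {
  const result = spawnSync(name, ['-v'], { stdio: 'ignore' });
  return !(result.error && result.error.code === 'ENOENT');
}

if (!hasBinary('pdftoppm')) {
  console.error('pdftoppm not found; install poppler first.');
}
```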
Credentials
The code requires a single credential (MINIMAX_API_KEY) and optionally OUTPUT_DIR — proportional for a remote OCR API. However, the registry metadata incorrectly lists no required env vars; this discrepancy between declared requirements and actual code is a red flag (could be sloppy packaging or mis-declared permissions). No other credentials are requested.
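Reading the two environment variables in one place, and failing early when the key is missing, keeps the credential requirement explicit. The sketch below is illustrative: `loadConfig` is a hypothetical helper, and the `./ocr-output` fallback directory is an assumed default, not the one used by the shipped script.

```javascript
// Hypothetical helper: collect the skill's environment-based settings.
function loadConfig(env = process.env) {
  const apiKey = env.MINIMAX_API_KEY;
  if (!apiKey) {
    throw new Error('MINIMAX_API_KEY is not set');
  }
  // ASSUMPTION: './ocr-output' is an illustrative default for OUTPUT_DIR.
  return { apiKey, outputDir: env.OUTPUT_DIR || './ocr-output' };
}
```

Passing `env` as a parameter lets the function be exercised with a plain object, so it can be tested without touching real secrets.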
Persistence & Privilege
The skill does not request persistent/always-on privileges and does not modify other skills or system-wide configs. It runs as a user-invoked Node script and only accesses the files you provide plus the environment API key.
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install minimax-pdf-ocr
- After installation, invoke the skill by name or use /minimax-pdf-ocr
- Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
MiniMax PDF OCR 1.0.0 – Initial Release
- Recognizes text from PDFs and images using the MiniMax Vision API, supporting Chinese and English.
- Converts PDF files to images (using poppler) for OCR processing.
- Outputs recognition results as Markdown files with preserved formatting and structure.
- Provides both command-line interface and JavaScript API usage.
- Supports configurable output directories and environment-based API key management.
Frequently Asked Questions
What is MiniMax PDF OCR?
It recognizes text in PDFs and images using the MiniMax Vision API. It is an AI agent skill for Claude Code / OpenClaw, with 308 downloads so far.
How do I install MiniMax PDF OCR?
Run "/install minimax-pdf-ocr" in the OpenClaw or Claude Code chat to install it in one step. To run the skill, you also need the MINIMAX_API_KEY environment variable set and pdftoppm (poppler) installed.
Is MiniMax PDF OCR free?
Yes, MiniMax PDF OCR is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does MiniMax PDF OCR support?
MiniMax PDF OCR runs anywhere OpenClaw / Claude Code is available, as long as pdftoppm (poppler) is installed on the system.
Who created MiniMax PDF OCR?
It is built and maintained by chongjie-ran (@chongjie-ran); the current version is v1.0.0.