
MiniMax PDF OCR


by chongjie-ran · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

308 Downloads


Install in OpenClaw

/install minimax-pdf-ocr

Description

Recognizes text in PDFs and images using the MiniMax Vision API.

Usage Guidance

This skill's code does what its name says: it converts PDF pages to images, uploads those images to a MiniMax Vision API to get OCR results, then writes a Markdown file. Before installing or using it, consider:
1) Privacy: images (full page content) are sent to https://api.minimax.chat. Do not process sensitive or confidential documents unless you trust that service and its privacy policy.
2) Credentials: the code requires MINIMAX_API_KEY (set in the environment). The registry metadata incorrectly stated that no env vars are needed, so verify you are comfortable providing that API key.
3) System dependency: pdftoppm (poppler) must be installed. SKILL.md mentions it, but the registry metadata omitted it.
4) Inconsistencies: SKILL.md recommends npm packages (openai, pdf2image) that are not used by the shipped code. This suggests sloppy packaging, so inspect and run the script in a sandbox first.
5) Safety checks: check the API endpoint and the publisher before using the skill with real secrets, and test on non-sensitive sample documents.
If you want to proceed, run it locally in an isolated environment and verify network endpoints and outputs yourself. If you require higher assurance, ask the publisher to correct the metadata and provide provenance/hosting information.

Capability Analysis

Type: OpenClaw Skill
Name: minimax-pdf-ocr
Version: 1.0.0

The skill contains a potential shell injection vulnerability in `pdf-ocr-minimax.js` due to the use of `child_process.spawn` with `shell: true` on unsanitized file paths (`pdfPath`). There is also a discrepancy between the documentation in `SKILL.md`, which instructs users to install unused dependencies (`openai`, `pdf2image`), and the actual implementation, which uses native `fetch` and system calls. While the script correctly targets the legitimate MiniMax API endpoint (`api.minimax.chat`), the insecure execution pattern poses a risk.
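One common way to avoid this class of problem is to skip the shell entirely and pass the file path as a separate argument. The following is a minimal sketch of that idea, not the skill's actual code: the function names `buildPdftoppmArgs` and `convertPdfToPng` are hypothetical, and it assumes `pdftoppm` is on the PATH.

```javascript
const { execFile } = require('child_process');
const path = require('path');

// Hypothetical helper: build the argv for pdftoppm.
// Each value is its own argv entry, so characters such as ";" or "$(...)"
// in pdfPath are passed to pdftoppm literally and never parsed by a shell.
function buildPdftoppmArgs(pdfPath, outputPrefix) {
  // Resolving to absolute paths also keeps a file name that starts with "-"
  // from being read as a command-line option.
  return ['-png', path.resolve(pdfPath), path.resolve(outputPrefix)];
}

// Hypothetical helper: run pdftoppm without a shell (execFile does not
// spawn one by default) and resolve with the output prefix on success.
function convertPdfToPng(pdfPath, outputPrefix) {
  return new Promise((resolve, reject) => {
    execFile('pdftoppm', buildPdftoppmArgs(pdfPath, outputPrefix), (err, stdout, stderr) => {
      if (err) {
        reject(new Error(`pdftoppm failed: ${stderr || err.message}`));
      } else {
        resolve(outputPrefix);
      }
    });
  });
}
```

With this pattern a path such as `a; rm -rf x.pdf` reaches `pdftoppm` as a single (probably nonexistent) file name instead of being split into two shell commands.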

Capability Assessment

ℹ Purpose & Capability

The code and SKILL.md implement PDF→PNG conversion (pdftoppm/poppler) and send images to a MiniMax Vision API for OCR — this aligns with the skill name/description. However, the registry metadata (which claimed no required env vars or binaries) is inconsistent with the SKILL.md and code that require an API key (MINIMAX_API_KEY) and rely on a system binary (pdftoppm).
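Because the registry metadata does not declare the binary dependency, it can help to check for it before running the skill. Below is a small sketch of such a check. The helper name `hasBinary` is hypothetical, and it assumes that a missing executable surfaces as an `ENOENT` spawn error, which is how Node.js reports it on typical systems.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: report whether an executable can be started.
// spawnSync sets result.error (code ENOENT) when the binary is not found;
// the exit status of "-v" itself is deliberately ignored.
function hasBinary(name) {
  const result = spawnSync(name, ['-v'], { stdio: 'ignore' });
  return !(result.error && result.error.code === 'ENOENT');
}

if (!hasBinary('pdftoppm')) {
  console.error('pdftoppm not found: install poppler before using this skill');
}
```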

ℹ Instruction Scope

Runtime instructions are focused: convert PDF to images, base64-encode images, and POST them to https://api.minimax.chat/v1/text/chatcompletion_v2 for OCR, then save Markdown. The instructions do send image data (embedded as data URLs) to an external API — expected for an OCR skill but important for privacy. SKILL.md also instructs installing npm packages (openai, pdf2image) that the shipped code does not use; this is inconsistent but not directly harmful.
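To make the privacy implication concrete, here is a rough sketch of what "base64-encode the image and POST it as a data URL" can look like with native `fetch`. Only the endpoint URL comes from the analysis above. The request body shape (an OpenAI-style message with an `image_url` part), the prompt text, the `MODEL_NAME_PLACEHOLDER` value, and the function names are assumptions for illustration and may differ from what the shipped script sends.

```javascript
const fs = require('fs');

const ENDPOINT = 'https://api.minimax.chat/v1/text/chatcompletion_v2';

// Encode raw PNG bytes as a data URL; the full page image travels in the request.
function toDataUrl(pngBuffer) {
  return `data:image/png;base64,${pngBuffer.toString('base64')}`;
}

// Assumed payload shape; the model name is a placeholder, not a real model id.
function buildOcrRequest(pngBuffer, apiKey, model = 'MODEL_NAME_PLACEHOLDER') {
  const body = {
    model,
    messages: [{
      role: 'user',
      content: [
        { type: 'text', text: 'Extract all text from this page as Markdown.' },
        { type: 'image_url', image_url: { url: toDataUrl(pngBuffer) } },
      ],
    }],
  };
  return {
    url: ENDPOINT,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    },
  };
}

// Send one page image and return the parsed JSON response.
async function ocrPage(pngPath, apiKey) {
  const { url, options } = buildOcrRequest(fs.readFileSync(pngPath), apiKey);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`OCR request failed: HTTP ${res.status}`);
  return res.json();
}
```

Separating `buildOcrRequest` from `ocrPage` makes it possible to inspect exactly what would leave the machine before any network call is made.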

✓ Install Mechanism

No install spec (instruction-only) lowers risk. The only non-JS install guidance is to install poppler (provides pdftoppm) via brew — a standard system package. There are no remote download/extract steps or obscure URLs in the install path.

ℹ Credentials

The code requires a single credential (MINIMAX_API_KEY) and optionally OUTPUT_DIR — proportional for a remote OCR API. However, the registry metadata incorrectly lists no required env vars; this discrepancy between declared requirements and actual code is a red flag (could be sloppy packaging or mis-declared permissions). No other credentials are requested.
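A script that depends on these variables would typically read them once and fail early when the key is missing. The sketch below illustrates that pattern under stated assumptions: `loadConfig` is a hypothetical name, and the `./ocr-output` default is invented for the example, since the skill's real default for OUTPUT_DIR is not documented here.

```javascript
const path = require('path');

// Hypothetical helper: read the two environment variables named above.
// Throws when MINIMAX_API_KEY is absent so the failure is explicit.
function loadConfig(env = process.env) {
  const apiKey = (env.MINIMAX_API_KEY || '').trim();
  if (!apiKey) {
    throw new Error('MINIMAX_API_KEY is not set');
  }
  // Assumed default directory; the shipped script may use a different one.
  const outputDir = path.resolve(env.OUTPUT_DIR || './ocr-output');
  return { apiKey, outputDir };
}
```

Passing `env` as a parameter keeps the function easy to exercise with a throwaway key rather than a real secret.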

✓ Persistence & Privilege

The skill does not request persistent/always-on privileges and does not modify other skills or system-wide configs. It runs as a user-invoked Node script and only accesses the files you provide plus the environment API key.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install minimax-pdf-ocr
After installation, invoke the skill by name or use /minimax-pdf-ocr
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

MiniMax PDF OCR 1.0.0: Initial Release
- Recognizes text from PDFs and images using the MiniMax Vision API, supporting Chinese and English.
- Converts PDF files to images (using poppler) for OCR processing.
- Outputs recognition results as Markdown files with preserved formatting and structure.
- Provides both command-line interface and JavaScript API usage.
- Supports configurable output directories and environment-based API key management.

Metadata

Slug minimax-pdf-ocr

Version 1.0.0

License MIT-0

All-time Installs 1

Active Installs 1

Total Versions 1

Frequently Asked Questions

What is MiniMax PDF OCR?

MiniMax PDF OCR recognizes text in PDFs and images using the MiniMax Vision API. It is an AI Agent Skill for Claude Code / OpenClaw, with 308 downloads so far.

How do I install MiniMax PDF OCR?

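Run "/install minimax-pdf-ocr" in the OpenClaw or Claude Code chat to install it in one step. To run the skill you also need the MINIMAX_API_KEY environment variable and the pdftoppm binary (poppler), which the registry metadata does not list.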

Is MiniMax PDF OCR free?

Yes, the skill itself is free and licensed under MIT-0, so you can download, install and use it at no cost. It does require your own MiniMax API key (MINIMAX_API_KEY) to call the OCR service.

Which platforms does MiniMax PDF OCR support?

MiniMax PDF OCR is listed as cross-platform and runs anywhere OpenClaw / Claude Code is available, provided pdftoppm (poppler) is installed.

Who created MiniMax PDF OCR?

It is built and maintained by chongjie-ran (@chongjie-ran); the current version is v1.0.0.
