← Back to Skills Marketplace

universal-pdf-vision-parser

Name: universal-pdf-vision-parser
Author: mingensiie

by M Z · GitHub ↗ · v1.0.0

cross-platform ⚠ suspicious

413

Downloads

Stars

Active Installs

Versions

Install in OpenClaw

/install universal-pdf-vision-parse

Description

Extract multilingual document content and language learning notes (French, German, Japanese, Spanish, etc.) from PDFs using multimodal vision (Qwen-VL-Max)....

Usage Guidance

This skill appears to do what it says (convert PDF pages to images and send them to Qwen‑VL‑Max for transcription), but there are two issues to consider before installing: - Metadata mismatch: The registry claims no required credentials, but the SKILL.md and script require a DashScope API key (DASHSCOPE_API_KEY or --api-key) and Python packages. Confirm the registry/provider and why credentials/dependencies were omitted. - Data exposure: The skill uploads full page images (base64 PNGs) to an external service. Do not run it on sensitive or confidential PDFs unless you trust the DashScope endpoint and have reviewed its privacy/billing/retention policies. Consider using local OCR alternatives for sensitive data. Recommended actions: - Verify the skill's source and author (no homepage and unknown source are risk indicators). - Confirm API key scope and permissions (least-privilege) and monitor billing/usage for unexpected activity. - Test with non-sensitive documents first and inspect network activity if possible. - If you need stronger assurance, ask the publisher to update registry metadata to declare required env vars and dependencies, and provide a canonical homepage or repo.

Capability Analysis

Type: OpenClaw Skill Name: universal-pdf-vision-parse Version: 1.0.0 The OpenClaw skill 'universal-pdf-vision-parser' is benign. The `SKILL.md` provides clear, non-malicious instructions for the agent, and the `scripts/vision_parse.py` code legitimately uses `pymupdf` to process PDFs and `dashscope` to interact with the Qwen-VL-Max vision API. All file system and network operations are directly aligned with the stated purpose of converting PDF content to Markdown, with no evidence of data exfiltration, malicious execution, persistence mechanisms, or prompt injection attempts against the OpenClaw agent.

Capability Assessment

⚠ Purpose & Capability

The skill's name, description, SKILL.md, and code all align: converting PDF pages to images and sending them to Qwen‑VL‑Max for transcription. However, the registry metadata claims no required env vars or credentials while SKILL.md and the script require a DashScope API key (either via --api-key or DASHSCOPE_API_KEY). This metadata omission is an incoherence worth flagging.

✓ Instruction Scope

The runtime instructions and the script remain within the stated purpose: render PDF pages to PNG, base64-encode them, send them plus a transcription prompt to a multimodal API, and write Markdown. The agent is not instructed to read unrelated files or system state.

ℹ Install Mechanism

There is no formal install spec in the registry (instruction-only), but SKILL.md tells the user to pip install pymupdf and dashscope. That is typical for a Python-based, instruction-only skill, but the lack of declared dependencies in the registry is another metadata inconsistency.

⚠ Credentials

The code expects an API key (DASHSCOPE_API_KEY or CLI --api-key) to call an external service; this is proportionate to the function. The concern is that the registry lists no required credentials. Also note that the skill transmits full-page base64 images to a third-party API — that is necessary for the stated purpose but has privacy/breach implications for sensitive documents.

✓ Persistence & Privilege

The skill does not request always:true, does not modify other skills or system-wide settings, and does not persist credentials beyond setting dashscope.api_key at runtime. No elevated or permanent privileges are requested.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install universal-pdf-vision-parse
After installation, invoke the skill by name or use /universal-pdf-vision-parse
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

Universal PDF Vision Parser Skill 1.0.0 - Initial release of a high-end, multilingual PDF digitizer for language learning documents. - Uses multimodal vision (Qwen-VL-Max) to extract and structure content from complex layouts into Markdown. - Supports multiple languages including French, German, Japanese, and Spanish. - Converts PDF pages to high-resolution images for accurate text parsing and formatting. - Perfect for extracting language notes, bilingual documents, and hard-to-capture formats.

Metadata

Slug universal-pdf-vision-parse

Version 1.0.0

License —

All-time Installs 7

Active Installs 6

Total Versions 1

Frequently Asked Questions

What is universal-pdf-vision-parser?

Extract multilingual document content and language learning notes (French, German, Japanese, Spanish, etc.) from PDFs using multimodal vision (Qwen-VL-Max).... It is an AI Agent Skill for Claude Code / OpenClaw, with 413 downloads so far.

How do I install universal-pdf-vision-parser?

Run "/install universal-pdf-vision-parse" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is universal-pdf-vision-parser free?

Yes, universal-pdf-vision-parser is completely free (open-source). You can download, install and use it at no cost.

Which platforms does universal-pdf-vision-parser support?

universal-pdf-vision-parser is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created universal-pdf-vision-parser?

It is built and maintained by M Z (@mingensiie); the current version is v1.0.0.

More Skills