← 返回 Skills 市场

universal-pdf-vision-parser

Name: universal-pdf-vision-parser
Author: mingensiie

作者 M Z · GitHub ↗ · v1.0.0

cross-platform ⚠ suspicious

413

总下载

当前安装

版本数

在 OpenClaw 中安装

/install universal-pdf-vision-parse

功能描述

Extract multilingual document content and language learning notes (French, German, Japanese, Spanish, etc.) from PDFs using multimodal vision (Qwen-VL-Max)....

安全使用建议

This skill appears to do what it says (convert PDF pages to images and send them to Qwen‑VL‑Max for transcription), but there are two issues to consider before installing: - Metadata mismatch: The registry claims no required credentials, but the SKILL.md and script require a DashScope API key (DASHSCOPE_API_KEY or --api-key) and Python packages. Confirm the registry/provider and why credentials/dependencies were omitted. - Data exposure: The skill uploads full page images (base64 PNGs) to an external service. Do not run it on sensitive or confidential PDFs unless you trust the DashScope endpoint and have reviewed its privacy/billing/retention policies. Consider using local OCR alternatives for sensitive data. Recommended actions: - Verify the skill's source and author (no homepage and unknown source are risk indicators). - Confirm API key scope and permissions (least-privilege) and monitor billing/usage for unexpected activity. - Test with non-sensitive documents first and inspect network activity if possible. - If you need stronger assurance, ask the publisher to update registry metadata to declare required env vars and dependencies, and provide a canonical homepage or repo.

功能分析

Type: OpenClaw Skill Name: universal-pdf-vision-parse Version: 1.0.0 The OpenClaw skill 'universal-pdf-vision-parser' is benign. The `SKILL.md` provides clear, non-malicious instructions for the agent, and the `scripts/vision_parse.py` code legitimately uses `pymupdf` to process PDFs and `dashscope` to interact with the Qwen-VL-Max vision API. All file system and network operations are directly aligned with the stated purpose of converting PDF content to Markdown, with no evidence of data exfiltration, malicious execution, persistence mechanisms, or prompt injection attempts against the OpenClaw agent.

能力评估

⚠ Purpose & Capability

The skill's name, description, SKILL.md, and code all align: converting PDF pages to images and sending them to Qwen‑VL‑Max for transcription. However, the registry metadata claims no required env vars or credentials while SKILL.md and the script require a DashScope API key (either via --api-key or DASHSCOPE_API_KEY). This metadata omission is an incoherence worth flagging.

✓ Instruction Scope

The runtime instructions and the script remain within the stated purpose: render PDF pages to PNG, base64-encode them, send them plus a transcription prompt to a multimodal API, and write Markdown. The agent is not instructed to read unrelated files or system state.

ℹ Install Mechanism

There is no formal install spec in the registry (instruction-only), but SKILL.md tells the user to pip install pymupdf and dashscope. That is typical for a Python-based, instruction-only skill, but the lack of declared dependencies in the registry is another metadata inconsistency.

⚠ Credentials

The code expects an API key (DASHSCOPE_API_KEY or CLI --api-key) to call an external service; this is proportionate to the function. The concern is that the registry lists no required credentials. Also note that the skill transmits full-page base64 images to a third-party API — that is necessary for the stated purpose but has privacy/breach implications for sensitive documents.

✓ Persistence & Privilege

The skill does not request always:true, does not modify other skills or system-wide settings, and does not persist credentials beyond setting dashscope.api_key at runtime. No elevated or permanent privileges are requested.

如何使用

确保已安装 OpenClaw（本地或 Docker 部署）
在对话框中输入安装命令：/install universal-pdf-vision-parse
安装完成后，直接呼叫该 Skill 的名称或使用 /universal-pdf-vision-parse 触发
根据 Skill 的参数说明提供必要输入，即可获得结构化输出

版本历史

v1.0.0

Universal PDF Vision Parser Skill 1.0.0 - Initial release of a high-end, multilingual PDF digitizer for language learning documents. - Uses multimodal vision (Qwen-VL-Max) to extract and structure content from complex layouts into Markdown. - Supports multiple languages including French, German, Japanese, and Spanish. - Converts PDF pages to high-resolution images for accurate text parsing and formatting. - Perfect for extracting language notes, bilingual documents, and hard-to-capture formats.

元数据

Slug universal-pdf-vision-parse

版本 1.0.0

许可证 —

累计安装 7

当前安装数 6

历史版本数 1

常见问题

universal-pdf-vision-parser 是什么？

Extract multilingual document content and language learning notes (French, German, Japanese, Spanish, etc.) from PDFs using multimodal vision (Qwen-VL-Max).... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件，目前累计下载 413 次。

如何安装 universal-pdf-vision-parser？

在 OpenClaw 或 Claude Code 对话框中运行命令「/install universal-pdf-vision-parse」即可一键安装，无需额外配置。

universal-pdf-vision-parser 是免费的吗？

是的，universal-pdf-vision-parser 完全免费（开源免费），可自由下载、安装和使用。

universal-pdf-vision-parser 支持哪些平台？

universal-pdf-vision-parser 跨平台运行，可在任意部署了 OpenClaw / Claude Code 的环境中使用（cross-platform）。

谁开发了 universal-pdf-vision-parser？

由 M Z（@mingensiie）开发并维护，当前版本 v1.0.0。