← 返回 Skills 市场

Pdf Vision

Name: Pdf Vision
Author: lpq6

作者 lpq6 · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

总下载

当前安装

版本数

在 OpenClaw 中安装

/install pdf-vision

功能描述

Extract text content from image-based/scanned PDFs using multiple vision APIs with automatic fallback. Supports Xflow (qwen3-vl-plus) and ZhipuAI (GLM-4.6V-F...

安全使用建议

This skill's main OCR functionality appears coherent and reasonable: it converts PDF pages to images, reads your OpenClaw config for provider baseUrls/apiKeys, and posts image data to those endpoints. However, two red flags should be addressed before installing or running it: 1) Unrelated GitHub helper script: scripts/create_github_repo.py is unrelated to PDF extraction and will attempt to find and use a GITHUB_TOKEN (from the environment or by parsing ~/.bashrc) to call the GitHub API. Only run that script if you understand and trust it; otherwise remove or ignore it. Storing tokens in shell RC files is risky—prefer a dedicated credential store. 2) Residual test artifacts: test_skill.sh contains a hardcoded user-specific PDF path (author-local); review and update or delete it to avoid accidental execution that references your filesystem. Recommended actions before installation: - Inspect and (if not needed) delete or move scripts/create_github_repo.py from the skill directory. - Search the skill for any other helper scripts that access credentials or user files and remove or sandbox them. - Run the main extraction script in a sandboxed environment (isolated user account or container) first and verify it only reads ~/.openclaw/openclaw.json and /tmp files. - Ensure your OpenClaw config stores only the credentials you intend to use and is not world-readable. If the repository owner clarifies that the GitHub script is intentionally included (e.g., convenience for packaging) and documents it in SKILL.md, and if you plan to use it only in a controlled way, this lowers the concern. If you cannot get that clarification, treat the extra scripts as suspicious and remove them before use.

能力评估

⚠ Purpose & Capability

The core files and SKILL.md align with the stated purpose: converting PDF pages to images and calling vision-capable models (Xflow / ZhipuAI). However, the repository also includes scripts unrelated to PDF extraction (scripts/create_github_repo.py) which try to locate/use a GITHUB_TOKEN (including by parsing ~/.bashrc) and instruct the user about pushing code to GitHub. That GitHub-oriented functionality is not described in the skill metadata or SKILL.md and is unnecessary for PDF extraction.

⚠ Instruction Scope

SKILL.md and the main scripts only instruct reading your OpenClaw config (~/.openclaw/openclaw.json), converting PDFs to images (pypdfium2), and calling configured model endpoints—this is appropriate. But create_github_repo.py attempts to read environment variables and fallback to parsing ~/.bashrc to find a GITHUB_TOKEN, which is outside the documented scope. test_skill.sh also references a user-specific file path (/home/lpq/.openclaw/workspace/林佩权课表.pdf), indicating leftover developer-specific test artifacts. These extras expand the runtime surface beyond what the skill promises.

✓ Install Mechanism

There is no install specification (instruction-only skill) and no remote download or archive extraction. The package contains local Python and shell scripts only. No network-installed code is fetched at install time by the skill itself.

⚠ Credentials

The skill legitimately reads API keys from ~/.openclaw/openclaw.json for the vision providers, which is proportional. However, create_github_repo.py looks for GITHUB_TOKEN (env or inside ~/.bashrc) without that credential being declared or documented in SKILL.md. That is unexpected for an OCR skill and could lead to accidental use of a shell-stored token if the helper script is executed.

✓ Persistence & Privilege

The skill does not request always:true and does not modify other skills or system-wide settings. The scripts create temporary files under /tmp (documented) and otherwise run on demand. There is no autonomous persistence or privilege escalation requested.

如何使用

确保已安装 OpenClaw（本地或 Docker 部署）
在对话框中输入安装命令：/install pdf-vision
安装完成后，直接呼叫该 Skill 的名称或使用 /pdf-vision 触发
根据 Skill 的参数说明提供必要输入，即可获得结构化输出

版本历史

v1.0.0

- Initial release of pdf-vision skill: extract text content from scanned or image-based PDFs using advanced vision models. - Supports multiple AI vision APIs (Xflow qwen3-vl-plus, ZhipuAI glm-4.6v-flash, fallback to glm-5) for robust extraction. - Converts PDF pages to images and processes them via vision models, overcoming traditional text-extraction limitations. - Automatically selects the best available model with graceful fallback if a model is unavailable. - Handles structured data extraction, multi-page processing, and can be configured for cost optimization or maximum quality. - Complements standard text-based PDF extraction; use with scanned/image PDFs for best results.

元数据

Slug pdf-vision

版本 1.0.0

许可证 MIT-0

累计安装 1

当前安装数 1

历史版本数 1

常见问题

Pdf Vision 是什么？

Extract text content from image-based/scanned PDFs using multiple vision APIs with automatic fallback. Supports Xflow (qwen3-vl-plus) and ZhipuAI (GLM-4.6V-F... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件，目前累计下载 75 次。

如何安装 Pdf Vision？

在 OpenClaw 或 Claude Code 对话框中运行命令「/install pdf-vision」即可一键安装，无需额外配置。

Pdf Vision 是免费的吗？

是的，Pdf Vision 完全免费，采用 MIT-0 许可证，可自由下载、安装和使用。

Pdf Vision 支持哪些平台？

Pdf Vision 跨平台运行，可在任意部署了 OpenClaw / Claude Code 的环境中使用（cross-platform）。

谁开发了 Pdf Vision？

由 lpq6（@lpq6）开发并维护，当前版本 v1.0.0。