Pdf Vision

by lpq6 · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Install in OpenClaw

/install pdf-vision

Description

Extract text content from image-based/scanned PDFs using multiple vision APIs with automatic fallback. Supports Xflow (qwen3-vl-plus) and ZhipuAI (GLM-4.6V-F...

Usage Guidance

This skill's main OCR functionality appears coherent and reasonable: it converts PDF pages to images, reads your OpenClaw config for provider baseUrls/apiKeys, and posts image data to those endpoints. However, two red flags should be addressed before installing or running it:

1) Unrelated GitHub helper script: scripts/create_github_repo.py is unrelated to PDF extraction and will attempt to find and use a GITHUB_TOKEN (from the environment or by parsing ~/.bashrc) to call the GitHub API. Only run that script if you understand and trust it; otherwise remove or ignore it. Storing tokens in shell RC files is risky; prefer a dedicated credential store.

2) Residual test artifacts: test_skill.sh contains a hardcoded user-specific PDF path (author-local); review and update or delete it to avoid accidental execution that references your filesystem.

Recommended actions before installation:

- Inspect scripts/create_github_repo.py and, if it is not needed, delete it or move it out of the skill directory.
- Search the skill for any other helper scripts that access credentials or user files and remove or sandbox them.
- Run the main extraction script in a sandboxed environment (isolated user account or container) first and verify it only reads ~/.openclaw/openclaw.json and /tmp files.
- Ensure your OpenClaw config stores only the credentials you intend to use and is not world-readable.

If the repository owner clarifies that the GitHub script is intentionally included (e.g., convenience for packaging) and documents it in SKILL.md, and if you plan to use it only in a controlled way, this lowers the concern. If you cannot get that clarification, treat the extra scripts as suspicious and remove them before use.
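The "search the skill for other helper scripts" step can be partly automated. The following is a minimal sketch, not part of the skill: the function name `scan_skill_dir`, the pattern set, and the choice to scan only `.py` and `.sh` files are all assumptions made for illustration. A match is a prompt for manual review, not proof of misuse.

```python
import re
from pathlib import Path

# Illustrative patterns (assumptions); extend them for your own audit.
SUSPICIOUS_PATTERNS = {
    "github_token": re.compile(r"GITHUB_TOKEN"),
    "shell_rc_file": re.compile(r"\.(bashrc|zshrc|bash_profile|profile)\b"),
    "home_path": re.compile(r"/home/[^/\s\"']+/"),
}

# Only script files are scanned in this sketch.
SCRIPT_SUFFIXES = {".py", ".sh"}


def scan_skill_dir(root):
    """Return (relative_path, line_number, pattern_name) for every match."""
    root = Path(root)
    findings = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in SCRIPT_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for name, pattern in SUSPICIOUS_PATTERNS.items():
                if pattern.search(line):
                    rel = path.relative_to(root).as_posix()
                    findings.append((rel, lineno, name))
    return findings
```

Calling `scan_skill_dir("path/to/pdf-vision")` on an unpacked copy of the skill returns a list you can review by hand; the path shown here is a placeholder.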

Capability Assessment

⚠ Purpose & Capability

The core files and SKILL.md align with the stated purpose: converting PDF pages to images and calling vision-capable models (Xflow / ZhipuAI). However, the repository also includes scripts unrelated to PDF extraction (scripts/create_github_repo.py) which try to locate/use a GITHUB_TOKEN (including by parsing ~/.bashrc) and instruct the user about pushing code to GitHub. That GitHub-oriented functionality is not described in the skill metadata or SKILL.md and is unnecessary for PDF extraction.

⚠ Instruction Scope

SKILL.md and the main scripts only instruct reading your OpenClaw config (~/.openclaw/openclaw.json), converting PDFs to images (pypdfium2), and calling configured model endpoints; this is appropriate. But create_github_repo.py attempts to read environment variables and falls back to parsing ~/.bashrc to find a GITHUB_TOKEN, which is outside the documented scope. test_skill.sh also references a user-specific file path (/home/lpq/.openclaw/workspace/林佩权课表.pdf), indicating leftover developer-specific test artifacts. These extras expand the runtime surface beyond what the skill promises.

✓ Install Mechanism

There is no install specification (instruction-only skill) and no remote download or archive extraction. The package contains local Python and shell scripts only. No network-installed code is fetched at install time by the skill itself.

⚠ Credentials

The skill legitimately reads API keys from ~/.openclaw/openclaw.json for the vision providers, which is proportional. However, create_github_repo.py looks for GITHUB_TOKEN (env or inside ~/.bashrc) without that credential being declared or documented in SKILL.md. That is unexpected for an OCR skill and could lead to accidental use of a shell-stored token if the helper script is executed.
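Because the provider API keys live in ~/.openclaw/openclaw.json, it is worth confirming that the file is readable only by its owner. The sketch below assumes a POSIX system with standard permission bits; the helper names `is_exposed` and `restrict_to_owner` are hypothetical and are not provided by OpenClaw or by this skill.

```python
import os
import stat


def is_exposed(path):
    """Return True if group or others hold any permission bit on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return bool(mode & (stat.S_IRWXG | stat.S_IRWXO))


def restrict_to_owner(path):
    """Set the file mode to 0o600 (owner read/write only)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```

A typical use would be to call `is_exposed(os.path.expanduser("~/.openclaw/openclaw.json"))` and, if it returns True, call `restrict_to_owner` on the same path. On Windows the permission bits do not carry the same meaning, so this check does not apply there.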

✓ Persistence & Privilege

The skill does not request always:true and does not modify other skills or system-wide settings. The scripts create temporary files under /tmp (documented) and otherwise run on demand. There is no autonomous persistence or privilege escalation requested.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install pdf-vision
After installation, invoke the skill by name or use /pdf-vision
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

- Initial release of pdf-vision skill: extract text content from scanned or image-based PDFs using advanced vision models.
- Supports multiple AI vision APIs (Xflow qwen3-vl-plus, ZhipuAI glm-4.6v-flash, fallback to glm-5) for robust extraction.
- Converts PDF pages to images and processes them via vision models, overcoming traditional text-extraction limitations.
- Automatically selects the best available model with graceful fallback if a model is unavailable.
- Handles structured data extraction, multi-page processing, and can be configured for cost optimization or maximum quality.
- Complements standard text-based PDF extraction; use with scanned/image PDFs for best results.
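The "graceful fallback" behaviour in the release notes can be pictured as trying providers in priority order until one returns text. This is a generic sketch of that pattern, not the skill's actual code: `extract_with_fallback`, `AllProvidersFailed`, and the shape of the `providers` argument are assumptions, and each provider is represented by a plain callable rather than a real Xflow or ZhipuAI client.

```python
from typing import Callable, List, Sequence, Tuple


class AllProvidersFailed(RuntimeError):
    """Raised when every provider errored or returned no text."""


def extract_with_fallback(
    image_png: bytes,
    providers: Sequence[Tuple[str, Callable[[bytes], str]]],
) -> Tuple[str, str]:
    """Try each (name, call) pair in order; return (name, text) of the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            text = call(image_png)
        except Exception as exc:  # any provider error triggers the next fallback
            errors.append(f"{name}: {exc}")
            continue
        if text.strip():
            return name, text
        errors.append(f"{name}: empty response")
    raise AllProvidersFailed("; ".join(errors))
```

In this sketch the caller decides the order, for example listing the preferred model first and cheaper or more widely available models after it. Collecting every error message before raising makes it easier to see why all providers were skipped.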

Metadata

Slug pdf-vision

Version 1.0.0

License MIT-0

All-time Installs 1

Active Installs 1

Total Versions 1

Frequently Asked Questions

What is Pdf Vision?

Extract text content from image-based/scanned PDFs using multiple vision APIs with automatic fallback. Supports Xflow (qwen3-vl-plus) and ZhipuAI (GLM-4.6V-F... It is an AI Agent Skill for Claude Code / OpenClaw, with 75 downloads so far.

How do I install Pdf Vision?

Run "/install pdf-vision" in the OpenClaw or Claude Code chat to install it in one step; no extra setup is required.

Is Pdf Vision free?

Yes, Pdf Vision is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Pdf Vision support?

Pdf Vision is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Pdf Vision?

It is built and maintained by lpq6 (@lpq6); the current version is v1.0.0.
