tsukisama9292

Pdf Ocr Tool

by Xuan-You Lin · GitHub ↗ · v1.3.0
cross-platform ⚠ suspicious
Downloads 578
Stars 0
Active Installs 4
Versions 3
Install in OpenClaw
/install pdf-ocr-tool
Description
Intelligent PDF and image to Markdown converter using Ollama GLM-OCR with smart content detection (text/table/figure)
Usage Guidance
This skill appears to do what it says: convert PDFs/images to Markdown by calling an Ollama GLM-OCR model. Before installing, review and accept these points:
(1) The tool sends images and prompts to the configured Ollama host (default localhost). Do not point OLLAMA_HOST to an untrusted remote endpoint if your documents contain sensitive data.
(2) Install scripts pull pyproject/uv.lock from the skill's GitHub raw URL if local copies are missing. Only proceed if you trust the upstream repository.
(3) The skill requires pdftoppm (poppler) to convert PDFs; if it is missing, the skill still runs, but for images only.
(4) If you need stronger assurance, inspect utils/ollama_client.py to confirm network behavior and where data is posted, and run the post-install hooks manually rather than blindly executing remote install scripts.
Capability Analysis
Type: OpenClaw Skill
Name: pdf-ocr-tool
Version: 1.3.0
The skill is classified as suspicious due to two significant vulnerabilities. First, the `hooks/install-deps.sh` script attempts to fetch `pyproject.toml` and `uv.lock` from a GitHub repository (`https://raw.githubusercontent.com/nala0222/pdf-ocr-tool/refs/heads/master/`) if local copies are not found. This introduces a supply chain risk, as a compromise of the GitHub repository could lead to the installation of malicious dependencies. Second, the `utils/pdf_utils.py` module uses `subprocess.run` to execute external binaries (`pdftoppm`, `pdfinfo`) with `pdf_path` taken directly from user input (`args.input` in `ocr_tool.py`). If the command is passed through a shell (a single command string with `shell=True`), a PDF path containing shell metacharacters could allow arbitrary command execution. If it is passed as an argument list without a shell, metacharacters are not interpreted, although a path beginning with `-` could still be read as a command-line option.
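The second finding above depends on how the command is built. The following is a minimal sketch, not the skill's actual code, of invoking `pdftoppm` with an argument list so that no shell ever parses the path. The function names `build_pdftoppm_cmd` and `pdf_to_images` and the default DPI are hypothetical; the `-r` and `-png` options are standard `pdftoppm` flags.

```python
import subprocess
from pathlib import Path


def build_pdftoppm_cmd(pdf_path: str, out_prefix: str, dpi: int = 200) -> list[str]:
    """Build an argument list for pdftoppm (hypothetical helper)."""
    # Resolving to absolute paths means neither argument can start with "-"
    # and be mistaken for a command-line option.
    pdf = Path(pdf_path).resolve()
    prefix = Path(out_prefix).resolve()
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)]


def pdf_to_images(pdf_path: str, out_prefix: str, dpi: int = 200) -> None:
    """Render each PDF page to a PNG file whose name starts with out_prefix."""
    cmd = build_pdftoppm_cmd(pdf_path, out_prefix, dpi)
    # A list with the default shell=False hands each item to the program as
    # one literal argument, so ";", "$(...)" and spaces are never interpreted.
    subprocess.run(cmd, check=True, capture_output=True)
```

With this shape, a file named `report; rm -rf ~.pdf` reaches `pdftoppm` as a single file name rather than as two commands.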
Capability Assessment
Purpose & Capability
Name/description (PDF/image → Markdown using Ollama GLM-OCR) aligns with required binaries (ollama, pdftoppm) and the included code (OCR, page splitting, prompts). uv is used for dependency management and appears justified by the install instructions.
Instruction Scope
SKILL.md and the code limit actions to converting PDFs/images, splitting regions, invoking an Ollama service, and writing Markdown/images. However, the tool transmits image data and prompts to an Ollama host you configure (defaults to localhost). If you set the host to a remote service, document contents (possibly sensitive) will be sent over the network — this is expected for an OCR integration but worth noting.
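To make concrete what leaves the machine, here is a sketch of the kind of request an Ollama client typically sends. It assumes the skill uses Ollama's `/api/generate` endpoint with base64-encoded images, which is not confirmed by this listing; the model name `glm-ocr` and both function names are placeholders.

```python
import base64


def build_generate_payload(image_bytes: bytes, prompt: str, model: str = "glm-ocr") -> dict:
    """Assemble a JSON-serializable body for Ollama's /api/generate (assumed endpoint)."""
    return {
        "model": model,  # placeholder model name
        "prompt": prompt,
        # The full page image travels in the request body, base64-encoded.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def endpoint_url(host: str, port: int) -> str:
    """Return the URL the payload would be posted to."""
    return f"http://{host}:{port}/api/generate"
```

Every page image is included in full, so whoever operates the configured host can read the document contents.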
Install Mechanism
Install uses uv (local Python package manager) and shell hooks that copy pyproject/uv.lock from the local tree or raw.githubusercontent.com. The scripts do not fetch arbitrary binaries from untrusted personal servers; they reference GitHub raw and instruct the user to run official install scripts for Ollama/uv. This is typical and proportionate to the task.
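One way to reduce the risk of fetching `pyproject.toml` and `uv.lock` from a remote URL is to compare each downloaded file against a hash recorded from a copy you have already reviewed. The sketch below is an illustration only; the skill's hooks are not known to do this, and the function names are hypothetical.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def matches_pinned_hash(data: bytes, expected_hex: str) -> bool:
    """Check downloaded bytes against a hash pinned from a reviewed copy."""
    # compare_digest performs a constant-time comparison of the two strings.
    return hmac.compare_digest(sha256_hex(data), expected_hex.strip().lower())
```

A hook could refuse to continue when `matches_pinned_hash` returns False, so an unexpected change upstream stops the install instead of being applied silently.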
Credentials
The skill declares no required credentials or secret env vars. It supports OLLAMA_HOST/OLLAMA_PORT/OCR_MODEL configuration (optional), which is appropriate for selecting the target Ollama service and model. There are no unrelated credentials or config paths requested.
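As a sketch of how these optional variables might be read, the snippet below loads them with defaults and flags whether the target host is local. It assumes `OLLAMA_HOST` holds a bare hostname or IP address; port 11434 is Ollama's standard default, while the fallback model name `glm-ocr` and the function names are placeholders rather than the skill's real values.

```python
import ipaddress
import os


def is_loopback(host: str) -> bool:
    """Return True when host refers to the local machine."""
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, so treat any other hostname as remote.
        return False


def load_config(env=os.environ) -> dict:
    """Read OLLAMA_HOST, OLLAMA_PORT and OCR_MODEL with fallback defaults."""
    host = env.get("OLLAMA_HOST", "localhost")
    return {
        "host": host,
        "port": int(env.get("OLLAMA_PORT", "11434")),
        "model": env.get("OCR_MODEL", "glm-ocr"),  # placeholder default
        "local": is_loopback(host),
    }
```

Checking the `local` flag before processing a sensitive document gives an early warning that its pages would be sent to a remote machine.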
Persistence & Privilege
Skill does not request always: true and does not modify other skills or global agent settings. Install hooks operate within the skill directory and virtualenv; no elevated persistent privileges were requested.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install pdf-ocr-tool
  3. After installation, invoke the skill by name or use /pdf-ocr-tool
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.3.0
Full English documentation, README added, all descriptions in English
v1.2.0
English prompts, install-deps.sh, fixed .gitignore for uv.lock
v1.1.0
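Big update: Adds mixed mode, region-based processing, and pyproject.toml support.
- Added mixed mode (mixed) and region-based processing (granularity region), which automatically distinguishes and processes different content regions
- Supports multiple processing modes: text, table, figure, mixed, auto
- Enhanced configuration of custom prompts
- Added pyproject.toml; Python dependencies are managed with uv
- Improved installation and usage instructions, with better integration with Ollama GLM-OCR, poppler, and uv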
Metadata
Slug pdf-ocr-tool
Version 1.3.0
License
All-time Installs 4
Active Installs 4
Total Versions 3
Frequently Asked Questions

What is Pdf Ocr Tool?

Intelligent PDF and image to Markdown converter using Ollama GLM-OCR with smart content detection (text/table/figure). It is an AI Agent Skill for Claude Code / OpenClaw, with 578 downloads so far.

How do I install Pdf Ocr Tool?

Run "/install pdf-ocr-tool" in the OpenClaw or Claude Code chat to install it in one step. The skill itself relies on Ollama, uv, and (for PDF input) pdftoppm being available on the machine.

Is Pdf Ocr Tool free?

Yes, Pdf Ocr Tool is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Pdf Ocr Tool support?

Pdf Ocr Tool is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Pdf Ocr Tool?

It is built and maintained by Xuan-You Lin (@tsukisama9292); the current version is v1.3.0.
