
Pdf Ocr Tool

Name: Pdf Ocr Tool
Author: tsukisama9292

by Xuan-You Lin · GitHub ↗ · v1.3.0

cross-platform ⚠ suspicious

Downloads: 578

Install in OpenClaw

/install pdf-ocr-tool

Description

Intelligent PDF and image to Markdown converter using Ollama GLM-OCR with smart content detection (text/table/figure)

Usage Guidance

This skill appears to do what it says: convert PDFs/images to Markdown by calling an Ollama GLM-OCR model. Before installing, review and accept these points: (1) The tool sends images and prompts to the configured Ollama host (default localhost). Do not point OLLAMA_HOST to an untrusted remote endpoint if your documents contain sensitive data. (2) Install scripts pull pyproject/uv.lock from the skill's GitHub raw URL if local copies are missing — only proceed if you trust the upstream repository. (3) The skill requires pdftoppm (poppler) to convert PDFs; if missing it will still run for images only. (4) If you need stronger assurance, inspect utils/ollama_client.py to confirm network behavior and where data is posted, and run the post-install hooks manually rather than blindly executing remote install scripts.

Capability Analysis

Type: OpenClaw Skill
Name: pdf-ocr-tool
Version: 1.3.0

The skill is classified as suspicious because of two significant issues.

First, the `hooks/install-deps.sh` script attempts to fetch `pyproject.toml` and `uv.lock` from a GitHub repository (`https://raw.githubusercontent.com/nala0222/pdf-ocr-tool/refs/heads/master/`) if local copies are not found. This introduces a supply chain risk: a compromise of the GitHub repository could lead to the installation of malicious dependencies.

Second, the `utils/pdf_utils.py` module uses `subprocess.run` to execute external binaries (`pdftoppm`, `pdfinfo`) with `pdf_path` taken directly from user input (`args.input` in `ocr_tool.py`). This is a potential shell injection vulnerability: if the command is passed through a shell, a PDF path containing shell metacharacters could lead to arbitrary command execution.
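The second issue can be illustrated with a minimal sketch of how a user-supplied path might be handed to `pdftoppm` without involving a shell. This is not the skill's actual code: the function names `build_pdftoppm_argv` and `render_pages` and the 150 dpi default are assumptions made for illustration. When `subprocess.run` receives a list and `shell=True` is not set, each element reaches the program as one argument, so metacharacters in the path are not interpreted.

```python
import subprocess
from pathlib import Path
from typing import List


def build_pdftoppm_argv(pdf_path: str, out_prefix: str, dpi: int = 150) -> List[str]:
    """Build an argument list for pdftoppm (hypothetical helper, not from the skill)."""
    # Resolving to an absolute path means the argument cannot start with "-"
    # and be mistaken for a command-line option.
    pdf = Path(pdf_path).resolve()
    prefix = Path(out_prefix).resolve()
    if pdf.suffix.lower() != ".pdf":
        raise ValueError(f"not a PDF path: {pdf_path!r}")
    # -png selects PNG output and -r sets the resolution in DPI.
    return ["pdftoppm", "-png", "-r", str(dpi), str(pdf), str(prefix)]


def render_pages(pdf_path: str, out_prefix: str) -> None:
    """Run pdftoppm with a list of arguments and no shell."""
    # shell=True is deliberately not used, so ";" or "$(...)" in the path
    # stay literal characters inside a single argument.
    subprocess.run(build_pdftoppm_argv(pdf_path, out_prefix), check=True)
```

With this shape, a path such as `report; rm -rf x.pdf` is passed to `pdftoppm` as a single (probably nonexistent) file name rather than as two commands.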

Capability Assessment

✓ Purpose & Capability

Name/description (PDF/image → Markdown using Ollama GLM-OCR) aligns with required binaries (ollama, pdftoppm) and the included code (OCR, page splitting, prompts). uv is used for dependency management and appears justified by the install instructions.

ℹ Instruction Scope

SKILL.md and the code limit actions to converting PDFs/images, splitting regions, invoking an Ollama service, and writing Markdown/images. However, the tool transmits image data and prompts to an Ollama host you configure (defaults to localhost). If you set the host to a remote service, document contents (possibly sensitive) will be sent over the network — this is expected for an OCR integration but worth noting.
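A quick way to confirm that documents stay on the local machine is to check the configured host before running the tool. The sketch below assumes `OLLAMA_HOST` holds a bare host name or IP address (no scheme or port) and that the port falls back to Ollama's usual default of 11434; the skill's real parsing may differ, and both function names are hypothetical.

```python
import ipaddress
import os
from typing import Mapping, Optional, Tuple


def ollama_target(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (host, port) from the environment, with assumed defaults."""
    env = os.environ if env is None else env
    host = env.get("OLLAMA_HOST", "localhost")
    port = env.get("OLLAMA_PORT", "11434")  # assumption: Ollama's default port
    return host, port


def is_loopback(host: str) -> bool:
    """True if the host is 'localhost' or a loopback IP address."""
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other host name is treated as remote without resolving it.
        return False


if __name__ == "__main__":
    host, port = ollama_target()
    if not is_loopback(host):
        print(f"Warning: page images will be sent to {host}:{port}")
```

The check is conservative: a host name other than `localhost` counts as remote even if it happens to resolve to the local machine.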

✓ Install Mechanism

Install uses uv (local Python package manager) and shell hooks that copy pyproject/uv.lock from the local tree or raw.githubusercontent.com. The scripts do not fetch arbitrary binaries from untrusted personal servers; they reference GitHub raw and instruct the user to run official install scripts for Ollama/uv. This is typical and proportionate to the task.
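For readers who want more assurance when the hooks fall back to raw.githubusercontent.com, one option is to download `pyproject.toml` and `uv.lock` manually and compare them against SHA-256 digests recorded from a copy that has already been reviewed. The sketch below is an assumption about how such a check could look, not something the skill ships; `sha256_of` and `matches_pin` are hypothetical helpers.

```python
import hashlib
import hmac


def sha256_of(path: str) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pin(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a pinned value recorded earlier."""
    # The pinned value is lowercased so upper-case hex input also matches.
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())
```

A pinned digest only shows that the file is unchanged since it was reviewed; it says nothing about whether the reviewed version was safe.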

✓ Credentials

The skill declares no required credentials or secret env vars. It supports OLLAMA_HOST/OLLAMA_PORT/OCR_MODEL configuration (optional), which is appropriate for selecting the target Ollama service and model. There are no unrelated credentials or config paths requested.

✓ Persistence & Privilege

Skill does not request always: true and does not modify other skills or global agent settings. Install hooks operate within the skill directory and virtualenv; no elevated persistent privileges were requested.

How to Use

1. Make sure OpenClaw is installed (local or Docker).
2. Run the install command in chat: /install pdf-ocr-tool
3. After installation, invoke the skill by name or use /pdf-ocr-tool
4. Provide the required inputs per the skill's parameter spec and get structured output.

Version History

v1.3.0

Full English documentation, README added, all descriptions in English

v1.2.0

English prompts, install-deps.sh, fixed .gitignore for uv.lock

v1.1.0

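**Big update: Adds mixed mode, region-based processing, and pyproject.toml support.**
- Added mixed mode (mixed) and region-based processing (granularity region), which automatically distinguish and process different content regions
- Supports multiple processing modes: text, table, figure, mixed, auto
- Enhanced configuration of custom prompts
- Added pyproject.toml, using uv to manage Python dependencies
- Improved installation and usage instructions, with better integration with Ollama GLM-OCR, poppler, and uv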

Metadata

Slug pdf-ocr-tool

Version 1.3.0

License —

All-time Installs 4

Active Installs 4

Total Versions 3

Frequently Asked Questions

What is Pdf Ocr Tool?

Intelligent PDF and image to Markdown converter using Ollama GLM-OCR with smart content detection (text/table/figure). It is an AI Agent Skill for Claude Code / OpenClaw, with 578 downloads so far.

How do I install Pdf Ocr Tool?

Run "/install pdf-ocr-tool" in the OpenClaw or Claude Code chat to install it. The skill also expects Ollama and pdftoppm (poppler) to be available; without pdftoppm it still runs, but for images only.

Is Pdf Ocr Tool free?

Yes, Pdf Ocr Tool can be downloaded, installed, and used at no cost. The listing does not state a license.

Which platforms does Pdf Ocr Tool support?

Pdf Ocr Tool is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Pdf Ocr Tool?

It is built and maintained by Xuan-You Lin (@tsukisama9292); the current version is v1.3.0.
