PDF OCR Parse

Author: rishabhdugar

by Rishabh Dugar · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ✓ Security Clean

Install in OpenClaw

/install pdf-ocr-parse

Description

Extract text from scanned PDFs using Tesseract OCR. Supports multiple languages, page selection, DPI control, and word-level bounding boxes.

README (SKILL.md)

PDF OCR Parse

What It Does

Rasterises each selected page of a PDF at the given DPI, then runs Tesseract OCR on each page image. Returns per-page text with confidence scores, and optionally per-word bounding boxes.

When to Use

Extract text from scanned PDF documents
OCR invoices, receipts, or legacy documents in PDF format
Extract digits-only data (invoice amounts) with char_whitelist
Process multi-language documents

Required Inputs

Provide one of:

url — URL to a scanned PDF
base64_pdf — base64-encoded PDF
Multipart upload with file field
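The two JSON input options can be sketched in Python as follows. This is only an illustration: the helper name `build_input` is hypothetical, the "exactly one source" check is an assumption about how the service treats conflicting inputs, and `base64_pdf` is assumed to be plain standard base64 of the raw PDF bytes. Multipart upload is not covered here; it sends the PDF in a `file` form field instead of a JSON body.

```python
import base64
from typing import Optional


def build_input(url: Optional[str] = None, pdf_bytes: Optional[bytes] = None) -> dict:
    """Return the JSON input field for one PDF source (hypothetical helper)."""
    # Assumption: the caller supplies exactly one source.
    if (url is None) == (pdf_bytes is None):
        raise ValueError("provide exactly one of url or pdf_bytes")
    if url is not None:
        return {"url": url}
    # Assumption: base64_pdf is standard base64 text without a data-URI prefix.
    return {"base64_pdf": base64.b64encode(pdf_bytes).decode("ascii")}
```

The returned dictionary can be merged with the optional parameters (`pages`, `lang`, `dpi`, and so on) before it is serialised as the request body.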

Authentication

Send your API key in the CLIENT-API-KEY header.

Get your free API key at https://pdfapihub.com. Full API documentation is available at https://pdfapihub.com/docs.
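As a small sketch of the header requirement above, the snippet below builds the authentication header in Python. The function name `auth_headers` and the environment variable `PDFAPIHUB_API_KEY` are hypothetical conveniences, not something the skill defines; only the `CLIENT-API-KEY` header name comes from this README.

```python
import os
from typing import Optional


def auth_headers(api_key: Optional[str] = None) -> dict:
    """Build the auth header, falling back to a hypothetical environment variable."""
    # PDFAPIHUB_API_KEY is an illustrative name; the skill does not require it.
    key = api_key or os.environ.get("PDFAPIHUB_API_KEY")
    if not key:
        raise RuntimeError("no API key supplied")
    return {"CLIENT-API-KEY": key}
```

Reading the key from the environment keeps it out of source files and chat logs, which fits the advice to provision the key according to your own security policies.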

Use Cases

Scanned Invoice Processing — OCR scanned PDF invoices to extract text for accounting systems
Legacy Document Digitization — Convert old scanned paper documents into searchable text
Insurance Claims — Extract text from scanned claim forms and medical documents
Legal Discovery — OCR scanned legal documents for full-text search and review
Multi-Language Documents — Process documents in Hindi, French, German, etc. with language-specific models
Form Digitization — Extract filled field values from scanned paper forms

Tesseract Configuration

Param	Default	Description
`lang`	`eng`	Language code(s), `+` separated
`psm`	`3`	Page segmentation mode (0–13)
`oem`	`3`	OCR engine mode (0=legacy, 1=LSTM, 3=default)
`dpi`	`200`	Rasterisation DPI (72–400)
`char_whitelist`	—	Restrict to specific characters
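The ranges in the table can be checked on the client before a request is sent. The sketch below is one possible way to do that in Python; the function name `validate_ocr_params` is hypothetical, and the checks mirror only the ranges listed above (the service may apply stricter or different rules). `oem` is passed through unchecked because the table does not state which values are accepted beyond the ones it names.

```python
from typing import Optional


def validate_ocr_params(
    lang: str = "eng",
    psm: int = 3,
    oem: int = 3,
    dpi: int = 200,
    char_whitelist: Optional[str] = None,
) -> dict:
    """Check values against the documented ranges and return a parameter dict."""
    # lang is one or more language codes joined by "+", e.g. "eng+hin".
    if not lang or any(not code for code in lang.split("+")):
        raise ValueError("lang must be '+'-separated language codes")
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    if not 72 <= dpi <= 400:
        raise ValueError("dpi must be between 72 and 400")
    params = {"lang": lang, "psm": psm, "oem": oem, "dpi": dpi}
    # char_whitelist has no default, so it is only sent when provided.
    if char_whitelist is not None:
        params["char_whitelist"] = char_whitelist
    return params
```

For the digits-only invoice case mentioned earlier, a call such as `validate_ocr_params(dpi=300, char_whitelist="0123456789.")` would produce the parameters to merge into the request body.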

Example Usage

curl -X POST https://pdfapihub.com/api/v1/pdf/ocr/parse \
  -H "CLIENT-API-KEY: your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://pdfapihub.com/sample-pdfinvoice-with-image.pdf",
    "pages": "1-3",
    "lang": "eng",
    "dpi": 300,
    "detail": "words"
  }'
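For callers who prefer Python to curl, the same request can be sketched with the standard library. The function name `build_ocr_request` is hypothetical; the endpoint, headers, and body fields are copied from the curl example above. The request is only constructed here, not sent, and the commented-out lines assume the service answers with JSON, which this README does not document in detail.

```python
import json
import urllib.request

API_URL = "https://pdfapihub.com/api/v1/pdf/ocr/parse"


def build_ocr_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        # urllib normalises header capitalisation; HTTP header names are case-insensitive.
        headers={"CLIENT-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_ocr_request(
    "your_api_key",
    {
        "url": "https://pdfapihub.com/sample-pdfinvoice-with-image.pdf",
        "pages": "1-3",
        "lang": "eng",
        "dpi": 300,
        "detail": "words",
    },
)

# Sending is a network call and uploads the referenced PDF to a third party:
# with urllib.request.urlopen(request, timeout=60) as response:
#     result = json.loads(response.read())  # assumes a JSON response body
```

Separating construction from sending makes it easy to inspect or log the request (minus the key) before any document leaves your machine.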

Usage Guidance

This skill appears coherent and only forwards PDFs to pdfapihub.com for OCR using an API key you provide. Before installing or using it: (1) verify and trust the pdfapihub.com service (privacy, retention, and security policies) because any uploaded PDF — potentially containing sensitive data — will be sent to that third party; (2) avoid using production or highly sensitive documents until you’ve tested with non-sensitive samples; (3) manage the API key carefully (use a dedicated key with least privilege and rotate it if possible); and (4) note that the skill owner and homepage are unknown — consider this when deciding whether to trust the service.

Capability Analysis

Type: OpenClaw Skill
Name: pdf-ocr-parse
Version: 1.0.0

The skill is a standard API wrapper for a cloud-based OCR service (pdfapihub.com). The files (SKILL.md, skill.json) correctly define parameters for Tesseract OCR processing and do not contain any evidence of malicious execution, data exfiltration beyond the stated purpose, or prompt injection attacks.

Capability Tags

requires-sensitive-credentials

Capability Assessment

✓ Purpose & Capability

The name/description, SKILL.md, example.json, and skill.json all describe the same behavior: submit a PDF (URL/base64/file) to pdfapihub.com for OCR and return text/bounding boxes. The required capabilities (API key in header) match the stated purpose.

✓ Instruction Scope

Runtime instructions are narrowly scoped to uploading or referencing a PDF and configuring Tesseract params (lang, dpi, psm, etc.). The SKILL.md does not instruct the agent to read unrelated files, environment variables, or system state.

✓ Install Mechanism

No install spec or code is included (instruction-only), so nothing is written to disk or fetched during installation. This is the lowest-risk install model and is proportionate for an API-wrapping skill.

ℹ Credentials

No platform environment variables are required, which matches the registry metadata. The skill does require an API key provided in the CLIENT-API-KEY header (skill.json marks auth as required) — this is expected for an external API but is a credential that will be sent to pdfapihub.com and should be provisioned per your security policies.

✓ Persistence & Privilege

The always flag is false and the skill is user-invocable, which is the normal configuration. It does not request persistent system privileges or modify other skills' configuration. Autonomous invocation is permitted by default, but that raises no additional concern here.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install pdf-ocr-parse
After installation, invoke the skill by name or use /pdf-ocr-parse
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

OCR scanned PDFs using Tesseract. Rasterises pages at configurable DPI, then runs OCR with multi-language support (eng+hin, eng+fra, etc.). Returns per-page text with confidence scores and optional word-level bounding boxes.

Metadata

Slug pdf-ocr-parse

Version 1.0.0

License MIT-0

All-time Installs 1

Active Installs 1

Total Versions 1

Frequently Asked Questions

What is PDF OCR Parse?

PDF OCR Parse extracts text from scanned PDFs using Tesseract OCR. It supports multiple languages, page selection, DPI control, and word-level bounding boxes. It is an AI Agent Skill for Claude Code / OpenClaw, with 79 downloads so far.

How do I install PDF OCR Parse?

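Run "/install pdf-ocr-parse" in the OpenClaw or Claude Code chat to install it in one step. The skill itself needs no further setup, but each request requires a pdfapihub.com API key sent in the CLIENT-API-KEY header.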

Is PDF OCR Parse free?

Yes, PDF OCR Parse is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does PDF OCR Parse support?

PDF OCR Parse is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created PDF OCR Parse?

It is built and maintained by Rishabh Dugar (@rishabhdugar); the current version is v1.0.0.
