rishabhdugar

PDF OCR Parse

by Rishabh Dugar · v1.0.0 · MIT-0
cross-platform · ✓ Security Clean
Downloads: 79 · Stars: 0 · Active Installs: 1 · Versions: 1
Install in OpenClaw: /install pdf-ocr-parse
Description
Extract text from scanned PDFs using Tesseract OCR. Supports multiple languages, page selection, DPI control, and word-level bounding boxes.
README (SKILL.md)

PDF OCR Parse

What It Does

Rasterises each selected page of a PDF at the given DPI, then runs Tesseract OCR on each page image. Returns per-page text with confidence scores, and optionally per-word bounding boxes.
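Because each page is turned into an image before OCR, the dpi setting directly controls how many pixels Tesseract has to process. The sketch below estimates page image size using the standard PDF convention of 72 points per inch; `raster_size` is a hypothetical helper, and how the service itself rounds or scales pages is an assumption.

```python
POINTS_PER_INCH = 72  # PDF page sizes are expressed in points (1/72 inch)


def raster_size(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Estimate the pixel dimensions of a page rasterised at `dpi`.

    Assumes a plain points -> inches -> pixels conversion; the service's
    actual rasteriser may round differently.
    """
    width_px = round(width_pt / POINTS_PER_INCH * dpi)
    height_px = round(height_pt / POINTS_PER_INCH * dpi)
    return width_px, height_px


if __name__ == "__main__":
    letter = (612, 792)  # US Letter page, 8.5 x 11 in
    for dpi in (200, 300):
        w, h = raster_size(*letter, dpi)
        print(f"{dpi} DPI -> {w} x {h} px ({w * h:,} pixels)")
```

For a Letter page this gives 1700 x 2200 px at the default 200 DPI and 2550 x 3300 px at 300 DPI, so raising the DPI by 1.5x multiplies the pixel count by 2.25.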

When to Use

  • Extract text from scanned PDF documents
  • OCR invoices, receipts, or legacy documents in PDF format
  • Extract numeric data (e.g. invoice amounts) by restricting recognition with char_whitelist
  • Process multi-language documents

Required Inputs

Provide one of:

  • url — URL to a scanned PDF
  • base64_pdf — base64-encoded PDF
  • Multipart upload with the PDF sent in a form field named file
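For the base64_pdf option, the PDF bytes are encoded as base64 text and placed in a JSON body alongside the OCR parameters. This is a minimal sketch, assuming the service accepts a plain base64 string (no "data:" URI prefix) and that url and base64_pdf should not be sent together; `build_body` is a hypothetical helper, and the multipart path is not shown.

```python
import base64
import json
from typing import Optional


def build_body(url: Optional[str] = None,
               pdf_bytes: Optional[bytes] = None,
               **params) -> str:
    """Return a JSON body with exactly one PDF source plus any OCR params.

    Assumes base64_pdf takes plain base64 text (no "data:" URI prefix).
    """
    if (url is None) == (pdf_bytes is None):
        raise ValueError("provide exactly one of url or pdf_bytes")
    body = dict(params)  # e.g. lang, dpi, pages
    if url is not None:
        body["url"] = url
    else:
        body["base64_pdf"] = base64.b64encode(pdf_bytes).decode("ascii")
    return json.dumps(body)
```

For example, reading a local file (scan.pdf is a placeholder name) with open("scan.pdf", "rb") and passing its contents as pdf_bytes, together with lang="eng" and dpi=300, yields a body containing base64_pdf, lang and dpi keys.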

Authentication

Send your API key in the CLIENT-API-KEY header.

Get your free API key at https://pdfapihub.com. Full API documentation is available at https://pdfapihub.com/docs.
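One way to keep the key out of source code is to read it from an environment variable and attach it as the CLIENT-API-KEY header. A minimal sketch using only the standard library; the variable name PDFAPIHUB_API_KEY and the helper `auth_headers` are hypothetical choices, not something the service defines.

```python
import os


def auth_headers(env_var: str = "PDFAPIHUB_API_KEY") -> dict:
    """Return request headers carrying the API key read from the environment.

    The environment variable name is a hypothetical convention; the service
    only requires that the key arrive in the CLIENT-API-KEY header.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        # Fail before any request is made rather than sending an empty key.
        raise RuntimeError(f"Set {env_var} to your pdfapihub.com API key")
    return {"CLIENT-API-KEY": api_key, "Content-Type": "application/json"}
```

The returned dict can be handed to any HTTP client, for example as the headers argument of urllib.request.Request.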

Use Cases

  • Scanned Invoice Processing — OCR scanned PDF invoices to extract text for accounting systems
  • Legacy Document Digitization — Convert old scanned paper documents into searchable text
  • Insurance Claims — Extract text from scanned claim forms and medical documents
  • Legal Discovery — OCR scanned legal documents for full-text search and review
  • Multi-Language Documents — Process documents in Hindi, French, German, etc. with language-specific models
  • Form Digitization — Extract filled field values from scanned paper forms

Tesseract Configuration

Param | Default | Description
--- | --- | ---
lang | eng | Language code(s), +-separated
psm | 3 | Page segmentation mode (0–13)
oem | 3 | OCR engine mode (0=legacy, 1=LSTM, 3=default)
dpi | 200 | Rasterisation DPI (72–400)
char_whitelist | (not set) | Restrict to specific characters

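Checking parameters locally can catch out-of-range values before a request is sent. The sketch below encodes only the ranges listed in the table above (psm 0 to 13, the three listed oem values, dpi 72 to 400, language codes joined with +); the language-code pattern is an assumption based on Tesseract-style names such as eng or chi_sim, whether the service rejects or clamps other values is not stated, and `check_ocr_params` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed shape of a Tesseract language code: "eng", "hin", "chi_sim", ...
LANG_CODE = re.compile(r"[a-z]{3}(?:_[a-z]+)*")


def check_ocr_params(lang: str = "eng", psm: int = 3, oem: int = 3,
                     dpi: int = 200,
                     char_whitelist: Optional[str] = None) -> dict:
    """Check values against the documented ranges; return a params dict."""
    if not all(LANG_CODE.fullmatch(code) for code in lang.split("+")):
        raise ValueError(f"lang {lang!r} should be codes joined with '+'")
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    if oem not in (0, 1, 3):  # only the modes listed in the table
        raise ValueError("oem must be 0 (legacy), 1 (LSTM) or 3 (default)")
    if not 72 <= dpi <= 400:
        raise ValueError("dpi must be between 72 and 400")
    params = {"lang": lang, "psm": psm, "oem": oem, "dpi": dpi}
    if char_whitelist:  # omitted from the dict when unset
        params["char_whitelist"] = char_whitelist
    return params


# Numeric fields such as invoice amounts, at a higher DPI.
amounts = check_ocr_params(dpi=300, char_whitelist="0123456789.,")
# A bilingual English + Hindi document.
bilingual = check_ocr_params(lang="+".join(["eng", "hin"]))
```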
Example Usage

curl -X POST https://pdfapihub.com/api/v1/pdf/ocr/parse \
  -H "CLIENT-API-KEY: your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://pdfapihub.com/sample-pdfinvoice-with-image.pdf",
    "pages": "1-3",
    "lang": "eng",
    "dpi": 300,
    "detail": "words"
  }'
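The same request can be sent from Python with only the standard library. This sketch mirrors the curl example field for field; it assumes the response body is JSON (its exact shape is not specified here, so it is returned as a plain dict), reads the key from a hypothetical PDFAPIHUB_API_KEY environment variable, and only contacts the service when that variable is set.

```python
import json
import os
import urllib.request

ENDPOINT = "https://pdfapihub.com/api/v1/pdf/ocr/parse"

# The same fields as the curl example.
PAYLOAD = {
    "url": "https://pdfapihub.com/sample-pdfinvoice-with-image.pdf",
    "pages": "1-3",
    "lang": "eng",
    "dpi": 300,
    "detail": "words",
}


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Prepare the POST request: JSON body plus the CLIENT-API-KEY header."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"CLIENT-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def ocr_parse(payload: dict, api_key: str, timeout: float = 120.0) -> dict:
    """Send the request and decode the response, assumed here to be JSON.

    urlopen raises urllib.error.HTTPError for 4xx/5xx status codes.
    The 120 s timeout is an arbitrary, generous choice for multi-page OCR.
    """
    with urllib.request.urlopen(build_request(payload, api_key), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    key = os.environ.get("PDFAPIHUB_API_KEY")  # hypothetical variable name
    if key:
        print(json.dumps(ocr_parse(PAYLOAD, key), indent=2))
    else:
        print("Set PDFAPIHUB_API_KEY to run this example.")
```

urllib normalises header names to "Client-api-key" when storing them; HTTP header names are case-insensitive, so this should not affect how the server reads the key.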
Usage Guidance
This skill appears coherent and only forwards PDFs to pdfapihub.com for OCR using an API key you provide. Before installing or using it: (1) verify and trust the pdfapihub.com service (privacy, retention, and security policies) because any uploaded PDF — potentially containing sensitive data — will be sent to that third party; (2) avoid using production or highly sensitive documents until you’ve tested with non-sensitive samples; (3) manage the API key carefully (use a dedicated key with least privilege and rotate it if possible); and (4) note that the skill owner and homepage are unknown — consider this when deciding whether to trust the service.
Capability Analysis
Type: OpenClaw Skill · Name: pdf-ocr-parse · Version: 1.0.0
The skill is a standard API wrapper for a cloud-based OCR service (pdfapihub.com). The files (SKILL.md, skill.json) correctly define parameters for Tesseract OCR processing and do not contain any evidence of malicious execution, data exfiltration beyond the stated purpose, or prompt injection attacks.
Capability Tags
requires-sensitive-credentials
Capability Assessment
Purpose & Capability
The name/description, SKILL.md, example.json, and skill.json all describe the same behavior: submit a PDF (URL/base64/file) to pdfapihub.com for OCR and return text/bounding boxes. The required capabilities (API key in header) match the stated purpose.
Instruction Scope
Runtime instructions are narrowly scoped to uploading or referencing a PDF and configuring Tesseract params (lang, dpi, psm, etc.). The SKILL.md does not instruct the agent to read unrelated files, environment variables, or system state.
Install Mechanism
No install spec or code is included (instruction-only), so nothing is written to disk or fetched during installation. This is the lowest-risk install model and is proportionate for an API-wrapping skill.
Credentials
No platform environment variables are required, which matches the registry metadata. The skill does require an API key provided in the CLIENT-API-KEY header (skill.json marks auth as required) — this is expected for an external API but is a credential that will be sent to pdfapihub.com and should be provisioned per your security policies.
Persistence & Privilege
The always flag is false and the skill is user-invocable (normal). It does not request persistent system privileges or modify other skills' configuration. Autonomous invocation is permitted by default, but it does not make the skill's behavior inconsistent with its stated purpose.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install pdf-ocr-parse
  3. After installation, invoke the skill by name or use /pdf-ocr-parse
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
OCR scanned PDFs using Tesseract. Rasterises pages at configurable DPI, then runs OCR with multi-language support (eng+hin, eng+fra, etc.). Returns per-page text with confidence scores and optional word-level bounding boxes.
Metadata
Slug: pdf-ocr-parse
Version: 1.0.0
License: MIT-0
All-time Installs: 1
Active Installs: 1
Total Versions: 1
Frequently Asked Questions

What is PDF OCR Parse?

Extract text from scanned PDFs using Tesseract OCR. Supports multiple languages, page selection, DPI control, and word-level bounding boxes. It is an AI Agent Skill for Claude Code / OpenClaw, with 79 downloads so far.

How do I install PDF OCR Parse?

Run "/install pdf-ocr-parse" in the OpenClaw or Claude Code chat to install it in one step. To make OCR requests you will also need a pdfapihub.com API key, sent in the CLIENT-API-KEY header.

Is PDF OCR Parse free?

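Yes, the skill itself is free to download, install and use, and is licensed under MIT-0. The OCR is performed by pdfapihub.com, which requires an API key (a free key is available at https://pdfapihub.com); check that service's terms for any usage limits or charges.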

Which platforms does PDF OCR Parse support?

PDF OCR Parse is cross-platform: it runs anywhere OpenClaw / Claude Code is available, provided the agent can reach pdfapihub.com, where the OCR itself runs.

Who created PDF OCR Parse?

It is built and maintained by Rishabh Dugar (@rishabhdugar); the current version is v1.0.0.
