rishabhdugar

PDF OCR Parse

Author: Rishabh Dugar · v1.0.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 79
Favorites: 0
Current installs: 1
Versions: 1
Install in OpenClaw
/install pdf-ocr-parse
Description
Extract text from scanned PDFs using Tesseract OCR. Supports multiple languages, page selection, DPI control, and word-level bounding boxes.
Usage (SKILL.md)

PDF OCR Parse

What It Does

Rasterises each selected page of a PDF at the given DPI, then runs Tesseract OCR on each page image. Returns per-page text with confidence scores, and optionally per-word bounding boxes.
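The two-stage flow described above (rasterise, then recognise) can be sketched as follows. This is only an illustration of the pipeline shape, not the service's actual implementation: `rasterise` and `recognise` are hypothetical callables supplied by the caller, and `PageResult` is an illustrative container. In a local setup they could plausibly be backed by a PDF rasteriser and a Tesseract binding, but that wiring is left out here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class PageResult:
    page: int          # 1-based page number
    text: str          # recognised text for the page
    confidence: float  # confidence value reported by the OCR step


def ocr_pdf_pages(
    pages: Iterable[int],
    rasterise: Callable[[int, int], object],
    recognise: Callable[[object], Tuple[str, float]],
    dpi: int = 200,
) -> List[PageResult]:
    """Rasterise each selected page at `dpi`, then OCR each page image."""
    results = []
    for page in pages:
        image = rasterise(page, dpi)         # hypothetical: (page, dpi) -> image
        text, confidence = recognise(image)  # hypothetical: image -> (text, confidence)
        results.append(PageResult(page, text, confidence))
    return results
```

Passing the two stages in as callables keeps the sketch independent of any particular rasteriser or OCR binding.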

When to Use

  • Extract text from scanned PDF documents
  • OCR invoices, receipts, or legacy documents in PDF format
  • Extract digits-only data (invoice amounts) with char_whitelist
  • Process multi-language documents

Required Inputs

Provide one of:

  • url — URL to a scanned PDF
  • base64_pdf — base64-encoded PDF
  • Multipart upload with file field
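For the JSON-based input options listed above, a small helper can choose between the `url` and `base64_pdf` fields. This is a minimal sketch: `build_input_payload` is a hypothetical helper name, and rejecting calls that supply both inputs is an assumption drawn from the "Provide one of" wording. Multipart upload is not covered.

```python
import base64
from typing import Optional


def build_input_payload(url: Optional[str] = None,
                        pdf_bytes: Optional[bytes] = None) -> dict:
    """Return the JSON input field for either a remote URL or raw PDF bytes."""
    # Assumption: exactly one input source should be supplied per request.
    if (url is None) == (pdf_bytes is None):
        raise ValueError("provide exactly one of url or pdf_bytes")
    if url is not None:
        return {"url": url}
    # base64_pdf carries the PDF bytes as base64-encoded ASCII text.
    return {"base64_pdf": base64.b64encode(pdf_bytes).decode("ascii")}
```

The returned dictionary can be merged with the Tesseract options before being serialised as the request body.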

Authentication

Send your API key in the CLIENT-API-KEY header.

Get your free API key at https://pdfapihub.com. Full API documentation is available at https://pdfapihub.com/docs.
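One way to keep the key out of source code is to read it from the environment when building the header. This is a sketch under assumptions: the variable name `PDFAPIHUB_API_KEY` and the helper `build_auth_headers` are hypothetical and not defined by the skill. Only the `CLIENT-API-KEY` header name comes from the text above.

```python
from typing import Mapping


def build_auth_headers(env: Mapping[str, str],
                       var_name: str = "PDFAPIHUB_API_KEY") -> dict:
    """Build the authentication header from an environment-like mapping."""
    # var_name is a hypothetical variable name chosen for this example.
    api_key = env.get(var_name, "").strip()
    if not api_key:
        raise RuntimeError(f"{var_name} is not set")
    return {"CLIENT-API-KEY": api_key}
```

In a script this would typically be called as `build_auth_headers(os.environ)`.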

Use Cases

  • Scanned Invoice Processing — OCR scanned PDF invoices to extract text for accounting systems
  • Legacy Document Digitization — Convert old scanned paper documents into searchable text
  • Insurance Claims — Extract text from scanned claim forms and medical documents
  • Legal Discovery — OCR scanned legal documents for full-text search and review
  • Multi-Language Documents — Process documents in Hindi, French, German, etc. with language-specific models
  • Form Digitization — Extract filled field values from scanned paper forms

Tesseract Configuration

Param           Default  Description
lang            eng      Language code(s), joined with + (e.g. eng+hin)
psm             3        Page segmentation mode (0–13)
oem             3        OCR engine mode (0=legacy, 1=LSTM, 3=default)
dpi             200      Rasterisation DPI (72–400)
char_whitelist  -        Restrict recognition to specific characters
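The defaults and ranges in the table can be checked on the client before a request is sent. The sketch below is illustrative: `build_ocr_options` is a hypothetical helper, and whether the service itself rejects out-of-range values is not stated in the source. Only the `oem` values listed in the table are accepted here.

```python
DEFAULTS = {"lang": "eng", "psm": 3, "oem": 3, "dpi": 200}


def build_ocr_options(**overrides) -> dict:
    """Merge overrides into the documented defaults and check documented ranges."""
    options = {**DEFAULTS, **overrides}
    # lang is one or more codes joined by '+', e.g. "eng" or "eng+hin".
    if not all(part for part in options["lang"].split("+")):
        raise ValueError("lang must be one or more codes joined by '+'")
    if not 0 <= options["psm"] <= 13:
        raise ValueError("psm must be between 0 and 13")
    # Only the values listed in the table are accepted in this sketch.
    if options["oem"] not in (0, 1, 3):
        raise ValueError("oem must be 0, 1, or 3")
    if not 72 <= options["dpi"] <= 400:
        raise ValueError("dpi must be between 72 and 400")
    return options
```

For digits-only extraction such as invoice amounts, a call like `build_ocr_options(char_whitelist="0123456789")` passes the whitelist through unchanged alongside the defaults.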

Example Usage

curl -X POST https://pdfapihub.com/api/v1/pdf/ocr/parse \
  -H "CLIENT-API-KEY: your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://pdfapihub.com/sample-pdfinvoice-with-image.pdf",
    "pages": "1-3",
    "lang": "eng",
    "dpi": 300,
    "detail": "words"
  }'
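The same request can be sketched in Python with only the standard library. The endpoint, header names, and body fields are copied from the curl example. The helper names `build_ocr_request` and `run_ocr`, the 60-second timeout, and the assumption that the response body is JSON are illustrative and not confirmed by the source.

```python
import json
import urllib.request

ENDPOINT = "https://pdfapihub.com/api/v1/pdf/ocr/parse"


def build_ocr_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"CLIENT-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def run_ocr(api_key: str, payload: dict, timeout: float = 60.0) -> dict:
    """Send the request; assumes (unverified) that the response body is JSON."""
    request = build_ocr_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating request construction from sending makes it possible to inspect the headers and body without making a network call.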
Security Recommendations
This skill appears coherent and only forwards PDFs to pdfapihub.com for OCR using an API key you provide. Before installing or using it: (1) verify and trust the pdfapihub.com service (privacy, retention, and security policies) because any uploaded PDF — potentially containing sensitive data — will be sent to that third party; (2) avoid using production or highly sensitive documents until you’ve tested with non-sensitive samples; (3) manage the API key carefully (use a dedicated key with least privilege and rotate it if possible); and (4) note that the skill owner and homepage are unknown — consider this when deciding whether to trust the service.
Functional Analysis
Type: OpenClaw Skill
Name: pdf-ocr-parse
Version: 1.0.0
The skill is a standard API wrapper for a cloud-based OCR service (pdfapihub.com). The files (SKILL.md, skill.json) correctly define parameters for Tesseract OCR processing and do not contain any evidence of malicious execution, data exfiltration beyond the stated purpose, or prompt injection attacks.
Capability Tags
requires-sensitive-credentials
Capability Assessment
Purpose & Capability
The name/description, SKILL.md, example.json, and skill.json all describe the same behavior: submit a PDF (URL/base64/file) to pdfapihub.com for OCR and return text/bounding boxes. The required capabilities (API key in header) match the stated purpose.
Instruction Scope
Runtime instructions are narrowly scoped to uploading or referencing a PDF and configuring Tesseract params (lang, dpi, psm, etc.). The SKILL.md does not instruct the agent to read unrelated files, environment variables, or system state.
Install Mechanism
No install spec or code is included (instruction-only), so nothing is written to disk or fetched during installation. This is the lowest-risk install model and is proportionate for an API-wrapping skill.
Credentials
No platform environment variables are required, which matches the registry metadata. The skill does require an API key provided in the CLIENT-API-KEY header (skill.json marks auth as required) — this is expected for an external API but is a credential that will be sent to pdfapihub.com and should be provisioned per your security policies.
Persistence & Privilege
always is false and the skill is user-invocable (normal). It does not request persistent system privileges or modify other skills' configuration. Autonomous invocation is permitted by default but does not introduce extra incoherence here.
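How to Use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install pdf-ocr-parse
  3. Once installed, invoke the skill by name or trigger it with /pdf-ocr-parse
  4. Provide the required inputs described in the skill's parameter reference to get structured output
Version History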
v1.0.0
OCR scanned PDFs using Tesseract. Rasterises pages at configurable DPI, then runs OCR with multi-language support (eng+hin, eng+fra, etc.). Returns per-page text with confidence scores and optional word-level bounding boxes.
Metadata
Slug pdf-ocr-parse
Version 1.0.0
License MIT-0
Total installs 1
Current installs 1
Versions 1
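FAQ

What is PDF OCR Parse?

Extract text from scanned PDFs using Tesseract OCR. Supports multiple languages, page selection, DPI control, and word-level bounding boxes. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 79 total downloads to date.

How do I install PDF OCR Parse?

Run the command "/install pdf-ocr-parse" in the OpenClaw or Claude Code chat box to install it in one step. No additional installation setup is required; requests still need an API key sent in the CLIENT-API-KEY header.

Is PDF OCR Parse free?

Yes. PDF OCR Parse is free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does PDF OCR Parse support?

PDF OCR Parse is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed PDF OCR Parse?

It is developed and maintained by Rishabh Dugar (@rishabhdugar). The current version is v1.0.0.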
