pdf-ocr-extraction
by bilicen700 · v1.0.3 · MIT-0
579 Downloads · 1 Star · 0 Active Installs · 4 Versions
Install in OpenClaw
/install pdf-ocr-extraction
Description
Extract text from image-based or scanned PDFs using Tesseract OCR.
Usage Guidance
This skill appears to do what it says, but take these precautions before installing or running it:
- Verify and install tesseract from your distro/vendor and make sure required language packs (e.g., eng, chi_sim) are present; the skill will not auto-download them.
- Install Python packages from a trusted source (PyPI via pip) and pin versions if you care about supply-chain consistency. pypdfium2 and Pillow ship native code, and pytesseract invokes the external tesseract binary, so review the wheels and the binary if you require extra assurance.
- Run OCR in a restricted environment (container or dedicated VM) if processing untrusted PDFs, as OCR returns document text which may contain sensitive data.
- Improve the example by using secure temporary-file APIs (tempfile.NamedTemporaryFile or mkstemp) rather than predictable filenames in /tmp to avoid symlink/race attacks.
- Because the skill's source is unknown, review or run the example code locally before granting any automated agent access; do not expose sensitive documents to an unreviewed or autonomous agent without oversight.
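The temporary-file precaution above can be sketched as follows. This is a minimal illustration and not the skill's own script: the helper names `private_page_dir` and `page_image_path` are hypothetical. It relies on the standard library's `tempfile.TemporaryDirectory`, which creates a directory with an unpredictable name that only the current user can access, and removes it when the block exits.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def private_page_dir(prefix="pdf_ocr_"):
    """Yield a fresh private directory and delete it (with its contents) on exit.

    Hypothetical helper: replaces fixed names such as /tmp/page_{i}.png with
    files inside a randomly named directory created via mkdtemp.
    """
    with tempfile.TemporaryDirectory(prefix=prefix) as workdir:
        yield workdir


def page_image_path(workdir, page_index):
    """Build the path for one rendered page inside the private directory."""
    return os.path.join(workdir, f"page_{page_index:04d}.png")


if __name__ == "__main__":
    with private_page_dir() as workdir:
        path = page_image_path(workdir, 0)
        # A real script would save the rendered page image here before OCR.
        with open(path, "wb") as handle:
            handle.write(b"placeholder")
        print("wrote", path)
    # The directory and every page image in it are gone at this point.
```

Because cleanup is tied to the `with` block, page images are removed even if OCR raises partway through a document.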
Capability Analysis
Type: OpenClaw Skill
Name: pdf-ocr-extraction
Version: 1.0.3
The skill provides a standard implementation for performing OCR on scanned PDFs using Tesseract and common Python libraries (pypdfium2, pytesseract, Pillow). The code snippet in SKILL.md performs local processing, includes basic cleanup of temporary image files in /tmp/, and contains no indicators of malicious intent, data exfiltration, or prompt injection.
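For orientation, the render-then-OCR flow described above might look roughly like the sketch below. It is not the code from SKILL.md: the function names, the 300 DPI default, and the `eng+chi_sim` language string are assumptions chosen for illustration. The sketch also hands each rendered page to pytesseract in memory instead of writing `page_{i}.png` itself, which differs from the listing's description of temporary files in /tmp.

```python
def dpi_to_scale(dpi):
    """Convert a target DPI to a pypdfium2 render scale (scale 1.0 is 72 DPI)."""
    return dpi / 72.0


def ocr_pdf(pdf_path, lang="eng+chi_sim", dpi=300):
    """Return a list with the OCR text of each page, in page order.

    Assumes pypdfium2, pytesseract, Pillow, the tesseract binary, and the
    requested language packs are already installed.
    """
    # Imported lazily so the helper above works without the OCR stack present.
    import pypdfium2 as pdfium
    import pytesseract

    pdf = pdfium.PdfDocument(pdf_path)
    try:
        texts = []
        for index in range(len(pdf)):
            page = pdf[index]
            # Render the page to a bitmap, then convert it to a PIL image.
            image = page.render(scale=dpi_to_scale(dpi)).to_pil()
            texts.append(pytesseract.image_to_string(image, lang=lang))
        return texts
    finally:
        pdf.close()


if __name__ == "__main__":
    import sys

    for number, text in enumerate(ocr_pdf(sys.argv[1]), start=1):
        print(f"--- page {number} ---")
        print(text)
```

A higher DPI generally gives Tesseract more pixels per glyph at the cost of memory and time per page, so the default here is a starting point rather than a recommendation from the skill.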
Capability Assessment
Purpose & Capability
Name, description, required binaries (tesseract, python3), and Python dependencies (pypdfium2, pytesseract, Pillow) are exactly what you'd expect for a local Tesseract-based PDF OCR tool. No unrelated credentials, binaries, or config paths are requested.
Instruction Scope
SKILL.md contains concrete instructions to render PDF pages to images, OCR them, and clean up temporary files in /tmp — this is appropriate for the purpose. Note: example code uses predictable filenames (/tmp/page_{i}.png) which can be vulnerable to race/symlink attacks on multi-user systems; it also assumes language packs are present and instructs not to auto-download them. The skill will read full document contents (expected for OCR) — treat sensitive PDFs accordingly.
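Since the skill assumes language packs are already present, a caller may want to check for them before running OCR. The sketch below is one possible approach, not part of the skill: it parses the output of `tesseract --list-langs`, and the helper names are hypothetical. Some Tesseract versions print this listing to stderr, so both streams are read.

```python
import subprocess


def parse_list_langs(output):
    """Extract language codes from `tesseract --list-langs` output.

    The first line is a header such as 'List of available languages ...';
    every other non-empty line is treated as one language code.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return {line for line in lines if not line.lower().startswith("list of")}


def missing_languages(required, tesseract_cmd="tesseract"):
    """Return the required language codes that Tesseract does not report."""
    result = subprocess.run(
        [tesseract_cmd, "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    installed = parse_list_langs(result.stdout + "\n" + result.stderr)
    return sorted(set(required) - installed)


if __name__ == "__main__":
    missing = missing_languages(["eng", "chi_sim"])
    if missing:
        print("Install these language packs first:", ", ".join(missing))
    else:
        print("All required language packs are present.")
```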
Install Mechanism
Install metadata uses a system package for tesseract and a 'uv' entry to install pypdfium2, pytesseract, and Pillow. This is proportional to the task. 'uv' corresponds to installing Python packages (moderate trust surface because wheels/binaries may include native code), but no arbitrary downloads or unfamiliar hosts are specified.
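Because prerequisites are installed ahead of time, a small preflight check can confirm they are in place before the skill runs. The following is a hedged sketch using only the standard library; the function name and the `binary:` / `package:` labels are illustrative, and the lists simply mirror the binaries and packages named in this listing. Note that Pillow is imported as `PIL`.

```python
import importlib.util
import shutil

# Binaries and Python packages named in the listing (package -> import name).
REQUIRED_BINARIES = ("tesseract", "python3")
REQUIRED_MODULES = {
    "pypdfium2": "pypdfium2",
    "pytesseract": "pytesseract",
    "Pillow": "PIL",
}


def missing_prerequisites(binaries=REQUIRED_BINARIES, modules=REQUIRED_MODULES):
    """Return labels for every binary or package that cannot be found."""
    missing = [f"binary:{name}" for name in binaries if shutil.which(name) is None]
    missing += [
        f"package:{package}"
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    if problems:
        print("Missing prerequisites:", ", ".join(problems))
    else:
        print("All prerequisites found.")
```

The check only reports what is missing and installs nothing, which matches the guidance to avoid automated installation at runtime.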
Credentials
No environment variables, credentials, or config paths are requested. The absence of secrets is consistent with a purely local OCR tool.
Persistence & Privilege
The skill is not forced-available (always: false) and does not request persistent system-wide privileges or modify other skills. It can be invoked autonomously by the agent (platform default) — this is normal but means agents could OCR documents if given access.
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install pdf-ocr-extraction
- After installation, invoke the skill by name or use /pdf-ocr-extraction
- Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.3
**Minor update for clarity and metadata improvements.**
- Improved documentation with clear separation of dependencies and quick start instructions.
- Added detailed metadata block defining required binaries and installation steps for both system and Python dependencies.
- Updated guidance to avoid running automated installations at runtime; users/environment must pre-install prerequisites.
- Enhanced security section: instructs users to store temporary images only in `/tmp/` and to clean them up immediately.
- Provided a full, copy-pasteable Python extraction script for better usability.
v1.0.2
- Highlighted that the skill is completely free, requires no third-party APIs, and has unlimited usage.
- No changes to workflows, installation, or technical details.
v1.0.1
- Documentation rewritten in English with clearer structure and concise instructions.
- No functional or code changes; core usage and workflow remain the same.
- Improved quick start and usage instructions for broader accessibility.
- Notes and caveats updated for clarity on OCR accuracy and system requirements.
v1.0.0
- Initial release of PDF OCR & Extraction skill.
- Enables text extraction from scanned or image-based PDFs using OCR, bypassing standard PDF parsing limitations.
- Automatically checks for and installs required dependencies: pypdfium2, pytesseract, Pillow.
- Processes each PDF page as a high-resolution image and extracts text using Tesseract OCR (supports Chinese and English).
- Designed for scenarios where standard PDF parsers fail due to encryption or missing text layer.
Frequently Asked Questions
What is pdf-ocr-extraction?
pdf-ocr-extraction is an AI agent skill for Claude Code / OpenClaw that extracts text from image-based or scanned PDFs using Tesseract OCR. It has 579 downloads so far.
How do I install pdf-ocr-extraction?
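Run "/install pdf-ocr-extraction" in the OpenClaw or Claude Code chat to install the skill in one step. The skill does not install its prerequisites at runtime, so Tesseract, the required language packs, and the Python packages (pypdfium2, pytesseract, Pillow) must already be present.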
Is pdf-ocr-extraction free?
Yes, pdf-ocr-extraction is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does pdf-ocr-extraction support?
pdf-ocr-extraction is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created pdf-ocr-extraction?
It is built and maintained by bilicen700 (@bilicen700); the current version is v1.0.3.