PDF OCR Using Gemini LLM
/install geminipdfocr
Purpose
Use geminipdfocr to extract text from PDF documents via OCR (Google Gemini).
Data and privacy
Full page images/files are sent to Google's API. PDFs are split into single-page files and each page is uploaded to Google Gemini for OCR. There are no hidden exfiltration endpoints or other data collection. Do not use with highly sensitive documents unless you accept that content is sent to Google.
Setup (venv installation)
Before first use, create and activate the virtual environment:
cd geminipdfocr && python -m venv venv && source venv/bin/activate && pip install -r requirements.txt
Set GOOGLE_API_KEY in your environment before running (e.g. export GOOGLE_API_KEY=your-key).
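A pre-flight check for the key might look like the Python sketch below. The helper name require_api_key is hypothetical and not part of geminipdfocr; it only confirms that the variable named above is present and non-empty.

```python
import os


def require_api_key(env=None):
    """Return GOOGLE_API_KEY from the environment, or raise if it is missing.

    Hypothetical helper for illustration; not part of geminipdfocr.
    """
    # Allow a mapping to be passed in so the check can be exercised without
    # touching the real process environment.
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GOOGLE_API_KEY is not set; export it before running geminipdfocr"
        )
    return key
```

Calling require_api_key() before launching the tool reports a missing key early, before any page is uploaded.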
How to use
When requested to extract text or perform OCR on a PDF:
- Run:
cd geminipdfocr && source venv/bin/activate && python -m geminipdfocr <path-to-pdf> [--json] [--output <file>]
- Use --json for structured data.
- Use --max-pages N for testing or very long documents.
- Use --quiet to suppress progress logs.
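The invocation above can also be driven from a script. The following is a minimal sketch, assuming the script runs with the activated venv's interpreter from inside the geminipdfocr directory. The helper names build_command and run_ocr are hypothetical; only the flags listed above are used.

```python
import subprocess
import sys


def build_command(pdf_path, json_output=False, output=None,
                  max_pages=None, quiet=False):
    """Build the argv list for one geminipdfocr run (hypothetical helper)."""
    # sys.executable is assumed to be the venv's Python interpreter.
    cmd = [sys.executable, "-m", "geminipdfocr", str(pdf_path)]
    if json_output:
        cmd.append("--json")
    if output is not None:
        cmd += ["--output", str(output)]
    if max_pages is not None:
        cmd += ["--max-pages", str(max_pages)]
    if quiet:
        cmd.append("--quiet")
    return cmd


def run_ocr(pdf_path, **options):
    """Run the CLI and return its stdout; raises on a non-zero exit code."""
    result = subprocess.run(
        build_command(pdf_path, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For a quick trial on a long document, run_ocr("scan.pdf", max_pages=2, quiet=True) limits the run to the first two pages and returns whatever the tool writes to stdout.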
Requirements
- A valid PDF file path.
- GOOGLE_API_KEY set in the process environment (e.g. export GOOGLE_API_KEY=your-key).
CLI options
| Option | Description |
|---|---|
| pdf_path | One or more PDF file paths (positional) |
| --max-pages N | Limit pages per PDF |
| --json | Output structured JSON instead of plain text |
| --output FILE | Write result to file (default: stdout) |
| --quiet | Suppress INFO/DEBUG logs |
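To make the option semantics concrete, here is one way such an interface could be declared with Python's argparse. This is an illustrative sketch, not the tool's actual parser: make_parser is a hypothetical name, and treating N as an integer is an assumption.

```python
import argparse


def make_parser():
    """Declare the documented options with argparse (illustrative only)."""
    parser = argparse.ArgumentParser(prog="geminipdfocr")
    # Positional: one or more PDF paths.
    parser.add_argument("pdf_path", nargs="+",
                        help="One or more PDF file paths")
    # Assumption: N is parsed as an integer page count.
    parser.add_argument("--max-pages", type=int, metavar="N",
                        help="Limit pages per PDF")
    parser.add_argument("--json", action="store_true",
                        help="Output structured JSON instead of plain text")
    parser.add_argument("--output", metavar="FILE",
                        help="Write result to file (default: stdout)")
    parser.add_argument("--quiet", action="store_true",
                        help="Suppress INFO/DEBUG logs")
    return parser


args = make_parser().parse_args(["a.pdf", "b.pdf", "--max-pages", "3", "--json"])
print(args.pdf_path, args.max_pages, args.json, args.output, args.quiet)
# prints: ['a.pdf', 'b.pdf'] 3 True None False
```

Under this reading, omitted flags fall back to their defaults: no page limit, plain-text output, and results written to stdout.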
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install geminipdfocr
- After installation, invoke the skill by name or use /geminipdfocr
- Provide required inputs per the skill's parameter spec and get structured output
What is PDF OCR Using Gemini LLM?
Extract text from PDFs using Google Gemini OCR. Use when extracting text from PDFs, performing OCR on scanned documents, or processing image-based PDFs. It is an AI Agent Skill for Claude Code / OpenClaw, with 306 downloads so far.
How do I install PDF OCR Using Gemini LLM?
Run "/install geminipdfocr" in the OpenClaw or Claude Code chat to install the skill. Before first use, create the virtual environment and set GOOGLE_API_KEY as described under Setup.
Is PDF OCR Using Gemini LLM free?
Yes, PDF OCR Using Gemini LLM is completely free (open-source). You can download, install and use it at no cost.
Which platforms does PDF OCR Using Gemini LLM support?
PDF OCR Using Gemini LLM is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created PDF OCR Using Gemini LLM?
It is built and maintained by Issam El Alaoui (@ashtonizmev); the current version is v0.1.7.