PDF OCR Using Gemini LLM
/install geminipdfocr
Purpose
Use geminipdfocr to extract text from PDF documents via OCR (Google Gemini).
Data and privacy
Full page images/files are sent to Google's API. PDFs are split into single-page files and each page is uploaded to Google Gemini for OCR. There are no hidden exfiltration endpoints or other data collection. Do not use with highly sensitive documents unless you accept that content is sent to Google.
Setup (venv installation)
Before first use, create and activate the virtual environment:
cd geminipdfocr && python -m venv venv && source venv/bin/activate && pip install -r requirements.txt
Set GOOGLE_API_KEY in your environment before running (e.g. export GOOGLE_API_KEY=your-key).
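A minimal pre-flight check, sketched in Python, can catch a missing key before any page is uploaded. The helper name `require_api_key` is hypothetical and is not part of geminipdfocr; only the `GOOGLE_API_KEY` variable name comes from the instructions above.

```python
import os


def require_api_key(env=None):
    """Return the GOOGLE_API_KEY value, or raise if it is missing or blank.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    This helper is illustrative and not part of the geminipdfocr package.
    """
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GOOGLE_API_KEY is not set; run: export GOOGLE_API_KEY=your-key"
        )
    return key
```

Calling `require_api_key()` at the top of a wrapper script fails fast with a readable message instead of an API error partway through a document.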
How to use
When requested to extract text or perform OCR on a PDF:
- Run:
  cd geminipdfocr && source venv/bin/activate && python -m geminipdfocr <path-to-pdf> [--json] [--output <file>]
- Use `--json` for structured data.
- Use `--max-pages N` for testing or very long documents.
- Use `--quiet` to suppress progress logs.
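The invocation above could also be driven from a Python script. The sketch below only assembles the flags listed in this section; the function names `build_command` and `run_ocr` are hypothetical, and it assumes the script runs under the venv's interpreter so that `python -m geminipdfocr` resolves.

```python
import subprocess
import sys


def build_command(pdf_path, json_output=False, output=None, max_pages=None, quiet=False):
    """Assemble the argument list for `python -m geminipdfocr`.

    Only flags documented in this section are emitted.
    """
    # sys.executable is assumed to be the venv's Python interpreter.
    cmd = [sys.executable, "-m", "geminipdfocr", str(pdf_path)]
    if json_output:
        cmd.append("--json")
    if output is not None:
        cmd += ["--output", str(output)]
    if max_pages is not None:
        cmd += ["--max-pages", str(max_pages)]
    if quiet:
        cmd.append("--quiet")
    return cmd


def run_ocr(pdf_path, cwd="geminipdfocr", **options):
    """Run the CLI from `cwd` and return its stdout as text.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    Relative PDF paths are resolved against `cwd`, so absolute paths are safer.
    """
    result = subprocess.run(
        build_command(pdf_path, **options),
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `run_ocr("/data/scan.pdf", max_pages=2, quiet=True)` would run a two-page trial before committing to a long document.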
Requirements
- A valid PDF file path.
- `GOOGLE_API_KEY` set in the process environment (e.g. `export GOOGLE_API_KEY=your-key`).
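The first requirement can be checked cheaply before invoking the tool. This sketch tests that the path is a regular file starting with the `%PDF-` header that PDF files begin with; it is not full validation, and the helper name `looks_like_pdf` is hypothetical rather than anything geminipdfocr provides.

```python
from pathlib import Path


def looks_like_pdf(path):
    """Return True if `path` is a regular file whose first bytes are '%PDF-'.

    This is a quick sanity check only: a file can pass it and still be
    corrupt, encrypted, or otherwise unreadable.
    """
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as fh:
        return fh.read(5) == b"%PDF-"
```

Rejecting obviously wrong inputs up front avoids sending non-PDF content to the OCR step.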
CLI options
| Option | Description |
|---|---|
| `pdf_path` | One or more PDF file paths (positional) |
| `--max-pages N` | Limit pages per PDF |
| `--json` | Output structured JSON instead of plain text |
| `--output FILE` | Write result to file (default: stdout) |
| `--quiet` | Suppress INFO/DEBUG logs |
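To make the table concrete, here is how an interface with these options might be declared with Python's `argparse`. This is an illustrative reconstruction from the table alone, not the tool's actual source; the real parser may differ in defaults, types, and help text.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the options table (illustrative only)."""
    parser = argparse.ArgumentParser(prog="geminipdfocr")
    # Positional: one or more PDF paths.
    parser.add_argument("pdf_path", nargs="+", help="One or more PDF file paths")
    # Assumed to be an integer page count.
    parser.add_argument("--max-pages", type=int, metavar="N", help="Limit pages per PDF")
    parser.add_argument(
        "--json",
        action="store_true",
        help="Output structured JSON instead of plain text",
    )
    # No value means the result goes to stdout.
    parser.add_argument("--output", metavar="FILE", help="Write result to file (default: stdout)")
    parser.add_argument("--quiet", action="store_true", help="Suppress INFO/DEBUG logs")
    return parser
```

Parsing `["a.pdf", "b.pdf", "--max-pages", "3", "--json"]` with this sketch yields two paths, a page limit of 3, and JSON output enabled, with `--output` left unset.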
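- Make sure OpenClaw is installed (local or Docker deployment).
- Enter the install command in the chat box: /install geminipdfocr
- Once installed, invoke the Skill by name or trigger it with /geminipdfocr.
- Provide the required inputs as described in the Skill's parameters to get structured output.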
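What is PDF OCR Using Gemini LLM?
Extract text from PDFs using Google Gemini OCR. Use when extracting text from PDFs, performing OCR on scanned documents, or processing image-based PDFs. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 306 downloads to date.
How do I install PDF OCR Using Gemini LLM?
Run /install geminipdfocr in the OpenClaw or Claude Code chat box to install it in one step. The install itself needs no extra configuration, but running the skill requires the venv setup and GOOGLE_API_KEY described under Setup.
Is PDF OCR Using Gemini LLM free?
Yes. PDF OCR Using Gemini LLM is free and open source, and can be freely downloaded, installed, and used.
Which platforms does PDF OCR Using Gemini LLM support?
PDF OCR Using Gemini LLM is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who developed PDF OCR Using Gemini LLM?
It is developed and maintained by Issam El Alaoui (@ashtonizmev). The current version is v0.1.7.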