PDF OCR Using Gemini LLM

Name: PDF OCR Using Gemini LLM
Author: ashtonizmev

By Issam El Alaoui · GitHub · v0.1.7

cross-platform ✓ Security check passed

Total downloads: 306

Install in OpenClaw

/install geminipdfocr

Description

Extract text from PDFs using Google Gemini OCR. Use when extracting text from PDFs, performing OCR on scanned documents, or processing image-based PDFs.

Usage instructions (SKILL.md)

Purpose

Use geminipdfocr to extract text from PDF documents via OCR (Google Gemini).

Data and privacy

Full page images/files are sent to Google's API. PDFs are split into single-page files and each page is uploaded to Google Gemini for OCR. There are no hidden exfiltration endpoints or other data collection. Do not use with highly sensitive documents unless you accept that content is sent to Google.

Setup (venv installation)

Before first use, create and activate the virtual environment:

cd geminipdfocr && python -m venv venv && source venv/bin/activate && pip install -r requirements.txt

Set GOOGLE_API_KEY in your environment before running (e.g. export GOOGLE_API_KEY=your-key).
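A wrapper script may want to fail early when the key is missing instead of discovering it mid-run. The following is a minimal sketch of such a pre-flight check. The helper name `require_api_key` is hypothetical and not part of geminipdfocr; only the variable name GOOGLE_API_KEY comes from the documentation above.

```python
import os


def require_api_key(env=None):
    """Return GOOGLE_API_KEY from the environment, or raise if it is unset or blank."""
    # `env` is injectable for testing; by default the real process environment is used.
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set; export it before running geminipdfocr")
    return key
```

Calling `require_api_key()` at the top of a script gives a clear error message before any PDF is touched.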

How to use

When requested to extract text or perform OCR on a PDF:

Run: cd geminipdfocr && source venv/bin/activate && python -m geminipdfocr <path-to-pdf> [--json] [--output <file>]
Use --json for structured data.
Use --max-pages N for testing or very long documents.
Use --quiet to suppress progress logs.
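When the CLI is driven from another Python program, the command line above can be assembled from the documented flags. This is a sketch under assumptions: the helper names `build_command` and `run_ocr` are hypothetical, and the script is assumed to run inside the activated virtual environment, from the same directory as the Run line, so that `sys.executable` can import the package.

```python
import subprocess
import sys


def build_command(pdf_path, json_output=False, output=None, max_pages=None, quiet=False):
    """Assemble the argument list for `python -m geminipdfocr` from the documented flags."""
    cmd = [sys.executable, "-m", "geminipdfocr", str(pdf_path)]
    if json_output:
        cmd.append("--json")
    if max_pages is not None:
        cmd += ["--max-pages", str(max_pages)]
    if output is not None:
        cmd += ["--output", str(output)]
    if quiet:
        cmd.append("--quiet")
    return cmd


def run_ocr(pdf_path, **options):
    """Run the CLI and return its stdout; raises CalledProcessError on a non-zero exit."""
    result = subprocess.run(
        build_command(pdf_path, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with PDF paths that contain spaces.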

Requirements

A valid PDF file path.
GOOGLE_API_KEY set in the process environment (e.g. export GOOGLE_API_KEY=your-key).

CLI options

Option	Description
`pdf_path`	One or more PDF file paths (positional)
`--max-pages N`	Limit pages per PDF
`--json`	Output structured JSON instead of plain text
`--output FILE`	Write result to file (default: stdout)
`--quiet`	Suppress INFO/DEBUG logs
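To make the option table concrete, here is a sketch of an `argparse` parser that accepts the same arguments. It mirrors the table only; it is not the skill's actual source, and the function name `make_parser` is hypothetical.

```python
import argparse


def make_parser():
    """Build a parser with the same options as the table above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="geminipdfocr")
    parser.add_argument("pdf_path", nargs="+", help="One or more PDF file paths")
    parser.add_argument("--max-pages", type=int, metavar="N", help="Limit pages per PDF")
    parser.add_argument("--json", action="store_true",
                        help="Output structured JSON instead of plain text")
    parser.add_argument("--output", metavar="FILE",
                        help="Write result to file (default: stdout)")
    parser.add_argument("--quiet", action="store_true", help="Suppress INFO/DEBUG logs")
    return parser


# Example: two PDFs, at most three pages each, JSON output to stdout.
args = make_parser().parse_args(["a.pdf", "b.pdf", "--max-pages", "3", "--json"])
print(args.pdf_path, args.max_pages, args.json, args.output, args.quiet)
```

Here `--max-pages` applies per PDF, matching the table's description, and `--output` stays `None` when the result should go to stdout.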

Safe usage advice

This skill appears to be what it says: it splits PDFs into single-page files and uploads them to Google Gemini for OCR, and it requires only GOOGLE_API_KEY. Before installing, consider: (1) privacy — full page images are sent to Google, so do not use with highly sensitive documents unless acceptable; (2) cost and quotas — large PDFs mean many uploads and API usage billed against your API key; (3) secure the GOOGLE_API_KEY (don’t paste it into logs or share it); (4) review and pin package versions if you want reproducible installs; (5) test on non-sensitive sample PDFs first to confirm behavior. If you need guarantees about retention or want OCR to run locally, consider a local OCR solution instead.
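For the cost point, a rough upload count can be worked out before running a batch. The sketch below assumes one upload per page, as described above, and that `--max-pages` caps each PDF separately; it does not count retries or any other API calls. The function name `pages_to_upload` is hypothetical.

```python
def pages_to_upload(page_counts, max_pages=None):
    """Estimate how many single-page uploads a batch needs, honoring a per-PDF page limit."""
    if max_pages is None:
        return sum(page_counts)
    return sum(min(count, max_pages) for count in page_counts)
```

For example, two PDFs of 10 and 250 pages need 260 uploads without a limit and 30 with a limit of 20 pages per PDF.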

Functional analysis

Type: OpenClaw Skill
Name: geminipdfocr
Version: 0.1.7

The geminipdfocr skill is a legitimate tool designed to perform OCR on PDF documents using the Google Gemini API. The code follows standard practices, using PyMuPDF for PDF splitting and the official google-genai library for API interactions. It includes clear documentation in SKILL.md regarding data privacy (disclosing that files are sent to Google) and lacks any indicators of data exfiltration to unauthorized endpoints, malicious execution, or persistence mechanisms.

Capability assessment

✓ Purpose & Capability

Name/description, required env (GOOGLE_API_KEY), listed Python packages (google-genai, pymupdf), CLI entry point, and code all align with a PDF OCR tool that uploads pages to Google's Gemini API.

ℹ Instruction Scope

The SKILL.md and code explicitly split PDFs into single-page files and upload full page files to Google's API for OCR. This behaviour is documented in the README and implemented in gemini_client.py (files.upload + models.generate_content). There are no apparent instructions or code that read unrelated files, other env vars, or send data to unknown endpoints, but note that entire page images are transmitted to Google (privacy/cost implication).
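The upload-then-generate pattern named above (files.upload followed by models.generate_content) can be sketched as follows. This is not the code in gemini_client.py: the function name `ocr_page`, the default prompt, and the choice of model are assumptions for illustration. The client is passed in so the flow can be exercised without network access.

```python
def ocr_page(client, page_path, model, prompt="Extract all text from this page."):
    """Upload one single-page PDF and ask the model to transcribe it.

    `client` is expected to behave like google.genai.Client. The default prompt
    is an assumption, not the prompt the skill actually uses.
    """
    # Step 1: upload the page file; the returned handle is passed to the model.
    uploaded = client.files.upload(file=page_path)
    # Step 2: request a transcription of the uploaded page.
    response = client.models.generate_content(model=model, contents=[uploaded, prompt])
    return response.text


# Usage with the real library (requires google-genai and a valid key):
#   import os
#   from google import genai
#   client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
#   text = ocr_page(client, "page-001.pdf", model="<a Gemini model name>")
```

Each call sends the full page file to Google, which is the privacy and cost implication noted above.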

✓ Install Mechanism

Dependencies are standard Python packages (google-genai, pymupdf, pydantic, pydantic-settings) and a requirements.txt is included. No downloads from custom URLs or extracts from arbitrary hosts are present.

✓ Credentials

Only GOOGLE_API_KEY is required and declared as the primary credential. That single key is appropriate and required for the Google Gemini client used by the skill. No unrelated secrets or config paths are requested.

✓ Persistence & Privilege

The skill is not always-enabled, does not modify other skills, and only writes temporary files under the system temp directory (cleans up after processing). It does not request elevated system persistence.
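One common way to get this write-then-clean-up behavior in Python is `tempfile.TemporaryDirectory`. The sketch below illustrates the pattern only; the skill's own cleanup code is not shown in this listing and may differ. The function name `with_scratch_pages` and the file naming scheme are hypothetical.

```python
import tempfile
from pathlib import Path


def with_scratch_pages(page_blobs):
    """Write each page to a temp directory, process it, and remove everything on exit."""
    with tempfile.TemporaryDirectory(prefix="geminipdfocr-") as tmp:
        paths = []
        for index, blob in enumerate(page_blobs, start=1):
            path = Path(tmp) / f"page-{index:03d}.pdf"
            path.write_bytes(blob)
            paths.append(path)
        # Stand-in for the real per-page work (for example, uploading each file).
        sizes = [p.stat().st_size for p in paths]
    # The directory and its files have been deleted by this point.
    return sizes, [p.exists() for p in paths]
```

The context manager removes the directory even when processing raises an exception, so page files do not linger under the system temp directory.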

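How to use

Make sure OpenClaw is installed (locally or via Docker).
Enter the install command in the chat box: /install geminipdfocr
Once installed, invoke the Skill by name or trigger it with /geminipdfocr
Provide the required inputs according to the Skill's parameter documentation to get structured output.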

Version history

v0.1.7

- Clarified data and privacy section to explicitly state that full page images/files are sent to Google's API.
- Added note that there are no hidden exfiltration endpoints or other data collection.
- Improved warning for users about using the skill with highly sensitive documents.

v0.1.6

- Added required Python package dependencies to the skill metadata for easier installation: google-genai, pymupdf, pydantic, and pydantic-settings.
- No changes to functionality or usage.

v0.1.5

- Added a new metadata section for openclaw, specifying environment requirements.
- Declared GOOGLE_API_KEY as the primary required environment variable in metadata.
- No changes to functionality or usage instructions.

v0.1.4

- Switched configuration to require the GOOGLE_API_KEY environment variable to be set in the process environment, instead of loading from a local .env file.
- Updated documentation to reflect the new authentication setup, removing instructions related to .env files.

v0.1.3

- Configuration now explicitly loads only `geminipdfocr/.env`, using a path relative to the package rather than the current working directory.
- Updated documentation to clarify `.env` file loading behavior.
- No other major changes or user-facing features added.

v0.1.2

- Project renamed from "geminipdf" to "geminipdfocr" throughout all files.
- Updated documentation and setup instructions to reflect the new name.
- Clarified configuration: now only reads environment variables from `geminipdfocr/.env`, not any parent directories.
- Improved privacy note and setup details in SKILL.md.
- Minor text and description improvements in CLI help and metadata.

v0.1.1

- Added an explicit warning about data and privacy, noting that PDF content is uploaded to Google Gemini for OCR.
- Documented the required GOOGLE_API_KEY environment variable in metadata.
- No functional changes to code; updates are documentation-only.

v0.1.0

- Initial release of Geminipdf OCR.
- Extract text from PDFs using Google Gemini OCR, supporting scanned and image-based documents.
- Command-line interface with options for JSON output, limiting pages, and quiet mode.
- Requires GOOGLE_API_KEY configuration for operation.
- Outputs can be saved to a file or printed to stdout.

Metadata

Slug: geminipdfocr

Version: 0.1.7

License: not specified

Total installs: 1

Current installs: 1

Version count: 8

FAQ