← Back to Skills Marketplace
scottkiss

Doc Ocr Skills

by sirk · GitHub ↗ · v0.1.0 · MIT-0
cross-platform ⚠ suspicious
495
Downloads
0
Stars
1
Active Installs
1
Versions
Install in OpenClaw
/install doc-ocr-skills
Description
OCR documents (PDFs and images) using Gemini 2.5 Flash, PaddleOCR (local), or RapidOCR (local).
README (SKILL.md)

Document OCR Skill (docr)

Uses Gemini 2.5 Flash, PaddleOCR, or RapidOCR (local) to recognize text from scanned PDFs and images. Compiled as a single Go binary.

Prerequisites

  • API Key configured in ~/.ocr/config (not needed for Paddle/Rapid)
  • For RapidOCR engine: pip install rapidocr_onnxruntime
  • For PaddleOCR engine: pip install paddleocr paddlepaddle

API Key Configuration

Create the config file:

mkdir -p ~/.ocr
cat > ~/.ocr/config \x3C\x3C EOF
# Google Gemini API Key
gemini_api_key=your_gemini_key
EOF

Quick Start

Path Variable: All commands below use $DOCR. Before running any command, set this variable:

SKILL_DIR="$(cd "$(dirname "\x3Cpath-to-this-SKILL.md>")" && pwd)"
DOCR="$SKILL_DIR/scripts/docr/docr"
# OCR a single document using RapidOCR (default)
$DOCR document.pdf
$DOCR image.jpg

# Use Gemini engine
$DOCR -engine gemini document.pdf

# Use PaddleOCR local engine
$DOCR -engine paddle document.pdf

# Specify output file
$DOCR document.pdf -o result.txt

# Batch process all supported files in a directory
$DOCR -batch ./docs/ -o ./outputs/

Engines

Engine Flag API Key Config Doc Handling
RapidOCR (default) -engine rapid None Local OCR
Gemini -engine gemini gemini_api_key Cloud Vision API
PaddleOCR (local) -engine paddle None Local OCR

CLI Reference

docr [options] \x3Cfile or directory>

Options:
  -engine string   OCR engine: rapid (default) / gemini / paddle
  -e string        Engine (short flag)
  -o string        Output file path or directory (batch mode)
  -output string   Output path (long flag)
  -batch           Batch mode: process all files in directory
  -prompt string   Custom recognition prompt (gemini)

Installation

We provide pre-compiled binaries to get you started quickly.

cd doc-ocr-skills/scripts
./install.sh

This script will detect your OS (darwin/linux) and architecture (amd64/arm64) and download the appropriate version of docr.

Building from Source (Optional)

If you prefer to build from source, ensure you have Go 1.21+ installed:

cd doc-ocr-skills/scripts/docr
go build -o docr .

Error Handling

Error Solution
config file not found Create ~/.ocr/config with API keys
gemini_api_key not found Add gemini_api_key=VALUE to config
file not found Verify the document file path
API timeout Retry; large files may need longer
Usage Guidance
This package looks like a real OCR tool, but take precautions before installing: - Confirm the source: README and install script point to github.com/scottkiss/..., but the registry owner ID differs — verify the correct repository and publisher before running anything. - Avoid piping remote scripts directly into bash. Instead download the install.sh, inspect it, and run it locally, or build from source (Go build) if possible. - The installer downloads a precompiled binary from GitHub releases with no checksum or signature; prefer building from source or ask the author for a checksum/signature to verify integrity. - If you use the Gemini engine, documents will be sent to a cloud service. Do not send sensitive documents unless you understand the privacy implications and trust the endpoint/service. - The skill expects ~/.ocr/config to hold your gemini_api_key but the registry metadata doesn't declare this—be careful where you store API keys and set restrictive file permissions (chmod 600 ~/.ocr/config). - Local engine dependencies (pip installs) run arbitrary Python code during installation; consider using a virtualenv or container. If you need higher assurance, request the upstream source repository and a verifiable release (checksums/signatures) or build the binary yourself from the provided source directory.
Capability Analysis
Type: OpenClaw Skill Name: doc-ocr-skills Version: 0.1.0 The skill bundle's installation script (scripts/install.sh) downloads and executes a pre-compiled binary from an external GitHub repository (github.com/scottkiss/doc-ocr), which is a high-risk behavior as the binary's source code is not included in the bundle for verification. Additionally, the documentation (SKILL.md and README.md) instructs users to store sensitive Gemini API keys in a plaintext configuration file at ~/.ocr/config. While these are common patterns for legitimate CLI tools, the reliance on opaque remote binaries and plaintext secret storage meets the threshold for a suspicious classification.
Capability Assessment
Purpose & Capability
The declared purpose — OCR via Gemini (cloud) or local engines — matches the instructions and files: CLI, local engine pip requirements, and an optional Gemini API key. Nothing in the code or docs indicates unrelated capabilities, but the README references a GitHub repo/user (scottkiss) while the registry owner ID differs, which is an unexplained metadata mismatch.
Instruction Scope
SKILL.md instructs creating ~/.ocr/config with a gemini_api_key and running the downloaded binary. It also recommends installing Python packages for local engines. The instructions rely on a local config file (not declared in metadata) and explicitly support a cloud engine (Gemini) which will send documents to a remote service — this is scope-relevant but privacy-impacting and should be made explicit to users. The Quick Start guidance to compute SKILL_DIR from the path to SKILL.md is unusual but not harmful.
Install Mechanism
There is no packaged install spec, but scripts/install.sh downloads a precompiled binary from a GitHub releases URL (https://github.com/scottkiss/doc-ocr/releases/...). Downloading binaries from GitHub releases is common but has risk: no checksum or signature verification is provided. README also suggests using curl | bash to fetch and run the install script, which elevates risk if the remote content is tampered with. The download host itself (github.com) is a known release host, so risk is moderate rather than high.
Credentials
Skill metadata declared no required env vars or config paths, but runtime instructions require a configuration file at ~/.ocr/config to hold gemini_api_key for the Gemini engine. This file-plus-secret requirement is not reflected in the registry metadata. Requesting a cloud API key (through a local config file) is proportionate to supporting a cloud engine, but the mismatch in declared vs. required config is an incoherence and potential surprise to users.
Persistence & Privilege
The skill does not request always-enabled or elevated platform privileges. It installs a binary into the skill directory and does not modify other skills or system-wide configs. Autonomous invocation is allowed by default (normal).
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install doc-ocr-skills
  3. After installation, invoke the skill by name or use /doc-ocr-skills
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.1.0
- Initial release of doc-ocr-skills: an OCR utility for scanned PDFs and images. - Supports three OCR engines: Gemini 2.5 Flash (cloud), PaddleOCR (local), and RapidOCR (local, default). - Simple CLI with commands for single/multiple documents, flexible engine selection, and output options. - Requires minimal setup, with easy installation script and optional local Python dependencies. - Provides clear error messages and troubleshooting steps.
Metadata
Slug doc-ocr-skills
Version 0.1.0
License MIT-0
All-time Installs 1
Active Installs 1
Total Versions 1
Frequently Asked Questions

What is Doc Ocr Skills?

OCR documents (PDFs and images) using Gemini 2.5 Flash, PaddleOCR (local), or RapidOCR (local). It is an AI Agent Skill for Claude Code / OpenClaw, with 495 downloads so far.

How do I install Doc Ocr Skills?

Run "/install doc-ocr-skills" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Doc Ocr Skills free?

Yes, Doc Ocr Skills is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Doc Ocr Skills support?

Doc Ocr Skills is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created Doc Ocr Skills?

It is built and maintained by sirk (@scottkiss); the current version is v0.1.0.

💬 Comments