scottkiss

Doc Ocr Skills

Author: sirk · GitHub · v0.1.0 · MIT-0
cross-platform · ⚠ suspicious
Total downloads: 495
Favorites: 0
Current installs: 1
Versions: 1
Install in OpenClaw
/install doc-ocr-skills
Description
OCR documents (PDFs and images) using Gemini 2.5 Flash, PaddleOCR (local), or RapidOCR (local).
Usage (SKILL.md)

Document OCR Skill (docr)

Recognizes text in scanned PDFs and images using Gemini 2.5 Flash (cloud), PaddleOCR (local), or RapidOCR (local, default). Compiled as a single Go binary.

Prerequisites

  • For the Gemini engine: API key configured in ~/.ocr/config (not needed for PaddleOCR or RapidOCR)
  • For RapidOCR engine: pip install rapidocr_onnxruntime
  • For PaddleOCR engine: pip install paddleocr paddlepaddle

API Key Configuration

Create the config file:

mkdir -p ~/.ocr
cat > ~/.ocr/config << EOF
# Google Gemini API Key
gemini_api_key=your_gemini_key
EOF
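
Before running the Gemini engine, it can help to confirm that the config file exists, is readable only by you, and actually contains a key. The following is a minimal sketch, assuming the file uses the plain key=value format shown above. The helper names read_config_key and check_ocr_config are hypothetical, and docr's own config parser may behave differently.

```shell
# Hypothetical helper: print the value of KEY from a key=value config file.
# Comment lines (starting with '#') never match because the pattern is anchored.
read_config_key() {
  local file="$1" key="$2"
  # Strip "key=" (with optional surrounding whitespace) and print the rest.
  sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" | head -n 1
}

# Hypothetical helper: tighten permissions and check that a Gemini key is set.
check_ocr_config() {
  local file="${1:-$HOME/.ocr/config}"
  [ -f "$file" ] || { echo "config file not found: $file" >&2; return 1; }
  chmod 600 "$file"   # readable and writable by the owner only
  [ -n "$(read_config_key "$file" gemini_api_key)" ] || {
    echo "gemini_api_key not found in $file" >&2
    return 1
  }
}
```

Calling check_ocr_config with no argument checks ~/.ocr/config; pass a path to check a different file.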

Quick Start

Path Variable: All commands below use $DOCR. Before running any command, set this variable:

SKILL_DIR="$(cd "$(dirname "<path-to-this-SKILL.md>")" && pwd)"
DOCR="$SKILL_DIR/scripts/docr/docr"

# OCR a single document using RapidOCR (default)
$DOCR document.pdf
$DOCR image.jpg

# Use Gemini engine
$DOCR -engine gemini document.pdf

# Use PaddleOCR local engine
$DOCR -engine paddle document.pdf

# Specify output file
$DOCR document.pdf -o result.txt

# Batch process all supported files in a directory
$DOCR -batch ./docs/ -o ./outputs/
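
Because every command above depends on $DOCR, a small guard can catch a missing or non-executable binary before any OCR run. This is a sketch only; require_docr is a hypothetical helper name and is not part of the skill.

```shell
# Hypothetical guard: fail early if $DOCR is unset or not an executable file.
require_docr() {
  [ -n "${DOCR:-}" ] || { echo "DOCR is not set" >&2; return 1; }
  [ -x "$DOCR" ] || { echo "docr binary not found or not executable: $DOCR" >&2; return 1; }
}
```

A typical use would be `require_docr && "$DOCR" document.pdf`.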

Engines

Engine              Flag            API Key Config   Doc Handling
RapidOCR (default)  -engine rapid   None             Local OCR
Gemini              -engine gemini  gemini_api_key   Cloud Vision API
PaddleOCR (local)   -engine paddle  None             Local OCR
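
The table can be encoded in a wrapper script so that a typo in the engine name, or a missing key, is caught before documents are processed. A minimal sketch follows; engine_needs_key is a hypothetical helper, and the engine names are taken from the table above.

```shell
# Hypothetical helper: print "yes" if the engine needs gemini_api_key,
# "no" if it runs locally, and fail on an unknown engine name.
engine_needs_key() {
  case "$1" in
    gemini)       echo yes ;;
    rapid|paddle) echo no ;;
    *) echo "unknown engine: $1" >&2; return 1 ;;
  esac
}
```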

CLI Reference

docr [options] <file or directory>

Options:
  -engine string   OCR engine: rapid (default) / gemini / paddle
  -e string        Short form of -engine
  -o string        Output file path, or output directory in batch mode
  -output string   Long form of -o
  -batch           Batch mode: process all supported files in a directory
  -prompt string   Custom recognition prompt (Gemini engine)

Installation

We provide pre-compiled binaries to get you started quickly.

cd doc-ocr-skills/scripts
./install.sh

The script detects your OS (darwin/linux) and architecture (amd64/arm64) and downloads the matching docr binary.
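
For readers inspecting install.sh before running it, this is roughly how OS and architecture detection is commonly done with uname. It is a sketch and not the actual script: the function name normalize_platform and the "os_arch" output format are assumptions, and the real release asset names may differ.

```shell
# Hypothetical sketch: map uname output to the os/arch pairs named above.
# Arguments: $1 = output of `uname -s`, $2 = output of `uname -m`.
normalize_platform() {
  local os arch
  os="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  arch="$2"
  case "$os" in
    darwin|linux) ;;
    *) echo "unsupported OS: $os" >&2; return 1 ;;
  esac
  case "$arch" in
    x86_64|amd64)  arch=amd64 ;;
    aarch64|arm64) arch=arm64 ;;
    *) echo "unsupported architecture: $arch" >&2; return 1 ;;
  esac
  echo "${os}_${arch}"
}
```

On a real machine it would be called as `normalize_platform "$(uname -s)" "$(uname -m)"`.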

Building from Source (Optional)

If you prefer to build from source, ensure you have Go 1.21+ installed:

cd doc-ocr-skills/scripts/docr
go build -o docr .

Error Handling

Error                     Solution
config file not found     Create ~/.ocr/config with API keys
gemini_api_key not found  Add gemini_api_key=VALUE to config
file not found            Verify the document file path
API timeout               Retry; large files may need longer
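
The "Retry" advice for API timeouts can be scripted with a small wrapper. The sketch below assumes docr exits with a non-zero status when a request fails, which the documentation does not state. The retry function is a hypothetical helper, not a docr feature.

```shell
# Hypothetical helper: run a command up to ATTEMPTS times,
# sleeping DELAY seconds between failed attempts.
# Usage: retry ATTEMPTS DELAY command [args...]
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  while :; do
    "$@" && return 0                      # success: stop retrying
    [ "$n" -ge "$attempts" ] && return 1  # out of attempts: give up
    echo "attempt $n failed; retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
  done
}
```

For example, `retry 3 5 "$DOCR" -engine gemini document.pdf` would try up to three times with a five-second pause between attempts.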
Security Recommendations
This package looks like a real OCR tool, but take precautions before installing:
  • Confirm the source: the README and install script point to github.com/scottkiss/..., but the registry owner ID differs. Verify the correct repository and publisher before running anything.
  • Avoid piping remote scripts directly into bash. Instead, download install.sh, inspect it, and run it locally, or build from source (go build) if possible.
  • The installer downloads a precompiled binary from GitHub releases with no checksum or signature. Prefer building from source, or ask the author for a checksum or signature to verify integrity.
  • If you use the Gemini engine, documents are sent to a cloud service. Do not send sensitive documents unless you understand the privacy implications and trust the endpoint and service.
  • The skill expects ~/.ocr/config to hold your gemini_api_key, but the registry metadata does not declare this. Be careful where you store API keys, and set restrictive file permissions (chmod 600 ~/.ocr/config).
  • Local engine dependencies (pip installs) run arbitrary Python code during installation; consider using a virtualenv or container.
If you need higher assurance, request the upstream source repository and a verifiable release (checksums/signatures), or build the binary yourself from the provided source directory.
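
If the author does supply a SHA-256 checksum, a downloaded binary can be verified before it is made executable. The sketch below is illustrative only: the review above notes that no checksum is currently published, so the expected value would have to come from the author. The helper names sha256_of and verify_sha256 are hypothetical.

```shell
# Hypothetical helper: print the SHA-256 hex digest of a file.
# Uses sha256sum (common on Linux) or falls back to shasum (common on macOS).
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

# Hypothetical helper: succeed only if FILE matches the EXPECTED digest.
verify_sha256() {
  local file="$1" expected="$2" actual
  actual="$(sha256_of "$file")"
  [ "$actual" = "$expected" ] || {
    echo "checksum mismatch for $file" >&2
    return 1
  }
}
```

The downloaded binary should be run only after verify_sha256 succeeds against a digest obtained through a channel you trust.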
Analysis
Type: OpenClaw Skill
Name: doc-ocr-skills
Version: 0.1.0
The skill bundle's installation script (scripts/install.sh) downloads and executes a pre-compiled binary from an external GitHub repository (github.com/scottkiss/doc-ocr), which is a high-risk behavior as the binary's source code is not included in the bundle for verification. Additionally, the documentation (SKILL.md and README.md) instructs users to store sensitive Gemini API keys in a plaintext configuration file at ~/.ocr/config. While these are common patterns for legitimate CLI tools, the reliance on opaque remote binaries and plaintext secret storage meets the threshold for a suspicious classification.
Capability Assessment
Purpose & Capability
The declared purpose — OCR via Gemini (cloud) or local engines — matches the instructions and files: CLI, local engine pip requirements, and an optional Gemini API key. Nothing in the code or docs indicates unrelated capabilities, but the README references a GitHub repo/user (scottkiss) while the registry owner ID differs, which is an unexplained metadata mismatch.
Instruction Scope
SKILL.md instructs creating ~/.ocr/config with a gemini_api_key and running the downloaded binary. It also recommends installing Python packages for local engines. The instructions rely on a local config file (not declared in metadata) and explicitly support a cloud engine (Gemini) which will send documents to a remote service — this is scope-relevant but privacy-impacting and should be made explicit to users. The Quick Start guidance to compute SKILL_DIR from the path to SKILL.md is unusual but not harmful.
Install Mechanism
There is no packaged install spec, but scripts/install.sh downloads a precompiled binary from a GitHub releases URL (https://github.com/scottkiss/doc-ocr/releases/...). Downloading binaries from GitHub releases is common but has risk: no checksum or signature verification is provided. README also suggests using curl | bash to fetch and run the install script, which elevates risk if the remote content is tampered with. The download host itself (github.com) is a known release host, so risk is moderate rather than high.
Credentials
Skill metadata declared no required env vars or config paths, but runtime instructions require a configuration file at ~/.ocr/config to hold gemini_api_key for the Gemini engine. This file-plus-secret requirement is not reflected in the registry metadata. Requesting a cloud API key (through a local config file) is proportionate to supporting a cloud engine, but the mismatch in declared vs. required config is an incoherence and potential surprise to users.
Persistence & Privilege
The skill does not request always-enabled or elevated platform privileges. It installs a binary into the skill directory and does not modify other skills or system-wide configs. Autonomous invocation is allowed by default (normal).
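How to Use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install doc-ocr-skills
  3. Once installed, invoke the Skill by name or trigger it with /doc-ocr-skills
  4. Provide the inputs described in the Skill's parameter documentation to get structured output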
Version History
v0.1.0
  • Initial release of doc-ocr-skills: an OCR utility for scanned PDFs and images.
  • Supports three OCR engines: Gemini 2.5 Flash (cloud), PaddleOCR (local), and RapidOCR (local, default).
  • Simple CLI with commands for single/multiple documents, flexible engine selection, and output options.
  • Requires minimal setup, with an easy installation script and optional local Python dependencies.
  • Provides clear error messages and troubleshooting steps.
Metadata
Slug: doc-ocr-skills
Version: 0.1.0
License: MIT-0
Total installs: 1
Current installs: 1
Versions: 1
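FAQ

What is Doc Ocr Skills?

It OCRs documents (PDFs and images) using Gemini 2.5 Flash, PaddleOCR (local), or RapidOCR (local). It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 495 total downloads so far.

How do I install Doc Ocr Skills?

Run "/install doc-ocr-skills" in the OpenClaw or Claude Code chat box. The Gemini engine additionally needs an API key in ~/.ocr/config, and the local engines need their Python packages (see Prerequisites).

Is Doc Ocr Skills free?

Yes. Doc Ocr Skills is free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does Doc Ocr Skills support?

Doc Ocr Skills is listed as cross-platform and is meant to run in any environment where OpenClaw / Claude Code is deployed. The install script covers darwin and linux on amd64 and arm64.

Who developed Doc Ocr Skills?

It is developed and maintained by sirk (@scottkiss); the current version is v0.1.0.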
