Ocr Benchmark
by yingfengli · GitHub · v2.0.0 · MIT-0
258 Downloads · 0 Stars · 0 Active Installs · 2 Versions
Install in OpenClaw
/install ocr-benchmark
Description
Multi-model OCR benchmark and comparison tool. Run OCR on images using Claude (Opus/Sonnet/Haiku via Bedrock), Gemini (Pro/Flash via Google AI Studio), and P...
Usage Guidance
This skill appears to be a legitimate OCR benchmarking tool, but note the following before installing and running:
(1) The package metadata omits required env vars. You will need AWS credentials (for Bedrock) and GOOGLE_API_KEY for Gemini, and optionally a PADDLEOCR_ENDPOINT/TOKEN; verify and provide only least-privilege credentials.
(2) Running the tool will upload image bytes and extracted text to external services (Anthropic/Bedrock, Google AI Studio, or whatever URL you provide for PaddleOCR). Do not use sensitive/private images unless you trust the destination.
(3) Inspect requirements.txt and the two scripts locally before pip installing; consider running in an isolated virtualenv or container.
(4) If you don't want to provide credentials for a provider, use the --auto-skip flag or run only specific models.
(5) The metadata in the registry is inconsistent. If you need a fully audited skill record, ask the publisher to correct the required-env and README metadata.
Capability Analysis
Type: OpenClaw Skill
Name: ocr-benchmark
Version: 2.0.0
The ocr-benchmark skill bundle is a legitimate tool for comparing OCR performance across multiple AI providers (AWS Bedrock, Google Gemini, and PaddleOCR). The scripts (run_benchmark.py and make_report.py) perform standard operations such as reading local image files, making API calls to established providers using environment variables for credentials, and generating PowerPoint reports. There is no evidence of data exfiltration, malicious execution, or prompt injection; the code is well-documented and its behavior aligns strictly with the stated purpose.
Capability Assessment
Purpose & Capability
The skill's name and description (multi-model OCR benchmark) match the included code and instructions: it calls Bedrock (Claude), Google Gemini, and an optional PaddleOCR endpoint. However, the registry metadata claims no required environment variables or credentials while the SKILL.md and scripts clearly require AWS credentials (for Bedrock), GOOGLE_API_KEY (for Gemini), and optionally a PADDLEOCR_ENDPOINT/TOKEN. The functional requirements are coherent with the stated purpose, but the published metadata is inaccurate/omitted.
Instruction Scope
SKILL.md and scripts instruct the agent/user to install Python deps, point to local image files and a ground-truth JSON, and call external model endpoints. The runtime behavior is scoped to reading provided image files and ground-truth JSON, calling model provider APIs, saving per-image JSON results, scoring, and generating PPTX reports. There are no instructions to read unrelated system files or to exfiltrate arbitrary data beyond the model providers/PaddleOCR endpoint, but images and extracted text are sent to external services (expected for OCR).
Install Mechanism
There is no packaged installer; the SKILL.md instructs pip install -r requirements.txt. requirements.txt contains common packages (boto3, google-genai, python-pptx, requests) that match the providers and reporting functionality. No downloads from arbitrary URLs or archive extraction are present in the repo. Review requirements.txt before installing into any environment.
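One way to act on the advice to review requirements.txt is a small check that flags anything outside the four packages named above. This is a minimal sketch, not part of the skill: the function name `unexpected_requirements` and the `EXPECTED` set are assumptions made for illustration, and the parsing is deliberately simple rather than a full implementation of the pip requirements format.

```python
import re

# Packages the review above says requirements.txt contains.
# Treating this as an allowlist is an assumption for illustration.
EXPECTED = {"boto3", "google-genai", "python-pptx", "requests"}


def unexpected_requirements(text, expected=EXPECTED):
    """Return requirement lines that are not plain, expected package names."""
    flagged = []
    for raw in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # Options (-r, -e, --index-url) and direct URLs deserve a manual look.
        if line.startswith("-") or "://" in line:
            flagged.append(line)
            continue
        # Take the leading package name, ignoring version specifiers and extras.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        name = match.group(0).lower().replace("_", "-") if match else line
        if name not in expected:
            flagged.append(line)
    return flagged
```

Calling it on the file contents, for example `unexpected_requirements(open("requirements.txt").read())`, returns the lines worth reading by hand; an empty list only means nothing fell outside the allowlist, not that the pinned versions are safe.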
Credentials
The environment variables used by the code (AWS credentials via normal boto3 mechanisms, AWS_REGION, GOOGLE_API_KEY, optional PADDLEOCR_ENDPOINT/PADDLEOCR_TOKEN) are proportionate to the skill's purpose (calling Bedrock, Google AI Studio, or an external PaddleOCR API). The concern is that the registry metadata lists no required env vars/credentials — that mismatch could confuse users about what secrets they must provide and trust. Bedrock usage requires AWS credentials with bedrock-runtime permissions; you should use least-privilege IAM keys and avoid sharing broad credentials.
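Because the registry metadata does not list these variables, it can help to check what is set before a run. The sketch below is an assumption-laden illustration, not the skill's own auto-skip logic: `REQUIRED_ENV` and `missing_env` are hypothetical names, and the Bedrock entry is only a heuristic, since boto3 can also pick up credentials from shared profiles or instance roles rather than environment variables.

```python
import os

# Hypothetical mapping from provider to environment variables named in the review.
# The Bedrock entry is a rough heuristic: boto3 also reads credentials from
# profiles and roles, so a "missing" result here is not conclusive.
REQUIRED_ENV = {
    "bedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"],
    "gemini": ["GOOGLE_API_KEY"],
    "paddleocr": ["PADDLEOCR_ENDPOINT", "PADDLEOCR_TOKEN"],
}


def missing_env(env=None):
    """Map each provider to the list of its variables that are unset or empty."""
    env = os.environ if env is None else env
    return {
        provider: [name for name in names if not env.get(name)]
        for provider, names in REQUIRED_ENV.items()
    }


if __name__ == "__main__":
    for provider, missing in missing_env().items():
        state = "ready" if not missing else "missing " + ", ".join(missing)
        print(f"{provider}: {state}")
```

The script prints only variable names, never their values, so its output is safe to paste into an issue or a chat.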
Persistence & Privilege
The skill does not request always:true, does not modify other skills or system-wide agent settings, and is instruction-driven. It runs on-demand and writes results/reports to the specified output directory only. No elevated persistence or autonomous privilege beyond normal skill invocation is requested.
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install ocr-benchmark
- After installation, invoke the skill by name or use /ocr-benchmark
- Provide required inputs per the skill's parameter spec and get structured output
Version History
v2.0.0
Fuzzy scoring with Levenshtein, auto-skip missing providers, EXTRA line detection, terminal report, requirements.txt, max_output_tokens 8192
v1.0.0
Initial release: 6-model OCR benchmark (Bedrock Claude, Gemini, PaddleOCR), scoring against ground truth, PPT report generation
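The v2.0.0 entry mentions fuzzy scoring with Levenshtein distance. The listing does not show how the skill computes or normalizes that score, so the following is only a generic sketch of the technique; the function names and the normalization by the longer string's length are assumptions, not the author's implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop to save memory
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def similarity(expected: str, actual: str) -> float:
    """Score in [0, 1]; 1.0 means the OCR line matches the ground truth exactly."""
    longest = max(len(expected), len(actual))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(expected, actual) / longest
```

A fuzzy score like this lets a line with one misread character count as a near match instead of a total miss, which is the usual motivation for moving from exact to fuzzy comparison in OCR evaluation.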
Frequently Asked Questions
What is Ocr Benchmark?
Multi-model OCR benchmark and comparison tool. Run OCR on images using Claude (Opus/Sonnet/Haiku via Bedrock), Gemini (Pro/Flash via Google AI Studio), and P... It is an AI Agent Skill for Claude Code / OpenClaw, with 258 downloads so far.
How do I install Ocr Benchmark?
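Run "/install ocr-benchmark" in the OpenClaw or Claude Code chat to install it. Running the benchmark afterwards still requires installing the Python dependencies (pip install -r requirements.txt) and supplying credentials for the providers you want to test.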
Is Ocr Benchmark free?
Yes, Ocr Benchmark is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Ocr Benchmark support?
Ocr Benchmark is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created Ocr Benchmark?
It is built and maintained by yingfengli (@yingfengli); the current version is v2.0.0.