
Ocr Benchmark

Name: Ocr Benchmark
Author: yingfengli

By yingfengli · GitHub · v2.0.0 · MIT-0

cross-platform ⚠ suspicious

Total downloads: 258

Install in OpenClaw

/install ocr-benchmark

Description

Multi-model OCR benchmark and comparison tool. Run OCR on images using Claude (Opus/Sonnet/Haiku via Bedrock), Gemini (Pro/Flash via Google AI Studio), and P...

Safe-use recommendations

This skill appears to be a legitimate OCR benchmarking tool, but note the following before installing and running it:
(1) The package metadata omits the required environment variables. You will need AWS credentials (for Bedrock) and GOOGLE_API_KEY (for Gemini), and optionally PADDLEOCR_ENDPOINT/PADDLEOCR_TOKEN; verify these and provide only least-privilege credentials.
(2) Running the tool uploads image bytes and extracted text to external services (Anthropic/Bedrock, Google AI Studio, or whatever URL you provide for PaddleOCR). Do not use sensitive or private images unless you trust the destination.
(3) Inspect requirements.txt and the two scripts locally before running pip install; consider using an isolated virtualenv or container.
(4) If you do not want to provide credentials for a provider, use the --auto-skip flag or run only specific models.
(5) The metadata in the registry is inconsistent. If you need a fully audited skill record, ask the publisher to correct the required-env and README metadata.

Functional analysis

Type: OpenClaw Skill
Name: ocr-benchmark
Version: 2.0.0

The ocr-benchmark skill bundle is a legitimate tool for comparing OCR performance across multiple AI providers (AWS Bedrock, Google Gemini, and PaddleOCR). The scripts (run_benchmark.py and make_report.py) perform standard operations such as reading local image files, making API calls to established providers using environment variables for credentials, and generating PowerPoint reports. There is no evidence of data exfiltration, malicious execution, or prompt injection; the code is well-documented and its behavior aligns strictly with the stated purpose.

Capability assessment

ℹ Purpose & Capability

The skill's name and description (multi-model OCR benchmark) match the included code and instructions: it calls Bedrock (Claude), Google Gemini, and an optional PaddleOCR endpoint. However, the registry metadata claims no required environment variables or credentials, while SKILL.md and the scripts clearly require AWS credentials (for Bedrock), GOOGLE_API_KEY (for Gemini), and optionally PADDLEOCR_ENDPOINT/PADDLEOCR_TOKEN. The functional requirements are consistent with the stated purpose, but the published metadata is incomplete and therefore inaccurate.

ℹ Instruction Scope

SKILL.md and scripts instruct the agent/user to install Python deps, point to local image files and a ground-truth JSON, and call external model endpoints. The runtime behavior is scoped to reading provided image files and ground-truth JSON, calling model provider APIs, saving per-image JSON results, scoring, and generating PPTX reports. There are no instructions to read unrelated system files or to exfiltrate arbitrary data beyond the model providers/PaddleOCR endpoint, but images and extracted text are sent to external services (expected for OCR).

✓ Install Mechanism

There is no packaged installer; the SKILL.md instructs pip install -r requirements.txt. requirements.txt contains common packages (boto3, google-genai, python-pptx, requests) that match the providers and reporting functionality. No downloads from arbitrary URLs or archive extraction are present in the repo. Review requirements.txt before installing into any environment.

ℹ Credentials

The environment variables used by the code (AWS credentials via normal boto3 mechanisms, AWS_REGION, GOOGLE_API_KEY, optional PADDLEOCR_ENDPOINT/PADDLEOCR_TOKEN) are proportionate to the skill's purpose (calling Bedrock, Google AI Studio, or an external PaddleOCR API). The concern is that the registry metadata lists no required env vars/credentials — that mismatch could confuse users about what secrets they must provide and trust. Bedrock usage requires AWS credentials with bedrock-runtime permissions; you should use least-privilege IAM keys and avoid sharing broad credentials.
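As a rough illustration of the credential requirements described above, the following sketch reports which providers look usable from the current environment. It is not taken from the skill's scripts: the function name `provider_status` and the provider labels are hypothetical, and the Bedrock check is only approximate, since boto3 can also pick up credentials from shared config files or instance roles that this check does not see.

```python
import os
from typing import Mapping


def provider_status(env: Mapping[str, str] = os.environ) -> dict[str, bool]:
    """Return a hypothetical provider -> 'credentials look present' map.

    Only environment variables are inspected; boto3 may still find AWS
    credentials elsewhere (shared config, instance role), so a False for
    'bedrock' means 'not found in env', not 'definitely unavailable'.
    """
    has_aws = bool(env.get("AWS_PROFILE")) or (
        bool(env.get("AWS_ACCESS_KEY_ID"))
        and bool(env.get("AWS_SECRET_ACCESS_KEY"))
    )
    return {
        "bedrock": has_aws,                               # Claude via Bedrock
        "gemini": bool(env.get("GOOGLE_API_KEY")),        # Google AI Studio
        "paddleocr": bool(env.get("PADDLEOCR_ENDPOINT")),  # optional endpoint
    }


if __name__ == "__main__":
    for name, ok in provider_status().items():
        print(f"{name}: {'credentials found' if ok else 'not found in environment'}")
```

Running a check like this before the benchmark makes it clear which secrets are actually being handed to the tool, which matters here because the registry metadata does not list them.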

✓ Persistence & Privilege

The skill does not request always:true, does not modify other skills or system-wide agent settings, and is instruction-driven. It runs on-demand and writes results/reports to the specified output directory only. No elevated persistence or autonomous privilege beyond normal skill invocation is requested.

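How to use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install ocr-benchmark
After installation, invoke the Skill by name or trigger it with /ocr-benchmark
Provide the required inputs according to the Skill's parameter documentation to get structured output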

Version history

v2.0.0

Fuzzy scoring with Levenshtein, auto-skip missing providers, EXTRA line detection, terminal report, requirements.txt, max_output_tokens 8192

v1.0.0

Initial release: 6-model OCR benchmark (Bedrock Claude, Gemini, PaddleOCR), scoring against ground truth, PPT report generation
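The v2.0.0 entry mentions fuzzy scoring with Levenshtein distance and EXTRA line detection. The listing does not show how the skill implements either, so the following is only a minimal sketch of the general idea. The function names, the greedy line matching, the 0.8 threshold, and the result keys are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insert, delete, substitute), two-row DP."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalize edit distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def score_lines(expected: list[str], predicted: list[str],
                threshold: float = 0.8) -> dict:
    """Greedily match each ground-truth line to its closest OCR line.

    OCR lines left unmatched at the end are reported as 'extra'.
    The threshold is an assumed value, not one taken from the skill.
    """
    remaining = list(predicted)
    matched = 0
    for exp in expected:
        if not remaining:
            break
        best = max(remaining, key=lambda p: similarity(exp, p))
        if similarity(exp, best) >= threshold:
            matched += 1
            remaining.remove(best)
    return {"matched": matched,
            "missing": len(expected) - matched,
            "extra": remaining}
```

With this sketch, an OCR line such as "Invoice 2O24" would still count as a match for the ground-truth line "Invoice 2024" (one substitution out of twelve characters), while an unrelated line such as a page footer would be listed under "extra".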

Metadata

Slug ocr-benchmark

Version 2.0.0

License MIT-0

Total installs 0

Current installs 0

Versions 2

FAQ

What is Ocr Benchmark?

Multi-model OCR benchmark and comparison tool. Run OCR on images using Claude (Opus/Sonnet/Haiku via Bedrock), Gemini (Pro/Flash via Google AI Studio), and P... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 258 total downloads to date.

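How do I install Ocr Benchmark?

Run the command "/install ocr-benchmark" in the OpenClaw or Claude Code chat box to install it in one step. Installation itself needs no additional configuration, although running the benchmark requires credentials for the providers you use (AWS credentials for Bedrock, GOOGLE_API_KEY for Gemini).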

Is Ocr Benchmark free?

Yes. Ocr Benchmark is completely free and released under the MIT-0 license, so it can be freely downloaded, installed, and used.

Which platforms does Ocr Benchmark support?

Ocr Benchmark is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Ocr Benchmark?

It is developed and maintained by yingfengli (@yingfengli); the current version is v2.0.0.