yingfengli

Ocr Benchmark

Author: yingfengli · GitHub · v2.0.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 258
Favorites: 0
Current installs: 0
Versions: 2
Install in OpenClaw
/install ocr-benchmark
Description
Multi-model OCR benchmark and comparison tool. Run OCR on images using Claude (Opus/Sonnet/Haiku via Bedrock), Gemini (Pro/Flash via Google AI Studio), and P...
Security recommendations
This skill appears to be a legitimate OCR benchmarking tool, but note the following before installing and running it:
  1. The package metadata omits required environment variables. You will need AWS credentials (for Bedrock) and GOOGLE_API_KEY (for Gemini), and optionally PADDLEOCR_ENDPOINT/PADDLEOCR_TOKEN. Verify these and provide only least-privilege credentials.
  2. Running the tool uploads image bytes and extracted text to external services (Anthropic/Bedrock, Google AI Studio, or whatever URL you provide for PaddleOCR). Do not use sensitive or private images unless you trust the destination.
  3. Inspect requirements.txt and the two scripts locally before running pip install; consider using an isolated virtualenv or container.
  4. If you do not want to provide credentials for a provider, use the --auto-skip flag or run only specific models.
  5. The metadata in the registry is inconsistent. If you need a fully audited skill record, ask the publisher to correct the required-env and README metadata.
Functional analysis
Type: OpenClaw Skill
Name: ocr-benchmark
Version: 2.0.0

The ocr-benchmark skill bundle is a legitimate tool for comparing OCR performance across multiple AI providers (AWS Bedrock, Google Gemini, and PaddleOCR). The scripts (run_benchmark.py and make_report.py) perform standard operations such as reading local image files, making API calls to established providers using environment variables for credentials, and generating PowerPoint reports. There is no evidence of data exfiltration, malicious execution, or prompt injection; the code is well-documented and its behavior aligns strictly with the stated purpose.
Capability assessment
Purpose & Capability
The skill's name and description (multi-model OCR benchmark) match the included code and instructions: it calls Bedrock (Claude), Google Gemini, and an optional PaddleOCR endpoint. However, the registry metadata claims no required environment variables or credentials, while the SKILL.md and scripts clearly require AWS credentials (for Bedrock), GOOGLE_API_KEY (for Gemini), and optionally PADDLEOCR_ENDPOINT/PADDLEOCR_TOKEN. The functional requirements are coherent with the stated purpose, but the published metadata is inaccurate because it omits these requirements.
Instruction Scope
SKILL.md and scripts instruct the agent/user to install Python deps, point to local image files and a ground-truth JSON, and call external model endpoints. The runtime behavior is scoped to reading provided image files and ground-truth JSON, calling model provider APIs, saving per-image JSON results, scoring, and generating PPTX reports. There are no instructions to read unrelated system files or to exfiltrate arbitrary data beyond the model providers/PaddleOCR endpoint, but images and extracted text are sent to external services (expected for OCR).
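The runtime flow described above (read each image, call each provider, save one JSON result per image) can be sketched roughly as follows. This is an illustrative outline only, not the code in run_benchmark.py: the function name `run_models`, the `OcrFn` callable type, and the layout of the saved JSON are all hypothetical. Real provider calls would replace the stub callables, and that is the point at which image bytes leave the machine.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical type: an OCR backend takes raw image bytes and returns text lines.
OcrFn = Callable[[bytes], List[str]]


def run_models(image_paths: List[Path], models: Dict[str, OcrFn], out_dir: Path) -> List[Path]:
    """Run every model on every image and save one JSON file per image."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image_path in image_paths:
        image_bytes = image_path.read_bytes()  # these bytes are what a provider would receive
        record = {"image": image_path.name, "results": {}}
        for name, ocr in models.items():
            try:
                record["results"][name] = {"lines": ocr(image_bytes), "error": None}
            except Exception as exc:  # one failing provider should not stop the run
                record["results"][name] = {"lines": [], "error": str(exc)}
        out_path = out_dir / f"{image_path.stem}.json"
        out_path.write_text(
            json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8"
        )
        written.append(out_path)
    return written
```

Keeping the providers behind plain callables makes it easy to dry-run the loop with stubs before any credentials are supplied.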
Install Mechanism
There is no packaged installer; the SKILL.md instructs pip install -r requirements.txt. requirements.txt contains common packages (boto3, google-genai, python-pptx, requests) that match the providers and reporting functionality. No downloads from arbitrary URLs or archive extraction are present in the repo. Review requirements.txt before installing into any environment.
Credentials
The environment variables used by the code (AWS credentials via normal boto3 mechanisms, AWS_REGION, GOOGLE_API_KEY, optional PADDLEOCR_ENDPOINT/PADDLEOCR_TOKEN) are proportionate to the skill's purpose (calling Bedrock, Google AI Studio, or an external PaddleOCR API). The concern is that the registry metadata lists no required env vars/credentials — that mismatch could confuse users about what secrets they must provide and trust. Bedrock usage requires AWS credentials with bedrock-runtime permissions; you should use least-privilege IAM keys and avoid sharing broad credentials.
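Because the registry metadata does not list these variables, a small pre-flight check can show which providers are usable before anything is uploaded. The sketch below is an assumption-laden illustration, not the skill's own auto-skip logic: the `REQUIRED_ENV` table and the `available_providers` function are hypothetical. Treating AWS_ACCESS_KEY_ID or AWS_PROFILE as evidence of AWS credentials is only a heuristic, since boto3 can also resolve credentials from shared config files or instance roles.

```python
import os
from typing import Dict, List, Mapping

# Hypothetical table: provider -> groups of env vars.
# A group is satisfied when at least one variable in it is set and non-empty.
REQUIRED_ENV: Dict[str, List[List[str]]] = {
    "bedrock": [["AWS_ACCESS_KEY_ID", "AWS_PROFILE"]],  # heuristic only
    "gemini": [["GOOGLE_API_KEY"]],
    "paddleocr": [["PADDLEOCR_ENDPOINT"]],
}


def available_providers(env: Mapping[str, str] = os.environ) -> Dict[str, bool]:
    """Report, per provider, whether every required group of variables is satisfied."""
    return {
        provider: all(any(env.get(var) for var in group) for group in groups)
        for provider, groups in REQUIRED_ENV.items()
    }


if __name__ == "__main__":
    for provider, ready in available_providers().items():
        print(f"{provider}: {'ready' if ready else 'missing credentials, would skip'}")
```

Passing the environment in as a mapping keeps the check testable without touching real secrets.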
Persistence & Privilege
The skill does not request always:true, does not modify other skills or system-wide agent settings, and is instruction-driven. It runs on-demand and writes results/reports to the specified output directory only. No elevated persistence or autonomous privilege beyond normal skill invocation is requested.
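How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install ocr-benchmark
  3. Once installation completes, invoke the Skill by name or trigger it with /ocr-benchmark
  4. Provide the required inputs as described in the Skill's parameter documentation to get structured output.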
Version history
v2.0.0
Fuzzy scoring with Levenshtein, auto-skip missing providers, EXTRA line detection, terminal report, requirements.txt, max_output_tokens 8192
v1.0.0
Initial release: 6-model OCR benchmark (Bedrock Claude, Gemini, PaddleOCR), scoring against ground truth, PPT report generation
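The v2.0.0 entry mentions fuzzy scoring with Levenshtein and EXTRA line detection. The listing does not show how the skill implements either, so the following is only a minimal sketch of one common approach: normalized edit-distance similarity between ground-truth and predicted lines, with unmatched predictions reported as extra. The function names, the greedy matching strategy, and the 0.8 threshold are assumptions chosen for illustration.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Map edit distance to a 0..1 score; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def score_lines(expected: List[str], predicted: List[str], threshold: float = 0.8) -> Dict:
    """Greedily match each ground-truth line to its most similar unused prediction."""
    remaining = list(predicted)
    scores = []
    for truth in expected:
        if not remaining:
            scores.append(0.0)
            continue
        best = max(remaining, key=lambda line: similarity(truth, line))
        best_score = similarity(truth, best)
        if best_score >= threshold:
            remaining.remove(best)  # each prediction can match only one truth line
            scores.append(best_score)
        else:
            scores.append(0.0)  # no prediction is close enough: count as a miss
    mean = sum(scores) / len(scores) if scores else 0.0
    return {"score": mean, "extra": remaining}  # leftover predictions are "extra" lines
```

With this scheme a single misread character lowers a line's score slightly instead of zeroing it, which is the usual motivation for fuzzy rather than exact matching.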
Metadata
Slug ocr-benchmark
Version 2.0.0
License MIT-0
Total installs 0
Current installs 0
Version count 2
FAQ

What is Ocr Benchmark?

Multi-model OCR benchmark and comparison tool. Run OCR on images using Claude (Opus/Sonnet/Haiku via Bedrock), Gemini (Pro/Flash via Google AI Studio), and P... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 258 total downloads to date.

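How do I install Ocr Benchmark?

Run the command "/install ocr-benchmark" in the OpenClaw or Claude Code chat box for a one-step install. Installation itself needs no extra configuration, although running the skill requires provider credentials (AWS credentials for Bedrock, GOOGLE_API_KEY for Gemini).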

Is Ocr Benchmark free?

Yes. Ocr Benchmark is completely free and released under the MIT-0 license, so it can be freely downloaded, installed, and used.

Which platforms does Ocr Benchmark support?

Ocr Benchmark is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Ocr Benchmark?

It is developed and maintained by yingfengli (@yingfengli); the current version is v2.0.0.
