← Back to Skills Marketplace
openclawzhangchong

free-ocr-zc

by 张翀 · GitHub ↗ · v1.0.3 · MIT-0
cross-platform ✓ Security Clean
38
Downloads
0
Stars
0
Active Installs
4
Versions
Install in OpenClaw
/install free-ocr-zc
Description
Extract text from images via OpenRouter API using Baidu Qianfan OCR model, supporting URLs and local files with customizable prompts.
README (SKILL.md)

OpenRouter OCR Skill

Overview

This skill provides OCR (Optical Character Recognition) functionality using models available via OpenRouter. It uses the OpenAI Python library to communicate with OpenRouter's API, specifically designed for models like Baidu's Qianfan OCR.

Quick Start

When you need to extract text from an image:

  1. Ensure prerequisites:

    • Python 3.x installed
    • Required packages: openai, requests (install via pip install openai requests)
    • Place your OpenRouter API key in the file: C:\Users\Administrator\.openclaw\secrets\openrouter.env (format: OPENROUTER_API_KEY=your_key_here)
  2. Call the OCR script with an image URL or local file path:

    python ocr.py \x3Cimage_input> [prompt]
    
    • image_input: Either a URL or a local file path to the image
    • prompt: Optional text prompt for the OCR (default: "OCR提取图片所有文字")
  3. Get result: The script prints the extracted text to stdout.

Usage Examples

Basic Usage with Default Prompt

python ocr.py "https://example.com/image.jpg"

Custom Prompt

python ocr.py "https://example.com/image.jpg" "请识别图片中的所有文字"

Local Image File

python ocr.py "C:\path	o\image.jpg"

How It Works

The skill uses the OpenAI client configured with:

  • Base URL: https://openrouter.ai/api/v1
  • Model: baidu/qianfan-ocr-fast:free (configurable via environment variable)
  • API Key: Read from OPENROUTER_API_KEY environment variable

It sends a multimodal request containing:

  1. A text prompt (default: "OCR提取图片所有文字")
  2. The image (encoded as base64 if local, or passed directly if URL)

The model returns the extracted text which is printed to console.

Environment Variables

  • OPENROUTER_API_KEY: Required - Your OpenRouter API key
  • OCR_MODEL: Optional - Model to use (default: baidu/qianfan-ocr-fast:free)
  • OCR_BASE_URL: Optional - OpenRouter base URL (default: https://openrouter.ai/api/v1)

Installation

  1. Create the skill directory: mkdir -p skills/openrouter-ocr
  2. Save the ocr.py script in this directory
  3. Install dependencies: pip install openai requests
  4. Set your OpenRouter API key:
    setx OPENROUTER_API_KEY "your_api_key_here"
    
    (Restart terminal after setting)

Notes

  • The skill works with both HTTP/HTTPS URLs and local file paths
  • For local files, the image is read and base64-encoded before sending
  • Error handling includes network issues, invalid API keys, and model errors
  • The default model is Baidu's Qianfan OCR fast version (free tier)
  • You can change the model by setting the OCR_MODEL environment variable
  • Response time depends on image size and model speed

Troubleshooting

  • API Key Error: Ensure OPENROUTER_API_KEY is set correctly
  • Module Not Found: Install required packages with pip install openai requests
  • Image Access: Verify the image URL is accessible or local path exists
  • Model Not Available: Check if the specified model is available on OpenRouter

Example Output

✅ OCR 识别结果:
------------------------------------------------------------
这是识别出的文本内容
...
------------------------------------------------------------

Security Note

Never commit your API key to version control. Keep it secure in environment variables.

Usage Guidance
Before installing, make sure you are comfortable sending OCR images to OpenRouter, use a dedicated or limited API key where possible, and install the Python dependencies in an isolated environment. If you only want text extraction, prefer or request an OCR-only entry point that does not perform the extra image-description step.
Capability Tags
requires-sensitive-credentials
Capability Assessment
Purpose & Capability
The OCR purpose matches the code and documentation, although the primary ocr.py script also performs an extra image-description request before OCR.
Instruction Scope
Instructions are user-directed and focused on OCR usage; no goal override, prompt-injection behavior, or hidden autonomous workflow is evident.
Install Mechanism
There is no formal install spec, while SKILL.md instructs users to manually install unpinned Python packages with pip; this is common for a Python wrapper but less reproducible.
Credentials
Reading a user-selected local image and sending it to OpenRouter is proportionate for OCR, but users should avoid images they do not want processed by an external service.
Persistence & Privilege
No background persistence is shown, but the scripts read an OpenRouter API key from a local secrets file or environment variable, which is sensitive credential handling.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install free-ocr-zc
  3. After installation, invoke the skill by name or use /free-ocr-zc
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.3
- No code or documentation changes in this release. - Version incremented to 1.0.3 for tracking purposes only. ## 技能功能 1. **图片描述**:先使用AI模型详细描述图片内容(物体、场景、颜色等) 2. **文字识别(OCR)**:再提取图片中的文字内容 3. **双重验证**:即使图片中没有文字,也能得到图片描述,避免返回空结果 ## 文件位置 - 技能说明:`C:\Users\Administrator\.openclaw\workspace\skills\openrouter-ocr\SKILL.md` - 主程序:`C:\Users\Administrator\.openclaw\workspace\skills\openrouter-ocr\ocr.py` - API密钥存储:`C:\Users\Administrator\.openclaw\secrets\openrouter.env`(已预配置您的密钥) ## 使用方法 ```bash python ocr.py <图片URL或本地路径> [可选的OCR提示词] ``` ### 示例 ```bash # 使用默认提示词 python ocr.py "https://live.staticflickr.com/3851/14825276609_098cac593d_b.jpg" # 自定义OCR提示词 python ocr.py "https://example.com/image.jpg" "请识别图片中的所有文字" ``` ## 特点 - ✅ 从 `secrets/openrouter.env` 文件读取API密钥,避免环境变量泄露 - ✅ 支持HTTP/HTTPS URL和本地文件路径 - ✅ 已修复Windows控制台编码问题(解决了emoji和特殊字符显示问题) - ✅ 默认使用 Baidu Qianfan OCR fast 模型(免费层级) - ✅ 可通过环境变量 `OCR_MODEL` 自定义模型 ## 测试结果 使用您提供的海豚图片测试: - **图片描述**:正确描述了两只海豚在海水中嬉戏的场景 - **OCR识别**:提取到了文字 "跃出一跃" 技能已就绪,您可以直接使用。如需修改模型或其他配置,请编辑技能目录下的文件。
v1.0.2
- Added new file: ocr_final.py. - No changes to documentation or existing files except for the addition of this script. - Implements additional or updated OCR functionality in ocr_final.py.
v1.0.1
- Added new file: ocr_fixed.py - No changes to documentation or core functionality apart from the new script addition.
v1.0.0
- Initial release of the OpenRouter OCR skill. - Provides OCR functionality using OpenRouter-supported models (default: Baidu Qianfan OCR). - Accepts image input via URL or local file path; outputs extracted text to stdout. - Uses environment variables for API key and model configuration. - Offers error handling for network, authentication, and model selection issues. - Includes a detailed `SKILL.md` with setup, usage, and troubleshooting instructions.
Metadata
Slug free-ocr-zc
Version 1.0.3
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 4
Frequently Asked Questions

What is free-ocr-zc?

Extract text from images via OpenRouter API using Baidu Qianfan OCR model, supporting URLs and local files with customizable prompts. It is an AI Agent Skill for Claude Code / OpenClaw, with 38 downloads so far.

How do I install free-ocr-zc?

Run "/install free-ocr-zc" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is free-ocr-zc free?

Yes, free-ocr-zc is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does free-ocr-zc support?

free-ocr-zc is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created free-ocr-zc?

It is built and maintained by 张翀 (@openclawzhangchong); the current version is v1.0.3.

💬 Comments