caiming0331

image2text

by caiming0331 · GitHub ↗ · v1.0.0 · MIT-0
Platform: cross-platform · Security: Clean
Downloads: 86 · Stars: 0 · Active Installs: 0 · Versions: 1
Install in OpenClaw
/install image2text
Description
Extract text from images using tesseract OCR, supporting local files, URLs, and base64 inputs for text-only AI models without vision capability.
README (SKILL.md)

image2text

Extract text from images without needing a vision-capable AI model.

Usage

python3 scripts/ocr.py <image path|URL|base64> [--lang <languages>] [--psm <mode>] [--raw]

Parameters

  • --lang: Language codes, joined with +, default chi_sim+eng
    • chi_sim Simplified Chinese | chi_tra Traditional | eng English | jpn Japanese | kor Korean | and 30+ more
    • Combine: chi_sim+eng
  • --psm: Page segmentation mode, default 6
    • 3 Fully automatic | 4 Single column | 6 Single uniform block | 7 Single line | 11 Sparse text
  • --raw: Output plain text only, no markers
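
As a rough illustration of how these flags could map onto a tesseract invocation, here is a minimal sketch. The helper name build_tesseract_cmd is hypothetical and is not taken from scripts/ocr.py; it assumes only the standard tesseract CLI form (image, output base, -l, --psm).

```python
from typing import List


def build_tesseract_cmd(image_path: str, lang: str = "chi_sim+eng", psm: int = 6) -> List[str]:
    """Return an argument list for the tesseract CLI (hypothetical helper).

    Passing "stdout" as the output base makes tesseract print the
    recognized text instead of writing a .txt file.
    """
    # Tesseract 4 and 5 define page segmentation modes 0 through 13.
    if not 0 <= psm <= 13:
        raise ValueError(f"psm must be between 0 and 13, got {psm}")
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


if __name__ == "__main__":
    print(build_tesseract_cmd("receipt.png", lang="chi_sim", psm=3))
```

Returning a list rather than a single string keeps the command suitable for subprocess.run without a shell.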

Auto-Detects Input Type

  1. Local path: /Users/xxx/Downloads/xxx.png
  2. Web URL: https://example.com/image.png — OSS temp links work too
  3. Base64: Pasted image data from clipboard — just paste directly
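
One plausible way to tell the three input kinds apart is sketched below. This is a heuristic written for illustration, not the logic in scripts/ocr.py: the function name detect_input_type, the 64-character threshold, and the data: URI handling are all assumptions.

```python
import os
import re

# Characters that can appear in standard base64, plus whitespace from pasting.
_BASE64_RE = re.compile(r"^[A-Za-z0-9+/=\s]+$")


def detect_input_type(value: str) -> str:
    """Classify an input string as 'url', 'path', or 'base64' (heuristic)."""
    text = value.strip()
    if text.startswith(("http://", "https://")):
        return "url"
    # Clipboard data is sometimes pasted as a data: URI.
    if text.startswith("data:image/") and ";base64," in text:
        return "base64"
    # An existing file always wins over the base64 guess.
    if os.path.exists(os.path.expanduser(text)):
        return "path"
    # Long strings made only of base64 characters are treated as image data.
    if len(text) >= 64 and _BASE64_RE.match(text):
        return "base64"
    # Fall back to 'path' so a missing file produces a clear error later.
    return "path"


if __name__ == "__main__":
    print(detect_input_type("https://example.com/image.png"))
```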

Workflow

  1. Receive image input → auto-detect type (local path / URL / base64)
  2. URL → curl downloads to temp file
  3. Base64 → decode to temp file
  4. Run tesseract OCR
  5. Output plain text
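
The steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the contents of scripts/ocr.py: the names materialize and ocr are hypothetical, the download uses urllib rather than curl, and the caller is assumed to have already classified the input as "path", "url", or "base64".

```python
import base64
import os
import subprocess
import tempfile
import urllib.request
from typing import Tuple


def materialize(value: str, kind: str) -> Tuple[str, bool]:
    """Return (file_path, is_temp) for an input of the given kind."""
    if kind == "path":
        return os.path.expanduser(value), False
    if kind not in ("url", "base64"):
        raise ValueError(f"unknown input kind: {kind}")
    fd, tmp_path = tempfile.mkstemp(suffix=".img")
    try:
        with os.fdopen(fd, "wb") as fh:
            if kind == "url":
                with urllib.request.urlopen(value, timeout=30) as resp:
                    fh.write(resp.read())
            else:
                # Drop a "data:image/...;base64," prefix if one is present.
                payload = value.split(",", 1)[1] if value.startswith("data:") else value
                fh.write(base64.b64decode(payload))
    except Exception:
        os.unlink(tmp_path)  # do not leave a partial temp file behind
        raise
    return tmp_path, True


def ocr(value: str, kind: str, lang: str = "chi_sim+eng", psm: int = 6) -> str:
    """Run tesseract on the input and return the recognized text."""
    path, is_temp = materialize(value, kind)
    try:
        result = subprocess.run(
            ["tesseract", path, "stdout", "-l", lang, "--psm", str(psm)],
            capture_output=True, text=True, check=True,
        )
        return result.stdout
    finally:
        # Temp files are removed whether or not OCR succeeded.
        if is_temp and os.path.exists(path):
            os.unlink(path)
```

Calling ocr requires a working tesseract install; materialize can be exercised on its own.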

Examples

OCR a Chinese receipt:

python3 scripts/ocr.py ~/Downloads/receipt.png --lang chi_sim

English + Chinese mixed:

python3 scripts/ocr.py https://example.com/doc.jpg --lang chi_sim+eng

Plain text only (no markers):

python3 scripts/ocr.py /path/to/image.png --raw

Requirements

  • tesseract must be installed: brew install tesseract
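  • Language packs: Homebrew's tesseract formula bundles only English (eng); for chi_sim and other languages, also run brew install tesseract-lang
  • On Apple Silicon Macs: Homebrew places the binary at /opt/homebrew/bin/tesseract (Intel Macs use /usr/local/bin/tesseract)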
  • Temp files auto-deleted after execution
  • For best accuracy on receipts/screenshots: try --psm 3
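
Before running OCR it can help to confirm that tesseract and the requested language data are actually present. The sketch below is one way to do that; the helper names are hypothetical and not part of the skill, and it relies only on the tesseract --list-langs option.

```python
import shutil
import subprocess
from typing import List, Set


def parse_list_langs(output: str) -> Set[str]:
    """Parse the text printed by `tesseract --list-langs` into language codes."""
    langs = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks and the "List of available languages ..." header line.
        if not line or line.lower().startswith("list of available languages"):
            continue
        langs.add(line)
    return langs


def missing_languages(requested: str, installed: Set[str]) -> List[str]:
    """Return the codes in a '+'-joined --lang value that are not installed."""
    return [code for code in requested.split("+") if code and code not in installed]


def installed_languages() -> Set[str]:
    """Ask the local tesseract binary which language packs it can see."""
    exe = shutil.which("tesseract")
    if exe is None:
        raise FileNotFoundError("tesseract not found on PATH")
    proc = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    # Some older releases print the list on stderr, so both streams are read.
    return parse_list_langs(proc.stdout + "\n" + proc.stderr)


if __name__ == "__main__":
    print(missing_languages("chi_sim+eng", installed_languages()))
```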
Usage Guidance
This skill appears to do exactly what it says: local OCR via your system tesseract. Before installing/using it: (1) ensure tesseract and any language packs you need are installed locally; (2) do not pass untrusted URLs or pasted base64 from unknown sources (the script will download and process whatever URL you supply); (3) be aware the script calls subprocesses (curl as a fallback and tesseract) and writes temporary files which it deletes; and (4) no credentials are requested, and results are printed locally (no external transmission coded into the skill). If you need automatic fetching from arbitrary web locations in a sensitive environment, consider restricting allowed sources or reviewing network policies first.
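If you do want to restrict allowed sources, a host allowlist checked before any download is one simple option. The following is a minimal sketch; is_allowed_url and the allowlist contents are hypothetical and are not something the skill ships.

```python
from typing import Iterable
from urllib.parse import urlparse


def is_allowed_url(url: str, allowed_hosts: Iterable[str]) -> bool:
    """Return True only for http(s) URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    # Reject file://, ftp://, and anything else that is not plain web traffic.
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower()
    for allowed in allowed_hosts:
        allowed = allowed.lower()
        # Accept the exact host or a true subdomain of it.
        if host == allowed or host.endswith("." + allowed):
            return True
    return False


if __name__ == "__main__":
    print(is_allowed_url("https://example.com/image.png", ["example.com"]))
```

Matching on the parsed hostname rather than on the raw string avoids lookalike hosts such as example.com.evil.test.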
Capability Analysis
Type: OpenClaw Skill
Name: image2text
Version: 1.0.0

The image2text skill is a legitimate utility for performing OCR on local files, URLs, or base64-encoded images using Tesseract. The Python script (scripts/ocr.py) handles external inputs safely by using subprocess.run with argument lists instead of shell execution, and it includes proper cleanup of temporary files. No evidence of malicious intent, data exfiltration, or prompt injection was found.
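To see why argument lists matter, the small demonstration below passes a hostile-looking file name to a child process without a shell. It is an illustration only: echo_argument is a hypothetical helper, and a Python child process stands in for curl or tesseract.

```python
import subprocess
import sys


def echo_argument(arg: str) -> str:
    """Pass `arg` to a child process as one argv entry and return what it received."""
    result = subprocess.run(
        # No shell is involved, so ';' and spaces in `arg` are not interpreted.
        [sys.executable, "-c", "import sys; print(sys.argv[1])", arg],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.rstrip("\n")


if __name__ == "__main__":
    # The child sees the whole string as a single argument.
    print(echo_argument("x.png; echo pwned"))
```

With shell=True and string interpolation, the same input could have run a second command; with a list it is just an odd file name.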
Capability Assessment
Purpose & Capability
Name, description, SKILL.md, and the included script all describe the same functionality: take a local path/URL/base64 input, download or decode it to a temp file, run local tesseract, and return extracted text. Required capabilities (tesseract binary) are consistent with the purpose; no unrelated env vars or credentials are requested.
Instruction Scope
Runtime instructions and the script stay within OCR scope: they accept local/URL/base64 inputs, download or decode to temp files, run tesseract, and output text. The script will download arbitrary URLs supplied by the user (urllib or curl) and invokes subprocesses (curl, tesseract). These behaviors are expected for a URL-capable OCR tool but mean the agent will fetch remote data you provide — avoid passing untrusted URLs or base64 content.
Install Mechanism
There is no install specification; the skill is instruction-only and ships a small Python script. The only external dependency is the system tesseract binary (SKILL.md suggests brew install on mac). No downloaded archives or non-standard installers are used.
Credentials
The skill requires no environment variables, credentials, or config paths. It only uses system binaries (curl if urllib fails, and tesseract) and temporary files; requested permissions are proportional to its stated function.
Persistence & Privilege
The skill's always setting is false, and the skill does not attempt to modify other skills, global agent config, or persist credentials. It writes temporary files during execution and deletes them in the finally block.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install image2text
  3. After installation, invoke the skill by name or use /image2text
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release — extract text from any image using tesseract OCR. Supports local paths, URLs (OSS/http/https), and base64 clipboard input. Works with text-only AI models that lack vision capability. 30+ languages supported.
Metadata
Slug image2text
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is image2text?

image2text extracts text from images using tesseract OCR. It supports local files, URLs, and base64 inputs, so text-only AI models without vision capability can work with image content. It is an AI Agent Skill for Claude Code / OpenClaw, with 86 downloads so far.

How do I install image2text?

Run "/install image2text" in the OpenClaw or Claude Code chat to install the skill in one step. The skill itself needs no further configuration, but the tesseract binary must already be installed on your system.

Is image2text free?

Yes, image2text is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does image2text support?

image2text is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created image2text?

It is built and maintained by caiming0331 (@caiming0331); the current version is v1.0.0.
