
image2text

Author: caiming0331 · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ Security check passed
86 total downloads · 0 favorites · 0 current installs · 1 version
Install in OpenClaw
/install image2text
Description
Extract text from images using tesseract OCR, supporting local files, URLs, and base64 inputs for text-only AI models without vision capability.
Usage instructions (SKILL.md)

image2text

Extract text from images without needing a vision-capable AI model.

Usage

python3 scripts/ocr.py <image path|URL|base64> [--lang <languages>] [--psm <mode>] [--raw]

Parameters

  • --lang: Language codes, joined with + when combining several; default chi_sim+eng
    • chi_sim Simplified Chinese | chi_tra Traditional Chinese | eng English | jpn Japanese | kor Korean | and 30+ more
    • Combine: chi_sim+eng
  • --psm: Tesseract page segmentation mode, default 6
    • 3 Fully automatic | 4 Single column | 6 Single uniform block | 7 Single line | 11 Sparse text
  • --raw: Output plain text only, no markers
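As a rough illustration of how these options could map onto a tesseract invocation, the sketch below builds the argument list. The helper name build_tesseract_command is hypothetical and is not taken from scripts/ocr.py; -l, --psm, and the stdout output base are standard tesseract CLI options.

```python
def build_tesseract_command(image_path, lang="chi_sim+eng", psm=6):
    """Build an argument list for the tesseract CLI (hypothetical helper)."""
    if not 0 <= int(psm) <= 13:
        raise ValueError("psm must be between 0 and 13")
    return [
        "tesseract",
        image_path,
        "stdout",            # write recognized text to standard output
        "-l", lang,          # language spec, e.g. "chi_sim+eng"
        "--psm", str(psm),   # page segmentation mode
    ]


print(build_tesseract_command("receipt.png", lang="chi_sim", psm=3))
```

Returning a list rather than a single string keeps the command suitable for subprocess.run without a shell.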

Auto-Detects Input Type

  1. Local path: /Users/xxx/Downloads/xxx.png
  2. Web URL: https://example.com/image.png — OSS temp links work too
  3. Base64: Pasted image data from clipboard — just paste directly
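One way such auto-detection could work is sketched below. This is a heuristic written for illustration, not the logic in scripts/ocr.py: the function name detect_input_type and the 64-character threshold are assumptions.

```python
import base64
import binascii
import os


def detect_input_type(value):
    """Classify an input string as 'url', 'base64', or 'path' (heuristic)."""
    text = value.strip()
    if text.lower().startswith(("http://", "https://")):
        return "url"
    if text.startswith("data:image/"):
        return "base64"  # data URI such as data:image/png;base64,...
    if os.path.exists(os.path.expanduser(text)):
        return "path"
    # Pasted data may contain line breaks; remove whitespace before checking.
    compact = "".join(text.split())
    if len(compact) >= 64:  # assumed threshold to avoid misreading short paths
        try:
            base64.b64decode(compact, validate=True)
            return "base64"
        except (binascii.Error, ValueError):
            pass
    return "path"  # fall back; the caller can report a missing file


print(detect_input_type("https://example.com/image.png"))
```

Checking for an existing file before attempting base64 decoding avoids misclassifying paths that happen to consist only of base64 characters.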

Workflow

  1. Receive image input → auto-detect type (local path / URL / base64)
  2. URL → curl downloads to temp file
  3. Base64 → decode to temp file
  4. Run tesseract OCR
  5. Output plain text
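The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual scripts/ocr.py: the names materialize and run_ocr are hypothetical, the download uses urllib.request rather than curl, and the run parameter exists only so the subprocess call can be swapped out when testing.

```python
import base64
import os
import subprocess
import tempfile
import urllib.request


def materialize(value, kind):
    """Return (image_path, is_temp) for kind in {'path', 'url', 'base64'}."""
    if kind == "path":
        return os.path.expanduser(value), False
    if kind == "url":
        with urllib.request.urlopen(value, timeout=30) as response:
            data = response.read()
    else:
        # Accept both raw base64 and data URIs (data:image/png;base64,...).
        payload = value.split(",", 1)[1] if value.startswith("data:") else value
        data = base64.b64decode(payload)
    fd, tmp_path = tempfile.mkstemp(suffix=".img")
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return tmp_path, True


def run_ocr(value, kind, lang="chi_sim+eng", psm=6, run=subprocess.run):
    """Run tesseract on the input and return the recognized text."""
    image_path, is_temp = materialize(value, kind)
    try:
        result = run(
            ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    finally:
        # Remove the temp file even if tesseract fails.
        if is_temp and os.path.exists(image_path):
            os.remove(image_path)
```

Passing the command as an argument list and cleaning up in a finally block mirrors the behavior the listing describes for the real script.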

Examples

OCR a Chinese receipt:

python3 scripts/ocr.py ~/Downloads/receipt.png --lang chi_sim

English + Chinese mixed:

python3 scripts/ocr.py https://example.com/doc.jpg --lang chi_sim+eng

Plain text only (no markers):

python3 scripts/ocr.py /path/to/image.png --raw

Requirements

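  • tesseract must be installed; on macOS: brew install tesseract
  • Language packs: Homebrew's tesseract formula includes only English data; install other languages such as chi_sim with brew install tesseract-lang
  • On Apple Silicon Macs the Homebrew binary is at /opt/homebrew/bin/tesseract (Intel Macs: /usr/local/bin/tesseract)
  • Temp files are deleted automatically after execution
  • For best accuracy on receipts/screenshots: try --psm 3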
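Before running OCR, it can help to confirm that tesseract and the needed language data are present. The sketch below is one possible check; the function names are hypothetical, and it relies only on the tesseract --list-langs option.

```python
import shutil
import subprocess


def parse_list_langs(output):
    """Extract language codes from `tesseract --list-langs` output."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # The first line is a header such as: List of available languages ... (3):
    return {line for line in lines if not line.lower().startswith("list of")}


def missing_languages(requested, installed):
    """Return codes from a spec like 'chi_sim+eng' that are not installed."""
    return [code for code in requested.split("+") if code not in installed]


def check_environment(requested="chi_sim+eng"):
    """Report whether tesseract and the requested languages are available."""
    binary = shutil.which("tesseract")
    if binary is None:
        return "tesseract not found on PATH"
    result = subprocess.run([binary, "--list-langs"], capture_output=True, text=True)
    # Some versions print the list to stderr, so both streams are parsed.
    installed = parse_list_langs(result.stdout + result.stderr)
    missing = missing_languages(requested, installed)
    return f"missing language packs: {missing}" if missing else "ok"


print(check_environment())
```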
Security recommendations
This skill appears to do exactly what it says: local OCR via your system tesseract. Before installing/using it: (1) ensure tesseract and any language packs you need are installed locally; (2) do not pass untrusted URLs or pasted base64 from unknown sources (the script will download and process whatever URL you supply); (3) be aware the script calls subprocesses (curl as a fallback and tesseract) and writes temporary files which it deletes; and (4) no credentials are requested, and results are printed locally (no external transmission coded into the skill). If you need automatic fetching from arbitrary web locations in a sensitive environment, consider restricting allowed sources or reviewing network policies first.
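For the "restricting allowed sources" suggestion, a minimal sketch of a host allowlist is shown below. It is not part of the skill; ALLOWED_HOSTS and is_allowed_url are hypothetical names, and the check would need to run before any URL is handed to the OCR script.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; replace with the hosts you actually trust.
ALLOWED_HOSTS = {"example.com"}


def is_allowed_url(url, allowed_hosts=ALLOWED_HOSTS):
    """Accept only http(s) URLs whose host is allowlisted (or a subdomain)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in allowed_hosts)


print(is_allowed_url("https://example.com/doc.jpg"))
```

Matching on the parsed hostname rather than on a substring of the URL avoids accepting look-alike hosts such as example.com.evil.net.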
Functional analysis
Type: OpenClaw Skill · Name: image2text · Version: 1.0.0
The image2text skill is a legitimate utility for performing OCR on local files, URLs, or base64-encoded images using Tesseract. The Python script (scripts/ocr.py) handles external inputs safely by calling subprocess.run with argument lists instead of shell execution, and it cleans up its temporary files. No evidence of malicious intent, data exfiltration, or prompt injection was found.
Capability assessment
Purpose & Capability
Name, description, SKILL.md, and the included script all describe the same functionality: take a local path/URL/base64 input, download or decode it to a temp file, run local tesseract, and return extracted text. Required capabilities (tesseract binary) are consistent with the purpose; no unrelated env vars or credentials are requested.
Instruction Scope
Runtime instructions and the script stay within OCR scope: they accept local/URL/base64 inputs, download or decode to temp files, run tesseract, and output text. The script will download arbitrary URLs supplied by the user (urllib or curl) and invokes subprocesses (curl, tesseract). These behaviors are expected for a URL-capable OCR tool but mean the agent will fetch remote data you provide — avoid passing untrusted URLs or base64 content.
Install Mechanism
There is no install specification; the skill is instruction-only and ships a small Python script. The only external dependency is the system tesseract binary (SKILL.md suggests brew install on mac). No downloaded archives or non-standard installers are used.
Credentials
The skill requires no environment variables, credentials, or config paths. It only uses system binaries (curl if urllib fails, and tesseract) and temporary files; requested permissions are proportional to its stated function.
Persistence & Privilege
always is false and the skill does not attempt to modify other skills, global agent config, or persist credentials. It writes temporary files during execution and deletes them in the finally block.
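How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install image2text
  3. Once installed, invoke the skill by name or trigger it with /image2text
  4. Provide the inputs described in the skill's parameter documentation to get structured output
Version history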
v1.0.0
Initial release — extract text from any image using tesseract OCR. Supports local paths, URLs (OSS/http/https), and base64 clipboard input. Works with text-only AI models that lack vision capability. 30+ languages supported.
Metadata
Slug: image2text
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Versions released: 1
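FAQ

What is image2text?

It extracts text from images using tesseract OCR, supporting local files, URLs, and base64 inputs for text-only AI models without vision capability. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 86 times so far.

How do I install image2text?

Run the command "/install image2text" in the OpenClaw or Claude Code chat box to install it in one step. The skill itself needs no extra configuration, but tesseract must already be installed on the system.

Is image2text free?

Yes. image2text is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does image2text support?

image2text is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed image2text?

It is developed and maintained by caiming0331 (@caiming0331); the current version is v1.0.0.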
