← 返回 Skills 市场
wangning823-arch

Image Recognition

作者 wangning823-arch · GitHub ↗ · v1.1.0 · MIT-0
cross-platform ⚠ suspicious
367
总下载
0
收藏
3
当前安装
2
版本数
在 OpenClaw 中安装
/install image-recognition
功能描述
图片识别 - 通用图片识别技能,支持 OCR 文字提取、物体识别、场景分析等。自动使用用户配置的视觉模型,适用于 Android/Termux 环境。
使用说明 (SKILL.md)

Image Recognition Skill (图片识别)

适用于 Android/Termux 环境的图片识别技能

何时使用

使用此技能:

  • "识别这张图片"
  • "图片里有什么"
  • "提取图片中的文字"
  • OCR 文字识别
  • 物体/场景识别
  • 截图内容分析

不使用此技能:

  • 用户明确要求用其他 OCR 服务
  • 图片文件不存在或损坏

技术原理

核心方法:

  1. 读取图片文件为二进制
  2. Base64 编码
  3. 调用用户配置的视觉模型 API
  4. 返回识别结果

为什么不用 sharp:

  • sharp 模块在 Termux (Android arm64) 无法加载
  • 直接使用 Python + requests 调用 API 更稳定

支持的模型提供商:

  • ✅ Bailian (通义千问) - qwen3.5-plus, qwen-vl-max
  • ✅ OpenRouter - 支持视觉的模型
  • ✅ 其他 OpenAI 兼容接口 - 支持 image_url 格式的模型

配置

方式一:使用 OpenClaw 配置的模型(推荐)

脚本会自动读取 OpenClaw 的配置文件 ~/.openclaw/openclaw.json,使用已配置的模型和 API Key。

无需额外配置! 只要你的 OpenClaw 配置了支持视觉的模型即可。

方式二:手动配置环境变量

# Bailian (通义千问)
export IMAGE_MODEL_PROVIDER="bailian"
export IMAGE_MODEL_API_KEY="sk-sp-xxxxxxxxxxxxx"
export IMAGE_MODEL_NAME="qwen3.5-plus"
export IMAGE_MODEL_ENDPOINT="https://coding.dashscope.aliyuncs.com/v1/chat/completions"

# OpenRouter
export IMAGE_MODEL_PROVIDER="openrouter"
export IMAGE_MODEL_API_KEY="sk-or-xxxxxxxxxxxxx"
export IMAGE_MODEL_NAME="qwen/qwen-2.5-vl-72b-instruct"
export IMAGE_MODEL_ENDPOINT="https://openrouter.ai/api/v1/chat/completions"

Python 依赖

pip3 install requests Pillow

使用方法

方式一:自动检测(推荐)

脚本会自动读取 OpenClaw 配置文件,使用已配置的支持视觉的模型

python3 ~/.openclaw/skills/image-recognition/recognize.py /path/to/image.jpg

无需额外配置! 只要你的 OpenClaw 配置了支持视觉的模型(如 qwen3.5-plus)即可。

方式二:手动指定环境变量

# Bailian (通义千问)
export IMAGE_MODEL_PROVIDER="bailian"
export IMAGE_MODEL_API_KEY="sk-sp-xxxxxxxxxxxxx"
export IMAGE_MODEL_NAME="qwen3.5-plus"

# OpenRouter
export IMAGE_MODEL_PROVIDER="openrouter"
export IMAGE_MODEL_API_KEY="sk-or-xxxxxxxxxxxxx"
export IMAGE_MODEL_NAME="qwen/qwen-2.5-vl-72b-instruct"

# 使用
python3 recognize.py /path/to/image.jpg

方式三:在 Python 代码中使用

from recognize import recognize_image, get_model_config

# 自动检测配置
config = get_model_config()
print(f"使用模型:{config['provider']}/{config['model']}")

# 识别图片
result = recognize_image("/path/to/image.jpg", "提取图片中的文字")
print(result)

# 或手动指定配置
custom_config = {
    "provider": "bailian",
    "api_key": "sk-sp-xxx",
    "model": "qwen3.5-plus",
    "endpoint": "https://coding.dashscope.aliyuncs.com/v1/chat/completions",
    "headers": {"Authorization": f"Bearer sk-sp-xxx"}
}
result = recognize_image("/path/to/image.jpg", config=custom_config)

API 配置(高级)

大多数用户不需要手动配置,脚本会自动使用 OpenClaw 的模型配置。

自动检测逻辑

  1. 优先使用环境变量(如果设置了)
  2. 其次读取 OpenClaw 配置~/.openclaw/openclaw.json
  3. 最后使用默认配置(Bailian qwen3.5-plus)

手动配置各提供商

Bailian (通义千问)

export IMAGE_MODEL_PROVIDER="bailian"
export IMAGE_MODEL_API_KEY="sk-sp-xxxxxxxxxxxxx"
export IMAGE_MODEL_NAME="qwen3.5-plus"
# 端点自动设置为:https://coding.dashscope.aliyuncs.com/v1/chat/completions

OpenRouter

export IMAGE_MODEL_PROVIDER="openrouter"
export IMAGE_MODEL_API_KEY="sk-or-xxxxxxxxxxxxx"
export IMAGE_MODEL_NAME="qwen/qwen-2.5-vl-72b-instruct"
# 端点自动设置为:https://openrouter.ai/api/v1/chat/completions

其他 OpenAI 兼容接口

export IMAGE_MODEL_PROVIDER="openai"
export IMAGE_MODEL_API_KEY="sk-xxxxxxxxxxxxx"
export IMAGE_MODEL_NAME="gpt-4o"
export IMAGE_MODEL_ENDPOINT="https://api.openai.com/v1/chat/completions"

支持的平台

已测试:

  • Android (Termux) - arm64
  • Linux - x86_64, arm64
  • macOS - x86_64, arm64

支持的图片格式:

  • JPEG/JPG
  • PNG
  • GIF (静态)
  • WebP
  • BMP

常见问题

Q: 为什么不用 sharp 模块?

A: sharp 依赖 libvips,在 Termux (Android) 上编译和安装非常困难。直接使用 Python + requests 调用 API 更简单稳定。

Q: API Key 无效怎么办?

A: 检查:

  1. API Key 是否正确(sk-sp- 开头)
  2. 是否使用了正确的端点(coding.dashscope.aliyuncs.com
  3. API Key 是否已开通视觉模型权限

Q: 识别速度慢怎么办?

A:

  • 图片太大 → 压缩到 2MB 以内
  • 网络问题 → 检查网络连接
  • 模型响应慢 → 尝试 qwen-turbo

Q: 识别不准确怎么办?

A:

  • 图片模糊 → 提供更清晰的图片
  • 文字太小 → 放大或裁剪
  • 特殊字体 → 尝试其他 OCR 服务

成本

  • qwen3.5-plus:约 0.002 元/次(1000x1000 图片)
  • 具体价格参考:https://help.aliyun.com/zh/model-studio/pricing

替代方案

如无 Bailian API,可使用:

  • OpenRouter: qwen/qwen-2.5-vl-72b-instruct
  • 本地 OCR: tesseract(需要安装)
  • 其他云服务:百度 OCR、腾讯 OCR 等

更新日志

  • 2026-04-01: 初始版本,支持 Bailian API 图片识别
  • 适用于 Android/Termux 环境,绕过 sharp 模块限制
安全使用建议
What to consider before installing: - Sensitive-data transmission: This skill sends entire images (base64) to external model endpoints. If you care about privacy/confidentiality, do not use it with sensitive images or ensure you control the API key/provider. - Hard-coded fallback API key: recognize.py and usage-guide.md include a hard-coded API key (sk-sp-e20dc070c4724e909f4b0be4d1d386e0). Treat this as suspicious — it may be a shared or compromised key. Remove it and configure your own API key before use, and avoid running the skill until the key is removed/rotated. - Implicit access to user config: The script will read ~/.openclaw/openclaw.json for model/provider API keys. Review that file first to ensure it doesn't contain credentials you don't want the skill to use. The skill should have declared this config path in metadata but did not. - Prefer explicit configuration: If you install, set IMAGE_MODEL_API_KEY and IMAGE_MODEL_PROVIDER yourself rather than relying on automatic detection or the embedded fallback key. - Audit endpoints and headers: Confirm endpoints (coding.dashscope.aliyuncs.com, openrouter.ai, or your OpenAI-compatible endpoint) are the ones you expect. The script adds extra headers (e.g., HTTP-Referer, X-Title) for some providers; review these if you have strict privacy requirements. - Test in a sandbox: Run the script in an isolated environment with non-sensitive images and with network monitoring to confirm behavior. If you need offline/local OCR, consider alternatives like tesseract instead of sending images to the cloud. If you want, I can: (1) point to the exact lines where the hard-coded key appears, (2) suggest a minimal code change to remove the fallback key and require explicit configuration, or (3) produce instructions for safely testing the skill in a sandboxed environment.
功能分析
Type: OpenClaw Skill Name: image-recognition Version: 1.1.0 The skill script `recognize.py` accesses the sensitive OpenClaw configuration file `~/.openclaw/openclaw.json` to automatically retrieve API keys for various providers. While this is documented as a feature for 'automatic configuration,' it grants the skill access to all stored credentials in the agent's environment. Additionally, the script and `usage-guide.md` contain a hardcoded API key (`sk-sp-e20dc070c4724e909f4b0be4d1d386e0`). Although the behavior is aligned with the stated purpose of image recognition, the access to global configuration files is a high-risk pattern.
能力评估
Purpose & Capability
The skill claims to perform OCR/object/scene recognition and does so by encoding images and calling external visual-model APIs — this is coherent with the stated purpose. However, it also automatically reads the user's OpenClaw config file (~/.openclaw/openclaw.json) to extract provider API keys and model info; that behavior is reasonable for an OpenClaw-integrated skill but the registry metadata did not declare the config path or required secrets, which is an omission.
Instruction Scope
SKILL.md and recognize.py instruct the agent to read ~/.openclaw/openclaw.json and/or environment variables for API keys and then send base64 image data to external endpoints (coding.dashscope.aliyuncs.com, openrouter.ai, or other OpenAI-compatible endpoints). Sending user images and any extracted config/API keys to remote services is expected for a cloud-model-based OCR skill, but the skill's instructions also include an embedded fallback API key and example code that hard-codes the same key — this increases risk because it causes sensitive data (images) to be sent using a key not controlled by the user.
Install Mechanism
No install spec is present; the skill is instruction-only with a small Python script and uses standard pip dependencies (requests, Pillow). That minimizes disk-level install risk.
Credentials
The skill does not declare required env vars but supports/reads IMAGE_MODEL_* environment variables and will read OpenClaw's config file for API keys and endpoints. Critically, if no API key is found it falls back to a hard-coded key ('sk-sp-e20dc070c4724e909f4b0be4d1d386e0') embedded in recognize.py and usage-guide.md. Hard-coded credentials in the codebase are disproportionate and risky: they may be revoked/overused by others, leaked, or allow exfiltration of images under someone else's account.
Persistence & Privilege
The skill does not request permanent/always-on presence and does not modify other skills or global agent settings. Autonomous invocation is allowed (platform default) but not combined with elevated privileges in this package.
如何使用
  1. 确保已安装 OpenClaw(本地或 Docker 部署)
  2. 在对话框中输入安装命令:/install image-recognition
  3. 安装完成后,直接呼叫该 Skill 的名称或使用 /image-recognition 触发
  4. 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
版本历史
v1.1.0
- Added automatic model provider detection: now supports Bailian, OpenRouter, and other OpenAI-compatible visual models by reading configuration or environment variables. - Updated documentation with new provider options and configuration examples; homepage now at clawhub.ai. - Removed mandatory sharp and related references; continues to require only `requests` and `Pillow`. - Simplified usage: script auto-detects the configured model from OpenClaw or environment variables for seamless setup. - Maintains Android/Termux compatibility and supports a wide range of image formats.
v1.0.0
Initial release: Adds image recognition skill for Android/Termux using Bailian API, bypassing sharp module limitations. - Supports OCR, object recognition, and scene analysis via API. - Python-based implementation using requests and Pillow for stability. - Requires BAILIAN_API_KEY environment variable for authentication. - Compatible with JPEG, PNG, GIF (static), WebP, and BMP image formats. - Designed for Android (Termux), Linux, and macOS platforms.
元数据
Slug image-recognition
版本 1.1.0
许可证 MIT-0
累计安装 3
当前安装数 3
历史版本数 2
常见问题

Image Recognition 是什么?

图片识别 - 通用图片识别技能,支持 OCR 文字提取、物体识别、场景分析等。自动使用用户配置的视觉模型,适用于 Android/Termux 环境。 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 367 次。

如何安装 Image Recognition?

在 OpenClaw 或 Claude Code 对话框中运行命令「/install image-recognition」即可一键安装,无需额外配置。

Image Recognition 是免费的吗?

是的,Image Recognition 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。

Image Recognition 支持哪些平台?

Image Recognition 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。

谁开发了 Image Recognition?

由 wangning823-arch(@wangning823-arch)开发并维护,当前版本 v1.1.0。

💬 留言讨论