nidhov01

03 图像识别 (Image Recognition)

by nidhov01 · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Downloads: 256
Stars: 0
Active Installs: 1
Versions: 1
Install in OpenClaw
/install 03
Description
A secure image recognition tool that supports both local and API modes
README (SKILL.md)

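AI Vision Recognition Skill

A secure image recognition tool with a local mode and an API mode (GPT-4o/Claude), designed to protect privacy.

Skill Description

Provides image content recognition, description, and analysis. Use API mode (OpenAI or Claude) for higher accuracy, or local mode (no API required) to keep images on your machine.

Use Cases

  • User: "Describe the contents of this image" → analyzes the image and returns a description
  • User: "What objects are in this image?" → identifies the objects in the image
  • User: "Analyze this screenshot" → extracts the text and interface information from the image
  • User: "Batch-analyze these images" → processes multiple images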

Tools and Dependencies

Tool List

  • scripts/vision_ai.py: core vision recognition module

API Keys

Optional (API mode)

  • OPENAI_API_KEY: OpenAI API key (GPT-4o)
  • ANTHROPIC_API_KEY: Anthropic API key (Claude)

External Dependencies

API mode (recommended)

  • Python 3.7+
  • openai or anthropic

Local mode

  • Python 3.7+
  • torch (PyTorch)
  • transformers
  • Pillow

Configuration

Environment Variables

# API mode (recommended)
export OPENAI_API_KEY="sk-xxx"
# or
export ANTHROPIC_API_KEY="sk-ant-xxx"
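
As a rough illustration of how the two documented variables could drive provider selection, here is a minimal sketch. The function name `pick_provider` and the OpenAI-first precedence are assumptions made for this example; the skill's own selection logic is not shown in the README.

```python
import os


def pick_provider(env=None):
    """Return "openai", "anthropic", or None based on which key is set.

    Hypothetical helper: the precedence (OpenAI first) is an assumption,
    not the skill's documented behavior.
    """
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    return None  # no key set: API mode is unavailable, fall back to local mode
```

Passing a plain dict instead of `os.environ` makes it easy to check the behavior without touching real credentials.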

Supported Image Formats

  • JPEG (.jpg, .jpeg)
  • PNG (.png)
  • WebP (.webp)
  • GIF (.gif)
  • Maximum file size: 10MB
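
The limits above can be checked before an image is sent anywhere. The following is a minimal sketch, assuming the check is done by file extension and that "10MB" means 10 × 1024 × 1024 bytes; `check_image` is a hypothetical helper, not part of the skill's API.

```python
import os

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".gif"}
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB


def check_image(path):
    """Return (ok, reason) for a candidate image file."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return False, f"unsupported extension: {ext or '(none)'}"
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        return False, f"file too large: {size} bytes"
    return True, "ok"
```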

Usage Examples

Basic Usage

# Assumes scripts/ is on the import path
from vision_ai import VisionAI

# API mode (recommended)
vision = VisionAI(mode="api")
result = vision.analyze("photo.jpg", "Describe the objects in the image")

# Local mode (no API required)
vision = VisionAI(mode="local")
result = vision.analyze("photo.jpg")

# Batch analysis
results = vision.batch_analyze("./images")

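Scenario 1: Describe the image contents

User: "What is in this image?"

AI:

vision = VisionAI(mode="api")
result = vision.analyze("photo.jpg", "Describe the image contents")
# Returns: The image contains a golden-colored dog running on grass...

Scenario 2: Recognize text in an image

User: "Extract the text from this screenshot"

AI:

result = vision.analyze("screenshot.png", "Extract all text in the image")
# Returns: the recognized text

Scenario 3: Batch analysis

User: "Analyze all the images in the images folder"

AI:

results = vision.batch_analyze("./images")
# Returns: the analysis result for each image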
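
The README does not show how `batch_analyze` walks a folder. One plausible shape is sketched below, with the hypothetical helpers `list_images` and `analyze_folder`; the non-recursive scan and the per-file error handling are assumptions, not the skill's confirmed behavior.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp", ".gif"}


def list_images(folder):
    """Return supported image files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def analyze_folder(folder, analyze):
    """Call analyze(path_str) for each image; map file name -> result or error."""
    results = {}
    for path in list_images(folder):
        try:
            results[path.name] = analyze(str(path))
        except Exception as exc:  # one bad image should not stop the batch
            results[path.name] = f"error: {exc}"
    return results
```

In practice `analyze` could be a bound method such as `vision.analyze`; any callable that takes a path string works.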

故障排除

问题1:API模式调用失败

现象:返回API错误

解决

  1. 检查API密钥是否正确
  2. 确认API配额充足
  3. 检查网络连接
  4. 验证图片格式和大小

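Problem 2: Local mode is slow on first run

Symptom: the first image analysis is very slow

Solution

  • The first run has to download the model (about 500MB)
  • Make sure the network connection is working
  • The model is cached once downloaded, so later runs proceed at normal speed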

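Problem 3: Image format not supported

Symptom: a file format error is reported

Solution

  • Confirm that the file is in JPG/PNG/WebP/GIF format
  • Check that the file size does not exceed 10MB
  • Try converting the image to another format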
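
A file extension can be wrong, so when a format error is unexpected it may help to look at the file's leading bytes. This sketch uses the well-known signatures for the four supported formats; `sniff_image_format` is a hypothetical helper and is not claimed to match the validation inside vision_ai.py.

```python
def sniff_image_format(path):
    """Guess the real format from the file's first bytes, or return None."""
    with open(path, "rb") as f:
        head = f.read(12)
    if head.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if head[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # WebP is a RIFF container: "RIFF", 4 size bytes, then "WEBP"
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "webp"
    return None
```

If the result disagrees with the extension, renaming or re-exporting the file is a reasonable first thing to try.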

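Performance Comparison

Mode         Accuracy      Speed          Cost          Privacy
API mode     ⭐⭐⭐⭐⭐    (not given)    Pay per use   Images are uploaded
Local mode   ⭐⭐⭐        (not given)    Free          Fully local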

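Notes

  1. Sensitive images: use local mode to protect privacy
  2. API quota: API mode is billed by usage, so keep an eye on cost
  3. Batch processing: watch for API rate limits
  4. Model download: local mode has to download the model on first run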
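
For the rate-limit note, a common pattern is to retry a failed call with exponentially growing delays. The sketch below is generic and hypothetical: `call_with_backoff` is not part of the skill, and catching every `Exception` is a simplification, since real code would catch only the provider's rate-limit error.

```python
import time


def call_with_backoff(func, *args, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(*args); on failure wait base_delay * 2**attempt and retry.

    Re-raises the last exception once `retries` retries have been used up.
    """
    for attempt in range(retries + 1):
        try:
            return func(*args)
        except Exception:  # simplification: narrow this to rate-limit errors
            if attempt == retries:
                raise
            sleep(base_delay * (2 ** attempt))
```

A call could then look like `call_with_backoff(vision.analyze, "photo.jpg", "Describe the image contents")`. The `sleep` parameter exists so the delay can be replaced in tests.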
Usage Guidance
This skill appears to implement the advertised local/API image recognition, but there are red flags you should check before installing:

  • Missing LLMConfig: vision_ai.py imports llm_config from a parent directory, but that file is not included. Inspect or obtain the LLMConfig implementation before running; it determines which environment variables are read and how API calls are made.
  • Undocumented credential usage: SKILL.md mentions optional OPENAI_API_KEY / ANTHROPIC_API_KEY, but the code supports many other providers and likely reads other env vars via LLMConfig. Do not supply broad or high-permission API keys until you confirm which keys are needed.
  • Run install in an isolated environment: use a dedicated Python virtualenv or container when running install.sh (it will create a venv if you accept). Review install.sh and run.sh contents first.
  • For privacy, prefer local mode: local mode uses downloaded models and avoids sending images to external APIs; note first-run model download (~500MB).
  • Verify network behavior: the code encodes images to base64 and sends them to provider APIs. If you must process sensitive images, avoid API mode or audit the provider endpoints used by your LLMConfig implementation.
  • If you are not comfortable auditing LLMConfig or trusting the unknown publisher (no homepage, unknown source), do not install on production or sensitive systems.

If you can obtain the missing LLMConfig file (or the publisher provides it), review it to confirm which environment variables, endpoints, and auth flows it uses; that information would significantly increase confidence.
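
To make the base64 point concrete, the sketch below shows what encoding an image for an API request typically looks like. The helper `to_data_url` and the data-URL shape are assumptions for illustration; the exact payload vision_ai.py builds is not shown here. The full file contents end up in the request body, which is why API mode is a poor fit for sensitive images.

```python
import base64
import mimetypes


def to_data_url(path):
    """Encode a file as a base64 data URL (hypothetical helper)."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        mime = "application/octet-stream"  # unknown extension
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```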
Capability Analysis
Type: OpenClaw Skill
Name: 03
Version: 1.0.0

The skill bundle is a legitimate image recognition tool supporting both local (BLIP model) and API-based (OpenAI, Anthropic, Zhipu) processing. It includes robust file validation logic in `vision_ai.py` to check MIME types and file sizes, and the `install.sh` script performs standard dependency management without suspicious side effects. No indicators of data exfiltration, malicious execution, or prompt injection were found.
Capability Assessment
Purpose & Capability
Name/description claim local and API image recognition which matches the provided code (local model via transformers or API mode). However the Python code references many providers (zhipu, deepseek, qwen, etc.) while SKILL.md only documents OpenAI/Anthropic; the code also imports a LLMConfig from a parent directory that is not present in the package—this mismatch is unexpected.
Instruction Scope
SKILL.md instructs use of OPENAI_API_KEY / ANTHROPIC_API_KEY (optional). The runtime code delegates provider selection and credential reading to LLMConfig (imported from a parent path), so the skill may read other provider-related environment variables or configs not documented. Instructions and code could thus access credentials beyond those declared in the registry metadata.
Install Mechanism
No formal install spec in registry, but an install.sh and requirements.txt are included. install.sh interactively creates a Python venv and pip-installs either API or local dependencies. This is standard but interactive; the package does not download code from arbitrary URLs during install (only pip).
Credentials
Registry metadata lists no required environment variables, but SKILL.md documents optional OPENAI_API_KEY and ANTHROPIC_API_KEY. The code uses an external LLMConfig module (not included) which likely reads provider-specific env vars (API keys and base URLs). That mismatch means the skill may require or read secrets beyond what's declared—disproportionate and undocumented.
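
One way to limit what an unaudited module can read is to launch it with a stripped-down environment that contains only the key you intend to share. The sketch below is a hypothetical helper: `minimal_env` is not part of the skill, and keeping only `PATH` and `HOME` assumes a POSIX-like system (Windows processes usually need additional variables).

```python
import os


def minimal_env(allowed, base=None):
    """Return a copy of the environment restricted to `allowed` plus PATH/HOME."""
    base = os.environ if base is None else base
    keep = set(allowed) | {"PATH", "HOME"}  # assumption: enough for a POSIX shell
    return {k: v for k, v in base.items() if k in keep}
```

The result can be passed as the `env` argument of `subprocess.run` when launching the skill's script, so that any other provider keys in your shell are not visible to it.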
Persistence & Privilege
always is false and the skill does not request system-wide privileges. The included install.sh writes a run.sh and can create a venv in the current directory—normal installer behavior. It does not modify other skills or system configs.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install 03
  3. After installation, invoke the skill by name or use /03
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
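vision-ai 1.0.0 initial release

  • Provides secure image recognition with a local mode and an API mode (OpenAI GPT-4o/Claude)
  • Supports image content description, object recognition, text extraction, and batch analysis
  • Local mode requires no API and protects user privacy
  • Supports common image formats (JPG/PNG/WebP/GIF), up to 10MB per image
  • Includes detailed configuration, dependency notes, and usage examples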
Metadata
Slug 03
Version 1.0.0
License MIT-0
All-time Installs 1
Active Installs 1
Total Versions 1
Frequently Asked Questions

What is 03 图像识别?

A secure image recognition tool that supports both local and API modes. It is an AI Agent Skill for Claude Code / OpenClaw, with 256 downloads so far.

How do I install 03 图像识别?

Run "/install 03" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is 03 图像识别 free?

Yes, 03 图像识别 is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does 03 图像识别 support?

03 图像识别 is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created 03 图像识别?

It is built and maintained by nidhov01 (@nidhov01); the current version is v1.0.0.
