
03 Image Recognition

Author: nidhov01 · v1.0.0 · MIT-0

cross-platform · ⚠ suspicious

Total downloads: 256

Install in OpenClaw

/install 03

Description

A secure image recognition tool that supports both a local mode and an API mode.

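Usage Instructions (SKILL.md)

AI Vision Recognition Skill

A secure image recognition tool that supports a local mode and an API mode (GPT-4o/Claude) and protects privacy.

Skill Description

Provides image content recognition, description, and analysis. Use API mode (OpenAI or Claude) for higher accuracy, or local mode (no API required) to keep images on your machine.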

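Use Cases

User: "Describe the contents of this image" → analyzes the image and returns a description
User: "What objects are in this image?" → identifies the objects in the image
User: "Analyze this screenshot" → extracts the text and interface information from the image
User: "Batch-analyze these images" → processes multiple images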

Tools and Dependencies

Tool List

scripts/vision_ai.py: core vision recognition module

API Keys

Optional (API mode):

OPENAI_API_KEY: OpenAI API key (GPT-4o)
ANTHROPIC_API_KEY: Anthropic API key (Claude)

External Dependencies

API mode (recommended):

Python 3.7+
openai or anthropic

Local mode:

Python 3.7+
torch (PyTorch)
transformers
Pillow
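
As a quick pre-flight check, the sketch below reports which of the packages listed above cannot be imported in the current environment. The helper `missing_packages` and the two list names are hypothetical and not part of the skill; the only assumption about the packages themselves is that Pillow is imported under the name `PIL`.

```python
import importlib.util

# Top-level import names for the packages listed above.
# Assumption: Pillow is imported as "PIL".
API_MODE_PACKAGES = ["openai", "anthropic"]  # either one is enough for API mode
LOCAL_MODE_PACKAGES = ["torch", "transformers", "PIL"]


def missing_packages(names):
    """Return the top-level import names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_packages(LOCAL_MODE_PACKAGES)` before switching to local mode shows what still needs to be installed. For API mode, the skill needs only one of the two packages, so a non-empty result is not necessarily a problem.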

Configuration

Environment Variables

# API mode (recommended)
export OPENAI_API_KEY="sk-xxx"
# or
export ANTHROPIC_API_KEY="sk-ant-xxx"
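
One way to decide between the two modes at runtime is to look at which of the documented keys is set. The sketch below is illustrative only: `pick_mode` is a hypothetical helper, the OpenAI-before-Anthropic preference is an assumption, and the skill's own provider selection may work differently.

```python
import os


def pick_mode(env=None):
    """Return ("api", provider) if a documented key is set, else ("local", None)."""
    env = os.environ if env is None else env
    # Assumption: prefer OpenAI when both keys are present.
    if env.get("OPENAI_API_KEY"):
        return ("api", "openai")
    if env.get("ANTHROPIC_API_KEY"):
        return ("api", "anthropic")
    # No documented key found: fall back to local mode.
    return ("local", None)
```

The first element of the result can be passed as the `mode` argument shown in the usage examples, for instance `VisionAI(mode=pick_mode()[0])`.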

Supported Image Formats

JPEG (.jpg, .jpeg)
PNG (.png)
WebP (.webp)
GIF (.gif)
Maximum file size: 10MB
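
The limits above can be checked before an image is handed to the skill. The following is a minimal sketch, not the skill's own validation: `check_image` is a hypothetical helper, it looks only at the file extension rather than the actual file contents, and it assumes "10MB" means 10 × 1024 × 1024 bytes.

```python
import os

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".gif"}
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10MB" is read as 10 MiB


def check_image(path):
    """Return None if the file passes the checks, otherwise a short reason string."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return f"unsupported extension: {ext or '(none)'}"
    if not os.path.isfile(path):
        return "file not found"
    if os.path.getsize(path) > MAX_BYTES:
        return "file larger than 10MB"
    return None
```

Running this check first lets a caller reject an oversized or unsupported file locally instead of spending an API request on it.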

Usage Examples

Basic Usage

from vision_ai import VisionAI

# API mode (recommended)
vision = VisionAI(mode="api")
result = vision.analyze("photo.jpg", "Describe the objects in the image")

# Local mode (no API required)
vision = VisionAI(mode="local")
result = vision.analyze("photo.jpg")

# Batch analysis
results = vision.batch_analyze("./images")

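Scenario 1: Describe the contents of an image

User: "What is in this image?"

AI:

vision = VisionAI(mode="api")
result = vision.analyze("photo.jpg", "Describe the contents of the image")
# Returns: The image shows a golden dog running on grass...

Scenario 2: Recognize text in an image

User: "Extract the text from this screenshot"

AI:

result = vision.analyze("screenshot.png", "Extract all text in the image")
# Returns: the recognized text

Scenario 3: Batch analysis

User: "Analyze all the images in the images folder"

AI:

results = vision.batch_analyze("./images")
# Returns: the analysis result for each image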

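Troubleshooting

Problem 1: API mode calls fail

Symptom: an API error is returned

Solution:

Check that the API key is correct
Confirm that your API quota is sufficient
Check the network connection
Verify the image format and size

Problem 2: Local mode is slow on first run

Symptom: the first image analysis is very slow

Solution:

The first run has to download the model (about 500MB)
Make sure the network connection is working
The model is cached after the download, so later runs are at normal speed

Problem 3: Image format not supported

Symptom: a file format error is reported

Solution:

Confirm that the file is in JPG/PNG/WebP/GIF format
Check that the file is no larger than 10MB
Try converting the image to another format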

Performance Comparison

Mode	Accuracy	Speed	Cost	Privacy
API mode	⭐⭐⭐⭐⭐	Fast	Pay per use	Images are uploaded
Local mode	⭐⭐⭐	Slow	Free	Fully local

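Notes

Sensitive images: use local mode to protect privacy
API quota: API mode is billed by usage, so keep an eye on cost
Batch processing: watch for API rate limits
Model download: local mode has to download the model on first run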
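
For the rate-limit note above, one common approach is to pause between images and retry failed calls with exponential backoff. The sketch below assumes nothing about the skill beyond a callable that takes a path and returns a result; `analyze_all` and its parameters are hypothetical, and a real caller would catch the provider's specific rate-limit exception rather than `Exception`.

```python
import time


def analyze_all(paths, analyze, delay=1.0, retries=3, sleep=time.sleep):
    """Call analyze(path) for each path, pausing between calls and retrying failures.

    Returns a dict mapping each path to its result, or to the last exception
    raised if every attempt failed.
    """
    results = {}
    for path in paths:
        for attempt in range(retries):
            try:
                results[path] = analyze(path)
                break
            except Exception as exc:  # assumption: any error is treated as retryable
                results[path] = exc
                # Exponential backoff: delay, 2*delay, 4*delay, ...
                sleep(delay * 2 ** attempt)
        # Fixed pause between images to stay under the rate limit.
        sleep(delay)
    return results
```

With the class from the usage examples, this could be called as `analyze_all(paths, vision.analyze)`. The `sleep` parameter exists so the waiting can be replaced in tests.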

Security Recommendations

This skill appears to implement the advertised local/API image recognition, but there are red flags you should check before installing:

- Missing LLMConfig: vision_ai.py imports llm_config from a parent directory, but that file is not included. Inspect or obtain the LLMConfig implementation before running; it determines which environment variables are read and how API calls are made.
- Undocumented credential usage: SKILL.md mentions optional OPENAI_API_KEY / ANTHROPIC_API_KEY, but the code supports many other providers and likely reads other env vars via LLMConfig. Do not supply broad or high-permission API keys until you confirm which keys are needed.
- Run install in an isolated environment: use a dedicated Python virtualenv or container when running install.sh (it will create a venv if you accept). Review install.sh and run.sh contents first.
- For privacy, prefer local mode: local mode uses downloaded models and avoids sending images to external APIs; note first-run model download (~500MB).
- Verify network behavior: the code encodes images to base64 and sends them to provider APIs. If you must process sensitive images, avoid API mode or audit the provider endpoints used by your LLMConfig implementation.
- If you are not comfortable auditing LLMConfig or trusting the unknown publisher (no homepage, unknown source), do not install on production or sensitive systems.

If you can obtain the missing LLMConfig file (or the publisher provides it), review it to confirm which environment variables, endpoints, and auth flows it uses; that information would significantly increase confidence.
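
Because the review warns that undocumented environment variables may be read, one precaution is to list which credential-like variables are visible to the process before running the skill. This is a heuristic sketch: `visible_credentials` and the `KEY_HINTS` substrings are hypothetical choices, and the function deliberately returns only variable names, never values.

```python
import os

# Substrings that commonly appear in credential-related variable names (a heuristic).
KEY_HINTS = ("API_KEY", "TOKEN", "SECRET", "BASE_URL")


def visible_credentials(env=None):
    """Return the sorted names of environment variables that look like credentials."""
    env = os.environ if env is None else env
    return sorted(
        name for name in env
        if any(hint in name.upper() for hint in KEY_HINTS)
    )
```

If the list contains keys the skill has no documented need for, running it in a shell or container where those variables are unset limits what an unreviewed LLMConfig could pick up.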

Functional Analysis

Type: OpenClaw Skill
Name: 03
Version: 1.0.0

The skill bundle is a legitimate image recognition tool supporting both local (BLIP model) and API-based (OpenAI, Anthropic, Zhipu) processing. It includes robust file validation logic in `vision_ai.py` to check MIME types and file sizes, and the `install.sh` script performs standard dependency management without suspicious side effects. No indicators of data exfiltration, malicious execution, or prompt injection were found.

Capability Assessment

ℹ Purpose & Capability

The name and description claim local and API image recognition, which matches the provided code (local model via transformers, or API mode). However, the Python code references many providers (zhipu, deepseek, qwen, etc.) while SKILL.md only documents OpenAI/Anthropic. The code also imports an LLMConfig from a parent directory that is not present in the package; this mismatch is unexpected.

⚠ Instruction Scope

SKILL.md instructs use of OPENAI_API_KEY / ANTHROPIC_API_KEY (optional). The runtime code delegates provider selection and credential reading to LLMConfig (imported from a parent path), so the skill may read other provider-related environment variables or configs not documented. Instructions and code could thus access credentials beyond those declared in the registry metadata.

ℹ Install Mechanism

No formal install spec in registry, but an install.sh and requirements.txt are included. install.sh interactively creates a Python venv and pip-installs either API or local dependencies. This is standard but interactive; the package does not download code from arbitrary URLs during install (only pip).

⚠ Credentials

Registry metadata lists no required environment variables, but SKILL.md documents optional OPENAI_API_KEY and ANTHROPIC_API_KEY. The code uses an external LLMConfig module (not included) which likely reads provider-specific env vars (API keys and base URLs). That mismatch means the skill may require or read secrets beyond what is declared, which is disproportionate and undocumented.

✓ Persistence & Privilege

always is false and the skill does not request system-wide privileges. The included install.sh writes a run.sh and can create a venv in the current directory—normal installer behavior. It does not modify other skills or system configs.

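How to Use

Make sure OpenClaw is installed (local or Docker deployment)
Enter the install command in the chat box: /install 03
Once installed, invoke the Skill by name or trigger it with /03
Provide the inputs described in the Skill's parameter notes to get structured output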

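Version History

v1.0.0

vision-ai 1.0.0 initial release

- Provides secure image recognition with a local mode and an API mode (OpenAI GPT-4o/Claude)
- Supports image description, object recognition, text extraction, and batch analysis
- Local mode needs no API and protects user privacy
- Supports common image formats (JPG/PNG/WebP/GIF), up to 10MB per image
- Includes detailed configuration, dependency notes, and usage examples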

Metadata

Slug 03

Version 1.0.0

License MIT-0

Total installs 1

Current installs 1

Released versions 1

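FAQ

What is 03 Image Recognition?

A secure image recognition tool that supports both a local mode and an API mode. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 256 times so far.

How do I install 03 Image Recognition?

Run the command "/install 03" in the OpenClaw or Claude Code chat box for a one-step install with no extra configuration. API mode still needs an API key set in the environment.

Is 03 Image Recognition free?

Yes. 03 Image Recognition is free under the MIT-0 license and can be downloaded, installed, and used without restriction. Usage in API mode is billed by the API provider.

Which platforms does 03 Image Recognition support?

03 Image Recognition is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed 03 Image Recognition?

It is developed and maintained by nidhov01 (@nidhov01). The current version is v1.0.0.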