Vision Bot
Author: unixlamadev-spec · GitHub · v1.2.0 · MIT-0
Total downloads: 1341
Favorites: 0
Current installs: 12
Versions: 4
Install in OpenClaw
/install vision-bot
Description
Describe images, detect objects, extract text, and analyze webpages. Pass any image URL directly in your task. Responds in your language.
Security recommendations
This skill sends your image URLs or base64 image data and a secret 'spend token' to aiprox.dev for processing. Before installing, verify you trust aiprox.dev (review their privacy/billing policy and the homepage), and ask the publisher why the example includes 'rail': 'bitcoin-lightning' (it could indicate an unusual billing path). Prefer issuing a revocable or limited-scope token for testing, and try only non-sensitive images first. If you need guarantees that images aren't stored or aren't routed through other services, request proof or choose a provider with clear audited policies. If anything about the owner/homepage looks unfamiliar, treat the token like a password and avoid sharing sensitive images until you validate the service.
Analysis
Type: OpenClaw Skill
Name: vision-bot
Version: 1.2.0
The vision-bot skill is designed to perform image analysis and OCR by sending requests to the aiprox.dev API. It explicitly declares its need for the AIPROX_SPEND_TOKEN environment variable and network access to aiprox.dev in the SKILL.md security manifest. The behavior is transparent, aligns with the stated purpose, and lacks any indicators of malicious intent or unauthorized data exfiltration.
Capability assessment
Purpose & Capability
The name/description (image description, OCR, object detection) aligns with the skill's single runtime action: POSTing tasks and image URLs/base64 to aiprox.dev for processing. Requesting a single spend token for a third-party API is plausible. However, the example includes a 'rail': 'bitcoin-lightning' parameter which is unrelated to image analysis and is unexplained in the manifest — this is unusual and should be clarified.
Instruction Scope
SKILL.md instructs the agent to send task text and image data (URL or base64) plus the spend token to https://aiprox.dev/api/orchestrate. That means potentially sensitive images and any task context will be transmitted off-host. The trust statement claims images are transient and not stored and that processing uses 'Claude via LightningProx' — those are assertions the agent cannot verify from an instruction-only skill. The instructions do not read any local files or unrelated env vars, which is good, but they do enable exfiltration of user-supplied images and text to a third party.
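The request described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the skill's actual implementation: the endpoint URL and the spend_token field come from the analysis on this page, and the image_url / image_base64 names come from the version history, but the "task" key name and the helper build_request are hypothetical. The sketch only constructs the request and never sends it.

```python
import base64
import json
import urllib.request

# Endpoint named in the SKILL.md analysis on this page.
ENDPOINT = "https://aiprox.dev/api/orchestrate"


def build_request(task, spend_token, image_url=None, image_bytes=None):
    """Build (but do not send) a POST request for an image-analysis task.

    The "task" key and this helper are illustrative assumptions; the real
    request schema is whatever the skill's SKILL.md defines.
    """
    if image_url is not None and image_bytes is not None:
        raise ValueError("pass either image_url or image_bytes, not both")

    payload = {"task": task, "spend_token": spend_token}
    if image_url is not None:
        payload["image_url"] = image_url
    if image_bytes is not None:
        # Raw bytes are base64-encoded so they can travel inside JSON.
        payload["image_base64"] = base64.b64encode(image_bytes).decode("ascii")

    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder token and image bytes, for illustration only.
req = build_request("Extract the text", "example-token", image_bytes=b"\x89PNG")
body = json.loads(req.data)
```

Actually sending the request (for example with urllib.request.urlopen(req)) is the step that transmits both the image and the token off-host, which is why the sketch stops short of it.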
Install Mechanism
There is no install spec and no code files — instruction-only skills are lower-risk from an install perspective (nothing is written to disk).
Credentials
The skill requests a single environment variable, AIPROX_SPEND_TOKEN, which is proportionate for an external paid API. However, the token is sent in the JSON body as 'spend_token', meaning it will be transmitted to a third party and used for billing. Users should treat this token as a secret (revocable, limited-scope tokens are preferable). No other credentials are requested (which is good).
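One way to treat the token as a secret on the client side is to read it only from the environment and to mask it before anything is logged. The sketch below assumes the AIPROX_SPEND_TOKEN variable and the spend_token field named above; the helpers load_spend_token and redact are hypothetical and are not part of the skill.

```python
import copy
import os

TOKEN_ENV_VAR = "AIPROX_SPEND_TOKEN"  # variable name declared by the skill


def load_spend_token(env=None):
    """Return the spend token, failing loudly if it is not configured."""
    env = os.environ if env is None else env
    token = env.get(TOKEN_ENV_VAR, "").strip()
    if not token:
        raise RuntimeError(f"{TOKEN_ENV_VAR} is not set")
    return token


def redact(payload):
    """Return a copy of the payload with the token masked, safe to log."""
    safe = copy.deepcopy(payload)
    if "spend_token" in safe:
        safe["spend_token"] = "***"
    return safe


# A plain dict stands in for os.environ so the example runs anywhere.
token = load_spend_token({TOKEN_ENV_VAR: "example-token"})
payload = {"task": "Describe this image", "spend_token": token}
print(redact(payload))
```

Masking a copy rather than the original keeps the real payload intact for the request while ensuring the token does not end up in logs or transcripts.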
Persistence & Privilege
The skill does not request always:true or any persistent system changes. It is user-invocable and can be invoked autonomously by the agent (platform default), which is expected for skills of this type.
How to use
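- Make sure OpenClaw is installed (locally or via Docker).
- Enter the install command in the chat box: /install vision-bot
- Once installation completes, call the skill by name or trigger it with /vision-bot
- Provide the inputs described in the skill's parameter documentation to get structured output.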
Version history
v1.2.0
Multilingual support, direct image URL in task string, webpage screenshot analysis
v1.1.0
Now supports model selection — specify any of 19 models across 5 providers per request (e.g. gemini-2.5-flash, mistral-large-latest, claude-opus-4-5-20251101)
v1.0.1
- Added support for analyzing images via base64 in addition to URLs.
- Vision Bot now auto-detects the requested mode (OCR, object counting, or full description) based on the task.
- Updated instructions and example request/response in documentation for both image_url and image_base64 input.
- Clarified task keywords that trigger OCR and counting modes.
- Response schema now includes the detected mode field.
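The keyword-based mode detection mentioned in these notes could look roughly like the sketch below. The keyword lists and the mode strings ("ocr", "count", "describe") are hypothetical placeholders; the actual trigger words and mode values are the ones documented in the skill's SKILL.md, which this page does not list.

```python
# Hypothetical keyword lists; the skill's real trigger words live in SKILL.md.
OCR_KEYWORDS = ("ocr", "extract text", "read the text", "transcribe")
COUNT_KEYWORDS = ("count", "how many", "number of")


def detect_mode(task):
    """Guess the analysis mode from the wording of the task."""
    lowered = task.lower()
    if any(keyword in lowered for keyword in OCR_KEYWORDS):
        return "ocr"
    if any(keyword in lowered for keyword in COUNT_KEYWORDS):
        return "count"
    # Fall back to a full description when no keyword matches.
    return "describe"


for task in (
    "Extract text from this receipt",
    "How many cars are there?",
    "What is this?",
):
    print(task, "->", detect_mode(task))
```

Simple substring matching like this is easy to reason about but can misfire on words that merely contain a keyword, so a real implementation may well use something more robust.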
v1.0.0
- Initial release of vision-bot.
- Describe images, detect objects, and extract text (OCR) from any image URL.
- Supports scene understanding, reading embedded text, object identification, and answering questions about image content.
- Accessible via AIProx with secure token authentication.
- No image storage; all processing is transient for privacy.
FAQ
What is Vision Bot?
Vision Bot describes images, detects objects, extracts text, and analyzes webpages. Pass any image URL directly in your task; it responds in your language. It is an AI agent skill plugin for Claude Code / OpenClaw, with 1341 total downloads to date.
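How do I install Vision Bot?
Run the command "/install vision-bot" in the OpenClaw or Claude Code chat box to install it in one step. The skill also needs the AIPROX_SPEND_TOKEN environment variable to be set before it can call aiprox.dev.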
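Is Vision Bot free?
The skill itself is free to download, install, and use under the MIT-0 license. Its requests, however, go to the paid aiprox.dev API and are billed against your AIPROX_SPEND_TOKEN.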
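Which platforms does Vision Bot support?
Vision Bot is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who develops Vision Bot?
It is developed and maintained by unixlamadev-spec (@unixlamadev-spec); the current version is v1.2.0.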