Vision Bot
by unixlamadev-spec · v1.2.0 · MIT-0
1341 Downloads · 0 Stars · 12 Active Installs · 4 Versions
Install in OpenClaw
/install vision-bot
Description
Describe images, detect objects, extract text, and analyze webpages. Pass any image URL directly in your task. Responds in your language.
Usage Guidance
This skill sends your image URLs or base64 image data and a secret 'spend token' to aiprox.dev for processing. Before installing, verify you trust aiprox.dev (review their privacy/billing policy and the homepage), and ask the publisher why the example includes 'rail': 'bitcoin-lightning' (it could indicate an unusual billing path). Prefer issuing a revocable or limited-scope token for testing, and try only non-sensitive images first. If you need guarantees that images aren't stored or aren't routed through other services, request proof or choose a provider with clear audited policies. If anything about the owner/homepage looks unfamiliar, treat the token like a password and avoid sharing sensitive images until you validate the service.
Capability Analysis
Type: OpenClaw Skill
Name: vision-bot
Version: 1.2.0
The vision-bot skill is designed to perform image analysis and OCR by sending requests to the aiprox.dev API. It explicitly declares its need for the AIPROX_SPEND_TOKEN environment variable and network access to aiprox.dev in the SKILL.md security manifest. The behavior is transparent, aligns with the stated purpose, and lacks any indicators of malicious intent or unauthorized data exfiltration.
Capability Assessment
Purpose & Capability
The name/description (image description, OCR, object detection) aligns with the skill's single runtime action: POSTing tasks and image URLs/base64 to aiprox.dev for processing. Requesting a single spend token for a third-party API is plausible. However, the example includes a 'rail': 'bitcoin-lightning' parameter which is unrelated to image analysis and is unexplained in the manifest — this is unusual and should be clarified.
Instruction Scope
SKILL.md instructs the agent to send task text and image data (URL or base64) plus the spend token to https://aiprox.dev/api/orchestrate. That means potentially sensitive images and any task context will be transmitted off-host. The trust statement claims images are transient and not stored and that processing uses 'Claude via LightningProx' — those are assertions the agent cannot verify from an instruction-only skill. The instructions do not read any local files or unrelated env vars, which is good, but they do enable exfiltration of user-supplied images and text to a third party.
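The request described above can be sketched as follows. This is a minimal illustration, not the skill's actual code: the endpoint and the field names `spend_token`, `image_url`, and `image_base64` appear in this listing, while the `task` field name, the helper functions, and the rule that at most one image source is supplied are assumptions. The sketch only builds the request so the outbound fields can be reviewed; it does not send anything.

```python
import json
import urllib.request
from typing import Optional

# Endpoint named in the assessment above.
ORCHESTRATE_URL = "https://aiprox.dev/api/orchestrate"


def build_payload(task: str, spend_token: str,
                  image_url: Optional[str] = None,
                  image_base64: Optional[str] = None) -> dict:
    """Assemble the JSON body. The "task" key is an assumed field name."""
    if image_url is not None and image_base64 is not None:
        # Assumption: the two image sources are mutually exclusive.
        raise ValueError("provide image_url or image_base64, not both")
    payload = {"task": task, "spend_token": spend_token}
    if image_url is not None:
        payload["image_url"] = image_url
    if image_base64 is not None:
        payload["image_base64"] = image_base64
    return payload


def outbound_fields(payload: dict) -> list:
    """Names of every field that would leave the host, for review."""
    return sorted(payload)


def make_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request carrying the payload."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        ORCHESTRATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Listing the outbound fields before sending makes it explicit that the task text, the image data, and the token all go to the third party in a single request.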
Install Mechanism
There is no install spec and no code files — instruction-only skills are lower-risk from an install perspective (nothing is written to disk).
Credentials
The skill requests a single environment variable, AIPROX_SPEND_TOKEN, which is proportionate for an external paid API. However, the token is sent in the JSON body as 'spend_token', meaning it will be transmitted to a third party and used for billing. Users should treat this token as a secret (revocable, limited-scope tokens are preferable). No other credentials are requested (which is good).
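One way to treat the token as a secret on the client side is to read it only from the environment and mask it in anything that gets logged. The sketch below assumes that approach; the helper names `load_spend_token` and `redact` are hypothetical and are not part of the skill.

```python
import os
from typing import Mapping, Optional


def load_spend_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Read AIPROX_SPEND_TOKEN from the environment, failing if it is unset."""
    env = os.environ if env is None else env
    token = env.get("AIPROX_SPEND_TOKEN", "").strip()
    if not token:
        raise RuntimeError("AIPROX_SPEND_TOKEN is not set")
    return token


def redact(payload: dict) -> dict:
    """Return a copy of the payload that is safe to log (token masked)."""
    safe = dict(payload)
    if "spend_token" in safe:
        safe["spend_token"] = "***"
    return safe
```

Logging `redact(payload)` instead of the payload itself keeps the token out of logs and transcripts while still showing which fields were sent.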
Persistence & Privilege
The skill does not request always:true or any persistent system changes. It is user-invocable and can be invoked autonomously by the agent (platform default), which is expected for skills of this type.
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install vision-bot
- After installation, invoke the skill by name or use /vision-bot
- Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.2.0
- Multilingual support, direct image URL in task string, webpage screenshot analysis
v1.1.0
- Now supports model selection: specify any of 19 models across 5 providers per request (e.g. gemini-2.5-flash, mistral-large-latest, claude-opus-4-5-20251101)
v1.0.1
- Added support for analyzing images via base64 in addition to URLs.
- Vision Bot now auto-detects the requested mode (OCR, object counting, or full description) based on the task.
- Updated instructions and example request/response in documentation for both image_url and image_base64 input.
- Clarified task keywords that trigger OCR and counting modes.
- Response schema now includes the detected mode field.
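The listing does not state which task keywords trigger each mode or how the mode field is spelled, so the following is only a hypothetical illustration of keyword-based mode detection. The keyword sets and the mode names `count`, `ocr`, and `describe` are assumptions, and real detection happens on the service side.

```python
import re

# Hypothetical keywords; the actual triggers are documented in SKILL.md, not here.
OCR_WORDS = {"read", "text", "ocr", "transcribe"}
COUNT_WORDS = {"count"}
COUNT_PHRASES = ("how many",)


def guess_mode(task: str) -> str:
    """Guess the analysis mode from an English task string."""
    lowered = task.lower()
    words = set(re.findall(r"[a-z]+", lowered))
    if words & COUNT_WORDS or any(p in lowered for p in COUNT_PHRASES):
        return "count"
    if words & OCR_WORDS:
        return "ocr"
    # Fall back to a full description when no keyword matches.
    return "describe"
```

Matching on whole words avoids false hits such as "text" inside "context". The sketch handles English only, whereas the skill itself advertises multilingual support.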
v1.0.0
- Initial release of vision-bot.
- Describe images, detect objects, and extract text (OCR) from any image URL.
- Supports scene understanding, reading embedded text, object identification, and answering questions about image content.
- Accessible via AIProx with secure token authentication.
- No image storage; all processing is transient for privacy.
Frequently Asked Questions
What is Vision Bot?
Vision Bot is an AI agent skill for Claude Code / OpenClaw that describes images, detects objects, extracts text, and analyzes webpages. You pass an image URL directly in your task, and it responds in your language. It has 1341 downloads so far.
How do I install Vision Bot?
Run "/install vision-bot" in the OpenClaw or Claude Code chat to install it in one step. The skill expects the AIPROX_SPEND_TOKEN environment variable to be set before use.
Is Vision Bot free?
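The skill itself is free to download and install under the MIT-0 license. It does, however, send an AIPROX_SPEND_TOKEN to the aiprox.dev API, where the token is used for billing, so requests may incur costs with that provider.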
Which platforms does Vision Bot support?
Vision Bot is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created Vision Bot?
It is built and maintained by unixlamadev-spec (@unixlamadev-spec); the current version is v1.2.0.