← Back to Skills Marketplace

Vision Tool

Name: Vision Tool
Author: huruilizhen

by Ruilizhen Hu · GitHub ↗ · v1.1.3 · MIT-0

cross-platform ✓ Security Clean

120

Downloads

Stars

Active Installs

Versions

Install in OpenClaw

/install vision-tool

Description

Image recognition using Ollama + qwen3.5:4b with think=False for reliable content extraction.

Usage Guidance

This skill appears to do exactly what it claims: it reads a local image file, Base64-encodes it, and POSTs it to an Ollama /api/chat endpoint on localhost. Before installing or running it, ensure you: 1) run a trusted Ollama instance locally (ollama serve) and have pulled qwen3.5:4b, 2) confirm the Ollama service is not proxying/forwarding requests to an untrusted remote endpoint (if you change the default URL the skill will send images to wherever that URL points), and 3) review and run the included tests in a safe environment. Because the skill does not request secrets or remote installs and the code is readable, there are no incoherent or disproportionate requests — but always verify you trust the Ollama server you will use (local vs remote).

Capability Analysis

Type: OpenClaw Skill Name: vision-tool Version: 1.1.3 The vision-tool skill bundle is a legitimate implementation for image recognition using a local Ollama instance. The core logic in `scripts/vision_core.py` uses the `requests` library to communicate with the Ollama API on localhost (127.0.0.1:11434) and handles image data via standard Base64 encoding. No evidence of data exfiltration, unauthorized network calls, or malicious execution was found in `main.py` or the documentation files.

Capability Assessment

✓ Purpose & Capability

Name/description match the implementation: the code reads an image, Base64-encodes it, and posts to an Ollama /api/chat endpoint using model qwen3.5:4b. Required binaries (ollama, python3) are appropriate and no unrelated credentials or tools are requested.

ℹ Instruction Scope

Runtime instructions only run local Python code and call the Ollama API at http://127.0.0.1:11434/api/chat; they read the provided image file and send its Base64 payload. This is coherent with image-analysis purpose, but be aware that if the Ollama service URL is changed from the default, image data could be sent to a remote host — the code itself does not exfiltrate to external endpoints by default.

✓ Install Mechanism

No install spec that downloads external artifacts; included code is pure Python using the requests library. There are no archive downloads or external installers declared in the skill metadata.

✓ Credentials

The skill declares no required environment variables or credentials. It uses sensible defaults (local Ollama URL). No secret or cloud credentials are requested, which is proportionate for a local-model vision tool.

✓ Persistence & Privilege

always:false and user-invocable:true (defaults) — the skill does not request forced persistent inclusion or elevated platform privileges. It does not modify other skills or system-wide configs.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install vision-tool
After installation, invoke the skill by name or use /vision-tool
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.1.3

- Adds think=False to all API calls for more reliable and clean content extraction. - Updated documentation to reflect improved extraction approach and latest usage details. - Version bump to 1.1.3.

v1.1.2

## vision-tool v1.1.2 - Documentation improvements and minor edits to SKILL.md. - No changes to core code logic or features. - Ensures up-to-date instructions for installation, usage, and troubleshooting.

v1.1.1

- Internal code improvements in vision_core.py - Documentation updated for consistency and clarity - Version bumped to 1.1.1

v1.1.0

Vision Tool v1.1.0 introduces a streamlined approach for image analysis by switching to the Ollama /api/chat endpoint. - Now uses the /api/chat endpoint for direct extraction from the content field, improving output clarity. - Removed complex thinking field processing and unnecessary regex dependencies for simpler, more maintainable code. - Default analysis prompt is now in English: "Describe this image". - Performance guidance and troubleshooting updated for new architecture. - Codebase restructured; core analysis logic is now in scripts/vision_core.py.

v1.0.0

Vision Tool v1.0.0 - Initial release - Provides image recognition using Ollama + qwen3.5:4b - Supports all OpenClaw channels (WeChat, Telegram, Discord, etc.) - Automatically cleans and formats analysis output - Includes full error handling and reporting - Supports both standard and JSON output modes

Metadata

Slug vision-tool

Version 1.1.3

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 5

Frequently Asked Questions

What is Vision Tool?

Image recognition using Ollama + qwen3.5:4b with think=False for reliable content extraction. It is an AI Agent Skill for Claude Code / OpenClaw, with 120 downloads so far.

How do I install Vision Tool?

Run "/install vision-tool" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Vision Tool free?

Yes, Vision Tool is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Vision Tool support?

Vision Tool is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created Vision Tool?

It is built and maintained by Ruilizhen Hu (@huruilizhen); the current version is v1.1.3.

More Skills