
vision-skill

Author: lgwanai

by lgwanai · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Downloads: 375

Install in OpenClaw

/install vision-skill

Description

Use this skill for computer vision tasks including image recognition (OCR, object detection) and image generation (text-to-image, image-to-image). Supports a...

Usage Guidance

Key points before installing:
- Do NOT trust the registry metadata that says 'no env vars': this skill requires your Tencent COS keys and a Doubao/Volcengine API key. Only provide those secrets if you intend the skill to upload images to your COS bucket and call the Doubao API.
- Verify the API endpoint: the client uses https://ark.cn-beijing.volces.com/api/v3, which does not match the README link to console.volcengine.com; confirm this hostname is legitimate for your provider or replace it with an official endpoint from your Doubao/Volcengine account.
- Use least-privilege credentials: create a COS bucket and keys scoped to that bucket (and consider using short-lived tokens if possible) rather than reusing broad permanent keys.
- Inspect and run the code in an isolated environment first (e.g., throwaway VM or container). The scripts will write to a local .tasks directory and .tasks/worker.log, spawn background worker processes, and upload local files to COS; confirm that behavior is acceptable.
- If you will expose sensitive images, set the COS bucket permissions appropriately (private by default) and review how temporary URLs are generated/used.
- If anything (metadata mismatch, unusual base_url, or unexpected network endpoints) looks off, ask the publisher for clarification or consider alternative, better-audited tools.
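The endpoint check in the guidance above can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill: the function name is_allowed_endpoint and the idea of an allowlist are assumptions, and the allowlist contents must come from your own provider account, not from this page.

```python
from urllib.parse import urlparse


def is_allowed_endpoint(base_url, allowed_hosts):
    """Return True only for https URLs whose hostname is exactly in allowed_hosts.

    allowed_hosts is a set of hostnames you have confirmed with your provider;
    nothing in this helper decides which hostnames are legitimate.
    """
    parsed = urlparse(base_url)
    # parsed.hostname is lowercased and excludes any port or credentials.
    return parsed.scheme == "https" and parsed.hostname in allowed_hosts
```

Exact hostname matching is deliberate: a suffix or substring match would accept look-alike hosts such as ark.cn-beijing.volces.com.evil.example.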

Capability Analysis

Type: OpenClaw Skill
Name: vision-skill
Version: 1.0.0

The vision-skill bundle is a legitimate tool for integrating Doubao AI vision and image generation models with Tencent Cloud COS storage. It implements an asynchronous task architecture using a background worker (worker.py) and a local task tracking system (.tasks/ directory). The code follows standard practices for API integration, including environment variable configuration for secrets and retry logic for network calls. No evidence of malicious intent, data exfiltration, or unauthorized execution was found; the use of subprocess in vision_cli.py is limited to hardcoded process management and task execution.

Capability Assessment

ℹ Purpose & Capability

The name and description cover vision recognition and image generation, and the code implements Tencent COS uploads and calls to a Doubao (Volcengine) API; these capabilities align with the stated purpose. However, the registry metadata lists no required env vars, while SKILL.md, the README, and the code require COS_* and DOUBAO_* credentials, so the metadata is inconsistent with the actual requirements.

✓ Instruction Scope

SKILL.md and CLI instruct uploading local images to COS, calling Doubao endpoints, storing async task files under a local .tasks/ directory, and optionally downloading generated images — the instructions and included code stay within that scope and do not attempt to read unrelated system files or credentials beyond those needed for COS/Doubao.

ℹ Install Mechanism

This is labelled as instruction-only in the registry, but the package includes Python source and a requirements.txt (requests, python-dotenv, cos-python-sdk-v5). There is no download-from-URL or opaque installer; installing implies pip installing listed deps and running bundled scripts. The discrepancy between 'no install spec' and presence of code is noteworthy but not inherently malicious.

⚠ Credentials

The code requires Tencent COS credentials (COS_SECRET_ID, COS_SECRET_KEY, COS_BUCKET_NAME, COS_REGION) and DOUBAO_API_KEY (plus optional fallback model vars). Those credentials are appropriate for the described cloud storage and model API usage, but the registry metadata incorrectly declared 'Required env vars: none' — a meaningful mismatch. Also the COS client uses permanent keys (Token=None), so users should understand they're providing full access keys rather than short-lived tokens.
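Because the registry metadata does not declare these variables, a quick pre-flight check before running the skill can help. The sketch below uses the variable names listed above; the helper name missing_env_vars is hypothetical and is not part of the skill's own code.

```python
import os

# Variable names as listed in the credentials assessment above.
REQUIRED_VARS = (
    "COS_SECRET_ID",
    "COS_SECRET_KEY",
    "COS_BUCKET_NAME",
    "COS_REGION",
    "DOUBAO_API_KEY",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or blank.

    env defaults to os.environ; pass a plain dict to check other sources.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling missing_env_vars() with no arguments reports on the current process environment. It only checks presence, so it says nothing about whether a key is scoped narrowly enough.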

✓ Persistence & Privilege

The skill does not request always:true or global agent privileges. It writes task state and logs under a local .tasks/ directory and spawns worker processes when a task is submitted — expected for an async CLI-style skill. It does not modify other skills' configs or system-wide settings.
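To review what the skill actually wrote after a trial run, the local .tasks/ directory can be listed directly. This is a small sketch under the assumption that .tasks/ sits in the directory the skill was run from; the helper name list_task_files is hypothetical, and the layout of individual task files is not described here.

```python
from pathlib import Path


def list_task_files(root="."):
    """Return sorted (relative_path, size_in_bytes) pairs for files under <root>/.tasks."""
    tasks_dir = Path(root) / ".tasks"
    if not tasks_dir.is_dir():
        return []  # nothing has been written yet
    return sorted(
        (p.relative_to(tasks_dir).as_posix(), p.stat().st_size)
        for p in tasks_dir.rglob("*")
        if p.is_file()
    )
```

Running it before and after submitting a task shows which files, including worker.log, were created or grew.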

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install vision-skill
After installation, invoke the skill by name or use /vision-skill
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

Initial release of vision-skill, providing end-to-end computer vision and image generation capabilities.
- Supports image recognition (OCR, object detection, content description, Q&A) and flexible image generation (text-to-image, image-to-image, sequential images).
- Integrates with Tencent Cloud COS for image storage and uses Doubao AI models for processing.
- CLI interface via `vision_cli.py` with options for batch tasks, style/format presets, quality modes, and retries.
- All tasks execute asynchronously, with options to wait for completion and save outputs.
- Comprehensive environment variable setup and task management through a local `.tasks/` directory.

Metadata

Slug vision-skill

Version 1.0.0

License MIT-0

All-time Installs 1

Active Installs 1

Total Versions 1

Frequently Asked Questions

What is vision-skill?

Use this skill for computer vision tasks including image recognition (OCR, object detection) and image generation (text-to-image, image-to-image). Supports a... It is an AI Agent Skill for Claude Code / OpenClaw, with 375 downloads so far.

How do I install vision-skill?

Run "/install vision-skill" in the OpenClaw or Claude Code chat to install it. Before first use, set the COS_* and DOUBAO_API_KEY environment variables that the skill's code requires.

Is vision-skill free?

Yes, vision-skill is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does vision-skill support?

vision-skill is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created vision-skill?

It is built and maintained by lgwanai (@lgwanai); the current version is v1.0.0.
