Description

Use when understanding images with Alibaba Cloud Model Studio Qwen VL models (qwen3-vl-plus/qwen3-vl-flash and latest aliases). Use when building image Q&A,...

README (SKILL.md)

Category: provider

Model Studio Qwen VL (Image Understanding)

Name: Aliyun Qwen Vl
Author: cinience

Validation

mkdir -p output/aliyun-qwen-vl
python -m py_compile skills/ai/multimodal/aliyun-qwen-vl/scripts/analyze_image.py && echo "py_compile_ok" > output/aliyun-qwen-vl/validate.txt

Pass criteria: command exits 0 and output/aliyun-qwen-vl/validate.txt is generated.

Output And Evidence

Save raw model responses and normalized extraction results to output/aliyun-qwen-vl/.
Include the input image reference and the prompt for traceability.
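The function below is a minimal sketch of one evidence record that keeps the prompt, the image reference, the raw response, and the normalized result together. The function name save_evidence, the record layout, and the file naming are hypothetical and are not taken from analyze_image.py. Pass the original path or URL as image_ref, not the base64 data: URL, so the record stays small.

```python
import json
import time
import uuid
from pathlib import Path


def save_evidence(raw_response: dict, normalized: dict, prompt: str,
                  image_ref: str, out_dir: str = "output/aliyun-qwen-vl") -> Path:
    """Write one evidence record (inputs + raw + normalized output) as JSON.

    Hypothetical helper: the layout is an illustration, not the skill's format.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Timestamp plus a short random suffix so repeated runs never overwrite each other.
    name = f"evidence-{time.strftime('%Y%m%d-%H%M%S')}-{uuid.uuid4().hex[:8]}.json"
    path = out / name
    record = {
        "input": {"prompt": prompt, "image": image_ref},  # traceability
        "raw_response": raw_response,                      # body as returned by the API
        "normalized": normalized,                          # e.g. {text, model, usage}
    }
    path.write_text(json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```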

Use Qwen VL models for image-understanding tasks (image and text in, text out) through the DashScope OpenAI-compatible mode API.

Prerequisites

Install dependencies (recommended in a venv):

python3 -m venv .venv
. .venv/bin/activate
python -m pip install requests

Set DASHSCOPE_API_KEY in your environment, or add dashscope_api_key to ~/.alibabacloud/credentials.
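The lookup below sketches that credential resolution. It assumes ~/.alibabacloud/credentials is an INI file with a [default] section, and that ALIBABA_CLOUD_PROFILE or ALICLOUD_PROFILE selects a different section. The function name resolve_api_key and the lookup order are assumptions; check analyze_image.py for the actual behavior (it also loads .env files, which this sketch skips).

```python
import configparser
import os
from pathlib import Path
from typing import Optional


def resolve_api_key(credentials_path: Optional[str] = None) -> Optional[str]:
    """Return the DashScope API key, or None if none is configured.

    Assumed order: DASHSCOPE_API_KEY env var first, then dashscope_api_key
    in the credentials file under the selected profile (default: "default").
    """
    key = os.environ.get("DASHSCOPE_API_KEY")
    if key:
        return key.strip()
    path = Path(credentials_path or Path.home() / ".alibabacloud" / "credentials")
    if not path.is_file():
        return None
    parser = configparser.ConfigParser()
    parser.read(path, encoding="utf-8")
    profile = (os.environ.get("ALIBABA_CLOUD_PROFILE")
               or os.environ.get("ALICLOUD_PROFILE")
               or "default")
    if not parser.has_section(profile):
        return None
    value = parser.get(profile, "dashscope_api_key", fallback="").strip()
    return value or None
```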

Critical model names

Prefer the Qwen3 VL family:

qwen3-vl-plus
qwen3-vl-flash

When you need explicit "latest" routing or reproducible snapshots, use supported aliases/snapshots from the official model list, such as:

qwen3-vl-plus-latest
qwen3-vl-plus-2025-12-19
qwen3-vl-flash-2026-01-22
qwen3-vl-flash-latest

Legacy names still seen in some workloads:

qwen-vl-max-latest
qwen-vl-plus-latest

For OCR-specialized extraction, prefer the skills/ai/multimodal/aliyun-qwen-ocr/ skill over this general VL skill.

Normalized interface (multimodal.chat)

Request

prompt (string, required): the user's question or instruction about the image.
image (string, required): HTTPS URL, local path, or data: URL.
model (string, optional): default qwen3-vl-plus.
max_tokens (int, optional): default 512.
temperature (float, optional): default 0.2.
detail (string, optional): auto/low/high, default auto.
json_mode (bool, optional): return JSON-only response when possible.
schema (object, optional): JSON Schema for structured extraction.
max_retries (int, optional): retry count for 429/5xx, default 2.
retry_backoff_s (float, optional): exponential backoff base seconds, default 1.5.
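The request fields above could be validated and defaulted roughly as follows, with a local path turned into a base64 data: URL before sending. VLRequest and resolve_image are hypothetical names; the defaults mirror the list above. This sketch silently drops unknown keys and only passes through https:// and data: URLs, which are design choices of the sketch rather than documented behavior of analyze_image.py.

```python
import base64
import mimetypes
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class VLRequest:
    """Hypothetical normalized multimodal.chat request with the documented defaults."""
    prompt: str
    image: str
    model: str = "qwen3-vl-plus"
    max_tokens: int = 512
    temperature: float = 0.2
    detail: str = "auto"
    json_mode: bool = False
    schema: Optional[dict] = None
    max_retries: int = 2
    retry_backoff_s: float = 1.5

    @classmethod
    def from_dict(cls, d: dict) -> "VLRequest":
        if not d.get("prompt") or not d.get("image"):
            raise ValueError("'prompt' and 'image' are required")
        if d.get("detail", "auto") not in ("auto", "low", "high"):
            raise ValueError("'detail' must be auto, low, or high")
        known = set(cls.__dataclass_fields__)
        # Keep only recognized fields; everything else is ignored.
        return cls(**{k: v for k, v in d.items() if k in known})


def resolve_image(image: str) -> str:
    """Pass https:// and data: URLs through; inline a local file as a data: URL."""
    if image.startswith(("https://", "data:")):
        return image
    if "://" in image:
        raise ValueError("only https:// URLs, local paths, and data: URLs are supported")
    path = Path(image)
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```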

Response

text (string): primary model answer.
model (string): model actually used.
usage (object): token usage if returned by backend.
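DashScope compatible mode returns an OpenAI-style chat completion body, so the normalized response can be read from choices[0].message.content, model, and usage. A minimal sketch, where normalize_response is a hypothetical helper name:

```python
from typing import Any


def normalize_response(raw: dict, requested_model: str) -> dict[str, Any]:
    """Map an OpenAI-compatible chat completion body to {text, model, usage}."""
    choices = raw.get("choices") or []
    message = choices[0].get("message", {}) if choices else {}
    content = message.get("content") or ""
    # content is normally a string; tolerate a list of {"type": "text", ...} parts.
    if isinstance(content, list):
        content = "".join(p.get("text", "") for p in content if isinstance(p, dict))
    return {
        "text": content,
        # Prefer the model name the backend reports; fall back to what was requested.
        "model": raw.get("model") or requested_model,
        "usage": raw.get("usage"),  # None when the backend omits it
    }
```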

Quickstart

python skills/ai/multimodal/aliyun-qwen-vl/scripts/analyze_image.py \
  --request '{"prompt":"Summarize the main content in this image","image":"https://example.com/demo.jpg"}' \
  --print-response

Using local image:

python skills/ai/multimodal/aliyun-qwen-vl/scripts/analyze_image.py \
  --request '{"prompt":"Extract key information from the image","image":"./samples/invoice.png","model":"qwen3-vl-plus"}' \
  --print-response

Structured extraction (JSON mode):

python skills/ai/multimodal/aliyun-qwen-vl/scripts/analyze_image.py \
  --request '{"prompt":"Extract fields: title, amount, date","image":"./samples/invoice.png"}' \
  --json-mode \
  --print-response

Structured extraction (JSON Schema):

python skills/ai/multimodal/aliyun-qwen-vl/scripts/analyze_image.py \
  --request '{"prompt":"Extract invoice fields","image":"./samples/invoice.png"}' \
  --schema skills/ai/multimodal/aliyun-qwen-vl/references/examples/invoice.schema.json \
  --print-response
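Because json_mode returns JSON-only output only "when possible", it is worth parsing the answer defensively. The helper below (parse_json_answer, hypothetical) strips an optional Markdown code fence, parses the outermost JSON object, and checks required top-level keys. It is a lightweight check, not full JSON Schema validation; the field names in the usage example come from the JSON-mode quickstart prompt.

```python
import json
import re
from typing import Any, Iterable

# Matches an optional fenced block (three backticks, optional "json" tag).
_FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)\s*`{3}", re.DOTALL)


def parse_json_answer(text: str, required: Iterable[str] = ()) -> dict[str, Any]:
    """Parse a JSON object from model text and check required top-level keys."""
    match = _FENCE.search(text)
    candidate = match.group(1) if match else text
    # Fall back to the outermost {...} span if prose surrounds the JSON.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(candidate[start:end + 1])
    missing = [k for k in required if k not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data
```

For example, parse_json_answer(answer, required=["title", "amount", "date"]) raises ValueError when the model omits a field, which is a natural point to retry with a stricter prompt.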

cURL (compatible mode)

curl -sS https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model":"qwen3-vl-plus",
    "messages":[
      {
        "role":"user",
        "content":[
          {"type":"image_url","image_url":{"url":"https://example.com/demo.jpg"}},
          {"type":"text","text":"Describe this image and list executable actions"}
        ]
      }
    ],
    "max_tokens":512,
    "temperature":0.2
  }'
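The same request can be sent from Python with requests (installed in Prerequisites). This sketch targets the endpoint used in the cURL example; build_payload and post_chat are hypothetical helper names, and requests is imported lazily so the payload builder works without it.

```python
BASE_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"


def build_payload(prompt: str, image_url: str, model: str = "qwen3-vl-plus",
                  max_tokens: int = 512, temperature: float = 0.2) -> dict:
    """Build the same request body as the cURL example."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    # Image part first, then the text instruction.
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def post_chat(payload: dict, api_key: str, timeout: float = 60.0) -> dict:
    """POST the payload; raises requests.HTTPError on 4xx/5xx responses."""
    import requests  # imported lazily; listed in Prerequisites

    resp = requests.post(
        BASE_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        json=payload,  # requests sets Content-Type: application/json
        timeout=timeout,
    )
    resp.raise_for_status()
    return resp.json()
```

image_url may also be a data: URL built from a local file.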

Output location

If --output is set, the JSON response is saved to that file.
By convention, output files go in output/aliyun-qwen-vl/.

Smoke test

python tests/ai/multimodal/aliyun-qwen-vl-test/scripts/smoke_test_qwen_vl.py \
  --image ./tmp/vl_test_cat.png

Error handling

Error	Likely cause	Action
401/403	Missing or invalid key	Check `DASHSCOPE_API_KEY` and account permissions.
400	Invalid request schema or unsupported image source	Validate `messages` content and image URL/path format.
429	Rate limit	Retry with exponential backoff and lower concurrency.
5xx	Temporary backend issue	Retry with backoff and idempotent request design.
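For 429 and 5xx responses, the max_retries and retry_backoff_s request fields suggest a retry loop like the one below. This is a sketch: the delay formula (base * 2**attempt plus up to 10% jitter) and the HTTPError wrapper class are assumptions, not the script's confirmed behavior. To use it with requests, catch requests.HTTPError inside send() and re-raise HTTPError(err.response.status_code).

```python
import random
import time
from typing import Callable


class HTTPError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def is_retryable(status: int) -> bool:
    # Retry only rate limits and server-side errors; 400/401/403 will not fix themselves.
    return status == 429 or 500 <= status < 600


def call_with_retries(send: Callable[[], dict], max_retries: int = 2,
                      retry_backoff_s: float = 1.5,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call send(); on 429/5xx retry up to max_retries times with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except HTTPError as err:
            if not is_retryable(err.status) or attempt == max_retries:
                raise
            # 1.5 s, 3 s, 6 s, ... for the default base, plus a little jitter.
            delay = retry_backoff_s * (2 ** attempt)
            sleep(delay * (1 + random.uniform(0, 0.1)))
    raise AssertionError("unreachable")
```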

Operational guidance

For stable production behavior, pin snapshot model IDs instead of -latest aliases.
Compress very large images before upload to reduce latency and cost.
Add explicit extraction constraints in prompt (fields, JSON shape, language).
For OCR-like output, ask the model for confidence notes and for explicit markers on text it could not read.

Workflow

Confirm user intent, region, identifiers, and whether the operation is read-only or mutating.
Run one minimal read-only query first to verify connectivity and permissions.
Execute the target operation with explicit parameters and bounded scope.
Verify results and save output/evidence files.

References

Source list: references/sources.md
API notes: references/api_reference.md

Usage Guidance

This skill contains a Python client that sends images (including local image files converted to base64 data URLs) to Alibaba DashScope endpoints and requires a DASHSCOPE_API_KEY. Before installing:
1) Confirm you trust the source and are willing to send image data to Alibaba Cloud; avoid sending images that contain sensitive personal data.
2) Expect to set DASHSCOPE_API_KEY in your environment or add dashscope_api_key to ~/.alibabacloud/credentials. The registry metadata currently omits this requirement; ask the publisher to declare it.
3) Inspect any .env or credentials files in your repo and home directory before running (the script loads .env files and ~/.alibabacloud/credentials), and avoid storing unrelated secrets there.
4) Run the script in an isolated environment (a dedicated venv or test machine) first, and verify that the network endpoints (domestic vs. international) match your expectations.
5) If you need stricter guarantees, ask the publisher to add explicit metadata about required env vars and a reproducible install spec, and consider pinning a model snapshot rather than using '-latest'.

Capability Analysis

Type: OpenClaw Skill
Name: aliyun-qwen-vl
Version: 1.0.0

The skill bundle provides a standard integration for Alibaba Cloud's Qwen VL multimodal models. The primary script, scripts/analyze_image.py, correctly handles local and remote image inputs, manages credentials via standard environment variables or the ~/.alibabacloud/credentials file, and communicates exclusively with the official DashScope API endpoint (dashscope.aliyuncs.com). No evidence of malicious intent, data exfiltration, or prompt injection was found.

Capability Assessment

⚠ Purpose & Capability

The skill's purpose (Qwen VL image understanding) matches the included code and cURL examples, which call DashScope endpoints. However, the skill metadata declares no required environment variables or primary credential, while the runtime clearly requires a DASHSCOPE_API_KEY (or dashscope_api_key in ~/.alibabacloud/credentials). The publisher should correct this mismatch.

⚠ Instruction Scope

SKILL.md and the script instruct the agent to read environment variables, ~/.alibabacloud/credentials, and .env files in the working tree/repo root to obtain credentials. The script will convert local image files to data: URLs (base64) and POST them to Alibaba DashScope endpoints — meaning local file contents (potentially sensitive) are transmitted to a remote service. These actions are relevant to the declared purpose but are non-trivial and should be explicitly documented in the metadata and user guidance.

✓ Install Mechanism

No install script or external downloads are present; the skill is instruction+script only and depends on the widely used 'requests' Python package. There is no remote code fetch or archive extraction in the install stage.

⚠ Credentials

Runtime requires DASHSCOPE_API_KEY (or a dashscope_api_key entry in ~/.alibabacloud/credentials) and optionally respects profile env vars (ALIBABA_CLOUD_PROFILE/ALICLOUD_PROFILE). The registry metadata lists no required env/primary credential — this omission is problematic. Aside from the required API key, no unrelated credentials are requested.

✓ Persistence & Privilege

The skill does not request always:true, does not modify other skills, and does not require elevated or persistent system privileges. It writes output to a local output directory (output/aliyun-qwen-vl/) which is reasonable for evidence and logs.

Version History

v1.0.0

Initial release: Qwen VL image understanding skill for Alibaba Cloud.
- Supports image Q&A, visual analysis, OCR-like extraction, chart/table reading, and screenshot understanding workflows.
- Uses Qwen3 VL models via DashScope compatible-mode API, with flexible model/version options.
- Provides a normalized interface for multimodal (image + text) chat requests and structured extraction.
- Includes quickstart examples, cURL usage, error handling guide, and operational tips.
- Outputs results and evidence to a local directory for traceability.

Metadata

Slug aliyun-qwen-vl

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Aliyun Qwen Vl?

Use when understanding images with Alibaba Cloud Model Studio Qwen VL models (qwen3-vl-plus/qwen3-vl-flash and latest aliases). Use when building image Q&A,... It is an AI Agent Skill for Claude Code / OpenClaw, with 99 downloads so far.

How do I install Aliyun Qwen Vl?

Run "/install aliyun-qwen-vl" in the OpenClaw or Claude Code chat to install it in one step. Before first use, set DASHSCOPE_API_KEY (or add dashscope_api_key to ~/.alibabacloud/credentials).

Is Aliyun Qwen Vl free?

Yes, the skill itself is free and licensed under MIT-0; you can download, install, and use it at no cost. Model calls it makes to Alibaba Cloud DashScope may incur Alibaba Cloud charges.

Which platforms does Aliyun Qwen Vl support?

Aliyun Qwen Vl is cross-platform and runs anywhere OpenClaw or Claude Code is available.

Who created Aliyun Qwen Vl?

It is built and maintained by cinience (@cinience); the current version is v1.0.0.
