zhouweico

qwen-omni-multimodal

by Wei Zhou · GitHub · v0.2.0 · MIT-0
cross-platform ⚠ suspicious
184 Downloads · 0 Stars · 0 Active Installs · 2 Versions
Install in OpenClaw
/install qwen-omni-multimodal
Description
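An omni-modal skill built on Qwen3.5-Omni from Alibaba Cloud Model Studio (Bailian). It understands text, images, audio, and video, and can respond with text or speech. Use this skill when the user needs to analyze images, transcribe or understand audio, understand video, ask cross-modal questions, or get a spoken reply directly.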
Usage Guidance
This skill appears to be what it claims: a Node.js client for Alibaba DashScope / Qwen-Omni. Before installing, consider the following:
  1. Any image, audio, or video path you pass is uploaded to the remote DashScope endpoint. Don't point it at sensitive local files unless you trust the service and the scope of your API key.
  2. Conversation history is stored on disk in sessions/*.json. Review or clear these files as needed.
  3. Test with --dry-run first (described in SKILL.md) to validate the configuration without sending any data.
  4. Restrict the DASHSCOPE_API_KEY you provide (use a scoped key if possible), and verify the base URL if you need an international endpoint.
The only minor issue: the docs reference optional environment variables (DASHSCOPE_BASE_URL, DASHSCOPE_MODEL, DASHSCOPE_VOICE), while the metadata lists only the API key as required. This is a documentation mismatch, not a functional red flag.
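The configuration points above (one required key, three optional overrides, a dry-run switch) can be sketched as a single resolution step. This is a minimal, hypothetical sketch: `resolveConfig` and `OmniConfig` are illustrative names, the default base URL path is an assumption, and only the environment variable names, the --dry-run flag, the default model, and the Tina voice come from this page.

```typescript
// Hypothetical config shape. The variable names come from the skill's docs;
// the defaults are illustrative assumptions, not the script's actual values.
interface OmniConfig {
  apiKey: string;
  baseUrl: string;
  model: string;
  voice: string;
  dryRun: boolean;
}

const DEFAULT_BASE_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1"; // assumed path
const DEFAULT_MODEL = "qwen3.5-omni-flash"; // default named in the v0.2.0 notes
const DEFAULT_VOICE = "Tina"; // voice recommended in the v0.2.0 notes

function resolveConfig(
  env: Record<string, string | undefined>,
  argv: string[],
): OmniConfig {
  const apiKey = env.DASHSCOPE_API_KEY;
  // The key is the only required setting; fail before any file is read or sent.
  if (!apiKey) throw new Error("DASHSCOPE_API_KEY is not set");
  return {
    apiKey,
    // Strip trailing slashes so later path joining stays predictable.
    baseUrl: (env.DASHSCOPE_BASE_URL ?? DEFAULT_BASE_URL).replace(/\/+$/, ""),
    model: env.DASHSCOPE_MODEL ?? DEFAULT_MODEL,
    voice: env.DASHSCOPE_VOICE ?? DEFAULT_VOICE,
    // --dry-run is the flag SKILL.md describes; here it only sets a boolean.
    dryRun: argv.includes("--dry-run"),
  };
}
```

In a Node.js script this would be called as `resolveConfig(process.env, process.argv.slice(2))`. Pointing DASHSCOPE_BASE_URL at an international endpoint (typically `https://dashscope-intl.aliyuncs.com/compatible-mode/v1`) is exactly the case where the usage guidance suggests double-checking the URL against Alibaba Cloud's documentation.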
Capability Assessment
Purpose & Capability
The name and description claim integration with Alibaba's Qwen-Omni. The skill requires Node.js and an API key for the DashScope endpoint, and includes a script that builds requests to dashscope.aliyuncs.com. These requirements are consistent with the stated multimodal purpose.
Instruction Scope
The runtime instructions and the script read local media files (images, audio, video), encode them as Base64, and POST them to DashScope's compatible-mode API. The script also manages local session files under sessions/*.json. Both behaviors are expected for a multimodal client, but they mean that any file you point the skill at is uploaded to the remote service, and conversation history is written to local disk. SKILL.md also references optional environment variables (DASHSCOPE_BASE_URL, DASHSCOPE_MODEL, DASHSCOPE_VOICE) that the script uses, although only DASHSCOPE_API_KEY is declared in the metadata. Because these variables are optional, this is a minor documentation mismatch.
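The read-encode-upload path described above can be sketched as follows. This is an assumption-laden illustration, not the skill's code: `toContentPart` and `fileToContentPart` are hypothetical names, the extension table is deliberately incomplete, and the `image_url` / `input_audio` / `video_url` part shapes assume the OpenAI-compatible message format that DashScope's compatible mode accepts.

```typescript
import { readFileSync } from "node:fs";
import { extname } from "node:path";

type Modality = "image" | "audio" | "video";

// Small, deliberately incomplete extension table (assumption).
const MIME: Record<string, [Modality, string]> = {
  ".png": ["image", "image/png"],
  ".jpg": ["image", "image/jpeg"],
  ".jpeg": ["image", "image/jpeg"],
  ".mp3": ["audio", "audio/mpeg"],
  ".wav": ["audio", "audio/wav"],
  ".mp4": ["video", "video/mp4"],
};

// Assumed OpenAI-compatible content part shapes.
type ContentPart =
  | { type: "image_url"; image_url: { url: string } }
  | { type: "input_audio"; input_audio: { data: string; format: string } }
  | { type: "video_url"; video_url: { url: string } };

function toContentPart(fileName: string, bytes: Buffer): ContentPart {
  const ext = extname(fileName).toLowerCase();
  const entry = MIME[ext];
  if (!entry) throw new Error(`unsupported file type: ${ext}`);
  const [modality, mime] = entry;
  // The whole file is inlined as Base64: this string is what leaves the machine.
  const dataUrl = `data:${mime};base64,${bytes.toString("base64")}`;
  if (modality === "image") return { type: "image_url", image_url: { url: dataUrl } };
  if (modality === "audio") {
    // Audio parts also carry a bare format name, e.g. "mp3".
    return { type: "input_audio", input_audio: { data: dataUrl, format: ext.slice(1) } };
  }
  return { type: "video_url", video_url: { url: dataUrl } };
}

// Reads a local file from disk; everything it returns would be uploaded.
export function fileToContentPart(path: string): ContentPart {
  return toContentPart(path, readFileSync(path));
}
```

Keeping the byte-to-part conversion separate from the file read makes the upload boundary easy to see and lets a dry run inspect parts without touching the network.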
Install Mechanism
There is no install spec and no external download: the skill is a Node.js script (with a package.json) that requires Node.js 18 or later and has no third-party install steps. This is low-risk from an installation and extraction standpoint.
Credentials
Only one credential (DASHSCOPE_API_KEY) is declared as required, and it is used to authorize requests to Alibaba's DashScope. No unrelated cloud credentials or broad secrets are requested. The script also reads a few optional DASHSCOPE_* variables (base URL, model, voice), which are reasonable configuration settings.
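How a single API key authorizes a request can be sketched as a pure request builder. This is a hypothetical sketch: `buildRequest`, `ChatMessage`, and `PostRequest` are illustrative names, and the `/chat/completions` path, Bearer header, `stream`, `modalities`, and `audio` fields assume DashScope's OpenAI-compatible mode rather than anything confirmed by this page.

```typescript
// Hypothetical types for illustration.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: unknown;
}

interface PostRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[],
  voice?: string,
): PostRequest {
  const body: Record<string, unknown> = {
    model,
    messages,
    stream: true, // assumption: Qwen-Omni responses are streamed
    stream_options: { include_usage: true },
  };
  if (voice) {
    // Ask for speech as well as text (assumed compatible-mode parameter names).
    body.modalities = ["text", "audio"];
    body.audio = { voice, format: "wav" };
  }
  return {
    url: `${baseUrl}/chat/completions`,
    method: "POST",
    headers: {
      // The API key travels only in this header, never in the JSON body.
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  };
}
```

Because the builder performs no I/O, a dry run can print the resulting object (with the Authorization header redacted) instead of passing it to Node 18's global `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.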
Persistence & Privilege
The skill is user-invocable and set to always: false. It writes session files to a local sessions/ directory (expected for multi-turn support) but does not request system-wide privileges or modify other skills. Session persistence and local file writes are normal, but worth knowing about.
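The session persistence described above could look roughly like this. It is a minimal sketch under stated assumptions: one JSON file per session id holding a plain message array is a guessed format, and `loadSession`, `appendTurn`, `clearSession`, and `sessionPath` are hypothetical names, not the script's actual functions.

```typescript
import { existsSync, mkdirSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Assumed on-disk format: sessions/<id>.json containing the message list.
interface SessionMessage {
  role: "user" | "assistant";
  content: string;
}

function sessionPath(dir: string, id: string): string {
  // Reject ids that could escape the sessions/ directory (e.g. "../x").
  if (!/^[A-Za-z0-9_-]+$/.test(id)) throw new Error(`invalid session id: ${id}`);
  return join(dir, `${id}.json`);
}

function loadSession(dir: string, id: string): SessionMessage[] {
  const file = sessionPath(dir, id);
  return existsSync(file) ? (JSON.parse(readFileSync(file, "utf8")) as SessionMessage[]) : [];
}

// Appends one turn and rewrites the whole file; returns the new history.
function appendTurn(dir: string, id: string, turn: SessionMessage[]): SessionMessage[] {
  mkdirSync(dir, { recursive: true });
  const history = [...loadSession(dir, id), ...turn];
  writeFileSync(sessionPath(dir, id), JSON.stringify(history, null, 2));
  return history;
}

// Deleting the file is all it takes to clear a conversation's stored history.
function clearSession(dir: string, id: string): void {
  rmSync(sessionPath(dir, id), { force: true });
}
```

Under this format, reviewing what the skill has stored is just a matter of opening the JSON files, and clearing history means deleting them.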
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install qwen-omni-multimodal
  3. After installation, invoke the skill by name or use /qwen-omni-multimodal
  4. Provide the required inputs according to the skill's parameter spec to get structured output
Version History
v0.2.0
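Upgrades the Qwen-Omni omni-modal skill to version 3.5, supporting the latest model interface and capabilities.
  - The default model is now qwen3.5-omni-flash; automatic selection can switch between qwen3.5-omni-flash and qwen3.5-omni-plus, and older models remain compatible.
  - Supports the richer set of voices in the new version (e.g., Tina). Tina is recommended as the preferred voice for speech output, and the official list of 55+ voices is supported.
  - Default configuration and parameters, session rules, modality limits, and task adaptation have all been updated for version 3.5.
  - Explicit compatibility with legacy models such as qwen3-omni-flash and qwen-omni-turbo is retained for smooth migration.
  - Documentation details, including pricing reminders, model capability descriptions, input limits, sessions, and audio output, have been fully synced with the official Qwen-Omni 3.5 specification.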
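The model-selection behavior in these notes can be illustrated with a small chooser. The model ids come from the release notes; the selection rule itself is a hypothetical assumption (the page does not say how the skill decides between flash and plus), and `pickModel` is an illustrative name.

```typescript
// Model ids named in the v0.2.0 release notes.
const CURRENT_MODELS = ["qwen3.5-omni-flash", "qwen3.5-omni-plus"] as const;
const LEGACY_MODELS = ["qwen3-omni-flash", "qwen-omni-turbo"] as const;
const KNOWN_MODELS = new Set<string>([...CURRENT_MODELS, ...LEGACY_MODELS]);

// Hypothetical rule: an explicit model always wins (so legacy ids keep working);
// otherwise default to flash and upgrade to plus only when quality is requested.
function pickModel(opts: { explicit?: string; preferQuality?: boolean }): string {
  if (opts.explicit) {
    if (!KNOWN_MODELS.has(opts.explicit)) throw new Error(`unknown model: ${opts.explicit}`);
    return opts.explicit;
  }
  return opts.preferQuality ? "qwen3.5-omni-plus" : "qwen3.5-omni-flash";
}
```

Letting an explicit choice bypass the automatic rule is one straightforward way to provide the "explicit compatibility" with legacy models that the notes describe.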
v0.1.0
Qwen-Omni-Multimodal skill v1.0.0: initial release with comprehensive multimodal support.
  - Understands and analyzes text, images, audio, and video, and can output both text and speech.
  - Flexible model selection: defaults to qwen3-omni-flash, with support for qwen-omni-turbo; auto-selects based on input modality and cost.
  - The CLI tool supports single- and multi-turn conversations, session management, and advanced options such as dry-run, audio output, and voice selection.
  - Provides a clear configuration hierarchy and cost reminders based on the latest pricing for each modality.
  - Includes robust error checks and extensive user guidance for typical workflows and edge cases.
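Text plus speech output over a streamed response could be collected roughly as shown below. This is a hedged sketch: it assumes Server-Sent Events framing ("data:" lines ending with "data: [DONE]") and OpenAI-compatible deltas in which speech arrives as Base64 in `delta.audio.data` with its text in `delta.audio.transcript`. `collectStream` and `StreamChunk` are hypothetical names, and the audio container or encoding of the chunks is not asserted here.

```typescript
// Assumed shape of one streamed chunk (OpenAI-compatible format).
interface StreamChunk {
  choices?: Array<{
    delta?: {
      content?: string;
      audio?: { data?: string; transcript?: string };
    };
  }>;
}

// Collects text and decoded audio bytes from a raw SSE response body.
function collectStream(sse: string): { text: string; audio: Buffer } {
  let text = "";
  const audioParts: Buffer[] = [];
  for (const rawLine of sse.split("\n")) {
    const line = rawLine.trim();
    if (!line.startsWith("data:")) continue; // skip blank lines and other SSE fields
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") break; // end-of-stream sentinel
    const chunk = JSON.parse(payload) as StreamChunk;
    // Usage-only chunks may have an empty choices array; optional chaining skips them.
    const delta = chunk.choices?.[0]?.delta;
    if (delta?.content) text += delta.content;
    if (delta?.audio?.transcript) text += delta.audio.transcript;
    if (delta?.audio?.data) audioParts.push(Buffer.from(delta.audio.data, "base64"));
  }
  return { text, audio: Buffer.concat(audioParts) };
}
```

A real client would parse the stream incrementally as bytes arrive rather than buffering the full body; the string version only keeps the sketch short.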
Metadata
Slug: qwen-omni-multimodal
Version: 0.2.0
License: MIT-0
All-time Installs: 0
Active Installs: 0
Total Versions: 2
Frequently Asked Questions

What is qwen-omni-multimodal?

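An omni-modal skill built on Qwen3.5-Omni from Alibaba Cloud Model Studio (Bailian). It understands text, images, audio, and video, and can respond with text or speech. Use this skill when the user needs to analyze images, transcribe or understand audio, understand video, ask cross-modal questions, or get a spoken reply directly. It is an AI agent skill for Claude Code / OpenClaw, with 184 downloads so far.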

How do I install qwen-omni-multimodal?

Run "/install qwen-omni-multimodal" in the OpenClaw or Claude Code chat to install it in one step. To use it, you also need Node.js 18 or later and a DASHSCOPE_API_KEY.

Is qwen-omni-multimodal free?

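The skill itself is free: it is licensed under MIT-0, and you can download, install, and use it at no cost. The DashScope API calls it makes may incur charges under Qwen-Omni's per-modality pricing, which is why the skill includes cost reminders.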

Which platforms does qwen-omni-multimodal support?

qwen-omni-multimodal is cross-platform and runs anywhere OpenClaw or Claude Code is available, provided Node.js 18 or later is installed.

Who created qwen-omni-multimodal?

It is built and maintained by Wei Zhou (@zhouweico); the current version is v0.2.0.
