← 返回 Skills 市场
184
总下载
0
收藏
0
当前安装
2
版本数
在 OpenClaw 中安装
/install qwen-omni-multimodal
功能描述
基于阿里云百炼 Qwen3.5-Omni 的全模态 skill。支持文本、图片、音频、视频理解,以及文本/语音输出。 当用户需要分析图片、转写或理解音频、理解视频、进行跨模态问答,或直接生成语音回复时,使用此 skill。
安全使用建议
This skill appears to be what it says: a Node.js client for Alibaba Dashscope / Qwen Omni. Before installing, consider: (1) it will upload any images/audio/video paths you pass to the remote Dashscope endpoint — don't point it at sensitive local files unless you trust the service and your API key scope; (2) it stores conversation history in sessions/*.json on disk — review and/or clear these files if needed; (3) test with --dry-run first (SKILL.md describes this) to validate configuration without sending data; (4) restrict the DASHSCOPE_API_KEY you give it (use a scoped key if possible) and verify the base URL if you need to use an international endpoint. The only minor issue is that the docs reference optional env vars (DASHSCOPE_BASE_URL, DASHSCOPE_MODEL, DASHSCOPE_VOICE) but only the API key is listed as required in metadata — this is a documentation mismatch, not a functional red flag.
能力评估
Purpose & Capability
Name/description claim integration with Alibaba Qwen Omni; the skill requires node and an API key for a 'dashscope' endpoint and contains a script that builds requests to dashscope.aliyuncs.com — these requirements are coherent with the stated multimodal purpose.
Instruction Scope
Runtime instructions and the script read local media files (images/audio/video), convert to Base64, and POST them to the dashscope compatible API; the script also manages local session files under sessions/*.json. These behaviors are expected for a multimodal client, but they mean any files you point the skill at will be uploaded to the remote service and conversation history will be written locally. The SKILL.md also references optional env vars (DASHSCOPE_BASE_URL, DASHSCOPE_MODEL, DASHSCOPE_VOICE) which are used by the script but only DASHSCOPE_API_KEY is listed as required in metadata — this is a minor documentation mismatch (optional vars are not declared as required).
Install Mechanism
No install spec or external downloads; the skill is a Node.js script (package.json) requiring Node >=18 and no third-party install steps. This is low-risk from an installation/extraction standpoint.
Credentials
Only a single required credential (DASHSCOPE_API_KEY) is declared and used to authorize requests to Dashscope (Alibaba). No unrelated cloud credentials or broad secrets are requested. The script references a few optional DASHSCOPE_* env vars (base URL, model, voice) which are reasonable for configuration.
Persistence & Privilege
always:false and user-invocable; the skill writes session files to a local sessions/ directory (expected for multi-turn support) but does not request system-wide privileges or modify other skills. Session persistence and local file writes are normal but worth noting.
如何使用
- 确保已安装 OpenClaw(本地或 Docker 部署)
- 在对话框中输入安装命令:
/install qwen-omni-multimodal - 安装完成后,直接呼叫该 Skill 的名称或使用
/qwen-omni-multimodal触发 - 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
版本历史
v0.2.0
Qwen-Omni 全模态 skill 升级至 3.5 版本,支持最新模型接口与能力。
- 默认模型切换为 qwen3.5-omni-flash,自动选型时可在 qwen3.5-omni-flash 与 qwen3.5-omni-plus 间切换,仍兼容旧版模型。
- 支持新版更丰富的音色(如 Tina),推荐语音输出优先使用 Tina,支持 55+ 官方音色列表。
- 默认配置与参数、会话规则、模态限制、任务适配等均已更新适配 3.5 版本。
- 保留对 qwen3-omni-flash 和 qwen-omni-turbo 等历史模型的显式兼容,便于场景平滑迁移。
- 价格提醒、模型能力描述、输入限制、会话及音频输出等文档细节全面同步新版 Qwen-Omni 3.5 官方规范。
v0.1.0
Qwen-Omni-Multimodal skill v1.0.0 — initial release with comprehensive multimodal support.
- Supports understanding and analysis of text, images, audio, and video, and can output both text and speech.
- Flexible model selection: defaults to qwen3-omni-flash, with support for qwen-omni-turbo; auto-selection based on input modality and cost.
- CLI tool supports single and multi-turn conversations, session management, and advanced options like dry-run, audio output, and voice selection.
- Provides clear configuration hierarchy and cost reminders based on the latest pricing for each modality.
- Includes robust error checks and extensive user guidance for typical workflows and edge cases.
元数据
常见问题
qwen-omni-multimodal 是什么?
基于阿里云百炼 Qwen3.5-Omni 的全模态 skill。支持文本、图片、音频、视频理解,以及文本/语音输出。 当用户需要分析图片、转写或理解音频、理解视频、进行跨模态问答,或直接生成语音回复时,使用此 skill。 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 184 次。
如何安装 qwen-omni-multimodal?
在 OpenClaw 或 Claude Code 对话框中运行命令「/install qwen-omni-multimodal」即可一键安装,无需额外配置。
qwen-omni-multimodal 是免费的吗?
是的,qwen-omni-multimodal 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。
qwen-omni-multimodal 支持哪些平台?
qwen-omni-multimodal 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。
谁开发了 qwen-omni-multimodal?
由 Wei Zhou(@zhouweico)开发并维护,当前版本 v0.2.0。
推荐 Skills