
Qwen Audio

Name: Qwen Audio
Author: darknoah

By noah · GitHub · v0.0.6

cross-platform ⚠ suspicious

Total downloads: 432

Current installs: 1

Versions: 3

Install in OpenClaw

/install qwen-audio

Description

High-performance audio library with text-to-speech (TTS) and speech-to-text (STT).

Security Recommendations

This skill implements TTS/STT and largely does what it says, but take these precautions before installing or letting an agent run it:
- Run it in an isolated environment (VM/container) because it will download and install heavy ML packages and models (torch, qwen-tts/asr, etc.), which use significant disk, memory, and network.
- Ensure you have the 'uv' CLI and Python 3.10+ available; the SKILL.md uses 'uv run' but the registry metadata does not list 'uv' as a required binary.
- Expect network access to Hugging Face and other endpoints (the code probes HF_ENDPOINT and can download models). If you need to avoid external network traffic, do not install or run the skill.
- The script may auto-install missing Python packages via os.system('uv add ...'). This is a legitimate convenience but increases runtime privilege and attack surface. Review the pyproject.toml and the packages it will pull before proceeding.
- Voices and other files are stored under ./voices/ and the skill will write to the skill folder; consider filesystem permissions and where you run it.
- No credentials are requested, but environment variables (QWEN_AUDIO_DEVICE, QWEN_AUDIO_DTYPE, HF_ENDPOINT) influence behavior; these are not declared in the metadata and should be documented or locked down.

If you need lower risk, ask the author to (1) declare required binaries and env vars explicitly, (2) remove runtime auto-installs or make them opt-in, and (3) document model download endpoints and disk requirements. Review the full scripts/qwen-audio.py before granting the skill autonomous invocation.
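The two prerequisites named above (the 'uv' CLI on PATH and Python 3.10 or later) can be checked before installing. The following is a minimal sketch and not part of the skill; the function name `preflight` is hypothetical.

```python
import shutil
import sys

def preflight(min_python=(3, 10), required_binaries=("uv",)):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    for name in required_binaries:
        # shutil.which returns None when the binary is not on PATH.
        if shutil.which(name) is None:
            problems.append(f"required binary not found on PATH: {name}")
    return problems

if __name__ == "__main__":
    for line in preflight() or ["environment looks OK"]:
        print(line)
```

Running the check inside the VM or container where the skill will live also confirms that the isolated environment, not just the host, has what the skill expects.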

Functional Analysis

Type: OpenClaw Skill
Name: qwen-audio
Version: 0.0.6

The skill bundle provides a legitimate implementation of audio processing capabilities (TTS, STT, and voice cloning) using Qwen-Audio models. The Python script `scripts/qwen-audio.py` acts as a wrapper for ML libraries, including a fallback mechanism to install dependencies via `os.system` and connectivity checks for Hugging Face. The `SKILL.md` and `env-check-list.md` files provide clear, functional instructions for the AI agent to manage the environment and interact with the user safely. No evidence of malicious intent, data exfiltration, or unauthorized execution was found.

Capability Assessment

ℹ Purpose & Capability

Name/description (TTS/STT) matches the included code and pyproject dependencies (qwen-asr, qwen-tts, mlx-audio, torch). However the SKILL.md and registry metadata claim no required binaries/env vars while the instructions and code rely on the 'uv' CLI, Python >=3.10, and may require network access to download large models. The overall capability is coherent with its stated purpose but some required runtime pieces are not declared in the metadata.

ℹ Instruction Scope

Runtime instructions tell the agent to run 'uv run ...' and to manipulate a local ./voices/ directory; the code will read and write these local voice files. Instructions require the user to run env-checks and to explicitly confirm voice selection before TTS, which limits accidental use. The SKILL.md does not explicitly warn that model downloads and package installs will occur, but the code will contact Hugging Face and other endpoints and can operate in online/offline modes.
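The online/offline behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the code in `scripts/qwen-audio.py`: the helper names `endpoint_reachable` and `configure_hub_mode` are hypothetical, and the default endpoint is assumed to be https://huggingface.co. HF_ENDPOINT and HF_HUB_OFFLINE are the variables the assessment names.

```python
import os
import urllib.error
import urllib.request

DEFAULT_ENDPOINT = "https://huggingface.co"  # assumed default, not from the skill

def endpoint_reachable(url, timeout=3.0):
    """Best-effort probe: True if the server answers at all within timeout."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True   # an HTTP error status still means the host answered
    except (OSError, ValueError):
        return False  # DNS failure, refused connection, timeout, bad URL

def configure_hub_mode(env=os.environ, probe=endpoint_reachable):
    """Return "online" or "offline"; set HF_HUB_OFFLINE=1 when offline."""
    if env.get("HF_HUB_OFFLINE") == "1":
        return "offline"  # respect an explicit user choice, skip the probe
    endpoint = env.get("HF_ENDPOINT", DEFAULT_ENDPOINT)
    if probe(endpoint):
        return "online"
    env["HF_HUB_OFFLINE"] = "1"
    return "offline"
```

Passing the probe as a parameter keeps the decision logic testable without network access. In practice the offline flag generally needs to be set before the model libraries are imported for it to take effect.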

⚠ Install Mechanism

There is no platform install spec (instruction-only), but the pyproject.toml lists heavy ML dependencies and a custom torch index. The script itself will run a shell command (os.system("uv add mlx-audio ...")) to install missing packages at runtime. Auto-install and model downloads introduce moderate risk (large network/disk operations and execution of runtime-installed packages).
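One way to make the runtime install opt-in, as an alternative to an unconditional `os.system(...)` call, is sketched below. It is a suggestion only: `install_if_allowed` and the `QWEN_AUDIO_ALLOW_INSTALL` variable are hypothetical and do not exist in the skill; `uv add <package>` is the command the assessment quotes.

```python
import importlib.util
import os
import subprocess

def install_if_allowed(module, package, env=os.environ, run=subprocess.run):
    """Install `package` with `uv add` only if the user opted in.

    Returns "present", "installed", or "skipped".
    """
    if importlib.util.find_spec(module) is not None:
        return "present"
    if env.get("QWEN_AUDIO_ALLOW_INSTALL") != "1":  # hypothetical opt-in flag
        return "skipped"
    # Argument list, no shell: the package name is never parsed by a shell.
    run(["uv", "add", package], check=True)
    return "installed"
```

A caller would treat "skipped" as a signal to tell the user which package is missing rather than installing it silently. Using an argument list with `subprocess.run` instead of a shell string also removes the shell from the execution path.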

ℹ Credentials

The skill declares no required environment variables, but the code reads/uses QWEN_AUDIO_DEVICE, QWEN_AUDIO_DTYPE, HF_ENDPOINT and may set HF_HUB_OFFLINE. No secret or credential env vars are requested. The mismatch between declared requirements and actual env usage reduces transparency and should be resolved before trusting the skill.
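Until the metadata declares these variables, a caller can audit or strip them before launching the skill. A minimal sketch; `audit_env` and `locked_down_env` are hypothetical helper names, and the variable lists simply repeat the names reported above.

```python
import os

# Variables the assessment reports the code reading; none are secrets.
READ_VARS = ("QWEN_AUDIO_DEVICE", "QWEN_AUDIO_DTYPE", "HF_ENDPOINT")
# Variable the assessment reports the code may set.
SET_VARS = ("HF_HUB_OFFLINE",)

def audit_env(env=os.environ):
    """Map each relevant variable to its current value, or None if unset."""
    return {name: env.get(name) for name in READ_VARS + SET_VARS}

def locked_down_env(env=os.environ):
    """Copy of env without the skill-related variables, so a child
    process falls back to whatever defaults the skill itself uses."""
    return {k: v for k, v in env.items() if k not in READ_VARS + SET_VARS}
```

The result of `locked_down_env()` can be passed as the `env` argument of `subprocess.run` when starting the script, which prevents an inherited HF_ENDPOINT from redirecting model downloads unnoticed.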

✓ Persistence & Privilege

The `always` setting is false, and the skill does not request system-wide config changes or other skills' credentials. It will write voice profiles under its own ./voices/ directory and may create/update files like references/env-check-list.md as instructed, which is normal for a local audio skill.
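A reviewer who wants to enforce the write scope described above could confine file paths to the ./voices/ directory. A minimal sketch; `resolve_in_voices` is a hypothetical helper and the skill's own path handling may differ.

```python
from pathlib import Path

def resolve_in_voices(name, base="voices"):
    """Resolve `name` under `base`, rejecting paths that escape it."""
    root = Path(base).resolve()
    # resolve() collapses ".." segments; an absolute `name` replaces root.
    target = (root / name).resolve()
    if not target.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"path escapes {root}: {name}")
    return target
```

With this check, a name such as `../secrets.txt` or an absolute path raises `ValueError` instead of being written outside the skill folder.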

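How to Use

Make sure OpenClaw is installed (locally or via Docker).
Enter the install command in the chat box: /install qwen-audio
Once installed, call the Skill by name or trigger it with /qwen-audio
Provide the inputs required by the Skill's parameter description to get structured output.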

Version History

v0.0.6

- No file changes detected for version 0.0.6.
- No user-facing updates, feature additions, or documentation changes in this release.
- Functionality and interface remain unchanged from the previous version.

v0.0.5

- Update to version 0.0.5
- Modified scripts/qwen-audio.py
- No user-facing documentation or feature changes noted in SKILL.md

v0.0.4

- Added detailed documentation for voice management, including creating, listing, and using custom voice profiles.
- Introduced clear prerequisites and environment check instructions.
- Provided step-by-step guidance and JSON response examples for text-to-speech (TTS) and speech-to-text (STT) functionalities.
- Explained the workflow for TTS voice selection and cloning, with emphasis on voice style and confirmation before generation.
- Described new STT output format options and included a test audio link.
- Improved clarity on usage and capabilities throughout the documentation.

Metadata

Slug: qwen-audio

Version: 0.0.6

License: not specified

Total installs: 1

Current installs: 1

Versions: 3

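FAQ

What is Qwen Audio?

High-performance audio library with text-to-speech (TTS) and speech-to-text (STT). It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 432 total downloads to date.

How do I install Qwen Audio?

Run the command "/install qwen-audio" in the OpenClaw or Claude Code chat box to install it in one step. The install command itself needs no extra configuration, but running the skill relies on the 'uv' CLI and Python 3.10+, and it may download large models at runtime.

Is Qwen Audio free?

Yes. Qwen Audio is completely free (free and open source) and can be freely downloaded, installed, and used.

Which platforms does Qwen Audio support?

Qwen Audio is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.

Who develops Qwen Audio?

It is developed and maintained by noah (@darknoah); the current version is v0.0.6.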