
Qwen3 Audio

Name: Qwen3 Audio
Author: darknoah

By noah · GitHub · v0.1.1

cross-platform ⚠ suspicious

Total downloads: 454

Install in OpenClaw

/install qwen3-audio

Description

High-performance audio library for Apple Silicon with text-to-speech (TTS) and speech-to-text (STT).

Safety recommendations

This skill appears to do what it claims (local TTS/STT using the mlx-audio package), but it will: (1) install a third‑party Python package at runtime via the 'uv' tool; (2) access network endpoints to check/download model data (Hugging Face or a mirror) and reference an external test audio URL; and (3) read/write audio/text files (including creating a voices/ folder and temp chunk files). Before installing/using: review the mlx-audio package and its reputation; run the skill in an isolated environment (VM or container) if you want to limit blast radius; confirm whether model inference happens locally or via remote inference (if remote, audio might be transmitted to external servers); and inspect or vet any models you download for license/privacy implications. Because the skill's source/homepage is unknown, exercise extra caution and avoid providing sensitive audio unless you are confident about where processing occurs.

Functional analysis

Type: OpenClaw Skill
Name: qwen3-audio
Version: 0.1.1

The skill is suspicious due to several vulnerabilities in `scripts/mlx-audio.py`. It allows arbitrary file read/write operations via user-controlled paths (`--output`, `--audio`, `--ref_audio`) in the `run_tts` and `run_stt` functions. Additionally, the `run_stt` function directly embeds the `--ass-style` argument into the ASS subtitle file header without sanitization, creating a clear injection vulnerability. While these are significant flaws that could be exploited by a malicious agent prompt or user, the code does not exhibit clear evidence of intentional malicious behavior such as data exfiltration or backdoor installation.
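
The injection risk described above comes from writing a caller-supplied string verbatim into the subtitle header. The following is a minimal sketch of one possible mitigation, assuming the style value is meant to occupy a single header line. The function name `sanitize_ass_style` and the length cap are hypothetical and are not taken from the skill's script.

```python
import re

# Control characters (including CR/LF) would let a value start a new header line.
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")
MAX_STYLE_LEN = 512  # arbitrary cap chosen for this sketch


def sanitize_ass_style(value: str) -> str:
    """Return a trimmed single-line style string, or raise ValueError.

    Hypothetical helper: it only enforces "one line, no control characters";
    it does not validate the individual ASS style fields.
    """
    cleaned = value.strip()
    if not cleaned:
        raise ValueError("style must not be empty")
    if len(cleaned) > MAX_STYLE_LEN:
        raise ValueError("style is too long")
    # splitlines() also catches Unicode line separators such as U+2028.
    if len(cleaned.splitlines()) > 1 or _CONTROL_CHARS.search(cleaned):
        raise ValueError("style must not contain line breaks or control characters")
    return cleaned
```

Rejecting the value outright, rather than silently stripping the offending characters, keeps the failure visible to whoever supplied the argument.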

Capability assessment

✓ Purpose & Capability

The name/description (Qwen3 audio TTS/STT) match the included code and SKILL.md: the script wraps mlx-audio models for TTS, voice cloning, and STT. Declared dependency (mlx-audio) and the run commands in SKILL.md align with the stated functionality.

ℹ Instruction Scope

Instructions require a local Python .venv and running the provided script; they reference the local voices/ directory and read/write audio and text files (expected for this purpose). They also direct the agent to verify the env-check-list and to run 'uv' commands. The SKILL.md and script do not ask for unrelated secrets or to read arbitrary system files, but they do permit reading/writing files you point at and will download or access model artifacts over the network.
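
Because the skill reads and writes whatever paths it is given, a caller may want to confine those paths to a working directory before passing them on. Below is a minimal sketch of such a check, assuming Python 3.9 or later (for `Path.is_relative_to`). The helper `resolve_inside` is hypothetical and is not part of the skill.

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Resolve user_path against base_dir and refuse paths that escape it.

    Hypothetical helper. resolve() normalises ".." segments and follows
    symlinks, so the comparison is made on the real target location.
    """
    base = Path(base_dir).resolve()
    # Joining with an absolute user_path discards base, which the check below catches.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"{user_path!r} resolves outside {base}")
    return candidate
```

A wrapper could call this on each path argument (for example the values given to `--output`, `--audio`, and `--ref_audio`) and pass only the returned paths to the script.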

ℹ Install Mechanism

There is no formal install spec; the included script performs runtime installation via os.system("uv add mlx-audio --prerelease=allow") if mlx-audio is missing. This is coherent with the pyproject dependency, but it means installation happens dynamically at runtime and will execute package installation commands on the host (moderate risk). Model files may be pulled from Hugging Face or a mirror.
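
An alternative to installing at runtime is to check for the dependency and stop with an explicit message, leaving installation to the operator. The sketch below illustrates that pattern. The helper `require_package` is hypothetical, and the import name `mlx_audio` in the usage comment is an assumption about how the mlx-audio package is imported.

```python
import importlib.util


def require_package(module_name: str, install_hint: str) -> None:
    """Raise RuntimeError with an install hint if module_name is not importable.

    Hypothetical helper: it never runs an installer itself. Pass a top-level
    module name (no dots).
    """
    if importlib.util.find_spec(module_name) is None:
        raise RuntimeError(
            f"Missing dependency {module_name!r}. "
            f"Install it explicitly first, for example: {install_hint}"
        )


# Usage (assumed import name; the command is the one quoted in the review above):
# require_package("mlx_audio", "uv add mlx-audio --prerelease=allow")
```

With this approach the package installation becomes a deliberate, reviewable step rather than a side effect of the first invocation.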

✓ Credentials

The skill declares no required environment variables or credentials and the code does not request secrets. It may set HF_ENDPOINT/HF_HUB_OFFLINE for model hub access. Network access to model hubs and a public test audio URL is expected for fetching models; no unrelated credentials are requested.
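
`HF_HUB_OFFLINE` and `HF_ENDPOINT` are environment variables read by the Hugging Face Hub client. As a rough sketch, a caller who wants to control network access could build the environment explicitly before launching the script. The helper `hub_environment` is hypothetical, and how the skill's own script sets these variables is not shown in this listing.

```python
import os
from typing import Dict, Optional


def hub_environment(offline: bool, endpoint: Optional[str] = None) -> Dict[str, str]:
    """Return a copy of the current environment with model-hub settings applied.

    Hypothetical helper: the caller's own os.environ is left untouched.
    """
    env = dict(os.environ)
    # "1" asks the Hub client to use only locally cached files.
    env["HF_HUB_OFFLINE"] = "1" if offline else "0"
    if endpoint is not None:
        # Points the Hub client at a mirror instead of the default host.
        env["HF_ENDPOINT"] = endpoint
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` so that the setting applies only to that one child process.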

✓ Persistence & Privilege

The skill is not always-enabled and uses normal agent invocation. It does not modify other skills or global agent settings; it stores voice profiles in a local voices/ directory (expected).

How to use

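Make sure OpenClaw is installed (locally or via Docker).
Enter the install command in the chat box: /install qwen3-audio
Once installation finishes, invoke the skill by name or trigger it with /qwen3-audio.
Provide the inputs described in the skill's parameter documentation to get structured output.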

Version history

v0.1.1

Voice profile management updated to require and support style descriptions.
- Voice profiles now include a mandatory instruct (style description) field.
- voices/ directory structure updated: each voice now contains ref_instruct.txt.
- voice create command requires --instruct to describe voice style (used with VoiceDesign model).
- Listing or using voices now shows and applies the instruct field automatically.
- Documentation updated to reflect new requirements and workflow for voice profile creation and use.
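
To make the profile layout concrete, here is a small sketch that reads the style description for each voice. It assumes one sub-folder per voice under voices/, each holding a ref_instruct.txt file. Only the voices/ directory and the ref_instruct.txt file name come from the changelog; the per-voice folder layout and the helper `load_voice_instructs` are assumptions.

```python
from pathlib import Path
from typing import Dict


def load_voice_instructs(voices_dir: str) -> Dict[str, str]:
    """Map each voice folder name to the text of its ref_instruct.txt.

    Hypothetical helper; folders without a ref_instruct.txt are skipped.
    """
    profiles: Dict[str, str] = {}
    root = Path(voices_dir)
    if not root.is_dir():
        return profiles
    for voice in sorted(root.iterdir()):
        instruct_file = voice / "ref_instruct.txt"
        if voice.is_dir() and instruct_file.is_file():
            profiles[voice.name] = instruct_file.read_text(encoding="utf-8").strip()
    return profiles
```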

v0.1.0

Qwen3-Audio v0.1.0
- Initial release of a high-performance audio library for Apple Silicon, supporting text-to-speech (TTS) and speech-to-text (STT, also called ASR).
- Adds TTS with multi-language support, voice cloning via audio sample and transcript, voice creation and management, and emotion/style control.
- Provides ASR with multiple output formats (txt, srt, ass).
- Introduces predefined voices (CustomVoice) and support for creating new voices from text descriptions (VoiceDesign).
- Includes easy-to-use voice profile management and command-line usage examples.
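
As an illustration of the srt output format mentioned above, the following sketch renders timed text segments as SubRip cues. The `(start, end, text)` segment shape and both function names are assumptions for this example; the listing does not show how the skill represents recognition results internally.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS,mmm, the form SRT cues use."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as numbered SRT cues.

    Hypothetical helper: cues are numbered from 1 and separated by a blank line.
    """
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        cues.append(f"{index}\n{timing}\n{text.strip()}\n")
    return "\n".join(cues)
```

For example, `to_srt([(0.0, 1.25, "Hello")])` yields a single cue numbered 1 that runs from 00:00:00,000 to 00:00:01,250.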

Metadata

Slug: qwen3-audio

Version: 0.1.1

License: not specified

Total installs: 0

Current installs: 0

Versions: 2

FAQ

What is Qwen3 Audio?

High-performance audio library for Apple Silicon with text-to-speech (TTS) and speech-to-text (STT). It is an AI agent skill plugin for Claude Code / OpenClaw, with 454 total downloads to date.

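How do I install Qwen3 Audio?

Run the command "/install qwen3-audio" in the OpenClaw or Claude Code chat box for a one-step install. The install command itself needs no additional configuration, although the skill's instructions expect a local Python .venv and the 'uv' tool at run time.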

Is Qwen3 Audio free?

Yes. Qwen3 Audio is listed as completely free (free and open source) and can be downloaded, installed, and used freely.

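Which platforms does Qwen3 Audio support?

The listing tags Qwen3 Audio as cross-platform: it can be installed in any environment where OpenClaw / Claude Code is deployed. Note that the description states the underlying audio library targets Apple Silicon.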

Who develops Qwen3 Audio?

It is developed and maintained by noah (@darknoah). The current version is v0.1.1.