
Qwen3 Audio

Name: Qwen3 Audio
Author: darknoah

by noah · v0.1.1

cross-platform ⚠ suspicious

454 Downloads

Install in OpenClaw

/install qwen3-audio

Description

High-performance audio library for Apple Silicon with text-to-speech (TTS) and speech-to-text (STT).

Usage Guidance

This skill appears to do what it claims (local TTS/STT using the mlx-audio package), but it will: (1) install a third-party Python package at runtime via the 'uv' tool; (2) access network endpoints to check/download model data (Hugging Face or a mirror) and reference an external test audio URL; and (3) read/write audio/text files (including creating a voices/ folder and temp chunk files).

Before installing or using it: review the mlx-audio package and its reputation; run the skill in an isolated environment (VM or container) if you want to limit the blast radius; confirm whether model inference happens locally or remotely (if remote, audio might be transmitted to external servers); and vet any models you download for license and privacy implications. Because the skill's source/homepage is unknown, exercise extra caution and avoid providing sensitive audio unless you are confident about where processing occurs.

Capability Analysis

Type: OpenClaw Skill

Name: qwen3-audio

Version: 0.1.1

The skill is suspicious due to several vulnerabilities in `scripts/mlx-audio.py`. It allows arbitrary file read/write operations via user-controlled paths (`--output`, `--audio`, `--ref_audio`) in the `run_tts` and `run_stt` functions. Additionally, the `run_stt` function directly embeds the `--ass-style` argument into the ASS subtitle file header without sanitization, creating a clear injection vulnerability. While these are significant flaws that could be exploited by a malicious agent prompt or user, the code does not exhibit clear evidence of intentional malicious behavior such as data exfiltration or backdoor installation.
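The two flaws described above (unrestricted paths and an unsanitized style string) are commonly mitigated by confining paths to a working directory and rejecting style values that could add lines or sections to the subtitle header. The following is a minimal sketch of that idea, not the skill's actual code: `resolve_inside` and `sanitize_ass_style` are hypothetical helper names, and the exact format the script accepts for `--ass-style` is an assumption.

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Resolve user_path and refuse anything that escapes base_dir.

    Hypothetical helper: the real script may handle paths differently.
    """
    base = Path(base_dir).resolve()
    # Joining with an absolute user_path discards base, so the check below
    # also catches inputs such as "/etc/passwd".
    candidate = (base / user_path).resolve()
    # Path.is_relative_to requires Python 3.9 or newer.
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes {base}: {user_path!r}")
    return candidate


def sanitize_ass_style(style: str) -> str:
    """Reject style strings that could inject extra lines or sections.

    Assumes the style is meant to be a single header line.
    """
    if any(ch in style for ch in "\r\n"):
        raise ValueError("ASS style must be a single line")
    if "[" in style or "]" in style:
        raise ValueError("ASS style must not contain section brackets")
    return style.strip()
```

A wrapper could pass values from `--output`, `--audio`, and `--ref_audio` through `resolve_inside` before opening any file, and pass `--ass-style` through `sanitize_ass_style` before writing the header.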

Capability Assessment

✓ Purpose & Capability

The name/description (Qwen3 audio TTS/STT) match the included code and SKILL.md: the script wraps mlx-audio models for TTS, voice cloning, and STT. Declared dependency (mlx-audio) and the run commands in SKILL.md align with the stated functionality.

ℹ Instruction Scope

Instructions require a local Python .venv and running the provided script; they reference the local voices/ directory and read/write audio and text files (expected for this purpose). They also direct the agent to verify the env-check-list and to run 'uv' commands. The SKILL.md and script do not ask for unrelated secrets or to read arbitrary system files, but they do permit reading/writing files you point at and will download or access model artifacts over the network.

ℹ Install Mechanism

There is no formal install spec; the included script performs runtime installation via os.system("uv add mlx-audio --prerelease=allow") if mlx-audio is missing. This is coherent with the pyproject dependency, but it means installation happens dynamically at runtime and will execute package installation commands on the host (moderate risk). Model files may be pulled from Hugging Face or a mirror.
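For comparison, a more cautious variant of the runtime install described above might check for the package first, require explicit opt-in, and invoke `uv` with an argument list instead of a shell string. This is a sketch under stated assumptions: the function names are hypothetical, and the import name `mlx_audio` for the mlx-audio package is assumed rather than confirmed by the listing.

```python
import importlib
import importlib.util
import shutil
import subprocess


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


def build_install_command(package: str) -> list:
    """Build the uv command as an argument list, so no shell parsing occurs."""
    return ["uv", "add", package, "--prerelease=allow"]


def ensure_package(module_name: str, package: str, confirm: bool = False) -> bool:
    """Install the package with uv only if it is missing and confirm is True."""
    if is_installed(module_name):
        return True
    if not confirm:
        # The caller did not opt in, so nothing is installed.
        return False
    if shutil.which("uv") is None:
        raise RuntimeError("uv was not found on PATH")
    subprocess.run(build_install_command(package), check=True)
    # Refresh import caches so the newly installed package can be found.
    importlib.invalidate_caches()
    return is_installed(module_name)
```

Calling `ensure_package("mlx_audio", "mlx-audio", confirm=True)` would then mirror the `uv add mlx-audio --prerelease=allow` step, but only after the user has agreed to it.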

✓ Credentials

The skill declares no required environment variables or credentials and the code does not request secrets. It may set HF_ENDPOINT/HF_HUB_OFFLINE for model hub access. Network access to model hubs and a public test audio URL is expected for fetching models; no unrelated credentials are requested.
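To control where model files come from, the two variables named above can be set explicitly before any Hugging Face library is imported. The sketch below only builds and applies the environment; the helper names and the mirror URL in the usage note are hypothetical, and how the skill's script itself sets these variables is not shown in the listing.

```python
import os
from typing import Optional


def hub_environment(offline: bool, endpoint: Optional[str] = None) -> dict:
    """Return the variables to set before huggingface_hub is imported."""
    env = {"HF_HUB_OFFLINE": "1" if offline else "0"}
    if endpoint:
        # Point the hub client at a mirror instead of the default endpoint.
        env["HF_ENDPOINT"] = endpoint
    return env


def apply_hub_environment(env: dict) -> None:
    """Apply the variables to the current process.

    Set these before importing the hub client, since it reads its
    configuration from the environment when it is first loaded.
    """
    os.environ.update(env)
```

For example, `apply_hub_environment(hub_environment(offline=True))` forces cached models only, while `hub_environment(False, "https://hf-mirror.example")` (a placeholder URL) would route downloads through a mirror.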

✓ Persistence & Privilege

The skill is not always-enabled and uses normal agent invocation. It does not modify other skills or global agent settings; it stores voice profiles in a local voices/ directory (expected).

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install qwen3-audio
After installation, invoke the skill by name or use /qwen3-audio
Provide required inputs per the skill's parameter spec and get structured output

Version History

v0.1.1

Voice profile management updated to require and support style descriptions.

- Voice profiles now include a mandatory instruct (style description) field.
- voices/ directory structure updated: each voice now contains ref_instruct.txt.
- voice create command requires --instruct to describe voice style (used with VoiceDesign model).
- Listing or using voices now shows and applies the instruct field automatically.
- Documentation updated to reflect new requirements and workflow for voice profile creation and use.
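The voices/ layout described in this release can be inspected with a few lines of Python. This is a sketch that assumes one subdirectory per voice, each holding a `ref_instruct.txt` file; any other files in a profile are not named in the listing and are ignored here, and `load_voice_instructs` is a hypothetical helper, not part of the skill.

```python
from pathlib import Path


def load_voice_instructs(voices_dir: str) -> dict:
    """Map each voice name to the style description in its ref_instruct.txt."""
    profiles = {}
    root = Path(voices_dir)
    if not root.is_dir():
        return profiles
    for voice in sorted(p for p in root.iterdir() if p.is_dir()):
        instruct_file = voice / "ref_instruct.txt"
        if not instruct_file.is_file():
            # The instruct field is mandatory as of v0.1.1, so profiles
            # without it are treated as incomplete and skipped.
            continue
        profiles[voice.name] = instruct_file.read_text(encoding="utf-8").strip()
    return profiles
```

Running `load_voice_instructs("voices")` before invoking the skill is one way to review which style descriptions will be applied automatically.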

v0.1.0

Qwen3-Audio v0.1.0

- Initial release of a high-performance audio library for Apple Silicon, supporting text-to-speech (TTS) and speech-to-text (STT).
- Adds TTS with multi-language support, voice cloning via an audio sample and transcript, voice creation and management, and emotion/style control.
- Provides STT with multiple output formats (txt, srt, ass).
- Introduces predefined voices (CustomVoice) and support for creating new voices from text descriptions (VoiceDesign).
- Includes easy-to-use voice profile management and command-line usage examples.
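As an illustration of the srt output format mentioned in this release, the sketch below turns timed transcript segments into SubRip text. It is not taken from the skill: the `(start, end, text)` segment shape and both function names are assumptions made for the example.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render (start_seconds, end_seconds, text) tuples as SRT cues.

    The segment shape is an assumption for this example.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

Each cue consists of a running index, a time range, and the text, which is why the formatter needs only the segment boundaries and the transcript.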

Metadata

Slug qwen3-audio

Version 0.1.1

License —

All-time Installs 0

Active Installs 0

Total Versions 2

Frequently Asked Questions

What is Qwen3 Audio?

Qwen3 Audio is a high-performance audio library for Apple Silicon with text-to-speech (TTS) and speech-to-text (STT). It is an AI agent skill for Claude Code / OpenClaw, with 454 downloads so far.

How do I install Qwen3 Audio?

Run "/install qwen3-audio" in the OpenClaw or Claude Code chat. The skill itself expects a local Python .venv and the 'uv' tool, and it installs mlx-audio at runtime if the package is missing.

Is Qwen3 Audio free?

Yes, Qwen3 Audio can be downloaded, installed, and used at no cost. Note that the listing does not state a license.

Which platforms does Qwen3 Audio support?

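The listing is tagged cross-platform, meaning the skill can be installed wherever OpenClaw / Claude Code is available. Its description, however, targets Apple Silicon, so the audio models themselves are expected to run on Apple Silicon hardware.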

Who created Qwen3 Audio?

It is built and maintained by noah (@darknoah); the current version is v0.1.1.
