guoqiao

MLX STT

by guoqiao · GitHub · v1.0.7
darwin ⚠ suspicious
3528 Downloads · 1 Star · 16 Active Installs · 8 Versions
Install in OpenClaw
/install mlx-stt
Description
Local speech-to-text with MLX (Apple Silicon) and open-source models (default: GLM-ASR-Nano-2512).
README (SKILL.md)

MLX STT

Local speech-to-text (ASR/transcription) with MLX (Apple Silicon) and open-source models (default: GLM-ASR-Nano-2512).

Free and accurate. No API key required. No server required.

Requirements

  • mlx: requires macOS on Apple Silicon
  • brew: used to install dependencies that are not already available

Installation

bash ${baseDir}/install.sh

This script installs these CLI tools if they are not already available (ffmpeg and uv via brew, mlx_audio via uv):

  • ffmpeg: converts audio formats when needed
  • uv: installs Python packages and runs Python scripts
  • mlx_audio: does the actual transcription
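The install steps above can be sketched as a small bash script. This is a hypothetical reconstruction, not the skill's actual install.sh: the function names and the `DRY_RUN` variable are illustrative, and it assumes the uv-installed mlx-audio package puts an `mlx_audio.stt.generate` command on PATH. The `uv tool install` command line is the one the vetting notes on this page quote from install.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an install script for this skill; NOT the author's install.sh.
# Source this file, then call main. Set DRY_RUN=1 to print install commands instead of running them.

# Install a Homebrew formula only if its command is not already on PATH.
ensure_brew_tool() (
  cmd="$1"
  formula="${2:-$1}"   # formula name defaults to the command name
  if resolved="$(command -v "$cmd")"; then
    echo "ok: $cmd already at $resolved"
  else
    echo "installing: $formula (brew)"
    ${DRY_RUN:+echo} brew install "$formula"
  fi
)

# Install mlx-audio with uv only if its STT command is missing.
# ASSUMPTION: the package exposes a command named mlx_audio.stt.generate.
ensure_mlx_audio() (
  if command -v mlx_audio.stt.generate >/dev/null 2>&1; then
    echo "ok: mlx_audio.stt.generate already installed"
  else
    echo "installing: mlx-audio (uv)"
    # Command line as quoted from install.sh in the vetting notes.
    ${DRY_RUN:+echo} uv tool install --force mlx-audio --prerelease=allow
  fi
)

main() {
  if ! command -v brew >/dev/null 2>&1; then
    echo "error: Homebrew (brew) is required" >&2
    return 1
  fi
  ensure_brew_tool ffmpeg || return 1
  ensure_brew_tool uv || return 1
  ensure_mlx_audio
}
```

To preview without changing the system, source the file and run `DRY_RUN=1 main`. Checking with `command -v` before each install keeps reruns cheap and idempotent.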

Usage

To transcribe an audio file, run this script:

bash ${baseDir}/mlx-stt.sh <audio_file_path>
  • The first run may be slow because the model has to be downloaded.
  • The transcript is printed to stdout.
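The pipeline that the capability analysis on this page attributes to mlx-stt.sh (convert with ffmpeg, run `mlx_audio.stt.generate`, print the transcript, clean up) could look roughly like the sketch below. It is not the skill's actual script. Assumptions: the `--model`, `--audio`, and `--output` flag names and the `transcript*` output file name (check `mlx_audio.stt.generate --help`), the 16 kHz mono WAV target, and the illustrative `MODEL`, `FFMPEG`, and `STT_CMD` environment variables, which let you point at other binaries.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an mlx-stt.sh-style pipeline; NOT the author's script.
# Env vars (all illustrative): MODEL (required), FFMPEG, STT_CMD.

transcribe() (
  audio="${1:-}"
  ffmpeg_bin="${FFMPEG:-ffmpeg}"
  stt_bin="${STT_CMD:-mlx_audio.stt.generate}"

  if [ ! -f "$audio" ]; then
    echo "usage: transcribe <audio_file_path>" >&2
    exit 2
  fi
  if [ -z "${MODEL:-}" ]; then
    echo "error: set MODEL to the model id to use" >&2
    exit 2
  fi

  # Private temp dir, removed by the EXIT trap when this subshell ends.
  tmpdir="$(mktemp -d)" || exit 1
  trap 'rm -rf "$tmpdir"' EXIT

  # Convert any input format to 16 kHz mono WAV (assumed target format).
  "$ffmpeg_bin" -y -loglevel error -i "$audio" -ar 16000 -ac 1 "$tmpdir/input.wav" || exit 1

  # ASSUMPTION: flag names below; verify with "$stt_bin --help".
  # Tool output goes to stderr: visible, but not mixed into the transcript.
  "$stt_bin" --model "$MODEL" --audio "$tmpdir/input.wav" \
    --output "$tmpdir/transcript" >&2 || exit 1

  # Print the transcript file(s) the tool wrote, then exit with cat's status.
  cat "$tmpdir"/transcript*
  exit $?
)
```

After sourcing, `MODEL=<model-id> transcribe recording.m4a > transcript.txt` would write only the transcript to the file, where `<model-id>` stands for whatever identifier the tool expects. Unlike a `>/dev/null` redirect, sending the tool's own output to stderr keeps errors visible.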
Usage Guidance
This skill appears to perform local STT as described, but exercise caution before installing:
  • always:true is unnecessary for an on-demand STT tool; prefer a skill that is not force-enabled.
  • install.sh runs 'uv tool install --force mlx-audio --prerelease=allow', which fetches and installs a third-party prerelease package from an unspecified source. Ask the author for the exact upstream registry/URL and inspect that package before installing.
  • The mlx_audio tool downloads models at runtime (network activity). If you handle sensitive data or need an auditable supply chain, run it in an isolated VM or disposable machine first.
  • Because the tool's stdout/stderr are silenced, initial failures or unexpected network activity may be hidden; consider running the command manually without redirection to inspect its behavior.
  • If you decide to proceed, run the install script manually in a controlled environment, verify the origin of the 'uv' CLI and the 'mlx-audio' package, and avoid installing on a machine that holds sensitive secrets.
Additional information that would raise confidence to 'high': explicit upstream URLs or package registry details for 'uv' and 'mlx-audio', a signed release or checksum for the model/binary, and removal of always:true or an explanation of why force-enabling is required.
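The guidance above recommends checking where the `uv` CLI and the mlx-audio tools come from, and running the tool without discarding its output. A pair of small bash helpers covers both. They are a hypothetical sketch, not part of the skill; `run_logged` relies on bash's `pipefail` option so it reports the command's exit status rather than `tee`'s.

```shell
#!/usr/bin/env bash
# Hypothetical inspection helpers; not part of mlx-stt.

# Print where each named command resolves on PATH, or "not found".
locate_tools() (
  for tool in "$@"; do
    if resolved="$(command -v "$tool")"; then
      echo "$tool: $resolved"
    else
      echo "$tool: not found"
    fi
  done
)

# Run a command with stdout and stderr shown AND appended to a log file,
# instead of sending them to /dev/null.
# Returns the command's own exit status (via pipefail), not tee's.
run_logged() (
  log="$1"
  shift
  set -o pipefail
  "$@" 2>&1 | tee -a "$log"
)
```

For example, `locate_tools brew ffmpeg uv mlx_audio.stt.generate` shows which binaries would actually run, and `run_logged stt-debug.log mlx_audio.stt.generate --help` (assuming the tool accepts `--help`; the log name is illustrative) keeps its output both visible and saved.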
Capability Analysis
Type: OpenClaw Skill · Name: mlx-stt · Version: 1.0.7
The skill bundle is designed for local Speech-To-Text on Apple Silicon. It uses `brew` and `uv` to install necessary dependencies like `ffmpeg` and `mlx-audio`, which are standard tools for audio processing and MLX-based operations. The `mlx-stt.sh` script converts audio to a suitable format using `ffmpeg` and then processes it with `mlx_audio.stt.generate`, printing the transcript to stdout. There is no evidence of data exfiltration, malicious execution, persistence mechanisms, or prompt injection attempts against the agent. All actions are directly aligned with the stated purpose of providing local STT functionality.
Capability Assessment
Purpose & Capability
Name and description (local MLX-based STT on Apple Silicon) align with the provided scripts: ffmpeg + mlx_audio invocation to transcribe audio. Requiring brew on macOS to install ffmpeg/uv is reasonable for this purpose.
Instruction Scope
Runtime instructions and scripts only convert the provided audio to WAV, invoke mlx_audio.stt.generate, print the transcript files, and clean up temporary files. The model is downloaded on first run, and the mlx_audio command's output is redirected to /dev/null, which hides runtime logs and errors; this is not clearly malicious but reduces transparency. The skill does not read unrelated files or request extra environment data.
Install Mechanism
install.sh uses brew (expected) but relies on the 'uv' CLI to install 'mlx-audio' with --force and --prerelease=allow. 'uv' and the source/registry used for mlx-audio are not documented here; installing a force/prerelease package from an opaque source can deliver arbitrary code. The install does not download from a clearly identified, verifiable release URL (e.g., official GitHub release or known package registry with provenance shown).
Credentials
The skill declares no required environment variables or credentials and its scripts do not attempt to read other env vars or sensitive config paths — the requested environment access appears minimal and proportional.
Persistence & Privilege
Registry metadata sets always:true (force‑included in every agent run). A narrow, on‑demand STT skill does not reasonably need to be force‑enabled for all agents. Combined with the opaque install of a prerelease binary, this increases the blast radius if the installed tool were malicious or buggy.
How to Use
  1. Make sure OpenClaw is installed locally on a Mac with Apple Silicon (this skill requires MLX)
  2. Run the install command in chat: /install mlx-stt
  3. After installation, invoke the skill by name or use /mlx-stt
  4. Provide the path to an audio file; the transcript is printed to stdout
Version History
v1.0.7
  • Added main script mlx-stt.sh for audio transcription; removed the previous Python script mlx-stt.py.
  • Updated usage instructions to use the new shell script instead of a Python command.
  • Expanded triggers in SKILL.md for easier activation.
  • Added version and author information to metadata.
  • The skill still performs local speech-to-text on Apple Silicon with MLX and open-source models.
v1.0.6
  • Removed the deprecation notice from documentation.
  • Updated SKILL.md to indicate the skill is no longer deprecated and can be used normally.
v1.0.5
  • Added a deprecation notice: this skill is no longer maintained.
  • Recommended migrating to the replacement skill, mlx-audio-server, for improved functionality.
v1.0.4
  • Clarified and simplified the skill description and title.
  • Added information about the initial model download from Hugging Face during the first run.
  • Improved formatting and fixed typos (e.g., "Transcibe" → "Transcribe").
  • Emphasized that no API key or server is required.
  • No changes to installation or usage commands.
v1.0.3
  • Expanded description to clarify local operation, supported model (glm-asr-nano-2512), and no need for API keys or servers.
  • Added relevant tags in metadata for better discovery.
  • Improved description and title formatting for consistency.
  • Minor clarifications to requirements and feature statements.
v1.0.2
  • Minor documentation update: clarified the role of `mlx_audio` (“do the real job”) in the installation instructions.
  • No functional or code changes in this release.
v1.0.1
  • Added install.sh script for streamlined installation using Homebrew.
  • Removed deprecated mlx-stt.sh script.
  • Updated documentation: simplified requirements, added installation instructions, and clarified usage.
  • Metadata now references Homebrew for dependency management.
v1.0.0
Initial release of mlx-stt.
  • Transcribe audio files to text using MLX (Apple Silicon) and GLM-ASR.
  • Provides both Python and Bash script options for running transcription.
  • Outputs transcription results directly to the terminal.
  • Requires Apple Silicon macOS, mlx, ffmpeg, and mlx_audio.stt.generate.
Metadata
Slug mlx-stt
Version 1.0.7
License
All-time Installs 16
Active Installs 16
Total Versions 8
Frequently Asked Questions

What is MLX STT?

Local speech-to-text with MLX (Apple Silicon) and open-source models (default: GLM-ASR-Nano-2512). It is an AI agent skill for Claude Code / OpenClaw, with 3528 downloads so far.

How do I install MLX STT?

Run "/install mlx-stt" in the OpenClaw or Claude Code chat to install the skill. Its dependencies (ffmpeg, uv, and mlx_audio) are installed by the skill's install.sh script, which requires Homebrew.

Is MLX STT free?

Yes. MLX STT is free to download, install, and use, and it needs no API key.

Which platforms does MLX STT support?

MLX STT supports only macOS (darwin) on Apple Silicon, because it depends on MLX.

Who created MLX STT?

It is built and maintained by guoqiao (@guoqiao); the current version is v1.0.7.
