Description

Local speech-to-text transcription with Qwen ASR — transcription routed across your Apple Silicon fleet. Transcribe meetings, voice notes, podcasts with loca...

Usage instructions (SKILL.md)

Local Speech-to-Text Transcription

Name: Local Transcription
Author: twinsgeeks

You're helping someone use speech-to-text transcription on audio files — meetings, voice memos, podcast episodes, phone recordings — without sending anything to the cloud. Every audio file stays on their devices. The fleet picks the best node to handle each speech-to-text transcription automatically.

Why local speech-to-text transcription matters

Cloud speech-to-text transcription APIs charge per minute and send your audio to third-party servers. Meeting recordings contain sensitive business discussions. Voice notes contain personal thoughts. Podcast interviews contain unreleased content. None of that should leave your network. Local transcription keeps it private.

This skill routes speech-to-text transcription requests across your fleet of devices. If one machine is busy with a 3-hour transcription, the next request goes to a different device. You get transcription queue management, health monitoring, and dashboard visibility, the same infrastructure a cloud speech-to-text API provides, running entirely on your hardware.

Get started with speech-to-text transcription

pip install ollama-herd
herd                                    # start the transcription router (port 11435)
herd-node                               # start on each transcription device
uv tool install "mlx-qwen3-asr[serve]" --python 3.14  # install speech-to-text model

Enable speech-to-text transcription:

curl -X POST http://localhost:11435/dashboard/api/settings \
  -H "Content-Type: application/json" \
  -d '{"transcription": true}'
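For scripts that prefer Python over curl, the same settings call can be made with the standard library. This is a minimal sketch that mirrors the curl command above; the helper names `build_settings_request` and `set_transcription` are hypothetical, and it assumes the router accepts the same JSON body from any HTTP client.

```python
import json
import urllib.request

# Base URL of the herd router; port 11435 comes from the setup steps above.
ROUTER = "http://localhost:11435"


def build_settings_request(transcription: bool, base: str = ROUTER) -> urllib.request.Request:
    """Build (but do not send) the POST that toggles transcription on the router."""
    body = json.dumps({"transcription": transcription}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/dashboard/api/settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def set_transcription(enabled: bool = True, base: str = ROUTER) -> int:
    """Send the settings request (hypothetical helper) and return the HTTP status code."""
    with urllib.request.urlopen(build_settings_request(enabled, base), timeout=10) as resp:
        return resp.status
```

Splitting request construction from sending keeps the payload easy to inspect before anything touches the network.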

Package: ollama-herd | Repo: github.com/geeks-accelerator/ollama-herd

Transcribe audio with speech-to-text

curl — basic transcription

# Speech-to-text transcription of a meeting recording
curl -s http://localhost:11435/api/transcribe \
  -F "audio=@meeting.wav" | python3 -m json.tool
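If `httpx` is not installed, the upload that curl's `-F` flag performs can be reproduced with the standard library by building a `multipart/form-data` body by hand. A minimal sketch, assuming the endpoint reads the file from a form field named `audio` (the field the Python example below uses); `build_multipart` and `transcribe_stdlib` are hypothetical helper names.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field: str, filename: str, data: bytes) -> tuple:
    """Encode one file as a multipart/form-data body; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe_stdlib(audio_path: str, base: str = "http://localhost:11435") -> dict:
    """POST an audio file to /api/transcribe and return the parsed JSON response."""
    with open(audio_path, "rb") as f:
        body, content_type = build_multipart("audio", os.path.basename(audio_path), f.read())
    req = urllib.request.Request(
        f"{base}/api/transcribe",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    # Long recordings can take minutes; match the 300 s timeout used elsewhere.
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)
```

This reads the whole file into memory, which is fine for typical recordings; for very large files the `httpx` version below streams more efficiently.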

Python — speech-to-text transcription

import httpx

def speech_to_text_transcription(audio_path):
    """Run speech-to-text transcription on an audio file."""
    with open(audio_path, "rb") as f:
        transcription_resp = httpx.post(
            "http://localhost:11435/api/transcribe",
            files={"audio": (audio_path, f)},
            timeout=300.0,
        )
    transcription_resp.raise_for_status()
    transcription_result = transcription_resp.json()
    return transcription_result["text"]

# Run speech-to-text transcription
transcription_text = speech_to_text_transcription("meeting.wav")
print(transcription_text)
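Because the router sends a new request to a different device when one is busy, submitting several files at once lets the whole fleet work in parallel. A minimal sketch using a thread pool; it assumes the `speech_to_text_transcription` function above is passed in as `transcribe`, and `transcribe_batch` and `max_workers=4` are illustrative choices, not values from herd.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def transcribe_batch(
    paths: List[str],
    transcribe: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Submit every file concurrently; map each path to its text or an error message."""
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(transcribe, p): p for p in paths}
        for fut in as_completed(futures):
            path = futures[fut]
            try:
                results[path] = fut.result()
            except Exception as exc:  # keep going if one file fails
                results[path] = f"ERROR: {exc}"
    return results
```

Usage: `transcribe_batch(["standup.wav", "interview.m4a"], speech_to_text_transcription)`. Threads suit this workload because each worker mostly waits on the network.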

Speech-to-text transcription with timestamps

import httpx

def transcription_with_timestamps(audio_path):
    """Speech-to-text transcription returning timestamped chunks."""
    with open(audio_path, "rb") as f:
        transcription_resp = httpx.post(
            "http://localhost:11435/api/transcribe",
            files={"audio": (audio_path, f)},
            timeout=300.0,
        )
    transcription_resp.raise_for_status()
    transcription_result = transcription_resp.json()
    # Each chunk carries start/end offsets in seconds plus its text
    for chunk in transcription_result.get("chunks", []):
        print(f"[{chunk['start']:.1f}s - {chunk['end']:.1f}s] {chunk['text']}")
    return transcription_result

Transcription response format

{
  "text": "Hello, this is a test of the speech-to-text transcription system.",
  "language": "English",
  "chunks": [
    {
      "text": "Hello, this is a test of the speech-to-text transcription system.",
      "start": 0.0,
      "end": 3.2,
      "chunk_index": 0,
      "language": "English"
    }
  ]
}
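To work with the response in typed code, it can be parsed into dataclasses. A minimal sketch, assuming the top-level keys are `text`, `language`, and `chunks`, the names the Python examples above read; the class names and `parse_transcription` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Chunk:
    text: str
    start: float  # seconds from the start of the audio
    end: float
    chunk_index: int = 0
    language: Optional[str] = None

    @property
    def duration(self) -> float:
        return self.end - self.start


@dataclass
class Transcription:
    text: str
    language: Optional[str] = None
    chunks: List[Chunk] = field(default_factory=list)


def parse_transcription(payload: dict) -> Transcription:
    """Convert the JSON response into dataclasses, tolerating a missing chunk list."""
    chunks = [
        Chunk(
            text=c["text"],
            start=float(c["start"]),
            end=float(c["end"]),
            chunk_index=int(c.get("chunk_index", i)),
            language=c.get("language"),
        )
        for i, c in enumerate(payload.get("chunks", []))
    ]
    return Transcription(text=payload["text"], language=payload.get("language"), chunks=chunks)
```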

Supported audio formats for transcription

WAV, MP3, M4A, FLAC, MP4, OGG — any format FFmpeg supports. WAV files get a ~25% transcription speed boost via native fast-path.
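Knowing a recording's length helps decide how long to wait for a result. For WAV files the standard library can read the duration without FFmpeg. A minimal sketch using Python's `wave` module, which only handles uncompressed PCM WAV; other formats would need a tool such as ffprobe, which this sketch does not cover.

```python
import wave


def wav_duration_seconds(path_or_file) -> float:
    """Return the duration of a PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(path_or_file, "rb") as wf:
        return wf.getnframes() / wf.getframerate()
```

`wave.open` accepts either a path or an open binary file object, so the same helper works on uploads held in memory.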

Speech-to-text transcription response headers

Header	Description
`X-Fleet-Node`	Which device performed the speech-to-text transcription
`X-Fleet-Model`	Transcription model used (qwen3-asr)
`X-Transcription-Time`	Transcription processing time in milliseconds
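The headers can be read from any HTTP client response. A minimal sketch that collects them into a dict; the header names come from the table above, while `fleet_metadata` is a hypothetical helper, and treating `X-Transcription-Time` as a plain number of milliseconds follows the table's description.

```python
from typing import Mapping, Optional


def fleet_metadata(headers: Mapping[str, str]) -> dict:
    """Pull routing details out of the response headers (case-insensitive)."""
    lower = {k.lower(): v for k, v in headers.items()}
    time_ms: Optional[float] = None
    raw = lower.get("x-transcription-time")
    if raw is not None:
        try:
            time_ms = float(raw)
        except ValueError:
            time_ms = None  # leave unparsed values out rather than guessing
    return {
        "node": lower.get("x-fleet-node"),
        "model": lower.get("x-fleet-model"),
        "time_ms": time_ms,
    }
```

Usage with the `httpx` example above: `fleet_metadata(transcription_resp.headers)`.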

Speech-to-text transcription model

Qwen3-ASR — state-of-the-art open-source speech-to-text transcription in 2026. ~5% word error rate, runs natively on Apple Silicon via MLX. The 0.6B transcription model uses ~1.2GB memory and transcribes at 0.08x real-time factor (a 10-minute recording completes transcription in ~48 seconds).
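The real-time factor (RTF) is processing time divided by audio length, so expected time is duration × RTF: 600 s × 0.08 = 48 s. The same arithmetic shows that a 3-hour recording (10,800 s) needs about 864 s, longer than the 300-second client timeout in the Python examples above. A minimal sketch for sizing the timeout; the 3× margin and 60 s floor are arbitrary assumptions, not herd defaults.

```python
RTF = 0.08  # real-time factor quoted for the 0.6B model; other hardware may differ


def estimated_seconds(audio_seconds: float, rtf: float = RTF) -> float:
    """Expected processing time: audio duration multiplied by the real-time factor."""
    return audio_seconds * rtf


def suggested_timeout(audio_seconds: float, rtf: float = RTF,
                      margin: float = 3.0, floor: float = 60.0) -> float:
    """Client timeout with headroom for queueing and slower nodes (margin and floor are guesses)."""
    return max(floor, estimated_seconds(audio_seconds, rtf) * margin)
```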

Also available on this fleet

The same router handles three other AI workloads alongside speech-to-text transcription. All endpoints are at http://localhost:11435:

LLM inference

curl http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-oss:120b","messages":[{"role":"user","content":"Hello"}]}'

Image generation

curl -o image.png http://localhost:11435/api/generate-image \
  -H "Content-Type: application/json" \
  -d '{"model":"z-image-turbo","prompt":"a sunset","width":1024,"height":1024,"steps":4}'

Embeddings

curl http://localhost:11435/api/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model":"nomic-embed-text","prompt":"search query"}'
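From Python, all three endpoints take a JSON POST, so one helper covers them. A minimal sketch using only the standard library; `json_request` is a hypothetical helper, and the payloads are the ones from the curl commands above.

```python
import json
import urllib.request

ROUTER = "http://localhost:11435"


def json_request(path: str, payload: dict, base: str = ROUTER) -> urllib.request.Request:
    """Build a JSON POST to the router; send it with urllib.request.urlopen."""
    return urllib.request.Request(
        base + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Payloads copied from the curl examples above.
chat = json_request("/v1/chat/completions", {
    "model": "gpt-oss:120b",
    "messages": [{"role": "user", "content": "Hello"}],
})
image = json_request("/api/generate-image", {
    "model": "z-image-turbo", "prompt": "a sunset",
    "width": 1024, "height": 1024, "steps": 4,
})
embed = json_request("/api/embeddings", {"model": "nomic-embed-text", "prompt": "search query"})

# Example (requires a running router). curl saves the image response to
# image.png, so the body is assumed to be the raw image bytes:
# with urllib.request.urlopen(image, timeout=300) as resp, open("image.png", "wb") as out:
#     out.write(resp.read())
```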

Monitoring speech-to-text transcription

# Transcription stats (last 24h)
curl -s http://localhost:11435/dashboard/api/transcription-stats | python3 -m json.tool

# Fleet health (includes speech-to-text transcription activity)
curl -s http://localhost:11435/dashboard/api/health | python3 -m json.tool
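The same two endpoints can be polled from Python, for example to log fleet state before a large batch. A minimal sketch; the response fields are not documented here, so it returns the decoded JSON untouched, and the hypothetical `opener` parameter exists only so the function can be exercised without a running router.

```python
import json
import urllib.request
from typing import Callable

ROUTER = "http://localhost:11435"


def fetch_dashboard(path: str, base: str = ROUTER,
                    opener: Callable = urllib.request.urlopen) -> dict:
    """GET a dashboard API path and return the decoded JSON body."""
    with opener(base + path, timeout=10) as resp:
        return json.load(resp)


def snapshot(opener: Callable = urllib.request.urlopen) -> dict:
    """Collect transcription stats and fleet health in one call."""
    return {
        "stats": fetch_dashboard("/dashboard/api/transcription-stats", opener=opener),
        "health": fetch_dashboard("/dashboard/api/health", opener=opener),
    }
```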

Dashboard at http://localhost:11435/dashboard: speech-to-text transcription queues appear with an [STT] badge alongside LLM and image queues.

Full documentation

Agent Setup Guide — complete reference for all 4 model types including speech-to-text transcription with Python, JavaScript, and curl examples.

Guardrails

Never delete or modify audio files provided by the user for transcription.
Never send audio data to external services — all speech-to-text transcription is local.
Never delete or modify files in ~/.fleet-manager/.
If transcription fails, suggest checking node logs: tail ~/.fleet-manager/logs/herd.jsonl.
If no speech-to-text models are available, suggest installing one: uv tool install "mlx-qwen3-asr[serve]" --python 3.14.
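When a transcription fails, the recent log entries can also be read from Python. A minimal read-only sketch (it never writes to `~/.fleet-manager/`); the file is assumed to hold one JSON object per line, as the `.jsonl` extension suggests, no particular field names are assumed, and `tail_jsonl` is a hypothetical helper.

```python
import json
from collections import deque
from pathlib import Path

LOG = Path.home() / ".fleet-manager" / "logs" / "herd.jsonl"


def tail_jsonl(path: Path = LOG, n: int = 20) -> list:
    """Return the last n log records, read-only; unparseable lines are kept as raw strings."""
    records = []
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        # deque with maxlen keeps only the final n lines while streaming the file
        for line in deque(f, maxlen=n):
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                records.append(line)
    return records
```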

Security recommendations

This skill appears coherent, but before installing:
1) Confirm you trust the PyPI package and the 'uv' model installer sources (they will download model weights).
2) Check the herd configuration to ensure the router binds to localhost if you want traffic restricted to your machine (otherwise it may be reachable on the LAN).
3) Inspect the ~/.fleet-manager files (latency.db, logs) after startup to understand what telemetry and logs are collected.
4) Ensure adequate disk space and bandwidth for model downloads.
5) If you plan to run nodes across multiple devices, only join devices you control or fully trust.

Capability assessment

✓ Purpose & Capability

Name/description claim local ASR across an Apple Silicon fleet and the SKILL.md only requires local HTTP endpoints, curl/wget, and optional python/pip; the listed metadata config paths (~/.fleet-manager/...) are consistent with a fleet manager and the darwin OS restriction matches Apple Silicon.

ℹ Instruction Scope

Instructions focus on installing and running a local router/node and calling localhost endpoints for transcription and other workloads. They do not ask the agent to read unrelated files or external credentials. Note: running the router/node will open network services (port 11435) and may expose endpoints to LAN depending on configuration—verify binding to localhost if you want strictly local-only access.

ℹ Install Mechanism

This is instruction-only (no install spec). The SKILL.md tells users to pip install 'ollama-herd' and to run a model installer (uv tool) which will download model weights and software from external sources—expected for a local ASR setup but means large downloads and external network access during setup. No opaque download URLs are embedded in the skill itself.

✓ Credentials

No environment variables, credentials, or unrelated config paths are requested beyond fleet manager files. The lack of secret requests is appropriate for the claimed local-only transcription purpose.

✓ Persistence & Privilege

always:false (not force-enabled). The skill is user-invocable and allows autonomous invocation (platform default) which is expected. The skill does not request to modify other skills or system-wide agent settings.

Version history

v1.0.2

Cross-platform support: macOS, Linux, and Windows. Updated OS metadata, descriptions, and hardware recommendations.

v1.0.1

- Updated documentation to consistently use "speech-to-text transcription" throughout for clarity.
- Expanded section titles and code examples with speech-to-text terminology.
- Clarified features and workflow steps for local, private transcription.
- Kept version, requirements, and model details unchanged.
- Minor edits for conciseness and clearer internationalization support.

v1.2.0

local-transcription 1.2.0
- Updated description to clarify Apple Silicon support and list Mac Studio, Mac Mini, and MacBook Pro as examples.
- Reworded introduction for improved clarity and conciseness.
- No changes to functionality or interfaces; this update affects documentation only.

v1.1.0

- Updated skill description to emphasize Qwen ASR, local routing, Apple Silicon (MLX), and Whisper alternative positioning.
- Clarified workflow summary in the first lines and improved mention of supported formats and use cases.
- No changes to logic, API, or functionality; documentation improvements only.
- Version number remains at 1.0.0, indicating no breaking or behavioral changes.

v1.0.0

- Initial release of local audio transcription across a device fleet.
- Transcribes WAV, MP3, M4A, FLAC, and other formats locally without sending data to cloud APIs.
- Features automatic queue management, dashboard visibility, health monitoring, and node selection for transcription requests.
- Uses the Qwen3-ASR model, with native performance on Apple Silicon via MLX.
- Includes REST API endpoints for transcription and monitoring, with code examples for curl and Python.
- Integrates with fleet LLM, image generation, and embedding workloads through the same infrastructure.

Metadata

Slug local-transcription

Version 1.0.2

License MIT-0

Total installs 2

Current installs 2

Version count 5

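FAQ

What is Local Transcription?

Local speech-to-text transcription with Qwen ASR — transcription routed across your Apple Silicon fleet. Transcribe meetings, voice notes, podcasts with loca... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 166 downloads so far.

How do I install Local Transcription?

Run the command "/install local-transcription" in an OpenClaw or Claude Code chat to install it in one step; no additional configuration is required.

Is Local Transcription free?

Yes. Local Transcription is completely free under the MIT-0 license and can be freely downloaded, installed, and used.

Which platforms does Local Transcription support?

Local Transcription runs cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed (darwin).

Who develops Local Transcription?

It is developed and maintained by Twin Geeks (@twinsgeeks); the current version is v1.0.2.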
