twinsgeeks

Local Transcription

by Twin Geeks · GitHub ↗ · v1.0.2 · MIT-0
darwin ✓ Security Clean
166 Downloads · 1 Star · 2 Active Installs · 5 Versions
Install in OpenClaw
/install local-transcription
Description
Local speech-to-text transcription with Qwen ASR — transcription routed across your Apple Silicon fleet. Transcribe meetings, voice notes, podcasts with loca...
README (SKILL.md)

Local Speech-to-Text Transcription

You're helping someone transcribe audio files (meetings, voice memos, podcast episodes, phone recordings) without sending anything to the cloud. Every audio file stays on their devices, and the fleet automatically picks the best node for each transcription request.

Why local speech-to-text transcription matters

Cloud speech-to-text transcription APIs charge per minute and send your audio to third-party servers. Meeting recordings contain sensitive business discussions. Voice notes contain personal thoughts. Podcast interviews contain unreleased content. None of that should leave your network. Local transcription keeps it private.

This skill routes speech-to-text requests across your fleet of devices. If one machine is busy with a 3-hour recording, the next request goes to a different device. You get queue management, health monitoring, and dashboard visibility, the same infrastructure a cloud speech-to-text API provides, running entirely on your hardware.
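To make the routing idea concrete, here is a toy sketch of choosing the least-busy node for the next request. The `Node` fields and the scoring rule (fewest queued jobs, ties broken by more free memory) are hypothetical illustrations, not ollama-herd's actual scheduling logic.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical snapshot of one device in the fleet."""
    name: str
    queued_jobs: int        # transcriptions waiting or running on this node
    free_memory_gb: float
    healthy: bool = True


def pick_node(nodes: list[Node]) -> Node:
    """Return the healthy node with the shortest queue.

    Ties go to the node with more free memory. Illustrative heuristic
    only; ollama-herd's real node selection may weigh other signals.
    """
    candidates = [n for n in nodes if n.healthy]
    if not candidates:
        raise RuntimeError("no healthy nodes available")
    return min(candidates, key=lambda n: (n.queued_jobs, -n.free_memory_gb))


fleet = [
    Node("mac-studio", queued_jobs=1, free_memory_gb=96.0),  # busy with a long file
    Node("mac-mini", queued_jobs=0, free_memory_gb=12.0),
    Node("macbook-pro", queued_jobs=0, free_memory_gb=24.0),
]
print(pick_node(fleet).name)  # → macbook-pro
```

The busy Mac Studio is skipped even though it has the most memory, which is the behavior described above: a long job on one machine does not block the next request.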

Get started with speech-to-text transcription

pip install ollama-herd
herd                                    # start the router on one machine (port 11435)
herd-node                               # start the node agent on each transcription device
uv tool install "mlx-qwen3-asr[serve]" --python 3.14  # install the Qwen3-ASR speech-to-text model

Enable speech-to-text transcription:

curl -X POST http://localhost:11435/dashboard/api/settings \
  -H "Content-Type: application/json" \
  -d '{"transcription": true}'
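The same settings call can be made from Python with only the standard library. This is a minimal sketch: `build_settings_request` and `set_transcription` are hypothetical helper names, and only the HTTP status is used because the response body of the settings endpoint is not documented here.

```python
import json
import urllib.request

ROUTER = "http://localhost:11435"  # default router address from the setup above


def build_settings_request(enabled: bool, base_url: str = ROUTER) -> urllib.request.Request:
    """Build the POST request that toggles transcription in the router settings."""
    body = json.dumps({"transcription": enabled}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/dashboard/api/settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def set_transcription(enabled: bool = True, base_url: str = ROUTER) -> int:
    """Send the settings request and return the HTTP status code."""
    with urllib.request.urlopen(build_settings_request(enabled, base_url), timeout=10) as resp:
        return resp.status
```

Calling `set_transcription(True)` is equivalent to the curl command; `urlopen` raises `urllib.error.HTTPError` if the router rejects the request.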

Package: ollama-herd | Repo: github.com/geeks-accelerator/ollama-herd

Transcribe audio with speech-to-text

curl — basic transcription

# Transcribe a meeting recording
curl -s http://localhost:11435/api/transcribe \
  -F "audio=@meeting.wav" | python3 -m json.tool

Python — speech-to-text transcription

import httpx
from pathlib import Path

def speech_to_text_transcription(audio_path):
    """Upload an audio file to the router and return the transcript text."""
    with open(audio_path, "rb") as f:
        resp = httpx.post(
            "http://localhost:11435/api/transcribe",
            # Send only the file name, not the full local path, as the multipart filename
            files={"audio": (Path(audio_path).name, f)},
            timeout=300.0,  # seconds; about an hour of audio at the quoted 0.08x real-time factor
        )
    resp.raise_for_status()
    return resp.json()["text"]

# Transcribe a local recording
transcription_text = speech_to_text_transcription("meeting.wav")
print(transcription_text)
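To transcribe a whole folder, the single-file call can be wrapped in a loop. In this sketch, `transcribe_folder` is a hypothetical helper that accepts any function taking a file path and returning the transcript text (for example the Python helper above), so the loop itself has no network dependency. Transcripts go into new sidecar files such as `meeting.wav.txt`; the audio files are only read, in line with the guardrails below.

```python
from pathlib import Path
from typing import Callable

# Extensions taken from the supported-formats list in this README.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".mp4", ".ogg"}


def transcribe_folder(folder: str, transcribe: Callable[[str], str]) -> list[Path]:
    """Transcribe every supported audio file directly inside `folder`.

    Each transcript is written next to its source as <name>.<ext>.txt, so
    meeting.wav and meeting.mp3 never collide. Files that already have a
    transcript are skipped. Returns the transcripts written in this run.
    """
    written = []
    for audio in sorted(Path(folder).iterdir()):
        if not audio.is_file() or audio.suffix.lower() not in AUDIO_EXTENSIONS:
            continue
        out = audio.with_name(audio.name + ".txt")
        if out.exists():
            continue  # already transcribed on an earlier run
        out.write_text(transcribe(str(audio)), encoding="utf-8")
        written.append(out)
    return written
```

Requests run one after another here; the router still spreads work across nodes, but firing several requests concurrently is what would let more than one device work at once.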

Speech-to-text transcription with timestamps

import httpx
from pathlib import Path

def transcription_with_timestamps(audio_path):
    """Transcribe an audio file, print each timestamped chunk, and return the full result."""
    with open(audio_path, "rb") as f:
        resp = httpx.post(
            "http://localhost:11435/api/transcribe",
            files={"audio": (Path(audio_path).name, f)},
            timeout=300.0,
        )
    resp.raise_for_status()
    result = resp.json()
    for chunk in result.get("chunks", []):
        print(f"[{chunk['start']:.1f}s - {chunk['end']:.1f}s] {chunk['text']}")
    return result

Transcription response format

{
  "text": "Hello, this is a test of the speech-to-text transcription system.",
  "language": "English",
  "chunks": [
    {
      "text": "Hello, this is a test of the speech-to-text transcription system.",
      "start": 0.0,
      "end": 3.2,
      "chunk_index": 0,
      "language": "English"
    }
  ]
}

Supported audio formats for transcription

WAV, MP3, M4A, FLAC, MP4, OGG, and any other format FFmpeg supports. WAV files transcribe about 25% faster because they take a native fast path.
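If you transcribe the same file repeatedly, converting it to WAV once can take advantage of that fast path. This sketch assumes `ffmpeg` is on your PATH; the 16 kHz mono 16-bit PCM target is an assumption about what the ASR model prefers, not a documented requirement of this skill. `wav_command` and `convert_to_wav` are hypothetical helpers, and the conversion always writes a new file rather than touching the original.

```python
import subprocess
from pathlib import Path


def wav_command(src: str, dst: str, sample_rate: int = 16_000) -> list[str]:
    """Build an ffmpeg command that converts `src` to mono 16-bit PCM WAV at `dst`."""
    if Path(src).resolve() == Path(dst).resolve():
        raise ValueError("refusing to write over the source audio file")
    return [
        "ffmpeg", "-n",           # -n: never overwrite an existing output file
        "-i", src,
        "-ac", "1",               # downmix to one channel
        "-ar", str(sample_rate),  # resample (16 kHz is an assumption)
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM
        dst,
    ]


def convert_to_wav(src: str, dst: str) -> None:
    """Run the conversion; raises subprocess.CalledProcessError if ffmpeg fails."""
    subprocess.run(wav_command(src, dst), check=True, capture_output=True)
```

Usage: `convert_to_wav("podcast.mp3", "podcast.wav")`, then upload `podcast.wav` as usual.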

Speech-to-text transcription response headers

Header | Description
X-Fleet-Node | Device that performed the transcription
X-Fleet-Model | Transcription model used (qwen3-asr)
X-Transcription-Time | Processing time in milliseconds
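These headers are handy for logging which node handled a request. Below is a minimal sketch of reading them; `fleet_info` is a hypothetical helper, and it assumes `X-Transcription-Time` is sent as a plain number of milliseconds. Pass `resp.headers` from httpx (case-insensitive); a plain dict must use the exact spelling shown.

```python
from collections.abc import Mapping


def fleet_info(headers: Mapping[str, str]) -> dict:
    """Collect the routing metadata from a transcription response's headers."""
    time_ms = headers.get("X-Transcription-Time")
    return {
        "node": headers.get("X-Fleet-Node"),
        "model": headers.get("X-Fleet-Model"),
        # Convert milliseconds to seconds; None if the header is missing
        "seconds": float(time_ms) / 1000 if time_ms is not None else None,
    }
```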

Speech-to-text transcription model

Qwen3-ASR is a state-of-the-art open-source speech-to-text model (as of 2026) with a reported ~5% word error rate, and it runs natively on Apple Silicon via MLX. The 0.6B model uses about 1.2GB of memory and runs at a 0.08x real-time factor, meaning processing takes 8% of the audio's duration: a 10-minute (600 s) recording is transcribed in about 48 seconds.
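The real-time factor also tells you whether the `timeout=300.0` in the Python examples is long enough. At 0.08x, a 3-hour recording needs about 10,800 s × 0.08 = 864 s, well past 300 s. The sketch below estimates processing time and suggests a client timeout; the 2x safety margin is an arbitrary choice, and actual speed depends on the node and its current load.

```python
REAL_TIME_FACTOR = 0.08  # figure quoted above for the 0.6B model


def estimated_seconds(audio_seconds: float, rtf: float = REAL_TIME_FACTOR) -> float:
    """Processing time = audio duration x real-time factor."""
    return audio_seconds * rtf


def suggested_timeout(audio_seconds: float, rtf: float = REAL_TIME_FACTOR,
                      safety: float = 2.0, minimum: float = 300.0) -> float:
    """Client timeout with headroom, never below the 300 s used in the examples."""
    return max(minimum, estimated_seconds(audio_seconds, rtf) * safety)


print(estimated_seconds(10 * 60))   # → 48.0 (the 10-minute example above)
print(suggested_timeout(3 * 3600))  # → 1728.0 for a 3-hour recording
```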

Also available on this fleet

The same router handles three other AI workloads alongside speech-to-text transcription. All endpoints are at http://localhost:11435:

LLM inference

curl http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-oss:120b","messages":[{"role":"user","content":"Hello"}]}'
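The chat endpoint can be called from Python the same way. This sketch uses only the standard library and assumes the router returns the standard OpenAI chat-completions response shape (`choices[0].message.content`); `chat_payload`, `reply_text`, and `chat` are hypothetical helper names.

```python
import json
import urllib.request


def chat_payload(prompt: str, model: str = "gpt-oss:120b") -> bytes:
    """JSON body for the OpenAI-compatible chat endpoint."""
    return json.dumps({"model": model, "messages": [{"role": "user", "content": prompt}]}).encode()


def reply_text(response: dict) -> str:
    """Extract the assistant message, assuming the OpenAI response shape."""
    return response["choices"][0]["message"]["content"]


def chat(prompt: str, base_url: str = "http://localhost:11435") -> str:
    """POST a single-turn chat request and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=chat_payload(prompt),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return reply_text(json.load(resp))
```

One natural pairing is summarizing a meeting: `chat("Summarize this meeting:\n" + transcription_text)`.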

Image generation

curl -o image.png http://localhost:11435/api/generate-image \
  -H "Content-Type: application/json" \
  -d '{"model":"z-image-turbo","prompt":"a sunset","width":1024,"height":1024,"steps":4}'
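A Python version of the image request is sketched below. Because the curl example writes the response straight to `image.png`, this assumes the endpoint returns raw image bytes. `image_payload` and `generate_image` are hypothetical helpers, and the file is opened in `"xb"` mode so an existing file is never overwritten.

```python
import json
import urllib.request


def image_payload(prompt: str, size: int = 1024, steps: int = 4,
                  model: str = "z-image-turbo") -> bytes:
    """JSON body mirroring the curl example above."""
    return json.dumps({"model": model, "prompt": prompt,
                       "width": size, "height": size, "steps": steps}).encode()


def generate_image(prompt: str, out_path: str,
                   base_url: str = "http://localhost:11435") -> None:
    """Request an image and write the returned bytes to a new file."""
    req = urllib.request.Request(
        f"{base_url}/api/generate-image",
        data=image_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    # The file is only created once the HTTP request has succeeded
    with urllib.request.urlopen(req, timeout=300) as resp, open(out_path, "xb") as f:
        f.write(resp.read())
```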

Embeddings

curl http://localhost:11435/api/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model":"nomic-embed-text","prompt":"search query"}'
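Embeddings pair well with transcripts, for example to find which recording best matches a search query. The sketch assumes the router passes through Ollama's `/api/embeddings` response shape, `{"embedding": [...]}`; `embed` and `cosine_similarity` are hypothetical helper names.

```python
import json
import math
import urllib.request


def embed(text: str, model: str = "nomic-embed-text",
          base_url: str = "http://localhost:11435") -> list[float]:
    """Fetch one embedding vector for `text`."""
    req = urllib.request.Request(
        f"{base_url}/api/embeddings",
        data=json.dumps({"model": model, "prompt": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["embedding"]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

Ranking transcripts is then `max(transcripts, key=lambda t: cosine_similarity(embed(query), embed(t)))`, though in practice you would cache each transcript's embedding rather than recompute it per query.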

Monitoring speech-to-text transcription

# Transcription stats (last 24h)
curl -s http://localhost:11435/dashboard/api/transcription-stats | python3 -m json.tool

# Fleet health (includes speech-to-text transcription activity)
curl -s http://localhost:11435/dashboard/api/health | python3 -m json.tool
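When scripting against a freshly started `herd`, it helps to wait until the health endpoint answers before sending audio. This sketch retries a fetch function on connection errors; it checks only that the endpoint responded with JSON and assumes nothing about the fields in the health payload. `get_json` and `wait_until_healthy` are hypothetical helpers, and `sleep` is injectable so the retry logic can be tested without waiting.

```python
import json
import time
import urllib.request
from typing import Callable


def get_json(path: str, base_url: str = "http://localhost:11435") -> dict:
    """GET a dashboard endpoint and decode its JSON body."""
    with urllib.request.urlopen(f"{base_url}{path}", timeout=10) as resp:
        return json.load(resp)


def wait_until_healthy(fetch: Callable[[], dict], attempts: int = 10, delay: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call `fetch` until it succeeds, retrying on OSError (refused, timeout, HTTP error)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except OSError:  # urllib.error.URLError is a subclass of OSError
            if attempt == attempts:
                raise
            sleep(delay)
```

Usage: `health = wait_until_healthy(lambda: get_json("/dashboard/api/health"))`.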

The dashboard at http://localhost:11435/dashboard shows speech-to-text queues with an [STT] badge alongside LLM and image queues.

Full documentation

Agent Setup Guide — complete reference for all 4 model types including speech-to-text transcription with Python, JavaScript, and curl examples.

Guardrails

  • Never delete or modify audio files provided by the user for transcription.
  • Never send audio data to external services — all speech-to-text transcription is local.
  • Never delete or modify files in ~/.fleet-manager/.
  • If transcription fails, suggest checking node logs: tail ~/.fleet-manager/logs/herd.jsonl.
  • If no speech-to-text model is available, suggest installing one: uv tool install "mlx-qwen3-asr[serve]" --python 3.14.
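When diagnosing a failed transcription, the log check can also be done from Python without touching the file. This sketch opens `~/.fleet-manager/logs/herd.jsonl` read-only and returns the last few entries, much like `tail`; it skips lines that are not valid JSON and assumes nothing about the log fields. `recent_log_entries` is a hypothetical helper.

```python
import json
from collections import deque
from pathlib import Path

LOG_PATH = Path.home() / ".fleet-manager" / "logs" / "herd.jsonl"


def recent_log_entries(n: int = 20, path: Path = LOG_PATH) -> list:
    """Return the last `n` parsed JSON lines of the log, reading only."""
    entries = deque(maxlen=n)  # keeps just the newest n entries in memory
    with open(path, encoding="utf-8") as f:
        for line in f:
            try:
                entries.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # blank or partial lines
    return list(entries)
```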
Usage Guidance
This skill appears coherent, but before installing:
  1. Confirm you trust the PyPI package and the uv model installer sources (they download model weights).
  2. Check the herd configuration to make sure the router binds to localhost if you want traffic restricted to your machine; otherwise it may be reachable on the LAN.
  3. Inspect the files in ~/.fleet-manager (latency.db, logs) after startup to understand what telemetry and logs are collected.
  4. Ensure adequate disk space and bandwidth for model downloads.
  5. If you run nodes across multiple devices, only join devices you control or fully trust.
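One way to verify the localhost-binding check is to try connecting to port 11435 from another machine on your network. This is a minimal sketch; `port_reachable` is a hypothetical helper, and the address in the usage line is a placeholder for your router machine's LAN IP.

```python
import socket


def port_reachable(host: str, port: int = 11435, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

Run `port_reachable("192.168.1.20")` (placeholder address) from a second device: `True` means the router is reachable on the LAN, `False` is consistent with a localhost-only binding or a firewall.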
Capability Assessment
Purpose & Capability
Name/description claim local ASR across an Apple Silicon fleet and the SKILL.md only requires local HTTP endpoints, curl/wget, and optional python/pip; the listed metadata config paths (~/.fleet-manager/...) are consistent with a fleet manager and the darwin OS restriction matches Apple Silicon.
Instruction Scope
Instructions focus on installing and running a local router/node and calling localhost endpoints for transcription and other workloads. They do not ask the agent to read unrelated files or external credentials. Note: running the router/node will open network services (port 11435) and may expose endpoints to LAN depending on configuration—verify binding to localhost if you want strictly local-only access.
Install Mechanism
This is instruction-only (no install spec). The SKILL.md tells users to pip install 'ollama-herd' and to run a model installer (uv tool) which will download model weights and software from external sources—expected for a local ASR setup but means large downloads and external network access during setup. No opaque download URLs are embedded in the skill itself.
Credentials
No environment variables, credentials, or unrelated config paths are requested beyond fleet manager files. The lack of secret requests is appropriate for the claimed local-only transcription purpose.
Persistence & Privilege
always:false (not force-enabled). The skill is user-invocable and allows autonomous invocation (platform default) which is expected. The skill does not request to modify other skills or system-wide agent settings.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install local-transcription
  3. After installation, invoke the skill by name or use /local-transcription
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.2
Cross-platform support: macOS, Linux, and Windows. Updated OS metadata, descriptions, and hardware recommendations.
v1.0.1
- Updated documentation to consistently use "speech-to-text transcription" throughout for clarity.
- Expanded section titles and code examples with speech-to-text terminology.
- Clarified features and workflow steps for local, private transcription.
- Kept version, requirements, and model details unchanged.
- Minor edits for conciseness and clearer internationalization support.
v1.2.0
local-transcription 1.2.0
- Updated description to clarify Apple Silicon support and list Mac Studio, Mac Mini, and MacBook Pro as examples.
- Reworded introduction for improved clarity and conciseness.
- No changes to functionality or interfaces; this update affects documentation only.
v1.1.0
- Updated skill description to emphasize Qwen ASR, local routing, Apple Silicon (MLX), and Whisper alternative positioning.
- Clarified workflow summary in the first lines and improved mention of supported formats and use cases.
- No changes to logic, API, or functionality; documentation improvements only.
- Version number remains at 1.0.0, indicating no breaking or behavioral changes.
v1.0.0
- Initial release of local audio transcription across a device fleet.
- Transcribes WAV, MP3, M4A, FLAC, and other formats locally without sending data to cloud APIs.
- Features automatic queue management, dashboard visibility, health monitoring, and node selection for transcription requests.
- Uses the Qwen3-ASR model, with native performance on Apple Silicon via MLX.
- Includes REST API endpoints for transcription and monitoring, with code examples for curl and Python.
- Integrates with fleet LLM, image generation, and embedding workloads through the same infrastructure.
Metadata
Slug: local-transcription
Version: 1.0.2
License: MIT-0
All-time Installs: 2
Active Installs: 2
Total Versions: 5
Frequently Asked Questions

What is Local Transcription?

Local speech-to-text transcription with Qwen ASR — transcription routed across your Apple Silicon fleet. Transcribe meetings, voice notes, podcasts with loca... It is an AI Agent Skill for Claude Code / OpenClaw, with 166 downloads so far.

How do I install Local Transcription?

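Run "/install local-transcription" in the OpenClaw or Claude Code chat to install the skill in one step. The local router (herd), a node agent (herd-node) on each device, and the Qwen3-ASR model still need to be set up as described in the README.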

Is Local Transcription free?

Yes, Local Transcription is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Local Transcription support?

The skill metadata lists darwin (macOS), and the Qwen3-ASR model runs via MLX on Apple Silicon; the v1.0.2 release notes additionally mention macOS, Linux, and Windows support.

Who created Local Transcription?

It is built and maintained by Twin Geeks (@twinsgeeks); the current version is v1.0.2.
