mozi1924

Local Stt Workflow

Author: Mozi Arasaka · GitHub · v1.0.2 · MIT-0
cross-platform ✓ Security check passed
138 total downloads · 0 favorites · 1 current install · 3 versions
Install in OpenClaw
/install local-stt-workflow
Description
Local speech-to-text workflow for an OpenAI-compatible STT server, typically on http://127.0.0.1:8000/v1. Use when configuring, testing, debugging, or valida...
Usage (SKILL.md)

Local STT Workflow

Use this skill to debug the full transcription path, not just the model.

Default assumption: the local STT server lives at http://127.0.0.1:8000/v1.

Model-path fallback worth remembering: if the server did not pull a model by name, it may be loading one directly from a local path such as ./models/Qwen3-ASR-0.6B-bf16.

When exact route shape matters, the local OpenAPI document is available at:

  • http://localhost:8000/openapi.json

Use this OpenAPI doc as a schema/reference source to compare this local mlx-audio server against OpenAI’s API. Do not treat it as a health check.

Workflow

1. Verify the server before blaming OpenClaw

Check the basics first:

curl http://127.0.0.1:8000/health
curl http://127.0.0.1:8000/v1/models

Confirm that the intended STT model exists, usually qwen3-asr.

If the model does not appear under its pulled registry name, do not assume STT is broken: this server may be running a local-path model such as ./models/Qwen3-ASR-0.6B-bf16.
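The two basic checks can be scripted with the Python standard library. This is a minimal sketch, not part of the skill: it assumes /v1/models returns the OpenAI-style list shape {"data": [{"id": ...}]}, and the helper names (get_json, model_ids, has_model, check) are hypothetical. has_model accepts either the registry name or a local-path model whose path or basename matches.

```python
import json
import os
import urllib.request

BASE = "http://127.0.0.1:8000"  # default local STT server assumed by this guide


def get_json(url: str, timeout: float = 5.0):
    """GET a URL and decode the JSON body; raises URLError/HTTPError on failure."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def model_ids(models_payload: dict) -> list:
    """Assumption: OpenAI-style list shape {"data": [{"id": ...}, ...]}."""
    return [m.get("id", "") for m in models_payload.get("data", [])]


def has_model(ids: list, wanted: str = "qwen3-asr",
              local_path: str = "./models/Qwen3-ASR-0.6B-bf16") -> bool:
    """True if the registry name, the local path, or a path with the same basename is listed."""
    local_name = os.path.basename(local_path.rstrip("/"))
    for model_id in ids:
        if model_id in (wanted, local_path):
            return True
        if os.path.basename(model_id.rstrip("/")) == local_name:
            return True
    return False


def check(base: str = BASE) -> bool:
    """GET /health, then GET /v1/models and look for the STT model."""
    with urllib.request.urlopen(f"{base}/health", timeout=5) as resp:
        if resp.status != 200:
            return False
    return has_model(model_ids(get_json(f"{base}/v1/models")))
```

Call check() with the server running. If it returns False and the model list shows neither the registry name nor the local path, move on to the task-gating and registration checks that follow.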

If the server is task-gated, ensure STT is enabled:

MLX_AUDIO_SERVER_TASKS=stt uv run python server.py

If the model is missing, first check whether the server is intentionally loading from a local path, and verify the exact accepted model IDs through /v1/models or http://localhost:8000/openapi.json. Only then register the model, and do it before testing any clients.

2. Prove the raw STT endpoint works

Always isolate the server from the client stack.

Minimal direct transcription test (only file and model are required; response_format is an OpenAI-style field, see step 4):

curl -X POST http://127.0.0.1:8000/v1/audio/transcriptions \
  -F file=@/path/to/audio.wav \
  -F model=qwen3-asr \
  -F response_format=json

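Useful richer test. response_format and timestamp_granularities[] are OpenAI-style fields that do not appear in the current local schema's field list (see step 4), so check references/stt-api.md or http://localhost:8000/openapi.json before relying on them: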

curl -X POST http://127.0.0.1:8000/v1/audio/transcriptions \
  -F file=@/path/to/audio.wav \
  -F model=qwen3-asr \
  -F response_format=verbose_json \
  -F 'timestamp_granularities[]=segment' \
  -F 'timestamp_granularities[]=word'

If direct curl works but OpenClaw does not, the bug is probably in the message ingestion or routing layer, not the STT backend.
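To reproduce the minimal curl call from Python (for example, before testing a Python client), a stdlib-only sketch follows. It sends only the two required fields, file and model, and assumes the server answers with a JSON object such as {"text": ...}. encode_multipart and transcribe are hypothetical helper names.

```python
import json
import mimetypes
import os
import urllib.request
import uuid


def encode_multipart(fields: dict, file_field: str, filename: str, data: bytes):
    """Build a multipart/form-data body the way curl -F does; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():  # plain text fields such as model=qwen3-asr
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    parts.append(  # the audio file part
        (f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
         f'filename="{filename}"\r\nContent-Type: {ctype}\r\n\r\n').encode()
        + data + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())  # closing delimiter
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, model: str = "qwen3-asr",
               url: str = "http://127.0.0.1:8000/v1/audio/transcriptions") -> dict:
    """POST one audio file with only the required fields and decode the JSON reply."""
    with open(path, "rb") as f:
        audio = f.read()
    body, ctype = encode_multipart({"model": model}, "file", os.path.basename(path), audio)
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers={"Content-Type": ctype})
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)
```

If transcribe() succeeds on the exact file the app client fails on, the backend is not the problem.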

3. Distinguish server failure from routing failure

Apply this rule strictly:

  • Direct curl fails → fix the local STT server first
  • Direct curl works, but OpenClaw shows no transcript → inspect OpenClaw audio pipeline / attachment routing
  • OpenClaw sends requests, but fields are wrong → inspect request shape compatibility

This distinction saves a lot of time.
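The rule above is small enough to encode directly. The sketch below is one reading of it, combined with step 6's point that a missing /audio/transcriptions request means the problem is upstream of STT. The function name and its three boolean observations are illustrative, not part of the skill.

```python
def triage(direct_curl_ok: bool, openclaw_sent_request: bool,
           openclaw_got_transcript: bool) -> str:
    """Map three observations to the layer worth inspecting next."""
    if not direct_curl_ok:
        # The server fails even without OpenClaw in the loop.
        return "fix the local STT server first"
    if not openclaw_sent_request:
        # Server works, but OpenClaw never dispatched a transcription request.
        return "inspect OpenClaw audio pipeline / attachment routing"
    if not openclaw_got_transcript:
        # Requests arrive but do not succeed: suspect the field shape.
        return "inspect request shape compatibility"
    return "no fault found on the STT path"
```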

4. Check the request shape

This server is designed around OpenAI-style multipart form uploads (multipart/form-data).

Expected core fields for /v1/audio/transcriptions from the current local OpenAPI schema:

  • required: file, model
  • optional: language, verbose, max_tokens, chunk_duration, frame_threshold, stream, context, prefill_step_size, text

This means the local server does not expose the same form shape that OpenAI's Whisper-style docs describe. Do not blindly assume response_format, prompt, or timestamp_granularities[] exist just because OpenAI supports them.

If you suspect a client is sending the wrong shape, inspect its traffic with a temporary dump proxy or check the server logs.
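Comparing a client's fields against the local schema can be automated. This sketch assumes openapi.json describes the endpoint as a POST whose requestBody has a multipart/form-data schema, either inline or behind a local $ref (a common layout for generated specs; verify against your server). Array-style names such as timestamp_granularities[] may appear in a schema without the brackets, so the sketch strips a trailing [] before comparing. All function names are hypothetical.

```python
import json
import urllib.request


def load_spec(url: str = "http://localhost:8000/openapi.json") -> dict:
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def resolve(spec: dict, node: dict) -> dict:
    """Follow local JSON references such as '#/components/schemas/Body'."""
    while "$ref" in node:
        ref = node["$ref"]
        if not ref.startswith("#/"):
            raise ValueError(f"unsupported $ref: {ref}")
        node = spec
        for key in ref[2:].split("/"):
            node = node[key]
    return node


def form_fields(spec: dict, path: str = "/v1/audio/transcriptions"):
    """Return (required, optional) multipart field names declared for POST path."""
    body = resolve(spec, spec["paths"][path]["post"]["requestBody"])
    schema = resolve(spec, body["content"]["multipart/form-data"]["schema"])
    declared = set(schema.get("properties", {}))
    required = set(schema.get("required", []))
    return required, declared - required


def check_client_fields(spec: dict, sent: set) -> dict:
    """Report required fields the client omitted and fields the schema does not declare."""
    required, optional = form_fields(spec)
    sent = {name.removesuffix("[]") for name in sent}  # treat 'x[]' as 'x'
    return {
        "missing_required": required - sent,
        "undeclared": sent - required - optional,
    }
```

Against the schema described above, a client sending file, model, response_format and prompt would have response_format and prompt reported as undeclared.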

5. Use the reference doc when exact fields matter

Read references/stt-api.md when you need exact behavior for:

  • response_format=json|text|verbose_json|srt|vtt
  • stream=true SSE events
  • timestamp_granularities[]
  • include[]
  • translation endpoint semantics
  • error envelope shape
  • current compatibility limits

Do not guess field support from generic OpenAI docs when this local server may intentionally differ.

Current notable mismatch: the local schema exposes context and text, plus chunking/prefill controls like chunk_duration, frame_threshold, and prefill_step_size, which are not part of the usual OpenAI STT field set.
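When testing stream=true (listed as an optional field in step 4), the response is expected to be a Server-Sent Events stream. The parser below follows the generic SSE line rules (event: and data: fields, comment lines, blank-line dispatch) and deliberately assumes nothing about event names or payloads; take those from references/stt-api.md. It is a simplified sketch that ignores the id: and retry: fields.

```python
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_type, data) pairs from text/event-stream lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it has data.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # SSE strips one leading space
        if field == "data":
            data.append(value)
        elif field == "event":
            event = value
    # An event not terminated by a blank line is discarded, as SSE specifies.


def stream_events(resp) -> Iterator[Tuple[str, str]]:
    """Adapt a urllib response (an iterator of byte lines) to parse_sse."""
    return parse_sse(line.decode("utf-8") for line in resp)
```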

6. OpenClaw-specific debugging pattern

When OpenClaw STT appears broken:

  1. Confirm tools.media.audio is configured, not messages.stt
  2. Confirm base URL points at http://127.0.0.1:8000/v1
  3. Confirm the chosen model exists in /v1/models
  4. Send the exact inbound audio file directly to /v1/audio/transcriptions
  5. Inspect gateway logs for any sign of transcription dispatch
  6. If there is no /audio/transcriptions request at all, the problem is upstream of STT

If OpenClaw never hits the server, stop tweaking model params. That would be cargo-cult debugging.
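Steps 1, 2, 5 and 6 of this pattern can be partly scripted. This sketch assumes the OpenClaw config has already been loaded into a nested dict in which the dotted names tools.media.audio and messages.stt map to nested keys. The exact key that holds the base URL is not specified here, so the sketch only checks that some string under tools.media.audio equals the expected URL. The log scan simply looks for the /audio/transcriptions path in whatever gateway log lines you pass in. All helper names are hypothetical.

```python
def get_path(cfg: dict, dotted: str):
    """Return the value at a dotted key path, or None if any segment is missing."""
    node = cfg
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def strings_in(node) -> list:
    """Collect every string value nested anywhere inside dicts and lists."""
    if isinstance(node, str):
        return [node]
    if isinstance(node, dict):
        return [s for v in node.values() for s in strings_in(v)]
    if isinstance(node, list):
        return [s for v in node for s in strings_in(v)]
    return []


def config_problems(cfg: dict, base_url: str = "http://127.0.0.1:8000/v1") -> list:
    problems = []
    audio = get_path(cfg, "tools.media.audio")
    if audio is None:
        problems.append("tools.media.audio is not configured")
    elif not any(s.rstrip("/") == base_url for s in strings_in(audio)):
        problems.append(f"no value under tools.media.audio points at {base_url}")
    if get_path(cfg, "messages.stt") is not None:
        problems.append("messages.stt is set; this guide expects STT under tools.media.audio")
    return problems


def saw_transcription_dispatch(log_lines) -> bool:
    """False means the problem is upstream of STT (step 6)."""
    return any("/audio/transcriptions" in line for line in log_lines)
```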

7. Preferred test ladder

Use this order:

  1. GET /health
  2. GET /v1/models
  3. direct curl transcription with the same audio file
  4. compare request fields against http://localhost:8000/openapi.json
  5. OpenAI client compatibility test
  6. OpenClaw integration test
  7. dump-proxy / log inspection only if still ambiguous
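The ladder exists to stop you at the first failing rung instead of letting you jump ahead. Below is a minimal runner in which each rung is a (name, callable) pair returning True or False. The names are illustrative, and the checks themselves (for example, a callable that GETs /health and returns True on HTTP 200) are left to you.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_ladder(steps: List[Step]) -> Optional[str]:
    """Run rungs in order; return the first failing rung's name, or None if all pass."""
    for name, check in steps:
        try:
            ok = bool(check())
        except Exception as exc:  # e.g. connection refused counts as a failed rung
            print(f"{name}: error: {exc}")
            return name
        print(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            return name  # later rungs are not run
    return None
```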

8. Common conclusions

Niche input container bug

Typical signs:

  • direct upload of a less-common container like .m4a returns 500
  • server logs mention unsupported format handling during temp write or normalization
  • converting the same source audio to mp3 or wav makes transcription succeed immediately

Conclusion: treat this as an input-container compatibility bug, not an ASR-quality failure. For now, transcode niche formats to mp3 or wav before testing recognition quality.
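Here is a sketch of the transcode-first workaround, assuming ffmpeg is installed and on PATH. It only changes the container (ffmpeg picks its default codec for .wav) and leaves .wav and .mp3 inputs untouched. The helper names are hypothetical.

```python
import os
import shutil
import subprocess
from typing import Optional

KNOWN_GOOD = {".wav", ".mp3"}  # containers this guide reports as transcribing fine


def needs_transcode(path: str) -> bool:
    return os.path.splitext(path)[1].lower() not in KNOWN_GOOD


def ffmpeg_to_wav_cmd(src: str, dst: Optional[str] = None) -> list:
    """Build an ffmpeg argv that rewrites src into a .wav container (-y overwrites dst)."""
    dst = dst or os.path.splitext(src)[0] + ".wav"
    return ["ffmpeg", "-y", "-i", src, dst]


def transcode_if_needed(src: str) -> str:
    """Return a path that is safe to upload: src itself, or a freshly written .wav copy."""
    if not needs_transcode(src):
        return src
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    cmd = ffmpeg_to_wav_cmd(src)
    subprocess.run(cmd, check=True, capture_output=True)
    return cmd[-1]
```

The shell equivalent for a single file is ffmpeg -y -i clip.m4a clip.wav.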

Server good, client bad

Typical signs:

  • manual curl returns { "text": ... }
  • OpenClaw logs show no transcription request
  • changing model/language does nothing

Conclusion: fix routing, not inference.

Multipart mismatch

Typical signs:

  • server is up
  • model exists
  • client gets 400 errors
  • direct curl works but app client does not

Conclusion: compare multipart field names and values.
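Suppose you have two captured request bodies together with their Content-Type headers, for example from the dump proxy: one from the working curl call and one from the failing app client. The standard-library email parser can list their multipart fields so they can be diffed. This sketch works under that assumption and summarizes file parts as file:<filename> rather than comparing them byte for byte. The helper names are hypothetical.

```python
from email.parser import BytesParser


def multipart_fields(body: bytes, content_type: str) -> dict:
    """Map each form field name to its value, or to 'file:<filename>' for file parts."""
    # Prepend the Content-Type header so the MIME parser can find the boundary.
    msg = BytesParser().parsebytes(
        b"Content-Type: " + content_type.encode() + b"\r\n\r\n" + body
    )
    if not msg.is_multipart():
        raise ValueError("body is not multipart")
    fields = {}
    for part in msg.get_payload():
        name = part.get_param("name", header="content-disposition")
        filename = part.get_filename()
        if filename:
            fields[name] = f"file:{filename}"
        else:
            fields[name] = part.get_payload(decode=True).decode("utf-8", "replace")
    return fields


def diff_requests(good: dict, bad: dict) -> dict:
    """Fields whose presence or value differs, as {name: (good_value, bad_value)}."""
    keys = good.keys() | bad.keys()
    return {k: (good.get(k), bad.get(k)) for k in sorted(keys) if good.get(k) != bad.get(k)}
```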

Feature mismatch

Typical signs:

  • client expects diarization, logprobs, or richer streaming fields
  • local server only implements a smaller compatible subset

Conclusion: align expectations with references/stt-api.md.

Resources

references/

  • references/stt-api.md — exact local API behavior, schema, response formats, SSE events, limits, and compatibility notes
Safe-use recommendations
This skill is a local troubleshooting guide — it tells you to curl localhost, read the server's openapi.json, and inspect logs. Those are reasonable when debugging a local STT server. Before running commands, ensure you understand any curl/file commands you paste into a shell and avoid sending private audio to unknown remote endpoints. If your environment restricts access to system logs or localhost ports, run these steps on a trusted machine where the STT server is intentionally hosted.
Feature analysis
Type: OpenClaw Skill
Name: local-stt-workflow
Version: 1.0.2
The skill bundle provides a structured workflow and documentation for debugging a local Speech-to-Text (STT) server (typically at 127.0.0.1:8000). It uses standard diagnostic tools like curl to verify server health and API compatibility (SKILL.md, references/stt-api.md). No indicators of data exfiltration, malicious execution, or harmful prompt injection were found; the instructions are focused entirely on troubleshooting local audio processing pipelines.
Capability assessment
Purpose & Capability
The name/description (local STT debug workflow) matches the content: step-by-step curl tests, OpenAPI checks, and OpenClaw integration guidance. There are no unrelated requirements (no cloud creds, no extraneous binaries).
Instruction Scope
SKILL.md contains only diagnostic steps: curl against localhost endpoints, read local openapi.json, check server logs, and use a dump proxy for traffic inspection. These actions are appropriate for local STT debugging; nothing instructs reading unrelated system secrets or exfiltrating data to remote endpoints.
Install Mechanism
No install spec or code is included. This is instruction-only, so nothing is written to disk or pulled from external URLs.
Credentials
No required environment variables, credentials, or config paths are declared. The document mentions MLX_AUDIO_SERVER_TASKS as an example runtime flag (contextual, not required). No disproportionate secret access is requested.
Persistence & Privilege
The skill is not always-enabled and is user-invocable. It does not request permanent presence or modification of other skills or system-wide settings.
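How to use
  1. Make sure OpenClaw is installed (local or Docker deployment).
  2. In the chat box, enter the install command: /install local-stt-workflow
  3. After installation, invoke the skill by name or trigger it with /local-stt-workflow.
  4. Provide the inputs described in the skill's parameter notes to get structured output.
Version history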
v1.0.2
local-stt-workflow 1.0.2
- Added guidance for handling transcription failures when using less-common audio containers like `.m4a`
- Clarified that container incompatibility should be treated as an input-compatibility issue, not an ASR-quality issue
- Updated the "Common conclusions" section with troubleshooting advice for input-container bugs
v1.0.1
- Clarified that the local STT server may load models from a local path if registry-pulled models are missing.
- Added information about referencing the local OpenAPI schema at `http://localhost:8000/openapi.json` to verify exact request/response shape.
- Updated documentation to highlight differences between this server's API and OpenAI Whisper's API, especially regarding supported request fields.
- Emphasized not to assume OpenAI field compatibility; notable mention of unique local fields (`context`, `text`, chunking/prefill controls).
- Adjusted debugging/test workflow steps to prioritize verifying the actual supported schema and added OpenAPI comparison as a diagnostic step.
v1.0.0
Initial release of local-stt-workflow.
- Introduces a detailed workflow for debugging and validating speech-to-text servers compatible with OpenAI endpoints.
- Guides users through verifying server setup, isolating server vs. client routing failures, and checking request payload compatibility.
- Provides troubleshooting advice specific to OpenClaw audio pipelines and multipart/form-data issues.
- Emphasizes use of direct `curl` commands for diagnosis and references documentation for exact API behaviors.
- Outlines common error patterns and troubleshooting steps to streamline local audio transcription debugging.
Metadata
Slug local-stt-workflow
Version 1.0.2
License MIT-0
Total installs 1
Current installs 1
Version count 3
FAQ

What is Local Stt Workflow?

Local speech-to-text workflow for an OpenAI-compatible STT server, typically on http://127.0.0.1:8000/v1. Use when configuring, testing, debugging, or valida... It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 138 times so far.

How do I install Local Stt Workflow?

Run the command "/install local-stt-workflow" in the OpenClaw or Claude Code chat box to install it in one step; no extra configuration is needed.

Is Local Stt Workflow free?

Yes. Local Stt Workflow is completely free under the MIT-0 license and may be freely downloaded, installed, and used.

Which platforms does Local Stt Workflow support?

Local Stt Workflow is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Local Stt Workflow?

Developed and maintained by Mozi Arasaka (@mozi1924); the current version is v1.0.2.
