Faster Whisper Local Service

Author: neldar · GitHub ↗ · v0.2.0
cross-platform · ✓ Security check passed
Total downloads: 1397 · Favorites: 0 · Current installs: 12 · Versions: 9
Install in OpenClaw
/install faster-whisper-local-service
Description
OpenClaw local speech-to-text backend using faster-whisper over HTTP on 127.0.0.1:18790. Use when you want voice transcription without external APIs, without...
Usage instructions (SKILL.md)

Faster Whisper Local Service

Provision a local STT backend used by voice skills.

What this sets up

  • Python venv for faster-whisper
  • transcribe-server.py HTTP endpoint at http://127.0.0.1:18790/transcribe
  • systemd user service: openclaw-transcribe.service

Important: Model download on first run

On first startup, faster-whisper downloads model weights from Hugging Face (~1.5 GB for medium). This requires internet access and disk space. After the initial download, models are cached locally and the service runs fully offline.

Model      Download size   RAM usage
tiny       ~75 MB          ~400 MB
base       ~150 MB         ~500 MB
small      ~500 MB         ~800 MB
medium     ~1.5 GB         ~1.4 GB
large-v3   ~3.0 GB         ~3.5 GB
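
As a rough aid for picking a model size, the figures in the table can be turned into a small lookup. This is only a sketch: the numbers are the approximate values listed above (converted to MB), and the helper name largest_model_that_fits is hypothetical, not part of the skill.

```python
from typing import Optional

# Approximate figures from the table above, in MB, ordered smallest to largest.
MODELS = [
    # (name, download_mb, ram_mb)
    ("tiny", 75, 400),
    ("base", 150, 500),
    ("small", 500, 800),
    ("medium", 1500, 1400),
    ("large-v3", 3000, 3500),
]


def largest_model_that_fits(ram_budget_mb: int, disk_budget_mb: int) -> Optional[str]:
    """Return the largest listed model within both budgets, or None if none fits."""
    fitting = [
        name
        for name, download_mb, ram_mb in MODELS
        if ram_mb <= ram_budget_mb and download_mb <= disk_budget_mb
    ]
    # MODELS is ordered by size, so the last match is the largest one.
    return fitting[-1] if fitting else None
```

The result could then be passed as WHISPER_MODEL_SIZE when deploying.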

To pre-download models in an air-gapped environment, see faster-whisper docs.

Security notes

Network isolation

  • Binds to 127.0.0.1 only — not reachable from the network.
  • CORS restricted to a single origin (https://127.0.0.1:8443 by default).
  • No credentials, API keys, or secrets are used or stored.
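
A single-origin CORS policy like the one described above can be sketched as an exact string comparison. The function below is an illustration only; cors_headers and ALLOWED_ORIGIN are hypothetical names, and the actual transcribe-server.py may implement this differently.

```python
from typing import Dict, Optional

# Default origin as stated in the security notes above.
ALLOWED_ORIGIN = "https://127.0.0.1:8443"


def cors_headers(request_origin: Optional[str], allowed: str = ALLOWED_ORIGIN) -> Dict[str, str]:
    """Return CORS response headers only when the Origin header matches exactly."""
    if request_origin is not None and request_origin == allowed:
        return {"Access-Control-Allow-Origin": allowed, "Vary": "Origin"}
    # No CORS headers: the browser will block cross-origin reads of the response.
    return {}
```

Binding to loopback is a separate measure: with the standard library, a server created as HTTPServer(("127.0.0.1", 18790), handler) only accepts connections from the local machine.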

Input validation

  • Upload size limit: Requests exceeding the configured limit are rejected before processing (HTTP 413). Default: 50 MB, configurable via MAX_UPLOAD_MB.
  • Magic-byte check: Only files with recognized audio signatures (WAV, OGG, FLAC, MP3, WebM, M4A) are accepted. Unrecognized formats are rejected (HTTP 415) before reaching GStreamer.
  • Subprocess safety: All arguments to gst-launch-1.0 are passed as a list — no shell expansion or injection is possible.
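
To make the first two checks concrete, here is a minimal sketch of a size limit plus magic-byte sniffing. It assumes the commonly documented file signatures for the listed formats; the function names are hypothetical and the real server's checks may be stricter or differ in detail.

```python
import os
from typing import Optional

# Default of 50 MB, overridable via MAX_UPLOAD_MB as described above.
MAX_UPLOAD_BYTES = int(os.environ.get("MAX_UPLOAD_MB", "50")) * 1024 * 1024


def sniff_audio_format(head: bytes) -> Optional[str]:
    """Guess the container from the first bytes of the upload (heuristic)."""
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "wav"
    if head[:4] == b"OggS":
        return "ogg"
    if head[:4] == b"fLaC":
        return "flac"
    # ID3 tag, or an MPEG audio frame sync (11 set bits). The sync test is loose.
    if head[:3] == b"ID3" or (len(head) >= 2 and head[0] == 0xFF and (head[1] & 0xE0) == 0xE0):
        return "mp3"
    if head[:4] == b"\x1a\x45\xdf\xa3":  # EBML header used by WebM/Matroska
        return "webm"
    if head[4:8] == b"ftyp":  # ISO base media file (M4A/MP4 family)
        return "m4a"
    return None


def precheck(content_length: int, head: bytes, limit: int = MAX_UPLOAD_BYTES) -> int:
    """Return the HTTP status a request would get before any decoding happens."""
    if content_length > limit:
        return 413  # payload too large
    if sniff_audio_format(head) is None:
        return 415  # unsupported media type
    return 200
```

Checking the declared size first means an oversized request can be rejected without reading its body.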

GStreamer dependency

The service uses GStreamer's decodebin for audio format conversion. Like any media library, GStreamer's parsers process binary data and should be kept up to date. Mitigation: install gst-launch-1.0 from your OS vendor's trusted packages and apply security updates regularly. The magic-byte pre-filter above reduces the attack surface by rejecting non-audio payloads before they reach GStreamer.
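
The list-based invocation mentioned under "Subprocess safety" might look like the sketch below. The pipeline itself (16 kHz mono WAV output) and the 60-second timeout are assumptions for illustration; only the use of gst-launch-1.0 with decodebin, an argument list, and a subprocess timeout comes from this document.

```python
import subprocess
from typing import List


def build_gst_args(src: str, dst: str) -> List[str]:
    """Build the argv list for gst-launch-1.0; no shell is involved."""
    return [
        "gst-launch-1.0", "-q",
        "filesrc", f"location={src}", "!",
        "decodebin", "!",
        "audioconvert", "!",
        "audioresample", "!",
        "audio/x-raw,rate=16000,channels=1", "!",  # assumed target format
        "wavenc", "!",
        "filesink", f"location={dst}",
    ]


def convert(src: str, dst: str, timeout_s: float = 60.0) -> None:
    """Run the conversion; raises on non-zero exit or when the timeout expires."""
    subprocess.run(
        build_gst_args(src, dst),
        check=True,
        timeout=timeout_s,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
```

Because each element is its own list item and shell=True is never used, a file name containing characters such as ; or $( ) is passed to GStreamer as literal text rather than being interpreted by a shell.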

No data exfiltration

  • No outbound network calls (after initial model download).
  • No telemetry, analytics, or phone-home behavior.
  • Temporary files are created in a per-request TemporaryDirectory and cleaned up immediately.
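
The per-request temporary directory pattern can be sketched with the standard library. The helper name with_request_tempdir and the file name upload.bin are hypothetical; the point is only that everything written during a request disappears when the with block exits, including on errors.

```python
import os
import tempfile
from typing import Callable


def with_request_tempdir(payload: bytes, handler: Callable[[str], str]) -> str:
    """Write the upload into a fresh directory, run handler on it, then clean up."""
    with tempfile.TemporaryDirectory(prefix="transcribe-") as workdir:
        src = os.path.join(workdir, "upload.bin")
        with open(src, "wb") as f:
            f.write(payload)
        # The directory and all files in it are removed when this block exits,
        # whether handler returns normally or raises.
        return handler(src)
```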

Reproducibility defaults

  • Pinned package: faster-whisper==1.1.1 (override via env)
  • Explicit dependency check for gst-launch-1.0
  • CORS restricted to one origin by default
  • Configurable workspace/service paths (no hardcoded user path)

Deploy

bash scripts/deploy.sh

With custom settings:

WORKSPACE=~/.openclaw/workspace \
TRANSCRIBE_PORT=18790 \
WHISPER_MODEL_SIZE=medium \
WHISPER_LANGUAGE=auto \
TRANSCRIBE_ALLOWED_ORIGIN=https://10.0.0.42:8443 \
bash scripts/deploy.sh

Language setting

Default: auto (the language is detected automatically). Set WHISPER_LANGUAGE=de for German only, en for English only, and so on. A fixed language is faster and more accurate when you only ever transcribe one language.
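
One way a server could read these settings is sketched below. The variable names and the port, language, and origin defaults come from this document; the medium default for the model size is an assumption based on the examples above, and load_config is a hypothetical helper. Mapping auto to None matches how faster-whisper's transcribe() treats language=None (it detects the language itself).

```python
import os
from typing import Any, Dict, Mapping


def load_config(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Read service settings from the environment, falling back to defaults."""
    language = env.get("WHISPER_LANGUAGE", "auto").strip().lower()
    return {
        "port": int(env.get("TRANSCRIBE_PORT", "18790")),
        # Assumed default; the document only shows medium in its examples.
        "model_size": env.get("WHISPER_MODEL_SIZE", "medium"),
        # None means "let the model detect the language".
        "language": None if language in ("", "auto") else language,
        "allowed_origin": env.get("TRANSCRIBE_ALLOWED_ORIGIN", "https://127.0.0.1:8443"),
    }
```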

The deploy script is idempotent: it is safe to run repeatedly.

What this skill modifies

What                Path                                                  Action
Python venv         $WORKSPACE/.venv-faster-whisper/                      Creates venv, installs faster-whisper via pip
Transcribe server   $WORKSPACE/voice-input/transcribe-server.py           Writes server script
Systemd service     ~/.config/systemd/user/openclaw-transcribe.service    Creates + enables persistent service
Model cache         ~/.cache/huggingface/                                 Downloads model weights on first run

Uninstall

systemctl --user stop openclaw-transcribe.service
systemctl --user disable openclaw-transcribe.service
rm -f ~/.config/systemd/user/openclaw-transcribe.service
systemctl --user daemon-reload

Optional full cleanup:

rm -rf ~/.openclaw/workspace/.venv-faster-whisper
rm -f ~/.openclaw/workspace/voice-input/transcribe-server.py

Verify

bash scripts/status.sh

Expected:

  • service active
  • endpoint responds (HTTP 200/500 acceptable for invalid sample payload)
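
The "endpoint responds" check can also be done by hand. The sketch below sends a deliberately invalid payload and treats any HTTP status as proof that the server is up; it is not the contents of scripts/status.sh, the function name probe is hypothetical, and the exact status returned for invalid input may vary by version.

```python
import urllib.error
import urllib.request
from typing import Optional


def probe(url: str = "http://127.0.0.1:18790/transcribe", timeout_s: float = 5.0) -> Optional[int]:
    """Return the HTTP status if the endpoint answers, or None if it is unreachable."""
    # Bypass any configured proxy: the target is always the local machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    req = urllib.request.Request(url, data=b"not-audio", method="POST")
    try:
        with opener.open(req, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # an error status still proves the server answered
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timeout, or similar
```

A return value of None suggests the service is not running; systemctl --user status openclaw-transcribe.service would be the next place to look.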

Notes

  • This skill provides backend transcription only.
  • Pair with webchat-voice-proxy for browser mic + HTTPS/WSS integration.
  • For one-step install, use webchat-voice-full-stack (deploys backend + proxy in order).
Security recommendations
This appears to be a legitimate local transcription installer. Before installing, consider: (1) the deploy script will pip-install faster-whisper into a user venv and create a systemd user service (it runs with your user privileges); (2) on first run faster-whisper will download large model weights from Hugging Face — ensure you want that network activity and disk use; (3) the service uses gst-launch-1.0 (OS package) for audio conversion — keep GStreamer updated; (4) review transcribe-server.py (included) if you want to audit behavior yourself; and (5) if you need stronger isolation, run the service in a dedicated user account, container, or VM. If you do not trust the faster-whisper package source, consider inspecting or pinning dependencies before installing.
Functional analysis
Type: OpenClaw Skill
Name: faster-whisper-local-service
Version: 0.2.0
The skill provisions a local speech-to-text service using the `faster-whisper` library. It includes a deployment script (`scripts/deploy.sh`) that sets up a Python virtual environment, a transcription server (`transcribe-server.py`), and a systemd user service. The implementation follows security best practices such as magic-byte verification for audio files, upload size limits, and safe subprocess execution for GStreamer. No evidence of data exfiltration, malicious persistence, or prompt injection was found.
Capability assessment
Purpose & Capability
Name/description match the behavior: the scripts create a venv, install faster-whisper, write a local HTTP server, and register a per-user systemd service. All created files and env vars are clearly for configuring a local STT backend.
Instruction Scope
SKILL.md and scripts only perform actions needed for a local transcription service: check for python and gst-launch, create venv, pip-install faster-whisper, write server script and systemd unit, and run the service. The server enforces upload limits, magic-byte checks, and binds to 127.0.0.1. It does spawn gst-launch for audio conversion (via subprocess with argument list), which is expected for media handling and is explicitly called out and constrained.
Install Mechanism
No registry install spec, but deploy.sh pip-installs faster-whisper from PyPI into a local venv (moderate risk but expected). Model weights are downloaded from Hugging Face on first run (large files, requires Internet). No downloads from untrusted custom URLs or URL shorteners.
Credentials
The skill does not require secrets or unrelated environment variables. Environment variables present are configuration knobs (port, model size, device, allowed origin, max upload) and are proportional to the stated function.
Persistence & Privilege
The skill creates a per-user systemd service (~/.config/systemd/user/...) and files under a configurable workspace in the user's home. It does not request always:true, system-wide privileges, or modify other skills' configurations.
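How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install faster-whisper-local-service
  3. Once installed, invoke the Skill by name or trigger it with /faster-whisper-local-service
  4. Provide the inputs described in the Skill's parameter documentation to get structured output.
Version history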
v0.2.0
Security hardening: input validation (magic-byte check, configurable upload size limit), GStreamer subprocess timeout, sanitized error responses, restructured security docs
v0.1.7
Refresh tags for local/offline STT discoverability.
v0.1.6
Fixed misleading 'fully offline' claim: now documents model download on first run with size table. Made language configurable (WHISPER_LANGUAGE, default: auto instead of hardcoded 'de'). Added security notes about gst-launch-1.0 and untrusted audio. Added system modifications table and uninstall instructions.
v0.1.5
Improved discoverability: expanded description with STT, speech-to-text, transcription, microphone, voice input keywords
v0.1.4
Ranking tune: stronger OpenClaw/local STT naming and discoverability keywords.
v0.1.3
Searchability update: explicit OpenClaw/local/offline/no extra API cost wording in description.
v0.1.2
Added reference to webchat-voice-full-stack for one-step backend+proxy deployment.
v0.1.1
Hardened deploy: configurable paths, dependency checks, pinned faster-whisper version, and restrictive configurable CORS origin.
v0.1.0
Initial release: installs local faster-whisper transcription service with systemd user unit and health checks.
Metadata
Slug faster-whisper-local-service
Version 0.2.0
License
Total installs 12
Current installs 12
Versions 9
FAQ

What is Faster Whisper Local Service?

OpenClaw local speech-to-text backend using faster-whisper over HTTP on 127.0.0.1:18790. Use when you want voice transcription without external APIs, without... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 1397 total downloads so far.

How do I install Faster Whisper Local Service?

Run the command "/install faster-whisper-local-service" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.

Is Faster Whisper Local Service free?

Yes. Faster Whisper Local Service is completely free (free and open source) and can be downloaded, installed, and used freely.

Which platforms does Faster Whisper Local Service support?

Faster Whisper Local Service is listed as cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Faster Whisper Local Service?

It is developed and maintained by neldar (@neldar). The current version is v0.2.0.
