/install gipformer
Gipformer ASR
Vietnamese speech recognition — send audio of any length, get transcript.
Huggingface Model: g-group-ai-lab/gipformer-65M-rnnt (65M params, int8/fp32 ONNX)
Architecture
flowchart TD
A[Audio file] -->|base64 encode| B[POST /transcribe]
B --> C[Decode & resample to 16kHz]
C --> D[VAD chunking ≤ 20s]
D --> E[Batch inference — sherpa-onnx]
E --> F[Merge chunk texts]
F --> G["{ transcript, chunks }"]
The client sends base64-encoded audio (any length, any format). The server decodes, chunks with VAD, infers in batches, and returns the full transcript.
Quick Start
1. Install dependencies
pip install -r {baseDir}/requirements.txt
System dependency: ffmpeg (required for M4A support).
2. Start the server
python {baseDir}/scripts/serve.py
# or with options:
python {baseDir}/scripts/serve.py --port 8910 --quantize int8 --max-batch-size 32
The server downloads the ASR model + VAD model on first run and listens on http://127.0.0.1:8910.
3. Transcribe audio
# Single file (any format)
python {baseDir}/scripts/transcribe.py audio.wav
python {baseDir}/scripts/transcribe.py recording.mp3
# Multiple files
python {baseDir}/scripts/transcribe.py *.wav
# JSON output with chunk details
python {baseDir}/scripts/transcribe.py audio.wav --json
# Save results
python {baseDir}/scripts/transcribe.py audio.wav -o results.json
4. Direct API call (curl)
# Transcribe (any length, any format)
curl -X POST http://127.0.0.1:8910/transcribe \
-H "Content-Type: application/json" \
-d "{\"audio_b64\": \"$(base64 -i audio.wav)\"}"
# Response:
# { "transcript": "full text...", "duration_s": 120.5, "process_time_s": 5.2,
# "chunks": [{"text": "...", "start_s": 0.0, "end_s": 8.7}, ...] }
# Health check
curl http://127.0.0.1:8910/health
Audio Format
| Format | Extension | Support |
|---|---|---|
| WAV | .wav | Native (soundfile) |
| FLAC | .flac | Native (soundfile) |
| OGG | .ogg | Native (soundfile) |
| MP3 | .mp3 | Native (soundfile) |
| M4A/AAC | .m4a | Via ffmpeg |
All formats are converted to WAV 16-bit PCM mono 16kHz internally.
Server Tuning
| Flag | Default | Effect |
|---|---|---|
--quantize |
int8 | fp32 for accuracy, int8 for speed/size |
--max-batch-size |
16 | Higher = more throughput, more latency |
--max-wait-ms |
100 | How long to wait before flushing a partial batch |
--num-threads |
4 | ONNX runtime threads |
--decoding-method |
modified_beam_search | greedy_search for faster speed |
API Reference
See references/api.md for full endpoint documentation.
- 确保已安装 OpenClaw(本地或 Docker 部署)
- 在对话框中输入安装命令:
/install gipformer - 安装完成后,直接呼叫该 Skill 的名称或使用
/gipformer触发 - 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
Gipformer ASR 是什么?
Vietnamese speech-to-text using Gipformer ASR (65M params, Zipformer-RNNT). Accepts audio of any length — the server handles VAD chunking, batching, and retu... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 190 次。
如何安装 Gipformer ASR?
在 OpenClaw 或 Claude Code 对话框中运行命令「/install gipformer」即可一键安装,无需额外配置。
Gipformer ASR 是免费的吗?
是的,Gipformer ASR 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。
Gipformer ASR 支持哪些平台?
Gipformer ASR 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。
谁开发了 Gipformer ASR?
由 AI-GGroup(@ai-ggroup)开发并维护,当前版本 v1.0.0。