
Gipformer ASR

Name: Gipformer ASR
Author: ai-ggroup

By AI-GGroup · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ✓ Security check passed

Total downloads: 190

Install in OpenClaw

/install gipformer

Description

Vietnamese speech-to-text using Gipformer ASR (65M params, Zipformer-RNNT). Accepts audio of any length — the server handles VAD chunking, batching, and retu...

Usage (SKILL.md)

Gipformer ASR

Vietnamese speech recognition — send audio of any length, get transcript.

Hugging Face model: g-group-ai-lab/gipformer-65M-rnnt (65M params, int8/fp32 ONNX)

Architecture

flowchart TD
    A[Audio file] -->|base64 encode| B[POST /transcribe]
    B --> C[Decode & resample to 16kHz]
    C --> D[VAD chunking ≤ 20s]
    D --> E[Batch inference — sherpa-onnx]
    E --> F[Merge chunk texts]
    F --> G["{ transcript, chunks }"]

The client sends base64-encoded audio of any length in one of the supported formats (see Audio Format). The server decodes it, splits it into chunks with VAD, runs inference in batches, and returns the full transcript.
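The chunking and merging stages of the flow above can be illustrated with a small sketch. It assumes the VAD yields (start_s, end_s) speech segments in order; the names `pack_segments` and `merge_texts` and the greedy packing strategy are hypothetical and are not taken from the skill's code.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_s, end_s) of one speech segment


def pack_segments(segments: List[Segment], max_chunk_s: float = 20.0) -> List[Segment]:
    """Greedily pack consecutive speech segments into chunks of at most max_chunk_s."""
    chunks: List[Segment] = []
    for start, end in segments:
        # A single segment longer than the limit is cut into fixed-size pieces.
        while end - start > max_chunk_s:
            chunks.append((start, start + max_chunk_s))
            start += max_chunk_s
        # Extend the previous chunk if the combined span still fits the limit.
        if chunks and end - chunks[-1][0] <= max_chunk_s:
            chunks[-1] = (chunks[-1][0], end)
        else:
            chunks.append((start, end))
    return chunks


def merge_texts(texts: List[str]) -> str:
    """Join per-chunk transcripts into one string, skipping empty chunks."""
    return " ".join(t.strip() for t in texts if t.strip())
```

Greedy packing keeps each chunk under the 20 s limit while avoiding many tiny inference calls; the real server may cut at different points.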

Quick Start

1. Install dependencies

pip install -r {baseDir}/requirements.txt

System dependency: ffmpeg (required for M4A support).

2. Start the server

python {baseDir}/scripts/serve.py
# or with options:
python {baseDir}/scripts/serve.py --port 8910 --quantize int8 --max-batch-size 32

The server downloads the ASR model + VAD model on first run and listens on http://127.0.0.1:8910.

3. Transcribe audio

# Single file (any format)
python {baseDir}/scripts/transcribe.py audio.wav
python {baseDir}/scripts/transcribe.py recording.mp3

# Multiple files
python {baseDir}/scripts/transcribe.py *.wav

# JSON output with chunk details
python {baseDir}/scripts/transcribe.py audio.wav --json

# Save results
python {baseDir}/scripts/transcribe.py audio.wav -o results.json

4. Direct API call (curl)

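# Transcribe (any length, any supported format)
# Newlines are stripped because GNU base64 wraps its output, which would break the JSON string.
curl -X POST http://127.0.0.1:8910/transcribe \
  -H "Content-Type: application/json" \
  -d "{\"audio_b64\": \"$(base64 < audio.wav | tr -d '\n')\"}"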

# Response:
# { "transcript": "full text...", "duration_s": 120.5, "process_time_s": 5.2,
#   "chunks": [{"text": "...", "start_s": 0.0, "end_s": 8.7}, ...] }

# Health check
curl http://127.0.0.1:8910/health
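
The same request can be made from Python using only the standard library. This is a minimal sketch, assuming the `/transcribe` endpoint, the `audio_b64` field, and the response keys shown above; the helper names `build_payload`, `transcribe`, and `format_chunks`, and the 600 s timeout, are illustrative choices and not part of the skill.

```python
import base64
import json
import urllib.request
from pathlib import Path


def build_payload(audio_path: str) -> bytes:
    """Read the audio file and wrap it as the JSON body expected by /transcribe."""
    audio_b64 = base64.b64encode(Path(audio_path).read_bytes()).decode("ascii")
    return json.dumps({"audio_b64": audio_b64}).encode("utf-8")


def transcribe(audio_path: str, base_url: str = "http://127.0.0.1:8910",
               timeout_s: float = 600.0) -> dict:
    """POST the audio to the local server and return the parsed JSON response."""
    request = urllib.request.Request(
        f"{base_url}/transcribe",
        data=build_payload(audio_path),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))


def format_chunks(result: dict) -> str:
    """Render the response's chunk list as one timestamped line per chunk."""
    lines = []
    for chunk in result.get("chunks", []):
        lines.append(f"[{chunk['start_s']:7.2f} - {chunk['end_s']:7.2f}] {chunk['text']}")
    return "\n".join(lines)
```

With the server running, `print(format_chunks(transcribe("audio.wav")))` would print each chunk with its start and end time.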

Audio Format

Format	Extension	Support
WAV	.wav	Native (soundfile)
FLAC	.flac	Native (soundfile)
OGG	.ogg	Native (soundfile)
MP3	.mp3	Native (soundfile)
M4A/AAC	.m4a	Via ffmpeg

All formats are converted internally to 16 kHz mono 16-bit PCM WAV.
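
For reference, a conversion to that target format can be expressed with ffmpeg as sketched below. The server's actual command is not shown in this document, so the flag selection and the helper names `build_ffmpeg_command` and `convert_to_wav` are assumptions; the flags themselves are standard ffmpeg options.

```python
import shutil
import subprocess
from typing import List


def build_ffmpeg_command(src: str, dst: str, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg argument list that writes mono 16-bit PCM WAV at sample_rate."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-loglevel", "error",    # print errors only
        "-i", src,               # input file (for example an .m4a recording)
        "-vn",                   # ignore any video stream
        "-ac", "1",              # downmix to mono
        "-ar", str(sample_rate), # resample to the target rate
        "-c:a", "pcm_s16le",     # 16-bit little-endian PCM
        dst,
    ]


def convert_to_wav(src: str, dst: str) -> None:
    """Run the conversion, failing early if ffmpeg is not installed."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
```

Passing the arguments as a list (rather than a shell string) avoids shell quoting problems with unusual file names.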

Server Tuning

Flag	Default	Effect
`--quantize`	int8	`fp32` for accuracy, `int8` for speed/size
`--max-batch-size`	16	Higher = more throughput, more latency
`--max-wait-ms`	100	How long to wait before flushing a partial batch
`--num-threads`	4	ONNX runtime threads
`--decoding-method`	modified_beam_search	`greedy_search` is faster
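
To show how `--max-batch-size` and `--max-wait-ms` typically interact, here is a sketch of a batch collector. It assumes the wait timer starts when the first request of a batch arrives; the function `collect_batch` is hypothetical and the server's own batching loop may differ.

```python
import queue
import time
from typing import Any, List


def collect_batch(requests: "queue.Queue[Any]", max_batch_size: int = 16,
                  max_wait_ms: float = 100.0) -> List[Any]:
    """Block for the first item, then gather more until the batch is full or the wait expires."""
    batch = [requests.get()]  # wait indefinitely for the first request
    deadline = time.monotonic() + max_wait_ms / 1000.0
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # wait budget used up: flush a partial batch
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break  # no more requests arrived before the deadline
    return batch
```

Under this model, a larger batch size raises throughput when traffic is heavy, while the wait limit bounds the extra latency a lone request can incur.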

API Reference

See references/api.md for full endpoint documentation.

Security recommendations

This skill appears coherent for running a local Vietnamese ASR server, but review and be prepared for the following before installing:
1) It will download model files from Hugging Face at first run. Verify that the REPO_ID (g-group-ai-lab/gipformer-65M-rnnt) is trusted.
2) You must install Python packages (sherpa-onnx, onnxruntime, silero-vad, fastapi, etc.) and system dependencies such as ffmpeg and possibly libsndfile. These can be large and may require system package installs.
3) The server executes ffmpeg via subprocess and writes temporary files while decoding uploaded audio. Run it in a sandbox, virtualenv, or container if you want isolation.
4) No secrets are requested by the skill, but huggingface_hub may use your HUGGINGFACE_HUB_TOKEN automatically if present (only needed for private models).
5) If you plan to expose the server beyond localhost, review network and security settings (authentication is not implemented).
If uncertain, run the code in a controlled environment and inspect the repository on Hugging Face before use.

Functional analysis

Type: OpenClaw Skill
Name: gipformer
Version: 1.0.0

The gipformer skill provides Vietnamese speech-to-text functionality using the Gipformer ASR model. The bundle includes a FastAPI server (serve.py) that handles model inference via sherpa-onnx, an audio chunking utility (chunk_audio.py) using Silero VAD, and a client script (transcribe.py) for interacting with the API. The code follows standard practices for machine learning services, such as downloading models from Hugging Face and using subprocess safely for audio conversion with ffmpeg. No indicators of malicious intent, data exfiltration, or harmful prompt injection were found.

Capability assessment

✓ Purpose & Capability

Name/description (Vietnamese ASR) align with the included code and requirements: scripts implement VAD chunking, ONNX-based inference (sherpa-onnx), a FastAPI server, and a client. Required packages in requirements.txt are consistent with the functionality.

✓ Instruction Scope

SKILL.md instructs installing dependencies, running a local server, and sending base64 audio to /transcribe. The runtime instructions and code operate on provided audio files and do not read unrelated system files or env vars. The server decodes audio, chunks it, runs inference, and returns transcripts as described.

ℹ Install Mechanism

There is no automated install spec in the registry; SKILL.md expects the user to pip install -r requirements.txt. Model files are downloaded at first run from Hugging Face (hf_hub_download). Network downloads and heavy native/system deps (ffmpeg, libsndfile) are required — expected for this use-case but worth noting before install.

✓ Credentials

The skill does not request environment variables, credentials, or configuration paths. It uses huggingface_hub to download public model files; if a private repo were used the huggingface token (HUGGINGFACE_HUB_TOKEN) would be used by the library but is not required by this package.

✓ Persistence & Privilege

Skill is not always-enabled and does not modify other skills or system-wide agent settings. It runs a local server when started; no privileged or persistent platform-level presence is requested by the skill metadata.

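How to use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install gipformer
After installation, call the Skill by name or trigger it with /gipformer
Provide the required inputs according to the Skill's parameter description to get structured output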

Version history

v1.0.0

Initial release of Vietnamese speech-to-text using Gipformer ASR.
- Supports speech recognition for Vietnamese audio using a 65M parameter Zipformer-RNNT model.
- Accepts audio in WAV, FLAC, OGG, MP3, and M4A formats; any duration.
- Handles VAD chunking, batching, and provides full transcript with chunk metadata.
- Server and CLI tools provided for both API and script-based transcription.
- Configurable for quantization, batch size, decoding method, and format support (ffmpeg required for M4A).
- Includes health check and comprehensive API documentation.

Metadata

Slug gipformer

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Versions 1

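FAQ

What is Gipformer ASR?

Vietnamese speech-to-text using Gipformer ASR (65M params, Zipformer-RNNT). Accepts audio of any length; the server handles VAD chunking, batching, and retu... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 190 total downloads so far.

How do I install Gipformer ASR?

Run the command "/install gipformer" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.

Is Gipformer ASR free?

Yes. Gipformer ASR is completely free and released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does Gipformer ASR support?

Gipformer ASR is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Gipformer ASR?

It is developed and maintained by AI-GGroup (@ai-ggroup); the current version is v1.0.0.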
