
Gipformer ASR

Name: Gipformer ASR
Author: ai-ggroup

by AI-GGroup · v1.0.0 · MIT-0

cross-platform ✓ Security Clean

Downloads: 190

Install in OpenClaw

/install gipformer

Description

Vietnamese speech-to-text using Gipformer ASR (65M params, Zipformer-RNNT). Accepts audio of any length — the server handles VAD chunking, batching, and retu...

README (SKILL.md)

Gipformer ASR

Vietnamese speech recognition — send audio of any length, get transcript.

Hugging Face model: g-group-ai-lab/gipformer-65M-rnnt (65M params, int8/fp32 ONNX)

Architecture

flowchart TD
    A[Audio file] -->|base64 encode| B[POST /transcribe]
    B --> C[Decode & resample to 16kHz]
    C --> D[VAD chunking ≤ 20s]
    D --> E[Batch inference — sherpa-onnx]
    E --> F[Merge chunk texts]
    F --> G["{ transcript, chunks }"]

The client sends base64-encoded audio of any length in one of the supported formats (see Audio Format). The server decodes it, splits it into chunks with VAD, runs inference in batches, and returns the full transcript.
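To make the chunking and merging steps concrete, here is a minimal sketch of how VAD speech segments could be packed into chunks of at most 20 s and how per-chunk texts could be joined. It is not the server's actual code: `Segment`, `pack_segments`, and `merge_texts` are hypothetical names, and the greedy packing strategy is an assumption. Only the 20 s limit comes from the flowchart.

```python
from dataclasses import dataclass

MAX_CHUNK_S = 20.0  # upper bound taken from the flowchart; everything else is assumed


@dataclass
class Segment:
    start_s: float
    end_s: float


def pack_segments(segments, max_chunk_s=MAX_CHUNK_S):
    """Greedily merge consecutive speech segments into chunks of at most max_chunk_s."""
    chunks = []
    for seg in segments:
        # Split a single over-long segment into fixed-size pieces first.
        start = seg.start_s
        while seg.end_s - start > max_chunk_s:
            chunks.append(Segment(start, start + max_chunk_s))
            start += max_chunk_s
        piece = Segment(start, seg.end_s)
        # Extend the previous chunk if the combined span still fits the limit.
        if chunks and piece.end_s - chunks[-1].start_s <= max_chunk_s:
            chunks[-1] = Segment(chunks[-1].start_s, piece.end_s)
        else:
            chunks.append(piece)
    return chunks


def merge_texts(texts):
    """Join per-chunk transcripts with single spaces, skipping empty ones."""
    return " ".join(t.strip() for t in texts if t.strip())
```

Packing short segments together keeps batches dense, while the hard cap bounds the memory and latency of any single inference call.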

Quick Start

1. Install dependencies

pip install -r {baseDir}/requirements.txt

System dependency: ffmpeg (required for M4A support).

2. Start the server

python {baseDir}/scripts/serve.py
# or with options:
python {baseDir}/scripts/serve.py --port 8910 --quantize int8 --max-batch-size 32

The server downloads the ASR model + VAD model on first run and listens on http://127.0.0.1:8910.
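Because the first run downloads models before the server starts listening, a client script may want to wait for readiness. The sketch below polls the `/health` endpoint listed under the curl examples. It assumes only that the endpoint answers with HTTP 200 once the server is up; the response body is not inspected, and `http_ok` and `wait_until_ready` are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout_s=2.0):
    """Return True if GET url answers with HTTP 200, False on connection errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(url, probe=http_ok, attempts=30, delay_s=1.0, sleep=time.sleep):
    """Poll the health endpoint until it responds or the attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay_s)  # no need to sleep after the final attempt
    return False
```

For example, `wait_until_ready("http://127.0.0.1:8910/health", attempts=120)` would allow roughly two minutes for the initial model download.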

3. Transcribe audio

# Single file (any format)
python {baseDir}/scripts/transcribe.py audio.wav
python {baseDir}/scripts/transcribe.py recording.mp3

# Multiple files
python {baseDir}/scripts/transcribe.py *.wav

# JSON output with chunk details
python {baseDir}/scripts/transcribe.py audio.wav --json

# Save results
python {baseDir}/scripts/transcribe.py audio.wav -o results.json

4. Direct API call (curl)

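# Transcribe (any length, any supported format)
# tr strips the line wrapping that GNU base64 adds, which would otherwise break the JSON string
curl -X POST http://127.0.0.1:8910/transcribe \
  -H "Content-Type: application/json" \
  -d "{\"audio_b64\": \"$(base64 < audio.wav | tr -d '\n')\"}"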

# Response:
# { "transcript": "full text...", "duration_s": 120.5, "process_time_s": 5.2,
#   "chunks": [{"text": "...", "start_s": 0.0, "end_s": 8.7}, ...] }

# Health check
curl http://127.0.0.1:8910/health
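The same call can be made from Python with only the standard library. This is a rough equivalent of the curl command above, not the bundled transcribe.py: the function names are hypothetical, and the only request and response fields used are the ones shown in the example (`audio_b64`, `transcript`, `chunks`, `text`, `start_s`, `end_s`).

```python
import base64
import json
import urllib.request

SERVER_URL = "http://127.0.0.1:8910"


def build_request(audio_bytes, server_url=SERVER_URL):
    """Build the POST /transcribe request with a base64-encoded audio payload."""
    payload = {"audio_b64": base64.b64encode(audio_bytes).decode("ascii")}
    return urllib.request.Request(
        f"{server_url}/transcribe",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def transcribe_file(path, server_url=SERVER_URL, timeout_s=600):
    """Send one audio file to the server and return the decoded JSON response."""
    with open(path, "rb") as f:
        request = build_request(f.read(), server_url)
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.load(resp)


def format_chunks(response):
    """Render each chunk as '[start - end] text' using the documented fields."""
    return [
        f"[{c['start_s']:.1f}s - {c['end_s']:.1f}s] {c['text']}"
        for c in response.get("chunks", [])
    ]
```

With the server running, `print(transcribe_file("audio.wav")["transcript"])` prints the full text. Building the JSON in Python also avoids the shell's argument-length limit, which a long recording can exceed when it is inlined into a curl command.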

Audio Format

Format	Extension	Support
WAV	.wav	Native (soundfile)
FLAC	.flac	Native (soundfile)
OGG	.ogg	Native (soundfile)
MP3	.mp3	Native (soundfile)
M4A/AAC	.m4a	Via ffmpeg

All formats are converted to WAV 16-bit PCM mono 16kHz internally.
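As an illustration of that target format, the following sketch builds an ffmpeg invocation that produces 16 kHz mono 16-bit PCM WAV. The flags are standard ffmpeg options, but whether the server uses this exact command is an assumption, and `ffmpeg_to_wav_cmd` and `convert_to_wav` are hypothetical names.

```python
import shutil
import subprocess


def ffmpeg_to_wav_cmd(src, dst):
    """Build an ffmpeg command that writes 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-nostdin",           # never wait for interactive input
        "-y",                 # overwrite dst if it already exists
        "-i", src,            # input file; the container is auto-detected
        "-ac", "1",           # downmix to mono
        "-ar", "16000",       # resample to 16 kHz
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM
        dst,
    ]


def convert_to_wav(src, dst):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(ffmpeg_to_wav_cmd(src, dst), check=True, capture_output=True)
```

Passing the command as a list rather than a shell string means file names with spaces or shell metacharacters are handed to ffmpeg verbatim.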

Server Tuning

Flag	Default	Effect
`--quantize`	int8	`fp32` for accuracy, `int8` for speed/size
`--max-batch-size`	16	Higher = more throughput, more latency
`--max-wait-ms`	100	How long to wait before flushing a partial batch
`--num-threads`	4	ONNX runtime threads
`--decoding-method`	modified_beam_search	`greedy_search` is faster
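The interaction between `--max-batch-size` and `--max-wait-ms` can be pictured with a small batch collector. This is a sketch of the general dynamic-batching pattern under the table's default values, not the code in serve.py; `collect_batch` and its `pending` queue are hypothetical.

```python
import queue
import time


def collect_batch(pending, max_batch_size=16, max_wait_ms=100):
    """Block for the first item, then gather more until the batch is full
    or max_wait_ms has elapsed since the first item arrived."""
    batch = [pending.get()]  # wait indefinitely for the first item
    deadline = time.monotonic() + max_wait_ms / 1000.0
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(pending.get(timeout=remaining))
        except queue.Empty:
            break  # waited long enough; flush the partial batch
    return batch
```

A larger batch size lets more chunks share one inference call, while the wait limit caps how long a lone request sits in the queue before a partial batch is flushed.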

API Reference

See references/api.md for full endpoint documentation.

Usage Guidance

This skill appears coherent for running a local Vietnamese ASR server, but review the following before installing:
1) It downloads model files from Hugging Face on first run; verify that the REPO_ID (g-group-ai-lab/gipformer-65M-rnnt) is trusted.
2) You must install Python packages (sherpa-onnx, onnxruntime, silero-vad, fastapi, etc.) and system dependencies such as ffmpeg and possibly libsndfile. These can be large and may require system package installs.
3) The server executes ffmpeg via subprocess and writes temporary files while decoding uploaded audio; run it in a sandbox, virtualenv, or container if you want isolation.
4) The skill requests no secrets, but huggingface_hub may automatically use a Hugging Face token from your environment (HF_TOKEN, or the legacy HUGGING_FACE_HUB_TOKEN) if one is present. A token is only needed for private models.
5) If you plan to expose the server beyond localhost, review network and security settings first; authentication is not implemented.
If uncertain, run the code in a controlled environment and inspect the repository on Hugging Face before use.

Capability Analysis

Type: OpenClaw Skill
Name: gipformer
Version: 1.0.0

The gipformer skill provides Vietnamese speech-to-text functionality using the Gipformer ASR model. The bundle includes a FastAPI server (serve.py) that handles model inference via sherpa-onnx, an audio chunking utility (chunk_audio.py) using Silero VAD, and a client script (transcribe.py) for interacting with the API. The code follows standard practices for machine learning services, such as downloading models from Hugging Face and using subprocess safely for audio conversion with ffmpeg. No indicators of malicious intent, data exfiltration, or harmful prompt injection were found.

Capability Assessment

✓ Purpose & Capability

Name/description (Vietnamese ASR) align with the included code and requirements: scripts implement VAD chunking, ONNX-based inference (sherpa-onnx), a FastAPI server, and a client. Required packages in requirements.txt are consistent with the functionality.

✓ Instruction Scope

SKILL.md instructs installing dependencies, running a local server, and sending base64 audio to /transcribe. The runtime instructions and code operate on provided audio files and do not read unrelated system files or env vars. The server decodes audio, chunks it, runs inference, and returns transcripts as described.

ℹ Install Mechanism

There is no automated install spec in the registry; SKILL.md expects the user to pip install -r requirements.txt. Model files are downloaded at first run from Hugging Face (hf_hub_download). Network downloads and heavy native/system deps (ffmpeg, libsndfile) are required — expected for this use-case but worth noting before install.

✓ Credentials

The skill does not request environment variables, credentials, or configuration paths. It uses huggingface_hub to download public model files; for a private repo the library would pick up a Hugging Face token from the environment (HF_TOKEN, or the legacy HUGGING_FACE_HUB_TOKEN), but this package does not require one.

✓ Persistence & Privilege

Skill is not always-enabled and does not modify other skills or system-wide agent settings. It runs a local server when started; no privileged or persistent platform-level presence is requested by the skill metadata.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install gipformer
After installation, invoke the skill by name or use /gipformer
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

Initial release of Vietnamese speech-to-text using Gipformer ASR.
- Supports speech recognition for Vietnamese audio using a 65M parameter Zipformer-RNNT model.
- Accepts audio in WAV, FLAC, OGG, MP3, and M4A formats; any duration.
- Handles VAD chunking, batching, and provides full transcript with chunk metadata.
- Server and CLI tools provided for both API and script-based transcription.
- Configurable for quantization, batch size, decoding method, and format support (ffmpeg required for M4A).
- Includes health check and comprehensive API documentation.

Metadata

Slug gipformer

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Gipformer ASR?

Vietnamese speech-to-text using Gipformer ASR (65M params, Zipformer-RNNT). Accepts audio of any length — the server handles VAD chunking, batching, and retu... It is an AI Agent Skill for Claude Code / OpenClaw, with 190 downloads so far.

How do I install Gipformer ASR?

Run "/install gipformer" in the OpenClaw or Claude Code chat to install the skill. Before first use, install the Python dependencies and ffmpeg as described in Quick Start.

Is Gipformer ASR free?

Yes, Gipformer ASR is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Gipformer ASR support?

Gipformer ASR is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Gipformer ASR?

It is built and maintained by AI-GGroup (@ai-ggroup); the current version is v1.0.0.
