
Gladia Live Transcription

Name: Gladia Live Transcription
Author: gladiaio

by Gladia · GitHub ↗ · v1.0.1 · MIT-0

cross-platform ✓ Security Clean


Install in OpenClaw

/install gladia-live-transcription

Description

Real-time speech-to-text streaming with Gladia via WebSocket. Use when the user needs live transcription, builds a voice agent, meeting recorder, call center...

README (SKILL.md)

Live Transcription

Gladia's live API transcribes audio in real-time over WebSocket.

SDK-first: always use the official SDK — see gladia-sdk-integration for policy, setup, and fallback criteria.

When to Use

Real-time transcription for microphone, telephony, or broadcast streams
Voice agents, meeting assistants, call center tools, or live subtitles
Live audio intelligence (translation, sentiment, NER)

When NOT to use: If the user has a pre-existing audio/video file or URL to transcribe after the fact, use the gladia-pre-recorded-transcription skill instead. Pre-recorded supports additional features like speaker diarization and PII redaction that are unavailable in live mode.

References

Consult these resources as needed:

./references/recommended-params.md -- Use-case presets and tuning
./references/session-config.md -- Full startSession() config (JS + Python)
./references/managing-sessions.md -- get, list, getFile, delete
./references/websocket-events.md -- WebSocket event reference
../gladia-audio-intelligence/SKILL.md -- Feature availability
../gladia-audio-intelligence/references/live-audio-intelligence.md -- Live feature details
../gladia-sdk-integration/SKILL.md -- Setup, config, SDK vs raw API
../gladia-sdk-integration/references/sdk-versions.md -- Current SDK versions
../gladia-troubleshooting/SKILL.md -- Errors and diagnostics

API Endpoints (reference — prefer SDK methods)

Endpoint	Method	SDK equivalent
`/v2/live`	POST	`startSession()`
`/v2/live`	GET	`list()`
`/v2/live/:id`	GET	`get(id)`
`/v2/live/:id`	DELETE	`delete(id)`
`/v2/live/:id/file`	GET	`getFile(id)`
WebSocket URL from init	—	`sendAudio()` / `session.on()`

Session Lifecycle

SDK flow: startSession() -> sendAudio() -> receive transcript events -> stopRecording() -> get(id) for final result.

Quick Start

For SDK installation and client initialization, see the gladia-sdk-integration skill.

JavaScript/TypeScript

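// `client` is the initialized SDK client (see gladia-sdk-integration).
const session = client.liveV2().startSession({
  model: "solaria-1",
  // Audio format: must exactly match the stream being sent.
  encoding: "wav/pcm",
  sample_rate: 16000,
  bit_depth: 16,
  channels: 1,
  language_config: { languages: ["en"] },
  messages_config: { receive_partial_transcripts: true },
});

// Log the text of each transcript message as it arrives.
session.on("message", (msg) => {
  if (msg.type === "transcript") console.log(msg.data.utterance.text);
});

// Stream audio, then stop; stopRecording() triggers post-processing.
session.sendAudio(audioBuffer);
session.stopRecording();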

Python (sync)

from gladiaio_sdk import (
    LiveV2InitRequest,
    LiveV2LanguageConfig,
    LiveV2MessagesConfig,
    LiveV2WebSocketMessage,
)

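# `client` is the initialized SDK client (see gladia-sdk-integration).
live_client = client.live()

session = live_client.start_session(
    LiveV2InitRequest(
        model="solaria-1",
        # Audio format: must exactly match the stream being sent.
        encoding="wav/pcm",
        sample_rate=16000,
        bit_depth=16,
        channels=1,
        language_config=LiveV2LanguageConfig(languages=["en"]),
        messages_config=LiveV2MessagesConfig(receive_partial_transcripts=True),
    )
)

# Print the text of each transcript message as it arrives.
@session.on("message")
def on_message(message: LiveV2WebSocketMessage):
    if message.type == "transcript":
        print(message.data.utterance.text.strip())

# Stream audio, then stop; stop_recording() triggers post-processing.
session.send_audio(audio_bytes)
session.stop_recording()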

Session Configuration

Core fields to set on every session:

Audio format: encoding, sample_rate, bit_depth, channels (must exactly match the stream)
Language: language_config.languages and optional code_switching
Message behavior: messages_config.receive_partial_transcripts and speech events
Optional processing: pre_processing, realtime_processing, post_processing

See ./references/session-config.md for full examples and gladia-sdk-integration for client retry/timeout settings.

Key Tuning Parameters

endpointing is the primary latency-versus-completeness control for final transcripts.

Use case	Recommended value
Voice agent	`0.05` - `0.1`
Call center	`0.1` - `0.3`
Live subtitles	`0.2` - `0.4`
Meeting recorder	`0.3` - `0.5`

For maximum_duration_without_endpointing, speech_threshold, and full tuning guidance, see ./references/recommended-params.md.
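As a rough illustration, the table above can be kept as a small lookup so a chosen endpointing value is checked against the recommended range before a session starts. The names `ENDPOINTING_RANGES` and `check_endpointing` are hypothetical helpers and not part of the Gladia SDK. The values are copied from the table as-is, with no unit assumed.

```python
# Recommended endpointing ranges, copied from the table above.
# Hypothetical helper: not part of the Gladia SDK.
ENDPOINTING_RANGES = {
    "voice_agent": (0.05, 0.1),
    "call_center": (0.1, 0.3),
    "live_subtitles": (0.2, 0.4),
    "meeting_recorder": (0.3, 0.5),
}


def check_endpointing(use_case: str, value: float) -> bool:
    """Return True if value lies inside the recommended range for use_case."""
    low, high = ENDPOINTING_RANGES[use_case]
    return low <= value <= high
```

A value outside the range is not necessarily wrong. The check only flags settings that deviate from the presets so the choice is deliberate.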

Audio Streaming

Use session.sendAudio(chunk) (JS) / session.send_audio(chunk) (Python) to stream audio data. The SDK sends each chunk as a binary WebSocket frame.

Chunk size: 100ms of audio per frame (recommended)
Send continuously — do not batch large chunks
Audio format MUST match the encoding, sample_rate, bit_depth, and channels in session config
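The 100ms recommendation translates into a byte count that depends on the session's audio format. The sketch below shows the arithmetic for raw PCM and a simple splitter. `chunk_size_bytes` and `iter_chunks` are hypothetical helper names, not SDK functions, and the calculation assumes uncompressed samples with no container header.

```python
from typing import Iterator


def chunk_size_bytes(sample_rate: int, bit_depth: int, channels: int,
                     chunk_ms: int = 100) -> int:
    """Bytes of raw PCM that cover chunk_ms milliseconds of audio."""
    bytes_per_frame = (bit_depth // 8) * channels  # one sample per channel
    frames = sample_rate * chunk_ms // 1000
    return frames * bytes_per_frame


def iter_chunks(pcm: bytes, size: int) -> Iterator[bytes]:
    """Yield consecutive slices of at most `size` bytes."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


# Quick Start format: 16 kHz, 16-bit, mono -> 3200 bytes per 100ms chunk.
size = chunk_size_bytes(16000, 16, 1)
one_second_of_silence = bytes(16000 * 2)
chunks = list(iter_chunks(one_second_of_silence, size))
```

Each yielded chunk would then be passed to `session.send_audio(chunk)`, paced at roughly real time for a live source.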

Stopping and Reconnection

Normal stop

session.stopRecording(); // Triggers post-processing, then session ends

session.stop_recording()  # Triggers post-processing, then session ends
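One way to make sure the normal stop always happens is to wrap streaming in `try`/`finally`. The sketch below assumes only that the session object exposes `send_audio` and `stop_recording` as in the Quick Start. `LiveSession`, `stream_and_stop`, and `FakeSession` are hypothetical names for illustration, and `FakeSession` stands in for a real SDK session so the pattern can be run without network access.

```python
from typing import Iterable, Protocol


class LiveSession(Protocol):
    """Minimal shape assumed of a session; not an SDK type."""

    def send_audio(self, chunk: bytes) -> None: ...
    def stop_recording(self) -> None: ...


def stream_and_stop(session: LiveSession, chunks: Iterable[bytes]) -> int:
    """Send every chunk, then stop recording even if sending raises."""
    sent = 0
    try:
        for chunk in chunks:
            session.send_audio(chunk)
            sent += 1
    finally:
        session.stop_recording()
    return sent


class FakeSession:
    """Stand-in for a real session, used only to exercise the pattern."""

    def __init__(self) -> None:
        self.received: list[bytes] = []
        self.stopped = False

    def send_audio(self, chunk: bytes) -> None:
        self.received.append(chunk)

    def stop_recording(self) -> None:
        self.stopped = True
```

Whether an error path should call `stop_recording()` or the force-end method instead depends on whether post-processing of the partial audio is wanted.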

Force end (skip post-processing)

session.endSession(); // Immediately closes, no post-processing

session.end_session()  # Immediately closes, no post-processing

Reconnection

SDK reconnection is automatic (wsRetry). For the raw WebSocket fallback, reconnect to the same WebSocket URL that was returned when the session was initialized.

Limits

Constraint	Value
Max session duration	3 hours
Supported encodings	wav/pcm, wav/alaw, wav/ulaw
Concurrency (paid)	30 concurrent sessions
Concurrency (free)	1 concurrent session
Billing	Per-second of streamed audio
Multi-channel	Billed as N x duration
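The billing and duration rows above reduce to simple arithmetic. The sketch below is illustrative only: `billable_seconds` and `sessions_needed` are hypothetical helpers, and actual invoicing may round or aggregate differently.

```python
import math

MAX_SESSION_SECONDS = 3 * 60 * 60  # 3-hour session limit from the table


def billable_seconds(duration_seconds: float, channels: int = 1) -> float:
    """Multi-channel audio is billed as N x duration."""
    return channels * duration_seconds


def sessions_needed(total_seconds: float) -> int:
    """Minimum number of sessions to cover a stream of total_seconds."""
    return math.ceil(total_seconds / MAX_SESSION_SECONDS)
```

For example, a 10-minute stereo call counts as 1200 billable seconds, and a 7-hour broadcast needs at least three consecutive sessions.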

Managing Sessions

Use SDK methods for post-capture operations:

JavaScript: client.liveV2().get(id), .list(filters), .getFile(id), .delete(id)
Python: client.live().get(id), .list(filters), .get_file(id), .delete(id)

For full examples and pagination filters, see ./references/managing-sessions.md.

Common Mistakes

Audio format mismatch: the encoding, sample_rate, bit_depth, and channels in session config MUST match the actual audio stream exactly.
Forgetting to stop recording: a session that is never closed with stopRecording() stays open, and post-processing does not run.
Wrong audio file path: the audio download endpoint is /v2/live/:id/file, not /v2/live/:id/audio.

For the full list of gotchas and diagnostics, see the gladia-troubleshooting skill.
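When the source is a PCM WAV file, a format mismatch can be caught before streaming by comparing the file header with the intended session config. The sketch below uses Python's standard `wave` module. `wav_mismatches` is a hypothetical helper, not an SDK function, and it applies only to PCM WAV input, since `wave` does not read A-law or mu-law files and headerless raw streams carry no header to inspect.

```python
import io
import wave


def wav_mismatches(wav_bytes: bytes, sample_rate: int, bit_depth: int,
                   channels: int) -> list[str]:
    """List the config fields that disagree with the WAV header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        actual = {
            "sample_rate": wav.getframerate(),
            "bit_depth": wav.getsampwidth() * 8,
            "channels": wav.getnchannels(),
        }
    expected = {
        "sample_rate": sample_rate,
        "bit_depth": bit_depth,
        "channels": channels,
    }
    return [
        f"{key}: config {expected[key]}, audio {actual[key]}"
        for key in expected
        if expected[key] != actual[key]
    ]


# Build a tiny 16 kHz, 16-bit, mono WAV in memory to exercise the check.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(bytes(3200))
sample_wav = buffer.getvalue()
```

An empty list means the header agrees with the config. Any entries name the field to fix before starting the session.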

Install only if you intend to use Gladia for live transcription. Obtain required participant consent before streaming or recording audio, secure the Gladia API key, choose the correct language settings, disable analytics features you do not need, use callback URLs only for trusted HTTPS endpoints, and delete retained sessions when no longer needed.

Capability Tags

requires-sensitive-credentials

Capability Assessment

✓ Purpose & Capability

The skill's live audio streaming, transcription, translation, sentiment, named-entity recognition, summarization, session retrieval, audio download, and deletion guidance all fit the stated Gladia live transcription purpose.

ℹ Instruction Scope

Instructions are scoped to SDK-first Gladia usage with raw API references as fallback; the examples include English defaults and optional analytics/callback features that should be treated as examples and configured intentionally.

✓ Install Mechanism

The artifact contains Markdown documentation only, with no executable scripts, dependencies, install hooks, or background components.

ℹ Credentials

Use requires a Gladia API key and can send live audio, transcript text, and derived metadata to Gladia and optionally to a user-provided callback URL, which is expected for this integration but privacy-sensitive.

✓ Persistence & Privilege

No local persistence, privilege escalation, local credential/session access, or broad filesystem behavior is present; documented session deletion is user-directed.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install gladia-live-transcription
After installation, invoke the skill by name or use /gladia-live-transcription
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.1

- Documentation formatting updated in SKILL.md: added missing front matter triple-dash separators for consistency.
- No functional or API changes; all instructions and examples remain unchanged.

v1.0.0

- Initial release of gladia-live-transcription skill for real-time speech-to-text streaming via Gladia WebSocket API.
- Provides SDK-first guidance with fallback to raw WebSocket/REST only if SDK does not meet requirements.
- Includes setup instructions, usage examples for JavaScript/TypeScript and Python, and core session configuration.
- Documents recommended parameters, session lifecycle, key tuning (e.g., endpointing), supported audio formats, and concurrency/billing limits.
- Lists common mistakes, troubleshooting resources, and links to full Gladia documentation.

Metadata

Slug gladia-live-transcription

Version 1.0.1

License MIT-0

All-time Installs 1

Active Installs 1

Total Versions 2

Frequently Asked Questions

What is Gladia Live Transcription?

Real-time speech-to-text streaming with Gladia via WebSocket. Use when the user needs live transcription, builds a voice agent, meeting recorder, call center... It is an AI Agent Skill for Claude Code / OpenClaw, with 31 downloads so far.

How do I install Gladia Live Transcription?

Run "/install gladia-live-transcription" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Gladia Live Transcription free?

Yes, Gladia Live Transcription is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Gladia Live Transcription support?

Gladia Live Transcription is cross-platform and runs anywhere OpenClaw or Claude Code is available.

Who created Gladia Live Transcription?

It is built and maintained by Gladia (@gladiaio); the current version is v1.0.1.
