gladiaio

Gladia Live Transcription

by Gladia · v1.0.1 · MIT-0
cross-platform ✓ Security Clean
31 Downloads · 0 Stars · 1 Active Installs · 2 Versions
Install in OpenClaw
/install gladia-live-transcription
Description
Real-time speech-to-text streaming with Gladia via WebSocket. Use when the user needs live transcription, builds a voice agent, meeting recorder, call center...
README (SKILL.md)

Live Transcription

Gladia's live API transcribes audio in real-time over WebSocket.

SDK-first: always use the official SDK — see gladia-sdk-integration for policy, setup, and fallback criteria.

When to Use

  • Real-time transcription for microphone, telephony, or broadcast streams
  • Voice agents, meeting assistants, call center tools, or live subtitles
  • Live audio intelligence (translation, sentiment, NER)

When NOT to use: If the user has a pre-existing audio/video file or URL to transcribe after the fact, use the gladia-pre-recorded-transcription skill instead. Pre-recorded supports additional features like speaker diarization and PII redaction that are unavailable in live mode.

References

Consult these resources as needed:

  • ./references/recommended-params.md -- Use-case presets and tuning
  • ./references/session-config.md -- Full startSession() config (JS + Python)
  • ./references/managing-sessions.md -- get, list, getFile, delete
  • ./references/websocket-events.md -- WebSocket event reference
  • ../gladia-audio-intelligence/SKILL.md -- Feature availability
  • ../gladia-audio-intelligence/references/live-audio-intelligence.md -- Live feature details
  • ../gladia-sdk-integration/SKILL.md -- Setup, config, SDK vs raw API
  • ../gladia-sdk-integration/references/sdk-versions.md -- Current SDK versions
  • ../gladia-troubleshooting/SKILL.md -- Errors and diagnostics

API Endpoints (reference — prefer SDK methods)

| Endpoint | Method | SDK equivalent |
|---|---|---|
| /v2/live | POST | startSession() |
| /v2/live | GET | list() |
| /v2/live/:id | GET | get(id) |
| /v2/live/:id | DELETE | delete(id) |
| /v2/live/:id/file | GET | getFile(id) |
| WebSocket | URL from init | sendAudio() / session.on() |
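For the raw-API fallback, the first row of the table can be sketched with the Python standard library. This is a minimal sketch, not the SDK's implementation. The base URL `https://api.gladia.io`, the `x-gladia-key` auth header, and the helper names `build_init_request` and `session_path` are assumptions to check against the Gladia API reference. The code only builds the request, so nothing is sent until you pass it to `urllib.request.urlopen`.

```python
import json
import urllib.request

# Assumed values: verify against the Gladia API reference before relying on them.
GLADIA_BASE_URL = "https://api.gladia.io"
AUTH_HEADER = "x-gladia-key"


def build_init_request(api_key: str, config: dict) -> urllib.request.Request:
    """Build (without sending) the POST /v2/live request that starts a session."""
    return urllib.request.Request(
        f"{GLADIA_BASE_URL}/v2/live",
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json", AUTH_HEADER: api_key},
        method="POST",
    )


def session_path(session_id: str, file: bool = False) -> str:
    """Path for GET/DELETE /v2/live/:id, or GET /v2/live/:id/file (the audio)."""
    path = f"/v2/live/{session_id}"
    return f"{path}/file" if file else path
```

The init response is expected to carry the session id and the WebSocket URL referenced in the last table row. The SDK's startSession() performs this step for you.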

Session Lifecycle

SDK flow: startSession() -> sendAudio() -> receive transcript events -> stopRecording() -> get(id) for final result.

Quick Start

For SDK installation and client initialization, see the gladia-sdk-integration skill.

JavaScript/TypeScript

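// `client` is the Gladia client created as described in gladia-sdk-integration.
const session = client.liveV2().startSession({
  model: "solaria-1",
  // Audio format: must exactly match the stream you send (16 kHz, 16-bit, mono PCM here)
  encoding: "wav/pcm",
  sample_rate: 16000,
  bit_depth: 16,
  channels: 1,
  language_config: { languages: ["en"] },
  messages_config: { receive_partial_transcripts: true },
});

// Log the text of each transcript message
session.on("message", (msg) => {
  if (msg.type === "transcript") console.log(msg.data.utterance.text);
});

// audioBuffer: raw PCM bytes in the format declared above
session.sendAudio(audioBuffer);
// Signal end of audio: post-processing runs, then the session ends
session.stopRecording();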

Python (sync)

from gladiaio_sdk import (
    LiveV2InitRequest,
    LiveV2LanguageConfig,
    LiveV2MessagesConfig,
    LiveV2WebSocketMessage,
)

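# `client` is the Gladia client created as described in gladia-sdk-integration.
live_client = client.live()

session = live_client.start_session(
    LiveV2InitRequest(
        model="solaria-1",
        # Audio format: must exactly match the stream you send (16 kHz, 16-bit, mono PCM here)
        encoding="wav/pcm",
        sample_rate=16000,
        bit_depth=16,
        channels=1,
        language_config=LiveV2LanguageConfig(languages=["en"]),
        messages_config=LiveV2MessagesConfig(receive_partial_transcripts=True),
    )
)

# Print the text of each transcript message
@session.on("message")
def on_message(message: LiveV2WebSocketMessage):
    if message.type == "transcript":
        print(message.data.utterance.text.strip())

# audio_bytes: raw PCM bytes in the format declared above
session.send_audio(audio_bytes)
# Signal end of audio: post-processing runs, then the session ends
session.stop_recording()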

Session Configuration

Core fields to set on every session:

  • Audio format: encoding, sample_rate, bit_depth, channels (must exactly match the stream)
  • Language: language_config.languages and optional code_switching
  • Message behavior: messages_config.receive_partial_transcripts and speech events
  • Optional processing: pre_processing, realtime_processing, post_processing

See ./references/session-config.md for full examples and gladia-sdk-integration for client retry/timeout settings.
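Because the audio fields must match the stream exactly, it can help to check a source WAV file's header against the session config before streaming its frames. This is a minimal sketch using Python's standard `wave` module. The helper name `format_mismatches` is hypothetical, and the config keys mirror the Quick Start fields. `wave` only reads uncompressed PCM WAV files; A-law and μ-law files raise `wave.Error`.

```python
import wave
from typing import List


def format_mismatches(path: str, config: dict) -> List[str]:
    """Compare a PCM WAV file's header with the session's audio fields."""
    with wave.open(path, "rb") as wav:
        actual = {
            "sample_rate": wav.getframerate(),
            "bit_depth": wav.getsampwidth() * 8,  # sample width is reported in bytes
            "channels": wav.getnchannels(),
        }
    # Any entry here means the session config does not describe this audio.
    return [
        f"{key}: config={config.get(key)} file={value}"
        for key, value in actual.items()
        if config.get(key) != value
    ]
```

An empty list means the header agrees with the config. Otherwise, adjust the session config or convert the audio before streaming.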

Key Tuning Parameters

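endpointing is the primary latency-versus-completeness control for final transcripts. It sets the duration of silence, in seconds, after which the current utterance is considered finished. Lower values return final transcripts sooner but may cut utterances short. Higher values wait for more context.

| Use case | Recommended endpointing (seconds) |
|---|---|
| Voice agent | 0.05 - 0.1 |
| Call center | 0.1 - 0.3 |
| Live subtitles | 0.2 - 0.4 |
| Meeting recorder | 0.3 - 0.5 |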
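The table can also be kept as data so that each use case maps to a concrete value. This is a minimal sketch. The ranges are copied from the table and interpreted as seconds. The `bias` knob, the helper name `endpointing_for`, and the assumption that `endpointing` is a top-level session field are illustrative only. See ./references/recommended-params.md for the authoritative presets.

```python
# Recommended ranges from the table, interpreted as seconds.
ENDPOINTING_RANGES = {
    "voice_agent": (0.05, 0.1),
    "call_center": (0.1, 0.3),
    "live_subtitles": (0.2, 0.4),
    "meeting_recorder": (0.3, 0.5),
}


def endpointing_for(use_case: str, bias: float = 0.5) -> dict:
    """Pick a value inside the range: bias=0 favors latency, bias=1 favors completeness."""
    if not 0.0 <= bias <= 1.0:
        raise ValueError("bias must be between 0 and 1")
    low, high = ENDPOINTING_RANGES[use_case]
    # Hypothetical config fragment to merge into the session config.
    return {"endpointing": round(low + (high - low) * bias, 3)}
```

For example, `endpointing_for("voice_agent", bias=0.0)` returns the most latency-oriented value in that row.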

For maximum_duration_without_endpointing, speech_threshold, and full tuning guidance, see ./references/recommended-params.md.

Audio Streaming

Use session.sendAudio(chunk) (JS) / session.send_audio(chunk) (Python) to stream audio data. The SDK sends each chunk as a binary WebSocket frame.

  • Chunk size: 100ms of audio per frame (recommended)
  • Send continuously; do not batch audio into large chunks
  • Audio format MUST match the encoding, sample_rate, bit_depth, and channels in session config
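The chunking rule reduces to simple arithmetic: bytes per chunk = sample_rate × (bit_depth / 8) × channels × 0.1 s. For 16 kHz, 16-bit mono PCM, that is 3,200 bytes. This is a minimal sketch under those assumptions. `send` stands for any callable that takes one chunk, such as the Python SDK's `session.send_audio`. The `time.sleep` pacing is only for replaying a file at real-time speed, because microphone capture is already paced.

```python
import time
from typing import Callable, Iterator


def bytes_per_chunk(sample_rate: int, bit_depth: int, channels: int, chunk_ms: int = 100) -> int:
    """Size of one chunk in bytes, rounded down to whole audio frames."""
    frames = sample_rate * chunk_ms // 1000
    return frames * (bit_depth // 8) * channels


def iter_chunks(pcm: bytes, chunk_size: int) -> Iterator[bytes]:
    """Yield consecutive chunk_size slices; the last one may be shorter."""
    for start in range(0, len(pcm), chunk_size):
        yield pcm[start:start + chunk_size]


def stream_pcm(
    pcm: bytes,
    send: Callable[[bytes], object],
    sample_rate: int = 16000,
    bit_depth: int = 16,
    channels: int = 1,
    chunk_ms: int = 100,
    realtime: bool = True,
) -> None:
    """Send raw PCM in chunk_ms pieces, optionally paced like a live source."""
    size = bytes_per_chunk(sample_rate, bit_depth, channels, chunk_ms)
    for chunk in iter_chunks(pcm, size):
        send(chunk)
        if realtime:
            time.sleep(chunk_ms / 1000)  # wait roughly one chunk's duration
```

With the Python SDK, this would be called as `stream_pcm(audio_bytes, session.send_audio)`, followed by `session.stop_recording()`.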

Stopping and Reconnection

Normal stop

session.stopRecording();  // JavaScript: triggers post-processing, then the session ends
session.stop_recording()  # Python: triggers post-processing, then the session ends

Force end (skip post-processing)

session.endSession();  // JavaScript: closes immediately, no post-processing
session.end_session()  # Python: closes immediately, no post-processing
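One way to guarantee that a session is always closed is to pair the two stop calls with exception handling. On success, the helper calls stop_recording() so post-processing runs. If streaming fails, it falls back to end_session(). This sketches one possible policy, not SDK behavior. `run_session` is a hypothetical helper, and `session` is any object that exposes the Python methods shown above.

```python
from typing import Iterable


def run_session(session, chunks: Iterable[bytes]) -> None:
    """Stream all chunks, then close the session on both success and failure."""
    try:
        for chunk in chunks:
            session.send_audio(chunk)
    except BaseException:
        session.end_session()  # abort path: close immediately, skip post-processing
        raise
    else:
        session.stop_recording()  # normal path: post-processing, then the session ends
```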

Reconnection

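SDK reconnection is automatic and is configured through wsRetry (client retry settings are covered in gladia-sdk-integration). With the raw WebSocket fallback, reconnect to the same WebSocket URL returned when the session was initialized.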
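For the raw WebSocket fallback, reconnecting to the same URL is usually combined with a delay between attempts. This is a minimal sketch using exponential backoff. The schedule (0.5 s, doubling up to 8 s, five attempts) is an assumption, not Gladia guidance. `connect` is a hypothetical callable that opens your WebSocket client. Set `retry_on` to whatever exceptions that client raises on connection failure. When you use the SDK, wsRetry handles this instead.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[str], T],
    url: str,
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Call connect(url) until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return connect(url)
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise ValueError("max_attempts must be at least 1")
```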

Limits

| Constraint | Value |
|---|---|
| Max session duration | 3 hours |
| Supported encodings | wav/pcm, wav/alaw, wav/ulaw |
| Concurrency (paid) | 30 concurrent sessions |
| Concurrency (free) | 1 concurrent session |
| Billing | Per second of streamed audio |
| Multi-channel | Billed as N × duration (N = number of channels) |
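The limits table translates into a few checks that are easy to run before opening a stream. This is a minimal sketch. The constants come from the table. The helper names, the idea of splitting a recording longer than 3 hours into sequential sessions, and billing in unrounded seconds are assumptions, since Gladia's exact rounding rules are not stated here.

```python
import math

MAX_SESSION_SECONDS = 3 * 60 * 60  # max session duration: 3 hours
CONCURRENCY_LIMIT = {"paid": 30, "free": 1}  # concurrent sessions per plan


def billed_seconds(duration_s: float, channels: int = 1) -> float:
    """Multi-channel audio is billed as N x duration (N = channel count)."""
    return duration_s * channels


def sessions_needed(duration_s: float) -> int:
    """Sequential sessions required to cover duration_s under the 3-hour cap."""
    return max(1, math.ceil(duration_s / MAX_SESSION_SECONDS))


def can_start_session(active_sessions: int, plan: str) -> bool:
    """True while the plan still has a free concurrency slot."""
    return active_sessions < CONCURRENCY_LIMIT[plan]
```

For example, a 10-minute stereo call (two channels) is billed as 1,200 seconds.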

Managing Sessions

Use SDK methods for post-capture operations:

  • JavaScript: client.liveV2().get(id), .list(filters), .getFile(id), .delete(id)
  • Python: client.live().get(id), .list(filters), .get_file(id), .delete(id)

For full examples and pagination filters, see ./references/managing-sessions.md.
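Deleting retained sessions once they are no longer needed can be a loop over the SDK's delete(id) method. This is a minimal sketch. `delete_sessions` is a hypothetical helper, and `live_client` is assumed to be the object returned by `client.live()` in Python. The broad `except` collects failures for a later retry instead of stopping at the first error.

```python
from typing import Iterable, List


def delete_sessions(live_client, session_ids: Iterable[str]) -> List[str]:
    """Delete each session; return the ids whose deletion raised an error."""
    failed: List[str] = []
    for session_id in session_ids:
        try:
            live_client.delete(session_id)
        except Exception:  # keep going; report failures to the caller
            failed.append(session_id)
    return failed
```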

Common Mistakes

  • Audio format mismatch: the encoding, sample_rate, bit_depth, and channels in session config MUST match the actual audio stream exactly.
  • Forgetting to stop recording: a session left open without stopRecording() keeps hanging instead of moving on to post-processing and the final result.
  • Wrong audio file path: the audio download endpoint is /v2/live/:id/file, not /v2/live/:id/audio.

For the full list of gotchas and diagnostics, see the gladia-troubleshooting skill.


Usage Guidance
Install only if you intend to use Gladia for live transcription. Obtain required participant consent before streaming or recording audio, secure the Gladia API key, choose the correct language settings, disable analytics features you do not need, use callback URLs only for trusted HTTPS endpoints, and delete retained sessions when no longer needed.
Capability Tags
requires-sensitive-credentials
Capability Assessment
Purpose & Capability
The skill's live audio streaming, transcription, translation, sentiment, named-entity recognition, summarization, session retrieval, audio download, and deletion guidance all fit the stated Gladia live transcription purpose.
Instruction Scope
Instructions are scoped to SDK-first Gladia usage with raw API references as fallback; the examples include English defaults and optional analytics/callback features that should be treated as examples and configured intentionally.
Install Mechanism
The artifact contains Markdown documentation only, with no executable scripts, dependencies, install hooks, or background components.
Credentials
Use requires a Gladia API key and can send live audio, transcript text, and derived metadata to Gladia and optionally to a user-provided callback URL, which is expected for this integration but privacy-sensitive.
Persistence & Privilege
No local persistence, privilege escalation, local credential/session access, or broad filesystem behavior is present; documented session deletion is user-directed.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install gladia-live-transcription
  3. After installation, invoke the skill by name or use /gladia-live-transcription
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.1
- Documentation formatting updated in SKILL.md: added missing front matter triple-dash separators for consistency.
- No functional or API changes; all instructions and examples remain unchanged.
v1.0.0
- Initial release of the gladia-live-transcription skill for real-time speech-to-text streaming via the Gladia WebSocket API.
- Provides SDK-first guidance, with fallback to raw WebSocket/REST only if the SDK does not meet requirements.
- Includes setup instructions, usage examples for JavaScript/TypeScript and Python, and core session configuration.
- Documents recommended parameters, session lifecycle, key tuning (e.g., endpointing), supported audio formats, and concurrency/billing limits.
- Lists common mistakes, troubleshooting resources, and links to full Gladia documentation.
Metadata
Slug: gladia-live-transcription
Version: 1.0.1
License: MIT-0
All-time Installs: 1
Active Installs: 1
Total Versions: 2
Frequently Asked Questions

What is Gladia Live Transcription?

Real-time speech-to-text streaming with Gladia via WebSocket. Use when the user needs live transcription, builds a voice agent, meeting recorder, call center... It is an AI Agent Skill for Claude Code / OpenClaw, with 31 downloads so far.

How do I install Gladia Live Transcription?

Run "/install gladia-live-transcription" in the OpenClaw or Claude Code chat to install it in one step. Using the skill still requires a Gladia API key (see Credentials above).

Is Gladia Live Transcription free?

Yes. The skill itself is free and licensed under MIT-0, so you can download, install, and use it at no cost. Streaming audio to the Gladia API is billed separately, per second of streamed audio.

Which platforms does Gladia Live Transcription support?

Gladia Live Transcription is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Gladia Live Transcription?

It is built and maintained by Gladia (@gladiaio); the current version is v1.0.1.
