Gladia Live Transcription

Name: Gladia Live Transcription
Author: gladiaio

By Gladia · GitHub · v1.0.1 · MIT-0

cross-platform ✓ Security check passed

Install in OpenClaw

/install gladia-live-transcription

Description

Real-time speech-to-text streaming with Gladia via WebSocket. Use when the user needs live transcription, builds a voice agent, meeting recorder, call center...

Usage (SKILL.md)

Live Transcription

Gladia's live API transcribes audio in real-time over WebSocket.

SDK-first: always use the official SDK. See gladia-sdk-integration for the policy, setup, and fallback criteria.

When to Use

Real-time transcription for microphone, telephony, or broadcast streams
Voice agents, meeting assistants, call center tools, or live subtitles
Live audio intelligence (translation, sentiment, NER)

When NOT to use: If the user has a pre-existing audio/video file or URL to transcribe after the fact, use the gladia-pre-recorded-transcription skill instead. Pre-recorded supports additional features like speaker diarization and PII redaction that are unavailable in live mode.

References

Consult these resources as needed:

./references/recommended-params.md -- Use-case presets and tuning
./references/session-config.md -- Full startSession() config (JS + Python)
./references/managing-sessions.md -- get, list, getFile, delete
./references/websocket-events.md -- WebSocket event reference
../gladia-audio-intelligence/SKILL.md -- Feature availability
../gladia-audio-intelligence/references/live-audio-intelligence.md -- Live feature details
../gladia-sdk-integration/SKILL.md -- Setup, config, SDK vs raw API
../gladia-sdk-integration/references/sdk-versions.md -- Current SDK versions
../gladia-troubleshooting/SKILL.md -- Errors and diagnostics

API Endpoints (reference only; prefer SDK methods)

Endpoint	Method	SDK equivalent
`/v2/live`	POST	`startSession()`
`/v2/live`	GET	`list()`
`/v2/live/:id`	GET	`get(id)`
`/v2/live/:id`	DELETE	`delete(id)`
`/v2/live/:id/file`	GET	`getFile(id)`
WebSocket URL returned by `POST /v2/live`	WebSocket	`sendAudio()` / `session.on()`
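For the raw-API fallback, the REST rows of this table map onto plain HTTP requests. The sketch below uses only the Python standard library to build (not send) them. The base URL `https://api.gladia.io` and the `x-gladia-key` header name are assumptions to confirm in the gladia-sdk-integration skill, and `build_live_request` is a hypothetical helper. A `POST /v2/live` also needs the JSON session config as its body, which this sketch leaves out.

```python
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "https://api.gladia.io"  # assumption: public API host
AUTH_HEADER = "x-gladia-key"        # assumption: API-key header name

# (method, has_session_id, is_file) -> SDK equivalent, mirroring the table above.
_ROUTES = {
    ("POST", False, False): "startSession()",
    ("GET", False, False): "list()",
    ("GET", True, False): "get(id)",
    ("DELETE", True, False): "delete(id)",
    ("GET", True, True): "getFile(id)",
}


def build_live_request(method: str, api_key: str,
                       session_id: Optional[str] = None,
                       file: bool = False) -> urllib.request.Request:
    """Build (without sending) a request for one of the /v2/live endpoints."""
    method = method.upper()
    route = (method, session_id is not None, file)
    if route not in _ROUTES:
        raise ValueError(f"no /v2/live endpoint for {route}")
    path = "/v2/live"
    if session_id is not None:
        path += "/" + urllib.parse.quote(session_id, safe="")
    if file:
        path += "/file"  # audio download is /file, not /audio
    request = urllib.request.Request(BASE_URL + path, method=method)
    request.add_header(AUTH_HEADER, api_key)
    return request
```

Pass the result to `urllib.request.urlopen()` to send it, and prefer the SDK method in the right-hand column whenever it covers the use case.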

Session Lifecycle

SDK flow: startSession() -> sendAudio() -> receive transcript events -> stopRecording() -> get(id) for final result.
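The lifecycle can be modeled as a small state machine that makes the ordering rules explicit: audio is only accepted before stopRecording(), stopRecording() moves the session into post-processing, and endSession() closes it without post-processing. `SessionState` and `LifecycleTracker` are hypothetical names used for illustration; they are not part of the Gladia SDK.

```python
from enum import Enum


class SessionState(Enum):
    STARTED = "started"                  # startSession() succeeded
    STREAMING = "streaming"              # sendAudio() has been called
    POST_PROCESSING = "post_processing"  # stopRecording() called, results pending
    ENDED = "ended"                      # closed; fetch the final result with get(id)


class LifecycleTracker:
    """Hypothetical guard that rejects calls made in the wrong order."""

    def __init__(self) -> None:
        self.state = SessionState.STARTED

    def _require_open(self, action: str) -> None:
        if self.state not in (SessionState.STARTED, SessionState.STREAMING):
            raise RuntimeError(f"cannot {action} in state {self.state.name}")

    def send_audio(self) -> None:
        self._require_open("send audio")
        self.state = SessionState.STREAMING

    def stop_recording(self) -> None:
        self._require_open("stop recording")
        self.state = SessionState.POST_PROCESSING

    def end_session(self) -> None:
        # Force end: skips post-processing entirely.
        self.state = SessionState.ENDED

    def on_session_closed(self) -> None:
        # Call once the session has finished after post-processing.
        self.state = SessionState.ENDED
```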

Quick Start

For SDK installation and client initialization, see the gladia-sdk-integration skill.

JavaScript/TypeScript

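// `client` is the SDK client created as described in the gladia-sdk-integration skill.
const session = client.liveV2().startSession({
  model: "solaria-1",
  encoding: "wav/pcm", // must match the audio you actually send
  sample_rate: 16000,
  bit_depth: 16,
  channels: 1,
  language_config: { languages: ["en"] },
  messages_config: { receive_partial_transcripts: true },
});

session.on("message", (msg) => {
  // With partial transcripts enabled, this fires for both partial and final
  // utterances; msg.data.is_final tells them apart.
  if (msg.type === "transcript") console.log(msg.data.utterance.text);
});

// audioBuffer: raw audio in the format declared above (e.g. one 100 ms chunk).
session.sendAudio(audioBuffer);
session.stopRecording(); // triggers post-processing, then the session ends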

Python (sync)

from gladiaio_sdk import (
    LiveV2InitRequest,
    LiveV2LanguageConfig,
    LiveV2MessagesConfig,
    LiveV2WebSocketMessage,
)

# `client` is the SDK client created as described in the gladia-sdk-integration skill.
live_client = client.live()

session = live_client.start_session(
    LiveV2InitRequest(
        model="solaria-1",
        encoding="wav/pcm",
        sample_rate=16000,
        bit_depth=16,
        channels=1,
        language_config=LiveV2LanguageConfig(languages=["en"]),
        messages_config=LiveV2MessagesConfig(receive_partial_transcripts=True),
    )
)

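@session.on("message")
def on_message(message: LiveV2WebSocketMessage):
    # With partial transcripts enabled, this receives both partial and final
    # utterances; message.data.is_final tells them apart.
    if message.type == "transcript":
        print(message.data.utterance.text.strip())

# audio_bytes: raw audio in the format declared above (e.g. one 100 ms chunk).
session.send_audio(audio_bytes)
session.stop_recording()  # triggers post-processing, then the session ends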
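Because receive_partial_transcripts is enabled, the handlers above print both interim and final text. A common pattern is to keep only final utterances for the stored transcript and show the latest partial as a live preview. The sketch below works on raw JSON messages as they would arrive over the WebSocket fallback. It assumes each transcript message carries `data.is_final` and `data.utterance.text`; check the field names against ./references/websocket-events.md. `TranscriptCollector` is a hypothetical name.

```python
import json
from typing import List


class TranscriptCollector:
    """Keeps final utterances plus the most recent partial one.

    Assumed message shape:
    {"type": "transcript", "data": {"is_final": bool, "utterance": {"text": str}}}
    """

    def __init__(self) -> None:
        self.finals: List[str] = []
        self.partial: str = ""

    def handle(self, raw: str) -> None:
        message = json.loads(raw)
        if message.get("type") != "transcript":
            return  # ignore speech, lifecycle, and other event types
        data = message["data"]
        text = data["utterance"]["text"].strip()
        if data.get("is_final"):
            self.finals.append(text)
            self.partial = ""  # the partial has been superseded by the final
        else:
            self.partial = text

    def full_text(self) -> str:
        return " ".join(self.finals)
```

With the SDK the same logic can live inside the message handler, reading the fields from the message object instead of a parsed dict, assuming the SDK exposes the same fields.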

Session Configuration

Core fields to set on every session:

Audio format: encoding, sample_rate, bit_depth, channels (must exactly match the stream)
Language: language_config.languages and optional code_switching
Message behavior: messages_config.receive_partial_transcripts and speech events
Optional processing: pre_processing, realtime_processing, post_processing
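To keep the audio-format fields in sync with the stream, they can be read from the source instead of typed by hand. This sketch uses Python's standard `wave` module, which only opens uncompressed PCM files, so it covers `wav/pcm` but not `wav/alaw` or `wav/ulaw`. `audio_config_from_wav` and `format_mismatches` are hypothetical helpers.

```python
import wave
from typing import BinaryIO, Dict, List, Union

AUDIO_FORMAT_KEYS = ("encoding", "sample_rate", "bit_depth", "channels")


def audio_config_from_wav(source: Union[str, BinaryIO]) -> Dict[str, object]:
    """Read the audio-format fields from a PCM WAV file's header."""
    with wave.open(source, "rb") as wav:
        return {
            "encoding": "wav/pcm",  # wave only opens uncompressed PCM files
            "sample_rate": wav.getframerate(),
            "bit_depth": wav.getsampwidth() * 8,
            "channels": wav.getnchannels(),
        }


def format_mismatches(session_config: Dict[str, object],
                      stream_config: Dict[str, object]) -> List[str]:
    """Return the audio-format keys whose values differ between config and stream."""
    return [key for key in AUDIO_FORMAT_KEYS
            if session_config.get(key) != stream_config.get(key)]
```

A caller can build the session's audio fields from `audio_config_from_wav(path)` and refuse to stream if `format_mismatches` returns a non-empty list.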

See ./references/session-config.md for full examples and gladia-sdk-integration for client retry/timeout settings.

Key Tuning Parameters

endpointing is the silence duration (in seconds) that closes an utterance and triggers its final transcript, which makes it the primary latency-versus-completeness control.

Use case	Recommended endpointing (seconds)
Voice agent	`0.05` - `0.1`
Call center	`0.1` - `0.3`
Live subtitles	`0.2` - `0.4`
Meeting recorder	`0.3` - `0.5`
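The table can be encoded as presets so the choice is explicit in code: the low end of a range favors latency, the high end favors completeness. The dictionary keys and the `prefer_latency` switch are illustrative choices of this sketch, and the values assume endpointing is expressed in seconds; ./references/recommended-params.md remains the authoritative guidance.

```python
from typing import Dict, Tuple

# (low, high) endpointing ranges in seconds, copied from the table above.
ENDPOINTING_RANGES: Dict[str, Tuple[float, float]] = {
    "voice_agent": (0.05, 0.1),
    "call_center": (0.1, 0.3),
    "live_subtitles": (0.2, 0.4),
    "meeting_recorder": (0.3, 0.5),
}


def endpointing_for(use_case: str, prefer_latency: bool = True) -> float:
    """Pick the low end of the range for latency, the high end for completeness."""
    low, high = ENDPOINTING_RANGES[use_case]
    return low if prefer_latency else high


def in_recommended_range(use_case: str, value: float) -> bool:
    """True if a hand-tuned value stays inside the recommended range."""
    low, high = ENDPOINTING_RANGES[use_case]
    return low <= value <= high
```

The returned value goes into the `endpointing` field of the session config; ./references/session-config.md shows where that field sits.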

For maximum_duration_without_endpointing, speech_threshold, and full tuning guidance, see ./references/recommended-params.md.

Audio Streaming

Use session.sendAudio(chunk) (JS) / session.send_audio(chunk) (Python) to stream audio data. The SDK sends each chunk as a binary WebSocket frame.

Chunk size: 100ms of audio per frame (recommended)
Send chunks continuously as audio is captured; do not batch audio into large chunks
Audio format MUST match the encoding, sample_rate, bit_depth, and channels in session config
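The 100 ms recommendation translates into a byte count: bytes per chunk = sample_rate × (bit_depth / 8) × channels × 0.1. For 16 kHz, 16-bit mono audio that is 16000 × 2 × 1 × 0.1 = 3200 bytes. The sketch below computes that size, splits a raw PCM buffer into chunks, and, when replaying a pre-captured buffer, sleeps between sends so the audio arrives at roughly real-time pace instead of in one burst. The pacing strategy and the function names are assumptions of this sketch, and `send` stands in for `session.send_audio`.

```python
import time
from typing import Callable, Iterator


def chunk_size_bytes(sample_rate: int, bit_depth: int, channels: int,
                     chunk_ms: int = 100) -> int:
    """Bytes of raw audio covering `chunk_ms` milliseconds."""
    frames = sample_rate * chunk_ms // 1000
    return frames * (bit_depth // 8) * channels


def iter_chunks(pcm: bytes, size: int) -> Iterator[bytes]:
    """Yield consecutive slices of `pcm`; the last one may be shorter."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


def replay_paced(send: Callable[[bytes], None], pcm: bytes,
                 sample_rate: int = 16000, bit_depth: int = 16, channels: int = 1,
                 chunk_ms: int = 100,
                 sleep: Callable[[float], None] = time.sleep) -> int:
    """Send a pre-recorded PCM buffer in real-time-sized chunks; returns chunks sent."""
    size = chunk_size_bytes(sample_rate, bit_depth, channels, chunk_ms)
    count = 0
    for chunk in iter_chunks(pcm, size):
        send(chunk)             # e.g. session.send_audio
        sleep(chunk_ms / 1000)  # wait roughly one chunk's duration
        count += 1
    return count
```

Live microphone or telephony input is already paced by the capture device, so only the chunk-size calculation applies there.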

Stopping and Reconnection

Normal stop

session.stopRecording(); // Triggers post-processing, then session ends

session.stop_recording()  # Triggers post-processing, then session ends

Force end (skip post-processing)

session.endSession(); // Immediately closes, no post-processing

session.end_session()  # Immediately closes, no post-processing

Reconnection

SDK reconnection is automatic (wsRetry). For raw WebSocket fallback, reconnect to the same URL.
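On the raw WebSocket fallback, reconnection is the caller's job. One common approach is to retry the same session URL with exponential backoff. In the sketch below, `connect` is a placeholder for whichever WebSocket client you use, `retry_on` lists the exceptions that client raises on a dropped connection, and the delay values (0.5 s doubling up to 8 s) are arbitrary choices, not Gladia's wsRetry defaults.

```python
import time
from typing import Callable, List, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Exponential delays: base, 2*base, 4*base, ..., capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


def reconnect(connect: Callable[[str], T], ws_url: str, attempts: int = 5,
              retry_on: Tuple[Type[BaseException], ...] = (OSError,),
              sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `connect(ws_url)` until it succeeds, always reusing the same URL."""
    last_error: Optional[BaseException] = None
    for i, delay in enumerate(backoff_delays(attempts)):
        try:
            return connect(ws_url)
        except retry_on as error:
            last_error = error
            if i + 1 < attempts:  # no pointless wait after the final attempt
                sleep(delay)
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```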

Limits

Constraint	Value
Max session duration	3 hours
Supported encodings	wav/pcm, wav/alaw, wav/ulaw
Concurrency (paid)	30 concurrent sessions
Concurrency (free)	1 concurrent session
Billing	Per second of streamed audio
Multi-channel billing	N × duration (N = number of channels)
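These limits translate into simple arithmetic when planning a deployment: multi-channel audio is billed as channels × duration, and a stream longer than the 3-hour session cap has to be split across consecutive sessions. The helper names below are hypothetical.

```python
import math

MAX_SESSION_SECONDS = 3 * 60 * 60  # 3-hour cap from the table above


def billable_seconds(stream_seconds: float, channels: int = 1) -> float:
    """Per-second billing, multiplied by the channel count for multi-channel audio."""
    if channels < 1:
        raise ValueError("channels must be at least 1")
    return stream_seconds * channels


def sessions_needed(stream_seconds: float) -> int:
    """Consecutive sessions required to cover a stream under the duration cap."""
    return max(1, math.ceil(stream_seconds / MAX_SESSION_SECONDS))
```

For example, a 1-hour stereo call recording is billed as 3600 × 2 = 7200 seconds.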

Managing Sessions

Use SDK methods for post-capture operations:

JavaScript: client.liveV2().get(id), .list(filters), .getFile(id), .delete(id)
Python: client.live().get(id), .list(filters), .get_file(id), .delete(id)

For full examples and pagination filters, see ./references/managing-sessions.md.

Common Mistakes

Audio format mismatch: the encoding, sample_rate, bit_depth, and channels in session config MUST match the actual audio stream exactly.
Forgetting to stop recording: if stopRecording() is never called, the session stays open instead of moving on to post-processing and completion.
Wrong audio file path: the audio download endpoint is /v2/live/:id/file, not /v2/live/:id/audio.

For the full list of gotchas and diagnostics, see the gladia-troubleshooting skill.

Install only if you intend to use Gladia for live transcription. Obtain required participant consent before streaming or recording audio, secure the Gladia API key, choose the correct language settings, disable analytics features you do not need, use callback URLs only for trusted HTTPS endpoints, and delete retained sessions when no longer needed.

Capability tags

requires-sensitive-credentials

Capability assessment

✓ Purpose & Capability

The skill's live audio streaming, transcription, translation, sentiment, named-entity recognition, summarization, session retrieval, audio download, and deletion guidance all fit the stated Gladia live transcription purpose.

ℹ Instruction Scope

Instructions are scoped to SDK-first Gladia usage with raw API references as fallback; the examples include English defaults and optional analytics/callback features that should be treated as examples and configured intentionally.

✓ Install Mechanism

The artifact contains Markdown documentation only, with no executable scripts, dependencies, install hooks, or background components.

ℹ Credentials

Use requires a Gladia API key and can send live audio, transcript text, and derived metadata to Gladia and optionally to a user-provided callback URL, which is expected for this integration but privacy-sensitive.

✓ Persistence & Privilege

No local persistence, privilege escalation, local credential/session access, or broad filesystem behavior is present; documented session deletion is user-directed.

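How to use

Make sure OpenClaw is installed (local or Docker deployment)
In the chat box, enter the install command: /install gladia-live-transcription
After installation, invoke the skill by name or trigger it with /gladia-live-transcription
Provide the inputs described in the skill's parameter notes to get structured output

Version history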

v1.0.1

- Updated documentation formatting in SKILL.md: added the missing front-matter triple-dash separators for consistency.
- No functional or API changes; all instructions and examples remain unchanged.

v1.0.0

- Initial release of the gladia-live-transcription skill for real-time speech-to-text streaming via the Gladia WebSocket API.
- Provides SDK-first guidance, falling back to raw WebSocket/REST only if the SDK does not meet requirements.
- Includes setup instructions, usage examples for JavaScript/TypeScript and Python, and core session configuration.
- Documents recommended parameters, the session lifecycle, key tuning (e.g., endpointing), supported audio formats, and concurrency/billing limits.
- Lists common mistakes, troubleshooting resources, and links to the full Gladia documentation.

Metadata

Slug gladia-live-transcription

Version 1.0.1

License MIT-0

Total installs 1

Current installs 1

Version count 2

FAQ