gladiaio

Gladia Live Transcription

Author: Gladia · v1.0.1 · MIT-0
cross-platform · ✓ Security check passed
Total downloads: 31 · Favorites: 0 · Current installs: 1 · Versions: 2
Install in OpenClaw
/install gladia-live-transcription
Description
Real-time speech-to-text streaming with Gladia via WebSocket. Use when the user needs live transcription, builds a voice agent, meeting recorder, call center...
Usage (SKILL.md)

Live Transcription

Gladia's live API transcribes audio in real-time over WebSocket.

SDK-first: always use the official SDK — see gladia-sdk-integration for policy, setup, and fallback criteria.

When to Use

  • Real-time transcription for microphone, telephony, or broadcast streams
  • Voice agents, meeting assistants, call center tools, or live subtitles
  • Live audio intelligence (translation, sentiment, NER)

When NOT to use: If the user has a pre-existing audio/video file or URL to transcribe after the fact, use the gladia-pre-recorded-transcription skill instead. Pre-recorded supports additional features like speaker diarization and PII redaction that are unavailable in live mode.

References

Consult these resources as needed:

  • ./references/recommended-params.md -- Use-case presets and tuning
  • ./references/session-config.md -- Full startSession() config (JS + Python)
  • ./references/managing-sessions.md -- get, list, getFile, delete
  • ./references/websocket-events.md -- WebSocket event reference
  • ../gladia-audio-intelligence/SKILL.md -- Feature availability
  • ../gladia-audio-intelligence/references/live-audio-intelligence.md -- Live feature details
  • ../gladia-sdk-integration/SKILL.md -- Setup, config, SDK vs raw API
  • ../gladia-sdk-integration/references/sdk-versions.md -- Current SDK versions
  • ../gladia-troubleshooting/SKILL.md -- Errors and diagnostics

API Endpoints (reference — prefer SDK methods)

| Endpoint | Method | SDK equivalent |
| --- | --- | --- |
| /v2/live | POST | startSession() |
| /v2/live | GET | list() |
| /v2/live/:id | GET | get(id) |
| /v2/live/:id | DELETE | delete(id) |
| /v2/live/:id/file | GET | getFile(id) |
| WebSocket URL from init | | sendAudio() / session.on() |
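
For the raw REST fallback, the paths in the table can be built by a small helper so that the file endpoint is never mistyped. This is a minimal sketch: `live_path` is a hypothetical helper, not part of the Gladia SDK, and it only assembles the path strings listed above.

```python
from typing import Optional


def live_path(session_id: Optional[str] = None, file: bool = False) -> str:
    """Return the REST path for a live-session endpoint from the table above."""
    if session_id is None:
        if file:
            raise ValueError("the file endpoint requires a session id")
        return "/v2/live"  # POST to start a session, GET to list sessions
    path = f"/v2/live/{session_id}"  # GET to fetch, DELETE to remove
    return f"{path}/file" if file else path  # recorded audio lives under /file
```

With the SDK available, prefer the equivalent methods in the right-hand column instead of building paths by hand.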

Session Lifecycle

SDK flow: startSession() -> sendAudio() -> receive transcript events -> stopRecording() -> get(id) for final result.

Quick Start

For SDK installation and client initialization, see the gladia-sdk-integration skill.

JavaScript/TypeScript

const session = client.liveV2().startSession({
  model: "solaria-1",
  encoding: "wav/pcm",
  sample_rate: 16000,
  bit_depth: 16,
  channels: 1,
  language_config: { languages: ["en"] },
  messages_config: { receive_partial_transcripts: true },
});

session.on("message", (msg) => {
  if (msg.type === "transcript") console.log(msg.data.utterance.text);
});
session.sendAudio(audioBuffer);
session.stopRecording();

Python (sync)

from gladiaio_sdk import (
    LiveV2InitRequest,
    LiveV2LanguageConfig,
    LiveV2MessagesConfig,
    LiveV2WebSocketMessage,
)

live_client = client.live()

session = live_client.start_session(
    LiveV2InitRequest(
        model="solaria-1",
        encoding="wav/pcm",
        sample_rate=16000,
        bit_depth=16,
        channels=1,
        language_config=LiveV2LanguageConfig(languages=["en"]),
        messages_config=LiveV2MessagesConfig(receive_partial_transcripts=True),
    )
)

@session.on("message")
def on_message(message: LiveV2WebSocketMessage):
    if message.type == "transcript":
        print(message.data.utterance.text.strip())

session.send_audio(audio_bytes)
session.stop_recording()
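
With receive_partial_transcripts enabled, a handler typically sees interim results before the final utterance. The sketch below shows one way to keep them apart. It works on plain dictionaries rather than SDK objects, and it assumes each transcript payload carries an `is_final` flag under `data`; that field name is an assumption not stated in this document, so check ./references/websocket-events.md before relying on it. `TranscriptBuffer` is a hypothetical helper.

```python
from typing import List, Optional


class TranscriptBuffer:
    """Collects final utterances and tracks the latest partial one."""

    def __init__(self) -> None:
        self.finals: List[str] = []
        self.partial: Optional[str] = None

    def handle(self, message: dict) -> None:
        if message.get("type") != "transcript":
            return  # ignore every other event type
        data = message["data"]
        text = data["utterance"]["text"].strip()
        if data.get("is_final", True):  # assumed field name
            self.finals.append(text)
            self.partial = None  # the partial has been superseded
        else:
            self.partial = text  # overwrite the previous interim result

    def text(self) -> str:
        """Return finals followed by the pending partial, if any."""
        parts = self.finals + ([self.partial] if self.partial else [])
        return " ".join(parts)
```

A live-subtitle display would render `text()` after every message, while a meeting recorder would usually persist only `finals`.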

Session Configuration

Core fields to set on every session:

  • Audio format: encoding, sample_rate, bit_depth, channels (must exactly match the stream)
  • Language: language_config.languages and optional code_switching
  • Message behavior: messages_config.receive_partial_transcripts and speech events
  • Optional processing: pre_processing, realtime_processing, post_processing

See ./references/session-config.md for full examples and gladia-sdk-integration for client retry/timeout settings.
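
Because the audio format fields must match the stream exactly, it can help to compare them against the source audio before starting a session. The following is a minimal sketch for PCM WAV input using Python's standard `wave` module (which does not read A-law or mu-law files). `find_format_mismatches` is a hypothetical helper, and the config is represented as a plain dictionary using the field names above.

```python
import io
import wave
from typing import List


def find_format_mismatches(config: dict, wav_bytes: bytes) -> List[str]:
    """Compare session config fields with the header of a PCM WAV file."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        actual = {
            "sample_rate": wav.getframerate(),
            "bit_depth": wav.getsampwidth() * 8,  # bytes per sample -> bits
            "channels": wav.getnchannels(),
        }
    # Report every field whose configured value differs from the audio.
    return [
        f"{key}: config={config.get(key)!r}, audio={value!r}"
        for key, value in actual.items()
        if config.get(key) != value
    ]
```

An empty list means the three numeric fields agree; anything else should be fixed before streaming.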

Key Tuning Parameters

endpointing is the primary latency-versus-completeness control for final transcripts.

| Use case | Recommended endpointing (seconds) |
| --- | --- |
| Voice agent | 0.05 - 0.1 |
| Call center | 0.1 - 0.3 |
| Live subtitles | 0.2 - 0.4 |
| Meeting recorder | 0.3 - 0.5 |

For maximum_duration_without_endpointing, speech_threshold, and full tuning guidance, see ./references/recommended-params.md.
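
The recommended ranges can be kept as presets in application code. This is a sketch only: `ENDPOINTING_RANGES` and `pick_endpointing` are hypothetical names, and the linear `bias` knob is an illustrative choice rather than Gladia guidance.

```python
ENDPOINTING_RANGES = {
    "voice_agent": (0.05, 0.1),
    "call_center": (0.1, 0.3),
    "live_subtitles": (0.2, 0.4),
    "meeting_recorder": (0.3, 0.5),
}


def pick_endpointing(use_case: str, bias: float = 0.5) -> float:
    """Interpolate inside the recommended range for a use case.

    bias=0.0 picks the low end (faster finals), bias=1.0 the high end
    (more complete utterances).
    """
    if not 0.0 <= bias <= 1.0:
        raise ValueError("bias must be between 0 and 1")
    low, high = ENDPOINTING_RANGES[use_case]
    return round(low + (high - low) * bias, 3)
```

For example, `pick_endpointing("voice_agent", 0.0)` yields the lowest recommended value for a voice agent, which can then be passed as the session's endpointing setting.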

Audio Streaming

Use session.sendAudio(chunk) (JS) / session.send_audio(chunk) (Python) to stream audio data. The SDK sends each chunk as a binary WebSocket frame.

  • Chunk size: 100ms of audio per frame (recommended)
  • Send continuously — do not batch large chunks
  • Audio format MUST match the encoding, sample_rate, bit_depth, and channels in session config
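
The recommended 100 ms chunk translates into a byte count that depends on the session's audio format. A minimal sketch of that arithmetic for uncompressed PCM, with hypothetical helper names:

```python
from typing import Iterator


def chunk_size_bytes(sample_rate: int, bit_depth: int, channels: int,
                     chunk_ms: int = 100) -> int:
    """Bytes needed to hold chunk_ms of PCM audio in the given format."""
    bytes_per_frame = (bit_depth // 8) * channels  # one sample per channel
    frames = sample_rate * chunk_ms // 1000
    return frames * bytes_per_frame


def iter_chunks(pcm: bytes, size: int) -> Iterator[bytes]:
    """Yield consecutive slices of at most `size` bytes."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
```

At 16 kHz, 16-bit, mono (the Quick Start settings), `chunk_size_bytes(16000, 16, 1)` is 3200 bytes. Each slice from `iter_chunks` can then be passed to session.send_audio(chunk).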

Stopping and Reconnection

Normal stop

JavaScript/TypeScript

session.stopRecording(); // Triggers post-processing, then session ends

Python

session.stop_recording()  # Triggers post-processing, then session ends

Force end (skip post-processing)

JavaScript/TypeScript

session.endSession(); // Immediately closes, no post-processing

Python

session.end_session()  # Immediately closes, no post-processing

Reconnection

SDK reconnection is automatic (wsRetry). For raw WebSocket fallback, reconnect to the same URL.
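
For the raw WebSocket fallback, this document does not prescribe a retry policy. One common approach is capped exponential backoff, sketched below under stated assumptions: `connect` is a hypothetical callable that opens a WebSocket to the same session URL and raises `OSError` on failure, and the delay values are illustrative.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Delays that double each attempt, never exceeding `cap` seconds."""
    return [min(cap, base * 2 ** n) for n in range(attempts)]


def reconnect(connect: Callable[[], T], attempts: int = 5,
              sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `connect` until it succeeds or the attempts run out."""
    for attempt, delay in enumerate(backoff_delays(attempts), start=1):
        try:
            return connect()  # assumed to reopen the same WebSocket URL
        except OSError as exc:
            if attempt == attempts:
                raise ConnectionError(f"gave up after {attempts} attempts") from exc
            sleep(delay)  # wait before the next try
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter is injectable so the loop can be tested without waiting. When the SDK is in use, none of this is needed because wsRetry handles reconnection.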

Limits

| Constraint | Value |
| --- | --- |
| Max session duration | 3 hours |
| Supported encodings | wav/pcm, wav/alaw, wav/ulaw |
| Concurrency (paid) | 30 concurrent sessions |
| Concurrency (free) | 1 concurrent session |
| Billing | Per second of streamed audio |
| Multi-channel | Billed as N x duration (N = number of channels) |
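
The billing and duration rows reduce to simple arithmetic. The sketch below illustrates it with hypothetical helper names; it computes billable seconds only and makes no assumption about prices.

```python
import math

MAX_SESSION_SECONDS = 3 * 60 * 60  # 3-hour session limit from the table


def billed_seconds(duration_seconds: float, channels: int = 1) -> float:
    """Multi-channel audio is billed as N x duration."""
    if channels < 1:
        raise ValueError("channels must be at least 1")
    return duration_seconds * channels


def sessions_needed(total_seconds: float) -> int:
    """How many sessions a long stream must be split across."""
    return math.ceil(total_seconds / MAX_SESSION_SECONDS)
```

A 10-minute stereo call is therefore billed as 1200 seconds, and a 4-hour broadcast needs at least two consecutive sessions.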

Managing Sessions

Use SDK methods for post-capture operations:

  • JavaScript: client.liveV2().get(id), .list(filters), .getFile(id), .delete(id)
  • Python: client.live().get(id), .list(filters), .get_file(id), .delete(id)

For full examples and pagination filters, see ./references/managing-sessions.md.
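
Deleting retained sessions in bulk can be sketched as a loop over the documented delete(id) method. `delete_sessions` is a hypothetical helper; it accepts any object with a `delete` method (for example the value returned by client.live()), and it catches a broad `Exception` because the SDK's exception types are not specified in this document.

```python
from typing import Dict, Iterable, Protocol


class LiveClient(Protocol):
    """Anything exposing the documented delete(id) method."""

    def delete(self, session_id: str) -> object: ...


def delete_sessions(live: LiveClient, session_ids: Iterable[str]) -> Dict[str, Exception]:
    """Delete each session and return the failures keyed by session id."""
    failures: Dict[str, Exception] = {}
    for session_id in session_ids:
        try:
            live.delete(session_id)
        except Exception as exc:  # SDK error types are an unknown here
            failures[session_id] = exc  # keep going, report at the end
    return failures
```

Usage would look like `delete_sessions(client.live(), ids)`, followed by logging or retrying whatever ends up in the returned dictionary.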

Common Mistakes

  • Audio format mismatch: the encoding, sample_rate, bit_depth, and channels in session config MUST match the actual audio stream exactly.
  • Forgetting to stop recording: until stopRecording() (JS) / stop_recording() (Python) is called, the session stays open and post-processing never runs.
  • Wrong audio file path: the audio download endpoint is /v2/live/:id/file, not /v2/live/:id/audio.

For the full list of gotchas and diagnostics, see the gladia-troubleshooting skill.

Further Reading

Security recommendations
Install only if you intend to use Gladia for live transcription. Obtain required participant consent before streaming or recording audio, secure the Gladia API key, choose the correct language settings, disable analytics features you do not need, use callback URLs only for trusted HTTPS endpoints, and delete retained sessions when no longer needed.
Capability tags
requires-sensitive-credentials
Capability assessment
Purpose & Capability
The skill's live audio streaming, transcription, translation, sentiment, named-entity recognition, summarization, session retrieval, audio download, and deletion guidance all fit the stated Gladia live transcription purpose.
Instruction Scope
Instructions are scoped to SDK-first Gladia usage with raw API references as fallback; the examples include English defaults and optional analytics/callback features that should be treated as examples and configured intentionally.
Install Mechanism
The artifact contains Markdown documentation only, with no executable scripts, dependencies, install hooks, or background components.
Credentials
Use requires a Gladia API key and can send live audio, transcript text, and derived metadata to Gladia and optionally to a user-provided callback URL, which is expected for this integration but privacy-sensitive.
Persistence & Privilege
No local persistence, privilege escalation, local credential/session access, or broad filesystem behavior is present; documented session deletion is user-directed.
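How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install gladia-live-transcription
  3. Once installed, invoke the Skill by name or trigger it with /gladia-live-transcription
  4. Provide the inputs described in the Skill's parameter documentation to get structured output.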
Version history
v1.0.1
  • Documentation formatting updated in SKILL.md: added missing front matter triple-dash separators for consistency.
  • No functional or API changes; all instructions and examples remain unchanged.
v1.0.0
  • Initial release of gladia-live-transcription skill for real-time speech-to-text streaming via Gladia WebSocket API.
  • Provides SDK-first guidance with fallback to raw WebSocket/REST only if SDK does not meet requirements.
  • Includes setup instructions, usage examples for JavaScript/TypeScript and Python, and core session configuration.
  • Documents recommended parameters, session lifecycle, key tuning (e.g., endpointing), supported audio formats, and concurrency/billing limits.
  • Lists common mistakes, troubleshooting resources, and links to full Gladia documentation.
Metadata
Slug: gladia-live-transcription
Version: 1.0.1
License: MIT-0
Total installs: 1
Current installs: 1
Versions: 2
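FAQ

What is Gladia Live Transcription?

Real-time speech-to-text streaming with Gladia via WebSocket. Use when the user needs live transcription, builds a voice agent, meeting recorder, call center... It is an AI agent Skill plugin for Claude Code / OpenClaw, with 31 total downloads to date.

How do I install Gladia Live Transcription?

Run the command "/install gladia-live-transcription" in the OpenClaw or Claude Code chat box for a one-step install. Installation itself needs no additional configuration; using the skill requires a Gladia API key.

Is Gladia Live Transcription free?

Yes. The skill itself is free and released under the MIT-0 license, so it can be downloaded, installed, and used freely. Gladia API usage is billed separately, per second of streamed audio.

Which platforms does Gladia Live Transcription support?

Gladia Live Transcription is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Gladia Live Transcription?

It is developed and maintained by Gladia (@gladiaio). The current version is v1.0.1.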
