gladiaio

Gladia Audio Intelligence

Author: Gladia · v1.0.1 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 29
Favorites: 0
Current installs: 1
Versions: 2
Install in OpenClaw
/install gladia-audio-intelligence
Description
Configure and use Gladia audio intelligence features: speaker diarization, translation, sentiment analysis, named entity recognition (NER), PII redaction, su...
Usage (SKILL.md)

Audio Intelligence

Gladia's audio intelligence features extract structured data and insights from transcripts. They work on top of the base transcription — most are enabled by adding options to the transcribe() call (pre-recorded) or the startSession() config (live).

SDK-first: always use the official SDK — see gladia-sdk-integration for policy, setup, and fallback criteria.

When to Use

  • User asks about a specific feature: diarization, translation, PII redaction, sentiment, NER, subtitles, summarization, etc.
  • Enabling or configuring one or more audio intelligence features on pre-recorded or live transcription
  • Understanding which features are available in live vs pre-recorded mode
  • Combining multiple features in a single transcription job

When NOT to use: For basic transcription without audio intelligence features, go directly to gladia-pre-recorded-transcription or gladia-live-transcription. For gotchas and errors related to specific features, see gladia-troubleshooting.

References

Consult these resources as needed:

  • ./references/live-audio-intelligence.md -- Detailed config and WebSocket responses for all live-mode features
  • ./references/pre-recorded-audio-intelligence.md -- Detailed config and response structures for all pre-recorded audio intelligence features
  • ../gladia-pre-recorded-transcription/SKILL.md -- Pre-recorded transcription workflow and options
  • ../gladia-live-transcription/SKILL.md -- Live transcription session config and event handling
  • ../gladia-sdk-integration/SKILL.md -- SDK setup, client initialization, and SDK vs raw API decision guide
  • ../gladia-troubleshooting/SKILL.md -- Common errors, gotchas, and verification checklist

Feature Availability

| Feature                  | Pre-recorded | Live | Config key                    |
|--------------------------|--------------|------|-------------------------------|
| Speaker diarization      | Yes          | No   | diarization                   |
| Translation              | Yes          | Yes  | translation                   |
| Sentiment analysis       | Yes          | Yes  | sentiment_analysis            |
| Named entity recognition | Yes          | Yes  | named_entity_recognition      |
| Subtitles (SRT/VTT)      | Yes          | No   | subtitles                     |
| Custom vocabulary        | Yes          | Yes  | custom_vocabulary             |
| PII redaction            | Yes          | No   | pii_redaction                 |
| Chapterization           | Yes          | Yes  | chapterization (post-process) |
| Summarization            | Yes          | Yes  | summarization (post-process)  |
| Audio-to-LLM             | Yes          | No   | audio_to_llm                  |
| Custom spelling          | Yes          | Yes  | custom_spelling               |
| Custom metadata          | Yes          | Yes  | custom_metadata               |

Live features split into two groups: real-time (results stream during the session) and post-processing (results arrive after stopRecording()). See ./references/live-audio-intelligence.md for details.
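The availability table can also be encoded as a small lookup so that a configuration is checked before a job is submitted. The following is a minimal Python sketch, not part of the Gladia SDK: `AVAILABILITY` and `unsupported_features` are hypothetical names, and the values simply mirror the table above.

```python
# Availability as listed in the table above: config key -> (pre_recorded, live).
AVAILABILITY = {
    "diarization": (True, False),
    "translation": (True, True),
    "sentiment_analysis": (True, True),
    "named_entity_recognition": (True, True),
    "subtitles": (True, False),
    "custom_vocabulary": (True, True),
    "pii_redaction": (True, False),
    "chapterization": (True, True),
    "summarization": (True, True),
    "audio_to_llm": (True, False),
    "custom_spelling": (True, True),
    "custom_metadata": (True, True),
}


def unsupported_features(requested, live):
    """Return the requested config keys that the chosen mode does not offer."""
    unknown = [key for key in requested if key not in AVAILABILITY]
    if unknown:
        raise KeyError(f"unknown feature keys: {unknown}")
    column = 1 if live else 0  # index into the (pre_recorded, live) tuple
    return [key for key in requested if not AVAILABILITY[key][column]]


# Diarization and PII redaction are flagged for a live session.
print(unsupported_features(["diarization", "translation", "pii_redaction"], live=True))
```

Running a check like this before building the session config gives an early, local error instead of a rejected request.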

Quick Config Examples

Code examples assume GladiaClient is already initialized — see gladia-sdk-integration for setup.

Speaker Diarization (pre-recorded only)

TypeScript:

const result = await client.preRecorded().transcribe("audio.mp3", {
  diarization: true,
  diarization_config: { number_of_speakers: 2 },
});
// Each utterance includes a `speaker` field (0-indexed integer)

Python:

result = client.prerecorded().transcribe("audio.mp3", {
    "diarization": True,
    "diarization_config": {"number_of_speakers": 2},
})
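Once each utterance carries a `speaker` index, a common next step is to group the text per speaker. The sketch below assumes utterances are available as plain dictionaries with a `speaker` field (as described above) and a `text` field; the `text` name and the sample data are assumptions for illustration, so check the actual response structure in ./references/pre-recorded-audio-intelligence.md.

```python
from collections import defaultdict


def group_by_speaker(utterances):
    """Collect utterance texts per speaker index, preserving utterance order."""
    grouped = defaultdict(list)
    for utterance in utterances:
        grouped[utterance["speaker"]].append(utterance["text"])
    return dict(grouped)


# Hypothetical sample shaped like a diarized result; not real API output.
sample = [
    {"speaker": 0, "text": "Hello."},
    {"speaker": 1, "text": "Hi, how are you?"},
    {"speaker": 0, "text": "Fine, thanks."},
]

for speaker, lines in group_by_speaker(sample).items():
    print(f"Speaker {speaker}: {' '.join(lines)}")
```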

Translation (pre-recorded and live)

Pre-recorded:

TypeScript:

const result = await client.preRecorded().transcribe("audio.mp3", {
  translation: true,
  translation_config: { target_languages: ["fr", "es"] },
});

Python:

result = client.prerecorded().transcribe("audio.mp3", {
    "translation": True,
    "translation_config": {"target_languages": ["fr", "es"]},
})

Live (result streams as translation WebSocket events — see live-audio-intelligence.md):

TypeScript:

const session = client.liveV2().startSession({
  // ... audio format options ...
  realtime_processing: {
    translation: true,
    translation_config: { target_languages: ["fr"] },
  },
});

Python:

from gladiaio_sdk import LiveV2InitRequest, LiveV2RealtimeProcessing

session = client.live().start_session(
    LiveV2InitRequest(
        # ... audio format options ...
        realtime_processing=LiveV2RealtimeProcessing(
            translation=True,
            translation_config={"target_languages": ["fr"]},
        ),
    )
)
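The only structural difference between the two translation examples is where the options sit: at the top level for pre-recorded jobs, and nested under `realtime_processing` for live sessions. A small helper can make that explicit. This is a sketch using plain dictionaries; `translation_options` is a hypothetical name and not an SDK function.

```python
def translation_options(target_languages, live=False):
    """Build translation options for pre-recorded or live use as a plain dict."""
    if not target_languages:
        raise ValueError("at least one target language is required")
    block = {
        "translation": True,
        "translation_config": {"target_languages": list(target_languages)},
    }
    # Live sessions nest the same block under realtime_processing.
    return {"realtime_processing": block} if live else block


print(translation_options(["fr", "es"]))
print(translation_options(["fr"], live=True))
```

The resulting dictionary would still need to be merged with the audio format options shown as elided in the live examples above.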

Summarization (pre-recorded and live)

Pre-recorded:

const result = await client.preRecorded().transcribe("audio.mp3", {
  summarization: true,
  summarization_config: { type: "bullet_points" },
});

Live (arrives after stopRecording() as a post_summarization event):

const session = client.liveV2().startSession({
  // ... audio format options ...
  post_processing: {
    summarization: true,
    summarization_config: { type: "bullet_points" },
  },
});
session.on("message", (msg) => {
  if (msg.type === "post_summarization") console.log(msg.data.results);
});
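The same filtering logic as the TypeScript handler above can be expressed in Python over messages that have already been decoded into dictionaries. This is a sketch under assumptions: it relies only on the `type` and `data.results` fields used above, the `transcript` message type and all sample payloads are invented for illustration, and it does not show how the Python SDK delivers events.

```python
def collect_summaries(messages):
    """Return the results payload of every post_summarization message."""
    return [
        message["data"]["results"]
        for message in messages
        if message.get("type") == "post_summarization"
    ]


# Hypothetical decoded messages; payload contents are placeholders.
messages = [
    {"type": "transcript", "data": {"utterance": "placeholder"}},
    {"type": "post_summarization", "data": {"results": "placeholder summary"}},
]

print(collect_summaries(messages))
```

Because the summary is a post-processing result, a handler like this only sees a match after the recording has been stopped.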

For full per-feature config options and response structures, see ./references/pre-recorded-audio-intelligence.md and ./references/live-audio-intelligence.md.

Common Mistakes

  • code_switching: true with empty languages: triggers evaluation across 100+ languages and causes frequent misdetections. Always provide 3-5 expected languages.
  • Custom vocabulary intensity above 0.6: values over 0.6 cause false positives where unrelated words get replaced. Keep at 0.4-0.6 and use pronunciations for better results.
  • Expecting diarization, PII redaction, subtitles, or audio-to-LLM in live mode: these four features are pre-recorded only.
  • Enabling many features simultaneously without considering cost/latency: each enabled feature adds processing time. Enable only what you need; combine diarization + summarization + translation only when all are required.

For the full gotcha list, see gladia-troubleshooting.
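The first two mistakes listed above are easy to catch with a local check before a request is sent. The sketch below is illustrative only: it assumes a flat options dictionary with the hypothetical keys `code_switching`, `languages`, and `custom_vocabulary_intensity`, which may not match where these settings actually live in a Gladia config.

```python
def lint_options(options):
    """Return warnings for the code-switching and vocabulary-intensity mistakes."""
    warnings = []

    # code_switching with no language list evaluates 100+ languages.
    if options.get("code_switching") and not options.get("languages"):
        warnings.append("code_switching is on with no languages; list 3-5 expected languages")

    # Intensity above 0.6 tends to replace unrelated words.
    intensity = options.get("custom_vocabulary_intensity")
    if intensity is not None and intensity > 0.6:
        warnings.append(f"custom vocabulary intensity {intensity} exceeds 0.6; keep it at 0.4-0.6")

    return warnings


# Hypothetical flat option layout, used only for this sketch.
risky = {"code_switching": True, "languages": [], "custom_vocabulary_intensity": 0.8}
for warning in lint_options(risky):
    print(warning)
```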

Security Usage Advice
Install only if you are comfortable using Gladia for the audio you process. Do not upload confidential, regulated, or third-party personal recordings unless you have permission and have checked retention, security, and compliance settings. Treat PII redaction as a downstream reduction step, not a guarantee that raw audio or transcripts were never exposed during processing.
Capability Assessment
Purpose & Capability
The skill explains how to configure Gladia audio intelligence features such as diarization, translation, NER, PII redaction, summarization, custom metadata, and audio-to-LLM, which matches its stated purpose.
Instruction Scope
Instructions are scoped to SDK-first configuration examples and reference docs, with raw REST only as a fallback. The main gap is limited privacy guidance around sensitive audio, transcripts, prompts, and metadata.
Install Mechanism
The artifact contains Markdown documentation files only. No executable scripts, dependency installs, autorun hooks, package mutation steps, or hidden install behavior were found.
Credentials
Sending audio and transcript content to Gladia is inherent to the documented integration and proportionate for the purpose, but users may need consent, data minimization, retention review, and compliance checks for sensitive recordings.
Persistence & Privilege
No local persistence, privilege escalation, credential harvesting, or background workers were found. The references disclose that custom metadata can be stored with Gladia jobs and returned in later list/get responses.
How to Use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install gladia-audio-intelligence
  3. Once installed, invoke the skill by name or trigger it with /gladia-audio-intelligence
  4. Provide the required inputs as described by the skill's parameters to get structured output
Version History
v1.0.1
  • Minor update to documentation formatting in SKILL.md.
  • Added missing frontmatter separator (`---`) at the top of the file.
  • No changes to functionality or feature descriptions.
v1.0.0
Initial release of gladia-audio-intelligence skill.
  • Supports configuration and usage guidance for audio intelligence features: speaker diarization, translation, sentiment analysis, NER, PII redaction, subtitles, summarization, chapterization, custom vocabulary, audio-to-LLM, custom spelling, and metadata.
  • Clearly distinguishes feature availability in pre-recorded versus live transcription modes.
  • Prioritizes SDK use, with fallback guidance to REST API when needed.
  • Provides quick configuration examples in TypeScript and Python.
  • Documents common mistakes and troubleshooting resources.
  • Includes reference links for further implementation details and documentation.
Metadata
Slug: gladia-audio-intelligence
Version: 1.0.1
License: MIT-0
Total installs: 1
Current installs: 1
Version count: 2
FAQ

What is Gladia Audio Intelligence?

Configure and use Gladia audio intelligence features: speaker diarization, translation, sentiment analysis, named entity recognition (NER), PII redaction, su... It is an AI agent skill plugin for Claude Code / OpenClaw, with 29 total downloads to date.

How do I install Gladia Audio Intelligence?

Run the command "/install gladia-audio-intelligence" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Gladia Audio Intelligence free?

Yes. Gladia Audio Intelligence is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does Gladia Audio Intelligence support?

Gladia Audio Intelligence is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Gladia Audio Intelligence?

It is developed and maintained by Gladia (@gladiaio); the current version is v1.0.1.
