Gladia Audio Intelligence

Name: Gladia Audio Intelligence
Author: gladiaio

Author: Gladia · GitHub · v1.0.1 · MIT-0

cross-platform ✓ Security check passed

Install in OpenClaw

/install gladia-audio-intelligence

Description

Configure and use Gladia audio intelligence features: speaker diarization, translation, sentiment analysis, named entity recognition (NER), PII redaction, su...

Usage (SKILL.md)

Audio Intelligence

Gladia's audio intelligence features extract structured data and insights from transcripts. They work on top of the base transcription — most are enabled by adding options to the transcribe() call (pre-recorded) or the startSession() config (live).

SDK-first: always use the official SDK — see gladia-sdk-integration for policy, setup, and fallback criteria.

When to Use

User asks about a specific feature: diarization, translation, PII redaction, sentiment, NER, subtitles, summarization, etc.
Enabling or configuring one or more audio intelligence features on pre-recorded or live transcription
Understanding which features are available in live vs pre-recorded mode
Combining multiple features in a single transcription job

When NOT to use: For basic transcription without audio intelligence features, go directly to gladia-pre-recorded-transcription or gladia-live-transcription. For gotchas and errors related to specific features, see gladia-troubleshooting.

References

Consult these resources as needed:

./references/live-audio-intelligence.md -- Detailed config and WebSocket responses for all live-mode features
./references/pre-recorded-audio-intelligence.md -- Detailed config and response structures for all pre-recorded audio intelligence features
../gladia-pre-recorded-transcription/SKILL.md -- Pre-recorded transcription workflow and options
../gladia-live-transcription/SKILL.md -- Live transcription session config and event handling
../gladia-sdk-integration/SKILL.md -- SDK setup, client initialization, and SDK vs raw API decision guide
../gladia-troubleshooting/SKILL.md -- Common errors, gotchas, and verification checklist

Feature Availability

Feature	Pre-recorded	Live	Config key
Speaker diarization	Yes	No	`diarization`
Translation	Yes	Yes	`translation`
Sentiment analysis	Yes	Yes	`sentiment_analysis`
Named entity recognition	Yes	Yes	`named_entity_recognition`
Subtitles (SRT/VTT)	Yes	No	`subtitles`
Custom vocabulary	Yes	Yes	`custom_vocabulary`
PII redaction	Yes	No	`pii_redaction`
Chapterization	Yes	Yes	`chapterization` (post-process)
Summarization	Yes	Yes	`summarization` (post-process)
Audio-to-LLM	Yes	No	`audio_to_llm`
Custom spelling	Yes	Yes	`custom_spelling`
Custom metadata	Yes	Yes	`custom_metadata`

Live features split into two groups: real-time (results stream during the session) and post-processing (results arrive after stopRecording()). See ./references/live-audio-intelligence.md for details.
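
The availability table above can be encoded as a small lookup, so that pre-recorded-only features are caught before a live session is configured. This is a minimal sketch: `FEATURES` and `unsupported_in_live` are hypothetical names, not part of the Gladia SDK, and the values are copied from the table.

```python
# Hypothetical helper (not part of the Gladia SDK).
# Each entry maps a config key from the table to (pre_recorded, live).
FEATURES = {
    "diarization": (True, False),
    "translation": (True, True),
    "sentiment_analysis": (True, True),
    "named_entity_recognition": (True, True),
    "subtitles": (True, False),
    "custom_vocabulary": (True, True),
    "pii_redaction": (True, False),
    "chapterization": (True, True),
    "summarization": (True, True),
    "audio_to_llm": (True, False),
    "custom_spelling": (True, True),
    "custom_metadata": (True, True),
}


def unsupported_in_live(requested):
    """Return the requested config keys that the table marks as pre-recorded only."""
    unknown = [key for key in requested if key not in FEATURES]
    if unknown:
        raise KeyError(f"Unknown feature key(s): {unknown}")
    return [key for key in requested if not FEATURES[key][1]]


print(unsupported_in_live(["translation", "diarization", "subtitles"]))
# → ['diarization', 'subtitles']
```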

Quick Config Examples

Code examples assume GladiaClient is already initialized — see gladia-sdk-integration for setup.

Speaker Diarization (pre-recorded only)

TypeScript:

const result = await client.preRecorded().transcribe("audio.mp3", {
  diarization: true,
  diarization_config: { number_of_speakers: 2 },
});
// Each utterance includes a `speaker` field (0-indexed integer)

Python:

result = client.prerecorded().transcribe("audio.mp3", {
    "diarization": True,
    "diarization_config": {"number_of_speakers": 2},
})
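
Once utterances carry a `speaker` index, a common next step is to group the text per speaker. The sketch below works on hypothetical sample data: only the 0-indexed integer `speaker` field comes from the description above, while the `text` field and the list-of-dicts shape are assumptions. Check the reference file for the actual response structure.

```python
from collections import defaultdict

# Hypothetical sample data: only the 0-indexed integer `speaker` field is taken
# from the text above; the `text` field and the list shape are assumptions.
utterances = [
    {"speaker": 0, "text": "Hello, thanks for calling."},
    {"speaker": 1, "text": "Hi, I have a billing question."},
    {"speaker": 0, "text": "Sure, let me take a look."},
]


def group_by_speaker(items):
    """Collect utterance texts per speaker index, preserving utterance order."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item["speaker"]].append(item["text"])
    return dict(grouped)


for speaker, lines in sorted(group_by_speaker(utterances).items()):
    print(f"Speaker {speaker}: {' '.join(lines)}")
```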

Translation (pre-recorded and live)

Pre-recorded:

TypeScript:

const result = await client.preRecorded().transcribe("audio.mp3", {
  translation: true,
  translation_config: { target_languages: ["fr", "es"] },
});

Python:

result = client.prerecorded().transcribe("audio.mp3", {
    "translation": True,
    "translation_config": {"target_languages": ["fr", "es"]},
})

Live (results stream as translation WebSocket events; see ./references/live-audio-intelligence.md):

TypeScript:

const session = client.liveV2().startSession({
  // ... audio format options ...
  realtime_processing: {
    translation: true,
    translation_config: { target_languages: ["fr"] },
  },
});

Python:

from gladiaio_sdk import LiveV2InitRequest, LiveV2RealtimeProcessing

session = client.live().start_session(
    LiveV2InitRequest(
        # ... audio format options ...
        realtime_processing=LiveV2RealtimeProcessing(
            translation=True,
            translation_config={"target_languages": ["fr"]},
        ),
    )
)

Summarization (pre-recorded and live)

Pre-recorded:

const result = await client.preRecorded().transcribe("audio.mp3", {
  summarization: true,
  summarization_config: { type: "bullet_points" },
});

Live (the summary arrives after stopRecording() as a post_summarization event):

const session = client.liveV2().startSession({
  // ... audio format options ...
  post_processing: {
    summarization: true,
    summarization_config: { type: "bullet_points" },
  },
});
session.on("message", (msg) => {
  if (msg.type === "post_summarization") console.log(msg.data.results);
});
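
The same filtering logic can be sketched in Python on plain message dicts, without relying on any SDK event API. The `post_summarization` type and the `data.results` path mirror the handler above; the `transcript` message and the string value of `results` are illustrative assumptions, and `extract_summary` is a hypothetical helper name.

```python
def extract_summary(messages):
    """Return the results of the first post_summarization message, or None."""
    for msg in messages:
        if msg.get("type") == "post_summarization":
            return msg.get("data", {}).get("results")
    return None


# Hypothetical messages: the "transcript" entry and the string payload are
# placeholders, not documented Gladia response shapes.
sample = [
    {"type": "transcript", "data": {"text": "..."}},
    {"type": "post_summarization", "data": {"results": "- Point one\n- Point two"}},
]

print(extract_summary(sample))
```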

For full per-feature config options and response structures, see:

Pre-recorded: ./references/pre-recorded-audio-intelligence.md
Live: ./references/live-audio-intelligence.md
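
When several features are combined in one pre-recorded job, it can help to build the options dict in one place and enable only what was requested. The sketch below uses only the config keys shown in the examples above; `build_prerecorded_options` and its parameter names are hypothetical and not part of the SDK.

```python
def build_prerecorded_options(speakers=None, target_languages=None, summary_type=None):
    """Build an options dict that enables only the requested features."""
    options = {}
    if speakers is not None:
        options["diarization"] = True
        options["diarization_config"] = {"number_of_speakers": speakers}
    if target_languages:
        options["translation"] = True
        options["translation_config"] = {"target_languages": list(target_languages)}
    if summary_type is not None:
        options["summarization"] = True
        options["summarization_config"] = {"type": summary_type}
    return options


options = build_prerecorded_options(
    speakers=2,
    target_languages=["fr", "es"],
    summary_type="bullet_points",
)
print(options)
```

The resulting dict has the same shape as the second argument passed to transcribe() in the Python examples above, so features left unset never reach the request.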

Common Mistakes

code_switching: true with empty languages: triggers evaluation across 100+ languages and causes frequent misdetections. Always provide 3-5 expected languages.
Custom vocabulary intensity above 0.6: values over 0.6 cause false positives where unrelated words get replaced. Keep at 0.4-0.6 and use pronunciations for better results.
Expecting diarization, PII redaction, subtitles, or audio-to-LLM in live mode: these four features are pre-recorded only.
Enabling many features simultaneously without considering cost/latency: each enabled feature adds processing time. Enable only what you need; combine diarization + summarization + translation only when all are required.

For the full gotcha list, see gladia-troubleshooting.
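
Several of the mistakes above can be checked mechanically before a job is submitted. The following is a minimal sketch of such a check, assuming a flat config dict: the `languages` and `custom_vocabulary_intensity` keys are hypothetical placements (the real API may nest them under other config objects), and `lint_config` is not an SDK function.

```python
# The four features the list above describes as pre-recorded only.
LIVE_UNSUPPORTED = {"diarization", "pii_redaction", "subtitles", "audio_to_llm"}


def lint_config(config, live=False):
    """Return warnings for the common mistakes listed above."""
    warnings = []
    # Hypothetical flat layout: `languages` and `custom_vocabulary_intensity`
    # may live under nested config objects in the real API.
    if config.get("code_switching") and not config.get("languages"):
        warnings.append("code_switching is on but languages is empty; list 3-5 expected languages")
    intensity = config.get("custom_vocabulary_intensity")
    if intensity is not None and intensity > 0.6:
        warnings.append("custom vocabulary intensity above 0.6; keep it at 0.4-0.6")
    if live:
        for key in sorted(LIVE_UNSUPPORTED):
            if config.get(key):
                warnings.append(f"{key} is pre-recorded only")
    return warnings


risky = {
    "code_switching": True,
    "languages": [],
    "custom_vocabulary_intensity": 0.8,
    "diarization": True,
}
for warning in lint_config(risky, live=True):
    print(warning)
```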

Install only if you are comfortable using Gladia for the audio you process. Do not upload confidential, regulated, or third-party personal recordings unless you have permission and have checked retention, security, and compliance settings. Treat PII redaction as a downstream reduction step, not a guarantee that raw audio or transcripts were never exposed during processing.

Capability Assessment

✓ Purpose & Capability

The skill explains how to configure Gladia audio intelligence features such as diarization, translation, NER, PII redaction, summarization, custom metadata, and audio-to-LLM, which matches its stated purpose.

ℹ Instruction Scope

Instructions are scoped to SDK-first configuration examples and reference docs, with raw REST only as a fallback. The main gap is limited privacy guidance around sensitive audio, transcripts, prompts, and metadata.

✓ Install Mechanism

The artifact contains Markdown documentation files only. No executable scripts, dependency installs, autorun hooks, package mutation steps, or hidden install behavior were found.

ℹ Credentials

Sending audio and transcript content to Gladia is inherent to the documented integration and proportionate for the purpose, but users may need consent, data minimization, retention review, and compliance checks for sensitive recordings.

ℹ Persistence & Privilege

No local persistence, privilege escalation, credential harvesting, or background workers were found. The references disclose that custom metadata can be stored with Gladia jobs and returned in later list/get responses.

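How to Use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install gladia-audio-intelligence
Once installed, invoke the Skill by name or trigger it with /gladia-audio-intelligence
Provide the inputs described in the Skill's parameter notes to get structured output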

Version History

v1.0.1

- Minor update to documentation formatting in SKILL.md.
- Added missing frontmatter separator (`---`) at the top of the file.
- No changes to functionality or feature descriptions.

v1.0.0

Initial release of gladia-audio-intelligence skill.
- Supports configuration and usage guidance for audio intelligence features: speaker diarization, translation, sentiment analysis, NER, PII redaction, subtitles, summarization, chapterization, custom vocabulary, audio-to-LLM, custom spelling, and metadata.
- Clearly distinguishes feature availability in pre-recorded versus live transcription modes.
- Prioritizes SDK use, with fallback guidance to REST API when needed.
- Provides quick configuration examples in TypeScript and Python.
- Documents common mistakes and troubleshooting resources.
- Includes reference links for further implementation details and documentation.

Metadata

Slug gladia-audio-intelligence

Version 1.0.1

License MIT-0

Total installs 1

Current installs 1

Version count 2

FAQ