gladiaio

Gladia Audio Intelligence

by Gladia · GitHub ↗ · v1.0.1 · MIT-0
cross-platform ✓ Security Clean
29 Downloads · 0 Stars · 1 Active Installs · 2 Versions
Install in OpenClaw
/install gladia-audio-intelligence
Description
Configure and use Gladia audio intelligence features: speaker diarization, translation, sentiment analysis, named entity recognition (NER), PII redaction, su...
README (SKILL.md)

Audio Intelligence

Gladia's audio intelligence features extract structured data and insights from transcripts. They work on top of the base transcription — most are enabled by adding options to the transcribe() call (pre-recorded) or the startSession() config (live).

SDK-first: always use the official SDK — see gladia-sdk-integration for policy, setup, and fallback criteria.

When to Use

  • User asks about a specific feature: diarization, translation, PII redaction, sentiment, NER, subtitles, summarization, etc.
  • Enabling or configuring one or more audio intelligence features on pre-recorded or live transcription
  • Understanding which features are available in live vs pre-recorded mode
  • Combining multiple features in a single transcription job

When NOT to use: For basic transcription without audio intelligence features, go directly to gladia-pre-recorded-transcription or gladia-live-transcription. For gotchas and errors related to specific features, see gladia-troubleshooting.

References

Consult these resources as needed:

  • ./references/live-audio-intelligence.md -- Detailed config and WebSocket responses for all live-mode features
  • ./references/pre-recorded-audio-intelligence.md -- Detailed config and response structures for all pre-recorded audio intelligence features
  • ../gladia-pre-recorded-transcription/SKILL.md -- Pre-recorded transcription workflow and options
  • ../gladia-live-transcription/SKILL.md -- Live transcription session config and event handling
  • ../gladia-sdk-integration/SKILL.md -- SDK setup, client initialization, and SDK vs raw API decision guide
  • ../gladia-troubleshooting/SKILL.md -- Common errors, gotchas, and verification checklist

Feature Availability

| Feature | Pre-recorded | Live | Config key |
|---|---|---|---|
| Speaker diarization | Yes | No | diarization |
| Translation | Yes | Yes | translation |
| Sentiment analysis | Yes | Yes | sentiment_analysis |
| Named entity recognition | Yes | Yes | named_entity_recognition |
| Subtitles (SRT/VTT) | Yes | No | subtitles |
| Custom vocabulary | Yes | Yes | custom_vocabulary |
| PII redaction | Yes | No | pii_redaction |
| Chapterization | Yes | Yes | chapterization (post-process) |
| Summarization | Yes | Yes | summarization (post-process) |
| Audio-to-LLM | Yes | No | audio_to_llm |
| Custom spelling | Yes | Yes | custom_spelling |
| Custom metadata | Yes | Yes | custom_metadata |

Live features split into two groups: real-time (results stream during the session) and post-processing (results arrive after stopRecording()). See ./references/live-audio-intelligence.md for details.
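As a rough illustration of the split described above, the sketch below maps a list of requested config keys onto the two live groups and rejects the keys the table marks as pre-recorded only. The helper name buildLiveFeatureConfig is hypothetical and not part of the Gladia SDK. Placing sentiment_analysis and named_entity_recognition under realtime_processing is an assumption: this section only confirms translation (real-time) and summarization and chapterization (post-processing). Custom vocabulary, custom spelling, and custom metadata are left out because their placement is not stated here.

```typescript
type LiveGroup = "realtime_processing" | "post_processing";

// null means "pre-recorded only" according to the availability table.
const LIVE_FEATURE_GROUPS = new Map<string, LiveGroup | null>([
  ["diarization", null],
  ["subtitles", null],
  ["pii_redaction", null],
  ["audio_to_llm", null],
  ["translation", "realtime_processing"],
  ["sentiment_analysis", "realtime_processing"], // assumed placement
  ["named_entity_recognition", "realtime_processing"], // assumed placement
  ["chapterization", "post_processing"],
  ["summarization", "post_processing"],
]);

interface LiveFeatureConfig {
  realtime_processing: Record<string, boolean>;
  post_processing: Record<string, boolean>;
}

// Hypothetical helper: turns feature keys into the two live config groups,
// throwing early for features that cannot run in live mode.
function buildLiveFeatureConfig(features: string[]): LiveFeatureConfig {
  const config: LiveFeatureConfig = {
    realtime_processing: {},
    post_processing: {},
  };
  for (const feature of features) {
    const group = LIVE_FEATURE_GROUPS.get(feature);
    if (group === undefined) {
      throw new Error(`Unknown feature: ${feature}`);
    }
    if (group === null) {
      throw new Error(`${feature} is pre-recorded only`);
    }
    config[group][feature] = true;
  }
  return config;
}
```

The returned object could be spread into the startSession() options next to the audio format settings. Per-feature *_config objects would still need to be added by hand.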

Quick Config Examples

Code examples assume GladiaClient is already initialized — see gladia-sdk-integration for setup.

Speaker Diarization (pre-recorded only)

TypeScript:

const result = await client.preRecorded().transcribe("audio.mp3", {
  diarization: true,
  diarization_config: { number_of_speakers: 2 },
});
// Each utterance includes a `speaker` field (0-indexed integer)

Python:

result = client.prerecorded().transcribe("audio.mp3", {
    "diarization": True,
    "diarization_config": {"number_of_speakers": 2},
})
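Once diarization is enabled, a common next step is to collect what each speaker said. The following is a minimal sketch of that grouping, assuming each utterance can be reduced to a speaker index plus its text. Only the speaker field is confirmed above; the text field name and the groupBySpeaker helper are hypothetical, so check the actual response structure in ./references/pre-recorded-audio-intelligence.md before relying on it.

```typescript
// Assumed utterance shape: `speaker` is documented above,
// `text` is a hypothetical field name used for illustration.
interface DiarizedUtterance {
  speaker: number;
  text: string;
}

// Groups utterance texts by speaker index, preserving utterance order.
function groupBySpeaker(
  utterances: DiarizedUtterance[],
): Map<number, string[]> {
  const bySpeaker = new Map<number, string[]>();
  for (const utterance of utterances) {
    const lines = bySpeaker.get(utterance.speaker) ?? [];
    lines.push(utterance.text);
    bySpeaker.set(utterance.speaker, lines);
  }
  return bySpeaker;
}
```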

Translation (pre-recorded and live)

Pre-recorded:

TypeScript:

const result = await client.preRecorded().transcribe("audio.mp3", {
  translation: true,
  translation_config: { target_languages: ["fr", "es"] },
});

Python:

result = client.prerecorded().transcribe("audio.mp3", {
    "translation": True,
    "translation_config": {"target_languages": ["fr", "es"]},
})

Live (result streams as translation WebSocket events — see live-audio-intelligence.md):

TypeScript:

const session = client.liveV2().startSession({
  // ... audio format options ...
  realtime_processing: {
    translation: true,
    translation_config: { target_languages: ["fr"] },
  },
});

Python:

from gladiaio_sdk import LiveV2InitRequest, LiveV2RealtimeProcessing

session = client.live().start_session(
    LiveV2InitRequest(
        # ... audio format options ...
        realtime_processing=LiveV2RealtimeProcessing(
            translation=True,
            translation_config={"target_languages": ["fr"]},
        ),
    )
)

Summarization (pre-recorded and live)

Pre-recorded:

const result = await client.preRecorded().transcribe("audio.mp3", {
  summarization: true,
  summarization_config: { type: "bullet_points" },
});

Live (arrives after stopRecording() as a post_summarization event):

const session = client.liveV2().startSession({
  // ... audio format options ...
  post_processing: {
    summarization: true,
    summarization_config: { type: "bullet_points" },
  },
});
session.on("message", (msg) => {
  if (msg.type === "post_summarization") console.log(msg.data.results);
});
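When several live features are enabled, the single "message" listener shown above has to tell real-time events (such as translation) apart from post-processing events (such as post_summarization). One possible way to keep that readable is a small router keyed on msg.type. This is only a sketch: createMessageRouter is a hypothetical helper, and the message shape is reduced to the type and data fields used in the snippet above.

```typescript
// Minimal message shape, based on the `msg.type` / `msg.data` usage above.
interface LiveMessage {
  type: string;
  data?: unknown;
}

type MessageHandler = (data: unknown) => void;

// Returns a listener that dispatches each message to the handler registered
// for its type. The listener reports whether a handler was found.
function createMessageRouter(
  handlers: Map<string, MessageHandler>,
): (msg: LiveMessage) => boolean {
  return (msg: LiveMessage): boolean => {
    const handler = handlers.get(msg.type);
    if (handler === undefined) {
      return false; // ignore event types nobody registered for
    }
    handler(msg.data);
    return true;
  };
}

// Example wiring; with a real session this would be passed to
// session.on("message", routeMessage).
const seenEvents: string[] = [];
const routeMessage = createMessageRouter(
  new Map<string, MessageHandler>([
    ["translation", () => { seenEvents.push("translation"); }],
    ["post_summarization", () => { seenEvents.push("post_summarization"); }],
  ]),
);
```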

For full per-feature config options and response structures, see ./references/pre-recorded-audio-intelligence.md and ./references/live-audio-intelligence.md.

Common Mistakes

  • code_switching: true with empty languages: triggers evaluation across 100+ languages and causes frequent misdetections. Always provide 3-5 expected languages.
  • Custom vocabulary intensity above 0.6: values over 0.6 cause false positives where unrelated words get replaced. Keep at 0.4-0.6 and use pronunciations for better results.
  • Expecting diarization, PII redaction, subtitles, or audio-to-LLM in live mode: these four features are pre-recorded only.
  • Enabling many features simultaneously without considering cost/latency: each enabled feature adds processing time. Enable only what you need; combine diarization + summarization + translation only when all are required.
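The first three mistakes above are mechanical enough to check before a job is submitted. The sketch below shows one way to do that. The FeatureSettings shape and the lintFeatureSettings helper are hypothetical simplifications for illustration and do not mirror the real Gladia request schema; only the thresholds and the list of pre-recorded-only features come from this section.

```typescript
// Hypothetical, flattened view of the settings worth checking.
interface FeatureSettings {
  mode: "pre-recorded" | "live";
  features: string[]; // config keys, e.g. "diarization"
  codeSwitching?: boolean;
  languages?: string[];
  vocabularyIntensity?: number;
}

// Features listed above as pre-recorded only.
const PRE_RECORDED_ONLY = [
  "diarization",
  "pii_redaction",
  "subtitles",
  "audio_to_llm",
];

// Returns one warning per detected mistake; an empty array means no issues
// were found by these particular checks.
function lintFeatureSettings(settings: FeatureSettings): string[] {
  const warnings: string[] = [];
  const languages = settings.languages ?? [];

  if (settings.codeSwitching === true && languages.length === 0) {
    warnings.push("code_switching is on with no languages: list 3-5 expected languages");
  }
  if (
    settings.vocabularyIntensity !== undefined &&
    settings.vocabularyIntensity > 0.6
  ) {
    warnings.push("custom vocabulary intensity above 0.6: keep it at 0.4-0.6");
  }
  if (settings.mode === "live") {
    for (const feature of settings.features) {
      if (PRE_RECORDED_ONLY.indexOf(feature) !== -1) {
        warnings.push(`${feature} is pre-recorded only`);
      }
    }
  }
  return warnings;
}
```

The cost and latency trade-off in the last bullet is a judgment call, so it is deliberately not encoded as a rule here.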

For the full gotcha list, see gladia-troubleshooting.

Usage Guidance
Install only if you are comfortable using Gladia for the audio you process. Do not upload confidential, regulated, or third-party personal recordings unless you have permission and have checked retention, security, and compliance settings. Treat PII redaction as a downstream reduction step, not a guarantee that raw audio or transcripts were never exposed during processing.
Capability Assessment
Purpose & Capability
The skill explains how to configure Gladia audio intelligence features such as diarization, translation, NER, PII redaction, summarization, custom metadata, and audio-to-LLM, which matches its stated purpose.
Instruction Scope
Instructions are scoped to SDK-first configuration examples and reference docs, with raw REST only as a fallback. The main gap is limited privacy guidance around sensitive audio, transcripts, prompts, and metadata.
Install Mechanism
The artifact contains Markdown documentation files only. No executable scripts, dependency installs, autorun hooks, package mutation steps, or hidden install behavior were found.
Credentials
Sending audio and transcript content to Gladia is inherent to the documented integration and proportionate for the purpose, but users may need consent, data minimization, retention review, and compliance checks for sensitive recordings.
Persistence & Privilege
No local persistence, privilege escalation, credential harvesting, or background workers were found. The references disclose that custom metadata can be stored with Gladia jobs and returned in later list/get responses.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install gladia-audio-intelligence
  3. After installation, invoke the skill by name or use /gladia-audio-intelligence
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.1
  • Minor update to documentation formatting in SKILL.md.
  • Added missing frontmatter separator (`---`) at the top of the file.
  • No changes to functionality or feature descriptions.
v1.0.0
Initial release of the gladia-audio-intelligence skill.
  • Supports configuration and usage guidance for audio intelligence features: speaker diarization, translation, sentiment analysis, NER, PII redaction, subtitles, summarization, chapterization, custom vocabulary, audio-to-LLM, custom spelling, and metadata.
  • Clearly distinguishes feature availability in pre-recorded versus live transcription modes.
  • Prioritizes SDK use, with fallback guidance to REST API when needed.
  • Provides quick configuration examples in TypeScript and Python.
  • Documents common mistakes and troubleshooting resources.
  • Includes reference links for further implementation details and documentation.
Metadata
Slug gladia-audio-intelligence
Version 1.0.1
License MIT-0
All-time Installs 1
Active Installs 1
Total Versions 2
Frequently Asked Questions

What is Gladia Audio Intelligence?

Configure and use Gladia audio intelligence features: speaker diarization, translation, sentiment analysis, named entity recognition (NER), PII redaction, su... It is an AI Agent Skill for Claude Code / OpenClaw, with 29 downloads so far.

How do I install Gladia Audio Intelligence?

Run "/install gladia-audio-intelligence" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Gladia Audio Intelligence free?

Yes, Gladia Audio Intelligence is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Gladia Audio Intelligence support?

Gladia Audio Intelligence is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Gladia Audio Intelligence?

It is built and maintained by Gladia (@gladiaio); the current version is v1.0.1.
