
Gladia Audio Intelligence

Name: Gladia Audio Intelligence
Author: gladiaio

by Gladia · v1.0.1 · MIT-0

cross-platform ✓ Security Clean

Install in OpenClaw

/install gladia-audio-intelligence

Description

Configure and use Gladia audio intelligence features: speaker diarization, translation, sentiment analysis, named entity recognition (NER), PII redaction, su...

README (SKILL.md)

Audio Intelligence

Gladia's audio intelligence features extract structured data and insights from transcripts. They work on top of the base transcription — most are enabled by adding options to the transcribe() call (pre-recorded) or the startSession() config (live).

SDK-first: always use the official SDK — see gladia-sdk-integration for policy, setup, and fallback criteria.

When to Use

- Answering questions about a specific feature: diarization, translation, PII redaction, sentiment, NER, subtitles, summarization, etc.
- Enabling or configuring one or more audio intelligence features on pre-recorded or live transcription
- Understanding which features are available in live vs pre-recorded mode
- Combining multiple features in a single transcription job

When NOT to use: For basic transcription without audio intelligence features, go directly to gladia-pre-recorded-transcription or gladia-live-transcription. For gotchas and errors related to specific features, see gladia-troubleshooting.

References

Consult these resources as needed:

- ./references/live-audio-intelligence.md -- Detailed config and WebSocket responses for all live-mode features
- ./references/pre-recorded-audio-intelligence.md -- Detailed config and response structures for all pre-recorded audio intelligence features
- ../gladia-pre-recorded-transcription/SKILL.md -- Pre-recorded transcription workflow and options
- ../gladia-live-transcription/SKILL.md -- Live transcription session config and event handling
- ../gladia-sdk-integration/SKILL.md -- SDK setup, client initialization, and SDK vs raw API decision guide
- ../gladia-troubleshooting/SKILL.md -- Common errors, gotchas, and verification checklist

Feature Availability

| Feature | Pre-recorded | Live | Config key |
| --- | --- | --- | --- |
| Speaker diarization | Yes | No | `diarization` |
| Translation | Yes | Yes | `translation` |
| Sentiment analysis | Yes | Yes | `sentiment_analysis` |
| Named entity recognition | Yes | Yes | `named_entity_recognition` |
| Subtitles (SRT/VTT) | Yes | No | `subtitles` |
| Custom vocabulary | Yes | Yes | `custom_vocabulary` |
| PII redaction | Yes | No | `pii_redaction` |
| Chapterization | Yes | Yes | `chapterization` (post-process) |
| Summarization | Yes | Yes | `summarization` (post-process) |
| Audio-to-LLM | Yes | No | `audio_to_llm` |
| Custom spelling | Yes | Yes | `custom_spelling` |
| Custom metadata | Yes | Yes | `custom_metadata` |

Live features split into two groups: real-time (results stream during the session) and post-processing (results arrive after stopRecording()). See ./references/live-audio-intelligence.md for details.
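The split between the two live groups can be sketched as a small helper that sorts requested feature keys into a `realtime_processing` and a `post_processing` section and rejects pre-recorded-only features. This is an illustrative sketch, not part of the Gladia SDK: the function name `build_live_processing_config` is hypothetical, and placing `sentiment_analysis` and `named_entity_recognition` under `realtime_processing` is an assumption (the examples in this document only show `translation` there and `summarization` under `post_processing`).

```python
# Availability copied from the Feature Availability table.
PRE_RECORDED_ONLY = {"diarization", "subtitles", "pii_redaction", "audio_to_llm"}

# Marked "(post-process)" in the table.
LIVE_POST_PROCESSING = {"chapterization", "summarization"}

# ASSUMPTION: these are configured under realtime_processing in live mode.
# This document only confirms that placement for translation.
LIVE_REALTIME = {"translation", "sentiment_analysis", "named_entity_recognition"}


def build_live_processing_config(features):
    """Group feature keys into the two live config sections.

    Hypothetical helper: returns a plain dict and does not call any SDK.
    """
    config = {"realtime_processing": {}, "post_processing": {}}
    for key in features:
        if key in PRE_RECORDED_ONLY:
            raise ValueError(f"{key} is pre-recorded only")
        if key in LIVE_POST_PROCESSING:
            config["post_processing"][key] = True
        elif key in LIVE_REALTIME:
            config["realtime_processing"][key] = True
        else:
            raise ValueError(f"unknown or unsupported live feature: {key}")
    return config


live_sections = build_live_processing_config(["translation", "summarization"])
```

Failing early on a pre-recorded-only key keeps the mistake visible before a session is opened; check ./references/live-audio-intelligence.md for the actual placement of each feature.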

Quick Config Examples

Code examples assume GladiaClient is already initialized — see gladia-sdk-integration for setup.

Speaker Diarization (pre-recorded only)

// TypeScript
const result = await client.preRecorded().transcribe("audio.mp3", {
  diarization: true,
  diarization_config: { number_of_speakers: 2 },
});
// Each utterance includes a `speaker` field (0-indexed integer)

# Python
result = client.prerecorded().transcribe("audio.mp3", {
    "diarization": True,
    "diarization_config": {"number_of_speakers": 2},
})
# Each utterance includes a `speaker` field (0-indexed integer)

Translation (pre-recorded and live)

Pre-recorded:

// TypeScript: request French and Spanish translations alongside the transcript
const result = await client.preRecorded().transcribe("audio.mp3", {
  translation: true,
  translation_config: { target_languages: ["fr", "es"] },
});

# Python: same options, passed as a dict
result = client.prerecorded().transcribe("audio.mp3", {
    "translation": True,
    "translation_config": {"target_languages": ["fr", "es"]},
})

Live (results stream as `translation` WebSocket events; see ./references/live-audio-intelligence.md):

// TypeScript: translation is a real-time feature, so it goes under realtime_processing
const session = client.liveV2().startSession({
  // ... audio format options ...
  realtime_processing: {
    translation: true,
    translation_config: { target_languages: ["fr"] },
  },
});

# Python: same session config, built from the SDK request types
from gladiaio_sdk import LiveV2InitRequest, LiveV2RealtimeProcessing

session = client.live().start_session(
    LiveV2InitRequest(
        # ... audio format options ...
        realtime_processing=LiveV2RealtimeProcessing(
            translation=True,
            translation_config={"target_languages": ["fr"]},
        ),
    )
)

Summarization (pre-recorded and live)

Pre-recorded:

// TypeScript
const result = await client.preRecorded().transcribe("audio.mp3", {
  summarization: true,
  summarization_config: { type: "bullet_points" },
});

Live (the summary arrives after stopRecording() as a `post_summarization` event):

// TypeScript: summarization is a post-processing feature, so it goes under post_processing
const session = client.liveV2().startSession({
  // ... audio format options ...
  post_processing: {
    summarization: true,
    summarization_config: { type: "bullet_points" },
  },
});
// Log the summary once the post-processing event is delivered
session.on("message", (msg) => {
  if (msg.type === "post_summarization") console.log(msg.data.results);
});
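The same filtering step can be sketched in Python without the SDK. This minimal example assumes messages arrive as JSON strings shaped like the TypeScript handler implies (a `type` field and a `data.results` payload); the helper name `collect_post_summaries` and the sample payloads are hypothetical, and the real SDK may hand you parsed objects instead of raw strings.

```python
import json


def collect_post_summaries(raw_messages):
    """Return the data.results payload of every post_summarization message.

    Hypothetical helper: assumes each item is a JSON string with a "type" key.
    """
    summaries = []
    for raw in raw_messages:
        msg = json.loads(raw)
        if msg.get("type") == "post_summarization":
            summaries.append(msg["data"]["results"])
    return summaries


# Made-up payloads for illustration only; real event bodies carry more fields.
sample_messages = [
    json.dumps({"type": "some_other_event", "data": {}}),
    json.dumps({"type": "post_summarization", "data": {"results": "- point one"}}),
]
summaries = collect_post_summaries(sample_messages)
```

Messages of any other type are ignored, so the helper can sit behind a general-purpose message callback.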

For full per-feature config options and response structures, see:

Pre-recorded: ./references/pre-recorded-audio-intelligence.md
Live: ./references/live-audio-intelligence.md

Common Mistakes

- `code_switching: true` with empty `languages`: triggers evaluation across 100+ languages and causes frequent misdetections. Always provide 3-5 expected languages.
- Custom vocabulary intensity above 0.6: causes false positives where unrelated words get replaced. Keep intensity at 0.4-0.6 and use pronunciations for better results.
- Expecting diarization, PII redaction, subtitles, or audio-to-LLM in live mode: these four features are pre-recorded only.
- Enabling many features simultaneously without considering cost and latency: each enabled feature adds processing time. Enable only what you need; combine diarization + summarization + translation only when all are required.
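The first three mistakes above can be caught before a request is sent. The sketch below is a hypothetical pre-flight check, not an SDK feature: the function name `lint_config` is invented, and the key layout (`language_config.languages`, `language_config.code_switching`, `custom_vocabulary_config.vocabulary[].intensity`) is an assumption that should be verified against the reference files.

```python
# Features listed in this document as pre-recorded only.
LIVE_UNSUPPORTED = ("diarization", "pii_redaction", "subtitles", "audio_to_llm")


def lint_config(config, mode="pre-recorded"):
    """Return a list of warnings for the common mistakes listed above.

    ASSUMPTION: config uses the nested key layout described in the lead-in.
    """
    warnings = []

    # Mistake 1: code switching enabled without a language shortlist.
    lang = config.get("language_config", {})
    if lang.get("code_switching") and not lang.get("languages"):
        warnings.append(
            "code_switching is on but languages is empty; list 3-5 expected languages"
        )

    # Mistake 2: custom vocabulary intensity above 0.6.
    vocab = config.get("custom_vocabulary_config", {}).get("vocabulary", [])
    for entry in vocab:
        if isinstance(entry, dict) and entry.get("intensity", 0) > 0.6:
            warnings.append(
                f"custom vocabulary intensity {entry['intensity']} "
                f"for {entry.get('value')!r} is above 0.6"
            )

    # Mistake 3: pre-recorded-only features requested in live mode.
    if mode == "live":
        for key in LIVE_UNSUPPORTED:
            if config.get(key):
                warnings.append(f"{key} is pre-recorded only")

    return warnings


risky = {
    "language_config": {"code_switching": True, "languages": []},
    "custom_vocabulary_config": {"vocabulary": [{"value": "Gladia", "intensity": 0.8}]},
    "diarization": True,
}
problems = lint_config(risky, mode="live")
```

Returning warnings instead of raising lets a caller decide whether to block the request or only log the issue.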

For the full gotcha list, see gladia-troubleshooting.

Install only if you are comfortable using Gladia for the audio you process. Do not upload confidential, regulated, or third-party personal recordings unless you have permission and have checked retention, security, and compliance settings. Treat PII redaction as a downstream reduction step, not a guarantee that raw audio or transcripts were never exposed during processing.

Capability Assessment

✓ Purpose & Capability

The skill explains how to configure Gladia audio intelligence features such as diarization, translation, NER, PII redaction, summarization, custom metadata, and audio-to-LLM, which matches its stated purpose.

ℹ Instruction Scope

Instructions are scoped to SDK-first configuration examples and reference docs, with raw REST only as a fallback. The main gap is limited privacy guidance around sensitive audio, transcripts, prompts, and metadata.

✓ Install Mechanism

The artifact contains Markdown documentation files only. No executable scripts, dependency installs, autorun hooks, package mutation steps, or hidden install behavior were found.

ℹ Credentials

Sending audio and transcript content to Gladia is inherent to the documented integration and proportionate for the purpose, but users may need consent, data minimization, retention review, and compliance checks for sensitive recordings.

ℹ Persistence & Privilege

No local persistence, privilege escalation, credential harvesting, or background workers were found. The references disclose that custom metadata can be stored with Gladia jobs and returned in later list/get responses.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install gladia-audio-intelligence
After installation, invoke the skill by name or use /gladia-audio-intelligence
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.1

- Minor update to documentation formatting in SKILL.md.
- Added missing frontmatter separator (`---`) at the top of the file.
- No changes to functionality or feature descriptions.

v1.0.0

Initial release of gladia-audio-intelligence skill.

- Supports configuration and usage guidance for audio intelligence features: speaker diarization, translation, sentiment analysis, NER, PII redaction, subtitles, summarization, chapterization, custom vocabulary, audio-to-LLM, custom spelling, and metadata.
- Clearly distinguishes feature availability in pre-recorded versus live transcription modes.
- Prioritizes SDK use, with fallback guidance to REST API when needed.
- Provides quick configuration examples in TypeScript and Python.
- Documents common mistakes and troubleshooting resources.
- Includes reference links for further implementation details and documentation.

Metadata

Slug gladia-audio-intelligence

Version 1.0.1

License MIT-0

All-time Installs 1

Active Installs 1

Total Versions 2

Frequently Asked Questions

What is Gladia Audio Intelligence?

Configure and use Gladia audio intelligence features: speaker diarization, translation, sentiment analysis, named entity recognition (NER), PII redaction, su... It is an AI Agent Skill for Claude Code / OpenClaw, with 29 downloads so far.

How do I install Gladia Audio Intelligence?

Run "/install gladia-audio-intelligence" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Gladia Audio Intelligence free?

Yes, Gladia Audio Intelligence is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Gladia Audio Intelligence support?

Gladia Audio Intelligence is cross-platform and runs anywhere OpenClaw or Claude Code is available.

Who created Gladia Audio Intelligence?

It is built and maintained by Gladia (@gladiaio); the current version is v1.0.1.
