
Azure Ai Voicelive Py

by thegovind · GitHub ↗ · v0.1.0
cross-platform · ⚠ suspicious · 2144 downloads · 2 stars · 0 active installs · 1 version
Install in OpenClaw
/install azure-ai-voicelive-py
Description
Build real-time voice AI applications using Azure AI Voice Live SDK (azure-ai-voicelive). Use this skill when creating Python applications that need real-time bidirectional audio communication with Azure AI, including voice assistants, voice-enabled chatbots, real-time speech-to-speech translation, voice-driven avatars, or any WebSocket-based audio streaming with AI models. Supports Server VAD (Voice Activity Detection), turn-based conversation, function calling, MCP tools, avatar integration, and transcription.
README (SKILL.md)

Azure AI Voice Live SDK

Build real-time voice AI applications with bidirectional WebSocket communication.

Installation

pip install azure-ai-voicelive aiohttp azure-identity

Environment Variables

AZURE_COGNITIVE_SERVICES_ENDPOINT=https://<region>.api.cognitive.microsoft.com
# For API key auth (not recommended for production)
AZURE_COGNITIVE_SERVICES_KEY=<api-key>
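
A small loader can check these variables before any connection is attempted, so a missing endpoint fails early with a clear message. The sketch below is an illustration only: `VoiceLiveConfig` and `load_voicelive_config` are hypothetical names, not part of the SDK, and the `https://` check is an assumption based on the endpoint format shown above.

```python
import os
from typing import Mapping, NamedTuple, Optional


class VoiceLiveConfig(NamedTuple):
    endpoint: str
    api_key: Optional[str]  # None means "fall back to DefaultAzureCredential"


def load_voicelive_config(env: Mapping[str, str] = os.environ) -> VoiceLiveConfig:
    """Read the two variables listed above and fail early if the endpoint is unusable."""
    endpoint = env.get("AZURE_COGNITIVE_SERVICES_ENDPOINT", "").strip()
    if not endpoint:
        raise RuntimeError("AZURE_COGNITIVE_SERVICES_ENDPOINT is not set")
    if not endpoint.startswith("https://"):
        raise RuntimeError("AZURE_COGNITIVE_SERVICES_ENDPOINT should be an https:// URL")
    # The key is optional; an empty string is treated the same as unset.
    api_key = env.get("AZURE_COGNITIVE_SERVICES_KEY") or None
    return VoiceLiveConfig(endpoint=endpoint.rstrip("/"), api_key=api_key)
```

Passing a plain dict instead of `os.environ` keeps the function easy to test without touching the real environment.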

Authentication

DefaultAzureCredential (preferred):

import os

from azure.ai.voicelive.aio import connect
from azure.identity.aio import DefaultAzureCredential

# Run the block below inside a coroutine; `async with` is not valid at module level.

async with connect(
    endpoint=os.environ["AZURE_COGNITIVE_SERVICES_ENDPOINT"],
    credential=DefaultAzureCredential(),
    model="gpt-4o-realtime-preview",
    credential_scopes=["https://cognitiveservices.azure.com/.default"]
) as conn:
    ...

API Key:

import os

from azure.ai.voicelive.aio import connect
from azure.core.credentials import AzureKeyCredential

# Run the block below inside a coroutine; `async with` is not valid at module level.

async with connect(
    endpoint=os.environ["AZURE_COGNITIVE_SERVICES_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_COGNITIVE_SERVICES_KEY"]),
    model="gpt-4o-realtime-preview"
) as conn:
    ...

Quick Start

import asyncio
import os
from azure.ai.voicelive.aio import connect
from azure.identity.aio import DefaultAzureCredential

async def main():
    async with connect(
        endpoint=os.environ["AZURE_COGNITIVE_SERVICES_ENDPOINT"],
        credential=DefaultAzureCredential(),
        model="gpt-4o-realtime-preview",
        credential_scopes=["https://cognitiveservices.azure.com/.default"]
    ) as conn:
        # Update session with instructions
        await conn.session.update(session={
            "instructions": "You are a helpful assistant.",
            "modalities": ["text", "audio"],
            "voice": "alloy"
        })
        
        # Listen for events
        async for event in conn:
            print(f"Event: {event.type}")
            if event.type == "response.audio_transcript.done":
                print(f"Transcript: {event.transcript}")
            elif event.type == "response.done":
                break

asyncio.run(main())

Core Architecture

Connection Resources

The VoiceLiveConnection exposes these resources:

| Resource | Purpose | Key Methods |
| --- | --- | --- |
| conn.session | Session configuration | update(session=...) |
| conn.response | Model responses | create(), cancel() |
| conn.input_audio_buffer | Audio input | append(), commit(), clear() |
| conn.output_audio_buffer | Audio output | clear() |
| conn.conversation | Conversation state | item.create(), item.delete(), item.truncate() |
| conn.transcription_session | Transcription config | update(session=...) |

Session Configuration

from azure.ai.voicelive.models import RequestSession, FunctionTool

await conn.session.update(session=RequestSession(
    instructions="You are a helpful voice assistant.",
    modalities=["text", "audio"],
    voice="alloy",  # or "echo", "shimmer", "sage", etc.
    input_audio_format="pcm16",
    output_audio_format="pcm16",
    turn_detection={
        "type": "server_vad",
        "threshold": 0.5,
        "prefix_padding_ms": 300,
        "silence_duration_ms": 500
    },
    tools=[
        FunctionTool(
            type="function",
            name="get_weather",
            description="Get current weather",
            parameters={
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        )
    ]
))

Audio Streaming

Send Audio (Base64 PCM16)

import base64

# Read audio chunk (16-bit PCM, 24kHz mono)
audio_chunk = await read_audio_from_microphone()
b64_audio = base64.b64encode(audio_chunk).decode()

await conn.input_audio_buffer.append(audio=b64_audio)
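
When the audio comes from a file or a large buffer instead of a microphone callback, it usually needs to be split into short frames before each `append` call. The following is a minimal sketch, assuming 24 kHz mono PCM16 as in the comment above; the 20 ms frame length and the helper name `pcm16_chunks_b64` are illustrative choices, not SDK requirements.

```python
import base64
from typing import Iterator

SAMPLE_RATE_HZ = 24_000  # pcm16 rate used in the snippet above
BYTES_PER_SAMPLE = 2     # 16-bit mono


def pcm16_chunks_b64(pcm: bytes, chunk_ms: int = 20) -> Iterator[str]:
    """Yield base64 strings, each covering at most chunk_ms of audio."""
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        yield base64.b64encode(pcm[start:start + chunk_bytes]).decode("ascii")
```

Each yielded string can be passed as the `audio` argument of `conn.input_audio_buffer.append`, one call per chunk.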

Receive Audio

async for event in conn:
    if event.type == "response.audio.delta":
        audio_bytes = base64.b64decode(event.delta)
        await play_audio(audio_bytes)
    elif event.type == "response.audio.done":
        print("Audio complete")
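
For debugging it can help to save a response to disk instead of playing it. The sketch below collects the base64 deltas and wraps the decoded PCM in a WAV container using only the standard library. It assumes the output is 24 kHz mono PCM16 (the `pcm16` default listed under Audio Formats); `deltas_to_wav` is a hypothetical helper name.

```python
import base64
import io
import wave
from typing import Iterable


def deltas_to_wav(b64_deltas: Iterable[str], sample_rate_hz: int = 24_000) -> bytes:
    """Decode base64 audio deltas and return them as the bytes of a WAV file."""
    pcm = b"".join(base64.b64decode(delta) for delta in b64_deltas)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)              # mono
        wav.setsampwidth(2)              # 16-bit samples
        wav.setframerate(sample_rate_hz)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

In the receive loop, append each `event.delta` to a list and call `deltas_to_wav` once `response.audio.done` arrives.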

Event Handling

import base64
import json

# The match statement requires Python 3.10 or later.
async for event in conn:
    match event.type:
        # Session events
        case "session.created":
            print(f"Session: {event.session}")
        case "session.updated":
            print("Session updated")
        
        # Audio input events
        case "input_audio_buffer.speech_started":
            print(f"Speech started at {event.audio_start_ms}ms")
        case "input_audio_buffer.speech_stopped":
            print(f"Speech stopped at {event.audio_end_ms}ms")
        
        # Transcription events
        case "conversation.item.input_audio_transcription.completed":
            print(f"User said: {event.transcript}")
        case "conversation.item.input_audio_transcription.delta":
            print(f"Partial: {event.delta}")
        
        # Response events
        case "response.created":
            print(f"Response started: {event.response.id}")
        case "response.audio_transcript.delta":
            print(event.delta, end="", flush=True)
        case "response.audio.delta":
            audio = base64.b64decode(event.delta)
        case "response.done":
            print(f"Response complete: {event.response.status}")
        
        # Function calls (handle_function is your own dispatcher, not part of the SDK)
        case "response.function_call_arguments.done":
            result = handle_function(event.name, event.arguments)
            await conn.conversation.item.create(item={
                "type": "function_call_output",
                "call_id": event.call_id,
                "output": json.dumps(result)
            })
            await conn.response.create()
        
        # Errors
        case "error":
            print(f"Error: {event.error.message}")
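
The `handle_function` call in the loop above is left undefined. One possible shape for it is a small registry that maps tool names to Python callables and parses the JSON argument string. This is a sketch under assumptions: the `tool` decorator, the `TOOLS` registry, and the placeholder `get_weather` body are hypothetical, and the `{"result": ...}` / `{"error": ...}` envelope is just one convention.

```python
import json
from typing import Any, Callable, Dict

TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name so the model can call it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_weather(location: str) -> dict:
    # Placeholder data; a real implementation would query a weather service.
    return {"location": location, "forecast": "unknown"}


def handle_function(name: str, arguments: str) -> dict:
    """Look up the tool, parse its JSON arguments, and return a JSON-serializable dict."""
    fn = TOOLS.get(name)
    if fn is None:
        return {"error": f"unknown function: {name}"}
    try:
        kwargs = json.loads(arguments or "{}")
        return {"result": fn(**kwargs)}
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or arguments that do not match the function signature.
        return {"error": str(exc)}
```

Returning errors as data instead of raising lets the model see what went wrong and retry with corrected arguments.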

Common Patterns

Manual Turn Mode (No VAD)

await conn.session.update(session={"turn_detection": None})

# Manually control turns
await conn.input_audio_buffer.append(audio=b64_audio)
await conn.input_audio_buffer.commit()  # End of user turn
await conn.response.create()  # Trigger response

Interrupt Handling

async for event in conn:
    if event.type == "input_audio_buffer.speech_started":
        # User interrupted - cancel current response
        await conn.response.cancel()
        await conn.output_audio_buffer.clear()
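
Cancelling the response stops new audio from arriving, but any chunks already queued for local playback would still be heard. If the application buffers decoded audio in an `asyncio.Queue` (an assumption; the skill does not prescribe a playback design), the queue can be emptied at the same point. `drain` is a hypothetical helper.

```python
import asyncio


def drain(queue: asyncio.Queue) -> int:
    """Discard everything currently queued and return how many items were dropped."""
    dropped = 0
    while True:
        try:
            queue.get_nowait()
        except asyncio.QueueEmpty:
            return dropped
        dropped += 1
```

Calling `drain(playback_queue)` right after `conn.response.cancel()` keeps stale speech from playing over the user.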

Conversation History

# Add system message
await conn.conversation.item.create(item={
    "type": "message",
    "role": "system",
    "content": [{"type": "input_text", "text": "Be concise."}]
})

# Add user message
await conn.conversation.item.create(item={
    "type": "message",
    "role": "user", 
    "content": [{"type": "input_text", "text": "Hello!"}]
})

await conn.response.create()

Voice Options

| Voice | Description |
| --- | --- |
| alloy | Neutral, balanced |
| echo | Warm, conversational |
| shimmer | Clear, professional |
| sage | Calm, authoritative |
| coral | Friendly, upbeat |
| ash | Deep, measured |
| ballad | Expressive |
| verse | Storytelling |

Azure voices: Use AzureStandardVoice, AzureCustomVoice, or AzurePersonalVoice models.

Audio Formats

| Format | Sample Rate | Use Case |
| --- | --- | --- |
| pcm16 | 24kHz | Default, high quality |
| pcm16-8000hz | 8kHz | Telephony |
| pcm16-16000hz | 16kHz | Voice assistants |
| g711_ulaw | 8kHz | Telephony (US) |
| g711_alaw | 8kHz | Telephony (EU) |
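
The sample rate determines how many bytes a given stretch of audio occupies, which matters when sizing buffers or chunks. The sketch below encodes the table above and assumes mono audio, 2 bytes per sample for the PCM16 variants, and 1 byte per sample for G.711 (which is an 8-bit codec). `AUDIO_FORMATS` and `bytes_for_duration` are illustrative names.

```python
# format name -> (sample rate in Hz, bytes per sample), mono assumed
AUDIO_FORMATS = {
    "pcm16": (24_000, 2),
    "pcm16-8000hz": (8_000, 2),
    "pcm16-16000hz": (16_000, 2),
    "g711_ulaw": (8_000, 1),
    "g711_alaw": (8_000, 1),
}


def bytes_for_duration(fmt: str, duration_ms: int) -> int:
    """Return the number of raw audio bytes needed for duration_ms in the given format."""
    sample_rate_hz, bytes_per_sample = AUDIO_FORMATS[fmt]
    return sample_rate_hz * bytes_per_sample * duration_ms // 1000
```

For example, 100 ms of default `pcm16` audio is 4800 bytes, while 20 ms of `g711_ulaw` is 160 bytes.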

Turn Detection Options

# Server VAD (default)
{"type": "server_vad", "threshold": 0.5, "silence_duration_ms": 500}

# Azure Semantic VAD (smarter detection)
{"type": "azure_semantic_vad"}
{"type": "azure_semantic_vad_en"}  # English optimized
{"type": "azure_semantic_vad_multilingual"}

Error Handling

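# Importing ConnectionError from the SDK shadows Python's built-in ConnectionError in this module.
from azure.ai.voicelive.aio import ConnectionClosed, ConnectionError, connect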

try:
    async with connect(...) as conn:
        async for event in conn:
            if event.type == "error":
                print(f"API Error: {event.error.code} - {event.error.message}")
except ConnectionClosed as e:
    print(f"Connection closed: {e.code} - {e.reason}")
except ConnectionError as e:
    print(f"Connection error: {e}")
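
Long-running voice sessions often need to reconnect after a dropped connection. The sketch below shows one generic way to retry with exponential backoff. It is not taken from the SDK: `run_with_retries`, `backoff_delays`, and the default of retrying on `OSError` are assumptions, and `run_session` stands for your own coroutine that opens the connection and runs the event loop. To retry on the SDK's exceptions, pass them in `retry_on`.

```python
import asyncio
from typing import Awaitable, Callable, Iterator, Tuple, Type


def backoff_delays(retries: int, base: float = 0.5, cap: float = 8.0) -> Iterator[float]:
    """Yield `retries` delays that double each time, limited to `cap` seconds."""
    for attempt in range(retries):
        yield min(cap, base * (2 ** attempt))


async def run_with_retries(
    run_session: Callable[[], Awaitable[None]],
    retries: int = 3,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> None:
    """Run the session, retrying after the listed exceptions until retries run out."""
    delays = backoff_delays(retries)
    while True:
        try:
            await run_session()
            return
        except retry_on:
            delay = next(delays, None)
            if delay is None:
                raise  # out of retries: surface the last error
            await sleep(delay)
```

The `sleep` parameter exists so tests can substitute a fake and avoid real waiting.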

Usage Guidance
This skill appears to be legitimate documentation for using Azure's Voice Live SDK, but the package metadata does not declare the environment variables and credential access that the SKILL.md actually requires. Before installing or running code derived from this skill:
- Treat the mismatch as a red flag: confirm with the skill author/source why required env vars and credentials are not declared. Ask for an authoritative homepage or repository.
- Do not expose broad Azure credentials (AZURE_CLIENT_ID/SECRET, Azure CLI tokens, or subscription-level keys) to untrusted code. Prefer creating a dedicated Azure resource with minimal permissions and a short-lived credential for testing.
- Be aware that DefaultAzureCredential will attempt multiple auth methods (env vars, managed identity, Azure CLI cache) and could cause the agent to use existing credentials on the host. Run in an isolated environment if you want to limit exposure.
- Verify that the packages (azure-ai-voicelive, azure-identity) come from a trusted source (PyPI/GitHub) before pip installing.
If you cannot validate the source or get corrected metadata, treat the skill as suspicious and avoid giving it credentials or running it in a privileged environment.
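
One way to act on the guidance above is to check which credential-bearing variables are visible in the environment before running any derived code. The sketch below lists a non-exhaustive set: the three service-principal variables read by azure-identity's EnvironmentCredential, plus the API key named in this skill. `visible_azure_credentials` is a hypothetical helper, and it does not detect Azure CLI token caches or managed identity.

```python
import os
from typing import List, Mapping

# Non-exhaustive: service-principal variables read by EnvironmentCredential,
# plus the API key variable used in this skill's examples.
SENSITIVE_VARS = (
    "AZURE_CLIENT_ID",
    "AZURE_TENANT_ID",
    "AZURE_CLIENT_SECRET",
    "AZURE_COGNITIVE_SERVICES_KEY",
)


def visible_azure_credentials(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names (never the values) of sensitive variables that are set and non-empty."""
    return [name for name in SENSITIVE_VARS if env.get(name)]
```

An empty list means none of these variables would be picked up from the environment; it says nothing about other credential sources on the host.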
Capability Analysis
Type: OpenClaw Skill
Name: azure-ai-voicelive-py
Version: 0.1.0

The skill bundle provides documentation and code examples for integrating with Azure AI Voice Live SDK. All files describe legitimate usage of the SDK, including standard Python package installation (`pip install azure-ai-voicelive aiohttp azure-identity`), Azure authentication methods (environment variables, `DefaultAzureCredential`), and various real-time audio processing scenarios. There is no evidence of data exfiltration, malicious execution, persistence mechanisms, obfuscation, or prompt injection attempts against the OpenClaw agent. The content is clearly aligned with its stated purpose of building real-time voice AI applications using Azure services.
Capability Assessment
Purpose & Capability
The SKILL.md describes building real-time voice apps with Azure's Voice Live SDK, which is logical and coherent. However, the registry metadata declares no required environment variables or primary credential, even though the instructions explicitly require AZURE_COGNITIVE_SERVICES_ENDPOINT, optionally use AZURE_COGNITIVE_SERVICES_KEY, and recommend DefaultAzureCredential. The missing credential declarations are inconsistent with the documented purpose (likely an oversight, but still an inconsistency).
Instruction Scope
The runtime instructions direct the agent to read environment variables (endpoint, optional API key) and to use DefaultAzureCredential, which may surface additional Azure credentials (AZURE_CLIENT_ID/TENANT_ID/CLIENT_SECRET, Azure CLI tokens, managed identity). The examples also read local audio files and stream microphone audio (expected for the stated purpose). The problem is that the instructions access credentials and auth surfaces that are not declared in the skill metadata, which potentially gives the agent access to broader Azure credentials than the registry advertises.
Install Mechanism
This is an instruction-only skill with no install spec or shipped code — lowest install risk. The SKILL.md recommends installing pip packages (azure-ai-voicelive, aiohttp, azure-identity), which is expected for this SDK and would be a normal developer dependency, but those installs would happen outside the skill bundle.
Credentials
The skill's documentation requires AZURE_COGNITIVE_SERVICES_ENDPOINT and optionally AZURE_COGNITIVE_SERVICES_KEY, and suggests DefaultAzureCredential (which uses other Azure auth sources). Yet the skill metadata lists no required env vars or primary credential. Requesting (via instructions) broad Azure credential sources without declaring them is disproportionate and opaque; it could cause unexpected use of existing Azure credentials on the host.
Persistence & Privilege
always:false and no install code or persistent modifications are requested. The skill is user-invocable and can be invoked autonomously (platform default), but there's no evidence it requests persistent elevated privileges or modifies other skills.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install azure-ai-voicelive-py
  3. After installation, invoke the skill by name or use /azure-ai-voicelive-py
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.1.0
Initial release – enables building real-time voice AI apps using Azure AI Voice Live SDK for Python.
- Real-time bidirectional WebSocket audio streaming with Azure AI models.
- Supports Server VAD, turn-based conversation, function calls, tools, transcription, and avatar integration.
- Easy authentication via `DefaultAzureCredential` or API key.
- Provides structured resources for session, response, audio buffers, and conversation state.
- Includes example snippets for session config, event handling, audio streaming, and interrupt management.
- Supports voice selection, multiple audio formats, and manual or VAD turn-taking.
Metadata
Slug azure-ai-voicelive-py
Version 0.1.0
License
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Azure Ai Voicelive Py?

It is an AI Agent Skill for Claude Code / OpenClaw for building real-time voice AI applications in Python with the Azure AI Voice Live SDK (azure-ai-voicelive), with 2144 downloads so far. The Description section above lists the supported scenarios and features.

How do I install Azure Ai Voicelive Py?

Run "/install azure-ai-voicelive-py" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Azure Ai Voicelive Py free?

Yes, Azure Ai Voicelive Py is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Azure Ai Voicelive Py support?

Azure Ai Voicelive Py is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Azure Ai Voicelive Py?

It is built and maintained by thegovind (@thegovind); the current version is v0.1.0.
