/install azure-speech-tts
Azure Speech TTS
Use Azure Speech to turn text or SSML into a local audio file under download/.
What this skill does
- Synthesize plain text into speech
- Synthesize full SSML payloads directly
- Choose voice, output format, rate, pitch, style, and role
- Save the result as a local audio file and print a JSON summary
Configuration
This skill uses a small default config file plus environment variables.
Default config file
File: config.json
Default values:
- default_voice: zh-CN-Yunqi:DragonHDOmniLatestNeural
- default_format: mp3
- default_output_dir: download
- default_timeout_seconds: 60
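This page does not show how the defaults are read. Here is a minimal sketch, assuming config.json is a flat JSON object that uses the four keys above. The load_config function and the BUILT_IN table are hypothetical and are not taken from scripts/azure_tts.py.

```python
import json
from pathlib import Path

# Hypothetical built-in fallbacks that mirror the documented default values.
BUILT_IN = {
    "default_voice": "zh-CN-Yunqi:DragonHDOmniLatestNeural",
    "default_format": "mp3",
    "default_output_dir": "download",
    "default_timeout_seconds": 60,
}


def load_config(path="config.json"):
    """Return the built-in defaults overlaid with any known keys found in config.json."""
    merged = dict(BUILT_IN)
    config_path = Path(path)
    if config_path.is_file():
        with config_path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        # Accept only known keys so a typo in config.json does not silently add settings.
        merged.update({k: v for k, v in data.items() if k in BUILT_IN})
    return merged
```

Filtering to known keys is a choice made for this sketch. The real script may handle extra keys differently.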
Secret values
Set these in the local shell environment:
- AZURE_SPEECH_KEY
- AZURE_SPEECH_REGION
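The key and region are what an Azure Speech call needs. This page does not say whether the script uses the Speech SDK or the REST API. The sketch below assumes the documented text-to-speech REST endpoint and headers (Ocp-Apim-Subscription-Key, Content-Type: application/ssml+xml, X-Microsoft-OutputFormat). The build_tts_request helper and its User-Agent value are hypothetical.

```python
import os
import urllib.request


def build_tts_request(ssml, output_format, env=None):
    """Build (but do not send) a POST request to the Azure Speech TTS REST endpoint."""
    env = os.environ if env is None else env
    key = env.get("AZURE_SPEECH_KEY")
    region = env.get("AZURE_SPEECH_REGION")
    if not key or not region:
        raise RuntimeError("Set AZURE_SPEECH_KEY and AZURE_SPEECH_REGION in the environment")
    # Regional endpoint of the text-to-speech REST API.
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": output_format,
        "User-Agent": "azure-speech-tts-sketch",  # hypothetical client name
    }
    return urllib.request.Request(url, data=ssml.encode("utf-8"), headers=headers, method="POST")
```

To send the request, you would call urllib.request.urlopen(req, timeout=60), where 60 matches default_timeout_seconds. On success, the response body contains the audio bytes.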
Optional environment overrides
- AZURE_SPEECH_VOICE
- AZURE_SPEECH_FORMAT
Precedence
Settings are resolved in this order, highest priority first:
- CLI flag
- Environment variable
- config.json
- Built-in fallback
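The resolution order above can be expressed as a single helper. This is a sketch only. resolve_setting and its parameters are hypothetical, and the pairing of AZURE_SPEECH_VOICE with default_voice is an assumption about how the overrides map onto config keys.

```python
import os


def resolve_setting(cli_value, env_name, config, config_key, fallback, env=None):
    """Return the first value set in: CLI flag, environment variable, config.json, built-in fallback."""
    env = os.environ if env is None else env
    # Empty strings are treated as "not set" at every level.
    if cli_value:
        return cli_value
    if env.get(env_name):
        return env[env_name]
    if config.get(config_key):
        return config[config_key]
    return fallback
```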
Quick start
python3 scripts/azure_tts.py \
--text "你好,这是一段测试语音。" \
--voice zh-CN-Yunqi:DragonHDOmniLatestNeural \
--format mp3 \
--output download/test.mp3
For SSML:
python3 scripts/azure_tts.py \
--ssml-file temp/input.ssml \
--format wav \
--output download/test.wav
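The SSML command above reads temp/input.ssml. Below is a minimal sketch of preparing that file with the standard library. It assumes the en-US-AriaNeural voice from the options list and a half-second pause. The write_ssml helper and the sample sentence are hypothetical.

```python
from pathlib import Path

# A complete SSML document: <speak> root, one <voice>, and a <break> for a pause.
SSML = """<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-US-AriaNeural">
    Hello, this is a test.<break time="500ms"/> This sentence follows a short pause.
  </voice>
</speak>
"""


def write_ssml(path="temp/input.ssml", ssml=SSML):
    """Write a complete <speak> document to disk, creating the parent folder if needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(ssml, encoding="utf-8")
    return target
```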
Workflow
- Decide whether the input is plain text or full SSML.
- Use --text / --text-file for normal narration.
- Use --ssml / --ssml-file only when the payload already contains a complete <speak> document.
- Pick the voice and output format, or let config.json supply the defaults.
- Run scripts/azure_tts.py.
- Return the generated audio path to the user.
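The workflow above can be sketched as a function that selects --text or --ssml from the payload and builds the argument list for scripts/azure_tts.py. The build_command helper and its "complete <speak> document" heuristic are hypothetical. They use only the flags listed on this page.

```python
import sys


def looks_like_full_ssml(payload):
    """Heuristic: the payload is a complete <speak> document, optionally after an XML declaration."""
    stripped = payload.strip()
    if stripped.startswith("<?xml"):
        stripped = stripped.split("?>", 1)[-1].lstrip()
    return stripped.startswith("<speak") and stripped.endswith("</speak>")


def build_command(payload, output, fmt=None, voice=None):
    """Build the scripts/azure_tts.py argument list, choosing --ssml or --text from the payload."""
    mode = "--ssml" if looks_like_full_ssml(payload) else "--text"
    # sys.executable stands in for the "python3" used in the Quick start commands.
    cmd = [sys.executable, "scripts/azure_tts.py", mode, payload]
    if voice:
        cmd += ["--voice", voice]
    if fmt:
        cmd += ["--format", fmt]
    cmd += ["--output", output]
    return cmd
```

You can pass the list to subprocess.run(cmd, capture_output=True, text=True, check=True). Using a list instead of a shell string avoids quoting problems with text that contains spaces or quotes.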
Rules
- Prefer plain text unless the user needs pauses, emphasis, multi-voice content, or expressive styling.
- SSML passed with --ssml must include a full <speak> root element.
- The default voice is zh-CN-Yunqi:DragonHDOmniLatestNeural if nothing else is set.
- The default output folder is download/.
- If the user does not specify a format, use the default MP3 output.
- Do not put secrets in config.json.
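The rule about the <speak> root can be checked before calling Azure. Below is a minimal sketch using the standard library XML parser. The has_speak_root name is hypothetical, and the page does not say whether the real script performs this check.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def has_speak_root(ssml):
    """Return True if the payload parses as XML and its root element is <speak>."""
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError:
        # Plain text or malformed XML (for example, an undeclared mstts: prefix).
        return False
    # ElementTree reports namespaced tags as "{namespace}local", so accept both forms.
    return root.tag in ("speak", f"{{{SSML_NS}}}speak")
```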
Common formats
See references/azure-speech-cheatsheet.md for the format map and examples.
Short aliases supported by the script:
- mp3
- wav
- pcm
- ogg
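The aliases expand to full Azure output format names. The sample output below confirms only the mp3 mapping (audio-24khz-48kbitrate-mono-mp3). The wav, pcm, and ogg entries in this sketch are assumed 24 kHz equivalents taken from Azure's list of output formats. Check references/azure-speech-cheatsheet.md for the script's actual table. FORMAT_ALIASES and resolve_format are hypothetical names.

```python
# Hypothetical alias table; only the mp3 entry is confirmed by the sample JSON output.
FORMAT_ALIASES = {
    "mp3": "audio-24khz-48kbitrate-mono-mp3",
    "wav": "riff-24khz-16bit-mono-pcm",
    "pcm": "raw-24khz-16bit-mono-pcm",
    "ogg": "ogg-24khz-16bit-mono-opus",
}


def resolve_format(name):
    """Map a short alias to an Azure output format; pass full format names through unchanged."""
    key = name.strip().lower()
    return FORMAT_ALIASES.get(key, name.strip())
```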
Useful options
- --voice: Azure voice name, for example en-US-AriaNeural
- --language: SSML xml:lang for plain-text mode
- --rate: speaking rate, for example +10%
- --pitch: pitch adjustment, for example +2st
- --style: expressive style such as cheerful, sad, chat
- --style-degree: strength of the expressive style
- --role: voice role when supported
- --save-ssml: write the generated SSML to a file for inspection
- --dry-run: print the generated SSML without calling Azure
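In plain-text mode these options must be converted into SSML. The sketch below shows one plausible mapping that follows Microsoft's SSML structure: rate and pitch go on <prosody>, while style, style degree, and role go on <mstts:express-as>. The build_ssml function is hypothetical. The SSML the script actually generates may differ, and --dry-run shows the exact output.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice, language, rate=None, pitch=None, style=None, style_degree=None, role=None):
    """Wrap plain text in a complete SSML document using the optional prosody and style settings."""
    body = escape(text)  # escape <, > and & in the narration text
    # <prosody> carries rate and pitch, e.g. rate="+10%" pitch="+2st".
    prosody = []
    if rate:
        prosody.append(f"rate={quoteattr(rate)}")
    if pitch:
        prosody.append(f"pitch={quoteattr(pitch)}")
    if prosody:
        body = f"<prosody {' '.join(prosody)}>{body}</prosody>"
    # <mstts:express-as> carries the Azure-specific style, style degree, and role.
    express = []
    if style:
        express.append(f"style={quoteattr(style)}")
    if style_degree is not None:
        express.append(f"styledegree={quoteattr(str(style_degree))}")
    if role:
        express.append(f"role={quoteattr(role)}")
    if express:
        body = f"<mstts:express-as {' '.join(express)}>{body}</mstts:express-as>"
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f"xml:lang={quoteattr(language)}>"
        f"<voice name={quoteattr(voice)}>{body}</voice></speak>"
    )
```

Printing build_ssml("Hello", "en-US-AriaNeural", "en-US", rate="+10%") is roughly what --dry-run is described as doing: it shows the SSML without calling Azure.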
Output
The helper script writes the audio file and prints JSON like:
{
"ok": true,
"output_path": "download/test.mp3",
"format": "audio-24khz-48kbitrate-mono-mp3",
"voice": "zh-CN-Yunqi:DragonHDOmniLatestNeural",
"language": "zh-CN",
"bytes": 123456
}
Use the printed output_path as the deliverable path.
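A caller can read the deliverable path from the script's stdout. This sketch assumes the success shape shown above. It also assumes a failed run reports "ok" as something other than true, which this page does not document. The deliverable_path helper is hypothetical.

```python
import json


def deliverable_path(stdout):
    """Parse the script's JSON summary and return output_path, failing loudly on a bad result."""
    summary = json.loads(stdout)
    if summary.get("ok") is not True:
        raise RuntimeError(f"azure_tts.py reported failure: {summary}")
    # A zero or missing byte count would mean no usable audio was written.
    if summary.get("bytes", 0) <= 0:
        raise RuntimeError("azure_tts.py reported an empty audio file")
    return summary["output_path"]
```

In practice, pass it the stdout from subprocess.run(..., capture_output=True, text=True).stdout.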
Installation
- Make sure OpenClaw is installed (local or Docker).
- Run the install command in chat: /install azure-speech-tts
- After installation, invoke the skill by name or use /azure-speech-tts.
- Provide the required inputs per the skill's parameter spec and get structured output.
What is Azure Speech TTS?
Azure Speech TTS skill for generating local audio files from text or SSML with Azure Speech. Use when the user asks to use Azure Speech / Azure TTS / Microso... It is an AI Agent Skill for Claude Code / OpenClaw, with 165 downloads so far.
How do I install Azure Speech TTS?
Run "/install azure-speech-tts" in the OpenClaw or Claude Code chat to install it in one step. Before the first synthesis, set AZURE_SPEECH_KEY and AZURE_SPEECH_REGION in the local shell environment.
Is Azure Speech TTS free?
Yes. The skill is free to download, install, and use, and is licensed under MIT-0. Azure Speech usage itself is governed by your own Azure subscription and pricing tier.
Which platforms does Azure Speech TTS support?
It is cross-platform: it runs anywhere OpenClaw or Claude Code is available and python3 can run scripts/azure_tts.py.
Who created Azure Speech TTS?
It is built and maintained by conanwhf (@conanwhf); the current version is v1.0.2.