conanwhf

Azure Speech Tts

by conanwhf · GitHub ↗ · v1.0.2 · MIT-0
cross-platform ✓ Security Clean
165 Downloads · 0 Stars · 1 Active Installs · 3 Versions
Install in OpenClaw
/install azure-speech-tts
Description
Azure Speech TTS skill for generating local audio files from text or SSML with Azure Speech. Use when the user asks to use Azure Speech / Azure TTS / Microso...
README (SKILL.md)

Azure Speech TTS

Use Azure Speech to turn text or SSML into a local audio file under download/.

What this skill does

  • Synthesize plain text into speech
  • Synthesize full SSML payloads directly
  • Choose voice, output format, rate, pitch, style, and role
  • Save the result as a local audio file and print a JSON summary

Configuration

This skill uses a small default config file plus environment variables.

Default config file

File:

  • config.json

Default values:

  • default_voice: zh-CN-Yunqi:DragonHDOmniLatestNeural
  • default_format: mp3
  • default_output_dir: download
  • default_timeout_seconds: 60

Secret values

Set these in the local shell environment:

  • AZURE_SPEECH_KEY
  • AZURE_SPEECH_REGION
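
A quick pre-flight check can confirm both secrets are present before any synthesis call. This is a minimal sketch, not the skill's own code; the helper name `missing_secrets` is hypothetical, and only the two variable names come from this README.

```python
import os

# The two secrets this README says must be set in the shell environment.
REQUIRED = ("AZURE_SPEECH_KEY", "AZURE_SPEECH_REGION")

def missing_secrets(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]

if __name__ == "__main__":
    missing = missing_secrets()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Azure Speech secrets are set.")
```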

Optional environment overrides

  • AZURE_SPEECH_VOICE
  • AZURE_SPEECH_FORMAT

Precedence

Use this order:

  1. CLI flag
  2. Environment variable
  3. config.json
  4. Built-in fallback
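
The precedence order above can be sketched as a single lookup function. This is an illustration under assumptions, not the logic in scripts/azure_tts.py: the helper names `resolve_setting` and `load_config` are hypothetical, and the fallback values simply mirror the defaults listed earlier.

```python
import json
import os
from pathlib import Path

# Assumed built-in fallbacks; these mirror the documented defaults.
FALLBACKS = {
    "voice": "zh-CN-Yunqi:DragonHDOmniLatestNeural",
    "format": "mp3",
}

def load_config(path="config.json"):
    """Read config.json if it exists; a missing file means no overrides."""
    p = Path(path)
    return json.loads(p.read_text(encoding="utf-8")) if p.is_file() else {}

def resolve_setting(name, cli_value=None, env=None, config=None):
    """Return the first value found: CLI flag, env var, config.json, fallback."""
    env = os.environ if env is None else env
    config = {} if config is None else config
    if cli_value:
        return cli_value
    env_value = env.get(f"AZURE_SPEECH_{name.upper()}")
    if env_value:
        return env_value
    config_value = config.get(f"default_{name}")
    if config_value:
        return config_value
    return FALLBACKS[name]
```

For example, `resolve_setting("voice", cli_value=None, config=load_config())` would return the `AZURE_SPEECH_VOICE` value if that variable is set, and otherwise fall through to `default_voice` in config.json.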

Quick start

python3 scripts/azure_tts.py \
  --text "你好,这是一段测试语音。" \
  --voice zh-CN-Yunqi:DragonHDOmniLatestNeural \
  --format mp3 \
  --output download/test.mp3

For SSML:

python3 scripts/azure_tts.py \
  --ssml-file temp/input.ssml \
  --format wav \
  --output download/test.wav

Workflow

  1. Decide whether the input is plain text or full SSML.
  2. Use --text / --text-file for normal narration.
  3. Use --ssml / --ssml-file only when the payload already contains a complete <speak> document.
  4. Pick the voice and output format, or let config.json supply the defaults.
  5. Run scripts/azure_tts.py.
  6. Return the generated audio path to the user.

Rules

  • Prefer plain text unless the user needs pauses, emphasis, multi-voice content, or expressive styling.
  • --ssml input must include a full <speak> root element.
  • Default voice is zh-CN-Yunqi:DragonHDOmniLatestNeural if nothing else is set.
  • Default output folder is download/.
  • If the user does not specify format, use the default MP3 output.
  • Do not put secrets in config.json.
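
Whether a payload qualifies for --ssml can be checked before calling the script. The sketch below is one possible check using the standard library XML parser; the function name `has_speak_root` is hypothetical and the script may validate input differently.

```python
import xml.etree.ElementTree as ET

def has_speak_root(payload):
    """Return True if payload parses as XML whose root element is <speak>."""
    try:
        root = ET.fromstring(payload)
    except ET.ParseError:
        # Plain text and malformed markup both end up here.
        return False
    # The tag may be namespace-qualified, e.g.
    # "{http://www.w3.org/2001/10/synthesis}speak", so strip the namespace.
    return root.tag.rsplit("}", 1)[-1] == "speak"
```

Input that fails this check is better passed through --text or --text-file.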

Common formats

See references/azure-speech-cheatsheet.md for the format map and examples.

Short aliases supported by the script:

  • mp3
  • wav
  • pcm
  • ogg
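
An alias table of this kind might look like the sketch below. Only the mp3 entry is confirmed by the sample JSON in the Output section; the wav, pcm, and ogg entries are assumptions picked from Azure's documented output format names, and the real map in references/azure-speech-cheatsheet.md may differ. The name `expand_format` is hypothetical.

```python
FORMAT_ALIASES = {
    "mp3": "audio-24khz-48kbitrate-mono-mp3",  # matches the sample output JSON
    "wav": "riff-24khz-16bit-mono-pcm",        # assumed mapping
    "pcm": "raw-24khz-16bit-mono-pcm",         # assumed mapping
    "ogg": "ogg-24khz-16bit-mono-opus",        # assumed mapping
}

def expand_format(name):
    """Map a short alias to a full format name; pass other names through."""
    return FORMAT_ALIASES.get(name.lower(), name)
```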

Useful options

  • --voice: Azure voice name, for example en-US-AriaNeural
  • --language: SSML xml:lang for plain-text mode
  • --rate: speaking rate, for example +10%
  • --pitch: pitch adjustment, for example +2st
  • --style: expressive style such as cheerful, sad, chat
  • --style-degree: strength of the expressive style
  • --role: voice role when supported
  • --save-ssml: write the generated SSML to a file for inspection
  • --dry-run: print the generated SSML without calling Azure
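
To show how these options could map onto SSML in plain-text mode, here is a minimal sketch. It is not the script's implementation: `build_ssml` is a hypothetical helper, and the element and attribute names (`prosody`, `mstts:express-as`, `styledegree`) follow Azure's public SSML conventions. Running the script with --dry-run shows the SSML it actually generates.

```python
from xml.sax.saxutils import escape, quoteattr

def build_ssml(text, voice, language="en-US", rate=None, pitch=None,
               style=None, style_degree=None, role=None):
    """Wrap plain text in a <speak> document with optional prosody and style."""
    body = escape(text)  # escape &, <, > so the text cannot break the markup

    prosody = [(k, v) for k, v in (("rate", rate), ("pitch", pitch)) if v]
    if prosody:
        attrs = "".join(f" {k}={quoteattr(v)}" for k, v in prosody)
        body = f"<prosody{attrs}>{body}</prosody>"

    express = [("style", style), ("styledegree", style_degree), ("role", role)]
    express = [(k, str(v)) for k, v in express if v not in (None, "")]
    if express:
        attrs = "".join(f" {k}={quoteattr(v)}" for k, v in express)
        body = f"<mstts:express-as{attrs}>{body}</mstts:express-as>"

    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f"xml:lang={quoteattr(language)}>"
        f"<voice name={quoteattr(voice)}>{body}</voice></speak>"
    )

if __name__ == "__main__":
    print(build_ssml("Hello & welcome", "en-US-AriaNeural",
                     rate="+10%", pitch="+2st", style="cheerful"))
```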

Output

The helper script writes the audio file and prints JSON like:

{
  "ok": true,
  "output_path": "download/test.mp3",
  "format": "audio-24khz-48kbitrate-mono-mp3",
  "voice": "zh-CN-Yunqi:DragonHDOmniLatestNeural",
  "language": "zh-CN",
  "bytes": 123456
}

Use the printed output_path as the deliverable path.
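
A caller that captures the script's standard output could extract the path as sketched below. The success fields come from the sample above; how the script reports a failure is not documented here, so treating anything other than "ok": true as an error is an assumption, and `deliverable_path` is a hypothetical name.

```python
import json

def deliverable_path(stdout):
    """Return output_path from the JSON summary, or raise if ok is not true."""
    summary = json.loads(stdout)
    if summary.get("ok") is not True:
        # Assumed failure convention; the real error shape may differ.
        raise RuntimeError(f"synthesis did not report success: {summary}")
    return summary["output_path"]
```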

Usage Guidance
This skill appears coherent and limited to Azure TTS. Before installing, (1) provide a dedicated Azure Speech key/region with minimal privileges, (2) do not store secrets in config.json (keep them in environment variables as instructed), (3) review or sandbox the included Python script if you plan to run it on sensitive hosts, and (4) avoid feeding SSML that contains or references sensitive data or external URLs you don't trust. If you need higher assurance, run the script in an isolated environment and verify Azure endpoints and network traffic match expectations.
Capability Analysis
Type: OpenClaw Skill Name: azure-speech-tts Version: 1.0.2 The azure-speech-tts skill is a legitimate implementation for converting text or SSML to audio using Microsoft Azure Speech services. The core logic in scripts/azure_tts.py uses standard Python libraries (urllib) to communicate exclusively with official Azure endpoints and includes proper XML escaping to prevent injection within the generated SSML. No evidence of malicious intent, data exfiltration, or suspicious obfuscation was found.
Capability Assessment
Purpose & Capability
Name and description match the included helper script and docs. The only secrets the skill asks for (AZURE_SPEECH_KEY and AZURE_SPEECH_REGION) are exactly what an Azure TTS client needs; requested files and paths (config.json, download/) align with generating local audio files.
Instruction Scope
SKILL.md and the script limit actions to reading text/SSML (from CLI, files, or stdin), optionally writing generated SSML, calling Azure STS and TTS endpoints, and writing audio files plus a small JSON summary. There are no instructions to read unrelated system files or transmit arbitrary local data to third parties.
Install Mechanism
No install spec is present (instruction-only install behavior). The repository includes a Python script that uses only stdlib urllib/argparse/json/file I/O. No downloads from untrusted URLs, no package manager pulls, and nothing that would execute arbitrary remote code during install.
Credentials
Only AZURE_SPEECH_KEY and AZURE_SPEECH_REGION are used for operation (plus optional AZURE_SPEECH_VOICE/FORMAT). No unrelated secrets, keys, or config paths are requested. The SKILL.md correctly instructs not to put secrets in config.json.
Persistence & Privilege
The skill is not always-enabled and does not request elevated or persistent platform privileges. It does not modify other skills or system-wide settings; it only writes outputs to the configured download/ folder and optionally the save-ssml path.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install azure-speech-tts
  3. After installation, invoke the skill by name or use /azure-speech-tts
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.2
- Added _meta.json metadata file to the repository.
- No changes made to skill functionality or documentation content.
v1.0.1
Add README and polish the public skill docs for Azure Speech TTS.
v1.0.0
Initial public release: Azure Speech TTS skill with config defaults, env-based secrets, and text/SSML synthesis script.
Metadata
Slug azure-speech-tts
Version 1.0.2
License MIT-0
All-time Installs 1
Active Installs 1
Total Versions 3
Frequently Asked Questions

What is Azure Speech Tts?

Azure Speech TTS skill for generating local audio files from text or SSML with Azure Speech. Use when the user asks to use Azure Speech / Azure TTS / Microso... It is an AI Agent Skill for Claude Code / OpenClaw, with 165 downloads so far.

How do I install Azure Speech Tts?

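Run "/install azure-speech-tts" in the OpenClaw or Claude Code chat to install it in one step. The install itself needs no extra setup, but synthesis requires AZURE_SPEECH_KEY and AZURE_SPEECH_REGION to be set in the local shell environment.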

Is Azure Speech Tts free?

Yes, Azure Speech Tts is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Azure Speech Tts support?

Azure Speech Tts is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Azure Speech Tts?

It is built and maintained by conanwhf (@conanwhf); the current version is v1.0.2.
