guoxh

Baidu Speech Synthesis

Author: guoxh · GitHub · v1.2.3 · MIT-0
cross-platform ✓ Security scan passed
Total downloads: 178 · Favorites: 0 · Current installs: 1 · Versions: 3
Install in OpenClaw
/install baidu-speech-synthesis
Description
Baidu Intelligent Cloud Speech Synthesis (TTS), supporting multi-role dialogue audio generation, SSML/segment-merge dual modes, speech rate/pitch adjustment.
Usage Instructions (SKILL.md)

Baidu Intelligent Cloud Speech Synthesis Skill

Triggers

Use this skill when the user mentions:

  • "Convert this dialogue to audio using Baidu TTS"
  • "Generate a male-female dialogue with Duxiaoyao as the male voice and Duxiaomei as the female voice"
  • "Batch process all dialogues in dialogue.txt"
  • "Adjust speech rate to 7, pitch to 6"
  • "View available voice list"
  • "baidu tts", "dialogue to audio", "multi-speaker speech synthesis"
  • "baidu speech synthesis", "multi-speaker dialogue", "Baidu TTS"

Chinese triggers (for Chinese users):

  • "用百度TTS把这段对话转成音频"
  • "生成男女对话,男声用度逍遥,女声用度小美"
  • "批量处理 dialogue.txt 里的所有对话"
  • "调整语速到7,音调到6"
  • "查看可用的音色列表"

Overview

This skill calls the Baidu Intelligent Cloud Speech Synthesis API to synthesize multi-speaker dialogue (SSML mode for single-voice text, segment-merge for multi-voice dialogue, since the API only supports single-voice SSML). It offers a wide choice of voices, adjustable speech rate, pitch, and volume, and can automatically convert text dialogues into audio files with a distinct voice for each character.

Installation Dependencies

# Install Python dependencies
pip install requests

# Ensure ffmpeg is installed (required for audio merging)
# Ubuntu/Debian:
sudo apt install ffmpeg
# macOS:
brew install ffmpeg
# Windows: Download from https://ffmpeg.org/download.html

# Optional: If pydub is needed (alternative merging solution)
# pip install pydub
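
A minimal sketch of the kind of ffmpeg check the skill performs before merging. It assumes only that ffmpeg must be discoverable on PATH. The function name `find_missing_binaries` is hypothetical and does not come from the skill's scripts.

```python
import shutil


def find_missing_binaries(names=("ffmpeg",)):
    """Return the names from `names` that cannot be found on PATH."""
    # shutil.which returns None when the executable cannot be located.
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_binaries()
    if missing:
        print("Missing required tools: " + ", ".join(missing))
        print("Install ffmpeg using one of the commands above, then retry.")
    else:
        print("All required tools found.")
```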

Environment Variables Setup

Choose one of three authentication methods:

Method 1: API Key + Secret Key (auto-token)

export BAIDU_API_KEY="Your API Key (non-bce-v3 format)"
export BAIDU_SECRET_KEY="Your Secret Key"

Method 2: Direct access_token (starts with 1.)

export BAIDU_API_KEY="1.a6b7dbd428f731035f771b8d********"
# BAIDU_SECRET_KEY not required

Method 3: IAM Key (starts with bce-v3/)

export BAIDU_API_KEY="bce-v3/ALTAK-8h6t5Y7uI9o0P1q3W2e4R5t6Y7u8I9o0P"
# BAIDU_SECRET_KEY not required
# Note: Existing bce-v3/ALTAK-... keys may be dedicated to other services (e.g., search).
# If authentication fails, create a dedicated speech synthesis application to get API Key + Secret Key.

Required Environment Variables

BAIDU_API_KEY must be set. Whether BAIDU_SECRET_KEY is needed depends on the authentication method: it is required only for Method 1 (API Key + Secret Key).

The skill scripts detect the key format automatically and choose the matching authentication method. If the required variables are not set, the user is prompted.
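
The key-format detection described above can be sketched as follows. The sketch classifies keys purely by the prefixes documented in this file (`bce-v3/` for IAM keys, `1.` for access tokens). For Method 1, it exchanges the keys at Baidu's OAuth endpoint `https://aip.baidubce.com/oauth/2.0/token` using the `client_credentials` grant.

This is a sketch, not the skill's implementation:

- The function names are hypothetical.
- The standard-library `urllib` stands in for the `requests` library the skill uses.
- How IAM keys are sent to the TTS endpoint is not covered.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://aip.baidubce.com/oauth/2.0/token"


def detect_auth_method(api_key, secret_key=None):
    """Classify credentials using the key prefixes described in this document."""
    if not api_key:
        raise ValueError("BAIDU_API_KEY is not set")
    if api_key.startswith("bce-v3/"):
        return "iam_key"        # Method 3: IAM key
    if api_key.startswith("1."):
        return "access_token"   # Method 2: ready-made access_token
    if not secret_key:
        raise ValueError("BAIDU_SECRET_KEY is required for API Key + Secret Key auth")
    return "api_key_secret"     # Method 1: exchange keys for a token


def fetch_access_token(api_key, secret_key, timeout=10):
    """Exchange API Key + Secret Key for an access_token (client_credentials grant)."""
    query = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": api_key,
        "client_secret": secret_key,
    })
    req = urllib.request.Request(f"{TOKEN_URL}?{query}", method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    if "access_token" not in payload:
        # Error responses carry fields such as "error" / "error_description".
        raise RuntimeError(f"Token request failed: {payload}")
    return payload["access_token"]
```

For example, `detect_auth_method(os.environ.get("BAIDU_API_KEY"), os.environ.get("BAIDU_SECRET_KEY"))` returns `"api_key_secret"` for Method 1 credentials. Only then does `fetch_access_token` make a network call.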

Usage

1. Direct script invocation (command line)

# Single dialogue file synthesis
python ~/.openclaw/skills/baidu-speech-synthesis/scripts/baidu_tts.py \
    --input dialogue.txt \
    --output conversation.mp3

# Specify voice mapping (character name → voice code)
python scripts/baidu_tts.py \
    --input script.txt \
    --map 小明:1 小红:0 老师:106

# Batch process all .txt files in a directory
python scripts/baidu_tts.py \
    --dir ./dialogues \
    --format mp3

# Adjust parameters
python scripts/baidu_tts.py \
    --input text.txt \
    --spd 7 --pit 6 --vol 5 \
    --aue 3
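
One way the command-line options above could translate into a request:

- `--map` pairs are parsed into a name-to-voice-code dictionary.
- `--spd`, `--pit`, `--vol`, and `--aue` become form fields for Baidu's short-text endpoint `https://tsn.baidu.com/text2audio` (`tex`, `tok`, `cuid`, `ctp`, `lan`, `per`, and so on).

This is a sketch, not the skill's code, with these assumptions:

- The helper names and the `cuid` value are hypothetical.
- The 0-15 range check should be verified against Baidu's parameter table; some voices may accept a narrower volume range.
- Sending the request, including how `tex` is encoded, is left out.

```python
TTS_URL = "https://tsn.baidu.com/text2audio"  # endpoint the form fields target


def parse_voice_map(pairs):
    """Turn ["小明:1", "小红:0"] into {"小明": 1, "小红": 0}."""
    mapping = {}
    for pair in pairs:
        # Accept ASCII ':' or full-width '：'; split on the last separator.
        name, sep, code = pair.replace("：", ":").rpartition(":")
        if not sep or not name or not code.isdigit():
            raise ValueError(f"Invalid mapping {pair!r}; expected NAME:VOICE_CODE")
        mapping[name] = int(code)
    return mapping


def build_tts_params(text, token, per=0, spd=5, pit=5, vol=5, aue=3,
                     cuid="openclaw-sketch"):
    """Build form fields for one text2audio request (0-15 range is an assumption)."""
    for label, value in (("spd", spd), ("pit", pit), ("vol", vol)):
        if not 0 <= value <= 15:
            raise ValueError(f"{label} must be between 0 and 15, got {value}")
    return {
        "tex": text,    # text to synthesize
        "tok": token,   # access_token from the auth step
        "cuid": cuid,   # client identifier (hypothetical value)
        "ctp": 1,       # client type, fixed value
        "lan": "zh",    # language, fixed value
        "per": per,     # voice code, e.g. from --map
        "spd": spd,
        "pit": pit,
        "vol": vol,
        "aue": aue,     # audio format; 3 is assumed to be MP3
    }
```

A per-speaker call would look up `per` from the parsed map, for example `build_tts_params(line, token, per=voices["小明"], spd=7, pit=6)`.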

2. Usage in OpenClaw sessions

When the user triggers the above phrases, the skill will:

  1. Check environment variable configuration
  2. Ask or automatically identify input text/file
  3. Generate SSML according to default or specified voice assignment scheme
  4. Call the Baidu API and return the audio file (can be played automatically or saved)
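
Step 3 above can be sketched as a parser that splits `Speaker: line` text into single-voice segments. This rests on several assumptions:

- The input format is assumed; the document does not specify the layout of dialogue.txt.
- The default mapping is the one listed under Technical Points, and entries passed via `--map` override it.
- Using voice 0 as the fallback for unknown speakers is a hypothetical choice, as is the function name.

```python
# Default mapping from the Technical Points section of this document.
DEFAULT_VOICE_MAP = {"老王": 3, "张经理": 1, "小李": 4}
# Hypothetical fallback voice for speakers without a mapping.
FALLBACK_VOICE = 0


def parse_dialogue(text, voice_map=None):
    """Split 'Speaker: line' text into (speaker, utterance, voice_code) tuples."""
    mapping = dict(DEFAULT_VOICE_MAP)
    if voice_map:
        mapping.update(voice_map)  # --map entries override the defaults
    segments = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Treat the first full-width colon like an ASCII one, then split once.
        speaker, sep, utterance = line.replace("：", ":", 1).partition(":")
        if not sep:
            # No speaker label: keep the text and use the fallback voice.
            segments.append((None, line, FALLBACK_VOICE))
            continue
        speaker, utterance = speaker.strip(), utterance.strip()
        if utterance:
            segments.append((speaker, utterance, mapping.get(speaker, FALLBACK_VOICE)))
    return segments


if __name__ == "__main__":
    sample = "老王：今天开会吗?\n张经理: 十点开始。\n小明：好的"
    for seg in parse_dialogue(sample, {"小明": 106}):
        print(seg)
```

Each resulting segment maps to exactly one voice, which is what the segment synthesis mode needs.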

File Structure

baidu-speech-synthesis/
├── SKILL.md                    # This file
├── scripts/
│   ├── baidu_tts.py            # Main API client (token acquisition, SSML requests, segment merging)
│   ├── dialogue_formatter.py   # Dialogue text → SSML conversion and voice mapping
│   └── audio_merger.py         # ffmpeg audio merging tool (segment merge solution)
└── references/
    ├── voice_list.md           # Voice code table, samples, recommended pairings
    ├── ssml_guide.md           # Baidu SSML tags, limitations, examples
    └── api_setup.md            # How to obtain keys, free quota (5 million chars/month), authentication details

Technical Points

  • Intelligent Mode Selection: Detects whether a dialogue needs multiple voices and defaults to segment synthesis mode, because the Baidu API only supports single-voice SSML.
  • Segment Synthesis Solution: Splits multi-role dialogues into single-voice segments, synthesizes each separately, and merges them with ffmpeg (this works around the API limitation and is compatible with Python 3.13).
  • SSML Single-Voice Support: Supports single-voice SSML (tex_type=3) for more complex speech expression within an individual character's lines.
  • Automatic Voice Assignment: Default mapping "老王" → Duxiaoyao (3), "张经理" → Duxiaoyu (1), "小李" → Duyaya (4), customizable via --map.
  • Error Handling: Friendly prompts for network timeouts, quota exhaustion, audio merge failures, etc.
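
The segment-merge step can be sketched with ffmpeg's concat demuxer. It joins files listed as `file '...'` lines without re-encoding (`-c copy`).

Assumptions and notes:

- All segments are assumed to come from the same `aue` format, so their codec parameters match.
- The helper names are hypothetical and not taken from audio_merger.py.
- Passing an argument list to `subprocess.run` rather than a shell string keeps file names from being interpreted by a shell.

```python
import os
import subprocess
import tempfile


def build_concat_list(paths):
    """Render the concat-demuxer list: one `file '...'` line per segment."""
    lines = []
    for path in paths:
        absolute = os.path.abspath(path)
        # Inside single quotes, a literal ' must be written as '\''.
        escaped = absolute.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def build_ffmpeg_cmd(list_file, output):
    """argv for a stream-copy concat; -safe 0 allows absolute paths in the list."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", output]


def merge_segments(paths, output):
    """Write the list to a temp file and run ffmpeg (raises CalledProcessError on failure)."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                     encoding="utf-8") as fh:
        fh.write(build_concat_list(paths))
        list_file = fh.name
    try:
        subprocess.run(build_ffmpeg_cmd(list_file, output),
                       check=True, capture_output=True)
    finally:
        os.remove(list_file)  # clean up the temporary list file
```

Stream copy is chosen because it avoids quality loss and is fast. If segments were produced with different formats, re-encoding (dropping `-c copy`) would be needed instead.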

Notes

  • Free Quota: Baidu Speech Synthesis offers a free quota of 5 million characters per month (per the latest 2026 policy); usage beyond that is billed pay-as-you-go.
  • Authentication Methods: Three authentication methods are supported (API Key + Secret Key, access_token, IAM Key), and the skill detects which one is in use automatically.
  • SSML Limitations: SSML text is limited to 1024 bytes, not characters, and each Chinese character takes more than one byte; keeping each sentence under 120 characters is recommended.
  • Dependencies: The segment-merge solution requires ffmpeg (the skill detects it and prompts if it is missing). pydub is not needed.
  • Voice Expressiveness: Baidu's base voices are relatively flat. Dialogue expressiveness can be improved through the text itself, e.g. by adding modal particles (语气词) and emotional descriptions.
  • Key Security: Do not hardcode API keys in code; always use environment variables or .env files.
  • Error Handling: Detailed guidance is provided for authentication failures; see references/api_setup.md for help.
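
Because the SSML limit is counted in bytes, long text has to be split before synthesis. The sketch below assumes bytes are counted in GBK, where a common Chinese character takes 2 bytes. It groups sentences into chunks under a byte budget and hard-splits any single sentence that is still too long. The function names and the punctuation set are hypothetical.

```python
import re

# Split after common Chinese and ASCII sentence-ending punctuation, keeping it attached.
SENTENCE_END = re.compile(r"(?<=[。！？；!?;\n])")


def gbk_len(text):
    """Byte length of text in GBK; unencodable characters count as 1 byte ('?')."""
    return len(text.encode("gbk", errors="replace"))


def chunk_by_bytes(text, max_bytes=1024):
    """Group sentences into chunks whose GBK byte length stays within max_bytes."""
    chunks, current = [], ""
    for sentence in filter(None, SENTENCE_END.split(text)):
        if gbk_len(sentence) > max_bytes:
            # Flush the pending chunk, then hard-split the long sentence by character.
            if current:
                chunks.append(current)
            piece = ""
            for ch in sentence:
                if piece and gbk_len(piece + ch) > max_bytes:
                    chunks.append(piece)
                    piece = ""
                piece += ch
            current = piece  # the remainder may absorb following sentences
            continue
        if gbk_len(current + sentence) > max_bytes:
            chunks.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        chunks.append(current)
    return chunks
```

Splitting at sentence punctuation first keeps pauses natural in the merged audio. The character-level fallback only kicks in for unusually long sentences.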

Changelog

  • 2026‑03‑31 (v1.2.3): Fixed bare except: statements in audio_merger.py; replaced with proper exception handling to improve debugging and error visibility.
  • 2026‑03‑26 (v1.2.2): Added MIT LICENSE file; updated metadata to declare ffmpeg dependency; addressing ClawHub security warnings.
  • 2026‑03‑26 (v1.2.1): Complete English translation of skill documentation; improved bilingual triggers for both English and Chinese users.
  • 2026‑03‑26 (v1.2): Switched to ffmpeg instead of pydub, solving Python 3.13 compatibility issues; corrected Baidu API limitation description (only supports single-voice SSML); optimized documentation and default voice mapping.
  • 2026‑03‑26 (v1.1): Enhanced authentication support, added IAM Key and direct access_token authentication, updated free quota information, improved error guidance.
  • 2026‑03‑26 (v1.0): Initial release, supporting multi-speaker dialogue synthesis, SSML/segment-merge dual modes.
Security Recommendations
This skill appears to do what it claims: construct SSML, call Baidu TTS endpoints, and merge audio with ffmpeg. Before installing, consider: (1) Keys you provide (BAIDU_API_KEY / BAIDU_SECRET_KEY or access_token/IAM key) will be used to call Baidu endpoints; keep them secret and prefer least-privilege keys scoped to TTS. (2) validate_config may require both the API Key and the Secret Key for its checks and may reject some valid IAM/access-token formats; if you use an alternative auth method, the validator might report false errors. (3) The skill runs ffmpeg via subprocess and writes temporary files; avoid feeding it untrusted input files, since maliciously crafted inputs could cause problems. (4) The included requirements.txt lists pydub and python-dotenv in addition to requests; install only what you need and review the code if you plan to run it in sensitive environments. Overall the package is internally consistent with its stated purpose.
Functional Analysis
Type: OpenClaw Skill · Name: baidu-speech-synthesis · Version: 1.2.3
The skill bundle provides a robust implementation for Baidu Intelligent Cloud Speech Synthesis (TTS), supporting multi-role dialogues and SSML formatting. The code is well-structured, using legitimate Baidu API endpoints (aip.baidubce.com and tsn.baidu.com) and standard libraries like requests and ffmpeg for audio processing. While some test scripts (e.g., test_client.py) contain a hardcoded placeholder IAM key and others print partial keys for diagnostic purposes, these are clearly intended for local debugging and authentication troubleshooting rather than data exfiltration or malicious intent.
Capability Assessment
Purpose & Capability
Name/description (Baidu TTS) matches required binaries (python3, ffmpeg), required env vars (BAIDU_API_KEY, BAIDU_SECRET_KEY), and included client/formatter/merger scripts. No unrelated credentials or surprising binaries are requested.
Instruction Scope
SKILL.md and the scripts instruct the agent to read input text files, build SSML, call Baidu token and TTS endpoints, produce temporary audio files and merge them with ffmpeg. These actions are within the stated purpose. Note: some helper scripts (validate_config, diagnose_auth) perform network calls to Baidu endpoints and inspect environment variables (including BAIDU_ACCESS_TOKEN if present); this is expected behavior but worth noting.
Install Mechanism
No remote download/install spec is present (instruction-only install). Dependencies are typical Python libraries and ffmpeg. Minor inconsistency: SKILL.md suggests installing only requests, whereas requirements.txt also lists pydub and python-dotenv; this is not a security issue but is a documentation mismatch to be aware of.
Credentials
Requested environment variables (BAIDU_API_KEY as primary, BAIDU_SECRET_KEY when needed) are proportionate for a Baidu TTS client. The skill supports access_token and IAM key formats as well. One caveat: validate_config enforces specific length/alphanumeric checks for API/Secret that may not match all valid key formats (e.g., bce-v3 IAM keys), causing false failures if using alternate auth methods.
Persistence & Privilege
Skill is not force-included (always: false) and is user-invocable. It allows autonomous invocation (platform default) but does not request elevated or system-wide persistence or credentials for other skills.
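How to Use
  1. Make sure OpenClaw is installed (local or Docker deployment).
  2. Enter the install command in the chat: /install baidu-speech-synthesis
  3. After installation, call the skill by name or trigger it with /baidu-speech-synthesis.
  4. Provide the required inputs as described in the skill's parameter documentation to get the result.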
Version History
v1.2.3
Fixed bare except statements in audio_merger.py for better error visibility and debugging
v1.2.2
- Added MIT LICENSE file.
- Updated metadata to explicitly declare ffmpeg as a dependency.
- Addressed ClawHub security warnings.
v1.2.0
- Switched to using ffmpeg instead of pydub for audio merging, ensuring compatibility with Python 3.13.
- Clarified that Baidu TTS API only supports SSML for single voice; improved related documentation.
- Enhanced skill documentation with clearer setup, usage, and technical explanations.
- Improved default voice mapping and added more robust error handling guidance.
Metadata
Slug: baidu-speech-synthesis
Version: 1.2.3
License: MIT-0
Total installs: 1
Current installs: 1
Versions: 3
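FAQ

What is Baidu Speech Synthesis?

Baidu Intelligent Cloud Speech Synthesis (TTS), supporting multi-role dialogue audio generation, SSML/segment-merge dual modes, speech rate/pitch adjustment. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 178 times to date.

How do I install Baidu Speech Synthesis?

Run "/install baidu-speech-synthesis" in an OpenClaw or Claude Code chat to install it in one step. Before use, set BAIDU_API_KEY (and BAIDU_SECRET_KEY for the API Key + Secret Key method) as described in Environment Variables Setup.

Is Baidu Speech Synthesis free?

Yes. The skill itself is free under the MIT-0 license and can be freely downloaded, installed, and used. Calls to the Baidu API are free within Baidu's monthly quota and billed pay-as-you-go beyond it.

Which platforms does Baidu Speech Synthesis support?

Baidu Speech Synthesis is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Baidu Speech Synthesis?

It is developed and maintained by guoxh (@guoxh); the current version is v1.2.3.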
