aleglowa

Deapi Audio

Author: Alex Glowacki · GitHub · v1.0.1 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 164
Favorites: 1
Current installs: 0
Versions: 2
Install in OpenClaw
/install deapi-audio
Description
Text-to-speech, voice cloning, voice design, and transcribe audio files via deAPI GPU network. Trigger on 'text to speech', 'TTS', 'generate voice', 'read al...
Usage (SKILL.md)

deAPI Audio

Text-to-speech, voice cloning, voice design, and audio transcription via deAPI decentralized GPU network.

Scripts

Script                        Use when...
scripts/text-to-speech.sh     User wants to convert text to spoken audio
scripts/voice-clone.sh        User wants to clone/replicate a voice from a sample audio file
scripts/voice-design.sh       User wants to generate speech with a voice described in natural language
scripts/speech-to-text.sh     User wants to transcribe an audio file (AAC, MP3, OGG, WAV, WebM, FLAC, max 10 MB)

Your config

! cat ${CLAUDE_SKILL_DIR}/config.json 2>/dev/null || echo "NOT_CONFIGURED"

If the config above is NOT_CONFIGURED, ask the user for their deAPI API key.

Then write the answer to ${CLAUDE_SKILL_DIR}/config.json as { "api_key": "their_key" }.

Alternatively, the user can set the DEAPI_API_KEY environment variable directly, which takes priority over config.json.
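
The lookup order described above can be sketched as a small shell function. This is a minimal sketch, not the bundled scripts' actual code: the function name resolve_api_key is hypothetical, and it assumes config.json has the { "api_key": ... } shape shown above and that jq is installed.

```shell
# Hypothetical helper: the bundled scripts may resolve the key differently.
resolve_api_key() {
  # 1. The environment variable wins when it is set and non-empty.
  if [ -n "${DEAPI_API_KEY:-}" ]; then
    printf '%s\n' "$DEAPI_API_KEY"
    return 0
  fi

  # 2. Otherwise fall back to config.json in the skill directory.
  local config="${CLAUDE_SKILL_DIR:-.}/config.json"
  if [ -r "$config" ]; then
    local key
    key=$(jq -r '.api_key // empty' "$config") || return 1
    if [ -n "$key" ]; then
      printf '%s\n' "$key"
      return 0
    fi
  fi

  # 3. Nothing configured: mirror the NOT_CONFIGURED marker used above.
  echo "NOT_CONFIGURED" >&2
  return 1
}
```

A caller could use it as `key=$(resolve_api_key) || exit 1`, which stops early instead of sending a request with an empty key.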

Gotchas

  • For YouTube/video transcription, use the deapi-video skill instead. This skill handles audio-only files (.mp3, .wav, .m4a, .flac, .ogg).
  • Three TTS models: Kokoro (default), Chatterbox, Qwen3. Use --model Chatterbox or --model Qwen3 to switch.
  • Kokoro: Voice ID format is {lang}{gender}_{name}. Language is auto-detected from voice prefix if --lang is omitted.
  • Chatterbox: voice is always default, speed is fixed at 1, supports 22 languages. Text limit 10-2000 chars.
  • Kokoro: text limit 3-10001 chars. Long text may time out; split it into segments and generate each one separately.
  • TTS output format defaults to mp3. WAV files are much larger but lossless.
  • Kokoro: speed range is 0.5-2.0. Values outside this range cause errors.
  • Qwen3 Voice Clone (voice-clone.sh): reference audio must be 5-15 seconds long; shorter or longer samples degrade quality. Formats: MP3, WAV, FLAC, OGG, M4A. URLs are downloaded automatically.
  • Qwen3 Voice Design (voice-design.sh): quality depends on the --instruct description. Encourage specific details: gender, age, accent, speaking style, emotion.
  • Qwen3 models use full language names (English, French, etc.) NOT language codes. 10 supported languages: English, Italian, Spanish, Portuguese, Russian, French, German, Korean, Japanese, Chinese.
  • Qwen3 TTS (--model Qwen3): 9 voices available, default Vivian. The Ryan voice is not available for Chinese.
  • Qwen3 text limit is 10-5000 chars. Speed is fixed at 1. Voice Clone and Voice Design use voice=default.
  • Audio transcription accepts a local file path or URL (--audio). Formats: AAC, MP3, OGG, WAV, WebM, FLAC. Max 10 MB.
  • Result URLs expire in 24 hours. Download promptly.
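
The per-model limits listed above can be checked locally before a request is sent. The following is a sketch under stated assumptions: check_tts_text and check_tts_speed are hypothetical helper names, the numbers are copied from the gotchas, and it is not known whether the API counts characters the same way the shell does.

```shell
# Hypothetical pre-flight checks; the limits are copied from the gotchas above.
# Returns 0 when the text length fits the model, non-zero otherwise.
check_tts_text() {
  local model="$1" text="$2" min max len
  case "$model" in
    Kokoro)     min=3;  max=10001 ;;
    Chatterbox) min=10; max=2000  ;;
    Qwen3)      min=10; max=5000  ;;
    *) echo "unknown model: $model" >&2; return 2 ;;
  esac
  # In bash, ${#text} counts characters according to the current locale.
  len=${#text}
  if [ "$len" -lt "$min" ] || [ "$len" -gt "$max" ]; then
    echo "$model accepts $min-$max chars, got $len" >&2
    return 1
  fi
}

# Returns 0 when the speed is valid for the model.
check_tts_speed() {
  local model="$1" speed="$2" lo hi
  case "$model" in
    Kokoro)           lo=0.5; hi=2.0 ;;
    Chatterbox|Qwen3) lo=1;   hi=1   ;;   # speed is fixed at 1
    *) echo "unknown model: $model" >&2; return 2 ;;
  esac
  # awk handles the decimal comparison; plain [ ] only compares integers.
  awk -v s="$speed" -v lo="$lo" -v hi="$hi" 'BEGIN {
    ok = (s ~ /^[0-9]+(\.[0-9]+)?$/) && (s + 0 >= lo + 0) && (s + 0 <= hi + 0)
    exit !ok
  }'
}
```

For example, `check_tts_text Chatterbox "Hi"` fails because the text is shorter than 10 chars, and `check_tts_speed Kokoro 2.5` fails because the speed is above 2.0.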

Quick examples

# Basic TTS
bash scripts/text-to-speech.sh --text "Hello world"

# British voice
bash scripts/text-to-speech.sh --text "Good morning" --voice bf_emma

# Chatterbox model (multilingual)
bash scripts/text-to-speech.sh --model Chatterbox --text "Bonjour le monde" --lang fr

# Qwen3 model
bash scripts/text-to-speech.sh --model Qwen3 --text "Hello world" --voice Serena --lang English

# Clone a voice from a sample
bash scripts/voice-clone.sh --text "Hello, this is my cloned voice" --ref-audio /path/to/sample.mp3

# Clone with reference transcript for better accuracy
bash scripts/voice-clone.sh --text "Welcome to the show" --ref-audio /path/to/sample.wav --ref-text "This is the original transcript"

# Design a custom voice from description
bash scripts/voice-design.sh --text "Good morning everyone" --instruct "A warm, deep male voice with a slight British accent"

# Voice design in another language
bash scripts/voice-design.sh --text "Bonjour tout le monde" --instruct "A cheerful young female voice" --lang French

# Transcribe audio file (local or URL)
bash scripts/speech-to-text.sh --audio /path/to/recording.mp3
bash scripts/speech-to-text.sh --audio "https://example.com/podcast.mp3"

For the full voice list and language codes, see references/voices.md.
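
The Kokoro gotcha above suggests splitting long text into segments and generating each one separately. One way to do that is sketched below, with several assumptions: split_text and tts_in_segments are hypothetical helpers, the default segment size of 2000 is an arbitrary illustration rather than a documented threshold, and text-to-speech.sh is assumed to be run from the skill directory with --text as in the examples above. The split happens at spaces, not sentence boundaries, and fold may count bytes rather than characters for multibyte text.

```shell
# Hypothetical helper: split stdin into segments of at most $1 columns,
# breaking at spaces (word boundaries, not sentence boundaries).
split_text() {
  local max="${1:-2000}"
  # Join lines into one, then add a final newline so the last segment
  # is newline-terminated and is not dropped by `read`.
  { tr '\n' ' '; echo; } | fold -s -w "$max"
}

# Generate one audio result per segment, in order.
tts_in_segments() {
  local max="${1:-2000}" n=0 segment
  split_text "$max" | while IFS= read -r segment; do
    [ -n "$segment" ] || continue
    n=$((n + 1))
    echo "segment $n: ${#segment} chars" >&2
    # </dev/null keeps the script from consuming the remaining segments.
    bash scripts/text-to-speech.sh --text "$segment" </dev/null
  done
}
```

Usage would look like `tts_in_segments 2000 < long-article.txt`, where long-article.txt is a placeholder file name. Each segment yields its own result, so the audio files still need to be joined afterwards if a single file is wanted.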

Security recommendations
This skill appears to do what it says, but review and consider these practical risks before installing:
  • Provide the API key via the DEAPI_API_KEY environment variable rather than saving it in the skill's config.json if you want less disk-persistent exposure. If you do use config.json, be aware that the API key is stored in plaintext under the skill directory.
  • The scripts download user-supplied URLs (ref-audio, audio). Do not provide internal or sensitive endpoints (e.g., http://169.254.x.x, internal cloud metadata endpoints, or private intranet URLs), because the skill will fetch them and may upload content to the external deAPI service. This is an expected capability for remote-audio support but is an SSRF/exfiltration risk if misused.
  • The scripts log API responses to stderr on some errors; sensitive response payloads could appear in agent logs. Rotate the API key if it's ever exposed.
  • Voice cloning involves uploading sample audio; consider privacy and legal implications before cloning someone else's voice.
If you need stronger guarantees: keep the key in a protected secret store, avoid providing internal URLs as inputs, inspect the scripts locally before use, and monitor/network-restrict the runtime environment if you want to limit outbound fetches.
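
As a first line of defense against the internal-URL risk described above, a caller could screen URLs before passing them to --audio or --ref-audio. The sketch below is an assumption-laden illustration, not part of the skill: is_internal_url is a hypothetical helper, it only inspects literal hosts (it does not resolve DNS, so a public hostname that points at a private address is not caught), and it handles only the IPv6 loopback literal.

```shell
# Hypothetical guard: returns 0 (true) when the URL's host is a literal
# loopback, link-local, or private address. It does not resolve DNS.
is_internal_url() {
  local url="$1" host
  host="${url#*://}"        # drop the scheme
  host="${host%%/*}"        # drop the path
  host="${host%%[?#]*}"     # drop a query or fragment with no path
  host="${host##*@}"        # drop userinfo, if any
  host=$(printf '%s' "$host" | tr 'A-Z' 'a-z')
  case "$host" in
    "[::1]"*) return 0 ;;   # IPv6 loopback literal
  esac
  host="${host%%:*}"        # drop the port
  # Prefix patterns err on the side of blocking (e.g. "10.example.com").
  case "$host" in
    localhost|127.*|10.*|192.168.*|169.254.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*)    return 0 ;;
    *) return 1 ;;
  esac
}
```

A wrapper might run `is_internal_url "$url" && { echo "refusing internal URL" >&2; exit 1; }` before calling speech-to-text.sh or voice-clone.sh. Network-level egress restrictions remain the stronger control.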
Functional analysis
Type: OpenClaw Skill
Name: deapi-audio
Version: 1.0.1
The deapi-audio skill bundle provides a legitimate integration with the deAPI GPU network for audio processing tasks like TTS, voice cloning, and transcription. The scripts (text-to-speech.sh, voice-clone.sh, voice-design.sh, and speech-to-text.sh) use standard shell utilities (curl, jq) to interact with the official API at api.deapi.ai. The bundle includes proper input handling, such as using jq to safely escape strings for JSON payloads and mktemp for temporary file management. No evidence of data exfiltration, malicious execution, or prompt injection was found; the behavior is entirely consistent with the stated purpose of the skill.
Capability assessment
Purpose & Capability
Name/description, declared dependencies (curl, jq), primaryEnv (DEAPI_API_KEY), API base URL, and the provided scripts all align with a deAPI audio integration. There are no unrelated credentials or binaries requested.
Instruction Scope
SKILL.md and the scripts stay within the audio TTS/STT/voice-clone/design domain. The scripts will download user-supplied audio URLs and upload local files to the deAPI service (expected for this purpose). Note: accepting arbitrary URLs means the skill will fetch external resources (and could access internal URLs if provided), and error logging prints raw API responses in some failure cases (the scripts echo response content to stderr), so those behaviors are expected but worth being aware of.
Install Mechanism
This is an instruction-only skill with bundled shell scripts and no install spec. No remote downloads or package installs are performed by the skill itself, minimizing install-time risk.
Credentials
Only DEAPI_API_KEY is required and is the declared primary credential — appropriate for an API client. The skill also supports a plaintext fallback config.json written to the skill directory to store the API key; storing keys on disk is functional but increases exposure compared with using an environment variable only.
Persistence & Privilege
always:false (no forced inclusion). The skill can be invoked autonomously by the agent (default behavior), which is normal. It does not request system-wide privileges or modify other skills' configs.
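How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install deapi-audio
  3. After installation, invoke the Skill by name or trigger it with /deapi-audio
  4. Provide the required inputs as described in the Skill's parameter documentation to get structured output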
Version history
v1.0.1
No file or functionality changes detected in this release.
  • Version updated to 1.0.1 with no modifications to files or documentation.
v1.0.0
Initial release of deapi-audio v1.0.0:
  • Provides text-to-speech, voice cloning, custom voice design, and audio transcription via deAPI GPU network.
  • Supports three TTS models: Kokoro (default), Chatterbox (multilingual), and Qwen3 (advanced features).
  • Includes scripts for TTS, voice cloning, voice design, and speech-to-text transcription.
  • Handles various audio formats (AAC, MP3, OGG, WAV, WebM, FLAC).
  • API key configuration via environment variable or JSON config file.
  • Clear guidance and gotchas for practical usage, including model limits and tips for optimal results.
Metadata
Slug: deapi-audio
Version: 1.0.1
License: MIT-0
Total installs: 0
Current installs: 0
Version count: 2
FAQ

What is Deapi Audio?

Text-to-speech, voice cloning, voice design, and transcribe audio files via deAPI GPU network. Trigger on 'text to speech', 'TTS', 'generate voice', 'read al... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 164 total downloads to date.

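How do I install Deapi Audio?

Run the command "/install deapi-audio" in the OpenClaw or Claude Code chat box to install it in one step. Installation itself needs no additional configuration; the skill asks for a deAPI API key on first use if none is configured.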

Is Deapi Audio free?

Yes. Deapi Audio is free and released under the MIT-0 license; it can be downloaded, installed, and used freely.

Which platforms does Deapi Audio support?

Deapi Audio is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Deapi Audio?

It is developed and maintained by Alex Glowacki (@aleglowa); the current version is v1.0.1.
