nissan

Elevenlabs Toolkit

Author: Nissan Dookeran · GitHub · v1.0.2 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 581
Favorites: 0
Current installs: 5
Versions: 3
Install in OpenClaw
/install elevenlabs-toolkit
Description
ElevenLabs voice API integration — TTS, sound effects, music generation, speech-to-text, voice isolation, and streaming. Use when building voice-enabled apps...
Usage (SKILL.md)

ElevenLabs Toolkit

Programmatic access to all 7 ElevenLabs API capabilities via FastAPI endpoints or standalone Python functions.


When to Use This / When NOT to Use This

Use ElevenLabs when:

  • Generating high-quality narration audio for videos, demos, or content (especially with Rachel or a consistent character voice)
  • Building a voice-enabled app that needs streamed speech in real-time
  • Transcribing audio files (STT/Scribe)
  • Generating ambient sound effects or background music from text descriptions
  • Isolating clean voice from a noisy recording

Do NOT use ElevenLabs when:

  • You need fast/cheap TTS with no quality bar — use local TTS instead (see below)
  • You're offline or the API key isn't available
  • You're generating large volumes of test audio and don't want to burn character quota

ElevenLabs vs Local TTS (kokoro / chatterbox)

| Criteria | ElevenLabs | Local TTS (kokoro/chatterbox) |
| --- | --- | --- |
| Voice quality | ★★★★★ (natural, expressive) | ★★★ (good but robotic edges) |
| Cost | Chars deducted from monthly quota | Free, unlimited |
| Latency | ~300–800ms API round-trip | ~50–200ms local inference |
| Voice consistency | Named voices (Rachel etc.) persist | Model-dependent |
| Offline use | ❌ Requires internet + API key | ✅ Fully local |
| Best for | Final narration, published content | Drafts, testing, high-volume batch |

Rule of thumb: Use ElevenLabs for any audio an end user will hear. Use local TTS for drafts, tests, and high-volume work.
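The rule of thumb can be written down as a tiny decision helper. This is only a sketch: `choose_tts_engine` and its three flags are hypothetical names introduced here, not part of the skill.

```python
def choose_tts_engine(user_facing: bool, online: bool, has_api_key: bool) -> str:
    """Return "elevenlabs" or "local" following the rule of thumb above."""
    # ElevenLabs needs both network access and an API key.
    if not (online and has_api_key):
        return "local"
    # User-facing audio gets the higher-quality voice; drafts and tests stay local.
    return "elevenlabs" if user_facing else "local"
```

For example, a final narration rendered on a connected machine with a key set would go to ElevenLabs, while the same text in a test run would stay on local TTS and spend no character quota.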


Capabilities

| Tool | Endpoint | What It Does |
| --- | --- | --- |
| Voices | GET /api/voices | Browse available voices with metadata |
| TTS | POST /api/voice/tts | Batch text-to-speech (any voice, any language) |
| TTS Stream | WS /api/voice/stream | Real-time WebSocket TTS streaming |
| Sound Effects | POST /api/voice/sfx | Generate ambient audio from text prompts |
| Music | POST /api/voice/music | Generate background music from descriptions |
| STT (Scribe) | POST /api/voice/stt | Transcribe audio with language detection |
| Voice Isolation | POST /api/voice/isolate | Extract clean voice from noisy audio |

Known Voice IDs

These are confirmed voices used in OpenClaw workflows. Always prefer these over browsing the full list:

| Voice | Voice ID | Best For |
| --- | --- | --- |
| Rachel | 21m00Tcm4TlvDq8ikWAM | Default narration: clear, warm, American English |
| Adam | pNInz6obpgDQGcFmaJgB | Male narration, authoritative tone |
| Domi | AZnzlk1XvdvUeBnXmlld | Energetic, conversational |
| Bella | EXAVITQu4vr4xnSDxMaL | Soft, gentle narration |

Default for all narration tasks: Use Rachel (21m00Tcm4TlvDq8ikWAM) unless explicitly specified otherwise.
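One way to keep names and IDs from getting mixed up is a small lookup that falls back to Rachel. The sketch below uses only the four IDs listed above; `KNOWN_VOICES` and `resolve_voice_id` are hypothetical helpers, not functions exported by the skill.

```python
from typing import Optional

# Voice IDs copied from the Known Voice IDs table.
KNOWN_VOICES = {
    "rachel": "21m00Tcm4TlvDq8ikWAM",
    "adam": "pNInz6obpgDQGcFmaJgB",
    "domi": "AZnzlk1XvdvUeBnXmlld",
    "bella": "EXAVITQu4vr4xnSDxMaL",
}
DEFAULT_VOICE = "rachel"


def resolve_voice_id(name: Optional[str] = None) -> str:
    """Map a voice name (or an already-known ID) to a voice ID."""
    if name is None:
        return KNOWN_VOICES[DEFAULT_VOICE]
    if name in KNOWN_VOICES.values():
        return name  # Caller already passed a known ID.
    key = name.strip().lower()
    if key not in KNOWN_VOICES:
        raise KeyError(f"Unknown voice {name!r}; known: {sorted(KNOWN_VOICES)}")
    return KNOWN_VOICES[key]
```

Passing the result as `voice_id` in a TTS request avoids sending a display name such as "Rachel" where an ID is expected.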

To get the full current list from the API:

curl -s -H "xi-api-key: $ELEVENLABS_API_KEY" https://api.elevenlabs.io/v1/voices | python3 -m json.tool

Quick Start

import os

import httpx

BASE = "http://localhost:8000"  # Your FastAPI app
KEY = os.environ["ELEVENLABS_API_KEY"]  # Raises KeyError early if the key is not set

# Get voices
voices = httpx.get(f"{BASE}/api/voices").json()

# Generate speech
audio = httpx.post(f"{BASE}/api/voice/tts", json={
    "text": "Hello world",
    "voice_id": voices[0]["voice_id"],
    "model_id": "eleven_multilingual_v2"
}).content  # Returns raw audio bytes

# Generate sound effects
sfx = httpx.post(f"{BASE}/api/voice/sfx", json={
    "prompt": "ocean waves on a quiet beach at night"
}).content

Audio Output Format

TTS and SFX endpoints return raw audio bytes (not base64, not JSON).

# Correct: .content gives you bytes
audio_bytes = response.content  # type: bytes

# Save to file
with open("output.mp3", "wb") as f:
    f.write(audio_bytes)

# The file format is MP3 by default
# File size estimate: ~1 MB per minute of speech at standard quality

What you get back from each endpoint:

| Endpoint | Response type | How to handle |
| --- | --- | --- |
| POST /api/voice/tts | bytes (MP3) | Write directly to .mp3 file |
| POST /api/voice/sfx | bytes (MP3) | Write directly to .mp3 file |
| POST /api/voice/music | bytes (MP3) | Write directly to .mp3 file |
| POST /api/voice/stt | JSON | {"text": "transcription...", "language": "en"} |
| POST /api/voice/isolate | bytes (MP3) | Write directly to .mp3 file |
| GET /api/voices | JSON | List of {voice_id, name, labels, ...} |
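The table could be turned into a small dispatcher that parses JSON for the two JSON endpoints and writes everything else to disk. This is a sketch under the assumption that the endpoint paths are exactly those listed above; `JSON_ENDPOINTS` and `handle_response` are hypothetical names.

```python
import json
from pathlib import Path

# Endpoints that return JSON according to the table; all others return MP3 bytes.
JSON_ENDPOINTS = {"/api/voices", "/api/voice/stt"}


def handle_response(path: str, body: bytes, out_file: Path):
    """Parse JSON endpoints; save audio endpoints to out_file and return its path."""
    if path in JSON_ENDPOINTS:
        return json.loads(body)
    out_file.write_bytes(body)  # Raw MP3 bytes, no decoding needed.
    return out_file
```

With httpx, `body` would be `response.content` in both cases, so the caller never has to guess whether `.json()` is safe to call.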

Voice Selection Guide

  • English only: Use eleven_turbo_v2_5 — faster, no accent bleed
  • Multilingual: Use eleven_multilingual_v2 — supports 29 languages
  • Accent warning: Multilingual model can bleed accents across languages. If an English voice sounds Japanese, switch to turbo.
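The selection guide can be sketched as a function over the languages present in the text. The model IDs are the two named above; `pick_model` and the use of short language codes such as "en" or "en-US" are assumptions made for illustration.

```python
from typing import Iterable


def pick_model(languages: Iterable[str]) -> str:
    """Choose a model ID from the language codes that appear in the text."""
    # Normalize "en-US" / "EN" to "en" so regional variants count as English.
    codes = {code.strip().lower().split("-")[0] for code in languages}
    if codes == {"en"}:
        return "eleven_turbo_v2_5"  # English only: faster, no accent bleed
    return "eleven_multilingual_v2"  # Anything else goes to the multilingual model
```

The returned string can be passed as `model_id` in the TTS request body.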

Quota Management

ElevenLabs charges per character for TTS. Key patterns:

  • Cache aggressively — identical text + voice = identical audio
  • Use prompt-cache skill for SHA-256 dedup before calling TTS
  • A 6-scene children's story ≈ 2,000 characters
  • Free tier: 10k chars/month. Starter: 30k. Creator: 100k.
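A minimal sketch of the caching idea, assuming a file-based cache keyed by SHA-256 over model, voice, and text. This is a stand-in written for illustration, not the prompt-cache skill's actual interface; `cache_key`, `cached_tts`, and the `synthesize` callback are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cache_key(text: str, voice_id: str, model_id: str) -> str:
    """SHA-256 over model, voice, and text, separated so fields cannot collide."""
    payload = "\x1f".join((model_id, voice_id, text)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def cached_tts(
    text: str,
    voice_id: str,
    model_id: str,
    synthesize: Callable[[str, str, str], bytes],
    cache_dir: Path,
) -> bytes:
    """Return cached audio if present; otherwise call synthesize once and store it."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(text, voice_id, model_id)}.mp3"
    if path.exists():
        return path.read_bytes()  # Cache hit: no characters spent.
    audio = synthesize(text, voice_id, model_id)
    path.write_bytes(audio)
    return audio
```

In practice `synthesize` would wrap the POST to `/api/voice/tts`; keeping it as a parameter lets the cache be tested without touching the quota.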

Integration

Copy scripts/elevenlabs_api.py into your FastAPI app and mount the router:

from elevenlabs_api import router
app.include_router(router)

Set ELEVENLABS_API_KEY in your environment. All endpoints handle errors gracefully with proper HTTP status codes.


What If the FastAPI Server Isn't Running?

The Quick Start examples assume http://localhost:8000 is live. If it's not:

# Check if server is up before calling.
# Assumes your app exposes a /health route and a module-level `app`;
# neither is listed among the 7 endpoints above, so adjust to your setup.
import subprocess
import time

import httpx

try:
    httpx.get("http://localhost:8000/health", timeout=2.0)
except httpx.ConnectError:
    # Server is not running; start it first
    subprocess.Popen(["uvicorn", "elevenlabs_api:app", "--port", "8000"])
    time.sleep(2)  # Give it a moment to bind

Or call the ElevenLabs API directly without the FastAPI wrapper — the scripts/elevenlabs_api.py functions are importable standalone:

from elevenlabs_api import generate_tts  # if the module exposes standalone functions

Error Handling: API Key and Rate Limits

Missing API key:

httpx.HTTPStatusError: 401 Unauthorized
{"detail": {"status": "unauthorized", "message": "Invalid API key"}}

→ Check ELEVENLABS_API_KEY is set: echo $ELEVENLABS_API_KEY
→ Retrieve from 1Password: op read "op://OpenClaw/ElevenLabs API Credentials/credential"

Rate limited (429):

{"detail": {"status": "too_many_requests", "message": "Too many requests"}}

→ Wait and retry with exponential backoff. ElevenLabs rate limits are per-minute on the free/starter tiers.
→ On Creator tier and above, limits are much higher; check your tier in the ElevenLabs dashboard.
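Exponential backoff for 429 responses might look like the sketch below. The `RateLimited` exception and `retry_with_backoff` helper are hypothetical, and the delays (1s base, 30s cap, up to 50% jitter) are illustrative defaults rather than values documented by ElevenLabs.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Raise this from `call` when the server answers 429."""


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimited, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries from concurrent clients.
            sleep(delay + random.uniform(0, delay / 2))
    raise ValueError("max_attempts must be at least 1")
```

With httpx, `call` would send the request and raise `RateLimited` when `response.status_code == 429`, returning `response.content` otherwise.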

Quota exhausted:

{"detail": {"status": "quota_exceeded", "message": "Quota exceeded"}}

→ Character quota for the month is used up. Either wait for the monthly reset or upgrade your tier.
→ Check current usage: curl -s -H "xi-api-key: $ELEVENLABS_API_KEY" https://api.elevenlabs.io/v1/user/subscription
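A pre-flight quota check could be built on the subscription response. The sketch assumes the JSON body carries `character_count` (used) and `character_limit` (allowed) fields; verify those names against the actual response before relying on them. `remaining_characters` and `fits_in_quota` are hypothetical helpers.

```python
def remaining_characters(subscription: dict) -> int:
    """Characters left this period, given the parsed subscription JSON."""
    # Field names are an assumption; confirm them against your own response.
    used = int(subscription["character_count"])
    limit = int(subscription["character_limit"])
    return max(0, limit - used)


def fits_in_quota(text: str, subscription: dict) -> bool:
    """True if sending `text` to TTS would stay within the remaining quota."""
    return len(text) <= remaining_characters(subscription)
```

Running this check before a batch job makes a quota_exceeded error less likely to appear halfway through.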


Files

  • scripts/elevenlabs_api.py — FastAPI router with all 7 endpoints

Common Mistakes

  1. Treating the response as JSON when it's bytes

    • response.json() on a TTS call → JSONDecodeError
    • response.content → raw bytes, then write to .mp3
  2. Using the wrong voice ID

    • ElevenLabs voice IDs are opaque strings, not names
    • "voice_id": "Rachel" → 404 or wrong voice
    • "voice_id": "21m00Tcm4TlvDq8ikWAM" (Rachel's actual ID)
  3. Calling TTS for large batches without caching

    • Identical text+voice always produces identical audio — don't re-generate what's already cached
    • Burns character quota unnecessarily
  4. Using multilingual model for English-only content

    • eleven_multilingual_v2 is slower and can produce accent artifacts on English-only text
    • Use eleven_turbo_v2_5 for English-only work
  5. Not checking the FastAPI server is running before calling

    • httpx.ConnectError is confusing if you forget the local server dependency
    • Add a health check or start-server step before calling endpoints

Security Notes

This skill uses patterns that may trigger automated security scanners:

  • base64: Used for encoding audio/binary data in API responses (standard practice for media APIs)
  • UploadFile: FastAPI's built-in file upload parameter for STT/voice isolation endpoints
  • "system prompt": Refers to configuring agent instructions, not prompt injection
Security recommendations
This skill appears to implement the ElevenLabs features it advertises, but you should be cautious before installing or running it:

  1. The package includes Python code that requires additional libraries (fastapi, httpx, websockets, mistralai, etc.) but provides no install instructions. Ask the author for a requirements file or installation spec, or prepare to install dependencies yourself.
  2. The code can optionally call Mistral if MISTRAL_API_KEY is present, but that env var is not declared. If you do not want it to call Mistral, ensure MISTRAL_API_KEY is not set in your environment.
  3. The skill needs outbound network access and your ELEVENLABS_API_KEY. Never share that key with untrusted code.
  4. Confirm expected behavior (for streaming, STT uploads, and conversational features) in a safe environment before using in production.

If you need higher assurance, request the author to: (a) declare all required env vars (including optional ones), (b) provide a requirements.txt or install spec that uses trusted package sources, and (c) document exactly when additional services (like Mistral) will be invoked.
Functional analysis
Type: OpenClaw Skill
Name: elevenlabs-toolkit
Version: 1.0.2

The elevenlabs-toolkit skill provides a legitimate FastAPI-based integration for ElevenLabs voice and audio services. The code in scripts/elevenlabs_api.py correctly handles API keys via environment variables and communicates only with official ElevenLabs and Mistral AI endpoints. The documentation in SKILL.md is informative and includes proactive security notes regarding common false-positive triggers like base64 encoding and file uploads.
Capability assessment
Purpose & Capability
The name/description (ElevenLabs TTS, STT, SFX, music, streaming, voice isolation) align with the code and SKILL.md: the code proxies to api.elevenlabs.io endpoints for voices, text-to-speech, sound generation, speech-to-text, isolation, and streaming. ELEVENLABS_API_KEY is declared and used as the primary credential.
Instruction Scope
SKILL.md and the included Python implement only the declared ElevenLabs features and expose FastAPI endpoints for them. However, the code also implements a conversational 'story concierge' that calls a third-party Mistral client if MISTRAL_API_KEY is present — this behavior is not declared in requires.env and broadens the runtime scope. SKILL.md's metadata mentions base64 usage but the implementation returns raw bytes (minor inconsistency).
Install Mechanism
There is no install spec, yet the included code depends on multiple Python packages (fastapi, starlette, httpx, websockets, mistralai, etc.). Without a declared install step, an environment running this skill may lack required dependencies or the operator may need to install them manually; that absence is an operational and supply-chain mismatch (not necessarily malicious but worth noting).
Credentials
ELEVENLABS_API_KEY is appropriate and declared as primary. The code optionally reads MISTRAL_API_KEY and imports a 'mistralai' client to call another service, but MISTRAL_API_KEY is not listed in requires.env. Requesting or using additional service credentials without declaration is a proportionality/information-gap concern.
Persistence & Privilege
The skill does not request always:true, does not modify other skills, and has no declared persistent/system-level privileges. It performs outbound network calls to ElevenLabs (expected for the stated purpose).
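How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install elevenlabs-toolkit
  3. Once installed, invoke the skill by name or trigger it with /elevenlabs-toolkit
  4. Provide the required inputs described in the skill's parameters to get structured output.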
Version history
v1.0.2
Add security_notes explaining base64 audio encoding, UploadFile type, and system prompt config context
v1.0.1
  • Added explicit security notes field to metadata, clarifying use of base64 encoding and FastAPI's UploadFile.
  • Version updated to 1.0.1.
  • Documentation and integration details remain unchanged.
v1.0.0
Initial release — extracted from Sandman Tales v2 hackathon
Metadata
Slug: elevenlabs-toolkit
Version: 1.0.2
License: MIT-0
Total installs: 6
Current installs: 5
Versions: 3
FAQ

What is Elevenlabs Toolkit?

ElevenLabs voice API integration: TTS, sound effects, music generation, speech-to-text, voice isolation, and streaming. Use when building voice-enabled apps... It is an AI agent skill plugin for Claude Code / OpenClaw, with 581 total downloads to date.

How do I install Elevenlabs Toolkit?

Run the command "/install elevenlabs-toolkit" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Elevenlabs Toolkit free?

Yes. Elevenlabs Toolkit is completely free and released under the MIT-0 license, so you can download, install, and use it freely.

Which platforms does Elevenlabs Toolkit support?

Elevenlabs Toolkit is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.

Who develops Elevenlabs Toolkit?

It is developed and maintained by Nissan Dookeran (@nissan). The current version is v1.0.2.
