FlowVoice — Clone Any Voice From a Short Audio Sample

by windseeker1111 · GitHub ↗ · v1.1.0 · MIT-0
cross-platform ⚠ suspicious
386 downloads · 0 stars · 0 active installs · 4 versions
Install in OpenClaw
/install flow-voice
Description
Clone any voice from a short audio sample and generate speech with it. Powered by LuxTTS (150x realtime, local, free, no API key). Use when asked to clone a...
README (SKILL.md)

Flow Voice — Voice Cloning for OpenClaw

Clone any voice from a 3–30 second audio sample and generate speech from text. Powered by LuxTTS — 150x realtime, runs locally, fits in 1GB VRAM, works on CPU and Apple Silicon MPS. No API key, no cloud, no cost.

Output directory: ~/clawd/output/voice/


Commands

| What you say | What it does |
| --- | --- |
| "clone this voice [audio file]" | Encode a voice profile from a sample |
| "speak as [name]: [text]" | Generate speech using a saved voice profile |
| "add voiceover to [video]: [text]" | Generate speech + bake into video with ffmpeg |
| "list voices" | Show saved voice profiles |
| "clone voice from URL [url]" | Download audio from URL, then clone |

Workflow

Step 1: Clone a voice

uv run ~/clawd/skills/flow-voice/scripts/clone.py \
  --sample /path/to/sample.wav \
  --name "eric"

Saves encoded profile to ~/clawd/output/voice/profiles/eric.pkl. Requires at least 3 seconds of clean audio. 10–30 seconds is ideal.
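
Since cloning needs at least 3 seconds of audio, it can help to check a sample's length before running the clone step. The sketch below is an illustration only and is not part of the skill: `wav_duration_seconds` and `check_sample` are hypothetical helper names, and the code handles uncompressed WAV files only, using Python's standard `wave` module.

```python
import wave

MIN_SECONDS = 3.0            # minimum stated in this README
IDEAL_RANGE = (10.0, 30.0)   # ideal range stated in this README


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_sample(path: str) -> str:
    """Classify a sample as 'too short', 'usable', or 'ideal' (hypothetical labels)."""
    duration = wav_duration_seconds(path)
    if duration < MIN_SECONDS:
        return "too short"
    if IDEAL_RANGE[0] <= duration <= IDEAL_RANGE[1]:
        return "ideal"
    return "usable"
```

For mp3 samples, a library such as soundfile or librosa (both listed among the skill's dependencies) would be needed instead of `wave`.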

Step 2: Generate speech

uv run ~/clawd/skills/flow-voice/scripts/speak.py \
  --voice "eric" \
  --text "Hello, this is a test of voice cloning." \
  --output ~/clawd/output/voice/output.wav

Outputs 48 kHz WAV. Use --speed to adjust pace (default 1.0; 0.8 is slower, 1.2 is faster).

Step 3: Bake into video (optional)

uv run ~/clawd/skills/flow-voice/scripts/speak.py \
  --voice "eric" \
  --text "Your agent can think. Now teach it to draw." \
  --output /tmp/vo.wav

ffmpeg -i input.mp4 -i /tmp/vo.wav \
  -c:v copy -c:a aac -shortest output_with_voice.mp4
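
If this step is scripted, the same ffmpeg call can be assembled from Python. The following is a minimal sketch under stated assumptions, not the skill's own code: `build_mux_command` and `mux` are hypothetical names, and the two `-map` options are an addition that the command above does not use. They select the video from the first input and the audio from the voiceover explicitly, which matters when the source video already carries an audio track.

```python
import subprocess


def build_mux_command(video: str, voiceover: str, output: str) -> list[str]:
    """Build an ffmpeg argv that lays a voiceover track over a video."""
    return [
        "ffmpeg",
        "-i", video,        # input 0: the video
        "-i", voiceover,    # input 1: the generated speech
        "-map", "0:v:0",    # assumption: keep only the first video stream of input 0
        "-map", "1:a:0",    # assumption: use the voiceover as the only audio stream
        "-c:v", "copy",     # copy the video stream without re-encoding
        "-c:a", "aac",      # encode the audio as AAC
        "-shortest",        # stop at the end of the shorter stream
        output,
    ]


def mux(video: str, voiceover: str, output: str) -> None:
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_mux_command(video, voiceover, output), check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with paths that contain spaces.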

One-Shot: Clone + Speak in one command

uv run ~/clawd/skills/flow-voice/scripts/flow_voice.py \
  --sample /path/to/sample.wav \
  --text "Beautiful diagrams, from a single prompt." \
  --output ~/clawd/output/voice/result.wav

No profile saving — just clone and speak immediately.

Bake voiceover directly into a video

uv run ~/clawd/skills/flow-voice/scripts/flow_voice.py \
  --sample /path/to/sample.wav \
  --text "Your agent can think. Now teach it to draw." \
  --video /path/to/animation.mp4 \
  --output ~/clawd/output/voice/final_with_voice.mp4

Parameters

| Flag | Default | Description |
| --- | --- | --- |
| --sample | required unless --voice is given | Reference audio file (wav/mp3, min 3s) |
| --text | required | Text to speak |
| --output | auto-named | Output file path |
| --video | none | If set, bakes audio into this video |
| --voice | none | Use saved profile instead of --sample |
| --name | none | Save cloned profile with this name |
| --speed | 1.0 | Speech speed (0.8 = slower, 1.2 = faster) |
| --steps | 4 | Inference steps (3–4 recommended) |
| --t-shift | 0.9 | Sampling param (higher = potentially better quality) |
| --smooth | false | Add smoothing (reduces metallic artifacts) |
| --device | auto | Force cpu / mps / cuda |
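
To make the defaults concrete, the flags listed above could be expressed as an argparse parser along the following lines. This is a hypothetical reconstruction from the table, not the actual parser in flow_voice.py. In particular, treating --sample and --voice as mutually exclusive is an assumption based on the "instead of --sample" wording.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="flow_voice.py")

    # Assumption: exactly one voice source must be supplied.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--sample", help="reference audio file (wav/mp3, min 3s)")
    source.add_argument("--voice", help="name of a saved voice profile")

    parser.add_argument("--text", required=True, help="text to speak")
    parser.add_argument("--output", default=None, help="output path (auto-named if omitted)")
    parser.add_argument("--video", default=None, help="video to bake the audio into")
    parser.add_argument("--name", default=None, help="save the cloned profile under this name")
    parser.add_argument("--speed", type=float, default=1.0, help="0.8 = slower, 1.2 = faster")
    parser.add_argument("--steps", type=int, default=4, help="inference steps (3-4 recommended)")
    parser.add_argument("--t-shift", type=float, default=0.9, help="sampling parameter")
    parser.add_argument("--smooth", action="store_true", help="reduce metallic artifacts")
    parser.add_argument("--device", choices=["auto", "cpu", "mps", "cuda"], default="auto")
    return parser
```

With this sketch, `build_parser().parse_args(["--voice", "eric", "--text", "Hello"])` yields the documented defaults for every flag that was not supplied, and argparse exposes `--t-shift` as the attribute `t_shift`.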

Tips

  • Minimum 3 seconds of audio for cloning — 10–30s is ideal
  • If you hear metallic artifacts, add --smooth
  • For Apple Silicon (M1/M2/M3), device defaults to mps automatically
  • First run downloads the model (~200MB) to ~/.cache/huggingface/
  • Clean audio works best — no background music or noise in the reference sample

Examples

Clone Eric's voice from a recording:

uv run ~/clawd/skills/flow-voice/scripts/flow_voice.py \
  --sample ~/recordings/eric-30s.wav \
  --name eric \
  --text "FlowStay is live. Book your room with AI." \
  --output ~/clawd/output/voice/flowstay-promo.wav

Add voiceover to a Flow Visual Explainer animation:

uv run ~/clawd/skills/flow-voice/scripts/flow_voice.py \
  --voice eric \
  --text "Your agent can think. Now teach it to draw." \
  --video ~/clawd/2026-03-10-flowvisual-c3-magic-wand-comp.mp4 \
  --output ~/clawd/output/voice/flowvisual-voiced.mp4

Quick one-shot from a downloaded audio clip:

yt-dlp -x --audio-format wav -o /tmp/ref.wav "https://www.instagram.com/reel/..."
uv run ~/clawd/skills/flow-voice/scripts/flow_voice.py \
  --sample /tmp/ref.wav \
  --text "Hello from OpenClaw." \
  --output ~/clawd/output/voice/test.wav

Powered by LuxTTS (ysharma3501/LuxTTS, ZipVoice-based) — Free, local, no API key required. Packaged for OpenClaw by Flow — March 2026

Usage Guidance
What to check before installing or running:

  • Verify the packaging: SKILL.md calls clone.py and speak.py, but only flow_voice.py is included. Confirm you have the intended scripts or that flow_voice.py covers all documented commands.
  • Do not load .pkl profile files from untrusted sources. The script uses Python pickle for saved voice profiles; untrusted pickles can execute code when loaded.
  • Expect network downloads: the model weights will be fetched from Hugging Face (~200MB) on first run and examples use yt-dlp. If you require offline-only operation, do not run until you have pre-downloaded artifacts.
  • Run in a controlled environment (virtualenv or isolated container) to install pip deps (zipvoice, soundfile, librosa, numpy) and to limit impact if something goes wrong.
  • Legal/privacy note: cloning voices may raise consent and copyright issues. Ensure you have the right to clone a voice before using the skill.
  • If you want higher assurance, request the missing scripts or an explanation from the maintainer and inspect any saved profile files before loading them.
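
One way to inspect a saved profile without loading it is to walk its opcodes with Python's standard `pickletools` module, which parses a pickle without executing it. The sketch below is an illustration, not part of the skill: `referenced_globals` is a hypothetical helper that lists the `module.attribute` names a pickle would import. Which names a legitimate FlowVoice profile contains is not documented here, so treating any particular name as expected is an assumption you would have to verify yourself.

```python
import pickletools

_PUT_OPS = {"PUT", "BINPUT", "LONG_BINPUT"}
_GET_OPS = {"GET", "BINGET", "LONG_BINGET"}


def referenced_globals(data: bytes) -> list[str]:
    """List the module.attribute names a pickle would import, without loading it."""
    found: list[str] = []
    strings: list[str] = []   # string values in the order they reach the stack
    memo: dict = {}           # memo index -> string value (None for non-strings)
    last = None               # string pushed by the immediately preceding opcode
    for opcode, arg, _pos in pickletools.genops(data):
        name = opcode.name
        pushed = None
        if name in ("GLOBAL", "INST"):
            # Older protocols: arg is "module attribute" separated by a space.
            found.append(arg.replace(" ", ".", 1))
        elif name == "STACK_GLOBAL":
            # Protocol 4+: module and attribute are the two strings pushed just before.
            found.append(".".join(strings[-2:]) if len(strings) >= 2 else "<unresolved>")
        elif name == "MEMOIZE":
            memo[len(memo)] = last
        elif name in _PUT_OPS:
            memo[arg] = last
        elif name in _GET_OPS:
            pushed = memo.get(arg)   # a string reused from the memo, if it was one
        elif isinstance(arg, str):
            pushed = arg
        if pushed is not None:
            strings.append(pushed)
        last = pushed
    return found
```

An empty result means the file holds only plain data such as numbers, strings, lists, and dicts. Names like `os.system`, `subprocess.Popen`, or `builtins.eval` are a clear reason not to load the file. This is a triage aid rather than a guarantee, since a crafted pickle may be built to confuse simple scanners.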
Capability Analysis
Type: OpenClaw Skill Name: flow-voice Version: 1.1.0 The skill provides legitimate voice cloning functionality using LuxTTS but contains critical security vulnerabilities in `scripts/flow_voice.py`. Specifically, it uses `pickle.load()` to deserialize voice profiles and lacks input sanitization on the `--voice` argument, which allows for path traversal and potential arbitrary code execution (RCE) if a malicious pickle file is loaded. While these appear to be unintentional implementation flaws rather than intentional malware, they present a significant risk to the host environment.
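
The path traversal issue described above is typically closed by validating the voice name before building a file path from it. The following is a minimal sketch of that idea, not a patch to flow_voice.py: `profile_path`, `PROFILES_DIR`, and the allowed character set are assumptions chosen for illustration.

```python
import re
from pathlib import Path

# Profile directory as documented in this listing.
PROFILES_DIR = Path.home() / "clawd" / "output" / "voice" / "profiles"

# Assumption: voice names are short and use only letters, digits, "_" and "-".
_SAFE_NAME = re.compile(r"[A-Za-z0-9_-]{1,64}")


def profile_path(name: str, profiles_dir: Path = PROFILES_DIR) -> Path:
    """Map a voice name to its .pkl path, rejecting anything that could escape the directory."""
    if not _SAFE_NAME.fullmatch(name):
        raise ValueError(f"invalid voice name: {name!r}")
    base = profiles_dir.resolve()
    candidate = (base / f"{name}.pkl").resolve()
    # Defense in depth: a symlink inside the directory could still point elsewhere.
    if base not in candidate.parents:
        raise ValueError(f"profile path escapes {base}: {candidate}")
    return candidate
```

With this check, `--voice "../../etc/passwd"` fails with a `ValueError` instead of reaching `pickle.load()`. It does not make loading a pickle safe by itself; it only limits which files can be reached.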
Capability Assessment
Purpose & Capability
Name/description, required binaries (uv, ffmpeg), and Python dependencies (zipvoice, soundfile, librosa, numpy) align with a local LuxTTS-based voice-cloning skill. ffmpeg is appropriate for baking audio into video; uv is used by the SKILL.md commands.
Instruction Scope
SKILL.md references the scripts clone.py and speak.py and a 'clone from URL' flow that are not present in the package; the only included script is flow_voice.py. This mismatch between documentation and code could cause surprising behavior. The script saves and loads encoded voice profiles using pickle without validation; loading .pkl files from untrusted sources can lead to arbitrary code execution. The runtime also downloads model weights from Hugging Face, and the examples use tools such as yt-dlp, so network activity should be expected.
Install Mechanism
There is no automated install spec; this is instruction-only with a Python script. Required pip packages are listed in metadata but not installed automatically. On first run, the runtime downloads model artifacts (~200MB) from Hugging Face into the local cache (~/.cache/huggingface/), which is expected but means the skill performs network I/O.
Credentials
No environment variables, credentials, or unusual config paths are requested. The skill writes outputs and profiles under the user's home (~/.cache/huggingface and ~/clawd/output/voice), which is proportional to its purpose.
Persistence & Privilege
Skill is not always-enabled and can be invoked by the user. It stores profile files under ~/clawd/output/voice/profiles and does not modify other skills or system-wide agent settings.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install flow-voice
  3. After installation, invoke the skill by name or use /flow-voice
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.1.0
Clone any voice from a short audio sample and generate speech. Powered by LuxTTS (150x realtime, local, free, no API key). Supports wav/mp3 input, 48kHz output. Works on CPU and Apple Silicon MPS.
v1.0.2
No changes detected in this version.
  • Version bump only; contents and instructions remain unchanged.
  • No new features, fixes, or documentation updates.
v1.0.1
  • Capitalization fixed: "name" changed from "flow-voice" to "FlowVoice" in SKILL.md.
  • No changes to features, CLI, or usage.
  • All functionality and documentation remain the same.
v1.0.0
flow-voice 1.0.0
  • Initial release of voice cloning and speech generation skill for OpenClaw
  • Clone any voice from a 3–30 second audio sample (wav/mp3 input)
  • Generate speech from text using cloned voices, save and manage voice profiles
  • Add generated voiceovers directly into video files
  • Powered by LuxTTS: Runs locally, supports CPU and Apple Silicon, no API key needed
  • Outputs high-quality 48kHz audio; fast inference (150x realtime)
Metadata
Slug flow-voice
Version 1.1.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 4
Frequently Asked Questions

What is FlowVoice — Clone Any Voice From a Short Audio Sample?

Clone any voice from a short audio sample and generate speech with it. Powered by LuxTTS (150x realtime, local, free, no API key). Use when asked to clone a... It is an AI Agent Skill for Claude Code / OpenClaw, with 386 downloads so far.

How do I install FlowVoice — Clone Any Voice From a Short Audio Sample?

Run "/install flow-voice" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is FlowVoice — Clone Any Voice From a Short Audio Sample free?

Yes, FlowVoice — Clone Any Voice From a Short Audio Sample is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does FlowVoice — Clone Any Voice From a Short Audio Sample support?

FlowVoice — Clone Any Voice From a Short Audio Sample is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created FlowVoice — Clone Any Voice From a Short Audio Sample?

It is built and maintained by windseeker1111 (@windseeker1111); the current version is v1.1.0.
