windseeker1111

FlowVoice — Clone Any Voice From a Short Audio Sample

Author: windseeker1111 · GitHub · v1.1.0 · MIT-0
cross-platform ⚠ suspicious
386 total downloads · 0 favorites · 0 current installs · 4 versions
Install in OpenClaw
/install flow-voice
Description
Clone any voice from a short audio sample and generate speech with it. Powered by LuxTTS (150x realtime, local, free, no API key). Use when asked to clone a...
Usage (SKILL.md)

Flow Voice — Voice Cloning for OpenClaw

Clone any voice from a 3–30 second audio sample and generate speech from text. Powered by LuxTTS — 150x realtime, runs locally, fits in 1GB VRAM, works on CPU and Apple Silicon MPS. No API key, no cloud, no cost.

Output directory: ~/clawd/output/voice/


Commands

| What you say | What it does |
| --- | --- |
| "clone this voice [audio file]" | Encode a voice profile from a sample |
| "speak as [name]: [text]" | Generate speech using a saved voice profile |
| "add voiceover to [video]: [text]" | Generate speech + bake into video with ffmpeg |
| "list voices" | Show saved voice profiles |
| "clone voice from URL [url]" | Download audio from URL, then clone |

Workflow

Step 1: Clone a voice

uv run ~/clawd/skills/flow-voice/scripts/clone.py \
  --sample /path/to/sample.wav \
  --name "eric"

Saves encoded profile to ~/clawd/output/voice/profiles/eric.pkl. Requires at least 3 seconds of clean audio. 10–30 seconds is ideal.
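
The length requirement above can be checked before a sample is passed to the cloning step. The following is a minimal sketch using only the Python standard library; the function names and the "too_short" / "usable" / "ideal" labels are hypothetical and not part of the skill, and the check only handles PCM WAV files (not mp3).

```python
import wave

MIN_SECONDS = 3.0            # minimum stated in this document
IDEAL_RANGE = (10.0, 30.0)   # "ideal" range stated in this document


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_sample(path: str) -> str:
    """Classify a reference sample as 'too_short', 'usable', or 'ideal'."""
    duration = wav_duration_seconds(path)
    if duration < MIN_SECONDS:
        return "too_short"
    if IDEAL_RANGE[0] <= duration <= IDEAL_RANGE[1]:
        return "ideal"
    return "usable"
```

A duration check says nothing about how clean the audio is; background noise still has to be judged by ear.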

Step 2: Generate speech

uv run ~/clawd/skills/flow-voice/scripts/speak.py \
  --voice "eric" \
  --text "Hello, this is a test of voice cloning." \
  --output ~/clawd/output/voice/output.wav

Outputs a 48 kHz WAV file. Use --speed to adjust the pace (default 1.0; 0.8 is slower, 1.2 is faster).

Step 3: Bake into video (optional)

uv run ~/clawd/skills/flow-voice/scripts/speak.py \
  --voice "eric" \
  --text "Your agent can think. Now teach it to draw." \
  --output /tmp/vo.wav

ffmpeg -i input.mp4 -i /tmp/vo.wav \
  -c:v copy -c:a aac -shortest output_with_voice.mp4
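
When the mux step is driven from a script rather than typed by hand, the ffmpeg call can be assembled as an argument list. This is a rough sketch, not the skill's own code: the function names are hypothetical, and the two -map options are an addition that the command above does not include. They are there because, if the input video already carries an audio track, ffmpeg's automatic stream selection may keep that track instead of the generated voiceover.

```python
import subprocess
from typing import List


def build_mux_command(video: str, audio: str, output: str) -> List[str]:
    """Build an ffmpeg argument list mirroring Step 3, plus explicit stream mapping."""
    return [
        "ffmpeg",
        "-i", video,          # input 0: the video
        "-i", audio,          # input 1: the generated voiceover
        "-map", "0:v:0",      # addition: take video from input 0
        "-map", "1:a:0",      # addition: take audio from input 1
        "-c:v", "copy",       # do not re-encode the video stream
        "-c:a", "aac",        # encode the voiceover as AAC
        "-shortest",          # stop at the end of the shorter stream
        output,
    ]


def mux(video: str, audio: str, output: str) -> None:
    """Run ffmpeg and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_mux_command(video, audio, output), check=True)
```

Passing a list to subprocess.run avoids shell quoting problems when file paths contain spaces.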

One-Shot: Clone + Speak in one command

uv run ~/clawd/skills/flow-voice/scripts/flow_voice.py \
  --sample /path/to/sample.wav \
  --text "Beautiful diagrams, from a single prompt." \
  --output ~/clawd/output/voice/result.wav

No profile saving — just clone and speak immediately.

Bake voiceover directly into a video

uv run ~/clawd/skills/flow-voice/scripts/flow_voice.py \
  --sample /path/to/sample.wav \
  --text "Your agent can think. Now teach it to draw." \
  --video /path/to/animation.mp4 \
  --output ~/clawd/output/voice/final_with_voice.mp4

Parameters

| Flag | Default | Description |
| --- | --- | --- |
| --sample | required | Reference audio file (wav/mp3, min 3s) |
| --text | required | Text to speak |
| --output | auto-named | Output file path |
| --video | none | If set, bakes audio into this video |
| --voice | none | Use saved profile instead of --sample |
| --name | none | Save cloned profile with this name |
| --speed | 1.0 | Speech speed (0.8 = slower, 1.2 = faster) |
| --steps | 4 | Inference steps (3–4 recommended) |
| --t-shift | 0.9 | Sampling param (higher = potentially better quality) |
| --smooth | false | Add smoothing (reduces metallic artifacts) |
| --device | auto | Force cpu / mps / cuda |
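
For readers who want to see how these flags and defaults fit together, here is a sketch of an argparse parser that mirrors the table. It is an illustration only and not the actual parser in flow_voice.py; in particular, treating --sample and --voice as a mutually exclusive, required pair is an assumption drawn from the "instead of --sample" wording.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="flow_voice.py")

    # Assumption: exactly one voice source must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--sample", help="reference audio file (wav/mp3, min 3s)")
    source.add_argument("--voice", help="name of a saved voice profile")

    parser.add_argument("--text", required=True, help="text to speak")
    parser.add_argument("--output", default=None, help="output path (auto-named if omitted)")
    parser.add_argument("--video", default=None, help="bake audio into this video")
    parser.add_argument("--name", default=None, help="save the cloned profile under this name")
    parser.add_argument("--speed", type=float, default=1.0, help="0.8 = slower, 1.2 = faster")
    parser.add_argument("--steps", type=int, default=4, help="inference steps")
    parser.add_argument("--t-shift", type=float, default=0.9, help="sampling parameter")
    parser.add_argument("--smooth", action="store_true", help="reduce metallic artifacts")
    parser.add_argument("--device", default="auto", choices=["auto", "cpu", "mps", "cuda"])
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```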

Tips

  • Minimum 3 seconds of audio for cloning — 10–30s is ideal
  • If you hear metallic artifacts, add --smooth
  • For Apple Silicon (M1/M2/M3), device defaults to mps automatically
  • First run downloads the model (~200MB) to ~/.cache/huggingface/
  • Clean audio works best — no background music or noise in the reference sample

Examples

Clone Eric's voice from a recording:

uv run ~/clawd/skills/flow-voice/scripts/flow_voice.py \
  --sample ~/recordings/eric-30s.wav \
  --name eric \
  --text "FlowStay is live. Book your room with AI." \
  --output ~/clawd/output/voice/flowstay-promo.wav

Add voiceover to a Flow Visual Explainer animation:

uv run ~/clawd/skills/flow-voice/scripts/flow_voice.py \
  --voice eric \
  --text "Your agent can think. Now teach it to draw." \
  --video ~/clawd/2026-03-10-flowvisual-c3-magic-wand-comp.mp4 \
  --output ~/clawd/output/voice/flowvisual-voiced.mp4

Quick one-shot from a downloaded audio clip:

yt-dlp -x --audio-format wav -o /tmp/ref.wav "https://www.instagram.com/reel/..."
uv run ~/clawd/skills/flow-voice/scripts/flow_voice.py \
  --sample /tmp/ref.wav \
  --text "Hello from OpenClaw." \
  --output ~/clawd/output/voice/test.wav

Powered by LuxTTS (ysharma3501/LuxTTS, ZipVoice-based) — Free, local, no API key required. Packaged for OpenClaw by Flow — March 2026

Security recommendations
What to check before installing or running:

  • Verify the packaging: SKILL.md calls clone.py and speak.py, but only flow_voice.py is included. Confirm you have the intended scripts or that flow_voice.py covers all documented commands.
  • Do not load .pkl profile files from untrusted sources. The script uses Python pickle for saved voice profiles; untrusted pickles can execute code when loaded.
  • Expect network downloads: the model weights will be fetched from Hugging Face (~200MB) on first run, and the examples use yt-dlp. If you require offline-only operation, do not run until you have pre-downloaded the artifacts.
  • Run in a controlled environment (virtualenv or isolated container) to install the pip dependencies (zipvoice, soundfile, librosa, numpy) and to limit the impact if something goes wrong.
  • Legal/privacy note: cloning voices may raise consent and copyright issues. Ensure you have the right to clone a voice before using the skill.
  • If you want higher assurance, request the missing scripts or an explanation from the maintainer, and inspect any saved profile files before loading them.
Functional analysis
Type: OpenClaw Skill
Name: flow-voice
Version: 1.1.0

The skill provides legitimate voice cloning functionality using LuxTTS but contains critical security vulnerabilities in `scripts/flow_voice.py`. Specifically, it uses `pickle.load()` to deserialize voice profiles and lacks input sanitization on the `--voice` argument, which allows for path traversal and potential arbitrary code execution (RCE) if a malicious pickle file is loaded. While these appear to be unintentional implementation flaws rather than intentional malware, they present a significant risk to the host environment.
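
One way the path traversal issue on `--voice` could be mitigated is to validate the name and confirm the resolved path stays inside the profiles directory. The sketch below is an assumption about how such a check might look, not a patch to `scripts/flow_voice.py`; the helper name and the allowed character set are hypothetical, and the directory is the profile location given in SKILL.md.

```python
import re
from pathlib import Path

# Profile location as documented in SKILL.md.
PROFILE_DIR = Path.home() / "clawd" / "output" / "voice" / "profiles"

# Hypothetical policy: letters, digits, underscore, hyphen; 1 to 64 characters.
_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def profile_path(name: str, base: Path = PROFILE_DIR) -> Path:
    """Map a voice name to a profile path, rejecting anything outside `base`."""
    if not _NAME_RE.fullmatch(name):
        raise ValueError(f"invalid voice name: {name!r}")
    base = base.resolve()
    candidate = (base / f"{name}.pkl").resolve()
    # Defence in depth: a symlinked profile could still point elsewhere.
    if candidate.parent != base:
        raise ValueError(f"profile path escapes {base}: {candidate}")
    return candidate
```

This only constrains which file is opened. It does not make an untrusted .pkl file safe to load.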
Capability assessment
Purpose & Capability
Name/description, required binaries (uv, ffmpeg), and Python dependencies (zipvoice, soundfile, librosa, numpy) align with a local LuxTTS-based voice-cloning skill. ffmpeg is appropriate for baking audio into video; uv is used by the SKILL.md commands.
Instruction Scope
SKILL.md references scripts clone.py and speak.py and a 'clone from URL' flow that are not present in the package — the only included script is flow_voice.py. That mismatch is an incoherence (documentation vs. code) and could cause surprising behavior. The script saves/loads encoded voice profiles using pickle without validation; loading .pkl files from untrusted sources can lead to arbitrary code execution. The runtime will also download model weights from Hugging Face and examples call out tools like yt-dlp (network activity).
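
The pickle risk described above can be narrowed with a restricted unpickler, a pattern shown in the Python `pickle` documentation. The sketch below is illustrative only: the class and function names are hypothetical, and the empty allow-list assumes a profile built from plain containers and numbers. The real profile format is not documented here and may need specific array or tensor classes added to the list.

```python
import io
import pickle

# Hypothetical allow-list of (module, name) pairs that may be resolved.
# Empty means only built-in containers and primitives can be loaded.
ALLOWED_GLOBALS = frozenset()


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to import any global not on the allow-list."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def safe_load(data: bytes):
    """Deserialize bytes using the restricted unpickler."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

A format that cannot encode callables at all avoids the problem more thoroughly, but whether that is practical depends on what the profile actually stores.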
Install Mechanism
There is no automated install spec; this is instruction-only with a Python script. Required pip packages are listed in metadata but not installed automatically. Runtime will download model artifacts (~200MB) from Hugging Face into the local cache (~/.cache/huggingface), which is expected but means the skill performs network I/O at first run.
Credentials
No environment variables, credentials, or unusual config paths are requested. The skill writes outputs and profiles under the user's home (~/.cache/huggingface and ~/clawd/output/voice), which is proportional to its purpose.
Persistence & Privilege
Skill is not always-enabled and can be invoked by the user. It stores profile files under ~/clawd/output/voice/profiles and does not modify other skills or system-wide agent settings.
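How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install flow-voice
  3. Once installed, invoke the Skill by name or trigger it with /flow-voice
  4. Provide the inputs described in the Skill's parameter reference to get structured output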
Version history
v1.1.0
Clone any voice from a short audio sample and generate speech. Powered by LuxTTS (150x realtime, local, free, no API key). Supports wav/mp3 input, 48kHz output. Works on CPU and Apple Silicon MPS.
v1.0.2
No changes detected in this version.
  • Version bump only; contents and instructions remain unchanged.
  • No new features, fixes, or documentation updates.
v1.0.1
  • Capitalization fixed: "name" changed from "flow-voice" to "FlowVoice" in SKILL.md.
  • No changes to features, CLI, or usage.
  • All functionality and documentation remain the same.
v1.0.0
flow-voice 1.0.0
  • Initial release of voice cloning and speech generation skill for OpenClaw
  • Clone any voice from a 3–30 second audio sample (wav/mp3 input)
  • Generate speech from text using cloned voices, save and manage voice profiles
  • Add generated voiceovers directly into video files
  • Powered by LuxTTS: Runs locally, supports CPU and Apple Silicon, no API key needed
  • Outputs high-quality 48kHz audio; fast inference (150x realtime)
Metadata
Slug: flow-voice
Version: 1.1.0
License: MIT-0
Total installs: 0
Current installs: 0
Version count: 4
FAQ

What is FlowVoice — Clone Any Voice From a Short Audio Sample?

Clone any voice from a short audio sample and generate speech with it. Powered by LuxTTS (150x realtime, local, free, no API key). Use when asked to clone a... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 386 total downloads to date.

How do I install FlowVoice — Clone Any Voice From a Short Audio Sample?

Run the command "/install flow-voice" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is needed.

Is FlowVoice — Clone Any Voice From a Short Audio Sample free?

Yes. FlowVoice — Clone Any Voice From a Short Audio Sample is completely free, released under the MIT-0 license, and can be freely downloaded, installed, and used.

Which platforms does FlowVoice — Clone Any Voice From a Short Audio Sample support?

FlowVoice — Clone Any Voice From a Short Audio Sample is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops FlowVoice — Clone Any Voice From a Short Audio Sample?

It is developed and maintained by windseeker1111 (@windseeker1111); the current version is v1.1.0.
