
FlowVoice — Clone Any Voice From a Short Audio Sample

Author: windseeker1111 · v1.1.0 · MIT-0

cross-platform ⚠ suspicious

Total downloads: 386

Install in OpenClaw

/install flow-voice

Description

Clone any voice from a short audio sample and generate speech with it. Powered by LuxTTS (150x realtime, local, free, no API key). Use when asked to clone a...

Usage instructions (SKILL.md)

Flow Voice — Voice Cloning for OpenClaw

Clone any voice from a 3–30 second audio sample and generate speech from text. Powered by LuxTTS — 150x realtime, runs locally, fits in 1GB VRAM, works on CPU and Apple Silicon MPS. No API key, no cloud, no cost.

Output directory: ~/clawd/output/voice/

Commands

What you say	What it does
"clone this voice [audio file]"	Encode a voice profile from a sample
"speak as [name]: [text]"	Generate speech using a saved voice profile
"add voiceover to [video]: [text]"	Generate speech + bake into video with ffmpeg
"list voices"	Show saved voice profiles
"clone voice from URL [url]"	Download audio from URL, then clone

Workflow

Step 1: Clone a voice

uv run ~/clawd/skills/flow-voice/scripts/clone.py \
  --sample /path/to/sample.wav \
  --name "eric"

Saves encoded profile to ~/clawd/output/voice/profiles/eric.pkl. Requires at least 3 seconds of clean audio. 10–30 seconds is ideal.
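
The duration limits above can be checked before cloning. The following is a minimal sketch, not part of the skill: the function names are hypothetical, and it only handles uncompressed PCM WAV files because it relies on Python's standard `wave` module (mp3 input would need another library).

```python
import wave

MIN_SECONDS = 3.0            # minimum stated in the instructions above
IDEAL_RANGE = (10.0, 30.0)   # "ideal" range stated in the instructions above


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_sample(path: str) -> str:
    """Classify a sample as 'too_short', 'ok', or 'ideal' by its duration."""
    duration = wav_duration_seconds(path)
    if duration < MIN_SECONDS:
        return "too_short"
    if IDEAL_RANGE[0] <= duration <= IDEAL_RANGE[1]:
        return "ideal"
    return "ok"
```

A caller could refuse to run the clone step when `check_sample` returns `"too_short"`. Duration says nothing about how clean the audio is, so this is only a first gate.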

Step 2: Generate speech

uv run ~/clawd/skills/flow-voice/scripts/speak.py \
  --voice "eric" \
  --text "Hello, this is a test of voice cloning." \
  --output ~/clawd/output/voice/output.wav

Outputs 48kHz WAV. Use --speed to adjust the pace (1.0 is the default).

Step 3: Bake into video (optional)

uv run ~/clawd/skills/flow-voice/scripts/speak.py \
  --voice "eric" \
  --text "Your agent can think. Now teach it to draw." \
  --output /tmp/vo.wav

ffmpeg -i input.mp4 -i /tmp/vo.wav \
  -map 0:v:0 -map 1:a:0 \
  -c:v copy -c:a aac -shortest output_with_voice.mp4
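
When the mux step is driven from a script rather than typed by hand, building the ffmpeg arguments as a list avoids shell-quoting problems with paths that contain spaces. This is a sketch only: `build_mux_command` is a hypothetical helper, and the explicit `-map` options are an addition that picks the video from the first input and the audio from the second, so an existing audio track in the video is not selected by accident.

```python
import shlex


def build_mux_command(video: str, audio: str, output: str) -> list[str]:
    """Build an ffmpeg argument list that copies the video stream and
    encodes the voiceover as AAC, stopping at the shorter of the two."""
    return [
        "ffmpeg",
        "-i", video,        # input 0: the video
        "-i", audio,        # input 1: the generated voiceover
        "-map", "0:v:0",    # take video from input 0
        "-map", "1:a:0",    # take audio from input 1
        "-c:v", "copy",     # do not re-encode the video
        "-c:a", "aac",
        "-shortest",
        output,
    ]


if __name__ == "__main__":
    cmd = build_mux_command("input.mp4", "/tmp/vo.wav", "output_with_voice.mp4")
    # Print the command for inspection; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```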

One-Shot: Clone + Speak in one command

uv run ~/clawd/skills/flow-voice/scripts/flow_voice.py \
  --sample /path/to/sample.wav \
  --text "Beautiful diagrams, from a single prompt." \
  --output ~/clawd/output/voice/result.wav

No profile saving — just clone and speak immediately.

Bake voiceover directly into a video

uv run ~/clawd/skills/flow-voice/scripts/flow_voice.py \
  --sample /path/to/sample.wav \
  --text "Your agent can think. Now teach it to draw." \
  --video /path/to/animation.mp4 \
  --output ~/clawd/output/voice/final_with_voice.mp4

Parameters

Flag	Default	Description
`--sample`	required	Reference audio file (wav/mp3, min 3s)
`--text`	required	Text to speak
`--output`	auto-named	Output file path
`--video`	none	If set, bakes audio into this video
`--voice`	none	Use saved profile instead of --sample
`--name`	none	Save cloned profile with this name
`--speed`	1.0	Speech speed (0.8 = slower, 1.2 = faster)
`--steps`	4	Inference steps (3–4 recommended)
`--t-shift`	0.9	Sampling param (higher = potentially better quality)
`--smooth`	false	Add smoothing (reduces metallic artifacts)
`--device`	auto	Force cpu / mps / cuda
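
The table above maps naturally onto an `argparse` definition. The sketch below is an illustration of that mapping, not the actual contents of flow_voice.py. Treating `--sample` and `--voice` as mutually exclusive is an assumption based on the "instead of --sample" wording, and the help strings are paraphrased from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="flow_voice.py")

    # Assumption: exactly one voice source must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--sample", help="reference audio file (wav/mp3, min 3s)")
    source.add_argument("--voice", help="name of a saved voice profile")

    parser.add_argument("--text", required=True, help="text to speak")
    parser.add_argument("--output", default=None, help="output file path")
    parser.add_argument("--video", default=None, help="video to bake audio into")
    parser.add_argument("--name", default=None, help="save cloned profile under this name")
    parser.add_argument("--speed", type=float, default=1.0)
    parser.add_argument("--steps", type=int, default=4)
    parser.add_argument("--t-shift", type=float, default=0.9)  # stored as args.t_shift
    parser.add_argument("--smooth", action="store_true")
    parser.add_argument("--device", choices=["auto", "cpu", "mps", "cuda"], default="auto")
    return parser
```

For example, `build_parser().parse_args(["--voice", "eric", "--text", "Hello"])` yields the documented defaults for every flag that was not passed.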

Tips

Minimum 3 seconds of audio for cloning — 10–30s is ideal
If you hear metallic artifacts, add --smooth
For Apple Silicon (M1/M2/M3), device defaults to mps automatically
First run downloads the model (~200MB) to ~/.cache/huggingface/
Clean audio works best — no background music or noise in the reference sample
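
How `--device auto` could resolve to a concrete device is sketched below. The priority order (cuda, then mps, then cpu) is an assumption, since the documentation only states that Apple Silicon defaults to mps. Both function names are hypothetical, and the detection helper falls back to CPU when PyTorch is not installed.

```python
def resolve_device(requested: str, cuda_available: bool, mps_available: bool) -> str:
    """Return the device to use. An explicit request always wins;
    'auto' prefers cuda, then mps, then cpu (assumed order)."""
    if requested != "auto":
        return requested
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_backends() -> tuple[bool, bool]:
    """Report (cuda_available, mps_available) using PyTorch if it is installed."""
    try:
        import torch
    except ImportError:
        return False, False
    cuda = torch.cuda.is_available()
    # Older PyTorch builds have no torch.backends.mps attribute.
    mps_backend = getattr(torch.backends, "mps", None)
    mps = bool(mps_backend is not None and mps_backend.is_available())
    return cuda, mps
```

Keeping the decision in a pure function makes it easy to test without a GPU: `resolve_device("auto", *detect_backends())` is the only line that touches the hardware.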

Examples

Clone Eric's voice from a recording:

uv run ~/clawd/skills/flow-voice/scripts/flow_voice.py \
  --sample ~/recordings/eric-30s.wav \
  --name eric \
  --text "FlowStay is live. Book your room with AI." \
  --output ~/clawd/output/voice/flowstay-promo.wav

Add voiceover to a Flow Visual Explainer animation:

uv run ~/clawd/skills/flow-voice/scripts/flow_voice.py \
  --voice eric \
  --text "Your agent can think. Now teach it to draw." \
  --video ~/clawd/2026-03-10-flowvisual-c3-magic-wand-comp.mp4 \
  --output ~/clawd/output/voice/flowvisual-voiced.mp4

Quick one-shot from a downloaded audio clip:

yt-dlp -x --audio-format wav -o /tmp/ref.wav "https://www.instagram.com/reel/..."

uv run ~/clawd/skills/flow-voice/scripts/flow_voice.py \
  --sample /tmp/ref.wav \
  --text "Hello from OpenClaw." \
  --output ~/clawd/output/voice/test.wav

Powered by LuxTTS (ysharma3501/LuxTTS, ZipVoice-based) — Free, local, no API key required. Packaged for OpenClaw by Flow — March 2026

Security recommendations

What to check before installing or running:

- Verify the packaging: SKILL.md calls clone.py and speak.py, but only flow_voice.py is included. Confirm you have the intended scripts or that flow_voice.py covers all documented commands.
- Do not load .pkl profile files from untrusted sources. The script uses Python pickle for saved voice profiles; untrusted pickles can execute code when loaded.
- Expect network downloads: the model weights will be fetched from Hugging Face (~200MB) on first run, and the examples use yt-dlp. If you require offline-only operation, do not run the skill until you have pre-downloaded the artifacts.
- Run in a controlled environment (virtualenv or isolated container) to install the pip dependencies (zipvoice, soundfile, librosa, numpy) and to limit the impact if something goes wrong.
- Legal/privacy note: cloning voices may raise consent and copyright issues. Ensure you have the right to clone a voice before using the skill.
- If you want higher assurance, request the missing scripts or an explanation from the maintainer, and inspect any saved profile files before loading them.
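
One way to reduce the pickle risk described above is the allow-list pattern from the Python documentation: subclass `pickle.Unpickler` and override `find_class` so that only approved globals can be resolved. The sketch below is hypothetical and is not how flow_voice.py loads profiles. The allow-list is empty by default, which is enough for plain dicts, lists, numbers, and strings; real profiles probably contain array or tensor types that would have to be added explicitly.

```python
import io
import pickle


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global not on an explicit allow-list."""

    def __init__(self, file, allowed=frozenset()):
        super().__init__(file)
        self._allowed = allowed  # set of (module, name) pairs

    def find_class(self, module, name):
        if (module, name) in self._allowed:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes, allowed=frozenset()):
    """Deserialize bytes while blocking references to non-approved globals."""
    return RestrictedUnpickler(io.BytesIO(data), allowed).load()
```

An allow-list narrows the attack surface but does not make untrusted pickles safe in general, so the advice to load only profiles you created yourself still applies.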

Functional analysis

Type: OpenClaw Skill
Name: flow-voice
Version: 1.1.0

The skill provides legitimate voice cloning functionality using LuxTTS but contains critical security vulnerabilities in `scripts/flow_voice.py`. Specifically, it uses `pickle.load()` to deserialize voice profiles and does not sanitize the `--voice` argument, which allows path traversal and potential arbitrary code execution (RCE) if a malicious pickle file is loaded. These appear to be unintentional implementation flaws rather than intentional malware, but they present a significant risk to the host environment.
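
The path traversal issue on `--voice` could be closed by validating the name before building a file path. The following is a sketch under assumptions: the profiles directory is taken from the documentation above, while the allowed character set, the length limit, and the function name `profile_path` are hypothetical choices.

```python
import re
from pathlib import Path

# Directory documented in SKILL.md for saved profiles.
PROFILES_DIR = Path.home() / "clawd" / "output" / "voice" / "profiles"

# Assumed policy: letters, digits, underscore, hyphen; 1 to 64 characters.
_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def profile_path(name: str, base: Path = PROFILES_DIR) -> Path:
    """Map a voice name to a .pkl path that is guaranteed to sit inside `base`."""
    if not _NAME_RE.fullmatch(name):
        raise ValueError(f"invalid voice name: {name!r}")
    root = base.resolve()
    candidate = (root / f"{name}.pkl").resolve()
    # Defense in depth: reject anything that resolves outside the profiles directory.
    if root not in candidate.parents:
        raise ValueError(f"profile path escapes {root}")
    return candidate
```

With this check, `profile_path("eric")` resolves to `eric.pkl` under the profiles directory, while a value such as `../../tmp/evil` is rejected before any file is opened.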

Capability assessment

✓ Purpose & Capability

Name/description, required binaries (uv, ffmpeg), and Python dependencies (zipvoice, soundfile, librosa, numpy) align with a local LuxTTS-based voice-cloning skill. ffmpeg is appropriate for baking audio into video; uv is used by the SKILL.md commands.

⚠ Instruction Scope

SKILL.md references the scripts clone.py and speak.py and a 'clone from URL' flow that are not present in the package; the only included script is flow_voice.py. This mismatch between documentation and code could cause surprising behavior. The script saves and loads encoded voice profiles using pickle without validation; loading .pkl files from untrusted sources can lead to arbitrary code execution. The runtime will also download model weights from Hugging Face, and the examples call tools such as yt-dlp (network activity).

ℹ Install Mechanism

There is no automated install spec; this is instruction-only with a Python script. Required pip packages are listed in metadata but not installed automatically. The runtime will download model artifacts (~200MB) from Hugging Face into the local cache (~/.cache/huggingface/), which is expected but means the skill performs network I/O on first run.

✓ Credentials

No environment variables, credentials, or unusual config paths are requested. The skill writes outputs and profiles under the user's home (~/.cache/huggingface and ~/clawd/output/voice), which is proportional to its purpose.

✓ Persistence & Privilege

Skill is not always-enabled and can be invoked by the user. It stores profile files under ~/clawd/output/voice/profiles and does not modify other skills or system-wide agent settings.

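How to use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install flow-voice
After installation, call the Skill by name or trigger it with /flow-voice
Provide the required inputs described in the Skill's parameters to get structured output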

Version history

v1.1.0

Clone any voice from a short audio sample and generate speech. Powered by LuxTTS (150x realtime, local, free, no API key). Supports wav/mp3 input, 48kHz output. Works on CPU and Apple Silicon MPS.

v1.0.2

No changes detected in this version.
- Version bump only; contents and instructions remain unchanged.
- No new features, fixes, or documentation updates.

v1.0.1

- Capitalization fixed: "name" changed from "flow-voice" to "FlowVoice" in SKILL.md.
- No changes to features, CLI, or usage.
- All functionality and documentation remain the same.

v1.0.0

flow-voice 1.0.0
- Initial release of voice cloning and speech generation skill for OpenClaw
- Clone any voice from a 3–30 second audio sample (wav/mp3 input)
- Generate speech from text using cloned voices, save and manage voice profiles
- Add generated voiceovers directly into video files
- Powered by LuxTTS: runs locally, supports CPU and Apple Silicon, no API key needed
- Outputs high-quality 48kHz audio; fast inference (150x realtime)

Metadata

Slug: flow-voice

Version: 1.1.0

License: MIT-0

Total installs: 0

Current installs: 0

Number of versions: 4

FAQ