Description

Clone any voice and generate speech using Coqui XTTS v2. SUPER SIMPLE - provide a voice sample (6-30 sec WAV) and text, get cloned voice audio. Supports 14+ languages. Use when the user wants to (1) Clone their voice or someone else's voice, (2) Generate speech that sounds like a specific person, (3) Create personalized voice messages, (4) Multi-lingual voice cloning (speak any language with cloned voice).

Usage Instructions (SKILL.md)

CloneV Skill - Voice Cloning Made Simple

Name: Clonev
Author: instant-picture

⚠️ CRITICAL INSTRUCTIONS FOR AI MODELS

DO NOT try to use Docker containers directly. DO NOT try to interact with coqui-xtts container - it is broken and restarting. DO NOT try to use APIs or servers.

ONLY USE THE SCRIPT: scripts/clonev.sh

The script handles everything automatically. Just call it with text, voice sample, and language.

What This Skill Does

Clones any voice from a short audio sample and generates new speech in that voice.

Input:

Text to speak
Voice sample (WAV file, 6-30 seconds)
Language code

Output: OGG voice file (cloned voice speaking the text)

Works with: Any voice! Yours, a celebrity, a character, etc.

The ONE Command You Need

VOICE_FILE=$(scripts/clonev.sh "Your text here" /path/to/voice_sample.wav language)

That's it! Nothing else needed.

Step-by-Step Usage (FOR AI MODELS)

Step 1: Get the required inputs

Text to speak (from user)
Path to voice sample WAV file (from user)
Language code (from user or default to en)

Step 2: Run the script

VOICE_FILE=$(scripts/clonev.sh "TEXT_HERE" "/path/to/sample.wav" LANGUAGE)

Step 3: Use the output

The variable $VOICE_FILE now contains the path to the generated OGG file.

Complete Working Examples

Example 1: Clone voice and send to Telegram

# Generate cloned voice
VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Hello, this is my cloned voice!" "/mnt/c/TEMP/Recording 25.wav" en)

# Send to Telegram (as voice message)
message action=send channel=telegram asVoice=true filePath="$VOICE"

Example 2: Clone voice in Czech

# Generate Czech voice
VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Ahoj, tohle je můj hlas" "/mnt/c/TEMP/Recording 25.wav" cs)

# Send
message action=send channel=telegram asVoice=true filePath="$VOICE"

Example 3: Full workflow with check

#!/bin/bash

# Generate voice
VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Task completed!" "/path/to/sample.wav" en)

# Verify file was created
if [ -f "$VOICE" ]; then
    echo "Success! Voice file: $VOICE"
    ls -lh "$VOICE"
else
    echo "Error: Voice file not created"
fi

Common Language Codes

Code	Language	Example Usage
`en`	English	`scripts/clonev.sh "Hello" sample.wav en`
`cs`	Czech	`scripts/clonev.sh "Ahoj" sample.wav cs`
`de`	German	`scripts/clonev.sh "Hallo" sample.wav de`
`fr`	French	`scripts/clonev.sh "Bonjour" sample.wav fr`
`es`	Spanish	`scripts/clonev.sh "Hola" sample.wav es`

Full list: en, cs, de, fr, es, it, pl, pt, tr, ru, nl, ar, zh, ja, hu, ko
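A small pre-check can catch an unsupported language code before the 20-40 second synthesis run starts. The sketch below is not part of clonev.sh; the function names `is_supported_lang` and `lang_or_default` are hypothetical, and the accepted codes are simply copied from the full list above. The fallback to `en` follows the "default to en" rule from Step 1.

```shell
#!/bin/bash
# Hypothetical helper (not part of clonev.sh): succeed only for codes in the list above.
is_supported_lang() {
    case "$1" in
        en|cs|de|fr|es|it|pl|pt|tr|ru|nl|ar|zh|ja|hu|ko) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print the code if it is supported, otherwise fall back to "en".
lang_or_default() {
    if is_supported_lang "$1"; then
        echo "$1"
    else
        echo "en"
    fi
}

# Example usage (commented out so the sketch has no side effects):
# LANG_CODE=$(lang_or_default "$USER_LANG")
# VOICE=$(scripts/clonev.sh "Hello" "/path/to/sample.wav" "$LANG_CODE")
```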

Voice Sample Requirements

Format: WAV file
Length: 6-30 seconds (optimal: 10-15 seconds)
Quality: Clear audio, no background noise
Content: Any speech (the actual words don't matter)

Good samples:

✅ Recording of someone speaking clearly
✅ No music or noise in background
✅ Consistent volume

Bad samples:

❌ Music or songs
❌ Heavy background noise
❌ Very short (< 6 seconds)
❌ Very long (> 30 seconds)
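The 6-30 second length requirement can be checked before calling the script. The sketch below assumes `ffprobe` (distributed with ffmpeg) is installed; the helper names `duration_in_range` and `check_sample_length` are hypothetical and not part of clonev.sh.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a duration in seconds lies within 6-30.
duration_in_range() {
    # awk handles fractional seconds such as 12.48; non-numeric input counts as 0
    awk -v d="$1" 'BEGIN { exit !(d + 0 >= 6 && d + 0 <= 30) }'
}

# Hypothetical helper: read the container duration with ffprobe and check it.
# Returns 2 if ffprobe itself fails (for example, on an unreadable file).
check_sample_length() {
    local dur
    dur=$(ffprobe -v error -show_entries format=duration \
          -of default=noprint_wrappers=1:nokey=1 "$1") || return 2
    duration_in_range "$dur"
}

# Example usage (commented out so the sketch has no side effects):
# check_sample_length "/path/to/sample.wav" || echo "Sample should be 6-30 seconds long" >&2
```

This only verifies length. Clarity and background noise still need a listen.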

⚠️ Important Notes

Model Download

First use downloads ~1.87GB model (one-time)
Model is stored at: /mnt/c/TEMP/Docker-containers/coqui-tts/models-xtts/
Status: ✅ Already downloaded

Processing Time

Takes 20-40 seconds depending on text length
This is normal - voice cloning is computationally intensive

Troubleshooting

"Command not found"

Make sure you're in the skill directory or use full path:

/home/bernie/clawd/skills/clonev/scripts/clonev.sh "text" sample.wav en

"Voice sample not found"

Check the path to the WAV file
Use absolute paths (starting with /)
Ensure file exists: ls -la /path/to/sample.wav

"Model not found"

The model should auto-download. If not:

cd /mnt/c/TEMP/Docker-containers/coqui-tts
docker run --rm --entrypoint "" \
  -v "$(pwd)/models-xtts:/root/.local/share/tts" \
  ghcr.io/coqui-ai/tts:latest \
  python3 -c "from TTS.api import TTS; TTS('tts_models/multilingual/multi-dataset/xtts_v2')"

Poor voice quality

Use clearer voice sample
Ensure no background noise
Try different sample (some voices clone better)

Quick Reference Card (FOR AI MODELS)

USER: "Clone my voice and say 'hello'"
→ Get: sample path, text="hello", language="en"
→ Run: VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "hello" "/path/to/sample.wav" en)
→ Result: $VOICE contains path to OGG file
→ Send: message action=send channel=telegram asVoice=true filePath="$VOICE"

USER: "Make me speak Czech"
→ Get: sample path, text="Ahoj", language="cs"  
→ Run: VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Ahoj" "/path/to/sample.wav" cs)
→ Send: message action=send channel=telegram asVoice=true filePath="$VOICE"

Output Location

Generated files are saved to:

/mnt/c/TEMP/Docker-containers/coqui-tts/output/clonev_output.ogg

The script returns this path, so you can use it directly.
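Because the path listed above is a single fixed file name, a later run would presumably overwrite the previous result. If several clips need to be kept, one option is to copy each result to a unique name right after generation. This is a minimal sketch under that assumption; `keep_copy` and the destination directory are hypothetical and not part of clonev.sh.

```shell
#!/bin/bash
# Hypothetical helper: copy a generated file to a unique name in a target
# directory and print the new path. Assumes the source path is reused per run.
keep_copy() {
    local src="$1" dest_dir="$2" dest
    [ -f "$src" ] || return 1
    mkdir -p "$dest_dir" || return 1
    # mktemp reserves a unique file name inside the destination directory
    dest=$(mktemp "$dest_dir/clonev_XXXXXX") || return 1
    cp -- "$src" "$dest" && mv -- "$dest" "$dest.ogg" && echo "$dest.ogg"
}

# Example usage (commented out so the sketch has no side effects):
# VOICE=$(scripts/clonev.sh "First clip" "/path/to/sample.wav" en)
# SAVED=$(keep_copy "$VOICE" "$HOME/clonev_clips")
```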

Summary

ONLY use the script: scripts/clonev.sh
NEVER try to use Docker containers directly
NEVER try to interact with the coqui-xtts container
Script handles everything automatically
Returns path to OGG file ready to send

Simple. Just use the script.

Clone any voice. Speak any language. Just use the script.

Security Recommendations

Key things to consider before installing or running this skill:

1) Implementation vs instructions mismatch: SKILL.md explicitly warns not to use Docker, but the provided script runs 'docker run' and will pull ghcr.io/coqui-ai/tts:latest. Ask the author to clarify why the doc forbids Docker while the script depends on it. Do not trust the wording alone.
2) Undeclared local dependencies: The metadata lists no required binaries, but the script requires Docker and ffmpeg. Ensure those binaries are installed and review their versions. Prefer running the script in an isolated environment (VM or disposable container) first.
3) File-system and privacy implications: The script copies your provided WAV into /mnt/c/TEMP/Docker-containers/coqui-tts/voice-samples and writes outputs to /mnt/c/TEMP/Docker-containers/coqui-tts/output/. That means your voice samples and the generated model files are persisted on disk in that path. If you will clone other people’s voices, ensure you have consent and be aware of local storage of sensitive audio.
4) Container image source: The container image is pulled from GHCR (ghcr.io/coqui-ai/tts:latest), which is the official Coqui registry; that is expected. Still, inspect the command and consider pulling and inspecting the image separately before allowing the skill to run it in a privileged environment.
5) Run in sandbox first: Because of the contradictions and host-path writes, run the script in a sandboxed environment (separate user account, VM, or disposable instance) and inspect the resulting files and network activity before integrating into production agents.
6) Ask for fixes or clarifications: Request that the skill metadata be updated to list required binaries (docker, ffmpeg), that SKILL.md be corrected to reflect that the script uses Docker (or the script be modified to avoid Docker), and that output/model paths be configurable instead of hardcoded.

If you cannot get satisfactory clarification or prefer not to expose voice samples and host paths, do not install or run this skill.
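Since the review reports that Docker and ffmpeg are required but undeclared, it may help to confirm both are on PATH before the first run. The sketch below is an illustration only; `missing_tools` is a hypothetical helper and not part of the skill.

```shell
#!/bin/bash
# Hypothetical helper: print each named tool that is not on PATH and
# return non-zero if at least one is missing.
missing_tools() {
    local tool status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Example usage (commented out so the sketch has no side effects):
# missing_tools docker ffmpeg || { echo "Install the tools listed above first" >&2; exit 1; }
```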

Functional Analysis

Type: OpenClaw Skill
Name: clonev
Version: 1.0.0

The skill is classified as suspicious due to its reliance on powerful system commands and file system interactions, despite being plausibly needed for its stated purpose. The `scripts/clonev.sh` file executes `docker run` and `ffmpeg`, and performs a `cp` operation on a user-provided voice sample path, which could be abused to copy arbitrary files if the agent is tricked. While the `SKILL.md` instructions strongly guide the AI agent to use the script and avoid direct Docker interaction, these directives themselves are a form of prompt injection to ensure a specific execution flow involving these risky capabilities. There is no clear evidence of intentional malicious behavior like data exfiltration or persistence, but the potential for misuse of these powerful commands warrants a 'suspicious' classification.
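One way to narrow the arbitrary-file `cp` concern is to verify that the supplied path really is a WAV file before handing it to the script. The sketch below checks for the RIFF/WAVE magic bytes at the start of the file; `looks_like_wav` is a hypothetical helper, not something clonev.sh is known to do, and a header check is only a partial safeguard.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the path is a regular file whose header
# carries the RIFF/WAVE magic bytes of a WAV container.
looks_like_wav() {
    local f="$1" riff wave
    [ -f "$f" ] || return 1
    riff=$(head -c 4 < "$f")                 # bytes 1-4 should read "RIFF"
    wave=$(tail -c +9 < "$f" | head -c 4)    # bytes 9-12 should read "WAVE"
    [ "$riff" = "RIFF" ] && [ "$wave" = "WAVE" ]
}

# Example usage (commented out so the sketch has no side effects):
# looks_like_wav "$SAMPLE" || { echo "Refusing: not a WAV file" >&2; exit 1; }
```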

Capability Assessment

⚠ Purpose & Capability

The skill claims to be a simple script-only voice-cloner and declares no required binaries or credentials, but the included script requires docker and ffmpeg and mounts host paths under /mnt/c/TEMP/Docker-containers/coqui-tts. Those binaries and path access are a legitimate need for the stated purpose, but they are not declared in the metadata and thus the manifest is inconsistent.

⚠ Instruction Scope

SKILL.md repeatedly tells agents 'DO NOT use Docker' and 'ONLY USE the script', yet scripts/clonev.sh runs docker run to perform synthesis and writes/copies files into host directories (/mnt/c/TEMP/...). This is a direct contradiction. The script will copy any user-specified WAV into the skill's voice-samples directory and write outputs to a host path — behavior that is expected for local processing but should be explicit and audited.

ℹ Install Mechanism

There is no install spec (instruction-only), which reduces install-scope risk. However the script invokes ghcr.io/coqui-ai/tts:latest which will be pulled by Docker at runtime if missing. Pulling an upstream container image from GHCR is expected for Coqui TTS, but it is a network-hosted binary download that the manifest does not advertise.

ℹ Credentials

No environment variables or credentials are requested — appropriate for a local tool. However the script implicitly requires Docker and ffmpeg and writes to/reads from a specific host directory (/mnt/c/TEMP/Docker-containers/coqui-tts). Access to those filesystem locations is effectively required and should be disclosed; the skill does not declare or justify that host-path access in metadata.

✓ Persistence & Privilege

The skill does not request always:true and does not attempt to modify other skills or system-wide configs. It stores model files and outputs under /mnt/c/TEMP/Docker-containers/coqui-tts, which is normal for model caching but note that it persists a ~1.87GB model and user samples on disk.

Version History

v1.0.0

Initial release of CloneV skill – voice cloning made simple.

- Provides a one-command solution for cloning any voice and generating speech using Coqui XTTS v2.
- Supports 14+ languages; easily specify language code for multi-lingual speech.
- Requires a 6–30 second WAV voice sample and text; outputs an OGG file with cloned voice.
- Strict instructions: use only the included `scripts/clonev.sh` script (no direct Docker/API use).
- Detailed usage, troubleshooting, and reference provided in SKILL.md for quick, efficient deployment.

Metadata

Slug clonev

Version 1.0.0

License —

Total installs 6

Current installs 6

Number of versions 1

FAQ

What is Clonev?

Clone any voice and generate speech using Coqui XTTS v2. SUPER SIMPLE - provide a voice sample (6-30 sec WAV) and text, get cloned voice audio. Supports 14+ languages. Use when the user wants to (1) Clone their voice or someone else's voice, (2) Generate speech that sounds like a specific person, (3) Create personalized voice messages, (4) Multi-lingual voice cloning (speak any language with cloned voice). It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 2090 times in total so far.

How do I install Clonev?

Run the command "/install clonev" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Clonev free?

Yes. Clonev is completely free (free and open source) and can be freely downloaded, installed, and used.

Which platforms does Clonev support?

Clonev is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Clonev?

It is developed and maintained by instant-picture (@instant-picture). The current version is v1.0.0.
