Description

Use this skill whenever the user wants speech to sound more human, companion-like, or emotionally expressive. Triggers include: any mention of 'say like', 't...

Usage Instructions (SKILL.md)

characteristic-voice

Name: Characteristic Voice
Author: ksuriuri

Make your AI agent sound like a real companion — one who sighs, laughs, hesitates, and speaks with genuine feeling.

Credentials

Variable	Required	Description
`NOIZ_API_KEY`	Yes if using Noiz backend	API key from developers.noiz.ai. Not needed if using the local Kokoro backend.

The script saves a normalised copy of the key to ~/.noiz_api_key (mode 600) for convenience. To set it:

bash skills/characteristic-voice/scripts/speak.sh config --set-api-key YOUR_KEY
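To confirm the key was stored as described, a small check can be run afterwards. This is a minimal sketch, assuming the key file sits at ~/.noiz_api_key with mode 600 as stated above; the function name `key_file_mode_ok` is hypothetical and not part of speak.sh.

```shell
# Hypothetical helper (not part of speak.sh): succeeds only if the given
# path is a regular file whose permission bits are exactly 600.
key_file_mode_ok() {
  [ -f "$1" ] || return 1
  # find prints the path only when the mode matches exactly.
  [ -n "$(find "$1" -prune -type f -perm 600)" ]
}

# Usage:
#   key_file_mode_ok "$HOME/.noiz_api_key" && echo "key file looks fine"
```

A non-zero exit status means the file is missing or readable by other users, in which case re-running the config command above is the simplest fix.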

Prerequisites

The included speak.sh script requires curl and python3 at runtime. Depending on which backend and features you use, you may also need:

Tool	When needed	Install hint
`curl`, `python3`	Always (core script)	Usually pre-installed
`kokoro-tts`	Kokoro (local/offline) backend	`uv tool install kokoro-tts`
`yt-dlp`	Downloading reference audio for voice cloning	github.com/yt-dlp/yt-dlp
`ffmpeg`	Trimming reference audio clips	ffmpeg.org
`rg` (ripgrep)	Searching subtitle files	github.com/BurntSushi/ripgrep

None of these are installed by the skill itself — provision them manually in your environment.
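Since nothing is installed automatically, it can help to check which tools are present before the first run. The sketch below is one way to do that with `command -v`; the helper name `missing_tools` is an assumption made for illustration, and the tool names are the ones listed in the table above.

```shell
# Hypothetical helper: print the name of each tool that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage:
#   missing_tools curl python3         # core script
#   missing_tools kokoro-tts           # Kokoro backend
#   missing_tools yt-dlp ffmpeg rg     # reference-audio workflow
```

Empty output means every named tool was found.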

Privacy & Data Transmission

Noiz backend: When using the Noiz backend, the text you speak and any reference audio you provide are sent to https://noiz.ai/v1. If you supply --ref-audio, that audio file is uploaded for voice cloning.
Kokoro backend: Runs entirely locally — no data leaves your machine.
Choose the Kokoro backend (--backend kokoro) if you want fully offline processing.
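The choice between the two backends can be made explicit in a wrapper. The following is only a sketch of one possible policy, not the default behaviour of speak.sh: it picks kokoro when offline use is requested or when `NOIZ_API_KEY` is empty in the environment, and noiz otherwise. It does not look at the saved ~/.noiz_api_key file, and the name `choose_backend` is hypothetical.

```shell
# Hypothetical helper: print a value suitable for --backend.
# "$1" = "offline" forces kokoro; otherwise noiz is chosen only when
# NOIZ_API_KEY is non-empty. This policy is an assumption for illustration.
choose_backend() {
  if [ "${1:-}" = "offline" ] || [ -z "${NOIZ_API_KEY:-}" ]; then
    echo kokoro
  else
    echo noiz
  fi
}

# Usage:
#   bash skills/characteristic-voice/scripts/speak.sh \
#     --backend "$(choose_backend offline)" -t "Hmm... I'm here." -o here.wav
```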

Triggers

say like
talk like
speak like
companion voice
comfort me
cheer me up
sound more human

The Two Tricks

Non-lexical fillers — sprinkle in little human noises (hmm, haha, aww, heh) at natural pause points to make speech feel alive
Emotion tuning — adjust warmth, joy, sadness, tenderness to match the moment

Filler Sounds Palette

Sound	Feeling	Use for
hmm...	Thinking, gentle acknowledgment	Comfort, pondering
ah...	Realization, soft surprise	Discoveries, transitions
uh...	Hesitation, empathy	Careful moments
heh / hehe	Playful, mischievous	Teasing, light moments
haha	Laughter	Joy, humor
aww	Tenderness, sympathy	Deep comfort
oh? / oh!	Surprise, attention	Reacting to news
pfft	Stifled laugh	Playful disbelief
whew	Relief	After tension
~ (tilde)	Drawn out, melodic ending	Warmth, playfulness

Rules: 2–4 fillers per short message max. Place at natural pauses — sentence starts, thought shifts. Use ... after fillers for a beat of silence, ~ at word endings for warmth.
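The filler limit can be checked mechanically before text is sent to the script. The sketch below counts palette fillers in a message and treats 4 as the ceiling, which is one reading of the rule above. The helper names are hypothetical, and the word list covers only the spelled-out sounds from the table (the tilde is not counted).

```shell
# Hypothetical helper: count filler words from the palette in a message.
# Matching is case-insensitive and whole-word, so "ah" inside "haha"
# is not counted separately.
count_fillers() {
  printf '%s\n' "$1" |
    grep -o -i -w -E 'hmm|ah|uh|hehe|heh|haha|aww|oh|pfft|whew' |
    wc -l | tr -d ' '
}

# Succeeds when the message stays within the assumed 4-filler ceiling.
fillers_ok() {
  [ "$(count_fillers "$1")" -le 4 ]
}

# Usage:
#   count_fillers "Hmm... rest well~ Sweet dreams."   # prints 1
```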

Presets

Good Night

Gentle, warm, slightly sleepy. Slow pace.

Good Morning

Warm, cheerful but not overwhelming.

Comfort

Soft, understanding, unhurried. Give space. Don't rush to "fix" things.

Celebration

Excited, proud, genuinely happy.

Just Chatting

Relaxed, playful, natural.

Using a Character's Voice

When a user says something like "speak in Hermione's voice" or "sound like Tony Stark", first check whether a reference audio file already exists in skills/characteristic-voice/. If one does, use it directly with --ref-audio.

If no reference audio exists, you can create one — but read the warnings below first.

Preparing reference audio (one-time setup)

You need a short (10–30 s) WAV clip of the target voice. Possible sources:

User-provided audio — the safest option. Ask the user to supply their own recording.
Public-domain / CC-licensed clips — search for freely licensed material.
Extracting from online video — tools like yt-dlp and ffmpeg can download and trim audio. Example workflow:

yt-dlp "URL" --write-auto-sub --sub-lang en --skip-download -o tmp/clip
rg -n "target line" tmp/clip.en.vtt
yt-dlp "URL" -x --audio-format wav --download-sections "*00:00:00-00:00:25" -o tmp/clip
ffmpeg -i tmp/clip.wav -ss 00:00:02 -to 00:00:20 skills/characteristic-voice/character.wav

Copyright & privacy warning: Downloading and re-using someone's voice from copyrighted media (movies, TV, YouTube) may violate copyright or personality-rights laws depending on your jurisdiction. Do not upload private voice recordings or material you don't have permission to use. The reference audio is sent to https://noiz.ai/v1 for voice cloning when using the Noiz backend. If this is a concern, consider using the local Kokoro backend instead.
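Before using a clip, it may be worth confirming that it falls inside the 10–30 s window mentioned above. This is a minimal sketch, assuming `ffprobe` is available (it normally ships with ffmpeg); both helper names are hypothetical.

```shell
# Hypothetical helper: succeed when a duration in seconds lies in [10, 30].
duration_in_range() {
  awk -v d="$1" 'BEGIN { exit !(d + 0 >= 10 && d + 0 <= 30) }'
}

# Read the clip length with ffprobe, then apply the range check.
clip_length_ok() {
  secs=$(ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1") || return 1
  duration_in_range "$secs"
}

# Usage:
#   clip_length_ok skills/characteristic-voice/character.wav || echo "re-trim the clip"
```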

Using reference audio

bash skills/characteristic-voice/scripts/speak.sh \
  --preset goodnight -t "Hmm... rest well~ Sweet dreams." \
  --ref-audio skills/characteristic-voice/character.wav -o night.wav

The --ref-audio flag uploads the file to the Noiz backend for voice cloning (requires NOIZ_API_KEY).

Usage

This skill provides speak.sh, a wrapper around the tts skill with companion-friendly presets.

# Use a preset (auto-sets emotion + speed)
bash skills/characteristic-voice/scripts/speak.sh \
  --preset goodnight -t "Hmm... rest well~ Sweet dreams." -o night.wav

# Custom emotion override
bash skills/characteristic-voice/scripts/speak.sh \
  -t "Aww... I'm right here." --emo '{"Tenderness":0.9}' --speed 0.75 -o comfort.wav

# With specific backend and voice
bash skills/characteristic-voice/scripts/speak.sh \
  --preset morning -t "Good morning~" --voice-id voice_abc --backend noiz -o morning.mp3 --format mp3

Run bash skills/characteristic-voice/scripts/speak.sh --help for all options.
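In the third example above, the output filename and `--format` are given separately. When scripting many calls, the format can be derived from the filename so the two never drift apart. The sketch below is illustrative only: `format_for` is a hypothetical helper, and falling back to wav when there is no extension is an assumption based on the examples, not documented behaviour.

```shell
# Hypothetical helper: derive a --format value from the output filename.
# Prints the lower-cased extension, or "wav" when the name has none.
format_for() {
  name=${1##*/}    # strip any directory part first
  case "$name" in
    *.*) printf '%s\n' "${name##*.}" | tr '[:upper:]' '[:lower:]' ;;
    *)   echo wav ;;
  esac
}

# Usage:
#   out=morning.mp3
#   bash skills/characteristic-voice/scripts/speak.sh \
#     --preset morning -t "Good morning~" -o "$out" --format "$(format_for "$out")"
```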

Writing Guide for the Agent

Start soft — lead with a filler ("hmm...", "oh~"), not content
Mirror energy — gentle when they're low, match when they're high
Keep it brief — 1–3 sentences, like a voice message from a friend
End warmly — close with connection ("I'm here", "see you tomorrow~")
Don't lecture — listen and stay present; no unsolicited advice

Safe Usage Recommendations

This skill appears to do what it says: expressive TTS with optional voice cloning. Before installing, decide whether you want any audio/text to leave your machine. If you use the Noiz backend, the script will send text and any reference audio to https://noiz.ai/v1 and will save a normalized API key to ~/.noiz_api_key (file mode 600). If you must keep everything local, use the Kokoro backend. Be careful about sourcing reference audio from copyrighted or private material — the skill documents this risk. Finally, note the registry metadata didn't list NOIZ_API_KEY as a requirement while the SKILL.md and script use it; confirm you are comfortable providing that key before enabling the Noiz backend.

Functional Analysis

Type: OpenClaw Skill
Name: noizai-characteristic-voice
Version: 0.1.1

The skill is a legitimate text-to-speech (TTS) wrapper for the Noiz.ai API and the local Kokoro-TTS engine, designed to provide expressive, human-like voice generation. It handles API keys securely by storing them in `~/.noiz_api_key` with restricted permissions (mode 600) and uses standard bash practices (arrays for curl arguments, `set -euo pipefail`) to prevent injection. While the `SKILL.md` provides instructions for the agent to use tools like `yt-dlp` and `ffmpeg` for voice cloning, these are aligned with the stated purpose and include explicit warnings regarding privacy and copyright.

Capability Assessment

ℹ Purpose & Capability

The skill implements expressive TTS and optionally voice cloning via the Noiz API or a local Kokoro backend. Required tools (curl, python3) and optional tooling (yt-dlp, ffmpeg) match the documented features. Minor metadata mismatch: the registry lists no required env vars, but SKILL.md and the script require a NOIZ_API_KEY when using the Noiz backend.

✓ Instruction Scope

SKILL.md and the script limit actions to generating TTS, optionally uploading user-provided or downloaded reference audio to Noiz, and using local Kokoro when requested. The SKILL.md explicitly warns about copyright/privacy when sourcing reference audio. There are no instructions to read arbitrary system files or to exfiltrate unrelated data.

✓ Install Mechanism

No install spec is provided (instruction-only with an included script). The script itself uses standard system tools only; nothing is downloaded or executed from unknown URLs by the skill itself.

ℹ Credentials

The only credential used is NOIZ_API_KEY (required only if you use the Noiz backend), which is proportional to the skill's external API use. The script will normalize and save the API key to ~/.noiz_api_key (mode 600) for convenience; this persistent storage is reasonable but should be understood by the user. Registry metadata not listing this env var is an inconsistency to be aware of.

✓ Persistence & Privilege

The skill does not request elevated privileges nor set always:true. Its only persistent action is writing the API key file in the user's home directory; it does not modify other skills or system-wide agent settings.

Version History

v0.1.1

characteristic-voice 0.1.1 changelog:
- Added detailed credential and API key setup instructions for Noiz backend, including local key storage.
- Listed all runtime prerequisites and external tools required, clarifying provisioning responsibility.
- Introduced a privacy and data transmission policy section, distinguishing between Noiz and Kokoro backends for online/offline use.
- Expanded and clarified instructions for preparing and using reference audio for voice cloning, including legal and privacy considerations.
- No code changes; documentation greatly expanded for clarity and responsible use.

v0.1.0

Initial release introducing expressive, companion-style voice features:
- Adds a skill for generating speech with human-like emotion, fillers, and personality.
- Supports triggers such as "sound more human", "companion voice", and requests for character voices or emotional speech.
- Provides easy presets for Good Night, Good Morning, Comfort, Celebration, and natural chatting.
- Allows emotion tuning (e.g., warmth, tenderness) and non-lexical fillers for lifelike delivery.
- Includes workflow for cloning and using specific character voices.
- Offers a simple command-line script for generating companion audio with the new capabilities.

Metadata

Slug noizai-characteristic-voice

Version 0.1.1

License MIT-0

Total installs 2

Current installs 2

Number of versions 2

FAQ