
chichi-speech (local text-to-speech service with Qwen3-TTS model)

by hudeven · GitHub ↗ · v1.0.2
cross-platform ✓ Security Clean
Downloads: 1894 · Stars: 1 · Active Installs: 1 · Versions: 3
Install in OpenClaw
/install chichi-speech
Description
A RESTful service for high-quality text-to-speech using Qwen3 and specialized voice cloning. Optimized for reusing a specific voice prompt to avoid re-computation.
README (SKILL.md)

Chichi Speech Service

This skill provides a FastAPI-based REST service for Qwen3 TTS, specifically configured for reusing a high-quality reference audio prompt for efficient and consistent voice cloning. This service is packaged as an installable CLI.

Installation

Prerequisites: Python >= 3.10.

pip install -e .

Usage

1. Start the Service

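The service listens on port 9090 by default. Pass --host 127.0.0.1, as in the command below, to keep it local-only; according to the Persistence & Privilege review, the CLI otherwise binds to 0.0.0.0.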

# Start the server (runs in foreground, use & for background or a separate terminal)
# Optional: change --ref-audio and --ref-text to your own reference audio and its transcript for voice cloning
chichi-speech --port 9090 --host 127.0.0.1 --ref-audio "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-TTS-Repo/clone_2.wav" --ref-text "Okay. Yeah. I resent you. I love you. I respect you. But you know what? You blew it! And thanks to you."

2. Verify Service is Running

Confirm the server is up by requesting the OpenAPI docs page:

curl http://localhost:9090/docs
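
Because the server loads the model at startup, the docs page may not respond right away. The following is a minimal sketch of a readiness poll using only the Python standard library. The function name, timeout, and interval are illustrative assumptions, not part of chichi-speech.

```python
import time
import urllib.error
import urllib.request


def wait_for_service(url="http://localhost:9090/docs", timeout=60.0, interval=1.0):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: the server is not ready yet.
            pass
        time.sleep(interval)
    return False


# Example: wait_for_service() returns True once /docs responds, else False.
```

A longer timeout may be appropriate on the first run, when model weights still have to be downloaded.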

3. Generate Speech

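Use cURL (create the output/ directory first, or add --create-dirs, since curl does not create missing directories by default):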

curl -X POST "http://localhost:9090/synthesize" \
     -H "Content-Type: application/json" \
     -d '{
           "text": "Nice to meet you",
           "language": "English"
         }' \
     --output output/nice_to_meet.wav
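
The same request can be sent from Python. This is a sketch using only the standard library; it mirrors the cURL call above (same endpoint, JSON fields, and header). The helper names build_synthesize_request and synthesize_to_file are hypothetical, and the 300-second timeout is an arbitrary assumption.

```python
import json
import urllib.request
from pathlib import Path


def build_synthesize_request(text, language="English", base_url="http://localhost:9090"):
    """Build the same POST /synthesize request as the cURL example."""
    payload = json.dumps({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/synthesize",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text, out_path, language="English", base_url="http://localhost:9090"):
    """Send the request and write the returned audio bytes to `out_path`."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)  # create output/ if missing
    request = build_synthesize_request(text, language, base_url)
    with urllib.request.urlopen(request, timeout=300) as response:
        out_path.write_bytes(response.read())
    return out_path


# Example (requires the server to be running):
# synthesize_to_file("Nice to meet you", "output/nice_to_meet.wav")
```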

Functionality

  • Endpoint: POST /synthesize
  • Default Port: 9090
  • Voice Cloning: Uses a pre-computed voice prompt from reference files to ensure the cloned voice is consistent and generation is fast.
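
The compute-once idea behind the voice prompt can be illustrated generically. The sketch below does not use the qwen-tts API or the skill's actual code: expensive_prompt_extraction is a hypothetical stand-in for whatever processing the model performs on the reference audio, and the cache is Python's functools.lru_cache.

```python
from functools import lru_cache


def expensive_prompt_extraction(ref_audio, ref_text):
    """Hypothetical stand-in for the model's reference-audio processing."""
    return {"ref_audio": ref_audio, "ref_text": ref_text}


@lru_cache(maxsize=1)
def get_voice_prompt(ref_audio, ref_text):
    """Compute the prompt on first use, then return the cached object."""
    return expensive_prompt_extraction(ref_audio, ref_text)


def synthesize(text, ref_audio, ref_text):
    """Each request reuses the cached prompt instead of recomputing it."""
    prompt = get_voice_prompt(ref_audio, ref_text)
    return f"audio for {text!r} in the voice of {prompt['ref_audio']}"


synthesize("Nice to meet you", "clone_2.wav", "reference transcript")
synthesize("See you soon", "clone_2.wav", "reference transcript")
# get_voice_prompt.cache_info() now reports one miss and one hit.
```

Only the first request pays for the extraction; later requests with the same reference pair get the cached prompt, which is the behavior the Voice Cloning bullet describes.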

Requirements

  • Python 3.10+
  • qwen-tts (Qwen3 model library)
  • Access to a reference audio file for voice cloning.
    • By default, it uses public sample audio from Qwen3.
    • To clone a different voice, provide your own reference audio and its transcript using the --ref-audio and --ref-text flags.
Usage Guidance
This code implements the stated local TTS service, but before installing or running it, consider:
  1. Network activity: the service will download model weights and fetch a default reference audio from a public OSS URL; if you need full offline behavior, provide a local reference audio and ensure the model cache is available locally.
  2. Exposure: the server default host is 0.0.0.0 (public); run with --host 127.0.0.1 or firewall it if you want local-only access.
  3. Resource usage: dependencies (torch, numba, qwen-tts) and model weights are large; ensure you have disk, RAM, and hardware (GPU/MPS) capacity.
  4. Source trust: the skill's source is unknown and package metadata shows a minor version/name mismatch; review the code yourself if you need to trust it fully.
  5. Sanity-check arguments: the CLI accepts ref-audio/ref-text overrides; prefer local files to avoid unintended remote fetches.
If you want higher assurance, request a signed upstream/package source, a reproducible release (GitHub release or known registry), or run in an isolated environment (container/VM) behind a firewall.
Capability Analysis
Type: OpenClaw Skill
Name: chichi-speech
Version: 1.0.2
The skill bundle provides a FastAPI-based text-to-speech service using the Qwen3 model. All files align with the stated purpose, including loading a pre-trained model and a reference audio file from legitimate public URLs (qianwen-res.oss-cn-beijing.aliyuncs.com). The `SKILL.md` instructions are clear and do not contain any prompt injection attempts. While the server defaults to listening on `0.0.0.0` in `src/chichi_speech/server.py` and allows specifying arbitrary `--ref-audio` URLs, these are common practices for web services and core features for voice cloning, respectively, and do not indicate intentional malicious behavior like data exfiltration or unauthorized execution.
Capability Assessment
Purpose & Capability
Name, description, SKILL.md and the Python sources all implement a FastAPI-based Qwen3 TTS service with voice-clone prompt reuse. The declared dependencies (qwen-tts, torch, fastapi, uvicorn, soundfile) match the code's behavior. Minor inconsistency: pyproject version (0.1.1) differs from registry version (1.0.2) and the package imports qwen_tts while pyproject lists qwen-tts — these are likely packaging/name mismatches but do not indicate additional functionality beyond TTS.
Instruction Scope
SKILL.md instructs pip install -e . and running the CLI to start the service. The server code will download model weights (via model.from_pretrained) and fetch a hardcoded reference audio URL (an Alibaba OSS URL) to precompute the voice prompt. The service initializes the model at startup and exposes a POST /synthesize endpoint that streams WAV audio back. These actions are consistent with TTS but do involve network activity (model and reference audio downloads) and preloading large binaries.
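Since the endpoint returns WAV audio, a quick sanity check on a saved response can catch cases where an error body was written to disk instead of audio. This is a sketch using Python's standard wave module; describe_wav is a hypothetical helper, and it assumes the response is a complete PCM-encoded WAV file, which this page does not confirm.

```python
import io
import wave


def describe_wav(data):
    """Return basic facts about WAV bytes, e.g. the body of a /synthesize response."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }


# Example: describe_wav(open("output/nice_to_meet.wav", "rb").read())
# wave.Error is raised if the bytes are not a WAV file the module can parse.
```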
Install Mechanism
There is no platform install spec — installation is via pip install -e . per SKILL.md. That will pull heavy native dependencies (torch, numba) and qwen-tts which can download large model artifacts at runtime. No obfuscated or suspicious third‑party download URLs in code besides the public OSS reference audio and the normal model download mechanisms (from_pretrained). This is higher friction and requires substantial disk/CPU/GPU resources but not intrinsically malicious.
Credentials
The skill requests no credentials or environment variables. The code reads PORT if present (a reasonable override). No secrets or unrelated environment variables are required or accessed.
Persistence & Privilege
The CLI default binds the FastAPI app to 0.0.0.0 (publicly reachable) which can unintentionally expose the service to the network; the SKILL.md example does show 127.0.0.1 but the code uses 0.0.0.0 by default. The skill does not request always:true and does not modify other skills or global agent config, but you should be careful to run it with appropriate host/network restrictions and firewall rules.
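Whether a given --host value keeps the service local can be checked before launch. The following is a small sketch with Python's standard ipaddress module; is_loopback_only is a hypothetical helper and is not part of the skill.

```python
import ipaddress


def is_loopback_only(host):
    """Return True when `host` is reachable only from the local machine."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a hostname); treat it as potentially exposed.
        return False


# is_loopback_only("127.0.0.1") is True; is_loopback_only("0.0.0.0") is False.
```

Treating unknown hostnames as exposed is a deliberately conservative choice for this sketch.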
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install chichi-speech
  3. After installation, invoke the skill by name or use /chichi-speech
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.2
  • Initial release of chichi-speech as an installable RESTful text-to-speech service.
  • Provides FastAPI-based service for high-quality TTS with Qwen3 and voice cloning via reference audio.
  • Command line interface (CLI) for easy server startup with customizable reference audio and text options.
  • Includes usage examples, service endpoint documentation, and cURL sample requests.
v1.0.1
  • Updated installation instructions to use pip (`pip install chichi-speech`) for easier setup.
  • Updated CLI invocation to use `chichi-speech` instead of `chichi-speech-server`.
  • Removed references to manual virtual environment creation and editable installs.
  • No code or functional changes; documentation improvements only.
v1.0.0
Initial release of Chichi Speech Service, a RESTful FastAPI service for high-quality, reference-based text-to-speech.
  • Provides a CLI to run a TTS server using Qwen3 and specialized voice cloning.
  • Efficiently reuses a specified reference audio to avoid repeated computation and ensure consistent voice output.
  • Supports customizable server port, host, reference audio, and reference text configuration.
  • Exposes a POST /synthesize endpoint for speech generation.
  • Includes healthcheck and OpenAPI docs at /docs.
  • Installation and usage instructions provided for quick setup.
Metadata
Slug chichi-speech
Version 1.0.2
License
All-time Installs 1
Active Installs 1
Total Versions 3
Frequently Asked Questions

What is chichi-speech (local text-to-speech service with Qwen3-TTS model)?

A RESTful service for high-quality text-to-speech using Qwen3 and specialized voice cloning. Optimized for reusing a specific voice prompt to avoid re-computation. It is an AI Agent Skill for Claude Code / OpenClaw, with 1894 downloads so far.

How do I install chichi-speech (local text-to-speech service with Qwen3-TTS model)?

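Run "/install chichi-speech" in the OpenClaw or Claude Code chat to install the skill. To run the TTS service itself, follow the README: install the package with pip install -e . (Python 3.10 or later) and start the server with the chichi-speech CLI.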

Is chichi-speech (local text-to-speech service with Qwen3-TTS model) free?

Yes, chichi-speech (local text-to-speech service with Qwen3-TTS model) is completely free (open-source). You can download, install and use it at no cost.

Which platforms does chichi-speech (local text-to-speech service with Qwen3-TTS model) support?

chichi-speech (local text-to-speech service with Qwen3-TTS model) is listed as cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created chichi-speech (local text-to-speech service with Qwen3-TTS model)?

It is built and maintained by hudeven (@hudeven); the current version is v1.0.2.
