
chichi-speech (local text-to-speech service with Qwen3-TTS model)

Name: chichi-speech (local text-to-speech service with Qwen3-TTS model)
Author: hudeven

By hudeven · GitHub · v1.0.2

cross-platform · ✓ Security check passed

Total downloads: 1894 · Current installs: 1 · Versions: 3

Install in OpenClaw

/install chichi-speech

Description

A RESTful service for high-quality text-to-speech using Qwen3 and specialized voice cloning. Optimized for reusing a specific voice prompt to avoid re-computation.

Usage instructions (SKILL.md)

Chichi Speech Service

This skill provides a FastAPI-based REST service for Qwen3 TTS, specifically configured for reusing a high-quality reference audio prompt for efficient and consistent voice cloning. This service is packaged as an installable CLI.

Installation

Prerequisites: Python >= 3.10.

pip install -e .

Usage

1. Start the Service

The service runs on port 9090 by default.

# Start the server (it runs in the foreground; append & to run it in the background, or use a separate terminal)
# Optional: replace the --ref-audio URL and the --ref-text transcript with your own reference audio and its matching text for voice cloning
chichi-speech --port 9090 --host 127.0.0.1 --ref-audio "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-TTS-Repo/clone_2.wav" --ref-text "Okay. Yeah. I resent you. I love you. I respect you. But you know what? You blew it! And thanks to you."
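If you would rather launch the server from a Python script than from a shell, the sketch below assembles the same command line from the flags used above (`--port`, `--host`, `--ref-audio`, `--ref-text`) and warns when `--ref-audio` is a remote URL, since a local file avoids a network fetch at startup. The helper names `is_remote_reference`, `build_server_command`, and `start_server` are hypothetical and not part of chichi-speech.

```python
import subprocess
from urllib.parse import urlparse


def is_remote_reference(ref_audio: str) -> bool:
    """True if --ref-audio points at an http(s) URL rather than a local file."""
    return urlparse(ref_audio).scheme in ("http", "https")


def build_server_command(ref_audio: str, ref_text: str,
                         host: str = "127.0.0.1", port: int = 9090) -> list[str]:
    """Assemble the chichi-speech argv from its documented flags (hypothetical helper)."""
    return [
        "chichi-speech",
        "--port", str(port),
        "--host", host,
        "--ref-audio", ref_audio,
        "--ref-text", ref_text,
    ]


def start_server(cmd: list[str]) -> subprocess.Popen:
    """Launch the server without blocking, similar to appending & in a shell."""
    if is_remote_reference(cmd[cmd.index("--ref-audio") + 1]):
        print("warning: reference audio will be downloaded from the network at startup")
    return subprocess.Popen(cmd)
```

For example, `proc = start_server(build_server_command("voices/my_voice.wav", "Transcript of the clip."))` starts the server in the background (the file path and transcript are placeholders), and `proc.terminate()` stops it.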

2. Verify Service is Running

Check that the API docs page responds:

curl http://localhost:9090/docs
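Because the model is loaded when the service starts, `/docs` may not answer until loading finishes. A minimal standard-library polling sketch, assuming the default host and port; `wait_for_service` is a hypothetical helper and the 120-second default timeout is an arbitrary choice.

```python
import http.client
import time
import urllib.request


def wait_for_service(url: str = "http://localhost:9090/docs",
                     timeout: float = 120.0, interval: float = 1.0) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timed out, or a non-200 status (HTTPError
            # is an OSError subclass): the server is not ready yet.
            pass
        time.sleep(interval)
    return False
```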

3. Generate Speech

Use cURL:

curl -X POST "http://localhost:9090/synthesize" \
     -H "Content-Type: application/json" \
     -d '{
           "text": "Nice to meet you",
           "language": "English"
         }' \
     --create-dirs --output output/nice_to_meet.wav
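The same request can be sent from Python with only the standard library. This sketch mirrors the cURL example's JSON body (`text`, `language`) and writes the response bytes to a WAV file; `build_synthesize_request` and `synthesize_to_file` are hypothetical helpers, and the 300-second timeout is an assumption to allow for slow generation.

```python
import json
import urllib.request
from pathlib import Path


def build_synthesize_request(text: str, language: str = "English",
                             base_url: str = "http://localhost:9090") -> urllib.request.Request:
    """Build a POST /synthesize request with the same JSON body as the cURL example."""
    body = json.dumps({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/synthesize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text: str, out_path: str, language: str = "English",
                       base_url: str = "http://localhost:9090",
                       timeout: float = 300.0) -> Path:
    """Send the request and save the returned audio bytes to out_path."""
    path = Path(out_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create output/ if it is missing
    req = build_synthesize_request(text, language, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        path.write_bytes(resp.read())
    return path
```

With the server running, `synthesize_to_file("Nice to meet you", "output/nice_to_meet.wav")` should produce the same file as the cURL command.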

Functionality

Endpoint: POST /synthesize
Default Port: 9090
Voice Cloning: Uses a pre-computed voice prompt from reference files to ensure the cloned voice is consistent and generation is fast.
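The point of prompt reuse is to pay the cost of encoding the reference audio once, at startup, and hand the cached result to every synthesis request. The sketch below illustrates only that caching pattern with stand-in callables; it does not call the real qwen-tts API, whose function names are not given here, and `VoicePromptCache`, `encode_reference`, and `generate` are hypothetical names.

```python
from typing import Any, Callable


class VoicePromptCache:
    """Compute the voice prompt once and pass the same object to every request."""

    def __init__(self, encode_reference: Callable[[str, str], Any],
                 ref_audio: str, ref_text: str) -> None:
        # Expensive step (stand-in for encoding the reference audio + transcript),
        # executed a single time when the service starts.
        self.prompt = encode_reference(ref_audio, ref_text)

    def synthesize(self, generate: Callable[[str, Any], bytes], text: str) -> bytes:
        # Per-request work only: generate speech conditioned on the cached prompt.
        return generate(text, self.prompt)
```

However many requests arrive, `encode_reference` runs exactly once; only `generate` runs per request.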

Requirements

Python 3.10+
qwen-tts (Qwen3 model library)
Access to a reference audio file for voice cloning.
- By default, it uses public sample audio from Qwen3.
- To clone a different voice, provide your own reference audio with --ref-audio and its transcript with --ref-text.

Security recommendations

This code implements the stated local TTS service, but before installing or running it consider the following:
1) Network activity: the service downloads model weights and fetches a default reference audio file from a public OSS URL. For fully offline operation, provide a local reference audio file and make sure the model cache is available locally.
2) Exposure: the server's default host is 0.0.0.0 (all interfaces, so publicly reachable); run it with --host 127.0.0.1 or put it behind a firewall if you want local-only access.
3) Resource usage: the dependencies (torch, numba, qwen-tts) and the model weights are large; make sure you have enough disk space, RAM, and hardware (GPU/MPS) capacity.
4) Source trust: the skill's source is unknown and the package metadata shows a minor version/name mismatch; review the code yourself if you need to trust it fully.
5) Argument sanity checks: the CLI accepts --ref-audio/--ref-text overrides; prefer local files to avoid unintended remote fetches.
If you want higher assurance, request a signed upstream package source or a reproducible release (a GitHub release or a known registry), or run the service in an isolated environment (container or VM) behind a firewall.
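Since the code's default bind address is 0.0.0.0, a launcher script might verify the host value before starting the server. A minimal standard-library sketch; `is_loopback_host` is a hypothetical helper, not part of chichi-speech, and it deliberately treats any hostname other than `localhost` as non-local.

```python
import ipaddress


def is_loopback_host(host: str) -> bool:
    """Return True if binding to `host` keeps the service local to this machine."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not a literal IP address (e.g. a hostname): treat as non-local to be safe.
        return False
```

`0.0.0.0` is the unspecified address rather than a loopback one, so it fails this check.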

Functional analysis

Type: OpenClaw Skill · Name: chichi-speech · Version: 1.0.2

The skill bundle provides a FastAPI-based text-to-speech service using the Qwen3 model. All files align with the stated purpose, including loading a pre-trained model and a reference audio file from legitimate public URLs (qianwen-res.oss-cn-beijing.aliyuncs.com). The `SKILL.md` instructions are clear and do not contain any prompt injection attempts. While the server defaults to listening on `0.0.0.0` in `src/chichi_speech/server.py` and allows specifying arbitrary `--ref-audio` URLs, these are, respectively, a common practice for web services and a core feature of voice cloning, and they do not indicate intentional malicious behavior such as data exfiltration or unauthorized execution.

Capability assessment

✓ Purpose & Capability

Name, description, SKILL.md and the Python sources all implement a FastAPI-based Qwen3 TTS service with voice-clone prompt reuse. The declared dependencies (qwen-tts, torch, fastapi, uvicorn, soundfile) match the code's behavior. Minor inconsistency: the pyproject version (0.1.1) differs from the registry version (1.0.2), and the package imports qwen_tts while pyproject lists qwen-tts (a hyphen/underscore difference that is normal between a distribution name and its import name). These are likely packaging/naming mismatches and do not indicate any functionality beyond TTS.

ℹ Instruction Scope

SKILL.md instructs pip install -e . and running the CLI to start the service. The server code will download model weights (via model.from_pretrained) and fetch a hardcoded reference audio URL (an Alibaba OSS URL) to precompute the voice prompt. The service initializes the model at startup and exposes a POST /synthesize endpoint that streams WAV audio back. These actions are consistent with TTS but do involve network activity (model and reference audio downloads) and preloading large binaries.
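The endpoint returns WAV audio. For readers unfamiliar with the format, this sketch shows how mono float samples in [-1, 1] can be packed into an in-memory 16-bit PCM WAV using the standard `wave` module. It is illustrative only: the service's declared dependencies include soundfile, which likely handles this step there, and `to_wav_bytes` and the 24 kHz default sample rate are assumptions.

```python
import io
import struct
import wave


def to_wav_bytes(samples: list[float], sample_rate: int = 24_000) -> bytes:
    """Encode mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file in memory."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    # Scale to signed 16-bit integers, little-endian as WAV requires.
    pcm = b"".join(struct.pack("<h", round(s * 32767)) for s in clipped)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```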

ℹ Install Mechanism

There is no platform install spec; installation is via pip install -e . per SKILL.md. That pulls in heavy native dependencies (torch, numba) and qwen-tts, which can download large model artifacts at runtime. The code contains no obfuscated or suspicious third-party download URLs beyond the public OSS reference audio and the normal model download mechanism (from_pretrained). This adds installation friction and requires substantial disk, CPU, and GPU resources, but is not intrinsically malicious.

✓ Credentials

The skill requests no credentials or environment variables. The code reads PORT if present (a reasonable override). No secrets or unrelated environment variables are required or accessed.
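The exact precedence between the PORT variable, the --port flag, and the 9090 default is not documented here. The sketch below assumes one plausible order (explicit flag, then PORT, then 9090); `resolve_port` is a hypothetical helper and does not reproduce the actual server code.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence

DEFAULT_PORT = 9090


def resolve_port(argv: Optional[Sequence[str]] = None,
                 env: Mapping[str, str] = os.environ) -> int:
    """Pick a port: --port flag first, then PORT, then 9090 (assumed order)."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--port", type=int, default=None)
    # Ignore unrelated flags such as --host or --ref-audio.
    args, _unknown = parser.parse_known_args(argv)
    if args.port is not None:
        return args.port
    if env.get("PORT"):
        return int(env["PORT"])
    return DEFAULT_PORT
```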

⚠ Persistence & Privilege

The CLI default binds the FastAPI app to 0.0.0.0 (publicly reachable) which can unintentionally expose the service to the network; the SKILL.md example does show 127.0.0.1 but the code uses 0.0.0.0 by default. The skill does not request always:true and does not modify other skills or global agent config, but you should be careful to run it with appropriate host/network restrictions and firewall rules.

How to use

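Make sure OpenClaw is installed (local or Docker deployment).
In the chat box, enter the install command: /install chichi-speech
After installation, invoke the skill by name or trigger it with /chichi-speech.
Provide the required inputs as described in the skill's parameter documentation to get structured output.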

Version history

v1.0.2

- Initial release of chichi-speech as an installable RESTful text-to-speech service.
- Provides a FastAPI-based service for high-quality TTS with Qwen3 and voice cloning via reference audio.
- Command-line interface (CLI) for easy server startup with customizable reference audio and text options.
- Includes usage examples, service endpoint documentation, and cURL sample requests.

v1.0.1

- Updated installation instructions to use pip (`pip install chichi-speech`) for easier setup.
- Updated CLI invocation to use `chichi-speech` instead of `chichi-speech-server`.
- Removed references to manual virtual environment creation and editable installs.
- No code or functional changes; documentation improvements only.

v1.0.0

Initial release of Chichi Speech Service, a RESTful FastAPI service for high-quality, reference-based text-to-speech.
- Provides a CLI to run a TTS server using Qwen3 and specialized voice cloning.
- Efficiently reuses a specified reference audio to avoid repeated computation and ensure consistent voice output.
- Supports customizable server port, host, reference audio, and reference text configuration.
- Exposes a POST /synthesize endpoint for speech generation.
- Includes healthcheck and OpenAPI docs at /docs.
- Installation and usage instructions provided for quick setup.

Metadata

Slug: chichi-speech

Version: 1.0.2

License: not specified

Total installs: 1

Current installs: 1

Version count: 3

FAQ