
ComfyUI TTS

by YHSI5358 · GitHub ↗ · v1.0.0
cross-platform · ⚠ suspicious
Downloads: 908 · Stars: 0 · Active Installs: 2 · Versions: 1
Install in OpenClaw: /install comfyui-tts
Description
Convert text to speech audio via ComfyUI's Qwen-TTS API, supporting customizable voice, style, model, and output options.
README (SKILL.md)

ComfyUI TTS Skill

Generate speech audio from text by submitting jobs to a ComfyUI server that runs the Qwen-TTS service, using ComfyUI's HTTP API.

Configuration

Environment Variables

Set these environment variables to configure the ComfyUI connection:

export COMFYUI_HOST="localhost"      # ComfyUI server host
export COMFYUI_PORT="8188"           # ComfyUI server port
export COMFYUI_OUTPUT_DIR=""         # Optional: Custom output directory
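Here is a minimal sketch of how a script might resolve these variables. It assumes the defaults listed above (localhost, 8188). comfyui_base_url is a hypothetical helper, not code taken from scripts/tts.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from scripts/tts.sh): build the ComfyUI base URL
# from COMFYUI_HOST / COMFYUI_PORT, falling back to the documented defaults
# when a variable is unset or empty, and rejecting a non-numeric port.
comfyui_base_url() {
  local host="${COMFYUI_HOST:-localhost}"
  local port="${COMFYUI_PORT:-8188}"
  case "$port" in
    *[!0-9]*) echo "COMFYUI_PORT must be numeric, got: $port" >&2; return 1 ;;
  esac
  printf 'http://%s:%s\n' "$host" "$port"
}
```

Checking the port up front turns a typo such as `COMFYUI_PORT=81888x` into a clear error, instead of an opaque connection failure later.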

Usage

Basic Text-to-Speech

Generate audio from text using default settings:

scripts/tts.sh "你好,世界"   # "Hello, world"

Advanced Options

Customize voice characteristics:

# Specify character and style ("你好" means "Hello")
scripts/tts.sh "你好" --character "Girl" --style "Emotional"

# Change model size
scripts/tts.sh "你好" --model "3B"

# Specify output file
scripts/tts.sh "你好" --output "/path/to/output.wav"

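# Combine options (the text means "Hello, this is a test").
# Use $HOME rather than a quoted "~": the shell does not expand ~ inside quotes.
scripts/tts.sh "你好,这是测试" \
  --character "Girl" \
  --style "Emotional" \
  --model "1.7B" \
  --output "$HOME/audio/test.wav"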

Available Options

Option          Description                                 Default
--character     Voice character (Girl/Boy/etc.)             "Girl"
--style         Speaking style (Emotional/Neutral/etc.)     "Emotional"
--model         Model size (0.5B/1.7B/3B)                   "1.7B"
--output        Output file path                            Auto-generated
--temperature   Generation temperature (0-1)                0.9
--top-p         Top-p sampling                              0.9
--top-k         Top-k sampling                              50
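The options above could be parsed with a plain while/case loop. This is a hypothetical sketch that mirrors the documented defaults. The real scripts/tts.sh may parse its arguments differently, and the model check accepts only the sizes listed in the table.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of parsing the options in the table; the real
# scripts/tts.sh may differ. Parsed values are left in global variables.
parse_tts_args() {
  TEXT="" CHARACTER="Girl" STYLE="Emotional" MODEL="1.7B" OUTPUT=""
  TEMPERATURE="0.9" TOP_P="0.9" TOP_K="50"
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --character|--style|--model|--output|--temperature|--top-p|--top-k)
        # Each option needs a value; a bare `shift 2` with one argument
        # left would fail without shifting and loop forever.
        if [ "$#" -lt 2 ]; then echo "missing value for $1" >&2; return 1; fi
        case "$1" in
          --character)   CHARACTER="$2" ;;
          --style)       STYLE="$2" ;;
          --model)       MODEL="$2" ;;
          --output)      OUTPUT="$2" ;;
          --temperature) TEMPERATURE="$2" ;;
          --top-p)       TOP_P="$2" ;;
          --top-k)       TOP_K="$2" ;;
        esac
        shift 2 ;;
      --*) echo "unknown option: $1" >&2; return 1 ;;
      *)   TEXT="$1"; shift ;;
    esac
  done
  if [ -z "$TEXT" ]; then echo "usage: tts.sh TEXT [options]" >&2; return 1; fi
  # Accept only the model sizes documented in the table.
  case "$MODEL" in
    0.5B|1.7B|3B) ;;
    *) echo "unsupported model: $MODEL" >&2; return 1 ;;
  esac
}
```

For example, `parse_tts_args "你好" --model 3B` leaves MODEL=3B and keeps every other option at its documented default.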

Workflow

The skill performs these steps:

  1. Construct Workflow: Builds a ComfyUI workflow JSON with your text and settings
  2. Submit Job: Sends the workflow to ComfyUI's /prompt endpoint
  3. Poll Status: Polls the /history endpoint until the job is complete
  4. Retrieve Audio: Downloads the generated audio (via the /view endpoint) and returns the path to the saved file
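Steps 2 to 4 can be sketched as follows. The sketch assumes:

  • curl and jq are installed (the skill requires both).
  • ComfyUI exposes a per-prompt history route at /history/<prompt_id>.
  • Audio outputs have the shape {"audio": [{"filename", "subfolder", "type"}]}. This matches ComfyUI's built-in SaveAudio node, but it is only an assumption for the Qwen-TTS workflow.

All function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of workflow steps 2-4 using curl and jq. Function names are
# hypothetical; the workflow file is assumed to already be an API-format
# ComfyUI workflow produced by step 1.

# Step 2: POST {"prompt": <workflow>} to /prompt and print the returned prompt_id.
submit_workflow() {  # $1 = base URL, $2 = workflow JSON file
  local response
  response=$(jq -c '{prompt: .}' "$2" |
    curl -sf -X POST -H 'Content-Type: application/json' \
      --data-binary @- "$1/prompt") || return 1
  jq -r '.prompt_id' <<<"$response"
}

# Step 3: poll /history/<prompt_id> until the job appears there, or time out.
wait_for_history() {  # $1 = base URL, $2 = prompt_id, $3 = timeout in seconds
  local deadline=$(( SECONDS + ${3:-300} )) body
  while [ "$SECONDS" -lt "$deadline" ]; do
    body=$(curl -sf "$1/history/$2") || return 1
    if jq -e --arg id "$2" 'has($id)' <<<"$body" >/dev/null; then
      printf '%s\n' "$body"
      return 0
    fi
    sleep 2
  done
  echo "timed out waiting for prompt $2" >&2
  return 1
}

# Step 4a: turn the first audio entry of a history record into a /view query.
first_audio_query() {  # stdin = /history JSON, $1 = prompt_id
  jq -r --arg id "$1" '
    first(.[$id].outputs[]? | .audio // empty | .[])
    | @uri "filename=\(.filename)&subfolder=\(.subfolder)&type=\(.type)"'
}

# Step 4b: download the file that /view serves for that query.
download_audio() {  # $1 = base URL, $2 = query string, $3 = destination file
  curl -sf -o "$3" "$1/view?$2"
}
```

The four functions chain in order. submit_workflow prints the prompt id, and wait_for_history blocks until that id appears in the history. first_audio_query converts the record into a URL-encoded /view query, and download_audio saves the file. A longer timeout passed to wait_for_history is the natural knob for the slower 3B model.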

Troubleshooting

Connection Refused

  • Verify ComfyUI is running: curl "http://${COMFYUI_HOST:-localhost}:${COMFYUI_PORT:-8188}/system_stats"
  • Check host and port settings
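The /system_stats probe can be wrapped so that curl's exit code becomes a hint, which helps separate a wrong host from a stopped server. This is a sketch that assumes curl is installed; check_comfyui is a hypothetical name. It maps curl's documented exit codes: 6 (could not resolve host), 7 (failed to connect), 22 (HTTP error under -f), and 28 (timeout).

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: probe ComfyUI's /system_stats endpoint and map
# common curl exit codes to a troubleshooting hint.
check_comfyui() {
  local url="http://${COMFYUI_HOST:-localhost}:${COMFYUI_PORT:-8188}/system_stats"
  curl -sf --max-time 5 -o /dev/null "$url"
  local rc=$?
  case "$rc" in
    0)  echo "ok: $url" ;;
    6)  echo "cannot resolve host: check COMFYUI_HOST" ;;
    7)  echo "connection refused: is ComfyUI running on this host and port?" ;;
    22) echo "HTTP error: the server answered but not with a success status" ;;
    28) echo "timed out: server busy or port filtered" ;;
    *)  echo "curl failed with exit code $rc" ;;
  esac
  return "$rc"
}
```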

Job Timeout

  • Large models (3B) take longer to generate
  • Try smaller models (0.5B, 1.7B) for faster results

Output Not Found

  • Check ComfyUI's output directory configuration
  • Verify file permissions

API Reference

The skill uses ComfyUI's native API endpoints:

  • POST /prompt - Submit workflow
  • GET /history - Check job status
  • GET /view - Download generated files
  • Output files are saved to ComfyUI's configured output directory
Usage Guidance
This skill appears to do what it says: it sends TTS jobs to a ComfyUI server and downloads the resulting audio. Before installing or running it:
  1. Verify that COMFYUI_HOST/COMFYUI_PORT point at the server you intend to use (the default is localhost), and avoid pointing them at untrusted public hosts.
  2. Review the included script (scripts/tts.sh) if you have stricter security requirements.
  3. Be aware that generated audio is read from ComfyUI's output directory, and that the script may write downloaded files to any path you supply.
  4. Note that SKILL.md documents the environment variables COMFYUI_HOST, COMFYUI_PORT, and COMFYUI_OUTPUT_DIR, but the registry metadata does not declare them. Set them explicitly as needed.
If you plan to run the skill against a remote ComfyUI instance, make sure that instance is trusted, because the script sends the text you provide to that server.
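Because the script may write downloaded audio to any path you supply, a caller can constrain --output before using it. The following is a minimal pure-bash sketch. safe_output_path, the allowed base directory, and the .wav-only rule are all illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical guard for a user-supplied output path: accept only .wav files
# inside an allowed base directory and reject any ".." path segment.
safe_output_path() {  # $1 = requested path, $2 = allowed base directory
  local path="$1" base="${2%/}"
  case "$path" in
    "~/"*) path="$HOME/${path:2}" ;;   # expand a literal leading "~/"
  esac
  case "/$path/" in
    */../*) echo "refusing path with '..': $1" >&2; return 1 ;;
  esac
  case "$path" in
    /*) ;;                      # already absolute
    *)  path="$base/$path" ;;   # resolve relative paths against the base
  esac
  case "$path" in
    "$base"/*.wav) printf '%s\n' "$path" ;;
    *) echo "refusing path outside $base: $1" >&2; return 1 ;;
  esac
}
```

The check is purely textual. It does not resolve symlinks that already exist inside the base directory, so a stricter deployment would also verify the resolved path.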
Capability Analysis
Type: OpenClaw Skill · Name: comfyui-tts · Version: 1.0.0
The `scripts/tts.sh` file contains significant vulnerabilities. Several parameters (e.g., `--character`, `--style`, `--model`) are interpolated directly into the JSON workflow sent to the ComfyUI API without sanitization, which creates a JSON injection vulnerability: an attacker who controls these inputs could inject arbitrary JSON into the ComfyUI prompt. In addition, the `--output` argument is used directly for file download and directory creation. This is a path traversal vulnerability that could allow files to be written to arbitrary locations on the agent's filesystem. Although these are critical flaws, the provided code shows no clear evidence of intentional malicious behavior (e.g., data exfiltration or backdoor installation). The skill is therefore classified as suspicious because of the exploitable vulnerabilities.
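The JSON injection finding can be avoided by letting jq encode every user value instead of splicing strings into a JSON template. This is a minimal sketch that assumes jq, which the skill already requires. The node id "1", the class_type "QwenTTSNode", and the input field names are hypothetical placeholders, not the skill's actual workflow.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: build the workflow JSON with jq so user input is always
# encoded as JSON values, never inserted into the document as raw text.
# Node id "1", class_type "QwenTTSNode", and the input names are placeholders.
build_tts_workflow() {  # $1 text, $2 character, $3 style, $4 model, $5 temperature
  jq -n \
    --arg text "$1" --arg character "$2" --arg style "$3" --arg model "$4" \
    --argjson temperature "$5" \
    'if ($temperature | type) != "number" then error("temperature must be a number")
     else {"1": {class_type: "QwenTTSNode",
                 inputs: {text: $text, character: $character, style: $style,
                          model: $model, temperature: $temperature}}}
     end'
}
```

Quotes or braces in the text end up escaped inside a string value, so they cannot add keys to the workflow. The explicit type check also rejects a temperature that is valid JSON but not a number, such as a quoted string or an object.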
Capability Assessment
Purpose & Capability
Name/description (ComfyUI TTS) match the delivered artifacts: two shell scripts implement submitting a workflow to ComfyUI, polling /history, and retrieving audio. Required binaries (curl, jq) are reasonable for the stated purpose.
Instruction Scope
Runtime instructions and the scripts focus on contacting the ComfyUI endpoints (/prompt, /history, /view) and handling audio files. Minor inconsistency: SKILL.md documents environment variables (COMFYUI_HOST, COMFYUI_PORT, COMFYUI_OUTPUT_DIR) that are used by the scripts but were not declared in the skill's registry 'required env vars' metadata — this is informational, not evidence of hidden behavior. The scripts do not read unrelated system files or transmit data to third-party hosts beyond the configured COMFYUI_URL.
Install Mechanism
No install spec; this is instruction-only with included shell scripts. No downloads or remote installers are used, so nothing arbitrary is fetched or written by the skill itself during installation.
Credentials
The skill requests no credentials and only uses optional environment variables for the ComfyUI host/port/output. The lack of declared required env vars in registry metadata is a small mismatch with SKILL.md but not disproportionate: the env vars merely point the script at a ComfyUI server and do not grant access to unrelated services or secrets.
Persistence & Privilege
The `always` flag is false, and the skill does not request persistent or privileged system changes. It does not attempt to modify other skills or global agent settings.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install comfyui-tts
  3. After installation, invoke the skill by name or use /comfyui-tts
  4. Provide the required inputs (the text plus any options) per the skill's parameter spec; the skill returns the path to the generated audio file
Version History
v1.0.0
Initial release of comfyui-tts – generate speech audio with ComfyUI Qwen-TTS.
  • Provides a shell script to convert text to speech using ComfyUI's Qwen-TTS service.
  • Supports customizable options: character, style, model size, output path, and sampling parameters.
  • Requires curl and jq; configurable via environment variables for host, port, and output directory.
  • Automatically submits jobs to ComfyUI, monitors completion, and retrieves audio files.
  • Includes usage instructions, troubleshooting tips, and API endpoint references.
Metadata
Slug: comfyui-tts
Version: 1.0.0
License: not specified
All-time Installs: 2
Active Installs: 2
Total Versions: 1
Frequently Asked Questions

What is ComfyUI TTS?

Convert text to speech audio via ComfyUI's Qwen-TTS API, supporting customizable voice, style, model, and output options. It is an AI Agent Skill for Claude Code / OpenClaw, with 908 downloads so far.

How do I install ComfyUI TTS?

Run "/install comfyui-tts" in the OpenClaw or Claude Code chat to install it in one step. Running it still requires curl, jq, and a reachable ComfyUI server with Qwen-TTS.

Is ComfyUI TTS free?

Yes. ComfyUI TTS can be downloaded, installed, and used at no cost. Note that no license is listed in its metadata.

Which platforms does ComfyUI TTS support?

ComfyUI TTS is cross-platform and runs wherever OpenClaw / Claude Code is available, provided a shell with curl and jq is present to run its scripts.

Who created ComfyUI TTS?

It is built and maintained by YHSI5358 (@yhsi5358); the current version is v1.0.0.
