
ComfyUI TTS

Name: ComfyUI TTS
Author: yhsi5358

by YHSI5358 · v1.0.0

cross-platform ⚠ suspicious

Downloads: 908

Install in OpenClaw

/install comfyui-tts

Description

Convert text to speech audio via ComfyUI's Qwen-TTS API, supporting customizable voice, style, model, and output options.

README (SKILL.md)

ComfyUI TTS Skill

Generate speech audio using ComfyUI's Qwen-TTS service. This skill allows you to convert text to speech through ComfyUI's API.

Configuration

Environment Variables

Set these environment variables to configure the ComfyUI connection:

export COMFYUI_HOST="localhost"      # ComfyUI server host
export COMFYUI_PORT="8188"           # ComfyUI server port
export COMFYUI_OUTPUT_DIR=""         # Optional: Custom output directory
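
As a minimal sketch of how these variables could be resolved into a server URL, assuming the defaults shown in the comments above apply when a variable is unset. The helper name `comfyui_base_url` is hypothetical and is not taken from scripts/tts.sh.

```shell
# Hypothetical helper: build the base URL from the environment,
# falling back to localhost:8188 when a variable is unset or empty.
comfyui_base_url() {
  local host="${COMFYUI_HOST:-localhost}"
  local port="${COMFYUI_PORT:-8188}"
  printf 'http://%s:%s\n' "$host" "$port"
}
```

Calling `comfyui_base_url` with no variables set prints the localhost URL, so a remote server is only contacted when you export a different host explicitly.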

Usage

Basic Text-to-Speech

Generate audio from text using default settings:

scripts/tts.sh "你好，世界"   # the sample text means "Hello, world"

Advanced Options

Customize voice characteristics:

# Specify character and style ("你好" means "Hello")
scripts/tts.sh "你好" --character "Girl" --style "Emotional"

# Change model size
scripts/tts.sh "你好" --model "3B"

# Specify output file
scripts/tts.sh "你好" --output "/path/to/output.wav"

# Combine options ("你好，这是测试" means "Hello, this is a test")
# $HOME is used because the shell does not expand ~ inside double quotes
scripts/tts.sh "你好，这是测试" \
  --character "Girl" \
  --style "Emotional" \
  --model "1.7B" \
  --output "$HOME/audio/test.wav"

Available Options

Option	Description	Default
`--character`	Voice character (Girl/Boy/etc.)	"Girl"
`--style`	Speaking style (Emotional/Neutral/etc.)	"Emotional"
`--model`	Model size (0.5B/1.7B/3B)	"1.7B"
`--output`	Output file path	Auto-generated
`--temperature`	Generation temperature (0-1)	0.9
`--top-p`	Top-p sampling	0.9
`--top-k`	Top-k sampling	50
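
To illustrate how the options and defaults in the table fit together, here is a rough parsing sketch. The function name `parse_tts_args` and the `TTS_*` variable names are assumptions made for this example; the actual scripts/tts.sh may be organized differently.

```shell
# Illustrative parser: defaults come from the table above; the function and
# variable names are assumptions, not the actual contents of scripts/tts.sh.
parse_tts_args() {
  TTS_TEXT=""
  TTS_CHARACTER="Girl"
  TTS_STYLE="Emotional"
  TTS_MODEL="1.7B"
  TTS_OUTPUT=""            # empty means "auto-generate a file name"
  TTS_TEMPERATURE="0.9"
  TTS_TOP_P="0.9"
  TTS_TOP_K="50"

  while [ "$#" -gt 0 ]; do
    case "$1" in
      --character|--style|--model|--output|--temperature|--top-p|--top-k)
        # Every option takes exactly one value.
        if [ "$#" -lt 2 ]; then
          echo "missing value for $1" >&2
          return 2
        fi
        case "$1" in
          --character)   TTS_CHARACTER="$2" ;;
          --style)       TTS_STYLE="$2" ;;
          --model)       TTS_MODEL="$2" ;;
          --output)      TTS_OUTPUT="$2" ;;
          --temperature) TTS_TEMPERATURE="$2" ;;
          --top-p)       TTS_TOP_P="$2" ;;
          --top-k)       TTS_TOP_K="$2" ;;
        esac
        shift 2
        ;;
      --*)
        echo "unknown option: $1" >&2
        return 2
        ;;
      *)
        # The positional argument is the text to speak.
        TTS_TEXT="$1"
        shift
        ;;
    esac
  done

  if [ -z "$TTS_TEXT" ]; then
    echo "usage: tts.sh TEXT [options]" >&2
    return 2
  fi
}
```

For example, `parse_tts_args "hello" --model "3B"` leaves `TTS_MODEL` set to `3B` and every other setting at its table default.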

Workflow

The skill performs these steps:

Construct Workflow: Builds a ComfyUI workflow JSON with your text and settings
Submit Job: Sends the workflow to ComfyUI's /prompt endpoint
Poll Status: Monitors job completion via /history endpoint
Retrieve Audio: Returns the path to the generated audio file
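
The first two steps could look roughly like the sketch below, which assumes curl and jq are installed. The request body shape (`{"prompt": ...}`) and the `prompt_id` response field follow ComfyUI's HTTP API as commonly documented, but the node id `"1"`, the class type `QwenTTS`, and the input names are placeholders: the real values depend on the Qwen-TTS custom node installed in your ComfyUI and are not given in this README.

```shell
# Build the /prompt request body. jq --arg / --argjson encode every value,
# so quotes or braces in user input cannot alter the JSON structure.
# NOTE: node id "1", class_type "QwenTTS" and the input names are placeholders.
build_payload() {
  # $1=text $2=character $3=style $4=model $5=temperature (a JSON number)
  jq -n \
    --arg text "$1" --arg character "$2" --arg style "$3" \
    --arg model "$4" --argjson temperature "$5" \
    '{prompt: {"1": {class_type: "QwenTTS", inputs: {
        text: $text, character: $character, style: $style,
        model: $model, temperature: $temperature}}}}'
}

# POST the payload to /prompt and print the prompt_id from the response.
submit_job() {
  # $1=base URL (for example http://localhost:8188), $2=JSON payload
  curl --silent --show-error --fail \
    -H 'Content-Type: application/json' \
    --data "$2" "$1/prompt" | jq -r '.prompt_id'
}
```

A call such as `submit_job "http://localhost:8188" "$(build_payload "hello" Girl Emotional 1.7B 0.9)"` would print the job id to poll for. Building the JSON with jq rather than string interpolation keeps user-supplied text from changing the structure of the workflow.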

Troubleshooting

Connection Refused

Verify ComfyUI is running: curl http://$COMFYUI_HOST:$COMFYUI_PORT/system_stats
Check host and port settings
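
The check above can be wrapped in a small function with a timeout, so that a wrong host fails quickly instead of hanging. This is only a sketch: the function name `check_comfyui` and the 5-second limit are illustrative choices, not part of the skill.

```shell
# Return 0 if the ComfyUI server answers /system_stats within 5 seconds.
check_comfyui() {
  local url="http://${COMFYUI_HOST:-localhost}:${COMFYUI_PORT:-8188}/system_stats"
  if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
    echo "ComfyUI reachable at $url"
  else
    echo "cannot reach ComfyUI at $url; check COMFYUI_HOST/COMFYUI_PORT" >&2
    return 1
  fi
}
```

Running `check_comfyui` before submitting a job separates connection problems from workflow errors.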

Job Timeout

Large models (3B) take longer to generate
Try smaller models (0.5B, 1.7B) for faster results

Output Not Found

Check ComfyUI's output directory configuration
Verify file permissions

API Reference

The skill uses ComfyUI's native API endpoints:

POST /prompt - Submit workflow
GET /history - Check job status
Output files are saved to ComfyUI's configured output directory
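
Polling for completion might be sketched as follows. It assumes that ComfyUI exposes the history of a single job at `/history/<prompt_id>` and that the response is a JSON object containing the prompt id as a key only once the job has finished; verify both against your ComfyUI version. The function name, the 2-second interval, and the default timeout are illustrative.

```shell
# Poll /history/<prompt_id> until the job appears or the timeout expires.
# Prints the job's history entry as JSON on success; returns 1 on timeout.
wait_for_job() {
  # $1=base URL, $2=prompt_id, $3=timeout in seconds (default 120)
  local base="$1"
  local id="$2"
  local timeout="${3:-120}"
  local waited=0
  local body
  while [ "$waited" -lt "$timeout" ]; do
    body=$(curl --silent --fail "$base/history/$id") || body='{}'
    # An unfinished job is assumed to yield an object without our id.
    if printf '%s' "$body" | jq -e --arg id "$id" 'has($id)' >/dev/null 2>&1; then
      printf '%s' "$body" | jq --arg id "$id" '.[$id]'
      return 0
    fi
    sleep 2
    waited=$((waited + 2))
  done
  echo "timed out waiting for job $id" >&2
  return 1
}
```

The printed entry can then be inspected with jq to find the generated file name. Raising the timeout argument is a simple way to accommodate the slower 3B model.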

Usage Guidance

This skill appears to do what it says: it sends TTS jobs to a ComfyUI server and downloads the resulting audio. Before installing or running:

(1) Verify that you intend to connect to the configured COMFYUI_HOST/COMFYUI_PORT. The default is localhost; avoid pointing it at untrusted public hosts.
(2) Review the included scripts (scripts/tts.sh) if you have stricter security requirements.
(3) Be aware that generated audio files are referenced by the ComfyUI output directory, and that the script may download files to paths you supply.
(4) Note that SKILL.md mentions environment variables (COMFYUI_HOST, COMFYUI_PORT, COMFYUI_OUTPUT_DIR) that the registry metadata did not declare; set these explicitly as needed.

If you plan to run against a remote ComfyUI instance, ensure that instance is trusted, since the script will send the text you provide to that server.

Capability Analysis

Type: OpenClaw Skill
Name: comfyui-tts
Version: 1.0.0

The `scripts/tts.sh` file contains significant vulnerabilities. Several parameters (e.g., `--character`, `--style`, `--model`) are interpolated directly into the JSON workflow sent to the ComfyUI API without sanitization, which creates a JSON injection vulnerability: an attacker who controls these inputs could inject arbitrary JSON into the ComfyUI prompt. Additionally, the `--output` argument is used directly for file download and directory creation, which creates a path traversal vulnerability that could allow files to be written to arbitrary locations on the agent's filesystem. These are critical flaws, but the provided code shows no clear evidence of intentional malicious behavior (e.g., data exfiltration, backdoor installation), so the skill is classified as suspicious because of the exploitable vulnerabilities.
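
One possible way to reduce both risks is to validate inputs before they are used, as in the sketch below. The function names `is_safe_value` and `resolve_output` are hypothetical, and the checks are deliberately strict assumptions rather than a description of any fix in the skill. Note that this does not guard against symlinks inside the base directory.

```shell
# Accept only simple option values (letters, digits, dot, dash, underscore),
# so nothing that could break out of a JSON string reaches the workflow.
is_safe_value() {
  case "$1" in
    ''|*[!A-Za-z0-9._-]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Accept an output path only if it is relative and has no ".." component,
# then print it joined onto the chosen base directory.
resolve_output() {
  # $1=base directory, $2=user-supplied relative path
  case "$2" in
    ''|/*|..|../*|*/..|*/../*)
      echo "refusing output path: $2" >&2
      return 1
      ;;
  esac
  printf '%s/%s\n' "${1%/}" "$2"
}
```

With these helpers, a value such as `1.7B` passes `is_safe_value`, while a path such as `../../etc/passwd` is rejected by `resolve_output` before any file is written.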

Capability Assessment

✓ Purpose & Capability

Name/description (ComfyUI TTS) match the delivered artifacts: two shell scripts implement submitting a workflow to ComfyUI, polling /history, and retrieving audio. Required binaries (curl, jq) are reasonable for the stated purpose.

ℹ Instruction Scope

Runtime instructions and the scripts focus on contacting the ComfyUI endpoints (/prompt, /history, /view) and handling audio files. Minor inconsistency: SKILL.md documents environment variables (COMFYUI_HOST, COMFYUI_PORT, COMFYUI_OUTPUT_DIR) that are used by the scripts but were not declared in the skill's registry 'required env vars' metadata — this is informational, not evidence of hidden behavior. The scripts do not read unrelated system files or transmit data to third-party hosts beyond the configured COMFYUI_URL.

✓ Install Mechanism

No install spec; this is instruction-only with included shell scripts. No downloads or remote installers are used, so nothing arbitrary is fetched or written by the skill itself during installation.

ℹ Credentials

The skill requests no credentials and only uses optional environment variables for the ComfyUI host/port/output. The lack of declared required env vars in registry metadata is a small mismatch with SKILL.md but not disproportionate: the env vars merely point the script at a ComfyUI server and do not grant access to unrelated services or secrets.

✓ Persistence & Privilege

The `always` flag is false, and the skill does not request persistent or privileged system changes. It does not attempt to modify other skills or global agent settings.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install comfyui-tts
After installation, invoke the skill by name or use /comfyui-tts
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

Initial release of comfyui-tts: generate speech audio with ComfyUI Qwen-TTS.
- Provides a shell script to convert text to speech using ComfyUI's Qwen-TTS service.
- Supports customizable options: character, style, model size, output path, and sampling parameters.
- Requires curl and jq; configurable via environment variables for host, port, and output directory.
- Automatically submits jobs to ComfyUI, monitors completion, and retrieves audio files.
- Includes usage instructions, troubleshooting tips, and API endpoint references.

Metadata

Slug comfyui-tts

Version 1.0.0

License —

All-time Installs 2

Active Installs 2

Total Versions 1

Frequently Asked Questions

What is ComfyUI TTS?

Convert text to speech audio via ComfyUI's Qwen-TTS API, supporting customizable voice, style, model, and output options. It is an AI Agent Skill for Claude Code / OpenClaw, with 908 downloads so far.

How do I install ComfyUI TTS?

Run "/install comfyui-tts" in the OpenClaw or Claude Code chat to install it in one step. The skill has no separate installer, but its scripts require curl, jq, and a reachable ComfyUI server.

Is ComfyUI TTS free?

Yes, ComfyUI TTS is completely free (open-source). You can download, install and use it at no cost.

Which platforms does ComfyUI TTS support?

ComfyUI TTS is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created ComfyUI TTS?

It is built and maintained by YHSI5358 (@yhsi5358); the current version is v1.0.0.
