Genor-Comfy-Gate

Name: Genor-Comfy-Gate
Author: genortg

By Krzysztof · GitHub · v2.0.0 · MIT-0

cross-platform ⚠ suspicious

Install in OpenClaw

/install genor-comfy-gate

Description

Comprehensive multi-modal gateway for ComfyUI enabling audio generation with ACE-Step 1.5 and photorealistic image creation via SDXL workflows.

Usage (SKILL.md)

Genor-Comfy-Gate — Comprehensive Skill

THE authoritative reference for ALL ComfyUI operations through our gateway. Multi-modal: audio, images, video (future). Read this before any generation. Updated as we learn.

Modalities

Type	Status	Workflow	Model
🎵 Audio	✅ Active	`acestep-rapcore`	ACE-Step 1.5 SFT merge
🖼️ Image	✅ Active	`lustify-sdxl`	LUSTIFY SDXL
🎬 Video	🔜 Planned	—	—

The gateway is modality-agnostic: it submits any workflow JSON to ComfyUI, polls until the job finishes, downloads the output, and saves it. Adding a new modality means adding a workflow file plus a WORKFLOW_INFO entry. The entry's `type` field determines the output directory (`audio/` or `images/`).

Gateway

Property	Value
Endpoint	`http://127.0.0.1:8188`
Auth	`x-api-key: gcg-4d...` header (localhost exempt)
Managed by	pm2 (`genor-comfy-gate`)
Location	`~/projects/Genor-Comfy-Gate/`
Config	`server.js` (inline SERVERS array)

Backend Servers

ID	URL	GPU	VRAM	Priority
pri	`http://100.125.137.96:8169`	RTX 3090	24GB	★ PRIMARY
sec	`http://100.80.161.74:8169`	RTX 3080 Laptop	16GB	Secondary

Load Balancing Logic (in `pickServer()`)

PRIMARY always preferred when IDLE (0 running tasks)
If PRIMARY has ANY running task → ALL new requests → SECONDARY
If SECONDARY offline → fallback to PRIMARY regardless
Download ALWAYS from the server that generated the file (server.url)
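A minimal Python sketch of the selection rules above, useful for reasoning about where a request will land. It is not the gateway's actual `pickServer()`, which lives in `server.js`. The `Backend` class and its `online`/`running` fields are hypothetical stand-ins for whatever status the gateway tracks. The handling of an offline PRIMARY (fall through to SECONDARY) is an assumption, because the rules above do not cover it.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    id: str
    url: str
    online: bool   # hypothetical: last health check succeeded
    running: int   # hypothetical: tasks currently executing on this backend


def pick_server(primary: Backend, secondary: Backend) -> Backend:
    """Sketch of the documented policy: PRIMARY-first when idle, not round-robin."""
    # Rule 1: a reachable, idle PRIMARY always wins.
    if primary.online and primary.running == 0:
        return primary
    # Rule 2: PRIMARY is busy (or, by assumption, offline): use SECONDARY if it is up.
    if secondary.online:
        return secondary
    # Rule 3: SECONDARY is offline, so fall back to PRIMARY regardless of load.
    return primary
```

Whichever backend is returned, keep a reference to it. The finished file must be downloaded from that same `url`, not from whichever server happens to be idle at download time.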

Workflows

`acestep-rapcore` — ACE-Step 1.5 Audio Generation

Model: aceStep15Music_sft17BAIO.safetensors (ACE-Step 1.5 SFT merge)

Workflow Pipeline:
  CheckpointLoader(160) → AnySwitch(model/clip/vae) → TextEncode(94) → KSampler(3; 35 steps, dpmpp_3m_sde, beta, cfg=1) → VAEDecodeAudioTiled(128) → SaveAudioMP3(104)
  Lyrics: String(252) → TextEncode(94).lyrics
  Duration: mxSlider(274) → TextEncode(94) + EmptyLatent(98)
  Negative: ConditioningZeroOut(47) → zeroed copy of the positive conditioning, used as the negative

Node Map

Node	Class	Role	Injections
94	`TextEncodeAceStepAudio1.5`	Main text encoder	`prompt` → `tags`, `lyrics` ← 252, `bpm`, `keyscale`, `duration` ← 274, `language`
252	`String`	Lyrics feed into node 94	`lyrics` → `String`
3	`KSampler`	Denoising (35 steps, dpmpp_3m_sde, beta, cfg=1)	`seed` ← 307
98	`EmptyAceStep1.5LatentAudio`	Creates latent audio space	`seconds` ← 274
104	`SaveAudioMP3`	Output V0 MP3	—
128	`VAEDecodeAudioTiled`	VAE decode (tile=512, overlap=64)	—
160	`CheckpointLoaderSimple`	Loads model	—
274	`mxSlider`	Song duration (seconds)	`duration` → `Xi` and `Xf`
307	`Seed (rgthree)`	Global seed	`seed` → `seed`
257	`Text Concatenate`	Builds output filename	artist+title+path
47	`ConditioningZeroOut`	Negative prompt (zeroed)	—
78	`ModelSamplingAuraFlow`	Shift=13	Bypassed by default — use `model_sampling: true` to enable

Reference Nodes (informational, in workflow but not connected)

Node	Content
317	Genre description table (38 genres with tags)
318	Keyscale/BPM reference table (38 genres × scale + key + BPM)
320	Structure example (metalcore duet with timeline)
321	Preset example (detailed scene-by-scene prompt)
319	LLM input example (NSFW lyrics prompt format)
400	Disconnected tags node (original rapcore tags, kept for reference)

Generation Parameters

{
  "workflow": "acestep-rapcore",
  "prompt": "comma-separated tags (under 512 chars)",
  "lyrics": "structured lyrics with [section] tags",
  "duration": 180,
  "bpm": 150,
  "keyscale": "E minor",
  "language": "en",
  "seed": -1
}

All parameters EXCEPT prompt and lyrics are optional. Omitted parameters keep their workflow defaults.
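The Node Map implies where each request field lands in the workflow. This is a minimal sketch, not the gateway's code. It assumes the workflow is ComfyUI API-format JSON (`{node_id: {"class_type": ..., "inputs": {...}}}`). It also assumes `seed: -1` means "pick a random seed". `inject_acestep` and `ACESTEP_INJECTIONS` are hypothetical names.

```python
import copy
import random

# Hypothetical field -> [(node id, input name)] table, read off the Node Map above.
ACESTEP_INJECTIONS = {
    "prompt": [("94", "tags")],
    "lyrics": [("252", "String")],
    "bpm": [("94", "bpm")],
    "keyscale": [("94", "keyscale")],
    "language": [("94", "language")],
    "duration": [("274", "Xi"), ("274", "Xf")],  # slider feeds encoder + latent
    "seed": [("307", "seed")],
}


def inject_acestep(workflow: dict, params: dict) -> dict:
    """Return a copy of an API-format workflow with request params injected."""
    for required in ("prompt", "lyrics"):
        if not params.get(required):
            raise ValueError(f"missing required parameter: {required}")
    if len(params["prompt"]) >= 512:
        raise ValueError("prompt should stay under 512 characters")
    wf = copy.deepcopy(workflow)  # never mutate the loaded template
    for field, targets in ACESTEP_INJECTIONS.items():
        if field not in params:
            continue  # omitted -> keep the workflow default
        value = params[field]
        if field == "seed" and value == -1:
            value = random.randint(0, 2**32 - 1)  # assumption: -1 means random
        for node_id, input_name in targets:
            wf[node_id]["inputs"][input_name] = value
    return wf
```

Deep-copying the template means one loaded workflow file can serve any number of requests without defaults leaking between them.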

model_sampling (optional, boolean): Enables ModelSamplingAuraFlow (shift=13) for acestep-aio. Bypassed by default — it's 50/50 whether it improves quality, so safer to leave off. Set model_sampling: true if you want to experiment with it on.

`lustify-sdxl` — Image Generation

Model: LUSTIFY SDXL NSFW photorealistic
Sampler: LCM, 4 steps, cfg=1
Output: PNG

{
  "workflow": "lustify-sdxl",
  "prompt": "photo of...",
  "aspect_ratio": "896x1152",
  "seed": -1
}

Supports GET /generate for UI options form.

Caption Engineering (ACE-Step)

The 8 Dimensions

Every caption should cover as many of these dimensions as possible within 5-8 comma-separated tags:

Style/Genre — metalcore, synthwave, drum and bass, pop, folk
Emotion/Atmosphere — melancholic, euphoric, aggressive, dreamy, dark
Instruments — distorted guitar, 808 bass, strings, piano, synths
Timbre/Texture — warm, crisp, punchy, lush, airy, bright
Vocal — male/female, raspy, clean, powerful, breathy, belting
Production — polished, lo-fi, live, studio, dry, glossy
Era — 80s, 90s, modern, retro, vintage
Speed/Rhythm — driving, groovy, frantic, mid-tempo, laid-back
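This rough Python sketch checks which of the eight dimensions a draft caption touches by matching it against the example words listed above. The vocabulary is deliberately tiny: only the words from this list. A miss therefore means "no listed example word found", not that the dimension is truly absent. `DIMENSIONS` and `covered_dimensions` are illustrative names.

```python
import re
from typing import Dict, List, Set

# Example words copied from the list above; a tiny, illustrative vocabulary.
DIMENSIONS: Dict[str, List[str]] = {
    "style/genre": ["metalcore", "synthwave", "drum and bass", "pop", "folk"],
    "emotion/atmosphere": ["melancholic", "euphoric", "aggressive", "dreamy", "dark"],
    "instruments": ["distorted guitar", "808 bass", "strings", "piano", "synths"],
    "timbre/texture": ["warm", "crisp", "punchy", "lush", "airy", "bright"],
    "vocal": ["male", "female", "raspy", "clean", "powerful", "breathy", "belting"],
    "production": ["polished", "lo-fi", "live", "studio", "dry", "glossy"],
    "era": ["80s", "90s", "modern", "retro", "vintage"],
    "speed/rhythm": ["driving", "groovy", "frantic", "mid-tempo", "laid-back"],
}


def covered_dimensions(caption: str) -> Set[str]:
    """Return the dimensions for which at least one example word appears."""
    text = caption.lower()
    return {
        dim
        for dim, words in DIMENSIONS.items()
        # word boundaries keep "male" from matching inside "female"
        if any(re.search(rf"\b{re.escape(w)}\b", text) for w in words)
    }
```

`set(DIMENSIONS) - covered_dimensions(caption)` then lists the dimensions worth adding a tag for.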

Rules

5-8 tags max — more degrades quality
BPM/key in parameters, NOT caption — they're separate fields
No conflicting pairs — e.g. "classical strings" + "death metal growls"
Texture words matter heavily — they control mix/production quality
Specific > vague — "melancholic piano ballad, female breathy vocal" > "sad song"
Repeat what you want more of — repetition reinforces

Known Good Captions

pop, piano+strings+guitar, female warm vocal, melancholic intimate, bedroom pop

rock, metal, heavy distorted guitar, powerful drums, melodic vocals, aggressive, epic, dramatic, guitar solo

heavy distorted guitar, fast thrash drums, pounding bass, aggressive, dark

rapcore metal fusion, nu-metal, punchy bass, warm distorted guitar, crisp drums, melodic chorus, heavy grooves, atmospheric, polished production, angsty female vocal, emotional

Tags That Cause Problems

raw, gritty, distorted (without balancing warmth) → metallic scraping, flat bass
heavy bass → boomy/muddy; prefer punchy bass, deep sub-bass, defined bass
aggressive on instruments → harsh overtones; use on emotion/vocal instead
Too many instrument tags → cluttered, muddy mix
"classical" + any heavy genre → contradictory, degrades both

Texture Word Guide

Word	Effect
`warm`	Analog-style saturation, smooth high end
`crisp`	Clean transients, defined attacks
`punchy`	Tight, compressed low-mids, good for bass/kick
`bright`	Boosted highs, airy presence
`lush`	Wide stereo, rich harmonics, reverb-heavy
`dry`	Close-mic sound, minimal reverb
`airy`	Spacious high end, breathy
`polished`	Studio-quality, balanced EQ
`raw`	USE WITH CAUTION — unprocessed, potentially harsh
`gritty`	USE WITH CAUTION — distortion artifacts
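Several of the caption rules and problem tags above are mechanical enough to lint before submitting. This is a minimal Python sketch under stated assumptions. Tags are split on commas. The heavy-genre list is a small illustrative subset. The key/BPM patterns only catch the obvious forms ("150 bpm", "E minor"). `lint_caption` is a hypothetical helper, not part of the gateway.

```python
import re
from typing import List

HEAVY_GENRES = ("metal", "hardcore", "thrash")  # illustrative, not exhaustive
CAUTION_WORDS = ("raw", "gritty", "distorted")


def lint_caption(caption: str) -> List[str]:
    """Return human-readable warnings for an ACE-Step caption (empty = clean)."""
    tags = [t.strip().lower() for t in caption.split(",") if t.strip()]
    text = ", ".join(tags)
    warnings = []
    if not 5 <= len(tags) <= 8:
        warnings.append(f"{len(tags)} tags; aim for 5-8")
    has_bpm = re.search(r"\b\d{2,3}\s*bpm\b", text)
    has_key = re.search(r"\b[a-g][#b]?\s+(major|minor)\b", text)
    if has_bpm or has_key:
        warnings.append("move BPM/key into the bpm/keyscale parameters")
    if "classical" in text and any(g in text for g in HEAVY_GENRES):
        warnings.append("'classical' + heavy genre is contradictory")
    if any(re.search(rf"\b{w}\b", text) for w in CAUTION_WORDS) and "warm" not in text:
        warnings.append("raw/gritty/distorted without 'warm' risks metallic scraping")
    if "heavy bass" in text:
        warnings.append("'heavy bass' tends to be boomy; try 'punchy bass'")
    return warnings
```

A warning is a prompt to look again, not a hard rejection. Texture choices in particular stay a judgment call.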

Lyrics Engineering (ACE-Step)

Required Structure Tags

ACE-Step REQUIRES section markers to align music with lyrics:

[Intro], [Verse], [Pre-Chorus], [Chorus], [Bridge], [Build], [Drop],
[Breakdown], [Guitar Solo], [Piano Interlude], [Outro]

Vocal Control Tags (each on its own line inside a section)

[whispered], [raspy vocal], [powerful belting], [spoken word],
[falsetto], [harmonies], [clean vocal]

Energy Tags (each on its own line inside a section)

[high energy], [low energy], [building energy], [euphoric],
[melancholic], [dreamy], [aggressive]

Lyric Writing Rules

6-10 syllables per line — fits the 5Hz LM planner
Natural phrasing — write like human speech, not poetry
Avoid AI clichés: "neon skies", "electric hearts/dreams", "breaking chains", "rising up", "fire inside"
Section description hints on intro/outro lines: (bass rumbles in), (drums fade to silence)
UPPERCASE = shouted/emphasized
(parentheses) = background vocals/harmonies
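The 6-10 syllable target is easy to miss while drafting. This rough sketch covers English lyrics only. It counts vowel groups and drops a silent final "e", so it will miscount many English words and does not handle Polish at all. `estimate_syllables` and `flag_lines` are hypothetical helpers. Treating any line wrapped in `[...]` as a non-sung tag line is an assumption.

```python
import re
from typing import List, Tuple


def estimate_syllables(line: str) -> int:
    """Crude English estimate: vowel groups per word, minus a silent final 'e'."""
    count = 0
    for word in re.findall(r"[a-z']+", line.lower()):
        groups = len(re.findall(r"[aeiouy]+", word))
        if word.endswith("e") and not word.endswith(("le", "ee")) and groups > 1:
            groups -= 1  # "are", "home": the final e is usually silent
        count += max(groups, 1)
    return count


def flag_lines(lyrics: str, lo: int = 6, hi: int = 10) -> List[Tuple[str, int]]:
    """Return (line, estimate) for sung lines outside the lo..hi syllable range."""
    flagged = []
    for raw in lyrics.splitlines():
        line = raw.strip()
        if not line or (line.startswith("[") and line.endswith("]")):
            continue  # blank lines and section/vocal/energy tags are not sung
        n = estimate_syllables(line)
        if not lo <= n <= hi:
            flagged.append((line, n))
    return flagged
```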

🔴 MANDATORY CHECKLIST BEFORE SENDING LYRICS TO GENERATION

Before you send any lyrics to ACE-Step, answer every one of these questions, and do not send until all the answers are "YES":

"Does this text make sense?": does it tell a coherent story? Does it flow from intro to outro? Do the sections connect logically?
"Is it grammatically correct?": no spelling, punctuation, or syntax errors. Pay special attention to Polish diacritics, inflection, and commas.
"Does it fit the artist/project?": do the tone, style, swearing, and energy fit the artist (KOSTI/Bonnie Bones)? Does it sound like that persona?
"Do the music and its order make sense?": is the structure (Intro→Verse→Chorus→Verse→Bridge→Chorus→Outro) logical? Does the energy rise and fall naturally? Is the overall length sensible (~120-180s)?
"Is the duration appropriate?": 120-180 seconds is standard. NEVER send duration=150 without checking that this is the intended length.
"Does the artist's age read as credible?": do not write "I'm 15", "young girl", or "teen" in lyrics for adult artists. KOSTI/Bonnie Bones are adult performers.

Only when every answer is YES may you send the lyrics for generation.

Energy Flow Pattern

Intro       → [low energy]       — sparse, building
Verse 1     → [low energy]       — verse, storytelling, restrained
Pre-Chorus  → [building energy]  — tension rising
Chorus      → [high energy]      — maximum impact, full instrumentation
Verse 2     → [low energy]       — second verse, slightly more energy
Pre-Chorus  → [building energy]
Chorus      → [high energy]      — second chorus often bigger (harmonies)
Bridge      → [low energy]       — stripped back, different perspective
Breakdown   → [high energy]      — instrumental intensity (optional)
Final Chorus→ [high energy]      — biggest version
Outro       → [low energy]       — fade out
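This small sketch turns the pattern above into a lyrics skeleton. Each section marker is followed by its energy tag on its own line, ready for lines to be written underneath. It uses plain `[Verse]` and `[Chorus]` markers from the required-tag list rather than numbered ones. The Breakdown is optional, as the table says. `lyrics_skeleton` is a hypothetical helper.

```python
from typing import List, Tuple

# (section marker, energy tag) pairs from the Energy Flow Pattern above.
ENERGY_FLOW: List[Tuple[str, str]] = [
    ("Intro", "low energy"),
    ("Verse", "low energy"),
    ("Pre-Chorus", "building energy"),
    ("Chorus", "high energy"),
    ("Verse", "low energy"),
    ("Pre-Chorus", "building energy"),
    ("Chorus", "high energy"),
    ("Bridge", "low energy"),
    ("Breakdown", "high energy"),  # optional instrumental intensity
    ("Chorus", "high energy"),     # final, biggest chorus
    ("Outro", "low energy"),
]


def lyrics_skeleton(include_breakdown: bool = True) -> str:
    """Emit section markers, each followed by its energy tag on the next line."""
    blocks = []
    for section, energy in ENERGY_FLOW:
        if section == "Breakdown" and not include_breakdown:
            continue
        blocks.append(f"[{section}]\n[{energy}]\n")
    return "\n".join(blocks)
```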

Genre Reference (from workflow node 317)

Key Genres & Their Tags

Electronic

EDM/House: four-on-the-floor, bright synths, uplifting, dance-driven, glossy production, rhythmic, energetic
Techno: mechanical, hypnotic rhythms, minimalistic, pulsing bass, industrial textures, dark, repetitive
Trance: euphoric, soaring leads, emotional pads, rolling basslines, uplifting, spacious, melodic, anthemic
Drum & Bass: rapid breakbeats, deep sub-bass, high-energy, sharp percussion, rolling rhythms, crisp, driving
Dubstep: heavy bass drops, wobbling synths, aggressive textures, syncopated rhythms, dark, cinematic, gritty
Future Bass: shimmering chords, side-chained synths, emotional, bright leads, bouncy rhythms, glossy, melodic
Trap: booming 808s, sharp hi-hats, atmospheric pads, swaggering, dark, punchy, spacious

Rock/Metal

Classic Rock: crunchy guitars, steady drums, warm analog tone, energetic, melodic, vintage, riff-driven
Hard Rock: heavy riffs, powerful drums, gritty vocals, aggressive, energetic, distorted, bold, driving
Metal: distorted guitars, fast drums, dark atmosphere, aggressive, heavy, intense, powerful, tight
Progressive Metal: complex structures, technical riffs, atmospheric layers, dramatic, epic, polished, dynamic

Urban

Boom Bap: dusty drums, soulful samples, rhythmic, warm textures, punchy kicks, nostalgic, organic
Lo-Fi Hip-Hop: mellow beats, vinyl crackle, soft keys, relaxed, dreamy, warm, minimal, hazy
Drill: sliding 808s, haunting melodies, gritty textures, cold atmosphere, syncopated, tense, urban

Pop

Pop: catchy hooks, bright synths, polished production, upbeat, melodic, modern, radio-ready, clean
Synth-Pop: retro synths, bright pads, melodic, nostalgic, electronic, polished, dreamy, airy
K-Pop: glossy production, bright synths, genre-blending, catchy hooks, polished, theatrical, vibrant

Soft/Ambient

Ambient: soft pads, atmospheric textures, spacious, minimal, calm, evolving, dreamy, subtle, meditative
Cinematic: sweeping strings, dramatic percussion, epic, emotional, grand, polished, powerful

Keyscale & BPM Reference (from workflow node 318)

Genre	Scale	Key Range	BPM Range
EDM/House	Minor, Dorian	D#m–Am	120–128
Techno	Phrygian, Minor	Fm–A#m	125–135
Trance	Major, Mixolydian	A–D	130–142
Drum & Bass	Minor, Dorian	Em–Gm	170–178
Dubstep	Minor, Phrygian	Fm–G#m	138–150
Future Bass	Major, Minor	C–F	140–160
Trap	Harmonic Minor	Fm–Am	130–150
Hip-Hop	Minor, Dorian	Dm–Gm	85–95
Lo-Fi	Dorian, Lydian	Cm–Fm	60–85
Pop	Major, Mixolydian	C–G	90–130
Classic Rock	Minor Pentatonic	Em–Am	100–140
Hard Rock	Minor, Phrygian	Em–Gm	120–160
Metal	Phrygian, Harmonic Minor	Dm–F#m	140–200
Prog Metal	Dorian, Melodic Minor	C#m–F#m	120–180
Blues	Blues Scale, Minor Pentatonic	Em–Am	70–120
Funk	Mixolydian, Dorian	E–A	100–120
Disco	Mixolydian, Major	F–Bb	110–130
R&B	Dorian, Minor	Dm–Gm	60–100
Ambient	Lydian, Dorian	C–F	60–90
Cinematic	Minor, Harmonic Minor	Cm–Fm	60–120
Reggae	Major, Mixolydian	A–D	70–90
K-Pop	Major, Minor	C–F#	100–140
Anime OST	Lydian, Major	C–E	80–160
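When choosing the `bpm` parameter, a lookup against this table catches obvious mismatches. This minimal sketch copies only a few rows from the table. `BPM_RANGES` and `check_bpm` are hypothetical. The ranges are genre conventions, not limits the model enforces.

```python
from typing import Dict, Optional, Tuple

# A few rows copied from the node-318 table above: genre -> (min BPM, max BPM).
BPM_RANGES: Dict[str, Tuple[int, int]] = {
    "EDM/House": (120, 128),
    "Techno": (125, 135),
    "Drum & Bass": (170, 178),
    "Hip-Hop": (85, 95),
    "Pop": (90, 130),
    "Metal": (140, 200),
    "Lo-Fi": (60, 85),
}


def check_bpm(genre: str, bpm: int) -> Optional[str]:
    """Return a warning if bpm is outside the genre's usual range, else None."""
    lo, hi = BPM_RANGES[genre]
    if lo <= bpm <= hi:
        return None
    return f"{bpm} BPM is outside the usual {genre} range ({lo}-{hi})"
```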

Structure Planning (from workflow node 320)

The workflow includes an example of how to structure a caption WITH a song structure plan:

metalcore, symphonic elements, theatrical, duet, heavy distorted guitar,
bright piano, studio-polished, dramatic, melodic, epic, intense.

Structure:
- Intro: brief intro dramatically builds to first verse
- Verse 1: atmospheric piano, sets scene, raspy male vocal only
- Verse 2: guitar power chords, groovy, young female vocal only
- Chorus: anthemic, layered, male+female duet harmonies
- Bridge: atmospheric, dreamy, calm, female vocal only
- Build-up: builds to epic instrumental solo
- Instrumental: fast guitar solo, lead licks, virtuoso shred
- End: powerful ending

This can go in the caption to give the model a temporal roadmap.

Scene-by-Scene Prompting (from workflow node 321)

For maximum control, describe each section's instrumentation and mood in prose:

Intro: A metalcore-tinged, symphonic swell opens the track, with bright piano glimmering
over theatrical strings. Tension rises—studio-polished, dramatic—until it snaps into verse.

Verse 1: Drops to atmospheric piano, soft but charged. Raspy male vocal, intimate, whispered.
No guitars—just piano, subtle pads, suspended breath.

Verse 2: Guitar power chords crash in, groovy pulse. Young female vocal, bright and soaring.
Symphonic elements widen the space, cinematic lift.

Chorus: Erupts into anthemic, epic chorus. Male+female duet harmonies. Distorted guitars,
sweeping strings, pounding drums—polished, intense.

Bridge: Everything falls away. Dreamy, atmospheric, weightless. Soft pads, distant piano,
female vocal airy and ethereal. Suspended.

Build-up: Rhythmic pulses return. Low strings, tom rolls, rising synths. Guitars re-enter
in bursts. Energy coils toward instrumental break.

Instrumental: Fast guitar solo, virtuoso shred, rapid licks, melodic flourishes.
Symphonic backing, metalcore precision drums. Flashy, intense, climactic.

Full API Reference

Core Endpoints

Method	Path	Description
GET	`/`	Health check + server statuses
GET	`/workflows`	List available workflows with types
POST	`/generate-and-wait`	PRIMARY — submit, wait, download, save. Use this for all generation.
POST	`/prompt`	Submit workflow, return prompt_id
GET	`/history/:prompt_id`	Get single prompt result
GET	`/history`	Aggregated history from all servers
GET	`/queue`	Aggregated queue (running + pending)
GET	`/view`	Proxy media file download
GET	`/system_stats`	First alive server system info
GET	`/object_info`	Proxy to ComfyUI object_info
GET	`/extensions`	Proxy to ComfyUI extensions

Image Generation (legacy, use generate-and-wait instead)

Method	Path	Description
GET	`/generate`	Get generation options form
POST	`/generate`	Submit image generation
POST	`/upload/image`	Upload image to ComfyUI input dir

Media Management

Method	Path	Description
GET	`/media-list`	List generated files (name, size, date, preview URLs)
POST	`/media-link-once`	Create one-time access token for a file
GET	`/media-once/:token`	Access file via one-time token (no API key needed)

Workflow Injection

Method	Path	Description
POST	`/workflow/:name/prompt`	Quick prompt submit for named workflow (auto-injects)

`POST /generate-and-wait` — Full Reference

curl -s -X POST http://127.0.0.1:8188/generate-and-wait \
  -H "Content-Type: application/json" \
  -d '{
    "workflow": "acestep-rapcore",
    "prompt": "...",
    "lyrics": "...",
    "duration": 200,
    "bpm": 150,
    "keyscale": "E minor",
    "language": "en",
    "seed": -1
  }'

Audio params: prompt (required), lyrics, duration, bpm, keyscale, language, seed
Image params: prompt (required), aspect_ratio, seed, steps, cfg
Common: workflow (default: acestep-rapcore), client_id

Success response:

{
  "status": "ok",
  "file": "/home/genorbox1/.openclaw/workspace/media/comfy/audio/acestep-rapcore_2026-05-19T13-26-54_028.mp3",
  "filename": "acestep-rapcore_2026-05-19T13-26-54_028.mp3",
  "type": "audio",
  "server": "sec",
  "workflow": "acestep-rapcore",
  "file_size": 5882890
}
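The same call from Python, using only the standard library. This minimal sketch assumes the gateway is reachable at the documented localhost endpoint. The `x-api-key` header is attached only when a key is passed, because localhost is exempt. The 660 s client timeout is an assumption, chosen to outlast the gateway's own 600 s generation timeout plus the download. `build_generate_request` and `generate_and_wait` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Optional

GATEWAY = "http://127.0.0.1:8188"


def build_generate_request(payload: dict, api_key: Optional[str] = None,
                           base_url: str = GATEWAY) -> urllib.request.Request:
    """Build the POST /generate-and-wait request without sending it."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["x-api-key"] = api_key  # needed for non-localhost clients
    return urllib.request.Request(
        f"{base_url}/generate-and-wait",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def generate_and_wait(payload: dict, api_key: Optional[str] = None,
                      base_url: str = GATEWAY, timeout: float = 660) -> dict:
    """Send the request, block until the gateway answers, return the JSON body."""
    req = build_generate_request(payload, api_key, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:  # HTTPError on 4xx/5xx
        result = json.load(resp)
    if result.get("status") != "ok":
        raise RuntimeError(f"generation failed: {result}")
    return result
```

On success, the returned dict has the shape of the response above, so `result["file"]` is the saved path on the gateway host.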

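Output is saved, together with a metadata sidecar (.json), in ~/media/comfy/<audio|images>/. The base directory is configurable via MEDIA_DIR; the example response above resolves to ~/.openclaw/workspace/media/comfy/audio/.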
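The saved name in the example response follows the pattern `<workflow>_<YYYY-MM-DDTHH-MM-SS>_<NNN>.<ext>`. This sketch reproduces that pattern under two assumptions. First, `NNN` is the zero-padded sequence number mentioned under Gateway Behavior. Second, the sidecar shares the media file's stem; the exact sidecar name is not documented here. `output_paths` is a hypothetical helper.

```python
from datetime import datetime
from pathlib import Path
from typing import Tuple

SUBDIR = {"audio": "audio", "image": "images"}  # selected by the workflow's type


def output_paths(media_root: Path, workflow: str, kind: str, ext: str,
                 seq: int, when: datetime) -> Tuple[Path, Path]:
    """Return (media file, metadata sidecar) paths for one generation."""
    stamp = when.strftime("%Y-%m-%dT%H-%M-%S")
    media = media_root / "comfy" / SUBDIR[kind] / f"{workflow}_{stamp}_{seq:03d}.{ext}"
    sidecar = media.with_suffix(".json")  # assumption: same stem, .json extension
    return media, sidecar
```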

Operational Notes

Restart

pm2 restart genor-comfy-gate
pm2 logs genor-comfy-gate --lines 20

Status Check

curl -s http://127.0.0.1:8188/ | python3 -m json.tool
curl -s http://127.0.0.1:8188/queue | python3 -m json.tool

Media Location

~/media/comfy/audio/    — generated MP3 files + .json sidecars
~/media/comfy/images/   — generated PNG files + .json sidecars

Gateway Behavior

Submits workflow JSON with injected parameters
Polls /history/:prompt_id every 2s until complete/fail/timeout
Timeout: 600s (10 min) per generation
After completion: waits 3s for file write, then downloads
Saves to media dir with timestamped name + incrementing sequence number
Metadata sidecar written alongside media file
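The polling loop described above, as a minimal Python sketch. It assumes `fetch_history` returns the entry for `prompt_id` from ComfyUI's `/history/:prompt_id` response, or `None` while the prompt is still queued or running. It reads completion from that entry's `status.completed` and `status.status_str` fields. The function and parameter names are hypothetical. `sleep` and `clock` are injectable so the loop can be tested without real waiting.

```python
import time
from typing import Callable, Optional


def wait_for_prompt(fetch_history: Callable[[str], Optional[dict]], prompt_id: str,
                    interval: float = 2.0, timeout: float = 600.0, settle: float = 3.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> dict:
    """Poll until the prompt completes, fails, or the timeout expires."""
    deadline = clock() + timeout
    while clock() < deadline:
        entry = fetch_history(prompt_id)
        if entry is not None:
            status = entry.get("status", {})
            if status.get("status_str") == "error":
                raise RuntimeError(f"prompt {prompt_id} failed on the backend")
            if status.get("completed"):
                sleep(settle)  # give the backend time to finish writing the file
                return entry
        sleep(interval)  # 2 s between /history polls
    raise TimeoutError(f"prompt {prompt_id} not finished after {timeout:.0f} s")
```

The returned entry's `outputs` is where the output filenames come from. Per the Lessons Learned below, the gateway scans them inline in the handler.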

Growing Our Knowledge

When we discover new caption patterns, texture word effects, or workflow tricks:

Update this SKILL.md
Note the date and what we learned in CHANGELOG.md (next to this skill)

Lessons Learned

`lustify-sdxl` — Image Generation Deep Dive

Model: lustifySDXLNSFW_ggwpV7.safetensors (Illustrious-based SDXL checkpoint)
Tag system: Danbooru-style tags, NOT natural language
Sampler: LCM, 12 steps, scheduler=exponential, cfg=1
Output: PNG via SaveImage (node 200) + PreviewImage (node 87)

Full Pipeline

CheckpointLoader(43) → LoRA stack(47,80) → Resolution(17) → KSampler(7, 12 steps LCM) →
  UltimateSDUpscale(88, 2x, 4x-UltraSharp) →
  FaceDetailer NIP(97) → FaceDetailer V(98) → FaceDetailer P(101) →
  FaceDetailer face(104, 1024px, 6 steps) → FaceDetailer hands(105, 2048px, 6 steps) →
  SeedVR2VideoUpscaler(114, 2048px final) → CRT Post-Process(115) → SaveImage(200)

Active LoRAs (node 80)

LoRA	Strength	Purpose
AddMicroDetails v6	0.2	Skin texture, fine details
PersonEnhanceV2 ILL	0.1	Better anatomy/face
TrendCraft Style Detailer v2.4I	0.1	Overall polish/detail

Active LoRAs (node 47)

LoRA	Strength	Purpose
DTLVVTT DMD2 V5-LITE	1.0	DMD2 distillation (faster/better LCM)

FaceDetailer Pipeline

Sequential detailers with YOLO detectors:

NIP (nipples_yolov8s-seg.pt) — nipple detection, 1024px, denoise 0.4
V (nsfw-seg-vagina-x.pt) — vagina detection, 1024px, denoise 0.4
P (nsfw-seg-penis-x.pt) — penis detection, 1024px, denoise 0.4
Face (Anzhc Face seg 768MS v2 y8n.pt) — face detection, 1024px, 6 steps, denoise 0.4
Hands (PitHandDetailer-v2-Test-v9c.pt) — hand detection, 2048px, 6 steps, denoise 0.5

SeedVR2 Upscaler (node 114)

Model: seedvr2_ema_7b_sharp-Q4_K_M.gguf (quantized 7B)
VAE: ema_vae_fp16.safetensors
Final resolution: 2048
Color correction: lab

CRT Post-Process (node 115)

Vibrance: +0.015 (subtle saturation boost)
Vignette: 0.5 strength, 0.7 radius, 2.0 softness

Danbooru Tag Prompting (LUSTIFY)

CRITICAL: LUSTIFY is Illustrious-based — use Danbooru-format tags, NOT natural language descriptions.

Quality/Priority Tags (always include)

masterpiece, best quality, amazing quality, very aesthetic, absurdres

Subject Tags

1girl, solo, cute, petite, pale skin, medium breasts

Clothing/Accessories

gym uniform, white shirt, sports shorts, sneakers, ponytail

Action/Pose (keep it SIMPLE — complex actions confuse the model)

jumping, dynamic pose, looking at viewer

Setting/Light

gym background, afternoon light, dutch angle, from below

Negative Prompt (always)

blurry, worst quality, bad quality, error, melted body, bad anatomy, bad hands, disfigured

What Works

Character portraits work best — this is a hentai/character model
Simple dynamic poses (jumping, running, leaning) — YES
Quality tags first — masterpiece, best quality are weighted
POV/camera tags — dutch angle, from below, from above, close-up
Lighting tags — sunlight, god rays, afternoon light, backlight
Keep tags under ~25 — more dilutes quality

What Fails

Natural language descriptions — "mid-jump over a vaulting horse" → model doesn't understand
Complex multi-object composition — "vaulting horse + girl midair" = garbled anatomy
"photorealistic" tag — fights the anime/illustrious base, produces uncanny results
Overloaded action tags — "jumping + spread legs + leaning forward + vaulting horse" = nightmare
Multiple characters — this workflow is tuned for 1girl, solo

Image Generation Parameters

{
  "workflow": "lustify-sdxl",
  "prompt": "masterpiece, best quality, 1girl, cute, ...",
  "aspect_ratio": "7:9 (Portrait)",
  "seed": -1
}

Valid aspect ratios:

1:1 (Square)
4:5 (Portrait)
7:9 (Portrait) ← default, best for single character
3:2 (Landscape)
16:9 (Landscape)
9:16 (Portrait)

Additional optional params: megapixels (default 1.5), steps, cfg, denoise, sampler_name, scheduler

Adding a New Workflow (any modality)

Export workflow JSON from ComfyUI → save to workflows/<name>.json

Add entry to WORKFLOW_INFO in server.js:

'<name>': { file: '<name>.json',
            type: 'audio',   // or 'image' | 'video'
            ext: 'mp3',      // or 'png' | 'mp4'
            promptNode: '94', promptField: 'tags',
            lyricsNode: '252', lyricsField: 'String',
            outputNode: '104' },

Restart: pm2 restart genor-comfy-gate
Test, then document in this SKILL.md

The gateway auto-handles: prompt injection, duration, BPM/keyscale (audio), aspect_ratio (image), seed, polling, download from correct server, save to media dir, metadata sidecar.
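For the "Test" step, a quick sanity check can catch a WORKFLOW_INFO entry that points at the wrong node or input before you restart pm2. This minimal sketch assumes the workflow was exported in ComfyUI's API format, with node ids as top-level keys and an `inputs` dict on each node. That is the format node-id-based injection needs. `check_workflow_entry` is a hypothetical helper.

```python
from typing import List


def check_workflow_entry(workflow: dict, info: dict) -> List[str]:
    """List problems with a WORKFLOW_INFO-style entry against an API-format workflow."""
    problems = []
    for role in ("prompt", "lyrics"):
        node_id, field = info.get(f"{role}Node"), info.get(f"{role}Field")
        if node_id is None:
            continue  # e.g. a workflow without a lyrics node
        node = workflow.get(node_id)
        if node is None:
            problems.append(f"{role}Node {node_id} not found in workflow")
        elif field not in node.get("inputs", {}):
            problems.append(f"{role}Field '{field}' is not an input of node {node_id}")
    output = info.get("outputNode")
    if output is not None and output not in workflow:
        problems.append(f"outputNode {output} not found in workflow")
    if info.get("type") not in ("audio", "image", "video"):
        problems.append(f"unknown type {info.get('type')!r}")
    return problems
```

Load `workflows/<name>.json` with `json.load` and pass it together with the entry written as a Python dict. An empty list means every referenced node and input exists.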

Lessons Learned

2026-05-19 — Image Generation

LUSTIFY is Illustrious-based, uses Danbooru tags — natural language prompts produce garbled results
Quality tags (masterpiece, best quality) must come FIRST — they're weighted
Complex action scenes fail — model is trained for character portraits, keep poses simple
"photorealistic" tag on anime model = uncanny valley, avoid
Keep prompts under 25 tags — overloading dilutes quality
Pipeline has SeedVR2 upscaler (7B GGUF) + 5-stage FaceDetailer → 2048px final output
Face/hand detailers produce excellent close-up quality

2026-05-19 — Audio Generation

Download 400 bug: getOutputInfo() function returned undefined filenames despite reading them from history correctly. Fixed by inlining output scanning in the handler.
Load balancer: PRIMARY-first when idle, ALL→SECONDARY when PRIMARY busy (not round-robin).
Workflow cleanup: Removed duplicate nodes 401, 402. Lyrics now go through node 252 (String) → node 94.
Caption quality: raw, gritty, heavy drops cause metallic scraping and flat bass. Use warm, crisp, punchy, polished for clean instruments.
5-8 tags sweet spot for SFT merge model. More degrades quality.
8 dimensions matter: Missing emotion/timbre = flat results. Cover: genre, emotion, instruments, timbre, vocal, production, era, rhythm.

Safety Recommendations

Install only if you intend to run a persistent local media gateway. Before use, set a strong API_KEY, verify which backend ComfyUI servers will receive prompts, restrict the service to trusted local clients, review or remove MCP restart/upload tools, and treat generated media links as bearer links rather than true one-time private links.

Capability Tags

crypto, requires-oauth-token, requires-sensitive-credentials

Capability Assessment

⚠ Purpose & Capability

The stated purpose, a ComfyUI/OpenAI media gateway, matches much of the code, including generation, workflow management, and media storage. The concern is that the same gateway also exposes high-impact controls: MCP service restart, raw workflow execution, workflow upload, workflow deletion, tokenized media access, and default routing to hardcoded backend server addresses.

⚠ Instruction Scope

The instructions disclose many endpoints, but the sensitive surfaces are under-scoped: some workflow and media listing endpoints are unauthenticated, MCP clients can invoke restart and workflow upload after only gateway auth, and auth silently falls back to a hardcoded published API key if API_KEY is unset. The docs also mention GCG_API_KEY, while the code checks API_KEY, increasing the chance of an insecure default deployment.

⚠ Install Mechanism

The installer runs npm install, installs PM2 globally if missing, starts the gateway under PM2, and saves the PM2 process list. That persistent, host-level setup is not hidden, but it is high-impact for a skill package and lacks an explicit opt-in confirmation.

⚠ Credentials

Writing generated media and metadata sidecars is expected for this gateway, but prompts and generation metadata can persist locally, generated requests are sent to hardcoded HTTP backend servers by default, and tokenized media URLs grant access to anyone holding the link until expiry.

⚠ Persistence & Privilege

PM2 persistence, a background queue monitor, and an MCP-exposed restart command give the skill ongoing process-control authority beyond an ordinary on-demand skill. This is not evidence of malicious intent, but it requires clearer scoping and user control.

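How to Use

Make sure OpenClaw is installed (local or Docker deployment).
Enter the install command in the chat: /install genor-comfy-gate
After installation, call the skill by name or trigger it with /genor-comfy-gate.
Provide the inputs described in the skill's parameter documentation to get structured output.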

Version History

v2.0.0

v2.0.0 — MAJOR: Bundled entire MCP server into skill package! Now includes server.js, lib/, install.sh, pm2 config. Install with clawhub and run directly. No more separate project checkout needed. Removed lustify-sdxl (deprecated). Personal data audit completed. Environment config via env.example.

v1.1.0

v1.1.0 — Personal data audit: removed hardcoded paths, API key from console logs, hostname from fallbacks. Fixed MCP restart handler (ESM compat). Made MEDIA_DIR configurable via env. Added env.example.

Metadata

Slug genor-comfy-gate

Version 2.0.0

License MIT-0

Total installs 0

Current installs 0

Version count 2

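FAQ

What is Genor-Comfy-Gate?

Comprehensive multi-modal gateway for ComfyUI enabling audio generation with ACE-Step 1.5 and photorealistic image creation via SDXL workflows. It is an AI agent skill plugin for Claude Code / OpenClaw and has been downloaded 31 times so far.

How do I install Genor-Comfy-Gate?

Run the command "/install genor-comfy-gate" in the OpenClaw or Claude Code chat to install it in one step; no extra configuration is needed.

Is Genor-Comfy-Gate free?

Yes. Genor-Comfy-Gate is completely free under the MIT-0 license and can be freely downloaded, installed, and used.

Which platforms does Genor-Comfy-Gate support?

Genor-Comfy-Gate is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Genor-Comfy-Gate?

It is developed and maintained by Krzysztof (@genortg); the current version is v2.0.0.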
