/install genor-comfy-gate
Genor-Comfy-Gate — Comprehensive Skill
THE authoritative reference for ALL ComfyUI operations through our gateway. Multi-modal: audio, images, video (future). Read this before any generation. Updated as we learn.
Modalities
| Type | Status | Workflow | Model |
|---|---|---|---|
| 🎵 Audio | ✅ Active | acestep-rapcore | ACE-Step 1.5 SFT merge |
| 🖼️ Image | ✅ Active | lustify-sdxl | LUSTIFY SDXL |
| 🎬 Video | 🔜 Planned | — | — |
The gateway is modality-agnostic — it submits any workflow JSON to ComfyUI, polls, waits, downloads, and saves. Adding a new modality means adding a workflow file + WORKFLOW_INFO entry. The type field determines output dir (audio/ or images/).
Gateway
| Property | Value |
|---|---|
| Endpoint | http://127.0.0.1:8188 |
| Auth | x-api-key: gcg-4d... header (localhost exempt) |
| Managed by | pm2 (genor-comfy-gate) |
| Location | ~/projects/Genor-Comfy-Gate/ |
| Config | server.js (inline SERVERS array) |
Backend Servers
| ID | URL | GPU | VRAM | Priority |
|---|---|---|---|---|
| pri | http://100.125.137.96:8169 | RTX 3090 | 24GB | ★ PRIMARY |
| sec | http://100.80.161.74:8169 | RTX 3080 Laptop | 16GB | Secondary |
Load Balancing Logic (in pickServer())
- PRIMARY always preferred when IDLE (0 running tasks)
- If PRIMARY has ANY running task → ALL new requests → SECONDARY
- If SECONDARY offline → fallback to PRIMARY regardless
- Download ALWAYS from the server that generated the file (server.url)
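The selection rules above can be sketched as follows. This is a Python illustration of the described behavior, not the gateway's actual pickServer() in server.js. The Server fields (online, running) and the name pick_server are assumptions, as is returning None when both servers are offline, a case the rules do not cover.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Server:
    id: str
    url: str
    online: bool  # assumed health flag
    running: int  # assumed count of currently running tasks


def pick_server(primary: Server, secondary: Server) -> Optional[Server]:
    """Return the server a new request should go to, or None if none is reachable."""
    # An idle, reachable PRIMARY is always preferred.
    if primary.online and primary.running == 0:
        return primary
    # PRIMARY is busy (or unreachable): route to SECONDARY when it is online.
    if secondary.online:
        return secondary
    # SECONDARY is offline: fall back to PRIMARY even though it is busy.
    if primary.online:
        return primary
    return None  # assumption: the rules above do not say what happens here
```

Whichever server is returned should also be the one the output file is later downloaded from.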
Workflows
acestep-rapcore — ACE-Step 1.5 Audio Generation
Model: aceStep15Music_sft17BAIO.safetensors (ACE-Step 1.5 SFT merge)
Workflow Pipeline:
CheckpointLoader(160) → AnySwitch(model/clip/vae) → TextEncode(94) → KSampler(35 steps, dpmpp_3m_sde, beta, cfg=1) → VAEDecodeTiled → SaveAudioMP3(104)
Lyrics: String(252) → TextEncode.lyrics
Duration: mxSlider(274) → TextEncode + EmptyLatent
Negative: ConditioningZeroOut(47) → zeroes the positive conditioning
Node Map
| Node | Class | Role | Injections |
|---|---|---|---|
| 94 | TextEncodeAceStepAudio1.5 | Main text encoder | prompt → tags, lyrics ← 252, bpm, keyscale, duration ← 274, language |
| 252 | String | Lyrics feed into node 94 | lyrics → String |
| 3 | KSampler | Denoising (35 steps, dpmpp_3m_sde, beta, cfg=1) | seed ← 307 |
| 98 | EmptyAceStep1.5LatentAudio | Creates latent audio space | seconds ← 274 |
| 104 | SaveAudioMP3 | Output V0 MP3 | — |
| 128 | VAEDecodeAudioTiled | VAE decode (tile=512, overlap=64) | — |
| 160 | CheckpointLoaderSimple | Loads model | — |
| 274 | mxSlider | Song duration (seconds) | duration → Xi and Xf |
| 307 | Seed (rgthree) | Global seed | seed → seed |
| 257 | Text Concatenate | Builds output filename | artist+title+path |
| 47 | ConditioningZeroOut | Negative prompt (zeroed) | — |
| 78 | ModelSamplingAuraFlow | Shift=13 | Bypassed by default — use model_sampling: true to enable |
Reference Nodes (informational, in workflow but not connected)
| Node | Content |
|---|---|
| 317 | Genre description table (38 genres with tags) |
| 318 | Keyscale/BPM reference table (38 genres × scale + key + BPM) |
| 320 | Structure example (metalcore duet with timeline) |
| 321 | Preset example (detailed scene-by-scene prompt) |
| 319 | LLM input example (NSFW lyrics prompt format) |
| 400 | Disconnected tags node (original rapcore tags, kept for reference) |
Generation Parameters
{
"workflow": "acestep-rapcore",
"prompt": "comma-separated tags (under 512 chars)",
"lyrics": "structured lyrics with [section] tags",
"duration": 180,
"bpm": 150,
"keyscale": "E minor",
"language": "en",
"seed": -1
}
All parameters EXCEPT prompt and lyrics are optional. Omitted parameters keep their workflow defaults.
model_sampling (optional, boolean): Enables ModelSamplingAuraFlow (node 78, shift=13) for acestep-aio. It is bypassed by default because it improves quality only about half the time, so leaving it off is the safer choice. Set model_sampling: true to experiment with it.
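A minimal sketch of assembling a request body under the rules above: prompt and lyrics are treated as required, the prompt stays under 512 characters, and optional parameters are only sent when set so the workflow defaults apply otherwise. The helper name build_audio_payload is hypothetical and not part of the gateway.

```python
ALLOWED_OPTIONAL = {"duration", "bpm", "keyscale", "language", "seed", "model_sampling"}


def build_audio_payload(prompt, lyrics, **optional):
    """Build an acestep-rapcore request body, omitting parameters that were not set."""
    if not prompt or not lyrics:
        raise ValueError("prompt and lyrics are required")
    if len(prompt) >= 512:
        raise ValueError("prompt must stay under 512 characters")
    unknown = set(optional) - ALLOWED_OPTIONAL
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload = {"workflow": "acestep-rapcore", "prompt": prompt, "lyrics": lyrics}
    payload.update(optional)  # anything left out keeps its workflow default
    return payload
```

For example, build_audio_payload("rock, warm", "[Verse]\nhello", bpm=150) sends only bpm and leaves duration, keyscale, language and seed at their defaults.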
lustify-sdxl — Image Generation
Model: LUSTIFY SDXL NSFW photorealistic
Sampler: LCM, 4 steps, cfg=1
Output: PNG
{
"workflow": "lustify-sdxl",
"prompt": "photo of...",
"aspect_ratio": "896x1152",
"seed": -1
}
Supports GET /generate for UI options form.
Caption Engineering (ACE-Step)
The 8 Dimensions
Every caption should cover as many as possible, in 5-8 comma-separated tags:
- Style/Genre — metalcore, synthwave, drum and bass, pop, folk
- Emotion/Atmosphere — melancholic, euphoric, aggressive, dreamy, dark
- Instruments — distorted guitar, 808 bass, strings, piano, synths
- Timbre/Texture — warm, crisp, punchy, lush, airy, bright
- Vocal — male/female, raspy, clean, powerful, breathy, belting
- Production — polished, lo-fi, live, studio, dry, glossy
- Era — 80s, 90s, modern, retro, vintage
- Speed/Rhythm — driving, groovy, frantic, mid-tempo, laid-back
Rules
- 5-8 tags max — more degrades quality
- BPM/key in parameters, NOT caption — they're separate fields
- No conflicting pairs — e.g. "classical strings" + "death metal growls"
- Texture words matter heavily — they control mix/production quality
- Specific > vague — "melancholic piano ballad, female breathy vocal" > "sad song"
- Repeat what you want more of — repetition reinforces
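Two of these rules are mechanical enough to check before submitting. The sketch below counts comma-separated tags and looks for a BPM value that has leaked into the caption. The function name check_caption and the regular expression are illustrative assumptions. The remaining rules (conflicting pairs, texture words, specificity) need human judgment and are not covered.

```python
import re
from typing import List


def check_caption(caption: str) -> List[str]:
    """Return a list of rule violations; an empty list means no issue was detected."""
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    problems = []
    if not 5 <= len(tags) <= 8:
        problems.append(f"{len(tags)} tags; the guideline is 5-8")
    # BPM and key have their own request fields and should not appear here.
    if re.search(r"\b\d{2,3}\s*bpm\b", caption, re.IGNORECASE):
        problems.append("BPM belongs in the bpm parameter, not the caption")
    return problems
```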
Known Good Captions
pop, piano+strings+guitar, female warm vocal, melancholic intimate, bedroom pop
rock, metal, heavy distorted guitar, powerful drums, melodic vocals, aggressive, epic, dramatic, guitar solo
heavy distorted guitar, fast thrash drums, pounding bass, aggressive, dark
rapcore metal fusion, nu-metal, punchy bass, warm distorted guitar, crisp drums, melodic chorus, heavy grooves, atmospheric, polished production, angsty female vocal, emotional
Tags That Cause Problems
- raw, gritty, distorted (without balancing warmth) → metallic scraping, flat bass
- heavy bass → boomy/muddy; prefer punchy bass, deep sub-bass, defined bass
- aggressive on instruments → harsh overtones; use on emotion/vocal instead
- Too many instrument tags → cluttered, muddy mix
- "classical" + any heavy genre → contradictory, degrades both
Texture Word Guide
| Word | Effect |
|---|---|
| warm | Analog-style saturation, smooth high end |
| crisp | Clean transients, defined attacks |
| punchy | Tight, compressed low-mids, good for bass/kick |
| bright | Boosted highs, airy presence |
| lush | Wide stereo, rich harmonics, reverb-heavy |
| dry | Close-mic sound, minimal reverb |
| airy | Spacious high end, breathy |
| polished | Studio-quality, balanced EQ |
| raw | USE WITH CAUTION — unprocessed, potentially harsh |
| gritty | USE WITH CAUTION — distortion artifacts |
Lyrics Engineering (ACE-Step)
Required Structure Tags
ACE-Step REQUIRES section markers to align music with lyrics:
[Intro], [Verse], [Pre-Chorus], [Chorus], [Bridge], [Build], [Drop],
[Breakdown], [Guitar Solo], [Piano Interlude], [Outro]
Vocal Control Tags (on own line inside sections)
[whispered], [raspy vocal], [powerful belting], [spoken word],
[falsetto], [harmonies], [clean vocal]
Energy Tags (on own line inside sections)
[high energy], [low energy], [building energy], [euphoric],
[melancholic], [dreamy], [aggressive]
Lyric Writing Rules
- 6-10 syllables per line — fits the 5Hz LM planner
- Natural phrasing — write like human speech, not poetry
- Avoid AI clichés: "neon skies", "electric hearts/dreams", "breaking chains", "rising up", "fire inside"
- Section description hints on intro/outro lines: (bass rumbles in), (drums fade to silence)
- UPPERCASE = shouted/emphasized
- (parentheses) = background vocals/harmonies
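A small pre-flight check for the mechanical parts of these rules might look like the sketch below. It verifies that at least one required section marker is present and flags the overused phrases listed above. The name lint_lyrics is hypothetical, and accepting numbered markers such as [Verse 1] is an assumption rather than documented ACE-Step behavior. Syllable counts and phrasing are left to the writer.

```python
import re

SECTION_TAGS = {
    "intro", "verse", "pre-chorus", "chorus", "bridge", "build", "drop",
    "breakdown", "guitar solo", "piano interlude", "outro",
}
OVERUSED = [
    "neon skies", "electric hearts", "electric dreams",
    "breaking chains", "rising up", "fire inside",
]


def lint_lyrics(lyrics):
    """Return a list of problems found in the lyrics; empty means none detected."""
    problems = []
    found = {m.lower() for m in re.findall(r"\[([^\]]+)\]", lyrics)}
    # Assumption: treat numbered markers like [Verse 1] as their base section.
    normalized = {re.sub(r"\s*\d+$", "", tag) for tag in found}
    if not normalized & SECTION_TAGS:
        problems.append("no section markers found")
    lowered = lyrics.lower()
    for phrase in OVERUSED:
        if phrase in lowered:
            problems.append(f"overused phrase: {phrase}")
    return problems
```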
🔴 MANDATORY CHECKLIST BEFORE SENDING LYRICS TO GENERATION
Before you send any lyrics to ACE-Step, answer each of these questions, and do not send until every answer is "YES":
- "Do the lyrics make sense?" Do they tell a coherent story? Is there flow from intro to outro? Do the sections connect logically?
- "Are they grammatically correct?" No spelling, punctuation, or syntax errors. Pay particular attention to Polish diacritics, inflection, and commas.
- "Do they fit the author/project?" Do the tone, style, profanity, and energy fit the artist (KOSTI/Bonnie Bones)? Does it sound like that persona?
- "Do the music and its ordering make sense?" Is the structure (Intro→Verse→Chorus→Verse→Bridge→Chorus→Outro) logical? Does the energy rise and fall naturally? Is the overall length sensible (~120-180s)?
- "Is the duration appropriate?" 120-180 seconds is standard. NEVER send duration=150 unless you have verified that this is the intended length.
- "Does the author's age sound credible?" Do not write "I'm 15" ("mam 15 lat"), "young girl", or "teen" in lyrics for adult artists. KOSTI/Bonnie Bones are adult performers.
Only when every answer is YES may you send the lyrics to generation.
Energy Flow Pattern
Intro → [low energy] — sparse, building
Verse 1 → [low energy] — verse, storytelling, restrained
Pre-Chorus → [building energy] — tension rising
Chorus → [high energy] — maximum impact, full instrumentation
Verse 2 → [low energy] — second verse, slightly more energy
Pre-Chorus → [building energy]
Chorus → [high energy] — second chorus often bigger (harmonies)
Bridge → [low energy] — stripped back, different perspective
Breakdown → [high energy] — instrumental intensity (optional)
Final Chorus → [high energy] — biggest version
Outro → [low energy] — fade out
Genre Reference (from workflow node 317)
Key Genres & Their Tags
Electronic
- EDM/House: four-on-the-floor, bright synths, uplifting, dance-driven, glossy production, rhythmic, energetic
- Techno: mechanical, hypnotic rhythms, minimalistic, pulsing bass, industrial textures, dark, repetitive
- Trance: euphoric, soaring leads, emotional pads, rolling basslines, uplifting, spacious, melodic, anthemic
- Drum & Bass: rapid breakbeats, deep sub-bass, high-energy, sharp percussion, rolling rhythms, crisp, driving
- Dubstep: heavy bass drops, wobbling synths, aggressive textures, syncopated rhythms, dark, cinematic, gritty
- Future Bass: shimmering chords, side-chained synths, emotional, bright leads, bouncy rhythms, glossy, melodic
- Trap: booming 808s, sharp hi-hats, atmospheric pads, swaggering, dark, punchy, spacious
Rock/Metal
- Classic Rock: crunchy guitars, steady drums, warm analog tone, energetic, melodic, vintage, riff-driven
- Hard Rock: heavy riffs, powerful drums, gritty vocals, aggressive, energetic, distorted, bold, driving
- Metal: distorted guitars, fast drums, dark atmosphere, aggressive, heavy, intense, powerful, tight
- Progressive Metal: complex structures, technical riffs, atmospheric layers, dramatic, epic, polished, dynamic
Urban
- Boom Bap: dusty drums, soulful samples, rhythmic, warm textures, punchy kicks, nostalgic, organic
- Lo-Fi Hip-Hop: mellow beats, vinyl crackle, soft keys, relaxed, dreamy, warm, minimal, hazy
- Drill: sliding 808s, haunting melodies, gritty textures, cold atmosphere, syncopated, tense, urban
Pop
- Pop: catchy hooks, bright synths, polished production, upbeat, melodic, modern, radio-ready, clean
- Synth-Pop: retro synths, bright pads, melodic, nostalgic, electronic, polished, dreamy, airy
- K-Pop: glossy production, bright synths, genre-blending, catchy hooks, polished, theatrical, vibrant
Soft/Ambient
- Ambient: soft pads, atmospheric textures, spacious, minimal, calm, evolving, dreamy, subtle, meditative
- Cinematic: sweeping strings, dramatic percussion, epic, emotional, grand, polished, powerful
Keyscale & BPM Reference (from workflow node 318)
| Genre | Scale | Key Range | BPM Range |
|---|---|---|---|
| EDM/House | Minor, Dorian | D#m–Am | 120–128 |
| Techno | Phrygian, Minor | Fm–A#m | 125–135 |
| Trance | Major, Mixolydian | A–D | 130–142 |
| Drum & Bass | Minor, Dorian | Em–Gm | 170–178 |
| Dubstep | Minor, Phrygian | Fm–G#m | 138–150 |
| Future Bass | Major, Minor | C–F | 140–160 |
| Trap | Harmonic Minor | Fm–Am | 130–150 |
| Hip-Hop | Minor, Dorian | Dm–Gm | 85–95 |
| Lo-Fi | Dorian, Lydian | Cm–Fm | 60–85 |
| Pop | Major, Mixolydian | C–G | 90–130 |
| Classic Rock | Minor Pentatonic | Em–Am | 100–140 |
| Hard Rock | Minor, Phrygian | Em–Gm | 120–160 |
| Metal | Phrygian, Harmonic Minor | Dm–F#m | 140–200 |
| Prog Metal | Dorian, Melodic Minor | C#m–F#m | 120–180 |
| Blues | Blues Scale, Minor Pentatonic | Em–Am | 70–120 |
| Funk | Mixolydian, Dorian | E–A | 100–120 |
| Disco | Mixolydian, Major | F–Bb | 110–130 |
| R&B | Dorian, Minor | Dm–Gm | 60–100 |
| Ambient | Lydian, Dorian | C–F | 60–90 |
| Cinematic | Minor, Harmonic Minor | Cm–Fm | 60–120 |
| Reggae | Major, Mixolydian | A–D | 70–90 |
| K-Pop | Major, Minor | C–F# | 100–140 |
| Anime OST | Lydian, Major | C–E | 80–160 |
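The table can double as a sanity check on the bpm parameter. The sketch below copies a few rows from it into a lookup. The dictionary BPM_RANGES and the helper bpm_in_range are illustrative names, and only a subset of genres is included.

```python
# (low, high) BPM bounds copied from the reference table above; subset only.
BPM_RANGES = {
    "Drum & Bass": (170, 178),
    "Metal": (140, 200),
    "Lo-Fi": (60, 85),
    "Pop": (90, 130),
}


def bpm_in_range(genre, bpm):
    """Return True when bpm falls inside the reference range for the genre."""
    low, high = BPM_RANGES[genre]
    return low <= bpm <= high
```

A value outside the range is not necessarily wrong, but it is worth a second look before spending a generation on it.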
Structure Planning (from workflow node 320)
The workflow includes an example of how to structure a caption WITH a song structure plan:
metalcore, symphonic elements, theatrical, duet, heavy distorted guitar,
bright piano, studio-polished, dramatic, melodic, epic, intense.
Structure:
- Intro: brief intro dramatically builds to first verse
- Verse 1: atmospheric piano, sets scene, raspy male vocal only
- Verse 2: guitar power chords, groovy, young female vocal only
- Chorus: anthemic, layered, male+female duet harmonies
- Bridge: atmospheric, dreamy, calm, female vocal only
- Build-up: builds to epic instrumental solo
- Instrumental: fast guitar solo, lead licks, virtuoso shred
- End: powerful ending
This can go in the caption to give the model a temporal roadmap.
Scene-by-Scene Prompting (from workflow node 321)
For maximum control, describe each section's instrumentation and mood in prose:
Intro: A metalcore-tinged, symphonic swell opens the track, with bright piano glimmering
over theatrical strings. Tension rises—studio-polished, dramatic—until it snaps into verse.
Verse 1: Drops to atmospheric piano, soft but charged. Raspy male vocal, intimate, whispered.
No guitars—just piano, subtle pads, suspended breath.
Verse 2: Guitar power chords crash in, groovy pulse. Young female vocal, bright and soaring.
Symphonic elements widen the space, cinematic lift.
Chorus: Erupts into anthemic, epic chorus. Male+female duet harmonies. Distorted guitars,
sweeping strings, pounding drums—polished, intense.
Bridge: Everything falls away. Dreamy, atmospheric, weightless. Soft pads, distant piano,
female vocal airy and ethereal. Suspended.
Build-up: Rhythmic pulses return. Low strings, tom rolls, rising synths. Guitars re-enter
in bursts. Energy coils toward instrumental break.
Instrumental: Fast guitar solo, virtuoso shred, rapid licks, melodic flourishes.
Symphonic backing, metalcore precision drums. Flashy, intense, climactic.
Full API Reference
Core Endpoints
| Method | Path | Description |
|---|---|---|
| GET | / | Health check + server statuses |
| GET | /workflows | List available workflows with types |
| POST | /generate-and-wait | PRIMARY — submit, wait, download, save. Use this for all generation. |
| POST | /prompt | Submit workflow, return prompt_id |
| GET | /history/:prompt_id | Get single prompt result |
| GET | /history | Aggregated history from all servers |
| GET | /queue | Aggregated queue (running + pending) |
| GET | /view | Proxy media file download |
| GET | /system_stats | First alive server system info |
| GET | /object_info | Proxy to ComfyUI object_info |
| GET | /extensions | Proxy to ComfyUI extensions |
Image Generation (legacy, use generate-and-wait instead)
| Method | Path | Description |
|---|---|---|
| GET | /generate | Get generation options form |
| POST | /generate | Submit image generation |
| POST | /upload/image | Upload image to ComfyUI input dir |
Media Management
| Method | Path | Description |
|---|---|---|
| GET | /media-list | List generated files (name, size, date, preview URLs) |
| POST | /media-link-once | Create one-time access token for a file |
| GET | /media-once/:token | Access file via one-time token (no API key needed) |
Workflow Injection
| Method | Path | Description |
|---|---|---|
| POST | /workflow/:name/prompt | Quick prompt submit for named workflow (auto-injects) |
POST /generate-and-wait — Full Reference
curl -s -X POST http://127.0.0.1:8188/generate-and-wait \
-H "Content-Type: application/json" \
-d '{
"workflow": "acestep-rapcore",
"prompt": "...",
"lyrics": "...",
"duration": 200,
"bpm": 150,
"keyscale": "E minor",
"language": "en",
"seed": -1
}'
Audio params: prompt (required), lyrics, duration, bpm, keyscale, language, seed
Image params: prompt (required), aspect_ratio, seed, steps, cfg
Common: workflow (default: acestep-rapcore), client_id
Success response:
{
"status": "ok",
"file": "/home/genorbox1/.openclaw/workspace/media/comfy/audio/acestep-rapcore_2026-05-19T13-26-54_028.mp3",
"filename": "acestep-rapcore_2026-05-19T13-26-54_028.mp3",
"type": "audio",
"server": "sec",
"workflow": "acestep-rapcore",
"file_size": 5882890
}
Output saved with metadata sidecar (.json) in ~/media/comfy/<audio|images>/.
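The same call can be made from Python with only the standard library. This is a sketch, not an official client. The names build_request and generate_and_wait are hypothetical, the 660-second client timeout is an assumed margin, and the shape of a failure response is not documented here, so anything other than "status": "ok" is simply treated as an error.

```python
import json
import urllib.request

GATEWAY = "http://127.0.0.1:8188"


def build_request(payload, api_key=None):
    """Build the POST request for /generate-and-wait without sending it."""
    headers = {"Content-Type": "application/json"}
    if api_key:  # localhost callers are exempt from the x-api-key header
        headers["x-api-key"] = api_key
    return urllib.request.Request(
        f"{GATEWAY}/generate-and-wait",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def generate_and_wait(payload, api_key=None, timeout=660):
    """Send the request, block until the gateway answers, return the saved file path."""
    with urllib.request.urlopen(build_request(payload, api_key), timeout=timeout) as resp:
        result = json.load(resp)
    if result.get("status") != "ok":  # assumption: any other status is a failure
        raise RuntimeError(f"generation failed: {result}")
    return result["file"]
```

Calling generate_and_wait({"workflow": "acestep-rapcore", "prompt": "...", "lyrics": "..."}) blocks for the whole generation, so the client timeout should not be shorter than the gateway's own.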
Operational Notes
Restart
pm2 restart genor-comfy-gate
pm2 logs genor-comfy-gate --lines 20
Status Check
curl -s http://127.0.0.1:8188/ | python3 -m json.tool
curl -s http://127.0.0.1:8188/queue | python3 -m json.tool
Media Location
~/media/comfy/audio/ — generated MP3 files + .json sidecars
~/media/comfy/images/ — generated PNG files + .json sidecars
Gateway Behavior
- Submits workflow JSON with injected parameters
- Polls /history/:prompt_id every 2s until complete/fail/timeout
- Timeout: 600s (10 min) per generation
- After completion: waits 3s for file write, then downloads
- Saves to media dir with timestamped name + incrementing sequence number
- Metadata sidecar written alongside media file
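The polling behavior described above can be sketched as a loop. This is an illustration in Python, not the gateway's code. The fetch_status callable and its three return values ("pending", "complete", "failed") are assumptions standing in for whatever the gateway reads from /history/:prompt_id, and sleep and clock are injectable only so the loop can be exercised without real delays.

```python
import time


def wait_for_prompt(fetch_status, prompt_id, interval=2.0, timeout=600.0,
                    settle=3.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until the prompt completes, fails, or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        status = fetch_status(prompt_id)  # hypothetical status lookup
        if status == "complete":
            sleep(settle)  # give the backend time to finish writing the file
            return
        if status == "failed":
            raise RuntimeError(f"prompt {prompt_id} failed")
        if clock() >= deadline:
            raise TimeoutError(f"prompt {prompt_id} did not finish in {timeout}s")
        sleep(interval)
```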
Growing Our Knowledge
When we discover new caption patterns, texture word effects, or workflow tricks:
- Update this SKILL.md
- Note the date and what we learned in CHANGELOG.md (next to this skill)
lustify-sdxl — Image Generation Deep Dive
Model: lustifySDXLNSFW_ggwpV7.safetensors — Illustrious-based SDXL checkpoint
Tag system: Danbooru-style tags, NOT natural language
Sampler: LCM, 12 steps, scheduler=exponential, cfg=1
Output: PNG via SaveImage (node 200) + PreviewImage (node 87)
Full Pipeline
CheckpointLoader(43) → LoRA stack(47,80) → Resolution(17) → KSampler(7, 12 steps LCM) →
UltimateSDUpscale(88, 2x, 4x-UltraSharp) →
FaceDetailer NIP(97) → FaceDetailer V(98) → FaceDetailer P(101) →
FaceDetailer face(104, 1024px, 6 steps) → FaceDetailer hands(105, 2048px, 6 steps) →
SeedVR2VideoUpscaler(114, 2048px final) → CRT Post-Process(115) → SaveImage(200)
Active LoRAs (node 80)
| LoRA | Strength | Purpose |
|---|---|---|
| AddMicroDetails v6 | 0.2 | Skin texture, fine details |
| PersonEnhanceV2 ILL | 0.1 | Better anatomy/face |
| TrendCraft Style Detailer v2.4I | 0.1 | Overall polish/detail |
Active LoRAs (node 47)
| LoRA | Strength | Purpose |
|---|---|---|
| DTLVVTT DMD2 V5-LITE | 1.0 | DMD2 distillation (faster/better LCM) |
FaceDetailer Pipeline
Sequential detailers with YOLO detectors:
- NIP (nipples_yolov8s-seg.pt) — nipple detection, 1024px, denoise 0.4
- V (nsfw-seg-vagina-x.pt) — vagina detection, 1024px, denoise 0.4
- P (nsfw-seg-penis-x.pt) — penis detection, 1024px, denoise 0.4
- Face (Anzhc Face seg 768MS v2 y8n.pt) — face detection, 1024px, 6 steps, denoise 0.4
- Hands (PitHandDetailer-v2-Test-v9c.pt) — hand detection, 2048px, 6 steps, denoise 0.5
SeedVR2 Upscaler (node 114)
- Model: seedvr2_ema_7b_sharp-Q4_K_M.gguf (quantized 7B)
- VAE: ema_vae_fp16.safetensors
- Final resolution: 2048
- Color correction: lab
CRT Post-Process (node 115)
- Vibrance: +0.015 (subtle saturation boost)
- Vignette: 0.5 strength, 0.7 radius, 2.0 softness
Danbooru Tag Prompting (LUSTIFY)
CRITICAL: LUSTIFY is Illustrious-based — use Danbooru-format tags, NOT natural language descriptions.
Quality/Priority Tags (always include)
masterpiece, best quality, amazing quality, very aesthetic, absurdres
Subject Tags
1girl, solo, cute, petite, pale skin, medium breasts
Clothing/Accessories
gym uniform, white shirt, sports shorts, sneakers, ponytail
Action/Pose (keep it SIMPLE — complex actions confuse the model)
jumping, dynamic pose, looking at viewer
Setting/Light
gym background, afternoon light, dutch angle, from below
Negative Prompt (always)
blurry, worst quality, bad quality, error, melted body, bad anatomy, bad hands, disfigured
What Works
- Character portraits work best — this is a hentai/character model
- Simple dynamic poses (jumping, running, leaning) — YES
- Quality tags first — masterpiece, best quality are weighted
- POV/camera tags — dutch angle, from below, from above, close-up
- Lighting tags — sunlight, god rays, afternoon light, backlight
- Keep tags under ~25 — more dilutes quality
What Fails
- Natural language descriptions — "mid-jump over a vaulting horse" → model doesn't understand
- Complex multi-object composition — "vaulting horse + girl midair" = garbled anatomy
- "photorealistic" tag — fights the anime/illustrious base, produces uncanny results
- Overloaded action tags — "jumping + spread legs + leaning forward + vaulting horse" = nightmare
- Multiple characters — this workflow is tuned for 1girl, solo
Image Generation Parameters
{
"workflow": "lustify-sdxl",
"prompt": "masterpiece, best quality, 1girl, cute, ...",
"aspect_ratio": "7:9 (Portrait)",
"seed": -1
}
Valid aspect ratios:
- 1:1 (Square)
- 4:5 (Portrait)
- 7:9 (Portrait) ← default, best for single character
- 3:2 (Landscape)
- 16:9 (Landscape)
- 9:16 (Portrait)
Additional optional params: megapixels (default 1.5), steps, cfg, denoise, sampler_name, scheduler
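A short sketch of validating the aspect ratio before submitting, assuming the list above is exhaustive and the labels must match exactly. The helper name build_image_payload is hypothetical.

```python
VALID_ASPECT_RATIOS = (
    "1:1 (Square)", "4:5 (Portrait)", "7:9 (Portrait)",
    "3:2 (Landscape)", "16:9 (Landscape)", "9:16 (Portrait)",
)
DEFAULT_ASPECT_RATIO = "7:9 (Portrait)"


def build_image_payload(prompt, aspect_ratio=DEFAULT_ASPECT_RATIO, seed=-1):
    """Build a lustify-sdxl request body, rejecting unknown aspect ratio labels."""
    if aspect_ratio not in VALID_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {VALID_ASPECT_RATIOS}")
    return {
        "workflow": "lustify-sdxl",
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "seed": seed,
    }
```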
Adding a New Workflow (any modality)
- Export workflow JSON from ComfyUI → save to workflows/<name>.json
- Add entry to WORKFLOW_INFO in server.js: '<name>': { file: '<name>.json', type: 'audio'|'image'|'video', ext: 'mp3'|'png'|'mp4', promptNode: '94', promptField: 'tags', lyricsNode: '252', lyricsField: 'String', outputNode: '104' }
- Restart: pm2 restart genor-comfy-gate
- Test, then document in this SKILL.md
The gateway auto-handles: prompt injection, duration, BPM/keyscale (audio), aspect_ratio (image), seed, polling, download from correct server, save to media dir, metadata sidecar.
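To make the role of the WORKFLOW_INFO fields concrete, here is a Python sketch of how promptNode/promptField and lyricsNode/lyricsField could drive injection into an API-format workflow, where each node id maps to an object with an inputs dictionary. The real logic lives in server.js and may differ. The function name inject and the file name acestep-rapcore.json are assumptions.

```python
import copy

# Mirrors the shape of a WORKFLOW_INFO entry; the file name is assumed.
WORKFLOW_INFO = {
    "acestep-rapcore": {
        "file": "acestep-rapcore.json", "type": "audio", "ext": "mp3",
        "promptNode": "94", "promptField": "tags",
        "lyricsNode": "252", "lyricsField": "String",
        "outputNode": "104",
    }
}


def inject(workflow, info, prompt, lyrics=None):
    """Return a copy of the workflow with prompt and lyrics written into their nodes."""
    patched = copy.deepcopy(workflow)  # never mutate the loaded template
    patched[info["promptNode"]]["inputs"][info["promptField"]] = prompt
    if lyrics is not None and "lyricsNode" in info:
        patched[info["lyricsNode"]]["inputs"][info["lyricsField"]] = lyrics
    return patched
```

Copying the template first keeps one request's values from leaking into the next one.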
Lessons Learned
2026-05-19 — Image Generation
- LUSTIFY is Illustrious-based, uses Danbooru tags — natural language prompts produce garbled results
- Quality tags (masterpiece, best quality) must come FIRST — they're weighted
- Complex action scenes fail — model is trained for character portraits, keep poses simple
- "photorealistic" tag on anime model = uncanny valley, avoid
- Keep prompts under 25 tags — overloading dilutes quality
- Pipeline has SeedVR2 upscaler (7B GGUF) + 5-stage FaceDetailer → 2048px final output
- Face/hand detailers produce excellent close-up quality
2026-05-19 — Audio Generation
- Download 400 bug: getOutputInfo() function returned undefined filenames despite reading them from history correctly. Fixed by inlining output scanning in the handler.
- Load balancer: PRIMARY-first when idle, ALL→SECONDARY when PRIMARY busy (not round-robin).
- Workflow cleanup: Removed duplicate nodes 401, 402. Lyrics now go through node 252 (String) → node 94.
- Caption quality: raw, gritty, heavy drops cause metallic scraping and flat bass. Use warm, crisp, punchy, polished for clean instruments.
- 5-8 tags sweet spot for SFT merge model. More degrades quality.
- 8 dimensions matter: Missing emotion/timbre = flat results. Cover: genre, emotion, instruments, timbre, vocal, production, era, rhythm.
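- Make sure OpenClaw is installed (locally or via Docker)
- Enter the install command in the chat box: /install genor-comfy-gate
- Once installed, invoke the Skill by name or trigger it with /genor-comfy-gate
- Provide the inputs described in the Skill's parameter documentation to get structured output
What is Genor-Comfy-Gate?
Comprehensive multi-modal gateway for ComfyUI enabling audio generation with ACE-Step 1.5 and photorealistic image creation via SDXL workflows. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 31 downloads to date.
How do I install Genor-Comfy-Gate?
Run the command "/install genor-comfy-gate" in the OpenClaw or Claude Code chat box for a one-step install; no extra configuration is needed.
Is Genor-Comfy-Gate free?
Yes, Genor-Comfy-Gate is completely free under the MIT-0 license and can be freely downloaded, installed, and used.
Which platforms does Genor-Comfy-Gate support?
Genor-Comfy-Gate is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who developed Genor-Comfy-Gate?
Developed and maintained by Krzysztof (@genortg); current version v2.0.0.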