sunshinejnjn

Image with ComfyUI

Author: JnJn · GitHub · v1.4.9 · MIT-0
cross-platform ✓ Security check passed
186 total downloads · 1 favorite · 1 current install · 12 versions
Install in OpenClaw
/install image-with-comfyui
Description
Call a local ComfyUI instance for text-to-image (T2I), image-to-image/edit (I2I), and image-to-video (I2V) generation. Supports Z-Image, SD3.5 Medium, Qwen I...
Usage Instructions (SKILL.md)

Image with ComfyUI

Call a local ComfyUI server to generate or edit images and videos. Three modes, backed by four workflows (T2I has two model choices):

  • T2I (Text → Image) → Z-Image or SD3.5 Medium model
  • I2I (Image → Image / Edit) → Qwen Image Edit model
  • I2V (Image → Video) → Wan2.2 model

When to Use

  • User asks to generate images from text
  • User asks to edit an image
  • User asks to generate a video from an image + text
  • User provides a description and wants visual output

Image-First Conversational Pattern (Image-First Mode)

Detection rules:

  1. User sends only an image (no text, no other message in the same turn)
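  2. Within 2 minutes, the user sends a text message that looks like an edit or video request (e.g. Chinese phrases such as 修一下 "touch it up", 换背景 "change the background", 加个特效 "add an effect", 变成动画 "turn it into an animation", 把颜色改蓝的 "make the color blue")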
  3. The text intent is I2I (edit the image) or I2V (animate the image)

Action:

  • Route the remembered image + the new text to image_with_comfyui.py i2i or wan2.2 accordingly
  • Use the latest image received as the --image input
  • Use the text as the --prompt
  • If unsure whether I2I or I2V, default to I2I (edit) unless the text clearly says video/animation
  • Do NOT ask the user for the image again — the agent already has the image from the previous turn

Context tracking:

  • Store the latest image media path (or URL) in a variable when no text is received
  • Clear the stored image after it's used (or after 2 minutes of no new text)
  • Only apply this to the immediately preceding message — don't look back further than 2 minutes

Examples:

  • User: [image: a photo of a dog] → Agent: (wait)
  • User: [text: change the background to a beach] → Agent: calls i2i --image <path> --prompt "change the background to a beach"
  • User: [image: a cat sitting on a chair] → Agent: (wait)
  • User: [text: make it stand up and walk] → Agent: calls wan2.2 --image <path> --prompt "the cat stands up and walks"
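The detection and context-tracking rules above can be sketched as a small state holder. This is a minimal illustration, not the skill's actual implementation: the class name `ImageFirstTracker`, the injectable `clock`, and the `VIDEO_HINTS` keyword heuristic for "clearly says video/animation" are all hypothetical. It returns the CLI arguments for `image_with_comfyui.py` and clears the stored image on every follow-up, whether it was used or had expired.

```python
import re
import time
from dataclasses import dataclass
from typing import Callable, List, Optional

WINDOW_SECONDS = 120  # the 2-minute window from the detection rules

# Hypothetical heuristic for "clearly says video/animation"; tune for real traffic.
VIDEO_HINTS = re.compile(r"\b(?:video|animat\w*|walk\w*)\b|视频|动画|动起来", re.IGNORECASE)


@dataclass
class ImageFirstTracker:
    clock: Callable[[], float] = time.monotonic
    _path: Optional[str] = None
    _stored_at: float = 0.0

    def remember(self, image_path: str) -> None:
        """Store the image from an image-only turn."""
        self._path = image_path
        self._stored_at = self.clock()

    def consume(self, text: str) -> Optional[List[str]]:
        """Return CLI arguments for a follow-up text, or None if no fresh image is held."""
        path, self._path = self._path, None  # clear after use, whether or not it expired
        if path is None or self.clock() - self._stored_at > WINDOW_SECONDS:
            return None
        mode = "wan2.2" if VIDEO_HINTS.search(text) else "i2i"  # default to I2I (edit)
        return [mode, "--image", path, "--prompt", text]
```

Passing the text through unchanged keeps the I2I rule of using the user's exact words; for I2V the agent may still rewrite the motion description in English before calling the script.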

Configuration

Read config.json relative to this SKILL's directory. All values can be overridden by environment variables:

| Env Variable | Overrides (config.json key) | Default |
|--------------|-----------------------------|---------|
| COMFYUI_URL | comfyui_url | http://localhost:8188 |
| COMFYUI_TIMEOUT | timeout_seconds | 120 |
| COMFYUI_POLL_INTERVAL | poll_interval_seconds | 3 |
| COMFYUI_OUTPUT_DIR | output_dir | /tmp/comfyui_output |
| OPENCLAW_WORKSPACE | workspace_root | OpenClaw workspace dir |
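One way to read the override table is the precedence "built-in default < config.json < environment variable". The sketch below assumes that order and assumes numeric settings are parsed as floats; `load_config` is a hypothetical name, and `workspace_root` defaults to `None` here because the real default is resolved by OpenClaw at runtime.

```python
import json
import os
from pathlib import Path

# config.json key -> environment variable that overrides it (from the table above)
ENV_OVERRIDES = {
    "comfyui_url": "COMFYUI_URL",
    "timeout_seconds": "COMFYUI_TIMEOUT",
    "poll_interval_seconds": "COMFYUI_POLL_INTERVAL",
    "output_dir": "COMFYUI_OUTPUT_DIR",
    "workspace_root": "OPENCLAW_WORKSPACE",
}
DEFAULTS = {
    "comfyui_url": "http://localhost:8188",
    "timeout_seconds": 120,
    "poll_interval_seconds": 3,
    "output_dir": "/tmp/comfyui_output",
    "workspace_root": None,  # assumption: filled in by OpenClaw at runtime
}


def load_config(skill_dir: Path, env=os.environ) -> dict:
    """Merge defaults, then config.json next to SKILL.md, then environment overrides."""
    config = dict(DEFAULTS)
    path = skill_dir / "config.json"
    if path.is_file():
        config.update(json.loads(path.read_text(encoding="utf-8")))
    for key, var in ENV_OVERRIDES.items():
        if var in env:
            raw = env[var]
            # environment values are strings; the *_seconds settings are numeric
            config[key] = float(raw) if key.endswith("_seconds") else raw
    return config
```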

Workflow Files

| Mode | Workflow | Location |
|------|----------|----------|
| T2I (Z-Image) | Z-Image T2I | workflows/z-image_t2i_api.json |
| T2I (SD3.5) | SD3.5 Medium T2I | workflows/sd3.5-med_t2i_api.json |
| I2I | Qwen Image Edit | workflows/qwen_image-edit_api.json |
| I2V | Wan2.2 Image-to-Video | workflows/wan2.2_i2v_api.json |
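A sketch of how a CLI subcommand could be mapped onto these workflow files and submitted. The subcommand-to-file mapping is inferred from the CLI Usage section and is an assumption, as are the helper names. The payload shape `{"prompt": <API-format graph>, "client_id": ...}` is what ComfyUI's `POST /prompt` endpoint accepts.

```python
import json
from pathlib import Path

# CLI subcommand -> workflow file, relative to the skill directory (assumed mapping)
WORKFLOWS = {
    "t2i": "workflows/z-image_t2i_api.json",
    "sd35": "workflows/sd3.5-med_t2i_api.json",
    "i2i": "workflows/qwen_image-edit_api.json",
    "wan2.2": "workflows/wan2.2_i2v_api.json",
}


def load_workflow(skill_dir: Path, mode: str) -> dict:
    """Load the API-format workflow JSON for a mode."""
    try:
        rel = WORKFLOWS[mode]
    except KeyError:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(WORKFLOWS)}") from None
    return json.loads((skill_dir / rel).read_text(encoding="utf-8"))


def build_prompt_payload(workflow: dict, client_id: str) -> bytes:
    """Body for POST {comfyui_url}/prompt; the response carries a prompt_id."""
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")
```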

Error Handling

The system automatically handles three types of errors:

1. Missing Node Detection

When a workflow references a custom node that isn't installed, the system detects it and reports:

  • Which node is missing (class type name)
  • Which package provides it
  • GitHub URL for manual download
  • Manual install instruction (the script does NOT execute git clone; ComfyUI may be on a remote server)

Example: If ImpactKSamplerBasicPipe is missing:

⚠️ Missing node: `ImpactKSamplerBasicPipe`
📦 Package: ComfyUI-Impact-Pack
🔗 GitHub: https://github.com/ltdrdata/ComfyUI-Impact-Pack
ℹ️ Install manually: cd ComfyUI/custom_nodes && git clone https://github.com/ltdrdata/ComfyUI-Impact-Pack
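Missing-node detection can be sketched as a set difference between the `class_type` values in an API-format workflow and the node types the server knows about (for example, the keys of ComfyUI's `GET /object_info` response). `NODE_PACKAGES` below is an illustrative one-entry stand-in for the skill's own NODE_PACKAGE_MAP, whose real contents are not shown here; the function names are hypothetical. As in the skill, the install command is only printed, never run.

```python
from typing import List, Set

# Illustrative subset only; the skill ships its own NODE_PACKAGE_MAP.
NODE_PACKAGES = {
    "ImpactKSamplerBasicPipe": ("ComfyUI-Impact-Pack", "https://github.com/ltdrdata/ComfyUI-Impact-Pack"),
}


def find_missing_nodes(workflow: dict, available: Set[str]) -> List[str]:
    """Return class_type names used by the workflow but not installed on the server."""
    used = {node["class_type"] for node in workflow.values()}
    return sorted(used - available)


def describe_missing(class_type: str) -> str:
    """Format the warning shown to the user for one missing node."""
    lines = [f"⚠️ Missing node: `{class_type}`"]
    if class_type in NODE_PACKAGES:
        package, url = NODE_PACKAGES[class_type]
        lines += [
            f"📦 Package: {package}",
            f"🔗 GitHub: {url}",
            # Printed as a manual instruction only: ComfyUI may run on another machine.
            f"ℹ️ Install manually: cd ComfyUI/custom_nodes && git clone {url}",
        ]
    return "\n".join(lines)
```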

2. Missing Model Substitution

When a workflow references a model file that doesn't exist, the system attempts to find a compatible substitute:

| Requested Model | Substitute |
|-----------------|------------|
| sd3.5_medium variants | sd3.5_large.safetensors |
| WAN High or Low variant | The other variant (High → Low, or vice versa) |
| Other unknown models | No substitution (error returned) |

Example: If my_custom_sd3_medium_v2.safetensors is missing:

⚠️ Model missing: `my_custom_sd3_medium_v2.safetensors`
🔄 Substituted: `sd3.5_large.safetensors`
📦 Loader: CheckpointLoaderSimple.ckpt_name

After substitution, the workflow is retried automatically with the substitute model.
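The substitution table can be sketched as a lookup function plus a one-line patch of the loader input. Two assumptions, both hypothetical: an SD3.5 Medium variant is recognized by "sd3" and "medium" appearing in the filename, and WAN variants carry "high" or "low" in their filenames (the example WAN filenames in the usage note are made up).

```python
import re
from typing import List, Optional


def find_substitute(requested: str, available: List[str]) -> Optional[str]:
    """Pick a stand-in model file following the substitution table above."""
    name = requested.lower()
    # Row 1: any SD3.5 Medium variant -> sd3.5_large.safetensors, if installed
    if "sd3" in name and "medium" in name:
        return "sd3.5_large.safetensors" if "sd3.5_large.safetensors" in available else None
    # Row 2: WAN high <-> low swap (assumes "high"/"low" appears in the filename)
    if "wan" in name:
        for a, b in (("high", "low"), ("low", "high")):
            if re.search(a, requested, re.IGNORECASE):
                candidate = re.sub(a, b, requested, count=1, flags=re.IGNORECASE)
                return candidate if candidate in available else None
    # Row 3: unknown model -> no substitution; the caller returns the error
    return None


def apply_substitute(workflow: dict, node_id: str, input_name: str, new_file: str) -> None:
    """Patch a loader input, e.g. CheckpointLoaderSimple.ckpt_name, before retrying."""
    workflow[node_id]["inputs"][input_name] = new_file
```

For example, `find_substitute("wan2.2_i2v_high_noise_14B.safetensors", installed)` returns the `low_noise` file only if it is actually installed; otherwise it returns `None` and the error is reported.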

3. Missing Utility Node Bypass (UnloadAllModels)

When the workflow references UnloadAllModels (a memory cleanup node) which isn't available, the system automatically bypasses it by rerouting the signal path:

  • Removes the missing UnloadAllModels node
  • Redirects the upstream processing node directly to the downstream output node
  • Generation continues without interruption
  • User receives a warning about the bypass

Example:

⚠️ Workflow missing node: `UnloadAllModels` (memory cleanup, non-critical)
🔄 Auto-bypassed — generation continues
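The bypass works because, in ComfyUI's API-format JSON, a link is a two-item list `[source_node_id, output_index]`. The sketch below removes every node of the given type and points its consumers at whatever fed it. It assumes the bypassed node simply forwards its first linked input, which is how a memory-cleanup pass-through behaves; `bypass_node_type` is a hypothetical name, not the skill's function.

```python
from typing import List


def bypass_node_type(workflow: dict, class_type: str = "UnloadAllModels") -> List[str]:
    """Remove pass-through nodes of class_type and rewire their consumers upstream."""
    removed = [nid for nid, node in workflow.items() if node.get("class_type") == class_type]
    for nid in removed:
        node = workflow.pop(nid)
        # first linked input, i.e. the [source_id, output_index] this node forwards
        upstream = next(
            (v for v in node["inputs"].values() if isinstance(v, list) and len(v) == 2),
            None,
        )
        if upstream is None:
            raise ValueError(f"{class_type} node {nid} has no linked input to forward")
        for other in workflow.values():
            for key, value in other["inputs"].items():
                if isinstance(value, list) and len(value) == 2 and str(value[0]) == nid:
                    other["inputs"][key] = list(upstream)  # skip the removed node
    return removed
```

Rewiring against the remaining graph on each pass also handles chains of several bypassed nodes in either removal order.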

Core Rules

  1. Send media attachment: After generating an image or video, you must send it via the message tool using media or filePath.
  2. Send in original session: Always deliver generated media in the user's original request session/thread — do not send to a separate topic, group, or thread unless explicitly told otherwise.
  3. Don't include paths: Unless the user explicitly asks, never send local file paths, ComfyUI URLs, or other address info.

Prompt Formatting

Z-Image (T2I)

Z-Image works best with structured natural language prompts, not keyword spam.

6-part formula:

Subject + Scene + Composition + Lighting + Style + Constraints

Rules:

  • ✅ Use natural language sentences (not comma-separated tags)
  • ✅ Be specific about subject, camera, lighting, style
  • ❌ No negative prompts: Z-Image Turbo ignores them completely
  • ❌ No weighted tags like (word:1.2)

Example:

A young woman with long wavy blonde hair sits at a wooden café table,
steam rising from a ceramic cup. Shot from a 3/4 angle, close-up framing.
Soft morning light filters through sheer curtains, casting warm golden tones.
Cinematic photography, shallow depth of field, Kodak Portra 400 aesthetic.
No text, no logos, photorealistic skin texture.

Aspect ratios: 1:1, 4:3, 3:4, 16:9, 9:16, 3:2, 2:3
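The 6-part formula and the rules above can be captured in a small helper that joins the parts as plain sentences and rejects weighted tags. This is a hypothetical convenience for building prompts, not something the skill's script provides; `build_zimage_prompt`, `check_aspect`, and the regex for `(word:1.2)` are assumptions.

```python
import re

ZIMAGE_ASPECTS = {"1:1", "4:3", "3:4", "16:9", "9:16", "3:2", "2:3"}
WEIGHTED_TAG = re.compile(r"\(\s*[^()]+:\s*\d+(?:\.\d+)?\s*\)")  # e.g. (word:1.2)


def build_zimage_prompt(subject: str, scene: str, composition: str,
                        lighting: str, style: str, constraints: str = "") -> str:
    """Join the six formula parts into plain sentences, in formula order."""
    sentences = []
    for part in (subject, scene, composition, lighting, style, constraints):
        part = part.strip().rstrip(".")
        if part:  # skip empty parts instead of emitting a stray "."
            sentences.append(part[0].upper() + part[1:] + ".")
    prompt = " ".join(sentences)
    if WEIGHTED_TAG.search(prompt):
        raise ValueError("weighted tags like (word:1.2) are not supported")
    return prompt


def check_aspect(aspect: str) -> str:
    """Reject aspect ratios outside the supported list."""
    if aspect not in ZIMAGE_ASPECTS:
        raise ValueError(f"unsupported aspect {aspect!r}; use one of {sorted(ZIMAGE_ASPECTS)}")
    return aspect
```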


SD3.5 Medium (T2I)

SD3.5 Medium uses natural language prompts with optional negative prompts.

Prompt formula:

[Composition/Angle] + [Subject] + [Scene/Environment] + [Lighting/Color] + [Style/Texture] + [Details]

Rules:

  • ✅ Use complete natural-language sentences: describe the image as you would to a person
  • ✅ Subject first (model prioritizes early text)
  • ✅ Be specific about colors, materials, mood, atmosphere
  • ✅ Mixed Chinese/English is fine (Chinese works better for Chinese scenes)
  • ✅ Use --negative for elements to exclude
  • ✅ Default 1:1 (1024×1024), use --aspect to change
  • ✅ Default 20 steps, CFG 4.01 (higher = stronger control)
  • ✅ Seed defaults to random; specify --seed for reproducibility
  • ❌ No comma-separated keyword spam (beautiful, amazing, 4k)
  • ❌ No weighted tags (word:1.2) — SD3.5 doesn't recognize them

Parameter recommendations:

  • CFG: 4-7 (4.01 = softer, 5-7 = stronger control)
  • Steps: 20-25 (below 20 may lack detail)
  • Negative prompt: Highly effective in SD3.5

Common negative prompt words:

blurry, low quality, pixelated, grainy,
overexposed, underexposed, flat lighting,
text, watermark, logo, signature, caption,
poorly drawn face, deformed, mutated, disfigured, extra limbs,
cartoonish (when realism is wanted)
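The defaults and parameter recommendations above translate naturally into an argument builder for the `sd35` subcommand. It uses only the flags documented in this file (`--prompt`, `--aspect`, `--negative`, `--steps`, `--cfg`, `--seed`); the function name is hypothetical. Out-of-range values produce warnings rather than errors, because the ranges are recommendations.

```python
import warnings
from typing import List, Optional


def sd35_args(prompt: str, negative: str = "", aspect: str = "1:1", steps: int = 20,
              cfg: float = 4.01, seed: Optional[int] = None) -> List[str]:
    """Build an argv for `image_with_comfyui.py sd35` from the documented flags."""
    if not 4 <= cfg <= 7:
        warnings.warn(f"CFG {cfg} is outside the recommended 4-7 range")
    if steps < 20:
        warnings.warn(f"{steps} steps may lack detail (recommended 20-25)")
    args = ["python3", "image_with_comfyui.py", "sd35", "--prompt", prompt,
            "--aspect", aspect, "--steps", str(steps), "--cfg", str(cfg)]
    if negative:
        args += ["--negative", negative]
    if seed is not None:  # leave --seed out to keep the random default
        args += ["--seed", str(seed)]
    return args
```

The result can be passed straight to `subprocess.run(args, check=True)` from the skill directory.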

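Chinese example (an English translation follows):

上海魔都春日花海 — 黄浦江畔,大片郁金香、樱花、油菜花盛开,繁花似锦,
春日和煦阳光,远景陆家嘴三件套天际线,湿润的滨江步道倒映花影,
低饱和胶片色调,文艺清新,广角视野

Translation: Shanghai, the "Magic City", in a spring sea of flowers: along the Huangpu River, vast fields of tulips, cherry blossoms, and rapeseed flowers in full bloom, a riot of blossoms; warm spring sunshine; in the distance, the skyline of the Lujiazui trio of skyscrapers; the wet riverside promenade reflects the flowers; low-saturation film tones, artsy and fresh, wide-angle view.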

English example:

Cinematic photography, wide-angle shot of a bustling Tokyo street at night,
neon signs reflecting on wet pavement, people with transparent umbrellas,
moody atmospheric lighting, deep blues and vibrant reds, street photography,
shallow depth of field with bokeh background

Qwen Image Edit (I2I) — Concise Prompts

I2I prompts must be concise and direct. Keep the user's original language.

Rules:

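  • ✅ Positive prompt only; no negative prompts
  • ✅ Use the user's exact words (don't translate or expand)
  • ✅ Concise: "换件红色外套" (change into a red coat), "把背景换成蓝天白云" (replace the background with blue sky and white clouds), "将女孩换成男孩" (change the girl into a boy)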
  • ❌ Don't translate between languages
  • ❌ Don't over-explain or add details

Prompt routing fix (2026-04-22):

  • The Qwen workflow has TWO TextEncodeQwenImageEditPlus nodes:
    • 115:110 — empty negative prompt node
    • 115:111 — positive prompt node (contains default text like "the girl")
    • The script must route prompts to node 115:111 (positive), not 115:110
  • The prepare_i2i_workflow() function auto-detects by scanning for existing default text
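The detection described in the last bullet (the positive encoder is the one already holding default text, the negative one is empty) can be sketched as follows. This is not the skill's `prepare_i2i_workflow()`; `route_qwen_prompt` is a hypothetical name, and `text_key="prompt"` is an assumption about the encoder's string input name that should be checked against the exported workflow JSON.

```python
QWEN_ENCODER = "TextEncodeQwenImageEditPlus"


def route_qwen_prompt(workflow: dict, prompt: str, text_key: str = "prompt") -> str:
    """Write the user's prompt into the positive Qwen encoder and return its node id."""
    encoders = [nid for nid, n in workflow.items() if n.get("class_type") == QWEN_ENCODER]
    # positive node = the encoder that already carries default text (e.g. "the girl")
    positive = [nid for nid in encoders if str(workflow[nid]["inputs"].get(text_key, "")).strip()]
    if len(positive) != 1:
        raise ValueError(f"expected exactly one non-empty {QWEN_ENCODER} node, found {positive}")
    workflow[positive[0]]["inputs"][text_key] = prompt  # user's wording, untranslated
    return positive[0]
```

Failing loudly when zero or two candidates are found avoids silently writing the prompt into the negative encoder.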

Wan2.2 I2V (Image → Video)

Wan2.2 generates short videos (~5 seconds) from a static image + motion description.

Rules:

  • ✅ Prompt describes actions/movement (not scene description)
  • ✅ Write motion description in English for best results
  • ✅ Focus on "who does what" and "how the camera moves"
  • ❌ Don't describe static scene elements in motion prompt
  • Default: 81 frames (~5 s at 16 fps), 4 steps, CFG 4.5
  • Base resolution: 560×720 (3:4, fast and OK quality)
  • Auto-detect input image aspect ratio and select reference resolution:

Resolution Reference

| Aspect | Fast & OK | User Fav | WAN 2.2 Native |
|--------|-----------|----------|----------------|
| 3:4 | 560×720 | 720×912 | 848×1088 |
| 2:3 | 528×768 | 656×960 | 784×1136 |
| 9:16 | 480×848 | 608×1072 | 720×1264 |

Other available resolutions:

  • 3:4: 416×544, 672×864, 784×1008
  • 2:3: 384×576, 624×912, 736×1072
  • 9:16: 368×624, 576×1008, 672×1184
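Aspect auto-detection can be sketched as "pick the table row whose ratio is closest to the input image". The table above lists portrait sizes only, so this sketch assumes landscape inputs reuse the portrait entry with width and height swapped; the script itself may use its own landscape categories. `pick_wan_resolution` and `clip_seconds` are hypothetical names.

```python
from typing import Tuple

# Portrait (width, height) per tier, copied from the reference table above.
WAN_RES = {
    (3, 4): {"fast": (560, 720), "fav": (720, 912), "native": (848, 1088)},
    (2, 3): {"fast": (528, 768), "fav": (656, 960), "native": (784, 1136)},
    (9, 16): {"fast": (480, 848), "fav": (608, 1072), "native": (720, 1264)},
}
DEFAULT_FRAMES, FPS = 81, 16


def pick_wan_resolution(width: int, height: int, tier: str = "fast") -> Tuple[int, int]:
    """Pick the reference size whose aspect ratio is closest to the input image."""
    short_side, long_side = sorted((width, height))
    ratio = short_side / long_side
    key = min(WAN_RES, key=lambda r: abs(r[0] / r[1] - ratio))
    w, h = WAN_RES[key][tier]
    # Assumption: landscape inputs reuse the portrait entry with the sides swapped.
    return (h, w) if width > height else (w, h)


def clip_seconds(frames: int = DEFAULT_FRAMES, fps: int = FPS) -> float:
    return frames / fps  # 81 / 16 = 5.0625 s, the "~5 s" default
```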

Examples:

prompt: "the cat walks forward and looks at the camera, tail wagging"
prompt: "the girl smiles and turns her head, wind blowing her hair"
prompt: "the person stands in a busy street, camera pans left and slowly zooms in, cars driving, red flag fluttering"

CLI Usage

T2I (Text → Image)

# Z-Image (default model)
python3 image_with_comfyui.py t2i \
  --prompt "Your detailed image description" \
  --aspect 16:9 \
  --steps 9

# SD3.5 Medium
python3 image_with_comfyui.py sd35 \
  --prompt "A beautiful sunset over mountains" \
  --aspect 16:9 \
  --negative "text, watermark, blurry" \
  --steps 20 \
  --cfg 5.5

I2I (Edit Image)

python3 image_with_comfyui.py i2i \
  --prompt "Change background to a beach" \
  --image /path/to/source.jpg \
  --steps 4

I2V (Image → Video)

python3 image_with_comfyui.py wan2.2 \
  --prompt "the person walks forward and smiles" \
  --image /path/to/source.jpg \
  --length 81 --steps 4

Health Check

python3 image_with_comfyui.py test
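What the `test` subcommand checks is not spelled out here. An equivalent standalone reachability check might query ComfyUI's `GET /system_stats` endpoint and report the URL it tried, matching the "ComfyUI unreachable" rule under Additional Notes. `check_comfyui` is a hypothetical helper, and reading a `devices` list from the response is an assumption handled defensively with `.get`.

```python
import json
import urllib.request
from typing import Tuple


def check_comfyui(base_url: str = "http://localhost:8188", timeout: float = 5.0) -> Tuple[bool, str]:
    """Query GET /system_stats and return (ok, message) including the URL tried."""
    url = base_url.rstrip("/") + "/system_stats"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            stats = json.load(resp)
    except (OSError, ValueError) as exc:  # URLError and timeouts subclass OSError; bad JSON is ValueError
        return False, f"ComfyUI unreachable at {url}: {exc}"
    devices = len(stats.get("devices", []))
    return True, f"ComfyUI reachable at {url} ({devices} device(s) reported)"
```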

Output Delivery

Send the media attachment directly. Be minimal.

🚨 Delivery Rules

⚠️ Absolutely forbidden: Only writing text descriptions without actually sending files!

✅ Correct approach: Send MEDIA path with the appropriate prefix for the current Channel

| Channel | Format | Example |
|---------|--------|---------|
| WhatsApp | MEDIA:./image.jpg | MEDIA:./angel_video.mp4 |
| Telegram | MEDIA: or filePath: | Varies by implementation |
| Discord | Direct attachment | Varies by implementation |

Rules:

  1. When sending images/videos/audio on WhatsApp, must add MEDIA: prefix + relative path
  2. Place files in ./media/outbound/ directory to ensure accessibility
  3. Don't just write text descriptions like "video attachment" — that does not equal sending a file
  4. First cp the file to ~/.openclaw/media/outbound/, then send via MEDIA:
  5. Other Channels use their corresponding formats (see respective Channel docs)
  6. Default to sending to the requesting session: Generated images/videos must be sent back to the session that initiated the request, unless the user explicitly says "don't send" or "only save locally"
  7. Pure MEDIA line for WhatsApp — no text prefix: The MEDIA: line must be the sole content of the message, with no [[reply_to_current]], text, or anything else before it. Otherwise WhatsApp splits the text and attachment into two separate messages, making it look like "sent twice". If caption text is needed, use MEDIA:./file.ext caption=description format.
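Rules 2, 4, and 7 can be combined into one staging step: copy the file into `media/outbound/`, then emit a single MEDIA line with an optional caption. The sketch assumes MEDIA paths are resolved relative to a base directory (rule 4 suggests `~/.openclaw`), so the resulting path may differ from the short example further down; `stage_for_whatsapp` is a hypothetical name.

```python
import shutil
from pathlib import Path
from typing import Optional


def stage_for_whatsapp(src: str, base_dir: Path, caption: Optional[str] = None) -> str:
    """Copy src into <base_dir>/media/outbound/ and return a pure MEDIA line.

    The returned string must be the entire message body, with nothing before it.
    """
    outbound = base_dir / "media" / "outbound"
    outbound.mkdir(parents=True, exist_ok=True)
    dest = outbound / Path(src).name
    shutil.copy2(src, dest)
    line = f"MEDIA:./{dest.relative_to(base_dir).as_posix()}"
    if caption:
        line += f" caption={caption}"  # keeps text and file in one WhatsApp message
    return line
```

For example, `stage_for_whatsapp(path, Path.home() / ".openclaw")` would produce a line like `MEDIA:./media/outbound/<file name>`.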

Example delivery

WhatsApp:

MEDIA:./output_image.png

Universal:

[📎 Image attachment via MEDIA prefix]

Never replace actual file sending with text descriptions!


Timeout Reference

| Model | Timeout |
|-------|---------|
| T2I (Z-Image) | 100 s |
| SD3.5 Medium | 100 s |
| I2I (Qwen) | 600 s |
| I2V (Wan2.2) | 1000 s |

Additional Notes

  • ComfyUI unreachable → report error with URL tried
  • Generation fails (empty output) → report prompt_id for debugging
  • Missing required node → report which node was not found
  • Timeout → report elapsed time and prompt_id
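The error reports above fit a polling loop over ComfyUI's `GET /history/{prompt_id}`, which returns an empty object until the job finishes. This sketch stops early on `status_str == "error"` (the behavior the v1.4.0 changelog describes), distinguishes empty output from a timeout, and always includes the prompt_id. The fetch function, clock, and sleep are injected so the loop can be tested; names are hypothetical.

```python
import time
from typing import Callable


def wait_for_completion(fetch_history: Callable[[str], dict], prompt_id: str,
                        timeout: float, poll_interval: float = 3.0,
                        clock: Callable[[], float] = time.monotonic,
                        sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the prompt finishes, errors, or times out; return its outputs."""
    start = clock()
    while True:
        entry = fetch_history(prompt_id).get(prompt_id)
        if entry:
            status = entry.get("status", {})
            if status.get("status_str") == "error":
                raise RuntimeError(f"ComfyUI reported an error, prompt_id={prompt_id}")
            if entry.get("outputs"):
                return entry["outputs"]
            if status.get("completed"):
                raise RuntimeError(f"generation finished with empty output, prompt_id={prompt_id}")
        elapsed = clock() - start
        if elapsed >= timeout:
            raise TimeoutError(f"timed out after {elapsed:.0f}s, prompt_id={prompt_id}")
        sleep(poll_interval)
```

In practice `timeout` would come from the table above and `poll_interval` from `COMFYUI_POLL_INTERVAL`.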
Security Recommendations
This skill appears to be what it claims: a wrapper around a ComfyUI instance. Before installing, verify that COMFYUI_URL points to a host you control (preferably http://localhost:8188) so your uploaded images and prompts aren't sent to an untrusted external server. Check COMFYUI_OUTPUT_DIR and OPENCLAW_WORKSPACE to confirm where generated media and temporary files are stored. Be aware the skill will remember the most recent image for short-lived "image-first" edits (up to ~2 minutes) so the agent can perform edits without re-requesting the image — if you don't want that behavior, don't enable autonomous invocation or remove/adjust that logic. Finally, ensure your local ComfyUI installation (models and custom nodes) is trusted, since the skill will call that service and rely on its nodes/workflows.
Functional Analysis
Type: OpenClaw Skill Name: image-with-comfyui Version: 1.4.9 The skill bundle provides a legitimate interface for an AI agent to interact with a local ComfyUI instance for image and video generation. The core script (image_with_comfyui.py) includes robust security practices, specifically a sanitize_filename function that prevents path traversal attacks by validating filenames received from the ComfyUI API. The instructions in SKILL.md and README.md are focused on operational logic, error handling for missing nodes/models, and specific delivery formats (e.g., for WhatsApp) without any evidence of malicious prompt injection or unauthorized data access.
Capability Tags
crypto
Capability Assessment
Purpose & Capability
Name/description (ComfyUI T2I/I2I/I2V) aligns with required binaries (python3), env vars (COMFYUI_URL, timeouts, output dir, workspace), and included workflows and code. The NODE_PACKAGE_MAP and workflow files are consistent with ComfyUI usage.
Instruction Scope
SKILL.md instructs only ComfyUI-related actions (calling API, selecting workflows, saving and returning generated media) and to read config.json; it does direct the agent to remember the immediately preceding image for up to 2 minutes (image-first mode), which is within the skill's image-editing purpose. It does not instruct reading unrelated system files or secret stores.
Install Mechanism
No install spec is provided (instruction-only runtime + bundled script), so nothing is downloaded or installed by the skill itself. This minimizes install-time risk. The included code expects an existing ComfyUI installation and does not perform automatic git clones or remote installs.
Credentials
Requested env vars (COMFYUI_URL, COMFYUI_TIMEOUT, COMFYUI_POLL_INTERVAL, COMFYUI_OUTPUT_DIR, OPENCLAW_WORKSPACE) are appropriate for the stated functionality. Note: COMFYUI_URL is user-controlled — if set to a remote host rather than localhost, user media and prompts will be sent to that endpoint. No unrelated credentials or secrets are requested.
Persistence & Privilege
Skill is not always-included and uses normal agent invocation. It stores the recent image context for short-lived image-first behavior (up to 2 minutes) as described; it does not request persistent platform-wide privileges or alter other skills' configurations.
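How to Use
  1. Make sure OpenClaw is installed (local or Docker deployment)
  2. Enter the install command in the chat box: /install image-with-comfyui
  3. After installation, call the skill by name or trigger it with /image-with-comfyui
  4. Provide the required inputs as described in the skill's parameter notes to get structured output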
Version History
v1.4.9
Minor update
v1.4.8
Full English translation of Chinese content; removed internal lesson source notes; updated to 1.4.8
v1.4.7
Fixed print_result_video: local_paths was str list but code called .stat() expecting Path objects; added Path() conversion for reliable video output printing.
v1.4.6
Added rule: deliver generated media in the user's original request session/thread, not in a separate topic.
v1.4.5
Security:
  • Added sanitize_filename() to prevent path traversal attacks: filenames from the ComfyUI API are now validated against output_dir before being written. It rejects empty names, .., path separators, and any filename that resolves outside the output directory, and falls back to a timestamped name if unsafe.
  • Applied sanitization across all output paths: get_output_images(), get_output_videos(), save_b64_image(), save_b64_video().
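The checks listed for sanitize_filename() can be sketched as follows. This is an illustration of the described technique, not the skill's code: `safe_output_path` and the `comfyui_<timestamp>` fallback name are hypothetical. Resolving the candidate and requiring its parent to be the resolved output directory also catches symlink escapes.

```python
import time
from pathlib import Path


def safe_output_path(output_dir: str, name: str, suffix: str = ".png") -> Path:
    """Map a server-supplied filename to a path that stays inside output_dir."""
    base = Path(output_dir).resolve()
    unsafe = (not name or ".." in name
              or any(sep in name for sep in ("/", "\\", "\x00")))
    candidate = None if unsafe else (base / name).resolve()
    if candidate is None or candidate.parent != base:
        # fall back to a timestamped name instead of trusting the server
        candidate = base / f"comfyui_{time.strftime('%Y%m%d_%H%M%S')}{suffix}"
    return candidate
```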
v1.4.4
Added:
  • Declared required environment variables in SKILL.md metadata (COMFYUI_URL, COMFYUI_TIMEOUT, COMFYUI_POLL_INTERVAL, COMFYUI_OUTPUT_DIR, OPENCLAW_WORKSPACE) to resolve a registry warning about undocumented env vars
Fixed:
  • Fixed a typo in the ImpactKSamplerAdvancedBasicPipe node name in NODE_PACKAGE_MAP
v1.4.3
Changed:
  • Removed hardcoded git clone install commands from NODE_PACKAGE_MAP. The script no longer includes auto-executable install commands, since ComfyUI may be on a remote server not reachable from this agent. Install commands are now displayed as manual instructions in error output.
Added:
  • OPENCLAW_WORKSPACE environment variable documented in the configuration table
Removed:
  • test_all_workflows.py (removed in 1.4.1)
v1.4.2
Changed: Updated the default ComfyUI URL in documentation from a remote server to localhost:8188 to reflect a standard local installation
v1.4.1
Removed: Removed the test_all_workflows.py script, which contained hardcoded system-specific paths (user media paths, localhost defaults)
v1.4.0
Added:
  • wait_for_completion error detection: detects ComfyUI status_str: error and returns immediately instead of polling until timeout. Fixes 'No images found' false negatives.
  • print_result output: shows filename, size, and path on successful generation. Previously an empty function that left users uncertain.
  • Image-First Conversational Pattern: auto-routes image + text within 2 minutes to Qwen I2I or Wan2.2 I2V.
Fixed:
  • I2I no longer silently fails on ComfyUI backend errors; error details are now visible.
v1.3.0
Output dir auto-creation; I2V auto resolution detection (no more portrait-for-landscape); extended WAN2_2_RES with 16:9/1:1/4:3/3:2 categories
v1.2.0
Full English translation, system output localization, config URL updates, README docs, UnloadAllModels bypass, error handling, t2i/i2i/i2v workflows
Metadata
Slug image-with-comfyui
Version 1.4.9
License MIT-0
Total installs 1
Current installs 1
Version count 12
FAQ

What is Image with ComfyUI?

Call a local ComfyUI instance for text-to-image (T2I), image-to-image/edit (I2I), and image-to-video (I2V) generation. Supports Z-Image, SD3.5 Medium, Qwen I... It is an AI agent skill plugin for Claude Code / OpenClaw and has been downloaded 186 times so far.

How do I install Image with ComfyUI?

Run the command "/install image-with-comfyui" in the OpenClaw or Claude Code chat box for a one-step install; no extra configuration is required.

Is Image with ComfyUI free?

Yes. Image with ComfyUI is completely free under the MIT-0 license and can be freely downloaded, installed, and used.

Which platforms does Image with ComfyUI support?

Image with ComfyUI is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Image with ComfyUI?

It is developed and maintained by JnJn (@sunshinejnjn); the current version is v1.4.9.
