Description

Agent skill for running registered ComfyUI workflows through a stable CLI. Supports image, video, music, and speech generation on a local or trusted self-hos...

Usage Instructions (SKILL.md)

# comfyui-agent-skill-mie

Name: comfyui-agent-skill-mie
Author: miemieeeee

## Purpose

Run registered ComfyUI workflows through a stable Agent-facing CLI, with prompt enhancement, fail-fast errors, and structured JSON results.

Use this skill when the user asks to:

- Generate an image from text.
- Generate a new image inspired by a reference image.
- Edit an input image while preserving some structure or subject details.
- Generate text-to-video or image-to-video MP4 output.
- Generate music / instrumental / song-style MP3 output.
- Synthesize spoken voice audio with Qwen3-TTS.
- Check whether a ComfyUI server is available.

Do not use this skill when the user only wants prompt writing, brainstorming, or discussion without actual generation. Do not use it when the ComfyUI server is unavailable.

## Hard Rules

- Source mode: run CLI commands from the skill root (the directory containing `SKILL.md` and `scripts/`).
- Tool-install mode: `comfyui-agent-skill-mie` / `comfyui-skill` can be run from any directory.
- Source mode: use `uv run --no-sync python -m comfyui` (or `uv run --no-sync comfyui-skill`) for runtime calls.
- Tool-install mode: use `comfyui-skill` (or `comfyui-agent-skill-mie`) directly; do not wrap with `uv run`.
- Use registered workflows only. Do not run arbitrary unreviewed ComfyUI workflow JSON.
- If server health fails, stop generation and return/handle `SERVER_UNAVAILABLE`; do not search disk for ComfyUI installs or guess ports.
- Do not create or edit `config.local.json` unless the user explicitly wants a persistent server URL. For one-off runs, use `--server` or `COMFYUI_URL`.
- For `reference_to_image`, inspect the reference image with Agent vision and create a prompt. Do not upload that reference image to ComfyUI.
- For `image_to_image` and `image_to_video`, upload the provided local image with `--image`.
- Analyzer-generated workflow configs require human review before activation.

## Setup

Recommended install (tool-install mode):

```bash
pipx install comfyui-agent-skill-mie
```

- Install package: `comfyui-agent-skill-mie`
- Main command: `comfyui-agent-skill-mie`
- Short alias: `comfyui-skill`

Prerequisites:

- ComfyUI server with `GET /system_stats` available.
- Python 3.10+.
- Source mode only: `uv`.
- Required ComfyUI models/custom nodes for the selected workflow.

Networking note:

- Default local examples use `http://127.0.0.1:8188` for same-environment setups.
- If the agent runs inside WSL/container/sandbox while ComfyUI runs on the host OS, `127.0.0.1` may refer to the runtime itself. Try `--server http://localhost:8188` or the host machine IP (and optionally persist it via `save-server`).
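
The networking note comes down to checking `GET /system_stats` against each address the user has supplied. Below is a minimal Python sketch of that check, assuming the endpoint answers with a JSON body. `fetch_system_stats` and `first_reachable` are hypothetical helper names, not part of the skill's CLI, and the candidate list should hold only user-provided addresses, since the rules above forbid guessing ports.

```python
import http.client
import json
import urllib.request
from typing import Callable, Iterable, Optional


def fetch_system_stats(base_url: str, timeout: float = 5.0) -> dict:
    """GET <base_url>/system_stats and decode the body as JSON."""
    url = base_url.rstrip("/") + "/system_stats"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def first_reachable(
    candidates: Iterable[str],
    fetch: Callable[[str], dict] = fetch_system_stats,
) -> Optional[str]:
    """Return the first candidate whose /system_stats call succeeds, else None."""
    for base_url in candidates:
        try:
            fetch(base_url)
        except (OSError, ValueError, http.client.HTTPException):
            # URLError and socket timeouts are OSError; bad JSON is ValueError.
            continue
        return base_url
    return None
```

For example, `first_reachable(["http://127.0.0.1:8188", "http://192.168.1.100:8188"])` returns the first address in the list that responds, which can then be passed to `--server`.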

Initial setup from the skill root:

```bash
uv sync
uv run --no-sync python -m comfyui --help
```

Tool-install mode:

```bash
comfyui-agent-skill-mie --help
comfyui-agent-skill-mie check
comfyui-skill --help
comfyui-skill check
```

## Quick Workflow Choice

Minimal decision tree:

- User gives text only → `generate -p "..."` (defaults to `z_image_turbo`)
- User gives a reference image and wants a new similar image → vision → `reference_to_image` prompt → run `z_image_turbo`
- User gives an input image and wants edits → `generate --workflow klein_edit --image input_image=... -p "..."`
- User wants TTS / voice audio → `generate --workflow qwen3_tts --speech-text "..." --instruct "..."`
- User wants video → `ltx_23_t2v_distill` (text→video) or `ltx_23_i2v_distilled` (image→video)

| User intent | Workflow / mode | Required command shape |
|-------------|-----------------|------------------------|
| Text to image | `z_image_turbo` default | `generate -p "prompt"` |
| Qwen Image 2512 | `qwen_image_2512_4step` | `generate --workflow qwen_image_2512_4step -p "prompt"` |
| Similar image from reference | Agent vision + T2I | Read reference image, create English prompt, then T2I |
| Edit image | `klein_edit` | `generate --workflow klein_edit --image input_image=photo.png -p "edit prompt"` |
| Text to video | `ltx_23_t2v_distill` | `generate --workflow ltx_23_t2v_distill -p "shot prompt"` |
| Image to video | `ltx_23_i2v_distilled` | `generate --workflow ltx_23_i2v_distilled --image input_image=photo.png -p "motion prompt"` |
| Text to music | `ace_step_15_music` | `generate --workflow ace_step_15_music -p "music tags"` |
| Text to speech | `qwen3_tts` | `generate --workflow qwen3_tts --speech-text "..." --instruct "..."` |

For workflow-specific size rules, capability boundaries, and examples, read [references/workflows.md](references/workflows.md).
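
The command shapes in the table can be assembled programmatically. The Python sketch below builds the argument list for one table row using only the flags shown above. `build_generate_args`, `SOURCE_MODE`, `TOOL_MODE`, and `IMAGE_WORKFLOWS` are hypothetical names introduced for illustration, and the validation mirrors the documented command shapes rather than the CLI's actual argument parser.

```python
from typing import List, Optional

# Command prefixes for the two run modes (source vs. tool-install).
SOURCE_MODE = ["uv", "run", "--no-sync", "python", "-m", "comfyui"]
TOOL_MODE = ["comfyui-skill"]

# Workflows that the table shows with an --image input.
IMAGE_WORKFLOWS = {"klein_edit", "ltx_23_i2v_distilled"}


def build_generate_args(
    prompt: Optional[str] = None,
    workflow: Optional[str] = None,
    image: Optional[str] = None,
    speech_text: Optional[str] = None,
    instruct: Optional[str] = None,
) -> List[str]:
    """Build the `generate ...` argument list for one row of the table."""
    args = ["generate"]
    if workflow:
        args += ["--workflow", workflow]
    if workflow in IMAGE_WORKFLOWS:
        if not image:
            raise ValueError(f"{workflow} needs a local input image")
        args += ["--image", f"input_image={image}"]
    if workflow == "qwen3_tts":
        # TTS takes --speech-text and --instruct instead of a prompt.
        if not speech_text or not instruct:
            raise ValueError("qwen3_tts needs speech_text and instruct")
        args += ["--speech-text", speech_text, "--instruct", instruct]
    else:
        if not prompt:
            raise ValueError("a prompt is required")
        args += ["-p", prompt]
    return args
```

Prepending `TOOL_MODE` or `SOURCE_MODE` gives a list suitable for `subprocess.run`, for example `TOOL_MODE + build_generate_args(prompt="a landscape")`. The sketch does not cover `--preflight` runs, which take no prompt.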

## Core Commands

Environment doctor (check server + preflight registered workflows):

Tool-install mode:

```bash
comfyui-skill doctor
```

Source mode:

```bash
uv run --no-sync python -m comfyui doctor
```

If it exits with code 0, the environment is ready for all checked workflows. Exit code 1 means that nodes or models are missing or that the server is unreachable (see the JSON payload).
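
An agent wrapper might read the doctor result as follows. This is a minimal sketch that relies only on the exit-code convention described above. The payload's fields are not specified here, so it is passed through unparsed beyond JSON decoding. `run_doctor` and its `runner` parameter are hypothetical names.

```python
import json
import subprocess
from typing import Any, Callable, Dict, Sequence


def run_doctor(
    command: Sequence[str] = ("comfyui-skill", "doctor"),
    runner: Callable[..., Any] = subprocess.run,
) -> Dict[str, Any]:
    """Run the doctor command and summarise its exit code and JSON payload."""
    proc = runner(list(command), capture_output=True, text=True)
    try:
        payload = json.loads(proc.stdout)
    except (TypeError, ValueError):
        # Empty or non-JSON stdout: report the exit code with no payload.
        payload = None
    return {
        "ready": proc.returncode == 0,
        "exit_code": proc.returncode,
        "payload": payload,
    }
```

In source mode the same function can be called with `command=("uv", "run", "--no-sync", "python", "-m", "comfyui", "doctor")`.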

Health check:

```bash
uv run --no-sync python -m comfyui check
```

Tool-install mode:

```bash
comfyui-skill check
```

Generate an image:

```bash
uv run --no-sync python -m comfyui generate -p "a cute cat sitting on a windowsill at golden hour"
```

Tool-install mode:

```bash
comfyui-skill generate -p "a cute cat sitting on a windowsill at golden hour"
```

Generate with a specific workflow and server:

```bash
uv run --no-sync python -m comfyui generate --workflow z_image_turbo --server http://192.168.1.100:8188 -p "a landscape"
```

Save a persistent server URL only when the user asks for it:

```bash
uv run --no-sync python -m comfyui save-server http://192.168.1.100:8188
```

Preflight a workflow before a long run:

```bash
uv run --no-sync python -m comfyui generate --workflow qwen_image_2512_4step --preflight
```

Show progress for long jobs:

```bash
uv run --no-sync python -m comfyui generate --workflow ltx_23_t2v_distill -p "cinematic waves at sunset, slow pan" --progress
```

Full CLI options, output path behavior, async submit/poll, and error code details are in [references/cli.md](references/cli.md).

## Prompt Enhancement

Before generation, convert the user's intent into the right workflow inputs.

| Type | Read this file | Use when |
|------|----------------|----------|
| `character` | [references/prompt_enhancement/character.md](references/prompt_enhancement/character.md) | Portrait, person, character, figure photo |
| `reference_to_image` | [references/prompt_enhancement/reference_to_image.md](references/prompt_enhancement/reference_to_image.md) | User provides a reference image and wants a new similar image |
| `image_to_image` | [references/prompt_enhancement/image_to_image.md](references/prompt_enhancement/image_to_image.md) | User provides an input image and wants to edit it |
| `text_to_speech` | [references/prompt_enhancement/text_to_speech.md](references/prompt_enhancement/text_to_speech.md) | User gives a short voice description and needs full Qwen3-TTS instruction |

Reference-to-image flow:

1. Ensure a usable reference image exists; otherwise return Agent error `NO_REFERENCE_IMAGE`.
2. Ensure this runtime can inspect images; otherwise return Agent error `VISION_UNAVAILABLE`.
3. Read the reference prompt enhancement file and create one English prompt.
4. Call T2I generation with that prompt. Do not pass the reference image to ComfyUI.
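
Steps 1 and 2 of the reference-to-image flow are Agent-side checks that run before any CLI call. The Python sketch below covers them, assuming the reference image is a local file path and that the runtime can report whether vision is available. `precheck_reference`, `agent_error`, and the `exists` parameter are hypothetical names. The returned dictionary follows the Agent-only error shape documented under Fail-Fast and Recovery.

```python
import os
from typing import Callable, Optional


def agent_error(code: str, message: str) -> dict:
    """Build an Agent-only error object (source, success, error.code, error.message)."""
    return {
        "source": "agent",
        "success": False,
        "error": {"code": code, "message": message},
    }


def precheck_reference(
    image_path: Optional[str],
    vision_available: bool,
    exists: Callable[[str], bool] = os.path.isfile,
) -> Optional[dict]:
    """Return an Agent error dict if a pre-check fails, or None when both pass."""
    if not image_path or not exists(image_path):
        return agent_error("NO_REFERENCE_IMAGE", "No usable reference image was provided.")
    if not vision_available:
        return agent_error("VISION_UNAVAILABLE", "Cannot read reference image in this runtime.")
    return None
```

When the function returns `None`, the Agent can continue with steps 3 and 4. Otherwise it emits the dictionary as JSON and does not call the CLI.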

Image-to-image flow:

1. Ensure a local image path is available.
2. Read the image edit prompt enhancement file.
3. Call `klein_edit` with `--image input_image=path`.

Text-to-speech flow:

1. Split user intent into spoken content and voice/style instruction.
2. Read the TTS prompt enhancement file.
3. Call `qwen3_tts` with `--speech-text` and `--instruct`; do not use positional prompt.

## Fail-Fast and Recovery

CLI failures are structured JSON on stdout. Agent-only pre-check failures for reference images should also be JSON.

Agent-only error shape:

```json
{
  "source": "agent",
  "success": false,
  "error": {"code": "VISION_UNAVAILABLE", "message": "Cannot read reference image in this runtime."}
}
```

Required fail-fast behavior:

- Missing prompt: return/handle `EMPTY_PROMPT`.
- Unregistered workflow: return/handle `WORKFLOW_NOT_REGISTERED`.
- Server unavailable: return/handle `SERVER_UNAVAILABLE` and ask whether ComfyUI is running locally or on another machine.
- Missing reference image before `reference_to_image`: return Agent error `NO_REFERENCE_IMAGE`; do not call CLI.
- No vision for `reference_to_image`: return Agent error `VISION_UNAVAILABLE`; do not call CLI.
- Missing image for image workflows: return/handle `NO_INPUT_IMAGE` or `INPUT_IMAGE_NOT_FOUND`.
- Missing custom nodes/models during preflight: return/handle `PREFLIGHT_MISSING_NODES` or `PREFLIGHT_MISSING_MODELS`.
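
The Python sketch below shows one way to route a failed result to a next step. It assumes that CLI failures carry `success` and `error.code` in the same shape as the Agent-only error above. The authoritative schema is in `references/cli.md`, so treat that as an assumption. `failure_code`, `next_step`, and the returned step labels are hypothetical names introduced for illustration.

```python
import json
from typing import Optional

# Error codes named in the fail-fast list.
KNOWN_CODES = {
    "EMPTY_PROMPT",
    "WORKFLOW_NOT_REGISTERED",
    "SERVER_UNAVAILABLE",
    "NO_REFERENCE_IMAGE",
    "VISION_UNAVAILABLE",
    "NO_INPUT_IMAGE",
    "INPUT_IMAGE_NOT_FOUND",
    "PREFLIGHT_MISSING_NODES",
    "PREFLIGHT_MISSING_MODELS",
}


def failure_code(stdout_text: str) -> Optional[str]:
    """Return error.code from a failed JSON result, or None when success is true."""
    result = json.loads(stdout_text)
    if result.get("success"):
        return None
    return (result.get("error") or {}).get("code")


def next_step(code: Optional[str]) -> str:
    """Map an error code to an illustrative next-step label."""
    if code is None:
        return "present_result"
    if code == "SERVER_UNAVAILABLE":
        # Stop generation and ask where ComfyUI is running.
        return "ask_where_comfyui_runs"
    if code in KNOWN_CODES:
        return "report_error"
    return "report_unknown_error"
```

Keeping the code lookup separate from the routing means the routing table can change without touching the JSON parsing.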

When the user provides a remote ComfyUI address, save it only if they want persistence:

```bash
uv run --no-sync python -m comfyui save-server http://<address>:<port>
```

Otherwise retry the original command with `--server http://<address>:<port>`.

## Output Handling

After successful generation, present the result to the user. Do not silently parse JSON and stop.

- For images, display the file when the runtime supports local image display; otherwise provide the absolute/local path from `outputs[].path`.
- For MP3/MP4, provide the path or use the runtime's media display/playback capability when available.
- Prefer omitting `--output`; the CLI writes to a per-job directory under `results/` and returns exact paths in JSON.
- For `--count > 1`, parse the wrapper object and present each result.
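
The Python sketch below turns a single-result JSON payload into items to present. It relies only on the `outputs[].path` field named above. Treating every non-MP3/MP4 file as an image is an assumption of this sketch, `describe_outputs` is a hypothetical name, and the `--count > 1` wrapper object is not handled because its schema is documented in `references/cli.md` rather than here.

```python
import json
import os
from typing import List, Tuple


def describe_outputs(stdout_text: str) -> List[Tuple[str, str]]:
    """Return (kind, path) pairs for each entry in outputs[] that has a path."""
    result = json.loads(stdout_text)
    described = []
    for item in result.get("outputs", []):
        path = item.get("path")
        if not path:
            continue
        suffix = os.path.splitext(path)[1].lower()
        if suffix == ".mp3":
            kind = "audio"
        elif suffix == ".mp4":
            kind = "video"
        else:
            # Assumption: anything else is an image to display.
            kind = "image"
        described.append((kind, path))
    return described
```

The Agent can then display entries tagged `image` when the runtime supports it, and report the path for `audio` and `video` entries.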

See [references/cli.md](references/cli.md) for JSON schemas and output directory rules.

## References

- [references/workflows.md](references/workflows.md): workflow selection, capabilities, size rules, examples.
- [references/cli.md](references/cli.md): CLI contract, async jobs, output paths, JSON schemas, error codes.
- `references/prompt_enhancement/`: prompt enhancement instructions.

Security Recommendations

Before installing, verify the package source, use only a local or trusted ComfyUI server, avoid unreviewed workflow JSON, and only persist a server URL or choose output directories when you explicitly want that behavior.

Functional Analysis

Type: OpenClaw Skill
Name: comfyui-agent-skill-mie
Version: 1.0.0

The comfyui-agent-skill-mie bundle is a well-structured tool that lets an AI agent interact with a ComfyUI server for image, video, and audio generation. The code implements a "registered workflow" system that prevents the execution of arbitrary, unreviewed ComfyUI graphs, which significantly reduces the risk of server-side exploitation. It includes "preflight" and "doctor" checks (scripts/comfyui/preflight.py, scripts/comfyui/cli_admin.py) that validate server capabilities before submission. The SKILL.md instructions are defensive, explicitly forbidding the agent from guessing ports or running unreviewed JSON. No evidence of data exfiltration, malicious persistence, or unauthorized remote execution was found.

Capability Tags

crypto

Capability Assessment

ℹ Purpose & Capability

The stated purpose, packaged workflows, and CLI instructions align around generating images, video, music, and speech through registered ComfyUI workflows; the sensitive part is that user prompts and selected image inputs may be sent to the configured ComfyUI server.

✓ Instruction Scope

The instructions include clear containment rules: use registered workflows only, stop when server health fails, avoid arbitrary workflow JSON, and require human review before activating analyzer-generated workflow configs.

ℹ Install Mechanism

The registry says there is no install spec, while SKILL.md recommends user-directed pipx/uv installation of a CLI package. This is purpose-aligned, but users should verify the package/provenance before installing executable code.

ℹ Credentials

Network access to a local or trusted self-hosted ComfyUI endpoint, local media uploads, and output file writes are proportionate to the generation purpose, provided the server and output paths are user-approved.

ℹ Persistence & Privilege

The skill supports persistent server configuration only when explicitly requested and stores generated outputs/jobs locally; no credential or account authority is declared.

Version History

v1.0.0

comfyui-agent-skill-mie 1.0.0

- Initial release of the skill for running registered ComfyUI workflows via a stable CLI.
- Supports image, video, music, and speech generation using a local or trusted self-hosted ComfyUI server.
- Includes prompt enhancement, fail-fast error handling, and structured JSON results.
- Enforces strict usage of registered workflows only; arbitrary workflow JSON is not supported.
- Provides clear hard rules, recommended install steps, core commands, and workflow selection guidance in documentation.

Metadata

Slug comfyui-agent-skill-mie

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Number of versions 1

FAQ

What is comfyui-agent-skill-mie?

Agent skill for running registered ComfyUI workflows through a stable CLI. Supports image, video, music, and speech generation on a local or trusted self-hos... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 36 cumulative downloads so far.

How do I install comfyui-agent-skill-mie?

Run the command "/install comfyui-agent-skill-mie" in the OpenClaw or Claude Code chat box to install it in one step, with no extra configuration.

Is comfyui-agent-skill-mie free?

Yes. comfyui-agent-skill-mie is completely free, is released under the MIT-0 license, and can be downloaded, installed, and used freely.

Which platforms does comfyui-agent-skill-mie support?

comfyui-agent-skill-mie is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed comfyui-agent-skill-mie?

It is developed and maintained by 黎黎原上咩 (@miemieeeee). The current version is v1.0.0.
