Description

Generate images using multiple AI models — Midjourney (via Legnext.ai), Flux, Nano Banana Pro (Gemini), Ideogram, Recraft, and more via fal.ai. Intelligently...

Usage Instructions (SKILL.md)

Image Generation Skill

Name: Image Gen
Author: wells1137

This skill generates images using the best AI model for each use case. Model selection is the most important decision — read the dispatch logic carefully before generating.

🧠 Intelligent Dispatch Logic

Always select the model based on the user's actual need, not just the request surface.

Decision Tree

Does the request involve MULTIPLE images that share characters, scenes, or story continuity?
  ├─ YES → Use NANO BANANA (Gemini)
  │         Reason: Gemini understands context holistically; supports reference_images
  │         for character/scene consistency across a series (storyboard, comic, sequence)
  │
  └─ NO → Is it a SINGLE standalone image?
            ├─ Artistic / cinematic / painterly / highly detailed?
            │   → Use MIDJOURNEY
            │
            ├─ Photorealistic / portrait / product photo?
            │   → Use FLUX PRO
            │
            ├─ Contains TEXT (logo, poster, sign, infographic)?
            │   → Use IDEOGRAM
            │
            ├─ Vector / icon / flat design / brand asset?
            │   → Use RECRAFT
            │
            ├─ Quick draft / fast iteration (speed priority)?
            │   → Use FLUX SCHNELL (<2s)
            │
            └─ General purpose / balanced?
                → Use FLUX DEV
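
The decision tree above can be sketched as a small function. This is only an illustration: `selectModel` and the shape of its `request` argument (`isSeries`, `kind`) are hypothetical and not part of generate.js, and the sketch assumes the request has already been classified by the agent.

```javascript
// Hypothetical helper, not part of generate.js: maps a pre-classified
// request to one of the model IDs listed in this document.
function selectModel(request) {
  // Multiple images sharing characters, scenes, or story continuity
  // take priority over every single-image branch.
  if (request.isSeries) return "nano-banana";

  switch (request.kind) {
    case "artistic":       return "midjourney";   // cinematic, painterly, highly detailed
    case "photorealistic": return "flux-pro";     // portraits, product photos
    case "text":           return "ideogram";     // logos, posters, signs, infographics
    case "vector":         return "recraft";      // icons, flat design, brand assets
    case "draft":          return "flux-schnell"; // speed priority
    default:               return "flux-dev";     // general purpose / balanced
  }
}

console.log(selectModel({ isSeries: true, kind: "artistic" })); // nano-banana
console.log(selectModel({ isSeries: false, kind: "text" }));    // ideogram
console.log(selectModel({ isSeries: false }));                  // flux-dev
```

Checking `isSeries` first mirrors the tree: continuity across images outranks any per-image style preference.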

Model Capability Matrix

Model	ID	Artistic	Photorealism	Text	Context Continuity	Speed	Cost
Midjourney	`midjourney`	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐	❌ (no context)	~30s	~$0.05
Nano Banana Pro	`nano-banana`	⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐⭐	~20s	$0.15
Flux Pro	`flux-pro`	⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐	❌	~5s	~$0.05
Flux Dev	`flux-dev`	⭐⭐⭐	⭐⭐⭐⭐	⭐⭐	❌	~8s	~$0.03
Flux Schnell	`flux-schnell`	⭐⭐	⭐⭐⭐	⭐⭐	❌	<2s	~$0.003
Ideogram v3	`ideogram`	⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐⭐	❌	~10s	~$0.08
Recraft v3	`recraft`	⭐⭐⭐⭐	⭐⭐	⭐⭐⭐⭐	❌	~8s	~$0.04
SDXL Lightning	`sdxl`	⭐⭐⭐	⭐⭐⭐	⭐⭐	❌	~3s	~$0.01
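
To get a rough budget before starting a multi-call job, the Cost column can be turned into a lookup. The sketch below assumes each figure is an approximate cost per generation call, which the table does not state explicitly; `APPROX_COST_USD` and `estimateCostUsd` are hypothetical names used only for illustration.

```javascript
// Approximate costs copied from the capability matrix.
// Assumption: each value is USD per generation call.
const APPROX_COST_USD = {
  "midjourney": 0.05,
  "nano-banana": 0.15,
  "flux-pro": 0.05,
  "flux-dev": 0.03,
  "flux-schnell": 0.003,
  "ideogram": 0.08,
  "recraft": 0.04,
  "sdxl": 0.01,
};

// Hypothetical helper: rough cost of `calls` generations with one model.
function estimateCostUsd(model, calls) {
  if (!Object.hasOwn(APPROX_COST_USD, model)) {
    throw new Error(`Unknown model ID: ${model}`);
  }
  // Round to 3 decimals to hide floating-point noise.
  return Number((APPROX_COST_USD[model] * calls).toFixed(3));
}

console.log(estimateCostUsd("nano-banana", 6));   // 0.9
console.log(estimateCostUsd("flux-schnell", 10)); // 0.03
```

For example, a six-frame storyboard on Nano Banana comes to roughly $0.90 under this assumption.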

When to Use Nano Banana (Critical)

Use Nano Banana whenever the user's request involves:

Storyboard / 分镜图: Multiple frames that tell a story with the same characters
Comic strip / 漫画: Sequential panels with consistent characters
Character series: Multiple images of the same person/character in different poses or scenes
Scene continuation: "Now show the same girl in the forest" (referencing a previous image)
Style consistency: A set of images that must share the same visual style/world

Nano Banana uses Google's Gemini 3 Pro multimodal architecture, which understands context holistically rather than keyword-matching. It supports up to 14 reference images for maintaining character and scene consistency.
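
Before passing references to the script, it may help to check the list against the 14-image limit. The sketch below is an assumption-laden illustration: `parseReferenceImages` is a hypothetical helper, and whether generate.js enforces the limit itself is not stated in this document.

```javascript
// Limit stated above for Nano Banana reference images.
const MAX_REFERENCE_IMAGES = 14;

// Hypothetical helper: splits a comma-separated URL list, drops empty
// entries, and rejects malformed URLs or lists longer than the limit.
function parseReferenceImages(csv) {
  const urls = csv
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  if (urls.length > MAX_REFERENCE_IMAGES) {
    throw new RangeError(
      `Got ${urls.length} reference images; the limit is ${MAX_REFERENCE_IMAGES}`
    );
  }
  for (const u of urls) {
    const { protocol } = new URL(u); // throws TypeError on a malformed URL
    if (protocol !== "http:" && protocol !== "https:") {
      throw new TypeError(`Unsupported URL scheme: ${u}`);
    }
  }
  return urls;
}

console.log(parseReferenceImages("https://a.example/1.png, https://a.example/2.png").length); // 2
```

Note that a URL containing a literal comma would be split incorrectly; such URLs would need to be percent-encoded first.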

How to Use This Skill

Analyze the request: Is it a single image or a series? Does it need context continuity?
Select model: Use the decision tree above.
Enhance the prompt: Add style, lighting, and quality descriptors appropriate for the model.
Inform the user: Tell them which model you're using and why, and that generation has started.
Run the script: Use exec tool with sufficient timeout.
Deliver the result: Send image URL(s) to the user.

Calling the Generation Script

node {baseDir}/generate.js \
  --model <model_id> \
  --prompt "<enhanced prompt>" \
  [--aspect-ratio <ratio>] \
  [--num-images <1-4>] \
  [--negative-prompt "<negative prompt>"] \
  [--reference-images "<url1,url2,...>"] \
  [--mode <turbo|fast|relax>]

Parameters:

--model: One of midjourney, flux-pro, flux-dev, flux-schnell, sdxl, nano-banana, ideogram, recraft
--prompt: The image generation prompt (required)
--aspect-ratio: e.g. 16:9, 1:1, 9:16, 4:3, 3:4 (default: 1:1)
--num-images: 1-4 (default: 1; Midjourney always returns 4 regardless)
--negative-prompt: Things to avoid (not supported by Midjourney)
--reference-images: Comma-separated image URLs for context/character consistency (Nano Banana only)
--mode: Midjourney speed: turbo (default, ~20-40s), fast (~30-60s), relax (free but slow)

exec timeout: Set at least 120 seconds for Midjourney and Nano Banana; 30 seconds is sufficient for Flux Schnell.
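
The parameter rules and timeout guidance above can be sketched as two helpers. `buildArgs` and `timeoutMs` are hypothetical and not part of generate.js; the 120-second value for models the text does not mention is an assumption chosen to be conservative.

```javascript
// Hypothetical helper: builds the argument list for `node generate.js`,
// dropping flags that the parameter list says a model does not support.
function buildArgs(scriptPath, opts) {
  const args = [scriptPath, "--model", opts.model, "--prompt", opts.prompt];
  if (opts.aspectRatio) args.push("--aspect-ratio", opts.aspectRatio);
  if (opts.numImages) args.push("--num-images", String(opts.numImages));
  // Negative prompts are not supported by Midjourney.
  if (opts.negativePrompt && opts.model !== "midjourney") {
    args.push("--negative-prompt", opts.negativePrompt);
  }
  // Reference images are Nano Banana only.
  if (opts.referenceImages?.length && opts.model === "nano-banana") {
    args.push("--reference-images", opts.referenceImages.join(","));
  }
  // --mode selects the Midjourney speed tier.
  if (opts.mode && opts.model === "midjourney") args.push("--mode", opts.mode);
  return args;
}

// Hypothetical helper: exec timeout in milliseconds. 30 s for Flux Schnell,
// 120 s for Midjourney and Nano Banana; 120 s for every other model is an
// assumption, since the text gives no figure for them.
function timeoutMs(model) {
  return model === "flux-schnell" ? 30_000 : 120_000;
}

console.log(buildArgs("generate.js", {
  model: "midjourney",
  prompt: "a cyberpunk cat",
  aspectRatio: "16:9",
  negativePrompt: "blurry", // dropped for Midjourney
}));
console.log(timeoutMs("flux-schnell")); // 30000
```

Passing the arguments as an array, for instance to Node's `child_process.execFile("node", args, { timeout })`, avoids shell quoting problems with prompts that contain quotes or spaces.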

⚡ Midjourney Workflow (Sync Mode — No --async)

Always use sync mode (no --async). The script waits internally until complete.

node {baseDir}/generate.js \
  --model midjourney \
  --prompt "<enhanced prompt>" \
  --aspect-ratio 16:9

Understanding Midjourney Output

{
  "success": true,
  "model": "midjourney",
  "jobId": "xxxxxxxx-...",
  "imageUrl": "https://cdn.legnext.ai/temp/....png",
  "imageUrls": [
    "https://cdn.legnext.ai/mj/xxxx_0.png",
    "https://cdn.legnext.ai/mj/xxxx_1.png",
    "https://cdn.legnext.ai/mj/xxxx_2.png",
    "https://cdn.legnext.ai/mj/xxxx_3.png"
  ]
}

CRITICAL — image field meanings:

Field	What it is	When to use
`imageUrl`	A 2×2 grid composite of all 4 images	Send as preview so user can see all options
`imageUrls[0]`	Image 1 (top-left)	Send when user wants image 1
`imageUrls[1]`	Image 2 (top-right)	Send when user wants image 2
`imageUrls[2]`	Image 3 (bottom-left)	Send when user wants image 3
`imageUrls[3]`	Image 4 (bottom-right)	Send when user wants image 4

"放大第N张" ("upscale image N") / "要第N张" ("I want image N") / "give me image N": send imageUrls[N-1] directly. Do NOT call generate.js again.

Midjourney Interaction Flow

After generation:

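🎨 Generation complete! Here is a preview of all 4 images: [preview image] Which one do you like? Reply 1, 2, 3, or 4 and I'll send you that image on its own in high resolution.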

When user picks image N:

Here is image N on its own in high resolution: [image N]
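
The rule that image N maps to `imageUrls[N-1]` can be sketched as follows. `pickMidjourneyImage` is a hypothetical helper, and the sample object reuses the placeholder URLs from the output example rather than real ones.

```javascript
// Hypothetical helper: returns the single-image URL for the user's
// 1-based choice, without calling generate.js again.
function pickMidjourneyImage(result, n) {
  const count = result.imageUrls.length;
  if (!Number.isInteger(n) || n < 1 || n > count) {
    throw new RangeError(`Image number must be between 1 and ${count}`);
  }
  return result.imageUrls[n - 1];
}

// Placeholder result shaped like the Midjourney output shown earlier.
const sampleResult = {
  success: true,
  model: "midjourney",
  imageUrl: "https://cdn.legnext.ai/temp/....png", // 2x2 grid preview
  imageUrls: [
    "https://cdn.legnext.ai/mj/xxxx_0.png",
    "https://cdn.legnext.ai/mj/xxxx_1.png",
    "https://cdn.legnext.ai/mj/xxxx_2.png",
    "https://cdn.legnext.ai/mj/xxxx_3.png",
  ],
};

console.log(pickMidjourneyImage(sampleResult, 2)); // https://cdn.legnext.ai/mj/xxxx_1.png
```

Keeping the result of the original generation around is what makes the follow-up request free: no second call is needed.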

🤖 Nano Banana (Gemini) Workflow

Use for storyboards, character series, and any context-dependent multi-image generation.

Single image (no reference)

node {baseDir}/generate.js \
  --model nano-banana \
  --prompt "<detailed scene description>" \
  --aspect-ratio 16:9

With reference images (character/scene consistency)

node {baseDir}/generate.js \
  --model nano-banana \
  --prompt "<scene description, referencing the character/style from the reference images>" \
  --aspect-ratio 16:9 \
  --reference-images "https://url-of-previous-image-1.png,https://url-of-previous-image-2.png"

How to build a storyboard series:

Generate the first frame without reference images (establishes the character/scene)
Use the first frame's URL as --reference-images for the second frame
For subsequent frames, use the most recent 1-3 images as references to maintain consistency
Keep the character description consistent across all prompts
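
The reference-selection rule in the steps above can be sketched as a one-line helper. `referencesForFrame` is a hypothetical name, and the default cap of 3 simply takes the upper end of the "most recent 1-3 images" guidance.

```javascript
// Hypothetical helper: given the URLs of frames generated so far (oldest
// first), returns the most recent `maxRefs` to pass as --reference-images.
// An empty result means "generate without references" (the first frame).
function referencesForFrame(previousUrls, maxRefs = 3) {
  return previousUrls.slice(-maxRefs);
}

const frames = ["f1.png", "f2.png", "f3.png", "f4.png"]; // placeholder names
console.log(referencesForFrame([]));               // []
console.log(referencesForFrame(frames.slice(0, 1))); // [ 'f1.png' ]
console.log(referencesForFrame(frames));           // [ 'f2.png', 'f3.png', 'f4.png' ]
```

Using a sliding window keeps the reference list short as the series grows, at the cost of gradually dropping the earliest frames; pinning the first frame as a permanent reference is a possible variation.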

Example storyboard workflow:

Frame 1: node generate.js --model nano-banana --prompt "A young girl with red hair, wearing a blue dress, sitting under a magical treehouse in an enchanted forest, warm golden light, storybook illustration style" --aspect-ratio 16:9

Frame 2: node generate.js --model nano-banana --prompt "The same red-haired girl in blue dress climbing the rope ladder up to the treehouse, excited expression, enchanted forest background, same storybook illustration style" --aspect-ratio 16:9 --reference-images "<frame1_url>"

Frame 3: node generate.js --model nano-banana --prompt "Inside the magical treehouse, the red-haired girl discovers a glowing book on a wooden shelf, wonder on her face, warm candlelight, same storybook illustration style" --aspect-ratio 16:9 --reference-images "<frame1_url>,<frame2_url>"

Nano Banana Output

{
  "success": true,
  "model": "nano-banana",
  "images": ["https://v3b.fal.media/files/...png"],
  "imageUrl": "https://v3b.fal.media/files/...png"
}

Send imageUrl directly to the user (no grid, single image).

Other Models

Flux Pro / Dev / Schnell

Best for photorealistic standalone images. The output format is the same as Nano Banana's (a single imageUrl).

node {baseDir}/generate.js --model flux-pro --prompt "<prompt>" --aspect-ratio 16:9

Ideogram v3

Best for images containing text (logos, posters, signs).

node {baseDir}/generate.js --model ideogram --prompt "A motivational poster with text 'DREAM BIG' in bold typography, sunset gradient background" --aspect-ratio 3:4

Recraft v3

Best for vector-style, icons, flat design.

node {baseDir}/generate.js --model recraft --prompt "A minimal flat design app icon, blue gradient, abstract geometric shape" --aspect-ratio 1:1

Prompt Enhancement Tips

For Midjourney: Add cinematic lighting, ultra detailed, --v 7, --style raw. Legnext supports all MJ parameters.

For Nano Banana: Use natural language descriptions. Describe the character consistently across frames (hair color, clothing, expression). Mention "same style as reference" or "consistent with previous frame".

For Flux: Add masterpiece, highly detailed, sharp focus, professional photography, 8k.

For Ideogram: Be explicit about text content, font style, layout, and color scheme.

For Recraft: Specify vector illustration, flat design, icon style, minimal.
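
As a rough illustration of the tips above, the keyword-style descriptors can be appended mechanically. `STYLE_SUFFIX` and `enhancePrompt` are hypothetical; the sketch deliberately adds nothing for Nano Banana and Ideogram, since their tips call for rewriting the description rather than appending keywords.

```javascript
const FLUX_SUFFIX =
  ", masterpiece, highly detailed, sharp focus, professional photography, 8k";

// Descriptors taken from the tips above; hypothetical lookup table.
const STYLE_SUFFIX = {
  "midjourney": ", cinematic lighting, ultra detailed --v 7 --style raw",
  "flux-pro": FLUX_SUFFIX,
  "flux-dev": FLUX_SUFFIX,
  "flux-schnell": FLUX_SUFFIX,
  "recraft": ", vector illustration, flat design, icon style, minimal",
};

// Hypothetical helper: appends model-appropriate descriptors to a prompt.
function enhancePrompt(model, prompt) {
  const suffix = Object.hasOwn(STYLE_SUFFIX, model) ? STYLE_SUFFIX[model] : "";
  return prompt.trim() + suffix;
}

console.log(enhancePrompt("recraft", "app icon"));
// app icon, vector illustration, flat design, icon style, minimal
```

In practice the descriptors should be adapted to the request; appending "professional photography" to a prompt for a painting, for instance, would work against the user's intent.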

Example Conversations

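User: "Draw me a cyberpunk cat" → Single artistic image → Midjourney → Tell user "🎨 Generating with Midjourney, about 30 seconds..." → Send grid preview, ask which one they want

User: "Generate a set of storyboard frames telling the story of a girl's adventure in a magical forest" → Multiple frames with story continuity → Nano Banana → Tell user "🎨 Storyboards with shared context like this are generated with Gemini, which keeps characters consistent..." → Generate frame by frame, using previous frames as reference images

User: "要第2张" ("I want image 2") / "放大第2张" ("upscale image 2") (after Midjourney generation) → Send imageUrls[1] directly. No need to call generate.js again.

User: "Make an app icon, flat style in blue tones" → Vector/icon → Recraft

User: "Generate a door sign image with the text '欢迎光临' (Welcome)" → Text in image → Ideogram

User: "Quickly generate a draft so I can see how it looks" → Speed priority → Flux Schnell (<2s)

User: "Generate a product poster: white background, one bottle of perfume" → Photorealistic product → Flux Pro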

Environment Variables

Variable	Description
`FAL_KEY`	fal.ai API key (for Flux, Nano Banana, Ideogram, Recraft)
`LEGNEXT_KEY`	Legnext.ai API key (for Midjourney)
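
A pre-flight check for the right key can fail fast before any API call is made. The helpers below are hypothetical and not part of generate.js; treating `sdxl` as a fal.ai model is an assumption based on the v1.0.0 release note, since the table above does not list it.

```javascript
// Hypothetical helper: which environment variable a model needs.
// Midjourney goes through Legnext.ai; everything else is assumed to
// go through fal.ai.
function requiredKeyFor(model) {
  return model === "midjourney" ? "LEGNEXT_KEY" : "FAL_KEY";
}

// Hypothetical helper: returns the name of the missing variable, or null
// if the required key is set. `env` defaults to the real environment.
function missingKey(model, env = process.env) {
  const name = requiredKeyFor(model);
  return env[name] ? null : name;
}

console.log(missingKey("midjourney", { FAL_KEY: "placeholder" })); // LEGNEXT_KEY
console.log(missingKey("flux-pro", { FAL_KEY: "placeholder" }));   // null
```

Accepting `env` as a parameter keeps the check testable without touching real credentials.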

Security Recommendations

This skill appears internally consistent, but review and consider the following before installing:

1) It will send your prompts and any reference image URLs to fal.ai and api.legnext.ai; do not include private or sensitive images or secrets in prompts.
2) You must supply FAL_KEY and LEGNEXT_KEY; store them securely and use least-privilege keys if possible.
3) The installer pulls @fal-ai/client from npm (normal npm risk); consider auditing that dependency or running in a sandbox.
4) Ensure Node (>=18) is available where the skill will run.
5) Be aware of usage costs and rate limits for fal.ai and Legnext/Midjourney.
6) If you require strict data residency or audit trails, verify how generated images and prompts are stored/served (e.g., Legnext CDN URLs shown in responses).

Overall, the skill matches its stated purpose, but treat external API keys and uploaded reference images as sensitive.

Functional Analysis

Type: OpenClaw Skill
Name: image-gen
Version: 2.0.1

The image-gen skill is a legitimate tool for generating images via fal.ai and Legnext.ai (Midjourney). The code in generate.js correctly handles API authentication using environment variables and implements standard logic for image generation, upscaling, and polling. The SKILL.md provides helpful, task-aligned instructions for the AI agent without any signs of prompt injection or malicious redirection.

Capability Assessment

✓ Purpose & Capability

Name/description claim multi-model image generation via fal.ai and Midjourney (Legnext). The skill requires FAL_KEY and LEGNEXT_KEY which directly correspond to fal.ai and Legnext usage. The included generate.js implements calls to fal.ai (via @fal-ai/client) and to api.legnext.ai; no unrelated services or credentials are requested.

✓ Instruction Scope

SKILL.md instructs the agent to analyze the request, pick a model, enhance the prompt, and run the provided node script with sufficient timeout. The runtime instructions map directly to generate.js behavior. The instructions do not ask the agent to read unrelated files, arbitrary environment variables, or exfiltrate data beyond sending prompts/reference image URLs to the stated external services.

✓ Install Mechanism

Install spec is a single npm dependency: @fal-ai/client. package-lock.json shows the package resolved from the public npm registry. No downloads from personal servers, URL shorteners, or extracted archives are used.

✓ Credentials

Only two environment variables are required: FAL_KEY (primary) and LEGNEXT_KEY. Both are justified by the code: FAL_KEY used by @fal-ai/client and LEGNEXT_KEY used in HTTPS requests to api.legnext.ai. There are no extraneous secret requests or config paths.

✓ Persistence & Privilege

The skill is not always-enabled, does not request special system paths, and does not modify other skills' configs. It runs as a user-invoked script (node generate.js). Autonomous model invocation is allowed by default (platform normal) but the skill itself does not demand permanent presence or escalated privileges.

Version History

v2.0.1

Minor updates

v2.0.0

Simplified SKILL.md: removed complex async/poll instructions (the script handles them internally). Added 'Use when' trigger, Speed column in model table, and unified output format across all models.

v1.5.0

Added image upscaling (AuraSR 4x fast + Creative Upscaler), image cropping (pixel-based and ratio-based), sharp dependency, and comprehensive documentation updates.

v1.4.0

Added Seedream 4.5 (ByteDance) support via fal.ai — native 2K-4K resolution, bilingual prompts, text rendering in images. New model ID: seedream.

v1.3.1

Fix aspect ratio bug, action-without-job-id bug, HTTP status handling, Recraft style support, fal.ai timeout protection

v1.3.0

Security hardening: added strict input validation, prompt length limits, security transparency documentation. Fixed outdated TTAPI references in CONTRIBUTING.md.

v1.2.0

Enable Midjourney Turbo mode by default (--turbo). Reduces generation time from ~30-60s to ~10-20s. Add --mode param: turbo (default) | fast | relax.

v1.1.0

Switch Midjourney provider from TTAPI to Legnext.ai for faster speed and higher stability. Rename TTAPI_KEY to LEGNEXT_KEY. Add upscale-type, variation-type params. Add reroll and describe actions.

v1.0.0

Initial release: unified image generation skill supporting Midjourney (TTAPI), Flux Pro/Dev/Schnell, SDXL Lightning, Nano Banana Pro, Ideogram v3, and Recraft v3 via fal.ai.

Metadata

Slug image-gen

Version 2.0.1

License MIT-0

Total installs 46

Current installs 43

Number of versions 9

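FAQ

What is Image Gen?

Generate images using multiple AI models: Midjourney (via Legnext.ai), Flux, Nano Banana Pro (Gemini), Ideogram, Recraft, and more via fal.ai. Intelligently... It is an AI agent skill plugin for Claude Code / OpenClaw, with 3639 cumulative downloads to date.

How do I install Image Gen?

Run the command "/install image-gen" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Image Gen free?

Yes. Image Gen is completely free under the MIT-0 license and can be freely downloaded, installed, and used.

Which platforms does Image Gen support?

Image Gen is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Image Gen?

It is developed and maintained by Wells Wu (@wells1137); the current version is v2.0.1.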
