Description

AI image generation with OpenAI GPT Image 2, Azure OpenAI, Google, OpenRouter, DashScope, Z.AI GLM-Image, MiniMax, Jimeng, Seedream and Replicate APIs. Suppo...

README (SKILL.md)

Image Generation (AI SDK)

Name: Baoyu Imagine
Author: jimliu

Official API-based image generation. Supports OpenAI GPT Image 2, Azure OpenAI, Google, OpenRouter, DashScope (阿里通义万象), Z.AI GLM-Image, MiniMax, Jimeng (即梦), Seedream (豆包) and Replicate.

User Input Tools

When this skill prompts the user, follow this tool-selection rule (priority order):

Prefer built-in user-input tools exposed by the current agent runtime — e.g., AskUserQuestion, request_user_input, clarify, ask_user, or any equivalent.
Fallback: if no such tool exists, emit a numbered plain-text message and ask the user to reply with the chosen number/answer for each question.
Batching: if the tool supports multiple questions per call, combine all applicable questions into a single call; if only single-question, ask them one at a time in priority order.

Concrete AskUserQuestion references below are examples — substitute the local equivalent in other runtimes.
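
The plain-text fallback can be sketched as a small formatter. This is a minimal illustration, not the skill's own code: the `Question` shape and the `formatFallbackPrompt` name are hypothetical.

```typescript
// Hypothetical question shape, used only for this illustration.
interface Question {
  text: string;
  options: string[];
}

// Render all questions as one numbered plain-text message (the fallback path
// for runtimes that expose no built-in user-input tool).
function formatFallbackPrompt(questions: Question[]): string {
  const lines: string[] = [];
  questions.forEach((q, i) => {
    lines.push(`${i + 1}. ${q.text}`);
    q.options.forEach((opt, j) => {
      lines.push(`   ${j + 1}) ${opt}`);
    });
  });
  lines.push("Reply with the chosen number for each question.");
  return lines.join("\n");
}
```

Batching every question into one message mirrors the batching rule above for tools that accept several questions per call.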

Script Directory

{baseDir} = this SKILL.md's directory. Main script: {baseDir}/scripts/main.ts. Resolve ${BUN_X}: prefer bun; else npx -y bun; else suggest brew install oven-sh/bun/bun.
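
One way to express the `${BUN_X}` resolution order, as a sketch. The `hasCommand` predicate is an assumption (for example, a wrapper around `command -v`); it is passed in so the logic stays testable.

```typescript
// Resolve the runner in the documented order: bun, then npx -y bun.
// `hasCommand` is a hypothetical predicate supplied by the caller.
function resolveBunX(hasCommand: (name: string) => boolean): string | null {
  if (hasCommand("bun")) return "bun";
  if (hasCommand("npx")) return "npx -y bun";
  // Neither is available: the caller should suggest
  // `brew install oven-sh/bun/bun` instead of running anything.
  return null;
}
```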

Step 0: Load Preferences ⛔ BLOCKING

This step MUST complete before any image generation — generation is blocked until EXTEND.md exists.

Check these paths in order; first hit wins:

Path	Scope
`.baoyu-skills/baoyu-imagine/EXTEND.md`	Project
`${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-imagine/EXTEND.md`	XDG
`$HOME/.baoyu-skills/baoyu-imagine/EXTEND.md`	User home

Found → load, parse, apply. If default_model.[provider] is null → ask for the model only.
Not found → run first-time setup (references/config/first-time-setup.md) using AskUserQuestion to collect provider + model + quality + save location. Save EXTEND.md, then continue. Do not generate images before this completes.
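
The lookup order above can be sketched as follows. This is an illustration under stated assumptions, not the skill's implementation: `ExtendEnv`, `extendCandidates`, `findExtend`, and the injected `exists` predicate are hypothetical names.

```typescript
// Hypothetical environment shape; only the two variables the table mentions.
interface ExtendEnv {
  HOME: string;
  XDG_CONFIG_HOME?: string;
}

interface ExtendCandidate {
  path: string;
  scope: "Project" | "XDG" | "User home";
}

// Build the three candidate paths in priority order.
function extendCandidates(projectDir: string, env: ExtendEnv): ExtendCandidate[] {
  // Mirrors ${XDG_CONFIG_HOME:-$HOME/.config}: unset or empty falls back.
  const xdg = env.XDG_CONFIG_HOME || `${env.HOME}/.config`;
  return [
    { path: `${projectDir}/.baoyu-skills/baoyu-imagine/EXTEND.md`, scope: "Project" },
    { path: `${xdg}/baoyu-skills/baoyu-imagine/EXTEND.md`, scope: "XDG" },
    { path: `${env.HOME}/.baoyu-skills/baoyu-imagine/EXTEND.md`, scope: "User home" },
  ];
}

// First hit wins; null means first-time setup must run before generating.
function findExtend(
  projectDir: string,
  env: ExtendEnv,
  exists: (path: string) => boolean,
): ExtendCandidate | null {
  for (const candidate of extendCandidates(projectDir, env)) {
    if (exists(candidate.path)) return candidate;
  }
  return null;
}
```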

Legacy compatibility: if .baoyu-skills/baoyu-image-gen/EXTEND.md exists and the new path doesn't, the runtime renames it to .baoyu-skills/baoyu-imagine/EXTEND.md. If both exist, the runtime leaves them alone and uses the new path.
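
The legacy rule reduces to a small decision table. The sketch below uses hypothetical names (`LegacyAction`, `legacyAction`) and only decides what to do; it does not touch the filesystem.

```typescript
type LegacyAction = "rename-legacy" | "use-new" | "none";

// Decide how to treat a legacy baoyu-image-gen/EXTEND.md.
function legacyAction(legacyExists: boolean, newExists: boolean): LegacyAction {
  if (newExists) return "use-new"; // new path wins; any legacy file is left alone
  if (legacyExists) return "rename-legacy"; // migrate to baoyu-imagine/EXTEND.md
  return "none"; // nothing found at this location
}
```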

EXTEND.md keys: default provider, default quality, default aspect ratio, default image size, OpenAI image API dialect, default models, batch worker cap, provider-specific batch limits. Schema: references/config/preferences-schema.md.

Usage

Minimum working examples — see references/usage-examples.md for the full set including per-provider invocations and batch mode.

# Basic
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image cat.png

# With aspect ratio and high quality
${BUN_X} {baseDir}/scripts/main.ts --prompt "A landscape" --image out.png --ar 16:9 --quality 2k

# Prompt from files
${BUN_X} {baseDir}/scripts/main.ts --promptfiles system.md content.md --image out.png

# With reference image
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png

# Specific provider
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider dashscope --model qwen-image-2.0-pro

# OpenAI GPT Image 2
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider openai --model gpt-image-2

# Batch mode
${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json --jobs 4

Options

Option	Description
`--prompt <text>`, `-p`	Prompt text
`--promptfiles <files...>`	Read prompt from files (concatenated)
`--image <path>`	Output image path (required in single-image mode)
`--batchfile <path>`	JSON batch file for multi-image generation
`--jobs <count>`	Worker count for batch mode (default: auto, max from config, built-in default 10)
`--provider google|openai|azure|openrouter|dashscope|zai|minimax|jimeng|seedream|replicate`	Force provider (default: auto-detect)
`--model <id>`, `-m`	Model ID; see provider references for defaults and allowed values
`--ar <ratio>`	Aspect ratio (`16:9`, `1:1`, `4:3`, …)
`--size <WxH>`	Explicit size (e.g., `1024x1024`; for `gpt-image-2`, width/height must be multiples of 16, max edge 3840px, ratio no wider than 3:1)
`--quality normal|2k`	Quality preset (default: `2k`)
`--imageSize 1K|2K|4K`	Image size for Google/OpenRouter (default: from quality)
`--imageApiDialect openai-native|ratio-metadata`	OpenAI-compatible endpoint dialect; use `ratio-metadata` for gateways that expect aspect-ratio `size` plus `metadata.resolution`
`--ref <files...>`	Reference images. Supported by Google multimodal, OpenAI GPT Image edits, Azure OpenAI edits (PNG/JPG only), OpenRouter multimodal models, Replicate supported families, MiniMax subject-reference, Seedream 5.0/4.5/4.0. Not supported by Jimeng, Seedream 3.0, SeedEdit 3.0
`--n <count>`	Number of images. Replicate requires `--n 1` (single-output save semantics)
`--json`	JSON output
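
The `gpt-image-2` constraints listed for `--size` can be checked before a request is sent. A minimal sketch, assuming "ratio no wider than 3:1" applies to the long edge over the short edge in either orientation; `checkGptImage2Size` is a hypothetical helper, not part of the CLI.

```typescript
// Return a list of problems; an empty list means the size passes these checks.
function checkGptImage2Size(size: string): string[] {
  const match = /^(\d+)x(\d+)$/.exec(size);
  if (!match) return ["size must look like WxH, e.g. 1024x1024"];
  const width = Number(match[1]);
  const height = Number(match[2]);
  if (width === 0 || height === 0) return ["width and height must be positive"];

  const problems: string[] = [];
  if (width % 16 !== 0 || height % 16 !== 0) {
    problems.push("width and height must be multiples of 16");
  }
  const longEdge = Math.max(width, height);
  const shortEdge = Math.min(width, height);
  if (longEdge > 3840) problems.push("longest edge must not exceed 3840px");
  // ASSUMPTION: the 3:1 limit is symmetric for portrait and landscape.
  if (longEdge / shortEdge > 3) problems.push("aspect ratio must not be wider than 3:1");
  return problems;
}
```

For example, `3840x2160` passes all three checks, while `1000x1000` fails the multiple-of-16 rule.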

Environment Variables

Variable	Description
`OPENAI_API_KEY`	OpenAI API key
`AZURE_OPENAI_API_KEY`	Azure OpenAI API key
`OPENROUTER_API_KEY`	OpenRouter API key
`GOOGLE_API_KEY`	Google API key
`DASHSCOPE_API_KEY`	DashScope API key
`ZAI_API_KEY` (alias `BIGMODEL_API_KEY`)	Z.AI API key
`MINIMAX_API_KEY`	MiniMax API key
`REPLICATE_API_TOKEN`	Replicate API token
`JIMENG_ACCESS_KEY_ID`, `JIMENG_SECRET_ACCESS_KEY`	Jimeng (即梦) Volcengine credentials
`ARK_API_KEY`	Seedream (豆包) Volcengine ARK API key
`<PROVIDER>_IMAGE_MODEL`	Per-provider model override (`OPENAI_IMAGE_MODEL`, `GOOGLE_IMAGE_MODEL`, `DASHSCOPE_IMAGE_MODEL`, `ZAI_IMAGE_MODEL`/`BIGMODEL_IMAGE_MODEL`, `MINIMAX_IMAGE_MODEL`, `OPENROUTER_IMAGE_MODEL`, `REPLICATE_IMAGE_MODEL`, `JIMENG_IMAGE_MODEL`, `SEEDREAM_IMAGE_MODEL`)
`AZURE_OPENAI_DEPLOYMENT` (alias `AZURE_OPENAI_IMAGE_MODEL`)	Azure default deployment
`<PROVIDER>_BASE_URL`	Per-provider endpoint override
`AZURE_API_VERSION`	Azure image API version (default `2025-04-01-preview`)
`JIMENG_REGION`	Jimeng region (default `cn-north-1`)
`OPENAI_IMAGE_API_DIALECT`	`openai-native` | `ratio-metadata`
`OPENROUTER_HTTP_REFERER`, `OPENROUTER_TITLE`	Optional OpenRouter attribution
`BAOYU_IMAGE_GEN_MAX_WORKERS`	Override batch worker cap
`BAOYU_IMAGE_GEN_<PROVIDER>_CONCURRENCY`	Per-provider concurrency (e.g., `BAOYU_IMAGE_GEN_REPLICATE_CONCURRENCY`)
`BAOYU_IMAGE_GEN_<PROVIDER>_START_INTERVAL_MS`	Per-provider start-gap (milliseconds)

Load priority: CLI args > EXTEND.md > env vars > <cwd>/.baoyu-skills/.env > ~/.baoyu-skills/.env

Model Resolution

Priority (highest → lowest) applies to every provider:

CLI flag --model <id>
EXTEND.md default_model.[provider]
Env var <PROVIDER>_IMAGE_MODEL
Built-in default

For OpenAI, the built-in default is gpt-image-2. gpt-image-1.5, gpt-image-1, and GPT Image snapshots remain selectable with --model or OPENAI_IMAGE_MODEL.

For Azure, --model / default_model.azure is the Azure deployment name. AZURE_OPENAI_DEPLOYMENT is the preferred env var; AZURE_OPENAI_IMAGE_MODEL is kept as a backward-compatible alias. If your Azure deployment is named after the underlying model, use gpt-image-2; otherwise use the exact custom deployment name.

EXTEND.md overrides env vars: if EXTEND.md sets default_model.google: "gemini-3-pro-image-preview" and the env var sets GOOGLE_IMAGE_MODEL=gemini-3.1-flash-image-preview, EXTEND.md wins.
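
The four-level priority can be sketched as a single fallback chain. `ModelSources` and `resolveModel` are hypothetical names; the sketch assumes an unset or null value at one level simply falls through to the next.

```typescript
// One optional value per priority level, plus the mandatory built-in default.
interface ModelSources {
  cli?: string; // --model <id>
  extend?: string | null; // EXTEND.md default_model.[provider]
  env?: string; // <PROVIDER>_IMAGE_MODEL
  builtin: string; // built-in default, e.g. gpt-image-2 for OpenAI
}

// Highest priority first; undefined and null both fall through.
function resolveModel(sources: ModelSources): string {
  return sources.cli ?? sources.extend ?? sources.env ?? sources.builtin;
}
```

With `extend: "gemini-3-pro-image-preview"` and `env: "gemini-3.1-flash-image-preview"`, the function returns the EXTEND.md value, matching the example above.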

Display model info before each generation:

Using [provider] / [model]
Switch model: --model <id> | EXTEND.md default_model.[provider] | env <PROVIDER>_IMAGE_MODEL

OpenAI-Compatible Gateway Dialects

provider=openai means the auth and routing entrypoint is OpenAI-compatible. It does not guarantee the upstream image API uses OpenAI native semantics. When a gateway expects a different wire format, set default_image_api_dialect in EXTEND.md, OPENAI_IMAGE_API_DIALECT, or --imageApiDialect:

openai-native: pixel size (1536x1024) and native OpenAI quality fields
ratio-metadata: aspect-ratio size (16:9) plus metadata.resolution (1K|2K|4K) and metadata.orientation

Use openai-native for the OpenAI native API or strict clones; try ratio-metadata for compatibility gateways in front of Gemini or similar models. Current limitation: ratio-metadata applies only to text-to-image; reference-image edits still need openai-native or a provider with first-class edit support.
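
The difference between the two dialects shows up in how the size is expressed in the request body. The sketch below is illustrative only: `sizeFields` and `SizeRequest` are hypothetical, and the orientation labels (`landscape`, `portrait`, `square`) are assumptions that should be checked against the gateway's own documentation.

```typescript
type Dialect = "openai-native" | "ratio-metadata";

// Hypothetical, already-resolved size information for one request.
interface SizeRequest {
  width: number;
  height: number;
  ratio: string; // e.g. "16:9"
  resolution: "1K" | "2K" | "4K";
}

// Build only the size-related fields for the chosen dialect.
function sizeFields(dialect: Dialect, req: SizeRequest): Record<string, unknown> {
  if (dialect === "openai-native") {
    // Native semantics: a pixel size such as 1536x1024.
    return { size: `${req.width}x${req.height}` };
  }
  // ASSUMPTION: orientation values are illustrative, not confirmed by the source.
  const orientation =
    req.width === req.height ? "square" : req.width > req.height ? "landscape" : "portrait";
  // ratio-metadata: an aspect-ratio size plus metadata.resolution and metadata.orientation.
  return { size: req.ratio, metadata: { resolution: req.resolution, orientation } };
}
```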

Provider-Specific Guides

Each provider has its own quirks (model families, size rules, ref support, limits). Read these when the user picks that provider or asks for non-default behavior:

Provider	Reference
DashScope (Qwen-Image families, custom sizes)	`references/providers/dashscope.md`
Z.AI (GLM-Image, cogview-4)	`references/providers/zai.md`
MiniMax (image-01, subject-reference)	`references/providers/minimax.md`
OpenRouter (multimodal models, `/chat/completions` flow)	`references/providers/openrouter.md`
Replicate (nano-banana, Seedream, Wan)	`references/providers/replicate.md`

Provider Selection

--ref provided + no --provider → auto-select Google → OpenAI → Azure → OpenRouter → Replicate → Seedream → MiniMax (MiniMax's subject reference is more specialized toward character/portrait consistency)
--provider specified → use it (if --ref, must be google/openai/azure/openrouter/replicate/seedream/minimax)
Only one API key present → use that provider
Multiple keys → default priority: Google → OpenAI → Azure → OpenRouter → DashScope → Z.AI → MiniMax → Replicate → Jimeng → Seedream
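
These rules can be sketched as one selection function. This is a sketch, not the CLI's code: `selectProvider` is a hypothetical name, and it assumes that auto-selection only considers providers whose API key is present (`available`).

```typescript
// Priority orders copied from the rules above.
const REF_ORDER = ["google", "openai", "azure", "openrouter", "replicate", "seedream", "minimax"] as const;
const DEFAULT_ORDER = [
  "google", "openai", "azure", "openrouter", "dashscope",
  "zai", "minimax", "replicate", "jimeng", "seedream",
] as const;
type Provider = (typeof DEFAULT_ORDER)[number];

interface SelectOptions {
  provider?: Provider; // explicit --provider
  hasRef: boolean; // true when --ref was given
  available: Provider[]; // providers with credentials present
}

function selectProvider(opts: SelectOptions): Provider {
  const refCapable: readonly Provider[] = REF_ORDER;
  if (opts.provider) {
    if (opts.hasRef && refCapable.indexOf(opts.provider) === -1) {
      throw new Error(`${opts.provider} does not support --ref`);
    }
    return opts.provider;
  }
  // A single available key is picked naturally by the same loop.
  const order: readonly Provider[] = opts.hasRef ? REF_ORDER : DEFAULT_ORDER;
  for (const candidate of order) {
    if (opts.available.indexOf(candidate) !== -1) return candidate;
  }
  throw new Error("no usable provider: set an API key for a supported provider");
}
```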

Quality Presets

Preset	Google imageSize	OpenAI size	OpenRouter size	Replicate resolution	Use case
`normal`	1K	1024px target	1K	1K	Quick previews
`2k` (default)	2K	2048px target	2K	2K	Covers, illustrations, infographics

Google/OpenRouter imageSize can be overridden with --imageSize 1K|2K|4K.

For OpenAI native gpt-image-2, normal maps to quality=medium and a low-latency valid size near the requested aspect ratio; 2k maps to quality=high and 2048px-class sizes such as 2048x2048, 2048x1152, or 1152x2048. Use explicit --size for valid custom or 4K outputs, e.g. 3840x2160.
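
One plausible way to derive such sizes is to fix the long edge and snap the short edge to a multiple of 16. The sketch below is an assumption about the approach, not the skill's actual mapping; `sizeForRatio` is a hypothetical helper.

```typescript
// Derive a WxH string from an aspect ratio and a target long edge,
// snapping both edges to multiples of 16.
function sizeForRatio(ratio: string, longEdge: number): string {
  const parts = ratio.split(":").map(Number);
  const [rw = NaN, rh = NaN] = parts;
  if (parts.length !== 2 || !(rw > 0) || !(rh > 0)) {
    throw new Error(`bad ratio: ${ratio}`);
  }
  const snap = (value: number): number => Math.max(16, Math.round(value / 16) * 16);
  const long = snap(longEdge);
  const short = snap((long * Math.min(rw, rh)) / Math.max(rw, rh));
  // Landscape and square put the long edge first; portrait puts it second.
  return rw >= rh ? `${long}x${short}` : `${short}x${long}`;
}
```

With a 2048px long edge this reproduces the sizes named above (`2048x2048`, `2048x1152`, `1152x2048`), and `sizeForRatio("16:9", 3840)` gives `3840x2160`.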

Aspect Ratios

Supported: 1:1, 16:9, 9:16, 4:3, 3:4, 2.35:1.

Google multimodal: imageConfig.aspectRatio
OpenAI: gpt-image-2 uses the closest valid custom size for the requested ratio; older GPT Image and DALL·E models use their closest supported fixed size
OpenRouter: imageGenerationOptions.aspect_ratio; if only --size <WxH> is given, the ratio is inferred
Replicate: behavior is model-specific — google/nano-banana* uses aspect_ratio, bytedance/seedream-* uses documented Replicate ratios, Wan 2.7 maps --ar to a concrete size
MiniMax: official aspect_ratio values; if --size <WxH> is given without --ar, sends width/height for image-01

Generation Mode

Default: sequential. Batch parallel: enabled automatically when --batchfile contains 2+ pending tasks.

Situation	Prefer	Why
One image, or 1-2 simple images	Sequential	Lower coordination overhead, easier debugging
Multiple images with saved prompt files	Batch (`--batchfile`)	Reuses finalized prompts, applies shared throttling/retries, predictable throughput
Each image still needs its own reasoning / prompt writing / style exploration	Subagents	Work is still exploratory, each needs independent analysis
Input is `outline.md` + `prompts/` (e.g. from `baoyu-article-illustrator`)	Batch — use `scripts/build-batch.ts` to assemble the payload	The outline + prompt files already contain everything needed

Rule of thumb: once prompt files are saved and the task is "generate all of these", prefer batch over subagents. Use subagents only when generation is coupled with per-image thinking or divergent creative exploration.

Parallel behavior:

Default worker count is automatic, capped by config, built-in default 10
Provider-specific throttling applies only in batch mode; defaults are tuned for throughput while avoiding RPM bursts
Override with --jobs <count>
Each image gets up to 3 attempts
Final output includes success count, failure count, and per-image failure reasons
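
The worker-count rules can be read as a clamp. The sketch below is one interpretation, with loud assumptions: the built-in value 10 is treated as the cap when the config sets none, and an explicit `--jobs` value is also subject to that cap. `resolveWorkers` is a hypothetical helper.

```typescript
// pending: number of pending batch tasks
// jobs: value of --jobs, if given
// configCap: batch worker cap from EXTEND.md or BAOYU_IMAGE_GEN_MAX_WORKERS, if set
function resolveWorkers(pending: number, jobs?: number, configCap?: number): number {
  if (pending < 2) return 1; // fewer than 2 pending tasks: stay sequential
  const cap = configCap ?? 10; // ASSUMPTION: 10 is the fallback cap
  const wanted = jobs ?? pending; // "auto" here means one worker per pending task
  return Math.max(1, Math.min(wanted, cap, pending));
}
```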

Error Handling

Missing API key → error with setup instructions
Generation failure → auto-retry up to 3 attempts per image
Invalid aspect ratio → warning, proceed with default
Reference images with unsupported provider/model → error with fix hint
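
The retry rule can be sketched as a bounded loop that records the last failure reason. This is a simplified synchronous illustration; real generation calls are asynchronous, and `withRetries` and `Outcome` are hypothetical names.

```typescript
type Outcome<T> =
  | { ok: true; value: T; attempts: number }
  | { ok: false; reason: string; attempts: number };

// Call `run` up to maxAttempts times; stop at the first success.
function withRetries<T>(run: (attempt: number) => T, maxAttempts = 3): Outcome<T> {
  let reason = "not attempted";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { ok: true, value: run(attempt), attempts: attempt };
    } catch (err) {
      // Keep the most recent failure so it can be reported per image.
      reason = err instanceof Error ? err.message : String(err);
    }
  }
  return { ok: false, reason, attempts: maxAttempts };
}
```

Returning the reason alongside the attempt count is what makes a per-image failure report possible at the end of a batch.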

References

File	Content
`references/usage-examples.md`	Extended CLI examples across providers and batch mode
`references/providers/dashscope.md`	DashScope families, sizes, limits
`references/providers/zai.md`	Z.AI GLM-image / cogview-4
`references/providers/minimax.md`	MiniMax image-01 + subject reference
`references/providers/openrouter.md`	OpenRouter multimodal flow
`references/providers/replicate.md`	Replicate supported families + guardrails
`references/config/preferences-schema.md`	EXTEND.md schema
`references/config/first-time-setup.md`	First-time setup flow

Extension Support

Custom configurations via EXTEND.md. See Step 0 for paths and schema.

Usage Guidance

This skill appears to do what it says: run TypeScript CLI scripts to generate images across many providers. Before installing, consider: (1) it reads and writes configuration under .baoyu-skills in your project or $HOME and may rename a legacy EXTEND.md, so back up those paths if you need to preserve them; (2) to use any cloud provider you must supply that provider's API key (the tool uses whatever OPENAI_API_KEY, GOOGLE_API_KEY, REPLICATE_API_TOKEN, etc. are present), so only provide keys you trust and prefer least-privileged keys or billing-limited accounts; (3) reference images are read locally and may be encoded and uploaded to provider endpoints, so avoid using sensitive images as refs; (4) review the code yourself (scripts/main.ts and the provider modules are included) or run it in a sandboxed environment if you are unsure; (5) confirm that the GitHub source is trustworthy (provided link: https://github.com/JimLiu/baoyu-skills#baoyu-imagine) and consider using dummy keys first to verify behavior.

Capability Analysis

Type: OpenClaw Skill
Name: baoyu-imagine
Version: 1.104.0

The baoyu-imagine skill is a comprehensive image generation tool supporting multiple AI providers including OpenAI, Google, Azure, and others. It features a robust architecture for handling single and batch generation tasks, configuration management via EXTEND.md, and provider-specific parameter mapping. While the Google provider implementation (scripts/providers/google.ts) uses execFileSync to call curl as a workaround for proxy-related fetch issues, it does so using argument arrays which mitigates shell injection risks. The instructions in SKILL.md are focused on procedural setup and tool usage, and the code logic is consistent with the stated purpose of AI image generation without any evidence of data exfiltration or malicious intent.

Capability Tags

crypto, requires-sensitive-credentials

Capability Assessment

✓ Purpose & Capability

Name/description (multi-provider image generation) matches the code and SKILL.md: scripts/main.ts and provider modules implement many image providers, CLI flags cover prompt/ref/batch use cases, and env vars listed are the expected API keys for those providers. The declared requirement of bun or npx aligns with the scripts being TypeScript/Node.

ℹ Instruction Scope

Runtime instructions are focused on image generation and preference management. They require reading prompt files and reference images and will read/write EXTEND.md in project or user config locations before any generation. The 'Step 0' blocking behavior and renaming of a legacy EXTEND.md are explicit and may create or move files under .baoyu-skills in the project or home directory. Reference images may be encoded and uploaded to providers (e.g., the MiniMax reference notes that local refs are sent as data URLs); this is expected for ref-capable providers but is worth noting for sensitive local images.

✓ Install Mechanism

There is no installer that downloads arbitrary code; the package is instruction/script-based and expects to be run with bun or npx. No external URL downloads or opaque installers are present in the provided files. The presence of TypeScript source files and test harnesses is consistent with the stated execution model.

ℹ Credentials

The registry metadata lists no 'required' env vars, but SKILL.md and scripts reference many provider API keys (OPENAI_API_KEY, GOOGLE_API_KEY, AZURE_OPENAI_API_KEY, REPLICATE_API_TOKEN, DASHSCOPE_API_KEY, MINIMAX_API_KEY, etc.). This is proportionate to a multi-provider image tool, but users should be aware the tool will use any of those keys present in the environment or in the stated .env locations. The skill does not request unrelated credentials.

ℹ Persistence & Privilege

always:false (no forced global inclusion). The skill writes EXTEND.md into project or user config directories and may rename a legacy config file; it also loads .env files from project/home as described. These are reasonable for storing preferences but mean the skill will create/modify files under .baoyu-skills in project or home.

Version History

v1.104.0

## baoyu-imagine 1.104.0
- Add `gpt-image-2` support for OpenAI image generation and edits.
- Make `gpt-image-2` the default OpenAI model and update Azure deployment guidance.
- Document official size and quality mapping, custom-size constraints, and 4K usage examples.

v1.103.1

## 1.103.1 - 2026-04-13
### Fixes
- `baoyu-markdown-to-html`: decode HTML entities and strip tags from article summary
- `baoyu-post-to-weibo`: decode HTML entities and strip tags from article summary

v1.103.0

## 1.103.0 - 2026-04-12
### Features
- baoyu-diagram: add multi-diagram mode for article-wide diagram generation
### Fixes
- baoyu-article-illustrator: prevent color names and hex codes from appearing as visible text in generated images
- baoyu-cover-image: prevent color names and hex codes from appearing as visible text in generated images
- baoyu-image-cards: prevent color names from appearing as visible text in generated images
- baoyu-post-to-wechat: decode HTML entities and strip tags from article summary

v1.0.1

- Adds legacy compatibility: if `.baoyu-skills/baoyu-image-gen/EXTEND.md` exists and the new path does not, it will be renamed to `.baoyu-skills/baoyu-imagine/EXTEND.md` at runtime.
- Both old and new EXTEND.md can coexist; if both exist, the new path is always used.
- No user action required; migration is automatic for existing users upgrading from baoyu-image-gen.

v1.0.0

baoyu-imagine 1.0.0
- Initial release for AI image generation using multiple providers (OpenAI, Azure OpenAI, Google, OpenRouter, DashScope, MiniMax, Jimeng, Seedream, Replicate).
- Supports text-to-image, reference images, aspect ratios, quality presets, and batch generation from prompt files.
- Preferences and setup guided by EXTEND.md configuration file for key defaults (provider, model, quality, etc).
- Includes single-image and batch (parallel worker) modes, with flexible CLI options and JSON batch input.
- Comprehensive environment variable and provider-specific configuration support.

Metadata

Slug baoyu-imagine

Version 1.104.0

License MIT-0

All-time Installs 14

Active Installs 14

Total Versions 5

Frequently Asked Questions

What is Baoyu Imagine?

AI image generation with OpenAI GPT Image 2, Azure OpenAI, Google, OpenRouter, DashScope, Z.AI GLM-Image, MiniMax, Jimeng, Seedream and Replicate APIs. Suppo... It is an AI Agent Skill for Claude Code / OpenClaw, with 1004 downloads so far.

How do I install Baoyu Imagine?

Run "/install baoyu-imagine" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Baoyu Imagine free?

Yes, Baoyu Imagine is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Baoyu Imagine support?

Baoyu Imagine is cross-platform and runs anywhere OpenClaw or Claude Code is available.

Who created Baoyu Imagine?

It is built and maintained by Jim Liu 宝玉 (@jimliu); the current version is v1.104.0.
