
Baoyu Image Gen

Author: nengnengZ · GitHub · v0.1.2 · MIT-0
cross-platform ⚠ suspicious
422 total downloads · 1 favorite · 2 current installs · 3 versions
Install in OpenClaw: /install baoyu-image-gen-2
Description
AI image generation with OpenAI, Google, OpenRouter, DashScope, Jimeng, Seedream and Replicate APIs. Supports text-to-image, reference images, aspect ratios,...
Usage Instructions (SKILL.md)

Image Generation (AI SDK)

Official API-based image generation. Supports OpenAI, Google, OpenRouter, DashScope (Alibaba Tongyi Wanxiang), Jimeng (即梦), Seedream (Doubao) and Replicate providers.

Script Directory

Agent Execution:

  1. {baseDir} = this SKILL.md file's directory
  2. Script path = {baseDir}/scripts/main.ts
  3. Resolve ${BUN_X} runtime: if bun installed → bun; if npx available → npx -y bun; else suggest installing bun
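The runtime lookup in step 3 can be sketched in TypeScript as follows. This is an illustration under assumptions, not the skill's own code. `resolveBunX` and `commandExists` are hypothetical names, and a command is treated as installed when `<cmd> --version` exits with status 0. On Windows, `npx` is a `.cmd` shim, so a real probe may need `shell: true`.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical probe: a command counts as available if `<cmd> --version` exits with status 0.
function commandExists(cmd: string): boolean {
  const result = spawnSync(cmd, ["--version"], { stdio: "ignore" });
  return result.error === undefined && result.status === 0;
}

// Resolve the ${BUN_X} prefix in the documented order:
// bun if installed, else `npx -y bun`, else null (caller suggests installing bun).
function resolveBunX(isAvailable: (cmd: string) => boolean = commandExists): string | null {
  if (isAvailable("bun")) return "bun";
  if (isAvailable("npx")) return "npx -y bun";
  return null;
}

const bunX = resolveBunX();
console.log(bunX ?? "bun not found: install it from https://bun.sh");
```

The probe is injectable so that the ordering can be checked without depending on what is installed on the machine.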

Step 0: Load Preferences ⛔ BLOCKING

CRITICAL: This step MUST complete BEFORE any image generation. Do NOT skip or defer.

Check whether EXTEND.md exists (priority: project → XDG config → user home):

# macOS, Linux, WSL, Git Bash
test -f .baoyu-skills/baoyu-image-gen/EXTEND.md && echo "project"
test -f "${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-image-gen/EXTEND.md" && echo "xdg"
test -f "$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md" && echo "user"
# PowerShell (Windows)
if (Test-Path .baoyu-skills/baoyu-image-gen/EXTEND.md) { "project" }
$xdg = if ($env:XDG_CONFIG_HOME) { $env:XDG_CONFIG_HOME } else { "$HOME/.config" }
if (Test-Path "$xdg/baoyu-skills/baoyu-image-gen/EXTEND.md") { "xdg" }
if (Test-Path "$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md") { "user" }
Result | Action
Found | Load and parse the file, then apply its settings. If default_model.[provider] is null → ask for the model only (Flow 2)
Not found | ⛔ Run first-time setup (references/config/first-time-setup.md) → save EXTEND.md → then continue

CRITICAL: If not found, complete the full setup (provider + model + quality + save location) using AskUserQuestion BEFORE generating any images. Generation is BLOCKED until EXTEND.md is created.

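Path | Location
.baoyu-skills/baoyu-image-gen/EXTEND.md | Project directory
${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-image-gen/EXTEND.md | XDG config directory
$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md | User home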
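Below is a minimal sketch of the Step 0 lookup. It assumes Node-compatible `fs`/`path`/`os` modules, which Bun provides. The function checks the three locations in the same order as the shell snippet and returns the first hit, or `null` when first-time setup has to run. `findExtendMd` and its injectable `exists` parameter are hypothetical names, chosen so the order can be tested without touching the disk.

```typescript
import * as os from "node:os";
import * as path from "node:path";
import { existsSync } from "node:fs";

type ExtendSource = "project" | "xdg" | "user";

// Candidate paths in priority order: project → XDG config → user home.
function extendCandidates(cwd: string, home: string, xdgConfigHome?: string): Array<[ExtendSource, string]> {
  // Mirror ${XDG_CONFIG_HOME:-$HOME/.config}: an empty value counts as unset.
  const xdgBase = xdgConfigHome ? xdgConfigHome : path.join(home, ".config");
  return [
    ["project", path.join(cwd, ".baoyu-skills", "baoyu-image-gen", "EXTEND.md")],
    ["xdg", path.join(xdgBase, "baoyu-skills", "baoyu-image-gen", "EXTEND.md")],
    ["user", path.join(home, ".baoyu-skills", "baoyu-image-gen", "EXTEND.md")],
  ];
}

// Return the first existing EXTEND.md, or null when first-time setup must run.
function findExtendMd(
  cwd: string = process.cwd(),
  home: string = os.homedir(),
  xdgConfigHome: string | undefined = process.env.XDG_CONFIG_HOME,
  exists: (p: string) => boolean = existsSync,
): { source: ExtendSource; path: string } | null {
  for (const [source, p] of extendCandidates(cwd, home, xdgConfigHome)) {
    if (exists(p)) return { source, path: p };
  }
  return null;
}
```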

EXTEND.md supports: default provider, default quality, default aspect ratio, default image size, default models, batch worker cap, and provider-specific batch limits.

Schema: references/config/preferences-schema.md

Usage

# Basic
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image cat.png

# With aspect ratio
${BUN_X} {baseDir}/scripts/main.ts --prompt "A landscape" --image out.png --ar 16:9

# High quality
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --quality 2k

# From prompt files
${BUN_X} {baseDir}/scripts/main.ts --promptfiles system.md content.md --image out.png

# With reference images (Google, OpenAI, OpenRouter, Replicate, or Seedream 4.0/4.5/5.0)
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png

# With reference images (explicit provider/model)
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --provider google --model gemini-3-pro-image-preview --ref source.png

# OpenRouter (recommended default model)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider openrouter

# OpenRouter with reference images
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --provider openrouter --model google/gemini-3.1-flash-image-preview --ref source.png

# Specific provider
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider openai

# DashScope (Alibaba Tongyi Wanxiang)
${BUN_X} {baseDir}/scripts/main.ts --prompt "一只可爱的猫" --image out.png --provider dashscope

# DashScope Qwen-Image 2.0 Pro (recommended for custom sizes and text rendering)
${BUN_X} {baseDir}/scripts/main.ts --prompt "为咖啡品牌设计一张 21:9 横幅海报,包含清晰中文标题" --image out.png --provider dashscope --model qwen-image-2.0-pro --size 2048x872

# DashScope legacy Qwen fixed-size model
${BUN_X} {baseDir}/scripts/main.ts --prompt "一张电影感海报" --image out.png --provider dashscope --model qwen-image-max --size 1664x928

# Replicate (default model: google/nano-banana-pro)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate

# Replicate with specific model
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana

# Batch mode with saved prompt files
${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json

# Batch mode with explicit worker count
${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json --jobs 4 --json

Batch File Format

{
  "jobs": 4,
  "tasks": [
    {
      "id": "hero",
      "promptFiles": ["prompts/hero.md"],
      "image": "out/hero.png",
      "provider": "replicate",
      "model": "google/nano-banana-pro",
      "ar": "16:9",
      "quality": "2k"
    },
    {
      "id": "diagram",
      "promptFiles": ["prompts/diagram.md"],
      "image": "out/diagram.png",
      "ref": ["references/original.png"]
    }
  ]
}

Paths in promptFiles, image, and ref are resolved relative to the batch file's directory. jobs is optional (overridden by CLI --jobs). Top-level array format (without jobs wrapper) is also accepted.
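The path and `jobs` rules for batch files can be expressed as a small normalization step. This sketch assumes the field names from the example above. `normalizeBatch` is a hypothetical helper, and it validates nothing beyond requiring a `tasks` array.

```typescript
import * as path from "node:path";

// One task, with the fields used in the batch file example.
interface BatchTask {
  id?: string;
  promptFiles?: string[];
  image: string;
  provider?: string;
  model?: string;
  ar?: string;
  quality?: string;
  ref?: string[];
}

interface BatchFile {
  jobs?: number;
  tasks: BatchTask[];
}

// Accept either { jobs, tasks } or a bare top-level array, resolve paths against the
// batch file's directory, and let a CLI --jobs value override the file's jobs.
function normalizeBatch(raw: unknown, batchFilePath: string, cliJobs?: number): { jobs?: number; tasks: BatchTask[] } {
  const file = (Array.isArray(raw) ? { tasks: raw } : raw) as BatchFile | null;
  if (!file || !Array.isArray(file.tasks)) {
    throw new Error("batch file must be an array or an object with a tasks array");
  }
  const baseDir = path.dirname(path.resolve(batchFilePath));
  const abs = (p: string) => path.resolve(baseDir, p);
  const tasks = file.tasks.map((t) => ({
    ...t,
    image: abs(t.image),
    promptFiles: t.promptFiles?.map(abs),
    ref: t.ref?.map(abs),
  }));
  return { jobs: cliJobs ?? file.jobs, tasks };
}
```

Resolving against the batch file's directory, not the current working directory, means the same batch file gives the same output paths no matter where the command is run from.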

Options

Option | Description
--prompt <text>, -p | Prompt text
--promptfiles <files...> | Read the prompt from files (concatenated)
--image <path> | Output image path (required in single-image mode)
--batchfile <path> | JSON batch file for multi-image generation
--jobs <count> | Worker count for batch mode (default: automatic, capped by the configured maximum; built-in cap 10)
--provider google|openai|openrouter|dashscope|jimeng|seedream|replicate | Force a provider (default: auto-detect)
--model <id>, -m | Model ID (Google: gemini-3-pro-image-preview; OpenAI: gpt-image-1.5; OpenRouter: google/gemini-3.1-flash-image-preview; DashScope: qwen-image-2.0-pro)
--ar <ratio> | Aspect ratio (e.g., 16:9, 1:1, 4:3)
--size <WxH> | Size (e.g., 1024x1024)
--quality normal|2k | Quality preset (default: 2k)
--imageSize 1K|2K|4K | Image size for Google/OpenRouter (default: derived from --quality)
--ref <files...> | Reference images. Supported by Google multimodal models, OpenAI GPT Image edits, OpenRouter multimodal models, Replicate, and Seedream 5.0/4.5/4.0. Not supported by Jimeng, Seedream 3.0, or the removed SeedEdit 3.0
--n <count> | Number of images
--json | JSON output

Environment Variables

Variable | Description
OPENAI_API_KEY | OpenAI API key
OPENROUTER_API_KEY | OpenRouter API key
GOOGLE_API_KEY | Google API key
DASHSCOPE_API_KEY | DashScope API key (Alibaba Cloud)
REPLICATE_API_TOKEN | Replicate API token
JIMENG_ACCESS_KEY_ID | Jimeng (即梦) Volcengine access key
JIMENG_SECRET_ACCESS_KEY | Jimeng (即梦) Volcengine secret key
ARK_API_KEY | Seedream (Doubao) Volcengine ARK API key
OPENAI_IMAGE_MODEL | OpenAI model override
OPENROUTER_IMAGE_MODEL | OpenRouter model override (default: google/gemini-3.1-flash-image-preview)
GOOGLE_IMAGE_MODEL | Google model override
DASHSCOPE_IMAGE_MODEL | DashScope model override (default: qwen-image-2.0-pro)
REPLICATE_IMAGE_MODEL | Replicate model override (default: google/nano-banana-pro)
JIMENG_IMAGE_MODEL | Jimeng model override (default: jimeng_t2i_v40)
SEEDREAM_IMAGE_MODEL | Seedream model override (default: doubao-seedream-5-0-260128)
OPENAI_BASE_URL | Custom OpenAI endpoint
OPENROUTER_BASE_URL | Custom OpenRouter endpoint (default: https://openrouter.ai/api/v1)
OPENROUTER_HTTP_REFERER | Optional app/site URL for OpenRouter attribution
OPENROUTER_TITLE | Optional app name for OpenRouter attribution
GOOGLE_BASE_URL | Custom Google endpoint
DASHSCOPE_BASE_URL | Custom DashScope endpoint
REPLICATE_BASE_URL | Custom Replicate endpoint
JIMENG_BASE_URL | Custom Jimeng endpoint (default: https://visual.volcengineapi.com)
JIMENG_REGION | Jimeng region (default: cn-north-1)
SEEDREAM_BASE_URL | Custom Seedream endpoint (default: https://ark.cn-beijing.volces.com/api/v3)
BAOYU_IMAGE_GEN_MAX_WORKERS | Override the batch worker cap
BAOYU_IMAGE_GEN_<PROVIDER>_CONCURRENCY | Override provider concurrency, e.g. BAOYU_IMAGE_GEN_REPLICATE_CONCURRENCY
BAOYU_IMAGE_GEN_<PROVIDER>_START_INTERVAL_MS | Override the gap between task starts for a provider, e.g. BAOYU_IMAGE_GEN_REPLICATE_START_INTERVAL_MS

Load priority: CLI args > EXTEND.md > env vars > <cwd>/.baoyu-skills/.env > ~/.baoyu-skills/.env
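The two `.env` layers at the bottom of the load-priority chain can be merged as below. This is a sketch under assumptions. `parseDotEnv` is a hypothetical, deliberately minimal parser that handles no `export` prefix, inline comments, or variable expansion. CLI args and EXTEND.md are assumed to be applied afterwards, on top of the merged result.

```typescript
// Minimal KEY=VALUE parser (hypothetical; skips blank lines and # comments).
function parseDotEnv(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue;
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes.
    if (value.length >= 2 && (value[0] === '"' || value[0] === "'") && value[value.length - 1] === value[0]) {
      value = value.slice(1, -1);
    }
    out[key] = value;
  }
  return out;
}

// Spread from lowest to highest priority so later sources win:
// ~/.baoyu-skills/.env < <cwd>/.baoyu-skills/.env < process environment.
function mergeEnv(
  processEnv: Record<string, string | undefined>,
  cwdDotEnv: string,
  homeDotEnv: string,
): Record<string, string> {
  const defined: Record<string, string> = {};
  for (const [k, v] of Object.entries(processEnv)) {
    if (v !== undefined) defined[k] = v;
  }
  return { ...parseDotEnv(homeDotEnv), ...parseDotEnv(cwdDotEnv), ...defined };
}
```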

Model Resolution

Model priority (highest → lowest), applies to all providers:

  1. CLI flag: --model <id>
  2. EXTEND.md: default_model.[provider]
  3. Env var: <PROVIDER>_IMAGE_MODEL (e.g., GOOGLE_IMAGE_MODEL)
  4. Built-in default

EXTEND.md overrides env vars. If both EXTEND.md default_model.google: "gemini-3-pro-image-preview" and env var GOOGLE_IMAGE_MODEL=gemini-3.1-flash-image-preview exist, EXTEND.md wins.

Agent MUST display model info before each generation:

  • Show: Using [provider] / [model]
  • Show switch hint: Switch model: --model <id> | EXTEND.md default_model.[provider] | env <PROVIDER>_IMAGE_MODEL
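The model resolution order and the banner the agent must show can be sketched as below. The defaults for OpenRouter, DashScope, Replicate, Jimeng and Seedream come from the environment-variable table. The Google and OpenAI values are assumed from the `--model` option description. `resolveModel`, `modelBanner` and the `DefaultModels` shape are hypothetical.

```typescript
type Provider = "google" | "openai" | "openrouter" | "dashscope" | "jimeng" | "seedream" | "replicate";

const BUILTIN_DEFAULTS: Record<Provider, string> = {
  google: "gemini-3-pro-image-preview", // assumed from the --model option description
  openai: "gpt-image-1.5", // assumed from the --model option description
  openrouter: "google/gemini-3.1-flash-image-preview",
  dashscope: "qwen-image-2.0-pro",
  jimeng: "jimeng_t2i_v40",
  seedream: "doubao-seedream-5-0-260128",
  replicate: "google/nano-banana-pro",
};

// Hypothetical parsed form of EXTEND.md's default_model map (null = not set).
type DefaultModels = Partial<Record<Provider, string | null>>;
type ModelSource = "cli" | "extend" | "env" | "builtin";

// Priority: --model > EXTEND.md default_model.[provider] > <PROVIDER>_IMAGE_MODEL > built-in.
function resolveModel(
  provider: Provider,
  cliModel: string | undefined,
  defaults: DefaultModels,
  env: Record<string, string | undefined>,
): { model: string; source: ModelSource } {
  if (cliModel) return { model: cliModel, source: "cli" };
  const fromExtend = defaults[provider];
  if (fromExtend) return { model: fromExtend, source: "extend" };
  const fromEnv = env[`${provider.toUpperCase()}_IMAGE_MODEL`];
  if (fromEnv) return { model: fromEnv, source: "env" };
  return { model: BUILTIN_DEFAULTS[provider], source: "builtin" };
}

// The two lines shown before each generation.
function modelBanner(provider: Provider, model: string): string {
  const envVar = `${provider.toUpperCase()}_IMAGE_MODEL`;
  return `Using ${provider} / ${model}\nSwitch model: --model <id> | EXTEND.md default_model.${provider} | env ${envVar}`;
}
```

The first test case is the EXTEND.md-versus-env example above: EXTEND.md's `gemini-3-pro-image-preview` wins over `GOOGLE_IMAGE_MODEL`.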

DashScope Models

Use --model qwen-image-2.0-pro or set default_model.dashscope / DASHSCOPE_IMAGE_MODEL when the user wants official Qwen-Image behavior.

Official DashScope model families:

  • qwen-image-2.0-pro, qwen-image-2.0-pro-2026-03-03, qwen-image-2.0, qwen-image-2.0-2026-03-03
    • Free-form size in width*height format
    • Total pixels must stay between 512*512 and 2048*2048
    • Default size is approximately 1024*1024
    • Best choice for custom ratios such as 21:9 and text-heavy Chinese/English layouts
  • qwen-image-max, qwen-image-max-2025-12-30, qwen-image-plus, qwen-image-plus-2026-01-09, qwen-image
    • Fixed sizes only: 1664*928, 1472*1104, 1328*1328, 1104*1472, 928*1664
    • Default size is 1664*928
    • qwen-image currently has the same capability as qwen-image-plus
  • Legacy DashScope models such as z-image-turbo, z-image-ultra, wanx-v1
    • Keep using them only when the user explicitly asks for legacy behavior or compatibility
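The size constraints listed for each DashScope family can be checked with a small helper. `checkDashScopeSize` is a hypothetical name. It covers only the model IDs named above and, as an assumption, skips validation for legacy models.

```typescript
const FIXED_SIZE_MODELS = new Set([
  "qwen-image-max", "qwen-image-max-2025-12-30", "qwen-image-plus", "qwen-image-plus-2026-01-09", "qwen-image",
]);
const FIXED_SIZES = new Set(["1664*928", "1472*1104", "1328*1328", "1104*1472", "928*1664"]);
const MIN_PIXELS = 512 * 512;
const MAX_PIXELS = 2048 * 2048;

// Parse DashScope's width*height format.
function parseDashScopeSize(size: string): [number, number] | null {
  const m = /^(\d+)\*(\d+)$/.exec(size.trim());
  return m ? [Number(m[1]), Number(m[2])] : null;
}

// Returns an error message, or null when the size is acceptable for the model.
function checkDashScopeSize(model: string, size: string): string | null {
  const dims = parseDashScopeSize(size);
  if (!dims) return `size must use width*height format, got "${size}"`;
  if (FIXED_SIZE_MODELS.has(model)) {
    return FIXED_SIZES.has(`${dims[0]}*${dims[1]}`)
      ? null
      : `${model} only accepts ${Array.from(FIXED_SIZES).join(", ")}`;
  }
  if (model.startsWith("qwen-image-2.0")) {
    const pixels = dims[0] * dims[1];
    return pixels >= MIN_PIXELS && pixels <= MAX_PIXELS ? null : "total pixels must be between 512*512 and 2048*2048";
  }
  return null; // legacy models are not checked in this sketch
}
```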

When translating CLI args into DashScope behavior:

  • --size wins over --ar
  • For qwen-image-2.0*, prefer explicit --size; otherwise infer from --ar and use the official recommended resolutions below
  • For qwen-image-max/plus/image, only use the five official fixed sizes; if the requested ratio is not covered, switch to qwen-image-2.0-pro
  • --quality is a baoyu-image-gen compatibility preset, not a native DashScope API field. Mapping normal / 2k onto the qwen-image-2.0* table below is an implementation inference, not an official API guarantee

Recommended qwen-image-2.0* sizes for common aspect ratios:

Ratio | normal | 2k
1:1 | 1024*1024 | 1536*1536
2:3 | 768*1152 | 1024*1536
3:2 | 1152*768 | 1536*1024
3:4 | 960*1280 | 1080*1440
4:3 | 1280*960 | 1440*1080
9:16 | 720*1280 | 1080*1920
16:9 | 1280*720 | 1920*1080
21:9 | 1344*576 | 2048*872
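The translation rules and the size table can be combined into one planning function. This is a hedged sketch, and `planDashScopeRequest` is a hypothetical name. Two parts are assumptions rather than documented mappings:

- the conversion from the CLI's `WxH` to DashScope's `W*H`;
- the pairing of each fixed size with a ratio (1664*928 ≈ 16:9, 1472*1104 = 4:3, and so on).

```typescript
type Quality = "normal" | "2k";

// Recommended qwen-image-2.0* sizes from the table above.
const QWEN2_SIZES: Record<string, Record<Quality, string>> = {
  "1:1": { normal: "1024*1024", "2k": "1536*1536" },
  "2:3": { normal: "768*1152", "2k": "1024*1536" },
  "3:2": { normal: "1152*768", "2k": "1536*1024" },
  "3:4": { normal: "960*1280", "2k": "1080*1440" },
  "4:3": { normal: "1280*960", "2k": "1440*1080" },
  "9:16": { normal: "720*1280", "2k": "1080*1920" },
  "16:9": { normal: "1280*720", "2k": "1920*1080" },
  "21:9": { normal: "1344*576", "2k": "2048*872" },
};

const FIXED_SIZE_MODELS = new Set([
  "qwen-image-max", "qwen-image-max-2025-12-30", "qwen-image-plus", "qwen-image-plus-2026-01-09", "qwen-image",
]);

// Assumed ratio → fixed-size pairing for qwen-image-max/plus/image.
const FIXED_BY_RATIO: Record<string, string> = {
  "16:9": "1664*928", "4:3": "1472*1104", "1:1": "1328*1328", "3:4": "1104*1472", "9:16": "928*1664",
};

function planDashScopeRequest(model: string, opts: { size?: string; ar?: string; quality?: Quality }): { model: string; size?: string } {
  const quality = opts.quality ?? "2k";
  // --size wins over --ar (CLI WxH converted to DashScope W*H).
  if (opts.size) return { model, size: opts.size.toLowerCase().replace("x", "*") };
  const qwen2Size = opts.ar ? QWEN2_SIZES[opts.ar]?.[quality] : undefined;
  if (model.startsWith("qwen-image-2.0")) {
    // Infer from --ar; without a known ratio, omit size so the API default (~1024*1024) applies.
    return qwen2Size ? { model, size: qwen2Size } : { model };
  }
  if (FIXED_SIZE_MODELS.has(model)) {
    if (!opts.ar) return { model }; // API default is 1664*928
    const fixed = FIXED_BY_RATIO[opts.ar];
    if (fixed) return { model, size: fixed };
    // Ratio not covered by the five fixed sizes → switch to qwen-image-2.0-pro.
    return qwen2Size ? { model: "qwen-image-2.0-pro", size: qwen2Size } : { model: "qwen-image-2.0-pro" };
  }
  return { model }; // legacy models pass through unchanged in this sketch
}
```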

DashScope official APIs also expose negative_prompt, prompt_extend, and watermark, but baoyu-image-gen does not expose them as dedicated CLI flags today.


OpenRouter Models

Use full OpenRouter model IDs, e.g.:

  • google/gemini-3.1-flash-image-preview (recommended, supports image output and reference-image workflows)
  • google/gemini-2.5-flash-image-preview
  • black-forest-labs/flux.2-pro
  • Other OpenRouter image-capable model IDs

Notes:

  • OpenRouter image generation uses /chat/completions, not the OpenAI /images endpoints
  • If --ref is used, choose a multimodal model that supports image input and image output
  • --imageSize maps to OpenRouter imageGenerationOptions.size; --size <WxH> is converted to the nearest OpenRouter size, and the aspect ratio is inferred when possible
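One way to implement the `--size` conversion for OpenRouter is to snap the long edge to the nearest of 1K/2K/4K and pick the closest ratio on a log scale. This sketch rests on two assumptions: that 1K/2K/4K mean 1024/2048/4096-pixel long edges, and that the ratio list from the Aspect Ratios section applies. `sizeToOpenRouter` is a hypothetical name.

```typescript
type OpenRouterSize = "1K" | "2K" | "4K";

// Assumed long-edge pixel counts for each size bucket.
const SIZE_EDGES: Array<[OpenRouterSize, number]> = [["1K", 1024], ["2K", 2048], ["4K", 4096]];
// Assumed candidate ratios (the "Supported" list under Aspect Ratios).
const RATIOS = ["1:1", "16:9", "9:16", "4:3", "3:4", "2.35:1"];

function ratioValue(r: string): number {
  const [w, h] = r.split(":").map(Number);
  return w / h;
}

function sizeToOpenRouter(size: string): { imageSize: OpenRouterSize; aspectRatio: string } | null {
  const m = /^(\d+)x(\d+)$/i.exec(size.trim());
  if (!m) return null;
  const w = Number(m[1]);
  const h = Number(m[2]);
  if (w <= 0 || h <= 0) return null;
  // Nearest size bucket by long edge.
  const longEdge = Math.max(w, h);
  let best = SIZE_EDGES[0];
  for (const e of SIZE_EDGES) {
    if (Math.abs(e[1] - longEdge) < Math.abs(best[1] - longEdge)) best = e;
  }
  // Closest ratio on a log scale, so 2:1 and 1:2 are equally far from 1:1.
  const target = Math.log(w / h);
  let bestRatio = RATIOS[0];
  for (const r of RATIOS) {
    if (Math.abs(Math.log(ratioValue(r)) - target) < Math.abs(Math.log(ratioValue(bestRatio)) - target)) bestRatio = r;
  }
  return { imageSize: best[0], aspectRatio: bestRatio };
}
```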

Replicate Models

Supported model formats:

  • owner/name (recommended for official models), e.g. google/nano-banana-pro
  • owner/name:version (community models pinned to a version), e.g. stability-ai/sdxl:<version>
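Telling the two Replicate ID formats apart is a simple parse. `parseReplicateModel` is a hypothetical helper. This sketch does not show how the skill routes each form to Replicate's API.

```typescript
interface ReplicateModelRef {
  owner: string;
  name: string;
  version?: string; // present only for the owner/name:version form
}

function parseReplicateModel(id: string): ReplicateModelRef {
  // owner and name: no slashes, colons, or whitespace; optional :version suffix.
  const m = /^([^/\s:]+)\/([^/\s:]+)(?::(\S+))?$/.exec(id.trim());
  if (!m) throw new Error(`expected owner/name or owner/name:version, got "${id}"`);
  return m[3] ? { owner: m[1], name: m[2], version: m[3] } : { owner: m[1], name: m[2] };
}
```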

Examples:

# Use Replicate default model
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate

# Override model explicitly
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana

Provider Selection

  1. --ref provided + no --provider → auto-select Google first, then OpenAI, then OpenRouter, then Replicate (Jimeng does not support reference images, and Seedream is not part of this auto-selection order)
  2. --provider specified → use it (with --ref, it must be google, openai, openrouter, replicate, or seedream with a 4.0/4.5/5.0 model)
  3. Only one API key available → use that provider
  4. Multiple available → default to Google
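The selection rules can be sketched as a pure function over the CLI flags and the environment. `selectProvider` is a hypothetical name. The sketch makes three assumptions:

- A provider counts as available only when all of its variables from the Environment Variables table are set.
- Auto-selection with `--ref` considers only providers that have keys.
- When several keys exist but Google's is missing, the first available provider in a fixed order is used; the rules above do not cover this case.

Checking whether an explicitly chosen `--provider` supports `--ref` is left to a separate step.

```typescript
type Provider = "google" | "openai" | "openrouter" | "dashscope" | "jimeng" | "seedream" | "replicate";
type Env = Record<string, string | undefined>;

// Credentials per provider, from the Environment Variables table.
const KEY_VARS: Record<Provider, string[]> = {
  google: ["GOOGLE_API_KEY"],
  openai: ["OPENAI_API_KEY"],
  openrouter: ["OPENROUTER_API_KEY"],
  dashscope: ["DASHSCOPE_API_KEY"],
  jimeng: ["JIMENG_ACCESS_KEY_ID", "JIMENG_SECRET_ACCESS_KEY"],
  seedream: ["ARK_API_KEY"],
  replicate: ["REPLICATE_API_TOKEN"],
};
// Fixed scan order (an assumption of this sketch).
const ALL_PROVIDERS: Provider[] = ["google", "openai", "openrouter", "dashscope", "jimeng", "seedream", "replicate"];
// Rule 1 order for reference images.
const REF_ORDER: Provider[] = ["google", "openai", "openrouter", "replicate"];

function availableProviders(env: Env): Provider[] {
  return ALL_PROVIDERS.filter((p) => KEY_VARS[p].every((name) => Boolean(env[name])));
}

function selectProvider(opts: { provider?: Provider; hasRef: boolean }, env: Env): Provider {
  if (opts.provider) return opts.provider; // rule 2
  const available = availableProviders(env);
  if (opts.hasRef) {
    const pick = REF_ORDER.find((p) => available.includes(p)); // rule 1
    if (!pick) throw new Error("--ref needs a Google, OpenAI, OpenRouter, or Replicate API key");
    return pick;
  }
  if (available.length === 0) throw new Error("No provider API key found");
  if (available.length === 1) return available[0]; // rule 3
  return available.includes("google") ? "google" : available[0]; // rule 4
}
```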

Quality Presets

Preset | Google imageSize | OpenAI size | OpenRouter size | Replicate resolution | Use case
normal | 1K | 1024px | 1K | 1K | Quick previews
2k (default) | 2K | 2048px | 2K | 2K | Covers, illustrations, infographics

Google/OpenRouter imageSize: Can be overridden with --imageSize 1K|2K|4K
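The preset table maps directly onto a lookup with an optional `--imageSize` override. The field names in `QualityPreset` and the function name `resolvePreset` are hypothetical. The values are copied from the table.

```typescript
type Quality = "normal" | "2k";
type ImageSize = "1K" | "2K" | "4K";

interface QualityPreset {
  googleImageSize: ImageSize;
  openaiPx: number;
  openrouterSize: ImageSize;
  replicateResolution: "1K" | "2K";
}

// Values from the Quality Presets table.
const PRESETS: Record<Quality, QualityPreset> = {
  normal: { googleImageSize: "1K", openaiPx: 1024, openrouterSize: "1K", replicateResolution: "1K" },
  "2k": { googleImageSize: "2K", openaiPx: 2048, openrouterSize: "2K", replicateResolution: "2K" },
};

// --imageSize overrides only the Google/OpenRouter values; other providers keep the preset.
function resolvePreset(quality: Quality = "2k", imageSize?: ImageSize): QualityPreset {
  const base = PRESETS[quality];
  return imageSize ? { ...base, googleImageSize: imageSize, openrouterSize: imageSize } : { ...base };
}
```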

Aspect Ratios

Supported: 1:1, 16:9, 9:16, 4:3, 3:4, 2.35:1

  • Google multimodal: uses imageConfig.aspectRatio
  • OpenAI: maps to closest supported size
  • OpenRouter: sends imageGenerationOptions.aspect_ratio; if only --size <WxH> is given, the aspect ratio is inferred automatically
  • Replicate: passes aspect_ratio to model; when --ref is provided without --ar, defaults to match_input_image
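A sketch of the aspect-ratio handling, combining three rules:

- the supported list above;
- the "invalid aspect ratio → warning, proceed with default" rule from Error Handling;
- Replicate's `match_input_image` fallback.

Returning `undefined` to mean "let the provider use its default" is an assumption. The DashScope table also uses ratios such as 21:9 and 3:2, so the supported set is a parameter rather than a constant. `resolveAspectRatio` is hypothetical.

```typescript
// Ratios listed under "Supported" above.
const SUPPORTED_RATIOS: ReadonlySet<string> = new Set(["1:1", "16:9", "9:16", "4:3", "3:4", "2.35:1"]);

// Returns the ratio to send, or undefined to leave the provider default in place.
function resolveAspectRatio(
  provider: string,
  ar: string | undefined,
  hasRef: boolean,
  warn: (msg: string) => void = console.warn,
  supported: ReadonlySet<string> = SUPPORTED_RATIOS,
): string | undefined {
  let ratio = ar;
  if (ratio !== undefined && !supported.has(ratio)) {
    // Invalid ratio: warn and fall back instead of failing.
    warn(`Unsupported aspect ratio "${ratio}"; using the default instead`);
    ratio = undefined;
  }
  // Replicate: with reference images and no usable --ar, match the input image.
  if (ratio === undefined && provider === "replicate" && hasRef) return "match_input_image";
  return ratio;
}
```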

Generation Mode

Default: Sequential generation.

Batch Parallel Generation: When --batchfile contains 2 or more pending tasks, the script automatically enables parallel generation.

Mode | When to use
Sequential (default) | Normal usage, single images, small batches
Parallel batch | Batch mode with 2+ tasks

Execution choice:

Situation | Preferred approach | Why
One image, or 1-2 simple images | Sequential | Lower coordination overhead and easier debugging
Multiple images already have saved prompt files | Batch (--batchfile) | Reuses finalized prompts, applies shared throttling/retries, and gives predictable throughput
Each image still needs separate reasoning, prompt writing, or style exploration | Subagents | The work is still exploratory, so each image may need independent analysis before generation
Output comes from baoyu-article-illustrator with outline.md + prompts/ | Batch (build-batch.ts -> --batchfile) | That workflow already produces prompt files, so direct batch execution is the intended path

Rule of thumb:

  • Prefer batch over subagents once prompt files are already saved and the task is "generate all of these"
  • Use subagents only when generation is coupled with per-image thinking, rewriting, or divergent creative exploration

Parallel behavior:

  • Default worker count is automatic, capped by config, built-in default 10
  • Provider-specific throttling is applied only in batch mode, and the built-in defaults are tuned for faster throughput while still avoiding obvious RPM bursts
  • You can override the worker count with --jobs <count>
  • Each image retries automatically up to 3 attempts
  • Final output includes success count, failure count, and per-image failure reasons
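The batch behavior can be sketched with plain promises: a bounded worker pool, up to 3 attempts per image, and a success/failure summary. This is not the skill's scheduler. `runBatch` and `runWithRetry` are hypothetical, and the provider-specific throttling (concurrency caps and start intervals) is omitted.

```typescript
interface TaskResult {
  id: string;
  ok: boolean;
  attempts: number;
  error?: string;
}

// Retry a task up to maxAttempts times, keeping the last error message.
async function runWithRetry(id: string, fn: () => Promise<void>, maxAttempts = 3): Promise<TaskResult> {
  let lastError = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await fn();
      return { id, ok: true, attempts: attempt };
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  return { id, ok: false, attempts: maxAttempts, error: lastError };
}

// Fixed-size pool: each worker takes the next task index until the queue is empty.
async function runBatch(
  tasks: Array<{ id: string; run: () => Promise<void> }>,
  jobs = 10,
): Promise<{ succeeded: number; failed: number; failures: TaskResult[] }> {
  const results: TaskResult[] = new Array(tasks.length);
  let next = 0;
  const worker = async () => {
    while (next < tasks.length) {
      const i = next++; // safe: JS runs this synchronously between awaits
      results[i] = await runWithRetry(tasks[i].id, tasks[i].run);
    }
  };
  const workerCount = Math.max(1, Math.min(jobs, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, worker));
  const failures = results.filter((r) => !r.ok);
  return { succeeded: results.length - failures.length, failed: failures.length, failures };
}
```

Failed tasks do not abort the batch. Each failure is recorded with its last error, which is what the per-image failure reasons in the final output need.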

Error Handling

  • Missing API key → error with setup instructions
  • Generation failure → auto-retry up to 3 attempts per image
  • Invalid aspect ratio → warning, proceed with default
  • Reference images with unsupported provider/model → error with fix hint

Extension Support

Custom configurations are supported via EXTEND.md. See Step 0: Load Preferences for paths and supported options.

Security Advice
Do not install blindly. The SKILL.md expects to run {baseDir}/scripts/main.ts but the skill bundle contains no runtime code — ask for the script or an install spec before trusting this skill. Verify where the code will come from (the GitHub homepage) and inspect it for network endpoints and credential handling. Expect that using this skill will require API keys for providers (OpenAI, Google, Replicate, DashScope, etc.); confirm which environment variables or auth flows are required and that keys are stored securely. Be aware the skill will create/modify EXTEND.md in your project or $HOME; review its content. If you cannot review the runtime code or prefer isolation, run this in a sandboxed environment or decline until the author supplies the missing scripts and a clear list of required credentials.
Functional Analysis
Type: OpenClaw Skill · Name: baoyu-image-gen-2 · Version: 0.1.2
The skill bundle provides a comprehensive and well-documented interface for AI image generation across multiple providers including Google, OpenAI, and DashScope. The instructions in SKILL.md and first-time-setup.md define a clear configuration flow using EXTEND.md and standard CLI execution via a TypeScript script (main.ts). There are no indicators of data exfiltration, malicious persistence, or prompt injection attacks; the behavior is entirely consistent with the stated purpose of an image generation tool.
Capability Assessment
Purpose & Capability
The SKILL.md claims multi-provider image generation (OpenAI, Google, OpenRouter, DashScope, Jimeng, Seedream, Replicate) but the registry metadata declares no required environment variables or primary credential. A networked multi-provider image tool would normally require provider API keys/tokens; those are not declared. Also the SKILL.md references scripts/main.ts (the actual runtime) but the skill bundle contains no code files — only the SKILL.md and reference docs. This mismatch between claimed capabilities and what is present is concerning.
Instruction Scope
Runtime instructions tell the agent to: detect bun/npx, compute baseDir, and run {baseDir}/scripts/main.ts; require a blocking first-time setup that reads/writes EXTEND.md under project or $HOME; and use AskUserQuestion flows. Those file reads/writes (project and home) are consistent with saving preferences, but the instructions do not mention obtaining or using provider API credentials even though networked provider calls are implied. Crucially, the instructions call a local script that is not included in the bundle — the agent would attempt to run a non-existent program unless code is provided elsewhere.
Install Mechanism
There is no install spec (instruction-only), which reduces immediate disk-write/install risk. The skill requires bun or npx to run a TypeScript runtime; requiring those binaries is reasonable for a TypeScript script. However, because the script referred to in SKILL.md is absent from the provided files, there is uncertainty about what would be installed or executed in a real installation from the referenced homepage/repo.
Credentials
The skill integrates many external providers but declares no required env vars or credentials. Calling these APIs normally requires tokens (e.g., OPENAI_API_KEY, GOOGLE_API_KEY/OAuth, REPLICATE_API_TOKEN, DashScope credentials). The absence of declared credentials is disproportionate to the stated networked functionality and hides what secrets the runtime will actually use or request.
Persistence & Privilege
The skill does not request always:true and does not claim to modify other skills or system-wide settings. It does instruct writing EXTEND.md into project or user home directories (preference storage), which is a reasonable local persistence for a preferences file.
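How to Use
  1. Make sure OpenClaw is installed (local or Docker deployment)
  2. Enter the install command in the chat box: /install baoyu-image-gen-2
  3. After installation, call the skill by name or trigger it with /baoyu-image-gen-2
  4. Provide the required inputs described in the skill's parameter documentation to get structured output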
Version History
v0.1.2
  - No file changes detected in this release.
  - Functionality, options, and documentation remain unchanged from the previous version.
v0.1.1
  - Removed all TypeScript source, test, provider, and supporting files (16 files deleted).
  - No changes to documentation or user-facing options.
  - All logic and implementation have been removed; this version effectively disables the skill's functionality.
v0.1.0
  - Initial release of baoyu-image-gen-2, supporting AI image generation via OpenAI, Google, OpenRouter, DashScope, Jimeng, Seedream, and Replicate APIs.
  - Features text-to-image, reference image support, aspect ratios, batch generation, and flexible default/provider/model settings.
  - Includes robust CLI usage with detailed options for prompts, files, batch jobs, reference images, and output formats.
  - Implements strict preference/setup loading before any image generation (EXTEND.md schema and setup flow).
  - Supports per-provider environment variable configuration and provider/model overrides.
  - Includes a batch file structure and parallel/sequential worker management for high-throughput image generation.
Metadata
Slug: baoyu-image-gen-2
Version: 0.1.2
License: MIT-0
Total installs: 3
Current installs: 2
Versions: 3
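FAQ

What is Baoyu Image Gen?

AI image generation with OpenAI, Google, OpenRouter, DashScope, Jimeng, Seedream and Replicate APIs. Supports text-to-image, reference images, aspect ratios,... It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 422 times so far.

How do I install Baoyu Image Gen?

Run the command /install baoyu-image-gen-2 in the OpenClaw or Claude Code chat box to install it in one step, with no additional configuration.

Is Baoyu Image Gen free?

Yes. Baoyu Image Gen is completely free under the MIT-0 license and can be freely downloaded, installed, and used.

Which platforms does Baoyu Image Gen support?

Baoyu Image Gen is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Baoyu Image Gen?

It is developed and maintained by nengnengZ (@nengnengz). The current version is v0.1.2.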
