Description

Generate images using ChatGPT's GPT-Image-2 model via browser automation (CDP). Shares the user's daily Brave Browser (port 9222) via the brave-browser-agent...

README (SKILL.md)

GPT Image Generation (ChatGPT GPT-Image-2)

Name: GPT Image Generator
Author: mayf3

Generate images via ChatGPT's GPT-Image-2 model by automating the shared Brave Browser through CDP.

This skill shares the user's everyday Brave browser (port 9222) and uses the same browser instance as brave-browser-agent. Never start, restart, or close the browser; only attach to it.

Prerequisites

Brave Browser running on port 9222 (shared with brave-browser-agent)
ChatGPT account logged in at chatgpt.com (free or Plus)
Python 3 + websockets pip package

Shared Scripts

Reuses the CDP scripts from brave-browser-agent (port 9222):

CDP operations: brave-browser-agent/scripts/cdp_exec.py
Browser status check: brave-browser-agent/scripts/check_brave.py

This skill ships its own ChatGPT-specific script:

Image extraction: {{SKILL_DIR}}/scripts/extract_image.py

Workflow

Step 1: Check Browser Status

python3 brave-browser-agent/scripts/check_brave.py

If port 9222 does not respond, tell the user:

"Brave Browser is not running with remote debugging. Close all Brave windows, then reopen it with: /Applications/Brave\ Browser.app/Contents/MacOS/Brave\ Browser --remote-debugging-port=9222"

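The status check can also be done directly from Python. The sketch below is not the check_brave.py script itself (its contents are not shown here); it only assumes that a browser started with --remote-debugging-port answers on the standard CDP HTTP endpoint /json/version. The function name cdp_available is hypothetical.

```python
import http.client
import json
import urllib.error
import urllib.request


def cdp_available(port=9222, timeout=2.0):
    """Return True if a CDP HTTP endpoint answers on localhost:port."""
    url = f"http://localhost:{port}/json/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            info = json.loads(resp.read())
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply: treat as unavailable.
        return False
    # A debugging-enabled browser reports its browser-level WebSocket URL here.
    return isinstance(info, dict) and "webSocketDebuggerUrl" in info
```

If cdp_available() returns False, show the user the message above rather than launching the browser.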
Step 2: Open ChatGPT

Create a new tab through the CDP WebSocket (/json/new may return 405, so use Target.createTarget instead):

# Create a new tab through the browser-level CDP WebSocket
import asyncio
import json
import urllib.request

import websockets

# /json/version exposes the browser-level WebSocket endpoint
info = json.loads(urllib.request.urlopen("http://localhost:9222/json/version").read())
ws_url = info["webSocketDebuggerUrl"]

async def create_tab():
    async with websockets.connect(ws_url, max_size=50*1024*1024) as ws:
        await ws.send(json.dumps({"id": 1, "method": "Target.createTarget", "params": {"url": "https://chatgpt.com/"}}))
        while True:
            # Skip any CDP events; only the reply with our id carries the result
            resp = json.loads(await asyncio.wait_for(ws.recv(), timeout=10))
            if resp.get("id") == 1:
                return resp["result"]["targetId"]

tab_id = asyncio.run(create_tab())

Alternatively, reuse an existing ChatGPT tab:

python3 brave-browser-agent/scripts/cdp_exec.py list
# Find the chatgpt.com tab and note its TAB_ID

Save it as CHATGPT_TAB for the following steps.
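Picking the tab can be automated instead of reading the list by eye. This is a minimal sketch that assumes the standard CDP HTTP endpoint /json/list, whose entries carry "id", "type", and "url" fields. The helper names find_chatgpt_tab and list_targets are hypothetical and are not part of brave-browser-agent.

```python
import json
import urllib.request
from urllib.parse import urlparse


def find_chatgpt_tab(targets):
    """Return the id of the first page target on chatgpt.com, or None."""
    for target in targets:
        # Skip service workers, iframes, and other non-page targets.
        if target.get("type") != "page":
            continue
        host = urlparse(target.get("url", "")).hostname or ""
        if host == "chatgpt.com" or host.endswith(".chatgpt.com"):
            return target.get("id")
    return None


def list_targets(port=9222):
    """Fetch the target list from the CDP HTTP endpoint /json/list."""
    with urllib.request.urlopen(f"http://localhost:{port}/json/list", timeout=5) as resp:
        return json.loads(resp.read())
```

Calling find_chatgpt_tab(list_targets()) yields a value for CHATGPT_TAB, or None when no ChatGPT tab is open and a new one has to be created.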

Step 3: Wait for Page Load

sleep 3
python3 brave-browser-agent/scripts/cdp_exec.py screenshot $CHATGPT_TAB /tmp/chatgpt-state.png

Confirm that the page has finished loading (the input box is visible).

Step 4: Input Prompt

ChatGPT uses <div id="prompt-textarea" contenteditable="true"> as its input box.

python3 brave-browser-agent/scripts/cdp_exec.py eval $CHATGPT_TAB '
(function() {
    var el = document.querySelector("#prompt-textarea");
    if (!el) return "NO_EDITOR";
    el.focus();
    el.textContent = "YOUR_PROMPT_HERE";
    el.dispatchEvent(new InputEvent("input", {bubbles: true, inputType: "insertText", data: "YOUR_PROMPT_HERE"}));
    return "TEXT_SET";
})()
'
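Pasting a real prompt in place of YOUR_PROMPT_HERE breaks as soon as the prompt contains quotes or newlines. One way around this, sketched below, is to build the JavaScript in Python and embed the prompt with json.dumps, whose output is also a valid JavaScript string literal. The names JS_TEMPLATE and build_prompt_js are hypothetical.

```python
import json

# Same logic as the snippet above, with the prompt injected once as a JS literal.
JS_TEMPLATE = """(function() {
    var el = document.querySelector("#prompt-textarea");
    if (!el) return "NO_EDITOR";
    el.focus();
    var text = %s;
    el.textContent = text;
    el.dispatchEvent(new InputEvent("input", {bubbles: true, inputType: "insertText", data: text}));
    return "TEXT_SET";
})()"""


def build_prompt_js(prompt):
    """Return the eval expression with the prompt safely escaped."""
    return JS_TEMPLATE % json.dumps(prompt)
```

Assuming cdp_exec.py accepts the expression as a single argument, as the command above suggests, the result can be passed through subprocess.run with an argument list (no shell), which also sidesteps shell quoting of single quotes in the prompt.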

Prompt enhancement: based on the user's request, add keywords for style, quality, composition, and so on. See Prompt Tips.

Step 5: Click Send

python3 brave-browser-agent/scripts/cdp_exec.py eval $CHATGPT_TAB '
(function() {
    var btn = document.querySelector("button[data-testid=\"send-button\"]");
    if (btn) { btn.click(); return "CLICKED_SEND"; }
    return "NO_SEND_BTN";
})()
'

Step 6: Wait for Generation

GPT-Image-2 generation usually takes 15-40 seconds. Poll with screenshots:

# Wait 20 seconds
sleep 20
python3 brave-browser-agent/scripts/cdp_exec.py screenshot $CHATGPT_TAB /tmp/chatgpt-result.png

Check whether generation is still in progress:

python3 brave-browser-agent/scripts/cdp_exec.py eval $CHATGPT_TAB '
(function() {
    var imgs = document.querySelectorAll("img");
    var count = 0;
    for (var i = 0; i < imgs.length; i++) {
        var w = imgs[i].naturalWidth || imgs[i].width || 0;
        if (w >= 200) count++;
    }
    return count;
})()
'

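If count >= 1 → the image has been generated; continue to extraction
If count == 0 → still generating; wait another 10-15 seconds

Note that the check counts every image at least 200 px wide, so in a reused tab that already contains images, compare against the count taken before sending.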
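The wait-and-recheck rule can be wrapped in a small polling loop with a timeout. This is a sketch only: count_images stands for any callable that runs the image-counting eval and returns an integer, and the names wait_for_image and baseline are hypothetical. With baseline=0 it matches the count >= 1 rule; in a reused tab, pass the count observed before sending.

```python
import time


def wait_for_image(count_images, baseline=0, timeout=90.0, interval=5.0, sleep=time.sleep):
    """Poll count_images() until it exceeds baseline.

    Returns the new count, or None if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        count = count_images()
        if count > baseline:
            return count
        if time.monotonic() >= deadline:
            return None
        sleep(interval)
```

The sleep parameter is injectable so the loop can be exercised without real delays.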

Step 7: Extract Image

Use the dedicated extraction script (fetch + blob approach, which avoids CORS problems):

mkdir -p /tmp/openclaw
python3 {{SKILL_DIR}}/scripts/extract_image.py $CHATGPT_TAB /tmp/openclaw/gpt-output.png
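The internals of extract_image.py are not shown here. As an illustration of what the Python side of a fetch + blob approach might look like, the sketch below assumes the page-side code fetches the image, converts the blob to a base64 data URL, and returns that string through the eval call. The function name save_data_url is hypothetical.

```python
import base64
import binascii


def save_data_url(data_url, out_path):
    """Decode a base64 data URL, write the bytes to out_path, return the MIME type."""
    header, sep, payload = data_url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    mime = header[len("data:"):-len(";base64")]
    try:
        raw = base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError("invalid base64 payload") from exc
    with open(out_path, "wb") as fh:
        fh.write(raw)
    return mime
```

Checking the returned MIME type (for example "image/png") before sending helps catch cases where the page returned an error page instead of an image.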

Step 8: Send to User

openclaw message send \
  --channel feishu \
  --target <chat_id> \
  --media /tmp/openclaw/gpt-output.png \
  --message "🎨 Generated by ChatGPT GPT-Image-2"

Prompt Tips

GPT-Image-2 handles natural-language descriptions very well.

Example

a golden retriever puppy sitting in a field of sunflowers,
photorealistic, warm golden hour lighting, shallow depth of field, 8k, masterpiece

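GPT-Image-2 Strengths

🎯 Text rendering: produces accurate text inside images (much stronger than other models)
🎯 Photorealism: excellent photo-level realism
🎯 Complex scenes: understands complex compositions and multi-element scenes
🎯 Style control: supports a wide range of artistic styles

Style Keywords

Photography: "photorealistic, cinematic, shot on 35mm, depth of field"
Illustration: "digital illustration, concept art, vibrant"
3D: "3D render, octane render, Pixar style"
Anime: "anime style, studio ghibli, makoto shinkai"
Oil painting: "oil painting, impressionist, textured"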
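The keyword lists above lend themselves to a small helper that builds the final prompt. The sketch below also prefixes an explicit image request, since ChatGPT may otherwise answer with text only. The dictionary keys and the function name enhance_prompt are hypothetical; the keyword strings are the ones listed above.

```python
STYLE_KEYWORDS = {
    "photo": "photorealistic, cinematic, shot on 35mm, depth of field",
    "illustration": "digital illustration, concept art, vibrant",
    "3d": "3D render, octane render, Pixar style",
    "anime": "anime style, studio ghibli, makoto shinkai",
    "oil": "oil painting, impressionist, textured",
}


def enhance_prompt(subject, style=None):
    """Prefix an explicit image request and append style keywords, if any."""
    prompt = f"Generate an image: {subject.strip()}"
    if style is not None:
        # Unknown style names raise KeyError rather than being silently ignored.
        prompt += ", " + STYLE_KEYWORDS[style]
    return prompt
```

For instance, enhance_prompt("a red fox in snow", "oil") appends the oil-painting keywords to the subject.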

Comparison: GPT-Image-2 vs Gemini Imagen

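Feature	GPT-Image-2	Gemini Imagen
Text rendering	⭐⭐⭐⭐⭐ Excellent	⭐⭐⭐ Average
Photorealism	⭐⭐⭐⭐⭐ Photo-grade	⭐⭐⭐⭐ Very good
Artistic styles	⭐⭐⭐⭐ Broad	⭐⭐⭐⭐ Broad
Speed	15-40s	10-30s
Free	✅ Free/Plus	✅ Free
Prompt understanding	⭐⭐⭐⭐⭐ Excellent	⭐⭐⭐⭐ Good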

Troubleshooting

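No image / text-only response: ChatGPT may not have understood that you want an image. Make sure the prompt contains an explicit instruction such as "generate/create/draw an image".
"NO_EDITOR": The page has not finished loading. Wait a few seconds and retry, or check whether ChatGPT requires login.
"NO_SEND_BTN": The input box may be empty (React did not detect the content change). Set the text with textContent and dispatch an InputEvent, as in Step 4.
CDP errors / 9222 not responding: The browser is not running with remote debugging. Ask the user to restart Brave with --remote-debugging-port=9222. Never start or close the browser automatically.
Image too small / CORS error: Use the fetch + blob approach rather than canvas. extract_image.py already handles this.
Rate limit: Free ChatGPT accounts have a cap on image generations. Suggest that the user upgrade to Plus or try again later.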

Usage Guidance

Review before installing. Use this only if you are comfortable letting the agent control a logged-in Brave session. Prefer a separate Brave profile or browser instance for ChatGPT, close sensitive tabs first, and relaunch Brave without remote debugging after use. Generated images are saved under /tmp/openclaw and the workflow sends the result through the configured Feishu message target.

Capability Assessment

⚠ Purpose & Capability

The ChatGPT image-generation workflow is coherent, but the chosen capability is broad: it attaches to the user's daily Brave session over CDP, can run browser JavaScript, take screenshots, list tabs, and use the logged-in ChatGPT account.

⚠ Instruction Scope

The instructions mostly target a ChatGPT tab, but they rely on raw CDP evaluation against a daily browser profile and do not require an isolated profile or clearly warn that remote debugging can expose unrelated tabs and sessions.

ℹ Install Mechanism

No hidden installer or package mutation is present. The skill requires Python, websockets, an already available brave-browser-agent script set, and Brave launched with remote debugging.

⚠ Credentials

Using the user's normal Brave profile with --remote-debugging-port=9222 is disproportionate for image generation because it grants automation access to the broader browser environment, not just ChatGPT.

ℹ Persistence & Privilege

The skill does not install a daemon or persistent background worker, and it explicitly says not to start, restart, or close the browser automatically. However, if the user follows the setup command, Brave remains exposed to local CDP control until closed or relaunched normally.

Version History

v1.0.0

Initial release of gpt-image-gen: Generate images via ChatGPT's GPT-Image-2 model using browser automation.
- Automates image generation on chatgpt.com by controlling Brave Browser (with remote debugging).
- Reuses existing `brave-browser-agent` scripts for browser operations and status checks; shares the same browser session.
- Includes a dedicated script to extract generated images directly from ChatGPT.
- Provides step-by-step guidance for prompt input, image trigger, waiting for generation, and image extraction.
- Offers detailed prompt engineering tips and comparison with Gemini Imagen models.
- Chinese-language support and troubleshooting guidance built-in.

Metadata

Slug gpt-image-gen

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is GPT Image Generator?

Generate images using ChatGPT's GPT-Image-2 model via browser automation (CDP). Shares the user's daily Brave Browser (port 9222) via the brave-browser-agent... It is an AI Agent Skill for Claude Code / OpenClaw, with 42 downloads so far.

How do I install GPT Image Generator?

Run "/install gpt-image-gen" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is GPT Image Generator free?

Yes, GPT Image Generator is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does GPT Image Generator support?

GPT Image Generator is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created GPT Image Generator?

It is built and maintained by mayf3 (@mayf3); the current version is v1.0.0.
