Description

Generate images using ChatGPT/DALL-E through OpenClaw browser automation. Use when the user wants to create images via ChatGPT's web interface with their log...

README (SKILL.md)

ChatGPT Image Generation

Name: Chatgpt Image Gen
Author: sonim1

Generate images using ChatGPT's DALL-E integration through OpenClaw browser automation.

Prerequisites

Chrome Extension Installation:
- Install OpenClaw Browser Relay from the Chrome Web Store
- Or use the extension that comes with OpenClaw
Initial Setup (one-time):
- Open ChatGPT (chatgpt.com) in Chrome/Brave
- Log in to your ChatGPT account (Pro subscription recommended for best quality)
- Click the OpenClaw extension icon on the ChatGPT tab to attach it
- The badge should show "ON" when attached

How It Works

This skill uses OpenClaw's built-in browser tool with Chrome extension relay (profile="chrome") to control an already-logged-in ChatGPT tab. This bypasses ChatGPT's bot detection because it uses your real browser session.

CLI Command Reference

IMPORTANT: There is NO browser act subcommand. Each action is a direct subcommand.

Action	CLI Syntax
List tabs	`openclaw browser tabs`
Snapshot	`openclaw browser snapshot --target-id <ID>`
Click	`openclaw browser click <ref> --target-id <ID>`
Type	`openclaw browser type <ref> "<text>" --target-id <ID>`
Press key	`openclaw browser press <key> --target-id <ID>`
Navigate	`openclaw browser navigate <url> --target-id <ID>`
Screenshot	`openclaw browser screenshot --target-id <ID>`

<ref> and <text> are positional arguments (no --ref flag)
--target-id accepts a full ID or unique prefix (e.g. 77CB instead of 77CB8A574E8A44861C5FE49388EF6ABC)
--profile is a parent option on openclaw browser, not on subcommands
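
Because --profile belongs to `openclaw browser` itself, a small wrapper can keep it in the right position for every subcommand. The sketch below is an illustration only: `oc_browser`, `OPENCLAW_BIN`, and `OPENCLAW_PROFILE` are hypothetical names introduced here, and the profile value `chrome` is taken from the How It Works section.

```shell
# Hypothetical wrapper (not part of OpenClaw): places --profile on the parent
# "openclaw browser" command, before the subcommand.
# OPENCLAW_BIN and OPENCLAW_PROFILE are made-up override variables for this sketch.
oc_browser() {
  "${OPENCLAW_BIN:-openclaw}" browser --profile "${OPENCLAW_PROFILE:-chrome}" "$@"
}

# Usage: the subcommand and its arguments follow the parent option.
#   oc_browser tabs
#   oc_browser click e23 --target-id 77CB
# Dry run (prints the arguments instead of calling openclaw):
#   OPENCLAW_BIN=echo oc_browser tabs
```

The dry-run form is useful for checking argument order before touching a live tab.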

Workflow

1. List Attached Tabs

openclaw browser tabs

Look for a tab with URL containing chatgpt.com. Note the targetId.
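
Picking out the ChatGPT tab can be done with a plain text filter. This is a minimal sketch that assumes the tab listing prints each tab's URL as text; the exact layout of the `openclaw browser tabs` output is not specified here, and `chatgpt_tab_lines` is a hypothetical helper name.

```shell
# Reads a tab listing on stdin and keeps only lines that mention chatgpt.com.
# Assumption: the URL appears as plain text somewhere in the listing.
chatgpt_tab_lines() {
  grep -F 'chatgpt.com'
}

# Usage:
#   openclaw browser tabs | chatgpt_tab_lines
```

If the targetId turns out to be printed on a different line from the URL, read the full listing instead of the filtered one.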

2. Get Snapshot (find element refs)

openclaw browser snapshot --target-id <ID> --format ai --efficient

This outputs a tree with refs like e23, e589, etc. Always run snapshot before interacting.
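
Finding the right ref in a long snapshot is easier with a filter. The sketch below assumes that a ref (the letter e followed by digits, as in e23) appears on the same line as the element's visible label; that layout is an assumption, and `refs_near` is a hypothetical helper name.

```shell
# refs_near LABEL: reads snapshot text on stdin, keeps lines containing LABEL
# (case-insensitive, fixed string), and prints ref-like tokens such as e23.
# Assumption: refs sit on the same line as the label they belong to.
refs_near() {
  grep -iF -- "$1" | grep -owE 'e[0-9]+'
}

# Usage ("Ask anything" is the input label used in the example session):
#   openclaw browser snapshot --target-id <ID> --format ai --efficient | refs_near "Ask anything"
```

Treat the result as a candidate list and confirm it against the full snapshot before clicking.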

3. Click an Element

openclaw browser click e23 --target-id <ID>

4. Type Text

openclaw browser type e589 "Generate an image: a futuristic city at sunset" --target-id <ID>

Add --submit to press Enter after typing:

openclaw browser type e589 "Generate an image: a cat riding a skateboard" --target-id <ID> --submit

5. Press a Key

openclaw browser press Enter --target-id <ID>

6. Wait for Generation

Use sleep to wait for DALL-E to generate (30-60 seconds):

sleep 45

Then take a new snapshot to check the result:

openclaw browser snapshot --target-id <ID> --format ai --efficient
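
As an alternative to a single fixed sleep, the wait-then-snapshot step can be repeated until the result shows up. This is a sketch under assumptions: `poll_until` is a hypothetical helper, and the text to look for (for example "Download") is a guess that should be confirmed against a real snapshot.

```shell
# poll_until PATTERN ATTEMPTS INTERVAL CMD [ARGS...]
# Runs CMD up to ATTEMPTS times, INTERVAL seconds apart, and returns 0 as soon
# as its output contains PATTERN (case-insensitive, fixed string); 1 otherwise.
poll_until() {
  pu_pattern=$1; pu_attempts=$2; pu_interval=$3
  shift 3
  pu_i=0
  while [ "$pu_i" -lt "$pu_attempts" ]; do
    if "$@" | grep -qiF -- "$pu_pattern"; then
      return 0
    fi
    pu_i=$((pu_i + 1))
    # Sleep only if another attempt is coming.
    if [ "$pu_i" -lt "$pu_attempts" ]; then
      sleep "$pu_interval"
    fi
  done
  return 1
}

# Usage: check every 10 seconds, up to 6 times.
# "Download" is an assumed label, not confirmed by this document.
#   poll_until "Download" 6 10 openclaw browser snapshot --target-id <ID> --format ai --efficient
```

Polling returns as soon as the pattern appears, while the attempt limit keeps the total wait bounded.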

Complete Example Session

# 1. List tabs, find the ChatGPT tab targetId
openclaw browser tabs

# 2. Take snapshot to find element refs
openclaw browser snapshot --target-id 4535E --format ai --efficient

# 3. Click input field (check ref from snapshot, usually labeled "Ask anything")
openclaw browser click e589 --target-id 4535E

# 4. Type prompt and submit
openclaw browser type e589 "Generate an image: a futuristic city at sunset" --target-id 4535E --submit

# 5. Wait for DALL-E generation
sleep 45

# 6. Take new snapshot to see result and find download button
openclaw browser snapshot --target-id 4535E --format ai --efficient

# 7. Click download button (ref from new snapshot)
openclaw browser click e745 --target-id 4535E
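
Steps 3 and 4 of the session can be wrapped in one function so the target ID, ref, and prompt are passed once. This is only a sketch: `submit_prompt` and the `OPENCLAW_BIN` override are hypothetical names, and the commands inside are the same click and type --submit calls shown above.

```shell
# submit_prompt TARGET_ID INPUT_REF PROMPT
# Clicks the input field, then types the prompt and submits it.
# OPENCLAW_BIN is a made-up override so the calls can be dry-run with echo.
submit_prompt() {
  sp_target=$1; sp_ref=$2; sp_prompt=$3
  "${OPENCLAW_BIN:-openclaw}" browser click "$sp_ref" --target-id "$sp_target" &&
    "${OPENCLAW_BIN:-openclaw}" browser type "$sp_ref" "$sp_prompt" --target-id "$sp_target" --submit
}

# Usage (take the ref from a fresh snapshot):
#   submit_prompt 4535E e589 "Generate an image: a futuristic city at sunset"
```

The `&&` means the prompt is typed only if the click succeeded.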

Troubleshooting

"Can't reach the OpenClaw browser control service":

Gateway restart needed: openclaw gateway restart
Or restart via OpenClaw menu bar app

"Chrome extension relay is running, but no tab is connected":

ChatGPT tab is not attached
Go to the ChatGPT tab and click the OpenClaw extension icon

"ref is required" error:

You need to specify which element to interact with
Run snapshot first to get the refs

Command not found / Unknown command:

Do NOT use browser act — use direct subcommands: browser click, browser type, browser press
ref is a positional argument: browser click e23, NOT browser click --ref e23
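
Both mistakes above can be caught before a command is sent. The sketch below is a hypothetical pre-flight check (`check_browser_args` is not an OpenClaw command); it inspects the arguments that would follow `openclaw browser` and assumes the subcommand comes first.

```shell
# check_browser_args ARGS...: returns 1 with a hint on stderr if the arguments
# use "act" as the subcommand or pass the ref through a --ref flag.
check_browser_args() {
  if [ "$1" = "act" ]; then
    echo "no 'act' subcommand: use click, type, or press directly" >&2
    return 1
  fi
  for cba_arg in "$@"; do
    case "$cba_arg" in
      --ref|--ref=*)
        echo "ref is positional: 'click e23', not 'click --ref e23'" >&2
        return 1
        ;;
    esac
  done
  return 0
}

# Usage:
#   check_browser_args click e23 --target-id 77CB && openclaw browser click e23 --target-id 77CB
```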

Image generation timeout:

DALL-E generation takes 30-60 seconds
Use sleep 45 then re-snapshot to check

Bot detection / Login issues:

The tab must be already logged in via your real browser
Use the Chrome extension relay (attached tab), not the isolated browser

Tips

Keep ChatGPT tab open: Once attached, keep the tab open for future use
Check targetId: The targetId changes if you close/reopen the tab — always run tabs first
Use --submit: The type command supports --submit to press Enter automatically
Unique prefix: --target-id accepts a unique prefix, no need for the full 32-char ID
Pro subscription: ChatGPT Pro gives better image quality and faster generation
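
The unique-prefix tip can be checked before a prefix is used. This is a minimal sketch with a hypothetical helper, `is_unique_prefix`, which takes a prefix and the list of current targetIds. It compares case-sensitively, which may or may not match how OpenClaw resolves prefixes.

```shell
# is_unique_prefix PREFIX ID...: returns 0 only if exactly one ID starts with PREFIX.
is_unique_prefix() {
  iup_prefix=$1
  shift
  iup_count=0
  for iup_id in "$@"; do
    case "$iup_id" in
      "$iup_prefix"*) iup_count=$((iup_count + 1)) ;;
    esac
  done
  [ "$iup_count" -eq 1 ]
}

# Usage (pass the IDs noted from the tab listing):
#   is_unique_prefix 77CB 77CB8A574E8A44861C5FE49388EF6ABC <other IDs...> && echo "prefix is safe to use"
```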

Security Note

This approach uses your actual Chrome browser session, so it inherits all your ChatGPT permissions and settings. No credentials are stored or transmitted - everything happens in your existing browser session.

Usage Guidance

This skill is coherent for automating image generation via your ChatGPT browser session, but it requires installing and trusting a Chrome extension that will control an actual logged-in browser tab. Before using:

1) Verify the extension is legitimate (publisher, Web Store listing, permissions, reviews).
2) Prefer installing the extension in a separate browser profile with only the ChatGPT account signed in to limit exposure.
3) Do not keep sensitive data or other logged-in accounts in that profile while using the skill.
4) Be aware the SKILL.md explicitly suggests bypassing bot detection, which may violate ChatGPT's terms of service.
5) If possible, consider using the official API (OpenAI image endpoints) instead of browser automation for better auditability.

If you cannot verify the extension's trustworthiness or accept the privacy trade-offs, do not install or use this skill.

Capability Analysis

Type: OpenClaw Skill
Name: chatgpt-image-gen
Version: 1.0.0

The skill bundle provides instructions for an AI agent to automate image generation on ChatGPT via OpenClaw's browser relay tool. It uses standard browser automation commands (click, type, snapshot) to interact with an existing, user-authenticated ChatGPT session. There are no indicators of data exfiltration, malicious execution, or prompt injection; the workflow is transparent and strictly aligned with the stated purpose in SKILL.md.

Capability Assessment

✓ Purpose & Capability

Name and description match the instructions: the skill is an instruction-only guide for using OpenClaw's browser automation to generate images via ChatGPT/DALL·E in a logged-in browser tab. It does not request unrelated credentials, binaries, or installs in the manifest.

ℹ Instruction Scope

SKILL.md instructs the agent to attach a Chrome extension to an already-logged-in ChatGPT tab and then use direct browser control commands (snapshot, click, type, press, screenshot) to drive DALL·E. Those actions are within the declared purpose, but they explicitly rely on controlling your real browser session (including its ChatGPT permissions) and even state 'bypass ChatGPT's bot detection', which has privacy, security, and potential terms-of-service implications.

✓ Install Mechanism

There is no install spec and no code files; the only installation step documented is a Chrome extension install from the Web Store or the extension bundled with OpenClaw. Because the skill itself does not perform downloads or write to disk, install risk from the skill bundle is low. The real risk is the extension you must trust externally.

✓ Credentials

The skill declares no environment variables, credentials, or config paths. It instead instructs use of an already-logged-in browser session; this is proportionate for the stated browser-automation purpose. However, because the extension inherits the browser session, it can access whatever that session has access to—this is a privacy/trust concern rather than a manifest inconsistency.

✓ Persistence & Privilege

Skill flags are default (not always:true). The skill does not request permanent always-on presence nor modify other skills. It requires the user to attach an extension to a browser tab, which gives the extension session-level privileges, but that is external to the skill's bundle.

Version History

v1.0.0

- Initial release of chatgpt-image-gen skill.
- Enables DALL-E image generation via ChatGPT's web interface using OpenClaw browser automation.
- Requires setup of the OpenClaw Chrome extension and an already logged-in ChatGPT tab.
- Provides detailed CLI workflow for interacting with ChatGPT UI (listing tabs, clicking, typing prompts, downloading images).
- Includes troubleshooting tips and best practices for reliable image generation.

Metadata

Slug chatgpt-image-gen

Version 1.0.0

License —

All-time Installs 3

Active Installs 3

Total Versions 1

Frequently Asked Questions

What is Chatgpt Image Gen?

Generate images using ChatGPT/DALL-E through OpenClaw browser automation. Use when the user wants to create images via ChatGPT's web interface with their log... It is an AI Agent Skill for Claude Code / OpenClaw, with 366 downloads so far.

How do I install Chatgpt Image Gen?

Run "/install chatgpt-image-gen" in the OpenClaw or Claude Code chat to install the skill in one step. The browser extension setup described under Prerequisites is still required before first use.

Is Chatgpt Image Gen free?

Yes, Chatgpt Image Gen is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Chatgpt Image Gen support?

Chatgpt Image Gen is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Chatgpt Image Gen?

It is built and maintained by sonim1 (@sonim1); the current version is v1.0.0.
