Description

Generate and edit images via OpenAI gpt-image-2 model. Agent-agnostic CLI — works with any AI agent (Hermes, Claude Code, Codex, OpenClaw, etc.). Supports co...

README (SKILL.md)

gpt-image-2

Name: GPT Image 2 API
Author: jancong

Generate and edit images via OpenAI's gpt-image-2 model. Agent-agnostic — designed to work with any AI agent or standalone from the command line.

Quick Start

# 1. Initialize config (one-time)
python3 gpt_image2.py config --init

# 2. Edit the config to set your API key
#    ~/.config/gpt-image-2/config.json

# 3. Generate
python3 gpt_image2.py generate "A cute cat on a windowsill" -o ~/cat.png --quality low

# 4. Edit
python3 gpt_image2.py edit input.png "Change the sofa to green" -o ~/output.png

Configuration

Config priority: --config flag > --base-url/--api-key flags > config file > environment variables > defaults.

Config File Locations (in priority order)

Priority	Path	Notes
1	`$XDG_CONFIG_HOME/gpt-image-2/config.json`	XDG standard (recommended)
2	`~/.config/gpt-image-2/config.json`	Default XDG fallback
3	`~/.gpt-image-2-config.json`	Single-file fallback
4	`~/.hermes/gpt-image-2-config.json`	Legacy Hermes compat

Use python3 gpt_image2.py config --show to see which config is active.
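
The lookup order in the table above can be sketched as follows. This is a minimal illustration, not the script's actual code: the function names `candidate_config_paths` and `find_active_config` are hypothetical, and the sketch assumes the first file that exists on disk wins.

```python
from pathlib import Path
from typing import List, Mapping, Optional


def candidate_config_paths(env: Mapping[str, str], home: Path) -> List[Path]:
    """Return the documented config locations, highest priority first."""
    paths = []
    xdg = env.get("XDG_CONFIG_HOME")
    if xdg:
        # Priority 1 only applies when XDG_CONFIG_HOME is set.
        paths.append(Path(xdg) / "gpt-image-2" / "config.json")
    paths.append(home / ".config" / "gpt-image-2" / "config.json")
    paths.append(home / ".gpt-image-2-config.json")
    paths.append(home / ".hermes" / "gpt-image-2-config.json")
    return paths


def find_active_config(env: Mapping[str, str], home: Path) -> Optional[Path]:
    """Return the first candidate that exists (assumed rule), or None."""
    for path in candidate_config_paths(env, home):
        if path.is_file():
            return path
    return None
```

Calling `find_active_config(os.environ, Path.home())` would then report the same kind of answer that `config --show` gives, under the assumptions stated above.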

Config File Format

{
  "base_url": "https://api.openai.com/v1",
  "api_key_env": "OPENAI_API_KEY"
}

Field	Type	Description
`base_url`	string	API base URL. Default: `https://api.openai.com/v1`
`api_key`	string	Plaintext API key (not recommended — visible in file)
`api_key_env`	string	Environment variable name holding the key (recommended)

Environment Variables (fallback when no config file)

Variable	Purpose
`GPT_IMAGE2_API_KEY`	API key
`GPT_IMAGE2_BASE_URL`	API base URL
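
To make the relationship between the config fields and the fallback variables concrete, here is a hedged sketch of how a key and base URL might be resolved once a config file has been parsed. The function names are hypothetical, and the order in which `api_key_env` and `api_key` are consulted is an assumption; the source only states that config file values outrank environment variables.

```python
from typing import Mapping, Optional

DEFAULT_BASE_URL = "https://api.openai.com/v1"


def resolve_api_key(config: Mapping[str, str], env: Mapping[str, str]) -> Optional[str]:
    """Pick an API key from the parsed config, then from the fallback variable."""
    # Assumption: the indirect api_key_env field is tried before plaintext api_key.
    env_name = config.get("api_key_env")
    if env_name and env.get(env_name):
        return env[env_name]
    if config.get("api_key"):
        return config["api_key"]
    # Documented fallback when the config file supplies nothing.
    return env.get("GPT_IMAGE2_API_KEY") or None


def resolve_base_url(config: Mapping[str, str], env: Mapping[str, str]) -> str:
    """Config file value, then GPT_IMAGE2_BASE_URL, then the documented default."""
    return config.get("base_url") or env.get("GPT_IMAGE2_BASE_URL") or DEFAULT_BASE_URL
```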

Config Management Commands

# Create template config
python3 gpt_image2.py config --init

# Show active config (keys are masked)
python3 gpt_image2.py config --show

# Overwrite config
python3 gpt_image2.py config --init --force

CLI Reference

generate — Text-to-Image

python3 gpt_image2.py generate "prompt" [options]

Option	Default	Description
`-o, --output`	`~/gpt-image2-output.png`	Output file path
`--quality`	`auto`	`low` (~70s), `medium` (~120s), `high` (~276s)
`--size`	`auto`	`1024x1024`, `1536x1024`, `1024x1536`
`--format`	`png`	`png`, `jpeg`, `webp`
`--n`	`1`	Number of images (1-10)
`--timeout`	`600`	curl timeout in seconds
`--config`	auto-detect	Explicit config file path
`--base-url`	from config	Override API base URL
`--api-key`	from config	Override API key (visible in ps!)
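
A caller may want to validate options before spending minutes on a request. The sketch below builds an argument list for `generate` from the option table above. `build_generate_args` is a hypothetical helper, not part of the script, and it assumes the script rejects values outside the listed sets.

```python
from typing import List

# Allowed values copied from the option table above.
QUALITIES = {"auto", "low", "medium", "high"}
SIZES = {"auto", "1024x1024", "1536x1024", "1024x1536"}
FORMATS = {"png", "jpeg", "webp"}


def build_generate_args(script_path: str, prompt: str, output: str,
                        quality: str = "auto", size: str = "auto",
                        fmt: str = "png", n: int = 1) -> List[str]:
    """Validate options and return an argument list suitable for subprocess.run."""
    if quality not in QUALITIES:
        raise ValueError(f"unsupported quality: {quality}")
    if size not in SIZES:
        raise ValueError(f"unsupported size: {size}")
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    if not 1 <= n <= 10:
        raise ValueError("n must be between 1 and 10")
    return ["python3", script_path, "generate", prompt, "-o", output,
            "--quality", quality, "--size", size, "--format", fmt, "--n", str(n)]
```

Because the result is a list rather than a shell string, the prompt needs no extra quoting when passed to `subprocess.run`.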

edit — Image-to-Image

python3 gpt_image2.py edit <image_path> "edit prompt" [options]

Option	Default	Description
`--mask`	none	PNG mask (transparent=edit area)
`--moderation`	`auto`	`low` or `auto`
(all generate options also apply)

config — Manage Configuration

python3 gpt_image2.py config [--init] [--show] [--force] [--config PATH]

Script Location

The script is at scripts/gpt_image2.py relative to this skill directory.

To find it programmatically from any agent:

# If installed as a Hermes skill:
SCRIPT="$(dirname "$(readlink -f "$0")")/../skills/creative/gpt-image-2/scripts/gpt_image2.py"

# Or copy/symlink it anywhere — it's self-contained with zero dependencies beyond stdlib + curl
cp scripts/gpt_image2.py /usr/local/bin/gpt-image2

The script has zero pip dependencies — only Python 3.8+ stdlib and curl.

API Reference

Generations (Text-to-Image)

Item	Value
Endpoint	`POST {base_url}/images/generations`
Auth	`Authorization: Bearer {api_key}`
Content-Type	`application/json`

Edits (Image-to-Image)

Item	Value
Endpoint	`POST {base_url}/images/edits`
Auth	`Authorization: Bearer {api_key}`
Content-Type	`multipart/form-data`

Parameters

Generations (JSON body):

Param	Type	Required	Description
`model`	string	yes	`gpt-image-2`
`prompt`	string	yes	Text description
`n`	int	no	Number of images (default 1)
`size`	string	no	`1024x1024`, `1536x1024`, `1024x1536`
`quality`	string	no	`low`, `medium`, `high` (default `auto`)
`format`	string	no	`png`, `jpeg`, `webp` (default `png`)
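
As an illustration of the table above, a generations request body could be assembled like this. It is a sketch under stated assumptions: the helper names are hypothetical, and omitting optional fields so the server applies its defaults is a design choice made here, not something the source specifies.

```python
import json
from typing import Any, Dict


def generations_url(base_url: str) -> str:
    """Join the base URL and the documented generations path."""
    return base_url.rstrip("/") + "/images/generations"


def build_generation_payload(prompt: str, n: int = 1, size: str = "",
                             quality: str = "", fmt: str = "") -> Dict[str, Any]:
    """Build the JSON body; empty optional values are left out."""
    payload: Dict[str, Any] = {"model": "gpt-image-2", "prompt": prompt, "n": n}
    for key, value in (("size", size), ("quality", quality), ("format", fmt)):
        if value:
            payload[key] = value
    return payload


body = json.dumps(build_generation_payload("A cute cat on a windowsill", quality="low"))
```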

Edits (form-data):

Param	Type	Required	Description
`model`	string	yes	`gpt-image-2`
`prompt`	string	yes	Edit instruction
`image`	file	yes	Source image (PNG, max 4 images)
`n`	int	no	Number of outputs (default 1)
`size`	string	no	`1024x1024`, `1536x1024`, `1024x1536`, or `auto`
`quality`	string	no	`low`, `medium`, `high` (default `auto`)
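
For the multipart case, the form fields above map naturally onto curl's `-F` option. The following sketch covers a single source image only and is an assumption about how such a call could be assembled, not a description of the script's internals. `build_edit_form_args` and the `prompt_file` parameter are hypothetical.

```python
from typing import List


def build_edit_form_args(image_path: str, prompt_file: str, size: str = "auto",
                         quality: str = "auto", n: int = 1) -> List[str]:
    """Return curl -F arguments for a single-image edit request."""
    return [
        "-F", "model=gpt-image-2",
        # '<' tells curl to send the file's contents as the field value.
        "-F", f"prompt=<{prompt_file}",
        # '@' tells curl to upload the file itself.
        "-F", f"image=@{image_path}",
        "-F", f"n={n}",
        "-F", f"size={size}",
        "-F", f"quality={quality}",
    ]
```

Reading the prompt from a file keeps arbitrary prompt text out of the argument list. File names containing commas or semicolons would need extra quoting in curl, which this sketch does not handle.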

Agent Integration Guide

This skill is designed to be agent-agnostic. Any AI agent can use it by:

Locate the script: Find gpt_image2.py in the skill's scripts/ directory
Call via shell: python3 <path>/gpt_image2.py generate "prompt" -o output.png
Parse stdout: The script prints Saved: <path> (<size> KB) on success
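
The parsing step can be sketched with a regular expression. This assumes the success line has exactly the shape `Saved: <path> (<size> KB)` with a plain decimal size; `parse_saved` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# One "Saved: <path> (<size> KB)" entry per line; the path may contain spaces.
SAVED_RE = re.compile(r"^Saved: (?P<path>.+) \((?P<size>[\d.]+) KB\)\s*$", re.MULTILINE)


def parse_saved(stdout: str) -> List[Tuple[str, float]]:
    """Return (path, size_in_kb) for every Saved line found in the output."""
    return [(m.group("path"), float(m.group("size"))) for m in SAVED_RE.finditer(stdout)]
```

An empty list signals that nothing was saved, which an agent can treat as a failure and then inspect the rest of the output.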

Integration Examples

Hermes / Claude Code / Codex / OpenClaw:

python3 /path/to/gpt-image-2/scripts/gpt_image2.py generate "prompt" -o output.png --quality low

From Python (any agent):

import subprocess
result = subprocess.run(
    ["python3", script_path, "generate", prompt, "-o", output_path, "--quality", "low"],
    capture_output=True, text=True, timeout=600
)
# Parse result.stdout for "Saved: <path>"

From Node.js / TypeScript:

const { execFileSync } = require('child_process');
// Pass arguments as an array so the prompt is never interpreted by a shell
const output = execFileSync('python3', [scriptPath, 'generate', prompt, '-o', outputPath]);
// Parse output.toString() for "Saved: ..."

Workflow: Agent Generates Images

Always use the CLI script — handles config resolution, auth security, and response parsing
Use low quality for drafts, high quality for final output
For edits: --size auto preserves original dimensions (recommended)
The script outputs: HTTP status, time elapsed, output file path and size
Parse the output: look for Saved: <path> lines to find generated files

Workflow: Agent Edits Existing Images

Save or locate the source image path
Call gpt_image2.py edit <image_path> "<edit_prompt>" --output <output_path>
Edit endpoint can accept up to 4 images via repeated --image flags
Use --size auto to preserve original dimensions

Important Pitfalls

--api-key flag is visible in shell history and ps aux — prefer config file (api_key_env) or environment variables.
The edits endpoint does NOT support response_format — always returns b64_json regardless.
gpt-image-2 generations may time out on some relay endpoints — use --timeout flag (default 600s).
Prompts with special characters: the script writes prompts to temp files internally, which avoids shell escaping issues inside the script. No special quoting is needed beyond passing the prompt as a single argument.
The Authorization header is never passed via -H: the script writes it to a temporary curl -K config file that is deleted immediately after use, so keys never appear in ps aux.
Config file permissions: the script warns if the config file is readable by group or other. Run chmod 600 <config> to fix.
Zero pip dependencies — the script only requires Python 3.8+ stdlib and curl. No installation step needed.
Chinese text in prompts may not render correctly — gpt-image-2's Chinese rendering is unstable; it often ignores Chinese constraints and outputs English text in images. Consider using Gemini for Chinese text rendering.
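
The curl -K technique mentioned in the pitfalls above can be sketched as follows. This is an illustration of the general approach, not the script's actual code: `write_curl_auth_config` and `curl_command` are hypothetical names, and the caller is assumed to delete the file in a `finally` block once curl has run.

```python
import os
import tempfile
from typing import List


def write_curl_auth_config(api_key: str) -> str:
    """Write a curl config file holding the Authorization header; return its path."""
    # Escape characters that are special inside a quoted curl config value.
    escaped = api_key.replace("\\", "\\\\").replace('"', '\\"')
    # mkstemp creates the file readable and writable by the current user only.
    fd, path = tempfile.mkstemp(prefix="curl-auth-", suffix=".conf")
    with os.fdopen(fd, "w") as handle:
        handle.write(f'header = "Authorization: Bearer {escaped}"\n')
    return path


def curl_command(url: str, config_path: str) -> List[str]:
    """Build a curl invocation that reads the header from the config file."""
    return ["curl", "-sS", "-K", config_path, url]
```

Only the config file path appears in the process arguments, so the key itself is not visible in process listings.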

Usage Guidance

This script is generally coherent with its stated purpose, but review these points before installing or using it:

(1) The script can read config files (~/.config..., ~/.gpt-image-2-config.json, and legacy ~/.hermes/*) and can read an environment variable whose name is configured, so only put API keys you trust the tool to use.
(2) The --base-url option and config.base_url let callers send requests to any HTTP endpoint; only use a base_url you trust (the default is api.openai.com). Malicious base_url settings would cause your API key to be sent to a third party.
(3) Avoid storing sensitive keys in plaintext config files; prefer a restricted environment variable and secure file permissions (the tool warns about loose perms).
(4) Because the registry metadata lists no required env vars even though the tool documents them, verify the key name and config before handing over credentials.

If you don't trust the skill source, inspect the full script locally (especially where base_url and api_key are read and used) and restrict file permissions if you save config files.
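
A permission check of the kind described above might look like this on a POSIX system. It is a minimal sketch: `has_loose_permissions` is a hypothetical name, and treating only group/other read bits as "loose" is an assumption about what the tool checks.

```python
import os
import stat


def has_loose_permissions(path: str) -> bool:
    """Return True if group or other users can read the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return bool(mode & (stat.S_IRGRP | stat.S_IROTH))
```

A file set to mode 600 passes this check, while the common default of 644 does not.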

Capability Analysis

Type: OpenClaw Skill
Name: gpt-image-2-api
Version: 2.0.0

The gpt-image-2 skill is a well-implemented utility for image generation and editing via OpenAI-compatible APIs. The core script (scripts/gpt_image2.py) demonstrates security-conscious design by using temporary configuration files for curl headers to prevent API keys from appearing in process listings (ps aux) and by enforcing restrictive file permissions (0o600) on its configuration files. It avoids shell injection by using subprocess.run with argument lists and contains no evidence of data exfiltration, persistence mechanisms, or malicious prompt injection.

Capability Tags

requires-sensitive-credentials

Capability Assessment

ℹ Purpose & Capability

The name/description match the included script: a CLI that posts to an images endpoint and saves base64-encoded images. However the registry metadata declares no required environment variables while both SKILL.md and the script document/accept environment variables (GPT_IMAGE2_API_KEY, GPT_IMAGE2_BASE_URL and configurable api_key_env). This is a mild metadata inconsistency but not by itself malicious.
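
The "saves base64-encoded images" step can be illustrated with a short sketch. It assumes the response body follows the usual Images API shape, a `data` list whose items carry `b64_json`. The function name `save_images` and the `-2`, `-3` suffix scheme for additional images are hypothetical.

```python
import base64
import json
from pathlib import Path
from typing import List


def save_images(response_text: str, output_path: str) -> List[Path]:
    """Decode every b64_json entry in the response and write it to disk."""
    out = Path(output_path)
    saved = []
    for index, item in enumerate(json.loads(response_text).get("data", [])):
        # First image keeps the requested name; later ones get a numeric suffix.
        target = out if index == 0 else out.with_name(f"{out.stem}-{index + 1}{out.suffix}")
        target.write_bytes(base64.b64decode(item["b64_json"]))
        saved.append(target)
    return saved
```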

ℹ Instruction Scope

Runtime instructions and the script are narrowly scoped to image generation/editing via HTTP calls. They reference config files (XDG path, ~/.config, ~/.gpt-image-2-config.json, and legacy ~/.hermes/*) which is expected for config discovery. A notable capability in the instructions: the user (or an integrator) can override base_url to any endpoint, so API keys supplied to the tool may be sent to a non-OpenAI server if a malicious base_url is used.

✓ Install Mechanism

No install spec; the skill ships a self-contained Python script that uses only stdlib + curl. No downloads from arbitrary URLs or package installs are required by the skill itself.

⚠ Credentials

The skill accepts an API key via CLI, config file (plaintext), named env var (GPT_IMAGE2_API_KEY) or an arbitrary env var name set in config.api_key_env. Requiring or reading an env var is reasonable for an API client, but the metadata did not declare any required env vars. The ability to set arbitrary base_url + arbitrary api_key_env increases the risk that credentials could be directed to a third-party service or that an unexpected env var name will be read; the script also documents storing plaintext keys in config files (not recommended).

✓ Persistence & Privilege

The skill does not request always:true, does not auto-enable itself, and does not modify other skills. It only suggests copying the script to PATH (user action). No elevated or persistent system privileges are requested.

Version History

v2.0.0

v2.0: agent-agnostic redesign, XDG config paths, config --init/--show commands, zero pip deps, multi-agent integration guide, removed Hermes-specific references

v1.1.0

Initial publish: text-to-image and image-to-image editing via OpenAI gpt-image-2 API, configurable base_url/api_key, security-first config, curl -K auth safety

Metadata

Slug gpt-image-2-api

Version 2.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 2

Frequently Asked Questions

What is GPT Image 2 API?

GPT Image 2 API is an agent-agnostic CLI skill for Claude Code, OpenClaw, and other AI agents (Hermes, Codex, etc.) that generates and edits images via OpenAI's gpt-image-2 model. It has 84 downloads so far.

How do I install GPT Image 2 API?

Run "/install gpt-image-2-api" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is GPT Image 2 API free?

Yes, GPT Image 2 API is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does GPT Image 2 API support?

GPT Image 2 API is cross-platform and runs anywhere OpenClaw or Claude Code is available.

Who created GPT Image 2 API?

It is built and maintained by Cong Pendy (@jancong); the current version is v2.0.0.
