Description

GPT-4o Image Generation & Editing Skill - Create, edit, transform, and analyze images using GPT-4o native image-2 API. Supports text-to-image, inpainting, ou...

README (SKILL.md)

# Image-2 Skill

Name: Image-2 Skill
Author: gpt

Create, edit, transform, and analyze images with GPT-4o's native image generation API.

## When to Use This Skill

Use this skill whenever the user needs to:

- Generate images from text descriptions ("画一张..." [draw a picture of...], "生成图片..." [generate an image...], "create an image of...")
- Edit existing images with natural language ("把背景去掉" [remove the background], "add a sunset", "换成蓝色" [change it to blue])
- Create variations of an image ("生成几个变体" [generate a few variations], "make 4 variations")
- Analyze/describe images ("这张图是什么" [what is this image], "describe this image", "提取文字" [extract the text])
- Remove backgrounds ("去除背景", "remove background")
- Style transfer ("变成水彩风格" [turn it into watercolor style], "make it look like Van Gogh")
- Create marketing visuals ("设计海报" [design a poster], "make a social media post")
- Product photography ("产品图" [product image], "product shot on white background")
- UI/UX mockups ("界面设计" [interface design], "app mockup", "website screenshot")

## Core Workflows

### Workflow 1: Text-to-Image Generation

When the user describes an image they want to create:

1. Enhance the prompt: automatically add quality boosters.
   - Append professional photography/art terms based on context
   - Add lighting, composition, and mood details if not specified
   - Specify output format and dimensions if needed

2. Call the API: use `generateImage()` with the enhanced prompt.

   ```javascript
   const result = await generateImage(enhancedPrompt, { size, quality, style });
   ```

3. Save and present: download the image to the project directory and show the user.
   - Save to `./generated-images/` by default
   - Return the file path and a brief description

### Workflow 2: Image Editing

When the user wants to modify an existing image:

1. Locate the source image: find the image file path from the conversation context.
2. Parse the edit intent: understand what changes the user wants.
3. Call the edit API: use `editImage()` with the source and instruction.

   ```javascript
   const result = await editImage(imagePath, editInstruction, { mask: maskPath });
   ```

4. Present the result: show the edited image and describe what changed.

### Workflow 3: Image Analysis

When the user asks about an image:

1. Get the image: from a file path or URL.
2. Analyze with GPT-4o Vision: use `describeImage()`.

   ```javascript
   const result = await describeImage(imageSource, question);
   ```

3. Report findings: present the analysis in a structured format.

### Workflow 4: Batch Generation

When the user needs multiple images:

1. Parse the batch request: understand the variations needed.
2. Generate in parallel: call `generateImage()` for each variant.
3. Organize results: save with descriptive filenames.
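
The batch steps above could be sketched roughly as follows. This is only an illustration, not the skill's actual implementation: `generateBatch` and `slugify` are hypothetical helpers, the filename scheme is an assumption, and the `generate` argument stands in for the skill's `generateImage()` so the sketch can run without network access.

```javascript
// Hypothetical helper: turn a variant label into a short, filesystem-safe name.
// Non-ASCII characters are dropped, so the numeric prefix keeps names unique.
function slugify(text, maxLength = 40) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "")
    .slice(0, maxLength);
}

// Hypothetical helper: run one generation per variant in parallel.
// `generate` stands in for the skill's generateImage(prompt, options).
async function generateBatch(basePrompt, variants, generate) {
  const jobs = variants.map(async (variant, index) => {
    const prompt = `${basePrompt}, ${variant}`;
    const number = String(index + 1).padStart(2, "0");
    const saveTo = `./generated-images/${number}-${slugify(variant)}.png`;
    return generate(prompt, { saveTo });
  });

  // allSettled keeps the successful images even if some variants fail.
  const settled = await Promise.allSettled(jobs);
  return settled.map((outcome, index) => ({
    variant: variants[index],
    ok: outcome.status === "fulfilled",
    result: outcome.status === "fulfilled" ? outcome.value : null,
    error: outcome.status === "rejected" ? String(outcome.reason) : null,
  }));
}
```

Using `Promise.allSettled` rather than `Promise.all` means one rejected variant (for example, a content policy violation) does not discard the images that did succeed.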

## Prompt Enhancement Rules

When generating images, automatically enhance the user's prompt:

### Quality Boosters (always append unless the user specifies quality)

```
professional quality, high resolution, sharp details
```

### Context-Based Additions
| User Intent | Auto-Add |
|-------------|----------|
| Product photo | "studio lighting, clean background, commercial photography" |
| Portrait | "professional portrait photography, natural lighting" |
| Social media | "eye-catching, vibrant colors, modern design" |
| Illustration | "detailed illustration, professional artist quality" |
| Logo/branding | "clean vector style, scalable, minimal details" |
| Architecture | "architectural visualization, realistic rendering" |
| Food | "appetizing, food styling, professional food photography" |
| UI mockup | "clean design, modern interface, pixel-perfect" |
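
One way the enhancement rules above might be applied is sketched below. The README does not show the skill's real enhancement logic, so `enhancePrompt` and its `intent` and `userSetQuality` options are hypothetical names; only the booster and addition strings come from the rules above.

```javascript
// Strings copied from the quality boosters and context table above.
const QUALITY_BOOSTERS = "professional quality, high resolution, sharp details";

const CONTEXT_ADDITIONS = new Map([
  ["product photo", "studio lighting, clean background, commercial photography"],
  ["portrait", "professional portrait photography, natural lighting"],
  ["social media", "eye-catching, vibrant colors, modern design"],
  ["illustration", "detailed illustration, professional artist quality"],
  ["logo/branding", "clean vector style, scalable, minimal details"],
  ["architecture", "architectural visualization, realistic rendering"],
  ["food", "appetizing, food styling, professional food photography"],
  ["ui mockup", "clean design, modern interface, pixel-perfect"],
]);

// Hypothetical helper: append the context addition for a detected intent,
// then the quality boosters unless the user already specified quality.
function enhancePrompt(userPrompt, { intent = null, userSetQuality = false } = {}) {
  const parts = [userPrompt.trim()];
  const addition = intent
    ? CONTEXT_ADDITIONS.get(intent.trim().toLowerCase())
    : undefined;
  if (addition) parts.push(addition);
  if (!userSetQuality) parts.push(QUALITY_BOOSTERS);
  return parts.join(", ");
}
```

How the intent is detected from the conversation is left open here; the sketch assumes the caller has already classified the request.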

### Size Recommendations
| Use Case | Recommended Size |
|----------|-----------------|
| Social media post | `1024x1024` (square) |
| Story/vertical | `1024x1792` |
| Banner/landscape | `1792x1024` |
| Product listing | `1024x1024` |
| Presentation | `1792x1024` |
| Wallpaper | `1792x1024` |
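
The size table can be expressed as a simple lookup. In this sketch, `recommendSize` is a hypothetical helper, and falling back to `1024x1024` for unlisted use cases is an assumption based on the documented default size.

```javascript
// Use cases and sizes copied from the table above (keys lowercased).
const RECOMMENDED_SIZES = new Map([
  ["social media post", "1024x1024"],
  ["story/vertical", "1024x1792"],
  ["banner/landscape", "1792x1024"],
  ["product listing", "1024x1024"],
  ["presentation", "1792x1024"],
  ["wallpaper", "1792x1024"],
]);

// Hypothetical helper: return the recommended size for a use case,
// falling back to the square default when the use case is not listed.
function recommendSize(useCase) {
  const key = String(useCase).trim().toLowerCase();
  return RECOMMENDED_SIZES.get(key) ?? "1024x1024";
}
```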

## Style Presets

Quick style references for common requests:

| Preset Name | Style Description |
|-------------|-------------------|
| `product` | Clean white background, studio lighting, commercial photography |
| `lifestyle` | Natural setting, warm lighting, aspirational mood |
| `minimalist` | Simple composition, negative space, clean lines |
| `vintage` | Retro color grading, film grain, nostalgic mood |
| `futuristic` | Neon accents, dark background, sci-fi aesthetic |
| `watercolor` | Soft edges, pastel palette, artistic brush strokes |
| `3d-render` | Octane render, realistic materials, dramatic lighting |
| `anime` | Japanese animation style, vibrant, expressive |
| `sketch` | Pencil drawing, hand-drawn, artistic |
| `flat-design` | Vector style, bold colors, geometric shapes |
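
A preset could be combined with a prompt along these lines. The README does not say how presets are applied internally, so `applyStylePreset` and the `Style:` suffix format are assumptions made for illustration; the preset descriptions are copied from the table above.

```javascript
// Preset descriptions copied from the table above.
const STYLE_PRESETS = new Map([
  ["product", "Clean white background, studio lighting, commercial photography"],
  ["lifestyle", "Natural setting, warm lighting, aspirational mood"],
  ["minimalist", "Simple composition, negative space, clean lines"],
  ["vintage", "Retro color grading, film grain, nostalgic mood"],
  ["futuristic", "Neon accents, dark background, sci-fi aesthetic"],
  ["watercolor", "Soft edges, pastel palette, artistic brush strokes"],
  ["3d-render", "Octane render, realistic materials, dramatic lighting"],
  ["anime", "Japanese animation style, vibrant, expressive"],
  ["sketch", "Pencil drawing, hand-drawn, artistic"],
  ["flat-design", "Vector style, bold colors, geometric shapes"],
]);

// Hypothetical helper: append a preset's description to the prompt and
// fail loudly on a name that is not in the table.
function applyStylePreset(prompt, presetName) {
  const description = STYLE_PRESETS.get(presetName);
  if (description === undefined) {
    const known = [...STYLE_PRESETS.keys()].join(", ");
    throw new Error(`Unknown preset "${presetName}". Known presets: ${known}`);
  }
  return `${prompt}. Style: ${description}`;
}
```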

## API Reference

### `generateImage(prompt, options)`
Generate a new image from a text description.

**Parameters:**
- `prompt` (string): Image description (auto-enhanced by this skill)
- `options` (object):
  - `size`: `1024x1024` | `1024x1792` | `1792x1024` (default: `1024x1024`)
  - `quality`: `standard` | `hd` (default: `standard`)
  - `style`: `vivid` | `natural` (default: `vivid`)
  - `model`: `gpt-image-2` | `dall-e-3` (default: `gpt-image-2`)
  - `saveTo`: File path to save the image (default: `./generated-images/`)

**Returns:** `{ success, url, localPath, revisedPrompt }`
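
To make the documented defaults concrete, the sketch below resolves an options object before a call to `generateImage()`. `resolveGenerateOptions` is a hypothetical helper; whether the skill itself validates options this way, or rejects unknown values at all, is not stated in this README.

```javascript
// Allowed values and defaults as documented above for generateImage().
const GENERATE_OPTION_SPEC = {
  size: { allowed: ["1024x1024", "1024x1792", "1792x1024"], fallback: "1024x1024" },
  quality: { allowed: ["standard", "hd"], fallback: "standard" },
  style: { allowed: ["vivid", "natural"], fallback: "vivid" },
  model: { allowed: ["gpt-image-2", "dall-e-3"], fallback: "gpt-image-2" },
};

// Hypothetical helper: fill in documented defaults and reject unknown values.
function resolveGenerateOptions(options = {}) {
  const resolved = { saveTo: options.saveTo ?? "./generated-images/" };
  for (const [name, spec] of Object.entries(GENERATE_OPTION_SPEC)) {
    const value = options[name] ?? spec.fallback;
    if (!spec.allowed.includes(value)) {
      throw new RangeError(
        `Invalid ${name} "${value}"; expected one of: ${spec.allowed.join(", ")}`
      );
    }
    resolved[name] = value;
  }
  return resolved;
}
```

Checking options locally surfaces a typo such as `size: "512x512"` before any request is sent.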

### `editImage(imagePath, prompt, options)`
Edit an existing image with natural language instructions.

**Parameters:**
- `imagePath` (string): Path to the source image
- `prompt` (string): Edit instruction
- `options` (object):
  - `mask`: Path to mask image (white = edit area, black = keep)
  - `size`: Output size
  - `model`: `gpt-image-2` | `dall-e-3` (default: `gpt-image-2`)

**Returns:** `{ success, url, localPath }`

### `generateVariations(imagePath, options)`
Generate creative variations of an existing image.

**Parameters:**
- `imagePath` (string): Path to the source image
- `options` (object):
  - `count`: Number of variations, 1-4 (default: 2)
  - `size`: Output size

**Returns:** `{ success, variations: [{ url, localPath }] }`

### `describeImage(imageSource, question)`
Analyze an image using GPT-4o Vision.

**Parameters:**
- `imageSource` (string): File path or URL of the image
- `question` (string|null): Specific question about the image (default: general description)

**Returns:** `{ success, description }`

### `downloadImage(url, savePath)`
Download a generated image to local storage.

**Parameters:**
- `url` (string): Image URL from generation API
- `savePath` (string|null): Local file path (default: auto-generated in `./generated-images/`)

**Returns:** `{ success, localPath }`
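
The README says a path is auto-generated when `savePath` is null but does not give the naming scheme. The sketch below shows one plausible scheme; `resolveSavePath` and the timestamp-based filename are assumptions, not the skill's documented behavior.

```javascript
// Hypothetical helper: choose a local path for a downloaded image.
// An explicit savePath wins; otherwise a UTC timestamped name is built
// under ./generated-images/ (the naming scheme is an assumption).
function resolveSavePath(savePath, now = new Date(), extension = "png") {
  if (savePath) return savePath;
  // 2024-05-01T12:30:45.123Z -> 20240501-123045
  const stamp = now
    .toISOString()
    .replace(/\.\d+Z$/, "")
    .replace(/[-:]/g, "")
    .replace("T", "-");
  return `./generated-images/image-${stamp}.${extension}`;
}
```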

## Error Handling

| Error | Cause | Resolution |
|-------|-------|------------|
| `Invalid API key` | OPENAI_API_KEY not set or invalid | Check environment variable |
| `Content policy violation` | Prompt violates safety guidelines | Rephrase the prompt |
| `Rate limit exceeded` | Too many requests | Wait and retry with backoff |
| `Image too large` | Source image exceeds size limit | Resize to under 4 MB |
| `Timeout` | Generation took too long | Simplify prompt or retry |
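
The "wait and retry with backoff" resolution could look roughly like this. `withBackoff` is a hypothetical wrapper, and it assumes the thrown error's `message` contains the text "Rate limit" as in the table above; the skill's real error shape may differ.

```javascript
// Default delay implementation; tests can inject a fake sleep instead.
function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical wrapper: retry `task` with exponential backoff, but only
// when the error message looks like a rate limit (an assumption).
async function withBackoff(
  task,
  { retries = 3, baseDelayMs = 1000, sleep = defaultSleep } = {}
) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await task();
    } catch (error) {
      const retryable = /rate limit/i.test(String(error && error.message));
      if (!retryable || attempt >= retries) throw error;
      // Delays double each time: 1s, 2s, 4s with the default base delay.
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```

A call might then be wrapped as `withBackoff(() => generateImage(prompt, options))`. Errors such as `Invalid API key` are rethrown immediately, since retrying cannot fix them.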

## Best Practices

1. **Always enhance prompts**: Don't pass raw user input directly to the API
2. **Save locally**: Download generated images; URLs expire after 1 hour
3. **Use appropriate sizes**: Match the output size to the use case
4. **Prefer gpt-image-2**: Better quality and text rendering than dall-e-3
5. **Batch thoughtfully**: Generate 2-4 images max per request to avoid rate limits
6. **Describe edits clearly**: Be specific about what to change and where

## Changelog

### v1.1.0
- Added GPT-4o native image generation support (gpt-image-2 model)
- Added automatic prompt enhancement workflow
- Added image download and local save functionality
- Added style presets for quick reference
- Added batch generation workflow
- Improved error handling and documentation

### v1.0.0
- Initial release with DALL-E 3 support
- Basic generate, edit, variations, and describe functions

---

**Tags:** `image-generation` `AI-art` `GPT-4o` `image-2` `gpt-image-2` `visual-creation` `marketing` `product-photos` `illustration` `design` `openai` `dall-e` `image-editing` `background-removal` `style-transfer` `ui-mockup`

Usage Guidance

This skill appears coherent and implements what it claims: it calls OpenAI via the official npm package and saves/reads images locally. Before installing/using it, consider: (1) Only provide image file paths or URLs you trust—the skill will read local files and download remote images when asked. (2) Any images you send to the API will be transmitted to OpenAI — review privacy/data handling and billing implications for your OPENAI_API_KEY. (3) Use an API key with limited scope/quotas if possible, monitor usage, and rotate keys if you suspect misuse. (4) If you want further assurance, review the remainder of the JS file (functions truncated here) and run the skill in a sandboxed environment first. Overall there are no red flags in dependencies, env vars, or installation method.

Capability Tags

requires-sensitive-credentials

Capability Assessment

✓ Purpose & Capability

Name/description match the requested dependency (openai) and the code. The only required environment variable is OPENAI_API_KEY which is appropriate for a skill that calls OpenAI image APIs. Declared models, saving behavior, and prompt-enhancement features align with image generation/editing purpose.

ℹ Instruction Scope

SKILL.md and the code direct the agent to read/save image files, download images from arbitrary URLs, and convert local files/URLs to base64 for API consumption. This is expected for editing/generation workflows, but it means the skill will read any file path or URL provided in the conversation context—so provide only safe/intentional paths and URLs. The SKILL.md does not instruct reading unrelated system files or other environment variables.

✓ Install Mechanism

Install uses the official 'openai' npm package declared in package.json. There are no downloads from personal servers, shorteners, or unknown archives in the install metadata; this is a standard Node dependency and proportionate to the task.

✓ Credentials

Only OPENAI_API_KEY is required and declared as primaryEnv. The code uses process.env.OPENAI_API_KEY and no other secrets or unrelated credentials are requested. This is proportionate for calling OpenAI APIs.

✓ Persistence & Privilege

always: false and no special system persistence is requested. The skill writes generated images to ./generated-images (its own directory) which is expected behavior and not a privilege escalation or modification of other skills or system-wide configuration.

Version History

v1.0.1

- Removed the requirement for the `curl` binary in the install prerequisites.
- No other changes made; documentation, features, and API remain the same.

v1.0.0

Initial release of the Image2 skill.

- Generate, edit, and transform images using GPT-4o's image2 API.
- Supports text-to-image generation, image editing, image variations, and image analysis.
- Includes prompting tips, usage examples, and technical details.
- Compatible with PNG, JPEG, and WebP output formats.
- Requires OpenAI API key with image generation enabled.

Metadata

Slug image-2

Version 1.0.1

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 2

Frequently Asked Questions

What is Image-2 Skill?

GPT-4o Image Generation & Editing Skill - Create, edit, transform, and analyze images using GPT-4o native image-2 API. Supports text-to-image, inpainting, ou... It is an AI Agent Skill for Claude Code / OpenClaw, with 73 downloads so far.

How do I install Image-2 Skill?

Run "/install image-2" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Image-2 Skill free?

Yes, Image-2 Skill is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Image-2 Skill support?

Image-2 Skill is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Image-2 Skill?

It is built and maintained by gpt (@gpt); the current version is v1.0.1.
