Description

Generate or edit images with Gemini using the Google GenAI SDK. Use when the user asks to create, transform, render, or save one or more images in an OpenCla...

README (SKILL.md)

# Image Generation

Name: gemini-image-generation
Author: ztj7728

Use this skill when you need to create one or more image files from a text prompt, or edit one or more existing images with Gemini.

## Requirements

- `~/.openclaw/openclaw.json` must include `$.skills.entries["gemini-image-generation"].enabled` set to `true`.
- `~/.openclaw/openclaw.json` must include `$.skills.entries["gemini-image-generation"].env` with the following keys:
  - `GEMINI_API_KEY` (required)
  - `GEMINI_MODEL_ID` (required)
  - `GEMINI_BASE_URL` (optional)

Example `~/.openclaw/openclaw.json`:

```json
{
  ......,
  "skills": {
    "entries": {
      "gemini-image-generation": {
        "enabled": true,
        "env": {
          "GEMINI_API_KEY": "sk-xxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
          "GEMINI_MODEL_ID": "gemini-3.1-flash-image-preview",
          "GEMINI_BASE_URL": "https://custom-endpoint.com"
        }
      }
    }
  },
  ......
}
```
- Node.js must be installed in the workspace environment.
- Install dependencies once with `npm install` from the skill root.

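The configuration requirements above can be checked before the skill runs. The following is a minimal sketch and not part of the skill: `checkSkillConfig` is a hypothetical helper that takes the already-parsed contents of `~/.openclaw/openclaw.json` and lists which documented requirements are unmet.

```javascript
// Hypothetical helper (not part of the skill): checks a parsed openclaw.json
// object against the documented requirements and returns a list of problems.
const REQUIRED_ENV_KEYS = ["GEMINI_API_KEY", "GEMINI_MODEL_ID"];

function checkSkillConfig(config, skillName = "gemini-image-generation") {
  const entry = config?.skills?.entries?.[skillName];
  if (!entry) {
    return [`skills.entries["${skillName}"] is missing`];
  }

  const problems = [];
  if (entry.enabled !== true) {
    problems.push("enabled must be set to true");
  }
  for (const key of REQUIRED_ENV_KEYS) {
    const value = entry.env?.[key];
    if (typeof value !== "string" || value.trim() === "") {
      problems.push(`env.${key} is required`);
    }
  }
  // GEMINI_BASE_URL is optional, so its absence is not reported.
  return problems;
}

// Example: an entry that is enabled but has no model ID configured.
const problems = checkSkillConfig({
  skills: {
    entries: {
      "gemini-image-generation": {
        enabled: true,
        env: { GEMINI_API_KEY: "sk-xxxx" },
      },
    },
  },
});
console.log(problems); // [ 'env.GEMINI_MODEL_ID is required' ]
```

An empty result means every documented requirement is satisfied; the check says nothing about whether the key or model ID is actually valid.
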
## When To Use

- The user asks to generate a new image from a text prompt.
- The user asks to modify, restyle, extend, or otherwise edit one or more existing images.
- The user wants the generated image saved to a workspace file.
- The task should be handled through a reusable OpenClaw skill instead of ad hoc SDK code.

## Procedure

1. Convert the user request into a single clear image prompt.
2. If the user supplied source images, choose or confirm the input file path or paths inside the workspace.
3. If the user specified a target aspect ratio or size, pass them through as `--aspectRatio` and `--imageSize`.
4. Choose an output path inside the workspace unless the user already provided one.
5. For text-to-image, run [generate-image.mjs](./scripts/generate-image.mjs) with `--prompt`, `--output`, and optional image config arguments.
6. For image editing, run [edit-image.mjs](./scripts/edit-image.mjs) with `--prompt`, one or more `--input` values, `--output`, and optional image config arguments.
7. Read the API key from `GEMINI_API_KEY` and the model ID from `GEMINI_MODEL_ID` in the environment.
8. Optionally, read the base URL from `GEMINI_BASE_URL` in the environment for custom endpoints.
9. Return the saved image path or paths to the user.
10. After returning each image path, also output `MEDIA:<image_path>` (e.g. `MEDIA:outputs/gemini-native-image.png`) so the image is displayed inline in the conversation.

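Steps 7 to 10 of the procedure can be sketched in a few lines. The helper names `readGeminiEnv` and `formatResultLines` are hypothetical and are not taken from the skill's scripts; the sketch only illustrates the documented environment variables and the path-then-`MEDIA:` output convention.

```javascript
// Hypothetical helpers (not taken from the skill's scripts).

// Steps 7-8: read the required and optional settings from the environment.
function readGeminiEnv(env = process.env) {
  const apiKey = env.GEMINI_API_KEY;
  const modelId = env.GEMINI_MODEL_ID;
  if (!apiKey) throw new Error("GEMINI_API_KEY is required");
  if (!modelId) throw new Error("GEMINI_MODEL_ID is required");
  // A missing or empty GEMINI_BASE_URL is treated as "no custom endpoint".
  const baseUrl = env.GEMINI_BASE_URL || undefined;
  return { apiKey, modelId, baseUrl };
}

// Steps 9-10: report each saved path, followed by its MEDIA: line.
function formatResultLines(savedPaths) {
  return savedPaths.flatMap((savedPath) => [savedPath, `MEDIA:${savedPath}`]);
}

console.log(formatResultLines(["outputs/gemini-native-image.png"]).join("\n"));
// outputs/gemini-native-image.png
// MEDIA:outputs/gemini-native-image.png
```

Passing the environment as a parameter keeps the lookup easy to exercise without touching the real `process.env`.
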
## Commands

```powershell
node ./skills/gemini-image-generation/scripts/generate-image.mjs --prompt "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme" --output "outputs/gemini-native-image.png"
```

```powershell
node ./skills/gemini-image-generation/scripts/generate-image.mjs --prompt "Create a wide cinematic food photo of a nano banana dish in a fancy restaurant with a Gemini theme" --output "outputs/gemini-wide.png" --aspectRatio "16:9" --imageSize "2K"
```

```powershell
node ./skills/gemini-image-generation/scripts/edit-image.mjs --prompt "Turn this cat into a watercolor illustration eating a nano-banana in a fancy restaurant under the Gemini constellation" --input "inputs/cat.png" --output "outputs/cat-watercolor.png" --aspectRatio "5:4" --imageSize "2K"
```

```powershell
node ./skills/gemini-image-generation/scripts/edit-image.mjs --prompt "Create an office group photo of these people making funny faces" --input "inputs/person-1.jpg" --input "inputs/person-2.jpg" --input "inputs/person-3.jpg" --output "outputs/group-photo.png"
```

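The commands above repeat `--input` and use camelCase flag names. As a rough illustration of how such flags could be collected, the sketch below folds camelCase and kebab-case spellings into one key and accumulates repeated flags into arrays. `toCamelCase` and `parseFlags` are hypothetical names, and the skill's real argument parser may work differently.

```javascript
// Hypothetical sketch of flag handling; the skill's real parser may differ.

// "--aspect-ratio" and "--aspectRatio" both map to the key "aspectRatio".
function toCamelCase(flagName) {
  return flagName.replace(/-([a-z])/g, (_, letter) => letter.toUpperCase());
}

// Collects "--flag value" pairs; a repeated flag accumulates all its values.
function parseFlags(argv) {
  const flags = {};
  for (let i = 0; i < argv.length; i += 2) {
    const flag = argv[i];
    const value = argv[i + 1];
    if (!flag.startsWith("--") || value === undefined) {
      throw new Error(`Expected "--flag value" near: ${flag}`);
    }
    const key = toCamelCase(flag.slice(2));
    (flags[key] ??= []).push(value);
  }
  return flags;
}

const flags = parseFlags([
  "--prompt", "Create an office group photo of these people making funny faces",
  "--input", "inputs/person-1.jpg",
  "--input", "inputs/person-2.jpg",
  "--output", "outputs/group-photo.png",
  "--aspect-ratio", "16:9",
]);
console.log(flags.input);       // [ 'inputs/person-1.jpg', 'inputs/person-2.jpg' ]
console.log(flags.aspectRatio); // [ '16:9' ]
```

In a real script the argument list would come from `process.argv.slice(2)`.
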
## Notes

- The script prints `TEXT:` lines for model text and `IMAGE:` lines for each saved file.
- After the skill finishes, always present every generated image to the user by outputting `MEDIA:<path>` for each saved image path. This ensures the image is rendered inline in the conversation alongside the file path.
- The final JSON summary only includes generated image paths and optional image config so prompts, model IDs, and source image paths are not echoed back into logs.
- Saved file extensions follow the returned image mime type. If the requested output path uses a different suffix, the scripts keep the base name and write the file with the returned type instead.
- If the model returns multiple images, the scripts save them as `name-1.png`, `name-2.png`, and so on.
- `edit-image.mjs` supports repeated `--input` flags. You can also pass a comma-separated list to a single `--input` value.
- `edit-image.mjs` infers the source mime type from `.png`, `.jpg`, `.jpeg`, or `.webp`. Use one `--mime-type` for all inputs, or repeat `--mime-type` so it lines up with each `--input`.
- Both scripts accept `--aspectRatio` and `--imageSize`. They also accept the kebab-case forms `--aspect-ratio` and `--image-size`.
- The scripts only send `config.imageConfig` when at least one of those parameters is provided.
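
Several of the notes above describe small rules that are easier to follow as code. The sketch below is an illustration under stated assumptions, not the skill's implementation: `resolveInputs`, `planOutputPaths`, and `buildRequestConfig` are hypothetical names, the `.bin` fallback extension is invented for the example, and a single returned image is assumed to be saved without a numeric suffix.

```javascript
// Hypothetical helpers illustrating the notes; the skill's scripts may
// implement these rules differently.
const MIME_BY_EXTENSION = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".webp": "image/webp",
};
const EXTENSION_BY_MIME = {
  "image/png": ".png",
  "image/jpeg": ".jpg",
  "image/webp": ".webp",
};

// Expands repeated and comma-separated --input values, then pairs each path
// with an explicit --mime-type or one inferred from the file extension.
function resolveInputs(inputValues, mimeTypes = []) {
  const paths = inputValues
    .flatMap((value) => value.split(","))
    .map((p) => p.trim())
    .filter(Boolean);
  if (mimeTypes.length > 1 && mimeTypes.length !== paths.length) {
    throw new Error("Repeated --mime-type values must line up with --input");
  }
  return paths.map((path, i) => {
    const dot = path.lastIndexOf(".");
    const extension = dot >= 0 ? path.slice(dot).toLowerCase() : "";
    const explicit = mimeTypes.length === 1 ? mimeTypes[0] : mimeTypes[i];
    const mimeType = explicit ?? MIME_BY_EXTENSION[extension];
    if (!mimeType) throw new Error(`Cannot infer mime type for ${path}`);
    return { path, mimeType };
  });
}

// Keeps the requested base name, swaps in the extension for each returned
// mime type, and numbers the files when more than one image comes back.
function planOutputPaths(requestedPath, returnedMimeTypes) {
  const slash = Math.max(
    requestedPath.lastIndexOf("/"),
    requestedPath.lastIndexOf("\\"),
  );
  const dot = requestedPath.lastIndexOf(".");
  const base = dot > slash + 1 ? requestedPath.slice(0, dot) : requestedPath;
  return returnedMimeTypes.map((mimeType, i) => {
    const extension = EXTENSION_BY_MIME[mimeType] ?? ".bin"; // assumed fallback
    const suffix = returnedMimeTypes.length > 1 ? `-${i + 1}` : "";
    return `${base}${suffix}${extension}`;
  });
}

// Returns a config that carries imageConfig only when an option is set.
function buildRequestConfig({ aspectRatio, imageSize } = {}) {
  const imageConfig = {};
  if (aspectRatio) imageConfig.aspectRatio = aspectRatio;
  if (imageSize) imageConfig.imageSize = imageSize;
  return Object.keys(imageConfig).length > 0 ? { imageConfig } : {};
}

console.log(planOutputPaths("outputs/name.jpg", ["image/png", "image/png"]));
// [ 'outputs/name-1.png', 'outputs/name-2.png' ]
```

For example, `resolveInputs(["inputs/a.png,inputs/b.jpg"])` yields two entries with inferred mime types, and `buildRequestConfig({})` returns an empty object so that no `imageConfig` is sent.
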

Usage Guidance

This skill appears coherent and implements image generation and editing via Google GenAI. Before installing:

1. Only enable it if you trust the skill source and are comfortable sending prompts and any source images to Gemini (the skill base64-encodes and uploads input images to the API).
2. Keep `GEMINI_API_KEY` secret (store it in your OpenClaw skill config as instructed).
3. If you use `GEMINI_BASE_URL`, ensure it points to a trusted endpoint (a custom base URL could redirect requests to a non-Google host).
4. Run `npm install` in the skill directory to install `@google/genai`, and review that dependency if you have concerns.
5. Be mindful of privacy: do not send PII or sensitive images unless you accept they will be processed by the configured GenAI endpoint.
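
One way to act on the `GEMINI_BASE_URL` advice is to check the value against an allowlist before enabling the skill. This is a minimal sketch and not part of the skill: `checkBaseUrl` is a hypothetical helper, and the allowlist below contains only Google's public Gemini API host as an assumed example, so replace it with the hosts you actually trust.

```javascript
// Hypothetical pre-flight check (not part of the skill). The allowlist is an
// assumption for illustration; replace it with the hosts you actually trust.
const TRUSTED_HOSTS = ["generativelanguage.googleapis.com"];

function checkBaseUrl(baseUrl, trustedHosts = TRUSTED_HOSTS) {
  if (!baseUrl) return null; // unset: no custom endpoint is configured
  const url = new URL(baseUrl); // throws on malformed input
  if (url.protocol !== "https:") {
    throw new Error(`GEMINI_BASE_URL must use https: ${baseUrl}`);
  }
  if (!trustedHosts.includes(url.hostname)) {
    throw new Error(`GEMINI_BASE_URL host is not trusted: ${url.hostname}`);
  }
  return url.origin;
}

console.log(checkBaseUrl("https://generativelanguage.googleapis.com/"));
// https://generativelanguage.googleapis.com
```

Comparing the parsed hostname, rather than searching the raw string, avoids being fooled by URLs such as `https://generativelanguage.googleapis.com.example.net`.
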

Capability Analysis

Type: OpenClaw Skill
Name: gemini-image-generation
Version: 1.0.10

The skill provides image generation and editing capabilities but includes high-risk behaviors and suspicious formatting. The `--input` and `--output` arguments of `edit-image.mjs` and `generate-image.mjs` allow arbitrary file reads and writes because the paths are not sanitized. The skill also supports a `GEMINI_BASE_URL` environment variable that can redirect sensitive data, including the `GEMINI_API_KEY` and local file content, to an arbitrary external endpoint. In addition, the `README.md` file uses character-level spacing obfuscation, a common technique for evading simple static analysis filters.
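
The path-sanitization concern can be addressed by confining `--input` and `--output` values to the workspace before any file is read or written. The sketch below is one possible approach and is not code from the skill: `resolveInsideWorkspace` is a hypothetical helper, and it does not resolve symbolic links, so a symlink inside the workspace could still point elsewhere.

```javascript
import path from "node:path";

// Hypothetical guard (not part of the skill): resolves a user-supplied path
// against the workspace root and rejects anything that lands outside it.
function resolveInsideWorkspace(workspaceRoot, candidate) {
  const root = path.resolve(workspaceRoot);
  const resolved = path.resolve(root, candidate);
  const relative = path.relative(root, resolved);
  const escapes =
    relative === "" || // the root itself is not a usable file path
    relative === ".." ||
    relative.startsWith(`..${path.sep}`) ||
    path.isAbsolute(relative); // e.g. a different drive on Windows
  if (escapes) {
    throw new Error(`Path is outside the workspace: ${candidate}`);
  }
  return resolved;
}

const workspace = path.resolve("workspace");
console.log(resolveInsideWorkspace(workspace, "outputs/cat-watercolor.png"));

try {
  resolveInsideWorkspace(workspace, "../secrets.txt");
} catch (error) {
  console.log(error.message); // Path is outside the workspace: ../secrets.txt
}
```

Checking the relative path, instead of testing whether the resolved string starts with the root, avoids accepting sibling directories such as `workspace-backup`.
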

Capability Assessment

✓ Purpose & Capability

Name/description, required binaries (node, npm), and required env vars (GEMINI_API_KEY, GEMINI_MODEL_ID) align with the declared purpose of calling Google GenAI (Gemini) to generate/edit images. The package.json depends on @google/genai which is appropriate for this functionality.

✓ Instruction Scope

SKILL.md and the scripts only instruct reading workspace image files, reading GEMINI_* environment variables, invoking the GoogleGenAI client, and saving returned images to workspace. There are no instructions to read unrelated system files, other credentials, or to send data to unexpected endpoints. The skill will of course transmit prompts and any provided source images to the Gemini API (expected for image editing).

ℹ Install Mechanism

No formal install spec is included (instruction-only install), but package.json and SKILL.md instruct the user to run 'npm install' in the skill root. This is expected for a Node-based skill; there is no third-party binary download or untrusted URL referenced.

✓ Credentials

Requested env vars are limited and appropriate: GEMINI_API_KEY (primary) and GEMINI_MODEL_ID are required; GEMINI_BASE_URL is optional for custom endpoints. No unrelated credentials or broad system config paths are requested.

✓ Persistence & Privilege

The skill does not request always:true, does not modify other skills, and requires explicit enabling in ~/.openclaw/openclaw.json. Autonomous invocation is allowed (platform default) but not combined with elevated persistence or unrelated credential access.

Version History

v1.0.10

No user-facing changes in this release.

- Version bump to 1.0.10 with no file modifications or content updates detected.

v1.0.9

- Added instructions to output `MEDIA:<image_path>` for each generated image so they appear inline in conversations.
- Clarified that every generated image path must be followed by a MEDIA line after task completion.
- Updated example commands to reflect the new folder structure (`./skills/gemini-image-generation/`).
- No code changes; SKILL.md documentation updated for improved user experience.

v1.0.8

- Clarified skill activation requirements: now documents explicit settings needed in ~/.openclaw/openclaw.json.
- Added example configuration block for enabling the skill and setting required environment variables.
- Installation instructions updated to specify running npm install from the skill root.
- No changes to core commands, features, or supported parameters.
- General documentation clarification and improved onboarding steps.

v1.0.7

- Added new script: scripts/gemini-image-runtime.mjs.
- File extension of saved images now follows the returned image mime type; output uses the requested base name but will match returned type.
- Minor update to documentation to clarify file extension handling.

v1.0.6

- Added npm as a required binary for the skill environment.
- Updated documentation metadata to specify both node and npm as required.

v1.0.5

- Updated metadata to move optionalEnv into the openclaw object and adjust its format.
- No changes to code or functionality.

v1.0.4

- Declared GEMINI_BASE_URL as optional in the skill metadata and clarified its usage in the environment.
- Moved GEMINI_BASE_URL from required to optional environment variables in the OpenClaw metadata.
- No functional changes to the skill's behavior.

v1.0.3

Initial release providing Gemini-based image generation and editing.

- Added scripts for generating images from text prompts and editing images using Gemini (generate-image.mjs, edit-image.mjs).
- Includes documentation in README.md and usage guidelines in SKILL.md.
- Supports options for custom aspect ratio, image size, and multiple inputs.
- Outputs saved images and a JSON summary with paths and config details.
- Environment variables required: GEMINI_API_KEY, GEMINI_MODEL_ID, and optional GEMINI_BASE_URL.

v1.0.2

- Removed sample scripts and documentation files: README.md, package.json, scripts/edit-image.mjs, scripts/generate-image.mjs.
- The skill now provides only metadata and usage documentation; no execution scripts or supporting files are included.
- Metadata updated in SKILL.md to new OpenClaw format with emoji and revised environment variable requirements.

v1.0.1

- Added a metadata block detailing environment and dependency requirements.
- The final JSON summary now includes only generated image paths and optional image config, omitting prompts, model IDs, and source image paths from logs.

v1.0.0

- Initial release of the Gemini image generation skill.
- Supports generating images from text prompts and editing existing images using the Google GenAI SDK.
- Handles multiple input images, aspect ratio, and image size options.
- Saves generated or edited images to specified output paths.
- Requires environment variables for API key and model ID.

Metadata

Slug gemini-image-generation

Version 1.0.10

License MIT-0

All-time Installs 3

Active Installs 3

Total Versions 11

Frequently Asked Questions

What is gemini-image-generation?

Generate or edit images with Gemini using the Google GenAI SDK. Use when the user asks to create, transform, render, or save one or more images in an OpenCla... It is an AI Agent Skill for Claude Code / OpenClaw, with 626 downloads so far.

How do I install gemini-image-generation?

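Run "/install gemini-image-generation" in the OpenClaw or Claude Code chat to install it. Afterwards, enable the skill and set its environment variables in ~/.openclaw/openclaw.json, then run npm install from the skill root, as described in the README requirements.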

Is gemini-image-generation free?

Yes, gemini-image-generation is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does gemini-image-generation support?

gemini-image-generation is cross-platform and runs anywhere OpenClaw or Claude Code is available, provided Node.js is installed.

Who created gemini-image-generation?

It is built and maintained by Joe (@ztj7728); the current version is v1.0.10.
