Description

Generate images and videos with AIsa. Four image models (Google Gemini 3 Pro Image, Alibaba Wan 2.7 image + image-pro, ByteDance Seedream) and four Wan video...

Usage Instructions (SKILL.md)

Media Gen 🎬

Name: Openclaw Media Gen
Author: baofeng-tech

Generate images and videos with a single AIsa API key. Full support for every image and video model AIsa routes through its Unified LLM Gateway: the image models span three different endpoint paths, and all video models share one asynchronous task endpoint.

Compatibility

Works with any agentskills.io-compatible harness, including:

Claude Code and Claude (Anthropic)
OpenAI Codex
Cursor
Gemini CLI (Google)
OpenCode, Goose, OpenClaw, Hermes
and any other harness that implements the Agent Skills specification

Requires Python 3, a POSIX shell, and AISA_API_KEY (get one at aisa.one).

🔥 What You Can Do

Image — Gemini (base64 inline)

"Generate a cyberpunk-style city nightscape, neon lights, rainy night, cinematic feel"

Image — Wan 2.7 (URL in chat response)

"Generate an ultra-detailed product shot of a red panda, studio lighting, sharp focus"

Image — Seedream (OpenAI-compatible, large format)

"Generate a 2048×2048 magazine cover: neo-noir detective portrait, film grain"

Video — text-to-video (Wan t2v)

"Sweeping establishing shot of a neon cyberpunk skyline at dusk, 5 seconds"

Video — image-to-video (Wan i2v)

"Starting from this reference image, gentle camera push-in with parallax"

Supported Models

Image generation — 4 models, 3 endpoints

Model	Developer	Endpoint	Notes
`gemini-3-pro-image-preview`	Google	`POST /v1/models/{model}:generateContent`	Images returned as base64 in `candidates[].parts[].inline_data`
`wan2.7-image`	Alibaba	`POST /v1/chat/completions`	Images returned as URL parts in `choices[].message.content[]` (type=`image`). $0.030/image
`wan2.7-image-pro`	Alibaba	`POST /v1/chat/completions`	Higher fidelity. $0.075/image
`seedream-4-5-251128`	ByteDance	`POST /v1/images/generations`	OpenAI-compatible. Minimum 3,686,400 pixels (e.g. 1920×1920). $0.040/image

Video generation — 4 Wan variants, 1 endpoint

Model	Kind	Image field	Output SR
`wan2.6-t2v`	text-to-video	none	1080
`wan2.6-i2v`	image-to-video	`input.img_url` (string)	720
`wan2.7-t2v`	text-to-video	none	720
`wan2.7-i2v`	image-to-video	`input.media` (array) ⚠	720

⚠ Schema trap on wan2.7-i2v. It takes the reference image in input.media (array of URLs), not input.img_url like wan2.6-i2v. Submissions without media return HTTP 200 with a task_id, then fail downstream with InvalidParameter: Field required: input.media. The bundled client routes this automatically — just pass --img-url and pick the model.
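The field routing described above can be sketched in a few lines of Python. This is a minimal illustration, not the bundled client's code: the function name `build_video_input` is hypothetical, and the only facts taken from this document are that `wan2.7-i2v` expects `input.media` (an array) while `wan2.6-i2v` expects `input.img_url` (a string).

```python
from typing import Optional


def build_video_input(model: str, prompt: str, img_url: Optional[str] = None) -> dict:
    """Build the `input` object for a video task (hypothetical helper).

    Text-to-video models get only the prompt; any img_url is ignored for them.
    Image-to-video models get the reference image in the field each one expects.
    """
    payload = {"prompt": prompt}
    if model.endswith("-i2v"):
        if not img_url:
            # Fail locally instead of receiving a task_id that fails downstream.
            raise ValueError(f"{model} requires a reference image URL")
        if model == "wan2.7-i2v":
            payload["media"] = [img_url]   # array of URLs
        else:
            payload["img_url"] = img_url   # plain string (wan2.6-i2v)
    return payload
```

Raising before submission is a design choice here: because a missing `media` field still returns HTTP 200 with a task_id, a local check surfaces the mistake earlier than polling would.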

Quick Start

export AISA_API_KEY="your-key"

# Any image model — client routes to the right endpoint
python3 scripts/media_gen_client.py image \
  --model gemini-3-pro-image-preview \
  --prompt "A cute red panda, cinematic lighting" \
  --out out.png

python3 scripts/media_gen_client.py image \
  --model wan2.7-image-pro \
  --prompt "Ultra-detailed product shot of a red panda" \
  --out out.png

python3 scripts/media_gen_client.py image \
  --model seedream-4-5-251128 \
  --prompt "Neo-noir detective portrait, film grain" \
  --size 2048x2048 \
  --out out.png

# Video — text-to-video (no image needed)
python3 scripts/media_gen_client.py video-create \
  --model wan2.7-t2v \
  --prompt "Sweeping shot of a neon cyberpunk skyline"

# Video — image-to-video on wan2.7-i2v (client routes to input.media[])
python3 scripts/media_gen_client.py video-create \
  --model wan2.7-i2v \
  --prompt "gentle zoom with parallax" \
  --img-url "https://example.com/reference.jpg" \
  --duration 5

# Wait and download
python3 scripts/media_gen_client.py video-wait \
  --task-id <task_id> --download --out out.mp4

🖼️ Image Generation — endpoint reference

Gemini family → `POST /v1/models/{model}:generateContent`

Documentation: Google Gemini Chat.

curl -X POST "https://api.aisa.one/v1/models/gemini-3-pro-image-preview:generateContent" \
  -H "Authorization: Bearer $AISA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents":[
      {"role":"user","parts":[{"text":"A cute red panda, cinematic lighting"}]}
    ]
  }'

Response contains candidates[].parts[].inline_data with {mime_type, data} where data is a base64 PNG.
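To turn that response into image bytes, something like the following could work. It is a sketch that assumes the response has already been parsed into a Python dict and that the shape is exactly the `candidates[].parts[].inline_data` path named above; the function name `extract_inline_images` is hypothetical.

```python
import base64
from typing import List, Tuple


def extract_inline_images(response: dict) -> List[Tuple[str, bytes]]:
    """Return (mime_type, raw_bytes) for every inline image in the response.

    Assumes the candidates[].parts[].inline_data layout described in this
    document; parts without inline_data (for example text parts) are skipped.
    """
    images = []
    for candidate in response.get("candidates", []):
        for part in candidate.get("parts", []):
            inline = part.get("inline_data")
            if inline and "data" in inline:
                raw = base64.b64decode(inline["data"])
                images.append((inline.get("mime_type", ""), raw))
    return images
```

Each returned byte string can be written to disk with `open(path, "wb")`.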

Wan 2.7 family → `POST /v1/chat/completions`

Documentation: Image Generation via Chat.

Critical rule: messages[].content must be an array of typed parts. A plain string returns HTTP 400 invalid_parameter_error.

curl -X POST "https://api.aisa.one/v1/chat/completions" \
  -H "Authorization: Bearer $AISA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "wan2.7-image",
    "messages": [
      {"role":"user","content":[
        {"type":"text","text":"A cute red panda, ultra-detailed, cinematic lighting"}
      ]}
    ],
    "n": 1
  }'

Images come back as {type: "image", image: "<url>"} parts inside choices[].message.content[].
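The typed-parts rule and the response shape can be handled together in Python. The sketch below assumes only what this section states: the request `content` is an array of typed parts, and image parts come back with `type` set to `image` and the URL under `image`. Both function names are hypothetical.

```python
from typing import List


def build_image_request(model: str, prompt: str, n: int = 1) -> dict:
    """Build a chat-completions body whose content is an array of typed parts."""
    return {
        "model": model,
        "messages": [
            # A plain string here is rejected, so always wrap the prompt.
            {"role": "user", "content": [{"type": "text", "text": prompt}]}
        ],
        "n": n,
    }


def extract_image_urls(response: dict) -> List[str]:
    """Collect image URLs from choices[].message.content[] parts."""
    urls = []
    for choice in response.get("choices", []):
        content = choice.get("message", {}).get("content") or []
        if isinstance(content, str):
            continue  # text-only reply, no image parts
        for part in content:
            if isinstance(part, dict) and part.get("type") == "image" and part.get("image"):
                urls.append(part["image"])
    return urls
```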

Seedream → `POST /v1/images/generations`

Documentation: OpenAI-Compatible Image Generations.

curl -X POST "https://api.aisa.one/v1/images/generations" \
  -H "Authorization: Bearer $AISA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "seedream-4-5-251128",
    "prompt": "A cute red panda, ultra-detailed, cinematic lighting",
    "n": 1,
    "size": "2048x2048"
  }'

Response: data[].url or data[].b64_json. Upstream enforces a minimum of 3,686,400 pixels. 1024×1024 and 1536×1536 get rejected. Any aspect ratio works as long as width × height ≥ 3,686,400.
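The pixel floor is easy to check before sending a request. The following sketch assumes the `WIDTHxHEIGHT` string format used in the examples above; the helper name `check_seedream_size` is hypothetical.

```python
MIN_PIXELS = 3_686_400  # minimum stated in this document


def check_seedream_size(size: str) -> int:
    """Parse 'WIDTHxHEIGHT' and return the pixel count, or raise ValueError.

    Malformed strings also raise ValueError (from int()).
    """
    width_text, _, height_text = size.lower().partition("x")
    width, height = int(width_text), int(height_text)
    pixels = width * height
    if pixels < MIN_PIXELS:
        raise ValueError(
            f"{size} is {pixels:,} pixels; Seedream needs at least {MIN_PIXELS:,}"
        )
    return pixels
```

For example, 2048x2048 gives 4,194,304 pixels and passes, 2560x1440 lands exactly on the 3,686,400 floor, and 1536x1536 gives 2,359,296 and is rejected.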

🎞️ Video Generation — endpoint reference

Create task → `POST /apis/v1/services/aigc/video-generation/video-synthesis`

Documentation: Create video generation task. Header X-DashScope-Async: enable is required.

# wan2.6-t2v — text-to-video
curl -X POST "https://api.aisa.one/apis/v1/services/aigc/video-generation/video-synthesis" \
  -H "Authorization: Bearer $AISA_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-DashScope-Async: enable" \
  -d '{
    "model":"wan2.6-t2v",
    "input":{"prompt":"cinematic close-up, slow push-in"},
    "parameters":{"resolution":"720P","duration":5}
  }'

# wan2.7-i2v — image-to-video (⚠ input.media not input.img_url)
curl -X POST "https://api.aisa.one/apis/v1/services/aigc/video-generation/video-synthesis" \
  -H "Authorization: Bearer $AISA_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-DashScope-Async: enable" \
  -d '{
    "model":"wan2.7-i2v",
    "input":{
      "prompt":"gentle zoom with parallax",
      "media":["https://example.com/reference.jpg"]
    },
    "parameters":{"resolution":"720P","duration":5}
  }'
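The same create-task call can be prepared from Python with the standard library. This sketch only builds the request object and sends nothing; the function name `build_create_request` and its defaults are illustrative assumptions, while the URL, headers, and body layout mirror the curl examples above.

```python
import json
import urllib.request

CREATE_URL = "https://api.aisa.one/apis/v1/services/aigc/video-generation/video-synthesis"


def build_create_request(api_key: str, model: str, input_obj: dict,
                         resolution: str = "720P", duration: int = 5) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a video task."""
    body = {
        "model": model,
        "input": input_obj,
        "parameters": {"resolution": resolution, "duration": duration},
    }
    return urllib.request.Request(
        CREATE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "X-DashScope-Async": "enable",  # required for task creation
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would submit the task. Where the task_id sits in the response body is not shown in this document, so inspect the returned JSON rather than relying on a fixed path.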

Poll task → `GET /apis/v1/services/aigc/tasks/{task_id}`

Documentation: Get video generation task result.

task_id is a path parameter. The query-string form ?task_id=... returns HTTP 500 unsupported uri.

curl "https://api.aisa.one/apis/v1/services/aigc/tasks/YOUR_TASK_ID" \
  -H "Authorization: Bearer $AISA_API_KEY"
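A polling loop around this endpoint might look like the sketch below. The task response schema is not documented here, so the completion check is left to a caller-supplied `is_done` function, and the HTTP call is injected as `fetch`. The names `task_url` and `wait_for_task` are hypothetical, and the 10-second interval and 600-second timeout simply echo the client flags shown elsewhere in this document.

```python
import time
from typing import Callable
from urllib.parse import quote

BASE_URL = "https://api.aisa.one"


def task_url(task_id: str) -> str:
    """Put task_id in the path; the query-string form is not supported."""
    return f"{BASE_URL}/apis/v1/services/aigc/tasks/{quote(task_id, safe='')}"


def wait_for_task(task_id: str,
                  fetch: Callable[[str], dict],
                  is_done: Callable[[dict], bool],
                  poll_seconds: float = 10.0,
                  timeout_seconds: float = 600.0,
                  sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch(url) repeatedly until is_done(result) is true or time runs out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        result = fetch(task_url(task_id))
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} not finished after {timeout_seconds}s")
        sleep(poll_seconds)
```

Injecting `fetch` and `sleep` keeps the loop testable without network access; in real use `fetch` would issue the authenticated GET shown in the curl example and parse the JSON body.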

Python Client

The bundled client at scripts/media_gen_client.py auto-routes each image model to the correct endpoint and normalizes the response to a saved file.

# Image — model picks the endpoint
python3 scripts/media_gen_client.py image \
  --model <gemini-3-pro-image-preview | wan2.7-image | wan2.7-image-pro | seedream-4-5-251128> \
  --prompt "..." \
  --out out.png

# Video — create task
python3 scripts/media_gen_client.py video-create \
  --model <wan2.6-t2v | wan2.6-i2v | wan2.7-t2v | wan2.7-i2v> \
  --prompt "..." \
  [--img-url https://... (required for -i2v models)] \
  [--duration 5|10] \
  [--resolution 720P|1080P]

# Video — poll / wait / download
python3 scripts/media_gen_client.py video-status --task-id <id>
python3 scripts/media_gen_client.py video-wait --task-id <id> --poll 10 --timeout 600
python3 scripts/media_gen_client.py video-wait --task-id <id> --download --out out.mp4

API Reference

This skill calls the following AIsa endpoints directly:

Google Gemini Chat — generateContent — Gemini image models
Image Generation via Chat — Wan 2.7 image family
OpenAI-Compatible Image Generations — Seedream
Create video generation task — all 4 Wan video variants
Get video generation task result — async polling

See the full AIsa API Reference for the complete catalog.

License

MIT — see LICENSE at the repo root.

Security Recommendations

Before installing, confirm you trust the AIsa service and are comfortable providing AISA_API_KEY. Review generated-media costs and output file paths when invoking the bundled client.

Functional Analysis

Type: OpenClaw Skill
Name: openclaw-media-gen-aisa
Version: 1.0.0

The skill provides a legitimate interface for generating AI images and videos via the AIsa API (api.aisa.one). The Python client (scripts/media_gen_client.py) uses standard library modules to handle API requests and download media files, with no evidence of data exfiltration, obfuscation, or malicious execution.

Capability Tags

requires-sensitive-credentials

Capability Assessment

✓ Purpose & Capability

The stated purpose is AI image/video generation, and the disclosed models, endpoints, and bundled Python client are aligned with that purpose.

ℹ Instruction Scope

The skill includes user-directed command examples for generating media and downloading results. These are expected for the purpose, but users should review prompts, costs, and output paths before running them.

✓ Install Mechanism

There is no install script or package manager step; the skill requires python3 and uses a bundled script.

ℹ Credentials

The skill requires AISA_API_KEY and sends requests to api.aisa.one. This is proportionate for the AIsa media-generation service, but the key is sensitive and may authorize billable API use.

✓ Persistence & Privilege

No background persistence, startup hooks, or privilege escalation are shown. Local file writes appear limited to user-specified generated media outputs.

Version History

v1.0.0

Initial release of openclaw-media-gen-aisa: unified image and video generation with multiple models via a single API key.
- Supports 4 image models (Gemini, Wan 2.7, Wan 2.7 Pro, Seedream) and 4 Wan video variants (t2v/i2v, v2.6/v2.7).
- Single API key setup; the client auto-routes requests to the correct model/endpoint.
- Works across OpenClaw, Claude Code, Cursor, OpenAI Codex, Gemini CLI, Hermes, and other AgentSkills-compatible tools.
- Includes quick-start CLI and full endpoint documentation for all models.
- Handles image-to-video model schema differences (e.g., input.media vs. input.img_url) automatically.
- Python 3 and POSIX shell required.

Metadata

Slug openclaw-media-gen-aisa

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Historical versions 1

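FAQ

What is Openclaw Media Gen?

Generate images and videos with AIsa. Four image models (Google Gemini 3 Pro Image, Alibaba Wan 2.7 image + image-pro, ByteDance Seedream) and four Wan video... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 53 cumulative downloads to date.

How do I install Openclaw Media Gen?

Run the command "/install openclaw-media-gen-aisa" in the OpenClaw or Claude Code chat box to install it in one step. No extra install-time configuration is needed, although the skill still requires AISA_API_KEY to be set before use.

Is Openclaw Media Gen free?

Yes. The skill itself is completely free and is released under the MIT-0 license, so it can be downloaded, installed, and used freely. Calls to the AIsa API are billed separately at the per-image prices listed above.

Which platforms does Openclaw Media Gen support?

Openclaw Media Gen is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Openclaw Media Gen?

It is developed and maintained by baofeng-tech (@baofeng-tech). The current version is v1.0.0.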
