zhy2015

hidream-model-gen

by harry zhu · GitHub ↗ · v1.0.5 · MIT-0
cross-platform ✓ Security Clean
305 Downloads · 0 Stars · 0 Active Installs · 6 Versions
Install in OpenClaw
/install hidream-model-gen
Description
Generate images and videos using Vivago AI (智小象) platform. Supports text-to-image, image-to-image, image-to-video, and keyframe-to-video generation. Use when...
README (SKILL.md)

Vivago AI Skill

Integration with Vivago AI (智小象) platform for AI-powered image and video generation.

Supported Features

Image Generation

  • Text to Image (txt2img): Generate images from text descriptions
  • Image to Image (img2img): Transform existing images based on prompts, including style transfer, image editing, and multi-image fusion

Video Generation

  • Text to Video (txt2vid): Generate videos from text descriptions
  • Image to Video (img2vid): Generate videos from static images
  • Keyframe to Video (keyframe_to_video): Generate transition videos from start and end keyframes
  • Video Templates (template_to_video): 181 pre-defined video effects
  • Supports multiple model versions (v3Pro, v3L, kling-video-o1)

Additional Features

  • Image upload to Vivago storage
  • Batch generation (up to 4 images)
  • Multiple aspect ratios (1:1, 4:3, 3:4, 16:9, 9:16)
  • Automatic retry with polling

Architecture

Core Modules

scripts/
├── vivago_client.py       # Main API client
├── template_manager.py    # Template management
├── config_loader.py       # Configuration loading
├── enums.py              # Type enums (TaskStatus, AspectRatio, etc.)
├── exceptions.py         # Structured exceptions
└── config/               # Modular configuration files

Code Quality

  • Type Safety: Complete type annotations and enums
  • Exception Handling: Structured exception hierarchy
  • CI/CD: GitHub Actions for automated testing
  • Modular Config: Split configuration files for maintainability

Setup

Prerequisites

Before using this skill, you need to obtain a Vivago.ai API Token:

Step 1: Log in to Vivago.ai

  1. Visit https://vivago.ai/ and log in to your account
  2. Check your remaining credits and consider subscribing to a suitable plan if needed

Step 2: Obtain Your Token

  1. After logging in, visit https://vivago.ai/prod-api/user/token
  2. The page will return your API Token (in JWT format)
  3. Copy this Token for configuration

Security Note: The Token is your credential for accessing the API. Please keep it secure and do not share it with others.

Environment Variables

Security Note: For secure deployments and AI Agents, the system requires the token to be passed strictly via the HIDREAM_AUTHORIZATION environment variable.

Export it securely in your current session:

export HIDREAM_AUTHORIZATION="your_vivago_api_token"

Note: STORAGE_AK and STORAGE_SK are deprecated and removed. The image upload uses secure pre-signed URLs provided by the Vivago API.
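
As a minimal sketch of how a script might read the token, the helper below pulls HIDREAM_AUTHORIZATION from the environment and builds a Bearer header, failing early when the variable is missing. The function name build_auth_headers and the local MissingCredentialError stand-in are illustrative assumptions, not the skill's actual code.

```python
import os


class MissingCredentialError(RuntimeError):
    """Illustrative stand-in for the skill's MissingCredentialError."""


def build_auth_headers(env=None):
    """Return HTTP headers carrying the Vivago token as a Bearer credential.

    `env` defaults to os.environ; passing a plain dict makes testing easy.
    """
    env = os.environ if env is None else env
    token = env.get("HIDREAM_AUTHORIZATION", "").strip()
    if not token:
        raise MissingCredentialError(
            "HIDREAM_AUTHORIZATION is not set; export it before running the skill"
        )
    # The token is only placed in the header; it is never printed or logged.
    return {"Authorization": f"Bearer {token}"}
```

Raising before any network call keeps a missing token from surfacing later as an opaque authentication failure.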

File Output Configuration

Important: By default, all generated resources (JSON results, downloaded images, and videos) will be output to the assets/ directory within the current working folder. Ensure this directory exists or the system has permission to create it.
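
If you write results yourself, a small helper along these lines can make sure the assets/ directory exists first. This is a sketch under assumptions: save_results and its parameters are hypothetical names, and the shape of the results object is whatever the client returned.

```python
import json
from pathlib import Path


def save_results(results, filename="results.json", base_dir="."):
    """Write `results` as JSON under <base_dir>/assets/, creating the folder if needed."""
    assets_dir = Path(base_dir) / "assets"
    # exist_ok=True makes repeated runs safe; parents=True covers nested base dirs.
    assets_dir.mkdir(parents=True, exist_ok=True)
    target = assets_dir / filename
    target.write_text(
        json.dumps(results, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return target
```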

Installation

pip install -r requirements.txt

Usage

Python API

from scripts import create_client, VivagoClient
from scripts.enums import AspectRatio, PortName, TaskStatus
from scripts.exceptions import TaskFailedError, TaskTimeoutError

# Create client
client = create_client()

# Text to image
results = client.text_to_image(
    prompt="a beautiful sunset over mountains",
    port=PortName.KLING_IMAGE,  # or PortName.NANO_BANANA
    wh_ratio=AspectRatio.RATIO_16_9,
    batch_size=2
)

# Image to video (using local image)
results = client.image_to_video(
    prompt="camera slowly zooming out",
    image_uuid=client.upload_image("/path/to/image.jpg"),
    port=PortName.V3PRO,
    wh_ratio=AspectRatio.RATIO_16_9,
    duration=5
)

# Keyframe to video (using start and end images)
results = client.keyframe_to_video(
    prompt="smooth transition from start to end",
    start_image_uuid=client.upload_image("/path/to/start.jpg"),
    end_image_uuid=client.upload_image("/path/to/end.jpg"),
    port=PortName.V3PRO,
    wh_ratio=AspectRatio.RATIO_16_9,
    duration=5
)

# Video Templates - use pre-defined effects
results = client.template_to_video(
    image_uuid=client.upload_image("/path/to/image.jpg"),
    template="ghibli",  # See available templates below
    wh_ratio=AspectRatio.RATIO_9_16
)

Error Handling

from scripts.exceptions import (
    TaskFailedError,
    TaskRejectedError,
    TaskTimeoutError,
    InvalidPortError
)

try:
    results = client.image_to_video(...)
except TaskFailedError as e:
    print(f"Task failed: {e.task_id}")
except TaskRejectedError as e:
    print(f"Content rejected: {e.reason}")
except TaskTimeoutError as e:
    print(f"Timeout after {e.timeout_seconds}s")
except InvalidPortError as e:
    print(f"Invalid port: {e.port}, available: {e.available}")

Command Line (Best for AI Agents)

For AI Agents: The easiest way to use this skill is through the provided CLI scripts. They automatically handle API communication, polling, and result parsing. By default, they use HiDream's native models.

Text to Image:

python3 scripts/txt2img.py \
  --prompt "a futuristic city" \
  --wh-ratio 16:9 \
  --batch-size 2 \
  --output ./assets/results.json

Note: This defaults to the hidream-txt2img port (the v3L model).

Text to Video:

python3 scripts/txt2vid.py \
  --prompt "a cybernetic dragon flying over a futuristic city" \
  --wh-ratio 16:9 \
  --duration 5 \
  --output ./assets/video_results.json

Note: This defaults to the v3Pro model.

Image to Video:

python3 scripts/img2video.py \
  --prompt "slow motion falling leaves" \
  --image ./assets/source_image.jpg \
  --duration 5 \
  --output ./assets/video.json
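
An agent that drives these scripts from Python could build the command as an argument list rather than a shell string, which avoids quoting problems in prompts. The sketch below uses only the txt2img.py flags shown above; the helper names build_txt2img_command and run_txt2img are hypothetical, and the script is assumed to be run from the skill's root directory.

```python
import subprocess


def build_txt2img_command(prompt, wh_ratio="16:9", batch_size=1,
                          output="./assets/results.json"):
    """Build the argv list for scripts/txt2img.py from the documented flags."""
    return [
        "python3", "scripts/txt2img.py",
        "--prompt", prompt,
        "--wh-ratio", wh_ratio,
        "--batch-size", str(batch_size),
        "--output", output,
    ]


def run_txt2img(prompt, **kwargs):
    """Run the CLI; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_txt2img_command(prompt, **kwargs), check=True)
```

Because the prompt is passed as a single list element, quotes or spaces inside it need no escaping.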

API Reference

Enums

from scripts.enums import (
    TaskStatus,      # PENDING, COMPLETED, PROCESSING, FAILED, REJECTED
    AspectRatio,     # RATIO_1_1, RATIO_4_3, RATIO_16_9, etc.
    PortCategory,    # TEXT_TO_IMAGE, IMAGE_TO_VIDEO, etc.
    PortName         # KLING_IMAGE, V3PRO, NANO_BANANA, etc.
)

Models

Feature | Available Versions | Default
Text to Image | v3L (HiDream), kling-image-o1 | v3L (via port hidream-txt2img)
Image to Video | v3Pro, v3L, kling-video-o1 | v3Pro
Keyframe to Video | v3Pro, v3L | v3Pro

Note for AI Agents: By default, all CLI tools (txt2img.py, txt2vid.py) are pre-configured to use HiDream's native models (hidream-txt2img for images, v3Pro for videos). You don't need to specify the model unless explicitly requested by the user.

Aspect Ratios

  • 1:1 - Square
  • 4:3 - Standard
  • 3:4 - Portrait
  • 16:9 - Widescreen
  • 9:16 - Mobile/Vertical

Task Status Codes

from scripts.enums import TaskStatus

TaskStatus.PENDING     # 0 - Pending
TaskStatus.COMPLETED   # 1 - Completed
TaskStatus.PROCESSING  # 2 - Processing
TaskStatus.FAILED      # 3 - Failed
TaskStatus.REJECTED    # 4 - Rejected (content review)
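
To illustrate how these codes can drive a polling loop, here is a self-contained sketch. The enum values mirror the list above; wait_for_task, its default timeout and interval, and the fetch_status callback are assumptions made for illustration and may differ from what vivago_client.py does.

```python
import time
from enum import IntEnum


class TaskStatus(IntEnum):
    PENDING = 0
    COMPLETED = 1
    PROCESSING = 2
    FAILED = 3
    REJECTED = 4


# Statuses after which polling should stop.
TERMINAL = {TaskStatus.COMPLETED, TaskStatus.FAILED, TaskStatus.REJECTED}


def wait_for_task(fetch_status, timeout_seconds=300.0, interval_seconds=2.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until a terminal status arrives or the timeout elapses."""
    deadline = clock() + timeout_seconds
    while True:
        status = TaskStatus(fetch_status())  # accepts the raw integer code
        if status in TERMINAL:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"task still {status.name} after {timeout_seconds}s")
        sleep(interval_seconds)
```

Note that COMPLETED is 1 and PROCESSING is 2, so comparing codes numerically (for example `status >= 2`) would not separate finished tasks from running ones; an explicit set of terminal states is safer.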

File Structure

vivago-ai-skill/
├── scripts/
│   ├── __init__.py         # Package exports
│   ├── vivago_client.py    # Core API client
│   ├── template_manager.py # Template management
│   ├── config_loader.py    # Configuration loader
│   ├── enums.py            # Type enums
│   ├── exceptions.py       # Exception classes
│   ├── logging_config.py   # Logging configuration
│   └── config/             # Modular config files
│       ├── base.json
│       ├── text_to_image.json
│       ├── image_to_video.json
│       └── ...
├── tests/
│   ├── conftest.py         # Pytest configuration
│   ├── archive/            # Archived tests
│   └── ...
├── docs/                   # Documentation
├── .github/workflows/      # CI configuration
├── requirements.txt
├── README.md
└── SKILL.md               # This file

Important Notes

Feishu Channel Messaging Guidelines

When sending generated content through Feishu (飞书) channel:

Content Type | Send Method | Example
Images | ✅ Direct file upload | Attach image file directly
Videos | Must send as link | https://media.vivago.ai/{video_uuid}.mp4

⚠️ Critical: Videos CANNOT be sent as file attachments in Feishu. Always construct and send the direct media URL:

https://media.vivago.ai/b1268f08-ac32-4b83-863f-a419797d768e.mp4

Why: Feishu does not support playable video attachments. Sending video files directly will result in delivery failure or unplayable content.
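
A small helper can build the link from a video UUID. This is a sketch that assumes every video ID is a standard UUID and that the file extension is .mp4, as in the example URL above; the function name video_url is hypothetical.

```python
import uuid

MEDIA_BASE_URL = "https://media.vivago.ai"


def video_url(video_uuid, extension="mp4"):
    """Return the direct media link for a generated video.

    The value is parsed as a UUID first, so a malformed ID raises
    ValueError instead of producing a dead link.
    """
    canonical = str(uuid.UUID(video_uuid))
    return f"{MEDIA_BASE_URL}/{canonical}.{extension}"
```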

Image Download

Images can be downloaded using the following URL format:

https://storage.vivago.ai/image/{image_name}.jpg

Example:

from scripts import create_client
import requests

client = create_client()

# Generate image
results = client.text_to_image(prompt="a cute cat")
image_name = results[0].get('image', '')

# Download image (fail fast on HTTP errors instead of saving an error page)
image_url = f"https://storage.vivago.ai/image/{image_name}.jpg"
response = requests.get(image_url, timeout=30)
response.raise_for_status()
with open("output.jpg", "wb") as f:
    f.write(response.content)

Sending via Feishu:

# Download and send through Feishu
image_data = requests.get(image_url).content
# Then send image_data as file attachment via Feishu API

Asynchronous Processing

  • API calls are asynchronous with automatic polling
  • Images are automatically resized to max 1024px on longest side before upload
  • Video generation supports durations of 5 or 10 seconds
  • Batch size for images: 1-4, for videos: 1
  • All API calls include automatic retry logic
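
The limits above can be checked locally before a request is sent. The sketch below is illustrative only: validate_request and fit_longest_side are hypothetical helpers, and the choice not to upscale images that are already within 1024px is an assumption about the resize step.

```python
def validate_request(kind, batch_size=1, duration=None):
    """Check the documented limits before any API call is made."""
    if kind == "image":
        if not 1 <= batch_size <= 4:
            raise ValueError("image batch_size must be between 1 and 4")
    elif kind == "video":
        if batch_size != 1:
            raise ValueError("video batch_size must be 1")
        if duration not in (5, 10):
            raise ValueError("video duration must be 5 or 10 seconds")
    else:
        raise ValueError(f"unknown kind: {kind!r}")


def fit_longest_side(width, height, max_side=1024):
    """Return (width, height) scaled so the longest side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 4000x3000 photo would be reduced to 1024x768 while keeping its 4:3 aspect ratio.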

Error Handling

The client handles common errors:

  • Network timeouts (with retry)
  • Rate limiting (with exponential backoff)
  • Invalid parameters (validation before API call)
  • Task failures (structured exceptions)
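
For readers unfamiliar with the retry pattern mentioned above, here is a generic sketch of exponential backoff with jitter. The attempt count, delays, and the set of retried exception types are illustrative assumptions, not the client's actual settings.

```python
import random
import time


def call_with_backoff(func, max_attempts=4, base_delay=1.0, max_delay=30.0,
                      retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call func(), retrying transient errors with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter spreads out retries from concurrent callers.
            sleep(delay + random.uniform(0, delay / 2))
```

Errors outside retry_on, such as validation failures, propagate immediately because retrying them would not help.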

Exception Hierarchy

VivagoError (base)
├── VivagoAPIError
├── MissingCredentialError
├── InvalidPortError
├── ImageUploadError
├── TemplateNotFoundError
└── TaskError
    ├── TaskFailedError
    ├── TaskRejectedError
    └── TaskTimeoutError
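
The tree above could be expressed in Python roughly as follows. This is a sketch, not the contents of exceptions.py: the class names come from the hierarchy and the attribute names (task_id, reason, timeout_seconds, port, available) from the error-handling example, while the constructor signatures and messages are assumptions.

```python
class VivagoError(Exception):
    """Base class for all errors raised by the skill."""


class VivagoAPIError(VivagoError):
    """The API returned an error response."""


class MissingCredentialError(VivagoError):
    """No API token was supplied."""


class ImageUploadError(VivagoError):
    """Uploading an image failed."""


class TemplateNotFoundError(VivagoError):
    """The requested video template does not exist."""


class InvalidPortError(VivagoError):
    def __init__(self, port, available):
        super().__init__(f"invalid port {port!r}; available: {', '.join(available)}")
        self.port = port
        self.available = list(available)


class TaskError(VivagoError):
    def __init__(self, task_id, message=""):
        super().__init__(message or f"task {task_id} did not complete")
        self.task_id = task_id


class TaskFailedError(TaskError):
    """The task ended in the FAILED state."""


class TaskRejectedError(TaskError):
    def __init__(self, task_id, reason):
        super().__init__(task_id, f"task {task_id} rejected: {reason}")
        self.reason = reason


class TaskTimeoutError(TaskError):
    def __init__(self, task_id, timeout_seconds):
        super().__init__(task_id, f"task {task_id} timed out after {timeout_seconds}s")
        self.timeout_seconds = timeout_seconds
```

With a shared base class, callers can catch TaskError to handle any task outcome, or VivagoError to handle everything the skill raises.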

Video Templates Reference

The following 181 video templates are available via template_to_video():

Quick Categories

Category | Count | Example Templates
Style Transfer | 20+ | ghibli, 1930s-2000s vintage styles
Harry Potter | 4 | magic_reveal_ravenclaw, gryffindor, hufflepuff, slytherin
Wings/Fantasy | 10+ | angel_wings, phoenix_wings, crystal_wings, fire_wings
Superheroes | 5+ | iron_man, cat_woman, ghost_rider
Dance | 10+ | apt, dadada, dance, limbo_dance
Effects | 15+ | ash_out, metallic_liquid, flash_flood
Thanksgiving | 10+ | turkey_chasing, autumn_feast, gratitude_photo
Comics/Cartoon | 8+ | gta_star, anime_figure, bring_comics_to_life
Products | 8+ | glasses_display, music_box, food_product_display
Scenes | 20+ | romantic_kiss, graduation, starship_chef

Popular Templates

Template ID | Description
ghibli / ghibli2 | Studio Ghibli animation style
magic_reveal_ravenclaw | Harry Potter Ravenclaw transformation
magic_reveal_gryffindor | Harry Potter Gryffindor transformation
magic_reveal_hufflepuff | Harry Potter Hufflepuff transformation
magic_reveal_slytherin | Harry Potter Slytherin transformation
iron_man | Iron Man armor assembly
angel_wings / phoenix_wings / crystal_wings / fire_wings | Wing transformations
cat_woman | Cat Woman style
ghost_rider | Ghost Rider flaming skull
joker | Joker villain style
mermaid | Mermaid underwater scene
snow_white | Snow White princess
barbie | Barbie princess transformation
me_in_hand | Miniature figure in hand
music_box | Rotating figure on music box
anime_figure | Transform into anime figure
gta_star | GTA game style transformation
apt / dadada / dance | Dance templates
ash_out | Disintegrate into ashes
eye_of_the_storm | Thunder god awakening
metallic_liquid | Metal mask transformation
flash_flood | Water/flood effect
turkey_chasing / turkey_away / turkey_giant | Thanksgiving turkey scenes
autumn_feast / autumn_stroll | Autumn scenes
renovation_of_old_photos | Colorize B&W photos
graduation | Graduation ceremony
glasses / glasses_display | Glasses/eyewear showcase
bikini / sexy_man / sexy_pants | Fashion/beach
romantic_kiss / boyfriends_rose / girlfriends_rose | Romantic scenes
ai_archaeologist / starship_chef / cyber_cooker | Sci-fi characters
jungle_reign / panther_queen / roar_of_the_dustlands / tiger_snuggle | Animal companions
instant_sadness / headphone_vibe / relax | Emotion/reaction
frost_alert | Cold/freeze effect
bald_me | Bald transformation
boom_hair / curl_pop / long_hair | Hair transformations
muscles | Muscle transformation
face_punch / gun_point | Action effects
static_shot / tracking_shot / orbit_shot / push_in / zoom_out / handheld_shot | Camera movements
earth_zoom_in / earth_zoom_out | Earth zoom effects

View All Templates

from scripts.template_manager import get_template_manager

manager = get_template_manager()
templates = manager.list_templates()

print(f"Total templates: {len(templates)}")
for tid, name in sorted(templates.items()):
    print(f"  {tid}: {name}")
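
Given a mapping like the one returned above, a typo in a template ID can be caught before an upload is wasted. The sketch below is standalone and hypothetical: resolve_template is not part of the skill, and SAMPLE_TEMPLATES holds only a few IDs copied from the tables in this section, standing in for the full catalogue.

```python
import difflib

# A handful of IDs from the tables above; the real catalogue would come
# from the template manager.
SAMPLE_TEMPLATES = {
    "ghibli": "Studio Ghibli animation style",
    "iron_man": "Iron Man armor assembly",
    "ash_out": "Disintegrate into ashes",
    "metallic_liquid": "Metal mask transformation",
    "music_box": "Rotating figure on music box",
}


def resolve_template(template_id, templates):
    """Return the template ID if known, else raise KeyError listing close matches."""
    key = template_id.strip()
    if key in templates:
        return key
    suggestions = difflib.get_close_matches(key, list(templates), n=3, cutoff=0.6)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    raise KeyError(f"Unknown template {template_id!r}.{hint}")
```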

Usage Example

from scripts import create_client

client = create_client()

# Upload image
image_uuid = client.upload_image("/path/to/photo.jpg")

# Apply Ghibli style template
results = client.template_to_video(
    image_uuid=image_uuid,
    template="ghibli",
    wh_ratio="9:16"
)

# Harry Potter transformation
results = client.template_to_video(
    image_uuid=image_uuid,
    template="magic_reveal_ravenclaw",
    wh_ratio="9:16"
)

Changelog

v0.9.0 (2026-03-09)

  • ✅ Code review complete (P0-P3)
  • ✅ Added GitHub Actions CI
  • ✅ Added type safety module (enums.py)
  • ✅ Added structured exceptions (exceptions.py)
  • ✅ Split configuration into modular files
  • ✅ Archived redundant code and tests
  • ✅ Pinned dependency versions

v0.8.2 (2026-03-08)

  • ✅ Template testing: 44 templates, 40 passed (90.9%)
  • ✅ Fixed metallic_liquid naming issue
  • ✅ Marked long_hair as deprecated

v0.8.0 (2026-03-07)

  • ✅ Completed Tier 1-4 testing
  • ✅ Established smart test optimization system
Usage Guidance
This skill appears to do what it says: it uploads images and requests generation from Vivago (vivago.ai) using the HIDREAM_AUTHORIZATION bearer token. Before installing or running it:

  1. Only provide a Vivago API token you control and understand (keep it secret).
  2. Be aware that any images you pass will be uploaded to Vivago's servers; avoid sending sensitive or private images.
  3. Verify you trust the Vivago service and check its terms/privacy (retention & reuse of images).
  4. Because the package includes executable Python code, inspect the code if you run it in sensitive environments; consider running it in an isolated environment (container/VM) and review network egress policies.
  5. Note minor inconsistencies: the registry lists no install spec even though requirements.txt and source files are included, and some scripts reference an alternate env var (HIDREAM_TOKEN) and deprecated STORAGE_AK/STORAGE_SK. You can ignore those or set HIDREAM_AUTHORIZATION as instructed.
  6. If you need higher assurance, ask the publisher for provenance (homepage or repository); the skill's source/homepage is not provided in the metadata.
Capability Analysis
Type: OpenClaw Skill
Name: hidream-model-gen
Version: 1.0.5

The skill bundle is a legitimate and well-structured integration for the Vivago AI platform (vivago.ai). It provides a comprehensive API client (vivago_client.py) and several CLI utilities (txt2img.py, txt2vid.py, img2video.py) for generating AI images and videos. The code uses standard security practices, such as requiring authentication tokens via environment variables (HIDREAM_AUTHORIZATION) and performing image processing locally using the Pillow library. No indicators of data exfiltration, malicious execution, or prompt injection were found; all network activity is directed toward official Vivago AI endpoints.
Capability Assessment
Purpose & Capability
Name/description match the code and manifest: the code implements text->image, img->video, keyframe->video, template handling, and uploads to vivago.ai endpoints. The single required env var (HIDREAM_AUTHORIZATION) is exactly the API bearer token the client uses.
Instruction Scope
SKILL.md and the CLI scripts instruct the agent to call the packaged Python scripts, upload local images to Vivago (via pre-signed URLs), poll for results, and save outputs to an assets/ directory. The runtime instructions and code operate within that scope and do not request or read unrelated system secrets or remote endpoints outside vivago.ai and its storage domains.
Install Mechanism
No registry install spec is declared (skill is instruction-only in registry), but a requirements.txt and full Python source are included in the package. Installation is the normal pip-based flow (pip install -r requirements.txt). No downloads from arbitrary URLs or extract/install steps were found.
Credentials
The skill requires one environment variable (HIDREAM_AUTHORIZATION) which is used as a Bearer token. Some scripts reference HIDREAM_TOKEN as a fallback and mention deprecated STORAGE_AK/STORAGE_SK — these are not required for normal operation but appear as backward-compatible fallbacks. No unrelated credentials (e.g., AWS keys, GitHub tokens) are requested.
Persistence & Privilege
The skill does not request always: true and does not modify other skills or system-wide agent settings. It writes generated assets to a local assets/ directory (and /tmp for intermediate files) which is expected behavior for a generator tool.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install hidream-model-gen
  3. After installation, invoke the skill by name or use /hidream-model-gen
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.5
  • Changed CLI script instructions to use python3 instead of python.
  • No other user-facing changes.
v1.0.4
  • No code or documentation changes detected in this release.
  • Functionality, dependencies, and documentation remain unchanged from the previous version.
v1.0.3
  • Added explicit dependencies on "requests" and "pillow".
  • Now requires environment variable HIDREAM_AUTHORIZATION for authentication (no .env fallback).
  • Security and setup documentation updated: strictly use environment variables for secrets; removed .env file mention.
  • Updated CLI usage examples to use the ./assets directory for output paths.
  • Minor documentation clarifications and improved security notes regarding credentials.
v1.0.2
  • No user-facing changes in this release; documentation and configuration remain the same as the previous version.
v1.0.1
  • Removed the unused script file `scripts/verify_fix.py`.
  • SKILL.md explicitly lists the required environment variable `HIDREAM_AUTHORIZATION`.
  • Updated documentation to clarify that only `HIDREAM_AUTHORIZATION` is needed, and deprecated variables (`STORAGE_AK`, `STORAGE_SK`) are no longer supported.
  • Added a security note to ensure `.env` does not contain unrelated or sensitive credentials.
v1.0.0
Initial release of hidream-model-gen: Vivago AI image & video generation skill
  • Enables AI-powered text-to-image, image-to-image (incl. style transfer), text-to-video, image-to-video, and keyframe-to-video generation via Vivago AI.
  • Supports multiple model versions (v3Pro, v3L, kling-video-o1) and 181 video template effects.
  • Handles batch generation, multiple aspect ratios, and secure image upload with pre-signed URLs.
  • Provides Python API, CLI scripts, and robust error handling.
  • Fully typed codebase with modular config, structured exceptions, and CI/CD integration.
  • Outputs results to an assets/ folder by default for easy access.
Metadata
Slug hidream-model-gen
Version 1.0.5
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 6
Frequently Asked Questions

What is hidream-model-gen?

Generate images and videos using Vivago AI (智小象) platform. Supports text-to-image, image-to-image, image-to-video, and keyframe-to-video generation. Use when... It is an AI Agent Skill for Claude Code / OpenClaw, with 305 downloads so far.

How do I install hidream-model-gen?

Run "/install hidream-model-gen" in the OpenClaw or Claude Code chat to install it in one step. Before first use, set the HIDREAM_AUTHORIZATION environment variable to your Vivago API token, as described under Setup.

Is hidream-model-gen free?

Yes, the skill itself is free and licensed under MIT-0, so you can download, install and use it at no cost. Generation runs on your Vivago.ai account, so check your remaining credits or plan there.

Which platforms does hidream-model-gen support?

hidream-model-gen is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created hidream-model-gen?

It is built and maintained by harry zhu (@zhy2015); the current version is v1.0.5.
