Description

Free local AI image and video processing toolkit with cloud AI generation. Local tools: upscale (Real-ESRGAN), face enhance (GFPGAN/CodeFormer), background r...

README (SKILL.md)

Free Image & Video Processing Toolkit

Name: AI Image & Video Toolkit — Free Upscale, Face Enhance, BG Remove & Generation
Author: xixihhhh

7 free local AI tools + cloud AI generation (300+ models via Atlas Cloud API).

Local tools run on your machine with no API keys and no cloud costs; the only network traffic is the first-run download of dependencies and model weights. Cloud generation tools provide access to state-of-the-art AI models for image and video creation.

Prerequisites

Python 3.10+ installed
uv installed (brew install uv / pip install uv / winget install astral-sh.uv)
FFmpeg installed (brew install ffmpeg / apt install ffmpeg / winget install ffmpeg)

Available Tools

Tool	Script	What It Does
Image Upscale	`scripts/upscale.py`	2x/4x super resolution using Real-ESRGAN
Face Enhance	`scripts/face-enhance.py`	Restore and enhance faces using GFPGAN + CodeFormer
Background Remove	`scripts/bg-remove.py`	Remove image backgrounds, output transparent PNG
Object Erase	`scripts/erase.py`	Erase unwanted objects using LaMa inpainting
Face Swap	`scripts/face-swap.py`	Swap faces between images using InsightFace
Smart Segment	`scripts/segment.py`	Segment anything in images using FastSAM
Media Process	`scripts/media-process.py`	Convert, compress, resize, extract with FFmpeg
AI Generate	`scripts/ai-generate.py`	Generate images/videos with 300+ cloud AI models

Usage

All scripts use uv run for zero-setup execution — dependencies are automatically installed on first run.
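
One way `uv run` can install dependencies on first run is an inline metadata block (PEP 723) at the top of a script. Whether the scripts in this toolkit declare their dependencies this way is an assumption. The sketch below only shows what such a header looks like and how it can be read back; the sample header, its dependency names, and the `read_inline_metadata` helper are illustrative and not taken from the toolkit.

```python
import re
from typing import Optional

# Regular expression from the PEP 723 reference implementation.
BLOCK_RE = re.compile(
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
)

# Hypothetical script header; the dependency names are placeholders.
SAMPLE_SCRIPT = '''\
# /// script
# requires-python = ">=3.10"
# dependencies = ["rembg", "pillow"]
# ///
print("hello")
'''


def read_inline_metadata(script_text: str) -> Optional[str]:
    """Return the TOML text of the 'script' metadata block, or None if absent."""
    for match in BLOCK_RE.finditer(script_text):
        if match.group("type") == "script":
            lines = match.group("content").splitlines(keepends=True)
            # Strip the leading "# " (or a bare "#") from each comment line.
            return "".join(
                line[2:] if line.startswith("# ") else line[1:] for line in lines
            )
    return None
```

Calling `read_inline_metadata(SAMPLE_SCRIPT)` returns the two TOML lines without their comment prefix, which is the information a runner needs to build an environment before executing the script.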

Image Upscale (Real-ESRGAN)

Upscale low-resolution images by 2x or 4x with AI super resolution.

# 4x upscale (default)
uv run scripts/upscale.py input.jpg

# 2x upscale
uv run scripts/upscale.py input.jpg --scale 2

# Upscale with face enhancement
uv run scripts/upscale.py input.jpg --face-enhance

# Batch upscale a folder
uv run scripts/upscale.py ./photos/ --scale 4

# Custom output path
uv run scripts/upscale.py input.jpg -o upscaled.png

Face Enhance (GFPGAN + CodeFormer)

Restore old photos, enhance blurry faces, fix low-quality portraits.

# Enhance faces in an image (GFPGAN, default)
uv run scripts/face-enhance.py photo.jpg

# Use CodeFormer (better fidelity control)
uv run scripts/face-enhance.py photo.jpg --method codeformer

# Adjust fidelity (0=quality, 1=fidelity, default 0.5)
uv run scripts/face-enhance.py photo.jpg --method codeformer --fidelity 0.7

# Also upscale background (2x)
uv run scripts/face-enhance.py photo.jpg --bg-upscale 2

# Batch process
uv run scripts/face-enhance.py ./old-photos/
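
The `--fidelity` value above is a weight between 0 and 1. A minimal sketch of how a command-line parser could reject out-of-range values follows. The parser is hypothetical: it mirrors the flags shown above but is not the argument handling in `scripts/face-enhance.py`, and the `gfpgan` choice name is an assumption.

```python
import argparse


def fidelity_weight(text: str) -> float:
    """Parse a fidelity weight and require it to lie in [0, 1]."""
    try:
        value = float(text)
    except ValueError:
        raise argparse.ArgumentTypeError(f"not a number: {text!r}")
    if not 0.0 <= value <= 1.0:
        raise argparse.ArgumentTypeError(
            f"fidelity must be between 0 and 1, got {value}"
        )
    return value


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; the real script may define its flags differently.
    parser = argparse.ArgumentParser(prog="face-enhance")
    parser.add_argument("input")
    parser.add_argument(
        "--method", choices=["gfpgan", "codeformer"], default="gfpgan"
    )
    parser.add_argument("--fidelity", type=fidelity_weight, default=0.5)
    return parser
```

Validating at parse time means a typo such as `--fidelity 7` fails before any model is loaded.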

Background Remove (rembg)

Remove backgrounds from images, output transparent PNG. Supports multiple AI models.

# Remove background (default u2net model)
uv run scripts/bg-remove.py product.jpg

# Use specific model
uv run scripts/bg-remove.py photo.jpg --model isnet-general-use

# Batch process folder
uv run scripts/bg-remove.py ./products/ -o ./transparent/

# Keep only the foreground (alpha matting for fine edges)
uv run scripts/bg-remove.py portrait.jpg --alpha-matting

# Available models: u2net, u2netp, u2net_human_seg, u2net_cloth_seg,
#                   silueta, isnet-general-use, isnet-anime, sam

Object Erase (LaMa Inpainting)

Remove unwanted objects from images using a mask.

# Erase objects (white area in mask = erase)
uv run scripts/erase.py image.png --mask mask.png

# Auto-generate mask from coordinates (x,y,width,height)
uv run scripts/erase.py image.png --region 100,200,150,150

# Batch erase with matching masks (image1.png + image1_mask.png)
uv run scripts/erase.py ./images/ --mask-dir ./masks/
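
The `--region` form above describes a rectangle as x,y,width,height, and the mask convention is white = erase. A minimal sketch of turning such a region into a mask, using plain lists so it runs without image libraries. The function names are illustrative, and clipping the rectangle to the image bounds is an assumption about reasonable behavior, not a description of `scripts/erase.py`.

```python
from typing import List, Tuple

Region = Tuple[int, int, int, int]


def parse_region(text: str) -> Region:
    """Parse "x,y,width,height" into four integers."""
    parts = [int(p) for p in text.split(",")]
    if len(parts) != 4:
        raise ValueError("expected x,y,width,height")
    x, y, w, h = parts
    if w <= 0 or h <= 0:
        raise ValueError("width and height must be positive")
    return x, y, w, h


def region_mask(img_w: int, img_h: int, region: Region) -> List[List[int]]:
    """Build a mask: 255 (white, erase) inside the region, 0 elsewhere."""
    x, y, w, h = region
    # Clip the rectangle to the image so out-of-bounds regions are tolerated.
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, img_w), min(y + h, img_h)
    return [
        [255 if (x0 <= col < x1 and y0 <= row < y1) else 0 for col in range(img_w)]
        for row in range(img_h)
    ]
```

For a real image the same rows could be written out as a grayscale PNG and passed to `--mask`.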

Face Swap (InsightFace)

Swap faces between two images.

# Swap face from source to target
uv run scripts/face-swap.py --source face.jpg --target photo.jpg

# Swap specific face index (when multiple faces detected)
uv run scripts/face-swap.py --source face.jpg --target group.jpg --face-index 0

# Custom output
uv run scripts/face-swap.py --source face.jpg --target photo.jpg -o result.png

Smart Segment (FastSAM)

Segment any object in an image using text prompt, point, or bounding box.

# Segment everything
uv run scripts/segment.py image.jpg

# Segment by text prompt
uv run scripts/segment.py image.jpg --text "the dog"

# Segment by point (x, y)
uv run scripts/segment.py image.jpg --point 400,300

# Segment by bounding box (x1,y1,x2,y2)
uv run scripts/segment.py image.jpg --box 100,100,400,400

# Output mask only
uv run scripts/segment.py image.jpg --text "car" --mask-only
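
Note that `--box` takes corner coordinates (x1,y1,x2,y2), while the `--region` flag of the erase tool takes x,y,width,height. When a box from one tool feeds the other, a small conversion avoids off-by-size mistakes. The helpers below are an illustrative sketch and are not part of the toolkit.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]


def xywh_to_xyxy(x: int, y: int, w: int, h: int) -> Box:
    """Convert x,y,width,height to corner form x1,y1,x2,y2."""
    return (x, y, x + w, y + h)


def xyxy_to_xywh(x1: int, y1: int, x2: int, y2: int) -> Box:
    """Convert corner form x1,y1,x2,y2 to x,y,width,height."""
    if x2 <= x1 or y2 <= y1:
        raise ValueError("x2 and y2 must be greater than x1 and y1")
    return (x1, y1, x2 - x1, y2 - y1)


def as_flag_value(box: Box) -> str:
    """Format four integers the way the command-line examples write them."""
    return ",".join(str(v) for v in box)
```

For example, the segment box `100,100,400,400` corresponds to the erase region `100,100,300,300`.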

Media Process (FFmpeg)

Convert, compress, resize, extract frames, merge audio/video — powered by FFmpeg.

# Convert format
uv run scripts/media-process.py convert input.mp4 output.webm

# Compress video (target size in MB)
uv run scripts/media-process.py compress input.mp4 --target-size 10

# Resize video
uv run scripts/media-process.py resize input.mp4 --width 1280 --height 720

# Extract frames as images
uv run scripts/media-process.py frames input.mp4 --fps 1 --output ./frames/

# Extract audio
uv run scripts/media-process.py audio input.mp4 -o audio.mp3

# Create GIF from video
uv run scripts/media-process.py gif input.mp4 --start 5 --duration 3 --fps 15

# Trim video
uv run scripts/media-process.py trim input.mp4 --start 00:01:00 --end 00:02:30

# Merge multiple videos
uv run scripts/media-process.py merge video1.mp4 video2.mp4 video3.mp4 -o combined.mp4

# Add watermark
uv run scripts/media-process.py watermark input.mp4 --image logo.png --position bottom-right

# Get media info
uv run scripts/media-process.py info input.mp4
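
Compressing to a target size, as in the `compress` example above, usually comes down to choosing a bitrate from the file size and the clip duration. How `scripts/media-process.py` actually picks its bitrate is not documented here; the sketch below shows the common calculation, assuming 1 MB = 1,000,000 bytes, 1 kbps = 1000 bits per second, and a fixed audio bitrate. The function name and the 128 kbps default are illustrative.

```python
def video_bitrate_kbps(target_mb: float, duration_s: float, audio_kbps: int = 128) -> int:
    """Return the video bitrate (kbps) that fits target_mb over duration_s."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    # MB -> bits is * 8,000,000; bits/s -> kbps is / 1000; combined: * 8000.
    total_kbps = target_mb * 8000 / duration_s
    video_kbps = int(total_kbps - audio_kbps)
    if video_kbps <= 0:
        raise ValueError("target size is too small for this duration")
    return video_kbps
```

A 60-second clip with a 10 MB target leaves about 1205 kbps for video after reserving 128 kbps for audio; container overhead makes the real file slightly larger, so a small safety margin is common.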

AI Generate (Atlas Cloud API) — Gold Sponsor

Generate images and videos using 300+ state-of-the-art AI models. Requires an Atlas Cloud API key.

IMPORTANT for AI agents: Before calling this script, you MUST first use Atlas Cloud MCP tools to find the correct model ID and its required parameters:

Call atlas_list_models to browse available models, or atlas_search_docs to search for a specific model
Call atlas_get_model_info with the model ID to get the exact parameter schema (different models use different parameters — some use size, others use aspect_ratio + resolution, etc.)
Then call the script with --model <full_model_id> and the correct parameters
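
The three steps above can be sketched as a small helper that builds the command only from parameters the model is known to accept. This is a hedged illustration: the `allowed` set stands in for whatever `atlas_get_model_info` returns (its real response shape is not shown in this document), and `build_generate_args` is a hypothetical name. The underscore-to-hyphen mapping follows the examples in this section, where `aspect_ratio` is passed as `--aspect-ratio`.

```python
from typing import Dict, Iterable, List


def build_generate_args(
    kind: str,
    prompt: str,
    model_id: str,
    params: Dict[str, str],
    allowed: Iterable[str],
) -> List[str]:
    """Assemble an argv list for scripts/ai-generate.py.

    `allowed` is the set of parameter names the model accepts (assumed to
    come from the model's parameter schema).
    """
    unknown = sorted(set(params) - set(allowed))
    if unknown:
        raise ValueError(f"model {model_id} does not accept: {', '.join(unknown)}")
    args = ["uv", "run", "scripts/ai-generate.py", kind, prompt, "--model", model_id]
    for name, value in params.items():
        # Schema names use underscores; the command-line flags use hyphens.
        args += ["--" + name.replace("_", "-"), str(value)]
    return args
```

Passing the list to `subprocess.run` without a shell also avoids quoting problems with prompts that contain spaces or quotes.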

# Generate image (pass full model ID from Atlas Cloud)
uv run scripts/ai-generate.py image "A cat astronaut on the moon" --model black-forest-labs/flux-schnell --size 1024*1024

# Models using aspect_ratio + resolution (e.g. Nano Banana 2, Imagen4)
uv run scripts/ai-generate.py image "Anime girl with blue hair" --model google/nano-banana-2/text-to-image --aspect-ratio 1:1 --resolution 1k

# Models using size presets (e.g. Seedream)
uv run scripts/ai-generate.py image "Product photo on marble" --model bytedance/seedream-v5.0-lite --size 2048*2048

# Edit existing image
uv run scripts/ai-generate.py image "Make the sky sunset orange" --model bytedance/seedream-v5.0-lite/edit --image photo.jpg

# Generate video
uv run scripts/ai-generate.py video "Timelapse of cherry blossoms" --model alibaba/wan-2.6/text-to-video --size 1280*720

# Image-to-video
uv run scripts/ai-generate.py video "The person starts walking" --model alibaba/wan-2.6/image-to-video --image portrait.jpg

# Pass extra model-specific parameters as JSON
uv run scripts/ai-generate.py image "A logo" --model google/imagen4-ultra --extra '{"num_images": 4}'

# NSFW mode
uv run scripts/ai-generate.py image "Artistic figure study" --model black-forest-labs/flux-dev-lora --nsfw

Setup: Set ATLAS_CLOUD_API_KEY as an environment variable or in the project's .env file. Get your key at atlascloud.ai. Note: when using cloud generation, your prompts and image data are sent to the Atlas Cloud API for processing.
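
A minimal sketch of the lookup described above: check the environment first, then a project `.env` file. The precedence order and the simple `.env` parsing are assumptions, not the script's actual logic. The `ATLASCLOUD_API_KEY` fallback name comes from the Credentials note later in this document.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional

KEY_NAMES = ("ATLAS_CLOUD_API_KEY", "ATLASCLOUD_API_KEY")


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse NAME=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[name.strip()] = value.strip().strip("'\"")
    return values


def find_api_key(
    env: Mapping[str, str] = os.environ,
    dotenv_path: Path = Path(".env"),
) -> Optional[str]:
    """Return the API key from the environment, else from the .env file."""
    for name in KEY_NAMES:
        if env.get(name):
            return env[name]
    if dotenv_path.is_file():
        file_values = parse_dotenv(dotenv_path.read_text(encoding="utf-8"))
        for name in KEY_NAMES:
            if file_values.get(name):
                return file_values[name]
    return None
```

If a `.env` file is used, keeping it out of version control prevents the key from leaking through a shared repository.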

Output

All tools save output to ./output/ by default. Use -o or --output to specify a custom path.

Models

Models are automatically downloaded on first use and cached locally:

Tool	Model	Size	Cache Location
Upscale	RealESRGAN_x4plus	~64MB	`~/.cache/realesrgan/`
Face Enhance	GFPGANv1.4	~348MB	`~/.cache/gfpgan/`
Face Enhance	CodeFormer	~376MB	`~/.cache/codeformer/`
Background Remove	u2net	~176MB	`~/.u2net/`
Object Erase	LaMa	~200MB	`~/.cache/lama/`
Face Swap	buffalo_l + inswapper	~500MB	`~/.insightface/`
Smart Segment	FastSAM-s	~23MB	auto-downloaded by ultralytics

Total first-run download: ~1.7GB if every tool and both face-enhance models are used. All subsequent runs use cached models.
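
To estimate the download for a particular setup, the approximate sizes in the table can simply be added up. The figures below are copied from the table; the helper name is illustrative.

```python
from typing import Iterable

# Approximate model sizes in MB, as listed in the table above.
MODEL_SIZES_MB = {
    "RealESRGAN_x4plus": 64,
    "GFPGANv1.4": 348,
    "CodeFormer": 376,
    "u2net": 176,
    "LaMa": 200,
    "buffalo_l + inswapper": 500,
    "FastSAM-s": 23,
}


def download_total_mb(models: Iterable[str]) -> int:
    """Sum the listed sizes for the models a workflow will actually load."""
    return sum(MODEL_SIZES_MB[name] for name in models)


everything = download_total_mb(MODEL_SIZES_MB)
upscale_and_bg = download_total_mb(["RealESRGAN_x4plus", "u2net"])
print(f"all models: {everything} MB, upscale + background removal: {upscale_and_bg} MB")
```

Because models are fetched on first use, a workflow that only upscales and removes backgrounds downloads a small fraction of the full set.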

Tips

GPU Acceleration: All tools automatically use CUDA/MPS if available, falling back to CPU
Batch Processing: Most tools accept a folder path for batch processing
Memory: Face swap and segmentation may need 4GB+ RAM for large images
First Run: First execution downloads AI models; subsequent runs reuse the cached models and skip the download
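
The CUDA/MPS/CPU fallback mentioned in the tips is commonly written as below. This sketch assumes a PyTorch backend, which this document does not confirm for every tool (some may use other runtimes), and it falls back to CPU when PyTorch is not installed. `pick_device` is an illustrative name.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu", preferring accelerators when present."""
    try:
        import torch  # optional dependency; absent means CPU-only
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # The MPS backend only exists in newer PyTorch builds.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(f"selected device: {pick_device()}")
```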

Workflow Examples

Combine local processing with cloud AI generation:

# 1. Generate a product image with AI
uv run scripts/ai-generate.py image "Minimalist perfume bottle, studio lighting" --model bytedance/seedream-v5.0-lite --size 2048*2048

# 2. Upscale to 4x resolution
uv run scripts/upscale.py ./output/seedream-v5.0-lite_*.png --scale 4

# 3. Remove background for e-commerce
uv run scripts/bg-remove.py ./output/*_x4.png --alpha-matting

# 4. Generate a product video
uv run scripts/ai-generate.py video "A perfume bottle rotating slowly" --model kwaivgi/kling-v3.0-pro/text-to-video --duration 5

# 5. Add watermark to the video
uv run scripts/media-process.py watermark ./output/text-to-video_*.mp4 --image logo.png
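
The wildcard paths in steps 2, 3, and 5 are expanded by the shell. When the same workflow is driven from Python (for example via `subprocess.run` without a shell), the patterns have to be expanded explicitly. The sketch below is illustrative only; whether each script accepts several input files in one call is an assumption carried over from how the shell would expand these globs.

```python
import glob
from typing import List


def expand(pattern: str) -> List[str]:
    """Expand a glob like the shell would; keep the pattern if nothing matches."""
    return sorted(glob.glob(pattern)) or [pattern]


def upscale_command(pattern: str, scale: int = 4) -> List[str]:
    """Build the argv for step 2 with the wildcard already expanded."""
    return ["uv", "run", "scripts/upscale.py", *expand(pattern), "--scale", str(scale)]


def bg_remove_command(pattern: str) -> List[str]:
    """Build the argv for step 3 with the wildcard already expanded."""
    return ["uv", "run", "scripts/bg-remove.py", *expand(pattern), "--alpha-matting"]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` so that a failing step stops the pipeline.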

Usage Guidance

This skill appears to do what it claims, but consider the following before installing:

- The cloud-generation feature will send prompts and (optionally) images to Atlas Cloud; do not use it with sensitive images or proprietary prompts unless you trust that service and key.
- Running any script will cause `uv` to auto-install Python packages and the packages/models may download large pretrained weights (Hugging Face, rembg, etc.) into your home or project directories; expect significant disk and network activity.
- The face-swap/face-enhance tools enable realistic edits (deepfakes). Be mindful of legal and ethical implications before using them on others' images.
- The scripts download model files (e.g., inswapper from Hugging Face) at runtime; if you need an air-gapped or fully-offline setup, inspect and pre-download/verify model artifacts before running.
- Only provide ATLAS_CLOUD_API_KEY if you intend to use cloud generation; treat the key like any API secret and avoid storing it in shared repos or exposing it to untrusted environments.

If you want a higher-assurance review, ask for a line-by-line audit of any specific script (for example scripts/ai-generate.py and scripts/face-swap.py) or vendor verification of the Atlas Cloud endpoints.

Capability Analysis

Type: OpenClaw Skill
Name: free-image-and-video-generation
Version: 1.0.3

The skill bundle is a comprehensive toolkit for AI image and video processing, providing local utilities (upscaling, face enhancement, background removal) and cloud-based generation via the Atlas Cloud API. The scripts (e.g., ai-generate.py, media-process.py) follow standard implementation patterns, using safe subprocess calls for FFmpeg and legitimate model downloads from Hugging Face. While the bundle includes promotional content for the Atlas Cloud service, there is no evidence of malicious intent, data exfiltration, or harmful prompt injection.

Capability Assessment

✓ Purpose & Capability

Name/description match the included scripts and requirements. uv is required because scripts are executed via `uv run`; ffmpeg is required for media processing; ATLAS_CLOUD_API_KEY is required only for the cloud generation script. The declared primary credential and binaries align with the described functionality.

ℹ Instruction Scope

SKILL.md and the scripts stay within the image/video processing/generation scope. However, several local scripts will download model weights or call external endpoints at runtime (e.g., huggingface URL in face-swap, rembg/new_session, and ai-generate contacting api.atlascloud.ai). SKILL.md's claim that "Local tools run 100% on your machine" is true regarding API usage but local tools still perform network downloads of pretrained model files and install dependencies when first run.

ℹ Install Mechanism

No explicit install spec is provided (instruction-only install), which is low risk, but `uv run` will auto-install Python dependencies and those packages may themselves fetch model weights and execute arbitrary package code. The code downloads model artifacts from known hosts (Hugging Face and Atlas Cloud); no obscure or shortened URLs were found.

✓ Credentials

Only ATLAS_CLOUD_API_KEY (and optional ATLASCLOUD_API_KEY fallback) is requested and it's appropriate for the Atlas Cloud generation feature. Scripts look for a local .env fallback but do not request unrelated credentials or system secrets.

✓ Persistence & Privilege

The skill does not request always:true or other elevated persistence. It writes model files to expected locations (e.g., ~/.insightface/models) and output folders but does not attempt to modify other skills or global agent configuration. Included .claude/settings.local.json only grants WebSearch permission for convenience.

Version History

v1.0.3

- Added `.claude/settings.local.json` for local configuration settings.
- Updated SKILL.md metadata to include version, `openclaw` metadata, and required environment variables/binaries.
- No changes to scripts or user-facing behavior.

v1.0.2

- No changes detected in files or documentation.
- This release contains no updates to functionality or usage.

v1.0.1

- Added explicit environment variable, credentials, and external services sections for Atlas Cloud API integration in the skill manifest.
- Now requires users to provide an Atlas Cloud API key (ATLAS_CLOUD_API_KEY) for cloud-based AI image and video generation.
- Atlas Cloud AI generation now formally warns users that image/video data is sent to atlascloud.ai for cloud processing.
- Updated uv installation instructions for better clarity.
- Removed unused local settings file from the repository.

v1.0.0

Initial release of the free-image-and-video-generation toolkit:

- Provides 7 free local AI tools for upscaling, face enhancement, background removal, object erasure, face swapping, segmentation, and general media processing.
- Adds support for AI image and video generation via Atlas Cloud API, giving access to 300+ cutting-edge models.
- All tools run locally with zero setup using Python, FFmpeg, and uv, with no API keys required for local features.
- Includes detailed usage examples for each tool, supporting both single files and batch processing.
- Cloud generation tools require Atlas Cloud MCP to discover and select the correct model parameters for robust AI-powered generation.

Metadata

Slug free-image-and-video-generation

Version 1.0.3

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 4

Frequently Asked Questions

What is AI Image & Video Toolkit — Free Upscale, Face Enhance, BG Remove & Generation?

Free local AI image and video processing toolkit with cloud AI generation. Local tools: upscale (Real-ESRGAN), face enhance (GFPGAN/CodeFormer), background r... It is an AI Agent Skill for Claude Code / OpenClaw, with 306 downloads so far.

How do I install AI Image & Video Toolkit — Free Upscale, Face Enhance, BG Remove & Generation?

Run "/install free-image-and-video-generation" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is AI Image & Video Toolkit — Free Upscale, Face Enhance, BG Remove & Generation free?

Yes, AI Image & Video Toolkit — Free Upscale, Face Enhance, BG Remove & Generation is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does AI Image & Video Toolkit — Free Upscale, Face Enhance, BG Remove & Generation support?

AI Image & Video Toolkit — Free Upscale, Face Enhance, BG Remove & Generation is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created AI Image & Video Toolkit — Free Upscale, Face Enhance, BG Remove & Generation?

It is built and maintained by MikeWang (@xixihhhh); the current version is v1.0.3.
