Chapter 8

AI Image Generation

Ch08 AI Image Generation: Midjourney/Flux in Practice

The visual backbone of every AI short drama is its images: every storyboard frame, cover thumbnail, and character portrait determines whether audiences stay immersed. This chapter compares the four leading AI image tools, provides a battle-tested prompt structure for drama production, and shows how to achieve 9:16 vertical composition and style-consistent batch generation at scale.

Four-Tool Comparison

The AI image generation landscape has stabilized around four main tools for short drama production. The core selection logic is simple: which tool delivers the most photorealistic human characters at the lowest cost per image?

Tool                | Realism | Asian Faces | Price                         | Speed         | Best For
Midjourney v6       | ★★★★★   | ★★★☆☆       | From $10/mo                   | 30-60s        | High-quality covers, promotional art
Flux.1 Dev/Pro      | ★★★★★   | ★★★★☆       | ~$0.05/image                  | 15-30s        | Batch generation, API integration
Stable Diffusion XL | ★★★★☆   | ★★★★★       | Free locally / low cloud cost | 5-20s (local) | Bulk generation, LoRA fine-tuning
Tongyi Wanxiang     | ★★★★☆   | ★★★★★       | Free quota + pay-per-use      | 10-20s        | China-compliant, clear commercial license

[NOTE] Practical recommendation: For romance/CEO drama tracks, use a Flux + SDXL combination: Flux handles high-quality close-ups of lead characters, and SDXL handles bulk scene images. Midjourney is best for covers and marketing materials. Tongyi Wanxiang is the safest choice for China-market commercial use.

Character Prompt Structure

Prompt structure determines your output ceiling. A professional drama production team builds fixed prompt templates to ensure every generated image falls within the same style range. The standard structure has seven layers, ordered by priority:

Layer 1: Quality & Style Base

[Prompt - Base Layer]

masterpiece, best quality, ultra detailed, photorealistic, 8k uhd, sharp focus, professional photography

Layer 2: Character Appearance

[Prompt - Character Layer]

1 woman, 25 years old, East Asian, beautiful face, high nose bridge, double eyelid,
long straight black hair, fair porcelain skin, elegant temperament

Layer 3: Clothing & Makeup

[Prompt - Wardrobe Layer]

wearing white custom-tailored suit, silk tie, luxury watch,
subtle makeup, red lips, natural blush

Layer 4: Scene & Environment

[Prompt - Scene Layer]

in a luxury penthouse office, floor-to-ceiling windows, city skyline background,
golden sunset light, bokeh background, shallow depth of field

Layer 5: Lighting

[Prompt - Lighting Layer]

soft natural light          // sweet romance track
dramatic rim light          // suspense/thriller track
golden hour sunlight        // warm emotional scenes
studio lighting, key light  // commercial photography feel
moody blue ambient light    // CEO/power drama track

Layer 6: Aspect Ratio & Composition

[Prompt - Ratio Layer]

-- Midjourney --
--ar 9:16 --v 6

-- Flux API --
"width": 768, "height": 1344

-- SDXL recommended --
832x1472 or 768x1344
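
The sizes above are not exactly 9:16 (768x1344 is closer to 4:7). A plausible reading is that they are the nearest sizes whose sides are multiples of 64, a constraint many diffusion pipelines prefer. The sketch below reproduces both recommended sizes under that assumption; the helper name `vertical_size` and the multiple-of-64 rule are illustrative assumptions, not something the tools above document.

```python
def vertical_size(width: int, ratio_w: int = 9, ratio_h: int = 16,
                  multiple: int = 64) -> tuple[int, int]:
    """Return (width, height) with the height snapped to the nearest `multiple`.

    Assumption: both sides should be divisible by `multiple` (64 here).
    """
    if width % multiple != 0:
        raise ValueError(f"width must be a multiple of {multiple}")
    exact_height = width * ratio_h / ratio_w          # true 9:16 height
    height = round(exact_height / multiple) * multiple  # snap to the grid
    return width, height


for w in (768, 832):
    width, height = vertical_size(w)
    print(f"{width}x{height}  ratio={width / height:.4f}  (9:16 = {9 / 16:.4f})")
```

Both results land within about half a percent of a true 9:16 ratio, which is close enough that no visible cropping is needed for vertical video.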

Layer 7: Negative Prompt

[Negative Prompt]

worst quality, low quality, normal quality, jpeg artifacts, blurry, watermark,
extra fingers, mutated hands, poorly drawn hands, deformed, ugly, bad anatomy,
bad proportions, long neck, missing limbs, extra limbs, cloned face,
text, logo, signature, username
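
The layers above can be kept as separate strings and joined in priority order, so a template change touches one layer only. This is a minimal sketch, not a required tool: the names `LAYERS`, `LAYER_ORDER`, and `build_positive_prompt` are hypothetical, and the lighting layer holds one option picked from the Layer 5 list. The negative prompt and the size settings are passed to the tool separately; only Midjourney takes its ratio flags inside the prompt text.

```python
# Layer text copied from the examples above; one lighting option is selected.
LAYERS = {
    "quality": "masterpiece, best quality, ultra detailed, photorealistic, "
               "8k uhd, sharp focus, professional photography",
    "character": "1 woman, 25 years old, East Asian, beautiful face, "
                 "high nose bridge, double eyelid, long straight black hair, "
                 "fair porcelain skin, elegant temperament",
    "wardrobe": "wearing white custom-tailored suit, silk tie, luxury watch, "
                "subtle makeup, red lips, natural blush",
    "scene": "in a luxury penthouse office, floor-to-ceiling windows, "
             "city skyline background, golden sunset light, bokeh background, "
             "shallow depth of field",
    "lighting": "soft natural light",
}
LAYER_ORDER = ["quality", "character", "wardrobe", "scene", "lighting"]


def build_positive_prompt(layers: dict, midjourney_flags: str = "") -> str:
    """Join layers in priority order, dropping empty and duplicate tags."""
    tags = []
    for name in LAYER_ORDER:
        for tag in layers[name].split(","):
            tag = tag.strip()
            if tag and tag not in tags:
                tags.append(tag)
    prompt = ", ".join(tags)
    # Midjourney reads ratio/version flags from the end of the prompt text.
    return f"{prompt} {midjourney_flags}".strip()


print(build_positive_prompt(LAYERS))
print(build_positive_prompt(LAYERS, midjourney_flags="--ar 9:16 --v 6"))
```

Swapping the lighting value for another entry from Layer 5 changes the track's mood without touching the character or wardrobe text.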

Vertical 9:16 Composition Techniques

Vertical composition is fundamentally different from landscape photography. The core principle for short drama images: the character should occupy at least 60% of the frame height, with their face in the upper third of the image.

[TIP] The golden rule for vertical portraits: Position the character's eyes at 30-35% from the top of the frame. This matches natural mobile viewing behavior. Add "portrait composition, face at upper third" to your prompt to guide the AI toward this framing.
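
The two framing rules can be checked numerically when reviewing a batch. The sketch below assumes you already have pixel coordinates for the character and the eye line (measured by hand or taken from any detector of your choice); the function name and the sample coordinates are hypothetical.

```python
def check_vertical_framing(frame_height: int, char_top: int,
                           char_bottom: int, eye_y: int) -> dict:
    """Check the 60% character-height rule and the 30-35% eye-line rule.

    All coordinates are in pixels, measured from the top of the frame.
    """
    char_fraction = (char_bottom - char_top) / frame_height
    eye_fraction = eye_y / frame_height
    return {
        "char_fraction": round(char_fraction, 3),
        "eye_fraction": round(eye_fraction, 3),
        "char_ok": char_fraction >= 0.60,         # character fills >= 60% of height
        "eyes_ok": 0.30 <= eye_fraction <= 0.35,  # eyes sit 30-35% from the top
    }


# Hypothetical measurements on a 768x1344 frame.
print(check_vertical_framing(frame_height=1344, char_top=200,
                             char_bottom=1344, eye_y=430))
```

Images that fail either check are candidates for regeneration or for a crop that moves the eye line up.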

Batch Generation Workflow: Using Seed for Style Consistency

A single great image is easy. Generating 50-100 images with consistent visual style for a full episode is the real challenge. Without a fixed seed, batch outputs will have scattered styles that break audience immersion.

Flux API Batch Generation

[Python / Flux API]

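import replicate  # pip install replicate; reads REPLICATE_API_TOKEN from the environment

BASE_PROMPT = (
    "masterpiece, best quality, photorealistic, 1 woman, 25 years old, "
    "East Asian, long black hair, wearing white dress, luxury apartment interior, "
    "soft natural light, 9:16 portrait, face at upper third"
)

# One seed for the whole batch, so only the expression text varies between calls.
FIXED_SEED = 42857301

expressions = [
    "neutral expression, calm",
    "surprised expression, eyes wide",
    "smiling warmly, gentle",
    "angry expression, intense gaze",
    "crying, tears on cheeks",
]

for i, expr in enumerate(expressions):
    output = replicate.run(
        "black-forest-labs/flux-dev",
        input={
            "prompt": f"{BASE_PROMPT}, {expr}",
            "seed": FIXED_SEED,
            # Replicate's flux-dev schema is understood to take aspect_ratio and
            # guidance rather than width/height and guidance_scale; 9:16 is
            # expected to come out near 768x1344. Confirm the parameter names
            # on the model's API page before running a large batch.
            "aspect_ratio": "9:16",
            "num_outputs": 1,
            "guidance": 3.5,
            "num_inference_steps": 28,
        },
    )
    # output holds one entry per requested image; log it so it can be archived.
    print(f"{i:02d} | {expr} | {[str(item) for item in output]}")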

[WARNING] Seed is not magic: A fixed seed only keeps outputs similar when the model and all other parameters stay the same. The more the prompt changes, the more the character's appearance drifts. Once roughly 30% of the prompt has changed, retest your seed or switch to LoRA-based consistency (see Ch09).
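
The ~30% threshold is easier to apply with a number attached. The sketch below uses one possible interpretation: split both prompts into comma-separated tags and measure the share of tags that were added or removed. The helper names and the tag-based measure are assumptions for illustration; the chapter does not define how the percentage should be computed.

```python
def split_tags(prompt: str) -> list[str]:
    """Split a comma-separated prompt into normalized tags."""
    return [t.strip().lower() for t in prompt.replace("\n", " ").split(",") if t.strip()]


def prompt_change_ratio(base: str, variant: str) -> float:
    """Share of tags that differ between two prompts (0.0 = same, 1.0 = disjoint)."""
    base_tags, variant_tags = set(split_tags(base)), set(split_tags(variant))
    union = base_tags | variant_tags
    if not union:
        return 0.0
    changed = base_tags ^ variant_tags  # tags present in only one prompt
    return len(changed) / len(union)


base = "1 woman, long black hair, white dress, luxury apartment interior, soft natural light"
small_change = base + ", smiling warmly"
large_change = "1 woman, long black hair, red evening gown, rooftop at night, dramatic rim light"

for name, variant in [("small", small_change), ("large", large_change)]:
    ratio = prompt_change_ratio(base, variant)
    flag = "retest seed" if ratio > 0.30 else "ok"
    print(f"{name}: {ratio:.2f} -> {flag}")
```

Adding a single expression tag stays well under the threshold, while replacing wardrobe, scene, and lighting together goes far past it.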

Full Prompt Examples

[Full Example - CEO Male Lead]

-- Positive --
masterpiece, best quality, ultra detailed, photorealistic, 8k uhd, sharp focus,
cinematic lighting, 1 man, 32 years old, East Asian, handsome face, strong jawline,
sharp eyes, black short hair styled back, wearing black custom-tailored suit,
white shirt, open collar, luxury Rolex watch, standing near floor-to-ceiling window,
city skyline at night, dramatic rim light, bokeh, upper body shot, 9:16 vertical

-- Negative --
worst quality, blurry, watermark, extra fingers, deformed face, feminine features

-- Params --
Seed: 7293847  |  Steps: 30  |  CFG: 7  |  Size: 768x1344
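
To reuse a recorded parameter line, it helps to parse it into keyword arguments. The sketch below does that and shows one way the result could feed a local SDXL run through the Hugging Face diffusers library. The example above does not say which tool it targets; SDXL is an assumption based on the CFG and negative-prompt fields, and the function names, checkpoint choice, and output path are illustrative.

```python
def parse_params(line: str) -> dict:
    """Turn 'Seed: 1 | Steps: 30 | CFG: 7 | Size: 768x1344' into a dict."""
    fields = {}
    for chunk in line.split("|"):
        key, _, value = chunk.partition(":")
        fields[key.strip().lower()] = value.strip()
    width, height = (int(n) for n in fields["size"].lower().split("x"))
    return {
        "seed": int(fields["seed"]),
        "num_inference_steps": int(fields["steps"]),
        "guidance_scale": float(fields["cfg"]),
        "width": width,
        "height": height,
    }


def generate_sdxl(positive: str, negative: str, params: dict,
                  out_path: str = "ceo_male_lead.png") -> str:
    """Render one image with SDXL; needs a CUDA GPU, torch, and diffusers."""
    # Heavy imports stay local so parse_params works without the GPU stack.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")
    generator = torch.Generator(device="cuda").manual_seed(params["seed"])
    image = pipe(
        prompt=positive,
        negative_prompt=negative,
        width=params["width"],
        height=params["height"],
        num_inference_steps=params["num_inference_steps"],
        guidance_scale=params["guidance_scale"],
        generator=generator,
    ).images[0]
    image.save(out_path)
    return out_path


params = parse_params("Seed: 7293847  |  Steps: 30  |  CFG: 7  |  Size: 768x1344")
print(params)
```

Calling `generate_sdxl(positive, negative, params)` with the Positive and Negative text above would then render the frame locally.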

[TIP] Chapter Action Checklist:

  1. Register on Replicate.com or Liblib.ai and generate 10 test images in your target drama style;
  2. Record the Seed of your best result and build your first prompt template;
  3. Use that fixed Seed to batch-generate 5 different expressions of the same character;
  4. Set up the prompts/ folder structure and archive your results.
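
Steps 2 and 4 can be combined by saving each winning prompt, seed, and parameter set as a small JSON file. The layout below (one subfolder per character under prompts/, one file per seed) is a hypothetical structure, not one the checklist prescribes; adjust the names to your own project.

```python
import json
from pathlib import Path


def archive_prompt(root: str, character: str, record: dict) -> Path:
    """Write one prompt record to <root>/<character>/seed_<seed>.json."""
    folder = Path(root) / character
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"seed_{record['seed']}.json"
    path.write_text(json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8")
    return path


record = {
    "seed": 7293847,
    "steps": 30,
    "cfg": 7,
    "size": "768x1344",
    "positive": "masterpiece, best quality, ultra detailed, photorealistic, 1 man, 32 years old",
    "negative": "worst quality, blurry, watermark, extra fingers, deformed face",
}
saved = archive_prompt("prompts", "ceo_male_lead", record)
print(saved)
```

Keeping the seed in the filename makes it easy to find the template that produced a given image later.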
