sdk-team

Alibabacloud Avatar Video

Author: alibabacloud-skills-team · GitHub · v0.0.1 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 72 · Favorites: 0 · Current installs: 0 · Versions: 1
Install in OpenClaw
/install alibabacloud-avatar-video
Description
Use Alibaba Cloud DashScope API and LingMou to generate AI video and speech. Seven capabilities — (1) LivePortrait talking-head (image + audio → video, two-s...
Usage (SKILL.md)

Human Avatar — Alibaba Cloud AI Video & Speech

Capabilities overview

| Capability | Script | Model / API | Region | Summary |
| --- | --- | --- | --- | --- |
| LivePortrait | live_portrait.py | liveportrait | cn-beijing | Portrait + audio/video → talking video, two steps |
| EMO | portrait_animate.py | emo-v1 | cn-beijing | Portrait + audio → talking head, detect + generate |
| AA (AnimateAnyone) | animate_anyone.py | animate-anyone-gen2 | cn-beijing | Full-body animation: detect → motion template → video |
| T2I | text_to_image.py | wan2.x-t2i | Multi-region | Text → image, default wan2.2-t2i-flash |
| I2V | image_to_video.py | wan2.x-i2v | Multi-region | Image → video; T2I→I2V pipeline supported; default wan2.7-i2v-flash |
| Qwen TTS | qwen_tts.py | qwen3-tts-* | cn-beijing / Singapore | Text → speech; auto model/voice by scene |
| LingMou | avatar_video.py | LingMou SDK | cn-beijing | Template-based digital-human broadcast video |

Quick selection guide

Talking head (have audio/video already)     → LivePortrait
Talking head (no audio; synthesize first)   → Qwen TTS → LivePortrait
Full-body dance / motion                    → AA (AnimateAnyone)
Text → image                                → T2I (text_to_image)
Image → video                               → I2V (image_to_video)
Text → video end-to-end                     → T2I → I2V (image_to_video --t2i-prompt)
Enterprise digital human / template news    → LingMou (avatar_video)

Environment setup

pip install requests==2.33.1 dashscope==1.25.15 oss2==2.19.1 numpy==1.26.4
# LingMou additionally:
pip install alibabacloud-lingmou20250527==1.7.0 alibabacloud-tea-openapi==0.4.4
export DASHSCOPE_API_KEY=sk-xxxx               # Beijing-region API key
export ALIBABA_CLOUD_ACCESS_KEY_ID=xxx         # OSS upload
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=xxx
export OSS_BUCKET=your-bucket
export OSS_ENDPOINT=oss-cn-beijing.aliyuncs.com

⚠️ API keys for cn-beijing and Singapore are not interchangeable; use the key for the correct region.
OSS_ENDPOINT may include or omit the https:// prefix; scripts normalize it.
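
Before running any script, the setup above can be sanity-checked. The following is a minimal sketch, not the skill's own code: `missing_env` and `normalize_oss_endpoint` are hypothetical helpers, and the normalization rule (prepend https:// when no scheme is present) is an assumption about what the scripts do.

```python
import os

# Variable names taken from the setup commands above.
REQUIRED_VARS = (
    "DASHSCOPE_API_KEY",
    "ALIBABA_CLOUD_ACCESS_KEY_ID",
    "ALIBABA_CLOUD_ACCESS_KEY_SECRET",
    "OSS_BUCKET",
    "OSS_ENDPOINT",
)


def missing_env(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def normalize_oss_endpoint(endpoint):
    """Assumed rule: keep an existing scheme, otherwise prepend https://."""
    endpoint = endpoint.strip().rstrip("/")
    if endpoint.startswith(("http://", "https://")):
        return endpoint
    return "https://" + endpoint
```

Calling `missing_env()` with no argument inspects the current process environment; an empty list means every variable from the setup block is present.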


1. LivePortrait — talking-head video

When to use: You have a portrait photo + speech and want a talking-head video quickly.

Flow:

Step 1: liveportrait-detect (sync)  → pass=true
  ↓
Step 2: liveportrait        (async)  → video_url
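
The asynchronous second step means the caller has to wait for the task to finish. One way to picture that wait is a generic polling loop; this is a sketch under stated assumptions, not the code in live_portrait.py. `fetch_status` is a hypothetical callable that wraps whatever status request the script makes, and the status strings are placeholders to adapt to the real response.

```python
import time


def wait_for_task(fetch_status, interval_s=5.0, timeout_s=600.0, sleep=time.sleep):
    """Poll fetch_status() until the task succeeds, fails, or times out.

    fetch_status is a hypothetical callable returning a dict such as
    {"status": "RUNNING"} or {"status": "SUCCEEDED", "video_url": "..."}.
    The status names are assumptions, not confirmed API values.
    """
    waited = 0.0
    while True:
        result = fetch_status()
        status = result.get("status")
        if status == "SUCCEEDED":
            return result
        if status in ("FAILED", "CANCELED"):
            raise RuntimeError(f"task ended with status {status}")
        if waited >= timeout_s:
            raise TimeoutError(f"task still {status} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Passing `sleep` as a parameter keeps the loop testable without real delays.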

Image: Single person, front-facing portrait, clear face, no occlusion
Audio: wav/mp3, < 15MB, 1s–3min
Video input: Audio extracted automatically (ffmpeg)
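
The audio limits above can be checked locally before uploading anything. This is an illustrative sketch: `audio_problems` is a hypothetical helper, the caller supplies the file size and duration (for example from `os.path.getsize` and a media probe), and treating 15MB as 15 × 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path

MAX_BYTES = 15 * 1024 * 1024  # "< 15MB"; binary megabytes are an assumption
MIN_SECONDS, MAX_SECONDS = 1.0, 180.0  # 1s to 3min
ALLOWED_SUFFIXES = {".wav", ".mp3"}


def audio_problems(filename, size_bytes, duration_s):
    """Return a list of violations; an empty list means the audio qualifies."""
    problems = []
    if Path(filename).suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append("format must be wav or mp3")
    if size_bytes >= MAX_BYTES:
        problems.append("file must be smaller than 15MB")
    if not MIN_SECONDS <= duration_s <= MAX_SECONDS:
        problems.append("duration must be between 1s and 3min")
    return problems
```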

# Image + audio file
python scripts/live_portrait.py \
  --image ./portrait.jpg \
  --audio ./speech.mp3 \
  --template normal --download

# Image + video (extract audio)
python scripts/live_portrait.py \
  --image ./portrait.jpg \
  --video ./speech_video.mp4 \
  --template active --download

# Public URLs
python scripts/live_portrait.py \
  --image-url "https://..." \
  --audio-url "https://..." \
  --mouth-strength 1.2 --download

Motion templates:

  • normal (default, moderate motion)
  • calm (calm; news / storytelling)
  • active (lively; singing / hosting)

2. Qwen TTS — text to speech

When to use: Generate speech files from text (for LivePortrait, EMO, etc.).

Default model: qwen3-tts-vd-realtime-2026-01-15

Auto model selection by scene

| Scene (--scene) | Suggested model | Suggested voice |
| --- | --- | --- |
| default / brand | qwen3-tts-vd-realtime-2026-01-15 | Cherry |
| news / documentary / advertising | qwen3-tts-instruct-flash-realtime | Serena / Ethan |
| audiobook / drama | qwen3-tts-instruct-flash-realtime | Cherry / Dylan |
| customer_service / chatbot / education | qwen3-tts-flash-realtime | Anna / Ethan |
| ecommerce / short_video | qwen3-tts-flash-realtime | Cherry / Chelsie |
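
The scene table amounts to a lookup from scene name to model and voice. A minimal sketch of that lookup follows; it is not the logic in qwen_tts.py. `SCENE_PRESETS` and `pick_preset` are hypothetical names, and two behaviors are assumptions: taking the first listed voice, and falling back to the default row for unknown scenes.

```python
# Scene -> (suggested model, suggested voices), transcribed from the table above.
_VD = "qwen3-tts-vd-realtime-2026-01-15"
_INSTRUCT = "qwen3-tts-instruct-flash-realtime"
_FLASH = "qwen3-tts-flash-realtime"

SCENE_PRESETS = {
    "default": (_VD, ("Cherry",)),
    "brand": (_VD, ("Cherry",)),
    "news": (_INSTRUCT, ("Serena", "Ethan")),
    "documentary": (_INSTRUCT, ("Serena", "Ethan")),
    "advertising": (_INSTRUCT, ("Serena", "Ethan")),
    "audiobook": (_INSTRUCT, ("Cherry", "Dylan")),
    "drama": (_INSTRUCT, ("Cherry", "Dylan")),
    "customer_service": (_FLASH, ("Anna", "Ethan")),
    "chatbot": (_FLASH, ("Anna", "Ethan")),
    "education": (_FLASH, ("Anna", "Ethan")),
    "ecommerce": (_FLASH, ("Cherry", "Chelsie")),
    "short_video": (_FLASH, ("Cherry", "Chelsie")),
}


def pick_preset(scene):
    """Return (model, voice); unknown scenes use the default row (assumption)."""
    model, voices = SCENE_PRESETS.get(scene, SCENE_PRESETS["default"])
    return model, voices[0]  # first listed voice is an arbitrary choice
```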

Available voices

| Voice | Character |
| --- | --- |
| Cherry | Bright, sweet female; ads / audiobooks / dubbing |
| Serena | Mature, intellectual female; news / explainers / corporate |
| Ethan | Steady, warm male; education / documentary / training |
| Dylan | Expressive male; radio drama / game VO |
| Anna | Gentle, friendly female; support / assistant / daily |
| Chelsie | Young, fresh female; short video / e-commerce |
| Thomas | Deep, magnetic male; brand / ads |
| Luna | Warm, soft female; meditation / storytelling |

# Default (qwen3-tts-vd-realtime + Cherry)
python scripts/qwen_tts.py --text "Hello, welcome to Qwen TTS." --download

# Match by scene
python scripts/qwen_tts.py --text "Today's market..." --scene news --download
python scripts/qwen_tts.py --text "Once upon a time..." --scene audiobook --download

# Style via instructions
python scripts/qwen_tts.py \
  --text "Dear students..." \
  --model qwen3-tts-instruct-flash-realtime \
  --instructions "Warm tone, steady pace, suitable for teaching" \
  --download

# List options
python scripts/qwen_tts.py --list-voices
python scripts/qwen_tts.py --list-models

3. T2I — Wan 2.x text-to-image

When to use: Generate images from text (optionally feed into I2V).

# Default model (wan2.2-t2i-flash, fast)
python scripts/text_to_image.py \
  --prompt "A woman in Hanfu in a peach blossom forest, cinematic, 4K, soft light" \
  --size 960*1696 --download

# Higher quality
python scripts/text_to_image.py \
  --prompt "..." --model wan2.2-t2i-plus --size 1280*1280 --download

# Latest (Wan 2.6)
python scripts/text_to_image.py \
  --prompt "..." --model wan2.6-t2i --size 1280*1280 --n 1 --download

Models:

  • wan2.2-t2i-flash (default, fast, good for tests)
  • wan2.2-t2i-plus (higher quality)
  • wan2.6-t2i (latest; more aspect ratios; sync call)

Common sizes: 1280*1280 (1:1) / 960*1696 (9:16) / 1696*960 (16:9)


4. I2V — Wan 2.x image-to-video

When to use: Turn an image into motion video; supports text-to-video via T2I first.

# Local image → video
python scripts/image_to_video.py \
  --image ./portrait.jpg \
  --prompt "She turns slowly and smiles; dress and petals drift gently" \
  --model wan2.7-i2v \
  --resolution 720P --duration 5 --download

# Pipeline: text → image → video
python scripts/image_to_video.py \
  --t2i-prompt "A woman in Hanfu in a peach blossom forest" \
  --prompt "She turns slowly; petals fall; poetic mood" \
  --download --output result.mp4

# With background music
python scripts/image_to_video.py \
  --image ./portrait.jpg \
  --audio-url "https://..." \
  --prompt "..." --download

Models:

  • wan2.7-i2v (default; includes sound; 5s/10s)
  • wan2.5-i2v-preview (high-quality preview)
  • wan2.2-i2v-plus (no built-in audio; faster)

5. AA AnimateAnyone — full-body animation

When to use: Full-body photo + reference motion video → dance / motion video.

Requirements:

  • Image: Single person, full body front, head to toe, aspect ratio 0.5–2.0
  • Video: Full body in frame from first frame; mp4/avi/mov; fps ≥ 24; 2–60s

Three steps:

Step 1: animate-anyone-detect-gen2   (sync)  → check_pass=true
  ↓
Step 2: animate-anyone-template-gen2 (async)  → template_id (~3–5 min)
  ↓
Step 3: animate-anyone-gen2          (async)  → video_url (~3–5 min)

# Local files (auto convert + OSS upload)
python scripts/animate_anyone.py \
  --image ./portrait_fullbody.jpg \
  --video ./dance.mp4 \
  --download --output result.mp4

# Use image as background
python scripts/animate_anyone.py \
  --image ./portrait.jpg --video ./dance.mp4 \
  --use-ref-img-bg --video-ratio 9:16 --download

# Skip Step 2 (existing template_id)
python scripts/animate_anyone.py \
  --image ./portrait.jpg \
  --template-id "AACT.xxx.xxx" --download

Auto conversion: video webm/mkv/flv → mp4; image webp/heic → jpg; if fps is under 24, normalize to 24 fps
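
The auto-conversion rule can be read as a small decision function. The sketch below only plans which conversions would apply; it does not call ffmpeg and is not taken from animate_anyone.py. `plan_conversion` is a hypothetical helper, and the caller is assumed to know the video's frame rate already.

```python
from pathlib import Path

VIDEO_TO_MP4 = {".webm", ".mkv", ".flv"}
IMAGE_TO_JPG = {".webp", ".heic"}
MIN_FPS = 24


def plan_conversion(image_name, video_name, video_fps):
    """Return the conversion steps implied by the auto-conversion rule."""
    steps = []
    if Path(image_name).suffix.lower() in IMAGE_TO_JPG:
        steps.append("image -> jpg")
    if Path(video_name).suffix.lower() in VIDEO_TO_MP4:
        steps.append("video -> mp4")
    if video_fps < MIN_FPS:
        steps.append("fps -> 24")
    return steps
```

An empty result means the inputs can be uploaded as they are.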


6. EMO — talking head (legacy)

Note: Prefer LivePortrait; EMO suits cases that need stricter lip-sync.

python scripts/portrait_animate.py \
  --image ./portrait.jpg \
  --audio ./speech.mp3 \
  --download

7. LingMou — enterprise template video

When to use: Corporate digital-human news, template-based broadcasts, scripted reads with optional character images.

New workflow (prefer no template_id)

  • If the user provides template_id: use that template to generate.
  • If no template_id:
    1. List existing broadcast templates for the account.
    2. If any exist, pick one at random for creation.
    3. If none, fetch public templates and copy up to 3 into the account.
    4. Pick one at random from the copy results and continue.
  • Caveat: After a public template is copied, the copy may not yet be a fully “ready-to-render” template; some copies are still drafts and may lack clips, assets, or variable bindings—complete them in LingMou.
  • If the user only gives an image and “make a talking video” without a script: confirm the spoken copy before generating.
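
The fallback order above can be sketched as a single function. This is an illustration under stated assumptions, not the implementation in avatar_video.py: `list_templates`, `list_public`, and `copy_public` are hypothetical callables standing in for the LingMou calls, and each is assumed to return a list of template IDs.

```python
import random


def choose_template(template_id, list_templates, list_public, copy_public,
                    rng=random):
    """Pick a template ID following the fallback order described above.

    list_templates() and list_public() return lists of template IDs;
    copy_public(ids) copies the given public templates and returns the
    IDs of the copies. All three are hypothetical stand-ins.
    """
    if template_id:
        return template_id  # user-supplied template wins
    existing = list_templates()
    if existing:
        return rng.choice(existing)
    copied = copy_public(list_public()[:3])  # copy at most 3 public templates
    if not copied:
        raise LookupError("no account templates and no public template copied")
    return rng.choice(copied)
```

A copied template may still be a draft, so a caller would want to treat a generation failure right after this path as a prompt to complete the template in LingMou.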

What scripts/avatar_video.py supports

  • --list-templates: list account templates
  • --list-public-templates: list public templates (SDK 1.7.0+)
  • --copy-public-templates: copy up to 3 public templates (SDK 1.7.0+)
  • Omit --template-id: random existing template
  • When local templates are empty: auto try public-template copy as fallback
  • --show-template-detail: template detail and replaceable variables
  • Fills input text into template text variables (prefers text_content / test_text)
  • If generation fails right after copying a public template, surfaces a clear error that the template may still need completion (no silent failure)
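
The preference for text_content / test_text when filling template variables might look like the sketch below. It is illustrative only: `fill_text` is a hypothetical helper, the `{name: value}` mapping is an assumed shape for a template's replaceable text variables, and falling back to the first variable when neither preferred name exists is an assumption.

```python
PREFERRED_TEXT_VARS = ("text_content", "test_text")


def fill_text(variables, text):
    """Return a copy of variables with text placed in the preferred slot."""
    if not variables:
        raise ValueError("template has no replaceable text variables")
    # First preferred name that exists, else the first variable (assumption).
    target = next(
        (name for name in PREFERRED_TEXT_VARS if name in variables),
        next(iter(variables)),
    )
    filled = dict(variables)
    filled[target] = text
    return filled
```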
# List templates
python scripts/avatar_video.py --list-templates

# Public templates (SDK 1.7.0+)
python scripts/avatar_video.py --list-public-templates

# Copy up to 3 public templates (SDK 1.7.0+)
python scripts/avatar_video.py --copy-public-templates

# No template_id — random existing template
python scripts/avatar_video.py \
  --text "Hello, welcome to today's tech news." \
  --download

# Specific template_id
python scripts/avatar_video.py \
  --template-id "BS1b2WNnRMu4ouRzT4clY9Jhg" \
  --text "Hello, welcome to today's tech news." \
  --download

# Detail for randomly chosen template
python scripts/avatar_video.py \
  --show-template-detail \
  --text "This is a test script for broadcast."

Conversational usage

When the user says things like:

  • “Make a talking video from this image”
  • “Digital-human broadcast for me”
  • “Upload image and make a news read”

Do this:

  1. Check whether they already gave copy/script ready to read.
  2. If not, ask: “What is the exact script to read? You can give bullet points and I can turn them into broadcast-ready copy.”
  3. With script in hand, run LingMou: prefer random existing template; if none locally, try public copy.
  4. If they uploaded a portrait but the template API does not use it, explain: this path is template-driven; for image-driven talking head, use LivePortrait or EMO.


Security recommendations
This skill appears coherent with its stated purpose, but it requires sensitive Alibaba Cloud credentials and writes to your cloud account. Before installing or running it:

  • Use least-privilege credentials: create a RAM user or role scoped to only the OSS object prefix and the DashScope/LingMou actions you need; avoid root keys.
  • Scope OSS permissions to a single bucket or prefix (e.g., human-avatar/*) rather than granting global OSSFullAccess.
  • The scripts upload local files to your OSS bucket and generate signed URLs that DashScope uses to fetch media. Anyone who holds a signed URL can read the object until the URL expires, so set a short expiry and add lifecycle rules that delete temporary objects.
  • The LingMou code can copy public templates into your account, which creates resources there. To avoid this, do not run the copy/list-public options, or make sure the account already has at least one template.
  • Review the provided Python scripts locally before running them, and run them in an isolated virtualenv. SKILL.md installs packages with pip; inspect those packages and use pinned versions.
  • Rotate keys after testing and prefer short-lived credentials where possible (RAM role, STS tokens).
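
To make the prefix-scoping advice concrete, here is a sketch that builds a RAM policy document limited to one bucket prefix. `oss_prefix_policy` is a hypothetical helper, and the action list (object upload and read) is an assumption; check which OSS actions the scripts actually call before relying on it.

```python
import json


def oss_prefix_policy(bucket, prefix):
    """Build a RAM policy document limited to one bucket prefix.

    The action list is an assumption (uploads plus reads through signed
    URLs); trim or extend it to match what the scripts really use.
    """
    prefix = prefix.strip("/")
    return {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["oss:PutObject", "oss:GetObject"],
                "Resource": [f"acs:oss:*:*:{bucket}/{prefix}/*"],
            }
        ],
    }


print(json.dumps(oss_prefix_policy("your-bucket", "human-avatar"), indent=2))
```

The printed JSON can be pasted into a custom RAM policy and attached to the user or role whose keys the skill uses.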
Functional analysis
Type: OpenClaw Skill
Name: alibabacloud-avatar-video
Version: 0.0.1

The skill bundle provides a set of tools for generating AI video and speech using Alibaba Cloud's DashScope and LingMou APIs. The scripts (e.g., animate_anyone.py, live_portrait.py, qwen_tts.py) follow security best practices: they read credentials from environment variables, sanitize paths in input_validation.py to prevent directory traversal, and avoid shell=True in subprocess calls. All external communication goes to official Alibaba Cloud endpoints (dashscope.aliyuncs.com and lingmou.cn-beijing.aliyuncs.com), and the instructions in SKILL.md align with the stated functional purposes.
Capability tags
requires-sensitive-credentials
Capability assessment
Purpose & Capability
Name/description (AI avatar/video/speech) match the actual requirements: DashScope API key for model calls, Alibaba AK/SK + OSS info for uploads, and ffmpeg/ffprobe for media conversion. Required binaries and env vars are expected for the listed capabilities.
Instruction Scope
Runtime instructions and scripts stay within the declared purpose (media conversion, OSS upload, DashScope/LingMou API calls). Two items to note: (1) some scripts will automatically upload user media to your OSS bucket and produce signed GET URLs (expected for service), and (2) the LingMou pipeline can auto-copy public templates into your LingMou account as a fallback when no account templates exist — this is a write operation against your LingMou account and may create templates without explicit user approval if the script is run with those flags or the automatic-copy code path is taken.
Install Mechanism
This is an instruction-only skill (no packaged installer). SKILL.md recommends pip installing dashscope, oss2, and alibabacloud-lingmou packages — which is normal but means code from PyPI will be installed into the environment. No arbitrary downloads or extract-from-URL installs are present in the manifest.
Credentials
The skill requests DASHSCOPE_API_KEY and Alibaba Cloud AK/SK + OSS_BUCKET/OSS_ENDPOINT — these are necessary for DashScope calls, OSS uploads, and LingMou operations. The referenced IAM policies in docs recommend broad permissions (e.g., AliyunOSSFullAccess, LingMou full actions); while functionally convenient, these permissions are broader than strictly necessary unless scoped carefully. The scripts also reference optional envs (e.g., LINGMOU_ENDPOINT, LINGMOU_REGION, DASHSCOPE_BASE_URL, LINGMOU_VENV_PYTHON) that are not listed in requires.env but are harmless defaults.
Persistence & Privilege
The skill does not request 'always: true' or system-wide privileges. It will create objects in your OSS bucket and can copy templates into your LingMou account (account-side changes). Those behaviors are consistent with its purpose but are persistent side effects in your cloud account and should be considered when granting credentials.
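How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install alibabacloud-avatar-video
  3. Once installed, invoke the skill by name or trigger it with /alibabacloud-avatar-video
  4. Provide the inputs described in the skill's parameter documentation to get structured output.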
Version history
v0.0.1
Initial release providing Alibaba Cloud AI avatar and video generation features.

  • Supports seven core capabilities: LivePortrait talking-head, EMO talking-head, AnimateAnyone full-body animation, T2I (text-to-image), I2V (image-to-video), Qwen TTS (text-to-speech), and LingMou template-based avatar videos.
  • Details API usage, required environment variables, and setup instructions.
  • Includes a quick selection guide for choosing the right pipeline based on input and desired output.
  • Lists supported models/regions for each workflow and provides command-line usage examples.
  • Outlines available motion templates and voice options for video/speech generation.
Metadata
Slug alibabacloud-avatar-video
Version 0.0.1
License MIT-0
Total installs 0
Current installs 0
Versions 1
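FAQ

What is Alibabacloud Avatar Video?

Use Alibaba Cloud DashScope API and LingMou to generate AI video and speech. Seven capabilities — (1) LivePortrait talking-head (image + audio → video, two-s... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 72 total downloads to date.

How do I install Alibabacloud Avatar Video?

Run the command "/install alibabacloud-avatar-video" in the OpenClaw or Claude Code chat box to install it in one step. Installation itself needs no extra configuration; the scripts still need the environment variables listed under Environment setup.

Is Alibabacloud Avatar Video free?

Yes. Alibabacloud Avatar Video is free and released under the MIT-0 license; it can be downloaded, installed, and used freely.

Which platforms does Alibabacloud Avatar Video support?

Alibabacloud Avatar Video is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.

Who develops Alibabacloud Avatar Video?

It is developed and maintained by alibabacloud-skills-team (@sdk-team); the current version is v0.0.1.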
