dlazyai

Idea2video

Author: dlazy · GitHub · v1.1.0 · MIT-0
cross-platform · ✓ Security check passed
Total downloads: 149 · Favorites: 0 · Current installs: 0 · Versions: 1
Install in OpenClaw
/install idea2video
Description
Turn a user's idea into a detailed video pipeline by generating story, characters, portraits, scenes, shots, keyframes, and concatenated shot videos via a pl...
Usage (SKILL.md)

Authentication

All requests require a dLazy API key. The recommended way to sign in is dlazy login:

dlazy login

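This command uses the device-code flow, so it also works in a remote terminal. After a successful login it writes the API key to the local CLI config automatically; no manual copy and paste is needed.

Alternative: set the API key manually

If you already have an API key, you can save it directly: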

dlazy auth set YOUR_API_KEY

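The CLI stores the key in your user config directory (~/.dlazy/config.json on macOS/Linux, %USERPROFILE%\.dlazy\config.json on Windows), with file permissions restricted to the current operating-system user. You can also pass the key per invocation through the DLAZY_API_KEY environment variable.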
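
As a rough illustration of the two key sources described above, the sketch below reports where a key would come from. It assumes, without confirmation from the source, that the DLAZY_API_KEY environment variable takes precedence over the saved config file. The function name key_source is hypothetical, and the config file is deliberately not parsed because its schema is not documented here.

```python
import os
from pathlib import Path


def key_source(env=None, home=None):
    """Report which documented location would supply the dLazy API key.

    Assumption: the environment variable wins over the saved config file.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    if env.get("DLAZY_API_KEY"):
        return "env:DLAZY_API_KEY"

    # ~/.dlazy/config.json on macOS/Linux. On Windows, Path.home() normally
    # resolves to %USERPROFILE%, which gives the documented Windows path.
    config = home / ".dlazy" / "config.json"
    if config.is_file():
        return f"file:{config}"
    return "none"
```

Calling key_source() with no arguments inspects the real environment and home directory; passing env and home explicitly makes the check easy to try in isolation.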

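Getting an API key manually

  1. Sign in at dlazy.com, or create an account there
  2. Go to dlazy.com/dashboard/organization/api-key
  3. Copy the key shown in the API Key section

Each key belongs to your own dLazy organization and can be rotated or revoked at any time from the same dashboard.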

About and Provenance

If you would rather not keep a global CLI installed on your system, you can run it on demand:

npx @dlazy/[email protected] <command>

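If you choose a global install, the skill's metadata.clawdbot.install field is pinned to npm install -g @dlazy/[email protected]. Reviewing the source code in the GitHub repository before installing is recommended.

How It Works

This skill is a thin wrapper around the hosted dLazy API. When it is invoked:

  • The prompts and parameters you provide are sent to the dLazy API (api.dlazy.com) for inference.
  • Local file paths passed in image / video / audio fields are uploaded by the CLI to dLazy media storage (files.dlazy.com) so that the model can read them, the same flow as any cloud generation API.
  • The generated-result URLs returned by the API are hosted on files.dlazy.com.

This is the standard SaaS call pattern. The skill itself does not reach beyond its remit on the network or file system; every action is performed by the dLazy CLI.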

Idea → Video Generation Plan


Turn a user's idea into the full pipeline: story → characters → 3-view portraits → scenes → shots → keyframes → shot videos → concat. First emit a plan template for the user to confirm, then expand it into canvas shapes and call drawToCanvas.

Workflow Overview (5 states)

Every reply must start with this line:

  • **Current State:** [state] | **Next:** [goal]
| State | Goal | Needs user confirmation |
| --- | --- | --- |
| 1. Requirement gathering | Lock idea / audience / style / scale | |
| 2. Plan generation | Build plan template; show node summary | ✅ (strict gate) |
| 3. Plan adjustment | Patch the template per user feedback | |
| 4. Canvas expansion | Expand template into flat shapes | ❌ (internal) |
| 5. Apply to canvas | Call drawToCanvas to write shapes | |
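
The mandatory first line and the confirmation gates can be sketched as follows. This is a minimal illustration, not the skill's implementation: the function names are hypothetical, the gated states (2 and 3) and the accepted words come from the Output Norms section, and matching the reply by exact word is an assumption made for simplicity.

```python
# States that must wait for explicit confirmation (per Output Norms).
GATED_STATES = {2, 3}
# Accepted confirmation words (per Output Norms); exact matching is an assumption.
CONFIRM_WORDS = {"confirm", "continue", "proceed"}


def status_line(state, goal):
    """Build the line every reply must start with."""
    return f"**Current State:** {state} | **Next:** {goal}"


def may_advance(state_number, user_reply):
    """Return True if the workflow may leave the given state."""
    if state_number not in GATED_STATES:
        return True
    return user_reply.strip().lower() in CONFIRM_WORDS
```

For example, may_advance(2, "looks nice") stays in plan generation, while may_advance(2, "Confirm") moves on.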

State 1: Requirement Gathering

Collect these inputs; ask if any is missing:

  • idea — the core creative seed (one sentence to one paragraph)
  • user_requirement — audience / runtime / max scenes / max shots (optional)
  • style — visual style ("realistic warm", "cyberpunk", "watercolor 2D"...)
  • aspectRatio — defaults to 16:9; alternatives 9:16 / 1:1
  • sceneCount — let the model decide by default, but disclose
  • shotsPerScene — let the model decide by default

Output a bulleted requirement list, ending with:

  • <suggestion>Requirements ready. Confirm to enter plan generation?</suggestion>
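
One way to decide what still needs asking is sketched below. It is only an illustration: treating idea and style as the required inputs is an assumption (the list above marks user_requirement as optional and leaves sceneCount and shotsPerScene to the model), and the function name is hypothetical.

```python
ASPECT_RATIOS = ("16:9", "9:16", "1:1")  # default first, then the listed alternatives
REQUIRED = ("idea", "style")             # assumption: the inputs that must be asked for


def normalize_requirements(raw):
    """Apply the documented default and report which inputs are still missing."""
    # Treat empty strings and None as "not provided".
    req = {k: v for k, v in raw.items() if v not in (None, "")}
    req.setdefault("aspectRatio", "16:9")
    if req["aspectRatio"] not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspectRatio: {req['aspectRatio']!r}")
    missing = [name for name in REQUIRED if name not in req]
    return req, missing
```

A non-empty missing list means the agent should ask a follow-up question rather than emit the requirement summary.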

State 2: Plan Generation

Build a plan template per the Plan Template Schema (see Appendix A).

Construction rules:

  1. Strictly use models registered in config/models/. Recommended for idea2video:

    • qwen3_6-plus — every LLM step (story / characters / script / storyboard / shot decomposition)
    • banana-pro — character 3-view portraits, shot first/last frames
    • veo_3_1-fast — shot videos (i2v)
    • merge — video concatenation
  2. Mirror the canonical 7-segment idea2video structure (Appendix B):

    • develop_story (LLM)
    • extract_characters (LLM, parse=json)
    • portraits (map: front → side/back)
    • write_script (LLM, parse=json)
    • scenes map (with nested shots map)
      • storyboard (LLM, parse=json)
      • shots map: shot_desc → first_frame → last_frame (when) → shot_video
      • scene_concat (merge)
    • final_video (merge)
  3. Reference rules (critical, do not get wrong):

    • Whole-text injection of an upstream → promptRefs: ["$node.X"]; do not inline shape:// inside prompt.
    • Sub-field injection from upstream JSON → keep {{$node.X.json.field}} placeholder inside prompt.
    • Media references (image/video/audio) → put in images / videos / audio arrays; values use $node.X or shape://shape:X.
    • Cross-iteration aggregation inside a map → $node.<mapId>[*].<bodyId> (e.g. $node.portraits[*].front).
    • Inside a map, current item is $item, index is $idx; nested maps access outer index via $ctx.<outerMapId>.idx.
  4. Do not paraphrase tool prompts — keep field names aligned with each model's inputSchema.

  5. when for conditional nodes (e.g. last_frame only when variation_type ∈ {medium, large}):

    "when": { "$in": ["$node.shot_desc.json.variation_type", ["medium", "large"]] }
    

When presenting to the user, summarize in plain language, do not expose raw JSON:

The plan will create X nodes:
  · 1 story node
  · 1 character-extraction node
  · Character 3-views (front + side + back, expanded per character)
  · 1 scenes node
  · Per scene: 1 storyboard node + N shots (each shot = shot description + first frame + [last frame] + video) + 1 concat node
  · 1 final concat node

Models:
  · LLM: qwen3_6-plus
  · Image: banana-pro
  · Video: veo_3_1-fast
  · Concat: merge

End with:

  • <suggestion>Plan ready. Confirm to expand to canvas? Or tell me what to adjust.</suggestion>

State 3: Plan Adjustment

Common requests:

  • Swap a model ("use doubao-seedream-4_5 for image")
  • Change structure ("drop the last-frame branch", "add a narration audio node")
  • Change scale ("limit to 1 character", "fix 3 shots per scene")

Patch the template, re-summarize, wait for explicit confirmation again.

State 4: Canvas Expansion (internal)

Expand the plan template into a flat shape list suitable for drawToCanvas.

Expansion rules

  1. tool node → 1 shape:
    • Shape type is determined by the model's output type:
      • qwen3_6-plus → text
      • banana-pro / doubao-seedream-* → image
      • veo_* / doubao-seedance-* / kling-* → video
      • merge → video (or audio if merging audios)
    • shape.id = shape:<templatePath> or shape:<templatePath>__i<iter> (inside a map)
    • shape.props.model = template model
    • shape.props.input = template input, with all $node.X / $item.X / {{...}} resolved to literals or shape://shape:Y whenever possible
    • shape.props.input.promptRefs is built from template promptRefs: each $node.X → shape://shape:X
    • shape.parentId = enclosing frame shape id (when inside a map)
    • shape.meta.fromTemplateId = the dotted template path (e.g., scenes.shots.first_frame)
  2. map node → 1 frame shape + body subtree per iteration:
    • frame type: "frame", props.name = the map's name
    • frame itself runs no model
  3. Skip nodes whose when is false. If when references an upstream not yet completed (e.g. shot_desc.json.variation_type), expand optimistically: still emit the shape with status: "pending"; the runtime expander will reconcile after upstream completes.
  4. Unresolved {{$node.X.json.field}} placeholders stay in the prompt string (status pending). Do not substitute placeholder text.
  5. Coordinates (x, y, w, h) are not part of the plan — compute at drawToCanvas time:
    • Lay out columns along data flow; 800px column gap.
    • Stack same-column nodes vertically with 100px gap.
    • Frame size = bounding box of children + 100px padding.
    • Map children: horizontal vs. vertical follows direction.
    • Default sizes: text 600×400, image 1600×900 (16:9) or 1024×1024 (1:1), video 1600×900, frame auto.
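
The tool-node part of these rules can be sketched as below. This is a simplified illustration under stated assumptions: the function names are hypothetical, $node.X references are rewritten literally to shape://shape:X without resolving iteration suffixes, and placeholder resolution, when handling, and layout are left out.

```python
def shape_type(model):
    """Map a model id to the shape type listed in expansion rule 1."""
    if model == "qwen3_6-plus":
        return "text"
    if model == "banana-pro" or model.startswith("doubao-seedream-"):
        return "image"
    if model.startswith(("veo_", "doubao-seedance-", "kling-")):
        return "video"
    if model == "merge":
        return "video"  # the rules allow "audio" when merging audio inputs
    raise ValueError(f"no shape type known for model {model!r}")


def shape_id(template_path, iteration=None):
    """shape:<templatePath>, with an __i<iter> suffix inside a map."""
    base = f"shape:{template_path}"
    return base if iteration is None else f"{base}__i{iteration}"


def to_shape_ref(ref):
    """Rewrite $node.X to shape://shape:X; leave anything else unchanged."""
    prefix = "$node."
    return "shape://shape:" + ref[len(prefix):] if ref.startswith(prefix) else ref


def expand_tool_node(node, template_path, iteration=None, parent_id=None):
    """Turn one template tool node into one shape dictionary."""
    inputs = dict(node.get("input", {}))
    if "promptRefs" in inputs:
        inputs["promptRefs"] = [to_shape_ref(r) for r in inputs["promptRefs"]]
    shape = {
        "id": shape_id(template_path, iteration),
        "type": shape_type(node["model"]),
        "props": {"model": node["model"], "input": inputs},
        "meta": {"fromTemplateId": template_path},
    }
    if parent_id is not None:  # only set inside a map's frame
        shape["parentId"] = parent_id
    return shape
```

Raising on an unknown model, rather than guessing a type, keeps the sketch in line with the rule that only registered models may be used.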

State 5: Apply to Canvas

Call drawToCanvas with createShapes = the expanded shape list.

Pre-flight checks before the call:

  • Every shape's props.input validates against the corresponding model's inputSchema (drawToCanvas re-checks; pre-checking saves a round-trip).
  • Every shape://shape:X reference points to an X present in the same createShapes payload.
  • Frames appear before children (parentId exists).
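
The second and third checks are mechanical and can be sketched as below. The function names are hypothetical, and the schema check is omitted because the models' inputSchema definitions are not part of this document.

```python
import re

# Matches shape://shape:<id> and captures the "shape:<id>" part.
SHAPE_REF = re.compile(r"shape://(shape:[\w.\-]+)")


def iter_strings(value):
    """Yield every string nested anywhere inside dicts and lists."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_strings(item)


def preflight(shapes):
    """Return a list of problems found in a createShapes payload."""
    errors = []
    all_ids = {s["id"] for s in shapes}
    seen = set()
    for s in shapes:
        parent = s.get("parentId")
        if parent is not None and parent not in seen:
            errors.append(f"{s['id']}: parent {parent} does not appear earlier")
        for text in iter_strings(s.get("props", {}).get("input", {})):
            for ref in SHAPE_REF.findall(text):
                if ref not in all_ids:
                    errors.append(f"{s['id']}: dangling reference {ref}")
        seen.add(s["id"])
    return errors
```

An empty list means both checks passed; anything else is worth fixing before spending a drawToCanvas round-trip.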

After success, reply:

✅ Plan added to canvas (N nodes, M pending). 
Click "Run Workflow" on the canvas to execute the whole pipeline.

Appendix A: Plan Template Schema (for construction)

Top level:

{
  "version": 1,
  "name": "idea2video",
  "inputs": { "idea": {...}, "user_requirement": {...}, "style": {...} },
  "output": "$node.final_video.url",
  "nodes": [ /* tool or map nodes */ ]
}

Nodes:

// tool node
{
  "id": "<unique>",
  "kind": "tool",
  "model": "<id registered in config/models>",
  "name": "<display name; may use {{$item.X}} / {{$idx}} templates>",
  "parse": "json",                  // optional — url contains JSON
  "when": { "$in": [...] },        // optional — conditional node
  "input": {
    "prompt": "...containing {{$node.X.json.field}} placeholders...",
    "promptRefs": ["$node.upstream"],  // whole-text injection
    "images": ["$node.front"],       // media references
    "imageSize": "1K",
    ...
  }
}

// map node
{
  "id": "<unique>",
  "kind": "map",
  "name": "<frame name>",
  "over": "$node.upstream.json",   // must resolve to an array
  "mode": "parallel" | "sequential",
  "direction": "horizontal" | "vertical",
  "body": [ /* child template nodes */ ]
}
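
To make the when and placeholder fields concrete, here is a minimal sketch of how they could be evaluated against upstream outputs. It is an assumption-laden illustration, not the runtime's logic: the outputs mapping (node id to {"json": ...}), the function names, and support for only the $in operator are all hypothetical simplifications.

```python
import re

# Matches {{$node.<id>.json.<field>}} placeholders inside a prompt.
PLACEHOLDER = re.compile(r"\{\{\s*(\$node\.[\w.]+)\s*\}\}")


def lookup(path, outputs):
    """Resolve '$node.<id>.json.<field>' against outputs; None if unavailable."""
    if not (isinstance(path, str) and path.startswith("$node.")):
        return path  # literals pass through unchanged
    current = outputs
    for part in path[len("$node."):].split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def eval_when(cond, outputs):
    """Return True or False, or None when the upstream value is not ready."""
    if cond is None:
        return True  # nodes without a condition always run
    (op, args), = cond.items()
    if op != "$in":
        raise ValueError(f"unsupported operator {op!r}")
    needle, allowed = args
    value = lookup(needle, outputs)
    return None if value is None else value in allowed


def fill_placeholders(prompt, outputs):
    """Substitute resolvable placeholders; leave unresolved ones untouched."""
    def replace(match):
        value = lookup(match.group(1), outputs)
        return match.group(0) if value is None else str(value)
    return PLACEHOLDER.sub(replace, prompt)
```

A None result from eval_when corresponds to the optimistic case in State 4, where the shape is still emitted with status "pending", and unresolved placeholders stay in the prompt exactly as written.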

Appendix B: Canonical idea2video Structure

Assemble the plan with this fixed shape:

  1. develop_story — qwen3_6-plus; prompt uses {{$input.idea}} and {{$input.user_requirement}}
  2. extract_characters — qwen3_6-plus; parse: "json"; promptRefs: ["$node.develop_story"]
  3. portraits — map(over=$node.extract_characters.json, parallel, horizontal)
    • body: front → side (images: [$node.front]) / back (images: [$node.front])
  4. write_script — qwen3_6-plus; parse: "json"; promptRefs: ["$node.develop_story"]
  5. scenes — map(over=$node.write_script.json, sequential, vertical)
    • body:
      • storyboard — qwen3_6-plus; parse: "json"; promptRefs: ["$node.extract_characters"]
      • shots — map(over=$node.storyboard.json, parallel, vertical)
        • body:
          • shot_desc — qwen3_6-plus; parse: "json"; promptRefs: ["$node.extract_characters"]
          • first_frame — banana-pro; prompt with {{$node.shot_desc.json.ff_desc}}; images: $node.portraits[*].front
          • last_frame — banana-pro; when: variation_type ∈ {medium, large}; images: [$node.first_frame]
          • shot_video — veo_3_1-fast; prompt with {{$node.shot_desc.json.motion_desc}} and {{$node.shot_desc.json.audio_desc}}; images includes first_frame (+ last_frame if present)
      • scene_concat — merge; videos: $node.shots[*].shot_video
  6. final_video — merge; videos: $node.scenes[*].scene_concat
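
For the node summary shown to the user, the number of tool nodes follows directly from this structure. The sketch below is a worked count under one assumption: map frames (portraits, scenes, shots) are not counted as nodes, which the source does not settle. The function name is hypothetical.

```python
def count_tool_nodes(num_characters, shots_per_scene, shots_with_last_frame=0):
    """Count tool nodes in the canonical structure.

    shots_per_scene: one integer per scene, e.g. [3, 2] for two scenes.
    shots_with_last_frame: shots whose `when` condition enables last_frame.
    """
    fixed = 4                       # develop_story, extract_characters, write_script, final_video
    portraits = 3 * num_characters  # front, side, back per character
    # Per scene: storyboard + (shot_desc, first_frame, shot_video) per shot + scene_concat.
    scenes = sum(1 + 3 * shots + 1 for shots in shots_per_scene)
    return fixed + portraits + scenes + shots_with_last_frame
```

With 2 characters, scenes of 3 and 2 shots, and one shot that needs a last frame, this gives 4 + 6 + (11 + 8) + 1 = 30 tool nodes.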

Output Norms

  • Never expose raw plan JSON, shape JSON, or taskId to the user — summarize with bullets.
  • If drawToCanvas fails (usually input not matching the model's schema), explain why and propose a fix.
  • Strict gates: states 2 and 3 must wait for an explicit "confirm / continue / proceed" before advancing.

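🛠️ Execution and Generation Guide (CRITICAL EXECUTION INSTRUCTIONS)

You are an intelligent agent that can execute terminal commands!

[Strictly Prohibited]

  • Prohibited: saving prompts to any file (such as txt or md).
  • Prohibited: asking the user to generate images themselves on a third-party platform (such as Midjourney).
  • Prohibited: generating all images in one batch, or executing multiple commands at once.

[Required Interaction and Execution Flow] You must execute strictly step by step, and stop at every step to wait for the user's reply:

  1. Step 1: Gather requirements proactively. When the user states a need, do no design or generation yet; first ask the user questions (such as product features, target audience, and how many images they want). You must wait for the user's answer.
  2. Step 2: Output a draft and ask for confirmation. Based on the user's answers, draw up the image-set plan and output a draft prompt for the first image. Ask the user: "Do you confirm this prompt, and may I start generating the first image?" You must wait for the user to reply "confirm".
  3. Step 3: Run a single terminal command. After the user confirms, you must run the command in the terminal (for example dlazy seedream-4.5 --prompt "..."), and only one generation command at a time. Important: use a synchronous command; never append & to the end of a command and never use &&, because this runs under Windows PowerShell!
  4. Step 4: Deliver and loop. When the command returns, send the image URL to the user and ask: "Are you happy with this one? Shall we move on to the next?" Continue to the next step only after receiving confirmation.
Safe-use advice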
Before installing, make sure you trust the dLazy CLI and cloud service. Expect to log in with a dLazy API key, store that key locally, and send prompts or selected media files to dLazy for generation. Avoid using confidential content unless you have reviewed dLazy's privacy, retention, and account controls.
Functional analysis
Type: OpenClaw Skill · Name: idea2video · Version: 1.1.0
The idea2video skill bundle is a legitimate integration for the dLazy video generation service. It provides a structured multi-state workflow (requirement gathering, planning, and execution) for an AI agent to interact with the dLazy CLI tool. The skill includes detailed instructions in SKILL.md and SKILL-cn.md to ensure the agent follows a step-by-step process with explicit user confirmations. It utilizes a standard CLI authentication model, storing API keys in a local configuration file (~/.dlazy/config.json), and points to official service endpoints (api.dlazy.com). No indicators of malicious intent, data exfiltration, or unauthorized command execution were found.
Capability tags
requires-sensitive-credentials
Capability assessment
Purpose & Capability
The described capabilities—planning a story-to-video workflow, creating canvas nodes, and using image/video generation models—match the stated Idea2video purpose.
Instruction Scope
The workflow includes explicit confirmation gates before plan generation and before expanding the plan to canvas; the later drawToCanvas action is disclosed and purpose-aligned.
Install Mechanism
Although the registry says there is no install spec, the skill text documents use of a pinned external npm CLI via global install or npx. This is disclosed and version-pinned, but users should trust and review that CLI separately.
Credentials
The skill requires a dLazy API key and sends prompts, parameters, and explicitly supplied media files to dLazy endpoints, which is expected for a hosted video-generation service.
Persistence & Privilege
The dLazy login flow stores an API key in a local user config file. This is disclosed and scoped to dLazy, with no evidence of background persistence or unrelated privilege use.
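How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install idea2video
  3. Once installed, invoke the Skill by name or trigger it with /idea2video
  4. Provide the required inputs per the Skill's parameter description to get structured output
Version history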
v1.1.0
idea2video 1.1.0
  • Introduced a stateful workflow to guide users from idea input through video generation, with explicit user confirmation at each major planning step.
  • Added a canonical 7-segment idea-to-video pipeline: story → characters → 3-view portraits → script → scenes/shots → video generation → concatenation.
  • Enforced strict use of registered models for each workflow stage (qwen3_6-plus, banana-pro, veo_3_1-fast, merge).
  • Included detailed instructions and schema for plan template construction and canvas expansion.
  • Provided comprehensive authentication and usage instructions for both CLI and API key management.
  • Enhanced user guidance and suggestion prompts throughout the workflow.
Metadata
Slug: idea2video
Version: 1.1.0
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 1
FAQ

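What is Idea2video?

Turn a user's idea into a detailed video pipeline by generating story, characters, portraits, scenes, shots, keyframes, and concatenated shot videos via a pl... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 149 total downloads so far.

How do I install Idea2video?

Run the command "/install idea2video" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.

Is Idea2video free?

Yes. Idea2video is completely free and released under the MIT-0 license, so you can download, install, and use it freely.

Which platforms does Idea2video support?

Idea2video is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Idea2video?

It is developed and maintained by dlazy (@dlazyai); the current version is v1.1.0.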
