midasheng-audio-generate
/install midasheng-audio-generate
Audio scene generation from text descriptions. Generates WAV audio with speech, sound effects, music, and environmental sounds.
1. Trigger
Use this skill when the user requests audio, sound effects, or music generation based on a text description.
2. Execution Steps
Step 1: Design the Audio Scene (Prompt Refinement)
Before calling the API, you must act as an expert Audio Scene Architect and Foley Designer. Deeply understand the user's natural language input (which may be in any language) and translate it into a highly structured tagged string based on real-world acoustic logic and scene realism.
Prompt Tag Definition:
- <|caption|>: The overall, comprehensive description of the audio scene.
- <|speech|>: Speaker identity (e.g., middle-aged man, energetic girl) and speaking style.
- <|asr|>: The actual transcript / spoken dialogue.
- <|sfx|>: Specific sound effects present in the audio (e.g., footsteps, doorbell, dog barking).
- <|music|>: Description of background music (e.g., soft jazz, tense orchestral).
- <|env|>: Environmental or ambient background noise (e.g., city bustle, forest wind and crickets).
Crucial Generation Rules:
- Scene Enrichment: Do not merely copy the user's input! Act as a sound designer and logically enrich the scene.
- Speech & Dialogue Generation: If the user explicitly mentions speech or implies a speaking scenario, creatively generate a reasonable and vivid transcript for the <|speech|> and <|asr|> fields.
- Strict ASR Formatting: For the <|asr|> tag, output only the raw spoken text. Do not include any speaker labels or narration, such as "man:", "speaker1:", or "a man says".
- Omit Missing Elements: If any element is not relevant, directly omit its corresponding tag.
- Language & Case Constraint: The entire generated prompt string MUST be in lowercase English, including <|asr|> content.
- Strict Output: Output ONLY the formatted tagged string internally for the next step.
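The tag definitions and rules above can be sketched as a small helper. This is a minimal illustration, not the skill's own implementation: the function name `build_prompt`, the dict-based input, and the choice to concatenate tagged segments with no separator are all assumptions, since the text does not say how segments are joined.

```python
# Tag order follows the definition list above. Joining segments with no
# separator is an assumption; the skill text does not specify one.
TAG_ORDER = ("caption", "speech", "asr", "sfx", "music", "env")


def build_prompt(fields: dict) -> str:
    """Assemble a lowercase tagged prompt, omitting empty or missing tags."""
    parts = []
    for tag in TAG_ORDER:
        value = (fields.get(tag) or "").strip()
        if not value:
            continue  # "Omit Missing Elements" rule
        # "Language & Case Constraint" rule: everything is lowercased.
        parts.append(f"<|{tag}|>{value.lower()}")
    return "".join(parts)


# Hypothetical scene, invented purely for illustration.
prompt = build_prompt({
    "caption": "A man greets a friend in a quiet cafe",
    "speech": "middle-aged man, warm and relaxed",
    "asr": "Good to see you again",
    "env": "soft cafe chatter",
    "music": "",  # empty, so the <|music|> tag is dropped
})
print(prompt)
```

The helper only enforces the mechanical rules (lowercasing and tag omission). Scene enrichment, translation into English, and keeping speaker labels out of the `<|asr|>` text remain the agent's job.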
Step 2: Execute Command
curl -X POST "https://llmplus.ai.xiaomi.com/dasheng/audio/gen" \
-H "Content-Type: application/json" \
-d "{\"text\": \"<FORMATTED_PROMPT_STRING>\"}" \
-o <FILENAME.wav>
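For callers that prefer not to hand-escape the JSON body, the same request can be sketched with Python's standard library. The endpoint, header, and `text` field come from the curl command above; the function names, the timeout value, and the assumption that the response body is the raw WAV data (suggested by the `-o` flag) are illustrative only.

```python
import json
import urllib.request

API_URL = "https://llmplus.ai.xiaomi.com/dasheng/audio/gen"


def build_request(prompt: str) -> urllib.request.Request:
    """Build the POST request; json.dumps escapes quotes and backslashes."""
    body = json.dumps({"text": prompt}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, filename: str, timeout: float = 300.0) -> None:
    """Send the request and write the response body to `filename`.

    Assumes the response body is the WAV data, as the curl `-o` usage
    suggests. The default timeout is an arbitrary choice.
    """
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        with open(filename, "wb") as out:
            out.write(resp.read())
```

A call would look like `generate("<|caption|>rain on a tin roof", "rain.wav")`, where both arguments are hypothetical. Serializing with `json.dumps` avoids the case where a transcript containing double quotes breaks the manually escaped `-d` string.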
3. Queue Status
Query Command
curl -X POST "https://llmplus.ai.xiaomi.com/metrics?path=/dasheng/audio/gen"
Returned Fields
- active: Number of currently active requests
- avg_latency_ms: Average processing latency (milliseconds)
- Estimated wait time = active × avg_latency_ms
When to Call
- When the IM is about to time out but the audio generation service has not returned a result: Check the queue status and inform the user, asking them to inquire again later.
- When the user later asks about task progress but the service still has not returned a result: Check the latest queue status and report it back to the user.
Status Levels
- 🟢 active=0 or estimated wait <5s → Service idle
- 🟡 Estimated wait 5-30s → Slight queue
- 🔴 Estimated wait >30s → Queue is long, recommend trying again later
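The wait estimate and the status levels above can be sketched as follows. Because `avg_latency_ms` is in milliseconds and the thresholds are in seconds, the product is divided by 1000. The function names, the returned labels, and the sample metrics dict are assumptions; the actual response format of the metrics endpoint is not documented here beyond the two field names.

```python
def estimated_wait_seconds(active: int, avg_latency_ms: float) -> float:
    """Estimated wait = active x avg_latency_ms, converted from ms to s."""
    return active * avg_latency_ms / 1000.0


def status_level(active: int, avg_latency_ms: float) -> str:
    """Map the queue metrics onto the three status levels listed above."""
    wait = estimated_wait_seconds(active, avg_latency_ms)
    if active == 0 or wait < 5:
        return "idle"
    if wait <= 30:
        return "slight queue"
    return "long queue"


# Hypothetical metrics payload, assuming the endpoint returns these two
# fields as numbers.
metrics = {"active": 3, "avg_latency_ms": 4000}
wait = estimated_wait_seconds(metrics["active"], metrics["avg_latency_ms"])
level = status_level(metrics["active"], metrics["avg_latency_ms"])
print(f"estimated wait: {wait:.0f}s, status: {level}")
```

The boundary at exactly 30 seconds is treated as part of the 5-30s band, which is one reading of the ranges in the list.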
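- Make sure OpenClaw is installed (local or Docker deployment).
- Enter the install command in the chat box: /install midasheng-audio-generate
- Once installation is complete, invoke the Skill by name or use /midasheng-audio-generate to trigger it.
- Provide the required inputs according to the Skill's parameter description to get structured output.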
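What is midasheng-audio-generate?
It generates immersive audio scenes, complete with speech, sound effects, music, and ambient sounds, from text descriptions. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 221 downloads to date.
How do I install midasheng-audio-generate?
Run the command "/install midasheng-audio-generate" in the OpenClaw or Claude Code chat box for one-step installation; no additional configuration is required.
Is midasheng-audio-generate free?
Yes. midasheng-audio-generate is completely free, released under the MIT-0 license, and can be freely downloaded, installed, and used.
Which platforms does midasheng-audio-generate support?
midasheng-audio-generate is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who developed midasheng-audio-generate?
It is developed and maintained by Junbo Zhang (@jimbozhang); the current version is v1.1.5.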