yjx-research

ControlFoley Audio Generator

Author: Jianxuan Yang · v1.0.8 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 189 · Favorites: 2 · Current installs: 0 · Versions: 9
Install in OpenClaw
/install controlfoley-audio-generator
Description
A multi-functional audio generation tool for SFX generation, video-to-audio (including controllable video-to-audio), and text-to-audio.
Usage Instructions (SKILL.md)

ControlFoley Audio Generator

A multi-functional audio generation tool powered by the ControlFoley model. It combines video sound-effect (SFX) generation, video background-music composition, and text-to-audio generation in one tool.

This tool supports four modes: Video-to-Audio (V2A), Text-Controlled Video-to-Audio (TC-V2A), Audio-Controlled Video-to-Audio (AC-V2A), and Text-to-Audio (T2A).

Basic Info

| Field | Value |
| --- | --- |
| Service Operator | Xiaomi LLM Plus Team |
| API Endpoint | https://controlfoley.ai.xiaomi.com |
| Open Source Repo | https://github.com/xiaomi-research/controlfoley |
| Project Page | https://yjx-research.github.io/ControlFoley_web_page/ |
| Online Demo | https://yjx-research.github.io/ControlFoley_web_page/#try-gen |
| Model Weights | https://huggingface.co/YJX-Xiaomi/ControlFoley/ |
| API Key | Not required |
| Script Path | scripts/foley.py |

Prerequisites

python3 --version   # Python 3.x
curl --version      # curl for API submission
ffmpeg -version     # optional, for audio format conversion

Modes

| Mode | Command | Input | Output | Description |
| --- | --- | --- | --- | --- |
| V2A | v2a video.mp4 | Video file | .mp4 + .flac | Generate audio matching the video content |
| TC-V2A | v2a video.mp4 --prompt "text" | Video + text | .mp4 + .flac | Generate audio aligned with text prompts while staying synchronized with the video |
| AC-V2A | v2a video.mp4 --ref-audio ref.wav | Video + reference audio | .mp4 + .flac | Generate audio with timbre matching the reference audio while staying synchronized with the video |
| T2A | t2a "prompt" | Text description | .flac | Generate audio from text descriptions |

Usage (CLI version)

1. Text-to-Audio (T2A, default 8s)

python3 scripts/foley.py t2a "dog barking loudly in a park"

2. Video-to-Audio (V2A)

python3 scripts/foley.py v2a input.mp4

3. Text-Controlled Video-to-Audio (TC-V2A)

python3 scripts/foley.py v2a input.mp4 --prompt "footsteps on gravel with birds chirping"

4. Audio-Controlled Video-to-Audio (AC-V2A)

python3 scripts/foley.py v2a input.mp4 --ref-audio reference.wav

5. Specify duration

python3 scripts/foley.py t2a "A mountain stream murmurs, its gentle current lapping against the pebbles." --duration 15

6. Generate multiple candidates

python3 scripts/foley.py t2a "cat purring softly" --count 3

7. Fixed seed (reproducible results)

python3 scripts/foley.py t2a "rain on a tin roof" --seed 42

8. List available models

python3 scripts/foley.py models
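
When the CLI is driven from another program, the commands above can be assembled as argument lists rather than hand-built strings. The following is a minimal sketch, not part of the skill itself: the helper name build_foley_args is hypothetical, and it only knows the flags documented on this page. It does not check which flags belong to which subcommand.

```python
import shlex

# Flags documented on this page (long form only); anything else is rejected.
DOCUMENTED_OPTIONS = {"model", "duration", "negative", "cfg", "count",
                      "seed", "prompt", "ref_audio", "outdir"}

def build_foley_args(subcommand, target=None, **options):
    """Return an argv list for scripts/foley.py (hypothetical helper)."""
    unknown = set(options) - DOCUMENTED_OPTIONS
    if unknown:
        raise ValueError(f"undocumented option(s): {sorted(unknown)}")
    args = ["python3", "scripts/foley.py", subcommand]
    if target is not None:
        args.append(str(target))          # prompt text (t2a) or video path (v2a)
    for name, value in options.items():
        if value is None:
            continue                      # skip options the caller left unset
        args += ["--" + name.replace("_", "-"), str(value)]
    return args

cmd = build_foley_args("t2a", "rain on a tin roof", seed=42, count=3)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

Passing the list straight to subprocess.run avoids shell-quoting problems with prompts that contain spaces or quotes.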

Usage (API version)

POST

curl -X POST "https://controlfoley.ai.xiaomi.com/api/v1/v2a/submit" -F "file=@video_path" -F "prompt=footsteps on gravel with birds chirping"

return

{"taskId": "xxx", "message": "Task submitted successfully"}
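
The submit call and its response can be wrapped in two small helpers. This is a sketch under the assumption that the endpoint and form fields (file, prompt) behave exactly as in the curl example above; the function names are hypothetical and are not taken from scripts/foley.py.

```python
import json

API_BASE = "https://controlfoley.ai.xiaomi.com"

def build_submit_command(video_path, prompt=None):
    """Build the curl argv for the documented /api/v1/v2a/submit endpoint."""
    cmd = ["curl", "-X", "POST", f"{API_BASE}/api/v1/v2a/submit",
           "-F", f"file=@{video_path}"]
    if prompt:
        cmd += ["-F", f"prompt={prompt}"]  # optional text prompt (TC-V2A)
    return cmd

def parse_task_id(response_text):
    """Extract taskId from the submit response, failing loudly if it is absent."""
    payload = json.loads(response_text)
    task_id = payload.get("taskId")
    if not task_id:
        raise ValueError(f"no taskId in response: {payload}")
    return task_id
```

A caller would run the command with subprocess.run(..., capture_output=True, text=True, check=True) and pass the captured stdout to parse_task_id.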

GET

1. Available Models

curl -X GET "https://controlfoley.ai.xiaomi.com/api/v1/v2a/models" 

return

{"models":[{"name":"ControlFoley","enabled":true}]}

2. Status Inquiry

curl -X GET "https://controlfoley.ai.xiaomi.com/api/v1/v2a/status/{taskId}" 

return

  • success:
{"urls":["{Domain name}/ControlFoley_output/{taskId}/{filename}"],"status":"success","done":true}
  • processing:
{"status":"processing","done":false}
  • pending:
{"status":"pending","queue_pos":1,"queue_position":1,"total_queue":2,"done":false}
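
The three response shapes suggest a simple polling loop keyed on the done flag. The sketch below is illustrative only: wait_for_result is a hypothetical name, the status fetcher is injected so the loop carries no network code, the 300-second default mirrors the roughly 5-minute polling window mentioned under Error Handling, and treating any finished task whose status is not "success" as an error is an assumption, since failure responses are not documented here.

```python
import time

def wait_for_result(fetch_status, timeout_s=300.0, interval_s=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until the task is done; return the result URLs."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()  # expected to return the parsed status JSON as a dict
        if status.get("done"):
            if status.get("status") == "success":
                return status.get("urls", [])
            # Assumption: a finished task without "success" is a failure.
            raise RuntimeError(f"task ended with status {status.get('status')!r}")
        if status.get("status") == "pending":
            print(f"queued at position {status.get('queue_position')} "
                  f"of {status.get('total_queue')}")
        if clock() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval_s)
```

In practice fetch_status would wrap the status curl call for one taskId and parse its output with json.loads.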

3. Result Download

curl -X GET "https://controlfoley.ai.xiaomi.com/api/v1/v2a/ControlFoley_output/{taskId}/{filename}" --output ./output.flac

4. Status Inquiry & Result Download

curl -X GET "https://controlfoley.ai.xiaomi.com/api/v1/v2a/status_download/{taskId}" --output-dir ./output --output audio.zip
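
The combined endpoint above is saved as audio.zip, so a natural follow-up is unpacking it. The sketch below assumes the download is a standard ZIP archive of generated files; the archive layout is not documented on this page, and extract_audio is a hypothetical helper.

```python
import zipfile
from pathlib import Path

def extract_audio(zip_path, outdir):
    """Extract every file in the archive into outdir and return the new paths."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Skip directory entries; keep only real files.
        names = [n for n in archive.namelist() if not n.endswith("/")]
        archive.extractall(outdir, members=names)
    return [outdir / n for n in names]
```

For example, extract_audio("./output/audio.zip", "./output/unzipped") would return the list of extracted file paths.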

Parameters

T2A (Text-to-Audio)

| Parameter | Description | Default | Example |
| --- | --- | --- | --- |
| prompt | Audio description text | (required) | "dog barking in park" |
| --model | Model ID | ControlFoley | --model ControlFoley |
| --duration | Audio length in seconds (max 30) | 8 | --duration 15 |
| --negative | Negative prompt to exclude unwanted sounds | | --negative "noise, human voice" |
| --cfg | CFG strength; higher = stricter prompt adherence | 4.5 | --cfg 6.0 |
| --count | Number of variants to generate (1–5) | 1 | --count 3 |
| --seed | Fixed random seed for reproducibility | | --seed 42 |
| -o/--outdir | Output directory | ./output | -o ./my_audio |
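
The documented limits for T2A (duration up to 30 seconds, 1 to 5 variants) can be checked before a task is submitted. This is a minimal sketch with a hypothetical validate_t2a function. The lower bound on duration (greater than zero) is an assumption, as the page only states the maximum, and whether scripts/foley.py performs similar checks is not stated.

```python
def validate_t2a(prompt, duration=8, count=1, cfg=4.5):
    """Check T2A arguments against the documented limits; return them as a dict."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    if not 0 < duration <= 30:        # documented maximum: 30 seconds
        raise ValueError("duration must be greater than 0 and at most 30 seconds")
    if not 1 <= count <= 5:           # documented range: 1 to 5 variants
        raise ValueError("count must be between 1 and 5")
    return {"prompt": prompt.strip(), "duration": duration,
            "count": count, "cfg": cfg}
```

Calling validate_t2a("dog barking in park") returns the documented defaults (8 seconds, 1 variant, CFG 4.5) alongside the prompt.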

V2A (Video-to-Audio)

| Parameter | Description | Default | Example |
| --- | --- | --- | --- |
| video | Input video path | (required) | input.mp4 |
| --model | Model ID | ControlFoley | --model ControlFoley |
| --prompt | Text prompt to guide audio generation (TC-V2A) | | --prompt "keyboard tapping" |
| --negative | Negative prompt to exclude unwanted sounds | | --negative "music, noise" |
| --ref-audio | Reference audio file for timbre control (AC-V2A) | | --ref-audio reference.wav |
| --cfg | CFG strength | 4.5 | --cfg 7.0 |
| --count | Number of variants to generate (1–5) | 1 | --count 2 |
| --seed | Fixed random seed (not forwarded to API currently) | | --seed 42 |
| -o/--outdir | Output directory | ./output | -o ./results |

Prompt Tips

  • Be specific: "cat footsteps on wooden floor" beats "cat sound"
  • Use negative prompts: --negative "human voice, music, noise" to filter unwanted audio
  • CFG tuning: high CFG (6.0–7.5) for precise control, low CFG (3.0–4.5) for creative freedom

Output & Post-Processing

  • Audio: .flac (44100 Hz, lossless)
  • Video: .mp4 (original video + generated audio track)
  • Results saved to --outdir, paths printed to stdout

Convert to MP3 for sharing:

ffmpeg -i output.flac -codec:a libmp3lame -qscale:a 2 output.mp3
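
To convert a whole output directory rather than one file, the same ffmpeg invocation can be generated per file. The helpers below are hypothetical and only build the argument lists; they reuse the exact flags from the command above and assume ffmpeg is on the PATH when the commands are eventually run.

```python
from pathlib import Path

def mp3_command(flac_path, quality=2):
    """Build the ffmpeg argv that converts one .flac file to .mp3 next to it."""
    src = Path(flac_path)
    dst = src.with_suffix(".mp3")
    return ["ffmpeg", "-i", str(src), "-codec:a", "libmp3lame",
            "-qscale:a", str(quality), str(dst)]

def batch_commands(outdir):
    """One conversion command per .flac file found directly in outdir."""
    return [mp3_command(p) for p in sorted(Path(outdir).glob("*.flac"))]
```

Each returned list can be passed to subprocess.run(cmd, check=True).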

Error Handling

| Issue | Cause | Fix |
| --- | --- | --- |
| Internal URL inaccessible | Result URL uses .xiaomi.srv internal domain | Script auto-falls back to /api/v1/v2a/ControlFoley_output/{task_id}/{filename} |
| Queue busy | Task is waiting | Script auto-polls up to ~5 min; check load via curl $API_BASE/health |
| Model unavailable | Model not enabled | Run foley.py models to see available models |
| Task timeout | Service overloaded | Resubmit the task |
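
The internal-URL fallback in the first row can be sketched as a URL rewrite. This is an illustration of the idea, not the script's actual code: public_result_url is a hypothetical name, and the internal hostname in the usage note is made up for the example.

```python
from urllib.parse import urlsplit

API_BASE = "https://controlfoley.ai.xiaomi.com"
MARKER = "/ControlFoley_output/"

def public_result_url(url):
    """Rewrite a .xiaomi.srv result URL to the documented public download path."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host.endswith(".xiaomi.srv"):
        return url                      # already reachable; leave untouched
    idx = parts.path.find(MARKER)
    if idx < 0:
        raise ValueError(f"unexpected result path: {parts.path}")
    # Keep /ControlFoley_output/{task_id}/{filename} and prefix the public route.
    return f"{API_BASE}/api/v1/v2a{parts.path[idx:]}"
```

For a made-up internal address such as http://node1.xiaomi.srv/ControlFoley_output/abc/out.flac, the function returns the corresponding https://controlfoley.ai.xiaomi.com/api/v1/v2a/ControlFoley_output/abc/out.flac URL.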

API Reference

See ./references/api-reference.md for full endpoint documentation.

⚠️ Privacy & Security

  • Service Operator: Cloud processing is operated by the Xiaomi LLM Plus Team at https://controlfoley.ai.xiaomi.com
  • Data Upload: V2A/TC-V2A/AC-V2A modes upload the full video file to the remote service for processing. Do not upload videos containing sensitive personal or identifiable information
  • Data Processing: Uploaded videos and audio are used solely for audio generation. Results are returned via URL. Refer to the Xiaomi LLM Plus Team's terms of service for data retention and access control policies
  • No API Key Required: The service requires no authentication — please use it responsibly to avoid unnecessary load
  • Recommendation: Before first use, validate with a small, non-sensitive test clip
Safe Usage Recommendations
This skill uploads any video, audio, or prompt you provide to a remote service (https://controlfoley.ai.xiaomi.com) and saves returned audio locally. Before installing or using it: (1) confirm you trust the remote endpoint and avoid uploading sensitive or private media, (2) note that the script invokes the local curl binary (and requires python3); SKILL.md also mentions ffmpeg for optional conversions — ensure those tools are installed if needed, (3) review the referenced upstream GitHub/project pages yourself to verify provenance and privacy policy, and (4) if you need an offline or self-hosted workflow, this skill is not suitable because it relies on the remote API.
Functional Analysis
Type: OpenClaw Skill
Name: controlfoley-audio-generator
Version: 1.0.8
The bundle is a legitimate tool for generating audio from text or video using the ControlFoley model hosted by Xiaomi. The script `scripts/foley.py` handles task submission and result downloading via the `https://controlfoley.ai.xiaomi.com` API. It documents its data privacy behavior transparently and shows no signs of malicious behavior, obfuscation, or intentional vulnerabilities.
Capability Tags
requires-sensitive-credentials
Capability Assessment
Purpose & Capability
The skill's name/description (audio SFX, V2A, T2A) align with the included script and API references. Minor inconsistency: the registry metadata lists no required binaries, but SKILL.md and scripts rely on python3 and call curl (subprocess). SKILL.md also mentions ffmpeg as optional. These binaries are reasonable for the stated purpose but should be declared in metadata.
Instruction Scope
SKILL.md and scripts limit actions to submitting tasks to the specified API, polling for status, downloading results, and writing outputs to the chosen output directory. The code checks input file existence and does not read unrelated system files or environment variables. The main runtime behavior is uploading user-provided media/text to the remote API and saving returned files.
Install Mechanism
There is no install spec; this is an instruction-only skill with a bundled Python script. No installers, third-party packages, or arbitrary downloads are performed by the skill itself.
Credentials
The skill declares no environment variables or credentials and the code does not attempt to access secrets. All network communication goes to controlfoley.ai.xiaomi.com (and documented fallback endpoints). No unrelated service credentials are requested.
Persistence & Privilege
The skill is not always-enabled and does not modify other skills or system-wide configuration. It runs on invocation and does not request special persistent privileges.
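How to Use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install controlfoley-audio-generator
  3. Once installed, invoke the Skill by name or trigger it with /controlfoley-audio-generator
  4. Provide the required inputs as described in the Skill's parameter documentation to get structured output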
Version History
v1.0.8
  • Updated privacy and security section to add clear guidelines on data handling, processing, and user recommendations.
  • Removed duplicated and verbose API usage examples to streamline documentation.
  • Kept the API and CLI usage, parameters, and error handling instructions unchanged.
  • No functional or interface changes; documentation improvements only.
v1.0.7
  • Added _meta.json metadata file.
  • Updated API endpoint and documentation: now points to https://controlfoley.ai.xiaomi.com (was https://llmplus.ai.xiaomi.com).
  • Expanded and clarified SKILL.md usage instructions for both CLI and API, including new cURL examples and result retrieval methods.
  • Improved documentation for API parameters and response formats, including successful, processing, and pending task statuses.
  • Removed GitHub and ClawHub social promotion from documentation.
  • No breaking changes to model functionality.
v1.0.6
  • Added a "star us" message with links to the project's GitHub and ClawHub pages.
  • No functional or API changes; documentation only.
v1.0.5
  • No user-facing changes in this version.
  • No file changes detected; documentation and functionality remain the same.
v1.0.4
  • No file changes detected in this release.
  • No feature updates, bug fixes, or documentation changes.
v1.0.3
  • Clarified that this documentation applies to the CLI version by updating the title to "ControlFoley Audio Generator (CLI version)".
  • No functional or API changes; documentation only.
  • Improved accuracy and clarity of documentation scope.
v1.0.2
  • Updated the tool description to a shorter, more concise format in both English and Chinese.
  • No changes to code or functionality; documentation only.
v1.0.1
  • Removed the README.md and README_zh.md files.
  • No new features or functional changes; documentation files were cleaned up.
  • Skill description and core usage remain unchanged.
v1.0.0
Initial release of ControlFoley Audio Generator.
  • Supports AI-generated audio and foley using the ControlFoley Audio Generator API.
  • Two core modes: Video-to-Audio (V2A) for syncing sounds to video, and Text-to-Audio (T2A) for generating audio from descriptions.
  • Additional capabilities: text-controlled video dubbing (TC-V2A), audio style transfer (AC-V2A).
  • Suitable for a wide range of sound effects: games, nature, animals, mechanical, ads, and more.
  • Includes CLI tool with configurable parameters (model, seed, negative prompts, duration, etc.).
  • Built-in workarounds for API quirks and network issues; no authentication required.
Metadata
Slug controlfoley-audio-generator
Version 1.0.8
License MIT-0
Total installs 0
Current installs 0
Historical versions 9
FAQ

What is ControlFoley Audio Generator?

A multi-functional audio generation tool for SFX generation, video-to-audio (including controllable video-to-audio), and text-to-audio. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 189 total downloads to date.

How do I install ControlFoley Audio Generator?

Run the command "/install controlfoley-audio-generator" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is needed.

Is ControlFoley Audio Generator free?

Yes. ControlFoley Audio Generator is completely free and released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does ControlFoley Audio Generator support?

ControlFoley Audio Generator is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed ControlFoley Audio Generator?

It is developed and maintained by Jianxuan Yang (@yjx-research); the current version is v1.0.8.
