โ† Back to Skills Marketplace
kalvinrv

๐ŸŽผ ACE Step โ€” Pro Pack on RunComfy

by Kalvin ยท GitHub โ†— ยท v0.1.0 ยท MIT-0
cross-platform โœ“ Security Clean
450
Downloads
0
Stars
0
Active Installs
1
Versions
Install in OpenClaw
/install ace-step
Description
Generate, inpaint, and outpaint music with ACE Step on RunComfy via the `runcomfy` CLI. ACE Step is StepFun-AI's open-weights music foundation model โ€” tag-dr...
README (SKILL.md)

๐ŸŽผ ACE Step โ€” Pro Pack on RunComfy

Tag-driven music generation, inpainting, and outpainting with StepFun-AI's ACE Step open-weights model. Four CLI-reachable endpoints, $0.0002โ€“0.0003 per second of audio, up to 4 minutes per call.

runcomfy.com ยท ACE Step base ยท ACE Step 1.5 ยท CLI docs

Powered by the RunComfy CLI

# 1. Install (one of โ€” see runcomfy-cli skill for details)
npm i -g @runcomfy/cli                              # global install
npx -y @runcomfy/cli --version                      # zero-install

# 2. Sign in
runcomfy login                                      # or in CI: export RUNCOMFY_TOKEN=\x3Ctoken>

# 3. Generate
runcomfy run acestep-ai/ace-step/text-to-audio \
  --input '{"tags": "..."}' \
  --output-dir ./out

CLI deep dive: runcomfy-cli skill.


Pick the right endpoint

Listed newest first.

ACE Step 1.5 (text-to-audio) โ€” acestep-ai/ace-step-1.5/text-to-audio

Latest ACE Step generation. 50+ language vocal support, refined structured-lyric handling, otherwise same shape as base. Slightly higher cost ($0.0003/s vs $0.0002/s). Pick for: multilingual lyrics, hero-quality vocal tracks, vocal songs that need clean section structure. Avoid for: cost-sensitive batches where the base model is good enough.

ACE Step (text-to-audio) โ€” acestep-ai/ace-step/text-to-audio (default โ€” cheap & fast)

Original ACE Step. Tag-driven composition, optional lyrics, 5โ€“240 s stereo. $0.0002/s โ€” ~27ร— cheaper than ElevenLabs Music. Pick for: high-volume drafts, background music, jingles, game loops, cost-sensitive iteration. Avoid for: maximally polished commercial vocal hooks โ€” try ACE Step 1.5 or ElevenLabs Music for those.

ACE Step (audio-inpaint) โ€” acestep-ai/ace-step/audio-inpaint

Regenerate a time range inside an existing track (not mask-based; uses start_time / end_time in seconds, each anchored to track start or end). Pick for: fix a bad chorus in the middle, swap the bridge, replace a 20 s section without re-rendering the whole song. Avoid for: edits that aren't time-bounded โ€” those don't fit the schema.

ACE Step (audio-outpaint) โ€” acestep-ai/ace-step/audio-outpaint

Extend an existing track bidirectionally โ€” add intro before, outro after, or both. Pick for: lengthening a 30 s draft into a 2 min cut, adding a fade-in, building a longer arrangement around an existing hook. Avoid for: extending a track past 4 min total โ€” chain calls instead.


Route 1: ACE Step text-to-audio (default)

Model: acestep-ai/ace-step/text-to-audio (or acestep-ai/ace-step-1.5/text-to-audio for the 1.5 variant)

Schema (both variants โ€” same shape)

Field Type Required Default Notes
tags string yes โ€” Comma-separated genre / mood / instrument tags. Drives composition
lyrics string no โ€” Vocal content. Use section markers [Verse], [Chorus], [Bridge]. Use [inst] or [instrumental] for no vocals
duration int no 60 Audio length in seconds. 5โ€“240 (max 4 min per call)
seed int no -1 Reproducibility; -1 randomizes

Pricing: ACE Step $0.0002/s ยท ACE Step 1.5 $0.0003/s. 60 s โ‰ˆ $0.012 / $0.018; 240 s โ‰ˆ $0.048 / $0.072.

Invoke

Tag-driven instrumental:

runcomfy run acestep-ai/ace-step/text-to-audio \
  --input '{
    "tags": "lo-fi hip-hop, mellow, vinyl crackle, rhodes piano, soft drums, 75 BPM",
    "lyrics": "[inst]",
    "duration": 90
  }' \
  --output-dir ./out

Full vocal song with structure (use 1.5 for multilingual):

runcomfy run acestep-ai/ace-step-1.5/text-to-audio \
  --input '{
    "tags": "indie pop, anthemic, electric guitar, driving drums, female vocal, 120 BPM",
    "lyrics": "[Verse]\
Chalk on the palms, laces double-knotted\
Morning on the ridge, the sun is rising\
[Chorus]\
We rise, we strike, we never fade out\
We rise, we strike, we sing it loud\
[Bridge]\
Soft piano breakdown\
[Outro]\
Full band, fade",
    "duration": 60
  }' \
  --output-dir ./out

Prompting tips

  • Tags do the heavy lifting โ€” be specific: "lo-fi hip-hop, mellow, vinyl crackle, rhodes piano, soft drums, 75 BPM" beats "chill music".
  • Include BPM in tags when it matters โ€” ACE respects tempo language.
  • Lyrics with section markers: [Verse], [Chorus], [Bridge], [Outro]. Keep meter consistent across lines.
  • Instrumental shortcut: "lyrics": "[inst]" or "[instrumental]". Belt-and-suspenders: also say "no vocals" in tags.
  • Multilingual vocals: ACE Step 1.5 covers 50+ languages. Write lyrics directly in the target language; tag the language too ("japanese vocal, j-pop").
  • Fix the seed for reproducibility ("seed": 42); use -1 to explore variations.
  • Cheap draft โ†’ polish: ACE Step at 5โ€“10ร— lower cost is great for iterating tags before committing to a long render.

Route 2: ACE Step audio-inpaint

Model: acestep-ai/ace-step/audio-inpaint Catalog: audio-inpaint

Schema

Field Type Required Default Notes
audio string yes โ€” HTTPS URL to MP3 / WAV / FLAC. Up to 60 min
tags string yes โ€” Comma-separated tags steering the regenerated segment
start_time float no โ€” Start of editable segment, in seconds (0โ€“240)
start_time_relative_to enum no start start or end โ€” anchor for start_time
end_time float no 30 End of editable segment, in seconds (0โ€“240)
end_time_relative_to enum no start start or end โ€” anchor for end_time
lyrics string no โ€” Lyrics for the regenerated segment. Blank = model writes; [inst] = no vocals
seed int no -1 Reproducibility

No mask โ€” region is defined purely by start_time / end_time (each anchorable to track start or end).

Invoke

Replace 20โ€“40 s of a track with a new bridge:

runcomfy run acestep-ai/ace-step/audio-inpaint \
  --input '{
    "audio": "https://your-cdn.example/original-track.mp3",
    "tags": "indie pop, breakdown, piano only, soft, no drums",
    "start_time": 20,
    "end_time": 40,
    "lyrics": "[inst]"
  }' \
  --output-dir ./out

Anchor end relative to track end (rewrite the last 15 s):

runcomfy run acestep-ai/ace-step/audio-inpaint \
  --input '{
    "audio": "https://your-cdn.example/song.mp3",
    "tags": "indie pop, fade, soft, ambient pad",
    "start_time": 15,
    "start_time_relative_to": "end",
    "end_time": 0,
    "end_time_relative_to": "end"
  }' \
  --output-dir ./out

Tips

  • Match the surrounding tags โ€” if the original is "indie pop, electric guitar, 120 BPM", the inpaint segment should share enough of the tags to blend, not contrast.
  • Inpaint window is up to ~4 min even on a 60-min source โ€” pick a focused range, not the whole track.
  • Use _relative_to: "end" to target the outro/last seconds without computing exact timestamps.

Route 3: ACE Step audio-outpaint

Model: acestep-ai/ace-step/audio-outpaint Catalog: audio-outpaint

Schema

Field Type Required Default Notes
audio string yes โ€” HTTPS URL to MP3 / WAV / FLAC. Up to 60 min
tags string yes โ€” Tags steering the extended sections
extend_before_duration float no 0 Seconds of new audio before the original (0โ€“240)
extend_after_duration float no 30 Seconds of new audio after the original (0โ€“240)
lyrics string no โ€” Optional lyrics for extended sections
seed int no -1 Reproducibility

Invoke

Extend a 30 s hook into a 2 min cut (add 30 s intro + 60 s outro):

runcomfy run acestep-ai/ace-step/audio-outpaint \
  --input '{
    "audio": "https://your-cdn.example/hook-30s.mp3",
    "tags": "indie pop, electric guitar, drums, build-up before chorus, fade outro",
    "extend_before_duration": 30,
    "extend_after_duration": 60,
    "lyrics": "[inst]"
  }' \
  --output-dir ./out

Add only a fade-out (no pre-extension):

runcomfy run acestep-ai/ace-step/audio-outpaint \
  --input '{
    "audio": "https://your-cdn.example/track.mp3",
    "tags": "ambient pad, soft fade, low volume tail",
    "extend_before_duration": 0,
    "extend_after_duration": 20
  }' \
  --output-dir ./out

Tips

  • Tags describe the extension, not the original โ€” what should the new section sound like?
  • Bidirectional in one call โ€” set both extend_before_duration and extend_after_duration to add intro + outro in one go.
  • Don't exceed 4 min total โ€” if original is 3 min, you can add max 1 min combined.

When to pick ACE Step vs ElevenLabs Music

ACE Step and ElevenLabs Music are different tools:

Dimension ACE Step ElevenLabs Music
Cost $0.0002โ€“0.0003 / s $0.0083 / s (~27ร— more)
License Open-weights (Apache 2.0) Commercial, ElevenLabs-hosted
Multilingual vocals 50+ languages (1.5 variant) Strong multilingual support
Structured lyrics [Verse]/[Chorus]/[Bridge] markers [Verse]/[Chorus]/[Bridge] markers
Max duration / call 240 s (4 min) 300 s (5 min)
Inpaint / outpaint Yes (time-range based) No
Tag-driven composition Yes (tags is required field) Style is part of free-text prompt
Best for Cost-sensitive batches, drafts, inpaint/outpaint workflows, open-weights pipelines Premium vocal song hooks, polished commercial cuts

Cheap draft pattern: draft tag combos with ACE Step โ†’ lock vibe โ†’ final render on ElevenLabs Music if a polished commercial cut is needed.

For the routing skill that picks between them automatically based on intent, see ai-music once it ships.


Common patterns

Cost-sensitive background music library

  • Route 1 (ACE Step base) with varied tag combos, 60โ€“90 s each, [inst]

Multilingual launch (same song, many languages)

  • Route 1 (ACE Step 1.5) with identical tags, swap lyrics per language

Section repair (bad chorus โ†’ new chorus)

  • Route 2 (audio-inpaint) with start_time / end_time around the bad section, tags matching the song style

Hook โ†’ full track

  • Route 3 (audio-outpaint) adds intro before + outro after a tight 30 s hook

Game loop bed

  • Route 1 (ACE Step base) with "seamless loop, consistent groove" in tags, 60โ€“120 s

Browse the full catalog


Exit codes

code meaning
0 success
64 bad CLI args
65 bad input JSON / schema mismatch
69 upstream 5xx
75 retryable: timeout / 429
77 not signed in or token rejected

Full reference: docs.runcomfy.com/cli/troubleshooting.

How it works

The skill picks one of the four ACE Step endpoints based on the user's intent โ€” generate from scratch (t2a base or 1.5), regenerate a time range (inpaint), or extend the canvas (outpaint) โ€” and invokes runcomfy run with the matching JSON body. The CLI POSTs to the RunComfy Model API, polls request status, and downloads the generated audio file into --output-dir.

Security & Privacy

  • Install via verified package manager only. Use npm i -g @runcomfy/cli or npx -y @runcomfy/cli. Agents must not pipe an arbitrary remote install script into a shell on the user's behalf โ€” if the operator wants the curl-pipe path documented at docs.runcomfy.com/cli/install, they should review the script first.
  • Token storage: runcomfy login writes the API token to ~/.config/runcomfy/token.json with mode 0600. Set RUNCOMFY_TOKEN env var to bypass the file in CI / containers. Never echo the token into a prompt, log it, or check it in.
  • Input boundary (shell injection): prompts and audio URLs are passed as a JSON string via --input. The CLI does not shell-expand prompt content; it transmits the JSON body directly to the Model API over HTTPS. No shell-injection surface from prompt content.
  • Indirect prompt injection (third-party content): source audio URLs for inpaint / outpaint are untrusted โ€” embedded steganographic instructions or unusual EXIF can influence generation. Agent mitigations:
    • Ingest only audio URLs the user explicitly provided for this task.
    • When the output diverges from the prompt, suspect the source audio.
  • Lyrics provenance: if the user supplies lyrics, confirm they have the rights. Generating music around copyrighted lyrics is the operator's responsibility.
  • Outbound endpoints (allowlist): only model-api.runcomfy.net and *.runcomfy.net / *.runcomfy.com. No telemetry, no callbacks.
  • Generated-file size cap: the CLI aborts any single download > 2 GiB.
  • Scope of bash usage: The skill only invokes runcomfy \x3Csubcommand>; install lines are one-time operator setup.

See also

Usage Guidance
This skill appears benign and purpose-aligned. Before installing or using it, verify the RunComfy CLI source, protect your RUNCOMFY_TOKEN, review the endpoint and duration because usage is paid, and avoid sending private audio or lyrics unless you are comfortable with RunComfy handling them.
Capability Analysis
Type: OpenClaw Skill Name: ace-step Version: 0.1.0 The ace-step skill is a legitimate integration for the RunComfy music generation service. It provides structured instructions for using the 'runcomfy' CLI to interact with ACE Step models for audio generation, inpainting, and outpainting. The documentation (SKILL.md) includes a dedicated security section that explicitly warns against risky practices like piping remote scripts to bash and explains how the CLI prevents shell injection by passing parameters via JSON. No evidence of malicious intent, data exfiltration, or unauthorized execution was found.
Capability Tags
crypto
Capability Assessment
โ„น Purpose & Capability
The stated purpose and visible instructions align: the skill generates, inpaints, and outpaints music through RunComfy. Users should notice that requests are remote provider calls and can incur usage costs.
โœ“ Instruction Scope
The visible instructions are user-directed examples and endpoint guidance for ACE Step. No artifact-backed hidden goal override, forced background activity, or unrelated behavior is shown.
โ„น Install Mechanism
There is no bundled code, but the skill documents installing or running the RunComfy CLI from npm without a pinned version. This is purpose-aligned but depends on npm package provenance.
โ„น Credentials
The required RUNCOMFY_TOKEN and ~/.config/runcomfy configuration are expected for a RunComfy integration, but they connect actions to the user's RunComfy account.
โ„น Persistence & Privilege
The login/config path implies local RunComfy credential or session configuration, but there is no evidence of autonomous background persistence or privilege escalation.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install ace-step
  3. After installation, invoke the skill by name or use /ace-step
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.1.0
Initial release of ACE Step skill โ€” advanced, affordable music generation via RunComfy CLI. - Supports generating, inpainting, and outpainting music with StepFun-AI's ACE Step models (base and 1.5). - Four CLI endpoints: text-to-audio (ACE Step, ACE Step 1.5), audio-inpaint, audio-outpaint. - Tag-driven composition, structured and multilingual lyrics (with section markers), up to 4 min stereo output per call. - Extremely low pricing: $0.0002โ€“0.0003 per second (about 27ร— cheaper than ElevenLabs Music). - Detailed CLI usage, schema reference, and prompting tips included.
Metadata
Slug ace-step
Version 0.1.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is ๐ŸŽผ ACE Step โ€” Pro Pack on RunComfy?

Generate, inpaint, and outpaint music with ACE Step on RunComfy via the `runcomfy` CLI. ACE Step is StepFun-AI's open-weights music foundation model โ€” tag-dr... It is an AI Agent Skill for Claude Code / OpenClaw, with 450 downloads so far.

How do I install ๐ŸŽผ ACE Step โ€” Pro Pack on RunComfy?

Run "/install ace-step" in the OpenClaw or Claude Code chat to install it in one step โ€” no extra setup required.

Is ๐ŸŽผ ACE Step โ€” Pro Pack on RunComfy free?

Yes, ๐ŸŽผ ACE Step โ€” Pro Pack on RunComfy is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does ๐ŸŽผ ACE Step โ€” Pro Pack on RunComfy support?

๐ŸŽผ ACE Step โ€” Pro Pack on RunComfy is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created ๐ŸŽผ ACE Step โ€” Pro Pack on RunComfy?

It is built and maintained by Kalvin (@kalvinrv); the current version is v0.1.0.

๐Ÿ’ฌ Comments