harrylabsj

Hum2Song

Author: haidong · GitHub · v3.0.4 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 248
Favorites: 0
Current installs: 0
Versions: 8
Install in OpenClaw
/install hum2song
Description
Hum2Song turns a hummed or sung melody into a complete song with local audio processing, MIDI extraction, and optional AI-assisted arrangement, without uploa...
Usage instructions (SKILL.md)

Hum2Song

Turn a hummed melody into a complete song with local audio processing, without uploading sensitive recordings to third-party services.


Overview

This skill converts user humming or singing into complete songs using local AI models. The entire pipeline runs on your machine - no audio data is sent to external services.

Pipeline:

  1. 🎤 Audio Input → 2. 🎵 MIDI Extraction → 3. 🎼 Music Generation → 4. 🎧 Complete Song

Triggers

Use this skill when the user:

  • Hums or sings a melody and wants to turn it into a full song
  • Has an audio recording of humming/singing
  • Wants to create music from their own melodic ideas
  • Asks to "turn my humming into a song"

Requirements

System Dependencies

# macOS
brew install ffmpeg fluidsynth

# Ubuntu/Debian
sudo apt-get install ffmpeg fluidsynth

# Python packages
pip install basic-pitch pretty_midi librosa soundfile numpy
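
Before running the pipeline, it can help to confirm that the tools and packages listed above are actually present. The following is a minimal sketch, not part of the skill itself: the function name `missing_dependencies` is hypothetical, and the module names are assumed to be the import names of the pip packages (for example `basic_pitch` for `basic-pitch`).

```python
import importlib.util
import shutil

# Names taken from the install commands above.
REQUIRED_TOOLS = ["ffmpeg", "fluidsynth"]
REQUIRED_MODULES = ["basic_pitch", "pretty_midi", "librosa", "soundfile", "numpy"]


def missing_dependencies(tools=REQUIRED_TOOLS, modules=REQUIRED_MODULES):
    """Return (missing_tools, missing_modules) for the given names."""
    # shutil.which looks the executable up on PATH.
    missing_tools = [t for t in tools if shutil.which(t) is None]
    # find_spec locates a top-level module without importing it.
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    return missing_tools, missing_modules


if __name__ == "__main__":
    tools, modules = missing_dependencies()
    if tools:
        print("Missing system tools:", ", ".join(tools))
    if modules:
        print("Missing Python modules:", ", ".join(modules))
    if not tools and not modules:
        print("All dependencies found.")
```

Running the check inside a dedicated virtual environment keeps the result specific to the environment the skill will use.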

Optional: ACE-Step for Music Generation (User Choice)

ACE-Step is an optional local AI. Users decide whether to install it.

# User manually installs if they want AI generation
# Otherwise, default SoundFont synthesis works without AI
git clone https://github.com/ace-step/ace-step.git
pip install -r ace-step/requirements.txt

Note: The first ACE-Step run downloads about 4 GB of model weights to a local cache. Nothing is downloaded unless you install and run ACE-Step yourself.


Core Workflow

Step 1: Extract MIDI from Audio

Use Basic Pitch (Spotify's open source tool) to convert humming to MIDI:

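from basic_pitch.inference import predict
from basic_pitch import ICASSP_2022_MODEL_PATH

# Convert audio to MIDI using the bundled ICASSP 2022 model
model_output, midi_data, note_events = predict("humming.wav", ICASSP_2022_MODEL_PATH)
# midi_data is a pretty_midi.PrettyMIDI object, so it can be written directly
midi_data.write("extracted.mid")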
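
A quick look at the extracted notes shows whether the humming was captured sensibly before moving on. The sketch below assumes each entry of `note_events` is a tuple that begins with start time (seconds), end time (seconds), and MIDI pitch; `summarize_notes` is a hypothetical helper, not part of Basic Pitch.

```python
def summarize_notes(note_events):
    """Summarize note events given as (start_s, end_s, midi_pitch, ...) tuples."""
    if not note_events:
        return {"count": 0, "lowest": None, "highest": None, "span_s": 0.0}
    starts = [event[0] for event in note_events]
    ends = [event[1] for event in note_events]
    pitches = [event[2] for event in note_events]
    return {
        "count": len(note_events),
        "lowest": min(pitches),            # lowest MIDI pitch detected
        "highest": max(pitches),           # highest MIDI pitch detected
        "span_s": max(ends) - min(starts),  # time covered by the melody
    }


# Hand-written sample events standing in for real Basic Pitch output.
sample = [(0.0, 0.5, 60, 0.8, None), (0.5, 1.25, 64, 0.7, None)]
print(summarize_notes(sample))
```

An unexpectedly wide pitch range or a very high note count for a short clip usually points to background noise in the recording.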

Step 2: Enhance MIDI Structure

Clean and enhance the extracted MIDI:

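import pretty_midi

# Load extracted MIDI
pm = pretty_midi.PrettyMIDI("extracted.mid")

# A 16th-note grid depends on tempo: one step lasts 60 / BPM / 4 seconds
tempo_bpm = 120.0  # assumed tempo; set this to the tempo of the recording
step = 60.0 / tempo_bpm / 4

# Quantize notes to fix timing
for instrument in pm.instruments:
    for note in instrument.notes:
        note.start = round(note.start / step) * step
        # Keep every note at least one grid step long
        note.end = max(round(note.end / step) * step, note.start + step)

# Save enhanced MIDI
pm.write("enhanced.mid")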
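
A 16th-note grid has a different spacing in seconds at every tempo: one step lasts 60 / BPM / 4 seconds, so a fixed 0.25-second grid corresponds to 16th notes only at 60 BPM. The sketch below isolates that arithmetic as a plain function with no MIDI library involved; `quantize_note` is a hypothetical name and the tempo is assumed to be known.

```python
def quantize_note(start, end, tempo_bpm, divisions_per_beat=4):
    """Snap a note's start/end (seconds) to a grid of beat subdivisions.

    divisions_per_beat=4 gives a 16th-note grid when the beat is a quarter note.
    """
    step = 60.0 / tempo_bpm / divisions_per_beat  # grid spacing in seconds
    q_start = round(start / step) * step
    q_end = round(end / step) * step
    # Prevent zero-length notes when start and end snap to the same grid line.
    q_end = max(q_end, q_start + step)
    return q_start, q_end


# At 120 BPM a 16th note lasts 0.125 s.
print(quantize_note(0.07, 0.30, tempo_bpm=120))  # (0.125, 0.25)
print(quantize_note(0.10, 0.12, tempo_bpm=120))  # (0.125, 0.25)
```

The second call shows why the minimum-length guard matters: both ends of a very short note snap to the same grid line, and without the guard the note would vanish.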

Step 3: Generate Full Song

Option A: ACE-Step (Local AI, Optional)

Only if user has manually installed ACE-Step:

from ace_step import MusicGenerator

# Load model (runs locally, downloads weights on first use)
generator = MusicGenerator.from_pretrained("ace-step/base")

# Generate music from MIDI
audio = generator.generate_from_midi(
    midi_path="enhanced.mid",
    style="pop",
    mood="upbeat",
    duration=120
)

# Save result
audio.save("complete_song.mp3")
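
Since Option A applies only when ACE-Step has been installed by the user, a caller needs some way to decide at runtime whether to use it or to fall back to the SoundFont route (Option B). The following is a minimal sketch under the assumption that ACE-Step is importable as `ace_step`, as in the snippet above; `choose_backend` is a hypothetical helper.

```python
import importlib.util


def choose_backend(prefer_ai, module_name="ace_step"):
    """Return "ace-step" only if requested and importable, else "soundfont"."""
    # find_spec checks availability without importing (and thus without
    # executing) the optional package.
    if prefer_ai and importlib.util.find_spec(module_name) is not None:
        return "ace-step"
    return "soundfont"


print(choose_backend(prefer_ai=False))
```

Checking with `find_spec` rather than a bare `import` means the optional code runs only after the user has explicitly asked for AI generation.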

Option B: MIDI + SoundFont (No AI)

import pretty_midi
import soundfile as sf

# Load MIDI
pm = pretty_midi.PrettyMIDI("enhanced.mid")

# Synthesize with high-quality SoundFont
audio_data = pm.fluidsynth(fs=44100, sf2_path="path/to/good_soundfont.sf2")

# Save as WAV
sf.write("complete_song.wav", audio_data, 44100)
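
The SoundFont path in Option B has to point at a real `.sf2` file. One way to handle that is to probe a few candidate locations and use the first one that exists. This is a sketch only: `find_soundfont` is a hypothetical helper and the candidate paths shown are examples, not locations the skill is known to check.

```python
from pathlib import Path


def find_soundfont(candidates):
    """Return the first existing .sf2 file among candidates, or None."""
    for candidate in candidates:
        path = Path(candidate).expanduser()
        if path.is_file() and path.suffix.lower() == ".sf2":
            return path
    return None


# Example candidate locations (assumptions; adjust for your system).
sf2 = find_soundfont(["~/soundfonts/default.sf2", "/usr/share/sounds/sf2/FluidR3_GM.sf2"])
print(sf2 if sf2 else "No SoundFont found; fall back to the library default.")
```

When the result is `None`, passing `sf2_path=None` to `pm.fluidsynth` lets pretty_midi use its own default SoundFont.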

Usage

Quick Start

# Run the complete pipeline
python ~/.openclaw/skills/hum2song/scripts/hum2song.py \
  --input my_humming.wav \
  --style pop \
  --mood upbeat \
  --output my_song.mp3

Parameters

Parameter    Description                 Options
--input      Input audio file            Any audio format
--style      Music style                 pop, rock, jazz, classical, electronic
--mood       Song mood                   upbeat, calm, energetic, melancholic
--duration   Target duration (seconds)   30-300
--output     Output file path            .mp3, .wav, .mid
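
The parameters above map naturally onto an `argparse` definition. The sketch below is not the skill's actual `hum2song.py`; it only illustrates how the documented options and the 30-300 second range could be validated. The defaults for `--mood` and `--duration` are assumptions.

```python
import argparse

STYLES = ["pop", "rock", "jazz", "classical", "electronic"]
MOODS = ["upbeat", "calm", "energetic", "melancholic"]


def duration_seconds(value):
    """Parse a duration and enforce the documented 30-300 second range."""
    seconds = int(value)
    if not 30 <= seconds <= 300:
        raise argparse.ArgumentTypeError("duration must be between 30 and 300 seconds")
    return seconds


def build_parser():
    parser = argparse.ArgumentParser(prog="hum2song.py")
    parser.add_argument("--input", required=True, help="Input audio file")
    parser.add_argument("--style", choices=STYLES, default="pop", help="Music style")
    # Default mood and duration are assumed for illustration.
    parser.add_argument("--mood", choices=MOODS, default="upbeat", help="Song mood")
    parser.add_argument("--duration", type=duration_seconds, default=120,
                        help="Target duration in seconds (30-300)")
    parser.add_argument("--output", required=True, help="Output path (.mp3, .wav, .mid)")
    return parser


args = build_parser().parse_args(["--input", "my_humming.wav", "--style", "jazz",
                                  "--output", "my_song.mp3"])
print(args.style, args.mood, args.duration)
```

Using `choices` and a custom `type` function lets argparse reject unsupported styles or out-of-range durations before any audio processing starts.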

Response Format

When User Provides Audio

🎵 I'll convert your humming into a complete song!

**Processing Pipeline:**
1️⃣ Extracting melody from your audio...
2️⃣ Converting to MIDI notation...
3️⃣ Generating full arrangement...
4️⃣ Rendering final song...

⏱️ Estimated time: 2-5 minutes

**Settings:**
• Style: {style}
• Mood: {mood}
• Duration: {duration}s

Processing now... 🎧

When Complete

✅ Your song is ready!

📁 Output: {output_path}
🎼 Format: {format}
⏱️ Duration: {actual_duration}s

**What was done:**
• Extracted {note_count} notes from your humming
• Generated {instrument_count} instrument tracks
• Applied {style} style with {mood} mood

💡 You can:
• Adjust style/mood and regenerate
• Export as MIDI for editing in DAW
• Try different AI models for variation

Example Interactions

Example 1: User Hums Audio

User: [uploads humming audio]

AI: 🎵 I'll convert your humming into a complete song!

**Processing Pipeline:**
1️⃣ Extracting melody from your audio...
2️⃣ Converting to MIDI notation...
3️⃣ Generating full arrangement...
4️⃣ Rendering final song...

⏱️ Estimated time: 2-5 minutes

What style would you like?
• Pop (default)
• Rock
• Jazz
• Classical
• Electronic

Example 2: User Specifies Style

User: Turn this humming into a jazz song

AI: 🎷 Jazz transformation incoming!

Processing your humming with jazz style:
• Swing rhythm
• Walking bass line
• Piano chords
• Brush drums

[Processing...]

✅ Your jazz song is ready!
📁 Output: ~/Music/hum2song_jazz_20240321.mp3

Example 3: User Wants MIDI Output

User: Can I get the MIDI file to edit myself?

AI: 🎼 Absolutely!

I'll generate:
• Extracted MIDI (raw melody)
• Enhanced MIDI (quantized, cleaned)
• Full arrangement MIDI (all instruments)

All files will be in: ~/Music/hum2song_export/
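
The output locations in the examples above follow a recognizable pattern: a file under `~/Music` named after the style and the date. The sketch below reproduces that pattern as inferred from the single example path; the naming scheme and the `output_path` helper are assumptions, not the skill's confirmed behavior.

```python
from datetime import date
from pathlib import Path


def output_path(style, extension="mp3", day=None, music_dir=None):
    """Build a path like <music_dir>/hum2song_<style>_<YYYYMMDD>.<extension>."""
    day = day or date.today()
    music_dir = Path(music_dir) if music_dir else Path.home() / "Music"
    return music_dir / f"hum2song_{style}_{day:%Y%m%d}.{extension}"


print(output_path("jazz", day=date(2024, 3, 21)))
```

Passing `music_dir` explicitly is useful when the default `~/Music` directory does not exist or is not writable.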

Technical Details

Audio Processing

  • Input Formats: WAV, MP3, M4A, FLAC, OGG
  • Sample Rate: Automatically converted to 44.1kHz
  • Channels: Mono/Stereo → Mono for processing
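
The conversion to 44.1kHz mono can be done with ffmpeg, which is already a listed dependency. The sketch below only builds the command line; whether the skill's script invokes ffmpeg this way is an assumption, and `ffmpeg_normalize_command` is a hypothetical helper.

```python
def ffmpeg_normalize_command(src, dst):
    """Build an ffmpeg command that converts src to 44.1 kHz mono audio at dst."""
    return [
        "ffmpeg",
        "-y",            # overwrite dst if it already exists
        "-i", str(src),  # input file in any format ffmpeg can decode
        "-ac", "1",      # downmix to one channel (mono)
        "-ar", "44100",  # resample to 44.1 kHz
        str(dst),
    ]


cmd = ffmpeg_normalize_command("my_humming.m4a", "my_humming_mono.wav")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)`; using a list rather than a shell string avoids quoting problems with file names that contain spaces.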

MIDI Extraction

  • Model: Basic Pitch (Spotify, ICASSP 2022)
  • Pitch Range: C1 to C8
  • Note Detection: Polyphonic capable
  • Timing Resolution: 10ms
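
The pitch range above can be expressed in MIDI note numbers, which is convenient for discarding stray detections. The sketch below assumes the common convention in which C4 is MIDI note 60, so C1 is 24 and C8 is 108; the helper names are hypothetical.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
C1, C8 = 24, 108  # MIDI numbers under the C4 = 60 convention


def note_name(midi_number):
    """Convert a MIDI note number to a name such as "C4"."""
    return f"{NOTE_NAMES[midi_number % 12]}{midi_number // 12 - 1}"


def in_range(midi_number, low=C1, high=C8):
    """Return True if the note lies within [low, high], inclusive."""
    return low <= midi_number <= high


pitches = [20, 60, 69, 110]
kept = [p for p in pitches if in_range(p)]
print([note_name(p) for p in kept])  # ['C4', 'A4']
```

Hummed melodies normally sit well inside this range, so notes near either limit are worth inspecting before arrangement.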

Music Generation

ACE-Step Model:

  • Size: 1B parameters (base), 3B (large)
  • Training: Licensed music dataset
  • Output: 44.1kHz stereo
  • Latency: ~1s per second of audio on M1 Mac

SoundFont Synthesis:

  • No AI required
  • Real-time synthesis
  • High-quality instrument sounds
  • Deterministic output

Limitations

  • Requires local Python environment setup
  • ACE-Step needs ~4GB RAM for base model
  • Processing time: 2-5 minutes for a 2-minute song
  • Quality depends on humming clarity
  • Complex harmonies may not be fully captured

Privacy & Security

  ✅ All processing is local - Your audio never leaves your machine
  ✅ No cloud services - No API keys or external uploads
  ✅ Open source tools - Basic Pitch, ACE-Step, Pretty MIDI
  ✅ No data collection - Nothing is logged or transmitted


References

  • basic-pitch.md - Audio to MIDI extraction
  • ace-step.md - AI music generation
  • pretty_midi.md - MIDI processing
  • librosa.md - Audio analysis utilities

Technical Information

Attribute   Value
Name        Hum2Song
Slug        hum2song
Version     3.0.4
Category    Audio / Music Generation
Tags        music, audio, midi, ai-generation, local-processing
License     MIT-0

Note: This skill requires local setup of Python dependencies. All audio processing happens on your device for maximum privacy.

Security recommendations
This skill appears coherent and implements a local humming → MIDI → synthesis pipeline. Before installing or running it, consider the following:

  • Install dependencies in a dedicated virtualenv to avoid system-wide changes (pip install basic-pitch pretty_midi librosa soundfile numpy).
  • The ACE‑Step option is optional. If you enable it, you will (a) git clone the repository and (b) on first run the model may download ~4GB of weights. Verify that you have the disk space and bandwidth and that you trust the ACE‑Step source. The script adds ~/ace-step to sys.path and imports it; importing untrusted code from that directory can execute arbitrary code, so only clone from a trusted repo.
  • SoundFont synthesis requires an SF2 file; make sure you trust the SoundFont file source and path. If no SF2 is found, the script falls back to default synthesis.
  • The tool writes output to ~/Music by default and uses temp directories; check file locations and permissions if that matters.
  • Although this skill does not call external APIs by default, third-party libraries or optional components may perform network actions (model weight fetches). Review any third-party projects (ACE‑Step, basic-pitch) you install if you need a stricter privacy posture.

If you accept those points (especially trusting optional model sources), the skill is reasonable to use.
Functional analysis
Type: OpenClaw Skill
Name: hum2song
Version: 3.0.4

The Hum2Song skill is a legitimate tool for local audio-to-MIDI conversion and music synthesis. The primary script, `scripts/hum2song.py`, uses well-known open-source libraries (basic-pitch, pretty_midi, librosa) to process audio files entirely on the user's machine without any network exfiltration or suspicious command execution. The documentation in `SKILL.md` and the `references/` directory accurately describes the local-first design and provides clear instructions for dependency management without any evidence of prompt injection or malicious intent.
Capability assessment
Purpose & Capability
Name/description (local audio → MIDI → song) align with the included script and SKILL.md. Required tools (basic_pitch, pretty_midi, librosa, soundfile, fluidsynth/ffmpeg, optional ACE‑Step) are appropriate for the stated task and nothing unrelated (cloud creds, networking tokens, or unrelated system access) is requested.
Instruction Scope
Runtime instructions and the script operate on local files and libraries and explicitly state "no external API calls by default." The optional ACE‑Step flow will download model weights on first use (~4GB) as documented — this is outside the local-only guarantee in the sense that it uses network to fetch weights, but the SKILL.md explicitly warns and frames this as user-initiated. The script appends ~/ace-step to sys.path before importing ACE‑Step; importing code from that directory will execute whatever is present there, so users should only place trusted code in ~/ace-step or clone from a trusted repository.
Install Mechanism
There is no automated install spec bundled; SKILL.md gives manual install instructions (brew/apt, pip, optional git clone). This is low-risk. The only high-bandwidth download risk is the ACE‑Step model weights (~4GB) which the SKILL.md and references describe as downloaded on first use. All installs are user-initiated (pip/git).
Credentials
The skill requests no environment variables, credentials, or special config paths. It reads/writes typical user files (temporary work dir, ~/Music by default) — this is proportional for a local processing tool.
Persistence & Privilege
The skill is not always-enabled and does not request elevated privileges or modify other skills or global agent settings. It runs as a normal user script and only writes its own temporary/output files.
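How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install hum2song
  3. Once installed, invoke the Skill by name or trigger it with /hum2song
  4. Provide the inputs described in the Skill's parameter documentation to get structured output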
Version history
v3.0.4
Add explicit frontmatter description so ClawHub refreshes the public summary in English.
v3.0.3
Remove remaining Chinese from public references and metadata so the ClawHub page is fully English.
v3.0.2
Refresh the public skill page docs in English and align metadata for ClawHub.
v3.0.1
Clarify local-only processing. Remove external API references. Emphasize user choice for optional AI components.
v3.0.0
Restore full hum-to-song pipeline with local processing. Convert humming to complete songs using local AI models - no external API calls, all processing on device.
v2.0.0
Complete rewrite: Changed from fictional CLI tool to actual guidance skill for music recognition services
v1.0.1
Update version
v1.0.0
An AI music creation tool that turns hummed melodies into complete songs, with support for multiple styles and output formats
Metadata
Slug: hum2song
Version: 3.0.4
License: MIT-0
Total installs: 1
Current installs: 0
Versions: 8
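FAQ

What is Hum2Song?

Hum2Song turns a hummed or sung melody into a complete song with local audio processing, MIDI extraction, and optional AI-assisted arrangement, without uploa... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 248 total downloads to date.

How do I install Hum2Song?

Run the command "/install hum2song" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Hum2Song free?

Yes. Hum2Song is completely free and released under the MIT-0 license, so you can download, install, and use it freely.

Which platforms does Hum2Song support?

Hum2Song is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Hum2Song?

It is developed and maintained by haidong (@harrylabsj); the current version is v3.0.4.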
