Vocal Isolation, Background Music Removal then De-Noise

Name: Vocal Isolation, Background Music Removal then De-Noise
Author: speech2srt

by speech2srt · GitHub ↗ · v1.3.1 · MIT-0

cross-platform ⚠ suspicious

Downloads: 178

Install in OpenClaw

/install speech-isolate

Description

Vocal isolation and background music removal on a remote (free) L4 GPU. Triggers when the user says: isolate vocals, remove background music, extract voice, 提取人声 ("extract vocals"), 去除背...

Usage Guidance

This skill appears to do what it says (remote Demucs + ClearerVoice inference on Modal), but you should be cautious before installing or running it:

- Do NOT run isolate.py directly on your local machine unless you audit the code first. The code removes and replaces ~/.cache (shutil.rmtree on cache_dir) and manipulates ~/checkpoints (rmdir + symlink), which could delete or change files in your home directory if executed locally. SKILL.md does not mention these destructive filesystem changes.
- Prefer running via 'modal run' as intended (the code is designed for Modal remote containers). If you run it anywhere else, inspect and modify the symlink / rmtree logic so it cannot touch your real home directory.
- The image build pip-installs large ML packages (torch, demucs, clearvoice). That is expected for the task, but be aware that it pulls code and models from PyPI/HF at build time and runtime; if you require provenance, ask the author for exact package sources/versions or a reproducible image.
- The docs mention HF_TOKEN for higher-rate model downloads; you may need to provide it to avoid download failures. The skill does not declare it as required in its metadata.
- Ask the skill author to (a) explicitly document the symlink/deletion behavior and why it is safe within Modal, (b) avoid destructive operations on Path.home() or at least guard them so they only run inside the Modal environment, and (c) fix minor docs inconsistencies (npx mention, Python version mismatch).

If you can't get these clarifications, treat the skill as untrusted for local execution and only run it in a controlled environment (isolated account or throwaway container) after code review.
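The guard requested in point (b) could look roughly like the sketch below. It is not the skill's actual code: the function name `relink_cache` and the marker variable `SPEECH_ISOLATE_REMOTE` are hypothetical, and the real skill may detect its container in a different way. The point is only that the destructive steps are skipped unless the process has been explicitly marked as remote.

```python
import os
import shutil
from pathlib import Path

# Hypothetical marker variable (an assumption for this sketch); it would be
# set only inside the remote container image, never on a workstation.
REMOTE_MARKER = "SPEECH_ISOLATE_REMOTE"


def relink_cache(home: Path, volume_dir: Path, env=None) -> bool:
    """Replace home/.cache with a symlink to volume_dir.

    Does nothing and returns False unless the remote marker is set,
    so a local run cannot delete a real ~/.cache.
    """
    env = os.environ if env is None else env
    if env.get(REMOTE_MARKER) != "1":
        return False  # not marked as remote: leave the home directory alone

    cache_dir = home / ".cache"
    if cache_dir.is_symlink():
        cache_dir.unlink()        # drop an existing (possibly dangling) link
    elif cache_dir.exists():
        shutil.rmtree(cache_dir)  # destructive step, reached only when marked remote

    volume_dir.mkdir(parents=True, exist_ok=True)
    cache_dir.symlink_to(volume_dir, target_is_directory=True)
    return True
```

Passing `home` and `env` explicitly, rather than reading `Path.home()` and `os.environ` inside the function, also makes the behavior easy to check against a temporary directory before trusting it anywhere else.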

Capability Analysis

Type: OpenClaw Skill
Name: speech-isolate
Version: 1.3.1

The skill bundle provides a legitimate pipeline for vocal isolation and speech enhancement using the Modal cloud platform. The code in `isolate.py` and `src/images.py` correctly implements the two-stage processing (Demucs and ClearerVoice) and uses standard Modal CLI commands for data management as described in `SKILL.md`. No evidence of data exfiltration, malicious execution, or prompt injection was found.

Capability Assessment

✓ Purpose & Capability

Name/description claim vocal isolation on a remote GPU; the code uses Modal, Demucs and a speech-enhancement model (ClearerVoice) and creates Modal volumes/images — these are coherent with the stated purpose.

⚠ Instruction Scope

SKILL.md instructs uploading local audio/video to a Modal volume and running the pipeline remotely — that fits the purpose. However, the runtime code modifies Path.home() by deleting/symlinking ~/.cache and manipulating ~/checkpoints (shutil.rmtree(cache_dir) and rmdir() followed by symlink_to()). Those are not documented in SKILL.md and are potentially destructive if the script is executed locally (instead of inside the Modal container). SKILL.md also contains small inconsistencies (mentions 'npx skills add' and advises Python 3.9+ while the image specifies Python 3.11).
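A quick way to start the audit is to list every call to the operations named above before reading the file in full. The following is a minimal sketch using only the standard library; the helper name `find_risky_calls` and the sample source are illustrative, not taken from the skill, and a name-based scan like this only flags candidates for manual review.

```python
import ast

# Method/function names flagged in the assessment above.
RISKY = {"rmtree", "rmdir", "symlink_to"}


def find_risky_calls(source: str):
    """Return sorted (line number, call name) pairs for risky calls in source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            # Covers both `shutil.rmtree(...)` (Attribute) and `rmtree(...)` (Name).
            if isinstance(func, ast.Attribute):
                name = func.attr
            else:
                name = getattr(func, "id", None)
            if name in RISKY:
                hits.append((node.lineno, name))
    return sorted(hits)


# Illustrative input only; in practice, pass the text of the file under review.
SAMPLE = '''
import shutil
from pathlib import Path
cache_dir = Path.home() / ".cache"
shutil.rmtree(cache_dir)
cache_dir.symlink_to("/vol/cache")
'''

if __name__ == "__main__":
    for lineno, name in find_risky_calls(SAMPLE):
        print(f"line {lineno}: {name}")
```

To scan a real file, something like `find_risky_calls(Path("isolate.py").read_text())` would apply, assuming the file sits in the current directory.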

ℹ Install Mechanism

There is no external arbitrary download URL; remote image builds use apt_install and pip_install of several packages (ffmpeg, clearvoice, torch, torchaudio, demucs, soundfile). This is expected for ML inference but pulls fairly large native/PyPI packages into the remote image (normal for the task).
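If provenance matters, one low-effort check is to record which versions of these pip packages actually ended up in an environment. The sketch below assumes it is run inside the environment being inspected (for example, a container built from the same package list); the helper name `snapshot_versions` is hypothetical, and ffmpeg is left out because it is installed through apt rather than pip.

```python
from importlib import metadata

# pip package names taken from the install notes above.
PACKAGES = ["clearvoice", "torch", "torchaudio", "demucs", "soundfile"]


def snapshot_versions(names=PACKAGES):
    """Return {package: installed version, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in snapshot_versions().items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

The printed `name==version` lines can be kept next to the skill as a record of what a given build pulled in.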

ℹ Credentials

The skill declares no required environment variables or credentials. The error-handling doc mentions HF_TOKEN may be needed for model download rate limits (not required but useful). The image sets only non-sensitive env flags (TQDM_DISABLE, HF_HUB_DISABLE_PROGRESS). No unrelated credentials are requested.
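Because HF_TOKEN is optional and undeclared, a small pre-flight check can surface it before a run, along with the Python and Modal CLI prerequisites mentioned elsewhere on this page. This is a sketch under assumptions: the function name `preflight` is hypothetical, the 3.9 floor follows the SKILL.md advice quoted above, and the check only looks for a `modal` executable on PATH without validating that it is configured.

```python
import os
import shutil
import sys


def preflight(env=None, version_info=None, which=shutil.which):
    """Return (problems, notes): problems block a run, notes are advisory."""
    env = os.environ if env is None else env
    version_info = sys.version_info if version_info is None else version_info

    problems, notes = [], []
    if tuple(version_info[:2]) < (3, 9):
        problems.append("Python 3.9+ is advised by SKILL.md")
    if which("modal") is None:
        problems.append("modal CLI not found on PATH")
    if not env.get("HF_TOKEN"):
        # Optional per the skill's error-handling doc, so this is only a note.
        notes.append("HF_TOKEN not set; model downloads may be rate limited")
    return problems, notes


if __name__ == "__main__":
    problems, notes = preflight()
    for line in problems:
        print("problem:", line)
    for line in notes:
        print("note:", line)
```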

✓ Persistence & Privilege

The skill sets always:false and uses standard Modal App/Image/Volume objects. It creates Modal volumes and an image (remote resources) but does not request permanent agent-wide privileges or modify other skills. The main privilege concern is the filesystem operations in the code (see Instruction Scope above).

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install speech-isolate
After installation, invoke the skill by name or use /speech-isolate
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.3.1

No user-facing changes or updates in this version.
- Version bump from v1.3.0 to v1.3.1 only; no functional or documentation changes detected.

v1.3.0

- Version bumped to 1.3.0.
- No code or file changes detected.
- Documentation version updated to reflect new release.

v1.1.2

No changes detected in this version.
- Version remains at 1.2.0
- No file or documentation updates were made.

v1.1.1

Speech Isolate v1.2.0 introduces speech enhancement with an updated pipeline.
- Upgraded to a two-stage process: first vocal separation (Demucs), then speech enhancement/noise removal (ClearerVoice MossFormer2).
- Output files now use the suffix _isolated.wav instead of _vocals.wav.
- Expanded the description and updated workflow to reflect speech enhancement step.
- Other instructions and setup remain the same.

v1.1.0

- Improved documentation with detailed step-by-step workflow and user prompts for file selection and processing.
- Clarified setup and prerequisite checks for Python and Modal CLI.
- Enhanced guidance on directory preservation and output organization.
- Added explicit error handling references and reporting instructions.
- Updated triggers and descriptions for broader language support.

Metadata

Slug speech-isolate

Version 1.3.1

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 5

Frequently Asked Questions

What is Vocal Isolation, Background Music Removal then De-Noise?

Vocal isolation and background music removal on a remote (free) L4 GPU. Triggers when the user says: isolate vocals, remove background music, extract voice, 提取人声 ("extract vocals"), 去除背... It is an AI Agent Skill for Claude Code / OpenClaw, with 178 downloads so far.

How do I install Vocal Isolation, Background Music Removal then De-Noise?

Run "/install speech-isolate" in the OpenClaw or Claude Code chat to install it in one step. Running the skill afterwards relies on the Python and Modal CLI prerequisites noted in the version history.

Is Vocal Isolation, Background Music Removal then De-Noise free?

Yes, Vocal Isolation, Background Music Removal then De-Noise is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Vocal Isolation, Background Music Removal then De-Noise support?

Vocal Isolation, Background Music Removal then De-Noise is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Vocal Isolation, Background Music Removal then De-Noise?

It is built and maintained by speech2srt (@speech2srt); the current version is v1.3.1.
