Downloads: 249
Stars: 0
Active Installs: 0
Versions: 7
Install in OpenClaw
/install melo-tts-metadata-creator
Description
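Triggers automatically when the user needs to generate a metadata.list file for **MeloTTS** training or fine-tuning. It handles .wav audio files and their corresponding .txt transcripts, and automatically generates a metadata.list that conforms to the latest official MeloTTS standard (format: audio path|speaker|language|text). Supports single-speaker and multi-speaker modes: - wa...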
Usage Guidance
This skill's primary feature (building MeloTTS metadata.list, optional Whisper transcription) appears legitimate, but it will automatically modify your Python environment at import/runtime: it downgrades setuptools, issues pip installs, detects GPUs, creates a venv and may restart the script inside it. These actions can affect your system Python or install large packages (torch, whisper). Before installing or running:
1) Review and, if necessary, modify the code so package installs occur only after a controlled venv is created (move ensure_package.pip calls after venv setup).
2) Run the skill inside an isolated environment (container or dedicated virtual machine) or a user-created venv you trust.
3) Audit network access and disk writes (models/, logs/, venv/) and ensure you accept large downloads.
4) If you only need metadata generation and already have transcriptions, consider running a trimmed-down version that skips automatic package installs/Whisper to avoid environment changes.
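The ordering recommended in the guidance (create the venv first, then install into it) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the skill's actual code; the function names `venv_python`, `pip_install_command`, and `install_in_venv` are hypothetical.

```python
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(venv_dir: Path) -> Path:
    """Return the interpreter path inside a venv (the layout differs on Windows)."""
    if sys.platform == "win32":
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def pip_install_command(venv_dir: Path, packages: list[str]) -> list[str]:
    """Build a pip command that targets the venv interpreter, never the host one."""
    return [str(venv_python(venv_dir)), "-m", "pip", "install", *packages]


def install_in_venv(venv_dir: Path, packages: list[str]) -> None:
    """Create the venv first, then install into it (needs network access)."""
    if not venv_python(venv_dir).exists():
        venv.EnvBuilder(with_pip=True).create(venv_dir)
    subprocess.run(pip_install_command(venv_dir, packages), check=True)
```

Calling `install_in_venv(Path("venv"), ["openai-whisper"])` would leave the host interpreter's site-packages untouched, because pip is always invoked through the venv's own `python`.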
Capability Analysis
Type: OpenClaw Skill
Name: melo-tts-metadata-creator
Version: 1.0.6
The skill is designed to generate metadata for MeloTTS but contains high-risk environment management behaviors. Specifically, 'ensure_package.py' automatically executes 'pip install --force-reinstall' for setuptools upon being imported, and 'env_manager.py' uses 'subprocess' to create virtual environments, detect GPU hardware via 'nvidia-smi', and install large dependencies like PyTorch. While these capabilities are plausibly required for the stated purpose of audio processing and transcription, the use of automated shell execution and package management represents a significant attack surface and meets the threshold for a suspicious classification despite no clear evidence of intentional malice.
Capability Assessment
Purpose & Capability
The code's functionality (finding wav/txt, optional Whisper transcription, producing metadata.list) matches the description. However, additional capabilities — heavy dependency management, automatic venv creation, CUDA detection and installing GPU/CPU PyTorch wheels and other audio libraries — are broader than the narrow purpose of producing a metadata.list and should be justified.
Instruction Scope
SKILL.md instructs running the generator script, which is reasonable, but the actual code triggers package installation and setuptools downgrades at module-import time (ensure_package.fix_setuptools() and ensure_package.pip calls occur during import). That means simply importing or running the script can modify the host Python environment, examine system tools (nvidia-smi), create directories (models, logs, venv) and restart the process — scope creep beyond just reading audio/text and writing metadata.
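One way to avoid import-time side effects is to only report missing dependencies and to keep every action behind a `__main__` guard. The sketch below is an assumption about how such a check could look, not the skill's code; `missing_packages` is a hypothetical helper and the module list is illustrative.

```python
import importlib.util


def missing_packages(module_names):
    """Return the top-level modules that cannot be found, without installing anything."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


def main():
    # Illustrative module list; "whisper" is the import name of openai-whisper.
    missing = missing_packages(["whisper", "torch", "librosa"])
    if missing:
        print("Missing packages (install them yourself):", ", ".join(missing))
    else:
        print("All optional dependencies are importable.")


if __name__ == "__main__":
    # The check runs only when the file is executed as a script, never on import.
    main()
```

With this layout, importing the module defines functions and nothing else; the decision to install anything stays with the user.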
Install Mechanism
There is no formal install spec; instead the code calls pip at runtime to install packages (openai-whisper, torch, torchaudio, audio-separator, librosa, etc.) and even forces setuptools to a legacy version. These are network downloads from PyPI / PyTorch indexes and will write to disk. The runtime install operations run automatically and are not constrained to an explicitly created venv (see ordering concern below).
Credentials
The skill declares no required env vars, but it reads/sets RUNNING_IN_VENV and probes system paths (nvidia-smi). More importantly, top-level imports call functions that alter the Python environment (downgrading setuptools, pip installing packages) which is disproportionate to generating a metadata file and unexpected given the SKILL.md promise to use a models/ directory and venv. No secret exfiltration code is present, but the global package modifications are intrusive.
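For comparison, environment probing can be done in a read-only way. The following sketch checks for a venv and for `nvidia-smi` on PATH without executing or installing anything. The function names are hypothetical, and the meaning of `RUNNING_IN_VENV` is assumed from the assessment above rather than taken from the skill's source.

```python
import os
import shutil
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv created by the venv module."""
    return sys.prefix != sys.base_prefix


def nvidia_smi_path():
    """Locate nvidia-smi on PATH without executing it; None if it is absent."""
    return shutil.which("nvidia-smi")


def environment_report(env=None) -> dict:
    """Collect read-only facts about the environment; nothing is modified."""
    env = os.environ if env is None else env
    return {
        "in_virtualenv": in_virtualenv(),
        # Variable named in the assessment; its exact semantics are an assumption.
        "RUNNING_IN_VENV": env.get("RUNNING_IN_VENV"),
        "nvidia_smi": nvidia_smi_path(),
    }
```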
Persistence & Privilege
The skill sets always:false and makes no cross-skill config changes, so it does not request permanent platform elevation. However, it creates a venv and writes logs, models, and metadata into the skill root (ProjectPaths.MODEL_DIR, LOG_DIR, VENV_DIR). Combined with automatic package installs and a restart into the venv, this gives the skill a sustained filesystem and environment presence on the host, which the user should be aware of.
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install melo-tts-metadata-creator
- After installation, invoke the skill by name or use /melo-tts-metadata-creator
- Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.6
No changes detected in this version.
- Version 1.0.6 contains no file or documentation changes from the previous release.
v1.0.5
Version 1.0.5 of melo-tts-metadata-creator
- No code or documentation file changes detected in this update.
- Functionality, parameters, and documentation remain identical to the previous version.
v1.0.4
- No code or documentation changes in this release.
- Version number updated to 1.0.4.
v1.0.3
Version 1.0.3 Changelog
- Clarified and expanded the skill description to highlight usage scenarios, trigger keywords, and restrictions to MeloTTS-related metadata generation only.
- Added a list of example user prompts for skill invocation.
- Included clearer parameter extraction guidelines to help interpret user requests.
- Improved metadata with user-invocable flag.
- No code or logic changes; updates are documentation only.
v1.0.2
- Updated command-line usage to use scripts/generate_metadata_list.py instead of scripts/melo_metadata_gen.py
- No other functional changes detected
v1.0.1
No visible changes in this version.
- No file changes detected between versions 1.0.0 and 1.0.1.
- Behavior, features, and documentation remain unchanged.
v1.0.0
- Initial release of melo-tts-metadata-creator.
- Generates metadata.list for MeloTTS training/fine-tuning, supporting official latest standards.
- Supports .wav files with corresponding .txt transcripts, or auto-transcription with Whisper if .txt is missing.
- Handles both single-speaker and multi-speaker scenarios; recognizes speakers from first-level subdirectories or via --speaker argument.
- Automatically manages Whisper model downloads and error handling for failed downloads or transcriptions.
- Outputs metadata.list in UTF-8 (no BOM), supporting separated storage of audio and transcript files.
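The core behavior listed for v1.0.0 can be approximated without any package installs when transcripts already exist. The sketch below is a hedged illustration, not the skill's implementation: `build_metadata_lines` and `write_metadata` are hypothetical names, the `EN` language code and the default speaker name are assumptions, and .wav files without a .txt are simply skipped instead of being sent to Whisper.

```python
from pathlib import Path


def build_metadata_lines(audio_root, language="EN", default_speaker="speaker"):
    """Pair each .wav with a same-named .txt and emit 'path|speaker|language|text'."""
    root = Path(audio_root)
    lines = []
    for wav in sorted(root.rglob("*.wav")):
        txt = wav.with_suffix(".txt")
        if not txt.exists():
            continue  # assumption: skip instead of transcribing with Whisper
        # Collapse whitespace so each transcript occupies exactly one line.
        text = " ".join(txt.read_text(encoding="utf-8").split())
        if not text:
            continue
        rel = wav.relative_to(root)
        # Multi-speaker mode: the first-level subdirectory names the speaker.
        speaker = rel.parts[0] if len(rel.parts) > 1 else default_speaker
        lines.append(f"{wav.as_posix()}|{speaker}|{language}|{text}")
    return lines


def write_metadata(lines, out_path):
    """Write the list as UTF-8; the plain 'utf-8' codec emits no BOM."""
    with open(out_path, "w", encoding="utf-8", newline="\n") as f:
        f.write("\n".join(lines) + "\n")
```

A transcript containing a literal `|` would break the four-field format, so a fuller version would need to reject or escape such text.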
Frequently Asked Questions
What is melo-tts-metadata-creator?
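It triggers automatically when the user needs to generate a metadata.list file for **MeloTTS** training or fine-tuning. It handles .wav audio files and their corresponding .txt transcripts, and automatically generates a metadata.list that conforms to the latest official MeloTTS standard (format: audio path|speaker|language|text). Supports single-speaker and multi-speaker modes: - wa... It is an AI Agent Skill for Claude Code / OpenClaw, with 249 downloads so far.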
How do I install melo-tts-metadata-creator?
Run "/install melo-tts-metadata-creator" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is melo-tts-metadata-creator free?
Yes, melo-tts-metadata-creator is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does melo-tts-metadata-creator support?
melo-tts-metadata-creator is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created melo-tts-metadata-creator?
It is built and maintained by 顶尖王牌程序员 (@wangminrui2022); the current version is v1.0.6.