
melo-tts-metadata-creator

Name: melo-tts-metadata-creator
Author: wangminrui2022

By 顶尖王牌程序员 · GitHub · v1.0.6 · MIT-0

cross-platform ⚠ suspicious

Total downloads: 249

Install in OpenClaw

/install melo-tts-metadata-creator

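Description

Triggers automatically when the user needs to generate a metadata.list file for **MeloTTS** training or fine-tuning. It specifically handles .wav audio files and their corresponding .txt transcripts, and automatically generates a metadata.list that conforms to the latest official MeloTTS standard (format: audio path|speaker|language|text). Supports single-speaker and multi-speaker modes: - wa...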
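The four-field line format named in the description can be illustrated with a short sketch. This is not the skill's actual code: the helper name `format_metadata_line`, the sample path, the speaker name and the language code `EN` are assumptions chosen for the example.

```python
def format_metadata_line(audio_path: str, speaker: str, language: str, text: str) -> str:
    """Join the four fields with '|' as in: audio path|speaker|language|text."""
    fields = [audio_path, speaker, language, text.strip()]
    for field in fields:
        # A literal '|' or a newline inside a field would corrupt the record.
        if "|" in field or "\n" in field:
            raise ValueError(f"field contains a reserved character: {field!r}")
    return "|".join(fields)


# Hypothetical sample values, for illustration only.
line = format_metadata_line("data/spk1/0001.wav", "spk1", "EN", "Hello world.\n")
print(line)
```

Rejecting fields that contain the separator is a design choice of this sketch; a real generator might instead escape or replace such characters.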

Safe usage recommendations

This skill's primary feature (building MeloTTS metadata.list, optional Whisper transcription) appears legitimate, but it will automatically modify your Python environment at import/runtime: it downgrades setuptools, issues pip installs, detects GPUs, creates a venv and may restart the script inside it. These actions can affect your system Python or install large packages (torch, whisper). Before installing or running:
1) Review and, if necessary, modify the code so package installs occur only after a controlled venv is created (move ensure_package.pip calls after venv setup).
2) Run the skill inside an isolated environment (container or dedicated virtual machine) or a user-created venv you trust.
3) Audit network access and disk writes (models/, logs/, venv/) and ensure you accept large downloads.
4) If you only need metadata generation and already have transcriptions, consider running a trimmed-down version that skips automatic package installs/Whisper to avoid environment changes.
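The first recommendation (install packages only after a controlled venv exists) could look roughly like the sketch below. It uses only the standard library; the function names and the `venv` directory name are assumptions for illustration and do not describe the skill's `env_manager.py`.

```python
import os
import subprocess
import venv
from pathlib import Path


def venv_python(venv_dir: Path) -> Path:
    """Return the interpreter path inside a venv (the layout differs on Windows)."""
    if os.name == "nt":
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def pip_install_command(venv_dir: Path, packages: list[str]) -> list[str]:
    """Build a pip command bound to the venv interpreter, never the host one."""
    return [str(venv_python(venv_dir)), "-m", "pip", "install", *packages]


def setup_isolated_env(venv_dir: Path, packages: list[str]) -> None:
    """Create the venv first, then install into it (requires network access)."""
    if not venv_python(venv_dir).exists():
        venv.create(venv_dir, with_pip=True)
    subprocess.run(pip_install_command(venv_dir, packages), check=True)


# Only build and show the command here; this line installs nothing.
print(pip_install_command(Path("venv"), ["openai-whisper"]))
```

Because every pip call goes through the venv's own interpreter, the host Python's setuptools and site-packages are left untouched.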

Functional analysis

Type: OpenClaw Skill
Name: melo-tts-metadata-creator
Version: 1.0.6

The skill is designed to generate metadata for MeloTTS but contains high-risk environment management behaviors. Specifically, 'ensure_package.py' automatically executes 'pip install --force-reinstall' for setuptools upon being imported, and 'env_manager.py' uses 'subprocess' to create virtual environments, detect GPU hardware via 'nvidia-smi', and install large dependencies like PyTorch. While these capabilities are plausibly required for the stated purpose of audio processing and transcription, the use of automated shell execution and package management represents a significant attack surface and meets the threshold for a suspicious classification despite no clear evidence of intentional malice.

Capability assessment

ℹ Purpose & Capability

The code's functionality (finding wav/txt, optional Whisper transcription, producing metadata.list) matches the description. However, additional capabilities — heavy dependency management, automatic venv creation, CUDA detection and installing GPU/CPU PyTorch wheels and other audio libraries — are broader than the narrow purpose of producing a metadata.list and should be justified.

⚠ Instruction Scope

SKILL.md instructs running the generator script, which is reasonable, but the actual code triggers package installation and setuptools downgrades at module-import time (ensure_package.fix_setuptools() and ensure_package.pip calls occur during import). That means simply importing or running the script can modify the host Python environment, examine system tools (nvidia-smi), create directories (models, logs, venv) and restart the process — scope creep beyond just reading audio/text and writing metadata.
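One common way to avoid import-time side effects is to move them into a function that runs only from an explicit entry point. The sketch below shows that pattern under stated assumptions: `ensure_dependencies`, `fake_installer` and `INSTALL_LOG` are hypothetical names, and the installer is a stand-in that only records what it was asked to do.

```python
from typing import Callable, List

INSTALL_LOG: List[str] = []  # records what the stand-in installer was asked to do


def fake_installer(package: str) -> None:
    """Stand-in for a real installer so this example has no side effects."""
    INSTALL_LOG.append(package)


def ensure_dependencies(packages: List[str], install: Callable[[str], None]) -> None:
    """Install dependencies only when called explicitly, never at import time."""
    for package in packages:
        install(package)


def main() -> None:
    ensure_dependencies(["openai-whisper", "librosa"], fake_installer)
    # Metadata generation would start here.


# Importing this module does nothing; only direct execution triggers main().
if __name__ == "__main__":
    main()
```

With this layout, another tool can import the module to inspect or reuse its helpers without modifying the host environment.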

⚠ Install Mechanism

There is no formal install spec; instead the code calls pip at runtime to install packages (openai-whisper, torch, torchaudio, audio-separator, librosa, etc.) and even forces setuptools to a legacy version. These are network downloads from PyPI / PyTorch indexes and will write to disk. The runtime install operations run automatically and are not constrained to an explicitly created venv (see ordering concern below).
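A less intrusive alternative to installing at runtime is to report what is missing and let the user decide. The sketch below is an illustration rather than the skill's behavior; the helper name `missing_modules` and the choice of modules checked are assumptions.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# These are import names, which can differ from PyPI package names
# (for example, the openai-whisper package is imported as "whisper").
required = ["whisper", "torch", "torchaudio", "librosa"]
missing = missing_modules(required)
if missing:
    print("Missing modules; install them in a venv of your choice:", missing)
```

`importlib.util.find_spec` only locates a top-level module, so the check itself downloads nothing and writes nothing to disk.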

⚠ Credentials

The skill declares no required env vars, but it reads/sets RUNNING_IN_VENV and probes system paths (nvidia-smi). More importantly, top-level imports call functions that alter the Python environment (downgrading setuptools, pip installing packages) which is disproportionate to generating a metadata file and unexpected given the SKILL.md promise to use a models/ directory and venv. No secret exfiltration code is present, but the global package modifications are intrusive.
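For readers auditing the code, a script can tell whether it is already inside a virtual environment without relying on a custom flag. The sketch below shows the standard-library check next to a flag lookup; the accepted flag value "1" is an assumption, since the review does not state what value the skill writes to RUNNING_IN_VENV.

```python
import os
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv or virtualenv."""
    # In a venv, sys.prefix points at the venv while sys.base_prefix
    # still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def marked_as_in_venv(environ=os.environ) -> bool:
    """Check the RUNNING_IN_VENV flag; the value "1" is an assumption."""
    return environ.get("RUNNING_IN_VENV") == "1"


print("inside a virtual environment:", in_virtualenv())
```

The prefix comparison reflects the interpreter's real state, whereas an environment variable can be set by any parent process.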

ℹ Persistence & Privilege

always:false and no cross-skill config changes — the skill does not request permanent platform elevation. However it creates a venv and writes logs/models/metadata into the skill root (ProjectPaths.MODEL_DIR, LOG_DIR, VENV_DIR). Combined with automatic package installs and a restart into the venv, this grants sustained filesystem and environment presence on the host which the user should be aware of.

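How to use

Make sure OpenClaw is installed (local or Docker deployment)
Enter the install command in the chat box: /install melo-tts-metadata-creator
Once installed, invoke the skill by name or trigger it with /melo-tts-metadata-creator
Provide the required inputs according to the skill's parameter documentation to get structured output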

Version history

v1.0.6

No changes detected in this version.
- Version 1.0.6 contains no file or documentation changes from the previous release.

v1.0.5

Version 1.0.5 of melo-tts-metadata-creator
- No code or documentation file changes detected in this update.
- Functionality, parameters, and documentation remain identical to the previous version.

v1.0.4

- No code or documentation changes in this release.
- Version number updated to 1.0.4.

v1.0.3

Version 1.0.3 Changelog
- Clarified and expanded the skill description to highlight usage scenarios, trigger keywords, and restrictions to MeloTTS-related metadata generation only.
- Added a list of example user prompts for skill invocation.
- Included clearer parameter extraction guidelines to help interpret user requests.
- Improved metadata with user-invocable flag.
- No code or logic changes; updates are documentation only.

v1.0.2

- Updated command-line usage to use scripts/generate_metadata_list.py instead of scripts/melo_metadata_gen.py
- No other functional changes detected

v1.0.1

No visible changes in this version.
- No file changes detected between versions 1.0.0 and 1.0.1.
- Behavior, features, and documentation remain unchanged.

v1.0.0

- Initial release of melo-tts-metadata-creator.
- Generates metadata.list for MeloTTS training/fine-tuning, supporting official latest standards.
- Supports .wav files with corresponding .txt transcripts, or auto-transcription with Whisper if .txt is missing.
- Handles both single-speaker and multi-speaker scenarios; recognizes speakers from first-level subdirectories or via --speaker argument.
- Automatically manages Whisper model downloads and error handling for failed downloads or transcriptions.
- Outputs metadata.list in UTF-8 (no BOM), supporting separated storage of audio and transcript files.
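The behavior listed for v1.0.0 (pairing .wav and .txt files, taking the speaker from the first-level subdirectory or an override, writing UTF-8 without a BOM) can be approximated as follows. This is a simplified sketch, not the skill's implementation: the function names, the `default` fallback speaker and the assumption that each transcript sits next to its audio file are all illustrative, and the Whisper fallback is omitted.

```python
from pathlib import Path
from typing import List, Optional


def collect_entries(root: Path, language: str, speaker: Optional[str] = None) -> List[str]:
    """Pair each .wav with a same-named .txt and build metadata lines."""
    lines = []
    for wav in sorted(root.rglob("*.wav")):
        txt = wav.with_suffix(".txt")
        if not txt.exists():
            continue  # the skill would transcribe with Whisper; this sketch skips
        rel = wav.relative_to(root)
        # Multi-speaker mode: the first-level subdirectory names the speaker.
        # Files directly under root have no subdirectory, so use a fallback.
        name = speaker or (rel.parts[0] if len(rel.parts) > 1 else "default")
        # Collapse whitespace so the transcript stays on a single line.
        text = " ".join(txt.read_text(encoding="utf-8-sig").split())
        lines.append(f"{wav.as_posix()}|{name}|{language}|{text}")
    return lines


def write_metadata(lines: List[str], out_path: Path) -> None:
    # encoding="utf-8" writes no BOM (unlike "utf-8-sig").
    out_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

A typical call would be `write_metadata(collect_entries(Path("dataset"), "EN"), Path("metadata.list"))`, where the `dataset` directory and the `EN` code are placeholders.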

Metadata

Slug melo-tts-metadata-creator

Version 1.0.6

License MIT-0

Total installs 0

Current installs 0

Version count 7

FAQ

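What is melo-tts-metadata-creator?

Triggers automatically when the user needs to generate a metadata.list file for **MeloTTS** training or fine-tuning. It specifically handles .wav audio files and their corresponding .txt transcripts, and automatically generates a metadata.list that conforms to the latest official MeloTTS standard (format: audio path|speaker|language|text). Supports single-speaker and multi-speaker modes: - wa... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 249 total downloads to date.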

How do I install melo-tts-metadata-creator?

Run the command "/install melo-tts-metadata-creator" in the OpenClaw or Claude Code chat box for a one-click install; no additional configuration is required.

Is melo-tts-metadata-creator free?

Yes. melo-tts-metadata-creator is completely free under the MIT-0 license and can be freely downloaded, installed, and used.

Which platforms does melo-tts-metadata-creator support?

melo-tts-metadata-creator is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed melo-tts-metadata-creator?

It is developed and maintained by 顶尖王牌程序员 (@wangminrui2022); the current version is v1.0.6.