Description

Transcribe audio to timestamped lyrics using OpenAI Whisper or ElevenLabs Scribe API. Outputs LRC, SRT, or JSON with word-level timestamps. Use when users want to transcribe songs, generate LRC files, or extract lyrics with timestamps from audio.

Usage Instructions (SKILL.md)

Lyrics Transcription Skill

Name: acestep-lyrics-transcription
Author: dumoedss

Transcribe audio files to timestamped lyrics (LRC/SRT/JSON) via OpenAI Whisper or ElevenLabs Scribe API.

API Key Setup Guide

Before transcribing, you MUST check whether the user's API key is configured. Run the following command to check:

cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/" && bash ./scripts/acestep-lyrics-transcription.sh config --check-key

This command only reports whether the active provider's API key is set or empty — it does NOT print the actual key value. NEVER read or display the user's API key content. Do not use config --get on key fields or read config.json directly. The config --list command is safe — it automatically masks API keys as *** in output.
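The check-without-revealing behaviour described above can be pictured with a minimal Python sketch. It is an illustration only, not the script's actual code: the helper names `mask_keys` and `key_is_set` are hypothetical, the config shape is inferred from the options documented for scripts/config.json, and leaving empty keys unmasked is an assumption.

```python
import copy


def mask_keys(config, mask="***"):
    """Return a copy of config with every non-empty `api_key` value replaced by mask."""
    masked = copy.deepcopy(config)
    for key, value in masked.items():
        if isinstance(value, dict):
            masked[key] = mask_keys(value, mask)
        elif key == "api_key" and value:
            masked[key] = mask
    return masked


def key_is_set(config):
    """Report whether the active provider has a non-empty key, without exposing it."""
    provider = config.get("provider", "")
    section = config.get(provider, {})
    return bool(isinstance(section, dict) and section.get("api_key"))


# Hypothetical config contents used only for this illustration.
config = {
    "provider": "openai",
    "openai": {"api_key": "sk-example", "model": "whisper-1"},
    "elevenlabs": {"api_key": "", "model": "scribe_v2"},
}

print(mask_keys(config))   # key values are masked, other fields are unchanged
print(key_is_set(config))  # only a yes/no answer leaves the function
```

The point of the design is that callers only ever see a boolean or a masked copy, so the raw key never reaches the conversation.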

If the command reports the key is empty, you MUST stop and guide the user to configure it before proceeding. Do NOT attempt transcription without a valid key — it will fail.

Use AskUserQuestion to ask the user to provide their API key, with the following options and guidance:

Tell the user which provider is currently active (openai or elevenlabs) and that its API key is not configured. Explain that transcription cannot proceed without it.
Provide clear instructions on where to obtain a key:
- OpenAI: Get an API key at https://platform.openai.com/api-keys — requires an OpenAI account with billing enabled. The Whisper API costs ~$0.006/min.
- ElevenLabs: Get an API key at https://elevenlabs.io/app/settings/api-keys — requires an ElevenLabs account. Free tier includes limited credits.
Also offer the option to switch to the other provider if they already have a key for it.

Once the user provides the key, configure it using:

cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/" && bash ./scripts/acestep-lyrics-transcription.sh config --set <provider>.api_key <KEY>

If the user wants to switch providers, also run:

cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/" && bash ./scripts/acestep-lyrics-transcription.sh config --set provider <provider_name>

After configuring, re-run config --check-key to verify the key is set before proceeding.

If the API key is already configured, proceed directly to transcription without asking.

Quick Start

# 1. cd to this skill's directory
cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/"

# 2. Configure API key (choose one)
./scripts/acestep-lyrics-transcription.sh config --set openai.api_key sk-...
# or
./scripts/acestep-lyrics-transcription.sh config --set elevenlabs.api_key ...
./scripts/acestep-lyrics-transcription.sh config --set provider elevenlabs

# 3. Transcribe
./scripts/acestep-lyrics-transcription.sh transcribe --audio /path/to/song.mp3 --language zh

# 4. Output saved to: {project_root}/acestep_output/<filename>.lrc

Prerequisites

curl, jq, python3 (or python)
An API key for OpenAI or ElevenLabs

Script Usage

./scripts/acestep-lyrics-transcription.sh transcribe --audio <file> [options]

Options:
  -a, --audio      Audio file path (required)
  -l, --language   Language code (zh, en, ja, etc.)
  -f, --format     Output format: lrc, srt, json (default: lrc)
  -p, --provider   API provider: openai, elevenlabs (overrides config)
  -o, --output     Output file path (default: acestep_output/<filename>.lrc)

Post-Transcription Lyrics Correction (MANDATORY)

CRITICAL: After transcription, you MUST manually correct the LRC file before using it for MV rendering. Transcription models frequently produce errors on sung lyrics:

Proper nouns: "ACE-Step" → "AC step", "Spotify" → "spot a fly"
Similar-sounding words: "arrives" → "eyes", "open source" → "open sores"
Merged/split words: "lighting up" → "lightin' nup"

Correction Workflow

1. Read the transcribed LRC file using the Read tool.
2. Read the original lyrics from the ACE-Step output JSON file.
3. Use original lyrics as a whole reference: Do NOT attempt line-by-line alignment, because transcription often splits, merges, or reorders lines differently from the original. Instead, read the original lyrics in full to understand the correct wording, then scan each LRC line and fix any misrecognized words based on what the original lyrics say.
4. Fix transcription errors: Replace misrecognized words with the correct original words, keeping the timestamps intact.
5. Write the corrected LRC back using the Write tool.

What to Correct

Replace misrecognized words with their correct original versions
Keep all [MM:SS.cc] timestamps exactly as-is (timestamps from transcription are accurate)
Do NOT add structure tags like [Verse] or [Chorus] — the LRC should only have timestamped text lines

Example

Transcribed (wrong):

[00:46.96]AC step alive,
[00:50.80]one point five eyes.

Original lyrics reference:

ACE-Step alive
One point five arrives

Corrected (right):

[00:46.96]ACE-Step alive,
[00:50.80]One point five arrives.
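
Because the correction step must leave every timestamp untouched, a quick check after editing can catch accidental changes. The following is a minimal sketch, assuming each LRC line starts with a single [MM:SS.cc] tag as in the example above; the function names are hypothetical and not part of the skill's script.

```python
import re

# Matches a leading [MM:SS.cc] tag such as [00:46.96].
TIMESTAMP = re.compile(r"^\[(\d{2}:\d{2}\.\d{2})\]")


def timestamps(lrc_text):
    """Extract the leading timestamp of each LRC line, in order."""
    found = []
    for line in lrc_text.splitlines():
        match = TIMESTAMP.match(line.strip())
        if match:
            found.append(match.group(1))
    return found


def timestamps_preserved(before, after):
    """True when both versions carry the same timestamps in the same order."""
    return timestamps(before) == timestamps(after)


transcribed = "[00:46.96]AC step alive,\n[00:50.80]one point five eyes."
corrected = "[00:46.96]ACE-Step alive,\n[00:50.80]One point five arrives."

print(timestamps_preserved(transcribed, corrected))  # True
```

A False result means a line was added, dropped, or retimed during correction and the file should be reviewed again.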

Configuration

Config file: scripts/config.json

# Switch provider
./scripts/acestep-lyrics-transcription.sh config --set provider openai
./scripts/acestep-lyrics-transcription.sh config --set provider elevenlabs

# Set API keys
./scripts/acestep-lyrics-transcription.sh config --set openai.api_key sk-...
./scripts/acestep-lyrics-transcription.sh config --set elevenlabs.api_key ...

# View config
./scripts/acestep-lyrics-transcription.sh config --list

Option	Default	Description
`provider`	`openai`	Active provider: `openai` or `elevenlabs`
`output_format`	`lrc`	Default output: `lrc`, `srt`, or `json`
`openai.api_key`	`""`	OpenAI API key
`openai.api_url`	`https://api.openai.com/v1`	OpenAI API base URL
`openai.model`	`whisper-1`	OpenAI model (whisper-1 for word timestamps)
`elevenlabs.api_key`	`""`	ElevenLabs API key
`elevenlabs.api_url`	`https://api.elevenlabs.io/v1`	ElevenLabs API base URL
`elevenlabs.model`	`scribe_v2`	ElevenLabs model

Provider Notes

Provider	Model	Word Timestamps	Pricing
OpenAI	whisper-1	Yes (segment + word)	$0.006/min
ElevenLabs	scribe_v2	Yes (word-level)	Varies by plan

OpenAI whisper-1 is the only OpenAI model supporting word-level timestamps
ElevenLabs scribe_v2 returns word-level timestamps with type filtering
Both support multilingual transcription
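
To make the link between word-level timestamps and LRC output concrete, here is a minimal sketch of one possible grouping approach. It is not the skill's actual conversion code, which this document does not show. Assumptions: each word is a dict with `word`, `start`, and `end` fields in seconds, a pause longer than `max_gap` (an arbitrary 0.8 s here) starts a new line, and words are joined with spaces, which suits English but not languages written without spaces.

```python
def format_lrc_time(seconds):
    """Format seconds as an LRC [MM:SS.cc] tag with centisecond precision."""
    total_cs = int(round(seconds * 100))
    minutes, rem = divmod(total_cs, 6000)
    secs, cs = divmod(rem, 100)
    return f"[{minutes:02d}:{secs:02d}.{cs:02d}]"


def words_to_lrc(words, max_gap=0.8):
    """Group timed words into LRC lines, starting a new line after a long pause."""
    lines = []
    current = []
    line_start = None
    prev_end = None
    for w in words:
        if current and w["start"] - prev_end > max_gap:
            lines.append(format_lrc_time(line_start) + " ".join(current))
            current = []
        if not current:
            line_start = w["start"]
        current.append(w["word"])
        prev_end = w["end"]
    if current:
        lines.append(format_lrc_time(line_start) + " ".join(current))
    return "\n".join(lines)


# Hypothetical word timings, made up for illustration.
words = [
    {"word": "ACE-Step", "start": 46.96, "end": 47.60},
    {"word": "alive,", "start": 47.70, "end": 48.40},
    {"word": "One", "start": 50.80, "end": 51.00},
    {"word": "point", "start": 51.05, "end": 51.40},
    {"word": "five", "start": 51.45, "end": 51.80},
    {"word": "arrives.", "start": 51.90, "end": 52.60},
]

print(words_to_lrc(words))
```

Each line takes the start time of its first word, which is why the word-level timestamps from either provider are enough to build the LRC file.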

Examples

# Basic transcription (uses config defaults)
./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3

# Chinese song to LRC
./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3 --language zh

# Use ElevenLabs, output SRT
./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3 --provider elevenlabs --format srt

# Custom output path
./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3 --output ./my_lyrics.lrc

Security Recommendations

This skill appears to do what it says: it uploads user-supplied audio to OpenAI or ElevenLabs to produce timestamped lyrics and saves outputs locally. Before installing, consider: (1) Privacy — audio is sent to a third-party service (OpenAI/ElevenLabs); avoid uploading sensitive audio or confirm provider policy. (2) API keys — you will store provider keys in scripts/config.json (plaintext) so protect that file and your environment. (3) Billing — provider usage may incur costs. (4) The SKILL.md warns not to display API keys; the script will, however, read the key to contact the provider. If you prefer not to store API keys on disk, use a secure secret mechanism or ephemeral keys. If you need deeper review, provide the full remainder of the script (the ElevenLabs section was truncated) so its network calls and file operations can be inspected.

Functional Analysis

Type: OpenClaw Skill
Name: acestep-lyrics-transcription
Version: 1.0.1

The skill is classified as suspicious due to multiple command injection vulnerabilities in `scripts/acestep-lyrics-transcription.sh`. Specifically, the `set_config` function is vulnerable to `jq` injection, allowing an attacker to manipulate the `config.json` file via crafted input. Additionally, the `curl` commands for API calls and the embedded Python scripts for format conversion directly interpolate user-controlled arguments (`--audio`, `--language`) and file paths without robust shell or Python escaping, creating potential for arbitrary command execution. While the `SKILL.md` explicitly instructs the agent *not* to read or display API keys, these vulnerabilities could be exploited by a malicious user to achieve unauthorized actions, despite the skill's stated purpose being benign.
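
A common way to avoid the class of injection described above is to treat user input strictly as data: validate the key against a fixed allowlist and write the value through a JSON library instead of interpolating it into a filter or command string. The sketch below illustrates that idea in Python. It is not the script's `set_config` function; the name `set_config_value` is hypothetical, and the allowlist is taken from the options documented for scripts/config.json.

```python
import json

# Keys documented for scripts/config.json; anything else is rejected.
ALLOWED_KEYS = {
    "provider", "output_format",
    "openai.api_key", "openai.api_url", "openai.model",
    "elevenlabs.api_key", "elevenlabs.api_url", "elevenlabs.model",
}


def set_config_value(config, dotted_key, value):
    """Return a copy of config with dotted_key set to value, stored as plain data."""
    if dotted_key not in ALLOWED_KEYS:
        raise ValueError(f"unknown config key: {dotted_key!r}")
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round trip
    node = updated
    *parents, leaf = dotted_key.split(".")
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value
    return updated


config = {"provider": "openai", "openai": {"api_key": ""}}

# A value crafted to look like a jq expression is stored verbatim as a string.
hostile = '" | .provider = "evil'
updated = set_config_value(config, "openai.api_key", hostile)
print(json.dumps(updated, indent=2))
```

In a shell script the equivalent precaution is to pass values to jq with its `--arg` option, and to hand arguments to embedded Python through `sys.argv` or environment variables instead of pasting them into source text.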

Capability Assessment

✓ Purpose & Capability

Name/description (lyrics transcription using OpenAI or ElevenLabs) match the included script and SKILL.md: the bash script calls provider APIs to transcribe audio and converts timestamps to LRC/SRT/JSON. Required tools (curl, jq, python) are reasonable for this task.

✓ Instruction Scope

SKILL.md stays on-topic: it instructs checking/setting a provider API key, running the transcribe command, and doing a manual LRC correction workflow. It does not instruct reading unrelated system files or exfiltrating data to unexpected endpoints; network calls are limited to configured provider API URLs.

✓ Install Mechanism

No install spec is provided (instruction-only with bundled script). Nothing in the package attempts to download or install external code during install time.

ℹ Credentials

No platform environment variables are required; the script uses a local config.json to store provider API keys. Requesting OpenAI/ElevenLabs API keys is proportionate to the functionality. Note: keys are stored plaintext in scripts/config.json (typical but sensitive) and the script must read the key to send it in Authorization headers — SKILL.md explicitly warns not to print keys.

✓ Persistence & Privilege

always:false and there is no sign the skill attempts to modify other skills or system-wide agent settings. It is user-invocable and does not request persistent elevated privileges.

Version History

v1.0.1

- Changed the default provider in config.example to elevenlabs

v1.0.0

Initial release of acestep-lyrics-transcription.

- Transcribe audio into timestamped lyrics using OpenAI Whisper or ElevenLabs Scribe API.
- Supports output formats: LRC, SRT, or JSON with word-level timestamps.
- Automatic API key check and user guidance for configuration.
- Includes post-transcription workflow for manual correction of generated LRC files using original lyrics as a reference.
- CLI tool with options for provider selection, language, format, and output path.
- Documentation includes quick start, configuration, and error correction procedures.

Metadata

Slug acestep-lyrics-transcription

Version 1.0.1

License not specified

Total installs 1

Current installs 1

Number of versions 2

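FAQ

What is acestep-lyrics-transcription?

Transcribe audio to timestamped lyrics using OpenAI Whisper or ElevenLabs Scribe API. Outputs LRC, SRT, or JSON with word-level timestamps. Use when users want to transcribe songs, generate LRC files, or extract lyrics with timestamps from audio. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 880 cumulative downloads to date.

How do I install acestep-lyrics-transcription?

Run the command "/install acestep-lyrics-transcription" in the OpenClaw or Claude Code chat box for a one-click install. No extra setup is needed to install it, though transcription still requires a provider API key.

Is acestep-lyrics-transcription free?

Yes, the skill itself is completely free (free and open source) and can be freely downloaded, installed, and used. Provider API usage may still incur costs.

Which platforms does acestep-lyrics-transcription support?

acestep-lyrics-transcription is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed acestep-lyrics-transcription?

It is developed and maintained by Sayo (@dumoedss). The current version is v1.0.1.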
