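Description

An audio/video processing toolkit. It supports the following operations:

- Extract audio from a video file and save it as WAV
- Clip an audio file by a given start time and duration
- Play a given video or audio file (via the system default player)
- Speech-to-text transcription (Whisper), with JSON output (timestamps, confidence)
- Extract audio/video metadata (bitrate, sample rate, duration, codec, etc...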

Usage Instructions (SKILL.md)

# Audio Tools Skill

Name: audio-tools
Author: risehorizon

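An audio/video processing toolkit with five functions: audio extraction, audio clipping, media playback, speech-to-text transcription (Whisper), and metadata extraction.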

## Working Directory

By default, all input files are read from `D:\workbuddy`, and output files are also saved to `D:\workbuddy` (unless the user specifies another path).

## Environment Requirements

### Python Version

- Requirement: Python 3.8 or later
- Check: run `python --version` to confirm

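### Dependency Check

Before running, the script automatically checks the following environment and prints installation guidance for anything that is missing:

```bash
# Check environment status
python "D:\workbuddy\skills\audio-tools\scripts\audio_tools.py" --check
```

The check covers:
- Python version
- ffmpeg / ffprobe / ffplay availability
- moviepy installation status
- openai-whisper installation status
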
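The checks listed above could be approximated as follows. This is only a sketch of the idea, not the code behind `--check`: the function name `check_environment` is hypothetical, it looks for the executables on the system PATH only (not in the Skill's local bin directory), and it assumes openai-whisper is importable under the module name `whisper`.

```python
import importlib.util
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the requirements above


def check_environment():
    """Return a dict mapping each checked item to True/False."""
    status = {"python": sys.version_info >= MIN_PYTHON}
    # shutil.which returns None when the executable is not on PATH
    for tool in ("ffmpeg", "ffprobe", "ffplay"):
        status[tool] = shutil.which(tool) is not None
    # find_spec returns None when a top-level module is not installed;
    # openai-whisper is imported under the module name "whisper"
    for module in ("moviepy", "whisper"):
        status[module] = importlib.util.find_spec(module) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{'[OK]  ' if ok else '[MISS]'} {name}")
```
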
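## Dependency Notes

This Skill prefers **ffmpeg**, which it looks up in the following order:
1. **The Skill's local bin directory** (`D:\workbuddy\skills\audio-tools\bin\ffmpeg.exe`)
2. ffmpeg on the **system PATH**
3. If neither is found, it falls back automatically to **moviepy** (a Python library, installed automatically on first use)
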
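The lookup order above can be sketched in a few lines. The helper name `find_ffmpeg` and its `local_bin` parameter are illustrative assumptions; the script's real lookup code is not shown in this document.

```python
import shutil
from pathlib import Path

# Local bin directory from the layout described in this document
SKILL_BIN = Path(r"D:\workbuddy\skills\audio-tools\bin")


def find_ffmpeg(local_bin=SKILL_BIN):
    """Return the path to ffmpeg, or None to signal the moviepy fallback."""
    bundled = Path(local_bin) / "ffmpeg.exe"
    if bundled.is_file():          # 1. bundled copy in the Skill's bin directory
        return str(bundled)
    return shutil.which("ffmpeg")  # 2. system PATH (None if absent -> fallback)
```
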
### Message Shown When Dependencies Are Missing
If neither ffmpeg nor moviepy is found, the script prints:
```
[WARN] No media processing tool found!

[SOLUTION] Choose one of the following:

  Option 1 - Bundled ffmpeg (Recommended):
           Place ffmpeg.exe in: D:\workbuddy\skills\audio-tools\bin\

  Option 2 - System ffmpeg:
           Windows: winget install ffmpeg

  Option 3 - MoviePy (Python fallback):
           pip install moviepy
```

### Option 1: Bundled ffmpeg (recommended)
Place ffmpeg in the Skill's local directory for a zero-dependency deployment:
```
D:\workbuddy\skills\audio-tools\
├── bin\
│   ├── ffmpeg.exe      ← place it here
│   └── ffprobe.exe     ← optional, used to query duration
├── scripts\
│   └── audio_tools.py
└── SKILL.md
```

### Option 2: System ffmpeg
```
# Windows - using winget
winget install ffmpeg

# Or download from https://ffmpeg.org/download.html, unpack it, and add its bin directory to the system PATH
```

### Option 3: moviepy (fallback)
```bash
pip install moviepy
```

---

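## Features & SOP

### Function 1: Extract Audio from Video

**User intent keywords**: 提取音频 (extract audio), 视频转音频 (video to audio), 从视频提取 (extract from video), wav 提取 (WAV extraction)

**Procedure**:
1. Confirm the input video file path (a relative path is joined to the working directory `D:\workbuddy`)
2. Confirm the output WAV file path (default: same name as the input with the extension changed to `.wav`, saved to `D:\workbuddy`)
3. Call the script to run the extraction
4. Report the path of the resulting file
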
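Steps 1 and 2 of this procedure amount to simple path arithmetic. The sketch below illustrates them with `PureWindowsPath`, so it behaves the same on any operating system; `resolve_input` and `default_wav_output` are hypothetical names, not functions known to exist in `audio_tools.py`.

```python
from pathlib import PureWindowsPath

WORK_DIR = PureWindowsPath(r"D:\workbuddy")  # default working directory from this document


def resolve_input(path):
    """Join a relative path to the working directory; keep absolute paths unchanged."""
    p = PureWindowsPath(path)
    return p if p.is_absolute() else WORK_DIR / p


def default_wav_output(input_path):
    """Same file name with a .wav suffix, placed in the working directory."""
    return WORK_DIR / PureWindowsPath(input_path).with_suffix(".wav").name


print(resolve_input("video.mp4"))                # -> D:\workbuddy\video.mp4
print(default_wav_output(r"D:\media\talk.mkv"))  # -> D:\workbuddy\talk.wav
```
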
**Script call**:
```bash
python "D:\workbuddy\skills\audio-tools\scripts\audio_tools.py" extract \
  --input "D:\workbuddy\video.mp4" \
  --output "D:\workbuddy\video.wav"
```

**Parameters**:
| Parameter | Required | Description |
|------|------|------|
| `--input` | ✅ | Input video file path (mp4/mkv/avi/mov/flv and others are supported) |
| `--output` | ❌ | Output WAV file path (default: same directory, same name, `.wav`) |

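For orientation, an extraction of this kind is commonly done with an ffmpeg command like the one built below. The flags are standard ffmpeg options, but whether `audio_tools.py` uses exactly these is an assumption, and `build_extract_cmd` is a hypothetical helper. The resulting list can be passed to `subprocess.run(cmd, check=True)`.

```python
def build_extract_cmd(ffmpeg, src, dst):
    """Build an ffmpeg command that writes the audio track of src to a WAV file."""
    return [
        ffmpeg,
        "-y",                    # overwrite the output file if it exists
        "-i", str(src),          # input video
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # 16-bit PCM, the usual WAV payload
        str(dst),
    ]
```
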
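---

### Function 2: Clip an Audio Segment

**User intent keywords**: 截取音频 (trim audio), 剪切音频 (cut audio), 音频截取 (audio trimming), 音频剪辑 (audio editing), clip audio

**Procedure**:
1. Confirm the input audio file path
2. Confirm the start time (`--start`, format: seconds or `HH:MM:SS`)
3. Confirm the clip duration (`--duration`, in seconds)
4. Confirm the output path (default: original file name with a `_clip` suffix)
5. Call the script to run the clip
6. Report the path of the resulting file
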
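The two input conventions in this procedure (a start time given as seconds or as `HH:MM:SS`, and the `_clip` default name) can be illustrated as below. Both helper names are hypothetical, and the sketch assumes the `_clip` suffix goes before the file extension, as the example `audio_clip.wav` in this document suggests.

```python
from pathlib import PureWindowsPath


def parse_start(value):
    """Convert '30', '90.5' or 'HH:MM:SS' into seconds as a float."""
    parts = str(value).split(":")
    if len(parts) == 1:
        return float(parts[0])
    if len(parts) != 3:
        raise ValueError(f"expected seconds or HH:MM:SS, got {value!r}")
    hours, minutes, seconds = parts
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def default_clip_output(input_path):
    """Insert '_clip' before the extension: audio.wav -> audio_clip.wav."""
    p = PureWindowsPath(input_path)
    return p.with_name(f"{p.stem}_clip{p.suffix}")
```
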
**Script call**:
```bash
python "D:\workbuddy\skills\audio-tools\scripts\audio_tools.py" clip \
  --input "D:\workbuddy\audio.wav" \
  --start 30 \
  --duration 60 \
  --output "D:\workbuddy\audio_clip.wav"
```

**Parameters**:
| Parameter | Required | Description |
|------|------|------|
| `--input` | ✅ | Input audio file path (wav/mp3/flac/aac and others are supported) |
| `--start` | ✅ | Start time, as seconds (e.g. `30`) or a time string (e.g. `00:00:30`) |
| `--duration` | ✅ | Clip duration (seconds) |
| `--output` | ❌ | Output file path (default: original file name with a `_clip` suffix) |

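These parameters map naturally onto ffmpeg's `-ss` (start position) and `-t` (duration) options, and `-ss` itself accepts both plain seconds and `HH:MM:SS`. The builder below is a sketch under that assumption; `build_clip_cmd` is a hypothetical name and the script's actual command line is not shown in this document.

```python
def build_clip_cmd(ffmpeg, src, start, duration, dst):
    """Build an ffmpeg command that cuts `duration` seconds starting at `start`."""
    return [
        ffmpeg,
        "-y",                  # overwrite the output file if it exists
        "-ss", str(start),     # seek to the start position (seconds or HH:MM:SS)
        "-i", str(src),        # input audio
        "-t", str(duration),   # stop after this many seconds of output
        str(dst),
    ]
```
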
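---

### Function 3: Play a Media File

**User intent keywords**: 播放视频 (play video), 播放音频 (play audio), play video, play audio, 打开播放 (open and play)

**Procedure**:
1. Confirm the media file path (a relative path is joined to the working directory)
2. Prefer **ffplay** (bundled mode) for playback
3. If ffplay is unavailable, fall back to the system default player
4. Report the playback status

**Player priority**:
1. **ffplay** (the Skill's local bin/ffplay.exe) - broadest format support, no system dependency
2. **System default player** - familiar to the user, friendlier interface
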
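A minimal sketch of this priority, assuming the caller has already located ffplay (or passes `None` when it is missing). `ffplay_cmd` and `play` are hypothetical names; `-autoexit` is a standard ffplay option, and `os.startfile` exists on Windows only, which matches the Windows defaults of this Skill.

```python
import os
import subprocess
import sys


def ffplay_cmd(ffplay, media):
    """Command line for ffplay; -autoexit closes the window when playback ends."""
    return [ffplay, "-autoexit", str(media)]


def play(media, ffplay=None):
    """Play with ffplay when a path is given, else hand off to the OS default player."""
    if ffplay:
        subprocess.run(ffplay_cmd(ffplay, media), check=True)
        return "ffplay"
    if sys.platform == "win32":
        os.startfile(media)  # Windows only: opens the file with its associated app
        return "system"
    raise RuntimeError("no ffplay given and no default-player fallback on this platform")
```
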
**Script call**:
```bash
python "D:\workbuddy\skills\audio-tools\scripts\audio_tools.py" play \
  --input "D:\workbuddy\video.mp4"
```

**Parameters**:
| Parameter | Required | Description |
|------|------|------|
| `--input` | ✅ | Media file path (video or audio) |

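---

### Function 4: Speech-to-Text Transcription (Whisper)

**User intent keywords**: 语音转文字 (speech to text), 音频转录 (audio transcription), 语音识别 (speech recognition), 提取文字内容 (extract text content), transcribe, STT

**Procedure**:
1. Confirm the input audio/video file path
2. Confirm the Whisper model size (default `small`; options `tiny/base/small/medium/large`)
3. Confirm the language (optional; auto-detected by default)
4. Run the transcription, producing both a JSON and a TXT output
5. Report a summary of the transcription result
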
**Script call**:
```bash
# Basic usage (auto-detect the language, use the small model)
python "D:\workbuddy\skills\audio-tools\scripts\audio_tools.py" transcribe \
  --input "D:\workbuddy\lecture.wav"

# Specify the language and model
python "D:\workbuddy\skills\audio-tools\scripts\audio_tools.py" transcribe \
  --input "D:\workbuddy\lecture.wav" \
  --model small \
  --language zh
```

**Parameters**:
| Parameter | Required | Description |
|------|------|------|
| `--input` | ✅ | Input audio/video file path |
| `--output` | ❌ | Output JSON path (default: same name with `.json`) |
| `--model` | ❌ | Whisper model: `tiny/base/small/medium/large` (default: small) |
| `--language` | ❌ | Language code such as `zh`/`en`/`ja` (default: auto-detect) |

**Output files**:
- `<same name>.json` - full JSON with text, timestamps, and confidence
- `<same name>.txt` - plain text, transcript only

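Writing the two files described above might look like the following. This is a sketch with a hypothetical helper name, `write_outputs`; it assumes both files are placed next to the input and that the result dict carries a top-level `text` field, as in this function's JSON example.

```python
import json
from pathlib import Path


def write_outputs(result, input_path):
    """Write <name>.json (full result) and <name>.txt (text only) beside the input."""
    base = Path(input_path)
    json_path = base.with_suffix(".json")
    txt_path = base.with_suffix(".txt")
    # ensure_ascii=False keeps non-ASCII transcripts (e.g. Chinese) readable in the file
    json_path.write_text(json.dumps(result, ensure_ascii=False, indent=2), encoding="utf-8")
    txt_path.write_text(result["text"] + "\n", encoding="utf-8")
    return json_path, txt_path
```
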
**JSON structure example**:
```json
{
  "text": "完整转录文字...",
  "language": "zh",
  "duration": 120.5,
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 5.2,
      "text": "第一段文字",
      "confidence": -0.1234,
      "no_speech_prob": 0.01
    }
  ]
}
```
The `text` values in this sample are Chinese placeholders for the full transcript and for the first segment's text.

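One way a result from the openai-whisper library could be reshaped into this layout is sketched below. `whisper.load_model` and `model.transcribe` are real library calls, but the mapping is an assumption: the negative sample value suggests that `confidence` is Whisper's per-segment `avg_logprob` (a log-probability, not a 0 to 1 score), and `duration` is taken here as the end of the last segment. Neither is confirmed by this document, and `to_output` is a hypothetical name.

```python
def transcribe(path, model_name="small", language=None):
    """Run openai-whisper; language=None lets Whisper detect the language."""
    import whisper  # imported lazily so the rest of this sketch works without it

    model = whisper.load_model(model_name)
    return model.transcribe(str(path), language=language)


def to_output(result):
    """Reshape a Whisper result dict into the JSON layout shown in the example."""
    segments = [
        {
            "id": seg["id"],
            "start": seg["start"],
            "end": seg["end"],
            "text": seg["text"].strip(),
            "confidence": seg["avg_logprob"],  # assumption: confidence = avg_logprob
            "no_speech_prob": seg["no_speech_prob"],
        }
        for seg in result["segments"]
    ]
    duration = segments[-1]["end"] if segments else 0.0  # assumption: end of last segment
    return {
        "text": result["text"].strip(),
        "language": result.get("language"),
        "duration": duration,
        "segments": segments,
    }
```
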
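---

### Function 5: Extract Audio/Video Metadata

**User intent keywords**: 查看音频信息 (view audio info), 提取元数据 (extract metadata), 文件信息 (file info), 码率 (bitrate), 采样率 (sample rate), metadata

**Procedure**:
1. Confirm the input file path
2. Prefer **ffprobe** to obtain detailed metadata
3. When ffprobe is unavailable, use moviepy to obtain basic information
4. Output the metadata as JSON (optionally saved to a file)
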
**Script call**:
```bash
# Print metadata to the terminal
python "D:\workbuddy\skills\audio-tools\scripts\audio_tools.py" metadata \
  --input "D:\workbuddy\audio.wav"

# Save to a JSON file
python "D:\workbuddy\skills\audio-tools\scripts\audio_tools.py" metadata \
  --input "D:\workbuddy\video.mp4" \
  --output "D:\workbuddy\meta.json"
```

**Parameters**:
| Parameter | Required | Description |
|------|------|------|
| `--input` | ✅ | Input audio/video file path |
| `--output` | ❌ | Output JSON path (default: print to the terminal) |

**The output includes**:
- Basic file information (size, duration, format)
- Audio stream information (codec, sample rate, channel count)
- Video stream information (codec, resolution, frame rate)
- The full raw ffprobe data (when available)

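The ffprobe path can be sketched as follows. The command uses standard ffprobe options (`-print_format json -show_format -show_streams`), and the field names read in `summarize` are ones ffprobe emits; the shape of the summary dict and the helper names are illustrative assumptions, not the script's actual output schema.

```python
import json
import subprocess


def ffprobe_cmd(ffprobe, media):
    """ffprobe invocation that prints container and stream info as JSON."""
    return [ffprobe, "-v", "quiet", "-print_format", "json",
            "-show_format", "-show_streams", str(media)]


def summarize(probe):
    """Pick the fields listed above out of parsed ffprobe JSON."""
    fmt = probe.get("format", {})
    summary = {
        "format": fmt.get("format_name"),
        # ffprobe reports these numeric values as strings
        "duration": float(fmt["duration"]) if "duration" in fmt else None,
        "size_bytes": int(fmt["size"]) if "size" in fmt else None,
        "bit_rate": int(fmt["bit_rate"]) if "bit_rate" in fmt else None,
        "audio": [],
        "video": [],
    }
    for s in probe.get("streams", []):
        if s.get("codec_type") == "audio":
            summary["audio"].append({"codec": s.get("codec_name"),
                                     "sample_rate": s.get("sample_rate"),
                                     "channels": s.get("channels")})
        elif s.get("codec_type") == "video":
            summary["video"].append({"codec": s.get("codec_name"),
                                     "resolution": f"{s.get('width')}x{s.get('height')}",
                                     "frame_rate": s.get("r_frame_rate")})
    return summary


def probe_file(ffprobe, media):
    """Run ffprobe on a file and return the summarized metadata."""
    out = subprocess.run(ffprobe_cmd(ffprobe, media),
                         capture_output=True, text=True, check=True)
    return summarize(json.loads(out.stdout))
```
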
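---

## AI Usage Guidelines

1. **Path handling**: when the user gives a relative path, expand it to `D:\workbuddy\<filename>`
2. **Tool detection**: check whether ffmpeg is available before running; if it is not, switch to moviepy
3. **Error handling**: when the script fails, read the error output and tell the user the likely cause and how to fix it
4. **Output confirmation**: after an operation completes, tell the user the full path and size of the output file
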
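Guidelines 3 and 4 could be supported by two small helpers like these. They are a sketch only: `run_tool` and `describe_output` are hypothetical names, and the message format is an assumption rather than the script's real output.

```python
import os
import subprocess


def run_tool(cmd):
    """Run a command; return (ok, message) where message is stderr on failure."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        return False, proc.stderr.strip() or f"exit code {proc.returncode}"
    return True, proc.stdout.strip()


def describe_output(path):
    """Full path plus size in MB, for the confirmation message."""
    size_mb = os.path.getsize(path) / (1024 * 1024)
    return f"{os.path.abspath(path)} ({size_mb:.1f} MB)"
```
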
---

## Usage Examples

> User: Extract the audio from D:\workbuddy\lecture.mp4 for me

The AI runs:
```bash
python "D:\workbuddy\skills\audio-tools\scripts\audio_tools.py" extract --input "D:\workbuddy\lecture.mp4"
```
Output: `✅ Extraction complete: D:\workbuddy\lecture.wav (size: 12.3 MB)`

---

> User: Clip 2 minutes from lecture.wav starting at the 30-second mark

The AI runs:
```bash
python "D:\workbuddy\skills\audio-tools\scripts\audio_tools.py" clip --input "D:\workbuddy\lecture.wav" --start 30 --duration 120
```
Output: `✅ Clip complete: D:\workbuddy\lecture_clip.wav (duration: 120 seconds)`

---

> User: Play lecture.mp4

The AI runs:
```bash
python "D:\workbuddy\skills\audio-tools\scripts\audio_tools.py" play --input "D:\workbuddy\lecture.mp4"
```
Output: `✅ Opened with the system default player: D:\workbuddy\lecture.mp4`

Security Recommendations

This skill appears to do what it says: audio/video processing using ffmpeg/moviepy and Whisper. Before installing, consider the following:

1. Bundled ffmpeg: the skill recommends placing ffmpeg.exe in its local bin directory, and the script will invoke that executable. Only use an ffmpeg binary from a trusted source (official ffmpeg builds).
2. Runtime pip installs: the script auto-installs Python packages (moviepy, and likely whisper) via pip at runtime. That requires network access and pulls code from PyPI; if you need stricter control, preinstall the dependencies in a controlled environment.
3. Fixed work directory: it defaults to D:\workbuddy (Windows). Make sure you understand where files will be read and written, or supply absolute paths to avoid surprises.
4. Sandbox if unsure: if you do not trust the skill owner or the bundled binaries, run it in an isolated VM/container and inspect any bundled executables.
5. Subprocess calls: this package makes subprocess calls (ffmpeg, pip). Those are expected for media tools but mean it can execute local binaries, so validate those binaries.

If you want extra assurance, review the remaining portions of scripts/audio_tools.py (the transcribe/play/metadata implementations) to confirm there are no network callbacks or unexpected remote endpoints.
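
One way to act on the advice about bundled binaries is to compare the file's SHA-256 digest with a checksum published by the source you downloaded it from. The sketch below assumes such a checksum is available to you; it is not part of the skill, and the helper names are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 so large binaries are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_trusted(path, expected_hex):
    """True when the file's digest equals the published one (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```
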

Capability Assessment

✓ Purpose & Capability

Name/description (audio extraction, clip, play, transcribe, metadata) matches the provided SKILL.md and the included Python script. Required capabilities (ffmpeg, moviepy, whisper) are reasonable for the stated functionality. No unrelated credentials, binaries, or config paths are requested.

ℹ Instruction Scope

Runtime instructions and the script operate on files under a fixed work directory (D:\workbuddy) and explicitly read/write media files there (or absolute paths provided by the user). The SKILL.md and code instruct the agent to run local python scripts, call ffmpeg/ffprobe/ffplay if present, or use moviepy/whisper. That behavior is expected for the stated purpose, but the fixed Windows work_dir and the practice of preferring a bundled ffmpeg in the skill directory deserve attention (see guidance).

ℹ Install Mechanism

There is no formal install spec, but the script performs runtime actions that install dependencies: it will attempt pip installs (moviepy, and likely whisper on first transcribe). This is moderate-risk but proportionate to the functionality. The skill also documents and encourages bundling ffmpeg.exe in its local bin directory — running a local binary is expected for media handling but increases the importance of trusting that binary.

✓ Credentials

The skill requests no environment variables, credentials, or external config paths. It only needs standard local filesystem access to the declared work_dir and optional ability to execute ffmpeg or python pip. The lack of secret/credential requests is appropriate for an offline media tool.

✓ Persistence & Privilege

The `always` flag is false, and model invocation/autonomy remains at its default; the skill does not request permanent platform-wide privileges or modify other skills. It writes outputs to its working directory but does not appear to alter agent configs or other skills.

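Version History

v1.0.0

Audio Tools Skill 1.0.0 – initial release

- For convenience, ffmpeg.exe and ffprobe.exe can be placed in the bin directory
- Extracts audio from video (WAV output)
- Clips audio files by start time and duration
- Plays a given audio or video file (local ffplay preferred, with fallback to the system player)
- Integrates Whisper speech-to-text, producing JSON with timestamps and confidence
- Extracts audio/video metadata (bitrate, sample rate, duration, etc.; JSON output)
- The script automatically detects and prefers a local ffmpeg, with a moviepy fallback when it is missing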

Metadata

Slug emar-audio-tools

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Historical versions 1

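FAQ

What is audio-tools?

An audio/video processing toolkit covering audio extraction to WAV, audio clipping by start time and duration, media playback, Whisper speech-to-text with JSON output, and audio/video metadata extraction. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 84 cumulative downloads so far.

How do I install audio-tools?

Run the command "/install emar-audio-tools" in an OpenClaw or Claude Code chat to install it in one step; no extra configuration is required.

Is audio-tools free?

Yes. audio-tools is completely free under the MIT-0 license and can be freely downloaded, installed, and used.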

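Which platforms does audio-tools support?

The listing describes audio-tools as cross-platform, usable in any environment where OpenClaw / Claude Code is deployed. Note that the defaults in its SKILL.md (the D:\workbuddy working directory, ffmpeg.exe, winget) assume Windows.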

Who developed audio-tools?

It is developed and maintained by riseHorizon (@risehorizon); the current version is v1.0.0.
