← Back to Skills Marketplace
Audio Note Taker
by
utopiabenben
· GitHub ↗
· v1.0.1
· MIT-0
301
Downloads
0
Stars
0
Active Installs
2
Versions
Install in OpenClaw
/install audio-note-taker
Description
语音笔记助手:录音自动转文字并整理成结构化笔记,支持说话人识别,自动总结要点和行动项
README (SKILL.md)
audio-note-taker - 语音笔记助手
智能语音笔记助手——自动将录音转成结构化文字笔记。
适用场景
- 🎙️ 会议记录:自动转录会议内容,提炼行动项
- 🎓 讲座笔记:课堂/讲座录音转文字,自动整理要点
- 📰 采访整理:语音采访转文字稿,快速生成报道素材
- 💼 工作复盘:项目复盘录音 → 结构化记录
- 📝 日常笔记:快速语音记录 → 文字存档
核心功能
- ✅ 高精度转写:基于 OpenAI Whisper API,支持多种语言
- ✅ 结构化输出:自动划分段落,识别关键信息
- ✅ 智能摘要:提取核心观点、决策、待办事项
- ✅ 说话人区分:可选说话人识别和标记
- ✅ Markdown 格式:输出易读、易编辑的笔记
- ✅ 多种输入:支持音频文件或直接录音
快速开始
基础转写
audio-note-taker /path/to/recording.m4a
# 输出:recording_notes.md
指定主题和格式
audio-note-taker /path/to/meeting.mp3 \
--title "2026-Q1 产品规划会" \
--language zh \
--output meeting_notes.md
启用说话人识别
audio-note-taker /path/to/interview.wav \
--detect-speakers true \
--output interview_transcript.md
生成深度摘要(需配置 LLM)
audio-note-taker /path/to/lecture.mp3 \
--summarize true \
--extract-action-items true \
--output lecture_summary.md
参数说明
| 参数 | 类型 | 默认 | 说明 |
|---|---|---|---|
input |
路径 | 必填 | 音频文件路径(支持 mp3, m4a, wav, ogg 等) |
--title |
字符串 | 自动生成 | 笔记标题 |
--language |
代码 | auto | 音频语言(en, zh, ja, auto 等) |
--output |
路径 | {input}_notes.md |
输出文件路径 |
--detect-speakers |
布尔 | false | 是否识别不同说话人 |
--summarize |
布尔 | false | 生成摘要(需 OPENAI_API_KEY) |
--extract-action-items |
布尔 | false | 提取行动项 |
--model |
字符串 | whisper-1 | Whisper 模型(whisper-1) |
--format |
字符串 | markdown | 输出格式(markdown, txt, json) |
环境变量
| 变量名 | 说明 | 必填 |
|---|---|---|
OPENAI_API_KEY |
OpenAI API 密钥 | ✅ |
OPENAI_BASE_URL |
自定义 API 地址(可选) | ❌ |
NOTE_TAKER_MODEL |
摘要模型(默认 gpt-4-turbo) | ❌ |
输出内容示例
# 会议记录:2026-Q1 产品规划会
**时间**:2026-03-15 14:00-15:30
**地点**:线上
**参会人**:张三、李四、王五
---
## 📝 会议纪要
### 讨论要点
1. Q1 产品上线延期原因分析
2. Q2 核心功能优先级排序
3. 资源分配调整
### ✅ 决议事项
- [x] 确定 Q2 三大核心功能
- [x] 批准额外 2 名开发人力
- [x] 下周三前发布详细排期
### 📋 待办事项
| 负责人 | 任务 | 截止时间 |
|--------|------|---------|
| 张三 | 完成 PRD 文档 | 2026-03-18 |
| 李四 | 技术方案评审 | 2026-03-20 |
| 王五 | 资源配置协调 | 2026-03-17 |
---
## 📄 完整转录(可折叠)
\x3Cdetails>
\x3Csummary>展开查看完整对话\x3C/summary>
[14:00] 张三:大家好,我们今天...
[14:05] 李四:关于延期,我觉得...
...
\x3C/details>
与其他技能集成
- social-publisher:将会议纪要直接整理成公众号/小红书文章
- summarize:对长录音先提取关键信息,再生成摘要
- wechat-formatter:将会议纪要快速格式化为公众号可发内容
技术细节
- 使用 OpenAI Whisper API 进行语音转文字
- 可选集成 GPT 模型进行摘要和行动项提取
- 支持中英文混合识别
- 音频预处理:自动降噪、格式转换(通过 ffmpeg)
- 输出 UTF-8 编码,支持中文排版
安装依赖
# 系统依赖
apt install -y ffmpeg
# Python 依赖(自动安装)
pip install openai>=1.0.0
许可证
MIT
Usage Guidance
This skill appears to do what it says for basic transcription: it sends audio to OpenAI's Whisper API using your OPENAI_API_KEY and writes a markdown notes file. Before installing or using it:
- Review the code yourself (source/audio_note_taker.py) and confirm you are comfortable with sending audio to OpenAI — transcripts are transmitted to OpenAI servers and may contain sensitive content. Use non-sensitive test audio first.
- The advertised features (speaker diarization, GPT-based summarization, integration with other skills) are mostly documented but not implemented — do not assume those are available in this version.
- The SKILL.md mentions OPENAI_BASE_URL and NOTE_TAKER_MODEL but the script does not use them; if you need a custom API host or alternate LLM, you may need to modify the code.
- The installer will pip install the 'openai' package into the user environment; check and audit that dependency if you run in a sensitive environment.
- Consider creating a limited-cost/test API key, monitor API usage and billing, and avoid using production-sensitive audio until you verify the behavior.
If you want a higher confidence verdict, provide (or inspect) a version that implements speaker-diarization and summarization, or a runlog showing network endpoints used at runtime (to confirm no unexpected exfiltration).
Capability Analysis
Type: OpenClaw Skill
Name: audio-note-taker
Version: 1.0.1
The audio-note-taker skill is a legitimate tool designed to transcribe audio files using the OpenAI Whisper API. The code in source/audio_note_taker.py and the install.sh script perform standard operations consistent with the stated purpose, such as checking for dependencies (openai, ffmpeg), verifying environment variables (OPENAI_API_KEY), and processing local audio files to generate Markdown notes.
Capability Assessment
Purpose & Capability
The name/description (audio → structured notes, Whisper + optional GPT summarization, speaker detection) aligns with the code's main behavior: the Python script calls OpenAI's audio transcription API and writes a notes file. However, advertised features such as automatic summarization and speaker-diarization are not implemented (generate_notes currently only inserts a placeholder), and optional env vars documented (OPENAI_BASE_URL, NOTE_TAKER_MODEL) are declared but not used by the code. Also registry version (1.0.1) vs skill.json version (1.0.0) is inconsistent.
Instruction Scope
Runtime instructions and CLI usage in SKILL.md match the script's CLI. The SKILL.md and install.sh ask for ffmpeg for preprocessing and pip install openai; the code accepts an audio file, uses the OpenAI client, and writes an output file. The instructions do not ask the agent to read unrelated files or secrets beyond OPENAI_API_KEY.
Install Mechanism
No complex install spec in registry; included install.sh runs 'pip3 install --user openai>=1.0.0' and checks for ffmpeg. There are no external downloads from untrusted URLs or archive extraction operations. Installation actions are standard and limited.
Credentials
The skill requires only OPENAI_API_KEY which is appropriate for calling OpenAI APIs. SKILL.md documents optional OPENAI_BASE_URL and NOTE_TAKER_MODEL but the Python code does not read these env vars. The script correctly checks for OPENAI_API_KEY before proceeding. No unrelated credentials or config paths are requested.
Persistence & Privilege
The skill does not request always:true, does not modify other skills or system-wide settings, and only writes output notes to a user-specified file. It does not persist credentials or install background services.
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat:
/install audio-note-taker - After installation, invoke the skill by name or use
/audio-note-taker - Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.1
audio-note-taker 1.0.1
- 重新优化和精简了功能说明与文档结构,突出会议、讲座、采访等场景应用
- 增强了说话人识别、摘要提取和行动项整理等高级功能参数说明
- 明确依赖需要 ffmpeg 和 openai>=1.0.0,完善环境变量说明
- 优化输出 Markdown 格式示例,提升易用性和结果可读性
- 更新描述和元数据,更准确反映实际功能及适用场景
v1.0.0
Initial release of audio-note-taker – 语音笔记助手
- Automatic transcription and structured note generation from audio files
- Supports batch processing, preview mode, and undo/backup functionality
- Extracts keywords, summarizes content, and identifies action items
- Outputs Markdown notes, supporting MP3, WAV, M4A, FLAC, and OGG formats
- Configurable via command-line options and optional JSON config file
Metadata
Frequently Asked Questions
What is Audio Note Taker?
语音笔记助手:录音自动转文字并整理成结构化笔记,支持说话人识别,自动总结要点和行动项. It is an AI Agent Skill for Claude Code / OpenClaw, with 301 downloads so far.
How do I install Audio Note Taker?
Run "/install audio-note-taker" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Audio Note Taker free?
Yes, Audio Note Taker is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Audio Note Taker support?
Audio Note Taker is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).
Who created Audio Note Taker?
It is built and maintained by utopiabenben (@utopiabenben); the current version is v1.0.1.
More Skills