Audio Note Taker

Name: Audio Note Taker
Author: utopiabenben

by utopiabenben · GitHub ↗ · v1.0.1 · MIT-0

cross-platform ⚠ suspicious

Downloads: 301


Install in OpenClaw

/install audio-note-taker

Description

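Audio note-taking assistant: automatically transcribes recordings and organizes them into structured notes, with speaker identification and automatic summaries of key points and action items.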

README (SKILL.md)

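audio-note-taker - Audio Note-Taking Assistant

An intelligent audio note-taking assistant that automatically turns recordings into structured text notes.

Use Cases

🎙️ Meeting minutes: automatically transcribe meetings and extract action items
🎓 Lecture notes: turn class or lecture recordings into text and organize the key points
📰 Interview write-ups: turn spoken interviews into transcripts to produce reporting material quickly
💼 Work retrospectives: project retrospective recording → structured record
📝 Everyday notes: quick voice memo → text archive

Core Features

✅ High-accuracy transcription: based on the OpenAI Whisper API, with support for multiple languages
✅ Structured output: automatic paragraph segmentation and key-information detection
✅ Smart summaries: extracts core points, decisions, and to-dos
✅ Speaker separation: optional speaker identification and labeling
✅ Markdown format: notes that are easy to read and edit
✅ Multiple inputs: supports audio files or direct recording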

Quick Start

Basic transcription

audio-note-taker /path/to/recording.m4a
# Output: recording_notes.md

Specify a title and format

audio-note-taker /path/to/meeting.mp3 \
  --title "2026-Q1 Product Planning Meeting" \
  --language zh \
  --output meeting_notes.md

Enable speaker identification

audio-note-taker /path/to/interview.wav \
  --detect-speakers true \
  --output interview_transcript.md

Generate an in-depth summary (requires a configured LLM)

audio-note-taker /path/to/lecture.mp3 \
  --summarize true \
  --extract-action-items true \
  --output lecture_summary.md

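Parameters

Parameter	Type	Default	Description
`input`	path	required	Path to the audio file (supports mp3, m4a, wav, ogg, etc.)
`--title`	string	auto-generated	Note title
`--language`	code	auto	Audio language (en, zh, ja, auto, etc.)
`--output`	path	`{input}_notes.md`	Output file path
`--detect-speakers`	boolean	false	Whether to identify different speakers
`--summarize`	boolean	false	Generate a summary (requires OPENAI_API_KEY)
`--extract-action-items`	boolean	false	Extract action items
`--model`	string	whisper-1	Whisper model (whisper-1)
`--format`	string	markdown	Output format (markdown, txt, json)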
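
As a rough illustration of how the documented flags and defaults fit together, here is a minimal argparse sketch. It is not the skill's actual parser: the function names `str2bool`, `build_parser`, and `parse_args` are hypothetical, and placing the default output file next to the input file is an assumption (the README only shows the file name `recording_notes.md`).

```python
import argparse
from pathlib import Path


def str2bool(value: str) -> bool:
    """Parse the 'true'/'false' strings used by flags such as --summarize."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented parameters and their documented defaults."""
    parser = argparse.ArgumentParser(prog="audio-note-taker")
    parser.add_argument("input", type=Path, help="path to the audio file")
    parser.add_argument("--title", default=None, help="note title")
    parser.add_argument("--language", default="auto")
    parser.add_argument("--output", type=Path, default=None)
    parser.add_argument("--detect-speakers", type=str2bool, default=False)
    parser.add_argument("--summarize", type=str2bool, default=False)
    parser.add_argument("--extract-action-items", type=str2bool, default=False)
    parser.add_argument("--model", default="whisper-1")
    parser.add_argument(
        "--format", choices=["markdown", "txt", "json"], default="markdown"
    )
    return parser


def parse_args(argv: list[str]) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    if args.output is None:
        # Assumption: recording.m4a -> recording_notes.md beside the input file.
        args.output = args.input.with_name(f"{args.input.stem}_notes.md")
    return args
```

Boolean flags take an explicit value here because the quick-start commands pass `--detect-speakers true` rather than a bare switch.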

Environment Variables

Variable	Description	Required
`OPENAI_API_KEY`	OpenAI API key	✅
`OPENAI_BASE_URL`	Custom API base URL (optional)	❌
`NOTE_TAKER_MODEL`	Summarization model (default: gpt-4-turbo)	❌
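
The Capability Assessment further down this page reports that the shipped script does not read `OPENAI_BASE_URL` or `NOTE_TAKER_MODEL`. The sketch below shows one way the three documented variables could be resolved if you adapt the code yourself. `Settings` and `load_settings` are hypothetical names, not part of the skill.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_key: str
    base_url: Optional[str]
    summary_model: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Resolve the documented variables, failing early if the key is missing."""
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is required but not set")
    return Settings(
        api_key=api_key,
        # Optional: None means "use the SDK's default endpoint".
        base_url=env.get("OPENAI_BASE_URL") or None,
        # Default taken from the table above.
        summary_model=env.get("NOTE_TAKER_MODEL") or "gpt-4-turbo",
    )
```

Passing the environment as a mapping keeps the function easy to exercise with a plain dict instead of the real process environment.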

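Sample Output

# Meeting Notes: 2026-Q1 Product Planning Meeting
**Time**: 2026-03-15 14:00-15:30  
**Location**: Online  
**Attendees**: Zhang San, Li Si, Wang Wu

---

## 📝 Meeting Minutes

### Discussion Points

1. Analysis of why the Q1 product launch was delayed
2. Prioritization of Q2 core features
3. Resource allocation adjustments

### ✅ Decisions

- [x] Agreed on the three core features for Q2
- [x] Approved 2 additional developers
- [x] Publish a detailed schedule by next Wednesday

### 📋 Action Items

| Owner | Task | Due Date |
|-------|------|----------|
| Zhang San | Complete the PRD document | 2026-03-18 |
| Li Si | Technical design review | 2026-03-20 |
| Wang Wu | Coordinate resource allocation | 2026-03-17 |

---

## 📄 Full Transcript (collapsible)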

<details>
<summary>Expand to view the full conversation</summary>

[14:00] Zhang San: Hello everyone, today we...
[14:05] Li Si: About the delay, I think...
...
</details>
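
To make the action-item table in the sample concrete, here is a small sketch of rendering such a table from structured data. The function name `render_action_items`, the dictionary keys `owner`, `task`, and `due`, and the default column labels are all assumptions for illustration. The page does not show how the skill builds this section.

```python
def render_action_items(
    items: list[dict[str, str]],
    headers: tuple[str, str, str] = ("Owner", "Task", "Due Date"),
) -> str:
    """Render action items as a Markdown table, one row per item."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "|" + "|".join("---" for _ in headers) + "|",
    ]
    for item in items:
        cells = [item["owner"], item["task"], item["due"]]
        # Escape pipes so free-form text cannot break the table layout.
        row = " | ".join(cell.replace("|", "\\|") for cell in cells)
        lines.append(f"| {row} |")
    return "\n".join(lines)
```

Escaping the pipe character matters because task descriptions come from transcribed speech and may contain arbitrary text.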

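Integration with Other Skills

social-publisher: turn meeting minutes directly into articles for WeChat Official Accounts or Xiaohongshu
summarize: extract key information from long recordings first, then generate a summary
wechat-formatter: quickly format meeting minutes into content ready to publish on a WeChat Official Account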

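Technical Details

Uses the OpenAI Whisper API for speech-to-text
Optional integration with GPT models for summarization and action-item extraction
Supports mixed Chinese-English recognition
Audio preprocessing: automatic noise reduction and format conversion (via ffmpeg)
Output is UTF-8 encoded and supports Chinese typesetting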
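
The format-conversion and transcription steps listed above could look roughly like the sketch below. It is an illustration under stated assumptions, not the skill's code: `ffmpeg_convert_command` and `transcribe` are hypothetical names, converting to 16 kHz mono is an assumed preprocessing choice, and no noise reduction is attempted. The `client.audio.transcriptions.create` call follows the `openai>=1.0.0` Python SDK. The client is passed in so the function can be exercised without network access.

```python
from pathlib import Path
from typing import Any, Optional


def ffmpeg_convert_command(src: Path, dst: Path) -> list[str]:
    """Build an ffmpeg command that converts src to 16 kHz mono audio at dst."""
    # -y overwrites dst, -ac sets the channel count, -ar the sample rate.
    return ["ffmpeg", "-y", "-i", str(src), "-ac", "1", "-ar", "16000", str(dst)]


def transcribe(
    client: Any,
    audio_path: Path,
    model: str = "whisper-1",
    language: Optional[str] = None,
) -> str:
    """Send one audio file to the transcription endpoint and return its text."""
    kwargs = {"model": model}
    if language and language != "auto":
        # Omitting the field lets the API detect the language itself.
        kwargs["language"] = language
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(file=audio_file, **kwargs)
    return result.text


# Usage with the real SDK, which reads OPENAI_API_KEY from the environment:
#   import subprocess
#   from openai import OpenAI
#   subprocess.run(ffmpeg_convert_command(Path("in.m4a"), Path("in.wav")), check=True)
#   text = transcribe(OpenAI(), Path("in.wav"), language="zh")
```

Note that the audio leaves your machine at the `create` call, so test with non-sensitive recordings first.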

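Installing Dependencies

# System dependencies
apt install -y ffmpeg

# Python dependencies (installed automatically)
# Quote the requirement so the shell does not treat ">" as a redirect
pip install "openai>=1.0.0"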
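
Before running the skill, it can help to confirm that both dependencies are actually visible. The following is a minimal sketch of such a check. `missing_dependencies` is a hypothetical helper, not something the skill ships.

```python
import importlib.util
import shutil


def missing_dependencies(
    executables: tuple[str, ...] = ("ffmpeg",),
    modules: tuple[str, ...] = ("openai",),
) -> list[str]:
    """Return the names of required executables and modules that are not found."""
    # shutil.which searches PATH the same way the shell would.
    missing = [exe for exe in executables if shutil.which(exe) is None]
    # find_spec returns None for a top-level module that is not installed.
    missing += [mod for mod in modules if importlib.util.find_spec(mod) is None]
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    print("missing: " + ", ".join(problems) if problems else "all dependencies found")
```

The check only confirms presence. It does not verify that the installed `openai` package meets the `>=1.0.0` requirement.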

License

MIT

Usage Guidance

This skill appears to do what it says for basic transcription: it sends audio to OpenAI's Whisper API using your OPENAI_API_KEY and writes a markdown notes file. Before installing or using it:

- Review the code yourself (source/audio_note_taker.py) and confirm you are comfortable with sending audio to OpenAI; transcripts are transmitted to OpenAI servers and may contain sensitive content. Use non-sensitive test audio first.
- The advertised features (speaker diarization, GPT-based summarization, integration with other skills) are mostly documented but not implemented; do not assume those are available in this version.
- The SKILL.md mentions OPENAI_BASE_URL and NOTE_TAKER_MODEL but the script does not use them; if you need a custom API host or alternate LLM, you may need to modify the code.
- The installer will pip install the 'openai' package into the user environment; check and audit that dependency if you run in a sensitive environment.
- Consider creating a limited-cost/test API key, monitor API usage and billing, and avoid using production-sensitive audio until you verify the behavior.

If you want a higher confidence verdict, provide (or inspect) a version that implements speaker-diarization and summarization, or a runlog showing network endpoints used at runtime (to confirm no unexpected exfiltration).

Capability Analysis

Type: OpenClaw Skill
Name: audio-note-taker
Version: 1.0.1

The audio-note-taker skill is a legitimate tool designed to transcribe audio files using the OpenAI Whisper API. The code in source/audio_note_taker.py and the install.sh script perform standard operations consistent with the stated purpose, such as checking for dependencies (openai, ffmpeg), verifying environment variables (OPENAI_API_KEY), and processing local audio files to generate Markdown notes.

Capability Assessment

ℹ Purpose & Capability

The name/description (audio → structured notes, Whisper + optional GPT summarization, speaker detection) aligns with the code's main behavior: the Python script calls OpenAI's audio transcription API and writes a notes file. However, advertised features such as automatic summarization and speaker diarization are not implemented (generate_notes currently only inserts a placeholder), and the optional env vars documented (OPENAI_BASE_URL, NOTE_TAKER_MODEL) are declared but not used by the code. The registry version (1.0.1) and the skill.json version (1.0.0) are also inconsistent.

✓ Instruction Scope

Runtime instructions and CLI usage in SKILL.md match the script's CLI. The SKILL.md and install.sh ask for ffmpeg for preprocessing and pip install openai; the code accepts an audio file, uses the OpenAI client, and writes an output file. The instructions do not ask the agent to read unrelated files or secrets beyond OPENAI_API_KEY.

✓ Install Mechanism

No complex install spec in registry; included install.sh runs 'pip3 install --user openai>=1.0.0' and checks for ffmpeg. There are no external downloads from untrusted URLs or archive extraction operations. Installation actions are standard and limited.

ℹ Credentials

The skill requires only OPENAI_API_KEY, which is appropriate for calling OpenAI APIs. SKILL.md documents optional OPENAI_BASE_URL and NOTE_TAKER_MODEL, but the Python code does not read these env vars. The script correctly checks for OPENAI_API_KEY before proceeding. No unrelated credentials or config paths are requested.

✓ Persistence & Privilege

The skill does not request always:true, does not modify other skills or system-wide settings, and only writes output notes to a user-specified file. It does not persist credentials or install background services.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install audio-note-taker
After installation, invoke the skill by name or use /audio-note-taker
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.1

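audio-note-taker 1.0.1
- Reworked and streamlined the feature descriptions and documentation structure, highlighting meeting, lecture, and interview use cases
- Expanded the parameter documentation for advanced features such as speaker identification, summary extraction, and action-item collection
- Stated the ffmpeg and openai>=1.0.0 dependencies explicitly and completed the environment variable documentation
- Improved the sample Markdown output for better usability and readability
- Updated the description and metadata to reflect actual functionality and use cases more accurately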

v1.0.0

Initial release of audio-note-taker (audio note-taking assistant)
- Automatic transcription and structured note generation from audio files
- Supports batch processing, preview mode, and undo/backup functionality
- Extracts keywords, summarizes content, and identifies action items
- Outputs Markdown notes, supporting MP3, WAV, M4A, FLAC, and OGG formats
- Configurable via command-line options and optional JSON config file

Metadata

Slug audio-note-taker

Version 1.0.1

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 2

Frequently Asked Questions

What is Audio Note Taker?

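Audio Note Taker is an audio note-taking assistant that automatically transcribes recordings and organizes them into structured notes, with speaker identification and automatic summaries of key points and action items. It is an AI Agent Skill for Claude Code / OpenClaw, with 301 downloads so far.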

How do I install Audio Note Taker?

Run "/install audio-note-taker" in the OpenClaw or Claude Code chat to install it in one step. To run the skill afterwards, OPENAI_API_KEY must be set and ffmpeg must be available, as described in the README.

Is Audio Note Taker free?

Yes, the skill itself is free and licensed under MIT-0, so you can download, install and use it at no cost. It calls the OpenAI API with your own OPENAI_API_KEY, so monitor API usage and billing on that account.

Which platforms does Audio Note Taker support?

Audio Note Taker is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Audio Note Taker?

It is built and maintained by utopiabenben (@utopiabenben); the current version is v1.0.1.
