leeleoo

realtime-transcription

Author: Lee · GitHub · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 41
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install realtime-transcription
Description
Real-time transcription of system or microphone audio with automatic summary generation and date-based Markdown archival after stopping or idle timeout.
Usage (SKILL.md)

Real-time Transcription Skill

Capture any audio, get a structured summary. Real-time transcription powered by SenseVoice/FunASR.

Features

  • Real-time transcription — stream audio from system (BlackHole) or microphone
  • Auto summary — on stop, generate title + structured summary
  • Date-based archival — results saved to archive/YYYY/MM/DD-HHMM-title.md
  • Idle detection — auto-stops after 60s of silence (configurable)
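
The idle detection described above could work along these lines. This is a minimal sketch, not the logic in realtime_asr.py: the `IdleDetector` class, the RMS loudness measure, and the `threshold` value are all assumptions made for illustration. Only the 60-second default and the "0 disables" behavior come from the documentation.

```python
import math
import time


def rms(samples):
    """Root-mean-square level of a block of float samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class IdleDetector:
    """Tracks how long the input has stayed below a loudness threshold."""

    def __init__(self, idle_timeout=60.0, threshold=0.01, clock=time.monotonic):
        self.idle_timeout = idle_timeout  # seconds; 0 disables detection
        self.threshold = threshold        # hypothetical RMS cutoff for silence
        self.clock = clock                # injectable so the logic can be tested
        self.last_sound = clock()

    def feed(self, samples):
        """Feed one audio block; return True once the idle timeout has elapsed."""
        now = self.clock()
        if rms(samples) >= self.threshold:
            self.last_sound = now
        if self.idle_timeout <= 0:
            return False
        return now - self.last_sound >= self.idle_timeout
```

A capture loop would call `feed()` once per audio block and stop recording when it returns True.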

Skill Location

All files are in ~/.openclaw/skills/realtime-transcription/:

realtime-transcription/
├── SKILL.md                 # This file
├── realtime_asr.py          # Background transcription process
├── summary_prompt.py        # LLM prompt builder & response parser
├── archiver.py              # Markdown archival module
├── references/
│   └── module-reference.md  # Module API reference
├── .tmp/                    # Runtime temp files
└── archive/                 # Archived outputs

Prerequisites

Python Dependencies

pip3 install sounddevice librosa funasr torch numpy

Or use the built-in installer with progress output:

cd ~/.openclaw/skills/realtime-transcription
python3 realtime_asr.py --install-deps

System Audio (optional, macOS)

For macOS system audio capture, install BlackHole: brew install blackhole-2ch

ASR Model

Download the SenseVoice model: modelscope download --model gongjy/SenseVoiceSmall --local_dir ./model/SenseVoiceSmall

Quick Start

Check Dependencies

cd ~/.openclaw/skills/realtime-transcription
python3 realtime_asr.py --check-deps

Expected output:

✅ 所有依赖已安装。
   sounddevice — PyAudio binding for microphone/system audio capture
   librosa — Audio resampling and preprocessing
   funasr — SenseVoice ASR model framework
   torch — PyTorch deep learning runtime
   numpy — Numerical array processing

If dependencies are missing, run python3 realtime_asr.py --install-deps to install them one by one with progress output.
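
For readers who want to see what a dependency check of this kind involves, here is a small sketch using only the standard library. It is not the code behind --check-deps; the `missing_modules` helper is hypothetical, and the module list is simply copied from the pip3 install line.

```python
import importlib.util

# Names taken from the install command in this document; for these five
# packages the import name is the same as the pip name.
REQUIRED = ["sounddevice", "librosa", "funasr", "torch", "numpy"]


def missing_modules(names=REQUIRED):
    """Return the names that cannot be located on the current import path."""
    # find_spec() only locates top-level modules; it does not import them.
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_modules()` returns an empty list when everything is installed. A caller could exit with code 1 otherwise, matching the documented exit code for a failed dependency check.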

Start Transcription

System audio (BlackHole):

cd ~/.openclaw/skills/realtime-transcription
python3 realtime_asr.py --source blackhole

Microphone:

cd ~/.openclaw/skills/realtime-transcription
python3 realtime_asr.py --source mic

With custom idle timeout (5 minutes):

cd ~/.openclaw/skills/realtime-transcription
python3 realtime_asr.py --source mic --idle-timeout 300

Disable idle timeout:

cd ~/.openclaw/skills/realtime-transcription
python3 realtime_asr.py --source mic --idle-timeout 0

Stop Transcription

Press Ctrl+C in the terminal, or:

kill $(cat .tmp/asr.pid 2>/dev/null) 2>/dev/null; rm -f .tmp/asr.pid
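
The same stop step can be sketched in Python, which makes the stale-PID cases explicit. The `stop_transcription` helper is hypothetical. It assumes, as the shell one-liner does, that the PID file holds a single integer and that the process exits on SIGTERM.

```python
import os
import signal
from pathlib import Path


def stop_transcription(pid_file=".tmp/asr.pid"):
    """Send SIGTERM to the PID recorded in pid_file, then remove the file.

    Returns True if a signal was sent, False if there was nothing to stop.
    """
    path = Path(pid_file)
    try:
        pid = int(path.read_text().strip())
    except (FileNotFoundError, ValueError):
        path.unlink(missing_ok=True)  # drop a missing or corrupt PID file
        return False
    try:
        os.kill(pid, signal.SIGTERM)  # same default signal as the shell kill
        sent = True
    except ProcessLookupError:
        sent = False  # the process had already exited
    path.unlink(missing_ok=True)
    return sent
```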

After Stopping — Summary & Archive

  1. Read the transcript: cat .tmp/transcript.txt
  2. Build the LLM prompt:

cd ~/.openclaw/skills/realtime-transcription
python3 -c "
from summary_prompt import build_summary_prompt
print(build_summary_prompt(open('.tmp/transcript.txt').read()))
"

  3. Send the prompt to yourself (the LLM) to generate TITLE + SUMMARY.
  4. Parse and archive, replacing YOUR_LLM_RESPONSE_HERE with the LLM's reply:

cd ~/.openclaw/skills/realtime-transcription
python3 -c "
from summary_prompt import parse_summary_response
from archiver import archive
transcript = open('.tmp/transcript.txt').read()
result = parse_summary_response('YOUR_LLM_RESPONSE_HERE')
path = archive(transcript, result['title'], result['summary'], 'blackhole')
print(f'Archived to: {path}')
"

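Pasting the LLM reply into a quoted string inside python3 -c is fragile, because any quote or newline in the reply can end the literal early. One way around this, sketched below, is to save the reply to a file and read it from there. The `archive_from_files` helper and the .tmp/llm_response.txt path are assumptions for illustration. The `parse` and `archive` callables stand in for `summary_prompt.parse_summary_response` and `archiver.archive`, whose signatures are inferred from the snippet above.

```python
from pathlib import Path


def archive_from_files(parse, archive, transcript_path=".tmp/transcript.txt",
                       response_path=".tmp/llm_response.txt", source="blackhole"):
    """Parse a saved LLM reply and archive it without embedding it in code.

    parse   -- callable like summary_prompt.parse_summary_response
    archive -- callable like archiver.archive
    response_path is a hypothetical file the reply was written to beforehand.
    """
    transcript = Path(transcript_path).read_text(encoding="utf-8")
    response = Path(response_path).read_text(encoding="utf-8")
    result = parse(response)  # expected to expose 'title' and 'summary'
    return archive(transcript, result["title"], result["summary"], source)


# Possible usage from the skill directory (assumes the modules named above):
#   from summary_prompt import parse_summary_response
#   from archiver import archive
#   print(archive_from_files(parse_summary_response, archive))
```

Because the reply is treated as data and never as source text, its contents cannot change which code runs.
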
CLI Reference

| Flag | Default | Description |
|---|---|---|
| --source | blackhole | Audio source: blackhole (system) or mic |
| --output | .tmp/transcript.txt | Transcript file path |
| --state | .tmp/asr.pid | PID file for process management |
| --model | ./model/SenseVoiceSmall | SenseVoice model directory |
| --idle-timeout | 60 | Auto-stop after N seconds of silence (0=disable) |
| --device | auto | Audio device ID override |
| --check-deps | (none) | Check dependencies and exit |
| --install-deps | (none) | Install missing dependencies with progress output |
| --list-devices | (none) | List available audio input devices |
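
The flags above map naturally onto an `argparse` declaration. The sketch below shows one way they might be declared. It is not the parser in realtime_asr.py: the `build_parser` name, the integer type for --idle-timeout, and the string type for --device are assumptions.

```python
import argparse


def build_parser():
    """Declare the flags listed in the CLI reference with their defaults."""
    p = argparse.ArgumentParser(prog="realtime_asr.py")
    p.add_argument("--source", choices=["blackhole", "mic"], default="blackhole")
    p.add_argument("--output", default=".tmp/transcript.txt")
    p.add_argument("--state", default=".tmp/asr.pid")
    p.add_argument("--model", default="./model/SenseVoiceSmall")
    p.add_argument("--idle-timeout", type=int, default=60)  # 0 disables
    p.add_argument("--device", default="auto")              # assumed to be a string
    p.add_argument("--check-deps", action="store_true")
    p.add_argument("--install-deps", action="store_true")
    p.add_argument("--list-devices", action="store_true")
    return p
```

For example, `build_parser().parse_args(["--source", "mic", "--idle-timeout", "0"])` yields a namespace with `source == "mic"` and `idle_timeout == 0`.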

Trigger Words

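| User says | Action |
|---|---|
| "开始转录" (start transcription) / "transcribe" / "启动转录" (launch transcription) | Check deps → ask source → start |
| "停止" (stop) / "stop" | Stop process → summary → archive |
| "当前转录内容" (current transcript) | Show .tmp/transcript.txt |
| "检查依赖" (check dependencies) | Run --check-deps |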

Output Format

Transcript (.tmp/transcript.txt)

[14:30:00] 你好今天我们来讨论一下AI的发展
[14:30:05] AI技术在各个领域都有广泛应用
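
Each transcript line is a bracketed HH:MM:SS timestamp followed by text, so it can be split with a regular expression. The `parse_transcript` helper below is a hypothetical sketch that assumes every entry fits on one line, as in the sample above.

```python
import datetime
import re

# One entry per line: "[HH:MM:SS] text"
LINE_RE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s*(.*)$")


def parse_transcript(text):
    """Return (time, text) pairs for every well-formed transcript line."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # blank or malformed lines are skipped
            hour, minute, second, body = match.groups()
            stamp = datetime.time(int(hour), int(minute), int(second))
            entries.append((stamp, body))
    return entries
```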

Archive (archive/YYYY/MM/DD-HHMM-title.md)

---
title: "AI发展趋势讨论"
date: 2025-05-16
time: "14:30 - 14:38"
source: blackhole
duration: 8m
---

## 摘要

- AI在医疗、金融、教育领域广泛应用
- 未来将更智能和普及

## 完整转录

[14:30:00] 你好今天我们来讨论一下AI的发展
...
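
To show how the archive/YYYY/MM/DD-HHMM-title.md layout and the front matter fit together, here is a sketch that derives both from a session's start and end times. It is not the code in archiver.py. The `archive_path` and `front_matter` helpers and the title sanitizing rule are assumptions, and titles containing double quotes are not escaped.

```python
import re
from datetime import datetime
from pathlib import Path


def archive_path(title, started, root="archive"):
    """Build archive/YYYY/MM/DD-HHMM-title.md for a session start time."""
    # Hypothetical sanitizer: path separators and whitespace runs become "-".
    safe = re.sub(r"[\\/\s]+", "-", title.strip()) or "untitled"
    return Path(root) / f"{started:%Y}" / f"{started:%m}" / f"{started:%d-%H%M}-{safe}.md"


def front_matter(title, started, ended, source):
    """Render the YAML header shown in the archive example."""
    minutes = int((ended - started).total_seconds() // 60)
    return "\n".join([
        "---",
        f'title: "{title}"',
        f"date: {started:%Y-%m-%d}",
        f'time: "{started:%H:%M} - {ended:%H:%M}"',
        f"source: {source}",
        f"duration: {minutes}m",
        "---",
    ])
```

With a start of 2025-05-16 14:30 and an end of 14:38, `front_matter` reproduces the header in the example, including `duration: 8m`.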

Error Handling

| Scenario | Behavior |
|---|---|
| Missing dependencies | Refuse to start, show install instructions |
| BlackHole not found | Suggest --source mic |
| Process crashes | PID file gone → offer to recover |
| Empty transcript | Warn user, skip summary, no archive |
| No sound for N seconds | Exit code 42, ask user to continue |

Exit Codes

| Code | Meaning |
|---|---|
| 0 | Normal stop |
| 1 | Dependency check failed |
| 42 | Idle timeout; ask user: "⏸️ 已 N 秒没有检测到声音,是否继续录音?(y/n)" ("No sound detected for N seconds. Continue recording? (y/n)") |
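
A caller that launches the transcription process can branch on these exit codes. The sketch below shows one possible mapping. The `next_action` and `run_and_classify` helpers and their action labels are hypothetical, and any code not listed above is treated as a crash.

```python
import subprocess
import sys

EXIT_OK = 0
EXIT_DEPS_FAILED = 1
EXIT_IDLE_TIMEOUT = 42


def next_action(returncode):
    """Map a realtime_asr.py exit code to what the caller should do next."""
    if returncode == EXIT_OK:
        return "summarize"       # normal stop: build the summary and archive
    if returncode == EXIT_DEPS_FAILED:
        return "install-deps"    # show install instructions
    if returncode == EXIT_IDLE_TIMEOUT:
        return "ask-continue"    # prompt the user whether to keep recording
    return "report-error"        # any other code is treated as a crash


def run_and_classify(argv):
    """Run a command to completion and classify its exit code."""
    return next_action(subprocess.run(argv).returncode)
```

For instance, `run_and_classify([sys.executable, "realtime_asr.py", "--source", "mic"])` would return "ask-continue" after an idle timeout.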
Security advice
Only install this if you are comfortable with system/microphone audio being transcribed, summarized by your active LLM, and saved locally. Before use, narrow the tool permissions, verify/pin the ASR model and dependencies, and treat the README’s “zero data leaves machine” claim as unproven unless your LLM is local.
Functional analysis
Type: OpenClaw Skill
Name: realtime-transcription
Version: 1.0.0

The skill bundle exhibits several high-risk patterns that create significant security vulnerabilities, although no clear malicious intent is evident. Most critically, SKILL.md directs the AI agent to run a Python snippet via `python3 -c` that embeds raw LLM output directly in a string literal. This is highly susceptible to command injection if the LLM output contains a payload that breaks out of the string context. In addition, `realtime_asr.py` includes a built-in dependency installer that uses `pip`, and it loads ASR models with `trust_remote_code=True`, which by design of the underlying library (FunASR/ModelScope) permits remote code execution. These features support the stated goal of local audio transcription, but they also give an attacker multiple vectors for arbitrary code execution.
Capability assessment
Purpose & Capability
Real-time audio transcription and archiving are coherent with the stated purpose, but the captured system/microphone audio can be highly sensitive and the documented LLM summarization creates a data-flow risk.
Instruction Scope
The allowed-tools scope is broader than the workflow needs, including arbitrary python3 -c execution and global Read/Write/cp/mv-style file operations.
Install Mechanism
The skill relies on unpinned Python/model dependencies and loads the ASR model with trust_remote_code=True, which materially increases supply-chain and code-execution risk.
Credentials
Audio capture and local archiving are expected for this skill, but broad filesystem and shell permissions are not clearly bounded to the skill directory or transcript/archive files.
Persistence & Privilege
The skill runs a background transcription process and stores transcripts/archives locally; this is disclosed and purpose-aligned, but users should actively stop recording and manage retained archives.
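How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install realtime-transcription
  3. Once installed, invoke the skill by name or trigger it with /realtime-transcription.
  4. Provide the inputs described in the skill's parameter documentation to get structured output.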
Version history
v1.0.0
- Initial release of real-time audio transcription skill with automatic summarization and archival.
- Supports both system audio capture (via BlackHole on macOS) and microphone input.
- On stop, generates a title and structured summary, saving outputs as dated Markdown files.
- Automatic idle detection: stops recording after a configurable number of seconds of silence.
- Includes CLI for setup, direct operation, dependency management, and device selection.
- Multi-language trigger phrases supported for starting/stopping transcription and checking status.
Metadata
Slug: realtime-transcription
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 1
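FAQ

What is realtime-transcription?

Real-time transcription of system or microphone audio with automatic summary generation and date-based Markdown archival after stopping or idle timeout. It is an AI agent skill plugin for Claude Code / OpenClaw, with 41 total downloads so far.

How do I install realtime-transcription?

Run the command "/install realtime-transcription" in the OpenClaw or Claude Code chat box to install the skill in one step. The skill itself still needs the Python dependencies and the ASR model listed under Prerequisites.

Is realtime-transcription free?

Yes. realtime-transcription is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does realtime-transcription support?

realtime-transcription is tagged cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed. System audio capture via BlackHole is documented for macOS only.

Who developed realtime-transcription?

It is developed and maintained by Lee (@leeleoo). The current version is v1.0.0.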
