
audio-transcribe-summarize

Name: audio-transcribe-summarize
Author: q1lin570

By q1lin570 · GitHub · v1.0.1 · MIT-0

cross-platform ⚠ suspicious

238

Total downloads

Current installs

Versions

Install in OpenClaw

/install audio-transcribe-summarize

Description

Transcribe audio/video files to text and generate structured summaries using SenseAudio ASR API. Use when the user asks to transcribe, summarize, or take not...

Usage (SKILL.md)

Audio/Video Transcription & Summarization

Transcribe audio/video files using the SenseASR API (api.senseaudio.cn), then summarize the content into structured notes.

{baseDir} refers to this skill's directory.

Prerequisites

Environment variable SENSEAUDIO_API_KEY configured (get your key at https://senseaudio.cn/platform/api-key)
Python 3.8+ with requests installed
For large files (>10MB): ffmpeg installed for splitting (macOS: brew install ffmpeg; Windows: download from ffmpeg.org and add it to PATH; Linux: apt install ffmpeg)
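The prerequisites above can be checked before the first run. The following is a minimal sketch, not part of the skill itself; the function name `check_prerequisites` is hypothetical, and it assumes that ffmpeg only needs to be discoverable on PATH.

```python
import importlib.util
import os
import shutil
import sys


def check_prerequisites(env=None):
    """Return a list of problems; an empty list means everything was found."""
    env = os.environ if env is None else env
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required")
    if not env.get("SENSEAUDIO_API_KEY"):
        problems.append("SENSEAUDIO_API_KEY is not set")
    if importlib.util.find_spec("requests") is None:
        problems.append("the 'requests' package is not installed")
    if shutil.which("ffmpeg") is None:
        # ffmpeg is only needed to split files larger than 10MB.
        problems.append("ffmpeg not found on PATH (needed only for files >10MB)")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("-", problem)
```

Reading the key from the environment, rather than from a file in the repository, matches the v1.0.1 change that removed the `.env` file.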

Quick Start

Run the transcription script:

python {baseDir}/scripts/transcribe.py <audio_file> [--model sense-asr-pro] [--language zh] [--speakers] [--sentiment] [--translate en]

The script outputs a transcript .txt file alongside the source file
Read the transcript and generate a summary (see Summary Format below)

Workflow

Step 1: Assess the Audio File

Check file size and format:

Supported formats: wav, mp3, ogg, flac, aac, m4a, mp4
Max file size per request: 10MB
If file > 10MB, the script auto-splits using ffmpeg
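The format and size check can be sketched as follows. This is an illustration only: `assess_audio_file` is a hypothetical helper, and it assumes that "10MB" means 10 × 1024 × 1024 bytes, which the source does not specify.

```python
import os

SUPPORTED_FORMATS = {"wav", "mp3", "ogg", "flac", "aac", "m4a", "mp4"}
MAX_REQUEST_BYTES = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB


def assess_audio_file(path, size_bytes=None):
    """Validate the extension and report whether the file needs splitting."""
    ext = os.path.splitext(path)[1].lower().lstrip(".")
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {ext or '(none)'}")
    if size_bytes is None:
        # Fall back to the real size on disk when no size is supplied.
        size_bytes = os.path.getsize(path)
    return {
        "format": ext,
        "size_bytes": size_bytes,
        "needs_split": size_bytes > MAX_REQUEST_BYTES,
    }
```

For example, `assess_audio_file("meeting.wav")` reads the size from disk, while passing `size_bytes` explicitly makes the check usable without touching the file system.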

Step 2: Choose the Right Model

| Model | Use When |
| --- | --- |
| `sense-asr-lite` | Quick batch transcription, simple audio, cost-sensitive |
| `sense-asr` | General transcription, need speaker separation or timestamps |
| `sense-asr-pro` | High accuracy needed: meetings, interviews, complex audio |
| `sense-asr-deepthink` | Noisy audio, dialects, heavy jargon, speech-to-clean-text |

Default to sense-asr-pro for best quality.
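The selection rule can be expressed as a small lookup. This is a sketch only: the need labels (`batch`, `general`, `high_accuracy`, `noisy`) are hypothetical shorthand for the "Use When" descriptions, while the model names and the default come from the text above.

```python
DEFAULT_MODEL = "sense-asr-pro"

# Hypothetical need labels mapped to the documented model names.
MODEL_FOR_NEED = {
    "batch": "sense-asr-lite",         # quick, simple, cost-sensitive
    "general": "sense-asr",            # speaker separation or timestamps
    "high_accuracy": "sense-asr-pro",  # meetings, interviews, complex audio
    "noisy": "sense-asr-deepthink",    # noise, dialects, heavy jargon
}


def choose_model(need=None):
    """Return the model for a need label, or the default when none is given."""
    if need is None:
        return DEFAULT_MODEL
    try:
        return MODEL_FOR_NEED[need]
    except KeyError:
        raise ValueError(f"unknown need label: {need!r}") from None
```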

Step 3: Transcribe

Run the transcription script. Key options:

# Basic transcription
python {baseDir}/scripts/transcribe.py recording.mp3

# Meeting with multiple speakers + emotion
python {baseDir}/scripts/transcribe.py meeting.wav \
  --model sense-asr-pro \
  --speakers --max-speakers 4 \
  --sentiment \
  --timestamps segment

# Transcribe and translate to English
python {baseDir}/scripts/transcribe.py lecture.mp3 \
  --model sense-asr \
  --translate en
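When the script is driven from Python rather than a shell, the same options can be assembled as an argument list. The sketch below uses only the flags shown in the commands above; the helper name `build_transcribe_command` and its keyword names are hypothetical.

```python
import sys


def build_transcribe_command(script_path, audio_file, model=None, language=None,
                             speakers=False, max_speakers=None, sentiment=False,
                             timestamps=None, translate=None):
    """Build an argv list for transcribe.py from the documented flags."""
    cmd = [sys.executable, script_path, audio_file]
    if model:
        cmd += ["--model", model]
    if language:
        cmd += ["--language", language]
    if speakers:
        cmd.append("--speakers")
    if max_speakers is not None:
        cmd += ["--max-speakers", str(max_speakers)]
    if sentiment:
        cmd.append("--sentiment")
    if timestamps:
        cmd += ["--timestamps", timestamps]
    if translate:
        cmd += ["--translate", translate]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Because no shell is involved, file names containing spaces or shell metacharacters are passed through unchanged.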

Step 4: Summarize

After transcription, read the transcript file and produce a summary using the format below.

Summary Format

Generate summaries in this structure:

# [Title - inferred from content]

**Source**: filename.mp3
**Duration**: X min Y sec
**Date**: YYYY-MM-DD
**Speakers**: [if speaker diarization was used]

## Key Points
- Point 1
- Point 2
- ...

## Detailed Summary
[2-4 paragraph summary of the content organized by topic/chronology]

## Action Items
- [ ] Action item 1 (assigned to Speaker X, if applicable)
- [ ] Action item 2

## Notable Quotes
> "Direct quote from transcript" — Speaker X, [timestamp if available]

## Full Transcript
<details>
<summary>Click to expand full transcript</summary>

[Full transcript text here, with speaker labels and timestamps if available]

</details>

Adapt the template based on content type:

Meeting: emphasize action items, decisions, speaker contributions
Lecture/Talk: emphasize key concepts, learning points, structure
Interview: emphasize Q&A pairs, key responses
Podcast: emphasize topics discussed, interesting insights
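The fixed parts of the summary template can be pre-filled in code, leaving the section bodies to be written from the transcript. This is a minimal sketch with a hypothetical helper name; it assumes the Speakers line is omitted when speaker diarization was not used.

```python
from datetime import date

SECTIONS = ("Key Points", "Detailed Summary", "Action Items",
            "Notable Quotes", "Full Transcript")


def render_summary_skeleton(title, source, duration_seconds, recorded_on,
                            speakers=None):
    """Render the summary title, metadata lines, and empty section headings."""
    minutes, seconds = divmod(int(duration_seconds), 60)
    lines = [
        f"# {title}",
        "",
        f"**Source**: {source}",
        f"**Duration**: {minutes} min {seconds} sec",
        f"**Date**: {recorded_on.isoformat()}",
    ]
    if speakers:
        # Only present when speaker diarization was used.
        lines.append(f"**Speakers**: {', '.join(speakers)}")
    for heading in SECTIONS:
        lines += ["", f"## {heading}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_summary_skeleton("Weekly sync", "meeting.wav", 125,
                                  date(2024, 1, 2), ["Speaker 1", "Speaker 2"]))
```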

API Reference

For full SenseASR API parameters and response formats, see api-reference.md.

Security guidance

This skill appears to do what it claims (send audio to SenseAudio and produce transcripts/summaries), but note two things before installing/using it: (1) It requires a SENSEAUDIO_API_KEY (the SKILL.md and script require it) even though the registry metadata omitted that — make sure you supply a key and understand where it will be stored. (2) All audio is uploaded to https://api.senseaudio.cn, so transcripts and possibly speaker/emotion metadata are sent to a third party — consider privacy/confidentiality and cost. If you proceed, verify the API host, only use a dedicated API key with appropriate permissions/quota, run the script in an isolated environment if the audio is sensitive, and confirm the registry metadata is corrected or ask the publisher why the API key was not declared.

Functional analysis

Type: OpenClaw Skill
Name: audio-transcribe-summarize
Version: 1.0.1

The skill provides a legitimate utility for transcribing and summarizing audio/video files using the SenseAudio ASR API (api.senseaudio.cn). The Python script `scripts/transcribe.py` correctly handles file splitting via `ffmpeg` and communicates with the API as described in the documentation. No evidence of data exfiltration, malicious execution, or prompt injection was found; the code follows best practices such as using argument lists in `subprocess.run` to prevent shell injection.
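To illustrate why argument lists matter here, the sketch below builds an ffmpeg split command as a list. It is not the skill's actual code: the helper name and the 300-second segment length are assumptions, and it uses ffmpeg's segment muxer with stream copy as one plausible way to split a file.

```python
def build_split_command(source, segment_seconds, output_pattern):
    """Build an ffmpeg argv list that cuts `source` into fixed-length parts."""
    return [
        "ffmpeg",
        "-i", source,                           # input file, passed verbatim
        "-f", "segment",                        # ffmpeg's segment muxer
        "-segment_time", str(segment_seconds),  # target length of each part
        "-c", "copy",                           # copy streams, no re-encoding
        output_pattern,                         # e.g. part_%03d.mp3
    ]


if __name__ == "__main__":
    # The odd file name stays a single argument; no shell ever parses it.
    print(build_split_command("my file; echo oops.mp3", 300, "part_%03d.mp3"))
```

Passing this list to `subprocess.run` without `shell=True` means the file name is never interpreted by a shell, so characters such as `;` or spaces cannot inject extra commands.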

Capability assessment

ℹ Purpose & Capability

The skill's name/description (transcribe & summarize using SenseAudio) align with the included code and API reference. However the registry metadata declared no required environment variables while SKILL.md and scripts/transcribe.py clearly require a SENSEAUDIO_API_KEY — an inconsistency between declared requirements and actual needs.

✓ Instruction Scope

SKILL.md instructs the agent to run the included Python script which uploads audio to api.senseaudio.cn and then writes local transcript (.txt/.json) files. The instructions and script operate within the stated purpose and do not attempt to read unrelated system files or additional environment variables beyond SENSEAUDIO_API_KEY. They do call ffmpeg/ffprobe via subprocess which is expected to split large audio files.

✓ Install Mechanism

There is no install spec (instruction-only with an included script). No packages are downloaded at install time. The risk surface is limited to running the provided Python script and any subprocesses it spawns (ffmpeg).

⚠ Credentials

The script requires SENSEAUDIO_API_KEY (used in Authorization header) but the registry metadata did not declare this environment variable. Requesting an API key for the remote ASR service is proportional to the functionality, but the metadata omission is misleading and could cause users to miss a sensitive requirement. Other environment access is minimal (PATH lookups for ffmpeg).

✓ Persistence & Privilege

The skill is not always-enabled and is user-invocable. It does not request elevated or persistent platform privileges and does not modify other skills or system-wide configuration. Autonomous invocation is allowed by default but is not combined with other high-risk patterns here.

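How to use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install audio-transcribe-summarize
Once installed, invoke the Skill by name or trigger it with /audio-transcribe-summarize
Provide the required inputs as described in the Skill's parameter notes to get structured output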

Version history

v1.0.1

- Removed the `.env` file from the repository.
- Updated setup instructions: now require configuring the `SENSEAUDIO_API_KEY` environment variable instead of using a `.env` file.
- Prerequisites section now provides OS-specific installation steps for ffmpeg.
- Dependency on `python-dotenv` is no longer mentioned; only `requests` is required.
- Maintains existing workflow and summary guidelines.

v1.0.0

- Initial release of audio-transcribe-summarize skill.
- Transcribes audio/video files to text using the SenseAudio ASR API.
- Supports automatic splitting of large files and multiple audio formats.
- Generates structured summaries tailored for meetings, lectures, interviews, and podcasts.
- Provides customizable transcription options, including speaker separation, sentiment analysis, and translation.
- Includes a markdown-based summary template for consistent and readable output.

Metadata

Slug audio-transcribe-summarize

Version 1.0.1

License MIT-0

Total installs 0

Current installs 0

Released versions 2

FAQ