q1lin570

audio-transcribe-summarize

by q1lin570 · GitHub ↗ · v1.0.1 · MIT-0
cross-platform · ⚠ suspicious
238 Downloads · 0 Stars · 0 Active Installs · 2 Versions
Install in OpenClaw: /install audio-transcribe-summarize
Description
Transcribe audio/video files to text and generate structured summaries using SenseAudio ASR API. Use when the user asks to transcribe, summarize, or take not...
README (SKILL.md)

Audio/Video Transcription & Summarization

Transcribe audio/video files using the SenseASR API (api.senseaudio.cn), then summarize the content into structured notes.

{baseDir} refers to this skill's directory.

Prerequisites

  • Environment variable SENSEAUDIO_API_KEY configured (get your key at https://senseaudio.cn/platform/api-key)
  • Python 3.8+ with requests installed
  • For large files (>10MB): ffmpeg installed for splitting (macOS: brew install ffmpeg; Windows: download from ffmpeg.org and add it to PATH; Linux: apt install ffmpeg)
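Before the first run, you can check all of the prerequisites above in one pass. The sketch below uses only the standard library. The function name `check_prerequisites` is hypothetical: it is not part of the skill's `scripts/` folder.

```python
import importlib.util
import os
import shutil
import sys
from typing import List


def check_prerequisites(need_ffmpeg: bool = False) -> List[str]:
    """Return human-readable problems; an empty list means ready to run.

    Hypothetical helper, not shipped with the skill.
    """
    problems = []
    # The key is read from the environment (the .env file was dropped in v1.0.1).
    if not os.environ.get("SENSEAUDIO_API_KEY"):
        problems.append("SENSEAUDIO_API_KEY is not set")
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required")
    # find_spec returns None when the package cannot be imported.
    if importlib.util.find_spec("requests") is None:
        problems.append("the requests package is not installed")
    # ffmpeg only matters for files over 10MB, which must be split.
    if need_ffmpeg and shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems
```

Call `check_prerequisites(need_ffmpeg=True)` before transcribing a large file. It reports every missing piece at once, instead of letting the run fail partway through.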

Quick Start

  1. Run the transcription script:
python {baseDir}/scripts/transcribe.py <audio_file> [--model sense-asr-pro] [--language zh] [--speakers] [--sentiment] [--translate en]
  2. The script outputs a transcript .txt file alongside the source file.
  3. Read the transcript and generate a summary (see Summary Format below).
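When the agent needs the transcript text programmatically, the "read the transcript" step can be scripted. This sketch rests on one assumption the README does not state: the transcript keeps the source file's name and only swaps the extension for `.txt`. The script may also write a `.json` file, which is ignored here. `transcript_path_for` and `load_transcript` are hypothetical names.

```python
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def transcript_path_for(audio_file: PathLike) -> Path:
    """Assumed location: same folder and stem as the source, with a .txt suffix."""
    return Path(audio_file).with_suffix(".txt")


def load_transcript(audio_file: PathLike) -> str:
    """Read the transcript written next to audio_file, failing loudly if absent."""
    path = transcript_path_for(audio_file)
    if not path.is_file():
        raise FileNotFoundError(f"no transcript at {path}; did transcribe.py finish?")
    # Assumption: UTF-8 output, which also covers Chinese transcripts.
    return path.read_text(encoding="utf-8")
```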

Workflow

Step 1: Assess the Audio File

Check file size and format:

  • Supported formats: wav, mp3, ogg, flac, aac, m4a, mp4
  • Max file size per request: 10MB
  • If file > 10MB, the script auto-splits using ffmpeg
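The checks in this step can be sketched as follows. This shows one plausible way to assess a file and plan the split; it is not necessarily what `transcribe.py` does internally. Two points are assumptions:

- Reading "10MB" as 10 × 1024 × 1024 bytes.
- Splitting by duration with ffmpeg's segment muxer.

The helper names are hypothetical.

```python
import math
from pathlib import Path
from typing import List, Tuple, Union

SUPPORTED_FORMATS = {"wav", "mp3", "ogg", "flac", "aac", "m4a", "mp4"}
# Assumption: "10MB" is taken as 10 * 1024 * 1024 bytes; the service may count differently.
MAX_BYTES = 10 * 1024 * 1024


def assess(path: Union[str, Path]) -> Tuple[bool, bool]:
    """Return (format_supported, needs_split) for a local file."""
    p = Path(path)
    supported = p.suffix.lower().lstrip(".") in SUPPORTED_FORMATS
    return supported, p.stat().st_size > MAX_BYTES


def split_plan(size_bytes: int, duration_s: float, margin: float = 0.9) -> Tuple[int, float]:
    """Choose a chunk count so each piece stays under MAX_BYTES.

    Assumes a roughly constant bitrate; `margin` leaves headroom for
    variable bitrate and container overhead.
    """
    chunks = max(1, math.ceil(size_bytes / (MAX_BYTES * margin)))
    return chunks, duration_s / chunks


def ffmpeg_segment_cmd(src: str, segment_s: float, out_pattern: str) -> List[str]:
    """Build an ffmpeg segment-muxer command; '-c copy' cuts without re-encoding."""
    return ["ffmpeg", "-i", src, "-f", "segment",
            "-segment_time", str(max(1, int(segment_s))),
            "-c", "copy", out_pattern]
```

The duration input can come from `ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 <file>`. With stream copy, ffmpeg cuts on keyframes, so a piece can run slightly longer than planned; the margin exists to absorb that overshoot.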

Step 2: Choose the Right Model

| Model | Use when |
|---|---|
| sense-asr-lite | Quick batch transcription, simple audio, cost-sensitive |
| sense-asr | General transcription, need speaker separation or timestamps |
| sense-asr-pro | High accuracy needed: meetings, interviews, complex audio |
| sense-asr-deepthink | Noisy audio, dialects, heavy jargon, speech-to-clean-text |

Default to sense-asr-pro for best quality.
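The model table reads as a simple decision rule. The sketch below encodes it with `sense-asr-pro` as the fallback, matching the stated default. The function, its flags, and the order in which the checks take priority are illustrative assumptions, not part of the skill.

```python
def choose_model(difficult_audio: bool = False,
                 cost_sensitive: bool = False,
                 best_quality: bool = True) -> str:
    """Map the table's 'Use when' column to a model name (priority order is assumed).

    difficult_audio: noisy recordings, dialects, heavy jargon, or a request
    for cleaned-up text rather than a verbatim transcript.
    """
    if difficult_audio:
        return "sense-asr-deepthink"
    if cost_sensitive:
        return "sense-asr-lite"   # quick batch jobs, simple audio
    if best_quality:
        return "sense-asr-pro"    # meetings, interviews, complex audio; the default
    return "sense-asr"            # general transcription with speakers or timestamps
```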

Step 3: Transcribe

Run the transcription script. Key options:

# Basic transcription
python {baseDir}/scripts/transcribe.py recording.mp3

# Meeting with multiple speakers + emotion
python {baseDir}/scripts/transcribe.py meeting.wav \
  --model sense-asr-pro \
  --speakers --max-speakers 4 \
  --sentiment \
  --timestamps segment

# Transcribe and translate to English
python {baseDir}/scripts/transcribe.py lecture.mp3 \
  --model sense-asr \
  --translate en
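When an agent or another tool drives the script, building the command as an argument list keeps filenames with spaces or shell metacharacters intact. The sketch below uses only flags that appear in the examples in this step. `build_transcribe_argv` is a hypothetical helper, and its `base_dir` argument stands in for `{baseDir}`.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional


def build_transcribe_argv(base_dir: str, audio_file: str,
                          model: Optional[str] = None,
                          language: Optional[str] = None,
                          speakers: bool = False,
                          max_speakers: Optional[int] = None,
                          sentiment: bool = False,
                          timestamps: Optional[str] = None,
                          translate: Optional[str] = None) -> List[str]:
    """Assemble argv for scripts/transcribe.py from the documented options."""
    script = Path(base_dir) / "scripts" / "transcribe.py"
    # sys.executable runs the script with the current interpreter (and its requests install).
    argv = [sys.executable, str(script), audio_file]
    if model:
        argv += ["--model", model]
    if language:
        argv += ["--language", language]
    if speakers:
        argv.append("--speakers")
    if max_speakers is not None:
        argv += ["--max-speakers", str(max_speakers)]
    if sentiment:
        argv.append("--sentiment")
    if timestamps:
        argv += ["--timestamps", timestamps]
    if translate:
        argv += ["--translate", translate]
    return argv


def run_transcription(argv: List[str]) -> subprocess.CompletedProcess:
    """Run without a shell; check=True raises CalledProcessError on failure."""
    return subprocess.run(argv, check=True, capture_output=True, text=True)
```

For the meeting example, the following call produces the same flags in the same order:

`build_transcribe_argv(base_dir, "meeting.wav", model="sense-asr-pro", speakers=True, max_speakers=4, sentiment=True, timestamps="segment")`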

Step 4: Summarize

After transcription, read the transcript file and produce a summary using the format below.

Summary Format

Generate summaries in this structure:

# [Title - inferred from content]

**Source**: filename.mp3
**Duration**: X min Y sec
**Date**: YYYY-MM-DD
**Speakers**: [if speaker diarization was used]

## Key Points
- Point 1
- Point 2
- ...

## Detailed Summary
[2-4 paragraph summary of the content organized by topic/chronology]

## Action Items
- [ ] Action item 1 (assigned to Speaker X, if applicable)
- [ ] Action item 2

## Notable Quotes
> "Direct quote from transcript" — Speaker X, [timestamp if available]

## Full Transcript
<details>
<summary>Click to expand full transcript</summary>

[Full transcript text here, with speaker labels and timestamps if available]

</details>

Adapt the template based on content type:

  • Meeting: emphasize action items, decisions, speaker contributions
  • Lecture/Talk: emphasize key concepts, learning points, structure
  • Interview: emphasize Q&A pairs, key responses
  • Podcast: emphasize topics discussed, interesting insights
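The template can also be filled programmatically, for example when summaries are generated in batches. A minimal sketch follows. The `Summary` dataclass and `render` function are hypothetical. The content-type adaptations listed above are not automated; they remain the job of whoever writes the field contents.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Summary:
    """Fields of the summary template above (hypothetical container)."""
    title: str
    source: str
    duration_s: int
    date: str  # YYYY-MM-DD
    key_points: List[str]
    detailed: str
    speakers: List[str] = field(default_factory=list)  # only if diarization ran
    action_items: List[str] = field(default_factory=list)
    quotes: List[Tuple[str, str]] = field(default_factory=list)  # (quote, attribution)
    transcript: str = ""


def render(s: Summary) -> str:
    """Fill the Markdown template; optional sections are skipped when empty."""
    minutes, seconds = divmod(s.duration_s, 60)
    lines = [f"# {s.title}", "",
             f"**Source**: {s.source}",
             f"**Duration**: {minutes} min {seconds} sec",
             f"**Date**: {s.date}"]
    if s.speakers:
        lines.append(f"**Speakers**: {', '.join(s.speakers)}")
    lines += ["", "## Key Points"] + [f"- {p}" for p in s.key_points]
    lines += ["", "## Detailed Summary", s.detailed]
    if s.action_items:
        lines += ["", "## Action Items"] + [f"- [ ] {a}" for a in s.action_items]
    if s.quotes:
        lines += ["", "## Notable Quotes"] + [f'> "{q}" \u2014 {who}' for q, who in s.quotes]
    lines += ["", "## Full Transcript", "<details>",
              "<summary>Click to expand full transcript</summary>", "",
              s.transcript, "", "</details>"]
    return "\n".join(lines)
```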

API Reference

For full SenseASR API parameters and response formats, see api-reference.md.

Usage Guidance
This skill appears to do what it claims (send audio to SenseAudio and produce transcripts/summaries), but note two things before installing or using it: (1) It requires a SENSEAUDIO_API_KEY (both SKILL.md and the script require it) even though the registry metadata omits it; make sure you supply a key and understand where it will be stored. (2) All audio is uploaded to https://api.senseaudio.cn, so the recordings, and the transcripts and any speaker/emotion metadata derived from them, are processed by a third party; consider privacy/confidentiality and cost. If you proceed, verify the API host, use a dedicated API key with appropriate permissions/quota, run the script in an isolated environment if the audio is sensitive, and confirm that the registry metadata has been corrected or ask the publisher why the API key was not declared.
Capability Analysis
Type: OpenClaw Skill · Name: audio-transcribe-summarize · Version: 1.0.1
The skill provides a legitimate utility for transcribing and summarizing audio/video files using the SenseAudio ASR API (api.senseaudio.cn). The Python script `scripts/transcribe.py` correctly handles file splitting via `ffmpeg` and communicates with the API as described in the documentation. No evidence of data exfiltration, malicious execution, or prompt injection was found; the code follows best practices such as using argument lists in `subprocess.run` to prevent shell injection.
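The point about argument lists is easy to verify locally. In this sketch, a filename containing `;` reaches the child process as a single argument, because no shell ever parses it. The filename and the `echo_arg` helper are invented for the demonstration.

```python
import subprocess
import sys


def echo_arg(arg: str) -> str:
    """Pass `arg` to a child Python process as argv[1] and return what it saw."""
    result = subprocess.run(
        # List form: no shell, so ';', spaces and quotes are not interpreted.
        [sys.executable, "-c", "import sys; print(sys.argv[1])", arg],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.rstrip("\n")


# With shell=True and string formatting, "; echo pwned" could start a second command.
hostile = "talk; echo pwned.mp3"
print(echo_arg(hostile))  # the child receives the whole string unchanged
```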
Capability Assessment
Purpose & Capability
The skill's name/description (transcribe & summarize using SenseAudio) align with the included code and API reference. However, the registry metadata declared no required environment variables, while SKILL.md and scripts/transcribe.py clearly require a SENSEAUDIO_API_KEY, which is an inconsistency between declared requirements and actual needs.
Instruction Scope
SKILL.md instructs the agent to run the included Python script, which uploads audio to api.senseaudio.cn and then writes local transcript (.txt/.json) files. The instructions and script operate within the stated purpose and do not attempt to read unrelated system files or additional environment variables beyond SENSEAUDIO_API_KEY. They do call ffmpeg/ffprobe via subprocess, which is expected for splitting large audio files.
Install Mechanism
There is no install spec (instruction-only with an included script). No packages are downloaded at install time. The risk surface is limited to running the provided Python script and any subprocesses it spawns (ffmpeg).
Credentials
The script requires SENSEAUDIO_API_KEY (used in Authorization header) but the registry metadata did not declare this environment variable. Requesting an API key for the remote ASR service is proportional to the functionality, but the metadata omission is misleading and could cause users to miss a sensitive requirement. Other environment access is minimal (PATH lookups for ffmpeg).
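Reading the key from the environment and keeping it out of logs can be sketched as follows. The `Bearer` scheme is an assumption: the analysis only says the key goes in the Authorization header, and api-reference.md is the place to confirm the format. Both helpers are hypothetical.

```python
import os
from typing import Dict


def auth_headers() -> Dict[str, str]:
    """Build the Authorization header from SENSEAUDIO_API_KEY (hypothetical helper)."""
    key = os.environ.get("SENSEAUDIO_API_KEY", "").strip()
    if not key:
        # Fail before uploading anything rather than sending an empty credential.
        raise RuntimeError("SENSEAUDIO_API_KEY is not set")
    # Assumption: Bearer scheme; confirm the exact format in api-reference.md.
    return {"Authorization": f"Bearer {key}"}


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters, e.g. before logging."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```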
Persistence & Privilege
The skill is not always-enabled and is user-invocable. It does not request elevated or persistent platform privileges and does not modify other skills or system-wide configuration. Autonomous invocation is allowed by default but is not combined with other high-risk patterns here.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install audio-transcribe-summarize
  3. After installation, invoke the skill by name or use /audio-transcribe-summarize
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.1
- Removed the `.env` file from the repository.
- Updated setup instructions: now require configuring the `SENSEAUDIO_API_KEY` environment variable instead of using a `.env` file.
- Prerequisites section now provides OS-specific installation steps for ffmpeg.
- Dependency on `python-dotenv` is no longer mentioned; only `requests` is required.
- Maintains existing workflow and summary guidelines.
v1.0.0
- Initial release of audio-transcribe-summarize skill.
- Transcribes audio/video files to text using the SenseAudio ASR API.
- Supports automatic splitting of large files and multiple audio formats.
- Generates structured summaries tailored for meetings, lectures, interviews, and podcasts.
- Provides customizable transcription options, including speaker separation, sentiment analysis, and translation.
- Includes a markdown-based summary template for consistent and readable output.
Metadata
Slug: audio-transcribe-summarize
Version: 1.0.1
License: MIT-0
All-time Installs: 0
Active Installs: 0
Total Versions: 2
Frequently Asked Questions

What is audio-transcribe-summarize?

Transcribe audio/video files to text and generate structured summaries using SenseAudio ASR API. Use when the user asks to transcribe, summarize, or take not... It is an AI Agent Skill for Claude Code / OpenClaw, with 238 downloads so far.

How do I install audio-transcribe-summarize?

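Run "/install audio-transcribe-summarize" in the OpenClaw or Claude Code chat to install it in one step. Before first use, set the SENSEAUDIO_API_KEY environment variable and make sure Python 3.8+ with requests is available; ffmpeg is also needed for files larger than 10MB.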

Is audio-transcribe-summarize free?

Yes, the skill itself is free: it is licensed under MIT-0, so you can download, install and use it at no cost. Transcription requests are sent to the SenseAudio API, which may carry its own usage costs.

Which platforms does audio-transcribe-summarize support?

audio-transcribe-summarize is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created audio-transcribe-summarize?

It is built and maintained by q1lin570 (@q1lin570); the current version is v1.0.1.
