
Audio Summary

Name: Audio Summary
Author: alanoo7

by alanOO7 · GitHub ↗ · v1.0.0

cross-platform ⚠ suspicious

515 Downloads

Install in OpenClaw

/install audio-summary

Description

Automatically extracts audio from video, transcribes it using qwen3-asr-flash, and generates segmented text summaries saved alongside the original file.

README (SKILL.md)

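audio-summary Skill

An assistant that turns audio and video into text summaries.

Features

Automatic audio extraction: uses ffmpeg to extract compressed 16k mono audio from MP4 and other video files, so the audio fits within the model's size limit.
Transcription and summary: uses the Bailian (百炼) qwen3-asr-flash model to convert the audio to text and generate a segmented summary of the content.
Large-file support: with 48k compression, videos of up to roughly 5-8 minutes can be transcribed directly in a single request.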
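The extraction step described in the feature list could be sketched as follows. This is a minimal sketch under stated assumptions, not the skill's actual code: the helper name `build_ffmpeg_args` is hypothetical, "16k" is assumed to mean a 16 kHz sample rate, "48k" is assumed to mean a 48 kbit/s audio bitrate, and the MP3 output format is also an assumption.

```python
from pathlib import Path


def build_ffmpeg_args(video_path: str, audio_path: str) -> list[str]:
    """Return an ffmpeg argument list (hypothetical helper, not from the skill)."""
    return [
        "ffmpeg",
        "-y",                # overwrite the output file if it already exists
        "-i", video_path,    # input video
        "-vn",               # drop the video stream, keep audio only
        "-ac", "1",          # mono
        "-ar", "16000",      # 16 kHz sample rate (assumed meaning of "16k")
        "-b:a", "48k",       # 48 kbit/s audio bitrate (assumed meaning of "48k")
        audio_path,
    ]


video = Path("talk.mp4")
args = build_ffmpeg_args(str(video), str(video.with_suffix(".mp3")))
print(args)
```

The resulting list can be passed to `subprocess.run(args, check=True)` on a machine where ffmpeg is on the system path.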

Dependencies

ffmpeg (installed and on the system path)
openai Python SDK (installed)
Bailian (百炼) API key (already configured in the script as sk-76735...)

Usage

Run from the command line

# Extract the audio from the given video and summarize it
python .openclaw/workspace/skills/audio-summary/audio_summary_skill.py "C:\Path\To\Your\Video.mp4"

File location

The generated summary text is saved automatically in the same directory as the video and named <video name>_summary.txt.
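The naming rule above (same directory as the video, video name plus `_summary.txt`) could be derived as in this sketch. The function name `summary_path_for` is hypothetical and only illustrates the rule; the skill's own code may build the path differently.

```python
from pathlib import Path


def summary_path_for(video_path: str) -> Path:
    """Return the summary file path next to the video (hypothetical helper)."""
    video = Path(video_path)
    # Keep the parent directory, swap the file name for "<stem>_summary.txt".
    return video.with_name(f"{video.stem}_summary.txt")


result = summary_path_for("/videos/demo.mp4")
print(result.name)
```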

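Notes

A single Base64 transcription request is currently limited to 6MB. For long videos of more than 10 minutes, split the file manually first or lower the bitrate further.
API usage is billed at the qwen3-asr-flash model rate.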
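A rough pre-flight check against the 6MB limit could look like the sketch below. It rests on assumptions the README does not confirm: that the limit applies to the Base64-encoded payload, that "6MB" means 6 MiB, and that the audio is encoded at a constant 48 kbit/s. All function names are hypothetical.

```python
BASE64_LIMIT_BYTES = 6 * 1024 * 1024  # assumption: "6MB" = 6 MiB of encoded data


def base64_size(raw_bytes: int) -> int:
    """Length of the padded Base64 encoding of `raw_bytes` bytes."""
    # Base64 turns every 3 input bytes (rounded up) into 4 output characters.
    return (raw_bytes + 2) // 3 * 4


def estimated_audio_bytes(seconds: float, bitrate_kbps: int = 48) -> int:
    """Approximate file size of constant-bitrate audio, ignoring container overhead."""
    return int(seconds * bitrate_kbps * 1000 / 8)


def fits_limit(raw_bytes: int, limit: int = BASE64_LIMIT_BYTES) -> bool:
    """True if the Base64-encoded payload stays within the limit."""
    return base64_size(raw_bytes) <= limit


for minutes in (5, 10, 15):
    size = estimated_audio_bytes(minutes * 60)
    print(minutes, "min:", base64_size(size), "encoded bytes, fits:", fits_limit(size))
```

Because Base64 inflates the payload by about one third, the check is done on the encoded size rather than on the size of the audio file itself.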

Usage Guidance

This skill appears to implement its advertised audio-extraction and transcription functionality, but it contains a hard-coded API key and sends full Base64-encoded audio to an undocumented third-party endpoint (dashscope.aliyuncs.com). Before installing or running it: (1) Do not run it on private/confidential audio as it will transmit the entire audio to that endpoint. (2) Ask the author to remove the embedded API key and require the user to supply their own key via an environment variable or secure config. (3) Verify the identity/trustworthiness of the endpoint (dashscope.aliyuncs.com) and the ASR provider (qwen3-asr-flash). (4) Prefer a version that documents required env vars and exposes the network destination; optionally modify the script to point to a trusted API host or your own account. If you cannot verify the endpoint or the provenance of the embedded key, treat this skill as high-risk and avoid using it with sensitive data.

Capability Analysis

Type: OpenClaw Skill
Name: audio-summary
Version: 1.0.0

The skill contains a hardcoded Aliyun DashScope API key in 'audio_summary_skill.py', which is a significant security risk and credential leak. Additionally, the script is vulnerable to shell injection because it uses an f-string to construct an 'os.system' command with the 'video_path' input without proper sanitization. While these represent critical security vulnerabilities (RCE and credential exposure), they appear to be unintentional flaws rather than intentional malware designed for exfiltration or persistence.
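The shell-injection risk named above, and one common way to avoid it, could be illustrated as follows. This is a sketch only: the hostile file name and the `extract_audio` helper are hypothetical, and the skill's real ffmpeg command line is not shown in this listing.

```python
import shlex
import subprocess


def extract_audio(video_path: str, audio_path: str) -> None:
    """Run ffmpeg without a shell; each list item is passed as one argument."""
    subprocess.run(["ffmpeg", "-y", "-i", video_path, audio_path], check=True)


# Hypothetical hostile file name supplied as `video_path`.
hostile = 'clip.mp4"; echo pwned; "'

# The pattern the analysis describes: an f-string handed to a shell.
# The embedded quotes close the argument early, so `echo pwned` runs as a command.
unsafe = f'ffmpeg -i "{hostile}" out.mp3'
print(unsafe)

# If a shell string cannot be avoided, shlex.quote keeps the name a single
# token (this applies to POSIX shells, not to cmd.exe on Windows).
quoted = shlex.quote(hostile)
print(shlex.split(quoted) == [hostile])
```

With the list form used in `extract_audio`, no shell parses the file name, so quotes and semicolons in it stay ordinary characters.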

Capability Assessment

ℹ Purpose & Capability

The code does what the name/description claim: it uses ffmpeg to extract/compress audio and calls a qwen3-asr-flash ASR model via the OpenAI Python client. Declared dependencies in SKILL.md (ffmpeg, openai SDK) match the implementation. However, the skill does not declare any required environment variables or primary credential in the registry metadata, yet the script contains a hard-coded API key and a custom base_url — an inconsistency between what the skill claims to require and what it actually contains.

⚠ Instruction Scope

Runtime instructions and the script convert entire audio files to a Base64 data URI and send that data to a remote model endpoint. The SKILL.md references the '百炼 API KEY' but does not disclose the actual network endpoint used by the code (the code targets dashscope.aliyuncs.com). Sending full audio data to an undeclared third‑party endpoint is a privacy/exfiltration risk. The instructions also recommend running the exact included script path, which will use the embedded key by default.
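To make concrete what "a Base64 data URI" of an audio file contains, here is a minimal sketch. The helper name `to_data_uri` and the `audio/mpeg` MIME type are assumptions; the listing does not show how the script builds its payload.

```python
import base64


def to_data_uri(audio_bytes: bytes, mime_type: str = "audio/mpeg") -> str:
    """Encode raw bytes as a data URI (hypothetical helper)."""
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Three sample bytes stand in for the contents of an audio file.
uri = to_data_uri(b"\x00\x01\x02")
print(uri)
```

Every byte of the input ends up in the URI, which is why the full recording leaves the machine when such a payload is sent to a remote endpoint.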

✓ Install Mechanism

There is no install spec (instruction-only with a single Python script). That lowers supply-chain risk because nothing will be automatically downloaded or extracted during install.

⚠ Credentials

The skill requires an API credential to call the ASR model, but instead of declaring a required env var or asking the user to supply a key, the script hard-codes an API key string and a non-standard base_url. The registry metadata declared no required credentials; embedding a key in the code is disproportionate and insecure. The endpoint in code (dashscope.aliyuncs.com) is not the public qwen/openai domain and is not explained in SKILL.md.
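One way to replace the hard-coded key and base_url with user-supplied settings is sketched below. The environment variable names `ASR_API_KEY` and `ASR_BASE_URL` and both function names are hypothetical; the skill does not currently define them.

```python
import os


def load_asr_settings(env=os.environ) -> dict:
    """Read the key and endpoint from the environment (hypothetical names)."""
    api_key = env.get("ASR_API_KEY")    # hypothetical variable name
    base_url = env.get("ASR_BASE_URL")  # hypothetical variable name
    if not api_key or not base_url:
        raise RuntimeError("Set ASR_API_KEY and ASR_BASE_URL before running.")
    return {"api_key": api_key, "base_url": base_url}


def make_client():
    """Build an OpenAI-compatible client from the settings above."""
    from openai import OpenAI  # openai Python SDK, imported only when needed
    return OpenAI(**load_asr_settings())


settings = load_asr_settings(
    {"ASR_API_KEY": "example-key", "ASR_BASE_URL": "https://example.invalid/v1"}
)
print(sorted(settings))
```

Failing fast when either value is missing makes the required credential and the network destination explicit to whoever runs the script.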

✓ Persistence & Privilege

The skill is not always-enabled and does not request elevated platform privileges or modify other skills/config. It runs only when invoked and does not persist configuration beyond writing its own summary output file in the same directory as the input.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install audio-summary
After installation, invoke the skill by name or use /audio-summary
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

- Initial release of audio-summary skill.
- Extracts 16k mono audio from video files using ffmpeg.
- Transcribes audio to text and generates content summaries using the Qwen3 ASR Flash model.
- Supports video files up to approximately 5-8 minutes after compression.
- Summaries are saved as .txt files in the same directory as the source video.
- Requires ffmpeg, openai Python SDK, and a configured Bailian (百炼) API key.

Metadata

Slug audio-summary

Version 1.0.0

License —

All-time Installs 3

Active Installs 3

Total Versions 1

Frequently Asked Questions

What is Audio Summary?

Automatically extracts audio from video, transcribes it using qwen3-asr-flash, and generates segmented text summaries saved alongside the original file. It is an AI Agent Skill for Claude Code / OpenClaw, with 515 downloads so far.

How do I install Audio Summary?

Run "/install audio-summary" in the OpenClaw or Claude Code chat to install it in one step. The README additionally lists ffmpeg on the system path and the openai Python SDK as dependencies.

Is Audio Summary free?

The skill itself can be downloaded and installed at no cost. API usage is billed at the qwen3-asr-flash model rate, and the listing does not state a license.

Which platforms does Audio Summary support?

Audio Summary is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Audio Summary?

It is built and maintained by alanOO7 (@alanoo7); the current version is v1.0.0.
