Description

Transcribe large audio files (100MB+, up to 1GB/12 hours) with speaker diarization. Uses AssemblyAI API with direct HTTP calls. Supports MP3, WAV, M4A, FLAC,...

README (SKILL.md)

AssemblyAI Large Audio Transcriber

Name: Assembly Large Audio Transcriber
Author: jiadong0723

A dedicated solution for transcribing very large audio files (100 MB to 1 GB): zero SDK dependencies, calling the HTTP API directly.

Features

Very large files: up to 1 GB / 12 hours of audio
Speaker diarization (Speaker A/B/C…)
Word-level timestamps
100+ languages, with automatic detection
Supports MP3 / WAV / M4A / FLAC / OGG / WEBM

Install dependencies

Run on the server (only needed once):

pip install requests

Set the API key

Set it as an environment variable:

export ASSEMBLYAI_API_KEY="your-key"

Alternatively, give 许霸天 your AssemblyAI API key and I will configure it.

Free quota: 100 minutes per month; paid usage costs about $0.01 per minute.
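
A missing environment variable would otherwise only surface later as a 401 from the API, so it can help to fail fast before uploading anything. The following is a minimal sketch; the helper name load_api_key is hypothetical and not part of the skill.

```python
import os

def load_api_key(env=None):
    """Return the AssemblyAI key from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = (env.get("ASSEMBLYAI_API_KEY") or "").strip()
    if not key:
        raise RuntimeError("ASSEMBLYAI_API_KEY is not set; export it before running")
    return key
```

The returned value could then be used to build the authorization header instead of reading os.getenv directly.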

Usage

Tell 许霸天:

Transcribe [file path] with AssemblyAI

Both local files and URLs are supported.

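Technical approach

Step 1: Upload the file (for large files)

For a local file, AssemblyAI requires uploading it first to obtain an upload_url, then submitting the transcription job: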

import requests, os, time

API_KEY = os.getenv("ASSEMBLYAI_API_KEY")
HEADERS = {"authorization": API_KEY}

# 1. Upload the file and get its upload_url
def upload_file(file_path):
    with open(file_path, "rb") as f:
        response = requests.post(
            "https://api.assemblyai.com/v2/upload",
            headers=HEADERS,
            data=f,
            timeout=300
        )
    response.raise_for_status()
    return response.json()["upload_url"]

# 2. Submit the transcription job
def transcribe(upload_url, language="zh"):
    payload = {
        "audio_url": upload_url,
        "speaker_labels": True,
        "format_text": True,
    }
    # Send either automatic detection or an explicit code, never a null language_code
    if language == "auto":
        payload["language_detection"] = True
    else:
        payload["language_code"] = language
    response = requests.post(
        "https://api.assemblyai.com/v2/transcript",
        headers=HEADERS,
        json=payload,
        timeout=30
    )
    response.raise_for_status()
    return response.json()["id"]

# 3. Poll for the result
def wait_for_result(transcript_id, poll_interval=5, max_wait=3600):
    start = time.time()
    while True:
        result = requests.get(
            f"https://api.assemblyai.com/v2/transcript/{transcript_id}",
            headers=HEADERS,
            timeout=30
        )
        result.raise_for_status()
        data = result.json()
        status = data["status"]
        elapsed = time.time() - start
        if status == "completed":
            return data
        elif status == "error":
            raise Exception(f"Transcription error: {data.get('error')}")
        elif elapsed > max_wait:
            raise TimeoutError(f"Timeout after {max_wait}s")
        else:
            print(f"[{elapsed:.0f}s] Status: {status}...")
            time.sleep(poll_interval)

# 4. Full pipeline
def transcribe_large_audio(file_path, language="auto"):
    print(f"Uploading: {file_path}")
    upload_url = upload_file(file_path)
    print("Submitting transcription job...")
    tid = transcribe(upload_url, language)
    print(f"Job ID: {tid}")
    print("Waiting for transcription to finish (this may take several minutes)...")
    result = wait_for_result(tid)
    return result

Processing the result

result = transcribe_large_audio("/path/to/meeting.mp3", language="zh")

# Print the transcript with speaker labels
for utt in result.get("utterances", []):
    speaker = utt.get("speaker", "?")
    text = utt.get("text", "")
    start = utt.get("start", 0) / 1000  # milliseconds to seconds
    print(f"[{speaker}] {start:.1f}s: {text}")

# Or print the plain text
print(result.get("text", ""))
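
The feature list mentions word-level timestamps, which the snippet above does not show. Here is a minimal sketch for reading them, assuming the completed transcript JSON carries a words list whose items have text, start, and end fields in milliseconds; the helper name format_words is hypothetical.

```python
def format_words(result, limit=10):
    """Return up to `limit` lines like '0.12s-0.48s Hello' from a transcript dict.

    Assumes result["words"] is a list of dicts with "text", "start", "end"
    (start and end in milliseconds). Missing fields fall back to defaults.
    """
    lines = []
    for word in (result.get("words") or [])[:limit]:
        start_s = word.get("start", 0) / 1000
        end_s = word.get("end", 0) / 1000
        lines.append(f"{start_s:.2f}s-{end_s:.2f}s {word.get('text', '')}")
    return lines
```

Calling format_words(result) on the dictionary returned by transcribe_large_audio() would give a quick preview of the first few words.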

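Transcribing from a URL (if the file is already online)

If the file is publicly accessible, submitting the URL directly is simpler:

def transcribe_url(audio_url, language="zh"):
    payload = {
        "audio_url": audio_url,
        "speaker_labels": True,
    }
    # Honor the language argument: detect automatically or pass the explicit code
    if language == "auto":
        payload["language_detection"] = True
    else:
        payload["language_code"] = language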
    response = requests.post(
        "https://api.assemblyai.com/v2/transcript",
        headers=HEADERS, json=payload, timeout=30
    )
    response.raise_for_status()
    tid = response.json()["id"]
    result = wait_for_result(tid)
    return result
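
Since both local files and URLs are supported, a caller has to decide which of the two paths to take. One way to sketch that decision, assuming that only http and https sources count as remote (the helper names is_remote_audio and pick_route are hypothetical):

```python
from urllib.parse import urlparse

def is_remote_audio(source):
    """True when `source` looks like an http(s) URL rather than a local path."""
    return urlparse(source).scheme in ("http", "https")

def pick_route(source):
    """Return "url" for remote sources and "upload" for local files."""
    return "url" if is_remote_audio(source) else "upload"
```

A source routed to "url" would go to transcribe_url(), and one routed to "upload" would go to transcribe_large_audio().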

Complete usage example

import json, sys

file_path = sys.argv[1] if len(sys.argv) > 1 else "meeting.mp3"
language = sys.argv[2] if len(sys.argv) > 2 else "zh"

result = transcribe_large_audio(file_path, language)

output = {
    "file": file_path,
    "language": result.get("language_code"),
    "duration_s": result.get("audio_duration"),
    "transcript": result.get("text"),
    "utterances": [
        {
            "speaker": u.get("speaker"),
            "start_s": round(u.get("start", 0) / 1000, 2),
            "end_s": round(u.get("end", 0) / 1000, 2),
            "text": u.get("text"),
        }
        for u in result.get("utterances", [])
    ]
}

print(json.dumps(output, ensure_ascii=False, indent=2))

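Large-file workflow (for 许霸天 only)

When a user submits a very large audio file, follow these steps:

Confirm the file path and size
Confirm that ASSEMBLYAI_API_KEY is configured
Run the transcribe_large_audio() flow above
Poll until the job completes
Format the output: print each utterance in chronological order, with speaker and timestamp
Archive to a file: /workspace/memory/meetings/{date}-{meeting name}_原始转录.md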
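
The last two steps (formatting and archiving) can be sketched as follows. This is an illustration only: the Markdown layout and the helper names format_timestamp, utterances_to_markdown, and archive_path are assumptions, and only the file name pattern comes from the step list above.

```python
from pathlib import Path

def format_timestamp(ms):
    """Convert milliseconds to an HH:MM:SS string."""
    total = int(ms // 1000)
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

def utterances_to_markdown(result, title):
    """Render utterances in chronological order, one Markdown bullet each."""
    lines = [f"# {title}", ""]
    utterances = sorted(result.get("utterances") or [], key=lambda u: u.get("start", 0))
    for u in utterances:
        stamp = format_timestamp(u.get("start", 0))
        lines.append(f"- [{stamp}] Speaker {u.get('speaker', '?')}: {u.get('text', '')}")
    return "\n".join(lines) + "\n"

def archive_path(base_dir, date, meeting_name):
    """Build {base_dir}/{date}-{meeting_name}_原始转录.md as a Path."""
    return Path(base_dir) / f"{date}-{meeting_name}_原始转录.md"
```

Writing the archive is then a matter of creating the parent directory and calling write_text(..., encoding="utf-8") on the returned path.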

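Error handling

Error	Cause	Fix
401 Unauthorized	API key invalid or not set	Check ASSEMBLYAI_API_KEY
413 Payload Too Large	File exceeds 1 GB	Split the file
422 Unprocessable Entity	Unsupported audio format	Convert the format with ffmpeg
429 Rate Limit	Concurrency limit exceeded	Wait and retry; poll less frequently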
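
The "wait and retry" advice for 429 responses could be implemented with exponential backoff. The sketch below is generic and not part of the skill: call_with_backoff, its parameters, and the jitter range are all assumptions chosen for illustration.

```python
import random
import time

def call_with_backoff(fn, should_retry, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn() until should_retry(result) is False or attempts run out.

    Returns the last result either way, so the caller can still inspect it.
    `sleep` is injectable so the function can be tested without waiting.
    """
    result = None
    for attempt in range(max_attempts):
        result = fn()
        if not should_retry(result) or attempt == max_attempts - 1:
            return result
        # Exponential backoff with a little jitter: roughly 1s, 2s, 4s, ...
        delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
        sleep(delay)
    return result
```

With requests, fn could be a lambda wrapping the GET or POST call and should_retry could be lambda r: r.status_code == 429.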

Splitting files (if a single file exceeds 1 GB)

If you hit the 1 GB limit, split the file as follows:

ffmpeg -i large.mp3 -ss 00:00:00 -to 01:00:00 -c copy part1.mp3
ffmpeg -i large.mp3 -ss 01:00:00 -c copy part2.mp3

Then transcribe each part separately and concatenate the results at the end.
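
When stitching the parts back together, each part's timestamps restart at zero, so they need to be shifted by the part's offset in the original file. A minimal sketch, assuming each part's result has the same utterances shape used earlier; merge_parts and the "P1-A" speaker prefix are hypothetical. The prefix reflects that speaker labels are presumably assigned per job, so "A" in one part is not guaranteed to be "A" in the next.

```python
def merge_parts(parts):
    """Merge per-part results into one utterance list on a shared timeline.

    parts: list of (offset_seconds, result_dict) tuples in playback order,
    where offset_seconds is where that part starts in the original file.
    """
    merged = []
    for index, (offset_s, result) in enumerate(parts, start=1):
        offset_ms = int(offset_s * 1000)
        for u in result.get("utterances") or []:
            merged.append({
                "part": index,
                # Prefix the label because speakers are not matched across parts
                "speaker": f"P{index}-{u.get('speaker', '?')}",
                "start": u.get("start", 0) + offset_ms,
                "end": u.get("end", 0) + offset_ms,
                "text": u.get("text", ""),
            })
    return merged
```

For the two ffmpeg commands above, the offsets would be 0 and 3600 seconds.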

Usage Guidance

This skill appears to be what it claims: it uploads audio to AssemblyAI and polls for a transcript, and it only needs your ASSEMBLYAI_API_KEY. Before installing or running it:

1) Do not share your API key with anyone. SKILL.md's suggestion to 'tell 许霸天 your API Key' is a social-engineering prompt and unnecessary.
2) Be aware that audio is uploaded to AssemblyAI (read their privacy policy and terms of service); do not upload sensitive audio you don't want processed or stored by a third party.
3) Implementation notes: the bundled Python script reads the entire file into memory (f.read()), which may exhaust RAM for very large files; consider streaming uploads or splitting large files first.
4) SKILL.md recommends pip install requests but the included script uses urllib; this inconsistency is harmless but indicates the package may be lightly maintained, so review the code yourself.
5) Confirm where transcripts are archived: the script writes file_path + '.transcript.json', and SKILL.md's /workspace/memory path is not enforced by the script.

If you will use this, supply your own AssemblyAI API key and test with a small file first.
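
The streaming upload suggested in item 3 can be sketched with a generator that reads the file in fixed-size pieces, so memory use stays near one chunk regardless of file size. The helper name iter_chunks and the 5 MB default are assumptions, not taken from the bundled script.

```python
def iter_chunks(file_path, chunk_size=5 * 1024 * 1024):
    """Yield the file's bytes in chunks of at most `chunk_size`."""
    with open(file_path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

With the requests library, passing such a generator as data= sends the body with chunked transfer encoding; passing an open file object, as the SKILL.md snippet does, also avoids loading the whole file at once.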

Capability Analysis

Type: OpenClaw Skill
Name: jiadong-assembly-large-audio
Version: 1.0.0

The skill is a legitimate tool for transcribing large audio files (up to 1GB) using the AssemblyAI API. It provides a Python script (scripts/transcribe_assemblyai.py) that uses standard libraries to handle file uploads, task submission, and result polling. The instructions in SKILL.md are clearly aligned with the stated purpose, guiding the agent to manage API keys, execute the transcription process, and archive results in a specific workspace directory. While the script reads large files entirely into memory (f.read()), which could lead to resource exhaustion, this is a common coding limitation rather than an intentional vulnerability or malicious behavior.

Capability Assessment

✓ Purpose & Capability

Name/description match the code and SKILL.md: both call AssemblyAI endpoints and require ASSEMBLYAI_API_KEY. The requested credential is appropriate for the stated purpose.

ℹ Instruction Scope

Instructions stay within transcription scope (upload, submit, poll, format, archive). Two small scope concerns: SKILL.md suggests writing archives to /workspace/memory/meetings/... but the included script writes a .transcript.json next to the input file (inconsistent); SKILL.md also includes conversational text asking the user to 'tell 许霸天 your API Key'—do not hand your API key to third parties.

✓ Install Mechanism

No install spec (instruction-only + one script) so nothing is written by an installer. SKILL.md suggests 'pip install requests' but the bundled script uses urllib (no requests required). This is an inconsistency but not a malicious install mechanism.

✓ Credentials

Only ASSEMBLYAI_API_KEY is required, which is proportional. Note: SKILL.md's text encouraging users to give their API key to the operator is a potential social-engineering risk—you should keep your key private and provision it yourself.

✓ Persistence & Privilege

always is false and the skill does not request system-wide changes or other skills' credentials. Autonomous invocation is enabled (default) but not combined with other red flags.

Version History

v1.0.0

- Initial release of assembly-large-audio-transcriber skill.
- Transcribes large audio files (100MB–1GB, up to 12 hours) with speaker diarization using AssemblyAI API via HTTP calls.
- Supports MP3, WAV, M4A, FLAC, OGG, and WEBM formats with no SDK dependencies (only requires `requests`).
- Provides word-level timestamps, multi-language support (auto-detection), and detailed output including speaker and time segmentation.
- Includes robust error handling and guidance for API limits and file splitting.
- Output can be archived to a specified directory with structured Markdown or JSON formatting.

Metadata

Slug jiadong-assembly-large-audio

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Assembly Large Audio Transcriber?

Transcribe large audio files (100MB+, up to 1GB/12 hours) with speaker diarization. Uses AssemblyAI API with direct HTTP calls. Supports MP3, WAV, M4A, FLAC,... It is an AI Agent Skill for Claude Code / OpenClaw, with 139 downloads so far.

How do I install Assembly Large Audio Transcriber?

Run "/install jiadong-assembly-large-audio" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Assembly Large Audio Transcriber free?

Yes, Assembly Large Audio Transcriber is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Assembly Large Audio Transcriber support?

Assembly Large Audio Transcriber is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Assembly Large Audio Transcriber?

It is built and maintained by jiadong0723 (@jiadong0723); the current version is v1.0.0.
