Description

Transcribe, diarise, translate, post-process, and structure audio/video with AssemblyAI. Use this skill when the user wants AssemblyAI specifically, needs hi...

Usage instructions (SKILL.md)

AssemblyAI transcription, Speech Understanding, and agent-friendly exports

Name: Audio Transcribe
Author: abeltennyson

Use this skill when the user wants AssemblyAI rather than generic transcription, or when the job benefits from AssemblyAI-specific capabilities such as:

model routing across universal-3-pro and universal-2
language detection and code switching
diarisation plus speaker name / role mapping
translation, custom formatting, or AssemblyAI speaker identification
subtitles, paragraphs, sentences, topic / entity / sentiment tasks
transcript output that is easy for other agents to consume as Markdown or normalised JSON

The skill is designed for AI agents like OpenClaw, not just end users. It provides:

A no-dependency Node CLI in scripts/assemblyai.mjs (and a compatibility wrapper at assemblyai.mjs)
Bundled model/language knowledge via models and languages commands
Stable transcript output formats
- agent-friendly Markdown
- normalised agent JSON
- bundle manifests for downstream automation
Speaker mapping workflows
- manual speaker/channel maps
- AssemblyAI speaker identification
- merged display names in both Markdown and JSON
AssemblyAI LLM Gateway integration for structured extraction from transcripts

Use this skill in this order

1) Decide whether the user needs AssemblyAI-specific behaviour

If they just want “a transcript”, a generic solution may be enough. Reach for this skill when the user mentions AssemblyAI, wants a specific AssemblyAI feature, or needs the richer outputs and post-processing this skill provides.

2) Pick the best entry point

New transcription → transcribe
Existing transcript id → get or wait
Re-render existing saved JSON → format
Post-process an existing transcript → understand
Run transcript text through LLM Gateway → llm
Need a quick capability lookup before deciding → models or languages
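The routing list above can be written down as a small lookup table, which an agent might use to choose a subcommand before building the rest of the command line. This is only a sketch: the situation keys and the pickEntryPoint helper are hypothetical names invented for illustration, and only the subcommand names on the right-hand side come from the list above.

```javascript
// Hypothetical situation labels mapped to the CLI subcommands listed above.
// A Map is used so that inherited object keys can never match by accident.
const ENTRY_POINTS = new Map([
  ['new-transcription', 'transcribe'],
  ['fetch-existing-id', 'get'],
  ['block-on-existing-id', 'wait'],
  ['rerender-saved-json', 'format'],
  ['post-process-transcript', 'understand'],
  ['llm-gateway', 'llm'],
  ['model-lookup', 'models'],
  ['language-lookup', 'languages'],
]);

// Return the subcommand for a situation, or fail loudly on an unknown label.
function pickEntryPoint(situation) {
  const command = ENTRY_POINTS.get(situation);
  if (command === undefined) {
    throw new Error(`Unknown situation: ${situation}`);
  }
  return command;
}
```

Failing on an unknown label, rather than silently defaulting to transcribe, avoids starting a new transcription job by mistake.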

3) Prefer the agent-friendly defaults

For most unknown-language or mixed-language jobs, prefer:

node {baseDir}/assemblyai.mjs transcribe INPUT --bundle-dir ./assemblyai-out --all-exports

Why:

the CLI defaults to auto-best routing when models are not specified
it writes a manifest + multiple files that agents can inspect without reparsing terminal output
Markdown and agent JSON become available immediately for follow-on steps
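When an agent assembles this command programmatically, building the argument list as an array avoids shell-quoting problems with file names. The sketch below uses only flags that appear in this document. The buildTranscribeArgs helper and its option names are hypothetical, and the fallback bundle directory simply mirrors the example command above.

```javascript
// Build an argv array for the `transcribe` subcommand.
// Option names (speechModel, languageCode, ...) are illustrative only;
// the flags themselves are the ones shown in this document.
function buildTranscribeArgs(input, options = {}) {
  const args = ['transcribe', input];
  if (options.speechModel) args.push('--speech-model', options.speechModel);
  if (options.languageCode) args.push('--language-code', options.languageCode);
  if (options.speakerLabels) args.push('--speaker-labels');
  // Assumed fallback: the directory used in the example command above.
  args.push('--bundle-dir', options.bundleDir ?? './assemblyai-out');
  if (options.allExports) args.push('--all-exports');
  return args;
}
```

The resulting array can be passed to Node's child_process.execFile together with the path to assemblyai.mjs, so that the input path is never interpreted by a shell.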

Quick-start recipes

Best general default

Use this when the source language is unknown or could be outside the 6-language Universal-3-Pro set:

node {baseDir}/assemblyai.mjs transcribe ./meeting.mp3 --bundle-dir ./out --all-exports

This defaults to model routing plus language detection unless the request already specifies a model or language.

Best known-language accuracy

If the language is known and supported by Universal-3-Pro, prefer an explicit request:

node {baseDir}/assemblyai.mjs transcribe ./meeting.mp3 --speech-model universal-3-pro --language-code en_us --bundle-dir ./out

Meeting / interview with speaker labels

node {baseDir}/assemblyai.mjs transcribe ./meeting.mp3 --speaker-labels --bundle-dir ./out

Add explicit speaker names or roles

Manual mapping:

node {baseDir}/assemblyai.mjs transcribe ./meeting.mp3 --speaker-labels --speaker-map @assets/speaker-map.example.json --bundle-dir ./out

AssemblyAI speaker identification:

node {baseDir}/assemblyai.mjs transcribe ./meeting.mp3 --speaker-labels --speaker-type role --known-speakers "host,guest" --bundle-dir ./out

Or post-process an existing transcript:

node {baseDir}/assemblyai.mjs understand TRANSCRIPT_ID --speaker-type name --speaker-profiles @assets/speaker-profiles-name.example.json --bundle-dir ./out

Translation

node {baseDir}/assemblyai.mjs transcribe ./meeting.mp3 --translate-to de,fr --match-original-utterance --bundle-dir ./out

Structured extraction through LLM Gateway

node {baseDir}/assemblyai.mjs llm TRANSCRIPT_ID --prompt @assets/example-prompt.txt --schema @assets/llm-json-schema.example.json --out ./summary.json

Command guidance

`transcribe`

Use for local files or remote URLs.

Local files are uploaded first.
Public URLs are sent directly to AssemblyAI.
Waits by default, then renders output.
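The split between local files and remote URLs described above could be decided as in the following sketch. How the bundled CLI actually makes this decision is not documented here, so the classifyInput helper and its return labels are assumptions for illustration only.

```javascript
// Classify a transcribe input as a remote URL or a local file path.
// Only http(s) URLs count as remote; everything else, including relative
// paths and Windows drive paths such as C:\audio.mp3, is treated as local.
function classifyInput(input) {
  try {
    const url = new URL(input);
    if (url.protocol === 'http:' || url.protocol === 'https:') {
      return 'remote-url';
    }
  } catch {
    // Not parseable as an absolute URL, so treat it as a file path.
  }
  return 'local-file';
}
```

Checking the protocol explicitly matters because a Windows path like C:\audio.mp3 parses as a URL with the scheme "c:", which would otherwise be mistaken for a remote source.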

Prefer --bundle-dir for anything longer than a trivial clip.

`get` / `wait`

Use when you already have the transcript id. wait blocks until completion; get fetches immediately unless you add --wait.
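The blocking behaviour of wait amounts to a polling loop. The following is a generic sketch, not the CLI's actual implementation: fetchStatus is a hypothetical caller-supplied function that returns the current transcript object, the interval and timeout defaults are arbitrary, and the status values assumed are the ones the AssemblyAI transcript API reports (queued, processing, completed, error).

```javascript
// Poll until the transcript completes, fails, or the timeout elapses.
// `fetchStatus` is a hypothetical async function supplied by the caller
// that resolves to an object with at least a `status` field.
async function waitForTranscript(fetchStatus, { intervalMs = 3000, timeoutMs = 600000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const transcript = await fetchStatus();
    if (transcript.status === 'completed') return transcript;
    if (transcript.status === 'error') {
      throw new Error(transcript.error ?? 'Transcription failed');
    }
    if (Date.now() >= deadline) {
      throw new Error('Timed out waiting for transcript');
    }
    // Still queued or processing: sleep before the next poll.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Injecting fetchStatus keeps the loop independent of any particular endpoint or API key, which also makes it easy to exercise with a fake in tests.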

`format`

Use when you already saved:

raw transcript JSON from AssemblyAI, or
the normalised agent JSON produced by this skill

This is useful when you want to apply a new speaker map, re-render Markdown, or generate a fresh bundle without retranscribing.

`understand`

Use when you need AssemblyAI Speech Understanding on an existing transcript:

translation
speaker identification
custom formatting

This command fetches the transcript, merges in the returned understanding results, then renders updated Markdown / agent JSON / bundle outputs.

`llm`

Use when the user wants:

summaries
extraction
structured JSON
downstream reasoning over the transcript

Prefer --schema when the next step is automated.

Output strategy

Best default for agents: bundle mode

--bundle-dir writes a directory containing:

Markdown transcript
agent JSON
raw JSON
optional paragraphs / sentences / subtitles
a machine-readable manifest

This is usually better than dumping everything to stdout.

Primary output kinds

Use --export to choose the main output:

markdown (default)
agent-json
json / raw-json
text
paragraphs
sentences
srt
vtt
manifest

Sidecar outputs

You can request extra files directly with:

--markdown-out
--agent-json-out
--raw-json-out
--paragraphs-out
--sentences-out
--srt-out
--vtt-out
--understanding-json-out

Speaker mapping rules

Speaker display names are merged in this order:

manual --speaker-map
AssemblyAI speaker identification mapping
fallback generic names like Speaker A or Channel 1

This means you can let AssemblyAI identify speakers first, then still override individual display names later.

Example manual map file: assets/speaker-map.example.json
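The merge order described above can be illustrated with a small resolver. This is a sketch under assumptions: both maps are taken to be plain objects keyed by the raw speaker label (for example { "A": "Alice" }), which may not match the real layout of assets/speaker-map.example.json, and resolveDisplayName is a hypothetical helper rather than a function exported by the skill.

```javascript
// Resolve a display name using the precedence described above:
// 1. manual --speaker-map entry, 2. speaker identification result,
// 3. generic fallback such as "Speaker A".
// Both maps are assumed to be plain objects keyed by speaker label.
function resolveDisplayName(label, manualMap = {}, identifiedMap = {}) {
  return manualMap[label] ?? identifiedMap[label] ?? `Speaker ${label}`;
}
```

For example, with a manual map of { A: "Alice" } and an identification result of { A: "Host", B: "Guest" }, label A resolves to "Alice" because the manual entry wins, B resolves to "Guest", and an unmapped label C falls back to "Speaker C".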

Model and language lookup

Before choosing parameters, inspect the bundled reference data:

node {baseDir}/assemblyai.mjs models
node {baseDir}/assemblyai.mjs models --format json
node {baseDir}/assemblyai.mjs languages --model universal-3-pro
node {baseDir}/assemblyai.mjs languages --model universal-2 --codes --format json

The bundled data lives in:

assets/model-capabilities.json
assets/language-codes.json

Important operating notes

Keep API keys out of chat logs; use environment injection.
Use the EU AssemblyAI base URL when the user explicitly needs EU processing.
Uploads and transcript creation must use API keys from the same AssemblyAI project.
Prefer --bundle-dir or --out for long outputs.
The CLI is non-interactive and sends diagnostics to stderr, which makes it easier for agents to script reliably.
Use raw --config or --request when you need a newly added AssemblyAI parameter that this skill has not exposed yet.

Reference files

Read these when you need more depth:

Key bundled files

assemblyai.mjs — root wrapper for compatibility with the original skill
scripts/assemblyai.mjs — main CLI
assets/speaker-map.example.json
assets/speaker-profiles-name.example.json
assets/speaker-profiles-role.example.json
assets/custom-spelling.example.json
assets/llm-json-schema.example.json
assets/transcript-agent-json-schema.json

Sanity checks before finishing a task

Did you pick the right region (api.assemblyai.com vs api.eu.assemblyai.com)?
Did you choose a model strategy that matches the language situation?
If speaker naming matters, did you enable diarisation and/or provide a speaker map?
If the result will feed another agent, did you produce Markdown and/or agent JSON rather than only raw stdout?
If the transcript will be machine-consumed, did you keep the manifest or explicit output filenames?

Safe usage recommendations

Do not install or run this skill until you verify where your audio and transcript data will be sent. The SKILL.md and assets present AssemblyAI as the backend, but the included script defaults to https://api.heybossai.com/v1 and expects SKILLBOSS_API_KEY, meaning your files and text would go to that host by default. Actions to take before using:

1) Inspect scripts/assemblyai.mjs (already included) and confirm the effective base URLs and what data is transmitted.
2) If you expect to use AssemblyAI directly, either set the --base-url/--llm-base-url flags to the official AssemblyAI endpoints or replace DEFAULT_* constants accordingly and verify the request shapes, or use an official AssemblyAI client.
3) Ask the publisher why SKILLBOSS_API_KEY and heybossai.com are used; treat it as a proxy/third-party service until they clarify.
4) Be cautious uploading sensitive audio until you confirm the endpoint and operator; consider testing with non-sensitive clips first.
5) The SKILL.md contains a prompt-injection pattern flag. Review the documentation for any instructions that would cause the agent to change its system instructions or behave outside expected boundaries.

If you cannot verify the endpoint/operator, classify this skill as untrusted.
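One way to act on the endpoint advice above is to check a configured base URL against the AssemblyAI hosts named on this page before any audio is sent. The sketch below is an illustration, not part of the skill: the host list contains only the hostnames mentioned in this document (api.assemblyai.com, api.eu.assemblyai.com, llm-gateway.assemblyai.com) and should be confirmed against AssemblyAI's own documentation, and isOfficialAssemblyAiHost is a hypothetical helper name.

```javascript
// Hostnames taken from this page; confirm them against AssemblyAI's
// documentation before relying on this list.
const OFFICIAL_HOSTS = new Set([
  'api.assemblyai.com',
  'api.eu.assemblyai.com',
  'llm-gateway.assemblyai.com',
]);

// Return true only for HTTPS URLs whose hostname is on the allowlist.
// Malformed URLs are rejected rather than guessed at.
function isOfficialAssemblyAiHost(baseUrl) {
  try {
    const url = new URL(baseUrl);
    return url.protocol === 'https:' && OFFICIAL_HOSTS.has(url.hostname);
  } catch {
    return false;
  }
}
```

With this check, the script's reported default of https://api.heybossai.com/v1 is rejected, while https://api.assemblyai.com and https://api.eu.assemblyai.com pass.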

Functional analysis

Type: OpenClaw Skill
Name: abe-audio-transcribe
Version: 1.0.0

This skill bundle provides a comprehensive Node.js-based interface for AssemblyAI transcription and speech understanding services via the SkillBoss API Hub (api.heybossai.com). The core logic in `scripts/assemblyai.mjs` handles audio file uploads, transcription status polling, and advanced features like speaker diarization, translation, and LLM-based summarization. The code uses standard Node.js built-ins (the fs and path modules and the global fetch) and follows the stated purpose without any evidence of data exfiltration, malicious execution, or prompt injection.

Capability tags

crypto
can-make-purchases
requires-sensitive-credentials

Capability assessment

⚠ Purpose & Capability

The SKILL.md and asset files repeatedly claim AssemblyAI integration and list official AssemblyAI endpoints (assemblyai.com / llm-gateway.assemblyai.com). However, the actual CLI code sets DEFAULT_STT_BASE_URL and DEFAULT_LLM_BASE_URL_* to https://api.heybossai.com/v1 and reads SKILLBOSS_API_KEY. Requiring a SKILLBOSS_API_KEY and defaulting to heybossai.com is not coherent with a skill that advertises direct AssemblyAI usage.

⚠ Instruction Scope

The instructions direct agents to run the included Node CLI which will upload local audio files and send transcript text to remote STT and LLM endpoints. Uploading local files and sending text is expected for a transcribe skill, but the CLI's default endpoints are the unexpected heybossai.com host. The skill also exposes a raw passthrough (--request) for LLM Gateway bodies which can transmit arbitrary text to the configured gateway.

✓ Install Mechanism

No install spec is provided (instruction-only with bundled scripts). The only declared runtime binary is node, which matches the included .mjs scripts. That lowers supply-chain risk, but the included script files will be executed locally when used, so their contents matter.

⚠ Credentials

The skill requires a single env var SKILLBOSS_API_KEY (declared primaryEnv). The rest of the documentation and troubleshooting references ASSEMBLYAI_API_KEY and AssemblyAI endpoints — inconsistent naming suggests either the code expects a proxy/API aggregator key (heybossai) or the package was repurposed but not fully updated. The single API key is reasonable for a transcription integration, but the unexpected name and default host are disproportionate to the stated AssemblyAI purpose.

✓ Persistence & Privilege

The skill is not always-enabled and does not request any special system persistence. It runs as a CLI (node) and does not claim to modify other skills or global agent settings.

Version history

v1.0.0

Initial release with AssemblyAI integration and extensive audio/video transcription features.
- Transcribes, diarises, translates, and structures audio/video using AssemblyAI, supporting both local files and URLs.
- Supports high-quality speech-to-text, language detection, automatic model selection, and speaker labeling, including manual and AssemblyAI-based name/role mapping.
- Provides agent-friendly outputs: Markdown, normalized JSON, subtitle formats, sentence/paragraph segmentation, topic/entity/sentiment extraction, and bundle manifests.
- Includes CLI commands for transcription, post-processing (Speech Understanding), LLM-based extraction, and built-in reference data for models and languages.
- Designed for easy integration by AI agents and downstream workflows, with clear guidance on command selection and output use.

Metadata

Slug abe-audio-transcribe

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Version count 1

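FAQ

What is Audio Transcribe?

Transcribe, diarise, translate, post-process, and structure audio/video with AssemblyAI. Use this skill when the user wants AssemblyAI specifically, needs hi... It is an AI agent skill plugin for Claude Code / OpenClaw and has been downloaded 61 times in total so far.

How do I install Audio Transcribe?

Run the command "/install abe-audio-transcribe" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is needed.

Is Audio Transcribe free?

Yes. Audio Transcribe is completely free and released under the MIT-0 license; it can be freely downloaded, installed, and used.

Which platforms does Audio Transcribe support?

Audio Transcribe is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Audio Transcribe?

It is developed and maintained by AbelTennyson (@abeltennyson). The current version is v1.0.0.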
