OATDA Audio Transcription
Transcribe audio files to text through OATDA's unified audio API.
API Key Resolution
All commands need the OATDA API key. Resolve it inline for each exec call:
export OATDA_API_KEY="${OATDA_API_KEY:-$(cat ~/.oatda/credentials.json 2>/dev/null | jq -r '.profiles[.defaultProfile].apiKey' 2>/dev/null)}"
If the key is empty or null, tell the user to get one at https://oatda.com and configure it.
Security: Never print the full API key. Only verify existence or show first 8 chars.
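The security rule above can be sketched as a small helper that reports whether a key is present and prints at most its first 8 characters. The function name mask_key is hypothetical and not part of OATDA.

```shell
# Hypothetical helper: never echo the full key, only a short prefix.
mask_key() {
  # Treat an empty value and the literal string "null" (what jq -r prints
  # for a missing field) as "no key configured".
  if [ -z "$1" ] || [ "$1" = "null" ]; then
    echo "missing"
    return 1
  fi
  # %.8s truncates the argument to its first 8 characters.
  printf '%.8s...\n' "$1"
}
```

Calling mask_key "$OATDA_API_KEY" after the resolution line is enough to confirm the key exists without exposing it.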
Model Mapping
| User says | Provider | Model |
|---|---|---|
| whisper, whisper-1, openai whisper (default) | openai | whisper-1 |
| transcription, speech to text, stt | openai | whisper-1 |
Default: openai / whisper-1 if no model is specified.
If the user provides provider/model format directly (for example openai/whisper-1), split on /.
⚠️ Models change over time. If a model ID fails, query oatda-list-models with ?type=audio first.
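As a minimal sketch of the mapping above: an explicit provider/model value is split on the first slash, and every other phrase falls back to the default. The function name resolve_model is hypothetical.

```shell
# Hypothetical helper: print "provider model" for a user request.
resolve_model() {
  case "$1" in
    */*)
      # Explicit provider/model: split on the first slash.
      printf '%s %s\n' "${1%%/*}" "${1#*/}"
      ;;
    *)
      # Every phrase in the table, and an empty request, maps to the default.
      printf 'openai whisper-1\n'
      ;;
  esac
}
```

The two words of output can be passed on as the provider and model form fields of the transcription request.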
Input Preparation
The transcription endpoint supports:
- multipart/form-data with a local file upload
- JSON with a base64 data URL in the file field
- JSON with a file_base64 field for providers that support direct base64 payloads
Maximum audio file size is 25MB.
For local files, prefer multipart upload because it is simpler and avoids large JSON bodies.
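A size check before uploading avoids a round trip that would end in a 413. The sketch below assumes "25MB" means 25 * 1024 * 1024 bytes, which the text above does not specify; the names MAX_AUDIO_BYTES and audio_size_ok are hypothetical.

```shell
# Assumption: binary megabytes. Lower this if the API uses decimal MB.
MAX_AUDIO_BYTES=$((25 * 1024 * 1024))

# Hypothetical helper: succeed only if the file exists and fits the limit.
audio_size_ok() {
  [ -f "$1" ] || return 2
  # Arithmetic expansion strips the padding some wc implementations add.
  size=$(($(wc -c < "$1")))
  [ "$size" -le "$MAX_AUDIO_BYTES" ]
}
```

A caller might run audio_size_ok "$file" first and, if it fails, suggest splitting the recording.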
Discovering Audio Model Parameters
export OATDA_API_KEY="${OATDA_API_KEY:-$(cat ~/.oatda/credentials.json 2>/dev/null | jq -r '.profiles[.defaultProfile].apiKey' 2>/dev/null)}" && \
curl -s -X GET "https://oatda.com/api/v1/llm/models?type=audio" \
-H "Authorization: Bearer $OATDA_API_KEY" | jq '.audio_models[] | {id, supported_params}'
Look for:
- audio_modes containing transcription
- supported response_format values
- optional timestamp, diarization, or streaming support
API Call (multipart)
export OATDA_API_KEY="${OATDA_API_KEY:-$(cat ~/.oatda/credentials.json 2>/dev/null | jq -r '.profiles[.defaultProfile].apiKey' 2>/dev/null)}" && \
curl -s -X POST "https://oatda.com/api/v1/llm/transcriptions" \
-H "Authorization: Bearer $OATDA_API_KEY" \
-F "provider=<PROVIDER>" \
-F "model=<MODEL>" \
-F "file=@<AUDIO_FILE>" \
-F "response_format=json"
Alternative API Call (base64 JSON)
AUDIO_DATA_URL="data:audio/mpeg;base64,$(base64 -w 0 audio.mp3)"
export OATDA_API_KEY="${OATDA_API_KEY:-$(cat ~/.oatda/credentials.json 2>/dev/null | jq -r '.profiles[.defaultProfile].apiKey' 2>/dev/null)}" && \
curl -s -X POST "https://oatda.com/api/v1/llm/transcriptions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OATDA_API_KEY" \
-d "$(jq -n \
--arg provider "<PROVIDER>" \
--arg model "<MODEL>" \
--arg file "$AUDIO_DATA_URL" \
'{provider: $provider, model: $model, file: $file, response_format: "json"}')"
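The data URL above hardcodes audio/mpeg and relies on the GNU-only -w flag of base64. A more portable sketch follows; the helper names audio_mime and audio_data_url are hypothetical, and the extension list is an assumption rather than a statement of which formats OATDA accepts.

```shell
# Hypothetical helper: pick a MIME type from the file extension.
audio_mime() {
  case "${1##*.}" in
    mp3)  echo "audio/mpeg" ;;
    wav)  echo "audio/wav" ;;
    m4a)  echo "audio/mp4" ;;
    ogg)  echo "audio/ogg" ;;
    flac) echo "audio/flac" ;;
    webm) echo "audio/webm" ;;
    *)    echo "application/octet-stream" ;;
  esac
}

# Build the data URL. Piping through tr removes line wraps, so this also
# works where base64 has no -w flag (for example on macOS).
audio_data_url() {
  printf 'data:%s;base64,%s' "$(audio_mime "$1")" "$(base64 < "$1" | tr -d '\n')"
}
```

With these in place, AUDIO_DATA_URL="$(audio_data_url audio.mp3)" can stand in for the first line of the call above.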
Common Parameters
- language: ISO-639-1 language code like en, de, fr
- prompt: Context for names, acronyms, or domain-specific terms
- response_format: json, text, srt, verbose_json, vtt, or diarized_json
- temperature: 0 to 1
- timestamp_granularities: word and/or segment
- chunking_strategy: auto
- hotwords: Provider-specific keyword hints
- stream: true if supported by the selected model
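Checking the parameter values listed above before sending them can catch typos locally. This is a rough sketch: the function names are hypothetical, and the language check only tests the two-letter shape, not whether the code is a real ISO-639-1 entry.

```shell
# Hypothetical helper: accept only the response_format values listed above.
valid_response_format() {
  case "$1" in
    json|text|srt|verbose_json|vtt|diarized_json) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: accept two lowercase letters (shape check only).
valid_language() {
  case "$1" in
    [a-z][a-z]) return 0 ;;
    *) return 1 ;;
  esac
}
```

Once validated, the values go into the multipart call as extra form fields, for example -F "language=de" -F "response_format=srt".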
Response Format
The API returns JSON like:
{
"text": "The transcribed text...",
"language": "en",
"duration": 42.5,
"segments": [],
"words": [],
"costs": {
"inputCost": 0,
"outputCost": 0.0001,
"totalCost": 0.0001,
"currency": "USD"
}
}
Present the text field to the user. Include subtitles, segments, or words if the requested format includes them.
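To illustrate, the fields can be pulled out of a response with jq. The sample below copies the JSON shown above; transcript_text and transcript_summary are hypothetical helper names that read the response from standard input.

```shell
# Sample response, copied from the shape documented above.
sample_response='{"text":"The transcribed text...","language":"en","duration":42.5,"segments":[],"words":[],"costs":{"inputCost":0,"outputCost":0.0001,"totalCost":0.0001,"currency":"USD"}}'

# Print only the transcript.
transcript_text() {
  jq -r '.text'
}

# Print the detected language and the duration in seconds on one line.
transcript_summary() {
  jq -r '"\(.language) \(.duration)s"'
}
```

In practice the curl call would be piped straight into transcript_text. For srt or vtt the body may not be JSON at all, in which case it should be shown as returned.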
Error Handling
| HTTP Status | Meaning | Action |
|---|---|---|
| 401 | Invalid API key | Tell user to check their key |
| 402 | Insufficient credits | Tell user to check balance |
| 400 | Bad request / model not supported | Check model or file format and query oatda-list-models with type=audio |
| 413 | File too large | Keep audio under 25MB or split it |
| 429 | Rate limited or monthly cap | Wait briefly and retry once |
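The table can be turned into a small lookup, sketched below with the hypothetical name oatda_error_action. The messages paraphrase the Action column, and treating any 2xx status as success is an assumption.

```shell
# Hypothetical helper: map an HTTP status to the action from the table.
oatda_error_action() {
  case "$1" in
    2??) echo "ok" ;;
    400) echo "check model or file format, then list audio models" ;;
    401) echo "check the API key" ;;
    402) echo "check the credit balance" ;;
    413) echo "keep audio under 25MB or split it" ;;
    429) echo "wait briefly and retry once" ;;
    *)   echo "unexpected status $1" ;;
  esac
}
```

To get the status, curl can write the body to a file and print the code with -o response.json -w '%{http_code}'; the captured value is then passed to oatda_error_action.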
Example
export OATDA_API_KEY="${OATDA_API_KEY:-$(cat ~/.oatda/credentials.json 2>/dev/null | jq -r '.profiles[.defaultProfile].apiKey' 2>/dev/null)}" && \
curl -s -X POST "https://oatda.com/api/v1/llm/transcriptions" \
-H "Authorization: Bearer $OATDA_API_KEY" \
-F "provider=openai" \
-F "model=whisper-1" \
-F "file=@<AUDIO_FILE>" \
-F "response_format=json"
Notes
- Endpoint: /api/v1/llm/transcriptions
- Prefer multipart upload for local files
- Use response_format=srt or vtt for subtitles
- Use the language parameter to improve recognition when the source language is known
- Equivalent capability name: transcribe_audio
- Related skills: oatda-generate-speech, oatda-translate-audio, oatda-list-models
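- Make sure OpenClaw is installed (locally or via Docker)
- Enter the install command in the chat box: /install oatda-transcribe-audio
- Once installed, invoke the Skill by name or trigger it with /oatda-transcribe-audio
- Provide the inputs described in the Skill's parameter documentation to get structured output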
What is Oatda Transcribe Audio?
Transcribe audio to text using OATDA's unified audio API. Triggers when the user wants speech-to-text, transcription of meetings, podcasts, voice notes, subt... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 33 downloads to date.
How do I install Oatda Transcribe Audio?
Run the command "/install oatda-transcribe-audio" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.
Is Oatda Transcribe Audio free?
Yes. Oatda Transcribe Audio is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.
Which platforms does Oatda Transcribe Audio support?
Oatda Transcribe Audio is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who developed Oatda Transcribe Audio?
It is developed and maintained by devcsde (@devcsde); the current version is v1.0.1.