← Back to Skills Marketplace

MAI Transcribe

Name: MAI Transcribe
Author: robotsbuildrobots

by robotsbuildrobots · GitHub ↗ · v0.1.1 · MIT-0

cross-platform ⚠ suspicious

Downloads

Stars

Active Installs

Versions

Install in OpenClaw

/install mai-transcribe

Description

Transcribe audio with Microsoft's MAI-Transcribe-1 model via Azure AI Speech.

README (SKILL.md)

MAI-Transcribe-1

Transcribe an audio file via Azure AI Speech using Microsoft's MAI-Transcribe-1 model.

Quick start

node {baseDir}/scripts/transcribe.js /path/to/audio.m4a

Defaults:

Model: mai-transcribe-1
Output: \x3Cinput>.txt
API version: 2025-10-15

Useful flags

node {baseDir}/scripts/transcribe.js /path/to/audio.ogg --out /tmp/transcript.txt
node {baseDir}/scripts/transcribe.js /path/to/audio.m4a --language en-GB
node {baseDir}/scripts/transcribe.js /path/to/audio.m4a --json --out /tmp/transcript.json
node {baseDir}/scripts/transcribe.js /path/to/audio.wav --model mai-transcribe-1
node {baseDir}/scripts/transcribe.js --help

Required env vars

export AZURE_SPEECH_ENDPOINT="https://YOUR-RESOURCE.cognitiveservices.azure.com"
export AZURE_SPEECH_KEY="YOUR_SPEECH_RESOURCE_KEY"

How to get the API key

Go to the Azure portal and open your Speech or Foundry Speech resource.
Open Keys and Endpoint.
Copy:
- the resource endpoint, for example https://your-resource.cognitiveservices.azure.com
- one of the resource keys
Export them:

export AZURE_SPEECH_ENDPOINT="https://YOUR-RESOURCE.cognitiveservices.azure.com"
export AZURE_SPEECH_KEY="YOUR_SPEECH_RESOURCE_KEY"

If gh-style copy-paste chaos is happening, the most important bit is that this skill expects the Speech resource endpoint, not a generic Foundry project URL.

Optional:

export AZURE_SPEECH_API_VERSION="2025-10-15"

API shape

The script calls:

POST {AZURE_SPEECH_ENDPOINT}/speechtotext/transcriptions:transcribe?api-version=2025-10-15

Headers:

Ocp-Apim-Subscription-Key: {AZURE_SPEECH_KEY}

Multipart form fields:

audio
definition

Example definition payload:

{
  "enhancedMode": {
    "enabled": true,
    "model": "mai-transcribe-1"
  }
}

Notes

This is the same style of skill as the Whisper one: a small documented script wrapper, not a built-in OpenClaw media pipeline.
Tested successfully against a live Azure Speech resource.
--json writes the raw Azure response for debugging or downstream processing.
Audio is uploaded to Microsoft for processing.

Usage Guidance

This skill is coherent and implements a straightforward transcription CLI. Before installing, confirm you are comfortable with audio being uploaded to Microsoft (the script posts audio to the Azure Speech endpoint). Provide a Speech resource key with least privilege possible and rotate/revoke the key if needed. Ensure your runtime has a compatible Node version (FormData/Blob/fetch usage may require modern Node). Avoid uploading highly sensitive recordings unless your Azure policy allows it.

Capability Analysis

Type: OpenClaw Skill Name: mai-transcribe Version: 0.1.1 The mai-transcribe skill is a legitimate tool for transcribing audio using Microsoft's Azure AI Speech service. The implementation in scripts/transcribe.js and scripts/common.js follows standard practices, using user-provided environment variables (AZURE_SPEECH_ENDPOINT and AZURE_SPEECH_KEY) to interact with the official Azure API. No evidence of data exfiltration, malicious execution, or prompt injection was found.

Capability Assessment

✓ Purpose & Capability

Name/description (MAI Transcribe) match the requested resources and code. The skill only asks for AZURE_SPEECH_ENDPOINT and AZURE_SPEECH_KEY, requires node, and contains a small CLI that posts audio to the documented Speech API. Nothing requested appears unrelated to transcription.

✓ Instruction Scope

SKILL.md and scripts instruct the agent to run a local Node script that reads a single audio file, uploads it to the configured AZURE_SPEECH_ENDPOINT, and writes a transcript file. The instructions do not request unrelated files, other environment variables, or unexpected external endpoints. The README and SKILL.md explicitly note that audio is uploaded to Microsoft.

✓ Install Mechanism

This is an instruction-only skill with no install spec (lowest risk). The included code files are small, documented, and use standard Node runtime behavior; there are no downloads from arbitrary URLs or extraction steps.

✓ Credentials

Required env vars are AZURE_SPEECH_ENDPOINT and AZURE_SPEECH_KEY (primaryEnv). Those are appropriate and sufficient for calling Azure Speech. No unrelated secrets or config paths are requested. An optional AZURE_SPEECH_API_VERSION is allowed for compatibility.

✓ Persistence & Privilege

always is false and the skill does not request persistent/global agent privileges or modify other skill configs. Autonomous invocation is allowed by default but is not combined with broad or unrelated credential access.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install mai-transcribe
After installation, invoke the skill by name or use /mai-transcribe
Provide required inputs per the skill's parameter spec and get structured output

Version History

v0.1.1

Add Azure Speech key and endpoint setup instructions

v0.1.0

Initial release: skill to use MAI-Transcribe as an alternative to Whisper

Metadata

Slug mai-transcribe

Version 0.1.1

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 2

Frequently Asked Questions

What is MAI Transcribe?

Transcribe audio with Microsoft's MAI-Transcribe-1 model via Azure AI Speech. It is an AI Agent Skill for Claude Code / OpenClaw, with 98 downloads so far.

How do I install MAI Transcribe?

Run "/install mai-transcribe" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is MAI Transcribe free?

Yes, MAI Transcribe is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does MAI Transcribe support?

MAI Transcribe is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created MAI Transcribe?

It is built and maintained by robotsbuildrobots (@robotsbuildrobots); the current version is v0.1.1.

More Skills