cinience

Alicloud Ai Audio Asr

by cinience · GitHub ↗ · v1.0.0 · MIT-0
cross-platform · ⚠ suspicious
335 Downloads · 0 Stars · 0 Active Installs · 1 Version
Install in OpenClaw
/install alicloud-ai-audio-asr
Description
Transcribe non-realtime speech with Alibaba Cloud Model Studio Qwen ASR models (`qwen3-asr-flash`, `qwen-audio-asr`, `qwen3-asr-flash-filetrans`). Use when c...
README (SKILL.md)

Category: provider

Model Studio Qwen ASR (Non-Realtime)

Validation

mkdir -p output/alicloud-ai-audio-asr
python -m py_compile skills/ai/audio/alicloud-ai-audio-asr/scripts/transcribe_audio.py && echo "py_compile_ok" > output/alicloud-ai-audio-asr/validate.txt

Pass criteria: command exits 0 and output/alicloud-ai-audio-asr/validate.txt is generated.
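Where a POSIX shell is not available, the same check can be approximated with Python's standard py_compile module. This is a minimal sketch: validate_script is a hypothetical helper name, and the shell command above remains the actual pass criterion.

```python
import py_compile
from pathlib import Path


def validate_script(script: str, out_dir: str = "output/alicloud-ai-audio-asr") -> bool:
    """Byte-compile `script`; on success write the marker file the shell check writes."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # equivalent of `mkdir -p`
    try:
        # doraise=True turns syntax errors into py_compile.PyCompileError
        py_compile.compile(script, doraise=True)
    except py_compile.PyCompileError:
        return False
    # `echo` appends a trailing newline, so the marker does too
    (out / "validate.txt").write_text("py_compile_ok\n")
    return True
```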

Output And Evidence

  • Store transcripts and API responses under output/alicloud-ai-audio-asr/.
  • Keep one command log or sample response per run.

Use Qwen ASR to transcribe recorded (non-realtime) audio: synchronous calls for short recordings and asynchronous jobs for long files.

Critical model names

Use one of these exact model strings:

  • qwen3-asr-flash
  • qwen-audio-asr
  • qwen3-asr-flash-filetrans

Selection guidance:

  • Use qwen3-asr-flash or qwen-audio-asr for short/normal recordings (sync).
  • Use qwen3-asr-flash-filetrans for long-file transcription (async task workflow).
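The selection guidance can be encoded as a small lookup that also rejects misspelled model strings. A sketch only: select_model is a hypothetical helper, and whether a recording counts as "long" is left to the caller because no duration threshold is given here.

```python
from typing import Tuple

SYNC_MODELS = ("qwen3-asr-flash", "qwen-audio-asr")
ASYNC_MODEL = "qwen3-asr-flash-filetrans"


def select_model(long_file: bool, preferred: str = "qwen3-asr-flash") -> Tuple[str, bool]:
    """Return (model, use_async) following the selection guidance.

    `long_file` is the caller's own judgment; no duration threshold is
    defined here because the guidance does not state one.
    """
    if long_file:
        return ASYNC_MODEL, True
    if preferred not in SYNC_MODELS:
        raise ValueError(f"expected one of {SYNC_MODELS}, got {preferred!r}")
    return preferred, False
```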

Prerequisites

  • Optionally create an isolated Python environment (no SDK is required; the script uses only the Python standard library):
python3 -m venv .venv
. .venv/bin/activate
  • Set DASHSCOPE_API_KEY in environment, or add dashscope_api_key to ~/.alibabacloud/credentials.
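A sketch of the lookup order described above: environment variable first, then the credentials file. Assumptions not confirmed by this page: the credentials file is INI-style with one section per profile, the profile name comes from ALIBABA_CLOUD_PROFILE or ALICLOUD_PROFILE (both mentioned in the Credentials assessment) and falls back to default, and resolve_api_key is a hypothetical name. The bundled script may resolve keys differently.

```python
import configparser
import os
from pathlib import Path
from typing import Optional


def resolve_api_key(credentials_path: Optional[Path] = None) -> Optional[str]:
    # 1. The environment variable takes precedence.
    key = os.environ.get("DASHSCOPE_API_KEY")
    if key:
        return key
    # 2. Assumed INI layout: [profile] sections containing dashscope_api_key.
    path = credentials_path or Path.home() / ".alibabacloud" / "credentials"
    if not path.is_file():
        return None
    # interpolation=None so a '%' inside a key is read literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)
    profile = (os.environ.get("ALIBABA_CLOUD_PROFILE")
               or os.environ.get("ALICLOUD_PROFILE")
               or "default")
    if parser.has_option(profile, "dashscope_api_key"):
        return parser.get(profile, "dashscope_api_key")
    return None
```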

Normalized interface (asr.transcribe)

Request

  • audio (string, required): public URL or local file path.
  • model (string, optional): default qwen3-asr-flash.
  • language_hints (array<string>, optional): e.g. zh, en.
  • sample_rate (number, optional)
  • vocabulary_id (string, optional)
  • disfluency_removal_enabled (bool, optional)
  • timestamp_granularities (array<string>, optional): e.g. sentence.
  • async (bool, optional): default false for sync models, true for qwen3-asr-flash-filetrans.
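The request fields map naturally onto a dataclass, with the async default derived from the model. This is an illustrative sketch, not the bundled script's code; TranscribeRequest and the async_ field name are hypothetical (async is a reserved word in Python), and sample_rate is typed as an integer by assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional

ASYNC_MODEL = "qwen3-asr-flash-filetrans"


@dataclass
class TranscribeRequest:
    audio: str                                   # public URL or local file path
    model: str = "qwen3-asr-flash"
    language_hints: List[str] = field(default_factory=list)          # e.g. ["zh", "en"]
    sample_rate: Optional[int] = None            # assumption: integer Hz
    vocabulary_id: Optional[str] = None
    disfluency_removal_enabled: Optional[bool] = None
    timestamp_granularities: List[str] = field(default_factory=list)  # e.g. ["sentence"]
    async_: Optional[bool] = None                # "async" is a reserved word in Python

    def __post_init__(self) -> None:
        if not self.audio:
            raise ValueError("audio is required")
        if self.async_ is None:
            # Default: async only for the long-file model.
            self.async_ = self.model == ASYNC_MODEL
```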

Response

  • text (string): normalized transcript text.
  • task_id (string, optional): present for async submission.
  • status (string): SUCCEEDED for completed requests, or the task status returned at async submission.
  • raw (object): original API response.
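One way to fold both reply shapes into the normalized fields. Assumptions: sync replies follow the OpenAI chat-completions shape (choices[0].message.content), and async submissions return output.task_id and output.task_status; check references/api_reference.md before relying on either. normalize_response is a hypothetical name, and extracting text from a finished filetrans task is not covered.

```python
from typing import Any, Dict


def normalize_response(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Map a raw API reply onto the normalized text/task_id/status/raw fields."""
    choices = raw.get("choices")
    if choices:
        # OpenAI-compatible sync reply: transcript is the assistant message content.
        text = choices[0].get("message", {}).get("content") or ""
        return {"text": text, "status": "SUCCEEDED", "raw": raw}
    # Assumed DashScope async submission shape: {"output": {"task_id", "task_status"}}.
    output = raw.get("output", {})
    result: Dict[str, Any] = {
        "text": "",
        "status": output.get("task_status", "UNKNOWN"),
        "raw": raw,
    }
    if "task_id" in output:
        result["task_id"] = output["task_id"]
    return result
```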

Quick start (official HTTP API)

Sync transcription (OpenAI-compatible protocol):

curl -sS --location 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "qwen3-asr-flash",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "input_audio",
            "input_audio": {
              "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
            }
          }
        ]
      }
    ],
    "stream": false,
    "asr_options": {
      "enable_itn": false
    }
  }'
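The same sync call can be built with only the Python standard library, in keeping with the helper script's stdlib-only approach. A sketch that reproduces the curl request body and headers; build_sync_request is a hypothetical name and error handling is omitted.

```python
import json
import urllib.request

SYNC_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"


def build_sync_request(audio_url: str, api_key: str, model: str = "qwen3-asr-flash",
                       enable_itn: bool = False) -> urllib.request.Request:
    """Build the same POST as the curl example (not sent here)."""
    payload = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [{"type": "input_audio", "input_audio": {"data": audio_url}}],
        }],
        "stream": False,
        "asr_options": {"enable_itn": enable_itn},
    }
    return urllib.request.Request(
        SYNC_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the request to urllib.request.urlopen (with a timeout) and decode the body with json.load; in the OpenAI-compatible shape the transcript is typically in choices[0].message.content.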

Async long-file transcription (DashScope protocol):

curl -sS --location 'https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription' \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
  --header 'X-DashScope-Async: enable' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "qwen3-asr-flash-filetrans",
    "input": {
      "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
    }
  }'
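The async submission differs from the sync call in endpoint, body shape (input.file_url), and the X-DashScope-Async: enable header. A stdlib sketch mirroring the curl example; build_async_submit is a hypothetical name. Note that urllib.request normalizes header names with str.capitalize(), so the header is stored and read back as X-dashscope-async.

```python
import json
import urllib.request

ASYNC_URL = "https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription"


def build_async_submit(file_url: str, api_key: str,
                       model: str = "qwen3-asr-flash-filetrans") -> urllib.request.Request:
    """Build the task-submission POST from the curl example (not sent here)."""
    body = {"model": model, "input": {"file_url": file_url}}
    return urllib.request.Request(
        ASYNC_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-DashScope-Async": "enable",  # requests task-based (async) processing
            "Content-Type": "application/json",
        },
        method="POST",
    )
```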

Poll task result:

curl -sS --location "https://dashscope.aliyuncs.com/api/v1/tasks/<task_id>" \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY"
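A polling loop that follows the operational guidance (fixed interval in the 5-20 s range, bounded retries). Assumptions: the status lives at output.task_status, and SUCCEEDED, FAILED, CANCELED and UNKNOWN are treated as terminal; fetch_task and wait_for_task are hypothetical names. The fetch function is injected so the loop can be exercised without network access.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

# Assumption: these DashScope task statuses mean "stop polling".
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "CANCELED", "UNKNOWN"}


def fetch_task(task_id: str, api_key: str) -> Dict[str, Any]:
    """GET the task record, as in the curl example above."""
    req = urllib.request.Request(
        f"https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_task(
    task_id: str,
    fetch: Callable[[str], Dict[str, Any]],
    interval: float = 10.0,   # within the suggested 5-20 s range
    max_polls: int = 60,      # retry guard: give up after this many polls
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the task reaches a terminal status or the guard trips."""
    for attempt in range(max_polls):
        raw = fetch(task_id)
        status = raw.get("output", {}).get("task_status")
        if status in TERMINAL_STATUSES:
            return raw
        if attempt < max_polls - 1:
            sleep(interval)  # skip the pointless sleep after the final poll
    raise TimeoutError(f"task {task_id} not finished after {max_polls} polls")
```

In real use: wait_for_task(task_id, lambda t: fetch_task(t, api_key), interval=10).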

Local helper script

Use the bundled script for URL/local-file input and optional async polling:

python skills/ai/audio/alicloud-ai-audio-asr/scripts/transcribe_audio.py \
  --audio "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3" \
  --model qwen3-asr-flash \
  --language-hints zh,en \
  --print-response

Long-file mode:

python skills/ai/audio/alicloud-ai-audio-asr/scripts/transcribe_audio.py \
  --audio "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3" \
  --model qwen3-asr-flash-filetrans \
  --async \
  --wait
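To keep one command log per run, the helper can be wrapped so that the exact command line and its output land in a log file. A sketch using only the flags shown in the examples above; build_command and run_and_log are hypothetical, and adding --print-response only in sync mode simply mirrors those examples.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional

SCRIPT = "skills/ai/audio/alicloud-ai-audio-asr/scripts/transcribe_audio.py"
LONG_FILE_MODEL = "qwen3-asr-flash-filetrans"


def build_command(audio: str, model: str = "qwen3-asr-flash",
                  language_hints: Optional[List[str]] = None) -> List[str]:
    """Assemble argv using only the flags shown in the examples above."""
    cmd = [sys.executable, SCRIPT, "--audio", audio, "--model", model]
    if language_hints:
        cmd += ["--language-hints", ",".join(language_hints)]  # comma-separated, e.g. zh,en
    if model == LONG_FILE_MODEL:
        cmd += ["--async", "--wait"]  # long-file mode: submit, then poll for the result
    else:
        cmd.append("--print-response")
    return cmd


def run_and_log(cmd: List[str], log_path: Path) -> int:
    """Run the command and keep its command line plus stdout/stderr as the run log."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    proc = subprocess.run(cmd, capture_output=True, text=True)
    log_path.write_text(" ".join(cmd) + "\n\n" + proc.stdout + proc.stderr, encoding="utf-8")
    return proc.returncode
```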

Operational guidance

  • For local files, pass the audio as a base64 data URI in input_audio.data when no public URL is available.
  • Keep language_hints minimal to reduce recognition ambiguity.
  • For async tasks, poll every 5-20 s and cap the number of retries.
  • Save normalized outputs under output/alicloud-ai-audio-asr/transcripts/.
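A local file can be turned into a data URI with the base64 and mimetypes modules. Assumptions: the service accepts the standard data:<mime>;base64,<payload> form in input_audio.data, and application/octet-stream is an acceptable fallback when the extension is not recognized; to_data_uri is a hypothetical name. Base64 inflates the payload by about a third, so very large local files are better served by the async URL workflow.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_uri(path: str) -> str:
    """Encode a local audio file as data:<mime>;base64,<payload>."""
    mime, _ = mimetypes.guess_type(path)
    # Assumption: a generic type is acceptable when the extension is not recognized.
    mime = mime or "application/octet-stream"
    payload = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"
```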

Output location

  • Default output: output/alicloud-ai-audio-asr/transcripts/
  • Override base dir with OUTPUT_DIR.
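A sketch of the output-location rule. Assumption: OUTPUT_DIR replaces the leading output/ component, so transcripts land in $OUTPUT_DIR/alicloud-ai-audio-asr/transcripts/; transcripts_dir and save_transcript are hypothetical names.

```python
import json
import os
from pathlib import Path


def transcripts_dir() -> Path:
    # Assumption: OUTPUT_DIR replaces the leading "output" base directory.
    base = Path(os.environ.get("OUTPUT_DIR", "output"))
    return base / "alicloud-ai-audio-asr" / "transcripts"


def save_transcript(result: dict, name: str) -> Path:
    """Write one normalized result as JSON and return its path."""
    out = transcripts_dir()
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"{name}.json"
    # ensure_ascii=False keeps Chinese transcripts human-readable in the file
    path.write_text(json.dumps(result, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```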

Workflow

  1. Confirm user intent, region, identifiers, and whether the operation is read-only or mutating.
  2. Run one minimal read-only query first to verify connectivity and permissions.
  3. Execute the target operation with explicit parameters and bounded scope.
  4. Verify results and save output/evidence files.

References

  • references/api_reference.md
  • references/sources.md
  • Realtime synthesis is provided by skills/ai/audio/alicloud-ai-audio-tts-realtime/.
Usage Guidance
This skill appears to be a legitimate Alibaba Cloud Qwen ASR helper, but its metadata omits that it needs DASHSCOPE_API_KEY and that the bundled script reads .env files (in the current working directory and the repository root) and ~/.alibabacloud/credentials. Before installing:
  • Expect to provide DASHSCOPE_API_KEY (or add dashscope_api_key to ~/.alibabacloud/credentials). Prefer setting the environment variable at runtime rather than leaving keys in repository .env files.
  • Be aware that the helper reads the .env in the current directory and the repository root .env (if a .git directory exists). Remove or move any unrelated secrets from those files to avoid accidental use.
  • The script base64-encodes local audio and uploads it as a data URI; do not transcribe sensitive audio unless you trust the remote service and the network path.
  • Confirm that the endpoints (dashscope.aliyuncs.com) and API behavior match your expectations and that you are comfortable sending audio and transcripts to that service.
  • If you require strict least privilege or accurate metadata, ask the publisher to declare DASHSCOPE_API_KEY as the primary credential in the skill manifest and to stop implicitly loading repository .env files, or to make that behavior opt-in.
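Before running the helper, you can list which .env files it could pick up and check them for unrelated secrets. A sketch under the assumption, taken from the description above, that it checks the current directory and the nearest ancestor containing .git; the script's actual search order may differ, and find_repo_root and candidate_env_files are hypothetical names.

```python
from pathlib import Path
from typing import List, Optional


def find_repo_root(start: Path) -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains .git."""
    for candidate in [start, *start.parents]:
        if (candidate / ".git").exists():
            return candidate
    return None


def candidate_env_files(cwd: Optional[Path] = None) -> List[Path]:
    """List existing .env files in cwd and at the repo root (assumed lookup)."""
    cwd = (cwd or Path.cwd()).resolve()
    paths = [cwd / ".env"]
    root = find_repo_root(cwd)
    if root is not None and (root / ".env") not in paths:
        paths.append(root / ".env")
    return [p for p in paths if p.is_file()]
```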
Capability Analysis
Type: OpenClaw Skill
Name: alicloud-ai-audio-asr
Version: 1.0.0
The skill bundle provides a legitimate interface for Alibaba Cloud's Qwen ASR (Automatic Speech Recognition) services. The core logic in `scripts/transcribe_audio.py` uses Python standard libraries to interact with official Alibaba Cloud DashScope endpoints (dashscope.aliyuncs.com) and correctly handles credentials via environment variables or the standard `~/.alibabacloud/credentials` file. No evidence of data exfiltration, malicious execution, or prompt injection was found.
Capability Assessment
Purpose & Capability
The skill's description and SKILL.md describe using Alibaba Cloud DashScope/Qwen ASR and require an API key (DASHSCOPE_API_KEY). However, the skill metadata declares no required environment variables or primary credential. This mismatch (no declared credential, yet the runtime requires one) is inconsistent: a transcription skill legitimately needs an API key, so the metadata should list it.
Instruction Scope
The SKILL.md and bundled script instruct the agent to read environment variables, ~/.alibabacloud/credentials, and local .env files (current working directory and repository root). While reading an ASR API key is expected, automatically loading arbitrary .env files and repo-level .env can pick up unrelated secrets or configuration and incorporate them into the environment used for requests. The script also will base64-encode and upload local audio (data URI), which is expected for local-file transcription but is a potential data exfiltration pathway if users are unaware.
Install Mechanism
No install spec is provided (instruction-only with a helper Python script). The SKILL.md suggests creating a virtualenv but does not download external code or archives. The script uses only Python stdlib, so there is no high-risk install step.
Credentials
The runtime expects DASHSCOPE_API_KEY (and supports ALIBABA_CLOUD_PROFILE/ALICLOUD_PROFILE) but the registry metadata lists no required env vars or primary credential. The script will also load .env files from cwd and repository root and read ~/.alibabacloud/credentials to populate DASHSCOPE_API_KEY. Requiring the ASR API key is proportional to the stated purpose, but silently loading additional .env files and credentials is broader than necessary and not declared.
Persistence & Privilege
The skill does not request always:true and does not attempt to modify other skills or system-wide agent settings. It writes outputs to specified output paths under output/alicloud-ai-audio-asr/, which is consistent with its stated behavior.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install alicloud-ai-audio-asr
  3. After installation, invoke the skill by name or use /alicloud-ai-audio-asr
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
batch publish from alicloud-skills on 2026-03-11
Metadata
Slug alicloud-ai-audio-asr
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Alicloud Ai Audio Asr?

Transcribe non-realtime speech with Alibaba Cloud Model Studio Qwen ASR models (`qwen3-asr-flash`, `qwen-audio-asr`, `qwen3-asr-flash-filetrans`). Use when c... It is an AI Agent Skill for Claude Code / OpenClaw, with 335 downloads so far.

How do I install Alicloud Ai Audio Asr?

Run "/install alicloud-ai-audio-asr" in the OpenClaw or Claude Code chat to install it in one step. Before first use, set DASHSCOPE_API_KEY (or add dashscope_api_key to ~/.alibabacloud/credentials).

Is Alicloud Ai Audio Asr free?

Yes. The skill itself is free and licensed under MIT-0; you can download, install, and use it at no cost. Transcription requests are sent to Alibaba Cloud Model Studio using your own DASHSCOPE_API_KEY.

Which platforms does Alicloud Ai Audio Asr support?

Alicloud Ai Audio Asr is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Alicloud Ai Audio Asr?

It is built and maintained by cinience (@cinience); the current version is v1.0.0.
