qingzhe2020

ifly-speed-transcription

by Iflytek AIcloud · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
237
Downloads
0
Stars
1
Active Installs
1
Versions
Install in OpenClaw
/install ifly-speed-transcription
Description
Ultra-fast speech transcription using iFLYTEK Speed Transcription API. Transcribe audio files (WAV/PCM/MP3) up to 5 hours in ~20 seconds per hour. Supports C...
README (SKILL.md)

iFly Speed Transcription

Ultra-fast speech transcription service that converts audio files to text: 1 hour of audio transcribes in about 20 seconds.

Quick Start

# Basic transcription (auto-detect language and dialect)
python3 scripts/transcribe.py /path/to/audio.mp3

# Save to file
python3 scripts/transcribe.py /path/to/audio.wav --output result.txt

# With domain-specific optimization
python3 scripts/transcribe.py /path/to/audio.mp3 --pd medical

# With speaker separation
python3 scripts/transcribe.py /path/to/meeting.mp3 --vspp-on 1 --speaker-num 2

Setup

1. API Credentials

Get credentials from iFlytek Open Platform:

  • APP_ID: Application ID
  • API_KEY: API key for authentication
  • API_SECRET: API secret for signing requests

2. Environment Variables

export XFEI_APP_ID="your_app_id"
export XFEI_API_KEY="your_api_key"
export XFEI_API_SECRET="your_api_secret"
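Before running the script, it can help to confirm that all three variables are actually visible to Python. The following is a minimal sketch; the helper name `missing_credentials` is hypothetical and is not part of `scripts/transcribe.py`.

```python
import os

# Variable names taken from the Setup section above.
REQUIRED_VARS = ("XFEI_APP_ID", "XFEI_API_KEY", "XFEI_API_SECRET")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty.

    `env` defaults to the real process environment; passing a dict makes
    the check easy to try out without touching real credentials.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_credentials()` with no arguments inspects the current environment; an empty list means all three values are set.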

API Parameters

Required Parameters

Parameter | Description
file_path | Path to audio file (MP3, 16kHz, 16-bit, mono)
--language | Language code (default: zh_cn for Chinese+English+202 dialects)
--accent | Accent (default: mandarin)

Optional Parameters

Parameter | Type | Description
--pd | string | Domain: court, finance, medical, tech, sport, edu, gov, game, ecom, car
--vspp-on | int | Speaker separation: 0=off, 1=on
--speaker-num | int | Number of speakers (0=auto, range 1-10)
--output-type | int | Output: 0=1best, 1=cnlbest, 2=multi-candidate
--postproc-on | int | Post-processing: 0=off, 1=on (default)
--enable-subtitle | int | Subtitle mode: 0=document, 1=subtitle
--smoothproc | bool | Smoothing: true=on, false=off (default: true)
--colloqproc | bool | Colloquial processing: true=on, false=off
--language-type | int | Language mode: 1=auto, 2=Chinese, 3=English, 4=Chinese-only
--dhw | string | Hot words (comma-separated, UTF-8)
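
When the script is driven from other Python code, the flags above can be assembled into an argument list. This is a sketch only: `build_command` is a hypothetical helper, it covers just a few of the flags, and it assumes the script is invoked as `python3 scripts/transcribe.py` as in Quick Start.

```python
# Domain values copied from the --pd row above.
DOMAINS = {"court", "finance", "medical", "tech", "sport",
           "edu", "gov", "game", "ecom", "car"}


def build_command(audio_path, output=None, domain=None,
                  speakers=None, hot_words=None):
    """Build an argv list for scripts/transcribe.py (hypothetical helper)."""
    cmd = ["python3", "scripts/transcribe.py", audio_path]
    if output:
        cmd += ["--output", output]
    if domain is not None:
        if domain not in DOMAINS:
            raise ValueError(f"unknown domain: {domain}")
        cmd += ["--pd", domain]
    if speakers is not None:
        # 0 means auto-detect; otherwise the documented range is 1-10.
        if not 0 <= speakers <= 10:
            raise ValueError("speaker count must be 0 (auto) or 1-10")
        cmd += ["--vspp-on", "1", "--speaker-num", str(speakers)]
    if hot_words:
        # Hot words are passed as one comma-separated string.
        cmd += ["--dhw", ",".join(hot_words)]
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)`; passing a list rather than a shell string avoids quoting problems with file names and hot words.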

Audio Requirements

  • Format: MP3
  • Sample rate: 16kHz
  • Bit depth: 16-bit
  • Channels: Mono (single channel)
  • Size: ≤ 500MB
  • Duration: ≤ 5 hours (recommended: ≥ 5 minutes)
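
The limits above can be checked locally before uploading. The sketch below is an assumption-laden illustration: `preflight` is a hypothetical helper, "500MB" is read as 500 × 1024 × 1024 bytes, and the caller supplies the size and duration (for example from `os.path.getsize` and an audio tool of their choice), since sample rate, bit depth, and duration cannot be read from an MP3 with the standard library alone.

```python
MAX_BYTES = 500 * 1024 * 1024        # "≤ 500MB", assuming 1 MB = 1024 * 1024 bytes
MAX_SECONDS = 5 * 60 * 60            # "≤ 5 hours"
RECOMMENDED_MIN_SECONDS = 5 * 60     # "recommended: ≥ 5 minutes"


def preflight(filename, size_bytes, duration_seconds):
    """Return (errors, warnings) for a candidate audio file."""
    errors, warnings = [], []
    if not filename.lower().endswith(".mp3"):
        errors.append("file extension is not .mp3")
    if size_bytes > MAX_BYTES:
        errors.append("file is larger than 500MB")
    if duration_seconds > MAX_SECONDS:
        errors.append("audio is longer than 5 hours")
    if duration_seconds < RECOMMENDED_MIN_SECONDS:
        warnings.append("audio is shorter than the recommended 5 minutes")
    return errors, warnings
```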

Workflow

1. Upload Audio File

Files < 30MB use direct upload. Files ≥ 30MB use multipart upload (5MB chunks).
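
The size rule can be sketched as follows. This is not the code from `scripts/transcribe.py`; the function names are hypothetical and "MB" is assumed to mean 1024 × 1024 bytes.

```python
MB = 1024 * 1024
DIRECT_LIMIT = 30 * MB   # files below this size use direct upload
CHUNK_SIZE = 5 * MB      # multipart chunk size


def upload_plan(size_bytes):
    """Return (mode, number_of_requests) for a file of the given size."""
    if size_bytes < DIRECT_LIMIT:
        return ("direct", 1)
    # Integer ceiling division: the last chunk may be shorter than CHUNK_SIZE.
    chunks = (size_bytes + CHUNK_SIZE - 1) // CHUNK_SIZE
    return ("multipart", chunks)


def chunk_ranges(size_bytes, chunk_size=CHUNK_SIZE):
    """Return half-open (start, end) byte offsets for each chunk."""
    return [(start, min(start + chunk_size, size_bytes))
            for start in range(0, size_bytes, chunk_size)]
```

For example, a 31MB file falls in the multipart branch and needs seven chunks: six full 5MB chunks and one 1MB remainder.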

2. Create Transcription Task

Submit uploaded file URL with transcription parameters.

3. Poll for Results

Query task status periodically until completion.

Response Format

{
  "task_id": "1568100557463963551003",
  "task_status": "4",
  "text": "Transcribed text content...",
  "segments": [
    {
      "speaker": "spk-0",
      "begin": "0",
      "end": "470",
      "text": "听说。"
    }
  ]
}
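
A response of this shape can be turned into speaker-labelled lines with the standard `json` module. This is a minimal sketch assuming the fields shown above; `speaker_lines` is a hypothetical helper, and the `begin`/`end` values are printed as-is because their unit is not stated here.

```python
import json


def speaker_lines(response_text):
    """Format each segment as 'speaker [begin-end]: text'."""
    data = json.loads(response_text)
    return [
        f'{seg["speaker"]} [{seg["begin"]}-{seg["end"]}]: {seg["text"]}'
        for seg in data.get("segments", [])
    ]


sample = """
{
  "task_id": "1568100557463963551003",
  "task_status": "4",
  "text": "Transcribed text content...",
  "segments": [
    {"speaker": "spk-0", "begin": "0", "end": "470", "text": "听说。"}
  ]
}
"""
print(speaker_lines(sample))
```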

Task Status

  • 1: Pending
  • 2: Processing
  • 3: Completed
  • 4: Callback completed
  • -1: Failed
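
The polling step can be sketched with these status codes. The sketch assumes the caller supplies a `query_status` function that returns the parsed response as a dict; that function, the interval, and the timeout are all hypothetical and are not taken from `scripts/transcribe.py`.

```python
import time

DONE = {"3", "4"}   # Completed / Callback completed
FAILED = "-1"


def wait_for_result(query_status, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Call query_status() until the task finishes, fails, or times out."""
    deadline = time.monotonic() + timeout
    while True:
        result = query_status()
        # The sample response carries the status as a string, so normalize.
        status = str(result.get("task_status"))
        if status in DONE:
            return result
        if status == FAILED:
            raise RuntimeError("transcription task failed")
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval)
```

Injecting `sleep` keeps the loop easy to exercise without real waiting; in normal use the default `time.sleep` applies.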

Language Support

autodialect (language=zh_cn)

Automatic recognition of Chinese, English, and 202 Chinese dialects including:

  • Major: Mandarin, Cantonese, Taiwanese, Sichuanese, Shanghainese, Northeastern
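  • Examples: Hefei, Wuhu, Northern Anhui, Cantonese, Beijing, Fuzhou, Minnan (Hokkien), Chaoshan (Teochew), Hakka, Guiyang, Haikou, Shijiazhuang, Taiyuan, Zhengzhou, Northeastern, Wuhan, Changsha, Nanjing, Nanchang, Dalian, Hohhot, Yinchuan, Xining, Jinan, Xi'an, Shanghainese, Sichuanese, Taiwanese, Tianjin, Ürümqi, Yunnan, Hangzhou, Chongqing (202 total)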

Common Use Cases

  1. Meeting Transcription: Convert meeting recordings to text with speaker separation
  2. Interview Recording: Transcribe interviews for documentation
  3. Lecture Recording: Convert academic lectures to searchable text
  4. Voice Notes: Transform voice memos into text notes
  5. Call Center: Analyze customer service calls
  6. Legal Proceedings: Transcribe court hearings with domain optimization
  7. Medical Consultation: Doctor-patient conversation documentation

Error Handling

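Error Code | Description | Friendly hint
10107 | Invalid custom audio encoding field | Check that the encoding value is valid~ (◎_◎)
10303 | Invalid parameter value | Check the parameter values for mistakes~ (°∀°)ノ
10043 | Audio decoding failed | Check that the uploaded audio matches the format described by the encoding field~
20304 | Silent audio, or audio format does not match the parameters | Check that the audio is 16kHz, 16-bit, mono~ (。•́︿•̀。)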
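
The error codes above lend themselves to a small lookup table in calling code. The sketch below is illustrative: `hint_for` is a hypothetical helper, the hint wording is a paraphrase, and any code not listed above falls through to a generic message.

```python
# Codes and hints paraphrased from the error table above.
ERROR_HINTS = {
    10107: "Check that the encoding value is valid.",
    10303: "Check the parameter values for mistakes.",
    10043: "Check that the audio matches the format named by the encoding field.",
    20304: "Check that the audio is 16kHz, 16-bit, mono and not silent.",
}


def hint_for(code):
    """Return a hint for an error code given as an int or a numeric string."""
    return ERROR_HINTS.get(
        int(code),
        f"Unrecognized error code {code}; see the API documentation.",
    )
```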

💡 Having trouble?

  • 📖 API documentation: https://console.xfyun.cn/services/ost
  • 💰 Purchase a plan: https://www.xfyun.cn/services/fast_lfasr?target=price

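FAQ

Q: What is the main function of Speed Transcription? A: It quickly converts long audio recordings (up to 5 hours) into text~ (๑•̀ㅂ•́)و✧

Q: Which languages does Speed Transcription support? A: Chinese and English, plus 202 dialects, all recognized without switching modes! ヽ(✿゚▽゚)ノ

Q: Which application platforms does Speed Transcription support? A: Currently the WebAPI platform~

Q: Why is only MP3 supported? A: Because MP3 is widely compatible, small, and fast to transfer~ Encode with lame and integration is easy! (◕‿◕)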

Tips

  1. For speaker separation: Use --vspp-on 1 for better speaker diarization
  2. For specific domains: Use --pd parameter for improved accuracy
  3. For faster processing: Audio files ≥ 5 minutes are prioritized
  4. For subtitle output: Use --enable-subtitle 1 for subtitle-formatted output
  5. For hot words: Use --dhw="word1,word2" to boost recognition accuracy
Usage Guidance
This skill appears to be a legitimate iFlytek transcription client, but there are inconsistencies you should address before installing:
  1. The SKILL.md and scripts require three environment secrets (XFEI_APP_ID, XFEI_API_KEY, XFEI_API_SECRET), but the registry metadata lists no required env vars. Confirm you are comfortable providing those API credentials and that the metadata is corrected.
  2. Review scripts/transcribe.py yourself (or run it in an isolated environment) to confirm it only uploads the audio files you expect and does not read other files. Pay special attention to callback_url usage: avoid setting a callback to an endpoint you don't control, because transcription results could be delivered there.
  3. The .claude/settings.local.json contains author-local absolute paths and allowed Bash commands (py_compile, zip, read of a Desktop path). This is likely leftover packaging metadata, but inspect it and either ignore or remove it before deployment.
  4. Only provide your iFlytek credentials to trusted code; consider creating a dedicated API key with limited scope/quota for testing.
If you want higher assurance, ask the publisher to update registry metadata to declare required env vars and remove any author-local permission files, or run the script in a sandboxed container and monitor network calls to the xfyun endpoints.
Capability Analysis
Type: OpenClaw Skill Name: ifly-speed-transcription Version: 1.0.0 The skill bundle provides a legitimate implementation for transcribing audio files using the iFLYTEK Speed Transcription API. The core logic in `scripts/transcribe.py` handles file uploads (including multipart for large files) and API communication using standard HMAC-SHA256 authentication, with no evidence of data exfiltration or unauthorized execution. While `.claude/settings.local.json` contains local development paths and specific bash permissions, these appear to be unintentional artifacts from the developer's environment rather than malicious components.
Capability Assessment
Purpose & Capability
Functionality (audio upload, multipart upload, create/poll transcription tasks) matches the description of an iFLYTEK speed-transcription client. The code expects iFlytek credentials (app id, api key, api secret), which are appropriate for this purpose. However, the registry metadata lists no required environment variables/credentials even though SKILL.md and scripts clearly require XFEI_APP_ID / XFEI_API_KEY / XFEI_API_SECRET — this metadata omission is an inconsistency.
Instruction Scope
SKILL.md gives concrete runtime instructions (set env vars, run python script, upload/poll workflow). The instructions themselves are scoped to transcription and do not ask for unrelated host data. One oddity: the repository contains a .claude/settings.local.json with Read and Bash permissions pointing to a user-specific Desktop path and zip commands; that file is not required for normal use and appears to be author-local packaging metadata rather than necessary runtime instructions, but it could reveal an over-broad permission intent if honored by an agent runtime.
Install Mechanism
There is no install spec (instruction-only + a Python script). That lowers installation risk; dependencies are standard (requests, urllib3) listed in _meta.json. No remote archive downloads or unusual install sources are present in the provided files.
Credentials
The script and SKILL.md require three secrets (XFEI_APP_ID, XFEI_API_KEY, XFEI_API_SECRET) — these are proportionate to calling the iFlytek API. The concern is that the skill registry metadata does not declare these required env vars (it lists none). This mismatch can lead to accidental omission of required secrets or confusion about what the skill will access. Also the skill supports an optional callback_url parameter — if set to an attacker-controlled endpoint it could be used to exfiltrate transcription results; users should inspect and control any callback_url usage.
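One way to control callback_url usage is to check the URL against an allowlist before passing it to the script. This is a sketch of that idea only: `callback_allowed` and the example hosts are hypothetical, and requiring HTTPS is a design choice made here, not something the skill or the iFlytek API is known to enforce.

```python
from urllib.parse import urlparse


def callback_allowed(callback_url, trusted_hosts):
    """Accept only HTTPS callback URLs whose host is explicitly trusted."""
    parsed = urlparse(callback_url)
    # urlparse lowercases the hostname, so trusted_hosts should be lowercase.
    return parsed.scheme == "https" and parsed.hostname in trusted_hosts


# Hypothetical hosts, for illustration only.
TRUSTED = {"hooks.example.com"}
print(callback_allowed("https://hooks.example.com/asr", TRUSTED))
print(callback_allowed("https://collector.example.net/asr", TRUSTED))
```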
Persistence & Privilege
The skill is not always-enabled and uses normal autonomous invocation defaults — no elevated persistence requested. The only persistence/permission artifact is .claude/settings.local.json which enumerates local Bash and Read permissions (including reading an absolute Desktop path and running zip/py_compile). That file appears to be local packaging metadata and is not a necessary runtime privilege for the transcription task, but its presence is unusual and should be reviewed; it could indicate the author tested packaging with broad, user-specific filesystem access.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install ifly-speed-transcription
  3. After installation, invoke the skill by name or use /ifly-speed-transcription
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
  • Initial release of the ifly-speed-transcription skill.
  • Provides ultra-fast speech transcription via the iFLYTEK Speed Transcription API (audio up to 5 hours long, at ~20 seconds per hour of audio).
  • Supports Chinese, English, and 202+ Chinese dialects with automatic detection.
  • Allows domain-specific tuning, speaker separation, and subtitle output via CLI parameters.
  • Includes error handling and guides for setup, usage, and common troubleshooting.
Metadata
Slug ifly-speed-transcription
Version 1.0.0
License MIT-0
All-time Installs 1
Active Installs 1
Total Versions 1
Frequently Asked Questions

What is ifly-speed-transcription?

Ultra-fast speech transcription using iFLYTEK Speed Transcription API. Transcribe audio files (WAV/PCM/MP3) up to 5 hours in ~20 seconds per hour. Supports C... It is an AI Agent Skill for Claude Code / OpenClaw, with 237 downloads so far.

How do I install ifly-speed-transcription?

Run "/install ifly-speed-transcription" in the OpenClaw or Claude Code chat to install it in one step. Before first use, set the three iFlytek credentials (XFEI_APP_ID, XFEI_API_KEY, XFEI_API_SECRET) described under Setup.

Is ifly-speed-transcription free?

Yes, ifly-speed-transcription is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does ifly-speed-transcription support?

ifly-speed-transcription is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created ifly-speed-transcription?

It is built and maintained by Iflytek AIcloud (@qingzhe2020); the current version is v1.0.0.
