qingzhe2020

ifly-speed-transcription

Author: Iflytek AIcloud · GitHub · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
237 total downloads · 0 favorites · 1 current install · 1 version
Install in OpenClaw
/install ifly-speed-transcription
Description
Ultra-fast speech transcription using iFLYTEK Speed Transcription API. Transcribe audio files (WAV/PCM/MP3) up to 5 hours in ~20 seconds per hour. Supports C...
Usage (SKILL.md)

iFly Speed Transcription

Ultra-fast speech transcription service that converts audio files to text: 1 hour of audio transcribes in about 20 seconds.

Quick Start

# Basic transcription (auto-detect language and dialect)
python3 scripts/transcribe.py /path/to/audio.mp3

# Save to file
python3 scripts/transcribe.py /path/to/audio.wav --output result.txt

# With domain-specific optimization
python3 scripts/transcribe.py /path/to/audio.mp3 --pd medical

# With speaker separation
python3 scripts/transcribe.py /path/to/meeting.mp3 --vspp-on 1 --speaker-num 2
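If you drive the script from Python rather than a shell, building the argument list explicitly avoids quoting problems with paths that contain spaces. The sketch below uses only the flags shown in Quick Start; the helper name `build_transcribe_cmd` is hypothetical and not part of the skill.

```python
import sys
from typing import Optional


def build_transcribe_cmd(audio: str,
                         output: Optional[str] = None,
                         pd: Optional[str] = None,
                         speaker_num: Optional[int] = None,
                         script: str = "scripts/transcribe.py") -> list[str]:
    """Assemble an argv list for scripts/transcribe.py from the Quick Start flags.

    Hypothetical helper: it only builds the list, it does not run anything.
    """
    cmd = [sys.executable, script, audio]
    if output is not None:
        cmd += ["--output", output]
    if pd is not None:
        cmd += ["--pd", pd]
    if speaker_num is not None:
        # Quick Start pairs --speaker-num with --vspp-on 1, so this sketch does too.
        cmd += ["--vspp-on", "1", "--speaker-num", str(speaker_num)]
    return cmd
```

To run it, pass the result to `subprocess.run(cmd, check=True)` from the standard library, which raises if the script exits with a non-zero status.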

Setup

1. API Credentials

Get credentials from the iFlytek Open Platform:

  • APP_ID: Application ID
  • API_KEY: API key for authentication
  • API_SECRET: API secret for signing requests

2. Environment Variables

export XFEI_APP_ID="your_app_id"
export XFEI_API_KEY="your_api_key"
export XFEI_API_SECRET="your_api_secret"
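A client can check for all three variables up front and report every missing one at once, instead of failing later mid-request. This is a minimal sketch; `XfeiCredentials` and `load_credentials` are hypothetical names, not the script's actual code.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class XfeiCredentials:
    app_id: str
    api_key: str
    api_secret: str


def load_credentials(env: Mapping[str, str] = os.environ) -> XfeiCredentials:
    """Read XFEI_APP_ID / XFEI_API_KEY / XFEI_API_SECRET; fail fast if any is missing or empty."""
    names = ("XFEI_APP_ID", "XFEI_API_KEY", "XFEI_API_SECRET")
    missing = [n for n in names if not env.get(n)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return XfeiCredentials(*(env[n] for n in names))
```

Accepting a mapping instead of reading `os.environ` directly keeps the function easy to test with a plain dict.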

API Parameters

Required Parameters

| Parameter | Description |
|---|---|
| file_path | Path to the audio file (MP3, 16 kHz, 16-bit, mono) |
| --language | Language code (default: zh_cn, which covers Chinese, English, and 202 dialects) |
| --accent | Accent (default: mandarin) |

Optional Parameters

| Parameter | Type | Description |
|---|---|---|
| --pd | string | Domain: court, finance, medical, tech, sport, edu, gov, game, ecom, car |
| --vspp-on | int | Speaker separation: 0=off, 1=on |
| --speaker-num | int | Number of speakers (0=auto, range 1-10) |
| --output-type | int | Output: 0=1best, 1=cnlbest, 2=multi-candidate |
| --postproc-on | int | Post-processing: 0=off, 1=on (default) |
| --enable-subtitle | int | Subtitle mode: 0=document, 1=subtitle |
| --smoothproc | bool | Smoothing: true=on, false=off (default: true) |
| --colloqproc | bool | Colloquial processing: true=on, false=off |
| --language-type | int | Language mode: 1=auto, 2=Chinese, 3=English, 4=Chinese-only |
| --dhw | string | Hot words (comma-separated, UTF-8) |
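The value ranges in the table can be checked locally before a task is submitted, which turns a server-side parameter error into an immediate, readable message. In this sketch the option keys are hypothetical snake_case versions of the CLI flags (for example `speaker_num` for `--speaker-num`); the script may store them differently.

```python
ALLOWED_PD = {"court", "finance", "medical", "tech", "sport",
              "edu", "gov", "game", "ecom", "car"}


def validate_options(opts: dict) -> list[str]:
    """Return human-readable problems with the optional parameters (empty list = OK)."""
    errors = []
    if "pd" in opts and opts["pd"] not in ALLOWED_PD:
        errors.append("--pd must be one of: " + ", ".join(sorted(ALLOWED_PD)))
    # Integer on/off switches from the table.
    for key, flag in (("vspp_on", "--vspp-on"),
                      ("postproc_on", "--postproc-on"),
                      ("enable_subtitle", "--enable-subtitle")):
        if opts.get(key, 0) not in (0, 1):
            errors.append(f"{flag} must be 0 or 1")
    n = opts.get("speaker_num", 0)
    if not (isinstance(n, int) and 0 <= n <= 10):
        errors.append("--speaker-num must be 0 (auto) or 1-10")
    if opts.get("output_type", 0) not in (0, 1, 2):
        errors.append("--output-type must be 0, 1, or 2")
    if opts.get("language_type", 1) not in (1, 2, 3, 4):
        errors.append("--language-type must be 1, 2, 3, or 4")
    # Boolean switches must be real booleans, not strings like "yes".
    for key in ("smoothproc", "colloqproc"):
        if key in opts and not isinstance(opts[key], bool):
            errors.append(f"--{key} must be true or false")
    return errors
```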

Audio Requirements

  • Format: MP3
  • Sample rate: 16kHz
  • Bit depth: 16-bit
  • Channels: Mono (single channel)
  • Size: ≤ 500MB
  • Duration: ≤ 5 hours (recommended: ≥ 5 minutes)
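The size and duration limits above can be screened before uploading. This sketch assumes "MB" means 1024 × 1024 bytes and that the caller has already measured the duration with some other tool; it checks only the file extension, not the sample rate, bit depth, or channel count. The function name `check_audio` is hypothetical.

```python
MAX_BYTES = 500 * 1024 * 1024      # "Size: ≤ 500MB" (assuming MB = 1024 * 1024 bytes)
MAX_SECONDS = 5 * 60 * 60          # "Duration: ≤ 5 hours"
MIN_RECOMMENDED_SECONDS = 5 * 60   # "recommended: ≥ 5 minutes"


def check_audio(filename: str, size_bytes: int, duration_s: float) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for the documented audio limits.

    Only the extension is checked for format; sample rate, bit depth,
    and channel count are not inspected here.
    """
    errors, warnings = [], []
    if not filename.lower().endswith(".mp3"):
        errors.append("format must be MP3")
    if size_bytes > MAX_BYTES:
        errors.append("file exceeds 500MB")
    if duration_s > MAX_SECONDS:
        errors.append("audio exceeds 5 hours")
    elif duration_s < MIN_RECOMMENDED_SECONDS:
        warnings.append("audio shorter than the recommended 5 minutes")
    return errors, warnings
```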

Workflow

1. Upload Audio File

Files < 30MB use direct upload. Files ≥ 30MB use multipart upload (5MB chunks).
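The size rule above maps to a simple chunk plan: one range for small files, otherwise consecutive 5MB ranges with a shorter final piece. This is a sketch assuming MB = 1024 × 1024 bytes; `plan_upload` and `iter_chunks` are hypothetical helpers, and the actual upload requests are not shown.

```python
CHUNK = 5 * 1024 * 1024           # 5MB multipart chunks (assuming MiB)
DIRECT_LIMIT = 30 * 1024 * 1024   # files below 30MB go up in a single request


def plan_upload(size_bytes: int) -> list[tuple[int, int]]:
    """Return (offset, length) byte ranges; a single range means direct upload."""
    if size_bytes < DIRECT_LIMIT:
        return [(0, size_bytes)]
    return [(off, min(CHUNK, size_bytes - off)) for off in range(0, size_bytes, CHUNK)]


def iter_chunks(path: str, size_bytes: int):
    """Yield (offset, bytes) for each planned range, reading the file lazily."""
    with open(path, "rb") as f:
        for offset, length in plan_upload(size_bytes):
            f.seek(offset)
            yield offset, f.read(length)
```

Reading one chunk at a time keeps memory use bounded at about 5MB even for a 500MB file.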

2. Create Transcription Task

Submit the uploaded file's URL together with the transcription parameters.

3. Poll for Results

Query task status periodically until completion.
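A polling loop needs a stop condition for each status listed under Task Status: return on 3 or 4, raise on -1, and keep waiting on 1 or 2 up to a timeout. In this sketch `query` is a hypothetical callable that performs the status request and returns the decoded JSON; the 5-second interval and 600-second timeout are assumed defaults, not values from the API.

```python
import time
from typing import Callable

# Status codes as listed under "Task Status".
PENDING, PROCESSING, COMPLETED, CALLBACK_COMPLETED, FAILED = "1", "2", "3", "4", "-1"


def poll_task(query: Callable[[str], dict], task_id: str,
              interval: float = 5.0, timeout: float = 600.0,
              sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call query(task_id) until the task completes, fails, or times out."""
    waited = 0.0
    while True:
        resp = query(task_id)
        status = str(resp.get("task_status"))
        if status in (COMPLETED, CALLBACK_COMPLETED):
            return resp
        if status == FAILED:
            raise RuntimeError(f"task {task_id} failed: {resp}")
        if waited >= timeout:
            raise TimeoutError(f"task {task_id} still in status {status} after {timeout}s")
        sleep(interval)
        waited += interval
```

Injecting `sleep` lets tests run the loop instantly. Given the stated ~20 seconds of processing per hour of audio, a 600-second budget leaves wide margin for a 5-hour file.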

Response Format

{
  "task_id": "1568100557463963551003",
  "task_status": "4",
  "text": "Transcribed text content...",
  "segments": [
    {
      "speaker": "spk-0",
      "begin": "0",
      "end": "470",
      "text": "听说。"
    }
  ]
}
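The `segments` array can be turned into a readable, speaker-labelled transcript. The sample response stores `begin` and `end` as strings; treating them as milliseconds is an assumption in this sketch, and `format_segments` / `parse_response` are hypothetical helpers.

```python
import json


def format_segments(resp: dict) -> list[str]:
    """Render each segment as '[begin-end] speaker: text'.

    Assumes begin/end are millisecond offsets encoded as strings.
    """
    lines = []
    for seg in resp.get("segments", []):
        begin_s = int(seg["begin"]) / 1000
        end_s = int(seg["end"]) / 1000
        lines.append(f"[{begin_s:.2f}s-{end_s:.2f}s] {seg.get('speaker', '?')}: {seg['text']}")
    return lines


def parse_response(raw: str) -> list[str]:
    """Decode a raw JSON response body and format its segments."""
    return format_segments(json.loads(raw))
```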

Task Status

  • 1: Pending
  • 2: Processing
  • 3: Completed
  • 4: Callback completed
  • -1: Failed

Language Support

autodialect (language=zh_cn)

Automatic recognition of Chinese, English, and 202 Chinese dialects including:

  • Major: Mandarin, Cantonese, Taiwanese, Sichuanese, Shanghainese, Northeastern
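  • Other examples (202 in total): Hefei, Wuhu, Northern Anhui (Wanbei), Cantonese, Beijing, Fuzhou, Southern Min (Minnan), Teochew (Chaoshan), Hakka, Guiyang, Haikou, Shijiazhuang, Taiyuan, Zhengzhou, Northeastern, Wuhan, Changsha, Nanjing, Nanchang, Dalian, Hohhot, Yinchuan, Xining, Jinan, Xi'an, Shanghainese, Sichuanese, Taiwanese, Tianjin, Ürümqi, Yunnan, Hangzhou, Chongqing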

Common Use Cases

  1. Meeting Transcription: Convert meeting recordings to text with speaker separation
  2. Interview Recording: Transcribe interviews for documentation
  3. Lecture Recording: Convert academic lectures to searchable text
  4. Voice Notes: Transform voice memos into text notes
  5. Call Center: Analyze customer service calls
  6. Legal Proceedings: Transcribe court hearings with domain optimization
  7. Medical Consultation: Doctor-patient conversation documentation

Error Handling

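| Error code | Description | Suggested fix |
|---|---|---|
| 10107 | Invalid custom audio encoding field | Check that the encoding value is valid (◎_◎) |
| 10303 | Invalid parameter value | Check the parameter values you passed (°∀°)ノ |
| 10043 | Audio decoding failed | Check that the audio matches the encoding format declared in the encoding field |
| 20304 | Silent audio, or audio format does not match the parameters | Check that the audio is 16 kHz, 16-bit, mono (。•́︿•̀。) |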
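When a task fails, mapping the numeric code to the hint in the table gives users an actionable message. How the code is extracted from a failed response is not documented here, so this sketch takes the code as input; `ERROR_HINTS` and `explain_error` are hypothetical names.

```python
ERROR_HINTS = {
    10107: "Check that the encoding value is valid.",
    10303: "Check the parameter values you passed.",
    10043: "Check that the audio matches the encoding format declared in the encoding field.",
    20304: "Check that the audio is 16 kHz, 16-bit, mono (and not silent).",
}


def explain_error(code) -> str:
    """Map an API error code (int or numeric string) to a troubleshooting hint."""
    try:
        code = int(code)
    except (TypeError, ValueError):
        return f"Unrecognized error code: {code!r}"
    return ERROR_HINTS.get(code, f"Unknown error code {code}; see the API documentation.")
```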

💡 Need help?

  • 📖 API documentation: https://console.xfyun.cn/services/ost
  • 💰 Pricing plans: https://www.xfyun.cn/services/fast_lfasr?target=price

FAQ

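Q: What is the main purpose of iFLYTEK Speed Transcription? A: It quickly converts long audio (up to 5 hours) into text. (๑•̀ㅂ•́)و✧

Q: Which languages does Speed Transcription support? A: Chinese, English, and 202 dialects, recognized without manual switching! ヽ(✿゚▽゚)ノ

Q: Which application platforms does Speed Transcription support? A: Currently, the WebAPI platform.

Q: Why is only MP3 supported? A: MP3 is widely compatible, produces small files, and transfers quickly; encoding with lame makes it easy to get started. (◕‿◕)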
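As a sketch of the lame route mentioned above, the helper below builds a command that writes mono, 16 kHz MP3 from a lame-readable input such as WAV. It uses lame's `-m m` (mono mode) and `--resample 16` (output sample rate in kHz) options; the helper names are hypothetical, and it assumes lame is installed and on PATH.

```python
import shutil
import subprocess


def lame_cmd(src: str, dst: str) -> list[str]:
    """Build a lame command producing mono, 16 kHz MP3.

    -m m           mono output
    --resample 16  output sample rate of 16 kHz
    """
    return ["lame", "-m", "m", "--resample", "16", src, dst]


def to_mp3(src: str, dst: str) -> None:
    """Run lame, failing clearly if it is missing or exits with an error."""
    if shutil.which("lame") is None:
        raise FileNotFoundError("lame is not installed or not on PATH")
    subprocess.run(lame_cmd(src, dst), check=True)
```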

Tips

  1. For speaker separation: Use --vspp-on 1 for better speaker diarization
  2. For specific domains: Use --pd parameter for improved accuracy
  3. For faster processing: Audio files ≥ 5 minutes are prioritized
  4. For subtitle output: Use --enable-subtitle 1 for subtitle-formatted output
  5. For hot words: Use --dhw="word1,word2" to boost recognition accuracy
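Because `--dhw` takes a single comma-separated string, a hot word that itself contains a comma would be split in two. This sketch (the helper name `build_dhw` is hypothetical) trims blanks, drops duplicates, and rejects embedded ASCII commas before joining.

```python
def build_dhw(words: list[str]) -> str:
    """Join hot words into the comma-separated value passed to --dhw."""
    cleaned: list[str] = []
    for w in words:
        w = w.strip()
        if not w:
            continue
        if "," in w:
            raise ValueError(f"hot word contains a comma: {w!r}")
        if w not in cleaned:  # drop duplicates, keep first-seen order
            cleaned.append(w)
    return ",".join(cleaned)
```

The result can be appended to an argument list as `f"--dhw={build_dhw(words)}"`; Python strings are Unicode, so Chinese hot words pass through unchanged.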
Security recommendations
This skill appears to be a legitimate iFlytek transcription client, but there are inconsistencies you should address before installing:

  1. The SKILL.md and scripts require three environment secrets (XFEI_APP_ID, XFEI_API_KEY, XFEI_API_SECRET), but the registry metadata lists no required env vars. Confirm you are comfortable providing those API credentials and that the metadata is corrected.
  2. Review scripts/transcribe.py yourself (or run it in an isolated environment) to confirm it only uploads the audio files you expect and does not read other files. Pay special attention to callback_url usage: avoid setting a callback to an endpoint you don't control, because transcription results could be delivered there.
  3. The .claude/settings.local.json contains author-local absolute paths and allowed Bash commands (py_compile, zip, read of a Desktop path). This is likely leftover packaging metadata, but inspect, ignore, or remove it before deployment.
  4. Only provide your iFlytek credentials to trusted code; consider creating a dedicated API key with limited scope/quota for testing.

If you want higher assurance, ask the publisher to update the registry metadata to declare the required env vars and remove any author-local permission files, or run the script in a sandboxed container and monitor network calls to the xfyun endpoints.
Functional analysis
Type: OpenClaw Skill · Name: ifly-speed-transcription · Version: 1.0.0

The skill bundle provides a legitimate implementation for transcribing audio files using the iFLYTEK Speed Transcription API. The core logic in `scripts/transcribe.py` handles file uploads (including multipart for large files) and API communication using standard HMAC-SHA256 authentication, with no evidence of data exfiltration or unauthorized execution. While `.claude/settings.local.json` contains local development paths and specific bash permissions, these appear to be unintentional artifacts from the developer's environment rather than malicious components.
Capability assessment
Purpose & Capability
Functionality (audio upload, multipart upload, create/poll transcription tasks) matches the description of an iFLYTEK speed-transcription client. The code expects iFlytek credentials (app id, api key, api secret), which are appropriate for this purpose. However, the registry metadata lists no required environment variables/credentials even though SKILL.md and scripts clearly require XFEI_APP_ID / XFEI_API_KEY / XFEI_API_SECRET — this metadata omission is an inconsistency.
Instruction Scope
SKILL.md gives concrete runtime instructions (set env vars, run python script, upload/poll workflow). The instructions themselves are scoped to transcription and do not ask for unrelated host data. One oddity: the repository contains a .claude/settings.local.json with Read and Bash permissions pointing to a user-specific Desktop path and zip commands; that file is not required for normal use and appears to be author-local packaging metadata rather than necessary runtime instructions, but it could reveal an over-broad permission intent if honored by an agent runtime.
Install Mechanism
There is no install spec (instruction-only plus a Python script), which lowers installation risk. Dependencies are standard (requests, urllib3) and are listed in _meta.json. No remote archive downloads or unusual install sources are present in the provided files.
Credentials
The script and SKILL.md require three secrets (XFEI_APP_ID, XFEI_API_KEY, XFEI_API_SECRET); these are proportionate to calling the iFlytek API. The concern is that the skill registry metadata does not declare these required env vars (it lists none). This mismatch can lead to accidental omission of required secrets or confusion about what the skill will access. The skill also supports an optional callback_url parameter: if it is set to an attacker-controlled endpoint, it could be used to exfiltrate transcription results, so users should inspect and control any callback_url usage.
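One way to keep control of callback_url is to accept only HTTPS URLs whose host is on an explicit allowlist you maintain. This is a minimal sketch using the standard library's `urllib.parse.urlsplit`; the function name and the HTTPS-only rule are illustrative choices, not behavior of the skill.

```python
from urllib.parse import urlsplit


def is_trusted_callback(url: str, allowed_hosts: set[str]) -> bool:
    """Accept only https callback URLs whose hostname is explicitly allowed.

    urlsplit separates userinfo from the host, so 'https://good@evil/' is
    judged by 'evil', not 'good'.
    """
    parts = urlsplit(url)
    return parts.scheme == "https" and (parts.hostname or "") in allowed_hosts
```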
Persistence & Privilege
The skill is not always-enabled and uses normal autonomous invocation defaults — no elevated persistence requested. The only persistence/permission artifact is .claude/settings.local.json which enumerates local Bash and Read permissions (including reading an absolute Desktop path and running zip/py_compile). That file appears to be local packaging metadata and is not a necessary runtime privilege for the transcription task, but its presence is unusual and should be reviewed; it could indicate the author tested packaging with broad, user-specific filesystem access.
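How to use
  1. Make sure OpenClaw is installed (local or Docker deployment).
  2. In the chat box, enter the install command: /install ifly-speed-transcription
  3. Once installed, invoke the skill by name or trigger it with /ifly-speed-transcription.
  4. Provide the required inputs as described in the skill's parameter documentation to get structured output.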
Version history
v1.0.0
- Initial release of the ifly-speed-transcription skill.
- Provides ultra-fast speech transcription via the iFLYTEK Speed Transcription API (audio up to 5 hours long, processed in ~20 seconds per hour of audio).
- Supports Chinese, English, and 202+ Chinese dialects with automatic detection.
- Allows domain-specific tuning, speaker separation, and subtitle output via CLI parameters.
- Includes error handling and guides for setup, usage, and common troubleshooting.
Metadata
Slug: ifly-speed-transcription
Version: 1.0.0
License: MIT-0
Total installs: 1
Current installs: 1
Versions published: 1
Frequently asked questions

What is ifly-speed-transcription?

Ultra-fast speech transcription using iFLYTEK Speed Transcription API. Transcribe audio files (WAV/PCM/MP3) up to 5 hours in ~20 seconds per hour. Supports C... It is an AI agent skill plugin for Claude Code / OpenClaw, with 237 downloads to date.

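How do I install ifly-speed-transcription?

Run the command /install ifly-speed-transcription in the OpenClaw or Claude Code chat box to install it in one step. Before transcribing, you still need to set the XFEI_APP_ID, XFEI_API_KEY, and XFEI_API_SECRET environment variables described under Setup.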

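Is ifly-speed-transcription free?

The skill itself is free: it is released under the MIT-0 license and can be freely downloaded, installed, and used. Calls to the iFlytek API are billed separately under iFlytek's pricing plans.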

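Which platforms does ifly-speed-transcription support?

ifly-speed-transcription is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.

Who develops ifly-speed-transcription?

It is developed and maintained by Iflytek AIcloud (@qingzhe2020); the current version is v1.0.0.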
