Description

Ultra-fast speech transcription using the iFLYTEK Speed Transcription API. Transcribes audio files (WAV/PCM/MP3) of up to 5 hours, taking ~20 seconds per hour of audio. Supports C...

Usage (SKILL.md)

iFly Speed Transcription

Name: ifly-speed-transcription
Author: qingzhe2020

Ultra-fast speech transcription service that converts audio files to text in record time - 1 hour of audio transcribes in ~20 seconds.

Quick Start

# Basic transcription (auto-detect language and dialect)
python3 scripts/transcribe.py /path/to/audio.mp3

# Save to file
python3 scripts/transcribe.py /path/to/audio.wav --output result.txt

# With domain-specific optimization
python3 scripts/transcribe.py /path/to/audio.mp3 --pd medical

# With speaker separation
python3 scripts/transcribe.py /path/to/meeting.mp3 --vspp-on 1 --speaker-num 2

Setup

1. API Credentials

Get credentials from iFlytek Open Platform:

APP_ID: Application ID
API_KEY: API key for authentication
API_SECRET: API secret for signing requests

2. Environment Variables

export XFEI_APP_ID="your_app_id"
export XFEI_API_KEY="your_api_key"
export XFEI_API_SECRET="your_api_secret"
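
A script that depends on these variables can fail early with a clear message when one is missing. The following is a minimal sketch of such a check; `load_credentials` is a hypothetical helper and is not taken from `scripts/transcribe.py`.

```python
import os

# The three variable names documented in the Setup section.
REQUIRED_VARS = ("XFEI_APP_ID", "XFEI_API_KEY", "XFEI_API_SECRET")


def load_credentials(env=None):
    """Return the credentials as a dict, or raise if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_credentials()` with no argument reads from the real process environment; passing a plain dict makes the check easy to test.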

API Parameters

Required Parameters

Parameter	Description
`file_path`	Path to audio file (MP3, 16kHz, 16-bit, mono)
`--language`	Language code (default: `zh_cn` for Chinese+English+202 dialects)
`--accent`	Accent (default: `mandarin`)

Optional Parameters

Parameter	Type	Description
`--pd`	string	Domain: court, finance, medical, tech, sport, edu, gov, game, ecom, car
`--vspp-on`	int	Speaker separation: 0=off, 1=on
`--speaker-num`	int	Number of speakers (0=auto, range 1-10)
`--output-type`	int	Output: 0=1best, 1=cnlbest, 2=multi-candidate
`--postproc-on`	int	Post-processing: 0=off, 1=on (default)
`--enable-subtitle`	int	Subtitle mode: 0=document, 1=subtitle
`--smoothproc`	bool	Smoothing: true=on, false=off (default: true)
`--colloqproc`	bool	Colloquial processing: true=on, false=off
`--language-type`	int	Language mode: 1=auto, 2=Chinese, 3=English, 4=Chinese-only
`--dhw`	string	Hot words (comma-separated, UTF-8)
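
When the script is driven from other Python code, the flags above can be assembled into an argument list. This is a sketch only: `build_command` and its keyword names are hypothetical, and it assumes that `--speaker-num` is meaningful only together with `--vspp-on 1`.

```python
# Domain values listed for --pd in the table above.
DOMAINS = {"court", "finance", "medical", "tech", "sport",
           "edu", "gov", "game", "ecom", "car"}


def build_command(audio_path, output=None, domain=None, speakers=None, hot_words=None):
    """Assemble an argv list for scripts/transcribe.py from the documented flags."""
    cmd = ["python3", "scripts/transcribe.py", audio_path]
    if output:
        cmd += ["--output", output]
    if domain:
        if domain not in DOMAINS:
            raise ValueError("unknown domain: %s" % domain)
        cmd += ["--pd", domain]
    if speakers is not None:
        # Assumption: speaker count is only used when separation is switched on.
        cmd += ["--vspp-on", "1", "--speaker-num", str(speakers)]
    if hot_words:
        # --dhw takes a single comma-separated string.
        cmd += ["--dhw", ",".join(hot_words)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with paths or hot words that contain spaces.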

Audio Requirements

Format: MP3
Sample rate: 16kHz
Bit depth: 16-bit
Channels: Mono (single channel)
Size: ≤ 500MB
Duration: ≤ 5 hours (recommended: ≥ 5 minutes)
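
Some of these limits can be checked locally before uploading. The sketch below covers only file extension, size, and duration; sample rate, bit depth, and channel count require an audio decoder and are not checked here. `check_audio` is a hypothetical helper, and reading "500MB" as 500 × 1024 × 1024 bytes is an assumption.

```python
MAX_BYTES = 500 * 1024 * 1024       # "≤ 500MB", read here as MiB (assumption)
MAX_SECONDS = 5 * 60 * 60           # "≤ 5 hours"
RECOMMENDED_MIN_SECONDS = 5 * 60    # "recommended: ≥ 5 minutes"


def check_audio(filename, size_bytes, duration_seconds=None):
    """Return (errors, warnings) for the limits that can be checked without decoding."""
    errors, warnings = [], []
    if not filename.lower().endswith(".mp3"):
        errors.append("expected an .mp3 file")
    if size_bytes > MAX_BYTES:
        errors.append("file is larger than 500MB")
    if duration_seconds is not None:
        if duration_seconds > MAX_SECONDS:
            errors.append("audio is longer than 5 hours")
        elif duration_seconds < RECOMMENDED_MIN_SECONDS:
            warnings.append("audio is shorter than the recommended 5 minutes")
    return errors, warnings
```

The size can come from `os.path.getsize(path)`; the duration, if known, has to be supplied by the caller.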

Workflow

1. Upload Audio File

Files < 30MB use direct upload. Files ≥ 30MB use multipart upload (5MB chunks).
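
The size rule can be sketched as a small planning function that decides the upload mode and, for multipart uploads, the byte range of each chunk. `plan_upload` is a hypothetical name, no network call is made, and treating "MB" as 1024 × 1024 bytes is an assumption.

```python
DIRECT_LIMIT = 30 * 1024 * 1024   # 30MB threshold, read as MiB (assumption)
CHUNK_SIZE = 5 * 1024 * 1024      # 5MB chunks, read as MiB (assumption)


def plan_upload(size_bytes):
    """Return (mode, parts), where parts is a list of (offset, length) byte ranges."""
    if size_bytes < DIRECT_LIMIT:
        return "direct", [(0, size_bytes)]
    parts = []
    offset = 0
    while offset < size_bytes:
        # The final chunk may be shorter than CHUNK_SIZE.
        length = min(CHUNK_SIZE, size_bytes - offset)
        parts.append((offset, length))
        offset += length
    return "multipart", parts
```

Each `(offset, length)` pair can then drive a `file.seek(offset)` followed by `file.read(length)` when sending the chunk.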

2. Create Transcription Task

Submit uploaded file URL with transcription parameters.

3. Poll for Results

Query task status periodically until completion.
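
A polling loop along these lines might look as follows. This is a sketch under stated assumptions: `fetch_status` is a hypothetical callable that performs the status query and returns the parsed response as a dict, and statuses "3", "4", and "-1" are treated as terminal, following the Task Status codes listed in this document. The interval and timeout defaults are illustrative.

```python
import time

DONE = {"3", "4"}    # Completed, Callback completed
FAILED = {"-1"}      # Failed


def wait_for_task(fetch_status, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Call fetch_status() until the task finishes, fails, or the timeout elapses."""
    waited = 0.0
    while True:
        result = fetch_status()
        status = str(result.get("task_status"))
        if status in DONE:
            return result
        if status in FAILED:
            raise RuntimeError("transcription task failed")
        if waited >= timeout:
            raise TimeoutError("task did not finish within %.0f seconds" % timeout)
        sleep(interval)
        waited += interval
```

Passing `sleep` as a parameter keeps the loop testable without real delays.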

Response Format

{
  "task_id": "1568100557463963551003",
  "task_status": "4",
  "text": "Transcribed text content...",
  "segments": [
    {
      "speaker": "spk-0",
      "begin": "0",
      "end": "470",
      "text": "听说。"
    }
  ]
}
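
A response of this shape can be turned into speaker-tagged lines with a few lines of Python. The sketch below uses the sample above as input; `format_segments` is a hypothetical helper, and because the unit of `begin` and `end` is not stated here, the values are printed unchanged.

```python
import json


def format_segments(response):
    """Return one line per segment, tagged with its speaker and begin/end values."""
    lines = []
    for seg in response.get("segments", []):
        # begin/end arrive as strings in the sample; convert them to integers.
        begin, end = int(seg["begin"]), int(seg["end"])
        lines.append("[%s %d-%d] %s" % (seg.get("speaker", "?"), begin, end, seg["text"]))
    return lines


sample = json.loads("""
{
  "task_id": "1568100557463963551003",
  "task_status": "4",
  "text": "Transcribed text content...",
  "segments": [
    {"speaker": "spk-0", "begin": "0", "end": "470", "text": "听说。"}
  ]
}
""")
print("\n".join(format_segments(sample)))  # prints: [spk-0 0-470] 听说。
```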

Task Status

1: Pending
2: Processing
3: Completed
4: Callback completed
-1: Failed

Language Support

autodialect (language=zh_cn)

Automatic recognition of Chinese, English, and 202 Chinese dialects including:

Major: Mandarin, Cantonese, Taiwanese, Sichuanese, Shanghainese, Northeastern
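Partial list: Hefei, Wuhu, Northern Anhui, Cantonese, Beijing, Fuzhou, Minnan (Hokkien), Teochew (Chaoshan), Hakka, Guiyang, Haikou, Shijiazhuang, Taiyuan, Zhengzhou, Northeastern, Wuhan, Changsha, Nanjing, Nanchang, Dalian, Hohhot, Yinchuan, Xining, Jinan, Xi'an, Shanghainese, Sichuanese, Taiwanese, Tianjin, Ürümqi, Yunnan, Hangzhou, Chongqing (202 total)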

Common Use Cases

Meeting Transcription: Convert meeting recordings to text with speaker separation
Interview Recording: Transcribe interviews for documentation
Lecture Recording: Convert academic lectures to searchable text
Voice Notes: Transform voice memos into text notes
Call Center: Analyze customer service calls
Legal Proceedings: Transcribe court hearings with domain optimization
Medical Consultation: Doctor-patient conversation documentation

Error Handling

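Error Code	Description	Suggestion
10107	Invalid custom audio encoding field	Check that the `encoding` value is valid～ (◎_◎)
10303	Invalid parameter value	Check the parameter values you passed～ (°∀°)ﾉ
10043	Audio decoding failed	Check that the audio matches the encoding format declared in the `encoding` field～
20304	Silent audio, or audio format does not match the parameters	Check that the audio is 16 kHz, 16-bit, mono～ (｡•́︿•̀｡)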
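
In client code, the four documented codes can be mapped to hints with a simple lookup. This is a sketch only: `explain_error` is a hypothetical helper, and the table covers just the codes listed above, not the full set the API may return.

```python
# Hints for the four error codes documented above.
ERROR_HINTS = {
    10107: "Check that the `encoding` value is valid.",
    10303: "Check the parameter values being passed.",
    10043: "Check that the audio matches the format declared in `encoding`.",
    20304: "Check that the audio is 16 kHz, 16-bit, mono and not silent.",
}


def explain_error(code):
    """Return a hint for a documented error code, or a generic fallback."""
    fallback = "Unknown error code %s; see the API documentation." % code
    return ERROR_HINTS.get(int(code), fallback)
```

The function accepts the code as an integer or a numeric string, since JSON responses may carry either.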

💡 Need help?

📖 API documentation: https://console.xfyun.cn/services/ost
💰 Pricing plans: https://www.xfyun.cn/services/fast_lfasr?target=price

FAQ

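Q: What is the main function of Speed Transcription? A: It quickly converts long audio recordings (up to 5 hours) into text～ (๑•̀ㅂ•́)و✧

Q: Which languages does Speed Transcription support? A: Chinese and English, plus 202 dialects, recognized without switching modes! ヽ(✿ﾟ▽ﾟ)ノ

Q: Which application platforms does Speed Transcription support? A: Currently only the WebAPI platform～

Q: Why is only MP3 supported? A: MP3 is widely compatible, compact, and fast to transfer～ Encoding with lame is all it takes to get started! (◕‿◕)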

Tips

For speaker separation: Use --vspp-on 1 for better speaker diarization
For specific domains: Use --pd parameter for improved accuracy
For faster processing: Audio files ≥ 5 minutes are prioritized
For subtitle output: Use --enable-subtitle 1 for subtitle-formatted output
For hot words: Use --dhw="word1,word2" to boost recognition accuracy

Security Recommendations

This skill appears to be a legitimate iFlytek transcription client, but there are inconsistencies you should address before installing:

1) The SKILL.md and scripts require three environment secrets (XFEI_APP_ID, XFEI_API_KEY, XFEI_API_SECRET) but the registry metadata lists no required env vars. Confirm you are comfortable providing those API credentials and that the metadata is corrected.
2) Review scripts/transcribe.py yourself (or run it in an isolated environment) to confirm it only uploads the audio files you expect and does not read other files. Pay special attention to callback_url usage: avoid setting a callback to an endpoint you don't control, because transcription results could be delivered there.
3) The .claude/settings.local.json contains author-local absolute paths and allowed Bash commands (py_compile, zip, read of a Desktop path). This is likely leftover packaging metadata, but inspect/ignore or remove it before deployment.
4) Only provide your iFlytek credentials to trusted code; consider creating a dedicated API key with limited scope/quota for testing.

If you want higher assurance, ask the publisher to update registry metadata to declare required env vars and remove any author-local permission files, or run the script in a sandboxed container and monitor network calls to the xfyun endpoints.

Functional Analysis

Type: OpenClaw Skill
Name: ifly-speed-transcription
Version: 1.0.0

The skill bundle provides a legitimate implementation for transcribing audio files using the iFLYTEK Speed Transcription API. The core logic in `scripts/transcribe.py` handles file uploads (including multipart for large files) and API communication using standard HMAC-SHA256 authentication, with no evidence of data exfiltration or unauthorized execution. While `.claude/settings.local.json` contains local development paths and specific bash permissions, these appear to be unintentional artifacts from the developer's environment rather than malicious components.

Capability Assessment

ℹ Purpose & Capability

Functionality (audio upload, multipart upload, create/poll transcription tasks) matches the description of an iFLYTEK speed-transcription client. The code expects iFlytek credentials (app id, api key, api secret), which are appropriate for this purpose. However, the registry metadata lists no required environment variables/credentials even though SKILL.md and scripts clearly require XFEI_APP_ID / XFEI_API_KEY / XFEI_API_SECRET — this metadata omission is an inconsistency.

ℹ Instruction Scope

SKILL.md gives concrete runtime instructions (set env vars, run python script, upload/poll workflow). The instructions themselves are scoped to transcription and do not ask for unrelated host data. One oddity: the repository contains a .claude/settings.local.json with Read and Bash permissions pointing to a user-specific Desktop path and zip commands; that file is not required for normal use and appears to be author-local packaging metadata rather than necessary runtime instructions, but it could reveal an over-broad permission intent if honored by an agent runtime.

✓ Install Mechanism

There is no install spec (instruction-only + a Python script). That lowers installation risk; dependencies are standard (requests, urllib3) listed in _meta.json. No remote archive downloads or unusual install sources are present in the provided files.

⚠ Credentials

The script and SKILL.md require three secrets (XFEI_APP_ID, XFEI_API_KEY, XFEI_API_SECRET) — these are proportionate to calling the iFlytek API. The concern is that the skill registry metadata does not declare these required env vars (it lists none). This mismatch can lead to accidental omission of required secrets or confusion about what the skill will access. Also the skill supports an optional callback_url parameter — if set to an attacker-controlled endpoint it could be used to exfiltrate transcription results; users should inspect and control any callback_url usage.

ℹ Persistence & Privilege

The skill is not always-enabled and uses normal autonomous invocation defaults — no elevated persistence requested. The only persistence/permission artifact is .claude/settings.local.json which enumerates local Bash and Read permissions (including reading an absolute Desktop path and running zip/py_compile). That file appears to be local packaging metadata and is not a necessary runtime privilege for the transcription task, but its presence is unusual and should be reviewed; it could indicate the author tested packaging with broad, user-specific filesystem access.

Version History

v1.0.0

- Initial release of ifly-speed-transcription skill.
- Provides ultra-fast speech transcription via iFLYTEK Speed Transcription API (up to 5 hours audio in ~20 seconds per hour).
- Supports Chinese, English, and 202+ Chinese dialects with automatic detection.
- Allows domain-specific tuning, speaker separation, and subtitle output via CLI parameters.
- Includes error handling and guides for setup, usage, and common troubleshooting.

Metadata

Slug ifly-speed-transcription

Version 1.0.0

License MIT-0

Total installs 1

Current installs 1

Number of versions 1
