harven-droid

iFlytek ASR - 讯飞语音转文字

by harven-droid · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
432 Downloads · 0 Stars · 1 Active Install · 1 Version
Install in OpenClaw
/install iflytek-asr
Description
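Converts audio/video to text using the iFlytek (科大讯飞) API. Supports transcribing local audio files and downloading YouTube videos for transcription. Suited to meeting notes, video subtitles, voice memos, and similar scenarios. Triggered when the user needs speech-to-text, audio transcription, or YouTube video transcription.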
README (SKILL.md)

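iFlytek Speech-to-Text (iFlytek ASR)

Converts audio files to text using the iFlytek speech recognition API, with support for Chinese dialect recognition.

Features

  • ✅ Supports multiple audio formats: mp3, wav, pcm, mp4, m4a, aac, ogg, flac, speex, opus, wma
  • ✅ Supports downloading YouTube videos and transcribing them
  • ✅ File size limit: ≤500MB
  • ✅ Duration limit: ≤5 hours
  • ✅ Automatic recognition of Chinese dialects
  • ✅ Automatic punctuation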
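
The format and size limits listed above can be checked locally before anything is uploaded. The following is a minimal sketch, not part of the skill: the function name `preflight_problems` is hypothetical, and it assumes "500MB" means 500 × 1024 × 1024 bytes. The duration limit is not checked here because that needs a media probe.

```python
from pathlib import Path

# Extensions copied from the feature list above.
SUPPORTED_EXTENSIONS = {
    "mp3", "wav", "pcm", "mp4", "m4a", "aac",
    "ogg", "flac", "speex", "opus", "wma",
}
MAX_FILE_BYTES = 500 * 1024 * 1024  # assumption: "500MB" read as 500 MiB


def preflight_problems(path, size_bytes=None):
    """Return a list of problems; an empty list means the file looks acceptable.

    Hypothetical helper. Pass size_bytes to skip the disk lookup.
    """
    p = Path(path)
    problems = []
    ext = p.suffix.lower().lstrip(".")
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes is None:
        if not p.is_file():
            problems.append("file not found")
            return problems
        size_bytes = p.stat().st_size
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems
```

Calling `preflight_problems("meeting.mp3")` before running the transcription script avoids spending an API call on a file that would be rejected anyway.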

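Prerequisites

1. Obtain iFlytek API credentials

  1. Visit the iFlytek Open Platform (科大讯飞开放平台)
  2. Register or log in
  3. Create an application and select the "语音转写" (speech transcription) service
  4. Obtain the credentials:
    • XFYUN_APP_ID
    • XFYUN_ACCESS_KEY_ID
    • XFYUN_ACCESS_KEY_SECRET

2. Configure environment variables

Create a .env file in the skill directory:

XFYUN_APP_ID=your_app_id
XFYUN_ACCESS_KEY_ID=your_access_key_id
XFYUN_ACCESS_KEY_SECRET=your_access_key_secret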
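
To confirm that the three variables are actually visible to Python before running a script, a check along these lines may help. This is a sketch under assumptions: `missing_credentials` and `load_env_file` are hypothetical names, not functions from the skill, and the `.env` loading only happens if python-dotenv is installed.

```python
import os

# Variable names taken from the .env template above.
REQUIRED_VARS = ("XFYUN_APP_ID", "XFYUN_ACCESS_KEY_ID", "XFYUN_ACCESS_KEY_SECRET")


def load_env_file(path=".env"):
    """Load the .env file into os.environ if python-dotenv is available.

    Returns False when python-dotenv is not installed or nothing was loaded.
    """
    try:
        from dotenv import load_dotenv
    except ImportError:
        return False
    return load_dotenv(path)


def missing_credentials(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    load_env_file()
    missing = missing_credentials()
    print("missing: " + ", ".join(missing) if missing else "all credentials set")
```

The check only looks for presence; it cannot tell whether the values are valid on the iFlytek side.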

3. Install dependencies

pip3 install yt-dlp requests python-dotenv

Usage

Transcribe local audio

python3 scripts/speech_to_text.py <audio-file-path> [output-text-path]

Examples:

python3 scripts/speech_to_text.py meeting.mp3
python3 scripts/speech_to_text.py recording.wav output.txt
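
For a folder of recordings, the command above can be driven from Python. The sketch below is an assumption-laden wrapper, not something shipped with the skill: `build_command` and `transcribe_folder` are hypothetical names, the script path is the one shown in the usage line, and whether the script returns a non-zero exit code on failure has not been verified.

```python
import subprocess
import sys
from pathlib import Path

SCRIPT = "scripts/speech_to_text.py"  # path as shown in the usage line above


def build_command(audio_path, output_path=None):
    """Build the argument list for one transcription run."""
    cmd = [sys.executable, SCRIPT, str(audio_path)]
    if output_path is not None:
        cmd.append(str(output_path))
    return cmd


def transcribe_folder(folder, pattern="*.mp3"):
    """Run the script once per matching file, writing <name>.txt beside it.

    Returns a dict of file name -> process exit code.
    """
    results = {}
    for audio in sorted(Path(folder).glob(pattern)):
        cmd = build_command(audio, audio.with_suffix(".txt"))
        results[audio.name] = subprocess.run(cmd).returncode
    return results
```

Run it from the skill directory so the relative script path resolves, for example `transcribe_folder("recordings", "*.wav")`. Each file is one API call, which counts against the daily quota.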

Transcribe a YouTube video

python3 scripts/download_and_transcribe.py "YOUTUBE_URL" [save-directory]

Example:

python3 scripts/download_and_transcribe.py "https://www.youtube.com/watch?v=VIDEO_ID" ~/Downloads

Download YouTube audio only

python3 scripts/download_audio.py "YOUTUBE_URL" [save-directory]

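Comparison: iFlytek vs Whisper

| Feature | iFlytek ASR | Whisper |
|---|---|---|
| Cost | API quota (free tier available) | Free |
| Offline | ❌ Requires network | ✅ Runs locally |
| Speed | ⭐⭐⭐⭐⭐ Fast | ⭐⭐⭐ Slower |
| Chinese accuracy | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Punctuation | ✅ Added automatically | ❌ None |
| Dialect support | ✅ Supported | ⭐⭐ Fair |

Recommendations:

  • Important meetings/interviews → iFlytek (higher accuracy, punctuated output)
  • Everyday voice messages → Whisper (free, no limits)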

API Limits

iFlytek free tier:

  • Daily calls: 500
  • File size per request: ≤500MB
  • Duration per request: ≤5 hours
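
Since the free tier caps daily calls, a small local counter can warn before the quota runs out. This is a sketch only: `consume` and `record_call` are hypothetical helpers, the state file location is up to the caller, and it assumes one transcription equals one counted call, which the source does not confirm.

```python
import json
from datetime import date
from pathlib import Path

DAILY_LIMIT = 500  # free-tier figure quoted above


def consume(state, today, limit=DAILY_LIMIT):
    """Count one call for `today`; return (new_state, calls_remaining).

    `state` maps an ISO date string to the calls used that day.
    Raises RuntimeError when the day's limit is already spent.
    """
    used = state.get(today, 0)
    if used >= limit:
        raise RuntimeError(f"daily limit of {limit} calls reached for {today}")
    # Only today's count is kept; earlier days are dropped.
    return {today: used + 1}, limit - used - 1


def record_call(state_file, limit=DAILY_LIMIT):
    """File-backed wrapper: load the counter, count one call, save it back."""
    path = Path(state_file)
    state = json.loads(path.read_text()) if path.exists() else {}
    new_state, remaining = consume(state, date.today().isoformat(), limit)
    path.write_text(json.dumps(new_state))
    return remaining
```

The counter is purely local bookkeeping; the authoritative usage figure is whatever the iFlytek console reports.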

File Structure

iflytek-asr/
├── SKILL.md           # This document
├── README.md          # Detailed documentation
├── QUICKSTART.md      # Quick start
├── .env.example       # Configuration template
├── requirements.txt   # Python dependencies
└── scripts/
    ├── speech_to_text.py           # Audio to text
    ├── download_audio.py           # YouTube download
    └── download_and_transcribe.py  # Download + transcribe

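FAQ

Q: What should I do if transcription fails?

  • Check that the API credentials are correct
  • Confirm the file format is supported
  • Check the network connection

Q: How can I improve accuracy?

  • Make sure the audio is clear
  • Select the correct language/dialect
  • Avoid background noise

License

MIT License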

Usage Guidance
What to check before installing:

  1. Metadata mismatch: The registry entry says no credentials or required binaries, but the code and SKILL.md require three XFYUN credentials and the yt-dlp/ffprobe binaries. Treat the metadata omission as a red flag; confirm the author/source before proceeding.
  2. Audio upload / privacy: The scripts upload entire audio files to an external API (BASE_URL=https://office-api-ist-dx.iflyaisol.com). This is expected for a cloud ASR, but you should only transcribe audio you are comfortable uploading. Verify the endpoint is an official iFlytek domain and check the vendor's privacy/retention policy if you have sensitive audio.
  3. TLS bypass in downloads: The yt-dlp commands include --no-check-certificates. This disables certificate verification for downloads and can expose you to MITM attacks when downloading audio. Consider removing that flag or running yt-dlp without it.
  4. Required local binaries: Ensure yt-dlp and ffprobe (from ffmpeg) are installed and come from trusted sources. The code falls back to a filesize-based duration estimate if ffprobe is missing, but accurate duration detection relies on ffprobe.
  5. Secrets handling: The install scripts copy .env.example to .env; do not commit or share .env. Inspect the code to ensure no hardcoded keys are present (the package appears to follow that guideline).
  6. Source trust: The skill owner and homepage are unknown. If you do not trust the source, run the scripts in an isolated VM/container, audit network endpoints, or prefer an offline transcription alternative (e.g., local Whisper) for sensitive content.

If you decide to proceed: inspect/validate BASE_URL (DNS/ownership), remove/avoid --no-check-certificates, supply credentials only in a local .env, and run first with non-sensitive test audio.
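
Item 4 above describes duration detection via ffprobe with a filesize-based fallback. The skill's own implementation is not shown on this page, so the following is an independent sketch of that idea: the function names are hypothetical, and the 128 kbps fallback bitrate is an assumption chosen for illustration, not a value taken from the skill.

```python
import json
import shutil
import subprocess
from pathlib import Path

ASSUMED_KBPS = 128  # hypothetical bitrate for the fallback estimate
MAX_SECONDS = 5 * 3600  # the 5-hour limit stated in the README


def estimate_seconds(size_bytes, kbps=ASSUMED_KBPS):
    """Rough duration from file size, assuming a constant bitrate."""
    return size_bytes * 8 / (kbps * 1000)


def duration_seconds(path):
    """Ask ffprobe for the duration; fall back to the size estimate if it is missing."""
    if shutil.which("ffprobe"):
        out = subprocess.run(
            ["ffprobe", "-v", "error", "-show_entries", "format=duration",
             "-of", "json", str(path)],
            capture_output=True, text=True, check=True,
        ).stdout
        return float(json.loads(out)["format"]["duration"])
    return estimate_seconds(Path(path).stat().st_size)
```

The fallback can be badly off for lossless or variable-bitrate files, which is why installing ffprobe is the safer route when a file is close to `MAX_SECONDS`.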
Capability Analysis
Type: OpenClaw Skill · Name: iflytek-asr · Version: 1.0.0

The skill bundle provides legitimate functionality for transcribing audio and downloading YouTube videos using the iFlytek (iFLYTEK) API and yt-dlp. The implementation in scripts like speech_to_text.py and download_audio.py aligns with the stated purpose, using standard libraries (requests, subprocess) for API interaction and media processing. The package includes security-conscious features, such as a warning in package.sh to prevent the accidental inclusion of .env files containing API credentials, and lacks any indicators of malicious intent, data exfiltration, or prompt injection.
Capability Assessment
Purpose & Capability
The name and description describe a cloud ASR skill, and the code implements that. However, the registry metadata claims no required environment variables or primary credential, while the SKILL.md and scripts clearly require XFYUN_APP_ID, XFYUN_ACCESS_KEY_ID, and XFYUN_ACCESS_KEY_SECRET. The metadata also lists no required binaries, but the scripts call external binaries (yt-dlp and ffprobe). This mismatch between the declared metadata and the actual requirements should be corrected by the author.
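
Because the registry metadata does not declare the external binaries, it can be worth checking for them by hand. A minimal sketch using the standard library follows; `missing_binaries` is a hypothetical helper, and the binary names are the two mentioned above.

```python
import shutil

# Binaries the assessment above says the scripts call.
REQUIRED_BINARIES = ("yt-dlp", "ffprobe")


def missing_binaries(names=REQUIRED_BINARIES, which=shutil.which):
    """Return the binaries that cannot be found on PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    print("missing: " + ", ".join(missing) if missing else "all binaries found")
```

Finding a binary on PATH says nothing about where it came from, so this complements rather than replaces installing from a trusted source.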
Instruction Scope
Runtime instructions and scripts are focused on the stated purpose: downloading audio (yt-dlp) and uploading audio to an iFlytek API for transcription. That said, the scripts will upload entire audio files (potentially sensitive) to an external endpoint (BASE_URL = https://office-api-ist-dx.iflyaisol.com) — which is expected for cloud ASR but important to surface to users. Also the download scripts pass --no-check-certificates to yt-dlp, which weakens TLS validation for downloads and is a security concern.
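
Both concerns in this paragraph can be checked with a few lines of Python before running the skill. The sketch below is illustrative: `lines_with_flag` and `host_is_trusted` are hypothetical helpers, and `trusted_suffixes` must be supplied by the reader after verifying domain ownership; nothing here asserts which domains are official iFlytek domains.

```python
from urllib.parse import urlparse


def lines_with_flag(source_text, flag="--no-check-certificates"):
    """Return 1-based line numbers in a script's text where the flag appears."""
    return [
        number
        for number, line in enumerate(source_text.splitlines(), 1)
        if flag in line
    ]


def host_is_trusted(url, trusted_suffixes):
    """True if the URL is https and its host equals, or is a subdomain of,
    one of the domains the caller has independently verified."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    return parsed.scheme == "https" and any(
        host == suffix or host.endswith("." + suffix)
        for suffix in trusted_suffixes
    )
```

For example, reading `scripts/download_audio.py` and passing its text to `lines_with_flag` shows where the TLS bypass would need to be removed. The subdomain check compares on a dot boundary, so a lookalike such as `eviliflyaisol.com` does not match `iflyaisol.com`.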
Install Mechanism
There is no formal registry install spec, but the repo includes install.sh that runs pip3 install -r requirements.txt. The dependencies (yt-dlp, requests, python-dotenv) are reasonable for the task. No high-risk remote binary downloads or obfuscated installers are present. The presence of packaging and install scripts in the bundle is expected, but the registry metadata not reflecting required env vars/binaries is inconsistent.
Credentials
The code needs three sensitive environment values (XFYUN_APP_ID, XFYUN_ACCESS_KEY_ID, XFYUN_ACCESS_KEY_SECRET) to operate, yet the registry metadata declares none and primary credential is unset. Requesting these secrets is proportionate to a cloud ASR service, but the omission from metadata is a transparency problem. No other unrelated secrets are requested.
Persistence & Privilege
The skill does not request always:true, does not modify other skills, and its install script only installs Python packages and creates a .env from a template. It does write downloaded audio and transcript files to disk (as expected) but does not request elevated system privileges.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install iflytek-asr
  3. After installation, invoke the skill by name or use /iflytek-asr
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release of iflytek-asr:
  • Supports converting audio/video to text using iFlytek ASR API with Chinese dialect recognition.
  • Accepts multiple audio formats (mp3, wav, pcm, mp4, m4a, aac, ogg, flac, speex, opus, wma), files up to 500MB or 5 hours.
  • Includes scripts for local audio transcription and YouTube download + transcription.
  • Automatic punctuation and dialect detection.
  • Setup instructions for environment variables and dependencies.
  • Comparison section with Whisper.
  • MIT license.
Metadata
Slug iflytek-asr
Version 1.0.0
License MIT-0
All-time Installs 2
Active Installs 1
Total Versions 1
Frequently Asked Questions

What is iFlytek ASR - 讯飞语音转文字?

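It converts audio/video to text using the iFlytek (科大讯飞) API. It supports transcribing local audio files and downloading YouTube videos for transcription, and suits meeting notes, video subtitles, voice memos, and similar scenarios. It is triggered when the user needs speech-to-text, audio transcription, or YouTube video transcription. It is an AI Agent Skill for Claude Code / OpenClaw, with 432 downloads so far.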

How do I install iFlytek ASR - 讯飞语音转文字?

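Run "/install iflytek-asr" in the OpenClaw or Claude Code chat to install it. Before first use, the skill still needs the three XFYUN credentials in a .env file and its Python dependencies, as described in the README.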

Is iFlytek ASR - 讯飞语音转文字 free?

Yes, the skill itself is free and licensed under MIT-0; you can download, install, and use it at no cost. Transcription calls go to the iFlytek API, which is subject to its own quota (the README notes a free tier).

Which platforms does iFlytek ASR - 讯飞语音转文字 support?

iFlytek ASR - 讯飞语音转文字 is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created iFlytek ASR - 讯飞语音转文字?

It is built and maintained by harven-droid (@harven-droid); the current version is v1.0.0.
