iFlytek ASR - iFlytek Speech-to-Text

Author: harven-droid · GitHub · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
432 total downloads · 0 favorites · 1 current install · 1 version
Install in OpenClaw
/install iflytek-asr
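Description
Converts audio/video to text using the iFlytek API. Supports transcribing local audio files and downloading YouTube videos for transcription. Suited to meeting notes, video subtitles, voice memos, and similar scenarios. Triggered when the user needs speech-to-text, audio transcription, or YouTube video transcription.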
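Usage Instructions (SKILL.md)

iFlytek Speech-to-Text (iFlytek ASR)

Converts audio files to text using the iFlytek speech recognition API, with support for Chinese dialect recognition.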

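Features

  • ✅ Supports multiple audio formats: mp3, wav, pcm, mp4, m4a, aac, ogg, flac, speex, opus, wma
  • ✅ Downloads YouTube videos and converts them to text
  • ✅ File size limit: ≤500MB
  • ✅ Duration limit: ≤5 hours
  • ✅ Automatic Chinese dialect recognition
  • ✅ Automatic punctuation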
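
The format and size limits above can be checked locally before anything is uploaded. The following is a minimal sketch, not part of the skill: the function name preflight_problems is hypothetical, and it assumes "500MB" means 500 × 1024 × 1024 bytes.

```python
from pathlib import Path

# Formats and size limit as listed in the feature list above.
SUPPORTED_EXTENSIONS = {
    "mp3", "wav", "pcm", "mp4", "m4a", "aac",
    "ogg", "flac", "speex", "opus", "wma",
}
MAX_FILE_BYTES = 500 * 1024 * 1024  # assumption: "500MB" is read as 500 MiB


def preflight_problems(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems


print(preflight_problems("meeting.mp3", 12_000_000))  # []
print(preflight_problems("notes.txt", 1_000))         # ['unsupported format: txt']
```

For a real file, the size could be taken from Path(path).stat().st_size. The duration limit is not covered here because it needs a media probe.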

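Prerequisites

1. Obtain iFlytek API credentials

  1. Visit the iFlytek Open Platform (科大讯飞开放平台)
  2. Register or log in
  3. Create an application and select the "Speech Transcription" (语音转写) service
  4. Obtain the credentials:
    • XFYUN_APP_ID
    • XFYUN_ACCESS_KEY_ID
    • XFYUN_ACCESS_KEY_SECRET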

2. Configure environment variables

Create a .env file in the skill directory:

XFYUN_APP_ID=your_app_id
XFYUN_ACCESS_KEY_ID=your_access_key_id
XFYUN_ACCESS_KEY_SECRET=your_access_key_secret
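
Before running the scripts, it may help to confirm that all three variables are actually set. The sketch below is an illustration only: missing_credentials is a hypothetical helper, and how the skill's own scripts load and validate these values is not shown in this document.

```python
import os
from typing import Mapping

# The three variable names listed in the prerequisites.
REQUIRED_VARS = ("XFYUN_APP_ID", "XFYUN_ACCESS_KEY_ID", "XFYUN_ACCESS_KEY_SECRET")


def missing_credentials(env: Mapping[str, str]) -> list[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


# If python-dotenv is installed, calling dotenv.load_dotenv() first would copy
# the values from .env into os.environ before this check runs.
missing = missing_credentials(os.environ)
if missing:
    print("Missing credentials:", ", ".join(missing))
else:
    print("All iFlytek credentials are set.")
```

Passing the environment in as a mapping keeps the check easy to test with a plain dict.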

3. Install dependencies

pip3 install yt-dlp requests python-dotenv

Usage

Transcribe local audio

python3 scripts/speech_to_text.py <audio_file_path> [output_text_path]

Examples:

python3 scripts/speech_to_text.py meeting.mp3
python3 scripts/speech_to_text.py recording.wav output.txt
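
The same command line can be driven from Python, for example to transcribe several files in a loop. This is a rough sketch: build_transcribe_command and transcribe are hypothetical wrappers, it assumes the working directory is the skill directory, and it assumes the script signals failure through a non-zero exit code, which the document does not confirm.

```python
import subprocess
import sys
from typing import Optional

SCRIPT = "scripts/speech_to_text.py"  # script path as documented above


def build_transcribe_command(audio_path: str, output_path: Optional[str] = None) -> list[str]:
    """Build the argument list for the documented CLI: <audio_file_path> [output_text_path]."""
    cmd = [sys.executable, SCRIPT, audio_path]
    if output_path:
        cmd.append(output_path)
    return cmd


def transcribe(audio_path: str, output_path: Optional[str] = None) -> int:
    """Run the script and return its exit code (assumed non-zero on failure)."""
    return subprocess.run(build_transcribe_command(audio_path, output_path)).returncode


# Show the arguments that would be passed to the interpreter.
print(build_transcribe_command("recording.wav", "output.txt")[1:])
# ['scripts/speech_to_text.py', 'recording.wav', 'output.txt']
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.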

YouTube video to text

python3 scripts/download_and_transcribe.py "YOUTUBE_URL" [save_dir]

Example:

python3 scripts/download_and_transcribe.py "https://www.youtube.com/watch?v=VIDEO_ID" ~/Downloads

Download YouTube audio only

python3 scripts/download_audio.py "YOUTUBE_URL" [save_dir]

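Comparison: iFlytek vs Whisper

| Feature | iFlytek ASR | Whisper |
|---|---|---|
| Cost | API quota (free tier available) | Free |
| Offline | ❌ Requires network | ✅ Runs locally |
| Speed | ⭐⭐⭐⭐⭐ Fast | ⭐⭐⭐ Slower |
| Chinese accuracy | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Punctuation | ✅ Added automatically | ❌ None |
| Dialect support | ✅ Supported | ⭐⭐ Fair |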

Recommendations:

  • Important meetings/interviews → iFlytek (high accuracy, includes punctuation)
  • Everyday voice messages → Whisper (free, no limits)

API Limits

iFlytek free tier:

  • Daily calls: 500
  • File size per request: ≤500MB
  • Duration per request: ≤5 hours
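
The 5-hour limit can be checked before upload by asking ffprobe for the file's duration. The sketch below assumes ffprobe (part of ffmpeg) is on the PATH; the helper names are hypothetical, and the skill's own duration logic may differ.

```python
import shutil
import subprocess
from typing import Optional

MAX_DURATION_SECONDS = 5 * 60 * 60  # the 5-hour limit listed above


def parse_duration(ffprobe_output: str) -> Optional[float]:
    """Parse ffprobe's plain-text duration output; None if it is not a number."""
    try:
        return float(ffprobe_output.strip())
    except ValueError:
        return None


def probe_duration(path: str) -> Optional[float]:
    """Return the duration in seconds, or None if ffprobe is missing or fails."""
    if shutil.which("ffprobe") is None:
        return None
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True,
    )
    return parse_duration(result.stdout) if result.returncode == 0 else None


def within_duration_limit(seconds: float) -> bool:
    """True if the duration fits within the 5-hour limit."""
    return seconds <= MAX_DURATION_SECONDS
```

A None result means the duration is unknown, so the caller has to decide whether to proceed or stop rather than treating it as zero.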

File Structure

iflytek-asr/
├── SKILL.md           # This document
├── README.md          # Detailed documentation
├── QUICKSTART.md      # Quick start
├── .env.example       # Configuration template
├── requirements.txt   # Python dependencies
└── scripts/
    ├── speech_to_text.py           # Audio to text
    ├── download_audio.py           # YouTube download
    └── download_and_transcribe.py  # Download + transcribe

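FAQ

Q: What should I do if transcription fails?

  • Check that the API credentials are correct
  • Confirm that the file format is supported
  • Check the network connection

Q: How can I improve accuracy?

  • Make sure the audio is clear
  • Select the correct language/dialect
  • Avoid background noise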

License

MIT License

Security Recommendations
What to check before installing:

  1. Metadata mismatch: The registry entry says no credentials or required binaries, but the code and SKILL.md require three XFYUN credentials and the yt-dlp/ffprobe binaries. Treat the metadata omission as a red flag; confirm the author/source before proceeding.
  2. Audio upload / privacy: The scripts upload entire audio files to an external API (BASE_URL=https://office-api-ist-dx.iflyaisol.com). This is expected for a cloud ASR, but you should only transcribe audio you are comfortable uploading. Verify the endpoint is an official iFlytek domain and check the vendor's privacy/retention policy if you have sensitive audio.
  3. TLS bypass in downloads: The yt-dlp commands include --no-check-certificates. This disables certificate verification for downloads and can expose you to MITM attacks when downloading audio. Consider removing that flag or running yt-dlp without it.
  4. Required local binaries: Ensure yt-dlp and ffprobe (from ffmpeg) are installed and come from trusted sources. The code falls back to a filesize-based duration estimate if ffprobe is missing, but accurate duration detection relies on ffprobe.
  5. Secrets handling: The install scripts copy .env.example to .env; do not commit or share .env. Inspect the code to ensure no hardcoded keys are present (the package appears to follow that guideline).
  6. Source trust: The skill owner and homepage are unknown. If you do not trust the source, run the scripts in an isolated VM/container, audit network endpoints, or prefer an offline transcription alternative (e.g., local Whisper) for sensitive content.

If you decide to proceed: inspect/validate BASE_URL (DNS/ownership), remove or avoid --no-check-certificates, supply credentials only in a local .env, and run first with non-sensitive test audio.
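
The advice about --no-check-certificates can be sketched as follows. The exact yt-dlp arguments used by the skill's download scripts are not shown in this document, so the command built here is an assumption that uses only standard yt-dlp options (--extract-audio, --audio-format, --output); both helper names are hypothetical.

```python
INSECURE_FLAG = "--no-check-certificates"  # the flag named in the review above


def strip_insecure_flags(args: list[str]) -> list[str]:
    """Return a copy of the argument list with the TLS-bypass flag removed."""
    return [arg for arg in args if arg != INSECURE_FLAG]


def build_audio_download_command(url: str, save_dir: str = ".") -> list[str]:
    """Build a yt-dlp audio download command that keeps certificate checks on."""
    return [
        "yt-dlp",
        "--extract-audio",
        "--audio-format", "mp3",  # assumption: mp3 as the target format
        "--output", f"{save_dir}/%(title)s.%(ext)s",
        url,
    ]


original = ["yt-dlp", "--no-check-certificates", "--extract-audio", "URL"]
print(strip_insecure_flags(original))
# ['yt-dlp', '--extract-audio', 'URL']
```

The resulting list can be passed to subprocess.run once yt-dlp is installed from a trusted source.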
Functional Analysis
Type: OpenClaw Skill
Name: iflytek-asr
Version: 1.0.0

The skill bundle provides legitimate functionality for transcribing audio and downloading YouTube videos using the iFlytek (iFLYTEK) API and yt-dlp. The implementation in scripts like speech_to_text.py and download_audio.py aligns with the stated purpose, using standard libraries (requests, subprocess) for API interaction and media processing. The package includes security-conscious features, such as a warning in package.sh to prevent the accidental inclusion of .env files containing API credentials, and lacks any indicators of malicious intent, data exfiltration, or prompt injection.
Capability Assessment
Purpose & Capability
Name and description describe a cloud ASR skill and the code implements that. However the registry metadata claims no required environment variables or primary credential, while the SKILL.md and scripts clearly require XFYUN_APP_ID, XFYUN_ACCESS_KEY_ID, and XFYUN_ACCESS_KEY_SECRET. The metadata also lists no required binaries, but the scripts call external binaries (yt-dlp and ffprobe). This mismatch between claimed metadata and actual requirements is incoherent and should be corrected/clarified.
Instruction Scope
Runtime instructions and scripts are focused on the stated purpose: downloading audio (yt-dlp) and uploading audio to an iFlytek API for transcription. That said, the scripts will upload entire audio files (potentially sensitive) to an external endpoint (BASE_URL = https://office-api-ist-dx.iflyaisol.com) — which is expected for cloud ASR but important to surface to users. Also the download scripts pass --no-check-certificates to yt-dlp, which weakens TLS validation for downloads and is a security concern.
Install Mechanism
There is no formal registry install spec, but the repo includes install.sh that runs pip3 install -r requirements.txt. The dependencies (yt-dlp, requests, python-dotenv) are reasonable for the task. No high-risk remote binary downloads or obfuscated installers are present. The presence of packaging and install scripts in the bundle is expected, but the registry metadata not reflecting required env vars/binaries is inconsistent.
Credentials
The code needs three sensitive environment values (XFYUN_APP_ID, XFYUN_ACCESS_KEY_ID, XFYUN_ACCESS_KEY_SECRET) to operate, yet the registry metadata declares none and primary credential is unset. Requesting these secrets is proportionate to a cloud ASR service, but the omission from metadata is a transparency problem. No other unrelated secrets are requested.
Persistence & Privilege
The skill does not request always:true, does not modify other skills, and its install script only installs Python packages and creates a .env from a template. It does write downloaded audio and transcript files to disk (as expected) but does not request elevated system privileges.
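How to Use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install iflytek-asr
  3. Once installed, invoke the skill by name or trigger it with /iflytek-asr
  4. Provide the required inputs according to the skill's parameter description to get structured output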
Version History
v1.0.0
Initial release of iflytek-asr:

  • Supports converting audio/video to text using iFlytek ASR API with Chinese dialect recognition.
  • Accepts multiple audio formats (mp3, wav, pcm, mp4, m4a, aac, ogg, flac, speex, opus, wma), files up to 500MB or 5 hours.
  • Includes scripts for local audio transcription and YouTube download + transcription.
  • Automatic punctuation and dialect detection.
  • Setup instructions for environment variables and dependencies.
  • Comparison section with Whisper.
  • MIT license.
Metadata
Slug: iflytek-asr
Version: 1.0.0
License: MIT-0
Total installs: 2
Current installs: 1
Historical versions: 1
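Frequently Asked Questions

What is iFlytek ASR - iFlytek Speech-to-Text?

It converts audio/video to text using the iFlytek API. It supports transcribing local audio files and downloading YouTube videos for transcription, and suits meeting notes, video subtitles, voice memos, and similar scenarios. It is triggered when the user needs speech-to-text, audio transcription, or YouTube video transcription. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 432 total downloads so far.

How do I install iFlytek ASR - iFlytek Speech-to-Text?

Run the command "/install iflytek-asr" in the OpenClaw or Claude Code chat box to install it in one step. The install command itself needs no extra configuration, but the skill still requires the iFlytek credentials and Python dependencies listed in its prerequisites before it can transcribe anything.

Is iFlytek ASR - iFlytek Speech-to-Text free?

Yes. The skill itself is free under the MIT-0 license and can be freely downloaded, installed, and used. Calls to the iFlytek API are subject to the API quota (a free tier is available).

Which platforms does iFlytek ASR - iFlytek Speech-to-Text support?

It is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed iFlytek ASR - iFlytek Speech-to-Text?

It is developed and maintained by harven-droid (@harven-droid); the current version is v1.0.0.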
