Description

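A local speech-to-text tool built on faster-whisper. It supports high-performance, GPU-accelerated transcription, including word-level timestamps and distilled models. Use this skill when the user asks to "transcribe audio" (转录音频), asks for "speech to text" (语音转文字), or mentions "whisper".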

README (SKILL.md)

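Faster-Whisper Chinese Edition

Name: faster-whisper 中文版 - 高性能本地语音转文字工具 (faster-whisper Chinese edition: a high-performance local speech-to-text tool)
Author: mapleshadow

A high-performance local speech-to-text tool based on faster-whisper.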

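Installation

1. Run the setup script

Run the setup script to create a virtual environment and install the dependencies. The script automatically detects an NVIDIA GPU and enables CUDA acceleration when one is found.

./setup.sh

System requirements:

Python 3.10 or later
ffmpeg (installed on the system)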
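
The two requirements above can be checked before running the setup script. The following is a minimal sketch, not part of the skill itself: the helper name `missing_requirements` is hypothetical, and it only tests that an `ffmpeg` executable is on `PATH`, not that it works.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the system requirements


def missing_requirements(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {version_info[0]}.{version_info[1]} found, 3.10+ required"
        )
    # shutil.which returns None when the executable is not on PATH
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems


for problem in missing_requirements():
    print("Missing requirement:", problem)
```

The `version_info` and `which` parameters exist only so the check can be exercised without changing the real environment.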

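Usage

Use the transcription script to convert audio files to text.

Use cases

Turning meeting recordings into written minutes
Turning voice notes into written records
Extracting the content of audio files
Organizing interview recordings
Turning training recordings into written material
Generating video subtitles
Transcribing podcasts
Speech to text
Audio to text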

Basic transcription

export HF_HOME=/config/huggingface
export HF_ENDPOINT=https://hf-mirror.com
.venv/bin/python3 scripts/transcribe.py audio.mp3

Advanced options

Choose a model: .venv/bin/python3 scripts/transcribe.py audio.mp3 --model large-v3-turbo
Word-level timestamps: .venv/bin/python3 scripts/transcribe.py audio.mp3 --word-timestamps
JSON output: .venv/bin/python3 scripts/transcribe.py audio.mp3 --json
Voice activity detection (silence removal): .venv/bin/python3 scripts/transcribe.py audio.mp3 --vad
Set the language: .venv/bin/python3 scripts/transcribe.py audio.mp3 --language zh
GPU acceleration: .venv/bin/python3 scripts/transcribe.py audio.mp3 --device cuda
CPU optimization: .venv/bin/python3 scripts/transcribe.py audio.mp3 --device cpu --compute-type int8
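
When the script is driven from another program, the options above can be assembled into an argument list instead of a hand-written string. This is a sketch under assumptions: `build_command` is a hypothetical helper, and it uses only the flags listed in this section.

```python
from typing import List, Optional

PYTHON = ".venv/bin/python3"      # interpreter path used throughout this README
SCRIPT = "scripts/transcribe.py"  # transcription script from this README


def build_command(
    audio: str,
    model: Optional[str] = None,
    language: Optional[str] = None,
    device: Optional[str] = None,
    compute_type: Optional[str] = None,
    word_timestamps: bool = False,
    json_output: bool = False,
    vad: bool = False,
) -> List[str]:
    """Assemble an argument list using only the flags listed above."""
    cmd = [PYTHON, SCRIPT, audio]
    # Options that take a value are added only when a value was given
    for flag, value in (
        ("--model", model),
        ("--language", language),
        ("--device", device),
        ("--compute-type", compute_type),
    ):
        if value is not None:
            cmd += [flag, value]
    # Boolean switches are added only when enabled
    for flag, enabled in (
        ("--word-timestamps", word_timestamps),
        ("--json", json_output),
        ("--vad", vad),
    ):
        if enabled:
            cmd.append(flag)
    return cmd


print(build_command("audio.mp3", language="zh", device="cuda", compute_type="float16"))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with unusual file names.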

Full command examples

# Chinese transcription with GPU acceleration
.venv/bin/python3 scripts/transcribe.py 会议录音.mp3 --language zh --device cuda --compute-type float16

# English transcription with word-level timestamps
.venv/bin/python3 scripts/transcribe.py interview.wav --language en --word-timestamps --json

# Fast CPU transcription, tuned for speed
.venv/bin/python3 scripts/transcribe.py audio.m4a --device cpu --compute-type int8 --model distil-large-v3

# Batch processing script (a shell script, so run it with bash rather than python3)
bash scripts/batch_transcribe.sh /path/to/audio/files/

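Available models

large-v3-turbo (default): recommended for multilingual or high-accuracy tasks
large-v3: the original large model, with the highest accuracy
distil-large-v3: the best balance of speed and accuracy
medium: medium-sized, balanced performance
small: small model, fast
base: basic model, low resource requirements
tiny: smallest model, fastest
medium.en, small.en: faster English-only variants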

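Model selection guide

Model	Size	Recommended use	Hardware
`large-v3-turbo`	1.5GB	Professional-grade transcription	High-performance GPU
`medium`	1.5GB	Balanced performance	Ordinary configuration
`distil-large-v3`	756MB	General-purpose Chinese transcription	Mid-range configuration
`small`	500MB	Fast transcription	Low-end configuration
`tiny`	150MB	Real-time transcription	Minimal configuration

Note: the Distil-Whisper checkpoints, including distil-large-v3, were trained for English speech recognition, so verify accuracy on your own Chinese audio before relying on that row.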
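
The table can also be read as a lookup: given a download or memory budget, pick the first model that fits. The sketch below simply encodes the sizes from the table; `largest_model_within` is a hypothetical helper, and the sizes are the table's own approximate figures rather than measured values.

```python
from typing import Optional

# (model name, approximate size in MB), copied from the table above
# and kept in the table's order, largest first.
MODEL_SIZES_MB = [
    ("large-v3-turbo", 1500),
    ("medium", 1500),
    ("distil-large-v3", 756),
    ("small", 500),
    ("tiny", 150),
]


def largest_model_within(budget_mb: int) -> Optional[str]:
    """Return the first (largest) model whose size fits the budget, or None."""
    for name, size_mb in MODEL_SIZES_MB:
        if size_mb <= budget_mb:
            return name
    return None


print(largest_model_within(800))
```

Size is only one criterion; language coverage and accuracy still need to be checked against the model descriptions.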

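Performance optimization

GPU acceleration

# NVIDIA GPU (CUDA)
.venv/bin/python3 scripts/transcribe.py audio.mp3 --device cuda --compute-type float16

# Apple Silicon (macOS)
# Caution: CTranslate2, the inference engine behind faster-whisper, documents only the cpu, cuda, and auto devices.
# If --device mps is rejected, use the CPU settings below instead.
.venv/bin/python3 scripts/transcribe.py audio.mp3 --device mps

CPU optimization

# High-performance CPU
.venv/bin/python3 scripts/transcribe.py audio.mp3 --device cpu --compute-type int8 --beam-size 3

# Low-resource environment
.venv/bin/python3 scripts/transcribe.py audio.mp3 --device cpu --compute-type int8 --model small --beam-size 1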
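
The GPU and CPU presets above can be chosen programmatically. This is a rough sketch: `runtime_flags` and `looks_like_nvidia_host` are hypothetical helpers, and treating the presence of `nvidia-smi` on `PATH` as a sign of a usable GPU is an assumption, not a guarantee that CUDA works.

```python
import shutil
from typing import List


def runtime_flags(has_nvidia_gpu: bool, low_resource: bool = False) -> List[str]:
    """Return the device flags from the presets in this section."""
    if has_nvidia_gpu:
        return ["--device", "cuda", "--compute-type", "float16"]
    flags = ["--device", "cpu", "--compute-type", "int8"]
    if low_resource:
        # Low-resource preset: smaller model and greedy decoding
        return flags + ["--model", "small", "--beam-size", "1"]
    return flags + ["--beam-size", "3"]


def looks_like_nvidia_host() -> bool:
    # Heuristic only: nvidia-smi on PATH suggests an NVIDIA driver is installed
    return shutil.which("nvidia-smi") is not None


print(runtime_flags(looks_like_nvidia_host()))
```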

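Troubleshooting

Common issues

GPU not detected: make sure the NVIDIA driver and CUDA are installed correctly. Transcription on the CPU is significantly slower.
Out-of-memory errors: use a smaller model (such as small or base) or pass --compute-type int8
Model download fails: set the environment variable HF_ENDPOINT=https://hf-mirror.com to use a mirror hosted in mainland China
Unsupported audio format: convert the file with ffmpeg: ffmpeg -i input.m4a output.wav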

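Fixes for specific errors

CUDA unavailable

# Check the CUDA installation
nvidia-smi

# After installing or repairing the driver, rerun the setup script so it can detect the GPU
./setup.sh

ffmpeg not found

# Ubuntu/Debian
sudo apt install ffmpeg

# macOS
brew install ffmpeg

# CentOS/RHEL (requires a repository that ships ffmpeg, such as RPM Fusion)
sudo yum install ffmpeg

Python version too old

# Check the Python version
python3 --version

# Python 3.10+ is required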

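Environment variables

# Set the HuggingFace cache directory (avoids repeated downloads)
export HF_HOME=/config/huggingface

# Use a mirror hosted in mainland China to speed up downloads
export HF_ENDPOINT=https://hf-mirror.com

# Select which GPU is visible to CUDA (if needed)
export CUDA_VISIBLE_DEVICES=0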
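
The same variables can be set for a single child process rather than exported for the whole shell session. The sketch below is illustrative: `transcription_env` is a hypothetical helper, the cache path is an example, and the mirror is opt-in so that downloads use the default Hugging Face endpoint unless one is passed explicitly.

```python
import os
from typing import Dict, Mapping, Optional


def transcription_env(
    cache_dir: str,
    mirror: Optional[str] = None,
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Copy the environment, set HF_HOME, and set or clear HF_ENDPOINT."""
    env = dict(os.environ if base is None else base)
    env["HF_HOME"] = cache_dir
    if mirror is None:
        # No mirror requested: drop any inherited value so defaults apply
        env.pop("HF_ENDPOINT", None)
    else:
        env["HF_ENDPOINT"] = mirror
    return env


env = transcription_env(os.path.expanduser("~/.cache/huggingface"))
print(env["HF_HOME"])
```

Passing the result as `env=` to `subprocess.run` keeps the settings local to that one transcription run.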

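Batch processing

Create a batch_transcribe.sh script for batch processing:

#!/bin/bash
# Batch transcription script; run it from the project root.
# Usage: bash scripts/batch_transcribe.sh [directory]  (defaults to the current directory)
audio_dir="${1:-.}"
for audio_file in "$audio_dir"/*.mp3 "$audio_dir"/*.wav "$audio_dir"/*.m4a; do
    # Skip glob patterns that matched no file
    if [ -f "$audio_file" ]; then
        echo "Processing: $audio_file"
        # Call the virtual environment's interpreter directly, as in the other examples
        .venv/bin/python3 scripts/transcribe.py "$audio_file" --output "${audio_file%.*}.txt"
    fi
done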
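
The same loop can be written in Python, which sidesteps shell quoting entirely because each command is an argument list. This is a sketch, not the bundled script: `batch_commands` and `output_path_for` are hypothetical helpers, and the `--output` flag is taken from the shell example above.

```python
from pathlib import Path
from typing import Iterable, List

AUDIO_SUFFIXES = {".mp3", ".wav", ".m4a"}  # same extensions as the shell loop


def output_path_for(audio: Path) -> Path:
    """Replace the last extension with .txt, like ${audio_file%.*}.txt."""
    return audio.with_suffix(".txt")


def batch_commands(paths: Iterable[Path]) -> List[List[str]]:
    """Build one transcribe command per audio file, in sorted order."""
    commands = []
    for audio in sorted(paths):
        if audio.suffix in AUDIO_SUFFIXES:
            commands.append([
                ".venv/bin/python3",
                "scripts/transcribe.py",
                str(audio),
                "--output",
                str(output_path_for(audio)),
            ])
    return commands


# Preview the commands for the current directory without running them
for command in batch_commands(Path(".").iterdir()):
    print(command)
```

Each list can then be executed with `subprocess.run(command, check=True)`; file names containing spaces or shell metacharacters are passed through unchanged.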

Output formats

Plain-text output

[00:00:00.000 --> 00:00:05.000] Welcome to the faster-whisper speech-to-text tool.
[00:00:05.000 --> 00:00:10.000] This is a high-performance local transcription solution.
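
The timestamp layout shown above (hours, minutes, seconds, milliseconds) can be produced from a time in seconds as follows. This is a minimal sketch of the formatting only; `format_timestamp` and `format_line` are hypothetical names, not functions from transcribe.py.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm using integer millisecond arithmetic."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def format_line(start: float, end: float, text: str) -> str:
    """Render one segment in the plain-text layout shown above."""
    return f"[{format_timestamp(start)} --> {format_timestamp(end)}] {text}"


print(format_line(0.0, 5.0, "Welcome to the faster-whisper speech-to-text tool."))
```

Working in whole milliseconds avoids the rounding drift that can appear when hours, minutes, and seconds are computed separately from a float.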

JSON output

{
  "text": "The full transcript text...",
  "segments": [
    {
      "start": 0.0,
      "end": 5.0,
      "text": "Welcome to the faster-whisper speech-to-text tool.",
      "words": [
        {"word": "Welcome", "start": 0.0, "end": 0.5},
        {"word": "to", "start": 0.5, "end": 1.0}
      ]
    }
  ]
}
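
Since subtitle generation is one of the listed use cases, the JSON structure above can be converted to SRT subtitles with a few lines. This sketch assumes the output has exactly the `segments`, `start`, `end`, and `text` fields shown in the sample; `json_to_srt` and `srt_time` are hypothetical helpers.

```python
import json


def srt_time(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm, the timestamp style SRT uses."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def json_to_srt(document: dict) -> str:
    """Turn each segment into a numbered SRT cue; cues are blank-line separated."""
    blocks = []
    for index, segment in enumerate(document["segments"], start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_time(segment['start'])} --> {srt_time(segment['end'])}\n"
            f"{segment['text'].strip()}\n"
        )
    return "\n".join(blocks)


sample = json.loads("""
{
  "text": "hello world",
  "segments": [
    {"start": 0.0, "end": 5.0, "text": "hello"},
    {"start": 5.0, "end": 10.0, "text": "world"}
  ]
}
""")
print(json_to_srt(sample))
```

The `words` array is ignored here; it would be the starting point for finer-grained cues such as karaoke-style highlighting.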

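Changelog

v1.0.5 (2026-04-16)

Reworked the commands
Commands now call the virtual environment's python3 directly instead of requiring the virtual environment to be activated first

Support

If you run into problems:

Check the Troubleshooting section of this document
Verify that the system requirements are met
Make sure the network connection works (model downloads need network access)
Read the script's error messages to debug

Note: the first run downloads the selected model (about 1.5GB for large-v3-turbo). Make sure you have enough disk space and a stable network connection.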

Usage Guidance

This skill is consistent with a local faster-whisper transcription tool, but exercise caution before running its install or batch scripts. Key points:
- The main risk is HF_ENDPOINT=https://hf-mirror.com: the skill's scripts and examples set this third-party mirror as the model/package endpoint. That will make your machine download model files from this unknown host. Verify the mirror's trustworthiness before using it. Prefer the official Hugging Face endpoints (or your own trusted mirror), or remove/override HF_ENDPOINT.
- Review setup.sh and transcribe.py before running. They install packages with pip and may download large binaries (models, PyTorch). Run in an isolated environment (VM or container) and ensure you have sufficient disk space.
- The scripts hard-code HF_HOME=/config/huggingface in examples and batch_transcribe.sh; change this to a safe local path if needed to avoid writing to unexpected locations.
- If you proceed: run ./setup.sh only after inspecting it, run pip installs inside the created .venv, and monitor network traffic (or pre-download models from known sources). If you cannot verify hf-mirror.com, remove those exports or replace them with official endpoints (e.g., unset HF_ENDPOINT so downloads use defaults).
- Because the source and homepage are unknown, prefer running this tool in an isolated environment until you confirm the mirror and packages are trustworthy.

Capability Analysis

Type: OpenClaw Skill
Name: faster-whisper-zh
Version: 1.0.5

The skill bundle contains a shell injection vulnerability in 'scripts/batch_transcribe.sh' due to the use of 'eval' on a command string constructed with unvalidated file paths. While the tool appears to be a legitimate implementation of 'faster-whisper' for Chinese audio transcription, this flaw could allow arbitrary command execution if the agent processes files with maliciously crafted names. The installation script 'setup.sh' and the main Python script 'scripts/transcribe.py' follow standard practices, including the use of official PyTorch repositories and common HuggingFace mirrors (hf-mirror.com).
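
To illustrate the class of problem described above: a file name becomes dangerous once it is pasted unquoted into a string that a shell later evaluates. The sketch below does not reproduce the bundled script, which is not shown on this page; `shell_safe_command` is a hypothetical helper showing how Python's `shlex.quote` keeps a hostile name as a single argument when a shell string cannot be avoided.

```python
import shlex


def shell_safe_command(audio_name: str) -> str:
    """Build a shell command string with the file name quoted as one token."""
    return " ".join(
        [".venv/bin/python3", "scripts/transcribe.py", shlex.quote(audio_name)]
    )


# A file name crafted to smuggle in a second command if left unquoted
hostile_name = "talk; touch pwned.mp3"
print(shell_safe_command(hostile_name))

# Parsing the string back the way a POSIX shell tokenizes it yields
# exactly three arguments, with the hostile name intact as the last one.
print(shlex.split(shell_safe_command(hostile_name)))
```

Where possible, passing an argument list to `subprocess.run` without `shell=True` is simpler still, because no shell parsing takes place at all.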

Capability Assessment

✓ Purpose & Capability

Name/description, required binaries (ffmpeg, python3), included scripts (transcribe.py, batch_transcribe.sh, setup.sh) and requirements (faster-whisper, torch) line up with a local transcription tool using faster-whisper; nothing required appears unrelated to the stated purpose.

ℹ Instruction Scope

Runtime instructions are focused on installing and running a local transcription pipeline. They instruct running setup.sh, creating a venv, and running the provided Python scripts. The scripts set HF_HOME and HF_ENDPOINT environment variables and perform network downloads to fetch models/packages. They do not read unrelated system credentials or arbitrary files, but they do set HF_ENDPOINT to an external mirror in examples and the batch script.

⚠ Install Mechanism

There is no registry install spec but setup.sh creates a venv and installs packages via pip (requirements.txt and conditional torch installs via the official PyTorch wheel index). However, examples and scripts export HF_ENDPOINT=https://hf-mirror.com and batch_transcribe.sh unconditionally exports that mirror — directing model downloads to a third-party endpoint is high-risk if the mirror is untrusted. Overall install approach (pip in venv) is expected, but the mirror usage is the main concern.

⚠ Credentials

The skill declares no required credentials and does not request secrets. However, it repeatedly sets HF_HOME (/config/huggingface) and HF_ENDPOINT (https://hf-mirror.com) in docs and scripts. HF_HOME is benign (a cache path) but hard-coded /config/huggingface may be outside a user's expectations. HF_ENDPOINT pointing to an unknown third-party mirror is disproportionate and could redirect model downloads to an attacker-controlled host.

✓ Persistence & Privilege

always is false and the skill does not request persistent platform-wide privileges or modify other skills. It only creates a virtualenv and makes local scripts executable — standard behavior for a packaged CLI tool.

Version History

v1.0.5

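- Command-line usage updated: calling the virtual environment's python3 directly is now recommended, with no need to activate the virtual environment manually.
- All example commands in the documentation now use the `.venv/bin/python3` form to simplify usage.
- Added recommended use-case wording such as "speech to text" and "audio to text".
- Made minor adjustments to the default model and the order of command arguments for clarity.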

v1.0.4

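Version 1.0.3
- All commands in the documentation now include environment-variable setup plus virtual environment activation and deactivation, making them easier to copy and paste and to use in container environments.
- Improved the reusability and completeness of all example commands.
- No other functional changes.

v1.0.3

Version 1.0.3
- All commands in the documentation now include environment-variable setup plus virtual environment activation and deactivation, making them easier to copy and paste and to use in container environments.
- Improved the reusability and completeness of all example commands.
- No other functional changes.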

v1.0.2

No functional or code changes in this release. Documentation formatting and text were revised for clarity and consistency.

v1.0.1

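faster-whisper-zh v1.0.1
- Initial release: a high-performance local Chinese speech-to-text tool with GPU acceleration and word-level timestamps
- Multiple speech models to choose from, plus performance tuning parameters
- Complete Chinese documentation covering installation, command-line usage, and troubleshooting
- Built-in batch transcription script, with both text and JSON output formats
- Adapted for users in mainland China: accelerated model downloads and local environment-variable configuration
- Custom model directory for easier management and maintenance in container environments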

Metadata

Slug faster-whisper-zh

Version 1.0.5

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 5

Frequently Asked Questions

What is faster-whisper 中文版 - 高性能本地语音转文字工具?

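A local speech-to-text tool built on faster-whisper. It supports high-performance, GPU-accelerated transcription, including word-level timestamps and distilled models. Use this skill when the user asks to "transcribe audio" (转录音频), asks for "speech to text" (语音转文字), or mentions "whisper". It is an AI Agent Skill for Claude Code / OpenClaw, with 120 downloads so far.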

How do I install faster-whisper 中文版 - 高性能本地语音转文字工具?

Run "/install faster-whisper-zh" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is faster-whisper 中文版 - 高性能本地语音转文字工具 free?

Yes, faster-whisper 中文版 - 高性能本地语音转文字工具 is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does faster-whisper 中文版 - 高性能本地语音转文字工具 support?

faster-whisper 中文版 - 高性能本地语音转文字工具 is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created faster-whisper 中文版 - 高性能本地语音转文字工具?

It is built and maintained by mapleshadow (@mapleshadow); the current version is v1.0.5.
