yzwu2017

fun-voice-type

Author: Yuzhong WU · GitHub · v2.0.0 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 150
Favorites: 0
Current installs: 1
Versions: 4
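Install in OpenClaw
/install fun-voice-type
Description
A voice input method plugin. Built on Alibaba Cloud's FunASR real-time speech recognition, it lets you hold a hotkey (the Right Option key) to convert speech to text and "type" it into whichever input field currently holds the cursor. It can also translate speech into several languages (e.g. Chinese, English, Japanese, Korean).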
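Usage instructions (SKILL.md)

Activation conditions

Trigger scenario                     Description
Transcription/translation request    "real-time recording transcription", "speech to text", "real-time speech translation"
Feature question                     "How do I type by voice?"
Productivity need                    "It's inconvenient for me to type", "Help me take down what I say"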

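Core features

  • Hold to talk: click to place the cursor wherever you want to enter text, hold Right Option (⌥) to start recording, and release to finish automatically.
  • Works everywhere: seamlessly supports browsers, document editors, IM chat apps, and any other standard macOS input control.
  • Multilingual: supports input in multiple languages, plus translation (click the fun-voice-type icon to choose the target language).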
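
The hold-to-talk behaviour can be pictured as a small state machine. The sketch below is an illustration only, not the code shipped in fun-voice-type.py: the `HoldToTalk` class, the `HOTKEY` string, and the callbacks are hypothetical names, and keys are modelled as plain strings so the example runs without any third-party package.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for the hotkey; a real listener would deliver key objects.
HOTKEY = "alt_r"


@dataclass
class HoldToTalk:
    """Tracks hotkey state and fires callbacks when recording starts and stops."""
    on_start: Callable[[], None]
    on_stop: Callable[[], None]
    recording: bool = False

    def on_press(self, key) -> None:
        # Guard against duplicate press events so recording starts only once.
        if key == HOTKEY and not self.recording:
            self.recording = True
            self.on_start()

    def on_release(self, key) -> None:
        # Only the hotkey's release ends the recording; other keys are ignored.
        if key == HOTKEY and self.recording:
            self.recording = False
            self.on_stop()


events: List[str] = []
ptt = HoldToTalk(on_start=lambda: events.append("start"),
                 on_stop=lambda: events.append("stop"))
for key in ("alt_r", "alt_r", "a"):  # hotkey press, a duplicate press, another key
    ptt.on_press(key)
ptt.on_release("a")
ptt.on_release("alt_r")
print(events)
```

With pynput (one of the listed dependencies), the two methods could be passed as the `on_press` and `on_release` callbacks of `pynput.keyboard.Listener`, comparing against `keyboard.Key.alt_r` instead of a string. Whether the actual script is structured this way is an assumption.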

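Environment dependencies

1. System library dependencies

Because the script uses pyaudio, first install portaudio on your system, then the Python dependencies:

brew install portaudio
pip install dashscope pynput pyaudio pystray

2. Set the DashScope API key

For security, set the API key as an environment variable:

export DASHSCOPE_API_KEY='your_api_key'

If you don't have an API key yet, apply for one in the Alibaba Cloud DashScope console.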
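
A script that depends on this variable can check for it at startup and fail with a clear message instead of a later API error. This is a minimal sketch, assuming a hypothetical helper named `load_api_key`; how fun-voice-type.py itself reads the key is not shown in the listing.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return DASHSCOPE_API_KEY from the environment, or raise with a hint."""
    # Accept an explicit mapping so the function can be exercised without
    # touching the real process environment.
    env = os.environ if env is None else env
    key = env.get("DASHSCOPE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DASHSCOPE_API_KEY is not set; run: export DASHSCOPE_API_KEY='your_api_key'"
        )
    return key
```

Calling `load_api_key()` with no argument reads the real environment; passing a dict is convenient for testing.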

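Installation and running

Run the script in the background; fun-voice-type then appears as a small icon at the right end of the Mac menu bar:

nohup python fun-voice-type.py > /dev/null 2>&1 &

You can now hold Right Option to dictate.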

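Granting permissions

Because this Skill listens for global key presses and simulates keyboard input, it needs additional permissions, which vary by operating system:

macOS

  • Accessibility: go to System Settings -> Privacy & Security -> Accessibility and enable the terminal you run the script from (e.g. Terminal, iTerm2, or VSCode).
  • Microphone: on first run the system prompts for microphone access; click Allow.
  • Input Monitoring: in the same privacy settings, make sure the terminal is allowed to monitor keyboard input.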

Version: 2.0.0 · Date: 2026-03-21

Security recommendations
What to consider before installing:
  • The code sends microphone audio (via FunASR) and recognized text to DashScope cloud services, and may send recognized text to the qwen-plus model for translation. Do not use it for sensitive audio or text unless you trust DashScope and your API key.
  • You must set DASHSCOPE_API_KEY in your environment for full functionality, but that env var is not declared in the skill registry metadata. Verify this discrepancy with the publisher before providing a real API key.
  • The script requires Accessibility/Input Monitoring permission for the terminal you run it from and simulates typing into whatever input has focus. Granting those permissions to a terminal is powerful: consider running in a dedicated account or VM, and avoid using the tool while focused on password fields, banking apps, or other sensitive inputs.
  • The package is small and readable; if you can, review the included script yourself (or have someone you trust do so). Confirm the network destinations (DashScope endpoints) and consider monitoring outbound network traffic the first time you run it.
  • If you need higher assurance, ask the publisher for a verified source/homepage, or request that the registry metadata be updated to declare DASHSCOPE_API_KEY as a required env var and to provide a signed release or provenance information.
Functional analysis
Type: OpenClaw Skill
Name: fun-voice-type
Version: 2.0.0
The skill is a legitimate voice-to-text utility that uses Alibaba's FunASR and Qwen LLM for transcription and translation. It requires standard macOS permissions (Microphone, Accessibility) to capture audio and simulate keyboard input, which are necessary for its core functionality. The code in fun-voice-type.py is transparent, uses environment variables for API keys, and lacks any indicators of data exfiltration or malicious execution.
Capability assessment
Purpose & Capability
The name/description (voice input + translation via FunASR) matches the code and instructions: it records microphone audio, sends frames to DashScope FunASR, optionally sends recognized text to DashScope Generation (qwen-plus) for translation, and types results into the active input. This capability set is coherent for the stated purpose.
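
The data flow described above (recognize, optionally translate, then type) can be sketched with injected callables. This is an illustrative outline only: `handle_utterance` and its parameters are hypothetical names, and the lambdas stand in for the DashScope FunASR call, the qwen-plus translation call, and the keystroke simulation, none of which are reproduced here.

```python
from typing import Callable, Iterable, List, Optional


def handle_utterance(
    frames: Iterable[bytes],
    recognize: Callable[[Iterable[bytes]], str],
    type_text: Callable[[str], None],
    translate: Optional[Callable[[str, str], str]] = None,
    target_lang: Optional[str] = None,
) -> str:
    """Run one recording through recognition, optional translation, and typing."""
    text = recognize(frames).strip()
    if not text:
        return ""  # nothing recognized, so nothing is typed
    if translate is not None and target_lang:
        text = translate(text, target_lang)
    type_text(text)  # this is the step that writes into the focused input
    return text


typed: List[str] = []
result = handle_utterance(
    frames=[b"\x00\x01", b"\x02\x03"],                # placeholder audio frames
    recognize=lambda fs: "hello",                     # stub for the ASR service
    type_text=typed.append,                           # stub for keystroke output
    translate=lambda text, lang: f"[{lang}] {text}",  # stub for translation
    target_lang="ja",
)
print(result, typed)
```

Laying the flow out this way makes the two outbound hops visible: audio frames always leave the machine for recognition, and recognized text leaves again only when a target language is selected.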
Instruction Scope
SKILL.md instructs the user to install portaudio and several Python packages and to set DASHSCOPE_API_KEY; it also asks the user to grant Accessibility/Input Monitoring and Microphone permissions to the terminal used to run the script. The runtime instructions and code read the environment variable DASHSCOPE_API_KEY, but the package metadata declares 'Required env vars: none', so the skill accesses an env var that its metadata does not declare.
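
A reviewer can check this kind of mismatch mechanically by listing the environment variables a script reads and comparing them with what the metadata declares. The sketch below is one possible approach, assuming Python 3.9+ (for `ast.unparse`); `env_vars_read` is a hypothetical helper, and `sample` is an invented snippet, not the contents of fun-voice-type.py.

```python
import ast
from typing import Set


def env_vars_read(source: str) -> Set[str]:
    """Collect literal names used with os.getenv, os.environ.get, or os.environ[...]."""
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        arg = None
        if isinstance(node, ast.Call) and node.args:
            if ast.unparse(node.func) in ("os.getenv", "os.environ.get"):
                arg = node.args[0]
        elif isinstance(node, ast.Subscript) and ast.unparse(node.value) == "os.environ":
            arg = node.slice
        # Only string literals are reported; names built at runtime are missed.
        if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
            found.add(arg.value)
    return found


# Invented sample source for illustration.
sample = '''
import os
key = os.getenv("DASHSCOPE_API_KEY")
same_key = os.environ["DASHSCOPE_API_KEY"]
'''
declared: Set[str] = set()  # the listing reports 'Required env vars: none'
undeclared = env_vars_read(sample) - declared
print(sorted(undeclared))
```

The scan is deliberately conservative: it only sees direct `os.` accesses with literal names, so an empty result does not prove a script reads no environment variables.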
Install Mechanism
There is no install spec (instruction-only with an included script). Required system and Python deps are documented in SKILL.md. No downloads from untrusted URLs or archive extraction are used.
Credentials
The code requires a DashScope API key (DASHSCOPE_API_KEY) to call ASR and generation APIs, which is appropriate for cloud ASR/LLM usage — but the registry metadata does not declare this required env var or a primary credential. The missing declaration is an inconsistency the user should note. No other unrelated credentials are requested.
Persistence & Privilege
The skill is not always-enabled and does not request elevated platform privileges beyond the macOS accessibility/input monitoring grants required to listen to global keys and simulate typing. It does simulate keystrokes into any focused input (expected for an input method), which gives it potential to inject or exfiltrate sensitive text if used with sensitive inputs.
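How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install fun-voice-type
  3. Once installed, invoke the Skill by name or trigger it with /fun-voice-type
  4. Provide the inputs described in the Skill's parameter notes to get structured output.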
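Version history
v2.0.0
Version 2.0.0 introduces multi-language support and translation features.
  • Adds the ability to translate speech input into multiple languages (e.g., Chinese, English, Japanese, Korean).
v1.0.2
  • Updated the icon and usage explanation.
v1.0.1
  • Updated the activation conditions with more voice-input and productivity scenarios.
  • Refined the description to emphasize Right Option (⌥) as the trigger key and compatibility across all scenarios.
  • Added the "works everywhere" feature note, making the supported input environments explicit.
  • Made minor improvements to the usage instructions and key hints for clearer documentation.
  • Updated the version number and date.
v1.0.0
fun-voice-type 1.0.0
  • Initial release.
  • Provides voice input based on Alibaba Cloud FunASR real-time speech recognition; hold the Right Option key to convert speech to text and insert it at the current cursor position.
  • Adds a menu bar icon for quitting easily.
  • Documents environment dependencies, API key configuration, and macOS permission setup.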
Metadata
Slug: fun-voice-type
Version: 2.0.0
License: MIT-0
Total installs: 1
Current installs: 1
Versions: 4
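FAQ

What is fun-voice-type?

A voice input method plugin. Built on Alibaba Cloud's FunASR real-time speech recognition, it lets you hold a hotkey (the Right Option key) to convert speech to text and "type" it into whichever input field currently holds the cursor. It can also translate speech into several languages (e.g. Chinese, English, Japanese, Korean). It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 150 times so far.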

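How do I install fun-voice-type?

Run the command "/install fun-voice-type" in an OpenClaw or Claude Code chat box to install it in one step. Before first use, the script still needs the dependencies, the DASHSCOPE_API_KEY environment variable, and the macOS permissions described in the usage instructions.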

Is fun-voice-type free?

Yes. fun-voice-type is free under the MIT-0 license and can be freely downloaded, installed, and used.

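Which platforms does fun-voice-type support?

The listing tags fun-voice-type as cross-platform, installable in any environment where OpenClaw / Claude Code is deployed. The usage instructions themselves cover macOS only (Right Option hotkey, menu bar icon, macOS permissions).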

Who develops fun-voice-type?

It is developed and maintained by Yuzhong WU (@yzwu2017). The current version is v2.0.0.
