tel18610240060-collab

Feishu Speaker

by Buck · GitHub ↗ · v1.1.0
cross-platform ⚠ suspicious
470 Downloads · 0 Stars · 2 Active Installs · 2 Versions
Install in OpenClaw
/install feishu-speaker
Description
A two-way voice messaging tool for Feishu: receives voice as text (speech to text) and sends text as voice (TTS + Whisper)
README (SKILL.md)

feishu-speaker Skill v1.0

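A two-way voice messaging tool for Feishu that lets an AI assistant talk by voice like a real person

Two-way support: receive voice (speech to text) + send voice (TTS)

Smart reply: automatically picks the reply mode based on the type of message received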


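🎯 Core Features

🎤 Receiving voice (speech → text)

  • Transcribes locally with OpenAI Whisper
  • Supports Chinese speech recognition
  • Processes audio locally; no network access is needed once the model has been downloaded

🔊 Sending voice (text → speech)

  • Generates high-quality speech with Edge-TTS
  • Supports several Chinese voices (male/female, youthful/mature)
  • Adjustable speaking rate (0.5x - 2.0x)

🔄 Smart interaction

  • Voice message received → automatically transcribed and understood → voice reply
  • Text message received → understood as text → reply mode chosen according to the configuration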

📦 Installing Dependencies

# 1. Install Whisper (speech to text)
pip install openai-whisper

# 2. Install Edge-TTS (text to speech)
npm install -g edge-tts

# 3. Install FFmpeg (audio format conversion)
# macOS: brew install ffmpeg
# Ubuntu: apt-get install ffmpeg

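🚀 Quick Start

1. Configure the Feishu API credentials

Create the file ~/.openclaw/.credentials/feishu-app-secret.txt with the following content:

Your Feishu App Secret

How to obtain it:

  1. Visit https://open.feishu.cn/app/
  2. Open your app → Credentials & Basic Info
  3. Copy the App Secret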

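2. Receive voice (automatic transcription)

# Transcribe a received voice message
feishu-speaker listen voice.ogg

# Use a larger model (more accurate but slower)
feishu-speaker listen voice.ogg --model small

3. Send a voice message

# Basic usage (the text means "Hello, this is a voice message")
feishu-speaker say "你好,这是语音消息"

# Choose a voice (the text means "Good evening")
feishu-speaker say "晚上好" --voice zh-CN-YunxiNeural

# Adjust the speaking rate (the text means "Speed up")
feishu-speaker say "加快速度" --rate "+30%"

4. Smart reply

# Automatically pick the reply mode based on the type of message received
# (the text means "Got it, I'll handle it right away")
feishu-speaker reply "收到,我马上处理"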

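🎨 Supported Voices

| Voice ID | Gender | Style | Recommended for |
|---|---|---|---|
| zh-CN-YunxiNeural | Male | Young, crisp ⭐ | Everyday conversation, quick replies |
| zh-CN-YunjianNeural | Male | Mature, steady | Formal occasions, business communication |
| zh-CN-XiaoxiaoNeural | Female | Standard female voice | Gentle replies, customer service |
| zh-CN-XiaoyiNeural | Female | Soft female voice | Friendly conversation, emotional contexts |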

🔧 Command Reference

feishu-speaker listen - speech to text

feishu-speaker listen <audio-file> [options]

Options:
  -m, --model <model>        Whisper model (tiny/base/small, default: base)
  -l, --language <language>  Language to use (default: zh)
  -o, --output <file>        Write the transcript to a file

Examples:
  feishu-speaker listen message.ogg
  feishu-speaker listen voice.mp3 --model small
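
For readers who want to see what the `listen` options correspond to, here is a minimal Python sketch built on the `openai-whisper` package. It is not the skill's actual implementation: the function name `transcribe` and the `ALLOWED_MODELS` constant are hypothetical, and only `whisper.load_model` and `model.transcribe` come from the library.

```python
from typing import Optional

# Model names taken from the `--model` option above (hypothetical constant).
ALLOWED_MODELS = ("tiny", "base", "small")


def transcribe(audio_path: str, model: str = "base",
               language: str = "zh", output: Optional[str] = None) -> str:
    """Transcribe an audio file locally, mirroring the `listen` options."""
    if model not in ALLOWED_MODELS:
        raise ValueError(
            f"unsupported model {model!r}; expected one of {ALLOWED_MODELS}"
        )

    # Imported lazily so argument validation works even without the package.
    import whisper  # provided by `pip install openai-whisper`

    result = whisper.load_model(model).transcribe(audio_path, language=language)
    text = result["text"].strip()

    # Equivalent of `--output <file>`: also write the transcript to disk.
    if output is not None:
        with open(output, "w", encoding="utf-8") as fh:
            fh.write(text + "\n")
    return text
```

Calling `transcribe("voice.ogg", model="small")` would roughly match `feishu-speaker listen voice.ogg --model small`, assuming FFmpeg is installed so Whisper can decode the file.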

feishu-speaker say - text to speech, then send

feishu-speaker say <text> [options]

Options:
  -v, --voice <voice>     Voice to use (default: zh-CN-YunxiNeural)
  -r, --rate <rate>       Speaking-rate adjustment (default: +20%)
  -t, --to <user-id>      Recipient
  -s, --save <file>       Save the audio to a file (do not send)

Examples:
  feishu-speaker say "你好"                                    # "Hello"
  feishu-speaker say "会议开始" --voice zh-CN-YunjianNeural    # "The meeting is starting"
  feishu-speaker say "快速播报" --rate "+50%"                  # "Quick announcement"

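feishu-speaker reply - smart reply

feishu-speaker reply <text> [options]

Options:
  --voice               Force a voice reply
  --text                Force a text reply
  --auto                Choose automatically based on the other party's message type (default)

Examples:
  feishu-speaker reply "收到"            # "Got it"
  feishu-speaker reply "好的" --voice    # "OK"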

⚙️ Configuration

Config file: ~/.openclaw/skills/feishu-speaker/config/config.json

{
  "default_voice": "zh-CN-YunxiNeural",
  "default_rate": "+20%",
  "default_volume": "+0%",
  "default_pitch": "default",
  "reply_mode": "auto",
  "app_id": "cli_a9037acd2ba19bb5",
  "receiver_id": "ou_94f3936f1896b5378404f377da3fae6f"
}

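Configuration notes:

  • default_voice: default TTS voice
  • default_rate: default speaking rate (+20% = 1.2x speed)
  • reply_mode: reply mode
    • auto: match the incoming message (voice → voice, text → text)
    • voice: always reply by voice
    • text: always reply with text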
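
The `reply_mode` rules can be sketched in a few lines of Python. This is an illustration only: `choose_reply_type` and `load_reply_mode` are hypothetical names, and falling back to `"auto"` when the key is missing is an assumption based on `--auto` being the documented default.

```python
import json

VALID_MODES = ("auto", "voice", "text")
MESSAGE_TYPES = ("voice", "text")


def choose_reply_type(reply_mode: str, incoming_type: str) -> str:
    """Return "voice" or "text" for the outgoing reply."""
    if reply_mode not in VALID_MODES:
        raise ValueError(f"unknown reply_mode: {reply_mode!r}")
    if incoming_type not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type: {incoming_type!r}")
    # auto mirrors the incoming message; voice/text ignore it.
    return incoming_type if reply_mode == "auto" else reply_mode


def load_reply_mode(config_text: str) -> str:
    """Read reply_mode from config JSON text (assumed default: "auto")."""
    return json.loads(config_text).get("reply_mode", "auto")
```

With `reply_mode` set to `auto`, a voice message yields a voice reply and a text message yields a text reply; the other two modes override the incoming type.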

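📝 Usage Scenarios

Scenario 1: AI assistant voice interaction

# User sends voice → AI transcribes and understands → voice reply
# Automated flow:
# 1. User: sends the voice message "帮我查一下明天的天气" ("Check tomorrow's weather for me")
# 2. AI: feishu-speaker listen voice.ogg → transcribed to text
# 3. AI: handles the request → feishu-speaker reply "明天北京晴天,25度" ("Sunny in Beijing tomorrow, 25 degrees")

Scenario 2: Scheduled voice announcements

# Use in a cron job (the text means "Good morning! Today's top stories have been updated, please take a look.")
feishu-speaker say "早上好!今日热点已更新,请查看。"

Scenario 3: Switching voices

# Formal occasion (the text means "The meeting will start in 10 minutes.")
feishu-speaker say "会议将在10分钟后开始。" --voice zh-CN-YunjianNeural

# Lively occasion (the text means "Good news! The task was finished ahead of schedule!")
feishu-speaker say "好消息!任务提前完成了!" --voice zh-CN-YunxiNeural --rate "+30%"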

🔌 Architecture

Receiving flow:
┌───────────────────────┐     ┌───────────────────────┐     ┌───────────────────────┐
│ Feishu voice message  │ ──→ │        Whisper        │ ──→ │      Text result      │
│     (ogg format)      │     │ (local transcription) │     │    (Chinese text)     │
└───────────────────────┘     └───────────────────────┘     └───────────────────────┘

Sending flow:
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Text input    │ ──→ │    Edge-TTS     │ ──→ │     FFmpeg      │ ──→ │   Feishu API    │
│ (Chinese text)  │     │ (generates MP3) │     │  (to opus/ogg)  │     │  (sends voice)  │
└─────────────────┘     └─────────────────┘     └─────────────────┘     └─────────────────┘
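
The FFmpeg step of the sending flow can be sketched as follows. The flags the skill's own script passes to FFmpeg are not shown in this README, so the command below is an assumption: it uses standard FFmpeg options (`-i`, `-c:a libopus`, `-b:a`) with an arbitrary 32k bitrate, and both function names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def build_ffmpeg_cmd(src_mp3: str, dst_opus: str, bitrate: str = "32k") -> List[str]:
    """Build an ffmpeg argv list that re-encodes src_mp3 as Opus audio."""
    return [
        "ffmpeg",
        "-y",                # overwrite the output file if it already exists
        "-i", src_mp3,       # input produced by the TTS step
        "-c:a", "libopus",   # encode the audio stream with the Opus codec
        "-b:a", bitrate,     # audio bitrate (assumed value, not from the skill)
        dst_opus,
    ]


def convert_to_opus(src_mp3: str, dst_opus: str) -> None:
    """Run the conversion; requires ffmpeg on PATH."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    # A list argv with no shell keeps file names from being re-parsed.
    subprocess.run(build_ffmpeg_cmd(src_mp3, dst_opus), check=True)
```

Building the argument list separately from running it makes the command easy to inspect or log before anything is executed.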

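⚠️ Notes

  1. Audio format: Feishu voice messages use the opus/ogg format; the script converts automatically
  2. File size: keeping each voice message under 10MB is recommended
  3. Rate range: -50% to +100% is supported; around +20% sounds most natural
  4. Network: sending voice requires access to the Feishu API (a mainland China network is sufficient)
  5. Privacy: speech-to-text runs locally and nothing is uploaded to the cloud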
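
The rate range in note 3 maps onto the 0.5x - 2.0x multipliers mentioned under the core features. A small sketch of that arithmetic, assuming rate strings always carry an explicit sign as in the examples (`"+20%"`, `"+30%"`); the function name is hypothetical:

```python
import re

MIN_PERCENT, MAX_PERCENT = -50, 100  # range stated in the notes above
_RATE_RE = re.compile(r"([+-])(\d+)%")


def rate_to_multiplier(rate: str) -> float:
    """Convert a rate string such as "+20%" into a speed multiplier (1.2)."""
    match = _RATE_RE.fullmatch(rate)
    if match is None:
        raise ValueError(f"rate must look like '+20%' or '-10%', got {rate!r}")
    sign = 1 if match.group(1) == "+" else -1
    percent = sign * int(match.group(2))
    if not MIN_PERCENT <= percent <= MAX_PERCENT:
        raise ValueError(f"rate {rate!r} is outside -50% to +100%")
    return (100 + percent) / 100
```

So `-50%` corresponds to 0.5x, `+0%` to 1.0x, and `+100%` to 2.0x.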

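🐛 Troubleshooting

Whisper model download fails

# Download the model manually
python3 -c "import whisper; whisper.load_model('base')"

Feishu API returns an error

  1. Check that the App Secret is configured correctly
  2. Check the recipient ID format (it should start with ou_)
  3. Check that the audio file is in opus/ogg format

Edge-TTS installation fails

# Run it directly with npx ("测试" means "test")
npx edge-tts "测试" --voice zh-CN-YunxiNeural --write-media output.mp3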

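📊 Comparison with Other Skills

Feature feishu-voice feishu-speaker (this skill)
Sending voice
Receiving voice (speech to text)
Two-way interaction
Smart reply mode
Multiple voices
Rate adjustment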

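🔄 Roadmap

v1.1.0 (planned)

  • Support more speech synthesis engines (Azure, iFLYTEK)
  • Support emotional tone control (happy, serious, gentle)
  • Support real-time voice conversation (WebSocket)

v1.2.0 (planned)

  • Support voice cloning (custom voices)
  • Support automatic summarization after transcription
  • Support batch voice processing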

📄 License

MIT License


Make Feishu communication more natural, with voice conversations that feel like talking to a real person! 🎙️✨


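🚀 New: One-Step Voice Reply

reply-voice script (new in v1.1.0)

Purpose: handles the complete voice-message reply flow automatically

  • Receive voice message → transcribe to text → generate voice reply → send

Usage

# Transcribe the voice message and send a reply (the text means "This is the reply")
reply-voice voice.ogg "这是回复内容"

# Transcribe only, without sending
reply-voice voice.ogg

Full flow

┌──────────────────────┐    ┌──────────────────────┐    ┌──────────────────────┐    ┌──────────────────────┐
│ Feishu voice message │  → │       Whisper        │  → │       Edge-TTS       │  → │      Feishu API      │
│     (ogg format)     │    │   transcribes text   │    │   generates voice    │    │     sends voice      │
└──────────────────────┘    └──────────────────────┘    └──────────────────────┘    └──────────────────────┘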
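
The two usages of `reply-voice` (transcribe and reply, or transcribe only) can be sketched as one function. This illustrates the control flow only, not the script itself: the function name `reply_voice` and both injected callables are hypothetical stand-ins for the Whisper step and the Edge-TTS/Feishu steps.

```python
from typing import Callable, Optional


def reply_voice(audio_path: str,
                reply_text: Optional[str],
                transcribe: Callable[[str], str],
                send_voice: Callable[[str], None]) -> str:
    """Transcribe audio_path; if reply_text is given, also send it as voice."""
    transcript = transcribe(audio_path)
    # With no reply text the flow stops after transcription,
    # matching `reply-voice voice.ogg`.
    if reply_text:
        send_voice(reply_text)
    return transcript
```

Passing the two steps in as callables keeps the sketch runnable without Whisper, Edge-TTS, or Feishu credentials, and makes the transcribe-only branch easy to check.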

Dependencies

  • Python 3.8+
  • openai-whisper
  • edge-tts
  • ffmpeg
Usage Guidance
This skill appears to do what it says (local Whisper transcription + Edge-TTS + sending to Feishu), but several inconsistencies warrant caution:

  • Metadata vs. instructions: The skill metadata declares no credentials or config paths, yet SKILL.md and the script require a Feishu App Secret file at ~/.openclaw/.credentials/feishu-app-secret.txt. Expect to provide a sensitive secret; the metadata should have declared that.
  • Hardcoded defaults: The repository includes an APP_ID and a default RECEIVER_ID. Before using, verify that APP_ID belongs to you (or replace it), and change the receiver_id to the intended recipient. Otherwise the skill may send audio/messages to that default account.
  • Secret handling: The script reads the App Secret from a plaintext file. If you proceed, store the secret with restrictive permissions (chmod 600) and consider alternative secret mechanisms (agent-managed secret store / environment variables) if available.
  • Audit before running: Inspect the included shell script and config. Confirm network endpoints are the official Feishu endpoints (open.feishu.cn) and that no other hidden endpoints exist. Run in a controlled environment (sandbox or test account) first.
  • Operational note: The skill expects pip/npm/ffmpeg/curl/python3 at runtime; ensure these tools are installed from trusted sources.

If you need this functionality but don't trust the defaults, ask the author for a version that requires explicit configuration of app_id/receiver_id (no hardcoded defaults) and that documents required credentials in the registry metadata.
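
As a rough illustration of the secret-handling advice, the sketch below prefers an environment variable and otherwise refuses to read a secret file that is readable by group or others. It is not part of the skill: the function name and the `FEISHU_APP_SECRET` variable name are hypothetical, and the permission check assumes a POSIX filesystem.

```python
import os
import stat
from pathlib import Path

# Path from the README; the environment variable name below is an assumption.
DEFAULT_SECRET_PATH = (
    Path.home() / ".openclaw" / ".credentials" / "feishu-app-secret.txt"
)


def load_app_secret(path: Path = DEFAULT_SECRET_PATH,
                    env_var: str = "FEISHU_APP_SECRET") -> str:
    """Return the App Secret from env_var if set, else from a private file."""
    from_env = os.environ.get(env_var)
    if from_env:
        return from_env.strip()

    mode = stat.S_IMODE(path.stat().st_mode)
    if mode & 0o077:  # any group/other permission bit is set
        raise PermissionError(
            f"{path} has mode {mode:o}; run: chmod 600 {path}"
        )

    secret = path.read_text(encoding="utf-8").strip()
    if not secret:
        raise ValueError(f"{path} is empty")
    return secret
```

Failing loudly on a world-readable file is a design choice; a gentler variant could log a warning instead.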
Capability Analysis
Type: OpenClaw Skill
Name: feishu-speaker
Version: 1.1.0

The skill provides legitimate Feishu voice messaging functionality, and the SKILL.md instructions do not contain malicious prompt injection attempts. However, the `scripts/send_voice_feishu.sh` script directly uses shell arguments (`$1`, `$2`, `$3`) in `curl` commands without explicit sanitization. This presents a potential shell injection vulnerability if these arguments are derived from untrusted, user-controlled input, classifying the skill as suspicious due to this unaddressed input sanitization risk.
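
One common way to reduce this kind of risk is to validate identifiers up front and to hand arguments to child processes as a list, so that no shell ever re-parses them. The sketch below shows the general pattern only; it does not reproduce `send_voice_feishu.sh`. The function names are hypothetical, and the accepted character set for receiver IDs is an assumption (the README only says they start with `ou_`).

```python
import re
import subprocess
import sys
from typing import List

# Assumed shape of a receiver ID: "ou_" followed by letters and digits.
_OPEN_ID_RE = re.compile(r"ou_[0-9A-Za-z]+")


def validate_receiver_id(receiver_id: str) -> str:
    """Reject anything that does not look like a plain ou_ identifier."""
    if _OPEN_ID_RE.fullmatch(receiver_id) is None:
        raise ValueError(f"suspicious receiver id: {receiver_id!r}")
    return receiver_id


def echo_argument(value: str) -> str:
    """Pass value to a child process as one argv element, with no shell."""
    argv: List[str] = [
        sys.executable, "-c",
        "import sys; sys.stdout.write(sys.argv[1])",
        value,
    ]
    done = subprocess.run(argv, check=True, capture_output=True, text=True)
    return done.stdout
```

Because the list form bypasses the shell, a value containing quotes or semicolons arrives in the child process unchanged instead of being interpreted as extra commands.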
Capability Assessment
Purpose & Capability
The skill's described purpose (Feishu two-way voice with local Whisper and Edge-TTS) matches the included script and instructions. However the registry metadata declares no required credentials or config paths while SKILL.md and scripts require a stored Feishu App Secret file (~/.openclaw/.credentials/feishu-app-secret.txt) and the code embeds a default APP_ID and receiver_id. The lack of declared credential/config requirements in metadata is inconsistent and surprising.
Instruction Scope
SKILL.md instructs the agent/user to create a plaintext App Secret file in the user's home directory and to install Whisper/Edge-TTS/FFmpeg. The included script reads that file and sends uploaded audio to Feishu APIs — these actions are within the claimed purpose. The concern is that the instructions request writing and reading a sensitive secret at a specific path (not declared in metadata) and the skill automatically uses a hardcoded receiver_id/app_id which could cause messages to be sent to a preconfigured recipient without the user explicitly choosing one.
Install Mechanism
There is no formal install spec (instruction-only), which is low-risk in general. The README asks the user to run pip install openai-whisper and npm install -g edge-tts and to install ffmpeg, which is reasonable for the functionality. The package relies on curl and python3 at runtime (both are used in the script), but the 'required binaries' metadata lists none, a minor metadata mismatch.
Credentials
The skill requires access to a Feishu App Secret (sensitive credential) but does not declare any required env vars or config paths in registry metadata. Instead it instructs storing the secret in a specific file under ~/.openclaw/.credentials/. Additionally, a hardcoded APP_ID and default RECEIVER_ID are included in config and the script — this is disproportionate because the skill should require and document the user's own app_id/receiver, or clearly explain why defaults are included. Using a default receiver_id means the skill could send content to that ID unless the user overrides it.
Persistence & Privilege
The skill is not always-enabled and does not request elevated platform privileges. It does include a script that will run network requests when invoked, but autonomous invocation is the platform default and 'always' is false.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install feishu-speaker
  3. After installation, invoke the skill by name or use /feishu-speaker
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.1.0
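v1.1.0 introduces a new one-step voice reply script.
  • Added the `reply-voice` script, which runs the voice reply flow in one step: the voice message is transcribed automatically, then a voice reply is generated and sent
  • `reply-voice` can transcribe without replying, or send a voice reply automatically when reply text is given
  • Added usage instructions and a description of the full processing flow
  • Bumped the version to 1.1.0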
v1.0.0
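feishu-speaker v1.0.0
  • Initial release with two-way voice interaction for Feishu
  • Transcribes Feishu voice messages to text with a local OpenAI Whisper model, no network required
  • Sends text as voice using Edge-TTS, with several Chinese voices and adjustable speaking rate
  • Smart reply: detects the message type and picks a text or voice reply automatically
  • Command-line support for receiving voice, sending voice, and smart automatic replies
  • Flexible configuration for scenarios such as AI voice assistants and scheduled voice announcements
  • Complete installation and usage documentation for a quick start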
Metadata
Slug feishu-speaker
Version 1.1.0
License
All-time Installs 2
Active Installs 2
Total Versions 2
Frequently Asked Questions

What is Feishu Speaker?

A two-way voice messaging tool for Feishu that supports receiving voice as text and sending text as voice (TTS + Whisper). It is an AI Agent Skill for Claude Code / OpenClaw, with 470 downloads so far.

How do I install Feishu Speaker?

Run "/install feishu-speaker" in the OpenClaw or Claude Code chat to install it. The README additionally asks you to install openai-whisper, edge-tts, and FFmpeg, and to configure a Feishu App Secret before use.

Is Feishu Speaker free?

Yes, Feishu Speaker is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Feishu Speaker support?

Feishu Speaker is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Feishu Speaker?

It is built and maintained by Buck (@tel18610240060-collab); the current version is v1.1.0.
