Description

Uploads and downloads Feishu voice messages, converts speech to text and text to speech, and integrates with the ElevenLabs voice service.

Usage (SKILL.md)

Feishu Voice Skill - Feishu voice interaction skill

Name: Feishu Voice (飞书语音)
Author: godzff

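Overview

This skill implements voice interaction between Feishu and ElevenLabs, including:

Speech-to-text (the user sends a voice message → its content is recognized)
Text-to-speech (generating a spoken reply to the user)
Sending and receiving Feishu voice messages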

1. Environment setup

1.1 ElevenLabs API Key

export ELEVENLABS_API_KEY="your API key"

1.2 Installing FFmpeg

apt-get update && apt-get install -y ffmpeg
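
Before running the later steps, it can help to confirm that both prerequisites are actually in place. The following is a minimal Node.js sketch, not part of the original skill: the helper names missingEnvVars and hasFfmpeg are hypothetical, and the check assumes ffmpeg is expected on PATH.

```javascript
const { spawnSync } = require('child_process');

// Return the names of required environment variables that are unset or blank.
// The default list only covers the key named in section 1.1.
function missingEnvVars(env, required = ['ELEVENLABS_API_KEY']) {
  return required.filter((name) => !env[name] || env[name].trim() === '');
}

// Return true if `ffmpeg -version` can be executed from PATH.
// spawnSync does not throw when the binary is missing; it sets result.error.
function hasFfmpeg() {
  const result = spawnSync('ffmpeg', ['-version'], { stdio: 'ignore' });
  return !result.error && result.status === 0;
}
```

A caller might run missingEnvVars(process.env) and hasFfmpeg() at startup and stop with a clear message if either check fails.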

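2. Speech-to-text (recognizing user voice messages)

2.1 Downloading a Feishu voice message

When a user sends a voice message, the bot receives a file_key. Download the audio with the following steps:

# Get a tenant_access_token
TOKEN=$(curl -s -X POST "https://open.feishu.cn/open-apis/auth/v3/tenant_access_token/internal" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d '{"app_id":"your_app_id","app_secret":"your_app_secret"}' | grep -o '"tenant_access_token":"[^"]*"' | cut -d'"' -f4)

# Download the voice file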
curl -s "https://open.feishu.cn/open-apis/im/v1/messages/{message_id}/resources/{file_key}?type=file" \
  -H "Authorization: Bearer $TOKEN" -o /path/to/voice.ogg
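
The same two requests can be sketched in Node.js, which avoids extracting the token with grep and cut by parsing the JSON response instead. This is an illustrative sketch under stated assumptions: it needs Node 18+ for the global fetch, the function names are hypothetical, and it assumes the auth response reports success with code 0 alongside tenant_access_token.

```javascript
const fs = require('fs');

const FEISHU_BASE = 'https://open.feishu.cn/open-apis';

// Build the message-resource URL used in the curl example above.
function buildResourceUrl(messageId, fileKey) {
  const id = encodeURIComponent(messageId);
  const key = encodeURIComponent(fileKey);
  return `${FEISHU_BASE}/im/v1/messages/${id}/resources/${key}?type=file`;
}

// Pull tenant_access_token out of the parsed auth response.
// Assumption: failures carry a non-zero `code` and a `msg` field.
function extractTenantToken(body) {
  if (!body || body.code !== 0 || !body.tenant_access_token) {
    const reason = body && body.msg ? body.msg : 'unknown error';
    throw new Error(`Feishu auth failed: ${reason}`);
  }
  return body.tenant_access_token;
}

// Fetch a token, then download the voice resource to outPath.
async function downloadVoice(appId, appSecret, messageId, fileKey, outPath) {
  const authRes = await fetch(`${FEISHU_BASE}/auth/v3/tenant_access_token/internal`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json; charset=utf-8' },
    body: JSON.stringify({ app_id: appId, app_secret: appSecret }),
  });
  const token = extractTenantToken(await authRes.json());

  const fileRes = await fetch(buildResourceUrl(messageId, fileKey), {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!fileRes.ok) {
    throw new Error(`Download failed: HTTP ${fileRes.status}`);
  }
  fs.writeFileSync(outPath, Buffer.from(await fileRes.arrayBuffer()));
}
```

Checking the HTTP status before writing the file keeps an error response from being saved as if it were audio.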

2.2 ElevenLabs speech-to-text

curl -s -X POST "https://api.elevenlabs.io/v1/speech-to-text?enable_logging=true" \
  -H "xi-api-key: ${ELEVENLABS_API_KEY}" \
  -F model_id="scribe_v1" \
  -F file=@/path/to/voice.ogg

The response contains a text field, which holds the recognized text.
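
As a small illustration of reading that field, the sketch below takes the response body (as a JSON string or an already parsed object) and returns the transcript. The helper name extractTranscript is hypothetical, and the only response detail it relies on is the text field mentioned above.

```javascript
// Return the recognized text from a speech-to-text response body.
// Accepts either the raw JSON string or an already parsed object.
function extractTranscript(responseBody) {
  const parsed = typeof responseBody === 'string' ? JSON.parse(responseBody) : responseBody;
  if (!parsed || typeof parsed.text !== 'string') {
    throw new Error('Speech-to-text response has no "text" field');
  }
  return parsed.text.trim();
}
```

Failing loudly when text is missing makes an API error (for example an invalid key) visible instead of passing undefined to the next step.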

3. Text-to-speech

3.1 Generating speech with ElevenLabs TTS

curl -s -X POST "https://api.elevenlabs.io/v1/text-to-speech/pNInz6obpgDQGcFmaJgB" \
  -H "Content-Type: application/json" \
  -H "xi-api-key: ${ELEVENLABS_API_KEY}" \
  -d '{
    "text": "Text to convert",
    "model_id": "eleven_multilingual_v2"
  }' -o /path/to/output.mp3
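
Note that curl -o writes whatever the server returns, so an error response can end up saved as output.mp3. A hedged Node.js sketch of the same request with explicit checks follows; it assumes Node 18+ for the global fetch, reuses the voice ID and model_id from the curl example, and the names buildTtsRequest and synthesize are hypothetical.

```javascript
const fs = require('fs');

// Build the URL and fetch options for the text-to-speech call shown above.
function buildTtsRequest(text, apiKey, voiceId = 'pNInz6obpgDQGcFmaJgB') {
  if (!apiKey) {
    throw new Error('ELEVENLABS_API_KEY is not set');
  }
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', 'xi-api-key': apiKey },
      body: JSON.stringify({ text, model_id: 'eleven_multilingual_v2' }),
    },
  };
}

// Send the request and write the audio to outPath only if the call succeeded
// and the body is non-empty. Returns the number of bytes written.
async function synthesize(text, outPath, apiKey = process.env.ELEVENLABS_API_KEY) {
  const { url, options } = buildTtsRequest(text, apiKey);
  const res = await fetch(url, options);
  if (!res.ok) {
    throw new Error(`TTS failed: HTTP ${res.status} ${await res.text()}`);
  }
  const audio = Buffer.from(await res.arrayBuffer());
  if (audio.length === 0) {
    throw new Error('TTS returned an empty body');
  }
  fs.writeFileSync(outPath, audio);
  return audio.length;
}
```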

3.2 Converting to a Feishu-compatible format

Feishu voice messages must be in Ogg/Opus format, so convert the generated MP3 with FFmpeg:

ffmpeg -i input.mp3 -ar 16000 -ac 1 -acodec libopus output.ogg -y
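
The send step in section 4 also needs the clip length in milliseconds. One way to get it, sketched here as an assumption rather than the skill's own method, is to run the conversion from Node.js and then ask ffprobe (normally installed together with ffmpeg) for the duration. The function names are hypothetical.

```javascript
const { spawnSync } = require('child_process');

// Same arguments as the ffmpeg command above: 16 kHz, mono, Opus codec.
function buildFfmpegArgs(inputPath, outputPath) {
  return ['-i', inputPath, '-ar', '16000', '-ac', '1', '-acodec', 'libopus', outputPath, '-y'];
}

// Convert ffprobe's duration output (seconds, as text) to whole milliseconds.
function secondsToMs(secondsText) {
  const seconds = Number.parseFloat(secondsText);
  if (!Number.isFinite(seconds) || seconds < 0) {
    throw new Error(`Unexpected duration: ${secondsText}`);
  }
  return Math.round(seconds * 1000);
}

// Convert inputPath to Ogg/Opus and return the output duration in ms.
function convertToOpus(inputPath, outputPath) {
  const conv = spawnSync('ffmpeg', buildFfmpegArgs(inputPath, outputPath), { encoding: 'utf8' });
  if (conv.error || conv.status !== 0) {
    throw new Error(`ffmpeg failed: ${conv.error ? conv.error.message : conv.stderr}`);
  }
  const probe = spawnSync(
    'ffprobe',
    ['-v', 'error', '-show_entries', 'format=duration',
      '-of', 'default=noprint_wrappers=1:nokey=1', outputPath],
    { encoding: 'utf8' }
  );
  if (probe.error || probe.status !== 0) {
    throw new Error(`ffprobe failed: ${probe.error ? probe.error.message : probe.stderr}`);
  }
  return secondsToMs(probe.stdout.trim());
}
```

Passing the arguments as an array avoids shell quoting problems when file paths contain spaces.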

4. Sending a voice message (Feishu)

4.1 Node.js implementation

const { Client } = require('@larksuiteoapi/node-sdk');
const fs = require('fs');

const client = new Client({
  appId: 'your_app_id',
  appSecret: 'your_app_secret',
});

async function sendVoice(filePath, durationMs, receiveId) {
  // 1. Upload the voice file
  const uploadRes = await client.im.file.create({
    data: {
      file_type: 'opus',
      file_name: 'voice.ogg',
      file: fs.createReadStream(filePath),
      duration: durationMs
    }
  });
  
  const fileKey = uploadRes.file_key;
  
  // 2. Send the voice message
  const sendRes = await client.im.message.create({
    params: { receive_id_type: 'open_id' },
    data: {
      receive_id: receiveId,
      msg_type: 'audio',
      content: JSON.stringify({ file_key: fileKey, duration: durationMs })
    }
  });
  
  return sendRes;
}

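5. Common issues

5.1 Voice download fails

Error: "The app is not the resource sender"

Cause: a Feishu security restriction; a bot can only download files that it sent itself.

Fix: the user forwards the voice message to the bot (after forwarding, the bot becomes the sender).

5.2 TTS output file is empty

Check: confirm that ELEVENLABS_API_KEY is set and that the account has remaining credit.

5.3 Voice message will not play

Check:

Whether the file is in Ogg/Opus format
Whether the duration parameter is correct
Whether the file is in an allowed directory (the workspace directory)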
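
The first two checks can be automated before upload. The sketch below is an assumption-laden illustration: it relies on the Ogg container starting with the bytes "OggS" and on the Opus identification header "OpusHead" appearing in the first page, and it treats duration as a positive integer number of milliseconds, as the durationMs parameter in section 4 suggests. The function names are hypothetical.

```javascript
const fs = require('fs');

// True if the bytes start with the Ogg capture pattern "OggS" and the
// beginning of the stream contains the Opus identification header.
function looksLikeOggOpus(bytes) {
  if (bytes.length < 4 || bytes.subarray(0, 4).toString('latin1') !== 'OggS') {
    return false;
  }
  return bytes.subarray(0, 128).includes('OpusHead');
}

// Return a list of human-readable problems for a file about to be sent.
// An empty list means both checks passed.
function checkVoiceFile(filePath, durationMs) {
  if (!fs.existsSync(filePath)) {
    return [`file not found: ${filePath}`];
  }
  const problems = [];
  const head = Buffer.alloc(128);
  const fd = fs.openSync(filePath, 'r');
  const bytesRead = fs.readSync(fd, head, 0, 128, 0);
  fs.closeSync(fd);
  if (!looksLikeOggOpus(head.subarray(0, bytesRead))) {
    problems.push('not an Ogg/Opus file');
  }
  if (!Number.isInteger(durationMs) || durationMs <= 0) {
    problems.push('duration must be a positive integer (ms)');
  }
  return problems;
}
```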

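5.4 Message blocked for being too long

DingTalk: a single message longer than about 7,000 characters is blocked and must be split into several messages.
Feishu: a length limit applies as well.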
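
Splitting can be sketched as a small helper. The default of 7000 comes from the DingTalk figure above; the exact Feishu limit is not stated in this document, so the limit is left as a parameter. The name splitMessage is hypothetical.

```javascript
// Split text into chunks of at most maxLen characters.
// Array.from iterates by code point, so emoji and other surrogate pairs
// are never cut in half.
function splitMessage(text, maxLen = 7000) {
  if (!Number.isInteger(maxLen) || maxLen <= 0) {
    throw new RangeError('maxLen must be a positive integer');
  }
  const chars = Array.from(text);
  const chunks = [];
  for (let i = 0; i < chars.length; i += maxLen) {
    chunks.push(chars.slice(i, i + maxLen).join(''));
  }
  return chunks;
}
```

Each chunk would then be sent as its own message, in order. A production version might prefer to break at sentence or paragraph boundaries rather than at a fixed offset.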

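6. Feishu permission configuration

The following permissions are required:

im:message - sending and receiving messages
im:resource - file/media resources
im:resource:download - downloading message resources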

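7. End-to-end flow example

User sends a voice message
    ↓
1. Get the message_id and file_key
2. Download the voice file (type=file)
3. ElevenLabs speech-to-text → understand the content
4. Generate the reply text
5. ElevenLabs TTS generates the speech
6. FFmpeg converts it to Ogg format
7. Upload and send the voice message to the user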
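
The seven steps can be sketched as a single orchestration function. This is only an outline under stated assumptions: the event shape (messageId, fileKey, senderId) and the steps object with its six callbacks are hypothetical, and each callback stands in for one of the operations described in sections 2 to 4.

```javascript
// Run the voice round trip in the order listed above.
// `steps` supplies one async function per operation so that each piece
// can be swapped or tested on its own.
async function handleVoiceMessage(event, steps) {
  const { messageId, fileKey, senderId } = event;                 // step 1
  const inputPath = await steps.download(messageId, fileKey);     // step 2
  const userText = await steps.transcribe(inputPath);             // step 3
  const replyText = await steps.reply(userText);                  // step 4
  const mp3Path = await steps.synthesize(replyText);              // step 5
  const { oggPath, durationMs } = await steps.convert(mp3Path);   // step 6
  await steps.send(oggPath, durationMs, senderId);                // step 7
  return { userText, replyText };
}
```

Because every step is awaited in sequence, a failure in any one of them rejects the returned promise and the later steps do not run.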

8. Related file locations

Temporary voice files: /root/.openclaw/workspace/
TTS conversion: requires ffmpeg

Last updated: 2026-02-23

Security recommendations

This skill appears to do what it says (Feishu <-> ElevenLabs voice flows) but has some implementation and setup gaps you should address before installing or running it:

- Feishu credentials: SKILL.md requires app_id/app_secret (used to obtain tenant_access_token) and the Node.js sample uses appId/appSecret, but the manifest does not declare or document these environment variables. Treat this as required credentials and only provide them after verifying who/what will store and use them.
- ELEVENLABS key: The skill expects ELEVENLABS_API_KEY. Keep it in a secure secret store, and monitor usage and billing on ElevenLabs.
- System install: The instructions call apt-get install -y ffmpeg. Running apt-get requires appropriate privileges and may not be appropriate on all hosts. If you cannot or do not want to install system packages, arrange an alternative (preinstalled ffmpeg, containerized execution, or use a hosted conversion service).
- Workspace path and file handling: The doc references /root/.openclaw/workspace/ and temporary files. Confirm where temporary media will be stored and ensure least-privilege file paths (avoid using root-owned directories if not necessary).
- Node runtime and dependencies: The Node.js example uses @larksuiteoapi/node-sdk but the manifest doesn't declare runtime requirements. If you plan to run the Node snippet, ensure Node and the SDK are installed securely and that dependency installation is reviewed.
- Operational safety: Because the agent can run instructions, confirm whether the agent will execute commands autonomously. If you do not want autonomous installs or network calls, restrict invocation or require manual approval.

What would increase confidence: a corrected manifest that declares required Feishu env vars (e.g., FEISHU_APP_ID, FEISHU_APP_SECRET or explicit guidance to use tenant_access_token), explicit runtime requirements (Node, ffmpeg), and clear secure-handling instructions for credentials and temporary files.
If you cannot verify these, treat the skill with caution and avoid providing production credentials until you have validated the code in a controlled environment.
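
One way to avoid inline credentials is to read them from the environment and fail fast when they are missing. The sketch below uses the variable names FEISHU_APP_ID and FEISHU_APP_SECRET suggested above; these names are an assumption, since the skill's manifest does not declare them, and the helper name loadFeishuCredentials is hypothetical.

```javascript
// Read Feishu app credentials from environment variables instead of
// hard-coding them in the script. Throws if either value is missing.
function loadFeishuCredentials(env = process.env) {
  const appId = env.FEISHU_APP_ID;
  const appSecret = env.FEISHU_APP_SECRET;
  if (!appId || !appSecret) {
    throw new Error('Set FEISHU_APP_ID and FEISHU_APP_SECRET before starting');
  }
  return { appId, appSecret };
}
```

The returned object has the same appId/appSecret keys that the Node.js sample passes to the Lark SDK Client constructor, so it could be supplied there in place of literal strings.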

Functional analysis

Type: OpenClaw Skill
Name: feishu-voice-lobster
Version: 1.0.0

The skill bundle is classified as suspicious due to its reliance on system-level commands and network interactions, which, while seemingly aligned with its stated purpose, introduce significant attack surface. Specifically, `SKILL.md` instructs the agent to execute `apt-get update && apt-get install -y ffmpeg`, granting package management capabilities, and uses multiple `curl` commands to interact with external APIs (Feishu, ElevenLabs) and download files.

Although these actions are necessary for the skill's functionality (voice processing and integration), the ability to execute arbitrary shell commands and make network requests represents a high-risk capability that could be exploited if the agent's input is compromised or if the skill's instructions were subtly altered for malicious intent. There is no clear evidence of intentional malicious behavior like data exfiltration to unauthorized endpoints or persistence mechanisms.

Capability assessment

⚠ Purpose & Capability

The skill claims to implement Feishu voice upload/download, STT, and TTS using ElevenLabs. ElevenLabs API key is declared in skill.json and SKILL.md, which is expected. However, the runtime instructions require Feishu app_id/app_secret (to get tenant_access_token) and a Node.js Lark SDK usage, but the skill's manifest does not declare any required Feishu credentials or Node/runtime requirements. Requiring Feishu credentials is reasonable for the stated purpose, but failing to declare them in metadata is an incoherence that could lead to missing expectations or accidental secret exposure.

ℹ Instruction Scope

SKILL.md gives concrete shell/Node.js steps: fetching a tenant_access_token with app_id/app_secret, downloading message resources, calling ElevenLabs speech-to-text and TTS endpoints, converting audio with ffmpeg, and uploading via the Lark SDK. All network calls go to expected endpoints (open.feishu.cn and api.elevenlabs.io). However, the instructions recommend running system package installs (apt-get install ffmpeg) and reference a root workspace path (/root/.openclaw/workspace/). They also mention using app credentials inline (no guidance on secure storage). These runtime actions expand scope to system-level operations and require care with privileges and secret handling.

ℹ Install Mechanism

There is no declared install spec (instruction-only skill), which is low risk by itself. But SKILL.md instructs executing apt-get update && apt-get install -y ffmpeg to obtain ffmpeg. That is a system package install step the agent/operator would need to perform; it is not encoded in the skill manifest and requires elevated privileges on many hosts. No URLs, archive downloads, or obscure installers are present.

⚠ Credentials

skill.json and SKILL.md declare ELEVENLABS_API_KEY (appropriate). But SKILL.md also requires Feishu app_id/app_secret and Node.js credentials (appId/appSecret) for the upload/send flow, yet those environment variables are not listed in the manifest's required env or primary credential. This mismatch is problematic: the skill needs Feishu credentials to function but doesn't declare them, which can lead to unclear setup and possible ad-hoc credential handling. The skill also references storing temporary files under /root/.openclaw/workspace/, which raises questions about file-permission expectations.

✓ Persistence & Privilege

The skill is not marked always:true, does not request persistent presence, and contains no code that modifies other skills or global agent settings. It only provides runtime instructions and example code snippets. There is no indication of self-enablement or privileged persistence.

Version history

v1.0.0

Feishu Voice Skill 1.0.0: Initial Release

- Enables voice-to-text (speech recognition) and text-to-voice (TTS) conversion between Feishu and ElevenLabs.
- Supports downloading and uploading of Feishu voice messages.
- Provides detailed setup instructions for ElevenLabs API and FFmpeg requirements.
- Includes step-by-step usage examples for both voice recognition and speech synthesis workflows.
- Documents common issues and required bot permissions for smooth operation.

Metadata

Slug feishu-voice-lobster

Version 1.0.0

License: not specified

Total installs 2

Current installs 2

Number of versions 1

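FAQ

What is Feishu Voice?

It uploads and downloads Feishu voice messages, converts speech to text and text to speech, and integrates with the ElevenLabs voice service. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 606 times to date.

How do I install Feishu Voice?

Run the command "/install feishu-voice-lobster" in the OpenClaw or Claude Code chat box to install it in one step. The skill itself still needs the ElevenLabs API key, the Feishu app credentials, and FFmpeg described in the usage instructions.

Is Feishu Voice free?

Yes. Feishu Voice is free and open source, and you can download, install, and use it freely. Use of the ElevenLabs API is billed separately by ElevenLabs.

Which platforms does Feishu Voice support?

Feishu Voice is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed. The FFmpeg install command in the usage instructions assumes a host with apt-get.

Who developed Feishu Voice?

It is developed and maintained by godzff (@godzff); the current version is v1.0.0.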
