adriel1006

Discord Voice Using Deepgram

Author: adriel1006 · GitHub · v1.0.0
cross-platform ⚠ suspicious
Total downloads: 1448
Favorites: 5
Current installs: 1
Versions: 1
Install in OpenClaw
/install discord-voice-deepgram
Description
Voice-channel conversations in Discord using Deepgram streaming STT + low-latency TTS
Usage instructions (SKILL.md)

Deepgram Discord Voice (Clawdbot/OpenClaw Plugin)

This plugin lets you talk to your agent by voice from a Discord voice channel.

Pipeline (low latency):

  • Discord voice audio → Deepgram streaming STT (WebSocket)
  • Transcript → your agent
  • Agent reply → Deepgram TTS (/v1/speak, streamed over HTTP as Ogg/Opus)
  • Audio played back into the voice channel
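
As a rough illustration of the first pipeline step, the sketch below builds the WebSocket URL for Deepgram's streaming `/v1/listen` endpoint. The helper name `buildListenUrl`, the `SttOptions` shape, and the choice of 16-bit PCM at 48 kHz stereo (what Discord voice audio typically decodes to) are assumptions for illustration, not taken from the plugin source.

```typescript
// Hypothetical helper: not part of the plugin, shown only to illustrate the STT step.
interface SttOptions {
  model: string;      // e.g. "nova-2" (the documented default)
  language?: string;  // optional BCP-47 tag such as "en-US"
  sampleRate: number; // assumed 48000 for decoded Discord audio
  channels: number;   // assumed 2 (stereo)
}

function buildListenUrl(opts: SttOptions): string {
  const params: Array<[string, string]> = [
    ["model", opts.model],
    ["encoding", "linear16"], // assumption: audio is decoded to 16-bit PCM first
    ["sample_rate", String(opts.sampleRate)],
    ["channels", String(opts.channels)],
    ["interim_results", "true"], // partial transcripts help keep latency low
  ];
  if (opts.language) {
    params.push(["language", opts.language]);
  }
  const query = params
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join("&");
  return `wss://api.deepgram.com/v1/listen?${query}`;
}
```

A WebSocket client would open this URL with an `Authorization: Token <DEEPGRAM_API_KEY>` header and then send audio frames as binary messages.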

Requirements

  • A Discord bot token (DISCORD_TOKEN)
  • A Deepgram API key (DEEPGRAM_API_KEY)
  • Discord bot permissions in your server:
    • Connect
    • Speak
    • Use Voice Activity
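
A startup check along these lines could catch missing credentials early. This is a minimal sketch: the function name `missingEnv` is hypothetical, and the plugin may also accept the same values from its config rather than from the environment.

```typescript
// The two variable names come from the requirements above; the helper itself is illustrative.
const REQUIRED_ENV = ["DISCORD_TOKEN", "DEEPGRAM_API_KEY"] as const;

// Returns the names of required variables that are unset or blank.
function missingEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV.filter((name) => (env[name] ?? "").trim() === "");
}

// Typical use in Node.js: const missing = missingEnv(process.env);
```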

Install

Option A: Install from ClawHub

  1. In your OpenClaw/Clawdbot dashboard, open Skills/Plugins.
  2. Add/install deepgram-discord-voice.
  3. Set the required environment variables.

Option B: Manual install

  1. Copy this folder into your extensions/plugins directory.
  2. Run:
     npm install
  3. Restart OpenClaw/Clawdbot.

Configuration

Key settings

  • primaryUser (recommended): Who the bot listens to by default.

    • Best: your Discord user ID (numeric)
    • Also supported: username/display name (e.g., atechy) if unique in-channel
  • allowVoiceSwitch: If true, the primary user can switch who is allowed by voice.

  • wakeWord: Prefix for voice control commands. Default: openclaw.

  • deepgram.sttModel: Default nova-2.

  • deepgram.language: Optional BCP‑47 language tag (e.g., en-US, es, es-EC).

  • ttsVoice: Deepgram Aura voice model (e.g., aura-2-thalia-en).
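
The settings above can be modeled as a typed object with defaults filled in. In this sketch only the `wakeWord` and `deepgram.sttModel` defaults come from the list above; the defaults for `allowVoiceSwitch` and `ttsVoice`, and the names `VoiceConfig` and `withDefaults`, are assumptions for illustration.

```typescript
interface VoiceConfig {
  primaryUser?: string; // Discord user ID (preferred) or username/display name
  allowVoiceSwitch: boolean;
  wakeWord: string;
  ttsVoice: string;
  deepgram: { sttModel: string; language?: string };
}

type VoiceConfigInput = Partial<Omit<VoiceConfig, "deepgram">> & {
  deepgram?: Partial<VoiceConfig["deepgram"]>;
};

// Hypothetical helper: merges user-supplied settings with defaults.
function withDefaults(input: VoiceConfigInput): VoiceConfig {
  return {
    primaryUser: input.primaryUser,
    allowVoiceSwitch: input.allowVoiceSwitch ?? false, // assumed default
    wakeWord: input.wakeWord ?? "openclaw",            // documented default
    ttsVoice: input.ttsVoice ?? "aura-2-thalia-en",    // assumed default (the documented example voice)
    deepgram: {
      sttModel: input.deepgram?.sttModel ?? "nova-2",  // documented default
      language: input.deepgram?.language,
    },
  };
}
```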

Example config

{
  "plugins": {
    "entries": {
      "deepgram-discord-voice": {
        "enabled": true,
        "config": {
          "streamingSTT": true,
          "streamingTTS": true,

          "primaryUser": "atechy",
          "allowVoiceSwitch": true,
          "wakeWord": "openclaw",

          "ttsVoice": "aura-2-thalia-en",
          "vadSensitivity": "medium",
          "bargeIn": true,

          "deepgram": {
            "sttModel": "nova-2",
            "language": "en-US"
          }
        }
      }
    }
  }
}

Usage

Join a voice channel

Use the plugin tool or slash command (depends on your OpenClaw setup):

  • Join: action=join with the channelId
  • Leave: action=leave

Talk (voice channel)

Once the bot is connected, just speak.

Safeguard: only listen to you (default)

When primaryUser is set, the plugin will only listen to that user unless you allow someone else.
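
The safeguard can be pictured as a filter applied to each speaker before audio is forwarded. The sketch below is an assumption about how such a filter might look, not the plugin's actual code: the `Speaker` shape and function names are hypothetical, and listening to everyone when `primaryUser` is unset is a guess.

```typescript
interface Speaker {
  id: string;          // Discord user ID (numeric string)
  username: string;
  displayName: string;
}

// A reference matches by ID when it is purely numeric, otherwise by name (case-insensitive).
function matchesUser(speaker: Speaker, ref: string): boolean {
  const wanted = ref.trim().toLowerCase();
  if (/^\d+$/.test(wanted)) {
    return speaker.id === wanted;
  }
  return (
    speaker.username.toLowerCase() === wanted ||
    speaker.displayName.toLowerCase() === wanted
  );
}

function shouldListen(
  speaker: Speaker,
  primaryUser: string | undefined,
  allowed: string[],
): boolean {
  if (!primaryUser) return true; // assumption: no safeguard when primaryUser is unset
  return [primaryUser, ...allowed].some((ref) => matchesUser(speaker, ref));
}
```

Matching on a numeric ID avoids ambiguity when two members share a display name, which is why the ID form is recommended for `primaryUser`.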

Let someone else talk (voice commands)

As the primary user, say:

  • openclaw allow <name>
  • openclaw listen to <name>

To lock it back:

  • openclaw only me
  • openclaw reset
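
The voice commands above could be recognized from a transcript roughly as follows. This is a sketch under assumptions: the parser name and return shape are hypothetical, "only me" and "reset" are treated as equivalent, and real transcripts may need fuzzier matching than a simple prefix check.

```typescript
type VoiceCommand =
  | { kind: "allow"; name: string }
  | { kind: "only_me" };

// Returns a command when the transcript starts with the wake word, otherwise null.
function parseVoiceCommand(
  transcript: string,
  wakeWord: string = "openclaw",
): VoiceCommand | null {
  // Normalize case and drop trailing punctuation that STT often appends.
  const text = transcript.trim().toLowerCase().replace(/[.,!?]+$/, "");
  const prefix = wakeWord.toLowerCase() + " ";
  if (!text.startsWith(prefix)) return null;

  const rest = text.slice(prefix.length).trim();
  if (rest === "only me" || rest === "reset") {
    return { kind: "only_me" };
  }
  const match = /^(?:allow|listen to)\s+(.+)$/.exec(rest);
  const name = match?.[1];
  return name ? { kind: "allow", name } : null;
}
```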

Switch via tool actions (optional)

  • allow_speaker with user (id / @mention / name)
  • only_me
  • status
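
One way to think about these tool actions is as transitions on a small piece of listening state. The sketch below is illustrative only: the `ListenState` shape, the `applyAction` name, and the response messages are assumptions, and the plugin's real handler may differ.

```typescript
type ToolAction =
  | { action: "allow_speaker"; user: string } // id, @mention, or name
  | { action: "only_me" }
  | { action: "status" };

interface ListenState {
  primaryUser: string;
  allowed: string[]; // extra speakers permitted besides the primary user
}

function applyAction(
  state: ListenState,
  input: ToolAction,
): { state: ListenState; message: string } {
  switch (input.action) {
    case "allow_speaker": {
      // Unwrap a Discord mention such as <@123> or <@!123> to the bare ID.
      const user = input.user.replace(/^<@!?(\d+)>$/, "$1");
      const allowed = state.allowed.includes(user)
        ? state.allowed
        : [...state.allowed, user];
      return { state: { ...state, allowed }, message: `Now also listening to ${user}` };
    }
    case "only_me":
      return {
        state: { ...state, allowed: [] },
        message: "Listening to the primary user only",
      };
    case "status":
      return {
        state,
        message: `Primary: ${state.primaryUser}; allowed: ${state.allowed.join(", ") || "none"}`,
      };
  }
}
```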

Notes

  • Lowest latency comes from streamingSTT=true and streamingTTS=true.
  • Deepgram TTS is streamed over HTTP in Ogg/Opus so Discord can play it immediately.
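
To make the TTS note concrete, the sketch below assembles a request for Deepgram's `/v1/speak` endpoint that asks for Opus audio in an Ogg container. The helper name `buildSpeakRequest` and the `SpeakRequest` shape are hypothetical; the plugin's own request code is not shown in this document.

```typescript
interface SpeakRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: builds the HTTP request without sending it.
function buildSpeakRequest(apiKey: string, text: string, voice: string): SpeakRequest {
  const query = `model=${encodeURIComponent(voice)}&encoding=opus&container=ogg`;
  return {
    url: `https://api.deepgram.com/v1/speak?${query}`,
    method: "POST",
    headers: {
      Authorization: `Token ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ text }),
  };
}
```

The result can be passed to `fetch(req.url, req)`. Reading the response body as a stream, rather than waiting for the full file, is what lets playback start before synthesis finishes.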
Security recommendations
Before installing, be aware of these points:

  • Credentials: The plugin requires a Discord bot token and a Deepgram API key. Decide whether those keys will be stored in OpenClaw/Clawdbot config or environment variables; prefer least-privilege tokens for the Discord bot (only Connect/Speak/Voice Activity), and don't reuse high-privilege tokens.
  • Agent access: Voice input is forwarded into your embedded agent via runEmbeddedPiAgent. The plugin intentionally supplies an extra system prompt and does not enforce a restrictive 'lane', meaning the invoked agent may have access to its usual tools and persisted session data. If your agent has tools that can access external services or secrets, voice input could indirectly trigger them. If you don't want that, do not enable this plugin, or inspect/modify the runEmbeddedPiAgent call to restrict tool access.
  • Session persistence: Transcripts and session IDs are stored via the platform session store/workspace. If you handle sensitive conversations, verify where the session store is located and who can read it.
  • Test safely: Try this in a throwaway Discord server with a bot that has minimal permissions and with a non-production Deepgram key. Review and, if needed, modify the code to (a) explicitly restrict which tools the agent may use when invoked by voice, (b) avoid sending or persisting sensitive context, and (c) require an explicit opt-in to auto-join channels.
  • If you lack trust in the source (homepage unknown, owner ID only): prefer official/verified plugins or conduct a code review. The behavior is plausible for the stated purpose, but the privilege surface (embedded agent invocation + persisted sessions + undocumented system-prompt injection) merits caution.
Functional analysis
Type: OpenClaw Skill
Name: discord-voice-deepgram
Version: 1.0.0

The OpenClaw Deepgram Discord Voice skill bundle appears benign. It integrates Discord voice channels with Deepgram STT/TTS and the OpenClaw agent. All network calls are directed to legitimate Discord and Deepgram APIs. Sensitive API keys (DISCORD_TOKEN, DEEPGRAM_API_KEY) are handled via standard configuration or environment variables. The `index.ts` file passes transcribed user input to the core agent, which is an inherent prompt injection risk in any LLM-based system, but the plugin includes an `extraSystemPrompt` to guide the agent's behavior, acting as a defense rather than an attack. There is no evidence of data exfiltration, unauthorized execution, persistence mechanisms, or obfuscation. The `SKILL.md` provides instructions for users and the OpenClaw platform, not malicious directives for the AI agent.
Capability assessment
Purpose & Capability
Name/description match the code: this is a Discord voice plugin that uses Deepgram for STT/TTS and routes transcripts to the agent. However, the registry metadata listed no required env vars while the SKILL.md and code expect a Discord token and a Deepgram API key (DISCORD_TOKEN / DEEPGRAM_API_KEY or config.deepgram.apiKey). That mismatch is an inconsistency to be aware of.
Instruction Scope
The SKILL.md and code instruct the plugin to join voice channels, stream audio to Deepgram, and forward transcripts to the embedded agent. The code builds an extraSystemPrompt and calls runEmbeddedPiAgent (the agent is told it has access to its normal tools/skills and the user's Discord ID). The plugin also reads/writes the session store and agent workspace via core-bridge. Those actions go beyond simple STT/TTS plumbing because they give the invoked agent contextual info and access to its usual toolset and persisted session data — a potential surprise/privilege escalation if you weren't expecting that.
Install Mechanism
This is effectively an instruction-plus-source package (package.json present). There's no packaged install spec in the registry, so install is manual via npm (npm install). Dependencies are standard npm packages (discord.js, @discordjs/voice, ws, etc.) from normal registries — no obscure download URLs or extract steps were found.
Credentials
The plugin legitimately needs a Discord bot token and a Deepgram API key. The code reads Deepgram keys from config or environment and attempts to get the Discord token from the host OpenClaw/Clawdbot main config (mainConfig.channels.discord.token or mainConfig.discord.token) rather than directly requiring an env var. This is plausible but should be called out: the plugin expects access to your platform's Discord token storage and may also read Deepgram keys from env/config, so credential placement matters.
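
The lookup described above might resemble the following sketch. The config paths (`config.deepgram.apiKey`, `mainConfig.channels.discord.token`, `mainConfig.discord.token`) are the ones named in this review; the precedence order, the fallback to `DISCORD_TOKEN` in the environment, and the function name `resolveCredentials` are assumptions for illustration.

```typescript
interface PluginConfig {
  deepgram?: { apiKey?: string };
}

interface MainConfig {
  channels?: { discord?: { token?: string } };
  discord?: { token?: string };
}

// Hypothetical resolver: returns undefined for any credential it cannot find.
function resolveCredentials(
  plugin: PluginConfig,
  main: MainConfig,
  env: Record<string, string | undefined>,
): { deepgramApiKey?: string; discordToken?: string } {
  return {
    // Assumed order: plugin config first, then environment.
    deepgramApiKey: plugin.deepgram?.apiKey ?? env["DEEPGRAM_API_KEY"],
    // Assumed order: host config paths first, then environment.
    discordToken:
      main.channels?.discord?.token ??
      main.discord?.token ??
      env["DISCORD_TOKEN"],
  };
}
```

Knowing which source wins matters when rotating keys: a stale value in the host config would shadow a fresh environment variable under this ordering.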
Persistence & Privilege
The plugin loads Clawdbot core modules and uses them to resolve agent workspace, session store, and to run an embedded agent. It also creates/updates session entries (saving a session store). It intentionally removed a commented-out 'lane' restriction and passes an extra system prompt telling the agent it 'has access to all your normal tools and skills'. That combination (embedded agent invocation + persisted session state + broad tool access) increases the blast radius of voice-triggered operations and is not clearly surfaced in SKILL.md.
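How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install discord-voice-deepgram
  3. Once installed, invoke the Skill by name or trigger it with /discord-voice-deepgram
  4. Provide the inputs described in the Skill's parameter documentation to get structured output.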
Version history
v1.0.0
Initial release of Deepgram Discord Voice skill for Clawdbot/OpenClaw.

  • Enables real-time voice conversations in Discord voice channels using Deepgram streaming STT and low-latency TTS.
  • Configurable to listen only to a specified primary user, with optional voice-activated speaker switching.
  • Supports wake-word activated voice commands for controlling permissions.
  • Provides example configurations and clear setup instructions.
  • Requires DISCORD_TOKEN and DEEPGRAM_API_KEY environment variables.
  • Delivers agent replies via Deepgram Aura TTS, streamed directly to the channel for minimal latency.
Metadata
Slug: discord-voice-deepgram
Version: 1.0.0
License: (none listed)
Total installs: 1
Current installs: 1
Versions: 1
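FAQ

What is Discord Voice Using Deepgram?

Voice-channel conversations in Discord using Deepgram streaming STT + low-latency TTS. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 1448 total downloads to date.

How do I install Discord Voice Using Deepgram?

Run the command "/install discord-voice-deepgram" in the OpenClaw or Claude Code chat box. The plugin still needs a Discord bot token and a Deepgram API key, as listed under Requirements.

Is Discord Voice Using Deepgram free?

Yes. Discord Voice Using Deepgram is free and open source; you can download, install, and use it freely.

Which platforms does Discord Voice Using Deepgram support?

Discord Voice Using Deepgram is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.

Who develops Discord Voice Using Deepgram?

It is developed and maintained by adriel1006 (@adriel1006). The current version is v1.0.0.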
