mengbad

Deepgram Voice Workflow

Author: MengBad · GitHub · v0.1.0 · MIT-0
cross-platform ⚠ suspicious
288 total downloads · 0 favorites · 2 current installs · 1 version
Install in OpenClaw
/install deepgram-voice-workflow
Description
End-to-end voice workflow with Deepgram STT and TTS. Use when transcribing voice messages, generating spoken replies, or building a shell-based audio pipelin...
Usage (SKILL.md)

Deepgram Voice Workflow

Overview

Use this skill for a complete speech workflow:

  1. transcribe audio to text with Deepgram STT
  2. optionally synthesize a spoken reply with Deepgram TTS
  3. return structured outputs that can feed chat or agent pipelines

This skill is the right choice when the task is broader than plain transcription and needs an input-audio to output-audio pipeline.

Quick Start

Transcribe only

{baseDir}/scripts/deepgram-transcribe.sh /path/to/audio.ogg

Generate speech from text

{baseDir}/scripts/deepgram-tts.sh "你好,我是 Neko。"

Run the full pipeline

{baseDir}/scripts/neko-voice-pipeline.sh /path/to/audio.ogg --reply "收到啦,这是语音回复测试。"

Environment

Set DEEPGRAM_API_KEY before use.

The bundled scripts also fall back to reading it from:

  • /root/.openclaw/.env
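
The lookup order described above (environment variable first, then the env file) could be sketched as follows. This is a minimal illustration, not the bundled scripts' actual code: the function name `resolve_deepgram_key` is hypothetical, and it assumes the env file uses plain `DEEPGRAM_API_KEY=value` lines.

```shell
# Hypothetical helper (not part of the skill): print the API key,
# preferring the environment and falling back to an env file.
# $1 is the env file path; it defaults to the path named in SKILL.md.
resolve_deepgram_key() {
  rdk_env_file="${1:-/root/.openclaw/.env}"

  # 1. An already-set variable wins.
  if [ -n "${DEEPGRAM_API_KEY:-}" ]; then
    printf '%s\n' "$DEEPGRAM_API_KEY"
    return 0
  fi

  # 2. Otherwise take the last DEEPGRAM_API_KEY=... line from the file,
  #    stripping the variable name and any surrounding quotes.
  if [ -r "$rdk_env_file" ]; then
    rdk_key=$(sed -n 's/^DEEPGRAM_API_KEY=//p' "$rdk_env_file" | tail -n 1 | tr -d "\"'")
    if [ -n "$rdk_key" ]; then
      printf '%s\n' "$rdk_key"
      return 0
    fi
  fi

  echo "DEEPGRAM_API_KEY is not set" >&2
  return 1
}
```

Passing the file path as an argument makes it possible to point the fallback at a per-user file instead of the root-owned default.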

Workflow Decision

Use deepgram-transcribe.sh when

  • only text transcription is needed
  • the downstream system will generate its own reply
  • the task is speech-to-text only

Use deepgram-tts.sh when

  • text already exists
  • only an MP3 spoken response is needed
  • the workflow is text-to-speech only

Use neko-voice-pipeline.sh when

  • the task begins with an audio file
  • a transcript is needed
  • an optional spoken reply should be generated in the same flow
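
The decision rules above can be condensed into a small dispatcher. This is only a sketch of one reading of those rules: the function name `choose_deepgram_script` and its two arguments are hypothetical, and it assumes that audio input without a reply maps to the transcription script.

```shell
# Hypothetical helper: pick a bundled script name from the kind of input
# ("audio" or "text") and whether a spoken reply is wanted ("yes"/"no").
choose_deepgram_script() {
  cds_input_kind="$1"
  cds_want_reply="${2:-no}"

  case "$cds_input_kind:$cds_want_reply" in
    text:*)    echo "deepgram-tts.sh" ;;          # text already exists
    audio:yes) echo "neko-voice-pipeline.sh" ;;   # transcript plus spoken reply
    audio:*)   echo "deepgram-transcribe.sh" ;;   # speech-to-text only
    *)
      echo "unknown input kind: $cds_input_kind" >&2
      return 1
      ;;
  esac
}
```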

Outputs

STT output

deepgram-transcribe.sh writes:

  • transcript text file
  • raw API JSON file next to it
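
The raw JSON file can be summarized from the shell. The sketch below assumes `jq` is installed and that the file follows Deepgram's usual pre-recorded response layout (`results.channels[].alternatives[]`); the function name `summarize_deepgram_json` is hypothetical and is not provided by the skill.

```shell
# Hypothetical helper: print "<confidence><TAB><transcript>" for the first
# alternative of the first channel in a raw Deepgram response file ($1).
summarize_deepgram_json() {
  jq -r '.results.channels[0].alternatives[0] | "\(.confidence)\t\(.transcript)"' "$1"
}
```

A low confidence value next to an odd transcript is a quick hint that the model or language setting does not match the audio.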

TTS output

deepgram-tts.sh writes:

  • MP3 output file

Pipeline output

neko-voice-pipeline.sh prints JSON with:

  • out_dir
  • transcript_path
  • transcript
  • reply_audio_path

This makes it easy to wire into scripts or adapters.
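
As an illustration of that wiring, a caller could pull individual fields out of the printed JSON with `jq`. This is a sketch under assumptions: `jq` must be installed, the helper name `pipeline_field` is hypothetical, and treating a missing or null `reply_audio_path` as "no reply generated" is a guess about the pipeline's behavior.

```shell
# Hypothetical helper: read the pipeline's JSON from stdin and print one
# top-level field ($1). Prints nothing if the field is missing or null.
pipeline_field() {
  jq -r --arg k "$1" '.[$k] // empty'
}

# Example wiring (not executed here):
#   out=$({baseDir}/scripts/neko-voice-pipeline.sh /path/to/audio.ogg)
#   transcript=$(printf '%s' "$out" | pipeline_field transcript)
#   reply=$(printf '%s' "$out" | pipeline_field reply_audio_path)
```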

Typical Uses

Prefer this skill for:

  • transcribing Telegram/QQ/OneBot voice messages
  • generating MP3 replies to short voice prompts
  • building bot-side voice input/output automation
  • testing speech pipelines from shell without introducing a full SDK

Notes

  • Defaults are tuned for lightweight practical use, not maximal configurability.
  • deepgram-transcribe.sh defaults to model=nova-2 and language=zh.
  • deepgram-tts.sh defaults to model=aura-2-luna-en; override the model when a different voice is preferred.
  • Inspect the raw JSON transcript response when debugging recognition quality or API errors.
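
To make the defaults concrete, the sketch below shows how they could map onto Deepgram's REST endpoints. The `/v1/listen` and `/v1/speak` paths and the `Authorization: Token` header come from Deepgram's public API, not from this skill's scripts, so treat the mapping as an assumption about what the scripts do. All function names are hypothetical.

```shell
DEEPGRAM_API_BASE="https://api.deepgram.com/v1"

# Build the STT URL; defaults mirror the notes above (nova-2, zh).
build_listen_url() {
  printf '%s/listen?model=%s&language=%s\n' "$DEEPGRAM_API_BASE" "${1:-nova-2}" "${2:-zh}"
}

# Build the TTS URL; the default voice mirrors the notes above.
build_speak_url() {
  printf '%s/speak?model=%s\n' "$DEEPGRAM_API_BASE" "${1:-aura-2-luna-en}"
}

# Send an audio file ($1) with a content type ($2) and print the raw JSON.
# Requires DEEPGRAM_API_KEY and network access; defined here but not called.
transcribe_file() {
  curl -sS -X POST "$(build_listen_url)" \
    -H "Authorization: Token ${DEEPGRAM_API_KEY}" \
    -H "Content-Type: ${2:-audio/ogg}" \
    --data-binary @"$1"
}
```

Calling `build_listen_url nova-2 en` or `build_speak_url` with another voice name shows where an override would end up in the request.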

References

Read these files when needed:

  • references/stt-notes.md for transcription details
  • references/tts-notes.md for speech synthesis details
  • references/pipeline-notes.md for end-to-end pipeline behavior
Security recommendations
This skill appears to do what it says (call Deepgram STT/TTS and write transcripts and MP3s), but the package metadata does not declare the required DEEPGRAM_API_KEY, and the scripts will fail without it. Before installing or running:

  1. Do not put sensitive credentials into a shared root file. Prefer setting DEEPGRAM_API_KEY in the invoking user's environment rather than relying on /root/.openclaw/.env.
  2. Verify that the Deepgram API key you provide is scoped appropriately (rotate it and limit permissions where possible).
  3. Inspect the three shell scripts yourself (they are short) to confirm you are comfortable with network calls to api.deepgram.com and with files being written to /tmp or your chosen out_dir.
  4. Be cautious because the skill source/homepage is unknown. If you need stronger assurance, ask the publisher for provenance or a homepage before use.
Functional analysis
Type: OpenClaw Skill
Name: deepgram-voice-workflow
Version: 0.1.0

The scripts `deepgram-transcribe.sh` and `deepgram-tts.sh` contain shell injection vulnerabilities because command-line arguments (such as `--model`, `--language`, and `--content-type`) are expanded directly into a double-quoted string within a `curl` command. An attacker could exploit this via prompt injection to execute arbitrary commands. Furthermore, the scripts hardcode a sensitive credential lookup path at `/root/.openclaw/.env`, which is a high-privilege location. While the logic appears intended for legitimate Deepgram API integration, these significant security flaws and risky file access patterns warrant a suspicious classification.
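
One way to reduce the argument-handling risk described above is to validate option values before they reach the `curl` command. The following is a minimal sketch, not a patch to the bundled scripts: the helper names are hypothetical, and the allowed character sets are an assumption about what model names, language codes, and MIME types need.

```shell
# Hypothetical validator for --model and --language values:
# accept only letters, digits, dots, underscores, and hyphens.
is_safe_param() {
  case "$1" in
    ''|*[!A-Za-z0-9._-]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Hypothetical validator for --content-type values such as audio/ogg:
# exactly one slash, with a non-empty type and subtype.
is_safe_content_type() {
  case "$1" in
    */*/*|/*|*/|*[!A-Za-z0-9.+/-]*) return 1 ;;
    */*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A script could call `is_safe_param "$model" || exit 2` right after option parsing, so that anything containing spaces, quotes, `&`, or `$` is rejected before a request is built.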
Capability assessment
Purpose & Capability
Name, description, and included scripts are consistent with an end-to-end Deepgram STT/TTS pipeline; however the registry metadata declares no required environment variables or primary credential while the runtime explicitly requires DEEPGRAM_API_KEY. That mismatch is an incoherence that could surprise users.
Instruction Scope
Runtime instructions and bundled scripts are narrowly scoped to: read an input audio file, call api.deepgram.com, and write transcript/MP3 outputs. However the scripts also look for /root/.openclaw/.env as a fallback for DEEPGRAM_API_KEY (documented in SKILL.md). Reading a root-level config file is outside the minimal scope and was not declared in the registry metadata.
Install Mechanism
No install spec (instruction-only with bundled shell scripts). No remote downloads, no package installs, and no code obfuscation — this is low-risk from an install mechanism perspective.
Credentials
The scripts require a Deepgram API token (DEEPGRAM_API_KEY) to function, but the skill registry lists no required env/primary credential. The fallback that reads /root/.openclaw/.env is a privileged file path not declared. Requesting a single Deepgram key is proportionate to the stated purpose, but the undeclared root-level config access is problematic.
Persistence & Privilege
The skill does not request persistent/system-wide privileges (always=false). It does not modify other skills or system configs. It creates local output files (under /tmp or user-specified directories) which is expected.
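How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install deepgram-voice-workflow
  3. Once installed, invoke the Skill by name or trigger it with /deepgram-voice-workflow.
  4. Provide the inputs the Skill's parameter documentation asks for to get structured output.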
Version history
v0.1.0
Initial public release
Metadata
Slug deepgram-voice-workflow
Version 0.1.0
License MIT-0
Total installs 2
Current installs 2
Version count 1
FAQ

What is Deepgram Voice Workflow?

End-to-end voice workflow with Deepgram STT and TTS. Use when transcribing voice messages, generating spoken replies, or building a shell-based audio pipelin... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 288 total downloads to date.

How do I install Deepgram Voice Workflow?

Run the command "/install deepgram-voice-workflow" in the OpenClaw or Claude Code chat box to install it in one step. The scripts still need DEEPGRAM_API_KEY to be set before use.

Is Deepgram Voice Workflow free?

Yes. Deepgram Voice Workflow is free to download, install, and use under the MIT-0 license.

Which platforms does Deepgram Voice Workflow support?

Deepgram Voice Workflow is listed as cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Deepgram Voice Workflow?

It is developed and maintained by MengBad (@mengbad). The current version is v0.1.0.
