openclaw-voice

by frank-bot07 · GitHub ↗ · v1.0.0
cross-platform ⚠ suspicious
Downloads: 722
Stars: 2
Active Installs: 4
Versions: 1
Install in OpenClaw
/install openclaw-voice
Description
Transcribe audio to text and generate spoken AI responses using Whisper and ElevenLabs via CLI with transcript storage and search.
Usage Guidance
What doesn't add up: the README and SKILL.md promise Whisper/ElevenLabs STT/TTS and a Twilio/real-time calling roadmap, but the shipped code only implements a CLI-backed SQLite transcript/profile manager and an 'interchange' MD generator. Before installing or supplying API keys:
- Ask the author which features are implemented now vs. planned. If you expect live STT/TTS or calling, confirm where that code lives and how it will be executed.
- Treat the interchange/voice directory as potentially public to other local skills: it writes MD summaries into the workspace root. If transcripts are sensitive, run the skill in an isolated workspace or change the output path.
- Don't provide Twilio/ElevenLabs/Anthropic keys until you see explicit code that uses them and you understand where audio/text will be sent and stored.
- If you plan to run npm install, be aware that better-sqlite3 has native build steps (normal, but it requires build tooling).
Given the mismatches, proceed carefully and request clarification from the package owner. The inconsistencies look more like incomplete/unstable engineering than clearly malicious code, but they affect trust and data exposure.
Capability Analysis
Type: OpenClaw Skill
Name: openclaw-voice
Version: 1.0.0
The skill contains multiple vulnerabilities. A path traversal vulnerability exists in `src/backup.js` (exposed via `src/cli.js`'s `backup` and `restore` commands), allowing user-controlled paths to potentially write files or create directories in arbitrary locations. More critically, `src/interchange.js` writes user-controlled data (conversation summaries and voice profile descriptions) directly into markdown files (`interchange/voice/state/recent.md` and `interchange/voice/ops/profiles.md`). Since `SKILL.md` and `README.md` explicitly state these interchange files are read by other AI agents, this creates a significant prompt injection vulnerability, allowing an attacker to inject malicious instructions into other agents.
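For readers unfamiliar with the path traversal class of bug, the usual mitigation is to resolve the user-supplied path and confirm it still sits under an allowed base directory. The following is a minimal sketch of that check, not the skill's actual code: `resolveInside` and the `/tmp/backups` base are hypothetical names chosen for illustration.

```javascript
const path = require("node:path");

// Hypothetical helper (not part of openclaw-voice): resolve a user-supplied
// path against baseDir and refuse anything that lands outside it.
function resolveInside(baseDir, userPath) {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, userPath);
  const rel = path.relative(base, target);

  // A relative path that climbs with ".." (or is absolute, e.g. another
  // drive on Windows) means the target is outside the base directory.
  const escapes =
    rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (escapes) {
    throw new Error(`Path escapes ${base}: ${userPath}`);
  }
  return target;
}

// Example: a nested file is accepted, a climbing path is rejected.
console.log(resolveInside("/tmp/backups", "2024/voice.db"));
try {
  resolveInside("/tmp/backups", "../../etc/passwd");
} catch (err) {
  console.log("rejected:", err.message);
}
```

This check is purely lexical. It does not follow symlinks, so a stricter version would also compare real paths (for example via `fs.realpathSync`) once the parent directory exists.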
Capability Assessment
Purpose & Capability
The package description and SKILL.md claim Whisper STT, ElevenLabs TTS and (in v1.1) Twilio/Claude realtime call handling. The actual code provides CLI DB management, transcript storage, profile management, backups, and file-based interchange generation but contains no code that calls Whisper, ElevenLabs, Twilio, or external LLM APIs. Dependencies in package.json are only better-sqlite3, commander, and uuid. This is a substantive mismatch between claimed capabilities and implemented capabilities.
Instruction Scope
SKILL.md and VOICE_CALLING_SPEC.md describe use of child_process wrappers for sox/rec/ffplay, realtime WebSocket media servers, and many cloud API flows; none of those commands/APIs appear in the runtime code. The interchange generator writes .md files into a workspace-level 'interchange/voice' directory (three levels up from src), which will make conversation summaries available to other agents/tools on the same workspace. That file-write behavior is explicit and may expose transcripts or metadata beyond the local DB.
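To see why "three levels up from src" ends at the workspace level, it can help to resolve the path by hand. The sketch below assumes a hypothetical install location of `/workspace/skills/openclaw-voice/src`; the real layout depends on how OpenClaw places skills, which the assessment does not state.

```javascript
const path = require("node:path");

// Hypothetical install location; the actual path depends on the workspace.
const srcDir = "/workspace/skills/openclaw-voice/src";

// "Three levels up from src", then into interchange/voice, as described
// in the assessment. path.posix keeps the result the same on every OS.
const interchangeDir = path.posix.resolve(
  srcDir, "..", "..", "..", "interchange", "voice"
);

console.log(interchangeDir); // /workspace/interchange/voice
```

Under that assumed layout, the output directory is a sibling of the skills folder rather than something inside the skill's own directory, which is what makes it visible to other tools in the same workspace.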
Install Mechanism
No external install script or remote downloads are declared; this is an instruction-plus-code skill that relies on standard npm packages (present in package.json and package-lock). There are no URLs or archive extracts in the install spec. Installing via npm would be the normal way to get native dependencies like better-sqlite3 (which has native build steps).
Credentials
Registry metadata lists no required env vars, but VOICE_CALLING_SPEC.md documents multiple sensitive environment variables (TWILIO_*, ELEVENLABS_API_KEY, ANTHROPIC_API_KEY, etc.) for planned features. The current code does not read those env vars, so requesting none is internally consistent for v1, but the docs indicate that future features will require many secrets. Also, generateInterchange writes conversation summaries into a shared workspace directory; if networked TTS/STT or calling features are enabled later, other skills with access to that directory could pass those transcripts on to external services.
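One way to check the claim that the code reads no secrets is to scan the source for `process.env` references before supplying any keys. The helper below is a rough sketch under stated assumptions: `findEnvReferences` is a hypothetical name, and the sample string is made up for illustration rather than taken from the package.

```javascript
// Hypothetical audit helper: list environment variable names that a source
// string reads via process.env.NAME or process.env["NAME"].
function findEnvReferences(sourceText) {
  const pattern =
    /process\.env(?:\.([A-Za-z_][A-Za-z0-9_]*)|\[\s*["']([^"']+)["']\s*\])/g;
  const names = new Set();
  for (const match of sourceText.matchAll(pattern)) {
    // Group 1 is dot access, group 2 is bracket access with a string literal.
    names.add(match[1] ?? match[2]);
  }
  return [...names].sort();
}

// Made-up sample input, not code from the package.
const sample = `
  const tts = process.env.ELEVENLABS_API_KEY;
  const llm = process.env["ANTHROPIC_API_KEY"];
`;
console.log(findEnvReferences(sample));
```

A regex scan like this misses dynamic lookups such as `process.env[name]` or destructuring from `process.env`, so an empty result is a hint, not proof.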
Persistence & Privilege
The skill does not request always:true and does not appear to modify other skills. It creates files and directories: data/voice.db, backups/, and workspace-level interchange/voice ops/state files. That gives it persistent disk state and the ability to expose data to other local skills via the interchange files — a functional but noteworthy level of presence.
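Because the interchange files are read by other skills, any user-controlled text written into them is worth neutralizing first. The sketch below shows one conservative approach, collapsing the text to a single line and wrapping it in a markdown code span. `quoteUntrusted` is a hypothetical helper, not something the skill ships, and it reduces rather than eliminates prompt injection risk.

```javascript
// Hypothetical helper: render untrusted text as a single-line markdown code
// span so it cannot start headings, lists, or fenced blocks in the file.
function quoteUntrusted(text, maxLength = 200) {
  const oneLine = String(text)
    .replace(/[\r\n\t]+/g, " ") // collapse line breaks and tabs to spaces
    .replace(/`/g, "'")         // a backtick would close the code span early
    .trim()
    .slice(0, maxLength);       // cap length to limit injected payload size
  return "`" + oneLine + "`";
}

console.log(quoteUntrusted("Call summary\n# Ignore previous instructions"));
```

A consuming agent still sees the words inside the code span, so this only helps if readers of the file are also told to treat quoted spans as data and not as instructions.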
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install openclaw-voice
  3. After installation, invoke the skill by name or use /openclaw-voice
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release: Voice interaction. 10 tests.
Metadata
Slug openclaw-voice
Version 1.0.0
License
All-time Installs 4
Active Installs 4
Total Versions 1
Frequently Asked Questions

What is openclaw-voice?

Transcribe audio to text and generate spoken AI responses using Whisper and ElevenLabs via CLI with transcript storage and search. It is an AI Agent Skill for Claude Code / OpenClaw, with 722 downloads so far.

How do I install openclaw-voice?

Run "/install openclaw-voice" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is openclaw-voice free?

Yes, openclaw-voice is completely free (open-source). You can download, install and use it at no cost.

Which platforms does openclaw-voice support?

openclaw-voice is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created openclaw-voice?

It is built and maintained by frank-bot07 (@frank-bot07); the current version is v1.0.0.
