ricardotrevisan

Voice Agent

by Ricardo Trevisan · GitHub ↗ · v1.1.0
cross-platform ⚠ suspicious
Downloads 3716
Stars 0
Active Installs 27
Versions 3
Install in OpenClaw
/install voice-agent
Description
Local Voice Input/Output for Agents using the AI Voice Agent API.
Usage Guidance
This client-only skill is coherent, but before installing or using it:
  1. Ensure you actually run and trust the backend that must be reachable at http://localhost:8000. The backend handles Whisper and AWS Polly and holds any cloud credentials; review the backend source or run it locally in an isolated environment.
  2. Be aware that the client uploads the audio file you specify to localhost and writes synthesized audio to the output path you provide. Avoid pointing it at sensitive files or at paths where overwriting is a risk.
  3. The client reads entire files into memory for upload, so very large files may cause memory pressure.
  4. If you rely on production AWS credentials, ensure the backend (not this client) stores and uses them securely.
If you want extra assurance, inspect and run the backend code locally before connecting the skill to non-test data.
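The memory-pressure caveat above can be mitigated with a size check before the file is read. The following is a minimal sketch, not code from the skill: the helper name `read_audio_capped` and the 25 MiB cap are hypothetical and chosen only for illustration.

```python
import os

# Hypothetical cap; pick a value that suits your machine and backend.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def read_audio_capped(path, max_bytes=MAX_UPLOAD_BYTES):
    """Read a file fully into memory, refusing files larger than max_bytes."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"{path} is {size} bytes, above the {max_bytes}-byte cap")
    with open(path, "rb") as f:
        return f.read()
```

Checking the size first means an oversized file is rejected before any of it is loaded, rather than failing partway through the read.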
Capability Analysis
Type: OpenClaw Skill
Name: voice-agent
Version: 1.1.0
The `scripts/client.py` file contains critical vulnerabilities that allow for arbitrary file read and write operations. The `transcribe` function reads the content of an arbitrary file path provided as an argument and sends it to `http://localhost:8000/transcribe`. Similarly, the `synthesize` function writes the generated audio content to an arbitrary file path provided as an argument. These flaws, while not explicitly malicious in intent, enable an attacker to read sensitive local files or write arbitrary content to any location on the filesystem, posing a significant risk for data exfiltration or local privilege escalation.
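One common way to narrow this kind of arbitrary-path exposure is to confine input and output paths to a single working directory before any file is opened. The sketch below illustrates that idea under stated assumptions; it is not part of `scripts/client.py`, and the helper name `resolve_inside` is hypothetical.

```python
from pathlib import Path


def resolve_inside(base_dir, user_path):
    """Resolve user_path against base_dir; raise ValueError if it escapes base_dir."""
    base = Path(base_dir).resolve()
    # resolve() collapses ".." segments and follows symlinks, so the check
    # below sees the real target rather than the string the caller supplied.
    candidate = (base / user_path).resolve()
    try:
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"{user_path!r} is outside {base}") from None
    return candidate
```

A wrapper around the client could call `resolve_inside` on both the audio input path and the synthesized output path, so that a request such as `../../secret.txt` or an absolute path is rejected before any read or write happens.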
Capability Assessment
Purpose & Capability
The name/description (local voice I/O) matches the included client.py and SKILL.md: the skill is a file-based client that calls a local backend for Whisper STT and AWS Polly TTS. It does claim use of 'local Whisper' and 'AWS Polly' but those services are invoked by the backend at localhost:8000, not by the client — this is reasonable and proportionate for a client-only skill.
Instruction Scope
SKILL.md clearly limits runtime behavior to running the provided client script (transcribe, synthesize, health) against the local backend and explicitly forbids service management. The client uploads user-selected audio files to http://localhost:8000 and writes synthesized audio to a user-specified output path. It does not read other system files or access environment variables beyond standard Python operation.
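Because the client depends entirely on the local backend, a quick reachability probe before transcribing or synthesizing can give a clearer error. This is a minimal sketch using only the stdlib `urllib`; the `/health` path and the function name `backend_is_healthy` are assumptions, since the listing names a health command but not the endpoint it calls.

```python
import urllib.error
import urllib.request


def backend_is_healthy(base_url="http://localhost:8000", timeout=3.0):
    """Return True if the backend answers the (assumed) health endpoint with HTTP 200."""
    url = base_url.rstrip("/") + "/health"  # assumed path, not confirmed by the listing
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers connection refused, timeouts, and non-2xx responses (HTTPError).
        return False
```

Returning a boolean rather than raising keeps the caller's logic simple: if the probe fails, report that the backend at http://localhost:8000 is not reachable and stop, since the skill explicitly does not manage the service itself.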
Install Mechanism
There is no install spec (instruction-only) and included code is zero-dependency Python using the stdlib urllib — nothing is downloaded or installed automatically. This is low-risk from an install perspective.
Credentials
The skill declares no required env vars or credentials, which is consistent because the client talks to a local backend. However, SKILL.md mentions AWS Polly and local Whisper; those will require credentials/configuration in the backend (not in this package). Users should be aware the backend — not this client — will hold any cloud credentials.
Persistence & Privilege
The skill is not marked always:true, does not persist or modify other skills, and does not request elevated privileges. It is user-invocable and uses the agent only when invoked.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install voice-agent
  3. After installation, invoke the skill by name or use /voice-agent
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.1.0
- Switched to client-only operation: service management and container startup are no longer included.
- Now requires an external backend API running at http://localhost:8000; setup instructions moved to repo documentation.
- Uses local Whisper for speech-to-text and AWS Polly for text-to-speech.
- Updated documentation and workflows to clarify prerequisites and error handling.
- Removed scripts/start.sh; skill no longer attempts to start backend services automatically.
v1.0.1
voice-agent 1.0.1
- Added documentation for starting the voice agent service if a health check fails or connection error occurs.
- Updated example output file extension for synthesized audio from .ogg to .mp3 in documentation.
- No functional code changes included.
v1.0.0
Initial release of the Voice Agent skill.
- Enables local speech-to-text and text-to-speech interactions using the AI Voice Agent API.
- Prioritizes audio input and output; responses to audio input are delivered primarily as audio files.
- Provides clear workflow and guidelines to ensure seamless voice-based user interactions.
- Includes easy-to-use commands for audio transcription, speech synthesis, and system health checks.
Metadata
Slug voice-agent
Version 1.1.0
License
All-time Installs 28
Active Installs 27
Total Versions 3
Frequently Asked Questions

What is Voice Agent?

Local Voice Input/Output for Agents using the AI Voice Agent API. It is an AI Agent Skill for Claude Code / OpenClaw, with 3716 downloads so far.

How do I install Voice Agent?

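Run "/install voice-agent" in the OpenClaw or Claude Code chat to install the skill in one step. Note that as of v1.1.0 the skill is client-only: it requires an external backend API running at http://localhost:8000, which you must set up separately by following the repo documentation.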

Is Voice Agent free?

Yes, Voice Agent is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Voice Agent support?

Voice Agent is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Voice Agent?

It is built and maintained by Ricardo Trevisan (@ricardotrevisan); the current version is v1.1.0.
