
Voice Agent

Name: Voice Agent
Author: ricardotrevisan

by Ricardo Trevisan · GitHub ↗ · v1.1.0

cross-platform ⚠ suspicious

Downloads: 3716

Install in OpenClaw

/install voice-agent

Description

Local Voice Input/Output for Agents using the AI Voice Agent API.

Usage Guidance

This client-only skill is coherent, but before installing or using it:

1. Make sure you actually run and trust the backend that must be reachable at http://localhost:8000. The backend handles Whisper and AWS Polly and holds any cloud credentials; review its source or run it locally in an isolated environment.
2. Be aware that the client uploads the audio file you specify to localhost and writes synthesized audio to the output path you provide. Avoid pointing it at sensitive files or at paths where overwriting is a risk.
3. The client reads entire files into memory for upload, so very large files may cause memory pressure.
4. If you rely on production AWS credentials, make sure the backend (not this client) stores and uses them securely.

For extra assurance, inspect and run the backend code locally before connecting the skill to non-test data.
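Points 2 and 3 above can be enforced with a small pre-flight check before any path is handed to the client. The following is a minimal sketch, not part of the skill: the function names and the 25 MiB cap are hypothetical and should be adapted to your setup.

```python
import os

# Hypothetical cap: the client reads the whole file into memory before upload.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def check_input_audio(path, max_bytes=MAX_UPLOAD_BYTES):
    """Return the file size if `path` is a regular file within the size cap."""
    if not os.path.isfile(path):
        raise ValueError(f"not a regular file: {path}")
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"{path} is {size} bytes, over the {max_bytes}-byte cap")
    return size


def check_output_path(path):
    """Refuse to overwrite an existing file; otherwise return the path unchanged."""
    if os.path.exists(path):
        raise FileExistsError(f"refusing to overwrite existing file: {path}")
    return path
```

Calling both checks before a transcribe or synthesize run turns an accidental overwrite or an oversized upload into an explicit error instead of a silent side effect.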

Capability Analysis

Type: OpenClaw Skill
Name: voice-agent
Version: 1.1.0

The `scripts/client.py` file contains critical vulnerabilities that allow for arbitrary file read and write operations. The `transcribe` function reads the content of an arbitrary file path provided as an argument and sends it to `http://localhost:8000/transcribe`. Similarly, the `synthesize` function writes the generated audio content to an arbitrary file path provided as an argument. These flaws, while not explicitly malicious in intent, enable an attacker to read sensitive local files or write arbitrary content to any location on the filesystem, posing a significant risk for data exfiltration or local privilege escalation.
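One common way to narrow this kind of arbitrary-path exposure is to resolve every user-supplied path and reject anything outside a dedicated working directory. The sketch below illustrates the idea only; `confine` and its parameters are hypothetical and do not appear in `scripts/client.py`.

```python
from pathlib import Path


def confine(path, allowed_dir):
    """Resolve `path` and return it only if it lies inside `allowed_dir`."""
    root = Path(allowed_dir).resolve()
    candidate = Path(path)
    if not candidate.is_absolute():
        # Interpret relative paths against the allowed directory.
        candidate = root / candidate
    # resolve() collapses ".." segments and follows symlinks.
    resolved = candidate.resolve()
    try:
        # relative_to raises ValueError when resolved is not under root.
        resolved.relative_to(root)
    except ValueError:
        raise PermissionError(f"{path} is outside {root}") from None
    return resolved
```

A wrapper could pass both the input audio path and the output path through `confine` before invoking the client, so that a request such as `../../etc/passwd` fails before any file is read or written.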

Capability Assessment

✓ Purpose & Capability

The name/description (local voice I/O) matches the included client.py and SKILL.md: the skill is a file-based client that calls a local backend for Whisper STT and AWS Polly TTS. It does claim use of 'local Whisper' and 'AWS Polly' but those services are invoked by the backend at localhost:8000, not by the client — this is reasonable and proportionate for a client-only skill.

✓ Instruction Scope

SKILL.md clearly limits runtime behavior to running the provided client script (transcribe, synthesize, health) against the local backend and explicitly forbids service management. The client uploads user-selected audio files to http://localhost:8000 and writes synthesized audio to a user-specified output path. It does not read other system files or access environment variables beyond standard Python operation.
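Because the skill no longer manages the backend, it can help to confirm the backend is reachable before transcribing or synthesizing. The following is a rough sketch using only the standard library; the `/health` path and the function name are assumptions, since the listing names a health command but not its endpoint.

```python
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8000"  # backend address named in the listing


def backend_is_healthy(base_url=BASE_URL, path="/health", timeout=3.0):
    """Return True if GET base_url + path answers with HTTP 200, else False.

    The "/health" default is an assumed endpoint, not confirmed by the listing.
    """
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and non-2xx HTTP errors.
        return False
```

Returning a boolean instead of raising keeps the caller's logic simple: if the check fails, report that the backend is down and stop, rather than attempting to start any service.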

✓ Install Mechanism

There is no install spec (instruction-only) and included code is zero-dependency Python using the stdlib urllib — nothing is downloaded or installed automatically. This is low-risk from an install perspective.

ℹ Credentials

The skill declares no required env vars or credentials, which is consistent because the client talks to a local backend. However, SKILL.md mentions AWS Polly and local Whisper; those will require credentials/configuration in the backend (not in this package). Users should be aware the backend — not this client — will hold any cloud credentials.

✓ Persistence & Privilege

The skill is not marked always:true, does not persist or modify other skills, and does not request elevated privileges. It is user-invocable and uses the agent only when invoked.

How to Use

1. Make sure OpenClaw is installed (local or Docker).
2. Run the install command in chat: /install voice-agent
3. After installation, invoke the skill by name or use /voice-agent
4. Provide the required inputs per the skill's parameter spec and get structured output.

Version History

v1.1.0

- Switched to client-only operation: service management and container startup are no longer included.
- Now requires an external backend API running at http://localhost:8000; setup instructions moved to repo documentation.
- Uses local Whisper for speech-to-text and AWS Polly for text-to-speech.
- Updated documentation and workflows to clarify prerequisites and error handling.
- Removed scripts/start.sh; skill no longer attempts to start backend services automatically.

v1.0.1

- Added documentation for starting the voice agent service if a health check fails or a connection error occurs.
- Updated the example output file extension for synthesized audio from .ogg to .mp3 in documentation.
- No functional code changes included.

v1.0.0

Initial release of the Voice Agent skill.
- Enables local speech-to-text and text-to-speech interactions using the AI Voice Agent API.
- Prioritizes audio input and output; responses to audio input are delivered primarily as audio files.
- Provides clear workflow and guidelines to ensure seamless voice-based user interactions.
- Includes easy-to-use commands for audio transcription, speech synthesis, and system health checks.

Metadata

Slug voice-agent

Version 1.1.0

License —

All-time Installs 28

Active Installs 27

Total Versions 3

Frequently Asked Questions

What is Voice Agent?

Local Voice Input/Output for Agents using the AI Voice Agent API. It is an AI Agent Skill for Claude Code / OpenClaw, with 3716 downloads so far.

How do I install Voice Agent?

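Run "/install voice-agent" in the OpenClaw or Claude Code chat to install it in one step. The skill itself needs no further setup, but since v1.1.0 it requires an external backend API running at http://localhost:8000.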

Is Voice Agent free?

Yes. Voice Agent is free and open-source: you can download, install, and use it at no cost.

Which platforms does Voice Agent support?

Voice Agent is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Voice Agent?

It is built and maintained by Ricardo Trevisan (@ricardotrevisan); the current version is v1.1.0.
