Description

Premium Portuguese-Brazilian voice interface with neural TTS and Claude AI integration. Features wav2vec2-large-xlsr-53-ptBR for excellent PT-BR understandin...

README (SKILL.md)

Audio PT Auto-Reply v2.0.1 - Premium Voice Interface

Name: Audio PTBR
Author: henrique-simoes

Complete voice interface with superior Brazilian Portuguese understanding and automatic setup.

🌟 Key Features

Superior PT-BR Understanding

Model: wav2vec2-large-xlsr-53-portuguese (jonatasgrosman)
Excellence in: Brazilian Portuguese with slang, expressions, accents
Also supports: English (multilingual)
Quality: State-of-the-art for PT-BR ASR

🤖 Optional Claude Integration

Intelligent responses using Claude API
Falls back to OpenClaw agent automatically
Optional: No API key required, still works with OpenClaw agent
Smart: Better understanding of context and Portuguese nuances

Neural Voice Options (Piper TTS)

Voice	Gender	Quality	Character
jeff	Male	Medium	Clear, professional
cadu	Male	Medium	Warm, natural
faber	Male	Medium	Balanced
miro	Female	High	Community voice

Voice Commands

Change voice anytime with:

/voz jeff - Voice: Jeff
/voz cadu - Voice: Cadu
/voz faber - Voice: Faber
/voz miro - Voice: Miro (female)
/voz feminina - Female voice (selects miro)
/voz masculina - Male voice (selects jeff)
/voz listar - Show all voices
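
The command list above amounts to a small lookup table. A minimal sketch of how such commands could be resolved, assuming a hypothetical `resolve_voice` helper (the skill's real handling lives in voice_config.py and may differ):

```python
from typing import Optional

# Voice names and aliases taken from the command list above.
VOICES = ("jeff", "cadu", "faber", "miro")
ALIASES = {"feminina": "miro", "masculina": "jeff"}


def resolve_voice(command: str) -> Optional[str]:
    """Return the voice selected by a '/voz <name>' command.

    Returns None for '/voz listar', unknown names, and non-/voz input.
    Hypothetical helper for illustration; not the skill's actual API.
    """
    parts = command.strip().lower().split()
    if len(parts) != 2 or parts[0] != "/voz":
        return None
    name = ALIASES.get(parts[1], parts[1])
    return name if name in VOICES else None
```

Keeping the aliases in a separate table means a new voice only needs one entry in `VOICES`, while the gender shortcuts stay explicit.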

⚡ Installation (NEW!)

One-Command Installation

bash install.sh

The installer automatically:

✅ Detects your system architecture (ARM64, x86_64)
✅ Downloads Piper TTS
✅ Downloads 4 Brazilian Portuguese voice models (~240MB)
✅ Installs Python dependencies
✅ Validates everything works

No manual downloads. No configuration. Just one command!
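
The installer itself is a shell script, and its contents are not reproduced here. As an illustration of its first step only, the sketch below shows one way to normalize the machine architecture before choosing a download. The function name and the returned labels are assumptions, not taken from install.sh.

```python
import platform


def normalize_arch(machine: str) -> str:
    """Map common platform.machine() values to one of two labels.

    Illustrative only: the labels "arm64" and "x86_64" are assumptions.
    """
    value = machine.lower()
    if value in ("aarch64", "arm64"):
        return "arm64"
    if value in ("x86_64", "amd64"):
        return "x86_64"
    raise ValueError(f"unsupported architecture: {machine}")


if __name__ == "__main__":
    # Print the label for the machine this script runs on.
    print(normalize_arch(platform.machine()))
```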

🔄 Critical Rules

DEFAULT: AUDIO ONLY - NO TEXT

When user sends audio:

❌ NO transcription shown
❌ NO "Pesquisando...", "Gerando..."
❌ NO confirmations or explanations
✅ ONLY audio reply

TEXT MODE: Say "texto" or "responda em texto" explicitly

📊 Workflow

🎤 Audio Received (PT-BR/EN)
    ↓
🔤 Transcribe (wav2vec2 PT-BR - silent)
    ↓
🤖 AI Response (Claude API or OpenClaw Agent - silent)
    ↓
🗣️ Synthesize (Piper neural - silent)
    ↓
📤 Send Audio Reply (silent)
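
The workflow, together with the text-mode rule from Critical Rules, can be sketched as a single function. This is a simplified illustration, not the logic of process.sh: the three stage callables are placeholders supplied by the caller, and matching the word "texto" in the transcript is an assumed way to detect a text-mode request.

```python
import re
from typing import Callable, Tuple

# Assumed trigger: the standalone word "texto" anywhere in the transcript.
TEXT_MODE = re.compile(r"\btexto\b", re.IGNORECASE)


def handle_audio(
    audio_path: str,
    transcribe: Callable[[str], str],
    respond: Callable[[str], str],
    synthesize: Callable[[str], str],
) -> Tuple[str, str]:
    """Run the pipeline and return ("audio", path) or ("text", reply)."""
    transcript = transcribe(audio_path)   # speech to text, never shown to the user
    reply = respond(transcript)           # Claude API or OpenClaw agent
    if TEXT_MODE.search(transcript):
        return ("text", reply)            # user explicitly asked for text
    return ("audio", synthesize(reply))   # default: audio reply only
```

Passing the stages in as callables keeps the sketch independent of any particular ASR, AI, or TTS backend and makes it easy to exercise with stubs.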

📁 Scripts

Installation & Setup

install.sh - Automatic installation (run once!)
health_check.py - Validate the installation

Core Processing

transcribe.py - wav2vec2 PT-BR speech recognition
synthesize.py - Piper TTS with voice selection
voice_config.py - Voice preference management
process.sh - Full workflow orchestration

AI Integration

claude_adapter.py - Claude API bridge (intelligent responses)

🔧 Configuration

Optional: Enable Claude Integration

For intelligent AI responses, set your API key:

export ANTHROPIC_API_KEY="sk-your-api-key"

Without this, the skill uses OpenClaw's agent (still great responses!).
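
The fallback rule described here reduces to a check on one environment variable. A minimal sketch, assuming hypothetical backend labels "claude" and "openclaw" (how claude_adapter.py actually decides is not shown in this README):

```python
import os
from typing import Mapping


def select_backend(env: Mapping[str, str] = os.environ) -> str:
    """Pick the response backend from the environment.

    Returns "claude" when ANTHROPIC_API_KEY is set to a non-blank value,
    otherwise "openclaw". Both labels are illustrative assumptions.
    """
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    return "claude" if key else "openclaw"
```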

Voice Configuration

Current voice is saved automatically in:

~/.openclaw/workspace/.audio_pt_voice_config
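
The on-disk format of this file is not documented in the README. The sketch below shows one way a voice preference could be persisted, assuming a JSON object with a single "voice" key and a default of jeff. Both are assumptions made for illustration, so check voice_config.py before relying on them.

```python
import json
from pathlib import Path

# Path from the README; the JSON layout below is an assumption.
CONFIG_PATH = Path.home() / ".openclaw" / "workspace" / ".audio_pt_voice_config"


def save_voice(voice: str, path: Path = CONFIG_PATH) -> None:
    """Write the chosen voice, creating parent directories if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"voice": voice}), encoding="utf-8")


def load_voice(path: Path = CONFIG_PATH, default: str = "jeff") -> str:
    """Read the saved voice, falling back to `default` if missing or unreadable."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))["voice"]
    except (FileNotFoundError, KeyError, TypeError, json.JSONDecodeError):
        return default
```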

📊 Technical Details

ASR Model

Name: jonatasgrosman/wav2vec2-large-xlsr-53-portuguese
Training: Fine-tuned on PT-BR Common Voice + other datasets
Strengths: Brazilian slang, regional expressions, informal speech
License: Apache 2.0

TTS Engine

Engine: Piper (fast, local neural TTS)
Voices: 4 PT-BR options
Speed: Real-time on ARM64/x64
Format: Opus OGG (Telegram optimized)
License: MIT
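
To make the synthesis step concrete, the sketch below builds the two commands such a step typically needs: Piper reads text on stdin and writes a WAV file, and ffmpeg re-encodes it as Opus in an OGG container. It assumes `piper` and `ffmpeg` are on PATH and that the caller supplies the voice model path. It is not a copy of synthesize.py.

```python
import subprocess
from typing import List


def piper_command(model_path: str, wav_path: str) -> List[str]:
    """Command that makes Piper synthesize stdin text into a WAV file."""
    return ["piper", "--model", model_path, "--output_file", wav_path]


def opus_command(wav_path: str, ogg_path: str) -> List[str]:
    """Command that re-encodes a WAV file as Opus in an OGG container."""
    return ["ffmpeg", "-y", "-i", wav_path, "-c:a", "libopus", ogg_path]


def synthesize(text: str, model_path: str, wav_path: str, ogg_path: str) -> str:
    """Run both steps and return the path of the OGG file."""
    subprocess.run(piper_command(model_path, wav_path),
                   input=text.encode("utf-8"), check=True)
    subprocess.run(opus_command(wav_path, ogg_path), check=True)
    return ogg_path
```

Building the argument lists separately from running them avoids shell quoting issues and lets the commands be inspected without the binaries installed.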

AI Response (Optional)

Primary: Claude API (when API key provided)
Fallback: OpenClaw Agent (always available)
License: Claude API is proprietary; OpenClaw Agent is included

🚀 Getting Started

Install skill from ClawHub
Run: bash install.sh
Restart: openclaw gateway restart
Use: Send audio messages, use /voz commands

📋 Requirements

OpenClaw 2026.4.10+
Python 3.8+
300MB free disk space (for voice models)
Internet connection (for initial downloads)
Optional: ANTHROPIC_API_KEY for Claude integration
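
These requirements can be checked before running the installer. A minimal sketch using only the standard library follows. The tool names (ffmpeg, wget, jq) come from the Capability Assessment on this page rather than from the requirements list, and the function names are hypothetical.

```python
import shutil
import sys
from typing import List, Sequence, Tuple


def missing_requirements(
    tools: Sequence[str] = ("ffmpeg", "wget", "jq"),
    min_python: Tuple[int, int] = (3, 8),
) -> List[str]:
    """Return a list of unmet requirements (empty when all are satisfied)."""
    problems = []
    if sys.version_info < min_python:
        problems.append("python>=%d.%d" % min_python)
    for tool in tools:
        if shutil.which(tool) is None:  # not found on PATH
            problems.append(tool)
    return problems


def has_free_space(path: str = ".", required_mb: int = 300) -> bool:
    """Check that the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= required_mb * 1024 * 1024
```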

🔒 Privacy & Security

✅ Audio transcription happens locally (wav2vec2 runs on your machine)
✅ Voice synthesis happens locally (Piper runs on your machine)
⚠️ AI responses:
- Without API key: Processed by OpenClaw Agent (check OpenClaw privacy)
- With API key: Sent to Anthropic (handled under Anthropic's terms of service and privacy policy)

📜 License

MIT - Free to use, modify, and redistribute

🙏 Credits

ASR: jonatasgrosman/wav2vec2-large-xlsr-53-portuguese
TTS: Piper by Rhasspy
AI: Claude API by Anthropic (optional)
Voices: Piper Voices repository + TarcisoAmorim community contribution

Usage Guidance

This package is not obviously malicious, but it contains actions you should review before running:

1) Inspect install.sh closely. It downloads and extracts binaries and voice models, runs pip installs (torch, transformers, anthropic), and may invoke apt-get or Homebrew (sudo may be required). If you don't fully trust it, run it only in a controlled environment (VM, container, or a dedicated user account).
2) If you plan to use Claude, only set ANTHROPIC_API_KEY when you trust the code; otherwise the skill will fall back to local/OpenClaw responses.
3) Prefer installing Python dependencies in a virtualenv rather than globally; review and, if desired, pin or audit packages before pip installing.
4) Confirm network endpoints: GitHub releases and HuggingFace model URLs are used, as expected. Verify these URLs if you need assurance.
5) If you want lower risk, run health_check.py and the individual scripts (transcribe/synthesize) manually first to validate behavior before running the one-command installer.
6) Because the registry metadata omits runtime tool requirements (python3, pip, ffmpeg, wget, jq), expect to satisfy those prerequisites manually or update the metadata.
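
Point 3 recommends a virtualenv. A short sketch of how to confirm that the current interpreter is running inside one before installing heavy packages, using the standard `sys.prefix` and `sys.base_prefix` comparison (the function name is hypothetical):

```python
import sys


def in_virtualenv(prefix: str = sys.prefix, base_prefix: str = sys.base_prefix) -> bool:
    """True when the interpreter runs inside a venv-style environment.

    In a venv, sys.prefix points at the environment while sys.base_prefix
    points at the Python installation it was created from.
    """
    return prefix != base_prefix


if __name__ == "__main__":
    if not in_virtualenv():
        print("Not in a virtualenv; pip installs would go to the global or user site.")
```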

Capability Analysis

Type: OpenClaw Skill
Name: audio-ptbr-aprimorado
Version: 2.0.1

The skill provides a Brazilian Portuguese voice interface using local models and the Claude API. It is classified as suspicious primarily due to an invasive installation script (install.sh) that executes sudo commands to install system packages and downloads external binaries from GitHub and HuggingFace. Additionally, the SKILL.md instructions explicitly direct the AI agent to suppress all text output, transcriptions, and confirmations. While this is consistent with the stated 'voice-only' user experience, it could be leveraged to obfuscate the agent's actions. The bundle also includes a 'SECURITY_REVIEW_RESPONSE.md' that acknowledges these risks, but the combination of broad media permissions and suppressed output remains a significant attack surface.

Capability Tags

crypto

Capability Assessment

ℹ Purpose & Capability

The skill name and description (PT-BR ASR + local TTS + optional Claude integration) match the included scripts and workflow. However, the package/registry metadata declares no required binaries or env vars while the code and installer do require system tools (python3/pip, ffmpeg, wget/tar, jq is used at runtime) and optionally the ANTHROPIC_API_KEY. The implementation capabilities are appropriate for the stated purpose, but the metadata omission is inconsistent.

ℹ Instruction Scope

SKILL.md and process.sh limit runtime activity to: local transcription (wav2vec2), optional calls to Anthropic (when ANTHROPIC_API_KEY is present), local synthesis (Piper), and returning MEDIA: directives or calling openclaw CLI if available. There are no obvious instructions to read unrelated secrets or exfiltrate data. Caveats: process.sh invokes external commands and expects jq; environment.py auto-detection may read environment variables and filesystem paths (to detect OpenClaw/Claude contexts) — this is expected but broader than the minimal description. The documentation also instructs running install.sh which modifies system state (see install_mechanism).

⚠ Install Mechanism

There is no formal install spec in the registry, but an install.sh is included and described as 'one-command'. install.sh downloads Piper binaries from GitHub releases and voice models from HuggingFace (both reasonable sources), and uses wget/tar to extract them. It also attempts to auto-install system packages (python3, pip, ffmpeg) via apt-get or brew and runs pip install for heavyweight ML packages (torch/transformers/anthropic) globally unless a venv is present. That means: (a) downloaded archives containing executable code are extracted to disk, (b) system package installation and global pip installs may require sudo/administrator rights, and (c) dependencies like jq and wget are relied upon but not declared in registry metadata. These behaviors increase risk, so the installer should be run only after review or in an isolated environment.

ℹ Credentials

The skill uses an optional ANTHROPIC_API_KEY for Claude integration (documented in SKILL.md), and otherwise uses non-secret environment values for configuration (AUDIO_VOICE, WORKSPACE, RESPONSE_TIMEOUT). The registry metadata lists no required env vars; the SKILL.md documents ANTHROPIC_API_KEY as optional. There are no requests for unrelated credentials. The main proportionality issue is the metadata omission of required runtime tools and the installer performing system package installs which effectively requires elevated privileges to set up.

✓ Persistence & Privilege

The skill does not request always:true and does not modify other skills. It writes files under the user's workspace (~/.openclaw/workspace, skill directory) and creates a voice config file there; this is a normal level of persistence for a skill. Installer actions and created files are limited to user workspace and downloaded binaries/models; nothing in the package attempts to alter other skills or global agent configuration beyond advising an openclaw restart.

Version History

v2.0.1

- Added _meta.json manifest for improved skill metadata and integration.
- Introduced event hook for automatic audio processing on message receipt.
- Added trigger for `/voz` command to configure or process audio via script.
- Updated skill configuration for better audio processing permissions and scope.
- No changes to core functionality; all enhancements focus on better integration and automation.

v2.0.0

- Added category and tags fields to SKILL.md for improved discoverability.
- No functional or feature changes; documentation only.

v1.0.0

Major release: Complete redesign with premium neural voices, Claude AI integration, and one-command setup.
- Now uses wav2vec2-large-xlsr-53-ptBR for superior Brazilian Portuguese (slang, accents, expressions); also supports English.
- Adds optional Claude API for smarter, more natural AI responses with fallback to OpenClaw Agent.
- Features four neural Piper TTS voices (3 masculine, 1 feminine); easy voice switching via commands.
- Audio-only by default: no text output, confirmations, or message echoes.
- Fast, one-command installation process that auto-downloads all requirements.
- Privacy-focused: local transcription and TTS; data is sent to Anthropic only if an API key is set.

Metadata

Slug audio-ptbr-aprimorado

Version 2.0.1

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 3

Frequently Asked Questions

What is Audio PTBR?

Premium Portuguese-Brazilian voice interface with neural TTS and Claude AI integration. Features wav2vec2-large-xlsr-53-ptBR for excellent PT-BR understandin... It is an AI Agent Skill for Claude Code / OpenClaw, with 113 downloads so far.

How do I install Audio PTBR?

Run "/install audio-ptbr-aprimorado" in the OpenClaw or Claude Code chat, then follow the Getting Started steps in the README (bash install.sh, then openclaw gateway restart).

Is Audio PTBR free?

Yes, Audio PTBR is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Audio PTBR support?

Audio PTBR is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Audio PTBR?

It is built and maintained by Rick (@henrique-simoes); the current version is v2.0.1.
