Description

Brazilian Portuguese voice auto-reply skill for OpenClaw. Transcribes audio locally with wav2vec2, generates a reply with the local OpenClaw agent by default...

Usage instructions (SKILL.md)

Audio PT Auto-Reply 🎙️🇧🇷

Name: Transcrição e respostas em áudio em PTBR, Português Brasil - Brazillian portuguese transcription and audio answers
Author: henrique-simoes

Talk to your OpenClaw agent in Brazilian Portuguese, and get a voice reply back.

That is the whole idea.

You send an audio message. The skill transcribes it locally. Your OpenClaw agent answers. The answer comes back as audio. 🔊

No big platform. No weird magic. Just a small, useful voice loop for people who would rather speak than type.

Why this exists ✨

Typing is not always the best way to talk to an agent.

Sometimes you are on your phone. Sometimes you are walking. Sometimes you are doing something else. Sometimes voice just feels more natural.

Audio PT Auto-Reply gives OpenClaw a simple PT-BR voice workflow that feels closer to messaging a person than operating a tool.

It is especially useful for Telegram-style interactions, accessibility workflows, quick mobile replies, and hands-busy situations.

What it does 🧩

Audio PT Auto-Reply adds a focused voice pipeline to OpenClaw:

🎧 transcribes Brazilian Portuguese audio locally with jonatasgrosman/wav2vec2-large-xlsr-53-portuguese
🧠 asks your local OpenClaw agent to generate a short reply by default
☁️ can optionally use Anthropic only when ANTHROPIC_API_KEY is set
🗣️ turns the answer into speech with local Piper voices
🎚️ lets you choose voices with /voz
🩺 includes a health check so setup problems are easier to find
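
The three stages listed above (transcribe, respond, synthesize) can be pictured as a simple chain of functions. The sketch below is only an illustration of that shape: `voice_reply` and the `fake_*` stand-ins are hypothetical names, not functions from the skill's scripts, and the real stages run wav2vec2, the OpenClaw CLI, and Piper.

```python
from typing import Callable


def voice_reply(
    audio_path: str,
    transcribe: Callable[[str], str],
    respond: Callable[[str], str],
    synthesize: Callable[[str], str],
) -> str:
    """Chain the three stages and return the path of the spoken reply."""
    transcript = transcribe(audio_path)   # audio file -> PT-BR text
    reply_text = respond(transcript)      # text -> short agent answer
    return synthesize(reply_text)         # answer -> audio file path


# Stand-in stages (assumptions) so the sketch runs without any models.
def fake_transcribe(path: str) -> str:
    return "que horas são"


def fake_respond(text: str) -> str:
    return f"Você perguntou: {text}"


def fake_synthesize(text: str) -> str:
    return "/tmp/reply.wav"


reply_path = voice_reply("/tmp/in.ogg", fake_transcribe, fake_respond, fake_synthesize)
print(reply_path)
```

Passing the stages in as callables keeps each one replaceable, which mirrors how the response step can be either the local agent or the optional Anthropic path.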

What it does not do 🚧

This skill is intentionally small and careful.

It does not request sudo. It does not install system packages behind your back. It does not modify other skills. It does not read unrelated files. It does not upload audio files to third-party services. It does not ship a public automatic audio hook that expands untrusted template values inside a shell command.

That last part matters.

Earlier hook-based builds were too easy to make risky because values like {{MediaPath}} could be expanded by the platform into a shell command before the skill code had a chance to validate them.

So this build keeps the useful part, the voice pipeline, and removes the risky public hook surface. Safer, cleaner, easier to review. 🛡️
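
To make the risk concrete, here is a minimal sketch of what goes wrong when a value is pasted into a command string, compared with passing it as a single argument. The template string and function names are hypothetical and only mimic the general pattern; they are not taken from any earlier build of the skill.

```python
# Hypothetical hook template, for illustration only.
TEMPLATE = "bash process.sh --audio-file {{MediaPath}}"


def expand_unsafely(media_path: str) -> str:
    """Mimic a platform that pastes the raw value into a shell command string."""
    return TEMPLATE.replace("{{MediaPath}}", media_path)


def build_argv(media_path: str) -> list[str]:
    """Safer shape: the value stays one argv element and is never shell-parsed."""
    return ["bash", "process.sh", "--audio-file", media_path]


# A file name chosen by an attacker.
hostile = "/tmp/a.ogg; touch /tmp/pwned"

# A shell reading this string would treat ';' as a command separator
# and run 'touch /tmp/pwned' as a second command.
print(expand_unsafely(hostile))

# Here the whole value is a single argument, so ';' is just part of a file name.
print(build_argv(hostile))
```

An argument list like the second form can be handed to `subprocess.run` without `shell=True`, so no shell ever interprets the value.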

Privacy model 🔒

By default, the skill is local-first:

audio transcription runs locally
speech synthesis runs locally
response generation uses the local OpenClaw CLI
audio files are not uploaded by this skill

Optional external mode:

if ANTHROPIC_API_KEY is present, transcript text may be sent to Anthropic for response generation
audio is still not uploaded by this skill
unset ANTHROPIC_API_KEY to keep response generation local
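
The rule above amounts to a single check on the environment. The following sketch shows one way such a check could look; the function name, the backend labels, and the choice to treat an empty value as unset are assumptions for illustration, not the logic of scripts/claude_adapter.py.

```python
import os
from typing import Mapping


def choose_backend(env: Mapping[str, str] = os.environ) -> str:
    """Return 'anthropic' only when a non-empty ANTHROPIC_API_KEY is present."""
    # Assumption: a blank or whitespace-only value counts as "not set".
    if env.get("ANTHROPIC_API_KEY", "").strip():
        return "anthropic"
    return "openclaw-local"


print(choose_backend({}))
print(choose_backend({"ANTHROPIC_API_KEY": "example-key"}))
```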

Install ⚙️

Run:

bash install.sh

The installer creates a virtualenv inside the skill directory, installs Python dependencies there, downloads Piper, downloads PT-BR voices, writes the default voice config, and runs a health check.

It expects these system dependencies to already exist:

python3
ffmpeg
tar
curl or wget

If something is missing, the installer stops and tells you what to install manually.
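
If you would like to check the prerequisites before running the installer, a small preflight along these lines may help. It is a sketch that uses the standard library's `shutil.which`; the function name is hypothetical and this is not the check that install.sh performs.

```python
import shutil


def missing_dependencies(which=shutil.which) -> list[str]:
    """Return the names of required tools that are not found on PATH."""
    missing = [tool for tool in ("python3", "ffmpeg", "tar") if which(tool) is None]
    # Either downloader is enough, so report it only when both are absent.
    if which("curl") is None and which("wget") is None:
        missing.append("curl or wget")
    return missing


if __name__ == "__main__":
    print(missing_dependencies() or "all dependencies found")
```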

Use 🗣️

List available voices:

/voz listar

Choose a voice:

/voz jeff
/voz cadu
/voz faber
/voz miro
/voz feminina
/voz masculina

Process an audio file manually:

bash process.sh --audio-file /absolute/path/to/audio.ogg

When synthesis succeeds, the script prints a MEDIA: directive pointing to the generated voice reply.
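
A caller that wraps process.sh needs to pick that directive out of the script's output. The sketch below assumes the directive is a line that starts with `MEDIA:` followed by the file path; the exact format is not documented here, so treat both the format and the function name as assumptions.

```python
from typing import Optional


def extract_media_path(output: str) -> Optional[str]:
    """Return the path from the first line starting with 'MEDIA:', if any."""
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("MEDIA:"):
            # Assumed format: everything after the prefix is the path.
            return stripped[len("MEDIA:"):].strip() or None
    return None


sample_output = "transcribing...\nMEDIA: /tmp/reply.ogg\n"
print(extract_media_path(sample_output))
```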

Optional environment variables 🧰

ANTHROPIC_API_KEY     Enables Anthropic response generation
AUDIO_VOICE           Sets the default voice
RESPONSE_TIMEOUT      Response timeout in seconds, default 30
SYNTHESIS_TIMEOUT     Synthesis timeout in seconds, default 45
WORKSPACE             Overrides the OpenClaw workspace path
PYTHON_BIN            Overrides the Python executable used by install.sh
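
As an illustration of how the two timeout variables and their documented defaults (30 and 45 seconds) could be read, here is a minimal sketch. The helper names are hypothetical, and falling back to the default on an invalid or non-positive value is an assumption rather than the skill's confirmed behavior.

```python
import os
from typing import Mapping


def read_timeout(name: str, default: int, env: Mapping[str, str] = os.environ) -> int:
    """Read a positive whole number of seconds, falling back to the default."""
    raw = env.get(name, "").strip()
    if not raw:
        return default
    try:
        value = int(raw)
    except ValueError:
        # Assumption: ignore unparsable values instead of failing.
        return default
    return value if value > 0 else default


def load_timeouts(env: Mapping[str, str] = os.environ) -> dict[str, int]:
    """Collect both timeouts using the defaults documented above."""
    return {
        "RESPONSE_TIMEOUT": read_timeout("RESPONSE_TIMEOUT", 30, env),
        "SYNTHESIS_TIMEOUT": read_timeout("SYNTHESIS_TIMEOUT", 45, env),
    }


print(load_timeouts({"RESPONSE_TIMEOUT": "10"}))
```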

Safety note for hooks 🛡️

This public package does not register an automatic message.audio.receive hook.

That is deliberate.

Shell-templated hooks can become unsafe when the platform expands values like media paths, targets, or message IDs into a shell command string before your script receives them.

For public distribution, the safer choice is to ship the voice pipeline without that hook. LOCAL_HOOK_EXAMPLE.md exists only for local operators who understand the risk and want to wire a hook manually in a controlled environment.

Files included 📦

install.sh                         Installer with local virtualenv setup
process.sh                         Main voice-processing entry point
health_check.py                    Setup validation
LOCAL_HOOK_EXAMPLE.md              Local-only hook notes
requirements.txt                   Required Python dependencies
requirements-optional.txt          Optional Anthropic dependency
scripts/transcribe_universal.py    Local PT-BR transcription
scripts/claude_adapter.py          OpenClaw or optional Anthropic response generation
scripts/synthesize_universal.py    Piper TTS synthesis
scripts/voice_config.py            Voice selection storage

Good fit ✅

Use this skill if you want a small Portuguese voice loop for OpenClaw, especially when you care about local transcription, local speech synthesis, and a public package that avoids unnecessary permission creep.

It is not trying to be a full voice assistant platform.

It is just a focused voice-reply helper: audio in, agent response, audio out. 🎙️→🧠→🔊

Safe usage recommendations

This skill looks coherent and implements a local PT‑BR voice loop. Before installing: ensure you have python3, ffmpeg, tar and curl/wget; be prepared for large Python packages (torch, transformers) and model downloads that use disk space and may take time. The installer downloads Piper from GitHub and voices from Hugging Face — check those URLs if you need to verify provenance. If you set ANTHROPIC_API_KEY, transcript text may be sent to Anthropic (documented); leave it unset to keep response generation local. The skill will call the local 'openclaw' CLI when available (to run inference or send media); only enable that in environments where you trust the OpenClaw CLI and targets. Do not register the example shell‑templated hook publicly — the repo explicitly warns that such hooks are risky. If you want extra assurance, review the downloaded Piper binary and the voice files after install and run the included health_check.py before using the skill.

Functional analysis

Type: OpenClaw Skill
Name: audio-ptbr-autoreply
Version: 2.1.3

The skill bundle provides a legitimate local-first Brazilian Portuguese voice processing pipeline using wav2vec2 for transcription and Piper for speech synthesis. The code demonstrates high security awareness, specifically in SKILL.md and LOCAL_HOOK_EXAMPLE.md, where the author explains the removal of automated hooks to prevent potential shell-injection vulnerabilities. All external artifacts (Piper binaries and voice models) are fetched from reputable sources like GitHub and Hugging Face, and the optional use of the Anthropic API is transparently implemented without evidence of data exfiltration or malicious intent.

Capability tags

requires-sensitive-credentials

Capability assessment

✓ Purpose & Capability

Name/description match the implementation: local wav2vec2 transcription, local Piper TTS, optional Anthropic usage, and local OpenClaw CLI integration. Required artifacts (torch/transformers, Piper, HF voice files) are appropriate for this functionality.

✓ Instruction Scope

SKILL.md and scripts limit actions to transcription, response generation, and synthesis. The skill reads only the provided audio file and a small voice config in the OpenClaw workspace. It calls the OpenClaw CLI and Anthropic only where documented; it does not read arbitrary system files or upload audio unconditionally.

ℹ Install Mechanism

Installer creates a local virtualenv and installs pinned Python packages, downloads Piper from a GitHub release and voice models from Hugging Face. These sources are standard for this use case; tar extraction and binary placement into the workspace are expected but do write files to disk and require the stated OS tools (python3, ffmpeg, tar, curl/wget).

✓ Credentials

No secret env vars are required by default. Optional ANTHROPIC_API_KEY is documented and only used if present; other env vars (WORKSPACE, AUDIO_VOICE, timeouts) are reasonable. The skill may call local 'openclaw' commands (to infer/send) if the CLI is present — that is coherent with the stated integration but gives the skill the ability to send messages via the CLI when invoked with a target.

✓ Persistence & Privilege

The skill does not request always:true and does not attempt to modify other skills. Installer writes a local virtualenv, downloads binaries/models into the workspace, and writes a small voice config file in the OpenClaw workspace — this is expected for functionality and is documented.

Version history

v2.1.3

**Audio PT Auto-Reply 2.1.3 – More concise and user-friendly documentation**
- Overhauled skill documentation to be clearer, more concise, and more accessible to new users.
- Added quick-start explanations and highlighted the intended use case and privacy model.
- Clarified what the skill does and does not do, with strong safety notes regarding hooks.
- Expanded install and usage instructions with practical examples.
- No underlying code or behavior changes in this release.

v2.1.2

- Added _meta.json file for improved metadata handling.
- Version bump from 2.1.1 to 2.1.2; no functional code changes.
- Improved README.

v2.1.1

**Important: Automatic audio reply hooks are now removed from public packages for safety.**
- Removed automatic message.audio.receive shell hook from SKILL.md to avoid code injection risk.
- Added LOCAL_HOOK_EXAMPLE.md with guidance and a risk warning for local hook configuration.
- Updated SKILL.md: clarified safety boundaries, revised triggers, added manual usage instructions.
- No changes to core processing scripts or installer; all core functionality remains local-first and reviewable.

v2.1.0

**Scope streamlined for safety, local-first operation, and optional cloud AI.**
- Refactored to use only local virtualenv for Python dependencies; installer no longer auto-installs OS packages or requests sudo.
- Removed nonessential files (documentation, React UI, global scripts) for a narrower, reviewable codebase.
- All core audio processing and TTS/ASR now run locally; optional Anthropic integration only if ANTHROPIC_API_KEY is set.
- `/voz` commands now only set/list voice, not broader configuration.
- Safer installer: explicitly stops and informs if OS dependencies are missing.
- No audio uploads to third-party services; no global system modifications.

v2.0.2

audio-ptbr-autoreply v2.0.2
- Added SECURITY_REVIEW_RESPONSE.md for improved security transparency.
- Added _meta.json metadata file.
- No code or functional changes; documentation and compliance updates only.

v2.0.1

- Added cross-platform and integration guides (CROSS_PLATFORM_GUIDE.md, INTEGRATION_GUIDE.md).
- Introduced a refactoring summary (REFACTORING_SUMMARY.md) documenting recent code changes.
- Implemented new modular interface and processing scripts: AudioPTInterface.jsx, environment.py, synthesize_universal.py, transcribe_universal.py.
- Removed redundant SUMMARY.md documentation file.

v2.0.0

- Support for other agents
- Better installation UX
- Better code for integration with messaging apps

v1.0.0

Audio-ptbr-autoreply v1.0.0 – Initial release
- Premium Brazilian Portuguese voice interface using wav2vec2-large-xlsr-53-ptBR for advanced speech recognition, including slang and expressions.
- Supports English audio as well.
- Offers neural voice options (3 masculine, 2 feminine) with easy switching using the /voz command.
- Defaults to audio-in, audio-out mode with no text output or confirmations unless explicitly requested.
- Includes scripts for transcription, synthesis, voice preference, and workflow automation.
- Optimized for Telegram with Opus OGG audio format and fast, local TTS processing.

Metadata

Slug audio-ptbr-autoreply

Version 2.1.3

License MIT-0

Total installs 0

Current installs 0

Number of versions 8

FAQ

What is Transcrição e respostas em áudio em PTBR, Português Brasil - Brazillian portuguese transcription and audio answers?

Brazilian Portuguese voice auto-reply skill for OpenClaw. Transcribes audio locally with wav2vec2, generates a reply with the local OpenClaw agent by default... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 194 cumulative downloads so far.

How do I install Transcrição e respostas em áudio em PTBR, Português Brasil - Brazillian portuguese transcription and audio answers?

Run the command "/install audio-ptbr-autoreply" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Transcrição e respostas em áudio em PTBR, Português Brasil - Brazillian portuguese transcription and audio answers free?

Yes. Transcrição e respostas em áudio em PTBR, Português Brasil - Brazillian portuguese transcription and audio answers is completely free, is released under the MIT-0 license, and can be freely downloaded, installed, and used.

Which platforms does Transcrição e respostas em áudio em PTBR, Português Brasil - Brazillian portuguese transcription and audio answers support?

Transcrição e respostas em áudio em PTBR, Português Brasil - Brazillian portuguese transcription and audio answers is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Transcrição e respostas em áudio em PTBR, Português Brasil - Brazillian portuguese transcription and audio answers?

It is developed and maintained by Rick (@henrique-simoes); the current version is v2.1.3.
