Downloads: 197
Stars: 0
Active Installs: 1
Versions: 1
Install in OpenClaw
/install speech-translation
Description
Build, adapt, or run an audio-processing workflow that takes spoken audio, transcribes it with Whisper or faster-whisper, translates the transcript using the...
Usage Guidance
This skill appears to do exactly what it says: local transcription, translation orchestration, and TTS. Before installing or running it:
1) Only configure notification command templates (or the environment variables that set them) from trusted sources; the notifier code executes those commands with shell=True and may run arbitrary processes.
2) Prefer the provided mock senders for testing to avoid accidental data leaks.
3) Run the pipeline in a sandbox or isolated environment if you plan to use service backends or custom command hooks.
4) Validate any translation_service_url you provide and avoid pointing it at untrusted endpoints.
5) Install the external piper binary and the Python dependencies only from trusted channels.
When reviewing or hardening the code, look for the places where subprocess.run is called with shell=True and unescaped string formatting.
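As an illustration of point 4, a translation service URL could be checked against an allowlist before the pipeline uses it. This is only a sketch: the function name `is_trusted_service_url` and the `TRUSTED_HOSTS` set are hypothetical and are not part of the skill's code.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; replace with the hosts you actually trust.
TRUSTED_HOSTS = {"localhost", "127.0.0.1"}


def is_trusted_service_url(url: str) -> bool:
    """Return True only for http(s) URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    # hostname is None for URLs without a network location (e.g. file: URLs).
    return parsed.hostname in TRUSTED_HOSTS
```

A caller might refuse to start the service backend when the check fails, falling back to the mock or manual translation path instead.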
Capability Analysis
Type: OpenClaw Skill
Name: speech-translation
Version: 1.0.0
The skill bundle contains multiple instances of potential command injection vulnerabilities where external commands are executed via `subprocess.run(..., shell=True)` using templates provided by environment variables or CLI arguments (specifically in `scripts/send_audio.py`, `scripts/send_text.py`, and `scripts/voice_translate_app/notifier.py`). While these are designed as notification hooks for the translation pipeline, the lack of strict validation on the command templates poses a significant RCE risk. However, the code logic and agent instructions in `SKILL.md` appear consistent with the stated purpose of audio processing and translation, with no clear evidence of intentional malice or data exfiltration.
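One common way to reduce this kind of risk is to split the command template into tokens first, substitute values into individual tokens, and run the result without a shell. The sketch below is not the skill's code: `build_argv`, `run_hook`, and the `{text}` placeholder are hypothetical names chosen for illustration, and the real templates may use different placeholders.

```python
import shlex
import subprocess
from typing import List


def build_argv(template: str, **values: str) -> List[str]:
    """Split the template into tokens, then fill placeholders per token.

    Because splitting happens before substitution, a substituted value
    always stays inside a single argument, whatever characters it contains.
    """
    return [token.format(**values) for token in shlex.split(template)]


def run_hook(template: str, **values: str) -> subprocess.CompletedProcess:
    """Run a notification hook as an argument list, without a shell."""
    argv = build_argv(template, **values)
    return subprocess.run(
        argv, shell=False, check=True, capture_output=True, text=True
    )
```

With this approach, a value such as `hi; rm -rf /` is passed to the program as one literal argument rather than being interpreted by a shell. The template itself still decides which program runs, so it remains sensitive configuration.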
Capability Assessment
Purpose & Capability
The name/description match the included Python pipeline: transcription (faster-whisper or mock), translation (agent-LLM, manual, or HTTP service), and TTS (piper or mock). There are no unrelated required credentials, binaries, or config paths; the code and docs consistently implement the described workflows.
Instruction Scope
SKILL.md stays on-purpose (chat-native vs local pipeline, LLM-assisted default). It explicitly supports notification hooks that run external commands to report stages; those hooks (and send_text/send_audio helpers) cause the runtime to execute arbitrary shell commands when configured. This is expected for a pipeline but expands the skill's runtime actions beyond pure file IO/network calls — treat notification command templates as sensitive configuration.
Install Mechanism
No install spec (instruction-only + bundled scripts). That lowers risk because nothing is downloaded or installed by the registry; the repo only contains local Python scripts and shell wrappers. The runtime does require optional third-party packages (faster-whisper, requests) and an external 'piper' binary for full functionality, per the README.
Credentials
The skill declares no required env vars or credentials, which matches the code. However several scripts read optional environment variables (VOICE_TRANSLATE_TEXT_COMMAND_TEMPLATE, VOICE_TRANSLATE_AUDIO_COMMAND_TEMPLATE) as command templates. These are not required but, if set, control what external commands are run and could be used to exfiltrate data if misconfigured or supplied by an attacker.
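Before running the pipeline, it may help to check whether either of these variables is set in the current environment. The following is a minimal sketch; the helper name `configured_templates` is hypothetical, while the two variable names are the ones listed above.

```python
import os
from typing import Dict, Mapping, Optional

# Environment variables the review identifies as command templates.
TEMPLATE_VARS = (
    "VOICE_TRANSLATE_TEXT_COMMAND_TEMPLATE",
    "VOICE_TRANSLATE_AUDIO_COMMAND_TEMPLATE",
)


def configured_templates(
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return the command-template variables that are set and non-empty."""
    env = os.environ if environ is None else environ
    return {name: env[name] for name in TEMPLATE_VARS if env.get(name)}


if __name__ == "__main__":
    # Print each configured template so it can be reviewed before a run.
    for name, value in configured_templates().items():
        print(f"{name} = {value!r}")
```

An empty result means no command hook is configured through these two variables; CLI arguments would need a separate check.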
Persistence & Privilege
The skill does not request persistent/always-on inclusion, does not modify other skills or system settings, and does not demand elevated privileges. It runs as a normal on-demand pipeline invoked by the agent/user.
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install speech-translation
- After installation, invoke the skill by name or use /speech-translation
- Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
- Initial release of the voice translation skill.
- Supports audio transcription with Whisper or faster-whisper, translation by the current agent model, and speech synthesis using Piper, OpenClaw tts, or a mock backend.
- Offers two modes: chat-native voice translation and a deterministic local file-based pipeline.
- Ensures consistent output order: transcript, translation, then translated audio.
- Includes resources, references, and scripts for setup, orchestration, and backend selection.
- Designed for both interactive chat and automated batch workflows.
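The fixed output order described in these notes (transcript, translation, then translated audio) can be pictured as three stages called in sequence. The sketch below uses made-up mock backends and hypothetical names (`run_pipeline`, `PipelineResult`, `mock_transcribe`, and so on); it does not reproduce the skill's actual modules or signatures.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PipelineResult:
    transcript: str
    translation: str
    audio: bytes


def run_pipeline(
    audio_in: bytes,
    transcribe: Callable[[bytes], str],
    translate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> PipelineResult:
    """Run the three stages in a fixed order and collect their outputs."""
    transcript = transcribe(audio_in)
    translation = translate(transcript)
    audio_out = synthesize(translation)
    return PipelineResult(transcript, translation, audio_out)


# Hypothetical mock backends, useful for testing without real models.
def mock_transcribe(audio: bytes) -> str:
    return "hello"


def mock_translate(text: str) -> str:
    return f"[translated] {text}"


def mock_synthesize(text: str) -> bytes:
    return text.encode("utf-8")
```

Passing the backends in as callables keeps the ordering logic independent of whether a real or mock backend is used for each stage.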
Frequently Asked Questions
What is speech-translation?
Build, adapt, or run an audio-processing workflow that takes spoken audio, transcribes it with Whisper or faster-whisper, translates the transcript using the... It is an AI Agent Skill for Claude Code / OpenClaw, with 197 downloads so far.
How do I install speech-translation?
Run "/install speech-translation" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is speech-translation free?
Yes, speech-translation is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does speech-translation support?
speech-translation is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created speech-translation?
It is built and maintained by decin (@decin); the current version is v1.0.0.