smallkeyboy

Audio Command Handler

by smallKeyboy · GitHub · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Downloads: 49 · Stars: 0 · Active Installs: 0 · Versions: 1
Install in OpenClaw
/install audio-command-handler
Description
Handle audio messages as commands. When the user sends an audio file (WAV/PCM/MP3), transcribe it using iFlytek Speed Transcription and either (1) execute the tr...
README (SKILL.md)

Audio Command Handler

Process audio messages and execute them as commands.

Workflow

Scenario 1: Audio Only (No Text)

User sends an audio file without any text instruction:

  1. Transcribe the audio using the ifly-speed-transcription skill
  2. Use the transcription as the command: execute it as if the user had typed it
  3. Return the result directly; no file upload is needed, regardless of length

Scenario 2: Audio + Text Command

User sends an audio file WITH a text instruction:

  1. Transcribe the audio using the ifly-speed-transcription skill
  2. Execute the text command, using the transcription as context/input
  3. Check the result length:
    • If ≤ 58 characters: return the result directly
    • If > 58 characters: save it to a file, upload it via the uploader skill, and return the URL
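The delivery rule across the two scenarios fits in a small predicate. This is a minimal sketch, not the skill's actual code: `needs_upload` is a hypothetical name, and it assumes "characters" means Unicode code points as counted by Python's `len()`, which matters for Chinese output where each character takes three bytes in UTF-8.

```python
# Hypothetical helper illustrating the delivery rule in SKILL.md.
# Assumption: "characters" means Unicode code points as counted by len(),
# not UTF-8 bytes, so one Chinese character counts as 1.

RESULT_THRESHOLD = 58  # SKILL.md default; "configurable per implementation"


def needs_upload(result: str, has_text_command: bool,
                 threshold: int = RESULT_THRESHOLD) -> bool:
    """Return True if the result should be saved to a file and uploaded."""
    if not has_text_command:
        # Scenario 1 (audio only): always return directly, whatever the length.
        return False
    # Scenario 2 (audio + text command): upload only when strictly longer than
    # the threshold, so a result of exactly 58 characters is returned directly.
    return len(result) > threshold
```

For example, `needs_upload("天" * 58, True)` is `False`: the string is 58 code points long even though it occupies 174 bytes in UTF-8.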

Quick Reference

Transcription

python3 ~/.openclaw/workspace/skills/ifly-speed-transcription/scripts/transcribe.py /path/to/audio.mp3
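An orchestrator written in Python might wrap the command above with `subprocess.run`. This sketch assumes the script prints the transcript to stdout and exits non-zero on failure; the source confirms neither, and `run_skill_script` and `transcribe` are hypothetical names. Passing an argument list instead of `shell=True` keeps the audio path from being interpreted by a shell.

```python
import subprocess
import sys
from pathlib import Path

SKILLS_DIR = Path("~/.openclaw/workspace/skills").expanduser()


def run_skill_script(script: Path, arg: str) -> str:
    """Run a skill's Python script with one argument and return its stdout.

    Assumption: the script writes its result to stdout and signals failure
    with a non-zero exit code.
    """
    completed = subprocess.run(
        [sys.executable, str(script), arg],  # current interpreter stands in for python3
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError on a non-zero exit
    )
    return completed.stdout.strip()


def transcribe(audio_path: str) -> str:
    """Call the ifly-speed-transcription script at the path shown above."""
    script = SKILLS_DIR / "ifly-speed-transcription" / "scripts" / "transcribe.py"
    return run_skill_script(script, audio_path)
```

Usage would be `transcribe("/path/to/audio.mp3")`. No timeout is set because the Notes allow audio up to 5 hours; a caller that needs one can add `timeout=` to `subprocess.run`.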

Upload

python3 ~/.openclaw/workspace/skills/uploader/scripts/upload_media.py /path/to/file.txt
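The source does not document the uploader's output format. Assuming it prints the resulting link somewhere in stdout, a caller could extract the first http(s) URL as sketched below; `extract_url` is a hypothetical helper and the regular expression is deliberately loose.

```python
import re
from typing import Optional

# Matches http:// or https:// followed by non-whitespace characters.
# Assumption: the uploader prints the share link as a plain URL on stdout.
_URL_RE = re.compile(r"https?://\S+")


def extract_url(stdout: str) -> Optional[str]:
    """Return the first http(s) URL found in the uploader's output, or None."""
    match = _URL_RE.search(stdout)
    if match is None:
        return None
    # Drop trailing punctuation that often follows a URL in log lines.
    return match.group(0).rstrip(".,;)\"'")
```

Returning `None` instead of raising lets the caller decide how to report a failed upload to the user.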

Execution Flow

┌─────────────────┐
│  Audio Message  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   Transcribe    │
│ (ifly-speed-    │
│  transcription) │
└────────┬────────┘
         │
         ▼
┌─────────────────┐     NO      ┌─────────────────┐
│ Has Text Cmd?   │────────────►│ Use Transcript  │
└────────┬────────┘             │ as Command      │
         │ YES                  └────────┬────────┘
         ▼                               │
┌─────────────────┐                      ▼
│ Execute Text    │             ┌─────────────────┐
│ Command with    │             │ Return Directly │
│ Transcript as   │             │ to User         │
│ Context         │             │ (no upload)     │
└────────┬────────┘             └─────────────────┘
         │
         ▼
┌─────────────────┐
│ Result > 58 ch? │
└────────┬────────┘
         │
         ├───────────────────────────────┐
         │ YES                           │ NO
         ▼                               ▼
┌─────────────────┐             ┌─────────────────┐
│ Save to File,   │             │ Return Directly │
│ Upload via      │             │ to User         │
│ uploader skill  │             └─────────────────┘
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Return URL to   │
│ User            │
└─────────────────┘
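The flowchart can be sketched end to end in Python. To stay self-contained, this version takes the transcription, execution, and upload steps as callables instead of invoking the real skill scripts; `handle_audio`, its parameters, and the `result_<timestamp>.txt` file name are hypothetical. Here `execute` stands for whatever the agent does with a typed request; it is not meant to be a shell call.

```python
import time
from pathlib import Path
from typing import Callable, Optional

THRESHOLD = 58  # SKILL.md default; "configurable per implementation"


def handle_audio(
    audio_path: str,
    text_command: Optional[str],
    transcribe: Callable[[str], str],
    execute: Callable[[str, Optional[str]], str],  # (command, context) -> result
    upload: Callable[[Path], str],  # file -> URL
    workspace: Path,
) -> str:
    """Follow the SKILL.md flowchart and return what should be sent to the user."""
    transcript = transcribe(audio_path)

    if not text_command:
        # NO branch: the transcript itself is the command; return directly.
        return execute(transcript, None)

    # YES branch: run the text command with the transcript as context.
    result = execute(text_command, transcript)
    if len(result) <= THRESHOLD:
        return result

    # Long result: save under the workspace, upload, and return the URL.
    workspace.mkdir(parents=True, exist_ok=True)
    out_file = workspace / f"result_{time.time_ns()}.txt"
    out_file.write_text(result, encoding="utf-8")
    return upload(out_file)
```

Injecting the three steps keeps the branching logic testable with stubs, independent of iFlytek credentials or the uploader's backend.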

Example Scenarios

Example 1: Audio Only

User sends: 🎤 audio file (speech: "Check tomorrow's weather in Shanghai for me")

Flow:

  1. Transcribe → "Check tomorrow's weather in Shanghai for me"
  2. Execute as command → check Shanghai weather for tomorrow
  3. Return weather info directly (no upload, regardless of length)

Example 2: Audio + Command (Short Result)

User sends: 🎤 audio file + text "Summarize this recording for me"

Flow:

  1. Transcribe audio → get text content
  2. Execute "Summarize this recording for me" with the transcription as context
  3. If summary ≤ 58 chars → return directly

Example 3: Audio + Command (Long Result)

User sends: 🎤 audio file + text "Write an article based on this recording for me"

Flow:

  1. Transcribe audio → get text content
  2. Execute command with transcription as context
  3. Result > 58 chars → save to file, upload
  4. Return: "Content generated. Download link: https://..."

Notes

  • Audio formats: WAV, PCM, MP3 (16kHz, 16-bit, mono recommended)
  • Max duration: 5 hours
  • Language support: Chinese, English, 202+ Chinese dialects
  • Result threshold: 58 characters (configurable per implementation)
  • File location: Saved to ~/.openclaw/workspace/ before upload
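For WAV input, the recommended parameters and the duration limit above can be checked with Python's standard `wave` module before calling the transcription script. This is a sketch under assumptions: `check_wav` is a hypothetical helper, it only reads WAV headers (raw PCM has no header, and MP3 needs a third-party decoder), and it treats the format recommendations as warnings while enforcing the 5-hour limit as an error.

```python
import wave
from typing import List

MAX_SECONDS = 5 * 60 * 60  # "Max duration: 5 hours"


def check_wav(path: str) -> List[str]:
    """Return warnings for a WAV file; raise ValueError if it is too long."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width_bits = wav.getsampwidth() * 8
        channels = wav.getnchannels()
        seconds = wav.getnframes() / rate

    if seconds > MAX_SECONDS:
        raise ValueError(f"audio is {seconds:.0f} s long; the limit is {MAX_SECONDS} s")

    warnings = []
    if rate != 16000:
        warnings.append(f"sample rate is {rate} Hz; 16 kHz is recommended")
    if width_bits != 16:
        warnings.append(f"sample width is {width_bits}-bit; 16-bit is recommended")
    if channels != 1:
        warnings.append(f"{channels} channels; mono is recommended")
    return warnings
```

An empty list means the file already matches the recommended 16 kHz, 16-bit, mono format.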
Usage Guidance
Key things to consider before installing:
  • Mismatch between docs and code: the README/skill description says it will "execute" transcribed audio as commands, but scripts/handle_audio.py only transcribes and prints/saves the result; it does not execute arbitrary shell commands. If you expected automated execution, do not assume it exists. Conversely, if you are worried about remote execution, the code is safer than the docs suggest, but the docs could cause an agent to behave dangerously when chained with other skills.
  • External dependencies: this skill calls two other local scripts in ~/.openclaw/workspace/skills (ifly-speed-transcription and uploader). Verify that those scripts exist, inspect them, and confirm where uploader actually sends files; the uploader could exfiltrate sensitive transcriptions to external endpoints.
  • Data exposure: transcription results are written into HTML files in ~/.openclaw/workspace and then handed to the uploader. If your audio contains sensitive data, confirm retention and access controls for that workspace and for the uploader's storage.
  • Execution risk in orchestration: although this script does not execute transcription text, SKILL.md instructs agents to "use transcription as the command." If an agent or orchestrator follows SKILL.md instead of the included script, it could end up executing user-supplied text. Ensure your agent enforces safe execution policies and does not run arbitrary text as shell commands.
  • Recommended actions: inspect the ifly-speed-transcription and uploader skills (particularly the uploader's backend endpoints and auth), confirm the uploader's destination and access controls, and decide whether you need stricter sanitization or should remove the auto-upload behavior. If you will rely on automatic command execution, require additional code review and strict input sanitization before enabling it.
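One way to follow the advice not to run arbitrary text as shell commands is to map a transcript onto a fixed allowlist of actions and reject everything else. This is a hypothetical illustration, not part of the skill: the normalization rules and the `ALLOWED_ACTIONS` table are assumptions, and a matched action comes back as a fixed argv list intended for `subprocess.run(argv)` without `shell=True`.

```python
import unicodedata
from typing import Dict, List, Optional

# Hypothetical allowlist: normalized phrase -> fixed argv.
# The argv is never assembled from user text.
ALLOWED_ACTIONS: Dict[str, List[str]] = {
    "show disk usage": ["df", "-h"],
    "show uptime": ["uptime"],
}


def normalize(text: str) -> str:
    """Apply NFKC, case-fold, drop punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).casefold()
    kept = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    return " ".join(kept.split())


def resolve_action(transcript: str) -> Optional[List[str]]:
    """Return a copy of the allowlisted argv for this transcript, or None."""
    argv = ALLOWED_ACTIONS.get(normalize(transcript))
    return list(argv) if argv is not None else None
```

Because only exact phrase matches succeed, appended payloads such as "show disk usage; rm -rf /" resolve to `None` instead of being passed through.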
Capability Analysis
Type: OpenClaw Skill
Name: audio-command-handler
Version: 1.0.0
The skill is classified as suspicious because it explicitly instructs the AI agent to execute transcribed audio content as system commands (SKILL.md), creating a significant Remote Command Execution (RCE) vulnerability. While this aligns with the stated purpose of an 'audio command handler,' it lacks any sanitization or safety constraints on the transcribed input. Furthermore, the script `scripts/handle_audio.py` facilitates the automatic upload of command results exceeding a very low threshold (58 characters) to an external service via a secondary `uploader` skill, which could be leveraged for data exfiltration of command outputs or system information.
Capability Assessment
Purpose & Capability
The description and SKILL.md say audio can be transcribed and executed as commands. The shipped script transcribes audio, prepares output, and may save/upload results, but it does NOT actually execute arbitrary commands derived from the transcription. That is a substantive mismatch: the skill advertises command execution capability that the code does not implement.
Instruction Scope
SKILL.md instructs agents to run the ifly-speed-transcription and uploader scripts from specific workspace paths and describes executing transcriptions as commands. Those instructions grant broad discretion (executing user-provided text as commands), which is dangerous in general. The actual script does not perform shell execution, but the instructions still direct the agent to use other local scripts and to upload potentially sensitive content; this scope is broader than transcription alone.
Install Mechanism
No install spec or remote downloads: the skill is instruction-only with a local Python script. Nothing is pulled from external URLs during install, so install-time risk is low.
Credentials
The skill declares no credentials or env vars. It does, however, depend on external skills (ifly-speed-transcription and uploader) located under ~/.openclaw/workspace. Those helper scripts (not included here) may require credentials or upload targets; this skill will forward transcript data to them, so secret handling/exfiltration risk depends on those other skills.
Persistence & Privilege
No elevated privileges are requested: the `always` flag is false, the skill does not modify other skills' configs, and it only writes files under the user's workspace directory. It uses subprocess to run other local scripts but does not install persistent agents or alter system-wide settings.
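An orchestrator can enforce the "only writes under the workspace" property itself instead of trusting it. A minimal sketch, assuming Python 3.9+ for `Path.is_relative_to`; `safe_output_path` is a hypothetical helper, not code from `scripts/handle_audio.py`.

```python
from pathlib import Path


def safe_output_path(workspace: Path, name: str) -> Path:
    """Resolve name inside workspace, refusing anything that escapes it."""
    root = workspace.expanduser().resolve()
    # resolve() collapses ".." and follows symlinks; an absolute name
    # replaces root entirely and is then rejected by the check below.
    candidate = (root / name).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"{name!r} resolves outside {root}")
    return candidate
```

Calling it as `safe_output_path(Path("~/.openclaw/workspace"), file_name)` before every write would reject names such as `../escape.txt` or `/etc/passwd`.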
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install audio-command-handler
  3. After installation, invoke the skill by name or use /audio-command-handler
  4. Send an audio file (WAV/PCM/MP3), optionally with a text instruction, and receive the result directly or as a download URL
Version History
v1.0.0
  • Initial release of the Audio Command Handler skill.
  • Supports audio message transcription via iFlytek Speed Transcription (WAV/PCM/MP3 files).
  • Executes transcribed audio as commands if no accompanying text is provided.
  • If both audio and a text command are present, uses the transcription as context for the command.
  • Automatically saves and uploads results longer than 58 characters when processing audio + text command scenarios.
  • Supports Chinese, English, and 202+ Chinese dialects; audio up to 5 hours.
Metadata
Slug: audio-command-handler
Version: 1.0.0
License: MIT-0
All-time Installs: 0
Active Installs: 0
Total Versions: 1
Frequently Asked Questions

What is Audio Command Handler?

Handle audio messages as commands. When the user sends an audio file (WAV/PCM/MP3), transcribe it using iFlytek Speed Transcription and either (1) execute the tr... It is an AI Agent Skill for Claude Code / OpenClaw, with 49 downloads so far.

How do I install Audio Command Handler?

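Run "/install audio-command-handler" in the OpenClaw or Claude Code chat to install it in one step. The skill itself needs no extra setup, but it relies on the ifly-speed-transcription and uploader skills being present under ~/.openclaw/workspace/skills.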

Is Audio Command Handler free?

Yes, Audio Command Handler is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Audio Command Handler support?

Audio Command Handler is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Audio Command Handler?

It is built and maintained by smallKeyboy (@smallkeyboy); the current version is v1.0.0.
