Audio Command Handler

Author: smallKeyboy · v1.0.0 · MIT-0
Tags: cross-platform · ⚠ suspicious
Total downloads: 49 · Favorites: 0 · Current installs: 0 · Versions: 1
Install in OpenClaw
/install audio-command-handler
Description
Handle audio messages as commands. When user sends an audio file (WAV/PCM/MP3), transcribe it using iFlytek Speed Transcription and either (1) execute the tr...
Usage (SKILL.md)

Audio Command Handler

Process audio messages and execute them as commands.

Workflow

Scenario 1: Audio Only (No Text)

User sends an audio file without any text instruction:

  1. Transcribe the audio using the ifly-speed-transcription skill
  2. Use the transcription as the command: execute it as if the user had typed it
  3. Return the result directly: no file upload is needed, regardless of length

Scenario 2: Audio + Text Command

User sends an audio file WITH a text instruction:

  1. Transcribe the audio using the ifly-speed-transcription skill
  2. Execute the text command, passing the transcription as context/input
  3. Check result length:
    • If ≤ 58 characters: return result directly
    • If > 58 characters: save to file, upload via uploader skill, return URL
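
The two scenarios above can be sketched as a single routing function. This is a minimal illustration, not the skill's shipped code: `handle_audio_message` and its `transcribe`, `execute`, and `upload` callables are hypothetical names, injected so the branching can be read (and tested) without the real skills installed. The `upload` callable is assumed to save the text, upload it, and return a URL.

```python
from typing import Callable, Optional

RESULT_THRESHOLD = 58  # characters; the notes describe this value as configurable


def handle_audio_message(
    audio_path: str,
    text_command: Optional[str],
    transcribe: Callable[[str], str],
    execute: Callable[[str, Optional[str]], str],
    upload: Callable[[str], str],
    threshold: int = RESULT_THRESHOLD,
) -> str:
    """Route an audio message through Scenario 1 or Scenario 2."""
    transcript = transcribe(audio_path)

    if not text_command or not text_command.strip():
        # Scenario 1: the transcript itself is the command; never upload.
        return execute(transcript, None)

    # Scenario 2: run the text command with the transcript as context.
    result = execute(text_command, transcript)
    if len(result) <= threshold:
        return result  # short result: return it directly
    return upload(result)  # long result: hand off to the uploader, return its URL
```

Note that `len()` counts Unicode code points, so a 58-character Chinese summary and a 58-character English one are treated the same way in this sketch.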

Quick Reference

Transcription

python3 ~/.openclaw/workspace/skills/ifly-speed-transcription/scripts/transcribe.py /path/to/audio.mp3
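
The transcription command above could be wrapped from Python roughly as follows. This is a sketch under two assumptions that the source does not confirm: that `transcribe.py` prints the transcript to stdout, and that it exits non-zero on failure. The helper names `build_transcribe_command` and `transcribe` are hypothetical. The sketch uses `sys.executable` in place of a literal `python3` so the same interpreter is reused.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Path taken from the quick reference; "~" is expanded at import time.
TRANSCRIBE_SCRIPT = Path(
    "~/.openclaw/workspace/skills/ifly-speed-transcription/scripts/transcribe.py"
).expanduser()


def build_transcribe_command(audio_path: str, script: Path = TRANSCRIBE_SCRIPT) -> List[str]:
    """Build the argument list; no shell is involved, so the path is not re-parsed."""
    return [sys.executable, str(script), str(Path(audio_path).expanduser())]


def transcribe(audio_path: str, script: Path = TRANSCRIBE_SCRIPT, timeout: float = 600.0) -> str:
    """Run the transcription script and return its stdout (assumed to be the transcript)."""
    completed = subprocess.run(
        build_transcribe_command(audio_path, script),
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError on a non-zero exit code
        timeout=timeout,
    )
    return completed.stdout.strip()
```

Passing the arguments as a list rather than a shell string keeps a file name containing spaces or shell metacharacters from being interpreted as extra commands.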

Upload

python3 ~/.openclaw/workspace/skills/uploader/scripts/upload_media.py /path/to/file.txt
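
The save-then-upload step might look like the sketch below. It assumes, without confirmation from the source, that `upload_media.py` prints the resulting URL to stdout. The helper names and the `audio_result_<timestamp>.txt` file name are hypothetical; only the workspace directory and the script path come from this document.

```python
import subprocess
import sys
import time
from pathlib import Path

WORKSPACE = Path("~/.openclaw/workspace").expanduser()
UPLOAD_SCRIPT = WORKSPACE / "skills" / "uploader" / "scripts" / "upload_media.py"


def save_result(result: str, workspace: Path = WORKSPACE) -> Path:
    """Write the result text to a new file in the workspace and return its path."""
    workspace.mkdir(parents=True, exist_ok=True)
    path = workspace / f"audio_result_{time.time_ns()}.txt"  # hypothetical naming scheme
    path.write_text(result, encoding="utf-8")
    return path


def upload_result(
    result: str,
    workspace: Path = WORKSPACE,
    script: Path = UPLOAD_SCRIPT,
    timeout: float = 120.0,
) -> str:
    """Save the result, run the uploader on it, and return the uploader's stdout."""
    path = save_result(result, workspace)
    completed = subprocess.run(
        [sys.executable, str(script), str(path)],
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return completed.stdout.strip()  # assumed to be the download URL
```

The saved file stays in the workspace after the upload in this sketch; whether it should be deleted afterwards depends on your retention requirements.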

Execution Flow

┌─────────────────┐
│  Audio Message  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   Transcribe    │
│ (ifly-speed-    │
│  transcription) │
└────────┬────────┘
         │
         ▼
┌─────────────────┐     NO      ┌──────────────┐
│ Has Text Cmd?   │────────────►│ Use Transcrip│
└────────┬────────┘              │ as Command   │
         │ YES                   └──────┬───────┘
         ▼                              │
┌─────────────────┐                     │
│ Execute Text    │                     │
│ Cmd with Trans  │                     │
│ Context         │                     │
└────────┬────────┘                     │
         │                              │
         │                              ▼
         │                    ┌──────────────┐
         │                    │ Return Direct│
         │                    │ to User      │
         │                    │ (no upload)  │
         │                    └──────────────┘
         │
         ▼
┌─────────────────┐
│ Result > 58 ch? │
└────────┬────────┘
         │
         ├───────────────────────────┐
         │ YES                       │ NO
         ▼                           ▼
┌─────────────────┐         ┌──────────────┐
│ Save to File    │         │ Return Direct│
│ Upload via      │         │ to User      │
│ uploader skill  │         └──────────────┘
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Return URL to   │
│ User            │
└─────────────────┘

Example Scenarios

Example 1: Audio Only

User sends: 🎤 audio file (speech: "Check tomorrow's weather in Shanghai for me")

Flow:

  1. Transcribe → "Check tomorrow's weather in Shanghai for me"
  2. Execute as command → check Shanghai weather for tomorrow
  3. Return weather info directly (no upload, regardless of length)

Example 2: Audio + Command (Short Result)

User sends: 🎤 audio file + text "Summarize this recording for me"

Flow:

  1. Transcribe audio → get text content
  2. Execute "Summarize this recording for me" with transcription as context
  3. If summary ≤ 58 chars → return directly

Example 3: Audio + Command (Long Result)

User sends: 🎤 audio file + text "Write an article based on this recording"

Flow:

  1. Transcribe audio → get text content
  2. Execute command with transcription as context
  3. Result > 58 chars → save to file, upload
  4. Return: "Content generated. Download link: https://..."

Notes

  • Audio formats: WAV, PCM, MP3 (16kHz, 16-bit, mono recommended)
  • Max duration: 5 hours
  • Language support: Chinese, English, 202+ Chinese dialects
  • Result threshold: 58 characters (configurable per implementation)
  • File location: Saved to ~/.openclaw/workspace/ before upload
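
For WAV input, the recommended parameters in the notes above can be checked before transcription with Python's standard `wave` module. This is an illustrative pre-flight check, not part of the skill; `check_wav` is a hypothetical helper, and it covers WAV only (raw PCM and MP3 carry no header that `wave` can read).

```python
import wave
from typing import List

MAX_DURATION_SECONDS = 5 * 60 * 60  # 5 hours, per the notes


def check_wav(path: str) -> List[str]:
    """Return warnings for a WAV file; an empty list means it matches the notes."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        duration = wav.getnframes() / rate

    warnings = []
    if rate != 16000:
        warnings.append(f"sample rate is {rate} Hz, 16000 Hz recommended")
    if sample_width != 2:
        warnings.append(f"sample width is {sample_width * 8}-bit, 16-bit recommended")
    if channels != 1:
        warnings.append(f"{channels} channels, mono recommended")
    if duration > MAX_DURATION_SECONDS:
        warnings.append(f"duration {duration:.0f}s exceeds the 5-hour maximum")
    return warnings
```

The first three checks are recommendations in the notes, so they are reported as warnings rather than raised as errors.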
Security recommendations
Key things to consider before installing:

  • Mismatch between docs and code: the README/skill description says it will "execute" transcribed audio as commands, but scripts/handle_audio.py only transcribes and prints/saves the result; it does not execute arbitrary shell commands. If you expected automated execution, do not assume it exists. Conversely, if you worry about remote execution, the code is safer than the docs claim, but the docs could cause an agent to behave dangerously when chained with other skills.
  • External dependencies: this skill calls two other local scripts in ~/.openclaw/workspace/skills (ifly-speed-transcription and uploader). Verify those scripts exist, inspect them, and confirm where uploader actually sends files. The uploader could exfiltrate sensitive transcriptions to external endpoints.
  • Data exposure: transcription results are written into HTML files in ~/.openclaw/workspace and then handed to the uploader. If your audio contains sensitive data, confirm retention and access controls for that workspace and the uploader's storage.
  • Execution risk in orchestration: although this script does not exec transcription text, SKILL.md instructs agents to "use transcription as the command." If an agent or orchestrator follows the SKILL.md instead of the included script, that could lead to executing user-supplied text. Ensure your agent enforces safe execution policies and does not run arbitrary text as shell commands.
  • Recommended actions: inspect the ifly-speed-transcription and uploader skills (particularly uploader backend endpoints and auth), confirm the uploader's destination and access controls, and decide whether you need stricter sanitization or to remove auto-upload behavior. If you will rely on automatic command execution, require additional code review and strict input sanitization before enabling that behavior.
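
One way to act on the "safe execution policies" advice is to map transcripts onto a fixed set of pre-registered handlers instead of passing them to a shell. The sketch below illustrates that allowlist idea only; `make_dispatcher` and the prefix-matching rule are hypothetical and are not part of this skill or of OpenClaw.

```python
from typing import Callable, Dict, Optional


def make_dispatcher(
    handlers: Dict[str, Callable[[str], str]],
) -> Callable[[str], Optional[str]]:
    """Build a dispatcher that only ever runs handlers registered up front."""

    def dispatch(transcript: str) -> Optional[str]:
        text = transcript.strip().lower()
        for prefix, handler in handlers.items():
            if text.startswith(prefix):
                # Pass only the remainder as an argument; nothing reaches a shell.
                return handler(text[len(prefix):].strip())
        return None  # unrecognized request: refuse instead of executing it

    return dispatch


# Usage: a single allowed intent; everything else is rejected.
dispatch = make_dispatcher({"weather": lambda city: f"weather lookup for {city}"})
```

With this design, a transcript such as "rm -rf /" matches no registered prefix and is dropped, while "Weather Shanghai" reaches the one handler that was explicitly allowed.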
Functional analysis
Type: OpenClaw Skill
Name: audio-command-handler
Version: 1.0.0

The skill is classified as suspicious because it explicitly instructs the AI agent to execute transcribed audio content as system commands (SKILL.md), creating a significant Remote Command Execution (RCE) vulnerability. While this aligns with the stated purpose of an 'audio command handler,' it lacks any sanitization or safety constraints on the transcribed input. Furthermore, the script `scripts/handle_audio.py` facilitates the automatic upload of command results exceeding a very low threshold (58 characters) to an external service via a secondary `uploader` skill, which could be leveraged for data exfiltration of command outputs or system information.
Capability assessment
Purpose & Capability
The description and SKILL.md say audio can be transcribed and executed as commands. The shipped script transcribes audio, prepares output, and may save/upload results, but it does NOT actually execute arbitrary commands derived from the transcription. That is a substantive mismatch: the skill advertises command execution capability that the code does not implement.
Instruction Scope
SKILL.md instructs agents to run the ifly-speed-transcription and uploader scripts from specific workspace paths and describes executing transcriptions as commands. Those instructions grant broad discretion (execute user-provided text as commands) which is dangerous in general. The actual script does not perform shell execution, but the instructions still direct the agent to use other local scripts and to upload potentially sensitive content; this scope is broader than just transcription.
Install Mechanism
No install spec or remote downloads: the skill is instruction-only with a local Python script. Nothing is pulled from external URLs during install, so install-time risk is low.
Credentials
The skill declares no credentials or env vars. It does, however, depend on external skills (ifly-speed-transcription and uploader) located under ~/.openclaw/workspace. Those helper scripts (not included here) may require credentials or upload targets; this skill will forward transcript data to them, so secret handling/exfiltration risk depends on those other skills.
Persistence & Privilege
No elevated privileges requested: always is false, the skill does not modify other skill configs, and only writes files under the user's workspace directory. It uses subprocess to run other local scripts but does not install persistent agents or alter system-wide settings.
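How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install audio-command-handler
  3. After installation, invoke the skill by name or trigger it with /audio-command-handler
  4. Provide the inputs described in the skill's parameter documentation to get structured output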
Version history
v1.0.0
  • Initial release of the Audio Command Handler skill.
  • Supports audio message transcription via iFlytek Speed Transcription (WAV/PCM/MP3 files).
  • Executes transcribed audio as commands if no accompanying text is provided.
  • If both audio and a text command are present, uses the transcription as context for the command.
  • Automatically saves and uploads results longer than 58 characters when processing audio + text command scenarios.
  • Supports Chinese, English, and 202+ Chinese dialects; audio up to 5 hours.
Metadata
Slug audio-command-handler
Version 1.0.0
License MIT-0
Total installs 0
Current installs 0
Versions 1
FAQ

What is Audio Command Handler?

Handle audio messages as commands. When user sends an audio file (WAV/PCM/MP3), transcribe it using iFlytek Speed Transcription and either (1) execute the tr... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 49 total downloads so far.

How do I install Audio Command Handler?

Run the command "/install audio-command-handler" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is required.

Is Audio Command Handler free?

Yes. Audio Command Handler is completely free, released under the MIT-0 license, and may be freely downloaded, installed, and used.

Which platforms does Audio Command Handler support?

Audio Command Handler is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Audio Command Handler?

It is developed and maintained by smallKeyboy (@smallkeyboy); the current version is v1.0.0.
