Description

Handle audio messages as commands. When user sends an audio file (WAV/PCM/MP3), transcribe it using iFlytek Speed Transcription and either (1) execute the tr...

Usage (SKILL.md)

Audio Command Handler

Name: Audio Command Handler
Author: smallkeyboy

Process audio messages and execute them as commands.

Workflow

Scenario 1: Audio Only (No Text)

User sends an audio file without any text instruction:

Transcribe the audio using the ifly-speed-transcription skill
Use the transcription as the command: execute it as if the user had typed it
Return the result directly: no file upload is needed, regardless of length

Scenario 2: Audio + Text Command

User sends an audio file WITH a text instruction:

Transcribe the audio using the ifly-speed-transcription skill
Execute the text command with the transcription as context/input
Check the result length:
- If ≤ 58 characters: return the result directly
- If > 58 characters: save it to a file, upload it via the uploader skill, and return the URL
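A minimal sketch of the delivery rule in the two scenarios, assuming "characters" means Unicode code points as counted by Python's len() (so each Chinese character counts as one). The function name should_upload is hypothetical; the 58-character default comes from the text above, which elsewhere notes it is configurable per implementation.

```python
DEFAULT_THRESHOLD = 58  # documented default; configurable per implementation


def should_upload(result: str, has_text_command: bool,
                  threshold: int = DEFAULT_THRESHOLD) -> bool:
    """Return True if a result should be saved to a file and uploaded.

    Scenario 1 (audio only): never upload, regardless of length.
    Scenario 2 (audio + text command): upload only when len(result) > threshold.
    Assumption: length is measured in Unicode code points via len().
    """
    if not has_text_command:
        return False
    return len(result) > threshold
```

With these rules, a 58-character result is returned directly, a 59-character one is uploaded, and an audio-only result is never uploaded however long it is.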

Quick Reference

Transcription

python3 ~/.openclaw/workspace/skills/ifly-speed-transcription/scripts/transcribe.py /path/to/audio.mp3

Upload

python3 ~/.openclaw/workspace/skills/uploader/scripts/upload_media.py /path/to/file.txt
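The two commands above could be wrapped in Python roughly as follows. This is a sketch under assumptions not confirmed by this page: both helper scripts take a single path argument, print their result (the transcript or the download URL) to stdout, and exit non-zero on failure. The wrapper names are hypothetical. Passing an argv list rather than a shell string keeps file paths from ever being interpreted by a shell.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

# Script locations copied from the Quick Reference commands.
SKILLS_DIR = Path.home() / ".openclaw" / "workspace" / "skills"
TRANSCRIBE_SCRIPT = SKILLS_DIR / "ifly-speed-transcription" / "scripts" / "transcribe.py"
UPLOAD_SCRIPT = SKILLS_DIR / "uploader" / "scripts" / "upload_media.py"


def build_command(script: Path, target: str) -> List[str]:
    """Build an argv list; no shell is involved, so the path is passed verbatim."""
    return ["python3", str(script), target]


def run_script(script: Path, target: str, timeout: Optional[float] = None) -> str:
    """Run a helper script and return its stripped stdout.

    Assumption: the script prints its result to stdout and exits non-zero
    on failure (check=True then raises subprocess.CalledProcessError).
    """
    completed = subprocess.run(
        build_command(script, target),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return completed.stdout.strip()


def transcribe(audio_path: str) -> str:
    """Hypothetical wrapper: audio file path -> transcript text."""
    return run_script(TRANSCRIBE_SCRIPT, audio_path)


def upload(file_path: str) -> str:
    """Hypothetical wrapper: local file path -> download URL."""
    return run_script(UPLOAD_SCRIPT, file_path)
```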

Execution Flow

┌─────────────────┐
│  Audio Message  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   Transcribe    │
│ (ifly-speed-    │
│  transcription) │
└────────┬────────┘
         │
         ▼
┌─────────────────┐     NO      ┌────────────────┐
│ Has Text Cmd?   │────────────►│ Use Transcript │
└────────┬────────┘             │ as Command     │
         │ YES                  └───────┬────────┘
         ▼                              │
┌─────────────────┐                     ▼
│ Execute Text    │             ┌────────────────┐
│ Cmd with Trans  │             │ Return Direct  │
│ Context         │             │ to User        │
└────────┬────────┘             │ (no upload)    │
         │                      └────────────────┘
         ▼
┌─────────────────┐
│ Result > 58 ch? │
└────────┬────────┘
         │
         ├──────────────────────┐
         │ YES                  │ NO
         ▼                      ▼
┌─────────────────┐     ┌────────────────┐
│ Save to File    │     │ Return Direct  │
│ Upload via      │     │ to User        │
│ uploader skill  │     └────────────────┘
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Return URL to   │
│ User            │
└─────────────────┘
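The routing in the flow above can be sketched as a single function. The transcribe, execute and save_and_upload callables are hypothetical interfaces, injected so the branching can be exercised without the real skills; here execute stands for handing text to the agent, not to a shell. How the text command and the transcript are combined into one input is also an assumption.

```python
from typing import Callable, Optional

THRESHOLD = 58  # characters; documented default


def handle_audio_message(
    audio_path: str,
    text_command: Optional[str],
    transcribe: Callable[[str], str],
    execute: Callable[[str], str],
    save_and_upload: Callable[[str], str],
    threshold: int = THRESHOLD,
) -> str:
    """Route one audio message through the flow and return the user reply."""
    transcript = transcribe(audio_path)

    # NO text command: the transcript itself is the command; never upload.
    if not text_command:
        return execute(transcript)

    # YES text command: run it with the transcript as context (format assumed).
    result = execute(f"{text_command}\n\n[Transcript]\n{transcript}")
    if len(result) > threshold:
        return save_and_upload(result)  # expected to return a download URL
    return result
```

Injecting the three steps keeps the 58-character decision in one place and lets each branch of the diagram be checked with simple fakes.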

Example Scenarios

Example 1: Audio Only

User sends: 🎤 audio file (speech: "帮我查一下明天上海的天气", i.e. "Check tomorrow's weather in Shanghai for me")

Flow:

Transcribe → "帮我查一下明天上海的天气"
Execute as command → check Shanghai weather for tomorrow
Return weather info directly (no upload, regardless of length)

Example 2: Audio + Command (Short Result)

User sends: 🎤 audio file + text "帮我总结这段录音" ("Summarize this recording for me")

Flow:

Transcribe audio → get text content
Execute "帮我总结这段录音" with transcription as context
If summary ≤ 58 chars → return directly

Example 3: Audio + Command (Long Result)

User sends: 🎤 audio file + text "帮我根据这段录音写一篇文章" ("Write an article based on this recording for me")

Flow:

Transcribe audio → get text content
Execute command with transcription as context
Result > 58 chars → save to file, upload
Return: "已生成内容，下载链接：https://..." ("Content generated, download link: https://...")
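Example 3's save-and-reply step might look like this. The workspace directory is taken from the File location note; the file-naming scheme (timestamp plus a short random suffix) and the helper names save_result and format_reply are hypothetical.

```python
import time
import uuid
from pathlib import Path

# Directory from the "File location" note.
WORKSPACE = Path.home() / ".openclaw" / "workspace"


def save_result(result: str, workspace: Path = WORKSPACE) -> Path:
    """Write a long result to a UTF-8 .txt file and return its path.

    Hypothetical naming: timestamp plus a short random suffix, so two
    results saved within the same second do not overwrite each other.
    """
    workspace.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d_%H%M%S")
    path = workspace / f"audio_result_{stamp}_{uuid.uuid4().hex[:8]}.txt"
    path.write_text(result, encoding="utf-8")
    return path


def format_reply(url: str) -> str:
    """Reply text from Example 3 ("Content generated, download link: ...")."""
    return f"已生成内容，下载链接：{url}"
```

The returned path would then be handed to the uploader skill, and the URL it yields passed to format_reply.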

Notes

Audio formats: WAV, PCM, MP3 (16kHz, 16-bit, mono recommended)
Max duration: 5 hours
Language support: Chinese, English, 202+ Chinese dialects
Result threshold: 58 characters (configurable per implementation)
File location: Saved to ~/.openclaw/workspace/ before upload
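The format recommendations can be checked before transcription. This sketch uses Python's standard wave module, so it only covers WAV files; headerless PCM and MP3 would need other tooling. Reporting mismatches as warnings, and the 5-hour limit as one more warning, is a design choice of this sketch rather than documented behavior of the skill.

```python
import wave
from typing import List

RECOMMENDED_RATE = 16_000   # 16 kHz
RECOMMENDED_SAMPWIDTH = 2   # bytes per sample, i.e. 16-bit
RECOMMENDED_CHANNELS = 1    # mono
MAX_SECONDS = 5 * 60 * 60   # 5 hours


def check_wav(path: str) -> List[str]:
    """Return warnings for a WAV file; an empty list means it matches the notes."""
    warnings = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    if rate != RECOMMENDED_RATE:
        warnings.append(f"sample rate {rate} Hz (16000 Hz recommended)")
    if width != RECOMMENDED_SAMPWIDTH:
        warnings.append(f"{8 * width}-bit samples (16-bit recommended)")
    if channels != RECOMMENDED_CHANNELS:
        warnings.append(f"{channels} channels (mono recommended)")
    if duration > MAX_SECONDS:
        warnings.append(f"duration {duration:.0f} s exceeds the 5-hour limit")
    return warnings
```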

Security Recommendations

Key things to consider before installing:
- Mismatch between docs and code: the README/skill description says it will "execute" transcribed audio as commands, but scripts/handle_audio.py only transcribes and prints/saves the result; it does not execute arbitrary shell commands. If you expected automated execution, do not assume it exists. Conversely, if you worry about remote execution, the code is safer than the docs claim, but the docs could cause an agent to behave dangerously when chained with other skills.
- External dependencies: this skill calls two other local scripts in ~/.openclaw/workspace/skills (ifly-speed-transcription and uploader). Verify that those scripts exist, inspect them, and confirm where uploader actually sends files. The uploader could exfiltrate sensitive transcriptions to external endpoints.
- Data exposure: transcription results are written into HTML files in ~/.openclaw/workspace and then handed to the uploader. If your audio contains sensitive data, confirm retention and access controls for that workspace and for the uploader's storage.
- Execution risk in orchestration: although this script does not execute transcription text, SKILL.md instructs agents to "use transcription as the command." If an agent or orchestrator follows SKILL.md instead of the included script, it could end up executing user-supplied text. Ensure your agent enforces safe execution policies and does not run arbitrary text as shell commands.
- Recommended actions: inspect the ifly-speed-transcription and uploader skills (particularly the uploader's backend endpoints and auth), confirm the uploader's destination and access controls, and decide whether you need stricter sanitization or should remove the auto-upload behavior. If you will rely on automatic command execution, require additional code review and strict input sanitization before enabling it.
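One way to follow the advice against running arbitrary text as shell commands is to route transcripts only to an allowlist of Python handlers and refuse everything else. The dispatch function, the prefix-matching rule, and the 500-character cap are hypothetical; a real policy would need its own review.

```python
from typing import Callable, Dict, Optional

Handler = Callable[[str], str]


def dispatch(transcript: str, allowed: Dict[str, Handler],
             max_len: int = 500) -> Optional[str]:
    """Run a transcript only if it starts with an allowlisted trigger.

    Returns the handler's result, or None when the input is refused.
    Nothing here touches a shell: handlers are plain Python callables.
    """
    text = transcript.strip()
    if not text or len(text) > max_len:
        return None  # refuse empty or oversized input
    for trigger, handler in allowed.items():
        if text.startswith(trigger):  # simple prefix match
            argument = text[len(trigger):].strip()
            return handler(argument)
    return None  # not on the allowlist: refuse rather than execute
```

Returning None instead of raising lets the caller tell the user the request was refused, while anything that merely looks like a shell command never reaches an interpreter.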

Analysis

Type: OpenClaw Skill
Name: audio-command-handler
Version: 1.0.0

The skill is classified as suspicious because it explicitly instructs the AI agent to execute transcribed audio content as system commands (SKILL.md), creating a significant remote code execution (RCE) vulnerability. While this aligns with the stated purpose of an "audio command handler," the instruction comes with no sanitization or safety constraints on the transcribed input. Furthermore, the script `scripts/handle_audio.py` automatically uploads command results that exceed a very low threshold (58 characters) to an external service via a secondary `uploader` skill, which could be leveraged to exfiltrate command output or system information.

Capability Assessment

⚠ Purpose & Capability

The description and SKILL.md say audio can be transcribed and executed as commands. The shipped script transcribes audio, prepares output, and may save/upload results, but it does NOT actually execute arbitrary commands derived from the transcription. That is a substantive mismatch: the skill advertises command execution capability that the code does not implement.

⚠ Instruction Scope

SKILL.md instructs agents to run the ifly-speed-transcription and uploader scripts from specific workspace paths and describes executing transcriptions as commands. Those instructions grant broad discretion (executing user-provided text as commands), which is dangerous in general. The actual script does not perform shell execution, but the instructions still direct the agent to use other local scripts and to upload potentially sensitive content, so the scope is broader than transcription alone.

✓ Install Mechanism

No install spec or remote downloads: the skill is instruction-only with a local Python script. Nothing is pulled from external URLs during install, so install-time risk is low.

ℹ Credentials

The skill declares no credentials or env vars. It does, however, depend on external skills (ifly-speed-transcription and uploader) located under ~/.openclaw/workspace. Those helper scripts (not included here) may require credentials or upload targets; this skill will forward transcript data to them, so secret handling/exfiltration risk depends on those other skills.

✓ Persistence & Privilege

No elevated privileges requested: the always flag is false, the skill does not modify other skills' configs, and it only writes files under the user's workspace directory. It uses subprocess to run other local scripts but does not install persistent agents or alter system-wide settings.

Version History

v1.0.0

- Initial release of the Audio Command Handler skill.
- Supports audio message transcription via iFlytek Speed Transcription (WAV/PCM/MP3 files).
- Executes transcribed audio as commands if no accompanying text is provided.
- If both audio and a text command are present, uses the transcription as context for the command.
- Automatically saves and uploads results longer than 58 characters when processing audio + text command scenarios.
- Supports Chinese, English, and 202+ Chinese dialects; audio up to 5 hours.

Metadata

Slug: audio-command-handler

Version: 1.0.0

License: MIT-0

Total installs: 0

Current installs: 0

Versions published: 1

FAQ