audioclaw-skills-voice-intake
/install audioclaw-skills-voice-intake
AudioClaw Skills Voice Intake
When to use
Use this skill when the user sends a voice message and AudioClaw should understand the content before replying.
Common triggers:
- A Feishu or chat bot receives an audio message instead of text.
- AudioClaw needs a transcript plus a clean user message payload.
- The workflow wants richer ASR features such as timestamps, sentiment, or speaker separation.
- The team wants one stable AudioClaw intake entrypoint instead of hand-written ASR requests.
- The channel stores inbound voice files as .ogg or .opus, and AudioClaw still needs one stable ASR path.
Do not use this skill for speech output. Use $audioclaw-skills-voice-reply for TTS.
Workflow
- Save the incoming audio file locally.
- Run scripts/openclaw_voice_intake.py with the audio path.
- Let the script choose the best model when no model is forced:
  - sense-asr-deepthink for normal single-speaker voice understanding
  - sense-asr when a language hint is provided
  - sense-asr-pro when timestamps, sentiment, speaker diarization, or punctuation are requested
  - sense-asr-lite when hotwords are requested
- Use the JSON manifest it returns as the AudioClaw handoff:
  - transcript.normalized_text
  - openclaw.turn_payload
  - routing.selected_model
- If understanding.clarification_needed is true, ask the user to repeat or resend the audio.
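The routing rule in the workflow above can be sketched as a small function. This is an illustration only, not the script's actual code: `IntakeOptions` and `choose_model` are hypothetical names, and the precedence used when several conditions apply at once (hotwords first, then rich features, then language hint) is an assumption, since the source does not state it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IntakeOptions:
    """Hypothetical container for the options that influence routing."""
    model: Optional[str] = None       # explicitly forced model, if any
    language: Optional[str] = None    # language hint
    hotwords: List[str] = field(default_factory=list)
    timestamps: bool = False
    sentiment: bool = False
    diarization: bool = False
    punctuation: bool = False


def choose_model(opts: IntakeOptions) -> str:
    """Pick an ASR model name following the routing list in the workflow."""
    if opts.model:
        # A forced model always wins.
        return opts.model
    # Assumed precedence: hotwords, then rich features, then language hint.
    if opts.hotwords:
        return "sense-asr-lite"
    if opts.timestamps or opts.sentiment or opts.diarization or opts.punctuation:
        return "sense-asr-pro"
    if opts.language:
        return "sense-asr"
    return "sense-asr-deepthink"
```

For example, `choose_model(IntakeOptions(sentiment=True))` selects sense-asr-pro, while an empty `IntakeOptions()` falls through to sense-asr-deepthink.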
Runtime model
Official HTTP ASR API:
- Endpoint: https://api.senseaudio.cn/v1/audio/transcriptions
- Content type: multipart/form-data
- File size limit: <= 10 MB
- Practical local input suffixes accepted by this skill: .wav, .mp3, .ogg, .opus, .flac, .aac, .m4a, .mp4
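A pre-flight check for the size limit and suffix list above might look like the sketch below. It is not taken from the skill's scripts: `check_audio_file` is a hypothetical helper, and reading "10MB" as 10 * 1024 * 1024 bytes is an assumption.

```python
from pathlib import Path

# Assumption: "10MB" is read as 10 MiB; adjust if the API counts 10**7 bytes.
MAX_BYTES = 10 * 1024 * 1024
ACCEPTED_SUFFIXES = {".wav", ".mp3", ".ogg", ".opus", ".flac", ".aac", ".m4a", ".mp4"}


def check_audio_file(path: str) -> Path:
    """Validate a local audio file before uploading it to the ASR endpoint."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"audio file not found: {p}")
    # Compare suffixes case-insensitively so "clip.OGG" is accepted.
    if p.suffix.lower() not in ACCEPTED_SUFFIXES:
        raise ValueError(f"unsupported suffix {p.suffix!r}")
    size = p.stat().st_size
    if size > MAX_BYTES:
        raise ValueError(f"file is {size} bytes, above the {MAX_BYTES}-byte limit")
    return p
```

Failing early on the client side avoids uploading a file that the endpoint would reject anyway.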
Supported response goals:
- plain transcript
- richer raw response passthrough
- AudioClaw-ready turn payload
The skill keeps two layers separate:
- ASR output from AudioClaw ASR
- AudioClaw packaging and clarification heuristics
API key lookup
This skill now treats SENSEAUDIO_API_KEY as the default API key source again.
Runtime rules:
- If the host app injects SENSEAUDIO_API_KEY as an AudioClaw login token such as v2.public..., the shared bootstrap will replace it with the real sk-... value from ~/.audioclaw/workspace/state/senseaudio_credentials.json before ASR starts.
- --api-key-env still works, but the default runtime path is SENSEAUDIO_API_KEY.
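The lookup rules above could be approximated as follows. This is a rough sketch and not the shared bootstrap itself: `resolve_api_key` is a hypothetical name, and the `"api_key"` field read from the credentials file is an assumption, because the source does not describe that file's schema.

```python
import json
import os
from pathlib import Path
from typing import Mapping

CREDENTIALS_PATH = Path.home() / ".audioclaw/workspace/state/senseaudio_credentials.json"


def resolve_api_key(
    env: Mapping[str, str] = os.environ,
    env_name: str = "SENSEAUDIO_API_KEY",
    credentials_path: Path = CREDENTIALS_PATH,
) -> str:
    """Return a usable API key, swapping a login token for the stored key."""
    value = env.get(env_name, "").strip()
    if not value:
        raise RuntimeError(f"{env_name} is not set")
    if value.startswith("v2.public"):
        # The host injected a login token; read the real key from disk.
        data = json.loads(Path(credentials_path).read_text(encoding="utf-8"))
        key = data.get("api_key", "")  # hypothetical field name
        if not key.startswith("sk-"):
            raise RuntimeError("credentials file does not contain an sk- key")
        return key
    return value
```

Passing a different `env_name` mirrors what --api-key-env does on the command line.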
Commands
Basic voice intake:
python3 scripts/openclaw_voice_intake.py \
--input /path/to/user_audio.mp3
Voice intake with richer AudioClaw structure:
python3 scripts/openclaw_voice_intake.py \
--input /path/to/meeting_clip.m4a \
--enable-punctuation \
--timestamp-granularity segment \
--enable-sentiment \
--out-json /tmp/openclaw_voice_turn.json
Force a specific model:
python3 scripts/openclaw_voice_intake.py \
--input /path/to/user_audio.mp3 \
--model sense-asr-deepthink
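When the script is driven from Python rather than a shell, the same commands can be assembled as an argument list. The helper below is a sketch: `build_intake_command` is a hypothetical name, and it uses only the flags that appear in the commands above.

```python
import sys
from typing import List, Optional


def build_intake_command(
    audio_path: str,
    model: Optional[str] = None,
    punctuation: bool = False,
    timestamp_granularity: Optional[str] = None,
    sentiment: bool = False,
    out_json: Optional[str] = None,
    script: str = "scripts/openclaw_voice_intake.py",
) -> List[str]:
    """Build the argv list for one voice intake run."""
    cmd = [sys.executable, script, "--input", audio_path]
    if model:
        cmd += ["--model", model]
    if punctuation:
        cmd.append("--enable-punctuation")
    if timestamp_granularity:
        cmd += ["--timestamp-granularity", timestamp_granularity]
    if sentiment:
        cmd.append("--enable-sentiment")
    if out_json:
        cmd += ["--out-json", out_json]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Using a list instead of a shell string avoids quoting problems with audio paths that contain spaces.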
AudioClaw integration pattern
Recommended handoff:
- Channel adapter stores the inbound audio.
- AudioClaw calls scripts/openclaw_voice_intake.py.
- AudioClaw reads:
  - openclaw.turn_payload.role
  - openclaw.turn_payload.content
  - openclaw.turn_payload.metadata
- The normal dialogue pipeline continues as if the user typed the recognized text.
Operational rules:
- Keep the original audio path in metadata for debugging.
- Pass language only when you are confident; otherwise let ASR auto-detect.
- If you request timestamps, sentiment, or diarization, let the script choose sense-asr-pro.
- If the transcript is empty, do not hallucinate a user intent. Ask for clarification.
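The handoff and the empty-transcript rule can be combined in one consumer-side function. The sketch below assumes the manifest is a nested dict whose keys follow the dotted names used in this document. `manifest_to_turn`, the `"action"` values, and the wording of the clarification prompt are hypothetical.

```python
from typing import Any, Dict

CLARIFY_PROMPT = "Sorry, I could not make out that voice message. Could you repeat or resend it?"


def manifest_to_turn(manifest: Dict[str, Any]) -> Dict[str, Any]:
    """Turn an intake manifest into either a user turn or a clarification request."""
    understanding = manifest.get("understanding", {})
    text = (manifest.get("transcript", {}).get("normalized_text") or "").strip()
    # Never invent an intent: an empty transcript is handled like a flagged one.
    if understanding.get("clarification_needed") or not text:
        return {"action": "clarify", "message": CLARIFY_PROMPT}
    payload = manifest["openclaw"]["turn_payload"]
    return {
        "action": "continue",
        "turn": {
            "role": payload["role"],
            "content": payload["content"],
            "metadata": payload.get("metadata", {}),
        },
    }
```

On the "continue" branch the dialogue pipeline can treat `turn` exactly like a typed user message. The metadata is passed through unchanged so the original audio path stays available for debugging.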
Resources
- scripts/senseaudio_asr_client.py
  - Multipart HTTP client for AudioClaw ASR
  - Handles model routing validation and JSON or text responses
- scripts/openclaw_voice_intake.py
  - Main runtime for AudioClaw
  - Builds transcript, normalized user text, and turn payload
- references/openclaw_voice_intake.md
  - Official ASR docs summary, model support notes, and AudioClaw payload examples
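- Make sure OpenClaw is installed (locally or via Docker).
- Enter the install command in the chat box: /install audioclaw-skills-voice-intake
- After installation, invoke the Skill by name or trigger it with /audioclaw-skills-voice-intake.
- Provide the required inputs as described in the Skill's parameter notes to get structured output.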
What is audioclaw-skills-voice-intake?
Use when AudioClaw Skills needs to understand a user voice message with AudioClaw ASR, including speech-to-text, model routing for deepthink or pro features,... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 250 downloads to date.
How do I install audioclaw-skills-voice-intake?
Run the command "/install audioclaw-skills-voice-intake" in the OpenClaw or Claude Code chat box for a one-step install; no extra configuration is needed.
Is audioclaw-skills-voice-intake free?
Yes. audioclaw-skills-voice-intake is completely free, released under the MIT-0 license, and can be freely downloaded, installed, and used.
Which platforms does audioclaw-skills-voice-intake support?
audioclaw-skills-voice-intake is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who developed audioclaw-skills-voice-intake?
It is developed and maintained by Wu Ruixiao (@kikidouloveme79). The current version is v1.0.1.