audioclaw-skills-voice-intake
/install audioclaw-skills-voice-intake
AudioClaw Skills Voice Intake
When to use
Use this skill when the user sends a voice message and AudioClaw should understand the content before replying.
Common triggers:
- A Feishu or chat bot receives an audio message instead of text.
- AudioClaw needs a transcript plus a clean user message payload.
- The workflow wants richer ASR features such as timestamps, sentiment, or speaker separation.
- The team wants one stable AudioClaw intake entrypoint instead of hand-written ASR requests.
- The channel stores inbound voice files as .ogg or .opus, and AudioClaw still needs one stable ASR path.
Do not use this skill for speech output. Use $audioclaw-skills-voice-reply for TTS.
Workflow
- Save the incoming audio file locally.
- Run scripts/openclaw_voice_intake.py with the audio path.
- Let the script choose the best model when no model is forced:
  - sense-asr-deepthink for normal single-speaker voice understanding
  - sense-asr when a language hint is provided
  - sense-asr-pro when timestamps, sentiment, speaker diarization, or punctuation are requested
  - sense-asr-lite when hotwords are requested
- Use the JSON manifest it returns as the AudioClaw handoff:
  - transcript.normalized_text
  - openclaw.turn_payload
  - routing.selected_model
- If understanding.clarification_needed is true, ask the user to repeat or resend the audio.
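The routing rules in the workflow above can be sketched as a small function. This is an illustration only, not the code inside scripts/openclaw_voice_intake.py: the IntakeOptions container and the choose_model name are hypothetical, and the precedence between rules (pro features first, then hotwords, then a language hint) is an assumption, since the list does not say which rule wins when several apply.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class IntakeOptions:
    """Hypothetical container for the features a caller may request."""
    model: Optional[str] = None        # forced model, if any
    language: Optional[str] = None     # language hint
    timestamps: bool = False
    sentiment: bool = False
    diarization: bool = False
    punctuation: bool = False
    hotwords: Tuple[str, ...] = ()


def choose_model(opts: IntakeOptions) -> str:
    """Pick an ASR model name from the requested features.

    The order of the checks below is an assumed precedence.
    """
    if opts.model:
        # A forced model always wins; no routing is applied.
        return opts.model
    if opts.timestamps or opts.sentiment or opts.diarization or opts.punctuation:
        return "sense-asr-pro"
    if opts.hotwords:
        return "sense-asr-lite"
    if opts.language:
        return "sense-asr"
    # Default: normal single-speaker voice understanding.
    return "sense-asr-deepthink"
```

For example, choose_model(IntakeOptions(language="zh")) selects sense-asr, while adding sentiment=True to the same options selects sense-asr-pro under the assumed precedence.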
Runtime model
Official HTTP ASR API:
- Endpoint: https://api.senseaudio.cn/v1/audio/transcriptions
- Content type: multipart/form-data
- File size limit: <=10MB
- Practical local input suffixes accepted by this skill: .wav, .mp3, .ogg, .opus, .flac, .aac, .m4a, .mp4
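The size and suffix limits listed above lend themselves to a local pre-flight check before any upload. The sketch below is not taken from the skill's scripts: validate_audio and preflight are hypothetical names, and reading "10MB" as 10 * 1024 * 1024 bytes is an assumption (the API may count in decimal megabytes).

```python
from pathlib import Path
from typing import List

# Assumption: "10MB" is interpreted as 10 MiB.
MAX_BYTES = 10 * 1024 * 1024

# Suffixes listed as accepted by this skill.
ACCEPTED_SUFFIXES = {".wav", ".mp3", ".ogg", ".opus", ".flac", ".aac", ".m4a", ".mp4"}


def validate_audio(suffix: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    if suffix.lower() not in ACCEPTED_SUFFIXES:
        problems.append(f"unsupported suffix: {suffix!r}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file too large: {size_bytes} bytes (limit {MAX_BYTES})")
    return problems


def preflight(path: str) -> List[str]:
    """Check a file on disk against the suffix and size limits."""
    p = Path(path)
    if not p.is_file():
        return [f"not a file: {path}"]
    return validate_audio(p.suffix, p.stat().st_size)
```

Returning a list of problems instead of raising on the first one lets the caller report every issue with the upload at once.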
Supported response goals:
- plain transcript
- richer raw response passthrough
- AudioClaw-ready turn payload
The skill keeps two layers separate:
- ASR output from AudioClaw ASR
- AudioClaw packaging and clarification heuristics
API key lookup
This skill now treats SENSEAUDIO_API_KEY as the default API key source again.
Runtime rules:
- If the host app injects SENSEAUDIO_API_KEY as an AudioClaw login token such as v2.public..., the shared bootstrap will replace it with the real sk-... value from ~/.audioclaw/workspace/state/senseaudio_credentials.json before ASR starts.
- --api-key-env still works, but the default runtime path is SENSEAUDIO_API_KEY.
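The lookup described above might look roughly like the following. This is a sketch under stated assumptions, not the shared bootstrap itself: resolve_api_key is a hypothetical name, the "api_key" field inside the credentials file is a guess at its layout, and treating any value without an sk- prefix as a login token is a simplification.

```python
import json
from pathlib import Path
from typing import Mapping, Optional, Union

CREDENTIALS_PATH = Path.home() / ".audioclaw/workspace/state/senseaudio_credentials.json"


def resolve_api_key(
    env: Mapping[str, str],
    env_name: str = "SENSEAUDIO_API_KEY",
    credentials_path: Union[str, Path] = CREDENTIALS_PATH,
) -> Optional[str]:
    """Return a usable sk-... key, or None if none can be found."""
    value = env.get(env_name, "").strip()
    if value.startswith("sk-"):
        # The environment already holds a real API key.
        return value
    # Otherwise (for example a login token) fall back to the credentials file.
    try:
        data = json.loads(Path(credentials_path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None
    if not isinstance(data, dict):
        return None
    key = data.get("api_key")  # hypothetical field name
    if isinstance(key, str) and key.startswith("sk-"):
        return key
    return None
```

In a real process the first argument would typically be os.environ; passing the mapping explicitly keeps the function easy to exercise without touching the environment.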
Commands
Basic voice intake:
python3 scripts/openclaw_voice_intake.py \
  --input /path/to/user_audio.mp3
Voice intake with richer AudioClaw structure:
python3 scripts/openclaw_voice_intake.py \
  --input /path/to/meeting_clip.m4a \
  --enable-punctuation \
  --timestamp-granularity segment \
  --enable-sentiment \
  --out-json /tmp/openclaw_voice_turn.json
Force a specific model:
python3 scripts/openclaw_voice_intake.py \
  --input /path/to/user_audio.mp3 \
  --model sense-asr-deepthink
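When the script is launched from another Python process instead of a shell, the same command lines can be assembled as an argument list. The sketch below uses only the flags shown in the commands above. The helper names build_intake_command and run_intake are hypothetical, and running from the skill's root directory (so the relative script path resolves) is an assumption.

```python
import subprocess
import sys
from typing import List, Optional

SCRIPT = "scripts/openclaw_voice_intake.py"


def build_intake_command(
    input_path: str,
    *,
    punctuation: bool = False,
    timestamp_granularity: Optional[str] = None,
    sentiment: bool = False,
    out_json: Optional[str] = None,
    model: Optional[str] = None,
) -> List[str]:
    """Assemble the argv for the intake script from the documented flags."""
    cmd = [sys.executable, SCRIPT, "--input", input_path]
    if punctuation:
        cmd.append("--enable-punctuation")
    if timestamp_granularity:
        cmd += ["--timestamp-granularity", timestamp_granularity]
    if sentiment:
        cmd.append("--enable-sentiment")
    if out_json:
        cmd += ["--out-json", out_json]
    if model:
        cmd += ["--model", model]
    return cmd


def run_intake(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run the intake script; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems when audio paths contain spaces.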
AudioClaw integration pattern
Recommended handoff:
- Channel adapter stores the inbound audio.
- AudioClaw calls scripts/openclaw_voice_intake.py.
- AudioClaw reads:
  - openclaw.turn_payload.role
  - openclaw.turn_payload.content
  - openclaw.turn_payload.metadata
- The normal dialogue pipeline continues as if the user typed the recognized text.
Operational rules:
- Keep the original audio path in metadata for debugging.
- Pass language only when you are confident; otherwise let ASR auto-detect.
- If you request timestamps, sentiment, or diarization, let the script choose sense-asr-pro.
- If transcript is empty, do not hallucinate a user intent. Ask for clarification.
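The handoff and the empty-transcript rule can be combined in one small consumer of the manifest. The field paths below (transcript.normalized_text, understanding.clarification_needed, openclaw.turn_payload.*) come from this document; the next_turn function, the "action" key in its return value, and the clarification wording are hypothetical choices made for this sketch.

```python
from typing import Any, Dict

CLARIFY_MESSAGE = "Sorry, I could not make out that voice message. Could you repeat or resend it?"


def next_turn(manifest: Dict[str, Any]) -> Dict[str, Any]:
    """Decide whether to continue the dialogue or ask the user to resend."""
    understanding = manifest.get("understanding") or {}
    transcript = manifest.get("transcript") or {}
    text = (transcript.get("normalized_text") or "").strip()

    # Never invent an intent: an empty transcript or an explicit
    # clarification flag both lead to a clarification request.
    if understanding.get("clarification_needed") or not text:
        return {"action": "clarify", "message": CLARIFY_MESSAGE}

    payload = (manifest.get("openclaw") or {}).get("turn_payload") or {}
    return {
        "action": "continue",
        "role": payload.get("role"),
        "content": payload.get("content"),
        "metadata": payload.get("metadata") or {},
    }
```

With a "continue" result, the dialogue pipeline can treat the content field exactly like typed user text, while the metadata keeps details such as the original audio path available for debugging.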
Resources
- scripts/senseaudio_asr_client.py
  - Multipart HTTP client for AudioClaw ASR
  - Handles model routing validation and JSON or text responses
- scripts/openclaw_voice_intake.py
  - Main runtime for AudioClaw
  - Builds transcript, normalized user text, and turn payload
- references/openclaw_voice_intake.md
  - Official ASR docs summary, model support notes, and AudioClaw payload examples
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install audioclaw-skills-voice-intake
- After installation, invoke the skill by name or use /audioclaw-skills-voice-intake
- Provide required inputs per the skill's parameter spec and get structured output
What is audioclaw-skills-voice-intake?
Use when AudioClaw Skills needs to understand a user voice message with AudioClaw ASR, including speech-to-text, model routing for deepthink or pro features,... It is an AI Agent Skill for Claude Code / OpenClaw, with 250 downloads so far.
How do I install audioclaw-skills-voice-intake?
Run "/install audioclaw-skills-voice-intake" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is audioclaw-skills-voice-intake free?
Yes, audioclaw-skills-voice-intake is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does audioclaw-skills-voice-intake support?
audioclaw-skills-voice-intake is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created audioclaw-skills-voice-intake?
It is built and maintained by Wu Ruixiao (@kikidouloveme79); the current version is v1.0.1.