Description

Use when AudioClaw Skills needs to understand a user voice message with AudioClaw ASR, including speech-to-text, model routing for deepthink or pro features,...

Usage (SKILL.md)

AudioClaw Skills Voice Intake

Name: audioclaw-skills-voice-intake
Author: kikidouloveme79

When to use

Use this skill when the user sends a voice message and AudioClaw should understand the content before replying.

Common triggers:

A Feishu or chat bot receives an audio message instead of text.
AudioClaw needs a transcript plus a clean user message payload.
The workflow wants richer ASR features such as timestamps, sentiment, or speaker separation.
The team wants one stable AudioClaw intake entrypoint instead of hand-written ASR requests.
The channel stores inbound voice files as .ogg or .opus, and AudioClaw still needs one stable ASR path.

Do not use this skill for speech output. Use $audioclaw-skills-voice-reply for TTS.

Workflow

Save the incoming audio file locally.
Run scripts/openclaw_voice_intake.py with the audio path.
Let the script choose the best model when no model is forced:
- sense-asr-deepthink for normal single-speaker voice understanding
- sense-asr when a language hint is provided
- sense-asr-pro when timestamps, sentiment, speaker diarization, or punctuation are requested
- sense-asr-lite when hotwords are requested
Use the JSON manifest it returns as the AudioClaw handoff:
- transcript.normalized_text
- openclaw.turn_payload
- routing.selected_model
If understanding.clarification_needed is true, ask the user to repeat or resend the audio.
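
The routing rules in step 3 can be sketched as a small function. This is only an illustration, not the code in scripts/openclaw_voice_intake.py: the IntakeOptions container and choose_model name are hypothetical, and the precedence used when several features are requested at once (pro features first, then hotwords, then a language hint) is an assumption the source does not state.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IntakeOptions:
    """Hypothetical container for the features a caller may request."""
    model: Optional[str] = None       # forced model, if any
    language: Optional[str] = None    # language hint
    timestamps: bool = False
    sentiment: bool = False
    diarization: bool = False
    punctuation: bool = False
    hotwords: List[str] = field(default_factory=list)


def choose_model(opts: IntakeOptions) -> str:
    """Pick a model name following the routing rules listed in the workflow."""
    # A forced model always wins.
    if opts.model:
        return opts.model
    # Assumed precedence: pro features, then hotwords, then language hint.
    if opts.timestamps or opts.sentiment or opts.diarization or opts.punctuation:
        return "sense-asr-pro"
    if opts.hotwords:
        return "sense-asr-lite"
    if opts.language:
        return "sense-asr"
    # Default for normal single-speaker voice understanding.
    return "sense-asr-deepthink"
```

If the real script resolves conflicting requests differently, routing.selected_model in the returned manifest is the authoritative answer.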

Runtime model

Official HTTP ASR API:

Endpoint: https://api.senseaudio.cn/v1/audio/transcriptions
Content type: multipart/form-data
File size limit: <=10MB
Practical local input suffixes accepted by this skill: .wav, .mp3, .ogg, .opus, .flac, .aac, .m4a, .mp4
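
A channel adapter may want to reject unusable files before uploading them. The sketch below checks the two limits listed above. The function name check_audio_file is hypothetical, and reading "10MB" as 10 * 1024 * 1024 bytes is an assumption; the API may count in decimal megabytes.

```python
from pathlib import Path

MAX_BYTES = 10 * 1024 * 1024  # assumption: "10MB" interpreted as 10 MiB
ACCEPTED_SUFFIXES = {".wav", ".mp3", ".ogg", ".opus", ".flac", ".aac", ".m4a", ".mp4"}


def check_audio_file(path: str) -> list:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    p = Path(path)
    # Suffix check is case-insensitive so "CLIP.OGG" is accepted too.
    if p.suffix.lower() not in ACCEPTED_SUFFIXES:
        problems.append(f"unsupported suffix: {p.suffix or '(none)'}")
    if not p.is_file():
        problems.append("file not found")
    elif p.stat().st_size > MAX_BYTES:
        problems.append(f"file too large: {p.stat().st_size} bytes")
    return problems
```

Returning a list rather than raising lets the caller report every problem at once, for example in the clarification message sent back to the user.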

Supported response goals:

plain transcript
richer raw response passthrough
AudioClaw-ready turn payload

The skill keeps two layers separate:

ASR output from AudioClaw ASR
AudioClaw packaging and clarification heuristics

API key lookup

This skill now treats SENSEAUDIO_API_KEY as the default API key source again.

Runtime rules:

If the host app injects SENSEAUDIO_API_KEY as an AudioClaw login token such as v2.public..., the shared bootstrap will replace it with the real sk-... value from ~/.audioclaw/workspace/state/senseaudio_credentials.json before ASR starts.
--api-key-env still works, but the default runtime path is SENSEAUDIO_API_KEY.
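
The lookup described above might look roughly like the following. This is a sketch for reviewers, not the shared bootstrap itself, which is not included in the bundle. The function name resolve_api_key, the "v2.public." and "sk-" prefix tests, and especially the "api_key" field name inside the credentials file are assumptions; the source does not document the file's layout.

```python
import json
import os
from pathlib import Path

CREDENTIALS_PATH = Path.home() / ".audioclaw/workspace/state/senseaudio_credentials.json"


def resolve_api_key(env=os.environ, env_name="SENSEAUDIO_API_KEY",
                    credentials_path=CREDENTIALS_PATH):
    """Return a usable API key, or None if none can be found."""
    value = env.get(env_name, "")
    if value.startswith("sk-"):
        return value  # already a real key
    if value.startswith("v2.public."):
        # Login token injected by the host: swap it for the stored key.
        try:
            data = json.loads(Path(credentials_path).read_text(encoding="utf-8"))
        except (OSError, ValueError):
            return None  # missing or malformed credentials file
        # "api_key" is a hypothetical field name.
        key = data.get("api_key") if isinstance(data, dict) else None
        return key if isinstance(key, str) and key.startswith("sk-") else None
    # Unknown format: pass it through unchanged, or None when unset.
    return value or None
```

Passing env and credentials_path as parameters keeps the lookup testable without touching the real environment or the on-disk credentials.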

Commands

Basic voice intake:

python3 scripts/openclaw_voice_intake.py \
  --input /path/to/user_audio.mp3

Voice intake with richer AudioClaw structure:

python3 scripts/openclaw_voice_intake.py \
  --input /path/to/meeting_clip.m4a \
  --enable-punctuation \
  --timestamp-granularity segment \
  --enable-sentiment \
  --out-json /tmp/openclaw_voice_turn.json

Force a specific model:

python3 scripts/openclaw_voice_intake.py \
  --input /path/to/user_audio.mp3 \
  --model sense-asr-deepthink
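
When the script is launched from Python rather than a shell, the same commands can be assembled as an argument list. The sketch below uses only the flags shown above; the helper names build_intake_command and run_intake are hypothetical, and run_intake assumes the script exits non-zero on failure and prints its JSON manifest to stdout, which the source does not confirm.

```python
import subprocess
import sys


def build_intake_command(input_path, model=None, punctuation=False,
                         timestamp_granularity=None, sentiment=False,
                         out_json=None):
    """Build the argv list for scripts/openclaw_voice_intake.py."""
    cmd = [sys.executable, "scripts/openclaw_voice_intake.py", "--input", input_path]
    if punctuation:
        cmd.append("--enable-punctuation")
    if timestamp_granularity:
        cmd += ["--timestamp-granularity", timestamp_granularity]
    if sentiment:
        cmd.append("--enable-sentiment")
    if model:
        cmd += ["--model", model]
    if out_json:
        cmd += ["--out-json", out_json]
    return cmd


def run_intake(cmd, timeout=120):
    """Run the command and return captured stdout (assumed to hold the manifest)."""
    result = subprocess.run(cmd, capture_output=True, text=True,
                            timeout=timeout, check=True)
    return result.stdout
```

Using an argument list instead of a shell string avoids quoting problems when the audio path contains spaces.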

AudioClaw integration pattern

Recommended handoff:

Channel adapter stores the inbound audio.
AudioClaw calls scripts/openclaw_voice_intake.py.
AudioClaw reads:
- openclaw.turn_payload.role
- openclaw.turn_payload.content
- openclaw.turn_payload.metadata
The normal dialogue pipeline continues as if the user typed the recognized text.

Operational rules:

Keep the original audio path in metadata for debugging.
Pass language only when you are confident; otherwise let ASR auto-detect.
If you request timestamps, sentiment, or diarization, let the script choose sense-asr-pro.
If transcript is empty, do not hallucinate a user intent. Ask for clarification.
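
The handoff and the empty-transcript rule can be combined in one small reader. This is a sketch under assumptions: the nesting of the manifest is inferred from the dotted field names in this document, and manifest_to_turn and the "action" values are hypothetical names.

```python
import json


def manifest_to_turn(manifest_text):
    """Turn the intake manifest into either a user turn or a clarification request."""
    manifest = json.loads(manifest_text)
    transcript = (manifest.get("transcript") or {}).get("normalized_text") or ""
    understanding = manifest.get("understanding") or {}
    model = (manifest.get("routing") or {}).get("selected_model")

    # Never guess an intent from an empty or unclear transcript.
    if understanding.get("clarification_needed") or not transcript.strip():
        return {"action": "clarify", "turn": None, "model": model}

    payload = (manifest.get("openclaw") or {}).get("turn_payload") or {}
    turn = {
        "role": payload.get("role"),
        "content": payload.get("content"),
        "metadata": payload.get("metadata") or {},
    }
    return {"action": "continue", "turn": turn, "model": model}
```

On "continue", the turn can enter the normal dialogue pipeline as if the user had typed it; on "clarify", the bot asks the user to repeat or resend the audio.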

Resources

scripts/senseaudio_asr_client.py
- Multipart HTTP client for AudioClaw ASR
- Handles model routing validation and JSON or text responses
scripts/openclaw_voice_intake.py
- Main runtime for AudioClaw
- Builds transcript, normalized user text, and turn payload
references/openclaw_voice_intake.md
- Official ASR docs summary, model support notes, and AudioClaw payload examples

Security recommendations

This skill's behavior is largely consistent with its stated purpose (sending audio to the SenseAudio API and returning a structured JSON handoff), but two mismatches deserve attention before installing:

1) API key handling: The scripts expect a SENSEAUDIO_API_KEY at runtime (or an alternative env via --api-key-env), but the package metadata does not declare any required env vars. Confirm how your agent runtime will provide the API key. Ask the maintainer to declare SENSEAUDIO_API_KEY in the registry metadata so you can review and control access.
2) Shared bootstrap and local credentials: The code will attempt to import a shared module (../_shared/senseaudio_env.py) and the documentation states it may replace placeholder tokens with a 'real' key read from ~/.audioclaw/workspace/state/senseaudio_credentials.json. Before installing, inspect or request the source of that shared module and the on-disk credentials file. Ensure the file location and replacement logic are trustworthy and that no code path will exfiltrate those credentials. If you do not control the host-provided shared module, treat that as an untrusted dependency.

Other practical checks:

Verify the included scripts do not post to endpoints other than https://api.senseaudio.cn.
Confirm you are comfortable with the code using /usr/bin/afinfo (macOS) for duration detection (it falls back if absent).
Run the scripts in a controlled environment with a test API key before using with production credentials.

If the maintainer cannot provide the shared bootstrap code for review, prefer to run your own vetted wrapper or modify the scripts to accept an explicit API key and not load external shared modules.

Functional analysis

Type: OpenClaw Skill
Name: audioclaw-skills-voice-intake
Version: 1.0.1

The skill bundle provides voice-to-text transcription capabilities by interfacing with the SenseAudio ASR API (api.senseaudio.cn). The Python scripts (openclaw_voice_intake.py and senseaudio_asr_client.py) handle audio file validation, model selection, and multipart HTTP requests. While the skill accesses a local credential file (~/.audioclaw/workspace/state/senseaudio_credentials.json) and uses a system utility (/usr/bin/afinfo), these actions are consistent with its stated purpose of processing audio within the AudioClaw environment. No evidence of malicious intent, data exfiltration to unauthorized endpoints, or command injection was found.

Capability assessment

ℹ Purpose & Capability

The name, description, SKILL.md, and scripts all consistently implement an AudioClaw voice intake that posts audio to SenseAudio ASR and builds an AudioClaw turn payload. That capability aligns with the stated purpose. However, the registry metadata lists no required environment variables even though the runtime clearly expects an API key (SENSEAUDIO_API_KEY), so the declared requirements are incomplete.

⚠ Instruction Scope

The instructions are concrete and scoped to ASR (save incoming audio, run the included script, hand off JSON). However, the SKILL.md and code explicitly reference a runtime bootstrap that can replace an injected token with a real key from ~/.audioclaw/workspace/state/senseaudio_credentials.json via a shared module (senseaudio_env / senseaudio_api_guard). The instructions therefore rely on reading or substituting credentials from a host-local path and on a shared bootstrap module that is not included in the bundle. This important behavioral detail is not reflected in the declared requirements and increases the trust surface.

✓ Install Mechanism

There is no install spec and no external download. The skill is instruction-plus-scripts only; all code is included in the bundle. No archives are pulled from external URLs and nothing is written during an automated install step beyond the skill files themselves.

⚠ Credentials

The runtime expects an API key in SENSEAUDIO_API_KEY (and provides an override, --api-key-env). The registry metadata lists no required env vars or primary credential, which is inconsistent and misleading. The code also expects a shared bootstrap that can read a local credentials file (~/.audioclaw/workspace/state/senseaudio_credentials.json) to replace placeholder tokens. That file contains sensitive credentials, so access to it should be explicitly declared and audited.

✓ Persistence & Privilege

The skill does not request always:true and does not modify other skills. It imports an optional shared module from parent directories if present, which is a local code-loading behavior (not an automatic persistence or system-level config change). This increases the trusted-code surface but is not an elevated privilege setting by itself.

Version history

v1.0.1

- Updated branding and description to reference "AudioClaw ASR" instead of "SenseAudio ASR".
- Clarified skill separation by consistently using the AudioClaw ASR term throughout the documentation.
- Added a new "API key lookup" section explaining updated handling for SENSEAUDIO_API_KEY, supporting shared bootstrap and real credential injection from workspace state.
- No functional or command-line changes to usage.

v1.0.0

Initial release of the AudioClaw Skills Voice Intake skill:
- Provides automatic voice message intake and transcription using SenseAudio ASR.
- Supports model routing for features such as timestamps, sentiment analysis, speaker separation, punctuation, and hotword detection.
- Packages ASR results into a ready-to-use OpenClaw or PicoClaw user turn payload.
- Designed for easy integration with chatbots or channel adapters that handle inbound audio files in formats like .ogg, .opus, .mp3, and more.
- Separates raw ASR output from AudioClaw packaging and clarification heuristics.
- Includes scripts and documentation for both basic and advanced ASR workflows.

Metadata

Slug audioclaw-skills-voice-intake

Version 1.0.1

License MIT-0

Total installs 0

Current installs 0

Number of versions 2

FAQ