← 返回 Skills 市场
kikidouloveme79

audioclaw-skills-voice-intake

作者 Wu Ruixiao · GitHub ↗ · v1.0.1 · MIT-0
cross-platform ⚠ suspicious
250
总下载
0
收藏
0
当前安装
2
版本数
在 OpenClaw 中安装
/install audioclaw-skills-voice-intake
功能描述
Use when AudioClaw Skills needs to understand a user voice message with AudioClaw ASR, including speech-to-text, model routing for deepthink or pro features,...
使用说明 (SKILL.md)

AudioClaw Skills Voice Intake

When to use

Use this skill when the user sends a voice message and AudioClaw should understand the content before replying.

Common triggers:

  • A Feishu or chat bot receives an audio message instead of text.
  • AudioClaw needs a transcript plus a clean user message payload.
  • The workflow wants richer ASR features such as timestamps, sentiment, or speaker separation.
  • The team wants one stable AudioClaw intake entrypoint instead of hand-written ASR requests.
  • The channel stores inbound voice files as .ogg or .opus, and AudioClaw still needs one stable ASR path.

Do not use this skill for speech output. Use $audioclaw-skills-voice-reply for TTS.

Workflow

  1. Save the incoming audio file locally.
  2. Run scripts/openclaw_voice_intake.py with the audio path.
  3. Let the script choose the best model when no model is forced:
    • sense-asr-deepthink for normal single-speaker voice understanding
    • sense-asr when a language hint is provided
    • sense-asr-pro when timestamps, sentiment, speaker diarization, or punctuation are requested
    • sense-asr-lite when hotwords are requested
  4. Use the JSON manifest it returns as the AudioClaw handoff:
    • transcript.normalized_text
    • openclaw.turn_payload
    • routing.selected_model
  5. If understanding.clarification_needed is true, ask the user to repeat or resend the audio.

Runtime model

Official HTTP ASR API:

  • Endpoint: https://api.senseaudio.cn/v1/audio/transcriptions
  • Content type: multipart/form-data
  • File size limit: \x3C=10MB
  • Practical local input suffixes accepted by this skill: .wav, .mp3, .ogg, .opus, .flac, .aac, .m4a, .mp4

Supported response goals:

  • plain transcript
  • richer raw response passthrough
  • AudioClaw-ready turn payload

The skill keeps two layers separate:

  • ASR output from AudioClaw ASR
  • AudioClaw packaging and clarification heuristics

API key lookup

This skill now treats SENSEAUDIO_API_KEY as the default API key source again.

Runtime rules:

  • If the host app injects SENSEAUDIO_API_KEY as an AudioClaw login token such as v2.public..., the shared bootstrap will replace it with the real sk-... value from ~/.audioclaw/workspace/state/senseaudio_credentials.json before ASR starts.
  • --api-key-env still works, but the default runtime path is SENSEAUDIO_API_KEY.

Commands

Basic voice intake:

python3 scripts/openclaw_voice_intake.py \
  --input /path/to/user_audio.mp3

Voice intake with richer AudioClaw structure:

python3 scripts/openclaw_voice_intake.py \
  --input /path/to/meeting_clip.m4a \
  --enable-punctuation \
  --timestamp-granularity segment \
  --enable-sentiment \
  --out-json /tmp/openclaw_voice_turn.json

Force a specific model:

python3 scripts/openclaw_voice_intake.py \
  --input /path/to/user_audio.mp3 \
  --model sense-asr-deepthink

AudioClaw integration pattern

Recommended handoff:

  1. Channel adapter stores the inbound audio.
  2. AudioClaw calls scripts/openclaw_voice_intake.py.
  3. AudioClaw reads:
    • openclaw.turn_payload.role
    • openclaw.turn_payload.content
    • openclaw.turn_payload.metadata
  4. The normal dialogue pipeline continues as if the user typed the recognized text.

Operational rules:

  • Keep the original audio path in metadata for debugging.
  • Pass language only when you are confident; otherwise let ASR auto-detect.
  • If you request timestamps, sentiment, or diarization, let the script choose sense-asr-pro.
  • If transcript is empty, do not hallucinate a user intent. Ask for clarification.

Resources

  • scripts/senseaudio_asr_client.py
    • Multipart HTTP client for AudioClaw ASR
    • Handles model routing validation and JSON or text responses
  • scripts/openclaw_voice_intake.py
    • Main runtime for AudioClaw
    • Builds transcript, normalized user text, and turn payload
  • references/openclaw_voice_intake.md
    • Official ASR docs summary, model support notes, and AudioClaw payload examples
安全使用建议
This skill's behavior is largely consistent with its stated purpose (sending audio to the SenseAudio API and returning a structured JSON handoff), but two mismatches deserve attention before installing: 1) API key handling: The scripts expect a SENSEAUDIO_API_KEY at runtime (or an alternative env via --api-key-env), but the package metadata does not declare any required env vars. Confirm how your agent runtime will provide the API key. Ask the maintainer to declare SENSEAUDIO_API_KEY in the registry metadata so you can review and control access. 2) Shared bootstrap and local credentials: The code will attempt to import a shared module (../_shared/senseaudio_env.py) and the documentation states it may replace placeholder tokens with a 'real' key read from ~/.audioclaw/workspace/state/senseaudio_credentials.json. Before installing, inspect or request the source of that shared module and the on-disk credentials file. Ensure the file location and replacement logic are trustworthy and that no code path will exfiltrate those credentials. If you do not control the host-provided shared module, treat that as an untrusted dependency. Other practical checks: verify the included scripts do not post to endpoints other than https://api.senseaudio.cn, confirm you are comfortable with the code using /usr/bin/afinfo (macOS) for duration detection (it falls back if absent), and run the scripts in a controlled environment with a test API key before using with production credentials. If the maintainer cannot provide the shared bootstrap code for review, prefer to run your own vetted wrapper or modify the scripts to accept an explicit API key and not load external shared modules.
功能分析
Type: OpenClaw Skill Name: audioclaw-skills-voice-intake Version: 1.0.1 The skill bundle provides voice-to-text transcription capabilities by interfacing with the SenseAudio ASR API (api.senseaudio.cn). The Python scripts (openclaw_voice_intake.py and senseaudio_asr_client.py) handle audio file validation, model selection, and multipart HTTP requests. While the skill accesses a local credential file (~/.audioclaw/workspace/state/senseaudio_credentials.json) and uses a system utility (/usr/bin/afinfo), these actions are consistent with its stated purpose of processing audio within the AudioClaw environment. No evidence of malicious intent, data exfiltration to unauthorized endpoints, or command injection was found.
能力评估
Purpose & Capability
The name, description, SKILL.md, and scripts all consistently implement an AudioClaw voice intake that posts audio to SenseAudio ASR and builds an AudioClaw turn payload. That capability aligns with the stated purpose. However, the registry metadata lists no required environment variables even though the runtime clearly expects an API key (SENSEAUDIO_API_KEY), so the declared requirements are incomplete.
Instruction Scope
The instructions are concrete and scoped to ASR (save incoming audio, run the included script, hand off JSON). However the SKILL.md and code explicitly reference a runtime bootstrap that can replace an injected token with a real key from ~/.audioclaw/workspace/state/senseaudio_credentials.json via a shared module (senseaudio_env / senseaudio_api_guard). The instructions therefore rely on reading or substituting credentials from a host-local path and on a shared bootstrap module that is not included in the bundle — this is an important behavioral detail that is not reflected in the declared requirements and increases trust surface.
Install Mechanism
There is no install spec and no external download. The skill is instruction-plus-scripts only; all code is included in the bundle. No archives are pulled from external URLs and nothing is written during an automated install step beyond the skill files themselves.
Credentials
The runtime expects an API key in SENSEAUDIO_API_KEY (and provides an override --api-key-env). The registry metadata lists no required env vars or primary credential, which is inconsistent and misleading. The code also expects a shared bootstrap that can read a local credentials file (~/.audioclaw/workspace/state/senseaudio_credentials.json) to replace placeholder tokens — access to that file contains sensitive credentials and should be explicitly declared and audited.
Persistence & Privilege
The skill does not request always:true and does not modify other skills. It imports an optional shared module from parent directories if present, which is a local code-loading behavior (not an automatic persistence or system-level config change). This increases the trusted-code surface but is not an elevated privilege setting by itself.
如何使用
  1. 确保已安装 OpenClaw(本地或 Docker 部署)
  2. 在对话框中输入安装命令:/install audioclaw-skills-voice-intake
  3. 安装完成后,直接呼叫该 Skill 的名称或使用 /audioclaw-skills-voice-intake 触发
  4. 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
版本历史
v1.0.1
- Updated branding and description to reference "AudioClaw ASR" instead of "SenseAudio ASR". - Clarified skill separation by consistently using the AudioClaw ASR term throughout the documentation. - Added a new "API key lookup" section explaining updated handling for SENSEAUDIO_API_KEY, supporting shared bootstrap and real credential injection from workspace state. - No functional or command-line changes to usage.
v1.0.0
Initial release of the AudioClaw Skills Voice Intake skill: - Provides automatic voice message intake and transcription using SenseAudio ASR. - Supports model routing for features such as timestamps, sentiment analysis, speaker separation, punctuation, and hotword detection. - Packages ASR results into a ready-to-use OpenClaw or PicoClaw user turn payload. - Designed for easy integration with chatbots or channel adapters that handle inbound audio files in formats like .ogg, .opus, .mp3, and more. - Separates raw ASR output from AudioClaw packaging and clarification heuristics. - Includes scripts and documentation for both basic and advanced ASR workflows.
元数据
Slug audioclaw-skills-voice-intake
版本 1.0.1
许可证 MIT-0
累计安装 0
当前安装数 0
历史版本数 2
常见问题

audioclaw-skills-voice-intake 是什么?

Use when AudioClaw Skills needs to understand a user voice message with AudioClaw ASR, including speech-to-text, model routing for deepthink or pro features,... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 250 次。

如何安装 audioclaw-skills-voice-intake?

在 OpenClaw 或 Claude Code 对话框中运行命令「/install audioclaw-skills-voice-intake」即可一键安装,无需额外配置。

audioclaw-skills-voice-intake 是免费的吗?

是的,audioclaw-skills-voice-intake 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。

audioclaw-skills-voice-intake 支持哪些平台?

audioclaw-skills-voice-intake 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。

谁开发了 audioclaw-skills-voice-intake?

由 Wu Ruixiao(@kikidouloveme79)开发并维护,当前版本 v1.0.1。

💬 留言讨论