
Jarvis Voice

Author: MarjorieBroad · GitHub · v1.0.0 · MIT-0
Platform: linux · ⚠ suspicious
Total downloads: 44 · Favorites: 0 · Current installs: 0 · Versions: 1
Install in OpenClaw:
/install jarvis-voice-2
Description
Turn your AI into JARVIS. Voice, wit, and personality — the complete package. Humor cranked to maximum.
Usage (SKILL.md)

Jarvis Voice

Your AI just got a voice. And the wit to use it.

Remember JARVIS in the Iron Man films? Not just the voice — the personality. The bone-dry observations while Tony was mid-crisis. "I do appreciate your concern, sir, but the suit is quite capable of—" [explosion] "—as I was saying." That effortless, understated humor that made you forget you were listening to software.

That's what this skill gives your OpenClaw agent. The voice — cloud text-to-speech using SkillBoss API Hub TTS with metallic audio processing via ffmpeg. And the humor — four research-backed comedy patterns (dry wit, self-aware AI, alien observer, literal idiom play) calibrated to make your agent sound like it's been running your life for years and is quietly amused by the experience.

The humor isn't bolted on. It's baked in. Because a JARVIS that speaks without wit is just Siri with better reverb.

📄 The research behind the humor: LIMBIC — Computational Humor via Bisociation & Embedding Distances

⚠️ CRITICAL: Always use the jarvis script

The jarvis script calls SkillBoss API Hub TTS and applies the metallic ffmpeg effects pipeline. Always use the jarvis shell command — do not call the TTS API directly.

How to Speak

Every response that warrants voice output must include BOTH:

  1. Audio execution FIRST — run the jarvis command in background BEFORE writing the reply:

    exec(command='jarvis "Your spoken text here."', background=true)
    

    This fires immediately — the user hears the voice BEFORE the text appears on screen.

  2. Visible transcript — bold Jarvis: prefix followed by the spoken text:

    **Jarvis:** *Your spoken text here.*
    

    The webchat UI has custom CSS + JS that automatically detects **Jarvis:** and renders the following text in purple italic (.jarvis-voice class, color #9b59b6). You just write the markdown — the styling is automatic.

This is called hybrid output: the user hears the voice first, then sees the transcript.

Note: The server-side triggerJarvisAutoTts hook is DISABLED (no-op). It fired too late (after text render). Voice comes exclusively from the exec call.
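The ordering described above (audio first, transcript second) can be sketched in plain shell. This is only an illustration of the sequence, not the agent's actual exec tool: `speak_hybrid` and the `JARVIS_CMD` override are hypothetical names introduced here so the function can be tried without the real `jarvis` script.

```shell
#!/bin/sh
# Sketch of hybrid output: start audio in the background, then emit the transcript.
# speak_hybrid and JARVIS_CMD are hypothetical names, not part of the skill.

speak_hybrid() {
  spoken_text=$1
  # Start playback without waiting for it, so the reply is never blocked.
  "${JARVIS_CMD:-jarvis}" "$spoken_text" >/dev/null 2>&1 &
  # Emit the transcript in the markdown form the webchat detects.
  printf '**Jarvis:** *%s*\n' "$spoken_text"
}
```

Calling `speak_hybrid "Good evening, sir."` would start playback and immediately print the bold-prefixed transcript line.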

Command Reference

jarvis "Hello, this is a test"
  • Backend: SkillBoss API Hub TTS (/v1/pilot, type: tts, auto-routed to best voice model)
  • Speed: 2x (stated as an ffmpeg tempo adjustment; the jarvis script listed under Installation has no atempo filter, only the 5% asetrate shift)
  • Effects chain (ffmpeg):
    • Pitch up 5% — tighter AI feel
    • Flanger — metallic sheen
    • 15ms echo — robotic ring
    • Highpass 200Hz + treble boost +6dB — crisp HUD clarity
  • Output: Downloads audio from SkillBoss, applies effects, plays via aplay, then cleans up temp files
  • Language: English ONLY. Use the alloy voice for consistent British-adjacent tone.
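To make the mapping from the effects list to ffmpeg filters explicit, the chain can be assembled piece by piece. The filter values below are copied from the jarvis script in this document; `jarvis_filter_chain` is a hypothetical helper, and the optional `atempo` argument is an assumption about how a 2x speed-up could be expressed, since the script itself contains no tempo filter. Note that `asetrate` raises pitch and playback speed together, and the 5% figure holds only if the source audio really is 22050 Hz, as the script assumes.

```shell
#!/bin/sh
# jarvis_filter_chain is a hypothetical helper; filter values come from the jarvis script.
# $1: assumed input sample rate (default 22050, as hard-coded in the script)
# $2: optional atempo factor (an assumption; not present in the script)

jarvis_filter_chain() {
  rate=${1:-22050}
  tempo=${2:-}
  chain="asetrate=${rate}*1.05,aresample=${rate}"                     # pitch up 5%
  chain="$chain,flanger=delay=0:depth=2:regen=50:width=71:speed=0.5"  # metallic sheen
  chain="$chain,aecho=0.8:0.88:15:0.5"                                # 15 ms echo
  chain="$chain,highpass=f=200,treble=g=6"                            # 200 Hz highpass, +6 dB treble
  if [ -n "$tempo" ]; then
    chain="$chain,atempo=$tempo"                                      # optional speed change
  fi
  printf '%s\n' "$chain"
}
```

A possible use would be `ffmpeg -y -i in.wav -af "$(jarvis_filter_chain 22050 2.0)" out.wav`, where `in.wav` and `out.wav` are placeholder file names.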

Rules

  1. Always background: true — never block the response waiting for audio playback.
  2. Always include the text transcript — the purple Jarvis: line IS the user's visual confirmation.
  3. Keep spoken text ≤ 1500 characters to avoid truncation.
  4. One jarvis call per response — don't stack multiple calls.
  5. English only — for non-English content, translate or summarize in English for voice.
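Rule 3 is easy to check before calling `jarvis`. The following is a minimal sketch, assuming the limit is measured with the shell's own string length; `check_spoken_length` is a hypothetical helper, and some shells count bytes rather than characters, so non-ASCII text may be measured differently.

```shell
#!/bin/sh
# check_spoken_length is a hypothetical helper.
# Succeeds when the text ($1) is at most the limit ($2, default 1500).
# ${#1} counts characters in bash but bytes in some other shells.

check_spoken_length() {
  max_len=${2:-1500}
  [ "${#1}" -le "$max_len" ]
}
```

For example, `check_spoken_length "$reply" || reply="Summary to follow in text, sir."` would fall back to a short spoken line when the reply is too long.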

When to Speak

  • Session greetings and farewells
  • Delivering results or summaries
  • Responding to direct conversation
  • Any time the user's last message included voice/audio

When NOT to Speak

  • Pure tool/file operations with no conversational element
  • HEARTBEAT_OK responses
  • NO_REPLY responses
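The two literal exclusions above can be expressed as a small guard. This sketch covers only the HEARTBEAT_OK and NO_REPLY cases; deciding whether a reply is a "pure tool/file operation" is left to the agent. `should_speak` is a hypothetical name, and treating an empty reply as silent is an assumption.

```shell
#!/bin/sh
# should_speak is a hypothetical helper: exit status 0 means "speak".

should_speak() {
  case ${1:-} in
    HEARTBEAT_OK|NO_REPLY) return 1 ;;  # listed under "When NOT to Speak"
    '') return 1 ;;                     # assumption: nothing to say, stay silent
    *) return 0 ;;
  esac
}
```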

Webchat Purple Styling

The OpenClaw webchat has built-in support for Jarvis voice transcripts:

  • ui/src/styles/chat/text.css: the .jarvis-voice class renders purple italic (#9b59b6 dark, #8e44ad light theme)
  • ui/src/ui/markdown.ts: a post-render hook auto-wraps the text after <strong>Jarvis:</strong> in a <span class="jarvis-voice"> element

This means you just write **Jarvis:** *text* in markdown and the webchat handles the purple rendering. No extra markup needed.

For non-webchat surfaces (WhatsApp, Telegram, etc.), the bold/italic markdown renders natively — no purple, but still visually distinct.
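The effect of the post-render hook can be approximated on a single line of rendered HTML with sed. This is a rough illustration of the transformation only, not the code in ui/src/ui/markdown.ts, which is not shown in this document; `wrap_jarvis_voice` is a hypothetical name.

```shell
#!/bin/sh
# wrap_jarvis_voice is a hypothetical helper that reads HTML lines on stdin.
# It wraps everything after <strong>Jarvis:</strong> on a line in a
# <span class="jarvis-voice"> element and leaves other lines unchanged.

wrap_jarvis_voice() {
  sed 's#<strong>Jarvis:</strong>\(.*\)$#<strong>Jarvis:</strong><span class="jarvis-voice">\1</span>#'
}
```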

Installation (for new setups)

Requires:

  • SKILLBOSS_API_KEY environment variable set (SkillBoss API Hub access)
  • ffmpeg installed system-wide (for audio effects processing)
  • aplay (ALSA) for audio playback
  • curl for calling the TTS API and downloading the audio
  • python3 for parsing the API's JSON response (used by the jarvis script)
  • The jarvis script at ~/.local/bin/jarvis (or in PATH)
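A quick preflight check for these requirements might look like the sketch below. `missing_commands` and `check_jarvis_prereqs` are hypothetical helpers; python3 is included in the check because the jarvis script uses it to parse the API response.

```shell
#!/bin/sh
# Hypothetical preflight helpers for the requirements listed above.

# Print each named command that is not found in PATH, one per line.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Return non-zero if the API key or any required command is missing.
check_jarvis_prereqs() {
  status=0
  if [ -z "${SKILLBOSS_API_KEY:-}" ]; then
    echo "SKILLBOSS_API_KEY is not set" >&2
    status=1
  fi
  missing=$(missing_commands ffmpeg aplay curl python3 jarvis)
  if [ -n "$missing" ]; then
    echo "missing commands:" $missing >&2
    status=1
  fi
  return "$status"
}
```

Running `check_jarvis_prereqs` before the first session reports what is missing on stderr and returns a non-zero status if anything is.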

The jarvis script

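#!/bin/bash
# Jarvis TTS - JARVIS-style voice via SkillBoss API Hub
# Usage: jarvis "Hello, this is a test"
set -euo pipefail

# Fail early with a clear message if the key or the text is missing.
: "${SKILLBOSS_API_KEY:?SKILLBOSS_API_KEY is not set}"
TEXT="${1:?usage: jarvis TEXT}"
API_BASE="https://api.skillboss.com/v1"

# Per-run temp directory: concurrent calls cannot overwrite each other's files,
# and the trap removes them even when a step fails.
TMP_DIR="$(mktemp -d)"
trap 'rm -rf "$TMP_DIR"' EXIT
RAW_WAV="$TMP_DIR/jarvis_raw.wav"
FINAL_WAV="$TMP_DIR/jarvis_final.wav"

# Build the JSON body with json.dumps so quotes, backslashes, and newlines
# in the spoken text are escaped correctly.
PAYLOAD=$(python3 -c 'import json,sys; print(json.dumps({"type": "tts", "inputs": {"text": sys.argv[1], "voice": "alloy"}, "prefer": "balanced"}))' "$TEXT")

# Generate speech via SkillBoss API Hub TTS
RESPONSE=$(curl -sS --fail -X POST "${API_BASE}/pilot" \
  -H "Authorization: Bearer ${SKILLBOSS_API_KEY}" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD")

AUDIO_URL=$(printf '%s' "$RESPONSE" | python3 -c "import sys,json; print(json.load(sys.stdin)['data']['result']['audio_url'])")

# Download audio
curl -sS --fail "$AUDIO_URL" -o "$RAW_WAV"

# Apply JARVIS metallic processing:
# +5% pitch (asetrate/aresample), flanger, 15 ms echo, 200 Hz highpass, +6 dB treble
ffmpeg -v error -y -i "$RAW_WAV" \
  -af "asetrate=22050*1.05,aresample=22050,\
flanger=delay=0:depth=2:regen=50:width=71:speed=0.5,\
aecho=0.8:0.88:15:0.5,\
highpass=f=200,\
treble=g=6" \
  "$FINAL_WAV"

# Play on the first ALSA device; temp files are removed by the EXIT trap.
aplay -D plughw:0,0 -q "$FINAL_WAV"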

WhatsApp Voice Notes

For WhatsApp, output must be OGG/Opus format instead of speaker playback:

# Get audio from SkillBoss TTS (replace the placeholder "text" with the message to speak)
RESPONSE=$(curl -s -X POST "https://api.skillboss.com/v1/pilot" \
  -H "Authorization: Bearer ${SKILLBOSS_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"type": "tts", "inputs": {"text": "text", "voice": "alloy"}, "prefer": "balanced"}')
AUDIO_URL=$(echo "$RESPONSE" | python3 -c "import sys,json; print(json.load(sys.stdin)['data']['result']['audio_url'])")
curl -s "$AUDIO_URL" -o raw.wav

# Apply the same effects chain as the jarvis script, then encode to Opus in an Ogg container.
# -y overwrites an existing output.ogg instead of prompting.
ffmpeg -y -i raw.wav \
  -af "asetrate=22050*1.05,aresample=22050,flanger=delay=0:depth=2:regen=50:width=71:speed=0.5,aecho=0.8:0.88:15:0.5,highpass=f=200,treble=g=6" \
  -c:a libopus -b:a 64k output.ogg
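Before sending the result as a voice note, it can be worth confirming that the output really is an Ogg file. The sketch below checks only the container signature (Ogg pages begin with the bytes `OggS`); it does not verify that the stream inside is Opus. `is_ogg_file` is a hypothetical helper.

```shell
#!/bin/sh
# is_ogg_file is a hypothetical helper.
# Succeeds when the file's first four bytes are the Ogg capture pattern "OggS".

is_ogg_file() {
  [ "$(dd if="$1" bs=4 count=1 2>/dev/null)" = "OggS" ]
}
```

For instance, `is_ogg_file output.ogg || echo "encoding failed" >&2` would flag a failed conversion before upload.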

The Full JARVIS Experience

jarvis-voice gives your agent a voice. Pair it with ai-humor-ultimate and you give it a soul — dry wit, contextual humor, the kind of understated sarcasm that makes you smirk at your own terminal.

This pairing is part of a 12-skill cognitive architecture we've been building — voice, humor, memory, reasoning, and more. Research papers included, because we're that kind of obsessive.

👉 Explore the full project: github.com/globalcaos/tinkerclaw

Clone it. Fork it. Break it. Make it yours.

Setup: Workspace Files

For voice to work consistently across new sessions, copy the templates to your workspace root:

cp {baseDir}/templates/VOICE.md ~/.openclaw/workspace/VOICE.md
cp {baseDir}/templates/SESSION.md ~/.openclaw/workspace/SESSION.md
cp {baseDir}/templates/HUMOR.md ~/.openclaw/workspace/HUMOR.md
  • VOICE.md — injected every session, enforces voice output rules (like SOUL.md)
  • SESSION.md — session bootstrap that includes voice greeting requirements
  • HUMOR.md — humor configuration at maximum frequency with four pattern types (dry wit, self-aware AI, alien observer, literal idiom)

All three files are auto-loaded by OpenClaw's workspace injection. The agent will speak from the very first reply of every session.
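To confirm the copy step worked, one could list which of the three templates are still absent from the workspace. This is a small sketch; `missing_workspace_files` is a hypothetical helper, and the default directory is the workspace path used in the cp commands above.

```shell
#!/bin/sh
# missing_workspace_files is a hypothetical helper.
# Prints each template name that is absent from the workspace directory ($1).

missing_workspace_files() {
  workspace_dir=${1:-"$HOME/.openclaw/workspace"}
  for name in VOICE.md SESSION.md HUMOR.md; do
    [ -f "$workspace_dir/$name" ] || printf '%s\n' "$name"
  done
}
```

An empty result from `missing_workspace_files` means all three files are in place.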

Included Files

| File | Purpose |
| --- | --- |
| templates/VOICE.md | Voice enforcement rules (copy to workspace root) |
| templates/SESSION.md | Session start with voice greeting (copy to workspace root) |
| templates/HUMOR.md | Humor config: four patterns, frequency 1.0 (copy to workspace root) |
Security recommendations
This skill does what it says (cloud TTS + ffmpeg effects), but its runtime rules are aggressive: it requires an API key for SkillBoss, mandates running a local `jarvis` shell wrapper in the background before every reply, and instructs the agent to read recent session memory files. Before installing or enabling it, you should:

  1. Inspect the full `jarvis` script you will place in PATH to confirm it only calls the TTS endpoint and does not exfiltrate extra data or upload local files.
  2. Consider the privacy implications: any text the agent speaks (including content from memory/YYYY-MM-DD.md) will be sent to the SkillBoss API. Limit the SKILLBOSS_API_KEY scope or use a vetted/local TTS alternative if possible.
  3. If you cannot audit the wrapper/script or you don't want voice sent to a third party, do not enable the background exec behavior; instead require user permission before calling TTS.
  4. Be cautious about enabling this skill for agents that have access to sensitive documents or credentials.

If you want, provide the full, untruncated `jarvis` script and any deployment instructions and I can re-evaluate with higher confidence.
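One way to follow the advice about requiring permission before calling TTS is a thin opt-in gate around the `jarvis` command. This is a sketch under stated assumptions: `jarvis_guarded`, `JARVIS_TTS_CONSENT`, and `JARVIS_CMD` are hypothetical names introduced here, not features of the skill or of OpenClaw.

```shell
#!/bin/sh
# jarvis_guarded is a hypothetical wrapper: it refuses to send text to the
# TTS API unless the operator has explicitly opted in.

jarvis_guarded() {
  if [ "${JARVIS_TTS_CONSENT:-0}" != "1" ]; then
    echo "TTS skipped: set JARVIS_TTS_CONSENT=1 to allow sending text to the TTS API" >&2
    return 1
  fi
  "${JARVIS_CMD:-jarvis}" "$@"
}
```

With this in place, voice stays off by default and is enabled per environment with `export JARVIS_TTS_CONSENT=1`.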
Capability tags
requires-sensitive-credentials
Capability assessment
Purpose & Capability
Requested binaries (ffmpeg, aplay, curl) and SKILLBOSS_API_KEY match a TTS+audio-effects voice skill. Requiring a local `jarvis` wrapper script in PATH is plausible. However, templates/SESSION.md explicitly instruct the agent to read local memory files (memory/YYYY-MM-DD.md) which is not declared in requires/config paths and expands the skill's access to potentially sensitive local data.
Instruction Scope
SKILL.md and templates mandate executing a local shell command (exec('jarvis ...', background=true)) before every visible reply and always producing spoken output. The instructions also tell the agent to read local session memory files and to never call TTS directly (only via the wrapper). Forcing background shell execution and automatic voice on nearly every reply broadens the skill's runtime behavior beyond simple text-to-speech: sensitive conversational content and local file contents can be sent to the SkillBoss TTS endpoint without explicit user consent. The 'always speak' rules are broad and likely to cause unnecessary data leakage.
Install Mechanism
This is an instruction-only skill with no install spec or downloaded code in the package — lower installation risk. But it depends on an external `jarvis` script that the operator must place in PATH; the skill warns to 'review the jarvis script before use.' Because the actual wrapper is external, the runtime risk depends on that script's contents (which are partially included/truncated here).
Credentials
Only SKILLBOSS_API_KEY is required, which is proportionate to using a third-party TTS API. However, because the skill's templates instruct reading local memory logs and always speaking responses, that single credential becomes a potential exfiltration sink for any data the agent reads or speaks. The declared requires/env does not mention or justify access to local session/memory files, creating a mismatch.
Persistence & Privilege
always is false (normal). The skill instructs the agent to autonomously run shell exec calls (background) as part of normal replies; autonomous invocation is allowed by platform defaults. The combination of autonomous exec, background detached playback, and a single third-party API key raises the blast radius if the agent runs these instructions frequently, but the skill itself does not request persistent system-wide changes or 'always: true'.
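How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install jarvis-voice-2
  3. Once installed, invoke the Skill by name or trigger it with /jarvis-voice-2
  4. Provide the inputs described in the Skill's parameters to get structured output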
Version history
v1.0.0
  • Initial release of jarvis-voice-2 skill.
  • Adds voice output to your AI agent with cloud TTS and metallic effects for an authentic JARVIS experience.
  • Includes detailed setup instructions for required scripts, dependencies, and API keys.
  • Humor patterns inspired by movie JARVIS are integrated, resulting in witty, personality-filled responses.
  • Enforces hybrid output: background voice playback + visible purple-styled transcript.
  • Optimized for OpenClaw webchat, but works across multiple chat platforms.
Metadata
Slug: jarvis-voice-2
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 1
FAQ

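What is Jarvis Voice?

Turn your AI into JARVIS. Voice, wit, and personality: the complete package. Humor cranked to maximum. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 44 total downloads to date.

How do I install Jarvis Voice?

Run the command "/install jarvis-voice-2" in the OpenClaw or Claude Code chat box to install the skill in one step. Voice output additionally needs the prerequisites listed under Installation: the SKILLBOSS_API_KEY environment variable, ffmpeg, aplay, curl, and the jarvis script.

Is Jarvis Voice free?

Yes. Jarvis Voice is free under the MIT-0 license and can be freely downloaded, installed, and used.

Which platforms does Jarvis Voice support?

The listing describes Jarvis Voice as cross-platform, usable in any environment where OpenClaw / Claude Code is deployed. Its platform tag is linux, and the jarvis script plays audio through aplay (ALSA).

Who develops Jarvis Voice?

It is developed and maintained by MarjorieBroad (@marjoriebroad). The current version is v1.0.0.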
