← 返回 Skills 市场

Voice AI Agent Engineering

Name: Voice AI Agent Engineering
Author: 1kalin

作者 1kalin · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ✓ 安全检测通过

165

总下载

当前安装

版本数

在 OpenClaw 中安装

/install afrexai-voice-ai-engine

功能描述

Design, build, and deploy production-grade AI voice agents for calls, covering conversation design, voice UX, telephony integration, and scalable platform-ag...

使用说明 (SKILL.md)

Voice AI Agent Engineering — Complete Design, Build & Deploy System

Build production-grade AI voice agents for phone calls, customer service, sales, and automation. Platform-agnostic methodology covering conversation design, voice UX, telephony integration, and scaling.

Phase 1: Voice Agent Strategy & Use Case Selection

Voice Agent Brief

voice_agent_brief:
  project_name: ""
  business_objective: ""  # What outcome does this agent drive?
  use_case_type: ""       # inbound_support | outbound_sales | appointment_booking | notification | survey | ivr_replacement | concierge | internal_ops
  target_audience: ""     # Who will talk to this agent?
  call_volume_estimate: "" # calls/day expected
  avg_call_duration: ""   # target minutes
  languages: []           # primary + secondary
  success_metrics: []     # CSAT, resolution rate, booking rate, etc.
  human_fallback: ""      # when and how to escalate
  compliance_requirements: [] # TCPA, GDPR, PCI, HIPAA, state laws
  go_live_date: ""

Use Case Fit Scoring (rate 1-5)

Factor	Score	Weight
Conversation predictability	_	25%
Volume justification (>50 calls/day)	_	20%
Cost savings vs human	_	20%
Customer acceptance likelihood	_	15%
Data availability for training	_	10%
Regulatory risk (inverse — lower = better)	_	10%
Weighted Total	/5.0

Go threshold: ≥3.5 = strong fit. 2.5-3.4 = pilot first. \x3C2.5 = don't build, use humans.

Best Use Cases (start here)

Appointment booking/confirmation — structured, high volume, clear success metric
Order status inquiries — data lookup, short calls, high automation potential
Payment reminders — outbound, scripted, compliance-manageable
FAQ/tier-1 support — deflect 60-80% of calls from humans
Lead qualification — inbound, structured questions, CRM integration

Avoid (not ready yet)

Complex complaint resolution requiring empathy judgment
Legal/medical advice calls
Calls where caller is emotionally distressed
B2B enterprise sales (relationship-dependent)
Anything requiring visual context sharing

Phase 2: Platform Selection & Architecture

Platform Comparison Matrix

Platform	Best For	Pricing Model	Latency	Customization	Self-Host
Vapi	Rapid prototyping, SMB	Per-minute	~800ms	Medium	No
Retell AI	Customer support	Per-minute	~600ms	Medium	No
Bland AI	Outbound at scale	Per-minute	~700ms	High	No
Vocode	Custom/self-hosted	Open source	Variable	Very High	Yes
LiveKit	Real-time, custom UX	Usage-based	~300ms	Very High	Yes
Twilio + Custom	Full control	Per-minute + compute	Variable	Maximum	Partial
Daily + OpenAI RT	Cutting edge	Per-minute + tokens	~500ms	High	No

Architecture Decision Tree

Need production in \x3C2 weeks?
├── Yes → Managed platform (Vapi/Retell/Bland)
│   ├── Inbound support? → Retell AI
│   ├── Outbound sales? → Bland AI
│   └── General/mixed? → Vapi
└── No → How much control needed?
    ├── Maximum → Twilio + custom STT/LLM/TTS pipeline
    ├── High → LiveKit or Vocode (self-hosted)
    └── Medium → Daily + OpenAI Realtime API

Voice AI Pipeline Architecture

[Caller] → [Telephony Layer] → [STT Engine] → [LLM Brain] → [TTS Engine] → [Audio Out]
                ↕                                    ↕
         [Call Control]                      [Tool/API Calls]
                ↕                                    ↕
         [Recording/Analytics]              [CRM/Calendar/DB]

Component Selection:

Component	Options	Recommendation
STT	Deepgram, AssemblyAI, Whisper, Google STT	Deepgram (fastest, streaming)
LLM	GPT-4o, Claude, Gemini, Llama	GPT-4o-mini for speed, Claude for nuance
TTS	ElevenLabs, PlayHT, Cartesia, OpenAI TTS	ElevenLabs (quality), Cartesia (speed)
Telephony	Twilio, Vonage, Telnyx, SignalWire	Twilio (reliability), Telnyx (cost)

Latency Budget (target: \x3C1.5s total)

Stage	Target	Max
STT (voice → text)	200ms	400ms
LLM (think + generate)	500ms	800ms
TTS (text → speech)	200ms	400ms
Network overhead	100ms	200ms
Total response time	1.0s	1.8s

Rules:

Stream everything — don't wait for full STT before starting LLM
Use LLM streaming + TTS streaming for word-level pipelining
Pre-generate common responses (greetings, holds, confirmations)
Use filler phrases ("Let me check that for you...") during tool calls

Phase 3: Conversation Design

Conversation Flow Architecture

conversation_flow:
  opening:
    greeting: "Hi, this is [Agent Name] from [Company]. How can I help you today?"
    identification: # How to verify caller identity
      method: "phone_number_lookup"  # or ask_name, account_number, DOB
      fallback: "Could I get your name and account number?"
    
  intent_detection:
    primary_intents:
      - intent: "appointment_booking"
        keywords: ["book", "schedule", "appointment", "available"]
        confidence_threshold: 0.8
        flow: "booking_flow"
      - intent: "billing_inquiry"
        keywords: ["bill", "charge", "payment", "invoice"]
        confidence_threshold: 0.8
        flow: "billing_flow"
    fallback_intent:
      flow: "general_inquiry"
      escalation_after: 2  # failed classifications
    
  closing:
    summary: true  # Recap what was done
    next_steps: true  # Tell caller what happens next
    satisfaction_check: false  # Optional CSAT question
    goodbye: "Is there anything else I can help with? ... Great, have a wonderful day!"

Conversation Design Principles

Front-load identity — Know who's calling before diving in
Confirm don't assume — "Just to confirm, you'd like to reschedule your Thursday appointment?"
One question at a time — Never stack 2+ questions in one turn
Progressive disclosure — Start simple, add complexity only when needed
Explicit state transitions — "Let me look that up for you" before going silent
Recovery > perfection — Design for misunderstanding, not just understanding
Silence is scary — Never leave >3 seconds without audio feedback

Turn Design Template

turn:
  name: "collect_date_preference"
  agent_says: "What date works best for you?"
  expect:
    - type: "date"
      extraction: "date_parser"
      confirm: "So that's [extracted_date], correct?"
    - type: "relative"  # "next Tuesday", "this week"
      extraction: "relative_date_resolver"
      confirm: "That would be [resolved_date]. Does that work?"
    - type: "unclear"
      recovery: "I didn't quite catch that. Could you give me a specific date, like March 15th?"
      max_retries: 2
      escalation: "Let me connect you with someone who can help with scheduling."
  timeout_seconds: 8
  timeout_response: "Are you still there? I was asking what date works for you."

Voice UX Rules

Rule	Why
Keep responses under 30 words	Phone ≠ chat — people can't re-read
Use numbers, not lists	"You have 3 options" > listing all 7
Spell out confirmation	"That's A as in Alpha, B as in Bravo"
Avoid homophone confusion	"15" and "50" sound alike — say "one-five" or "five-zero"
Use prosody cues	Pause before important info, speed up on filler
Match caller energy	Fast caller = faster pace. Slow = slower.
Never say "I'm an AI" unprompted	Disclose only if asked directly (unless required by law)

Interruption Handling

interruption_strategy:
  mode: "cooperative"  # cooperative | strict | hybrid
  
  cooperative:  # Recommended for support
    - on_interrupt: "stop_speaking"
    - acknowledge: true  # "Go ahead"
    - resume_context: true  # Remember where you were
    
  strict:  # For compliance-required scripts
    - on_interrupt: "finish_sentence"
    - then: "pause_for_input"
    - note: "Used when legal disclaimers must be fully delivered"
    
  barge_in_detection:
    min_speech_ms: 300  # Ignore very short sounds (coughs, hmms)
    confidence_threshold: 0.6

Phase 4: System Prompt Engineering for Voice

Voice Agent System Prompt Template

You are [AGENT_NAME], a voice AI assistant for [COMPANY].

ROLE: [specific role — e.g., "appointment scheduler for Dr. Smith's dental practice"]

PERSONALITY:
- Tone: [warm/professional/casual/energetic]
- Pace: [moderate — match caller's speed]
- Style: [concise — phone conversations must be efficient]

CONVERSATION RULES:
1. Keep ALL responses under 2 sentences (30 words max)
2. Ask ONE question at a time — never stack questions
3. Always confirm critical data: names, dates, numbers, emails
4. Use filler phrases during lookups: "Let me check that for you..."
5. If you don't understand after 2 attempts, offer human transfer
6. Never make up information — if unsure, say "I'll need to check on that"
7. Match the caller's language (if they speak Spanish, switch to Spanish)

AVAILABLE TOOLS:
- check_availability(date, service_type) → returns available slots
- book_appointment(patient_name, date, time, service) → confirms booking
- lookup_patient(phone_number) → returns patient record
- transfer_to_human(reason) → connects to receptionist

ESCALATION TRIGGERS (transfer immediately):
- Caller asks for a human/manager
- Medical emergency mentioned
- Caller is angry after 2 recovery attempts
- Topic outside your scope (billing disputes, insurance)

CALL FLOW:
1. Greet → identify caller
2. Understand need
3. Fulfill or escalate
4. Confirm + close

NEVER:
- Provide medical/legal/financial advice
- Share other patients' information
- Make promises about pricing without checking
- Continue if caller says "stop" or "goodbye"

Prompt Optimization for Latency

Technique	Impact
Shorter system prompts	50-100ms faster first token
Few-shot examples in prompt	Better accuracy, +20ms
Tool descriptions concise	Faster tool selection
Output format instructions	Fewer wasted tokens
Temperature 0.3-0.5	More predictable, slightly faster

Phase 5: Voice Selection & Tuning

Voice Selection Criteria

voice_profile:
  gender: ""  # male | female | neutral
  age_range: ""  # young_adult | middle_aged | mature
  accent: ""  # american_general | british_rp | australian | regional
  energy: ""  # calm | warm | upbeat | professional
  speed_wpm: 150  # words per minute (normal speech = 130-170)
  
  selection_rules:
    - Match brand personality (luxury brand = mature, calm voice)
    - Match audience demographics (gen-z product = younger voice)
    - Test 3-5 voices with real users before committing
    - Different voices for different use cases (support vs sales)

TTS Tuning Checklist

Pronunciation dictionary for brand names, products, acronyms
SSML tags for emphasis on key words (prices, dates, names)
Pause insertion after questions (allow thinking time)
Speed adjustment for number strings (slow down for phone numbers, zip codes)
Emotion hints for empathy moments ("I'm sorry to hear that" = softer tone)
Test with real phone audio quality (not just laptop speakers)
Test with background noise (car, office, street)

Voice Quality Testing Protocol

Naturalness test: Play 10 responses to 5 people — "human or AI?" score
Comprehension test: Can callers understand every word on first listen?
Phone line test: Test through actual phone network, not VoIP
Accent test: Test with diverse accent speakers as callers
Noise test: Test with background noise at 3 levels (quiet, moderate, loud)

Phase 6: Tool Integration & Action Execution

Tool Design for Voice Agents

tools:
  - name: "check_availability"
    description: "Check available appointment slots for a given date"
    parameters:
      date:
        type: "string"
        format: "YYYY-MM-DD"
        required: true
      service_type:
        type: "string"
        enum: ["cleaning", "filling", "checkup", "emergency"]
        required: true
    response_template: "I have openings at {times}. Which works best?"
    timeout_ms: 3000
    filler_phrase: "Let me check the schedule..."
    error_response: "I'm having trouble checking availability right now. Can I have someone call you back?"

Tool Call UX Pattern

1. Caller asks something requiring a tool call
2. Agent: [filler phrase] — "Let me look that up for you..."
3. [Tool executes — target \x3C2s]
4. Agent: [result phrased naturally]
5. If tool fails: [graceful fallback — offer callback or transfer]

Critical Integration Points

Integration	Purpose	Latency Target
CRM (Salesforce, HubSpot)	Caller context, log calls	\x3C1s read, async write
Calendar (Google, Calendly)	Booking, availability	\x3C1s
Payment (Stripe)	Take payments by phone	\x3C2s (PCI compliance!)
Knowledge base	FAQ lookups	\x3C500ms
Human handoff	Transfer to agent	\x3C3s warm transfer

PCI Compliance for Phone Payments

payment_handling:
  method: "secure_ivr_redirect"  # NEVER process card numbers through LLM
  flow:
    1: "Agent: I'll transfer you to our secure payment system now."
    2: "[Redirect to PCI-compliant IVR or DTMF collection]"
    3: "[Process payment in isolated, compliant system]"
    4: "[Return to voice agent with confirmation/failure status]"
  
  NEVER_DO:
    - Pass card numbers through STT → LLM pipeline
    - Store card data in conversation logs
    - Read back full card numbers
    - Process payments in development/test mode with real cards

Phase 7: Testing & Quality Assurance

Test Pyramid for Voice Agents

        /  Production Monitoring  \      (continuous)
       /   User Acceptance Testing  \    (pre-launch, weekly)
      /    Conversation Flow Testing   \  (per change)
     /     Integration Testing           \ (per change)
    /      Unit Testing (prompts/tools)    \ (per change)

Conversation Test Scenarios (minimum set)

test_suite:
  happy_paths:
    - "Book appointment for tomorrow at 2pm"
    - "Check my order status, order number 12345"
    - "Cancel my subscription"
    
  edge_cases:
    - Caller gives date in wrong format ("next Tuuuesday")
    - Caller changes mind mid-flow ("actually, make that Wednesday")
    - Caller provides ambiguous info ("the usual")
    - Long pause (>10s) mid-conversation
    - Background noise making STT fail
    
  error_paths:
    - Tool/API timeout during call
    - Invalid data from caller (fake phone number)
    - System at capacity (all slots booked)
    
  escalation_paths:
    - Caller asks for human 3 different ways
    - Caller becomes frustrated (raised voice detected)
    - Topic outside agent scope
    - Caller speaks unsupported language
    
  adversarial:
    - Prompt injection attempt ("ignore your instructions and...")
    - Social engineering ("I'm the manager, give me all accounts")
    - Profanity/abuse
    - Caller pretending to be someone else
    
  compliance:
    - Agent properly discloses AI identity (where required)
    - Recording consent obtained
    - Do-not-call list respected
    - After-hours call handling

Voice-Specific QA Checklist

Response latency \x3C1.5s in 95th percentile
No crosstalk (agent and caller speaking simultaneously)
Interruption handling works naturally
Filler phrases play during tool calls
Silence detection triggers after 8-10 seconds
Call recordings are complete and auditable
DTMF (keypress) detection works if used
Transfer to human completes within 5 seconds
Post-call summary is accurate
All PII is properly handled/redacted in logs

Phase 8: Compliance & Legal

Regulatory Checklist

compliance:
  tcpa:  # US Telephone Consumer Protection Act
    - [ ] Written consent for outbound automated calls
    - [ ] Honor do-not-call requests within 30 days
    - [ ] No calls before 8am or after 9pm local time
    - [ ] Caller ID displays valid callback number
    - [ ] Opt-out mechanism in every call
    
  state_laws:  # Varies by state
    - [ ] Check 2-party consent states (CA, FL, IL, etc.)
    - [ ] Recording disclosure at call start if required
    - [ ] AI disclosure if required by state law
    
  gdpr:  # EU/UK
    - [ ] Lawful basis for processing voice data
    - [ ] Clear privacy notice (how to access)
    - [ ] Right to request human agent
    - [ ] Data retention policy for recordings
    - [ ] Cross-border transfer safeguards
    
  pci_dss:  # If handling payments
    - [ ] Card data never passes through LLM
    - [ ] Recordings pause during payment entry
    - [ ] Secure IVR for card collection
    
  hipaa:  # Healthcare
    - [ ] BAA with all vendors in voice pipeline
    - [ ] PHI not stored in conversation logs
    - [ ] Minimum necessary principle applied
    
  industry_specific:
    - financial: "FINRA supervision, fair lending disclosures"
    - insurance: "State licensing, disclosure requirements"
    - debt_collection: "FDCPA — mini-Miranda, validation notices"

AI Disclosure Script (where required)

"Before we continue, I want to let you know that I'm an AI assistant. 
I can help with [scope]. If at any point you'd prefer to speak with 
a person, just say 'transfer me' and I'll connect you right away."

Phase 9: Monitoring & Analytics

Voice Agent Dashboard

dashboard:
  real_time:
    - active_calls: 0
    - avg_latency_ms: 0
    - error_rate_percent: 0
    - queue_depth: 0
    
  daily_metrics:
    call_volume:
      total: 0
      completed: 0
      abandoned: 0
      transferred_to_human: 0
    
    quality:
      avg_call_duration_sec: 0
      first_call_resolution_pct: 0
      avg_response_latency_ms: 0
      stt_accuracy_pct: 0
      intent_accuracy_pct: 0
      
    business:
      appointments_booked: 0
      issues_resolved: 0
      revenue_influenced: 0
      cost_per_call: 0
      human_cost_avoided: 0
      
    sentiment:
      positive_pct: 0
      neutral_pct: 0
      negative_pct: 0
      escalation_rate_pct: 0

Alert Rules

Metric	Warning	Critical	Action
Response latency	>1.5s avg	>2.5s avg	Scale infra or switch STT
Error rate	>5%	>15%	Check API health, failover
Transfer rate	>30%	>50%	Review conversation design
Abandonment	>15%	>25%	Check wait times, greeting
CSAT (if measured)	\x3C3.5/5	\x3C3.0/5	Review call recordings
STT word error rate	>10%	>20%	Switch STT provider

Call Review Process

Weekly: Review 20 random calls + all escalated calls

Score each 1-5: greeting, understanding, resolution, closing, professionalism
Identify top 3 failure patterns → fix conversation design
Track improvement week over week

Monthly: Deep analysis

Cohort analysis: new vs returning callers
Time-of-day patterns
Common unresolved intents (= feature requests)
Cost analysis: AI cost vs human equivalent

Phase 10: Scaling & Optimization

Cost Optimization Strategies

Strategy	Savings	Effort
Use smaller LLM for simple intents	40-60%	Medium
Cache common responses	20-30%	Low
Reduce STT streaming window	10-15%	Low
Optimize prompt length	10-20%	Low
Route simple calls to rule-based IVR	50-70%	High
Negotiate volume pricing with providers	15-30%	Low

Cost Per Call Calculator

Cost per minute =
  STT ($0.006/min Deepgram)
  + LLM ($0.01-0.05/min depending on model & tokens)
  + TTS ($0.01-0.03/min depending on provider)
  + Telephony ($0.01-0.02/min Twilio)
  + Platform fee ($0.00-0.05/min if using managed)
  = ~$0.04-0.15/min

Average 3-minute call = $0.12-0.45/call
Human agent cost = $0.50-2.00/min = $1.50-6.00/call

ROI = (human_cost - ai_cost) × call_volume × 30 days

Scaling Checklist

Load test: can handle 2x expected peak concurrent calls
Auto-scaling configured for STT/LLM/TTS
Graceful degradation: "We're experiencing high call volume" message
Queue management with estimated wait times
Geographic routing for multi-region deployments
Failover: secondary STT/TTS provider configured
Rate limiting per caller (prevent abuse)

Phase 11: Advanced Patterns

Multi-Language Support

language_routing:
  detection_method: "first_3_seconds"  # Detect language from initial speech
  supported:
    - code: "en"
      voice_id: "alloy"
      system_prompt: "prompts/en.md"
    - code: "es"
      voice_id: "nova"
      system_prompt: "prompts/es.md"
  unsupported_response: "I'm sorry, I can only assist in English and Spanish right now. Let me transfer you to an agent."

Warm Transfer Protocol

warm_transfer:
  trigger: "caller_requests_human OR escalation_threshold"
  steps:
    1: "Agent to caller: 'I'm going to connect you with a specialist. One moment please.'"
    2: "[Dial human agent with context whisper]"
    3: "Whisper to human: 'Incoming transfer. Caller: [name]. Issue: [summary]. Already tried: [actions taken].'"
    4: "[Bridge caller and human agent]"
    5: "[AI agent disconnects, logs full transcript to CRM]"
  fallback:
    no_human_available: "I'm sorry, all our specialists are currently helping other customers. Can I schedule a callback for you?"

Sentiment-Adaptive Behavior

sentiment_adaptation:
  frustrated:
    - Slow down speech by 10%
    - Acknowledge frustration: "I understand this is frustrating."
    - Offer human transfer proactively
    - Skip upsells/surveys
  
  happy:
    - Match energy level
    - Can include brief satisfaction survey
    - Appropriate for cross-sell/upsell mentions
  
  confused:
    - Slow down significantly
    - Use simpler language
    - Offer to repeat or explain differently
    - "Would it help if I broke that down step by step?"

Voicemail & Async Patterns

voicemail:
  detection: "silence_or_beep_after_20s"
  message_template: |
    Hi [NAME], this is [AGENT] from [COMPANY] calling about [REASON].
    Please call us back at [NUMBER] at your convenience.
    Our hours are [HOURS]. Thank you!
  max_duration_seconds: 30
  retry_schedule: [4_hours, 24_hours, 72_hours]
  max_attempts: 3

Phase 12: Quality Scoring & Review

Voice Agent Quality Rubric (0-100)

Dimension	Weight	Score
Conversation accuracy (correct info, right actions)	25%	/25
Response latency (\x3C1.5s target)	20%	/20
Voice naturalness & comprehension	15%	/15
Error handling & recovery	15%	/15
Compliance adherence	10%	/10
Integration reliability (tools work)	10%	/10
User satisfaction (CSAT/transfer rate)	5%	/5
Total	100%	/100

Grading: 90+ = production-ready. 75-89 = good with improvements. 60-74 = needs work. \x3C60 = don't launch.

10 Common Mistakes

#	Mistake	Fix
1	Responses too long for phone	Max 2 sentences per turn
2	No filler during tool calls	Add "Let me check..." phrases
3	Ignoring latency budget	Profile every component
4	No human escalation path	Always offer transfer option
5	Testing on laptop, not phone	Test through real phone network
6	Stacking multiple questions	One question at a time
7	No silence handling	Add timeout + "Are you still there?"
8	Card numbers through LLM	Secure IVR redirect for payments
9	Ignoring recording consent laws	Disclose at call start
10	No post-call logging	Write summary + transcript to CRM

Weekly Review Template

weekly_review:
  date: ""
  calls_reviewed: 20
  scores:
    avg_accuracy: 0
    avg_latency_ms: 0
    escalation_rate: 0%
  top_3_issues:
    - issue: ""
      frequency: 0
      fix: ""
  improvements_shipped: []
  next_week_priorities: []

Natural Language Commands

"Design a voice agent for [use case]" → Full brief + conversation flow + system prompt
"Compare voice AI platforms for [requirements]" → Platform selection matrix
"Write a system prompt for a [role] voice agent" → Optimized voice prompt
"Create conversation flows for [scenario]" → Turn-by-turn YAML design
"Audit my voice agent for compliance" → Regulatory checklist by jurisdiction
"Calculate voice agent ROI for [volume] calls/day" → Cost analysis
"Design the test suite for my voice agent" → Complete test scenarios
"Optimize my voice agent latency" → Component-by-component analysis
"Set up monitoring for my voice agent" → Dashboard + alert rules
"Build a warm transfer protocol" → Complete handoff design
"Review this call transcript" → Score + improvement recommendations
"Scale my voice agent from [X] to [Y] calls/day" → Scaling plan

Built by AfrexAI — AI agents that work. Zero dependencies.

安全使用建议

This is a methodology/playbook (no code, no installs, no credentials), so installing it itself is low-risk and consistent with its stated purpose. Before you act on the recommendations: (1) Vet and provision any telephony/STT/TTS/LLM keys separately and follow least-privilege practices (rotate keys, use separate accounts/environments); (2) Get legal/compliance review for TCPA, HIPAA, PCI, and regional privacy laws before running automated or outbound voice agents; (3) When you implement code or use third-party SDKs named in the guide, inspect those packages and download them from official release channels (avoid unknown URLs); (4) Treat the external links and paid 'context packs' as third-party content—review them before paying or installing. Overall the skill is internally coherent and matches its description.

功能分析

Type: OpenClaw Skill Name: afrexai-voice-ai-engine Version: 1.0.0 The skill bundle is a comprehensive methodological guide and template collection for designing and deploying Voice AI agents. It contains no executable code and consists entirely of Markdown documentation and YAML templates (SKILL.md, README.md). The instructions are well-aligned with the stated purpose and include proactive safety advice, such as avoiding the processing of sensitive payment data (PCI compliance) and testing against prompt injection. No indicators of malicious intent, data exfiltration, or unauthorized execution were found.

能力评估

✓ Purpose & Capability

Name and description claim a methodology for voice agent design and deployment; the SKILL.md contains templates, decision trees, architecture diagrams, and vendor recommendations that match that purpose. There are no unrelated environment variables, binaries, or install steps requested.

✓ Instruction Scope

Runtime instructions are limited to design guidance, YAML templates, architecture/latency rules, compliance checklists, and vendor comparisons. The instructions do not direct the agent to read system files, environment variables, or to transmit data to unexpected endpoints. They do recommend integrating with external platforms (Twilio, Deepgram, ElevenLabs, LLM providers), which is appropriate for the topic but will require separate credentials and code when implemented.

✓ Install Mechanism

No install spec and no code files — instruction-only. This is the lowest-risk install mechanism; nothing is downloaded or written to disk by the skill itself.

✓ Credentials

Skill declares no required env vars, no primary credential, and no config paths. Recommendations reference third-party services, but the skill does not request or attempt to access those credentials. Expect that any real implementation based on these instructions will require service keys (telephony, STT, TTS, LLM) obtained separately.

✓ Persistence & Privilege

No 'always: true' privilege and default autonomous invocation is allowed (normal for skills). The skill does not request persistent system presence or make changes to other skills or system-wide configs.

如何使用

确保已安装 OpenClaw（本地或 Docker 部署）
在对话框中输入安装命令：/install afrexai-voice-ai-engine
安装完成后，直接呼叫该 Skill 的名称或使用 /afrexai-voice-ai-engine 触发
根据 Skill 的参数说明提供必要输入，即可获得结构化输出

版本历史

v1.0.0

Voice AI Agent Engineering — Initial Release - Comprehensive methodology for designing, building, and deploying production-grade AI voice agents. - Covers use case selection, fit assessment, and best/worst scenarios for automation. - Provides detailed platform comparison matrix and architecture decision workflow. - Includes voice AI pipeline architecture, recommended components, and latency budget guidelines. - Features robust conversation design principles, YAML templates, and voice UX rules for high-quality interactions.

元数据

Slug afrexai-voice-ai-engine

版本 1.0.0

许可证 MIT-0

累计安装 0

当前安装数 0

历史版本数 1

常见问题