Chapter 22

Case Study: Smart Customer Service — Intent, Multi-Turn and Human Handoff

Chapter 22: Project — Intelligent Customer Service System

A customer service system that truly works doesn't use AI to block users — it ensures users meet the right kind of help at the right moment, whether that's AI or a human agent.

Chapter Overview

Intelligent customer service is both Dify's most common deployment scenario and the one with the most pitfalls. Many teams assume "LLM + FAQ knowledge base = intelligent customer service," only to face angry users after launch.

A production-grade intelligent customer service system must solve three core challenges:

  1. Accurate intent recognition: When a user says "my item is broken," the system must determine "after-sales repair request" rather than "product complaint"
  2. Multi-turn dialogue management: Across 10+ conversation turns, remember everything the user said and prevent them from repeating themselves
  3. Seamless human handoff: When AI cannot help, transfer the complete conversation context to a human agent — not make the user start over

Case background: A home furnishings e-commerce company (annual GMV RMB 300M), with approximately 8,000 customer service sessions per day, a 30-person CS team, and peak daily sessions during sales events reaching 25,000. AI customer service targets:


Level 1: Core Concepts (1–3 Years Experience)

What Makes Customer Service Unique

External customer service differs from internal knowledge assistants in critical ways:

Challenge 1: Users don't express themselves clearly

Internal employees ask: "How many days in advance must I apply for leave?" (structured)

External users ask: "Can I take tomorrow off?" / "Your broken system" / "What's that policy thing?" / "???"

AI must handle ambiguity, colloquial language, emotional expression, and incomplete information.

Challenge 2: Many complex intents

Common e-commerce customer service intents (partial list):

Pre-sale:
- Product inquiry (materials, dimensions, colors, stock)
- Price inquiry (discounts, comparison)
- Delivery inquiry (regions, timeframe)
- Installation inquiry (included, fees)

Post-sale:
- Order tracking (logistics status, shipping)
- Returns and exchanges (process, conditions, status)
- Repair requests (how to apply, cost)
- Complaints (product quality, service attitude)

Account:
- Password/login issues
- Points inquiry/redemption
- Address modification

Challenge 3: Complex system integrations

Customer service requires connecting to: order systems, logistics systems, CRM, ticketing systems, and human agent platforms — each one a "tool" that a Dify Agent must call.

Overall Architecture

User Interface (Web/App/WeChat Mini Program)
         |
         v
  Message Gateway (WebSocket/HTTP)
         |
         v
   Dify AI Customer Service App
   +-------------------------------------+
   |  Intent Recognition (Workflow)      |
   |       |                             |
   |  Multi-turn Dialogue Engine         |
   |    +-- Pre-sale FAQ knowledge base  |
   |    +-- Post-sale FAQ knowledge base |
   |    +-- Tool calls:                  |
   |         +-- Query order status      |
   |         +-- Query logistics         |
   |         +-- Submit refund request   |
   |         +-- Create support ticket   |
   |  Human Escalation Trigger           |
   +-------------------------------------+
         |                    |
         v                    v
   AI auto-reply         Human agent platform
                         (with full chat history)

Minimum Viable Version

Step 1: Create chatbot application

In Dify Console:

  1. New App → Chatbot
  2. Model: GPT-4 Turbo (strong comprehension needed for customer service)
  3. Enable conversation history: retain last 10 turns
  4. Associate knowledge bases: Pre-sale FAQ, Post-sale FAQ

Step 2: Base System Prompt

You are Xiao Mei, the customer service assistant for [Brand Name].

## Your capabilities:
- Answer product questions (materials, dimensions, usage)
- Look up order status (requires order number from user)
- Process return/exchange requests
- Resolve common after-sales issues

## Your principles:
1. Always be friendly and patient, even when the user is upset
2. Ask clarifying questions when uncertain, rather than guessing
3. For issues you cannot resolve, promptly inform the user you will transfer to a human agent
4. Do not promise specific compensation amounts (requires human approval)
5. For complaints, apologize first, then work toward a solution

## Your constraints:
- Handle only issues related to [Brand Name] products and services
- Do not recommend competitor products
- Do not comment on political or social topics

Level 2: Mechanism Deep Dive (3–5 Years Experience)

Intent Recognition Implementation

Relying solely on LLMs for intent recognition is expensive and slow. Use a layered approach:

Layer 1 (Rules): Regex + keyword matching
  - Covers 40% of high-frequency clear intents (e.g., "refund" keyword → refund intent)
  - Response time < 1ms, zero cost

Layer 2 (Small model): BERT/ALBERT classifier
  - Covers 85% of intents
  - Response time < 50ms, very low cost

Layer 3 (Large model): GPT-3.5/Claude Haiku
  - Handles remaining 15% of complex/ambiguous intents
  - Response time 500ms–1s, moderate cost

Python rule-based intent detection:

import re

INTENT_PATTERNS = {
    'refund': [r'refund', r'return', r'don.t want', r'send.*back'],
    'order_query': [r'order', r'shipping', r'delivery', r'tracking', r'dispatched'],
    'repair': [r'broken', r'damaged', r'repair', r'fix', r'malfunction'],
    'complaint': [r'terrible', r'scam', r'complain', r'unacceptable'],
    'product_inquiry': [r'material', r'size', r'color', r'model', r'spec'],
}

def rule_based_intent(user_input: str) -> dict:
    for intent, patterns in INTENT_PATTERNS.items():
        for pattern in patterns:
            if re.search(pattern, user_input, re.IGNORECASE):
                return {'intent': intent, 'confidence': 0.95, 'method': 'rule'}
    return {'intent': None, 'confidence': 0.0, 'method': 'rule'}

Multi-turn Dialogue State Management

The core challenge: how to collect all information needed to complete a task across multiple turns.

Dify tracks state using Variables. For a refund flow:

variables = {
    "refund_order_id": "",      # Order number
    "refund_reason": "",        # Reason for refund
    "refund_method": "",        # Refund method preference
    "refund_step": "initial",   # Current workflow step
    "user_emotion": "neutral",  # User emotional state
    "escalation_triggered": False
}

Slot-filling System Prompt:

You are the refund processing assistant. Guide the user through the refund application step by step.

## Required information (slots):
1. Order number [collected: {{refund_order_id}}]
2. Reason for refund [collected: {{refund_reason}}]
3. Preferred refund method [collected: {{refund_method}}]

## Current step: {{refund_step}}

## Instructions:
- If a slot is empty, politely ask for that piece of information
- Ask for only one piece of information at a time
- After the user provides information, confirm it and note the slot as filled
- Once all slots are filled, summarize the refund request and call the "submit_refund" tool

## Important:
- If user emotion is angry ({{user_emotion}} == "angry"), de-escalate before collecting information
- Refund amounts over $200 must be flagged for human review

Emotion Detection and Escalation Strategy

def should_escalate(context: dict) -> tuple:
    reasons = []
    
    # Rule 1: User explicitly requests human
    human_keywords = ['human', 'agent', 'real person', 'speak to someone', 'transfer me']
    if any(kw in context['last_user_message'].lower() for kw in human_keywords):
        reasons.append('User explicitly requested human agent')
    
    # Rule 2: Severe negative emotion
    if context['user_emotion'] in ['angry', 'very_angry']:
        reasons.append('User is highly upset')
    
    # Rule 3: AI failed 3+ times consecutively
    if context['ai_failure_count'] >= 3:
        reasons.append('AI failed to resolve issue 3 consecutive times')
    
    # Rule 4: High-value transaction
    if context.get('refund_amount', 0) > 200:
        reasons.append('Refund amount exceeds $200, requires human approval')
    
    # Rule 5: Excessive turns
    if context.get('turn_count', 0) > 15:
        reasons.append('Conversation exceeded 15 turns — likely complex issue')
    
    if reasons:
        return True, ' + '.join(reasons)
    return False, ''

Human Handoff: The Seamless Transition Protocol

def generate_handover_context(conversation: dict) -> dict:
    """Generate structured handover information for human agents"""
    
    extraction_prompt = f"""
    Extract key information from this customer service conversation in JSON format:
    
    Conversation:
    {format_conversation(conversation['history'])}
    
    Extract:
    {{
        "user_name": "user's name if mentioned",
        "order_id": "order number if mentioned",
        "product_name": "product name",
        "issue_summary": "2-sentence issue description",
        "user_requests": ["list of specific user requests"],
        "steps_taken": ["AI resolution steps already attempted"],
        "blockers": ["why AI could not resolve"],
        "urgency": "high/medium/low",
        "user_emotion": "emotional state",
        "recommended_action": "suggested action for human agent"
    }}
    """
    
    handover_info = llm.extract(extraction_prompt)
    
    return {
        "handover_time": datetime.now().isoformat(),
        "conversation_id": conversation['id'],
        "turn_count": len(conversation['history']) // 2,
        "escalation_reason": conversation['escalation_reason'],
        "structured_info": handover_info,
        "full_history_url": f"/conversations/{conversation['id']}"
    }

Level 3: Source Code and Architecture (5+ Years)

High-Concurrency Architecture

During sales events, daily sessions spike from 8,000 to 25,000, with peak concurrent sessions reaching 5,000.

Bottleneck analysis:

User → WebSocket server → Dify API → OpenAI API → Response

Bottleneck 1: OpenAI API concurrency (Tier 3: 10,000 RPM)
Bottleneck 2: Dify API single node (~500 concurrent connections)
Bottleneck 3: Session state storage (Redis memory)
Bottleneck 4: WebSocket connections (single node ~10,000)

Async streaming implementation:

import asyncio
import aiohttp
import redis.asyncio as aioredis
from typing import AsyncGenerator

redis_client = aioredis.from_url("redis://redis:6379")

DIFY_CONNECTION_POOL = aiohttp.TCPConnector(
    limit=200,
    limit_per_host=200,
    keepalive_timeout=60
)

async def handle_chat_message(
    session_id: str, user_message: str, user_id: str
) -> AsyncGenerator[str, None]:
    
    session_data = await redis_client.get(f"session:{session_id}")
    session = json.loads(session_data) if session_data else {
        "conversation_id": None, "turn_count": 0, "user_emotion": "neutral"
    }
    
    async with aiohttp.ClientSession(connector=DIFY_CONNECTION_POOL) as http_session:
        async with http_session.post(
            f"{DIFY_API_URL}/chat-messages",
            headers={"Authorization": f"Bearer {DIFY_APP_TOKEN}"},
            json={
                "query": user_message,
                "response_mode": "streaming",
                "conversation_id": session.get("conversation_id") or "",
                "user": user_id
            }
        ) as response:
            async for line in response.content:
                line_str = line.decode('utf-8').strip()
                if line_str.startswith('data: '):
                    data = json.loads(line_str[6:])
                    if data.get('event') == 'message':
                        yield data.get('answer', '')
                    elif data.get('event') == 'message_end':
                        session['conversation_id'] = data.get('conversation_id')
    
    session['turn_count'] = session.get('turn_count', 0) + 1
    
    should_transfer, reason = should_escalate({
        'last_user_message': user_message,
        'user_emotion': session['user_emotion'],
        'turn_count': session['turn_count']
    })
    
    if should_transfer:
        yield "\n\n[SYSTEM:ESCALATE]"
        await trigger_escalation(session_id, session, reason)
    
    await redis_client.setex(f"session:{session_id}", 86400, json.dumps(session))

WebSocket server:

from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()
active_connections = {}

@app.websocket("/ws/{session_id}")
async def websocket_endpoint(websocket: WebSocket, session_id: str):
    await websocket.accept()
    active_connections[session_id] = websocket
    
    try:
        while True:
            data = await websocket.receive_json()
            
            # Critical: check if session was transferred to human agent
            status = await redis_client.get(f"session:{session_id}:status")
            if status == b"human_agent":
                await forward_to_human_agent(session_id, data['message'])
                continue
            
            async for chunk in handle_chat_message(
                session_id, data['message'], data['user_id']
            ):
                if chunk == "[SYSTEM:ESCALATE]":
                    await websocket.send_json({
                        "type": "escalation",
                        "message": "Connecting you to a human agent, please wait..."
                    })
                else:
                    await websocket.send_json({"type": "chunk", "content": chunk})
            
            await websocket.send_json({"type": "message_end"})
    
    except WebSocketDisconnect:
        del active_connections[session_id]

Level 4: Production Traps and Decisions (Expert Perspective)

Trap 1: Excessive Apologies Make Users Angrier

Symptom: AI says "I'm very sorry for the inconvenience" to every message. Users respond: "You just keep apologizing — can you actually solve my problem?"

Root cause: System Prompt over-emphasizes "friendly" and "apologetic" behavior, so AI treats apologies as a universal formula.

Fix:

## Apology guidelines:
- Use apology phrases ("sorry," "I apologize," "I regret") at most 2 times per conversation
- After apologizing, immediately provide a solution or next action step
- Do not include apologies in greetings (never say "Hello, I'm sorry to hear...")

Trap 2: AI Fabricates Information When Tools Fail

Symptom: The order query API times out, so the AI invents a logistics update and tells the user.

Root cause: No strict fallback behavior defined for tool failures.

Fix:

## Tool failure handling:
When a query tool returns an error or times out:
1. Clearly inform the user: "The system query timed out; I'm unable to retrieve this information right now"
2. Provide a manual lookup link or phone number
3. NEVER guess or fabricate query results
4. After 2 failed retries, proactively initiate human transfer

Trap 3: Slot Pollution Across Intents

Symptom: User asks about a refund for Order A, then chats about something else, then asks about a refund again — AI still uses Order A's data.

Root cause: Dify Variables persist across the entire session with no reset mechanism.

Fix: Explicitly clear relevant slots when intent switches:

def handle_intent_switch(new_intent: str, current_intent: str, variables: dict) -> dict:
    if new_intent != current_intent:
        if new_intent == 'refund':
            variables['refund_order_id'] = ''
            variables['refund_reason'] = ''
            variables['refund_method'] = ''
            variables['refund_step'] = 'initial'
        variables['current_intent'] = new_intent
    return variables

Trap 4: AI Resumes Responding After Human Takeover

Symptom: After transfer to a human agent, the user keeps typing, and the AI automatically responds again — simultaneously with the human agent. Users are confused by two conflicting answers.

Fix: Lock the session after human takeover:

async def check_human_takeover(session_id: str) -> bool:
    status = await redis_client.get(f"session:{session_id}:status")
    return status == b"human_agent"

# In the WebSocket handler, check before every AI call
if await check_human_takeover(session_id):
    await forward_to_human_agent(session_id, data['message'])
    continue  # Do NOT invoke AI

Key Metrics and Success Benchmarks

Metric Target Industry Average This Case (Month 3)
AI autonomous resolution rate ≥ 70% 45–55% 68%
Wait time after human transfer < 2 min 5–8 min 1.5 min
CSAT score ≥ 4.2/5 3.8/5 4.1/5
Average conversation turns < 6 8–10 5.3
Intent recognition accuracy ≥ 90% 75–80% 93%
Peak concurrent sessions 5,000 4,800 (measured during 618 sale)

Chapter Summary

Core design principles:

  1. Layered intent recognition: Rules → small model → large model. 70% of intents should be resolved by the rule layer for speed and cost efficiency.

  2. Emotion-driven escalation: Detected anger should immediately trigger human transfer — do not let AI attempt to "calm down" a furious user.

  3. Seamless handoff is the experience differentiator: Human agents must receive a complete, structured context summary so they can "pick up where the AI left off."

  4. Session locking after human takeover: AI must stop responding the moment a human agent takes over. Double responses destroy user trust.

  5. Tool failure must have a graceful fallback: API timeouts cannot result in AI guessing or fabricating. Always inform the user clearly and provide alternatives.

  6. Monitor intent accuracy weekly: Sample-check intent recognition results regularly and adjust rules and prompts when drift is detected.

Rate this chapter
4.5  / 5  (7 ratings)

💬 Comments