Chapter 21

Case Study: Enterprise Knowledge Assistant — Full Delivery Lifecycle

Chapter 21: Project — Enterprise Internal Knowledge Assistant End-to-End

Using a real 800-person manufacturing company case, this chapter walks the complete journey from requirements research through architecture design, knowledge base construction, RAG tuning, integration, and launch.

Chapter Overview

The "enterprise internal knowledge assistant" is one of Dify's most common deployment scenarios. But most teams make the same mistake: dump all documents into a knowledge base, tweak a few parameters, go live, then discover users are unsatisfied with the answers.

A genuinely successful enterprise knowledge assistant requires six stages: requirements deep-dive → document governance → tiered knowledge base design → RAG tuning → integration development → user acceptance → continuous iteration.

Case background: A precision manufacturing company (800 employees) with the following knowledge assets:

Target: Within 3 months of launch, increase the rate of employees self-resolving issues via AI from 20% to 65%, and reduce repetitive HR and IT support tickets by 40%.


Level 1: Core Concepts (1–3 Years Experience)

Requirements Research: Finding the Real Pain Points

Before writing any code, spend 2 weeks on requirements research.

Method 1: Shadow work

Follow HR staff and IT engineers for 2 days, recording:

Actual research findings from this company:

Query Type Daily Volume Avg. Handling Time AI-Solvable Rate
Leave application process 23 5 min 95%
Expense reimbursement rules 18 8 min 90%
Social insurance questions 12 15 min 70%
System operation questions 31 12 min 80%
Product specification queries 15 20 min 85%
Supplier certification requirements 8 30 min 60%

Document Governance: Garbage In, Garbage Out

Knowledge base quality directly determines AI answer quality.

Problem 1: Version chaos

Companies often have multiple versions of the same document (2019 version, 2021 version, latest). Mixed together, the AI gives outdated information.

Solution: Establish document version control:

Naming convention: [doc_type]_[version]_[effective_date].pdf
Example: employee_handbook_v3.2_20240101.pdf

Upload rules:
- When a new version is uploaded, retire older versions of the same type
- Add metadata to each Dify document: version, effective_date, department

Problem 2: Scanned PDFs with no extractable text

Manufacturing companies often have many scanned PDFs. Dify's default PDF parser cannot process them.

Solution: Pre-processing pipeline:

import pytesseract
from pdf2image import convert_from_path
from pathlib import Path

def ocr_pdf(input_path: str, output_path: str, lang: str = 'eng+chi_sim') -> str:
    images = convert_from_path(input_path, dpi=300)
    text_pages = []
    for i, image in enumerate(images):
        text = pytesseract.image_to_string(image, lang=lang)
        text_pages.append(f"=== Page {i+1} ===\n{text}")
    full_text = '\n\n'.join(text_pages)
    with open(output_path, 'w', encoding='utf-8') as f:
        f.write(full_text)
    return full_text

def batch_process(input_dir: str, output_dir: str):
    Path(output_dir).mkdir(exist_ok=True)
    for pdf in Path(input_dir).glob('**/*.pdf'):
        out_file = Path(output_dir) / pdf.with_suffix('.txt').name
        if not out_file.exists():
            print(f"Processing: {pdf.name}")
            ocr_pdf(str(pdf), str(out_file))

Creating the Knowledge Base in Dify

Step 1: Layered knowledge base structure

Do not put all documents in one knowledge base. Separate by domain:

Knowledge bases:
├── kb-hr          — HR policies and procedures
├── kb-it          — IT system operation
├── kb-product     — Product technical documentation
└── kb-quality     — Quality and compliance

Step 2: Chunking configuration

In Dify knowledge base settings:

Segmentation rule: Automatic
Max segment length: 500 tokens
Segment overlap: 50 tokens
Embedding model: text-embedding-ada-002

Step 3: Retrieval configuration

Retrieval mode: Hybrid (vector + keyword)
Vector weight: 0.7
Keyword weight: 0.3
Top K: 5
Similarity threshold: 0.5
Re-ranking: Enabled (BGE Reranker)

Level 2: Mechanism Deep Dive (3–5 Years Experience)

Architecture: Multi-App with Routing Layer

The company adopted a "single unified entry + multiple specialized assistants" architecture:

User Entry (Enterprise WeChat Bot / Internal Portal)
          |
          v
    Routing Layer (Dify Workflow)
    +-----------------------------+
    | Intent classification:      |
    | - HR queries → HR bot       |
    | - IT queries → IT bot       |
    | - Product queries → Product |
    | - Quality queries → Quality |
    | - Other → General bot       |
    +-----------------------------+
          |
    +-----+-----+------------+----------+
    v           v            v          v
 HR Bot      IT Bot    Product Bot  Quality Bot
  (RAG)      (RAG)       (RAG)        (RAG)
    |           |            |          |
 kb-hr       kb-it      kb-product  kb-quality

RAG Parameter Tuning

Parameter Default Recommended Reason
Chunk Size 500 tokens 400 tokens Dense information in policy docs; smaller chunks are more precise
Chunk Overlap 50 tokens 80 tokens Policy docs have cross-paragraph dependencies
Top K 3 5 HR questions may involve multiple regulations
Similarity threshold 0.5 0.6 Higher precision, less noise
Reranker Off On Significantly improves ranking quality

Hybrid Search Internal Mechanism

Dify's hybrid search combines BM25 (keyword matching) with vector search (semantic similarity) using Reciprocal Rank Fusion (RRF):

def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            if doc_id not in scores:
                scores[doc_id] = 0
            scores[doc_id] += 1 / (k + rank + 1)
    return sorted(scores.keys(), key=lambda x: scores[x], reverse=True)

System Prompt Engineering

HR assistant System Prompt after 6 iterations:

You are the HR AI assistant for [Company Name], responsible for answering 
employee questions about company policies, benefits, and procedures.

## Your principles:

1. **Answer only from the knowledge base**: If no relevant information exists,
   say clearly "I cannot find information about this in the knowledge base. 
   Please contact HR directly."

2. **Cite document sources**: Note which document the information comes from,
   e.g., "According to the Employee Handbook, Chapter 3..."

3. **Give complete workflows**: For process questions (leave, reimbursement),
   list every step completely.

4. **Distinguish rule absoluteness**: Clearly differentiate "must" (regulatory
   requirement) from "recommended" (company encouragement).

5. **Transfer sensitive issues to humans**: For personal salary data, 
   performance disputes, or labor arbitration, guide employees to HR staff.

6. **Professional and friendly tone**: Use polite, professional language.

## What you must NOT do:
- Guess or infer policies (only cite existing documents)
- Promise exceptions or special handling
- Comment on the reasonableness of company policies
- Reveal other employees' personal information

Enterprise WeChat Integration

import requests
from flask import Flask, request

app = Flask(__name__)

DIFY_API_URL = "https://dify.yourcompany.com/v1"
DIFY_APP_TOKEN = "your-dify-app-token"
conversation_sessions = {}  # Use Redis in production

@app.route('/wechat/callback', methods=['POST'])
def wechat_callback():
    data = parse_wechat_message(request.data)
    if data['MsgType'] != 'text':
        return reply_text(data['FromUserName'], "Text messages only, please.")
    
    user_id = data['FromUserName']
    user_query = data['Content']
    conversation_id = conversation_sessions.get(user_id)
    
    response = requests.post(
        f'{DIFY_API_URL}/chat-messages',
        headers={'Authorization': f'Bearer {DIFY_APP_TOKEN}'},
        json={
            'inputs': {},
            'query': user_query,
            'response_mode': 'blocking',
            'conversation_id': conversation_id or '',
            'user': 'enterprise_wechat'
        },
        timeout=30
    )
    
    result = response.json()
    conversation_sessions[user_id] = result.get('conversation_id')
    answer = result.get('answer', 'Sorry, I cannot answer this at the moment.')
    
    return reply_text(user_id, answer)

Level 3: Source Code and Architecture (5+ Years)

RAG Quality Evaluation Framework

from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_recall, context_precision
from datasets import Dataset

def evaluate_rag_quality(test_cases: list) -> dict:
    """
    Evaluate RAG system quality using RAGAS framework.
    
    test_cases format:
    [
        {
            "question": "How many days of annual leave?",
            "contexts": ["According to the handbook, employees with 1-10 years..."],
            "answer": "Employees with 1-10 years receive 5 days of annual leave...",
            "ground_truth": "1-10 years: 5 days; 10+ years: 10 days"
        }
    ]
    """
    dataset = Dataset.from_list(test_cases)
    return evaluate(
        dataset,
        metrics=[faithfulness, answer_relevancy, context_recall, context_precision]
    )

def weekly_quality_check(test_set: list):
    for case in test_set:
        response = call_dify_with_context(case['question'])
        case['contexts'] = response['retrieval_documents']
        case['answer'] = response['answer']
    
    metrics = evaluate_rag_quality(test_set)
    
    if metrics['faithfulness'] < 0.75:
        send_alert(f"RAG faithfulness dropped to {metrics['faithfulness']:.2f}")
    
    return metrics

Automated Document Sync

import hashlib, json, os
from pathlib import Path
import requests

class DifyKnowledgeBaseSync:
    def __init__(self, dataset_id: str, api_key: str, api_url: str):
        self.dataset_id = dataset_id
        self.api_key = api_key
        self.api_url = api_url
        self.state_file = f".sync_state_{dataset_id}.json"
        self.state = self._load_state()
    
    def _load_state(self) -> dict:
        if os.path.exists(self.state_file):
            with open(self.state_file) as f:
                return json.load(f)
        return {}
    
    def _save_state(self):
        with open(self.state_file, 'w') as f:
            json.dump(self.state, f)
    
    def _file_hash(self, file_path: str) -> str:
        with open(file_path, 'rb') as f:
            return hashlib.md5(f.read()).hexdigest()
    
    def sync_directory(self, docs_dir: str):
        docs_path = Path(docs_dir)
        for doc_file in docs_path.glob('**/*'):
            if not doc_file.is_file():
                continue
            if doc_file.suffix not in ['.pdf', '.docx', '.txt', '.md']:
                continue
            
            file_path = str(doc_file)
            current_hash = self._file_hash(file_path)
            
            if file_path not in self.state:
                print(f"New document: {doc_file.name}")
                doc_id = self._upload_document(file_path)
                self.state[file_path] = {'hash': current_hash, 'dify_doc_id': doc_id}
            elif self.state[file_path]['hash'] != current_hash:
                print(f"Updated document: {doc_file.name}")
                self._delete_document(self.state[file_path]['dify_doc_id'])
                doc_id = self._upload_document(file_path)
                self.state[file_path] = {'hash': current_hash, 'dify_doc_id': doc_id}
        
        self._save_state()

Level 4: Production Traps and Decisions (Expert Perspective)

Pre-Launch Acceptance Checklist

Before opening to all employees:

Functional tests (minimum 50 test cases):
✅ Accuracy rate on standard questions ≥ 90%
✅ Refuses to answer questions outside knowledge base (no hallucination)
✅ Retains multi-turn conversation context correctly
✅ Proactively requests clarification on ambiguous questions
✅ Correctly escalates sensitive questions to humans
✅ Handles special characters and emoji without crashing
✅ Correctly handles overly long inputs (> 2,000 characters)

Performance test targets (800 employees, 50 concurrent users):
- P50 response time < 3s
- P95 response time < 8s
- Error rate < 1%
- Throughput ≥ 20 requests/second

3-Month Post-Launch Retrospective

Month 1: Users didn't trust AI answers

Symptom: Users received AI answers but still confirmed with HR.

Root cause: No source citations; users couldn't assess reliability.

Fix: Modified System Prompt to always append sources:

---
Source: Employee Handbook, Chapter X (v3.2, effective January 2024)
Questions? Contact HR: [email protected] | Ext. 1234

Result: User satisfaction improved from 62% to 81%.

Month 2: Technical document retrieval inaccurate

Symptom: Product spec tables (material parameters, dimensional tolerances) couldn't be found accurately.

Root cause: PDF table structure was lost when converted to plain text.

Fix: Use camelot for table extraction:

import camelot

def extract_tables_from_pdf(pdf_path: str) -> str:
    tables = camelot.read_pdf(pdf_path, pages='all', flavor='stream')
    markdown_tables = []
    for i, table in enumerate(tables):
        md_table = table.df.to_markdown(index=False)
        markdown_tables.append(f"## Table {i+1}\n{md_table}")
    return '\n\n'.join(markdown_tables)

Month 3: Answer quality degraded in long conversations

Symptom: After 5+ conversation turns, the AI started mixing up context from different turns.

Root cause: Growing conversation history filled the context window, causing retrieved documents to be compressed or dropped.

Fix: Limit history tokens and auto-summarize older turns:

MAX_HISTORY_TOKENS = 2000

def manage_conversation_history(history: list, max_tokens: int) -> list:
    total_tokens = sum(estimate_tokens(msg) for msg in history)
    if total_tokens <= max_tokens:
        return history
    recent = history[-6:]  # Keep last 3 turns
    older = history[:-6]
    if older:
        summary = summarize_history(older)
        return [{'role': 'system', 'content': f'[Earlier summary]\n{summary}'}] + recent
    return recent

Knowledge Base Scaling Pitfalls

When a single knowledge base exceeds 10,000 chunks, a counterintuitive phenomenon appears: more documents actually reduces retrieval precision. The vector space becomes so dense that similarity scores compress into a narrow band (0.6–0.7), making it impossible to distinguish relevant from irrelevant results.

Solutions:

  1. Domain sharding: Keep each knowledge base under 5,000 chunks
  2. Raise similarity threshold from 0.5 to 0.65
  3. Add metadata filtering: filter by department/doc_type before vector search
  4. Regular cleanup: remove outdated and duplicate chunks

Chapter Summary

Project timeline reference:

Phase Duration Key Deliverables
Requirements research Weeks 1–2 Problem inventory, priority ranking
Document governance Weeks 3–4 Clean document library, version control rules
Knowledge base construction Weeks 5–6 4 knowledge bases, documents indexed
Application development Weeks 7–9 Dify app configuration, integration code
Testing and tuning Weeks 10–11 Acceptance tests passed, parameters optimized
Pilot rollout Week 12 50-user pilot, feedback collected
Full launch Weeks 13–14 All-staff access, training delivered
Continuous iteration Ongoing Weekly data analysis, monthly knowledge base refresh

Critical success factors:

  1. Document quality first: 70% of RAG quality issues stem from the documents themselves (outdated, poor format, chaotic structure).
  2. Domain-separated knowledge bases: Never try to solve everything with one monolithic knowledge base.
  3. Continuous evaluation: Build a Golden Set of test cases and run automated quality checks weekly.
  4. User feedback loop: Always provide a "Was this answer helpful?" mechanism and act on the data.
  5. Human fallback: Always preserve an easy path to reach a real person — AI should never be the final wall users hit.
Rate this chapter
4.7  / 5  (8 ratings)

💬 Comments