Chapter 2

Core Concepts: App Types, Workflows, Knowledge Base and Agent Relationships

Chapter 2: Core Concepts Overview — Applications, Workflows, Knowledge Bases, and Agent Relationships

Before touching any configuration, build a mental map of Dify's concepts — understanding the boundaries and relationships of these four core modules is the prerequisite for using Dify effectively.

Chapter Overview

Many people approach Dify by clicking around the interface, exploring whatever they see, and searching the documentation when stuck. This approach solves immediate problems but often leads to confusion about "why is this feature here?" and uncertainty about which module to use in complex scenarios.

This chapter aims to give you a conceptual map of Dify. We'll systematically examine Dify's four core modules: Application Types, Workflow, Knowledge Base, and Agent, along with their relationships and appropriate use boundaries.

By the end of this chapter, you will be able to:

Clearly distinguish between Dify's five application types and their appropriate use cases
Understand the fundamental difference between Workflows and Chat Assistants
Know what role the Knowledge Base plays in the overall system
Understand Agent reasoning mechanisms and usage limitations
Quickly determine which module to use when facing new requirements

Level 1: Foundational Understanding (1-3 Years Experience)

Dify's Four Core Modules

All of Dify's functionality can be organized into four core modules that are both independent and interdependent:

┌─────────────────────────────────────────────────────────┐
│                   Application Layer                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────┐  │
│  │  Chat    │  │  Text    │  │Workflow  │  │ Agent  │  │
│  │Assistant │  │Generator │  │          │  │        │  │
│  └──────────┘  └──────────┘  └──────────┘  └────────┘  │
└─────────────────────────────────────────────────────────┘
              ↕ calls                    ↕ calls
┌─────────────────────────┐  ┌─────────────────────────────┐
│   Knowledge Base (RAG)  │  │          Models              │
│  Doc Storage + Vectors  │  │  LLM + Embedding + Rerank   │
└─────────────────────────┘  └─────────────────────────────┘

This structure reveals a key fact: Knowledge bases and models are infrastructure; applications are the upper layer that uses this infrastructure. Whether it's a Chat Assistant, Workflow, or Agent, all can call the same knowledge base and use different models.

Five Application Types Explained

1. Chat Assistant

The most commonly used type. Users engage in multi-turn conversations with AI that remembers conversation history.

Core characteristics:

Has conversation history (Memory)
Question-and-answer format, waiting for user input each time
Supports linking to knowledge bases (RAG)
Supports tool calls (Tools)

Typical use cases: Customer service bots, personal assistants, professional consulting (legal, medical domain Q&A)

A practical configuration example:

Application Type: Chat Assistant
System Prompt: You are a professional legal consulting assistant, focused on Chinese labor law...
Linked Knowledge Base: Labor Law Database (contains Labor Contract Law, Labor Dispute Mediation and Arbitration Law, etc.)
Model: GPT-4o
Conversation History Rounds: 10 (retain most recent 10 rounds)
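
The "retain most recent 10 rounds" setting can be pictured as a sliding window over the message list. The sketch below is only an illustration of that idea, not Dify's implementation; `trim_history` and the message dictionaries are hypothetical names, and one round is assumed to be one user message plus one assistant reply.

```python
def trim_history(messages: list[dict], max_rounds: int = 10) -> list[dict]:
    """Keep only the most recent `max_rounds` user/assistant exchanges."""
    if max_rounds <= 0:
        return []
    # Each round is assumed to contribute exactly two messages
    return messages[-2 * max_rounds:]


history = []
for i in range(12):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

window = trim_history(history, max_rounds=10)
print(len(window))           # 20 messages = 10 rounds
print(window[0]["content"])  # question 2
```

Anything older than the window is simply not sent to the model, which is why a Chat Assistant can "forget" details from early in a long conversation.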

2. Text Generator

Single input, single output. No conversation history — each request is independent.

Core characteristics:

No conversation history
Usually has fixed input forms (e.g., "article topic," "word count" variables)
Suited for batch, standardized content generation

Typical use cases: Bulk SEO article generation, automated product description writing, code comment generation, email template generation

Key difference from Chat Assistant:

Chat Assistant: User drives conversation direction, back-and-forth interaction
Text Generator: User fills fixed form, AI outputs according to template

3. Workflow

Multiple processing nodes executed in sequence; each node can be an LLM call, code execution, HTTP request, conditional logic, etc.

Core characteristics:

Fixed processing pipeline (a DAG, directed acyclic graph)
Each node has a different type with a specific role
Supports conditional branches and per-item loops (Iteration node)
Clear inputs and outputs

Typical use cases:

Content moderation pipeline: Receive article → Check violations → Auto-fix → Human review → Publish
Customer service tiering: Receive question → Classify (product/technical/complaint) → Route to appropriate knowledge base → Generate answer
Data analysis report: Receive data → Clean data → Call LLM for analysis → Formatted output

Workflow vs. Chat Assistant selection principle:

Fixed process, clearly defined steps → Workflow
Free interaction, user-driven → Chat Assistant

4. Chatflow

A hybrid of Workflow and Chat Assistant. Has a fixed processing pipeline but also supports multi-turn conversation.

Core characteristics:

Has workflow's node orchestration capability
Also has Chat Assistant's multi-turn conversation memory
Suited for "conversation with process constraints" scenarios

Typical use cases: Medical consultation assistants (need to collect symptom information following a fixed process, while supporting conversational interaction)

5. Agent (Intelligent Agent)

Provides AI with a toolset and lets AI autonomously decide which tools to call and in what order to complete the user's task.

Core characteristics:

Autonomous decision-making (AI decides what to do, not you)
Tool calls (search, calculator, database queries, etc.)
ReAct loop (Think → Act → Observe → Think...)
Unpredictability (execution path may differ each time)

Typical use cases: Data analysis agents (letting AI decide which database to query, what calculations to perform), travel planning agents

What Is the Knowledge Base?

The Knowledge Base is Dify's module for storing and retrieving documents. Essentially, it does the following:

Document Processing: Splits uploaded documents (PDF, Word, Markdown, etc.) into small chunks
Vectorization: Calls an Embedding model to convert each text chunk into a vector
Storage: Stores vectors and original text in a vector database
Retrieval: When an application needs it, vectorizes the user's question and finds the most similar text chunks
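
The four steps above can be sketched end to end in a few lines. This is a toy model under loud assumptions: the "embedding" is a bag-of-words count vector rather than a real Embedding model, the "vector database" is a Python list, and `ToyKnowledgeBase`, `chunk_text`, and `embed` are hypothetical names.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 40) -> list[str]:
    """Split text into chunks of at most roughly `size` characters on word boundaries."""
    chunks, current = [], ""
    for word in text.split():
        if current and len(current) + 1 + len(word) > size:
            chunks.append(current)
            current = word
        else:
            current = f"{current} {word}".strip()
    if current:
        chunks.append(current)
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for an Embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyKnowledgeBase:
    def __init__(self) -> None:
        self.rows: list[tuple[str, Counter]] = []  # (original text, vector)

    def add_document(self, text: str) -> None:
        for chunk in chunk_text(text):               # 1. document processing
            self.rows.append((chunk, embed(chunk)))  # 2. vectorization + 3. storage

    def retrieve(self, question: str, top_k: int = 1) -> list[str]:
        q = embed(question)                          # 4. retrieval
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]


kb = ToyKnowledgeBase()
kb.add_document(
    "Probation may not exceed six months. "
    "Overtime pay is owed for work beyond standard hours."
)
print(kb.retrieve("How long can probation last?"))
# ['Probation may not exceed six months.']
```

A real Embedding model would also match paraphrases that share no words with the chunk; the word-count vector here only matches on literal overlap.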

The Knowledge Base itself is not an application — it's a resource called by applications. A single knowledge base can be shared by multiple applications.

Understanding All Four Components Through One Scenario

Suppose you're building an AI system for a law firm:

Requirements Analysis:
├── Lawyers' daily consultation assistant → Chat Assistant (multi-turn dialogue, linked to legal knowledge base)
├── Contract auto-review                 → Workflow (receive contract → clause extraction → risk analysis → generate report)
├── Case summary generation              → Text Generator (input case description, output structured summary)
├── Legal research assistant             → Agent (autonomous search of legal databases, case databases, integrated analysis)
└── Legal Knowledge Base                 → Knowledge Base (stores all regulations, contract templates, case documents)

The same "Legal Knowledge Base" is used simultaneously by the Chat Assistant, Workflow, and Agent. This is the value of modularity.

Level 2: Mechanism Deep Dive (3-5 Years Experience)

Technical Differences Behind Application Types

From a technical implementation perspective, the core differences among the five application types lie in state management approach and execution control flow:

Application Type   State Management                        Execution Control          Termination Condition
Chat Assistant     DB-stored conversation history          Single LLM call            Ends after one response
Text Generator     Stateless                               Single LLM call            Ends after one response
Workflow           Variable passing between nodes          DAG sequential execution   Reaches the End node
Chatflow           Conversation history + node variables   DAG + conversation state   Reaches an Answer node
Agent              Tool call result caching                ReAct loop                 Max iterations reached or task complete
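
"DAG sequential execution" means a node runs only after every node that feeds it has finished. A minimal sketch of deriving such an order with Python's standard `graphlib` module follows; the node names and the `execution_order` helper are illustrative assumptions, not Dify's scheduler.

```python
from graphlib import TopologicalSorter


def execution_order(nodes: list[str], edges: list[tuple[str, str]]) -> list[str]:
    """Return one valid execution order for a workflow graph.

    `edges` holds (source, target) pairs. graphlib raises CycleError
    if the graph contains a cycle, i.e. if it is not a DAG.
    """
    sorter = TopologicalSorter({node: set() for node in nodes})
    for source, target in edges:
        sorter.add(target, source)  # target depends on source
    return list(sorter.static_order())


order = execution_order(
    ["start", "retrieve", "llm", "end"],
    [("start", "retrieve"), ("retrieve", "llm"), ("llm", "end")],
)
print(order)  # ['start', 'retrieve', 'llm', 'end']
```

An Agent has no such precomputed order: its next step is chosen at run time, which is the source of the unpredictability discussed later in this chapter.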

Workflow Node Types in Detail

Workflow supports these node types in Dify v0.10+:

Node Types
├── Basic Nodes
│   ├── Start: Workflow entry point, defines input variables
│   ├── End: Workflow exit point, defines output content
│   └── Answer: In Chatflow, replies directly to user
├── LLM Nodes
│   ├── LLM: Calls large language model
│   ├── Knowledge Retrieval: Retrieves from knowledge base
│   └── Parameter Extractor: Extracts structured data from text
├── Data Processing Nodes
│   ├── Code: Executes Python/JavaScript code
│   ├── Template Transform: Jinja2 template rendering
│   └── Variable Aggregator: Merges multiple variables
├── Flow Control Nodes
│   ├── IF/ELSE: Selects execution path based on condition
│   └── Iteration: Performs same operation on each list element
└── External Data Nodes
    ├── HTTP Request: Calls external APIs
    └── Tool: Calls Dify built-in or custom tools

Important detail: Code node sandbox mechanism

Code nodes execute Python/JS code but have strict restrictions:

Cannot access the file system
Cannot import third-party libraries (standard library only)
Execution timeout: 5 seconds (configurable, max 60 seconds)
Memory limit: 256MB

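# Code node example: Extract amounts from text
def main(text: str) -> dict:
    import re
    # Match amount formats (e.g., $1,234.56 or 1234.56 USD).
    # The number is either comma-grouped (1,234) or a plain run of digits (1234);
    # without the second alternative, "1234.56" would be split into "123" and "4.56".
    pattern = r'\$?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d{2})?(?:\s*USD)?'
    amounts = re.findall(pattern, text)
    return {
        "amounts": amounts,
        "count": len(amounts)
    }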

Knowledge Base Retrieval Configuration in Detail

Knowledge base retrieval has three modes — understanding their differences is critical:

Vector Search

Converts the query into a vector and finds the most similar document chunks via cosine similarity.

Advantages: Strong semantic understanding; "Apple phone" and "iPhone" can match
Disadvantages: Weak at exact keyword matching; poor for specific numbers and proper nouns
Best for: Semantically fuzzy questions, like "When does my contract expire?"

Full-text Search

BM25 algorithm-based keyword retrieval.

Advantages: Exact keyword matching; great for numbers, codes, proper nouns
Disadvantages: No semantic understanding; "phone" won't match "iPhone"
Best for: Precise queries, like "clauses in contract SH-2024-001"

Hybrid Search

Runs both vector search and full-text search simultaneously, then merges the two result lists with a fusion method: either a weighted combination of scores or a rank-based method such as RRF (Reciprocal Rank Fusion).

Advantages: Balances semantic and exact matching
Disadvantages: Running two retrievals simultaneously has higher performance overhead
Best for: Most real-world scenarios (Dify's recommended default)
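
Reciprocal Rank Fusion, named above as one possible merging method, scores each chunk by summing 1 / (k + rank) over every result list it appears in. The sketch below shows the standard formula with the commonly used constant k = 60; whether Dify applies exactly this formula is not confirmed here, and the chunk IDs are made up.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of chunk IDs into one list, best first."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda chunk_id: scores[chunk_id], reverse=True)


vector_hits = ["c1", "c2", "c3"]   # ranked by semantic similarity
keyword_hits = ["c3", "c1", "c4"]  # ranked by keyword match
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# ['c1', 'c3', 'c2', 'c4']
```

Chunks that appear in both lists (c1, c3) rise above chunks found by only one retriever, which is the behavior hybrid search is after. RRF uses only ranks, so the two retrievers' raw scores never need to be on the same scale.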

Configuration recommendation:

# Typical hybrid retrieval configuration
retrieval_mode: hybrid
vector_weight: 0.7      # Vector retrieval weight 70%
keyword_weight: 0.3     # Keyword retrieval weight 30%
top_k: 5               # Recall top 5 chunks
score_threshold: 0.4   # Similarity threshold (discard chunks below this)
reranking_enable: true  # Enable reranking (recommended)
reranking_model: bge-reranker-v2-m3  # Reranking model
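
To make the interplay of the weights, `top_k`, and `score_threshold` concrete, here is a small sketch of weighted score fusion. It assumes both retrievers already return scores normalized to the range 0 to 1, which real engines do not guarantee; `weighted_hybrid` is a hypothetical helper, not a Dify function.

```python
def weighted_hybrid(
    vector_scores: dict[str, float],
    keyword_scores: dict[str, float],
    vector_weight: float = 0.7,
    keyword_weight: float = 0.3,
    top_k: int = 5,
    score_threshold: float = 0.4,
) -> list[tuple[str, float]]:
    """Combine per-chunk scores, drop weak chunks, and keep the best top_k."""
    chunk_ids = set(vector_scores) | set(keyword_scores)
    combined = {
        cid: vector_weight * vector_scores.get(cid, 0.0)
        + keyword_weight * keyword_scores.get(cid, 0.0)
        for cid in chunk_ids
    }
    kept = [(cid, score) for cid, score in combined.items() if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


results = weighted_hybrid(
    vector_scores={"a": 0.9, "b": 0.5, "c": 0.2},
    keyword_scores={"a": 0.2, "c": 0.9, "d": 1.0},
)
print([cid for cid, _ in results])  # ['a', 'c']
```

Chunk "d" is a perfect keyword match but scores only 0.3 after weighting, so the 0.4 threshold discards it. If exact identifiers matter in your data, that is a sign to raise the keyword weight or lower the threshold.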

Agent's ReAct Reasoning Mechanism

The ReAct (Reasoning + Acting) pattern used by Agent is the key to understanding Agent behavior:

User question: "Check the latest GPT-4 pricing and calculate the cost of 10,000 calls"

Round 1 of Reasoning:
  Thought: I need to check the latest GPT-4 pricing first
  Action: Call web_search tool, search "GPT-4 API pricing 2024"
  Observation: Results show GPT-4o input $5/1M tokens, output $15/1M tokens

Round 2 of Reasoning:
  Thought: I have the pricing data, now I need to calculate costs
  Action: Call calculator tool, calculate cost for 10,000 calls
  Observation: Assuming average 500 input + 200 output tokens per call...

Round 3 of Reasoning:
  Thought: Calculation complete, can provide final answer
  Final Answer: Based on current pricing, 10,000 calls would cost approximately...
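
The arithmetic behind the calculator step can be checked by hand. The sketch below uses only the numbers quoted in the trace above (500 input and 200 output tokens per call, $5 and $15 per 1M tokens); those prices are part of the example, not a statement of current pricing.

```python
def call_cost_usd(
    calls: int,
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Total cost in USD when prices are quoted per one million tokens."""
    input_cost = calls * input_tokens / 1_000_000 * input_price_per_m
    output_cost = calls * output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


total = call_cost_usd(10_000, 500, 200, 5.0, 15.0)
print(f"${total:.2f}")  # $55.00
```

That is 5M input tokens ($25) plus 2M output tokens ($30).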

Agent's maximum iteration count defaults to 5. If the task cannot be completed within 5 reasoning rounds, the Agent stops and answers with whatever information it has gathered so far.
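
The iteration cap can be pictured as a bounded loop around the Think, Act, Observe cycle. The following is a minimal sketch with the model replaced by a plain `decide` function; `run_agent`, `decide`, and the tool names are hypothetical and do not mirror Dify's internal classes.

```python
from typing import Callable

Decision = tuple[str, str]  # ("final", answer) or (tool_name, tool_input)


def run_agent(
    decide: Callable[[list[str]], Decision],
    tools: dict[str, Callable[[str], str]],
    max_iterations: int = 5,
) -> str:
    observations: list[str] = []
    for _ in range(max_iterations):
        action, payload = decide(observations)          # Think
        if action == "final":
            return payload
        observations.append(tools[action](payload))     # Act + Observe
    # Cap reached: return whatever was gathered instead of looping forever
    return "Stopped after max iterations; gathered: " + "; ".join(observations)


def scripted_decide(observations: list[str]) -> Decision:
    if len(observations) == 0:
        return ("web_search", "GPT-4o API pricing")
    if len(observations) == 1:
        return ("calculator", "cost of 10,000 calls")
    return ("final", "About $55 under the stated assumptions")


tools = {
    "web_search": lambda query: f"results for {query}",
    "calculator": lambda expression: f"computed {expression}",
}
print(run_agent(scripted_decide, tools))

never_done = lambda observations: ("web_search", "retry")
print(run_agent(never_done, tools).count("results for retry"))  # 5
```

The second call shows the failure mode: a model that never reaches a final answer still pays for all five rounds.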

Variable System: The Connective Tissue Between Modules

Dify's variable system is the key to understanding data flow between modules:

Variable Scopes
├── System Variables
│   ├── {{sys.user_id}}         — Current user ID
│   ├── {{sys.app_id}}          — Application ID
│   └── {{sys.conversation_id}} — Conversation ID
├── Application Variables
│   └── {{variable_name}} defined in "Prompt"
├── Conversation Variables
│   └── Variables persisting across turns (Chatflow only)
└── Workflow Variables
    └── Node outputs: {{node_id.output.field}}

A common variable usage example (in a workflow):

Node 1 (Parameter Extractor) → Output: {{extract.customer_name}}, {{extract.issue_type}}
       ↓
Node 2 (IF/ELSE) → Condition: {{extract.issue_type}} == "technical issue"
       ↓                              ↓
Node 3a (Technical KB Retrieval)   Node 3b (Support KB Retrieval)
       ↓                              ↓
Node 4 (LLM Answer Generation) ← Receives the retrieval result from whichever branch ran (merge the two branch outputs with a Variable Aggregator first)

Level 3: Source Code and Principles (5+ Years Experience)

Workflow Engine Internal Implementation

Dify's workflow engine lives in api/core/workflow/, with the core class being WorkflowEngineManager.

Graph storage format (JSON structure in PostgreSQL):

{
  "nodes": [
    {
      "id": "node_start",
      "type": "start",
      "data": {
        "variables": [
          {"variable": "user_query", "type": "string", "required": true}
        ]
      },
      "position": {"x": 100, "y": 200}
    },
    {
      "id": "node_llm_1",
      "type": "llm",
      "data": {
        "model": {"provider": "openai", "name": "gpt-4o", "mode": "chat"},
        "prompt_template": [
          {"role": "system", "text": "You are an assistant"},
          {"role": "user", "text": "{{#node_start.user_query#}}"}
        ]
      },
      "position": {"x": 400, "y": 200}
    }
  ],
  "edges": [
    {
      "id": "edge_1",
      "source": "node_start",
      "target": "node_llm_1",
      "sourceHandle": "source",
      "targetHandle": "target"
    }
  ]
}

Note the variable reference format {{#node_id.output_field#}}, which differs from the {{variable}} form used in Prompts. The former is an internal workflow reference that points at one node's output field; the latter is an application-level variable.
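
How such a reference gets resolved at run time can be sketched with a regular expression and a dictionary of node outputs. This is an assumption-laden illustration: the `variable_pool` layout and the `render` function are hypothetical, and only the {{#node_id.field#}} syntax is taken from the JSON above.

```python
import re

# Matches {{#node_id.field#}}; node IDs may contain letters, digits, "_" and "-"
REFERENCE = re.compile(r"\{\{#([\w-]+)\.([\w.]+)#\}\}")


def render(template: str, variable_pool: dict[str, dict[str, object]]) -> str:
    """Replace every workflow variable reference with its value from the pool."""

    def replace(match: re.Match) -> str:
        node_id, field = match.group(1), match.group(2)
        try:
            return str(variable_pool[node_id][field])
        except KeyError:
            raise KeyError(f"Variable not found: {match.group(0)}") from None

    return REFERENCE.sub(replace, template)


pool = {"node_start": {"user_query": "What is the notice period?"}}
print(render("Question: {{#node_start.user_query#}}", pool))
# Question: What is the notice period?
```

A reference to a node that has not produced output yet, or to a misspelled field, fails at the lookup step, which is the shape of the "Variable not found" error discussed in the pitfalls section.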

Event-driven model for workflow execution:

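# Simplified workflow execution event stream (illustrative, not Dify's actual classes)
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass
class NodeRunResult:
    outputs: dict[str, Any] = field(default_factory=dict)


@dataclass
class WorkflowRunState:
    node_run_results: dict[str, NodeRunResult] = field(default_factory=dict)
    total_tokens: int = 0
    # Timezone-aware replacement for the deprecated datetime.utcnow()
    start_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


# Events emitted as each node executes
@dataclass
class Event:
    node_id: str


@dataclass
class NodeRunStarted(Event):
    node_type: str


@dataclass
class NodeRunSucceeded(Event):
    outputs: dict[str, Any]
    elapsed_time: float


@dataclass
class NodeRunFailed(Event):
    error: str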

This event-driven design lets the frontend receive each node's execution status in real-time via SSE (Server-Sent Events), creating the "nodes lighting up one by one" effect you see in the Dify interface.
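
On the wire, an SSE stream is plain text: each event is a few `field: value` lines followed by a blank line. The sketch below formats one node event as such a frame; the event name and payload keys are illustrative assumptions, not Dify's actual event schema.

```python
import json


def to_sse(event_name: str, payload: dict) -> str:
    """Format one event as a Server-Sent Events frame.

    A frame is an `event:` line, a `data:` line, and a terminating blank line.
    """
    return f"event: {event_name}\ndata: {json.dumps(payload)}\n\n"


frame = to_sse("node_started", {"node_id": "node_llm_1", "node_type": "llm"})
print(frame, end="")
# event: node_started
# data: {"node_id": "node_llm_1", "node_type": "llm"}
```

Because each frame is self-delimiting, the browser can update the canvas as soon as one node finishes instead of waiting for the whole workflow.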

Agent Engine Implementation: From ReAct to Function Calling

Dify's Agent engine supports two strategies, automatically selected based on model capabilities:

Strategy 1: Function Calling (for models like GPT-4, Claude 3+ that support tool calls)

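# Using OpenAI Function Calling (requires the openai>=1.0 package)
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
messages = [{"role": "user", "content": "Check the latest GPT-4o API pricing"}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=[
        {
            "type": "function",
            "function": {
                "name": "web_search",
                "description": "Search the web for information",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string", "description": "Search query"}
                    },
                    "required": ["query"]
                }
            }
        }
    ],
    tool_choice="auto"  # Let the model decide whether to call tools
)

# If the model chose a tool, the request appears here instead of plain text
tool_calls = response.choices[0].message.tool_calls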

Strategy 2: ReAct Prompt (for models that don't support Function Calling)

For models without native tool call support, Dify simulates ReAct behavior through carefully designed Prompts:

You are an AI assistant that can use tools. Available tools:
- web_search(query): Search the web
- calculator(expression): Calculate mathematical expressions

When you need to use a tool, reply in this format:
Thought: [your reasoning process]
Action: tool_name
Action Input: {"param_name": "param_value"}

When you have a final answer, reply:
Thought: [summary]
Final Answer: [your response]

Dify's engine parses the LLM's output, extracts tool call instructions, executes the tools, appends results to conversation history, then continues reasoning.
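
The parsing step can be sketched with two regular expressions that follow the prompt format shown above. This is a simplified illustration, not Dify's parser: `parse_react_output` is a hypothetical name, and a production parser would also have to cope with malformed JSON and extra text around the markers.

```python
import json
import re

ACTION = re.compile(
    r"Action:\s*(?P<tool>\S+)\s*\nAction Input:\s*(?P<args>\{.*\})", re.DOTALL
)
FINAL = re.compile(r"Final Answer:\s*(?P<answer>.*)", re.DOTALL)


def parse_react_output(text: str) -> dict:
    """Classify one LLM reply as a final answer, a tool call, or unparsable."""
    final = FINAL.search(text)
    if final:
        return {"type": "final", "answer": final.group("answer").strip()}
    action = ACTION.search(text)
    if action:
        return {
            "type": "tool_call",
            "tool": action.group("tool"),
            "args": json.loads(action.group("args")),  # raises on invalid JSON
        }
    return {"type": "unparsed", "raw": text}


reply = (
    "Thought: I need current pricing first\n"
    "Action: web_search\n"
    'Action Input: {"query": "GPT-4o API pricing"}'
)
print(parse_react_output(reply))
# {'type': 'tool_call', 'tool': 'web_search', 'args': {'query': 'GPT-4o API pricing'}}
```

The "unparsed" branch matters in practice: models without native tool calling drift from the requested format more often, which is one reason Function Calling is preferred when the model supports it.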

Knowledge Base Vectorization Pipeline

Complete process from document upload to searchable state:

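# Simplified document processing pipeline (illustrative, not Dify's actual classes)
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    index: int
    text: str


class DocumentProcessor:
    def __init__(self, embedding_model, vector_db, batch_size: int = 100):
        self.embedding_model = embedding_model  # assumed to expose encode(list[str]) -> list of vectors
        self.vector_db = vector_db              # assumed to expose insert(dict)
        self.batch_size = batch_size

    def process(self, document_id: str, text: str, chunk_size: int, chunk_overlap: int) -> int:
        # 1. Document parsing: PDF/Word/HTML is assumed to be converted to plain text upstream
        # 2. Text cleaning: collapse extra whitespace
        cleaned_text = re.sub(r"\s+", " ", text).strip()

        # 3. Text chunking: fixed-size windows that overlap by chunk_overlap characters
        step = chunk_size - chunk_overlap
        if step <= 0:
            raise ValueError("chunk_overlap must be smaller than chunk_size")
        chunks = [
            Chunk(index=i, text=cleaned_text[start:start + chunk_size])
            for i, start in enumerate(range(0, len(cleaned_text), step))
        ]

        # 4. Vectorization: call the Embedding model in batches to reduce API call count
        embeddings = []
        for start in range(0, len(chunks), self.batch_size):
            batch = chunks[start:start + self.batch_size]
            embeddings.extend(self.embedding_model.encode([c.text for c in batch]))

        # 5. Storage: write each chunk and its vector to the vector database
        for chunk, embedding in zip(chunks, embeddings):
            self.vector_db.insert({
                "text": chunk.text,
                "embedding": embedding,
                "metadata": {
                    "document_id": document_id,
                    "chunk_index": chunk.index,
                    "word_count": len(chunk.text.split()),
                },
            })
        return len(chunks)  # number of chunks indexed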

Key performance numbers:

Embedding processing time for 1,000 text chunks: ~30-60 seconds (text-embedding-3-small)
Vector database write throughput: ~500-1,000 vectors/second (Weaviate)
Retrieval latency: 5-20ms (vector similarity calculation) + 50-200ms (network + database)

Multi-Knowledge-Base Retrieval Merging Strategy

When an application links multiple knowledge bases, Dify's retrieval merging process:

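from typing import Any, Optional


def deduplicate(results: list[Any]) -> list[Any]:
    """Drop repeated chunks; assumes each result carries a unique chunk_id."""
    seen: set[str] = set()
    unique = []
    for result in results:
        if result.chunk_id not in seen:
            seen.add(result.chunk_id)
            unique.append(result)
    return unique


def multi_dataset_retrieval(query: str, datasets: list[Any], config: Any,
                            reranker_model: Optional[Any] = None) -> list[Any]:
    all_results = []

    # Independently retrieve from each knowledge base
    for dataset in datasets:
        results = dataset.search(
            query=query,
            top_k=config.top_k,
            mode=config.retrieval_mode
        )
        all_results.extend(results)

    # Deduplication (same chunk from a document may be returned from multiple KBs)
    unique_results = deduplicate(all_results)

    # If reranking is enabled, use the Reranker model to re-score all results
    if config.reranking_enable and reranker_model is not None:
        reranked = reranker_model.rerank(query=query, documents=unique_results)
        return reranked[:config.top_k]

    # Otherwise sort by similarity score and take top_k
    return sorted(unique_results, key=lambda r: r.score, reverse=True)[:config.top_k]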

Level 4: Production Pitfalls and Decision Making (Expert Perspective)

Pitfall 1: Variable Reference Errors in Workflows

The most common workflow error: Variable not found: {{node_x.output.field}}.

Typical causes:

Wrong node ID: Dify's auto-generated node IDs like llm-12345 are easy to mistype manually
Field name doesn't exist: LLM node output is text, not content or output
Variable unreachable due to conditional branch: Using a variable after IF/ELSE that was only defined in one branch

Debugging approach: Enable "Step Debug" mode. Check the actual inputs and outputs of each node in Dify's debug panel.
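
The first cause, a mistyped node ID, can also be caught before running anything by scanning the exported graph JSON. The sketch below assumes the nodes/edges layout and the {{#node_id.field#}} reference syntax shown in Level 3; `find_unknown_references` is a hypothetical helper, and it checks node IDs only, not field names or branch reachability.

```python
import json
import re

REFERENCE = re.compile(r"\{\{#([\w-]+)\.([\w.]+)#\}\}")


def find_unknown_references(graph: dict) -> list[str]:
    """List references that point at node IDs which do not exist in the graph."""
    known_ids = {node["id"] for node in graph["nodes"]}
    problems = []
    for node in graph["nodes"]:
        # Serialize the node's data so nested prompt templates are searched too
        text = json.dumps(node.get("data", {}))
        for node_id, field in REFERENCE.findall(text):
            if node_id not in known_ids:
                reference = "{{#" + node_id + "." + field + "#}}"
                problems.append(f"{node['id']}: {reference}")
    return problems


graph = {
    "nodes": [
        {"id": "node_start", "data": {}},
        {
            "id": "node_llm_1",
            "data": {"prompt_template": [{"role": "user", "text": "{{#node_strat.user_query#}}"}]},
        },
    ]
}
print(find_unknown_references(graph))
# ['node_llm_1: {{#node_strat.user_query#}}']
```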

Variable naming conventions (to reduce errors):

# Good node naming habits
✓ extract_customer_info  → Output: {{extract_customer_info.output.name}}
✓ search_policy_docs     → Output: {{search_policy_docs.output.results}}
✗ node1                  → Output: {{node1.output.xxx}} (easy to confuse)

Pitfall 2: Agent Infinite Loop Risk

Agent can fall into loops in certain scenarios: tool call fails → retry → fails again → retry again...

Dify's default protection: maximum 5 iterations. But 5 failed iterations have a cost: consuming large amounts of tokens with no useful output.

Production recommendations:

Set timeout for tool calls (configurable for HTTP tools)
Clearly specify in tool descriptions when the tool should NOT be called
Set stricter rate limits for Agent applications

# Recommended Dify Agent configuration
max_iterations: 5    # Maximum iterations (don't set too high)
tools:
  - name: web_search
    description: |
      Search the internet for real-time information.
      Only call this in these situations:
      1. Need real-time data (prices, news, etc.)
      2. User explicitly requests a search
      Do NOT call this tool when the internal knowledge base has an answer.
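
If you wrap tools yourself, for example behind a custom HTTP tool, one further safeguard is to stop retrying a call that keeps failing with identical input. The guard below is a sketch of that idea under stated assumptions: `ToolFailureGuard` is a hypothetical class, not a Dify setting, and the failing tool is simulated.

```python
class ToolFailureGuard:
    """Tracks failures per (tool, input) pair and signals when to give up."""

    def __init__(self, max_failures_per_call: int = 2) -> None:
        self.max_failures = max_failures_per_call
        self.failures: dict[tuple[str, str], int] = {}

    def record_failure(self, tool: str, tool_input: str) -> None:
        key = (tool, tool_input)
        self.failures[key] = self.failures.get(key, 0) + 1

    def should_abort(self, tool: str, tool_input: str) -> bool:
        return self.failures.get((tool, tool_input), 0) >= self.max_failures


def flaky_search(query: str) -> str:
    raise TimeoutError("upstream search timed out")  # simulated outage


guard = ToolFailureGuard(max_failures_per_call=2)
attempts = 0
for _ in range(5):  # the agent's max_iterations
    if guard.should_abort("web_search", "GPT-4o pricing"):
        break
    attempts += 1
    try:
        flaky_search("GPT-4o pricing")
    except TimeoutError:
        guard.record_failure("web_search", "GPT-4o pricing")

print(attempts)  # 2
```

With the guard in place, an outage costs two failed rounds instead of five.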

Pitfall 3: Missing Knowledge Base Version Control

A common enterprise problem: regulations are updated, but the knowledge base still contains the old version. Worse, new and old documents coexist, and AI answers mix information from both versions.

Correct knowledge base update process:

1. Don't directly modify existing documents
   → Instead: Upload new version documents, mark old documents as "archived"

2. Use document metadata to record versions
   → Add metadata on upload: {"version": "2024-Q1", "effective_date": "2024-01-01"}

3. Filter expired documents during retrieval
   → Configure the knowledge base with a metadata filter that excludes archived versions, e.g. version == "2024-Q1"

4. Periodically clean up archived documents
   → Delete old versions from knowledge base once confirmed no longer needed
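
The filtering rule in step 3 can be expressed as a small function over document metadata. In this sketch the `version` and `effective_date` keys come from the example above, while `regulation`, `archived`, and the function name `current_documents` are assumptions made for illustration.

```python
from datetime import date


def current_documents(documents: list[dict], today: date) -> list[dict]:
    """Per regulation, keep the non-archived version with the latest effective date
    that is already in force on `today`."""
    latest: dict[str, dict] = {}
    for doc in documents:
        meta = doc["metadata"]
        if meta.get("archived"):
            continue
        effective = date.fromisoformat(meta["effective_date"])
        if effective > today:
            continue  # not in force yet
        best = latest.get(meta["regulation"])
        if best is None or effective > date.fromisoformat(best["metadata"]["effective_date"]):
            latest[meta["regulation"]] = doc
    return list(latest.values())


docs = [
    {"id": "lcl-2022", "metadata": {"regulation": "labor_contract_law", "version": "2022-Q1",
                                    "effective_date": "2022-01-01", "archived": True}},
    {"id": "lcl-2023", "metadata": {"regulation": "labor_contract_law", "version": "2023-Q1",
                                    "effective_date": "2023-01-01"}},
    {"id": "lcl-2024", "metadata": {"regulation": "labor_contract_law", "version": "2024-Q1",
                                    "effective_date": "2024-01-01"}},
]
print([d["id"] for d in current_documents(docs, today=date(2024, 6, 1))])
# ['lcl-2024']
```

Note that the 2023 version is excluded even though nobody archived it, which protects against the "old and new coexist" failure when the archiving step is forgotten.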

Pitfall 4: Common Module Selection Mistakes

Mistake 1: Using Chat Assistant for what should be a Workflow

A company implemented contract review as a Chat Assistant — users manually paste contract content and AI analyzes it through dialogue. Problems:

Insufficient context capacity (large contracts > 128K tokens)
Cannot analyze different sections in parallel
Inconsistent result format, difficult to process downstream

Correct solution: Workflow (file input → chunk processing → parallel analysis → result aggregation)

Mistake 2: Using Agent for what should be a Workflow

A team used an Agent for "sales report generation." Results:

Different execution paths each time, inconsistent result formats
Sometimes Agent decided to skip certain steps
Difficult to debug and reproduce issues

Correct solution: Workflow (fixed process, predictable results)

Selection mnemonic:

Fixed process, results must be predictable → Workflow
Free interaction, user-driven → Chat Assistant
Autonomous decisions, diverse tools → Agent
Conversation with process constraints → Chatflow

Final Summary Diagram of Module Relationships

┌──────────────────────────────────────────────────────────────────────────────────┐
│                              Dify Application Layer                              │
│                                                                                  │
│  ┌────────────────┐  ┌────────────────┐  ┌────────────────┐  ┌────────────────┐  │
│  │      Chat      │  │    Workflow    │  │    Chatflow    │  │     Agent      │  │
│  │   Assistant    │  │ Fixed Pipeline │  │   Both caps    │  │   Autonomous   │  │
│  └────────┬───────┘  └────────┬───────┘  └────────┬───────┘  └────────┬───────┘  │
└───────────┼───────────────────┼───────────────────┼───────────────────┼──────────┘
            │                   │                   │                   │
            └───────────────────┴─────────┬─────────┴───────────────────┘
                                          │
                    ┌─────────────────────┼─────────────────────┐
                    ↓                     ↓                     ↓
          ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐
          │  Knowledge Base  │  │      Models      │  │      Tools       │
          │  Docs + Vectors  │  │ LLM + Embedding  │  │ Search/Calc/API  │
          └──────────────────┘  └──────────────────┘  └──────────────────┘

Chapter Summary

Dify's four core modules form a complete AI application development ecosystem: the application layer (Chat Assistant/Workflow/Agent), the knowledge layer (Knowledge Base), the model layer (LLM + Embedding), and the tool layer (external services).

Key Takeaways:

Choosing the right application type is critical: Fixed processes use Workflow, free interaction uses Chat Assistant, autonomous decisions use Agent
Knowledge base is a shared resource: One knowledge base can be used by multiple applications simultaneously; updating the knowledge base affects all applications using it
Variable system is the skeleton of data flow: Understanding variable scope is the key to debugging workflow problems
Agent's inherent unpredictability: Agent execution paths are not predictable — build fallback handling for production scenarios
Hybrid retrieval is best practice: For most knowledge Q&A scenarios, hybrid retrieval + reranking is a stronger default than single-mode retrieval, because it covers both semantic and exact-keyword matches

The next chapter enters hands-on territory: building your first AI application from scratch, covering the complete process from requirements analysis to going live.
