Core Concepts: App Types, Workflows, Knowledge Base and Agent Relationships
Chapter 2: Core Concepts Overview (Applications, Workflows, Knowledge Bases, and Agent Relationships)
Before touching any configuration, build a mental map of Dify's concepts. Understanding the boundaries and relationships of these four core modules is the prerequisite for using Dify effectively.
Chapter Overview
Many people approach Dify by clicking around the interface, exploring whatever they see, and searching the documentation when stuck. This approach solves immediate problems but often leads to confusion about "why is this feature here?" and uncertainty about which module to use in complex scenarios.
This chapter aims to give you a conceptual map of Dify. We'll systematically examine Dify's four core modules: Application Types, Workflow, Knowledge Base, and Agent, along with their relationships and appropriate use boundaries.
By the end of this chapter, you will be able to:
- Clearly distinguish between Dify's five application types and their appropriate use cases
- Understand the fundamental difference between Workflows and Chat Assistants
- Know what role the Knowledge Base plays in the overall system
- Understand Agent reasoning mechanisms and usage limitations
- Quickly determine which module to use when facing new requirements
Level 1: Foundational Understanding (1-3 Years Experience)
Dify's Four Core Modules
All of Dify's functionality can be organized into four core modules that are both independent and interdependent:
┌─────────────────────────────────────────────────────────┐
│                    Application Layer                    │
│  ┌───────────┐ ┌───────────┐ ┌──────────┐ ┌───────┐     │
│  │   Chat    │ │   Text    │ │ Workflow │ │ Agent │     │
│  │ Assistant │ │ Generator │ │          │ │       │     │
│  └───────────┘ └───────────┘ └──────────┘ └───────┘     │
└─────────────────────────────────────────────────────────┘
             ↓ calls                       ↓ calls
┌─────────────────────────┐ ┌─────────────────────────────┐
│  Knowledge Base (RAG)   │ │           Models            │
│  Doc Storage + Vectors  │ │  LLM + Embedding + Rerank   │
└─────────────────────────┘ └─────────────────────────────┘
This structure reveals a key fact: Knowledge bases and models are infrastructure; applications are the upper layer that uses this infrastructure. Whether it's a Chat Assistant, Workflow, or Agent, all can call the same knowledge base and use different models.
Five Application Types Explained
1. Chat Assistant
The most commonly used type. Users engage in multi-turn conversations with AI that remembers conversation history.
Core characteristics:
- Has conversation history (Memory)
- Question-and-answer format, waiting for user input each time
- Supports linking to knowledge bases (RAG)
- Supports tool calls (Tools)
Typical use cases: Customer service bots, personal assistants, professional consulting (legal, medical domain Q&A)
A practical configuration example:
Application Type: Chat Assistant
System Prompt: You are a professional legal consulting assistant, focused on Chinese labor law...
Linked Knowledge Base: Labor Law Database (contains Labor Contract Law, Labor Dispute Mediation and Arbitration Law, etc.)
Model: GPT-4o
Conversation History Rounds: 10 (retain most recent 10 rounds)
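The "Conversation History Rounds" setting can be pictured as a sliding window over past turns. The following is a minimal sketch, assuming one round is a user message plus an assistant reply; `trim_history` and the message dictionary shape are hypothetical names for illustration, not Dify's internal code.

```python
def trim_history(messages: list[dict], max_rounds: int = 10) -> list[dict]:
    """Keep only the most recent `max_rounds` rounds of conversation.

    Assumes messages strictly alternate user/assistant, so one round
    equals two messages.
    """
    if max_rounds <= 0:
        return []
    return messages[-2 * max_rounds:]


# Build 12 rounds of dummy conversation
history = []
for i in range(12):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

window = trim_history(history, max_rounds=10)
print(len(window))           # 20 messages = 10 rounds
print(window[0]["content"])  # question 2 (rounds 0 and 1 were dropped)
```

Older rounds simply fall out of the prompt, which is why a Chat Assistant can "forget" details from early in a long conversation.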
2. Text Generator
Single input, single output. No conversation history: each request is independent.
Core characteristics:
- No conversation history
- Usually has fixed input forms (e.g., "article topic," "word count" variables)
- Suited for batch, standardized content generation
Typical use cases: Bulk SEO article generation, automated product description writing, code comment generation, email template generation
Key difference from Chat Assistant:
- Chat Assistant: User drives conversation direction, back-and-forth interaction
- Text Generator: User fills fixed form, AI outputs according to template
3. Workflow
Multiple processing nodes executed in sequence; each node can be an LLM call, code execution, HTTP request, conditional logic, etc.
Core characteristics:
- Fixed processing pipeline (a DAG, i.e., a directed acyclic graph)
- Each node has a different type with a specific role
- Supports conditional branches and loops (Iteration node)
- Clear inputs and outputs
Typical use cases:
- Content moderation pipeline: Receive article → Check violations → Auto-fix → Human review → Publish
- Customer service tiering: Receive question → Classify (product/technical/complaint) → Route to appropriate knowledge base → Generate answer
- Data analysis report: Receive data → Clean data → Call LLM for analysis → Formatted output
Workflow vs. Chat Assistant selection principle:
- Fixed process, clearly defined steps → Workflow
- Free interaction, user-driven → Chat Assistant
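To make "fixed pipeline" concrete, here is a minimal sketch of running nodes in dependency order. It uses Python's standard `graphlib` module; the `run_workflow` function, the node names, and the shared variable dictionary are assumptions made for illustration and do not reflect how Dify's engine is written.

```python
from graphlib import TopologicalSorter


def run_workflow(nodes, edges, inputs):
    """Run node functions in dependency order, sharing one variable dict."""
    # Map each node to the set of nodes it depends on
    deps = {name: set() for name in nodes}
    for source, target in edges:
        deps[target].add(source)

    variables = dict(inputs)
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        # Each node reads earlier outputs and stores its own under its name
        variables[name] = nodes[name](variables)
    return order, variables


def fix_node(v):
    # Only rewrite the article when the check node flagged it
    if v["check"]:
        return v["start"].replace("violation", "[removed]")
    return v["start"]


nodes = {
    "start": lambda v: v["article"],
    "check": lambda v: "violation" in v["start"],
    "fix": fix_node,
    "end": lambda v: v["fix"].upper(),
}
edges = [("start", "check"), ("start", "fix"), ("check", "fix"), ("fix", "end")]

order, result = run_workflow(nodes, edges, {"article": "one violation here"})
print(order)          # ['start', 'check', 'fix', 'end']
print(result["end"])  # ONE [REMOVED] HERE
```

The execution order is fully determined by the edges, which is what makes a workflow predictable and easy to debug.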
4. Chatflow
A hybrid of Workflow and Chat Assistant. Has a fixed processing pipeline but also supports multi-turn conversation.
Core characteristics:
- Has workflow's node orchestration capability
- Also has Chat Assistant's multi-turn conversation memory
- Suited for "conversation with process constraints" scenarios
Typical use cases: Medical consultation assistants (need to collect symptom information following a fixed process, while supporting conversational interaction)
5. Agent (Intelligent Agent)
Provides AI with a toolset and lets AI autonomously decide which tools to call and in what order to complete the user's task.
Core characteristics:
- Autonomous decision-making (AI decides what to do, not you)
- Tool calls (search, calculator, database queries, etc.)
- ReAct loop (Think → Act → Observe → Think...)
- Unpredictability (execution path may differ each time)
Typical use cases: Data analysis agents (letting AI decide which database to query, what calculations to perform), travel planning agents
What Is the Knowledge Base?
The Knowledge Base is Dify's module for storing and retrieving documents. Essentially, it does the following:
- Document Processing: Splits uploaded documents (PDF, Word, Markdown, etc.) into small chunks
- Vectorization: Calls an Embedding model to convert each text chunk into a vector
- Storage: Stores vectors and original text in a vector database
- Retrieval: When an application needs it, vectorizes the user's question and finds the most similar text chunks
The Knowledge Base itself is not an application; it is a resource called by applications. A single knowledge base can be shared by multiple applications.
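The document processing step can be illustrated with a simple character-based splitter. This is a sketch under stated assumptions: `split_text` and its parameters are hypothetical, and real splitters (including whatever Dify uses) typically split on separators or tokens rather than raw character offsets.

```python
def split_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks whose edges overlap.

    The overlap keeps a sentence that straddles a boundary visible
    in both neighbouring chunks.
    """
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop early so the last chunk is never fully contained in the previous one
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


chunks = split_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk would then be passed to the Embedding model and stored alongside its vector.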
Understanding All Four Components Through One Scenario
Suppose you're building an AI system for a law firm:
Requirements Analysis:
├── Lawyers' daily consultation assistant → Chat Assistant (multi-turn dialogue, linked to legal knowledge base)
├── Contract auto-review → Workflow (receive contract → clause extraction → risk analysis → generate report)
├── Case summary generation → Text Generator (input case description, output structured summary)
├── Legal research assistant → Agent (autonomous search of legal databases, case databases, integrated analysis)
└── Legal Knowledge Base → Knowledge Base (stores all regulations, contract templates, case documents)
The same "Legal Knowledge Base" is used simultaneously by the Chat Assistant, Workflow, and Agent. This is the value of modularity.
Level 2: Mechanism Deep Dive (3-5 Years Experience)
Technical Differences Behind Application Types
From a technical implementation perspective, the core differences among the five application types lie in state management approach and execution control flow:
| Application Type | State Management | Execution Control | Termination Condition |
|---|---|---|---|
| Chat Assistant | DB-stored conversation history | Single LLM call | Ends after one response |
| Text Generator | Stateless | Single LLM call | Ends after one response |
| Workflow | Variable passing between nodes | DAG sequential execution | Reaches termination node |
| Chatflow | Conversation history + node variables | DAG + conversation state | Reaches response node |
| Agent | Tool call result caching | ReAct loop | Max iterations reached or task complete |
Workflow Node Types in Detail
Workflow supports these node types in Dify v0.10+:
Node Types
├── Basic Nodes
│   ├── Start: Workflow entry point, defines input variables
│   ├── End: Workflow exit point, defines output content
│   └── Answer: In Chatflow, replies directly to user
├── LLM Nodes
│   ├── LLM: Calls large language model
│   └── Knowledge Retrieval: Retrieves from knowledge base
├── Data Processing Nodes
│   ├── Code: Executes Python/JavaScript code
│   ├── Template Transform: Jinja2 template rendering
│   └── Variable Aggregator: Merges multiple variables
├── Flow Control Nodes
│   ├── IF/ELSE: Selects execution path based on condition
│   ├── Iteration: Performs same operation on each list element
│   └── Parameter Extractor: Extracts structured data from text
└── External Data Nodes
    ├── HTTP Request: Calls external APIs
    └── Tool: Calls Dify built-in or custom tools
Important detail: Code node sandbox mechanism
Code nodes execute Python/JS code but have strict restrictions:
- Cannot access the file system
- Cannot import third-party libraries (standard library only)
- Execution timeout: 5 seconds (configurable, max 60 seconds)
- Memory limit: 256MB
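# Code node example: Extract amounts from text
def main(text: str) -> dict:
    import re
    # Match amount formats (e.g., $1,234.56 or 1234.56 USD).
    # The number is either comma-grouped digits or a plain digit run,
    # so an ungrouped value like "1234.56" is matched whole instead of being split.
    pattern = r'\$?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d{2})?(?:\s*USD)?'
    amounts = re.findall(pattern, text)
    return {
        "amounts": amounts,
        "count": len(amounts)
    }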
Knowledge Base Retrieval Configuration in Detail
Knowledge base retrieval has three modes, and understanding their differences is critical:
Vector Search
Converts the query into a vector and finds the most similar document chunks via cosine similarity.
Advantages: Strong semantic understanding; "Apple phone" and "iPhone" can match
Disadvantages: Weak at exact keyword matching; poor for specific numbers and proper nouns
Best for: Semantically fuzzy questions, like "When does my contract expire?"
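The ranking step of vector search can be sketched in a few lines. The three-dimensional vectors below are hand-made stand-ins for real embeddings (which have hundreds or thousands of dimensions), and `vector_search` is a hypothetical helper, not a Dify API.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def vector_search(query_vec, chunks, top_k=2):
    """Score every chunk against the query and return the top_k best."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hand-made "embeddings" for illustration only
chunks = [
    ("iPhone battery replacement policy", [0.9, 0.1, 0.0]),
    ("Contract termination notice period", [0.0, 0.2, 0.9]),
    ("Apple phone warranty terms", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]  # pretend embedding of "Apple phone repair"

for score, text in vector_search(query, chunks):
    print(f"{score:.3f}  {text}")
```

Both phone-related chunks outrank the contract chunk even though only one of them shares the word "phone" with the query, which is the semantic matching described above.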
Full-text Search
BM25 algorithm-based keyword retrieval.
Advantages: Exact keyword matching; great for numbers, codes, proper nouns
Disadvantages: No semantic understanding; "phone" won't match "iPhone"
Best for: Precise queries, like "clauses in contract SH-2024-001"
Hybrid Search
Runs both vector search and full-text search simultaneously, merging results using a weighted algorithm (such as RRF, Reciprocal Rank Fusion).
Advantages: Balances semantic and exact matching
Disadvantages: Running two retrievals simultaneously has higher performance overhead
Best for: Most real-world scenarios (Dify's recommended default)
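As an illustration of how two result lists can be merged, here is a minimal sketch of Reciprocal Rank Fusion, the example algorithm named above. Whether a given Dify version uses exactly this formula is not confirmed here; the constant `k = 60` is a commonly used default and is an assumption, as are the chunk IDs.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists: each list adds 1 / (k + rank) to a document's score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vector_ranking = ["chunk_a", "chunk_b", "chunk_c"]   # from vector search
keyword_ranking = ["chunk_c", "chunk_a", "chunk_d"]  # from full-text search

fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
print([doc_id for doc_id, _ in fused])
# ['chunk_a', 'chunk_c', 'chunk_b', 'chunk_d']
```

Chunks that appear near the top of both lists rise to the top of the fused list, while chunks found by only one retriever are kept but ranked lower.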
Configuration recommendation:
# Typical hybrid retrieval configuration
retrieval_mode: hybrid
vector_weight: 0.7 # Vector retrieval weight 70%
keyword_weight: 0.3 # Keyword retrieval weight 30%
top_k: 5 # Recall top 5 chunks
score_threshold: 0.4 # Similarity threshold (discard chunks below this)
reranking_enable: true # Enable reranking (recommended)
reranking_model: bge-reranker-v2-m3 # Reranking model
Agent's ReAct Reasoning Mechanism
The ReAct (Reasoning + Acting) pattern used by Agent is the key to understanding Agent behavior:
User question: "Check the latest GPT-4 pricing and calculate the cost of 10,000 calls"
Round 1 of Reasoning:
Thought: I need to check the latest GPT-4 pricing first
Action: Call web_search tool, search "GPT-4 API pricing 2024"
Observation: Results show GPT-4o input $5/1M tokens, output $15/1M tokens
Round 2 of Reasoning:
Thought: I have the pricing data, now I need to calculate costs
Action: Call calculator tool, calculate cost for 10,000 calls
Observation: Assuming average 500 input + 200 output tokens per call...
Round 3 of Reasoning:
Thought: Calculation complete, can provide final answer
Final Answer: Based on current pricing, 10,000 calls would cost approximately...
Agent's maximum iteration count defaults to 5. This means if the task cannot be completed within 5 tool calls, Agent forcibly terminates and provides whatever information it has gathered.
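The loop and its iteration cap can be sketched as follows. This is an illustrative toy, not Dify's agent runner: `run_agent` is a hypothetical name, `scripted_decide` replaces the LLM with a fixed policy, and both tools return canned values that mirror the trace above.

```python
def run_agent(decide, tools, question, max_iterations=5):
    """Think -> Act -> Observe loop that stops at a final answer or the cap."""
    observations = []
    for _ in range(max_iterations):
        step = decide(question, observations)          # Thought: pick next action
        if step["action"] == "final_answer":
            return step["input"], observations
        result = tools[step["action"]](step["input"])  # Action
        observations.append(result)                    # Observation
    # Cap reached: give up and return whatever was gathered
    return f"Stopped after {max_iterations} iterations", observations


def scripted_decide(question, observations):
    """Stand-in for the LLM: a fixed policy based on what has been observed."""
    if len(observations) == 0:
        return {"action": "web_search", "input": "GPT-4o API pricing"}
    if len(observations) == 1:
        return {"action": "calculator", "input": observations[0]}
    return {"action": "final_answer", "input": f"About ${observations[1]:.2f}"}


def web_search(query):
    # Canned result matching the trace (USD per 1M tokens)
    return {"input_per_m": 5.0, "output_per_m": 15.0}


def calculator(prices):
    # 10,000 calls, 500 input + 200 output tokens each, as assumed in the trace
    calls, tokens_in, tokens_out = 10_000, 500, 200
    per_call = tokens_in * prices["input_per_m"] + tokens_out * prices["output_per_m"]
    return calls * per_call / 1_000_000


tools = {"web_search": web_search, "calculator": calculator}
answer, trace = run_agent(scripted_decide, tools, "Cost of 10,000 GPT-4 calls?")
print(answer)  # About $55.00
```

If the decision function never returns a final answer, the loop exits after `max_iterations` tool calls, which is the forced termination described above.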
Variable System: The Connective Tissue Between Modules
Dify's variable system is the key to understanding data flow between modules:
Variable Scopes
├── System Variables
│   ├── {{sys.user_id}}: Current user ID
│   ├── {{sys.app_id}}: Application ID
│   └── {{sys.conversation_id}}: Conversation ID
├── Application Variables
│   └── {{variable_name}} defined in "Prompt"
├── Conversation Variables
│   └── Variables persisting across turns (Chatflow only)
└── Workflow Variables
    └── Node outputs: {{#node_id.output_field#}}
A common variable usage example (in a workflow):
Node 1 (Parameter Extractor) → Output: {{extract.customer_name}}, {{extract.issue_type}}
        ↓
Node 2 (IF/ELSE) → Condition: {{extract.issue_type}} == "technical issue"
        ↓                                 ↓
Node 3a (Technical KB Retrieval)   Node 3b (Support KB Retrieval)
        ↓                                 ↓
Node 4 (LLM Answer Generation) → Merges retrieval results from both paths
Level 3: Source Code and Principles (5+ Years Experience)
Workflow Engine Internal Implementation
Dify's workflow engine lives in api/core/workflow/, with the core class being WorkflowEngineManager.
Graph storage format (JSON structure in PostgreSQL):
{
  "nodes": [
    {
      "id": "node_start",
      "type": "start",
      "data": {
        "variables": [
          {"variable": "user_query", "type": "string", "required": true}
        ]
      },
      "position": {"x": 100, "y": 200}
    },
    {
      "id": "node_llm_1",
      "type": "llm",
      "data": {
        "model": {"provider": "openai", "name": "gpt-4o", "mode": "chat"},
        "prompt_template": [
          {"role": "system", "text": "You are an assistant"},
          {"role": "user", "text": "{{#node_start.user_query#}}"}
        ]
      },
      "position": {"x": 400, "y": 200}
    }
  ],
  "edges": [
    {
      "id": "edge_1",
      "source": "node_start",
      "target": "node_llm_1",
      "sourceHandle": "source",
      "targetHandle": "target"
    }
  ]
}
Note the variable reference format: {{#node_id.output_field#}}. This differs from {{variable}} in Prompts: the former is an internal workflow variable reference; the latter is an application-level variable.
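To show how such references might be resolved at run time, here is a small sketch that substitutes `{{#node_id.field#}}` placeholders from a pool of node outputs. The `render` function, the regular expression, and the pool layout are assumptions for illustration; Dify's actual variable pool is more elaborate.

```python
import re

# Matches {{#node_id.field#}}; node IDs may contain hyphens, fields may be dotted
VAR_PATTERN = re.compile(r"\{\{#([\w-]+)\.([\w.]+)#\}\}")


def render(template: str, pool: dict[str, dict]) -> str:
    """Replace every workflow variable reference with its value from the pool."""
    def replace(match: re.Match) -> str:
        node_id, field = match.group(1), match.group(2)
        if node_id not in pool or field not in pool[node_id]:
            raise KeyError(f"Variable not found: {match.group(0)}")
        return str(pool[node_id][field])

    return VAR_PATTERN.sub(replace, template)


pool = {"node_start": {"user_query": "What is RAG?"}}
print(render("Question: {{#node_start.user_query#}}", pool))
# Question: What is RAG?
```

A reference to a node that has not run, or to a field the node does not output, fails at render time, which is the class of error discussed under Pitfall 1 later in this chapter.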
Event-driven model for workflow execution:
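# Simplified workflow execution event stream
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


class WorkflowRunState:
    def __init__(self):
        # NodeRunResult is simplified to Any in this sketch
        self.node_run_results: dict[str, Any] = {}
        self.total_tokens: int = 0
        # Timezone-aware timestamp (datetime.utcnow() is deprecated)
        self.start_at: datetime = datetime.now(timezone.utc)


class Event:
    """Base class for workflow events."""


# Events emitted as each node executes
class NodeRunEvent:
    @dataclass
    class NodeRunStarted(Event):
        node_id: str
        node_type: str

    @dataclass
    class NodeRunSucceeded(Event):
        node_id: str
        outputs: dict
        elapsed_time: float

    @dataclass
    class NodeRunFailed(Event):
        node_id: str
        error: str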
This event-driven design lets the frontend receive each node's execution status in real-time via SSE (Server-Sent Events), creating the "nodes lighting up one by one" effect you see in the Dify interface.
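The wire format behind this is simple: each SSE frame is a few `field: value` lines followed by a blank line. The sketch below formats node events that way; the event names `node_started` and `node_finished` and the payload fields are hypothetical and are not taken from Dify's API.

```python
import json
from collections.abc import Iterator


def to_sse(event_name: str, payload: dict) -> str:
    """Format one event in the Server-Sent Events wire format."""
    return f"event: {event_name}\ndata: {json.dumps(payload)}\n\n"


def stream_workflow(node_ids: list[str]) -> Iterator[str]:
    """Yield a started and a finished frame for each node, in order."""
    for node_id in node_ids:
        yield to_sse("node_started", {"node_id": node_id})
        yield to_sse("node_finished", {"node_id": node_id, "status": "succeeded"})


frames = list(stream_workflow(["node_start", "node_llm_1"]))
print(frames[0], end="")
# event: node_started
# data: {"node_id": "node_start"}
```

A browser client would read these frames with `EventSource` or a streaming `fetch` and update each node's status as the frames arrive.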
Agent Engine Implementation: From ReAct to Function Calling
Dify's Agent engine supports two strategies, automatically selected based on model capabilities:
Strategy 1: Function Calling (for models like GPT-4, Claude 3+ that support tool calls)
# Using OpenAI Function Calling (requires the openai package and OPENAI_API_KEY)
import openai

messages = [{"role": "user", "content": "Check the latest GPT-4 pricing"}]

response = openai.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=[
        {
            "type": "function",
            "function": {
                "name": "web_search",
                "description": "Search the web for information",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string", "description": "Search query"}
                    },
                    "required": ["query"]
                }
            }
        }
    ],
    tool_choice="auto"  # Let the model decide whether to call tools
)
Strategy 2: ReAct Prompt (for models that don't support Function Calling)
For models without native tool call support, Dify simulates ReAct behavior through carefully designed Prompts:
You are an AI assistant that can use tools. Available tools:
- web_search(query): Search the web
- calculator(expression): Calculate mathematical expressions
When you need to use a tool, reply in this format:
Thought: [your reasoning process]
Action: tool_name
Action Input: {"param_name": "param_value"}
When you have a final answer, reply:
Thought: [summary]
Final Answer: [your response]
Dify's engine parses the LLM's output, extracts tool call instructions, executes the tools, appends results to conversation history, then continues reasoning.
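The parsing step can be sketched with two regular expressions. This is a simplified illustration of the idea, assuming the exact `Action:` / `Action Input:` / `Final Answer:` labels from the prompt above; `parse_react_output` is a hypothetical helper and real parsers need to tolerate far messier model output.

```python
import json
import re


def parse_react_output(text: str) -> dict:
    """Classify an LLM reply as a final answer, a tool call, or unparsed."""
    final = re.search(r"Final Answer:\s*(.*)", text, re.DOTALL)
    if final:
        return {"type": "final", "answer": final.group(1).strip()}

    action = re.search(r"Action:\s*(\S+)", text)
    action_input = re.search(r"Action Input:\s*(\{.*\})", text, re.DOTALL)
    if action and action_input:
        try:
            args = json.loads(action_input.group(1))
        except json.JSONDecodeError:
            return {"type": "unparsed", "raw": text}
        return {"type": "tool_call", "tool": action.group(1), "args": args}

    return {"type": "unparsed", "raw": text}


reply = (
    "Thought: I need current prices\n"
    "Action: web_search\n"
    'Action Input: {"query": "GPT-4 API pricing"}'
)
print(parse_react_output(reply))
# {'type': 'tool_call', 'tool': 'web_search', 'args': {'query': 'GPT-4 API pricing'}}
```

Returning an explicit "unparsed" result lets the caller decide whether to retry, ask the model to reformat, or stop, instead of failing silently.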
Knowledge Base Vectorization Pipeline
Complete process from document upload to searchable state:
# Simplified document processing pipeline
# (embedding_model and vector_db are placeholders for the configured
#  embedding client and vector store; File and IndexingConfig are simplified types)
class DocumentProcessor:
    def process(self, file: File, config: IndexingConfig) -> None:
        # 1. Document parsing: Convert PDF/Word/HTML to plain text
        text = self.extract_text(file)
        # 2. Text cleaning: Remove extra whitespace, special characters
        cleaned_text = self.clean_text(text)
        # 3. Text chunking: Split according to configured strategy
        chunks = self.split_text(cleaned_text, config.chunk_size, config.chunk_overlap)
        # 4. Vectorization: Batch call Embedding model
        #    Note: Batch processing to reduce API call count
        embeddings = []
        for batch in self.batch(chunks, size=100):
            batch_embeddings = embedding_model.encode(batch)
            embeddings.extend(batch_embeddings)
        # 5. Storage: Write to vector database
        for chunk, embedding in zip(chunks, embeddings):
            vector_db.insert({
                "text": chunk.text,
                "embedding": embedding,
                "metadata": {
                    "document_id": file.id,
                    "chunk_index": chunk.index,
                    "word_count": len(chunk.text.split())
                }
            })
Key performance numbers:
- Embedding processing time for 1,000 text chunks: ~30-60 seconds (text-embedding-3-small)
- Vector database write throughput: ~500-1,000 vectors/second (Weaviate)
- Retrieval latency: 5-20ms (vector similarity calculation) + 50-200ms (network + database)
Multi-Knowledge-Base Retrieval Merging Strategy
When an application links multiple knowledge bases, Dify's retrieval merging process:
def multi_dataset_retrieval(query: str, datasets: list[Dataset], config: RetrievalConfig):
    all_results = []
    # Independently retrieve from each knowledge base
    for dataset in datasets:
        results = dataset.search(
            query=query,
            top_k=config.top_k,
            mode=config.retrieval_mode
        )
        all_results.extend(results)
    # Deduplication (same chunk from a document may be returned from multiple KBs)
    unique_results = deduplicate(all_results)
    # If reranking is enabled
    if config.reranking_enable:
        # Use Reranker model to re-score all results
        reranked = reranker_model.rerank(query=query, documents=unique_results)
        return reranked[:config.top_k]
    else:
        # Sort by similarity score, take top_k
        return sorted(unique_results, key=lambda x: x.score, reverse=True)[:config.top_k]
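The `deduplicate` helper is not shown in the simplified code. One plausible version, assuming each result carries a stable `chunk_id` and a `score` (both hypothetical field names), keeps the highest-scoring copy of each chunk:

```python
from dataclasses import dataclass


@dataclass
class Result:
    chunk_id: str
    text: str
    score: float


def deduplicate(results: list[Result]) -> list[Result]:
    """Keep one entry per chunk_id, preferring the highest score."""
    best: dict[str, Result] = {}
    for result in results:
        current = best.get(result.chunk_id)
        if current is None or result.score > current.score:
            best[result.chunk_id] = result
    return list(best.values())


hits = [
    Result("c1", "Notice period is 30 days", 0.71),
    Result("c2", "Probation lasts 6 months", 0.64),
    Result("c1", "Notice period is 30 days", 0.83),  # same chunk, second KB
]
unique = deduplicate(hits)
print([(r.chunk_id, r.score) for r in unique])
# [('c1', 0.83), ('c2', 0.64)]
```

Keeping the best score matters only on the non-reranking path, where the final order is decided by that score.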
Level 4: Production Pitfalls and Decision Making (Expert Perspective)
Pitfall 1: Variable Reference Errors in Workflows
The most common workflow error: Variable not found: {{node_x.output.field}}.
Typical causes:
- Wrong node ID: Dify's auto-generated node IDs like llm-12345 are easy to mistype manually
- Field name doesn't exist: LLM node output is text, not content or output
- Variable unreachable due to conditional branch: Using a variable after IF/ELSE that was only defined in one branch
Debugging approach: Enable "Step Debug" mode. Check the actual inputs and outputs of each node in Dify's debug panel.
Variable naming conventions (to reduce errors):
# Good node naming habits
✓ extract_customer_info → Output: {{extract_customer_info.output.name}}
✓ search_policy_docs → Output: {{search_policy_docs.output.results}}
✗ node1 → Output: {{node1.output.xxx}} (easy to confuse)
Pitfall 2: Agent Infinite Loop Risk
Agent can fall into loops in certain scenarios: tool call fails → retry → fails again → retry again...
Dify's default protection: maximum 5 iterations. But 5 failed iterations have a cost: consuming large amounts of tokens with no useful output.
Production recommendations:
- Set timeout for tool calls (configurable for HTTP tools)
- Clearly specify in tool descriptions when the tool should NOT be called
- Set stricter rate limits for Agent applications
# Recommended Dify Agent configuration
max_iterations: 5 # Maximum iterations (don't set too high)
tools:
  - name: web_search
    description: |
      Search the internet for real-time information.
      Only call this in these situations:
      1. Need real-time data (prices, news, etc.)
      2. User explicitly requests a search
      Do NOT call this tool when the internal knowledge base has an answer.
Pitfall 3: Missing Knowledge Base Version Control
A common enterprise problem: regulations are updated, but the knowledge base still contains the old version. Worse, new and old documents coexist, and AI answers mix information from both versions.
Correct knowledge base update process:
1. Don't directly modify existing documents
   → Instead: Upload new version documents, mark old documents as "archived"
2. Use document metadata to record versions
   → Add metadata on upload: {"version": "2024-Q1", "effective_date": "2024-01-01"}
3. Filter expired documents during retrieval
   → Configure knowledge base with metadata filter: version == "latest"
4. Periodically clean up archived documents
   → Delete old versions from knowledge base once confirmed no longer needed
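The filtering idea in steps 1 to 3 can be sketched as a plain predicate over chunk metadata. The `status` field and the `is_current` helper are hypothetical and only illustrate the logic; in practice the filter would be expressed through whatever metadata filtering the knowledge base offers.

```python
from datetime import date


def is_current(metadata: dict, today: date) -> bool:
    """A chunk is usable if it is not archived and already in effect."""
    if metadata.get("status") == "archived":
        return False
    return date.fromisoformat(metadata["effective_date"]) <= today


chunks = [
    {"text": "Notice period: 30 days",
     "metadata": {"version": "2023-Q1", "effective_date": "2023-01-01", "status": "archived"}},
    {"text": "Notice period: 45 days",
     "metadata": {"version": "2024-Q1", "effective_date": "2024-01-01", "status": "active"}},
    {"text": "Notice period: 60 days",
     "metadata": {"version": "2025-Q1", "effective_date": "2025-01-01", "status": "active"}},
]

current = [c for c in chunks if is_current(c["metadata"], today=date(2024, 6, 1))]
print([c["text"] for c in current])
# ['Notice period: 45 days']
```

Filtering before the chunks reach the LLM is what prevents answers that blend the old and new versions of a regulation.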
Pitfall 4: Common Module Selection Mistakes
Mistake 1: Using Chat Assistant for what should be a Workflow
A company implemented contract review as a Chat Assistant: users manually paste contract content and AI analyzes it through dialogue. Problems:
- Insufficient context capacity (large contracts > 128K tokens)
- Cannot analyze different sections in parallel
- Inconsistent result format, difficult to process downstream
Correct solution: Workflow (file input → chunk processing → parallel analysis → result aggregation)
Mistake 2: Using Agent for what should be a Workflow
A team used an Agent for "sales report generation." Results:
- Different execution paths each time, inconsistent result formats
- Sometimes Agent decided to skip certain steps
- Difficult to debug and reproduce issues
Correct solution: Workflow (fixed process, predictable results)
Selection mnemonic:
- Fixed process, results must be predictable → Workflow
- Free interaction, user-driven → Chat Assistant
- Autonomous decisions, diverse tools → Agent
- Conversation with process constraints → Chatflow
Final Summary Diagram of Module Relationships
┌───────────────────────────────────────────────────────────────┐
│                    Dify Application Layer                     │
│                                                               │
│ ┌───────────┐ ┌────────────────┐ ┌───────────┐ ┌────────────┐ │
│ │   Chat    │ │    Workflow    │ │ Chatflow  │ │   Agent    │ │
│ │ Assistant │ │ Fixed Pipeline │ │ Both caps │ │ Autonomous │ │
│ └─────┬─────┘ └───────┬────────┘ └─────┬─────┘ └─────┬──────┘ │
└───────┼───────────────┼────────────────┼─────────────┼────────┘
        │               │                │             │
        └───────────────┴───────┬────────┴─────────────┘
                                │
          ┌─────────────────────┼─────────────────────┐
          │                     │                     │
┌─────────┴────────┐  ┌─────────┴────────┐  ┌─────────┴────────┐
│  Knowledge Base  │  │      Models      │  │      Tools       │
│  Docs + Vectors  │  │ LLM + Embedding  │  │ Search/Calc/API  │
└──────────────────┘  └──────────────────┘  └──────────────────┘
Chapter Summary
Dify's core modules fit into four layers that together form a complete AI application development ecosystem: the application layer (Chat Assistant/Text Generator/Workflow/Chatflow/Agent), the knowledge layer (Knowledge Base), the model layer (LLM + Embedding), and the tool layer (external services).
Key Takeaways:
- Choosing the right application type is critical: Fixed processes use Workflow, free interaction uses Chat Assistant, autonomous decisions use Agent
- Knowledge base is a shared resource: One knowledge base can be used by multiple applications simultaneously; updating the knowledge base affects all applications using it
- Variable system is the skeleton of data flow: Understanding variable scope is the key to debugging workflow problems
- Agent's inherent unpredictability: Agent execution paths are not predictable, so build fallback handling for production scenarios
- Hybrid retrieval is best practice: For most knowledge Q&A scenarios, hybrid retrieval + reranking significantly outperforms single-mode retrieval
The next chapter enters hands-on territory: building your first AI application from scratch, covering the complete process from requirements analysis to going live.