Chapter 9

Workflow Basics: Node Types, Variable System and Conditional Branching

Workflows are one of Dify's most powerful capabilities; mastering node types, variable passing, and conditional branching is the foundation for building complex AI automation systems.

Chapter Overview

If a conversational application (chatbot) is about having AI answer questions, a workflow is about having AI execute tasks. The fundamental difference: a conversation is a single-turn or multi-turn interaction, while a workflow is an ordered, branchable, composable automated process.

Imagine this scenario: a user submits a resume, and the system must automatically: extract the candidate's key information → score against job requirements → if qualified, generate an interview invitation email; otherwise, generate a polite rejection email → send the email. This process involves multiple steps, conditional logic, and external service integration — exactly what workflows are designed for.

This chapter systematically covers:

Core node types in Dify workflows and their purposes
The variable system: how to pass and transform data between nodes
Conditional branching: the correct way to use IF/ELSE logic
Workflow trigger methods: manual, API calls, event-driven

Level 1: Fundamentals (1–3 Years Experience)

1.1 Workflow vs. Conversational Application: When to Use Which?

Feature	Conversational App (Chatbot)	Workflow
Interaction mode	Multi-turn conversation	Single execution (clear start and end)
User perception	Real-time chat interface	Usually runs in background, returns results
Best for	Q&A, consulting, companionship	Batch processing, automation, data processing
Error handling	Model explains errors on its own	Can design try/catch logic
I/O format	Text dialogue	Structured input/output, multiple format support

Typical scenarios for choosing workflows:

Document automatic analysis and summary generation
Information extraction from resumes, contracts, invoices
Multi-step content production (research → writing → review)
Data synchronization with external systems (CRM, databases, email)

1.2 Core Node Types in Dify Workflows

Start Node: Every workflow must have exactly one Start node. It defines the workflow's input parameters:

Text: string input
Number: integer or decimal
Paragraph: long text supporting line breaks
Select: choose from predefined options
File: upload a file (PDF, image, etc.)

LLM Node: The core node that calls a language model to process text:

Configure model and parameters (Temperature, Max Tokens, etc.)
Write system prompts and user prompts
Can reference output variables from other nodes in prompts

Knowledge Retrieval Node: Retrieves relevant content from Dify knowledge bases:

Select knowledge bases to query (multiple selection supported)
Configure retrieval strategy (vector/full-text/hybrid)
Output: list of retrieved document segments

IF/ELSE Node (Conditional Branch): Determines workflow direction based on conditions:

Supports multiple comparison operators (equals, contains, greater than, etc.)
Supports AND/OR logic combinations
Can have multiple IF branches plus one ELSE branch

Code Node: Executes Python or JavaScript code:

Ideal for data transformation, formatting, computation
Can call standard libraries
Cannot access external networks (security restriction)

HTTP Request Node: Calls external APIs:

Supports GET/POST/PUT/DELETE and other methods
Configurable headers and request body
Response automatically parsed as JSON or text

End Node: Defines the workflow's output:

Specify which variables become the final output
A workflow can have multiple End nodes (one per branch)

1.3 Variable System: Data Flowing Between Nodes

Variable reference syntax: {{node_name.output_variable_name}}

Examples:

{{start.user_query}} — the user_query input from the Start node
{{llm_1.text}} — text output from the LLM node named "llm_1"
{{knowledge_1.result}} — retrieval results from the Knowledge Retrieval node
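
To make the reference syntax concrete, the following minimal sketch resolves {{node_name.variable}} placeholders against a nested dictionary. The render_template helper and the dictionary layout are assumptions made for illustration; this is not Dify's actual implementation.

```python
import re

# Hypothetical helper: not part of Dify, it only illustrates the lookup.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\.(\w+)\s*\}\}")

def render_template(template: str, pool: dict[str, dict[str, object]]) -> str:
    """Replace each {{node.variable}} with the value stored under pool[node][variable]."""
    def substitute(match: re.Match) -> str:
        node_name, variable_name = match.group(1), match.group(2)
        if node_name not in pool or variable_name not in pool[node_name]:
            raise KeyError(f"Unknown variable: {node_name}.{variable_name}")
        return str(pool[node_name][variable_name])
    return PLACEHOLDER.sub(substitute, template)

pool = {
    "start": {"user_query": "What is a DAG?"},
    "llm_1": {"text": "A directed acyclic graph."},
}
prompt = render_template("Q: {{start.user_query}}\nA: {{llm_1.text}}", pool)
```

Raising on an unknown reference, rather than leaving the placeholder in the prompt, surfaces typos in variable names early.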

Using variables in LLM node prompts:

You are a resume analysis assistant.

Candidate information:
{{start.resume_text}}

Job requirements:
{{start.job_requirements}}

Analyze whether the candidate meets the job requirements.
Output the following JSON format:
{
  "score": integer score from 0 to 100,
  "strengths": ["strength 1", "strength 2"],
  "gaps": ["gap 1", "gap 2"],
  "recommendation": "recommend/reject"
}

Variable types:

String: text
Number: numeric value
Boolean: true/false
Object: JSON object (obtained from Code or HTTP nodes)
Array: list of values
File: file object

1.4 Building Your First Workflow: Resume Analyzer

Goal: User submits resume text → AI analyzes → outputs score and recommendations

Steps:

Create new workflow: Dify → Studio → Create Application → Workflow
Configure Start node:
- Input variable 1: resume_text (paragraph type, required)
- Input variable 2: job_title (text type, required)

Add LLM node:

Select model (e.g., GPT-4o mini)
System prompt: You are a professional HR assistant skilled at resume analysis

User prompt:

Analyze the following resume and assess the candidate's fit for the "{{start.job_title}}" role.

Resume:
{{start.resume_text}}

Output JSON format only (no other text):
{"score": number, "highlights": ["highlight"], "concerns": ["concern"], "verdict": "recommend/reject"}

Add Code node (parse LLM's JSON output):

import json

def main(llm_output: str) -> dict:
    # Clean possible markdown code blocks
    clean = llm_output.strip()
    if clean.startswith("```"):
        clean = clean.split("```")[1]
        if clean.startswith("json"):
            clean = clean[4:]

    result = json.loads(clean.strip())
    return {
        "score": result["score"],
        "verdict": result["verdict"],
        "highlights": "\n".join(result["highlights"]),
        "concerns": "\n".join(result["concerns"])
    }

Add End node: output score, verdict, highlights, concerns
Test: click "Run" in the top right, fill in test data

Level 2: Mechanisms in Depth (3–5 Years Experience)

2.1 Advanced Uses of Conditional Branching

Basic condition (single condition):

code_node.score > 60 → take "qualified" branch
                      otherwise → take "unqualified" branch

Compound condition (AND logic):

code_node.score > 60 AND start.years_of_experience > 3
→ take "high-quality candidate" branch

Multiple branches (IF / ELSE IF / ELSE):

IF   code_node.score >= 80  → Immediately invite for interview
ELIF code_node.score >= 60  → Add to candidate pool
ELIF code_node.score >= 40  → Send thank-you note
ELSE                        → Reject directly

In Dify workflows: each IF/ELIF branch connects to different subsequent node chains, forming true multi-path execution.
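
The order of the branches matters because the first matching condition wins. A minimal Python sketch of that evaluation order, using the thresholds above; the route_candidate function and the branch labels are illustrative names, not Dify identifiers.

```python
def route_candidate(score: float) -> str:
    """Return the branch label for a score, checking thresholds from highest to lowest."""
    # (threshold, branch) pairs mirror the IF / ELIF chain; order is significant.
    branches = [
        (80, "invite_for_interview"),
        (60, "add_to_candidate_pool"),
        (40, "send_thank_you_note"),
    ]
    for threshold, branch in branches:
        if score >= threshold:
            return branch
    return "reject"  # ELSE branch: nothing matched

routes = {s: route_candidate(s) for s in (95, 80, 79, 40, 12)}
```

If the 60 check were listed before the 80 check, a score of 95 would land in the candidate pool, which is why the highest threshold comes first.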

Technique for conditions involving arrays or objects:

# Pre-process complex conditions in a Code node
def main(analysis_result: dict) -> dict:
    score = analysis_result["score"]
    has_required_skills = all(
        skill in analysis_result.get("skills", [])
        for skill in ["Python", "Machine Learning"]
    )

    return {
        "final_tier": (
            "tier_1" if score >= 80 and has_required_skills else
            "tier_2" if score >= 60 else
            "tier_3"
        )
    }

Then branch on code_node.final_tier — avoid writing complex logic directly in IF/ELSE nodes.

2.2 Variable Lifecycle and Scope

Dify workflow variables have explicit scope rules:

Rule 1: Variables can only reference already-executed upstream nodes

Workflows execute according to DAG (directed acyclic graph) topological order. If node B hasn't executed yet, node C cannot reference node B's output.
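
Rule 1 can be checked mechanically: a reference is valid only if the referenced node is an ancestor of the referencing node in the graph. The sketch below is a standalone illustration with a hypothetical edge list; it does not claim to show how Dify's editor implements the check.

```python
def upstream_nodes(node_id: str, edges: list[tuple[str, str]]) -> set[str]:
    """Collect every node from which node_id can be reached."""
    parents: dict[str, list[str]] = {}
    for source, target in edges:
        parents.setdefault(target, []).append(source)

    seen: set[str] = set()
    stack = list(parents.get(node_id, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen

def can_reference(consumer: str, producer: str, edges: list[tuple[str, str]]) -> bool:
    """A node may only read outputs of nodes that run before it."""
    return producer in upstream_nodes(consumer, edges)

# Hypothetical graph: start -> llm_1 -> code_1 -> end, plus start -> http_1 -> end
edges = [
    ("start", "llm_1"), ("llm_1", "code_1"), ("code_1", "end"),
    ("start", "http_1"), ("http_1", "end"),
]
```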

Rule 2: Variables inside a branch are invisible outside that branch

IF branch A → LLM_A  (this node's output is only available inside branch A)
ELSE branch B → LLM_B (this node's output is only available inside branch B)
         ↓
Merge node (cannot directly reference LLM_A or LLM_B output)

Solution: At the end of each branch, normalize results to a standard variable name, then reference it at the merge node.

# Code node in branch A
def main(llm_a_output: str) -> dict:
    return {"result": llm_a_output, "branch": "A"}

# Code node in branch B
def main(llm_b_output: str) -> dict:
    return {"result": llm_b_output, "branch": "B"}

# The downstream node connects after both branches converge and reads
# code_a.result or code_b.result, whichever branch actually ran.
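
One way to picture the merge step: only one branch runs, so exactly one of the two normalized results exists. A minimal sketch, assuming the value from the branch that did not run arrives as None; the merge_branches name and the None convention are assumptions for illustration.

```python
from typing import Optional

def merge_branches(result_a: Optional[dict], result_b: Optional[dict]) -> dict:
    """Return the normalized result of whichever branch actually executed."""
    candidates = [r for r in (result_a, result_b) if r is not None]
    if len(candidates) != 1:
        raise ValueError(f"Expected exactly one executed branch, got {len(candidates)}")
    return candidates[0]

merged = merge_branches({"result": "Invitation email text", "branch": "A"}, None)
```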

Rule 3: Variables inside loop nodes are independent for each iteration

(Covered in detail in Chapter 10 on Loop nodes)

2.3 Workflow Trigger Methods in Depth

Method 1: Manual trigger in Dify interface

Best for: testing, internal tools
Click "Run" in the workflow editor and fill in input parameters

Method 2: API call trigger

# Synchronous call (wait for workflow to complete, get final output)
curl -X POST 'https://api.dify.ai/v1/workflows/run' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "inputs": {
      "resume_text": "John Smith, 5 years Python development experience...",
      "job_title": "Senior Backend Engineer"
    },
    "response_mode": "blocking",
    "user": "user-001"
  }'

Response:

{
  "workflow_run_id": "run-xxxxx",
  "task_id": "task-xxxxx",
  "data": {
    "status": "succeeded",
    "outputs": {
      "score": 85,
      "verdict": "recommend",
      "highlights": "Proficient in Python\nRich system design experience"
    },
    "elapsed_time": 3.24,
    "total_tokens": 1250
  }
}
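
A caller usually wants either the outputs or a clear failure. The sketch below unpacks a response shaped like the example above. The extract_outputs helper is hypothetical, and treating every status other than "succeeded" as a failure is an assumption.

```python
def extract_outputs(response_body: dict) -> dict:
    """Return data.outputs from a blocking-mode response, or raise on failure."""
    data = response_body.get("data") or {}
    status = data.get("status")
    if status != "succeeded":
        run_id = response_body.get("workflow_run_id")
        raise RuntimeError(f"Workflow run {run_id} ended with status {status!r}")
    return data.get("outputs") or {}

# Sample body copied from the response shown above (values are placeholders).
sample = {
    "workflow_run_id": "run-xxxxx",
    "task_id": "task-xxxxx",
    "data": {
        "status": "succeeded",
        "outputs": {"score": 85, "verdict": "recommend"},
        "elapsed_time": 3.24,
        "total_tokens": 1250,
    },
}
outputs = extract_outputs(sample)
```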

Streaming call (real-time progress):

import requests
import json

def run_workflow_streaming(inputs: dict, api_key: str):
    response = requests.post(
        "https://api.dify.ai/v1/workflows/run",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "inputs": inputs,
            "response_mode": "streaming",
            "user": "user-001"
        },
        stream=True
    )

    for line in response.iter_lines():
        if line.startswith(b"data: "):
            event = json.loads(line[6:])

            if event["event"] == "node_started":
                print(f"Node started: {event['data']['title']}")
            elif event["event"] == "node_finished":
                elapsed = event['data']['elapsed_time']
                print(f"Node finished: {event['data']['title']}, took {elapsed:.2f}s")
            elif event["event"] == "workflow_finished":
                print(f"Workflow complete. Outputs: {event['data']['outputs']}")
                break

Method 3: Webhook trigger

Configure the workflow's Webhook URL so external systems (GitHub, Slack, enterprise messaging) can automatically trigger the workflow when specific events occur.

Method 4: Scheduled trigger

Dify does not natively support scheduled triggers, but external schedulers (cron, Celery, n8n) can periodically call the workflow API:

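# Trigger daily report workflow at 8 AM
# crontab: 0 8 * * * python /path/to/trigger_daily_report.py

import os
import requests
from datetime import datetime

# Assumption: the API key is supplied via an environment variable of this name
API_KEY = os.environ["DIFY_API_KEY"]

def trigger_daily_report():
    response = requests.post(
        "https://api.dify.ai/v1/workflows/run",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "inputs": {
                "report_date": datetime.now().strftime("%Y-%m-%d"),
                "report_type": "daily_summary"
            },
            "response_mode": "blocking",
            "user": "scheduler"
        },
        timeout=120  # keep the cron job from hanging indefinitely
    )
    response.raise_for_status()
    return response.json()

# cron runs the script directly, so the function must be invoked here
if __name__ == "__main__":
    print(trigger_daily_report())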

2.4 Workflow Input Validation and Error Handling

Input validation best practice:

Add a validation node (Code node) immediately after the Start node to check input validity:

def main(resume_text: str, job_title: str) -> dict:
    # Treat missing inputs as empty strings so the length checks cannot raise
    resume_text = resume_text or ""
    job_title = job_title or ""
    errors = []

    if len(resume_text.strip()) < 100:
        errors.append("Resume content too short (minimum 100 characters)")

    if len(resume_text) > 10000:
        errors.append("Resume content too long (maximum 10,000 characters)")

    if not job_title.strip():
        errors.append("Job title cannot be empty")

    return {
        "is_valid": len(errors) == 0,
        "error_message": "; ".join(errors) if errors else "",
        "resume_length": len(resume_text)
    }

Then use conditional branching:

validation.is_valid == true → continue normal flow
validation.is_valid == false → end immediately, output error message

Workflow-level error handling (Dify v0.10+):

Enable "Exception Handling" in workflow settings to configure per-node behavior:

On error: continue execution (skip the node)
On error: stop the entire workflow
On error: take a fallback branch
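
The three behaviors can be pictured as a wrapper around a node's run function. This is a conceptual sketch in plain Python; the strategy names and the run_with_strategy helper are invented for illustration and do not correspond to Dify configuration keys.

```python
from typing import Callable, Optional

def run_with_strategy(
    run: Callable[[], dict],
    strategy: str,
    fallback: Optional[Callable[[Exception], dict]] = None,
) -> Optional[dict]:
    """Run a node and apply one of: 'continue', 'stop', 'fallback'."""
    try:
        return run()
    except Exception as exc:
        if strategy == "continue":
            return None            # skip the node; downstream sees no output
        if strategy == "fallback" and fallback is not None:
            return fallback(exc)   # hand control to the fallback branch
        raise                      # 'stop': abort the whole workflow

def failing_node() -> dict:
    raise TimeoutError("upstream API did not answer")

skipped = run_with_strategy(failing_node, "continue")
recovered = run_with_strategy(
    failing_node, "fallback", fallback=lambda exc: {"error": str(exc)}
)
```

With "continue", downstream nodes have to tolerate a missing output, which is one reason a fallback branch is often the safer choice for nodes whose results are required later.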

Level 3: Source Code and Principles (5+ Years Experience)

3.1 Workflow Execution Engine: DAG Scheduling Principles

Dify's workflow execution is based on topological sorting of a DAG (Directed Acyclic Graph). The core data structure, in simplified form:

from __future__ import annotations

from collections import deque
from dataclasses import dataclass

@dataclass
class WorkflowGraph:
    nodes: dict[str, WorkflowNode]  # node_id -> node (WorkflowNode is defined elsewhere)
    edges: list[tuple[str, str]]    # (source_node_id, target_node_id)

    def get_execution_order(self) -> list[str]:
        """Topological sort (Kahn's algorithm) to determine node execution order"""
        in_degree = {node_id: 0 for node_id in self.nodes}

        for source, target in self.edges:
            in_degree[target] += 1

        # BFS over nodes whose dependencies have all been satisfied
        queue = deque(nid for nid, deg in in_degree.items() if deg == 0)
        order = []

        while queue:
            node_id = queue.popleft()
            order.append(node_id)

            for source, target in self.edges:
                if source == node_id:
                    in_degree[target] -= 1
                    if in_degree[target] == 0:
                        queue.append(target)

        return order
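
A side benefit of the in-degree approach (Kahn's algorithm) is cycle detection: if the sort finishes with fewer nodes than the graph contains, the leftover nodes sit on a cycle. A standalone sketch, independent of Dify's classes, with a hypothetical function name:

```python
from collections import deque

def topological_order(node_ids: list[str], edges: list[tuple[str, str]]) -> list[str]:
    """Kahn's algorithm; raises ValueError if the graph contains a cycle."""
    in_degree = {node_id: 0 for node_id in node_ids}
    successors: dict[str, list[str]] = {node_id: [] for node_id in node_ids}
    for source, target in edges:
        in_degree[target] += 1
        successors[source].append(target)

    queue = deque(n for n in node_ids if in_degree[n] == 0)
    order: list[str] = []
    while queue:
        node_id = queue.popleft()
        order.append(node_id)
        for target in successors[node_id]:
            in_degree[target] -= 1
            if in_degree[target] == 0:
                queue.append(target)

    if len(order) != len(node_ids):
        raise ValueError("Graph contains a cycle; not a valid workflow DAG")
    return order

order = topological_order(
    ["start", "llm", "code", "end"],
    [("start", "llm"), ("llm", "code"), ("code", "end")],
)
```

Precomputing the successors map also avoids rescanning the full edge list for every node.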

Key source locations:

api/core/workflow/workflow_engine_manager.py: workflow engine entry point
api/core/workflow/graph_engine/: graph execution engine
api/core/workflow/nodes/: implementations for each node type

Execution engine core logic (simplified):

class GraphEngine:
    def run(self, graph: WorkflowGraph, inputs: dict) -> dict:
        # Initialize variable pool
        variable_pool = VariablePool(start_variables=inputs)

        # Determine execution path (accounting for conditional branches)
        execution_queue = [graph.start_node_id]
        executed = set()

        while execution_queue:
            node_id = execution_queue.pop(0)
            node = graph.nodes[node_id]

            # Execute node
            result = node.run(variable_pool)
            variable_pool.add(node_id, result.outputs)
            executed.add(node_id)

            # Determine next nodes based on execution result
            next_nodes = self._get_next_nodes(
                graph, node_id, result, variable_pool
            )
            execution_queue.extend(next_nodes)

        return variable_pool.get_final_outputs()
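
The _get_next_nodes step is where conditional branches prune the graph. One way it could work, assuming each edge carries an optional branch label and an IF/ELSE node reports the label it selected. Both are assumptions for this sketch, and Dify's real engine may differ.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Edge:
    source: str
    target: str
    branch: Optional[str] = None  # e.g. "true" / "false" for IF/ELSE; None = unconditional

def get_next_nodes(edges: list[Edge], node_id: str, selected_branch: Optional[str]) -> list[str]:
    """Follow unconditional edges always; follow labeled edges only if they match."""
    return [
        e.target
        for e in edges
        if e.source == node_id and (e.branch is None or e.branch == selected_branch)
    ]

# Hypothetical graph fragment for the resume scenario
edges = [
    Edge("start", "if_else"),
    Edge("if_else", "invite_email", branch="true"),
    Edge("if_else", "rejection_email", branch="false"),
]
```

Nodes on the branch that was not selected never enter the execution queue, so their outputs never reach the variable pool.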

3.2 Variable Pool Memory Model

The VariablePool in Dify workflows is a dictionary-based in-memory data structure:

from typing import Any

class VariablePool:
    def __init__(self, start_variables: dict):
        # Structure: {node_id: {variable_name: value}}
        self._pool = {
            "sys": {  # System variables
                "workflow_id": "xxx",
                "run_id": "yyy",
                "user_id": "zzz"
            },
            "start": start_variables  # Start node inputs
        }

    def get(self, node_id: str, variable_name: str) -> Any:
        """Get variable value"""
        return self._pool.get(node_id, {}).get(variable_name)

    def get_any(self, selector: list[str]) -> Any:
        """Get variable via selector path"""
        # selector = ["llm_1", "text"] corresponds to {{llm_1.text}}
        node_id, *path = selector
        value = self._pool.get(node_id, {})
        for key in path:
            if not isinstance(value, dict):
                return None  # the path goes deeper than the stored value
            value = value.get(key)
        return value

Variable type conversion: When variables pass between nodes, Dify performs type checking and conversion. If an LLM outputs a string and a downstream Code node expects an integer, automatic conversion is attempted; failure raises a VariableTypeError.
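
The conversion step might look roughly like this. The text names VariableTypeError; the class definition and the coerce_to_int helper below are stand-ins written for this sketch, not code taken from Dify.

```python
class VariableTypeError(TypeError):
    """Stand-in for the error named in the text (hypothetical definition)."""

def coerce_to_int(value: object) -> int:
    """Convert an upstream value to int, e.g. an LLM's "85" to 85."""
    if isinstance(value, bool):
        raise VariableTypeError("bool is not accepted as an integer")
    if isinstance(value, int):
        return value
    try:
        return int(str(value).strip())
    except ValueError as exc:
        raise VariableTypeError(f"Cannot convert {value!r} to int") from exc

score = coerce_to_int(" 85 ")
```

Relying on implicit conversion is fragile; parsing the LLM output explicitly in a Code node keeps the types predictable.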

3.3 Condition Evaluation Mechanism

The IF/ELSE node's condition evaluation is based on a rule engine:

class ConditionEvaluator:
    OPERATORS = {
        "eq": lambda a, b: a == b,
        "neq": lambda a, b: a != b,
        "gt": lambda a, b: float(a) > float(b),
        "gte": lambda a, b: float(a) >= float(b),
        "lt": lambda a, b: float(a) < float(b),
        "lte": lambda a, b: float(a) <= float(b),
        "contains": lambda a, b: str(b) in str(a),
        "not_contains": lambda a, b: str(b) not in str(a),
        "starts_with": lambda a, b: str(a).startswith(str(b)),
        "ends_with": lambda a, b: str(a).endswith(str(b)),
        "is_empty": lambda a, _: not a or str(a).strip() == "",
        "is_not_empty": lambda a, _: bool(a) and str(a).strip() != "",
    }

    def evaluate(self, condition: Condition, variable_pool: VariablePool) -> bool:
        left_value = variable_pool.get_any(condition.left_selector)
        evaluator = self.OPERATORS.get(condition.operator)
        if not evaluator:
            raise ValueError(f"Unknown operator: {condition.operator}")
        return evaluator(left_value, condition.right_value)

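    def evaluate_group(self, group: ConditionGroup, variable_pool: VariablePool) -> bool:
        results = [self.evaluate(c, variable_pool) for c in group.conditions]
        if group.logic == "AND":
            return all(results)
        if group.logic == "OR":
            return any(results)
        # Fail loudly instead of silently returning None for an unexpected value
        raise ValueError(f"Unknown logic operator: {group.logic}")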

Important limitation: Condition evaluation runs at the Python layer and does not support regular expressions. Complex string matching should be converted to boolean values in a Code node first, then use eq: true for the conditional check.
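
For example, a Code node along these lines can turn a regex check into a boolean for the IF/ELSE node. The variable names and the deliberately loose email pattern are illustrative assumptions.

```python
import re

# Loose pattern for demonstration only; real email validation is more involved.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def main(contact: str) -> dict:
    """Code node body: expose the regex result as a boolean output."""
    return {"is_email": bool(EMAIL_PATTERN.fullmatch(contact.strip()))}

valid = main("jane.doe@example.com")
invalid = main("not an email")
```

The IF/ELSE node then only needs a single equality check on is_email.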

Level 4: Production Pitfalls and Decision-Making (Expert Perspective)

4.1 Pitfall 1: Silent Errors from Variable Name Conflicts

Problem: When multiple nodes output variables with the same name, the later node's variable silently overwrites the earlier one.

Wrong example:

Node "llm_analysis" outputs variable result (string)
Node "code_parser" also outputs variable result (dict)
Downstream nodes referencing result — which one do they get?

Correct approach:

Use meaningful, unique variable names per node
Examples: analysis_text, parsed_json rather than both being named result
In Code nodes, return keys that describe content, not generic names

4.2 Pitfall 2: LLM JSON Parsing Failures

Frequent problem: LLM is asked to output JSON but actually outputs content with markdown code blocks or extra text:

# Actual LLM output (not pure JSON):
Sure, here is the analysis:
```json
{"score": 85, "verdict": "recommend"}
```
I hope this helps!

Direct json.loads() will fail.

Robust JSON extraction function:

import json
import re

def extract_json(llm_output: str) -> dict:
    """Extract JSON from LLM output, handling various formats"""
    # Method 1: Direct parse (ideal case)
    try:
        return json.loads(llm_output.strip())
    except json.JSONDecodeError:
        pass

    # Method 2: Extract JSON from markdown code block
    code_block_pattern = r'```(?:json)?\s*([\s\S]*?)\s*```'
    matches = re.findall(code_block_pattern, llm_output)
    if matches:
        try:
            return json.loads(matches[0])
        except json.JSONDecodeError:
            pass

    # Method 3: Find first complete JSON object with regex
    json_pattern = r'\{[^{}]*(?:\{[^{}]*\}[^{}]*)*\}'
    matches = re.findall(json_pattern, llm_output, re.DOTALL)
    for match in matches:
        try:
            return json.loads(match)
        except json.JSONDecodeError:
            continue

    raise ValueError(f"Cannot extract JSON from LLM output: {llm_output[:200]}")

Root solution: Use models that support Structured Output (GPT-4o, Claude 3.5 Sonnet), and enable JSON Schema output mode in the Dify node configuration — eliminating format errors at the source.
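
As an illustration, a JSON Schema for the analysis object from section 1.4 could look like the following, written here as a Python dictionary. The schema itself is an assumption derived from the prompt in that section, and where it is entered in the node configuration depends on the Dify version and the model provider.

```python
import json

# Field names follow the prompt in section 1.4; the constraints are illustrative.
VERDICT_SCHEMA = {
    "type": "object",
    "properties": {
        "score": {"type": "integer", "minimum": 0, "maximum": 100},
        "highlights": {"type": "array", "items": {"type": "string"}},
        "concerns": {"type": "array", "items": {"type": "string"}},
        "verdict": {"type": "string", "enum": ["recommend", "reject"]},
    },
    "required": ["score", "highlights", "concerns", "verdict"],
    "additionalProperties": False,
}

def check_required_fields(payload: dict) -> list[str]:
    """Tiny sanity check (not a full JSON Schema validator): list missing required keys."""
    return [key for key in VERDICT_SCHEMA["required"] if key not in payload]

schema_text = json.dumps(VERDICT_SCHEMA, indent=2)  # JSON text form of the schema
missing = check_required_fields({"score": 85, "verdict": "recommend"})
```

Even with a schema enforced by the model, keeping a cheap check like check_required_fields in the downstream Code node guards against provider or configuration changes.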

4.3 Pitfall 3: Unhandled Workflow Timeouts

Problem: A node in the workflow (usually an external HTTP request or large file processing) times out, the entire workflow hangs, and the API caller waits indefinitely.

Dify timeout configuration:

Each HTTP Request node can set its own timeout (recommend 30s)
LLM node timeout is indirectly controlled via max_tokens (fewer tokens = faster)
Workflow global timeout defaults to 200s, adjustable in workflow settings

Client-side timeout handling:

import os
import requests
from requests.exceptions import Timeout

# Assumption: the API key is supplied via an environment variable of this name
API_KEY = os.environ["DIFY_API_KEY"]

def safe_run_workflow(inputs: dict, timeout: int = 60) -> dict:
    try:
        response = requests.post(
            "https://api.dify.ai/v1/workflows/run",
            headers={"Authorization": f"Bearer {API_KEY}"},
            # "user" is included to match the earlier API examples
            json={"inputs": inputs, "response_mode": "blocking", "user": "user-001"},
            timeout=timeout  # Client-side timeout
        )
        return response.json()

    except Timeout:
        return {
            "status": "timeout",
            "error": f"Workflow timed out (>{timeout}s), please try again later"
        }
    except Exception as e:
        return {"status": "error", "error": str(e)}

Async processing approach: For workflows that take more than 60 seconds, use streaming mode so the caller receives real-time progress rather than waiting blindly for the final result.

4.4 Complexity Control: When Should You Split a Workflow?

Don't try to do everything in one workflow. Consider splitting when any of these thresholds is exceeded:

More than 20 nodes
Branch depth greater than 4 levels
Single execution takes more than 30 seconds
Multiple business scenarios share the same logic (extract as sub-workflow)
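
These thresholds are easy to turn into a review checklist. A small sketch, assuming the metrics are collected by hand or from run logs; the WorkflowStats fields and the reasons_to_split helper are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class WorkflowStats:
    node_count: int
    max_branch_depth: int
    typical_runtime_seconds: float
    shares_logic_with_other_scenarios: bool = False

def reasons_to_split(stats: WorkflowStats) -> list[str]:
    """Return the thresholds from the list above that this workflow exceeds."""
    reasons = []
    if stats.node_count > 20:
        reasons.append("more than 20 nodes")
    if stats.max_branch_depth > 4:
        reasons.append("branch depth greater than 4")
    if stats.typical_runtime_seconds > 30:
        reasons.append("single execution longer than 30 seconds")
    if stats.shares_logic_with_other_scenarios:
        reasons.append("logic shared across scenarios")
    return reasons

reasons = reasons_to_split(
    WorkflowStats(node_count=27, max_branch_depth=3, typical_runtime_seconds=42.0)
)
```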

How to split: Dify supports calling another workflow from within a workflow (sub-workflow call), enabling modular reuse.

Main workflow:
  Start → Validate → Call Sub-workflow A → Call Sub-workflow B → Merge output → End

Sub-workflow A (Resume Analysis):
  Complete independent analysis process

Sub-workflow B (Email Generation):
  Complete independent email generation process

This keeps the main workflow clean while sub-workflows can be independently tested and iterated.

Chapter Summary

Workflows are Dify's most flexible capability; mastering them requires understanding several core concepts:

Node selection principle: Use Code nodes for data processing (precise, testable); LLM nodes for semantic understanding; HTTP nodes for external integration; Knowledge Retrieval nodes for search.

Variable system: Use meaningful names, avoid generic ones (result, output); remember that variables inside a branch are invisible outside it.

Conditional branching: Convert complex conditions to boolean values in Code nodes first, then use IF/ELSE nodes; avoid overly complex condition expressions directly in IF/ELSE nodes.

Trigger methods: Use UI for development/testing; API calls for production integration; streaming mode for real-time scenarios.

Key checklist:

Workflow inputs have a validation node (length, format, required fields)
LLM JSON output has robust parsing code
HTTP nodes have timeout configured
Conditional branches cover all possible paths (including error cases)
Variables are semantically named with no naming conflicts
Workflow API callers implement timeout and error handling
