Chapter 13

Agent Architecture: ReAct vs Function Calling vs Plan-and-Execute

A deep dive into the three dominant Agent reasoning paradigms — their internal mechanics, ideal use cases, and production trade-offs — so you can choose the right architecture for any business scenario.

Chapter Overview

When we say an AI system is an "Agent," what is it actually doing? How does it decide its next action? How does it decompose a vague user request into a sequence of executable tool calls? The answers lie in three core reasoning paradigms: ReAct (Reasoning + Acting), Function Calling (structured tool invocation), and Plan-and-Execute (plan first, then execute).

This chapter systematically analyzes each paradigm's principles, how Dify implements them, their performance characteristics, and how to make sound architectural choices in production. After reading this chapter, you will be able to:

Understand the reasoning loop of each paradigm
Choose the right paradigm based on task complexity and cost constraints
Diagnose common Agent reasoning failures
Configure and debug different Agent modes in Dify

Level 1: Foundational Knowledge (1–3 Years of Experience)

What Is an Agent Reasoning Paradigm?

Think of an AI Agent as an employee. When a manager (the user) assigns a task, the employee must figure out how to complete it. Different employees work differently:

Employee 1 (ReAct style): Alternates between thinking and acting. Given "Check Beijing's weather and tell me if I need an umbrella," they think "I need to check the weather," then check, see "Sunny, 25°C," then think "No umbrella needed," and finally give the answer.

Employee 2 (Function Calling style): Tells the system which tool to call, waits for the result, then gives the answer. They work like a precise dispatcher: "Call the weather API" → result arrives → instant recommendation.

Employee 3 (Plan-and-Execute style): Plans completely before acting. Given a complex task, they first write a plan: "Step 1: check weather, Step 2: check calendar, Step 3: synthesize, Step 4: advise," then execute step by step.

ReAct Paradigm Introduction

ReAct comes from the 2022 paper "ReAct: Synergizing Reasoning and Acting in Language Models." Its core idea is to interleave Reasoning and Acting in a cycle:

Thought → Action → Observation → Thought → ... → Final Answer

In Dify, when you select a ReAct-mode Agent, the model generates a thought before each tool call, then decides which tool to invoke.

A real ReAct loop example:

User: Analyze Microsoft's recent stock trend and give investment advice.

Thought: I need to get Microsoft's stock data.
Action: search_stock_price
Action Input: {"symbol": "MSFT", "period": "30d"}
Observation: [30-day data returned: price rose from 380 to 420]

Thought: Clear uptrend, ~10.5% gain (380 → 420). I also need recent news to understand why.
Action: search_news
Action Input: {"query": "Microsoft AI 2024"}
Observation: [News: Copilot sales strong, AI service revenue growing]

Thought: I have enough data to give analysis now.
Final Answer: Microsoft's stock rose ~10.5% over 30 days, driven by strong AI (Copilot) revenue...

Function Calling Paradigm Introduction

OpenAI introduced Function Calling in June 2023 (for GPT-4 and GPT-3.5 Turbo), and it has since become a de facto industry standard. Unlike ReAct, the reasoning is more structured: instead of free text, the model returns a structured tool-call message telling the system which function to call and with what parameters:

{
  "tool_calls": [
    {
      "id": "call_abc123",
      "type": "function",
      "function": {
        "name": "search_stock_price",
        "arguments": "{\"symbol\": \"MSFT\", \"period\": \"30d\"}"
      }
    }
  ]
}
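In the response above, arguments arrives as a JSON-encoded string rather than an object, so it has to be decoded before the tool runs. A minimal sketch of how a runtime might decode each call and dispatch it to a local Python function; the TOOLS registry, the search_stock_price stub and dispatch_tool_calls are hypothetical stand-ins, not Dify code:

```python
import json
from typing import Any, Callable

# Hypothetical local tool; a real one would call a market-data API.
def search_stock_price(symbol: str, period: str) -> dict:
    return {"symbol": symbol, "period": period, "prices": [380, 420]}

TOOLS: dict[str, Callable[..., Any]] = {"search_stock_price": search_stock_price}

def dispatch_tool_calls(tool_calls: list[dict]) -> list[dict]:
    """Run each requested tool and build the tool-role reply messages."""
    replies = []
    for call in tool_calls:
        fn_name = call["function"]["name"]
        # "arguments" is a JSON string, so decode it into keyword arguments
        args = json.loads(call["function"]["arguments"])
        fn = TOOLS.get(fn_name)
        if fn is None:
            content = json.dumps({"error": f"unknown tool: {fn_name}"})
        else:
            content = json.dumps(fn(**args))
        # Each reply must carry the id of the call it answers
        replies.append({"role": "tool", "tool_call_id": call["id"], "content": content})
    return replies

response = {
    "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
            "name": "search_stock_price",
            "arguments": "{\"symbol\": \"MSFT\", \"period\": \"30d\"}",
        },
    }]
}
print(dispatch_tool_calls(response["tool_calls"]))
```

Returning an error payload for an unknown name, instead of raising, keeps the conversation going so the model can pick a valid tool on its next turn.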

When Dify works with models that support Function Calling (GPT-4, Claude 3), it automatically uses this mode — more efficient and structurally reliable.

Plan-and-Execute Paradigm Introduction

Plan-and-Execute is a two-phase paradigm:

Phase 1 (Planning): A powerful model generates a complete execution plan.

Task: Write a comprehensive competitor analysis report

Plan:
1. Search for competitor companies (use web_search tool)
2. Get detailed info for each competitor (use company_info tool)
3. Compare key metrics (use data_analysis tool)
4. Generate report draft (use write_report tool)
5. Review and polish the report

Phase 2 (Execution): Execute each step in order, potentially using smaller, faster models.

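This approach suits complex, multi-step tasks: the whole plan exists before any tool runs, so missing steps or poor tool choices can be spotted during planning, and execution does not need a full reasoning call between steps.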
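The two phases can be sketched in a few lines of Python. In this hypothetical example the planner is a stub that returns a fixed plan (in practice it would be a call to a strong model), and each step names a tool from a local registry; none of these names come from Dify:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    id: int
    tool: str
    input: str

# Hypothetical tools standing in for web_search, company_info, etc.
TOOLS: dict[str, Callable[[str], str]] = {
    "web_search": lambda q: f"competitors of {q}: A, B",
    "company_info": lambda name: f"profile of {name}",
}

def plan(task: str) -> list[Step]:
    """Phase 1: a strong model would produce this plan; here it is hard-coded."""
    return [Step(1, "web_search", "Acme"), Step(2, "company_info", "A")]

def execute(steps: list[Step]) -> list[str]:
    """Phase 2: run each step in order, with no model call between steps."""
    return [TOOLS[s.tool](s.input) for s in steps]

results = execute(plan("Write a competitor analysis for Acme"))
print(results)
```

The weakness is visible in the sketch: execute never looks at a result before running the next step, so a bad plan is carried out faithfully.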

Selecting an Agent Type in Dify

When creating an Agent application in Dify, the "Reasoning Mode" setting determines which paradigm is used:

Function Call mode: For GPT-4, Claude 3, and other models with native tool-call support
ReAct mode: Works with any model, including those without native Function Calling support
Custom prompt: Advanced users can fully control the reasoning logic

# Illustrative Agent configuration (simplified; not the exact schema of Dify's exported DSL)
agent:
  mode: function_call   # or: react
  tools:
    - type: built_in
      tool_name: web_search
    - type: api
      api_name: stock_api
  max_iterations: 10
  early_stopping: true

Level 2: Mechanism Deep Dive (3–5 Years of Experience)

How ReAct Works Internally

ReAct relies on a special prompt template that guides the model to output in a specific format:

You are an intelligent assistant with access to the following tools:
{tool_list}

Respond in this format:
Thought: [your analysis of the current situation]
Action: tool_name
Action Input: {parameter_json}
Observation: [tool result — filled in by the system]
... (repeat Thought/Action/Observation as needed)
Final Answer: [your answer to the user]
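A minimal sketch of how such a prompt might be rendered from a tool registry. build_react_system_prompt is the helper name the execution loop calls; this implementation, the one-line-per-tool layout and the spec dictionary shape are assumptions, not Dify's actual template:

```python
import json

REACT_TEMPLATE = """You are an intelligent assistant with access to the following tools:
{tool_list}

Respond in this format:
Thought: [your analysis of the current situation]
Action: tool_name
Action Input: {{parameter_json}}
Observation: [tool result, filled in by the system]
... (repeat Thought/Action/Observation as needed)
Final Answer: [your answer to the user]"""

def build_react_system_prompt(tools: dict[str, dict]) -> str:
    """Render one line per tool: name, description, and JSON parameter schema."""
    lines = [
        f"- {name}: {spec['description']} Parameters: {json.dumps(spec['parameters'])}"
        for name, spec in tools.items()
    ]
    # Doubled braces in the template survive str.format as literal braces
    return REACT_TEMPLATE.format(tool_list="\n".join(lines))

tools = {
    "get_weather": {
        "description": "Get current weather.",
        "parameters": {"type": "object", "properties": {"city": {"type": "string"}}},
    }
}
print(build_react_system_prompt(tools))
```

Listing the JSON schema next to each tool name gives the model the same information Function Calling receives natively, just as plain text.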

A simplified sketch of Dify's ReAct execution loop (llm, build_react_system_prompt and parse_react_output stand in for the model client and the prompt and parsing helpers):

def react_agent_loop(query: str, tools: dict, max_iterations: int = 10):
    messages = [
        {"role": "system", "content": build_react_system_prompt(tools)},
        {"role": "user",   "content": query}
    ]

    for _ in range(max_iterations):
        # Stop sequences (ASCII and full-width colon) halt the model before it
        # writes "Observation:", so the runtime can inject the real tool result
        response = llm.call(messages, stop=["\nObservation:", "\nObservation："])
        parsed   = parse_react_output(response.content)  # keys: final_answer, action, action_input

        if parsed["final_answer"]:
            return parsed["final_answer"]

        if parsed["action"]:
            try:
                # An unknown tool name raises KeyError and is reported as an observation
                obs = tools[parsed["action"]].run(parsed["action_input"])
            except Exception as e:
                obs = f"Tool error: {e}"

            # Keep the model's Thought/Action text, then feed the observation back
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user",      "content": f"Observation: {obs}"})
        else:
            return response.content  # neither action nor final answer: return raw text

    return "Max iterations reached"

The stop-sequence trick: Dify passes stop sequences to the LLM, so generation halts as soon as the model starts to write "Observation:". Without them, the model would invent its own observation and keep going; with them, Dify gets the chance to run the tool and inject the real result before asking the model to continue. This is the core mechanism that makes ReAct work.
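The effect can be demonstrated without a real model. This sketch assumes a scripted fake model whose first raw output includes a hallucinated Observation line; apply_stop mimics what a completion API does with a stop parameter (cutting the text at the earliest stop string), and the loop then injects the real tool result. All names here are hypothetical:

```python
import json
import re

STOP = ["\nObservation:"]

def apply_stop(text: str, stops: list[str]) -> str:
    """Mimic an API stop parameter: cut text at the earliest stop sequence."""
    cut = len(text)
    for s in stops:
        i = text.find(s)
        if i != -1:
            cut = min(cut, i)
    return text[:cut]

def get_weather(city: str) -> str:
    return "Sunny, 25°C"  # hypothetical tool returning the real result

# Scripted raw model outputs; the first one hallucinates an observation.
SCRIPT = [
    'Thought: I need the weather.\nAction: get_weather\n'
    'Action Input: {"city": "Beijing"}\nObservation: Rainy, 10°C',
    "Thought: It is sunny.\nFinal Answer: No umbrella needed.",
]

def run(query: str, scripted_outputs: list[str], max_iterations: int = 5):
    outputs = iter(scripted_outputs)
    messages = [{"role": "user", "content": query}]
    for _ in range(max_iterations):
        text = apply_stop(next(outputs), STOP)  # what the API would return
        if m := re.search(r"Final Answer:\s*(.*)", text, re.DOTALL):
            return m.group(1).strip(), messages
        m = re.search(r"Action:\s*(\S+)\s*\nAction Input:\s*(.*)", text, re.DOTALL)
        if not m:
            return text, messages
        obs = {"get_weather": get_weather}[m.group(1)](**json.loads(m.group(2)))
        messages.append({"role": "assistant", "content": text})
        messages.append({"role": "user", "content": f"Observation: {obs}"})
    return "Max iterations reached", messages

answer, history = run("Do I need an umbrella in Beijing?", SCRIPT)
print(answer)
for msg in history:
    print(msg["role"], "|", msg["content"])
```

The hallucinated "Rainy, 10°C" never reaches the history; the only observation the model sees is the one the tool produced.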

Function Calling Protocol Details

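Function Calling tool definitions follow OpenAI's format, which other providers mirror closely (Anthropic's API, for example, uses input_schema where OpenAI uses parameters). An OpenAI-style tool definition: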

{
  "type": "function",
  "function": {
    "name": "get_current_weather",
    "description": "Get the current weather for a specified city",
    "parameters": {
      "type": "object",
      "properties": {
        "city": {
          "type": "string",
          "description": "City name, e.g. 'Beijing' or 'Shanghai'"
        },
        "unit": {
          "type": "string",
          "enum": ["celsius", "fahrenheit"],
          "description": "Temperature unit"
        }
      },
      "required": ["city"]
    }
  }
}
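Writing these schemas by hand is error-prone. One common approach, sketched here under the assumption that tool functions use simple type hints, is to derive the definition from a Python function's signature with inspect. The tool_schema helper is hypothetical; its TYPE_MAP covers only a few basic types plus Literal enums, and it does not generate per-parameter descriptions:

```python
import inspect
from typing import Literal, get_args, get_origin

TYPE_MAP = {str: "string", int: "integer", float: "number", bool: "boolean"}

def tool_schema(fn) -> dict:
    """Build an OpenAI-style tool definition from a typed Python function."""
    props, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        ann = param.annotation
        if get_origin(ann) is Literal:
            # Literal["a", "b"] becomes a string enum
            props[name] = {"type": "string", "enum": list(get_args(ann))}
        else:
            # Unknown or missing annotations fall back to "string"
            props[name] = {"type": TYPE_MAP.get(ann, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value means the model must supply it
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": inspect.getdoc(fn) or "",
            "parameters": {"type": "object", "properties": props, "required": required},
        },
    }

def get_current_weather(city: str, unit: Literal["celsius", "fahrenheit"] = "celsius") -> dict:
    """Get the current weather for a specified city"""
    return {}

print(tool_schema(get_current_weather))
```

Deriving the schema from the function keeps the definition the model sees in sync with the code that actually runs.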

Parallel Tool Calls:

GPT-4 Turbo and later OpenAI models, as well as Claude 3, can return multiple tool calls in a single response. This is a major advantage over ReAct, whose format produces one Action per step:

# Model returns multiple tool calls in a single response
tool_calls = [
    {"id": "call_001", "function": {"name": "get_weather",       "arguments": '{"city":"Beijing"}'}},
    {"id": "call_002", "function": {"name": "get_weather",       "arguments": '{"city":"Shanghai"}'}},
    {"id": "call_003", "function": {"name": "get_exchange_rate", "arguments": '{"from":"USD","to":"CNY"}'}},
]

# The runtime can then execute independent calls concurrently
import asyncio
import json

async def execute_parallel_tools(tool_calls: list) -> list:
    # execute_tool is an assumed async helper that runs one tool by name
    tasks = [
        (call["id"], asyncio.create_task(
            execute_tool(call["function"]["name"],
                         json.loads(call["function"]["arguments"]))
        ))
        for call in tool_calls
    ]
    # All tasks are already running; awaiting them in order keeps each
    # result paired with its tool_call_id
    return [
        {"tool_call_id": cid, "role": "tool", "content": str(await task)}
        for cid, task in tasks
    ]

# Impact (assuming ~500ms per call): 3 serial calls = ~1500ms → concurrent ≈ slowest call + overhead ≈ 600ms (~60% faster)
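The latency claim is easy to check with simulated tools. This sketch assumes each tool takes about 0.2 s (asyncio.sleep stands in for network I/O) and compares awaiting the calls one by one with asyncio.gather; it measures only wall-clock time and calls no real API:

```python
import asyncio
import time

async def fake_tool(name: str, delay: float = 0.2) -> str:
    await asyncio.sleep(delay)  # stands in for a network-bound tool call
    return f"{name} done"

async def serial(names: list[str]) -> list[str]:
    # Each call starts only after the previous one finishes
    return [await fake_tool(n) for n in names]

async def concurrent(names: list[str]) -> list[str]:
    # gather runs all calls at once and returns results in input order
    return await asyncio.gather(*(fake_tool(n) for n in names))

def timed(coro_fn, names):
    start = time.perf_counter()
    results = asyncio.run(coro_fn(names))
    return results, time.perf_counter() - start

names = ["weather_beijing", "weather_shanghai", "fx_usd_cny"]
serial_results, t_serial = timed(serial, names)
parallel_results, t_parallel = timed(concurrent, names)
print(f"serial: {t_serial:.2f}s, concurrent: {t_parallel:.2f}s")
```

Concurrency only helps when the calls are independent; if one call needs another's output, those two still have to run in sequence.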

Plan-and-Execute Dual-Model Architecture

In Dify, Plan-and-Execute is typically implemented as a Workflow. A schematic sketch (simplified node definitions, not Dify's exported DSL):

workflow:
  nodes:
    - id: planner
      type: llm
      model: gpt-4          # Strong model for planning
      prompt: |
        You are a task planning expert. Decompose the following task into steps (JSON):
        {{user_task}}
        Output: {"steps": [{"id": 1, "action": "...", "tool": "...", "input": "..."}]}

    - id: executor
      type: iteration
      # The planner returns text: parse it into an array (e.g. in a Code node) before iterating
      iterator: "{{planner.output.steps}}"
      nodes:
        - id: step_exec
          type: tool
          tool_name: "{{item.tool}}"
          tool_input: "{{item.input}}"

    - id: synthesizer
      type: llm
      model: gpt-3.5-turbo  # Smaller model to save cost
      prompt: |
        Based on these execution results, generate the final answer:
        {{executor.outputs}}
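The planner node returns text, so its JSON has to be parsed and sanity-checked before anything can iterate over the steps; models also sometimes wrap JSON in Markdown code fences. A minimal, hypothetical parser of the kind that could run in a code step between the planner and the executor (the required keys mirror the planner prompt above):

```python
import json

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable

def parse_plan(text: str, allowed_tools: set[str]) -> list[dict]:
    """Extract {"steps": [...]} from planner output and validate each step."""
    text = text.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (e.g. a json-tagged fence) and the closing fence
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    plan = json.loads(text)
    steps = plan.get("steps")
    if not isinstance(steps, list) or not steps:
        raise ValueError("plan must contain a non-empty 'steps' list")
    for step in steps:
        if not isinstance(step, dict):
            raise ValueError("each step must be a JSON object")
        missing = {"id", "action", "tool", "input"} - set(step)
        if missing:
            raise ValueError(f"step {step.get('id')} missing {sorted(missing)}")
        if step["tool"] not in allowed_tools:
            raise ValueError(f"step {step['id']} uses unknown tool {step['tool']!r}")
    return steps
```

Rejecting an unknown tool at this point, before any step runs, is one of the advantages the planning phase offers: the error can be sent back to the planner instead of surfacing halfway through execution.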

Performance Comparison

Production benchmark (GPT-4, 5 tools defined, avg 3 tool calls/task):

Metric	ReAct	Function Calling	Plan-and-Execute
First-response latency	High	Low	Very High
Parallel tool calls	No	Yes (GPT-4 Turbo+/Claude 3)	Yes (planned)
Tokens/task	~2,100	~980	~3,200
Est. cost/task (at ~$0.02 per 1K tokens)	$0.042	$0.020	$0.064
Complex task handling	Medium	Medium	Strong
Mid-task plan adjustment	Strong	Weak	Weak
Model compatibility	All models	Requires FC support	Requires strong model

Level 3: Source Code and Principles (5+ Years of Experience)

Dify Agent Module Architecture

A simplified map of the agent module (file names and layout vary across Dify versions):

api/core/agent/
├── base_agent_runner.py          # Abstract base class
├── react/
│   ├── react_agent_runner.py     # ReAct implementation
│   └── react_multi_dataset_router_agent_runner.py
├── fc/
│   ├── fc_agent_runner.py        # Function Calling implementation
│   └── parallel_fc_runner.py     # Parallel FC executor
└── agent_factory.py              # Factory: picks runner based on config

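BaseAgentRunner core interface (simplified for illustration; the tool objects' invoke() signature is assumed):

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Generator, Optional

@dataclass
class ToolResult:
    success: bool
    output:  Any = None
    error:   Optional[str] = None

class BaseAgentRunner(ABC):

    def __init__(self, tenant_id, app_config, model_config, tools, agent_config):
        self.tools    = {t.name: t for t in tools}
        self.max_iter = agent_config.max_iterations or 10

    @abstractmethod
    def run(self, query: str, message, conversation) -> Generator:
        raise NotImplementedError

    def _token_usage_exceeded(self) -> bool:
        return False  # placeholder: plug in a real per-task token budget check

    def _should_continue(self, iteration: int) -> bool:
        return iteration < self.max_iter and not self._token_usage_exceeded()

    def _invoke_tool(self, name: str, params: dict) -> ToolResult:
        if name not in self.tools:
            # Report rather than raise, so the model can choose a valid tool next turn
            return ToolResult(success=False, error=f"Unknown tool: {name}")
        try:
            result = self.tools[name].invoke(params, timeout=30)
            return ToolResult(success=True, output=result)
        except TimeoutError:
            return ToolResult(success=False, error="Tool timed out")
        except Exception as e:
            return ToolResult(success=False, error=str(e))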

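ReAct Runner: stop sequences and parsing (simplified; ReactParsed, ToolCall, AgentStep and AgentFinish are plain data classes, and the _build_initial_messages, _invoke_llm, _append_step and _extract_thought helpers are omitted):

import json
import re
from typing import Generator

class ReactAgentRunner(BaseAgentRunner):

    # ASCII colon, full-width colon, and the Chinese "观察：" ("Observation:") variant
    STOP_SEQUENCES = ["\nObservation:", "\nObservation：", "\n观察："]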

    def run(self, query, message, conversation) -> Generator:
        iteration = 0
        messages  = self._build_initial_messages(query)

        while self._should_continue(iteration):
            iteration += 1
            response  = self._invoke_llm(messages, stop=self.STOP_SEQUENCES)
            parsed    = self._parse(response.content)

            if parsed.is_final_answer:
                yield AgentFinish(output=parsed.final_answer)
                return

            if parsed.tool_call:
                result = self._invoke_tool(parsed.tool_call.name,
                                           parsed.tool_call.input)
                yield AgentStep(thought=parsed.thought,
                                action=parsed.tool_call,
                                observation=result)
                messages = self._append_step(messages, parsed, result)
            else:
                yield AgentFinish(output=response.content)
                return

    def _parse(self, text: str):
        # English and Chinese ("最终答案" = "Final Answer") markers, ASCII or full-width colon
        for pat in [r'Final Answer[：:]\s*(.*)',
                    r'最终答案[：:]\s*(.*)']:
            m = re.search(pat, text, re.DOTALL | re.IGNORECASE)
            if m:
                return ReactParsed(is_final_answer=True,
                                   final_answer=m.group(1).strip())

        # "行动" / "行动输入" = "Action" / "Action Input"; \s* tolerates trailing spaces after the name
        for pat in [r'Action[：:]\s*(\S+)\s*\nAction Input[：:]\s*(.*)',
                    r'行动[：:]\s*(\S+)\s*\n行动输入[：:]\s*(.*)']:
            m = re.search(pat, text, re.DOTALL)
            if m:
                try:
                    ainput = json.loads(m.group(2).strip())
                except json.JSONDecodeError:
                    # Non-JSON input: pass the raw string through under a generic key
                    ainput = {"input": m.group(2).strip()}
                return ReactParsed(
                    thought=self._extract_thought(text),
                    tool_call=ToolCall(name=m.group(1).strip(), input=ainput)
                )

        # No recognizable marker: treat the whole output as the final answer
        return ReactParsed(is_final_answer=True, final_answer=text)

Complete Function Calling Message Flow

# Full conversation history structure for Function Calling
history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user",   "content": "What's the weather in Beijing and what day is it?"},

    # Model returns tool_calls (no content)
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {"id": "call_abc", "type": "function",
             "function": {"name": "get_weather",      "arguments": '{"city":"Beijing"}'}},
            {"id": "call_def", "type": "function",
             "function": {"name": "get_current_date", "arguments": "{}"}},
        ]
    },

    # Tool results — must match tool_call_id
    {"role": "tool", "tool_call_id": "call_abc",
     "content": '{"weather":"Sunny","temperature":25}'},
    {"role": "tool", "tool_call_id": "call_def",
     "content": '{"date":"2024-03-15","weekday":"Friday"}'},

    # Final model response (plain text)
    {"role": "assistant",
     "content": "Beijing is sunny and 25°C today. It's Friday — great day to go out!"},
]
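Every tool_call_id in an assistant message must be answered by a tool-role message before the next model call; the OpenAI Chat Completions API rejects a history where one is missing. A small, hypothetical pre-flight check that reports unanswered ids (it checks only that each id is answered, not the exact message ordering):

```python
def unanswered_tool_calls(history: list[dict]) -> list[str]:
    """Return ids of tool calls that have no matching tool-role reply."""
    pending: list[str] = []
    for msg in history:
        if msg["role"] == "assistant" and msg.get("tool_calls"):
            pending.extend(call["id"] for call in msg["tool_calls"])
        elif msg["role"] == "tool" and msg["tool_call_id"] in pending:
            pending.remove(msg["tool_call_id"])
    return pending

history = [
    {"role": "user", "content": "What's the weather in Beijing and what day is it?"},
    {"role": "assistant", "content": None,
     "tool_calls": [{"id": "call_abc"}, {"id": "call_def"}]},
    {"role": "tool", "tool_call_id": "call_abc", "content": '{"weather":"Sunny"}'},
]
print(unanswered_tool_calls(history))
```

Running a check like this before each request turns an opaque API error into a clear message about which tool result was dropped.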

Token cost measurement (a rough estimate: OpenAI injects tool definitions into the system message in its own syntax, so the provider's exact count differs):

import json
import tiktoken

def measure_fc_token_cost(tools: list, messages: list) -> dict:
    enc          = tiktoken.encoding_for_model("gpt-4")
    tools_tokens = len(enc.encode(json.dumps(tools)))   # tool schemas, sent with every request
    msg_tokens   = sum(len(enc.encode(str(m.get("content") or ""))) for m in messages)
    overhead     = len(tools) * 15   # assumed allowance of ~15 tokens per function definition
    total        = tools_tokens + msg_tokens + overhead
    return {
        "tools_tokens":    tools_tokens,
        "messages_tokens": msg_tokens,
        "overhead_tokens": overhead,
        "total":           total,
        "cost_usd":        round(total / 1000 * 0.03, 5),   # GPT-4 input price: $0.03 per 1K tokens
    }
# Note: 5 tool definitions ≈ 400 tokens, a fixed per-call cost that is often overlooked!

Deep Comparison: ReAct vs Function Calling

Thought-chain quality:

ReAct's explicit Thought text becomes part of the context, which helps keep long tasks logically coherent. Function Calling responses usually carry no explicit reasoning text: the model reasons implicitly over the message history, which can lead to logical jumps on complex tasks.

Error recovery:

# ReAct: model reasons about the error explicitly in Thought
"""
Thought: Let me get OpenAI's stock price.
Action: stock_api
Action Input: {"symbol": "OPENAI"}
Observation: Error — symbol OPENAI not found

Thought: OpenAI is private. I should use Microsoft (MSFT) as a proxy
         since Microsoft is the largest OpenAI investor.
Action: stock_api
Action Input: {"symbol": "MSFT"}
"""

# Function Calling: error returned as tool message, model adapts implicitly
{"role": "tool", "tool_call_id": "call_001",
 "content": "Error: Symbol OPENAI not found"}
# The model has to infer from this error that it should try a different symbol,
# and that reasoning stays invisible, so recovery is harder to audit and can be
# less reliable than ReAct's explicit self-correction

Level 4: Production Pitfalls and Decision-Making (Expert Perspective)

Pitfall 1: ReAct "Hallucinated Tool Names"

The most common ReAct production issue: the model generates a tool name that doesn't exist.

Action: analyze_sentiment_advanced   # only "analyze_sentiment" exists
Action Input: {"text": "..."}

Fix: fuzzy matching by string similarity (difflib's SequenceMatcher ratio, not a true edit distance)

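import logging
from difflib import SequenceMatcher
from typing import Optional

logger = logging.getLogger(__name__)

def validate_and_correct_tool(requested: str, available: dict,
                              threshold: float = 0.75) -> Optional[str]:
    """Return the requested tool name, a close match, or None if nothing is close."""
    if requested in available:
        return requested

    best_name, best_ratio = None, 0.0
    for name in available:
        ratio = SequenceMatcher(None, requested, name).ratio()
        if ratio > best_ratio:
            best_ratio, best_name = ratio, name

    # 0.75 rather than 0.8: "analyze_sentiment_advanced" vs "analyze_sentiment"
    # scores 2*17/(26+17) ≈ 0.79, which a 0.8 threshold would reject
    if best_ratio >= threshold:
        logger.warning(f"Tool '{requested}' not found, correcting to '{best_name}' "
                       f"(similarity {best_ratio:.2f})")
        return best_name
    return None   # caller should report an error Observation listing valid tools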
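When no close match is found, the loop still needs something to tell the model. A sketch, assuming the observation is fed back as plain text, that uses difflib.get_close_matches (built on the same SequenceMatcher ratio) to suggest alternatives and lists the valid tool names; the function name and message wording are assumptions:

```python
from difflib import get_close_matches

def unknown_tool_observation(requested: str, available: list[str]) -> str:
    """Build an Observation that lets the model correct its own tool choice."""
    # Up to 3 candidates scoring at least 0.6, best first
    suggestions = get_close_matches(requested, available, n=3, cutoff=0.6)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    return (f"Error: tool '{requested}' does not exist.{hint} "
            f"Available tools: {', '.join(sorted(available))}.")

print(unknown_tool_observation("analyze_sentiment_advanced",
                               ["analyze_sentiment", "web_search", "translate"]))
```

Returning an error like this can be safer than silent auto-correction, because a near-miss name may still be the wrong tool; the model gets to make the final choice with the valid names in front of it.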

Pitfall 2: Function Calling Parameter Hallucinations

The model generates invalid parameter values — out-of-range dates, non-existent enum values, etc.

# pydantic v2 API (field_validator replaces v1's deprecated @validator)
from datetime import date
from pydantic import BaseModel, ValidationError, ValidationInfo, field_validator

class StockQueryInput(BaseModel):
    symbol:     str
    start_date: date
    end_date:   date
    interval:   str

    @field_validator("symbol")
    @classmethod
    def valid_symbol(cls, v: str) -> str:
        if not v.isupper() or len(v) > 5:
            raise ValueError(f"Invalid ticker: {v}")
        return v

    @field_validator("end_date")
    @classmethod
    def valid_dates(cls, v: date, info: ValidationInfo) -> date:
        # info.data holds fields validated so far (start_date is declared first)
        start = info.data.get("start_date")
        if start is not None and v < start:
            raise ValueError("end_date must be on or after start_date")
        if v > date.today():
            raise ValueError("Cannot query future prices")
        return v

    @field_validator("interval")
    @classmethod
    def valid_interval(cls, v: str) -> str:
        if v not in ["1d", "1wk", "1mo"]:
            raise ValueError(f"interval must be one of 1d/1wk/1mo, got: {v}")
        return v

def safe_invoke(name: str, raw: dict):
    try:
        validated = StockQueryInput(**raw)
        return execute_stock_query(validated)   # execute_stock_query: the actual tool call (assumed)
    except ValidationError as e:
        # Return the validation errors to the model so it can self-correct
        return ToolResult(success=False,
                          error=f"{e.json()} Please fix the parameters and retry.")

Pitfall 3: Plan-and-Execute Stale Plans

When execution hits an unexpected failure, the original plan may no longer be valid.

class AdaptivePlanExecutor:
    # _plan, _execute_step and _synthesize are assumed async helpers
    # (planner LLM call, single-step tool execution, final-answer LLM call)
    def __init__(self, planner, executor, max_replan: int = 2):
        self.planner    = planner
        self.executor   = executor
        self.max_replan = max_replan

    async def execute(self, task: str) -> str:
        plan    = await self._plan(task, {})
        pending = list(plan.steps)   # steps still to run; replaced wholesale on replan
        history = []
        replan_count = 0

        while pending:
            step   = pending.pop(0)
            result = await self._execute_step(step)
            history.append({"step": step, "result": result})

            if not result.success and replan_count < self.max_replan:
                replan_count += 1
                context = {
                    "completed": history,
                    "failed":    step,
                    "reason":    result.error,
                    "remaining": pending
                }
                plan = await self._plan(task, context)
                # Looping over the original plan.steps would silently ignore this new plan
                pending = list(plan.steps)

        return await self._synthesize(task, history)
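A self-contained version with a stub planner and executor makes the replanning behavior testable. All names here are hypothetical; the key detail is that the remaining steps live in a list that is replaced when a new plan arrives, so the loop actually follows the revised plan:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Result:
    success: bool
    output: str = ""
    error: str = ""

async def plan(task: str, context: dict) -> list[str]:
    # Stub planner: the first plan uses a step that will fail; a replan avoids it.
    if context.get("failed") == "fetch_private_ticker":
        return ["fetch_proxy_ticker", "summarize"]
    return ["fetch_private_ticker", "summarize"]

async def execute_step(step: str) -> Result:
    if step == "fetch_private_ticker":
        return Result(False, error="symbol not found")
    return Result(True, output=f"{step} ok")

async def run_adaptive(task: str, max_replan: int = 2) -> list[tuple[str, bool]]:
    steps = await plan(task, {})
    history: list[tuple[str, bool]] = []
    replans = 0
    while steps:
        step = steps.pop(0)
        result = await execute_step(step)
        history.append((step, result.success))
        if not result.success and replans < max_replan:
            replans += 1
            # Replace the remaining steps with the revised plan
            steps = await plan(task, {"failed": step, "reason": result.error,
                                      "remaining": steps, "completed": history})
    return history

print(asyncio.run(run_adaptive("Analyze OpenAI's stock")))
```

The replan budget bounds the loop: once it is spent, failures are recorded and the remaining steps run as planned, so execution always terminates.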

Production Decision Framework

How complex is the task?
│
├─ Simple (single tool call)
│  └─ Function Calling — lowest latency, lowest cost
│
├─ Medium (2–5 steps, conditional branches)
│  ├─ Model supports Function Calling?
│  │  ├─ Yes → Function Calling (parallel tools, efficient)
│  │  └─ No  → ReAct (universal compatibility)
│  └─ Need visible reasoning trace?
│     ├─ Yes → ReAct (transparent, easy to debug)
│     └─ No  → Function Calling
│
└─ Complex (5+ steps, multi-source, report generation)
   ├─ Steps known in advance?
   │  ├─ Yes → Plan-and-Execute (pre-planned, efficient execution)
   │  └─ No  → ReAct (dynamic decision-making)
   └─ Need mid-task plan adjustment?
      ├─ Yes → ReAct or Adaptive Plan-and-Execute
      └─ No  → Plan-and-Execute
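The tree can also be written as a small function, which is handy for documenting the choice next to an app's configuration. The complexity categories and questions are taken from the diagram; the function itself is a hypothetical sketch, and the fallback to ReAct for simple tasks on models without Function Calling support is an added assumption:

```python
from typing import Literal

def choose_paradigm(complexity: Literal["simple", "medium", "complex"],
                    supports_fc: bool = True,
                    need_trace: bool = False,
                    steps_known: bool = False,
                    need_replanning: bool = False) -> str:
    """Encode the decision tree; returns the suggested paradigm."""
    if complexity == "simple":
        # Diagram: Function Calling. ReAct fallback without FC support is assumed.
        return "function_calling" if supports_fc else "react"
    if complexity == "medium":
        # No FC support, or a visible reasoning trace is required: ReAct
        if not supports_fc or need_trace:
            return "react"
        return "function_calling"
    # Complex tasks: the need for mid-task adjustment takes precedence
    if need_replanning:
        return "react_or_adaptive_plan_and_execute"
    return "plan_and_execute" if steps_known else "react"

print(choose_paradigm("complex", steps_known=True))
```

Checking mid-task adjustment before "steps known" is a design choice: a task whose steps are known but whose tools fail often still benefits from replanning.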

Cost comparison (GPT-4 list prices of $0.03 per 1K input tokens and $0.06 per 1K output tokens, avg 3 tool calls/task):

Mode	Input Tokens	Output Tokens	Cost/task	Cost/1K tasks
ReAct	~1,900	~650	$0.0960	$96.00
Function Calling	~800	~290	$0.0414	$41.40
Plan-and-Execute	~2,700	~1,000	$0.1410	$141.00
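To recompute the cost columns for other prices or token counts: cost per task is input tokens times the input rate plus output tokens times the output rate. A minimal sketch; the default rates are GPT-4 (8K) list prices per 1K tokens, the same figures the monitoring code later in this chapter uses, and should be replaced with current pricing for your model:

```python
def cost_per_task(in_tokens: int, out_tokens: int,
                  in_per_1k: float = 0.03, out_per_1k: float = 0.06) -> float:
    """USD cost of one task given token counts and per-1K-token prices."""
    return in_tokens / 1000 * in_per_1k + out_tokens / 1000 * out_per_1k

# Token counts from the comparison table (input, output)
rows = {"ReAct": (1900, 650), "Function Calling": (800, 290), "Plan-and-Execute": (2700, 1000)}
for mode, (tin, tout) in rows.items():
    c = cost_per_task(tin, tout)
    print(f"{mode:<18} ${c:.4f}/task  ${c * 1000:.2f}/1K tasks")
```

Because output tokens cost twice as much as input tokens at these rates, the verbose Thought text in ReAct and the long plans in Plan-and-Execute weigh more heavily than their raw token counts suggest.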

Observability Configuration

from dataclasses import dataclass
from typing import Optional

@dataclass
class AgentMetrics:
    mode:         str
    iterations:   int
    tool_calls:   int
    duration_ms:  float
    in_tokens:    int
    out_tokens:   int
    success:      bool
    error:        Optional[str] = None

class AgentMonitor:
    # `metrics` and `alerts` are placeholders for your metrics and alerting clients
    def record(self, m: AgentMetrics):
        metrics.histogram("agent.duration_ms", m.duration_ms, tags={"mode": m.mode})
        metrics.histogram("agent.iterations",  m.iterations,  tags={"mode": m.mode})  # per-task distribution
        metrics.counter  ("agent.total", tags={"mode": m.mode, "ok": str(m.success)})

        # Alert: >8 iterations suggests a loop or stuck state
        if m.iterations > 8:
            alerts.warn("agent.high_iterations",
                        f"Agent ran {m.iterations} iterations in {m.mode} mode")

        # Alert: single-task cost over threshold (GPT-4 prices: $0.03/1K input, $0.06/1K output)
        cost = (m.in_tokens * 0.03 + m.out_tokens * 0.06) / 1000
        if cost > 0.5:
            alerts.warn("agent.high_cost", f"Single task cost ${cost:.3f}")

Chapter Summary

This chapter systematically analyzed three Agent reasoning paradigms and their production trade-offs:

Key takeaways:

ReAct uses explicit thought-action cycles, ideal for tasks requiring dynamic decision-making and visible reasoning. Higher token cost, universally compatible.
Function Calling provides structured JSON tool invocation with parallel execution support. Lowest cost, requires native model support.
Plan-and-Execute is a two-phase approach suited for structured multi-step tasks. Planning quality determines execution quality; least flexible.

Selection principles:

Default to Function Calling (less than half the cost of ReAct in the estimates above, and the lowest latency)
Choose ReAct when reasoning transparency or dynamic adaptation is needed
Use Plan-and-Execute when the task is complex but steps are well-defined
Always set iteration limits, parameter validation, and cost monitoring in production

Key numbers:

Function Calling costs less than half as much as ReAct and roughly 30% as much as Plan-and-Execute per task
Each tool definition adds roughly 80 fixed tokens per request (about 400 tokens for 5 definitions)
ReAct tasks with >8 iterations should trigger investigation alerts
Parallel Function Calling cuts 3-tool latency from ~1500ms to ~600ms (assuming ~500ms per tool call)

Next chapter: Chapter 14 dives into Dify's tool ecosystem — a complete analysis of all built-in tools and hands-on custom tool development.
