Agent Architecture: ReAct vs Function Calling vs Plan-and-Execute
Chapter 13: Agent Architecture Principles — ReAct vs Function Calling vs Plan-and-Execute
A deep dive into the three dominant Agent reasoning paradigms — their internal mechanics, ideal use cases, and production trade-offs — so you can choose the right architecture for any business scenario.
Chapter Overview
When we say an AI system is an "Agent," what is it actually doing? How does it decide its next action? How does it decompose a vague user request into a sequence of executable tool calls? The answers lie in three core reasoning paradigms: ReAct (Reasoning + Acting), Function Calling (structured tool invocation), and Plan-and-Execute (plan first, then execute).
This chapter systematically analyzes each paradigm's principles, how Dify implements them, their performance characteristics, and how to make sound architectural choices in production. After reading this chapter, you will be able to:
- Understand the reasoning loop of each paradigm
- Choose the right paradigm based on task complexity and cost constraints
- Diagnose common Agent reasoning failures
- Configure and debug different Agent modes in Dify
Level 1: Foundational Knowledge (1–3 Years Experience)
What Is an Agent Reasoning Paradigm?
Think of an AI Agent as an employee. When a manager (the user) assigns a task, the employee must figure out how to complete it. Different employees work differently:
Employee 1 (ReAct style): Thinks and acts alternately. Given "Check Beijing's weather and tell me if I need an umbrella," they think "I need to check the weather," then check, see "Sunny, 25°C," then think "No umbrella needed," and finally give the answer.
Employee 2 (Function Calling style): Tells the system which tool to call, waits for the result, then gives the answer. Like a precise dispatcher: "Call the weather API" → result arrives → instant recommendation.
Employee 3 (Plan-and-Execute style): Plans completely before acting. Given a complex task, they first write a plan: "Step 1: check weather, Step 2: check calendar, Step 3: synthesize, Step 4: advise," then execute step by step.
ReAct Paradigm Introduction
ReAct comes from the 2022 paper "ReAct: Synergizing Reasoning and Acting in Language Models." Its core idea is to interleave Reasoning and Acting in a cycle:
Thought → Action → Observation → Thought → ... → Final Answer
In Dify, when you select a ReAct-mode Agent, the model generates a thought before each tool call, then decides which tool to invoke.
A real ReAct loop example:
User: Analyze Microsoft's recent stock trend and give investment advice.
Thought: I need to get Microsoft's stock data.
Action: search_stock_price({"symbol": "MSFT", "period": "30d"})
Observation: [30-day data returned: price rose from 380 to 420]
Thought: Clear uptrend, ~10.5% gain. I also need recent news to understand why.
Action: search_news({"query": "Microsoft AI 2024"})
Observation: [News: Copilot sales strong, AI service revenue growing]
Thought: I have enough data to give analysis now.
Final Answer: Microsoft's stock rose ~10.5% over 30 days, driven by strong AI (Copilot) revenue...
Function Calling Paradigm Introduction
Function Calling was introduced by OpenAI in June 2023 for its GPT-3.5 Turbo and GPT-4 chat models, and comparable tool-calling interfaces are now offered by most major model providers. Unlike ReAct, the output is structured: instead of free text, the model emits a JSON object that tells the system which function to call and with what parameters:
{
  "tool_calls": [
    {
      "id": "call_abc123",
      "type": "function",
      "function": {
        "name": "search_stock_price",
        "arguments": "{\"symbol\": \"MSFT\", \"period\": \"30d\"}"
      }
    }
  ]
}
When Dify works with models that support Function Calling (GPT-4, Claude 3), it automatically uses this mode — more efficient and structurally reliable.
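To make the dispatch step concrete, here is a minimal sketch of how an application might consume a response shaped like the JSON above. One detail is easy to miss: `arguments` arrives as a JSON-encoded string, so it has to be decoded before the function is called. The `search_stock_price` stub, the `TOOL_REGISTRY` dict, and `dispatch_tool_calls` are hypothetical names used only for illustration; they are not Dify APIs.

```python
import json


def search_stock_price(symbol: str, period: str) -> dict:
    """Hypothetical stub standing in for a real market-data tool."""
    return {"symbol": symbol, "period": period, "start": 380, "end": 420}


# Hypothetical registry mapping tool names to Python callables.
TOOL_REGISTRY = {"search_stock_price": search_stock_price}


def dispatch_tool_calls(response: dict) -> list:
    """Run every tool call in a model response and build the tool messages."""
    results = []
    for call in response.get("tool_calls", []):
        fn = TOOL_REGISTRY[call["function"]["name"]]
        # "arguments" is a JSON string, not an object, so decode it first.
        kwargs = json.loads(call["function"]["arguments"])
        output = fn(**kwargs)
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],  # ties the result back to the request
            "content": json.dumps(output),
        })
    return results


response = {
    "tool_calls": [
        {
            "id": "call_abc123",
            "type": "function",
            "function": {
                "name": "search_stock_price",
                "arguments": "{\"symbol\": \"MSFT\", \"period\": \"30d\"}",
            },
        }
    ]
}
tool_messages = dispatch_tool_calls(response)
```

Each returned message carries the `tool_call_id` of the request it answers, which is what lets the model match results to calls on the next turn.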
Plan-and-Execute Paradigm Introduction
Plan-and-Execute is a two-phase paradigm:
Phase 1 (Planning): A powerful model generates a complete execution plan.
Task: Write a comprehensive competitor analysis report
Plan:
1. Search for competitor companies (use web_search tool)
2. Get detailed info for each competitor (use company_info tool)
3. Compare key metrics (use data_analysis tool)
4. Generate report draft (use write_report tool)
5. Review and polish the report
Phase 2 (Execution): Execute each step in order, potentially using smaller, faster models.
This approach is ideal for complex, multi-step tasks because potential problems can be identified during the planning phase.
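The two phases can be sketched in a few lines. This is only an illustration under stated assumptions: `make_plan` stands in for the planner model and returns a fixed plan, and the two tool functions are hypothetical stubs that pass a string from one step to the next.

```python
from typing import Callable, Dict, List


# Hypothetical tools: each takes the previous step's output and returns a string.
def web_search(prev: str) -> str:
    return "competitors: A, B"


def company_info(prev: str) -> str:
    return prev + " | details fetched"


TOOLS: Dict[str, Callable[[str], str]] = {
    "web_search": web_search,
    "company_info": company_info,
}


def make_plan(task: str) -> List[dict]:
    """Phase 1: stands in for the planner model; returns a fixed plan here."""
    return [
        {"id": 1, "tool": "web_search"},
        {"id": 2, "tool": "company_info"},
    ]


def execute_plan(plan: List[dict]) -> List[str]:
    """Phase 2: run the steps in order, feeding each output to the next step."""
    outputs: List[str] = []
    prev = ""
    for step in plan:
        prev = TOOLS[step["tool"]](prev)
        outputs.append(prev)
    return outputs


plan = make_plan("Write a comprehensive competitor analysis report")
results = execute_plan(plan)
```

Because the whole plan exists before anything runs, it can be inspected (or rejected) up front, which is the property the paragraph above relies on.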
Selecting an Agent Type in Dify
When creating an Agent application in Dify, the "Reasoning Mode" setting determines which paradigm is used:
- Function Call mode: For GPT-4, Claude 3, and other models with native tool-call support
- ReAct mode: Compatible with all models, especially those without Function Calling support
- Custom prompt: Advanced users can fully control the reasoning logic
# Dify Agent configuration (app.yaml excerpt)
agent:
  mode: function_call  # or: react
  tools:
    - type: built_in
      tool_name: web_search
    - type: api
      api_name: stock_api
  max_iterations: 10
  early_stopping: true
Level 2: Mechanism Deep Dive (3–5 Years Experience)
How ReAct Works Internally
ReAct relies on a special prompt template that guides the model to output in a specific format:
You are an intelligent assistant with access to the following tools:
{tool_list}
Respond in this format:
Thought: [your analysis of the current situation]
Action: tool_name
Action Input: {parameter_json}
Observation: [tool result — filled in by the system]
... (repeat Thought/Action/Observation as needed)
Final Answer: [your answer to the user]
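A template like this is filled in at request time. The following sketch shows one way a helper such as `build_react_system_prompt` could render it. It assumes, for simplicity, that `tools` maps each tool name to a one-line description; that shape is an assumption made for this example and may differ from what a real runner passes around.

```python
# Doubled braces survive str.format() as literal single braces.
REACT_TEMPLATE = """You are an intelligent assistant with access to the following tools:
{tool_list}
Respond in this format:
Thought: [your analysis of the current situation]
Action: tool_name
Action Input: {{parameter_json}}
Observation: [tool result, filled in by the system]
... (repeat Thought/Action/Observation as needed)
Final Answer: [your answer to the user]"""


def build_react_system_prompt(tools: dict) -> str:
    """Render the template with one "- name: description" line per tool."""
    tool_list = "\n".join(f"- {name}: {desc}" for name, desc in tools.items())
    return REACT_TEMPLATE.format(tool_list=tool_list)


prompt = build_react_system_prompt({
    "web_search": "Search the web for a query",
    "get_weather": "Get the current weather for a city",
})
```

Listing the exact tool names in the prompt matters because ReAct has no schema enforcement: the model can only pick names it has seen in this text.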
Dify's ReAct execution loop:
def react_agent_loop(query: str, tools: dict, max_iterations: int = 10):
    messages = [
        {"role": "system", "content": build_react_system_prompt(tools)},
        {"role": "user", "content": query},
    ]
    for iteration in range(max_iterations):
        # Stop sequences halt the model before it writes "Observation:",
        # so Dify can inject the real tool result instead.
        # The second entry is the full-width-colon variant some models emit.
        response = llm.call(messages, stop=["\nObservation:", "\nObservation:"])
        parsed = parse_react_output(response.content)
        if parsed["final_answer"]:
            return parsed["final_answer"]
        if parsed["action"]:
            try:
                obs = tools[parsed["action"]].run(parsed["action_input"])
            except Exception as e:
                # Feed the error back as an observation so the model can react
                obs = f"Tool error: {e}"
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": f"Observation: {obs}"})
        else:
            return response.content  # fallback: no action and no final answer
    return "Max iterations reached"
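The stop-sequence trick: Dify passes stop sequences to the LLM so that generation halts just before the model would write its own "Observation:" line. Without this, the model tends to invent a tool result and carry on reasoning from it. Halting at that point lets Dify run the real tool, append the actual result as the observation, and then ask the model to continue. This is the core mechanism that makes ReAct work.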
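The effect of a stop sequence can be simulated without calling a model. In the sketch below, `truncate_at_stop` is a hypothetical helper that mimics what an inference API does when it hits a stop string, and `raw_generation` is an invented example of what a model might produce if left unchecked.

```python
def truncate_at_stop(text: str, stop: list) -> str:
    """Mimic an API stop sequence: cut the text at the earliest stop string."""
    cut = len(text)
    for s in stop:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# Without a stop sequence the model writes its own (made-up) observation.
raw_generation = (
    "Thought: I need the weather.\n"
    "Action: get_weather\n"
    'Action Input: {"city": "Beijing"}\n'
    "Observation: Rainy, 12C\n"
    "Final Answer: Bring an umbrella."
)

# With the stop sequence, output ends right after the Action Input line.
visible = truncate_at_stop(raw_generation, ["\nObservation:"])

# The runtime then runs the real tool and appends the true result.
real_observation = "Sunny, 25C"
next_turn = visible + "\nObservation: " + real_observation
```

The invented "Rainy, 12C" never reaches the context; only the real tool output does.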
Function Calling Protocol Details
Function Calling follows the OpenAI standard. Tool definition format:
{
  "type": "function",
  "function": {
    "name": "get_current_weather",
    "description": "Get the current weather for a specified city",
    "parameters": {
      "type": "object",
      "properties": {
        "city": {
          "type": "string",
          "description": "City name, e.g. 'Beijing' or 'Shanghai'"
        },
        "unit": {
          "type": "string",
          "enum": ["celsius", "fahrenheit"],
          "description": "Temperature unit"
        }
      },
      "required": ["city"]
    }
  }
}
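Because the `parameters` field is ordinary JSON Schema, the same definition can be used to check a call before running it. The sketch below hand-checks only `required` and `enum`, which is far less than a full JSON Schema validator does; `check_arguments` is a hypothetical helper written for this example.

```python
import json

WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a specified city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}


def check_arguments(tool: dict, raw_arguments: str) -> list:
    """Return a list of problems; an empty list means the call looks valid."""
    schema = tool["function"]["parameters"]
    args = json.loads(raw_arguments)
    problems = []
    # Every required parameter must be present.
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required parameter: {name}")
    # Every supplied parameter must be declared and respect its enum, if any.
    for name, value in args.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append(f"unknown parameter: {name}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name} must be one of {spec['enum']}")
    return problems


ok = check_arguments(WEATHER_TOOL, '{"city": "Beijing", "unit": "celsius"}')
bad = check_arguments(WEATHER_TOOL, '{"unit": "kelvin"}')
```

Returning the problem list to the model as the tool result gives it a concrete message to correct against on the next turn.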
Parallel Tool Calls:
GPT-4 and Claude 3 support multiple tool calls in a single response — a major advantage over ReAct:
import asyncio
import json

# Model returns multiple tool calls simultaneously
tool_calls = [
    {"id": "call_001", "function": {"name": "get_weather", "arguments": '{"city":"Beijing"}'}},
    {"id": "call_002", "function": {"name": "get_weather", "arguments": '{"city":"Shanghai"}'}},
    {"id": "call_003", "function": {"name": "get_exchange_rate", "arguments": '{"from":"USD","to":"CNY"}'}},
]

# Dify executes them in parallel
async def execute_parallel_tools(tool_calls: list) -> list:
    # create_task schedules every call immediately, so they run concurrently
    tasks = [
        (call["id"], asyncio.create_task(
            execute_tool(call["function"]["name"],
                         json.loads(call["function"]["arguments"]))
        ))
        for call in tool_calls
    ]
    # Await in request order; each result is paired with its tool_call_id
    return [
        {"tool_call_id": cid, "role": "tool", "content": str(await task)}
        for cid, task in tasks
    ]

# Impact: 3 serial calls × 500ms = 1500ms → parallel ≈ 600ms (60% lower latency)
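The latency effect is easy to reproduce with stand-in tools. In this sketch `fake_tool` is a hypothetical coroutine that only sleeps, and the 50 ms delay is an arbitrary choice to keep the demo fast; real numbers depend on the tools and the network.

```python
import asyncio
import time


async def fake_tool(name: str, delay: float = 0.05) -> str:
    """Hypothetical tool that just waits, standing in for a network call."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def run_serial(names: list) -> list:
    # Each call starts only after the previous one has finished.
    return [await fake_tool(n) for n in names]


async def run_parallel(names: list) -> list:
    # gather runs the coroutines concurrently and preserves input order.
    return await asyncio.gather(*(fake_tool(n) for n in names))


def timed(coro) -> tuple:
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


names = ["get_weather", "get_weather", "get_exchange_rate"]
serial_result, serial_s = timed(run_serial(names))
parallel_result, parallel_s = timed(run_parallel(names))
```

Serial time grows with the sum of the delays, while parallel time tracks the slowest single call plus scheduling overhead, which is why three 500 ms calls land near 600 ms rather than 1500 ms.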
Plan-and-Execute Dual-Model Architecture
In Dify, Plan-and-Execute is typically implemented via Workflow:
workflow:
  nodes:
    - id: planner
      type: llm
      model: gpt-4  # Strong model for planning
      prompt: |
        You are a task planning expert. Decompose the following task into steps (JSON):
        {user_task}
        Output: {"steps": [{"id": 1, "action": "...", "tool": "...", "input": "..."}]}
    - id: executor
      type: iteration
      iterator: "{{planner.output.steps}}"
      nodes:
        - id: step_exec
          type: tool
          tool_name: "{{item.tool}}"
          tool_input: "{{item.input}}"
    - id: synthesizer
      type: llm
      model: gpt-3.5-turbo  # Smaller model to save cost
      prompt: |
        Based on these execution results, generate the final answer:
        {{executor.outputs}}
Performance Comparison
Production benchmark (GPT-4, 5 tools defined, avg 3 tool calls/task):
| Metric | ReAct | Function Calling | Plan-and-Execute |
|---|---|---|---|
| First-response latency | High | Low | Very High |
| Parallel tool calls | No | Yes (GPT-4/Claude 3) | Yes (planned) |
| Tokens/task | ~2,100 | ~980 | ~3,200 |
| Est. cost/task (GPT-4) | $0.042 | $0.020 | $0.064 |
| Complex task handling | Medium | Medium | Strong |
| Mid-task plan adjustment | Strong | Weak | Weak |
| Model compatibility | All models | Requires FC support | Requires strong model |
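The cost row follows from the token row by simple arithmetic. The figures are consistent with a blended rate of about $0.02 per 1K tokens; that rate is inferred from the table for this sketch and is not a published price.

```python
# Tokens per task, taken from the table above.
TOKENS_PER_TASK = {
    "ReAct": 2100,
    "Function Calling": 980,
    "Plan-and-Execute": 3200,
}

# Assumption: blended input/output rate inferred from the table, in USD per 1K tokens.
BLENDED_USD_PER_1K = 0.02


def cost_per_task(tokens: int, usd_per_1k: float = BLENDED_USD_PER_1K) -> float:
    """Estimated cost of one task at a flat per-token rate."""
    return tokens / 1000 * usd_per_1k


costs = {mode: round(cost_per_task(t), 3) for mode, t in TOKENS_PER_TASK.items()}

# Relative saving of Function Calling compared with ReAct.
savings_vs_react = 1 - costs["Function Calling"] / costs["ReAct"]
```

With these inputs Function Calling comes out a little over half the price of ReAct per task, and the gap scales linearly with task volume.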
Level 3: Source Code and Principles (5+ Years Experience)
Dify Agent Module Architecture
api/core/agent/
├── base_agent_runner.py # Abstract base class
├── react/
│ ├── react_agent_runner.py # ReAct implementation
│ └── react_multi_dataset_router_agent_runner.py
├── fc/
│ ├── fc_agent_runner.py # Function Calling implementation
│ └── parallel_fc_runner.py # Parallel FC executor
└── agent_factory.py # Factory: picks runner based on config
BaseAgentRunner core interface:
from abc import ABC, abstractmethod
from typing import Generator

class BaseAgentRunner(ABC):
    def __init__(self, tenant_id, app_config, model_config, tools, agent_config):
        self.tools = {t.name: t for t in tools}
        self.max_iter = agent_config.max_iterations or 10

    @abstractmethod
    def run(self, query: str, message, conversation) -> Generator:
        raise NotImplementedError

    def _should_continue(self, iteration: int) -> bool:
        return iteration < self.max_iter and not self._token_usage_exceeded()

    def _invoke_tool(self, name: str, params: dict):
        if name not in self.tools:
            raise ToolNotFoundError(name)
        try:
            result = self.tools[name].invoke(params, timeout=30)
            return ToolResult(success=True, output=result)
        except TimeoutError:
            return ToolResult(success=False, error="Tool timed out")
        except Exception as e:
            return ToolResult(success=False, error=str(e))
ReAct Runner — stop sequences and parsing:
import json
import re

class ReactAgentRunner(BaseAgentRunner):
    # ASCII-colon and full-width-colon variants, plus the Chinese label
    STOP_SEQUENCES = ["\nObservation:", "\nObservation:", "\n观察:"]

    def run(self, query, message, conversation) -> Generator:
        iteration = 0
        messages = self._build_initial_messages(query)
        while self._should_continue(iteration):
            iteration += 1
            response = self._invoke_llm(messages, stop=self.STOP_SEQUENCES)
            parsed = self._parse(response.content)
            if parsed.is_final_answer:
                yield AgentFinish(output=parsed.final_answer)
                return
            if parsed.tool_call:
                result = self._invoke_tool(parsed.tool_call.name,
                                           parsed.tool_call.input)
                yield AgentStep(thought=parsed.thought,
                                action=parsed.tool_call,
                                observation=result)
                messages = self._append_step(messages, parsed, result)
            else:
                yield AgentFinish(output=response.content)
                return

    def _parse(self, text: str):
        # A final answer takes priority over any action in the same output.
        # [::] accepts either an ASCII or a full-width colon.
        for pat in [r'Final Answer[::]\s*(.*)',
                    r'最终答案[::]\s*(.*)']:
            m = re.search(pat, text, re.DOTALL | re.IGNORECASE)
            if m:
                return ReactParsed(is_final_answer=True,
                                   final_answer=m.group(1).strip())
        for pat in [r'Action[::]\s*(\S+)\nAction Input[::]\s*(.*)',
                    r'行动[::]\s*(\S+)\n行动输入[::]\s*(.*)']:
            m = re.search(pat, text, re.DOTALL)
            if m:
                try:
                    ainput = json.loads(m.group(2).strip())
                except json.JSONDecodeError:
                    # Not valid JSON: pass the raw text as a single input
                    ainput = {"input": m.group(2).strip()}
                return ReactParsed(
                    thought=self._extract_thought(text),
                    tool_call=ToolCall(name=m.group(1).strip(), input=ainput)
                )
        # Nothing matched: treat the whole output as the final answer
        return ReactParsed(is_final_answer=True, final_answer=text)
Complete Function Calling Message Flow
# Full conversation history structure for Function Calling
history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather in Beijing and what day is it?"},
    # Model returns tool_calls (no content)
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {"id": "call_abc", "type": "function",
             "function": {"name": "get_weather", "arguments": '{"city":"Beijing"}'}},
            {"id": "call_def", "type": "function",
             "function": {"name": "get_current_date", "arguments": "{}"}},
        ]
    },
    # Tool results: each must carry the tool_call_id it answers
    {"role": "tool", "tool_call_id": "call_abc",
     "content": '{"weather":"Sunny","temperature":25}'},
    {"role": "tool", "tool_call_id": "call_def",
     "content": '{"date":"2024-03-15","weekday":"Friday"}'},
    # Final model response (plain text)
    {"role": "assistant",
     "content": "Beijing is sunny and 25°C today. It's Friday, a great day to go out!"},
]
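A common integration bug is sending the history back with a tool call left unanswered. The sketch below is a hypothetical pre-flight check, not part of any SDK: it collects every `tool_calls` id requested by the assistant and subtracts the ids that have a matching `tool` message.

```python
def unmatched_tool_calls(history: list) -> set:
    """Return tool_call ids that were requested but never answered."""
    requested, answered = set(), set()
    for msg in history:
        if msg["role"] == "assistant":
            # "tool_calls" may be absent or None on plain-text replies
            for call in msg.get("tool_calls") or []:
                requested.add(call["id"])
        elif msg["role"] == "tool":
            answered.add(msg["tool_call_id"])
    return requested - answered


complete = [
    {"role": "user", "content": "What's the weather in Beijing and what day is it?"},
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_abc", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"city":"Beijing"}'}},
        {"id": "call_def", "type": "function",
         "function": {"name": "get_current_date", "arguments": "{}"}},
    ]},
    {"role": "tool", "tool_call_id": "call_abc", "content": '{"weather":"Sunny"}'},
    {"role": "tool", "tool_call_id": "call_def", "content": '{"weekday":"Friday"}'},
]

# Drop the last tool message to simulate a missing result.
incomplete = complete[:-1]

missing_in_complete = unmatched_tool_calls(complete)
missing_in_incomplete = unmatched_tool_calls(incomplete)
```

Running a check like this before each model call turns a confusing provider-side error into a clear local one.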
Token cost measurement:
import json

import tiktoken

def measure_fc_token_cost(tools: list, messages: list) -> dict:
    enc = tiktoken.encoding_for_model("gpt-4")
    # Tool definitions are serialized into the prompt on every request
    tools_tokens = len(enc.encode(json.dumps(tools)))
    msg_tokens = sum(len(enc.encode(str(m.get("content") or ""))) for m in messages)
    overhead = len(tools) * 15  # ~15 tokens per function definition (OpenAI overhead)
    total = tools_tokens + msg_tokens + overhead
    return {
        "tools_tokens": tools_tokens,
        "messages_tokens": msg_tokens,
        "overhead_tokens": overhead,
        "total": total,
        "cost_usd": round(total / 1000 * 0.03, 5),
    }
# Note: 5 tool definitions ≈ 400 tokens — a fixed per-call cost often overlooked!
Deep Comparison: ReAct vs Function Calling
Thought-chain quality:
ReAct's explicit Thought text becomes part of the context, maintaining logical coherence across long tasks. Function Calling has no explicit reasoning chain — the model reasons implicitly through message history, which can lead to logical jumps on complex tasks.
Error recovery:
# ReAct: model reasons about the error explicitly in Thought
"""
Thought: Let me get OpenAI's stock price.
Action: stock_api
Action Input: {"symbol": "OPENAI"}
Observation: Error — symbol OPENAI not found
Thought: OpenAI is private. I should use Microsoft (MSFT) as a proxy
since Microsoft is the largest OpenAI investor.
Action: stock_api
Action Input: {"symbol": "MSFT"}
"""
# Function Calling: error returned as tool message, model adapts implicitly
{"role": "tool", "tool_call_id": "call_001",
"content": "Error: Symbol OPENAI not found"}
# The model must infer from this error that it needs to try a different symbol
# — less reliable than ReAct's explicit self-correction
Level 4: Production Pitfalls and Decision-Making (Expert Perspective)
Pitfall 1: ReAct "Hallucinated Tool Names"
The most common ReAct production issue: the model generates a tool name that doesn't exist.
Action: analyze_sentiment_advanced # only "analyze_sentiment" exists
Action Input: {"text": "..."}
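Fix: fuzzy matching on string similarity
import logging
from difflib import SequenceMatcher
from typing import Optional

logger = logging.getLogger(__name__)

# Note: "analyze_sentiment_advanced" scores about 0.79 against
# "analyze_sentiment", so a threshold of 0.8 would reject the example above.
def validate_and_correct_tool(requested: str, available: dict,
                              threshold: float = 0.75) -> Optional[str]:
    """Return a valid tool name, or None if nothing is close enough."""
    if requested in available:
        return requested
    best_name, best_ratio = None, 0.0
    for name in available:
        ratio = SequenceMatcher(None, requested, name).ratio()
        if ratio > best_ratio:
            best_ratio, best_name = ratio, name
    if best_ratio >= threshold:
        logger.warning(f"Tool '{requested}' not found, correcting to '{best_name}' "
                       f"(similarity {best_ratio:.2f})")
        return best_name
    return None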
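The threshold deserves a quick sanity check against real names, because `SequenceMatcher.ratio()` penalizes length differences more than intuition suggests. The sketch below scores the hallucinated name from this pitfall and tries two cutoffs using the standard library's `difflib.get_close_matches`; the `available` list is made up for the example.

```python
from difflib import SequenceMatcher, get_close_matches

# Hypothetical set of registered tools.
available = ["analyze_sentiment", "search_news", "search_stock_price"]
requested = "analyze_sentiment_advanced"

# ratio() = 2 * matched_chars / total_chars = 2 * 17 / (26 + 17)
score = SequenceMatcher(None, requested, "analyze_sentiment").ratio()

# get_close_matches returns the best candidates at or above the cutoff.
strict = get_close_matches(requested, available, n=1, cutoff=0.8)
lenient = get_close_matches(requested, available, n=1, cutoff=0.75)
```

Here the score is roughly 0.79, so a cutoff of 0.8 finds nothing while 0.75 recovers `analyze_sentiment`. A lower cutoff also raises the risk of silently mapping to the wrong tool, so the value is worth tuning against the actual tool list.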
Pitfall 2: Function Calling Parameter Hallucinations
The model generates invalid parameter values — out-of-range dates, non-existent enum values, etc.
from datetime import date

# Pydantic v1-style validators; in Pydantic v2, `validator` is deprecated
# in favor of `field_validator`.
from pydantic import BaseModel, ValidationError, validator

class StockQueryInput(BaseModel):
    symbol: str
    start_date: date
    end_date: date
    interval: str

    @validator("symbol")
    def valid_symbol(cls, v):
        if not v.isupper() or len(v) > 5:
            raise ValueError(f"Invalid ticker: {v}")
        return v

    @validator("end_date")
    def valid_dates(cls, v, values):
        # `values` holds the fields validated so far (start_date comes first)
        if "start_date" in values and v < values["start_date"]:
            raise ValueError("end_date must not be before start_date")
        if v > date.today():
            raise ValueError("Cannot query future prices")
        return v

    @validator("interval")
    def valid_interval(cls, v):
        if v not in ["1d", "1wk", "1mo"]:
            raise ValueError(f"interval must be one of 1d/1wk/1mo, got: {v}")
        return v

def safe_invoke(name: str, raw: dict):
    try:
        validated = StockQueryInput(**raw)
        return execute_stock_query(validated)
    except ValidationError as e:
        # Return validation error to model so it can self-correct
        return ToolResult(success=False, error=e.json(),
                          hint="Please fix the parameters and retry")
Pitfall 3: Plan-and-Execute Stale Plans
When execution hits an unexpected failure, the original plan may no longer be valid.
class AdaptivePlanExecutor:
    def __init__(self, planner, executor, max_replan: int = 2):
        self.planner = planner
        self.executor = executor
        self.max_replan = max_replan

    async def execute(self, task: str) -> str:
        plan = await self._plan(task, {})
        # Work from a queue so a new plan can replace the remaining steps.
        # (Reassigning `plan` inside a for-loop over plan.steps would keep
        # iterating the old, stale step list.)
        pending = list(plan.steps)
        history = []
        replan_count = 0
        while pending:
            step = pending.pop(0)
            result = await self._execute_step(step)
            history.append({"step": step, "result": result})
            if not result.success and replan_count < self.max_replan:
                replan_count += 1
                context = {
                    "completed": history,
                    "failed": step,
                    "reason": result.error,
                    "remaining": pending,
                }
                plan = await self._plan(task, context)
                pending = list(plan.steps)  # continue with the revised plan
        return await self._synthesize(task, history)
Production Decision Framework
How complex is the task?
│
├─ Simple (single tool call)
│ └─ Function Calling — lowest latency, lowest cost
│
├─ Medium (2–5 steps, conditional branches)
│ ├─ Model supports Function Calling?
│ │ ├─ Yes → Function Calling (parallel tools, efficient)
│ │ └─ No → ReAct (universal compatibility)
│ └─ Need visible reasoning trace?
│ ├─ Yes → ReAct (transparent, easy to debug)
│ └─ No → Function Calling
│
└─ Complex (5+ steps, multi-source, report generation)
├─ Steps known in advance?
│ ├─ Yes → Plan-and-Execute (pre-planned, efficient execution)
│ └─ No → ReAct (dynamic decision-making)
└─ Need mid-task plan adjustment?
├─ Yes → ReAct or Adaptive Plan-and-Execute
└─ No → Plan-and-Execute
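The tree can be encoded as a small function for use in routing code or tests. This is one possible linearization, with several assumptions of its own: a task of exactly 5 steps is treated as medium, the sibling questions under each branch are checked in the order shown, and simple tasks fall back to ReAct when the model lacks Function Calling support (the tree itself does not cover that case).

```python
def choose_paradigm(steps: int, supports_fc: bool, needs_trace: bool = False,
                    steps_known: bool = True, needs_replan: bool = False) -> str:
    """Pick a reasoning paradigm following the decision tree above."""
    if steps <= 1:
        # Simple: single tool call. ReAct fallback is an added assumption.
        return "Function Calling" if supports_fc else "ReAct"
    if steps <= 5:
        # Medium: ReAct when FC is unavailable or a visible trace is needed.
        if not supports_fc or needs_trace:
            return "ReAct"
        return "Function Calling"
    # Complex: more than 5 steps.
    if not steps_known:
        return "ReAct"
    if needs_replan:
        return "ReAct or Adaptive Plan-and-Execute"
    return "Plan-and-Execute"


simple = choose_paradigm(steps=1, supports_fc=True)
medium_no_fc = choose_paradigm(steps=3, supports_fc=False)
report = choose_paradigm(steps=8, supports_fc=True)
```

Keeping the rule in one function makes it straightforward to log which branch was taken and to revisit the thresholds once real cost and latency data are available.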
Cost comparison (GPT-4, avg 3 tool calls/task):
| Mode | Input Tokens | Output Tokens | Cost/task | Cost/1K tasks |
|---|---|---|---|---|
| ReAct | ~1,900 | ~650 | $0.0420 | $42.00 |
| Function Calling | ~800 | ~290 | $0.0198 | $19.80 |
| Plan-and-Execute | ~2,700 | ~1,000 | $0.0651 | $65.10 |
Observability Configuration
from dataclasses import dataclass
from typing import Optional

@dataclass
class AgentMetrics:
    mode: str
    iterations: int
    tool_calls: int
    duration_ms: float
    in_tokens: int
    out_tokens: int
    success: bool
    error: Optional[str] = None

class AgentMonitor:
    # `metrics` and `alerts` are the application's telemetry clients
    # (not defined in this excerpt).
    def record(self, m: AgentMetrics):
        metrics.histogram("agent.duration_ms", m.duration_ms, tags={"mode": m.mode})
        metrics.gauge("agent.iterations", m.iterations, tags={"mode": m.mode})
        metrics.counter("agent.total", tags={"mode": m.mode, "ok": str(m.success)})
        # Alert: >8 iterations suggests a loop or stuck state
        if m.iterations > 8:
            alerts.warn("agent.high_iterations",
                        f"Agent ran {m.iterations} iterations in {m.mode} mode")
        # Alert: single-task cost over threshold
        # (USD per 1K tokens: 0.03 input, 0.06 output)
        cost = (m.in_tokens * 0.03 + m.out_tokens * 0.06) / 1000
        if cost > 0.5:
            alerts.warn("agent.high_cost", f"Single task cost ${cost:.3f}")
Chapter Summary
This chapter systematically analyzed three Agent reasoning paradigms and their production trade-offs:
Key takeaways:
- ReAct uses explicit thought-action cycles, ideal for tasks requiring dynamic decision-making and visible reasoning. Higher token cost, universally compatible.
- Function Calling provides structured JSON tool invocation with parallel execution support. Lowest cost, requires native model support.
- Plan-and-Execute is a two-phase approach suited for structured multi-step tasks. Planning quality determines execution quality; least flexible.
Selection principles:
- Default to Function Calling (roughly 53% cheaper than ReAct in the cost comparison above, with the lowest latency)
- Choose ReAct when reasoning transparency or dynamic adaptation is needed
- Use Plan-and-Execute when the task is complex but steps are well-defined
- Always set iteration limits, parameter validation, and cost monitoring in production
Key numbers:
- Function Calling costs ~47% of ReAct and ~30% of Plan-and-Execute
- Each tool definition adds ~80 fixed tokens per request
- ReAct tasks with >8 iterations should trigger investigation alerts
- Parallel Function Calling reduces 3-tool latency from ~1500ms to ~600ms
Next chapter: Chapter 14 dives into Dify's tool ecosystem — a complete analysis of all built-in tools and hands-on custom tool development.