ReAct Architecture Deep Dive with Full Code Implementation
Chapter 51: ReAct Architecture In Depth and Code Implementation
Introduction
ReAct (Reasoning + Acting) is a cornerstone of modern AI Agent frameworks. Introduced by Shunyu Yao et al. in a 2022 paper, it changed how people understand the capability boundaries of LLMs: by interleaving "thinking" and "acting," LLMs can solve complex reasoning tasks that previously required specialized training. The core reasoning engine of Hermes Agent is built on the ReAct paradigm. This chapter starts from the paper's principles, dissects the implementation details step by step, and guides you through building a complete ReAct Agent from scratch in Python.
51.1 Core Ideas of the ReAct Paper
Background: Limitations of Pure Reasoning and Pure Action
Before ReAct, LLMs were used in two main ways:
Chain-of-Thought (CoT): The LLM reasons step by step, but all information must be provided in the prompt up front; the model cannot access external information.
Tool-calling (early Agents): The LLM generates actions directly, without intermediate reasoning steps, so action selection is poorly grounded and errors accumulate easily.
ReAct's key insight: thinking and acting should alternate, mutually reinforcing each other.
flowchart LR
    subgraph CoT["CoT Only"]
        T1[Think 1] --> T2[Think 2] --> T3[Think 3] --> A1[Answer]
    end
    subgraph Act["Action Only"]
        AC1[Act 1] --> AC2[Act 2] --> AC3[Act 3] --> A2[Answer]
    end
    subgraph ReAct["ReAct"]
        RT1[Thought 1] --> RA1[Action 1] --> RO1[Obs 1]
        RO1 --> RT2[Thought 2] --> RA2[Action 2] --> RO2[Obs 2]
        RO2 --> RT3[Thought 3] --> A3[Final Answer]
    end
ReAct's Advantages
| Dimension | CoT Only | Action Only | ReAct |
|---|---|---|---|
| Interpretability | High | Low | High |
| External info access | None | Yes | Yes |
| Error correction | Limited | None | Strong (observation feedback) |
| Hallucination risk | High | Medium | Lower (claims can be checked against observations) |
| Task success rate | Baseline | Near-baseline | Often highest when the task needs external information |
51.2 The Thought → Action → Observation Loop
Thought
The LLM generates internal reasoning, analyzing the current situation and planning next steps. Thought does not directly produce output; it provides rationale for Action.
Thought: The user is asking about the 2024 Nobel Prize in Physics.
This may be beyond my training data cutoff. I should search for the
latest information about "2024 Nobel Prize Physics winner".
Action
The LLM generates a concrete tool-call instruction. Action format is typically structured (JSON or specific syntax).
Action: search["2024 Nobel Prize Physics winner"]
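With a free-text Action format like the line above, the agent runtime has to extract the tool name and its argument before anything can be executed. The following is a minimal sketch that assumes the `tool["argument"]` syntax shown here; `parse_action` and `ACTION_RE` are hypothetical names used for illustration, not part of Hermes.

```python
import re
from typing import Optional, Tuple

# Matches lines such as: Action: search["some query"]
# The double quotes around the argument are optional.
ACTION_RE = re.compile(r'^Action:\s*(\w+)\[\s*"?(.*?)"?\s*\]\s*$')


def parse_action(line: str) -> Optional[Tuple[str, str]]:
    """Return (tool_name, argument), or None if the line is not an Action."""
    match = ACTION_RE.match(line.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


parsed = parse_action('Action: search["2024 Nobel Prize Physics winner"]')
print(parsed)
```

Returning `None` for lines that do not match lets the caller feed a "could not parse the action" message back to the model as an Observation instead of crashing.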
Observation
The tool's execution result is injected back into the context, becoming input for the LLM's next reasoning round.
Observation: The 2024 Nobel Prize in Physics was awarded to John Hopfield
and Geoffrey Hinton for foundational discoveries enabling machine learning
with artificial neural networks.
The loop then continues: based on the Observation, the LLM produces a new Thought and decides whether to keep acting or to give the final answer.
stateDiagram-v2
    [*] --> Thought: User Input
    Thought --> Action: Needs tool
    Thought --> FinalAnswer: Sufficient info
    Action --> Observation: Tool executes
    Observation --> Thought: Continue reasoning
    FinalAnswer --> [*]: Return answer
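The state diagram can be sketched as a small driver loop. This is only an illustration with a scripted stand-in for the LLM: `run_loop`, `scripted_llm`, `fake_search`, and the `Final Answer:` prefix are assumptions made for this example, and a real agent would call a model where `scripted_llm` is used.

```python
from typing import Callable, Dict, List


def run_loop(llm: Callable[[List[str]], str],
             tools: Dict[str, Callable[[str], str]],
             question: str,
             max_steps: int = 5) -> str:
    """Drive Thought -> Action -> Observation until a final answer appears."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        reply = llm(transcript)                     # Thought state
        transcript.append(reply)
        lines = reply.strip().splitlines()
        last_line = lines[-1] if lines else ""
        if last_line.startswith("Final Answer:"):   # Sufficient info
            return last_line[len("Final Answer:"):].strip()
        if last_line.startswith("Action:"):         # Needs tool
            name, _, arg = last_line[len("Action:"):].strip().partition("[")
            arg = arg.rstrip("]")
            tool = tools.get(name)
            observation = tool(arg) if tool else f"unknown tool '{name}'"
        else:
            observation = "no Action or Final Answer found"
        transcript.append(f"Observation: {observation}")  # back to Thought
    return "Stopped: step limit reached"


def scripted_llm(transcript: List[str]) -> str:
    # Stand-in for a real model: acts once, then answers from the observation.
    if not any(line.startswith("Observation:") for line in transcript):
        return "Thought: I need to look this up.\nAction: search[capital of France]"
    return "Thought: The observation answers the question.\nFinal Answer: Paris"


def fake_search(query: str) -> str:
    return "Paris is the capital of France."


answer = run_loop(scripted_llm, {"search": fake_search},
                  "What is the capital of France?")
print(answer)
```

The step limit gives the loop a guaranteed exit even when the model never produces a final answer.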
51.3 Hermes ReAct Implementation Specifics
Hermes Agent makes several enhancements over standard ReAct:
1. Structured Function Calling
Instead of the free-text action format used in the original paper, Hermes uses JSON function calling, which is more reliable and easier to parse.
2. Parallel Tool Calls
Standard ReAct allows only one tool call per turn. Hermes supports several tool calls in the same Action step, which saves LLM round trips when the calls are independent of each other.
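As a rough illustration of how several calls from one Action step might be executed together, the sketch below runs them concurrently with `asyncio.gather` and produces one tool message per call. The tool functions, the `TOOLS` registry, and the shape of the call dictionaries are hypothetical; how Hermes schedules calls internally is not shown in this chapter.

```python
import asyncio
from typing import Any, Dict, List


async def fetch_weather(city: str) -> str:
    await asyncio.sleep(0.05)          # simulated I/O latency
    return f"{city}: sunny"


async def fetch_population(city: str) -> str:
    await asyncio.sleep(0.05)          # simulated I/O latency
    return f"{city}: 2.1M"


TOOLS = {"fetch_weather": fetch_weather, "fetch_population": fetch_population}


async def run_tool_calls(calls: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Execute all tool calls of one Action step concurrently.

    Returns one tool message per call, in the same order as `calls`.
    """
    async def run_one(call: Dict[str, Any]) -> Dict[str, str]:
        try:
            content = await TOOLS[call["name"]](**call["arguments"])
        except Exception as exc:       # report the failure as an observation
            content = f"Tool execution error: {type(exc).__name__}: {exc}"
        return {"role": "tool", "tool_call_id": call["id"], "content": content}

    return list(await asyncio.gather(*(run_one(c) for c in calls)))


calls = [
    {"id": "call_1", "name": "fetch_weather", "arguments": {"city": "Paris"}},
    {"id": "call_2", "name": "fetch_population", "arguments": {"city": "Paris"}},
]
results = asyncio.run(run_tool_calls(calls))
for message in results:
    print(message)
```

Because each result carries the `tool_call_id` of the call that produced it, the model can match observations to calls regardless of which tool finished first.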
3. Custom Stop Conditions
Hermes supports configuring max_steps, timeout, and stop_tokens to prevent infinite loops.
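One way such limits might be checked between loop iterations is sketched below. `StopConfig` and `should_stop` are hypothetical names for illustration, not the Hermes configuration API, and the stop tokens are treated here as markers searched for in the model output (an inference server may instead apply them during generation).

```python
import time
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class StopConfig:
    max_steps: int = 15
    timeout: float = 60.0                       # wall-clock seconds
    stop_tokens: Tuple[str, ...] = ("<END>",)   # hypothetical marker


def should_stop(cfg: StopConfig, step: int, started_at: float,
                last_output: str, now: Optional[float] = None) -> Optional[str]:
    """Return the reason to stop, or None to keep looping."""
    now = time.monotonic() if now is None else now
    if step >= cfg.max_steps:
        return "max_steps"
    if now - started_at >= cfg.timeout:
        return "timeout"
    if any(token in last_output for token in cfg.stop_tokens):
        return "stop_token"
    return None


cfg = StopConfig(max_steps=3, timeout=10.0)
print(should_stop(cfg, step=1, started_at=0.0, last_output="Thought: ...", now=2.0))
print(should_stop(cfg, step=3, started_at=0.0, last_output="Thought: ...", now=2.0))
```

Returning the reason rather than a bare boolean makes it easy to report why a run ended.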
4. Context Window Management
When conversation history exceeds the model's maximum context length, Hermes automatically applies summary compression to preserve core information.
51.4 Building a ReAct Agent from Scratch
# react_agent.py
"""
Complete ReAct Agent implementation from scratch.
~120 lines of core logic with tool calling, loop reasoning, and error handling.
"""
import json
import re
import asyncio
from typing import Any, Callable, Optional
from dataclasses import dataclass, field

from openai import AsyncOpenAI  # Hermes exposes an OpenAI-compatible API


@dataclass
class Tool:
    name: str
    description: str
    parameters: dict
    func: Callable

    def to_openai_format(self) -> dict:
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters
            }
        }


@dataclass
class ReActStep:
    step_number: int
    thought: str
    action: Optional[dict] = None
    observation: Optional[str] = None
    final_answer: Optional[str] = None


@dataclass
class ReActResult:
    success: bool
    answer: str
    steps: list
    total_steps: int
    error: Optional[str] = None
class ReActAgent:
    """
    ReAct Agent core implementation.
    Supports: multi-tool registration, max-steps limit, tool error handling, verbose logging.
    """

    SYSTEM_PROMPT = """You are a ReAct Agent that solves problems by alternating between
Thinking and Acting.

How to work:
1. Analyze the problem and think about what information or actions are needed
2. Call tools to get information or perform operations
3. Continue reasoning based on tool results
4. When you have sufficient information, provide a final answer

Important rules:
- Focus on the most important sub-question at each step
- After observing tool results, update your understanding
- If a tool call fails, think of alternatives
- When you have enough information, give the final answer directly; don't over-use tools"""

    def __init__(
        self,
        model: str = "NousResearch/Hermes-3-Llama-3.1-8B",
        base_url: str = "http://localhost:8000/v1",
        api_key: str = "not-needed",
        max_steps: int = 15,
        verbose: bool = True
    ):
        self.model = model
        self.max_steps = max_steps
        self.verbose = verbose
        self.tools: dict[str, Tool] = {}
        self.client = AsyncOpenAI(base_url=base_url, api_key=api_key)

    def register_tool(self, tool: Tool) -> 'ReActAgent':
        self.tools[tool.name] = tool
        return self

    def _log(self, message: str) -> None:
        if self.verbose:
            print(message)

    async def _execute_tool(self, tool_name: str, arguments: dict) -> str:
        if tool_name not in self.tools:
            return f"Error: tool '{tool_name}' not found. Available: {list(self.tools.keys())}"
        tool = self.tools[tool_name]
        try:
            if asyncio.iscoroutinefunction(tool.func):
                result = await tool.func(**arguments)
            else:
                result = tool.func(**arguments)
            result_str = str(result)
            if len(result_str) > 2000:
                result_str = result_str[:2000] + "\n... [truncated at 2000 chars]"
            return result_str
        except Exception as e:
            return f"Tool execution error: {type(e).__name__}: {str(e)}"
    async def run(self, task: str) -> ReActResult:
        messages = [
            {"role": "system", "content": self.SYSTEM_PROMPT},
            {"role": "user", "content": task}
        ]
        tools_schema = [t.to_openai_format() for t in self.tools.values()]
        # Send tool parameters only when tools are registered: the OpenAI API
        # rejects tool_choice when no tools list is supplied.
        tool_kwargs = (
            {"tools": tools_schema, "tool_choice": "auto"} if tools_schema else {}
        )
        steps = []
        self._log(f"\n{'='*50}\nTask: {task}\n{'='*50}\n")

        for step_num in range(1, self.max_steps + 1):
            response = await self.client.chat.completions.create(
                model=self.model,
                messages=messages,
                temperature=0.1,
                max_tokens=1024,
                **tool_kwargs
            )
            msg = response.choices[0].message
            thought = msg.content or "(analyzing)"

            if msg.tool_calls:
                # Only the first tool call is executed; any further calls
                # returned in the same turn are ignored.
                tool_call = msg.tool_calls[0]
                tool_name = tool_call.function.name
                try:
                    arguments = json.loads(tool_call.function.arguments)
                except json.JSONDecodeError:
                    arguments = {"raw": tool_call.function.arguments}

                self._log(f"\n--- Step {step_num} ---")
                self._log(f"Thought: {thought}")
                self._log(f"Action: {tool_name}({json.dumps(arguments)})")

                observation = await self._execute_tool(tool_name, arguments)
                self._log(f"Observation: {observation[:200]}{'...' if len(observation) > 200 else ''}")

                steps.append(ReActStep(
                    step_number=step_num,
                    thought=thought,
                    action={"tool": tool_name, "arguments": arguments},
                    observation=observation
                ))
                # Record the assistant turn and the matching tool result so the
                # next request carries the full Thought/Action/Observation history.
                messages.append({
                    "role": "assistant",
                    "content": thought,
                    "tool_calls": [{
                        "id": tool_call.id,
                        "type": "function",
                        "function": {
                            "name": tool_name,
                            "arguments": tool_call.function.arguments
                        }
                    }]
                })
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": observation
                })
            else:
                # No tool call: the model's content is the final answer.
                final_answer = msg.content or "Task completed."
                steps.append(ReActStep(
                    step_number=step_num,
                    thought=thought,
                    final_answer=final_answer
                ))
                self._log(f"\n--- Step {step_num} (Final) ---")
                self._log(f"Final Answer: {final_answer}")
                self._log(f"\nCompleted in {step_num} steps.")
                return ReActResult(
                    success=True,
                    answer=final_answer,
                    steps=steps,
                    total_steps=step_num
                )

        return ReActResult(
            success=False,
            answer="Reached maximum steps without completing the task.",
            steps=steps,
            total_steps=self.max_steps,
            error=f"Exceeded max_steps={self.max_steps}"
        )
# Tool implementations
def make_calculator_tool() -> Tool:
    def calculate(expression: str) -> str:
        # The whitelist contains no letters or underscores, so the expression
        # cannot reference names or call functions. Note that '**' is still
        # possible, so very large exponents can be slow to evaluate.
        allowed = set('0123456789+-*/()., ')
        if not all(c in allowed for c in expression):
            return "Error: expression contains disallowed characters"
        try:
            return f"{expression} = {eval(expression)}"
        except Exception as e:
            return f"Calculation error: {str(e)}"

    return Tool(
        name="calculator",
        description="Evaluate mathematical expressions",
        parameters={
            "type": "object",
            "properties": {
                "expression": {"type": "string", "description": "Math expression, e.g. '2 + 3 * 4'"}
            },
            "required": ["expression"]
        },
        func=calculate
    )


async def main():
    agent = (
        ReActAgent(max_steps=10, verbose=True)
        .register_tool(make_calculator_tool())
    )
    result = await agent.run(
        "If I save $500 per month at 3% annual interest, "
        "how much will I have after 10 years?"
    )
    print(f"\nAnswer ({result.total_steps} steps): {result.answer}")


if __name__ == "__main__":
    asyncio.run(main())
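For reference, the demo task has a closed-form answer that the agent could reach with a single `calculator` call. The sketch below assumes monthly compounding and deposits made at the end of each month (the task statement specifies neither), and checks that the resulting expression uses only characters the `calculator` tool accepts.

```python
monthly_deposit = 500
annual_rate = 0.03
months = 10 * 12
r = annual_rate / 12               # monthly interest rate

# Future value of an ordinary annuity: P * ((1 + r)^n - 1) / r
future_value = monthly_deposit * ((1 + r) ** months - 1) / r
print(f"{future_value:,.2f}")      # roughly 69,870 under these assumptions

# The same formula as a string the agent might pass to the calculator tool.
expression = "500 * ((1 + 0.03/12)**120 - 1) / (0.03/12)"
allowed = set('0123456789+-*/()., ')
passes_whitelist = all(c in allowed for c in expression)
print(passes_whitelist)
```

A different compounding convention (for example, deposits at the start of each month) gives a slightly different figure, so the agent's answer should state which convention it used.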
51.5 Debugging Common ReAct Problems
Problem 1: Infinite Loops
Symptoms: Agent repeatedly calls similar tools, never reaching a final answer.
Root Causes: Tools return insufficient information; system prompt lacks clear stop conditions; LLM is overly conservative.
Solution: Add loop detection and explicit stop rules.
import json


class LoopDetector:
    def __init__(self, window: int = 3):
        self.recent_actions = []
        self.window = window

    def is_looping(self, action: dict) -> bool:
        # Serialize with sorted keys so equal actions compare equal.
        action_str = json.dumps(action, sort_keys=True)
        self.recent_actions.append(action_str)
        if len(self.recent_actions) > self.window:
            self.recent_actions.pop(0)
        # Looping means the last `window` actions are all identical.
        return (len(set(self.recent_actions)) == 1 and
                len(self.recent_actions) == self.window)
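The detector above fires only when identical actions occur back to back, so an agent that alternates between two calls (A, B, A, B) slips through. A possible complement, sketched here under the hypothetical name `RepeatCounter`, counts how often each action has occurred over the whole run.

```python
import json
from collections import Counter


class RepeatCounter:
    """Flags an action once it has been seen `limit` times in one run."""

    def __init__(self, limit: int = 3):
        self.limit = limit
        self.counts = Counter()

    def is_looping(self, action: dict) -> bool:
        # Same serialization idea as LoopDetector: sorted keys give a stable key.
        key = json.dumps(action, sort_keys=True)
        self.counts[key] += 1
        return self.counts[key] >= self.limit


counter = RepeatCounter(limit=3)
action_a = {"tool": "search", "arguments": {"query": "A"}}
action_b = {"tool": "search", "arguments": {"query": "B"}}
flags = [counter.is_looping(a)
         for a in (action_a, action_b, action_a, action_b, action_a)]
print(flags)
```

The trade-off is that legitimate repeated calls (for example, polling a status tool) also count toward the limit, so the threshold needs to be chosen per tool set.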
Problem 2: JSON Parsing Failures
Symptoms: json.JSONDecodeError when parsing tool arguments.
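import json
import re


def safe_parse_arguments(raw: str) -> dict:
    # First attempt: the string is already valid JSON.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Second attempt: extract the outermost {...} span from surrounding text.
    json_pattern = re.search(r'\{.*\}', raw, re.DOTALL)
    if json_pattern:
        try:
            return json.loads(json_pattern.group())
        except json.JSONDecodeError:
            pass
    # Fallback: pass the raw string through under a generic key.
    return {"input": raw}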
Problem 3: Context Overflow
Symptoms: LLM "forgets" early information; API returns context_length_exceeded.
Solution: Implement rolling context compression, which preserves the system prompt and the most recent N messages and replaces the middle section with a summary.
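The compression step might look like the sketch below. `compress_history` and `naive_summarize` are hypothetical names; a real agent would ask the LLM to write the summary, and inserting the summary as a second `system` message is an assumption that may need adjusting for a particular chat template.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def compress_history(messages: List[Message],
                     keep_recent: int,
                     summarize: Callable[[List[Message]], str]) -> List[Message]:
    """Keep the system prompt and the newest messages; summarize the rest."""
    head = messages[:1] if messages and messages[0]["role"] == "system" else []
    body = messages[len(head):]
    split = len(body) - keep_recent
    # Do not start the kept tail with a tool result whose matching assistant
    # tool-call message would be summarized away; summarize it as well.
    while 0 < split < len(body) and body[split]["role"] == "tool":
        split += 1
    if split <= 0:
        return list(messages)          # nothing old enough to compress
    middle, recent = body[:split], body[split:]
    summary = {"role": "system",
               "content": "Summary of earlier steps: " + summarize(middle)}
    return head + [summary] + recent


def naive_summarize(msgs: List[Message]) -> str:
    # Placeholder summarizer used only for this demonstration.
    return f"{len(msgs)} earlier messages omitted"


history = [
    {"role": "system", "content": "You are a ReAct Agent."},
    {"role": "user", "content": "Task"},
    {"role": "assistant", "content": "Thought 1"},
    {"role": "tool", "content": "Observation 1"},
    {"role": "assistant", "content": "Thought 2"},
    {"role": "tool", "content": "Observation 2"},
]
compact = compress_history(history, keep_recent=2, summarize=naive_summarize)
for message in compact:
    print(message["role"], "|", message["content"])
```

Triggering this function only when a token estimate crosses a threshold, rather than on every step, avoids paying for a summary call when the context is still small.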
Quick Reference Table
| Problem | Diagnosis | Solution |
|---|---|---|
| Infinite loop | Print each action, check repetition | Loop detector + explicit stop rules |
| Parameter parse failure | Print raw arguments | Fault-tolerant parsing |
| Context overflow | Monitor token count | Summary compression |
| Poor final answer | Check observation quality | Improve tool output format |
| Too many steps | Track average step count | Tune system prompt |
Summary
This chapter provided a deep dive into ReAct theory and implementation:
- Paper principles: ReAct overcomes CoT and pure-action limitations by interleaving Thought (reasoning) and Action, grounding each decision in external observations.
- Three-phase loop: Thought → Action → Observation forms a closed feedback loop; each observation is added to the context and informs the next Thought.
- Hermes enhancements: Structured function calling, parallel tool calls, custom stop conditions, and context compression are the main improvements over standard ReAct.
- Complete code: ~120 lines of core logic implementing a working ReAct Agent with tool registration, error handling, and verbose logging.
- Common pitfalls: Infinite loops, argument parse failures, and context overflow, each with a concrete solution.
Review Questions
- Is the Thought in ReAct meant for the LLM itself or for the user? How would you hide Thoughts in production without losing their reasoning value?
- When a tool call fails, should the Agent immediately retry, switch tools, or report failure to the user? How do you design this strategy?
- To support parallel tool calls (multiple tools executing simultaneously), how must the message history format change?
- What is the fundamental difference between ReAct and plain CoT? For pure reasoning tasks requiring no external information, does ReAct still offer advantages?