Chapter 51

ReAct Architecture Deep Dive with Full Code Implementation

Introduction

ReAct (Reasoning + Acting) is a cornerstone of modern AI Agent frameworks. The ReAct paper, published by Shunyu Yao et al. in 2022, changed how people understand the capability boundaries of LLMs: by interleaving "thinking" and "acting," an LLM can solve complex reasoning tasks that previously required specialized training. The core reasoning engine of Hermes Agent is built on the ReAct paradigm. This chapter starts from the paper's principles, dissects the implementation details step by step, and guides you through building a complete ReAct Agent from scratch in Python.


51.1 Core Ideas of the ReAct Paper

Background: Limitations of Pure Reasoning and Pure Action

Before ReAct, LLMs were used in two main ways:

Chain-of-Thought (CoT): The LLM reasons step by step, but all information must be provided in the prompt up front; the model cannot access external information.

Tool-calling (early Agents): The LLM generates actions directly, without intermediate reasoning steps, so action selection is poorly grounded and errors accumulate easily.

ReAct's key insight: thinking and acting should alternate, mutually reinforcing each other.

flowchart LR
    subgraph CoT["CoT Only"]
        T1[Think 1] --> T2[Think 2] --> T3[Think 3] --> A1[Answer]
    end
    
    subgraph Act["Action Only"]
        AC1[Act 1] --> AC2[Act 2] --> AC3[Act 3] --> A2[Answer]
    end
    
    subgraph ReAct["ReAct"]
        RT1[Thought 1] --> RA1[Action 1] --> RO1[Obs 1]
        RO1 --> RT2[Thought 2] --> RA2[Action 2] --> RO2[Obs 2]
        RO2 --> RT3[Thought 3] --> A3[Final Answer]
    end

ReAct's Advantages

| Dimension | CoT Only | Action Only | ReAct |
| --- | --- | --- | --- |
| Interpretability | High | Low | High |
| External info access | None | Yes | Yes |
| Error correction | Limited | None | Strong (observation feedback) |
| Hallucination risk | High | Medium | Low |
| Task success rate | Baseline | Near-baseline | Highest |

51.2 The Thought → Action → Observation Loop

Thought

The LLM generates internal reasoning, analyzing the current situation and planning next steps. Thought does not directly produce output; it provides rationale for Action.

Thought: The user is asking about the 2024 Nobel Prize in Physics. 
This may be beyond my training data cutoff. I should search for the 
latest information about "2024 Nobel Prize Physics winner".

Action

The LLM generates a concrete tool-call instruction. Action format is typically structured (JSON or specific syntax).

Action: search["2024 Nobel Prize Physics winner"]
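
To act on a free-text Action line like the one above, the agent first has to extract the tool name and its argument. The following is a minimal sketch of such a parser; the `tool[argument]` syntax is taken from the example, while `ACTION_RE` and `parse_action` are hypothetical names introduced for illustration, and real prompts may use a different syntax.

```python
import re
from typing import Optional, Tuple

# Matches lines such as: Action: search["2024 Nobel Prize Physics winner"]
# The tool[argument] syntax is assumed from the example above.
ACTION_RE = re.compile(r'^Action:\s*(\w+)\[(.*)\]\s*$', re.MULTILINE)


def parse_action(text: str) -> Optional[Tuple[str, str]]:
    """Return (tool_name, argument) for the first Action line, or None."""
    match = ACTION_RE.search(text)
    if match is None:
        return None
    tool_name, raw_arg = match.group(1), match.group(2).strip()
    # Drop one pair of surrounding double quotes, if present.
    if len(raw_arg) >= 2 and raw_arg[0] == raw_arg[-1] == '"':
        raw_arg = raw_arg[1:-1]
    return tool_name, raw_arg


llm_output = 'Thought: I should search.\nAction: search["2024 Nobel Prize Physics winner"]'
print(parse_action(llm_output))
```

A response without an Action line yields `None`, which the caller can treat as a final answer or as a formatting error, depending on the prompt design.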

Observation

The tool's execution result is injected back into the context, becoming input for the LLM's next reasoning round.

Observation: The 2024 Nobel Prize in Physics was awarded to John Hopfield 
and Geoffrey Hinton for foundational discoveries enabling machine learning 
with artificial neural networks.

The loop continues: based on the Observation, the LLM thinks again (Thought) and decides whether to continue acting or to provide the final answer.

stateDiagram-v2
    [*] --> Thought: User Input
    Thought --> Action: Needs tool
    Thought --> FinalAnswer: Sufficient info
    Action --> Observation: Tool executes
    Observation --> Thought: Continue reasoning
    FinalAnswer --> [*]: Return answer
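
The state transitions above can be traced with a tiny simulation that needs no model server. This is only a sketch: `scripted_model` is a hypothetical stand-in for an LLM, and `run_loop` and the `lookup` tool are illustrative names, not part of Hermes.

```python
from typing import Callable, Dict, List


def scripted_model(transcript: List[str]) -> dict:
    """Hypothetical stand-in for an LLM: answers once an observation is visible."""
    for line in reversed(transcript):
        if line.startswith("Observation: "):
            return {"thought": "The observation answers the question.",
                    "final_answer": line[len("Observation: "):]}
    return {"thought": "I need to look this up.",
            "action": ("lookup", "capital of France")}


def run_loop(task: str,
             model: Callable[[List[str]], dict],
             tools: Dict[str, Callable[[str], str]],
             max_steps: int = 5) -> str:
    transcript = [f"Task: {task}"]
    for _ in range(max_steps):
        turn = model(transcript)                       # [*] / Observation --> Thought
        transcript.append(f"Thought: {turn['thought']}")
        if "final_answer" in turn:                     # Thought --> FinalAnswer
            return turn["final_answer"]
        tool_name, argument = turn["action"]           # Thought --> Action
        transcript.append(f"Action: {tool_name}[{argument}]")
        observation = tools[tool_name](argument)       # Action --> Observation
        transcript.append(f"Observation: {observation}")
    return "No final answer within the step budget."


answer = run_loop("What is the capital of France?",
                  scripted_model,
                  {"lookup": lambda query: "Paris"})
print(answer)  # Paris
```

The key point is that each observation is appended to the transcript, so the next Thought is conditioned on it.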

51.3 Hermes ReAct Implementation Specifics

Hermes Agent makes several enhancements over standard ReAct:

1. Structured Function Calling

Hermes uses JSON function calling rather than the free-text format of the original paper—more reliable and easier to parse.
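
As an illustration of why structured calls are easier to handle, the sketch below dispatches one tool call in the OpenAI-compatible chat format, where the arguments arrive as a JSON-encoded string. The `get_weather` tool, the `TOOLS` registry, and `dispatch` are hypothetical names used only for this example; this is not Hermes's internal code.

```python
import json


def get_weather(city: str, unit: str = "celsius") -> str:
    # Hypothetical tool used only for illustration.
    return f"Weather in {city}: 20 degrees {unit}"


TOOLS = {"get_weather": get_weather}

# One tool call as it appears in an assistant message: the function name is a
# plain field and the arguments are a JSON-encoded string.
tool_call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Oslo"}'},
}


def dispatch(call: dict) -> str:
    """Look up the named tool and call it with the decoded arguments."""
    name = call["function"]["name"]
    kwargs = json.loads(call["function"]["arguments"])
    return TOOLS[name](**kwargs)


print(dispatch(tool_call))  # Weather in Oslo: 20 degrees celsius
```

No regular expression is needed: a single `json.loads` either succeeds or raises a well-defined error.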

2. Parallel Tool Calls

Standard ReAct allows only one tool call per turn. Hermes supports parallel tool calls in the same Action step, significantly improving efficiency.
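
One way to run several tool calls from a single Action step concurrently is `asyncio.gather`. The sketch below is an assumption about how this could be done, not Hermes's actual implementation; `fetch_price`, `fetch_news`, and `run_tool_calls` are hypothetical names.

```python
import asyncio
import json
from typing import List


async def fetch_price(symbol: str) -> str:
    await asyncio.sleep(0.01)  # Simulated I/O latency.
    return f"{symbol}: 100"


async def fetch_news(topic: str) -> str:
    await asyncio.sleep(0.01)
    return f"No news about {topic}"


TOOLS = {"fetch_price": fetch_price, "fetch_news": fetch_news}


async def run_tool_calls(tool_calls: List[dict]) -> List[dict]:
    """Run all calls concurrently and return one tool message per call."""

    async def run_one(call: dict) -> dict:
        try:
            content = await TOOLS[call["name"]](**json.loads(call["arguments"]))
        except Exception as exc:  # A failure in one call must not cancel the others.
            content = f"Tool execution error: {type(exc).__name__}: {exc}"
        return {"role": "tool", "tool_call_id": call["id"], "content": content}

    # gather preserves input order, so results line up with tool_calls.
    return await asyncio.gather(*(run_one(call) for call in tool_calls))


calls = [
    {"id": "call_1", "name": "fetch_price", "arguments": '{"symbol": "ACME"}'},
    {"id": "call_2", "name": "fetch_news", "arguments": '{"topic": "ACME"}'},
]
for message in asyncio.run(run_tool_calls(calls)):
    print(message)
```

In the OpenAI-compatible message format, the assistant message then lists every tool call, followed by one `tool` message per `tool_call_id`.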

3. Custom Stop Conditions

Hermes supports configuring max_steps, timeout, and stop_tokens to prevent infinite loops.
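
A step budget and a wall-clock timeout can be combined in a small driver. The following is a sketch under stated assumptions: `run_with_limits` and its `step` callback are hypothetical, the parameter names merely mirror `max_steps` and `timeout` from the text, and stop tokens are not covered.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional


async def run_with_limits(
    step: Callable[[int], Awaitable[Optional[str]]],
    max_steps: int = 15,
    timeout: float = 30.0,
) -> str:
    """Call step(n) until it returns an answer, the budget runs out, or time is up."""
    deadline = time.monotonic() + timeout
    for step_num in range(1, max_steps + 1):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return "stopped: timeout"
        try:
            # Each step may use only the time left in the overall budget.
            answer = await asyncio.wait_for(step(step_num), timeout=remaining)
        except asyncio.TimeoutError:
            return "stopped: timeout"
        if answer is not None:
            return answer
    return "stopped: max_steps"


async def demo_step(step_num: int) -> Optional[str]:
    # Hypothetical step: produces an answer on the third iteration.
    return "done" if step_num == 3 else None


print(asyncio.run(run_with_limits(demo_step)))  # done
```

Returning a distinct status string for each stop reason makes it easy to tell a genuine answer from a forced stop in logs.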

4. Context Window Management

When conversation history exceeds the model's maximum context length, Hermes automatically applies summary compression to preserve core information.


51.4 Building a ReAct Agent from Scratch

# react_agent.py
"""
Complete ReAct Agent implementation from scratch.
~120 lines of core logic with tool calling, loop reasoning, and error handling.
"""

import json
import re
import asyncio
from typing import Any, Callable, Optional
from dataclasses import dataclass, field
from openai import AsyncOpenAI  # Hermes exposes an OpenAI-compatible API


@dataclass
class Tool:
    name: str
    description: str
    parameters: dict
    func: Callable
    
    def to_openai_format(self) -> dict:
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters
            }
        }


@dataclass
class ReActStep:
    step_number: int
    thought: str
    action: Optional[dict] = None
    observation: Optional[str] = None
    final_answer: Optional[str] = None


@dataclass
class ReActResult:
    success: bool
    answer: str
    steps: list
    total_steps: int
    error: Optional[str] = None


class ReActAgent:
    """
    ReAct Agent core implementation.
    Supports: multi-tool registration, max-steps limit, tool error handling, verbose logging.
    """
    
    SYSTEM_PROMPT = """You are a ReAct Agent that solves problems by alternating between
Thinking and Acting.

How to work:
1. Analyze the problem and think about what information or actions are needed
2. Call tools to get information or perform operations
3. Continue reasoning based on tool results
4. When you have sufficient information, provide a final answer

Important rules:
- Focus on the most important sub-question at each step
- After observing tool results, update your understanding
- If a tool call fails, think of alternatives
- When you have enough information, give the final answer directly—don't over-use tools"""
    
    def __init__(
        self,
        model: str = "NousResearch/Hermes-3-Llama-3.1-8B",
        base_url: str = "http://localhost:8000/v1",
        api_key: str = "not-needed",
        max_steps: int = 15,
        verbose: bool = True
    ):
        self.model = model
        self.max_steps = max_steps
        self.verbose = verbose
        self.tools: dict[str, Tool] = {}
        self.client = AsyncOpenAI(base_url=base_url, api_key=api_key)
    
    def register_tool(self, tool: Tool) -> 'ReActAgent':
        self.tools[tool.name] = tool
        return self
    
    def _log(self, message: str) -> None:
        if self.verbose:
            print(message)
    
    async def _execute_tool(self, tool_name: str, arguments: dict) -> str:
        if tool_name not in self.tools:
            return f"Error: tool '{tool_name}' not found. Available: {list(self.tools.keys())}"
        
        tool = self.tools[tool_name]
        try:
            if asyncio.iscoroutinefunction(tool.func):
                result = await tool.func(**arguments)
            else:
                result = tool.func(**arguments)
            
            result_str = str(result)
            if len(result_str) > 2000:
                result_str = result_str[:2000] + "\n... [truncated at 2000 chars]"
            return result_str
        except Exception as e:
            return f"Tool execution error: {type(e).__name__}: {str(e)}"
    
    async def run(self, task: str) -> ReActResult:
        messages = [
            {"role": "system", "content": self.SYSTEM_PROMPT},
            {"role": "user", "content": task}
        ]
        tools_schema = [t.to_openai_format() for t in self.tools.values()]
        steps = []
        
        self._log(f"\n{'='*50}\nTask: {task}\n{'='*50}\n")
        
        for step_num in range(1, self.max_steps + 1):
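            # Send tool parameters only when tools are registered: the OpenAI
            # API rejects tool_choice when no tools are specified.
            request = {
                "model": self.model,
                "messages": messages,
                "temperature": 0.1,
                "max_tokens": 1024,
            }
            if tools_schema:
                request["tools"] = tools_schema
                request["tool_choice"] = "auto"
            response = await self.client.chat.completions.create(**request)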
            
            msg = response.choices[0].message
            thought = msg.content or "(analyzing)"
            
            if msg.tool_calls:
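                # This minimal loop executes only the first tool call of a turn;
                # any additional calls in the same response are ignored.
                tool_call = msg.tool_calls[0]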
                tool_name = tool_call.function.name
                
                try:
                    arguments = json.loads(tool_call.function.arguments)
                except json.JSONDecodeError:
                    arguments = {"raw": tool_call.function.arguments}
                
                self._log(f"\n--- Step {step_num} ---")
                self._log(f"Thought: {thought}")
                self._log(f"Action: {tool_name}({json.dumps(arguments)})")
                
                observation = await self._execute_tool(tool_name, arguments)
                
                self._log(f"Observation: {observation[:200]}{'...' if len(observation) > 200 else ''}")
                
                steps.append(ReActStep(
                    step_number=step_num,
                    thought=thought,
                    action={"tool": tool_name, "arguments": arguments},
                    observation=observation
                ))
                
                messages.append({
                    "role": "assistant",
                    "content": thought,
                    "tool_calls": [{
                        "id": tool_call.id,
                        "type": "function",
                        "function": {
                            "name": tool_name,
                            "arguments": tool_call.function.arguments
                        }
                    }]
                })
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": observation
                })
            
            else:
                final_answer = msg.content or "Task completed."
                steps.append(ReActStep(
                    step_number=step_num,
                    thought=thought,
                    final_answer=final_answer
                ))
                
                self._log(f"\n--- Step {step_num} (Final) ---")
                self._log(f"Final Answer: {final_answer}")
                self._log(f"\nCompleted in {step_num} steps.")
                
                return ReActResult(
                    success=True,
                    answer=final_answer,
                    steps=steps,
                    total_steps=step_num
                )
        
        return ReActResult(
            success=False,
            answer="Reached maximum steps without completing the task.",
            steps=steps,
            total_steps=self.max_steps,
            error=f"Exceeded max_steps={self.max_steps}"
        )


# Tool implementations
def make_calculator_tool() -> Tool:
    def calculate(expression: str) -> str:
        allowed = set('0123456789+-*/()., ')
        if not all(c in allowed for c in expression):
            return "Error: expression contains disallowed characters"
        try:
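            # The character whitelist blocks names and attribute access; empty
            # builtins add a second layer. This is still not safe for untrusted
            # input: "9**9**9" passes the filter and can exhaust CPU and memory.
            result = eval(expression, {"__builtins__": {}}, {})
            return f"{expression} = {result}"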
        except Exception as e:
            return f"Calculation error: {str(e)}"
    
    return Tool(
        name="calculator",
        description="Evaluate mathematical expressions",
        parameters={
            "type": "object",
            "properties": {
                "expression": {"type": "string", "description": "Math expression, e.g. '2 + 3 * 4'"}
            },
            "required": ["expression"]
        },
        func=calculate
    )


async def main():
    agent = (
        ReActAgent(max_steps=10, verbose=True)
        .register_tool(make_calculator_tool())
    )
    
    result = await agent.run(
        "If I save $500 per month at 3% annual interest, "
        "how much will I have after 10 years?"
    )
    print(f"\nAnswer ({result.total_steps} steps): {result.answer}")


if __name__ == "__main__":
    asyncio.run(main())

51.5 Debugging Common ReAct Problems

Problem 1: Infinite Loops

Symptoms: Agent repeatedly calls similar tools, never reaching a final answer.

Root Causes: Tools return insufficient information; system prompt lacks clear stop conditions; LLM is overly conservative.

Solution: Add loop detection and explicit stop rules.

class LoopDetector:
    def __init__(self, window: int = 3):
        self.recent_actions = []
        self.window = window
    
    def is_looping(self, action: dict) -> bool:
        action_str = json.dumps(action, sort_keys=True)
        self.recent_actions.append(action_str)
        if len(self.recent_actions) > self.window:
            self.recent_actions.pop(0)
        return (len(set(self.recent_actions)) == 1 and 
                len(self.recent_actions) == self.window)
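
The detector above fires only when the same action is repeated consecutively. An agent can also alternate between two actions (A, B, A, B, ...). The sketch below is a hypothetical extension, not part of Hermes, that checks whether the most recent actions repeat with a short period; `CycleDetector` and its parameters are illustrative names.

```python
import json
from collections import deque


class CycleDetector:
    """Flags histories whose tail repeats a pattern of length 1..max_period."""

    def __init__(self, max_period: int = 2, repeats: int = 3):
        self.max_period = max_period
        self.repeats = repeats
        self.history = deque(maxlen=max_period * repeats)

    def is_cycling(self, action: dict) -> bool:
        self.history.append(json.dumps(action, sort_keys=True))
        items = list(self.history)
        for period in range(1, self.max_period + 1):
            needed = period * self.repeats
            if len(items) < needed:
                continue
            tail = items[-needed:]
            # The tail cycles if every entry equals the entry at the same
            # position within the first `period` entries.
            if all(tail[i] == tail[i % period] for i in range(needed)):
                return True
        return False


detector = CycleDetector()
a = {"tool": "search", "arguments": {"q": "x"}}
b = {"tool": "search", "arguments": {"q": "y"}}
flags = [detector.is_cycling(action) for action in [a, b, a, b, a, b]]
print(flags)  # [False, False, False, False, False, True]
```

With `max_period=1` this reduces to the same check as `LoopDetector` with `window=3`.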

Problem 2: JSON Parsing Failures

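Symptoms: json.JSONDecodeError when parsing tool arguments.

Solution: Parse defensively. Try strict JSON first, then try the span from the first "{" to the last "}" in the raw string, and finally fall back to wrapping the raw string under a generic key.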

def safe_parse_arguments(raw: str) -> dict:
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    json_pattern = re.search(r'\{.*\}', raw, re.DOTALL)
    if json_pattern:
        try:
            return json.loads(json_pattern.group())
        except json.JSONDecodeError:
            pass
    return {"input": raw}
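
When parsing falls back to a generic key, the tool would be called with arguments it does not accept. A possible complement, sketched here with hypothetical helper names, is to check the parsed arguments against the `required` list of the tool's JSON schema and return a readable observation instead of calling the tool.

```python
from typing import List, Optional


def missing_required(parameters: dict, arguments: dict) -> List[str]:
    """Names listed under 'required' in the schema but absent from arguments."""
    return [name for name in parameters.get("required", []) if name not in arguments]


def validation_error(tool_name: str, parameters: dict, arguments: dict) -> Optional[str]:
    """Return an error string suitable as an Observation, or None if valid."""
    missing = missing_required(parameters, arguments)
    if missing:
        return (f"Error: tool '{tool_name}' is missing required argument(s): "
                f"{', '.join(missing)}")
    return None


# Same shape as the calculator tool schema used in this chapter.
calculator_schema = {
    "type": "object",
    "properties": {"expression": {"type": "string"}},
    "required": ["expression"],
}

print(validation_error("calculator", calculator_schema, {"input": "2 + 3"}))
print(validation_error("calculator", calculator_schema, {"expression": "2 + 3"}))
```

Feeding the error string back as the Observation gives the model a chance to retry with the correct argument names.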

Problem 3: Context Overflow

Symptoms: LLM "forgets" early information; API returns context_length_exceeded.

Solution: Implement rolling context compression—preserve system prompt and recent N messages, summarize the middle section.
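
The rolling compression described above might look like the following sketch. Several points are assumptions: `naive_summary` is a placeholder for a real LLM summarization call, the summary is inserted as a `user` message, and the sketch also keeps the first user message so that the task itself is not lost. It avoids cutting between an assistant tool call and its `tool` result.

```python
from typing import Callable, List


def naive_summary(messages: List[dict]) -> str:
    """Placeholder summarizer: truncates each message instead of calling an LLM."""
    parts = []
    for m in messages:
        text = (m.get("content") or "").strip().replace("\n", " ")
        if text:
            parts.append(f"{m['role']}: {text[:60]}")
    return " | ".join(parts)


def compress_history(
    messages: List[dict],
    keep_recent: int = 4,
    summarize: Callable[[List[dict]], str] = naive_summary,
) -> List[dict]:
    """Keep system prompt, task, and recent messages; summarize the middle."""
    if len(messages) <= keep_recent + 2:
        return list(messages)
    cut = len(messages) - keep_recent
    # Do not separate a tool result from the assistant message that requested it.
    while cut > 2 and messages[cut]["role"] == "tool":
        cut -= 1
    middle = messages[2:cut]
    if not middle:
        return list(messages)
    summary = {"role": "user",
               "content": "Summary of earlier steps: " + summarize(middle)}
    return messages[:2] + [summary] + messages[cut:]


history = [{"role": "system", "content": "You are a ReAct Agent."},
           {"role": "user", "content": "Find the answer."}]
for i in range(5):
    history.append({"role": "assistant", "content": f"thought {i}"})
    history.append({"role": "tool", "content": f"observation {i}"})

compressed = compress_history(history, keep_recent=3)
print(len(history), "->", len(compressed))  # 12 -> 7
```

In the usage example, `keep_recent=3` would start the kept tail on a `tool` message, so the cut moves back by one and four recent messages are kept.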

Quick Reference Table

| Problem | Diagnosis | Solution |
| --- | --- | --- |
| Infinite loop | Print each action, check repetition | Loop detector + explicit stop rules |
| Parameter parse failure | Print raw arguments | Fault-tolerant parsing |
| Context overflow | Monitor token count | Summary compression |
| Poor final answer | Check observation quality | Improve tool output format |
| Too many steps | Track average step count | Tune system prompt |

Summary

This chapter provided a deep dive into ReAct theory and implementation:

  1. Paper principles: ReAct overcomes CoT and pure-action limitations by interleaving Thought (reasoning) and Action, grounding each decision in external observations.
  2. Three-phase loop: Thought → Action → Observation forms a closed feedback loop; each observation updates the LLM's world model.
  3. Hermes enhancements: Structured function calling, parallel tools, and context compression are the main improvements over standard ReAct.
  4. Complete code: ~120 lines implementing a working ReAct Agent with tool registration, a max-steps limit, tool error handling, and verbose logging.
  5. Common pitfalls: Infinite loops, argument parse failures, and context overflow—each with concrete solutions.

Review Questions

  1. Is the Thought in ReAct meant for the LLM itself or for the user? How would you hide Thoughts in production without losing their reasoning value?
  2. When a tool call fails, should the Agent immediately retry, switch tools, or report failure to the user? How do you design this strategy?
  3. To support parallel tool calls (multiple tools executing simultaneously), how must the message history format change?
  4. What is the fundamental difference between ReAct and plain CoT? For pure reasoning tasks requiring no external information, does ReAct still offer advantages?