Chapter 71

Case Study: Multi-Step Deep Research Agent

Chapter Introduction

The great paradox of the information age is that data grows exponentially while genuine insight becomes rarer. A high-quality competitive analysis report that once required days of analyst time can today be completed by a Hermes Agent in under an hour. The agent searches academic papers, news, and technical documentation, reads sources in full, cross-verifies facts, tracks every citation, and produces a structured report with traceable references. This chapter builds a complete multi-step deep research agent from scratch, focusing on how to automate the entire research workflow without sacrificing quality.

71.1 Requirements: Core Challenges of Automated Research

Where Time Goes in Manual Research

Time distribution in manual research:
Searching for relevant material     ██████ 30%
Reading and filtering content      ████████ 40%
Cross-verifying facts              ████ 20%
Writing and organizing report      ██ 10%

Core automation challenges:

Challenge	Description	Difficulty
Source reliability	Distinguishing authoritative vs. low-quality content	High
Fact-checking	Same fact may conflict across sources	High
Depth vs. breadth	Too broad = shallow; too narrow = gaps	Medium
Citation tracking	Every claim must have a traceable source	Medium
Content deduplication	Same info reported by many sources	Low
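
Content deduplication is rated the easiest challenge in the table above, and a minimal sketch suggests why. The example below drops search results whose canonical URL or snippet text has already been seen. The helper names (`canonical_url`, `dedupe_results`), the tracking-parameter list, and the 0.8 similarity threshold are illustrative assumptions, not part of the chapter's code.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of tracking parameters to ignore when comparing URLs.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content", "ref"}


def canonical_url(url: str) -> str:
    """Normalize a URL so trivially different links compare equal."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    host = parts.netloc.lower().removeprefix("www.")
    path = parts.path.rstrip("/") or "/"
    # The fragment is dropped because it never changes the fetched document.
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(query), ""))


def _shingles(text: str, size: int = 3) -> set:
    """Return the set of word n-grams used for similarity comparison."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity; empty sets never count as duplicates."""
    return len(a & b) / len(a | b) if a and b else 0.0


def dedupe_results(results: list, threshold: float = 0.8) -> list:
    """Keep the first result for each canonical URL or near-identical snippet."""
    kept, seen_urls, seen_shingles = [], set(), []
    for r in results:
        url = canonical_url(r["url"])
        sh = _shingles(r.get("snippet", ""))
        if url in seen_urls or any(jaccard(sh, s) >= threshold for s in seen_shingles):
            continue
        seen_urls.add(url)
        seen_shingles.append(sh)
        kept.append(r)
    return kept


results = [
    {"url": "https://www.example.com/report/?utm_source=x",
     "snippet": "Global chip sales rose sharply in the third quarter"},
    {"url": "https://example.com/report", "snippet": "Different wording entirely here"},
    {"url": "https://mirror.example.org/a",
     "snippet": "Global chip sales rose sharply in the third quarter"},
    {"url": "https://other.example.net/b",
     "snippet": "Battery prices fell for the second year running"},
]
unique = dedupe_results(results)  # keeps the first and the last entry
```

Exact-URL and word-shingle matching only catch near-verbatim copies. Paraphrased rewrites of the same story would need a semantic comparison, which is one reason the harder rows in the table remain hard.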

Agent Target Capabilities

Input: Research topic + optional depth/breadth parameters
Output: Structured Markdown report with citations
Process: Search → Filter → Read → Extract → Synthesize → Write
Quality: Citation tracking + cross-verification + confidence labels

71.2 System Architecture

Research Pipeline State Machine

┌─────────────────────────────────────────────────────────┐
│                   Research Agent Pipeline                │
│                                                         │
│  [INIT]      Parse topic, plan search strategy          │
│     ↓                                                   │
│  [SEARCH]    Multi-angle search (Tavily/SerpAPI)        │
│     ↓                                                   │
│  [FILTER]    Evaluate source reliability                │
│     ↓                                                   │
│  [READ]      Deep-read high-value pages (full text)     │
│     ↓                                                   │
│  [EXTRACT]   Distill key facts from each source         │
│     ↓                                                   │
│  [VERIFY]    Cross-verify: find corroboration/conflicts │
│     ↓                                                   │
│  [SYNTHESIZE] Combine findings, identify key themes     │
│     ↓                                                   │
│  [WRITE]     Generate structured research report        │
│     ↓                                                   │
│  [QA]        Validate citations, annotate confidence    │
└─────────────────────────────────────────────────────────┘
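
The implementation in 71.3 lets the model drive these steps through tool calls. As a complementary illustration, the sketch below makes the stage order from the diagram explicit in code. The `Stage` enum, `next_stage`, and `run_pipeline` names are hypothetical, and the handlers are placeholders for whatever each stage would actually do.

```python
from enum import Enum


class Stage(Enum):
    """Pipeline stages, declared in the order shown in the diagram."""
    INIT = "init"
    SEARCH = "search"
    FILTER = "filter"
    READ = "read"
    EXTRACT = "extract"
    VERIFY = "verify"
    SYNTHESIZE = "synthesize"
    WRITE = "write"
    QA = "qa"


# Enum members iterate in definition order, which gives the linear pipeline.
ORDER = list(Stage)


def next_stage(current: Stage):
    """Return the stage that follows `current`, or None after QA."""
    idx = ORDER.index(current)
    return ORDER[idx + 1] if idx + 1 < len(ORDER) else None


def run_pipeline(handlers: dict, state: dict) -> dict:
    """Run each stage's handler in order, threading a shared state dict through."""
    stage = Stage.INIT
    while stage is not None:
        handler = handlers.get(stage)
        if handler is not None:
            state = handler(state)
        # Record every stage visited so a run can be audited afterwards.
        state.setdefault("trace", []).append(stage.name)
        stage = next_stage(stage)
    return state


# Usage: only INIT has a handler here; the other stages are no-ops.
final = run_pipeline(
    {Stage.INIT: lambda s: {**s, "queries": [s["topic"] + " overview"]}},
    {"topic": "solid-state batteries"},
)
```

An explicit state machine like this trades flexibility for predictability: the model can no longer skip or repeat stages, but each stage can be tested and budgeted on its own.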

Tool Inventory

Tool	Description	API/Library
`search_web`	General web search	Tavily API
`search_academic`	Academic paper search	Semantic Scholar
`fetch_page_content`	Retrieve full page text	requests + BeautifulSoup
`extract_pdf_text`	PDF text extraction	pdfplumber
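`record_fact`	Store verified fact to memory	In-memory
`get_recorded_facts`	Retrieve recorded facts, optionally by category	In-memory
`cross_verify_fact`	Check fact against multiple sources	Hermes LLM
`write_final_report`	Compile full report	Template
`format_citations`	Format reference list	Custom

The implementation in 71.3 exposes a subset of these to the model through its tool schema: `search_web`, `search_academic`, `fetch_page_content`, `record_fact`, `get_recorded_facts`, and `write_final_report`.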

71.3 Full Implementation

Core Agent

# research_agent/agent.py
import os
import json
from datetime import datetime
from openai import OpenAI

client = OpenAI(
    base_url=os.getenv("HERMES_BASE_URL", "http://localhost:11434/v1"),
    api_key=os.getenv("HERMES_API_KEY", "ollama"),
)
MODEL = os.getenv("HERMES_MODEL", "nous-hermes-2-mixtral-8x7b-dpo")

SYSTEM_PROMPT = """You are a professional research analyst skilled at synthesizing insights from large volumes of information.

Your research methodology:
1. **Broad search**: Query from multiple angles, not just the surface question
2. **Deep reading**: Read full text of key sources, not just excerpts
3. **Cross-verification**: Confirm important claims with at least 2 independent sources
4. **Citation tracking**: Every data point must have a source — no unsourced assertions
5. **Structured output**: Reports must have clear hierarchical structure

Confidence annotation rules:
- HIGH: 2+ authoritative sources confirm the claim
- MEDIUM: 1 reliable source, not independently verified
- LOW: Speculation or single source, must be flagged

Report structure:
- Executive Summary (300 words max)
- Key Findings (3-5 bullets)
- Detailed Analysis (by section)
- Conclusions & Recommendations
- References"""

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the web using the Tavily search API",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "max_results": {"type": "integer", "default": 5},
                    "include_domains": {"type": "array", "items": {"type": "string"}},
                    "search_depth": {"type": "string", "enum": ["basic", "advanced"], "default": "advanced"}
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_academic",
            "description": "Search academic papers via Semantic Scholar",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "year_from": {"type": "integer"},
                    "max_results": {"type": "integer", "default": 5}
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "fetch_page_content",
            "description": "Retrieve the full text content of a URL",
            "parameters": {
                "type": "object",
                "properties": {
                    "url": {"type": "string"},
                    "max_chars": {"type": "integer", "default": 8000}
                },
                "required": ["url"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "record_fact",
            "description": "Record a verified fact into the research memory",
            "parameters": {
                "type": "object",
                "properties": {
                    "fact": {"type": "string"},
                    "source_url": {"type": "string"},
                    "source_title": {"type": "string"},
                    "confidence": {"type": "string", "enum": ["high", "medium", "low"]},
                    "category": {"type": "string"}
                },
                "required": ["fact", "source_url", "confidence", "category"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_recorded_facts",
            "description": "Retrieve all recorded research facts",
            "parameters": {
                "type": "object",
                "properties": {
                    "category": {"type": "string"}
                }
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "write_final_report",
            "description": "Compile all collected information into the final research report",
            "parameters": {
                "type": "object",
                "properties": {
                    "topic": {"type": "string"},
                    "facts": {"type": "array"},
                    "report_structure": {"type": "array", "items": {"type": "string"}}
                },
                "required": ["topic", "facts"]
            }
        }
    }
]


class ResearchMemory:
    """In-memory store for the URLs visited and facts recorded in one run."""

    def __init__(self):
        self.visited_urls: set[str] = set()
        self.facts: list[dict] = []

    def add_fact(self, fact, source_url, source_title, confidence, category):
        # IDs are 1-based and follow insertion order.
        fact_id = len(self.facts) + 1
        self.facts.append({
            "id": fact_id, "fact": fact,
            "source_url": source_url, "source_title": source_title,
            "confidence": confidence, "category": category
        })
        return {"success": True, "fact_id": fact_id}

    def get_facts(self, category=None):
        # With no category, return every recorded fact.
        return [f for f in self.facts if not category or f["category"] == category]


def run_research_agent(topic: str, depth: str = "comprehensive") -> dict:
    memory = ResearchMemory()
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"""Conduct a deep research on the following topic and produce a complete report:

**Topic:** {topic}
**Depth:** {depth}
**Date:** {datetime.now().strftime('%B %Y')}

Steps:
1. Plan search strategy (at least 3 different search angles)
2. Execute multiple search rounds from different sources
3. Deep-read the 5-10 most relevant sources in full
4. Record key facts using record_fact tool
5. Cross-verify important data points
6. Call write_final_report to generate the report

Begin the research."""}
    ]

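    max_iterations = 30  # hard stop so a confused model cannot loop forever
    for iteration in range(max_iterations):
        response = client.chat.completions.create(
            model=MODEL, messages=messages, tools=TOOLS,
            tool_choice="auto", temperature=0.3, max_tokens=4000
        )
        message = response.choices[0].message
        messages.append(message)

        # No tool calls means the model has produced its final answer.
        if not message.tool_calls:
            return {
                "status": "completed", "report": message.content,
                "facts_collected": len(memory.facts),
                "sources_visited": len(memory.visited_urls),
                "iterations": iteration + 1
            }

        for tc in message.tool_calls:
            try:
                args = json.loads(tc.function.arguments or "{}")
                # _dispatch routes the call to the matching Python tool function
                # and must return a JSON-serializable result.
                result = _dispatch(tc.function.name, args, memory)
            except json.JSONDecodeError as exc:
                # Report malformed arguments to the model so it can retry.
                result = {"error": f"invalid tool arguments: {exc}"}
            messages.append({
                "role": "tool", "tool_call_id": tc.id,
                "content": json.dumps(result, default=str)
            })

    # Iteration budget exhausted before the model produced a final report.
    return {
        "status": "max_iterations", "report": None,
        "facts_collected": len(memory.facts),
        "sources_visited": len(memory.visited_urls),
        "iterations": max_iterations
    }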

Search & Content Tools

# research_agent/tools/search_tools.py
import os

import requests

TAVILY_KEY = os.getenv("TAVILY_API_KEY")

def search_tavily(query, max_results=5, include_domains=None, search_depth="advanced"):
    payload = {
        "api_key": TAVILY_KEY, "query": query,
        "max_results": max_results, "search_depth": search_depth,
        "include_answer": True
    }
    if include_domains:
        payload["include_domains"] = include_domains
    resp = requests.post("https://api.tavily.com/search", json=payload, timeout=30)
    resp.raise_for_status()
    data = resp.json()
    return {
        "query": query,
        "ai_answer": data.get("answer", ""),
        "results": [
            {"title": r["title"], "url": r["url"],
             "snippet": r.get("content", "")[:500], "score": r.get("score", 0)}
            for r in data.get("results", [])
        ]
    }

def search_academic(query, year_from=None, max_results=5):
    params = {
        "query": query, "limit": max_results,
        "fields": "title,abstract,year,authors,citationCount,url"
    }
    if year_from:
        params["year"] = f"{year_from}-"  # open-ended range: year_from onward
    resp = requests.get(
        "https://api.semanticscholar.org/graph/v1/paper/search",
        params=params, timeout=30
    )
    resp.raise_for_status()  # surface rate limits and other HTTP errors
    papers = resp.json().get("data") or []
    # Fields such as abstract and citationCount can be null in the response,
    # so coalesce with `or` rather than relying on dict.get defaults.
    papers.sort(key=lambda p: p.get("citationCount") or 0, reverse=True)
    return {"query": query, "papers": [
        {"title": p.get("title"), "abstract": (p.get("abstract") or "")[:400],
         "year": p.get("year"), "citations": p.get("citationCount") or 0,
         "url": p.get("url") or ""}
        for p in papers
    ]}
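
The tool schema declares `fetch_page_content`, but the listing above does not implement it. The sketch below is one possible version. Note the substitution: it uses only the standard library (`urllib.request` and `html.parser`) rather than the requests and BeautifulSoup stack named in the tool inventory, so that it runs without extra dependencies. The `html_to_text` helper, the `_TextExtractor` class, and the User-Agent string are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class _TextExtractor(HTMLParser):
    """Collect visible text and the <title>, skipping script/style content."""

    SKIP = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._in_title = False
        self.title = ""
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_title:
            self.title += text
        elif self._skip_depth == 0 and text:
            self.chunks.append(text)


def html_to_text(html: str, max_chars: int = 8000) -> dict:
    """Reduce an HTML document to plain text, truncated to max_chars."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return {"title": parser.title, "text": text[:max_chars],
            "truncated": len(text) > max_chars}


def fetch_page_content(url: str, max_chars: int = 8000, timeout: int = 30) -> dict:
    """Download a page and return its visible text (performs a network request)."""
    req = Request(url, headers={"User-Agent": "research-agent-sketch/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    result = html_to_text(html, max_chars)
    result["url"] = url
    return result


sample = ("<html><head><title>Demo</title><style>p{}</style></head>"
          "<body><p>Hello</p><script>var x=1;</script><p>world</p></body></html>")
page = html_to_text(sample)  # title "Demo", text "Hello world"
```

Keeping the parsing step separate from the download makes it easy to test extraction on saved HTML, and the `truncated` flag tells the model when it has seen only part of a long page.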

71.4 Quality Control

Citation Tracking

Every factual assertion in the final report must be traceable. The CitationTracker class assigns numeric IDs to sources and injects [1], [2] style markers into the report text at write time.
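
The chapter does not list the `CitationTracker` class, so the following is only a minimal sketch of the behavior described: each distinct source URL receives the next numeric ID, repeated citations reuse that ID, and a reference list is rendered in ID order. The method names (`cite`, `annotate`, `references`) and the reference format are assumptions.

```python
class CitationTracker:
    """Assign stable numeric IDs to sources and render [n] style markers."""

    def __init__(self):
        self._ids: dict[str, int] = {}
        self._titles: dict[str, str] = {}

    def cite(self, url: str, title: str = "") -> str:
        """Return the marker for `url`, allocating a new ID on first use."""
        if url not in self._ids:
            self._ids[url] = len(self._ids) + 1
        if title and not self._titles.get(url):
            self._titles[url] = title
        return f"[{self._ids[url]}]"

    def annotate(self, fact: dict) -> str:
        """Append a citation marker to a fact recorded via record_fact."""
        marker = self.cite(fact["source_url"], fact.get("source_title", ""))
        return f'{fact["fact"]} {marker}'

    def references(self) -> str:
        """Render the reference list, one source per line, in ID order."""
        lines = []
        for url, n in sorted(self._ids.items(), key=lambda kv: kv[1]):
            title = self._titles.get(url)
            lines.append(f"[{n}] {title}. {url}" if title else f"[{n}] {url}")
        return "\n".join(lines)


tracker = CitationTracker()
first = tracker.cite("https://a.example/x", "Source A")
second = tracker.cite("https://b.example/y")
again = tracker.cite("https://a.example/x")  # same source, same ID as `first`
sentence = tracker.annotate({"fact": "Sales rose.", "source_url": "https://b.example/y"})
```

Because IDs are keyed by URL, a sentence that cites a source already used elsewhere in the report picks up the existing number instead of creating a duplicate reference entry.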

Confidence Scoring

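from urllib.parse import urlparse


def score_confidence(supporting_sources: list, contradicting_sources: list,
                     authoritative_domains: list) -> dict:
    def is_authoritative(url: str) -> bool:
        # Match on the hostname so "nature.com" does not match
        # "notnature.com" or a URL that merely contains the string.
        host = (urlparse(url).hostname or "").lower()
        return any(host == d or host.endswith("." + d) for d in authoritative_domains)

    # Authoritative sources count double; each contradiction subtracts one point.
    score = 0
    for s in supporting_sources:
        score += 2 if is_authoritative(s.get("url", "")) else 1
    score -= len(contradicting_sources)

    if score >= 3:
        return {"level": "high", "marker": "HIGH", "note": "Ready to cite"}
    if score >= 1:
        return {"level": "medium", "marker": "MEDIUM", "note": "Flag limited sources"}
    return {"level": "low", "marker": "LOW", "note": "Verify or mark as speculation"}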

71.5 Time & Cost Analysis

Resource Usage by Research Depth

Research Type	Search Rounds	Pages Read	LLM Calls	Est. Time	Est. Cost*
Quick overview	2-3	3-5	10-15	3-5 min	$0.05-0.15
Standard research	5-8	8-15	20-30	10-20 min	$0.20-0.60
Deep synthesis	10-15	15-30	35-50	30-60 min	$0.60-2.00
Academic-grade	20+	30-50	60-100	1-3 hrs	$2.00-8.00

* Estimates assume self-hosted Hermes inference. Commercial API costs are 5-10x higher.

Budget Control

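class CostOptimizer:
    def __init__(self, budget_usd: float = 1.0):
        self.budget = budget_usd
        self.spent = 0.0

    def record_cost(self, cost_usd: float) -> None:
        # Call after each LLM or search request; without this, `spent`
        # stays at 0.0 and the budget checks below never trigger.
        self.spent += cost_usd

    def should_continue(self, facts_count: int, min_facts: int = 10) -> bool:
        if self.spent >= self.budget:
            return False  # budget exhausted
        # Stop early once enough facts are collected and half the budget is used.
        if facts_count >= min_facts and self.spent >= self.budget * 0.5:
            return False
        return True

    def should_deep_read(self, relevance_score: float) -> bool:
        # Full-text reads require a relevant page and less than 80% of budget spent.
        return relevance_score >= 0.7 and self.spent < self.budget * 0.8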
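
The budget checks depend on an accurate running total in `spent`. One way to obtain it, sketched below, is to price each LLM call from its token counts. OpenAI-compatible servers typically report these as `response.usage.prompt_tokens` and `response.usage.completion_tokens`, though a self-hosted backend may omit them. The `TokenPrices` dataclass and `estimate_call_cost` function are hypothetical, and the per-token prices are placeholders rather than real rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrices:
    """USD per 1,000 tokens. The defaults are placeholders, not real prices."""
    prompt_per_1k: float = 0.0005
    completion_per_1k: float = 0.0015


def estimate_call_cost(prompt_tokens: int, completion_tokens: int,
                       prices: TokenPrices = TokenPrices()) -> float:
    """Estimate the USD cost of one chat completion from its token counts."""
    return ((prompt_tokens / 1000) * prices.prompt_per_1k
            + (completion_tokens / 1000) * prices.completion_per_1k)


# Usage: accumulate the cost of two calls, given (prompt, completion) token counts.
spent = 0.0
for prompt_tokens, completion_tokens in [(3000, 800), (5200, 1200)]:
    spent += estimate_call_cost(prompt_tokens, completion_tokens)
```

In the agent loop, the per-call estimate would be added to `CostOptimizer.spent` after every completion, so that `should_continue` and `should_deep_read` reflect actual usage.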

Chapter Summary

This chapter built a production-quality deep research agent covering:

Pipeline design: Full search → filter → read → extract → verify → synthesize → write chain
Quality mechanisms: Citation tracking, confidence scoring, cross-verification
Tool stack: Tavily search, Semantic Scholar, BeautifulSoup content extraction
Cost control: Depth-tiered research with built-in budget controller

The agent's core value is encoding the research analyst's methodology — systematic search, rigorous verification, complete citations — into agent behavior, not just "asking AI a question."

Discussion Questions

How should the agent handle contradicting facts from equally authoritative sources?
How can we prevent confirmation bias — only collecting evidence that supports a preset conclusion?
For non-English research topics, how should the agent balance source language diversity?
What automated quality metrics could evaluate a research report's reliability without human review?
