Chapter 59: LangChain Integration: Best Practices for Hybrid Agent Architecture

59.1 Why Choose LangChain + Claude

LangChain is one of the most popular LLM application frameworks. It provides the infrastructure for building LLM applications: chained calls, Agent orchestration, memory management, vector stores, and more. Claude excels at complex reasoning, instruction following, and safety. Combining them draws on the strengths of both.

The combination is especially useful when building Agents that access multiple external tools, maintain conversation history, and integrate knowledge base retrieval, because the framework already supplies most of the plumbing you would otherwise write yourself.

59.2 Environment Setup and ChatAnthropic Initialization

59.2.1 Installing Dependencies

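pip install langchain langchain-anthropic langchain-community
pip install langchain-chroma langchain-voyageai  # vector store and embeddings for the RAG example (optional)
pip install langchain-openai  # only needed for the hybrid routing example (optional)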

59.2.2 Basic ChatAnthropic Configuration

from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage, SystemMessage

# Basic initialization
llm = ChatAnthropic(
    model="claude-opus-4-5",
    anthropic_api_key="your-api-key",  # better: omit this and set the ANTHROPIC_API_KEY env var
    max_tokens=4096,
    temperature=0,      # minimizes randomness; output may still vary slightly between runs
    timeout=60,
    max_retries=3,
)

# Simple invocation
response = llm.invoke([HumanMessage(content="Explain quantum entanglement")])
print(response.content)

# Invocation with system prompt
messages = [
    SystemMessage(content="You are a professional financial analyst. Answer concisely."),
    HumanMessage(content="What is a price-to-earnings ratio?")
]
response = llm.invoke(messages)
print(response.content)

59.2.3 Streaming Support

llm = ChatAnthropic(model="claude-opus-4-5", streaming=True)

for chunk in llm.stream([HumanMessage(content="Write a poem about autumn")]):
    print(chunk.content, end="", flush=True)

59.3 Building LCEL Chains

LCEL (LangChain Expression Language) is LangChain's core chain orchestration syntax, using the | pipe operator to connect components.
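
To make the pipe syntax less magical, here is a minimal pure-Python sketch of how an `|` operator can compose steps. The `Step` class and the three toy stages are hypothetical stand-ins written for this illustration; they are not LangChain's `Runnable` implementation, which also handles batching, streaming, and async execution.

```python
from typing import Any, Callable


class Step:
    """Hypothetical stand-in for a runnable: wraps a one-argument function."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new step that feeds a's output into b
        return Step(lambda value: other.invoke(self.invoke(value)))


# Three toy stages mirroring the shape of prompt | llm | parser
fill_prompt = Step(lambda d: f"Answer in {d['language']}: {d['question']}")
fake_llm = Step(lambda prompt: {"content": prompt.upper()})  # no API call
parse = Step(lambda message: message["content"])

chain = fill_prompt | fake_llm | parse
result = chain.invoke({"language": "English", "question": "what is lcel?"})
print(result)
```

Each stage only needs to accept the previous stage's output, which is why a prompt template, a model, and an output parser can be swapped independently.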

59.3.1 Basic Prompt + LLM Chain

from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatAnthropic(model="claude-opus-4-5")

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a professional {domain} consultant. Answer in {language}."),
    ("human", "{question}")
])

parser = StrOutputParser()

# Connect with | to form a chain
chain = prompt | llm | parser

result = chain.invoke({
    "domain": "legal",
    "language": "English",
    "question": "What is the maximum probationary period in an employment contract?"
})
print(result)

59.3.2 RAG Chain

from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_chroma import Chroma
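from langchain_voyageai import VoyageAIEmbeddings  # pip install langchain-voyageai

# Anthropic does not provide an embeddings API, so this example uses Voyage AI
# embeddings, which read the VOYAGE_API_KEY environment variable.
embeddings = VoyageAIEmbeddings(model="voyage-3")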
vectorstore = Chroma(
    collection_name="company_docs",
    embedding_function=embeddings,
    persist_directory="./chroma_db"
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

RAG_PROMPT = ChatPromptTemplate.from_messages([
    ("system", """You are the company's internal knowledge assistant.
Answer questions based on the retrieved documents below.
If the documents don't contain relevant information, say so clearly — do not fabricate answers.

Retrieved documents:
{context}"""),
    ("human", "{question}")
])

llm = ChatAnthropic(model="claude-opus-4-5", temperature=0)

def format_docs(docs):
    return "\n\n---\n\n".join(
        f"Source: {doc.metadata.get('source', 'Unknown')}\n{doc.page_content}"
        for doc in docs
    )

rag_chain = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough()
    }
    | RAG_PROMPT
    | llm
    | StrOutputParser()
)

answer = rag_chain.invoke("What is the company's annual leave policy?")
print(answer)

59.3.3 Sequential Chain

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-opus-4-5")
parser = StrOutputParser()

# Step 1: Analyze user feedback
analyze_prompt = ChatPromptTemplate.from_template(
    "Analyze the following user feedback and extract core issues (output as JSON):\n\n{feedback}"
)

# Step 2: Generate improvement suggestions
suggest_prompt = ChatPromptTemplate.from_template(
    "Based on the following analysis, provide concrete product improvement suggestions:\n\n{analysis}"
)

# Step 3: Generate an action plan
plan_prompt = ChatPromptTemplate.from_template(
    "Convert the following improvement suggestions into actionable quarterly OKRs:\n\n{suggestions}"
)

analyze_chain = analyze_prompt | llm | parser
suggest_chain = suggest_prompt | llm | parser
plan_chain = plan_prompt | llm | parser

full_pipeline = (
    analyze_chain
    | (lambda analysis: {"analysis": analysis})
    | suggest_chain
    | (lambda suggestions: {"suggestions": suggestions})
    | plan_chain
)

result = full_pipeline.invoke({
    "feedback": "The product loads too slowly and the UI is unintuitive. New users have no idea how to use it."
})
print(result)

59.4 Building LangChain Agents

LangChain Agents let Claude dynamically decide which tools to use, making them ideal for complex multi-step tasks.
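
The loop behind this behavior can be sketched in plain Python. Everything here is a hypothetical simplification: `scripted_model` stands in for the LLM, `TOOLS` is a made-up registry, and the real `AgentExecutor` does considerably more (message formatting, error handling, callbacks). The sketch only shows the decide, act, observe cycle and why an iteration cap matters.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tool registry: tool name -> function taking one string argument
TOOLS: Dict[str, Callable[[str], str]] = {
    "add_one": lambda text: str(int(text) + 1),
}

# (tool_name, tool_input) to request a tool, or (None, final_answer) to stop
Decision = Tuple[Optional[str], str]


def scripted_model(question: str, observations: List[str]) -> Decision:
    """Stand-in for the LLM: asks for one tool call, then answers."""
    if not observations:
        return ("add_one", question)
    return (None, f"The result is {observations[-1]}")


def run_agent(question: str, max_iterations: int = 5) -> str:
    observations: List[str] = []
    for _ in range(max_iterations):
        tool_name, payload = scripted_model(question, observations)
        if tool_name is None:
            return payload  # the model chose to answer instead of calling a tool
        # Run the requested tool and feed its output back to the model
        observations.append(TOOLS[tool_name](payload))
    return "Stopped: iteration limit reached"


answer = run_agent("41")
print(answer)
```

Without the `max_iterations` bound, a model that keeps requesting tools would loop forever, which is the failure mode the executor setting guards against.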

59.4.1 Agent with Built-in Tools

from langchain_anthropic import ChatAnthropic
from langchain_community.tools import DuckDuckGoSearchRun, WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

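# These community tools depend on extra packages (a DuckDuckGo search client and
# `wikipedia`); if one is missing, the ImportError names the package to install.
search_tool = DuckDuckGoSearchRun(name="web_search")
wiki_tool = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())
tools = [search_tool, wiki_tool]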

# agent_scratchpad placeholder is required
agent_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a helpful research assistant. Use the search tool for 
current information and Wikipedia for background knowledge. 
Synthesize multiple sources for accurate answers."""),
    MessagesPlaceholder(variable_name="chat_history", optional=True),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad")
])

llm = ChatAnthropic(model="claude-opus-4-5", temperature=0)

agent = create_tool_calling_agent(llm, tools, agent_prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    max_iterations=5,
    handle_parsing_errors=True
)

result = agent_executor.invoke({
    "input": "Who won the 2024 Nobel Prize in Physics? What did they research?"
})
print(result["output"])

59.4.2 Agent with Custom Tools

from langchain_core.tools import tool
from typing import Annotated

@tool
def query_customer_database(customer_id: Annotated[str, "Customer ID"]) -> str:
    """Query a customer's basic information and order history. Read-only."""
    mock_data = {
        "C001": {
            "name": "Alice Johnson",
            "email": "alice@example.com",  # placeholder address
            "orders": ["ORD-001", "ORD-002"],
            "total_spent": 2999.0
        }
    }
    customer = mock_data.get(customer_id)
    if not customer:
        return f"Customer not found: {customer_id}"
    return str(customer)

@tool
def calculate_discount(
    total_amount: Annotated[float, "Order total amount"],
    customer_level: Annotated[str, "Customer tier: silver/gold/platinum"]
) -> str:
    """Calculate discount amount based on customer tier."""
    # Normalize the tier so inputs like "Gold" or " gold " still match
    discount_rates = {"silver": 0.05, "gold": 0.10, "platinum": 0.15}
    rate = discount_rates.get(customer_level.strip().lower(), 0)
    discount = total_amount * rate
    return f"Discount rate: {rate:.0%}, discount: {discount:.2f}, final: {total_amount - discount:.2f}"

@tool
def send_notification(
    customer_email: Annotated[str, "Customer email"],
    message: Annotated[str, "Notification message"]
) -> str:
    """Send a notification email to the customer (mock in test environment)."""
    print(f"[MOCK EMAIL] To: {customer_email}\nContent: {message}")
    return f"Notification sent to {customer_email}"

custom_tools = [query_customer_database, calculate_discount, send_notification]

llm = ChatAnthropic(model="claude-opus-4-5")
agent = create_tool_calling_agent(llm, custom_tools, agent_prompt)
agent_executor = AgentExecutor(agent=agent, tools=custom_tools, verbose=True)

result = agent_executor.invoke({
    "input": "Look up customer C001, calculate their gold member discount on a $5000 purchase, then email them the result"
})

59.5 Adding Memory

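from langchain_anthropic import ChatAnthropic
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory

llm = ChatAnthropic(model="claude-opus-4-5")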

store = {}  # In production, use Redis or a database

def get_session_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

prompt_with_history = ChatPromptTemplate.from_messages([
    ("system", "You are an AI assistant with memory. Remember information the user tells you."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

chain_with_history = RunnableWithMessageHistory(
    prompt_with_history | llm | StrOutputParser(),
    get_session_history,
    input_messages_key="input",
    history_messages_key="history"
)

session_id = "user_001_session_1"

chain_with_history.invoke(
    {"input": "My name is Alice and I'm a software engineer."},
    config={"configurable": {"session_id": session_id}}
)

response = chain_with_history.invoke(
    {"input": "Do you remember my name and profession?"},
    config={"configurable": {"session_id": session_id}}
)
print(response)  # Should recall Alice is a software engineer

59.6 Hybrid Agent Architecture: Claude + Specialized Models

In complex enterprise scenarios, a single model is rarely optimal. Hybrid Agent architecture routes tasks to the most appropriate model.

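from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI  # pip install langchain-openai; reads OPENAI_API_KEY
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough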

# Claude for complex reasoning and content generation
claude = ChatAnthropic(model="claude-opus-4-5", temperature=0)

# GPT-4o-mini for simple classification and extraction (lower cost)
gpt_mini = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Classification using cheaper model
classify_prompt = ChatPromptTemplate.from_template(
    "Classify this request as: simple_query/complex_analysis/creative_writing\n\nRequest: {input}\n\nOutput only the classification:"
)
classify_chain = classify_prompt | gpt_mini | StrOutputParser()

# Claude for complex tasks
complex_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a deep analysis expert. Provide detailed, insightful analysis."),
    ("human", "{input}")
])
complex_chain = complex_prompt | claude | StrOutputParser()

# Cheaper model for simple queries
simple_prompt = ChatPromptTemplate.from_template("Answer concisely: {input}")
simple_chain = simple_prompt | gpt_mini | StrOutputParser()

def route(info):
    classification = info["classification"]
    if "complex" in classification or "creative" in classification:
        return complex_chain
    return simple_chain

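# Invoke with a plain string: it is passed through as "input" and also classified.
# When RunnableLambda returns a chain, that chain is invoked with the same dict.
hybrid_chain = (
    {"input": RunnablePassthrough(), "classification": classify_chain}
    | RunnableLambda(route)
)

print(hybrid_chain.invoke("Compare the long-term risks of index funds and bonds"))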
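
Classifier output is free-form text, so routing tends to be more reliable when the label is normalized before it is compared. The helper names below (`normalize_label`, `pick_route`) are hypothetical, and defaulting unknown output to the stronger model is an assumption made for this sketch, not something the chapter prescribes; defaulting to the cheaper chain is an equally valid policy when cost matters more.

```python
VALID_LABELS = ("simple_query", "complex_analysis", "creative_writing")


def normalize_label(raw: str, default: str = "complex_analysis") -> str:
    """Map free-form classifier output onto one of the known labels."""
    cleaned = raw.strip().lower()
    for label in VALID_LABELS:
        if label in cleaned:
            return label
    # Unrecognized output: fall back to the stronger (more expensive) route
    return default


def pick_route(raw: str) -> str:
    """Return the name of the chain that should handle the request."""
    label = normalize_label(raw)
    return "simple_chain" if label == "simple_query" else "complex_chain"


print(pick_route(" Simple_Query\n"))
print(pick_route("Classification: creative_writing."))
print(pick_route("I am not sure"))
```

In the chain itself, a function like `pick_route` would return the chain object rather than its name; strings are used here only so the sketch runs without any model.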

59.7 Error Handling and Production Configuration

59.7.1 Automatic Fallback

primary_llm = ChatAnthropic(model="claude-opus-4-5")     # High quality, higher cost
fallback_llm = ChatAnthropic(model="claude-haiku-4-5")   # Lower cost, faster

llm_with_fallback = primary_llm.with_fallbacks([fallback_llm])
# If the Opus call raises an exception (after its own retries), the same input is sent to Haiku
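
Conceptually, a fallback wrapper tries each model in order and returns the first result that does not raise. The sketch below illustrates that idea with plain functions; `call_with_fallbacks`, `flaky_primary`, and `cheap_backup` are hypothetical names, and this is not how LangChain implements `with_fallbacks` internally.

```python
from typing import Callable, Optional, Sequence


def call_with_fallbacks(
    prompt: str,
    models: Sequence[Callable[[str], str]],
) -> str:
    """Try each model in order; return the first successful result."""
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return model(prompt)
        except Exception as exc:  # real code would catch narrower error types
            last_error = exc
    raise RuntimeError("All models failed") from last_error


def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary model timed out")  # simulated outage


def cheap_backup(prompt: str) -> str:
    return f"[backup] {prompt}"


reply = call_with_fallbacks("Summarize the report", [flaky_primary, cheap_backup])
print(reply)
```

Catching every exception also hides programming errors such as a malformed request, so it is usually worth restricting fallbacks to transient failures; `with_fallbacks` accepts an `exceptions_to_handle` argument for that purpose.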

59.7.2 Callbacks and Observability

from langchain_core.callbacks import BaseCallbackHandler
import time

class PerformanceCallback(BaseCallbackHandler):
    def __init__(self):
        self.start_times = {}
        self.metrics = []

    def on_llm_start(self, serialized, prompts, **kwargs):
        run_id = str(kwargs.get("run_id", "unknown"))
        self.start_times[run_id] = time.time()

    def on_llm_end(self, response, **kwargs):
        run_id = str(kwargs.get("run_id", "unknown"))
        # pop() removes the entry so start_times does not grow without bound
        started = self.start_times.pop(run_id, None)
        if started is not None:
            elapsed = time.time() - started
            # llm_output is provider-specific; usage may be absent (for example when streaming)
            token_usage = (response.llm_output or {}).get("usage") or {}
            self.metrics.append({
                "elapsed_s": round(elapsed, 2),
                "input_tokens": token_usage.get("input_tokens", 0),
                "output_tokens": token_usage.get("output_tokens", 0)
            })

    def on_tool_start(self, serialized, input_str, **kwargs):
        print(f"[Tool Call] {serialized.get('name')}: {input_str[:100]}")

callback = PerformanceCallback()
llm = ChatAnthropic(model="claude-opus-4-5", callbacks=[callback])
llm.invoke("Explain overfitting in machine learning")
print(f"Call metrics: {callback.metrics}")
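
Raw per-call records become more useful once aggregated. The following sketch assumes the metrics list has the same shape the callback above produces (`elapsed_s`, `input_tokens`, `output_tokens`); the `summarize_metrics` helper and the sample numbers are made up for illustration.

```python
from typing import Dict, List


def summarize_metrics(metrics: List[Dict[str, float]]) -> Dict[str, float]:
    """Aggregate per-call records into call count, average latency, and token totals."""
    if not metrics:
        return {"calls": 0, "avg_elapsed_s": 0.0, "input_tokens": 0, "output_tokens": 0}
    total_elapsed = sum(m["elapsed_s"] for m in metrics)
    return {
        "calls": len(metrics),
        "avg_elapsed_s": round(total_elapsed / len(metrics), 2),
        "input_tokens": sum(m["input_tokens"] for m in metrics),
        "output_tokens": sum(m["output_tokens"] for m in metrics),
    }


# Illustrative records, not real measurements
sample = [
    {"elapsed_s": 1.5, "input_tokens": 20, "output_tokens": 100},
    {"elapsed_s": 2.5, "input_tokens": 30, "output_tokens": 200},
]
summary = summarize_metrics(sample)
print(summary)
```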

59.8 Best Practices Summary

Prompt design recommendations:

BEST_SYSTEM_PROMPT = """You are a {role}.

## Capabilities
- {capability_1}
- {capability_2}

## Behavioral guidelines
- Always answer based on provided context; do not fabricate information
- If uncertain, say so explicitly
- Use Markdown formatting for structured output

## Constraints
- Do not {constraint_1}
- Do not reveal {constraint_2}"""
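
As a usage sketch, a template of this shape can be filled with `str.format`. The template below is a shortened copy, and the role, capabilities, and constraints are invented values for a hypothetical support assistant.

```python
SYSTEM_TEMPLATE = """You are a {role}.

## Capabilities
- {capability_1}
- {capability_2}

## Constraints
- Do not {constraint_1}
- Do not reveal {constraint_2}"""

# Illustrative values; str.format raises KeyError if any placeholder is left unfilled
system_prompt = SYSTEM_TEMPLATE.format(
    role="customer support assistant",
    capability_1="Look up order status",
    capability_2="Explain the refund policy",
    constraint_1="promise delivery dates",
    constraint_2="internal ticket notes",
)
print(system_prompt)
```

If the filled prompt needs to contain literal braces (for example a JSON sample), double them as `{{` and `}}` in the template so they are not treated as placeholders.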

Performance optimization:

  1. Use temperature=0 for more consistent output on structured tasks (it reduces randomness but does not guarantee identical results)
  2. Set max_tokens to cap response length and cost, choosing a value large enough that long answers are not truncated
  3. Set max_iterations in Agents to prevent infinite loops
  4. Use streaming=True to improve perceived response speed
  5. Enable LangChain's SQLite cache for frequently repeated, identical calls
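
To show what exact-match caching buys (item 5), here is a standard-library sketch of a SQLite-backed response cache. It is an illustration of the idea only: `PromptCache` and `fake_model` are hypothetical, and in a LangChain application you would use the framework's own cache rather than this class.

```python
import hashlib
import sqlite3
from typing import Callable, List


class PromptCache:
    """Exact-match response cache backed by SQLite (illustrative only)."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, response TEXT)"
        )

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # The key covers both model and prompt, so changing either one is a miss
        return hashlib.sha256(f"{model}\n{prompt}".encode("utf-8")).hexdigest()

    def get_or_call(self, model: str, prompt: str, call: Callable[[str], str]) -> str:
        key = self._key(model, prompt)
        row = self.conn.execute(
            "SELECT response FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return row[0]  # cache hit: the model is not called
        response = call(prompt)
        self.conn.execute("INSERT INTO cache VALUES (?, ?)", (key, response))
        self.conn.commit()
        return response


calls: List[str] = []


def fake_model(prompt: str) -> str:
    calls.append(prompt)  # record each real "model" call
    return f"echo: {prompt}"


cache = PromptCache()
first = cache.get_or_call("demo-model", "What is RAG?", fake_model)
second = cache.get_or_call("demo-model", "What is RAG?", fake_model)
print(first == second, len(calls))
```

Because the key is an exact hash, even a one-character change to the prompt misses the cache, so this kind of caching pays off mainly for repeated identical requests such as fixed classification prompts.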

Summary

LangChain integrates with Claude through the ChatAnthropic class, supporting LCEL chain orchestration, Agent tool calling, and RAG retrieval augmentation. Hybrid Agent architecture routes tasks to different models based on complexity, balancing quality and cost. Production deployments should focus on: callback-based observability, automatic fallback strategies, persistent conversation history storage, and Agent iteration limits.
