Amazon Bedrock (Mantle) Integration: Complete Guide for SigV4 Authentication, Regional Endpoints and Quota Requests
Chapter 59: LangChain Integration: Best Practices for Hybrid Agent Architecture
59.1 Why Choose LangChain + Claude
LangChain is one of the most popular LLM application frameworks, providing a complete infrastructure for building LLM applications: chained calls, Agent orchestration, memory management, vector stores, and more. Claude excels at complex reasoning, instruction following, and safety. Combining them leverages the strengths of both:
- LangChain provides framework capabilities: tool ecosystem, chain orchestration, multi-modal retrieval, memory management
- Claude provides reasoning capabilities: a large context window (200K tokens), strong instruction following, and a low tendency to hallucinate
The combination is especially useful when building Agents that access multiple external tools, maintain conversation history, and integrate knowledge base retrieval, because the framework supplies much of the infrastructure that would otherwise be written by hand.
59.2 Environment Setup and ChatAnthropic Initialization
59.2.1 Installing Dependencies
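pip install langchain langchain-anthropic langchain-community
pip install langchain-chroma  # for the vector store (optional)
pip install langchain-openai  # only needed for the hybrid example in 59.6
pip install wikipedia duckduckgo-search  # assumed backends for the tools in 59.4.1; package names may differ by release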
59.2.2 Basic ChatAnthropic Configuration
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage, SystemMessage

# Basic initialization
llm = ChatAnthropic(
    model="claude-opus-4-5",
    anthropic_api_key="your-api-key",  # or set the ANTHROPIC_API_KEY env var
    max_tokens=4096,
    temperature=0,  # minimizes sampling randomness; outputs can still vary slightly
    timeout=60,
    max_retries=3,
)

# Simple invocation
response = llm.invoke([HumanMessage(content="Explain quantum entanglement")])
print(response.content)

# Invocation with system prompt
messages = [
    SystemMessage(content="You are a professional financial analyst. Answer concisely."),
    HumanMessage(content="What is a price-to-earnings ratio?"),
]
response = llm.invoke(messages)
print(response.content)
59.2.3 Streaming Support
llm = ChatAnthropic(model="claude-opus-4-5", streaming=True)
for chunk in llm.stream([HumanMessage(content="Write a poem about autumn")]):
    print(chunk.content, end="", flush=True)
59.3 Building LCEL Chains
LCEL (LangChain Expression Language) is LangChain's core chain orchestration syntax, using the | pipe operator to connect components.
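To make the pipe syntax less opaque, here is a minimal plain-Python sketch of how `|` composition can work through operator overloading. The `Step` class and the stand-in prompt, model, and parser are hypothetical and written only for illustration; this is not LangChain's actual implementation, just the general idea behind it.

```python
class Step:
    """A tiny stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        # `a | b` builds a new step that feeds a's output into b.
        nxt = other if isinstance(other, Step) else Step(other)
        return Step(lambda value: nxt.invoke(self.invoke(value)))


# Hypothetical stand-ins for a prompt template, a model, and an output parser.
prompt = Step(lambda d: f"Answer in {d['language']}: {d['question']}")
fake_llm = Step(lambda text: {"content": text.upper()})
parser = Step(lambda msg: msg["content"])

chain = prompt | fake_llm | parser
result = chain.invoke({"language": "English", "question": "hi"})
print(result)
```

Each stage only needs to accept the previous stage's output, which is why a prompt, a model, and a parser can be swapped independently.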
59.3.1 Basic Prompt + LLM Chain
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
llm = ChatAnthropic(model="claude-opus-4-5")
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a professional {domain} consultant. Answer in {language}."),
    ("human", "{question}"),
])
parser = StrOutputParser()

# Connect with | to form a chain
chain = prompt | llm | parser

result = chain.invoke({
    "domain": "legal",
    "language": "English",
    "question": "What is the maximum probationary period in an employment contract?",
})
print(result)
59.3.2 RAG Chain
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
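from langchain_chroma import Chroma
from langchain_voyageai import VoyageAIEmbeddings  # pip install langchain-voyageai

# voyage-3 is a Voyage AI model; langchain_anthropic does not provide an embeddings class.
# The API key is read from the VOYAGE_API_KEY environment variable.
embeddings = VoyageAIEmbeddings(model="voyage-3")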
vectorstore = Chroma(
    collection_name="company_docs",
    embedding_function=embeddings,
    persist_directory="./chroma_db",
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

RAG_PROMPT = ChatPromptTemplate.from_messages([
    ("system", """You are the company's internal knowledge assistant.
Answer questions based on the retrieved documents below.
If the documents don't contain relevant information, say so clearly; do not fabricate answers.
Retrieved documents:
{context}"""),
    ("human", "{question}"),
])

llm = ChatAnthropic(model="claude-opus-4-5", temperature=0)


def format_docs(docs):
    # Join retrieved documents into one context string, each labeled with its source
    return "\n\n---\n\n".join(
        f"Source: {doc.metadata.get('source', 'Unknown')}\n{doc.page_content}"
        for doc in docs
    )


rag_chain = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough(),
    }
    | RAG_PROMPT
    | llm
    | StrOutputParser()
)

answer = rag_chain.invoke("What is the company's annual leave policy?")
print(answer)
59.3.3 Sequential Chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_anthropic import ChatAnthropic
llm = ChatAnthropic(model="claude-opus-4-5")
parser = StrOutputParser()
# Step 1: Analyze user feedback
analyze_prompt = ChatPromptTemplate.from_template(
    "Analyze the following user feedback and extract core issues (output as JSON):\n\n{feedback}"
)

# Step 2: Generate improvement suggestions
suggest_prompt = ChatPromptTemplate.from_template(
    "Based on the following analysis, provide concrete product improvement suggestions:\n\n{analysis}"
)

# Step 3: Generate an action plan
plan_prompt = ChatPromptTemplate.from_template(
    "Convert the following improvement suggestions into actionable quarterly OKRs:\n\n{suggestions}"
)

analyze_chain = analyze_prompt | llm | parser
suggest_chain = suggest_prompt | llm | parser
plan_chain = plan_prompt | llm | parser

# Each lambda wraps the previous step's string output in the dict the next prompt expects
full_pipeline = (
    analyze_chain
    | (lambda analysis: {"analysis": analysis})
    | suggest_chain
    | (lambda suggestions: {"suggestions": suggestions})
    | plan_chain
)

result = full_pipeline.invoke({
    "feedback": "The product loads too slowly and the UI is unintuitive. New users have no idea how to use it."
})
print(result)
59.4 Building LangChain Agents
LangChain Agents let Claude dynamically decide which tools to use, making them ideal for complex multi-step tasks.
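The control flow behind such an agent can be sketched without any framework: the model is asked what to do next, the chosen tool runs, and the observation is fed back until the model produces a final answer or an iteration limit is hit. Everything below (`run_agent`, `scripted_decide`, the `lookup` tool, and the action dict shape) is a hypothetical illustration of that loop, not LangChain's internals.

```python
from typing import Callable


def run_agent(decide: Callable, tools: dict, question: str, max_iterations: int = 5) -> str:
    """Ask the model what to do, run the chosen tool, and feed the result back."""
    scratchpad = []  # (tool_name, tool_input, observation) tuples seen so far
    for _ in range(max_iterations):
        action = decide(question, scratchpad)
        if action["type"] == "final":
            return action["answer"]
        observation = tools[action["tool"]](action["input"])
        scratchpad.append((action["tool"], action["input"], observation))
    return "Stopped: iteration limit reached"


# Hypothetical scripted "model": calls one tool, then answers from its observation.
def scripted_decide(question, scratchpad):
    if not scratchpad:
        return {"type": "tool", "tool": "lookup", "input": question}
    return {"type": "final", "answer": f"Based on: {scratchpad[-1][2]}"}


tools = {"lookup": lambda q: f"notes about {q!r}"}
answer = run_agent(scripted_decide, tools, "entanglement")
print(answer)
```

The iteration cap plays the same role as an executor's `max_iterations` setting: a model that keeps requesting tools cannot loop forever.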
59.4.1 Agent with Built-in Tools
from langchain_anthropic import ChatAnthropic
from langchain_community.tools import DuckDuckGoSearchRun, WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
search_tool = DuckDuckGoSearchRun(name="web_search")
wiki_tool = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())
tools = [search_tool, wiki_tool]
# agent_scratchpad placeholder is required
agent_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a helpful research assistant. Use the search tool for
current information and Wikipedia for background knowledge.
Synthesize multiple sources for accurate answers."""),
    MessagesPlaceholder(variable_name="chat_history", optional=True),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

llm = ChatAnthropic(model="claude-opus-4-5", temperature=0)
agent = create_tool_calling_agent(llm, tools, agent_prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    max_iterations=5,
    handle_parsing_errors=True,
)

result = agent_executor.invoke({
    "input": "Who won the 2024 Nobel Prize in Physics? What did they research?"
})
print(result["output"])
59.4.2 Agent with Custom Tools
from typing import Annotated

from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_anthropic import ChatAnthropic
from langchain_core.tools import tool


@tool
def query_customer_database(customer_id: Annotated[str, "Customer ID"]) -> str:
    """Query a customer's basic information and order history. Read-only."""
    mock_data = {
        "C001": {
            "name": "Alice Johnson",
            "email": "alice@example.com",  # placeholder address for the mock record
            "orders": ["ORD-001", "ORD-002"],
            "total_spent": 2999.0,
        }
    }
    customer = mock_data.get(customer_id)
    if not customer:
        return f"Customer not found: {customer_id}"
    return str(customer)


@tool
def calculate_discount(
    total_amount: Annotated[float, "Order total amount"],
    customer_level: Annotated[str, "Customer tier: silver/gold/platinum"],
) -> str:
    """Calculate discount amount based on customer tier."""
    discount_rates = {"silver": 0.05, "gold": 0.10, "platinum": 0.15}
    rate = discount_rates.get(customer_level, 0)  # unknown tiers get no discount
    discount = total_amount * rate
    return f"Discount rate: {rate*100}%, discount: {discount:.2f}, final: {total_amount - discount:.2f}"


@tool
def send_notification(
    customer_email: Annotated[str, "Customer email"],
    message: Annotated[str, "Notification message"],
) -> str:
    """Send a notification email to the customer (mock in test environment)."""
    print(f"[MOCK EMAIL] To: {customer_email}\nContent: {message}")
    return f"Notification sent to {customer_email}"


custom_tools = [query_customer_database, calculate_discount, send_notification]
llm = ChatAnthropic(model="claude-opus-4-5")

# Reuses agent_prompt from 59.4.1
agent = create_tool_calling_agent(llm, custom_tools, agent_prompt)
agent_executor = AgentExecutor(agent=agent, tools=custom_tools, verbose=True)

result = agent_executor.invoke({
    "input": "Look up customer C001, calculate their gold member discount on a $5000 purchase, then email them the result"
})
print(result["output"])
59.5 Adding Memory
from langchain_anthropic import ChatAnthropic
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory

llm = ChatAnthropic(model="claude-opus-4-5")
store = {}  # In production, use Redis or a database


def get_session_history(session_id: str) -> InMemoryChatMessageHistory:
    # Create a history object the first time a session is seen
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]


prompt_with_history = ChatPromptTemplate.from_messages([
    ("system", "You are an AI assistant with memory. Remember information the user tells you."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])

chain_with_history = RunnableWithMessageHistory(
    prompt_with_history | llm | StrOutputParser(),
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

session_id = "user_001_session_1"
chain_with_history.invoke(
    {"input": "My name is Alice and I'm a software engineer."},
    config={"configurable": {"session_id": session_id}},
)
response = chain_with_history.invoke(
    {"input": "Do you remember my name and profession?"},
    config={"configurable": {"session_id": session_id}},
)
print(response)  # Should recall that Alice is a software engineer
59.6 Hybrid Agent Architecture: Claude + Specialized Models
In complex enterprise scenarios, a single model is rarely optimal. Hybrid Agent architecture routes tasks to the most appropriate model.
from langchain_anthropic import ChatAnthropic
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import ChatOpenAI

# Claude for complex reasoning and content generation
claude = ChatAnthropic(model="claude-opus-4-5", temperature=0)
# GPT-4o-mini for simple classification and extraction (lower cost)
gpt_mini = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Classification using the cheaper model
classify_prompt = ChatPromptTemplate.from_template(
    "Classify this request as: simple_query/complex_analysis/creative_writing\n\nRequest: {input}\n\nOutput only the classification:"
)
classify_chain = classify_prompt | gpt_mini | StrOutputParser()

# Claude for complex tasks
complex_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a deep analysis expert. Provide detailed, insightful analysis."),
    ("human", "{input}"),
])
complex_chain = complex_prompt | claude | StrOutputParser()

# Cheaper model for simple queries
simple_prompt = ChatPromptTemplate.from_template("Answer concisely: {input}")
simple_chain = simple_prompt | gpt_mini | StrOutputParser()


def route(info):
    # Returning a chain from the routing function causes it to be run on the same input
    classification = info["classification"]
    if "complex" in classification or "creative" in classification:
        return complex_chain
    return simple_chain


hybrid_chain = (
    {"input": RunnablePassthrough(), "classification": classify_chain}
    | RunnableLambda(route)
)

# The chain takes the raw request string as input
print(hybrid_chain.invoke("Compare the long-term trade-offs of monoliths and microservices"))
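A classifier model does not always return the bare label it was asked for; it may add whitespace, capitalization, or extra words. One way to make the routing step more tolerant is to normalize the raw output before branching. The sketch below is a hypothetical helper, not part of LangChain, and the choice to send unrecognized output to the stronger model is an assumption made here for safety.

```python
VALID_LABELS = ("simple_query", "complex_analysis", "creative_writing")


def normalize_label(raw: str, default: str = "complex_analysis") -> str:
    """Map free-form classifier output onto one of the known labels."""
    cleaned = raw.strip().lower()
    for label in VALID_LABELS:
        if label in cleaned:
            return label
    return default  # unrecognized output goes to the stronger model


def pick_model(label: str) -> str:
    # Only clearly simple requests are sent to the cheaper model.
    return "cheap" if label == "simple_query" else "strong"


for raw in ["simple_query", " Complex_Analysis.\n", "I think: creative_writing", "???"]:
    print(repr(raw), "->", pick_model(normalize_label(raw)))
```

In the chain above, a function like `normalize_label` could be applied to `info["classification"]` inside `route` before the comparison.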
59.7 Error Handling and Production Configuration
59.7.1 Automatic Fallback
primary_llm = ChatAnthropic(model="claude-opus-4-5") # High quality, higher cost
fallback_llm = ChatAnthropic(model="claude-haiku-4-5") # Lower cost, faster
llm_with_fallback = primary_llm.with_fallbacks([fallback_llm])
# Automatically switches to Haiku if Opus fails
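The fallback behavior amounts to trying each model in order and returning the first result that does not raise. A minimal plain-Python sketch of that idea follows; `call_with_fallbacks` and the two fake models are hypothetical, and a real deployment would likely narrow the exception types instead of catching everything.

```python
def call_with_fallbacks(prompt, models):
    """Try each (name, callable) pair in order; return the first success."""
    last_error = None
    for name, call in models:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real setup would catch specific error types
            last_error = exc
    raise RuntimeError("all models failed") from last_error


def flaky_primary(prompt):
    # Simulates a primary model that is unavailable.
    raise TimeoutError("primary timed out")


def steady_fallback(prompt):
    return f"echo: {prompt}"


used, output = call_with_fallbacks(
    "hello", [("primary", flaky_primary), ("fallback", steady_fallback)]
)
print(used, output)
```

Returning the name of the model that answered makes it easy to log how often the fallback path is taken.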
59.7.2 Callbacks and Observability
import time

from langchain_anthropic import ChatAnthropic
from langchain_core.callbacks import BaseCallbackHandler


class PerformanceCallback(BaseCallbackHandler):
    def __init__(self):
        self.start_times = {}
        self.metrics = []

    def on_llm_start(self, serialized, prompts, **kwargs):
        run_id = str(kwargs.get("run_id", "unknown"))
        self.start_times[run_id] = time.time()

    def on_llm_end(self, response, **kwargs):
        run_id = str(kwargs.get("run_id", "unknown"))
        # pop so that finished runs do not accumulate in start_times
        started = self.start_times.pop(run_id, None)
        if started is not None:
            elapsed = time.time() - started
            token_usage = response.llm_output.get("usage", {}) if response.llm_output else {}
            self.metrics.append({
                "elapsed_s": round(elapsed, 2),
                "input_tokens": token_usage.get("input_tokens", 0),
                "output_tokens": token_usage.get("output_tokens", 0),
            })

    def on_tool_start(self, serialized, input_str, **kwargs):
        print(f"[Tool Call] {serialized.get('name')}: {input_str[:100]}")


callback = PerformanceCallback()
llm = ChatAnthropic(model="claude-opus-4-5", callbacks=[callback])
llm.invoke("Explain overfitting in machine learning")
print(f"Call metrics: {callback.metrics}")
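Per-call records become more useful once aggregated. The following sketch summarizes a list shaped like `callback.metrics` into totals and a mean latency. The `summarize` helper is hypothetical, and the sample records are invented for illustration rather than measured.

```python
def summarize(metrics):
    """Aggregate per-call records into totals and a mean latency."""
    if not metrics:
        return {"calls": 0, "input_tokens": 0, "output_tokens": 0, "mean_elapsed_s": 0.0}
    return {
        "calls": len(metrics),
        "input_tokens": sum(m["input_tokens"] for m in metrics),
        "output_tokens": sum(m["output_tokens"] for m in metrics),
        "mean_elapsed_s": round(sum(m["elapsed_s"] for m in metrics) / len(metrics), 2),
    }


# Invented sample records with the same keys the callback collects.
sample = [
    {"elapsed_s": 1.5, "input_tokens": 20, "output_tokens": 100},
    {"elapsed_s": 2.5, "input_tokens": 30, "output_tokens": 200},
]
summary = summarize(sample)
print(summary)
```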
59.8 Best Practices Summary
Prompt design recommendations:
BEST_SYSTEM_PROMPT = """You are a {role}.
## Capabilities
- {capability_1}
- {capability_2}
## Behavioral guidelines
- Always answer based on provided context; do not fabricate information
- If uncertain, say so explicitly
- Use Markdown formatting for structured output
## Constraints
- Do not {constraint_1}
- Do not reveal {constraint_2}"""
Performance optimization:
- Use temperature=0 for deterministic output on structured tasks
- Set max_tokens for long documents to avoid unnecessary lengthy output
- Set max_iterations in Agents to prevent infinite loops
- Use streaming=True to improve perceived response speed
- Enable LangChain's SQLite cache for high-frequency calls
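To illustrate why a cache helps with high-frequency calls, here is a minimal standard-library sketch of a response cache keyed by model name and prompt. `PromptCache` and `fake_model` are hypothetical and only show the idea; in a LangChain application the built-in cache would normally be enabled instead (to my knowledge via `set_llm_cache` with `SQLiteCache`, though the exact import paths should be checked against the installed version).

```python
import hashlib
import sqlite3


class PromptCache:
    """Cache responses in SQLite, keyed by a hash of model name and prompt."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, response TEXT)"
        )

    @staticmethod
    def _key(model, prompt):
        return hashlib.sha256(f"{model}\n{prompt}".encode("utf-8")).hexdigest()

    def get_or_call(self, model, prompt, call):
        key = self._key(model, prompt)
        row = self.conn.execute(
            "SELECT response FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return row[0]  # cache hit: the model is not called
        response = call(prompt)
        self.conn.execute(
            "INSERT INTO cache (key, response) VALUES (?, ?)", (key, response)
        )
        self.conn.commit()
        return response


calls = []


def fake_model(prompt):
    # Records each real invocation so cache hits can be observed.
    calls.append(prompt)
    return f"answer to {prompt}"


cache = PromptCache()
first = cache.get_or_call("demo-model", "What is LCEL?", fake_model)
second = cache.get_or_call("demo-model", "What is LCEL?", fake_model)
print(first == second, len(calls))
```

Exact-match caching of this kind only pays off when identical prompts recur, which is why it pairs naturally with temperature=0 on structured tasks.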
Summary
LangChain integrates with Claude through the ChatAnthropic class, supporting LCEL chain orchestration, Agent tool calling, and RAG retrieval augmentation. Hybrid Agent architecture routes tasks to different models based on complexity, balancing quality and cost. Production deployments should focus on: callback-based observability, automatic fallback strategies, persistent conversation history storage, and Agent iteration limits.