Chapter 60: LlamaIndex Integration: Document Intelligence and Enterprise Knowledge Bases
60.1 LlamaIndex's Role: Infrastructure for Document Intelligence
LlamaIndex (formerly GPT Index) is an LLM application framework focused on data connectivity and retrieval. If LangChain is the "Swiss Army knife of LLM applications," LlamaIndex is the "professional toolbox for document intelligence." Its core value lies in:
- Unified data connectors: Support for 100+ data sources (PDF, Word, Excel, Confluence, Notion, databases, etc.)
- High-performance indexing engine: Vector indexes, keyword indexes, and knowledge graph indexes, composable
- Query engine: Converts natural language queries into structured retrieval, then uses an LLM to generate answers
- Agent framework: ReAct Agents based on retrieval
Using Claude as LlamaIndex's LLM backend suits enterprise knowledge base systems well: Claude brings strong reading comprehension and a long context window, and grounding its answers in retrieved passages helps reduce hallucinations.
60.2 Environment Setup
pip install llama-index llama-index-llms-anthropic llama-index-embeddings-voyageai
pip install llama-index-readers-file
pip install llama-index-vector-stores-chroma # optional
60.2.1 Configuring Claude as LlamaIndex's LLM
from llama_index.llms.anthropic import Anthropic
from llama_index.core import Settings
llm = Anthropic(
    model="claude-opus-4-5",
    api_key="your-api-key",  # or omit and set the ANTHROPIC_API_KEY env var
    max_tokens=4096,
    temperature=0,
)
Settings.llm = llm
# Configure the embedding model. Anthropic does not offer its own embedding model;
# its documentation points to Voyage AI, a separate provider.
from llama_index.embeddings.voyageai import VoyageEmbedding
embed_model = VoyageEmbedding(
    model_name="voyage-3",
    voyage_api_key="your-voyage-key",
)
Settings.embed_model = embed_model
# Chunk sizes are measured in tokens; Claude's 200K context leaves room for larger chunks
Settings.chunk_size = 1024
Settings.chunk_overlap = 128
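To make the interaction between chunk_size and chunk_overlap concrete, here is a minimal pure-Python sketch of a sliding-window splitter. It is an illustration only: the function name chunk_tokens is hypothetical, and LlamaIndex's real splitters are sentence-aware and count tokens with a tokenizer, which this sketch does not attempt.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens.

    Hypothetical helper for illustration; not part of LlamaIndex.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the input
    return chunks


tokens = [f"t{i}" for i in range(10)]
chunks = chunk_tokens(tokens, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

Each window starts chunk_size - chunk_overlap tokens after the previous one, so a larger overlap preserves more context across chunk boundaries at the cost of embedding more chunks.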
60.3 Building a Basic Document Index
60.3.1 Building an Index from Local Files
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core import StorageContext, load_index_from_storage
import os
PERSIST_DIR = "./index_storage"
if os.path.exists(PERSIST_DIR):
    # Load on subsequent runs without rebuilding
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)
    print("Index loaded from disk")
else:
    documents = SimpleDirectoryReader(
        input_dir="./company_docs",
        recursive=True,
        required_exts=[".pdf", ".docx", ".md", ".txt"],
        filename_as_id=True,
    ).load_data()
    # The reader returns Document objects (a PDF yields one per page), not chunks
    print(f"Loaded {len(documents)} documents")
    index = VectorStoreIndex.from_documents(documents, show_progress=True)
    index.storage_context.persist(persist_dir=PERSIST_DIR)
    print("Index saved")
60.3.2 Loading from Multiple Data Sources
from pathlib import Path
from llama_index.core import Document, VectorStoreIndex
from llama_index.readers.file import PDFReader
from llama_index.core.node_parser import SentenceSplitter
pdf_reader = PDFReader()
pdf_docs = pdf_reader.load_data(Path("./reports/annual_report_2024.pdf"))
# Custom Documents from API data
# (wiki_api is a placeholder for your own wiki client; it is not part of LlamaIndex)
api_docs = [
    Document(
        text=article["content"],
        metadata={
            "source": "company_wiki",
            "author": article["author"],
            "created_at": article["created_at"],
            "department": article["department"],
            "doc_type": "policy",
        },
    )
    for article in wiki_api.get_articles()
]
all_docs = pdf_docs + api_docs
# Custom splitter: split at sentence boundaries to preserve semantic integrity
splitter = SentenceSplitter(
    chunk_size=1024,
    chunk_overlap=128,
    paragraph_separator="\n\n",
)
index = VectorStoreIndex.from_documents(
    all_docs,
    transformations=[splitter],
    show_progress=True,
)
60.4 Query Engines: Extracting Knowledge from Documents
60.4.1 Basic Query Engine
from llama_index.core import VectorStoreIndex
from llama_index.core.retrievers import VectorIndexRetriever
query_engine = index.as_query_engine(
    similarity_top_k=5,
    response_mode="tree_summarize",  # Hierarchical summarization for long documents
    verbose=True,
)
response = query_engine.query("What are the business travel reimbursement limits?")
print(response.response)
# Inspect source documents
for node in response.source_nodes:
    print(f"\nSource: {node.metadata.get('file_name', 'Unknown')}")
    print(f"Relevance: {node.score:.3f}")
    print(f"Excerpt: {node.text[:200]}...")
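The similarity_top_k setting and the relevance scores printed above come from ranking chunk embeddings against the query embedding. The sketch below shows that ranking step in plain Python, assuming cosine similarity as the metric. The function names and the three-dimensional toy vectors are hypothetical; real embeddings have hundreds or thousands of dimensions and are produced by the embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k):
    """Return the k (chunk_id, score) pairs most similar to the query, best first."""
    scored = [
        (chunk_id, cosine_similarity(query_vec, vec))
        for chunk_id, vec in chunk_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors for illustration only
toy_vectors = {
    "travel_policy": [1.0, 0.0, 0.0],
    "expense_limits": [0.8, 0.6, 0.0],
    "api_reference": [0.0, 0.0, 1.0],
}
results = top_k([1.0, 0.1, 0.0], toy_vectors, k=2)
for chunk_id, score in results:
    print(f"{chunk_id}: {score:.3f}")
```

A production vector store performs the same ranking with approximate nearest-neighbor indexes rather than a full scan, but the meaning of the score is the same.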
60.4.2 Advanced Query Modes
from llama_index.core import Settings, VectorStoreIndex
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.question_gen import LLMQuestionGenerator
from llama_index.core.tools import QueryEngineTool
# Separate indexes for different document types
# (policy_docs, technical_docs, and hr_docs are lists of Documents loaded as in 60.3)
policy_index = VectorStoreIndex.from_documents(policy_docs)
technical_index = VectorStoreIndex.from_documents(technical_docs)
hr_index = VectorStoreIndex.from_documents(hr_docs)
tools = [
    QueryEngineTool.from_defaults(
        query_engine=policy_index.as_query_engine(),
        name="policy_search",
        description="Search company policies and regulations including reimbursements, attendance, and procurement",
    ),
    QueryEngineTool.from_defaults(
        query_engine=technical_index.as_query_engine(),
        name="technical_docs",
        description="Search technical documentation, API docs, and architecture design documents",
    ),
    QueryEngineTool.from_defaults(
        query_engine=hr_index.as_query_engine(),
        name="hr_knowledge",
        description="Search HR documents including benefits, leave policies, and promotion processes",
    ),
]
# SubQuestion engine: automatically decomposes complex questions.
# Passing an LLM-based question generator explicitly avoids the default path,
# which tries to import the OpenAI question generator package.
sub_question_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=tools,
    question_gen=LLMQuestionGenerator.from_defaults(llm=Settings.llm),
    verbose=True,
)
response = sub_question_engine.query(
    "What are the travel policies? How does accommodation reimbursement work, and how is overtime calculated during business trips?"
)
print(response.response)
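Conceptually, the sub-question engine works in three steps: decompose the question, route each sub-question to a tool, then synthesize the partial answers. The following sketch imitates the last two steps with stub functions. Everything in it is hypothetical: the decomposition is hard-coded where the real engine asks the LLM, the tools return canned strings instead of querying indexes, and synthesize stands in for the final LLM call.

```python
def run_sub_questions(sub_questions, tools):
    """Send each (tool_name, question) pair to its tool and collect the answers."""
    answers = []
    for tool_name, question in sub_questions:
        if tool_name not in tools:
            raise KeyError(f"No tool registered under {tool_name!r}")
        answers.append({
            "tool": tool_name,
            "question": question,
            "answer": tools[tool_name](question),
        })
    return answers


def synthesize(answers):
    """Stand-in for the final LLM call: join the partial answers into one response."""
    return "\n".join(f"[{a['tool']}] {a['answer']}" for a in answers)


# Stub tools: each returns a canned string instead of querying an index
stub_tools = {
    "policy_search": lambda q: f"policy answer to: {q}",
    "hr_knowledge": lambda q: f"HR answer to: {q}",
}
# In the real engine the LLM produces this list; here it is hard-coded
sub_questions = [
    ("policy_search", "How does accommodation reimbursement work?"),
    ("hr_knowledge", "How is overtime calculated during business trips?"),
]
final = synthesize(run_sub_questions(sub_questions, stub_tools))
print(final)
```

The sketch also shows why tool names and descriptions matter: the routing step can only be as good as the mapping from each sub-question to a tool.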
60.4.3 Streaming Responses
query_engine = index.as_query_engine(streaming=True)
streaming_response = query_engine.query("Summarize the company's main product milestones in 2024")
for text in streaming_response.response_gen:
    print(text, end="", flush=True)
print()
for node in streaming_response.source_nodes:
    print(f"Source: {node.metadata.get('source')}")
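A token generator such as response_gen can be consumed only once, so if the full answer is needed later (for logging or evaluation, say), it has to be collected while it is printed. The sketch below shows that pattern with a stand-in generator. fake_token_stream and print_and_collect are hypothetical names; in practice the real response_gen would be passed in instead.

```python
from typing import Iterable, Iterator


def fake_token_stream() -> Iterator[str]:
    """Stand-in for a streaming response generator (illustration only)."""
    yield from ["Streaming ", "lets ", "users ", "read ", "early."]


def print_and_collect(token_stream: Iterable[str]) -> str:
    """Print tokens as they arrive and return the concatenated text."""
    parts = []
    for token in token_stream:
        print(token, end="", flush=True)
        parts.append(token)
    print()  # finish the line once the stream ends
    return "".join(parts)


full_text = print_and_collect(fake_token_stream())
```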
60.5 Building an Enterprise Knowledge Base Agent
60.5.1 ReAct-based Knowledge Base Agent
# Note: ReActAgent.from_tools is the classic agent API. Recent llama-index-core
# releases move agents to llama_index.core.agent.workflow, so pin your version
# or adapt this snippet to the release you use.
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool, FunctionTool
from llama_index.llms.anthropic import Anthropic
llm = Anthropic(model="claude-opus-4-5", max_tokens=4096)
kb_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="knowledge_base",
    description="""Search the company's internal knowledge base. Suitable for:
- Company policy queries (reimbursement, attendance, leave)
- Product documentation and technical specifications
- Historical project information
- Organizational structure and contacts
Not suitable for: questions requiring real-time data""",
)
def get_employee_info(employee_id: str) -> str:
    """Retrieve real-time employee information from the HR system."""
    # Stub: returns canned data; replace with a real HR system call
    return f"Employee {employee_id}: Name=Alice Johnson, Dept=Engineering, Manager=Bob Smith"
employee_tool = FunctionTool.from_defaults(
    fn=get_employee_info,
    name="get_employee_info",
    description="Look up employee information by employee ID, returns name, department, and manager",
)
def get_current_projects() -> str:
    """Get list of currently active projects."""
    # Stub: returns canned data; replace with a real project system call
    return "Active projects:\n1. Intelligent Support System (Q2 2025)\n2. Data Platform Build-out (Q3 2025)"
projects_tool = FunctionTool.from_defaults(
    fn=get_current_projects,
    name="get_current_projects",
    description="Get the list of currently active company projects",
)
agent = ReActAgent.from_tools(
    tools=[kb_tool, employee_tool, projects_tool],
    llm=llm,
    verbose=True,
    max_iterations=8,
    context="""You are Aria, the company's intelligent knowledge assistant.
Your role is to help employees find information including company policies, technical docs, and personnel information.
Always cite your sources accurately. Clearly state when you are uncertain.""",
)
response = agent.chat("What is the reimbursement limit for new employee E12345? Which project are they on?")
print(response.response)
60.5.2 Multi-modal Knowledge Base (Images and Diagrams)
# Requires: pip install llama-index-multi-modal-llms-anthropic
from llama_index.multi_modal_llms.anthropic import AnthropicMultiModal
from llama_index.core import SimpleDirectoryReader, Document, VectorStoreIndex
mm_llm = AnthropicMultiModal(model="claude-opus-4-5", max_tokens=1024)
image_docs = SimpleDirectoryReader(
    input_dir="./diagrams",
    required_exts=[".png", ".jpg", ".jpeg"],
).load_data()
# Use Claude's vision to generate text descriptions of diagrams for indexing
text_descriptions = []
for img_doc in image_docs:
    description = mm_llm.complete(
        prompt="Please describe this image in detail, especially any process steps, component relationships, or data shown.",
        image_documents=[img_doc],
    )
    text_descriptions.append(Document(
        text=description.text,
        metadata={
            "source_image": img_doc.metadata.get("file_name"),
            "doc_type": "image_description",
        },
    ))
# text_docs: the text Documents loaded earlier (for example with SimpleDirectoryReader)
all_docs = text_docs + text_descriptions
index = VectorStoreIndex.from_documents(all_docs)
60.6 Incremental Index Updates
Enterprise knowledge bases change continuously; incremental updates are essential to avoid full rebuilds.
from llama_index.core import StorageContext, VectorStoreIndex, load_index_from_storage
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.embeddings.voyageai import VoyageEmbedding
embed_model = VoyageEmbedding(model_name="voyage-3")  # API key read from the environment
def build_pipeline() -> IngestionPipeline:
    """Create the pipeline; the same definition is reused for initial and incremental runs."""
    return IngestionPipeline(
        transformations=[
            SentenceSplitter(chunk_size=1024, chunk_overlap=128),
            embed_model,
        ],
        docstore=SimpleDocumentStore(),  # Tracks processed documents
    )
# Full initialization (initial_docs is your first batch of Documents)
pipeline = build_pipeline()
nodes = pipeline.run(documents=initial_docs, show_progress=True)
index = VectorStoreIndex(nodes)
index.storage_context.persist(persist_dir="./index_storage")
pipeline.persist("./pipeline_storage")
def update_knowledge_base(new_or_modified_docs: list):
    """Incrementally update the knowledge base, skipping unchanged documents."""
    # Rebuild the pipeline, then restore its docstore and cache from disk
    pipeline = build_pipeline()
    pipeline.load("./pipeline_storage")
    # With only a docstore attached, run() skips inputs whose content hash it has
    # already seen. Per the LlamaIndex docs, upserts keyed on doc_id + hash (which
    # also replace stale nodes) need a vector_store attached to the pipeline.
    new_nodes = pipeline.run(documents=new_or_modified_docs)
    if new_nodes:
        storage_context = StorageContext.from_defaults(persist_dir="./index_storage")
        index = load_index_from_storage(storage_context)
        index.insert_nodes(new_nodes)
        index.storage_context.persist(persist_dir="./index_storage")
        pipeline.persist("./pipeline_storage")
        print(f"Updated {len(new_nodes)} new nodes")
    else:
        print("No document changes detected")
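The change detection behind incremental updates can be pictured as a lookup table from document ID to content hash. The sketch below implements that idea with the standard library only. It is a simplified illustration, not LlamaIndex's implementation: content_hash and select_changed are hypothetical names, and the documents are toy strings.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def select_changed(docs: dict, seen_hashes: dict) -> list:
    """Return the IDs of new or modified documents and record their hashes.

    docs maps doc_id -> text; seen_hashes maps doc_id -> hash and is updated in place.
    """
    changed = []
    for doc_id, text in docs.items():
        digest = content_hash(text)
        if seen_hashes.get(doc_id) != digest:
            changed.append(doc_id)
            seen_hashes[doc_id] = digest
    return changed


seen = {}
# First run: both toy documents are new
first = select_changed({"leave.md": "20 days", "travel.md": "Economy class"}, seen)
# Second run: only travel.md changed, so only it needs re-processing
second = select_changed({"leave.md": "20 days", "travel.md": "Premium economy"}, seen)
print(first, second)
```

This only works when document IDs are stable across runs, which is why the loader in 60.3.1 sets filename_as_id=True.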
60.7 Retrieval Quality Evaluation
from llama_index.core.evaluation import (
    FaithfulnessEvaluator,
    RelevancyEvaluator,
)
from llama_index.llms.anthropic import Anthropic
eval_llm = Anthropic(model="claude-opus-4-5")
# Faithfulness: is the answer grounded in the retrieved context (anti-hallucination)?
faithfulness_evaluator = FaithfulnessEvaluator(llm=eval_llm)
# Relevancy: do the answer and its retrieved context actually address the question?
relevancy_evaluator = RelevancyEvaluator(llm=eval_llm)
test_questions = [
    "How many annual leave days do employees get?",
    "How do I apply for travel reimbursement?",
    "What benefits do probationary employees receive?",
]
for question in test_questions:
    response = query_engine.query(question)
    faithfulness_result = faithfulness_evaluator.evaluate_response(response=response)
    relevancy_result = relevancy_evaluator.evaluate_response(query=question, response=response)
    print(f"\nQuestion: {question}")
    print(f"Faithfulness: {faithfulness_result.score:.2f} ({faithfulness_result.feedback})")
    print(f"Relevancy: {relevancy_result.score:.2f}")
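Per-question scores are easier to track over time when they are rolled up into a mean and a pass rate per metric. The following sketch shows one way to do that, assuming scores in the range 0 to 1. The helper name summarize_scores, the threshold default, and the input rows are all hypothetical; the scores are placeholders for illustration, not measured results.

```python
from statistics import mean


def summarize_scores(results, threshold=1.0):
    """Compute mean and pass rate per metric from a list of per-question score dicts."""
    if not results:
        raise ValueError("results must contain at least one entry")
    summary = {}
    for metric in ("faithfulness", "relevancy"):
        scores = [row[metric] for row in results]
        summary[metric] = {
            "mean": mean(scores),
            # Share of questions whose score reaches the threshold
            "pass_rate": sum(score >= threshold for score in scores) / len(scores),
        }
    return summary


# Placeholder scores for illustration only, not measured results
placeholder_results = [
    {"question": "q1", "faithfulness": 1.0, "relevancy": 1.0},
    {"question": "q2", "faithfulness": 0.0, "relevancy": 1.0},
    {"question": "q3", "faithfulness": 1.0, "relevancy": 1.0},
    {"question": "q4", "faithfulness": 1.0, "relevancy": 0.0},
]
summary = summarize_scores(placeholder_results)
for metric, stats in summary.items():
    print(f"{metric}: mean={stats['mean']:.2f}, pass_rate={stats['pass_rate']:.2f}")
```

Running the same fixed question set after every index or prompt change turns these two numbers into a simple regression check.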
60.8 Production Deployment Recommendations
60.8.1 Using Chroma or Qdrant as Production Vector Database
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext, VectorStoreIndex
import chromadb
chroma_client = chromadb.HttpClient(host="localhost", port=8000)
chroma_collection = chroma_client.get_or_create_collection("company_knowledge")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    show_progress=True,
)
60.8.2 Caching Strategy
# Requires: pip install llama-index-storage-index-store-redis llama-index-storage-docstore-redis
from llama_index.core import StorageContext
from llama_index.storage.index_store.redis import RedisIndexStore
from llama_index.storage.docstore.redis import RedisDocumentStore
# Keep index metadata and documents in Redis to avoid reloading large indexes from disk
storage_context = StorageContext.from_defaults(
    index_store=RedisIndexStore.from_host_and_port(host="localhost", port=6379),
    docstore=RedisDocumentStore.from_host_and_port(host="localhost", port=6379),
)
Summary
The combination of LlamaIndex and Claude has clear advantages in enterprise knowledge base scenarios: Claude's 200K token context window handles extremely long documents in one pass, while LlamaIndex provides the complete toolchain for index building, incremental updates, and multi-source data fusion. The core architecture is: multi-source data loading → SentenceSplitter chunking → Voyage Embedding → vector store → query engine → Claude answer generation. In production, the incremental update Pipeline and a vector database (Chroma/Qdrant) are key to keeping the knowledge base current, while FaithfulnessEvaluator is an important tool for controlling hallucination risk.