Chapter 16: RAG Knowledge Base Workflows
Retrieval-Augmented Generation (RAG) is a well-established approach for combining private enterprise documents with large language models. n8n natively supports the full RAG pipeline (vector storage, document loading, embeddings) with little or no custom code. This chapter walks through the principles, node choices, and a complete enterprise knowledge base Q&A bot project.
16.1 RAG Principles: Five Core Steps
A RAG system consists of two pipelines. The indexing pipeline runs once (or periodically): load documents → chunk text → compute embeddings → store vectors. The query pipeline runs on every user question: embed the query → retrieve top-K similar chunks → feed chunks + question to LLM → return answer.
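The two pipelines can be sketched in memory as follows. This is a minimal illustration, not how n8n implements them: `toyEmbed` is a hypothetical stand-in for a real embedding model (it only counts words from a tiny fixed vocabulary), the sample documents are invented, and the final LLM call is omitted.

```javascript
// Toy embedding: counts of words from a tiny fixed vocabulary (hypothetical, illustration only).
const VOCAB = ['vacation', 'policy', 'expense', 'report', 'laptop', 'days'];

function toyEmbed(text) {
  const words = text.toLowerCase().match(/[a-z]+/g) || [];
  return VOCAB.map(term => words.filter(w => w === term).length);
}

// Cosine similarity between two vectors of equal length; 0 if either is all zeros.
function cosine(a, b) {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = v => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Indexing pipeline: chunks -> embeddings -> stored vectors.
function buildIndex(chunks) {
  return chunks.map(text => ({ text, vector: toyEmbed(text) }));
}

// Query pipeline: embed the question -> rank stored chunks -> keep the top K.
function retrieve(index, question, topK) {
  const q = toyEmbed(question);
  return index
    .map(entry => ({ text: entry.text, score: cosine(q, entry.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

const index = buildIndex([
  'Vacation policy: employees receive 20 vacation days per year.',
  'Expense report submissions are due by the fifth of each month.',
  'Laptop replacement requests go through the IT portal.',
]);
const hits = retrieve(index, 'How many vacation days do I get?', 1);
console.log(hits[0].text);
```

In a real workflow the retrieved chunks and the question would then be passed to the LLM, which is the last step of the query pipeline.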
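Why chunk documents? Embedding models have token limits (text-embedding-3-small accepts up to 8,191 tokens per input), and shorter chunks produce more precise semantic vectors. Split documents into 500–1,000 token chunks with 10–20% overlap, so that text cut at a chunk boundary also appears in the neighboring chunk and keeps its surrounding context. Note that the Character Text Splitter counts characters, not tokens, so its chunkSize setting is not directly comparable to these token figures.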
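To make the overlap idea concrete, here is a minimal sliding-window splitter. It is a sketch under simplifying assumptions: it counts characters rather than tokens, it ignores separators, and `chunkText` is a hypothetical helper, not the splitter n8n ships.

```javascript
// Split text into fixed-size windows; each window starts
// (chunkSize - overlap) characters after the previous one.
function chunkText(text, chunkSize, overlap) {
  if (overlap >= chunkSize) {
    throw new Error('overlap must be smaller than chunkSize');
  }
  const step = chunkSize - overlap;
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}

const sample = 'abcdefghijklmnopqrstuvwxyz';
const pieces = chunkText(sample, 10, 2);
console.log(pieces); // [ 'abcdefghij', 'ijklmnopqr', 'qrstuvwxyz' ]
```

Each chunk repeats the last two characters of the previous one, which is the same mechanism that lets a real splitter carry a sentence fragment across a chunk boundary.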
16.2 Vector Store Options
| Vector Store | Deployment | Best For | Cost |
|---|---|---|---|
| Qdrant | Self-hosted Docker / Qdrant Cloud | Enterprise, data sovereignty requirements | Free when self-hosted |
| Pinecone | Fully managed cloud | Fast prototyping, no ops overhead | Free tier + usage-based pricing |
| PGVector | PostgreSQL extension | Teams already running PostgreSQL | No extra cost beyond the existing PostgreSQL instance |
| Supabase Vector | Managed PGVector | Small teams, quick launch | Free tier available |
Recommendation: high data-security requirements → Qdrant self-hosted; quick prototyping → Pinecone; teams already on PostgreSQL → PGVector (zero additional ops).
16.3 Embeddings Configuration
n8n supports OpenAI Embeddings (text-embedding-3-small is recommended: 1536 dimensions, ~$0.02/million tokens), Azure OpenAI, Ollama (local and free; use nomic-embed-text), and Cohere (better multilingual, especially Chinese). The critical rule: use the exact same embedding model for both indexing and querying. Vectors from different models live in different vector spaces, so similarity scores between them are meaningless.
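The pricing and the same-model rule can both be expressed in a few lines. This is a sketch only: the price is the figure quoted in this chapter and may change, the document counts are invented for the arithmetic, and `assertSameModel` is a hypothetical guard rather than anything n8n provides.

```javascript
// Price per million tokens quoted in this chapter for text-embedding-3-small;
// check current pricing before relying on it.
const USD_PER_MILLION_TOKENS = 0.02;

function estimateEmbeddingCostUsd(totalTokens) {
  return (totalTokens / 1_000_000) * USD_PER_MILLION_TOKENS;
}

// Hypothetical guard: refuse to query a collection indexed with a different model.
function assertSameModel(indexedWith, queryWith) {
  if (indexedWith !== queryWith) {
    throw new Error(
      `Embedding model mismatch: indexed with ${indexedWith}, querying with ${queryWith}`
    );
  }
}

// Invented example: 500 documents x 4,000 tokens each = 2,000,000 tokens.
const cost = estimateEmbeddingCostUsd(500 * 4000);
console.log(cost.toFixed(2)); // 0.04

assertSameModel('text-embedding-3-small', 'text-embedding-3-small'); // passes silently
```

One way to apply the guard in practice is to record the model name alongside the collection when indexing and compare it before each query.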
16.4 Document Loaders
- Default Data Loader: processes binary data already in the workflow (e.g. HTTP-downloaded files)
- Binary Input Loader: reads text from files received via form upload or Webhook
- GitHub Document Loader: reads Markdown/code files directly from a GitHub repository
- Notion Document Loader: reads Notion page content preserving heading hierarchy
For PDF/Word files: HTTP Request (download) → Extract from File node (supports PDF/DOCX/TXT) → feed into the vector pipeline.
16.5 Complete RAG Pipeline
Part 1: Indexing Workflow
// Indexing workflow node config
{
  "nodes": [
    {
      "name": "Character Text Splitter",
      "type": "@n8n/n8n-nodes-langchain.textSplitterCharacterTextSplitter",
      "parameters": {
        "chunkSize": 800,
        "chunkOverlap": 100
      }
    },
    {
      "name": "Qdrant Insert",
      "type": "@n8n/n8n-nodes-langchain.vectorStoreQdrant",
      "parameters": {
        "mode": "insert",
        "qdrantCollection": "company_kb"
      }
    }
  ]
}
Part 2: Query Workflow
// Qdrant retriever config
{
  "name": "Qdrant Retrieve",
  "type": "@n8n/n8n-nodes-langchain.vectorStoreQdrant",
  "parameters": {
    "mode": "retrieve",
    "qdrantCollection": "company_kb",
    "topK": 5,
    "scoreThreshold": 0.7
  }
}
// topK: return the K most similar document chunks
// scoreThreshold: filter out results below this similarity score
// Range 0-1, start at 0.6 and tune upward
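The combined effect of topK and scoreThreshold can be illustrated with a small filter. This sketch assumes higher scores mean greater similarity (as with cosine similarity). The scored results are invented, and `selectChunks` is a hypothetical helper showing the semantics, not the vector store's actual implementation.

```javascript
// Keep results at or above the threshold, best first, capped at topK.
function selectChunks(results, topK, scoreThreshold) {
  return results
    .filter(r => r.score >= scoreThreshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Hypothetical scored results, as a similarity search might return them.
const scored = [
  { id: 'a', score: 0.91 },
  { id: 'b', score: 0.58 },
  { id: 'c', score: 0.74 },
  { id: 'd', score: 0.69 },
];

console.log(selectChunks(scored, 5, 0.7).map(r => r.id)); // [ 'a', 'c' ]
console.log(selectChunks(scored, 5, 0.6).map(r => r.id)); // [ 'a', 'c', 'd' ]
```

Raising the threshold from 0.6 to 0.7 drops chunk `d`, which shows the trade-off when tuning: a stricter threshold means fewer but more relevant chunks, at the risk of returning nothing for loosely phrased questions.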
Connect the Vector Store Retriever to a Retrieval QA Chain node, which automatically handles: retrieve → inject chunks into prompt → call LLM → return answer.
Prompt tip: In your RAG Chain system prompt, explicitly instruct the model to "answer only based on the provided context; if the context does not contain relevant information, say so." Giving the model an explicit way to decline noticeably reduces hallucinated answers in production knowledge base deployments.
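A prompt following this tip might be assembled as shown below. It is a minimal sketch: `buildRagPrompt` and the `{ system, user }` return shape are assumptions for illustration, and the chain node in n8n builds its own prompt internally.

```javascript
// Build a grounded prompt: numbered context chunks plus the refusal instruction.
function buildRagPrompt(contextChunks, question) {
  const context = contextChunks
    .map((chunk, i) => `[${i + 1}] ${chunk}`)
    .join('\n\n');
  const system = [
    'Answer only based on the provided context.',
    'If the context does not contain relevant information, say so.',
    '',
    'Context:',
    context || '(no context retrieved)',
  ].join('\n');
  return { system, user: question };
}

const prompt = buildRagPrompt(
  ['Employees receive 20 vacation days per year.'],
  'How many vacation days do I get?'
);
console.log(prompt.system);
```

Numbering the chunks is a design choice that makes it easier to ask the model to cite which passage supported its answer.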
16.6 Project: Enterprise Knowledge Base Bot
Scenario: Hundreds of product manuals and SOPs are stored in Google Drive. Employees query a Feishu bot that retrieves accurate answers in under 2 seconds, with source document attribution.
Indexing workflow (runs nightly at 2 AM):
- Schedule trigger (2 AM daily)
- Google Drive node: list all files in the designated folder (PDF/DOCX/TXT)
- Loop Over Items: iterate each file
- HTTP Request: download file content
- Extract from File: extract plain text
- Character Text Splitter: chunkSize=800, overlap=100
- Embeddings OpenAI: text-embedding-3-small
- Qdrant (insert mode): upsert to company_kb with filename and source URL in metadata
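The metadata mentioned in the last step could be attached to each chunk roughly as follows, so that a later step can read `metadata.source` when listing sources. This is a sketch: `buildChunkRecords`, the `sourceUrl` and `chunkIndex` field names, and the sample file are all hypothetical.

```javascript
// Attach the same source metadata to every chunk of a file
// so answers can cite the originating document later.
function buildChunkRecords(file, chunks) {
  return chunks.map((pageContent, chunkIndex) => ({
    pageContent,
    metadata: {
      source: file.name,   // filename, read later when listing sources
      sourceUrl: file.url, // hypothetical field name for the source URL
      chunkIndex,          // position of the chunk within the file
    },
  }));
}

const records = buildChunkRecords(
  { name: 'onboarding-sop.pdf', url: 'https://example.com/onboarding-sop.pdf' },
  ['Step 1: request a badge.', 'Step 2: set up your laptop.']
);
console.log(records[1].metadata);
```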
Query workflow (triggers on each Feishu @mention):
- Feishu Webhook: receive message, extract question text
- Embeddings OpenAI: vectorize the question
- Qdrant (retrieve mode): topK=5, scoreThreshold=0.65
- Code node: extract source filenames, format as numbered list
- Retrieval QA Chain: LLM generates the answer
- Feishu Send Message: reply with answer and source document list
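// Code node: extract source document names from retrieval results
// Some vector store outputs nest each chunk under `document`;
// fall back to the item itself when that key is absent.
const docs = $input.all().map(item => item.json.document ?? item.json);

// Deduplicate source names while preserving retrieval order
const sources = [...new Set(
  docs.map(d => d.metadata?.source || 'Unknown source')
)];

return [{
  json: {
    sourceList: sources.map((s, i) => `${i + 1}. ${s}`).join('\n'),
    context: docs.map(d => d.pageContent).join('\n\n---\n\n')
  }
}];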
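The final reply step can then combine the LLM answer with the `sourceList` string produced by the Code node. The sketch below is an assumption-heavy illustration: `formatReply` and `buildFeishuTextPayload` are hypothetical helpers, and the payload shape assumes a Feishu custom-bot text message, which should be verified against the Feishu documentation for your bot type.

```javascript
// Combine the answer and the numbered source list into one reply text.
function formatReply(answer, sourceList) {
  const sources = sourceList && sourceList.trim() ? sourceList : 'No sources found';
  return `${answer.trim()}\n\nSources:\n${sources}`;
}

// Assumed payload shape for a Feishu custom-bot text message;
// other bot types may expect a different structure.
function buildFeishuTextPayload(text) {
  return { msg_type: 'text', content: { text } };
}

const payload = buildFeishuTextPayload(
  formatReply('Employees receive 20 vacation days per year.', '1. hr-policy.pdf')
);
console.log(JSON.stringify(payload, null, 2));
```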