Chapter 16

RAG Knowledge Base Workflows

Retrieval-Augmented Generation (RAG) is a well-established approach for combining private enterprise documents with large language models. n8n provides nodes for each stage of the RAG pipeline (document loading, text splitting, embeddings, and vector storage), so a working pipeline can be assembled with little or no code. This chapter walks through the principles, the node choices, and a complete enterprise knowledge base Q&A bot project.

16.1 RAG Principles: Five Core Steps

A RAG system consists of two pipelines. The indexing pipeline runs once (or periodically): load documents → chunk text → compute embeddings → store vectors. The query pipeline runs on every user question: embed the query → retrieve top-K similar chunks → feed chunks + question to LLM → return answer.
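To make the two pipelines concrete, here is a minimal in-memory sketch. Everything in it is an assumption for illustration: `embed` is a toy word-count function standing in for a real embedding model, a plain array stands in for the vector store, and the final LLM call is left out. It only shows how indexing and retrieval relate to each other.

```javascript
// Toy stand-in for an embedding model: counts occurrences of a fixed vocabulary.
// A real pipeline would call an embedding API here instead.
const VOCAB = ["refund", "invoice", "password", "reset", "shipping"];

function embed(text) {
  const words = text.toLowerCase().match(/[a-z]+/g) || [];
  return VOCAB.map(term => words.filter(w => w === term).length);
}

// Cosine similarity between two vectors of equal length (0 if either is all zeros).
function cosine(a, b) {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = v => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Indexing pipeline: chunks -> embeddings -> store (an array plays the vector store).
function buildIndex(chunks) {
  return chunks.map(text => ({ text, vector: embed(text) }));
}

// Query pipeline: embed the question -> rank chunks by similarity -> keep the top K.
function retrieve(index, question, topK) {
  const q = embed(question);
  return index
    .map(entry => ({ text: entry.text, score: cosine(q, entry.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

const index = buildIndex([
  "To reset your password, open Settings and choose Reset password.",
  "Refund requests need the original invoice number.",
  "Shipping takes three to five business days.",
]);

const hits = retrieve(index, "How do I reset my password?", 1);
console.log(hits[0].text);
```

In a real workflow the retrieved chunks and the question would then be passed to the LLM; the important point is that indexing and querying share the same `embed` function.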

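Why chunk documents? Embedding models have input limits (text-embedding-3-small accepts up to 8,191 tokens), and shorter chunks produce more focused semantic vectors. A common starting point is 500–1,000 tokens per chunk with 10–20% overlap, so that text cut at a chunk boundary keeps some of its surrounding context in the neighboring chunk. Note that the Character Text Splitter used later in this chapter measures chunkSize in characters, not tokens.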
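The effect of overlap is easiest to see on a tiny input. The sketch below uses a simple sliding window over characters; it is not the algorithm the n8n text splitter nodes use (those split on separators first), and the function name `chunkText` is made up for this example.

```javascript
// Split text into fixed-size character windows that overlap their predecessor.
function chunkText(text, chunkSize, chunkOverlap) {
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  const step = chunkSize - chunkOverlap;
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}

const sample = "abcdefghijklmnopqrstuvwxyz"; // 26 characters
const chunks = chunkText(sample, 10, 2);
console.log(chunks); // [ 'abcdefghij', 'ijklmnopqr', 'qrstuvwxyz' ]
```

Each chunk repeats the last two characters of the previous one ("ij", then "qr"), which is the same idea as a 100-character overlap on 800-character chunks.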

16.2 Vector Store Options

Vector Store	Deployment	Best For	Cost
Qdrant	Self-hosted Docker / Qdrant Cloud	Enterprise, data sovereignty requirements	Free when self-hosted
Pinecone	Fully managed cloud	Fast prototyping, no ops overhead	Free tier + usage
PGVector	PostgreSQL extension	Teams already running PostgreSQL	No extra cost beyond the existing PostgreSQL instance
Supabase Vector	Managed PGVector	Small teams, quick launch	Free tier available

Recommendation: high data-security requirements → Qdrant self-hosted; quick prototyping → Pinecone; teams already on PostgreSQL → PGVector (zero additional ops).

16.3 Embeddings Configuration

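n8n supports several embedding providers: OpenAI Embeddings (text-embedding-3-small is recommended: 1,536 dimensions, about $0.02 per million tokens), Azure OpenAI, Ollama (local and free; use nomic-embed-text), and Cohere (often a stronger choice for multilingual content, including Chinese). The critical rule: use the exact same embedding model for both indexing and querying. Vectors from different models live in different vector spaces, and often have different dimensions, so similarity scores between them are meaningless.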
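One way to enforce the same-model rule is to record the embedding model alongside the collection and check it before every query. The sketch below is an assumption about how such a guard could look; `collectionInfo` and `assertCompatible` are hypothetical names, not part of n8n or Qdrant.

```javascript
// Hypothetical collection descriptor: record which model produced the stored vectors.
const collectionInfo = {
  name: "company_kb",
  embeddingModel: "text-embedding-3-small",
  dimensions: 1536,
};

// Refuse to query when the model or vector size differs from what was indexed.
function assertCompatible(collection, queryModel, queryVector) {
  if (queryModel !== collection.embeddingModel) {
    throw new Error(
      `Collection ${collection.name} was indexed with ${collection.embeddingModel}, ` +
      `but the query used ${queryModel}`
    );
  }
  if (queryVector.length !== collection.dimensions) {
    throw new Error(
      `Expected ${collection.dimensions} dimensions, got ${queryVector.length}`
    );
  }
  return true;
}

// A query vector produced by a different model is rejected before it reaches the store.
let rejected = false;
try {
  assertCompatible(collectionInfo, "nomic-embed-text", new Array(768).fill(0));
} catch (err) {
  rejected = true;
  console.log(err.message);
}
```

The model-name check matters even when dimensions happen to match, since two models with equally sized vectors still encode meaning differently.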

16.4 Document Loaders

Default Data Loader: processes binary data already in the workflow (e.g. HTTP-downloaded files)
Binary Input Loader: reads text from files received via form upload or Webhook
GitHub Document Loader: reads Markdown/code files directly from a GitHub repository
Notion Document Loader: reads Notion page content preserving heading hierarchy

For PDF/Word files: HTTP Request (download) → Extract from File node (supports PDF/DOCX/TXT) → feed into the vector pipeline.

16.5 Complete RAG Pipeline

Part 1: Indexing Workflow

// Indexing workflow node config
{
  "nodes": [
    {
      "name": "Character Text Splitter",
      "type": "@n8n/n8n-nodes-langchain.textSplitterCharacterTextSplitter",
      "parameters": {
        "chunkSize": 800,
        "chunkOverlap": 100
      }
    },
    {
      "name": "Qdrant Insert",
      "type": "@n8n/n8n-nodes-langchain.vectorStoreQdrant",
      "parameters": {
        "mode": "insert",
        "qdrantCollection": "company_kb"
      }
    }
  ]
}

Part 2: Query Workflow

// Qdrant retriever config
{
  "name": "Qdrant Retrieve",
  "type": "@n8n/n8n-nodes-langchain.vectorStoreQdrant",
  "parameters": {
    "mode": "retrieve",
    "qdrantCollection": "company_kb",
    "topK": 5,
    "scoreThreshold": 0.7
  }
}

// topK: return at most the K most similar document chunks
// scoreThreshold: drop results whose similarity score is below this value
// Scores range from 0 to 1; start around 0.6 and raise the threshold if irrelevant chunks get through
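The interaction between topK and scoreThreshold can be sketched in a few lines. This is an illustration of the selection logic only, with made-up scores; the vector store applies the equivalent filtering itself, and `selectChunks` is a hypothetical helper name.

```javascript
// Keep results at or above the threshold, best first, capped at topK.
function selectChunks(scored, topK, scoreThreshold) {
  return scored
    .filter(r => r.score >= scoreThreshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

const scoredResults = [
  { id: "a", score: 0.91 },
  { id: "b", score: 0.58 },
  { id: "c", score: 0.74 },
  { id: "d", score: 0.69 },
];

const selected = selectChunks(scoredResults, 5, 0.7);
console.log(selected.map(r => r.id)); // [ 'a', 'c' ]
```

With a threshold of 0.7, only two of the four candidates survive even though topK allows five, so a threshold that is too high can leave the LLM with little or no context.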

Connect the Vector Store Retriever to a Retrieval QA Chain node, which automatically handles: retrieve → inject chunks into prompt → call LLM → return answer.

Prompt tip: In your RAG Chain system prompt, explicitly instruct the model to "answer only based on the provided context; if the context does not contain relevant information, say so." In production knowledge base deployments, this instruction noticeably reduces hallucinated answers.
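What the chain does when it injects chunks into the prompt can be approximated as follows. This is a sketch under assumptions, not the node's actual template: the function name `buildPrompt`, the layout, and the sample text are invented for illustration.

```javascript
// Build a grounded prompt: instruction, numbered context chunks, then the question.
function buildPrompt(question, chunks) {
  const context = chunks
    .map((chunk, i) => `[${i + 1}] ${chunk}`)
    .join("\n\n");
  return [
    "Answer only based on the provided context.",
    "If the context does not contain relevant information, say so.",
    "",
    "Context:",
    context || "(no relevant chunks retrieved)",
    "",
    `Question: ${question}`,
  ].join("\n");
}

const prompt = buildPrompt("What is the refund window?", [
  "Refunds are accepted within 30 days of purchase.",
]);
console.log(prompt);
```

Numbering the chunks makes it possible to ask the model to cite which chunk supports its answer, and the explicit placeholder for an empty context gives the model a clear signal to decline.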

16.6 Project: Enterprise Knowledge Base Bot

Scenario: Hundreds of product manuals and SOPs are stored in Google Drive. Employees ask a Feishu bot questions; the target is an accurate answer in under 2 seconds, with source document attribution.

Indexing workflow (runs nightly at 2 AM):

1. Schedule trigger (2 AM daily)
2. Google Drive node: list all files in the designated folder (PDF/DOCX/TXT)
3. Loop Over Items: iterate over each file
4. HTTP Request: download file content
5. Extract from File: extract plain text
6. Character Text Splitter: chunkSize=800, chunkOverlap=100
7. Embeddings OpenAI: text-embedding-3-small
8. Qdrant (insert mode): upsert to company_kb with filename and source URL in metadata
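The metadata attached at insert time is what later makes source attribution possible. A minimal sketch of that step, assuming each file item exposes `name` and `url` fields (hypothetical names; the actual fields depend on the Google Drive node's output):

```javascript
// Attach per-file metadata to every chunk so answers can cite their origin.
// `file.name` and `file.url` are assumed field names for this sketch.
function toRecords(file, chunks) {
  return chunks.map((text, chunkIndex) => ({
    pageContent: text,
    metadata: {
      source: file.name, // read later when listing source documents
      url: file.url,
      chunkIndex,
    },
  }));
}

const records = toRecords(
  { name: "onboarding-sop.pdf", url: "https://example.com/onboarding-sop.pdf" },
  ["First chunk of text.", "Second chunk of text."]
);
console.log(records[1].metadata);
```

The `source` key is chosen to match what the query workflow's Code node reads from `metadata`. Because this workflow reprocesses every file each night, it is probably worth clearing a file's existing entries before re-inserting them, otherwise duplicate chunks may accumulate in the collection.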

Query workflow (triggers on each Feishu @mention):

1. Feishu Webhook: receive message, extract question text
2. Embeddings OpenAI: vectorize the question
3. Qdrant (retrieve mode): topK=5, scoreThreshold=0.65
4. Code node: extract source filenames, format as numbered list
5. Retrieval QA Chain: LLM generates the answer
6. Feishu Send Message: reply with answer and source document list

// Code node: extract source document names from retrieval results
const docs = $input.all();
const sources = [...new Set(
  docs.map(d => d.json.metadata?.source || 'Unknown source')
)];

return [{
  json: {
    sourceList: sources.map((s, i) => `${i+1}. ${s}`).join('\n'),
    context: docs.map(d => d.json.pageContent).join('\n\n---\n\n')
  }
}];
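The last step combines the generated answer with the source list produced by the Code node. A small sketch of that formatting, assuming the answer arrives as a plain string; `formatReply` is a hypothetical helper and the sample strings are invented:

```javascript
// Combine the LLM answer with the numbered source list into one reply string.
function formatReply(answer, sourceList) {
  const sources = sourceList.trim();
  return sources
    ? `${answer.trim()}\n\nSources:\n${sources}`
    : answer.trim();
}

const reply = formatReply(
  "Submit the form to HR within five working days.",
  "1. onboarding-sop.pdf\n2. hr-handbook.docx"
);
console.log(reply);
```

When no sources were retrieved, the reply contains only the answer, which avoids sending an empty "Sources:" heading to the chat.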
