Chapter 16

RAG Knowledge Base Workflows

Retrieval-Augmented Generation (RAG) is the most mature approach for combining private enterprise documents with large language models. n8n natively supports the full RAG pipeline (vector storage, document loading, embeddings) with no code required. This chapter walks through the principles, node choices, and a complete enterprise knowledge base Q&A bot project.

16.1 RAG Principles: Five Core Steps

A RAG system consists of two pipelines. The indexing pipeline runs once (or periodically): load documents → chunk text → compute embeddings → store vectors. The query pipeline runs on every user question: embed the query → retrieve top-K similar chunks → feed chunks + question to LLM → return answer.
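
The two pipelines can be sketched in a few lines of plain JavaScript. This is only an illustration under loud assumptions: `embed` is a toy word-count function over a tiny made-up vocabulary, not a real embedding model, and the in-memory array stands in for the vector store. All function names here are hypothetical.

```javascript
// Toy embedding: counts of words from a tiny fixed vocabulary.
// This is NOT a real embedding model; it only makes the data flow visible.
const VOCAB = ['refund', 'shipping', 'password', 'reset', 'days', 'login'];

function embed(text) {
  const words = text.toLowerCase().match(/[a-z]+/g) || [];
  return VOCAB.map(term => words.filter(w => w === term).length);
}

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Indexing pipeline: chunks in, stored vectors out.
function buildIndex(chunks) {
  return chunks.map(text => ({ text, vector: embed(text) }));
}

// Query pipeline: embed the question, rank stored chunks, keep the top K.
function retrieve(index, question, topK) {
  const q = embed(question);
  return index
    .map(entry => ({ text: entry.text, score: cosine(q, entry.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

const index = buildIndex([
  'Refund requests are accepted within 30 days.',
  'Shipping takes 3 to 5 business days.',
  'To reset your password, open the login page.',
]);
const hits = retrieve(index, 'How do I reset my password?', 1);
```

In a real workflow the retrieved chunks would then be passed to the LLM together with the question; that last step is omitted here.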

Why chunk documents? Embedding models have token limits (text-embedding-3-small supports up to 8,191 tokens), and shorter chunks produce more precise semantic vectors. Split documents into 500–1,000 token chunks with 10–20% overlap, so that a sentence cut at one chunk boundary still appears whole in the neighboring chunk.
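
To make the overlap idea concrete, here is a minimal sliding-window chunker. It is a sketch, not what n8n's splitter nodes do internally: `chunkText` is a hypothetical helper, and it counts characters rather than tokens, so sizes expressed in tokens would need a tokenizer to be exact.

```javascript
// Split text into fixed-size windows where each window repeats the last
// `overlap` characters of the previous one.
function chunkText(text, chunkSize, overlap) {
  if (overlap >= chunkSize) {
    throw new Error('overlap must be smaller than chunkSize');
  }
  const step = chunkSize - overlap;
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}

// Tiny sizes so the overlap is easy to see.
const sample = 'abcdefghijklmnopqrstuvwxyz';
const chunks = chunkText(sample, 10, 2);
// chunks: 'abcdefghij', 'ijklmnopqr', 'qrstuvwxyz'
```

Each chunk starts with the last two characters of the previous one, which is the same effect an overlap of 100 has on an 800-unit chunk.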

16.2 Vector Store Options

| Vector Store    | Deployment                        | Best For                                  | Cost                |
|-----------------|-----------------------------------|-------------------------------------------|---------------------|
| Qdrant          | Self-hosted Docker / Qdrant Cloud | Enterprise, data sovereignty requirements | Free self-hosted    |
| Pinecone        | Fully managed cloud               | Fast prototyping, no ops overhead         | Free tier + usage   |
| PGVector        | PostgreSQL extension              | Teams already running PostgreSQL          | Shared with PG      |
| Supabase Vector | Managed PGVector                  | Small teams, quick launch                 | Free tier available |

Recommendation: high data-security requirements → Qdrant self-hosted; quick prototyping → Pinecone; teams already on PostgreSQL → PGVector (zero additional ops).

16.3 Embeddings Configuration

n8n supports OpenAI Embeddings (text-embedding-3-small is recommended: 1536 dimensions, ~$0.02/million tokens), Azure OpenAI, Ollama (local, free; use nomic-embed-text), and Cohere (better multilingual support, especially Chinese). The critical rule: use the exact same embedding model for both indexing and querying. Each model maps text into its own vector space, so comparing a query vector from one model against stored vectors from another produces meaningless similarity scores.
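
One way to enforce the same-model rule is to record the model name and vector size when indexing and to check them before every query. The sketch below assumes you keep that record yourself (for example in workflow static data or a config file); `assertSameEmbeddingSpace` and the shape of the info objects are hypothetical, not an n8n feature.

```javascript
// Throws if the query-side embedding setup differs from the one used at
// indexing time; returns true when both match.
function assertSameEmbeddingSpace(indexInfo, queryInfo) {
  if (indexInfo.model !== queryInfo.model) {
    throw new Error(
      `Embedding model mismatch: indexed with ${indexInfo.model}, querying with ${queryInfo.model}`
    );
  }
  if (indexInfo.dimensions !== queryInfo.dimensions) {
    throw new Error(
      `Dimension mismatch: ${indexInfo.dimensions} vs ${queryInfo.dimensions}`
    );
  }
  return true;
}

// Recorded once when the collection was built.
const indexed = { model: 'text-embedding-3-small', dimensions: 1536 };

const sameModel = assertSameEmbeddingSpace(indexed, {
  model: 'text-embedding-3-small',
  dimensions: 1536,
});

// Querying with a different model is rejected before any search runs.
let mismatchMessage = null;
try {
  assertSameEmbeddingSpace(indexed, { model: 'nomic-embed-text', dimensions: 768 });
} catch (err) {
  mismatchMessage = err.message;
}
```

A dimension mismatch is usually caught by the vector store anyway; the model-name check matters more, because two models with the same dimensions would fail silently.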

16.4 Document Loaders

For PDF/Word files: HTTP Request (download) → Extract from File node (supports PDF/DOCX/TXT) → feed into the vector pipeline.

16.5 Complete RAG Pipeline

Part 1: Indexing Workflow

// Indexing workflow node config
{
  "nodes": [
    {
      "name": "Character Text Splitter",
      "type": "@n8n/n8n-nodes-langchain.textSplitterCharacterTextSplitter",
      "parameters": {
        "chunkSize": 800,
        "chunkOverlap": 100
      }
    },
    {
      "name": "Qdrant Insert",
      "type": "@n8n/n8n-nodes-langchain.vectorStoreQdrant",
      "parameters": {
        "mode": "insert",
        "qdrantCollection": "company_kb"
      }
    }
  ]
}

Part 2: Query Workflow

// Qdrant retriever config
{
  "name": "Qdrant Retrieve",
  "type": "@n8n/n8n-nodes-langchain.vectorStoreQdrant",
  "parameters": {
    "mode": "retrieve",
    "qdrantCollection": "company_kb",
    "topK": 5,
    "scoreThreshold": 0.7
  }
}

// scoreThreshold: filter out results below this similarity score
// Range 0-1, start at 0.6 and tune upward
// topK: return the K most similar document chunks
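
The interaction between the two parameters can be illustrated outside n8n. This is a sketch of the semantics only: the vector store applies these filters itself, `selectChunks` is a hypothetical helper, and the scores are made-up values where higher means more similar.

```javascript
// Keep results at or above the threshold, best first, at most topK of them.
function selectChunks(scored, topK, scoreThreshold) {
  return scored
    .filter(r => r.score >= scoreThreshold) // filter() copies, so the input is not reordered
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

const scored = [
  { text: 'A', score: 0.91 },
  { text: 'B', score: 0.58 },
  { text: 'C', score: 0.74 },
  { text: 'D', score: 0.66 },
];

const strict = selectChunks(scored, 5, 0.7); // A and C pass the threshold
const loose = selectChunks(scored, 2, 0.6);  // A, C, D pass; topK keeps A and C
```

A high threshold can return fewer than topK chunks, or none at all, so the downstream prompt should handle an empty context.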

Connect the Vector Store Retriever to a Retrieval QA Chain node, which automatically handles: retrieve → inject chunks into prompt → call LLM → return answer.

Prompt tip: In your RAG Chain system prompt, explicitly instruct the model to "answer only based on the provided context; if the context does not contain relevant information, say so." This dramatically reduces hallucinations in production knowledge base deployments.
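
If you assemble the prompt yourself (for example in a Code node instead of the built-in chain), the instruction above might be wired in as follows. This is a minimal sketch; `buildRagPrompt` and the returned `system`/`user` shape are assumptions for illustration, not the format the Retrieval QA Chain uses internally.

```javascript
// Build a grounded prompt: numbered context chunks plus the restriction
// that the model may only answer from that context.
function buildRagPrompt(chunks, question) {
  const context = chunks
    .map((c, i) => `[${i + 1}] ${c}`)
    .join('\n\n');
  const system = [
    'Answer only based on the provided context.',
    'If the context does not contain relevant information, say so.',
  ].join(' ');
  return {
    system,
    user: `Context:\n${context}\n\nQuestion: ${question}`,
  };
}

const prompt = buildRagPrompt(
  ['Refunds are accepted within 30 days.', 'Refunds are paid to the original card.'],
  'How long do I have to request a refund?'
);
```

Numbering the chunks also gives the model something to cite, which helps when you later attach source documents to the reply.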

16.6 Project: Enterprise Knowledge Base Bot

Scenario: Hundreds of product manuals and SOPs are stored in Google Drive. Employees query a Feishu bot that retrieves accurate answers in under 2 seconds, with source document attribution.

Indexing workflow (runs nightly at 2 AM):

  1. Schedule trigger (2 AM daily)
  2. Google Drive node: list all files in the designated folder (PDF/DOCX/TXT)
  3. Loop Over Items: iterate each file
  4. HTTP Request: download file content
  5. Extract from File: extract plain text
  6. Character Text Splitter: chunkSize=800, overlap=100
  7. Embeddings OpenAI: text-embedding-3-small
  8. Qdrant (insert mode): upsert to company_kb with filename and source URL in metadata
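
Steps 2 and 8 above can be sketched as two small helpers: one that keeps only the supported file types, and one that attaches the filename and source URL to every chunk. The `name` and `url` fields on the file object are assumptions about what the Drive listing provides, and both function names are hypothetical.

```javascript
const SUPPORTED = ['pdf', 'docx', 'txt'];

// True for names such as "manual.PDF"; false for unsupported or missing extensions.
function isSupported(fileName) {
  const ext = fileName.split('.').pop().toLowerCase();
  return fileName.includes('.') && SUPPORTED.includes(ext);
}

// Wrap each chunk with metadata so answers can cite their source later.
function toChunkRecords(file, chunks) {
  return chunks.map((text, chunkIndex) => ({
    pageContent: text,
    metadata: { source: file.name, url: file.url, chunkIndex },
  }));
}

const files = [
  { name: 'onboarding-sop.docx', url: 'https://example.com/files/1' },
  { name: 'logo.png', url: 'https://example.com/files/2' },
];
const accepted = files.filter(f => isSupported(f.name));
const records = toChunkRecords(accepted[0], ['First chunk.', 'Second chunk.']);
```

The `metadata.source` key matches what the query workflow's Code node reads when it builds the source list.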

Query workflow (triggers on each Feishu @mention):

  1. Feishu Webhook: receive message, extract question text
  2. Embeddings OpenAI: vectorize the question
  3. Qdrant (retrieve mode): topK=5, scoreThreshold=0.65
  4. Code node: extract source filenames, format as numbered list
  5. Retrieval QA Chain: LLM generates the answer
  6. Feishu Send Message: reply with answer and source document list

// Code node: extract source document names from retrieval results
const docs = $input.all();
const sources = [...new Set(
  docs.map(d => d.json.metadata?.source || 'Unknown source')
)];

return [{
  json: {
    sourceList: sources.map((s, i) => `${i+1}. ${s}`).join('\n'),
    context: docs.map(d => d.json.pageContent).join('\n\n---\n\n')
  }
}];
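
For the final step, the answer and the `sourceList` string produced by the Code node above can be merged into one reply text. This is a sketch with a hypothetical `formatReply` helper; how the text is wrapped into a Feishu message payload depends on the bot type and is left out.

```javascript
// Combine the LLM answer with the numbered source list; omit the
// "Sources" section when nothing was retrieved.
function formatReply(answer, sourceList) {
  const trimmed = answer.trim();
  if (!sourceList) {
    return trimmed;
  }
  return `${trimmed}\n\nSources:\n${sourceList}`;
}

const reply = formatReply(
  ' Refunds are accepted within 30 days. ',
  '1. refund-policy.pdf\n2. customer-service-sop.docx'
);
const replyWithoutSources = formatReply('I could not find this in the knowledge base.', '');
```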