Chapter 14

LLM Integration

Since version 1.0, n8n has included a native AI node system supporting OpenAI, Anthropic Claude, Google Gemini, Mistral, and locally-hosted Ollama models. These nodes handle single-turn queries, conversation history, streaming output, and token accounting. This chapter covers credential setup for each provider, parameter tuning strategies, cost control, and a complete batch article summarization project.

n8n AI Node Architecture

The AI nodes are organized in two layers. Language Model nodes communicate with a specific LLM API: OpenAI Chat Model, Anthropic Chat Model, Google Gemini Chat Model, and Ollama Chat Model. AI Chain and Agent nodes orchestrate the reasoning: Basic LLM Chain, Conversational Chain, AI Agent, and Summarization Chain. A Language Model node is attached as a sub-node to a Chain or Agent node, so swapping models means changing only that sub-node while the rest of the workflow stays the same.

OpenAI: API Key Setup and Parameters

Create an "OpenAI API" credential with your key from platform.openai.com/api-keys. Never hard-code keys in node parameters. Create separate keys for test and production environments. Set spending limits in the OpenAI dashboard as a safety net.

Parameter	Description	Recommended
Model	Model version	gpt-4o (general) / gpt-4o-mini (cost-optimized)
Temperature	Output randomness, 0–2	0.3 for summaries; 0.8 for creative tasks
Max Tokens	Maximum output length	512 for summaries — avoids waste
Frequency Penalty	Penalises repeated tokens	0.3 reduces repetition effectively

Claude (Anthropic): Model Selection and Tuning

Create an "Anthropic API" credential with your key from console.anthropic.com. Model recommendations: Claude 3.5 Sonnet for best price-performance; Claude 3 Opus for highest quality; Claude 3 Haiku for fast, high-volume, simple tasks. Claude follows System Prompts very precisely — define role, output format, and constraints there for best results.

XML output trick: Ask Claude to wrap structured content in XML tags (e.g. <summary>...</summary>), then extract with a regex in a Code node. More reliable than requesting JSON, as Claude rarely malforms XML tags.
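
The extraction step described above might look like the following in a Code node. This is a minimal sketch: the function name `extractTag` and the sample reply are illustrative assumptions, and the regex assumes a plain tag name that appears once, without attributes or nesting.

```javascript
// Hypothetical helper: pull the text between <tag>...</tag> out of a model reply.
// Assumes a simple, non-nested tag without attributes.
function extractTag(text, tag) {
  // [\s\S] matches across newlines; the lazy quantifier stops at the first closing tag
  const pattern = new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`);
  const match = pattern.exec(text);
  return match ? match[1].trim() : null;
}

// Illustrative reply text, not real model output
const reply = 'Here is the result.\n<summary>\nn8n ships native AI nodes.\n</summary>';
const summary = extractTag(reply, 'summary');
```

Returning `null` when the tag is missing lets a downstream IF node route malformed replies to a retry or error branch instead of writing empty content.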

Ollama: Private Local Model Deployment

Ollama runs Llama 3, Qwen2.5, Mistral, DeepSeek, and hundreds of open-source models locally. Data never leaves your network and there are no API fees or rate limits.

# Install Ollama and pull a model
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen2.5:7b

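# Allow external access for Docker-hosted n8n. The variable must be set for the
# Ollama server process itself (for a systemd install, in the service unit).
export OLLAMA_HOST=0.0.0.0:11434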

In n8n, create an "Ollama API" credential with Base URL http://localhost:11434. If n8n runs in Docker, localhost refers to the n8n container itself, so use the host machine's IP address instead. Then select the pulled model name in the Ollama Chat Model node.
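
Before wiring up the credential, it can help to confirm that the Ollama server is reachable from wherever n8n runs and that the pulled model is listed. The sketch below assumes Node.js 18+ (for the built-in `fetch`) and Ollama's `/api/tags` endpoint, which returns the locally available models. The helper names are illustrative.

```javascript
// Hypothetical helpers for a quick connectivity check (run with Node.js 18+).

// Extract model names from an /api/tags response body: { models: [{ name: "..." }, ...] }
function modelNames(tagsBody) {
  const models = Array.isArray(tagsBody?.models) ? tagsBody.models : [];
  return models.map((m) => m.name);
}

// Fetch the model list from an Ollama server; throws on network or HTTP errors
async function listOllamaModels(baseUrl) {
  const res = await fetch(`${baseUrl.replace(/\/+$/, '')}/api/tags`);
  if (!res.ok) throw new Error(`Ollama responded with HTTP ${res.status}`);
  return modelNames(await res.json());
}

// Example usage (uncomment to run against a live server):
// listOllamaModels('http://localhost:11434').then((names) =>
//   console.log(names.includes('qwen2.5:7b') ? 'model available' : 'model missing'));
```

Running the check from inside the n8n container, rather than from the host, tests the same network path the Ollama Chat Model node will use.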

Token Usage and Cost Control

After each AI node execution, usage data is in $json.usage.prompt_tokens and $json.usage.completion_tokens. Add a Code node to compute cost, then write to Google Sheets for monthly LLM spend dashboards and anomaly detection.

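// Cost calculation for GPT-4o (USD per 1M tokens: 2.50 input, 10.00 output)
const INPUT_USD_PER_TOKEN = 2.5 / 1_000_000;
const OUTPUT_USD_PER_TOKEN = 10 / 1_000_000;
// Process every incoming item; keep the cost numeric so Sheets can sum it
return $input.all().map(({ json }) => {
  const cost = json.usage.prompt_tokens * INPUT_USD_PER_TOKEN +
    json.usage.completion_tokens * OUTPUT_USD_PER_TOKEN;
  return { json: { ...json, _cost_usd: Number(cost.toFixed(6)) } };
});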
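
Once per-execution costs are logged, the anomaly check mentioned above can be as simple as comparing each day's total against a typical day. The sketch below is one possible approach, not an n8n feature: the row shape (`date`, `_cost_usd`), the function names, and the 3× median threshold are all assumptions for illustration.

```javascript
// Hypothetical helper: sum logged costs per day.
// rows: [{ date: 'YYYY-MM-DD', _cost_usd: number | string }, ...]
function dailyTotals(rows) {
  const totals = {};
  for (const { date, _cost_usd } of rows) {
    totals[date] = (totals[date] || 0) + Number(_cost_usd);
  }
  return totals;
}

// Hypothetical helper: return the dates whose total exceeds `factor` times the median day.
function flagAnomalies(totals, factor = 3) {
  const values = Object.values(totals).sort((a, b) => a - b);
  if (values.length === 0) return [];
  const mid = Math.floor(values.length / 2);
  const median = values.length % 2 ? values[mid] : (values[mid - 1] + values[mid]) / 2;
  return Object.keys(totals).filter((date) => totals[date] > factor * median);
}

// Illustrative data only
const rows = [
  { date: '2024-05-01', _cost_usd: 0.5 },
  { date: '2024-05-01', _cost_usd: 0.1 },
  { date: '2024-05-02', _cost_usd: 0.5 },
  { date: '2024-05-03', _cost_usd: 5.0 },
];
const flagged = flagAnomalies(dailyTotals(rows));
```

The median is used instead of the mean so that a single runaway day does not raise the baseline it is being compared against.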

Project: Batch Article Summarization Workflow

Goal: daily automated Chinese summaries with topic tags and importance scores for 10–50 industry articles, saved to Notion.

Workflow: Cron (6 AM) → HTTP Request (fetch new article list from RSS/API) → IF (filter already-processed URLs) → Loop Over Items → HTTP Request (fetch full text) → Basic LLM Chain (Claude 3.5 Sonnet, prompt requests JSON with summary/tags/score) → Code node (parse JSON, handle format errors) → Notion Create Page → Wait (2s rate-limit throttle).
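
The filtering step can also be done in a Code node rather than an IF node. The sketch below assumes each article item carries a `url` field and that the list of already-processed URLs is available from some store (for example, a Notion query). Both the field name and the helper name are hypothetical.

```javascript
// Hypothetical helper: keep only articles whose URL has not been processed yet.
// Also drops duplicates within the same batch.
function filterNewArticles(articles, processedUrls) {
  const seen = new Set(processedUrls);
  const fresh = [];
  for (const article of articles) {
    if (!article.url || seen.has(article.url)) continue; // skip known or malformed entries
    seen.add(article.url);
    fresh.push(article);
  }
  return fresh;
}

// Illustrative data only
const batch = [
  { url: 'https://example.com/a', title: 'A' },
  { url: 'https://example.com/b', title: 'B' },
  { url: 'https://example.com/a', title: 'A (duplicate)' },
];
const newArticles = filterNewArticles(batch, ['https://example.com/b']);
```

Filtering before the loop keeps already-summarized articles from reaching the LLM node, which is where the per-item cost is incurred.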

// System Prompt (Basic LLM Chain node)
You are a professional tech media editor. For the following article, complete three tasks:
1. Write a Chinese summary under 150 characters, preserving key points and data
2. Choose 1-3 matching tags from: AI, LLM, Cloud, Chips, Overseas, E-commerce, Enterprise, Funding, Policy
3. Give an importance score from 1-5 (5=major industry breakthrough, 1=routine news)

Output in this exact JSON format with no other content:
{
  "summary": "...",
  "tags": ["...", "..."],
  "score": 4
}
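
Models occasionally add a stray sentence or wrap the JSON in a code fence despite the instruction, so the parsing step should validate the reply rather than trust it. The following is a minimal sketch of such a Code node helper. The function name, the returned `{ ok, ... }` shape, and the specific validation rules are assumptions layered on the prompt above.

```javascript
const ALLOWED_TAGS = ['AI', 'LLM', 'Cloud', 'Chips', 'Overseas', 'E-commerce', 'Enterprise', 'Funding', 'Policy'];

// Hypothetical helper: parse and validate the model reply against the prompt's format.
function parseSummaryReply(raw) {
  // Take the span from the first "{" to the last "}" to tolerate fences or extra prose
  const start = raw.indexOf('{');
  const end = raw.lastIndexOf('}');
  if (start === -1 || end <= start) return { ok: false, error: 'no JSON object found' };

  let data;
  try {
    data = JSON.parse(raw.slice(start, end + 1));
  } catch (err) {
    return { ok: false, error: `invalid JSON: ${err.message}` };
  }

  if (typeof data.summary !== 'string' || data.summary.length === 0) {
    return { ok: false, error: 'missing summary' };
  }
  if (!Array.isArray(data.tags)) return { ok: false, error: 'tags is not an array' };
  // Keep only tags from the allowed list, at most three
  const tags = data.tags.filter((t) => ALLOWED_TAGS.includes(t)).slice(0, 3);
  if (tags.length === 0) return { ok: false, error: 'no valid tags' };
  if (!Number.isInteger(data.score) || data.score < 1 || data.score > 5) {
    return { ok: false, error: 'score out of range' };
  }
  return { ok: true, summary: data.summary, tags, score: data.score };
}
```

Items that come back with `ok: false` can be routed to a retry or a manual-review branch, so one malformed reply does not stop the whole batch.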

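Cost estimate: A 2,000-word article uses ~2,500 input + ~200 output tokens. At Claude 3.5 Sonnet list pricing ($3 per million input tokens, $15 per million output tokens), that is 2,500 × $0.000003 + 200 × $0.000015 ≈ $0.0105 per article, so processing 50 articles per day costs about $0.53, at 50× human speed. A figure near $0.0009 per article (under $0.05 per day) corresponds to Claude 3 Haiku pricing ($0.25 and $1.25 per million tokens) rather than Sonnet.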
