Chapter 20

Performance Optimization and Monitoring

When your n8n deployment graduates from personal use to team dependency, performance and observability become non-negotiable. This final chapter covers execution mode selection, horizontal scaling with Queue mode, Prometheus metrics, Grafana dashboards, and alerting rules, and closes with a full learning path review to help you plan your next steps.

20.1 Execution Mode: Main vs Queue

Aspect | Main Mode (default) | Queue Mode
--- | --- | ---
How it works | Main process executes workflows directly | Main process enqueues jobs; Worker processes consume them
Concurrency ceiling | Limited to single machine resources | Scale horizontally by adding Workers
High availability | Single point of failure | Worker crash: job re-queued; Main crash: UI down, executions continue
Dependencies | PostgreSQL only | PostgreSQL + Redis
Scale | < 10,000 executions/day | > 10,000 executions/day or HA required

When to switch to Queue mode: (1) the execution queue frequently backs up, with start delays above 30 seconds; (2) memory usage exceeds 80% under concurrent load; (3) business SLAs tolerate less than 5 minutes of downtime.
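The switching criteria above can be written down as a small check. This is only a sketch: the `InstanceStats` fields and the `queue_mode_reasons` helper are hypothetical names introduced for illustration, and the thresholds are simply the ones quoted in this section and in the comparison table.

```python
from dataclasses import dataclass


@dataclass
class InstanceStats:
    """Hypothetical snapshot of an n8n instance; field names are illustrative."""
    start_delay_seconds: float    # typical wait before an execution starts
    memory_used_fraction: float   # 0.0-1.0, measured under concurrent load
    max_downtime_minutes: float   # downtime the business SLA tolerates
    executions_per_day: int


def queue_mode_reasons(stats: InstanceStats) -> list[str]:
    """Return the reasons (possibly none) that suggest moving to Queue mode."""
    reasons = []
    if stats.start_delay_seconds > 30:
        reasons.append("start delay above 30s")
    if stats.memory_used_fraction > 0.80:
        reasons.append("memory above 80% under load")
    if stats.max_downtime_minutes < 5:
        reasons.append("SLA tolerates under 5 min downtime")
    if stats.executions_per_day > 10_000:
        reasons.append("more than 10,000 executions/day")
    return reasons


if __name__ == "__main__":
    stats = InstanceStats(
        start_delay_seconds=45.0,
        memory_used_fraction=0.62,
        max_downtime_minutes=30.0,
        executions_per_day=4_000,
    )
    print(queue_mode_reasons(stats))
```

An empty list means none of the listed triggers apply and Main mode is probably still adequate.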

20.2 Queue Mode and Worker Scaling

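# Queue mode docker-compose config (main process + workers)
# The main process and every worker must share the same database settings and N8N_ENCRYPTION_KEY.
services:
  n8n:
    image: n8nio/n8n:1.45.0
    environment:
      - EXECUTIONS_MODE=queue
      - QUEUE_BULL_REDIS_HOST=redis
      - QUEUE_BULL_REDIS_PASSWORD=${REDIS_PASSWORD}
    command: start   # the image entrypoint already invokes the n8n binary

  n8n-worker:
    image: n8nio/n8n:1.45.0
    environment:
      - EXECUTIONS_MODE=queue
      - QUEUE_BULL_REDIS_HOST=redis
      - QUEUE_BULL_REDIS_PASSWORD=${REDIS_PASSWORD}
    command: worker --concurrency=10
    deploy:
      replicas: 3   # 3 workers × 10 concurrent = 30 max parallel executions

Adjust replicas and the worker --concurrency value based on your actual CPU and memory capacity. In this n8n version, N8N_CONCURRENCY_PRODUCTION_LIMIT caps production executions in Main mode; workers take their limit from the --concurrency flag.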
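To make the sizing arithmetic concrete, the sketch below computes the parallel-execution ceiling and a rough replica count. It assumes Little's law (jobs in flight = arrival rate × average duration) is a reasonable first approximation for your workload. The function names and the 70% headroom default are assumptions for illustration, not n8n settings.

```python
import math


def max_parallel_executions(replicas: int, concurrency_per_worker: int) -> int:
    """Upper bound on simultaneous executions across all workers."""
    return replicas * concurrency_per_worker


def workers_needed(executions_per_hour: float,
                   avg_duration_seconds: float,
                   concurrency_per_worker: int,
                   headroom: float = 0.7) -> int:
    """Estimate worker replicas from load, keeping some slots free.

    headroom is the fraction of each worker's slots you plan to use
    (an assumed planning margin, not an n8n parameter).
    """
    if concurrency_per_worker <= 0:
        raise ValueError("concurrency_per_worker must be positive")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    # Little's law: average jobs in flight = arrival rate * average duration
    in_flight = executions_per_hour / 3600 * avg_duration_seconds
    usable_slots = concurrency_per_worker * headroom
    return max(1, math.ceil(in_flight / usable_slots))


if __name__ == "__main__":
    print(max_parallel_executions(3, 10))      # 3 workers x 10 slots
    print(workers_needed(36_000, 2.0, 10))     # 10 jobs/s, 2 s each
```

Treat the result as a starting point and confirm it against observed CPU and memory usage per worker.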

20.3 Concurrency Controls

20.4 Resource Limits and Timeouts

# Key resource limit env vars

# Maximum workflow execution time (seconds)
EXECUTIONS_TIMEOUT=3600
EXECUTIONS_TIMEOUT_MAX=7200

# Execution data retention: critical for controlling DB growth
# EXECUTIONS_DATA_MAX_AGE is measured in hours (720 h = 30 days)
EXECUTIONS_DATA_MAX_AGE=720
EXECUTIONS_DATA_PRUNE=true
EXECUTIONS_DATA_PRUNE_MAX_COUNT=50000

# Node.js heap limit (MB)
NODE_OPTIONS=--max-old-space-size=4096

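Keep pruning enabled in production: with pruning turned off (EXECUTIONS_DATA_PRUNE=false), n8n retains every execution record forever, and your PostgreSQL database grows unbounded until performance degrades or the disk fills up. n8n 1.x enables pruning by default, but setting EXECUTIONS_DATA_PRUNE=true explicitly documents the intent and protects against an inherited override.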
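A quick estimate shows how the two retention limits interact. This sketch assumes the maximum age is expressed in hours (as n8n's documentation describes) and that pruning keeps whichever limit is stricter. The function names and the average record size are hypothetical inputs you would measure on your own database.

```python
def retained_executions(executions_per_day: float,
                        max_age_hours: float,
                        max_count: int) -> int:
    """Steady-state number of execution rows kept after pruning."""
    by_age = executions_per_day * max_age_hours / 24
    # Whichever limit is reached first decides how many rows survive
    return int(min(by_age, max_count))


def estimated_storage_gb(rows: int, avg_kb_per_execution: float) -> float:
    """Rough table size, ignoring indexes and PostgreSQL overhead."""
    return rows * avg_kb_per_execution / 1024 / 1024


if __name__ == "__main__":
    rows = retained_executions(executions_per_day=5_000,
                               max_age_hours=720,
                               max_count=50_000)
    print(rows, round(estimated_storage_gb(rows, avg_kb_per_execution=20), 2))
```

At 5,000 executions per day the count limit, not the age limit, is what bounds the table, so raising the age alone would change nothing.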

20.5 Prometheus Metrics

# Enable the Prometheus metrics endpoint
N8N_METRICS=true
N8N_METRICS_INCLUDE_DEFAULT_METRICS=true
N8N_METRICS_INCLUDE_WORKFLOW_ID_LABEL=true
N8N_METRICS_INCLUDE_NODE_TYPE_LABEL=true

# Metrics available at http://n8n:5678/metrics
#
# Key metrics:
# n8n_workflow_executions_total        - total executions by status
# n8n_workflow_execution_duration_ms   - execution latency histogram
# n8n_queue_jobs_waiting               - jobs waiting in queue
# n8n_queue_jobs_active                - jobs currently executing

# prometheus.yml scrape config
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'n8n'
    static_configs:
      - targets: ['n8n:5678']
    metrics_path: '/metrics'
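    basic_auth:
      username: metrics
      # Prometheus does not expand ${...} in its config file: render the value at deploy time or use password_file
      password: ${METRICS_PASSWORD}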
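Before wiring up Prometheus, it can help to look at the endpoint's output directly. The sketch below parses the Prometheus text exposition format with the standard library only. The sample payload is invented for illustration and reuses the metric names listed above, so confirm the actual names against your own /metrics output, since they can differ between n8n versions. `fetch_metrics` and `parse_metrics` are hypothetical helper names.

```python
import re
import urllib.request
from collections import defaultdict

# One sample per line: name{label="value",...} value   (labels are optional)
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Download the raw metrics text, e.g. from http://n8n:5678/metrics."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_metrics(text: str) -> dict:
    """Parse exposition text into {metric_name: [(labels, value), ...]}."""
    samples = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # skip HELP/TYPE comments
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, raw_value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples[name].append((labels, float(raw_value)))
    return dict(samples)


# Invented sample payload, for illustration only
SAMPLE = """\
# HELP n8n_queue_jobs_waiting Jobs waiting in queue
# TYPE n8n_queue_jobs_waiting gauge
n8n_queue_jobs_waiting 12
n8n_workflow_executions_total{status="success"} 970
n8n_workflow_executions_total{status="error"} 30
"""

if __name__ == "__main__":
    for name, series in parse_metrics(SAMPLE).items():
        print(name, series)
```

In a real check you would pass the result of `fetch_metrics(...)` to `parse_metrics` instead of the sample string.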

20.6 Grafana Dashboard

Key panels to build:

20.7 Three Core Alert Rules

# PrometheusRule alerts
groups:
  - name: n8n.alerts
    rules:
      # 1. Error rate alert (sum() aggregates across labels so the ratio covers all executions)
      - alert: N8nHighErrorRate
        expr: |
          sum(rate(n8n_workflow_executions_total{status="error"}[5m]))
          / sum(rate(n8n_workflow_executions_total[5m])) > 0.05
        for: 5m
        labels: { severity: critical }
        annotations:
          summary: "n8n execution error rate > 5%"

      # 2. Queue backlog alert
      - alert: N8nQueueBacklog
        expr: n8n_queue_jobs_waiting > 100
        for: 5m
        labels: { severity: warning }
        annotations:
          summary: "n8n queue has {{ $value }} waiting jobs"

      # 3. High latency alert
      - alert: N8nHighLatency
        expr: |
          histogram_quantile(0.95,
            sum by (le) (rate(n8n_workflow_execution_duration_ms_bucket[10m]))
          ) > 120000
        for: 10m
        labels: { severity: warning }
        annotations:
          summary: "n8n P95 execution latency > 120s"
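To see what the error-rate and latency expressions compute, the sketch below reproduces the arithmetic on made-up numbers. `histogram_quantile` here is a simplified approximation of the Prometheus function of the same name: it assumes the lowest bucket starts at 0 and that buckets are cumulative and sorted. All bucket values are invented for illustration.

```python
import math


def error_ratio(errors_delta: float, total_delta: float) -> float:
    """Share of failed executions over a window (counter increases)."""
    return errors_delta / total_delta if total_delta > 0 else 0.0


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: report the last finite bound
                return prev_bound
            # Linear interpolation inside the matching bucket
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    # Invented window: 30 errors out of 1000 executions
    print(error_ratio(30, 1000) > 0.05)

    # Invented cumulative latency buckets in milliseconds
    buckets = [(1_000, 50), (10_000, 90), (60_000, 98),
               (120_000, 100), (math.inf, 100)]
    print(histogram_quantile(0.95, buckets) > 120_000)
```

Because the quantile is interpolated within a bucket, its precision depends on how finely the bucket boundaries are spaced around your alert threshold.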

20.8 Full Learning Path Review

You have now completed the full n8n learning journey, from installing a local instance to running a production-grade, monitored, multi-worker deployment with AI integration.

Further Resources

Thank you for completing all 20 chapters of the n8n Automation Handbook. Automation is a compounding skill: every repetitive task you eliminate is an investment in your future productivity. Build something great.
