Linux Ollama
/install linux-ollama
Linux Ollama — Fleet Routing for Ollama on Linux
Run Ollama on Linux with multi-machine load balancing. Linux Ollama Herd turns multiple Linux machines into one smart Ollama endpoint. Your server rack, your desktop, your edge device — all serving AI through one Linux Ollama URL.
Linux Ollama setup
Step 1: Install Ollama on Linux
curl -fsSL https://ollama.com/install.sh | sh
Step 2: Install Linux Ollama Herd
pip install ollama-herd
Step 3: Start the Linux Ollama router
On one Linux machine (your router):
herd # starts Linux Ollama router on port 11435
herd-node # in a second terminal: registers this Linux machine
On every other Linux machine:
herd-node # auto-discovers the Linux Ollama router via mDNS
No mDNS? Connect Linux nodes directly:
herd-node --router-url http://router-ip:11435
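Before pointing a node at the router with --router-url, it can help to confirm that the router port is reachable from that node. The following is a minimal sketch using only the Python standard library; router_reachable is a hypothetical helper name, and it only tests that something accepts TCP connections on the port, not that it is actually the Herd router.

```python
import socket


def router_reachable(host: str, port: int = 11435, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: it checks TCP reachability only and does not
    verify which service is listening on the port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

Calling router_reachable("router-ip") from a node should return True once the router is running and the firewall allows port 11435.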
Linux Ollama systemd integration
Run Linux Ollama Herd as a systemd service for automatic startup:
# /etc/systemd/system/ollama-herd.service
[Unit]
Description=Linux Ollama Herd Router
After=network.target ollama.service
[Service]
Type=simple
ExecStart=/usr/local/bin/herd
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable ollama-herd
sudo systemctl start ollama-herd
Node agent as a Linux systemd service:
# /etc/systemd/system/ollama-herd-node.service
[Unit]
Description=Linux Ollama Herd Node Agent
After=network.target ollama.service
[Service]
Type=simple
ExecStart=/usr/local/bin/herd-node
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
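On networks without mDNS, the node unit presumably needs the same --router-url flag shown in the setup steps. The sketch below renders the node unit text with an optional router URL so the same template can be reused across machines. render_node_unit is a hypothetical helper, the binary path is taken from the unit above, and 192.0.2.10 in the usage note is a placeholder address.

```python
def render_node_unit(router_url: str = "",
                     binary: str = "/usr/local/bin/herd-node") -> str:
    """Return the text of a systemd unit for the node agent.

    Hypothetical helper: with an empty router_url the node relies on
    mDNS discovery; otherwise --router-url is appended to ExecStart.
    """
    exec_start = binary if not router_url else f"{binary} --router-url {router_url}"
    lines = [
        "[Unit]",
        "Description=Linux Ollama Herd Node Agent",
        "After=network.target ollama.service",
        "",
        "[Service]",
        "Type=simple",
        f"ExecStart={exec_start}",
        "Restart=always",
        "RestartSec=5",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
    ]
    return "\n".join(lines) + "\n"
```

For example, print(render_node_unit("http://192.0.2.10:11435")) produces a unit whose ExecStart line ends in --router-url http://192.0.2.10:11435; the output can then be written to /etc/systemd/system/ollama-herd-node.service.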
Use Linux Ollama
OpenAI SDK
from openai import OpenAI
# Your Linux Ollama fleet
client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")
response = client.chat.completions.create(
model="llama3.3:70b",
messages=[{"role": "user", "content": "Write a systemd service file for a Python API"}],
stream=True,
)
for chunk in response:
print(chunk.choices[0].delta.content or "", end="")
curl (Ollama format)
# Linux Ollama inference
curl http://localhost:11435/api/chat -d '{
"model": "qwen3.5:32b",
"messages": [{"role": "user", "content": "Explain Linux process scheduling"}],
"stream": false
}'
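The same Ollama-format request can be sent from Python without extra dependencies. This is a minimal sketch using urllib from the standard library; build_chat_request and chat are hypothetical helper names, and reading the reply from message.content assumes the router returns the same response shape as Ollama's own /api/chat endpoint.

```python
import json
import urllib.request


def build_chat_request(prompt: str, model: str = "qwen3.5:32b",
                       base_url: str = "http://localhost:11435") -> urllib.request.Request:
    """Build a non-streaming POST request for the /api/chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, **kwargs) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=300) as resp:
        body = json.load(resp)
    # Assumption: the reply text is under message.content, as in Ollama's API.
    return body["message"]["content"]
```

With the router running, chat("Explain Linux process scheduling") would return the model's answer as a string.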
Linux Ollama environment setup
# Optimize Linux Ollama performance via systemd
sudo systemctl edit ollama
# Add under [Service]:
# Environment="OLLAMA_KEEP_ALIVE=-1"
# Environment="OLLAMA_MAX_LOADED_MODELS=-1"
# Environment="OLLAMA_NUM_PARALLEL=2"
sudo systemctl restart ollama
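Or via shell profile (applies only when ollama serve is launched from that shell; the systemd service does not read ~/.bashrc):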
echo 'export OLLAMA_KEEP_ALIVE=-1' >> ~/.bashrc
echo 'export OLLAMA_MAX_LOADED_MODELS=-1' >> ~/.bashrc
source ~/.bashrc
Linux Ollama GPU support
| Linux GPU | vRAM | Best Linux Ollama models |
|---|---|---|
| NVIDIA RTX 4090 | 24GB | llama3.3:70b, qwen3.5:32b |
| NVIDIA A100 | 40/80GB | deepseek-v3, qwen3.5:72b |
| NVIDIA L40S | 48GB | llama3.3:70b (4-bit quantized; full precision does not fit in 48GB) |
| AMD ROCm (experimental) | varies | Ollama ROCm support on Linux |
| CPU only | system RAM | phi4-mini, gemma3:1b — slower but works |
Linux Ollama supports NVIDIA CUDA, experimental AMD ROCm, and CPU-only inference.
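A rough way to judge whether a model fits on a given GPU is to estimate the size of its weights from the parameter count and the bits stored per weight. The sketch below is back-of-the-envelope arithmetic, not a measurement: weights_gb and fits are hypothetical helpers, the 0.8 headroom factor is an arbitrary assumption to leave room for the KV cache and runtime overhead, and real quantized formats often use somewhat more than their nominal bits per weight.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         vram_gb: float, headroom: float = 0.8) -> bool:
    """Return True if the weights fit within a fraction of the vRAM.

    headroom is an assumed safety factor, not a value from Ollama.
    """
    return weights_gb(params_billion, bits_per_weight) <= vram_gb * headroom


# A 70B model at 16 bits per weight needs about 140 GB for weights alone,
# while the same model at 4 bits per weight needs about 35 GB.
print(weights_gb(70, 16))  # 140.0
print(weights_gb(70, 4))   # 35.0
```

Models that do not fit entirely in vRAM can still run with part of the model in system RAM, at reduced speed.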
Linux Ollama firewall
# UFW (Ubuntu/Debian)
sudo ufw allow 11435/tcp
# firewalld (RHEL/Fedora)
sudo firewall-cmd --add-port=11435/tcp --permanent
sudo firewall-cmd --reload
# iptables
sudo iptables -A INPUT -p tcp --dport 11435 -j ACCEPT
Monitor Linux Ollama
# Linux Ollama fleet status
curl -s http://localhost:11435/fleet/status | python3 -m json.tool
# Linux Ollama health — 15 automated checks
curl -s http://localhost:11435/dashboard/api/health | python3 -m json.tool
# Models on Linux Ollama nodes
curl -s http://localhost:11435/api/ps | python3 -m json.tool
Dashboard at http://localhost:11435/dashboard — live Linux Ollama monitoring.
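The three monitoring endpoints above can also be polled from a script. The following sketch collects them into one dictionary using only the standard library. fetch_json and snapshot are hypothetical helper names, and because the response schemas are not documented here, the code makes no assumption about the fields inside each response.

```python
import json
import urllib.request

# Paths taken from the curl commands above.
ENDPOINTS = {
    "fleet": "/fleet/status",
    "health": "/dashboard/api/health",
    "loaded": "/api/ps",
}


def fetch_json(path: str, base_url: str = "http://localhost:11435",
               timeout: float = 5.0):
    """GET base_url + path and decode the response body as JSON."""
    with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
        return json.load(resp)


def snapshot(base_url: str = "http://localhost:11435") -> dict:
    """Fetch every monitoring endpoint, recording errors instead of raising."""
    result = {}
    for name, path in ENDPOINTS.items():
        try:
            result[name] = fetch_json(path, base_url)
        except (OSError, ValueError) as exc:
            # OSError covers connection and HTTP errors; ValueError covers bad JSON.
            result[name] = {"error": str(exc)}
    return result
```

Running print(json.dumps(snapshot(), indent=2)) on the router machine gives a single combined status document that can be logged or fed to an alerting script.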
Linux Ollama logs
# JSONL structured logs
tail -f ~/.fleet-manager/logs/herd.jsonl.$(date +%Y-%m-%d) | python3 -m json.tool --json-lines
# Check for Linux Ollama errors
grep '"level":"ERROR"' ~/.fleet-manager/logs/herd.jsonl.$(date +%Y-%m-%d)
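The grep pattern above only matches when the log writer emits "level":"ERROR" with no space after the colon. Parsing each line as JSON avoids that dependency. This is a minimal sketch: error_records is a hypothetical helper, and the only field it assumes is the level key that the grep command already relies on.

```python
import json
from typing import Iterable, Iterator


def error_records(lines: Iterable[str], level: str = "ERROR") -> Iterator[dict]:
    """Yield parsed JSONL records whose "level" field equals level."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Skip partial or non-JSON lines rather than aborting the scan.
            continue
        if isinstance(record, dict) and record.get("level") == level:
            yield record
```

To scan today's log, open the file under ~/.fleet-manager/logs/ named herd.jsonl. followed by the current date, and pass the open file object to error_records.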
Also available on Linux Ollama
Image generation
curl http://localhost:11435/api/generate-image \
-d '{"model": "z-image-turbo", "prompt": "Linux penguin in cyberspace", "width": 1024, "height": 1024}'
Embeddings
curl http://localhost:11435/api/embed \
-d '{"model": "nomic-embed-text", "input": "Linux Ollama local inference"}'
Full documentation
Contribute
Ollama Herd is open source (MIT). Linux Ollama users welcome:
Guardrails
- Linux Ollama model downloads require explicit user confirmation.
- Linux Ollama model deletion requires explicit user confirmation.
- Never delete or modify files in ~/.fleet-manager/.
- No models are downloaded automatically; all pulls are user-initiated or require opt-in.
- Make sure OpenClaw is installed (local or Docker).
- Run the install command in chat: /install linux-ollama
- After installation, invoke the skill by name or use /linux-ollama
- Provide required inputs per the skill's parameter spec and get structured output.
What is Linux Ollama?
Linux Ollama — run Ollama on Linux with fleet routing across multiple Linux machines. Linux Ollama setup for Llama, Qwen, DeepSeek, Phi, Mistral. Route Ollam... It is an AI Agent Skill for Claude Code / OpenClaw, with 121 downloads so far.
How do I install Linux Ollama?
Run "/install linux-ollama" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Linux Ollama free?
Yes, Linux Ollama is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Linux Ollama support?
Linux Ollama installs anywhere OpenClaw / Claude Code is available; the listed target platform is Linux.
Who created Linux Ollama?
It is built and maintained by Twin Geeks (@twinsgeeks); the current version is v1.0.0.