Linux Ollama
/install linux-ollama
Linux Ollama — Fleet Routing for Ollama on Linux
Run Ollama on Linux with multi-machine load balancing. Linux Ollama Herd turns multiple Linux machines into one smart Ollama endpoint. Your server rack, your desktop, your edge device — all serving AI through one Linux Ollama URL.
Linux Ollama setup
Step 1: Install Ollama on Linux
curl -fsSL https://ollama.com/install.sh | sh
Step 2: Install Linux Ollama Herd
pip install ollama-herd
Step 3: Start the Linux Ollama router
On one Linux machine (your router):
herd # starts Linux Ollama router on port 11435
herd-node # registers this Linux machine
On every other Linux machine:
herd-node # auto-discovers the Linux Ollama router via mDNS
No mDNS? Connect Linux nodes directly:
herd-node --router-url http://router-ip:11435
Linux Ollama systemd integration
Run Linux Ollama Herd as a systemd service for automatic startup:
# /etc/systemd/system/ollama-herd.service
[Unit]
Description=Linux Ollama Herd Router
After=network.target ollama.service
[Service]
Type=simple
ExecStart=/usr/local/bin/herd
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
sudo systemctl enable ollama-herd
sudo systemctl start ollama-herd
Node agent as a Linux systemd service:
# /etc/systemd/system/ollama-herd-node.service
[Unit]
Description=Linux Ollama Herd Node Agent
After=network.target ollama.service
[Service]
Type=simple
ExecStart=/usr/local/bin/herd-node
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
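Both unit files hard-code /usr/local/bin, but pip can place the herd and herd-node entry points elsewhere (a virtualenv, ~/.local/bin, and so on). The sketch below renders the same unit text with whatever path is found on PATH. `render_unit` and `UNIT_TEMPLATE` are hypothetical names for illustration, not part of ollama-herd.

```python
import shutil

# Same layout as the unit files above; only Description and ExecStart vary.
UNIT_TEMPLATE = """\
[Unit]
Description={description}
After=network.target ollama.service

[Service]
Type=simple
ExecStart={exec_path}
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
"""


def render_unit(description: str, command: str, fallback_dir: str = "/usr/local/bin") -> str:
    """Return systemd unit text for `command`, using its real location if it is on PATH."""
    # shutil.which returns None when the command is not on PATH;
    # in that case fall back to the directory used in the units above.
    exec_path = shutil.which(command) or f"{fallback_dir}/{command}"
    return UNIT_TEMPLATE.format(description=description, exec_path=exec_path)
```

For example, `print(render_unit("Linux Ollama Herd Node Agent", "herd-node"))` produces text that can be saved as /etc/systemd/system/ollama-herd-node.service.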
Use Linux Ollama
OpenAI SDK
from openai import OpenAI
# Your Linux Ollama fleet
client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="llama3.3:70b",
    messages=[{"role": "user", "content": "Write a systemd service file for a Python API"}],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
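The loop above prints each piece as it arrives. If you also need the whole reply as one string, a small collector like the following may help. It is a sketch that assumes chunks have the `choices[0].delta.content` shape used above; `collect_stream` and `fake_chunk` are hypothetical helpers, and `fake_chunk` exists only so the logic can be tried without a running fleet.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        # Some chunks can arrive with an empty choices list; skip them.
        if not chunk.choices:
            continue
        # delta.content is None on chunks that carry no text.
        parts.append(chunk.choices[0].delta.content or "")
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in object shaped like a streaming chunk (offline testing only)."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])
```

With the client above, `full_text = collect_stream(response)` would replace the printing loop.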
curl (Ollama format)
# Linux Ollama inference
curl http://localhost:11435/api/chat -d '{
  "model": "qwen3.5:32b",
  "messages": [{"role": "user", "content": "Explain Linux process scheduling"}],
  "stream": false
}'
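The same Ollama-format call can be made from Python with only the standard library. This is a minimal sketch: `build_chat_request` and `send` are hypothetical helper names, and the long default timeout is an assumption to allow for slow model loads.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST to the Ollama-format /api/chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 300.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Assuming the router returns Ollama's usual non-streaming response shape, `send(build_chat_request("http://localhost:11435", "qwen3.5:32b", "Explain Linux process scheduling"))["message"]["content"]` would hold the reply text.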
Linux Ollama environment setup
# Optimize Linux Ollama performance via systemd
sudo systemctl edit ollama
# Add under [Service]:
# Environment="OLLAMA_KEEP_ALIVE=-1"
# Environment="OLLAMA_MAX_LOADED_MODELS=-1"
# Environment="OLLAMA_NUM_PARALLEL=2"
sudo systemctl restart ollama
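Or via shell profile (applies only when you start ollama serve from that shell; the systemd service does not read ~/.bashrc):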
echo 'export OLLAMA_KEEP_ALIVE=-1' >> ~/.bashrc
echo 'export OLLAMA_MAX_LOADED_MODELS=-1' >> ~/.bashrc
source ~/.bashrc
Linux Ollama GPU support
| Linux GPU | VRAM | Best Linux Ollama models |
|---|---|---|
| NVIDIA RTX 4090 | 24GB | qwen3.5:32b; llama3.3:70b only with partial CPU offload |
| NVIDIA A100 | 40/80GB | deepseek-v3, qwen3.5:72b |
| NVIDIA L40S | 48GB | llama3.3:70b (4-bit quantized; full 16-bit precision needs about 140GB) |
| AMD ROCm (experimental) | varies | Ollama ROCm support on Linux |
| CPU only | system RAM | phi4-mini, gemma3:1b — slower but works |
Linux Ollama supports NVIDIA CUDA, experimental AMD ROCm, and CPU-only inference.
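A rough way to sanity-check whether a model fits a given GPU is to estimate the size of its weights as parameters times bits per weight divided by 8. The sketch below does that arithmetic. The 20% headroom for KV cache and runtime overhead is an assumption, real quantized files use somewhat more than their nominal bits per weight, and `weights_gb` and `fits` are hypothetical helper names.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    # params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float, vram_gb: float,
         headroom: float = 0.2) -> bool:
    """Rough check that weights plus a safety margin fit in VRAM.

    `headroom` is an assumed fraction reserved for KV cache and runtime overhead.
    """
    return weights_gb(params_billion, bits_per_weight) * (1 + headroom) <= vram_gb
```

By this estimate a 70B model needs about 140GB at 16 bits per weight and about 35GB at 4 bits, so it is a candidate for a 48GB card only when quantized.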
Linux Ollama firewall
# UFW (Ubuntu/Debian)
sudo ufw allow 11435/tcp
# firewalld (RHEL/Fedora)
sudo firewall-cmd --add-port=11435/tcp --permanent
sudo firewall-cmd --reload
# iptables
sudo iptables -A INPUT -p tcp --dport 11435 -j ACCEPT
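After opening the port, it can be worth confirming from another machine that the router is actually reachable. A minimal TCP check, assuming the router listens on port 11435 as above; `port_open` is a hypothetical helper name.

```python
import socket


def port_open(host: str, port: int = 11435, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable or unknown hosts.
        return False
```

Run `port_open("router-ip")` from a node, substituting the router's address. A False result means either the firewall rule has not taken effect or the router is not running.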
Monitor Linux Ollama
# Linux Ollama fleet status
curl -s http://localhost:11435/fleet/status | python3 -m json.tool
# Linux Ollama health — 15 automated checks
curl -s http://localhost:11435/dashboard/api/health | python3 -m json.tool
# Models on Linux Ollama nodes
curl -s http://localhost:11435/api/ps | python3 -m json.tool
Dashboard at http://localhost:11435/dashboard — live Linux Ollama monitoring.
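For scripted monitoring, the three curl checks can be collected into one Python call. This is a sketch only: the endpoint paths are copied from the commands above, the response schemas are not assumed, and `monitor_urls`, `fetch_json`, and `snapshot` are hypothetical helper names.

```python
import json
import urllib.error
import urllib.request

# Paths taken from the curl commands above.
MONITOR_PATHS = {
    "fleet": "/fleet/status",
    "health": "/dashboard/api/health",
    "models": "/api/ps",
}


def monitor_urls(base_url: str = "http://localhost:11435") -> dict:
    """Map a short name to the full URL of each monitoring endpoint."""
    base = base_url.rstrip("/")
    return {name: base + path for name, path in MONITOR_PATHS.items()}


def fetch_json(url: str, timeout: float = 5.0):
    """Return (True, parsed_json) on success or (False, error_message) on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, json.load(resp)
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # ValueError covers responses that are not valid JSON.
        return False, str(exc)


def snapshot(base_url: str = "http://localhost:11435") -> dict:
    """Query every monitoring endpoint once and collect the results by name."""
    return {name: fetch_json(url) for name, url in monitor_urls(base_url).items()}
```

Calling `snapshot()` on the router returns a dict such as `{"fleet": (True, {...}), ...}`, which a cron job or alerting script can inspect without parsing curl output.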
Linux Ollama logs
# JSONL structured logs (--json-lines needs Python 3.8+; without it json.tool expects a single JSON document)
tail -f ~/.fleet-manager/logs/herd.jsonl.$(date +%Y-%m-%d) | python3 -m json.tool --json-lines
# Check for Linux Ollama errors
grep '"level":"ERROR"' ~/.fleet-manager/logs/herd.jsonl.$(date +%Y-%m-%d)
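The grep above matches an exact byte sequence, so it would miss a record written as `"level": "ERROR"` with a space after the colon. Parsing each line as JSON avoids that. The sketch assumes each log line is a JSON object with a "level" field, as the grep implies; `iter_errors` and `todays_log_path` are hypothetical helper names.

```python
import json
from datetime import date
from pathlib import Path
from typing import Iterable, Iterator


def todays_log_path() -> Path:
    """Path of today's log file, following the naming used in the commands above."""
    return Path.home() / ".fleet-manager" / "logs" / f"herd.jsonl.{date.today():%Y-%m-%d}"


def iter_errors(lines: Iterable[str], level: str = "ERROR") -> Iterator[dict]:
    """Yield parsed log records whose "level" field equals `level`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Skip partially written or non-JSON lines instead of failing.
            continue
        if isinstance(record, dict) and record.get("level") == level:
            yield record
```

To use it on the live file: `with open(todays_log_path()) as f: errors = list(iter_errors(f))`.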
Also available on Linux Ollama
Image generation
curl http://localhost:11435/api/generate-image \
-d '{"model": "z-image-turbo", "prompt": "Linux penguin in cyberspace", "width": 1024, "height": 1024}'
Embeddings
curl http://localhost:11435/api/embed \
-d '{"model": "nomic-embed-text", "input": "Linux Ollama local inference"}'
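Embeddings are usually compared with cosine similarity. The sketch below shows that comparison in plain Python. It assumes the router returns Ollama's usual /api/embed shape, a JSON object with an "embeddings" list of vectors; `cosine_similarity` and `first_embedding` are hypothetical helper names.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def first_embedding(response: dict) -> list:
    """Pull the first vector out of an /api/embed response.

    Assumes the response shape {"embeddings": [[...], ...]}.
    """
    return response["embeddings"][0]
```

Given two parsed responses `r1` and `r2`, `cosine_similarity(first_embedding(r1), first_embedding(r2))` gives a score where values near 1 indicate similar texts.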
Full documentation
Contribute
Ollama Herd is open source (MIT). Contributions from Linux Ollama users are welcome.
Guardrails
- Linux Ollama model downloads require explicit user confirmation.
- Linux Ollama model deletion requires explicit user confirmation.
- Never delete or modify files in ~/.fleet-manager/.
- No models are downloaded automatically; all pulls are user-initiated or require opt-in.
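- Make sure OpenClaw is installed (locally or via Docker).
- Enter the install command in the chat box: /install linux-ollama
- Once installation finishes, invoke the Skill by name or trigger it with /linux-ollama.
- Provide the inputs described in the Skill's parameter notes to get structured output.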
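What is Linux Ollama?
Linux Ollama: run Ollama on Linux with fleet routing across multiple Linux machines. Linux Ollama setup for Llama, Qwen, DeepSeek, Phi, Mistral. Route Ollam... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 121 downloads to date.
How do I install Linux Ollama?
Run the command "/install linux-ollama" in the OpenClaw or Claude Code chat box for a one-step install; no extra configuration is needed.
Is Linux Ollama free?
Yes. Linux Ollama is completely free and released under the MIT-0 license, so it can be freely downloaded, installed, and used.
Which platforms does Linux Ollama support?
Linux Ollama runs cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed (linux).
Who developed Linux Ollama?
It is developed and maintained by Twin Geeks (@twinsgeeks). The current version is v1.0.0.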