Description

Linux Ollama — run Ollama on Linux with fleet routing across multiple Linux machines. Linux Ollama setup for Llama, Qwen, DeepSeek, Phi, Mistral. Route Ollam...

README (SKILL.md)

Linux Ollama — Fleet Routing for Ollama on Linux

Name: Linux Ollama
Author: twinsgeeks

Run Ollama on Linux with multi-machine load balancing. Linux Ollama Herd turns several Linux machines into a single Ollama endpoint and routes each request to one of them. Your server rack, your desktop, and your edge devices all serve AI through one Linux Ollama URL.

Linux Ollama setup

Step 1: Install Ollama on Linux

curl -fsSL https://ollama.ai/install.sh | sh

Step 2: Install Linux Ollama Herd

pip install ollama-herd

Step 3: Start the Linux Ollama router

On one Linux machine (your router):

herd          # starts Linux Ollama router on port 11435
herd-node     # registers this Linux machine

On every other Linux machine:

herd-node     # auto-discovers the Linux Ollama router via mDNS

No mDNS? Point each node at the router directly, replacing router-ip with the router's address: herd-node --router-url http://router-ip:11435

Linux Ollama systemd integration

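Run Linux Ollama Herd as a systemd service so it starts automatically at boot. The units below assume pip installed the herd and herd-node commands into /usr/local/bin; run which herd to check, and adjust ExecStart if they are elsewhere (for example ~/.local/bin or a virtualenv's bin directory):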

# /etc/systemd/system/ollama-herd.service
[Unit]
Description=Linux Ollama Herd Router
After=network.target ollama.service

[Service]
Type=simple
ExecStart=/usr/local/bin/herd
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
sudo systemctl enable ollama-herd
sudo systemctl start ollama-herd

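Node agent as a Linux systemd service (install it on every machine that runs herd-node, then reload, enable, and start it the same way using ollama-herd-node as the unit name):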

# /etc/systemd/system/ollama-herd-node.service
[Unit]
Description=Linux Ollama Herd Node Agent
After=network.target ollama.service

[Service]
Type=simple
ExecStart=/usr/local/bin/herd-node
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
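The router and node units above differ only in Description and ExecStart, so when provisioning several machines a small script can render both from one template. This is a sketch: the /usr/local/bin default mirrors the units above and is an assumption about where pip put the commands, and render_unit and write_units are hypothetical helper names. Run write_units as root against /etc/systemd/system, then daemon-reload.

```python
from __future__ import annotations

from pathlib import Path

# Shared template for both units shown above (same [Unit]/[Service]/[Install] keys).
UNIT_TEMPLATE = """\
[Unit]
Description={description}
After=network.target ollama.service

[Service]
Type=simple
ExecStart={exec_start}
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
"""


def render_unit(binary: str, description: str, bin_dir: str = "/usr/local/bin") -> str:
    """Return unit-file text that runs bin_dir/binary (bin_dir is an assumption)."""
    exec_start = str(Path(bin_dir) / binary)
    return UNIT_TEMPLATE.format(description=description, exec_start=exec_start)


def write_units(out_dir: str, bin_dir: str = "/usr/local/bin") -> list[Path]:
    """Hypothetical helper: write both unit files into out_dir and return their paths."""
    units = {
        "ollama-herd.service": render_unit("herd", "Linux Ollama Herd Router", bin_dir),
        "ollama-herd-node.service": render_unit(
            "herd-node", "Linux Ollama Herd Node Agent", bin_dir
        ),
    }
    written = []
    for name, text in units.items():
        path = Path(out_dir) / name
        path.write_text(text)
        written.append(path)
    return written


if __name__ == "__main__":
    # Preview only; point write_units at /etc/systemd/system (as root) to install.
    print(render_unit("herd-node", "Linux Ollama Herd Node Agent"), end="")
```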

Use Linux Ollama

OpenAI SDK

from openai import OpenAI

# Your Linux Ollama fleet
client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="llama3.3:70b",
    messages=[{"role": "user", "content": "Write a systemd service file for a Python API"}],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
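With stream=True, the SDK reads Server-Sent Events from the OpenAI-compatible endpoint: each event is a line of the form data: {json} whose choices[0].delta.content holds the next piece of text, and the stream ends with data: [DONE]. The stdlib-only sketch below reassembles such lines without the SDK; it assumes the router relays Ollama's OpenAI-compatible stream unchanged, and collect_sse_text is a hypothetical name.

```python
import json
from typing import Iterable


def collect_sse_text(lines: Iterable[str]) -> str:
    """Join choices[*].delta.content from OpenAI-style SSE lines until [DONE]."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank event separators or non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:  # role-only or final chunks carry no text
                parts.append(content)
    return "".join(parts)


if __name__ == "__main__":
    # Illustrative lines shaped like a streamed chat completion.
    sample = [
        'data: {"choices": [{"index": 0, "delta": {"role": "assistant", "content": "[Unit]"}}]}',
        "",
        'data: {"choices": [{"index": 0, "delta": {"content": "\\nType=simple"}}]}',
        "",
        "data: [DONE]",
    ]
    print(collect_sse_text(sample))
```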

curl (Ollama format)

# Linux Ollama inference
curl http://localhost:11435/api/chat -d '{
  "model": "qwen3.5:32b",
  "messages": [{"role": "user", "content": "Explain Linux process scheduling"}],
  "stream": false
}'
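The curl call sets stream to false. If you leave streaming on (the Ollama default for /api/chat), the response is newline-delimited JSON: one object per line carrying message.content, with done set to true on the last line. A stdlib-only sketch that reassembles the reply, assuming the router relays Ollama's native format unchanged; collect_chat_stream and stream_chat are hypothetical helpers.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Iterable


def collect_chat_stream(lines: Iterable[bytes | str]) -> str:
    """Concatenate message.content from /api/chat NDJSON lines, stopping at done."""
    parts = []
    for raw in lines:
        line = (raw.decode("utf-8") if isinstance(raw, bytes) else raw).strip()
        if not line:
            continue
        obj = json.loads(line)
        parts.append(obj.get("message", {}).get("content", ""))
        if obj.get("done"):
            break  # the final object carries timing stats rather than more text
    return "".join(parts)


def stream_chat(prompt: str, model: str = "qwen3.5:32b",
                base_url: str = "http://localhost:11435") -> str:
    """POST a streaming /api/chat request and return the reassembled reply."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return collect_chat_stream(resp)  # iterating the response yields lines
```

For example, stream_chat("Explain Linux process scheduling") returns the whole answer as one string once the final line arrives.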

Linux Ollama environment setup

# Optimize Linux Ollama performance via systemd
sudo systemctl edit ollama
# In the override file that opens, add:
#   [Service]
#   Environment="OLLAMA_KEEP_ALIVE=-1"
#   Environment="OLLAMA_MAX_LOADED_MODELS=-1"
#   Environment="OLLAMA_NUM_PARALLEL=2"
sudo systemctl restart ollama
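systemctl edit ollama stores its changes in /etc/systemd/system/ollama.service.d/override.conf. To apply the same settings non-interactively (for example on every node), you can write that drop-in directly. A minimal sketch: the values simply mirror the ones above, render_dropin and write_dropin are hypothetical names, and because this bypasses systemctl edit you need sudo systemctl daemon-reload before restarting ollama.

```python
from __future__ import annotations

from pathlib import Path

# Values copied from the systemctl edit step above.
TUNING = {
    "OLLAMA_KEEP_ALIVE": "-1",
    "OLLAMA_MAX_LOADED_MODELS": "-1",
    "OLLAMA_NUM_PARALLEL": "2",
}


def render_dropin(env: dict[str, str]) -> str:
    """Render a drop-in: a [Service] header followed by one Environment= line per var."""
    lines = ["[Service]"]
    lines += [f'Environment="{key}={value}"' for key, value in env.items()]
    return "\n".join(lines) + "\n"


def write_dropin(env: dict[str, str],
                 unit_dir: str = "/etc/systemd/system/ollama.service.d") -> Path:
    """Write override.conf into unit_dir (needs root for the default path)."""
    path = Path(unit_dir) / "override.conf"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_dropin(env))
    return path


if __name__ == "__main__":
    print(render_dropin(TUNING), end="")
```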

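Or, if you start ollama serve by hand instead of through systemd, set the variables in your shell profile (the systemd service does not read ~/.bashrc):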

echo 'export OLLAMA_KEEP_ALIVE=-1' >> ~/.bashrc
echo 'export OLLAMA_MAX_LOADED_MODELS=-1' >> ~/.bashrc
source ~/.bashrc

Linux Ollama GPU support

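Linux GPU	VRAM	Best Linux Ollama models
NVIDIA RTX 4090	24GB	`qwen3.5:32b`; `llama3.3:70b` only with partial CPU offload
NVIDIA A100	40/80GB	`qwen3.5:72b`, `llama3.3:70b` (80GB card); `deepseek-v3` (671B) needs multiple GPUs or CPU offload
NVIDIA L40S	48GB	`llama3.3:70b` (default 4-bit quantization; full precision needs about 140GB)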
AMD ROCm (experimental)	varies	Ollama ROCm support on Linux
CPU only	system RAM	`phi4-mini`, `gemma3:1b` — slower but works

Linux Ollama supports NVIDIA CUDA, experimental AMD ROCm, and CPU-only inference.
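A rough way to sanity-check the table: weight memory is about parameters × bits per weight ÷ 8 bytes, plus headroom for the KV cache and runtime buffers. The 20% headroom below is an assumption, and Ollama's default 4-bit tags use somewhat more than 4 bits per weight in practice, so treat the results as lower bounds rather than exact requirements.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10**9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float, vram_gb: float,
         headroom: float = 0.2) -> bool:
    """True if weights plus fractional headroom (assumed 20%) fit in vram_gb."""
    return weight_memory_gb(params_billion, bits_per_weight) * (1 + headroom) <= vram_gb


if __name__ == "__main__":
    for bits in (16, 8, 4):
        need = weight_memory_gb(70, bits)
        print(f"70B @ {bits}-bit: ~{need:.0f} GB of weights; fits a 48 GB card: {fits(70, bits, 48)}")
```

For example, a 70B model needs about 140 GB for its weights at 16-bit but about 35 GB at 4-bit, which is why a 70B model on a single 48 GB card runs quantized rather than at full precision.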

Linux Ollama firewall

# UFW (Ubuntu/Debian)
sudo ufw allow 11435/tcp

# firewalld (RHEL/Fedora)
sudo firewall-cmd --add-port=11435/tcp --permanent
sudo firewall-cmd --reload

# iptables (this rule does not survive a reboot unless you save it with your distro's tooling)
sudo iptables -A INPUT -p tcp --dport 11435 -j ACCEPT
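To confirm the rule took effect, test the connection from a different machine on your network, since connections from the router to itself usually don't pass through these rules. A minimal stdlib sketch; pass the router's address as the first argument, and note that port_open is a hypothetical helper name.

```python
import socket


def port_open(host: str, port: int = 11435, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    import sys

    host = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    print(f"{host}:11435 reachable: {port_open(host)}")
```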

Monitor Linux Ollama

# Linux Ollama fleet status
curl -s http://localhost:11435/fleet/status | python3 -m json.tool

# Linux Ollama health — 15 automated checks
curl -s http://localhost:11435/dashboard/api/health | python3 -m json.tool

# Models on Linux Ollama nodes
curl -s http://localhost:11435/api/ps | python3 -m json.tool

Dashboard at http://localhost:11435/dashboard — live Linux Ollama monitoring.
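For scripting (cron jobs or alerting), the same endpoints can be polled from Python with only the standard library. This sketch makes no assumptions about the response fields, since their schema isn't documented here; it simply decodes and bundles the JSON. ROUTER, get_json, and snapshot are hypothetical names, and you should adjust ROUTER to your router's address.

```python
import json
import urllib.request

ROUTER = "http://localhost:11435"  # adjust to your router's address


def get_json(path: str, base_url: str = ROUTER, timeout: float = 5.0):
    """GET base_url + path and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}{path}", timeout=timeout) as resp:
        return json.load(resp)


def snapshot(base_url: str = ROUTER) -> dict:
    """Collect the three monitoring endpoints listed above into one dict."""
    paths = ("/fleet/status", "/dashboard/api/health", "/api/ps")
    return {path: get_json(path, base_url) for path in paths}


if __name__ == "__main__":
    try:
        print(json.dumps(snapshot(), indent=2))
    except OSError as exc:  # urllib.error.URLError is a subclass of OSError
        print(f"router not reachable: {exc}")
```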

Linux Ollama logs

# JSONL structured logs: one JSON object per line, so json.tool needs --json-lines (Python 3.8+)
tail -f ~/.fleet-manager/logs/herd.jsonl.$(date +%Y-%m-%d) | python3 -m json.tool --json-lines

# Check for Linux Ollama errors
grep '"level":"ERROR"' ~/.fleet-manager/logs/herd.jsonl.$(date +%Y-%m-%d)
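Unlike the grep above, parsing each line as JSON matches the level field regardless of how the logger spaces its keys. This sketch only reads the logs, in line with the Guardrails section about not modifying ~/.fleet-manager, and assumes each record is a JSON object with a top-level "level" key, as the grep pattern implies. todays_log and records are hypothetical names.

```python
from __future__ import annotations

import json
from datetime import date
from pathlib import Path
from typing import Iterator

LOG_DIR = Path.home() / ".fleet-manager" / "logs"


def todays_log(log_dir: Path = LOG_DIR) -> Path:
    """Path of today's log, matching herd.jsonl.$(date +%Y-%m-%d)."""
    return log_dir / f"herd.jsonl.{date.today():%Y-%m-%d}"


def records(path: Path, level: str | None = None) -> Iterator[dict]:
    """Yield JSON-object records, optionally only those whose "level" matches."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                continue  # e.g. a partially written last line
            if not isinstance(rec, dict):
                continue
            if level is None or rec.get("level") == level:
                yield rec


if __name__ == "__main__":
    path = todays_log()
    if path.exists():
        for rec in records(path, level="ERROR"):
            print(json.dumps(rec, indent=2))
```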

Also available on Linux Ollama

Image generation

curl http://localhost:11435/api/generate-image \
  -d '{"model": "z-image-turbo", "prompt": "Linux penguin in cyberspace", "width": 1024, "height": 1024}'

Embeddings

curl http://localhost:11435/api/embed \
  -d '{"model": "nomic-embed-text", "input": "Linux Ollama local inference"}'
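In Ollama's native format, /api/embed returns one vector per input under an "embeddings" key, and "input" may be a single string or a list. Assuming the router passes that response through unchanged, comparing two texts reduces to cosine similarity. A stdlib-only sketch; embed and cosine are hypothetical helper names.

```python
from __future__ import annotations

import json
import math
import urllib.request


def embed(texts: list[str], model: str = "nomic-embed-text",
          base_url: str = "http://localhost:11435") -> list[list[float]]:
    """POST a list input to /api/embed and return one vector per text."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    req = urllib.request.Request(f"{base_url}/api/embed", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["embeddings"]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    try:
        q, d = embed(["Linux process scheduling", "How the kernel picks the next task"])
        print(f"similarity: {cosine(q, d):.3f}")
    except OSError as exc:  # router not running or not reachable
        print(f"router not reachable: {exc}")
```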

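Contribute

Ollama Herd is open source (MIT), and Linux Ollama users are welcome to contribute. The source and documentation live in the GitHub repository (https://github.com/geeks-accelerator/ollama-herd).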

Guardrails

Linux Ollama model downloads require explicit user confirmation.
Linux Ollama model deletion requires explicit user confirmation.
Never delete or modify files in ~/.fleet-manager/.
No models are downloaded automatically — all pulls are user-initiated or require opt-in.

Usage Guidance

This skill appears internally consistent with its purpose, but exercise normal caution before following the install steps:

1) Inspect the remote install script (https://ollama.ai/install.sh) before piping it into sh; prefer to download and review it, or use a distro-packaged installer if one is available.
2) Review the ollama-herd PyPI package source (or its GitHub repo) before pip installing; consider installing it in a virtualenv or sandbox.
3) The router listens on TCP 11435. Avoid exposing this port to the public internet without authentication; restrict it to your LAN or put it behind an authenticated reverse proxy.
4) Confirm how the herd handles authentication and authorization between nodes (the SKILL.md shows mDNS and direct IP connections but does not document auth), and require secure networking (VPN/TLS) between nodes that communicate across untrusted networks.
5) Monitor the ~/.fleet-manager logs and make sure any sensitive data they retain is handled appropriately.

If you need higher assurance, test the setup in an isolated VM or lab network, and audit the GitHub repo (https://github.com/geeks-accelerator/ollama-herd) and the install scripts before production use.

Capability Assessment

✓ Purpose & Capability

Name/description describe multi-machine Ollama routing and the SKILL.md contains step-by-step instructions to install Ollama, install a 'herd' Python package, run router/node processes, configure systemd, firewall, and monitor files under ~/.fleet-manager. Declared anyBins (curl|wget) and optional bins (python3, pip, systemctl, nvidia-smi) are referenced in the instructions and the provided configPaths (~/.fleet-manager/...) are used in examples — coherent with the stated purpose.

✓ Instruction Scope

The instructions stay within the stated purpose: installing Ollama, installing the herd package, starting router/node processes, systemd integration, firewall rules, monitoring endpoints and logs, and examples for API usage. The SKILL.md references only the declared config paths and common tooling; it does not instruct reading unrelated system credentials or other users' data. One minor note: the examples show OpenAI client usage against a local base_url with api_key set to 'not-needed' (demonstrative), which is not a request for secrets.

ℹ Install Mechanism

This is instruction-only (no installer in the registry). The install steps ask users to run curl -fsSL https://ollama.ai/install.sh | sh and pip install ollama-herd. Both are coherent for installing a system binary and a Python package, but piping a remote script into sh and installing PyPI packages execute remote code and should be reviewed/audited before running. No arbitrary archive downloads from unknown hosts are present in the SKILL.md.

✓ Credentials

The skill does not require any credentials or environment variables in the registry metadata. It suggests non-sensitive Ollama tuning environment variables (OLLAMA_*), which are reasonable for runtime configuration. No secrets (API keys, tokens, passwords) are requested or required by the skill itself. The example use of an OpenAI client is illustrative and does not request a real key.

✓ Persistence & Privilege

The always flag is false, and the skill does not request elevated platform privileges. The instructions suggest enabling systemd services for the router and node so the processes run at boot, which is appropriate for a network service. The skill does not direct changes to other skills or to system-wide agent settings beyond installing and enabling its own services.

Version History

v1.0.0

Initial release of Linux Ollama: multi-machine Ollama fleet routing for Linux.

- Run Ollama inference across multiple Linux machines with automatic load balancing and routing.
- Supports setup for Llama, Qwen, DeepSeek, Phi, and Mistral models on servers, desktops, and edge devices.
- Integrates with systemd for reliable service management.
- Provides a dashboard, fleet health checks, and structured logging.
- Includes firewall configuration, GPU/CPU tips, and OpenAI API compatibility.
- No automatic model downloads or deletions; all such actions require user confirmation.

Metadata

Slug linux-ollama

Version 1.0.0

License MIT-0

All-time Installs 2

Active Installs 2

Total Versions 1

Frequently Asked Questions

What is Linux Ollama?

Linux Ollama — run Ollama on Linux with fleet routing across multiple Linux machines. Linux Ollama setup for Llama, Qwen, DeepSeek, Phi, Mistral. Route Ollam... It is an AI Agent Skill for Claude Code / OpenClaw, with 121 downloads so far.

How do I install Linux Ollama?

Run "/install linux-ollama" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Linux Ollama free?

Yes, Linux Ollama is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Linux Ollama support?

Linux Ollama targets Linux and runs wherever OpenClaw / Claude Code is available on Linux.

Who created Linux Ollama?

It is built and maintained by Twin Geeks (@twinsgeeks); the current version is v1.0.0.
