Ollama Load Balancer
by Twin Geeks · GitHub · v1.0.4 · MIT-0
256 Downloads · 0 Stars · 3 Active Installs · 6 Versions
Install in OpenClaw
/install ollama-load-balancer
Description
Ollama load balancer for Llama, Qwen, DeepSeek, and Mistral inference across multiple machines. Load balancing with auto-discovery via mDNS, health checks, q...
Usage Guidance
This skill appears internally consistent for running a local Ollama load balancer, but it relies on you pip installing a third-party package and running local servers that manage model downloads and expose admin HTTP endpoints. Before installing: (1) review the ollama-herd PyPI package and GitHub repo to ensure the code matches expectations, (2) prefer installing in a sandbox/VM or isolated network, (3) disable or review auto-pull behavior to avoid unexpected large downloads, and (4) restrict access to the daemon's HTTP port (localhost-only or firewall) to prevent unauthorized remote control.
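Point (4) can be spot-checked with a small reachability probe. This is a minimal sketch, not part of the skill: it assumes the daemon listens on port 11435 (the port named under Instruction Scope), and `port_is_reachable` is a hypothetical helper name.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unroutable: treat all as "not reachable".
        return False


if __name__ == "__main__":
    DAEMON_PORT = 11435  # assumption: the port mentioned in the assessment
    print("loopback reachable:", port_is_reachable("127.0.0.1", DAEMON_PORT))
```

Running the same check from a second machine against the host's LAN address is the more telling test: a `True` result there suggests the port is exposed beyond localhost and should be firewalled.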
Capability Assessment
Purpose & Capability
Name/description (Ollama load balancer) matches the runtime instructions: auto-discovery, health checks, routing, and admin HTTP endpoints. Required binaries (curl/wget) and optional python/pip/sqlite3 are appropriate for a Python-based local service.
Instruction Scope
SKILL.md instructs the agent to pip install ollama-herd and run local commands (herd / herd-node) and to call local HTTP endpoints on localhost:11435. It does not instruct reading unrelated system files or exfiltrating secrets. It does include administrative endpoints (pull/delete models) — expected for a load-balancer but potentially powerful if misused.
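Because the admin endpoints are powerful, a caller may want to refuse anything that is not a loopback address before sending a request. The sketch below illustrates that guard only; the helper names and the example path are hypothetical, and the skill's real endpoint paths are not listed on this page.

```python
import ipaddress
from urllib.parse import urlunsplit

DAEMON_PORT = 11435  # port named in the assessment above


def is_loopback(host: str) -> bool:
    """True for 'localhost' and for literal loopback IPs (127.0.0.0/8, ::1)."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname is treated as non-local rather than resolved.
        return False


def local_url(path: str, host: str = "localhost", port: int = DAEMON_PORT) -> str:
    """Build an http URL for the local daemon, rejecting non-loopback hosts."""
    if not is_loopback(host):
        raise ValueError(f"refusing non-loopback host: {host!r}")
    if not path.startswith("/"):
        path = "/" + path
    # IPv6 literals must be bracketed inside a URL authority.
    netloc = f"[{host}]:{port}" if ":" in host else f"{host}:{port}"
    return urlunsplit(("http", netloc, path, "", ""))


if __name__ == "__main__":
    # "/hypothetical/status" is a placeholder path, not a documented endpoint.
    print(local_url("/hypothetical/status"))
```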
Install Mechanism
There is no registry install spec; the README tells the user to pip install ollama-herd from PyPI. pip installs are common but will execute third-party code on the host — moderate risk. The SKILL.md points to a PyPI project and GitHub repo (traceable), which is better than an arbitrary download, but users should review the package/source before installing.
Credentials
The skill declares no required environment variables or credentials. The use of FLEET_MAX_RETRIES and runtime settings is plausible for a load balancer; no unexplained secret access is requested.
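As an illustration of how a setting such as FLEET_MAX_RETRIES is typically consumed, here is a hedged sketch. Only the variable name comes from the assessment; the default of 2, the clamping to zero, and the `read_max_retries` helper are assumptions, not the package's documented behavior.

```python
import os
from typing import Mapping, Optional


def read_max_retries(env: Optional[Mapping[str, str]] = None, default: int = 2) -> int:
    """Parse FLEET_MAX_RETRIES from env, falling back to default when unset or invalid."""
    env = os.environ if env is None else env
    raw = env.get("FLEET_MAX_RETRIES")
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default  # non-numeric values are ignored in this sketch
    return max(0, value)  # negative retry counts are clamped to zero


if __name__ == "__main__":
    print("max retries:", read_max_retries())
```

The value is a plain tuning knob, not a secret, which is consistent with the assessment that no credential access is requested.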
Persistence & Privilege
always:false (normal). The instructions create and use local config paths (~/.fleet-manager/...), run long-lived processes, and expose an admin HTTP interface. This is consistent with the stated functionality but means the package will persist on disk and run services — run with appropriate isolation and review.
How to Use
- Make sure OpenClaw is installed (local or Docker).
- Run the install command in chat: /install ollama-load-balancer
- After installation, invoke the skill by name or use /ollama-load-balancer
- Provide the required inputs per the skill's parameter spec and get structured output.
Version History
v1.0.4
Cross-platform support: macOS, Linux, and Windows. Updated OS metadata, descriptions, and hardware recommendations.
v1.0.3
- Expanded internationalization in the description (added Chinese and Spanish).
- Reworded the documentation to refer consistently to "load balancer" for clarity.
- Enhanced deployment and API examples for more explicit load balancer usage.
- Updated feature and endpoint explanations to clarify their relation to the load balancer.
- Incremented version metadata to 1.0.3.
v1.1.0
- Updated the description for improved clarity and to highlight support for Llama, Qwen, DeepSeek, and Mistral.
- Reduced the zombie reaper’s request cleanup threshold from 15 minutes to 10 minutes.
- No changes to functionality or API; documentation only.
v1.0.2
- Updated version to 1.0.2 in SKILL.md.
- Clarified that auto-pull of missing models is now optional and disabled by default; enable via settings API.
- Updated metadata for configPaths location.
- No code or feature changes; documentation and metadata only.
v1.0.1
- Added "optionalBins" and "configPaths" fields to the metadata section in SKILL.md.
- Now explicitly lists optional dependencies: python3, sqlite3, and pip.
- Declares expected config and log file locations for fleet manager operation.
- No changes to code or user-facing features.
v1.0.0
Initial release.
- Load balances Ollama inference across multiple machines with automatic discovery, health checks, and real-time monitoring dashboard.
- Built-in queue management, zero configuration setup, and automatic failover with retry logic.
- Includes zombie request cleanup, VRAM-aware model routing, and operational analytics via SQL queries.
- Web dashboard provides fleet status, trends, insights, and runtime toggles.
- Exposes multiple REST API endpoints for fleet health, request traces, usage, and model management.
- Designed for high availability and operational visibility in Ollama deployments.
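The VRAM-aware routing and failover-with-retry features listed for v1.0.0 can be pictured with a short sketch. This shows the general technique under stated assumptions, not the package's actual algorithm: the `Node` fields, the ranking key (shortest queue first, then most free VRAM), and the `send` callback are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    name: str
    healthy: bool
    free_vram_gb: float
    queue_depth: int


def rank_nodes(nodes: List[Node], required_vram_gb: float) -> List[Node]:
    """Keep healthy nodes with enough free VRAM; prefer short queues, then more VRAM."""
    candidates = [n for n in nodes if n.healthy and n.free_vram_gb >= required_vram_gb]
    return sorted(candidates, key=lambda n: (n.queue_depth, -n.free_vram_gb))


def route_with_failover(nodes: List[Node], required_vram_gb: float,
                        send: Callable[[Node], str], max_retries: int = 2) -> str:
    """Try the best-ranked node, then fail over to the next ones on connection errors."""
    last_error = None
    for node in rank_nodes(nodes, required_vram_gb)[: max_retries + 1]:
        try:
            return send(node)
        except ConnectionError as exc:
            last_error = exc  # remember the failure and move to the next node
    raise RuntimeError("no node could serve the request") from last_error


if __name__ == "__main__":
    fleet = [Node("a", True, 24.0, 3), Node("b", True, 16.0, 0)]
    print(route_with_failover(fleet, 8.0, lambda n: f"served by {n.name}"))
```

Bounding the attempts at `max_retries + 1` keeps a failing request from cycling through the whole fleet, which matters when every retry ties up queue capacity on another machine.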
Frequently Asked Questions
What is Ollama Load Balancer?
Ollama load balancer for Llama, Qwen, DeepSeek, and Mistral inference across multiple machines. Load balancing with auto-discovery via mDNS, health checks, q... It is an AI Agent Skill for Claude Code / OpenClaw, with 256 downloads so far.
How do I install Ollama Load Balancer?
Run "/install ollama-load-balancer" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Ollama Load Balancer free?
Yes, Ollama Load Balancer is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Ollama Load Balancer support?
Ollama Load Balancer is cross-platform and runs anywhere OpenClaw / Claude Code is available (macOS, Linux, Windows).
Who created Ollama Load Balancer?
It is built and maintained by Twin Geeks (@twinsgeeks); the current version is v1.0.4.