Description

Embeddings with nomic-embed-text, mxbai-embed, and snowflake-arctic-embed across your device fleet. Fleet-routed via Ollama for RAG, semantic search, and vec...

README (SKILL.md)

Fleet Embeddings

Name: Fleet Embeddings
Author: twinsgeeks

You're helping someone generate embeddings — converting text into vectors for semantic search, RAG pipelines, duplicate detection, or recommendation systems. Instead of hitting one Ollama instance, the fleet distributes embedding requests across all available nodes automatically.

Why fleet embeddings matter

Building a RAG knowledge base means embedding thousands of document chunks. On a single machine, embedding 10,000 chunks takes significant time and blocks LLM inference. With fleet routing, embedding requests spread across nodes — the machine that's least busy handles each batch, and LLM inference continues uninterrupted on other nodes.

Same Ollama embedding models you already know. Same API. Just faster because the fleet parallelizes it.

Get started

pip install ollama-herd
herd                        # start the router (port 11435)
herd-node                   # start on each device
ollama pull nomic-embed-text  # pull an embedding model

No feature toggle needed — embeddings route through Ollama automatically.

Package: ollama-herd | Repo: github.com/geeks-accelerator/ollama-herd

Generate embeddings

Ollama format (curl)

curl http://localhost:11435/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "The fleet manages all inference routing"
}'

OpenAI SDK (Python)

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")

response = client.embeddings.create(
    model="nomic-embed-text",
    input="The fleet manages all inference routing",
)
vector = response.data[0].embedding
print(f"Dimensions: {len(vector)}")

Python (httpx)

import httpx

def embed(text, model="nomic-embed-text"):
    resp = httpx.post(
        "http://localhost:11435/api/embeddings",
        json={"model": model, "prompt": text},
        timeout=30.0,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]

vector = embed("search query here")
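
For semantic search, the query vector is usually compared against stored document vectors. The following is a minimal sketch of that comparison using cosine similarity. The helper names `cosine_similarity` and `rank_by_similarity` are illustrative and are not part of ollama-herd or Ollama, and returning 0.0 for a zero-length vector is a convention chosen here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, doc_vectors):
    """Return (index, score) pairs, most similar document first."""
    scored = [
        (i, cosine_similarity(query_vector, vec))
        for i, vec in enumerate(doc_vectors)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With real embeddings, `query_vector` would come from `embed("search query here")` and `doc_vectors` from embedding each document with the same model. Vectors from different models are not comparable.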

Batch embedding for RAG

import httpx

def embed_batch(texts, model="nomic-embed-text"):
    """Embed a list of texts in input order. The router picks a node for each request."""
    vectors = []
    # Reuse one client so the connection to the router stays open between requests.
    with httpx.Client(timeout=30.0) as client:
        for text in texts:
            resp = client.post(
                "http://localhost:11435/api/embeddings",
                json={"model": model, "prompt": text},
            )
            resp.raise_for_status()
            vectors.append(resp.json()["embedding"])
    return vectors

# Embed document chunks for RAG
chunks = [
    "Introduction to fleet management...",
    "The scoring engine uses 7 signals...",
    "Context protection prevents model reloads...",
]
vectors = embed_batch(chunks)
print(f"Embedded {len(vectors)} chunks, {len(vectors[0])} dimensions each")
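
A loop that sends one request at a time keeps only one node busy at any moment. For the router to spread work across nodes, several requests need to be in flight at once. The sketch below does that with a thread pool. The names `embed_one` and `embed_concurrently` and the default `max_workers=4` are assumptions made for illustration, not part of ollama-herd. It uses only the standard library, and a sensible worker count depends on how many nodes are in the fleet.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Endpoint taken from the examples in this README.
ROUTER_URL = "http://localhost:11435/api/embeddings"


def embed_one(text, model="nomic-embed-text"):
    """Embed a single text through the router."""
    body = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    req = urllib.request.Request(
        ROUTER_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=30.0) as resp:
        return json.load(resp)["embedding"]


def embed_concurrently(texts, embed_fn=embed_one, max_workers=4):
    """Embed texts with several requests in flight at once.

    The returned list is in the same order as `texts`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order even if requests finish out of order.
        return list(pool.map(embed_fn, texts))
```

Calling `embed_concurrently(chunks)` would replace the sequential loop. If any single request fails, its exception is raised when the results are collected, so wrap the call in error handling suited to your pipeline.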

Available embedding models

Check what's available:

curl -s http://localhost:11435/api/tags | python3 -c "
import json, sys
for m in json.load(sys.stdin)['models']:
    if any(k in m['name'].lower() for k in ('embed', 'nomic', 'minilm')):
        print(f'  {m[\"name\"]}')"

Common models: nomic-embed-text, mxbai-embed-large, all-minilm, snowflake-arctic-embed.
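
The same name-based filter can be applied from Python once the `/api/tags` response has been parsed. This is a rough sketch: the function name `embedding_model_names` and the `markers` tuple are hypothetical, and matching on substrings is only a heuristic, since the tags response shown here carries no explicit "embedding" flag.

```python
def embedding_model_names(tags_payload, markers=("embed", "nomic", "minilm")):
    """Return names from an /api/tags payload that look like embedding models."""
    names = []
    for model in tags_payload.get("models", []):
        name = model.get("name", "")
        # Heuristic: keep models whose name contains any known marker.
        if any(marker in name.lower() for marker in markers):
            names.append(name)
    return names
```

Pass it the decoded JSON from `http://localhost:11435/api/tags`. Extend `markers` if your fleet hosts embedding models with other naming patterns.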

Pull a model if needed:

curl -X POST http://localhost:11435/dashboard/api/pull \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed-text", "node_id": "your-node-id"}'

Usage analytics

Tag embedding requests to track per-project usage:

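import httpx

text = "The fleet manages all inference routing"  # any text to embed

resp = httpx.post(
    "http://localhost:11435/api/embeddings",
    json={
        "model": "nomic-embed-text",
        "prompt": text,
        "metadata": {"tags": ["my-rag-pipeline", "indexing"]},
    },
    timeout=30.0,
)
resp.raise_for_status()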

Also available on this fleet

LLM inference

curl http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-oss:120b","messages":[{"role":"user","content":"Hello"}]}'

Drop-in OpenAI SDK compatible. 7-signal scoring routes to the optimal node.

Image generation

curl -o image.png http://localhost:11435/api/generate-image \
  -H "Content-Type: application/json" \
  -d '{"model":"z-image-turbo","prompt":"a sunset","width":1024,"height":1024,"steps":4}'

Requires FLEET_IMAGE_GENERATION=true. Uses mflux (MLX-native Flux).

Speech-to-text

curl -s http://localhost:11435/api/transcribe \
  -F "[email protected]" | python3 -m json.tool

Requires FLEET_TRANSCRIPTION=true. Uses Qwen3-ASR.

Monitoring

# Fleet health and model recommendations
curl -s http://localhost:11435/dashboard/api/health | python3 -m json.tool

# Per-app usage (see which projects use the most tokens)
curl -s http://localhost:11435/dashboard/api/apps | python3 -m json.tool

Dashboard at http://localhost:11435/dashboard — embedding requests flow through the same queues as LLM requests.

Full documentation

Agent Setup Guide — complete reference for all 4 model types.

Request Tagging Guide — tag requests for per-project analytics.

Guardrails

Never delete or modify files in ~/.fleet-manager/.
Never pull or delete models without user confirmation.
If the embedding model is not available, suggest: ollama pull nomic-embed-text.
If the router is not running, suggest: herd or uv run herd.
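
To decide whether the "router not running" guardrail applies, the router can be probed before sending work. The sketch below is one possible check. The function name `router_is_reachable` is hypothetical, and probing `/api/tags` is an assumption based on the endpoint used earlier in this README. It only tells you that something answers HTTP on the port, not that it is the herd router.

```python
import urllib.error
import urllib.request


def router_is_reachable(base_url="http://localhost:11435", timeout=2.0):
    """Return True if something answers HTTP at base_url, else False."""
    # Bypass any configured HTTP proxy, since the router is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(f"{base_url}/api/tags", timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # An HTTP error status still means a server is listening.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return False
```

If this returns False, suggest starting the router rather than retrying requests.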

Usage Guidance

This skill appears consistent with its purpose, but before installing or running it:

(1) Inspect the PyPI package and GitHub repo (https://github.com/geeks-accelerator/ollama-herd) to confirm you trust the code.
(2) Be aware that it expects to run a local service (herd/herd-node) that listens on port 11435 and may create or read files under ~/.fleet-manager.
(3) Pulling models can download large binaries and may raise model-licensing or bandwidth concerns.
(4) If you don't want a local server, do not run herd/herd-node; the curl and HTTP calls in the docs target localhost and will fail harmlessly otherwise.
(5) Review every model pull and its node ID to avoid unintentionally provisioning models to devices you don't control.

Capability Assessment

✓ Purpose & Capability

Name/description (fleet embeddings via Ollama) match what the SKILL.md instructs: calling a local Ollama-like router on http://localhost:11435, pulling embedding models, and running herd/herd-node. Required binaries (curl or wget, optional python3/pip) are appropriate for the stated tasks.

ℹ Instruction Scope

The runtime instructions focus on localhost API calls, model pulls, and running the herd router/node; they do not ask the agent to read unrelated system files or external secrets. One noteworthy point: the SKILL.md metadata lists config paths (~/.fleet-manager/latency.db and logs) though the registry metadata earlier said no required config paths — the skill may create or read those local files when running the herd, so confirm you accept that local state.

ℹ Install Mechanism

This is instruction-only (no install spec). The guide recommends pip install ollama-herd (a PyPI package) and running herd/herd-node. Installing from PyPI is expected for a Python tool but is not enforced automatically by the registry; review the PyPI package and GitHub repo before installing.

✓ Credentials

The skill does not request environment secrets or credentials. It documents optional env flags (FLEET_IMAGE_GENERATION, FLEET_TRANSCRIPTION) to enable feature toggles, which are proportional and optional. No unrelated API keys or secrets are requested.

✓ Persistence & Privilege

always is false and the skill does not request elevated system-wide privileges. Running herd/herd-node will run local services and create local state under ~/.fleet-manager per the metadata — this is expected for a fleet router but verify you are comfortable running a local server listening on port 11435.

Version History

v1.1.1

Cross-platform support: macOS, Linux, and Windows. Updated OS metadata, descriptions, and hardware recommendations.

v1.1.0

- Updated the description to highlight supported embedding models (nomic-embed-text, mxbai-embed, and snowflake-arctic-embed) and clarify fleet-routed usage.
- No code or usage changes; documentation only.

v1.0.0

Initial release of fleet-embeddings.
- Generate text embeddings across all nodes in your device fleet, not just one machine.
- Automatic load balancing with Ollama embedding models for fast parallel batch embeddings.
- Supports use cases like RAG pipelines, semantic search, similarity matching, and recommendation systems.
- Simple API compatible with curl, Python (OpenAI SDK, httpx), and OpenAI-compatible tools.
- Provides examples for batch embedding, model management, project usage tagging, and monitoring via dashboard.
- Includes guardrails for file safety and user-driven model management.

Metadata

Slug fleet-embeddings

Version 1.1.1

License MIT-0

All-time Installs 2

Active Installs 2

Total Versions 3

Frequently Asked Questions

What is Fleet Embeddings?

Embeddings with nomic-embed-text, mxbai-embed, and snowflake-arctic-embed across your device fleet. Fleet-routed via Ollama for RAG, semantic search, and vec... It is an AI Agent Skill for Claude Code / OpenClaw, with 142 downloads so far.

How do I install Fleet Embeddings?

Run "/install fleet-embeddings" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Fleet Embeddings free?

Yes, Fleet Embeddings is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Fleet Embeddings support?

Fleet Embeddings is cross-platform and runs anywhere OpenClaw / Claude Code is available (darwin, linux, windows).

Who created Fleet Embeddings?

It is built and maintained by Twin Geeks (@twinsgeeks); the current version is v1.1.1.
