twinsgeeks

Llama Llama3

by Twin Geeks · GitHub ↗ · v1.0.1 · MIT-0
Platforms: darwin, linux, windows · ✓ Security Clean
153 Downloads · 2 Stars · 2 Active Installs · 2 Versions
Install in OpenClaw
/install llama-llama3
Description
Llama 3 by Meta — run Llama 3.3, Llama 3.2, and Llama 3.1 across your local device fleet. The most popular open-source LLM family routed to the best availabl...
README (SKILL.md)

Llama 3 — Run Meta's LLMs Across Your Local Fleet

Llama is one of the most widely deployed open-weight LLM families. This skill routes Llama requests across your devices: the fleet picks the best machine for every request automatically.

Supported Llama models

| Model | Parameters | Ollama name | Best for |
|---|---|---|---|
| Llama 3.3 | 70B | llama3.3:70b | Best overall; reported as competitive with GPT-4o on many benchmarks |
| Llama 3.2 | 1B, 3B | llama3.2:3b | Fast responses on low-RAM devices |
| Llama 3.1 | 8B, 70B, 405B | llama3.1:70b | Proven workhorse, massive community |
| Llama 3 | 8B, 70B | llama3:70b | Original release, still widely used |

Quick start

pip install ollama-herd    # PyPI: https://pypi.org/project/ollama-herd/
herd                       # start the router (port 11435)
herd-node                  # run on each device — finds the router automatically

No models are downloaded during installation. Models are pulled on demand when a request arrives, or manually via the dashboard. All pulls require user confirmation.

Use Llama through the fleet

OpenAI SDK (drop-in replacement)

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="llama3.3:70b",
    messages=[{"role": "user", "content": "Explain transformer architecture"}],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
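
If you need the full reply as a single string rather than printing it as it streams, the loop above can be wrapped in a small helper. This is a minimal sketch: `collect_stream` is a hypothetical name, and the guard against chunks with an empty `choices` list is a defensive assumption, not documented router behavior.

```python
def collect_stream(chunks):
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        # Defensive assumption: skip chunks that carry no choices.
        if not chunk.choices:
            continue
        # delta.content can be None, so fall back to an empty string.
        parts.append(chunk.choices[0].delta.content or "")
    return "".join(parts)
```

With the `response` object from the example above, `text = collect_stream(response)` would hold the complete answer.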

curl (Ollama format)

curl http://localhost:11435/api/chat -d '{
  "model": "llama3.3:70b",
  "messages": [{"role": "user", "content": "Write a Python quicksort"}],
  "stream": false
}'
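
The same Ollama-format call can be made from Python with only the standard library. The sketch below assumes the router mirrors Ollama's non-streaming response shape (`{"message": {"content": ...}}`); `build_chat_request` and `chat` are hypothetical helper names, not part of ollama-herd.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11435"  # router address from the quick start


def build_chat_request(model, prompt, base_url=BASE_URL):
    """Build the same POST request as the curl example (not sent yet)."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def chat(model, prompt, timeout=300):
    """Send the request and return the assistant's reply text.

    Assumes the router returns Ollama's non-streaming shape:
    {"message": {"role": "assistant", "content": "..."}, ...}
    """
    request = build_chat_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["message"]["content"]
```

Calling `chat("llama3.3:70b", "Write a Python quicksort")` would then correspond to the curl command above. The generous timeout is a guess, since large models can take a while to load on first use.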

curl (OpenAI format)

curl http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2:3b", "messages": [{"role": "user", "content": "Hello"}]}'

Which Llama model for your hardware

Cross-platform: These are example configurations. Any device (Mac, Linux, Windows) with equivalent RAM works. The fleet router runs on all platforms.

Pick the model that fits your available memory — smaller models work great for most tasks:

| Model | Min RAM | Example hardware |
|---|---|---|
| llama3.2:1b | 2GB | Any Mac, even 8GB |
| llama3.2:3b | 4GB | Mac Mini (16GB) |
| llama3:8b | 8GB | Mac Mini (16GB) |
| llama3.3:70b | 48GB | Mac Studio M4 Max (128GB) |
| llama3.1:405b | 256GB+ | Mac Studio M3 Ultra (256GB) or distributed |

The fleet router sends requests to the machine where the model is loaded. No manual routing needed.
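
To make the sizing table concrete, here is a minimal sketch that picks the largest listed model whose minimum RAM fits a given amount of memory. The thresholds are copied from the table; `largest_model_that_fits` is a hypothetical helper, not something the router exposes.

```python
# (Ollama name, minimum RAM in GB), copied from the table above,
# ordered from smallest to largest.
MODEL_MIN_RAM_GB = [
    ("llama3.2:1b", 2),
    ("llama3.2:3b", 4),
    ("llama3:8b", 8),
    ("llama3.3:70b", 48),
    ("llama3.1:405b", 256),
]


def largest_model_that_fits(available_ram_gb):
    """Return the largest listed model whose minimum RAM fits, or None."""
    best = None
    for name, min_ram_gb in MODEL_MIN_RAM_GB:
        if min_ram_gb <= available_ram_gb:
            best = name
    return best
```

For example, `largest_model_that_fits(16)` returns `"llama3:8b"`, and any value below 2 returns `None`. Pass the memory actually free for the model, not the machine's total, since the operating system and other applications need their share.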

Why run Llama locally

  • Free after hardware: Meta's Llama community licenses permit commercial use, subject to their terms, with no per-token cost
  • Privacy — prompts and responses never leave your network
  • No rate limits — your hardware, your throughput
  • Fleet routing — multiple machines share the load automatically

See what's running

# Models loaded in memory right now
curl -s http://localhost:11435/api/ps | python3 -m json.tool

# All models available across the fleet
curl -s http://localhost:11435/api/tags | python3 -m json.tool
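
The JSON from `/api/tags` can be filtered down to Llama models once it is parsed. This sketch assumes the router returns Ollama's shape, a `models` list whose entries carry `name` and `size` (bytes). `llama_models` is a hypothetical helper, and the sample payload uses made-up sizes.

```python
def llama_models(tags_payload):
    """List (name, size in GB) for every Llama 3 model in an /api/tags reply.

    Assumes Ollama's response shape: {"models": [{"name": ..., "size": <bytes>}]}.
    """
    result = []
    for entry in tags_payload.get("models", []):
        name = entry.get("name", "")
        if name.startswith("llama3"):
            size_gb = round(entry.get("size", 0) / 1e9, 1)
            result.append((name, size_gb))
    return sorted(result)


# Illustrative payload with made-up sizes, not real router output.
sample = {
    "models": [
        {"name": "llama3.2:3b", "size": 2_000_000_000},
        {"name": "nomic-embed-text", "size": 274_000_000},
    ]
}
print(llama_models(sample))  # [('llama3.2:3b', 2.0)]
```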

Monitor Llama performance

# Recent request traces — see latency, tokens, which node handled each request
curl -s "http://localhost:11435/dashboard/api/traces?limit=10" | python3 -m json.tool

# Fleet health — 15 automated checks
curl -s http://localhost:11435/dashboard/api/health | python3 -m json.tool

Web dashboard at http://localhost:11435/dashboard — live view of all nodes, queues, and models.

Also available on this fleet

Other LLM models

Qwen 3.5, DeepSeek-V3, DeepSeek-R1, Phi 4, Mistral, Gemma 3, Codestral — any Ollama model routes through the same endpoint.

Image generation

curl http://localhost:11435/api/generate-image \
  -d '{"model": "z-image-turbo", "prompt": "a llama in the mountains", "width": 512, "height": 512}'

Speech-to-text

curl http://localhost:11435/api/transcribe -F "file=@<path-to-audio-file>" -F "model=qwen3-asr"

Embeddings

curl http://localhost:11435/api/embed \
  -d '{"model": "nomic-embed-text", "input": "Meta Llama open source language model"}'
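
Embedding vectors are usually compared with cosine similarity. The sketch below is a plain-Python version that works on any two vectors. It assumes you have already parsed the endpoint's reply, and that the reply follows Ollama's `{"embeddings": [[...], ...]}` shape. `cosine_similarity` is a hypothetical helper name.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)
```

If a request returns two vectors in `payload["embeddings"]`, then `cosine_similarity(payload["embeddings"][0], payload["embeddings"][1])` scores how close the two inputs are. Values near 1 indicate similar meaning.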

Full documentation

Guardrails

  • Model downloads require explicit user confirmation — Llama models range from 1GB (1B) to 230GB+ (405B). Always confirm before pulling.
  • Model deletion requires explicit user confirmation.
  • Never delete or modify files in ~/.fleet-manager/.
  • If a model is too large for available memory, suggest a smaller variant.
  • No models are downloaded automatically — all pulls are user-initiated or require opt-in via the auto_pull setting.
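
The confirmation guardrail can be sketched as a small gate that an agent calls before any pull. `confirm_pull` is a hypothetical function, not part of ollama-herd. The `ask` parameter defaults to `input` and can be swapped out for testing.

```python
def confirm_pull(model, size_gb, ask=input):
    """Return True only if the user explicitly approves the download."""
    answer = ask(f"Pull {model} (about {size_gb} GB)? [y/N] ")
    return answer.strip().lower() in ("y", "yes")
```

Anything other than an explicit "y" or "yes" counts as a refusal, so pressing Enter does not start a multi-gigabyte download.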
Usage Guidance
This skill is internally consistent with being a local fleet router, but you should still do basic hygiene before installing:
  1. Inspect the PyPI package 'ollama-herd' and the linked GitHub repository to confirm the code matches the docs and that model downloads are interactive as stated.
  2. Run the software in an isolated/test environment first (or a VM) to verify it only listens on localhost or your intended network interfaces.
  3. Review ~/.fleet-manager/ contents and logs for any sensitive data; back them up if needed.
  4. Confirm that model pulls truly require explicit confirmation and that no automatic outbound traffic uploads prompts/responses.
  5. Limit which devices/users can join your fleet (authentication/ACLs) to avoid exposing local models.
If you cannot review the package source, treat the PyPI install as a moderate risk.
Capability Assessment
Purpose & Capability
Name/description (a local fleet router for Llama models) lines up with what's requested and documented: the SKILL.md instructs installing a herd router package, running local binaries (herd, herd-node), and talking to localhost endpoints. Required binaries (curl/wget, optional python/pip) are appropriate. Declared config paths (~/.fleet-manager/latency.db and logs/herd.jsonl) are consistent with a fleet manager that records latency and logs.
Instruction Scope
SKILL.md is instruction-only and stays within the stated purpose: it tells the operator to pip install the herd package, run herd and herd-node, and call local HTTP endpoints. It does not instruct reading arbitrary user files or exfiltrating data. One point to note: metadata lists fleet config paths (logs/db) which are sensitive — the doc warns not to modify them, but if installed, the herd software will likely read/write those files. Verify that behavior in the package source before trusting logs/latency data.
Install Mechanism
There is no formal install spec in the registry (instruction-only), but the SKILL.md recommends 'pip install ollama-herd' from PyPI. Installing third-party packages from PyPI is a normal distribution route but carries modest risk—inspect the PyPI package and the linked GitHub repo before installing. No downloads from unknown personal servers or archive extracts are specified.
Credentials
The skill requests no environment variables, no credentials, and no system config paths beyond its own fleet config directory. That is proportionate for a local fleet router that operates on localhost and local devices.
Persistence & Privilege
always is false and the skill does not request system-wide privileges or modifications to other skills. The declared config paths imply it will maintain local state under ~/.fleet-manager/, which is expected for this type of software; ensure you are comfortable with that directory being created/used.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install llama-llama3
  3. After installation, invoke the skill by name or use /llama-llama3
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.1
Cross-platform support: macOS, Linux, and Windows. Updated OS metadata, descriptions, and hardware recommendations.
v1.0.0
llama-llama3 1.0.0: Initial release
  • Route Meta’s Llama 3 family of models (including 3.3, 3.2, 3.1, and original 3) across your local device fleet.
  • Automatically selects the best available device for each Llama request.
  • Supports OpenAI-compatible API and Ollama API endpoints.
  • Manual, user-confirmed model downloads; no automatic pulls.
  • Includes fleet monitoring, dashboard, and support for multiple tasks (chat, images, speech-to-text, embeddings).
  • Commercial use supported with zero cloud or per-token costs.
Metadata
Slug llama-llama3
Version 1.0.1
License MIT-0
All-time Installs 2
Active Installs 2
Total Versions 2
Frequently Asked Questions

What is Llama Llama3?

Llama 3 by Meta — run Llama 3.3, Llama 3.2, and Llama 3.1 across your local device fleet. The most popular open-source LLM family routed to the best availabl... It is an AI Agent Skill for Claude Code / OpenClaw, with 153 downloads so far.

How do I install Llama Llama3?

Run "/install llama-llama3" in the OpenClaw or Claude Code chat to install the skill in one step. The router itself is installed separately with pip install ollama-herd, as described in the README.

Is Llama Llama3 free?

Yes, Llama Llama3 is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Llama Llama3 support?

Llama Llama3 is cross-platform and runs anywhere OpenClaw / Claude Code is available (darwin, linux, windows).

Who created Llama Llama3?

It is built and maintained by Twin Geeks (@twinsgeeks); the current version is v1.0.1.
