Description

Apple Silicon AI — run LLMs, image generation, speech-to-text, and embeddings on Mac Studio, Mac Mini, MacBook Pro, and Mac Pro. Turn your Apple Silicon devi...

README (SKILL.md)

Apple Silicon AI — Your Macs Are the Cluster

Name: Apple Silicon Ai
Author: twinsgeeks

Turn your Mac Studio, Mac Mini, MacBook Pro, or Mac Pro into a local Apple Silicon AI fleet. One endpoint routes LLM inference, image generation, speech-to-text, and embeddings across every Apple Silicon device on your network.

No cloud APIs. No GPU rentals. No Docker. Your Apple Silicon M1/M2/M3/M4 chips with unified memory are already better inference hardware than most cloud instances — you just need software that treats them as an Apple Silicon fleet.

Why Apple Silicon for AI

Apple Silicon unified memory keeps the entire model in one address space — no PCIe bottleneck, no CPU-GPU transfer overhead. A Mac Studio with M4 Ultra and 256GB runs 120B parameter models that would need multiple NVIDIA A100s. That is the Apple Silicon advantage.

Apple Silicon Chip	Unified Memory	LLM Sweet Spot	Apple Silicon Image Gen	Notes
M1 (8GB)	8GB	7B models	Slow	Entry-level Apple Silicon
M1 Pro/Max (32-64GB)	32-64GB	14B-32B	Capable	Apple Silicon MacBook Pro
M2 Ultra (192GB)	192GB	70B-120B	Fast	Apple Silicon Mac Studio/Pro
M3 Max (128GB)	128GB	70B	Fast	Latest Apple Silicon MacBook Pro
M4 Max (128GB)	128GB	70B	Fast	Apple Silicon Mac Studio, newest gen
M4 Ultra (256GB)	256GB	120B+	Very fast	Apple Silicon Mac Studio/Pro, largest models
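
As a rough sanity check on the table above, weight memory is approximately parameter count times bytes per weight. The sketch below is a back-of-the-envelope estimate only, not something the router is documented to compute. The function names and the 20% overhead factor (for KV cache and runtime buffers) are assumptions chosen for illustration.

```python
def estimate_model_memory_gb(params_billion: float, bits_per_weight: int,
                             overhead: float = 1.2) -> float:
    """Rough memory footprint in GB (1 GB = 1e9 bytes).

    `overhead` is an assumed multiplier for KV cache and runtime buffers;
    real usage depends on context length and the inference runtime.
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9


def fits(params_billion: float, bits_per_weight: int, unified_memory_gb: float) -> bool:
    """True if the rough estimate fits within the given unified memory."""
    return estimate_model_memory_gb(params_billion, bits_per_weight) <= unified_memory_gb


# Print estimates for a few model sizes at 4-bit quantization.
for size in (7, 32, 70, 120):
    print(f"{size}B @ 4-bit: ~{estimate_model_memory_gb(size, 4):.1f} GB")
```

Under these assumptions a 70B model at 4-bit lands around 42 GB, while the same model at 16-bit would not fit in 128GB. macOS also reserves part of unified memory for the system, so treat the result as optimistic.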

Apple Silicon Fleet Setup

1. Install on every Apple Silicon Mac

pip install ollama-herd    # Apple Silicon optimized inference router

2. Start the Apple Silicon router (pick one Mac)

herd    # starts Apple Silicon router on port 11435

3. Start the Apple Silicon node agent on every Mac

herd-node    # Apple Silicon node auto-discovers the router

That's it. Apple Silicon nodes discover the router automatically on your local network. No IP addresses to configure, no config files. For explicit connection, use herd-node --router-url http://<router-ip>:11435.

How Apple Silicon routing works

MacBook Pro (M3 Max, 64GB)  ─┐
Mac Mini (M4, 32GB)          ├──→  Apple Silicon Router (:11435)  ←──  Your apps
Mac Studio (M4 Ultra, 256GB) ─┘

The Apple Silicon router scores each device on 7 signals and routes every request to the best available Mac — thermal state, memory fit, queue depth, and more.
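
To make the idea of score-based routing concrete, here is a minimal sketch that uses only the three signals named above (memory fit, queue depth, thermal state). It is not the router's actual algorithm: the `Node` fields, the weights, and the example fleet values are all hypothetical, and the remaining signals are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical snapshot of one Mac; field names are illustrative."""
    name: str
    free_memory_gb: float
    queue_depth: int
    thermal_throttled: bool


def score(node: Node, model_gb: float) -> Optional[float]:
    """Higher is better; None means the model does not fit on this node."""
    if node.free_memory_gb < model_gb:
        return None
    headroom = (node.free_memory_gb - model_gb) / node.free_memory_gb  # 0..1
    queue_penalty = 0.25 * node.queue_depth                    # assumed weight
    thermal_penalty = 0.5 if node.thermal_throttled else 0.0   # assumed weight
    return headroom - queue_penalty - thermal_penalty


def pick_node(nodes: List[Node], model_gb: float) -> Optional[Node]:
    """Return the best-scoring node that can hold the model, if any."""
    eligible = [(s, n) for n in nodes if (s := score(n, model_gb)) is not None]
    if not eligible:
        return None
    return max(eligible, key=lambda pair: pair[0])[1]


# Made-up fleet state loosely following the diagram above.
fleet = [
    Node("macbook-pro", free_memory_gb=40, queue_depth=0, thermal_throttled=True),
    Node("mac-mini", free_memory_gb=20, queue_depth=0, thermal_throttled=False),
    Node("mac-studio", free_memory_gb=200, queue_depth=2, thermal_throttled=False),
]
print(pick_node(fleet, model_gb=42).name)
print(pick_node(fleet, model_gb=10).name)
```

With these made-up numbers, a 42 GB model can only go to the Mac Studio, while a 10 GB model goes to the idle Mac Mini because the Studio's queue and the MacBook's thermal state are penalized.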

Apple Silicon LLM Inference

Run Llama, Qwen, DeepSeek, Phi, Mistral, Gemma, and any Ollama model across your Apple Silicon fleet.

OpenAI-compatible API (Apple Silicon backend)

curl http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.3:70b",
    "messages": [{"role": "user", "content": "Explain Apple Silicon unified memory architecture"}]
  }'

Ollama-compatible API

curl http://localhost:11435/api/chat \
  -d '{"model": "qwen3:32b", "messages": [{"role": "user", "content": "Compare Apple Silicon M4 vs M3 for AI inference"}]}'
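
The same Ollama-style call can be made from Python with only the standard library. This is a sketch under two assumptions: that the router accepts Ollama's `stream` field, and that it returns Ollama's non-streaming response shape (`message.content`). The helper names are illustrative.

```python
import json
import urllib.request

ROUTER_URL = "http://localhost:11435"  # default router port from this README


def build_chat_request(model: str, prompt: str,
                       base_url: str = ROUTER_URL) -> urllib.request.Request:
    """Build a POST request for the Ollama-compatible /api/chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # Ollama streams by default; ask for a single JSON body
    }
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the request and return the reply text (assumes Ollama's response shape)."""
    request = build_chat_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["message"]["content"]
```

Calling `chat("qwen3:32b", "Compare Apple Silicon M4 vs M3 for AI inference")` requires a running router. Building the request separately makes the payload easy to inspect without one.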

Apple Silicon Python Client

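from openai import OpenAI

# Point the OpenAI client at the Apple Silicon router. The client requires an api_key value, but it is unused here.
apple_silicon_client = OpenAI(base_url="http://localhost:11435/v1", api_key="unused")
apple_silicon_response = apple_silicon_client.chat.completions.create(
    model="deepseek-r1:70b",
    messages=[{"role": "user", "content": "Optimize this function for Apple Silicon"}],
)
# Print the assistant's reply text.
print(apple_silicon_response.choices[0].message.content)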

Apple Silicon Image Generation (mflux)

Generate images using MLX-native Flux models. Runs natively on Apple Silicon — no CUDA, no cloud.

curl http://localhost:11435/api/generate-image \
  -d '{"prompt": "Apple Silicon Mac Studio rendering AI art, photorealistic", "model": "z-image-turbo", "width": 512, "height": 512}'

Apple Silicon image generation performance:

Mac Studio M4 Ultra: ~5s at 512px, ~14s at 1024px
MacBook Pro M3 Max: ~7s at 512px, ~18s at 1024px
Mac Mini M4: ~12s at 512px, ~30s at 1024px
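
For a sense of what these timings mean at fleet scale, the arithmetic below converts seconds per image into images per hour and sums across the three Macs. It is an optimistic upper bound that assumes every node stays busy and that routing adds no overhead. The function names are illustrative.

```python
from typing import Dict

# Seconds per image, copied from the approximate figures listed above.
SECONDS_PER_IMAGE: Dict[str, Dict[int, float]] = {
    "Mac Studio M4 Ultra": {512: 5.0, 1024: 14.0},
    "MacBook Pro M3 Max": {512: 7.0, 1024: 18.0},
    "Mac Mini M4": {512: 12.0, 1024: 30.0},
}


def images_per_hour(seconds_per_image: float) -> float:
    """Convert a per-image latency into an hourly rate for one node."""
    return 3600.0 / seconds_per_image


def fleet_images_per_hour(size_px: int) -> float:
    """Upper bound: sum of per-node rates, assuming every node stays busy."""
    return sum(images_per_hour(timings[size_px]) for timings in SECONDS_PER_IMAGE.values())


for size in (512, 1024):
    print(f"{size}px: ~{fleet_images_per_hour(size):.0f} images/hour across the fleet")
```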

Apple Silicon Speech-to-Text (Qwen ASR)

Transcribe audio locally on Apple Silicon using Qwen3-ASR via MLX. Meetings, voice notes, podcasts — no cloud, no Whisper API costs.

curl http://localhost:11435/api/transcribe \
  -F "file=@apple_silicon_meeting.wav" \
  -F "model=qwen3-asr"

Supports WAV, MP3, M4A, FLAC. ~2s for a 30-second clip on Apple Silicon M4 Ultra.
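
The curl call above sends a multipart form upload. A standard-library Python equivalent might look like the sketch below. The helper names are illustrative, and because this README does not describe the response format, `transcribe` simply returns the raw response text.

```python
import urllib.request
import uuid
from pathlib import Path
from typing import Dict, Tuple

ROUTER_URL = "http://localhost:11435"  # default router port from this README


def encode_multipart(fields: Dict[str, str], file_field: str,
                     filename: str, file_bytes: bytes) -> Tuple[bytes, str]:
    """Encode text fields plus one file as multipart/form-data.

    Returns (body, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        head = (f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n")
        parts.append(head.encode("utf-8"))
    file_head = (f"--{boundary}\r\n"
                 f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
                 "Content-Type: application/octet-stream\r\n\r\n")
    parts.append(file_head.encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_transcribe_request(audio_path: str, model: str = "qwen3-asr",
                             base_url: str = ROUTER_URL) -> urllib.request.Request:
    """Build the POST request for /api/transcribe with the same fields as the curl example."""
    path = Path(audio_path)
    body, content_type = encode_multipart({"model": model}, "file", path.name, path.read_bytes())
    return urllib.request.Request(f"{base_url}/api/transcribe", data=body,
                                  headers={"Content-Type": content_type}, method="POST")


def transcribe(audio_path: str, timeout: float = 300.0) -> str:
    """Upload the audio file and return the raw response body as text."""
    with urllib.request.urlopen(build_transcribe_request(audio_path), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```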

Apple Silicon Embeddings

Embed documents across your Apple Silicon fleet using Ollama embedding models (nomic-embed-text, mxbai-embed-large, snowflake-arctic-embed).

curl http://localhost:11435/api/embed \
  -d '{"model": "nomic-embed-text", "input": "Apple Silicon unified memory architecture for AI inference"}'

Batch thousands of documents across Apple Silicon nodes instead of bottlenecking on one Mac.
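
One way to batch documents is to split them into chunks and send the chunks concurrently, so the router has several requests to spread across nodes. The sketch below assumes the router forwards Ollama's `/api/embed` behavior (a list `input` and an `embeddings` list in the response). The batch size, worker count, and helper names are illustrative.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List

ROUTER_URL = "http://localhost:11435"  # default router port from this README


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_embed_request(texts: List[str], model: str = "nomic-embed-text",
                        base_url: str = ROUTER_URL) -> urllib.request.Request:
    """Build a POST request that embeds a list of texts in one call."""
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        f"{base_url}/api/embed",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed_batch(texts: List[str]) -> List[List[float]]:
    """Embed one batch (assumes an Ollama-style 'embeddings' field in the response)."""
    with urllib.request.urlopen(build_embed_request(texts), timeout=300) as resp:
        return json.load(resp)["embeddings"]


def embed_all(docs: List[str], batch_size: int = 64, workers: int = 4) -> List[List[float]]:
    """Embed all documents, sending batches concurrently and preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(embed_batch, batched(docs, batch_size))
        return [vector for batch in results for vector in batch]
```

`ThreadPoolExecutor.map` returns results in submission order, so the vectors line up with the input documents even when batches finish out of order.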

Apple Silicon Fleet Monitoring

Dashboard

Open http://localhost:11435/dashboard — see every Apple Silicon Mac in your fleet: models loaded, queue depth, thermal state, memory usage, and health status.

Apple Silicon Fleet Status API

curl http://localhost:11435/fleet/status

Returns every Apple Silicon node with hardware specs, loaded models, image/STT capabilities, and health metrics.

Apple Silicon Health Checks

curl http://localhost:11435/dashboard/api/health

15 automated checks: offline Apple Silicon nodes, memory pressure, thermal throttling, VRAM fallbacks, error rates, and more.
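
The status and health endpoints can also be polled from a script. The response schemas are not described in this README, so the sketch below assumes only that both endpoints return JSON and lists the top-level keys instead of relying on specific field names. The helper names are illustrative.

```python
import json
import urllib.request
from typing import Any, List

ROUTER_URL = "http://localhost:11435"  # default router port from this README
STATUS_PATH = "/fleet/status"
HEALTH_PATH = "/dashboard/api/health"


def endpoint(path: str, base_url: str = ROUTER_URL) -> str:
    """Join the router base URL and an endpoint path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def get_json(path: str, timeout: float = 10.0) -> Any:
    """GET an endpoint and decode the body as JSON (assumes a JSON response)."""
    with urllib.request.urlopen(endpoint(path), timeout=timeout) as resp:
        return json.load(resp)


def top_level_keys(document: Any) -> List[str]:
    """List the top-level keys of a JSON object, or [] for any other JSON type."""
    return sorted(document) if isinstance(document, dict) else []
```

With a router running, `print(top_level_keys(get_json(STATUS_PATH)))` is a quick way to see what the status payload contains before writing code against it.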

Recommended Models by Apple Silicon Hardware

Your Apple Silicon Mac	RAM	Recommended models
Mac Mini (16GB)	16GB	llama3.2:3b, phi4-mini, nomic-embed-text
Mac Mini (32GB)	32GB	qwen3:14b, deepseek-r1:14b, llama3.1:8b
MacBook Pro (36-64GB)	36-64GB	qwen3:32b, deepseek-r1:32b, codestral
Mac Studio (128GB)	128GB	llama3.3:70b, qwen2.5:72b, deepseek-r1:70b
Mac Studio/Pro (192-256GB)	192-256GB	qwen:110b, deepseek-v2:236b (quantized)

The Apple Silicon router's model recommender analyzes your fleet hardware and suggests the optimal model mix: GET /dashboard/api/model-recommendations.

Full documentation

Agent Setup Guide — complete Apple Silicon setup for all 4 model types
Configuration Reference — all 44+ environment variables
API Reference — all endpoints with request/response schemas
Troubleshooting — common Apple Silicon issues and fixes

Guardrails

No automatic downloads: Apple Silicon model pulls are always user-initiated and require explicit confirmation. Downloads range from 2GB to 70GB+ depending on model size.
Model deletion requires confirmation: Never remove models from Apple Silicon nodes without explicit user approval.
All Apple Silicon requests stay local: No data leaves your local network — all inference happens on your Apple Silicon Macs.
No API keys: No accounts, no tokens, no cloud dependencies for your Apple Silicon fleet.
No external network access: The Apple Silicon router and nodes communicate only on your local network. No telemetry, no cloud callbacks.
Treat local state as read-only: The only local files the software creates are ~/.fleet-manager/latency.db (Apple Silicon routing metrics) and ~/.fleet-manager/logs/herd.jsonl (structured logs). Never delete or modify these files without user confirmation.

Usage Guidance

This skill is coherent with its stated goal (running an Apple Silicon inference fleet), but it requires you to install a third-party Python package and run a router/node that auto-discovers Macs and exposes a local HTTP dashboard and fleet/status endpoints. Before installing:

(1) Inspect the ollama-herd project on the linked GitHub (and the PyPI package contents) to verify the source and maintainers.
(2) Run the software in an isolated or trusted network or VM first to confirm its behavior.
(3) Review the files created under ~/.fleet-manager and check the logs for sensitive information.
(4) Restrict access to port 11435 with a local firewall if you don't want LAN devices to discover or call the router.
(5) Ensure any models the system downloads come from sources you trust (the doc implies local models, but implementations may fetch remote artifacts).

If you cannot review the package source, or do not accept LAN-wide auto-discovery and local telemetry exposure, do not install.
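
For the log review step, a small script can flag records worth a closer look. The sketch below reads the JSONL log one record per line and reports top-level keys whose names look sensitive. The log's actual field names are not documented here, so the pattern list is a guess and should be adjusted after inspecting a few real records.

```python
import json
import re
from pathlib import Path
from typing import Iterable, List, Tuple

# Log path from this README.
LOG_PATH = Path.home() / ".fleet-manager" / "logs" / "herd.jsonl"

# Hypothetical key names that may carry secrets or user content.
SENSITIVE = re.compile(r"token|secret|password|api[_-]?key|authorization|prompt|messages",
                       re.IGNORECASE)


def find_sensitive(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, key) for each top-level key whose name looks sensitive.

    Lines that are not valid JSON are reported with a placeholder key.
    """
    hits: List[Tuple[int, str]] = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            hits.append((number, "<unparseable line>"))
            continue
        if isinstance(record, dict):
            hits.extend((number, key) for key in record if SENSITIVE.search(key))
    return hits


sample = [
    '{"event": "route", "node": "mac-studio"}',
    "",
    '{"event": "request", "prompt": "hello"}',
    "not json",
]
print(find_sensitive(sample))
```

To scan the real file, open `LOG_PATH` and pass the file object to `find_sensitive`. The function only reads the log and never modifies it.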

Capability Assessment

ℹ Purpose & Capability

Name and description claim local Apple Silicon fleet inference and the runtime instructions (pip install ollama-herd, herd, herd-node, local HTTP APIs) are consistent with that purpose. Declared required bins (curl/wget, optional python3/pip) align with examples. Minor inconsistency: SKILL.md text says "No config files" but metadata lists configPaths (~/.fleet-manager/latency.db, ~/.fleet-manager/logs/herd.jsonl), implying the software will create local state/log files.

ℹ Instruction Scope

Instructions stay within the stated domain (install package, run router/node, call local APIs). They direct the agent/user to run pip install and start services that auto-discover other Macs and expose a dashboard and fleet/status endpoints (hardware specs, models loaded, thermal/memory state). That behavior is expected for a fleet manager but raises privacy/network exposure concerns (auto-discovery and telemetry shared across the LAN). The SKILL.md does not instruct reading unrelated user files or environment variables.

ℹ Install Mechanism

There is no platform install spec; the document instructs the user to run `pip install ollama-herd`. That means installing third-party code from PyPI (or a specified index) which is a moderate supply-chain risk. The skill itself does not bundle code for review, so reviewers must inspect the external package/repo before trusting it.

ℹ Credentials

The skill declares no required environment variables or credentials (good). However, it will access local system metrics and create/read local fleet manager files (per metadata), and it opens a network service on port 11435 exposing machine-level info. Those capabilities are proportionate to a fleet management/inference router but are sensitive — no extra creds were requested, but network exposure and local logs could contain sensitive data.

✓ Persistence & Privilege

The skill is instruction-only, does not set always:true, and does not request elevated platform privileges in the manifest. Autonomous invocation by the agent is allowed by default (normal). There is no indication it modifies other skills or global agent settings.

Version History

v1.0.3

Cross-platform support: macOS, Linux, and Windows. Updated OS metadata, descriptions, and hardware recommendations.

v1.0.2

- Updated documentation to explicitly emphasize "Apple Silicon" branding throughout SKILL.md.
- Added multilingual summary in the description (Chinese and Spanish).
- Clarified all references to hardware, APIs, endpoints, and workflows as "Apple Silicon" specific.
- Enhanced guidance and usage examples to highlight Apple Silicon advantages and terminology.
- No functional or code changes; documentation only.

v1.0.1

- Improved setup instructions for clarity and added explicit connection options for node agents.
- Strengthened guardrails: model downloads and deletions now always require explicit user confirmation.
- Highlighted that no external network access or telemetry occurs; all fleet activity stays local.
- Clarified which files are written locally and added encouragement not to delete/modify without user approval.
- Minor language and formatting refinements throughout for greater clarity.

v1.0.0

Initial release of apple-silicon-ai.
- Transform your Apple Silicon Macs into a local AI fleet for LLM, image generation, speech-to-text, and embeddings.
- Supports M1, M2, M3, M4 Max/Ultra chips with unified memory for high-performance, on-device AI.
- Simple setup: install on each Mac, then run the router and node agents; nodes auto-discover and join the fleet via mDNS.
- OpenAI/Ollama-compatible APIs for LLMs, MLX-native image generation, Qwen ASR speech-to-text, and embeddings.
- Web dashboard and fleet status APIs for real-time monitoring, health checks, and model placement recommendations.
- All inference runs locally; no cloud, no Docker, no API keys, no external data transfer.

Metadata

Slug apple-silicon-ai

Version 1.0.3

License MIT-0

All-time Installs 2

Active Installs 2

Total Versions 4

Frequently Asked Questions

What is Apple Silicon Ai?

Apple Silicon AI — run LLMs, image generation, speech-to-text, and embeddings on Mac Studio, Mac Mini, MacBook Pro, and Mac Pro. Turn your Apple Silicon devi... It is an AI Agent Skill for Claude Code / OpenClaw, with 147 downloads so far.

How do I install Apple Silicon Ai?

Run "/install apple-silicon-ai" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Apple Silicon Ai free?

Yes, Apple Silicon Ai is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Apple Silicon Ai support?

Apple Silicon Ai is cross-platform and runs anywhere OpenClaw / Claude Code is available (darwin).

Who created Apple Silicon Ai?

It is built and maintained by Twin Geeks (@twinsgeeks); the current version is v1.0.3.
