Deepseek Deepseek Coder
/install deepseek-deepseek-coder
DeepSeek — Run DeepSeek Models Across Your Local Fleet
Run DeepSeek-V3, DeepSeek-R1, and DeepSeek-Coder on your own hardware. The fleet router picks the best device for every request — no cloud API needed, zero per-token costs, all data stays on your machines.
Supported DeepSeek models
| Model | Parameters | Ollama name | Best for |
|---|---|---|---|
| DeepSeek-V3 | 671B MoE (37B active) | deepseek-v3 | General: matches GPT-4o on most benchmarks |
| DeepSeek-V3.1 | 671B MoE | deepseek-v3.1 | Hybrid thinking/non-thinking modes |
| DeepSeek-V3.2 | 671B MoE | deepseek-v3.2 | Improved reasoning + agent performance |
| DeepSeek-R1 | 1.5B–671B | deepseek-r1 | Reasoning: approaches O3 and Gemini 2.5 Pro |
| DeepSeek-Coder | 1.3B–33B | deepseek-coder | Code generation (87% code, 13% NL training) |
| DeepSeek-Coder-V2 | 236B MoE (21B active) | deepseek-coder-v2 | Code: matches GPT-4 Turbo on code tasks |
Setup
pip install ollama-herd
herd # start the router (port 11435)
herd-node # run on each machine
Package: ollama-herd | Repo: github.com/geeks-accelerator/ollama-herd
No models are downloaded during installation. Models are pulled on demand: with the opt-in auto_pull setting enabled, the router pulls a model when a request arrives for one that is not yet on any node; otherwise, pull models manually via the dashboard.
Use DeepSeek through the fleet
OpenAI SDK
from openai import OpenAI

# Point the OpenAI client at the fleet router instead of a cloud API
client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")

# DeepSeek-R1 for reasoning (streamed token by token)
response = client.chat.completions.create(
    model="deepseek-r1:70b",
    messages=[{"role": "user", "content": "Prove that there are infinitely many primes"}],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

# DeepSeek-Coder for code (single, non-streamed response)
response = client.chat.completions.create(
    model="deepseek-coder-v2:16b",
    messages=[{"role": "user", "content": "Write a Redis cache decorator in Python"}],
)
print(response.choices[0].message.content)
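DeepSeek-R1 usually writes its chain of thought inside `<think>...</think>` tags before the final answer. Depending on the Ollama version and settings, those tags may arrive inline in the message content. The sketch below is one way to separate the two once the full response text has been collected; `split_reasoning` is a hypothetical helper, not part of ollama-herd or the OpenAI SDK.

```python
import re

# Assumption: the reasoning is wrapped in <think>...</think> inside the content.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a completed response string."""
    reasoning = "\n".join(part.strip() for part in THINK_BLOCK.findall(text))
    answer = THINK_BLOCK.sub("", text).strip()
    return reasoning, answer


sample = (
    "<think>Assume finitely many primes; multiply them and add 1.</think>"
    "There are infinitely many primes."
)
reasoning, answer = split_reasoning(sample)
print(answer)
```

When streaming, accumulate the chunks into one string first and split at the end, since a tag can be cut across two chunks.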
Ollama API
# DeepSeek-V3 general chat
curl http://localhost:11435/api/chat -d '{
  "model": "deepseek-v3",
  "messages": [{"role": "user", "content": "Explain quantum computing"}],
  "stream": false
}'

# DeepSeek-R1 reasoning
curl http://localhost:11435/api/chat -d '{
  "model": "deepseek-r1:70b",
  "messages": [{"role": "user", "content": "Solve this step by step: ..."}],
  "stream": false
}'
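The same /api/chat call can be made from Python without extra dependencies. This is a minimal sketch using only the standard library; `build_chat_request` and `chat` are illustrative names, and the code assumes the router returns Ollama's usual non-streaming shape, with the reply under `message.content`.

```python
import json
import urllib.request

ROUTER_URL = "http://localhost:11435"  # router address used throughout this page


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST to the router's /api/chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        f"{ROUTER_URL}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str, timeout: float = 600.0) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    # Assumption: Ollama-style response, reply text under message.content.
    return body["message"]["content"]
```

With the router running, `chat("deepseek-v3", "Explain quantum computing")` should return the reply as a string. The generous timeout is a guess that allows for a model that has to be loaded first.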
Hardware recommendations (optional — choose models that fit your RAM)
Cross-platform: These are example configurations. Any device (Mac, Linux, Windows) with equivalent RAM works. The fleet router runs on all platforms.
DeepSeek offers models at every size. Pick the one that fits your available memory — smaller models work great for most tasks:
| Model | Min RAM | Recommended hardware |
|---|---|---|
| deepseek-r1:1.5b | 4GB | Any Mac |
| deepseek-r1:7b | 8GB | Mac Mini M4 (16GB) |
| deepseek-r1:14b | 12GB | Mac Mini M4 (24GB) |
| deepseek-r1:32b | 24GB | Mac Mini M4 Pro (48GB) |
| deepseek-r1:70b | 48GB | Mac Studio M4 Max (128GB) |
| deepseek-coder-v2:16b | 12GB | Mac Mini M4 (24GB) |
| deepseek-v3 | 256GB+ | Mac Studio M3 Ultra (512GB) |
The fleet router automatically sends requests to the machine where the model is loaded — no manual routing needed.
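To make the "pick the one that fits" advice concrete, here is a small sketch that chooses the largest DeepSeek-R1 variant whose minimum RAM fits a given machine. The figures are copied from the table above; `largest_r1_that_fits` is a hypothetical helper, and real headroom also depends on context length and whatever else is running.

```python
from typing import Optional

# Minimum RAM in GB per model tag, copied from the hardware table.
MIN_RAM_GB = {
    "deepseek-r1:1.5b": 4,
    "deepseek-r1:7b": 8,
    "deepseek-r1:14b": 12,
    "deepseek-r1:32b": 24,
    "deepseek-r1:70b": 48,
}


def largest_r1_that_fits(available_gb: float) -> Optional[str]:
    """Return the biggest R1 tag whose minimum RAM is within available_gb."""
    fitting = [(ram, tag) for tag, ram in MIN_RAM_GB.items() if ram <= available_gb]
    # max() compares the RAM figure first, so the largest fitting model wins.
    return max(fitting)[1] if fitting else None


print(largest_r1_that_fits(16))
```

For a 16GB machine this selects deepseek-r1:14b, the largest entry with a minimum at or below 16GB.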
Why run DeepSeek locally
- Zero cost — DeepSeek API charges per token. Local is free after hardware.
- Privacy — code and business data never leave your network.
- No rate limits — DeepSeek API throttles during peak hours. Local has no throttle.
- Availability — DeepSeek API has had outages. Your hardware doesn't depend on their servers.
- Fleet routing — multiple machines share the load. One busy? Request goes to the next.
Fleet features
- 7-signal scoring: picks the optimal node for every request
- Auto-retry: fails over to next best node transparently
- VRAM-aware fallback: routes to a loaded model in the same category instead of cold-loading
- Context protection: prevents expensive model reloads from num_ctx changes
- Request tagging: track per-project DeepSeek usage
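The VRAM-aware fallback idea can be illustrated with a toy routing function. This is only a sketch of the concept as described in the list above, not the router's actual code: the `Node` type, the `CATEGORY` map, and the `route` function are all hypothetical, and the real scoring uses signals that are not documented here.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical category map; the router's real categories are not documented here.
CATEGORY = {
    "deepseek-r1:70b": "reasoning",
    "deepseek-r1:32b": "reasoning",
    "deepseek-coder-v2:16b": "code",
}


@dataclass
class Node:
    name: str
    loaded: set[str] = field(default_factory=set)  # models currently in memory


def route(requested: str, nodes: list[Node]) -> tuple[Optional[str], str]:
    """Return (node name, model to run), preferring models already loaded."""
    # 1. A node that already has the requested model in memory.
    for node in nodes:
        if requested in node.loaded:
            return node.name, requested
    # 2. Otherwise, any loaded model from the same category.
    wanted = CATEGORY.get(requested)
    if wanted is not None:
        for node in nodes:
            for model in sorted(node.loaded):
                if CATEGORY.get(model) == wanted:
                    return node.name, model
    # 3. Nothing suitable is loaded; the caller would have to cold-load.
    return None, requested


nodes = [
    Node("mini", {"deepseek-coder-v2:16b"}),
    Node("studio", {"deepseek-r1:32b"}),
]
print(route("deepseek-r1:70b", nodes))
```

Here a request for deepseek-r1:70b lands on the node that already holds deepseek-r1:32b, trading model size for the time a cold load would take.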
Also available on this fleet
Other LLM models
Llama 3.3, Qwen 3.5, Phi 4, Mistral, Gemma 3 — any Ollama model routes through the same endpoint.
Image generation
curl -o image.png http://localhost:11435/api/generate-image \
-H "Content-Type: application/json" \
-d '{"model":"z-image-turbo","prompt":"a sunset","width":1024,"height":1024,"steps":4}'
Speech-to-text
curl http://localhost:11435/api/transcribe -F "[email protected]"
Embeddings
curl http://localhost:11435/api/embeddings -d '{"model":"nomic-embed-text","prompt":"query"}'
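Embeddings are typically compared with cosine similarity. The sketch below requests vectors from the same endpoint and scores a pair of them. It assumes the router returns Ollama's response shape, `{"embedding": [...]}`; `embed` and `cosine_similarity` are illustrative helper names.

```python
import json
import math
import urllib.request


def embed(text: str, model: str = "nomic-embed-text",
          base_url: str = "http://localhost:11435") -> list[float]:
    """Request one embedding vector from the router's /api/embeddings endpoint."""
    req = urllib.request.Request(
        f"{base_url}/api/embeddings",
        data=json.dumps({"model": model, "prompt": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Assumption: the reply matches Ollama's shape, {"embedding": [...]}.
        return json.load(resp)["embedding"]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

With the router running, `cosine_similarity(embed("query"), embed("a related document"))` gives a score that can be used to rank documents against a query.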
Dashboard
http://localhost:11435/dashboard — monitor DeepSeek requests alongside all other models. Per-model latency, token throughput, health checks.
Full documentation
Guardrails
- Model downloads require explicit user confirmation. DeepSeek models range from 1GB (1.5B) to 400GB+ (671B). Always confirm before pulling.
- Model deletion requires explicit user confirmation. Never remove models without asking.
- Never delete or modify files in ~/.fleet-manager/.
- If a DeepSeek model is too large for available memory, suggest a smaller variant (e.g., deepseek-r1:7b instead of :70b).
- No models are downloaded automatically. All pulls are user-initiated or require opt-in via the auto_pull setting.
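The confirm-before-pull guardrail can be expressed as a small wrapper. This is a sketch under stated assumptions: `guarded_pull` is a hypothetical helper, and the `confirm` and `pull` callables stand in for however an agent asks the user and triggers the download, neither of which is specified here.

```python
from typing import Callable


def guarded_pull(model: str, size_gb: float,
                 confirm: Callable[[str], bool],
                 pull: Callable[[str], None]) -> bool:
    """Pull `model` only after explicit confirmation; return True if pulled."""
    question = f"Download {model} (about {size_gb:g} GB)?"
    if not confirm(question):
        return False  # user declined, so nothing is downloaded
    pull(model)
    return True


# Example wiring with stand-ins: always decline, and record pulls in a list.
pulled: list[str] = []
print(guarded_pull("deepseek-r1:70b", 48, lambda q: False, pulled.append))
```

Passing the confirmation step in as a callable keeps the rule testable and means a download cannot start on any code path that skips the question.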
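- Make sure OpenClaw is installed (locally or via Docker).
- Enter the install command in the chat box: /install deepseek-deepseek-coder
- Once installation finishes, invoke the Skill by name or trigger it with /deepseek-deepseek-coder
- Provide the inputs described in the Skill's parameter notes to get structured output.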
What is Deepseek Deepseek Coder?
DeepSeek DeepSeek-Coder: run DeepSeek-V3, DeepSeek-R1, DeepSeek-Coder across your local fleet. 7-signal scoring routes every request to the best device. Cro... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 143 downloads to date.
How do I install Deepseek Deepseek Coder?
Run the command "/install deepseek-deepseek-coder" in the OpenClaw or Claude Code chat box for a one-step install. No additional configuration is needed.
Is Deepseek Deepseek Coder free?
Yes. Deepseek Deepseek Coder is completely free and released under the MIT-0 license, so you can download, install, and use it freely.
Which platforms does Deepseek Deepseek Coder support?
Deepseek Deepseek Coder is cross-platform and works in any environment where OpenClaw / Claude Code is deployed (darwin, linux, windows).
Who develops Deepseek Deepseek Coder?
It is developed and maintained by Twin Geeks (@twinsgeeks). The current version is v1.0.3.