Description

Set up and maintain an NVIDIA DGX Spark (GB10 Blackwell, 128GB unified memory) as a local LLM inference server running vLLM + LiteLLM + OpenClaw. Use when in...

README (SKILL.md)

DGX Spark Setup

Name: Dgx Spark Setup
Author: jimmy-hernandez

Complete setup guide for running Nemotron Super 120B (NVFP4) on a DGX Spark as a private OpenClaw backend with multi-user LiteLLM routing.

Architecture

MacBook (remote) ──Tailscale──► Mac Mini (OpenClaw host, SatPicks worker)
                                      │ LAN SSH
                                      ▼
                               DGX Spark (192.168.1.234)
                               ├── vLLM :8000  (inference)
                               └── LiteLLM :4000 (auth/routing)

Prerequisites

DGX Spark with Ubuntu (user: jhernandez)
Model downloaded to /home/jhernandez/models/nemotron-super-120b-nvfp4
Python 3.12 available (python3 --version)
uv installed (curl -LsSf https://astral.sh/uv/install.sh | sh)

1. vLLM Environment Setup

The DGX Spark uses the GB10 Blackwell chip (sm_121). Stock PyPI packages do NOT support sm_121 — everything must be custom built or sourced from specific index URLs.

mkdir -p ~/vllm-install
cd ~/vllm-install
uv venv .vllm --python 3.12 --seed   # --seed installs pip into the venv; later steps call pip directly
source .vllm/bin/activate
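Before building anything for sm_121, it can help to confirm what the driver actually reports. The sketch below is a minimal check, assuming the installed `nvidia-smi` supports the `compute_cap` query field; the helper names `cap_to_sm` and `detect_sm` are hypothetical and not part of the original guide.

```shell
# Hypothetical helper: convert a compute capability such as "12.1"
# into the matching architecture name ("sm_121").
cap_to_sm() {
  printf 'sm_%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

# Print the architecture of the first GPU, or fail if nothing is reported.
# Assumes the installed driver supports the compute_cap query field.
detect_sm() {
  local cap
  cap="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1)"
  [ -n "$cap" ] || return 1
  cap_to_sm "$cap"
}
```

On the DGX Spark, running `detect_sm` would be expected to print `sm_121` if the claim above holds for your driver; any other value suggests the build flags in this guide need revisiting.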

Install PyTorch (CUDA 13.0)

Must use uv pip install with the cu130 index — regular pip may resolve the wrong CUDA variant:

uv pip install torch torchvision torchaudio \
  --index-url https://download.pytorch.org/whl/cu130

Verify: python3 -c "import torch; print(torch.__version__)" → should show 2.11.0+cu130

Build Custom Triton (sm_121 support)

Stock Triton does not support sm_121. Must build from this exact commit:

cd ~/vllm-install
git clone https://github.com/triton-lang/triton.git
cd triton
git checkout 4caa0328bf8df64896dd5f6fb9df41b0eb2e750a
pip install ninja cmake wheel
pip install -e python/

Verify: python3 -c "import triton; print(triton.__version__)" → should show 3.5.0+git4caa0328

Install flashinfer

Versions must match exactly — mismatched cubin/flashinfer causes silent failures:

pip install flashinfer-python
pip install flashinfer-cubin  # cubin package; must match flashinfer-python version
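Because a version mismatch is described as failing silently, a quick comparison after installing may be worthwhile. This is a sketch only: `versions_match`, `dist_version`, and `check_pair` are hypothetical helper names, and the check reads installed versions through Python's standard `importlib.metadata`.

```shell
# Hypothetical helper: succeed only when both version strings are non-empty and identical.
versions_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Print the installed version of a distribution, or nothing if it is not installed.
dist_version() {
  python3 -c 'import sys
from importlib.metadata import version, PackageNotFoundError
try:
    print(version(sys.argv[1]))
except PackageNotFoundError:
    pass' "$1"
}

# Compare two installed distributions and report the result.
check_pair() {
  local a b
  a="$(dist_version "$1")"
  b="$(dist_version "$2")"
  if versions_match "$a" "$b"; then
    echo "OK: $1 and $2 are both at $a"
  else
    echo "MISMATCH: $1=${a:-missing} $2=${b:-missing}" >&2
    return 1
  fi
}
```

With the venv active, call `check_pair` with the two distribution names you installed in this step; a non-zero exit status means the versions differ or one package is missing.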

Install vLLM from Source

cd ~/vllm-install
git clone https://github.com/vllm-project/vllm.git
cd vllm
git checkout 66a168a197ba214a5b70a74fa2e713c9eeb3251a
pip install -e . --no-build-isolation

2. Running vLLM

Always launch inside the tmux session so it survives SSH disconnects:

tmux new-session -s nemotron   # or: tmux attach -t nemotron

export PATH=$HOME/.local/bin:$PATH
source ~/vllm-install/.vllm/bin/activate

TORCH_CUDA_ARCH_LIST=12.1a \
VLLM_USE_FLASHINFER_MXFP4_MOE=1 \
TRITON_PTXAS_PATH=/usr/local/cuda/bin/ptxas \
  python -m vllm.entrypoints.openai.api_server \
  --model /home/jhernandez/models/nemotron-super-120b-nvfp4 \
  --trust-remote-code --max-model-len 8192 \
  --gpu-memory-utilization 0.85 --port 8000

Startup takes ~8 minutes (loading 17 safetensors shards). The server is ready when the log shows Application startup complete.

Note: nvidia-smi shows N/A for memory on the GB10 (unified memory architecture) — this is normal, not a bug.
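Rather than watching the log for the startup message, the wait can be scripted. The sketch below polls an HTTP endpoint until it answers successfully; it assumes vLLM's OpenAI-compatible server exposes `/health` on the port used above, and `wait_for_http` is a hypothetical helper name.

```shell
# Poll a URL until it returns a successful HTTP status or the deadline passes.
# Arguments: URL, total timeout in seconds (default 900), delay between attempts (default 10).
wait_for_http() {
  local url="$1" timeout="${2:-900}" delay="${3:-10}" waited=0
  while ! curl -fsS -o /dev/null --max-time 5 "$url" 2>/dev/null; do
    waited=$((waited + delay))
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for $url" >&2
      return 1
    fi
    sleep "$delay"
  done
  echo "ready: $url"
}
```

For example, `wait_for_http http://localhost:8000/health 900 10 && curl -s http://localhost:8000/v1/models` waits up to 15 minutes and then lists the model name the server is advertising.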

3. LiteLLM Setup

LiteLLM proxies vLLM and handles per-user auth and rate limiting.

Install

pip install litellm

Config (`~/litellm-config.yaml`)

See references/litellm-config-template.yaml for a full config with virtual keys and rate limits.
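The referenced template is not reproduced here. As a rough illustration of the shape such a config might take, the sketch below writes a minimal file that maps the `nemotron-super` alias to the local vLLM server. Everything inside the heredoc is an assumption rather than the author's template: the `hosted_vllm/` provider prefix, the served model name (assumed to default to the `--model` path), and the `LITELLM_MASTER_KEY` variable name. Per-user virtual keys and rate limits are left to the real template.

```shell
# Write a minimal LiteLLM proxy config to the given path.
# The heredoc content is illustrative only, not the referenced template.
write_litellm_config() {
  local dest="${1:-./litellm-config.example.yaml}"
  cat > "$dest" << 'EOF'
model_list:
  - model_name: nemotron-super            # alias that clients request
    litellm_params:
      # Assumption: the text after hosted_vllm/ must equal the model name vLLM serves.
      model: hosted_vllm//home/jhernandez/models/nemotron-super-120b-nvfp4
      api_base: http://localhost:8000/v1

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY   # read from the environment, not hard-coded
EOF
}
```

Writing to an example path first, as in `write_litellm_config ./litellm-config.example.yaml`, keeps the sketch from overwriting a working `~/litellm-config.yaml`. If the master key is read from the environment like this, the variable also has to be set in the environment of whatever process starts LiteLLM.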

Run as systemd service

mkdir -p ~/.config/systemd/user
cat > ~/.config/systemd/user/litellm.service << 'EOF'
[Unit]
Description=LiteLLM Proxy
After=network.target

[Service]
ExecStart=/home/jhernandez/.local/bin/litellm --config /home/jhernandez/litellm-config.yaml --port 4000
Restart=on-failure
RestartSec=5
StandardOutput=append:/home/jhernandez/litellm.log
StandardError=append:/home/jhernandez/litellm.log
Environment=PATH=/home/jhernandez/.local/bin:/usr/local/bin:/usr/bin:/bin

[Install]
WantedBy=default.target
EOF

systemctl --user daemon-reload
systemctl --user enable litellm
systemctl --user start litellm

Verify: curl http://localhost:4000/health/liveliness → "I'm alive!"
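One thing to check with a systemd user service: by default, user services start at login and stop when the user's last session ends, unless lingering is enabled for the account. The sketch below checks and enables it; `parse_linger` and `ensure_linger` are hypothetical helper names, and `loginctl enable-linger` may prompt for authentication depending on local policy.

```shell
# Hypothetical helper: extract the value from a "Linger=yes" style line.
parse_linger() {
  printf '%s\n' "${1#Linger=}"
}

# Enable lingering for the current user if it is not already on.
ensure_linger() {
  local user state
  user="$(id -un)"
  state="$(parse_linger "$(loginctl show-user "$user" --property=Linger)")"
  if [ "$state" != "yes" ]; then
    loginctl enable-linger "$user"
  fi
}
```

Running `ensure_linger` once on the DGX Spark should be enough for the LiteLLM service to come up at boot and keep running after SSH sessions close.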

4. Tailscale

curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
# Visit the auth URL shown, then approve in Tailscale admin
tailscale ip -4  # note this IP for OpenClaw client configs

5. OpenClaw Client Config

Point any OpenClaw instance at LiteLLM:

model:
  provider: openai-compatible
  baseUrl: http://<dgx-tailscale-ip>:4000/v1
  apiKey: <virtual-key>
  model: nemotron-super

Troubleshooting

See references/troubleshooting.md for common failure modes and fixes.

Usage Guidance

This guide appears to legitimately describe how to prepare a DGX Spark for vLLM + LiteLLM, but it expects you to run privileged install and build steps on the target machine. Before proceeding: (1) verify that the git commit hashes and PyTorch index URLs are correct and come from trusted repos; (2) inspect the remote install scripts (astral.sh and the Tailscale install.sh) before piping them to sh; (3) make sure you have backups, and perform builds in an isolated environment or VM if possible; (4) update the example username and paths to match your system; (5) generate strong master and virtual keys locally, and do not reuse example placeholders; and (6) confirm firewall and Tailscale access and user authorization policies, because Tailscale opens remote access and requires careful admin approval.
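Two of these recommendations are easy to script. The sketch below downloads an installer to a local file for review instead of piping it into sh, and generates a random key locally. Both helper names are hypothetical, and the `sk-` prefix is an assumption based on the key format LiteLLM's examples use.

```shell
# Download an installer to a local file instead of piping it into sh,
# then print its SHA-256 so the reviewed copy can be identified later.
fetch_for_review() {
  local url="$1" dest="$2"
  curl -fsSL "$url" -o "$dest" || return 1
  sha256sum "$dest"
}

# Generate a random key: "sk-" followed by 64 hex characters (256 bits).
# The "sk-" prefix is an assumed convention, not something this guide specifies.
gen_key() {
  printf 'sk-%s\n' "$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')"
}
```

A reviewable install then becomes three steps: `fetch_for_review https://tailscale.com/install.sh ./tailscale-install.sh`, read the file, and only then run `sh ./tailscale-install.sh`. The same pattern applies to the uv installer.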

Capability Analysis

Type: OpenClaw Skill
Name: dgx-spark-setup
Version: 1.0.0

The skill bundle provides a highly technical setup guide for vLLM on NVIDIA DGX Spark hardware. It is classified as suspicious due to several high-risk behaviors: the use of 'curl | sh' for installing third-party binaries (uv and Tailscale), the requirement for sudo privileges, and the use of the '--trust-remote-code' flag in vLLM which allows for arbitrary code execution from model repositories. While these actions are plausibly necessary for the stated purpose of configuring specialized Blackwell-based AI infrastructure, they represent significant security risks in an automated agent context (SKILL.md, references/troubleshooting.md).

Capability Assessment

⚠ Purpose & Capability

The SKILL.md describes precisely the DGX Spark setup and the artifacts (vLLM, LiteLLM, Triton build, flashinfer, Tailscale) are appropriate for that purpose. However the registry metadata claims no required binaries or env vars while the instructions clearly require python3.12, tmux, curl, git, sudo/systemctl, a CUDA toolchain (ptxas), and the 'uv' helper — this metadata omission is an inconsistency that could mislead automated gating or users.

ℹ Instruction Scope

Instructions stay within the stated scope (building vLLM and required components, configuring LiteLLM, and enabling remote access via Tailscale). They do direct the operator to modify systemd user services, run 'sudo tailscale up', and install packages and builds on the host; these are expected for a host setup but are high‑privilege operations and should only be performed on machines you control. The SKILL.md also contains a hard-coded example username ('jhernandez') and fixed filesystem paths which may not match the target environment.

⚠ Install Mechanism

This is an instruction-only skill (no install spec), but the runtime steps tell the user to run external installers and build from source. Notably it instructs two 'curl | sh' installs (astral.sh/uv and tailscale.com/install.sh), pip installs from PyTorch's cu130 index, and git clones/builds of Triton and vLLM pinned to commits. These are plausible for supporting GB10/sm_121 GPUs, but downloading and executing remote install scripts without verification, and building and installing unaudited Python packages on the system, carry elevated risk and should be audited before execution.

✓ Credentials

The skill does not request secrets or unrelated environment variables. The LiteLLM config template includes placeholders for a master key and per-user virtual keys (expected for a proxy service) and instructs generating them locally. No unrelated credentials or external tokens are requested by the skill.

ℹ Persistence & Privilege

The instructions create a per-user systemd service for LiteLLM and require running tailscale which configures system networking and may require sudo. The skill does not request always:true or try to alter other skills or global agent configuration, but it does instruct persistent system changes on the host which have lasting effect and should be applied consciously.

Version History

v1.0.0

- Initial release of DGX Spark setup skill.
- Provides a step-by-step guide for configuring an NVIDIA DGX Spark (GB10 Blackwell, 128GB unified memory) as a local LLM inference server using vLLM, LiteLLM, and OpenClaw.
- Covers requirements and instructions for installing PyTorch from the cu130 index and building custom Triton with sm_121 (Blackwell) GPU compatibility.
- Includes guidance on setting up LiteLLM with per-user virtual keys, running as a systemd user service, and configuring Tailscale for secure remote access.
- Offers troubleshooting information and pointers for resolving compatibility issues (torch, Triton, flashinfer).

Metadata

Slug dgx-spark-setup

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Dgx Spark Setup?

Set up and maintain an NVIDIA DGX Spark (GB10 Blackwell, 128GB unified memory) as a local LLM inference server running vLLM + LiteLLM + OpenClaw. Use when in... It is an AI Agent Skill for Claude Code / OpenClaw, with 113 downloads so far.

How do I install Dgx Spark Setup?

Run "/install dgx-spark-setup" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Dgx Spark Setup free?

Yes, Dgx Spark Setup is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Dgx Spark Setup support?

Dgx Spark Setup is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Dgx Spark Setup?

It is built and maintained by Jimmy Hernandez (@jimmy-hernandez); the current version is v1.0.0.
