功能描述

Set up and maintain an NVIDIA DGX Spark (GB10 Blackwell, 128GB unified memory) as a local LLM inference server running vLLM + LiteLLM + OpenClaw. Use when in...

使用说明 (SKILL.md)

DGX Spark Setup

Name: Dgx Spark Setup
Author: jimmy-hernandez

Complete setup guide for running Nemotron Super 120B (NVFP4) on a DGX Spark as a private OpenClaw backend with multi-user LiteLLM routing.

Architecture

MacBook (remote) ──Tailscale──► Mac Mini (OpenClaw host, SatPicks worker)
                                      │ LAN SSH
                                      ▼
                               DGX Spark (192.168.1.234)
                               ├── vLLM :8000  (inference)
                               └── LiteLLM :4000 (auth/routing)

Prerequisites

DGX Spark with Ubuntu (user: jhernandez)
Model downloaded to /home/jhernandez/models/nemotron-super-120b-nvfp4
Python 3.12 available (python3 --version)
uv installed (curl -LsSf https://astral.sh/uv/install.sh | sh)

1. vLLM Environment Setup

The DGX Spark uses the GB10 Blackwell chip (sm_121). Stock PyPI packages do NOT support sm_121 — everything must be custom built or sourced from specific index URLs.

mkdir -p ~/vllm-install
cd ~/vllm-install
uv venv .vllm --python 3.12
source .vllm/bin/activate

Install PyTorch (CUDA 13.0)

Must use uv pip install with the cu130 index — regular pip may resolve the wrong CUDA variant:

uv pip install torch torchvision torchaudio \
  --index-url https://download.pytorch.org/whl/cu130

Verify: python3 -c "import torch; print(torch.__version__)" → should show 2.11.0+cu130

Build Custom Triton (sm_121 support)

Stock Triton does not support sm_121. Must build from this exact commit:

cd ~/vllm-install
git clone https://github.com/triton-lang/triton.git
cd triton
git checkout 4caa0328bf8df64896dd5f6fb9df41b0eb2e750a
pip install ninja cmake wheel
pip install -e python/

Verify: python3 -c "import triton; print(triton.__version__)" → should show 3.5.0+git4caa0328

Install flashinfer

Versions must match exactly — mismatched cubin/flashinfer causes silent failures:

pip install flashinfer-python
pip install flashinfer  # cubin package — must match flashinfer-python version

Install vLLM from Source

cd ~/vllm-install
git clone https://github.com/vllm-project/vllm.git
cd vllm
git checkout 66a168a197ba214a5b70a74fa2e713c9eeb3251a
pip install -e . --no-build-isolation

2. Running vLLM

Always launch inside the tmux session so it survives SSH disconnects:

tmux new-session -s nemotron   # or: tmux attach -t nemotron

export PATH=$HOME/.local/bin:$PATH
source ~/vllm-install/.vllm/bin/activate

TORCH_CUDA_ARCH_LIST=12.1a \
VLLM_USE_FLASHINFER_MXFP4_MOE=1 \
TRITON_PTXAS_PATH=/usr/local/cuda/bin/ptxas \
  python -m vllm.entrypoints.openai.api_server \
  --model /home/jhernandez/models/nemotron-super-120b-nvfp4 \
  --trust-remote-code --max-model-len 8192 \
  --gpu-memory-utilization 0.85 --port 8000

Startup takes ~8 minutes (loading 17 safetensor shards). Ready when log shows Application startup complete.

Note: nvidia-smi shows N/A for memory on the GB10 (unified memory architecture) — this is normal, not a bug.

3. LiteLLM Setup

LiteLLM proxies vLLM and handles per-user auth and rate limiting.

Install

pip install litellm

Config (`~/litellm-config.yaml`)

See references/litellm-config-template.yaml for a full config with virtual keys and rate limits.

Run as systemd service

mkdir -p ~/.config/systemd/user
cat > ~/.config/systemd/user/litellm.service \x3C\x3C 'EOF'
[Unit]
Description=LiteLLM Proxy
After=network.target

[Service]
ExecStart=/home/jhernandez/.local/bin/litellm --config /home/jhernandez/litellm-config.yaml --port 4000
Restart=on-failure
RestartSec=5
StandardOutput=append:/home/jhernandez/litellm.log
StandardError=append:/home/jhernandez/litellm.log
Environment=PATH=/home/jhernandez/.local/bin:/usr/local/bin:/usr/bin:/bin

[Install]
WantedBy=default.target
EOF

systemctl --user daemon-reload
systemctl --user enable litellm
systemctl --user start litellm

Verify: curl http://localhost:4000/health/liveliness → "I'm alive!"

4. Tailscale

curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
# Visit the auth URL shown, then approve in Tailscale admin
tailscale ip -4  # note this IP for OpenClaw client configs

5. OpenClaw Client Config

Point any OpenClaw instance at LiteLLM:

model:
  provider: openai-compatible
  baseUrl: http://\x3Cdgx-tailscale-ip>:4000/v1
  apiKey: \x3Cvirtual-key>
  model: nemotron-super

Troubleshooting

See references/troubleshooting.md for common failure modes and fixes.

安全使用建议

This guide appears to legitimately describe how to prepare a DGX Spark for vLLM + LiteLLM, but it expects you to run privileged install/build steps on the target machine. Before proceeding: (1) verify the git commit hashes and PyTorch index URLs are correct and come from trusted repos, (2) inspect any remote install scripts (astral.sh and tailscale install.sh) before piping them to sh, (3) ensure you have backups and perform builds in an isolated environment or VM if possible, (4) update the example username/paths to match your system, (5) generate strong master/virtual keys locally (do not reuse example placeholders), and (6) confirm firewall/Tailscale access and user authorization policies — Tailscale opens remote access and requires careful admin approval. If you want, I can extract a checklist of the exact commands to review, or help you rewrite risky curl | sh steps into manual, reviewable steps.

功能分析

Type: OpenClaw Skill Name: dgx-spark-setup Version: 1.0.0 The skill bundle provides a highly technical setup guide for vLLM on NVIDIA DGX Spark hardware. It is classified as suspicious due to several high-risk behaviors: the use of 'curl | sh' for installing third-party binaries (uv and Tailscale), the requirement for sudo privileges, and the use of the '--trust-remote-code' flag in vLLM which allows for arbitrary code execution from model repositories. While these actions are plausibly necessary for the stated purpose of configuring specialized Blackwell-based AI infrastructure, they represent significant security risks in an automated agent context (SKILL.md, references/troubleshooting.md).

能力评估

⚠ Purpose & Capability

The SKILL.md describes precisely the DGX Spark setup and the artifacts (vLLM, LiteLLM, Triton build, flashinfer, Tailscale) are appropriate for that purpose. However the registry metadata claims no required binaries or env vars while the instructions clearly require python3.12, tmux, curl, git, sudo/systemctl, a CUDA toolchain (ptxas), and the 'uv' helper — this metadata omission is an inconsistency that could mislead automated gating or users.

ℹ Instruction Scope

Instructions stay within the stated scope (building vLLM and required components, configuring LiteLLM, and enabling remote access via Tailscale). They do direct the operator to modify systemd user services, run 'sudo tailscale up', and install packages and builds on the host; these are expected for a host setup but are high‑privilege operations and should only be performed on machines you control. The SKILL.md also contains a hard-coded example username ('jhernandez') and fixed filesystem paths which may not match the target environment.

⚠ Install Mechanism

This is an instruction-only skill (no install spec), but the runtime steps tell the user to run external installers and build from source. Notably it instructs two 'curl | sh' installs (astral.sh/uv and tailscale.com/install.sh), pip installs from PyTorch's cu130 index, and git clones/builds of Triton and vLLM pinned to commits. These are plausible for supporting GB10/sm_121 GPUs, but downloading and executing remote install scripts without verification and building/ piping arbitrary Python packages to the system carry elevated risk and should be audited before execution.

✓ Credentials

The skill does not request secrets or unrelated environment variables. The Litellm config template includes placeholders for a master key and per-user virtual keys (expected for a proxy service) and instructs generating them locally. No unrelated credentials or external tokens are requested by the skill.

ℹ Persistence & Privilege

The instructions create a per-user systemd service for LiteLLM and require running tailscale which configures system networking and may require sudo. The skill does not request always:true or try to alter other skills or global agent configuration, but it does instruct persistent system changes on the host which have lasting effect and should be applied consciously.

版本历史

v1.0.0

- Initial release of DGX Spark setup skill. - Provides a step-by-step guide for configuring an NVIDIA DGX Spark (GB10 Blackwell, 128GB unified memory) as a local LLM inference server using vLLM, LiteLLM, and OpenClaw. - Covers requirements and instructions for building custom PyTorch and Triton with sm_121 (Blackwell) GPU compatibility. - Includes guidance on setting up LiteLLM with per-user virtual keys, running as a systemd user service, and configuring Tailscale for secure remote access. - Offers troubleshooting information and pointers for resolving compatibility issues (torch, Triton, flashinfer).

元数据

Slug dgx-spark-setup

版本 1.0.0

许可证 MIT-0

累计安装 0

当前安装数 0

历史版本数 1

常见问题

Dgx Spark Setup 是什么？

Set up and maintain an NVIDIA DGX Spark (GB10 Blackwell, 128GB unified memory) as a local LLM inference server running vLLM + LiteLLM + OpenClaw. Use when in... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件，目前累计下载 113 次。

如何安装 Dgx Spark Setup？

在 OpenClaw 或 Claude Code 对话框中运行命令「/install dgx-spark-setup」即可一键安装，无需额外配置。

Dgx Spark Setup 是免费的吗？

是的，Dgx Spark Setup 完全免费，采用 MIT-0 许可证，可自由下载、安装和使用。

Dgx Spark Setup 支持哪些平台？

Dgx Spark Setup 跨平台运行，可在任意部署了 OpenClaw / Claude Code 的环境中使用（cross-platform）。

谁开发了 Dgx Spark Setup？

由 Jimmy Hernandez（@jimmy-hernandez）开发并维护，当前版本 v1.0.0。

Dgx Spark Setup