wbavon

Gpu Container Setup Flagos

Author: Flagos · GitHub · v1.0.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 74 · Favorites: 0 · Current installs: 0 · Versions: 1
Install in OpenClaw
/install gpu-container-setup-flagos
Description
Automatically detect GPU vendor, find appropriate PyTorch container image, launch with correct mounts, and validate GPU functionality. Supports NVIDIA, Ascen...
Usage (SKILL.md)

GPU Container Setup Skill

This skill automates multi-vendor GPU container setup for PyTorch workloads.

Supported GPU Vendors

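| Vendor   | PyTorch Backend | Detection                   |
|----------|-----------------|-----------------------------|
| NVIDIA   | CUDA            | nvidia-smi                  |
| AMD      | ROCm (HIP)      | rocm-smi, /opt/rocm         |
| Ascend   | torch_npu       | npu-smi, /usr/local/Ascend  |
| Metax    | torch_musa      | mx-smi, /opt/metax          |
| Iluvatar | torch_corex     | ixsmi, /opt/iluvatar        |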

Execution Flow

When invoked, follow these steps:

Step 1: Parse Arguments

Check if user provided:

  • --vendor <name> - Force specific vendor (skip detection)
  • --image <image> - Force specific container image
  • --data <path> - Force specific data mount path
  • --name <name> - Container name (default: pytorch-gpu)
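
As a minimal sketch, the four flags above could be parsed with Python's standard argparse module. The lowercase vendor names are an assumption (only "ascend" appears in this document's sample output), and `parse_args` is a hypothetical helper, not part of the skill's scripts.

```python
import argparse

# Assumed vendor spellings; only "ascend" is confirmed by the sample output.
VENDORS = ["nvidia", "amd", "ascend", "metax", "iluvatar"]

def parse_args(argv):
    """Parse the four optional flags; flags that are not given stay None."""
    parser = argparse.ArgumentParser(prog="gpu-container-setup")
    parser.add_argument("--vendor", choices=VENDORS,
                        help="force a vendor and skip detection")
    parser.add_argument("--image", help="force a container image")
    parser.add_argument("--data", help="force a data mount path")
    parser.add_argument("--name", default="pytorch-gpu",
                        help="container name")
    return parser.parse_args(argv)

args = parse_args(["--vendor", "ascend", "--data", "/mnt/data"])
print(args.vendor, args.image, args.data, args.name)
# prints: ascend None /mnt/data pytorch-gpu
```

A value of None for `vendor`, `image`, or `data` signals that the corresponding detection step still has to run.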

Step 2: Detect GPU Vendor

Run the detection script:

python3 .claude/skills/gpu-container-setup/scripts/detect_gpu.py

Expected output:

{"vendor": "ascend", "devices": ["Ascend 910B"], "count": 8}

If detection fails and no --vendor flag was provided, ask the user which vendor to use.
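
The contents of detect_gpu.py are not shown here. The sketch below illustrates one way vendor detection could work, using only the CLI tools and install paths listed under Supported GPU Vendors. The probe order and the `detect_vendor` name are assumptions.

```python
import os
import shutil

# (vendor, CLI tool, install path) from the Supported GPU Vendors table.
# No install path is listed for NVIDIA, so only its CLI is probed.
PROBES = [
    ("nvidia", "nvidia-smi", None),
    ("amd", "rocm-smi", "/opt/rocm"),
    ("ascend", "npu-smi", "/usr/local/Ascend"),
    ("metax", "mx-smi", "/opt/metax"),
    ("iluvatar", "ixsmi", "/opt/iluvatar"),
]

def detect_vendor(which=shutil.which, exists=os.path.isdir):
    """Return the first vendor whose CLI is on PATH or whose install
    directory exists; return None when nothing matches."""
    for vendor, tool, path in PROBES:
        if which(tool) or (path and exists(path)):
            return vendor
    return None

print(detect_vendor())
```

The `which` and `exists` parameters are injectable so the logic can be exercised on a machine without any GPU tooling. A None result corresponds to the "ask the user" branch.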

Step 3: Find Data Disk

Run the data disk detection:

python3 .claude/skills/gpu-container-setup/scripts/find_data_disk.py

Expected output:

{"data_disk": "/mnt/data", "found": true, "size": "2.0T", "available": "1.5T"}

If no suitable disk is found, ask the user for a data mount path.
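
How find_data_disk.py picks a disk is not documented in this section. As a rough sketch, one plausible rule is to choose the candidate directory with the most free space and report it in the same JSON shape as above. The selection rule, the candidate list, and the helper names are all assumptions.

```python
import os
import shutil

def human(n_bytes):
    """Format a byte count such as '1.5T' using binary units, capped at T."""
    value = float(n_bytes)
    for unit in ["B", "K", "M", "G", "T"]:
        if value < 1024 or unit == "T":
            return f"{value:.1f}{unit}"
        value /= 1024

def find_data_disk(candidates, usage=shutil.disk_usage, exists=os.path.isdir):
    """Pick the existing candidate path with the most free space."""
    best = None
    for path in candidates:
        if not exists(path):
            continue
        stats = usage(path)
        if best is None or stats.free > best[1].free:
            best = (path, stats)
    if best is None:
        return {"data_disk": None, "found": False}
    path, stats = best
    return {"data_disk": path, "found": True,
            "size": human(stats.total), "available": human(stats.free)}

# Hypothetical candidate mount points, for illustration only.
print(find_data_disk(["/mnt/data", "/data"]))
```

A result with `"found": False` maps to the fallback of asking the user for a path.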

Step 4: Find Container Image

Follow strict priority order (only proceed to next if current fails):

1. Primary Vendor Hub (hardcoded) → 2. BAAI Harbor → 3. Web Search → 4. Local Images → 5. Ask User

Step 4.1: Primary Vendor Hub (hardcoded URLs)

| Vendor   | Registry                  | API/Query                                                           |
|----------|---------------------------|---------------------------------------------------------------------|
| NVIDIA   | nvcr.io                   | https://api.ngc.nvidia.com/v2/repos/nvidia/pytorch/tags             |
| Ascend   | ascendhub.huawei.com      | Portal: https://ascendhub.huawei.com                                |
| Metax    | registry.metax-tech.com   | https://registry.metax-tech.com/v2/pytorch/metax-pytorch/tags/list  |
| Iluvatar | hub.iluvatar.com          | https://hub.iluvatar.com/v2/pytorch/iluvatar-pytorch/tags/list      |
| AMD      | docker.io (rocm/pytorch)  | https://hub.docker.com/v2/repositories/rocm/pytorch/tags            |

# Example: Query NGC for latest NVIDIA PyTorch
TAG=$(curl -s "https://api.ngc.nvidia.com/v2/repos/nvidia/pytorch/tags" | jq -r '.tags[].name' | grep -E '^[0-9]{2}\.[0-9]{2}-py3$' | sort -rV | head -1)
IMAGE="nvcr.io/nvidia/pytorch:${TAG}"
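
The tag filter in the pipeline above (keep names shaped like YY.MM-py3, then take the newest) can also be expressed as a small pure function. This is a sketch of the same selection logic only. It takes a list of tag names as input and makes no claim about the shape of the NGC API response. `latest_tag` is a hypothetical name.

```python
import re

# Same pattern as the grep above: two digits, dot, two digits, "-py3".
TAG_RE = re.compile(r"^(\d{2})\.(\d{2})-py3$")

def latest_tag(names):
    """Return the newest YY.MM-py3 tag, or None when nothing matches."""
    matches = []
    for name in names:
        m = TAG_RE.match(name)
        if m:
            # Compare numerically by (year, month), like `sort -rV`.
            matches.append((int(m.group(1)), int(m.group(2)), name))
    return max(matches)[2] if matches else None

print(latest_tag(["24.01-py3", "23.12-py3", "24.01-py3-igpu", "latest"]))
# prints: 24.01-py3
```

Returning None when no tag matches gives the caller a clear signal to fall through to the next image source.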

Step 4.2: BAAI Harbor (fallback)

Only if Step 4.1 fails (unreachable, no image, pull fails).

# Query BAAI Harbor
curl -s "https://harbor.baai.ac.cn/api/v2.0/projects/flagrelease-public/repositories?page_size=100" | jq -r '.[].name' | grep "flagrelease-<vendor>"

Step 4.3: Web Search (fallback)

Only if Steps 4.1 and 4.2 fail. Search for "<vendor> pytorch docker official".

Step 4.4: Local Images (fallback)

Only if Steps 4.1-4.3 fail. Check docker images | grep pytorch.

Test Before Use

docker pull "${IMAGE}" && docker run --rm "${IMAGE}" python -c "import torch; print(torch.__version__)"

If test fails, try next source. If all fail, ask user for image.
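
The fallback rule (try each source in priority order and keep the first image that passes the test) might be sketched as follows. The finder callables and `passes_test` are hypothetical stand-ins for the registry queries and the docker pull/run check described above.

```python
def pick_image(sources, passes_test):
    """Try each (label, finder) pair in priority order and return
    (label, image) for the first image that passes the test."""
    for label, finder in sources:
        try:
            image = finder()
        except Exception:
            image = None  # e.g. unreachable registry or unparsable reply
        if image and passes_test(image):
            return label, image
    return None  # every source failed: ask the user for an image

def unreachable_hub():
    # Stand-in for a failed vendor-hub query.
    raise ConnectionError("vendor hub unreachable")

# Illustrative stand-ins; real finders would query the registries.
sources = [
    ("vendor-hub", unreachable_hub),
    ("baai-harbor", lambda: "harbor.baai.ac.cn/flagrelease-public/ngctorch:2601"),
    ("local", lambda: None),
]
print(pick_image(sources, passes_test=lambda image: True))
# prints: ('baai-harbor', 'harbor.baai.ac.cn/flagrelease-public/ngctorch:2601')
```

Keeping the label alongside the image lets later steps tell which source succeeded, for example to decide whether the Step 4.5 update applies.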

Step 4.5: Update Skill (self-improvement)

IMPORTANT: If image found via Web Search (Step 4.3) passes all tests, update references/image-sources.md to add the newly discovered vendor hub as a primary source. This makes future lookups faster.

# After successful web search discovery:
# 1. Verify image works (pull + pytorch test + GPU test)
# 2. Extract registry URL pattern
# 3. Update references/image-sources.md Step 1 section with new vendor hub

Step 5: Build Docker Command

Refer to references/mount-requirements.md for vendor-specific requirements.

NVIDIA:

docker run -d --gpus all \
  --name pytorch-gpu \
  --shm-size=16g \
  -v <data_disk>:/data \
  <image> sleep infinity

AMD/ROCm:

docker run -d \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video --group-add render \
  --name pytorch-gpu \
  --shm-size=16g \
  -v <data_disk>:/data \
  <image> sleep infinity

Ascend:

docker run -d \
  --device=/dev/davinci0 --device=/dev/davinci1 ... \
  --device=/dev/davinci_manager \
  --device=/dev/devmm_svm \
  --device=/dev/hisi_hdc \
  -v /usr/local/Ascend:/usr/local/Ascend:ro \
  -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi:ro \
  --name pytorch-gpu \
  --shm-size=16g \
  -v <data_disk>:/data \
  <image> sleep infinity

Metax:

docker run -d \
  --device=/dev/mx0 --device=/dev/mx1 ... \
  -v /opt/metax:/opt/metax:ro \
  --name pytorch-gpu \
  --shm-size=16g \
  -v <data_disk>:/data \
  <image> sleep infinity

Iluvatar:

docker run -d \
  --device=/dev/bi0 --device=/dev/bi1 ... \
  -v /opt/iluvatar:/opt/iluvatar:ro \
  --name pytorch-gpu \
  --shm-size=16g \
  -v <data_disk>:/data \
  <image> sleep infinity
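
The five commands above share a common tail and differ only in their device and mount flags, so they could be assembled from a per-vendor table. This is a sketch under one stated assumption: device nodes are numbered 0..N-1, extrapolated from the `davinci0`, `mx0`, and `bi0` examples. `build_docker_cmd` is a hypothetical helper, and references/mount-requirements.md remains the authority.

```python
def build_docker_cmd(vendor, image, data_disk, name="pytorch-gpu", device_count=1):
    """Return the docker run argument list for one vendor."""
    idx = range(device_count)  # assumption: nodes are numbered 0..N-1
    vendor_flags = {
        "nvidia": ["--gpus", "all"],
        "amd": ["--device=/dev/kfd", "--device=/dev/dri",
                "--group-add", "video", "--group-add", "render"],
        "ascend": [f"--device=/dev/davinci{i}" for i in idx] + [
            "--device=/dev/davinci_manager",
            "--device=/dev/devmm_svm",
            "--device=/dev/hisi_hdc",
            "-v", "/usr/local/Ascend:/usr/local/Ascend:ro",
            "-v", "/usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi:ro"],
        "metax": [f"--device=/dev/mx{i}" for i in idx]
                 + ["-v", "/opt/metax:/opt/metax:ro"],
        "iluvatar": [f"--device=/dev/bi{i}" for i in idx]
                    + ["-v", "/opt/iluvatar:/opt/iluvatar:ro"],
    }
    if vendor not in vendor_flags:
        raise ValueError(f"unsupported vendor: {vendor}")
    return (["docker", "run", "-d"] + vendor_flags[vendor]
            + ["--name", name, "--shm-size=16g",
               "-v", f"{data_disk}:/data", image, "sleep", "infinity"])

print(" ".join(build_docker_cmd("nvidia", "nvcr.io/nvidia/pytorch:24.01-py3", "/mnt/data")))
```

Returning a list rather than a single string lets the command be passed directly to `subprocess.run` without shell quoting concerns.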

Step 6: Start Container

Execute the docker run command. If a container with the same name already exists:

  1. If it is running, offer to use the existing container or replace it
  2. If it is stopped, offer to restart or replace it
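
One way to distinguish the cases above is to query `docker container inspect`, which exits non-zero when no container has that name and otherwise can print the `.State.Running` flag. The sketch below assumes docker is on PATH. The `container_state` helper and the option labels are hypothetical.

```python
import subprocess

# Choices to offer the user for each state, following the list above.
OPTIONS = {
    "missing": ["start"],
    "running": ["use existing", "replace"],
    "stopped": ["restart", "replace"],
}

def container_state(name, run=subprocess.run):
    """Classify a container as 'missing', 'running', or 'stopped'."""
    proc = run(
        ["docker", "container", "inspect", "--format", "{{.State.Running}}", name],
        capture_output=True, text=True,
    )
    if proc.returncode != 0:
        return "missing"  # inspect fails when the container does not exist
    return "running" if proc.stdout.strip() == "true" else "stopped"
```

A caller would then present `OPTIONS[container_state("pytorch-gpu")]` to the user before running any `docker rm` or `docker start`.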

Step 7: Validate PyTorch GPU

Copy the validation script into the container and run it:

docker cp .claude/skills/gpu-container-setup/scripts/validate_pytorch.py pytorch-gpu:/tmp/
docker exec pytorch-gpu python3 /tmp/validate_pytorch.py

Expected output:

{
  "status": "PASS",
  "backend": "npu",
  "device_count": 8,
  "device_names": ["Ascend 910B", ...],
  "tests": {
    "device_detection": true,
    "tensor_creation": true,
    "matrix_multiply": true,
    "gpu_to_cpu_transfer": true
  }
}
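
Assuming the script prints valid JSON in the shape shown above (the `...` in the sample is an elision, not literal output), the result could be checked like this before reporting. `summarize_validation` is a hypothetical helper.

```python
import json

def summarize_validation(raw):
    """Return (ok, failed_tests) from the validator's JSON output."""
    result = json.loads(raw)
    failed = [name for name, passed in result.get("tests", {}).items()
              if not passed]
    ok = result.get("status") == "PASS" and not failed
    return ok, failed

sample = ('{"status": "FAIL", "backend": "npu", "device_count": 8, '
          '"tests": {"device_detection": true, "matrix_multiply": false}}')
print(summarize_validation(sample))
# prints: (False, ['matrix_multiply'])
```

Listing the failed test names makes it easier to show a detailed error and suggest a targeted fix.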

Step 8: Report Results

Summarize to user:

  • GPU vendor and devices detected
  • Container name and image used
  • Data mount path
  • Validation status
  • How to access: docker exec -it pytorch-gpu bash

Error Handling

| Error                 | Action                                  |
|-----------------------|-----------------------------------------|
| No GPU detected       | Ask user for vendor or check drivers    |
| Image pull fails      | Try alternative registry or web search  |
| Container start fails | Check device permissions, show error    |
| Validation fails      | Show detailed error, suggest fixes      |

Reference Files

  • references/gpu-detection.md - Detection methods by vendor
  • references/image-sources.md - Image discovery guide (registry APIs, priority order, selection criteria)
  • references/mount-requirements.md - Vendor mount specifications

Example Usage

User: /gpu-container-setup
User: setup a pytorch container
User: start container with ascend GPU
User: /gpu-container-setup --image nvcr.io/nvidia/pytorch:24.01-py3
User: /gpu-container-setup --image harbor.baai.ac.cn/flagrelease-public/ngctorch:2601
Security Recommendations
Treat this as an incomplete review: the command runner failed before metadata.json or artifact files could be inspected, so installation should wait for a successful artifact review.
Capability Assessment
Purpose & Capability
Artifact review was blocked by sandbox execution failure, so purpose and capabilities could not be confirmed from metadata.json or artifact files.
Instruction Scope
Instruction scope could not be evaluated from artifact text because local inspection commands failed before file contents could be read.
Install Mechanism
Install mechanism could not be evaluated because artifact files were not accessible through the available command runner.
Credentials
Environment access could not be assessed from evidence; no artifact-backed mismatch was available.
Persistence & Privilege
Persistence or privilege behavior could not be assessed from artifact evidence; no concrete risky behavior was observed.
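How to Use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install gpu-container-setup-flagos
  3. After installation, invoke the Skill by name or trigger it with /gpu-container-setup-flagos
  4. Provide the required inputs according to the Skill's parameter description to receive structured output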
Version History
v1.0.0
gpu-container-setup-flagos v1.0.0

  • Initial release: automates setup of PyTorch containers, auto-detecting GPU vendor and configuring appropriate images, mounts, and validations.
  • Supports NVIDIA, AMD/ROCm, Ascend, Metax, and Iluvatar GPUs.
  • Multi-step workflow: argument parsing, vendor/disk/image detection, container launch, and GPU validation.
  • Robust image source priority: primary vendor registry → BAAI Harbor → web search → local images → ask user.
  • Features self-updating mechanism to improve image sources discovered via web search.
  • Includes detailed error handling and vendor-specific container requirements.
Metadata
Slug gpu-container-setup-flagos
Version 1.0.0
License MIT-0
Total installs 0
Current installs 0
Version count 1
FAQ

What is Gpu Container Setup Flagos?

Automatically detect GPU vendor, find appropriate PyTorch container image, launch with correct mounts, and validate GPU functionality. Supports NVIDIA, Ascen... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 74 total downloads to date.

How do I install Gpu Container Setup Flagos?

Run the command "/install gpu-container-setup-flagos" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is required.

Is Gpu Container Setup Flagos free?

Yes. Gpu Container Setup Flagos is completely free and released under the MIT-0 license; it can be freely downloaded, installed, and used.

Which platforms does Gpu Container Setup Flagos support?

Gpu Container Setup Flagos is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Gpu Container Setup Flagos?

It is developed and maintained by Flagos (@wbavon); the current version is v1.0.0.
