wbavon

Vllm Plugin Fl Setup Flagos

by Flagos · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ Security Clean
69 Downloads · 0 Stars · 0 Active Installs · 1 Version
Install in OpenClaw
/install vllm-plugin-fl-setup-flagos
Description
Install and configure vLLM-Plugin-FL for multiple hardware backends, including NVIDIA and Ascend. Use when setting up vllm-plugin-fl, configuring the env...
README (SKILL.md)

vLLM-Plugin-FL Setup

Overview

vLLM-Plugin-FL extends vLLM to support model inference/serving across diverse hardware backends (NVIDIA, Ascend, MetaX, Iluvatar, etc.) via FlagOS's unified operator library FlagGems and communication library FlagCX. This skill covers installation, hardware-specific environment configuration, and dependency setup.

Prerequisites

  • Linux OS (Ubuntu 20.04+ recommended)
  • Python 3.10+
  • vLLM v0.13.0 — install from the official v0.13.0 release or the fork vllm-FL
  • GPU with appropriate drivers (NVIDIA CUDA, Huawei Ascend, etc.)
  • pip package manager
  • Git

Verify vLLM version before proceeding:

python -c "import vllm; print(vllm.__version__)"
# Expected output: 0.13.0
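
The same check can be scripted so a setup run stops early on a mismatch. This is a minimal sketch, not part of the plugin: the helper names are hypothetical, it assumes the installed distribution is named `vllm` (the vllm-FL fork may differ), and it assumes a build with a local suffix such as `0.13.0+cu121` should still count as 0.13.0.

```python
from importlib import metadata
from typing import Optional

REQUIRED_VLLM = "0.13.0"  # version named in the prerequisites above


def version_matches(installed: str, required: str = REQUIRED_VLLM) -> bool:
    """Return True if `installed` equals `required`, ignoring any '+local' suffix."""
    public = installed.split("+", 1)[0]
    return public == required


def installed_vllm_version() -> Optional[str]:
    """Read the installed version from package metadata without importing vllm."""
    try:
        return metadata.version("vllm")  # assumed distribution name
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_vllm_version()
    if found is None:
        print("vLLM is not installed")
    elif version_matches(found):
        print(f"vLLM {found} matches {REQUIRED_VLLM}")
    else:
        print(f"vLLM {found} found, but {REQUIRED_VLLM} is required")
```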

Installation Workflow

Step 1: Identify Hardware Backend

# NVIDIA GPU
nvidia-smi

# Huawei NPU
npu-smi info

# Moore Threads GPU
mthreads-gmi

# Iluvatar GPU
ixsmi
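
To probe for all four tools in one pass, something like the following may help. It is a sketch under stated assumptions: the backend labels (`nvidia`, `ascend`, `mthreads`, `iluvatar`) and the function name are chosen here for illustration, and finding a tool on PATH only suggests the backend is present; running the command itself is still the real check.

```python
import shutil
from typing import Callable, Dict, List, Optional

# Vendor CLI tools from Step 1, keyed by labels chosen for this sketch.
VENDOR_TOOLS: Dict[str, str] = {
    "nvidia": "nvidia-smi",
    "ascend": "npu-smi",
    "mthreads": "mthreads-gmi",
    "iluvatar": "ixsmi",
}


def detect_backends(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the labels whose CLI tool is found on PATH."""
    return [label for label, tool in VENDOR_TOOLS.items() if which(tool)]


if __name__ == "__main__":
    print(detect_backends() or "no known vendor tool found on PATH")
```

The `which` parameter exists only so the lookup can be replaced in tests; by default it uses `shutil.which`.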

Step 2: Install vLLM-Plugin-FL

First, create a workspace directory and clone the source code:

mkdir -p ~/flagos-workspace && cd ~/flagos-workspace
git clone https://github.com/flagos-ai/vllm-plugin-FL

If git clone fails due to network issues, ask the user for their network proxy settings (e.g. http_proxy / https_proxy), configure the proxy, then retry the clone.

Then install from the source directory:

cd vllm-plugin-FL
pip install -r requirements.txt
pip install --no-build-isolation .
# Required to enable vLLM-Plugin-FL when running vLLM
export VLLM_PLUGINS='fl'

Verify vLLM-Plugin-FL installation:

python -c "import vllm_fl; print('vllm-plugin-FL installed successfully')"

Step 3: Install FlagGems

Ascend NPU users: Before installing FlagGems, you must first install FlagTree. See references/npu.md and complete the FlagTree installation step there before proceeding. Otherwise the FlagGems verification will fail repeatedly and keep reinstalling Triton.

# Install build dependencies
pip install -U scikit-build-core==0.11 pybind11 ninja cmake

# Clone FlagGems source code
cd ~/flagos-workspace
git clone https://github.com/flagos-ai/FlagGems

If git clone fails due to network issues, ask the user for their network proxy settings (e.g. http_proxy / https_proxy), configure the proxy, then retry the clone.

Then install from the source directory:

cd FlagGems
pip install --no-build-isolation .

Verify FlagGems installation:

python -c "import flag_gems; print('FlagGems installed successfully')"

Step 4: (Optional) Install FlagCX

FlagCX is a unified communication library for multi-device distributed inference, supporting both homogeneous and heterogeneous setups. Skip this step if running on a single device.

Note: Ascend NPU does not need FlagCX — skip this step for Ascend backends.
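
The two skip conditions above can be written as a single decision. This is only an illustration of the rule as stated in this step; the function name and the `"ascend"` label are hypothetical.

```python
def needs_flagcx(backend: str, num_devices: int) -> bool:
    """Apply the skip rules from Step 4.

    FlagCX is skipped when running on a single device, and always
    skipped on Ascend NPU backends.
    """
    if num_devices <= 1:
        return False
    if backend.strip().lower() == "ascend":
        return False
    return True


if __name__ == "__main__":
    print(needs_flagcx("nvidia", 8))
    print(needs_flagcx("ascend", 8))
```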

cd ~/flagos-workspace
git clone https://github.com/flagos-ai/FlagCX.git

If git clone fails due to network issues, ask the user for their network proxy settings (e.g. http_proxy / https_proxy), configure the proxy, then retry the clone.

Then build and install from the source directory:

cd FlagCX

git submodule update --init --recursive

# Build for your platform (e.g. USE_NVIDIA=1 for NVIDIA)
make USE_NVIDIA=1

export FLAGCX_PATH="$PWD"

# Install Python binding (replace [xxx] with your platform: nvidia, ascend, etc.)
cd plugin/torch/
FLAGCX_ADAPTOR=[xxx] pip install --no-build-isolation .

Verify FlagCX installation:

python -c "import flagcx; print('FlagCX installed successfully')"
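
The three import checks from Steps 2 to 4 can also be run together. The sketch below uses the module names from the verification commands above; the helper name is hypothetical. Note that `importlib.util.find_spec` only confirms a module can be located. It does not import it, so a package that is installed but fails at import time (for example because of a missing shared library) would still pass here.

```python
import importlib.util
from typing import Iterable, List

# Module names taken from the verification commands in Steps 2-4.
FLAGOS_MODULES = ("vllm_fl", "flag_gems", "flagcx")


def missing_modules(names: Iterable[str] = FLAGOS_MODULES) -> List[str]:
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All FlagOS modules were found")
```

On a single device or an Ascend backend, `flagcx` is expected to be missing, since Step 4 is skipped there.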

Step 5: Backend-Specific Setup

Some hardware backends require additional setup. See the corresponding reference document:

Backend                   Chip Vendor      Reference
Ascend NPU                Huawei           references/npu.md
MetaX GPU                 MetaX            TBD
Iluvatar GPU (BI-V150)    Iluvatar         references/iluvatar_gpu.md
Pingtouge-Zhenwu          Pingtouge        TBD
Tsingmicro                Tsingmicro       TBD
Moore Threads GPU         Moore Threads    references/mthreads_gpu.md
Hygon DCU                 Hygon            TBD

Quick Test

  1. Ask the user for the model name they want to test (e.g. Qwen3-4B, DeepSeek-R1).
  2. Search the machine for a local copy of that model:
    find / -maxdepth 5 -type d -name "<user_provided_model_name>" 2>/dev/null
    
  3. If found, use the discovered path. If not found, tell the user and ask them to provide a different model name or a full local path, then repeat the search. If after 3 attempts no valid model is found, skip the quick test and inform the user to prepare a model before retrying.
  4. Ensure the FL plugin is enabled before running inference:
    export VLLM_PLUGINS='fl'
    
    For Moore Threads GPU, also set:
    export USE_FLAGGEMS=1
    export FLAGCX_PATH=/workspace/FlagCX  # MUST point to the actual FlagCX installation directory; this is only an example
    export VLLM_MUSA_ENABLE_MOE_TRITON=1
    
  5. Once a valid model path is resolved, run offline batched inference to verify the full stack:
from vllm import LLM, SamplingParams

model_path = "<resolved_model_path>"
prompts = [
    "Hello, my name is",
]
sampling_params = SamplingParams(max_tokens=10, temperature=0.0)

# For Moore Threads GPU, add: enforce_eager=True, block_size=64, attention_config={"backend": "TORCH_SDPA"}
# For Iluvatar BI-V150, add: enforce_eager=True
llm = LLM(model=model_path, max_num_batched_tokens=16384, max_num_seqs=2048)
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
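
The model search in step 2 can be approximated in Python when a shell is not convenient. This is a sketch, not an exact replacement for `find`: the function name is hypothetical, it skips symlinked directories, and it returns only directories nested below the search root, at most `max_depth` levels down.

```python
import os
from typing import List


def find_model_dirs(root: str, name: str, max_depth: int = 5) -> List[str]:
    """Return directories called `name` at most `max_depth` levels below `root`."""
    root = os.path.abspath(root)
    base_depth = root.rstrip(os.sep).count(os.sep)
    matches: List[str] = []
    # os.walk silently skips directories it cannot read, similar to 2>/dev/null.
    for current, dirnames, _files in os.walk(root):
        depth = current.rstrip(os.sep).count(os.sep) - base_depth
        if depth >= max_depth:
            dirnames[:] = []  # children would exceed max_depth; do not descend
            continue
        for entry in dirnames:
            path = os.path.join(current, entry)
            if entry == name and not os.path.islink(path):
                matches.append(path)
    return sorted(matches)


if __name__ == "__main__":
    # "Qwen3-4B" is one of the example model names mentioned in step 1.
    for path in find_model_dirs("/", "Qwen3-4B"):
        print(path)
```

If the list is empty, fall back to asking the user for another name or a full path, as described in step 3.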

Troubleshooting

Out of memory on model load: Use the gpu_memory_utilization parameter to cap the fraction of device memory vLLM may use. Start with 0.8 and adjust:

from vllm import LLM
llm = LLM(model="...", gpu_memory_utilization=0.8)

FlagGems build failures: Ensure build dependencies are installed (scikit-build-core, pybind11, ninja, cmake). Check that your compiler supports C++17.

Plugin not loaded: If vLLM does not use the FL plugin, verify that VLLM_PLUGINS='fl' is set in your environment.
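
A quick way to confirm the setting from inside Python is sketched below. The helper name is hypothetical, and the sketch assumes VLLM_PLUGINS may hold a comma-separated list of plugin names; it treats an unset variable as "not enabled", following the instruction above to set it explicitly.

```python
import os
from typing import Mapping


def fl_plugin_enabled(env: Mapping[str, str] = os.environ, name: str = "fl") -> bool:
    """Return True if `name` appears in the VLLM_PLUGINS list."""
    raw = env.get("VLLM_PLUGINS", "")
    # Split on commas and ignore surrounding whitespace around each entry.
    return name in [item.strip() for item in raw.split(",")]


if __name__ == "__main__":
    if fl_plugin_enabled():
        print("VLLM_PLUGINS includes 'fl'")
    else:
        print("VLLM_PLUGINS does not include 'fl'; run: export VLLM_PLUGINS='fl'")
```

Remember that `export` only affects the current shell session, so a new terminal or a service launched elsewhere needs the variable set again.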

FlagCX communication errors: Ensure FLAGCX_PATH is set correctly and that the library was built for your platform. For NVIDIA, confirm it was built with make USE_NVIDIA=1.

Ascend-specific issues: See references/npu.md for Ascend NPU troubleshooting, including FlagTree setup and eager execution requirements.

Cannot connect to GitHub: Ask the user for their network proxy settings (e.g. http_proxy / https_proxy), configure the proxy, then retry the git clone command.
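
The retry-with-proxy step can be sketched as follows. The function names are hypothetical, and the proxy URL in the usage example is a placeholder: the real value must come from the user. The sketch sets only the two variables named above, http_proxy and https_proxy, for the git subprocess and leaves the parent environment untouched.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional


def proxy_env(base: Mapping[str, str], proxy: Optional[str]) -> Dict[str, str]:
    """Return a copy of `base` with http_proxy/https_proxy set when a proxy is given."""
    env = dict(base)
    if proxy:
        env["http_proxy"] = proxy
        env["https_proxy"] = proxy
    return env


def clone(url: str, dest: str, proxy: Optional[str] = None) -> bool:
    """Run `git clone` and report success; the proxy applies only to this call."""
    result = subprocess.run(
        ["git", "clone", url, dest],
        env=proxy_env(os.environ, proxy),
    )
    return result.returncode == 0
```

A possible usage: call `clone(url, dest)` first, and only if it returns False, ask the user for a proxy and call `clone(url, dest, proxy="http://<user_provided_proxy>")`.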

References

Usage Guidance
Treat this as an incomplete low-confidence review, not a clearance. Re-run ClawScan in an environment where metadata.json and artifact/ can be read before installing or publishing this skill.
Capability Tags
crypto
can-make-purchases
Capability Assessment
Purpose & Capability
The requested metadata.json and artifact/ contents could not be read, so purpose and capability coherence could not be confirmed from artifacts.
Instruction Scope
Runtime instructions could not be inspected; no instruction-scope concern is supported by artifact evidence available to this review.
Install Mechanism
Install specifications could not be inspected; no install-mechanism concern is supported by artifact evidence available to this review.
Credentials
Environment access and proportionality could not be assessed from artifacts because local file reads failed before shell execution.
Persistence & Privilege
Persistence or privilege behavior could not be confirmed; no artifact-backed persistence or privilege concern was available.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install vllm-plugin-fl-setup-flagos
  3. After installation, invoke the skill by name or use /vllm-plugin-fl-setup-flagos
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release of vllm-plugin-fl-setup-flagos.
  • Provides guided installation and configuration of vLLM-Plugin-FL, FlagGems, and FlagCX for multiple hardware backends (NVIDIA, Ascend, Moore Threads, etc.).
  • Suggests specific backend workflows and highlights situations such as network proxy setup and Ascend-specific requirements.
  • Includes troubleshooting for installation, build errors, environment variables, and backend-specific issues.
  • Offers quick model inference test steps to verify successful setup.
  • Lists reference documents for further backend and troubleshooting guidance.
Metadata
Slug vllm-plugin-fl-setup-flagos
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Vllm Plugin Fl Setup Flagos?

It is an AI Agent Skill for Claude Code / OpenClaw that installs and configures vLLM-Plugin-FL for multiple hardware backends, including NVIDIA and Ascend. It has 69 downloads so far.

How do I install Vllm Plugin Fl Setup Flagos?

Run "/install vllm-plugin-fl-setup-flagos" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Vllm Plugin Fl Setup Flagos free?

Yes, Vllm Plugin Fl Setup Flagos is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Vllm Plugin Fl Setup Flagos support?

Vllm Plugin Fl Setup Flagos is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Vllm Plugin Fl Setup Flagos?

It is built and maintained by Flagos (@wbavon); the current version is v1.0.0.
