FlagRelease Pipeline Orchestrator
End-to-end LLM deployment + testing pipeline for multi-chip GPU backends. Orchestrates 4 sub-skills in sequence and produces a final report.
Skill Components
flagrelease/
├── SKILL.md                  # This file: orchestration flow
└── references/
    └── pipeline-state.md     # Pipeline state schema, gate logic, data flow
Sub-skills (each independently invokable):
../install-stack/                        # Step 1: Install 5 packages
│   ├── SKILL.md
│   ├── scripts/
│   │   ├── detect_network.py            # Probe GitHub/PyPI, return mirror config
│   │   ├── collect_env_info.py          # Python/glibc/arch/vendor/disk info
│   │   ├── select_flagtree_wheel.py     # Match vendor+python+glibc → wheel
│   │   └── validate_packages.py         # Import-test all 5 packages
│   └── references/
│       ├── vendor-mappings.md           # FlagCX make flags, adaptor names
│       └── network-mirrors.md           # Mirror config rules
../env-verify/                           # Step 2: Qwen3-0.6B smoke test
│   ├── SKILL.md
│   ├── scripts/
│   │   ├── run_offline_inference.py     # Phase A: offline inference test
│   │   └── test_serve_mode.py           # Phase B: serve + health + chat test
│   └── references/
│       └── error-classification.md      # Layer-based error classification
../model-verify/                         # Step 3: Target model ± multi-chip
│   ├── SKILL.md
│   ├── scripts/
│   │   └── diff_analysis.py             # Compare Run A vs Run B results
│   └── references/
│       └── multichip-errors.md          # Multi-chip error patterns
../perf-test/                            # Step 4: Accuracy + Performance
│   ├── SKILL.md
│   ├── scripts/
│   │   ├── run_benchmark.py             # Run single benchmark profile
│   │   └── run_all_benchmarks.py        # Run all profiles + summarize
│   └── references/
│       └── benchmark-profiles.md        # Profile definitions and metrics
Pipeline Overview
[Prerequisite: /gpu-container-setup already done by another team]
│
▼
install-stack → Install 5 packages (vLLM, FlagTree, FlagGems, FlagCX, plugin)
│ scripts: detect_network, collect_env_info, select_flagtree_wheel, validate_packages
│
│ GATE: vLLM + plugin must succeed
▼
env-verify → Smoke test with Qwen3-0.6B (FlagGems/CX OFF)
│ scripts: run_offline_inference, test_serve_mode
│
│ Verify Layers 0-3
▼
model-verify → Target model test (OFF then ON), diff analysis
│ scripts: run_offline_inference, test_serve_mode, diff_analysis
│
│ Determine which stack works (full vs base)
▼
perf-test → Accuracy (placeholder) + Performance benchmarks
│ scripts: run_benchmark, run_all_benchmarks
▼
Final Report
Prerequisites
A running Docker container with:
- PyTorch installed and GPU-accessible
- Container name known (e.g. flagrelease-worker)
This container is produced by /gpu-container-setup (maintained by another team).
Execution Flow
Read references/pipeline-state.md for the full state schema and gate logic.
Step 0: Gather Initial Context
Ask user for container name (or detect running containers):
docker ps --format '{{.Names}}' | head -10
Verify the container is running:
docker inspect --format='{{.State.Status}}' <CONTAINER> | grep -q running
Initialize pipeline state (see references/pipeline-state.md).
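The container check in Step 0 could be wrapped in Python roughly as follows. This is a minimal sketch: the function name `container_is_running` and the injectable `run` parameter are illustrative assumptions rather than part of the skill. Only the `docker inspect` invocation is taken from the step above.

```python
import subprocess


def container_is_running(name, run=subprocess.run, timeout=10):
    """Return True if `docker inspect` reports the container as running.

    `run` is injectable (hypothetical design choice) so the check can be
    exercised without a Docker daemon.
    """
    try:
        proc = run(
            ["docker", "inspect", "--format={{.State.Status}}", name],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # Docker hung or is not installed: treat as "not running".
        return False
    return proc.returncode == 0 and proc.stdout.strip() == "running"
```

Comparing the trimmed output against the exact string is slightly stricter than `grep -q running`, which matches any line containing that substring.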
Step 1: Install Software Stack
Read and follow ../install-stack/SKILL.md.
The install-stack skill will:
- Copy scripts/collect_env_info.py into container → get vendor, Python, glibc
- Copy scripts/detect_network.py into container → get mirror config
- Install 5 packages in order, using scripts/select_flagtree_wheel.py for FlagTree
- Run scripts/validate_packages.py inside container → get final status
Gate check: If gate_passed is false (vLLM or plugin failed) → STOP pipeline.
Report FAIL with install errors.
Store result in pipeline state.
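The gate described above (vLLM and the plugin must both install) might be expressed like this. It is a sketch under assumptions: the package keys `"vllm"` and `"plugin"` and the `failed_required` field are hypothetical, and the authoritative schema lives in references/pipeline-state.md.

```python
# Hypothetical keys for the two packages that gate the pipeline.
REQUIRED = ("vllm", "plugin")


def check_install_gate(packages):
    """Evaluate the install gate.

    `packages` maps a package name to True/False (installed or not).
    Optional packages may fail without closing the gate.
    """
    failed = [name for name in REQUIRED if not packages.get(name, False)]
    return {"gate_passed": not failed, "failed_required": failed}
```

With this shape, a failed FlagCX build leaves `gate_passed` true, while a missing plugin stops the pipeline and names the culprit.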
Step 2: Environment Verification
Read and follow ../env-verify/SKILL.md.
The env-verify skill will:
- Download Qwen3-0.6B (if not cached)
- Copy scripts/run_offline_inference.py into container → Phase A
- Copy scripts/test_serve_mode.py into container → Phase B
- Classify errors using references/error-classification.md
Decision: Fatal error → STOP. Non-fatal → record and continue.
Store result in pipeline state.
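One way to picture the fatal/non-fatal decision is sketched below. The error shape (a dict with a boolean `"fatal"` field), the function name, and the use of `PARTIAL` for non-fatal errors are all assumptions made for illustration. The real classification rules are in references/error-classification.md.

```python
def apply_env_verify_decision(state, errors):
    """Record errors in the pipeline state and decide whether to continue.

    Each error is assumed to be a dict with a boolean "fatal" field.
    Returns True when the pipeline may proceed to the next step.
    """
    state.setdefault("errors", []).extend(errors)
    fatal = any(err.get("fatal", False) for err in errors)
    if fatal:
        status = "FAIL"
    elif errors:
        status = "PARTIAL"  # assumed label for "recorded, but continuing"
    else:
        status = "PASS"
    state["env_verify"] = {"status": status}
    return not fatal
```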
Step 3: Model Verification
Read and follow ../model-verify/SKILL.md.
This step is interactive — will ask user for model path.
The model-verify skill will:
- Get model info from user (AskUserQuestion)
- Reuse run_offline_inference.py and test_serve_mode.py for Run A and Run B
- Run scripts/diff_analysis.py to compare results
- Determine recommended_stack (full/base/none)
Decision: If recommended_stack is none (Run A failed) → STOP.
Store result in pipeline state (including model_path, tp_size, recommended_stack).
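The three values of recommended_stack can be read as a small decision function. This sketch assumes that Run A is the base run (FlagGems/FlagCX OFF) and Run B is the full run (ON), following the "OFF then ON" order in the pipeline overview. The actual logic belongs to the model-verify skill and diff_analysis.py and may weigh more than pass/fail.

```python
def recommend_stack(run_a_passed, run_b_passed):
    """Map the two run outcomes to a recommended stack.

    Assumption: Run A = base stack, Run B = full multi-chip stack.
    """
    if not run_a_passed:
        return "none"  # base stack cannot serve the model: pipeline stops
    return "full" if run_b_passed else "base"
```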
Step 4: Performance Test
Read and follow ../perf-test/SKILL.md.
The perf-test skill will:
- Start vllm serve with recommended stack
- Copy scripts/run_all_benchmarks.py into container → run 5 profiles
- Collect metrics and produce summary table
Store result in pipeline state.
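As an illustration of how per-profile results might be reduced to a pass count such as "5/5", consider the sketch below. The input shape (a list of dicts with `"profile"` and `"passed"`) and the `failed_profiles` field are assumptions. The real profile names and metrics are defined in references/benchmark-profiles.md.

```python
def summarize_profiles(results):
    """Reduce per-profile results to a pass count.

    `results` is assumed to be a list of dicts, each with a "profile"
    name and a boolean "passed" flag.
    """
    passed = sum(1 for r in results if r.get("passed"))
    failed = [r["profile"] for r in results if not r.get("passed")]
    return {
        "profiles_passed": f"{passed}/{len(results)}",
        "failed_profiles": failed,
    }
```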
Step 5: Final Report
Compile all results from pipeline state into a final report:
{
  "status": "PASS | PARTIAL | FAIL",
  "pipeline": "flagrelease",
  "container": "<name>",
  "vendor": "<vendor>",
  "model": "<path>",
  "tensor_parallel_size": 8,
  "steps": {
    "install_stack": { "status": "...", "packages": {...} },
    "env_verify": { "status": "...", "phase_a": "...", "phase_b": "..." },
    "model_verify": { "status": "...", "run_a": "...", "run_b": "...", "recommended_stack": "..." },
    "perf_test": { "status": "...", "profiles_passed": "5/5", "summary_table": "..." }
  },
  "errors": [...],
  "conclusion": "Pipeline completed. ..."
}
Present to user with clear summary:
- Which packages installed / failed
- Whether base stack works
- Whether multi-chip stack works (and which component failed if not)
- Performance numbers (summary table)
- All errors with layer classification
Overall status:
- PASS: all steps pass, full multi-chip stack works
- PARTIAL: model works with degraded stack, or some perf profiles failed
- FAIL: model cannot serve (gate or Run A failure)
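The overall status rules can be condensed into a few lines. This is a simplified sketch with hypothetical parameter names. It treats "all steps pass" as "full stack recommended and every perf profile passed" and does not model env-verify outcomes separately.

```python
def overall_status(gate_passed, recommended_stack, profiles_passed, profiles_total):
    """Derive PASS / PARTIAL / FAIL from the key pipeline outcomes."""
    # FAIL: the model cannot serve (install gate or Run A failure).
    if not gate_passed or recommended_stack == "none":
        return "FAIL"
    # PASS: full multi-chip stack works and no perf profile failed.
    if recommended_stack == "full" and profiles_passed == profiles_total:
        return "PASS"
    # PARTIAL: degraded stack, or some perf profiles failed.
    return "PARTIAL"
```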
Design Rules
- Every operation has a timeout — no hangs allowed
- Every error is caught with precise location (step, phase, layer, cause)
- Pipeline always completes with success or structured error report
- One sub-step failure does NOT skip unrelated steps (unless gate failure)
- Network uses mirrors when direct access fails
- Scripts produce JSON — structured, parseable, comparable across runs
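Several of these rules (timeouts, caught errors with a cause, JSON output) can be combined in one small wrapper. The following is a sketch of that idea only. The function name `run_step` and the result fields are hypothetical, and the skill's own scripts may structure their output differently.

```python
import json
import subprocess


def run_step(cmd, step, timeout_s):
    """Run a command with a hard timeout and always return a JSON-able dict."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return {"step": step, "status": "FAIL", "cause": f"timeout after {timeout_s}s"}
    except OSError as exc:
        # Covers a missing executable (FileNotFoundError) and similar failures.
        return {"step": step, "status": "FAIL", "cause": f"could not start: {exc}"}
    if proc.returncode != 0:
        # Keep only the tail of stderr so reports stay readable.
        return {"step": step, "status": "FAIL", "cause": proc.stderr.strip()[-500:]}
    try:
        return {"step": step, "status": "PASS", "result": json.loads(proc.stdout)}
    except json.JSONDecodeError:
        return {"step": step, "status": "FAIL", "cause": "script output was not valid JSON"}
```

Because every branch returns a dict rather than raising, a caller can record the failure and move on to unrelated steps, which matches the rule that the pipeline always ends with either success or a structured error report.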
Installation and Usage
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install flagrelease-entrance-flagos
- After installation, invoke the skill by name or use /flagrelease-entrance-flagos
- Provide required inputs per the skill's parameter spec and get structured output
What is Flagrelease Entrance Flagos?
Full FlagRelease pipeline orchestrator. Runs the complete LLM deployment, verification, and benchmarking pipeline for multi-chip GPU backends. Executes: inst... It is an AI Agent Skill for Claude Code / OpenClaw, with 77 downloads so far.
How do I install Flagrelease Entrance Flagos?
Run "/install flagrelease-entrance-flagos" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Flagrelease Entrance Flagos free?
Yes, Flagrelease Entrance Flagos is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Flagrelease Entrance Flagos support?
Flagrelease Entrance Flagos is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created Flagrelease Entrance Flagos?
It is built and maintained by Flagos (@wbavon); the current version is v1.0.0.