wbavon

Model Verify Flagos

Author: Flagos · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 78 · Favorites: 0 · Current installs: 0 · Versions: 1
Install in OpenClaw
/install model-verify-flagos
Description
Verify the serving stack with a user-specified target model. Runs twice: first with FlagGems/FlagCX disabled (isolate model-specific errors), then with full...
Usage (SKILL.md)

Target Model Verification

Uses the same layer-peeling approach as env-verify, but with the real target model, which may require multi-GPU tensor parallelism. The model is run twice (without and with the multi-chip stack) and the results are diffed to isolate failures.

Skill Components

model-verify/
├── SKILL.md                            # This file — execution flow
├── scripts/
│   └── diff_analysis.py                # Compare Run A vs Run B, classify errors (JSON)
└── references/
    └── multichip-errors.md             # Multi-chip error patterns and diff truth table

Reused from env-verify:

  • env-verify/scripts/run_offline_inference.py — Phase A test (parameterized)
  • env-verify/scripts/test_serve_mode.py — Phase B test (parameterized)
  • env-verify/references/error-classification.md — Layer-based error rules

Prerequisites

  • Running container with software stack installed (from install-stack)
  • env-verify completed (at least Phase A passed)
  • User must provide model path

If invoked standalone, ask for container name, vendor, and model path. If invoked from /flagrelease, these are passed as context.

Execution Flow

Step 1: Get Model Info from User

Ask the user for (use AskUserQuestion if not provided):

  1. Model path (required) — local path inside container OR ModelScope/HuggingFace ID
  2. --tensor-parallel-size (optional) — default to GPU count
  3. Additional vllm args (optional)

Get default TP size:

docker exec <CONTAINER> python3 -c "
import torch; print(torch.cuda.device_count() if torch.cuda.is_available() else 1)
"

If user does not provide model path → ask and wait. Do not guess.
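
The default and the bounds check for the tensor-parallel size can be sketched as follows. This is a minimal illustration, not code shipped with the skill: `resolve_tp_size` is a hypothetical helper, and the error text mirrors the "requested TP=X but only Y available" message from the Error Handling section.

```python
from typing import Optional


def resolve_tp_size(requested: Optional[int], available_gpus: int) -> int:
    """Return the tensor-parallel size to use (hypothetical helper).

    Falls back to the GPU count when the user gives no value, and rejects
    a request that exceeds the number of visible devices.
    """
    if available_gpus < 1:
        raise ValueError("available_gpus must be at least 1")
    if requested is None:
        return available_gpus  # default: use every visible GPU
    if requested < 1:
        raise ValueError(f"requested TP={requested} must be a positive integer")
    if requested > available_gpus:
        raise ValueError(
            f"requested TP={requested} but only {available_gpus} available"
        )
    return requested
```

Here `available_gpus` would be the number printed by the device-count command above.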

Step 2: Download Model (if needed)

If model path is a remote ID (not starting with /):

docker exec <CONTAINER> python3 -c "
from modelscope import snapshot_download
snapshot_download('<MODEL_ID>', local_dir='/data/models/<MODEL_NAME>')
"

If local directory, verify config.json exists:

docker exec <CONTAINER> test -f <MODEL_PATH>/config.json

Timeout: 600s for large model downloads.
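
The remote-versus-local decision in this step reduces to two small checks, sketched below. Both function names are hypothetical; the rule "not starting with /" and the config.json check come from the text above. Note that the file check runs wherever the code executes, so in the skill it would need to run inside the container.

```python
import os


def is_remote_model_id(model_path: str) -> bool:
    """Treat anything that is not an absolute path as a remote model ID."""
    return not model_path.startswith("/")


def local_model_dir_is_valid(model_path: str) -> bool:
    """A local model directory is accepted only if it contains config.json."""
    return os.path.isfile(os.path.join(model_path, "config.json"))
```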

Step 3: Run A — WITHOUT Multi-Chip Stack

Copy the test scripts from env-verify into the container (if not already there):

docker cp <ENV_VERIFY_DIR>/scripts/run_offline_inference.py <CONTAINER>:/tmp/
docker cp <ENV_VERIFY_DIR>/scripts/test_serve_mode.py <CONTAINER>:/tmp/

Phase A (offline):

docker exec <CONTAINER> bash -c '
export USE_FLAGGEMS=0
unset FLAGCX_PATH
timeout 300 python3 /tmp/run_offline_inference.py \
    --model <MODEL_PATH> \
    --tp <TP_SIZE> \
    --trust-remote-code
' > /tmp/run_a_offline.json

Phase B (serve):

docker exec <CONTAINER> bash -c '
export USE_FLAGGEMS=0
unset FLAGCX_PATH
timeout 360 python3 /tmp/test_serve_mode.py \
    --model <MODEL_PATH> \
    --tp <TP_SIZE> \
    --trust-remote-code \
    --health-timeout 300
' > /tmp/run_a_serve.json

Step 4: Run B — WITH Full Multi-Chip Stack

Skip logic: If ALL of FlagGems, FlagTree, and FlagCX failed to install, skip Run B and report: "Run B skipped: no multi-chip packages installed." Check the install-stack results to decide.
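
The skip rule can be expressed as a one-line predicate. This sketch assumes the install-stack results are available as a mapping from package name to a success flag; that shape is an assumption for illustration, since the actual install-stack output format is not shown here.

```python
from typing import Mapping

# Package names taken from the skip rule above.
MULTICHIP_PACKAGES = ("FlagGems", "FlagTree", "FlagCX")


def should_skip_run_b(installed: Mapping[str, bool]) -> bool:
    """Skip Run B only when none of the multi-chip packages installed.

    `installed` is a hypothetical {package_name: succeeded} mapping;
    a missing key is treated as a failed install.
    """
    return not any(installed.get(name, False) for name in MULTICHIP_PACKAGES)
```

Run B still goes ahead if even one of the three packages installed successfully.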

Phase A (offline):

docker exec <CONTAINER> bash -c '
export USE_FLAGGEMS=1
export FLAGCX_PATH=/tmp/FlagCX
export VLLM_PLUGINS=fl
timeout 300 python3 /tmp/run_offline_inference.py \
    --model <MODEL_PATH> \
    --tp <TP_SIZE> \
    --trust-remote-code
' > /tmp/run_b_offline.json

Phase B (serve):

docker exec <CONTAINER> bash -c '
export USE_FLAGGEMS=1
export FLAGCX_PATH=/tmp/FlagCX
export VLLM_PLUGINS=fl
timeout 360 python3 /tmp/test_serve_mode.py \
    --model <MODEL_PATH> \
    --tp <TP_SIZE> \
    --trust-remote-code \
    --health-timeout 300
' > /tmp/run_b_serve.json
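
The four commands in Steps 3 and 4 differ only in their environment block, test script, and timeouts. A minimal sketch of generating them from one template follows; `build_run_script` and `build_docker_exec` are hypothetical helpers, while the environment variables, script flags, and timeout values are copied from the commands above.

```python
import shlex
from typing import List, Optional


def build_run_script(
    script: str,
    model_path: str,
    tp_size: int,
    multichip: bool,
    timeout_s: int,
    health_timeout_s: Optional[int] = None,
) -> str:
    """Build the shell text passed to `bash -c` for one phase of one run."""
    if multichip:
        # Run B: full multi-chip stack enabled.
        env_lines = [
            "export USE_FLAGGEMS=1",
            "export FLAGCX_PATH=/tmp/FlagCX",
            "export VLLM_PLUGINS=fl",
        ]
    else:
        # Run A: FlagGems off, FlagCX path removed from the environment.
        env_lines = ["export USE_FLAGGEMS=0", "unset FLAGCX_PATH"]
    cmd = [
        "timeout", str(timeout_s), "python3", script,
        "--model", model_path,
        "--tp", str(tp_size),
        "--trust-remote-code",
    ]
    if health_timeout_s is not None:
        # Only the serve-mode test takes a health timeout.
        cmd += ["--health-timeout", str(health_timeout_s)]
    return "\n".join(env_lines + [shlex.join(cmd)])


def build_docker_exec(container: str, script_text: str) -> List[str]:
    """Wrap the shell text in a `docker exec ... bash -c` argument list."""
    return ["docker", "exec", container, "bash", "-c", script_text]
```

Building an argument list rather than a single shell string avoids a second layer of quoting on the host; `shlex.join` quotes the model path in case it contains spaces.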

Step 5: Diff Analysis

Copy and run scripts/diff_analysis.py to compare the two runs:

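docker cp <SKILL_DIR>/scripts/diff_analysis.py <CONTAINER>:/tmp/
# The result files above were redirected on the host; copy them into the container
docker cp /tmp/run_a_offline.json <CONTAINER>:/tmp/
docker cp /tmp/run_b_offline.json <CONTAINER>:/tmp/
docker exec <CONTAINER> python3 /tmp/diff_analysis.py \
    --run-a /tmp/run_a_offline.json \
    --run-b /tmp/run_b_offline.json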

Read references/multichip-errors.md to interpret the diff and classify errors.
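
For orientation, one plausible reading of the four conclusion labels used in the Step 6 report is sketched below. This is an assumption made for illustration, not the logic of diff_analysis.py: the authoritative truth table is in references/multichip-errors.md, and `classify_diff` is a hypothetical name.

```python
from typing import Optional


def classify_diff(
    run_a_error: Optional[str], run_b_error: Optional[str]
) -> Optional[str]:
    """Map two run outcomes to a conclusion label (assumed mapping).

    Each argument is the run's error signature, or None if the run passed.
    """
    if run_a_error is None and run_b_error is None:
        return "BOTH_PASS"
    if run_a_error is None:
        # Only the multi-chip run failed, so the stack is the likely cause.
        return "MULTICHIP_ERROR"
    if run_b_error is None:
        # Run A failed but Run B passed: none of the four labels clearly
        # covers this case, so the sketch leaves it unclassified.
        return None
    if run_a_error == run_b_error:
        return "SAME_ERROR"
    return "DIFFERENT_ERRORS"
```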

Step 6: Produce Report

{
  "status": "PASS | PARTIAL | FAIL",
  "stage": "model-verify",
  "model": "<MODEL_PATH>",
  "tensor_parallel_size": 8,
  "run_a_without_multichip": {
    "flags": {"USE_FLAGGEMS": "0", "FLAGCX_PATH": "unset"},
    "phase_a_offline": "PASS | FAIL",
    "phase_b_serve": "PASS | FAIL",
    "output_sample": "...",
    "errors": []
  },
  "run_b_with_multichip": {
    "flags": {"USE_FLAGGEMS": "1", "FLAGCX_PATH": "/tmp/FlagCX"},
    "skipped": false,
    "phase_a_offline": "PASS | FAIL",
    "phase_b_serve": "PASS | FAIL",
    "output_sample": "...",
    "errors": []
  },
  "diff_analysis": {
    "conclusion": "BOTH_PASS | MULTICHIP_ERROR | SAME_ERROR | DIFFERENT_ERRORS",
    "detail": "...",
    "multichip_component": "FlagGems | FlagTree | FlagCX | plugin | null",
    "recommended_stack": "full | base | none"
  }
}

recommended_stack — tells downstream skills which stack to use:

  • full — Run B passed (USE_FLAGGEMS=1, FLAGCX_PATH set)
  • base — only Run A passed (USE_FLAGGEMS=0, FLAGCX_PATH unset)
  • none — Run A also failed (model can't serve)

Status logic:

  • PASS — both Run A and Run B succeed
  • PARTIAL — Run A passes, Run B fails
  • FAIL — Run A fails
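
The two rule sets above translate directly into code. The sketch below follows the bullets literally; the function names are hypothetical, and each run's pass/fail flag is taken as an input because the text does not say how the two phases combine into a single run result.

```python
def overall_status(run_a_passed: bool, run_b_passed: bool) -> str:
    """Status logic: FAIL if Run A fails, PARTIAL if only Run B fails."""
    if not run_a_passed:
        return "FAIL"
    return "PASS" if run_b_passed else "PARTIAL"


def recommended_stack(run_a_passed: bool, run_b_passed: bool) -> str:
    """Stack hint for downstream skills: full, base, or none."""
    if run_b_passed:
        return "full"
    if run_a_passed:
        return "base"
    return "none"
```

The case where Run A fails but Run B passes is not addressed by the rules; read literally, it yields status FAIL with recommended_stack full, which is what this sketch returns.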

Error Handling

| Failure | Behavior |
| --- | --- |
| Model path not provided | Ask user, wait |
| Model path not found | Report exact path, exit with error |
| Model too large for memory | Report OOM, suggest reducing TP or dtype |
| TP > available GPUs | Report "requested TP=X but only Y available" |
| Server hangs | Kill after timeout, capture last logs |
| Run A and Run B both fail | Report both errors separately |
Rule: Run BOTH runs regardless of individual failures. Maximize error coverage.

Timeout Rules

| Operation | Timeout |
| --- | --- |
| Model download | 600s |
| Phase A (offline) | 300s |
| Phase B (serve + test) | 360s |
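
A host-side wrapper that enforces these limits and keeps the last log lines ("Kill after timeout, capture last logs") might look like the sketch below. `run_with_timeout` and the `TIMEOUTS_S` keys are hypothetical names. The commands in Steps 3 and 4 already wrap the test scripts in `timeout` inside the container, so a wrapper like this would only be a second safety net on the host.

```python
import subprocess
from typing import Dict, List

# Limits from the table above, in seconds (key names are illustrative).
TIMEOUTS_S = {"download": 600, "phase_a_offline": 300, "phase_b_serve": 360}


def run_with_timeout(
    cmd: List[str], timeout_s: float, tail_lines: int = 20
) -> Dict[str, object]:
    """Run a command, kill it on timeout, and return the tail of its output."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # interleave stderr into the log
            timeout=timeout_s,
        )
        output, timed_out, returncode = proc.stdout, False, proc.returncode
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child before raising; partial output
        # collected so far (possibly None) is attached to the exception.
        output, timed_out, returncode = exc.output or b"", True, None
    text = output.decode("utf-8", errors="replace")
    return {
        "timed_out": timed_out,
        "returncode": returncode,
        "last_logs": text.splitlines()[-tail_lines:],
    }
```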
Security advice
The review could not validate the skill contents from the workspace in this run, so treat this as an incomplete low-confidence result rather than a substantive approval.
Capability assessment
Purpose & Capability
No purpose or capability mismatch was evidenced in the supplied message artifacts.
Instruction Scope
No unsafe instruction scope was evidenced in the supplied message artifacts.
Install Mechanism
No risky install mechanism was evidenced in the supplied message artifacts.
Credentials
No disproportionate environment access was evidenced in the supplied message artifacts.
Persistence & Privilege
No persistence or privilege issue was evidenced in the supplied message artifacts.
How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install model-verify-flagos
  3. Once installed, invoke the Skill by name or trigger it with /model-verify-flagos
  4. Provide the required inputs per the Skill's parameter description to get structured output
Version history
v1.0.0
model-verify-flagos 1.0.0: Initial release

  • New skill to verify the model serving stack with and without multi-chip features (FlagGems/FlagCX), isolating model-specific and stack-specific errors.
  • Runs the target model in two phases (offline/serve mode), with and without the multi-chip stack enabled.
  • Automatically handles model download if given a remote ID; checks prerequisites and asks the user for any missing info.
  • Produces a structured JSON report summarizing results, error analysis, and the recommended stack to use.
  • Includes robust error handling, dynamic timeouts, and integration with components reused from env-verify.
Metadata
Slug model-verify-flagos
Version 1.0.0
License MIT-0
Total installs 0
Current installs 0
Version count 1
FAQ

What is Model Verify Flagos?

Verify the serving stack with a user-specified target model. Runs twice: first with FlagGems/FlagCX disabled (isolate model-specific errors), then with full... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 78 total downloads so far.

How do I install Model Verify Flagos?

Run the command "/install model-verify-flagos" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is needed.

Is Model Verify Flagos free?

Yes. Model Verify Flagos is completely free and released under the MIT-0 license; you may download, install, and use it freely.

Which platforms does Model Verify Flagos support?

Model Verify Flagos is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Model Verify Flagos?

It is developed and maintained by Flagos (@wbavon); the current version is v1.0.0.
