Description

Use FusionBench to run model fusion experiments. Covers running benchmarks, adding new merging algorithms, evaluating fused models, and managing model pools....

Usage (SKILL.md)

FusionBench Skill

Name: fusion-bench
Author: tanganke

FusionBench is a comprehensive benchmark/toolkit for deep model fusion (model merging).

Paper: arXiv:2406.03280
PyPI: pip install fusion-bench
Repo: https://code.tanganke.com/tanganke/fusion_bench
Docs: https://tanganke.github.io/fusion_bench/

Quick Start

# Install
pip install fusion-bench

# Run a simple experiment (CLIP ViT-B/32, task arithmetic on 8 tasks)
fusion_bench method=task_arithmetic modelpool=clip-vit-base-patch32 taskpool=clip-vit-base-patch32_8tasks

# Run with a different merging method
fusion_bench method=ties_merging modelpool=clip-vit-base-patch32 taskpool=clip-vit-base-patch32_8tasks

Architecture Overview

fusion_bench/
├── method/           # Merging algorithms (30+)
├── modelpool/        # Model loading & management
├── config/           # Hydra YAML configs
├── tasks/            # Task evaluation
├── utils/            # Helpers (state_dict ops, lazy loading, etc.)
└── scripts/          # CLI & web UI

Key Components

ModelPool: Loads and manages pre-trained/fine-tuned models
- AutoModelPool: Auto-selects based on config
- CLIPVisionModelPool: For CLIP ViT models
- CausalLMPool: For Llama, GPT-2, etc.
Method: The merging algorithm
- Inherits from BaseModelFusionAlgorithm
- Implements run(modelpool) → merged model
TaskPool: Evaluation tasks
- CLIP: 8-38 classification tasks
- LLM: ARC, HellaSwag, MMLU, etc.

Supported Merging Methods

Basic

Method	Config Name	Description
Simple Average	`simple_average`	Uniform weight averaging
Weighted Average	`weighted_average`	Learnable task weights
Task Arithmetic	`task_arithmetic`	task_vector = fine-tuned - base
Slerp	`slerp`	Spherical interpolation
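
The first three rows of the table can be illustrated with a small sketch. It uses plain Python lists as stand-ins for tensors, and the helper names (`simple_average`, `task_vector`, `task_arithmetic`) are hypothetical illustrations, not FusionBench's API:

```python
from typing import Dict, List

# Toy "state dict": parameter name -> list of floats (a stand-in for a tensor).
ToyParams = Dict[str, List[float]]


def simple_average(models: List[ToyParams]) -> ToyParams:
    """Uniformly average each parameter across all models."""
    n = len(models)
    return {
        name: [sum(column) / n for column in zip(*(m[name] for m in models))]
        for name in models[0]
    }


def task_vector(finetuned: ToyParams, base: ToyParams) -> ToyParams:
    """task_vector = fine-tuned - base, computed per parameter."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def task_arithmetic(
    base: ToyParams, finetuned_models: List[ToyParams], scaling_coefficient: float
) -> ToyParams:
    """Return base + scaling_coefficient * (sum of all task vectors)."""
    merged = {name: list(values) for name, values in base.items()}
    for finetuned in finetuned_models:
        tv = task_vector(finetuned, base)
        for name in merged:
            merged[name] = [
                m + scaling_coefficient * t for m, t in zip(merged[name], tv[name])
            ]
    return merged


base = {"w": [1.0, 2.0]}
model_a = {"w": [2.0, 2.0]}
model_b = {"w": [1.0, 4.0]}

avg = simple_average([model_a, model_b])
merged = task_arithmetic(base, [model_a, model_b], scaling_coefficient=0.5)
```

With two models and a coefficient of 0.5 (that is, 1/n), task arithmetic and simple averaging give the same result here; other coefficients move the merged model further from or closer to the base.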

Sparse/Pruning

Method	Config Name	Description
TIES	`ties_merging`	Trim, Elect, Sign + merge
DARE	`dare`	Drop And REscale
Magnitude Pruning	`magnitude_pruning`	Prune by magnitude
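
As a rough illustration of the ideas named in this table, the sketch below applies TIES-style trimming, sign election, and merging, plus DARE-style drop-and-rescale, to flat lists of floats. The function names and the `keep_ratio` and `drop_rate` parameters are assumptions made for this example; the library's own implementations work on full state dicts and may differ in detail:

```python
import random
from typing import List


def trim(tv: List[float], keep_ratio: float) -> List[float]:
    """Magnitude pruning: keep the largest-magnitude entries, zero the rest."""
    k = max(1, int(round(len(tv) * keep_ratio)))
    threshold = sorted((abs(x) for x in tv), reverse=True)[k - 1]
    return [x if abs(x) >= threshold else 0.0 for x in tv]


def ties_merge(task_vectors: List[List[float]], keep_ratio: float) -> List[float]:
    """Trim each task vector, elect a sign per coordinate, average agreeing values."""
    trimmed = [trim(tv, keep_ratio) for tv in task_vectors]
    merged = []
    for column in zip(*trimmed):
        sign = 1.0 if sum(column) >= 0 else -1.0  # elected sign
        agreeing = [x for x in column if x * sign > 0]
        merged.append(sum(agreeing) / len(agreeing) if agreeing else 0.0)
    return merged


def dare(tv: List[float], drop_rate: float, rng: random.Random) -> List[float]:
    """Drop entries with probability drop_rate; rescale survivors by 1/(1-drop_rate)."""
    scale = 1.0 / (1.0 - drop_rate)
    return [0.0 if rng.random() < drop_rate else x * scale for x in tv]


tv_a = [0.9, -0.1, 0.5, 0.0]
tv_b = [-0.6, 0.8, 0.4, 0.05]

ties_result = ties_merge([tv_a, tv_b], keep_ratio=0.5)
dare_result = dare(tv_a, drop_rate=0.5, rng=random.Random(0))
```

In the first coordinate the two task vectors disagree in sign (0.9 versus -0.6); the elected sign is positive, so only 0.9 contributes to the merged value.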

Advanced

Method	Config Name	Description
AdaMerging	`adamerging`	Learn layer-wise coefficients
Fisher Merging	`fisher_merging`	Fisher-weighted merging
RegMean	`regmean`	Regression mean (closed-form)
RegMean++	`regmean_plusplus`	Enhanced RegMean with cross-layer deps

MoE-Based

Method	Config Name	Description
WE-MoE	`we_moe`	Weight Ensembling MoE
PWE-MoE	`pwe_moe`	Pareto-optimal WE-MoE
RankOne-MoE	`rankone_moe`	Rank-1 expert decomposition
Sparse-WE-MoE	`sparse_we_moe`	Sparse weight ensembling
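
The idea behind weight ensembling can be sketched as combining task vectors with gate values produced by a router, then adding the result to the base weights. The sketch below is a simplified illustration under stated assumptions: the softmax gate, the fixed `router_logits`, and the `ensemble_weights` helper are all hypothetical, and in a real MoE layer the gates would be computed from the input:

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_weights(
    base: List[float], task_vectors: List[List[float]], router_logits: List[float]
) -> List[float]:
    """Return base + sum_i gate_i * task_vector_i (gates from softmax; an assumption)."""
    gates = softmax(router_logits)
    merged = list(base)
    for gate, tv in zip(gates, task_vectors):
        merged = [m + gate * t for m, t in zip(merged, tv)]
    return merged


base_w = [1.0, 1.0]
expert_tvs = [[2.0, 0.0], [0.0, 2.0]]

balanced = ensemble_weights(base_w, expert_tvs, router_logits=[0.0, 0.0])
```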

Continual Merging

Method	Config Name	Description
OPCM	`opcm`	Orthogonal Projection Continual Merging
DOP	`dop`	Dual Orthogonal Projection
Gossip	`gossip`	Gossip-based continual merging

Specialized

Method	Config Name	Description
ISO-C/CTS	`isotropic_merging`	Isotropic merging in common/task subspace
AdaSVD	`ada_svd`	SVD-based adaptive merging
WUDI	`wudi`	Wasserstein distance merging
ExPO	`expo`	Weak-to-strong model extrapolation

Running Experiments

1. Basic Merging (CLI)

# Task Arithmetic on CLIP ViT-B/32
fusion_bench \
  method=task_arithmetic \
  modelpool=clip-vit-base-patch32 \
  taskpool=clip-vit-base-patch32_8tasks

# TIES merging with custom scaling
fusion_bench \
  method=ties_merging \
  method.scaling_coefficient=0.3 \
  modelpool=clip-vit-base-patch32 \
  taskpool=clip-vit-base-patch32_8tasks

2. LLM Merging

# Merge Llama models
fusion_bench \
  method=task_arithmetic \
  modelpool=llama2-7b \
  taskpool=llama2-7b_tasks

# With DARE
fusion_bench \
  method=dare \
  method.type=task_arithmetic \
  modelpool=llama2-7b

3. Using Fabric (Distributed/Mixed Precision)

fusion_bench \
  fabric=deepspeed_stage_2 \
  method=adamerging \
  modelpool=clip-vit-base-patch32

Adding a New Method

Step 1: Create method file

# fusion_bench/method/my_method.py
from fusion_bench.method.base_algorithm import BaseModelFusionAlgorithm
from fusion_bench.modelpool import BaseModelPool
import torch

class MyMergingAlgorithm(BaseModelFusionAlgorithm):
    """
    My custom merging algorithm.
    """
    def __init__(self, scaling_coefficient: float = 1.0, **kwargs):
        super().__init__(**kwargs)
        self.scaling_coefficient = scaling_coefficient
    
    @torch.no_grad()
    def run(self, modelpool: BaseModelPool):
        # 1. Load base model
        base_model = modelpool.load_model("_base_")
        base_sd = base_model.state_dict()
        
        # 2. Compute merged task vectors
        merged_tv = {}
        for model_name in modelpool.model_names:
            if model_name == "_base_":
                continue
            model = modelpool.load_model(model_name)
            tv = {k: v - base_sd[k] for k, v in model.state_dict().items()}
            # Your merging logic here
            for k in tv:
                if k not in merged_tv:
                    merged_tv[k] = tv[k] * self.scaling_coefficient
                else:
                    merged_tv[k] += tv[k] * self.scaling_coefficient
        
        # 3. Apply merged task vector
        for k in base_sd:
            base_sd[k] += merged_tv.get(k, 0)
        
        base_model.load_state_dict(base_sd)
        return base_model

Step 2: Register in `__init__.py`

# fusion_bench/method/__init__.py
_import_structure = {
    ...
    "my_method": ["MyMergingAlgorithm"],
}

Step 3: Create config

# config/method/my_method.yaml
_target_: fusion_bench.method.my_method.MyMergingAlgorithm
scaling_coefficient: 1.0
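
The `_target_` key follows the Hydra convention of naming a dotted import path that is called with the remaining keys as keyword arguments. The sketch below mimics that mechanism with the standard library only; `instantiate_sketch` is a hypothetical stand-in (Hydra's real `instantiate` also handles nesting, interpolation, and more), and `fractions.Fraction` is used as the target only so the example runs without FusionBench installed:

```python
import importlib
from typing import Any, Dict


def instantiate_sketch(config: Dict[str, Any]) -> Any:
    """Resolve the dotted `_target_` path and call it with the other keys as kwargs."""
    kwargs = {key: value for key, value in config.items() if key != "_target_"}
    module_path, _, attribute = config["_target_"].rpartition(".")
    target = getattr(importlib.import_module(module_path), attribute)
    return target(**kwargs)


# Same shape as the YAML above: a `_target_` plus constructor arguments.
half = instantiate_sketch(
    {"_target_": "fractions.Fraction", "numerator": 1, "denominator": 2}
)
```

Under this reading, the config above would import `fusion_bench.method.my_method` and call `MyMergingAlgorithm(scaling_coefficient=1.0)`.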

Step 4: Run

fusion_bench method=my_method modelpool=clip-vit-base-patch32

Model Pool Configuration

CLIP Models

# config/modelpool/clip-vit-base-patch32.yaml
_target_: fusion_bench.modelpool.CLIPVisionModelPool
model_names:
  - _base_
  - Cars
  - DTD
  - EuroSAT
  - GTSRB
  - MNIST
  - RESISC45
  - SUN397
  - SVHN
model_dir: ${oc.env:HOME}/.cache/fusion_bench/models

LLM Models

# config/modelpool/llama2-7b.yaml
_target_: fusion_bench.modelpool.CausalLMPool
model_names:
  - _base_
  - arc
  - hellaswag
  - mmlu
model_dir: ${oc.env:HOME}/.cache/fusion_bench/llama_models
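
The `${oc.env:HOME}` syntax in `model_dir` is an OmegaConf interpolation that reads an environment variable. The sketch below imitates that expansion with a regular expression to show what the final path looks like; `expand_env` and the `FB_DEMO_HOME` variable are hypothetical, and the real resolver supports more (for example default values):

```python
import os
import re

# Matches ${oc.env:NAME} where NAME is a plain environment variable name.
_ENV_PATTERN = re.compile(r"\$\{oc\.env:([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str) -> str:
    """Replace each ${oc.env:NAME} with os.environ[NAME]; raises KeyError if unset."""
    return _ENV_PATTERN.sub(lambda match: os.environ[match.group(1)], value)


os.environ["FB_DEMO_HOME"] = "/home/demo"  # demo variable, set only for this example
model_dir = expand_env("${oc.env:FB_DEMO_HOME}/.cache/fusion_bench/models")
```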

Utilities

State Dict Arithmetic

from fusion_bench.utils.state_dict_arithmetic import StateDict

# Convenient operations on state dicts
sd1 = StateDict(model1.state_dict())
sd2 = StateDict(model2.state_dict())

merged = sd1 + sd2           # Add
diff = sd1 - sd2             # Subtract
scaled = sd1 * 0.5           # Scale
tv_merged = sd1 + 0.3 * sd2  # Linear combination
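
For readers curious how such operators can work, here is a toy sketch of the pattern using operator overloading on a `dict` subclass with float values. `ToyStateDict` is a hypothetical illustration written for this example and is not the library's `StateDict` implementation:

```python
class ToyStateDict(dict):
    """A dict of floats supporting element-wise +, -, and scalar *."""

    def __add__(self, other):
        return ToyStateDict({key: self[key] + other[key] for key in self})

    def __sub__(self, other):
        return ToyStateDict({key: self[key] - other[key] for key in self})

    def __mul__(self, scalar):
        return ToyStateDict({key: value * scalar for key, value in self.items()})

    # Allows `0.5 * sd` in addition to `sd * 0.5`.
    __rmul__ = __mul__


toy1 = ToyStateDict(w=1.0, b=2.0)
toy2 = ToyStateDict(w=3.0, b=4.0)

toy_diff = toy2 - toy1
toy_combo = toy1 + 0.5 * toy2
```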

Lazy State Dict

from fusion_bench.utils.lazy_state_dict import LazyStateDict

# Load large models without OOM
lazy_sd = LazyStateDict.from_file("model.safetensors")
# Only loads tensors when accessed
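
The load-on-access idea can be sketched with a read-only mapping that calls a loader function the first time a key is requested. `LazyMapping` and `fake_loader` are hypothetical names for this illustration; the actual `LazyStateDict` reads tensors from checkpoint files and may handle caching differently:

```python
from collections.abc import Mapping


class LazyMapping(Mapping):
    """Read-only mapping that loads each value on first access."""

    def __init__(self, keys, loader):
        self._keys = list(keys)
        self._loader = loader
        self._cache = {}

    def __getitem__(self, key):
        if key not in self._keys:
            raise KeyError(key)
        if key not in self._cache:
            self._cache[key] = self._loader(key)  # load only when needed
        return self._cache[key]

    def __contains__(self, key):
        # Membership tests should not trigger a load.
        return key in self._keys

    def __iter__(self):
        return iter(self._keys)

    def __len__(self):
        return len(self._keys)


load_calls = []


def fake_loader(name):
    """Stand-in for reading one tensor from disk."""
    load_calls.append(name)
    return [0.0] * 4


lazy = LazyMapping(["layer1.weight", "layer2.weight"], fake_loader)
first = lazy["layer1.weight"]
again = lazy["layer1.weight"]  # served from the cache, no second load
```

Caching every accessed value trades memory for speed; for very large models an implementation might evict or skip the cache instead.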

Common Workflows

1. Evaluate a single merged model

from fusion_bench import AutoModelPool
from fusion_bench.method import SimpleAverageAlgorithm

pool = AutoModelPool.from_config("config/modelpool/clip-vit-base-patch32.yaml")
method = SimpleAverageAlgorithm()
merged_model = method.run(pool)

# Evaluate on tasks
# NOTE: `evaluate` is a placeholder for your own evaluation function; it is not defined here.
for task_name in pool.model_names:
    if task_name == "_base_":
        continue
    acc = evaluate(merged_model, task_name)
    print(f"{task_name}: {acc:.2%}")
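
The `evaluate` call in the loop above is not defined in the snippet. One possible shape for such a helper is a top-1 accuracy function over labelled examples, sketched below; `evaluate_accuracy`, its `predict` callable, and the toy data are assumptions for illustration rather than part of FusionBench:

```python
from typing import Any, Callable, Iterable, Tuple


def evaluate_accuracy(
    predict: Callable[[Any], Any], examples: Iterable[Tuple[Any, Any]]
) -> float:
    """Return the fraction of (input, label) pairs where predict(input) == label."""
    examples = list(examples)
    if not examples:
        raise ValueError("examples must not be empty")
    correct = sum(1 for inputs, label in examples if predict(inputs) == label)
    return correct / len(examples)


# Toy "model": predicts the parity of an integer input.
toy_examples = [(1, 1), (2, 0), (3, 0), (4, 0)]
toy_acc = evaluate_accuracy(lambda x: x % 2, toy_examples)
toy_report = f"toy task: {toy_acc:.2%}"
```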

2. Hyperparameter search

# Sweep scaling coefficient
for coeff in 0.2 0.4 0.6 0.8 1.0; do
  fusion_bench \
    method=task_arithmetic \
    method.scaling_coefficient=$coeff \
    modelpool=clip-vit-base-patch32
done
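
The same sweep can be driven from Python, which makes it easier to collect results afterwards. The sketch below only builds and prints the command lines, using the override names from the shell loop above; `build_command` is a hypothetical helper:

```python
import shlex
from typing import Dict, List


def build_command(overrides: Dict[str, object]) -> List[str]:
    """Turn a dict of Hydra-style overrides into a fusion_bench argument list."""
    return ["fusion_bench"] + [f"{key}={value}" for key, value in overrides.items()]


commands = [
    build_command(
        {
            "method": "task_arithmetic",
            "method.scaling_coefficient": coeff,
            "modelpool": "clip-vit-base-patch32",
        }
    )
    for coeff in (0.2, 0.4, 0.6, 0.8, 1.0)
]

for command in commands:
    # Printed for inspection; nothing is executed in this sketch.
    print(shlex.join(command))
```

To launch each run, pass the list to `subprocess.run(command, check=True)` from the standard library.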

3. Compare multiple methods

for method in simple_average task_arithmetic ties_merging dare; do
  echo "=== $method ==="
  fusion_bench \
    method=$method \
    modelpool=clip-vit-base-patch32 \
    taskpool=clip-vit-base-patch32_8tasks
done

Tips

Memory: Use fabric=deepspeed_stage_2 for large models
Caching: Models are cached in ~/.cache/fusion_bench/
Reproducibility: Set seed=42 in config
Debugging: Use hydra.verbose=true for detailed logs
Web UI: Run fusion_bench_webui for interactive exploration

Related Papers

FusionBench (arXiv:2406.03280) - The benchmark paper
SMILE (arXiv:2408.10174) - Sparse MoE from pre-trained models
WE-MoE - Weight Ensembling MoE for multi-task merging
OPCM/DOP - Continual model merging methods
RegMean++ (arXiv:2508.03121) - Enhanced RegMean

Security Recommendations

This skill is coherent for running FusionBench experiments, but take these precautions before installing or running it:
- Verify the PyPI package and source repository: check the fusion-bench package page on PyPI, confirm the package owner, review the package files, and inspect the linked repository (the SKILL.md repo is on code.tanganke.com rather than GitHub). Malicious packages can be distributed via PyPI.
- Inspect the code before installing, or run the installation in a sandbox/container. pip install will download and run code on your machine.
- Expect large downloads and heavy compute: merging LLMs and CLIP models can require substantial disk, memory, and possibly cloud/GPU resources. Ensure you understand where models will be pulled from (local paths vs. model hubs) and whether tokens/keys are needed.
- If you'll load models from model hubs (Hugging Face, private storage), ensure any access tokens are granted only to trusted code and revoke them if unsure.
- If you need higher assurance, ask the publisher for source verification (a public VCS like GitHub with tags/releases) or request a signed release. If you lack the ability to audit the package, consider running it in an isolated environment or using a vetted alternative.

Functional Analysis

Type: OpenClaw Skill
Name: fusion-bench
Version: 1.0.0

The fusion-bench skill bundle is a legitimate integration for the FusionBench model merging toolkit (arXiv:2406.03280). The SKILL.md file contains standard documentation, installation instructions via pip, and CLI usage examples consistent with the tool's purpose. There are no signs of data exfiltration, malicious execution, or prompt injection attacks.

Capability Assessment

✓ Purpose & Capability

Name, description, and SKILL.md all describe running model-fusion experiments and adding merging algorithms; nothing requested by the skill (no env vars, no unusual binaries, no config paths) appears unrelated to that purpose.

✓ Instruction Scope

SKILL.md contains CLI usage, example commands, and code snippets for adding methods. It does not instruct the agent to read unrelated files, exfiltrate data, or access unrelated system credentials. It does assume loading model weights and optionally using distributed runtimes (deepspeed/Fabric), which is consistent with the task.

ℹ Install Mechanism

The skill is instruction-only (no install spec), but the docs instruct the user to 'pip install fusion-bench' (PyPI) and link to a repo hosted at code.tanganke.com rather than a well-known host. Installing the PyPI package will execute code from an external source — verify the PyPI package and repository before installing.

ℹ Credentials

The skill declares no required environment variables and the instructions do not request secrets. However, at runtime loading certain models (e.g., Llama variants or models on huggingface.co) or using cloud/deepspeed could require access tokens, cloud credentials, or large compute resources; these are not requested by the skill itself but could be needed by the underlying tooling.

✓ Persistence & Privilege

The skill is not always-enabled, is user-invocable, and has no install spec or code that would modify other skills or agent-wide settings. It does not request persistent privileges.

Version History

v1.0.0

FusionBench skill v1.0.0 initial release:
- Provides a comprehensive toolkit for deep model fusion and benchmarking.
- Supports 30+ merging algorithms (simple average, TIES, AdaMerging, MoE-based, continual, specialized, and more).
- Enables benchmarking and evaluation for CLIP models and LLMs on a wide variety of tasks.
- Includes utilities for state dict arithmetic and lazy loading for large model files.
- Offers clear architecture, extensibility guides, and step-by-step instructions for adding new model merging methods.

Metadata

Slug fusion-bench

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Number of versions 1

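FAQ

What is fusion-bench?

Use FusionBench to run model fusion experiments. Covers running benchmarks, adding new merging algorithms, evaluating fused models, and managing model pools.... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 205 cumulative downloads to date.

How do I install fusion-bench?

Run the command "/install fusion-bench" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.

Is fusion-bench free?

Yes. fusion-bench is completely free and is released under the MIT-0 license, so it can be freely downloaded, installed, and used.

Which platforms does fusion-bench support?

fusion-bench is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops fusion-bench?

It is developed and maintained by tanganke (@tanganke); the current version is v1.0.0.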
