Description

Use FusionBench to run model fusion experiments. Covers running benchmarks, adding new merging algorithms, evaluating fused models, and managing model pools....

README (SKILL.md)

FusionBench Skill

Name: fusion-bench
Author: tanganke

FusionBench is a comprehensive benchmark/toolkit for deep model fusion (model merging).

Paper: arXiv:2406.03280
PyPI: pip install fusion-bench
Repo: https://code.tanganke.com/tanganke/fusion_bench
Docs: https://tanganke.github.io/fusion_bench/

Quick Start

# Install
pip install fusion-bench

# Run a simple experiment (CLIP ViT-B/32, task arithmetic on 8 tasks)
fusion_bench method=task_arithmetic modelpool=clip-vit-base-patch32 taskpool=clip-vit-base-patch32_8tasks

# Run with a different merging method
fusion_bench method=ties_merging modelpool=clip-vit-base-patch32 taskpool=clip-vit-base-patch32_8tasks

Architecture Overview

fusion_bench/
├── method/           # Merging algorithms (30+)
├── modelpool/        # Model loading & management
├── config/           # Hydra YAML configs
├── tasks/            # Task evaluation
├── utils/            # Helpers (state_dict ops, lazy loading, etc.)
└── scripts/          # CLI & web UI

Key Components

ModelPool: Loads and manages pre-trained/fine-tuned models
- AutoModelPool: Auto-selects based on config
- CLIPVisionModelPool: For CLIP ViT models
- CausalLMPool: For Llama, GPT-2, etc.
Method: The merging algorithm
- Inherits from BaseModelFusionAlgorithm
- Implements run(modelpool) → merged model
TaskPool: Evaluation tasks
- CLIP: 8-38 classification tasks
- LLM: ARC, HellaSwag, MMLU, etc.
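The relationship between a model pool and a method can be sketched with toy stand-ins. The classes below (`ToyModelPool`, `ToySimpleAverage`) are hypothetical and are not part of FusionBench; they only mirror the `run(modelpool)` contract described above, with plain Python lists in place of tensors.

```python
from typing import Dict, List

ToyModel = Dict[str, List[float]]  # parameter name -> flat list of weights


class ToyModelPool:
    """Hypothetical stand-in for a model pool: holds models by name."""

    def __init__(self, models: Dict[str, ToyModel]):
        self._models = models

    @property
    def model_names(self) -> List[str]:
        return list(self._models)

    def load_model(self, name: str) -> ToyModel:
        # Return a copy so callers can modify the result freely.
        return {k: list(v) for k, v in self._models[name].items()}


class ToySimpleAverage:
    """Hypothetical method: uniform average of every model in the pool."""

    def run(self, modelpool: ToyModelPool) -> ToyModel:
        models = [modelpool.load_model(n) for n in modelpool.model_names]
        merged: ToyModel = {}
        for key in models[0]:
            # Average each parameter position across all models.
            columns = zip(*(m[key] for m in models))
            merged[key] = [sum(col) / len(models) for col in columns]
        return merged


pool = ToyModelPool({
    "task_a": {"w": [1.0, 2.0]},
    "task_b": {"w": [3.0, 6.0]},
})
merged = ToySimpleAverage().run(pool)
print(merged)
```

The point of the split is that the method never needs to know where the weights come from; it only asks the pool for names and models.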

Supported Merging Methods

Basic

Method	Config Name	Description
Simple Average	`simple_average`	Uniform weight averaging
Weighted Average	`weighted_average`	Learnable task weights
Task Arithmetic	`task_arithmetic`	task_vector = fine-tuned - base
Slerp	`slerp`	Spherical interpolation
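As a rough illustration of the Task Arithmetic row, the sketch below computes task vectors (fine-tuned minus base) and adds their scaled sum back to the base weights. It uses plain lists instead of tensors, and the helper names (`task_vector`, `task_arithmetic`) are made up for this example rather than taken from FusionBench.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # parameter name -> flat list of values


def task_vector(finetuned: Weights, base: Weights) -> Weights:
    """task_vector = fine-tuned - base, computed per parameter."""
    return {k: [f - b for f, b in zip(finetuned[k], base[k])] for k in base}


def task_arithmetic(
    base: Weights, finetuned_models: List[Weights], scaling_coefficient: float
) -> Weights:
    """merged = base + scaling_coefficient * sum(task vectors)."""
    merged = {k: list(v) for k, v in base.items()}
    for finetuned in finetuned_models:
        tv = task_vector(finetuned, base)
        for k in merged:
            merged[k] = [
                m + scaling_coefficient * d for m, d in zip(merged[k], tv[k])
            ]
    return merged


base = {"w": [1.0, 1.0]}
model_a = {"w": [2.0, 1.0]}  # task vector [1.0, 0.0]
model_b = {"w": [1.0, 3.0]}  # task vector [0.0, 2.0]
merged = task_arithmetic(base, [model_a, model_b], scaling_coefficient=0.5)
print(merged)
```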

Sparse/Pruning

Method	Config Name	Description
TIES	`ties_merging`	Trim, Elect, Sign + merge
DARE	`dare`	Drop And REscale
Magnitude Pruning	`magnitude_pruning`	Prune by magnitude
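The TIES and DARE rows can be made concrete with a small sketch on flat task vectors. This is a simplified reading of the two ideas (trim by magnitude, elect a sign per parameter, average the agreeing values; randomly drop entries and rescale the survivors), not FusionBench's implementation. The function names and the `keep_ratio` and `drop_rate` parameters are assumptions made for this example.

```python
import random
from typing import List


def trim(tv: List[float], keep_ratio: float) -> List[float]:
    """Keep the largest-magnitude entries and zero the rest."""
    k = max(1, int(len(tv) * keep_ratio))
    threshold = sorted((abs(x) for x in tv), reverse=True)[k - 1]
    # Entries tied with the threshold are all kept.
    return [x if abs(x) >= threshold else 0.0 for x in tv]


def ties_merge(task_vectors: List[List[float]], keep_ratio: float) -> List[float]:
    """Trim each task vector, elect a sign per position, average agreeing values."""
    trimmed = [trim(tv, keep_ratio) for tv in task_vectors]
    merged = []
    for column in zip(*trimmed):
        sign = 1.0 if sum(column) >= 0 else -1.0  # elected sign
        agreeing = [x for x in column if x * sign > 0]
        merged.append(sum(agreeing) / len(agreeing) if agreeing else 0.0)
    return merged


def dare(tv: List[float], drop_rate: float, rng: random.Random) -> List[float]:
    """Drop each entry with probability drop_rate, rescale survivors by 1/(1-p)."""
    scale = 1.0 / (1.0 - drop_rate)
    return [0.0 if rng.random() < drop_rate else x * scale for x in tv]


tv_a = [4.0, -1.0, 0.5, 3.0]
tv_b = [-2.0, -3.0, 0.2, 1.0]
ties_result = ties_merge([tv_a, tv_b], keep_ratio=0.5)
print(ties_result)

dare_result = dare([1.0] * 8, drop_rate=0.5, rng=random.Random(0))
print(dare_result)
```

In both cases the result is still a task vector: it would be scaled and added back to the base weights, as in task arithmetic.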

Advanced

Method	Config Name	Description
AdaMerging	`adamerging`	Learn layer-wise coefficients
Fisher Merging	`fisher_merging`	Fisher-weighted merging
RegMean	`regmean`	Regression mean (closed-form)
RegMean++	`regmean_plusplus`	Enhanced RegMean with cross-layer deps
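For Fisher-weighted merging, a common simplification is a diagonal Fisher approximation, where each parameter is averaged across models with weights proportional to its per-parameter Fisher value. The sketch below assumes that diagonal form and takes the Fisher values as given inputs; estimating them from data is out of scope here, and the function name is hypothetical.

```python
from typing import List


def fisher_merge(params: List[List[float]], fishers: List[List[float]]) -> List[float]:
    """Per-parameter weighted average: sum_i F_i * theta_i / sum_i F_i."""
    merged = []
    for values, weights in zip(zip(*params), zip(*fishers)):
        total = sum(weights)
        if total == 0:
            # No model considers this parameter important: fall back to the mean.
            merged.append(sum(values) / len(values))
        else:
            merged.append(sum(w * v for w, v in zip(weights, values)) / total)
    return merged


# Two models with two parameters each, plus their (assumed) Fisher values.
params = [[1.0, 0.0], [3.0, 4.0]]
fishers = [[1.0, 0.0], [3.0, 1.0]]
fisher_result = fisher_merge(params, fishers)
print(fisher_result)
```

With equal Fisher values for every model, this reduces to the simple average.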

MoE-Based

Method	Config Name	Description
WE-MoE	`we_moe`	Weight Ensembling MoE
PWE-MoE	`pwe_moe`	Pareto-optimal WE-MoE
RankOne-MoE	`rankone_moe`	Rank-1 expert decomposition
Sparse-WE-MoE	`sparse_we_moe`	Sparse weight ensembling

Continual Merging

Method	Config Name	Description
OPCM	`opcm`	Orthogonal Projection Continual Merging
DOP	`dop`	Dual Orthogonal Projection
Gossip	`gossip`	Gossip-based continual merging

Specialized

Method	Config Name	Description
ISO-C/CTS	`isotropic_merging`	Isotropic merging in common/task subspace
AdaSVD	`ada_svd`	SVD-based adaptive merging
WUDI	`wudi`	Data-free merging that reduces task-vector interference
ExPO	`expo`	Model extrapolation (weak-to-strong)

Running Experiments

1. Basic Merging (CLI)

# Task Arithmetic on CLIP ViT-B/32
fusion_bench \
  method=task_arithmetic \
  modelpool=clip-vit-base-patch32 \
  taskpool=clip-vit-base-patch32_8tasks

# TIES merging with custom scaling
fusion_bench \
  method=ties_merging \
  method.scaling_coefficient=0.3 \
  modelpool=clip-vit-base-patch32 \
  taskpool=clip-vit-base-patch32_8tasks

2. LLM Merging

# Merge Llama models
fusion_bench \
  method=task_arithmetic \
  modelpool=llama2-7b \
  taskpool=llama2-7b_tasks

# With DARE
fusion_bench \
  method=dare \
  method.type=task_arithmetic \
  modelpool=llama2-7b

3. Using Fabric (Distributed/Mixed Precision)

fusion_bench \
  fabric=deepspeed_stage_2 \
  method=adamerging \
  modelpool=clip-vit-base-patch32

Adding a New Method

Step 1: Create method file

# fusion_bench/method/my_method.py
from fusion_bench.method.base_algorithm import BaseModelFusionAlgorithm
from fusion_bench.modelpool import BaseModelPool
import torch

class MyMergingAlgorithm(BaseModelFusionAlgorithm):
    """
    My custom merging algorithm.
    """
    def __init__(self, scaling_coefficient: float = 1.0, **kwargs):
        super().__init__(**kwargs)
        self.scaling_coefficient = scaling_coefficient
    
    @torch.no_grad()
    def run(self, modelpool: BaseModelPool):
        # 1. Load base model
        base_model = modelpool.load_model("_base_")
        base_sd = base_model.state_dict()
        
        # 2. Compute merged task vectors
        merged_tv = {}
        for model_name in modelpool.model_names:
            if model_name == "_base_":
                continue
            model = modelpool.load_model(model_name)
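            # Skip integer buffers (e.g. step counters): a float-scaled delta
            # cannot be added back in place to an integer tensor.
            tv = {
                k: v - base_sd[k]
                for k, v in model.state_dict().items()
                if v.is_floating_point()
            }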
            # Your merging logic here
            for k in tv:
                if k not in merged_tv:
                    merged_tv[k] = tv[k] * self.scaling_coefficient
                else:
                    merged_tv[k] += tv[k] * self.scaling_coefficient
        
        # 3. Apply merged task vector
        for k in base_sd:
            base_sd[k] += merged_tv.get(k, 0)
        
        base_model.load_state_dict(base_sd)
        return base_model

Step 2: Register in `__init__.py`

# fusion_bench/method/__init__.py
_import_structure = {
    ...
    "my_method": ["MyMergingAlgorithm"],
}

Step 3: Create config

# config/method/my_method.yaml
_target_: fusion_bench.method.my_method.MyMergingAlgorithm
scaling_coefficient: 1.0
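The `_target_` key follows the Hydra convention: it names a dotted import path, and the remaining keys are passed to it as keyword arguments. The toy resolver below illustrates that idea with the standard library only. It is not Hydra's actual code, and `fractions.Fraction` is used purely as a convenient importable target.

```python
import importlib
from typing import Any, Dict


def instantiate(config: Dict[str, Any]) -> Any:
    """Toy version of Hydra-style instantiation (illustration only)."""
    kwargs = dict(config)  # copy so the caller's config is not mutated
    module_path, _, attr = kwargs.pop("_target_").rpartition(".")
    target = getattr(importlib.import_module(module_path), attr)
    return target(**kwargs)


config = {"_target_": "fractions.Fraction", "numerator": 3, "denominator": 4}
obj = instantiate(config)
print(obj)
```

Read this way, the YAML above amounts to calling `MyMergingAlgorithm(scaling_coefficient=1.0)`, which is why the config keys need to match the constructor's parameter names.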

Step 4: Run

fusion_bench method=my_method modelpool=clip-vit-base-patch32

Model Pool Configuration

CLIP Models

# config/modelpool/clip-vit-base-patch32.yaml
_target_: fusion_bench.modelpool.CLIPVisionModelPool
model_names:
  - _base_
  - Cars
  - DTD
  - EuroSAT
  - GTSRB
  - MNIST
  - RESISC45
  - SUN397
  - SVHN
model_dir: ${oc.env:HOME}/.cache/fusion_bench/models

LLM Models

# config/modelpool/llama2-7b.yaml
_target_: fusion_bench.modelpool.CausalLMPool
model_names:
  - _base_
  - arc
  - hellaswag
  - mmlu
model_dir: ${oc.env:HOME}/.cache/fusion_bench/llama_models

Utilities

State Dict Arithmetic

from fusion_bench.utils.state_dict_arithmetic import StateDict

# Convenient operations on state dicts
sd1 = StateDict(model1.state_dict())
sd2 = StateDict(model2.state_dict())

merged = sd1 + sd2           # Add
diff = sd1 - sd2             # Subtract
scaled = sd1 * 0.5           # Scale
tv_merged = sd1 + 0.3 * sd2  # Linear combination
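To show how such arithmetic can work, here is a minimal operator-overloading wrapper. `ToyStateDict` is a hypothetical class written for this example, with scalar values instead of tensors; it says nothing about how FusionBench's own class is implemented.

```python
from typing import Dict


class ToyStateDict:
    """Hypothetical dict wrapper supporting +, -, and scalar *."""

    def __init__(self, data: Dict[str, float]):
        self.data = dict(data)

    def __add__(self, other: "ToyStateDict") -> "ToyStateDict":
        return ToyStateDict({k: v + other.data[k] for k, v in self.data.items()})

    def __sub__(self, other: "ToyStateDict") -> "ToyStateDict":
        return ToyStateDict({k: v - other.data[k] for k, v in self.data.items()})

    def __mul__(self, scalar: float) -> "ToyStateDict":
        return ToyStateDict({k: v * scalar for k, v in self.data.items()})

    # Needed so that `0.5 * sd` works as well as `sd * 0.5`.
    __rmul__ = __mul__


sd1 = ToyStateDict({"w": 1.0, "b": 2.0})
sd2 = ToyStateDict({"w": 3.0, "b": -2.0})

combined = sd1 + 0.5 * sd2
diff = sd1 - sd2
print(combined.data, diff.data)
```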

Lazy State Dict

from fusion_bench.utils.lazy_state_dict import LazyStateDict

# Load large models without OOM
lazy_sd = LazyStateDict.from_file("model.safetensors")
# Only loads tensors when accessed
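The load-on-access idea can be sketched as a read-only mapping that knows its keys up front but only calls a loader when a key is read. `ToyLazyStateDict` and `fake_loader` are hypothetical names for this example; a real implementation would read tensors from a checkpoint file instead.

```python
from collections.abc import Mapping
from typing import Callable, Dict, Iterator, List


class ToyLazyStateDict(Mapping):
    """Hypothetical mapping that loads each value on first access."""

    def __init__(self, keys: List[str], loader: Callable[[str], List[float]]):
        self._keys = list(keys)
        self._loader = loader
        self._cache: Dict[str, List[float]] = {}

    def __getitem__(self, key: str) -> List[float]:
        if key not in self._keys:
            raise KeyError(key)
        if key not in self._cache:
            self._cache[key] = self._loader(key)  # load on first access only
        return self._cache[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._keys)  # listing keys does not load any values

    def __len__(self) -> int:
        return len(self._keys)


loaded: List[str] = []  # records which keys were actually loaded


def fake_loader(key: str) -> List[float]:
    loaded.append(key)
    return [0.0] * 4


lazy_sd = ToyLazyStateDict(["encoder.weight", "head.weight"], fake_loader)
names = list(lazy_sd)            # no loading yet
head = lazy_sd["head.weight"]    # loads one entry
head_again = lazy_sd["head.weight"]  # served from the cache
print(names, loaded)
```

This toy version keeps loaded values in a cache for simplicity; when the goal is to stay within memory limits, evicting or not caching entries may be the better choice.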

Common Workflows

1. Evaluate a single merged model

from fusion_bench import AutoModelPool
from fusion_bench.method import SimpleAverageAlgorithm

pool = AutoModelPool.from_config("config/modelpool/clip-vit-base-patch32.yaml")
method = SimpleAverageAlgorithm()
merged_model = method.run(pool)

# Evaluate on tasks (`evaluate` is a placeholder you must supply; it is not defined in this snippet)
for task_name in pool.model_names:
    if task_name == "_base_":
        continue
    acc = evaluate(merged_model, task_name)
    print(f"{task_name}: {acc:.2%}")
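The `evaluate` call in the loop above is left undefined. One way to picture what it has to return is a plain accuracy function over labeled samples. The `accuracy` helper and the threshold "model" below are hypothetical stand-ins, not FusionBench's task pool evaluation.

```python
from typing import Callable, Iterable, Tuple


def accuracy(
    predict: Callable[[float], int], samples: Iterable[Tuple[float, int]]
) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    correct = 0
    total = 0
    for features, label in samples:
        correct += int(predict(features) == label)
        total += 1
    return correct / total if total else 0.0


# Toy data: (feature, label) pairs and a threshold classifier as the "model".
samples = [(0.2, 0), (0.7, 1), (0.9, 1), (0.4, 1)]
acc = accuracy(lambda x: int(x > 0.5), samples)
print(f"toy task: {acc:.2%}")
```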

2. Hyperparameter search

# Sweep scaling coefficient
for coeff in 0.2 0.4 0.6 0.8 1.0; do
  fusion_bench \
    method=task_arithmetic \
    method.scaling_coefficient=$coeff \
    modelpool=clip-vit-base-patch32
done
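The same sweep can be driven from Python by building one list of `key=value` overrides per run. The `build_command` helper below is hypothetical. It only assembles and prints the commands, and the `subprocess` call is left commented out so nothing is launched by accident.

```python
import shlex
from typing import Dict, List


def build_command(overrides: Dict[str, object]) -> List[str]:
    """Turn a dict of overrides into a fusion_bench argument list."""
    return ["fusion_bench"] + [f"{key}={value}" for key, value in overrides.items()]


commands = [
    build_command({
        "method": "task_arithmetic",
        "method.scaling_coefficient": coeff,
        "modelpool": "clip-vit-base-patch32",
    })
    for coeff in (0.2, 0.4, 0.6, 0.8, 1.0)
]

for cmd in commands:
    print(shlex.join(cmd))
    # To actually launch each run, uncomment:
    # import subprocess; subprocess.run(cmd, check=True)
```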

3. Compare multiple methods

for method in simple_average task_arithmetic ties_merging dare; do
  echo "=== $method ==="
  fusion_bench \
    method=$method \
    modelpool=clip-vit-base-patch32 \
    taskpool=clip-vit-base-patch32_8tasks
done

Tips

Memory: Use fabric=deepspeed_stage_2 for large models
Caching: Models are cached in ~/.cache/fusion_bench/
Reproducibility: Set seed=42 in config
Debugging: Use hydra.verbose=true for detailed logs
Web UI: Run fusion_bench_webui for interactive exploration

Related Papers

FusionBench (arXiv:2406.03280) - The benchmark paper
SMILE (arXiv:2408.10174) - Sparse MoE from pre-trained models
WE-MoE - Weight Ensembling MoE for multi-task merging
OPCM/DOP - Continual model merging methods
RegMean++ (arXiv:2508.03121) - Enhanced RegMean

Usage Guidance

This skill is coherent for running FusionBench experiments, but take these precautions before installing or running it:

- Verify the PyPI package and source repository: check the fusion-bench package page on PyPI, confirm the package owner, review the package files, and inspect the linked repository (the SKILL.md repo is on code.tanganke.com rather than GitHub). Malicious packages can be distributed via PyPI.
- Inspect the code before installing, or run the installation in a sandbox/container. pip install will download and run code on your machine.
- Expect large downloads and heavy compute: merging LLMs and CLIP models can require substantial disk, memory, and possibly cloud/GPU resources. Ensure you understand where models will be pulled from (local paths vs. model hubs) and whether tokens/keys are needed.
- If you'll load models from model hubs (Hugging Face, private storage), ensure any access tokens are granted only to trusted code and revoke them if unsure.
- If you need higher assurance, ask the publisher for source verification (a public VCS like GitHub with tags/releases) or request a signed release. If you lack the ability to audit the package, consider running it in an isolated environment or using a vetted alternative.

Capability Analysis

Type: OpenClaw Skill
Name: fusion-bench
Version: 1.0.0

The fusion-bench skill bundle is a legitimate integration for the FusionBench model merging toolkit (arXiv:2406.03280). The SKILL.md file contains standard documentation, installation instructions via pip, and CLI usage examples consistent with the tool's purpose. There are no signs of data exfiltration, malicious execution, or prompt injection attacks.

Capability Assessment

✓ Purpose & Capability

Name, description, and SKILL.md all describe running model-fusion experiments and adding merging algorithms; nothing requested by the skill (no env vars, no unusual binaries, no config paths) appears unrelated to that purpose.

✓ Instruction Scope

SKILL.md contains CLI usage, example commands, and code snippets for adding methods. It does not instruct the agent to read unrelated files, exfiltrate data, or access unrelated system credentials. It does assume loading model weights and optionally using distributed runtimes (deepspeed/Fabric), which is consistent with the task.

ℹ Install Mechanism

The skill is instruction-only (no install spec), but the docs instruct the user to 'pip install fusion-bench' (PyPI) and link to a repo hosted at code.tanganke.com rather than a well-known host. Installing the PyPI package will execute code from an external source — verify the PyPI package and repository before installing.

ℹ Credentials

The skill declares no required environment variables and the instructions do not request secrets. However, at runtime loading certain models (e.g., Llama variants or models on huggingface.co) or using cloud/deepspeed could require access tokens, cloud credentials, or large compute resources; these are not requested by the skill itself but could be needed by the underlying tooling.

✓ Persistence & Privilege

Skill is not always-enabled, is user-invocable, has no install spec or code that would modify other skills or agent-wide settings. It does not request persistent privileges.

Version History

v1.0.0

FusionBench skill v1.0.0 initial release:

- Provides a comprehensive toolkit for deep model fusion and benchmarking.
- Supports 30+ merging algorithms (simple average, TIES, AdaMerging, MoE-based, continual, specialized, and more).
- Enables benchmarking and evaluation for CLIP models and LLMs on a wide variety of tasks.
- Includes utilities for state dict arithmetic and lazy loading for large model files.
- Offers clear architecture, extensibility guides, and step-by-step instructions for adding new model merging methods.

Metadata

Slug fusion-bench

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is fusion-bench?

fusion-bench is an AI agent skill for Claude Code / OpenClaw that uses FusionBench to run model fusion experiments. It covers running benchmarks, adding new merging algorithms, evaluating fused models, and managing model pools. It has 205 downloads so far.

How do I install fusion-bench?

Run "/install fusion-bench" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is fusion-bench free?

Yes, fusion-bench is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does fusion-bench support?

fusion-bench is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created fusion-bench?

It is built and maintained by tanganke (@tanganke); the current version is v1.0.0.
