SubFlow: Sub-mode Conditioned Flow Matching for Diverse One-Step Generation

Yexiong Lin1, Jia Shi2, Shanshan Ye3, Wanyu Wang4, Yu Yao1, Tongliang Liu1
1Sydney AI Centre, The University of Sydney    2Xidian University    3University of Technology Sydney    4City University of Hong Kong
(a) Traditional FM (100 steps)    (b) MeanFlow (1 step)    (c) SubFlow (1 step)

Toy experiment on a 4-peak Gaussian mixture with imbalanced subclusters. (a) Traditional flow matching (100 steps) covers all modes. (b) MeanFlow suffers from severe mode collapse, concentrating only on dominant subclusters. (c) Our SubFlow restores full mode coverage in a single step by conditioning on sub-mode indices.

Abstract

Flow matching has emerged as a powerful generative framework, with recent few-step methods achieving remarkable inference acceleration. However, we identify a critical yet overlooked limitation: these models suffer from severe diversity degradation, concentrating samples on dominant modes while neglecting rare but valid variations of the target distribution. We trace this degradation to averaging distortion: when trained with MSE objectives, class-conditional flows learn a frequency-weighted mean over intra-class sub-modes, causing the model to over-represent high-density modes while systematically neglecting low-density ones.

To address this, we propose SubFlow (Sub-mode Conditioned Flow Matching), which eliminates averaging distortion by decomposing each class into fine-grained sub-modes via semantic clustering and conditioning the flow on sub-mode indices. Each conditioned sub-distribution is approximately unimodal, so the learned flow accurately targets individual modes with no averaging distortion, restoring full mode coverage in a single inference step. Crucially, SubFlow is entirely plug-and-play: it integrates seamlessly into existing one-step models such as MeanFlow and Shortcut Models without any architectural modifications. Extensive experiments on ImageNet-256 demonstrate that SubFlow yields substantial gains in generation diversity (Recall) while maintaining competitive image quality (FID), confirming its broad applicability across different one-step generation frameworks.

Interactive Tutorial

We provide an interactive Colab notebook to walk you through the entire SubFlow pipeline on a 2D toy example. The tutorial covers standard flow matching, MeanFlow, and SubFlow — you can train the models and visualize mode collapse vs. diversity recovery in real time.

Open in Colab

Method

The core insight of SubFlow is that dominant-mode bias arises because the conditional mean velocity \(\mathbb{E}[x_1 - x_0 \mid x_t, t, c]\) must average over all sub-modes within class \(c\). By further conditioning on a sub-mode index \(k\), each sub-distribution \((c, k)\) becomes approximately unimodal, and the conditional mean accurately points to a specific mode with no averaging distortion:

\[ v_\theta^*(x_t, t, c, k) = \mathbb{E}[x_1 - x_0 \mid x_t, t, c, k] \]
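The averaging effect behind this equation can be checked numerically. The sketch below is our own toy illustration (not from the paper): a single class with two sub-modes at +2 and -2, with hypothetical 90/10 weights. The class-conditioned mean velocity lands between the modes (biased toward the dominant one), while sub-mode-conditioned means recover each mode exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical 1D class "c" with two sub-modes: a dominant one at +2
# (90% of samples) and a rare one at -2 (10%).
k = rng.random(n) < 0.9                       # sub-mode indicator
x1 = np.where(k, 2.0, -2.0) + 0.05 * rng.standard_normal(n)
x0 = rng.standard_normal(n)                   # source noise

# Class-conditioned mean velocity averages over both sub-modes:
v_class = np.mean(x1 - x0)                    # ≈ 0.9*2 + 0.1*(-2) = 1.6

# Sub-mode-conditioned mean velocities target each mode separately:
v_k1 = np.mean((x1 - x0)[k])                  # ≈ +2 (dominant sub-mode)
v_k0 = np.mean((x1 - x0)[~k])                 # ≈ -2 (rare sub-mode)

print(v_class, v_k1, v_k0)
```

Note that 1.6 is neither mode: a one-step map driven by this averaged velocity transports mass toward the dominant side, which is exactly the mode collapse seen in the toy figure above.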

SubFlow consists of three simple steps: (a) Offline pre-processing: extract semantic features (e.g., DINOv3) and cluster within each class to obtain sub-mode assignments; (b) Training: optimize the vector field \(v_\theta(x_t, t, c, k)\) conditioned on both class and sub-mode; (c) Inference: sample \(k \sim p(k \mid c)\) and generate with the sub-mode-conditioned flow.
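Steps (a) and (c) can be sketched as below. This is a minimal stand-in, not the paper's implementation: the 2D features with an 80/20 split substitute for real DINOv3 embeddings, and `two_means` is a deliberately simple k-means with deterministic farthest-point initialization.

```python
import numpy as np

rng = np.random.default_rng(0)

def two_means(feats, iters=10):
    """Minimal 2-cluster k-means: splits one class into sub-modes.
    Farthest-point init is robust when modes are well separated."""
    far = feats[np.argmax(np.linalg.norm(feats - feats[0], axis=1))]
    centers = np.stack([feats[0], far])
    for _ in range(iters):
        dists = np.linalg.norm(feats[:, None] - centers[None], axis=-1)
        assign = dists.argmin(axis=1)
        centers = np.stack([feats[assign == j].mean(axis=0) for j in range(2)])
    return assign

# Hypothetical semantic features for one class: two clusters in 2D,
# imbalanced 80/20 (a stand-in for DINOv3 features).
feats = np.concatenate([
    rng.standard_normal((800, 2)) * 0.1 + np.array([3.0, 0.0]),
    rng.standard_normal((200, 2)) * 0.1 + np.array([-3.0, 0.0]),
])

# (a) Offline pre-processing: sub-mode assignments k for this class.
k_assign = two_means(feats)

# (c) Inference: sample k ~ p(k | c) from the empirical cluster prior.
p_k = np.bincount(k_assign, minlength=2) / len(k_assign)
k_sample = rng.choice(2, size=10_000, p=p_k)

print(p_k.min(), p_k.max())
```

Sampling k from the empirical prior p(k | c) at inference time is what restores the rare sub-modes' share of generated samples.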


Overview of SubFlow. (a) Offline pre-processing: semantic features are extracted and clustered within each class. (b) Training: the vector field is optimized with class and sub-mode conditioning. For CFG, only the class label is dropped while sub-mode index is always retained. (c) Inference: a sub-mode index is sampled from the empirical prior, and the conditioned vector field generates a sample.
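The CFG detail in the caption (only the class label is dropped; the sub-mode index is always retained) might look like the following sketch. `NULL_CLASS` and `cfg_dropout` are hypothetical names, not from the paper.

```python
import numpy as np

NULL_CLASS = -1  # hypothetical null token for the unconditional branch

def cfg_dropout(c, k, p_drop, rng):
    """CFG-style label dropout as described in the caption: the class
    label c may be replaced by the null token with probability p_drop,
    but the sub-mode index k is never dropped."""
    c = np.asarray(c).copy()
    drop = rng.random(c.shape) < p_drop
    c[drop] = NULL_CLASS
    return c, np.asarray(k)  # sub-mode index passes through unchanged

rng = np.random.default_rng(0)
classes = np.arange(1000) % 10
submodes = np.arange(1000) % 4
c_out, k_out = cfg_dropout(classes, submodes, p_drop=0.2, rng=rng)
print((c_out == NULL_CLASS).mean())  # ≈ 0.2
```

Keeping k in both CFG branches means the guidance direction contrasts class information only, so guidance never pushes samples back toward a class-level average over sub-modes.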

Results

Main Results on ImageNet-256


Qualitative Comparison


Qualitative comparison between MeanFlow (left) and MeanFlow + SubFlow (right) on ImageNet-256. Green boxes highlight samples where SubFlow produces visibly higher image quality with sharper details and fewer artifacts.

Diverse Generation from the Same Noise


First column: MeanFlow output from a fixed noise \(x_0\). Columns 2–6: MeanFlow + SubFlow with different sub-mode indices \(k\). Despite sharing the same noise, the generated images exhibit clearly distinct visual styles.

BibTeX

@article{lin2026subflow,
  title={SubFlow: Sub-mode Conditioned Flow Matching for Diverse One-Step Generation},
  author={Yexiong Lin and Jia Shi and Shanshan Ye and Wanyu Wang and Yu Yao and Tongliang Liu},
  year={2026},
  eprint={2604.12273},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2604.12273}
}