Representation Distribution Matching

One step from real.

Representation Distribution Matching for One-Step Visual Generation

We train a one-step image generator by matching generated and real feature distributions under frozen pretrained encoders. No online teacher, no adversary, no trajectory. Estimate the distance right and refuse to trust any single encoder, and a single network evaluation lands the closest to real reported to date.

Lan Feng¹, Wuyang Li¹, Éloi Zablocki², Matthieu Cord²,³, Alexandre Alahi¹

¹ EPFL ² Valeo.ai ³ Sorbonne Université


1.30

SW r14 distance to real; real validation data scores 1.00. One-step state of the art.

63.6%

of samples preferred over real photographs by PickScore, a learned human-preference model.

90h

H200 GPU-hours to post-train four-step FLUX.2 into a single step.

Post-training · our 1-step vs the 4-step FLUX.2 teacher

GenEval: keeper 0.826 · 4-step 0.794

PickScore: keeper 22.76 · 4-step 22.58

Each dot is a checkpoint over 90 H200 GPU-hours; the dashed line is the four-step FLUX.2 teacher. Our one step clears it on GenEval within ~10 GPU-hours and on PickScore by ~30, reaching 0.826 and 22.76 at the keeper.

Each frame is one network evaluation · 1-step FLUX.2 after iRDM

One-step ImageNet · Distance to real

How close can a single step get?

Generative quality is a distance between distributions. We measure it with SW r14, a Sliced-Wasserstein distance averaged over fourteen frozen encoders, scaled so a fresh draw of real validation data scores 1.00. It shares no machinery with the training loss, so a low score cannot be gamed by matching the objective. Lower is closer. iRDM sits nearest the real line, below every released generator, including multi-step diffusion.
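The normalization described above can be sketched in a few lines. This is a minimal illustration, not the paper's evaluation code: the function names (`sliced_wasserstein`, `normalized_score`), the number of projections, the per-encoder ratio, and the two toy "encoders" are all assumptions made for the example. It assumes features have already been extracted by each frozen encoder.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly on the unit sphere via normalized Gaussians."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sliced_wasserstein(a, b, n_proj=64, seed=0):
    """Monte Carlo sliced 2-Wasserstein distance between equal-size feature sets."""
    assert len(a) == len(b), "this sketch assumes equal sample counts"
    rng = random.Random(seed)  # fixed seed: every call uses the same directions
    dim = len(a[0])
    total = 0.0
    for _ in range(n_proj):
        d = random_unit_vector(dim, rng)
        # Project both sets onto the direction; in 1-D the optimal
        # transport plan simply pairs the sorted values.
        pa = sorted(sum(x * w for x, w in zip(row, d)) for row in a)
        pb = sorted(sum(x * w for x, w in zip(row, d)) for row in b)
        total += sum((x - y) ** 2 for x, y in zip(pa, pb)) / len(pa)
    return math.sqrt(total / n_proj)


def normalized_score(gen_feats, real_feats, real_val_feats):
    """Average over encoders of SW(gen, real) / SW(real_val, real).

    Assumption: the score is normalized per encoder by the distance that a
    fresh draw of real validation data attains, so real data scores 1.00.
    """
    ratios = []
    for name in real_feats:
        sw_gen = sliced_wasserstein(gen_feats[name], real_feats[name])
        sw_floor = sliced_wasserstein(real_val_feats[name], real_feats[name])
        ratios.append(sw_gen / sw_floor)
    return sum(ratios) / len(ratios)


# Toy stand-ins for encoder features (hypothetical names and dimensions).
_rng = random.Random(1)


def cloud(n, dim, shift):
    return [[_rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]


real = {"enc_a": cloud(200, 4, 0.0), "enc_b": cloud(200, 8, 0.0)}
real_val = {"enc_a": cloud(200, 4, 0.0), "enc_b": cloud(200, 8, 0.0)}
generated = {"enc_a": cloud(200, 4, 1.0), "enc_b": cloud(200, 8, 1.0)}

score_real = normalized_score(real_val, real, real_val)
score_gen = normalized_score(generated, real, real_val)
```

By construction `score_real` is 1.0, and the shifted toy "generator" scores well above it, which mirrors how the chart should be read: the closer a model sits to 1.00, the closer it is to real.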

Model · SW r14 (lower is closer; real validation data = 1.00)

iRDM · ours · 1-NFE: 1.30
pMF-H FD-SIM · 1-NFE: 2.05
REPA-E SiT-XL: 2.40
RAE-XL: 2.43
LightningDiT-XL: 3.10
SiT-XL/2 + REPA: 3.61
MAR-H: 3.87
SiT-XL/2: 4.27
Drifting-L · 1-NFE: 5.93

And do humans agree?

A preference model we never train against.

PickScore is a learned human-preference proxy, and our objective never optimizes it. It prefers iRDM to every prior one-step generator, and for the first time to held-out real photographs.

63.6% preferred over real photographs · first one-step model to pass

71.2% preferred over pMF-H FD-SIM · the prior best one-step generator

75.7% preferred over RAE-XL · a recent multi-step model

73.2% preferred over REPA-E SiT-XL · a recent multi-step model
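A preference percentage of this kind can be read as a pairwise win rate. The sketch below shows that arithmetic only. How the page pairs samples, and how ties are handled, is not stated, so the pairing and the strict comparison are assumptions, and the scores are made-up placeholders rather than measured PickScore values.

```python
def win_rate(scores_a, scores_b):
    """Fraction of paired comparisons in which A scores strictly higher than B."""
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired one to one")
    wins = sum(1 for a, b in zip(scores_a, scores_b) if a > b)
    return wins / len(scores_a)


# Placeholder preference scores for four hypothetical pairs.
model_scores = [22.9, 23.4, 21.0, 24.1]
reference_scores = [22.5, 23.0, 21.7, 23.8]

rate = win_rate(model_scores, reference_scores)  # 3 wins out of 4 pairs
```

A rate above 0.5 means the preference model picks the generator's sample more often than the comparison sample.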

The method

Two axes fix every instance.

Every teacher-free distribution-matching generator is set by two choices, and prior methods fixed both at once. We vary one at a time. The first is how the distributions are compared. The second is which representations they are compared in. Getting each right is what closes the gap.

Axis 01 · Comparison

How the distributions are compared

An exact within-batch repulsion, paired with a Nyström attraction to a reference frozen once over the full data.

MMD · Estimated right. The classical MMD, once dismissed as too weak, becomes a strong objective with an exact within-batch repulsion and a Nyström attraction toward a frozen full-data reference.
BATCH · Large and fresh. The generated batch is the operative variable. Quality climbs to an optimum above 2048, an order of magnitude past common practice, with gradient caching absorbing the memory.
JOINT · Match the joint, not the marginal. On conditional tasks we match the joint image-text law, so prompt fidelity becomes part of the objective.
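One way to read the MMD item above is sketched here under stated assumptions; it is not the authors' implementation. The Gaussian kernel, its bandwidth, the ridge term, the choice of landmarks, and every function name are hypothetical. The repulsion is the exact off-diagonal kernel mean within the generated batch. The attraction approximates the full-data kernel mean embedding with Nyström coefficients `alpha`, computed once and then frozen.

```python
import math
import random


def rbf(x, y, bandwidth=1.0):
    """Gaussian kernel between two feature vectors."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * bandwidth ** 2))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_reference(data, landmarks, ridge=1e-6):
    """Precompute Nystrom coefficients once over the full dataset.

    The data mean embedding mu(x) = mean_j k(x, y_j) is approximated by
    sum_l alpha[l] * k(x, landmarks[l]).
    """
    k_zz = [[rbf(zi, zj) + (ridge if i == j else 0.0)
             for j, zj in enumerate(landmarks)]
            for i, zi in enumerate(landmarks)]
    k_z_mean = [sum(rbf(z, y) for y in data) / len(data) for z in landmarks]
    return solve(k_zz, k_z_mean)


def mmd_loss(batch, landmarks, alpha):
    """Exact within-batch repulsion minus twice the Nystrom attraction.

    The data-data term does not depend on the generator and is dropped.
    """
    n = len(batch)
    repulsion = sum(rbf(batch[i], batch[j])
                    for i in range(n) for j in range(n) if i != j)
    repulsion /= n * (n - 1)
    attraction = sum(alpha[l] * rbf(x, z)
                     for x in batch for l, z in enumerate(landmarks)) / n
    return repulsion - 2.0 * attraction


# Toy 2-D features standing in for encoder outputs.
_rng = random.Random(0)


def cloud(n, shift):
    return [[_rng.gauss(shift, 1.0), _rng.gauss(shift, 1.0)] for _ in range(n)]


data = cloud(200, 0.0)
landmarks = data[:20]                      # assumption: landmarks are a data subset
alpha = fit_reference(data, landmarks)     # frozen after this call
loss_near = mmd_loss(cloud(16, 0.0), landmarks, alpha)
loss_far = mmd_loss(cloud(16, 5.0), landmarks, alpha)
```

In this toy setup a batch drawn from the data distribution gets a lower loss than one shifted far away. The per-step cost of the attraction scales with the number of landmarks rather than the dataset size, which is the usual motivation for a Nyström approximation.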

Axis 02 · Representation

Which spaces they are compared in

Any single encoder can be gamed. A diverse battery, held in balance, cannot.

The fourteen-encoder panel

Inception · ConvNeXt · DINOv2* · MAE · SigLIP2 · CLIP · DINOv3 · SigLIP v1* · PE-Core · RADIO* · WebSSL · AIMv2 · DreamSim · FLUX VAE*

* four encoders held out from training, a generalization check

GAME · One encoder is never enough. Matched alone, even DINOv2 is driven below the real score while samples stay visibly fake. The limitation is single-encoder matching itself, not the choice of encoder.
BALANCE · A battery under constrained optimization. A proportional Lagrangian controller upweights whichever encoder is hardest to satisfy and drops those already at their floor, so no space can be gamed.
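The balancing rule can be illustrated with a small sketch. The page does not give the update rule, so this assumes the simplest proportional form: each encoder's weight is a gain times how far its distance sits above its floor, clipped at zero. The names `controller_weights` and `balanced_loss`, the gain, and the example distances are hypothetical.

```python
def controller_weights(distances, floors, gain=1.0):
    """Proportional multipliers: weight grows with the gap above the floor.

    Encoders at or below their floor get weight zero and drop out.
    """
    weights = {}
    for name, dist in distances.items():
        violation = dist - floors[name]
        weights[name] = max(0.0, gain * violation)
    return weights


def balanced_loss(distances, weights):
    """Weighted sum of per-encoder distances; weights act as constants."""
    return sum(weights[name] * distances[name] for name in distances)


# Placeholder per-encoder distances, with every floor set to the real score 1.0.
distances = {"DINOv2": 0.98, "CLIP": 1.80, "MAE": 1.20}
floors = {"DINOv2": 1.0, "CLIP": 1.0, "MAE": 1.0}

weights = controller_weights(distances, floors)
hardest = max(weights, key=weights.get)
loss = balanced_loss(distances, weights)
```

Here the encoder already below its floor contributes nothing, and the one furthest from its floor dominates the loss, so pushing a single encoder below the real score earns no further reward.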


Single-encoder gaming. Matching only DINOv2 reaches the real floor, yet the lizard becomes photoreal while the typewriter keeps clear artifacts. A saturated single-encoder score does not imply realism.

Text-to-image post-training

Four-step FLUX.2, in a single step.

The same recipe carries to text-to-image. With the joint image-text objective, we post-train the four-step FLUX.2 [klein] into a one-step model that surpasses the four-step teacher on both GenEval and PickScore, in 90 H200 GPU-hours.

Four-step FLUX.2 [klein] compared with one-step iRDM at matched quality, and GenEval and PickScore over post-training compute.

GenEval overall · 1-step iRDM vs 4-step base: 0.826 vs 0.794

PickScore · 1-step iRDM vs 4-step base: 22.76 vs 22.58

Joint vs marginal · GenEval overall: 0.826 vs 0.801

Compute · single run: 90h H200

Against a one-step DMD2 distillation of the same teacher, iRDM also leads: GenEval 0.826 vs 0.804, PickScore 22.76 vs 22.36.

Head to head

Our one step vs the four-step teacher.

Four-step FLUX.2 [klein] is the distillation target. We post-train it into a single step, then set them side by side on the same epic and complex prompts. The four-step teacher averages a PickScore of 23.08; our one-step student holds that quality at a single forward pass, and on a number of prompts scores higher.

Sci-fi worlds · our 1-step · PS 24.26 ★
Prompt: Towering mech silhouette in rainy neon city, magenta and cyan glow, volumetric fog, moody cinematic backlight.
[Image pair: 4-step · teacher, 1-step · ours]

Space & cosmos · our 1-step · PS 23.89 ★
Prompt: Swirling crimson nebula with a bright newborn star, volumetric glow, deep cosmic blacks, breathtaking scale.
[Image pair: 4-step · teacher, 1-step · ours]

Mythical creatures · our 1-step · PS 23.80 ★
Prompt: A blazing phoenix rising from glowing embers, fiery orange wings spread against a dark smoldering sky.
[Image pair: 4-step · teacher, 1-step · ours]

Underwater · our 1-step · PS 23.41 ★
Prompt: Giant manta ray soaring through turquoise god rays, glowing surface above, deep indigo abyss, epic scale.
[Image pair: 4-step · teacher, 1-step · ours]

Vehicles in motion · our 1-step · PS 22.52
Prompt: Steam train crossing a misty stone viaduct, billowing smoke, volumetric god rays, moody blue valley fog.
[Image pair: 4-step · teacher, 1-step · ours]

Epic fantasy · our 1-step · PS 22.64
Prompt: A lone dragon glides over jagged misty peaks at dawn, golden god rays piercing the fog, vast silhouette.
[Image pair: 4-step · teacher, 1-step · ours]

See all 48 prompt pairs

Samples

Twelve images, twelve forward passes.

Cherry-picked one-step samples from FLUX.2 [klein] after iRDM post-training. Each is a single network evaluation, with no iterative refinement.

Live demo

Generate in one step, yourself.

The post-trained one-step FLUX.2 [klein] runs live on HuggingFace Spaces. Type a prompt and get an image in a single forward pass.


Citation

BibTeX

@article{feng2026rdm,
  title   = {Representation Distribution Matching for One-Step Visual Generation},
  author  = {Feng, Lan and Li, Wuyang and Zablocki, {\'E}loi and Cord, Matthieu and Alahi, Alexandre},
  journal = {arXiv preprint arXiv:2607.02375},
  year    = {2026}
}