Diffusing DeBias:
Synthetic Bias Amplification for Model Debiasing

¹MaLGa-DIBRIS, University of Genoa, Italy   ²AI for Good (AIGO), Istituto Italiano di Tecnologia, Genova, Italy   ³Telecom-Paris / École Polytechnique, France   ⁴Dept. of Computer Science, University of Verona, Italy

We introduce Diffusing DeBias (DDB), a plug-in debiasing framework that leverages conditional diffusion models to generate synthetic bias-aligned images and train a robust Bias Amplifier for unsupervised model debiasing.

Teaser: Diffusing DeBias overview

Abstract

The effectiveness of deep learning is often limited by spurious correlations in training data. We propose Diffusing DeBias (DDB), which exploits conditional diffusion probabilistic models (CDPMs) to generate synthetic bias-aligned images. These synthetic samples are used to train a robust Bias Amplifier (BA) that avoids memorizing bias-conflicting real samples and can be plugged into both two-step and end-to-end unsupervised debiasing recipes. DDB yields state-of-the-art results on multiple popular biased benchmarks without degrading performance on unbiased data.

Approach

DDB trains a class-conditional diffusion model on the (biased) training set and then uses classifier-free guidance to sample a large set of synthetic images that amplify the dataset's bias patterns. A Bias Amplifier (BA) is trained only on these synthetic bias-aligned samples; because it never sees the real dataset, it cannot memorize the scarce bias-conflicting examples. The BA then supplies pseudo-labels or per-sample signals to downstream debiasing algorithms (e.g., GroupDRO or an LfF-style reweighting).
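As a rough illustration of the classifier-free guidance step, the sketch below combines a class-conditional and an unconditional noise prediction using the common `eps_uncond + scale * (eps_cond - eps_uncond)` convention. The function name `cfg_noise`, the toy vectors, and the choice of convention are assumptions made for illustration; the page does not specify how DDB parameterizes the guidance scale.

```python
def cfg_noise(eps_cond, eps_uncond, scale):
    """Classifier-free guidance: move from the unconditional noise
    prediction towards (and past) the class-conditional one."""
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]

# Toy stand-ins for the two outputs of a denoiser (hypothetical values).
eps_c = [1.0, 2.0]   # prediction conditioned on the class label
eps_u = [0.0, 1.0]   # prediction with the label dropped

plain = cfg_noise(eps_c, eps_u, 1.0)    # scale 1 recovers the conditional prediction
guided = cfg_noise(eps_c, eps_u, 3.0)   # larger scale exaggerates class-specific features
```

Under this convention, a larger scale pushes samples further towards whatever the model associates with the class, which in a biased training set includes the spurious attribute.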

DDB pipeline: Diffuse bias -> train BA -> apply Recipe I / II
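To make the last stage of the pipeline more concrete, here is a minimal sketch of two signals a trained BA could provide: a pseudo-group label for GroupDRO-style training, and an LfF-style relative-difficulty weight. It assumes a sample is treated as bias-conflicting when the BA misclassifies it; the helper names and toy logits are hypothetical, and this is one plausible reading of the recipes rather than the authors' exact procedure.

```python
import math

def cross_entropy(logits, label):
    """Per-sample cross-entropy from raw logits (stable log-sum-exp)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]

def pseudo_group(ba_logits, label):
    """0 = bias-aligned (BA predicts the label), 1 = bias-conflicting (assumed rule)."""
    pred = max(range(len(ba_logits)), key=lambda k: ba_logits[k])
    return 0 if pred == label else 1

def lff_style_weight(ba_logits, debiased_logits, label):
    """LfF-style weight CE_b / (CE_b + CE_d): large when the BA struggles on the sample."""
    ce_b = cross_entropy(ba_logits, label)
    ce_d = cross_entropy(debiased_logits, label)
    return ce_b / (ce_b + ce_d)

# Toy example: the BA confidently predicts class 0 for both samples.
ba_out = [4.0, 0.0]
debiased_out = [0.0, 0.0]  # an untrained debiased model, uniform prediction

g_aligned = pseudo_group(ba_out, 0)       # true label matches the BA's prediction
g_conflict = pseudo_group(ba_out, 1)      # true label contradicts the BA's prediction
w_aligned = lff_style_weight(ba_out, debiased_out, 0)
w_conflict = lff_style_weight(ba_out, debiased_out, 1)
```

In this toy case the sample the BA gets wrong receives the larger weight, so a downstream model trained with these weights would focus on the examples that contradict the bias.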

Results (highlights)

  • DDB achieves state-of-the-art performance among unsupervised debiasing methods on Waterbirds, BFFHQ, BAR, ImageNet-9/A and UrbanCars.
  • A Bias Amplifier trained on synthetic data avoids memorizing bias-conflicting samples and separates bias-aligned from bias-conflicting samples with high accuracy.
  • Ablations study the CDPM (conditional diffusion probabilistic model) generation settings, namely the classifier-free guidance (CFG) scale and the number of synthetic images; 1k synthetic images per class often suffice.

Table: DDB results summary

Citation

Massimiliano Ciranni, Vito Paolo Pastore, Roberto Di Via, Enzo Tartaglione, Francesca Odone, Vittorio Murino. Diffusing DeBias: Synthetic Bias Amplification for Model Debiasing. NeurIPS 2025.