Problem statement & goal
Generative models like GANs can be hard to train. This paper revives a diffusion idea: learn to reverse a gradual noise process so you can sample realistic images from pure noise—with a stable training objective.
Generative diffusion
Ho, Jain & Abbeel · NeurIPS 2020
These notes are written in plain language for this specific paper, so you can grasp the ideas before you wrestle with the authors' formal wording.
A fixed forward process adds Gaussian noise step by step until the data is indistinguishable from noise. A neural network learns to denoise one step at a time (the reverse process). Training uses a simple noise-prediction loss related to denoising score matching.
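The training step can be sketched in a few lines of NumPy. This is a toy illustration, not the paper's implementation: the linear schedule constants are illustrative, and the zero-output `eps_pred` is a placeholder for the trained U-Net noise predictor.

```python
import numpy as np

# Illustrative linear noise schedule (constants are placeholders,
# not the exact values from the paper).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # bar-alpha_t = prod of alpha_s for s <= t

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(bar-alpha_t) * x0 + sqrt(1 - bar-alpha_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps  # eps is the regression target for the noise-prediction loss

# One training step's loss, with a stand-in "model" that predicts zero noise.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8))       # a toy batch of flattened "images"
t = rng.integers(0, T)                 # uniform random timestep, as in DDPM training
xt, eps = q_sample(x0, t, rng)
eps_pred = np.zeros_like(xt)           # placeholder for eps_theta(x_t, t)
loss = np.mean((eps - eps_pred) ** 2)  # the simple noise-prediction objective
```

Because the marginal q(x_t | x_0) is Gaussian in closed form, each training step can jump straight to a random timestep rather than simulating the whole chain.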
They train on CIFAR-10, CelebA-HQ, and the LSUN Bedroom and Church subsets: standard vision datasets where FID and Inception scores let you compare against GANs and other generative models.
Sample quality (FID) competes with or beats contemporary GANs on some benchmarks, with stable optimization. Later work (DDIM, latent diffusion) speeds sampling; this paper establishes the baseline DDPM recipe.
Many sampling steps mean slow generation compared to one-shot GANs. Resolution and compute were limited compared to today’s latent diffusion; conditioning (text-to-image) comes in follow-on systems.
They relate to score-based models, VAEs, GANs, and earlier diffusion. The contribution is a clear probabilistic picture plus practical training that sparked the modern diffusion wave.
Hyperparameters, noise schedule, and architecture are specified; code followed in the community. Students can reproduce small DDPMs on CIFAR in a course; full ImageNet-scale diffusion is a larger project.
Eight highlights per paper—why each part matters before you read dense notation and proofs.
Sample new images from a learned distribution—not just classify or regress. DDPM offers a principled latent trajectory from noise to data, unlike one-shot GAN generators.
A fixed Markov chain gradually adds Gaussian noise until the signal looks like pure noise. You need a closed-form posterior q(x_{t-1}|x_t, x_0) for tractable training targets.
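Concretely, with α_t = 1 − β_t and ᾱ_t the running product of the α's, every step of the chain is Gaussian and the posterior is available in closed form (the standard DDPM definitions):

```latex
q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\big), \qquad
q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t)\, I\big)
```

```latex
q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\big(x_{t-1};\ \tilde\mu_t(x_t, x_0),\ \tilde\beta_t I\big), \quad
\tilde\mu_t = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\,x_0
  + \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\,x_t, \quad
\tilde\beta_t = \frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\,\beta_t
```

The posterior mean and variance are what the learned reverse step is trained to match.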
A neural net predicts noise or mean at each step to run the chain backward. Small per-step corrections accumulate into sharp samples after hundreds of steps.
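The reverse process is literally a loop from pure noise back to data. A minimal NumPy sketch of ancestral sampling, assuming an illustrative linear schedule and a stand-in noise predictor (`dummy_eps_theta` is a placeholder for the trained U-Net):

```python
import numpy as np

# Illustrative schedule (placeholder constants, as above).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def dummy_eps_theta(xt, t):
    """Stand-in for the trained noise-prediction network eps_theta(x_t, t)."""
    return np.zeros_like(xt)

def p_sample_loop(shape, eps_theta, rng):
    """Ancestral sampling: start from pure noise, denoise one step at a time."""
    x = rng.standard_normal(shape)  # x_T ~ N(0, I)
    for t in range(T - 1, -1, -1):
        eps = eps_theta(x, t)
        # Posterior mean under the noise-prediction parametrization.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Inject fresh noise with variance sigma_t^2 = beta_t (one of the
            # fixed-variance choices in the paper).
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
        else:
            x = mean  # no noise on the final step
    return x

sample = p_sample_loop((2, 8), dummy_eps_theta, np.random.default_rng(1))
```

With a real trained network in place of the dummy predictor, the small per-step corrections are what accumulate into a sharp sample.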
The variational objective ties to score matching intuition—each step is a denoising problem. That link explains why DDPM training feels stable compared to adversarial min-max.
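Written out, the simplified objective is a denoising regression at a uniformly random timestep:

```latex
L_{\text{simple}}(\theta)
  = \mathbb{E}_{t,\ x_0,\ \varepsilon \sim \mathcal{N}(0, I)}
    \Big[ \big\| \varepsilon
      - \varepsilon_\theta\big(\sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\varepsilon,\ t\big)
      \big\|^2 \Big]
```

Since the conditional score is ∇_{x_t} log q(x_t | x_0) = −ε / √(1 − ᾱ_t), predicting the noise ε is score estimation up to a known scale, which is the link to score matching.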
How you predict noise vs. mean vs. variance changes training dynamics. Ho et al.’s choices mattered for FID—later papers refine schedules and parametrization further.
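The parametrizations are related by a fixed transform: a noise prediction determines the reverse-step mean, and Ho et al. keep the reverse variance fixed rather than learned:

```latex
\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}
  \left( x_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\, \varepsilon_\theta(x_t, t) \right),
\qquad
\Sigma_\theta(x_t, t) = \sigma_t^2 I, \quad \sigma_t^2 \in \{\beta_t,\ \tilde\beta_t\}
```

So "predict noise vs. mean" is mathematically equivalent at the optimum, but it reweights the loss across timesteps, which is why the choice affects training dynamics and FID.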
Earlier diffusion was slow or fuzzy; this work showed competitive likelihood and quality on images. That reignited diffusion as a serious alternative to GANs.
Many sequential steps are expensive at inference. DDIM, distillation, and latent diffusion (e.g., Stable Diffusion) exist largely to cut steps without killing quality.
Text-to-image stacks add cross-attention and CLIP—but the core is still noise schedules and U-Net denoisers from DDPM-class training. This paper is the baseline definition.