Generative diffusion

Denoising Diffusion Probabilistic Models

Ho, Jain & Abbeel · NeurIPS 2020

Paper PDF

Open in new tab

If the viewer is blank (blocked by the publisher or your network), use Open in new tab. Scrolling inside the frame moves through the PDF pages when embedding is supported.

Reading map

These notes are written in plain language for this specific paper—so you can grasp the ideas before you wrestle with the authors’ formal wording. Use the button to open the PDF near the matching section (approximate page; Chromium-style viewers support #page=, otherwise we open a new tab).

Problem statement & goal

Generative models like GANs can be hard to train. This paper revives a diffusion idea: learn to reverse a gradual noise process so you can sample realistic images from pure noise—with a stable training objective.

Methodology & architecture

A forward process adds Gaussian noise step by step until data looks like noise. A neural net learns to denoise one step at a time (reverse process). Training uses a simple noise-prediction loss related to score matching.
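The loop above can be sketched in a few lines of NumPy. This is a minimal sketch, not the paper's implementation: the schedule values match the paper's linear β schedule (β from 1e-4 to 0.02 over T = 1000), but `eps_pred` is a dummy zero predictor standing in for the learned U-Net ε_θ(x_t, t).

```python
import numpy as np

# Linear beta schedule as in the paper: beta_1 = 1e-4 to beta_T = 0.02, T = 1000.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # abar_t = prod_{s<=t} alpha_s

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps

# One training step: noise an image, then regress the noise.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((3, 32, 32))     # stand-in for a normalized image
xt, eps = q_sample(x0, t=500, rng=rng)
eps_pred = np.zeros_like(xt)              # stand-in for eps_theta(x_t, t); really a U-Net
loss = np.mean((eps - eps_pred) ** 2)     # the "simple" noise-prediction loss
```

Note that training never runs the chain step by step: the closed-form marginal lets you jump straight to any noise level t, which is what makes the objective cheap.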

Datasets & benchmarks

They train on CIFAR-10, the LSUN Bedroom and Church subsets, and CelebA-HQ—standard vision sets where FID and Inception scores let you compare against GANs and other generative models.

Results & evaluation metrics

Sample quality competes with or beats contemporary GANs on some benchmarks (FID of 3.17 on unconditional CIFAR-10), with stable optimization. Later work (DDIM, latent diffusion) speeds sampling; this paper establishes the baseline DDPM recipe.

Limitations & future work

Many sampling steps mean slow generation compared to one-shot GANs. Resolution and compute were limited compared to today’s latent diffusion; conditioning (text-to-image) comes in follow-on systems.

Reproducibility

Hyperparameters, noise schedule, and architecture are fully specified, and code appeared soon after, both from the authors and the community. Students can reproduce small DDPMs on CIFAR-10 in a course setting; ImageNet-scale diffusion is a larger project.

What to focus on

Eight highlights for this paper—why each part matters before you read the dense notation and proofs.

Generative goal

Sample new images from a learned distribution—not just classify or regress. DDPM offers a principled latent trajectory from noise to data, unlike one-shot GAN generators.

Forward diffusion

A fixed Markov chain gradually adds Gaussian noise until the signal looks like pure noise. Closed-form expressions for both the marginal q(x_t|x_0) and the posterior q(x_{t-1}|x_t, x_0) are what make the training targets tractable.
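Spelled out in the paper's standard notation (with α_t = 1 − β_t and ᾱ_t the running product of the α's), the forward step, its closed-form marginal, and the tractable posterior are:

```latex
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\; \sqrt{1-\beta_t}\,x_{t-1},\; \beta_t I\right)

q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\; \sqrt{\bar{\alpha}_t}\,x_0,\; (1-\bar{\alpha}_t)\,I\right)

q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\!\left(x_{t-1};\; \tilde{\mu}_t(x_t, x_0),\; \tilde{\beta}_t I\right),
\qquad \tilde{\beta}_t = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\,\beta_t
```

The first line is what the chain does; the second is why you can jump to any t in one shot; the third is the Gaussian the reverse model is trained to match.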

Reverse denoising

A neural net predicts noise or mean at each step to run the chain backward. Small per-step corrections accumulate into sharp samples after hundreds of steps.
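One such reverse step can be sketched as follows. This is a hedged sketch, not the authors' code: it uses the DDPM mean parameterization with the variance fixed to β_t (one of the paper's two variance choices), and a dummy zero `eps_pred` where a trained U-Net would supply ε_θ(x_t, t).

```python
import numpy as np

# Linear beta schedule as in the paper (1e-4 to 0.02 over T = 1000 steps).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def p_sample_step(xt, t, eps_pred, rng):
    """One reverse step x_t -> x_{t-1} from the model's noise prediction.
    mean = (x_t - beta_t / sqrt(1 - abar_t) * eps_pred) / sqrt(alpha_t)."""
    mean = (xt - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_pred) / np.sqrt(1.0 - betas[t])
    if t == 0:
        return mean  # no noise is added on the final step
    return mean + np.sqrt(betas[t]) * rng.standard_normal(xt.shape)

# Demo: a few steps from pure noise with a dummy predictor.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 32, 32))
for t in (999, 500, 0):
    x = p_sample_step(x, t, np.zeros_like(x), rng)  # real eps_pred comes from a U-Net
```

Full sampling runs this loop for all T steps from t = T−1 down to 0, which is exactly why the "Sampling cost" section below matters.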

ELBO & denoising score

The variational objective ties to score matching intuition—each step is a denoising problem. That link explains why DDPM training feels stable compared to adversarial min-max.
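The simplified objective the paper trains with boils the variational bound down to a noise-prediction regression at a uniformly random step t:

```latex
L_{\text{simple}}(\theta)
  = \mathbb{E}_{t,\,x_0,\,\epsilon}
    \Bigl[\,\bigl\|\epsilon - \epsilon_\theta\bigl(\sqrt{\bar{\alpha}_t}\,x_0
      + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\; t\bigr)\bigr\|^{2}\,\Bigr],
\qquad \epsilon \sim \mathcal{N}(0, I)
```

This drops the per-step weighting the exact ELBO would impose, which the paper found helps sample quality—and it is a plain regression, which is why there is no adversarial min-max to destabilize training.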

Variance & parameterization

How you predict noise vs. mean vs. variance changes training dynamics. Ho et al.’s choices mattered for FID—later papers refine schedules and parametrization further.
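Noise prediction and mean prediction are interchangeable: given ε_θ, the reverse-process mean follows as

```latex
\mu_\theta(x_t, t)
  = \frac{1}{\sqrt{\alpha_t}}
    \left( x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(x_t, t) \right)
```

so the "choice of parameterization" is really a choice of what the network regresses—the same mean can be reached either way, but the implied loss weighting differs.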

Why it beat older diffusion

Earlier diffusion models were slow to sample and produced fuzzy images; this work achieved GAN-competitive sample quality with a simple objective (its likelihoods were reasonable, though not best-in-class). That reignited diffusion as a serious alternative to GANs.

Sampling cost

Many sequential steps make inference expensive. DDIM, distillation, and latent diffusion (the basis of Stable Diffusion) exist largely to cut steps without sacrificing quality.

Lineage to Stable Diffusion

Text-to-image stacks add cross-attention and CLIP—but the core is still noise schedules and U-Net denoisers from DDPM-class training. This paper is the baseline definition.
