Diffusion Models | Generative AI | AI / ML

Diffusion models are a class of generative models that learn to create images by reversing a noise-addition process. During training, Gaussian noise is progressively added to images, and the model learns to predict that noise so it can be removed step by step. At inference, the model starts from pure noise and denoises iteratively to produce a coherent image.
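The noising half of this process has a closed form, so a noisy training sample at any timestep can be drawn in one shot instead of simulating every step. The sketch below is a minimal NumPy illustration, not a production implementation. The linear schedule values (1e-4 to 0.02 over 1000 steps) follow a common DDPM setup and are assumptions here, as are the helper names make_schedule, add_noise, and noise_prediction_loss.

```python
import numpy as np

def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (assumed values); returns alpha_bar[t],
    the fraction of signal variance that survives up to step t."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    """Draw x_t directly from the clean image x0:
    x_t = sqrt(alpha_bar[t]) * x0 + sqrt(1 - alpha_bar[t]) * eps."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

def noise_prediction_loss(eps_pred, eps):
    """Training target: mean squared error between predicted and true noise."""
    return float(np.mean((eps_pred - eps) ** 2))

rng = np.random.default_rng(0)
alpha_bar = make_schedule()

# Stand-in "image" scaled to [-1, 1]; a real pipeline would load pixel data.
x0 = rng.uniform(-1.0, 1.0, size=(3, 8, 8))

x_early, _ = add_noise(x0, 10, alpha_bar, rng)     # still close to x0
x_late, eps = add_noise(x0, 999, alpha_bar, rng)   # almost pure noise
```

During training, a denoising network would receive x_late (or any x_t) together with t, and its output would be passed to noise_prediction_loss alongside eps.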

Key Points

Forward process: gradually add Gaussian noise to an image over T timesteps until it is pure noise
Reverse process: learn to predict and remove noise at each step (the model's job)
U-Net backbone: the denoising network; skip connections preserve fine-grained spatial detail
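CLIP guidance: condition image generation on a text prompt using CLIP, a contrastively trained text-image model
Latent Diffusion: denoise in a compressed latent space rather than in pixel space (used by Stable Diffusion); its autoencoder shrinks each spatial dimension 8×, which cuts compute and memory substantially
Classifier-Free Guidance (CFG): improves text-image alignment; a higher guidance scale means closer adherence to the prompt, usually at the cost of diversity
Sampling methods: DDPM (~1000 steps), DDIM (~50 steps), DPM-Solver++ (~15–20 steps); fewer steps mean faster inference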
ControlNet: add spatial control — use depth maps, pose skeletons, edges as additional conditions
Inpainting: regenerate masked portions of an image while preserving the rest
Video generation: extend diffusion to temporal sequences (Sora uses a diffusion Transformer)
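Two of the points above, classifier-free guidance and few-step sampling, reduce to short formulas. The following is a hedged NumPy sketch of a deterministic DDIM update (eta = 0) combined with CFG. The function names are illustrative, the schedule values are assumed, and oracle_eps is a hypothetical stand-in for a trained U-Net: it "knows" the clean image, which makes the loop easy to check by hand.

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    noise prediction toward the conditional one."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

def ddim_step(x_t, eps, ab_t, ab_prev):
    """One deterministic DDIM update from noise level ab_t to ab_prev."""
    x0_pred = (x_t - np.sqrt(1.0 - ab_t) * eps) / np.sqrt(ab_t)
    return np.sqrt(ab_prev) * x0_pred + np.sqrt(1.0 - ab_prev) * eps

def sample(eps_model, shape, alpha_bar, num_steps, guidance_scale, rng):
    """Start from pure noise and denoise over a subsampled timestep schedule."""
    T = len(alpha_bar)
    timesteps = np.linspace(T - 1, 0, num_steps).round().astype(int)
    x = rng.standard_normal(shape)
    for i, t in enumerate(timesteps):
        eps = cfg_combine(eps_model(x, t, cond=False),
                          eps_model(x, t, cond=True),
                          guidance_scale)
        # After the final step there is no noise left, so alpha_bar is 1.
        ab_prev = alpha_bar[timesteps[i + 1]] if i + 1 < num_steps else 1.0
        x = ddim_step(x, eps, alpha_bar[t], ab_prev)
    return x

# Assumed linear schedule over 1000 training steps.
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))
target = np.full((4, 4), 0.5)

def oracle_eps(x_t, t, cond):
    """Hypothetical denoiser: the conditional branch aims at `target`,
    the unconditional branch at an all-zero image."""
    x0 = target if cond else np.zeros_like(target)
    return (x_t - np.sqrt(alpha_bar[t]) * x0) / np.sqrt(1.0 - alpha_bar[t])

rng = np.random.default_rng(0)
result = sample(oracle_eps, target.shape, alpha_bar,
                num_steps=50, guidance_scale=1.0, rng=rng)
strong = sample(oracle_eps, target.shape, alpha_bar,
                num_steps=50, guidance_scale=2.0, rng=rng)
```

With a guidance scale of 1.0 the loop recovers target in 50 steps instead of 1000. In this toy setup a scale of 2.0 lands on twice the target, a crude picture of why very high scales tend to oversaturate real images.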

Model	Company	Key Feature
Stable Diffusion	Stability AI (open)	Open weights, local runs
DALL-E 3	OpenAI	Close prompt adherence; trained on detailed synthetic captions, with GPT-4 prompt rewriting
Midjourney	Midjourney	Artistic quality, Discord-based
Imagen 3	Google DeepMind	Photorealism, text rendering
Flux	Black Forest Labs	Open, high fidelity
Sora	OpenAI	Text-to-video, clips up to about a minute

Real-World Example

Adobe Firefly brings diffusion models to Photoshop's Generative Fill, allowing designers to erase objects, extend backgrounds, and generate design elements from text. Stability AI's Stable Diffusion 3.5 can run on consumer hardware (the Medium variant targets it), enabling entirely local image generation.
