Paper List
Estimation of Non-Normalized Statistical Models by Score Matching
Deep Unsupervised Learning using Nonequilibrium Thermodynamics
Generative Modeling by Estimating Gradients of the Data Distribution
Denoising Diffusion Probabilistic Models
Denoising Diffusion Implicit Models
Pseudo Numerical Methods for Diffusion Models on Manifolds
Fast Sampling of Diffusion Models with Exponential Integrator
High-Resolution Image Synthesis with Latent Diffusion Models
DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps
Diffusion Models: A Comprehensive Survey of Methods and Applications
What is the problem / phenomenon?
Diffusion probabilistic models are used to generate high-quality images.
Based on a forward process that gradually adds random noise to an image, the model learns the reverse process, reconstructing the image from pure noise by undoing the noising steps one at a time.
The reverse-process distribution $q(x_{t-1} \vert x_t)$ is also (approximately) Gaussian when the noise added at each forward step is small. (DDPM)
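For concreteness, here is a minimal sketch of the forward process in closed form, $\mathbf{x}_t = \sqrt{\bar{\alpha}_t} \mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t} \mathbf{\epsilon}$. The linear schedule values below follow DDPM; the helper name `q_sample` is just illustrative.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)       # per-step noise levels beta_t (linear schedule)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)   # \bar{alpha}_t = prod_{s<=t} alpha_s

def q_sample(x0, t, eps=None):
    """Sample x_t ~ q(x_t | x_0) directly: x_t = sqrt(a_bar_t) x_0 + sqrt(1 - a_bar_t) eps."""
    if eps is None:
        eps = torch.randn_like(x0)
    a_bar = alpha_bars[t]
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
```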
What is the difference between the loss functions in Sohl-Dickstein and Ho?
Sohl-Dickstein - maximize the model log-likelihood $L$ by maximizing its lower bound $K$ ($L \geq K$):
\[L = \int d \mathbf{x}^{(0)} q(\mathbf{x}^{(0)}) \log p(\mathbf{x}^{(0)}) = \int d \mathbf{x}^{(0)} q(\mathbf{x}^{(0)}) \log \left [ \int d\mathbf{x}^{(1 \dots T)} q(\mathbf{x}^{(1 \dots T)} | \mathbf{x}^{(0)}) p(\mathbf{x}^{(T)}) \prod_{t=1}^T \frac{p(\mathbf{x}^{(t-1)} | \mathbf{x}^{(t)})}{q(\mathbf{x}^{(t)} | \mathbf{x}^{(t-1)})}\right ].\]

\[K = - \sum_{t=2}^T \int d \mathbf{x}^{(0)} d \mathbf{x}^{(t)} q(\mathbf{x}^{(0)},\mathbf{x}^{(t)}) D_{KL} \left ( q(\mathbf{x}^{(t-1)} | \mathbf{x}^{(t)} , \mathbf{x}^{(0)}) | | p(\mathbf{x}^{(t-1)} | \mathbf{x}^{(t)})\right ) + H_q(\mathbf{x}^{(T)}| \mathbf{x}^{(0)}) - H_q(\mathbf{x}^{(1)} | \mathbf{x}^{(0)}) - H_p(\mathbf{x}^{(T)}).\]

Ho - minimize the variational bound $L$ on the negative log-likelihood:
\[L = \mathbb{E}_q \left [ D_{KL} \left ( q(\mathbf{x}_T | \mathbf{x}_0 ) | | p(\mathbf{x}_T) \right ) + \sum_{t > 1} D_{KL}(q(\mathbf{x}_{t-1} | \mathbf{x}_t, \mathbf{x}_0) | | p_\theta(\mathbf{x}_{t-1} | \mathbf{x}_t)) - \log p_\theta(\mathbf{x}_0 | \mathbf{x}_1) \right ]\]

The first term has no trainable parameters; the remaining parameterized terms, once $p_\theta$ is reparameterized to predict the noise and the weighting is dropped, reduce to the simplified objective
\[L_{simple} =\mathbb{E}_{t, \mathbf{x}_0, \mathbf{\epsilon}} \left [ \| \mathbf{\epsilon} - \mathbf{\epsilon}_{\theta} (\sqrt{\bar{\alpha}_t} \mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t} \mathbf{\epsilon}, t)\|^2 \right ] .\]
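A minimal sketch of one training step with $L_{simple}$, assuming `model` stands for any noise-prediction network $\mathbf{\epsilon}_\theta(\mathbf{x}_t, t)$ (the schedule repeats the one from the forward-process sketch above):

```python
import torch
import torch.nn.functional as F

T = 1000
alpha_bars = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, T), dim=0)  # \bar{alpha}_t

def simple_loss(model, x0):
    """One Monte Carlo estimate of L_simple for a batch x0."""
    t = torch.randint(0, T, (x0.shape[0],))                   # uniform random timestep per sample
    eps = torch.randn_like(x0)                                 # target noise
    a_bar = alpha_bars[t].view(-1, *([1] * (x0.dim() - 1)))   # broadcast over data dimensions
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps      # closed-form sample from q(x_t | x_0)
    return F.mse_loss(model(x_t, t), eps)                      # || eps - eps_theta(x_t, t) ||^2
```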
What are the improvements of different samplers?

DDPM $\to$ DDIM
First, the forward process is viewed from a different, non-Markovian perspective:
\[q_\sigma(\mathbf{x}_{1:T} | \mathbf{x}_0) = q_\sigma(\mathbf{x}_T | \mathbf{x}_0) \prod_{t=2}^T q_{\sigma}(\mathbf{x}_{t-1} | \mathbf{x}_t, \mathbf{x}_0)\]

During generation, only a subsequence of $S$ diffusion steps $\{\tau_1, \dots, \tau_S\}$ is sampled, which greatly reduces the number of network evaluations.
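A minimal sketch of the resulting deterministic sampler ($\sigma = 0$); `model`, the step spacing, and the schedule are illustrative assumptions, not a specific implementation.

```python
import torch

T = 1000
alpha_bars = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, T), dim=0)  # \bar{alpha}_t

@torch.no_grad()
def ddim_sample(model, shape, S=50):
    """Deterministic DDIM sampling over a subsequence of S out of T steps."""
    taus = torch.linspace(T - 1, 0, S).long()                  # {tau_1, ..., tau_S}, descending
    x = torch.randn(shape)                                      # x_T ~ N(0, I)
    for i in range(S - 1):
        t, t_prev = taus[i], taus[i + 1]
        t_batch = torch.full((shape[0],), t.item(), dtype=torch.long)
        eps = model(x, t_batch)                                 # predicted noise eps_theta(x_t, t)
        a_bar, a_bar_prev = alpha_bars[t], alpha_bars[t_prev]
        x0_pred = (x - (1 - a_bar).sqrt() * eps) / a_bar.sqrt()           # predicted x_0
        x = a_bar_prev.sqrt() * x0_pred + (1 - a_bar_prev).sqrt() * eps   # sigma = 0 update
    return x
```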
$\to$ PNDM
$\to$ DEIS
$\to$ DPM-Solver
What is the meaning of “stable” in Stable Diffusion?
I don’t see anything “stable” in the paper, and it’s not a good name… The method’s original name is latent diffusion.
Conditional generation
Based on the score-based approach, there is text-guided generation, which generates data conditioned on a text prompt.
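One way to see how the condition enters the score is Bayes’ rule applied to the score function, with $y$ denoting the text condition:

\[\nabla_{\mathbf{x}} \log p(\mathbf{x} | y) = \nabla_{\mathbf{x}} \log p(\mathbf{x}) + \nabla_{\mathbf{x}} \log p(y | \mathbf{x}),\]

so a conditional sampler can combine the unconditional score with the gradient of a model for $p(y \vert \mathbf{x})$ (classifier guidance).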
References / Further reading
- Weng, Lilian. (Jul 2021). What are diffusion models? Lil’Log. https://lilianweng.github.io/posts/2021-07-11-diffusion-models/
- https://yang-song.net/blog/2021/score/
- https://github.com/acids-ircam/diffusion_models