Paper List
Estimation of Non-Normalized Statistical Models by Score Matching
Deep Unsupervised Learning using Nonequilibrium Thermodynamics
Generative Modeling by Estimating Gradients of the Data Distribution
Denoising Diffusion Probabilistic Models
Denoising Diffusion Implicit Models
Pseudo Numerical Methods for Diffusion Models on Manifolds
Fast Sampling of Diffusion Models with Exponential Integrator
High-Resolution Image Synthesis with Latent Diffusion Models
DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps
Diffusion Models: A Comprehensive Survey of Methods and Applications
What is the problem / phenomenon?
Diffusion probabilistic models are used to generate high-quality images.
Based on a forward process that gradually adds random noise to an image, the model learns the reverse process, reconstructing the image from pure noise by undoing the noising steps one at a time.
The reverse-process distribution $q(x_{t-1} \vert x_t)$ is also (approximately) Gaussian when the noise added at each forward step is small. (DDPM)
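For concreteness, here is a minimal sketch of the forward process in closed form, $\mathbf{x}_t = \sqrt{\bar{\alpha}_t} \mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t} \mathbf{\epsilon}$. The linear schedule values below follow DDPM; the helper name `q_sample` is just illustrative.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)       # per-step noise levels beta_t (linear schedule)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)   # \bar{alpha}_t = prod_{s<=t} alpha_s

def q_sample(x0, t, eps=None):
    """Sample x_t ~ q(x_t | x_0) directly: x_t = sqrt(a_bar_t) x_0 + sqrt(1 - a_bar_t) eps."""
    if eps is None:
        eps = torch.randn_like(x0)
    a_bar = alpha_bars[t]
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
```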
What is the difference between the loss functions in Sohl-Dickstein and Ho?
Sohl-Dickstein - maximize the model log-likelihood $L$ by maximizing its lower bound $K$ ($L \geq K$):
\[L = \int d \mathbf{x}^{(0)} q(\mathbf{x}^{(0)}) \log p(\mathbf{x}^{(0)}) = \int d \mathbf{x}^{(0)} q(\mathbf{x}^{(0)}) \log \left [ \int d\mathbf{x}^{(1 \dots T)} q(\mathbf{x}^{(1 \dots T)} | \mathbf{x}^{(0)}) p(\mathbf{x}^{(T)}) \prod_{t=1}^T \frac{p(\mathbf{x}^{(t-1)} | \mathbf{x}^{(t)})}{q(\mathbf{x}^{(t)} | \mathbf{x}^{(t-1)})}\right ].\]

\[K = - \sum_{t=2}^T \int d \mathbf{x}^{(0)} d \mathbf{x}^{(t)} q(\mathbf{x}^{(0)},\mathbf{x}^{(t)}) D_{KL} \left ( q(\mathbf{x}^{(t-1)} | \mathbf{x}^{(t)} , \mathbf{x}^{(0)}) | | p(\mathbf{x}^{(t-1)} | \mathbf{x}^{(t)})\right ) + H_q(\mathbf{x}^{(T)}| \mathbf{x}^{(0)}) - H_q(\mathbf{x}^{(1)} | \mathbf{x}^{(0)}) - H_p(\mathbf{x}^{(T)}).\]

Ho - minimize the variational bound $L$ on the negative log-likelihood:
\[L = \mathbb{E}_q \left [ D_{KL} \left ( q(\mathbf{x}_T | \mathbf{x}_0 ) | | p(\mathbf{x}_T) \right ) + \sum_{t > 1} D_{KL}(q(\mathbf{x}_{t-1} | \mathbf{x}_t, \mathbf{x}_0) | | p_\theta(\mathbf{x}_{t-1} | \mathbf{x}_t)) - \log p_\theta(\mathbf{x}_0 | \mathbf{x}_1) \right ]\]

The first term has no trainable parameters; the remaining parameterized terms, once $p_\theta$ is reparameterized to predict the noise and the weighting is dropped, reduce to the simplified objective
\[L_{simple} =\mathbb{E}_{t, \mathbf{x}_0, \mathbf{\epsilon}} \left [ \| \mathbf{\epsilon} - \mathbf{\epsilon}_{\theta} (\sqrt{\bar{\alpha}_t} \mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t} \mathbf{\epsilon}, t)\|^2 \right ] .\]
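A minimal sketch of one training step with $L_{simple}$, assuming `model` stands for any noise-prediction network $\mathbf{\epsilon}_\theta(\mathbf{x}_t, t)$ (the schedule repeats the one from the forward-process sketch above):

```python
import torch
import torch.nn.functional as F

T = 1000
alpha_bars = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, T), dim=0)  # \bar{alpha}_t

def simple_loss(model, x0):
    """One Monte Carlo estimate of L_simple for a batch x0."""
    t = torch.randint(0, T, (x0.shape[0],))                   # uniform random timestep per sample
    eps = torch.randn_like(x0)                                 # target noise
    a_bar = alpha_bars[t].view(-1, *([1] * (x0.dim() - 1)))   # broadcast over data dimensions
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps      # closed-form sample from q(x_t | x_0)
    return F.mse_loss(model(x_t, t), eps)                      # || eps - eps_theta(x_t, t) ||^2
```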
What are the improvements of different samplers?

DDPM $\to$ DDIM
First, the forward process is viewed from a different, non-Markovian perspective:
\[q_\sigma(\mathbf{x}_{1:T} | \mathbf{x}_0) = q_\sigma(\mathbf{x}_T | \mathbf{x}_0) \prod_{t=2}^T q_{\sigma}(\mathbf{x}_{t-1} | \mathbf{x}_t, \mathbf{x}_0)\]

During generation, only a subsequence of $S$ diffusion steps $\{\tau_1, \dots, \tau_S\}$ is sampled, which greatly reduces the number of network evaluations.
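A minimal sketch of the resulting deterministic sampler ($\sigma = 0$); `model`, the step spacing, and the schedule are illustrative assumptions, not a specific implementation.

```python
import torch

T = 1000
alpha_bars = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, T), dim=0)  # \bar{alpha}_t

@torch.no_grad()
def ddim_sample(model, shape, S=50):
    """Deterministic DDIM sampling over a subsequence of S out of T steps."""
    taus = torch.linspace(T - 1, 0, S).long()                  # {tau_1, ..., tau_S}, descending
    x = torch.randn(shape)                                      # x_T ~ N(0, I)
    for i in range(S - 1):
        t, t_prev = taus[i], taus[i + 1]
        t_batch = torch.full((shape[0],), t.item(), dtype=torch.long)
        eps = model(x, t_batch)                                 # predicted noise eps_theta(x_t, t)
        a_bar, a_bar_prev = alpha_bars[t], alpha_bars[t_prev]
        x0_pred = (x - (1 - a_bar).sqrt() * eps) / a_bar.sqrt()           # predicted x_0
        x = a_bar_prev.sqrt() * x0_pred + (1 - a_bar_prev).sqrt() * eps   # sigma = 0 update
    return x
```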
$\to$ PNDM
$\to$ DEIS
$\to$ DPM-Solver
What is the meaning of “stable” in Stable Diffusion?
I don’t see anything “stable” in the paper, and it’s not a good name… The method’s original name is latent diffusion.
Conditional generation
Based on the score-based approach, there is text-guided generation, which generates data conditioned on a text prompt.
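One way to see how the condition enters the score is Bayes’ rule applied to the score function, with $y$ denoting the text condition:

\[\nabla_{\mathbf{x}} \log p(\mathbf{x} | y) = \nabla_{\mathbf{x}} \log p(\mathbf{x}) + \nabla_{\mathbf{x}} \log p(y | \mathbf{x}),\]

so a conditional sampler can combine the unconditional score with the gradient of a model for $p(y \vert \mathbf{x})$ (classifier guidance).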
References / Further reading
- Weng, Lilian. (Jul 2021). What are diffusion models? Lil’Log. https://lilianweng.github.io/posts/2021-07-11-diffusion-models/
- https://yang-song.net/blog/2021/score/
- https://github.com/acids-ircam/diffusion_models