
Choice of hyperparameter values #2

Open
robincourant opened this issue Apr 15, 2024 · 1 comment

Comments

@robincourant

Hi,

Firstly, thank you very much for your work!

I am curious about how you determined the values of $P_{\text{mean}}$ and $P_{\text{std}}$ as per the "Loss weighting" paragraph in Section B.2, where it states "$P_{\text{mean}} = -0.4$ and $P_{\text{std}} = 1.0$ instead of $-1.2$ and $1.2$".

Is there an underlying intuition behind these values, or are they the result of a pure hyperparameter search?
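For reference, my understanding (a minimal PyTorch sketch, not necessarily the repo's exact code) is that these two parameters control the log-normal noise-level distribution of Equation 16, i.e. $\log \sigma \sim \mathcal{N}(P_{\text{mean}}, P_{\text{std}}^2)$:

```python
import torch

# Values from Appendix B.2 for VAE latents (RGB images use -1.2 and 1.2).
P_mean, P_std = -0.4, 1.0

def sample_sigma(batch_size: int) -> torch.Tensor:
    # Equation 16: log(sigma) is normally distributed, so sigma is log-normal.
    rnd_normal = torch.randn(batch_size)
    return (rnd_normal * P_std + P_mean).exp()

sigma = sample_sigma(100_000)
print(sigma.log().mean().item(), sigma.log().std().item())  # ~ -0.4 and ~ 1.0
```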

@rupertmenneer

It seems they provide a rough intuition in the same paragraph:

Loss weighting. With the EDM training loss (Equation 14), the quality of the resulting distribution tends to be quite sensitive to the choice of $P_{\text{mean}}$, $P_{\text{std}}$, and $\lambda(\sigma)$. The role of $P_{\text{mean}}$ and $P_{\text{std}}$ is to focus the training effort on the most important noise levels, whereas $\lambda(\sigma)$ aims to ensure that the gradients originating from each noise level are roughly of the same magnitude. Referring to Figure 5a of Karras et al. [37], the value of $\mathcal{L}(D_\theta; \sigma)$ behaves somewhat unevenly over the course of training: it remains largely unchanged for the lowest and highest noise levels, but drops quickly for the ones in between. Karras et al. [37] suggest setting $P_{\text{mean}}$ and $P_{\text{std}}$ so that the resulting log-normal distribution (Equation 16) roughly matches the location of this in-between region. When operating with VAE latents, we have observed that the in-between region has shifted considerably toward higher noise levels compared to RGB images. We thus set $P_{\text{mean}} = -0.4$ and $P_{\text{std}} = 1.0$ instead of $-1.2$ and $1.2$, respectively, to roughly match its location.
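To make the two mechanisms in that paragraph concrete, here is a rough sketch of how they fit together in an EDM-style training loss (my paraphrase, not the repo's code; `sigma_data = 0.5` and the exact broadcasting are assumptions based on the original EDM paper):

```python
import torch

sigma_data = 0.5  # assumed std of the training data, as in the original EDM setup

def lambda_weight(sigma):
    # lambda(sigma): aims to equalize the gradient magnitude
    # contributed by each noise level.
    return (sigma ** 2 + sigma_data ** 2) / (sigma * sigma_data) ** 2

def edm_loss(D_theta, x, P_mean=-0.4, P_std=1.0):
    # Draw one noise level per sample from the log-normal of Equation 16,
    # which focuses training effort on the "most important" noise levels...
    sigma = (torch.randn(x.shape[0], 1, 1, 1, device=x.device) * P_std + P_mean).exp()
    # ...then perturb the data and penalize the weighted denoising error.
    n = torch.randn_like(x) * sigma
    return (lambda_weight(sigma) * (D_theta(x + n, sigma) - x) ** 2).mean()
```

So $P_{\text{mean}}$ and $P_{\text{std}}$ only change how often each $\sigma$ is visited, while $\lambda(\sigma)$ rescales the loss at each visit.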

If you refer to Figure 5a in the original Elucidating paper, it looks like they match the log-normal distribution defined by $P_{\text{mean}}$ and $P_{\text{std}}$ to the loss that "drops quickly for the [noise levels] in between" the highest and lowest, i.e. we want to spend more training time on this middle region.

[Screenshot: Figure 5a from Karras et al., showing the per-noise-level loss $\mathcal{L}(D_\theta; \sigma)$ over the course of training]

I read this as meaning you first need a trained model to measure this loss curve, and only then can you select the P parameters so that the log-normal roughly matches its in-between region.
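As a sketch of that reading (an entirely hypothetical procedure, not something from this repo): evaluate the trained denoiser's loss on a grid of noise levels to get an empirical version of the Figure 5a curves, then place the log-normal over the region where the curve drops fastest:

```python
import numpy as np
import torch

@torch.no_grad()
def empirical_loss_curve(D_theta, batches, sigmas):
    """Hypothetical helper: average denoising loss of a trained model at each
    noise level, i.e. an empirical L(D_theta; sigma) curve like Figure 5a."""
    curve = []
    for sigma in sigmas:
        vals = []
        for x in batches:
            n = torch.randn_like(x) * sigma
            vals.append(((D_theta(x + n, sigma) - x) ** 2).mean().item())
        curve.append(sum(vals) / len(vals))
    return np.array(curve)

# Sweep sigma on a log grid, then choose P_mean/P_std so the log-normal pdf
# over log(sigma) covers the region where this curve is still falling.
sigmas = np.exp(np.linspace(-4.0, 4.0, 32))
```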
