Several questions about model training #4

Open
omglet1 opened this issue Nov 3, 2023 · 3 comments

Comments

@omglet1

omglet1 commented Nov 3, 2023

Hi, @hunto.
Thanks for answering my previous questions: https://github.com/hunto/DiffKD/issues/3
Your work is very meaningful and could bring new ideas to knowledge distillation. This led me to try adapting your code to other computer vision tasks, such as human pose estimation. However, I still have a number of questions about your paper and model, and they have caused several problems in my experiments.

My questions are as follows:

  1. When the diffusion model processes teacher or student features, does it apply any preprocessing, such as normalization, to them? I ask because some diffusion model implementations I have seen preprocess the model's input, but I did not find anything like this in your code or paper.

  2. Your paper states that a 1x1 convolution is used to match the channel count of the student features to that of the latent teacher features, and I found this operation in your code. After the diffusion model denoises the student features, should a 1x1 convolution restore the original number of student channels, or should the student head be modified to accept the new channel count?

  3. My implementation does denoise the student features, but, frustratingly, the noise adaptive matching module does not seem to work: its output γ does not decrease during training and instead stays at 1.
    [image: vis_gamma]
    γ reaches approximately 1 within the first epoch and stays at 1 thereafter. Have you encountered a similar situation? Does this mean that the module's effectiveness may depend on the task or dataset?

@BeiDaoya

Hello, how can I tell whether the student features have been successfully denoised?

@hunto
Owner

hunto commented Mar 11, 2024

@omglet1

  1. I think it works both with and without normalization, because the original features (images) in diffusion are not required to be Gaussian.
  2. The transformed student features are used only for distillation; the original dimensions in the student are not changed.
  3. The value of gamma depends on your task and models. If the teacher and student features contain a similar amount of noise, gamma should be close to 1, since no additional noise needs to be added.
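
Points 2 and 3 can be illustrated with a small sketch. It is not taken from the DiffKD repository: NumPy stands in for the training framework, the shapes and the names `project_1x1` and `mix_noise` are hypothetical, and the mixing rule `gamma * feat + (1 - gamma) * noise` is an assumption about how the noise adaptive module blends features with noise, so check it against the actual code.

```python
import numpy as np

def project_1x1(feat, weight):
    """Apply a 1x1 convolution as a per-pixel linear map over channels.

    feat:   (N, C_in, H, W) feature map
    weight: (C_out, C_in) kernel of the 1x1 convolution (bias omitted)
    """
    return np.einsum("oc,nchw->nohw", weight, feat)

def mix_noise(feat, gamma, noise):
    """Blend features with noise (assumed rule, see the lead-in).

    gamma has shape (N,), one weight per sample; gamma = 1 adds no noise.
    """
    g = gamma.reshape(-1, 1, 1, 1)
    return g * feat + (1.0 - g) * noise

rng = np.random.default_rng(0)
student_feat = rng.standard_normal((2, 64, 8, 8))  # hypothetical student feature
weight = rng.standard_normal((256, 64)) * 0.1      # hypothetical 1x1 conv weight

# Side branch used only for the distillation loss. The student head still
# consumes student_feat, so its channel count (64) is untouched.
distill_feat = project_1x1(student_feat, weight)

noise = rng.standard_normal(distill_feat.shape)
gamma = np.array([1.0, 0.7])  # sample 0: no extra noise; sample 1: some noise
noisy = mix_noise(distill_feat, gamma, noise)
```

Under the assumed rule, a γ that saturates at 1 simply passes the projected feature through unchanged, which matches the reading that no additional noise is needed when the teacher and student features are already similarly noisy.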

@hunto
Owner

hunto commented Mar 11, 2024

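Hello, how can I tell whether the student features have been successfully denoised?

@BeiDaoya You can refer to the visualizations in the paper: visualize the original student features, the denoised student features, and the teacher features, and compare how similar they are.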
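
A numerical check can complement the visual comparison. The sketch below is not the method from the paper: it uses cosine similarity as one possible measure, and the arrays are synthetic stand-ins (a hypothetical teacher feature plus different amounts of Gaussian noise) rather than real model outputs. If denoising works, the denoised student features should be closer to the teacher features than the original student features are.

```python
import numpy as np

def cosine_similarity(a, b):
    """Mean per-sample cosine similarity between two feature maps of equal shape."""
    a = a.reshape(a.shape[0], -1)
    b = b.reshape(b.shape[0], -1)
    num = (a * b).sum(axis=1)
    # Small epsilon guards against division by zero for all-zero features.
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-12
    return float((num / den).mean())

rng = np.random.default_rng(0)
teacher = rng.standard_normal((4, 256, 8, 8))                   # stand-in teacher feature
student = teacher + 1.0 * rng.standard_normal(teacher.shape)    # stand-in noisy student
denoised = teacher + 0.3 * rng.standard_normal(teacher.shape)   # stand-in denoised student

sim_before = cosine_similarity(student, teacher)
sim_after = cosine_similarity(denoised, teacher)
print(f"student  vs teacher: {sim_before:.3f}")
print(f"denoised vs teacher: {sim_after:.3f}")
```

With real features, the three arrays would come from the same input batch, and the student features would first need to be projected to the teacher's channel count so that the shapes match.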
