Hi, @hunto.
Thanks for answering my previous questions: https://github.com/hunto/DiffKD/issues/3
Your work is very meaningful and could bring new ideas to knowledge distillation, which led me to try adapting your code to other computer vision tasks such as human pose estimation. However, I still have several questions about your paper and model, and they have caused me a number of problems.
My questions are as follows:
When the diffusion model processes teacher or student features, does it apply any preprocessing such as normalization to them? I ask because some diffusion model implementations I have seen preprocess the model's input, but I have not found anything similar in your code or paper.
In your paper, you state that a 1x1 convolution is used to match the number of channels of the student features to that of the latent teacher features, and I found this operation in your code. However, after the diffusion model denoises the student features, should another 1x1 convolution restore the original number of student feature channels, or should the settings of the student head be modified directly?
Although my implementation does denoise the student features, the noise adaptive matching module does not seem to work: its output γ does not decrease during training but stays equal to 1.
γ is essentially equal to 1 within the first epoch and remains at 1 thereafter. Have you encountered a similar situation? Does this mean that the effectiveness of the module may depend on the task or dataset?
I think it works both with and without normalization, since the original features (images) in diffusion are not required to be Gaussian.
The transformed student features are only used for distillation; the original dimensions in the student are not changed.
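To illustrate the point about dimensions, here is a minimal NumPy sketch, not the DiffKD code itself. The channel counts, the `conv1x1` helper, and the MSE loss are all assumptions made for illustration, and the denoising step is omitted. It only shows that the 1x1 projection sits on the distillation branch while the student head keeps consuming the original feature.

```python
import numpy as np


def conv1x1(feat, weight, bias):
    """Apply a 1x1 convolution to a (C_in, H, W) feature map.

    A 1x1 convolution is a per-pixel linear map over channels, so it can be
    written as a contraction over the channel axis.
    """
    return np.einsum("oc,chw->ohw", weight, feat) + bias[:, None, None]


rng = np.random.default_rng(0)

# Hypothetical sizes: 64 student channels, 256 latent teacher channels.
c_student, c_teacher, h, w = 64, 256, 8, 8
student_feat = rng.standard_normal((c_student, h, w))
teacher_latent = rng.standard_normal((c_teacher, h, w))

# The projection belongs to the distillation branch only.
proj_w = rng.standard_normal((c_teacher, c_student)) * 0.01
proj_b = np.zeros(c_teacher)

# Distillation branch: project, (denoise, omitted here), compare to teacher.
projected = conv1x1(student_feat, proj_w, proj_b)
kd_loss = float(np.mean((projected - teacher_latent) ** 2))  # placeholder loss

# Task branch: the student head still receives the untouched feature,
# so no second 1x1 convolution is needed to restore the channel count.
head_input = student_feat
```

In this sketch `projected` has the teacher's channel count while `head_input` keeps the student's, so the student head needs no changes.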
The value of gamma depends on your task and models. If the teacher and student features contain a similar amount of noise, gamma should be close to 1, since no additional noise needs to be added.
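As a rough illustration of why gamma close to 1 means "no extra noise", the sketch below assumes the blend takes the form `gamma * z + (1 - gamma) * eps` with `eps` drawn from a unit Gaussian. That form, the function name `noise_adaptive_blend`, and the sample values are assumptions for this example and are not copied from the repository.

```python
import random


def noise_adaptive_blend(z, gamma, rng):
    """Blend a feature vector with unit Gaussian noise.

    Assumed form: gamma * z + (1 - gamma) * eps, with eps ~ N(0, 1).
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return [gamma * v + (1.0 - gamma) * rng.gauss(0.0, 1.0) for v in z]


rng = random.Random(0)
z_student = [0.5, -1.2, 0.3, 2.0]  # hypothetical flattened student feature

# gamma = 1: the noise term is multiplied by zero, so the feature is unchanged.
unchanged = noise_adaptive_blend(z_student, 1.0, rng)

# gamma < 1: part of the signal is replaced by Gaussian noise.
noisier = noise_adaptive_blend(z_student, 0.7, rng)
```

Under this assumed form, a gamma that settles at 1 simply passes the student feature through as is, which is consistent with the explanation above that no additional noise is needed when the teacher and student features are similarly noisy.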