Table 6 question #5
Hi @rkdckddnjs9, yep. The denoising process is not involved in the student's inference after KD training. During KD, the student head still uses the original feature (not the denoised one) for prediction. You can think of the denoising as a stronger alignment module that transforms the student features, and the transformed features are used ONLY for distillation.
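The data flow described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual implementation: `denoise` below is a hypothetical placeholder for the diffusion-based alignment module, and the shapes and `head` are made up for the example. The point is only the wiring: the distillation loss consumes the denoised student feature, while the task head (in both training and inference) consumes the original feature, so the deployed student gains no parameters or FLOPs.

```python
import numpy as np

rng = np.random.default_rng(0)

def head(feat, w):
    # Student task head: always consumes the ORIGINAL feature.
    return feat @ w

def denoise(feat, steps=4):
    # Hypothetical stand-in for the diffusion denoising module;
    # here just a simple smoothing transform for illustration.
    out = feat.copy()
    for _ in range(steps):
        out = 0.5 * out + 0.5 * out.mean(axis=-1, keepdims=True)
    return out

student_feat = rng.normal(size=(2, 8))  # toy student features
teacher_feat = rng.normal(size=(2, 8))  # toy teacher features
w = rng.normal(size=(8, 4))             # toy head weights

# Training: the distillation loss sees only the DENOISED student feature.
distill_loss = np.mean((denoise(student_feat) - teacher_feat) ** 2)
# The prediction path uses the original feature, during training too.
train_logits = head(student_feat, w)

# Inference: denoise() is never called, so params/FLOPs are unchanged.
inference_logits = head(student_feat, w)
```

Since the inference path never touches `denoise`, removing the denoising module after training leaves the student's predictions (and its parameter/FLOP count) identical, which is why the numbers in Table 6 do not change.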
Oh @hunto,
Yes. Using the diffusion model to also transform the feature at inference time is an interesting idea, but I didn't try it in the paper since it could raise issues of unfair comparison between student architectures.
Thank you for your excellent work!
In Table 6, the student's parameters and FLOPs do not change. Is this because the student features are fed into the head without going through the denoising process?