Invalid gradient when fine-tuning, and learning rate with gradient clip setting #65
Comments
Sorry for the lack of detail, I'm on my phone and can't check things right now. If you have issues with stability, you could check which params give NaNs and manually use fp32 there. You might also want to freeze the batchnorm of the network; I've found the batchnorm can cause a lot of issues.
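A minimal PyTorch sketch of the two suggestions above, assuming a generic `model` module (names are placeholders, not from the RoMa codebase): one helper reports which parameters received non-finite gradients after `loss.backward()`, the other freezes all batchnorm layers.

```python
import torch
import torch.nn as nn

def report_nonfinite_grads(model: nn.Module) -> None:
    """Print the names of parameters whose gradients contain NaN/Inf.
    Call this right after loss.backward() when training becomes unstable."""
    for name, param in model.named_parameters():
        if param.grad is not None and not torch.isfinite(param.grad).all():
            print(f"non-finite gradient in: {name}")

def freeze_batchnorm(model: nn.Module) -> None:
    """Freeze all BatchNorm layers: stop updating running stats and affine params."""
    for module in model.modules():
        if isinstance(module, nn.modules.batchnorm._BatchNorm):
            module.eval()                    # keep running mean/var fixed
            for p in module.parameters():
                p.requires_grad_(False)      # stop training weight/bias

# Example usage: model.train() re-enables BN, so freeze after every train() call.
# model.train()
# freeze_batchnorm(model)
# ...
# loss.backward()
# report_nonfinite_grads(model)
```

For the params that turn out to be unstable, one option is to keep those specific modules in fp32 (e.g. exclude them from the autocast region); the exact mechanism depends on how mixed precision is set up in your training loop.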
How many days did you spend training the RoMa model? I also find that if I replace DINO with another ViT, the training results are bad.
I'm stuck with the same problem. Do you have any ideas on how to solve it?
It was trained for 4 days with 4 A100 GPUs. You can also avoid issues by using bfloat16 instead of float16.
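A sketch of switching mixed precision to bfloat16, assuming a standard PyTorch AMP loop (`model`, `optimizer`, `loader`, `compute_loss` are placeholder names, not the project's actual training code). bfloat16 shares float32's exponent range, so it overflows far less often than float16 and does not need loss scaling; the scaler is kept only for the float16 fallback.

```python
import torch

amp_dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16

# GradScaler is only needed for float16; it is a no-op when disabled.
scaler = torch.cuda.amp.GradScaler(enabled=(amp_dtype == torch.float16))

for batch in loader:
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=amp_dtype):
        loss = compute_loss(model, batch)      # placeholder for the actual loss
    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)                 # so clipping sees the real gradients
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    scaler.step(optimizer)
    scaler.update()
```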
The gradient is NaN when training from scratch. Is there any solution to this?
Hello, I compared RoMa and DKM and found that the main differences are the coordinate_decoder implementation and the use of DINO features. What I have trouble understanding is that RoMa seems to be harder to converge and more prone to NaNs, while DKM doesn't even need gradient clipping or scaling.
Hi Author,
Thank you for sharing this project and for your kindness in answering my previous questions. I have some questions I want to ask about training:
Thank you so much.