base checkpoint selection #10
Comments
Hi @alsbhn, could you please tell me what you mean by "didn't work"? Do you mean the code was not runnable with this setting, or is it something about the performance?
The code runs with no errors; the issue is the performance. When I use "distilbert-base-uncased" or "msmarco-distilbert-margin-mse" as the base checkpoint, the performance improves after a few tens of thousands of steps, as expected. But with other models such as all-mpnet-base-v2 and all-MiniLM-L6-v2, the model does not perform well on my dataset, and the performance even decreases as I train it for more steps.
Thanks for pointing out this issue. I need some time to check what the exact reason could be. As far as I can imagine, there might be four potential reasons:
I see in the code that two models (distilbert-base-uncased, msmarco-distilbert-margin-mse) are recommended as initial checkpoints. I tried to use other Sentence-Transformers models, such as all-mpnet-base-v2, but it didn't work. Is there a difference between the architecture of these models and the one the implementation expects? Which models can be used as initial checkpoints here?
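As a point of comparison, here is a minimal sketch (not from this repository) that loads each checkpoint mentioned in the thread and prints its backbone architecture along with the norm of a sample embedding. The Hub path for the margin-MSE checkpoint is an assumption and may live under a different namespace in your setup:

```python
# Illustrative comparison only -- not part of this repository's code.
# The Hub path for the margin-MSE checkpoint is a guess; adjust as needed.
import numpy as np
from sentence_transformers import SentenceTransformer
from transformers import AutoConfig

CANDIDATES = [
    "distilbert-base-uncased",
    "sentence-transformers/msmarco-distilbert-margin-mse",  # assumed path
    "sentence-transformers/all-mpnet-base-v2",
    "sentence-transformers/all-MiniLM-L6-v2",
]

for name in CANDIDATES:
    # Backbone architecture: DistilBERT vs. MPNet vs. BERT (MiniLM).
    cfg = AutoConfig.from_pretrained(name)
    print(f"{name}: type={cfg.model_type}, "
          f"layers={cfg.num_hidden_layers}, hidden={cfg.hidden_size}")

    # Sentence-Transformers falls back to mean pooling when a checkpoint
    # (e.g. distilbert-base-uncased) ships no pooling config of its own.
    model = SentenceTransformer(name)
    emb = model.encode(["a quick smoke test"], convert_to_numpy=True)
    # Checkpoints with a Normalize module (the all-* models) give norm ~= 1.0.
    print(f"  sample embedding norm: {np.linalg.norm(emb[0]):.3f}")
```

One visible difference this surfaces: the all-* checkpoints ship a final normalization layer and were tuned for cosine similarity, whereas MS MARCO margin-MSE models typically score with an unnormalized dot product; whether that mismatch relates to the degradation reported above is one of the things worth checking.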