Mask CLS at the end of seq #550
Conversation
I believe fixing this issue requires re-training CoCa models. However, it's not surprising to me that the currently trained models work well: the [CLS] token was still able to "see" all the relevant preceding text, just not itself. Fixing the bug without re-training the models would hurt retrieval/classification performance, I guess.
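For illustration, here is a minimal sketch of the masking behavior described above. This is a hypothetical standalone example, not open_clip's actual mask-building code: it builds an additive causal attention mask over `seq_len` text tokens plus one [CLS] token appended at the end, with a `buggy` flag reproducing the reported behavior where [CLS] attends to every preceding token but not itself.

```python
NEG_INF = float("-inf")

def cls_attn_mask(seq_len, buggy=False):
    # Additive attention mask over seq_len text tokens plus one [CLS]
    # appended at the end (hypothetical sketch, not open_clip's exact code).
    # 0.0 means "may attend"; -inf means "masked out".
    n = seq_len + 1
    # Standard causal mask: position i may attend to positions j <= i.
    mask = [[0.0 if j <= i else NEG_INF for j in range(n)] for i in range(n)]
    if buggy:
        # The reported bug: the [CLS] query is blocked from its own position,
        # so it sees all preceding tokens but not itself.
        mask[-1][-1] = NEG_INF
    return mask
```

Under plain causal masking the last row (the [CLS] query) is all zeros, so [CLS] attends to every token including itself; the buggy variant masks only the diagonal entry of that row, which matches why the trained models still perform reasonably.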
Indeed, it likely needs retraining with this change.
@rwightman what would be the best way to address this? Does it make sense to retrain the model?
my opinion is
but first fix the CI here
@rom1504, @iejMac it appears to be an issue with caching older test run times or something similar; removing lines 96 to 98 of open_clip/.github/workflows/ci.yml (at fb72f4d) seems to fix it. However, I am not super confident about GitHub CI.
@rom1504 sorry for the multiple tags; I think it was a caching issue, but honestly I'm not sure how it got fixed.
Moved to #551 |
@yiren-jiran thanks for finding the bug, in #549 it looks like I made a mistake in creating the cls mask.
This PR should be the fix; will have to look at how much this change affects model performance.