Excellent work! I have a question about data selection.
In the dataset section, you adopted data preprocessing and filtering to speed up training.
What is the preprocessing and filtering strategy? Since pretraining models generally obey the data scaling rule, I think it could make a great difference to the results.
Thanks for the reminder. This part will be added to the updated paper.
In fact, we didn't do any preprocessing; we only did filtering to speed up pre-training.
For LAION, we used the English data only. Following BLIP, we removed an image if its shorter edge is smaller than 224 pixels, or if its aspect ratio (height/width or width/height) is larger than 3.
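The image filters above can be sketched as a small helper; this is a hypothetical illustration (function name and signature are our own, not from the paper's code):

```python
def keep_image(width: int, height: int) -> bool:
    """Return True if an image passes the LAION size/aspect-ratio filters."""
    # Drop images whose shorter edge is under 224 pixels.
    if min(width, height) < 224:
        return False
    # Drop images whose aspect ratio exceeds 3 in either direction.
    if max(height / width, width / height) > 3:
        return False
    return True
```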
For video clip-text pairs, we removed a pair if its text contains fewer than 2 words. Following previous work (I don't remember which one…I need to check it later), we used the CLIP score to filter the data: we sampled one frame from each video clip, computed the CLIP score between that frame and the text, and removed the pair if the score is less than 0.25.
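A minimal sketch of the clip-text pair filter, assuming the frame-text similarity has already been computed by a CLIP model (e.g. via open_clip or HF transformers); the function name and its score argument are illustrative, not from the paper's code:

```python
def keep_pair(text: str, clip_score: float) -> bool:
    """Return True if a video clip-text pair passes both filters."""
    # Drop captions with fewer than 2 words.
    if len(text.split()) < 2:
        return False
    # Drop pairs whose sampled-frame/text CLIP similarity is below 0.25.
    if clip_score < 0.25:
        return False
    return True
```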