I'm getting strange results when running the code on an RTX 3090 GPU. I first compressed the videos to 3 fps using the script from CLIP4Clip:
https://github.com/ArrowLuo/CLIP4Clip/blob/master/preprocess/compress_video.py
I then froze the CLIP model with the following code:
for param in self.clip.parameters():
    param.requires_grad = False  # exclude CLIP weights from gradient updates
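One way to confirm that the freeze above took effect is to count trainable and frozen parameters and to pass only the trainable ones to the optimizer. The following is a minimal sketch, not the repository's code: `TinyRetrievalModel`, its `head` layer, the layer sizes, and the learning rate are hypothetical stand-ins, and only the `self.clip` freezing loop comes from the snippet above.

```python
import torch
from torch import nn


class TinyRetrievalModel(nn.Module):
    """Hypothetical stand-in: a `clip` backbone plus a small trainable head."""

    def __init__(self):
        super().__init__()
        self.clip = nn.Linear(8, 8)  # placeholder for the real CLIP backbone
        self.head = nn.Linear(8, 4)  # placeholder for modules that should keep training
        # Same freezing loop as in the snippet above.
        for param in self.clip.parameters():
            param.requires_grad = False


def count_parameters(model):
    """Return (trainable, frozen) parameter counts."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
    return trainable, frozen


model = TinyRetrievalModel()
trainable, frozen = count_parameters(model)
print(f"trainable={trainable} frozen={frozen}")

# Hand only the parameters that still require gradients to the optimizer.
# The learning rate is an arbitrary illustrative value.
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4
)
```

If the trainable count is zero, or much smaller than expected, nothing outside CLIP is learning, which is one possible reason for a loss that does not move.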
The training log on MSRVTT is as follows:
[2024-05-12 08:25:31,329 tvr 320 INFO]: eta: 4:50:08 epoch: 2/5 iteration: 3800/7030 time: 1.3135 (5.3897) data: 0.4849 (4.5103) loss: 6.1797 (6.1809) E_loss: 6.1559 (6.1561) M_loss: 0.0250 (0.0248) lr: logit_scale: 100.00max mem: 8443
[2024-05-12 08:28:30,665 tvr 320 INFO]: eta: 4:44:24 epoch: 2/5 iteration: 3850/7030 time: 1.3637 (5.3663) data: 0.4905 (4.4867) loss: 6.1970 (6.1808) E_loss: 6.1726 (6.1559) M_loss: 0.0248 (0.0248) lr: logit_scale: 100.00max mem: 8443
[2024-05-12 08:31:26,774 tvr 320 INFO]: eta: 4:38:42 epoch: 2/5 iteration: 3900/7030 time: 1.2943 (5.3427) data: 0.4724 (4.4631) loss: 6.1943 (6.1810) E_loss: 6.1701 (6.1561) M_loss: 0.0245 (0.0248) lr: logit_scale: 100.00max mem: 8443
[2024-05-12 08:31:26,780 tvr 485 INFO]: [start] extract train feature
[2024-05-12 08:35:03,700 tvr 505 INFO]: [finish] extract train feature
[2024-05-12 08:35:03,700 tvr 546 INFO]: [start] extract text+video feature
[2024-05-12 08:35:33,605 tvr 573 INFO]: [finish] extract text+video feature
[2024-05-12 08:35:33,605 tvr 577 INFO]: 1000 1000 1000 1000
[2024-05-12 08:35:33,605 tvr 581 INFO]: [start] calculate the similarity
[2024-05-12 08:35:33,605 tvr 387 INFO]: [finish] map to main gpu
[2024-05-12 08:35:33,609 tvr 401 INFO]: [finish] map to main gpu
[2024-05-12 08:36:08,858 tvr 584 INFO]: [end] calculate the similarity
[2024-05-12 08:36:08,858 tvr 587 INFO]: [start] compute_metrics
[2024-05-12 08:36:08,858 tvr 613 INFO]: sim matrix size: 1000, 1000
[2024-05-12 08:36:08,878 tvr 616 INFO]: Length-T: 1000, Length-V:1000
[2024-05-12 08:36:08,878 tvr 618 INFO]: [end] compute_metrics
[2024-05-12 08:36:08,878 tvr 621 INFO]: time profile: feat 29.9s match 35.25275s metrics 0.01992s
[2024-05-12 08:36:08,878 tvr 623 INFO]: Text-to-Video: R@1: 0.5 - R@5: 1.1 - R@10: 1.4 - R@50: 4.4 - Median R: 798.0 - Mean R: 683.1
[2024-05-12 08:36:08,878 tvr 625 INFO]: Video-to-Text: R@1: 0.6 - R@5: 1.1 - R@10: 1.7 - R@50: 4.6 - Median R: 810.5 - Mean R: 686.7
[2024-05-12 08:36:09,399 tvr 239 INFO]: Model saved to /root/autodl-tmp/outputs/pytorch_model.bin.step3900.2
[2024-05-12 08:36:10,072 tvr 239 INFO]: Model saved to /root/autodl-tmp/outputs/pytorch_model.bin.best.2
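For reading the Text-to-Video and Video-to-Text lines in the log above, here is a minimal sketch of how recall and rank metrics are commonly computed from a similarity matrix. It assumes that the correct video for text `i` sits at index `i` (the diagonal), which may not match the repository's implementation; `retrieval_metrics` and the toy matrix are illustrative names and data, not taken from the codebase.

```python
import numpy as np


def retrieval_metrics(sim):
    """Compute recall and rank statistics for a square similarity matrix.

    sim[i, j] is the similarity of query i and candidate j; the correct
    candidate for query i is assumed to be j == i.
    """
    sim = np.asarray(sim, dtype=float)
    correct = np.diag(sim)[:, None]
    # Rank of the correct item: candidates scoring strictly higher, plus one.
    ranks = (sim > correct).sum(axis=1) + 1
    return {
        "R@1": 100.0 * float(np.mean(ranks <= 1)),
        "R@5": 100.0 * float(np.mean(ranks <= 5)),
        "R@10": 100.0 * float(np.mean(ranks <= 10)),
        "MedianR": float(np.median(ranks)),
        "MeanR": float(np.mean(ranks)),
    }


# Toy 3x3 example: queries 0 and 2 rank their match first, query 1 ranks it second.
sim = np.array([
    [0.9, 0.1, 0.2],
    [0.8, 0.3, 0.1],
    [0.2, 0.1, 0.7],
])
metrics = retrieval_metrics(sim)
print(metrics)
```

Passing the transposed matrix (`sim.T`) gives the metrics for the opposite retrieval direction under the same diagonal assumption.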
The loss stays around 6.18 and recall remains very low (R@1 of 0.5 to 0.6). Could you give me some suggestions for dealing with this problem? Thanks.