-
Notifications
You must be signed in to change notification settings - Fork 287
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Igor lig 4447 w mse benchmark #1474
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #1474 +/- ##
==========================================
- Coverage 85.50% 85.45% -0.05%
==========================================
Files 135 135
Lines 5657 5672 +15
==========================================
+ Hits 4837 4847 +10
- Misses 820 825 +5 ☔ View full report in Codecov by Sentry. |
benchmarks/imagenet/resnet50/wmse.py
Outdated
|
||
# we use a projection head with output dimension 64 | ||
# and w_size of 128 to support a batch size of 256 | ||
self.projection_head = WMSEProjectionHead(output_dim=64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the output dimension is wrong here. From the paper:
Finally, we use an embedding size
of 64 for CIFAR-10 and CIFAR-100, and an embedding of
size of 128 for STL-10 and Tiny ImageNet. For ImageNet-
100 we use a configuration similar to the Tiny ImageNet
experiments, and 240 epochs of training. Finally, in the
ImageNet experiments (Tab. 3), we use the implementation
and the hyperparameter configuration of (Chen et al., 2020b)
(same number of layers in the projection head, etc.) based
on their open-source implementation2, the only difference
being the learning rate and the loss function (respectively,
0.075 and the contrastive loss in (Chen et al., 2020b) vs. 0.1
and Eq. 6 in W-MSE 4
So they're using a SimCLR2 projection head.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And most likely the embedding dim is the same as the one for SimCLR2.
benchmarks/imagenet/resnet50/wmse.py
Outdated
self.projection_head = WMSEProjectionHead(output_dim=64) | ||
|
||
self.criterion_WMSE4loss = WMSELoss( | ||
w_size=128, embedding_dim=64, num_samples=4, gather_distributed=True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For ImageNet they probably use w_size=256
:
For CIFAR-10
and CIFAR-100, the slicing sub-batch size is 128, for Tiny
ImageNet and STL-10, it is 256
"weight_decay": 0.0, | ||
}, | ||
], | ||
lr=0.1 * math.sqrt(self.batch_size_per_device * self.trainer.world_size), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The denominator is missing here.
@@ -59,10 +61,18 @@ def forward(self, x: torch.Tensor) -> torch.Tensor: | |||
|
|||
f_cov_shrinked = (1 - self.eps) * f_cov + self.eps * eye | |||
|
|||
# get type of f_cov_shrinked and temporary convert to full precision | |||
# to support chelosky decomposition | |||
f_cov_shrinked_type = f_cov_shrinked.dtype |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks super duper hacky. Why is it necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, as written in the comment. The original code is not using half precision.
if self.gather_distributed and dist.is_initialized(): | ||
world_size = dist.get_world_size() | ||
if world_size > 1: | ||
input = torch.cat(gather(input), dim=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure this is correct? Intuitively I think there could be problems because now every device computes the exact same loss, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed it but I will add it again. That seems the most easy and proper way to support multi-GPU training. I'll make sure we divide the loss by the number of devices to make runs more comparable between different multi-gpu setups.
@@ -699,6 +699,27 @@ def __init__( | |||
) | |||
|
|||
|
|||
class WMSEProjectionHead(SimCLRProjectionHead): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need this. We should be able to use SimCLR instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@guarin, we should make sure things are consistent. I'm not sure what we agreed on. AFAIK, The same goes for the transforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't the default values different?
In any case, I prefer if all components of the WMSE
model are called WMSESomething
. Mixing components from different models is always confusing and it makes the components harder to discover in the code. If two models have the same head then we can just subclass from the first model and update the docstring.
num_layers: int = 2, | ||
batch_norm: bool = True, | ||
): | ||
super(WMSEProjectionHead, self).__init__( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super(WMSEProjectionHead, self).__init__( | |
super().__init__( |
In general the class should not be passed to the super method.
Changes