Unable to reproduce result for validation set #40
Comments
Hi @sgdgp, some users have reported that they had to train the model themselves instead of using the pretrained models. I still haven't figured out the source of this issue, but it seems like only certain users are affected by this. I'll mention it in the FAQ. You might have better luck with the Docker setup.
Thanks @MohitShridhar. Also, the training is with decoder teacher forcing enabled, right?
Ah no, leave it at the default.
Oh I see. Thanks!
Not sure if this is causing the issue, but check that your versions of torch and torchvision are consistent with requirements.txt.
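Not from the repo, but a quick sketch for comparing the installed versions against what requirements.txt pins (the requirements.txt path is an assumption; run it from the repo root):

```python
# Quick check (not part of the repo): print installed torch/torchvision versions
# so they can be compared against requirements.txt by eye.
import torch
import torchvision

print("torch       ==", torch.__version__)
print("torchvision ==", torchvision.__version__)

# Show what requirements.txt pins (assumes it is in the current directory).
with open("requirements.txt") as f:
    for line in f:
        if line.startswith("torch"):  # matches both torch and torchvision
            print("requirements.txt:", line.strip())
```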
I am trying with the Dockerfile. I will post an update on the status soon.
@sgdgp Have you managed to reproduce the paper's results after all? @MohitShridhar How did you choose the model that produces the results in your paper?
@IgorDroz we picked the best_seen model. You can try training the model yourself (from scratch) if the problem persists.
I trained from scratch and got (valid_seen) while in the paper you achieved: The only difference between me and you is the initialization, but you still got 2x better results in SR. Additionally, I wanted to ask you about testing: is it only done via submission? Or, since the challenge has finished, will you be able to release code with the actual GT for the test set? Thanks,
Sorry, what's the initialization difference? And also, is this inside a Docker container?
No. The leaderboard is a perpetual benchmark for ALFRED. As with any benchmark in the community, the test set will remain a secret to prevent cheating/overfitting. To evaluate on the test set, use the leaderboard submission.
@MohitShridhar The initialization of the neural net, the initial weights. And no, it is not inside a Docker container.
@IgorDroz can you report your resnet checkpoint?
@MohitShridhar How can I check the resnet checkpoint?
@IgorDroz, it's usually inside
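For anyone else wondering, here is a minimal sketch (not from the repo) that lists the checkpoints torchvision has cached and prints a short SHA-256 prefix, which is the hex string embedded in filenames like resnet18-5c106cde.pth. The cache directories below are assumptions and differ across torch versions:

```python
# Sketch (not from the repo): list cached torchvision checkpoints and print a
# short SHA-256 prefix so they can be compared against names like resnet18-5c106cde.pth.
# The cache locations below are assumptions; they vary by torch version.
import hashlib
from pathlib import Path

candidate_dirs = [
    Path.home() / ".cache" / "torch" / "checkpoints",
    Path.home() / ".torch" / "models",
]

for d in candidate_dirs:
    if not d.is_dir():
        continue
    for ckpt in sorted(d.glob("*.pth")):
        digest = hashlib.sha256(ckpt.read_bytes()).hexdigest()[:8]
        print(f"{ckpt}  sha256:{digest}")
```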
@MohitShridhar Sorry for the late answer. So probably this is the difference: I use resnet18-5c106cde.pth. Now it makes sense, thanks!
Oops, sorry. I just checked again. I am also using resnet18-5c106cde.pth. The next thing to try would be to run this inside Docker to make sure the setup is exactly the same.
@MohitShridhar Hi again, I just saw your answer. I am still not able to reproduce your results; Docker shouldn't really matter, since the environment is the same and I should be able to get similar results to yours... A recap of what I tried and what I got:
Which results did you achieve with this model? Because they are pretty far from what you reported in the paper:
This time I used a P100 GPU like you, yet the results are different. ai2thor==2.1.0
@IgorDroz Docker is a way to ensure that the setup is completely identical (CUDA, torch, torchvision, etc.). Check out this work and their reproduced results. Their models are also substantially better than the baselines reported in the ALFRED paper. I am not sure what else could be causing this issue. Sorry.
@MohitShridhar I will definitely check their work out, thanks!
@IgorDroz I don't think the leaderboard topper has made their paper/code publicly available. It's probably a recent submission (or to be submitted), so you'd have to wait for the anonymity period to end.
@MohitShridhar okay, thanks a lot!
Cannot reproduce results either, using the pre-trained best_seen model (and resnet18-5c106cde.pt). I'm on torch==1.9.0 (py3.7_cuda10.2_cudnn7.6.5_0); results look similar to the ones posted above by other users. SR: 8/820 = 0.010. Was anyone able to reproduce the results at all? Just asking.
Hi,
Thanks for the amazing dataset and for sharing your code.
I am unable to reproduce the results for the seen validation set.
I downloaded the checkpoints you provided and I am using best_seen.pth.
I am getting SR 0.0097 and GC 0.0659, whereas the result on val seen in the paper is SR 0.037 and GC 0.1.
Could you point to anything I might have missed?
For starting the X server I used:
sudo nvidia-xconfig -a --use-display-device=None --virtual=1024x786
sudo /usr/bin/X :0 &
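As a side note, here is a small stdlib-only check (not part of the repo) that I would run after those commands to confirm the headless display on :0 actually came up; the socket path is the usual Xorg convention and is stated here as an assumption:

```python
# Sanity check (not part of the repo): confirm the headless X server on :0 is up
# by looking for its UNIX socket at the conventional Xorg location.
import os
from pathlib import Path

display = os.environ.get("DISPLAY", ":0")
socket_path = Path("/tmp/.X11-unix") / ("X" + display.lstrip(":").split(".")[0])

if socket_path.exists():
    print(f"X server for DISPLAY={display} looks up ({socket_path} exists)")
else:
    print(f"No X socket at {socket_path}; the X server may not have started")
```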
I see two warnings:
UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. warnings.warn("Default upsampling behavior when mode={} is changed "
UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead. warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
The second warning won't affect the results, but I wanted to confirm whether upsampling with align_corners was intended, or whether this warning appeared earlier too and I should just ignore it.
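For reference, a small standalone illustration (not the repo's code) of what the first warning is about: since torch 0.4.0 bilinear upsampling defaults to align_corners=False, and the two settings produce slightly different values:

```python
# Standalone illustration (not the repo's code) of the align_corners warning:
# bilinear upsampling defaults to align_corners=False since torch 0.4.0, which
# yields slightly different values than the old align_corners=True behavior.
import torch
import torch.nn.functional as F

x = torch.arange(4, dtype=torch.float32).reshape(1, 1, 2, 2)

up_new = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
up_old = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=True)

print("align_corners=False:\n", up_new)
print("align_corners=True:\n", up_old)
print("max abs difference:", (up_new - up_old).abs().max().item())
```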