MGNet pretraining goes wrong #15
Comments
Hi, I think you should follow our paper and train it by stages, just as TMNet (referenced in our work) does. A general strategy is to first train the AtlasNet (by setting tmn_subnetworks=1 in mgnet.yaml). After it converges, load the weights (by setting the weight path in mgnet.yaml) and fix them while training the second stage.
I have trained with tmn_subnetworks set to 1. However, when I tried to load the weights and fix them to train the second stage, I didn't find an option to freeze the loaded weights. The 'train.freeze' option seems to control which submodule gets fixed, but it can't be used to freeze the weights of the first stage.
I also noticed that, apart from the difference in optimizer settings between the code and the paper (learning rate 1e-4 vs 1e-3, different scheduler), the batch_size settings differ too (2 vs 32). Can I follow the README to reproduce the results of the paper, or do I need another modification not mentioned in the README?
Hi, the boundary loss only applies to points on open boundaries, so it only takes effect in the second stage (tmn_subnetworks=2); it will be 0 when tmn_subnetworks=1. The first stage does shape deformation and the second stage does topology modification. The edge loss is a regularization term that penalizes extra-long edges; it will not change much during training. The face loss classifies whether a point on edges/faces should be removed. We will update our README to make it more detailed after our deadline ends.

Here is our training strategy (you can also follow the strategy in this work): we first set tmn_subnetworks=1 and turn off the edge classifier by setting with_edge_classifier=False in config.yaml for training (this is equivalent to AtlasNet). After it converges, set with_edge_classifier=True to train the edge classifier of the first stage. Those are the modules of the first stage. After that, we fix those modules and train the second-stage decoder using this function. You can add a line
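For illustration, a minimal PyTorch sketch of what such a line could do (the submodule names below are placeholders, not MGNet's actual attribute names, and `model` is assumed to be the constructed network):

```python
import torch

def freeze_module(module: torch.nn.Module) -> None:
    """Stop training a finished stage: no gradients, fixed BN/dropout behaviour."""
    for param in module.parameters():
        param.requires_grad = False
    module.eval()

# placeholder names -- substitute the real first-stage submodules of MGNet
freeze_module(model.first_stage_decoder)
freeze_module(model.edge_classifier)

# optimize only the parameters that are still trainable (the second-stage decoder)
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```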
Thanks a lot for your patience and detailed explanation! I'll try the steps and refer to the work.
@pidan1231239 Sorry for asking an unrelated question, but I'd like to know how you visualized the training process. Is this written in the source code?
I used Weights & Biases and added a few lines of code.
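For reference, a minimal sketch of the kind of lines this adds, using the wandb Python package (the project name, metric names, and loop variables are illustrative, not from the repo):

```python
import wandb

# once, before the training loop starts
wandb.init(project="mgnet-pretraining", config={"batch_size": 32, "lr": 1e-4})

# inside the training loop, e.g. once per epoch
wandb.log({
    "epoch": epoch,
    "chamfer_loss": chamfer_loss,
    "edge_loss": edge_loss,
    "lr": optimizer.param_groups[0]["lr"],
})
```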
Thank you for your fast reply! I will also give it a try.
No problem!
@pidan1231239 Hi, have you reproduced the results reported in the paper? I'm also trying to do it but only got 0.103016 (average Chamfer distance).
The downloaded checkpoint achieves a Chamfer loss of 0.008187, which is measured before ICP alignment and is probably not from the exact code used for the final evaluation. In my best attempt, the loss went down to 0.01028, with the batch size changed to 32 as in the paper. However, I used two GPUs in the second stage and one elsewhere because of memory limitations. I don't know if I missed something.
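As a point of reference for what these numbers measure, here is a self-contained sketch of a symmetric Chamfer distance between two point clouds (a naive O(N·M) version; the repo's own implementation and the ICP alignment step may differ):

```python
import torch

def chamfer_distance(p1: torch.Tensor, p2: torch.Tensor) -> torch.Tensor:
    """Symmetric Chamfer distance between point clouds of shape (N, 3) and (M, 3)."""
    d = torch.cdist(p1, p2) ** 2          # pairwise squared distances, shape (N, M)
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

# example: two random clouds
print(chamfer_distance(torch.rand(1024, 3), torch.rand(1024, 3)).item())
```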
Hi, the author didn't reply to this question. I am also curious whether we should follow the batch size and learning rate from the paper or from this GitHub repo; the batch size, learning rate, and number of epochs are all different.
Hi Yinyu,
I tried to pretrain MGNet with
python main.py configs/mgnet.yaml --mode train
and test it with
python main.py configs/mgnet.yaml --mode test
However, after 50 epochs of training, the learning rate quickly dropped to a seemingly unreasonable level of 1e-08, with the best chamfer_loss stuck at 5.67 since the 6th epoch.
log.txt
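My current guess is that a plateau-based scheduler is collapsing the learning rate once the loss stalls. A minimal sketch of that behaviour (assuming something like PyTorch's ReduceLROnPlateau with default factor/patience; I have not verified this against mgnet.yaml):

```python
import torch

model = torch.nn.Linear(3, 3)                     # stand-in model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=0.1, patience=10)           # illustrative defaults

for epoch in range(50):
    stalled_loss = 5.67                           # a chamfer_loss that stops improving
    scheduler.step(stalled_loss)                  # every `patience` epochs: lr *= 0.1
    print(epoch, optimizer.param_groups[0]["lr"])
# over 50 stalled epochs the lr decays from 1e-4 down to 1e-8
```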
Also, the test results of the best checkpoint look like this:
log.txt
Is there anything I missed?