MGNet pretraining goes wrong #15
Comments
Hi, I think you should follow our paper and train it by stages, just as TMNet (referenced in our work) does. A general strategy is to first train the AtlasNet (by setting tmn_subnetworks=1 in mgnet.yaml). After it converges, load the weights (by setting the weight path in mgnet.yaml) and fix them while training the second stage.
I have trained with tmn_subnetworks set to 1. However, when I tried to load the weights and fix them to train the second stage, I didn't find an option to freeze the loaded weights. The 'train.freeze' option seems to control which submodule gets fixed, but it can't be used to freeze the weights of the first stage.
I also noticed that, apart from the difference in optimizer settings between the code and the paper (learning rate 1e-4 vs 1e-3, different scheduler), the batch_size settings differ too (2 vs 32). Can I follow the README to reproduce the results of the paper, or do I need another modification not mentioned in the README?
Hi, the boundary loss only applies to points on open boundaries, so it only takes effect in the second stage (tmn_subnetworks=2); it will be 0 when tmn_subnetworks=1. The first stage does shape deformation and the second stage does topology modification. The edge loss is a regularization term that penalizes extra-long edges; it will not change much during training. The face loss classifies whether a point on edges/faces should be removed. We will update our README to make it more detailed after our deadline ends.

Here is our training strategy (you can also follow the strategy in this work): we first set tmn_subnetworks=1 and turn off the edge classifier by setting with_edge_classifier=False in config.yaml for training (this is equivalent to AtlasNet). After it converges, set with_edge_classifier=True to train the edge classifier of the first stage. Those are the modules of the first stage. After that, we fix those modules and train the second-stage decoder using this function. You can add a line
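For illustration, a minimal PyTorch sketch of what such a line could do (the submodule names below are placeholders, not MGNet's actual attribute names, and `model` is assumed to be the constructed network):

```python
import torch

def freeze_module(module: torch.nn.Module) -> None:
    """Stop training a finished stage: no gradients, fixed BN/dropout behaviour."""
    for param in module.parameters():
        param.requires_grad = False
    module.eval()

# placeholder names -- substitute the real first-stage submodules of MGNet
freeze_module(model.first_stage_decoder)
freeze_module(model.edge_classifier)

# optimize only the parameters that are still trainable (the second-stage decoder)
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```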
Thanks a lot for your patience and detailed explanation! I'll try the steps and refer to the work.
@pidan1231239 Sorry for asking an unrelated question, but I'd like to know how you visualized the training process. Is this written in the source code?
I used Weights & Biases and added a few lines of code.
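For reference, a minimal sketch of the kind of lines this adds, using the wandb Python package (the project name, metric names, and loop variables are illustrative, not from the repo):

```python
import wandb

# once, before the training loop starts
wandb.init(project="mgnet-pretraining", config={"batch_size": 32, "lr": 1e-4})

# inside the training loop, e.g. once per epoch
wandb.log({
    "epoch": epoch,
    "chamfer_loss": chamfer_loss,
    "edge_loss": edge_loss,
    "lr": optimizer.param_groups[0]["lr"],
})
```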
Thank you for your fast reply! I will also give it a try.
No problem!
@pidan1231239 Hi, have you reproduced the results reported in the paper? I'm also trying to do it but only got 0.103016 (average Chamfer distance).
The downloaded checkpoint achieves a Chamfer loss of 0.008187, which is measured before ICP alignment and is probably not from the exact code used for the final evaluation. In my best attempt, the loss went down to 0.01028, with the batch size changed to 32 as in the paper. However, I used two GPUs in the second stage and one elsewhere because of memory limitations. I don't know if I missed something.
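As a point of reference for what these numbers measure, here is a self-contained sketch of a symmetric Chamfer distance between two point clouds (a naive O(N·M) version; the repo's own implementation and the ICP alignment step may differ):

```python
import torch

def chamfer_distance(p1: torch.Tensor, p2: torch.Tensor) -> torch.Tensor:
    """Symmetric Chamfer distance between point clouds of shape (N, 3) and (M, 3)."""
    d = torch.cdist(p1, p2) ** 2          # pairwise squared distances, shape (N, M)
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

# example: two random clouds
print(chamfer_distance(torch.rand(1024, 3), torch.rand(1024, 3)).item())
```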
Hi, the author didn't reply to this question. I am also curious whether we should follow the batch size and learning rate from the paper or from this GitHub repo; the batch size, learning rate, and number of epochs are all different.
Hi Yinyu,
I tried to pretrain MGNet with
python main.py configs/mgnet.yaml --mode train
and test it with
python main.py configs/mgnet.yaml --mode test
However, after 50 epochs of training, the learning rate quickly dropped to a seemingly unreasonable level of 1e-08, with the best chamfer_loss stuck at 5.67 since the 6th epoch.
log.txt
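My current guess is that a plateau-based scheduler is collapsing the learning rate once the loss stalls. A minimal sketch of that behaviour (assuming something like PyTorch's ReduceLROnPlateau with default factor/patience; I have not verified this against mgnet.yaml):

```python
import torch

model = torch.nn.Linear(3, 3)                     # stand-in model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=0.1, patience=10)           # illustrative defaults

for epoch in range(50):
    stalled_loss = 5.67                           # a chamfer_loss that stops improving
    scheduler.step(stalled_loss)                  # every `patience` epochs: lr *= 0.1
    print(epoch, optimizer.param_groups[0]["lr"])
# over 50 stalled epochs the lr decays from 1e-4 down to 1e-8
```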
Also, the test results of the best checkpoint look like this:
log.txt
Is there anything I missed?