
Is training stable? #3

Open
kleinzcy opened this issue Dec 17, 2023 · 2 comments

@kleinzcy

Hi, authors:

Thanks for your work and code. I tried running your code on 2 A100 GPUs, but the result is ~7, which seems far from the reported 5.26 on CelebA 256x256. Therefore, I am curious about the stability of training: do the results vary a lot across runs?

@hao-pt
Collaborator

hao-pt commented Dec 19, 2023

The training is relatively stable and consistent within each single experiment; it does not vary as much as your reported result suggests. Could you share your training hyper-parameters and the details of the model checkpoint you used for evaluation (e.g., the checkpoint from which epoch)? One common practice is to enable the --use_ema argument when training to mitigate large oscillations in model performance.
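
(For context on the EMA suggestion: below is a minimal, generic PyTorch sketch of an exponential moving average over model weights, which is what --use_ema typically does. It is only an illustration under the assumption of a standard EMA update; the class name, the decay value of 0.9999, and the update hook are not taken from this repository.)

    import copy
    import torch

    class EMA:
        def __init__(self, model, decay=0.9999):
            self.decay = decay
            # Keep a frozen copy of the weights, updated as a running average.
            self.shadow = copy.deepcopy(model).eval()
            for p in self.shadow.parameters():
                p.requires_grad_(False)

        @torch.no_grad()
        def update(self, model):
            # shadow = decay * shadow + (1 - decay) * current weights
            for ema_p, p in zip(self.shadow.parameters(), model.parameters()):
                ema_p.mul_(self.decay).add_(p, alpha=1.0 - self.decay)

    # After each optimizer.step(), call ema.update(model),
    # and evaluate/sample with ema.shadow instead of model.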

@kleinzcy
Author

kleinzcy commented Dec 20, 2023

Thanks for your reply. The script I use is as follows:

accelerate launch --main_process_port 33996 --num_processes 2 train_flow_latent.py --exp celeb_f8_dit_g2 \
    --dataset celeba_256 --datadir celeba_hq/celeba-lmdb \
    --batch_size 32 --num_epoch 500 \
    --image_size 256 --f 8 --num_in_channels 4 --num_out_channels 4 \
    --nf 256 --ch_mult 1 2 3 4 --attn_resolution 16 8 4 --num_res_blocks 2 \
    --lr 2e-4 --scale_factor 0.18215 --no_lr_decay \
    --model_type DiT-L/2 --num_classes 1 --label_dropout 0. \
    --save_content --save_content_every 10 

I used the checkpoints from epochs 474 and 500 for evaluation. I will try using --use_ema.
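
(For reference, enabling EMA would presumably just mean appending the flag to the same launch command shown above; this sketch elides the unchanged arguments and assumes --use_ema takes no value:)

    accelerate launch --main_process_port 33996 --num_processes 2 train_flow_latent.py --exp celeb_f8_dit_g2 \
        ... same arguments as above ... \
        --use_ema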
