
Is training stable? #3

Open
kleinzcy opened this issue Dec 17, 2023 · 2 comments

@kleinzcy

Hi, authors:

Thanks for your work and code. I tried running your code on 2 A100 GPUs, but the result is ~7, which seems far from the reported 5.26 on CelebA 256x256. Therefore, I am curious about the stability of training: do the results vary a lot across runs?

@hao-pt
Collaborator

hao-pt commented Dec 19, 2023

The training is relatively stable and consistent within each single experiment; it does not vary as much as your reported result suggests. Could you share your training hyper-parameters and the details of the model checkpoint you used for evaluation (e.g., the checkpoint from which epoch)? One common practice is to enable the --use_ema argument when training to mitigate large oscillations in model performance.
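
(For context on the EMA suggestion: below is a minimal, generic PyTorch sketch of an exponential moving average over model weights, which is what --use_ema typically does. It is only an illustration under the assumption of a standard EMA update; the class name, the decay value of 0.9999, and the update hook are not taken from this repository.)

    import copy
    import torch

    class EMA:
        def __init__(self, model, decay=0.9999):
            self.decay = decay
            # Keep a frozen copy of the weights, updated as a running average.
            self.shadow = copy.deepcopy(model).eval()
            for p in self.shadow.parameters():
                p.requires_grad_(False)

        @torch.no_grad()
        def update(self, model):
            # shadow = decay * shadow + (1 - decay) * current weights
            for ema_p, p in zip(self.shadow.parameters(), model.parameters()):
                ema_p.mul_(self.decay).add_(p, alpha=1.0 - self.decay)

    # After each optimizer.step(), call ema.update(model),
    # and evaluate/sample with ema.shadow instead of model.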

@kleinzcy
Author

kleinzcy commented Dec 20, 2023

Thanks for your reply. The script I use is as follows:

accelerate launch --main_process_port 33996 --num_processes 2 train_flow_latent.py --exp celeb_f8_dit_g2 \
    --dataset celeba_256 --datadir celeba_hq/celeba-lmdb \
    --batch_size 32 --num_epoch 500 \
    --image_size 256 --f 8 --num_in_channels 4 --num_out_channels 4 \
    --nf 256 --ch_mult 1 2 3 4 --attn_resolution 16 8 4 --num_res_blocks 2 \
    --lr 2e-4 --scale_factor 0.18215 --no_lr_decay \
    --model_type DiT-L/2 --num_classes 1 --label_dropout 0. \
    --save_content --save_content_every 10 

I used the checkpoints from epochs 474 and 500 for evaluation. I will try using --use_ema.
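
(For reference, enabling EMA would presumably just mean appending the flag to the same launch command shown above; this sketch elides the unchanged arguments and assumes --use_ema takes no value:)

    accelerate launch --main_process_port 33996 --num_processes 2 train_flow_latent.py --exp celeb_f8_dit_g2 \
        ... same arguments as above ... \
        --use_ema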
