Over 450 Generated Images. FID 271.254 . What's Wrong.... #11

Open
sumorday opened this issue Mar 20, 2024 · 22 comments

Comments


sumorday commented Mar 20, 2024


Hi! I have generated 450 images, and the facial features are already clear. However, I'm not sure why the FID value is 270. Should I keep only the 'model_450.pth' and 'image_epoch_450.png' checkpoints for testing?


sumorday changed the title from "Only 270 FID Despite Over 400 Generated Images: What's Wrong...." to "Over 400 Generated Images. FID 271.254 . What's Wrong...." on Mar 20, 2024
sumorday changed the title from "Over 400 Generated Images. FID 271.254 . What's Wrong...." to "Over 450 Generated Images. FID 271.254 . What's Wrong...." on Mar 20, 2024
Collaborator

hao-pt commented Mar 20, 2024

I think you misunderstand here. To compute FID, following standard practice, you should generate 50_000 images for statistical significance.
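
For reference, FID is the Fréchet distance between two Gaussians fitted to Inception features of the real images (mean \mu_r, covariance \Sigma_r) and of the generated images (\mu_g, \Sigma_g). The symbol names are chosen here for illustration:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
             + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

In the standard setup the features are 2048-dimensional, so each covariance matrix has roughly two million free entries. A few hundred samples cannot estimate it reliably, which is why the score is normally reported on 50_000 generated images.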

Author

sumorday commented Mar 20, 2024

> I think you misunderstand here. To compute FID, following standard practice, you should generate 50_000 images for statistical significance.


Thank you!
Does this mean I should change num_epoch from 500 to 50,000?
Or where should I set this?

Collaborator

hao-pt commented Mar 20, 2024

50_000 is the number of images you generate with your trained model (the epoch-450 checkpoint, model_450.pth), not the number of images generated during training. To test, modify arguments such as EPOCH_ID and EXP in test_args/celeb256_dit.txt, then run bash_scripts/run_test_ddp.sh test_args/celeb256_dit.txt.

@sumorday
Author

[Image: samples_celeba_256_dopri5_1e-05_1e-05]

Following the standard procedure, I ran bash bash_scripts/run_test.sh test_args/celeb256_dit.txt and found that it ultimately generates a single CelebA image, located in the main directory.
Is this image supposed to be the final result? It did not return an FID value. I noticed that test_flow_latent.py automatically samples 50,000 images. With epoch ID 475, this is roughly what the images look like at this stage.


Author

sumorday commented Mar 25, 2024

Running python pytorch_fid/fid_score.py ./pytorch_fid/celebahq_stat.npy ./saved_info/latent_flow/celeba_256/celeb_f8_dit gives an inaccurate FID score: 344.229.

However, running bash bash_scripts/run_test.sh test_args/celeb256_dit.txt as described in the GitHub instructions immediately throws an error reporting missing and unexpected keys in the state dictionary.
This typically happens when the model architecture does not match the saved state dictionary.
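
A quick way to narrow down a missing/unexpected keys error is to compare the two key sets directly. The sketch below works on plain dicts so it runs without PyTorch; the key names are hypothetical. The "module." prefix is only one common cause (a checkpoint saved from a DistributedDataParallel-wrapped model) and is an assumption here, not something confirmed for this repository.

```python
def diff_state_dict_keys(model_keys, ckpt_keys):
    """Return (missing, unexpected): keys the model wants but the checkpoint
    lacks, and keys the checkpoint has but the model does not know."""
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    missing = sorted(model_keys - ckpt_keys)
    unexpected = sorted(ckpt_keys - model_keys)
    return missing, unexpected


def strip_prefix(state_dict, prefix="module."):
    """Drop a wrapper prefix from every key that carries it."""
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }


# Hypothetical key names, for illustration only.
model_keys = ["x_embedder.proj.weight", "final_layer.linear.weight"]
ckpt = {
    "module.x_embedder.proj.weight": 0,
    "module.final_layer.linear.weight": 1,
}

# With the prefix in place, every key shows up as both missing and unexpected.
print(diff_state_dict_keys(model_keys, ckpt))
# After stripping the prefix, both lists are empty.
print(diff_state_dict_keys(model_keys, strip_prefix(ckpt)))
```

If the lists stay non-empty after handling prefixes, the checkpoint was most likely saved from a different architecture or configuration than the one being built at test time.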

Any ideas? Thank you!


Author

sumorday commented Mar 25, 2024


Also, I tried to use the 475.pth file from the original GitHub repository to test, but I found that it only generated one image and couldn't calculate the FID score.


@xiaweijiexox

You have not indicated which epoch should be used to evaluate the FID of celeba-256-adm. Can you give the number?

@xiaweijiexox

When I compute the FID with your original code on CelebA-256 (unconditional generation) with ADM, I get an FID of only 9.21 at epoch 475. I sampled 50000 images, so I think the result is reasonably accurate. You have not provided a .pth file for this experiment, so I don't know what the problem is.

@quandao10
Collaborator

Hi, my understanding is that you retrained our model and got 9.21. Is that correct?

@quandao10
Collaborator

Please note that our stat file is computed from jpg images. If the generated images are png, the FID will be very high.
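
Before running the FID script, it can help to confirm which formats are actually in the sample folder. This is a minimal sketch using only the standard library; count_extensions is a hypothetical helper, not part of this repository, and the demo uses a temporary directory in place of a real sample folder.

```python
import tempfile
from collections import Counter
from pathlib import Path


def count_extensions(sample_dir):
    """Count the files directly inside sample_dir by lowercase extension."""
    return Counter(
        p.suffix.lower() for p in Path(sample_dir).iterdir() if p.is_file()
    )


# Demo on a throwaway directory; point count_extensions at your own
# sample directory instead.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("0.jpg", "1.jpg", "2.png"):
        (Path(tmp) / name).touch()
    demo_counts = count_extensions(tmp)

print(demo_counts)
```

Any count under ".png" means part of the folder does not match the format the stat file was computed from.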

@xiaweijiexox

I'm sure that I generated jpg images, because I used your code directly, and I checked that a moment ago. Maybe you could provide the .pth file. I don't know at which epoch to stop, but I'm sure that the result at epoch 475 is 9.21.

@quandao10
Collaborator

I trained the model for 600 epochs and evaluated at epoch 475 for CelebA-HQ-256.

@xiaweijiexox

I've found that the model fluctuates more after 500 epochs (judging by FID only). Do you think so too?

@xiaweijiexox

I have started testing with DiT instead. I think that should not raise any doubts.

@quandao10
Collaborator

Yes, the model seems unstable after 500 epochs. In our paper, we use cosine learning rate decay, which depends on the total number of epochs. For more stability, we suggest using an EMA model; you could use the EMA code from the DiT repo. The EMA model is more stable and has better FID. Please also consider using dropout if the model converges too fast; see the appendix section on overfitting on CIFAR10 in https://arxiv.org/pdf/2102.09672.
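
To illustrate why the total number of epochs matters with cosine decay, here is a small sketch of the usual closed-form schedule. The base learning rate is a placeholder value, and this is not claimed to be the exact scheduler configuration used in the repository.

```python
import math


def cosine_lr(epoch, total_epochs, base_lr, min_lr=0.0):
    """Cosine decay from base_lr at epoch 0 down to min_lr at total_epochs."""
    progress = min(epoch, total_epochs) / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


base_lr = 1e-4  # placeholder value, for illustration only

# The same epoch sits at a different point of the curve depending on
# how many epochs the run was scheduled for.
for total in (500, 600):
    print(total, cosine_lr(475, total, base_lr))
```

A checkpoint taken at epoch 475 of a 500-epoch run has seen a much smaller learning rate than one taken at epoch 475 of a 600-epoch run, so the two are not directly comparable.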

@xiaweijiexox

OK, thanks. I'm trying again with ema.

@xiaweijiexox

When I use your EMA.py, I get “AttributeError: 'EMA' object has no attribute '_optimizer_state_dict_pre_hooks'. Did you mean: 'register_state_dict_pre_hook'?” What do you mean by "EMA code from DiT repo"?

@xiaweijiexox

Should I use DiT/train.py as a reference to revise your code? I've made the revision, but I'm not sure about it. Why do you have an EMA.py if I still need to use the EMA from DiT?

@quandao10
Collaborator

Yes, you should use DiT/train.py to revise my code. I found it easier and more compact to follow the DiT repo.
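
For anyone following along, the idea of keeping a separate EMA copy of the weights, rather than wrapping the optimizer, can be sketched as below. Plain floats stand in for parameter tensors so the example runs without PyTorch; the function name and the default decay of 0.9999 are assumptions for illustration, not a copy of any repository's code.

```python
def update_ema(ema_params, model_params, decay=0.9999):
    """In place: ema <- decay * ema + (1 - decay) * param, for every name."""
    for name, value in model_params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value


# Toy run: the "model" weight moves by 1.0 per step, the EMA copy lags behind.
model = {"w": 1.0}
ema = dict(model)  # the EMA copy starts from the current weights
for _ in range(3):
    model["w"] += 1.0  # stands in for an optimizer step
    update_ema(ema, model, decay=0.5)

print(model["w"], ema["w"])
```

In a real training loop the update is called once after each optimizer step, and the EMA copy is the one saved and used for sampling and FID.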

@sumorday
Author

So, by running
bash_scripts/run.sh test_args/celeb256_dit.txt,
it automatically performs flow matching in the latent space, right?
Downloading the 475.pth file provided on GitHub and generating 50,000 images does indeed give an FID of 5.24.
However, I also trained from scratch instead of using the 475.pth file provided on the LFM GitHub.
My own 475.pth did not reach an FID of 5.2, only 6.02. All images are jpg.
I am wondering whether it is running correctly.

@quandao10
Collaborator

Yes, I think you ran it correctly. I wonder what environment you used to run the model. I found that the architecture is more stable with torch 1.x. When I retrained our model on torch 2.x, the result was around 5.8 to 6.1, the same as yours.

@sumorday
Author

> Yes, I think you run it correctly, I wonder what environment you use to run model. I found that the architecture is more stable with torch 1.x version. I retrained our model on torch 2.x, the result is around 5.8 to 6.1, same to you.

Thank you.
The problem is that the torch.distributed.checkpoint module does not exist in PyTorch 1.x versions. Therefore, if I downgrade to a 1.x version, the code will not work.
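
One way to check this in a given environment without crashing is to probe for the module before importing it. This is a generic standard-library sketch; has_module is a hypothetical helper name, and whether the repository can run without that module is not something this check answers.

```python
import importlib.util


def has_module(name):
    """Return True if the module can be located in this environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises instead of returning None.
        return False


# Reports False both when torch is absent and when the installed torch
# version does not ship the submodule.
print(has_module("torch.distributed.checkpoint"))
```

The same check could guard an optional import, so the rest of the code still loads on an older install.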
