Questions about the setting of the Training Phases #18
As you can see, even without the phase you mentioned, HiSD works fine with the extracted style at inference. This is because of the cycle-translation path: in the cycle-back phase, the image is manipulated by the extracted style, and we add an adversarial objective for the cycle-back image as well (you can find the significance of this objective in the ablation study). In an ideal situation, it seems unnecessary to add another phase in which the extracted style guides the translation. However, I agree that using the style extracted from a reference image during training could further enhance the ability of the extractor and stabilize training in the late iterations; the extractor tends to cheat and fail to extract the style if you train the model for a long time. I will definitely try this later and update the code if it helps (if it doesn't help, I will report the result in this issue). But still, you're welcome to have a try and edit the code yourself. Thank you for your suggestion.
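For concreteness, here is a minimal sketch of the cycle-back phase described above, in PyTorch-style code. The names `extractor`, `generator`, and `discriminator` and the particular adversarial formulation are assumptions for illustration, not the exact modules or losses in the HiSD code.

```python
import torch.nn.functional as F

def cycle_back_losses(x, x_trg, tag, extractor, generator, discriminator):
    """Cycle-translation sketch: translate the latent-guided output x_trg back
    toward the original image x using the style EXTRACTED from x, and apply
    both a reconstruction and an adversarial objective to the cycle-back image.
    Module names and the adversarial form are illustrative assumptions."""
    s_cyc = extractor(x, tag)              # style extracted from the original image
    x_cyc = generator(x_trg, s_cyc, tag)   # cycle-back translation with the extracted style

    loss_rec = F.l1_loss(x_cyc, x)                             # cycle reconstruction
    loss_adv = F.softplus(-discriminator(x_cyc, tag)).mean()   # adversarial term on the cycle-back image
    return loss_rec, loss_adv
```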
I have tried the experiment of adding another random reference image on the AFHQ dataset when training the generator and discriminator, and I find that it helps the reference-guided translation. When running the model on the AFHQ dataset, the diversity of HiSD is limited compared to other models such as StarGAN-v2 and DRIT++; adding another random reference image can slightly help the generation of the reference branch.
@HelenMao Thank you for sharing this result.
I think this operation can stabilize the training of the extractor on the AFHQ dataset, but I am not sure about the CelebA dataset. As you mention in your reply in issue #19 (#19 (comment)), HiSD only manipulates the shape and maintains the background and color. That is really interesting, and I have not figured out why, since other frameworks change the background and can generate diverse textures. I once thought it was because of the mask operation: the HiSD translator adds the original features back into the transformed features through the attention mask, but I failed to learn such a mask in my own framework. In my experiments, the results of my framework still change the background and produce diverse textures even when I directly copy the attention module from the HiSD model. I think a diversity loss such as the mode seeking loss may have some influence, but I am not sure; I would like to copy your attention module into StarGAN-v2 to see whether it makes a difference.
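For readers following along, the masked combination being discussed can be written roughly as below. This is a simplified sketch of the idea, not a copy of the actual HiSD translator code.

```python
import torch

def masked_combine(f_in, f_trans, mask_logits):
    """Blend the transformed features with the original features through an
    attention mask, so regions outside the mask keep the original feature
    values (which is what helps preserve background and color).
    Illustrative sketch only."""
    m = torch.sigmoid(mask_logits)          # soft mask in [0, 1], same shape as the features
    return m * f_trans + (1.0 - m) * f_in   # masked regions use the transformed features
```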
I have also calculated the FID of the randomly generated results on the AFHQ dataset.
Thank you for this information, I will try this later. As you noticed, without early stopping at around 300k iterations (fewer tags mean fewer iterations), the extractor suffers from mode collapse. It would be very helpful if this operation can improve that. The mode seeking loss (diversification loss) may indeed influence the disentanglement, although I like the simple but effective idea very much. The reviewer also asked why we don't add this loss. I replied that "In our setting, the gains from diversifying small objects (i.e., glasses) are far less than the gains from diversifying background colors. Therefore, we think that it may cause the manipulation of global information and aggravate the mode collapse of small objects." But I'm not sure and this is just my guess.
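For reference, the mode seeking (diversification) loss being discussed is usually implemented along the following lines; this is a generic sketch of the MSGAN-style regularizer, not code from HiSD or DRIT++.

```python
import torch

def mode_seeking_loss(img1, img2, z1, z2, eps=1e-5):
    """Mode seeking regularization: encourage the ratio of image distance to
    latent distance to be large, so that nearby latent codes do not collapse
    to the same output. Minimizing the reciprocal maximizes the ratio."""
    d_img = torch.mean(torch.abs(img1 - img2))   # distance between the two generated images
    d_z = torch.mean(torch.abs(z1 - z2))         # distance between the two latent codes
    return 1.0 / (d_img / d_z + eps)
```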
Have you tried only using the guided image to generate the style vector? I think it's more natural to get the style code from an existing image rather than to generate it from a random style vector as in StyleGAN. Using a random style code makes this work more like a generation task rather than a transfer task.
Although some works focus only on the reference-guided task, both tasks are necessary for practical use. Imagine you want to add glasses to an input picture: the target is determined, but you may not have a reference image, so you can directly sample the style code from a simple prior distribution. Therefore, the reference-guided task is customized while the latent-guided task is convenient. This is still a transfer task because the output is based on the input (e.g., identity), no matter which kind of style code is used.
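Put as code, the two ways of obtaining a style code look roughly like this; `mapper` and `extractor` are assumed callables and `latent_dim` is an illustrative value, not the exact HiSD interface.

```python
import torch

def get_style_code(tag, mapper, extractor, x_ref=None, latent_dim=256):
    """Return a style code for `tag`, either reference-guided (from an existing
    image) or latent-guided (sampled from a simple prior when no reference
    image is available). Sketch under assumed module interfaces."""
    if x_ref is not None:
        return extractor(x_ref, tag)      # reference-guided: customized
    z = torch.randn(1, latent_dim)        # sample from a standard Gaussian prior
    return mapper(z, tag)                 # latent-guided: convenient
```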
According to your paper and training code, three image generation phases (raw / self / random style code) are used. However, it is easy to add another similar phase, such as a random style image, as a training phase: randomly pick another image as the input of the style encoder and follow the same data flow as the random style code (see the sketch below). Some other works (like https://github.com/saic-mdal/HiDT) have included such a phase in their training and obtained satisfying results.
So I am just wondering: have you tried this phase before? What were the results like, and why was it not added to the paper?
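For concreteness, a sketch of what such an extra phase could look like; module names and the adversarial formulation are illustrative assumptions, not the actual HiSD or HiDT code.

```python
import torch.nn.functional as F

def random_reference_phase(x, x_ref, tag, extractor, generator, discriminator):
    """Extra training phase: instead of sampling a random style code, randomly
    pick another image x_ref, extract its style, and run the same data flow
    (translation + adversarial objective) as the random-style-code phase."""
    s_ref = extractor(x_ref, tag)          # style taken from a randomly chosen reference image
    x_fake = generator(x, s_ref, tag)      # same translation path as the latent-guided phase
    loss_adv = F.softplus(-discriminator(x_fake, tag)).mean()  # adversarial objective
    # A style reconstruction term, e.g. F.l1_loss(extractor(x_fake, tag), s_ref),
    # could be added as well, mirroring the random-style-code phase.
    return loss_adv
```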