Request for [D,R] RIFE trained on Style loss, also called Gram matrix loss (the best perceptual loss function) #12
Hi, thanks for the advice and sorry for the late reply.
Many thanks for your response and willingness to tackle this intriguing problem. I completely understand the lack of time, as I know this problem from my own experience. I am ashamed of how my rankings look and of the fact that I don't have time to update them. Unfortunately, all my activity here is a hobby; my day job is something completely different. I would love to help, but I'm not even a programmer, let alone capable of something as complicated as training AI models. I'm planning to buy an NVIDIA GeForce RTX 5090 graphics card early next year, and if someone develops software with a GUI for training AI models, I would of course be happy to help with training or other testing. I will, however, try to help as much as I can by publicising this thread, and maybe someone else will become interested in the topic and help with this experiment.

I think one thing would greatly increase interest in this experiment and in your InterpAny-Clearer method in general: adding one more model to the comparison, without any extra training: [T] RIFE v4.15. Generating an animated GIF file with x128 interpolation shouldn't take much time, and it would let me publicise InterpAny-Clearer far more effectively among the many enthusiasts who use this practical model on a daily basis, rather than the base version of RIFE. In my opinion, a comparison with the current best practical version of RIFE, i.e. RIFE v4.15, would be the best confirmation of the need for InterpAny-Clearer, and would give me the proof I need to argue, in my rankings and beyond, for applying this improvement to all practical models in the future, not only RIFE.

Of course, I would also like to know which loss function to recommend in the introduction to my rankings so that future models use it too. This is why it is so important to find out which loss function works best with InterpAny-Clearer. You recommend [D,R] RIFE rather than [D,R] RIFE-vgg, and I agree that the example with the young woman's face justifies this. However, I think it is worth running this experiment to see whether Style loss, also called Gram matrix loss, gives an even better result. Then we would have clear proof of which loss function InterpAny-Clearer works best with.
See RIFE v4.17, trained on Style loss (also called Gram matrix loss): https://github.com/hzwer/Practical-RIFE#model-list
You were right to write here that your new work may be of great interest to me. The level of detail retention is really impressive compared to the baseline models, and your work could revolutionise the way VFI models are developed, especially those for practical applications.
I am particularly grateful to you for training the [D,R] RIFE-vgg model and for the x128 interpolation comparison with the [D,R] RIFE and [T] RIFE models. You recommend [D,R] RIFE for more stable results, and after seeing the GIF files with the interpolation results I fully share your opinion. These results mean that I will have to completely revise my introduction to the Video Frame Interpolation Rankings and Video Deblurring Rankings.
I think, however, that there is a solution to get all the advantages of the [D,R] RIFE-vgg and [D,R] RIFE models while eliminating their disadvantages. That solution is Style loss, also called Gram matrix loss.
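For concreteness, here is a minimal sketch of what a Gram-matrix (style) loss could look like in PyTorch. The VGG-19 layer indices, the equal per-layer weighting, and the mean-squared distance used here are illustrative assumptions on my part, not the exact configuration from FILM, UGFI, or InterpAny-Clearer:

```python
import torch
import torch.nn as nn
from torchvision import models

class GramStyleLoss(nn.Module):
    """Gram-matrix (style) loss on frozen VGG-19 features.

    The layer indices below (approx. relu1_2, relu2_2, relu3_4, relu4_4)
    are an illustrative choice, not the published setup of any paper.
    """

    def __init__(self, layer_ids=(3, 8, 17, 26)):
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg = vgg
        self.layer_ids = set(layer_ids)
        self.last_id = max(layer_ids)

    @staticmethod
    def gram(feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) -> Gram matrix (B, C, C), normalized by C*H*W
        b, c, h, w = feat.shape
        f = feat.reshape(b, c, h * w)
        return f @ f.transpose(1, 2) / (c * h * w)

    def forward(self, pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # pred/target: (B, 3, H, W) RGB, assumed already ImageNet-normalized
        loss = pred.new_zeros(())
        x, y = pred, target
        for i, layer in enumerate(self.vgg):
            x, y = layer(x), layer(y)
            if i in self.layer_ids:
                loss = loss + torch.mean((self.gram(x) - self.gram(y)) ** 2)
            if i == self.last_id:
                break
        return loss
```

Because the loss compares feature correlations rather than pixels, it penalises missing texture without forcing exact pixel alignment, which is presumably why it hallucinates fewer of the smeary artefacts that plain VGG loss can produce.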
Style loss was first used to train a video frame interpolation model by Google Research, for their FILM-𝓛S model (link:). It was used a second time by Disney Research, for their UGFI 𝓛S model (link:).
Both models achieved some of the best, and possibly the best, LPIPS results (we lack a direct comparison with the other two models of the top four, and the results are very close to each other):
Vimeo-90K triplet: LPIPS ≤ 0.017 [excluding LPIPS(SqueezeNet) results]
[ranking table omitted; its columns were: model, {Input fr.}, dataset, repository]
However, the most interesting thing is the visual comparison of the three loss functions. In my opinion, Style loss clearly gives the best result perceptually:
Source: FILM - Loss Functions Ablation https://film-net.github.io/
Furthermore, interpolation with FILM-𝓛S can eliminate the artefacts seen with FILM-𝓛1, as shown in Fig. 1 of the Supplementary Material.
More details on Style loss and more examples are on YouTube: https://www.youtube.com/watch?v=OAD-BieIjH4&t=160s
The UGFI model trained with Style loss also retains an amazing amount of fine detail, as the examples at the bottom of Figure 6 in the Supplementary Material show particularly well.
The Style loss equation can be found in Sec. 3.1.
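For readers who don't want to dig through the papers, the commonly used form of this loss (following Gatys et al.; FILM and UGFI use variants of it, with their own layer weights $w_l$) is:

```math
G^{l}_{ij}(I) = \sum_{k} \phi^{l}_{ik}(I)\,\phi^{l}_{jk}(I),
\qquad
\mathcal{L}_{\mathrm{Gram}}\bigl(\hat{I}, I_{\mathrm{gt}}\bigr)
  = \sum_{l} w_{l}\,\bigl\lVert G^{l}(\hat{I}) - G^{l}(I_{\mathrm{gt}}) \bigr\rVert_{2}
```

where $\phi^{l}$ is the flattened VGG feature map at layer $l$, so $G^{l}$ captures feature correlations (texture statistics) rather than per-pixel values.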
The loss combination weights for the FILM-𝓛S model are in Sec. 1.1 in Supplementary Material.
The loss combination weights for the UGFI 𝓛S model are in Sec. 3.3.
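As a sketch of how such a combination could look in code: the weights below are placeholders, not the published values (those live in the sections cited above), and FILM's 𝓛S additionally includes a VGG-feature term that I omit here for brevity:

```python
import torch.nn.functional as F

# Placeholder weights; the published values are in the papers cited above.
def combined_loss(pred, gt, style_loss_fn, w_l1=1.0, w_style=1.0):
    return w_l1 * F.l1_loss(pred, gt) + w_style * style_loss_fn(pred, gt)
```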
You did a great job with the [T] RIFE, [D,R] RIFE, and [D,R] RIFE-vgg comparison. You've shown something I haven't seen anywhere before: at x128 interpolation, VGG loss can introduce messy artefacts that outweigh its benefit of preserving fine detail. I think more people may now seriously consider whether to use a perceptual loss function for practical purposes.
Therefore, I have a big request: please train the [D,R] RIFE model using Style loss and compare it with the other three models whose x128 interpolation results you showed as GIF files. This may turn out to be the best model for practical use, retaining the most detail (thanks to your method combined with Style-loss training) without creating messy artefacts.
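If anyone wants to attempt this, the change would plausibly be confined to the training objective. A hypothetical training step might look as follows; `model`, `optimizer`, and `dataloader` are stand-ins, not the actual InterpAny-Clearer/RIFE training code, and `GramStyleLoss`/`combined_loss` come from the sketches above:

```python
# Hypothetical: swap the photometric/VGG objective for the Gram-matrix loss.
style_criterion = GramStyleLoss()  # from the sketch above

for frame0, frame1, gt_mid in dataloader:   # stand-in loader of frame triplets
    pred_mid = model(frame0, frame1)        # stand-in VFI network
    loss = combined_loss(pred_mid, gt_mid, style_criterion)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```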
I would like to include the results of whatever comparison you manage to achieve in the introduction to the Video Frame Interpolation Rankings and Video Deblurring Rankings, to draw other researchers' attention to how VFI models should be trained for practical applications, and of course to your method as well.
Also have a look at what your neighbours in your city developed a month ago: a no-reference Perceptual Quality Assessment method for Video Frame Interpolation; see TABLE I in particular.