Hi!
Thanks for the great work on RoMa! I've been investigating its use in medical image registration (specifically histopathology), and it works quite well out of the box. There are, however, several histopathology-specific pretrained DINOv2 ViT-L/14 models out there, so I've been experimenting with one of those to get more domain-specific features. Unfortunately, with any backbone other than the original my results deteriorate drastically (e.g. barely any matches versus thousands with the original backbone), even after accounting for domain-specific image normalization etc.
Would you have any thoughts on why a different ViT-L/14 may not work as expected? Would swapping out the backbone perhaps require retraining the matcher as well? Or are there any other (minor) details I may have overlooked while integrating the domain-specific backbone?
That's unfortunate. Do you see any avenues (e.g. retraining the matcher to handle the different features) that might resolve this? I'd imagine this is very tricky for a different architecture (e.g. ViT-S), but intuitively it seems like it should be quite feasible with the exact same architecture and just different weights.
You could try mapping the new backbone to DINOv2 on some reasonable in-distribution images, for example by attaching a linear head at the end of it. Then RoMa should be able to handle the input, as it's aligned with the DINOv2 features.
In general it's tricky to make the matcher invariant to the backbone features without losing performance on the target task.
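A minimal sketch of that alignment idea in PyTorch. The function name `fit_alignment_head` and the arguments `histo_dino` and `dataloader` are hypothetical placeholders, not part of the RoMa codebase; it assumes both backbones are frozen and expose DINOv2's `forward_features()` interface (which returns a dict with `"x_norm_patchtokens"` of shape (B, N, 1024) for ViT-L/14), and uses a simple MSE regression as one possible objective:

```python
import torch
import torch.nn as nn

def fit_alignment_head(histo_dino, orig_dino, dataloader,
                       dim=1024, lr=1e-4, device="cuda"):
    """Fit a linear head that maps the domain-specific backbone's patch
    features onto the original DINOv2 ViT-L/14 features.

    Both backbones are assumed frozen and to follow DINOv2's
    forward_features() API, returning a dict whose "x_norm_patchtokens"
    entry has shape (B, N, 1024).
    """
    head = nn.Linear(dim, dim).to(device)
    opt = torch.optim.AdamW(head.parameters(), lr=lr)
    for imgs in dataloader:  # in-distribution histopathology crops, (B, 3, H, W)
        imgs = imgs.to(device)
        with torch.no_grad():
            # Features RoMa was trained on: the regression target.
            target = orig_dino.forward_features(imgs)["x_norm_patchtokens"]
            # Domain-specific features to be aligned.
            source = histo_dino.forward_features(imgs)["x_norm_patchtokens"]
        loss = nn.functional.mse_loss(head(source), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return head
```

At match time you would then feed `head(histo_features)` to RoMa wherever it currently consumes the DINOv2 coarse features. Whether a purely linear map is expressive enough to close the domain gap is an empirical question; a small MLP head is the obvious next thing to try if it isn't.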