Using different domain-specific ViT-L14 backbone #90

Open
dnschouten opened this issue Dec 16, 2024 · 3 comments
Comments

@dnschouten

Hi!

Thanks for the great work on RoMa! I've been investigating its use in medical image registration (specifically histopathology), and it works quite well out of the box. There are, however, several histopathology-specific pretrained DINOv2 ViT-L14 models out there, so I've been experimenting with one of those to get more domain-specific features. Unfortunately, my results deteriorate drastically with any backbone other than the original (e.g. barely any matches vs. thousands with the original backbone), even after accounting for domain-specific image normalization etc.

Would you have any thoughts on why a different ViT-L14 may not work as expected? Would swapping out the backbone perhaps require retraining the matcher as well? Or are there any other (minor) details I may have overlooked while implementing the domain-specific backbone?

@Parskatt
Owner

Hi there! Unfortunately we extensively use the features themselves (not just the correlation), so if you swap the backbone you get big issues.

@dnschouten
Author

That's unfortunate. Would you see any avenues (e.g. retraining the matcher so it learns to handle different features) that might resolve this? I'd imagine this is very tricky for different architectures (e.g. ViT-S), but intuitively it seems quite reasonable for the exact same architecture with just different weights.

@Parskatt
Owner

You could try mapping the new backbone's features to DINOv2 features on some reasonable in-distribution images, for example by attaching a linear head at the end of the new backbone. RoMa should then be able to handle the input, since it's aligned with DINOv2 features.

It's tricky in general to be invariant to the features without losing performance on the target tasks.
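
A minimal sketch of the linear-head idea suggested above, under stated assumptions: patch features have already been extracted from both frozen backbones on the same in-distribution images and stored as 2-D arrays, and the head is fit in closed form by least squares instead of by gradient descent. The function names `fit_linear_head` and `apply_linear_head`, the array shapes, and the synthetic stand-in data are hypothetical and not part of RoMa.

```python
import numpy as np


def fit_linear_head(new_feats, dino_feats):
    """Least-squares fit of W, b so that new_feats @ W + b ~= dino_feats.

    new_feats:  (N, D_new) patch features from the domain-specific backbone
    dino_feats: (N, D_dino) patch features from the original DINOv2 backbone,
                computed on the same patches of the same images
    """
    # Append a column of ones so the bias is fitted together with the weights.
    ones = np.ones((new_feats.shape[0], 1), dtype=new_feats.dtype)
    design = np.hstack([new_feats, ones])
    solution, *_ = np.linalg.lstsq(design, dino_feats, rcond=None)
    return solution[:-1], solution[-1]


def apply_linear_head(feats, W, b):
    """Map new-backbone features into the DINOv2 feature space."""
    return feats @ W + b


# Synthetic stand-in for real features (assumption: in practice these come
# from running both frozen backbones on in-distribution images).
rng = np.random.default_rng(0)
new_feats = rng.normal(size=(2000, 64))
W_true = rng.normal(size=(64, 64))
b_true = rng.normal(size=64)
dino_feats = new_feats @ W_true + b_true + 0.01 * rng.normal(size=(2000, 64))

W, b = fit_linear_head(new_feats, dino_feats)
aligned = apply_linear_head(new_feats, W, b)
rel_err = np.linalg.norm(aligned - dino_feats) / np.linalg.norm(dino_feats)
print(f"relative alignment error: {rel_err:.4f}")
```

On real features the relationship between the two backbones is unlikely to be exactly linear, so the alignment error on held-out images is worth checking before feeding the mapped features to the matcher. The fitted `W` and `b` could then be copied into a linear layer placed after the new backbone.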
