Rethinking Visual Geo-localization for Large-Scale Applications

This is the official PyTorch implementation of the CVPR 2022 paper "Rethinking Visual Geo-localization for Large-Scale Applications". The paper presents a new dataset called San Francisco eXtra Large (SF-XL, go here to download it) and a highly scalable training method called CosPlace, which reaches SOTA results with compact descriptors.

[CVPR OpenAccess] [ArXiv] [Video] [BibTex]

The images below show, respectively:

the map of San Francisco eXtra Large
a visualization of how CosPlace Groups (read: datasets) are formed
results with CosPlace vs other methods on Pitts250k (CosPlace trained on SF-XL, others on Pitts30k)

Train

After downloading the SF-XL dataset, simply run

$ python3 train.py --train_set_folder path/to/sf_xl/raw/train/database --val_set_folder path/to/sf_xl/processed/val --test_set_folder path/to/sf_xl/processed/test

The script automatically splits SF-XL into CosPlace Groups and saves the resulting object in the folder cache. By default, training is performed with a ResNet-18 with descriptor dimensionality 512, which fits in less than 4GB of VRAM.
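As a rough illustration of what splitting into CosPlace Groups means, the sketch below follows the description in the paper: each image is assigned to a class by discretizing its UTM east/north position and its heading, and classes are then assigned to groups so that classes in the same group are never adjacent. This is not the code used by train.py; the function name is hypothetical, and the parameter values (cell side M in meters, heading bin alpha in degrees, and the spacings N and L) are assumptions chosen for illustration.

```python
import math


def cosplace_class_and_group(east, north, heading, M=10, alpha=30, N=5, L=2):
    """Map a UTM position (meters) and a heading (degrees) to a class id and a group id.

    Hypothetical helper: M, alpha, N and L are illustrative values, not
    necessarily the defaults used by the repository.
    """
    e = math.floor(east / M)                  # index of the cell along east
    n = math.floor(north / M)                 # index of the cell along north
    h = math.floor((heading % 360) / alpha)   # index of the heading bin
    class_id = (e, n, h)
    # Classes sharing the same remainders end up in the same group, so two
    # classes of one group are at least N cells (or L heading bins) apart.
    group_id = (e % N, n % N, h % L)
    return class_id, group_id


class_id, group_id = cosplace_class_and_group(east=25.0, north=103.0, heading=95.0)
print(class_id, group_id)
```

Under these assumptions, two images taken in neighboring cells always fall into different groups, so each group can be trained on as a separate classification dataset.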

To change the backbone or the output descriptor dimensionality, simply run

$ python3 train.py --backbone ResNet50 --fc_output_dim 128

You can also speed up training with Automatic Mixed Precision (note that none of the results/statistics in the paper used AMP)

$ python3 train.py --use_amp16

Run $ python3 train.py -h to see all the hyperparameters that you can change, including all those mentioned in the paper.

Dataset size and lightweight version

The SF-XL dataset is about 1 TB. Training uses only a subset of the images, and this subset is only 360 GB. If this is still too heavy for you (e.g. if you're using Colab) but you would like to run CosPlace, we also created a small version of SF-XL, which is only 5 GB. Obviously, using the small version will lead to lower results, and it should be used only for debugging / exploration purposes. More information on the dataset and the lightweight version is in the README on the dataset download page (go here to find it).

Reproducibility

Results from the paper are fully reproducible, and we followed deep learning's best practices (average over multiple runs for the main results, validation/early stopping and hyperparameter search on the val set). If you are a researcher comparing your work against ours, please make sure to follow these best practices and avoid picking the best model on the test set.
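The practices above can be sketched in a few lines: for each run, the checkpoint is chosen by its validation score only, and the reported number is the test score of that checkpoint averaged over runs. The function names, dictionary keys and scores below are hypothetical dummy values for illustration, not results from the paper or code from this repository.

```python
from statistics import mean


def select_on_val(checkpoints):
    """Pick the checkpoint with the highest validation score (test is never consulted)."""
    return max(checkpoints, key=lambda c: c["val_recall"])


def report(runs):
    """Average, over independent runs, the test score of each run's val-selected checkpoint."""
    return mean(select_on_val(run)["test_recall"] for run in runs)


# Dummy scores for two runs with different seeds (made up for this example).
runs = [
    [{"epoch": 1, "val_recall": 80.0, "test_recall": 70.0},
     {"epoch": 2, "val_recall": 85.0, "test_recall": 72.0},
     {"epoch": 3, "val_recall": 84.0, "test_recall": 75.0}],
    [{"epoch": 1, "val_recall": 81.0, "test_recall": 71.0},
     {"epoch": 2, "val_recall": 83.0, "test_recall": 74.0}],
]
print(report(runs))  # 73.0
```

In this toy example, picking the best checkpoint directly on the test set would report 74.5 instead of 73.0, which is exactly the kind of optimistic bias to avoid.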

Test

You can test a trained model as follows

$ python3 eval.py --backbone ResNet50 --fc_output_dim 128 --resume_model path/to/best_model.pth

You can download plenty of trained models below.

Visualize predictions

Predictions can be easily visualized through the num_preds_to_save parameter. For example, running

$ python3 eval.py --backbone ResNet50 --fc_output_dim 512 --resume_model path/to/best_model.pth \
    --num_preds_to_save=3 --exp_name=cosplace_on_stlucia

will save images of the predictions under the path ./logs/cosplace_on_stlucia/*/preds.

Given that saving predictions for each query can take a long time, you can also pass the parameter --save_only_wrong_preds, which saves predictions only for wrongly predicted queries (i.e. where the first prediction is wrong); these are usually the most interesting failure cases.
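To make the "first prediction is wrong" criterion concrete, here is a minimal sketch of such a filter. It is not the code in eval.py: the function name and the data layout (a ranked list of database indices per query, and a set of correct database indices per query) are assumptions for illustration.

```python
def wrong_queries(predictions, positives):
    """Return the indices of queries whose top-1 prediction is not a true positive.

    predictions[q] is the ranked list of database indices retrieved for query q;
    positives[q] is the set of database indices that count as correct for q.
    """
    return [
        q for q, preds in enumerate(predictions)
        if len(preds) == 0 or preds[0] not in positives[q]
    ]


# Toy example with three queries: only query 1 has a wrong first prediction.
predictions = [[3, 1], [2, 5], [7, 0]]
positives = [{3}, {5}, {0, 7}]
print(wrong_queries(predictions, positives))  # [1]
```

Note that query 1 still has a correct match at rank 2; it counts as wrong because only the first prediction is checked.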

Trained Models

All our trained models are now on PyTorch Hub, so you can use them in any codebase without cloning this repository, like this

import torch
model = torch.hub.load("gmberton/cosplace", "get_trained_model", backbone="ResNet50", fc_output_dim=2048)

As an alternative, you can download the trained models from the table below, which provides links to models with different backbones and dimensionality of descriptors, trained on SF-XL.

Model	Dimension of Descriptors
Model	32	64	128	256	512	1024	2048
ResNet-18	link	link	link	link	link	-	-
ResNet-50	link	link	link	link	link	link	link
ResNet-101	link	link	link	link	link	link	link
ResNet-152	link	link	link	link	link	link	link
VGG-16	-	link	link	link	link	-	-

Or you can download all models at once at this link
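Once a model is loaded, a typical use is to extract one descriptor per image and rank database images by their distance to a query. The sketch below shows one way to do this; the helper names are hypothetical, load_cosplace simply wraps the torch.hub.load call shown earlier in this section, and the input is assumed to be an already preprocessed float tensor of shape (B, 3, H, W) (the exact preprocessing, e.g. normalization, is not specified here and should match the one used at training time).

```python
import torch


def load_cosplace(backbone="ResNet50", fc_output_dim=2048):
    """Load a trained model from PyTorch Hub (downloads weights on first use)."""
    return torch.hub.load("gmberton/cosplace", "get_trained_model",
                          backbone=backbone, fc_output_dim=fc_output_dim)


def extract_descriptors(model, images):
    """Return one descriptor per image for a float tensor of shape (B, 3, H, W)."""
    model.eval()               # inference mode for layers such as batch norm
    with torch.no_grad():      # no gradients are needed to extract descriptors
        return model(images)


def rank_database(query_desc, db_descs):
    """Return database indices sorted by increasing L2 distance to the query.

    query_desc has shape (D,), db_descs has shape (N, D).
    """
    dists = torch.cdist(query_desc.unsqueeze(0), db_descs).squeeze(0)
    return torch.argsort(dists)
```

For example, after model = load_cosplace(), calling extract_descriptors(model, images) on a batch of database images and on a query image gives the tensors to pass to rank_database; the first returned index is the predicted match for the query.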

Issues

If you have any questions regarding our code or dataset, feel free to open an issue or send an email to berton.gabri@gmail.com

Acknowledgements

Parts of this repo are inspired by the following repositories:

CosFace implementation in PyTorch
CNN Image Retrieval in PyTorch (for the GeM layer)
Visual Geo-localization benchmark (for the evaluation / test code)

Cite

Here is the BibTeX entry to cite our paper

@InProceedings{Berton_CVPR_2022_CosPlace,
    author    = {Berton, Gabriele and Masone, Carlo and Caputo, Barbara},
    title     = {Rethinking Visual Geo-Localization for Large-Scale Applications},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {4878-4888}
}
