This is the code repository for the paper "G3: An Effective and Adaptive Framework for Worldwide Geolocalization Using Large Multi-Modality Models".
You can download the images and metadata of MP16-Pro from Hugging Face: Jia-py/MP16-Pro.
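Because `huggingface_hub` is one of the dependencies installed below, the dataset can also be fetched from Python. The following is a minimal sketch. It assumes that `Jia-py/MP16-Pro` is a Hugging Face *dataset* repository. The helper `download_mp16_pro` and its default `local_dir` are hypothetical names and are not part of this repository.

```python
from typing import Optional, Sequence


def download_mp16_pro(local_dir: str = "./MP16-Pro",
                      allow_patterns: Optional[Sequence[str]] = None) -> str:
    """Download (part of) the MP16-Pro dataset repo and return the local path.

    Hypothetical helper; assumes `Jia-py/MP16-Pro` is a dataset repository.
    `allow_patterns` (glob patterns) limits which files are fetched.
    """
    # Imported lazily so this module can be imported without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="Jia-py/MP16-Pro",
        repo_type="dataset",
        local_dir=local_dir,
        allow_patterns=allow_patterns,
    )
```

`allow_patterns` is a real `snapshot_download` argument that restricts the download to matching files. This helps if you only need part of the dataset. The file layout of MP16-Pro is not described here, so inspect the repository before choosing patterns.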
# tested on CUDA 12.0
conda create -n g3 python=3.9
conda activate g3
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu121
pip install transformers accelerate huggingface_hub pandas
- Geo-alignment
You can run `python run_G3.py` to train the model.
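As an illustration only: alignment steps like this one are commonly trained with a CLIP-style symmetric contrastive (InfoNCE) loss. This loss pulls each image embedding toward its paired embedding (for example, the embedding of its GPS coordinates) and pushes it away from the other pairs in the batch. The NumPy sketch below shows that generic loss. It is an assumption about the kind of objective involved, not the loss implemented in `run_G3.py`. The temperature of 0.07 is simply a common default.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def symmetric_info_nce(a: np.ndarray, b: np.ndarray, temperature: float = 0.07) -> float:
    """Generic CLIP-style loss: row i of `a` should match row i of `b` only."""
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(len(a))                 # matching pairs lie on the diagonal
    loss_a_to_b = -log_softmax(logits)[idx, idx].mean()
    loss_b_to_a = -log_softmax(logits.T)[idx, idx].mean()
    return float((loss_a_to_b + loss_b_to_a) / 2)
```

Well-aligned pairs give a lower loss than mismatched ones. Minimizing this loss over many batches therefore drives paired embeddings together.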
- Geo-diversification
First, build the index file using `python IndexSearch.py`.
Parameters in `IndexSearch.py`:
- index name: the model used to compute the embeddings
- dataset: the query dataset, `im2gps3k` or `yfcc4k`
- database: the retrieval database (default: `mp16`)
Next, construct an index for negative samples by replacing `images_embeds` with `-1 * images_embeds` before building the index.
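The negation trick works because, for an index that scores by inner product (or cosine similarity), storing `-1 * images_embeds` flips the sign of every similarity score. The same top-k search then returns the *least* similar database images. The following is a minimal sketch that uses brute-force NumPy search in place of whatever index library `IndexSearch.py` actually uses. `top_k_inner_product` and the toy vectors are hypothetical.

```python
import numpy as np


def top_k_inner_product(index_embeds: np.ndarray, queries: np.ndarray, k: int):
    """Brute-force inner-product search.

    Returns (scores, ids), each of shape (num_queries, k), best match first.
    """
    scores = queries @ index_embeds.T                 # (num_queries, num_db)
    ids = np.argsort(-scores, axis=1)[:, :k]          # highest scores first
    return np.take_along_axis(scores, ids, axis=1), ids


# Toy database (standing in for MP16 image embeddings) and one query image.
images_embeds = np.array([[1.0, 0.0], [0.6, 0.8], [-1.0, 0.0]])
query = np.array([[1.0, 0.0]])

# Positive candidates: the most similar database images.
_, pos_ids = top_k_inner_product(images_embeds, query, k=1)
# Negative candidates: the same search against negated embeddings,
# which ranks the least similar database images first.
_, neg_ids = top_k_inner_product(-1 * images_embeds, query, k=1)
print(pos_ids, neg_ids)  # → [[0]] [[2]]
```

The design choice is convenient because no new search code is needed. The existing index-building path is reused unchanged on negated vectors.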
Then, you can run `llm_predict_hf.py` or `llm_predict.py` to generate LLM predictions.
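The exact prompts used by `llm_predict_hf.py` and `llm_predict.py` are not described here. As a hypothetical sketch, a retrieval-augmented prompt could list the GPS coordinates of the retrieved similar and dissimilar images. The model's reply would then have to be parsed back into a coordinate. `build_prompt`, `parse_coordinate`, and the prompt wording are all assumptions.

```python
import re
from typing import List, Optional, Tuple

Coord = Tuple[float, float]


def build_prompt(positive: List[Coord], negative: List[Coord]) -> str:
    """Hypothetical prompt listing coordinates of retrieved candidate images."""
    pos = "; ".join(f"({lat:.4f}, {lon:.4f})" for lat, lon in positive)
    neg = "; ".join(f"({lat:.4f}, {lon:.4f})" for lat, lon in negative)
    return (
        "Where was this image taken?\n"
        f"GPS of similar images: {pos}\n"
        f"GPS of dissimilar images: {neg}\n"
        "Answer with a single coordinate in the form (latitude, longitude)."
    )


_COORD_RE = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def parse_coordinate(response: str) -> Optional[Coord]:
    """Extract the first '(lat, lon)' pair from a reply; None if absent or out of range."""
    match = _COORD_RE.search(response)
    if match is None:
        return None
    lat, lon = float(match.group(1)), float(match.group(2))
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return lat, lon
```

Validating the coordinate range matters because free-form LLM output can contain malformed or impossible values.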
After that, run `aggregate_llm_predictions.py` to aggregate the predictions.
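How `aggregate_llm_predictions.py` combines predictions is not specified here. The following is a minimal sketch. It assumes each prediction run yields a mapping from image id to a parsed `(lat, lon)` pair, or to `None` when parsing failed, and merges the runs into one deduplicated candidate list per image. `aggregate_predictions` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Coord = Tuple[float, float]


def aggregate_predictions(
    runs: Iterable[Dict[str, Optional[Coord]]],
) -> Dict[str, List[Coord]]:
    """Merge per-run {image_id: coordinate} dicts into per-image candidate lists.

    Unparseable predictions (None) are skipped, duplicates are dropped while
    keeping first-seen order, and images with no valid prediction are omitted.
    """
    merged: Dict[str, List[Coord]] = defaultdict(list)
    for run in runs:
        for image_id, coord in run.items():
            if coord is not None and coord not in merged[image_id]:
                merged[image_id].append(coord)
    return dict(merged)
```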
- Geo-verification
Run `python IndexSearch.py --index=g3 --dataset=im2gps3k` (or `--dataset=yfcc4k`) to verify the predictions and evaluate.
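Geolocalization results on im2gps3k and yfcc4k are commonly reported as the fraction of images whose predicted location lies within 1, 25, 200, 750, and 2500 km of the ground truth. These thresholds correspond to street, city, region, country, and continent level, and distance is measured along the great circle. The sketch below computes that metric with the haversine formula. It assumes a spherical Earth of radius 6371 km and is not taken from `IndexSearch.py`, whose evaluation code may differ.

```python
import math
from typing import Dict, Sequence, Tuple

Coord = Tuple[float, float]
EARTH_RADIUS_KM = 6371.0
# Thresholds commonly reported on im2gps3k / yfcc4k (street, city, region, country, continent).
THRESHOLDS_KM = (1, 25, 200, 750, 2500)


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # min() guards against h drifting slightly above 1 from rounding error.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def accuracy_at_thresholds(preds: Sequence[Coord], truths: Sequence[Coord]) -> Dict[int, float]:
    """Fraction of predictions within each distance threshold of the ground truth."""
    distances = [haversine_km(p, t) for p, t in zip(preds, truths)]
    return {t: sum(d <= t for d in distances) / len(distances) for t in THRESHOLDS_KM}
```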
@article{jia2024g3,
title={G3: An Effective and Adaptive Framework for Worldwide Geolocalization Using Large Multi-Modality Models},
author={Jia, Pengyue and Liu, Yiding and Li, Xiaopeng and Zhao, Xiangyu and Wang, Yuhao and Du, Yantong and Han, Xiao and Wei, Xuetao and Wang, Shuaiqiang and Yin, Dawei},
journal={arXiv preprint arXiv:2405.14702},
year={2024}
}