Satlas aims to provide open AI-generated geospatial data that is highly accurate, available globally, and updated on a frequent (monthly) basis. One of the data applications in Satlas is globally generated Super-Resolution imagery for 2023.
We describe the many findings that led to the global super-resolution outputs in the paper, Zooming Out on Zooming In: Advancing Super-Resolution for Remote Sensing. Supplementary material is available here.
This repository contains the training and inference code for the AI-generated Super-Resolution data found at https://satlas.allen.ai/, as well as code, data, and model weights corresponding to the paper.
The experiments branch contains config files for experiments from the paper, while the main branch is limited to showcasing the main features.
Initialize conda:
conda create --name ssr python==3.9
conda activate ssr
pip install -r requirements.txt
conda install gdal
There are two training sets:
- The urban set (train_urban_set), with ~1.1 million pairs from locations within a 5km radius of cities in the USA with a population >= 50k. There are 12 Sentinel-2 bands included in this set. Download links: part1 part2 part3 part4
- The full set (train_full_set), consisting of ~44 million pairs from all locations where NAIP imagery was available between 2019 and 2020. The full set, in the data format established prior to 2023-12-08, can be downloaded at this link.
The urban set (termed S2-NAIP) was used for all experiments in the paper, because we found the full set to be overwhelmed with monotonous landscapes.
There are three val/test sets:
- The validation set (val_set) consists of 8192 image pairs. There are 12 Sentinel-2 bands included in this set. (download)
- A small subset of this validation set (small_val_set) with 256 image pairs, all from urban areas, which is useful for qualitative analysis and faster validation. (download)
- A test set (test_set) containing eight 16x16 grids of Sentinel-2 tiles from interesting locations including Dry Tortugas National Park, Bolivia, France, South Africa, and Japan. (download)
Additional data includes:
- A set of NAIP images from 2016-2018 corresponding to the train_urban_set and small_val_set NAIP images (old-naip). These are used as input to the discriminator for the model variant described in supplementary Section A.5.2. (download)
- JSON files containing tile weights for the train_urban_set and train_full_set (train_tile_weights). Using OpenStreetMap categories, we count the number of tiles where each category appears at least once and then weight tiles by the inverse frequency of the rarest category appearing in that tile. (download)
- For train_urban_set, there is a JSON file with mappings between each NAIP chip and polygons of OpenStreetMap categories in that chip (osm_chips_to_masks.json). This is used for the object-discriminator variation described in supplementary Section A.5.1. (download)
- RRDBNet weights from a model pretrained on SatlasPretrain. Used in experiment described in supplementary Section A.5.3. (download)
All of the above data (except for the full training set due to size) can be downloaded at this link, or individual links are provided above for ease of downloading.
The train_urban_set, split into many partitions, val_set, and test_set are available for download on HuggingFace as well.
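The tile weighting described above for train_tile_weights can be sketched roughly as follows. This is a minimal illustration, not the code used to produce the released JSON files: the function name `compute_tile_weights`, the input layout (a dict from tile name to the set of OpenStreetMap categories present), and the handling of tiles with no categories are all assumptions.

```python
from collections import Counter


def compute_tile_weights(tile_categories):
    """Weight each tile by the inverse frequency of its rarest category.

    tile_categories: dict mapping tile name -> iterable of category names
    (hypothetical layout; the released JSON files may be organized differently).
    """
    # Count the number of tiles in which each category appears at least once.
    counts = Counter()
    for categories in tile_categories.values():
        counts.update(set(categories))

    num_tiles = len(tile_categories)
    weights = {}
    for tile, categories in tile_categories.items():
        categories = set(categories)
        if not categories:
            # Assumption: a tile with no categories is treated as if its only
            # "category" appeared in every tile, giving it the smallest weight.
            weights[tile] = 1.0 / num_tiles
            continue
        rarest_count = min(counts[c] for c in categories)
        weights[tile] = 1.0 / rarest_count
    return weights


example = {
    "100_200": {"road", "building"},
    "100_201": {"road"},
    "100_202": {"road", "stadium"},
}
weights = compute_tile_weights(example)
```

In this example, "road" appears in all three tiles while "building" and "stadium" each appear in one, so the tiles containing a rare category receive a larger weight than the road-only tile.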
Weights from models trained on the S2-NAIP dataset are listed below.
Varying number of input Sentinel-2 images (just RGB bands):
Number Input Images | Weights | Config |
---|---|---|
1 | 1-S2-images | esrgan_baseline_1S2.yml |
2 | 2-S2-images | esrgan_baseline_2S2.yml |
4 | 4-S2-images | esrgan_baseline_4S2.yml |
8 | 8-S2-images | esrgan_baseline.yml |
16 | 16-S2-images | esrgan_baseline_16S2.yml |
Different Sentinel-2 bands used as input (8 input images):
Bands | Weights | Config |
---|---|---|
10m | 10m-S2-bands | esrgan_baseline_10m.yml |
20m | 20m-S2-bands | esrgan_baseline_20m.yml |
60m | 60m-S2-bands | esrgan_baseline_60m.yml |
The dataset consists of image pairs from Sentinel-2 satellite imagery and NAIP aerial imagery, where a pair is a time series of Sentinel-2 images that overlap spatially and temporally (within 3 months) with a NAIP image. The imagery is from 2019-2020 and is limited to the USA.
The images adhere to the same Web-Mercator tile system as in SatlasPretrain.
The NAIP images included in this dataset are 25% of the original NAIP resolution. Each image is 128x128px with RGB channels.
In each set, there is a `naip` folder containing images in this format: `naip/{image_uuid}/{tile}/rgb.png`, where `image_uuid` is the image's unique identifier with the capture timestamp, and `tile` refers to its location in a 2^17 x 2^17 Web-Mercator grid (ex. 12345_67890).
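A 2^17 x 2^17 grid corresponds to zoom level 17 in the usual Web-Mercator tiling. The sketch below converts a tile name to approximate longitude/latitude bounds using the standard tile formulas. It assumes the name is `{column}_{row}` with the origin at the top-left corner of the grid; that ordering is an assumption and should be checked against the data. The function name `tile_to_lonlat_bounds` is hypothetical.

```python
import math

ZOOM = 17  # a 2^17 x 2^17 grid


def tile_to_lonlat_bounds(tile, zoom=ZOOM):
    """Return (west, south, east, north) in degrees for a tile like '12345_67890'.

    Assumes the tile name is '{column}_{row}' with row 0 at the top of the grid.
    """
    col, row = (int(v) for v in tile.split("_"))
    n = 2 ** zoom

    def lon(x):
        return x / n * 360.0 - 180.0

    def lat(y):
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))

    west, east = lon(col), lon(col + 1)
    north, south = lat(row), lat(row + 1)
    return west, south, east, north


west, south, east, north = tile_to_lonlat_bounds("65536_65536")
```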
We use the Sentinel-2 L1C imagery. Models that input 3 bands use the TCI file provided by ESA. This contains an 8-bit image that has been normalized by ESA to the 0-255 range. The image is normalized for input to the model by dividing the 0-255 RGB values by 255, and retaining the RGB order. Most experiments utilize just TCI, but for non-TCI bands, the 16-bit source data is divided by 8160 and clipped to 0-1.
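The two normalization rules above can be written as a short NumPy sketch. The function names `normalize_tci` and `normalize_band` are hypothetical and are not taken from the repository's data loading code.

```python
import numpy as np


def normalize_tci(tci):
    """Scale an 8-bit TCI (RGB) array from the 0-255 range to 0-1."""
    return tci.astype(np.float32) / 255.0


def normalize_band(band):
    """Scale a 16-bit non-TCI band by 8160 and clip the result to 0-1."""
    return np.clip(band.astype(np.float32) / 8160.0, 0.0, 1.0)


tci_out = normalize_tci(np.array([0, 255], dtype=np.uint8))
band_out = normalize_band(np.array([0, 4080, 8160, 20000], dtype=np.uint16))
```

Values above 8160 in the 16-bit data saturate at 1 after clipping.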
For each NAIP image, there is a time series of corresponding 32x32px Sentinel-2 images. These time series are saved as pngs in the shape `[number_sentinel2_images * 32, 32, 3]`. Note that the input images do not need to be in chronological order.
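Because the images in a time series are stacked vertically, an array loaded from one of these pngs can be split back into individual 32x32 images with a single reshape. The sketch below assumes the png has already been read into a NumPy array of shape `[N * 32, 32, 3]` (with any image library); `split_time_series` is a hypothetical helper name.

```python
import numpy as np


def split_time_series(stacked, size=32):
    """Split a [N * size, size, C] array into [N, size, size, C]."""
    height, width, channels = stacked.shape
    if width != size or height % size != 0:
        raise ValueError(f"unexpected time series shape: {stacked.shape}")
    # Rows are stored image after image, so a row-major reshape separates them.
    return stacked.reshape(height // size, size, width, channels)


# Synthetic stand-in for a loaded png: four images filled with 0, 1, 2, 3.
stacked = np.concatenate(
    [np.full((32, 32, 3), i, dtype=np.uint8) for i in range(4)], axis=0
)
images = split_time_series(stacked)
```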
In each set, there is a `sentinel2` folder containing these time series in the format: `sentinel2/{tile}/{band}.png`, where `tile` refers to its location in a 2^17 x 2^17 Web-Mercator grid (ex. 12345_67890) and `band` refers to the Sentinel-2 bands (tci, b01, b05, b06, b07, b08, b09, b10, b11, b12).
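Given the two folder layouts, the Sentinel-2 time series for a NAIP chip can be located from the chip's path. The sketch below assumes that the `naip` and `sentinel2` folders sit side by side under the same set directory and that the shared `{tile}` name links a pair; `sentinel2_path_for` is a hypothetical helper, not a function from this repository.

```python
from pathlib import Path


def sentinel2_path_for(naip_rgb_path, band="tci"):
    """Map {set}/naip/{image_uuid}/{tile}/rgb.png to {set}/sentinel2/{tile}/{band}.png."""
    naip_rgb_path = Path(naip_rgb_path)
    tile = naip_rgb_path.parent.name
    # parents[3] strips rgb.png, {tile}, {image_uuid}, and naip.
    set_root = naip_rgb_path.parents[3]
    return set_root / "sentinel2" / tile / f"{band}.png"


s2_path = sentinel2_path_for("val_set/naip/example_uuid/12345_67890/rgb.png")
```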
In the paper, we experiment with SRCNN, HighResNet, SR3, and ESRGAN. For a good balance of output quality and inference speed, we use the ESRGAN model for generating global super-resolution outputs.
Our ESRGAN model is an adaptation of the original ESRGAN, with changes that allow the input to be a time series of Sentinel-2 images. All models are trained to upsample by a factor of 4.
The SR3 diffusion model code lives in a separate repository. We are working to release it as well.
To train a model on this dataset, run the following command, with the desired configuration file:
python -m ssr.train -opt ssr/options/esrgan_s2naip_urban.yml
There are several sample configuration files in `ssr/options/`. Make sure the configuration file specifies correct paths to your downloaded data, the desired number of low-resolution input images, model parameters, and pretrained weights (if applicable).
Add the `--debug` flag to the above command if wandb logging, model saving, and visualization creation are not wanted.
To train with multiple GPUs, use the following command:
PYTHONPATH=. python -m torch.distributed.launch --nproc_per_node=8 --master_port=1234 ssr/train.py -opt ssr/options/esrgan_s2naip_urban.yml --launcher pytorch
To evaluate the model on a validation or test set, when ground truth high-res images are available, run the following command, with the desired configuration file:
python -m ssr.test -opt ssr/options/esrgan_s2naip_urban.yml
This will test the model using data and parameters specified in `['datasets']['test']`, and will save the model outputs as pngs in the `results/` directory. Specified metrics will be displayed to the screen at the end.
To run inference on data, when ground truth high-res images are not available, run the following command:
python -m ssr.infer -opt ssr/options/infer_example.yml
Inference settings are specified in the configuration file. The `data_dir` can be of any directory structure, but must contain pngs. Both the original low-res images and the super-res images will be saved to the `save_path`.
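Since the input directory can have any structure, it can help to confirm that it actually contains pngs before launching inference. The following is one way to do that with a recursive search. It is a standalone sketch and not how `ssr.infer` discovers its inputs; `find_pngs` is a hypothetical helper name.

```python
from pathlib import Path


def find_pngs(data_dir):
    """Return every .png file under data_dir, at any depth, in sorted order."""
    return sorted(p for p in Path(data_dir).rglob("*.png") if p.is_file())
```

An empty result for the directory given as `data_dir` means there is nothing for inference to run on.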
When running inference on an entire Sentinel-2 tile (consisting of a 16x16 grid of chunks), use the `infer_grid.py` script, which stitches the individual chunks together into one large image.
Try this out on the S2NAIP test set with this command:
python -m ssr.infer_grid -opt ssr/options/infer_grid_example.yml
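The stitching step can be illustrated with a small NumPy sketch. It is not the implementation in `infer_grid.py`; it assumes the chunks are equally sized and supplied in row-major order (left to right, then top to bottom), and `stitch_grid` is a hypothetical name.

```python
import numpy as np


def stitch_grid(chunks, grid=16):
    """Stitch grid*grid chunks of shape [H, W, C] into one [grid*H, grid*W, C] image.

    Assumes the chunks are ordered row by row, starting at the top-left.
    """
    chunks = np.asarray(chunks)
    count, height, width, channels = chunks.shape
    if count != grid * grid:
        raise ValueError(f"expected {grid * grid} chunks, got {count}")
    # (row, col, H, W, C) -> (row, H, col, W, C) -> (row*H, col*W, C)
    tiled = chunks.reshape(grid, grid, height, width, channels)
    return tiled.transpose(0, 2, 1, 3, 4).reshape(grid * height, grid * width, channels)


# Small 2x2 example: each 4x4 chunk is filled with its own index.
demo_chunks = [np.full((4, 4, 3), i, dtype=np.uint8) for i in range(4)]
stitched = stitch_grid(demo_chunks, grid=2)
```

With 128x128 super-resolved chunks and the default 16x16 grid, the stitched image would be 2048x2048 pixels.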
There are instances where the generated super resolution outputs are incorrect.
Specifically:
- Sometimes the model generates vessels in the water or cars on a highway, but because the input is a time series of Sentinel-2 imagery (which can span a few months), it is unlikely that those things persist in one location.
- Sometimes the model generates natural objects like trees or bushes where there should be a building, or vice versa. This is more common in places that look vastly different from the USA, such as the example below in Kota, India.
Thanks to these codebases for foundational Super-Resolution code and inspiration:
Image Super-Resolution via Iterative Refinement (SR3)
If you have any questions, please email [email protected] or open an issue.