The AWS Open Data program hosts many publicly available datasets. This repo compiles the list of all open datasets on AWS as a tab-separated values (TSV) file and as a JSON file, making it easier to find and use them programmatically. The list is updated daily.
A complete list of AWS open datasets as individual YAML files is available here.
This repo provides the list of AWS open datasets in two formats:
- Tab-separated values (TSV) file: aws_open_datasets.tsv
- JSON file: aws_open_datasets.json
The TSV file can be read directly into a pandas DataFrame using the following code:
import pandas as pd
url = 'https://github.com/giswqs/aws-open-data/raw/master/aws_open_datasets.tsv'
df = pd.read_csv(url, sep='\t')
df.head()
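Once the table is loaded, a keyword search is a common next step. The following is a minimal sketch, not part of this repo: `find_datasets` is a hypothetical helper, and the `Name` and `Description` columns in the sample are placeholders chosen for illustration, so check `df.columns` for the real column names. It searches every column, so it does not depend on any particular schema.

```python
import pandas as pd


def find_datasets(df: pd.DataFrame, keyword: str) -> pd.DataFrame:
    """Return rows where any column contains `keyword` (case-insensitive).

    Hypothetical helper for illustration; not provided by this repo.
    """
    # Compare every cell as text so the search works for any column type.
    text = df.astype(str)
    hits = text.apply(
        lambda col: col.str.contains(keyword, case=False, regex=False)
    )
    # Ignore missing values, which astype(str) would otherwise turn into text.
    mask = (hits & df.notna()).any(axis=1)
    return df[mask]


# Placeholder rows for illustration only; the real table has its own columns.
sample = pd.DataFrame(
    {
        "Name": ["Example Satellite Imagery", "Example Genomics Archive"],
        "Description": ["Optical imagery tiles", "Sequencing reads"],
    }
)

matches = find_datasets(sample, "imagery")
print(matches["Name"].tolist())
```

With the DataFrame loaded from the TSV file, the same call would look like `find_datasets(df, "landsat")`, where the keyword is whatever term you are searching for.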
Related lists of open datasets and data catalogs:
- A list of open datasets on AWS: aws-open-data
- A list of open geospatial datasets on AWS: aws-open-data-geo
- A list of open geospatial datasets on AWS with a STAC endpoint: aws-open-data-stac
- A list of STAC endpoints from stacindex.org: stac-index-catalogs
- A list of geospatial datasets on Microsoft Planetary Computer: Planetary-Computer-Catalog
- A list of geospatial datasets on Google Earth Engine: Earth-Engine-Catalog
- A list of geospatial datasets on NASA's Common Metadata Repository (CMR): NASA-CMR-STAC
- A list of geospatial data catalogs: geospatial-data-catalogs