This repository provides a curated list of websites that offer public datasets across various domains, including government data, computer vision, finance, health, and more. Whether you're a data scientist, researcher, or machine learning enthusiast, these resources can provide you with the datasets you need for your projects.
The resources below are grouped by category. They come from trusted sources and cover a wide range of use cases.
These websites offer a vast range of datasets spanning multiple domains and fields of research:
- Kaggle
  A popular platform that hosts competitions, provides notebooks, and offers a wide variety of datasets for machine learning and data science.
- Google Dataset Search
  A powerful search engine designed to help users discover datasets available across the web, useful for research across different domains.
- AWS Open Data
  A repository of public datasets hosted on Amazon Web Services, covering topics like satellite imagery, genomics, and more.
- UCI Machine Learning Repository
  One of the oldest repositories, offering datasets for empirical machine learning and artificial intelligence research.
- Data.world
  A social platform for data collaboration, where users can find, create, and share datasets and analysis tools.
- FiveThirtyEight
  Offers datasets used in FiveThirtyEight's data journalism and visualizations, covering topics from politics to sports.
- KDnuggets
  A collection of datasets curated for data mining, machine learning, and analytics enthusiasts.
- Data Is Plural
  A popular newsletter that features interesting and unique datasets across various domains.
- Data Science Central
  A community-driven platform that offers datasets, tutorials, and discussions for data science professionals.
- Awesome Public Datasets on GitHub
  A large GitHub repository featuring a wide collection of public datasets across various fields.

These websites focus on datasets specifically designed for computer vision tasks like image recognition, object detection, and segmentation:
- Kaggle Computer Vision
  A collection of datasets specifically tailored for computer vision challenges such as image classification and segmentation.
- Papers with Code - Computer Vision
  A repository linking research papers with their code and corresponding datasets, helping researchers replicate results in computer vision.
- 10 Best Open Source Datasets for Computer Vision
  A curated list of top open-source datasets for various computer vision tasks.
- DagsHub Computer Vision
  A platform providing datasets for computer vision, allowing collaboration through Git and DVC.
- Open Images Dataset V7
  A large-scale dataset of annotated images for computer vision research, containing millions of images.
- VisualData.io
  A platform that enables users to discover and explore computer vision datasets for different types of visual tasks.
- YouTube-8M
  A video dataset featuring millions of YouTube video IDs and associated labels for machine learning research.

These platforms provide open access to government statistics and public datasets that span demographics, economics, and much more:
- Data.gov
  The U.S. government's open data portal, offering access to over 250,000 datasets from various federal agencies.
- Data.gov.uk
  The U.K. government's open data portal, providing public sector data across fields such as health, education, and transportation.
- Data.gov.sa
  The Saudi Arabian government’s open data initiative, offering datasets on various public sectors like economics and health.
- CAPMAS
  The official statistics portal for Egypt, offering national data on population, labor, and social services.
- EU Open Data Portal
  Offers datasets from institutions and bodies of the European Union, spanning economics, law, and technology.
- Census.gov
  The U.S. Census Bureau’s data portal, providing demographic data on the U.S. population and economy.
- HealthData.gov
  Offers datasets focused on healthcare in the U.S., including hospital and public health data.
- World Bank Open Data
  Provides access to global development data, including economic indicators and international financial statistics.
- Humanitarian Data Exchange
  A platform for sharing humanitarian data, aimed at improving response to global crises.
- Open Data Soft
  A portal offering public and private datasets on various topics, including transportation and energy.
- UN Data
  A repository of global datasets from the United Nations, covering areas like health, education, and economics.
- Enigma Public
  A platform for accessing public datasets on business, economics, and government.
- Global Open Data Index
  A global ranking of countries based on the availability and accessibility of open government data.

These sources offer health-related datasets from reputable organizations, such as the CDC and WHO:
- CDC WONDER
  A U.S. public health data resource that provides access to a variety of CDC datasets for research and analysis.
- UNICEF Data
  Data and statistics related to children’s health, education, and development from UNICEF.
- Global Health Observatory Data Repository (WHO)
  WHO’s global repository for health-related data and statistics across various health indicators.
- World Health Organization (WHO) Data
  Access global health data including disease statistics and healthcare system performance.

Finance and economics datasets from organizations like the IMF, Eurostat, and Quandl:
- Quandl
  Offers financial and alternative datasets, including stock prices, futures, and cryptocurrency data.
- IMF Data
  Economic data and statistics from the International Monetary Fund, covering topics such as international trade and public finances.
- KAPSARC Data Portal
  Energy economics and policy data from the King Abdullah Petroleum Studies and Research Center.
- Eurostat
  Provides statistical data covering the European Union, including economics, industry, and population trends.

Academic and research data repositories for research professionals:
- ICPSR
  A vast archive of social science data for research, managed by the University of Michigan.
- Harvard Dataverse
  A repository that enables researchers to share, preserve, and cite their data.
- Academic Torrents
  A decentralized repository for sharing large academic datasets, making it easier to distribute and download research data.
- Figshare
  A platform for academics to share research outputs, including datasets, with proper attribution.
- Zenodo
  An open-access platform that allows researchers to share datasets and publications.
- DataDryad
  A nonprofit repository for data underlying scientific and medical publications.

Datasets on the environment, climate change, and agriculture:
- NASA Earth Data
  Provides global environmental data from NASA’s Earth observation missions.
- EarthData (NASA)
  NASA’s gateway to climate and environmental data for researchers studying the planet.
- FAOSTAT
  Agricultural data from the Food and Agriculture Organization of the United Nations, covering global food production and consumption.

These sources offer scientific and technical data from various domains, including space science, physics, and machine learning research:
- SDSS Data Archive
  Astronomical data from the Sloan Digital Sky Survey, a major initiative to map the universe.
- IEEE DataPort
  A platform for researchers to access and share datasets across different technical domains.
- CERN Open Data Portal
  Provides open access to data from CERN’s large-scale physics experiments, including data from the Large Hadron Collider.
- KEEL Dataset Repository
  A collection of datasets for machine learning and data mining experiments, with a focus on imbalanced datasets.
- Machine Learning Repository by Yahoo Labs
  Provides datasets for research in information retrieval, natural language processing, and machine learning.

Business, industry, and urban datasets, including crime records, trip data, and trend analysis:
- NYC TLC Trip Record Data
  Taxi trip data from New York City, offering insights into urban mobility patterns.
- Inside Airbnb
  Datasets on Airbnb listings worldwide, helping users analyze trends in the short-term rental market.
- FBI Crime Data Explorer
  Official crime statistics from the FBI, including reports on national, state, and local crime trends.
- BFI Industry Data & Insights
  Provides data on the film and television industries, with insights into viewership and production trends.
- Google Trends
  Explore trending topics and search patterns across Google’s search engine on various subjects.
- Yelp Dataset
  A dataset containing information on businesses and reviews from the Yelp platform, useful for sentiment analysis and business intelligence.
- Inside-BigData
  News and data trends focused on the business of AI, machine learning, and big data industries.
- OpenCorporates
  The largest open database of companies worldwide, providing business data for transparency and analysis.
- Statista
  A statistics portal that offers data on industries, markets, and consumer behavior globally.

Explore datasets related to cultural and historical records:
- IMDb Datasets
  A collection of data on movies, TV shows, and cast/crew information from IMDb.
- DBpedia
  Extracts structured information from Wikipedia, offering a semantic web of linked data across different domains.