This repository provides a list of websites offering public datasets across fields like computer vision, finance, health, government, and more. The datasets are categorized to help you quickly find the data you need for research, analysis, and machine learning projects.

AbdooMohamedd/Awesome-Data-Websites

Awesome-Data-Websites

This repository provides a curated list of websites that offer public datasets across various domains, including government data, computer vision, finance, health, and more. Whether you're a data scientist, researcher, or machine learning enthusiast, these resources can provide you with the datasets you need for your projects.

The resources below are grouped by category, such as government data, computer vision, and finance, so you can go straight to the domain you need.


Famous Data Websites

These websites offer a vast range of datasets spanning multiple domains and fields of research:

  • Kaggle
    A popular platform that hosts competitions, provides notebooks, and offers a wide variety of datasets for machine learning and data science.
  • Google Dataset Search
    A powerful search engine designed to help users discover datasets available across the web, useful for research across different domains.
  • AWS Open Data
    A repository of public datasets hosted on Amazon Web Services, covering topics like satellite imagery, genomics, and more.
  • UCI Machine Learning Repository
    One of the oldest dataset repositories, widely used for empirical machine learning and artificial intelligence research.
  • Data.world
    A social platform for data collaboration, where users can find, create, and share datasets and analysis tools.
  • FiveThirtyEight
    Offers datasets used in FiveThirtyEight's data journalism and visualizations, covering topics from politics to sports.
  • KDnuggets
    A collection of datasets curated for data mining, machine learning, and analytics enthusiasts.
  • Data Is Plural
    A popular newsletter that features interesting and unique datasets across various domains.
  • Data Science Central
    A community-driven platform that offers datasets, tutorials, and discussions for data science professionals.
  • Awesome Public Datasets on GitHub
    A large GitHub repository featuring a wide collection of public datasets across various fields.

Computer Vision

These websites focus on datasets specifically designed for computer vision tasks like image recognition, object detection, and segmentation:

  • Kaggle Computer Vision
    A collection of datasets specifically tailored for computer vision challenges such as image classification and segmentation.
  • Papers with Code - Computer Vision
    A repository linking research papers with their code and corresponding datasets, helping researchers replicate results in computer vision.
  • 10 Best Open Source Datasets for Computer Vision
    A curated list of top open-source datasets for various computer vision tasks.
  • DagsHub Computer Vision
    A platform providing datasets for computer vision, allowing collaboration through Git and DVC.
  • Open Images Dataset V7
    A large-scale dataset of annotated images for computer vision research, containing millions of images.
  • VisualData.io
    A platform that enables users to discover and explore computer vision datasets for different types of visual tasks.
  • YouTube-8M
    A video dataset featuring millions of YouTube video IDs and associated labels for machine learning research.

Government and Public Data

These platforms provide open access to government statistics and public datasets that span demographics, economics, and much more:

  • Data.gov
    The U.S. government's open data portal, offering access to over 250,000 datasets from various federal agencies.
  • Data.gov.uk
    The U.K. government's open data portal, providing public sector data across fields such as health, education, and transportation.
  • Data.gov.sa
    The Saudi Arabian government’s open data initiative, offering datasets on various public sectors like economics and health.
  • CAPMAS
    The official statistics portal for Egypt, offering national data on population, labor, and social services.
  • EU Open Data Portal
    Offers datasets from institutions and bodies of the European Union, spanning economics, law, and technology.
  • Census.gov
    The U.S. Census Bureau’s data portal, providing demographic data on the U.S. population and economy.
  • HealthData.gov
    Offers datasets focused on healthcare in the U.S., including hospital and public health data.
  • World Bank Open Data
    Provides access to global development data, including economic indicators and international financial statistics.
  • Humanitarian Data Exchange
    A platform for sharing humanitarian data, aimed at improving response to global crises.
  • Open Data Soft
    A portal offering public and private datasets on various topics, including transportation and energy.
  • UN Data
    A repository of global datasets from the United Nations, covering areas like health, education, and economics.
  • Enigma Public
    A platform for accessing public datasets on business, economics, and government.
  • Global Open Data Index
    A global ranking of countries based on the availability and accessibility of open government data.

Health and Medicine

These sources offer health-related datasets from reputable organizations, such as the CDC and WHO:


Finance and Economics

Finance and economics datasets from organizations such as the IMF, Eurostat, and Quandl:

  • Quandl
    Offers financial and alternative datasets, including stock prices, futures, and cryptocurrency data. Quandl is now part of Nasdaq and operates as Nasdaq Data Link.
  • IMF Data
    Economic data and statistics from the International Monetary Fund, covering topics such as international trade and public finances.
  • KAPSARC Data Portal
    Energy economics and policy data from the King Abdullah Petroleum Studies and Research Center.
  • Eurostat
    Provides statistical data covering the European Union, including economics, industry, and population trends.

Education and Research

Academic and research data repositories for research professionals:

  • ICPSR
    A vast archive of social science data for research, managed by the University of Michigan.
  • Harvard Dataverse
    A repository that enables researchers to share, preserve, and cite their data.
  • Academic Torrents
    A decentralized repository for sharing large academic datasets, making it easier to distribute and download research data.
  • Figshare
    A platform for academics to share research outputs, including datasets, with proper attribution.
  • Zenodo
    An open-access platform that allows researchers to share datasets and publications.
  • DataDryad
    A nonprofit repository for data underlying scientific and medical publications.

Environment and Climate

Datasets on the environment, climate change, and agriculture:

  • NASA Earth Data
    Provides global environmental data from NASA’s Earth observation missions.
  • EarthData (NASA)
    NASA’s gateway to climate and environmental data for researchers studying the planet.
  • FAOSTAT
    Agricultural data from the Food and Agriculture Organization of the United Nations, covering global food production and consumption.

Science and Technology

These sources offer scientific and technical data from domains such as astronomy, particle physics, and machine learning research:

  • SDSS Data Archive
    Astronomical data from the Sloan Digital Sky Survey, a major initiative to map the universe.
  • IEEE DataPort
    A platform for researchers to access and share datasets across different technical domains.
  • CERN Open Data Portal
    Provides open access to data from CERN’s large-scale physics experiments, including data from the Large Hadron Collider.
  • KEEL Dataset Repository
    A collection of datasets for machine learning and data mining experiments, with a focus on imbalanced datasets.
  • Machine Learning Repository by Yahoo Labs
    Provides datasets for research in information retrieval, natural language processing, and machine learning.

Business and Industry

Business and industry datasets, along with related sources such as crime statistics, trip records, and search trends:

  • NYC TLC Trip Record Data
    Taxi trip data from New York City, offering insights into urban mobility patterns.
  • Inside Airbnb
    Datasets on Airbnb listings worldwide, helping users analyze trends in the short-term rental market.
  • FBI Crime Data Explorer
    Official crime statistics from the FBI, including reports on national, state, and local crime trends.
  • BFI Industry Data & Insights
    Provides data on the film and television industries, with insights into viewership and production trends.
  • Google Trends
    A tool for exploring trending topics and search-interest patterns on Google Search.
  • Yelp Dataset
    A dataset containing information on businesses and reviews from the Yelp platform, useful for sentiment analysis and business intelligence.
  • Inside-BigData
    News and data trends focused on the business of AI, machine learning, and big data industries.
  • OpenCorporates
    The largest open database of companies worldwide, providing business data for transparency and analysis.
  • Statista
    A statistics portal that offers data on industries, markets, and consumer behavior globally.

Cultural and Historical Data

Explore datasets related to cultural and historical records:

  • IMDb Datasets
    A collection of data on movies, TV shows, and cast/crew information from IMDb.
  • DBpedia
    Extracts structured information from Wikipedia, offering a semantic web of linked data across different domains.
