I will build a machine learning model to predict whether or not a customer cancelled a hotel booking.
I will use a dataset of hotel bookings from the article "Hotel booking demand datasets", published in the Elsevier journal Data in Brief. The article's abstract states:
This data article describes two datasets with hotel demand data. One of the hotels (H1) is a resort hotel and the other is a city hotel (H2). Both datasets share the same structure, with 31 variables describing the 40,060 observations of H1 and 79,330 observations of H2. Each observation represents a hotel booking. Both datasets comprehend bookings due to arrive between the 1st of July of 2015 and the 31st of August 2017, including bookings that effectively arrived and bookings that were canceled.
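Both datasets share the same structure, so a convenient first step is to stack them into one table with a column recording which hotel each booking came from. It also helps to check the class balance of the target before any modelling. The sketch below uses only the Python standard library. Several names in it are assumptions, not details taken from the article: the file names H1.csv and H2.csv, the target column name IsCanceled, and the added Hotel column. Adjust them to match the files you actually download.

```python
import csv
from pathlib import Path

# Hypothetical file names for the two datasets (H1 = resort, H2 = city);
# check the names and format of the files you actually download.
FILES = {"Resort Hotel": Path("H1.csv"), "City Hotel": Path("H2.csv")}

# Assumed name of the binary target column (1 = canceled, 0 = arrived).
TARGET = "IsCanceled"


def load_csv(path):
    """Read a CSV file into a list of dicts keyed by column name."""
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def combine_bookings(tables):
    """Stack tables that share the same columns into one list of rows,
    adding a hypothetical "Hotel" field that records each row's source."""
    combined = []
    for hotel, rows in tables.items():
        for row in rows:
            # Copy the row so the input tables are left unchanged.
            combined.append({**row, "Hotel": hotel})
    return combined


def cancellation_rate(rows, target=TARGET):
    """Fraction of bookings whose target column equals 1."""
    if not rows:
        return 0.0
    return sum(int(row[target]) for row in rows) / len(rows)
```

With the real files, loading would look like `bookings = combine_bookings({hotel: load_csv(path) for hotel, path in FILES.items()})`. If the files match the article, `len(bookings)` should be 40,060 + 79,330 = 119,390. The cancellation rate also gives a baseline. A classifier that always predicts the majority class scores an accuracy of max(rate, 1 - rate), so a useful model needs to beat that figure.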