Based on aspects of building location and construction, our goal is to predict the level of damage to buildings caused by the 2015 Gorkha earthquake in Nepal. The data mainly consists of information on the buildings' structure and their legal ownership. Each row in the dataset represents a specific building in the region that was hit by the Gorkha earthquake.
How artificial intelligence and predictive analysis can help speed up recovery from earthquake damage
Data collected from the DrivenData.org competition website
In-house data was collected through surveys by the Central Bureau of Statistics, which works under the National Planning Commission Secretariat of Nepal. This survey is reportedly one of the largest post-disaster datasets ever collected, containing valuable information on earthquake impacts, household conditions, and socio-economic-demographic statistics.
Predict the ordinal variable damage_grade, which represents the level of damage to a building that was hit by the earthquake. There are three grades of damage:
1 represents low damage
2 represents a medium amount of damage
3 represents almost complete destruction
The dataset mainly consists of information on the buildings' structure and their legal ownership. Each row in the dataset represents a specific building in the region that was hit by the Gorkha earthquake.
There are 39 columns in this dataset, where the building_id column is a unique and random identifier. The remaining 38 features are described in the section below. Categorical variables have been obfuscated with random lowercase ASCII characters. The appearance of the same character in distinct columns does not imply the same original value.
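geo_level_1_id, geo_level_2_id, geo_level_3_id (type: int): geographic region in which the building exists, from largest (level 1) to most specific sub-region (level 3). Possible values: level 1: 0-30, level 2: 0-1427, level 3: 0-12567.
count_floors_pre_eq (type: int): number of floors in the building before the earthquake.
age (type: int): age of the building in years.
area_percentage (type: int): normalized area of the building footprint.
height_percentage (type: int): normalized height of the building footprint.
land_surface_condition (type: categorical): surface condition of the land where the building was built. Possible values: n, o, t.
foundation_type (type: categorical): type of foundation used while building. Possible values: h, i, r, u, w.
roof_type (type: categorical): type of roof used while building. Possible values: n, q, x.
ground_floor_type (type: categorical): type of the ground floor. Possible values: f, m, v, x, z.
other_floor_type (type: categorical): type of construction used in the floors above the ground floor (excluding the roof). Possible values: j, q, s, x.
position (type: categorical): position of the building. Possible values: j, o, s, t.
plan_configuration (type: categorical): building plan configuration. Possible values: a, c, d, f, m, n, o, q, s, u.
has_superstructure_adobe_mud (type: binary): flag variable that indicates if the superstructure was made of Adobe/Mud.
has_superstructure_mud_mortar_stone (type: binary): flag variable that indicates if the superstructure was made of Mud Mortar - Stone.
has_superstructure_stone_flag (type: binary): flag variable that indicates if the superstructure was made of Stone.
has_superstructure_cement_mortar_stone (type: binary): flag variable that indicates if the superstructure was made of Cement Mortar - Stone.
has_superstructure_mud_mortar_brick (type: binary): flag variable that indicates if the superstructure was made of Mud Mortar - Brick.
has_superstructure_cement_mortar_brick (type: binary): flag variable that indicates if the superstructure was made of Cement Mortar - Brick.
has_superstructure_timber (type: binary): flag variable that indicates if the superstructure was made of Timber.
has_superstructure_bamboo (type: binary): flag variable that indicates if the superstructure was made of Bamboo.
has_superstructure_rc_non_engineered (type: binary): flag variable that indicates if the superstructure was made of non-engineered reinforced concrete.
has_superstructure_rc_engineered (type: binary): flag variable that indicates if the superstructure was made of engineered reinforced concrete.
has_superstructure_other (type: binary): flag variable that indicates if the superstructure was made of any other material.
legal_ownership_status (type: categorical): legal ownership status of the land where the building was built. Possible values: a, r, v, w.
count_families (type: int): number of families that live in the building.
has_secondary_use (type: binary): flag variable that indicates if the building was used for any secondary purpose.
has_secondary_use_agriculture (type: binary): flag variable that indicates if the building was used for agricultural purposes.
has_secondary_use_hotel (type: binary): flag variable that indicates if the building was used as a hotel.
has_secondary_use_rental (type: binary): flag variable that indicates if the building was used for rental purposes.
has_secondary_use_institution (type: binary): flag variable that indicates if the building was used as a location of any institution.
has_secondary_use_school (type: binary): flag variable that indicates if the building was used as a school.
has_secondary_use_industry (type: binary): flag variable that indicates if the building was used for industrial purposes.
has_secondary_use_health_post (type: binary): flag variable that indicates if the building was used as a health post.
has_secondary_use_gov_office (type: binary): flag variable that indicates if the building was used as a government office.
has_secondary_use_use_police (type: binary): flag variable that indicates if the building was used as a police station.
has_secondary_use_other (type: binary): flag variable that indicates if the building was secondarily used for other purposes.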
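The single-letter categorical columns described above carry no ordering, so most models need them encoded before training. The sketch below one-hot encodes one such column using only the standard library. The helper name `one_hot` and the sample rows are hypothetical, and the `column_value` naming (for example `roof_type_x`) simply mirrors the feature names that appear in the importance table later in this write-up; it is not necessarily how the original pipeline was built.

```python
from typing import Dict, List, Sequence


def one_hot(rows: Sequence[Dict[str, str]], column: str,
            categories: Sequence[str]) -> List[Dict[str, int]]:
    """Return one dict per row with a 0/1 indicator for each category."""
    encoded = []
    for row in rows:
        value = row[column]
        if value not in categories:
            raise ValueError(f"unexpected {column} value: {value!r}")
        # Exactly one indicator is 1: the one matching this row's value.
        encoded.append({f"{column}_{c}": int(value == c) for c in categories})
    return encoded


# Hypothetical sample rows; the real data has one row per building.
sample = [{"roof_type": "n"}, {"roof_type": "x"}, {"roof_type": "q"}]
roof_encoded = one_hot(sample, "roof_type", categories=["n", "q", "x"])
print(roof_encoded[1])  # {'roof_type_n': 0, 'roof_type_q': 0, 'roof_type_x': 1}
```

On a full table, pandas.get_dummies produces the same kind of indicator columns in one call.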
We are predicting the level of damage from 1 to 3 (low, medium, high). The level of damage is an ordinal variable, meaning that the ordering of the grades matters. This can be viewed as either a classification or a regression problem.
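If the problem is treated as regression, each continuous prediction still has to be mapped back to a valid grade before scoring. The sketch below shows only that mapping step. The function name `to_damage_grade` and the example values are hypothetical, and rounding to the nearest grade is just one simple choice of thresholds.

```python
import math
from typing import List


def to_damage_grade(prediction: float) -> int:
    """Map a continuous regression output to the nearest grade in {1, 2, 3}."""
    # floor(x + 0.5) rounds halves up; Python's round() would send 2.5 to 2.
    nearest = math.floor(prediction + 0.5)
    # Clip so out-of-range predictions still yield a valid grade.
    return max(1, min(3, nearest))


# Hypothetical regressor outputs.
raw_predictions = [0.4, 1.49, 2.5, 3.8]
grades: List[int] = [to_damage_grade(p) for p in raw_predictions]
print(grades)  # [1, 1, 3, 3]
```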
To measure the performance of our algorithms, we used the F1 score, which balances the precision and recall of a classifier.
The F1 score is traditionally used to evaluate performance on a binary classifier, but since we have three possible labels we used a variant called the micro-averaged F1 score.
In Python, we can easily calculate this score using sklearn.metrics.f1_score with the keyword argument average='micro'.
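To make the metric concrete, here is a minimal standard-library sketch of micro-averaging: true positives, false positives, and false negatives are pooled across the three grades before precision and recall are computed. The function name `micro_f1` and the toy labels are hypothetical and are not taken from the project's code.

```python
from typing import Sequence


def micro_f1(y_true: Sequence[int], y_pred: Sequence[int],
             labels: Sequence[int] = (1, 2, 3)) -> float:
    """Micro-averaged F1: pool TP, FP and FN over all labels, then compute F1."""
    tp = fp = fn = 0
    for label in labels:
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1
            elif p == label and t != label:
                fp += 1
            elif p != label and t == label:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only.
y_true = [1, 2, 3, 2, 2, 3]
y_pred = [1, 2, 2, 2, 3, 3]
score = micro_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(score, 4), round(accuracy, 4))  # 0.6667 0.6667
```

Because every building receives exactly one predicted grade, each misclassification counts as one false positive and one false negative, so the micro-averaged F1 score equals plain accuracy. Calling sklearn.metrics.f1_score(y_true, y_pred, average='micro') on the same lists should give the same value.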
| Model | Micro-averaged F1 score |
|---|---|
| Logistic Regression | 0.59 |
| KNN | 0.63 |
| Linear SVM | 0.63 |
| Decision Tree | 0.64 |
| Random Forest | 0.72 |
| CatBoost | 0.73 |
| Deep Neural Network (TensorFlow) | 0.57 |
| No. | Feature | Importance |
|---|---|---|
| 1 | geo_level_3_id | 26.67 |
| 2 | geo_level_2_id | 20.12 |
| 3 | age | 8.8 |
| 4 | geo_level_2_id | 8.6 |
| 5 | ground_floor_type_v | 5.20 |
| 6 | roof_type_x | 3.8 |
| 7 | count_floors_pre_eq | 3.23 |
| 8 | has_superstructure_mud_mortar_stone | 3.21 |
| 9 | foundation_type_i | 3.17 |
| 10 | height_percentage | 2.71 |
This modelling shows that seismic damage prediction using machine learning models is feasible. Nevertheless, prediction accuracy remains limited: the best model, CatBoost, reached a micro-averaged F1 score of 0.73.