Written by Pratik Goyal
https://pratikgl.github.io
The script file is final_model.py
The output file is lola.csv
Step 1:
Replaced the NaN values in the 'condition' column with 3.
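
A minimal sketch of this step with pandas. The toy data is made up for illustration; only the 'condition' column name and the fill value 3 come from these notes.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the real data is loaded in final_model.py.
df = pd.DataFrame({"condition": [0.0, 1.0, np.nan, 2.0, np.nan]})

# Replace NaN with 3 so that "missing" becomes its own category.
df["condition"] = df["condition"].fillna(3).astype(int)
```

Using a separate value keeps the rows with missing 'condition' instead of dropping them.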
Step 2:
One-hot encoded the columns 'condition' and 'color_type'.
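
One way to do the encoding is pandas get_dummies. This is a sketch on made-up values; whether the script uses get_dummies or another encoder is an assumption.

```python
import pandas as pd

# Hypothetical toy frame with the two categorical columns.
df = pd.DataFrame({
    "condition": [0, 1, 3, 2],
    "color_type": ["Black", "White", "Black", "Brown"],
})

# Each distinct value becomes its own 0/1 indicator column.
encoded = pd.get_dummies(df, columns=["condition", "color_type"], dtype=int)
```

Here 'condition' has four distinct values and 'color_type' has three, so the result has seven indicator columns.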
Step 3:
Imputed missing values in column 'X1'. The imputation method was mean by target,
where the target variable was 'X2'.
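
A sketch of one reading of "mean by target": each missing 'X1' is filled with the mean of 'X1' over the rows that share the same 'X2' value. This interpretation and the toy values are assumptions, not taken from the script.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame.
df = pd.DataFrame({
    "X1": [10.0, np.nan, 14.0, 3.0, np.nan, 5.0],
    "X2": [1, 1, 1, 2, 2, 2],
})

# Mean of X1 within each X2 group, aligned back to the original rows.
group_mean = df.groupby("X2")["X1"].transform("mean")

# Fill only the missing entries with their group's mean.
df["X1"] = df["X1"].fillna(group_mean)
```

In this toy frame the missing value in group X2 = 1 becomes 12.0 and the one in group X2 = 2 becomes 4.0.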
Step 4:
Feature selection: scikit-learn's SelectKBest was used,
with k = 64 for 'breed_category' and k = 50 for 'pet_category'.
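
A sketch of the selection step with one selector per target. The data is synthetic, and the score function (f_classif, the scikit-learn default) is an assumption, since these notes do not name one.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for the preprocessed training features (80 columns).
X, y_breed = make_classification(
    n_samples=200, n_features=80, n_informative=10, n_classes=3, random_state=0
)
# Hypothetical second target, random labels just for illustration.
y_pet = np.random.default_rng(0).integers(0, 4, size=200)

# Separate selectors, because each target keeps a different number of features.
breed_selector = SelectKBest(score_func=f_classif, k=64)
pet_selector = SelectKBest(score_func=f_classif, k=50)

X_breed = breed_selector.fit_transform(X, y_breed)
X_pet = pet_selector.fit_transform(X, y_pet)
```

The fitted selectors should be kept, so the same columns can be picked from the test data later with transform.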
Step 5:
A gradient boosting classifier was used to train the model.
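
A sketch of the training step, assuming scikit-learn's GradientBoostingClassifier. The data is synthetic and the hyperparameters are placeholders, since these notes do not list the ones used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for the selected training features and one target.
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, n_classes=3, random_state=0
)

# Placeholder settings, not the ones from final_model.py.
model = GradientBoostingClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

pred = model.predict(X[:5])
```

With two targets, one such model would be trained per target.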
Step 6:
The same preprocessing steps were applied to the test data. The preprocessed test data
was then passed to the trained model to predict 'breed_category' and 'pet_category'.
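
A sketch of reusing the preprocessing on the test data, shown for one target. The preprocess helper, the toy frames and the column alignment with reindex are illustrative assumptions, not code from the script.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

def preprocess(df):
    """Hypothetical helper: the same cleaning for train and test frames."""
    out = df.copy()
    out["condition"] = out["condition"].fillna(3).astype(int)
    return pd.get_dummies(out, columns=["condition", "color_type"], dtype=int)

# Hypothetical toy data.
train = pd.DataFrame({
    "condition": [0, 1, None, 2, 0, 1],
    "color_type": ["Black", "White", "Brown", "Black", "White", "Brown"],
    "breed_category": [0, 1, 2, 0, 1, 2],
})
test = pd.DataFrame({
    "condition": [None, 1],
    "color_type": ["White", "Black"],
})

X_train = preprocess(train.drop(columns="breed_category"))
y_train = train["breed_category"]

# The test set may lack some categories, so align it to the training columns.
X_test = preprocess(test).reindex(columns=X_train.columns, fill_value=0)

model = GradientBoostingClassifier(n_estimators=10, random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)
```

Aligning the columns matters because one-hot encoding the test data alone can produce fewer columns than the model was trained on.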
Step 7:
The predictions were converted to a DataFrame and then written to a CSV file (lola.csv).
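
A sketch of the export step. The prediction values are made up, and the file is written to a temporary directory here so the example does not overwrite a real lola.csv. The exact column layout of the real output is an assumption.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Hypothetical predictions for the two targets.
breed_pred = np.array([0, 1, 2])
pet_pred = np.array([1, 2, 4])

submission = pd.DataFrame({
    "breed_category": breed_pred,
    "pet_category": pet_pred,
})

# index=False keeps the row index out of the CSV file.
out_path = Path(tempfile.mkdtemp()) / "lola.csv"
submission.to_csv(out_path, index=False)
```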