Here is the approach we followed in building the model.
NumPy and pandas are used to manipulate the dataframe and its columns and cells, and Matplotlib along with Seaborn is used to visualize our data.
We load both the training and test data sets and take a look at the data tables to see the values that we'll be working with.
The NaN values of Age are filled with the mean age, and the NaN values of Fare are filled with the mean fare.
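A minimal sketch of the mean imputation, assuming the columns are named Age and Fare as in the standard Titanic CSV files. The small frame below is made-up toy data standing in for the real tables.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Titanic data (values are invented for illustration).
df = pd.DataFrame({
    "Age": [22.0, np.nan, 38.0, 30.0],
    "Fare": [7.25, 71.28, np.nan, 8.05],
})

# Replace missing values in each column with that column's mean.
# Series.mean() skips NaN by default, so the mean uses only known values.
for col in ["Age", "Fare"]:
    df[col] = df[col].fillna(df[col].mean())
```

Whether the mean is taken per data set or over the training data only is not stated above; the sketch simply uses the frame it is given.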
All the features of the training data were plotted against the Survived column. We observed that passengers with Pclass 1 had the highest chance of surviving and those with Pclass 3 the lowest. Females were also observed to survive more often than males. Passengers with lower SibSp and Parch values had a better chance of survival.
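The comparison of features against survival can also be checked numerically. The sketch below is not the plotting code we used; it swaps the plots for a grouped mean, and the helper name survival_rate and the toy rows are assumptions made for illustration.

```python
import pandas as pd

# Toy training rows (invented); the real data comes from the training CSV.
train = pd.DataFrame({
    "Survived": [1, 1, 0, 0, 1, 0],
    "Pclass":   [1, 1, 3, 3, 2, 3],
    "Sex":      ["female", "male", "male", "male", "female", "female"],
})

def survival_rate(frame, feature):
    """Mean of Survived per value of `feature`, highest rate first."""
    return frame.groupby(feature)["Survived"].mean().sort_values(ascending=False)

by_class = survival_rate(train, "Pclass")
by_sex = survival_rate(train, "Sex")
```

Because Survived is 0 or 1, the group mean is the fraction of survivors in each group, which is the quantity the plots compare.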
Because the values in the Sex and Embarked columns are categorical, we have to represent these strings as numerical values in order to perform classification with our model.
This is done by creating a new column Place by mapping Embarked with {'S': 1, 'C': 2, 'Q': 3}. In the Sex column, male is mapped to 2 and female to 1, and a new column Gender is made. The Age and Fare columns are also mapped to numeric codes, ordered by survival, for ease of classification, and a new column named A is formed.
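A rough sketch of these mappings with pandas. The Place and Gender dictionaries are the ones given above. The age bands are hypothetical: the actual cut points and the exact construction of column A are not specified here, so the sketch writes to a separate, made-up column AgeBand instead.

```python
import pandas as pd

# Toy rows (invented) with the raw categorical columns.
df = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "Q"],
    "Age": [22.0, 38.0, 4.0],
})

# Mappings taken from the text.
df["Place"] = df["Embarked"].map({"S": 1, "C": 2, "Q": 3})
df["Gender"] = df["Sex"].map({"male": 2, "female": 1})

# Hypothetical age bands; bin edges and labels are assumptions.
age_bins = [0, 12, 18, 40, 60, 120]
df["AgeBand"] = pd.cut(df["Age"], bins=age_bins, labels=[1, 2, 3, 4, 5]).astype(int)
```

Series.map returns NaN for any value missing from the dictionary, so missing Embarked entries would need to be filled before or after this step.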
A new column Family is made by adding SibSp and Parch, which gives the number of family members on board for each passenger.
The column PGA is made by multiplying Pclass, Gender and A.
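The two derived columns, Family and PGA, can be sketched as plain column arithmetic. The values below are toy data, and Gender and A are assumed to be already encoded as integers.

```python
import pandas as pd

# Toy rows (invented); Gender and A are assumed to hold integer codes.
df = pd.DataFrame({
    "SibSp":  [1, 0, 3],
    "Parch":  [0, 2, 1],
    "Pclass": [3, 1, 2],
    "Gender": [2, 1, 2],
    "A":      [3, 1, 2],
})

# Family: siblings/spouses plus parents/children travelling with the passenger.
df["Family"] = df["SibSp"] + df["Parch"]

# PGA: element-wise product of the three encoded features.
df["PGA"] = df["Pclass"] * df["Gender"] * df["A"]
```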
The titles in the names are also extracted and mapped to numbers. A title such as Ms. or Mr. may provide a hint as to whether the passenger survived.
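One possible way to pull out the titles, assuming names follow the "Surname, Title. Given names" pattern of the Titanic data. The regular expression, the title_codes dictionary and the fallback code 0 for unlisted titles are all assumptions for illustration, not the mapping we used.

```python
import pandas as pd

# Sample names in the "Surname, Title. Given names" format.
df = pd.DataFrame({"Name": [
    "Braund, Mr. Owen Harris",
    "Heikkinen, Miss. Laina",
    "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
    "Uruchurtu, Don. Manuel E",
]})

# Capture the text between the comma and the first period.
df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()

# Hypothetical numeric codes; titles not listed fall back to 0.
title_codes = {"Mr": 1, "Miss": 2, "Mrs": 3, "Master": 4}
df["TitleCode"] = df["Title"].map(title_codes).fillna(0).astype(int)
```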
We import different classifiers from sklearn and measure the accuracy of each model. Lastly, the Survived column of the test dataset is predicted.
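A sketch of how several sklearn classifiers could be compared and the best one used for prediction. The choice of LogisticRegression and RandomForestClassifier, the use of 3-fold cross-validation to estimate accuracy, and the toy feature values are assumptions; the classifiers and evaluation method we actually used are not listed above.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy engineered features and labels (invented for illustration).
X = pd.DataFrame({
    "Pclass": [1, 3, 2, 3, 1, 3, 2, 1, 3, 3, 1, 2],
    "Gender": [1, 2, 1, 2, 1, 2, 2, 1, 2, 1, 2, 1],
    "Family": [0, 1, 2, 0, 1, 4, 0, 1, 0, 2, 0, 1],
})
y = pd.Series([1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1], name="Survived")
X_test = pd.DataFrame({"Pclass": [1, 3], "Gender": [1, 2], "Family": [0, 3]})

# Candidate models (an assumed selection).
models = {
    "logreg": LogisticRegression(max_iter=1000),
    "forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Mean cross-validated accuracy per model.
scores = {name: cross_val_score(m, X, y, cv=3).mean() for name, m in models.items()}

# Refit the best-scoring model on all training rows and predict the test rows.
best_name = max(scores, key=scores.get)
predictions = models[best_name].fit(X, y).predict(X_test)
```

Cross-validation is used here so that the accuracy estimate comes from rows the model did not see during fitting; a single held-out split would serve the same purpose.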