Using the Titanic data to predict passenger survival. Workflow of the project (work still in progress)
-
Loading Libraries: a. NumPy b. pandas c. Matplotlib and seaborn d. sklearn for data preprocessing, algorithms, and accuracy scoring
-
Exploratory Data Analysis - Explored the data: the shape (number of rows and columns) of the training and testing sets, and the missing values in the dataset.
- Applied dummy encoding to the categorical data.
- Some algorithms are sensitive to feature scale, so I standardized the features using StandardScaler.
- Training and Testing of Data: imported the KNN, GaussianNB, DecisionTree, etc. classifiers from sklearn, and used train_test_split (from sklearn's model selection module) to hold out test data so that overfitting can be detected.
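
The exploration and preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the small DataFrame is a made-up stand-in that only borrows the Kaggle Titanic column names, and filling missing `Age` values with the median is an assumption, since the workflow above does not say how missing values were handled.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the Titanic training data (values are made up for illustration).
train = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass": [3, 1, 3, 2, 1, 3],
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Age": [22.0, 38.0, np.nan, 35.0, 54.0, np.nan],
    "Fare": [7.25, 71.28, 7.92, 13.0, 51.86, 8.05],
    "Embarked": ["S", "C", "S", "S", "S", "Q"],
})

# Exploratory checks: shape and missing values per column.
print(train.shape)
missing = train.isnull().sum()
print(missing)

# Assumption: fill missing ages with the median age.
train["Age"] = train["Age"].fillna(train["Age"].median())

# Dummy-encode the categorical columns; drop_first avoids a redundant column.
encoded = pd.get_dummies(train, columns=["Sex", "Embarked"], drop_first=True)

# Standardize the continuous features to zero mean and unit variance.
numeric_cols = ["Age", "Fare"]
scaler = StandardScaler()
encoded[numeric_cols] = scaler.fit_transform(encoded[numeric_cols])

print(encoded.head())
```

In a real run the scaler would normally be fitted on the training split only and then applied to the test split, so that no information from the test data leaks into preprocessing.
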
Optional - Data Visualization: tried making the notebook more interactive.
Work in progress! The accuracy is 0.77 so far, and I will be improving it.
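
As a rough sketch of how an accuracy figure like this can be measured, the snippet below holds out a test set with `train_test_split` and compares the three classifiers named in the workflow. It runs on synthetic data from `make_classification` rather than the Titanic set, and the hyperparameters (`n_neighbors=5`, `max_depth=3`, a 25% test split) are assumptions for illustration, so the printed scores will not match the 0.77 reported here.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data standing in for the prepared features.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Hold out 25% of the rows; accuracy on this split estimates generalization.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Hyperparameters below are illustrative assumptions, not tuned values.
models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "GaussianNB": GaussianNB(),
    "DecisionTree": DecisionTreeClassifier(max_depth=3, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test_s))
    print(f"{name}: {scores[name]:.2f}")
```

A large gap between training accuracy and this held-out accuracy is the usual sign of overfitting; the split itself does not prevent it, it only makes it visible.
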
To get a better understanding of the workflow of a Machine Learning project, have a read:
-
The sklearn documentation is also recommended.
-
https://medium.com/analytics-vidhya/workflow-guide-to-machine-learning-c0545c843f04 (My blog on machine learning, do read it!)
-
https://medium.com/@NotAyushXD/workflow-of-a-machine-learning-project-ec1dba419b94
-
https://www.kaggle.com/digvijayyadav/titanic-codesprediction (Do upvote it if you like my kernel)