anushkaguptaaa/pytanic

pytanic

Data Visualization using Python

RMS Titanic is known for its infamous shipwreck in the North Atlantic Ocean on 15 April 1912. It was among the deadliest maritime tragedies in history, killing more than 1,500 of the estimated 2,224 passengers and crew. The disaster drew much public attention, which not only led to better safety guidelines for ships but also provided foundational material for the disaster film genre.

The dataset contains the details of only 891 passengers.

Project Components

  1. Data Inspection
  2. Data Cleaning
  3. Data Visualization
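
As a rough illustration of the first two components, here is a minimal sketch using only the Python standard library. The sample rows, the lower-case column names, and the choice to fill missing ages with the median are all assumptions made for this example. They are not taken from the real dataset or from the notebook.

```python
import csv
import io
from statistics import median

# Made-up rows for illustration only; they are NOT taken from the real dataset.
SAMPLE_CSV = """survived,pclass,sex,age,fare,embarked
1,1,F,29,80.0,C
0,3,M,,7.25,S
0,2,M,40,13.0,
1,3,F,19,7.75,Q
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, one per passenger."""
    return list(csv.DictReader(io.StringIO(text)))

def count_missing(rows):
    """Inspection: count empty cells per column."""
    counts = {column: 0 for column in rows[0]}
    for row in rows:
        for column, value in row.items():
            if value == "":
                counts[column] += 1
    return counts

def fill_missing_age(rows):
    """Cleaning (one possible strategy): replace empty ages with the median known age."""
    known = [float(row["age"]) for row in rows if row["age"] != ""]
    fallback = median(known)
    return [
        {**row, "age": float(row["age"]) if row["age"] != "" else fallback}
        for row in rows
    ]

rows = load_rows(SAMPLE_CSV)
missing = count_missing(rows)      # per-column count of empty cells
cleaned = fill_missing_age(rows)   # same rows, with numeric ages and no gaps
```

Counting the gaps first makes it easier to decide which columns are worth cleaning before any plots are drawn.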

Overview of the Dataset

The table below describes each column and the values it can take, which is essential for any analyst working with the dataset.

| Variable | Attributes / Definition | Meaning, if any |
|----------|-------------------------|-----------------|
| Survival | 0 | No |
| | 1 | Yes |
| pclass | 1 | Class A |
| | 2 | Class B |
| | 3 | Class C |
| Sex | F | Female |
| | M | Male |
| Age | Age in years | |
| sibsp | Sibling | brother, sister, stepbrother, stepsister |
| | Spouse | husband, wife (mistresses and fiancés were ignored) |
| parch | Parent | mother, father |
| | Child | daughter, son, stepdaughter, stepson |
| ticket | ticket number | |
| fare | passenger fare | |
| cabin | cabin number | |
| embarked | Port of Embarkation | |
| | C | Cherbourg |
| | Q | Queenstown |
| | S | Southampton |
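
The coded values in the table can be turned into readable labels before plotting. The sketch below is one possible way to do it. The `LABELS` dictionary and the `decode` helper are hypothetical names introduced for this example, and the lower-case column keys are an assumption about how the columns are named in the file.

```python
# Code-to-label lookups copied from the table in this README.
# The lower-case column keys are an assumption for this sketch.
LABELS = {
    "survival": {"0": "No", "1": "Yes"},
    "pclass": {"1": "Class A", "2": "Class B", "3": "Class C"},
    "sex": {"F": "Female", "M": "Male"},
    "embarked": {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"},
}

def decode(record):
    """Return a copy of record with coded values replaced by their labels."""
    decoded = {}
    for column, value in record.items():
        lookup = LABELS.get(column, {})
        # Columns without a lookup (age, fare, ...) pass through unchanged.
        decoded[column] = lookup.get(str(value), value)
    return decoded

# Hypothetical record, not a real passenger.
example = {"survival": 1, "pclass": 3, "sex": "F", "age": 22, "embarked": "S"}
decoded = decode(example)
```

Keeping the lookups in one place means the plot legends and axis labels stay consistent with the table.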

NOTE

In other projects you will notice that the analyst works with two .csv files, namely train.csv and test.csv.

test.csv ➨ used for testing the trained model.
train.csv ➨ used for training the model on the dataset we work with.

The values and end results produced by a model also vary with the percentage of the dataset allotted to each of the two .csv files.
This means that we may get different results when the data is split 50-50 between train.csv and test.csv than when it is split 70-30.

In my project, by contrast, there is only one .csv file, because I decided not to divide the dataset in any way and to work with it in its entirety.
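
To make the point about split ratios concrete, here is a minimal sketch of how the same 891 records divide under a 50-50 and a 70-30 split. The `split_rows` helper, the fixed seed, and the use of row indices in place of real records are assumptions for illustration. This project itself does not split the data.

```python
import random

def split_rows(rows, train_fraction, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Stand-in for the 891 passenger records: just their row indices.
passengers = list(range(891))

train_50, test_50 = split_rows(passengers, 0.5)
train_70, test_70 = split_rows(passengers, 0.7)

print(len(train_50), len(test_50))  # 445 446
print(len(train_70), len(test_70))  # 623 268
```

A model trained on 445 rows sees a different sample than one trained on 623 rows, which is one reason the reported results can differ between the two setups.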

Conclusion

We can draw the following conclusions after analysing the data.

  • Most passengers were travelling to
  • Women were given priority during the evacuation
  • The chance of survival was correlated with the fare paid by each passenger
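
A conclusion such as the one about women can be checked by comparing survival rates between groups. The sketch below shows one way to compute such rates. The `survival_rate_by` helper and the `survived` column name are assumptions, and the toy rows are made up purely to exercise the function, so the resulting numbers say nothing about the real passengers.

```python
from collections import defaultdict

def survival_rate_by(rows, column):
    """Return {group value: fraction of survivors} for the given column."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for row in rows:
        group = row[column]
        totals[group] += 1
        survivors[group] += int(row["survived"])
    return {group: survivors[group] / totals[group] for group in totals}

# Made-up rows for illustration only; NOT real passengers.
toy_rows = [
    {"sex": "F", "survived": 1},
    {"sex": "F", "survived": 1},
    {"sex": "F", "survived": 0},
    {"sex": "M", "survived": 0},
    {"sex": "M", "survived": 1},
    {"sex": "M", "survived": 0},
    {"sex": "M", "survived": 0},
]
rates = survival_rate_by(toy_rows, "sex")
```

The same helper can be pointed at any other categorical column, for example a fare band, to look at the fare conclusion in the same way.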

You can see the online deployment of the notebook by clicking on this link

Links to all the resources I learnt from:

1. https://medium.com/analytics-vidhya/data-visualization-titanic-data-set-91531c3ab5a6
2. https://medium.com/@rohanhgupta91/analyze-titanic-dataset-of-kaggle-ab220334b75c
3. https://medium.com/analytics-vidhya/what-is-the-difference-between-training-and-test-dataset-d20820e5f632
4. https://towardsdatascience.com/machine-learning-with-the-titanic-dataset-7f6909e58280
5. https://www.kaggle.com/subinium/awesome-visualization-with-titanic-dataset
6. https://www.kaggle.com/startupsci/titanic-data-science-solutions/
7. https://github.com/abhishekchhibber/Titanic-Data-Visualization
8. https://mastermindlab.github.io/titanic/
9. https://harvard-iacs.github.io/2019-CS109A/labs/lab-5/student/