This repository contains my solution for the Kaggle competition, "Titanic: Machine Learning from Disaster." The challenge is to predict the survival of passengers aboard the Titanic based on various features such as age, gender, ticket class, and more.
The dataset consists of two main files:
- train.csv: contains the training data, including the features and the ground truth (survival status) for a subset of passengers.
- test.csv: contains the test data, for which the goal is to predict each passenger's survival status.
For more details about the dataset and competition, please refer to the [Kaggle Titanic competition page](https://www.kaggle.com/c/titanic).
For this project, I used a Random Forest model to predict passenger survival. Random Forest is a versatile ensemble of decision trees that typically performs well on tabular data like this with little tuning.
I followed these steps in my approach (illustrative code sketches follow the list):
- Data Preprocessing: Handling missing values, encoding categorical features, and scaling features.
- Model Building: Creating a Random Forest classifier.
- Model Training: Fitting the model on the training data.
- Model Evaluation: Assessing the model's performance using appropriate evaluation metrics.
- Prediction: Making predictions on the test data.
- Submission: Preparing and submitting the predictions on Kaggle.
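As a rough illustration of the preprocessing, model building, training, and evaluation steps, a pipeline along these lines might look as follows. This is a minimal sketch, not the notebook's exact code: the feature subset, imputation choices, and hyperparameters (`n_estimators=200`, `random_state=42`) are assumptions, and feature scaling is omitted here because tree-based models do not require it.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the training file (path assumes it sits next to the notebook).
train = pd.read_csv("train.csv")

# A typical feature subset for this dataset; the notebook may use more features.
features = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]
X = train[features].copy()
y = train["Survived"]

# Handle missing values: median for numeric columns, mode for Embarked.
X["Age"] = X["Age"].fillna(X["Age"].median())
X["Fare"] = X["Fare"].fillna(X["Fare"].median())
X["Embarked"] = X["Embarked"].fillna(X["Embarked"].mode()[0])

# Encode categorical features as dummy (one-hot) columns.
X = pd.get_dummies(X, columns=["Sex", "Embarked"], drop_first=True)

# Hold out part of the training data to estimate generalization performance.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Build and train the Random Forest classifier.
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out validation split.
val_accuracy = accuracy_score(y_val, model.predict(X_val))
print(f"Validation accuracy: {val_accuracy:.3f}")
```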
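The prediction and submission steps could then look like the sketch below, which continues from the snippet above (it reuses `train`, `features`, `X`, `y`, and `model`). The `PassengerId`/`Survived` submission format is what Kaggle expects for this competition; the output filename `submission.csv` is just a placeholder.

```python
# Refit on the full training data before predicting on the test set.
model.fit(X, y)

# Apply the same preprocessing to test.csv, imputing with statistics
# computed on the training data to avoid leakage.
test = pd.read_csv("test.csv")
X_test = test[features].copy()
X_test["Age"] = X_test["Age"].fillna(train["Age"].median())
X_test["Fare"] = X_test["Fare"].fillna(train["Fare"].median())
X_test["Embarked"] = X_test["Embarked"].fillna(train["Embarked"].mode()[0])
X_test = pd.get_dummies(X_test, columns=["Sex", "Embarked"], drop_first=True)

# Align test columns with the training columns (fills any missing dummies with 0).
X_test = X_test.reindex(columns=X.columns, fill_value=0)

# Predict and write the two-column submission file: PassengerId, Survived.
predictions = model.predict(X_test)
submission = pd.DataFrame({"PassengerId": test["PassengerId"], "Survived": predictions})
submission.to_csv("submission.csv", index=False)
```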
The code and detailed analysis can be found in the Jupyter Notebook provided in this repository.
My Random Forest model achieved an accuracy of 87% on the test data.
I'd like to thank Kaggle for providing this exciting competition and the data science community for their valuable insights and contributions.
Feel free to explore the code and analysis in the Jupyter Notebook and provide any feedback or suggestions.