A Streamlit Web app for Olympic Data Analysis using Machine Learning
Dataset link: https://www.kaggle.com/heesoo37/120-years-of-olympic-history-athletes-and-results
Instructions to run the code:
- Install the required Python packages (pandas, matplotlib, numpy, seaborn, scikit-learn, etc.) using pip.
- Download the Olympics dataset from Kaggle.
- Import the necessary libraries in the Jupyter Notebook.
- Use pandas to read the dataset into a DataFrame and perform data preprocessing steps, such as handling missing values, encoding categorical variables, and scaling numerical features.
- Use pandas and matplotlib/seaborn to visualize the data and gain insights into athlete performance, medal distribution, etc., then create new features from the existing data that can improve model performance.
- Define the target variable for the analysis and use scikit-learn's train_test_split to split the dataset into training and testing sets.
- Train a RandomForestClassifier on the training data.
- Use the testing data to evaluate the model's performance.
- Adjust hyperparameters using techniques like grid search or random search to optimize the model's performance.
- Create visualizations: Use matplotlib or seaborn to visualize the model predictions, performance metrics, and insights from the analysis.
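
The preprocessing, splitting, training, evaluation, and tuning steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook code. The target variable (whether an athlete won any medal), the BMI feature, the hyperparameter grid, and the helper names make_sample_frame, preprocess, and train_and_evaluate are all assumptions made for the example. A small synthetic DataFrame stands in for the Kaggle data so the sketch runs on its own; the column names Sex, Age, Height, Weight, and Medal are assumed to match the dataset's athlete_events.csv file. Feature scaling is left out here because tree-based models such as random forests are not sensitive to it.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split


def make_sample_frame(n_rows=200, seed=0):
    """Build a small synthetic stand-in for the Kaggle data (illustrative only)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "Sex": rng.choice(["M", "F"], size=n_rows),
        "Age": rng.integers(16, 40, size=n_rows).astype(float),
        "Height": rng.normal(175, 10, size=n_rows),
        "Weight": rng.normal(70, 12, size=n_rows),
        # None means "no medal", mirroring missing Medal values in the real data.
        "Medal": rng.choice(["Gold", "Silver", "Bronze", None],
                            size=n_rows, p=[0.05, 0.05, 0.05, 0.85]),
    })
    df.loc[::10, "Age"] = np.nan  # inject some missing values to handle
    return df


def preprocess(df):
    """Handle missing values, encode categories, and build features and target."""
    df = df.copy()
    # Missing values: fill numeric gaps with the column median.
    for col in ["Age", "Height", "Weight"]:
        df[col] = df[col].fillna(df[col].median())
    # Categorical encoding: map Sex to 1 (M) / 0 (F).
    df["Sex"] = (df["Sex"] == "M").astype(int)
    # Engineered feature (assumption): body mass index from height in cm, weight in kg.
    df["BMI"] = df["Weight"] / (df["Height"] / 100) ** 2
    # Assumed target: 1 if the athlete won any medal, else 0.
    df["Won_Medal"] = df["Medal"].notna().astype(int)
    X = df[["Sex", "Age", "Height", "Weight", "BMI"]]
    y = df["Won_Medal"]
    return X, y


def train_and_evaluate(X, y, seed=0):
    """Split the data, tune a random forest with grid search, and score it."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    search = GridSearchCV(
        RandomForestClassifier(random_state=seed),
        param_grid={"n_estimators": [25, 50], "max_depth": [3, None]},
        cv=3,
        scoring="accuracy",
    )
    search.fit(X_train, y_train)
    # GridSearchCV refits the best model, so predict() uses the tuned forest.
    accuracy = accuracy_score(y_test, search.predict(X_test))
    return search.best_params_, accuracy


if __name__ == "__main__":
    # With the real data, replace make_sample_frame() with
    # pd.read_csv("athlete_events.csv") (file name assumed from the Kaggle page).
    X, y = preprocess(make_sample_frame())
    best_params, accuracy = train_and_evaluate(X, y)
    print("Best parameters:", best_params)
    print("Test accuracy:", accuracy)
```

Because the sample data is random, the accuracy printed here says nothing about real Olympic results; it only shows that the pipeline runs end to end. Since most athletes win no medal, accuracy alone can be misleading on the real data, so a metric such as precision, recall, or F1 may be worth checking as well.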