A Streamlit Web app for Olympic Data Analysis using Machine Learning

Dataset link: https://www.kaggle.com/heesoo37/120-years-of-olympic-history-athletes-and-results

Instructions to run the code:

  1. Install the required Python packages (pandas, numpy, matplotlib, seaborn, scikit-learn, etc.) using pip.
  2. Download the Olympics dataset from Kaggle.
  3. Import the necessary libraries in a Jupyter Notebook.
  4. Use pandas to read the dataset into a DataFrame and preprocess it: handle missing values, encode categorical variables, and scale numerical features.
  5. Use pandas and matplotlib/seaborn to visualize the data and gain insights into athlete performance, medal distribution, etc. Then create new features from the existing data that can improve model performance.
  6. Define the target variable for the analysis and use scikit-learn's train_test_split to split the dataset into training and testing sets.
  7. Train a RandomForestClassifier on the training data.
  8. Use the testing data to evaluate the model's performance.
  9. Tune hyperparameters with techniques such as grid search or random search to optimize the model's performance.
  10. Create visualizations: use matplotlib or seaborn to visualize the model predictions, performance metrics, and insights from the analysis.
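The preprocessing, splitting, training, evaluation, and tuning steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook. It assumes a binary target `won_medal` (1 if the `Medal` column is not empty), an engineered `BMI` feature, and a small hyperparameter grid, none of which are specified in this README. The helper names `make_demo_frame`, `preprocess`, and `train_and_evaluate` are hypothetical. It uses a small synthetic DataFrame so that it runs without downloading anything. With the real data you would load the Kaggle CSV (named `athlete_events.csv` in that dataset) instead.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split


def make_demo_frame(n_rows=200, seed=0):
    """Build a small synthetic stand-in for the Kaggle athlete data (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    medals = np.array(["Gold", "Silver", "Bronze", None], dtype=object)
    df = pd.DataFrame({
        "Sex": rng.choice(["M", "F"], size=n_rows),
        "Age": rng.integers(16, 40, size=n_rows).astype(float),
        "Height": rng.normal(175, 10, size=n_rows),
        "Weight": rng.normal(70, 12, size=n_rows),
        "Medal": rng.choice(medals, size=n_rows, p=[0.1, 0.1, 0.1, 0.7]),
    })
    # Blank out every tenth age to imitate the gaps found in real data.
    df.loc[df.index[::10], "Age"] = np.nan
    return df


def preprocess(df):
    """Handle missing values, encode Sex, and add one engineered feature."""
    out = df.copy()
    # Assumed target: 1 if the athlete won any medal, 0 otherwise.
    out["won_medal"] = out["Medal"].notna().astype(int)
    # Fill missing numeric values with the column median.
    for col in ["Age", "Height", "Weight"]:
        out[col] = out[col].fillna(out[col].median())
    # Encode the categorical Sex column as 0/1.
    out["Sex"] = (out["Sex"] == "M").astype(int)
    # Assumed engineered feature: body mass index (kg / m^2).
    out["BMI"] = out["Weight"] / (out["Height"] / 100) ** 2
    X = out[["Sex", "Age", "Height", "Weight", "BMI"]]
    y = out["won_medal"]
    return X, y


def train_and_evaluate(X, y, seed=42):
    """Split the data, grid-search a random forest, and score it on the test set."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    search = GridSearchCV(
        RandomForestClassifier(random_state=seed),
        param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
        cv=3,
        scoring="accuracy",
    )
    search.fit(X_train, y_train)
    # GridSearchCV refits the best model on all training data by default.
    accuracy = accuracy_score(y_test, search.predict(X_test))
    return search.best_params_, accuracy


if __name__ == "__main__":
    # With the real data: df = pd.read_csv("athlete_events.csv")
    X, y = preprocess(make_demo_frame())
    best_params, accuracy = train_and_evaluate(X, y)
    print("Best parameters:", best_params)
    print("Test accuracy:", round(accuracy, 3))
```

Feature scaling is left out of this sketch because tree-based models such as random forests are not sensitive to feature scale. If you swap in a different model, you may want to add a scaler before training. Because the demo data is random, the accuracy it prints says nothing about how the model performs on the real dataset.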