This data science project in R aims to predict the severity of Parkinson's disease based on the UCI Parkinsons dataset using machine learning algorithms. The dataset includes various features related to Parkinson's symptoms, and we leverage decision tree, random forest, support vector machine (SVM) and XGBoost algorithms for prediction. Additionally, Lasso regularization is applied for feature selection to enhance model interpretability and efficiency.
- Dataset
- Preprocessing
- Feature Selection
- Models
- Hyperparameter Tuning
- Evaluation
- Shiny App
- Usage
- Report
- Contributing
- License
We use the UCI Parkinsons dataset for this data science project. The dataset includes information about various symptoms and features related to Parkinson's disease. Motor UPDRS and Total UPDRS are the target variables in this dataset.
The data is preprocessed with min-max scaling, which maps each feature to the range [0, 1] so that features measured on different scales contribute comparably to the models.
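As an illustration of min-max scaling, here is a minimal sketch in base R. The helper `min_max_scale` and the toy columns `age` and `jitter` are assumptions made for this example; they are not taken from `Data_Preprocessing.R`.

```r
# Min-max scaling: maps a numeric vector to the range [0, 1].
# min_max_scale is a hypothetical helper, not the project's own function.
min_max_scale <- function(x) {
  rng <- range(x, na.rm = TRUE)
  # A constant column has zero range; return zeros to avoid dividing by zero.
  if (rng[1] == rng[2]) return(rep(0, length(x)))
  (x - rng[1]) / (rng[2] - rng[1])
}

# Toy data frame standing in for the real features (values are illustrative).
df <- data.frame(
  age    = c(58, 65, 72, 80),
  jitter = c(0.002, 0.004, 0.006, 0.010)
)

# Apply the scaler to every column and rebuild the data frame.
scaled <- as.data.frame(lapply(df, min_max_scale))
print(scaled)
```

When scaling held-out data, the minimum and maximum would normally be computed on the training set only and reused for the test set.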
Lasso regularization is applied for feature selection to identify the most relevant features, enhancing model interpretability and efficiency.
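A rough sketch of Lasso-based feature selection with `glmnet` follows. The data here is synthetic and the feature names are placeholders; the choice of `lambda.1se` is one common convention and is an assumption, not necessarily what this project uses.

```r
library(glmnet)

set.seed(42)
n <- 200
p <- 10
# Synthetic feature matrix with placeholder column names.
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(x) <- paste0("feature_", seq_len(p))
# Synthetic target: only the first two features carry signal.
y <- 3 * x[, 1] - 2 * x[, 2] + rnorm(n, sd = 0.5)

# alpha = 1 selects the Lasso penalty; cv.glmnet chooses lambda by cross-validation.
cv_fit <- cv.glmnet(x, y, alpha = 1)

# Extract coefficients at lambda.1se and keep the features with nonzero weight.
coefs <- as.matrix(coef(cv_fit, s = "lambda.1se"))
selected <- setdiff(rownames(coefs)[coefs[, 1] != 0], "(Intercept)")
print(selected)
```

Features whose coefficients are shrunk exactly to zero are dropped; the remaining ones can then be passed to the downstream models.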
We employ a decision tree regression model to predict the severity of Parkinson's disease based on the dataset features.
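For orientation, a regression tree can be fitted with `rpart` roughly as follows. The data frame, the column names `x1`, `x2` and `severity`, and the synthetic relationship are assumptions for illustration only; the actual script may use different settings.

```r
library(rpart)

set.seed(1)
n <- 150
# Synthetic stand-in data: severity jumps when x1 exceeds 0.5.
df <- data.frame(x1 = runif(n), x2 = runif(n))
df$severity <- ifelse(df$x1 > 0.5, 30, 15) + rnorm(n)

# method = "anova" requests a regression tree (continuous target).
tree <- rpart(severity ~ x1 + x2, data = df, method = "anova")

# Predict on the same rows just to show the call; use held-out data in practice.
preds <- predict(tree, newdata = df)
print(head(preds))
```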
A random forest regression model is utilized for predicting the severity of Parkinson's disease, offering an ensemble approach for improved accuracy.
The support vector machine is employed for regression to predict the severity of Parkinson's disease.
XGBoost, an efficient gradient boosting algorithm, is used to predict disease severity, providing a robust alternative to traditional models.
To optimize model performance, hyperparameter tuning is performed for each algorithm.
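One possible way to tune a hyperparameter is a cross-validated grid search with `caret`. The sketch below tunes the complexity parameter `cp` of an `rpart` tree on synthetic data; the grid values, the 5-fold setup, and the column name `total_updrs` are illustrative assumptions, not the project's actual configuration.

```r
library(caret)

set.seed(42)
n <- 200
# Synthetic training data with a hypothetical target column.
train_df <- data.frame(x1 = runif(n), x2 = runif(n))
train_df$total_updrs <- 20 + 15 * train_df$x1 - 5 * train_df$x2 + rnorm(n)

# 5-fold cross-validation over a small grid of cp values.
ctrl <- trainControl(method = "cv", number = 5)
grid <- data.frame(cp = c(0.001, 0.01, 0.05, 0.1))

fit <- train(
  total_updrs ~ .,
  data      = train_df,
  method    = "rpart",
  trControl = ctrl,
  tuneGrid  = grid,
  metric    = "RMSE"   # lower cross-validated RMSE is better
)

# The cp value with the lowest cross-validated RMSE.
print(fit$bestTune)
```

The same pattern applies to the other algorithms by changing `method` and the columns of `tuneGrid` to match that model's tunable parameters.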
The performance of each model is evaluated using metrics such as RMSE (Root Mean Squared Error), R-squared and MAE (Mean Absolute Error).
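The three metrics can be computed in base R as sketched below. The function names and the small example vectors are assumptions for illustration; the project scripts may rely on `caret` helpers instead.

```r
# Root Mean Squared Error: penalizes large errors more heavily.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Mean Absolute Error: average magnitude of the errors.
mae <- function(actual, predicted) mean(abs(actual - predicted))

# R-squared: share of variance in the target explained by the predictions.
r_squared <- function(actual, predicted) {
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
}

# Illustrative values only.
actual    <- c(10, 20, 30, 40)
predicted <- c(12, 18, 33, 39)

print(rmse(actual, predicted))       # sqrt(4.5), about 2.12
print(mae(actual, predicted))        # 2
print(r_squared(actual, predicted))  # 0.964
```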
A Shiny app is developed to provide an interactive interface for visualizing and analyzing the predictions made by the random forest model.
- Clone the repository.
- Ensure all project files are in the same folder.
- Open RStudio and set the working directory to the project folder.
- Install the required R libraries using the following command:
install.packages(c("dplyr", "e1071", "rpart", "randomForest", "caTools", "corrplot", "xgboost", "Hmisc", "caret", "glmnet", "shiny"))
- Run the following R scripts in order:
  - Data_Preprocessing.R: Preprocess the data and apply min-max scaling.
  - Decision_Tree.R: Run the decision tree regression model.
  - RandomForest.R: Train the random forest regression model and save the trained model.
  - SVM.R: Employ the support vector machine for regression.
  - XGBoost.R: Utilize XGBoost for predicting disease severity.
- Run the Shiny app by executing app.R. Ensure that the trained random forest model files are in the same directory as the app.
For a complete report or further inquiries, feel free to contact us via email.
Contributions are welcome! Feel free to open issues or pull requests.
This project is licensed under the License. Please read it carefully.