Parkinson’s disease is a progressive disorder of the central nervous system that affects movement, typically causing tremors and stiffness. It is a chronic neurodegenerative disorder that damages dopamine-producing neurons in the brain; it progresses through five stages, has no cure yet, and affects more than 1 million individuals every year in India.
XGBoost is a machine learning algorithm designed with speed and performance in mind. XGBoost stands for eXtreme Gradient Boosting and is based on gradient-boosted decision trees. In this project, we will import the XGBClassifier from the xgboost library; this is the library’s scikit-learn-compatible API for classification.
To build a model to accurately detect the presence of Parkinson’s disease in an individual.
In this machine learning project, we will build a model using an XGBClassifier and the Python libraries scikit-learn, numpy, pandas, and xgboost. We’ll load the data, get the features and labels, scale the features, split the dataset, train the classifier, and then calculate the accuracy of our model.
You’ll need to install the following libraries with pip (note that the package is named scikit-learn on PyPI, even though it is imported in code as sklearn):
pip install numpy pandas scikit-learn xgboost
If you don’t have Jupyter Notebook installed, open a command prompt, run pip install jupyter, and then launch it by running jupyter notebook.
This will open Jupyter in your browser. Here, create a new notebook, type in your code, and press Shift+Enter to execute one or more lines at a time.
- Make the necessary imports:
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
- Now, let’s read the data into a DataFrame and get the first 5 records.
# Read the data
df=pd.read_csv('parkinsons.data')
df.head()
(index) | name | MDVP:Fo(Hz) | MDVP:Fhi(Hz) | MDVP:Flo(Hz) | MDVP:Jitter(%) | MDVP:Jitter(Abs) | MDVP:RAP | MDVP:PPQ | Jitter:DDP | MDVP:Shimmer | ... | Shimmer:DDA | NHR | HNR | status | RPDE | DFA | spread1 | spread2 | D2 | PPE
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | phon_R01_S01_1 | 119.992 | 157.302 | 74.997 | 0.00784 | 0.00007 | 0.00370 | 0.00554 | 0.01109 | 0.04374 | ... | 0.06545 | 0.02211 | 21.033 | 1 | 0.414783 | 0.815285 | -4.813031 | 0.266482 | 2.301442 | 0.284654 |
1 | phon_R01_S01_2 | 122.400 | 148.650 | 113.819 | 0.00968 | 0.00008 | 0.00465 | 0.00696 | 0.01394 | 0.06134 | ... | 0.09403 | 0.01929 | 19.085 | 1 | 0.458359 | 0.819521 | -4.075192 | 0.335590 | 2.486855 | 0.368674 |
2 | phon_R01_S01_3 | 116.682 | 131.111 | 111.555 | 0.01050 | 0.00009 | 0.00544 | 0.00781 | 0.01633 | 0.05233 | ... | 0.08270 | 0.01309 | 20.651 | 1 | 0.429895 | 0.825288 | -4.443179 | 0.311173 | 2.342259 | 0.332634 |
3 | phon_R01_S01_4 | 116.676 | 137.871 | 111.366 | 0.00997 | 0.00009 | 0.00502 | 0.00698 | 0.01505 | 0.05492 | ... | 0.08771 | 0.01353 | 20.644 | 1 | 0.434969 | 0.819235 | -4.117501 | 0.334147 | 2.405554 | 0.368975 |
4 | phon_R01_S01_5 | 116.014 | 141.781 | 110.655 | 0.01284 | 0.00011 | 0.00655 | 0.00908 | 0.01966 | 0.06425 | ... | 0.10470 | 0.01767 | 19.649 | 1 | 0.417356 | 0.823484 | -3.747787 | 0.234513 | 2.332180 | 0.410335 |
5 rows × 24 columns
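As an optional sanity check before extracting features, you can confirm the size of the data and that nothing is missing (a minimal sketch; the counts in the comments are what this dataset should report):
# Optional sanity check on the dataset
print(df.shape)                 # should be (195, 24): 195 recordings, 24 columns
print(df.isnull().sum().sum())  # should be 0: no missing values to handle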
- Get the features and labels from the DataFrame. The features are all the columns except ‘status’ (we also drop the non-numeric ‘name’ column), and the labels are the values in the ‘status’ column.
# Get the features (all columns except 'status'; the [:,1:] slice drops 'name')
features=df.loc[:,df.columns!='status'].values[:,1:]
labels=df.loc[:,'status'].values
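An equivalent and arguably more readable version drops both columns by name (a sketch; features_alt and labels_alt are illustrative names, not part of the project code):
# Equivalent: drop the identifier and label columns explicitly by name
features_alt=df.drop(columns=['name','status']).values
labels_alt=df['status'].values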
- The ‘status’ column has the values 0 and 1 as labels; let’s get the count of each.
#Get the count of each label (0 and 1) in labels
print(labels[labels==1].shape[0], labels[labels==0].shape[0])
147 48
We have 147 ones and 48 zeros in the ‘status’ column of our dataset.
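pandas can produce the same counts in a single line; a small sketch:
# Same counts via pandas
print(df['status'].value_counts())  # expected: 147 ones, 48 zeros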
- Initialize a MinMaxScaler and scale the features to between -1 and 1. The MinMaxScaler transforms features by scaling each one to a given range, and the fit_transform() method fits to the data and then transforms it. We don’t need to scale the labels.
# Scale the features to between -1 and 1
scaler=MinMaxScaler((-1,1))
x=scaler.fit_transform(features)
y=labels
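To see exactly what the scaler does, here is a tiny illustration on made-up numbers (not part of the project code): each column’s minimum maps to -1, its maximum to 1, and everything in between is interpolated linearly.
# Toy example: MinMaxScaler((-1,1)) on a single made-up column
demo=MinMaxScaler((-1,1))
print(demo.fit_transform(np.array([[10.0],[15.0],[20.0]])))
# prints [[-1.], [0.], [1.]]: min -> -1, midpoint -> 0, max -> 1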
- Now, split the dataset into training and testing sets, keeping 20% of the data for testing.
# Split the dataset
x_train,x_test,y_train,y_test=train_test_split(x, y, test_size=0.2, random_state=7)
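Because the classes are imbalanced (147 vs. 48), you could optionally pass stratify=y so that both splits keep roughly the same class ratio. This variant is only a sketch (with illustrative variable names) and was not used for the accuracy reported below:
# Optional variant: a stratified split preserves the 147:48 class ratio
xs_train,xs_test,ys_train,ys_test=train_test_split(x, y, test_size=0.2, random_state=7, stratify=y)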
- Initialize an XGBClassifier and train the model. This classifier uses eXtreme Gradient Boosting, which applies gradient boosting to decision trees. It falls under the category of ensemble learning in ML, where many models are trained and combined to produce one superior output.
# Train the model
model=XGBClassifier()
model.fit(x_train,y_train)
[10:54:28] WARNING: C:/Users/Administrator/workspace/xgboost-win64_release_1.4.0/src/learner.cc:1095: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
C:\Users\hemac\Anaconda3\lib\site-packages\xgboost\sklearn.py:1146: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].
warnings.warn(label_encoder_deprecation_msg, UserWarning)
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
importance_type='gain', interaction_constraints='',
learning_rate=0.300000012, max_delta_step=0, max_depth=6,
min_child_weight=1, missing=nan, monotone_constraints='()',
n_estimators=100, n_jobs=8, num_parallel_tree=1, random_state=0,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', validate_parameters=1, verbosity=None)
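The two warnings printed above can be silenced by following their own advice; since our labels are already the integers 0 and 1, it is enough to pass two options when constructing the classifier (an optional sketch for xgboost 1.x):
# Optional: pass the options the warnings suggest to silence them
model=XGBClassifier(use_label_encoder=False, eval_metric='logloss')
model.fit(x_train,y_train)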
- Finally, generate y_pred (the predicted values for x_test) and calculate the accuracy of the model. Print it out.
# Calculate the accuracy
y_pred=model.predict(x_test)
print(accuracy_score(y_test, y_pred)*100)
94.87179487179486
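With imbalanced classes, accuracy alone can be flattering, so it is worth also inspecting per-class results; a short optional sketch using scikit-learn’s built-in reports:
# Optional: per-class view of the results
from sklearn.metrics import confusion_matrix, classification_report
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))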
In this machine learning project, we learned to detect the presence of Parkinson’s disease in individuals using a range of voice measurements. We used an XGBClassifier for this and made use of the sklearn library to prepare the dataset. This gives us an accuracy of 94.87%, which is great considering the small number of lines of code in this Python project.
Hope you enjoyed this project.