Glaucoma Prediction

Glaucoma is a common eye condition where the optic nerve, which connects the eye to the brain, becomes damaged. It's usually caused by fluid building up in the front part of the eye, which increases pressure inside the eye. Glaucoma can lead to loss of vision if it's not diagnosed and treated early.

TODOS

What worked? (90% accuracy)

  1. DenseNet with a sequential classifier head and Ben Graham preprocessing on the Himanchu dataset, using an NLLLoss criterion and the Adam optimizer
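
A minimal sketch of that kind of setup, assuming a pretrained DenseNet-121 backbone; the hidden size, dropout, and learning rate are illustrative, not the exact configuration used:

```python
import torch.nn as nn
import torch.optim as optim
from torchvision import models

# Pretrained DenseNet backbone; replace the classifier with a small
# sequential head ending in LogSoftmax so NLLLoss can be used directly.
model = models.densenet121(pretrained=True)
model.classifier = nn.Sequential(
    nn.Linear(model.classifier.in_features, 256),  # hidden size is an assumption
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(256, 2),          # glaucoma vs. no glaucoma
    nn.LogSoftmax(dim=1),       # log-probabilities, as NLLLoss expects
)

criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)  # lr is an assumption
```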

Limitations

  1. heavily dependent on the dataset
  2. disk extraction works well but is very sensitive to the dataset
  3. trained on a very small dataset

Preliminary

  • create a gmail account ([email protected])
  • understand the difference between predicting the possibility of glaucoma via classification vs. via measurements

Preprocessing

  • Ben Graham transformation (see the sketch after this list)
  • extract disk from fundus images
  • improve extraction algorithms
  • perform EDA on disk images to find troubling cases (images where the crop does not work)
  • convert the Python function that extracts the disk into a torch transform class (failed)
  • applying the disk transformation on the fly during training failed; create a disk dataset before training the model instead
  • train on the new dataset with and without the Ben Graham transformation
  • handle imbalanced classes with class weighting
  • convert the Kaggle dataset to the format our notebooks are templated on
  • extract disks for the Kaggle dataset using the new algorithm
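
The Ben Graham transformation referenced above is commonly implemented as subtracting a Gaussian-blurred copy of the image from the original; a minimal sketch, with the blur sigma as an assumption to tune per dataset:

```python
import cv2
import numpy as np

def ben_graham_transform(img: np.ndarray, sigma: float = 10.0) -> np.ndarray:
    """Subtract a Gaussian-blurred copy to normalize local lighting.

    The weights (4, -4, 128) follow Ben Graham's widely circulated
    Kaggle preprocessing snippet; the blur sigma here is an assumption.
    """
    blurred = cv2.GaussianBlur(img, (0, 0), sigma)
    return cv2.addWeighted(img, 4, blurred, -4, 128)
```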

Observations regarding disk generation

  • extraction of the disk does not help (too many vague areas are left unfilled)
  • however, cropping shows very good promise
  • but cropping requires the fundus images to be reasonably similar to one another

Datasets

Training

  • Inception v3 with and without Ben Graham preprocessing on the ocular, Kaggle, and Himanchu datasets
  • Inception v3 with Ben Graham preprocessing on the ocular, Kaggle, and Himanchu datasets (disk-extracted, normal, and cropped variants)
  • DenseNet with a linear head and Ben Graham preprocessing on the ocular, Kaggle, and Himanchu datasets
  • DenseNet with a linear head and Ben Graham preprocessing on the ocular, Kaggle, and Himanchu datasets (disk-extracted, normal, and cropped variants)
  • DenseNet with a sequential head and Ben Graham preprocessing on the ocular, Kaggle, and Himanchu datasets
  • DenseNet with a sequential head and Ben Graham preprocessing on the ocular, Kaggle, and Himanchu datasets (disk-extracted, normal, and cropped variants)
  • add datasets from cheers for testing
  • add datasets from cheers for training

Diabetic Retinopathy Prediction

Diabetic retinopathy is a complication of diabetes, caused by high blood sugar levels damaging the back of the eye (retina). It can cause blindness if left undiagnosed and untreated. However, it usually takes several years for diabetic retinopathy to reach a stage where it could threaten your sight.

What worked? (90% accuracy)

  1. A large dataset from EyePACS (the Kaggle competition used 30% of the data for training and 70% for testing; after the competition, the test labels were published). We flipped the ratios for our use case.
  2. Removed out-of-focus images.
  3. Removed images that were too bright or too dark.
  4. Link to the cleaned dataset: https://www.kaggle.com/ayushsubedi/drunstratified
  5. To handle the class imbalance issue, used a weighted random sampler (see the first sketch after this list). Undersampling to match the number of images in the smallest class (grade 4) did not work. Pickled the weights for future use.
  6. Ben Graham transformation and augmentations.
  7. Inception v3 fine-tuning, with the aux logits trained (better results compared to other architectures); see the second sketch after this list.
  8. Performed EDA on inference results to observe which images were causing issues.
  9. Removed those images and created another dataset (link to the new dataset: https://www.kaggle.com/ayushsubedi/cleannonstratifieddiabeticretinopathy).
  10. See 5, 6, and 7
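
For item 5, PyTorch's WeightedRandomSampler draws each sample with probability inversely proportional to its class frequency; a minimal sketch, assuming hypothetical names train_labels (integer grades 0 to 4) and train_dataset:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

# train_labels: one integer grade (0-4) per training image (assumed to exist)
labels = np.array(train_labels)
class_counts = np.bincount(labels)
class_weights = 1.0 / class_counts          # rarer class -> higher weight
sample_weights = class_weights[labels]      # one weight per sample

sampler = WeightedRandomSampler(
    weights=torch.as_tensor(sample_weights, dtype=torch.double),
    num_samples=len(sample_weights),
    replacement=True,
)
loader = DataLoader(train_dataset, batch_size=32, sampler=sampler)
```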
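
For item 7, training Inception v3 with the auxiliary logits means combining the main and auxiliary losses; a sketch of the standard pattern, where the 0.4 auxiliary weight is a common convention rather than a confirmed project setting:

```python
import torch.nn as nn
from torchvision import models

model = models.inception_v3(pretrained=True, aux_logits=True)
model.fc = nn.Linear(model.fc.in_features, 5)                      # 5 DR grades
model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, 5)
criterion = nn.CrossEntropyLoss()

# Inside the training loop (Inception v3 expects 299x299 inputs);
# images and targets are assumed to come from the dataloader.
outputs, aux_outputs = model(images)
loss = criterion(outputs, targets) + 0.4 * criterion(aux_outputs, targets)
```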

TODOS

Datasets

  • Binary Stratified (cleaned): https://drive.google.com/drive/folders/12-60Gm7c_TMu1rhnMhSZjrkSqqAuSsQf?usp=sharing
  • Categorical Stratified (cleaned): https://drive.google.com/drive/folders/1-A_Mx9GdeUwCd03TUxUS3vwcutQHFFSM?usp=sharing
  • Non Stratified (cleaned): https://www.kaggle.com/ayushsubedi/drunstratified
  • Recleaned Non Stratified: https://www.kaggle.com/ayushsubedi/cleannonstratifieddiabeticretinopathy

Preliminary

  • create a new gmail account to store datasets ([email protected])
  • What is diabetic retinopathy? https://www.youtube.com/watch?v=VIrkurR446s&ab_channel=khanacademymedicine
  • collect all previous analysis notebooks
  • conduct preliminary EDA (class balance, missing images, etc.)
  • create a balanced, stratified train/test split for DR
  • store the dataset in drive for colab
  • identify a few research papers, create a file to store subsequently found research papers
  • identify right technology stack to use (for ML, training, PM, model versioning, stage deployment, actual deployment)
  • perform basic augmentation
  • create a version 0 base model
  • apply a random transfer learning model
  • create a metric for evaluation
  • store the model in Zenodo, or find something for model version control
  • create a model that takes an image as input
  • create a streamlit app that reads model
  • streamlit app to upload and test prediction
  • test deployment to free tier heroku
  • identify gaps
  • create a preliminary test set
  • create folder structures for saved model in the drive
  • figure out a way to move files from kaggle to drive (without download/upload)
  • research saving model (the frugal way)
  • research saving the model to Google Drive after each epoch so that training can resume after unforeseen interruptions (see the sketch after this list)
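
For the last item, a minimal checkpointing sketch, assuming Drive is mounted at the usual Colab path; the path and saved fields are illustrative:

```python
import torch

CKPT = "/content/drive/MyDrive/dr_models/checkpoint.pt"  # assumed Drive layout

def save_checkpoint(model, optimizer, epoch):
    # Called at the end of every epoch; overwrites the previous checkpoint.
    torch.save({
        "epoch": epoch,
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
    }, CKPT)

def load_checkpoint(model, optimizer):
    # Restore model and optimizer state after an interrupted run.
    ckpt = torch.load(CKPT)
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["epoch"] + 1  # epoch to resume from
```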

Resource

  • upgrade to 25GB RAM in Google Colab possibly w/ Tesla P100 GPU
  • upgrade to Colab Pro

Baseline

  • medicmind grading (accuracy: 0.8)
  • medicmind classification (accuracy: 0.47)

Transfer Learning

  • ResNet
  • AlexNet
  • VGG
  • SqueezeNet
  • DenseNet
  • Inception
  • EfficientNet

Dataset cleaning

  • create a backup of the primary dataset (zipped so that Kaggle kernels can consume it too)
  • find algorithms to detect black/out-of-focus images (see the sketch after this list)
  • identify the correct thresholds for dark and out-of-focus images
  • remove black images
  • remove out-of-focus images
  • create a stratified dataset with 2015 data only (merge the original train and test sets into one training set), removing black and out-of-focus images (also create a test set)
  • create a non-stratified dataset with the 2015 clean data only (train, test, valid); upload to Kaggle if Google Drive is full
  • create a binary dataset (train, test, valid)
  • create confusion matrices (train, test, valid) after cleanup (dark and blurry)
  • the model is confusing labels 0 and 1 with 2; is this due to disturbance in the grade-0 images?
  • concluded that the result is due to the model not capturing class 0 well enough (because of undersampling)
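
A common approach for the detection items above is mean brightness for darkness and variance of the Laplacian for focus; a sketch with assumed thresholds that would need tuning against this dataset:

```python
import cv2

DARK_THRESHOLD = 30.0    # mean grayscale intensity; assumed value
BLUR_THRESHOLD = 100.0   # variance of Laplacian; assumed value

def is_problematic(path: str) -> bool:
    """Return True if the image looks too dark or out of focus."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    too_dark = gray.mean() < DARK_THRESHOLD
    # Low variance of the Laplacian means few sharp edges, i.e. blur.
    too_blurry = cv2.Laplacian(gray, cv2.CV_64F).var() < BLUR_THRESHOLD
    return too_dark or too_blurry
```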

Inference

  • create a CSV with prediction probabilities and true labels
  • calculate recall, precision, accuracy, and a confusion matrix (see the sketch after this list)
  • identify different prediction issues
  • relationship between differences in predictions and accuracy
  • inference issue: label 0 being predicted as 4
  • inference issue: check images from Grades 2 and 3 being predicted as Grade 0
  • inference issue: check images from Grade 4 being predicted as Grade 0
  • inference issue: check images from Grade 0 being predicted as Grade 4
  • inference issue: a significant share of Grade 2 is being predicted as Grade 0
  • inference issue: more than 50% of Grade 1 is being predicted as Grade 0
  • create a new dataset
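
Given the CSV of predictions and true labels from the first item, scikit-learn covers the listed metrics directly; a minimal sketch, assuming hypothetical column names label and pred:

```python
import pandas as pd
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

df = pd.read_csv("inference.csv")        # assumed filename and columns
y_true, y_pred = df["label"], df["pred"]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="macro"))
print("recall   :", recall_score(y_true, y_pred, average="macro"))
print(confusion_matrix(y_true, y_pred))  # rows: true grade, cols: predicted
```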

Model Improvement

  • research Kaggle-winning augmentations for DR
  • research appropriate augmentations: optical distortion, grid distortion, piecewise affine transform, horizontal flip, vertical flip, random rotation, random shift, random scale, a shift of RGB values, random brightness and contrast, additive Gaussian noise, blur, sharpening, embossing, random gamma, and cutout (see the sketch after this list)
  • train on various pretrained models, or research which is supposed to be ideal for this case: https://pytorch.org/vision/stable/models.html
  • create several neural nets (test different layers)
  • experiment with batch size
  • reduce lighting-condition effects
  • crop uninformative areas
  • create a custom dataloader based on the Ben Graham Kaggle-winning strategy
  • fine-tune vs. feature extraction
  • oversample
  • undersample
  • add specificity and sensitivity to indicators
  • create train loss and valid loss charts
  • test regression models (treat this as a grading problem)
  • pickle weights
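
The augmentation list above maps almost one-to-one onto albumentations transforms; a sketch of such a pipeline, with all probabilities as assumptions to tune per experiment:

```python
import albumentations as A

# Each transform below corresponds to one augmentation from the list;
# probabilities are assumptions, not tuned values.
augment = A.Compose([
    A.OpticalDistortion(p=0.3),
    A.GridDistortion(p=0.3),
    A.PiecewiseAffine(p=0.3),
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.ShiftScaleRotate(p=0.5),       # random rotation, shift, and scale
    A.RGBShift(p=0.3),
    A.RandomBrightnessContrast(p=0.3),
    A.GaussNoise(p=0.3),
    A.Blur(p=0.2),
    A.Sharpen(p=0.2),
    A.Emboss(p=0.2),
    A.RandomGamma(p=0.3),
    A.CoarseDropout(p=0.3),          # cutout-style occlusion
])

augmented = augment(image=image)["image"]  # image: an HxWxC numpy array (assumed)
```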

Additional Models

  • check if left/right eye classification model is required

Additional datasets

  • make the datasets more extensive (add the 2015 test set with recovered labels to the 2015 training data)
  • add APTOS dataset
  • request labelled datasets from cheers
  • add datasets from cheers for testing
  • add datasets from cheers for training

Test datasets

  • find datasets for testing (datasets apart from APTOS and EyePACS)
  • update folder structures to match our use case
  • find a dataset for testing after making sure the old test datasets are not in valid/train (grade 4 will be empty)

Concepts/Research Papers