
Recognizing Grape Varieties from Their Leaf Images with Deep Convolutional Networks

Introduction

This project started as part of my data mining course. I found the topic so exciting that I began reading and researching further, with advice from my professor, Dr. H. Sajedi. I then tried several of the things I learned to get better results: designing my own network, augmenting the data in various ways, and exploiting other types of neural networks.

Note: If you have trouble opening any of the .ipynb files, PDF exports of them are available in the "pdf_files" directory.

Methodology

To make this easier to follow and faster to search, I explain what I did in four parts:

Data

First, download the base data with wget and extract it with unzip:

wget https://www.muratkoklu.com/datasets/Grapevine_Leaves_Image_Dataset.zip
unzip -q Grapevine_Leaves_Image_Dataset.zip 
  • Here is a sample of the base data:

base_data

Next, to create an out-of-sample set, I used a small Python script that randomly chose 20% of the images in each class and moved them to a separate directory. The data was then loaded as TensorFlow datasets with the tf.keras.utils.image_dataset_from_directory function. The image margins are white, and white pixels have the largest values (RGB 255, 255, 255) while carrying no information. I therefore inverted the colors (each pixel value x becomes 255 - x), which turns the useless white into black (RGB 0, 0, 0). I did this with the map method below:

train_data2 = train_data.map(lambda x, y: (255-x, y))
validation_data2 = validation_data.map(lambda x, y: (255-x, y))
test_data2 = test_data.map(lambda x, y: (255-x, y))
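The split script itself is not shown in this README. Below is a minimal Python sketch of what it could look like. It assumes the archive extracts to a folder with one subdirectory per class (for example Grapevine_Leaves_Image_Dataset/Ak). The function names, the output directory and the seed are hypothetical.

```python
import random
import shutil
from pathlib import Path

def choose_out_of_sample(files, fraction=0.2, seed=42):
    """Pick a reproducible random subset holding `fraction` of `files`."""
    files = sorted(files)  # sort first so the choice depends only on the seed
    k = round(len(files) * fraction)
    return random.Random(seed).sample(files, k)

def move_out_of_sample(src_dir, dst_dir, fraction=0.2, seed=42):
    """Move `fraction` of each class subfolder of src_dir into dst_dir/<class>.

    Hypothetical helper: returns {class_name: number_of_moved_files}.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    moved = {}
    for class_dir in sorted(p for p in src_dir.iterdir() if p.is_dir()):
        images = [p.name for p in class_dir.iterdir() if p.is_file()]
        chosen = choose_out_of_sample(images, fraction, seed)
        target = dst_dir / class_dir.name
        target.mkdir(parents=True, exist_ok=True)
        for name in chosen:
            shutil.move(str(class_dir / name), str(target / name))
        moved[class_dir.name] = len(chosen)
    return moved
```

For example, move_out_of_sample("Grapevine_Leaves_Image_Dataset", "out_of_sample") would move 20 images from each class, assuming 100 images per class. That matches the 100-image out-of-sample set used later. Sorting the file names before sampling means the split depends only on the seed, not on the order in which the file system lists the files.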

Rotating, flipping or zooming an image does not change its class. I therefore augmented the base data with randomly transformed copies of the images, using the following Keras layers:

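from tensorflow.keras import Sequential, layers
data_augmentation = Sequential([  # variable name is illustrative
    layers.RandomFlip("horizontal"),
    layers.RandomFlip("vertical"),
    layers.RandomZoom(height_factor=(-0.2, 0.2), width_factor=(-0.2, 0.2), fill_mode='constant', fill_value=0),
    layers.RandomRotation(0.3, fill_mode='constant', fill_value=0),
])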
  • Here is a sample of the augmented transformed data:

augmented_transformed_data

I used these layers inside my architecture to get the full benefit of randomness, so that each epoch sees newly transformed images. During the trial-and-error phases, however, I also stored a simple augmented copy of the data in a dataset to save GPU time.
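One way to store augmented data once is to append transformed copies to the dataset and cache the result. Trial runs then do not repeat the random transformations on every epoch. The sketch below shows how this could be done with tf.data; it is an assumption, not the notebook's code. The name precompute_augmented and the number of copies are hypothetical, and the layers mirror the ones listed above.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Same random transformations as the augmentation layers listed above.
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomFlip("vertical"),
    layers.RandomZoom((-0.2, 0.2), (-0.2, 0.2), fill_mode="constant", fill_value=0),
    layers.RandomRotation(0.3, fill_mode="constant", fill_value=0),
])

def precompute_augmented(dataset, copies=2):
    """Return `dataset` followed by `copies` randomly transformed versions of it.

    Hypothetical helper: the result is cached, so after the first pass the
    same augmented images are reused instead of being recomputed.
    """
    result = dataset
    for _ in range(copies):
        augmented = dataset.map(lambda x, y: (augment(x, training=True), y))
        result = result.concatenate(augmented)
    return result.cache()
```

cache() keeps the first pass in memory, so later epochs reuse the same augmented images. This gives up the extra variety of in-model augmentation in exchange for speed, which suits quick experiments. For example, precompute_augmented(train_data2) would apply it to the inverted training set.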

My architecture

For this part (the code is available here), I built a model in three stages. It starts with three data augmentation layers to reduce overfitting and improve learning. Next come 12 convolution and pooling layers to extract as much detail as possible. Finally, after flattening, five dense layers perform the classification. I tried many different architectures and modified each one many times before arriving at this one. Its results are good enough to compare with well-known networks on this data.

  • A more detailed summary of the model is shown below:

Model_Summary

For training, I used the Adam optimizer and the SparseCategoricalCrossentropy loss function and trained the network for 200 epochs with a batch size of 32. The accuracy and loss during training are shown below:

  • Accuracy curve:

My_model_acc

  • Loss curve:

My_model_loss
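The exact layer sizes are visible only in the model summary image. The following Keras sketch is therefore a hypothetical reconstruction of the described layout:

- three augmentation layers
- 12 convolution and pooling layers, read here as six Conv2D/MaxPooling2D pairs
- a Flatten layer
- five Dense layers

The model is compiled with Adam and SparseCategoricalCrossentropy as stated in the text. The input size, filter counts, unit counts and the softmax output are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_custom_cnn(input_shape=(256, 256, 3), num_classes=5):
    """Hypothetical reconstruction of the custom model; all sizes are guesses."""
    model = tf.keras.Sequential([tf.keras.Input(shape=input_shape)])
    # Three augmentation layers; Keras applies them only during training.
    model.add(layers.RandomFlip("horizontal_and_vertical"))
    model.add(layers.RandomZoom((-0.2, 0.2), (-0.2, 0.2), fill_mode="constant", fill_value=0))
    model.add(layers.RandomRotation(0.3, fill_mode="constant", fill_value=0))
    # 12 feature-extraction layers: six Conv2D + MaxPooling2D pairs.
    for filters in (16, 32, 64, 64, 128, 128):
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(layers.MaxPooling2D())
    model.add(layers.Flatten())
    # Five Dense layers; the last outputs class probabilities (softmax assumed).
    for units in (512, 256, 128, 64):
        model.add(layers.Dense(units, activation="relu"))
    model.add(layers.Dense(num_classes, activation="softmax"))
    # Optimizer and loss from the text; softmax output matches from_logits=False.
    model.compile(optimizer="adam",
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                  metrics=["accuracy"])
    return model
```

Training would then look like model.fit(train_data2, validation_data=validation_data2, epochs=200). No batch_size argument is passed to fit because tf.data datasets arrive already batched. A batch size of 32 is the default that image_dataset_from_directory uses.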

Finally, I tested the model on the unseen out-of-sample data to check whether the results were genuine or the product of overfitting. For this test, the model is shown 100 images (20 from each class), and each predicted class is compared with the true one. The results in the table below are strong. The confusion matrix also shows that the predictions are not biased toward any class, which is very important in classification tasks.

  • Result table:

My_model_res
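The out-of-sample check comes down to comparing predicted labels with true labels and counting them in a confusion matrix. The NumPy sketch below shows that computation. The function names are hypothetical, and the labels are assumed to be integer class indices from 0 to 4.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes=5):
    """Count matrix: rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly even when (true, pred) pairs repeat.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def summarize(cm):
    """Overall accuracy and per-class recall from a confusion matrix."""
    accuracy = np.trace(cm) / cm.sum()
    per_class_recall = np.diag(cm) / cm.sum(axis=1)
    return accuracy, per_class_recall
```

With a trained Keras model, the inputs could come from y_pred = np.argmax(model.predict(test_data2), axis=1) and y_true = np.concatenate([y.numpy() for _, y in test_data2]). The two only line up if the out-of-sample dataset was loaded with shuffle=False, because image_dataset_from_directory shuffles by default. An unbiased model puts most of the counts in every row on the diagonal, not just in a few classes.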

Pre-trained models

In this part (the code is available here), I tried several pre-trained models within the same structure to find the best one: Xception, VGG16, VGG19, ResNet50, ResNet101, ResNet152, InceptionV3 and InceptionResNetV2. I then tested the best model with different seeds and compared its results with my own model.

The architecture has three parts. It starts with three Keras data augmentation layers that randomly rotate, flip or zoom the input. The pre-trained model itself follows. Finally, three dense layers classify the model's output into our five classes.

  • As an example, the VGG16 model summary is shown below:

Pre_trained_arc_vgg16
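The structure shared by all the pre-trained experiments can be sketched as a single Keras builder that takes the backbone as a parameter. This is a hedged sketch. The global average pooling, the frozen backbone, the input size and the Dense layer widths are all assumptions that this README does not state. The name build_transfer_model is hypothetical.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_transfer_model(base_fn=tf.keras.applications.ResNet152,
                         input_shape=(224, 224, 3), num_classes=5,
                         weights="imagenet"):
    """Augmentation layers -> pre-trained backbone -> three Dense layers."""
    # include_top=False drops the ImageNet classifier; pooling="avg" is assumed.
    base = base_fn(include_top=False, weights=weights,
                   input_shape=input_shape, pooling="avg")
    base.trainable = False  # assumption: only the new Dense head is trained
    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        layers.RandomFlip("horizontal_and_vertical"),
        layers.RandomZoom((-0.2, 0.2), (-0.2, 0.2), fill_mode="constant", fill_value=0),
        layers.RandomRotation(0.3, fill_mode="constant", fill_value=0),
        base,
        # Three Dense layers map the backbone features to the five classes.
        layers.Dense(256, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                  metrics=["accuracy"])
    return model
```

Any of the compared backbones can be passed in, for example build_transfer_model(tf.keras.applications.VGG16). Each Keras application expects its own preprocessing function (for example tf.keras.applications.vgg16.preprocess_input for VGG16), and this sketch omits that step.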

The notebook contains the code and the accuracy and loss curves for each model, with more details. The charts below compare all of them at once:

  • Pre-trained models accuracy on training data during training phase:

Pre_trained_train_acc

  • Pre-trained models accuracy on validation data during training phase:

Pre_trained_val_acc

To sum up, Xception, InceptionV3 and InceptionResNetV2 performed poorly and were not even close to the others. Both the VGG and ResNet networks, on the other hand, worked quite well and reached accuracies of around 80%. The best model was ResNet152, which reached 84% on the unseen out-of-sample data.

  • The chart below compares the accuracy of the pre-trained models on the out-of-sample data:

Pre_trained_results

Exploiting denoising and autoencoder networks

The idea of this part (the code is available here) was to build a model that finds and emphasizes the important parts of each image before the image reaches the classifier part of the network. The hope was that the model would then focus only on the essential information and give a better result. Unfortunately, that did not happen.

For the denoising part, I first created a noisy version of each image in the augmented dataset, using a noise factor of 0.3 and a mean of 127. I then trained the denoising network on these images for 30 epochs. Even though I spent a week on it, I could not get a good result. The model summary and a sample of its output are shown below:

  • Denoising model summary:

Denoising_architecture

  • Denoising sample output:

Denoising_sample

The first row shows the noisy images, and the second row shows the network's output for each image above it. The last row shows the original images.
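This README does not show the exact noise formula. The NumPy sketch below assumes one common formulation, similar to the Keras denoising tutorials: add noise_factor times Gaussian noise to the pixel values, then clip back to [0, 255]. The noise factor 0.3 and the mean 127 come from the text. The standard deviation, the additive form and the name add_noise are assumptions.

```python
import numpy as np

def add_noise(images, noise_factor=0.3, mean=127.0, std=127.0, seed=0):
    """Return a noisy copy of `images`, whose pixel values lie in [0, 255].

    noise_factor and mean follow the text; std and the additive formula
    are assumptions.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(loc=mean, scale=std, size=np.shape(images))
    noisy = np.asarray(images, dtype=np.float64) + noise_factor * noise
    # Clip so the result is still a valid image.
    return np.clip(noisy, 0.0, 255.0)
```

The denoiser is then trained on (noisy, clean) pairs, so it learns to map each noisy image back to its original. With a hypothetical Keras model named denoiser, that would be denoiser.fit(add_noise(images), images, epochs=30).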

With the denoising network built, it was time to attach it to the rest of the network. ResNet152 had performed best among the pre-trained models, so I combined it with the denoiser to get the best result I could.

  • The final architecture:

DenoisingResNet152_arc

The layer sequential_10 is the previously trained denoising network.

I then trained the model for 100 epochs. The accuracy curve from the training phase shows that learning converged at about 70%.

  • The accuracy curve:

Denoising_train_acc

  • The results table:

Denoising_result

The autoencoder part went much worse. Despite my efforts to build a good network, the best model's output looked like a faded purple circle that clearly lost most of the information.

  • Autoencoder output:

Autoencoder_output

The first row shows the original images, and the second row shows their reconstructions from the encoded representation.
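For reference, below is a small Keras sketch of a convolutional autoencoder of the kind described here. It is not the notebook's architecture, which this README does not list. The layer counts, filter sizes, input size and the 0 to 255 rescaling are assumptions, and build_conv_autoencoder is a hypothetical name.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_conv_autoencoder(input_shape=(128, 128, 3), latent_filters=8):
    """Return (autoencoder, encoder); all sizes are illustrative assumptions."""
    inputs = tf.keras.Input(shape=input_shape)
    x = layers.Rescaling(1.0 / 255)(inputs)  # work in [0, 1]
    # Encoder: each stride-2 convolution halves the height and width.
    x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2D(16, 3, strides=2, padding="same", activation="relu")(x)
    encoded = layers.Conv2D(latent_filters, 3, strides=2, padding="same", activation="relu")(x)
    # Decoder: each stride-2 transposed convolution doubles the height and width.
    x = layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu")(encoded)
    x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(input_shape[-1], 3, strides=2, padding="same", activation="sigmoid")(x)
    decoded = layers.Rescaling(255.0)(x)  # back to the [0, 255] pixel range
    autoencoder = tf.keras.Model(inputs, decoded)
    encoder = tf.keras.Model(inputs, encoded)
    autoencoder.compile(optimizer="adam", loss="mse")
    return autoencoder, encoder
```

The autoencoder is trained to reproduce its own input, for example autoencoder.fit(images, images, epochs=20), where images is a hypothetical training array. The encoder half then provides the compressed representation. In this sketch, a 128x128x3 input is squeezed into 16x16x8, about 24 times fewer values. A bottleneck that tight can discard much of the fine detail, which is one common cause of washed-out reconstructions.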

Given these reconstructions, a good result was not expected. As in the denoising part, I used ResNet152 here too. For completeness, the results are reported below, although the accuracy is only slightly better than random guessing (20% for five balanced classes).

  • The results table:

AutoencoderResNet_result
