This project started as part of my data mining course, but I found it so exciting that I kept reading about it and doing further research under the guidance of my professor, Dr. H. Sajedi. So I tried different things I learned along the way, such as designing my own network and using different methods to get better results, for example by augmenting the data in various ways or exploiting other types of neural networks.
Note: If you have trouble opening any of the `.ipynb` files, I exported them as
To make it easier to follow and faster to search, I think it is better to explain what I did in four parts:
- Data and data augmentation
- My own model architecture
- Pre-trained models
- Denoising and autoencoder networks
First of all, download the base data using the `wget` command and unzip it using `unzip` as below:
```bash
wget https://www.muratkoklu.com/datasets/Grapevine_Leaves_Image_Dataset.zip
unzip -q Grapevine_Leaves_Image_Dataset.zip
```
- Here is a sample of the base data:
Then, with a small Python script, we created an out-of-sample set by randomly choosing and moving 20% of each class to a new directory. After that, the data was ready to be loaded as TensorFlow datasets with the `tf.keras.utils.image_dataset_from_directory` function.
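In case it helps, here is a minimal sketch of what that split and loading step could look like (the directory names, image size, and validation split are my assumptions; only the 20% out-of-sample ratio and the batch size of 32 come from the text):

```python
import os
import random
import shutil

import tensorflow as tf

SRC = "Grapevine_Leaves_Image_Dataset"  # assumed name of the unzipped data directory
DST = "out_of_sample"                   # assumed name for the new out-of-sample directory

random.seed(42)
for class_name in os.listdir(SRC):
    class_dir = os.path.join(SRC, class_name)
    if not os.path.isdir(class_dir):
        continue
    os.makedirs(os.path.join(DST, class_name), exist_ok=True)
    # randomly choose and move 20% of each class to the out-of-sample directory
    images = os.listdir(class_dir)
    for name in random.sample(images, k=len(images) // 5):
        shutil.move(os.path.join(class_dir, name), os.path.join(DST, class_name, name))

# load the remaining images as TensorFlow datasets
train_data = tf.keras.utils.image_dataset_from_directory(
    SRC, validation_split=0.2, subset="training", seed=42,
    image_size=(256, 256), batch_size=32)
validation_data = tf.keras.utils.image_dataset_from_directory(
    SRC, validation_split=0.2, subset="validation", seed=42,
    image_size=(256, 256), batch_size=32)
test_data = tf.keras.utils.image_dataset_from_directory(
    DST, image_size=(256, 256), batch_size=32, shuffle=False)
```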
Then, because the margins of the images are white and white pixels have large values (RGB 255, 255, 255) but carry no information, I inverted the colors to turn the useless white into black (RGB 0, 0, 0) with the `map` calls below:
```python
train_data2 = train_data.map(lambda x, y: (255-x, y))
validation_data2 = validation_data.map(lambda x, y: (255-x, y))
test_data2 = test_data.map(lambda x, y: (255-x, y))
```
Afterward, since rotating, flipping, or zooming an image does not change its class, I augmented the base data with newly generated, randomly transformed images using the Keras preprocessing layers below:
```python
layers.RandomFlip("horizontal"),
layers.RandomFlip("vertical"),
layers.RandomZoom(height_factor=(-0.2,0.2), width_factor=(-0.2,0.2),fill_mode='constant', fill_value=0),
layers.RandomRotation(0.3, fill_mode='constant', fill_value=0)
```
- Here is a sample of the augmented transformed data:
Although I used these layers inside my architecture to take advantage of fresh randomness at every epoch, I also stored a simply augmented copy of the data as a dataset to save GPU time during the trial-and-error phases, as sketched below.
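For reference, a minimal sketch of how such a pre-augmented copy could be stored (the variable name `train_data_aug` and the use of a `tf.keras.Sequential` wrapper around the four layers above are my assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers

# wrap the four random augmentation layers shown earlier
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomFlip("vertical"),
    layers.RandomZoom(height_factor=(-0.2, 0.2), width_factor=(-0.2, 0.2),
                      fill_mode='constant', fill_value=0),
    layers.RandomRotation(0.3, fill_mode='constant', fill_value=0),
])

# augment the color-inverted training data once and keep both copies,
# so the GPU does not have to re-augment during the trial-and-error phases
augmented = train_data2.map(
    lambda x, y: (data_augmentation(x, training=True), y),
    num_parallel_calls=tf.data.AUTOTUNE)
train_data_aug = train_data2.concatenate(augmented).prefetch(tf.data.AUTOTUNE)
```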
In this part, whose code is available here!, what I did was create a model that starts with three data augmentation layers to prevent overfitting and help the network learn better, followed by 12 convolution and pooling layers to extract every little bit of information, and finally, after flattening, five dense layers in charge of classification. I tried a bunch of different architectures and changed each one many times to end up with this result, which is good enough to be compared with the famous networks on this data.
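The real layer sizes are in the notebook; purely to illustrate the overall pattern (augmentation layers, stacked `Conv2D`/`MaxPooling2D` blocks, flatten, dense head), a sketch could look like this, with all filter counts and dense widths being assumptions:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    # three random augmentation layers (active only during training)
    layers.RandomFlip("horizontal_and_vertical", input_shape=(256, 256, 3)),
    layers.RandomZoom(0.2, fill_mode="constant", fill_value=0),
    layers.RandomRotation(0.3, fill_mode="constant", fill_value=0),
    # convolution and pooling blocks for feature extraction
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(256, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    # classification head: flatten followed by dense layers
    layers.Flatten(),
    layers.Dense(512, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(5, activation="softmax"),
])
```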
- A more detailed summary of the model is shown below:
Afterward, in the training phase, I used the `adam` optimizer and the `SparseCategoricalCrossentropy` loss function to train the network for 200 epochs with a batch size of 32. The accuracy and loss during training are shown below:
- accuracy curve
- loss curve
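For reference, a minimal sketch of that training setup could look like the following (variable names follow the earlier snippets; the batch size of 32 was already fixed when the datasets were created):

```python
import tensorflow as tf

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=["accuracy"])

# 200 epochs on the color-inverted training and validation datasets
history = model.fit(
    train_data2,
    validation_data=validation_data2,
    epochs=200)
```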
In the end, I tested the model on the unseen out-of-sample data to see whether the results were real or just overfitting. For this test, we show the model 100 images (20 from each class) and compare the predicted class with the real one. The result in the table below shows that the model works well, and it is worth mentioning that, according to the confusion matrix, the learning was not biased toward any class, which is very important in classification tasks.
- Result table:
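A minimal sketch of how this out-of-sample check and the confusion matrix could be computed (assuming `test_data2` is the color-inverted out-of-sample dataset built earlier, loaded without shuffling):

```python
import numpy as np
import tensorflow as tf

# collect the true labels and the predicted labels over the out-of-sample set
y_true = np.concatenate([y.numpy() for _, y in test_data2])
y_pred = np.argmax(model.predict(test_data2), axis=1)

accuracy = np.mean(y_true == y_pred)
confusion = tf.math.confusion_matrix(y_true, y_pred, num_classes=5)
print(accuracy)
print(confusion)
```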
In this part, whose code is available here!, I tried different pre-trained models, including Xception, VGG16, VGG19, ResNet50, ResNet101, ResNet152, InceptionV3, and InceptionResNetV2, in the same structure in order to find the best model, then tested it with different seeds and compared its results with my model.
The architecture I used starts with three Keras data augmentation layers, in which the input data is randomly rotated, flipped, or zoomed; then the pre-trained model itself is placed; and finally, three dense layers classify the model's output into our desired five classes.
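As a minimal sketch, this structure with VGG16 plugged in could look like the following (whether the backbone was frozen and the dense layer widths are my assumptions; the other pre-trained models are swapped in the same way):

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

base_model = VGG16(include_top=False, weights="imagenet", input_shape=(256, 256, 3))
base_model.trainable = False  # assumed: keep the pre-trained weights frozen

model = models.Sequential([
    # three random augmentation layers
    layers.RandomFlip("horizontal_and_vertical", input_shape=(256, 256, 3)),
    layers.RandomZoom(0.2, fill_mode="constant", fill_value=0),
    layers.RandomRotation(0.3, fill_mode="constant", fill_value=0),
    # the pre-trained backbone
    base_model,
    layers.Flatten(),
    # three dense layers classifying into the five classes
    layers.Dense(256, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(5, activation="softmax"),
])
```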
- As an example, you can see more details for the VGG16 model summary below:
You can find the code and the accuracy and loss curves for each model in more detail in the notebook. Below, you can also see and compare all of them at once:
- Pre-trained models accuracy on training data during training phase:
- Pre-trained models accuracy on validation data during training phase:
To sum up, what I found was that Xception, InceptionV3, and InceptionResNetV2 performed poorly and weren't even close to the others. On the other hand, both the VGG and ResNet networks worked quite well and ended up with accuracies of around 80 percent. However, the best model was ResNet152, which reached 84% on the unseen out-of-sample data!
- The chart below compares the accuracy of the pre-trained models on the out-of-sample data:
The idea of this part (codes are available here!) was to create a model in which the important parts of the images are found and emphasized before the image goes into the classifier part of the network, so that the model focuses only on the essential pieces of information and hopefully gives a better result, which unfortunately did not happen.
For the denoising part, I first created a noisy version of each image in the augmented dataset with a noise factor of 0.3 and a mean of 127. Afterward, I trained my network on these images for 30 epochs. Even though I spent a week working on it, I couldn't manage to get a good result. The model summary and a sample of its output are presented below:
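A minimal sketch of how such noisy copies could be generated is below; only the noise factor of 0.3 and the mean of 127 come from the text, the exact noise distribution and blending formula are my assumptions, and `train_data_aug` is the augmented dataset from the earlier sketch:

```python
import tensorflow as tf

NOISE_FACTOR = 0.3

def add_noise(images):
    # noise drawn around the mid-gray value 127 (standard deviation is an assumption)
    noise = tf.random.normal(shape=tf.shape(images), mean=127.0, stddev=50.0)
    noisy = (1.0 - NOISE_FACTOR) * images + NOISE_FACTOR * noise
    # keep pixel values inside the valid 0-255 range
    return tf.clip_by_value(noisy, 0.0, 255.0)

# pairs of (noisy input, clean target) for training the denoising network
denoise_data = train_data_aug.map(lambda x, y: (add_noise(x), x))
```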
- Denoising model summary:
- Denoising sample output:
Images in the first row are the noisy ones, the second row is the network's output for the same images, and the last row is the actual images themselves.
After creating this denoising network, it was time to attach it to the rest of the network's body. Since we had already seen the performance of ResNet152, I decided to use a combination of everything I had to get the best result I could.
- The final architecture was this:
The layer `sequential_10` is the previously trained denoising network.
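A minimal sketch of how the trained denoiser could be attached in front of a ResNet152-based classifier (the load path, dense layer widths, and freezing of the backbone are my assumptions):

```python
from tensorflow import keras
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet152

# the previously trained denoising network (sequential_10 in the summary above);
# the file path here is hypothetical
denoiser = keras.models.load_model("denoiser_model.h5")
denoiser.trainable = False

backbone = ResNet152(include_top=False, weights="imagenet", input_shape=(256, 256, 3))
backbone.trainable = False  # assumed: keep the pre-trained weights frozen

model = models.Sequential([
    keras.Input(shape=(256, 256, 3)),
    denoiser,   # clean up the input image first
    backbone,   # extract features with the pre-trained ResNet152
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(5, activation="softmax"),
])
```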
After all of this, I trained the model for 100 epochs, and the accuracy curve below, recorded during the training phase, shows that the model's learning converged at about 70%.
- The accuracy curve:
- The results table:
For the autoencoder part, the situation was much worse; despite my efforts to build a good network, the best model's output looked like a faded purple circle that clearly lost a great deal of information.
- Autoencoder output:
The images in the first row are the actual images, and the images in the second row are the reconstructed form of the encoded version of the ones above.
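The actual autoencoder architectures are in the notebook; purely for illustration, a generic convolutional autoencoder on 256×256 RGB images could look like this, with every layer size being an assumption:

```python
from tensorflow.keras import layers, models

# encoder: compress the image into a smaller latent representation
encoder = models.Sequential([
    layers.Input(shape=(256, 256, 3)),
    layers.Conv2D(32, 3, strides=2, activation="relu", padding="same"),
    layers.Conv2D(64, 3, strides=2, activation="relu", padding="same"),
    layers.Conv2D(128, 3, strides=2, activation="relu", padding="same"),
])

# decoder: reconstruct the image from the latent representation
decoder = models.Sequential([
    layers.Conv2DTranspose(128, 3, strides=2, activation="relu", padding="same"),
    layers.Conv2DTranspose(64, 3, strides=2, activation="relu", padding="same"),
    layers.Conv2DTranspose(32, 3, strides=2, activation="relu", padding="same"),
    layers.Conv2D(3, 3, activation="sigmoid", padding="same"),  # assumes pixels scaled to [0, 1]
])

autoencoder = models.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")
# trained to reproduce its own input, e.g. autoencoder.fit(images, images, epochs=...)
```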
Given what we see above, it is no surprise that the result is not good. As in the denoising part, I used ResNet152 here too. For the report's sake, I include the results below, although the accuracy is only a little better than random guessing.
- The results table: