Galaxy Type Classification

Description

In this project, I trained three models on a dataset of galaxy images covering four galaxy types and compared their results. The project is based on and inspired by an assignment in this course.

Dataset

The dataset used in this project is the EFIGI dataset, which contains 4458 images classified by shape:

Ellipticals: 289 images
Lenticulars: 537 images
Spirals: 3315 images
Irregulars: 317 images

About 74% of the images belong to a single class (Spirals), and this imbalance can cause unwanted behaviour in our models.
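
To make the imbalance concrete, the short sketch below computes each class's share of the dataset from the counts listed above, along with the accuracy that a trivial "always predict the largest class" classifier would get. It is only an illustration: the figure is computed over the whole dataset, and an actual train/test split would shift it slightly.

```python
# Class counts as listed above for the EFIGI dataset.
counts = {
    "Ellipticals": 289,
    "Lenticulars": 537,
    "Spirals": 3315,
    "Irregulars": 317,
}

total = sum(counts.values())

# Fraction of the dataset that each class represents.
shares = {name: n / total for name, n in counts.items()}

# A classifier that always predicts the most frequent class reaches this
# accuracy without learning anything about the images.
majority_class = max(counts, key=counts.get)
baseline_accuracy = shares[majority_class]

for name, share in shares.items():
    print(f"{name:<12} {share:6.1%}")
print(f"Always predicting {majority_class}: {baseline_accuracy:.1%} accuracy")
```

Any model should therefore be judged against this baseline rather than against chance level.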

Models

An MLP model with 3 hidden layers and CrossEntropyLoss: This approach reaches a fairly high accuracy (74%) in only one epoch. However, the model essentially returns class 2 (Spiral) for any input image, so its accuracy simply mirrors the share of Spirals in the data. This behaviour comes from the class imbalance in our training set.
The same MLP model with WeightedCrossEntropyLoss: In this trial, the accuracy did not go very high, but the model at least learned something and was able to predict samples from all classes.
A CNN model with WeightedCrossEntropyLoss: The last model was a convolutional neural network (ResNet) which, with far fewer parameters, achieved the same accuracy and even improved on it slightly. The CNN was also less prone to overfitting.
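
The idea behind a weighted cross-entropy loss can be sketched in plain Python as follows. This is a minimal illustration and not the project's actual code: the inverse-frequency weighting scheme and the helper name `weighted_cross_entropy` are assumptions, since the weights used in the project are not stated here. In PyTorch, the same effect is usually obtained by passing a per-class tensor to the `weight` argument of `torch.nn.CrossEntropyLoss`.

```python
import math

# Class counts in label order 0..3 (Spiral is class 2, as in the text above).
counts = [289, 537, 3315, 317]
total = sum(counts)
num_classes = len(counts)

# Assumed weighting scheme: inverse class frequency, scaled so that a
# perfectly balanced dataset would give every class a weight of 1.0.
weights = [total / (num_classes * n) for n in counts]


def weighted_cross_entropy(logits, target, class_weights):
    """Weighted cross-entropy for one sample, from raw (unnormalised) scores."""
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    log_prob_target = logits[target] - log_sum_exp
    return -class_weights[target] * log_prob_target


# The same uninformative prediction costs more on a rare class than on a
# common one, which pushes the model away from always answering "Spiral".
uniform_logits = [0.0, 0.0, 0.0, 0.0]
loss_spiral = weighted_cross_entropy(uniform_logits, 2, weights)
loss_irregular = weighted_cross_entropy(uniform_logits, 3, weights)

print("class weights:", [round(w, 3) for w in weights])
print(f"loss on a Spiral sample:    {loss_spiral:.3f}")
print(f"loss on an Irregular sample: {loss_irregular:.3f}")
```

With these weights, a mistake on an Irregular image is penalised roughly ten times more than the same mistake on a Spiral image, which is the ratio of the two class counts.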

Conclusion

Bias in the dataset can have a large impact on the model's trainability. In this case, the difference between class frequencies caused a major problem in the learning process.
On image datasets, CNNs usually outperform fully connected networks. Here too, the third model (the CNN) worked better than the MLP.
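
One reason a CNN can get by with far fewer parameters is weight sharing: a convolution reuses the same small kernel at every image position, whereas a fully connected layer needs one weight per input pixel per hidden unit. The rough count below illustrates this for the first layer only. All sizes (input resolution, hidden width, number of filters, kernel size) are hypothetical values picked for illustration and are not taken from the project.

```python
def dense_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a 2D convolution with a square kernel."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


# Hypothetical sizes, chosen only for illustration.
channels, height, width = 3, 224, 224
hidden_units = 512
filters, kernel = 64, 3

# First layer of an MLP applied to the flattened image.
first_dense = dense_params(channels * height * width, hidden_units)

# First layer of a CNN applied to the same image.
first_conv = conv2d_params(channels, filters, kernel)

print(f"First dense layer: {first_dense:,} parameters")
print(f"First conv layer:  {first_conv:,} parameters")
```

The convolution's parameter count does not depend on the image resolution at all, while the dense layer's count grows with every added pixel.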
