- Teacher model - VGG16
- Student model - Mobilenet
- python 3.10
- pytorch 2.2.1+cu121
- ptflops
VGG16, a renowned CNN architecture from the University of Oxford, excels in image classification with its 16 layers and simple structure. It comprises 13 convolutional and 3 fully connected layers, employing 3x3 filters and 2x2 max-pooling. Despite its effectiveness, its size and depth can be computationally intensive.
MobileNet, developed by Google, is tailored for mobile and embedded devices, featuring 28 layers and innovative depthwise separable convolutions. This reduces parameters and computational complexity while maintaining performance. MobileNet adapts to various input sizes and is widely used for transfer learning, albeit potentially sacrificing some accuracy for efficiency.
We have used the CIFAR-100 dataset.
- It contains 60,000 color images.
- Images are of size 32x32 pixels.
- The dataset is organized into 100 classes, each containing 600 images.
- There are 50,000 training images and 10,000 testing images.
- Each image is labeled with one of the 100 fine-grained classes.
To train the VGG16 model
# use gpu to train vgg16
$ python train.py -net vgg16 -gpu
To train the Mobilenet model
# use gpu to train mobilenet
$ python train.py -net mobilenet -gpu
To perform knowledge distillation from the trained VGG16 to the Mobilenet model
# use gpu to train mobilenet
$ python knowledge_distillation_train.py -gpu -teacher path_to_best_vgg16_weights_file -student path_to_best_mobilenet_weights_file
The weights file with the best accuracy would be written to the disk with name suffix 'best' (default in checkpoint folder).
Test the VGG16 model
$ python test.py -net vgg16 -weights path_to_best_vgg16_weights_file
Test the mobilenet model
$ python test.py -net mobilenet -weights path_to_best_mobilenet_weights_file
Test the knowledge distilled mobilenet model
$ python knowledge_distillation_test.py -gpu -weights path_to_best_knowledge_distilled_mobilenet_weights_file
- VGG Very Deep Convolutional Networks for Large-Scale Image Recognition
- Mobilenet MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
- Hyperparameter settings Improved Regularization of Convolutional Neural Networks with Cutout, which is init lr = 0.1 divided by 5 at 60th, 120th, 160th epochs, trained for 200 epochs with batchsize 128 and weight decay 5e-4, Nesterov momentum of 0.9.
- Code reference GitHub : weiaicunzai
dataset | network | learning rate | batch size | size (MB) | params | top1 err | top5 err | time(ms) per inference step (CPU) | time(ms) per inference step (CPU) | FLOPs |
---|---|---|---|---|---|---|---|---|---|---|
cifar100 | vgg16 | 0.1 | 128 | 136.52 | 34.0M | 27.77 | 10.12 | 177.2584 | 10.7589 | 334.14 |
cifar100 | mobilenet | 0.1 | 128 | 24.03 | 3.32M | 33.06 | 10.15 | 57.6361 | 9.0793 | 48.32 |
cifar100 | knowledge distilled mobilenet | 0.1 | 128 | 24.03 | 3.32M | 32.61 | 10.26 | 56.7409 | 9.6162 | 48.32 |
cifar100 | knowledge distilled mobilenet | 0.001 | 64 | 24.03 | 3.32M | 32.16 | 10.83 | 58.2087 | 9.0350 | 48.32 |