Main reference for CNN
Very good quotes on machinelearningmastery to add in documentation.
Easier alternative for CNN development
C++ Faster R-CNN implementation
YOLO V5 Paper
Algorithm accuracies related to FPS
! Please consider using EfficientDet: it has greater accuracy at low FPS, and our camera delivers images very slowly.
YOLO V5 Architecture
Defense for two-stage object detector
EfficientNet, currently state-of-the-art in nets as it seems.
Faster R-CNN paper
Convolution is a linear operation that involves the multiplication of a set of weights with the input, much like a traditional neural network. Given that the technique was designed for two-dimensional input, the multiplication is performed between an array of input data and a two-dimensional array of weights, called a filter or a kernel.
The output from multiplying the filter with the input array one time is a single value. As the filter is applied multiple times to the input array, the result is a two-dimensional array of output values that represent a filtering of the input. As such, the two-dimensional output array from this operation is called a “feature map”.
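The two paragraphs above can be illustrated with a minimal NumPy sketch. The function name `conv2d_valid`, the 4x4 input and the 2x2 filter are all made up for this example. It assumes no padding and a stride of 1 and, as most deep learning libraries do, it does not flip the kernel. In a real CNN the filter weights are learned rather than chosen by hand.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            # One application of the filter: multiply element-wise,
            # then sum everything into a single output value.
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

# Hypothetical 4x4 input with a vertical edge between columns 1 and 2.
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# Hand-picked 2x2 filter that responds to a dark-to-bright vertical edge.
kernel = np.array([
    [-1, 1],
    [-1, 1],
], dtype=float)

feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

The resulting 3x3 feature map is non-zero only in its middle column, which is where the filter overlaps the edge.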
Consider that the filters that operate directly on the raw pixel values will learn to extract low-level features, such as lines. The filters that operate on the output of the first layers may extract features that are combinations of lower-level features, such as features that comprise multiple lines to express shapes. This process continues until very deep layers are extracting faces, animals, houses, and so on.
A limitation of the feature map output of convolutional layers is that they record the precise position of features in the input. This means that small movements in the position of the feature in the input image will result in a different feature map. This can happen with re-cropping, rotation, shifting, and other minor changes to the input image.
A common approach to addressing this problem, borrowed from signal processing, is called downsampling. This is where a lower-resolution version of an input signal is created that still contains the large or important structural elements, without the fine detail that may not be as useful to the task.
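As a rough illustration of downsampling, here is a sketch of 2x2 max pooling, one common way CNNs reduce the resolution of a feature map. The function name `max_pool_2x2` and the example arrays are assumptions made for this sketch and do not come from any of the linked sources.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Downsample by keeping the maximum of each non-overlapping 2x2 block."""
    h, w = feature_map.shape
    # Assumption of this sketch: drop a trailing row/column if the size is odd.
    h, w = h - h % 2, w - w % 2
    blocks = feature_map[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

# Hypothetical feature map with a single strong response at row 1, column 1.
a = np.array([
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
], dtype=float)

# The same response moved one pixel to the left.
b = np.roll(a, shift=-1, axis=1)

pooled_a = max_pool_2x2(a)
pooled_b = max_pool_2x2(b)
print(pooled_a)
print(pooled_b)
```

The two inputs differ, but their pooled outputs are identical because the response stays inside the same 2x2 block. A shift that crosses a block boundary would still change the pooled output, so pooling reduces the sensitivity to position rather than removing it.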
Dense and fully connected are two names for the same thing.
The KANN library's dense layers work the same way as Keras's, which affects the output size. Please check the colab link.
We can divide the whole network (for classification) into two parts:
Feature extraction: In conventional classification algorithms, like SVMs, we used to extract features from the data by hand to make the classification work. The convolutional layers serve the same purpose of feature extraction. CNNs capture a better representation of the data, and hence we don’t need to do feature engineering.
Classification: After feature extraction we need to classify the data into various classes; this can be done using a fully connected (FC) neural network. In place of fully connected layers, we can also use a conventional classifier like an SVM, but we generally end up adding FC layers to make the model end-to-end trainable.
Briefly, fully connected layers combine the extracted features and output a 1D array with one prediction percentage per class.
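The classification part can be sketched as follows. This is a minimal NumPy illustration, not how KANN or Keras implement it: the shapes (8 feature maps of 4x4, 3 classes), the random weights and the helper names `dense` and `softmax` are all assumptions. Softmax is used here as one common way to turn the raw outputs into per-class percentages.

```python
import numpy as np

def dense(x, weights, bias):
    """Fully connected layer: every output is a weighted sum of all inputs."""
    return weights @ x + bias

def softmax(logits):
    """Turn raw scores into non-negative values that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp)

rng = np.random.default_rng(0)

# Hypothetical output of the feature-extraction part: 8 feature maps of 4x4.
feature_maps = rng.random((8, 4, 4))

# Flatten to a 1D vector so it can be fed to the fully connected layer.
flat = feature_maps.reshape(-1)

# Untrained, randomly initialised weights for 3 made-up classes.
num_classes = 3
weights = rng.normal(scale=0.1, size=(num_classes, flat.size))
bias = np.zeros(num_classes)

probs = softmax(dense(flat, weights, bias))
print(probs, probs.sum())
```

The output is a 1D array with one entry per class. With trained weights, the largest entry would be the predicted class.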
Smaller batch sizes often generalize better, since each update averages the gradient over fewer samples and is therefore noisier. Bigger batch sizes allow more parallelization.
EfficientNet for ConvNets (image classification?)
Detection vs. Recognition with source code
Edge boxes
Selective search
Mask RCNN -- apparently this is the state of the art in object detection + recognition.
Mask RCNN easily explained
Possibly useful Mask RCNN C++ repo
Mask RCNN in-depth
Keras Mask RCNN
According to Mask RCNN in-depth, Faster RCNN should be faster than Mask RCNN due to the latter's overhead.
RPN in Faster R-CNN (second answer)
RPN in Faster R-CNN explained (scroll down)