Creating a CNN model to detect facial landmarks in an image.
Requirements
- TensorFlow 1.8.0
- Tkinter
- NumPy
- OpenCV 3
Description Use a Convolutional Neural Network to detect facial landmarks in an image.
- Collecting Dataset
- Preprocessing Data
- Creating Network Architecture
- Defining Loss function
- Training model
- Detecting facial landmarks in real time
Dataset We will use data provided by the Large-scale CelebFaces Attributes (CelebA) Dataset. This is a huge dataset, about 2 GB of images in total. Here I have used 150,000 images for the training set, with 0.08% of the images held out as the validation dataset. The dataset contains face images like the ones shown below:
To download the dataset, you can get it directly from this link.
Now that we have our dataset, we need some preprocessing to make it acceptable to our model. We need to preprocess both the input images and the output labels. To preprocess our input images we will use the following steps:
- Read the .txt LandmarkData and cropdata files and convert them into .csv files
- Read the image and convert it into a gray-scale image
- Crop only the face from the image using cropdata, and subtract the crop offset from LandmarkData
- Resize each image to (96, 96)
- Rescale each landmark point into (96, 96) coordinates
- Expand the image dimensions to (96, 96, 1) to make them compatible with the input shape of the architecture
- Normalize the image pixel values by dividing them by 255
- Normalize the landmark point values by subtracting 48 and then dividing by 48
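The normalization steps above can be sketched in NumPy. This is a minimal illustration (the function name `normalize_sample` is hypothetical, and the image is assumed to already be cropped, grayscaled, and resized to 96x96):

```python
import numpy as np

def normalize_sample(gray_face, landmarks):
    """Normalize one 96x96 grayscale face and its landmark points.

    gray_face: (96, 96) uint8 array (already cropped and resized)
    landmarks: (N, 2) array of (x, y) points in 96x96 pixel coordinates
    """
    # Expand to (96, 96, 1) so it matches the network's input shape
    img = gray_face.reshape(96, 96, 1).astype(np.float32)
    # Scale pixel values into [0, 1]
    img /= 255.0
    # Center landmark points around 0 and scale to roughly [-1, 1]
    pts = (landmarks.astype(np.float32) - 48.0) / 48.0
    return img, pts

# Example with a dummy face and two landmark points
face = np.random.randint(0, 256, (96, 96), dtype=np.uint8)
pts = np.array([[48, 48], [0, 96]])
img, norm_pts = normalize_sample(face, pts)
```

A point at the image center (48, 48) maps to (0, 0), and the corners map to ±1, which keeps the regression targets in a small, symmetric range.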
We will then create our model architecture and train it with the preprocessed data.
Our model is a convolutional neural network. A convolutional neural network (CNN, or ConvNet) is a class of deep, feed-forward artificial neural networks that has been successfully applied to analyzing visual imagery. A CNN compares an image piece by piece, and the pieces it looks for during detection are called features. We use the convolutional layers to extract features from the image.
There are four main operations in the CNN: a) Convolution, b) ReLU, c) Pooling or sub-sampling, d) Fully connected layer.
The primary purpose of convolution in a CNN is to extract features from the input image. Each convolution layer takes a batch of images as a four-dimensional input: N x color-channels x width x height. Kernels, or filters, are also four-dimensional (number of feature maps in, number of feature maps out, filter width, and filter height) and are sets of learnable parameters (weights and biases). In each convolution layer, a four-dimensional convolution is calculated between the image batch and the filters by taking the dot product between each filter and the image patch under it. After convolution, the only dimensions that change are the image width and height.
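The sliding dot product described above can be sketched for a single channel and a single filter. This is a minimal NumPy illustration of valid-mode cross-correlation (the operation CNN frameworks call "convolution"), not the full four-dimensional batched version:

```python
import numpy as np

def conv2d_single(image, kernel):
    """Valid-mode 2D cross-correlation of one channel with one filter,
    as computed inside a CNN convolution layer (minimal sketch)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    # Only width and height shrink; this mirrors the note above
    oh, ow = ih - kh + 1, iw - kw + 1
    out = np.zeros((oh, ow), dtype=np.float32)
    for y in range(oh):
        for x in range(ow):
            # Dot product between the filter and the patch under it
            out[y, x] = np.sum(image[y:y+kh, x:x+kw] * kernel)
    return out

img = np.arange(16, dtype=np.float32).reshape(4, 4)
k = np.ones((3, 3), dtype=np.float32)
fmap = conv2d_single(img, k)  # 4x4 input, 3x3 filter -> 2x2 feature map
```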
An additional operation called ReLU is used after every convolution operation. A Rectified Linear Unit (ReLU) is a neural network unit that uses the following activation function to calculate its output given x: R(x) = max(0, x)
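In NumPy this activation is a one-liner applied element-wise to a feature map:

```python
import numpy as np

def relu(x):
    # Element-wise R(x) = max(0, x): negatives become 0, positives pass through
    return np.maximum(0, x)

out = relu(np.array([-2.0, -0.5, 0.0, 3.0]))  # -> [0., 0., 0., 3.]
```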
In this layer, the dimensionality of the feature maps is reduced to get shrunken maps, which reduces the number of parameters and computations. Pooling can be max, average, or sum pooling. The number of output maps from pooling is the same as the number of filters in the convolution layer. Pooling takes the rectified feature maps as input and then downsizes them according to the chosen algorithm.
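As a sketch of max pooling (the most common choice), here is a minimal NumPy version with a 2x2 window and stride 2, which halves each spatial dimension:

```python
import numpy as np

def max_pool2d(fmap, size=2, stride=2):
    """Max pooling on one feature map (minimal sketch):
    keeps the strongest activation in each window."""
    h, w = fmap.shape
    oh = (h - size) // stride + 1
    ow = (w - size) // stride + 1
    out = np.zeros((oh, ow), dtype=fmap.dtype)
    for y in range(oh):
        for x in range(ow):
            out[y, x] = fmap[y*stride:y*stride+size,
                             x*stride:x*stride+size].max()
    return out

m = np.array([[1, 3, 2, 1],
              [4, 6, 5, 0],
              [7, 2, 9, 8],
              [0, 1, 3, 4]], dtype=np.float32)
pooled = max_pool2d(m)  # 4x4 -> 2x2
```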
This is the final layer, where the actual classification occurs. This layer takes the downsized (shrunken) feature maps obtained after the convolution, ReLU, and pooling layers and flattens them. It is a traditional Multi-Layer Perceptron that uses a softmax activation function. The convolutional layers generate high-level features; the purpose of the fully connected layer is to use these features to classify the input into various classes based on the labels.
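The flatten-then-classify step above can be sketched in NumPy. Note that for landmark regression (this project's task) the final layer would output coordinates directly rather than class probabilities; the softmax shown here follows the generic classification description, and the weights are random placeholders:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
fmaps = rng.random((4, 4, 8)).astype(np.float32)   # toy pooled feature maps
flat = fmaps.reshape(-1)                           # flatten to a (128,) vector
W = rng.random((flat.size, 3)).astype(np.float32)  # hypothetical weights, 3 classes
b = np.zeros(3, dtype=np.float32)
probs = softmax(flat @ W + b)                      # class probabilities, sum to 1
```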
Let’s see the steps that we used to create the architecture:
For the loss, we use the mean squared error (MSE) loss function.
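MSE averages the squared difference between the predicted and true landmark coordinates. A minimal NumPy sketch:

```python
import numpy as np

def mse_loss(pred, target):
    # Mean squared error over all landmark coordinates
    return np.mean((pred - target) ** 2)

pred = np.array([0.0, 0.5, -0.5])
target = np.array([0.0, 0.0, 0.0])
loss = mse_loss(pred, target)  # (0 + 0.25 + 0.25) / 3
```

Because the landmarks were normalized into roughly [-1, 1] during preprocessing, the squared errors stay small and well-scaled for training.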
Test the model Our model is now trained with the images. Now it's time to test it. We can use the trained model to detect facial landmarks in real time.