The Computer Pointer Controller app lets you control the movement of the mouse pointer on the screen with your head position and gaze angles.
- You need to install the Intel OpenVINO toolkit. See this guide for installing OpenVINO.
Clone the repository: https://github.com/RutvikJ77/Mouse-controller.git
Initialize the OpenVINO environment:
source /opt/intel/openvino/bin/setupvars.sh -pyver 3.5
Download the following models using the OpenVINO Model Downloader (a minimal model-loading sketch follows this list):
1. Face Detection Model
python /opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name "face-detection-adas-binary-0001"
2. Facial Landmarks Detection Model
python /opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name "landmarks-regression-retail-0009"
3. Head Pose Estimation Model
python /opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name "head-pose-estimation-adas-0001"
4. Gaze Estimation Model
python /opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name "gaze-estimation-adas-0002"
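Each command downloads an Intermediate Representation (.xml plus .bin) pair into an intel/ subfolder of the current directory, as used in the example command further below. As a quick sanity check, a downloaded model can be loaded with the Inference Engine Python API; this is a minimal sketch assuming the OpenVINO 2020/2021-era openvino.inference_engine module:

```python
# Minimal sketch: verify that a downloaded IR model loads on CPU.
# Assumes the OpenVINO 2020/2021-era Python API and the default
# downloader output directory used in the example run below.
from openvino.inference_engine import IECore

model_xml = "intel/face-detection-adas-binary-0001/FP32-INT1/face-detection-adas-binary-0001.xml"

ie = IECore()
net = ie.read_network(model=model_xml, weights=model_xml.replace(".xml", ".bin"))
exec_net = ie.load_network(network=net, device_name="CPU")

input_blob = next(iter(net.input_info))
print("Input shape:", net.input_info[input_blob].input_data.shape)
```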
To run the app on your local machine, open a terminal and run the following commands:
1. Change to the src directory of the project repository
cd <project-repo-path>/src
2. Run the main.py file
python main.py -f <Path of xml file of face detection model> \
-fl <Path of xml file of facial landmarks detection model> \
-hp <Path of xml file of head pose estimation model> \
-g <Path of xml file of gaze estimation model> \
-i <Path of input video file or enter cam for taking input video from webcam>
Example case:
python3 src/main.py -f intel/face-detection-adas-binary-0001/FP32-INT1/face-detection-adas-binary-0001.xml \
-fl intel/landmarks-regression-retail-0009/FP32/landmarks-regression-retail-0009.xml \
-hp intel/head-pose-estimation-adas-0001/FP32/head-pose-estimation-adas-0001.xml \
-g intel/gaze-estimation-adas-0002/FP32/gaze-estimation-adas-0002.xml \
-i bin/demo.mp4 -flags fd hp fld ge
- If you want to run the app on a GPU:
python main.py -f <Path of xml file of face detection model> \
-fl <Path of xml file of facial landmarks detection model> \
-hp <Path of xml file of head pose estimation model> \
-g <Path of xml file of gaze estimation model> \
-i <Path of input video file or enter cam for taking input video from webcam> \
-d GPU
- If you want to run the app on an FPGA:
python main.py -f <Path of xml file of face detection model> \
-fl <Path of xml file of facial landmarks detection model> \
-hp <Path of xml file of head pose estimation model> \
-g <Path of xml file of gaze estimation model> \
-i <Path of input video file or enter cam for taking input video from webcam> \
-d HETERO:FPGA,CPU
The app uses the following four pretrained models; a rough sketch of how they are chained per frame follows this list:
- Face Detection Model
- Facial Landmarks Detection Model
- Head Pose Estimation Model
- Gaze Estimation Model
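Per frame, the flow is roughly: detect and crop the face, run the landmarks and head-pose models on the crop, feed the eye crops and head-pose angles to the gaze-estimation model, and move the pointer from the resulting gaze vector. The sketch below illustrates that flow; the wrapper names, their predict() methods and the pyautogui-based mouse move are assumptions for illustration, not the repository's exact code:

```python
import pyautogui

def process_frame(frame, face_model, landmarks_model, head_pose_model, gaze_model):
    """Illustrative per-frame flow; the *_model arguments stand in for the
    wrapper classes in src/ and are assumed to expose a predict() method."""
    face_crop = face_model.predict(frame)                     # detect and crop the face
    left_eye, right_eye = landmarks_model.predict(face_crop)  # locate and crop both eyes
    head_angles = head_pose_model.predict(face_crop)          # yaw, pitch, roll
    gaze_x, gaze_y, _ = gaze_model.predict(left_eye, right_eye, head_angles)
    # Translate the gaze vector's x/y components into a relative pointer move.
    pyautogui.moveRel(gaze_x * 100, -gaze_y * 100, duration=0.1)
```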
Following are the command line arguments for running main.py (a sketch of declaring them with argparse follows the list):
- -h : Show help for all the command line arguments
- -f (required) : Specify the path of the Face Detection model's xml file
- -fl (required) : Specify the path of the Facial Landmarks Detection model's xml file
- -hp (required) : Specify the path of the Head Pose Estimation model's xml file
- -g (required) : Specify the path of the Gaze Estimation model's xml file
- -i (required) : Specify the path of the input video file, or enter cam to take input from the webcam
- -d (optional) : Specify the target device to run inference on. Supported devices are CPU, GPU, FPGA (use HETERO:FPGA,CPU for the FPGA) and MYRIAD.
- -l (optional) : Specify the absolute path of a CPU extension library if some model layers are not supported on the device.
- -prob (optional) : Specify the probability threshold the face detection model uses to accept a face in a video frame.
- -flags (optional) : Specify any of fd, fld, hp, ge to visualize the output of the corresponding models on each frame (separate flags with spaces, e.g. -flags fd fld hp).
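A sketch of how main.py could declare these arguments with argparse; the help strings mirror the list above, while the defaults (device CPU, probability threshold 0.6) are assumptions rather than the repository's exact values:

```python
from argparse import ArgumentParser

def build_argparser():
    """Declare the command line arguments listed above (defaults are assumed)."""
    parser = ArgumentParser(description="Computer Pointer Controller")
    parser.add_argument("-f", required=True, help="Path of the Face Detection model's xml file")
    parser.add_argument("-fl", required=True, help="Path of the Facial Landmarks Detection model's xml file")
    parser.add_argument("-hp", required=True, help="Path of the Head Pose Estimation model's xml file")
    parser.add_argument("-g", required=True, help="Path of the Gaze Estimation model's xml file")
    parser.add_argument("-i", required=True, help="Path of the input video file, or 'cam' for the webcam")
    parser.add_argument("-d", default="CPU", help="Target device: CPU, GPU, HETERO:FPGA,CPU or MYRIAD")
    parser.add_argument("-l", default=None, help="Absolute path of a CPU extension library, if needed")
    parser.add_argument("-prob", type=float, default=0.6, help="Probability threshold for face detection")
    parser.add_argument("-flags", nargs="+", default=[], help="Any of fd, fld, hp, ge to visualize model outputs")
    return parser

if __name__ == "__main__":
    args = build_argparser().parse_args()
```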
Benchmark results of the application. The models were run on the following hardware:
- Intel Core i5-6500TE CPU
- Intel Core i5-6500TE GPU
- IEI Mustang F100-A10 FPGA
- Intel Xeon E3-1268L v5 CPU
- Intel Atom x7-E3950 UP2 GPU
Comparing the model loading time, inference time and frames-per-second output, my analysis is as follows (a timing sketch follows this list):
- As seen from the graph, the FPGA took longer for inference than any other device. An FPGA is a field-programmable (reprogrammable) piece of hardware and has the following advantages:
  - It is robust and well suited to custom solutions.
  - It has a longer life-span than the other devices.
- The GPU performed very well compared to the other devices in terms of frames per second, especially with the FP16 format, which it supports well.
- Accuracy is sensitive to precision.
- Model size can be reduced by lowering the precision, but accuracy suffers because essential data is lost in the conversion from FP32 to FP16/INT8.
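The loading time, inference time and FPS figures compared above can be reproduced with simple wall-clock timing around the Inference Engine calls; this is a minimal sketch, again assuming the openvino.inference_engine API and the face detection model downloaded earlier:

```python
import time
import numpy as np
from openvino.inference_engine import IECore

model_xml = "intel/face-detection-adas-binary-0001/FP32-INT1/face-detection-adas-binary-0001.xml"

ie = IECore()
net = ie.read_network(model=model_xml, weights=model_xml.replace(".xml", ".bin"))
input_blob = next(iter(net.input_info))
_, c, h, w = net.input_info[input_blob].input_data.shape

start = time.perf_counter()
exec_net = ie.load_network(network=net, device_name="CPU")  # or GPU, HETERO:FPGA,CPU, MYRIAD
load_time = time.perf_counter() - start

dummy = np.zeros((1, c, h, w), dtype=np.float32)  # stand-in for a preprocessed frame
start = time.perf_counter()
exec_net.infer(inputs={input_blob: dummy})
infer_time = time.perf_counter() - start

print(f"load: {load_time:.3f} s   inference: {infer_time * 1000:.1f} ms   FPS: {1.0 / infer_time:.1f}")
```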
- Add a face authentication model.
- Using a rock-paper-scissors model, add more functionality such as click and scroll.
To handle certain edge cases, the following measures are implemented (sketched below):
- If the model fails to detect a face, the app prints "unable to detect the face" and reads the next frame.
- If more than one face is detected, only one face is used to control the mouse.
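A rough sketch of how those two cases fit into the frame loop; face_detector and pipeline are illustrative stand-ins for the wrapper objects in src/, not the repository's actual names:

```python
import cv2

def control_loop(input_path, face_detector, pipeline):
    """Illustrative frame loop covering the two edge cases above."""
    cap = cv2.VideoCapture(0 if input_path == "cam" else input_path)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        faces = face_detector.predict(frame)  # list of detected face boxes
        if not faces:
            print("unable to detect the face")  # skip and read another frame
            continue
        pipeline.run(frame, faces[0])  # only the first face controls the mouse
    cap.release()
```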