Emotion recognition is the process of identifying human emotions using AI. This project recognises emotions from speech clips containing both audio and video. The task generally benefits from using multiple modalities, so we implemented a two-stream model that analyses facial expressions from the video and voice tone from the audio. These tasks are called Speech Emotion Recognition (SER) and Facial Emotion Recognition (FER), respectively.
In this project we trained the models on the speech clips of the RAVDESS dataset, which contains 8 emotion classes: neutral, calm, happy, sad, angry, fearful, surprise and disgust (7 of which are used here).
More details about the processing and the architecture can be found in `Project_Slides.pdf`. A demonstrative video and the deployed model are in the `DEMO` folder.
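RAVDESS encodes each clip's metadata, including its emotion class, in the filename (seven two-digit fields, the third being the emotion code). The snippet below is a minimal sketch of how a label can be read from a filename; the example path is illustrative only.

```python
# Minimal sketch: derive the emotion label from a RAVDESS filename.
# Filenames have 7 two-digit fields, e.g. "01-01-06-01-02-01-12.mp4";
# the third field is the emotion code. The example path is an assumption.
from pathlib import Path

EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def emotion_from_filename(path: str) -> str:
    """Return the emotion name encoded in a RAVDESS clip filename."""
    fields = Path(path).stem.split("-")  # ['01', '01', '06', '01', '02', '01', '12']
    return EMOTIONS[fields[2]]

if __name__ == "__main__":
    print(emotion_from_filename("Datasets/RAVDESS/Actor_12/01-01-06-01-02-01-12.mp4"))  # fearful
```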
```
Emotion-Recognition_SER-FER_RAVDESS
├───Datasets
│   ├───RAVDESS
│   ├───RAVDESS_audio
│   ├───RAVDESS_frames
│   ├───RAVDESS_frames_black
│   └───RAVDESS_frames_face_BW
├───DEMO
│   ├───Examples
│   ├───ER DEMO.mp4
│   └───ER_FullClip_DEMO.ipynb
├───Models
│   ├───Audio Stream
│   └───Video Stream
├───Other
│   └───haarcascade_frontalface_default.xml
├───Plots
├───StreamAudio_1D.ipynb
├───StreamAudio_2D.ipynb
├───FullClip_Test.ipynb
├───StreamVideo_FaceOnly.ipynb
├───StreamVideo_FramesExtraction.ipynb
├───StreamVideo_FullFrame.ipynb
├───StreamVideo_Test.ipynb
├───Project_Slides.pdf
├───README.md
├───LICENSE.md
└───requirements.txt
```
To classify emotions (using our trained models):
- Copy your clips into `DEMO/Examples`
- Run `ER_FullClip_DEMO.ipynb` in the `DEMO` folder (a preprocessing sketch follows this list)
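The demo works on full clips, so each clip has to be split into the two inputs the streams consume: an audio track for SER and sampled frames for FER. The sketch below only illustrates that splitting step, not the notebook's exact code; it assumes `ffmpeg` is on the PATH, and the file names, sample rate and frame step are illustrative.

```python
# Minimal sketch (not the notebook's exact code): split an example clip into an
# audio track (SER stream) and sampled frames (FER stream).
# Assumptions: ffmpeg is installed, paths and parameters are illustrative.
import subprocess
import cv2

def split_clip(clip_path: str, wav_path: str, frame_step: int = 5):
    # Extract a mono 16 kHz WAV track for the audio stream.
    subprocess.run(
        ["ffmpeg", "-y", "-i", clip_path, "-vn", "-ac", "1", "-ar", "16000", wav_path],
        check=True,
    )
    # Keep every `frame_step`-th frame for the video stream.
    frames, idx = [], 0
    cap = cv2.VideoCapture(clip_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % frame_step == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames

frames = split_clip("DEMO/Examples/example.mp4", "DEMO/Examples/example.wav")
```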
To replicate this project (training and inference):
- Download the speech clips of the RAVDESS dataset and save them in the `Datasets/RAVDESS` folder
- Train the video and audio models:
  - Video Stream: extract frames with `StreamVideo_FramesExtraction.ipynb` (multiple types of frames are generated; the best are "224x224 only faces BW", see the face-crop sketch after this list), train the model with `StreamVideo_FullFrame.ipynb` or `StreamVideo_FaceOnly.ipynb` (depending on the frames generated), and test the results with `StreamVideo_Test.ipynb`
  - Audio Stream: use `StreamAudio_1D.ipynb` and `StreamAudio_2D.ipynb` to train the models (the 2D approach works better; a 2D feature sketch also follows this list)
- Use `FullClip_Test.ipynb` to assess the overall performance
- Use `ER_FullClip_DEMO.ipynb` in the `DEMO` folder to classify videos with the trained models.
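For the Video Stream, the "224x224 only faces BW" frames can be produced with the Haar cascade shipped in `Other/`. The sketch below is a minimal illustration, not the exact code of `StreamVideo_FramesExtraction.ipynb`; the detection parameters are assumptions.

```python
# Minimal sketch: crop the largest detected face from a frame and return a
# 224x224 grayscale image, using the cascade bundled in Other/.
# scaleFactor and minNeighbors are illustrative assumptions.
import cv2

cascade = cv2.CascadeClassifier("Other/haarcascade_frontalface_default.xml")

def face_crop_bw(frame, size=224):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                                      # no face in this frame
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])   # keep the largest face
    return cv2.resize(gray[y:y + h, x:x + w], (size, size))
```

For the Audio Stream, the 2D model needs an image-like representation of the waveform. A common choice is a log-mel spectrogram; the exact features used in `StreamAudio_2D.ipynb` may differ, and the sample rate and mel settings below are assumptions.

```python
# Minimal sketch of a 2D audio feature: log-scaled mel spectrogram via librosa.
# Sample rate and n_mels are illustrative assumptions.
import librosa
import numpy as np

def log_mel(wav_path: str, sr: int = 16000, n_mels: int = 128) -> np.ndarray:
    y, _ = librosa.load(wav_path, sr=sr)                          # resample to sr
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)                   # shape: (n_mels, frames)
```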