Analysing cinema is a time-consuming process. In the cinematography domain alone, there are many factors to consider, such as shot scale, shot composition, camera movement, color, and lighting. Whatever you shoot is in some way influenced by what you've watched. There's only so much one can watch, and even less that one can analyse thoroughly.
This is where neural networks offer ample promise. They can recognise patterns in images in ways that were not possible until less than a decade ago, which could greatly speed up the analysis of cinema.
The dataset for this project had to be constructed from scratch. It is diverse, consisting of samples from over 300 movies collected from various sources. Each image was reviewed several times to ensure that it had been categorised correctly.
In total, the dataset consists of 2,724 (2,180 training + 544 validation) images, split into 6 shot types:
- Long Shot: 263 images
- Medium Shot: 142 images
- Medium Close Up: 223 images
- Close Up: 841 images
- Extreme Close Up: 1,041 images
- Extreme Wide Shot: 210 images
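To get a feel for how unevenly the classes are represented, the listed counts can be turned into per-class shares. The snippet below is only a small sketch: the counts are copied from the list above, the shares are computed relative to the sum of those listed counts, and the variable names (`counts`, `shares`, `imbalance_ratio`) are illustrative rather than taken from the project's code.

```python
# Per-class image counts, copied from the list above.
counts = {
    "Long Shot": 263,
    "Medium Shot": 142,
    "Medium Close Up": 223,
    "Close Up": 841,
    "Extreme Close Up": 1041,
    "Extreme Wide Shot": 210,
}

listed_total = sum(counts.values())

# Share of the listed images that falls in each class.
shares = {name: n / listed_total for name, n in counts.items()}

# Ratio between the largest and smallest class: a rough imbalance measure.
imbalance_ratio = max(counts.values()) / min(counts.values())

# Print the classes from most to least represented.
for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<18} {n:>5}  {shares[name]:6.1%}")
print(f"largest / smallest class: {imbalance_ratio:.1f}x")
```

Close Up and Extreme Close Up together make up well over half of the listed images, which is worth keeping in mind when reading per-class results.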
Extreme Close Up: An Extreme Close Up (ECU) zooms in tightly on a single feature of the subject to draw attention to that feature specifically.
Close Up: A Close Up (CU) shows the face of the character, sometimes including the neck/shoulders. It emphasises the facial expressions of the character.
Medium Close Up: A Medium Close Up (MCU) shows the character from the chest/shoulders up. It allows one to see nuances of the character's facial expressions, and some upper-body language.
Medium Shot: A Medium Shot (MS) shows the character from the waist up. It allows one to see nuances of the character's body language, and to some degree the facial expressions.
Long Shot: A Long Shot (LS) includes characters in their entirety, and a large portion of the surrounding area.
Extreme Wide Shot: An Extreme Wide Shot (EWS) emphasises the vastness of the location. When there is a subject, it usually occupies a very small part of the frame.
Although the model already achieves high accuracy, there is still a lot of work to do. Most of it has to do with the dataset, which needs to be made larger. The model has learned Extreme Close Ups (ECU) particularly well, most likely because of the sheer number of them in the dataset.
- Use it to generate stats for film analysis
- Integrate it into live video
- Add heatmaps to show activations
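As a rough illustration of the first item, per-frame predictions could be aggregated into simple statistics such as the share of each shot type and the sequence of consecutive runs. This is only a sketch under stated assumptions: the classifier itself is not shown, the function name `shot_type_stats` is hypothetical, and the example labels are made up for illustration rather than produced by the model.

```python
from collections import Counter
from itertools import groupby


def shot_type_stats(frame_labels):
    """Summarise a sequence of per-frame shot-type labels.

    Returns a tuple of:
      - a dict mapping each label to its share of all frames
      - a list of (label, run_length) pairs for consecutive identical labels
    """
    if not frame_labels:
        return {}, []
    total = len(frame_labels)
    shares = {label: n / total for label, n in Counter(frame_labels).items()}
    # groupby only merges adjacent equal labels, so frame order is preserved.
    runs = [(label, sum(1 for _ in group)) for label, group in groupby(frame_labels)]
    return shares, runs


# Hypothetical predictions for eight sampled frames (not real model output).
labels = ["LS", "LS", "CU", "CU", "CU", "MCU", "CU", "CU"]
shares, runs = shot_type_stats(labels)

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(runs)
```

The run lengths give a crude proxy for how long each shot scale is held, which is one kind of statistic a film analysis might look at.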