The end goal of this project is to develop an in-house device that can do the following:
Audio-Translation: A person's speech is taken as input, and the appropriate gesture (or combination of gestures) is displayed on the screen in real time (we will have to create standard templates for the gestures); a minimal sketch of this direction follows below.
Gesture-Translation: Images of a person's gestures are captured and processed in real time, and the corresponding audio is output (or the text is displayed on the screen) for the gesture (or combination of gestures); a second sketch below illustrates this direction.
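A minimal sketch of the Audio-Translation direction, assuming the SpeechRecognition and OpenCV packages; the `templates/<word>.png` gesture images and the use of Google's cloud recognizer are placeholders, not decisions this project has made:

```python
import cv2
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    print("Listening...")
    audio = recognizer.listen(source)

text = recognizer.recognize_google(audio)  # cloud STT; a placeholder choice
print(f"Heard: {text}")
for word in text.lower().split():
    template = cv2.imread(f"templates/{word}.png")  # hypothetical gesture templates
    if template is None:
        continue  # no template created for this word yet
    cv2.imshow("Audio-Translation", template)
    cv2.waitKey(1000)  # show each gesture for roughly one second
cv2.destroyAllWindows()
```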
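And a minimal sketch of the Gesture-Translation direction, assuming PyTorch and a webcam; the `gesture_model.pt` file and the `GESTURE_LABELS` list are hypothetical artifacts the project would produce:

```python
import cv2
import torch
import torchvision.transforms as T

GESTURE_LABELS = ["hello", "thanks", "yes", "no"]  # hypothetical label set

model = torch.jit.load("gesture_model.pt")  # hypothetical trained classifier
model.eval()

preprocess = T.Compose([
    T.ToPILImage(),          # frames arrive as NumPy arrays from OpenCV
    T.Resize((224, 224)),
    T.ToTensor(),
])

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    batch = preprocess(rgb).unsqueeze(0)  # shape: (1, 3, 224, 224)
    with torch.no_grad():
        pred = model(batch).argmax(dim=1).item()
    # Overlay the predicted gesture's text on the live feed
    cv2.putText(frame, GESTURE_LABELS[pred], (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
    cv2.imshow("Gesture-Translation", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
        break
cap.release()
cv2.destroyAllWindows()
```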
Practical Project Experience in Computer Vision Using TensorFlow and/or PyTorch Libraries
The overall outline of this project starts with the basics, i.e., identifying the relevant datasets and papers, and ends with the final step of deploying the selected algorithms.
- Literature Review: Identify the relevant datasets and papers in this domain
- Ideation: Select a model or modify an existing one, then implement and test it
- Experimentation: Run experiments on different datasets with different parameters, and take note of the observations and results (a minimal logging sketch follows this list)
- Deployment: Improve the model to increase its accuracy and/or reduce its inference time for deployment onto a device with low compute power (see the quantization sketch below)
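As one way to structure the experimentation step, here is a minimal sketch of a hyperparameter sweep with CSV logging; `train_and_eval`, the parameter grids, and the `experiments.csv` filename are all hypothetical:

```python
import csv
import itertools

def train_and_eval(lr, batch_size):
    # Placeholder: train the model and return its validation accuracy.
    return 0.0

learning_rates = [1e-2, 1e-3, 1e-4]
batch_sizes = [16, 32]

with open("experiments.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["lr", "batch_size", "val_accuracy"])
    for lr, bs in itertools.product(learning_rates, batch_sizes):
        acc = train_and_eval(lr, bs)
        writer.writerow([lr, bs, acc])  # one logged row per experiment
        print(f"lr={lr}, batch_size={bs} -> accuracy={acc:.3f}")
```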
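For the deployment step, one common way to cut model size and CPU inference time is post-training quantization. Below is a minimal sketch using PyTorch's dynamic quantization; the model architecture is a placeholder, and the project may well settle on a different compression technique (pruning, distillation, or TensorFlow Lite conversion):

```python
import torch
import torch.nn as nn

# Placeholder classifier; stands in for whatever model experimentation produces.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(224 * 224 * 3, 256),
    nn.ReLU(),
    nn.Linear(256, 4),  # 4 gesture classes, matching the sketches above
)
model.eval()

# Dynamic quantization stores Linear weights as int8, shrinking the model
# and speeding up CPU inference on low-power devices.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
torch.save(quantized.state_dict(), "gesture_model_int8.pt")
```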
- Basic knowledge of Python and object-oriented programming in Python
- Familiarity with Git
- Basic theoretical concepts of Computer Vision
- Passion for Learning
You should be able to write clean, efficient code with proper comments and documentation of each experiment.