DIFFMOD is an image captioning model. Most image captioning models today are owned by large companies such as Instagram, Facebook and Google, and the ones that are publicly available are either monetised or do not work at all.
DIFFMOD is different: we want our model to be publicly available. Inspired by open source projects like Stable Diffusion and Auto-GPT, we want DIFFMOD to be an open source library that revolutionises the image captioning community.
Python: Keras, TensorFlow, Flask, OpenCV
Model: EfficientNet
Demo:
We trained the model on Flickr8k. We now plan to scale it up by training on MS COCO, which has over 330,000 images, and to deploy it online on one of our subdomains.
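
For readers who want to prepare Flickr8k captions themselves, here is a minimal sketch of grouping captions by image. It is not DIFFMOD's actual preprocessing code. It assumes the original `Flickr8k.token.txt` layout, where each line is `<image>#<index>`, a tab, then the caption; the function name `group_captions` and the `<start>`/`<end>` tokens are hypothetical choices made for illustration.

```python
from collections import defaultdict


def group_captions(lines):
    """Map each image filename to its list of cleaned captions."""
    captions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or "\t" not in line:
            continue  # skip blank or malformed lines
        image_ref, caption = line.split("\t", 1)
        image_name = image_ref.split("#", 1)[0]  # drop the "#<index>" suffix
        # Lowercase and wrap in start/end tokens (token names are an assumption).
        captions[image_name].append(f"<start> {caption.lower().strip()} <end>")
    return dict(captions)


# Made-up sample lines in the assumed format, not real dataset entries.
sample = [
    "img_001.jpg#0\tA dog runs across the grass .",
    "img_001.jpg#1\tA brown dog is running outside .",
    "img_002.jpg#0\tTwo children play on a beach .",
]
grouped = group_captions(sample)
print(grouped["img_001.jpg"])
```

With the real dataset, the same function could be called on an open file handle, since it only iterates over lines. Copies of the dataset distributed in other layouts (for example a CSV of image and caption columns) would need a different parser.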
- Avdhan
- Satya
- Mansi