By Eduardo Venegas, Moises Chávez, Leonardo Galindo, and Alberto Cortés.
DEMO: https://youtu.be/mwhK7TP2GxQ
Technology is a great way to help those in need, and as it continues to develop it presents new possibilities. One of them is human vision aided and complemented by computer vision.
This project consists of a web and mobile app that generates captions from images captured by the device's camera. The goal of the app is to help people with limited vision perceive the world around them: through an easy-to-use UI, users can take a picture, upload it to the app, and be presented with an audio caption, in any of multiple languages, describing what they are seeing.
The app can run in a web browser or be compiled to an Android or iOS device using the Ionic Capacitor tool.
- Docker
- Google Cloud
- Ionic Angular
- TensorFlow
- GitHub Actions
- Flask
- Python
- TypeScript
The architecture of the app is composed of a client app that captures images and a server app that processes them with a Deep Learning model and returns the generated captions.
- The client app is an Ionic Angular app that takes a picture, encodes the captured image as a base64 string, and sends it to the server in a POST request.
- The server app is a Flask container running in Docker that hosts a trained Captioning Neural Network: a Convolutional Neural Network extracts features from the image, and a Recurrent Neural Network based on a Long Short-Term Memory (LSTM) model generates captions from those features (see the sketch after this list).
- Once an image is passed as input to the Captioning Neural Network, it produces a text caption that is sent back to the client app.
- Finally, the client app reads the caption aloud using text-to-speech, translated into any of the 11 available languages; once the audio finishes, the app returns to the main activity.
- The client app can be used in a web browser or compiled to an Android or iOS device using Ionic.
- Typical response time is under 1.5 seconds.
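To make the architecture concrete, below is a minimal sketch of the server side. The endpoint path, JSON field names, model file, vocabulary file, and decoder input format are illustrative assumptions, not the project's exact code:

```python
import base64
import io
import json

import numpy as np
import tensorflow as tf
from flask import Flask, jsonify, request
from PIL import Image

app = Flask(__name__)

# Pretrained CNN used as a feature extractor (classifier head removed).
cnn = tf.keras.applications.InceptionV3(include_top=False, pooling="avg")
# Trained LSTM decoder and its vocabulary; file names are hypothetical.
decoder = tf.keras.models.load_model("caption_decoder.h5")
with open("vocab.json") as f:          # hypothetical vocab file: word -> id
    word_index = json.load(f)
index_word = {i: w for w, i in word_index.items()}

def generate_caption(features, max_len=30):
    """Greedy decoding: feed the LSTM its own previous words until <end>."""
    seq = [word_index["<start>"]]
    for _ in range(max_len):
        padded = tf.keras.preprocessing.sequence.pad_sequences([seq], maxlen=max_len)
        probs = decoder.predict([features, padded], verbose=0)
        next_id = int(np.argmax(probs))
        if index_word[next_id] == "<end>":
            break
        seq.append(next_id)
    return " ".join(index_word[i] for i in seq[1:])

@app.route("/caption", methods=["POST"])
def caption():
    # The client sends the captured picture as a base64 string.
    img_b64 = request.get_json()["image"]
    img = Image.open(io.BytesIO(base64.b64decode(img_b64))).convert("RGB")
    x = np.asarray(img.resize((299, 299)), dtype=np.float32)[np.newaxis]
    x = tf.keras.applications.inception_v3.preprocess_input(x)
    features = cnn.predict(x, verbose=0)   # (1, 2048) feature vector
    return jsonify({"caption": generate_caption(features)})
```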
Users can select the language of the audio output and press the button that launches a camera intent.
Currently 11 languages are supported, with translation and pronunciation handled for each.
The user can capture an image and then confirm or reject it.
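To try the flow without the UI, the API can be exercised directly. Here is a small sketch, reusing the hypothetical /caption endpoint and image field from the server sketch above, that also times the round trip against the 1.5-second target:

```python
import base64
import time

import requests

# Encode a locally saved photo the same way the client app does.
with open("photo.jpg", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode("ascii")

start = time.perf_counter()
resp = requests.post("http://localhost:5000/caption", json={"image": img_b64})
elapsed = time.perf_counter() - start

print(resp.json()["caption"])
print(f"Round trip: {elapsed:.2f}s")
```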
This project has a full Continuous Integration and Delivery system.
- All code is tested the moment a pull request is created, by building it in GitHub Actions.
- You can merge into main when all tests pass.
- When Continuous Delivery is triggered, GitHub Actions builds the API image and pushes it to the GitHub Package Registry.
- It then SSHs into a Google Cloud instance, pulls the new images, stops the current Docker Compose stack, and runs it again (sketched after this list).
- GitHub Actions also connects with Firebase to deliver automatic client deployments.
- As an extra, an Android and iOS app can be compiled from the same source code.
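For reference, the redeploy step that runs over SSH amounts to something like the following (exact commands and paths vary by setup):

$ docker-compose pull
$ docker-compose down
$ docker-compose up -d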
Make sure you have python3 and pip installed.
Create and activate a virtual environment:
$ python -m venv python3-virtualenv
$ source python3-virtualenv/bin/activate
Use the package manager pip to install all dependencies:
$ pip install -r requirements.txt
Install the node modules:
$ npm i
Make sure you have the Ionic CLI installed:
$ npm install -g @ionic/cli
Run the server:
$ flask run
In a separate terminal, serve the client:
$ ionic serve
Contributions are welcome! Please refer to the guidelines.