An end-to-end implementation of a deep learning pipeline, from data preparation to training to production.
The training directory contains a notebook that has been tested on Google Colab. You will need a Google Cloud account to cache the processed data: OCR is slow, so the extracted text is stored on Google Cloud Storage and reused on later runs.
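The caching idea is roughly: run OCR once per image and store the result on Google Cloud Storage so later runs can skip the OCR step. A minimal sketch of that pattern is below; the bucket name, blob layout, and helper function are assumptions, not what the notebook actually uses.

```python
# Sketch: cache OCR output on Google Cloud Storage so it is computed only once.
# Bucket name and blob layout are hypothetical; the notebook may organize them differently.
from pathlib import Path

import pytesseract
from PIL import Image
from google.cloud import storage

BUCKET_NAME = "my-ocr-cache-bucket"  # hypothetical bucket


def ocr_with_gcs_cache(image_path: str, bucket_name: str = BUCKET_NAME) -> str:
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(f"ocr/{Path(image_path).stem}.txt")

    if blob.exists():                      # cache hit: reuse earlier OCR output
        return blob.download_as_text()

    text = pytesseract.image_to_string(Image.open(image_path))  # the slow step
    blob.upload_from_string(text)          # cache for the next run
    return text
```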
Follow the notebook; it prepares the Tobacco-3482 dataset for training and splits the data into training and validation sets as follows:
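For reference, a stratified split along these lines can be sketched with scikit-learn. The 80/20 ratio, the metadata file, and the column names are assumptions; the notebook defines the actual split.

```python
# Sketch: stratified train/validation split of the Tobacco-3482 metadata.
# The 80/20 ratio and the column names ("path", "label") are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("tobacco3482.csv")  # hypothetical metadata file: image path + label

train_df, val_df = train_test_split(
    df,
    test_size=0.2,            # assumed validation fraction
    stratify=df["label"],     # keep class balance across the splits
    random_state=42,
)
print(len(train_df), "training samples,", len(val_df), "validation samples")
```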
After training, you will see results like this:
The trained model is then exported to Google Cloud Storage. Download the model and extract it into the classifier/model directory.
If you want to try the demo without training, you can download the pre-trained model produced by the above notebook here. Extract the model into the classifier/model directory and follow the steps below.
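If you prefer to script the download and extraction, one way is sketched below. The archive URL and the .tar.gz format are assumptions; substitute the actual link mentioned above.

```python
# Sketch: download the exported model archive and extract it into classifier/model.
# The URL and the .tar.gz format are assumptions; use the actual link from the notebook.
import tarfile
import urllib.request
from pathlib import Path

MODEL_URL = "https://storage.googleapis.com/<bucket>/model.tar.gz"  # placeholder URL
target_dir = Path("classifier/model")
target_dir.mkdir(parents=True, exist_ok=True)

archive_path, _ = urllib.request.urlretrieve(MODEL_URL, "model.tar.gz")
with tarfile.open(archive_path, "r:gz") as tar:
    tar.extractall(path=target_dir)  # model files end up under classifier/model
```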
To run the application, you will need Docker and docker-compose. Clone this repo, cd into the repo directory, and run the following command:
```sh
docker-compose up
```
This command builds the required containers, configures them, and runs them locally. After initialization, go to this address and you will see a screen like this:
- Click Browse and select a document image
- Click Classify; the document will be added to the processing list
- After processing, the predicted class and confidence are shown
Only the following document classes are supported:
Email
Form
ADVE
Report
Scientific
News
Letter
Resume
Memo
Note