ICDAR 2019 Worksheets for Tutorial

These are worksheets for the ICDAR 2019 Tutorial on Deep Learning for Document Analysis. They implement data generation and full training for text recognition (OCR) and document segmentation using a variety of common approaches, including convolution, U-net, LSTM, 2D LSTM, transpose convolutions, and upscaling.

Please see the individual worksheets for details. The models themselves are found in ocrlib/ocrmodels.py.

Note that the worksheets checked into the repository are only partially trained (using run-all). That is, they are trained just long enough to confirm that the code works at check-in. For good models, train for at least 10 epochs.

Running the Code

You can start the Jupyter server with ./run-jupyter and then connect to it at http://localhost:9888. The script builds a Docker image and then runs a container from it as your user in the current directory.
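If the browser cannot reach the server, it can help to check whether anything is listening on the port at all. The following is a minimal sketch using only the Python standard library; the function name server_is_up is hypothetical and not part of this repository, and the port is the 9888 mentioned above.

```python
import socket


def server_is_up(host="localhost", port=9888, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it only tests that the port
    accepts connections, not that Jupyter itself is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False


if __name__ == "__main__":
    print("Jupyter reachable:", server_is_up())
```

A result of False right after starting ./run-jupyter may simply mean the Docker build has not finished yet.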

Data

You can download the training data used with these notebooks by running ./run-download.

You can generate OCR data using the word-image-generation.ipynb notebook. You need to have a basic set of TrueType fonts installed for that.
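To see whether any TrueType fonts are available before running the generation notebook, a quick scan of the usual font directories can help. This is a rough sketch, not code from the repository: the function name find_truetype_fonts and the directory list are assumptions, and the notebook itself may look for fonts elsewhere. Run it in the same environment that executes the notebook.

```python
from pathlib import Path

# Assumed common font locations on Linux, macOS, and Windows.
# These are illustrative defaults, not paths taken from the notebook.
DEFAULT_FONT_DIRS = [
    "/usr/share/fonts",
    "/usr/local/share/fonts",
    "~/.fonts",
    "~/.local/share/fonts",
    "/Library/Fonts",
    "C:/Windows/Fonts",
]


def find_truetype_fonts(dirs=DEFAULT_FONT_DIRS):
    """Return a sorted list of .ttf files found under the given directories."""
    found = []
    for d in dirs:
        root = Path(d).expanduser()
        if not root.is_dir():
            continue  # skip locations that do not exist on this system
        for p in root.rglob("*"):
            # Match the extension case-insensitively (.ttf, .TTF).
            if p.is_file() and p.suffix.lower() == ".ttf":
                found.append(p)
    return sorted(found)


if __name__ == "__main__":
    fonts = find_truetype_fonts()
    print(f"Found {len(fonts)} TrueType font file(s)")
```

If the count is zero, install a font package for your system before generating data.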

