Mask-RCNN Spotter

This repository contains the implementation of a simple Mask-RCNN based text spotter. Many advanced text spotters are built on top of this framework.

Preparing Dataset

Original images can be downloaded from: Total-Text, ICDAR2013, ICDAR2015, ICDAR2017_MLT.

The formatted training datalists can be found in demo/text_spotting/datalist.
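For orientation, each entry in a datalist is keyed by an image path and pairs the image size with polygon boxes and transcriptions. The sketch below (written as an equivalent Python dict) is only an illustration; the field names are assumptions, so treat the datalists shipped in demo/text_spotting/datalist as the authoritative schema.

```python
# Illustrative datalist entry -- field names are assumptions; consult the
# provided datalists in demo/text_spotting/datalist for the real schema.
sample_datalist = {
    "Images/Train/img_1001.jpg": {          # path relative to img_prefix
        "height": 720,
        "width": 1280,
        "content_ann": {
            # one polygon (x1, y1, ..., x4, y4) per text instance
            "bboxes": [[110, 60, 320, 60, 320, 120, 110, 120]],
            # transcription for each instance
            "texts": ["HELLO"],
            # 1 = care (train/evaluate on it), 0 = ignore (e.g. "###" regions)
            "cares": [1],
        },
    },
}
```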

Train On Your Own Dataset

1. Download the pre-trained model, which was trained on SynthText and COCO-Text.

2. Modify the paths (ann_file, img_prefix, work_dir, etc.) in the config files; see the config sketch at the end of this section.

3. Modify the paths in the training script and run the following command:

cd $DAVAR_LAB_OCR_ROOT$/demo/text_spotting/mask_rcnn_spot/
bash dist_train.sh

Notice: We provide an online validation implementation. If you want to disable it to save training time, you may modify the startup script to add the --no-validate flag.
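For step 2 above, here is a minimal sketch of the path fields you would typically edit in an mmdetection-style config. All paths are placeholders, and the actual config may nest these fields differently:

```python
# Placeholder paths -- point these at your own datalists, images and
# output directory before launching training.
data = dict(
    train=dict(
        ann_file='/path/to/your/train_datalist.json',   # training annotations
        img_prefix='/path/to/your/train_images/',       # root of training images
    ),
    val=dict(
        ann_file='/path/to/your/val_datalist.json',     # validation annotations
        img_prefix='/path/to/your/val_images/',         # root of validation images
    ),
)

# Directory where checkpoints and training logs will be written.
work_dir = '/path/to/your/work_dir/'
```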

Train From Scratch

If you want to re-implement the model's performance from scratch, please follow these steps:

1. End-to-end pre-training on SynthText and COCO-Text. See demo/text_spotting/mask_rcnn_spot/configs/mask_rcnn_r50_conv6_e2e_pretrain.py for more details.

2. Fine-tune the model on the mixed real datasets (ICDAR2013, ICDAR2015, ICDAR2017-MLT, Total-Text), starting from the pre-trained checkpoint as shown in the sketch below. See demo/text_spotting/mask_rcnn_spot/configs/mask_rcnn_r50_conv6_e2e_finetune_ic13.py for more details.

Notice: We provide an online validation implementation. If you want to disable it to save training time, you may modify the startup script to add the --no-validate flag.
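For step 2, the main difference from the pre-training config is that fine-tuning starts from the end-to-end pre-trained checkpoint rather than from random initialization. A minimal sketch, assuming the usual mmdetection-style load_from convention (the path is a placeholder):

```python
# Load model weights (but not optimizer state) from the end-to-end
# pre-trained checkpoint before fine-tuning on the mixed real datasets.
load_from = '/path/to/mask_rcnn_r50_conv6_e2e_pretrain.pth'  # placeholder path
```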

Offline Inference and Evaluation

We provide a demo of forward inference and evaluation. You can modify the parameters (iou_constraint, lexicon_type, etc.) in the testing script and start testing:

cd $DAVAR_LAB_OCR_ROOT$/demo/text_spotting/mask_rcnn_spot/tools/
bash test_ic13.sh

The offline evaluation tool can be found in davarocr/demo/text_spotting/evaluation/.
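As background on what iou_constraint controls, the sketch below illustrates the usual end-to-end matching rule: a prediction counts as correct only if its box overlaps a ground-truth box with IoU at or above the constraint and its transcription matches. This is a simplified illustration with axis-aligned boxes and exact string matching, not the bundled evaluation tool, which also handles polygon annotations and lexicon-based correction.

```python
# Simplified illustration of IoU-constrained end-to-end matching.

def iou(box_a, box_b):
    """Axis-aligned IoU between two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_predictions(preds, gts, iou_constraint=0.5):
    """Greedy one-to-one matching; returns the number of correct spottings.

    preds and gts are lists of (box, text) pairs.
    """
    used = set()
    correct = 0
    for p_box, p_text in preds:
        for idx, (g_box, g_text) in enumerate(gts):
            if idx in used:
                continue
            if iou(p_box, g_box) >= iou_constraint and p_text.upper() == g_text.upper():
                used.add(idx)
                correct += 1
                break
    return correct
```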

Visualization

We provide a script to visualize the intermediate output results of the model. You can modify the paths (test_dataset, config_file, etc.) in the script and start generating visualization results:

cd $DAVAR_LAB_OCR_ROOT$/demo/text_spotting/mask_rcnn_spot/tools/
python vis.py

Some visualization results are shown below (see ./vis/img_225_text.jpg and ./vis/img92_text.jpg).
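Independently of vis.py, the sketch below shows one simple way to overlay spotted polygons and transcriptions on an image with OpenCV, assuming you already have the model outputs as point lists and strings; the function name and result format are assumptions for illustration only.

```python
# Illustrative overlay of spotting results with OpenCV; the result format
# (list of polygons plus transcriptions) is an assumption for this sketch.
import cv2
import numpy as np

def draw_spotting_results(img_path, polygons, texts, out_path):
    """Draw each text polygon and its transcription onto the image."""
    img = cv2.imread(img_path)
    for poly, text in zip(polygons, texts):
        pts = np.array(poly, dtype=np.int32).reshape(-1, 2)
        cv2.polylines(img, [pts], isClosed=True, color=(0, 255, 0), thickness=2)
        x, y = pts[0]
        cv2.putText(img, text, (int(x), max(int(y) - 5, 10)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)
    cv2.imwrite(out_path, img)

# Example usage with one dummy quadrilateral and transcription.
draw_spotting_results('/path/to/test_image.jpg',
                      polygons=[[110, 60, 320, 60, 320, 120, 110, 120]],
                      texts=['HELLO'],
                      out_path='./vis/test_image_vis.jpg')
```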

Trained Model Download

All of the models are re-implemented and trained based on the open-source framework mmdetection.

Note: The trained model based on mask_rcnn_r50_fpn+res32+bilstm+attention uses only SynthText pre-training and does not use the random crop, color jitter, or mixed-dataset training strategies, so its reported performance is slightly worse than that of mask_rcnn_r50_fpn+conv6+bilstm+attention.

Results on various datasets and links to the trained models:

| Pipeline | Pretrained-Dataset | Links |
| --- | --- | --- |
| mask_rcnn_r50_fpn+conv6+bilstm+attention | SynthText, COCO-Text | cfg, pth (Access Code: ngPI) |
| mask_rcnn_r50_fpn+res32+bilstm+attention | SynthText | cfg, pth (Access Code: QVYc) |

| Dataset | Backbone | Pretrained | Finetune | Test Scale | End-to-End (General / Weak / Strong) | Word Spotting (General / Weak / Strong) | Links |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ICDAR2013 | ResNet-50 + Conv-6x | SynthText, COCO-Text | ICDAR2013, ICDAR2015, ICDAR2017_MLT, Total-Text | L-1440 | 82.1 / 85.6 / 86.1 | 85.6 / 89.9 / 90.5 | cfg, pth (Access Code: Vum3) |
| ICDAR2013 | ResNet-50 + ResNet-32 | SynthText | ICDAR2013, ICDAR2015, ICDAR2017_MLT, Total-Text | L-1440 | 82.7 / 86.0 / 86.6 | 86.1 / 90.4 / 91.1 | cfg, pth (Access Code: Y266) |
| ICDAR2015 | ResNet-50 + Conv-6x | SynthText, COCO-Text | ICDAR2013, ICDAR2015, ICDAR2017_MLT, Total-Text | L-2000 | 66.3 / 75.3 / 78.4 | 66.7 / 78.1 / 81.7 | cfg, pth (Access Code: Vum3) |
| ICDAR2015 | ResNet-50 + ResNet-32 | SynthText | ICDAR2013, ICDAR2015, ICDAR2017_MLT, Total-Text | L-2000 | 62.9 / 72.2 / 75.7 | 63.5 / 75.0 / 79.1 | cfg, pth (Access Code: IdJA) |

| Dataset | Backbone | Pretrained | Finetune | Test Scale | End-to-End (None / Full) | Word Spotting (None / Full) | Links |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Total-Text | ResNet-50 + Conv-6x | SynthText, COCO-Text | ICDAR2013, ICDAR2015, ICDAR2017_MLT, Total-Text | L-1350 | 63.6 / 72.2 | 66.1 / 76.5 | cfg, pth (Access Code: Vum3) |
| Total-Text | ResNet-50 + ResNet-32 | SynthText | ICDAR2013, ICDAR2015, ICDAR2017_MLT, Total-Text | L-1350 | 62.8 / 71.5 | 65.2 / 75.8 | cfg, pth (Access Code: CyB3) |

Citation:

@inproceedings{He_2017,
  title={Mask R-CNN},
  author={He, Kaiming and Gkioxari, Georgia and Dollar, Piotr and Girshick, Ross},
  booktitle={2017 IEEE International Conference on Computer Vision (ICCV)},
  year={2017}
}

License

This project is released under the Apache 2.0 License.

Copyright

If you have any suggestions or problems, please feel free to contact the authors at [email protected] or [email protected].