This implementation adds several useful features to BERT classification:
- Multi-label
- Focal loss weighting
- Auto cross-label data synthesis
- An added exclusion loss term among specific labels
- Upsampling
- Robust mean over all positive or negative losses
- Generating a very fast inference-time model

N.B. I removed the efficient model-serving part (about 500 QPS per 1080 Ti GPU).
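
To illustrate how focal loss weighting combines with the multi-label setting, here is a minimal pure-Python sketch. It assumes one independent sigmoid output per label and the commonly used focal-loss form; the function names and the `gamma` and `alpha` defaults are illustrative assumptions, not taken from this implementation.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def multilabel_focal_loss(logits, targets, gamma=2.0, alpha=0.25, eps=1e-12):
    """Mean sigmoid focal loss over the labels of a single example.

    logits  -- raw scores, one per label (hypothetical model output)
    targets -- 0/1 indicators, one per label
    gamma   -- focusing parameter; 0 disables the focal down-weighting
    alpha   -- weight on positive labels; negatives get (1 - alpha)
    """
    if not logits or len(logits) != len(targets):
        raise ValueError("logits and targets must be non-empty and equal length")
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        # p_t is the probability the model assigns to the true class.
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t) ** gamma shrinks the loss of well-classified labels.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))
    return total / len(logits)
```

With `gamma=0` and `alpha=0.5` this reduces to half the ordinary binary cross-entropy, which makes it easy to sanity-check against a standard loss.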