This is a PyTorch implementation of BERT-LWAN, the model from the paper "An Empirical Study on Large-Scale Multi-Label Text Classification Including Few and Zero-Shot Labels".
The model architecture is based on CAML, from the paper "Explainable Prediction of Medical Codes from Clinical Text".
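
For orientation, here is a minimal sketch of the label-wise attention idea that CAML introduced and that LWAN-style models apply on top of an encoder. It is an illustration only, not the code in this repository: the class name `LabelWiseAttention`, the parameter names, and the tensor sizes are all assumptions, and random tensors stand in for BERT's per-token outputs.

```python
import torch
import torch.nn as nn


class LabelWiseAttention(nn.Module):
    """Hypothetical label-wise attention head: one attention distribution per label."""

    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        # One attention query vector per label (assumed parameterisation).
        self.attention = nn.Linear(hidden_size, num_labels, bias=False)
        # One output weight vector and one bias per label.
        self.output_weight = nn.Parameter(torch.empty(num_labels, hidden_size))
        self.output_bias = nn.Parameter(torch.zeros(num_labels))
        nn.init.xavier_uniform_(self.output_weight)

    def forward(self, hidden_states, attention_mask):
        # hidden_states: (batch, seq_len, hidden); attention_mask: (batch, seq_len)
        scores = self.attention(hidden_states)  # (batch, seq_len, num_labels)
        # Padding positions get -inf so they receive zero attention weight.
        # Assumes every sequence has at least one unmasked token.
        scores = scores.masked_fill(attention_mask.unsqueeze(-1) == 0, float("-inf"))
        weights = torch.softmax(scores, dim=1)  # normalise over the token axis
        # Label-specific document vectors: (batch, num_labels, hidden)
        label_repr = torch.einsum("bsl,bsh->blh", weights, hidden_states)
        # One logit per label from that label's own document vector.
        logits = (label_repr * self.output_weight).sum(dim=-1) + self.output_bias
        return logits, weights


torch.manual_seed(0)
layer = LabelWiseAttention(hidden_size=16, num_labels=5)
hidden = torch.randn(2, 7, 16)  # stand-in for the encoder's last hidden states
mask = torch.ones(2, 7, dtype=torch.long)
mask[1, 4:] = 0  # second example has three padding tokens
logits, weights = layer(hidden, mask)
```

The returned `logits` would typically be passed through a sigmoid and trained with binary cross-entropy for multi-label classification; `weights` can be inspected to see which tokens each label attended to.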