
finetune on GLUE task ends up with same probability #332

Open
TingchenFu opened this issue Apr 5, 2021 · 2 comments

Comments

@TingchenFu

Hi guys,
First of all, thanks for your great model! I fine-tuned the pretrained model mlm_tlm_xnli15_1024.pth on the MNLI-m task (to be specific, a two-class classification task), setting the hyperparameters as recommended:
python glue-xnli.py
--exp_name test_xnli_mlm_tlm             # experiment name
--dump_path ./dumped/                    # where to store the experiment
--model_path mlm_tlm_xnli15_1024.pth     # model location
--data_path ./data/processed/XLM15       # data location
--transfer_tasks XNLI,SST-2              # transfer tasks (XNLI or GLUE tasks)
--optimizer_e adam,lr=0.000025           # optimizer of projection (lr \in [0.000005, 0.000025, 0.000125])
--optimizer_p adam,lr=0.000025           # optimizer of projection (lr \in [0.000005, 0.000025, 0.000125])
--finetune_layers "0:_1"                 # fine-tune all layers
--batch_size 8                           # batch size (\in [4, 8])
--n_epochs 250                           # number of epochs
--epoch_size 20000                       # number of sentences per epoch
--max_len 256                            # max number of words in sentences
--max_vocab 95000                        # max number of words in vocab
But after several epochs I got EXACTLY the same probability output for all the validation cases:
-0.27187905 -0.27174124 -0.27346167 -0.27336964 -0.27150354 -0.27345833 -0.2712339 -0.2730249 -0.2720655 -0.2718483
Each number is the probability of the example being classified as the positive class, as given by the model.
Could anyone tell me what happened, and is there any possible solution for this?
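
For reference, a minimal sketch of the kind of sanity check involved (the values are simply copied from the output above): if the standard deviation is essentially zero, every example gets the same score, i.e. the head has collapsed to a constant. Note the values are negative, so they look more like logits or log-probabilities than probabilities in [0, 1].

import numpy as np

# Scores copied from the validation output posted above.
scores = np.array([-0.27187905, -0.27174124, -0.27346167, -0.27336964, -0.27150354,
                   -0.27345833, -0.2712339, -0.2730249, -0.2720655, -0.2718483])

# A (near-)zero standard deviation means the classifier outputs a constant.
print('mean:', scores.mean(), 'std:', scores.std())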

@TingchenFu
Author

I found that after the first embedding layer in TransformerModel.fwd

tensor = self.embeddings(x)

tensor is the same for all the different cases. self.embeddings is defined as:

self.embeddings = Embedding(self.n_words, self.dim, padding_idx=self.pad_index)

where self.n_words=95000 and self.dim=1024, as in the pretrained params. Is there anything wrong?
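
To isolate where things collapse, here is a minimal standalone sketch of the same lookup (the pad_index value and the checkpoint key names in the comments are assumptions, not taken from the repo): if two different token-id tensors produce identical embeddings, or if the loaded weight rows are all (near) identical, the problem is in how the data or checkpoint is fed, not in the fine-tuning itself.

import torch
import torch.nn as nn

n_words, dim, pad_index = 95000, 1024, 2   # pad_index=2 is an assumption

emb = nn.Embedding(n_words, dim, padding_idx=pad_index)
# In the real run the weights come from the checkpoint, e.g. (assumed key names):
# state = torch.load('mlm_tlm_xnli15_1024.pth', map_location='cpu')
# emb.weight.data.copy_(state['model']['embeddings.weight'])

# Two clearly different "sentences" of token ids, shaped (slen, bs) as in the repo's convention.
x1 = torch.randint(0, n_words, (12, 1))
x2 = torch.randint(0, n_words, (12, 1))

t1, t2 = emb(x1), emb(x2)
print('inputs identical:    ', torch.equal(x1, x2))
print('embeddings identical:', torch.allclose(t1, t2))
print('weight row std:      ', emb.weight.std(dim=1).mean().item())
# If the inputs differ but the embeddings do not (or the weight rows are all
# nearly identical), the issue is in loading/feeding, not in the optimization.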

@TingchenFu
Author

The training log is here: https://paste.ubuntu.com/p/SbDw33JPjN/
and the complete probability results on the validation set are here: https://paste.ubuntu.com/p/xXD9FfGdcT/
