In the language_model.py code, class NextSentencePrediction and class MaskedLanguageModel take different inputs in their forward functions: NextSentencePrediction uses x[:, 0] in "return self.softmax(self.linear(x[:, 0]))", while MaskedLanguageModel uses the full x in "return self.softmax(self.linear(x))". Is there something wrong here?
As I debugged this, x in both classes has shape (batch_size, seq_len, embedding_dim), e.g. (64, 50, 256), which means there are 64 sentences, each sentence has 50 words, and each word is a 256-dim vector. But x[:, 0] takes only the first word of each of the 64 sentences, so x[:, 0] has shape (64, 256). I don't understand why the NextSentencePrediction task should use this kind of input. Can someone help me explain this?
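For reference, here is a minimal sketch (not the repository code itself) of the shape difference between the two heads; the batch/sequence/hidden sizes and the vocabulary size are just the example numbers from the question:

```python
import torch
import torch.nn as nn

batch_size, seq_len, hidden = 64, 50, 256
x = torch.randn(batch_size, seq_len, hidden)       # transformer output

# NextSentencePrediction: one is_next / not_next decision per sentence pair,
# made from the first token's vector only.
nsp_linear = nn.Linear(hidden, 2)
nsp_out = torch.log_softmax(nsp_linear(x[:, 0]), dim=-1)  # x[:, 0] -> (64, 256)
print(nsp_out.shape)                                      # (64, 2)

# MaskedLanguageModel: one vocabulary prediction at every token position.
vocab_size = 30000                                  # assumed for illustration
mlm_linear = nn.Linear(hidden, vocab_size)
mlm_out = torch.log_softmax(mlm_linear(x), dim=-1)        # x -> (64, 50, 256)
print(mlm_out.shape)                                      # (64, 50, 30000)
```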
x[:, 0] already carries the semantics of the whole sequence x[:, 0:50] because of self-attention.
You can use x[:, 0], x[:, 1], sum(x, axis=1), mean(x, axis=1), or whatever you want.
But in my experience, there is no performance difference.
Using only x[:, 0] is enough when you train a classification task.
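A hypothetical comparison of those pooling choices (not part of the repository code): all of them reduce the sequence to one fixed-size vector per example, so any of them could feed the same classifier head.

```python
import torch

x = torch.randn(64, 50, 256)      # transformer output (batch, seq_len, hidden)

first_token = x[:, 0]             # (64, 256) -- what NextSentencePrediction uses
mean_pooled = x.mean(dim=1)       # (64, 256) -- average over all 50 positions
sum_pooled = x.sum(dim=1)         # (64, 256) -- sum over all 50 positions

# Each variant yields a single vector per sentence pair, so the same
# nn.Linear(256, 2) classifier works on top of any of them; taking the
# first token is simply the convention BERT uses for its [CLS] position.
```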