In the language_model.py code, class NextSentencePrediction and class MaskedLanguageModel take different inputs in their forward functions: NextSentencePrediction uses x[:, 0] in "return self.softmax(self.linear(x[:, 0]))", while MaskedLanguageModel uses the full x in "return self.softmax(self.linear(x))". Is there something wrong here?
As I debugged this, x in both classes has shape (batch_size, seq_len, embedding_dim), e.g. (64, 50, 256), which means there are 64 sentences, each sentence has 50 words, and each word is a 256-dim vector. But x[:, 0] takes only the first word of each of the 64 sentences, so x[:, 0] has shape (64, 256). I don't understand why the NextSentencePrediction task should use this kind of input. Can someone help me explain this?
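For reference, here is a minimal sketch (not the repository code itself) of the shape difference between the two heads; the batch/sequence/hidden sizes and the vocabulary size are just the example numbers from the question:

```python
import torch
import torch.nn as nn

batch_size, seq_len, hidden = 64, 50, 256
x = torch.randn(batch_size, seq_len, hidden)       # transformer output

# NextSentencePrediction: one is_next / not_next decision per sentence pair,
# made from the first token's vector only.
nsp_linear = nn.Linear(hidden, 2)
nsp_out = torch.log_softmax(nsp_linear(x[:, 0]), dim=-1)  # x[:, 0] -> (64, 256)
print(nsp_out.shape)                                      # (64, 2)

# MaskedLanguageModel: one vocabulary prediction at every token position.
vocab_size = 30000                                  # assumed for illustration
mlm_linear = nn.Linear(hidden, vocab_size)
mlm_out = torch.log_softmax(mlm_linear(x), dim=-1)        # x -> (64, 50, 256)
print(mlm_out.shape)                                      # (64, 50, 30000)
```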
x[:, 0] already carries the semantics of the whole sequence x[:, 0:50] because of self-attention.
You can use x[:, 0], x[:, 1], sum(x, axis=1), mean(x, axis=1), or whatever you want.
But in my experience, there is no performance difference.
Using only x[:, 0] is enough when you train a classification task.
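A hypothetical comparison of those pooling choices (not part of the repository code): all of them reduce the sequence to one fixed-size vector per example, so any of them could feed the same classifier head.

```python
import torch

x = torch.randn(64, 50, 256)      # transformer output (batch, seq_len, hidden)

first_token = x[:, 0]             # (64, 256) -- what NextSentencePrediction uses
mean_pooled = x.mean(dim=1)       # (64, 256) -- average over all 50 positions
sum_pooled = x.sum(dim=1)         # (64, 256) -- sum over all 50 positions

# Each variant yields a single vector per sentence pair, so the same
# nn.Linear(256, 2) classifier works on top of any of them; taking the
# first token is simply the convention BERT uses for its [CLS] position.
```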