Can you please explain what the purpose is of including the <UNK> tokens in the owords vector produced by the skipgram function? What should the model learn from using these as training examples?
Also, what is the purpose of the variable ws in the train function, if it isn't used anywhere after its definition?
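For concreteness on the first question, here is a minimal sketch of the kind of padded context extraction I'm asking about, assuming a fixed window size where positions past the sentence boundary are filled with <UNK> (the names skipgram, owords, and <UNK> come from the code; everything else is illustrative):

```python
def skipgram(sentence, i, window=5, unk='<UNK>'):
    """Return the (iword, owords) pair for position i.

    Context positions falling outside the sentence are filled with the
    unk token, so owords always has length 2 * window.
    """
    iword = sentence[i]
    left = sentence[max(0, i - window):i]
    right = sentence[i + 1:i + 1 + window]
    # Pad both sides so every training example has the same shape.
    owords = [unk] * (window - len(left)) + left + right + [unk] * (window - len(right))
    return iword, owords
```

A fixed-length owords vector batches cleanly, which is presumably the motivation for the padding; my question is what the (iword, <UNK>) pairs themselves are supposed to contribute.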
<UNK> means unknown token; it should be defined in the preprocessing step. ws means word score, which is calculated heuristically, so you can use it or not, as you like.
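For reference, the usual heuristic for such a word score in word2vec-style code is the subsampling probability from Mikolov et al. (2013). A sketch, assuming ws holds per-word discard probabilities derived from corpus frequency (the threshold value and names are illustrative, not necessarily what this repo uses):

```python
import numpy as np

def word_scores(word_counts, threshold=1e-5):
    """Heuristic subsampling scores in the style of Mikolov et al. (2013).

    word_counts maps word -> raw corpus count. The returned dict maps
    word -> probability of discarding that word during training, so
    very frequent words get scores close to 1.
    """
    total = sum(word_counts.values())
    # 1 - sqrt(t / f(w)), where f(w) = count / total, clipped at 0.
    return {word: max(0.0, 1 - np.sqrt(threshold * total / count))
            for word, count in word_counts.items()}
```

If a score like this is computed but never consumed, subsampling is effectively a no-op, which would explain why ws looks unused after its definition.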
Hi, yes, I understand what <UNK> means. What I don't understand is why these dummy tokens (by the way, I suspect they were meant to represent padding rather than unknown words) are included in the output of the skipgram function. I checked the original implementation of Word2Vec and I don't see any step there that includes padding or unknown tokens. What information should the model learn from their being included as training examples?
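For comparison, here is a sketch of how the original word2vec handles sentence boundaries: the window is simply truncated, so no dummy tokens ever enter training (the C code additionally shrinks the window randomly per position, which I omit here; the names are mine):

```python
def skipgram_pairs(sentence, window=5):
    """Yield (center, context) pairs with a truncated window.

    Near sentence boundaries the context is simply shorter; no padding
    tokens are generated, mirroring the original word2vec behaviour.
    """
    for i, center in enumerate(sentence):
        for j in range(max(0, i - window), min(len(sentence), i + 1 + window)):
            if j != i:
                yield center, sentence[j]
```

Under this scheme every training pair is a real co-occurrence, whereas padded (iword, <UNK>) pairs would seem to teach the model a spurious association unless they're masked out of the loss.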