Problem with the Hard Shrinkage operation #10
Comments
Hi sjp611,
I suppose adding 1 to all 0s in the weight w is more suitable, so that the entropy loss becomes 0*log(0+1) for every 0 in the weight w. Do you think so?
Hi, I met another problem when I tried to train the model. I set the [...] Thank you in advance.
Sorry I didn't respond at the time. I don't think that is a suitable solution, because p is used as a probability distribution (it might not be a real distribution, though), so adding 1 to it could break the purpose of p.
An entropy loss of zero means that the attention weights over the memory items approach a one-hot vector, i.e. the memory addressing is sparse. You can check whether the model works correctly by looking at the min, max, and argmax of the attention weights.
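For example, a minimal sketch of such a check (assuming `att_weight` is the (batch_size, N) attention tensor from your forward pass; the name and shape here are just illustrative):

```python
import torch

def inspect_attention(att_weight: torch.Tensor) -> None:
    """Print quick sparsity diagnostics for a (batch_size, N) attention map."""
    # Near one-hot addressing shows a max close to 1 and a min close to 0.
    print("min:", att_weight.min().item())
    print("max:", att_weight.max().item())
    # The argmax tells you which memory slot each sample attends to most.
    print("argmax per sample:", att_weight.argmax(dim=1).tolist())
```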
Nah. Adding 1 to all zero items will cause an issue in the backward pass during training.
Yes, I checked the attention weights before the hard shrink operation, and I found that after the softmax the attention values are pretty much the same along the memory slot dimension (i.e. the hyperparameter [...]
You should try training without the entropy loss and see whether your model learns something without that constraint. If it does, you could try a threshold smaller than 1/N. If not, I think there may be a bug in your model or data processing.
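If it helps, here is a sketch of the ReLU-based hard shrinkage (roughly Eq. 7) with the threshold exposed as a hyperparameter, so you can experiment with values below 1/N; the function name and the L1 re-normalization are illustrative, not necessarily this repo's exact code:

```python
import torch
import torch.nn.functional as F

def hard_shrink_relu(w: torch.Tensor, lam: float, eps: float = 1e-12) -> torch.Tensor:
    """Continuous ReLU-based hard shrinkage with a tunable threshold `lam`."""
    # Entries of w below the threshold lam are pushed to exactly 0.
    w_hat = (F.relu(w - lam) * w) / (torch.abs(w - lam) + eps)
    # Re-normalize so the shrunk weights still sum to 1 per sample.
    return F.normalize(w_hat, p=1, dim=1)

# e.g. try a lam smaller than 1/N, as suggested above.
```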
In this paper, the hard shrinkage operation (Equation 7) can produce exact 0 values through the ReLU activation.
The result is then used in Equation 9 to minimize the entropy of ŵ.
When minimizing the entropy (Eq. 9), a 0 value can therefore end up inside the logarithm, and log(0) is -inf.
How did you solve this problem?
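For what it's worth, a common numerical workaround (not necessarily what the authors did) relies on the convention that 0·log(0) is taken as 0 in entropy, so adding a tiny epsilon inside the logarithm keeps the loss finite. A minimal sketch, with illustrative names:

```python
import torch

def entropy_loss(w_hat: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Entropy/sparsity loss in the spirit of Eq. 9, with eps to avoid log(0)."""
    # For entries shrunk to exactly 0, -0 * log(0 + eps) evaluates to 0,
    # matching the 0*log(0) -> 0 convention instead of producing -inf/NaN.
    return (-w_hat * torch.log(w_hat + eps)).sum(dim=1).mean()
```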