The problem about Hard Shrinkage operation #10

Open
sjp611 opened this issue Dec 31, 2019 · 8 comments

sjp611 commented Dec 31, 2019

In this paper, we can get exactly 0 values from the ReLU in the hard shrinkage operation (Equation 7).
The result is then used in Equation 9 to minimize the entropy of ŵ.
When we minimize the entropy (Eq. 9), a 0 value can end up inside the logarithm, and log(0) is -inf.
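
For concreteness, a minimal PyTorch sketch of where the -inf shows up (the shrinkage line is my reading of Eq. 7; shapes, threshold, and names are only illustrative):

```python
import torch
import torch.nn.functional as F

def hard_shrink(w, lam=0.1, eps=1e-12):
    # Zero out weights below the threshold lam via ReLU, then re-normalize (L1).
    w_hat = F.relu(w - lam) * w / (torch.abs(w - lam) + eps)
    return F.normalize(w_hat, p=1, dim=-1)

w = torch.softmax(torch.randn(4, 10), dim=-1)       # toy attention weights
w_hat = hard_shrink(w)
entropy = (-w_hat * torch.log(w_hat)).sum(dim=-1)   # Eq. 9: log(0) = -inf, 0 * (-inf) = nan
print(entropy)                                       # typically contains nan
```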

How did you solve this problem?

sjp611 changed the title from "The problem on Hard Shrinkage operation" to "The problem about Hard Shrinkage operation" on Dec 31, 2019

fluowhy commented Feb 3, 2020

Hi sjp611,
a practical solution is to add a small constant (1e-10) to the logarithm argument: log(p + 1e-10). It is not the best option, but at least it is numerically stable.
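
In code it is just one extra term inside the log; a minimal PyTorch sketch (the function name is my own):

```python
import torch

def entropy_loss(w_hat, eps=1e-10):
    # log(w_hat + eps) keeps the argument strictly positive, so entries that were
    # shrunk to exactly 0 contribute (approximately) 0 instead of nan.
    return (-w_hat * torch.log(w_hat + eps)).sum(dim=-1).mean()
```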


Zk-soda commented Feb 19, 2020

> Hi sjp611,
> a practical solution is to add a small constant (1e-10) to the logarithm argument: log(p + 1e-10). It is not the best option, but at least it is numerically stable.

I suppose adding 1 to all the 0s in the weight w is more suitable, so that the entropy term becomes 0*log(0+1) = 0 for every 0 in w. Do you think so?
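
In code, what I mean is roughly this (a sketch only; the function name and the torch.where formulation are just my way of writing it):

```python
import torch

def entropy_loss_shift(w_hat):
    # Add 1 only where w_hat is exactly 0, so those terms become 0 * log(0 + 1) = 0.
    shifted = torch.where(w_hat == 0, w_hat + 1.0, w_hat)
    return (-w_hat * torch.log(shifted)).sum(dim=-1).mean()
```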


LiUzHiAn commented Jul 9, 2020

Hi,

I ran into another problem when I tried to train the model. I set mem_dim = 2k and reset the memory parameters as per the given code. It turns out that the entropy loss is always ZERO. Any ideas on how to fix this?

Thank you in advance.


fluowhy commented Jul 9, 2020

> Hi sjp611,
> a practical solution is to add a small constant (1e-10) to the logarithm argument: log(p + 1e-10). It is not the best option, but at least it is numerically stable.
>
> I suppose adding 1 to all the 0s in the weight w is more suitable, so that the entropy term becomes 0*log(0+1) = 0 for every 0 in w. Do you think so?

Sorry I didn't respond at the time. I don't think that is a suitable solution, because p is used as a probability distribution (even if it isn't a true distribution), so adding 1 to it could defeat the purpose of p.


sjp611 commented Jul 10, 2020

> Hi,
>
> I ran into another problem when I tried to train the model. I set mem_dim = 2k and reset the memory parameters as per the given code. It turns out that the entropy loss is always ZERO. Any ideas on how to fix this?
>
> Thank you in advance.

An entropy loss of zero means that the attention over the memory items is essentially a one-hot vector, i.e. the memory addressing is sparse. You can check whether the model works correctly by inspecting the min, max, and argmax values of the attention weights.
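
A quick, self-contained sketch of that check (the tensor below is just a dummy standing in for your attention weights):

```python
import torch

# Dummy stand-in for the attention weights over the memory slots (batch x N).
att = torch.softmax(torch.randn(8, 2000), dim=-1)

print("min:", att.min().item(), "max:", att.max().item())
print("argmax per sample:", att.argmax(dim=-1).tolist())
# Rows whose max is close to 1 are nearly one-hot, which drives the entropy loss toward 0.
```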

@Wolfybox

> Hi sjp611,
> a practical solution is to add a small constant (1e-10) to the logarithm argument: log(p + 1e-10). It is not the best option, but at least it is numerically stable.
>
> I suppose adding 1 to all the 0s in the weight w is more suitable, so that the entropy term becomes 0*log(0+1) = 0 for every 0 in w. Do you think so?

Nah. Adding 1 to all the zero items will cause an issue in the backward pass during training.

@LiUzHiAn

> Hi,
> I ran into another problem when I tried to train the model. I set mem_dim = 2k and reset the memory parameters as per the given code. It turns out that the entropy loss is always ZERO. Any ideas on how to fix this?
> Thank you in advance.
>
> An entropy loss of zero means that the attention over the memory items is essentially a one-hot vector, i.e. the memory addressing is sparse. You can check whether the model works correctly by inspecting the min, max, and argmax values of the attention weights.

Yes, I checked the attention weights before the hard shrink operation. I found that after the softmax, the attention values are pretty much identical along the memory slot dimension (i.e. the hyperparameter N in the paper, say 2K). No matter how the number of memory slots varies, the result is the same, and these values are always smaller than the shrink_threshold whenever I set it to a value in the interval [1/N, 3/N]. Hence, the entropy loss ends up being ZERO.
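
A toy reproduction of what I am seeing (simplified shrinkage, illustrative numbers):

```python
import torch

N = 2000                                   # number of memory slots (2K)
w = torch.full((1, N), 1.0 / N)            # near-uniform attention after softmax
lam = 2.0 / N                              # shrink_threshold picked from [1/N, 3/N]
w_hat = torch.where(w > lam, w, torch.zeros_like(w))   # simplified hard shrink
entropy = (-w_hat * torch.log(w_hat + 1e-10)).sum()
print(w_hat.sum().item(), entropy.item())  # everything is shrunk away, so the loss is 0
```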


fluowhy commented Jul 22, 2020

> Hi,
> I ran into another problem when I tried to train the model. I set mem_dim = 2k and reset the memory parameters as per the given code. It turns out that the entropy loss is always ZERO. Any ideas on how to fix this?
> Thank you in advance.
>
> An entropy loss of zero means that the attention over the memory items is essentially a one-hot vector, i.e. the memory addressing is sparse. You can check whether the model works correctly by inspecting the min, max, and argmax values of the attention weights.
>
> Yes, I checked the attention weights before the hard shrink operation. I found that after the softmax, the attention values are pretty much identical along the memory slot dimension (i.e. the hyperparameter N in the paper, say 2K). No matter how the number of memory slots varies, the result is the same, and these values are always smaller than the shrink_threshold whenever I set it to a value in the interval [1/N, 3/N]. Hence, the entropy loss ends up being ZERO.

You should try training without the entropy loss and see whether your model learns something without that constraint. If it does, you could try a threshold smaller than 1/N. If it doesn't, I think there may be a bug in your model or your data processing.
It is really difficult to know exactly what your problem is, but I recommend checking http://karpathy.github.io/2019/04/25/recipe/. Please do not treat it as a recipe, but as a source of empirical tips and tricks for debugging your model.
