The problem about Hard Shrinkage operation #10

Open
sjp611 opened this issue Dec 31, 2019 · 8 comments

sjp611 commented Dec 31, 2019

In this paper, we can get exactly 0 values from the ReLU in the hard shrinkage operation (Equation 7).
The result is then used in Equation 9 to minimize the entropy of ŵ.
When we minimize the entropy (Eq. 9), a 0 value can end up inside the logarithm, and log(0) is -inf.
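
For concreteness, a minimal PyTorch sketch of where the -inf shows up (the shrinkage line is my reading of Eq. 7; shapes, threshold, and names are only illustrative):

```python
import torch
import torch.nn.functional as F

def hard_shrink(w, lam=0.1, eps=1e-12):
    # Zero out weights below the threshold lam via ReLU, then re-normalize (L1).
    w_hat = F.relu(w - lam) * w / (torch.abs(w - lam) + eps)
    return F.normalize(w_hat, p=1, dim=-1)

w = torch.softmax(torch.randn(4, 10), dim=-1)       # toy attention weights
w_hat = hard_shrink(w)
entropy = (-w_hat * torch.log(w_hat)).sum(dim=-1)   # Eq. 9: log(0) = -inf, 0 * (-inf) = nan
print(entropy)                                       # typically contains nan
```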

How did you solve this problem?

sjp611 changed the title from "The problem on Hard Shrinkage operation" to "The problem about Hard Shrinkage operation" on Dec 31, 2019

fluowhy commented Feb 3, 2020

Hi sjp611,
a practical solution is to add a small constant (1e-10) to the logarithm argument: log(p + 1e-10). It is not the best option, but at least it is numerically stable.
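
In code it is just one extra term inside the log; a minimal PyTorch sketch (the function name is my own):

```python
import torch

def entropy_loss(w_hat, eps=1e-10):
    # log(w_hat + eps) keeps the argument strictly positive, so entries that were
    # shrunk to exactly 0 contribute (approximately) 0 instead of nan.
    return (-w_hat * torch.log(w_hat + eps)).sum(dim=-1).mean()
```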


Zk-soda commented Feb 19, 2020

> Hi sjp611,
> a practical solution is to add a small constant (1e-10) to the logarithm argument: log(p + 1e-10). It is not the best option, but at least it is numerically stable.

I suppose adding 1 to all the 0s in the weight w is more suitable, so that the entropy term becomes 0*log(0+1) = 0 for every 0 in w. Do you think so?
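
In code, what I mean is roughly this (a sketch only; the function name and the torch.where formulation are just my way of writing it):

```python
import torch

def entropy_loss_shift(w_hat):
    # Add 1 only where w_hat is exactly 0, so those terms become 0 * log(0 + 1) = 0.
    shifted = torch.where(w_hat == 0, w_hat + 1.0, w_hat)
    return (-w_hat * torch.log(shifted)).sum(dim=-1).mean()
```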


LiUzHiAn commented Jul 9, 2020

Hi,

I ran into another problem when I tried to train the model. I set mem_dim = 2k and reset the memory parameters as per the given code. It turns out that the entropy loss is always ZERO. Any ideas on how to fix this?

Thank you in advance.


fluowhy commented Jul 9, 2020

> Hi sjp611,
> a practical solution is to add a small constant (1e-10) to the logarithm argument: log(p + 1e-10). It is not the best option, but at least it is numerically stable.
>
> I suppose adding 1 to all the 0s in the weight w is more suitable, so that the entropy term becomes 0*log(0+1) = 0 for every 0 in w. Do you think so?

Sorry I didn't respond at the time. I don't think that is a suitable solution, because p is used as a probability distribution (even if it isn't a true distribution), so adding 1 to it could defeat the purpose of p.


sjp611 commented Jul 10, 2020

> Hi,
>
> I ran into another problem when I tried to train the model. I set mem_dim = 2k and reset the memory parameters as per the given code. It turns out that the entropy loss is always ZERO. Any ideas on how to fix this?
>
> Thank you in advance.

An entropy loss of zero means that the attention over the memory items is essentially a one-hot vector, i.e. the memory addressing is sparse. You can check whether the model works correctly by inspecting the min, max, and argmax values of the attention weights.
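
A quick, self-contained sketch of that check (the tensor below is just a dummy standing in for your attention weights):

```python
import torch

# Dummy stand-in for the attention weights over the memory slots (batch x N).
att = torch.softmax(torch.randn(8, 2000), dim=-1)

print("min:", att.min().item(), "max:", att.max().item())
print("argmax per sample:", att.argmax(dim=-1).tolist())
# Rows whose max is close to 1 are nearly one-hot, which drives the entropy loss toward 0.
```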

@Wolfybox

> Hi sjp611,
> a practical solution is to add a small constant (1e-10) to the logarithm argument: log(p + 1e-10). It is not the best option, but at least it is numerically stable.
>
> I suppose adding 1 to all the 0s in the weight w is more suitable, so that the entropy term becomes 0*log(0+1) = 0 for every 0 in w. Do you think so?

Nah. Adding 1 to all the zero items will cause an issue in the backward pass during training.

@LiUzHiAn

> Hi,
> I ran into another problem when I tried to train the model. I set mem_dim = 2k and reset the memory parameters as per the given code. It turns out that the entropy loss is always ZERO. Any ideas on how to fix this?
> Thank you in advance.
>
> An entropy loss of zero means that the attention over the memory items is essentially a one-hot vector, i.e. the memory addressing is sparse. You can check whether the model works correctly by inspecting the min, max, and argmax values of the attention weights.

Yes, I checked the attention weights before the hard shrink operation. I found that after the softmax, the attention values are pretty much identical along the memory slot dimension (i.e. the hyperparameter N in the paper, say 2K). No matter how the number of memory slots varies, the result is the same, and these values are always smaller than the shrink_threshold whenever I set it to a value in the interval [1/N, 3/N]. Hence, the entropy loss ends up being ZERO.
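
A toy reproduction of what I am seeing (simplified shrinkage, illustrative numbers):

```python
import torch

N = 2000                                   # number of memory slots (2K)
w = torch.full((1, N), 1.0 / N)            # near-uniform attention after softmax
lam = 2.0 / N                              # shrink_threshold picked from [1/N, 3/N]
w_hat = torch.where(w > lam, w, torch.zeros_like(w))   # simplified hard shrink
entropy = (-w_hat * torch.log(w_hat + 1e-10)).sum()
print(w_hat.sum().item(), entropy.item())  # everything is shrunk away, so the loss is 0
```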


fluowhy commented Jul 22, 2020

> Hi,
> I ran into another problem when I tried to train the model. I set mem_dim = 2k and reset the memory parameters as per the given code. It turns out that the entropy loss is always ZERO. Any ideas on how to fix this?
> Thank you in advance.
>
> An entropy loss of zero means that the attention over the memory items is essentially a one-hot vector, i.e. the memory addressing is sparse. You can check whether the model works correctly by inspecting the min, max, and argmax values of the attention weights.
>
> Yes, I checked the attention weights before the hard shrink operation. I found that after the softmax, the attention values are pretty much identical along the memory slot dimension (i.e. the hyperparameter N in the paper, say 2K). No matter how the number of memory slots varies, the result is the same, and these values are always smaller than the shrink_threshold whenever I set it to a value in the interval [1/N, 3/N]. Hence, the entropy loss ends up being ZERO.

You should try training without the entropy loss and see whether your model learns something without that constraint. If it does, you could try a threshold smaller than 1/N. If it doesn't, I think there may be a bug in your model or your data processing.
It is really difficult to know exactly what your problem is, but I recommend checking http://karpathy.github.io/2019/04/25/recipe/. Please do not treat it as a recipe, but as a source of empirical tips and tricks for debugging your model.
