Initial state in RNNs should not be learnable by default #807

Closed
tanhevg opened this issue Jul 9, 2019 · 5 comments · Fixed by #2500
Comments

tanhevg commented Jul 9, 2019

Currently, the initial states of the RNN cells are initialised as params (e.g. here and here). This causes the initial state to be modified during backprop, which can in turn affect the model when reset! is called.

By default, the initial cell state should stay constant and not be affected by backprop. Having it learned, as it is now, is still useful in some contexts, so that should remain available as an option.
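For concreteness, a minimal sketch of the issue, assuming a Flux version where the cell stores its initial state in a state0 field (the names and API here are illustrative, not verbatim from the linked code):

using Flux

m = RNN(3, 5)                       # Recur wrapping an RNNCell
x = rand(Float32, 3)

ps = Flux.params(m)                 # includes the initial state by default

# Take a gradient with respect to the implicit parameters;
# the initial state picks up a gradient along with Wi, Wh and b.
gs = Flux.gradient(() -> sum(m(x)), ps)

gs[m.cell.state0]                   # non-zero in general: an optimiser step will
                                    # move state0, so reset! then restores a
                                    # *trained* initial state, not the original one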

mkschleg (Contributor) commented Nov 6, 2020

Have there been any updates on this? When I use RNNs with Flux.reset!, there are gradients like those mentioned at the end of #808. My workaround is to keep a copy of the initial hidden state outside the Recur cell and use that to reset the hidden state (this is with Zygote); a sketch follows.
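Roughly, the workaround looks like this (a sketch, assuming the model is a Recur with a mutable state field; the field names are illustrative):

# Snapshot the initial hidden state outside the Recur cell
h0 = deepcopy(m.cell.state0)    # deepcopy also covers tuple states, e.g. LSTM

# ... training steps that may update m.cell.state0 ...

# Manual reset: restore the snapshot instead of calling Flux.reset!,
# which would reset to the (possibly trained) state0
m.state = deepcopy(h0)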

CarloLucibello (Member) commented

@jeremiedb has this been fixed?

jeremiedb (Contributor) commented

Treating the initial state as learnable parameters is still the default behavior for RNNs; nothing was changed in the latest PR.

My position, however, is that the initial state should continue to be treated as learnable parameters. It's debatable which use case is more prevalent; on my end, for NLP or time series, learnable has been the desired behavior.
The more objective argument is that the cuDNN RNN handles initial states as learnable parameters, which makes that an expected default. Given that excluding the initial state from the learnable parameters is fairly trivial, I have difficulty seeing how changing the current behavior would be an improvement.

I could perhaps add a quick section in the docs about initial-state handling, given that I skipped discussing that question explicitly.

jeremiedb (Contributor) commented

Actually, an option to make this more of a first-class citizen could be to take the same approach as the bias option for the Dense layer: add a learnable_init=true option to the RNN cells. With learnable_init=false, the initial state would be set to Zeros, i.e. non-learnable zeros. A rough sketch of what that could look like is below.
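Purely as a hypothetical sketch (learnable_init is the proposed keyword, Flux.Zeros is the non-trainable placeholder already used for bias=false, and the field order in the inner constructor call is illustrative):

# Hypothetical constructor keyword, by analogy with Dense's bias handling
function RNNCell(in::Integer, out::Integer, σ = tanh;
                 init = Flux.glorot_uniform, learnable_init = true)
  state0 = learnable_init ? init(out, 1) : Flux.Zeros(out, 1)
  RNNCell(σ, init(out, in), init(out, out), zeros(Float32, out), state0)
end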

CarloLucibello (Member) commented

Unless there are some cases in which one needs a non-learnable, non-zero initial state, that seems like a good solution.
That, together with showing in the docs some examples along the lines of

# Non-trainable state0 for all RNN cells: overload Flux.trainable
# so that state0 is excluded from the parameter collection
Flux.trainable(m::Flux.RNNCell) = (m.Wi, m.Wh, m.b)

# Or, alternatively, exclude state0 from params for a specific cell only
ps = Flux.params(m)
delete!(ps, m.state0)

should solve the problem.
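As a quick, illustrative check that either approach worked, one can verify that state0 no longer shows up among the collected parameters:

# state0 should no longer appear in the implicit parameter set
ps = Flux.params(m)
@assert !any(p === m.state0 for p in ps)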
