RNNs redesign #2500

CarloLucibello · 2024-10-14T21:52:31Z

A complete rework of our recurrent layers, making them more similar to their pytorch counterpart.
This is in line with the proposal in #1365 and should allow to hook into the cuDNN machinery (future PR).
Hopefully, this ends the infinite source of troubles that the recurrent layers have been.

Recur is no more. Mutating its internal state was a source of problems for AD (explicit differentiation for RNN gives wrong results #2185)
Now RNNCell is exported and takes care of the minimal recursion step, i.e. a single time:
- has forward cell(x , h)
- x can be of size in or in x batch_size
- h can be of size out or out x batch_size
- returns hnew of size out or out x batch_size
RNN instead takes in a (batched) sequence and a (batched) hidden state and returns the hidden state for the whole sequence:
- has forward rnn(x, h)
- x can be of size in x len or in x len x batch_size
- h can be of size out or out x batch_size
- returns hnew of size out x len or out x len x batch_size
LSTM and GRU are similarly changed.

Close #2185, close #2341, close #2258, close #1547, close #807, close #1329

Related to #1678

PR Checklist

darsnack · 2024-10-18T13:42:25Z

Fully agree with updating the design to be non-mutating. There are two options we've discussed in the past:

y, h = cell(x, h) like here (I guess this PR removes y as a return value which is fine)
y, cell = cell(x) / y, cell = Flux.apply(cell, x)

Option 1 is outlined in this PR so I won't say anything about it.

Option 2 is a more drastic redesign to make all layers (not just recurrent) non-mutating. Why?

Do a design that covers stateful layers in general (e.g. norm layers) and not just recurrent cells
Keep a nice feature of Flux's current design which is that the model contains all info: parameters, state, flags, etc.

CarloLucibello · 2024-10-20T17:34:49Z

I thought about Option 2. On the upside, it seems a nice intermediate spot between current Flux and Lux. The downside is that the interface would seem a bit exotic to flux and pytorch users. Moreover, it would be problematic for normalization layers.

Also, we need to distinguish between normalization layers and recurrent layers.

Normalization layer at training time update some internal buffers, within a stopgrad barrier. The buffer update has no influence on the output of the layer and the final loss. You typically apply the layer only once during the forward pass. Normalization layers are typically part of larger models (chains or custom structs). Therefore for normalization layers: 1) we haven't had the gradient computation problems we had for recurrent layers; 2) you want the layer with the updated buffer to be inserted back in your model, but this would require a mutating operation or returning a new model.
For recurrent layers, Option 2 would be sensible, but is it worth it? Once you adopt the perspective that a cell takes two inputs, x and h, and gives back an output, hnew, all problems disappear. I think we add complexity for no gain in trying to keep the state internal.

ToucheSir · 2024-10-20T21:53:21Z

The main benefit for keeping the state "internal" or having it be part of a unified interface like apply would be that Chain works with RNNs again. Whether that's worth the extra complexity is the question. Given our priorities, I think it's best left as future work.

finish RNNCell RNN rework LSTMCell LSTM more work gru extended testing runtests add tests finish RNNCell RNN rework LSTMCell LSTM more work gru extended testing reset! deprecation fix test unbreak l2 test fix tests fixes

codecov · 2024-10-22T12:39:58Z

Codecov Report

Attention: Patch coverage is 76.92308% with 30 lines in your changes missing coverage. Please review.

Project coverage is 34.93%. Comparing base (c9bab66) to head (76cf275).
Report is 10 commits behind head on master.

Files with missing lines	Patch %	Lines
src/layers/recurrent.jl	78.74%	27 Missing ⚠️
src/deprecations.jl	0.00%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2500      +/-   ##
==========================================
+ Coverage   33.46%   34.93%   +1.46%     
==========================================
  Files          31       31              
  Lines        1829     1878      +49     
==========================================
+ Hits          612      656      +44     
- Misses       1217     1222       +5

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

CarloLucibello · 2024-10-27T08:52:42Z

I think this is ready.

CarloLucibello · 2024-11-03T10:55:56Z

I will merge this tomorrow if there are no further comments or objections.

CarloLucibello added breaking RNN labels Oct 14, 2024

CarloLucibello added this to the v0.15 milestone Oct 14, 2024

CarloLucibello changed the title ~~Cl/rnn~~ RNN redesign Oct 14, 2024

MartinuzziFrancesco mentioned this pull request Oct 16, 2024

New RNN design in Flux MartinuzziFrancesco/RecurrentLayers.jl#1

Closed

CarloLucibello force-pushed the cl/rnn branch from 8abc593 to aeb421b Compare October 17, 2024 10:06

CarloLucibello changed the title ~~RNN redesign~~ RNNs redesign Oct 17, 2024

CarloLucibello force-pushed the cl/rnn branch from 6f35f2d to 834bed3 Compare October 20, 2024 17:53

CarloLucibello force-pushed the cl/rnn branch 3 times, most recently from cf56985 to 73dae52 Compare October 21, 2024 23:36

CarloLucibello added 4 commits October 22, 2024 14:11

add tests

982cc0d

finish RNNCell RNN rework LSTMCell LSTM more work gru extended testing runtests add tests finish RNNCell RNN rework LSTMCell LSTM more work gru extended testing reset! deprecation fix test unbreak l2 test fix tests fixes

news

88a4ba4

rewrite docs

5629e17

fix docs

76cf275

CarloLucibello force-pushed the cl/rnn branch from aa0655d to 76cf275 Compare October 22, 2024 12:12

CarloLucibello merged commit a015e5a into master Nov 4, 2024
9 of 11 checks passed

This was referenced Dec 7, 2024

changes for Flux v0.15 JuliaGraphs/GraphNeuralNetworks.jl#550

Merged

hotfix LSTM ouput #2547

Merged

CarloLucibello mentioned this pull request Dec 11, 2024

cell output is not clearly distinguishable from the state #2548

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RNNs redesign #2500

RNNs redesign #2500

CarloLucibello commented Oct 14, 2024 •

edited

Loading

darsnack commented Oct 18, 2024 •

edited

Loading

CarloLucibello commented Oct 20, 2024

ToucheSir commented Oct 20, 2024

codecov bot commented Oct 22, 2024 •

edited

Loading

CarloLucibello commented Oct 27, 2024

CarloLucibello commented Nov 3, 2024

RNNs redesign #2500

RNNs redesign #2500

Conversation

CarloLucibello commented Oct 14, 2024 • edited Loading

PR Checklist

darsnack commented Oct 18, 2024 • edited Loading

CarloLucibello commented Oct 20, 2024

ToucheSir commented Oct 20, 2024

codecov bot commented Oct 22, 2024 • edited Loading

Codecov Report

CarloLucibello commented Oct 27, 2024

CarloLucibello commented Nov 3, 2024

CarloLucibello commented Oct 14, 2024 •

edited

Loading

darsnack commented Oct 18, 2024 •

edited

Loading

codecov bot commented Oct 22, 2024 •

edited

Loading