Adds constant memory backprop #2
base: master
…er, in some tests, the computed gradient is NaN for AffineCoupling, and this bug remains even if O(n) gradient calculations are used.
Great job, the code looks very promising. Unfortunately I'm a bit sleep-deprived today, so I'll focus on smaller refactoring issues, and I'm busy tomorrow, but I'll probably have all of Sunday to merge the code. I've tried to summarize the changes below to make sure I understand them. Please read my summary and point out anything I have misunderstood.

Overall: There are changes in five files. The main merging work will be in the files containing Generator and Layers, since the code for the data loader and the unit tests doesn't really interact with the previous code.

Generator class: Previously, the Generator inherited the functions […]. Merging these changes seems fairly straightforward, since the previous Generator class had no functionality. I understood all of the TODOs with one exception: on line 232 at […]. I really appreciated all the other TODOs. They were very easy to read and understand, and I believe I'll be able to implement all of them.

Layers: There is a new virtual class called LayerWithGrads which all layers should inherit from. It implements […]. I imagine the biggest difficulty will be the Coupling Layers, especially getting them to work together with the multi-scale architecture. I will probably be able to get everything except that to work on Sunday.
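For readers unfamiliar with the idea in the PR title, here is a minimal sketch of constant-memory backprop through invertible layers. It is not the code from this PR: `ToyAffineCoupling`, `forward_stack`, and `backward_stack` are hypothetical names, the layer works on two scalars, and the scale and shift "networks" are assumed to be simple linear functions so the gradients can be written by hand. The point is only that the backward pass can rebuild each layer's input from its output instead of storing activations.

```python
import math


class ToyAffineCoupling:
    """Hypothetical scalar affine coupling: y1 = x1, y2 = x2 * exp(a * x1) + b * x1."""

    def __init__(self, a, b):
        self.a, self.b = a, b

    def forward(self, x1, x2):
        return x1, x2 * math.exp(self.a * x1) + self.b * x1

    def inverse(self, y1, y2):
        return y1, (y2 - self.b * y1) * math.exp(-self.a * y1)

    def backward(self, y, grad_y):
        """Rebuild the input from the output, then apply the chain rule."""
        g1, g2 = grad_y
        x1, x2 = self.inverse(*y)
        scale = math.exp(self.a * x1)
        grad_x = (g1 + g2 * (x2 * scale * self.a + self.b), g2 * scale)
        grad_params = (g2 * x2 * scale * x1, g2 * x1)  # (dL/da, dL/db)
        return (x1, x2), grad_x, grad_params


def forward_stack(layers, x):
    """Run all layers, keeping only the current activation."""
    for layer in layers:
        y1, y2 = layer.forward(*x)
        x = (y2, y1)  # swap halves so both get transformed over depth
    return x


def backward_stack(layers, out, grad_out):
    """Backpropagate from the final output alone; no stored activations."""
    param_grads = []
    for layer in reversed(layers):
        y = (out[1], out[0])  # undo the swap done in forward_stack
        grad_y = (grad_out[1], grad_out[0])
        out, grad_out, grad_params = layer.backward(y, grad_y)
        param_grads.append(grad_params)
    param_grads.reverse()
    return out, grad_out, param_grads


layers = [ToyAffineCoupling(0.3, -0.5), ToyAffineCoupling(-0.2, 0.7)]
final = forward_stack(layers, (0.8, -1.2))
# Loss assumed to be the sum of the two outputs, so dL/d(out) = (1, 1).
recovered_input, grad_input, grad_params = backward_stack(layers, final, (1.0, 1.0))
```

Memory use in this sketch does not grow with the number of layers, at the cost of one extra inverse per layer during the backward pass. Reconstruction is only exact up to floating-point error, which is one reason gradient unit tests are useful for this kind of layer.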
Out of curiosity, did you find any way of benchmarking memory usage on the GPU? The O(\sqrt{L}) gradient checkpointing was implemented by OpenAI, and they have a very nice graph showing how memory consumption grows as the number of residual blocks increases. When I finish all the merging, I'll try to figure out how they measured memory consumption and make a similar plot that also shows the increase in computation time.
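As a rough illustration of where the O(\sqrt{L}) figure comes from, the sketch below counts activations under a simplified model: with a checkpoint every `segment` layers, the backward pass holds all checkpoints plus the activations of one recomputed segment. This is a back-of-the-envelope count, not a measurement of real GPU memory, and `peak_activations` and `best_segment` are hypothetical helper names.

```python
import math


def peak_activations(num_layers, segment):
    """Approximate number of activations alive at once.

    Assumes one checkpoint per segment plus the activations of a single
    segment that is being recomputed during the backward pass.
    """
    checkpoints = math.ceil(num_layers / segment)
    return checkpoints + segment


def best_segment(num_layers):
    """Segment length that minimizes the simplified activation count."""
    return min(range(1, num_layers + 1),
               key=lambda k: peak_activations(num_layers, k))


for n in (16, 64, 256):
    k = best_segment(n)
    print(n, k, peak_activations(n, k), math.isqrt(n))
```

Under this model the best segment length tracks the square root of the layer count, so the peak grows roughly like 2\sqrt{L} rather than L. A real benchmark would still need the framework's own memory statistics, since activation sizes differ between layers.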
Couldn't have summarized it better myself!
Does this model have
What is the error exactly? The unit tests involving these layers were passing on my system.
I am not sure what is causing this; it may be that the […]. We should probably have a discussion about these issues tomorrow.
The architecture is changed a bit, adding a new abstract class for layers to inherit from. Some changes are also made in the `Generator` class. Unit tests for gradient computations are added, and a function to make a data generator from images is added. However, there seems to be a small bug in gradient computations involving the `AffineCoupling` layer, which requires some math to be figured out.
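A gradient unit test of the kind mentioned here is commonly written as a finite-difference comparison. The sketch below is a generic version, not the tests added in this PR: `numerical_grad` and `check_grad` are hypothetical helpers, and the example function is a hand-written scalar affine transform with assumed constants. The check also fails on non-finite gradients, so a NaN shows up as a test failure rather than slipping through a tolerance comparison.

```python
import math


def numerical_grad(f, x, eps=1e-6):
    """Central-difference gradient of a scalar function f at the point x (a list)."""
    grads = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        grads.append((f(hi) - f(lo)) / (2 * eps))
    return grads


def check_grad(f, analytic_grad, x, tol=1e-5):
    """Return True if the analytic gradient is finite and matches finite differences."""
    analytic = analytic_grad(x)
    if not all(math.isfinite(g) for g in analytic):
        return False  # NaN or inf gradients fail immediately
    numeric = numerical_grad(f, x)
    return all(abs(a - n) <= tol * max(1.0, abs(a), abs(n))
               for a, n in zip(analytic, numeric))


# Example: y = x2 * exp(A * x1) + B * x1, with assumed constants A and B.
A, B = 0.5, -1.0


def affine(x):
    return x[1] * math.exp(A * x[0]) + B * x[0]


def affine_grad(x):
    scale = math.exp(A * x[0])
    return [x[1] * A * scale + B, scale]


ok = check_grad(affine, affine_grad, [0.8, -1.2])
```

Running the same check at several input points, including ones with large scale arguments, can help narrow down where a gradient first becomes non-finite.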