Implementation of algorithm one from the paper #8
base: master
Conversation
src/omniglot/wrapper.py
Outdated
# This line breaks gradient computation for now:
# the meta_layers' requires_grad properties are set to False if
# we call init_adaptation.
# self.model.init_adaptation()
Calling self.model.init_adaptation() produces an error when calling backward() at the end of each meta_batch, since it sets the meta_layers' requires_grad properties to False. We may need to freeze/unfreeze the meta_layers in a more controlled way.
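A minimal sketch of one way to do this freezing and unfreezing in a controlled way, assuming meta_parameters() returns the meta layers' parameters as in the snippets below; the frozen() helper is hypothetical and only illustrates restoring requires_grad after adaptation so the final backward() still sees trainable meta layers:

from contextlib import contextmanager

@contextmanager
def frozen(params):
    # Temporarily set requires_grad=False on the given parameters,
    # restoring the original flags on exit.
    params = list(params)
    saved = [p.requires_grad for p in params]
    try:
        for p in params:
            p.requires_grad_(False)
        yield
    finally:
        for p, flag in zip(params, saved):
            p.requires_grad_(flag)

# Hypothetical usage inside the wrapper:
# with frozen(self.model.meta_parameters()):
#     self.model.init_adaptation()
#     ...run the inner adaptation steps...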
src/omniglot/wrapper.py
Outdated
if meta_train:
    # At the end of collecting K steps for N tasks we do the
    # backward pass.
    backward(self.meta_loss, self.model.meta_parameters(
        include_init=False))
    self._final_meta_update()
Once we have collected k inner iterations for N tasks, we can call the backward pass to compute the gradients.
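A minimal, self-contained sketch of this accumulate-then-backward pattern, assuming the accumulated meta loss is a single differentiable scalar; the meta_backward helper is hypothetical and is not necessarily how the backward call used above is implemented:

import torch

def meta_backward(meta_loss, meta_params):
    # Compute gradients of the accumulated meta loss with respect to the
    # meta parameters only, and add them to the .grad fields.
    grads = torch.autograd.grad(meta_loss, meta_params, allow_unused=True)
    for p, g in zip(meta_params, grads):
        if g is not None:
            p.grad = g if p.grad is None else p.grad + g

# Mirroring the block above: after K steps over N tasks, call
# meta_backward(self.meta_loss, list(self.model.meta_parameters(include_init=False)))
# followed by self._final_meta_update().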
src/omniglot/wrapper.py
Outdated
if meta_train:
    opt = SGD(self.model.optimizer_parameter_groups(tensor=True))
    opt.zero_grad()
    outer_input, outer_target = next(iter(batches))
    l_outer, (l_inner, a1, a2) = step(
        criterion=self.criterion,
        x_inner=inner_input, x_outer=outer_input,
        y_inner=inner_target, y_outer=outer_target,
        model=self.model,
        optimizer=opt, scorer=None)
    self.meta_loss = self.meta_loss + l_outer
    del l_inner, a1, a2
These lines calculate the outer loss at each state of the model parameters \theta_{k}^{\tau}. However, I am not sure how we should handle freezing and unfreezing the meta and adaptation layers.
According to the pseudocode, the gradients of \theta_{0} must be collected using \theta_{0:k}^{\tau}. How should we implement this correctly?
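One common way to let the gradients of \theta_{0} flow through the whole trajectory \theta_{0:k}^{\tau} is to keep the inner updates differentiable, carrying the adapted parameters as tensors derived from \theta_{0} instead of mutating the model in place. Below is a minimal sketch under that assumption; functional_forward is a hypothetical helper that runs the model with an explicit parameter list, and this is not how the current wrapper works:

import torch

def unrolled_meta_loss(theta0, inner_batches, outer_batch,
                       functional_forward, criterion, inner_lr=0.1):
    # Run k differentiable inner SGD steps starting from theta_0 and
    # accumulate the outer loss at every intermediate state, so that
    # backward() on the result reaches theta_0 through theta_{0:k}.
    fast = list(theta0)                      # theta_0
    x_out, y_out = outer_batch
    meta_loss = 0.0
    for x_in, y_in in inner_batches:         # k inner steps
        inner_loss = criterion(functional_forward(fast, x_in), y_in)
        grads = torch.autograd.grad(inner_loss, fast, create_graph=True)
        fast = [p - inner_lr * g for p, g in zip(fast, grads)]   # theta_{i+1}
        meta_loss = meta_loss + criterion(functional_forward(fast, x_out), y_out)
    return meta_loss   # meta_loss.backward() populates .grad on theta_0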
# init_objective = INIT_OBJECTIVES[self.init_objective]
# init_objective(model.named_init_parameters(suffix=None),
#                params, self.norm, self.bsz, step_fn)
pass
I have commented out the initialization objective for now. Should we also use leap-based initialization for online learning?
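If an initialization objective is wanted for the online setting, one heavily simplified option is to pull the initialization toward the parameters reached at the end of each task's inner loop. The sketch below is a Reptile-style simplification for illustration, not the actual leap objective from the repository:

import torch

@torch.no_grad()
def pull_init_toward_trajectory(init_params, adapted_params, step_size=0.1):
    # theta_0 <- theta_0 + eps * (theta_K - theta_0): a Reptile-style
    # simplification of pulling the initialization along the task trajectory.
    for p0, pk in zip(init_params, adapted_params):
        p0.add_(step_size * (pk.detach() - p0))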
This PR is the initial effort toward implementing Algorithm 1 for online learning using WarpGrad. I started by analysing the implementation of Algorithm 2. Since the online learning algorithm does not require storing datapoints and model states in a buffer, I have reused the step function from warpgrad.utils inside the inner training loop.

Summary of changes:
- Whether the leap-based initialization from warpgrad.utils should also be applied for online learning is still an open question.
- The step function from warpgrad.utils is called inside the run_batches function of the wrapper class for each of the k inner updates (see the sketch after this list).
- The outer losses are accumulated in the meta_loss property of the wrapper class.
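For reference, here is a simplified outline of how the pieces described above could fit together inside run_batches. It is assembled from the snippets in this thread (step, SGD over optimizer_parameter_groups, meta_loss, backward, _final_meta_update), with the surrounding control flow filled in as an assumption: batches is assumed to be a list of (input, target) pairs, the names SGD, step, and backward are assumed to be imported as in the original file, and this is a sketch, not the actual wrapper code.

def run_batches(self, batches, meta_train=True):
    # Simplified outline: k inner updates per task, accumulating the outer
    # loss into self.meta_loss, followed by one meta backward pass.
    self.meta_loss = 0.0
    for inner_input, inner_target in batches:                # k inner steps
        if meta_train:
            opt = SGD(self.model.optimizer_parameter_groups(tensor=True))
            opt.zero_grad()
            outer_input, outer_target = next(iter(batches))  # outer batch for the meta loss
            l_outer, (l_inner, a1, a2) = step(
                criterion=self.criterion,
                x_inner=inner_input, x_outer=outer_input,
                y_inner=inner_target, y_outer=outer_target,
                model=self.model, optimizer=opt, scorer=None)
            self.meta_loss = self.meta_loss + l_outer
            del l_inner, a1, a2
    if meta_train:
        backward(self.meta_loss,
                 self.model.meta_parameters(include_init=False))
        self._final_meta_update()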