
Implementation of algorithm one from the paper #8

Open
wants to merge 5 commits into master

Conversation


@rcmalli commented Feb 17, 2021

This PR is an initial effort at implementing Algorithm 1 for online learning with WarpGrad. I started by analysing the implementation of Algorithm 2. Since the online learning algorithm does not require storing data points and model states in a buffer, I have reused the step function from warpgrad.utils inside the inner training loop.

Summary of changes:

  • A new wrapper for the online algorithm has been added. It reuses functions from warpgrad.utils.
  • A simple updater class has been added. For now it is only a placeholder and does nothing in the backward pass. I am not sure whether Leap-based initialization should also be applied for online learning.
  • The step function is called inside the wrapper's run_batches function for each of the K inner updates.
  • The generated losses are accumulated via the wrapper's meta_loss property (see the sketch below).
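For orientation, here is a minimal sketch of the inner loop described above. It is illustrative only: run_batches_sketch, wrapper, and batches are hypothetical stand-ins, while step and its keyword signature are taken from warpgrad.utils as used in this PR.

from torch.optim import SGD
from warpgrad.utils import step

def run_batches_sketch(wrapper, batches, inner_steps):
    # One meta-training pass over a task: K inner updates, accumulating
    # the outer loss into wrapper.meta_loss instead of buffering data
    # points and model states.
    opt = SGD(wrapper.model.optimizer_parameter_groups(tensor=True))
    it = iter(batches)
    for _ in range(inner_steps):
        inner_input, inner_target = next(it)
        outer_input, outer_target = next(it)
        opt.zero_grad()
        l_outer, (l_inner, a1, a2) = step(
            criterion=wrapper.criterion,
            x_inner=inner_input, x_outer=outer_input,
            y_inner=inner_target, y_outer=outer_target,
            model=wrapper.model,
            optimizer=opt, scorer=None)
        wrapper.meta_loss = wrapper.meta_loss + l_outer
        del l_inner, a1, a2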

# This line breaks gradient computation for now:
# the meta_layers' requires_grad properties are set to False if
# we call init_adaptation
# self.model.init_adaptation()
@rcmalli (Author) commented Feb 17, 2021

Calling self.model.init_adaptation() produces an error when backward() is called at the end of each meta batch, since it sets the meta layers' requires_grad properties to False. We may need to freeze/unfreeze the meta layers in a more controlled way, e.g. with a helper like the sketch below.
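One possible pattern for that, as a hypothetical helper (not part of this PR or the warpgrad package): a context manager that snapshots each parameter's requires_grad flag and restores it on exit, so init_adaptation cannot leave the meta layers permanently frozen.

from contextlib import contextmanager

@contextmanager
def restore_requires_grad(parameters):
    # Snapshot requires_grad flags and restore them on exit.
    params = list(parameters)
    saved = [p.requires_grad for p in params]
    try:
        yield
    finally:
        for p, flag in zip(params, saved):
            p.requires_grad_(flag)

# Usage sketch: init_adaptation may flip flags inside the block,
# but the meta parameters are trainable again afterwards.
# with restore_requires_grad(self.model.meta_parameters()):
#     self.model.init_adaptation()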

Comment on lines 236 to 241
if meta_train:
    # At the end of collecting K steps for each of the N tasks
    # we do the backward pass.
    backward(self.meta_loss,
             self.model.meta_parameters(include_init=False))
    self._final_meta_update()
@rcmalli (Author) commented:

Once we have collected K inner iterations for each of the N tasks, we can call the backward pass to compute the gradients (a toy illustration of this pattern follows).
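As a standalone toy example (not the repo's backward helper), this is the accumulate-then-backward pattern: sum the outer losses over all N tasks and K steps, then differentiate once.

import torch

theta = torch.zeros(3, requires_grad=True)  # stands in for meta parameters
meta_loss = torch.zeros(())
for task in range(4):          # N tasks
    for k in range(5):         # K inner steps
        meta_loss = meta_loss + ((theta - task) ** 2).sum()
# A single backward after collecting all N * K outer losses; the gradient
# w.r.t. theta sums contributions from every step of every task.
(grads,) = torch.autograd.grad(meta_loss, [theta])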

Comment on lines 297 to 308
if meta_train:
    opt = SGD(self.model.optimizer_parameter_groups(tensor=True))
    opt.zero_grad()
    outer_input, outer_target = next(iter(batches))
    l_outer, (l_inner, a1, a2) = step(
        criterion=self.criterion,
        x_inner=inner_input, x_outer=outer_input,
        y_inner=inner_target, y_outer=outer_target,
        model=self.model,
        optimizer=opt, scorer=None)
    self.meta_loss = self.meta_loss + l_outer
    del l_inner, a1, a2
@rcmalli (Author) commented Feb 18, 2021

These lines calculate the outer loss at each state of the model parameters \theta_{k}^{\tau}. However, I am not sure how we should handle freezing and unfreezing the meta and adaptation layers.

According to the pseudocode, the gradients of \theta_{0} must be collected using \theta_{0:k}^{\tau}. How should we implement this correctly? (A toy illustration of one generic approach follows.)
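For reference, one generic way to let the gradients of \theta_{0} flow through the whole trajectory \theta_{0:k}^{\tau} is ordinary backprop-through-training, as in MAML; whether Algorithm 1 intends this or a cheaper approximation is exactly the open question. A standalone toy example:

import torch

theta0 = torch.ones(2, requires_grad=True)  # stands in for \theta_{0}
lr = 0.1
theta = theta0
outer_loss = torch.zeros(())
for k in range(3):                          # K inner steps
    inner_loss = (theta ** 2).sum()
    # create_graph=True keeps theta_{k+1} differentiable w.r.t. theta_0.
    (g,) = torch.autograd.grad(inner_loss, theta, create_graph=True)
    theta = theta - lr * g
    outer_loss = outer_loss + (theta ** 2).sum()
# Gradient of the accumulated outer loss w.r.t. the initialization.
(grad_theta0,) = torch.autograd.grad(outer_loss, theta0)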

Comment on lines +59 to +62
# init_objective = INIT_OBJECTIVES[self.init_objective]
# init_objective(model.named_init_parameters(suffix=None),
#                params, self.norm, self.bsz, step_fn)
pass
@rcmalli (Author) commented:

I have commented out the initialization objective for now. Should we also use the Leap-based initialization for online learning?
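For context, a simplified Reptile-style sketch of what an initialization objective could look like (hypothetical and deliberately crude; Leap's actual objective is defined over the whole learning trajectory, which the online algorithm does not buffer):

import torch

def init_update_sketch(init_params, adapted_params_per_task, beta=0.1):
    # Pull each initialization parameter theta_0 toward the task-adapted
    # parameters theta_K, averaged over tasks (hypothetical stand-in,
    # not the Leap objective referenced by INIT_OBJECTIVES).
    with torch.no_grad():
        for i, p0 in enumerate(init_params):
            delta = torch.zeros_like(p0)
            for adapted in adapted_params_per_task:
                delta += adapted[i] - p0
            p0 += beta * delta / len(adapted_params_per_task)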
