Feature request
The WSD scheduler should calculate the number of stable steps in trainer.py. In addition, if num_warmup_steps is provided in the kwargs, schedule_func should respect it.
My guess is that the intention is for the learning rate to decay to its minimum and stay there until the end of training. But if min_lr_ratio is left at its default of 0, wouldn't the learning rate then be 0 for every step after the decay phase? I would appreciate some insight on this if possible.
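To make the question concrete, here is a minimal sketch of how I understand the schedule's learning-rate multiplier. It is my reading of the behavior, not necessarily the library's exact code: the function name wsd_lr_multiplier is hypothetical, and the cosine decay shape and the num_cycles default are assumptions.

```python
import math


def wsd_lr_multiplier(
    step: int,
    num_warmup_steps: int,
    num_stable_steps: int,
    num_decay_steps: int,
    min_lr_ratio: float = 0.0,
    num_cycles: float = 0.5,  # assumed default: half a cosine cycle
) -> float:
    """Hypothetical sketch of a warmup-stable-decay multiplier for the base LR."""
    # Phase 1: linear warmup from 0 to 1.
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    # Phase 2: hold the base learning rate.
    if step < num_warmup_steps + num_stable_steps:
        return 1.0
    # Phase 3: cosine decay from 1 down to min_lr_ratio.
    decay_end = num_warmup_steps + num_stable_steps + num_decay_steps
    if step < decay_end:
        progress = (step - num_warmup_steps - num_stable_steps) / max(1, num_decay_steps)
        value = max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))
        return (1.0 - min_lr_ratio) * value + min_lr_ratio
    # Phase 4: every step past the decay phase stays at min_lr_ratio.
    return min_lr_ratio


# With min_lr_ratio left at 0, training past the decay phase runs at LR 0.
after_decay = [wsd_lr_multiplier(s, 10, 20, 10) for s in (40, 100, 1000)]
```

Under these assumptions, any step past warmup + stable + decay gets a multiplier of exactly min_lr_ratio, which is why the default of 0 matters whenever the stable steps are underestimated.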
Motivation
I want to use the WSD scheduler for my training without having to calculate the stable steps myself.
Your contribution
I can contribute to this, but I would first like to hear from the maintainers about any edge cases or scenarios I might have missed. In the meantime, here is my current workaround:
def get_wsd_schedule(
+    num_training_steps: int = 0,
):
    ...
    assert num_stable_steps or num_training_steps, "One of either stable steps or training steps must be provided"
    if not num_stable_steps:
        num_stable_steps = num_training_steps - num_warmup_steps - num_decay_steps

if name == SchedulerType.WARMUP_STABLE_DECAY:
    return schedule_func(optimizer, num_warmup_steps=num_warmup_steps, num_training_steps=num_training_steps, **scheduler_specific_kwargs)
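The stable-step derivation in the workaround can also be written as a standalone helper, which makes the edge cases easier to discuss. This is only a sketch: resolve_num_stable_steps is a hypothetical name, not an existing library function, and raising ValueError on a negative result is my own assumption about the desired behavior.

```python
from typing import Optional


def resolve_num_stable_steps(
    num_training_steps: int,
    num_warmup_steps: int,
    num_decay_steps: int,
    num_stable_steps: Optional[int] = None,
) -> int:
    """Hypothetical helper: derive the stable phase length from the total step count."""
    # An explicitly provided value always wins.
    if num_stable_steps is not None:
        return num_stable_steps
    if num_training_steps <= 0:
        raise ValueError("One of either stable steps or training steps must be provided")
    # Whatever is left after warmup and decay is the stable phase.
    derived = num_training_steps - num_warmup_steps - num_decay_steps
    if derived < 0:
        # Assumed policy: fail loudly instead of silently clamping to 0.
        raise ValueError(
            f"warmup ({num_warmup_steps}) + decay ({num_decay_steps}) steps "
            f"exceed the total number of training steps ({num_training_steps})"
        )
    return derived


derived_stable_steps = resolve_num_stable_steps(
    num_training_steps=1000, num_warmup_steps=100, num_decay_steps=200
)
```

Unlike the `if not num_stable_steps` check in the workaround, this version treats an explicit 0 as a valid value (no stable phase) and only derives the value when it is left unset.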
Additionally, passing num_warmup_steps in lr_scheduler_kwargs results in duplicate keys.
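The duplicate-key problem can be reproduced in plain Python, independent of the library. In the sketch below, schedule_func is a hypothetical stand-in that just echoes its arguments, and call_with_kwargs_precedence is a hypothetical helper showing one way to let the user-supplied kwargs win. Whether the kwargs or the trainer arguments should take precedence is a design decision for the maintainers.

```python
def schedule_func(optimizer, num_warmup_steps, num_stable_steps, num_decay_steps):
    """Hypothetical stand-in for the real schedule factory; it only echoes its arguments."""
    return {
        "num_warmup_steps": num_warmup_steps,
        "num_stable_steps": num_stable_steps,
        "num_decay_steps": num_decay_steps,
    }


def call_with_kwargs_precedence(optimizer, num_warmup_steps, scheduler_specific_kwargs):
    """Merge the arguments so that values in scheduler_specific_kwargs override the default."""
    # Later keys win in a dict literal, so the user-supplied kwargs take precedence.
    merged = {"num_warmup_steps": num_warmup_steps, **scheduler_specific_kwargs}
    return schedule_func(optimizer, **merged)


kwargs = {"num_warmup_steps": 50, "num_stable_steps": 800, "num_decay_steps": 150}

# Passing the same keyword both explicitly and through ** raises a TypeError.
try:
    schedule_func(None, num_warmup_steps=100, **kwargs)
    duplicate_error = None
except TypeError as exc:
    duplicate_error = str(exc)

# Merging first avoids the clash and respects the value from kwargs.
result = call_with_kwargs_precedence(None, 100, kwargs)
```

With the merge in place, the value from the kwargs (50 here) is used instead of the explicitly passed 100.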