WSD Scheduler to auto infer training steps #34905

Open
wheynelau opened this issue Nov 25, 2024 · 1 comment
Labels: Feature request

Comments

@wheynelau

Feature request

The WSD scheduler should calculate the stable steps automatically in trainer.py. In addition, if num_warmup_steps is provided in lr_scheduler_kwargs, schedule_func should respect the value from the kwargs instead of raising an error.

My guess is that the intention is to decay to the minimum learning rate and stay there until the end of training, but since min_lr_ratio defaults to 0, wouldn't the learning rate then be 0 for the rest of training? I would appreciate some insight on this if possible.
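For context, here is a minimal sketch (not from the original report; the step counts are made up) that prints the learning rate over a short run with the default min_lr_ratio=0, which is the behavior the question above is about:

import torch
from torch import nn
from transformers.optimization import get_wsd_schedule

# toy model/optimizer just to drive the scheduler
model = nn.Linear(4, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# 2 warmup + 4 stable + 4 decay steps; min_lr_ratio keeps its default of 0
scheduler = get_wsd_schedule(
    optimizer, num_warmup_steps=2, num_stable_steps=4, num_decay_steps=4
)

for step in range(12):
    print(step, scheduler.get_last_lr())
    optimizer.step()
    scheduler.step()
# after the decay phase (step 10 onwards) the printed LR stays at 0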

At the moment, selecting the WSD scheduler without explicitly passing num_stable_steps fails with:

TypeError: get_wsd_schedule() missing 1 required positional argument: 'num_stable_steps'

Additionally, trying to pass num_warmup_steps in lr_scheduler_kwargs results in a duplicate keyword argument:

     return schedule_func(optimizer, num_warmup_steps=num_warmup_steps, **scheduler_specific_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: transformers.optimization.get_wsd_schedule() got multiple values for keyword argument 'num_warmup_steps'
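For reference, here is a small reproduction sketch (not from the original report; the step counts are arbitrary) going through get_scheduler directly, which is what the Trainer ends up calling:

from torch import nn
from torch.optim import AdamW
from transformers import get_scheduler

model = nn.Linear(4, 4)
optimizer = AdamW(model.parameters(), lr=1e-3)

# 1) num_decay_steps given but no num_stable_steps -> missing positional argument
try:
    get_scheduler(
        "warmup_stable_decay",
        optimizer=optimizer,
        num_warmup_steps=10,
        num_training_steps=100,
        scheduler_specific_kwargs={"num_decay_steps": 10},
    )
except TypeError as e:
    print(e)  # missing 1 required positional argument: 'num_stable_steps'

# 2) repeating num_warmup_steps in the scheduler kwargs -> duplicate keyword argument
try:
    get_scheduler(
        "warmup_stable_decay",
        optimizer=optimizer,
        num_warmup_steps=10,
        num_training_steps=100,
        scheduler_specific_kwargs={"num_warmup_steps": 10, "num_stable_steps": 80, "num_decay_steps": 10},
    )
except TypeError as e:
    print(e)  # got multiple values for keyword argument 'num_warmup_steps'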

Motivation

I want to use the WSD scheduler for my training, but I do not want to have to calculate the stable steps manually.

Your contribution

I can contribute to this, but I would first like to hear from the maintainers about edge cases or scenarios I might have missed. For now, here is my current workaround:

# transformers/optimization.py: add an optional num_training_steps to get_wsd_schedule
def get_wsd_schedule(
    ...
    + num_training_steps: int = 0,
):
    ...
    # infer the stable phase when only the total number of training steps is given
    assert num_stable_steps or num_training_steps, "One of either stable steps or training steps must be provided"
    if not num_stable_steps:
        num_stable_steps = num_training_steps - num_warmup_steps - num_decay_steps

# transformers/optimization.py, inside get_scheduler(): forward num_training_steps for WSD
    if name == SchedulerType.WARMUP_STABLE_DECAY:
        return schedule_func(
            optimizer,
            num_warmup_steps=num_warmup_steps,
            num_training_steps=num_training_steps,
            **scheduler_specific_kwargs,
        )
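With a change along these lines, usage could look like the sketch below. Note this is hypothetical: inferring the stable phase from the total training steps is exactly the proposed behavior and not something the current release does.

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    max_steps=1000,
    warmup_steps=100,
    lr_scheduler_type="warmup_stable_decay",
    # only the decay length is given; the stable phase would be inferred from max_steps
    lr_scheduler_kwargs={"num_decay_steps": 200},
)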
@wheynelau added the Feature request label on Nov 25, 2024
@Rocketknight1
Member

cc @muellerzr @SunMarc for Trainer
