Torch probably needs to be less than 2.0 #70

Open
sgbaird opened this issue Jun 7, 2023 · 4 comments
@sgbaird
Member

sgbaird commented Jun 7, 2023

https://github.com/sparks-baird/matsci-opt-benchmarks/actions/runs/5193086890/jobs/9363184385

def test_matbench_metric_calculator():
        actual_param = userparam_to_crabnetparam(user_param, seed=50)
>       matbench_metric_calculator(actual_param, dummy=True)

tests/crabnet_hyperparameter_test.py:41: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
.tox/default/lib/python3.10/site-packages/matsci_opt_benchmarks/crabnet_hyperparameter/utils/parameters.py:361: in matbench_metric_calculator
    cb.fit(train_df=train_df)
.tox/default/lib/python3.10/site-packages/crabnet/crabnet_.py:409: in fit
    self._train()
.tox/default/lib/python3.10/site-packages/crabnet/crabnet_.py:628: in _train
    self.optimizer.step()
.tox/default/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:69: in wrapper
    return wrapped(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

args = (SWA (
Parameter Group 0
    betas: (0.89992, 0.74995)
    eps: 1e-06
    initial_lr: 0.0001
    lookahead_alpha: 0.5
    lookahead_k: 6
    lookahead_step: 0
    lr: 0.0001
    n_avg: 0
    step_counter: 0
    weight_decay: 0.0
),)
kwargs = {}
self = SWA (
Parameter Group 0
    betas: (0.89992, 0.74995)
    eps: 1e-06
    initial_lr: 0.0001
    lookahead_alpha: 0.5
    lookahead_k: 6
    lookahead_step: 0
    lr: 0.0001
    n_avg: 0
    step_counter: 0
    weight_decay: 0.0
)
_ = [], profile_name = 'Optimizer.step#SWA.step'

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        self, *_ = args
        profile_name = "Optimizer.step#{}.step".format(self.__class__.__name__)
        with torch.autograd.profiler.record_function(profile_name):
            # call optimizer step pre hooks
>           for pre_hook in chain(_global_optimizer_pre_hooks.values(), self._optimizer_step_pre_hooks.values()):
E           AttributeError: 'SWA' object has no attribute '_optimizer_step_pre_hooks'. Did you mean: '_optimizer_step_code'?

.tox/default/lib/python3.10/site-packages/torch/optim/optimizer.py:271: AttributeError
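
The failure mechanism, as far as I can tell: torch >= 2.0 wraps every Optimizer subclass's step() in a hook-dispatching wrapper (the wrapper shown in the traceback) that reads self._optimizer_step_pre_hooks, and that attribute is only created inside Optimizer.__init__. CrabNet's copied SWA wrapper never calls Optimizer.__init__, so the first step() call raises. Below is a minimal sketch that reproduces the same error outside CrabNet; WrapperLikeSWA is illustrative only, not CrabNet code.

    # Minimal sketch reproducing the same AttributeError outside CrabNet.
    # WrapperLikeSWA is illustrative only, not CrabNet code: it mimics an
    # optimizer wrapper that subclasses Optimizer but skips Optimizer.__init__,
    # so the hook dicts that the torch >= 2.0 step wrapper reads never exist.
    import torch
    from torch.optim import SGD, Optimizer

    class WrapperLikeSWA(Optimizer):
        def __init__(self, base_optimizer):
            # deliberately no super().__init__(), mirroring the copied SWA wrapper
            self.base_optimizer = base_optimizer
            self.param_groups = base_optimizer.param_groups
            self.defaults = base_optimizer.defaults
            self.state = base_optimizer.state

        def step(self, closure=None):
            return self.base_optimizer.step(closure)

    model = torch.nn.Linear(2, 1)
    opt = WrapperLikeSWA(SGD(model.parameters(), lr=0.1))
    model(torch.randn(4, 2)).sum().backward()
    opt.step()  # torch >= 2.0: AttributeError: ... no attribute '_optimizer_step_pre_hooks'
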
@sgbaird sgbaird added the bug Something isn't working label Jun 7, 2023
@sgbaird sgbaird self-assigned this Jun 7, 2023
sgbaird added a commit that referenced this issue Jun 19, 2023
@sgbaird
Member Author

sgbaird commented Jun 23, 2023

For me to address this error without forcing torch < 2.0, I will need to adjust CrabNet to use SWA from the official PyTorch implementation rather than the code that Anthony or Kaai copied (probably from a blog) when developing CrabNet.

Aside: I'm not sure how this will affect hyperparameters in the context of matsci-opt-benchmarks and crabnet-hyperparameter.

Official PyTorch SWA
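
For reference, here is a minimal sketch of what the official utilities (torch.optim.swa_utils, available since torch 1.6) look like in use, adapted from the PyTorch docs. The toy model, data, and schedule below are placeholders, not CrabNet settings.

    # Minimal SWA sketch using torch.optim.swa_utils; the toy model, data, and
    # hyperparameters are placeholders, not CrabNet settings.
    import torch
    from torch.optim.swa_utils import SWALR, AveragedModel, update_bn

    model = torch.nn.Linear(10, 1)
    loader = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(20)]
    loss_fn = torch.nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    swa_model = AveragedModel(model)                # running average of the weights
    swa_scheduler = SWALR(optimizer, swa_lr=0.005)  # LR schedule used once averaging starts
    swa_start = 5                                   # epoch at which averaging begins

    for epoch in range(10):
        for x, y in loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
        if epoch >= swa_start:
            swa_model.update_parameters(model)      # fold current weights into the average
            swa_scheduler.step()

    update_bn(loader, swa_model)                    # refresh BatchNorm stats (no-op for Linear)
    preds = swa_model(torch.randn(4, 10))           # predict with the averaged weights
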

Uncited source code within CrabNet

  • Entirety of https://github.com/sparks-baird/CrabNet/blob/main/crabnet/utils/optim.py (in particular, SWA class)
  • class Lamb(Optimizer):
        r"""Implements Lamb algorithm.

        It has been proposed in `Large Batch Optimization for Deep Learning:
        Training BERT in 76 minutes`_.

        Arguments:
            params (iterable): iterable of parameters to optimize or dicts defining
                parameter groups
            lr (float, optional): learning rate (default: 1e-3)
            betas (Tuple[float, float], optional): coefficients used for computing
                running averages of gradient and its square (default: (0.9, 0.999))
            eps (float, optional): term added to the denominator to improve
                numerical stability (default: 1e-8)
            weight_decay (float, optional): weight decay (L2 penalty) (default: 0)
            adam (bool, optional): always use trust ratio = 1, which turns this
                into Adam. Useful for comparison purposes.

        _Large Batch Optimization for Deep Learning: Training BERT in 76
        minutes:
            https://arxiv.org/abs/1904.00962
        """

        def __init__(
            self,
            params,
            lr=1e-3,
            betas=(0.9, 0.999),
            eps=1e-6,
            weight_decay=0,
            adam=False,
            min_trust=None,
        ):
            if not 0.0 <= lr:
                raise ValueError(f"Invalid learning rate: {lr}")
            if not 0.0 <= eps:
                raise ValueError(f"Invalid epsilon value: {eps}")
            if not 0.0 <= betas[0] < 1.0:
                raise ValueError(f"Invalid beta parameter at index 0: {betas[0]}")
            if not 0.0 <= betas[1] < 1.0:
                raise ValueError(f"Invalid beta parameter at index 1: {betas[1]}")
            if min_trust and not 0.0 <= min_trust < 1.0:
                raise ValueError(f"Minimum trust range from 0 to 1: {min_trust}")
            defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay)
            self.adam = adam
            self.min_trust = min_trust
            super(Lamb, self).__init__(params, defaults)

        def step(self, closure=None):
            """Performs a single optimization step.

            Arguments:
                closure (callable, optional): A closure that reevaluates the model
                    and returns the loss.
            """
            loss = None
            if closure is not None:
                loss = closure()

            for group in self.param_groups:
                for p in group["params"]:
                    if p.grad is None:
                        continue

                    grad = p.grad.data
                    if grad.is_sparse:
                        err_msg = (
                            "Lamb does not support sparse gradients, "
                            + "consider SparseAdam instad."
                        )
                        raise RuntimeError(err_msg)

                    state = self.state[p]

                    # State initialization
                    if len(state) == 0:
                        state["step"] = 0
                        # Exponential moving average of gradient values
                        state["exp_avg"] = torch.zeros_like(p.data)
                        # Exponential moving average of squared gradient values
                        state["exp_avg_sq"] = torch.zeros_like(p.data)

                    exp_avg, exp_avg_sq = state["exp_avg"], state["exp_avg_sq"]
                    beta1, beta2 = group["betas"]

                    state["step"] += 1

                    # Decay the first and second moment running average coefficient
                    # m_t
                    exp_avg.mul_(beta1).add_((1 - beta1) * grad)
                    # v_t
                    # exp_avg_sq.mul_(beta2).addcmul_((1 - beta2) * grad *
                    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=(1 - beta2))

                    # Paper v3 does not use debiasing.
                    # bias_correction1 = 1 - beta1 ** state['step']
                    # bias_correction2 = 1 - beta2 ** state['step']
                    # Apply bias to lr to avoid broadcast.
                    step_size = group[
                        "lr"
                    ]  # * math.sqrt(bias_correction2) / bias_correction1

                    weight_norm = p.data.pow(2).sum().sqrt().clamp(0, 10)

                    adam_step = exp_avg / exp_avg_sq.sqrt().add(group["eps"])
                    if group["weight_decay"] != 0:
                        adam_step.add_(group["weight_decay"], p.data)

                    adam_norm = adam_step.pow(2).sum().sqrt()
                    if weight_norm == 0 or adam_norm == 0:
                        trust_ratio = 1
                    else:
                        trust_ratio = weight_norm / adam_norm
                    if self.min_trust:
                        trust_ratio = max(trust_ratio, self.min_trust)
                    state["weight_norm"] = weight_norm
                    state["adam_norm"] = adam_norm
                    state["trust_ratio"] = trust_ratio
                    if self.adam:
                        trust_ratio = 1

                    p.data.add_(-step_size * trust_ratio * adam_step)

            return loss
  • class Lookahead(Optimizer):
        def __init__(self, base_optimizer, alpha=0.5, k=6):
            if not 0.0 <= alpha <= 1.0:
                raise ValueError(f"Invalid slow update rate: {alpha}")
            if not 1 <= k:
                raise ValueError(f"Invalid lookahead steps: {k}")
            defaults = dict(lookahead_alpha=alpha, lookahead_k=k, lookahead_step=0)
            self.base_optimizer = base_optimizer
            self.param_groups = self.base_optimizer.param_groups
            self.defaults = base_optimizer.defaults
            self.defaults.update(defaults)
            self.state = defaultdict(dict)
            # manually add our defaults to the param groups
            for name, default in defaults.items():
                for group in self.param_groups:
                    group.setdefault(name, default)

        def update_slow(self, group):
            for fast_p in group["params"]:
                if fast_p.grad is None:
                    continue
                param_state = self.state[fast_p]
                if "slow_buffer" not in param_state:
                    param_state["slow_buffer"] = torch.empty_like(fast_p.data)
                    param_state["slow_buffer"].copy_(fast_p.data)
                slow = param_state["slow_buffer"]
                slow.add_(group["lookahead_alpha"] * (fast_p.data - slow))
                fast_p.data.copy_(slow)

        def sync_lookahead(self):
            for group in self.param_groups:
                self.update_slow(group)

        def step(self, closure=None):
            # assert id(self.param_groups) == id(self.base_optimizer.param_groups)
            loss = self.base_optimizer.step(closure)
            for group in self.param_groups:
                group["lookahead_step"] += 1
                if group["lookahead_step"] % group["lookahead_k"] == 0:
                    self.update_slow(group)
            return loss

        def state_dict(self):
            fast_state_dict = self.base_optimizer.state_dict()
            slow_state = {
                (id(k) if isinstance(k, torch.Tensor) else k): v
                for k, v in self.state.items()
            }
            fast_state = fast_state_dict["state"]
            param_groups = fast_state_dict["param_groups"]
            return {
                "state": fast_state,
                "slow_state": slow_state,
                "param_groups": param_groups,
            }

        def load_state_dict(self, state_dict):
            fast_state_dict = {
                "state": state_dict["state"],
                "param_groups": state_dict["param_groups"],
            }
            self.base_optimizer.load_state_dict(fast_state_dict)

            # We want to restore the slow state, but share param_groups reference
            # with base_optimizer. This is a bit redundant but least code
            slow_state_new = False
            if "slow_state" not in state_dict:
                print("Loading state_dict from optimizer without Lookahead applied.")
                state_dict["slow_state"] = defaultdict(dict)
                slow_state_new = True
            slow_state_dict = {
                "state": state_dict["slow_state"],
                "param_groups": state_dict[
                    "param_groups"
                ],  # this is pointless but saves code
            }
            super(Lookahead, self).load_state_dict(slow_state_dict)
            self.param_groups = (
                self.base_optimizer.param_groups
            )  # make both ref same container
            if slow_state_new:
                # reapply defaults to catch missing lookahead specific ones
                for name, default in self.defaults.items():
                    for group in self.param_groups:
                        group.setdefault(name, default)

Maybe Lamb is from https://github.com/cybertronai/pytorch-lamb/blob/master/pytorch_lamb/lamb.py?

Maybe Lookahead is from https://github.com/lonePatient/lookahead_pytorch/blob/master/optimizer.py?

Places that require refactoring

... or at least a double check. This might not be comprehensive, but it's a rough idea. A sketch of how the SWA-wrapper bookkeeping could map onto torch.optim.swa_utils follows the list.

  • CrabNet/crabnet/crabnet_.py

    Lines 372 to 391 in b9835dc

    base_optim = Lamb(
        params=self.model.parameters(),
        lr=self.lr,
        betas=self.betas,
        eps=self.eps,
        weight_decay=self.weight_decay,
        adam=self.adam,
        min_trust=self.min_trust,
    )
    optimizer = Lookahead(base_optimizer=base_optim, alpha=self.alpha, k=self.k)
    self.optimizer = SWA(optimizer)

    lr_scheduler = CyclicLR(
        self.optimizer,
        base_lr=self.base_lr,
        max_lr=self.max_lr,
        cycle_momentum=False,
        step_size_up=self.stepsize,
    )
    self.lr_scheduler = lr_scheduler
  • CrabNet/crabnet/crabnet_.py

    Lines 395 to 401 in b9835dc

    self.stepping = True
    self.swa_start = 2 # start at (n/2) cycle (lr minimum)
    self.xswa: List[int] = []
    self.yswa: List[float] = []
    self.lr_list: List[float] = []
    self.discard_n = 3
  • self.lr_list.append(self.optimizer.param_groups[0]["lr"])
  • CrabNet/crabnet/crabnet_.py

    Lines 423 to 432 in b9835dc

    if self.optimizer.discard_count >= self.discard_n:
        if self.verbose:
            print(
                f"Discarded: {self.optimizer.discard_count}/"
                f"{self.discard_n} weight updates, early-stopping now"
            )
        self.optimizer.swap_swa_sgd()
        break

    if not (self.optimizer.discard_count >= self.discard_n):
        self.optimizer.swap_swa_sgd()
  • CrabNet/crabnet/crabnet_.py

    Lines 627 to 633 in b9835dc

    self.optimizer.step()
    self.optimizer.zero_grad()
    if self.stepping:
        self.lr_scheduler.step()

    # hyperparameter updates
    swa_check = self.epochs_step * self.swa_start - 1
  • CrabNet/crabnet/crabnet_.py

    Lines 651 to 661 in b9835dc

    self.optimizer.update_swa(mae_v)
    minima.append(self.optimizer.minimum_found)

    if learning_time and not any(minima):
        self.optimizer.discard_count += 1
        if self.verbose:
            print(f"Epoch {self.epoch} failed to improve.")
            print(
                f"Discarded: {self.optimizer.discard_count}/"
                f"{self.discard_n} weight updates"
            )
  • base_optim = Lamb(params=self.model.parameters())
  • if epoch == epochs - 1 or self.optimizer.discard_count >= self.discard_n:
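
For concreteness, here is a rough sketch of how the SWA-wrapper bookkeeping above (update_swa(mae_v), minimum_found, discard_count, swap_swa_sgd) could map onto torch.optim.swa_utils.AveragedModel. This is not an implemented fix: the toy model, data, thresholds, and variable names are placeholders. Note also that the copied Lookahead class skips Optimizer.__init__ just like SWA does, so it would need the same kind of repair (or a super().__init__ call) to run under torch >= 2.0.

    # Sketch only: placeholder model/data; maps the quoted bookkeeping onto AveragedModel.
    import torch
    from torch.optim.swa_utils import AveragedModel

    model = torch.nn.Linear(10, 1)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # stand-in for Lamb/Lookahead
    loss_fn = torch.nn.L1Loss()
    train = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(10)]
    val_x, val_y = torch.randn(32, 10), torch.randn(32, 1)

    swa_model = AveragedModel(model)
    best_mae = float("inf")
    discard_count, discard_n = 0, 3          # mirrors self.discard_n = 3 above

    for epoch in range(20):
        for x, y in train:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
        with torch.no_grad():
            mae_v = loss_fn(model(val_x), val_y).item()
        if mae_v < best_mae:                 # roughly what update_swa(mae_v)/minimum_found tracked
            best_mae = mae_v
            swa_model.update_parameters(model)
        else:
            discard_count += 1               # epoch failed to improve
        if discard_count >= discard_n:
            break                            # early stopping, as in the quoted lines 423-432

    # roughly what swap_swa_sgd() did: adopt the averaged weights at the end
    model.load_state_dict(swa_model.module.state_dict())
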

cc @anthony-wang @Kaaiian

sgbaird added a commit to sparks-baird/mat_discover that referenced this issue Jun 23, 2023
@cseeg

cseeg commented Jul 27, 2023

What version of pytorch was usually used?

@sgbaird
Member Author

sgbaird commented Jul 27, 2023

Probably just the latest release before 2.0 (i.e., a 1.13.x version). You shouldn't have to do anything special when installing locally on your computer, since the version constraint is already written into the latest requirements.

@sgbaird
Member Author

sgbaird commented Nov 1, 2023

From #73 (reply in thread) by @DavidSiretMarques

I'm using pytorch for other things too, so that isn't an option...

That will make things tough. While not ideal, have you tried forcing installation of PyTorch < 2.0 and seeing if that breaks things in your other code?

Do you know the cause of the problem?

I've dedicated some time to this, but only enough to identify the places that require refactoring and some potential solutions.

It was weird that it worked on my old Mac computer and didn't on my newer Windows computer... they should have the exact same env with the same libraries...

That's pretty interesting. I'm not sure why that would happen, though.
