Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Refac] Fix saving/loading models with device #477

Merged
merged 17 commits into from
Jul 10, 2024
Merged

Conversation

inafergra
Copy link
Collaborator

Fixes #475 following up on the work done by @dominikandreasseitz in this unfinished PR. The unfinished PR also included some refactoring of the TrainConfig class (not related to this bug) which was causing conflicts with main that were messy to solve, thus this new PR. As discussed, it would be better to add the TrainConfig refactoring in a separate PR.

Copy link
Collaborator

@DanieleCucurachi DanieleCucurachi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @inafergra !

Overall looks good, I just pointed out a minor issue in the original qadence code that is still to be fixed.

qadence/ml_tools/saveload.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@dominikandreasseitz dominikandreasseitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @inafergra ,

  1. can we add a test which loads both legacy (without a device) and current checkpoints (with device) and asserts that they are successfully loaded?
  2. i remember @awennersteen proposed on the previous PR to handle both cases and log a error/debug in case the user uses legacy checkpoints. the most important thing is that this implementation handles both.
    thanks!

@inafergra
Copy link
Collaborator Author

thanks @inafergra ,

1. can we add a test which loads both legacy (without a device) and current checkpoints (with device) and asserts that they are successfully loaded?

2. i remember @awennersteen proposed on the previous PR to handle both cases and log a error/debug in case the user uses legacy checkpoints. the most important thing is that this implementation handles both.
   thanks!

Thanks for the comments Dominik. I added tests checking that both QNNs and QuantumModels can be loaded with the legacy ckpts names. The function get_latest_checkpoint_name() can find both legacy and device ckpts.

@gvelikova
Copy link
Collaborator

Hey @inafergra have you tested how this modification with the device works with GPUs?

@inafergra
Copy link
Collaborator Author

Hey @inafergra have you tested how this modification with the device works with GPUs?

Hey @gvelikova, yes I tested it in my local (single) gpu and it works fine

@dominikandreasseitz dominikandreasseitz removed their request for review July 4, 2024 08:29
@dominikandreasseitz
Copy link
Collaborator

hey @inafergra , @Roland-djee will take over the review here!

Copy link
Collaborator

@RolandMacDoland RolandMacDoland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @inafergra. Can't say much since I'm less versed in torch intricacies than @gvelikova or @awennersteen (but slowly curing myself). So I let them be better judges at this point. Approving now but subjected to their final approval.

qadence/ml_tools/saveload.py Show resolved Hide resolved
qadence/ml_tools/saveload.py Outdated Show resolved Hide resolved
qadence/ml_tools/saveload.py Outdated Show resolved Hide resolved
qadence/ml_tools/train_grad.py Show resolved Hide resolved
tests/ml_tools/test_checkpointing.py Show resolved Hide resolved
tests/ml_tools/test_checkpointing.py Outdated Show resolved Hide resolved
@inafergra inafergra dismissed dominikandreasseitz’s stale review July 5, 2024 12:19

Changes are already implemented

@inafergra
Copy link
Collaborator Author

@awennersteen if you don't have any more comments this should be ready to merge now

@inafergra
Copy link
Collaborator Author

Merging this now as discussed with @gvelikova, any additional changes can be included in a future PR solving #484 which also involves checkpointing.

@inafergra inafergra merged commit f51a1eb into main Jul 10, 2024
8 checks passed
@inafergra inafergra deleted the ifg/fix_ckpt_match branch July 10, 2024 09:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactoring Refactoring of legacy code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] get_latest_checkpoint_name() looks for the wrong ckpt name
6 participants