safe_globals are needed to resume training on upcoming PyTorch 2.6 #34631
dvrogozh changed the title from "safe_globals are needed to resume from training on upcoming PyTorch 2.6" to "safe_globals are needed to resume training on upcoming PyTorch 2.6" on Nov 6, 2024
dvrogozh added a commit to dvrogozh/transformers that referenced this issue on Nov 6, 2024
Starting from version 2.4 PyTorch introduces a stricter check for the objects which can be loaded with torch.load(). Starting from version 2.6 loading with weights_only=True requires allowlisting of such objects.
Fixes: huggingface#34631
See: pytorch/pytorch#137602
See: https://pytorch.org/docs/stable/notes/serialization.html#torch.serialization.add_safe_globals
Signed-off-by: Dmitry Rogozhkin <[email protected]>
See #34632 for a potential fix.
@muellerzr @SunMarc: could you please take a look at the issue and PR #34632?
cc @ydshieh, could you take a look as well please? Thanks!
LysandreJik added the PyTorch (Anything PyTorch) and dependencies (Pull requests that update a dependency file) labels on Nov 15, 2024
dvrogozh added a commit to dvrogozh/transformers that referenced this issue on Nov 16, 2024
Starting from version 2.4 PyTorch introduces a stricter check for the objects which can be loaded with torch.load(). Starting from version 2.6 loading with weights_only=True requires allowlisting of such objects.
This commit adds allowlist of some numpy objects used to load model checkpoints. Usage is restricted by context manager. User can still additionally call torch.serialization.add_safe_globals() to add other objects into the safe globals list.
Fixes: huggingface#34631
See: pytorch/pytorch#137602
See: https://pytorch.org/docs/stable/notes/serialization.html#torch.serialization.add_safe_globals
Signed-off-by: Dmitry Rogozhkin <[email protected]>
dvrogozh added a commit to dvrogozh/transformers that referenced this issue on Nov 19, 2024
dvrogozh added a commit to dvrogozh/transformers that referenced this issue on Nov 20, 2024
Starting from version 2.4 PyTorch introduces a stricter check for the objects which can be loaded with torch.load(). Starting from version 2.6 loading with weights_only=True requires allowlisting of such objects.
This commit adds allowlist of some numpy objects used to load model checkpoints. Usage is restricted by context manager. User can still additionally call torch.serialization.add_safe_globals() to add other objects into the safe globals list.
Accelerate library also stepped into same problem and addressed it with PR-3036.
Fixes: huggingface#34631
See: pytorch/pytorch#137602
See: https://pytorch.org/docs/stable/notes/serialization.html#torch.serialization.add_safe_globals
See: huggingface/accelerate#3036
Signed-off-by: Dmitry Rogozhkin <[email protected]>
dvrogozh added a commit to dvrogozh/transformers that referenced this issue on Nov 21, 2024
dvrogozh added a commit to dvrogozh/transformers that referenced this issue on Nov 21, 2024
Starting from version 2.4 PyTorch introduces a stricter check for the objects which can be loaded with torch.load(). Starting from version 2.6 loading with weights_only=True requires allowlisting of such objects.
This commit adds allowlist of some numpy objects used to load model checkpoints. Usage is restricted by context manager. User can still additionally call torch.serialization.add_safe_globals() to add other objects into the safe globals list.
Accelerate library also stepped into same problem and addressed it with PR-3036.
Fixes: huggingface#34631
See: pytorch/pytorch#137602
See: https://pytorch.org/docs/stable/notes/serialization.html#torch.serialization.add_safe_globals
See: huggingface/accelerate#3036
Signed-off-by: Dmitry Rogozhkin <[email protected]>
dvrogozh added a commit to dvrogozh/transformers that referenced this issue on Nov 22, 2024
With:
PyTorch 2.6 flips the default of the weights_only argument of torch.load() to True (done via pytorch/pytorch#137602). With this change, some tests in Hugging Face Transformers start to fail. I did not test everything, but at least these are affected:
tests/trainer/test_trainer.py::TrainerIntegrationTest::test_auto_batch_size_with_resume_from_checkpoint
tests/trainer/test_trainer.py::TrainerIntegrationTest::test_can_resume_training
tests/trainer/test_trainer.py::TrainerIntegrationTest::test_compare_trainer_and_checkpoint_args_logging
tests/trainer/test_trainer.py::TrainerIntegrationTest::test_resume_training_with_frozen_params
tests/trainer/test_trainer.py::TrainerIntegrationTest::test_resume_training_with_gradient_accumulation
tests/trainer/test_trainer.py::TrainerIntegrationTest::test_resume_training_with_safe_checkpoint
tests/trainer/test_trainer.py::TrainerIntegrationTest::test_resume_training_with_shard_checkpoint
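As a rough illustration of why a stricter loader rejects such checkpoints, here is a minimal standard-library sketch of an allowlisting unpickler. It is not PyTorch's implementation: the class `AllowlistUnpickler`, the helper `restricted_loads`, and the use of `Fraction` as a stand-in for a NumPy object are all assumptions made for the example.

```python
import io
import pickle
from collections import OrderedDict
from fractions import Fraction

# Hypothetical default allowlist: "module.name" -> object that may be resolved.
ALLOWED_GLOBALS = {"collections.OrderedDict": OrderedDict}


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that resolves only globals present in an allowlist."""

    def __init__(self, file, extra_globals=()):
        super().__init__(file)
        self._allowed = dict(ALLOWED_GLOBALS)
        # Callers can extend the allowlist for a single load.
        for obj in extra_globals:
            self._allowed[f"{obj.__module__}.{obj.__qualname__}"] = obj

    def find_class(self, module, name):
        key = f"{module}.{name}"
        if key not in self._allowed:
            raise pickle.UnpicklingError(f"Unsupported global: {key}")
        return self._allowed[key]


def restricted_loads(data, extra_globals=()):
    """Unpickle bytes, refusing any global that is not allowlisted."""
    return AllowlistUnpickler(io.BytesIO(data), extra_globals).load()


# Fraction stands in for an object (such as a NumPy array) that is
# not on the default allowlist.
payload = pickle.dumps(OrderedDict(lr=Fraction(1, 100)))

try:
    restricted_loads(payload)
    rejected = False
except pickle.UnpicklingError:
    rejected = True  # fractions.Fraction is not allowlisted by default

# Allowlisting the extra global lets the same payload load.
restored = restricted_loads(payload, extra_globals=[Fraction])
```

The same payload fails or loads depending only on the allowlist, which is the situation the resume-from-checkpoint tests run into once the restricted mode becomes the default.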
What is the right way to handle this case in Hugging Face Transformers? Should Transformers maintain an internal allowlist of safe globals? And/or should the Transformers API be extended so that callers can specify additional safe globals? Or is this the end user's responsibility, with the list maintained in higher-level scripts?
See the log for one of the tests below. It can be reproduced on a single-card system with an NVIDIA A10 or Intel PVC:
CC: @muellerzr @SunMarc