When I tried to run training with python train.py params_x1x3x4_diffusion_mosesaq_20240824 0, as suggested in the README, I got the following error:
RuntimeError: Trying to resize storage that is not resizable
Hi! I have not experienced this error, so I suspect it has something to do with our different training setups or package versions.
To help debug, can you try the following:
Make sure you can successfully run inference code provided by the RUNME_{}.ipynb notebooks.
In train.py, make sure you can call dataset[0] after initializing dataset = HeteroDatset(...)
In train.py, make sure you can call next(iter(train_loader)) after initializing train_loader = torch_geometric.loader.DataLoader(...), with batch_size = 0 and batch_size > 0.
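The checks above can be run in one pass so that the first failing step is obvious. This is only a minimal sketch: `run_checks` is a hypothetical helper, and the plain lists below are stand-ins for the real `dataset` and `train_loader` objects that train.py creates.

```python
import traceback


def run_checks(checks):
    """Run each named zero-argument check; return {name: None or traceback text}."""
    results = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = None  # step succeeded
        except Exception:
            results[name] = traceback.format_exc()  # keep the full traceback
    return results


# Stand-ins so the sketch runs on its own; in train.py, use the real objects.
dataset = [{"x": 1}, {"x": 2}]
train_loader = [dataset[0:2]]

results = run_checks({
    "dataset[0]": lambda: dataset[0],
    "next(iter(train_loader))": lambda: next(iter(train_loader)),
})
for name, error in results.items():
    print(name, "OK" if error is None else "FAILED\n" + error)
```

Any step reported as FAILED comes with its full traceback, which can be pasted into the issue.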
If all of that works, then I would guess it is related to an issue with DDP in PyTorch Lightning on your particular system setup. Are you trying to train with 1 GPU? On a CPU? On multiple GPUs? The parameters in parameters/params_x1x3x4_diffusion_mosesaq_20240824.py specify 'num_gpus': 2 and 'multiprocessing_spawn': True. Either of those could be causing issues with your specific setup.
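One way to test that guess is to rerun with a single-GPU, no-spawn configuration. The sketch below assumes the parameter file exposes a plain dictionary. The variable name `params` and the helper `single_gpu_overrides` are hypothetical; only the two keys and their current values come from the parameter file mentioned above.

```python
import copy


def single_gpu_overrides(params):
    """Return a copy of params set up for one GPU without multiprocessing spawn."""
    debug_params = copy.deepcopy(params)  # leave the original untouched
    debug_params["num_gpus"] = 1
    debug_params["multiprocessing_spawn"] = False
    return debug_params


# Only these two entries come from the thread; the real file has more keys.
params = {"num_gpus": 2, "multiprocessing_spawn": True}
debug_params = single_gpu_overrides(params)
print(debug_params)
```

If training starts with these overrides, the multi-GPU or spawn settings are the likely culprit.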
Also, does this error occur at the start of the training epochs? Or mid-way through training?
Additionally, make sure that your package versions match those listed in the README, particularly PyTorch Lightning, PyTorch, and PyG.
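A quick way to collect the installed versions for comparison is the standard-library `importlib.metadata` module. This is a sketch: `installed_versions` is a hypothetical helper, and the distribution names below are the ones commonly used on PyPI, which is an assumption about how the packages were installed.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed under this name
    return versions


# Distribution names as commonly published on PyPI (an assumption here).
report = installed_versions(["torch", "pytorch-lightning", "torch-geometric"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Posting this output alongside the traceback makes version mismatches easy to spot.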
It would also help if you could provide the complete error traceback.
According to lucidrains/denoising-diffusion-pytorch#248, the solution is to change num_workers in the dataloader to 0, but that resulted in the following error:
Could you please provide some guidance on this?