About OOM issue related to 'create_graph=True' #7
Comments
So have you solved the problem? I have the same problem.
I ran into the same issue when training on ImageNet.
I haven't used this code recently, so I can't clearly remember how to avoid this problem.
use_hessian is important if we want the scale factors in the EWGS equation to be based on the Hessian. Line 116 in 56c654c
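For context, a Hessian-based scale estimate of this kind typically looks something like the sketch below. This is only an illustration of the usual Hutchinson-style Hessian-vector-product approach, not the repository's actual update_grad_scales code, and all names in it are hypothetical. The key point is that the first-order gradients must be created with create_graph=True so that a second backward pass is possible, and keeping that graph alive is what drives up GPU memory.

```python
import torch

def hessian_based_scales(loss, params, n_samples=1):
    # Illustrative sketch (hypothetical names): a Hutchinson-style estimate of
    # the average Hessian diagonal per parameter tensor.
    # create_graph=True keeps the graph of the first backward pass alive so the
    # gradient of the gradient can be taken -- this is the memory-heavy part.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    scales = []
    for p, g in zip(params, grads):
        acc = torch.zeros((), device=p.device)
        for _ in range(n_samples):
            v = torch.randint_like(p, 0, 2) * 2 - 1            # Rademacher +/-1 vector
            hv = torch.autograd.grad(g, p, grad_outputs=v,     # Hessian-vector product
                                     retain_graph=True)[0]
            acc = acc + (v * hv).sum()
        scales.append(acc / (n_samples * p.numel()))           # average diagonal estimate
    return scales
```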
Dear Author,
First of all, thank you for sharing this nice code.
Regarding quantized training on CIFAR-10:
Have you ever run into OOM issues caused by loss.backward(create_graph=True) in update_grad_scales? (A simplified sketch of the pattern is at the end of this post.)
When I ran it with the arguments below, I hit a "RuntimeError: CUDA out of memory" error:
python train_quant.py --gpu_id '0' \
    --weight_levels 8 \
    --act_levels 8 \
    --baseline False \
    --use_hessian True \
    --load_pretrain True \
    --pretrain_path '../results/ResNet20_CIFAR10/fp/checkpoint/last_checkpoint.pth' \
    --log_dir '../results/ResNet20_CIFAR10/ours(hess)/W8A8/'
Do you have any idea how to avoid this issue?
Thank you in advance.
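For reference, the pattern that runs out of memory is roughly the following (heavily simplified; the model, data, and sizes are placeholders, not the repository's actual code). Keeping the graph for a second backward pass means the activations from the forward pass stay resident after backward() returns, so a batch that fits during ordinary training can run out of memory here.

```python
import torch
import torch.nn as nn

# Heavily simplified stand-in for the quantized network and one CIFAR-10 batch;
# none of these names or sizes come from the repository.
model = nn.Sequential(nn.Linear(3 * 32 * 32, 4096), nn.ReLU(), nn.Linear(4096, 10)).cuda()
images = torch.randn(256, 3 * 32 * 32, device="cuda")
labels = torch.randint(0, 10, (256,), device="cuda")

torch.cuda.reset_peak_memory_stats()
loss = nn.functional.cross_entropy(model(images), labels)

# An ordinary training step would call loss.backward() and free the graph here.
# For Hessian-based scales the graph must outlive the first backward pass:
loss.backward(create_graph=True)

print(torch.cuda.max_memory_allocated() / 2**20, "MiB peak")  # noticeably higher than plain backward()
```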