
Question about gradients during training #27

Open
genzhengmiaohong opened this issue Feb 27, 2024 · 4 comments

Comments

@genzhengmiaohong

Hello, when I modified train.py to train the network, the final gradient computation on the loss raised the following error: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation. Do you know how to solve this? My CUDA version is 12.2, so the versions in requirements.txt don't fit my setup; I first tried torch 2.1.0 and then switched to 2.2.1+cu118, and the problem appears with both. Looking forward to your reply.

@tangyz213

Have you solved it? I'm running into the same problem.

@ByChelsea
Owner

ByChelsea commented Mar 1, 2024

Can you provide more detailed error information, please? I need to pinpoint the location of the error.

@yangzc0214

yangzc0214 commented Apr 7, 2024

> Can you provide more detailed error information, please? I need to pinpoint the location of the error.

Traceback (most recent call last):
  File "train.py", line 177, in <module>
    train(args)
  File "train.py", line 140, in train
    loss.backward()
  File "C:\Users\yzc\.conda\envs\APRIL_GAN\lib\site-packages\torch\_tensor.py", line 522, in backward
    torch.autograd.backward(
  File "C:\Users\yzc\.conda\envs\APRIL_GAN\lib\site-packages\torch\autograd\__init__.py", line 266, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.HalfTensor [8, 1369, 768]], which is output 0 of DivBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
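
As the hint at the end suggests, enabling anomaly detection makes backward() also report the forward operation that performed the in-place write. A minimal sketch of the same class of error (stand-in code, not this repo's):

import torch

# With anomaly detection on, the RuntimeError below also prints a stack
# trace pointing at the forward op that was later modified in place.
torch.autograd.set_detect_anomaly(True)

x = torch.randn(8, 4, requires_grad=True)
w = torch.randn(8, 4, requires_grad=True)
y = x / x.norm(dim=-1, keepdim=True)  # y is "output 0 of DivBackward0"
z = (y * w).sum()                     # this mul saves y for its backward pass
y /= 2                                # in-place write bumps y's version counter
z.backward()                          # RuntimeError: ... at version 1; expected version 0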


My env: Windows 11, torch 2.2.2+cu121.
In my env, I modified line 122 of train.py to the following, and the error disappeared:

patch_tokens[layer] = patch_tokens[layer] / patch_tokens[layer].norm(dim=-1, keepdim=True)
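
For anyone wondering why this helps: the in-place form (presumably "patch_tokens[layer] /= ..." originally, given the DivBackward0 in the trace) mutates a tensor that an earlier op already saved for its backward pass and bumps its version counter, while the out-of-place division allocates a new tensor and only rebinds the list entry, leaving the saved tensor untouched. A hedged illustration with stand-in tensors, not the repo's actual variables:

import torch

x = torch.randn(2, 5, 8, requires_grad=True)
w = torch.randn(8, requires_grad=True)

t = x * w                              # t is an intermediate in the graph
u = (t * w).sum()                      # this mul saves t for its backward pass
t = t / t.norm(dim=-1, keepdim=True)   # out-of-place: only rebinds the name
(u + t.sum()).backward()               # fine: the saved tensor was never mutated

# The in-place variant "t /= t.norm(dim=-1, keepdim=True)" would mutate the
# saved tensor instead, and backward() would raise the RuntimeError above.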

@oylz

oylz commented Apr 15, 2024

fix it here
