After preprocessing the PG19 data, I started training and keep running into a problem:
[WARNING|logging.py:329] 2024-05-14 16:24:22,784 >> LlamaModel is using LlamaSdpaAttention, but torch.nn.functional.scaled_dot_product_attention does not support output_attentions=True. Falling back to the manual attention implementation, but specifying the manual implementation will be required from Transformers version v5.0.0 onwards. This warning can be removed using the argument attn_implementation="eager" when loading the model.
Traceback (most recent call last):
File "/home/runyu.cai/nope_head_scale/run_clm.py", line 130, in
main()
File "/home/runyu.cai/nope_head_scale/run_clm.py", line 91, in main
train_result: TrainOutput = trainer.train(resume_from_checkpoint=None)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1537, in train
return inner_training_loop(
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1854, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2735, in training_step
loss = self.compute_loss(model, inputs)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2758, in compute_loss
outputs = model(**inputs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1519, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1528, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1523, in forward
else self._run_ddp_forward(*inputs, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward
return self.module(*inputs, **kwargs) # type: ignore[index]
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1519, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1528, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1181, in forward
outputs = self.model(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1519, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1528, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1041, in forward
attention_mask = _prepare_4d_causal_attention_mask(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_attn_mask_utils.py", line 306, in _prepare_4d_causal_attention_mask
attention_mask = attn_mask_converter.to_4d(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_attn_mask_utils.py", line 136, in to_4d
expanded_attn_mask = causal_4d_mask.masked_fill(expanded_attn_mask.bool(), torch.finfo(dtype).min)
RuntimeError: The size of tensor a (1024) must match the size of tensor b (750) at non-singleton dimension 3
I have tried adjusting many parameters, but nothing has resolved this. Could you advise how to fix this issue?
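For reference, the warning at the top says it can be silenced by passing attn_implementation="eager" when loading the model, and the RuntimeError means the attention_mask's sequence length (750) does not match the input_ids length (1024) at the point where the 4D causal mask is built. Below is a minimal sketch of both points, assuming a standard Hugging Face AutoModel loading path; the checkpoint name is a placeholder, not necessarily the one used in this repo:

```python
# Minimal sketch (not the repo's actual code).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint; substitute your own

# Passing attn_implementation="eager" is what the warning above suggests to avoid
# the SDPA fallback notice when output_attentions=True is requested.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    attn_implementation="eager",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

batch = tokenizer("some text", return_tensors="pt")

# The sequence dimensions of input_ids and attention_mask must agree; otherwise
# _prepare_4d_causal_attention_mask raises the size-mismatch error in the traceback.
assert batch["input_ids"].shape[-1] == batch["attention_mask"].shape[-1]
```

Purely as a guess, a mismatch like 1024 vs. 750 can arise during preprocessing when input_ids are re-chunked into fixed-length blocks (e.g. 1024 tokens) while the original, shorter attention_mask is carried along unchanged.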