
Attention mask mismatch issue #1

Open
RENNY-Jenius opened this issue May 14, 2024 · 1 comment

@RENNY-Jenius

After preprocessing the PG19 data and starting training, I consistently run into the following problem:
[WARNING|logging.py:329] 2024-05-14 16:24:22,784 >> LlamaModel is using LlamaSdpaAttention, but torch.nn.functional.scaled_dot_product_attention does not support output_attentions=True. Falling back to the manual attention implementation, but specifying the manual implementation will be required from Transformers version v5.0.0 onwards. This warning can be removed using the argument attn_implementation="eager" when loading the model.
Traceback (most recent call last):
  File "/home/runyu.cai/nope_head_scale/run_clm.py", line 130, in <module>
    main()
  File "/home/runyu.cai/nope_head_scale/run_clm.py", line 91, in main
    train_result: TrainOutput = trainer.train(resume_from_checkpoint=None)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1537, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1854, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2735, in training_step
    loss = self.compute_loss(model, inputs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2758, in compute_loss
    outputs = model(**inputs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1519, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1528, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1523, in forward
    else self._run_ddp_forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward
    return self.module(*inputs, **kwargs)  # type: ignore[index]
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1519, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1528, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1181, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1519, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1528, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1041, in forward
    attention_mask = _prepare_4d_causal_attention_mask(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_attn_mask_utils.py", line 306, in _prepare_4d_causal_attention_mask
    attention_mask = attn_mask_converter.to_4d(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_attn_mask_utils.py", line 136, in to_4d
    expanded_attn_mask = causal_4d_mask.masked_fill(expanded_attn_mask.bool(), torch.finfo(dtype).min)
RuntimeError: The size of tensor a (1024) must match the size of tensor b (750) at non-singleton dimension 3
I have tried adjusting many parameters, but nothing resolves this. How can I fix it?

@RmZeta2718
Collaborator

Which transformers version are you using? This project uses 4.35; other versions very likely will not work.
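
A minimal sketch of the version check, assuming the mismatch really is caused by a transformers release outside the 4.35 series; the REQUIRED_SERIES constant, the check itself, and the 4.35.2 pin suggested in the error message are illustrative and not part of this repository:

import transformers

# Version series the maintainer reports this project was developed against
# (assumption for this sketch: anything outside 4.35.x is rejected).
REQUIRED_SERIES = "4.35"

installed = transformers.__version__
if not installed.startswith(REQUIRED_SERIES + "."):
    raise RuntimeError(
        f"transformers {installed} is installed, but this project expects "
        f"{REQUIRED_SERIES}.x. Try: pip install 'transformers==4.35.2'"
    )
print(f"transformers {installed} looks compatible")

If the versions differ, pinning transformers to a 4.35.x release and rerunning run_clm.py is worth trying before tuning any other training parameters.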
