
Attention mask mismatch issue #1

Open
RENNY-Jenius opened this issue May 14, 2024 · 1 comment

@RENNY-Jenius

After preprocessing the PG19 data and starting training, I consistently run into the following problem:
[WARNING|logging.py:329] 2024-05-14 16:24:22,784 >> LlamaModel is using LlamaSdpaAttention, but torch.nn.functional.scaled_dot_product_attention does not support output_attentions=True. Falling back to the manual attention implementation, but specifying the manual implementation will be required from Transformers version v5.0.0 onwards. This warning can be removed using the argument attn_implementation="eager" when loading the model.
Traceback (most recent call last):
  File "/home/runyu.cai/nope_head_scale/run_clm.py", line 130, in <module>
    main()
  File "/home/runyu.cai/nope_head_scale/run_clm.py", line 91, in main
    train_result: TrainOutput = trainer.train(resume_from_checkpoint=None)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1537, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1854, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2735, in training_step
    loss = self.compute_loss(model, inputs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2758, in compute_loss
    outputs = model(**inputs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1519, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1528, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1523, in forward
    else self._run_ddp_forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward
    return self.module(*inputs, **kwargs)  # type: ignore[index]
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1519, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1528, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1181, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1519, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1528, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1041, in forward
    attention_mask = _prepare_4d_causal_attention_mask(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_attn_mask_utils.py", line 306, in _prepare_4d_causal_attention_mask
    attention_mask = attn_mask_converter.to_4d(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_attn_mask_utils.py", line 136, in to_4d
    expanded_attn_mask = causal_4d_mask.masked_fill(expanded_attn_mask.bool(), torch.finfo(dtype).min)
RuntimeError: The size of tensor a (1024) must match the size of tensor b (750) at non-singleton dimension 3
I have tried adjusting many parameters, but nothing resolves this. How can I fix it?

@RmZeta2718
Collaborator

Which transformers version are you using? This project uses 4.35; other versions very likely will not work.
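
A minimal sketch of the version check, assuming the mismatch really is caused by a transformers release outside the 4.35 series; the REQUIRED_SERIES constant, the check itself, and the 4.35.2 pin suggested in the error message are illustrative and not part of this repository:

import transformers

# Version series the maintainer reports this project was developed against
# (assumption for this sketch: anything outside 4.35.x is rejected).
REQUIRED_SERIES = "4.35"

installed = transformers.__version__
if not installed.startswith(REQUIRED_SERIES + "."):
    raise RuntimeError(
        f"transformers {installed} is installed, but this project expects "
        f"{REQUIRED_SERIES}.x. Try: pip install 'transformers==4.35.2'"
    )
print(f"transformers {installed} looks compatible")

If the versions differ, pinning transformers to a 4.35.x release and rerunning run_clm.py is worth trying before tuning any other training parameters.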
