I followed the finetune.py procedure exactly, and it also ran successfully on a test dataset, but once I switch to my own data it fails.
Environment: PyTorch 2.1.2 + CUDA 12.1
C:\Users\leiliji\anaconda3\envs\chatglm3-demo\lib\site-packages\torch\_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [02:29<00:00, 21.42s/it]
trainable params: 1,949,696 || all params: 6,245,533,696 || trainable%: 0.031217444255383614
--> Model
--> model has 1.949696M params
Map (num_proc=16): 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1775/1775 [00:06<00:00, 257.49 examples/s]
train_dataset: Dataset({
features: ['input_ids', 'labels'],
num_rows: 1775
})
Map (num_proc=16): 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 345/345 [00:06<00:00, 49.66 examples/s]
val_dataset: Dataset({
features: ['input_ids', 'output_ids'],
num_rows: 345
})
Map (num_proc=16): 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 345/345 [00:06<00:00, 50.93 examples/s]
test_dataset: Dataset({
features: ['input_ids', 'output_ids'],
num_rows: 345
})
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore gradient_checkpointing_kwargs in case you passed it). Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method _set_gradient_checkpointing in your model.
max_steps is given, it will override any value given in num_train_epochs
***** Running training *****
Num examples = 1,775
Num Epochs = 2
Instantaneous batch size per device = 1
Total train batch size (w. parallel, distributed & accumulation) = 1
Gradient Accumulation steps = 1
Total optimization steps = 3,550
Number of trainable parameters = 1,949,696
0%| | 0/3550 [00:00<?, ?it/s]
C:\Users\leiliji\anaconda3\envs\chatglm3-demo\lib\site-packages\torch\utils\checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
0%|▎ | 5/3550 [00:50<4:22:50, 4.45s/it]╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ C:\Users\leiliji\PycharmProjects\chatglm3\ChatGLM3\chatglm3_6b_finetune-main\finetune_hf.py:518 │
│ in main │
│ │
│ 515 │ │ tokenizer=tokenizer, │
│ 516 │ │ compute_metrics=functools.partial(compute_metrics, tokenizer=tokenizer), │
│ 517 │ ) │
│ ❱ 518 │ trainer.train() │
│ 519 │ │
│ 520 │ # test stage │
│ 521 │ if test_dataset is not None: │
│ │
│ C:\Users\leiliji\anaconda3\envs\chatglm3-demo\lib\site-packages\transformers\trainer.py:1624 in │
│ train │
│ │
│ 1621 │ │ │ finally: │
│ 1622 │ │ │ │ hf_hub_utils.enable_progress_bars() │
│ 1623 │ │ else: │
│ ❱ 1624 │ │ │ return inner_training_loop( │
│ 1625 │ │ │ │ args=args, │
│ 1626 │ │ │ │ resume_from_checkpoint=resume_from_checkpoint, │
│ 1627 │ │ │ │ trial=trial, │
│ │
│ C:\Users\leiliji\anaconda3\envs\chatglm3-demo\lib\site-packages\transformers\trainer.py:1961 in │
│ inner_training_loop │
│ │
│ 1958 │ │ │ │ │ self.control = self.callback_handler.on_step_begin(args, self.state, │
│ 1959 │ │ │ │ │
│ 1960 │ │ │ │ with self.accelerator.accumulate(model): │
│ ❱ 1961 │ │ │ │ │ tr_loss_step = self.training_step(model, inputs) │
│ 1962 │ │ │ │ │
│ 1963 │ │ │ │ if ( │
│ 1964 │ │ │ │ │ args.logging_nan_inf_filter │
│ │
│ C:\Users\leiliji\anaconda3\envs\chatglm3-demo\lib\site-packages\transformers\trainer.py:2911 in │
│ training_step │
│ │
│ 2908 │ │ │ with amp.scale_loss(loss, self.optimizer) as scaled_loss: │
│ 2909 │ │ │ │ scaled_loss.backward() │
│ 2910 │ │ else: │
│ ❱ 2911 │ │ │ self.accelerator.backward(loss) │
│ 2912 │ │ │
│ 2913 │ │ return loss.detach() / self.args.gradient_accumulation_steps │
│ 2914 │
│ │
│ C:\Users\leiliji\anaconda3\envs\chatglm3-demo\lib\site-packages\accelerate\accelerator.py:1966 │
│ in backward │
│ │
│ 1963 │ │ elif self.scaler is not None: │
│ 1964 │ │ │ self.scaler.scale(loss).backward(**kwargs) │
│ 1965 │ │ else: │
│ ❱ 1966 │ │ │ loss.backward(**kwargs) │
│ 1967 │ │
│ 1968 │ def set_trigger(self): │
│ 1969 │ │ """ │
│ │
│ C:\Users\leiliji\anaconda3\envs\chatglm3-demo\lib\site-packages\torch\_tensor.py:492 in backward │
│ │
│ 489 │ │ │ │ create_graph=create_graph, │
│ 490 │ │ │ │ inputs=inputs, │
│ 491 │ │ │ ) │
│ ❱ 492 │ │ torch.autograd.backward( │
│ 493 │ │ │ self, gradient, retain_graph, create_graph, inputs=inputs │
│ 494 │ │ ) │
│ 495 │
│ │
│ C:\Users\leiliji\anaconda3\envs\chatglm3-demo\lib\site-packages\torch\autograd\__init__.py:251   │
│ in backward │
│ │
│ 248 │ # The reason we repeat the same comment below is that │
│ 249 │ # some Python versions print out the first line of a multi-line function │
│ 250 │ # calls in the traceback and some print out the last line │
│ ❱ 251 │ Variable.execution_engine.run_backward( # Calls into the C++ engine to run the bac │
│ 252 │ │ tensors, │
│ 253 │ │ grad_tensors, │
│ 254 │ │ retain_graph, │
│ │
│ C:\Users\leiliji\anaconda3\envs\chatglm3-demo\lib\site-packages\torch\autograd\function.py:288 │
│ in apply │
│ │
│ 285 │ │ │ │ "of them." │
│ 286 │ │ │ ) │
│ 287 │ │ user_fn = vjp_fn if vjp_fn is not Function.vjp else backward_fn │
│ ❱ 288 │ │ return user_fn(self, *args) │
│ 289 │ │
│ 290 │ def apply_jvp(self, *args): │
│ 291 │ │ # forward_cls is defined by derived class │
│ │
│ C:\Users\leiliji\anaconda3\envs\chatglm3-demo\lib\site-packages\torch\utils\checkpoint.py:288 in │
│ backward │
│ │
│ 285 │ │ │ │ "none of output has requires_grad=True," │
│ 286 │ │ │ │ " this checkpoint() is not necessary" │
│ 287 │ │ │ ) │
│ ❱ 288 │ │ torch.autograd.backward(outputs_with_grad, args_with_grad) │
│ 289 │ │ grads = tuple( │
│ 290 │ │ │ inp.grad if isinstance(inp, torch.Tensor) else None │
│ 291 │ │ │ for inp in detached_inputs │
│ │
│ C:\Users\leiliji\anaconda3\envs\chatglm3-demo\lib\site-packages\torch\autograd\__init__.py:251   │
│ in backward │
│ │
│ 248 │ # The reason we repeat the same comment below is that │
│ 249 │ # some Python versions print out the first line of a multi-line function │
│ 250 │ # calls in the traceback and some print out the last line │
│ ❱ 251 │ Variable.execution_engine.run_backward( # Calls into the C++ engine to run the bac │
│ 252 │ │ tensors, │
│ 253 │ │ grad_tensors, │
│ 254 │ │ retain_graph, │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Expected is_sm80 || is_sm90 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
0%|▎ | 5/3550 [00:52<10:22:52, 10.54s/it]
My GPU is an RTX 2080 Ti. Searching online suggests this is a GPU architecture mismatch. Apart from swapping the card, is there any other way around it?
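For what it's worth, a commonly suggested workaround for this error is sketched below; it is an assumption on my part and has not been verified against this repo. The `Expected is_sm80 || is_sm90` check comes from the flash / memory-efficient scaled-dot-product-attention backward kernels, which only support Ampere (sm_80) and Hopper (sm_90) GPUs, while the 2080 Ti is Turing (sm_75). Forcing PyTorch to fall back to the plain math SDPA implementation before training starts may avoid the failing kernel, at the cost of speed and memory. The placement (near the top of finetune_hf.py, before trainer.train()) is assumed; the backend toggles themselves are standard PyTorch 2.x APIs.

```python
# Hedged workaround sketch: disable the SDPA backends that require sm_80/sm_90
# so attention falls back to the portable "math" implementation on a Turing GPU.
import torch

torch.backends.cuda.enable_flash_sdp(False)           # turn off the flash-attention kernel
torch.backends.cuda.enable_mem_efficient_sdp(False)   # turn off the memory-efficient kernel
torch.backends.cuda.enable_math_sdp(True)             # keep the fallback that runs on sm_75
```

This only helps if the model actually routes attention through torch.nn.functional.scaled_dot_product_attention; if it uses its own fused kernels, the equivalent switch would live in the model's config instead.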