### KeyError: 'loss' #1340

Open · 2 tasks done
xieyvhuan opened this issue Dec 18, 2024 · 1 comment

@xieyvhuan
System Info

PyTorch 2.5.1
Python 3.12 (Ubuntu 22.04)
CUDA 12.4
transformers == 4.47.0
RTX 4090D (24GB) * 1
System disk: 91% used, 28G/30G
Data disk: 71% used, 36G/50G

Who can help?

@Btlmd

Information

  • The official example scripts
  • My own modified scripts

Reproduction

1. Downloaded the model and installed the dependencies as usual, following the same steps as most online tutorials. I set up all three deployment modes and they all ran fine (a few minor hiccups along the way, all resolved). The exact procedure matches this guide: https://zhuanlan.zhihu.com/p/676106044
2. Following the LoRA fine-tuning example from your official docs, I converted my own dataset (5,000 samples) into the required format, created an environment, and successfully installed every dependency needed for fine-tuning. I then launched fine-tuning with my own paths, using this command:

CUDA_VISIBLE_DEVICES=0 NCCL_P2P_DISABLE="1" NCCL_IB_DISABLE="1" python finetune_hf.py  /root/autodl-tmp/ChatGLM3/finetune_demo/data/AdvertiseGen  /root/autodl-tmp/ChatGLM3/chatglm3-6b  configs/lora.yaml

3. I have confirmed that the bug occurs during the evaluation stage (the error appears as soon as the evaluation step is reached), when the loss value is returned, but I cannot tell the exact cause. Traceback below:

/root/autodl-tmp/ChatGLM3/finetune_demo/finetune_hf.py:539 in main

   536      )
   537
   538      if auto_resume_from_checkpoint.upper() == "" or auto_resume_from_checkpoint is None:
 ❱ 539          trainer.train()
   540      else:
   541          def do_rf_checkpoint(sn):
   542              model.gradient_checkpointing_enable()

/root/miniconda3/envs/ChatGLM3-Tuning/lib/python3.10/site-packages/transformers/trainer.py:2164 in train

   2161              finally:
   2162                  hf_hub_utils.enable_progress_bars()
   2163          else:
 ❱ 2164              return inner_training_loop(
   2165                  args=args,
   2166                  resume_from_checkpoint=resume_from_checkpoint,
   2167                  trial=trial,

/root/miniconda3/envs/ChatGLM3-Tuning/lib/python3.10/site-packages/transformers/trainer.py:2589 in _inner_training_loop

   2586                          self.state.global_step += 1
   2587                          self.state.epoch = epoch + (step + 1 + steps_skipped) / steps_in
   2588                          self.control = self.callback_handler.on_step_end(args, self.stat
 ❱ 2589                          self._maybe_log_save_evaluate(
   2590                              tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eva
   2591                          )
   2592                      else:

/root/miniconda3/envs/ChatGLM3-Tuning/lib/python3.10/site-packages/transformers/trainer.py:3047 in _maybe_log_save_evaluate

   3044
   3045          metrics = None
   3046          if self.control.should_evaluate:
 ❱ 3047              metrics = self._evaluate(trial, ignore_keys_for_eval)
   3048              is_new_best_metric = self._determine_best_metric(metrics=metrics, trial=tria
   3049
   3050              if self.args.save_strategy == SaveStrategy.BEST:

/root/miniconda3/envs/ChatGLM3-Tuning/lib/python3.10/site-packages/transformers/trainer.py:3001 in _evaluate

   2998              )
   2999
   3000      def _evaluate(self, trial, ignore_keys_for_eval, skip_scheduler=False):
 ❱ 3001          metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
   3002          self._report_to_hp_search(trial, self.state.global_step, metrics)
   3003
   3004          # Run delayed LR scheduler now that metrics are populated

/root/miniconda3/envs/ChatGLM3-Tuning/lib/python3.10/site-packages/transformers/trainer_seq2seq.py:195 in evaluate

   192          # We don't want to drop samples in general
   193          self.gather_function = self.accelerator.gather
   194          self._gen_kwargs = gen_kwargs
 ❱ 195          return super().evaluate(eval_dataset, ignore_keys=ignore_keys, metric_key_prefix
   196
   197      def predict(
   198          self,

/root/miniconda3/envs/ChatGLM3-Tuning/lib/python3.10/site-packages/transformers/trainer.py:4051 in evaluate

   4048          start_time = time.time()
   4049
   4050          eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else se
 ❱ 4051          output = eval_loop(
   4052              eval_dataloader,
   4053              description="Evaluation",
   4054              # No point gathering the predictions if there are no metrics, otherwise we d

/root/miniconda3/envs/ChatGLM3-Tuning/lib/python3.10/site-packages/transformers/trainer.py:4245 in evaluation_loop

   4242                      batch_size = observed_batch_size
   4243
   4244              # Prediction step
 ❱ 4245              losses, logits, labels = self.prediction_step(model, inputs, prediction_loss
   4246              main_input_name = getattr(self.model, "main_input_name", "input_ids")
   4247              inputs_decode = (
   4248                  self._prepare_input(inputs[main_input_name]) if "inputs" in args.include

/root/autodl-tmp/ChatGLM3/finetune_demo/finetune_hf.py:82 in prediction_step

    79          if self.args.predict_with_generate:
    80              output_ids = inputs.pop('output_ids')
    81          input_ids = inputs['input_ids']
 ❱  82          loss, generated_tokens, labels = super().prediction_step(
    83              model, inputs, prediction_loss_only, ignore_keys, **gen_kwargs
    84          )
    85          generated_tokens = generated_tokens[:, input_ids.size()[1]:]

/root/miniconda3/envs/ChatGLM3-Tuning/lib/python3.10/site-packages/transformers/trainer_seq2seq.py:354 in prediction_step

   351                  if self.label_smoother is not None:
   352                      loss = self.label_smoother(outputs, inputs["labels"]).mean().detach(
   353                  else:
 ❱ 354                      loss = (outputs["loss"] if isinstance(outputs, dict) else outputs[0]
   355              else:
   356                  loss = None
   357

/root/miniconda3/envs/ChatGLM3-Tuning/lib/python3.10/site-packages/transformers/utils/generic.py:431 in __getitem__

   428      def __getitem__(self, k):
   429          if isinstance(k, str):
   430              inner_dict = dict(self.items())
 ❱ 431              return inner_dict[k]
   432          else:
   433              return self.to_tuple()[k]

KeyError: 'loss'
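For context on that last frame: ModelOutput.__getitem__ builds a dict from only the fields that were actually populated, so indexing "loss" raises a bare KeyError whenever the forward pass did not attach a loss to its output. A minimal sketch of that failure mode in isolation (the output class and tensor shape here are arbitrary, just for illustration):

import torch
from transformers.modeling_outputs import CausalLMOutputWithPast

# loss was never computed, so the field stays None and is dropped
# from the ModelOutput's dict view entirely.
outputs = CausalLMOutputWithPast(logits=torch.zeros(1, 4, 32))

print("loss" in outputs.keys())  # False
print(outputs["loss"])           # KeyError: 'loss', as in generic.py:431 above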

### Expected behavior

I would appreciate guidance on how to solve this problem, or at least a pointer to where it comes from. Thank you!
@cosin1415

Downgrade the transformers version to 4.40.2.
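
For anyone hitting the same error, a minimal sketch of that downgrade inside the conda environment shown in the traceback (assuming pip is available in it):

conda activate ChatGLM3-Tuning
pip install transformers==4.40.2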
