### System Info / 系統信息

- Pytorch 2.5.1
- Python 3.12 (ubuntu22.04)
- CUDA 12.4
- transformers==4.47.0
- RTX 4090D (24GB) * 1
- System disk: 91% (28G/30G)
- Data disk: 71% (36G/50G)
### Who can help? / 谁可以帮助到您?

@Btlmd
### Reproduction / 复现过程

1. Downloaded the model and installed the dependencies as usual, following the same steps as most online tutorials. Implemented three deployment methods, and all of them ran correctly (there were minor hiccups along the way, but all were resolved). The exact procedure matches this guide: https://zhuanlan.zhihu.com/p/676106044
2. Followed the official LoRA fine-tuning example from your documentation: converted my own dataset (5,000 samples) into the required format, created the environment, and successfully installed all the dependencies needed for fine-tuning. Then launched fine-tuning with my own paths, using this command:

```shell
CUDA_VISIBLE_DEVICES=0 NCCL_P2P_DISABLE="1" NCCL_IB_DISABLE="1" python finetune_hf.py /root/autodl-tmp/ChatGLM3/finetune_demo/data/AdvertiseGen /root/autodl-tmp/ChatGLM3/chatglm3-6b configs/lora.yaml
```

3. I have confirmed the bug occurs in the evaluation phase (the error appears once the evaluation step is reached), when the loss value is returned, but the root cause is unclear to me. Traceback below:
```
Traceback (most recent call last):
  File "/root/autodl-tmp/ChatGLM3/finetune_demo/finetune_hf.py", line 539, in main
    trainer.train()
  File "/root/miniconda3/envs/ChatGLM3-Tuning/lib/python3.10/site-packages/transformers/trainer.py", line 2164, in train
    return inner_training_loop(
  File "/root/miniconda3/envs/ChatGLM3-Tuning/lib/python3.10/site-packages/transformers/trainer.py", line 2589, in _inner_training_loop
    self._maybe_log_save_evaluate(
  File "/root/miniconda3/envs/ChatGLM3-Tuning/lib/python3.10/site-packages/transformers/trainer.py", line 3047, in _maybe_log_save_evaluate
    metrics = self._evaluate(trial, ignore_keys_for_eval)
  File "/root/miniconda3/envs/ChatGLM3-Tuning/lib/python3.10/site-packages/transformers/trainer.py", line 3001, in _evaluate
    metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
  File "/root/miniconda3/envs/ChatGLM3-Tuning/lib/python3.10/site-packages/transformers/trainer_seq2seq.py", line 195, in evaluate
    return super().evaluate(eval_dataset, ignore_keys=ignore_keys, metric_key_prefix…
  File "/root/miniconda3/envs/ChatGLM3-Tuning/lib/python3.10/site-packages/transformers/trainer.py", line 4051, in evaluate
    output = eval_loop(
  File "/root/miniconda3/envs/ChatGLM3-Tuning/lib/python3.10/site-packages/transformers/trainer.py", line 4245, in evaluation_loop
    losses, logits, labels = self.prediction_step(model, inputs, prediction_loss…
  File "/root/autodl-tmp/ChatGLM3/finetune_demo/finetune_hf.py", line 82, in prediction_step
    loss, generated_tokens, labels = super().prediction_step(
  File "/root/miniconda3/envs/ChatGLM3-Tuning/lib/python3.10/site-packages/transformers/trainer_seq2seq.py", line 354, in prediction_step
    loss = (outputs["loss"] if isinstance(outputs, dict) else outputs[0]…
  File "/root/miniconda3/envs/ChatGLM3-Tuning/lib/python3.10/site-packages/transformers/utils/generic.py", line 431, in __getitem__
    return inner_dict[k]
```

### Expected behavior / 期待表现

I hope someone can guide me through solving this problem, or at least tell me where the issue lies. Thank you!
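For context on the last frame of the traceback above: the `ModelOutput` container in transformers looks string keys up in a plain dict of its populated fields, and fields that were never set (e.g. no loss because the eval forward pass received no `labels`) are simply absent, so `outputs["loss"]` surfaces as a bare `KeyError`. A minimal, self-contained sketch of that mechanism (the `FakeModelOutput` class below is illustrative, not the real transformers class):

```python
# Illustrative stand-in for transformers' ModelOutput: None-valued
# fields are dropped, and string indexing goes through a plain dict,
# so a missing 'loss' surfaces as a bare KeyError.
class FakeModelOutput:
    def __init__(self, **kwargs):
        # Mimic ModelOutput: fields set to None are not stored.
        self._fields = {k: v for k, v in kwargs.items() if v is not None}

    def items(self):
        return self._fields.items()

    def __getitem__(self, k):
        if isinstance(k, str):
            inner_dict = dict(self.items())
            return inner_dict[k]  # KeyError if the field was never populated
        return tuple(self._fields.values())[k]


# An eval forward pass that produced logits but no loss:
outputs = FakeModelOutput(loss=None, logits=[0.1, 0.9])

try:
    loss = outputs["loss"]
except KeyError as err:
    print(f"KeyError: {err}")  # → KeyError: 'loss'
```

This matches the failure mode seen at `generic.py:431`, where `inner_dict[k]` is evaluated with `k == "loss"`.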
Downgrade transformers to 4.40.2.
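For reference, one way to apply that downgrade in a pip-managed environment (a sketch; adjust to your own conda/venv setup):

```shell
pip install "transformers==4.40.2"
```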