RuntimeError: Early stopping conditioned on metric `val_loss` which is not available. Pass in or modify your `EarlyStopping` callback to use any of the following: ``
#52
Comments
I'm running into the same problem.
Hello,
I'm trying to train VPoser with my own training and validation datasets, but training always fails with an error saying val_loss is not available. I guessed the small validation dataset might be the cause, but the error persists after I reduce the batch size. I found that some versions of pytorch-lightning may have this issue. Could you please tell me which version you use, and give me some advice if you have run into this issue as well?
Epoch 0: 88%|████████▊ | 15/17 [00:00<00:00, 37.29it/s, loss=89.1, v_num=29]
Validating: 0it [00:00, ?it/s]
Validating: 0%| | 0/2 [00:00<?, ?it/s]{'weighted_loss': {'loss_kl': tensor(0.0516, device='cuda:0'), 'loss_mesh_rec': tensor(81.0408, device='cuda:0'), 'matrot': tensor(3.5944, device='cuda:0'), 'loss_total': tensor(84.6868, device='cuda:0')}, 'unweighted_loss': {'v2v': tensor(55.1946, device='cuda:0'), 'loss_total': tensor([55.1946], device='cuda:0')}}
{'weighted_loss': {'loss_kl': tensor(0.0580, device='cuda:0'), 'loss_mesh_rec': tensor(86.9938, device='cuda:0'), 'matrot': tensor(3.5297, device='cuda:0'), 'loss_total': tensor(90.5815, device='cuda:0')}, 'unweighted_loss': {'v2v': tensor(59.5597, device='cuda:0'), 'loss_total': tensor([59.5597], device='cuda:0')}}
[1] -- Epoch 0: val_loss:57.38
[1] -- lr is [0.001]
Traceback (most recent call last):
File "/home/drow/human_body_prior/src/train.py", line 54, in <module>
main()
File "/home/drow/human_body_prior/src/train.py", line 50, in main
train_vposer_once(job)
File "/home/drow/human_body_prior/src/human_body_prior/train/vposer_trainer.py", line 351, in train_vposer_once
trainer.fit(model)
File "/home/drow/anaconda3/envs/vae/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 552, in fit
self._run(model)
File "/home/drow/anaconda3/envs/vae/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 917, in _run
self._dispatch()
File "/home/drow/anaconda3/envs/vae/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 985, in _dispatch
self.accelerator.start_training(self)
File "/home/drow/anaconda3/envs/vae/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 92, in start_training
self.training_type_plugin.start_training(trainer)
File "/home/drow/anaconda3/envs/vae/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 161, in start_training
self._results = trainer.run_stage()
File "/home/drow/anaconda3/envs/vae/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 995, in run_stage
return self._run_train()
File "/home/drow/anaconda3/envs/vae/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1044, in _run_train
self.fit_loop.run()
File "/home/drow/anaconda3/envs/vae/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 111, in run
self.advance(*args, **kwargs)
File "/home/drow/anaconda3/envs/vae/lib/python3.7/site-packages/pytorch_lightning/loops/fit_loop.py", line 200, in advance
epoch_output = self.epoch_loop.run(train_dataloader)
File "/home/drow/anaconda3/envs/vae/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 118, in run
output = self.on_run_end()
File "/home/drow/anaconda3/envs/vae/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 235, in on_run_end
self._on_train_epoch_end_hook(processed_outputs)
File "/home/drow/anaconda3/envs/vae/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 276, in _on_train_epoch_end_hook
trainer_hook(processed_epoch_output)
File "/home/drow/anaconda3/envs/vae/lib/python3.7/site-packages/pytorch_lightning/trainer/callback_hook.py", line 109, in on_train_epoch_end
callback.on_train_epoch_end(self, self.lightning_module)
File "/home/drow/anaconda3/envs/vae/lib/python3.7/site-packages/pytorch_lightning/callbacks/early_stopping.py", line 170, in on_train_epoch_end
self._run_early_stopping_check(trainer)
File "/home/drow/anaconda3/envs/vae/lib/python3.7/site-packages/pytorch_lightning/callbacks/early_stopping.py", line 185, in _run_early_stopping_check
logs
File "/home/drow/anaconda3/envs/vae/lib/python3.7/site-packages/pytorch_lightning/callbacks/early_stopping.py", line 134, in _validate_condition_metric
raise RuntimeError(error_msg)
RuntimeError: Early stopping conditioned on metric `val_loss` which is not available. Pass in or modify your `EarlyStopping` callback to use any of the following: ``
Epoch 0: 100%|██████████| 17/17 [00:00<00:00, 35.66it/s, loss=89.1, v_num=29]
Epoch 0: 100%|██████████| 17/17 [00:00<00:00, 32.31it/s, loss=89.1, v_num=29]
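One possible reading of the traceback above: the log prints `val_loss:57.38`, yet the list of available metrics in the error message is empty, so the value may be printed by the trainer without ever being registered where the `EarlyStopping` callback looks it up. The sketch below is a simplified stand-in for that kind of lookup, not pytorch-lightning's actual code; `validate_condition_metric` and `metrics_seen_by_callback` are hypothetical names chosen for illustration. In Lightning, metrics are usually registered by calling `self.log("val_loss", ...)` in the module; whether that is what is missing in the VPoser trainer for this version is an assumption, not something the log confirms.

```python
# Simplified stand-in for the check the traceback ends in.
# NOT pytorch-lightning's implementation; all names here are hypothetical.

def validate_condition_metric(monitor, logged_metrics):
    """Raise if the monitored key is missing from the metrics the callback can see."""
    if monitor not in logged_metrics:
        available = "`, `".join(logged_metrics.keys())
        raise RuntimeError(
            f"Early stopping conditioned on metric `{monitor}` which is not available. "
            "Pass in or modify your `EarlyStopping` callback to use any of the "
            f"following: `{available}`"
        )
    return logged_metrics[monitor]


# Case 1: the value was only printed, never registered, so the dict is empty.
metrics_seen_by_callback = {}
try:
    validate_condition_metric("val_loss", metrics_seen_by_callback)
    failed = False
    message = ""
except RuntimeError as err:
    failed = True
    message = str(err)

# Case 2: the value is registered under the monitored name, so the lookup succeeds.
metrics_seen_by_callback["val_loss"] = 57.38
value = validate_condition_metric("val_loss", metrics_seen_by_callback)
```

With an empty metrics dictionary the message ends in an empty pair of backticks, which matches the error reported here. If that diagnosis holds, the options would be to make sure the validation loss is logged under the name the callback monitors, or to point the callback at a metric that is actually logged.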