Describe the bug
Training with the shell script tools/single_train.sh reports a runtime error:
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
Because of this, the program also cannot be debugged in PyCharm with the script parameters: configs/classification/cifar10/mixups/basic/r18_mixups_CE_none.py --work_dir work_dirs/classification/cifar10/mixups/basic/r18_mixups_CE_none/
To Reproduce
The command I executed:
bash tools/single_train.sh configs/classification/cifar10/mixups/basic/r18_mixups_CE_none.py
or, in PyCharm, set the script parameters to configs/classification/cifar10/mixups/basic/r18_mixups_CE_none.py --work_dir work_dirs/classification/cifar10/mixups/basic/r18_mixups_CE_none/ and run it from PyCharm.
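As a possible workaround for debugging in PyCharm (my own sketch, not part of openmixup), one could initialize a single-process "gloo" group before training starts so that torch.distributed collectives such as dist.broadcast have a default process group to use:

```python
# Sketch of a debug shim: create a 1-process "gloo" group so that
# torch.distributed collectives do not fail when no launcher is used.
# The address/port values are illustrative assumptions.
import os
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

if not dist.is_initialized():
    dist.init_process_group(backend="gloo", rank=0, world_size=1)
```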
Post related information
Your train log file if you meet the problem during training.
2024-07-17 14:15:48,333 - openmixup - INFO - workflow: [('train', 1)], max: 20 epochs
2024-07-17 14:15:48,333 - openmixup - INFO - Checkpoints will be saved to /home/hang/research/repository/openmixup/work_dirs/classification/cifar10/mixups/basic/r18_mixups_CE_none by HardDiskBackend.
2024-07-17 14:15:58,315 - openmixup - INFO - Epoch [1][50/500] lr: 1.000e-01, eta: 0:32:57, time: 0.199, data_time: 0.049, memory: 2011, loss: 3.1864, acc: 11.2400, acc_mix: 11.1770
2024-07-17 14:16:02,621 - openmixup - INFO - Epoch [1][100/500] lr: 1.000e-01, eta: 0:23:31, time: 0.086, data_time: 0.002, memory: 2011, loss: 2.2559, acc: 13.2800, acc_mix: 14.8798
2024-07-17 14:16:06,981 - openmixup - INFO - Epoch [1][150/500] lr: 1.000e-01, eta: 0:20:21, time: 0.087, data_time: 0.001, memory: 2011, loss: 2.1951, acc: 15.7400, acc_mix: 17.7451
2024-07-17 14:16:11,302 - openmixup - INFO - Epoch [1][200/500] lr: 1.000e-01, eta: 0:18:43, time: 0.087, data_time: 0.002, memory: 2011, loss: 2.1531, acc: 17.2800, acc_mix: 20.0116
2024-07-17 14:16:15,607 - openmixup - INFO - Epoch [1][250/500] lr: 1.000e-01, eta: 0:17:41, time: 0.086, data_time: 0.001, memory: 2011, loss: 2.1376, acc: 18.2000, acc_mix: 20.3139
2024-07-17 14:16:19,971 - openmixup - INFO - Epoch [1][300/500] lr: 1.000e-01, eta: 0:17:01, time: 0.087, data_time: 0.002, memory: 2011, loss: 2.1047, acc: 19.3600, acc_mix: 22.5280
2024-07-17 14:16:24,319 - openmixup - INFO - Epoch [1][350/500] lr: 1.000e-01, eta: 0:16:30, time: 0.087, data_time: 0.002, memory: 2011, loss: 2.0813, acc: 20.6200, acc_mix: 23.2146
2024-07-17 14:16:28,675 - openmixup - INFO - Epoch [1][400/500] lr: 1.000e-01, eta: 0:16:07, time: 0.087, data_time: 0.002, memory: 2011, loss: 2.0527, acc: 19.5200, acc_mix: 24.3855
2024-07-17 14:16:33,065 - openmixup - INFO - Epoch [1][450/500] lr: 1.000e-01, eta: 0:15:48, time: 0.088, data_time: 0.001, memory: 2011, loss: 2.0525, acc: 19.1800, acc_mix: 25.3555
2024-07-17 14:16:37,458 - openmixup - INFO - Exp name: r18_mixups_CE_none.py
2024-07-17 14:16:37,458 - openmixup - INFO - Epoch [1][500/500] lr: 1.000e-01, eta: 0:15:32, time: 0.088, data_time: 0.002, memory: 2011, loss: 2.0137, acc: 23.0800, acc_mix: 26.3356
Traceback (most recent call last):
File "/home/hang/research/repository/openmixup/tools/train.py", line 208, in<module>main()
File "/home/hang/research/repository/openmixup/tools/train.py", line 198, in main
train_model(
File "/home/hang/research/repository/openmixup/openmixup/apis/train.py", line 225, in train_model
runner.run(data_loaders, cfg.workflow)
File "/home/hang/anaconda3/envs/openmixup/lib/python3.9/site-packages/mmcv/runner/epoch_based_runner.py", line 136, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/hang/anaconda3/envs/openmixup/lib/python3.9/site-packages/mmcv/runner/epoch_based_runner.py", line 58, in train
self.call_hook('after_train_epoch')
File "/home/hang/anaconda3/envs/openmixup/lib/python3.9/site-packages/mmcv/runner/base_runner.py", line 317, in call_hook
getattr(hook, fn_name)(self)
File "/home/hang/research/repository/openmixup/openmixup/core/hooks/validate_hook.py", line 281, in after_train_epoch
self._run_validate(runner)
File "/home/hang/research/repository/openmixup/openmixup/core/hooks/validate_hook.py", line 371, in _run_validate
dist.broadcast(module.running_var, 0)
File "/home/hang/anaconda3/envs/openmixup/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 1192, in broadcast
default_pg = _get_default_group()
File "/home/hang/anaconda3/envs/openmixup/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 429, in _get_default_group
raise RuntimeError(
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
Additional context
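The traceback shows that _run_validate in openmixup/core/hooks/validate_hook.py broadcasts BN running statistics with dist.broadcast, which requires an initialized default process group; when tools/train.py is run directly (no distributed launcher), no group exists and the call fails. A minimal sketch of the kind of guard that would avoid this (an illustration of the idea, not the actual openmixup code):

```python
# Sketch: skip the BN buffer broadcast when no process group is initialized,
# e.g. when training was started without a distributed launcher.
import torch.distributed as dist

def broadcast_bn_buffers(module):
    # Broadcast running statistics only in distributed mode.
    if dist.is_available() and dist.is_initialized():
        dist.broadcast(module.running_var, 0)
```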