convert squad examples to features: 100%|█████████████████████████████████████████████████████████████████████████| 902/902 [00:00<00:00, 2092.37it/s]
add example index and unique id: 100%|██████████████████████████████████████████████████████████████████████████| 902/902 [00:00<00:00, 439863.06it/s]
06/14/2022 22:26:05 - INFO - __main__ - Number of trainable params: 258,127,108
Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.
Defaults for this optimization level are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ModuleNotFoundError("No module named 'amp_C'")
06/14/2022 22:26:05 - INFO - __main__ - ***** Running training *****
06/14/2022 22:26:05 - INFO - __main__ - Num examples = 1218
06/14/2022 22:26:05 - INFO - __main__ - Num Epochs = 2
06/14/2022 22:26:05 - INFO - __main__ - Instantaneous batch size per GPU = 48
06/14/2022 22:26:05 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 384
06/14/2022 22:26:05 - INFO - __main__ - Gradient Accumulation steps = 1
06/14/2022 22:26:05 - INFO - __main__ - Total optimization steps = 8
Epoch:   0%|          | 0/2 [00:00<?, ?it/s]
06/14/2022 22:26:05 - INFO - __main__ -
[Epoch 1]
06/14/2022 22:26:05 - INFO - __main__ - Initialize pre-batch of size 2 for Epoch 1
Traceback (most recent call last):
  File "train_rc.py", line 593, in <module>
    main()
  File "train_rc.py", line 537, in main
    global_step, tr_loss = train(args, train_dataset, model, tokenizer)
  File "train_rc.py", line 222, in train
    outputs = model(**inputs)
  File "/ssd3/wangxiao/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/ssd3/wangxiao/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/ssd3/wangxiao/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/ssd3/wangxiao/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/ssd3/wangxiao/anaconda3/envs/py38/lib/python3.8/site-packages/torch/_utils.py", line 425, in reraise
    raise self.exc_type(msg)
StopIteration: Caught StopIteration in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/ssd3/wangxiao/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/ssd3/wangxiao/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/ssd1/zhangyiming/densephrase/DensePhrases/densephrases/encoder.py", line 132, in forward
    start, end = self.embed_phrase(input_ids, attention_mask, token_type_ids)
  File "/ssd1/zhangyiming/densephrase/DensePhrases/densephrases/encoder.py", line 94, in embed_phrase
    outputs = self.phrase_encoder(
  File "/ssd3/wangxiao/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/ssd3/wangxiao/anaconda3/envs/py38/lib/python3.8/site-packages/transformers/modeling_bert.py", line 707, in forward
    attention_mask, input_shape, self.device
  File "/ssd3/wangxiao/anaconda3/envs/py38/lib/python3.8/site-packages/transformers/modeling_utils.py", line 113, in device
    return next(self.parameters()).device
StopIteration
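
The failing frame suggests this is a known torch/transformers incompatibility rather than a data problem: transformers' `device` property (modeling_utils.py, line 113 in the trace) evaluates `next(self.parameters()).device`, but from torch 1.5 onward the replicas that `nn.DataParallel` creates hold their weights as plain attributes rather than registered parameters, so `self.parameters()` is empty inside each replica and `next()` raises StopIteration. A minimal sketch of the same failure, assuming at least two visible GPUs (`Probe` is a hypothetical stand-in for the BERT phrase encoder, not DensePhrases code):

import torch
import torch.nn as nn

class Probe(nn.Module):
    # Hypothetical stand-in for the HuggingFace encoder; the
    # next(self.parameters()) pattern matches the `device` property in
    # transformers/modeling_utils.py shown in the traceback above.
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def forward(self, x):
        # Under torch >= 1.5, DataParallel replicas expose no registered
        # parameters, so this raises StopIteration inside each replica.
        device = next(self.parameters()).device
        return self.linear(x.to(device))

model = nn.DataParallel(Probe().cuda())
model(torch.randn(8, 4))  # StopIteration: Caught StopIteration in replica 0 on device 0.

The usual workarounds for this combination are to run on a single GPU so DataParallel is never engaged (e.g. CUDA_VISIBLE_DEVICES=0 python train_rc.py ...), to switch to DistributedDataParallel, or to use a torch/transformers pairing that does not call self.device inside forward.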