Relationship target error #109

Open
kesaroid opened this issue Nov 5, 2020 · 0 comments
kesaroid commented Nov 5, 2020

I am running the graph-rcnn network on a custom dataset and I am receiving the following error.

2020-11-05 18:46:14,177 scene_graph_generation.trainer INFO: Start training

2020-11-05 18:46:17,477 scene_graph_generation.trainer INFO: model: scene_parser  eta: 3 days, 19:37:47  iter: 0/100000  loss: 5.7890 (5.7890)  loss_classifier: 3.7554 (3.7554)  loss_box_reg: 0.0884 (0.0884)  loss_obj_classifier: 0.0000 (0.0000)  loss_relpn: 1.1636 (1.1636)  loss_pred_classifier: 0.0000 (0.0000)  loss_objectness: 0.7720 (0.7720)  loss_rpn_box_reg: 0.0096 (0.0096)  time: 3.2987 (3.2987)  data: 2.2400 (2.2400)  lr: 0.001667  max mem: 5850

2020-11-05 18:46:36,095 scene_graph_generation.trainer INFO: model: scene_parser  eta: 1 day, 4:59:03  iter: 20/100000  loss: 1.2947 (2.3564)  loss_classifier: 0.7579 (1.0317)  loss_box_reg: 0.2304 (0.2288)  loss_obj_classifier: 0.0000 (0.0000)  loss_relpn: 0.1001 (0.2322)  loss_pred_classifier: 0.0000 (0.6398)  loss_objectness: 0.0946 (0.1937)  loss_rpn_box_reg: 0.0237 (0.0303)  time: 0.9302 (1.0436)  data: 0.0039 (0.1106)  lr: 0.001800  max mem: 6381

2020-11-05 18:46:54,608 scene_graph_generation.trainer INFO: model: scene_parser  eta: 1 day, 3:22:49  iter: 40/100000  loss: 1.0378 (2.0255)  loss_classifier: 0.5367 (0.8101)  loss_box_reg: 0.1825 (0.2174)  loss_obj_classifier: 0.0000 (0.0000)  loss_relpn: 0.0745 (0.1592)  loss_pred_classifier: 0.0000 (0.6767)  loss_objectness: 0.0736 (0.1368)  loss_rpn_box_reg: 0.0149 (0.0253)  time: 0.9274 (0.9861)  data: 0.0043 (0.0587)  lr: 0.001933  max mem: 6381

/opt/conda/conda-bld/pytorch_1544202130060/work/aten/src/THCUNN/BCECriterion.cu:42: Acctype bce_functor<Dtype, Acctype>::operator()(Tuple) [with Tuple = thrust::detail::tuple_of_iterator_references<thrust::device_reference<float>, thrust::device_reference<float>, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, Dtype = float, Acctype = float]: block: [0,0,0], thread: [0,0,0] Assertion `input >= 0. && input <= 1.` failed.

/opt/conda/conda-bld/pytorch_1544202130060/work/aten/src/THCUNN/BCECriterion.cu:42: Acctype bce_functor<Dtype, Acctype>::operator()(Tuple) [with Tuple = thrust::detail::tuple_of_iterator_references<thrust::device_reference<float>, thrust::device_reference<float>, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, Dtype = float, Acctype = float]: block: [0,0,0], thread: [1,0,0] Assertion `input >= 0. && input <= 1.` failed.

/opt/conda/conda-bld/pytorch_1544202130060/work/aten/src/THCUNN/BCECriterion.cu:42: Acctype bce_functor<Dtype, Acctype>::operator()(Tuple) [with Tuple = thrust::detail::tuple_of_iterator_references<thrust::device_reference<float>, thrust::device_reference<float>, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, Dtype = float, Acctype = float]: block: [0,0,0], thread: [2,0,0] Assertion `input >= 0. && input <= 1.` failed.

/opt/conda/conda-bld/pytorch_1544202130060/work/aten/src/THCUNN/BCECriterion.cu:42: Acctype bce_functor<Dtype, Acctype>::operator()(Tuple) [with Tuple = thrust::detail::tuple_of_iterator_references<thrust::device_reference<float>, thrust::device_reference<float>, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, Dtype = float, Acctype = float]: block: [0,0,0], thread: [3,0,0] Assertion `input >= 0. && input <= 1.` failed.

Traceback (most recent call last):
  File "main.py", line 99, in <module>
    main()
  File "main.py", line 94, in main
    model = train(cfg, args)
  File "main.py", line 31, in train
    model.train()
  File "/home/engineering/Documents/Thesis/graph-rcnn.pytorch/lib/model.py", line 132, in train
    loss_dict = self.scene_parser(imgs, targets)
  File "/home/engineering/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/engineering/Documents/Thesis/graph-rcnn.pytorch/lib/scene_parser/parser.py", line 136, in forward
    x_pairs, detection_pairs, rel_heads_loss = self.rel_heads(relation_features, detections, targets)
  File "/home/engineering/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/engineering/Documents/Thesis/graph-rcnn.pytorch/lib/scene_parser/rcnn/modeling/relation_heads/relation_heads.py", line 151, in forward
    proposal_pairs, loss_relpn = self.relpn(proposals, targets)
  File "/home/engineering/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/engineering/Documents/Thesis/graph-rcnn.pytorch/lib/scene_parser/rcnn/modeling/relation_heads/relpn/relpn.py", line 307, in forward
    return self._relpnsample_train(proposals, targets)
  File "/home/engineering/Documents/Thesis/graph-rcnn.pytorch/lib/scene_parser/rcnn/modeling/relation_heads/relpn/relpn.py", line 190, in _relpnsample_train
    losses += F.binary_cross_entropy(relness.view(-1, 1), (labels[img_idx] > 0).view(-1, 1).float())
  File "/home/engineering/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 2027, in binary_cross_entropy
    input, target, weight, reduction_enum)
RuntimeError: after reduction step 2: device-side assert triggered

Now, this may be because my relationship targets are out of range (larger than the number of classes specified, or negative). But I rechecked, and the target values are 26, as specified in my cfg file. This is the error I receive after setting os.environ['CUDA_LAUNCH_BLOCKING'] = '1'.
What else could be causing this?
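
One thing worth noting: the assertion in BCECriterion.cu (`input >= 0. && input <= 1.`) checks the first argument to F.binary_cross_entropy, i.e. the predicted relness scores, not the targets. So NaNs or un-normalized values in relness (for example, from a diverging relpn score head) would trip it even if the relationship labels in the cfg are correct. Below is a minimal, hypothetical sanity check one could wrap around the failing call at relpn.py line 190 to narrow this down; `checked_bce` is just an illustrative helper, not part of the repo:

```python
import torch
import torch.nn.functional as F

def checked_bce(relness, labels_img):
    """Sanity-check the tensors fed to F.binary_cross_entropy in relpn.py.

    The CUDA-side assert fires when the *input* (relness) is outside
    [0, 1], so NaNs or un-squashed scores are a likely culprit,
    independent of how many relationship classes the cfg declares.
    """
    inp = relness.view(-1, 1)
    tgt = (labels_img > 0).view(-1, 1).float()

    # Check for NaNs first; NaN comparisons are always False,
    # so the range check below would silently pass over them.
    if torch.isnan(inp).any():
        print("relness contains NaN")
    elif inp.min() < 0 or inp.max() > 1:
        print("relness outside [0, 1]: min={:.4f} max={:.4f}".format(
            inp.min().item(), inp.max().item()))

    return F.binary_cross_entropy(inp, tgt)
```

If relness does turn out to contain NaNs or out-of-range values, likely next steps would be checking that the scores are passed through a sigmoid before the loss, or lowering the learning rate if the score head is diverging.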
