Rectify flipped coordinate_transformation_mode logic in ROIAlign #2159

Closed
wants to merge 6 commits into from

music-dino
Collaborator

Member

@umangyadav umangyadav left a comment

test_roialign_aligned_false_cpu

test_roialign_aligned_true_cpu

test_roialign_mode_max_cpu

Are these tests from the onnx_backend tests?

@codecov

codecov bot commented Sep 7, 2023

Codecov Report

Merging #2159 (5aa7441) into develop (482e8d6) will not change coverage.
The diff coverage is 100.00%.

❗ Current head 5aa7441 differs from pull request most recent head ecc2a4a. Consider uploading reports for the commit ecc2a4a to get more accurate results

@@           Coverage Diff            @@
##           develop    #2159   +/-   ##
========================================
  Coverage    91.49%   91.49%           
========================================
  Files          427      427           
  Lines        15953    15953           
========================================
  Hits         14596    14596           
  Misses        1357     1357           
Files Changed Coverage Δ
src/include/migraphx/op/roialign.hpp 99.16% <100.00%> (ø)
src/onnx/parse_roialign.cpp 92.30% <100.00%> (ø)

@music-dino
Collaborator Author

music-dino commented Sep 11, 2023

test_roialign_aligned_false_cpu

test_roialign_aligned_true_cpu

test_roialign_mode_max_cpu

Are these tests from the onnx_backend tests?

Yes. They were added in an ONNX version that came after 1.10.2, so they only fail when the ONNX version is bumped.
We have a PR, #2121, that bumps the ONNX version and disables the failing tests; you'll find these tests in its list of disabled ones. This fix would allow us to enable them.
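
To make "disabling" concrete: ONNX's backend test runner lets a backend skip tests by name pattern through its `exclude(pattern)` method. The sketch below does not use the ONNX runner itself; `filter_disabled` is a hypothetical stand-in that only mimics pattern-based exclusion with the standard library, using the test names mentioned in this thread.

```python
import re

# Test names quoted in this conversation; the list itself is illustrative.
DISABLED_PATTERNS = [
    r"test_roialign_aligned_false_cpu",
    r"test_roialign_aligned_true_cpu",
    r"test_roialign_mode_max_cpu",
]


def filter_disabled(test_names, patterns):
    """Hypothetical helper: drop every test whose name matches any pattern."""
    compiled = [re.compile(p) for p in patterns]
    return [name for name in test_names
            if not any(c.search(name) for c in compiled)]


all_tests = [
    "test_roialign_cpu",
    "test_roialign_aligned_false_cpu",
    "test_roialign_aligned_true_cpu",
    "test_roialign_mode_max_cpu",
]
remaining = filter_disabled(all_tests, DISABLED_PATTERNS)
print(remaining)
```

Re-enabling the tests after the fix would then amount to deleting the corresponding patterns from the list.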

Collaborator

@CharlieL7 CharlieL7 left a comment

Looks like it was a typo, and the tests were also changed to agree with the typo.

@causten
Collaborator

causten commented Sep 15, 2023

This PR is breaking ONNX tests

[2023-09-15T19:21:13.577Z] 333/333 Test #332: test_py_3.8_backend .......................................................***Failed 1046.20 sec
[2023-09-15T19:21:13.578Z] .s.s.s.s.sssssssss.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.sss.s.s.sss.s.s.sss.s.s.s.s.s.s.sss.s.s.sss.s.s.sss.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss.s.s.s.s.s.s.s.s.s.s.s.s.s.s.sssssssss.s.s.s.s.s.s.s.s.s.s.s.s.sssssssssssssssssssssssssssssssssssssssss.s.s.s.sssssssssssssss.s.s.s.s.s.sssss.s.s.s.s.s.s.s.s.s.sssssssssssssssssssssssss.s.s.s.s.sss.s.sssss.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.sssssssssssssssssssssss.s.s.s.s.s.sss.sssssssssssss.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.sssssssssssssssss.s.s.sss.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.sss.s.s.s.s.s.s.sss.sssssssss.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.sssssssssssssssssssssssssssssss.s.s.s.sssss.s.sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss.s.s.sssssssssssssssss.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.sssssssssss.s.sssssssss.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.sssssssssssssssssssss.s.s.s.s.s.s.s.sss.sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssFsssssss.s.s.s.s.s.sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss.s.s.sssss.s.s.sssss.sssssssssssss.s.sssssssss.s.s.s.s.s.sssssssssssssssss.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.sssssssssssssss.s.sssssssssssssssss.s.s.s.s.s.s.s.s.s.s.sssssssssssssss.s.s.sssssssssssssssssssssss.s.s.s.s.s.s.sssssssssssssssssssssssssssssssssssssssssssssssssssss.sssssssssss.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.sss.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.sssss.sss.sssss.s.s.s.s.s.s.sss.sss.s.s.sssssssssssssss.s.sssssss.s.s.s.sss.s.s.s.s.s.sssssssssss.s.s.s.s.s.sss.s.s.s.s.s.s.s.sss.s.s.s.s.s.s.sssss.s.s.s.s.s.s.sss.sss.sss.s.ssssssssssssssssssssssssssss
sssss.sssssssssssss

> [2023-09-15T19:21:13.578Z] ======================================================================
> [2023-09-15T19:21:13.578Z] FAIL: test_roialign_cpu (__main__.OnnxBackendNodeModelTest)
> [2023-09-15T19:21:13.578Z] ----------------------------------------------------------------------
> [2023-09-15T19:21:13.578Z] Traceback (most recent call last):
> [2023-09-15T19:21:13.578Z]   File "/usr/local/lib/python3.8/dist-packages/onnx/backend/test/runner/__init__.py", line 262, in device_test_func
> [2023-09-15T19:21:13.578Z]     return test_func(*args, device=device, **kwargs)
> [2023-09-15T19:21:13.578Z]   File "/usr/local/lib/python3.8/dist-packages/onnx/backend/test/runner/__init__.py", line 318, in run
> [2023-09-15T19:21:13.578Z]     self.assert_similar_outputs(ref_outputs, outputs,
> [2023-09-15T19:21:13.578Z]   File "/var/jenkins/workspace/MLLibs_AMDMIGraphX_PR-2159/test/py/onnx_backend_test.py", line 59, in assert_similar_outputs
> [2023-09-15T19:21:13.578Z]     np.testing.assert_allclose(ref_outputs[i],
> [2023-09-15T19:21:13.578Z]   File "/usr/local/lib/python3.8/dist-packages/numpy/testing/_private/utils.py", line 1530, in assert_allclose
> [2023-09-15T19:21:13.578Z]     assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
> [2023-09-15T19:21:13.578Z]   File "/usr/local/lib/python3.8/dist-packages/numpy/testing/_private/utils.py", line 844, in assert_array_compare
> [2023-09-15T19:21:13.578Z]     raise AssertionError(msg)
> [2023-09-15T19:21:13.578Z] AssertionError: 
> [2023-09-15T19:21:13.578Z] Not equal to tolerance rtol=.001, atol=1e-05
> [2023-09-15T19:21:13.578Z] 
> [2023-09-15T19:21:13.578Z] Program =
> [2023-09-15T19:21:13.578Z] module: "main"
> [2023-09-15T19:21:13.578Z] batch_indices = @param:batch_indices -> int64_type, {3}, {1}, target_id=0
> [2023-09-15T19:21:13.578Z] rois = @param:rois -> float_type, {3, 4}, {4, 1}, target_id=0
> [2023-09-15T19:21:13.578Z] X = @param:X -> float_type, {1, 1, 10, 10}, {100, 100, 10, 1}, target_id=0
> [2023-09-15T19:21:13.578Z] @3 = roialign[coordinate_transformation_mode=half_pixel,mode=average,output_height=5,output_width=5,sampling_ratio=2,spatial_scale=1](X,rois,batch_indices) -> float_type, {3, 1, 5, 5}, {25, 25, 5, 1}, target_id=0
> [2023-09-15T19:21:13.578Z] @4 = @return(@3), target_id=0
> [2023-09-15T19:21:13.578Z] 
> [2023-09-15T19:21:13.578Z] 
> [2023-09-15T19:21:13.578Z] Compiled program =
> [2023-09-15T19:21:13.578Z] module: "main"
> [2023-09-15T19:21:13.578Z] @0 = check_context::migraphx::gpu::context  -> float_type, {}, {}, target_id=0
> [2023-09-15T19:21:13.578Z] @1 = hip::hip_allocate_memory[shape=int8_type, {784}, {1},id=main:scratch] -> int8_type, {784}, {1}, target_id=0
> [2023-09-15T19:21:13.578Z] rois = @param:rois -> float_type, {3, 4}, {4, 1}, target_id=0
> [2023-09-15T19:21:13.578Z] @3 = load[offset=704,end=752](@1) -> float_type, {3, 4}, {4, 1}, target_id=0
> [2023-09-15T19:21:13.578Z] @4 = hip::copy_to_gpu(rois,@3) -> float_type, {3, 4}, {4, 1}, target_id=0
> [2023-09-15T19:21:13.578Z] @5 = load[offset=752,end=776](@1) -> int64_type, {3}, {1}, target_id=0
> [2023-09-15T19:21:13.578Z] batch_indices = @param:batch_indices -> int64_type, {3}, {1}, target_id=0
> [2023-09-15T19:21:13.578Z] @7 = hip::copy_to_gpu(batch_indices,@5) -> int64_type, {3}, {1}, target_id=0
> [2023-09-15T19:21:13.578Z] X = @param:X -> float_type, {1, 1, 10, 10}, {100, 100, 10, 1}, target_id=0
> [2023-09-15T19:21:13.578Z] @9 = load[offset=0,end=400](@1) -> float_type, {1, 1, 10, 10}, {100, 100, 10, 1}, target_id=0
> [2023-09-15T19:21:13.578Z] @10 = hip::copy_to_gpu(X,@9) -> float_type, {1, 1, 10, 10}, {100, 100, 10, 1}, target_id=0
> [2023-09-15T19:21:13.578Z] @11 = load[offset=400,end=700](@1) -> float_type, {3, 1, 5, 5}, {25, 25, 5, 1}, target_id=0
> [2023-09-15T19:21:13.578Z] @12 = gpu::code_object[code_object=13632,symbol_name=roialign_kernel,global=128,local=128,](@10,@4,@7,@11) -> float_type, {3, 1, 5, 5}, {25, 25, 5, 1}, target_id=0
> [2023-09-15T19:21:13.578Z] @13 = hip::copy_from_gpu(@12) -> float_type, {3, 1, 5, 5}, {25, 25, 5, 1}, target_id=0
> [2023-09-15T19:21:13.578Z] @14 = hip::sync_stream(@13) -> float_type, {3, 1, 5, 5}, {25, 25, 5, 1}, target_id=0
> [2023-09-15T19:21:13.578Z] @15 = @return(@14), target_id=0
> [2023-09-15T19:21:13.578Z] 
> [2023-09-15T19:21:13.578Z] 
> [2023-09-15T19:21:13.578Z] Mismatched elements: 75 / 75 (100%)
> [2023-09-15T19:21:13.578Z] Max absolute difference: 0.3780385
> [2023-09-15T19:21:13.578Z] Max relative difference: 1.416361
> [2023-09-15T19:21:13.578Z]  x: array([[[[0.4664, 0.4466, 0.3405, 0.5688, 0.6068],
> [2023-09-15T19:21:13.578Z]          [0.3714, 0.4296, 0.383, 0.5562, 0.351 ],
> [2023-09-15T19:21:13.578Z]          [0.2768, 0.4883, 0.522, 0.5528, 0.4171],...
> [2023-09-15T19:21:13.578Z]  y: array([[[[0.517783, 0.34341, 0.322905, 0.447362, 0.634375],
> [2023-09-15T19:21:13.578Z]          [0.40308 , 0.536647, 0.42791, 0.486144, 0.402313],
> [2023-09-15T19:21:13.578Z]          [0.251194, 0.400154, 0.515524, 0.695369, 0.346537],...
> [2023-09-15T19:21:13.578Z] 
> [2023-09-15T19:21:13.578Z] ----------------------------------------------------------------------
> [2023-09-15T19:21:13.578Z] Ran 1992 tests in 1043.400s
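
For readers unfamiliar with the tolerance line in the log: NumPy's `assert_allclose` treats an element as matching when `|actual - desired| <= atol + rtol * |desired|`. The following is a minimal stdlib-only sketch of that criterion, applied to the second row of the `x` and `y` arrays printed above with the `rtol` and `atol` from the log. The helper name `within_tolerance` is hypothetical, and the printed values are truncated in the log, so this only illustrates the scale of the mismatch.

```python
# Tolerances reported in the log.
RTOL, ATOL = 1e-3, 1e-5

# Second row of the x and y arrays as printed in the log (truncated values).
x_row = [0.3714, 0.4296, 0.383, 0.5562, 0.351]
y_row = [0.40308, 0.536647, 0.42791, 0.486144, 0.402313]


def within_tolerance(actual, desired, rtol=RTOL, atol=ATOL):
    """Element-wise closeness criterion documented for numpy.allclose."""
    return abs(actual - desired) <= atol + rtol * abs(desired)


# Indices of the elements that fall outside the tolerance.
mismatched = [i for i, (a, d) in enumerate(zip(x_row, y_row))
              if not within_tolerance(a, d)]
print(f"{len(mismatched)} of {len(x_row)} elements mismatch")
```

With an allowed deviation of roughly 0.0005 per element and observed differences in the range of 0.03 to 0.1, the outputs disagree far beyond rounding noise, which points to a semantic difference rather than a precision problem.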

@causten
Collaborator

causten commented Sep 15, 2023

Steps to recreate...

[2023-09-15T18:48:08.649Z] ulimit -c unlimited
[2023-09-15T18:48:08.649Z] echo "leak:dnnl::impl::malloc" > suppressions.txt
[2023-09-15T18:48:08.649Z] export LSAN_OPTIONS="suppressions=$(pwd)/suppressions.txt"
[2023-09-15T18:48:08.649Z] export MIGRAPHX_GPU_DEBUG=0
[2023-09-15T18:48:08.649Z] export CXX=/opt/rocm/llvm/bin/clang++
[2023-09-15T18:48:08.649Z] export CXXFLAGS='-Werror'
[2023-09-15T18:48:08.649Z] env
[2023-09-15T18:48:08.649Z] rm -rf build
[2023-09-15T18:48:08.649Z] mkdir build
[2023-09-15T18:48:08.649Z] cd build
[2023-09-15T18:48:08.649Z] cmake -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DBUILD_DEV=On -DCMAKE_EXECUTE_PROCESS_COMMAND_ECHO=STDOUT -DCMAKE_BUILD_TYPE=release ..

[2023-09-15T18:48:08.649Z] make -j$(nproc) check VERBOSE=1

@music-dino
Collaborator Author

This PR is breaking ONNX tests

That's an oversight on my part. The failing test is not part of ONNX 1.14, the version against which I ran the backend tests; I neglected to run them against 1.10.2. With that in mind, perhaps it would be best to give #2121 priority for merging.

@music-dino
Collaborator Author

music-dino commented Sep 18, 2023

This PR is breaking ONNX tests

Looks like the default behavior of RoiAlign changed between opset versions 10 and 16 (see onnx/onnx#3625).
I've updated the PR to handle both versions.
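
As a rough illustration of the discrepancy (not the C++ code in this PR): going by the ONNX operator documentation, opset 16 introduced the `coordinate_transformation_mode` attribute with default `half_pixel`, which shifts the scaled ROI coordinates by -0.5, while `output_half_pixel` omits the shift and reproduces the opset 10 behavior. The function names below are hypothetical and only sketch how a parser might pick the mode.

```python
def default_mode_for_opset(opset_version):
    """Pick the mode that matches the operator's behavior when the
    attribute is absent. Assumption: opsets before 16 behave like
    output_half_pixel; opset 16 and later default to half_pixel."""
    return "half_pixel" if opset_version >= 16 else "output_half_pixel"


def resolve_mode(opset_version, attributes):
    """Hypothetical parser step: an explicit attribute wins over the default."""
    return attributes.get("coordinate_transformation_mode",
                          default_mode_for_opset(opset_version))


def roi_offset(mode):
    """Pixel shift subtracted from the scaled ROI coordinates."""
    if mode == "half_pixel":
        return 0.5
    if mode == "output_half_pixel":
        return 0.0
    raise ValueError(f"unknown coordinate_transformation_mode: {mode}")


def scaled_roi_start(roi_start, spatial_scale, mode):
    """ROI start coordinate in feature-map space for the given mode."""
    return roi_start * spatial_scale - roi_offset(mode)


for opset in (10, 16):
    mode = resolve_mode(opset, {})
    print(opset, mode, scaled_roi_start(2.0, 1.0, mode))
```

Under these assumptions, an opset 10 model such as the one behind `test_roialign_cpu` should be evaluated without the half-pixel shift, whereas the failing program in the log above shows `coordinate_transformation_mode=half_pixel`.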

@music-dino
Collaborator Author

@causten Seems like some checks need to be triggered again.

@causten
Collaborator

causten commented Sep 20, 2023

@causten Seems like some checks need to be triggered again.

Yeah, I'm not seeing an option to re-run the checks for this PR. Trying to figure it out.

@causten
Collaborator

causten commented Sep 20, 2023

@music-dino, there has been a recent change to speed up CI builds, which unfortunately relies on a GitHub secret. Since your PR came from a fork, it will fail some CI tests. I created a local version of your PR: #2214.

@causten causten closed this Sep 20, 2023
causten added a commit that referenced this pull request Sep 21, 2023
#2214)

* Rectify flipped coordinate_transformation_mode logic in ROIAlign
* Handle both opset 10 and 16 versions
* Fix version check and clang tidy warning

Co-authored-by: Dino Musić <[email protected]>
Development

Successfully merging this pull request may close these issues.

ROIAlign inaccuracies
4 participants