Investigate DETR model failure with latest develop branch #2137

Closed
umangyadav opened this issue Aug 30, 2023 · 3 comments · Fixed by #2185 or #2189
@umangyadav
Member

The DETR model from UIF worked up to master branch revision 1354c86, but with the latest develop branch it runs into the following error:

Compiling ...
Reading: detr_r50_fp32.onnx
terminate called after throwing an instance of 'migraphx::version_2_7_0::exception'
what(): /home/umayadav/repo/AMDMIGraphX/src/targets/gpu/gemm_impl.cpp:60: blas_shape: GPU_GEMM: needs to have one matrix stride as 1

@umangyadav umangyadav self-assigned this Aug 30, 2023
@umangyadav
Member Author

umangyadav commented Sep 13, 2023

The DETR model is failing because it contains this pattern:

transpose --> contiguous --> preadd_layernorm --> add --> gemm.

Before #2071, since preadd_layernorm does not have module inputs, empty module inputs were used. That caused try_compute_shape to return false because of the exception thrown at the line linked below, so the contiguous was not removed. https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/f5da3bb258a6d67421d9a855e6a4eda3c406ed1c/src/include/migraphx/op/pointwise.hpp#L46

After #2071, it uses output->module_inputs(). Since add is a pointwise binary operator that can take non-standard input shapes and produces a standard output shape, try_compute_shape() returns true and the contiguous is eliminated.
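To make the before/after behaviour concrete, here is a minimal toy model in Python. It is not MIGraphX code: `pointwise_compute_shape`, `should_remove_contiguous`, and the module name `"add_module"` are hypothetical stand-ins, and the rule that a pointwise op rejects anything other than exactly one submodule is an assumption about what the linked check in pointwise.hpp does.

```python
def packed_strides(lens):
    """Row-major (standard) strides for the given dimensions."""
    strides = []
    acc = 1
    for n in reversed(lens):
        strides.append(acc)
        acc *= n
    return tuple(reversed(strides))


def pointwise_compute_shape(inputs, mods):
    """Hypothetical pointwise shape rule (assumption, not the real API).

    Accepts inputs in any layout and returns a standard output shape,
    but throws unless exactly one submodule is supplied.
    """
    if len(mods) != 1:
        raise ValueError("pointwise: expected exactly one submodule")
    lens, _strides = inputs[0]
    return lens, packed_strides(lens)


def try_compute_shape(compute_shape, inputs, mods):
    """Return True if the shape computation succeeds, False if it throws."""
    try:
        compute_shape(inputs, mods)
    except ValueError:
        return False
    return True


def should_remove_contiguous(compute_shape, inputs, mods):
    """Toy decision: drop the contiguous only if the consumer still computes a shape."""
    return try_compute_shape(compute_shape, inputs, mods)


# A transposed, non-standard input: lens (2, 3, 4) with strides (12, 1, 3).
transposed_input = ((2, 3, 4), (12, 1, 3))

# Empty module inputs: the shape check throws, so contiguous stays.
before = should_remove_contiguous(pointwise_compute_shape, [transposed_input], [])

# With the consumer's own module inputs: the check passes, so contiguous goes.
after = should_remove_contiguous(pointwise_compute_shape, [transposed_input], ["add_module"])

print(before, after)  # False True
```

In this toy model the only thing that changes between the two calls is the module list, which mirrors how the contiguous went from being kept to being removed.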

But then during [fuse_ops.cpp](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/f5da3bb258a6d67421d9a855e6a4eda3c406ed1c/src/targets/gpu/fuse_ops.cpp#L808), gemm receives a transposed input shape that doesn't have a stride of 1 in either of the last two dimensions, and it fails with
what(): /home/umayadav/repo/AMDMIGraphX/src/targets/gpu/gemm_impl.cpp:60: blas_shape: GPU_GEMM: needs to have one matrix stride as 1

@pfultz2
Collaborator

pfultz2 commented Sep 13, 2023

So we start with:

transpose --> contiguous --> preadd_layernorm --> add --> gemm.

And then eliminate_contiguous gives us:

transpose --> preadd_layernorm --> add --> gemm.

But the output of add is not transposed?

Then we fuse it to this:

transpose --> preadd_layernorm_add --> gemm.

And then the output of preadd_layernorm_add is transposed?

@umangyadav
Member Author

> And then the output of preadd_layernorm_add is transposed?

yes
