Investigate DETR model failure with latest develop branch #2137

umangyadav · 2023-08-30T19:42:20Z

The DETR model from UIF worked up to master branch revision 1354c86, but with the latest develop branch it fails with the following error:

Compiling ...
Reading: detr_r50_fp32.onnx
terminate called after throwing an instance of 'migraphx::version_2_7_0::exception'
what(): /home/umayadav/repo/AMDMIGraphX/src/targets/gpu/gemm_impl.cpp:60: blas_shape: GPU_GEMM: needs to have one matrix stride as 1


umangyadav · 2023-09-13T20:24:11Z

The DETR model fails because it contains this pattern:

transpose --> contiguous --> preadd_layernorm --> add --> gemm.

Before #2071, preadd_layernorm had no module inputs, so empty input mods were used. That caused try_compute_shape to return false with the exception thrown at the line below, and the contiguous was not removed: https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/f5da3bb258a6d67421d9a855e6a4eda3c406ed1c/src/include/migraphx/op/pointwise.hpp#L46

After #2071, it uses output->module_inputs(). Because add is a binary pointwise operator that accepts non-standard input shapes and produces standard output shapes, try_compute_shape() now returns true and the contiguous is eliminated.
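As a rough illustration of the behaviour change described above, here is a toy Python model. It is not MIGraphX code: `pointwise_compute_shape`, the `ShapeError` type, its message, and the string layout labels are all hypothetical stand-ins, and the rule that a pointwise op rejects an empty module list is an assumption taken from the description in this comment.

```python
class ShapeError(Exception):
    """Hypothetical error raised when an operator rejects its inputs."""


def pointwise_compute_shape(input_layouts, module_inputs):
    # Assumption for this sketch: a pointwise op needs exactly one submodule.
    if len(module_inputs) != 1:
        raise ShapeError("pointwise: expected exactly one submodule")
    # Per the thread: non-standard inputs are accepted, output is standard.
    return "standard"


def try_compute_shape(compute_shape, input_layouts, module_inputs):
    """Return True if the shape computation succeeds, False if it throws."""
    try:
        compute_shape(input_layouts, module_inputs)
    except ShapeError:
        return False
    return True


inputs = ["transposed", "standard"]

# Old behaviour: empty module inputs, so the check fails and contiguous stays.
before = try_compute_shape(pointwise_compute_shape, inputs, [])

# New behaviour: the consumer's own module inputs are passed through,
# so the check succeeds and contiguous is treated as removable.
after = try_compute_shape(pointwise_compute_shape, inputs, ["pointwise_module"])

print(before, after)
```

In this model the contiguous survives only because the shape check throws for an unrelated reason, which is why fixing the module inputs exposed the layout problem further down the pipeline.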

However, during [fuse_ops.cpp](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/f5da3bb258a6d67421d9a855e6a4eda3c406ed1c/src/targets/gpu/fuse_ops.cpp#L808), gemm then receives a transposed input shape that has no stride of 1 in its last two dimensions, and it fails with:
what(): /home/umayadav/repo/AMDMIGraphX/src/targets/gpu/gemm_impl.cpp:60: blas_shape: GPU_GEMM: needs to have one matrix stride as 1
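To show what "one matrix stride as 1" means in practice, here is a small Python sketch that computes strides by hand. The helper names and the example dimensions are hypothetical, and `has_unit_matrix_stride` is only my reading of the error message, not the actual check in gemm_impl.cpp.

```python
def standard_strides(lens):
    """Row-major (packed) strides for the given dimensions."""
    strides = [1] * len(lens)
    for i in range(len(lens) - 2, -1, -1):
        strides[i] = strides[i + 1] * lens[i + 1]
    return strides


def transpose(lens, strides, perm):
    """Permute dimensions without moving data, like a transpose view."""
    return [lens[p] for p in perm], [strides[p] for p in perm]


def has_unit_matrix_stride(strides):
    """Assumed condition: one of the last two (matrix) strides equals 1."""
    return 1 in strides[-2:]


lens = [2, 3, 4]
strides = standard_strides(lens)

# Swapping only the two matrix dimensions keeps a unit stride among them.
_, swapped = transpose(lens, strides, [0, 2, 1])

# Moving the unit-stride dimension out of the last two positions does not.
_, moved = transpose(lens, strides, [2, 0, 1])

print(strides, has_unit_matrix_stride(strides))
print(swapped, has_unit_matrix_stride(swapped))
print(moved, has_unit_matrix_stride(moved))
```

A plain matrix transpose is therefore fine for a BLAS-style gemm, while a permutation that pushes the contiguous dimension into the batch positions is the kind of layout that would trigger the error.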

pfultz2 · 2023-09-13T21:15:07Z

So we start with:

transpose --> contiguous --> preadd_layernorm --> add --> gemm.

And then eliminate_contiguous gives us:

transpose --> preadd_layernorm --> add --> gemm.

But the output of add is not transposed?

Then we fuse it to this:

transpose --> preadd_layernorm_add --> gemm.

And then the output of preadd_layernorm_add is transposed?

umangyadav · 2023-09-13T22:13:52Z

> And then the output of preadd_layernorm_add is transposed?

yes
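The three stages discussed above can be modelled as a toy layout-propagation exercise in Python. This is only a sketch under stated assumptions: layouts are reduced to two labels, each operator is a hypothetical function from input layout to output layout, and the rule that the fused kernel inherits its input layout reflects the behaviour confirmed in this thread, not the actual fuse_ops.cpp implementation.

```python
STANDARD, TRANSPOSED = "standard", "transposed"


def transpose(layout):
    return TRANSPOSED


def contiguous(layout):
    # Copies the data into a packed, standard layout.
    return STANDARD


def preadd_layernorm(layout):
    # Assumption: the output keeps the layout of the input.
    return layout


def add(layout):
    # Per the thread: a pointwise op emits a standard shape.
    return STANDARD


def preadd_layernorm_add(layout):
    # Fused kernel as observed in this issue: output follows the input layout.
    return layout


def layout_seen_by_gemm(pipeline, layout=STANDARD):
    """Push a layout label through each op and return what gemm receives."""
    for op in pipeline:
        layout = op(layout)
    return layout


original = [transpose, contiguous, preadd_layernorm, add]
after_eliminate_contiguous = [transpose, preadd_layernorm, add]
after_fusion = [transpose, preadd_layernorm_add]

for name, pipeline in [
    ("original", original),
    ("after eliminate_contiguous", after_eliminate_contiguous),
    ("after fusion", after_fusion),
]:
    print(name, "->", layout_seen_by_gemm(pipeline))
```

In this model only the fused pipeline hands gemm a transposed input, which matches the sequence described in the comments above: removing the contiguous is harmless while add still standardises the output, and the failure appears once add is folded into the fused kernel.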

umangyadav self-assigned this Aug 30, 2023

umangyadav mentioned this issue Sep 14, 2023

Preserve layout of fused kernel for layernorm+pointwise #2185

Merged

TedThemistokleous linked a pull request Sep 14, 2023 that will close this issue

Ignore order of arguments while doing layernorm + pointwise fusion #2189

Merged

causten closed this as completed in #2185 Sep 15, 2023
