-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added support for standalone mlir-conv #2110
Conversation
7f14839
to
63bdf8f
Compare
Codecov Report
@@ Coverage Diff @@
## develop #2110 +/- ##
========================================
Coverage 91.43% 91.43%
========================================
Files 422 422
Lines 15771 15771
========================================
Hits 14420 14420
Misses 1351 1351 |
src/targets/gpu/fuse_mlir.cpp
Outdated
MIGRAPHX_PRED_MATCHER(is_supported_arch, instruction_ref) | ||
{ | ||
// TODO(ravil): debug | ||
static std::unordered_set<std::string> supported_consumer_archs{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This list isn't right.
Also, I'm pretty sure we're only meant to offload on gfx110x
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This list isn't right.
You are correct. I just used it to test the concept on MI200 because I am still waiting to get access to a Navi machine.
From what I understood, gfx11*
or gfx110*
denote Navi3X GPUs. Maybe we can use a regex to capture current and future Navi cards?
Yi just updated me a few hours ago that gfx1100 for Navi31, gfx1101 for Navi32
src/targets/gpu/fuse_mlir.cpp
Outdated
void apply(module_pass_manager& mpm, const match::matcher_result& r) const | ||
{ | ||
auto conv_based_op = r.result; | ||
// Only fuse with fp32/fp16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we need this restriction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, comment wrong
src/targets/gpu/target.cpp
Outdated
@@ -143,6 +144,7 @@ std::vector<pass> target::get_passes(migraphx::context& gctx, const compile_opti | |||
#endif | |||
dead_code_elimination{}, | |||
enable_pass(mlir_enabled(), fuse_mlir{&ctx}), | |||
enable_pass(mlir_enabled(), standalone_mlir{&ctx}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why's this a separate pass instead of being a rewrite within fuse_mlir()
(which could be renamed to something like mlir_offload()
)?
What do folks think here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a good question. @manupak and I was discussing this today - i.e., whether to add use one pass or two ones. My argument was that adding standalone conv
to fuse_mlir
was ambiguous. @manupak suggested that we could rename fuse_mlir
as you mentioned.
By and large, yes. We can rename fuse_mlir
to mlir_offload
which sounds more generic.
src/targets/gpu/fuse_mlir.cpp
Outdated
void standalone_mlir::apply(module_pass_manager& mpm) const | ||
{ | ||
#ifdef MIGRAPHX_MLIR | ||
match::find_matches(mpm, find_mlir_standalone_convolution_op{}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be moved to fuse_mlir::appy
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/targets/gpu/fuse_mlir.cpp
Outdated
#ifdef MIGRAPHX_MLIR | ||
|
||
namespace { | ||
MIGRAPHX_PRED_MATCHER(is_supported_arch, instruction_ref) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think this needs to be a matcher as it is not reading anything from the graph. It can be a vanilla function and we can call it to conditionally call match::find_matches(mpm, find_mlir_standalone_convolution_op{})
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! It something that we were discussing with @manupak today and came to the same conclusion. I will rework it
src/targets/gpu/fuse_mlir.cpp
Outdated
"gfx900", "gfx906", "gfx908", "gfx1030", "gfx940"}; | ||
|
||
// static std::unordered_set<std::string> supported_consumer_archs{"gfx1030"}; | ||
const auto device_name = trim(split_string(get_device_name(), ':').front()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The context
should be used to get the device name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see. Let me give a try.
* renamed the corresponding struct * addressed suggestions of PR ROCm#2110
* renamed the corresponding struct * addressed suggestions of PR ROCm#2110
50b1321
to
76fe008
Compare
I have no major concerns - I ll let maintainers approve it though coz I had some input into the code here. Only concern is this code is not tested at the minute. @pfultz2 since we add the flag to force standalone convs; we can test (the integration issues) in cdna using that flag OR do ya'll have Navi nodes to test this proper ? |
src/targets/gpu/CMakeLists.txt
Outdated
@@ -109,7 +109,7 @@ add_library(migraphx_gpu | |||
compiler.cpp | |||
device_name.cpp | |||
fuse_ck.cpp | |||
fuse_mlir.cpp | |||
mlir_offload.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be renamed in this PR. fuse_mlir is consistent with the other fuse passes. So if we rename this like this than we should rename all the other fuse passes. However, such refactoring should go into a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pfultz2, should I rename 1) just file names or 2) file names + the corresponding structures?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this moment, I just changed file names. Let me know if I should to adapt the stuct names as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should to adapt the stuct names as well
Yes, the class names should match the filenames.
* renamed the corresponding struct * addressed suggestions of PR ROCm#2110
161fc20
to
81aa217
Compare
@@ -35,15 +35,15 @@ namespace gpu { | |||
|
|||
MIGRAPHX_GPU_EXPORT bool mlir_enabled(); | |||
|
|||
struct MIGRAPHX_GPU_EXPORT fuse_mlir | |||
struct MIGRAPHX_GPU_EXPORT mlir_offload |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The class should be named fuse_mlir
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then it sounds as if we only provide fused operation. However, this PR aims at additionally providing standalone convs.
Moreover, we are going to add standalone GEMMs. Additionally, we would like to have an option to switch off fused ops based on the value of an env. variables. In this case, we would like to completely fall back to standalone GEMMs and CONVs taking them from rocMLIR; we think it may be help us during our performance analyses and monitoring.
By and large, the apply
method is not supposed to be focused on fused operations only; it is supposed to be generic. That is the reason why @krzysz00 and I proposed to rename the struct mlir_offload
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then it sounds as if we only provide fused operation.
For fuse_ck, we handle standalone gemms, for fuse_pointwise we handle standalone pointwise, and fuse_reduce we handle standalone reduce. So I dont see how it would be different here.
By and large, the apply method is not supposed to be focused on fused operations only; it is supposed to be generic.
Yea niether do fuse_ck, fuse_pointwise, and fuse_reduce focus on fused operations only.
That is the reason why @krzysz00 and I proposed to rename the struct mlir_offload.
Yes, but such renaming should go in a separate PR, because the other fuse passes need to renamed as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but such renaming should go in a separate PR, because the other fuse passes need to renamed as well.
Got your point! Ok, I will rework.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
* renamed the corresponding struct * addressed suggestions of PR ROCm#2110
e714ed3
to
4da4a3e
Compare
4da4a3e
to
1276c98
Compare
* renamed the corresponding struct * addressed suggestions of PR #2110
* renamed the corresponding struct * addressed suggestions of PR #2110
* renamed the corresponding struct * addressed suggestions of PR #2110
* renamed the corresponding struct * addressed suggestions of PR #2110
* renamed the corresponding struct * addressed suggestions of PR #2110
* renamed the corresponding struct * addressed suggestions of PR #2110
* renamed the corresponding struct * addressed suggestions of PR #2110
Hi @pfultz2. I would like to add support for rocmlir standalone convolution ops. @manupak and I expect that it will beneficial on Navi cards.
It is my draft. Could you suggest something to be improved?
My first question is whether to add this feature to the existing
fuse_mlir
or make to it separate?