Add support for axes input to reduce operators #2120
Conversation
music-dino commented Aug 24, 2023
- Add parsing support for axes as a non-constant input
- Update the reference implementation to handle this case
- Resolves Reduce operators axes input parse error migraphx-benchmark/AMDMIGraphX#68
@@ -80,11 +80,12 @@ template <class Derived>
struct reduce_op : op_name<Derived>
{
    std::vector<std::int64_t> axes{};
    bool noop_with_empty_axes = false;
We want reduce_op to always be a noop with empty axes. For this PR, just skip the handling of empty axes for now. In a future PR, we can update the onnx parser to insert extra operators for the empty-axes case that is not a noop.
Do you want the empty axes handling to be skipped, or changed even in the case when axes are constant or an attribute?
In the current code, that is handled according to the ONNX spec.
This field should be removed. We should by default be a noop with empty axes, and the parser can insert extra operators to handle the empty case for the ONNX spec.
In the constant axes case the parser already handles that, by not inserting an operator at all and just passing the input back as the output.
In the variable axes case it cannot be known whether a noop will occur until the runtime value of axes is checked.
So which operators could the parser insert in the variable axes case to handle it? The If operator?
This field needs to be removed from the operator.

> In the variable axes case it cannot be known whether a noop will occur until the runtime value of axes is checked. So which operators could the parser insert in the variable axes case to handle it? The If operator?

We can insert operators such as `equal` to detect this and then insert a `where` operator to pick between the reduce and an empty reduce (or a reduce with all axes for the `noop_with_empty_axes=false` case).
The Where would only have to be inserted when noop_with_empty_axes is true, and would be predicated on the size of the variable axes input. For empty axes the input tensor would need to be returned, while for non-empty axes a reduce would be performed.
This leads to an issue with Where due to the difference in output dimensions between the two branches, given that one would have the shape of the input and the other would have whichever shape the variable axes input dictates.
> The Where would only have to be inserted when noop_with_empty_axes is true

`reduce_op` should be a noop with empty axes by default. So then when `noop_with_empty_axes=false` the `where` would take two different reduces: `is_empty(axes) ? reduce_op(x, computed_axes) : reduce_op(x, axes)`. For both branches the axes will be passed as a variable input (even though the empty case will be taking a literal) so they produce the same output shape.
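A rough sketch of this suggestion for the `noop_with_empty_axes=false` case, written in the `add_instruction`/`make_op` style used elsewhere in this PR. The names `data`, `axes_input`, `all_axes_lit`, `is_axes_empty_bc`, and the choice of `reduce_sum` are placeholders for illustration, not the actual implementation:

```cpp
// Sketch only: `info`, `data` and `axes_input` are assumed to come from the onnx
// parser context, and a reduce op with an empty "axes" attribute is assumed to
// read its axes from a second input (the behavior this PR adds).

// Literal listing every axis of the input, used for the all-axes branch.
std::vector<int64_t> all_axes(data->get_shape().lens().size());
std::iota(all_axes.begin(), all_axes.end(), 0);
auto all_axes_lit =
    info.add_literal(literal{shape{shape::int64_type, {all_axes.size()}}, all_axes});

// Both branches take axes as a variable input so they produce the same output shape.
auto reduce_runtime = info.add_instruction(make_op("reduce_sum", {{"axes", {}}}), data, axes_input);
auto reduce_all     = info.add_instruction(make_op("reduce_sum", {{"axes", {}}}), data, all_axes_lit);

// is_axes_empty_bc: a boolean "axes is empty" condition broadcast to the output
// shape (see the multibroadcast snippet later in this thread).
auto result = info.add_instruction(make_op("where"), is_axes_empty_bc, reduce_all, reduce_runtime);
```

At evaluation time this still requires `where` to tolerate the differing static output shapes of the two branches, which is the issue discussed further below.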
Has this issue been resolved yet?
Hi @CharlieL7, not yet, I'd been on vacation for most of December so the PR was left unattended.
@music-dino can you fix tidy and resolve the merge issues?
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@           Coverage Diff            @@
##           develop    #2120      +/-   ##
===========================================
+ Coverage    91.75%   91.78%   +0.02%
===========================================
  Files          473      473
  Lines        17958    18011      +53
===========================================
+ Hits         16478    16531      +53
  Misses        1480     1480

☔ View full report in Codecov by Sentry.
Not sure what the status is on this PR.
argument compute(const shape& computed_shape,
                 const std::vector<int64_t>& reduce_axes,
                 argument& data_arg) const
Can we name this function something else so we're not getting into possible future trouble with how operation.hpp handles the compute function?
Sure.
@@ -80,11 +80,12 @@ template <class Derived>
struct reduce_op : op_name<Derived>
{
    std::vector<std::int64_t> axes{};
    bool noop_with_empty_axes = false;
Has this issue been resolved yet?
auto is_axes_empty_bc =
    info.add_instruction(make_op("multibroadcast"), is_axes_empty, reduce_all_axes);

return info.add_instruction(
@pfultz2 I've implemented operator parsing in the fashion you recommended. The `where` here will work during parsing, due to dynamic output shape equivalence between the branches. However, when evaluation time comes around, the output shapes will differ, which causes `where` to throw an exception (demonstrated by this test).
std::vector<int64_t> axes{1};
pm["axes"] = argument(shape{shape::int64_type, {1}}, axes.data());

auto result = p.eval(pm);
This test demonstrates the Where issue we discussed last week (here is the code path it tests).
Since all three input shapes of Where are the same ({1, 2}, {1, 2}, {1, 2}), parsing will succeed.
However, issues pop up when attempting to evaluate the graph.
Once the shapes are reified, the all-axes reduction will have an output shape of {1, 1, 1} and the other one {2, 1, 2}, causing this check to fail.
The `compute` method of `where` needs to be updated to remove the `dyn_output` usage so the output shape computation can happen lazily. For a scalar condition we can pick either branch regardless of the shape difference:
argument compute(shape output_shape, std::vector<argument> args) const
{
    // For a scalar condition, pick either branch regardless of the shape difference
    if(args[0].scalar())
        return args[args[0].at<bool>() ? 1 : 2].copy();
    // Compute the output shape lazily when it is dynamic
    if(output_shape.dynamic())
        output_shape = compute_shape(to_shapes(args));
    argument result{output_shape};
    visit_all(result, args[1], args[2])([&](auto output, const auto x, const auto y) {
        args[0].visit([&](const auto condition) {
            par_for(output_shape.elements(),
                    [&](auto i) { output[i] = condition[i] ? x[i] : y[i]; });
        });
    });
    return result;
}
Thanks for all your commits/responses so far. I have submitted some additional comments. PTAL.
std::vector<int32_t> gold{2, 6, 10};
EXPECT(results_vector == gold);
}
If there is not a test for default axes, with keepdims=1, the following (copied from Netron) would be a good test. Thanks.
# Assumes numpy is imported as np and onnx is imported; `expect` is the helper
# used by the ONNX backend operator tests.
shape = [3, 2, 2]
axes = np.array([], dtype=np.int64)
keepdims = 1
node = onnx.helper.make_node(
    "ReduceMean",
    inputs=["data", "axes"],
    outputs=["reduced"],
    keepdims=keepdims,
)
data = np.array(
    [[[5, 1], [20, 2]], [[30, 1], [40, 2]], [[55, 1], [60, 2]]],
    dtype=np.float32,
)
reduced = np.mean(data, axis=None, keepdims=keepdims == 1)
# print(reduced)
# [[[18.25]]]
expect(
    node,
    inputs=[data, axes],
    outputs=[reduced],
    name="test_reduce_mean_default_axes_keepdims_example",
)
Please note, the dimensions of the output tensor have to be verified as well as the values.
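For instance, a hypothetical assertion in the style of the existing tests in this PR; `p`, `pm`, and the expected lens are only illustrative, assuming the keepdims=1, default-axes ReduceMean case above:

```cpp
// Illustrative sketch only: check the output dimensions in addition to the values.
auto result = p.eval(pm).back();
EXPECT(result.get_shape().lens() == std::vector<std::size_t>{1, 1, 1});
```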
// Empty axes attribute indicates to the operator to look for axes in the inputs
auto reduce_op = make_op(op_name, {{"axes", {}}});

if(noop_with_empty_axes != 0)
This clause also needs to check that the axes input is indeed empty.
I'm sorry but I still fail to see why it should. Can you demonstrate with a particular situation for which it would be necessary, and how you see the code path for that situation, including the compute method of the operator?
We can't check that the axes are empty because we don't know how many axes there are until runtime.
(Had a discussion with Paul. The reason this logic works is because our reduce_op has some default behavior when the axes are empty: it acts as an identity operator. Hence it is taken care of somewhere else. This code doesn't explain that.)
So please add a comment here to give a hint:
"If axes is empty, the default behavior of reduce_op is to be an identity operator."
I added that default behavior, and that's what I was trying to explain. I've added the suggested comment. Thanks for reviewing.
… variable dynamic axes
// Depending on how axes is passed to the operator the output shape can be either dynamic or
// static.
//
// If the axes are passed as an attribute, or a constant input, we can precisely
We can't produce a static shape with a constant input because the input to compute_shape is just the shapes.
// static.
//
// If the axes are passed as an attribute, or a constant input, we can precisely
// determine which axes need to be collapsed even during parsing.
Remove `even during parsing`. There is no parsing here. The onnx parser has already constructed the operator.
Removed. I had intended to state something different, but I worded it poorly.
Please update with comments, as I have explained above. Approving this PR.