-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CPU][Ref] Fix Reduce ops to produce stable (default val) output for empty input #27438
[CPU][Ref] Fix Reduce ops to produce stable (default val) output for empty input #27438
Conversation
@@ -2274,6 +2274,14 @@ void Reduce::execute(dnnl::stream strm) { | |||
const uint8_t *src_data = srcMemPtr->getDataAs<const uint8_t>(); | |||
uint8_t *dst_data = dstMemPtr->getDataAs<uint8_t>(); | |||
|
|||
const auto& src_shape = srcMemPtr->getStaticDims(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am afraid this is not fully correct fix.
Reduce node semantics support post operations which mean several ops should be executed within single kernel.
For example Reduce + Add. It means we shouln't only fill ouput with correct default value, but also apply Add oepration to it.
@xuchen-intel Could you please take a look? Would appreciate if you can help to finilize the fix on CPU side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dmitry-gorokhov I agree. If the expected output shape is non-empty, then we should apply post ops fusion.
@mitruska Could you please add reduce_kernel_post_process(dst_data);
before the new added early return
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative way would be to disable post ops fusion here if input is empty, which would be more safe. After all this is a corner case to reduce dim size 0 to 1, any incompatibility to the variants of such case from the computation module would lead to incorrect/undefined results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative way would be to disable post ops fusion here if input is empty, which would be more safe. After all this is a corner case to reduce dim size 0 to 1, any incompatibility to the variants of such case from the computation module would lead to incorrect/undefined results.
@xuchen-intel Shapes might be dynamic. So we don't know in advance should the fusion be disabled or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right! Then @mitruska please let's add reduce_kernel_post_process(dst_data);
, with some test cases (e.g. here) to reproduce this issue beforehand if possible.
Applying reduce_kernel_post_process(dst_data);
should work for post ops cases. Otherwise, please let me know, if you need me to fix it in a separate PR for the CPU part. @dmitry-gorokhov
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dmitry-gorokhov, @xuchen-intel Thank you for your support! I've added the reduce_kernel_post_process(dst_data)
, but it needed some special conditions to make the tests passed.
The condition:
openvino/src/plugins/intel_cpu/src/nodes/reduce.cpp
Lines 2281 to 2283 in 99c1e89
const bool skip_post_process = getAlgorithm() == Algorithm::ReduceMean || attr.get()->post_ops_.len() == 0; | |
if (!skip_post_process) { | |
reduce_kernel_post_process(dst_data); |
was inspired by:
openvino/src/plugins/intel_cpu/src/nodes/reduce.cpp
Lines 2564 to 2565 in 99c1e89
apply_division = getAlgorithm() == Algorithm::ReduceMean && attr.get()->post_ops_.len() == 0; | |
apply_post_kernel = !apply_division; |
Moreover I was able to use only fuse "Swish" CPU tests, because other fusion tests pipeline was failing for 0 and dynamic cases. Those cases probably need some fusing test pipeline adjustment.
src/common/transformations/include/transformations/op_conversions/convert_reduce_to_pooling.hpp
Show resolved
Hide resolved
LGTM core part. |
Agreed with CPU maintainers that the work will be continued within a separate PR. |
### Details: - *The main part of this PR is contributed by #27438 - *My revision is placed in the last commit, regarding changes on Reduce node of CPU plugin, mainly about the following contents:* 1. [x64] Avoid the `divisor` in `reduce_kernel_post_process` to be zero, and enable post ops fusion of ReduceMean. 2. [x64] Add `axesZeroDim` and `axesZeroDimFusing` in test cases, so that all these new added test cases will go exactly to the new added "early return" code block, where input tensor is empty and output tensor is not. 3. [x64] For the case of empty input combined with low precision ops fusion, use intermediate buffer to set default results before post ops fusion. 4. [arm] `makeExecutor` is skipped for the case of empty input on ARM, because Acl library does not support empty input tensor (e.g., NEReduceMean::validate return error). Besides, because of early return, the executor won't be needed anyway. 5. [arm] ARM Transformations ConvertReduceProd(Min, Max, Sum) are disabled to avoid empty output. ### Tickets: - *[CVS-117469](https://jira.devtools.intel.com/browse/CVS-117469)* --------- Co-authored-by: mitruska <[email protected]>
Details:
Fix Reduce ops to produce stable output (aligned with default values instead of random values) for an empty input:
including update of CPU and reference implementations.
openvino/src/plugins/intel_cpu/src/nodes/reduce.cpp
Lines 2949 to 3023 in eabe528
Stable behavior for empty input has been requested for NNCF
Originally random values has been produced, because common shape inference of the Reduce can produce non empty output shape for an empty input, for example Reduce with keep_dims=False:
InputShape{2, 0}, reduce_axes=1, OutputShape{2}
Common CPU Reduce::isExecutable was returning
False
for an empty input tensor (with 0 dim in the input Shape), and the Reduce::execute has been not executed at all for such case.The proposal of updated behavior is to return
False
for an empty output shape (with 0 dim in the output Shape),but when the input is empty, and the output is not, the output will be filled with default value.
The approach I've tested within the second PR #27423 is similar, but it forces CPU Reduce output value to be always 0 (which is not fully compatible with the reference impl, TF and ONNX).
Note: This change is considered as backward compatible with relatively low risk.
Alternatively Reduce ops common
shape_inference
can be updated:or
But those options are currently considered as not backward compatible, and would require new version of each Reduce* op.
Tickets: