[ONNX] Fix sporadic results in BC #3081
Conversation
547 job - ONNX
549 ptq - passed
```diff
-    ra = fns.where(qval < level_high, qval / (qval - level_high) * right_border, left_border)
+    with warnings.catch_warnings():
+        # If `qval` is 0, `rb` will equal `right_border`, and we don't want to show an
+        # unnecessary division-by-zero warning. The same applies to (qval - level_high).
+        warnings.simplefilter("ignore")
+        ra_then_result = qval / (qval - level_high) * right_border
+        rb_then_result = (qval - level_high) / qval * left_border
+        ra = fns.where(qval < level_high, ra_then_result, left_border)
+        rb = fns.where(qval > 0.0, rb_then_result, right_border)
```
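For context, a minimal standalone sketch of the pattern above, with plain `numpy` standing in for `fns` and illustrative values (not the actual NNCF code):

```python
import warnings

import numpy as np

qval = np.array([0.0, 64.0, 255.0])
level_high = 255.0
left_border, right_border = -1.0, 1.0

with warnings.catch_warnings():
    # np.where evaluates both branches eagerly, so the divisions below run for
    # every element, including the ones the mask later discards; without the
    # suppression, qval == 0 and qval == level_high emit RuntimeWarnings.
    warnings.simplefilter("ignore")
    ra_then_result = qval / (qval - level_high) * right_border
    rb_then_result = (qval - level_high) / qval * left_border
    ra = np.where(qval < level_high, ra_then_result, left_border)
    rb = np.where(qval > 0.0, rb_then_result, right_border)

print(ra, rb)
```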
Could you please expand `def test_tune_range_zero_division_warning():`?
Done. The test passes on this PR and fails on develop.
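For illustration, a regression test along these lines might look like the following sketch. The `tune_range` signature, the `Tensor` import path, and the border values that trigger the zero divisions are all assumptions here, not the actual test from this PR:

```python
import warnings

import numpy as np

from nncf.quantization.fake_quantize import tune_range  # assumed import path
from nncf.tensor import Tensor  # assumed import path


def test_tune_range_zero_division_warning():
    # Borders chosen so that the internal `qval` plausibly hits 0 and
    # `level_high`, the two cases that used to emit RuntimeWarnings.
    left_border = Tensor(np.array([0.0, -1.0], dtype=np.float32))
    right_border = Tensor(np.array([1.0, 0.0], dtype=np.float32))
    with warnings.catch_warnings():
        warnings.simplefilter("error")  # turn any warning into a test failure
        tune_range(left_border, right_border, num_bits=8, unify_zp=False)
```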
@kshpv, feel free to merge this PR as you need.
nncf/quantization/fake_quantize.py (outdated)
```python
warnings.simplefilter("ignore")
rb_then_result = (qval - level_high) / qval * left_border
# Avoid division by zero
qval_nonzero = fns.where(qval == 0, fns.ones_like(qval), qval)
```
Please check the performance of the function after your changes. @nikita-savelyevv, it looks like we discussed this already. Could you remind us of the solution?
As I remember, ignoring the warning was the solution with the least impact on performance.
So, should I roll back or keep this one?
I'm inclined towards moving the line under the `catch_warnings` context manager.
Perf measurement:
- with the `warnings` context manager: ~17.648 sec
- without the context manager, but with the following: ~20.809 sec

```python
qval_nonzero = fns.where(qval == 0, fns.ones_like(qval), qval)
qval_not_high = fns.where(qval - level_high == 0, fns.ones_like(qval), qval - level_high)
```
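For reference, a micro-benchmark along these lines could produce such numbers (an illustrative sketch with plain `numpy` in place of `fns`; the array size, values, and repetition count are assumptions):

```python
import timeit
import warnings

import numpy as np

qval = np.random.uniform(0.0, 255.0, size=10_000_000).round()
level_high = 255.0
left_border, right_border = -1.0, 1.0

def with_warning_suppression():
    # Divide everywhere, silencing the divide-by-zero RuntimeWarning.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        rb_then = (qval - level_high) / qval * left_border
    return np.where(qval > 0.0, rb_then, right_border)

def with_masked_denominator():
    # Pre-mask the denominator so the division never hits zero.
    qval_nonzero = np.where(qval == 0, 1.0, qval)
    rb_then = (qval - level_high) / qval_nonzero * left_border
    return np.where(qval > 0.0, rb_then, right_border)

print("suppress:", timeit.timeit(with_warning_suppression, number=20))
print("mask:    ", timeit.timeit(with_masked_denominator, number=20))
```

The masked variant pays for an extra full-array `where` pass per denominator, which is consistent with the slowdown reported above.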
@kshpv, you can use just `1.0` instead of `fns.ones_like(qval)`:

```python
qval_nonzero = fns.where(qval == 0, 1.0, qval)
qval_not_high = fns.where(qval - level_high == 0, 1.0, qval - level_high)
```
I checked your proposed version, and it has the same performance as the one with `fns.ones_like(qval)` :(
It's more about avoiding the creation of extra tensor instances than about performance.
I suggest using the implementation with the best performance.
Rolled back.
Found an open issue describing the same problem in ONNX Runtime: microsoft/onnxruntime#21922
LGTM
Changes
This PR addresses an issue with `ONNXRuntime==1.19.2` where a tensor used as both an input and an output of a model shares the same memory. This causes unexpected behavior: updating the input tensor inadvertently modifies the statistics data due to the memory overlap.

The issue was confirmed by calling `np.shares_memory(input_data['image'], outputs['image'])`, which returned `True`, indicating that the input and output tensors share memory. After applying the proposed changes, the same check returns `False`, confirming that the memory sharing is resolved.

To fix this, the `ONNXEngine` logic has been updated to create a copy of any output tensor that is also used as a model input. This ensures that the input tensor and the statistics data remain independent, avoiding unintended side effects (see the sketch below).

Other changes:
- Merge `RawReducer` and `NoopReducer`
- Minor fixes (remove warnings + fix a bug in BC)
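A minimal NumPy sketch of the aliasing check and the copy-based fix (the `image` tensor name follows the description above; the dict comprehension illustrates the idea behind the `ONNXEngine` change, not its actual code):

```python
import numpy as np

# Reproduce the aliasing symptom: the runtime returns an output backed by
# the same buffer as the input, so mutating the input corrupts statistics.
input_data = {"image": np.zeros((1, 3, 4, 4), dtype=np.float32)}
outputs = {"image": input_data["image"]}  # aliased output, as observed

assert np.shares_memory(input_data["image"], outputs["image"])

# The fix: copy any output tensor whose name is also a model input,
# breaking the aliasing between inputs and collected outputs.
input_names = set(input_data)
outputs = {name: (t.copy() if name in input_names else t) for name, t in outputs.items()}

assert not np.shares_memory(input_data["image"], outputs["image"])
```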
Reason for changes
Regression
Related tickets
156025
Tests
PTQ run 549