Parser changes to handle MatMulIntegerToFloat #3445

Status: Open. Wants to merge 26 commits into base: develop.
Conversation

@TedThemistokleous (Collaborator) commented Sep 16, 2024

Changes to the MatMul parser to handle the Microsoft Contrib operator MatMulIntegerToFloat.

Since we have the scale and zero points in our operands, we can perform a multiply after the int8 biases are added, and then insert a regular dot on the scaled input values, which should give the same output as the original input data types.
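For illustration, here is a minimal numpy sketch of that decomposition (a hedged reference, not the parser code; the function name and argument layout are assumptions based on the contrib op's inputs of two quantized matrices plus scales, optional zero points, and an optional bias):

```python
import numpy as np

def matmul_integer_to_float_ref(a, b, a_scale, b_scale,
                                a_zp=0, b_zp=0, bias=None):
    # Dequantize each int8/uint8 input: (x - zero_point) * scale
    a_deq = (a.astype(np.int32) - a_zp) * a_scale
    b_deq = (b.astype(np.int32) - b_zp) * b_scale
    # A regular dot on the dequantized values matches the fused op's output
    out = a_deq @ b_deq
    if bias is not None:
        out = out + bias  # bias broadcasts along the output columns
    return out.astype(np.float32)
```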

Able to leverage the existing set of tests for matmul

Needs #3526, as this PR has uncovered a bug with dequantizelinear.

@TedThemistokleous self-assigned this Sep 16, 2024
@TedThemistokleous (Collaborator, Author) commented Sep 16, 2024

TODO:

  • Add parser tests for error cases
  • Add parser tests for base case
  • Add parser test for bias and zero point cases
  • Add verify tests for all of the above

@TedThemistokleous added labels: onnxruntime (PR changes interaction between MIGraphX and Onnxruntime), Onnx Operators (Adding or modifying an Onnx Operator in the MIGraphX codebase), UAI. Sep 16, 2024
codecov bot commented Sep 16, 2024

Codecov Report

Attention: Patch coverage is 87.50000% with 11 lines in your changes missing coverage. Please review.

Project coverage is 92.21%. Comparing base (2e59073) to head (13063df).

Files with missing lines: src/onnx/parse_matmul.cpp (patch coverage 87.50%, 11 lines missing ⚠️)
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3445      +/-   ##
===========================================
- Coverage    92.23%   92.21%   -0.02%     
===========================================
  Files          514      514              
  Lines        21746    21810      +64     
===========================================
+ Hits         20057    20113      +56     
- Misses        1689     1697       +8     


Updated parser to handle bias case as well as bad scale conditions

Initial float/half tests
bad scale tests
bad bias tests
avoid tidy screaming about complexity
@TedThemistokleous force-pushed the add_matmulintegertofloat_contrib_op branch from 74f8ae0 to cdb307d (September 17, 2024 15:48)
TedThemistokleous and others added 2 commits October 11, 2024 17:45
Use dequantizelinear, which eliminates the need to add in shifts due to int8/uint8 mismatches.

still needs parser tests
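For context on that commit: dequantizelinear computes (x - zero_point) * scale in a wider type, so int8 and uint8 inputs take the same path and no manual shift between the two types is needed. A minimal numpy illustration (a sketch, not the MIGraphX implementation):

```python
import numpy as np

def dequantize_linear(x, scale, zero_point):
    # Widen before subtracting so int8 and uint8 behave identically;
    # no manual +/-128 shift between the two types is required.
    return (x.astype(np.int32) - np.int32(zero_point)) * scale

i8 = np.array([-128, 0, 127], dtype=np.int8)
u8 = np.array([0, 128, 255], dtype=np.uint8)
# Same real values once each is dequantized with its own zero point:
print(dequantize_linear(i8, 0.5, 0))    # [-64.  0.  63.5]
print(dequantize_linear(u8, 0.5, 128))  # [-64.  0.  63.5]
```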
@TedThemistokleous marked this pull request as ready for review October 11, 2024 23:26
src/onnx/parse_matmul.cpp:
MIGRAPHX_THROW("PARSE_QUANT_DOT_SCALED: Bias must have same dim as matrix B column");
}

has_valid_scale_bias = true;
Contributor:
As against invalid? ;-)

Collaborator (Author):

If the scale bias doesn't exist, then no bias is added at the end of the MatMulIntegerToFloat.

Contributor:

I was simply wondering whether has_scale_bias is what was intended. :-)

src/onnx/parse_matmul.cpp:
return dequantized_op;
}

static instruction_ref handle_scaled_output(const onnx_parser::node_info& info,
Contributor:

Too many parameters. Ideally they should be handled by a struct parameter.

Collaborator (Author):

They're the same number of parameters gathered by the operator. These are all needed for the dequantize steps and for adding the proper unsqueeze->transpose paths. Order matters here with respect to matrix input A or B.

Use the parsed-in op name in error messages to aid logging should parser errors occur.
Change naming to be agnostic of the input index.
@TedThemistokleous force-pushed the add_matmulintegertofloat_contrib_op branch from 42b787d to 9660e11 (October 31, 2024 22:00)
src/onnx/parse_matmul.cpp:
bool a1_has_no_zp = (a1 == zp_a1);

auto unsq_scale_a0 = info.add_instruction(make_op("unsqueeze", {{"axes", {-1}}}), scale_a0);
if(not a0_has_no_zp)
Contributor:

(Nit) Style: perhaps two negatives are not required if there is a variable like a0_has_zp.

Ted Themistokleous added 2 commits November 7, 2024 14:03
Clean up uint8 handling for quant_dot. Fix tests
@lakhinderwalia (Contributor) left a comment:

Thank you for following up the comments. Approved.

@CharlieL7 (Collaborator) left a comment:

From our conversation, need to test/handle higher dimensional matrix contractions (matrix mul). Also transpose with permutation = {0, 1} probably does nothing.

Remove the transpose ops here, as this was masking an issue related to input checks for the scale and bias inputs. The scale value is supposed to be matched against the input column value instead of the row. With that in mind, we can remove the transpose here.

Updated parser tests to handle this correctly. Retested and validated output, since the transpose changes the math here.
@TedThemistokleous (Collaborator, Author) commented:
From our conversation, need to test/handle higher dimensional matrix contractions (matrix mul). Also transpose with permutation = {0, 1} probably does nothing.

You're right, but I think when I tried it, this wasn't doing what I thought in conjunction with the squeeze at -1. Fixed this and realized I don't need the transpose here.
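(Indeed, on a 2-D tensor a transpose with permutation {0, 1} is the identity; in numpy terms:)

```python
import numpy as np

x = np.arange(6).reshape(2, 3)
# Permutation (0, 1) keeps both axes in place, so x is unchanged
assert (np.transpose(x, (0, 1)) == x).all()
```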

Also, I think this should handle N-dim now, since everything is broadcast on a per-column basis for the scale/bias inputs, which should be matched to the column of the input matrix.
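As a sanity check of that broadcasting claim, a small numpy sketch (shapes are illustrative assumptions): a per-column scale of shape (n,) broadcasts across any number of leading batch dimensions:

```python
import numpy as np

batch, m, k, n = 2, 3, 4, 5
a = np.random.randint(-128, 128, size=(batch, m, k), dtype=np.int8)
b = np.random.randint(-128, 128, size=(batch, k, n), dtype=np.int8)
b_scale = np.random.rand(n).astype(np.float32)  # one scale per column of B

deq_b = b.astype(np.float32) * b_scale  # (batch, k, n) * (n,) broadcasts over the last axis
out = a.astype(np.float32) @ deq_b      # -> (batch, m, n)
assert out.shape == (batch, m, n)
```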

@CharlieL7 (Collaborator) left a comment:

Minor: would be good to have a higher ndim test

Comment on lines +290 to +294
unsq_zp_a0 = info.add_instruction(make_op("unsqueeze", {{"axes", {0}}}), zp_a0);
if(zp_a0->get_shape().scalar())
{
unsq_zp_a0 =
info.add_instruction(make_op("unsqueeze", {{"axes", {0}}}), unsq_zp_a0);
Collaborator:
Minor: Can these unsqueeze operators be merged into a single unsqueeze, either {{axes, {0}}} or {{axes, {0, 1}}}?

Collaborator (Author):

I don't think so here, as one of them assumes the input is scalar and needs the additional 1 dimension added.
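A numpy stand-in for the shapes involved (hypothetical values): the first unsqueeze applies in all cases, and only a scalar zero point needs the second, conditional one:

```python
import numpy as np

zp_vec    = np.zeros((4,), dtype=np.int8)  # per-column zero point
zp_scalar = np.zeros((),   dtype=np.int8)  # scalar zero point

# The unconditional unsqueeze at axis 0 suffices for the 1-D case: (4,) -> (1, 4)
print(np.expand_dims(zp_vec, 0).shape)                        # (1, 4)
# Only the scalar needs the second unsqueeze: () -> (1,) -> (1, 1)
print(np.expand_dims(np.expand_dims(zp_scalar, 0), 0).shape)  # (1, 1)
```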

@TedThemistokleous (Collaborator, Author) commented:

Minor: would be good to have a higher ndim test

Added

Modified input checks to ensure we check the last dimension correctly for multi-input in MatMulIntegerToFloat.

Validated output using numpy for this.
@TedThemistokleous force-pushed the add_matmulintegertofloat_contrib_op branch from 6b3d516 to 6e2a36c (December 11, 2024 03:44)
@migraphx-bot (Collaborator) commented:
Test    Batch    Rate new (13063d)    Rate old (64fe0c)    Diff
torchvision-resnet50 64 3,254.74 3,254.94 -0.01%
torchvision-resnet50_fp16 64 6,970.71 6,977.96 -0.10%
torchvision-densenet121 32 2,435.00 2,436.55 -0.06%
torchvision-densenet121_fp16 32 4,077.51 4,076.26 0.03%
torchvision-inceptionv3 32 1,628.16 1,627.46 0.04%
torchvision-inceptionv3_fp16 32 2,742.57 2,741.58 0.04%
cadene-inceptionv4 16 765.25 764.31 0.12%
cadene-resnext64x4 16 813.34 813.14 0.02%
slim-mobilenet 64 7,466.12 7,466.83 -0.01%
slim-nasnetalarge 64 209.02 209.03 -0.00%
slim-resnet50v2 64 3,440.77 3,443.32 -0.07%
bert-mrpc-onnx 8 1,147.21 1,144.17 0.27%
bert-mrpc-tf 1 484.14 474.21 2.09%
pytorch-examples-wlang-gru 1 422.01 416.53 1.32%
pytorch-examples-wlang-lstm 1 387.79 384.23 0.93%
torchvision-resnet50_1 1 769.92 783.29 -1.71%
cadene-dpn92_1 1 399.09 398.94 0.04%
cadene-resnext101_1 1 382.07 383.46 -0.36%
onnx-taau-downsample 1 345.93 345.52 0.12%
dlrm-criteoterabyte 1 33.31 33.33 -0.06%
dlrm-criteoterabyte_fp16 1 52.74 52.73 0.02%
agentmodel 1 8,123.54 8,127.83 -0.05%
unet_fp16 2 58.78 58.89 -0.19%
resnet50v1_fp16 1 929.72 938.63 -0.95%
resnet50v1_int8 1 1,015.77 984.73 3.15% 🔆
bert_base_cased_fp16 64 1,169.67 1,170.23 -0.05%
bert_large_uncased_fp16 32 363.04 362.94 0.03%
bert_large_fp16 1 198.06 200.28 -1.11%
distilgpt2_fp16 16 2,200.80 2,198.50 0.10%
yolov5s 1 522.19 531.33 -1.72%
tinyllama 1 43.40 43.34 0.12%
vicuna-fastchat 1 181.22 172.03 5.34% 🔆
whisper-tiny-encoder 1 418.05 418.00 0.01%
whisper-tiny-decoder 1 428.62 428.83 -0.05%

Check results before merge 🔆

@migraphx-bot (Collaborator) commented:

✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance
✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance
✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance
✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance
✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance
✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance
✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance
✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance
✅ agentmodel: PASSED: MIGraphX meets tolerance
✅ unet: PASSED: MIGraphX meets tolerance
✅ resnet50v1: PASSED: MIGraphX meets tolerance
✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance
🔴 bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
✅ bert_large: PASSED: MIGraphX meets tolerance
✅ yolov5s: PASSED: MIGraphX meets tolerance
✅ tinyllama: PASSED: MIGraphX meets tolerance
✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance
✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance
✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance
✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance
