Fix exception raised for quantization block-size = 1 in quark-generated int4 models. #3510

lakhinderwalia · 2024-10-05T16:43:04Z

This PR fixes an exception that was raised while parsing block quantization with block size = 1.
This corner case occurs in quark-generated int4 graphs.
A test case, built from a snippet of a quark-generated model, verifies that no exception is raised.

(Additionally, a test is being moved in response to feedback on a prior PR. That test is unrelated to this exception.)
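
For readers unfamiliar with the corner case, here is a minimal sketch of how a parser might resolve the block size of a block-quantized tensor. It is not the MIGraphX implementation and does not reproduce the change in this PR. The function name `resolve_block_size`, its parameters, and the rule that a `block_size` of 0 means "not specified, infer it from the shapes" are all assumptions made for illustration. The point is only that a block size of 1 (one scale per element along the quantization axis) is a legitimate result and should not be rejected.

```python
import math


def resolve_block_size(x_dim: int, scale_dim: int, block_size: int = 0) -> int:
    """Return the effective block size along the quantization axis.

    Hypothetical helper, for illustration only.
    x_dim:      length of the quantized input along the axis.
    scale_dim:  length of the scale tensor along the same axis.
    block_size: requested block size; 0 is assumed to mean "unspecified".
    """
    if x_dim <= 0 or scale_dim <= 0:
        raise ValueError("dimensions must be positive")
    if block_size < 0:
        raise ValueError("block_size must be non-negative")

    if block_size == 0:
        # Infer the smallest block size for which scale_dim blocks cover x_dim.
        # When scale_dim == x_dim this yields 1, which is a valid block size.
        return math.ceil(x_dim / scale_dim)

    # An explicit block size must be consistent with the two shapes.
    if math.ceil(x_dim / block_size) != scale_dim:
        raise ValueError(
            f"block_size {block_size} is inconsistent with "
            f"x_dim {x_dim} and scale_dim {scale_dim}"
        )
    return block_size
```

Under these assumptions, `resolve_block_size(8, 8)` and `resolve_block_size(8, 8, 1)` both resolve to a block size of 1 without raising, while an inconsistent request such as `resolve_block_size(64, 2, 16)` raises `ValueError`.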

codecov · 2024-10-05T19:46:16Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.05%. Comparing base (b923336) to head (520ce0d).
Report is 6 commits behind head on develop.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #3510      +/-   ##
===========================================
+ Coverage    92.02%   92.05%   +0.02%     
===========================================
  Files          509      509              
  Lines        21014    21014              
===========================================
+ Hits         19339    19345       +6     
+ Misses        1675     1669       -6


migraphx-bot · 2024-10-05T23:59:44Z

Test	Batch	Rate new 520ce0	Rate old b92333	Diff	Compare
torchvision-resnet50	64	3,261.36	3,263.77	-0.07%	✅
torchvision-resnet50_fp16	64	6,994.55	6,996.15	-0.02%	✅
torchvision-densenet121	32	2,432.85	2,432.72	0.01%	✅
torchvision-densenet121_fp16	32	4,093.37	4,082.35	0.27%	✅
torchvision-inceptionv3	32	1,638.18	1,637.26	0.06%	✅
torchvision-inceptionv3_fp16	32	2,764.91	2,763.65	0.05%	✅
cadene-inceptionv4	16	776.22	775.75	0.06%	✅
cadene-resnext64x4	16	808.87	808.46	0.05%	✅
slim-mobilenet	64	7,531.01	7,531.96	-0.01%	✅
slim-nasnetalarge	64	211.41	211.47	-0.03%	✅
slim-resnet50v2	64	3,499.64	3,502.05	-0.07%	✅
bert-mrpc-onnx	8	1,150.24	1,151.10	-0.07%	✅
bert-mrpc-tf	1	465.58	489.88	-4.96%	🔴
pytorch-examples-wlang-gru	1	416.56	426.32	-2.29%	✅
pytorch-examples-wlang-lstm	1	393.65	367.19	7.21%	🔆
torchvision-resnet50_1	1	823.25	822.42	0.10%	✅
cadene-dpn92_1	1	401.99	410.84	-2.15%	✅
cadene-resnext101_1	1	383.53	381.55	0.52%	✅
onnx-taau-downsample	1	342.36	343.34	-0.29%	✅
dlrm-criteoterabyte	1	33.33	33.37	-0.13%	✅
dlrm-criteoterabyte_fp16	1	52.75	52.75	-0.01%	✅
agentmodel	1	9,697.91	8,588.85	12.91%	🔆
unet_fp16	2	58.80	58.84	-0.07%	✅
resnet50v1_fp16	1	998.58	922.91	8.20%	🔆
resnet50v1_int8	1	1,003.29	969.45	3.49%	🔆
bert_base_cased_fp16	64	1,171.90	1,170.56	0.11%	✅
bert_large_uncased_fp16	32	363.39	363.70	-0.09%	✅
bert_large_fp16	1	198.76	200.51	-0.87%	✅
distilgpt2_fp16	16	2,200.83	2,200.62	0.01%	✅
yolov5s	1	544.08	546.87	-0.51%	✅
tinyllama	1	43.46	43.49	-0.06%	✅
vicuna-fastchat	1	172.42	170.79	0.95%	✅
whisper-tiny-encoder	1	418.15	419.48	-0.32%	✅
whisper-tiny-decoder	1	426.65	435.90	-2.12%	✅

This build is not recommended to merge 🔴

migraphx-bot · 2024-10-05T23:59:45Z

✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance

✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance

✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance

✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance

✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

✅ agentmodel: PASSED: MIGraphX meets tolerance

✅ unet: PASSED: MIGraphX meets tolerance

✅ resnet50v1: PASSED: MIGraphX meets tolerance

✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

🔴 bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

✅ bert_large: PASSED: MIGraphX meets tolerance

✅ yolov5s: PASSED: MIGraphX meets tolerance

✅ tinyllama: PASSED: MIGraphX meets tolerance

✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance

✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance

lakhinderwalia requested a review from causten as a code owner October 5, 2024 16:43

lakhinderwalia self-assigned this Oct 5, 2024

lakhinderwalia requested review from CharlieL7 and pfultz2 and removed request for causten October 5, 2024 16:43

lakhinderwalia force-pushed the lw/block_quant_fix branch from 6e23d3a to 0e7062b Compare October 5, 2024 18:15

Fix default block-size for quark-generated block-quantized models

520ce0d

lakhinderwalia force-pushed the lw/block_quant_fix branch from 0e7062b to 520ce0d Compare October 5, 2024 19:34

lakhinderwalia linked an issue Oct 5, 2024 that may be closed by this pull request

[INT4] Compress model by quantizing weights to int4 #3307

Open

18 tasks

causten requested a review from TedThemistokleous October 7, 2024 14:08

TedThemistokleous added the roadmap (Tasks to finish for a release), bugfix (Fixes a bug found in the code.), and INT4 labels Oct 7, 2024

TedThemistokleous approved these changes Oct 7, 2024


pfultz2 approved these changes Oct 7, 2024


causten merged commit b9fe915 into develop Oct 8, 2024
48 checks passed

causten deleted the lw/block_quant_fix branch October 8, 2024 19:19
