
Performance Regression or Improvement: Pytorch image classification on 50k images of size 224 x 224 with resnet 152 with Tesla T4 GPU:mean_load_model_latency_milli_secs #27077

Closed
github-actions bot opened this issue Jun 8, 2023 · 6 comments
Labels
awaiting triage, perf-alert (Automatically filed performance-related alerts.)

Comments

@github-actions
Contributor

github-actions bot commented Jun 8, 2023

Performance change found in the
test: Pytorch image classification on 50k images of size 224 x 224 with resnet 152 with Tesla T4 GPU:apache_beam.testing.benchmarks.inference.pytorch_image_classification_benchmarks for the metric: mean_load_model_latency_milli_secs.

For more information on how to triage the alerts, please see the "Triage performance alert issues" section of the README.

timestamp: Thu Jun  8 21:01:58 2023, metric_value: `200115.25925925927` <---- Anomaly
timestamp: Thu Jun  8 18:29:14 2023, metric_value: `134762.9642857143`
timestamp: Sun Jun  4 20:11:58 2023, metric_value: `128878.32142857143`
timestamp: Sat Jun  3 20:11:16 2023, metric_value: `132582.9642857143`
timestamp: Fri Jun  2 20:35:07 2023, metric_value: `136720.82142857142`
timestamp: Thu Jun  1 20:18:39 2023, metric_value: `141940.48214285713`
timestamp: Wed May 31 20:17:18 2023, metric_value: `139211.85714285713`
timestamp: Tue May 30 20:16:01 2023, metric_value: `121131.6909090909`
timestamp: Mon May 29 20:11:38 2023, metric_value: `122672.34545454546`
timestamp: Sun May 28 20:10:33 2023, metric_value: `117560.10909090909`
timestamp: Sat May 27 20:10:06 2023, metric_value: `118506.76785714286` 
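One quick way to gauge how far the flagged June 8 value sits from recent history is to compare it with the median of the runs listed before it. The sketch below does that with the values copied from this alert. It is a simple median-based check chosen for illustration; it is not necessarily the detection method the alerting bot uses, and `relative_change_vs_median` is a hypothetical helper.

```python
from statistics import median

# Values (ms) copied from the June 8 alert above, newest first.
values_newest_first = [
    200115.25925925927,  # Thu Jun 8 21:01:58 2023 (flagged)
    134762.9642857143,
    128878.32142857143,
    132582.9642857143,
    136720.82142857142,
    141940.48214285713,
    139211.85714285713,
    121131.6909090909,
    122672.34545454546,
    117560.10909090909,
    118506.76785714286,
]


def relative_change_vs_median(flagged, history):
    """Hypothetical helper: (flagged - median(history)) / median(history)."""
    baseline = median(history)
    return (flagged - baseline) / baseline


# Split the flagged (newest) value from the runs that preceded it.
flagged, *history = values_newest_first
change = relative_change_vs_median(flagged, history)
print(f"{change:.1%}")
```

On these numbers the flagged value is roughly 53% above the median of the ten earlier runs. A median is used instead of a mean so that a single earlier outlier does not move the baseline much.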
github-actions bot added the awaiting triage and perf-alert labels Jun 8, 2023
@AnandInguva
Contributor

According to the published metadata, the anomaly was actually detected on June 8th, 2023. There is a bug in the UI that causes it not to point to the correct anomaly.

@github-actions
Contributor Author

github-actions bot commented Jul 6, 2023

Performance change found in the
test: pytorch_image_classification_benchmarks-resnet152-GPU-mean_load_model_latency_milli_secs for the metric: mean_load_model_latency_milli_secs.

For more information on how to triage the alerts, please see the "Triage performance alert issues" section of the README.

Test description: Pytorch image classification on 50k images of size 224 x 224 with resnet 152 with Tesla T4 GPU. Test link -

test : 'apache_beam.testing.benchmarks.inference.pytorch_image_classification_benchmarks',
Test dashboard - http://104.154.241.245/d/ZpS8Uf44z/python-ml-runinference-benchmarks?orgId=1&viewPanel=7

timestamp: Thu Jul  6 20:18:41 2023, metric_value: 211548.54
timestamp: Wed Jul  5 20:28:56 2023, metric_value: 205620.69
timestamp: Tue Jul  4 20:25:33 2023, metric_value: 205096.16 <---- Anomaly
timestamp: Mon Jul  3 20:30:00 2023, metric_value: 208163.29
timestamp: Sun Jul  2 20:24:06 2023, metric_value: 195112.75
timestamp: Sat Jul  1 20:23:14 2023, metric_value: 195221.70
timestamp: Fri Jun 30 20:29:06 2023, metric_value: 197976.39
timestamp: Wed Jun 28 20:36:25 2023, metric_value: 197206.00
timestamp: Sat Jun 24 20:22:25 2023, metric_value: 173418.86
timestamp: Thu Jun 22 20:33:39 2023, metric_value: 194870.17
timestamp: Wed Jun 21 20:31:24 2023, metric_value: 191692.80 
timestamp: Tue Jun 20 20:41:06 2023, metric_value: 196980.26
timestamp: Mon Jun 12 20:28:45 2023, metric_value: 181619.41
timestamp: Sun Jun 11 20:21:29 2023, metric_value: 170558.29
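Unlike the June 8 spike, the value flagged here (Jul 4) is not the largest in the list, which suggests a shift in level rather than one outlier. A minimal way to see that is to compare the mean of the runs before the flagged point with the mean from the flagged point onward. This is a simple before/after mean comparison used for illustration, not the change-point algorithm behind the alert (which this thread does not describe); `means_around` is a hypothetical helper.

```python
from statistics import mean

# (date, value in ms) pairs copied from the July 6 alert above, newest first.
newest_first = [
    ("Jul 6", 211548.54), ("Jul 5", 205620.69), ("Jul 4", 205096.16),
    ("Jul 3", 208163.29), ("Jul 2", 195112.75), ("Jul 1", 195221.70),
    ("Jun 30", 197976.39), ("Jun 28", 197206.00), ("Jun 24", 173418.86),
    ("Jun 22", 194870.17), ("Jun 21", 191692.80), ("Jun 20", 196980.26),
    ("Jun 12", 181619.41), ("Jun 11", 170558.29),
]

# Put the series in chronological order before splitting it.
chronological = list(reversed(newest_first))
dates = [date for date, _ in chronological]
values = [value for _, value in chronological]


def means_around(series, split):
    """Hypothetical helper: mean of series[:split] and mean of series[split:]."""
    return mean(series[:split]), mean(series[split:])


split = dates.index("Jul 4")  # the flagged run starts the "after" segment
before, after = means_around(values, split)
print(f"before: {before:.0f} ms, after: {after:.0f} ms, "
      f"change: {(after - before) / before:+.1%}")
```

Here the flagged run and the two after it average about 8.5% above the eleven runs before it.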

@github-actions
Contributor Author

Performance change found in the
test: pytorch_image_classification_benchmarks-resnet152-GPU-mean_load_model_latency_milli_secs for the metric: mean_load_model_latency_milli_secs.

For more information on how to triage the alerts, please see the "Triage performance alert issues" section of the README.

Test description: Pytorch image classification on 50k images of size 224 x 224 with resnet 152 with Tesla T4 GPU. Test link -

test : 'apache_beam.testing.benchmarks.inference.pytorch_image_classification_benchmarks',
Test dashboard - http://104.154.241.245/d/ZpS8Uf44z/python-ml-runinference-benchmarks?orgId=1&viewPanel=7


timestamp: Tue Jul 18 20:18:01 2023, metric_value: 169408.30
timestamp: Mon Jul 17 20:17:35 2023, metric_value: 179341.26
timestamp: Sun Jul 16 20:12:41 2023, metric_value: 163747.60
timestamp: Fri Jul 14 20:20:19 2023, metric_value: 172176.19
timestamp: Thu Jul 13 20:23:24 2023, metric_value: 181711.54
timestamp: Tue Jul 11 20:34:43 2023, metric_value: 193342.36
timestamp: Mon Jul 10 20:19:11 2023, metric_value: 189409.14
timestamp: Sat Jul  8 20:11:56 2023, metric_value: 157463.11
timestamp: Fri Jul  7 20:20:13 2023, metric_value: 181411.25 <---- Anomaly
timestamp: Thu Jul  6 20:18:41 2023, metric_value: 211548.54
timestamp: Wed Jul  5 20:28:56 2023, metric_value: 205620.69
timestamp: Tue Jul  4 20:25:33 2023, metric_value: 205096.16
timestamp: Mon Jul  3 20:30:00 2023, metric_value: 208163.29
timestamp: Sun Jul  2 20:24:06 2023, metric_value: 195112.75
timestamp: Sat Jul  1 20:23:14 2023, metric_value: 195221.70
timestamp: Fri Jun 30 20:29:06 2023, metric_value: 197976.39
timestamp: Wed Jun 28 20:36:25 2023, metric_value: 197206.00
timestamp: Sat Jun 24 20:22:25 2023, metric_value: 173418.86
timestamp: Thu Jun 22 20:33:39 2023, metric_value: 194870.17
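The issue title covers both regressions and improvements, and this alert (Jul 7) marks a drop in load-model latency. For a lower-is-better metric like `mean_load_model_latency_milli_secs`, a drop is an improvement. The sketch below labels the direction of a change by comparing mean values before and after the flagged run; `classify_change` and its `tolerance` parameter are hypothetical, chosen only to illustrate the idea.

```python
from statistics import mean


def classify_change(before, after, lower_is_better=True, tolerance=0.0):
    """Hypothetical helper: label a change as regression, improvement, or no change.

    `tolerance` is a fraction of `before`; smaller changes count as "no change".
    """
    delta = after - before
    if abs(delta) <= tolerance * abs(before):
        return "no change"
    # For latency (lower is better), an increase means things got worse.
    got_worse = delta > 0 if lower_is_better else delta < 0
    return "regression" if got_worse else "improvement"


# Values (ms) copied from the July 18 alert above, newest first.
# Index 8 is the flagged run (Fri Jul 7).
newest_first = [169408.30, 179341.26, 163747.60, 172176.19, 181711.54,
                193342.36, 189409.14, 157463.11, 181411.25, 211548.54,
                205620.69, 205096.16, 208163.29, 195112.75, 195221.70,
                197976.39, 197206.00, 173418.86, 194870.17]
flagged_index = 8
after = mean(newest_first[: flagged_index + 1])   # flagged run and later runs
before = mean(newest_first[flagged_index + 1 :])  # runs before the flagged one
print(classify_change(before, after))
```

With these values the later runs average roughly 11% lower than the earlier ones, so the change is labeled an improvement.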

@github-actions
Contributor Author

github-actions bot commented Aug 9, 2023

Performance change found in the
test: pytorch_image_classification_benchmarks-resnet152-GPU-mean_load_model_latency_milli_secs for the metric: mean_load_model_latency_milli_secs.

For more information on how to triage the alerts, please see the "Triage performance alert issues" section of the README.

Test description: Pytorch image classification on 50k images of size 224 x 224 with resnet 152 with Tesla T4 GPU. Test link -

test : 'apache_beam.testing.benchmarks.inference.pytorch_image_classification_benchmarks',
Test dashboard - http://104.154.241.245/d/ZpS8Uf44z/python-ml-runinference-benchmarks?orgId=1&viewPanel=7


timestamp: Wed Aug  9 20:22:26 2023, metric_value: 224696.44 <---- Anomaly
timestamp: Sun Aug  6 20:11:28 2023, metric_value: 168483.14
timestamp: Sat Aug  5 20:15:50 2023, metric_value: 178606.28
timestamp: Fri Aug  4 20:17:43 2023, metric_value: 193685.41
timestamp: Thu Aug  3 20:33:48 2023, metric_value: 175384.55
timestamp: Wed Aug  2 15:48:49 2023, metric_value: 180097.67
timestamp: Tue Aug  1 20:21:43 2023, metric_value: 187113.02
timestamp: Tue Jul 18 20:18:01 2023, metric_value: 169408.30
timestamp: Mon Jul 17 20:17:35 2023, metric_value: 179341.26
timestamp: Sun Jul 16 20:12:41 2023, metric_value: 163747.60
timestamp: Fri Jul 14 20:20:19 2023, metric_value: 172176.19

@tvalentyn
Contributor

tvalentyn commented Sep 1, 2023

I see variability in Batch Size and Batch Latency in the GPU flavor of the benchmark; see: http://metrics.beam.apache.org/d/ZpS8Uf44z/python-ml-runinference-benchmarks?from=now-90d&to=now&orgId=1

Would increasing the batch size increase the latency per batch? If so, we may need to compute latency per element or fix the batch size.
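To illustrate the per-element suggestion: if batch sizes vary between runs, per-batch latency can rise simply because batches got larger, even when each element is processed faster. The sketch below normalizes by dividing total inference time by total elements. `BatchSample` and both helpers are hypothetical, and the numbers in the usage example are made up for illustration; they are not taken from the benchmark.

```python
from dataclasses import dataclass


@dataclass
class BatchSample:
    """Hypothetical record of one inference call."""
    batch_size: int    # number of elements in the batch
    latency_ms: float  # wall time to run inference on the whole batch


def mean_latency_per_batch(samples):
    """Average time per batch; sensitive to how large the batches were."""
    return sum(s.latency_ms for s in samples) / len(samples)


def mean_latency_per_element(samples):
    """Total time divided by total elements, so batch size is factored out."""
    total_ms = sum(s.latency_ms for s in samples)
    total_elements = sum(s.batch_size for s in samples)
    return total_ms / total_elements


# Made-up runs: run B uses larger batches than run A.
run_a = [BatchSample(batch_size=8, latency_ms=400.0)] * 3
run_b = [BatchSample(batch_size=16, latency_ms=720.0)] * 3
print(mean_latency_per_batch(run_a), mean_latency_per_batch(run_b))
print(mean_latency_per_element(run_a), mean_latency_per_element(run_b))
```

In this made-up example, run B looks slower per batch (720 ms vs 400 ms) but is faster per element (45 ms vs 50 ms). Weighting by element count, as here, is one choice; averaging each batch's own per-element latency would weight small batches more heavily.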

@github-actions
Contributor Author

Performance change found in the
test: pytorch_image_classification_benchmarks-resnet152-GPU-mean_load_model_latency_milli_secs for the metric: mean_load_model_latency_milli_secs.

For more information on how to triage the alerts, please see the "Triage performance alert issues" section of the README.

Test description: Pytorch image classification on 50k images of size 224 x 224 with resnet 152 with Tesla T4 GPU. Test link -

test : 'apache_beam.testing.benchmarks.inference.pytorch_image_classification_benchmarks',
Test dashboard - http://metrics.beam.apache.org/d/ZpS8Uf44z/python-ml-runinference-benchmarks?orgId=1&viewPanel=7


timestamp: Tue Sep 26 20:39:13 2023, metric_value: 230483.59 <---- Anomaly
timestamp: Sun Sep 24 20:16:17 2023, metric_value: 171467.95
timestamp: Sat Sep 23 20:34:54 2023, metric_value: 170864.55
timestamp: Fri Sep 22 20:31:38 2023, metric_value: 176961.21
timestamp: Thu Sep 21 20:42:19 2023, metric_value: 180771.67
timestamp: Tue Sep 19 20:22:42 2023, metric_value: 198311.58
timestamp: Mon Sep 18 20:32:00 2023, metric_value: 177070.07
timestamp: Sun Sep 17 20:17:01 2023, metric_value: 180274.53
timestamp: Sat Sep 16 20:16:20 2023, metric_value: 176510.54
timestamp: Thu Sep 14 20:21:44 2023, metric_value: 182756.80
timestamp: Wed Sep 13 20:24:00 2023, metric_value: 182646.01
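Another common way to express how unusual a single run is: a z-score, the number of sample standard deviations the flagged value lies from the mean of the preceding runs. The sketch below applies it to the September 26 value. It is an illustrative outlier check, not necessarily what the alerting bot computes; `z_score` is a hypothetical helper built on Python's `statistics` module.

```python
from statistics import mean, stdev


def z_score(value, history):
    """Hypothetical helper: sample standard deviations between value and mean(history)."""
    return (value - mean(history)) / stdev(history)


# From the September 26 alert above (ms): the flagged run, then the ten runs before it.
flagged = 230483.59
previous = [171467.95, 170864.55, 176961.21, 180771.67, 198311.58,
            177070.07, 180274.53, 176510.54, 182756.80, 182646.01]
z = z_score(flagged, previous)
print(f"z = {z:.1f}")
```

The result is about 6.6, well beyond a typical outlier threshold such as 3. Because `stdev` is inflated by noisy history (for example the Sep 19 run), a z-score can understate a spike when earlier runs are themselves variable.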

github-actions bot added this to the 2.52.0 Release milestone Oct 31, 2023