Performance Regression or Improvement: Pytorch image classification on 50k images of size 224 x 224 with resnet 152 with Tesla T4 GPU:mean_load_model_latency_milli_secs #27077

github-actions · 2023-06-08T22:09:04Z

Performance change found in the
test: Pytorch image classification on 50k images of size 224 x 224 with resnet 152 with Tesla T4 GPU:apache_beam.testing.benchmarks.inference.pytorch_image_classification_benchmarks for the metric: mean_load_model_latency_milli_secs.

For more information on how to triage the alerts, please look at
Triage performance alert issues section of the README.

timestamp: Thu Jun  8 21:01:58 2023, metric_value: `200115.25925925927` <---- Anomaly
timestamp: Thu Jun  8 18:29:14 2023, metric_value: `134762.9642857143`
timestamp: Sun Jun  4 20:11:58 2023, metric_value: `128878.32142857143`
timestamp: Sat Jun  3 20:11:16 2023, metric_value: `132582.9642857143`
timestamp: Fri Jun  2 20:35:07 2023, metric_value: `136720.82142857142`
timestamp: Thu Jun  1 20:18:39 2023, metric_value: `141940.48214285713`
timestamp: Wed May 31 20:17:18 2023, metric_value: `139211.85714285713`
timestamp: Tue May 30 20:16:01 2023, metric_value: `121131.6909090909`
timestamp: Mon May 29 20:11:38 2023, metric_value: `122672.34545454546`
timestamp: Sun May 28 20:10:33 2023, metric_value: `117560.10909090909`
timestamp: Sat May 27 20:10:06 2023, metric_value: `118506.76785714286`

AnandInguva · 2023-06-28T17:26:05Z

The anomaly actually detected is on June 8th, 2023, according to the published metadata. There is a bug in the UI that keeps it from pointing to the right anomaly.
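
For a rough sense of how large the June 8 jump is, here is a minimal sketch that compares the flagged value against the median of the ten earlier runs listed in the alert. The median-baseline comparison is only an illustration chosen here, not the method the alerting tool uses to detect change points.

```python
from statistics import median

# Values copied from the first alert in this issue
# (metric: mean_load_model_latency_milli_secs).
anomaly = 200115.25925925927
previous_runs = [
    134762.9642857143,
    128878.32142857143,
    132582.9642857143,
    136720.82142857142,
    141940.48214285713,
    139211.85714285713,
    121131.6909090909,
    122672.34545454546,
    117560.10909090909,
    118506.76785714286,
]

# Assumption: use the median of the earlier runs as a simple baseline.
baseline = median(previous_runs)
pct_change = (anomaly - baseline) / baseline * 100

print(f"baseline={baseline:.0f} ms, flagged={anomaly:.0f} ms, change={pct_change:+.1f}%")
```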

github-actions · 2023-07-06T22:08:41Z

Performance change found in the
test: pytorch_image_classification_benchmarks-resnet152-GPU-mean_load_model_latency_milli_secs for the metric: mean_load_model_latency_milli_secs.

For more information on how to triage the alerts, please look at
Triage performance alert issues section of the README.

Test description: Pytorch image classification on 50k images of size 224 x 224 with resnet 152 with Tesla T4 GPU. Test link -

beam/.test-infra/jenkins/job_InferenceBenchmarkTests_Python.groovy

Line 151 in 42d0a6e

    
           test              : 'apache_beam.testing.benchmarks.inference.pytorch_image_classification_benchmarks',

Test dashboard - http://104.154.241.245/d/ZpS8Uf44z/python-ml-runinference-benchmarks?orgId=1&viewPanel=7

timestamp: Thu Jul  6 20:18:41 2023, metric_value: 211548.54
timestamp: Wed Jul  5 20:28:56 2023, metric_value: 205620.69
timestamp: Tue Jul  4 20:25:33 2023, metric_value: 205096.16 <---- Anomaly
timestamp: Mon Jul  3 20:30:00 2023, metric_value: 208163.29
timestamp: Sun Jul  2 20:24:06 2023, metric_value: 195112.75
timestamp: Sat Jul  1 20:23:14 2023, metric_value: 195221.70
timestamp: Fri Jun 30 20:29:06 2023, metric_value: 197976.39
timestamp: Wed Jun 28 20:36:25 2023, metric_value: 197206.00
timestamp: Sat Jun 24 20:22:25 2023, metric_value: 173418.86
timestamp: Thu Jun 22 20:33:39 2023, metric_value: 194870.17
timestamp: Wed Jun 21 20:31:24 2023, metric_value: 191692.80 
timestamp: Tue Jun 20 20:41:06 2023, metric_value: 196980.26
timestamp: Mon Jun 12 20:28:45 2023, metric_value: 181619.41
timestamp: Sun Jun 11 20:21:29 2023, metric_value: 170558.29

github-actions · 2023-07-18T22:08:23Z

Performance change found in the
test: pytorch_image_classification_benchmarks-resnet152-GPU-mean_load_model_latency_milli_secs for the metric: mean_load_model_latency_milli_secs.

For more information on how to triage the alerts, please look at
Triage performance alert issues section of the README.

Test description: Pytorch image classification on 50k images of size 224 x 224 with resnet 152 with Tesla T4 GPU. Test link -

beam/.test-infra/jenkins/job_InferenceBenchmarkTests_Python.groovy

Line 151 in 42d0a6e

    
           test              : 'apache_beam.testing.benchmarks.inference.pytorch_image_classification_benchmarks',

Test dashboard - http://104.154.241.245/d/ZpS8Uf44z/python-ml-runinference-benchmarks?orgId=1&viewPanel=7


timestamp: Tue Jul 18 20:18:01 2023, metric_value: 169408.30
timestamp: Mon Jul 17 20:17:35 2023, metric_value: 179341.26
timestamp: Sun Jul 16 20:12:41 2023, metric_value: 163747.60
timestamp: Fri Jul 14 20:20:19 2023, metric_value: 172176.19
timestamp: Thu Jul 13 20:23:24 2023, metric_value: 181711.54
timestamp: Tue Jul 11 20:34:43 2023, metric_value: 193342.36
timestamp: Mon Jul 10 20:19:11 2023, metric_value: 189409.14
timestamp: Sat Jul  8 20:11:56 2023, metric_value: 157463.11
timestamp: Fri Jul  7 20:20:13 2023, metric_value: 181411.25 <---- Anomaly
timestamp: Thu Jul  6 20:18:41 2023, metric_value: 211548.54
timestamp: Wed Jul  5 20:28:56 2023, metric_value: 205620.69
timestamp: Tue Jul  4 20:25:33 2023, metric_value: 205096.16
timestamp: Mon Jul  3 20:30:00 2023, metric_value: 208163.29
timestamp: Sun Jul  2 20:24:06 2023, metric_value: 195112.75
timestamp: Sat Jul  1 20:23:14 2023, metric_value: 195221.70
timestamp: Fri Jun 30 20:29:06 2023, metric_value: 197976.39
timestamp: Wed Jun 28 20:36:25 2023, metric_value: 197206.00
timestamp: Sat Jun 24 20:22:25 2023, metric_value: 173418.86
timestamp: Thu Jun 22 20:33:39 2023, metric_value: 194870.17

github-actions · 2023-08-09T22:08:44Z

Performance change found in the
test: pytorch_image_classification_benchmarks-resnet152-GPU-mean_load_model_latency_milli_secs for the metric: mean_load_model_latency_milli_secs.

For more information on how to triage the alerts, please look at
Triage performance alert issues section of the README.

Test description: Pytorch image classification on 50k images of size 224 x 224 with resnet 152 with Tesla T4 GPU. Test link -

beam/.test-infra/jenkins/job_InferenceBenchmarkTests_Python.groovy

Line 151 in 42d0a6e

    
           test              : 'apache_beam.testing.benchmarks.inference.pytorch_image_classification_benchmarks',

Test dashboard - http://104.154.241.245/d/ZpS8Uf44z/python-ml-runinference-benchmarks?orgId=1&viewPanel=7


timestamp: Wed Aug  9 20:22:26 2023, metric_value: 224696.44 <---- Anomaly
timestamp: Sun Aug  6 20:11:28 2023, metric_value: 168483.14
timestamp: Sat Aug  5 20:15:50 2023, metric_value: 178606.28
timestamp: Fri Aug  4 20:17:43 2023, metric_value: 193685.41
timestamp: Thu Aug  3 20:33:48 2023, metric_value: 175384.55
timestamp: Wed Aug  2 15:48:49 2023, metric_value: 180097.67
timestamp: Tue Aug  1 20:21:43 2023, metric_value: 187113.02
timestamp: Tue Jul 18 20:18:01 2023, metric_value: 169408.30
timestamp: Mon Jul 17 20:17:35 2023, metric_value: 179341.26
timestamp: Sun Jul 16 20:12:41 2023, metric_value: 163747.60
timestamp: Fri Jul 14 20:20:19 2023, metric_value: 172176.19

tvalentyn · 2023-09-01T22:49:48Z

I see variability in Batch Size and Batch Latency in the GPU flavor of the benchmark, see: http://metrics.beam.apache.org/d/ZpS8Uf44z/python-ml-runinference-benchmarks?from=now-90d&to=now&orgId=1

Would increasing batch sizes increase the latency per batch? If so, we may need to compute latency per element or fix the batch size.
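
A minimal sketch of the per-element idea, assuming per-batch latencies and batch sizes are available as parallel lists. The function name `latency_per_element` and the sample numbers are hypothetical and not taken from the benchmark code.

```python
def latency_per_element(batch_latencies, batch_sizes):
    """Return total latency divided by total elements processed.

    Hypothetical helper: both arguments are parallel sequences,
    one entry per batch.
    """
    if len(batch_latencies) != len(batch_sizes):
        raise ValueError("batch_latencies and batch_sizes must have equal length")
    total_elements = sum(batch_sizes)
    if total_elements == 0:
        raise ValueError("no elements were processed")
    return sum(batch_latencies) / total_elements


# Two made-up runs with the same per-element cost but different batching.
run_a = {"latencies": [100.0, 100.0], "sizes": [10, 10]}
run_b = {"latencies": [200.0], "sizes": [20]}

for name, run in (("run_a", run_a), ("run_b", run_b)):
    mean_batch = sum(run["latencies"]) / len(run["latencies"])
    per_element = latency_per_element(run["latencies"], run["sizes"])
    print(f"{name}: mean per-batch={mean_batch}, per-element={per_element}")
```

In this made-up data the mean per-batch latency doubles between the two runs while the per-element latency stays the same, which is the kind of batch-size effect a per-element metric would filter out.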

github-actions · 2023-09-27T02:36:16Z

Performance change found in the
test: pytorch_image_classification_benchmarks-resnet152-GPU-mean_load_model_latency_milli_secs for the metric: mean_load_model_latency_milli_secs.

For more information on how to triage the alerts, please look at
Triage performance alert issues section of the README.

Test description: Pytorch image classification on 50k images of size 224 x 224 with resnet 152 with Tesla T4 GPU. Test link -

beam/.test-infra/jenkins/job_InferenceBenchmarkTests_Python.groovy

Line 151 in 42d0a6e

    
           test              : 'apache_beam.testing.benchmarks.inference.pytorch_image_classification_benchmarks',

Test dashboard - http://metrics.beam.apache.org/d/ZpS8Uf44z/python-ml-runinference-benchmarks?orgId=1&viewPanel=7


timestamp: Tue Sep 26 20:39:13 2023, metric_value: 230483.59 <---- Anomaly
timestamp: Sun Sep 24 20:16:17 2023, metric_value: 171467.95
timestamp: Sat Sep 23 20:34:54 2023, metric_value: 170864.55
timestamp: Fri Sep 22 20:31:38 2023, metric_value: 176961.21
timestamp: Thu Sep 21 20:42:19 2023, metric_value: 180771.67
timestamp: Tue Sep 19 20:22:42 2023, metric_value: 198311.58
timestamp: Mon Sep 18 20:32:00 2023, metric_value: 177070.07
timestamp: Sun Sep 17 20:17:01 2023, metric_value: 180274.53
timestamp: Sat Sep 16 20:16:20 2023, metric_value: 176510.54
timestamp: Thu Sep 14 20:21:44 2023, metric_value: 182756.80
timestamp: Wed Sep 13 20:24:00 2023, metric_value: 182646.01

github-actions bot added the "awaiting triage" and "perf-alert" (Automatically filed performance-related alerts.) labels Jun 8, 2023

tvalentyn mentioned this issue Sep 1, 2023

Performance Regression or Improvement: pytorch_image_classification_benchmarks-resnet152-GPU-mean_inference_batch_latency_micro_secs:mean_inference_batch_latency_micro_secs #27986

Closed

tvalentyn closed this as completed Oct 31, 2023

github-actions bot added this to the 2.52.0 Release milestone Oct 31, 2023
