
Rework model/batch size configuration again #14

Open
wants to merge 30 commits into base: dev

Conversation

jeremybobbin
Contributor

  • If GPU_INDICIES contains GPUs of different models, the script exits. This assumption makes (future) logging simpler.
  • Batch size is calculated from the model's batch size multiplier (seen below), precision, and GPU memory (a rough sketch of the calculation follows the examples at the end of this description):
  resnet50='5  + 1/3'
 resnet152='2  + 2/3'
inception3='5  + 1/3'
inception4='1  + 1/3'
     vgg16='5  + 1/3'
   alexnet='42 + 2/3'
    ssd300='2  + 2/3'
  • benchmark.sh's options are now position independent:
./benchmark.sh -l 2 -h 4     # low/high - benchmarks 2 GPUs, 3 GPUs, and 4 GPUs at a time.

Options:

i - GPU_INDEX
l - MIN_NUM_GPU
h - MAX_NUM_GPU
n - ITERATIONS
b - NUM_BATCHES
c - SETTING
v - GPU_VENDOR
t - THERMAL_INTERVAL
  • -n now implements the functionality of batch_benchmark.sh

This:

./batch_benchmark.sh 1 1 1 100 2 config_resnet50_replicated_fp32_train_syn

Is now this:

./benchmark.sh -h 1 -n 1 -b 100 -t 2 -c config_resnet50_replicated_fp32_train_syn
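
For illustration only, here is a rough sketch of how the position-independent options and the multiplier-based batch size could fit together. The getopts loop mirrors the flag names listed above; GPU_MEM_GIB, the default values, and the use of bc to evaluate the fractional multipliers are assumptions for the example, not necessarily what benchmark.sh actually does.

#!/bin/sh
# Sketch only: assumed names and defaults, not the actual benchmark.sh code.
GPU_INDEX=0; MIN_NUM_GPU=1; MAX_NUM_GPU=1
ITERATIONS=1; NUM_BATCHES=100; SETTING=''; GPU_VENDOR=''; THERMAL_INTERVAL=2

while getopts 'i:l:h:n:b:c:v:t:' opt; do
    case "$opt" in
        i) GPU_INDEX=$OPTARG ;;
        l) MIN_NUM_GPU=$OPTARG ;;
        h) MAX_NUM_GPU=$OPTARG ;;
        n) ITERATIONS=$OPTARG ;;
        b) NUM_BATCHES=$OPTARG ;;
        c) SETTING=$OPTARG ;;
        v) GPU_VENDOR=$OPTARG ;;
        t) THERMAL_INTERVAL=$OPTARG ;;
        *) exit 1 ;;
    esac
done

# Batch size = multiplier * GPU memory (GiB), truncated to an integer.
# The multiplier comes from the table above; GPU_MEM_GIB is an assumed value.
multiplier='5  + 1/3'   # resnet50
GPU_MEM_GIB=11
BATCH_SIZE=$(echo "scale=10; b = ($multiplier) * $GPU_MEM_GIB; scale=0; b / 1" | bc)
echo "batch size per GPU: $BATCH_SIZE"

With these flags, ./benchmark.sh -l 2 -h 4 sets MIN_NUM_GPU=2 and MAX_NUM_GPU=4 regardless of the order in which the options appear.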

jeremybobbin and others added 30 commits July 4, 2020 11:11
Create tf2 branch. Add official tensorflow benchmark repo as a sub-module

checkout branch for tf1.15
I'm skeptical that we'll ever want to use heterogeneous GPU configurations
This patch renders batch_benchmark.sh redundant.
@jeremybobbin
Contributor Author

jeremybobbin commented Jul 17, 2020

Issues:

  • requires bc, fails mysteriously otherwise
  • CPU_NAME is not set
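
For the first issue, a minimal guard could make the failure explicit instead of mysterious. This is only a sketch, and the error message wording is an example:

# Check for bc up front, since the fractional multipliers are evaluated with it.
command -v bc >/dev/null 2>&1 || {
    echo "benchmark.sh: bc is required for batch size calculation" >&2
    exit 1
}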
