This repository contains the scripts and source code for running the MLPerf v0.6 training benchmarks written by NVIDIA in RHEL7.6 and UBI8 containers. The Dockerfiles for creating the containers are in the top level directory.
The four directories `gnmt`, `maskrcnn`, `ssd`, and `transformer` contain the source code and scripts for running each of these benchmarks. The `resnet` directory is for an MXNet benchmark using the ImageNet dataset, which we have not yet run. The four benchmarks above are implemented in PyTorch, and the PyTorch source code that is built and installed in the containers is in the `./pytorch` directory.
For more detailed information about the benchmarks, read the original documentation for ssd, maskrcnn, gnmt, and transformer.
The host system needs to be configured so that containers run with podman can access the GPUs:

- Install `podman` on the host system.
- Install the `nvidia-container-toolkit`, so that the NVIDIA container runtime hook can be used.
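As a quick sanity check of the two steps above, a small script like the following can verify both pieces are in place. The hook directory path is an assumption (the usual default where `nvidia-container-toolkit` installs its OCI hook JSON); it may differ on your distribution.

```shell
#!/bin/sh
# Hedged sanity check for the host setup described above.
# HOOKS_DIR is the usual default location for the nvidia OCI hook
# definition, but it may differ on your distribution.
HOOKS_DIR="${HOOKS_DIR:-/usr/share/containers/oci/hooks.d}"

have_podman() { command -v podman >/dev/null 2>&1; }

have_nvidia_hook() {
  # Look for the hook JSON dropped by nvidia-container-toolkit.
  ls "$HOOKS_DIR"/*nvidia* >/dev/null 2>&1
}

if have_podman && have_nvidia_hook; then
  echo "host looks ready for GPU containers"
else
  echo "host setup incomplete (see the steps above)"
fi
```

If both checks pass, `podman run` should invoke the NVIDIA hook and expose the host GPUs inside the container.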
This may take ~1 hour, depending on package download speeds.

```
podman build -f ssd_dockerfile_ubi8 -t mlperf_v06_ssd_ubi8
```
The data download script is in the `ssd` directory; further documentation can be found in the original results.

```
bash download_dataset.sh
```
Put the data in whatever directory you prefer. When you run the benchmark, you will specify that directory in `DATADIR` so that the run script can mount it as a volume for the container to access. The directory structure of `DATADIR` should be:
```
<DATADIR>
├── coco2017/
│   ├── annotations/
│   ├── models/
│   ├── train2017/
│   └── val2017/
...
```
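Before launching a run, a small helper (hypothetical, not part of this repository) can confirm that `DATADIR` has the layout shown above:

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): verify that the given
# DATADIR contains the coco2017 layout shown above.
check_datadir() {
  d="$1"
  for sub in annotations models train2017 val2017; do
    if [ ! -d "$d/coco2017/$sub" ]; then
      echo "missing: $d/coco2017/$sub"
      return 1
    fi
  done
  echo "layout ok"
}

# Example: check_datadir /data/mlperf
```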
From the `ssd` directory, the benchmark can be started with:

```
CONT=mlperf_v06_ssd_ubi8 DATADIR=<coco2017> LOGDIR=/data/mlperf/logs DGXSYSTEM=DGX1 NEXP=1 PULL=0 ./podman_run.sub
```
On a system other than an NVIDIA DGX1 or DGX2, you will likely need to create a custom `config_*.sh` file to specify the number of GPUs and to tune other parameters for performance.
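The shipped `config_DGX1.sh`/`config_DGX2.sh` files are the best templates to copy from. As a rough, hedged sketch, a custom config might set system-topology variables like the following (the variable names follow the convention used in the NVIDIA MLPerf v0.6 configs, but check the shipped files for the exact set your benchmark reads):

```shell
# config_custom.sh -- hedged sketch of a custom system config.
# Copy one of the shipped config_DGX*.sh files and adjust; the values
# below are illustrative for a hypothetical 4-GPU, 2-socket server.
DGXNGPU=4            # number of GPUs on this system
DGXNSOCKET=2         # CPU sockets
DGXSOCKETCORES=16    # physical cores per socket
DGXHT=2              # hyperthreads per core
```

You would then launch with `DGXSYSTEM=CUSTOM` (or whatever suffix you used in the filename) so the run script sources your file.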
This may take ~1 hour, depending on package download speeds.

```
podman build -f maskrcnn_dockerfile_ubi8 -t mlperf_v06_maskrcnn_ubi8
```
The data is the same as that used for the ssd benchmark; see the download instructions above.
Mask R-CNN uses a trained ResNet-50 model as a backbone. Run the `download_weights.sh` script in the `maskrcnn/` directory, then place the resulting file `R-50.pkl` in the `coco2017/models` directory, where `coco2017` contains the unzipped `train2017` and `val2017` directories.
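For example, a hypothetical helper (paths follow the `DATADIR` layout above) to put the weights in place:

```shell
#!/bin/sh
# Hypothetical helper: move the downloaded backbone weights into the
# coco2017/models directory expected by the Mask R-CNN benchmark.
place_weights() {
  datadir="$1"    # the directory you will pass as DATADIR
  weights="$2"    # path to the downloaded R-50.pkl
  mkdir -p "$datadir/coco2017/models"
  mv "$weights" "$datadir/coco2017/models/"
}

# Example: place_weights /data/mlperf ./R-50.pkl
```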
From the `maskrcnn` directory, the benchmark can be started with:

```
CONT=mlperf_v06_maskrcnn_ubi8 DATADIR=<coco2017> LOGDIR=/data/mlperf/logs DGXSYSTEM=DGX1 NEXP=1 PULL=0 ./podman_run.sub
```
On a system other than an NVIDIA DGX1 or DGX2, you will likely need to create a custom `config_*.sh` file to specify the number of GPUs and to tune other parameters for performance.
This may take ~1 hour, depending on package download speeds.

```
podman build -f gnmt_dockerfile_ubi8 -t mlperf_v06_gnmt_ubi8
```
Follow the instructions in the original documentation.
From the `gnmt` directory, the benchmark can be started with:

```
CONT=mlperf_v06_gnmt_ubi8 DATADIR=<gnmt_data> PREPROC_DATADIR=<host directory for preproc data> LOGDIR=/data/mlperf/logs DGXSYSTEM=DGX1 NEXP=1 PULL=0 ./podman_run.sub
```
On a system other than an NVIDIA DGX1 or DGX2, you will likely need to create a custom `config_*.sh` file to specify the number of GPUs and to tune other parameters for performance.
This may take ~1 hour, depending on package download speeds.

```
podman build -f transformer_dockerfile_ubi8 -t mlperf_v06_transformer_ubi8
```
Unfortunately, the data scripts in the submission files for the transformer benchmark do not work. To run the benchmark, we had to obtain the data directly from the benchmark's creators; we are still working to identify the exact source of the correct data.
From the `transformer` directory, the benchmark can be started with:

```
CONT=mlperf_v06_transformer_ubi8 DATADIR=<transformer_data> LOGDIR=/data/mlperf/logs DGXSYSTEM=DGX1 NEXP=1 PULL=0 ./podman_run.sub
```
On a system other than an NVIDIA DGX1 or DGX2, you will likely need to create a custom `config_*.sh` file to specify the number of GPUs and to tune other parameters for performance.