From 2e26a06e17e9dbb0680563a7a89e8dc8926f9ddb Mon Sep 17 00:00:00 2001 From: Arjun Suresh Date: Tue, 5 Nov 2024 17:25:34 +0000 Subject: [PATCH] Added SUT summary files --- .../1xMI300X_2xEPYC-9374F/summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../8xMI300X_2xEPYC-9374F/summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../8xMI300X_2xEPYC-TURIN/summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../ESC4000A_E12_4XH100_TRT/summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../ESC8000A_E12_H100x8_TRT/summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../ESC_N8_E11_H100x8_TRT/summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../C240M7_L40Sx2_TRT/summary/README.md | 58 ++++++++ .../C240M7_L40Sx2_TRT/summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../C245M8_L40Sx2_TRT/summary/README.md | 58 ++++++++ .../C245M8_L40Sx2_TRT/summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../X210c_L40SX2_TRT/summary/README.md | 58 ++++++++ .../X210c_L40SX2_TRT/summary/summary.html | 126 ++++++++++++++++++ .../results/Orin_TRT/summary/README.md | 58 ++++++++ .../results/Orin_TRT/summary/summary.html | 126 ++++++++++++++++++ .../1-node-2S-EMR-PyTorch/summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../R760xa_L40Sx4_TRT/summary/README.md | 58 ++++++++ .../R760xa_L40Sx4_TRT/summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../XE9680_MI300X_192GBx8/summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../results/XR8620_L4x1_TRT/summary/README.md | 58 ++++++++ .../XR8620_L4x1_TRT/summary/summary.html | 126 ++++++++++++++++++ .../results/CDI_L40Sx16_TRT/summary/README.md | 58 ++++++++ .../CDI_L40Sx16_TRT/summary/summary.html | 126 ++++++++++++++++++ .../results/CDI_L40Sx8_TRT/summary/README.md | 58 ++++++++ .../CDI_L40Sx8_TRT/summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../results/tpu_v5e_x4_flax/summary/README.md | 58 ++++++++ .../tpu_v5e_x4_flax/summary/summary.html | 126 ++++++++++++++++++ .../results/tpu_v6_x4_flax/summary/README.md | 58 ++++++++ .../tpu_v6_x4_flax/summary/summary.html | 126 ++++++++++++++++++ .../1-node-2S-EMR-PyTorch/summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ 
.../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../1-node-2S-EMR-PyTorch/summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../1-node-2S-GNR-PyTorch/summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../H200_SXM_141GBx8_TRT/summary/README.md | 58 ++++++++ .../H200_SXM_141GBx8_TRT/summary/summary.html | 126 ++++++++++++++++++ .../Lenovo_2xL40S_TRT/summary/README.md | 58 ++++++++ .../Lenovo_2xL40S_TRT/summary/summary.html | 126 ++++++++++++++++++ .../results/Lenovo_2xL4_TRT/summary/README.md | 58 ++++++++ .../Lenovo_2xL4_TRT/summary/summary.html | 126 ++++++++++++++++++ .../Lenovo_8xH200_TRT/summary/README.md | 58 ++++++++ .../Lenovo_8xH200_TRT/summary/summary.html | 126 ++++++++++++++++++ .../SR650_V3_3xL40S_TRT/summary/README.md | 58 ++++++++ .../SR650_V3_3xL40S_TRT/summary/summary.html | 126 ++++++++++++++++++ .../SR675_V3_8xH100_NVL_TRT/summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../B200-SXM-180GBx1_TRT/summary/README.md | 58 ++++++++ .../B200-SXM-180GBx1_TRT/summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../H200-SXM-141GBx1_TRT/summary/README.md | 58 ++++++++ .../H200-SXM-141GBx1_TRT/summary/summary.html | 126 ++++++++++++++++++ .../H200-SXM-141GBx8_TRT/summary/README.md | 58 ++++++++ .../H200-SXM-141GBx8_TRT/summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../NVIDIA/results/Orin_TRT/summary/README.md | 58 ++++++++ .../results/Orin_TRT/summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../1-node-2S-EMR-PyTorch/summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../L40S-RedHat-OpenShift/summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../1-node-2S-EMR-PyTorch/summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ 
.../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../SMC_H100_SXM_80GBX8_TRT/summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../results/h13_u1_preview/summary/README.md | 58 ++++++++ .../h13_u1_preview/summary/summary.html | 126 ++++++++++++++++++ .../h13_u1_preview_dc/summary/README.md | 58 ++++++++ .../h13_u1_preview_dc/summary/summary.html | 126 ++++++++++++++++++ .../results/h13_u1_slim/summary/README.md | 58 ++++++++ .../results/h13_u1_slim/summary/summary.html | 126 ++++++++++++++++++ .../results/h13_u2_preview/summary/README.md | 58 ++++++++ .../h13_u2_preview/summary/summary.html | 126 ++++++++++++++++++ .../h13_u2_preview_dc/summary/README.md | 58 ++++++++ .../h13_u2_preview_dc/summary/summary.html | 126 ++++++++++++++++++ .../results/h13_u3_slim/summary/README.md | 58 ++++++++ .../results/h13_u3_slim/summary/summary.html | 126 ++++++++++++++++++ .../results/r760_u4_slim/summary/README.md | 58 ++++++++ .../results/r760_u4_slim/summary/summary.html | 126 ++++++++++++++++++ .../results/r760_u6_slim/summary/README.md | 58 ++++++++ .../results/r760_u6_slim/summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../Orin_TRT_DepthPruned/summary/README.md | 58 ++++++++ .../Orin_TRT_DepthPruned/summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ .../summary/README.md | 58 ++++++++ .../summary/summary.html | 126 ++++++++++++++++++ 196 files changed, 18032 insertions(+) create mode 100644 closed/AMD/results/1xMI300X_2xEPYC-9374F/summary/README.md create mode 100644 closed/AMD/results/1xMI300X_2xEPYC-9374F/summary/summary.html create mode 100644 closed/AMD/results/8xMI300X_2xEPYC-9374F/summary/README.md create mode 100644 closed/AMD/results/8xMI300X_2xEPYC-9374F/summary/summary.html create mode 100644 closed/AMD/results/8xMI300X_2xEPYC-TURIN/summary/README.md create mode 100644 closed/AMD/results/8xMI300X_2xEPYC-TURIN/summary/summary.html create mode 100644 
closed/ASUSTeK/results/ESC4000A_E12_4XH100_TRT/summary/README.md create mode 100644 closed/ASUSTeK/results/ESC4000A_E12_4XH100_TRT/summary/summary.html create mode 100644 closed/ASUSTeK/results/ESC8000A_E12_H100x8_TRT/summary/README.md create mode 100644 closed/ASUSTeK/results/ESC8000A_E12_H100x8_TRT/summary/summary.html create mode 100644 closed/ASUSTeK/results/ESC_N8_E11_H100x8_TRT/summary/README.md create mode 100644 closed/ASUSTeK/results/ESC_N8_E11_H100x8_TRT/summary/summary.html create mode 100644 closed/Cisco/results/C240M7-1-node-2S-EMR-PyTorch/summary/README.md create mode 100644 closed/Cisco/results/C240M7-1-node-2S-EMR-PyTorch/summary/summary.html create mode 100644 closed/Cisco/results/C240M7_L40Sx2_TRT/summary/README.md create mode 100644 closed/Cisco/results/C240M7_L40Sx2_TRT/summary/summary.html create mode 100644 closed/Cisco/results/C245M8_H100_PCIe_80GBx2_TRT/summary/README.md create mode 100644 closed/Cisco/results/C245M8_H100_PCIe_80GBx2_TRT/summary/summary.html create mode 100644 closed/Cisco/results/C245M8_L40Sx2_TRT/summary/README.md create mode 100644 closed/Cisco/results/C245M8_L40Sx2_TRT/summary/summary.html create mode 100644 closed/Cisco/results/X210M7-1-node-2S-EMR-PyTorch/summary/README.md create mode 100644 closed/Cisco/results/X210M7-1-node-2S-EMR-PyTorch/summary/summary.html create mode 100644 closed/Cisco/results/X210c_L40SX2_TRT/summary/README.md create mode 100644 closed/Cisco/results/X210c_L40SX2_TRT/summary/summary.html create mode 100644 closed/ConnectTechInc/results/Orin_TRT/summary/README.md create mode 100644 closed/ConnectTechInc/results/Orin_TRT/summary/summary.html create mode 100644 closed/Dell/results/1-node-2S-EMR-PyTorch/summary/README.md create mode 100644 closed/Dell/results/1-node-2S-EMR-PyTorch/summary/summary.html create mode 100644 closed/Dell/results/R760_H100_PCIe_80GBx2_TRT/summary/README.md create mode 100644 closed/Dell/results/R760_H100_PCIe_80GBx2_TRT/summary/summary.html create mode 100644 closed/Dell/results/R760xa_H100NVL_PCIe_94GBx4_TRT/summary/README.md create mode 100644 closed/Dell/results/R760xa_H100NVL_PCIe_94GBx4_TRT/summary/summary.html create mode 100644 closed/Dell/results/R760xa_H100_PCIe_80GBx4_TRT/summary/README.md create mode 100644 closed/Dell/results/R760xa_H100_PCIe_80GBx4_TRT/summary/summary.html create mode 100644 closed/Dell/results/R760xa_L40Sx4_TRT/summary/README.md create mode 100644 closed/Dell/results/R760xa_L40Sx4_TRT/summary/summary.html create mode 100644 closed/Dell/results/XE8640_H100_SXM_80GBx4_TRT/summary/README.md create mode 100644 closed/Dell/results/XE8640_H100_SXM_80GBx4_TRT/summary/summary.html create mode 100644 closed/Dell/results/XE9640_H100_SXM_80GBx4_TRT/summary/README.md create mode 100644 closed/Dell/results/XE9640_H100_SXM_80GBx4_TRT/summary/summary.html create mode 100644 closed/Dell/results/XE9680_H100_SXM_80GBx8_TRT/summary/README.md create mode 100644 closed/Dell/results/XE9680_H100_SXM_80GBx8_TRT/summary/summary.html create mode 100644 closed/Dell/results/XE9680_H200_SXM_141GBx8_TRT/summary/README.md create mode 100644 closed/Dell/results/XE9680_H200_SXM_141GBx8_TRT/summary/summary.html create mode 100644 closed/Dell/results/XE9680_MI300X_192GBx8/summary/README.md create mode 100644 closed/Dell/results/XE9680_MI300X_192GBx8/summary/summary.html create mode 100644 closed/Dell/results/XR8620_L4x1_TRT/summary/README.md create mode 100644 closed/Dell/results/XR8620_L4x1_TRT/summary/summary.html create mode 100644 closed/Fujitsu/results/CDI_L40Sx16_TRT/summary/README.md create 
mode 100644 closed/Fujitsu/results/CDI_L40Sx16_TRT/summary/summary.html create mode 100644 closed/Fujitsu/results/CDI_L40Sx8_TRT/summary/README.md create mode 100644 closed/Fujitsu/results/CDI_L40Sx8_TRT/summary/summary.html create mode 100644 closed/Fujitsu/results/GX2560M7_H100_SXM_80GBx4_TRT/summary/README.md create mode 100644 closed/Fujitsu/results/GX2560M7_H100_SXM_80GBx4_TRT/summary/summary.html create mode 100644 closed/GigaComputing/results/GIGABYTE_G593_SD1_H200_SXM_141GBx8_TRT/summary/README.md create mode 100644 closed/GigaComputing/results/GIGABYTE_G593_SD1_H200_SXM_141GBx8_TRT/summary/summary.html create mode 100644 closed/Google/results/DGX-H100_H100-SXM-80GBx8_TRT/summary/README.md create mode 100644 closed/Google/results/DGX-H100_H100-SXM-80GBx8_TRT/summary/summary.html create mode 100644 closed/Google/results/tpu_v5e_x4_flax/summary/README.md create mode 100644 closed/Google/results/tpu_v5e_x4_flax/summary/summary.html create mode 100644 closed/Google/results/tpu_v6_x4_flax/summary/README.md create mode 100644 closed/Google/results/tpu_v6_x4_flax/summary/summary.html create mode 100644 closed/HPE/results/1-node-2S-EMR-PyTorch/summary/README.md create mode 100644 closed/HPE/results/1-node-2S-EMR-PyTorch/summary/summary.html create mode 100644 closed/HPE/results/GH200-GraceHopper-Superchip_GH200-144GB_aarch64x1_TRT/summary/README.md create mode 100644 closed/HPE/results/GH200-GraceHopper-Superchip_GH200-144GB_aarch64x1_TRT/summary/summary.html create mode 100644 closed/HPE/results/HPE_Cray_XD670_H100_SXM_80GBx8_TRT/summary/README.md create mode 100644 closed/HPE/results/HPE_Cray_XD670_H100_SXM_80GBx8_TRT/summary/summary.html create mode 100644 closed/HPE/results/HPE_ProLiant_DL380a_H100_NVL_94GBx4_TRT/summary/README.md create mode 100644 closed/HPE/results/HPE_ProLiant_DL380a_H100_NVL_94GBx4_TRT/summary/summary.html create mode 100644 closed/HPE/results/HPE_ProLiant_DL380a_L40S_PCIe_48GBx4_TRT/summary/README.md create mode 100644 closed/HPE/results/HPE_ProLiant_DL380a_L40S_PCIe_48GBx4_TRT/summary/summary.html create mode 100644 closed/Intel/results/1-node-2S-EMR-PyTorch/summary/README.md create mode 100644 closed/Intel/results/1-node-2S-EMR-PyTorch/summary/summary.html create mode 100644 closed/Intel/results/1-node-2S-GNR-PyTorch/summary/README.md create mode 100644 closed/Intel/results/1-node-2S-GNR-PyTorch/summary/summary.html create mode 100644 closed/JuniperNetworks/results/DGX-H100_H100-SXM-80GBx16_TRT/summary/README.md create mode 100644 closed/JuniperNetworks/results/DGX-H100_H100-SXM-80GBx16_TRT/summary/summary.html create mode 100644 closed/JuniperNetworks/results/DGX-H100_H100-SXM-80GBx32_TRT/summary/README.md create mode 100644 closed/JuniperNetworks/results/DGX-H100_H100-SXM-80GBx32_TRT/summary/summary.html create mode 100644 closed/Lenovo/results/H200_SXM_141GBx8_TRT/summary/README.md create mode 100644 closed/Lenovo/results/H200_SXM_141GBx8_TRT/summary/summary.html create mode 100644 closed/Lenovo/results/Lenovo_2xL40S_TRT/summary/README.md create mode 100644 closed/Lenovo/results/Lenovo_2xL40S_TRT/summary/summary.html create mode 100644 closed/Lenovo/results/Lenovo_2xL4_TRT/summary/README.md create mode 100644 closed/Lenovo/results/Lenovo_2xL4_TRT/summary/summary.html create mode 100644 closed/Lenovo/results/Lenovo_8xH200_TRT/summary/README.md create mode 100644 closed/Lenovo/results/Lenovo_8xH200_TRT/summary/summary.html create mode 100644 closed/Lenovo/results/SR650_V3_3xL40S_TRT/summary/README.md create mode 100644 
closed/Lenovo/results/SR650_V3_3xL40S_TRT/summary/summary.html create mode 100644 closed/Lenovo/results/SR675_V3_8xH100_NVL_TRT/summary/README.md create mode 100644 closed/Lenovo/results/SR675_V3_8xH100_NVL_TRT/summary/summary.html create mode 100644 closed/NVIDIA/results/B200-SXM-180GBx1_TRT/summary/README.md create mode 100644 closed/NVIDIA/results/B200-SXM-180GBx1_TRT/summary/summary.html create mode 100644 closed/NVIDIA/results/DGX-H100_H100-SXM-80GBx8_TRT/summary/README.md create mode 100644 closed/NVIDIA/results/DGX-H100_H100-SXM-80GBx8_TRT/summary/summary.html create mode 100644 closed/NVIDIA/results/GH200-GraceHopper-Superchip_GH200-144GB_aarch64x1_TRT/summary/README.md create mode 100644 closed/NVIDIA/results/GH200-GraceHopper-Superchip_GH200-144GB_aarch64x1_TRT/summary/summary.html create mode 100644 closed/NVIDIA/results/H200-SXM-141GB-CTSx1_TRT/summary/README.md create mode 100644 closed/NVIDIA/results/H200-SXM-141GB-CTSx1_TRT/summary/summary.html create mode 100644 closed/NVIDIA/results/H200-SXM-141GB-CTSx8_TRT/summary/README.md create mode 100644 closed/NVIDIA/results/H200-SXM-141GB-CTSx8_TRT/summary/summary.html create mode 100644 closed/NVIDIA/results/H200-SXM-141GBx1_TRT/summary/README.md create mode 100644 closed/NVIDIA/results/H200-SXM-141GBx1_TRT/summary/summary.html create mode 100644 closed/NVIDIA/results/H200-SXM-141GBx8_TRT/summary/README.md create mode 100644 closed/NVIDIA/results/H200-SXM-141GBx8_TRT/summary/summary.html create mode 100644 closed/NVIDIA/results/H200-SXM-141GBx8_TRT_MaxQ/summary/README.md create mode 100644 closed/NVIDIA/results/H200-SXM-141GBx8_TRT_MaxQ/summary/summary.html create mode 100644 closed/NVIDIA/results/H200-SXM-141GBx8_TRT_Triton/summary/README.md create mode 100644 closed/NVIDIA/results/H200-SXM-141GBx8_TRT_Triton/summary/summary.html create mode 100644 closed/NVIDIA/results/Orin_TRT/summary/README.md create mode 100644 closed/NVIDIA/results/Orin_TRT/summary/summary.html create mode 100644 closed/NeuralMagic/results/vLLM_8xL40S-reference-cpu-pytorch-v2.3.1-default_config/summary/README.md create mode 100644 closed/NeuralMagic/results/vLLM_8xL40S-reference-cpu-pytorch-v2.3.1-default_config/summary/summary.html create mode 100644 closed/Oracle/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/README.md create mode 100644 closed/Oracle/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/summary.html create mode 100644 closed/Quanta_Cloud_Technology/results/1-node-2S-EMR-PyTorch/summary/README.md create mode 100644 closed/Quanta_Cloud_Technology/results/1-node-2S-EMR-PyTorch/summary/summary.html create mode 100644 closed/Quanta_Cloud_Technology/results/D54U_3U_H100_PCIe_80GBx4_TRT/summary/README.md create mode 100644 closed/Quanta_Cloud_Technology/results/D54U_3U_H100_PCIe_80GBx4_TRT/summary/summary.html create mode 100644 closed/Quanta_Cloud_Technology/results/D54U_3U_L40S_PCIe_48GBx4_TRT/summary/README.md create mode 100644 closed/Quanta_Cloud_Technology/results/D54U_3U_L40S_PCIe_48GBx4_TRT/summary/summary.html create mode 100644 closed/Quanta_Cloud_Technology/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/README.md create mode 100644 closed/Quanta_Cloud_Technology/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/summary.html create mode 100644 closed/RedHat/results/L40S-RedHat-OpenShift/summary/README.md create mode 100644 closed/RedHat/results/L40S-RedHat-OpenShift/summary/summary.html create mode 100644 
closed/Supermicro/results/1-node-2S-EMR-PyTorch/summary/README.md create mode 100644 closed/Supermicro/results/1-node-2S-EMR-PyTorch/summary/summary.html create mode 100644 closed/Supermicro/results/AS_4125GS_TNHR2_LCC_H100_SXM_80GBx8_TRT/summary/README.md create mode 100644 closed/Supermicro/results/AS_4125GS_TNHR2_LCC_H100_SXM_80GBx8_TRT/summary/summary.html create mode 100644 closed/Supermicro/results/AS_8125GS_TNHR_H100_SXM_80GBx8_TRT/summary/README.md create mode 100644 closed/Supermicro/results/AS_8125GS_TNHR_H100_SXM_80GBx8_TRT/summary/summary.html create mode 100644 closed/Supermicro/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/README.md create mode 100644 closed/Supermicro/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/summary.html create mode 100644 closed/Supermicro/results/SYS_421GE_TNHR2_LCC_H100_SXM_80GBx8_TRT/summary/README.md create mode 100644 closed/Supermicro/results/SYS_421GE_TNHR2_LCC_H100_SXM_80GBx8_TRT/summary/summary.html create mode 100644 closed/Supermicro/results/SYS_821GE_TNHR_H100_SXM_80GBx8_TRT/summary/README.md create mode 100644 closed/Supermicro/results/SYS_821GE_TNHR_H100_SXM_80GBx8_TRT/summary/summary.html create mode 100644 closed/Sustainable_Metal_Cloud/results/SMC_H100_SXM_80GBX8_TRT/summary/README.md create mode 100644 closed/Sustainable_Metal_Cloud/results/SMC_H100_SXM_80GBX8_TRT/summary/summary.html create mode 100644 closed/UntetherAI/results/h13_u1_preview/summary/README.md create mode 100644 closed/UntetherAI/results/h13_u1_preview/summary/summary.html create mode 100644 closed/UntetherAI/results/h13_u1_preview_dc/summary/README.md create mode 100644 closed/UntetherAI/results/h13_u1_preview_dc/summary/summary.html create mode 100644 closed/UntetherAI/results/h13_u1_slim/summary/README.md create mode 100644 closed/UntetherAI/results/h13_u1_slim/summary/summary.html create mode 100644 closed/UntetherAI/results/h13_u2_preview/summary/README.md create mode 100644 closed/UntetherAI/results/h13_u2_preview/summary/summary.html create mode 100644 closed/UntetherAI/results/h13_u2_preview_dc/summary/README.md create mode 100644 closed/UntetherAI/results/h13_u2_preview_dc/summary/summary.html create mode 100644 closed/UntetherAI/results/h13_u3_slim/summary/README.md create mode 100644 closed/UntetherAI/results/h13_u3_slim/summary/summary.html create mode 100644 closed/UntetherAI/results/r760_u4_slim/summary/README.md create mode 100644 closed/UntetherAI/results/r760_u4_slim/summary/summary.html create mode 100644 closed/UntetherAI/results/r760_u6_slim/summary/README.md create mode 100644 closed/UntetherAI/results/r760_u6_slim/summary/summary.html create mode 100644 open/CTuning/results/cm-demo-gfursin-aws-g4dn.4xlarge-nvidia_original-gpu-tensorrt-vdefault-default_config/summary/README.md create mode 100644 open/CTuning/results/cm-demo-gfursin-aws-g4dn.4xlarge-nvidia_original-gpu-tensorrt-vdefault-default_config/summary/summary.html create mode 100644 open/CTuning/results/cm-demo-gfursin-scaleway-L4-1-24G-reference-gpu-pytorch-v2.3.1-default_config/summary/README.md create mode 100644 open/CTuning/results/cm-demo-gfursin-scaleway-L4-1-24G-reference-gpu-pytorch-v2.3.1-default_config/summary/summary.html create mode 100644 open/HPE/results/HPE_Cray_XD670_H100_SXM_80GBx8_TRT/summary/README.md create mode 100644 open/HPE/results/HPE_Cray_XD670_H100_SXM_80GBx8_TRT/summary/summary.html create mode 100644 open/HPE/results/HPE_ProLiant_DL380a_H100_NVL_94GBx4_TRT/summary/README.md create mode 100644 
open/HPE/results/HPE_ProLiant_DL380a_H100_NVL_94GBx4_TRT/summary/summary.html create mode 100644 open/HPE/results/HPE_ProLiant_DL380a_L40S_PCIe_48GBx4_TRT/summary/README.md create mode 100644 open/HPE/results/HPE_ProLiant_DL380a_L40S_PCIe_48GBx4_TRT/summary/summary.html create mode 100644 open/Krai/results/7920t-kilt-onnxruntime_cpu/summary/README.md create mode 100644 open/Krai/results/7920t-kilt-onnxruntime_cpu/summary/summary.html create mode 100644 open/Krai/results/7920t-kilt-onnxruntime_gpu/summary/README.md create mode 100644 open/Krai/results/7920t-kilt-onnxruntime_gpu/summary/summary.html create mode 100644 open/NVIDIA/results/H200-SXM-141GBx1_TRT_DepthPruned/summary/README.md create mode 100644 open/NVIDIA/results/H200-SXM-141GBx1_TRT_DepthPruned/summary/summary.html create mode 100644 open/NVIDIA/results/H200-SXM-141GBx1_TRT_LCM/summary/README.md create mode 100644 open/NVIDIA/results/H200-SXM-141GBx1_TRT_LCM/summary/summary.html create mode 100644 open/NVIDIA/results/H200-SXM-141GBx1_TRT_Sparse/summary/README.md create mode 100644 open/NVIDIA/results/H200-SXM-141GBx1_TRT_Sparse/summary/summary.html create mode 100644 open/NVIDIA/results/Orin_TRT_DepthPruned/summary/README.md create mode 100644 open/NVIDIA/results/Orin_TRT_DepthPruned/summary/summary.html create mode 100644 open/NeuralMagic/results/4xH100-SXM-80GB_vLLM_FP8-reference-cpu-pytorch-v2.3.1-default_config/summary/README.md create mode 100644 open/NeuralMagic/results/4xH100-SXM-80GB_vLLM_FP8-reference-cpu-pytorch-v2.3.1-default_config/summary/summary.html create mode 100644 open/NeuralMagic/results/4xH100-SXM-80GB_vLLM_GPTQ-reference-cpu-pytorch-v2.3.1-default_config/summary/README.md create mode 100644 open/NeuralMagic/results/4xH100-SXM-80GB_vLLM_GPTQ-reference-cpu-pytorch-v2.3.1-default_config/summary/summary.html create mode 100644 open/NeuralMagic/results/ASUS_Vivobook-reference-cpu-deepsparse_v1.8.0-default_config/summary/README.md create mode 100644 open/NeuralMagic/results/ASUS_Vivobook-reference-cpu-deepsparse_v1.8.0-default_config/summary/summary.html create mode 100644 open/NeuralMagic/results/GO_2xRTX4090-reference-cpu-pytorch-v2.2.1-default_config/summary/README.md create mode 100644 open/NeuralMagic/results/GO_2xRTX4090-reference-cpu-pytorch-v2.2.1-default_config/summary/summary.html create mode 100644 open/NeuralMagic/results/GO_Intel_SPR-intel-cpu-pytorch-vdefault-default_config/summary/README.md create mode 100644 open/NeuralMagic/results/GO_Intel_SPR-intel-cpu-pytorch-vdefault-default_config/summary/summary.html create mode 100644 open/NeuralMagic/results/pcspecialist_amd_am5-reference-gpu-pytorch-v2.2.1-default_config/summary/README.md create mode 100644 open/NeuralMagic/results/pcspecialist_amd_am5-reference-gpu-pytorch-v2.2.1-default_config/summary/summary.html create mode 100644 open/NeuralMagic/results/phoenix_Amd_Am5-reference-cpu-deepsparse-vdefault-default_config/summary/README.md create mode 100644 open/NeuralMagic/results/phoenix_Amd_Am5-reference-cpu-deepsparse-vdefault-default_config/summary/summary.html create mode 100644 open/NeuralMagic/results/vLLM_8xL40S-reference-cpu-pytorch-v2.3.1-default_config/summary/README.md create mode 100644 open/NeuralMagic/results/vLLM_8xL40S-reference-cpu-pytorch-v2.3.1-default_config/summary/summary.html create mode 100644 open/Oracle/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/README.md create mode 100644 open/Oracle/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/summary.html diff --git 
a/closed/AMD/results/1xMI300X_2xEPYC-9374F/summary/README.md b/closed/AMD/results/1xMI300X_2xEPYC-9374F/summary/README.md
new file mode 100644
index 00000000..4b76f4b1
--- /dev/null
+++ b/closed/AMD/results/1xMI300X_2xEPYC-9374F/summary/README.md
@@ -0,0 +1,58 @@

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

AMD
Supermicro AS-8125GS-TNMR2

MLPerf Inference Category: Datacenter
MLPerf Inference Division: Closed
Submitted by: AMD
Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency:
accelerator_host_interconnect: PCIe Gen5 x16
accelerator_interconnect: XGMI
accelerator_interconnect_topology:
accelerator_memory_capacity: 192 GB
accelerator_memory_configuration: HBM3
accelerator_model_name: AMD Instinct MI300X-NPS1-SPX-192GB-750W
accelerator_on-chip_memories:
accelerators_per_node: 1

Processor and Memory Details

host_memory_capacity: 1.5TiB
host_memory_configuration: 24x 64GB Micron MTC40F2046S1RC48BA1 MHCC
host_processor_caches:
host_processor_core_count: 32
host_processor_frequency:
host_processor_interconnect:
host_processor_model_name: 2xAMD EPYC 9374F
host_processors_per_node: 2

Other Hardware Details

cooling: Air-cooled
disk_controllers:
disk_drives:
hw_notes:
other_hardware:
power_management:
power_supply_details:
power_supply_quantity_and_rating_watts:

Network and Interconnect Details

host_network_card_count: 1x BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller, 1x MT2910 Family [ConnectX-7]
host_networking: 10G Ethernet
host_networking_topology: Ethernet on switching network
network_speed_mbit:
nics_enabled_connected:
nics_enabled_firmware:
nics_enabled_os:
number_of_type_nics_installed:

Software Details

boot_firmware_version:
framework: vLLM 0.4.3+rocm614, PyTorch 2.3.0, ROCm 6.1.2
management_firmware_version:
nics_enabled_firmware:
operating_system: Ubuntu 22.04.4 LTS (Jammy Jellyfish)
other_software_stack: hipblaslt-8b71e7a, flash-attn-23a2b1c, vllm-a8cff57
sw_notes:

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 2520.27 | Tokens/s | 3062.72 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 2520.27 | Tokens/s | 3062.72 |
diff --git a/closed/AMD/results/1xMI300X_2xEPYC-9374F/summary/summary.html b/closed/AMD/results/1xMI300X_2xEPYC-9374F/summary/summary.html
new file mode 100644
index 00000000..957ae4b7
--- /dev/null
+++ b/closed/AMD/results/1xMI300X_2xEPYC-9374F/summary/summary.html
@@ -0,0 +1,126 @@

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

AMD
Supermicro AS-8125GS-TNMR2

MLPerf Inference Category: Datacenter
MLPerf Inference Division: Closed
Submitted by: AMD
Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency:
accelerator_host_interconnect: PCIe Gen5 x16
accelerator_interconnect: XGMI
accelerator_interconnect_topology:
accelerator_memory_capacity: 192 GB
accelerator_memory_configuration: HBM3
accelerator_model_name: AMD Instinct MI300X-NPS1-SPX-192GB-750W
accelerator_on-chip_memories:
accelerators_per_node: 1

Processor and Memory Details

host_memory_capacity: 1.5TiB
host_memory_configuration: 24x 64GB Micron MTC40F2046S1RC48BA1 MHCC
host_processor_caches:
host_processor_core_count: 32
host_processor_frequency:
host_processor_interconnect:
host_processor_model_name: 2xAMD EPYC 9374F
host_processors_per_node: 2

Other Hardware Details

cooling: Air-cooled
disk_controllers:
disk_drives:
hw_notes:
other_hardware:
power_management:
power_supply_details:
power_supply_quantity_and_rating_watts:

Network and Interconnect Details

host_network_card_count: 1x BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller, 1x MT2910 Family [ConnectX-7]
host_networking: 10G Ethernet
host_networking_topology: Ethernet on switching network
network_speed_mbit:
nics_enabled_connected:
nics_enabled_firmware:
nics_enabled_os:
number_of_type_nics_installed:

Software Details

boot_firmware_version:
framework: vLLM 0.4.3+rocm614, PyTorch 2.3.0, ROCm 6.1.2
management_firmware_version:
nics_enabled_firmware:
operating_system: Ubuntu 22.04.4 LTS (Jammy Jellyfish)
other_software_stack: hipblaslt-8b71e7a, flash-attn-23a2b1c, vllm-a8cff57
sw_notes:

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 2520.27 | Tokens/s | 3062.72 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 2520.27 | Tokens/s | 3062.72 |
\ No newline at end of file
diff --git a/closed/AMD/results/8xMI300X_2xEPYC-9374F/summary/README.md b/closed/AMD/results/8xMI300X_2xEPYC-9374F/summary/README.md
new file mode 100644
index 00000000..dd400ca1
--- /dev/null
+++ b/closed/AMD/results/8xMI300X_2xEPYC-9374F/summary/README.md
@@ -0,0 +1,58 @@

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

AMD
Supermicro AS-8125GS-TNMR2

MLPerf Inference Category: Datacenter
MLPerf Inference Division: Closed
Submitted by: AMD
Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency:
accelerator_host_interconnect: PCIe Gen5 x16
accelerator_interconnect: XGMI
accelerator_interconnect_topology:
accelerator_memory_capacity: 192 GB
accelerator_memory_configuration: HBM3
accelerator_model_name: AMD Instinct MI300X-NPS1-SPX-192GB-750W
accelerator_on-chip_memories:
accelerators_per_node: 8

Processor and Memory Details

host_memory_capacity: 1.5TiB
host_memory_configuration: 24x 64GB Micron MTC40F2046S1RC48BA1 MHCC
host_processor_caches:
host_processor_core_count: 32
host_processor_frequency:
host_processor_interconnect:
host_processor_model_name: 2xAMD EPYC 9374F
host_processors_per_node: 2

Other Hardware Details

cooling: Air-cooled
disk_controllers:
disk_drives:
hw_notes:
other_hardware:
power_management:
power_supply_details:
power_supply_quantity_and_rating_watts:

Network and Interconnect Details

host_network_card_count: 1x BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller, 1x MT2910 Family [ConnectX-7]
host_networking: 10G Ethernet
host_networking_topology: Ethernet on switching network
network_speed_mbit:
nics_enabled_connected:
nics_enabled_firmware:
nics_enabled_os:
number_of_type_nics_installed:

Software Details

boot_firmware_version:
framework: vLLM 0.4.3+rocm614, PyTorch 2.3.0, ROCm 6.1.2
management_firmware_version:
nics_enabled_firmware:
operating_system: Ubuntu 22.04.4 LTS (Jammy Jellyfish)
other_software_stack: hipblaslt-8b71e7a, flash-attn-23a2b1c, vllm-a8cff57
sw_notes:

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 21028.2 | Tokens/s | 23514.8 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 21028.2 | Tokens/s | 23514.8 |
diff --git a/closed/AMD/results/8xMI300X_2xEPYC-9374F/summary/summary.html b/closed/AMD/results/8xMI300X_2xEPYC-9374F/summary/summary.html
new file mode 100644
index 00000000..e61f20ed
--- /dev/null
+++ b/closed/AMD/results/8xMI300X_2xEPYC-9374F/summary/summary.html
@@ -0,0 +1,126 @@

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

AMD
Supermicro AS-8125GS-TNMR2

MLPerf Inference Category: Datacenter
MLPerf Inference Division: Closed
Submitted by: AMD
Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency:
accelerator_host_interconnect: PCIe Gen5 x16
accelerator_interconnect: XGMI
accelerator_interconnect_topology:
accelerator_memory_capacity: 192 GB
accelerator_memory_configuration: HBM3
accelerator_model_name: AMD Instinct MI300X-NPS1-SPX-192GB-750W
accelerator_on-chip_memories:
accelerators_per_node: 8

Processor and Memory Details

host_memory_capacity: 1.5TiB
host_memory_configuration: 24x 64GB Micron MTC40F2046S1RC48BA1 MHCC
host_processor_caches:
host_processor_core_count: 32
host_processor_frequency:
host_processor_interconnect:
host_processor_model_name: 2xAMD EPYC 9374F
host_processors_per_node: 2

Other Hardware Details

cooling: Air-cooled
disk_controllers:
disk_drives:
hw_notes:
other_hardware:
power_management:
power_supply_details:
power_supply_quantity_and_rating_watts:

Network and Interconnect Details

host_network_card_count: 1x BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller, 1x MT2910 Family [ConnectX-7]
host_networking: 10G Ethernet
host_networking_topology: Ethernet on switching network
network_speed_mbit:
nics_enabled_connected:
nics_enabled_firmware:
nics_enabled_os:
number_of_type_nics_installed:

Software Details

boot_firmware_version:
framework: vLLM 0.4.3+rocm614, PyTorch 2.3.0, ROCm 6.1.2
management_firmware_version:
nics_enabled_firmware:
operating_system: Ubuntu 22.04.4 LTS (Jammy Jellyfish)
other_software_stack: hipblaslt-8b71e7a, flash-attn-23a2b1c, vllm-a8cff57
sw_notes:

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 21028.2 | Tokens/s | 23514.8 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 21028.2 | Tokens/s | 23514.8 |
\ No newline at end of file
diff --git a/closed/AMD/results/8xMI300X_2xEPYC-TURIN/summary/README.md b/closed/AMD/results/8xMI300X_2xEPYC-TURIN/summary/README.md
new file mode 100644
index 00000000..d4adc758
--- /dev/null
+++ b/closed/AMD/results/8xMI300X_2xEPYC-TURIN/summary/README.md
@@ -0,0 +1,58 @@

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

AMD
Supermicro AS-8125GS-TNMR2

MLPerf Inference Category: Datacenter
MLPerf Inference Division: Closed
Submitted by: AMD
Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency:
accelerator_host_interconnect: PCIe Gen5 x16
accelerator_interconnect: XGMI
accelerator_interconnect_topology:
accelerator_memory_capacity: 192 GB
accelerator_memory_configuration: HBM3
accelerator_model_name: AMD Instinct MI300X-NPS1-SPX-192GB-750W
accelerator_on-chip_memories:
accelerators_per_node: 8

Processor and Memory Details

host_memory_capacity: 1.5TiB
host_memory_configuration: 24x 64GB Samsung M321R8GA0PB1-CCPXC
host_processor_caches:
host_processor_core_count: N/A
host_processor_frequency:
host_processor_interconnect:
host_processor_model_name: 2xAMD EPYC TURIN
host_processors_per_node: 2

Other Hardware Details

cooling: Air-cooled
disk_controllers:
disk_drives:
hw_notes: Preview due to using AMD next-generation EPYC CPU
other_hardware:
power_management:
power_supply_details:
power_supply_quantity_and_rating_watts:

Network and Interconnect Details

host_network_card_count: 1x BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller, 1x MT2910 Family [ConnectX-7]
host_networking: 10G Ethernet
host_networking_topology: Ethernet on switching network
network_speed_mbit:
nics_enabled_connected:
nics_enabled_firmware:
nics_enabled_os:
number_of_type_nics_installed:

Software Details

boot_firmware_version:
framework: vLLM 0.4.3+rocm614, PyTorch 2.3.0, ROCm 6.1.2
management_firmware_version:
nics_enabled_firmware:
operating_system: Ubuntu 22.04.6 LTS (Jammy Jellyfish)
other_software_stack: hipblaslt-8b71e7a, flash-attn-23a2b1c, vllm-a8cff57
sw_notes:

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 22020.9 | Tokens/s | 24109.8 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 22020.9 | Tokens/s | 24109.8 |
diff --git a/closed/AMD/results/8xMI300X_2xEPYC-TURIN/summary/summary.html b/closed/AMD/results/8xMI300X_2xEPYC-TURIN/summary/summary.html
new file mode 100644
index 00000000..ed64e1a5
--- /dev/null
+++ b/closed/AMD/results/8xMI300X_2xEPYC-TURIN/summary/summary.html
@@ -0,0 +1,126 @@

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

AMD
Supermicro AS-8125GS-TNMR2

MLPerf Inference Category: Datacenter
MLPerf Inference Division: Closed
Submitted by: AMD
Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency:
accelerator_host_interconnect: PCIe Gen5 x16
accelerator_interconnect: XGMI
accelerator_interconnect_topology:
accelerator_memory_capacity: 192 GB
accelerator_memory_configuration: HBM3
accelerator_model_name: AMD Instinct MI300X-NPS1-SPX-192GB-750W
accelerator_on-chip_memories:
accelerators_per_node: 8

Processor and Memory Details

host_memory_capacity: 1.5TiB
host_memory_configuration: 24x 64GB Samsung M321R8GA0PB1-CCPXC
host_processor_caches:
host_processor_core_count: N/A
host_processor_frequency:
host_processor_interconnect:
host_processor_model_name: 2xAMD EPYC TURIN
host_processors_per_node: 2

Other Hardware Details

cooling: Air-cooled
disk_controllers:
disk_drives:
hw_notes: Preview due to using AMD next-generation EPYC CPU
other_hardware:
power_management:
power_supply_details:
power_supply_quantity_and_rating_watts:

Network and Interconnect Details

host_network_card_count: 1x BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller, 1x MT2910 Family [ConnectX-7]
host_networking: 10G Ethernet
host_networking_topology: Ethernet on switching network
network_speed_mbit:
nics_enabled_connected:
nics_enabled_firmware:
nics_enabled_os:
number_of_type_nics_installed:

Software Details

boot_firmware_version:
framework: vLLM 0.4.3+rocm614, PyTorch 2.3.0, ROCm 6.1.2
management_firmware_version:
nics_enabled_firmware:
operating_system: Ubuntu 22.04.6 LTS (Jammy Jellyfish)
other_software_stack: hipblaslt-8b71e7a, flash-attn-23a2b1c, vllm-a8cff57
sw_notes:

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 22020.9 | Tokens/s | 24109.8 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 22020.9 | Tokens/s | 24109.8 |
\ No newline at end of file
diff --git a/closed/ASUSTeK/results/ESC4000A_E12_4XH100_TRT/summary/README.md b/closed/ASUSTeK/results/ESC4000A_E12_4XH100_TRT/summary/README.md
new file mode 100644
index 00000000..fda17500
--- /dev/null
+++ b/closed/ASUSTeK/results/ESC4000A_E12_4XH100_TRT/summary/README.md
@@ -0,0 +1,58 @@

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

ASUSTeK
ASUSTeK ESC4000A-E12 (4xH100-NVL-94GB)

MLPerf Inference Category: Datacenter
MLPerf Inference Division: Closed
Submitted by: ASUSTeK
Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency:
accelerator_host_interconnect: PCIe Gen5 x16
accelerator_interconnect: N/A
accelerator_interconnect_topology:
accelerator_memory_capacity: 94 GB
accelerator_memory_configuration: HBM3
accelerator_model_name: NVIDIA H100-NVL-94GB
accelerator_on-chip_memories:
accelerators_per_node: 4

Processor and Memory Details

host_memory_capacity: 768 GB
host_memory_configuration: 12x64GB MTC40F2046S1RC48BA1
host_processor_caches:
host_processor_core_count: 96
host_processor_frequency:
host_processor_interconnect:
host_processor_model_name: AMD EPYC 9654 96-Core Processor
host_processors_per_node: 1

Other Hardware Details

cooling: Air-cooled
disk_controllers: NVMe
disk_drives: SSD
hw_notes:
other_hardware:
power_management:
power_supply_details:
power_supply_quantity_and_rating_watts:

Network and Interconnect Details

host_network_card_count: 1x MT2910 Family [ConnectX-7]
host_networking: Infiniband
host_networking_topology: N/A
network_speed_mbit:
nics_enabled_connected:
nics_enabled_firmware:
nics_enabled_os:
number_of_type_nics_installed:

Software Details

boot_firmware_version:
framework: TensorRT 10.2.0, CUDA 12.4
management_firmware_version:
nics_enabled_firmware:
operating_system: Ubuntu 20.04.6
other_software_stack: TensorRT 10.2.0, CUDA 12.4, cuDNN 9.1.0, Driver 550.54
sw_notes:

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 5788.29 | Tokens/s | 6673.95 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 5788.29 | Tokens/s | 6673.95 |
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s | 6070.26 | Tokens/s | 6985.51 |
| gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s | 6070.26 | Tokens/s | 6985.51 |
| bert-99 | F1: 89.9653 | Queries/s | 17001.6 | Samples/s | 24173.2 |
| bert-99.9 | F1: 90.7831 | Queries/s | 16002.5 | Samples/s | 20380.0 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s | 5.3739 | Samples/s | 5.89745 |
| dlrm-v2-99 | AUC: 79.5069 | Queries/s | 129386.0 | Samples/s | 197140.0 |
| dlrm-v2-99.9 | AUC: 80.2297 | Queries/s | 100009.0 | Samples/s | 113651.0 |
| retinanet | mAP: 37.1745 | Queries/s | 2647.04 | Samples/s | 5288.58 |
| resnet | acc: 75.6954 | Queries/s | 233275.0 | Samples/s | 270891.0 |
| 3d-unet-99 | DICE: 0.8531 | | | Samples/s | 21.5153 |
| 3d-unet-99.9 | DICE: 0.8608 | | | Samples/s | 21.5153 |
diff --git a/closed/ASUSTeK/results/ESC4000A_E12_4XH100_TRT/summary/summary.html b/closed/ASUSTeK/results/ESC4000A_E12_4XH100_TRT/summary/summary.html
new file mode 100644
index 00000000..ba9799a2
--- /dev/null
+++ b/closed/ASUSTeK/results/ESC4000A_E12_4XH100_TRT/summary/summary.html
@@ -0,0 +1,126 @@

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

ASUSTeK
ASUSTeK ESC4000A-E12 (4xH100-NVL-94GB)

MLPerf Inference Category: Datacenter
MLPerf Inference Division: Closed
Submitted by: ASUSTeK
Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency:
accelerator_host_interconnect: PCIe Gen5 x16
accelerator_interconnect: N/A
accelerator_interconnect_topology:
accelerator_memory_capacity: 94 GB
accelerator_memory_configuration: HBM3
accelerator_model_name: NVIDIA H100-NVL-94GB
accelerator_on-chip_memories:
accelerators_per_node: 4

Processor and Memory Details

host_memory_capacity: 768 GB
host_memory_configuration: 12x64GB MTC40F2046S1RC48BA1
host_processor_caches:
host_processor_core_count: 96
host_processor_frequency:
host_processor_interconnect:
host_processor_model_name: AMD EPYC 9654 96-Core Processor
host_processors_per_node: 1

Other Hardware Details

cooling: Air-cooled
disk_controllers: NVMe
disk_drives: SSD
hw_notes:
other_hardware:
power_management:
power_supply_details:
power_supply_quantity_and_rating_watts:

Network and Interconnect Details

host_network_card_count: 1x MT2910 Family [ConnectX-7]
host_networking: Infiniband
host_networking_topology: N/A
network_speed_mbit:
nics_enabled_connected:
nics_enabled_firmware:
nics_enabled_os:
number_of_type_nics_installed:

Software Details

boot_firmware_version:
framework: TensorRT 10.2.0, CUDA 12.4
management_firmware_version:
nics_enabled_firmware:
operating_system: Ubuntu 20.04.6
other_software_stack: TensorRT 10.2.0, CUDA 12.4, cuDNN 9.1.0, Driver 550.54
sw_notes:

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 5788.29 | Tokens/s | 6673.95 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 5788.29 | Tokens/s | 6673.95 |
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s | 6070.26 | Tokens/s | 6985.51 |
| gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s | 6070.26 | Tokens/s | 6985.51 |
| bert-99 | F1: 89.9653 | Queries/s | 17001.6 | Samples/s | 24173.2 |
| bert-99.9 | F1: 90.7831 | Queries/s | 16002.5 | Samples/s | 20380.0 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s | 5.3739 | Samples/s | 5.89745 |
| dlrm-v2-99 | AUC: 79.5069 | Queries/s | 129386.0 | Samples/s | 197140.0 |
| dlrm-v2-99.9 | AUC: 80.2297 | Queries/s | 100009.0 | Samples/s | 113651.0 |
| retinanet | mAP: 37.1745 | Queries/s | 2647.04 | Samples/s | 5288.58 |
| resnet | acc: 75.6954 | Queries/s | 233275.0 | Samples/s | 270891.0 |
| 3d-unet-99 | DICE: 0.8531 | | | Samples/s | 21.5153 |
| 3d-unet-99.9 | DICE: 0.8608 | | | Samples/s | 21.5153 |
\ No newline at end of file
diff --git a/closed/ASUSTeK/results/ESC8000A_E12_H100x8_TRT/summary/README.md b/closed/ASUSTeK/results/ESC8000A_E12_H100x8_TRT/summary/README.md
new file mode 100644
index 00000000..906dc669
--- /dev/null
+++ b/closed/ASUSTeK/results/ESC8000A_E12_H100x8_TRT/summary/README.md
@@ -0,0 +1,58 @@

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

ASUSTeK
ESC8000A-E12 (8x H100-PCIe-80GB, TensorRT)

MLPerf Inference Category: Datacenter
MLPerf Inference Division: Closed
Submitted by: ASUSTeK
Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency:
accelerator_host_interconnect: PCIe Gen5 x16
accelerator_interconnect: N/A
accelerator_interconnect_topology:
accelerator_memory_capacity: 80 GB
accelerator_memory_configuration: HBM3e
accelerator_model_name: NVIDIA H100-PCIe-80GB
accelerator_on-chip_memories:
accelerators_per_node: 8

Processor and Memory Details

host_memory_capacity: 2 TB
host_memory_configuration: 16x 64GB
host_processor_caches:
host_processor_core_count: 32
host_processor_frequency:
host_processor_interconnect:
host_processor_model_name: AMD EPYC 9374F 32-Core Processor
host_processors_per_node: 2

Other Hardware Details

cooling: Air-cooled
disk_controllers:
disk_drives:
hw_notes: Data bandwidth for GPU-PCIe: 504 GB/s; PCIe-NIC: 126 GB/s
other_hardware:
power_management:
power_supply_details:
power_supply_quantity_and_rating_watts:

Network and Interconnect Details

host_network_card_count: 2x MT2910 Family [ConnectX-7]
host_networking: Infiniband
host_networking_topology: N/A
network_speed_mbit:
nics_enabled_connected:
nics_enabled_firmware:
nics_enabled_os:
number_of_type_nics_installed:

Software Details

boot_firmware_version:
framework: TensorRT 10.2.0, CUDA 12.4
management_firmware_version:
nics_enabled_firmware:
operating_system: Ubuntu 20.04.6
other_software_stack: TensorRT 10.2.0, CUDA 12.4, cuDNN 9.1.0, Driver 550.54.15, DALI 1.36.0
sw_notes:

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 8094.27 | Tokens/s | 9281.5 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 8094.27 | Tokens/s | 9281.5 |
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s | 7986.46 | Tokens/s | 13078.5 |
| gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s | 7986.46 | Tokens/s | 13078.5 |
| bert-99 | F1: 89.9653 | Queries/s | 35365.6 | Samples/s | 45409.8 |
| bert-99.9 | F1: 90.7831 | Queries/s | 32007.7 | Samples/s | 38464.2 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s | 7.84131 | Samples/s | 9.83763 |
| dlrm-v2-99 | AUC: 79.5069 | Queries/s | 170023.0 | Samples/s | 368654.0 |
| dlrm-v2-99.9 | AUC: 80.2297 | Queries/s | 170021.0 | Samples/s | 211128.0 |
| retinanet | mAP: 37.1745 | Queries/s | 8801.91 | Samples/s | 9421.13 |
| resnet | acc: 75.6954 | Queries/s | 410103.0 | Samples/s | 450364.0 |
| 3d-unet-99 | DICE: 0.8531 | | | Samples/s | 37.1373 |
| 3d-unet-99.9 | DICE: 0.8608 | | | Samples/s | 37.1373 |
diff --git a/closed/ASUSTeK/results/ESC8000A_E12_H100x8_TRT/summary/summary.html b/closed/ASUSTeK/results/ESC8000A_E12_H100x8_TRT/summary/summary.html
new file mode 100644
index 00000000..c9f826c6
--- /dev/null
+++ b/closed/ASUSTeK/results/ESC8000A_E12_H100x8_TRT/summary/summary.html
@@ -0,0 +1,126 @@

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

ASUSTeK
ESC8000A-E12 (8x H100-PCIe-80GB, TensorRT)

MLPerf Inference Category: Datacenter
MLPerf Inference Division: Closed
Submitted by: ASUSTeK
Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency:
accelerator_host_interconnect: PCIe Gen5 x16
accelerator_interconnect: N/A
accelerator_interconnect_topology:
accelerator_memory_capacity: 80 GB
accelerator_memory_configuration: HBM3e
accelerator_model_name: NVIDIA H100-PCIe-80GB
accelerator_on-chip_memories:
accelerators_per_node: 8

Processor and Memory Details

host_memory_capacity: 2 TB
host_memory_configuration: 16x 64GB
host_processor_caches:
host_processor_core_count: 32
host_processor_frequency:
host_processor_interconnect:
host_processor_model_name: AMD EPYC 9374F 32-Core Processor
host_processors_per_node: 2

Other Hardware Details

cooling: Air-cooled
disk_controllers:
disk_drives:
hw_notes: Data bandwidth for GPU-PCIe: 504 GB/s; PCIe-NIC: 126 GB/s
other_hardware:
power_management:
power_supply_details:
power_supply_quantity_and_rating_watts:

Network and Interconnect Details

host_network_card_count: 2x MT2910 Family [ConnectX-7]
host_networking: Infiniband
host_networking_topology: N/A
network_speed_mbit:
nics_enabled_connected:
nics_enabled_firmware:
nics_enabled_os:
number_of_type_nics_installed:

Software Details

boot_firmware_version:
framework: TensorRT 10.2.0, CUDA 12.4
management_firmware_version:
nics_enabled_firmware:
operating_system: Ubuntu 20.04.6
other_software_stack: TensorRT 10.2.0, CUDA 12.4, cuDNN 9.1.0, Driver 550.54.15, DALI 1.36.0
sw_notes:

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 8094.27 | Tokens/s | 9281.5 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 8094.27 | Tokens/s | 9281.5 |
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s | 7986.46 | Tokens/s | 13078.5 |
| gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s | 7986.46 | Tokens/s | 13078.5 |
| bert-99 | F1: 89.9653 | Queries/s | 35365.6 | Samples/s | 45409.8 |
| bert-99.9 | F1: 90.7831 | Queries/s | 32007.7 | Samples/s | 38464.2 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s | 7.84131 | Samples/s | 9.83763 |
| dlrm-v2-99 | AUC: 79.5069 | Queries/s | 170023.0 | Samples/s | 368654.0 |
| dlrm-v2-99.9 | AUC: 80.2297 | Queries/s | 170021.0 | Samples/s | 211128.0 |
| retinanet | mAP: 37.1745 | Queries/s | 8801.91 | Samples/s | 9421.13 |
| resnet | acc: 75.6954 | Queries/s | 410103.0 | Samples/s | 450364.0 |
| 3d-unet-99 | DICE: 0.8531 | | | Samples/s | 37.1373 |
| 3d-unet-99.9 | DICE: 0.8608 | | | Samples/s | 37.1373 |
\ No newline at end of file
diff --git a/closed/ASUSTeK/results/ESC_N8_E11_H100x8_TRT/summary/README.md b/closed/ASUSTeK/results/ESC_N8_E11_H100x8_TRT/summary/README.md
new file mode 100644
index 00000000..7709db1f
--- /dev/null
+++ b/closed/ASUSTeK/results/ESC_N8_E11_H100x8_TRT/summary/README.md
@@ -0,0 +1,58 @@

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

ASUSTeK
ESC-N8-E11 (8x H100-SXM-80GB, TensorRT)

MLPerf Inference Category: Datacenter
MLPerf Inference Division: Closed
Submitted by: ASUSTeK
Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency:
accelerator_host_interconnect: PCIe Gen5 x16
accelerator_interconnect: 18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology:
accelerator_memory_capacity: 80 GB
accelerator_memory_configuration: HBM3
accelerator_model_name: NVIDIA H100-SXM-80GB
accelerator_on-chip_memories:
accelerators_per_node: 8

Processor and Memory Details

host_memory_capacity: 1.5 TB
host_memory_configuration: 16x 96GB M321RYGA0PB2-CCPPC
host_processor_caches:
host_processor_core_count: 64
host_processor_frequency:
host_processor_interconnect:
host_processor_model_name: INTEL(R) XEON(R) PLATINUM 8592+
host_processors_per_node: 2

Other Hardware Details

cooling: Air-cooled
disk_controllers: NVMe
disk_drives: NVMe SSD
hw_notes:
other_hardware:
power_management:
power_supply_details:
power_supply_quantity_and_rating_watts:

Network and Interconnect Details

host_network_card_count: 10x MT2910 Family [ConnectX-7]
host_networking: Infiniband
host_networking_topology: N/A
network_speed_mbit:
nics_enabled_connected:
nics_enabled_firmware:
nics_enabled_os:
number_of_type_nics_installed:

Software Details

boot_firmware_version:
framework: TensorRT 10.2.0, CUDA 12.4
management_firmware_version:
nics_enabled_firmware:
operating_system: Ubuntu 22.04.3
other_software_stack: TensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.90.07
sw_notes:

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 20605.3 | Tokens/s | 24323.6 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 20605.3 | Tokens/s | 24323.6 |
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s | 19226.9 | Tokens/s | 19877.6 |
| gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s | 19226.9 | Tokens/s | 19877.6 |
| bert-99 | F1: 89.9653 | Queries/s | 57005.2 | Samples/s | 70661.2 |
| bert-99.9 | F1: 90.7831 | Queries/s | 51213.8 | Samples/s | 62371.4 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s | 15.7041 | Samples/s | 16.4176 |
| dlrm-v2-99 | AUC: 79.5069 | Queries/s | 516159.0 | Samples/s | 591476.0 |
| dlrm-v2-99.9 | AUC: 80.2297 | Queries/s | 330066.0 | Samples/s | 363048.0 |
| retinanet | mAP: 37.1745 | Queries/s | 13763.0 | Samples/s | 14432.8 |
| resnet | acc: 75.6954 | Queries/s | 630229.0 | Samples/s | 709920.0 |
| 3d-unet-99 | DICE: 0.8531 | | | Samples/s | 51.6944 |
| 3d-unet-99.9 | DICE: 0.8608 | | | Samples/s | 51.6944 |
diff --git a/closed/ASUSTeK/results/ESC_N8_E11_H100x8_TRT/summary/summary.html b/closed/ASUSTeK/results/ESC_N8_E11_H100x8_TRT/summary/summary.html new file mode 100644 index 00000000..64fcecd4 --- /dev/null +++ b/closed/ASUSTeK/results/ESC_N8_E11_H100x8_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

ASUSTeK

+

ESC-N8-E11 (8x H100-SXM-80GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:ASUSTeKAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity1.5 TB
host_memory_configuration16x 96GB M321RYGA0PB2-CCPPC
host_processor_caches
host_processor_core_count64
host_processor_frequency
host_processor_interconnect
host_processor_model_nameINTEL(R) XEON(R) PLATINUM 8592+
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesNVMe SSD
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count10x MT2910 Family [ConnectX-7]
host_networkingInfiniband
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.3
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.90.07
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Performance | Offline Performance |
|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | 20605.3 Tokens/s | 24323.6 Tokens/s |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | 20605.3 Tokens/s | 24323.6 Tokens/s |
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | 19226.9 Tokens/s | 19877.6 Tokens/s |
| gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | 19226.9 Tokens/s | 19877.6 Tokens/s |
| bert-99 | F1: 89.9653 | 57005.2 Queries/s | 70661.2 Samples/s |
| bert-99.9 | F1: 90.7831 | 51213.8 Queries/s | 62371.4 Samples/s |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | 15.7041 Queries/s | 16.4176 Samples/s |
| dlrm-v2-99 | AUC: 79.5069 | 516159.0 Queries/s | 591476.0 Samples/s |
| dlrm-v2-99.9 | AUC: 80.2297 | 330066.0 Queries/s | 363048.0 Samples/s |
| retinanet | mAP: 37.1745 | 13763.0 Queries/s | 14432.8 Samples/s |
| resnet | acc: 75.6954 | 630229.0 Queries/s | 709920.0 Samples/s |
| 3d-unet-99 | DICE: 0.8531 | | 51.6944 Samples/s |
| 3d-unet-99.9 | DICE: 0.8608 | | 51.6944 Samples/s |
+ + + + \ No newline at end of file diff --git a/closed/Cisco/results/C240M7-1-node-2S-EMR-PyTorch/summary/README.md b/closed/Cisco/results/C240M7-1-node-2S-EMR-PyTorch/summary/README.md new file mode 100644 index 00000000..27b3fea6 --- /dev/null +++ b/closed/Cisco/results/C240M7-1-node-2S-EMR-PyTorch/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Cisco

+

C240M7-1-node-2S-EMR-PyTorch

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:CiscoAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerators_per_node0
accelerator_model_nameN/A
accelerator_frequencyN/A
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topologyN/A
accelerator_memory_capacityN/A
accelerator_memory_configurationN/A
accelerator_on-chip_memoriesN/A

Processor and Memory Details

host_processor_model_nameINTEL(R) XEON(R) PLATINUM 8592+
host_processors_per_node2
host_processor_core_count64
host_processor_frequencyN/A
host_processor_cachesN/A
host_memory_configuration8 slots / 64GB each / per socket
host_memory_capacity1024GB
host_processor_interconnectN/A

Other Hardware Details

coolingAir
hw_notesCisco UCS C240 M7

Network and Interconnect Details

host_networkingEthernet Controller / 100G
host_networking_topologyN/A
host_network_card_count2

Software Details

frameworkPyTorch
operating_systemUbuntu 22.04.3 LTS
other_software_stack5.15.0-102-generic
sw_notesN/A
+ +

Results Table

| Model | Accuracy Target | Server Performance | Offline Performance |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | 113.739 Tokens/s | 252.597 Tokens/s |
| bert-99 | F1: 89.9653 | 1321.65 Queries/s | 1678.45 Samples/s |
| dlrm-v2-99 | AUC: 79.5069 | 9102.39 Queries/s | 10404.5 Samples/s |
| retinanet | mAP: 37.1745 | 285.454 Queries/s | 388.142 Samples/s |
| resnet | acc: 75.6954 | 22501.7 Queries/s | 25643.7 Samples/s |
| 3d-unet-99.9 | DICE: 0.8608 | | 1.93148 Samples/s |
diff --git a/closed/Cisco/results/C240M7-1-node-2S-EMR-PyTorch/summary/summary.html b/closed/Cisco/results/C240M7-1-node-2S-EMR-PyTorch/summary/summary.html new file mode 100644 index 00000000..3b96f3a7 --- /dev/null +++ b/closed/Cisco/results/C240M7-1-node-2S-EMR-PyTorch/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Cisco

+

C240M7-1-node-2S-EMR-PyTorch

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:CiscoAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerators_per_node0
accelerator_model_nameN/A
accelerator_frequencyN/A
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topologyN/A
accelerator_memory_capacityN/A
accelerator_memory_configurationN/A
accelerator_on-chip_memoriesN/A

Processor and Memory Details

host_processor_model_nameINTEL(R) XEON(R) PLATINUM 8592+
host_processors_per_node2
host_processor_core_count64
host_processor_frequencyN/A
host_processor_cachesN/A
host_memory_configuration8 slots / 64GB each / per socket
host_memory_capacity1024GB
host_processor_interconnectN/A

Other Hardware Details

coolingAir
hw_notesCisco UCS C240 M7

Network and Interconnect Details

host_networkingEthernet Controller / 100G
host_networking_topologyN/A
host_network_card_count2

Software Details

frameworkPyTorch
operating_systemUbuntu 22.04.3 LTS
other_software_stack5.15.0-102-generic
sw_notesN/A
+ +

Results Table

| Model | Accuracy Target | Server Performance | Offline Performance |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | 113.739 Tokens/s | 252.597 Tokens/s |
| bert-99 | F1: 89.9653 | 1321.65 Queries/s | 1678.45 Samples/s |
| dlrm-v2-99 | AUC: 79.5069 | 9102.39 Queries/s | 10404.5 Samples/s |
| retinanet | mAP: 37.1745 | 285.454 Queries/s | 388.142 Samples/s |
| resnet | acc: 75.6954 | 22501.7 Queries/s | 25643.7 Samples/s |
| 3d-unet-99.9 | DICE: 0.8608 | | 1.93148 Samples/s |
+ + + + \ No newline at end of file diff --git a/closed/Cisco/results/C240M7_L40Sx2_TRT/summary/README.md b/closed/Cisco/results/C240M7_L40Sx2_TRT/summary/README.md new file mode 100644 index 00000000..74a66214 --- /dev/null +++ b/closed/Cisco/results/C240M7_L40Sx2_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

CISCO

+

Cisco UCS C240 M7 (2x L40S, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:CISCOAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen4 x16
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacity48 GB
accelerator_memory_configurationGDDR6
accelerator_model_nameNVIDIA L40S
accelerator_on-chip_memories
accelerators_per_node2

Processor and Memory Details

host_memory_capacity1 TB
host_memory_configuration16x 64GB HMCG94MEBRA109N
host_processor_caches
host_processor_core_count32
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel Xeon Gold 6448H
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count2x 100Gbe
host_networkingEthernet
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.3
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54.15, DALI 1.28.0
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Performance | Offline Performance |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | 1734.03 Tokens/s | 1747.67 Tokens/s |
| bert-99 | F1: 89.9653 | 6761.33 Queries/s | 6684.77 Samples/s |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | 1.25757 Queries/s | 1.36759 Samples/s |
| retinanet | mAP: 37.1745 | 1580.59 Queries/s | 1642.06 Samples/s |
| resnet | acc: 75.6954 | 91209.1 Queries/s | 87978.3 Samples/s |
diff --git a/closed/Cisco/results/C240M7_L40Sx2_TRT/summary/summary.html b/closed/Cisco/results/C240M7_L40Sx2_TRT/summary/summary.html new file mode 100644 index 00000000..b9de98a5 --- /dev/null +++ b/closed/Cisco/results/C240M7_L40Sx2_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

CISCO

+

Cisco UCS C240 M7 (2x L40S, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:CISCOAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen4 x16
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacity48 GB
accelerator_memory_configurationGDDR6
accelerator_model_nameNVIDIA L40S
accelerator_on-chip_memories
accelerators_per_node2

Processor and Memory Details

host_memory_capacity1 TB
host_memory_configuration16x 64GB HMCG94MEBRA109N
host_processor_caches
host_processor_core_count32
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel Xeon Gold 6448H
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count2x 100Gbe
host_networkingEthernet
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.3
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54.15, DALI 1.28.0
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Performance | Offline Performance |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | 1734.03 Tokens/s | 1747.67 Tokens/s |
| bert-99 | F1: 89.9653 | 6761.33 Queries/s | 6684.77 Samples/s |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | 1.25757 Queries/s | 1.36759 Samples/s |
| retinanet | mAP: 37.1745 | 1580.59 Queries/s | 1642.06 Samples/s |
| resnet | acc: 75.6954 | 91209.1 Queries/s | 87978.3 Samples/s |
+ + + + \ No newline at end of file diff --git a/closed/Cisco/results/C245M8_H100_PCIe_80GBx2_TRT/summary/README.md b/closed/Cisco/results/C245M8_H100_PCIe_80GBx2_TRT/summary/README.md new file mode 100644 index 00000000..b9a3897a --- /dev/null +++ b/closed/Cisco/results/C245M8_H100_PCIe_80GBx2_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

CISCO

+

Cisco UCS C245 M8 (2x H100-PCIe-80GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:CISCOAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationGDDR6
accelerator_model_nameNVIDIA H100-PCIe-80GB
accelerator_on-chip_memories
accelerators_per_node2

Processor and Memory Details

host_memory_capacity1 TB
host_memory_configuration16x 64GB HMCG94MEBRA109N
host_processor_caches
host_processor_core_count96
host_processor_frequency
host_processor_interconnect
host_processor_model_nameAMD EPYC 9684X 96-Core Processor
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count2x 100Gbe
host_networkingEthernet
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.3
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54.15, DALI 1.28.0
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Performance | Offline Performance |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | 3258.83 Tokens/s | 3268.94 Tokens/s |
| bert-99 | F1: 89.9653 | 9060.77 Queries/s | 11576.0 Samples/s |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | 2.1963 Queries/s | 2.4685 Samples/s |
| retinanet | mAP: 37.1745 | 2101.87 Queries/s | 2349.22 Samples/s |
| resnet | acc: 75.6954 | 102012.0 Queries/s | 113422.0 Samples/s |
diff --git a/closed/Cisco/results/C245M8_H100_PCIe_80GBx2_TRT/summary/summary.html b/closed/Cisco/results/C245M8_H100_PCIe_80GBx2_TRT/summary/summary.html new file mode 100644 index 00000000..b095848f --- /dev/null +++ b/closed/Cisco/results/C245M8_H100_PCIe_80GBx2_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

CISCO

+

Cisco UCS C245 M8 (2x H100-PCIe-80GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:CISCOAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationGDDR6
accelerator_model_nameNVIDIA H100-PCIe-80GB
accelerator_on-chip_memories
accelerators_per_node2

Processor and Memory Details

host_memory_capacity1 TB
host_memory_configuration16x 64GB HMCG94MEBRA109N
host_processor_caches
host_processor_core_count96
host_processor_frequency
host_processor_interconnect
host_processor_model_nameAMD EPYC 9684X 96-Core Processor
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count2x 100Gbe
host_networkingEthernet
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.3
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54.15, DALI 1.28.0
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Performance | Offline Performance |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | 3258.83 Tokens/s | 3268.94 Tokens/s |
| bert-99 | F1: 89.9653 | 9060.77 Queries/s | 11576.0 Samples/s |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | 2.1963 Queries/s | 2.4685 Samples/s |
| retinanet | mAP: 37.1745 | 2101.87 Queries/s | 2349.22 Samples/s |
| resnet | acc: 75.6954 | 102012.0 Queries/s | 113422.0 Samples/s |
+ + + + \ No newline at end of file diff --git a/closed/Cisco/results/C245M8_L40Sx2_TRT/summary/README.md b/closed/Cisco/results/C245M8_L40Sx2_TRT/summary/README.md new file mode 100644 index 00000000..febf2a57 --- /dev/null +++ b/closed/Cisco/results/C245M8_L40Sx2_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

CISCO

+

Cisco UCS C245 M8 (2x L40S, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:CISCOAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen4 x16
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacity48 GB
accelerator_memory_configurationGDDR6
accelerator_model_nameNVIDIA L40S
accelerator_on-chip_memories
accelerators_per_node2

Processor and Memory Details

host_memory_capacity1 TB
host_memory_configuration16x 64GB HMCG94MEBRA109N
host_processor_caches
host_processor_core_count96
host_processor_frequency
host_processor_interconnect
host_processor_model_nameAMD EPYC 9684X 96-Core Processor
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count2x 100Gbe
host_networkingEthernet
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.3
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54.15, DALI 1.28.0
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Performance | Offline Performance |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | 1725.54 Tokens/s | 1729.6 Tokens/s |
| bert-99 | F1: 89.9653 | 6741.2 Queries/s | 6844.6 Samples/s |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | 1.25759 Queries/s | 1.35121 Samples/s |
| retinanet | mAP: 37.1745 | 1600.42 Queries/s | 1655.57 Samples/s |
| resnet | acc: 75.6954 | 90612.1 Queries/s | 86700.2 Samples/s |
diff --git a/closed/Cisco/results/C245M8_L40Sx2_TRT/summary/summary.html b/closed/Cisco/results/C245M8_L40Sx2_TRT/summary/summary.html new file mode 100644 index 00000000..aa82a368 --- /dev/null +++ b/closed/Cisco/results/C245M8_L40Sx2_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

CISCO

+

Cisco UCS C245 M8 (2x L40S, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:CISCOAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen4 x16
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacity48 GB
accelerator_memory_configurationGDDR6
accelerator_model_nameNVIDIA L40S
accelerator_on-chip_memories
accelerators_per_node2

Processor and Memory Details

host_memory_capacity1 TB
host_memory_configuration16x 64GB HMCG94MEBRA109N
host_processor_caches
host_processor_core_count96
host_processor_frequency
host_processor_interconnect
host_processor_model_nameAMD EPYC 9684X 96-Core Processor
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count2x 100Gbe
host_networkingEthernet
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.3
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54.15, DALI 1.28.0
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Performance | Offline Performance |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | 1725.54 Tokens/s | 1729.6 Tokens/s |
| bert-99 | F1: 89.9653 | 6741.2 Queries/s | 6844.6 Samples/s |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | 1.25759 Queries/s | 1.35121 Samples/s |
| retinanet | mAP: 37.1745 | 1600.42 Queries/s | 1655.57 Samples/s |
| resnet | acc: 75.6954 | 90612.1 Queries/s | 86700.2 Samples/s |
+ + + + \ No newline at end of file diff --git a/closed/Cisco/results/X210M7-1-node-2S-EMR-PyTorch/summary/README.md b/closed/Cisco/results/X210M7-1-node-2S-EMR-PyTorch/summary/README.md new file mode 100644 index 00000000..34ffa4f2 --- /dev/null +++ b/closed/Cisco/results/X210M7-1-node-2S-EMR-PyTorch/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Cisco

+

X210M7-1-node-2S-EMR-PyTorch

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:CiscoAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerators_per_node0
accelerator_model_nameN/A
accelerator_frequencyN/A
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topologyN/A
accelerator_memory_capacityN/A
accelerator_memory_configurationN/A
accelerator_on-chip_memoriesN/A

Processor and Memory Details

host_processor_model_nameINTEL(R) XEON(R) PLATINUM 8592+
host_processors_per_node2
host_processor_core_count64
host_processor_frequencyN/A
host_processor_cachesN/A
host_memory_configuration8 slots / 64GB each / per socket
host_memory_capacity1024GB
host_processor_interconnectN/A

Other Hardware Details

coolingAir
hw_notesCisco UCS X210 M7

Network and Interconnect Details

host_networkingEthernet Controller / 100G
host_networking_topologyN/A
host_network_card_count2

Software Details

frameworkPyTorch
operating_systemUbuntu 22.04.3 LTS
other_software_stack5.15.0-102-generic
sw_notesN/A
+ +

Results Table

| Model | Accuracy Target | Server Performance | Offline Performance |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | 113.739 Tokens/s | 206.119 Tokens/s |
| resnet | acc: 75.6954 | 22501.8 Queries/s | 25482.9 Samples/s |
| 3d-unet-99.9 | DICE: 0.8608 | | 1.93062 Samples/s |
diff --git a/closed/Cisco/results/X210M7-1-node-2S-EMR-PyTorch/summary/summary.html b/closed/Cisco/results/X210M7-1-node-2S-EMR-PyTorch/summary/summary.html new file mode 100644 index 00000000..44aa8707 --- /dev/null +++ b/closed/Cisco/results/X210M7-1-node-2S-EMR-PyTorch/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Cisco

+

X210M7-1-node-2S-EMR-PyTorch

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:CiscoAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerators_per_node0
accelerator_model_nameN/A
accelerator_frequencyN/A
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topologyN/A
accelerator_memory_capacityN/A
accelerator_memory_configurationN/A
accelerator_on-chip_memoriesN/A

Processor and Memory Details

host_processor_model_nameINTEL(R) XEON(R) PLATINUM 8592+
host_processors_per_node2
host_processor_core_count64
host_processor_frequencyN/A
host_processor_cachesN/A
host_memory_configuration8 slots / 64GB each / per socket
host_memory_capacity1024GB
host_processor_interconnectN/A

Other Hardware Details

coolingAir
hw_notesCisco UCS X210 M7

Network and Interconnect Details

host_networkingEthernet Controller / 100G
host_networking_topologyN/A
host_network_card_count2

Software Details

frameworkPyTorch
operating_systemUbuntu 22.04.3 LTS
other_software_stack5.15.0-102-generic
sw_notesN/A
+ +

Results Table

| Model | Accuracy Target | Server Performance | Offline Performance |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | 113.739 Tokens/s | 206.119 Tokens/s |
| resnet | acc: 75.6954 | 22501.8 Queries/s | 25482.9 Samples/s |
| 3d-unet-99.9 | DICE: 0.8608 | | 1.93062 Samples/s |
+ + + + \ No newline at end of file diff --git a/closed/Cisco/results/X210c_L40SX2_TRT/summary/README.md b/closed/Cisco/results/X210c_L40SX2_TRT/summary/README.md new file mode 100644 index 00000000..9d2b4d40 --- /dev/null +++ b/closed/Cisco/results/X210c_L40SX2_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

CISCO

+

Cisco UCS X210c M7 (2x L40S, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:CISCOAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen4 x16
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacity48 GB
accelerator_memory_configurationGDDR6
accelerator_model_nameNVIDIA L40S
accelerator_on-chip_memories
accelerators_per_node2

Processor and Memory Details

host_memory_capacity1 TB
host_memory_configuration16x 64GB HMCG94MEBRA109N
host_processor_caches
host_processor_core_count32
host_processor_frequency
host_processor_interconnect
host_processor_model_nameINTEL(R) XEON(R) PLATINUM 8562Y+
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count2x 100Gbe
host_networkingEthernet
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54.15, DALI 1.28.0
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Performance | Offline Performance |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | 1725.98 Tokens/s | 1746.46 Tokens/s |
| gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | 1725.98 Tokens/s | 1746.46 Tokens/s |
| bert-99 | F1: 89.9653 | 6781.43 Queries/s | 6590.32 Samples/s |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | 1.26746 Queries/s | 1.37501 Samples/s |
| retinanet | mAP: 37.1745 | 1600.42 Queries/s | 1648.81 Samples/s |
diff --git a/closed/Cisco/results/X210c_L40SX2_TRT/summary/summary.html b/closed/Cisco/results/X210c_L40SX2_TRT/summary/summary.html new file mode 100644 index 00000000..f4315747 --- /dev/null +++ b/closed/Cisco/results/X210c_L40SX2_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

CISCO

+

Cisco UCS X210c M7 (2x L40S, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:CISCOAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen4 x16
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacity48 GB
accelerator_memory_configurationGDDR6
accelerator_model_nameNVIDIA L40S
accelerator_on-chip_memories
accelerators_per_node2

Processor and Memory Details

host_memory_capacity1 TB
host_memory_configuration16x 64GB HMCG94MEBRA109N
host_processor_caches
host_processor_core_count32
host_processor_frequency
host_processor_interconnect
host_processor_model_nameINTEL(R) XEON(R) PLATINUM 8562Y+
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count2x 100Gbe
host_networkingEthernet
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54.15, DALI 1.28.0
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Performance | Offline Performance |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | 1725.98 Tokens/s | 1746.46 Tokens/s |
| gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | 1725.98 Tokens/s | 1746.46 Tokens/s |
| bert-99 | F1: 89.9653 | 6781.43 Queries/s | 6590.32 Samples/s |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | 1.26746 Queries/s | 1.37501 Samples/s |
| retinanet | mAP: 37.1745 | 1600.42 Queries/s | 1648.81 Samples/s |
+ + + + \ No newline at end of file diff --git a/closed/ConnectTechInc/results/Orin_TRT/summary/README.md b/closed/ConnectTechInc/results/Orin_TRT/summary/README.md new file mode 100644 index 00000000..585f40f5 --- /dev/null +++ b/closed/ConnectTechInc/results/Orin_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

ConnectTechInc

+

NVIDIA Jetson AGX Orin 64G (TensorRT) + CTI Forge Carrier (AGX201)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:ConnectTechIncAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacityShared with host
accelerator_memory_configurationLPDDR5
accelerator_model_nameNVIDIA Jetson AGX Orin 64G
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity64 GB
host_memory_configuration64GB 256-bit LPDDR5
host_processor_caches
host_processor_core_count12
host_processor_frequency
host_processor_interconnect
host_processor_model_name12-core ARM Cortex-A78AE CPU
host_processors_per_node1

Other Hardware Details

coolingActive Heatsink (12V fan)
disk_controllerseMMC 5.1
disk_driveseMMC 5.1
hw_notesCTI Forge Carrier for AGX Orin (AGX201) is used as the carrier board
other_hardware
power_management
power_supply_detailsMean Well 252W Adapter (GST280A12-C6P)
power_supply_quantity_and_rating_watts252W

Network and Interconnect Details

host_network_card_count1 Integrated
host_networkingGig Ethernet
host_networking_topology802.3 Cat6 RJ45 Copper
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkJetpack 6.0, TensorRT 10.1, CUDA 12.2
management_firmware_version
nics_enabled_firmware
operating_systemJetson r36.3.0 L4T
other_software_stackJetpack 6.0, TensorRT 10.1, CUDA 12.2, cuDNN 8.9.4
sw_notesUsing default kernel paging size
+ +

Results Table

| Model | Accuracy Target | Server Performance | Offline Performance |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | | 64.0078 Tokens/s |
diff --git a/closed/ConnectTechInc/results/Orin_TRT/summary/summary.html b/closed/ConnectTechInc/results/Orin_TRT/summary/summary.html new file mode 100644 index 00000000..f17e837f --- /dev/null +++ b/closed/ConnectTechInc/results/Orin_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

ConnectTechInc

+

NVIDIA Jetson AGX Orin 64G (TensorRT) + CTI Forge Carrier (AGX201)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:ConnectTechIncAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacityShared with host
accelerator_memory_configurationLPDDR5
accelerator_model_nameNVIDIA Jetson AGX Orin 64G
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity64 GB
host_memory_configuration64GB 256-bit LPDDR5
host_processor_caches
host_processor_core_count12
host_processor_frequency
host_processor_interconnect
host_processor_model_name12-core ARM Cortex-A78AE CPU
host_processors_per_node1

Other Hardware Details

coolingActive Heatsink (12V fan)
disk_controllerseMMC 5.1
disk_driveseMMC 5.1
hw_notesCTI Forge Carrier for AGX Orin (AGX201) is used as the carrier board
other_hardware
power_management
power_supply_detailsMean Well 252W Adapter (GST280A12-C6P)
power_supply_quantity_and_rating_watts252W

Network and Interconnect Details

host_network_card_count1 Integrated
host_networkingGig Ethernet
host_networking_topology802.3 Cat6 RJ45 Copper
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkJetpack 6.0, TensorRT 10.1, CUDA 12.2
management_firmware_version
nics_enabled_firmware
operating_systemJetson r36.3.0 L4T
other_software_stackJetpack 6.0, TensorRT 10.1, CUDA 12.2, cuDNN 8.9.4
sw_notesUsing default kernel paging size
+ +

Results Table

| Model | Accuracy Target | Server Performance | Offline Performance |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | | 64.0078 Tokens/s |
+ + + + \ No newline at end of file diff --git a/closed/Dell/results/1-node-2S-EMR-PyTorch/summary/README.md b/closed/Dell/results/1-node-2S-EMR-PyTorch/summary/README.md new file mode 100644 index 00000000..fb180aca --- /dev/null +++ b/closed/Dell/results/1-node-2S-EMR-PyTorch/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Dell

+

Dell PowerEdge R760

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:DellAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerators_per_node0
accelerator_model_nameN/A
accelerator_frequencyN/A
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacityN/A
accelerator_memory_configurationN/A
accelerator_on-chip_memoriesN/A

Processor and Memory Details

host_processor_model_nameINTEL(R) XEON(R) PLATINUM 8592+
host_processors_per_node2
host_processor_core_count64
host_processor_frequency
host_processor_caches
host_memory_configuration8 slots / 96GB each / per socket
host_memory_capacity1536GB
host_processor_interconnect

Other Hardware Details

coolingAir
hw_notes

Network and Interconnect Details

host_networkingEthernet
host_networking_topologyMesh
host_network_card_count1

Software Details

frameworkPyTorch
operating_systemUbuntu 24.04
other_software_stack6.8.0-36-generic
sw_notesINT4 for GPT-J, and INT8 for all other models
+ +

Results Table

| Model | Accuracy Target | Server Performance | Offline Performance |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | 113.715 Tokens/s | 248.601 Tokens/s |
| bert-99 | F1: 89.9653 | 1321.72 Queries/s | 1685.75 Samples/s |
| dlrm-v2-99.9 | AUC: 80.2297 | 9101.57 Queries/s | 9830.18 Samples/s |
| retinanet | mAP: 37.1745 | 285.452 Queries/s | 373.596 Samples/s |
| resnet | acc: 75.6954 | 22501.7 Queries/s | 25105.8 Samples/s |
| 3d-unet-99.9 | DICE: 0.8608 | | 1.87014 Samples/s |
diff --git a/closed/Dell/results/1-node-2S-EMR-PyTorch/summary/summary.html b/closed/Dell/results/1-node-2S-EMR-PyTorch/summary/summary.html new file mode 100644 index 00000000..95e2ac14 --- /dev/null +++ b/closed/Dell/results/1-node-2S-EMR-PyTorch/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Dell

+

Dell PowerEdge R760

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:DellAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerators_per_node0
accelerator_model_nameN/A
accelerator_frequencyN/A
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacityN/A
accelerator_memory_configurationN/A
accelerator_on-chip_memoriesN/A

Processor and Memory Details

host_processor_model_nameINTEL(R) XEON(R) PLATINUM 8592+
host_processors_per_node2
host_processor_core_count64
host_processor_frequency
host_processor_caches
host_memory_configuration8 slots / 96GB each / per socket
host_memory_capacity1536GB
host_processor_interconnect

Other Hardware Details

coolingAir
hw_notes

Network and Interconnect Details

host_networkingEthernet
host_networking_topologyMesh
host_network_card_count1

Software Details

frameworkPyTorch
operating_systemUbuntu 24.04
other_software_stack6.8.0-36-generic
sw_notesINT4 for GPT-J, and INT8 for all other models
+ +

Results Table

| Model | Accuracy Target | Server Performance | Offline Performance |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | 113.715 Tokens/s | 248.601 Tokens/s |
| bert-99 | F1: 89.9653 | 1321.72 Queries/s | 1685.75 Samples/s |
| dlrm-v2-99.9 | AUC: 80.2297 | 9101.57 Queries/s | 9830.18 Samples/s |
| retinanet | mAP: 37.1745 | 285.452 Queries/s | 373.596 Samples/s |
| resnet | acc: 75.6954 | 22501.7 Queries/s | 25105.8 Samples/s |
| 3d-unet-99.9 | DICE: 0.8608 | | 1.87014 Samples/s |
+ + + + \ No newline at end of file diff --git a/closed/Dell/results/R760_H100_PCIe_80GBx2_TRT/summary/README.md b/closed/Dell/results/R760_H100_PCIe_80GBx2_TRT/summary/README.md new file mode 100644 index 00000000..8c5e75b2 --- /dev/null +++ b/closed/Dell/results/R760_H100_PCIe_80GBx2_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Dell

+

Dell PowerEdge R760 (2xH100_PCIe_80GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:DellAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe 4.0 x16
accelerator_interconnectPCIe 4.0 x16
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM2e
accelerator_model_nameNVIDIA H100-PCIe-80GB
accelerator_on-chip_memories
accelerators_per_node2

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB 4400 MT/s
host_processor_caches
host_processor_core_count120
host_processor_frequency
host_processor_interconnect
host_processor_model_nameINTEL(R) XEON(R) PLATINUM 8580
host_processors_per_node2

Other Hardware Details

coolingAir cooling
disk_controllersNVMe
disk_drivesSSD
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count2x 1GbE
host_networkingEthernet
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Performance | Offline Performance |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | 2759.33 Tokens/s | 3317.08 Tokens/s |
| gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | 2759.33 Tokens/s | 3317.08 Tokens/s |
| bert-99 | F1: 89.9653 | 9145.69 Queries/s | 11758.6 Samples/s |
| bert-99.9 | F1: 90.7831 | 8252.92 Queries/s | 10138.8 Samples/s |
| retinanet | mAP: 37.1745 | 2201.86 Queries/s | 2314.75 Samples/s |
| resnet | acc: 75.6954 | 103012.0 Queries/s | 112968.0 Samples/s |
| 3d-unet-99 | DICE: 0.8531 | | 9.31163 Samples/s |
| 3d-unet-99.9 | DICE: 0.8608 | | 9.31163 Samples/s |
diff --git a/closed/Dell/results/R760_H100_PCIe_80GBx2_TRT/summary/summary.html b/closed/Dell/results/R760_H100_PCIe_80GBx2_TRT/summary/summary.html new file mode 100644 index 00000000..bc6167c5 --- /dev/null +++ b/closed/Dell/results/R760_H100_PCIe_80GBx2_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Dell

+

Dell PowerEdge R760 (2xH100_PCIe_80GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:DellAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe 4.0 x16
accelerator_interconnectPCIe 4.0 x16
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM2e
accelerator_model_nameNVIDIA H100-PCIe-80GB
accelerator_on-chip_memories
accelerators_per_node2

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB 4400 MT/s
host_processor_caches
host_processor_core_count120
host_processor_frequency
host_processor_interconnect
host_processor_model_nameINTEL(R) XEON(R) PLATINUM 8580
host_processors_per_node2

Other Hardware Details

coolingAir cooling
disk_controllersNVMe
disk_drivesSSD
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count2x 1GbE
host_networkingEthernet
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Performance | Offline Performance |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | 2759.33 Tokens/s | 3317.08 Tokens/s |
| gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | 2759.33 Tokens/s | 3317.08 Tokens/s |
| bert-99 | F1: 89.9653 | 9145.69 Queries/s | 11758.6 Samples/s |
| bert-99.9 | F1: 90.7831 | 8252.92 Queries/s | 10138.8 Samples/s |
| retinanet | mAP: 37.1745 | 2201.86 Queries/s | 2314.75 Samples/s |
| resnet | acc: 75.6954 | 103012.0 Queries/s | 112968.0 Samples/s |
| 3d-unet-99 | DICE: 0.8531 | | 9.31163 Samples/s |
| 3d-unet-99.9 | DICE: 0.8608 | | 9.31163 Samples/s |
+ + + + \ No newline at end of file diff --git a/closed/Dell/results/R760xa_H100NVL_PCIe_94GBx4_TRT/summary/README.md b/closed/Dell/results/R760xa_H100NVL_PCIe_94GBx4_TRT/summary/README.md new file mode 100644 index 00000000..1ec18e9d --- /dev/null +++ b/closed/Dell/results/R760xa_H100NVL_PCIe_94GBx4_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Dell

+

Dell PowerEdge R760xa (4xH100 NVL, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:DellAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe 4.0 x16
accelerator_interconnectPCIe 4.0 x16
accelerator_interconnect_topology
accelerator_memory_capacity94 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-NVL-94GB
accelerator_on-chip_memories
accelerators_per_node4

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB 4400 MT/s
host_processor_caches
host_processor_core_count120
host_processor_frequency
host_processor_interconnect
host_processor_model_nameINTEL(R) XEON(R) PLATINUM 8580
host_processors_per_node2

Other Hardware Details

coolingAir cooling
disk_controllersNVMe
disk_drivesSSD
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count2x 1GbE
host_networkingEthernet
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Performance | Offline Performance |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | 6337.64 Tokens/s | 7191.41 Tokens/s |
| bert-99 | F1: 89.9653 | 20104.7 Queries/s | 24851.2 Samples/s |
| bert-99.9 | F1: 90.7831 | 17600.1 Queries/s | 20729.6 Samples/s |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | 4.87198 Queries/s | 6.13275 Samples/s |
| dlrm-v2-99 | AUC: 79.5069 | 155014.0 Queries/s | 208212.0 Samples/s |
| dlrm-v2-99.9 | AUC: 80.2297 | 118009.0 Queries/s | 123033.0 Samples/s |
| retinanet | mAP: 37.1745 | 4802.53 Queries/s | 5438.57 Samples/s |
| resnet | acc: 75.6954 | 220027.0 Queries/s | 250296.0 Samples/s |
| 3d-unet-99 | DICE: 0.8531 | | 22.2878 Samples/s |
| 3d-unet-99.9 | DICE: 0.8608 | | 22.2825 Samples/s |
diff --git a/closed/Dell/results/R760xa_H100NVL_PCIe_94GBx4_TRT/summary/summary.html b/closed/Dell/results/R760xa_H100NVL_PCIe_94GBx4_TRT/summary/summary.html new file mode 100644 index 00000000..bfb25ed3 --- /dev/null +++ b/closed/Dell/results/R760xa_H100NVL_PCIe_94GBx4_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Dell

+

Dell PowerEdge R760xa (4xH100 NVL, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:DellAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe 4.0 x16
accelerator_interconnectPCIe 4.0 x16
accelerator_interconnect_topology
accelerator_memory_capacity94 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-NVL-94GB
accelerator_on-chip_memories
accelerators_per_node4

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB 4400 MT/s
host_processor_caches
host_processor_core_count120
host_processor_frequency
host_processor_interconnect
host_processor_model_nameINTEL(R) XEON(R) PLATINUM 8580
host_processors_per_node2

Other Hardware Details

coolingAir cooling
disk_controllersNVMe
disk_drivesSSD
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count2x 1GbE
host_networkingEthernet
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Performance | Offline Performance |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | 6337.64 Tokens/s | 7191.41 Tokens/s |
| bert-99 | F1: 89.9653 | 20104.7 Queries/s | 24851.2 Samples/s |
| bert-99.9 | F1: 90.7831 | 17600.1 Queries/s | 20729.6 Samples/s |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | 4.87198 Queries/s | 6.13275 Samples/s |
| dlrm-v2-99 | AUC: 79.5069 | 155014.0 Queries/s | 208212.0 Samples/s |
| dlrm-v2-99.9 | AUC: 80.2297 | 118009.0 Queries/s | 123033.0 Samples/s |
| retinanet | mAP: 37.1745 | 4802.53 Queries/s | 5438.57 Samples/s |
| resnet | acc: 75.6954 | 220027.0 Queries/s | 250296.0 Samples/s |
| 3d-unet-99 | DICE: 0.8531 | | 22.2878 Samples/s |
| 3d-unet-99.9 | DICE: 0.8608 | | 22.2825 Samples/s |
+ + + + \ No newline at end of file diff --git a/closed/Dell/results/R760xa_H100_PCIe_80GBx4_TRT/summary/README.md b/closed/Dell/results/R760xa_H100_PCIe_80GBx4_TRT/summary/README.md new file mode 100644 index 00000000..efd75021 --- /dev/null +++ b/closed/Dell/results/R760xa_H100_PCIe_80GBx4_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Dell

+

Dell PowerEdge R760xa (4x H100-PCIe-80GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:DellAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnectPCIe Gen5 x16
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM2e
accelerator_model_nameNVIDIA H100-PCIe-80GB
accelerator_on-chip_memories
accelerators_per_node4

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB DDR5
host_processor_caches
host_processor_core_count64
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8480+
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_networkingEthernet
host_network_card_count2x 100GbE
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 20.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Performance | Offline Performance |
|---|---|---|---|
| bert-99 | F1: 89.9653 | 17880.0 Queries/s | 23238.4 Samples/s |
| retinanet | mAP: 37.1745 | 4502.63 Queries/s | 4690.54 Samples/s |
| resnet | acc: 75.6954 | 206529.0 Queries/s | 196365.0 Samples/s |
| 3d-unet-99 | DICE: 0.8531 | | 18.6131 Samples/s |
| 3d-unet-99.9 | DICE: 0.8608 | | 18.6131 Samples/s |
diff --git a/closed/Dell/results/R760xa_H100_PCIe_80GBx4_TRT/summary/summary.html b/closed/Dell/results/R760xa_H100_PCIe_80GBx4_TRT/summary/summary.html new file mode 100644 index 00000000..bd1c384e --- /dev/null +++ b/closed/Dell/results/R760xa_H100_PCIe_80GBx4_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Dell

+

Dell PowerEdge R760xa (4x H100-PCIe-80GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:DellAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnectPCIe Gen5 x16
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM2e
accelerator_model_nameNVIDIA H100-PCIe-80GB
accelerator_on-chip_memories
accelerators_per_node4

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB DDR5
host_processor_caches
host_processor_core_count64
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8480+
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_networkingEthernet
host_network_card_count2x 100GbE
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 20.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Performance | Offline Performance |
|---|---|---|---|
| bert-99 | F1: 89.9653 | 17880.0 Queries/s | 23238.4 Samples/s |
| retinanet | mAP: 37.1745 | 4502.63 Queries/s | 4690.54 Samples/s |
| resnet | acc: 75.6954 | 206529.0 Queries/s | 196365.0 Samples/s |
| 3d-unet-99 | DICE: 0.8531 | | 18.6131 Samples/s |
| 3d-unet-99.9 | DICE: 0.8608 | | 18.6131 Samples/s |
+ + + + \ No newline at end of file diff --git a/closed/Dell/results/R760xa_L40Sx4_TRT/summary/README.md b/closed/Dell/results/R760xa_L40Sx4_TRT/summary/README.md new file mode 100644 index 00000000..916d62ca --- /dev/null +++ b/closed/Dell/results/R760xa_L40Sx4_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Dell

+

Dell PowerEdge R760xa (4x L40S, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:DellAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe 4.0 x16
accelerator_interconnectPCIe 4.0 x16
accelerator_interconnect_topology
accelerator_memory_capacity48 GB
accelerator_memory_configurationGDDR6
accelerator_model_nameNVIDIA L40S
accelerator_on-chip_memories
accelerators_per_node4

Processor and Memory Details

host_memory_capacity512 GB
host_memory_configuration16x 32GB 3200 MT/s
host_processor_caches
host_processor_core_count64
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz
host_processors_per_node2

Other Hardware Details

coolingair-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_networkingEthernet
host_network_card_count2x 1GbE
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemRocky Linux 9.1
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Performance | Offline Performance |
|---|---|---|---|
| bert-99 | F1: 89.9653 | 13853.7 Queries/s | 13903.8 Samples/s |
| retinanet | mAP: 37.1745 | 3152.43 Queries/s | 3345.71 Samples/s |
| resnet | acc: 75.6954 | 181231.0 Queries/s | 180613.0 Samples/s |
diff --git a/closed/Dell/results/R760xa_L40Sx4_TRT/summary/summary.html b/closed/Dell/results/R760xa_L40Sx4_TRT/summary/summary.html new file mode 100644 index 00000000..3135864e --- /dev/null +++ b/closed/Dell/results/R760xa_L40Sx4_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

Dell
Dell PowerEdge R760xa (4x L40S, TensorRT)

MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Dell | Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe 4.0 x16
accelerator_interconnectPCIe 4.0 x16
accelerator_interconnect_topology
accelerator_memory_capacity48 GB
accelerator_memory_configurationGDDR6
accelerator_model_nameNVIDIA L40S
accelerator_on-chip_memories
accelerators_per_node4

Processor and Memory Details

host_memory_capacity512 GB
host_memory_configuration16x 32GB 3200 MT/s
host_processor_caches
host_processor_core_count64
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz
host_processors_per_node2

Other Hardware Details

coolingair-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_networkingEthernet
host_network_card_count2x 1GbE
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemRocky Linux 9.1
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes

Results Table

Model | Accuracy Target | Server | Offline
bert-99 | F1: 89.9653 | Queries/s 13853.7 | Samples/s 13903.8
retinanet | mAP: 37.1745 | Queries/s 3152.43 | Samples/s 3345.71
resnet | acc: 75.6954 | Queries/s 181231.0 | Samples/s 180613.0
+ + + + \ No newline at end of file diff --git a/closed/Dell/results/XE8640_H100_SXM_80GBx4_TRT/summary/README.md b/closed/Dell/results/XE8640_H100_SXM_80GBx4_TRT/summary/README.md new file mode 100644 index 00000000..39501cca --- /dev/null +++ b/closed/Dell/results/XE8640_H100_SXM_80GBx4_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

Dell
Dell PowerEdge XE8640 (4x H100-SXM-80GB, TensorRT)

MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Dell | Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect6x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node4

Processor and Memory Details

host_memory_capacity1 TB
host_memory_configuration16x 64GB DDR5
host_processor_caches
host_processor_core_count96
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8468
host_processors_per_node2

Other Hardware Details

coolingLiquid Assisted Air-Cooled (LAAC)
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count4x 400Gb Infiniband
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 252.06GB/s; PCIe-NIC: 200GB/s
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 9.3.0, CUDA 12.2
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 20.04.6
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes

Results Table

Model | Accuracy Target | Server | Offline
llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 9522.2 | Tokens/s 10699.6
llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 9522.2 | Tokens/s 10699.6
gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 9990.24 | Tokens/s 9966.79
gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s 9990.24 | Tokens/s 9966.79
bert-99 | F1: 89.9653 | Queries/s 28667.8 | Samples/s 35524.8
bert-99.9 | F1: 90.7831 | Queries/s 25392.6 | Samples/s 31392.5
stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s 7.85802 | Samples/s 8.22546
retinanet | mAP: 37.1745 | Queries/s 6791.24 | Samples/s 7195.55
resnet | acc: 75.6954 | Queries/s 310333.0 | Samples/s 356320.0
3d-unet-99 | DICE: 0.8531 | N/A | Samples/s 25.6625
3d-unet-99.9 | DICE: 0.8608 | N/A | Samples/s 25.6625
diff --git a/closed/Dell/results/XE8640_H100_SXM_80GBx4_TRT/summary/summary.html b/closed/Dell/results/XE8640_H100_SXM_80GBx4_TRT/summary/summary.html new file mode 100644 index 00000000..9a405d4e --- /dev/null +++ b/closed/Dell/results/XE8640_H100_SXM_80GBx4_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

Dell
Dell PowerEdge XE8640 (4x H100-SXM-80GB, TensorRT)

MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Dell | Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect6x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node4

Processor and Memory Details

host_memory_capacity1 TB
host_memory_configuration16x 64GB DDR5
host_processor_caches
host_processor_core_count96
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8468
host_processors_per_node2

Other Hardware Details

coolingLiquid Assisted Air-Cooled (LAAC)
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count4x 400Gb Infiniband
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 252.06GB/s; PCIe-NIC: 200GB/s
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 9.3.0, CUDA 12.2
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 20.04.6
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes

Results Table

Model | Accuracy Target | Server | Offline
llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 9522.2 | Tokens/s 10699.6
llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 9522.2 | Tokens/s 10699.6
gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 9990.24 | Tokens/s 9966.79
gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s 9990.24 | Tokens/s 9966.79
bert-99 | F1: 89.9653 | Queries/s 28667.8 | Samples/s 35524.8
bert-99.9 | F1: 90.7831 | Queries/s 25392.6 | Samples/s 31392.5
stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s 7.85802 | Samples/s 8.22546
retinanet | mAP: 37.1745 | Queries/s 6791.24 | Samples/s 7195.55
resnet | acc: 75.6954 | Queries/s 310333.0 | Samples/s 356320.0
3d-unet-99 | DICE: 0.8531 | N/A | Samples/s 25.6625
3d-unet-99.9 | DICE: 0.8608 | N/A | Samples/s 25.6625
+ + + + \ No newline at end of file diff --git a/closed/Dell/results/XE9640_H100_SXM_80GBx4_TRT/summary/README.md b/closed/Dell/results/XE9640_H100_SXM_80GBx4_TRT/summary/README.md new file mode 100644 index 00000000..34a22948 --- /dev/null +++ b/closed/Dell/results/XE9640_H100_SXM_80GBx4_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

Dell
Dell PowerEdge XE9640 (4x H100-SXM-80GB, TensorRT)

MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Dell | Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect6x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node4

Processor and Memory Details

host_memory_capacity1 TB
host_memory_configuration16x 64GB DDR5
host_processor_caches
host_processor_core_count96
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8468
host_processors_per_node2

Other Hardware Details

coolingLiquid Cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count4x 400Gb Infiniband
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 252.06GB/s; PCIe-NIC: 200GB/s
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 9.3.0, CUDA 12.2
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 20.04.6
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes

Results Table

Model | Accuracy Target | Server | Offline
llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 8937.82 | Tokens/s 10594.1
llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 8937.82 | Tokens/s 10594.1
bert-99 | F1: 89.9653 | Queries/s 28348.3 | Samples/s 36051.2
bert-99.9 | F1: 90.7831 | Queries/s 24854.8 | Samples/s 31412.4
stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s 6.36178 | Samples/s 8.31658
retinanet | mAP: 37.1745 | Queries/s 6738.09 | Samples/s 7149.42
diff --git a/closed/Dell/results/XE9640_H100_SXM_80GBx4_TRT/summary/summary.html b/closed/Dell/results/XE9640_H100_SXM_80GBx4_TRT/summary/summary.html new file mode 100644 index 00000000..5f2badbe --- /dev/null +++ b/closed/Dell/results/XE9640_H100_SXM_80GBx4_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

Dell
Dell PowerEdge XE9640 (4x H100-SXM-80GB, TensorRT)

MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Dell | Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect6x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node4

Processor and Memory Details

host_memory_capacity1 TB
host_memory_configuration16x 64GB DDR5
host_processor_caches
host_processor_core_count96
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8468
host_processors_per_node2

Other Hardware Details

coolingLiquid Cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count4x 400Gb Infiniband
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 252.06GB/s; PCIe-NIC: 200GB/s
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 9.3.0, CUDA 12.2
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 20.04.6
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes

Results Table

Model | Accuracy Target | Server | Offline
llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 8937.82 | Tokens/s 10594.1
llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 8937.82 | Tokens/s 10594.1
bert-99 | F1: 89.9653 | Queries/s 28348.3 | Samples/s 36051.2
bert-99.9 | F1: 90.7831 | Queries/s 24854.8 | Samples/s 31412.4
stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s 6.36178 | Samples/s 8.31658
retinanet | mAP: 37.1745 | Queries/s 6738.09 | Samples/s 7149.42
+ + + + \ No newline at end of file diff --git a/closed/Dell/results/XE9680_H100_SXM_80GBx8_TRT/summary/README.md b/closed/Dell/results/XE9680_H100_SXM_80GBx8_TRT/summary/README.md new file mode 100644 index 00000000..1af63c1c --- /dev/null +++ b/closed/Dell/results/XE9680_H100_SXM_80GBx8_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

Dell
Dell PowerEdge XE9680 (8x H100-SXM-80GB, TensorRT)

MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Dell | Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnectNVLINK
accelerator_interconnect_topologyNVLINK Switch
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB DDR5
host_processor_caches
host_processor_core_count48
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8468
host_processors_per_node2

Other Hardware Details

coolingair-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_networkingInfiniband:Data bandwidth for GPU-PCIe: 504GB/s; PCIe-NIC: 500GB/s
host_networking_topologyEthernet/Infiniband on switching network
host_network_card_count10x 400Gb Infiniband
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.90.07
sw_notes

Results Table

Model | Accuracy Target | Server | Offline
llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 21589.2 | Tokens/s 24086.7
llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 21589.2 | Tokens/s 24086.7
bert-99 | F1: 89.9653 | Queries/s 56011.6 | Samples/s 70594.0
bert-99.9 | F1: 90.7831 | Queries/s 49611.8 | Samples/s 61736.4
resnet | acc: 75.6954 | Queries/s 584207.0 | Samples/s 709849.0
3d-unet-99 | DICE: 0.8531 | N/A | Samples/s 51.8056
3d-unet-99.9 | DICE: 0.8608 | N/A | Samples/s 51.8056
diff --git a/closed/Dell/results/XE9680_H100_SXM_80GBx8_TRT/summary/summary.html b/closed/Dell/results/XE9680_H100_SXM_80GBx8_TRT/summary/summary.html new file mode 100644 index 00000000..5536bfc0 --- /dev/null +++ b/closed/Dell/results/XE9680_H100_SXM_80GBx8_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

Dell
Dell PowerEdge XE9680 (8x H100-SXM-80GB, TensorRT)

MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Dell | Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnectNVLINK
accelerator_interconnect_topologyNVLINK Switch
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB DDR5
host_processor_caches
host_processor_core_count48
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8468
host_processors_per_node2

Other Hardware Details

coolingair-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_networkingInfiniband:Data bandwidth for GPU-PCIe: 504GB/s; PCIe-NIC: 500GB/s
host_networking_topologyEthernet/Infiniband on switching network
host_network_card_count10x 400Gb Infiniband
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.90.07
sw_notes

Results Table

Model | Accuracy Target | Server | Offline
llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 21589.2 | Tokens/s 24086.7
llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 21589.2 | Tokens/s 24086.7
bert-99 | F1: 89.9653 | Queries/s 56011.6 | Samples/s 70594.0
bert-99.9 | F1: 90.7831 | Queries/s 49611.8 | Samples/s 61736.4
resnet | acc: 75.6954 | Queries/s 584207.0 | Samples/s 709849.0
3d-unet-99 | DICE: 0.8531 | N/A | Samples/s 51.8056
3d-unet-99.9 | DICE: 0.8608 | N/A | Samples/s 51.8056
+ + + + \ No newline at end of file diff --git a/closed/Dell/results/XE9680_H200_SXM_141GBx8_TRT/summary/README.md b/closed/Dell/results/XE9680_H200_SXM_141GBx8_TRT/summary/README.md new file mode 100644 index 00000000..fe39d9d6 --- /dev/null +++ b/closed/Dell/results/XE9680_H200_SXM_141GBx8_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

Dell
Dell PowerEdge XE9680 (8x H200-SXM-141GB, TensorRT)

MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Dell | Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnectNVLINK
accelerator_interconnect_topologyNVLINK Switch
accelerator_memory_capacity141 GB
accelerator_memory_configurationHBM3e
accelerator_model_nameNVIDIA H200-SXM-141GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB DDR5
host_processor_caches
host_processor_core_count52
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8470
host_processors_per_node2

Other Hardware Details

coolingair-cooled
disk_controllersNVMe
disk_drives
hw_notesH200 TGP 700W
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_networkingInfiniband:Data bandwidth for GPU-PCIe: 504GB/s; PCIe-NIC: 500GB/s
host_networking_topologyEthernet/Infiniband on switching network
host_network_card_count10x 400Gb Infiniband
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.3
other_software_stackTensorRT 10.2.0, CUDA 12.5, cuDNN 8.9.7, Driver 555.42.06
sw_notes

Results Table

Model | Accuracy Target | Server | Offline
llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 29739.9 | Tokens/s 32124.3
llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 29739.9 | Tokens/s 32124.3
gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 20139.0 | Tokens/s 20238.4
gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s 20139.0 | Tokens/s 20238.4
bert-99 | F1: 89.9653 | Queries/s 58091.3 | Samples/s 73791.0
bert-99.9 | F1: 90.7831 | Queries/s 51213.5 | Samples/s 65322.6
stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s 16.6945 | Samples/s 17.6742
retinanet | mAP: 37.1745 | Queries/s 13603.8 | Samples/s 14760.1
resnet | acc: 75.6954 | Queries/s 630226.0 | Samples/s 768235.0
3d-unet-99 | DICE: 0.8531 | N/A | Samples/s 54.6196
3d-unet-99.9 | DICE: 0.8608 | N/A | Samples/s 54.6196
diff --git a/closed/Dell/results/XE9680_H200_SXM_141GBx8_TRT/summary/summary.html b/closed/Dell/results/XE9680_H200_SXM_141GBx8_TRT/summary/summary.html new file mode 100644 index 00000000..fd43e51b --- /dev/null +++ b/closed/Dell/results/XE9680_H200_SXM_141GBx8_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

Dell
Dell PowerEdge XE9680 (8x H200-SXM-141GB, TensorRT)

MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Dell | Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnectNVLINK
accelerator_interconnect_topologyNVLINK Switch
accelerator_memory_capacity141 GB
accelerator_memory_configurationHBM3e
accelerator_model_nameNVIDIA H200-SXM-141GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB DDR5
host_processor_caches
host_processor_core_count52
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8470
host_processors_per_node2

Other Hardware Details

coolingair-cooled
disk_controllersNVMe
disk_drives
hw_notesH200 TGP 700W
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_networkingInfiniband:Data bandwidth for GPU-PCIe: 504GB/s; PCIe-NIC: 500GB/s
host_networking_topologyEthernet/Infiniband on switching network
host_network_card_count10x 400Gb Infiniband
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.3
other_software_stackTensorRT 10.2.0, CUDA 12.5, cuDNN 8.9.7, Driver 555.42.06
sw_notes

Results Table

Model | Accuracy Target | Server | Offline
llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 29739.9 | Tokens/s 32124.3
llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 29739.9 | Tokens/s 32124.3
gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 20139.0 | Tokens/s 20238.4
gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s 20139.0 | Tokens/s 20238.4
bert-99 | F1: 89.9653 | Queries/s 58091.3 | Samples/s 73791.0
bert-99.9 | F1: 90.7831 | Queries/s 51213.5 | Samples/s 65322.6
stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s 16.6945 | Samples/s 17.6742
retinanet | mAP: 37.1745 | Queries/s 13603.8 | Samples/s 14760.1
resnet | acc: 75.6954 | Queries/s 630226.0 | Samples/s 768235.0
3d-unet-99 | DICE: 0.8531 | N/A | Samples/s 54.6196
3d-unet-99.9 | DICE: 0.8608 | N/A | Samples/s 54.6196
+ + + + \ No newline at end of file diff --git a/closed/Dell/results/XE9680_MI300X_192GBx8/summary/README.md b/closed/Dell/results/XE9680_MI300X_192GBx8/summary/README.md new file mode 100644 index 00000000..8fe64bdd --- /dev/null +++ b/closed/Dell/results/XE9680_MI300X_192GBx8/summary/README.md @@ -0,0 +1,58 @@ + + +

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

Dell
Dell PowerEdge XE9680 (8x MI300X_192GB, vLLM)

MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Dell | Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnectXGMI
accelerator_interconnect_topologyMesh
accelerator_memory_capacity192 GB
accelerator_memory_configurationHBM3
accelerator_model_nameAMD MI300X-NPS1-SPX-192GB-750W
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB DDR5
host_processor_caches
host_processor_core_count40
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8460Y+
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_networkingInfiniband:Data bandwidth for GPU-PCIe: 504GB/s; PCIe-NIC: 500GB/s
host_networking_topologyEthernet/Infiniband on switching network
host_network_card_count10x 400Gb Infiniband
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkvLLM 0.4.3+rocm614, PyTorch 2.3.0, ROCm 6.1.0
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackDriver version TBD
sw_notes

Results Table

Model | Accuracy Target | Server | Offline
llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 19886.1 | Tokens/s 22677.6
llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 19886.1 | Tokens/s 22677.6
diff --git a/closed/Dell/results/XE9680_MI300X_192GBx8/summary/summary.html b/closed/Dell/results/XE9680_MI300X_192GBx8/summary/summary.html new file mode 100644 index 00000000..2b81765d --- /dev/null +++ b/closed/Dell/results/XE9680_MI300X_192GBx8/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

Dell
Dell PowerEdge XE9680 (8x MI300X_192GB, vLLM)

MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Dell | Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnectXGMI
accelerator_interconnect_topologyMesh
accelerator_memory_capacity192 GB
accelerator_memory_configurationHBM3
accelerator_model_nameAMD MI300X-NPS1-SPX-192GB-750W
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB DDR5
host_processor_caches
host_processor_core_count40
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8460Y+
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_networkingInfiniband:Data bandwidth for GPU-PCIe: 504GB/s; PCIe-NIC: 500GB/s
host_networking_topologyEthernet/Infiniband on switching network
host_network_card_count10x 400Gb Infiniband
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkvLLM 0.4.3+rocm614, PyTorch 2.3.0, ROCm 6.1.0
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackDriver version TBD
sw_notes

Results Table

Model | Accuracy Target | Server | Offline
llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 19886.1 | Tokens/s 22677.6
llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 19886.1 | Tokens/s 22677.6
+ + + + \ No newline at end of file diff --git a/closed/Dell/results/XR8620_L4x1_TRT/summary/README.md b/closed/Dell/results/XR8620_L4x1_TRT/summary/README.md new file mode 100644 index 00000000..036208c2 --- /dev/null +++ b/closed/Dell/results/XR8620_L4x1_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

Dell
Dell PowerEdge XR8620t (1x L4, TensorRT)

MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Dell | Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnectPCIe Gen5 x16
accelerator_interconnect_topology
accelerator_memory_capacity24 GB
accelerator_memory_configurationGDDR6
accelerator_model_nameNVIDIA L4
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity251GB
host_memory_configuration8x 32GB DDR5
host_processor_caches
host_processor_core_count32
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Gold 6433N
host_processors_per_node1

Other Hardware Details

coolingair-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_networkingEthernet
host_networking_topologyN/A
host_network_card_count4x 25GbE
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.3
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes

Results Table

Model | Accuracy Target | Server | Offline
retinanet | mAP: 37.1745 | N/A | Samples/s 222.619
diff --git a/closed/Dell/results/XR8620_L4x1_TRT/summary/summary.html b/closed/Dell/results/XR8620_L4x1_TRT/summary/summary.html new file mode 100644 index 00000000..e2657536 --- /dev/null +++ b/closed/Dell/results/XR8620_L4x1_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

Dell
Dell PowerEdge XR8620t (1x L4, TensorRT)

MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Dell | Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnectPCIe Gen5 x16
accelerator_interconnect_topology
accelerator_memory_capacity24 GB
accelerator_memory_configurationGDDR6
accelerator_model_nameNVIDIA L4
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity251GB
host_memory_configuration8x 32GB DDR5
host_processor_caches
host_processor_core_count32
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Gold 6433N
host_processors_per_node1

Other Hardware Details

coolingair-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_networkingEthernet
host_networking_topologyN/A
host_network_card_count4x 25GbE
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.3
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes

Results Table

Model | Accuracy Target | Server | Offline
retinanet | mAP: 37.1745 | N/A | Samples/s 222.619
+ + + + \ No newline at end of file diff --git a/closed/Fujitsu/results/CDI_L40Sx16_TRT/summary/README.md b/closed/Fujitsu/results/CDI_L40Sx16_TRT/summary/README.md new file mode 100644 index 00000000..4a9bc729 --- /dev/null +++ b/closed/Fujitsu/results/CDI_L40Sx16_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

Fujitsu
PRIMERGY CDI (16x L40S, TensorRT)

MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Fujitsu | Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen4 x16
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacity48 GB
accelerator_memory_configurationGDDR6
accelerator_model_nameNVIDIA L40S
accelerator_on-chip_memories
accelerators_per_node16

Processor and Memory Details

host_memory_capacity1024GB
host_memory_configuration16x 64GB M321R8GA0BB0-CQKZJ
host_processor_cachesL1d:3MiB, L1i:2MiB, L2:128MiB, L3:120MiB
host_processor_core_count32
host_processor_frequency3.4GHz
host_processor_interconnectUPI
host_processor_model_nameIntel(R) Xeon(R) Gold 6454S
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count5x1Gbe, 1x200Gb
host_networkingIntel I210, I350x4 (Gib Eth), Mellanox MT28908 ConnectX-6 (IB 200Gib)
host_networking_topologyEthernet on switching network; Infiniband on peer to peer network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 8.6.3, CUDA 12.4, cuDNN 9.1.0.70, Driver 550.90.07
sw_notes

Results Table

Model | Accuracy Target | Server | Offline
retinanet | mAP: 37.1745 | Queries/s 11948.7 | Samples/s 12048.3
3d-unet-99 | DICE: 0.8531 | N/A | Samples/s 61.09
diff --git a/closed/Fujitsu/results/CDI_L40Sx16_TRT/summary/summary.html b/closed/Fujitsu/results/CDI_L40Sx16_TRT/summary/summary.html new file mode 100644 index 00000000..276fe5e9 --- /dev/null +++ b/closed/Fujitsu/results/CDI_L40Sx16_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

Fujitsu
PRIMERGY CDI (16x L40S, TensorRT)

MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Fujitsu | Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen4 x16
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacity48 GB
accelerator_memory_configurationGDDR6
accelerator_model_nameNVIDIA L40S
accelerator_on-chip_memories
accelerators_per_node16

Processor and Memory Details

host_memory_capacity1024GB
host_memory_configuration16x 64GB M321R8GA0BB0-CQKZJ
host_processor_cachesL1d:3MiB, L1i:2MiB, L2:128MiB, L3:120MiB
host_processor_core_count32
host_processor_frequency3.4GHz
host_processor_interconnectUPI
host_processor_model_nameIntel(R) Xeon(R) Gold 6454S
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count5x1Gbe, 1x200Gb
host_networkingIntel I210, I350x4 (Gib Eth), Mellanox MT28908 ConnectX-6 (IB 200Gib)
host_networking_topologyEthernet on switching network; Infiniband on peer to peer network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 8.6.3, CUDA 12.4, cuDNN 9.1.0.70, Driver 550.90.07
sw_notes

Results Table

Model | Accuracy Target | Server | Offline
retinanet | mAP: 37.1745 | Queries/s 11948.7 | Samples/s 12048.3
3d-unet-99 | DICE: 0.8531 | N/A | Samples/s 61.09
+ + + + \ No newline at end of file diff --git a/closed/Fujitsu/results/CDI_L40Sx8_TRT/summary/README.md b/closed/Fujitsu/results/CDI_L40Sx8_TRT/summary/README.md new file mode 100644 index 00000000..6b8aa9fd --- /dev/null +++ b/closed/Fujitsu/results/CDI_L40Sx8_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

Fujitsu
PRIMERGY CDI (8x L40S, TensorRT)

MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Fujitsu | Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen4 x16
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacity48 GB
accelerator_memory_configurationGDDR6
accelerator_model_nameNVIDIA L40S
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity1024GB
host_memory_configuration16x 64GB M321R8GA0BB0-CQKZJ
host_processor_cachesL1d:3MiB, L1i:2MiB, L2:128MiB, L3:120MiB
host_processor_core_count32
host_processor_frequency3.4GHz
host_processor_interconnectUPI
host_processor_model_nameIntel(R) Xeon(R) Gold 6454S
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count5x1Gbe, 1x200Gb
host_networkingIntel I210, I350x4 (Gib Eth), Mellanox MT28908 ConnectX-6 (IB 200Gib)
host_networking_topologyEthernet on switching network; Infiniband on peer to peer network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 8.6.3, CUDA 12.4, cuDNN 9.1.0.70, Driver 550.90.07
sw_notes

Results Table

Model | Accuracy Target | Server | Offline
llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 3218.55 | Tokens/s 3717.74
llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 3218.55 | Tokens/s 3717.74
gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 6903.45 | Tokens/s 6911.78
gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s 6903.45 | Tokens/s 6911.78
diff --git a/closed/Fujitsu/results/CDI_L40Sx8_TRT/summary/summary.html b/closed/Fujitsu/results/CDI_L40Sx8_TRT/summary/summary.html new file mode 100644 index 00000000..d60f69a7 --- /dev/null +++ b/closed/Fujitsu/results/CDI_L40Sx8_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

Fujitsu
PRIMERGY CDI (8x L40S, TensorRT)

MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Fujitsu | Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen4 x16
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacity48 GB
accelerator_memory_configurationGDDR6
accelerator_model_nameNVIDIA L40S
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity1024GB
host_memory_configuration16x 64GB M321R8GA0BB0-CQKZJ
host_processor_cachesL1d:3MiB, L1i:2MiB, L2:128MiB, L3:120MiB
host_processor_core_count32
host_processor_frequency3.4GHz
host_processor_interconnectUPI
host_processor_model_nameIntel(R) Xeon(R) Gold 6454S
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count5x1Gbe, 1x200Gb
host_networkingIntel I210, I350x4 (Gib Eth), Mellanox MT28908 ConnectX-6 (IB 200Gib)
host_networking_topologyEthernet on switching network; Infiniband on peer to peer network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 8.6.3, CUDA 12.4, cuDNN 9.1.0.70, Driver 550.90.07
sw_notes
+ +

Results Table

+ + + + + + + + + + + + + +
ModelAccuracy TargetServerOffline
MetricPerformanceMetricPerformance
llama2-70b-99ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005Tokens/s 3218.55Tokens/s 3717.74
llama2-70b-99.9ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005Tokens/s 3218.55Tokens/s 3717.74
gptj-99ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2Tokens/s 6903.45Tokens/s 6911.78
gptj-99.9ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2Tokens/s 6903.45Tokens/s 6911.78
+ + + + \ No newline at end of file diff --git a/closed/Fujitsu/results/GX2560M7_H100_SXM_80GBx4_TRT/summary/README.md b/closed/Fujitsu/results/GX2560M7_H100_SXM_80GBx4_TRT/summary/README.md new file mode 100644 index 00000000..599d762f --- /dev/null +++ b/closed/Fujitsu/results/GX2560M7_H100_SXM_80GBx4_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

Fujitsu
GX2560M7_H100_SXM_80GBx4 (4x H100-SXM-80GB, TensorRT)

MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Fujitsu | Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect6x 4th Gen NVLINK, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node4

Processor and Memory Details

host_memory_capacity1 TB
host_memory_configuration32x 32GB DDR5
host_processor_caches
host_processor_core_count48
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8468
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count2x10Gb(ethernet), 2x200Gb (infiniband)
host_networkingEthernet, Infiniband
host_networking_topologyEthernet on switching network; Infiniband on peer to peer network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 20.04.4
other_software_stackTensorRT 9.3.0, CUDA 12.2, cuDNN 8.9.6, Driver 535.129.03, DALI 1.28.0
sw_notes

Results Table

Model | Accuracy Target | Server | Offline
llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 9493.99 | Tokens/s 10133.3
llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 9493.99 | Tokens/s 10133.3
gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 9602.79 | Tokens/s 9960.74
gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s 9602.79 | Tokens/s 9960.74
bert-99 | F1: 89.9653 | Queries/s 28605.3 | Samples/s 36110.8
bert-99.9 | F1: 90.7831 | Queries/s 25504.8 | Samples/s 31575.2
stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s 7.84386 | Samples/s 8.06901
dlrm-v2-99 | AUC: 79.5069 | Queries/s 293303.0 | Samples/s 303974.0
dlrm-v2-99.9 | AUC: 80.2297 | Queries/s 179024.0 | Samples/s 190162.0
retinanet | mAP: 37.1745 | Queries/s 6801.47 | Samples/s 7041.1
resnet | acc: 75.6954 | Queries/s 301304.0 | Samples/s 351603.0
3d-unet-99 | DICE: 0.8531 | N/A | Samples/s 25.6741
3d-unet-99.9 | DICE: 0.8608 | N/A | Samples/s 25.6741
diff --git a/closed/Fujitsu/results/GX2560M7_H100_SXM_80GBx4_TRT/summary/summary.html b/closed/Fujitsu/results/GX2560M7_H100_SXM_80GBx4_TRT/summary/summary.html new file mode 100644 index 00000000..6b51a75e --- /dev/null +++ b/closed/Fujitsu/results/GX2560M7_H100_SXM_80GBx4_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

Fujitsu
GX2560M7_H100_SXM_80GBx4 (4x H100-SXM-80GB, TensorRT)

MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Fujitsu | Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect6x 4th Gen NVLINK, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node4

Processor and Memory Details

host_memory_capacity1 TB
host_memory_configuration32x 32GB DDR5
host_processor_caches
host_processor_core_count48
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8468
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count2x10Gb(ethernet), 2x200Gb (infiniband)
host_networkingEthernet, Infiniband
host_networking_topologyEthernet on switching network; Infiniband on peer to peer network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 20.04.4
other_software_stackTensorRT 9.3.0, CUDA 12.2, cuDNN 8.9.6, Driver 535.129.03, DALI 1.28.0
sw_notes

Results Table

Model | Accuracy Target | Server | Offline
llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 9493.99 | Tokens/s 10133.3
llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 9493.99 | Tokens/s 10133.3
gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 9602.79 | Tokens/s 9960.74
gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s 9602.79 | Tokens/s 9960.74
bert-99 | F1: 89.9653 | Queries/s 28605.3 | Samples/s 36110.8
bert-99.9 | F1: 90.7831 | Queries/s 25504.8 | Samples/s 31575.2
stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s 7.84386 | Samples/s 8.06901
dlrm-v2-99 | AUC: 79.5069 | Queries/s 293303.0 | Samples/s 303974.0
dlrm-v2-99.9 | AUC: 80.2297 | Queries/s 179024.0 | Samples/s 190162.0
retinanet | mAP: 37.1745 | Queries/s 6801.47 | Samples/s 7041.1
resnet | acc: 75.6954 | Queries/s 301304.0 | Samples/s 351603.0
3d-unet-99 | DICE: 0.8531 | N/A | Samples/s 25.6741
3d-unet-99.9 | DICE: 0.8608 | N/A | Samples/s 25.6741
+ + + + \ No newline at end of file diff --git a/closed/GigaComputing/results/GIGABYTE_G593_SD1_H200_SXM_141GBx8_TRT/summary/README.md b/closed/GigaComputing/results/GIGABYTE_G593_SD1_H200_SXM_141GBx8_TRT/summary/README.md new file mode 100644 index 00000000..6ba0f90d --- /dev/null +++ b/closed/GigaComputing/results/GIGABYTE_G593_SD1_H200_SXM_141GBx8_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

GigaComputing
GIGABYTE G593-SD1

MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: GigaComputing | Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity141 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H200-SXM-141GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB DDR5
host_processor_caches
host_processor_core_count56
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8480+
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesH200 TGP 700W
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_countN/A
host_networkingN/A
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes

Results Table

Model | Accuracy Target | Server | Offline
llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 29715.3 | Tokens/s 31263.8
llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 29715.3 | Tokens/s 31263.8
gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 19250.4 | Tokens/s 20041.3
gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s 19250.4 | Tokens/s 20041.3
bert-99 | F1: 89.9653 | Queries/s 58090.2 | Samples/s 73765.6
bert-99.9 | F1: 90.7831 | Queries/s 51213.6 | Samples/s 64368.8
stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s 16.5912 | Samples/s 17.3717
dlrm-v2-99 | AUC: 79.5069 | Queries/s 585209.0 | Samples/s 639512.0
dlrm-v2-99.9 | AUC: 80.2297 | Queries/s 370085.0 | Samples/s 394489.0
retinanet | mAP: 37.1745 | Queries/s 14012.2 | Samples/s 14988.0
resnet | acc: 75.6954 | Queries/s 681328.0 | Samples/s 757446.0
3d-unet-99 | DICE: 0.8531 | N/A | Samples/s 54.3608
3d-unet-99.9 | DICE: 0.8608 | N/A | Samples/s 54.3608
diff --git a/closed/GigaComputing/results/GIGABYTE_G593_SD1_H200_SXM_141GBx8_TRT/summary/summary.html b/closed/GigaComputing/results/GIGABYTE_G593_SD1_H200_SXM_141GBx8_TRT/summary/summary.html new file mode 100644 index 00000000..86bfa848 --- /dev/null +++ b/closed/GigaComputing/results/GIGABYTE_G593_SD1_H200_SXM_141GBx8_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

GigaComputing
GIGABYTE G593-SD1

MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: GigaComputing | Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity141 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H200-SXM-141GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB DDR5
host_processor_caches
host_processor_core_count56
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8480+
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesH200 TGP 700W
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_countN/A
host_networkingN/A
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes

Results Table

Model | Accuracy Target | Server | Offline
llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 29715.3 | Tokens/s 31263.8
llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 29715.3 | Tokens/s 31263.8
gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 19250.4 | Tokens/s 20041.3
gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s 19250.4 | Tokens/s 20041.3
bert-99 | F1: 89.9653 | Queries/s 58090.2 | Samples/s 73765.6
bert-99.9 | F1: 90.7831 | Queries/s 51213.6 | Samples/s 64368.8
stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s 16.5912 | Samples/s 17.3717
dlrm-v2-99 | AUC: 79.5069 | Queries/s 585209.0 | Samples/s 639512.0
dlrm-v2-99.9 | AUC: 80.2297 | Queries/s 370085.0 | Samples/s 394489.0
retinanet | mAP: 37.1745 | Queries/s 14012.2 | Samples/s 14988.0
resnet | acc: 75.6954 | Queries/s 681328.0 | Samples/s 757446.0
3d-unet-99 | DICE: 0.8531 | N/A | Samples/s 54.3608
3d-unet-99.9 | DICE: 0.8608 | N/A | Samples/s 54.3608
+ + + + \ No newline at end of file diff --git a/closed/Google/results/DGX-H100_H100-SXM-80GBx8_TRT/summary/README.md b/closed/Google/results/DGX-H100_H100-SXM-80GBx8_TRT/summary/README.md new file mode 100644 index 00000000..ef0d51dd --- /dev/null +++ b/closed/Google/results/DGX-H100_H100-SXM-80GBx8_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +

MLPerf Inference v4.1
Copyright 2019-2024 MLCommons

Google
NVIDIA DGX H100 (8x H100-SXM-80GB, TensorRT)

MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Google | Availability: Available as of Aug 2024

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen3 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB MTC40F2046S1RC48BA1
host_processor_caches
host_processor_core_count56
host_processor_frequency2.70GHz (Base Clock)
host_processor_interconnectIntel Ultra Path Interconnect (UPI)
host_processor_model_nameIntel(R) Xeon(R) Platinum 8481C
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count8+1
host_networkingMaximum network bandwidth speed: 1800 Gbps
host_networking_topologyEthernet on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes

Results Table

Model | Accuracy Target | Server | Offline
llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 21588.3 | Tokens/s 24133.7
llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 21588.3 | Tokens/s 24133.7
stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s 15.8185 | Samples/s 16.2613
dlrm-v2-99.9 | AUC: 80.2297 | Queries/s 340068.0 | Samples/s 375565.0
3d-unet-99 | DICE: 0.8531 | N/A | Samples/s 51.3841
3d-unet-99.9 | DICE: 0.8608 | N/A | Samples/s 51.3841
diff --git a/closed/Google/results/DGX-H100_H100-SXM-80GBx8_TRT/summary/summary.html b/closed/Google/results/DGX-H100_H100-SXM-80GBx8_TRT/summary/summary.html new file mode 100644 index 00000000..cad60640 --- /dev/null +++ b/closed/Google/results/DGX-H100_H100-SXM-80GBx8_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Google

+

NVIDIA DGX H100 (8x H100-SXM-80GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:GoogleAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen3 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB MTC40F2046S1RC48BA1
host_processor_caches
host_processor_core_count56
host_processor_frequency2.70GHz (Base Clock)
host_processor_interconnectIntel Ultra Path Interconnect (UPI)
host_processor_model_nameIntel(R) Xeon(R) Platinum 8481C
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count8+1
host_networkingMaximum network bandwidth speed: 1800 Gbps
host_networking_topologyEthernet on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 21588.3 | Tokens/s | 24133.7 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 21588.3 | Tokens/s | 24133.7 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s | 15.8185 | Samples/s | 16.2613 |
| dlrm-v2-99.9 | AUC: 80.2297 | Queries/s | 340068.0 | Samples/s | 375565.0 |
| 3d-unet-99 | DICE: 0.8531 | | | Samples/s | 51.3841 |
| 3d-unet-99.9 | DICE: 0.8608 | | | Samples/s | 51.3841 |
+ + + + \ No newline at end of file diff --git a/closed/Google/results/tpu_v5e_x4_flax/summary/README.md b/closed/Google/results/tpu_v5e_x4_flax/summary/README.md new file mode 100644 index 00000000..ae2651ea --- /dev/null +++ b/closed/Google/results/tpu_v5e_x4_flax/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Google

+

tpu-v5e-4

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:GoogleAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_host_interconnectPCIe Gen3 x16
accelerator_interconnectICI 1600 Gbps
accelerator_memory_capacity16 GB
accelerator_memory_configurationHBM2
accelerator_model_nameTPU v5e
accelerators_per_node4
accelerator_frequency
accelerator_interconnect_topology2D Torus
accelerator_on-chip_memories48 MiB

Processor and Memory Details

host_memory_capacity192 GB
host_memory_configurationTODO
host_processor_core_count112
host_processor_model_nameAMD EPYC 7B13
host_processors_per_node1
host_processor_cachesL1d: 1.8 MiB; L1i: 1.8 MiB; L2: 28 MiB; L3: 224 MiB
host_processor_frequency2200 MHz (base); 3500 MHz (turbo)
host_processor_interconnect

Other Hardware Details

coolingTODO
hw_notes

Network and Interconnect Details

host_network_card_countTODO
host_networking_topologyTODO
host_networkingTODO

Software Details

operating_systemLinux version 5.19.0-1030-gcp (buildd@bos03-amd64-050) (x86_64-linux-gnu-gcc-12 (Ubuntu 12.1.0-2ubuntu1~22.04) 12.1.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #32~22.04.1-Ubuntu SMP Thu Jul 13 09:36:23 UTC 2023
frameworkflax
other_software_stack{'JAX TPU runtime': 'flax==0.8.5, jax[tpu]==0.4.30'}
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s | 1.54554 | Samples/s | 1.75335 |
diff --git a/closed/Google/results/tpu_v5e_x4_flax/summary/summary.html b/closed/Google/results/tpu_v5e_x4_flax/summary/summary.html new file mode 100644 index 00000000..0fa1531c --- /dev/null +++ b/closed/Google/results/tpu_v5e_x4_flax/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Google

+

tpu-v5e-4

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:GoogleAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_host_interconnectPCIe Gen3 x16
accelerator_interconnectICI 1600 Gbps
accelerator_memory_capacity16 GB
accelerator_memory_configurationHBM2
accelerator_model_nameTPU v5e
accelerators_per_node4
accelerator_frequency
accelerator_interconnect_topology2D Torus
accelerator_on-chip_memories48 MiB

Processor and Memory Details

host_memory_capacity192 GB
host_memory_configurationTODO
host_processor_core_count112
host_processor_model_nameAMD EPYC 7B13
host_processors_per_node1
host_processor_cachesL1d: 1.8 MiB; L1i: 1.8 MiB; L2: 28 MiB; L3: 224 MiB
host_processor_frequency2200 MHz (base); 3500 MHz (turbo)
host_processor_interconnect

Other Hardware Details

coolingTODO
hw_notes

Network and Interconnect Details

host_network_card_countTODO
host_networking_topologyTODO
host_networkingTODO

Software Details

operating_systemLinux version 5.19.0-1030-gcp (buildd@bos03-amd64-050) (x86_64-linux-gnu-gcc-12 (Ubuntu 12.1.0-2ubuntu1~22.04) 12.1.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #32~22.04.1-Ubuntu SMP Thu Jul 13 09:36:23 UTC 2023
frameworkflax
other_software_stack{'JAX TPU runtime': 'flax==0.8.5, jax[tpu]==0.4.30'}
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s | 1.54554 | Samples/s | 1.75335 |
+ + + + \ No newline at end of file diff --git a/closed/Google/results/tpu_v6_x4_flax/summary/README.md b/closed/Google/results/tpu_v6_x4_flax/summary/README.md new file mode 100644 index 00000000..ce3b41ff --- /dev/null +++ b/closed/Google/results/tpu_v6_x4_flax/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Google

+

tpu-v6-4

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:GoogleAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_host_interconnectTODO
accelerator_interconnectTODO
accelerator_memory_capacity32 GB
accelerator_memory_configurationHBM3
accelerator_model_nameTPU v6
accelerators_per_node4
accelerator_frequency
accelerator_interconnect_topology
accelerator_on-chip_memories

Processor and Memory Details

host_memory_capacity720 GB
host_memory_configurationTODO
host_processor_core_count180
host_processor_model_nameAMD EPYC 9B14
host_processors_per_node1
host_processor_caches
host_processor_frequency
host_processor_interconnect

Other Hardware Details

coolingTODO
hw_notes

Network and Interconnect Details

host_network_card_countTODO
host_networking_topologyTODO
host_networkingTODO

Software Details

operating_systemLinux version 6.2.0-1019-gcp (buildd@lcy02-amd64-032) (x86_64-linux-gnu-gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #21~22.04.1-Ubuntu SMP Thu Nov 16 18:18:34 UTC 2023
frameworkflax
other_software_stack{'JAX TPU runtime': 'flax==0.8.5, jax[tpu]==0.4.30'}
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s | 4.48577 | Samples/s | 5.43896 |
diff --git a/closed/Google/results/tpu_v6_x4_flax/summary/summary.html b/closed/Google/results/tpu_v6_x4_flax/summary/summary.html new file mode 100644 index 00000000..a1266965 --- /dev/null +++ b/closed/Google/results/tpu_v6_x4_flax/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Google

+

tpu-v6-4

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:GoogleAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_host_interconnectTODO
accelerator_interconnectTODO
accelerator_memory_capacity32 GB
accelerator_memory_configurationHBM3
accelerator_model_nameTPU v6
accelerators_per_node4
accelerator_frequency
accelerator_interconnect_topology
accelerator_on-chip_memories

Processor and Memory Details

host_memory_capacity720 GB
host_memory_configurationTODO
host_processor_core_count180
host_processor_model_nameAMD EPYC 9B14
host_processors_per_node1
host_processor_caches
host_processor_frequency
host_processor_interconnect

Other Hardware Details

coolingTODO
hw_notes

Network and Interconnect Details

host_network_card_countTODO
host_networking_topologyTODO
host_networkingTODO

Software Details

operating_systemLinux version 6.2.0-1019-gcp (buildd@lcy02-amd64-032) (x86_64-linux-gnu-gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #21~22.04.1-Ubuntu SMP Thu Nov 16 18:18:34 UTC 2023
frameworkflax
other_software_stack{'JAX TPU runtime': 'flax==0.8.5, jax[tpu]==0.4.30'}
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s | 4.48577 | Samples/s | 5.43896 |
+ + + + \ No newline at end of file diff --git a/closed/HPE/results/1-node-2S-EMR-PyTorch/summary/README.md b/closed/HPE/results/1-node-2S-EMR-PyTorch/summary/README.md new file mode 100644 index 00000000..71ebb4cb --- /dev/null +++ b/closed/HPE/results/1-node-2S-EMR-PyTorch/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

HPE

+

HPE ProLiant DL380 Gen11

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:HPEAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerators_per_node0
accelerator_model_nameN/A
accelerator_frequencyN/A
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topologyN/A
accelerator_memory_capacityN/A
accelerator_memory_configurationN/A
accelerator_on-chip_memoriesN/A

Processor and Memory Details

host_processor_model_nameINTEL(R) XEON(R) PLATINUM 8592+
host_processors_per_node2
host_processor_core_count64
host_processor_frequencyN/A
host_processor_cachesN/A
host_memory_configuration8 slots / 64GB each / per socket
host_memory_capacity1024GB
host_processor_interconnectN/A

Other Hardware Details

coolingAir
hw_notes

Network and Interconnect Details

host_networkingEthernet Controller / 10GBASE-T
host_networking_topologyN/A
host_network_card_count2

Software Details

frameworkPyTorch
operating_systemCentOS Stream 8
other_software_stack6.6.8-1.el8.elrepo.x86_64
sw_notesN/A
+ +

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s | 113.735 | Tokens/s | 251.298 |
| bert-99 | F1: 89.9653 | Queries/s | 1301.7 | Samples/s | 1608.87 |
| retinanet | mAP: 37.1745 | Queries/s | 275.531 | Samples/s | 370.727 |
| resnet | acc: 75.6954 | Queries/s | 22501.7 | Samples/s | 25356.7 |
| 3d-unet-99.9 | DICE: 0.8608 | | | Samples/s | 1.86652 |
diff --git a/closed/HPE/results/1-node-2S-EMR-PyTorch/summary/summary.html b/closed/HPE/results/1-node-2S-EMR-PyTorch/summary/summary.html new file mode 100644 index 00000000..f82dae00 --- /dev/null +++ b/closed/HPE/results/1-node-2S-EMR-PyTorch/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

HPE

+

HPE ProLiant DL380 Gen11

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:HPEAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerators_per_node0
accelerator_model_nameN/A
accelerator_frequencyN/A
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topologyN/A
accelerator_memory_capacityN/A
accelerator_memory_configurationN/A
accelerator_on-chip_memoriesN/A

Processor and Memory Details

host_processor_model_nameINTEL(R) XEON(R) PLATINUM 8592+
host_processors_per_node2
host_processor_core_count64
host_processor_frequencyN/A
host_processor_cachesN/A
host_memory_configuration8 slots / 64GB each / per socket
host_memory_capacity1024GB
host_processor_interconnectN/A

Other Hardware Details

coolingAir
hw_notes

Network and Interconnect Details

host_networkingEthernet Controller / 10GBASE-T
host_networking_topologyN/A
host_network_card_count2

Software Details

frameworkPyTorch
operating_systemCentOS Stream 8
other_software_stack6.6.8-1.el8.elrepo.x86_64
sw_notesN/A
+ +

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s | 113.735 | Tokens/s | 251.298 |
| bert-99 | F1: 89.9653 | Queries/s | 1301.7 | Samples/s | 1608.87 |
| retinanet | mAP: 37.1745 | Queries/s | 275.531 | Samples/s | 370.727 |
| resnet | acc: 75.6954 | Queries/s | 22501.7 | Samples/s | 25356.7 |
| 3d-unet-99.9 | DICE: 0.8608 | | | Samples/s | 1.86652 |
+ + + + \ No newline at end of file diff --git a/closed/HPE/results/GH200-GraceHopper-Superchip_GH200-144GB_aarch64x1_TRT/summary/README.md b/closed/HPE/results/GH200-GraceHopper-Superchip_GH200-144GB_aarch64x1_TRT/summary/README.md new file mode 100644 index 00000000..81c88d93 --- /dev/null +++ b/closed/HPE/results/GH200-GraceHopper-Superchip_GH200-144GB_aarch64x1_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

HPE

+

HPE ProLiant Compute DL384 Gen12 (1x GH200-144GB_aarch64, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:HPEAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectNVLink-C2C
accelerator_interconnect1x 400Gbe Infiniband
accelerator_interconnect_topology
accelerator_memory_capacity144 GB
accelerator_memory_configurationHBM3e
accelerator_model_nameNVIDIA GH200 Grace Hopper Superchip 144GB
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity480 GB
host_memory_configuration15x 16DP (32GB) LPDDR5x
host_processor_caches
host_processor_core_count72
host_processor_frequency
host_processor_interconnect
host_processor_model_nameNVIDIA Grace CPU
host_processors_per_node1

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesNVIDIA GH200 144GB HBM3e
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count1x Mellanox MT2894 [ConnectX-6 Lx]
host_networkingEthernet(IPoIB); Data bandwidth for GPU-NIC is 252.06 GB/s
host_networking_topologyEthernet(IPoIB)/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.2
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 560.30
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 3884.16 | Tokens/s | 4084.3 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 3884.16 | Tokens/s | 4084.3 |
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s | 2512.89 | Tokens/s | 2632.76 |
| gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s | 2512.89 | Tokens/s | 2632.76 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s | 2.01813 | Samples/s | 2.30503 |
| dlrm-v2-99 | AUC: 79.5069 | Queries/s | 81009.6 | Samples/s | 87052.7 |
| dlrm-v2-99.9 | AUC: 80.2297 | Queries/s | 51014.2 | Samples/s | 53611.9 |
diff --git a/closed/HPE/results/GH200-GraceHopper-Superchip_GH200-144GB_aarch64x1_TRT/summary/summary.html b/closed/HPE/results/GH200-GraceHopper-Superchip_GH200-144GB_aarch64x1_TRT/summary/summary.html new file mode 100644 index 00000000..fbff8840 --- /dev/null +++ b/closed/HPE/results/GH200-GraceHopper-Superchip_GH200-144GB_aarch64x1_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

HPE

+

HPE ProLiant Compute DL384 Gen12 (1x GH200-144GB_aarch64, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:HPEAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectNVLink-C2C
accelerator_interconnect1x 400Gbe Infiniband
accelerator_interconnect_topology
accelerator_memory_capacity144 GB
accelerator_memory_configurationHBM3e
accelerator_model_nameNVIDIA GH200 Grace Hopper Superchip 144GB
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity480 GB
host_memory_configuration15x 16DP (32GB) LPDDR5x
host_processor_caches
host_processor_core_count72
host_processor_frequency
host_processor_interconnect
host_processor_model_nameNVIDIA Grace CPU
host_processors_per_node1

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesNVIDIA GH200 144GB HBM3e
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count1x Mellanox MT2894 [ConnectX-6 Lx]
host_networkingEthernet(IPoIB); Data bandwidth for GPU-NIC is 252.06 GB/s
host_networking_topologyEthernet(IPoIB)/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.2
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 560.30
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 3884.16 | Tokens/s | 4084.3 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 3884.16 | Tokens/s | 4084.3 |
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s | 2512.89 | Tokens/s | 2632.76 |
| gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s | 2512.89 | Tokens/s | 2632.76 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s | 2.01813 | Samples/s | 2.30503 |
| dlrm-v2-99 | AUC: 79.5069 | Queries/s | 81009.6 | Samples/s | 87052.7 |
| dlrm-v2-99.9 | AUC: 80.2297 | Queries/s | 51014.2 | Samples/s | 53611.9 |
+ + + + \ No newline at end of file diff --git a/closed/HPE/results/HPE_Cray_XD670_H100_SXM_80GBx8_TRT/summary/README.md b/closed/HPE/results/HPE_Cray_XD670_H100_SXM_80GBx8_TRT/summary/README.md new file mode 100644 index 00000000..1b8764e8 --- /dev/null +++ b/closed/HPE/results/HPE_Cray_XD670_H100_SXM_80GBx8_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

HPE

+

HPE Cray XD670 (8x H100-SXM-80GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:HPEAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectN/A
accelerator_interconnect8x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2048GB
host_memory_configuration32x 64GB MTC40F2046S1RC48BA1
host_processor_cachesL1d: 4.5 MiB (96 instances), L1i: 3 MiB (96 instances), L2: 192 MiB (96 instances), L3: 210 MiB (2 instances)
host_processor_core_count48
host_processor_frequency3.8GHz
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8468
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count10x 400Gbe Infiniband
host_networkingInfiniband
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4 LTS
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 555.42.06
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 23132.8 | Tokens/s | 24528.3 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 23144.0 | Tokens/s | 24424.7 |
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s | 19502.6 | Tokens/s | 19751.9 |
| bert-99 | F1: 89.9653 | Queries/s | 56729.3 | Samples/s | 71560.3 |
| bert-99.9 | F1: 90.7831 | Queries/s | 51210.8 | Samples/s | 62207.3 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s | 15.8155 | Samples/s | 16.202 |
| retinanet | mAP: 37.1745 | Queries/s | 13763.0 | Samples/s | 14410.0 |
| resnet | acc: 75.6954 | Queries/s | 620228.0 | Samples/s | 707695.0 |
| 3d-unet-99 | DICE: 0.8531 | | | Samples/s | 52.0396 |
| 3d-unet-99.9 | DICE: 0.8608 | | | Samples/s | 52.0344 |
diff --git a/closed/HPE/results/HPE_Cray_XD670_H100_SXM_80GBx8_TRT/summary/summary.html b/closed/HPE/results/HPE_Cray_XD670_H100_SXM_80GBx8_TRT/summary/summary.html new file mode 100644 index 00000000..22bda890 --- /dev/null +++ b/closed/HPE/results/HPE_Cray_XD670_H100_SXM_80GBx8_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

HPE

+

HPE Cray XD670 (8x H100-SXM-80GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:HPEAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectN/A
accelerator_interconnect8x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2048GB
host_memory_configuration32x 64GB MTC40F2046S1RC48BA1
host_processor_cachesL1d: 4.5 MiB (96 instances), L1i: 3 MiB (96 instances), L2: 192 MiB (96 instances), L3: 210 MiB (2 instances)
host_processor_core_count48
host_processor_frequency3.8GHz
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8468
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count10x 400Gbe Infiniband
host_networkingInfiniband
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_system22.04.4 LTS
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 555.42.06
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 23132.8 | Tokens/s | 24528.3 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 23144.0 | Tokens/s | 24424.7 |
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s | 19502.6 | Tokens/s | 19751.9 |
| bert-99 | F1: 89.9653 | Queries/s | 56729.3 | Samples/s | 71560.3 |
| bert-99.9 | F1: 90.7831 | Queries/s | 51210.8 | Samples/s | 62207.3 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s | 15.8155 | Samples/s | 16.202 |
| retinanet | mAP: 37.1745 | Queries/s | 13763.0 | Samples/s | 14410.0 |
| resnet | acc: 75.6954 | Queries/s | 620228.0 | Samples/s | 707695.0 |
| 3d-unet-99 | DICE: 0.8531 | | | Samples/s | 52.0396 |
| 3d-unet-99.9 | DICE: 0.8608 | | | Samples/s | 52.0344 |
+ + + + \ No newline at end of file diff --git a/closed/HPE/results/HPE_ProLiant_DL380a_H100_NVL_94GBx4_TRT/summary/README.md b/closed/HPE/results/HPE_ProLiant_DL380a_H100_NVL_94GBx4_TRT/summary/README.md new file mode 100644 index 00000000..54e7a4cf --- /dev/null +++ b/closed/HPE/results/HPE_ProLiant_DL380a_H100_NVL_94GBx4_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

HPE

+

HPE ProLiant DL380a Gen11 (4x H100-NVL-94GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:HPEAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectN/A
accelerator_interconnect4x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity94 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-NVL-94GB
accelerator_on-chip_memories
accelerators_per_node4

Processor and Memory Details

host_memory_capacity1024GB
host_memory_configuration16x 64GB
host_processor_cachesL1d: 3 MiB, L1i: 2 MiB, L2: 128 MiB (64 instances), L3: 320 MiB (2 instances)
host_processor_core_count32
host_processor_frequency2.1GHz, Max Turbo=4.0GHz
host_processor_interconnect
host_processor_model_nameINTEL(R) XEON(R) GOLD 6530
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts4x 3200W

Network and Interconnect Details

host_network_card_count7-slots of 2-port 200G Infiniband (Max 2800GB HDR) or 2-port 100G Ethernet (Max 1400GbE)
host_networking8x 400Gbe Infiniband
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4 LTS
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 535.183.01
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 6927.8 | Tokens/s | 8858.78 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 6927.8 | Tokens/s | 8858.78 |
| bert-99 | F1: 89.9653 | Queries/s | 19202.4 | Samples/s | 23937.5 |
| bert-99.9 | F1: 90.7831 | Queries/s | 15003.9 | Samples/s | 20087.7 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s | 3.9493 | Samples/s | 5.80657 |
| retinanet | mAP: 37.1745 | Queries/s | 5003.0 | Samples/s | 5285.27 |
| resnet | acc: 75.6954 | Queries/s | 240031.0 | Samples/s | 242572.0 |
| 3d-unet-99 | DICE: 0.8531 | | | Samples/s | 21.5112 |
| 3d-unet-99.9 | DICE: 0.8608 | | | Samples/s | 21.5112 |
diff --git a/closed/HPE/results/HPE_ProLiant_DL380a_H100_NVL_94GBx4_TRT/summary/summary.html b/closed/HPE/results/HPE_ProLiant_DL380a_H100_NVL_94GBx4_TRT/summary/summary.html new file mode 100644 index 00000000..fe8c80c6 --- /dev/null +++ b/closed/HPE/results/HPE_ProLiant_DL380a_H100_NVL_94GBx4_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

HPE

+

HPE ProLiant DL380a Gen11 (4x H100-NVL-94GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:HPEAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectN/A
accelerator_interconnect4x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity94 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-NVL-94GB
accelerator_on-chip_memories
accelerators_per_node4

Processor and Memory Details

host_memory_capacity1024GB
host_memory_configuration16x 64GB
host_processor_cachesL1d: 3 MiB, L1i: 2 MiB, L2: 128 MiB (64 instances), L3: 320 MiB (2 instances)
host_processor_core_count32
host_processor_frequency2.1GHz, Max Turbo=4.0GHz
host_processor_interconnect
host_processor_model_nameINTEL(R) XEON(R) GOLD 6530
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts4x 3200W

Network and Interconnect Details

host_network_card_count7-slots of 2-port 200G Infiniband (Max 2800GB HDR) or 2-port 100G Ethernet (Max 1400GbE)
host_networking8x 400Gbe Infiniband
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4 LTS
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 535.183.01
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 6927.8 | Tokens/s | 8858.78 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 6927.8 | Tokens/s | 8858.78 |
| bert-99 | F1: 89.9653 | Queries/s | 19202.4 | Samples/s | 23937.5 |
| bert-99.9 | F1: 90.7831 | Queries/s | 15003.9 | Samples/s | 20087.7 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s | 3.9493 | Samples/s | 5.80657 |
| retinanet | mAP: 37.1745 | Queries/s | 5003.0 | Samples/s | 5285.27 |
| resnet | acc: 75.6954 | Queries/s | 240031.0 | Samples/s | 242572.0 |
| 3d-unet-99 | DICE: 0.8531 | | | Samples/s | 21.5112 |
| 3d-unet-99.9 | DICE: 0.8608 | | | Samples/s | 21.5112 |
+ + + + \ No newline at end of file diff --git a/closed/HPE/results/HPE_ProLiant_DL380a_L40S_PCIe_48GBx4_TRT/summary/README.md b/closed/HPE/results/HPE_ProLiant_DL380a_L40S_PCIe_48GBx4_TRT/summary/README.md new file mode 100644 index 00000000..08b55533 --- /dev/null +++ b/closed/HPE/results/HPE_ProLiant_DL380a_L40S_PCIe_48GBx4_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

HPE

+

HPE ProLiant DL380a Gen11 (4x L40S-PCIe-48GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:HPEAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 Switch
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacity48 GB
accelerator_memory_configurationHBM2e
accelerator_model_nameNVIDIA L40S
accelerator_on-chip_memories
accelerators_per_node4

Processor and Memory Details

host_memory_capacity1024GB
host_memory_configuration16x 64GB 36ASF8G72PZ-3G2E1
host_processor_caches
host_processor_core_count60
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8580
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count7-slots of 2-port 200G Infiniband (Max 2800GB HDR) or 2-port 100G Ethernet (Max 1400GbE)
host_networking1Gbe
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 9.0.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.3
other_software_stackTensorRT 9.3.0, CUDA 12.4, Driver 535.183.01
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| bert-99 | F1: 89.9653 | Queries/s | 12904.6 | Samples/s | 12981.8 |
| retinanet | mAP: 37.1745 | Queries/s | 3102.23 | Samples/s | 3273.27 |
| resnet | acc: 75.6954 | Queries/s | 176025.0 | Samples/s | 172857.0 |
| 3d-unet-99 | DICE: 0.8531 | | | Samples/s | 15.4388 |
diff --git a/closed/HPE/results/HPE_ProLiant_DL380a_L40S_PCIe_48GBx4_TRT/summary/summary.html b/closed/HPE/results/HPE_ProLiant_DL380a_L40S_PCIe_48GBx4_TRT/summary/summary.html new file mode 100644 index 00000000..6bce02f8 --- /dev/null +++ b/closed/HPE/results/HPE_ProLiant_DL380a_L40S_PCIe_48GBx4_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

HPE

+

HPE ProLiant DL380a Gen11 (4x L40S-PCIe-48GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:HPEAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 Switch
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacity48 GB
accelerator_memory_configurationHBM2e
accelerator_model_nameNVIDIA L40S
accelerator_on-chip_memories
accelerators_per_node4

Processor and Memory Details

host_memory_capacity1024GB
host_memory_configuration16x 64GB 36ASF8G72PZ-3G2E1
host_processor_caches
host_processor_core_count60
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8580
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count7-slots of 2-port 200G Infiniband (Max 2800GB HDR) or 2-port 100G Ethernet (Max 1400GbE)
host_networking1Gbe
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 9.0.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.3
other_software_stackTensorRT 9.3.0, CUDA 12.4, Driver 535.183.01
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| bert-99 | F1: 89.9653 | Queries/s | 12904.6 | Samples/s | 12981.8 |
| retinanet | mAP: 37.1745 | Queries/s | 3102.23 | Samples/s | 3273.27 |
| resnet | acc: 75.6954 | Queries/s | 176025.0 | Samples/s | 172857.0 |
| 3d-unet-99 | DICE: 0.8531 | | | Samples/s | 15.4388 |
+ + + + \ No newline at end of file diff --git a/closed/Intel/results/1-node-2S-EMR-PyTorch/summary/README.md b/closed/Intel/results/1-node-2S-EMR-PyTorch/summary/README.md new file mode 100644 index 00000000..44f4fab4 --- /dev/null +++ b/closed/Intel/results/1-node-2S-EMR-PyTorch/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Intel

+

1-node-2S-EMR-PyTorch

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:IntelAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerators_per_node0
accelerator_model_nameN/A
accelerator_frequencyN/A
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topologyN/A
accelerator_memory_capacityN/A
accelerator_memory_configurationN/A
accelerator_on-chip_memoriesN/A

Processor and Memory Details

host_processor_model_nameINTEL(R) XEON(R) PLATINUM 8592+
host_processors_per_node2
host_processor_core_count64
host_processor_frequencyN/A
host_processor_cachesN/A
host_memory_configuration8 slots / 64GB each / per socket
host_memory_capacity1024GB
host_processor_interconnectN/A

Other Hardware Details

coolingAir
hw_notesQuantaGrid D54Q-2U

Network and Interconnect Details

host_networkingEthernet Controller / 10GBASE-T
host_networking_topologyN/A
host_network_card_count2

Software Details

frameworkPyTorch
operating_systemCentOS Stream 9
other_software_stack6.9.7-1.el9.elrepo.x86_64
sw_notesN/A
+ +

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s | 113.74 | Tokens/s | 254.724 |
| bert-99 | F1: 89.9653 | Queries/s | 1281.58 | Samples/s | 1666.82 |
| dlrm-v2-99 | AUC: 79.5069 | Queries/s | 9731.54 | Samples/s | 9949.2 |
| dlrm-v2-99.9 | AUC: 80.2297 | Queries/s | 9731.54 | Samples/s | 9949.2 |
| retinanet | mAP: 37.1745 | Queries/s | 285.455 | Samples/s | 377.53 |
| resnet | acc: 75.6954 | Queries/s | 22501.8 | Samples/s | 25204.5 |
| 3d-unet-99 | DICE: 0.8531 | | | Samples/s | 1.93632 |
| 3d-unet-99.9 | DICE: 0.8608 | | | Samples/s | 1.93632 |
diff --git a/closed/Intel/results/1-node-2S-EMR-PyTorch/summary/summary.html b/closed/Intel/results/1-node-2S-EMR-PyTorch/summary/summary.html new file mode 100644 index 00000000..a73298f3 --- /dev/null +++ b/closed/Intel/results/1-node-2S-EMR-PyTorch/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Intel

+

1-node-2S-EMR-PyTorch

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:IntelAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerators_per_node0
accelerator_model_nameN/A
accelerator_frequencyN/A
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topologyN/A
accelerator_memory_capacityN/A
accelerator_memory_configurationN/A
accelerator_on-chip_memoriesN/A

Processor and Memory Details

host_processor_model_nameINTEL(R) XEON(R) PLATINUM 8592+
host_processors_per_node2
host_processor_core_count64
host_processor_frequencyN/A
host_processor_cachesN/A
host_memory_configuration8 slots / 64GB each / per socket
host_memory_capacity1024GB
host_processor_interconnectN/A

Other Hardware Details

coolingAir
hw_notesQuantaGrid D54Q-2U

Network and Interconnect Details

host_networkingEthernet Controller / 10GBASE-T
host_networking_topologyN/A
host_network_card_count2

Software Details

frameworkPyTorch
operating_systemCentOS Stream 9
other_software_stack6.9.7-1.el9.elrepo.x86_64
sw_notesN/A
+ +

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s | 113.74 | Tokens/s | 254.724 |
| bert-99 | F1: 89.9653 | Queries/s | 1281.58 | Samples/s | 1666.82 |
| dlrm-v2-99 | AUC: 79.5069 | Queries/s | 9731.54 | Samples/s | 9949.2 |
| dlrm-v2-99.9 | AUC: 80.2297 | Queries/s | 9731.54 | Samples/s | 9949.2 |
| retinanet | mAP: 37.1745 | Queries/s | 285.455 | Samples/s | 377.53 |
| resnet | acc: 75.6954 | Queries/s | 22501.8 | Samples/s | 25204.5 |
| 3d-unet-99 | DICE: 0.8531 | | | Samples/s | 1.93632 |
| 3d-unet-99.9 | DICE: 0.8608 | | | Samples/s | 1.93632 |
+ + + + \ No newline at end of file diff --git a/closed/Intel/results/1-node-2S-GNR-PyTorch/summary/README.md b/closed/Intel/results/1-node-2S-GNR-PyTorch/summary/README.md new file mode 100644 index 00000000..ef1ed250 --- /dev/null +++ b/closed/Intel/results/1-node-2S-GNR-PyTorch/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Intel

+

1-node-2S-GNR-PyTorch

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:IntelAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerators_per_node0
accelerator_model_nameN/A
accelerator_frequencyN/A
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topologyN/A
accelerator_memory_capacityN/A
accelerator_memory_configurationN/A
accelerator_on-chip_memoriesN/A

Processor and Memory Details

host_processor_model_nameINTEL(R) XEON(R) 6980P
host_processors_per_node2
host_processor_core_count128
host_processor_frequencyN/A
host_processor_cachesN/A
host_memory_configuration12 slots per socket / 96GB each / 8800 MT/s DDR5 (MRDIMM)
host_memory_capacity2304GB
host_processor_interconnectN/A

Other Hardware Details

coolingAir
hw_notesIntel AvenueCity

Network and Interconnect Details

host_networkingEthernet Controller / 10GBASE-T
host_networking_topologyN/A
host_network_card_count2

Software Details

frameworkPyTorch
operating_systemCentOS Stream 9
other_software_stack6.6.0-gnr.bkc.6.6.16.8.23.x86_64
sw_notesN/A
+ +

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s | 217.466 | Tokens/s | 498.316 |
| bert-99 | F1: 89.9653 | Queries/s | 2436.99 | Samples/s | 3024.03 |
| dlrm-v2-99 | AUC: 79.5069 | Queries/s | 17749.5 | Samples/s | 18326.6 |
| dlrm-v2-99.9 | AUC: 80.2297 | Queries/s | 17749.5 | Samples/s | 18326.6 |
| retinanet | mAP: 37.1745 | Queries/s | 595.785 | Samples/s | 746.647 |
| resnet | acc: 75.6954 | Queries/s | 39798.3 | Samples/s | 45617.3 |
| 3d-unet-99 | DICE: 0.8531 | | | Samples/s | 3.28548 |
| 3d-unet-99.9 | DICE: 0.8608 | | | Samples/s | 3.28548 |
diff --git a/closed/Intel/results/1-node-2S-GNR-PyTorch/summary/summary.html b/closed/Intel/results/1-node-2S-GNR-PyTorch/summary/summary.html new file mode 100644 index 00000000..d93a4700 --- /dev/null +++ b/closed/Intel/results/1-node-2S-GNR-PyTorch/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Intel

+

1-node-2S-GNR-PyTorch

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:IntelAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerators_per_node0
accelerator_model_nameN/A
accelerator_frequencyN/A
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topologyN/A
accelerator_memory_capacityN/A
accelerator_memory_configurationN/A
accelerator_on-chip_memoriesN/A

Processor and Memory Details

host_processor_model_nameINTEL(R) XEON(R) 6980P
host_processors_per_node2
host_processor_core_count128
host_processor_frequencyN/A
host_processor_cachesN/A
host_memory_configuration12 slots per socket / 96GB each / 8800 MT/s DDR5 (MRDIMM)
host_memory_capacity2304GB
host_processor_interconnectN/A

Other Hardware Details

coolingAir
hw_notesIntel AvenueCity

Network and Interconnect Details

host_networkingEthernet Controller / 10GBASE-T
host_networking_topologyN/A
host_network_card_count2

Software Details

frameworkPyTorch
operating_systemCentOS Stream 9
other_software_stack6.6.0-gnr.bkc.6.6.16.8.23.x86_64
sw_notesN/A
+ +

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s | 217.466 | Tokens/s | 498.316 |
| bert-99 | F1: 89.9653 | Queries/s | 2436.99 | Samples/s | 3024.03 |
| dlrm-v2-99 | AUC: 79.5069 | Queries/s | 17749.5 | Samples/s | 18326.6 |
| dlrm-v2-99.9 | AUC: 80.2297 | Queries/s | 17749.5 | Samples/s | 18326.6 |
| retinanet | mAP: 37.1745 | Queries/s | 595.785 | Samples/s | 746.647 |
| resnet | acc: 75.6954 | Queries/s | 39798.3 | Samples/s | 45617.3 |
| 3d-unet-99 | DICE: 0.8531 | | | Samples/s | 3.28548 |
| 3d-unet-99.9 | DICE: 0.8608 | | | Samples/s | 3.28548 |
+ + + + \ No newline at end of file diff --git a/closed/JuniperNetworks/results/DGX-H100_H100-SXM-80GBx16_TRT/summary/README.md b/closed/JuniperNetworks/results/DGX-H100_H100-SXM-80GBx16_TRT/summary/README.md new file mode 100644 index 00000000..e1b1483c --- /dev/null +++ b/closed/JuniperNetworks/results/DGX-H100_H100-SXM-80GBx16_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

JuniperNetworks

+

2x8xH100-SXM-80GB

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:JuniperNetworksAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen4 x16
accelerator_interconnectNVLink
accelerator_interconnect_topologyrail-optimized
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2.0 TB
host_memory_configuration32 slots / 64 GB each/ Total = 2 TB /DDR4 3200 MHz
host_processor_caches
host_processor_core_count112
host_processor_frequency
host_processor_interconnect
host_processor_model_nameAMD EPYC 7763 64-Core Processor
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count8
host_networkingGig Ethernet
host_networking_topologyEthernet on Juniper switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 9.3.0, CUDA 12.2
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 20.04.4
other_software_stackTensorRT 9.3.0, CUDA 12.2, cuDNN 8.9.6, Driver 535.129.03, DALI 1.28.0
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 41091.6 | Tokens/s | 41672.4 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 41091.6 | Tokens/s | 41672.4 |
diff --git a/closed/JuniperNetworks/results/DGX-H100_H100-SXM-80GBx16_TRT/summary/summary.html b/closed/JuniperNetworks/results/DGX-H100_H100-SXM-80GBx16_TRT/summary/summary.html new file mode 100644 index 00000000..65d39455 --- /dev/null +++ b/closed/JuniperNetworks/results/DGX-H100_H100-SXM-80GBx16_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

JuniperNetworks

+

2x8xH100-SXM-80GB

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:JuniperNetworksAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen4 x16
accelerator_interconnectNVLink
accelerator_interconnect_topologyrail-optimized
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2.0 TB
host_memory_configuration32 slots / 64 GB each/ Total = 2 TB /DDR4 3200 MHz
host_processor_caches
host_processor_core_count112
host_processor_frequency
host_processor_interconnect
host_processor_model_nameAMD EPYC 7763 64-Core Processor
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count8
host_networkingGig Ethernet
host_networking_topologyEthernet on Juniper switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 9.3.0, CUDA 12.2
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 20.04.4
other_software_stackTensorRT 9.3.0, CUDA 12.2, cuDNN 8.9.6, Driver 535.129.03, DALI 1.28.0
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 41091.6 | Tokens/s | 41672.4 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 41091.6 | Tokens/s | 41672.4 |
+ + + + \ No newline at end of file diff --git a/closed/JuniperNetworks/results/DGX-H100_H100-SXM-80GBx32_TRT/summary/README.md b/closed/JuniperNetworks/results/DGX-H100_H100-SXM-80GBx32_TRT/summary/README.md new file mode 100644 index 00000000..d159f3f5 --- /dev/null +++ b/closed/JuniperNetworks/results/DGX-H100_H100-SXM-80GBx32_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

JuniperNetworks

+

4x8xH100-SXM-80GB

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:JuniperNetworksAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen4 x16
accelerator_interconnectNVLink
accelerator_interconnect_topologyrail-optimized
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2.0 TB
host_memory_configuration32 slots / 64 GB each/ Total = 2 TB /DDR4 3200 MHz
host_processor_caches
host_processor_core_count112
host_processor_frequency
host_processor_interconnect
host_processor_model_nameAMD EPYC 7763 64-Core Processor
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count8
host_networkingGig Ethernet
host_networking_topologyEthernet on Juniper switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 9.3.0, CUDA 12.2
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 20.04.4
other_software_stackTensorRT 9.3.0, CUDA 12.2, cuDNN 8.9.6, Driver 535.129.03, DALI 1.28.0
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 82273.2 | Tokens/s | 82749.6 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 82273.2 | Tokens/s | 82749.6 |
diff --git a/closed/JuniperNetworks/results/DGX-H100_H100-SXM-80GBx32_TRT/summary/summary.html b/closed/JuniperNetworks/results/DGX-H100_H100-SXM-80GBx32_TRT/summary/summary.html new file mode 100644 index 00000000..fb15c599 --- /dev/null +++ b/closed/JuniperNetworks/results/DGX-H100_H100-SXM-80GBx32_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

JuniperNetworks

+

4x8xH100-SXM-80GB

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:JuniperNetworksAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen4 x16
accelerator_interconnectNVLink
accelerator_interconnect_topologyrail-optimized
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2.0 TB
host_memory_configuration32 slots / 64 GB each/ Total = 2 TB /DDR4 3200 MHz
host_processor_caches
host_processor_core_count112
host_processor_frequency
host_processor_interconnect
host_processor_model_nameAMD EPYC 7763 64-Core Processor
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count8
host_networkingGig Ethernet
host_networking_topologyEthernet on Juniper switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 9.3.0, CUDA 12.2
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 20.04.4
other_software_stackTensorRT 9.3.0, CUDA 12.2, cuDNN 8.9.6, Driver 535.129.03, DALI 1.28.0
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 82273.2 | Tokens/s | 82749.6 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 82273.2 | Tokens/s | 82749.6 |
+ + + + \ No newline at end of file diff --git a/closed/Lenovo/results/H200_SXM_141GBx8_TRT/summary/README.md b/closed/Lenovo/results/H200_SXM_141GBx8_TRT/summary/README.md new file mode 100644 index 00000000..02a8852d --- /dev/null +++ b/closed/Lenovo/results/H200_SXM_141GBx8_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Lenovo

+

ThinkSystem SR685a V3(8x H200-SXM-141GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:LenovoAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity141 GB
accelerator_memory_configurationHBM3e
accelerator_model_nameNVIDIA H200-SXM-141GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity1.5 TB
host_memory_configuration24x 64GB HMCG94MEBRA121N
host_processor_caches
host_processor_core_count48
host_processor_frequency
host_processor_interconnect
host_processor_model_nameAMD EPYC 9454
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesH200 TGP 700W
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count1
host_networkingGig Ethernet
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.5
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 555.42
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server Metric | Server Performance | Offline Metric | Offline Performance |
|---|---|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 30068.5 | Tokens/s | 31917.0 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s | 30068.5 | Tokens/s | 31917.0 |
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s | 19716.0 | Tokens/s | 19859.2 |
| bert-99 | F1: 89.9653 | Queries/s | 56811.1 | Samples/s | 70319.1 |
| bert-99.9 | F1: 90.7831 | Queries/s | 51211.0 | Samples/s | 62102.5 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s | 16.645 | Samples/s | 17.1877 |
| retinanet | mAP: 37.1745 | Queries/s | 13164.0 | Samples/s | 15015.4 |
| 3d-unet-99 | DICE: 0.8531 | | | Samples/s | 54.3653 |
| 3d-unet-99.9 | DICE: 0.8608 | | | Samples/s | 54.3653 |
diff --git a/closed/Lenovo/results/H200_SXM_141GBx8_TRT/summary/summary.html b/closed/Lenovo/results/H200_SXM_141GBx8_TRT/summary/summary.html new file mode 100644 index 00000000..48f83fb1 --- /dev/null +++ b/closed/Lenovo/results/H200_SXM_141GBx8_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Lenovo

+

ThinkSystem SR685a V3(8x H200-SXM-141GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:LenovoAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity141 GB
accelerator_memory_configurationHBM3e
accelerator_model_nameNVIDIA H200-SXM-141GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity1.5 TB
host_memory_configuration24x 64GB HMCG94MEBRA121N
host_processor_caches
host_processor_core_count48
host_processor_frequency
host_processor_interconnect
host_processor_model_nameAMD EPYC 9454
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesH200 TGP 700W
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count1
host_networkingGig Ethernet
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.5
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 555.42
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 30068.5 | Tokens/s 31917.0 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 30068.5 | Tokens/s 31917.0 |
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 19716.0 | Tokens/s 19859.2 |
| bert-99 | F1: 89.9653 | Queries/s 56811.1 | Samples/s 70319.1 |
| bert-99.9 | F1: 90.7831 | Queries/s 51211.0 | Samples/s 62102.5 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s 16.645 | Samples/s 17.1877 |
| retinanet | mAP: 37.1745 | Queries/s 13164.0 | Samples/s 15015.4 |
| 3d-unet-99 | DICE: 0.8531 | | Samples/s 54.3653 |
| 3d-unet-99.9 | DICE: 0.8608 | | Samples/s 54.3653 |
+ + + + \ No newline at end of file diff --git a/closed/Lenovo/results/Lenovo_2xL40S_TRT/summary/README.md b/closed/Lenovo/results/Lenovo_2xL40S_TRT/summary/README.md new file mode 100644 index 00000000..48dcc035 --- /dev/null +++ b/closed/Lenovo/results/Lenovo_2xL40S_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Lenovo

+

ThinkEdge SE455 V3 (2x NVIDIA L40S, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:LenovoAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectN/A
accelerator_interconnectN/a
accelerator_interconnect_topology
accelerator_memory_capacity48 GB
accelerator_memory_configurationGDDR6
accelerator_model_nameNVIDIA L40S
accelerator_on-chip_memories
accelerators_per_node2

Processor and Memory Details

host_memory_capacity128 GB
host_memory_configuration4x 32GB M321R4GA3BB6-CQKDS
host_processor_caches
host_processor_core_count16
host_processor_frequency
host_processor_interconnect
host_processor_model_nameAMD EPYC 8124P 16-Core Processor
host_processors_per_node1

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count2
host_networkingGig Ethernet
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 9.3.0, CUDA 12.3
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.3
other_software_stackTensorRT 9.3.0, CUDA 12.2, cuDNN 8.9.6, Driver 545.23.08, DALI 1.28.0
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | | Tokens/s 1718.33 |
| bert-99 | F1: 89.9653 | | Samples/s 6502.66 |
| retinanet | mAP: 37.1745 | | Samples/s 1629.08 |
| resnet | acc: 75.6954 | | Samples/s 86304.6 |
| 3d-unet-99 | DICE: 0.8531 | | Samples/s 7.79867 |
diff --git a/closed/Lenovo/results/Lenovo_2xL40S_TRT/summary/summary.html b/closed/Lenovo/results/Lenovo_2xL40S_TRT/summary/summary.html new file mode 100644 index 00000000..8fc37b59 --- /dev/null +++ b/closed/Lenovo/results/Lenovo_2xL40S_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Lenovo

+

ThinkEdge SE455 V3 (2x NVIDIA L40S, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:LenovoAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectN/A
accelerator_interconnectN/a
accelerator_interconnect_topology
accelerator_memory_capacity48 GB
accelerator_memory_configurationGDDR6
accelerator_model_nameNVIDIA L40S
accelerator_on-chip_memories
accelerators_per_node2

Processor and Memory Details

host_memory_capacity128 GB
host_memory_configuration4x 32GB M321R4GA3BB6-CQKDS
host_processor_caches
host_processor_core_count16
host_processor_frequency
host_processor_interconnect
host_processor_model_nameAMD EPYC 8124P 16-Core Processor
host_processors_per_node1

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count2
host_networkingGig Ethernet
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 9.3.0, CUDA 12.3
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.3
other_software_stackTensorRT 9.3.0, CUDA 12.2, cuDNN 8.9.6, Driver 545.23.08, DALI 1.28.0
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | | Tokens/s 1718.33 |
| bert-99 | F1: 89.9653 | | Samples/s 6502.66 |
| retinanet | mAP: 37.1745 | | Samples/s 1629.08 |
| resnet | acc: 75.6954 | | Samples/s 86304.6 |
| 3d-unet-99 | DICE: 0.8531 | | Samples/s 7.79867 |
+ + + + \ No newline at end of file diff --git a/closed/Lenovo/results/Lenovo_2xL4_TRT/summary/README.md b/closed/Lenovo/results/Lenovo_2xL4_TRT/summary/README.md new file mode 100644 index 00000000..0f167a00 --- /dev/null +++ b/closed/Lenovo/results/Lenovo_2xL4_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Lenovo

+

ThinkEdge SE360 V2 (2x NVIDIA L4, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:LenovoAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectEthernet
accelerator_interconnectNone
accelerator_interconnect_topologyNone
accelerator_memory_capacity24 GB
accelerator_memory_configurationGDDR6
accelerator_model_nameNVIDIA L4
accelerator_on-chip_memories
accelerators_per_node2

Processor and Memory Details

host_memory_capacity256 GB
host_memory_configuration4x 64GB M393A8G40CB4-CWE
host_processor_caches
host_processor_core_count16
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) D-2775TE
host_processors_per_node1

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count2
host_networkingGig Ethernet
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 9.3.0, CUDA 12.5
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 9.3.0, CUDA 12.5, Driver 555.42
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| bert-99 | F1: 89.9653 | | Samples/s 1928.07 |
| retinanet | mAP: 37.1745 | | Samples/s 453.667 |
| resnet | acc: 75.6954 | | Samples/s 25600.0 |
| 3d-unet-99 | DICE: 0.8531 | | Samples/s 2.21588 |
diff --git a/closed/Lenovo/results/Lenovo_2xL4_TRT/summary/summary.html b/closed/Lenovo/results/Lenovo_2xL4_TRT/summary/summary.html new file mode 100644 index 00000000..03918030 --- /dev/null +++ b/closed/Lenovo/results/Lenovo_2xL4_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Lenovo

+

ThinkEdge SE360 V2 (2x NVIDIA L4, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:LenovoAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectEthernet
accelerator_interconnectNone
accelerator_interconnect_topologyNone
accelerator_memory_capacity24 GB
accelerator_memory_configurationGDDR6
accelerator_model_nameNVIDIA L4
accelerator_on-chip_memories
accelerators_per_node2

Processor and Memory Details

host_memory_capacity256 GB
host_memory_configuration4x 64GB M393A8G40CB4-CWE
host_processor_caches
host_processor_core_count16
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) D-2775TE
host_processors_per_node1

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count2
host_networkingGig Ethernet
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 9.3.0, CUDA 12.5
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 9.3.0, CUDA 12.5, Driver 555.42
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| bert-99 | F1: 89.9653 | | Samples/s 1928.07 |
| retinanet | mAP: 37.1745 | | Samples/s 453.667 |
| resnet | acc: 75.6954 | | Samples/s 25600.0 |
| 3d-unet-99 | DICE: 0.8531 | | Samples/s 2.21588 |
+ + + + \ No newline at end of file diff --git a/closed/Lenovo/results/Lenovo_8xH200_TRT/summary/README.md b/closed/Lenovo/results/Lenovo_8xH200_TRT/summary/README.md new file mode 100644 index 00000000..eae33c4d --- /dev/null +++ b/closed/Lenovo/results/Lenovo_8xH200_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Lenovo

+

ThinkSystem SR680a V3 (8x H200-SXM-141GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:LenovoAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity141 GB
accelerator_memory_configurationHBM3e
accelerator_model_nameNVIDIA H200-SXM-141GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB MTC40F2046S1RC48BA1
host_processor_caches
host_processor_core_count48
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8568Y+
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesH200 TGP 700W
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count10x 400Gbe Infiniband
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 504GB/s; PCIe-NIC: 500GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 30438.7 | Tokens/s 31973.7 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 30438.7 | Tokens/s 31973.7 |
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 19785.0 | Tokens/s 20552.1 |
| gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s 19785.0 | Tokens/s 20552.1 |
| bert-99 | F1: 89.9653 | Queries/s 56012.5 | Samples/s 70369.8 |
| bert-99.9 | F1: 90.7831 | Queries/s 52814.0 | Samples/s 64983.6 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s 16.9831 | Samples/s 17.5975 |
| 3d-unet-99 | DICE: 0.8531 | | Samples/s 54.5565 |
| 3d-unet-99.9 | DICE: 0.8608 | | Samples/s 54.5565 |
diff --git a/closed/Lenovo/results/Lenovo_8xH200_TRT/summary/summary.html b/closed/Lenovo/results/Lenovo_8xH200_TRT/summary/summary.html new file mode 100644 index 00000000..9b9fa508 --- /dev/null +++ b/closed/Lenovo/results/Lenovo_8xH200_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Lenovo

+

ThinkSystem SR680a V3 (8x H200-SXM-141GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:LenovoAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity141 GB
accelerator_memory_configurationHBM3e
accelerator_model_nameNVIDIA H200-SXM-141GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB MTC40F2046S1RC48BA1
host_processor_caches
host_processor_core_count48
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8568Y+
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesH200 TGP 700W
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count10x 400Gbe Infiniband
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 504GB/s; PCIe-NIC: 500GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 30438.7 | Tokens/s 31973.7 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 30438.7 | Tokens/s 31973.7 |
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 19785.0 | Tokens/s 20552.1 |
| gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s 19785.0 | Tokens/s 20552.1 |
| bert-99 | F1: 89.9653 | Queries/s 56012.5 | Samples/s 70369.8 |
| bert-99.9 | F1: 90.7831 | Queries/s 52814.0 | Samples/s 64983.6 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s 16.9831 | Samples/s 17.5975 |
| 3d-unet-99 | DICE: 0.8531 | | Samples/s 54.5565 |
| 3d-unet-99.9 | DICE: 0.8608 | | Samples/s 54.5565 |
+ + + + \ No newline at end of file diff --git a/closed/Lenovo/results/SR650_V3_3xL40S_TRT/summary/README.md b/closed/Lenovo/results/SR650_V3_3xL40S_TRT/summary/README.md new file mode 100644 index 00000000..9484bab6 --- /dev/null +++ b/closed/Lenovo/results/SR650_V3_3xL40S_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Lenovo

+

ThinkSystem SR650 V3 (3x NVIDIA L40S, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:LenovoAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectN/A
accelerator_interconnectN/a
accelerator_interconnect_topology
accelerator_memory_capacity48 GB
accelerator_memory_configurationGDDR6
accelerator_model_nameNVIDIA L40S
accelerator_on-chip_memories
accelerators_per_node3

Processor and Memory Details

host_memory_capacity512 GB
host_memory_configuration16x 32GB M321R4GA3BB6-CQKDS
host_processor_caches
host_processor_core_count32
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Gold 6438N
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count1
host_networkingGig Ethernet
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.5
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.5, cuDNN 8.9.7, Driver 555.42
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 2475.22 | Tokens/s 2593.65 |
| bert-99 | F1: 89.9653 | Queries/s 9302.0 | Samples/s 9480.82 |
| retinanet | mAP: 37.1745 | Queries/s 2201.83 | Samples/s 2290.08 |
| resnet | acc: 75.6954 | Queries/s 132008.0 | Samples/s 132436.0 |
| 3d-unet-99 | DICE: 0.8531 | | Samples/s 11.6149 |
diff --git a/closed/Lenovo/results/SR650_V3_3xL40S_TRT/summary/summary.html b/closed/Lenovo/results/SR650_V3_3xL40S_TRT/summary/summary.html new file mode 100644 index 00000000..041dff63 --- /dev/null +++ b/closed/Lenovo/results/SR650_V3_3xL40S_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Lenovo

+

ThinkSystem SR650 V3 (3x NVIDIA L40S, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:LenovoAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectN/A
accelerator_interconnectN/a
accelerator_interconnect_topology
accelerator_memory_capacity48 GB
accelerator_memory_configurationGDDR6
accelerator_model_nameNVIDIA L40S
accelerator_on-chip_memories
accelerators_per_node3

Processor and Memory Details

host_memory_capacity512 GB
host_memory_configuration16x 32GB M321R4GA3BB6-CQKDS
host_processor_caches
host_processor_core_count32
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Gold 6438N
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count1
host_networkingGig Ethernet
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.5
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.5, cuDNN 8.9.7, Driver 555.42
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 2475.22 | Tokens/s 2593.65 |
| bert-99 | F1: 89.9653 | Queries/s 9302.0 | Samples/s 9480.82 |
| retinanet | mAP: 37.1745 | Queries/s 2201.83 | Samples/s 2290.08 |
| resnet | acc: 75.6954 | Queries/s 132008.0 | Samples/s 132436.0 |
| 3d-unet-99 | DICE: 0.8531 | | Samples/s 11.6149 |
+ + + + \ No newline at end of file diff --git a/closed/Lenovo/results/SR675_V3_8xH100_NVL_TRT/summary/README.md b/closed/Lenovo/results/SR675_V3_8xH100_NVL_TRT/summary/README.md new file mode 100644 index 00000000..6912e97b --- /dev/null +++ b/closed/Lenovo/results/SR675_V3_8xH100_NVL_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Lenovo

+

ThinkSystem SR675 V3 (8x H100-NVL-94GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:LenovoAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen4 x16
accelerator_interconnect18x 4th Gen NVLink, 600GB/s
accelerator_interconnect_topology
accelerator_memory_capacity94 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-NVL-94GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity3 TB
host_memory_configuration24x 128GB HMCT04MEERA131N
host_processor_caches
host_processor_core_count84
host_processor_frequency
host_processor_interconnect
host_processor_model_nameAMD EPYC 9634 84-Core Processor
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count4
host_networkingGig Ethernet
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.5
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.5, cuDNN 8.9.7, Driver 555.42
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 13074.7 | Tokens/s 15875.8 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 13074.7 | Tokens/s 15875.8 |
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 13083.5 | Tokens/s 14073.6 |
| bert-99 | F1: 89.9653 | Queries/s 35006.8 | Samples/s 47655.3 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s 11.6801 | Samples/s 11.827 |
| retinanet | mAP: 37.1745 | Queries/s 9001.86 | Samples/s 10867.0 |
| resnet | acc: 75.6954 | Queries/s 500148.0 | Samples/s 541887.0 |
| 3d-unet-99 | DICE: 0.8531 | | Samples/s 43.6883 |
| 3d-unet-99.9 | DICE: 0.8608 | | Samples/s 43.6883 |
diff --git a/closed/Lenovo/results/SR675_V3_8xH100_NVL_TRT/summary/summary.html b/closed/Lenovo/results/SR675_V3_8xH100_NVL_TRT/summary/summary.html new file mode 100644 index 00000000..d2eb5009 --- /dev/null +++ b/closed/Lenovo/results/SR675_V3_8xH100_NVL_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Lenovo

+

ThinkSystem SR675 V3 (8x H100-NVL-94GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:LenovoAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen4 x16
accelerator_interconnect18x 4th Gen NVLink, 600GB/s
accelerator_interconnect_topology
accelerator_memory_capacity94 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-NVL-94GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity3 TB
host_memory_configuration24x 128GB HMCT04MEERA131N
host_processor_caches
host_processor_core_count84
host_processor_frequency
host_processor_interconnect
host_processor_model_nameAMD EPYC 9634 84-Core Processor
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count4
host_networkingGig Ethernet
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.5
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.5, cuDNN 8.9.7, Driver 555.42
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 13074.7 | Tokens/s 15875.8 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 13074.7 | Tokens/s 15875.8 |
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 13083.5 | Tokens/s 14073.6 |
| bert-99 | F1: 89.9653 | Queries/s 35006.8 | Samples/s 47655.3 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s 11.6801 | Samples/s 11.827 |
| retinanet | mAP: 37.1745 | Queries/s 9001.86 | Samples/s 10867.0 |
| resnet | acc: 75.6954 | Queries/s 500148.0 | Samples/s 541887.0 |
| 3d-unet-99 | DICE: 0.8531 | | Samples/s 43.6883 |
| 3d-unet-99.9 | DICE: 0.8608 | | Samples/s 43.6883 |
+ + + + \ No newline at end of file diff --git a/closed/NVIDIA/results/B200-SXM-180GBx1_TRT/summary/README.md b/closed/NVIDIA/results/B200-SXM-180GBx1_TRT/summary/README.md new file mode 100644 index 00000000..cb26ebd4 --- /dev/null +++ b/closed/NVIDIA/results/B200-SXM-180GBx1_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NVIDIA

+

NVIDIA B200 (1x B200-SXM-180GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:NVIDIAAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacity180 GB
accelerator_memory_configurationHBM3e
accelerator_model_nameNVIDIA B200-SXM-180GB
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity408GB
host_memory_configuration6x 64GB MTC40F2046S1RC48BA1
host_processor_caches
host_processor_core_count12
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Silver 4410Y
host_processors_per_node1

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesB200 TGP 1000W
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count2x 10Gbe
host_networkingGig Ethernet
host_networking_topologyEthernet on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.1.0, CUDA 12.7
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.1.0, CUDA 12.7, cuDNN 8.9.7, Driver 565
sw_notesPrivate git hashes for the code and TRTLLM were used to generate the preview results. The git hashes are eba031f9e2e6cf3d1cdce0549511d27adf01a3f4 and 4e5b175cc80320789ba6846ced80a87f25e70fb2
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 10755.6 | Tokens/s 11264.4 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 10755.6 | Tokens/s 11264.4 |
diff --git a/closed/NVIDIA/results/B200-SXM-180GBx1_TRT/summary/summary.html b/closed/NVIDIA/results/B200-SXM-180GBx1_TRT/summary/summary.html new file mode 100644 index 00000000..57994d0b --- /dev/null +++ b/closed/NVIDIA/results/B200-SXM-180GBx1_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NVIDIA

+

NVIDIA B200 (1x B200-SXM-180GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:NVIDIAAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacity180 GB
accelerator_memory_configurationHBM3e
accelerator_model_nameNVIDIA B200-SXM-180GB
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity408GB
host_memory_configuration6x 64GB MTC40F2046S1RC48BA1
host_processor_caches
host_processor_core_count12
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Silver 4410Y
host_processors_per_node1

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesB200 TGP 1000W
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count2x 10Gbe
host_networkingGig Ethernet
host_networking_topologyEthernet on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.1.0, CUDA 12.7
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.1.0, CUDA 12.7, cuDNN 8.9.7, Driver 565
sw_notesPrivate git hashes for the code and TRTLLM were used to generate the preview results. The git hashes are eba031f9e2e6cf3d1cdce0549511d27adf01a3f4 and 4e5b175cc80320789ba6846ced80a87f25e70fb2
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 10755.6 | Tokens/s 11264.4 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 10755.6 | Tokens/s 11264.4 |
+ + + + \ No newline at end of file diff --git a/closed/NVIDIA/results/DGX-H100_H100-SXM-80GBx8_TRT/summary/README.md b/closed/NVIDIA/results/DGX-H100_H100-SXM-80GBx8_TRT/summary/README.md new file mode 100644 index 00000000..d15ad4c3 --- /dev/null +++ b/closed/NVIDIA/results/DGX-H100_H100-SXM-80GBx8_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NVIDIA

+

NVIDIA DGX H100 (8x H100-SXM-80GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:NVIDIAAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB MTC40F2046S1RC48BA1
host_processor_caches
host_processor_core_count56
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8480C
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count10x 400Gbe Infiniband
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 504GB/s; PCIe-NIC: 500GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 21605.8 | Tokens/s 24524.9 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 21605.8 | Tokens/s 24524.9 |
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 19233.8 | Tokens/s 19739.8 |
| gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s 19233.8 | Tokens/s 19739.8 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s 15.7189 | Samples/s 16.3484 |
| dlrm-v2-99 | AUC: 79.5069 | Queries/s 510155.0 | Samples/s 595658.0 |
| dlrm-v2-99.9 | AUC: 80.2297 | Queries/s 340067.0 | Samples/s 361613.0 |
diff --git a/closed/NVIDIA/results/DGX-H100_H100-SXM-80GBx8_TRT/summary/summary.html b/closed/NVIDIA/results/DGX-H100_H100-SXM-80GBx8_TRT/summary/summary.html new file mode 100644 index 00000000..d5f401b0 --- /dev/null +++ b/closed/NVIDIA/results/DGX-H100_H100-SXM-80GBx8_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NVIDIA

+

NVIDIA DGX H100 (8x H100-SXM-80GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:NVIDIAAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB MTC40F2046S1RC48BA1
host_processor_caches
host_processor_core_count56
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8480C
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count10x 400Gbe Infiniband
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 504GB/s; PCIe-NIC: 500GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 21605.8 | Tokens/s 24524.9 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 21605.8 | Tokens/s 24524.9 |
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 19233.8 | Tokens/s 19739.8 |
| gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s 19233.8 | Tokens/s 19739.8 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s 15.7189 | Samples/s 16.3484 |
| dlrm-v2-99 | AUC: 79.5069 | Queries/s 510155.0 | Samples/s 595658.0 |
| dlrm-v2-99.9 | AUC: 80.2297 | Queries/s 340067.0 | Samples/s 361613.0 |
+ + + + \ No newline at end of file diff --git a/closed/NVIDIA/results/GH200-GraceHopper-Superchip_GH200-144GB_aarch64x1_TRT/summary/README.md b/closed/NVIDIA/results/GH200-GraceHopper-Superchip_GH200-144GB_aarch64x1_TRT/summary/README.md new file mode 100644 index 00000000..7aa18765 --- /dev/null +++ b/closed/NVIDIA/results/GH200-GraceHopper-Superchip_GH200-144GB_aarch64x1_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NVIDIA

+

NVIDIA GH200 NVL2 Platform (1x GH200-144GB_aarch64, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:NVIDIAAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectNVLink-C2C
accelerator_interconnect1x 400Gbe Infiniband
accelerator_interconnect_topology
accelerator_memory_capacity144 GB
accelerator_memory_configurationHBM3e
accelerator_model_nameNVIDIA GH200 Grace Hopper Superchip 144GB
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity480 GB
host_memory_configuration15x 16DP (32GB) LPDDR5x
host_processor_caches
host_processor_core_count72
host_processor_frequency
host_processor_interconnect
host_processor_model_nameNVIDIA Grace CPU
host_processors_per_node1

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesNVIDIA GH200 144GB HBM3e
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count1x Mellanox MT2894 [ConnectX-6 Lx]
host_networkingEthernet(IPoIB); Data bandwidth for GPU-NIC is 252.06 GB/s
host_networking_topologyEthernet(IPoIB)/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.2
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 560.30
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 3883.67 | Tokens/s 4067.52 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 3883.67 | Tokens/s 4067.52 |
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 2513.38 | Tokens/s 2627.69 |
| gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s 2513.38 | Tokens/s 2627.69 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s 2.01829 | Samples/s 2.30033 |
| dlrm-v2-99 | AUC: 79.5069 | Queries/s 81009.7 | Samples/s 86731.2 |
| dlrm-v2-99.9 | AUC: 80.2297 | Queries/s 51014.2 | Samples/s 53420.6 |
diff --git a/closed/NVIDIA/results/GH200-GraceHopper-Superchip_GH200-144GB_aarch64x1_TRT/summary/summary.html b/closed/NVIDIA/results/GH200-GraceHopper-Superchip_GH200-144GB_aarch64x1_TRT/summary/summary.html new file mode 100644 index 00000000..611d70d6 --- /dev/null +++ b/closed/NVIDIA/results/GH200-GraceHopper-Superchip_GH200-144GB_aarch64x1_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NVIDIA

+

NVIDIA GH200 NVL2 Platform (1x GH200-144GB_aarch64, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:NVIDIAAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectNVLink-C2C
accelerator_interconnect1x 400Gbe Infiniband
accelerator_interconnect_topology
accelerator_memory_capacity144 GB
accelerator_memory_configurationHBM3e
accelerator_model_nameNVIDIA GH200 Grace Hopper Superchip 144GB
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity480 GB
host_memory_configuration15x 16DP (32GB) LPDDR5x
host_processor_caches
host_processor_core_count72
host_processor_frequency
host_processor_interconnect
host_processor_model_nameNVIDIA Grace CPU
host_processors_per_node1

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesNVIDIA GH200 144GB HBM3e
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count1x Mellanox MT2894 [ConnectX-6 Lx]
host_networkingEthernet(IPoIB); Data bandwidth for GPU-NIC is 252.06 GB/s
host_networking_topologyEthernet(IPoIB)/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.2
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 560.30
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 3883.67 | Tokens/s 4067.52 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 3883.67 | Tokens/s 4067.52 |
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 2513.38 | Tokens/s 2627.69 |
| gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s 2513.38 | Tokens/s 2627.69 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s 2.01829 | Samples/s 2.30033 |
| dlrm-v2-99 | AUC: 79.5069 | Queries/s 81009.7 | Samples/s 86731.2 |
| dlrm-v2-99.9 | AUC: 80.2297 | Queries/s 51014.2 | Samples/s 53420.6 |
+ + + + \ No newline at end of file diff --git a/closed/NVIDIA/results/H200-SXM-141GB-CTSx1_TRT/summary/README.md b/closed/NVIDIA/results/H200-SXM-141GB-CTSx1_TRT/summary/README.md new file mode 100644 index 00000000..c28d681b --- /dev/null +++ b/closed/NVIDIA/results/H200-SXM-141GB-CTSx1_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NVIDIA

+

NVIDIA H200 (1x H200-SXM-141GB-CTS, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:NVIDIAAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity141 GB
accelerator_memory_configurationHBM3e
accelerator_model_nameNVIDIA H200-SXM-141GB-CTS
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB MTC40F2046S1RC48BA12
host_processor_caches
host_processor_core_count56
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8480C
host_processors_per_node2

Other Hardware Details

coolingAir/liquid cooling
disk_controllersNVMe
disk_drivesSSD
hw_notesH200 TGP 1000W
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count10x 400Gbe Infiniband
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 504GB/s; PCIe-NIC: 500GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 4202.3 | Tokens/s 4487.88 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 4202.3 | Tokens/s 4487.88 |
diff --git a/closed/NVIDIA/results/H200-SXM-141GB-CTSx1_TRT/summary/summary.html b/closed/NVIDIA/results/H200-SXM-141GB-CTSx1_TRT/summary/summary.html new file mode 100644 index 00000000..eaecb3f9 --- /dev/null +++ b/closed/NVIDIA/results/H200-SXM-141GB-CTSx1_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NVIDIA

+

NVIDIA H200 (1x H200-SXM-141GB-CTS, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:NVIDIAAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity141 GB
accelerator_memory_configurationHBM3e
accelerator_model_nameNVIDIA H200-SXM-141GB-CTS
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB MTC40F2046S1RC48BA12
host_processor_caches
host_processor_core_count56
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8480C
host_processors_per_node2

Other Hardware Details

coolingAir/liquid cooling
disk_controllersNVMe
disk_drivesSSD
hw_notesH200 TGP 1000W
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count10x 400Gbe Infiniband
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 504GB/s; PCIe-NIC: 500GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 4202.3 | Tokens/s 4487.88 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 4202.3 | Tokens/s 4487.88 |
+ + + + \ No newline at end of file diff --git a/closed/NVIDIA/results/H200-SXM-141GB-CTSx8_TRT/summary/README.md b/closed/NVIDIA/results/H200-SXM-141GB-CTSx8_TRT/summary/README.md new file mode 100644 index 00000000..f6a36057 --- /dev/null +++ b/closed/NVIDIA/results/H200-SXM-141GB-CTSx8_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NVIDIA

+

NVIDIA H200 (8x H200-SXM-141GB-CTS, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:NVIDIAAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity141 GB
accelerator_memory_configurationHBM3e
accelerator_model_nameNVIDIA H200-SXM-141GB-CTS
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB MTC40F2046S1RC48BA12
host_processor_caches
host_processor_core_count56
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8480C
host_processors_per_node2

Other Hardware Details

coolingAir/liquid cooling
disk_controllersNVMe
disk_drivesSSD
hw_notesH200 TGP 1000W
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count10x 400Gbe Infiniband
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 504GB/s; PCIe-NIC: 500GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 32789.7 | Tokens/s 34864.2 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 32789.7 | Tokens/s 34864.2 |
diff --git a/closed/NVIDIA/results/H200-SXM-141GB-CTSx8_TRT/summary/summary.html b/closed/NVIDIA/results/H200-SXM-141GB-CTSx8_TRT/summary/summary.html new file mode 100644 index 00000000..8d383fa8 --- /dev/null +++ b/closed/NVIDIA/results/H200-SXM-141GB-CTSx8_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NVIDIA

+

NVIDIA H200 (8x H200-SXM-141GB-CTS, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:NVIDIAAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity141 GB
accelerator_memory_configurationHBM3e
accelerator_model_nameNVIDIA H200-SXM-141GB-CTS
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB MTC40F2046S1RC48BA12
host_processor_caches
host_processor_core_count56
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8480C
host_processors_per_node2

Other Hardware Details

coolingAir/liquid cooling
disk_controllersNVMe
disk_drivesSSD
hw_notesH200 TGP 1000W
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count10x 400Gbe Infiniband
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 504GB/s; PCIe-NIC: 500GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 32789.7 | Tokens/s 34864.2 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 32789.7 | Tokens/s 34864.2 |
+ + + + \ No newline at end of file diff --git a/closed/NVIDIA/results/H200-SXM-141GBx1_TRT/summary/README.md b/closed/NVIDIA/results/H200-SXM-141GBx1_TRT/summary/README.md new file mode 100644 index 00000000..e8bef60d --- /dev/null +++ b/closed/NVIDIA/results/H200-SXM-141GBx1_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NVIDIA

+

NVIDIA H200 (1x H200-SXM-141GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:NVIDIAAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity141 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H200-SXM-141GB
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB MTC40F2046S1RC48BA1
host_processor_caches
host_processor_core_count56
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8480C
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesH200 TGP 700W
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count10x 400Gbe Infiniband
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 504GB/s; PCIe-NIC: 500GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 2405.86 | Tokens/s 2579.51 |
| gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s 2405.86 | Tokens/s 2579.51 |
diff --git a/closed/NVIDIA/results/H200-SXM-141GBx1_TRT/summary/summary.html b/closed/NVIDIA/results/H200-SXM-141GBx1_TRT/summary/summary.html new file mode 100644 index 00000000..a0a8d382 --- /dev/null +++ b/closed/NVIDIA/results/H200-SXM-141GBx1_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NVIDIA

+

NVIDIA H200 (1x H200-SXM-141GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:NVIDIAAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity141 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H200-SXM-141GB
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB MTC40F2046S1RC48BA1
host_processor_caches
host_processor_core_count56
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8480C
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesH200 TGP 700W
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count10x 400Gbe Infiniband
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 504GB/s; PCIe-NIC: 500GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 2405.86 | Tokens/s 2579.51 |
| gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s 2405.86 | Tokens/s 2579.51 |
+ + + + \ No newline at end of file diff --git a/closed/NVIDIA/results/H200-SXM-141GBx8_TRT/summary/README.md b/closed/NVIDIA/results/H200-SXM-141GBx8_TRT/summary/README.md new file mode 100644 index 00000000..0f5247b3 --- /dev/null +++ b/closed/NVIDIA/results/H200-SXM-141GBx8_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NVIDIA

+

NVIDIA H200 (8x H200-SXM-141GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:NVIDIAAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity141 GB
accelerator_memory_configurationHBM3e
accelerator_model_nameNVIDIA H200-SXM-141GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB MTC40F2046S1RC48BA1
host_processor_caches
host_processor_core_count56
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8480C
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesH200 TGP 700W
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count10x 400Gbe Infiniband
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 504GB/s; PCIe-NIC: 500GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 29228.2 | Tokens/s 31302.7 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 29228.2 | Tokens/s 31302.7 |
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 19243.3 | Tokens/s 20086.1 |
| gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s 19243.3 | Tokens/s 20086.1 |
| bert-99 | F1: 89.9653 | Queries/s 57609.3 | Samples/s 73309.5 |
| bert-99.9 | F1: 90.7831 | Queries/s 51212.0 | Samples/s 63950.4 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s 16.782 | Samples/s 17.4186 |
| dlrm-v2-99 | AUC: 79.5069 | Queries/s 585208.0 | Samples/s 637342.0 |
| dlrm-v2-99.9 | AUC: 80.2297 | Queries/s 370083.0 | Samples/s 390953.0 |
| retinanet | mAP: 37.1745 | Queries/s 13604.0 | Samples/s 14439.0 |
| resnet | acc: 75.6954 | Queries/s 632229.0 | Samples/s 756960.0 |
| 3d-unet-99 | DICE: 0.8531 | | Samples/s 54.7136 |
| 3d-unet-99.9 | DICE: 0.8608 | | Samples/s 54.7136 |
diff --git a/closed/NVIDIA/results/H200-SXM-141GBx8_TRT/summary/summary.html b/closed/NVIDIA/results/H200-SXM-141GBx8_TRT/summary/summary.html new file mode 100644 index 00000000..090f9dfb --- /dev/null +++ b/closed/NVIDIA/results/H200-SXM-141GBx8_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NVIDIA

+

NVIDIA H200 (8x H200-SXM-141GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:NVIDIAAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity141 GB
accelerator_memory_configurationHBM3e
accelerator_model_nameNVIDIA H200-SXM-141GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB MTC40F2046S1RC48BA1
host_processor_caches
host_processor_core_count56
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8480C
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesH200 TGP 700W
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count10x 400Gbe Infiniband
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 504GB/s; PCIe-NIC: 500GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 29228.2 | Tokens/s 31302.7 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 29228.2 | Tokens/s 31302.7 |
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 19243.3 | Tokens/s 20086.1 |
| gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s 19243.3 | Tokens/s 20086.1 |
| bert-99 | F1: 89.9653 | Queries/s 57609.3 | Samples/s 73309.5 |
| bert-99.9 | F1: 90.7831 | Queries/s 51212.0 | Samples/s 63950.4 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s 16.782 | Samples/s 17.4186 |
| dlrm-v2-99 | AUC: 79.5069 | Queries/s 585208.0 | Samples/s 637342.0 |
| dlrm-v2-99.9 | AUC: 80.2297 | Queries/s 370083.0 | Samples/s 390953.0 |
| retinanet | mAP: 37.1745 | Queries/s 13604.0 | Samples/s 14439.0 |
| resnet | acc: 75.6954 | Queries/s 632229.0 | Samples/s 756960.0 |
| 3d-unet-99 | DICE: 0.8531 | | Samples/s 54.7136 |
| 3d-unet-99.9 | DICE: 0.8608 | | Samples/s 54.7136 |
+ + + + \ No newline at end of file diff --git a/closed/NVIDIA/results/H200-SXM-141GBx8_TRT_MaxQ/summary/README.md b/closed/NVIDIA/results/H200-SXM-141GBx8_TRT_MaxQ/summary/README.md new file mode 100644 index 00000000..c81320e3 --- /dev/null +++ b/closed/NVIDIA/results/H200-SXM-141GBx8_TRT_MaxQ/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NVIDIA

+

NVIDIA H200 (8x H200-SXM-141GB, MaxQ, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:NVIDIAAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity141 GB
accelerator_memory_configurationHBM3e
accelerator_model_nameNVIDIA H200-SXM-141GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB MTC40F2046S1RC48BA1
host_processor_caches
host_processor_core_count56
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8480C
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesH200 TGP 700W
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count10x 400Gbe Infiniband
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 504GB/s; PCIe-NIC: 500GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 23113.1 | Tokens/s 25262.1 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 23113.1 | Tokens/s 25262.1 |
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 11700.8 | Tokens/s 13096.6 |
| gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s 11700.8 | Tokens/s 13096.6 |
| bert-99 | F1: 89.9653 | Queries/s 41599.4 | Samples/s 54063.2 |
| bert-99.9 | F1: 90.7831 | Queries/s 39804.3 | Samples/s 46534.6 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s 12.708 | Samples/s 13.202 |
| dlrm-v2-99 | AUC: 79.5069 | Queries/s 420107.0 | Samples/s 503719.0 |
| dlrm-v2-99.9 | AUC: 80.2297 | Queries/s 280045.0 | Samples/s 305223.0 |
| retinanet | mAP: 37.1745 | Queries/s 9602.95 | Samples/s 10802.5 |
| resnet | acc: 75.6954 | Queries/s 480131.0 | Samples/s 556234.0 |
| 3d-unet-99 | DICE: 0.8531 | | Samples/s 41.664 |
| 3d-unet-99.9 | DICE: 0.8608 | | Samples/s 41.664 |
diff --git a/closed/NVIDIA/results/H200-SXM-141GBx8_TRT_MaxQ/summary/summary.html b/closed/NVIDIA/results/H200-SXM-141GBx8_TRT_MaxQ/summary/summary.html new file mode 100644 index 00000000..f640839d --- /dev/null +++ b/closed/NVIDIA/results/H200-SXM-141GBx8_TRT_MaxQ/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NVIDIA

+

NVIDIA H200 (8x H200-SXM-141GB, MaxQ, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:NVIDIAAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity141 GB
accelerator_memory_configurationHBM3e
accelerator_model_nameNVIDIA H200-SXM-141GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB MTC40F2046S1RC48BA1
host_processor_caches
host_processor_core_count56
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8480C
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesH200 TGP 700W
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count10x 400Gbe Infiniband
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 504GB/s; PCIe-NIC: 500GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 23113.1 | Tokens/s 25262.1 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 23113.1 | Tokens/s 25262.1 |
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 11700.8 | Tokens/s 13096.6 |
| gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s 11700.8 | Tokens/s 13096.6 |
| bert-99 | F1: 89.9653 | Queries/s 41599.4 | Samples/s 54063.2 |
| bert-99.9 | F1: 90.7831 | Queries/s 39804.3 | Samples/s 46534.6 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s 12.708 | Samples/s 13.202 |
| dlrm-v2-99 | AUC: 79.5069 | Queries/s 420107.0 | Samples/s 503719.0 |
| dlrm-v2-99.9 | AUC: 80.2297 | Queries/s 280045.0 | Samples/s 305223.0 |
| retinanet | mAP: 37.1745 | Queries/s 9602.95 | Samples/s 10802.5 |
| resnet | acc: 75.6954 | Queries/s 480131.0 | Samples/s 556234.0 |
| 3d-unet-99 | DICE: 0.8531 | | Samples/s 41.664 |
| 3d-unet-99.9 | DICE: 0.8608 | | Samples/s 41.664 |
+ + + + \ No newline at end of file diff --git a/closed/NVIDIA/results/H200-SXM-141GBx8_TRT_Triton/summary/README.md b/closed/NVIDIA/results/H200-SXM-141GBx8_TRT_Triton/summary/README.md new file mode 100644 index 00000000..41b40ad4 --- /dev/null +++ b/closed/NVIDIA/results/H200-SXM-141GBx8_TRT_Triton/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NVIDIA

+

NVIDIA H200 (8x H200-SXM-141GB, TensorRT, Triton)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:NVIDIAAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity141 GB
accelerator_memory_configurationHBM3e
accelerator_model_nameNVIDIA H200-SXM-141GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB MTC40F2046S1RC48BA1
host_processor_caches
host_processor_core_count56
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8480C
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesH200 TGP 700W
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count10x 400Gbe Infiniband
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 504GB/s; PCIe-NIC: 500GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54, Triton 24.06
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 30128.4 | Tokens/s 31059.3 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 30128.4 | Tokens/s 31059.3 |
diff --git a/closed/NVIDIA/results/H200-SXM-141GBx8_TRT_Triton/summary/summary.html b/closed/NVIDIA/results/H200-SXM-141GBx8_TRT_Triton/summary/summary.html new file mode 100644 index 00000000..33d9ae93 --- /dev/null +++ b/closed/NVIDIA/results/H200-SXM-141GBx8_TRT_Triton/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NVIDIA

+

NVIDIA H200 (8x H200-SXM-141GB, TensorRT, Triton)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:NVIDIAAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity141 GB
accelerator_memory_configurationHBM3e
accelerator_model_nameNVIDIA H200-SXM-141GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB MTC40F2046S1RC48BA1
host_processor_caches
host_processor_core_count56
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8480C
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesH200 TGP 700W
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count10x 400Gbe Infiniband
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 504GB/s; PCIe-NIC: 500GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54, Triton 24.06
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 30128.4 | Tokens/s 31059.3 |
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 30128.4 | Tokens/s 31059.3 |
+ + + + \ No newline at end of file diff --git a/closed/NVIDIA/results/Orin_TRT/summary/README.md b/closed/NVIDIA/results/Orin_TRT/summary/README.md new file mode 100644 index 00000000..31063082 --- /dev/null +++ b/closed/NVIDIA/results/Orin_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NVIDIA

+

NVIDIA Jetson AGX Orin Developer Kit 64G (TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:NVIDIAAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacityShared with host
accelerator_memory_configurationLPDDR5
accelerator_model_nameNVIDIA Jetson AGX Orin 64G
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity64 GB
host_memory_configuration64GB 256-bit LPDDR5
host_processor_caches
host_processor_core_count12
host_processor_frequency
host_processor_interconnect
host_processor_model_name12-core ARM Cortex-A78AE CPU
host_processors_per_node1

Other Hardware Details

coolingAir-cooled
disk_controllerseMMC 5.1
disk_driveseMMC 5.1
hw_notesGPU and both DLAs are used for resnet50 and retinanet in the Offline scenario
other_hardware
power_management
power_supply_detailsDell USB-C 130.0W Adapter (HA130PM170)
power_supply_quantity_and_rating_watts130W

Network and Interconnect Details

host_network_card_count1 Integrated
host_networkingGig Ethernet
host_networking_topologyUSB forwarded
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkJetpack 6.0, TensorRT 10.1, CUDA 12.2
management_firmware_version
nics_enabled_firmware
operating_systemJetson r36.3.1 L4T
other_software_stackJetpack 6.0, TensorRT 10.1, CUDA 12.2, cuDNN 8.9.4
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | | Tokens/s 64.4734 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | | Samples/s 0.101697 |
diff --git a/closed/NVIDIA/results/Orin_TRT/summary/summary.html b/closed/NVIDIA/results/Orin_TRT/summary/summary.html new file mode 100644 index 00000000..8a53048b --- /dev/null +++ b/closed/NVIDIA/results/Orin_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NVIDIA

+

NVIDIA Jetson AGX Orin Developer Kit 64G (TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:NVIDIAAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacityShared with host
accelerator_memory_configurationLPDDR5
accelerator_model_nameNVIDIA Jetson AGX Orin 64G
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity64 GB
host_memory_configuration64GB 256-bit LPDDR5
host_processor_caches
host_processor_core_count12
host_processor_frequency
host_processor_interconnect
host_processor_model_name12-core ARM Cortex-A78AE CPU
host_processors_per_node1

Other Hardware Details

coolingAir-cooled
disk_controllerseMMC 5.1
disk_driveseMMC 5.1
hw_notesGPU and both DLAs are used for resnet50 and retinanet in the Offline scenario
other_hardware
power_management
power_supply_detailsDell USB-C 130.0W Adapter (HA130PM170)
power_supply_quantity_and_rating_watts130W

Network and Interconnect Details

host_network_card_count1 Integrated
host_networkingGig Ethernet
host_networking_topologyUSB forwarded
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkJetpack 6.0, TensorRT 10.1, CUDA 12.2
management_firmware_version
nics_enabled_firmware
operating_systemJetson r36.3.1 L4T
other_software_stackJetpack 6.0, TensorRT 10.1, CUDA 12.2, cuDNN 8.9.4
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | | Tokens/s 64.4734 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | | Samples/s 0.101697 |
+ + + + \ No newline at end of file diff --git a/closed/NeuralMagic/results/vLLM_8xL40S-reference-cpu-pytorch-v2.3.1-default_config/summary/README.md b/closed/NeuralMagic/results/vLLM_8xL40S-reference-cpu-pytorch-v2.3.1-default_config/summary/README.md new file mode 100644 index 00000000..7d2ed487 --- /dev/null +++ b/closed/NeuralMagic/results/vLLM_8xL40S-reference-cpu-pytorch-v2.3.1-default_config/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NeuralMagic

+

Crusoe Cloud L40S (8x L40S PCIe, vLLM, FP8)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:NeuralMagicAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen4 x16
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacity48 GB
accelerator_memory_configurationGDDR6
accelerator_model_nameNVIDIA L40S
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity1.5T
host_memory_configurationundefined
host_processor_cachesL1d cache: 1.3 MiB (40 instances), L1i cache: 1.3 MiB (40 instances), L2 cache: 40 MiB (40 instances), L3 cache: 160 MiB (5 instances)
host_processor_core_count4
host_processor_frequencyundefined
host_processor_interconnect
host_processor_model_nameAMD EPYC 9254 24-Core Processor
host_processors_per_node2

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_network_card_count1
host_networkingGig Ethernet
host_networking_topologyN/A

Software Details

frameworkvLLM 0.5.2
operating_systemUbuntu 22.04 (linux-5.15.0-94-generic-glibc2.35)
other_software_stackPython: 3.10.12, LLVM-15.0.6
sw_notesAutomated by MLCommons CM v2.3.3.
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 592.265 | Tokens/s 948.198 |
diff --git a/closed/NeuralMagic/results/vLLM_8xL40S-reference-cpu-pytorch-v2.3.1-default_config/summary/summary.html b/closed/NeuralMagic/results/vLLM_8xL40S-reference-cpu-pytorch-v2.3.1-default_config/summary/summary.html new file mode 100644 index 00000000..20b350c8 --- /dev/null +++ b/closed/NeuralMagic/results/vLLM_8xL40S-reference-cpu-pytorch-v2.3.1-default_config/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NeuralMagic

+

Crusoe Cloud L40S (8x L40S PCIe, vLLM, FP8)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:NeuralMagicAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen4 x16
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacity48 GB
accelerator_memory_configurationGDDR6
accelerator_model_nameNVIDIA L40S
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity1.5T
host_memory_configurationundefined
host_processor_cachesL1d cache: 1.3 MiB (40 instances), L1i cache: 1.3 MiB (40 instances), L2 cache: 40 MiB (40 instances), L3 cache: 160 MiB (5 instances)
host_processor_core_count4
host_processor_frequencyundefined
host_processor_interconnect
host_processor_model_nameAMD EPYC 9254 24-Core Processor
host_processors_per_node2

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_network_card_count1
host_networkingGig Ethernet
host_networking_topologyN/A

Software Details

frameworkvLLM 0.5.2
operating_systemUbuntu 22.04 (linux-5.15.0-94-generic-glibc2.35)
other_software_stackPython: 3.10.12, LLVM-15.0.6
sw_notesAutomated by MLCommons CM v2.3.3.
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| llama2-70b-99 | ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 592.265 | Tokens/s 948.198 |
+ + + + \ No newline at end of file diff --git a/closed/Oracle/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/README.md b/closed/Oracle/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/README.md new file mode 100644 index 00000000..d8cd0432 --- /dev/null +++ b/closed/Oracle/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Oracle

+

NVIDIA GH200-GraceHopper-Superchip (1x GH200-96GB_aarch64, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:OracleAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectNVLink-C2C
accelerator_interconnect1x 400Gbe Infiniband
accelerator_interconnect_topology
accelerator_memory_capacity96 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA GH200 Grace Hopper Superchip 96GB
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity512 GB
host_memory_configuration16x 16DP (32GB) LPDDR5x
host_processor_caches
host_processor_core_count72
host_processor_frequency
host_processor_interconnect
host_processor_model_nameNVIDIA Grace CPU
host_processors_per_node1

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD, Block Storage
hw_notesNVIDIA MGX Reference Platform;
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count1x 10Gbe Intel Ethernet X550T
host_networkingEthernet; Data bandwidth for GPU-NIC is 252.06 GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.2
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.6, Driver 550.90.07, DALI 1.28.0
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 2159.58 | Tokens/s 2695.15 |
| gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s 2159.58 | Tokens/s 2695.15 |
| bert-99 | F1: 89.9653 | Queries/s 6501.91 | Samples/s 9864.25 |
| bert-99.9 | F1: 90.7831 | Queries/s 4502.76 | Samples/s 8779.27 |
| retinanet | mAP: 37.1745 | Queries/s 1731.13 | Samples/s 1923.46 |
| resnet | acc: 75.6954 | Queries/s 77012.2 | Samples/s 95104.5 |
| 3d-unet-99 | DICE: 0.8531 | | Samples/s 6.73664 |
| 3d-unet-99.9 | DICE: 0.8608 | | Samples/s 6.73664 |
diff --git a/closed/Oracle/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/summary.html b/closed/Oracle/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/summary.html new file mode 100644 index 00000000..8a53261d --- /dev/null +++ b/closed/Oracle/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Oracle

+

NVIDIA GH200-GraceHopper-Superchip (1x GH200-96GB_aarch64, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:OracleAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectNVLink-C2C
accelerator_interconnect1x 400Gbe Infiniband
accelerator_interconnect_topology
accelerator_memory_capacity96 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA GH200 Grace Hopper Superchip 96GB
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity512 GB
host_memory_configuration16x 16DP (32GB) LPDDR5x
host_processor_caches
host_processor_core_count72
host_processor_frequency
host_processor_interconnect
host_processor_model_nameNVIDIA Grace CPU
host_processors_per_node1

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD, Block Storage
hw_notesNVIDIA MGX Reference Platform;
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count1x 10Gbe Intel Ethernet X550T
host_networkingEthernet; Data bandwidth for GPU-NIC is 252.06 GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.2
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.6, Driver 550.90.07, DALI 1.28.0
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 2159.58 | Tokens/s 2695.15 |
| gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s 2159.58 | Tokens/s 2695.15 |
| bert-99 | F1: 89.9653 | Queries/s 6501.91 | Samples/s 9864.25 |
| bert-99.9 | F1: 90.7831 | Queries/s 4502.76 | Samples/s 8779.27 |
| retinanet | mAP: 37.1745 | Queries/s 1731.13 | Samples/s 1923.46 |
| resnet | acc: 75.6954 | Queries/s 77012.2 | Samples/s 95104.5 |
| 3d-unet-99 | DICE: 0.8531 | | Samples/s 6.73664 |
| 3d-unet-99.9 | DICE: 0.8608 | | Samples/s 6.73664 |
+ + + + \ No newline at end of file diff --git a/closed/Quanta_Cloud_Technology/results/1-node-2S-EMR-PyTorch/summary/README.md b/closed/Quanta_Cloud_Technology/results/1-node-2S-EMR-PyTorch/summary/README.md new file mode 100644 index 00000000..12b46cf7 --- /dev/null +++ b/closed/Quanta_Cloud_Technology/results/1-node-2S-EMR-PyTorch/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Quanta_Cloud_Technology

+

1-node-2S-EMR-PyTorch

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:Quanta_Cloud_TechnologyAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacityN/A
accelerator_memory_configurationN/A
accelerator_model_nameN/A
accelerator_on-chip_memories
accelerators_per_nodeN/A

Processor and Memory Details

host_memory_capacity1024 GB
host_memory_configurationDDR5-5600 64GB x16
host_processor_caches
host_processor_core_count64
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8592+
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notesQuantaGrid D54X-1U
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count1x 400Gb InfiniBand
host_networkingInfiniBand; Data bandwidth for PCIe-NIC: 50GB/s
host_networking_topologyEthernet/InfiniBand on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkPyTorch
management_firmware_version
nics_enabled_firmware
operating_systemRocky Linux 9.2
other_software_stack5.14.0-284.11.1.el9_2.x86_64
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 113.732 | Tokens/s 239.034 |
| bert-99 | F1: 89.9653 | Queries/s 1241.42 | Samples/s 1612.03 |
| dlrm-v2-99.9 | AUC: 80.2297 | Queries/s 9102.37 | Samples/s 9962.84 |
| retinanet | mAP: 37.1745 | Queries/s 280.447 | Samples/s 372.132 |
| resnet | acc: 75.6954 | Queries/s 22501.7 | Samples/s 24491.1 |
| 3d-unet-99.9 | DICE: 0.8608 | | Samples/s 1.86148 |
diff --git a/closed/Quanta_Cloud_Technology/results/1-node-2S-EMR-PyTorch/summary/summary.html b/closed/Quanta_Cloud_Technology/results/1-node-2S-EMR-PyTorch/summary/summary.html new file mode 100644 index 00000000..9a18f441 --- /dev/null +++ b/closed/Quanta_Cloud_Technology/results/1-node-2S-EMR-PyTorch/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Quanta_Cloud_Technology

+

1-node-2S-EMR-PyTorch

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:Quanta_Cloud_TechnologyAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacityN/A
accelerator_memory_configurationN/A
accelerator_model_nameN/A
accelerator_on-chip_memories
accelerators_per_nodeN/A

Processor and Memory Details

host_memory_capacity1024 GB
host_memory_configurationDDR5-5600 64GB x16
host_processor_caches
host_processor_core_count64
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8592+
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notesQuantaGrid D54X-1U
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count1x 400Gb InfiniBand
host_networkingInfiniBand; Data bandwidth for PCIe-NIC: 50GB/s
host_networking_topologyEthernet/InfiniBand on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkPyTorch
management_firmware_version
nics_enabled_firmware
operating_systemRocky Linux 9.2
other_software_stack5.14.0-284.11.1.el9_2.x86_64
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 113.732 | Tokens/s 239.034 |
| bert-99 | F1: 89.9653 | Queries/s 1241.42 | Samples/s 1612.03 |
| dlrm-v2-99.9 | AUC: 80.2297 | Queries/s 9102.37 | Samples/s 9962.84 |
| retinanet | mAP: 37.1745 | Queries/s 280.447 | Samples/s 372.132 |
| resnet | acc: 75.6954 | Queries/s 22501.7 | Samples/s 24491.1 |
| 3d-unet-99.9 | DICE: 0.8608 | | Samples/s 1.86148 |
+ + + + \ No newline at end of file diff --git a/closed/Quanta_Cloud_Technology/results/D54U_3U_H100_PCIe_80GBx4_TRT/summary/README.md b/closed/Quanta_Cloud_Technology/results/D54U_3U_H100_PCIe_80GBx4_TRT/summary/README.md new file mode 100644 index 00000000..c0bf986c --- /dev/null +++ b/closed/Quanta_Cloud_Technology/results/D54U_3U_H100_PCIe_80GBx4_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Quanta_Cloud_Technology

+

D54U_3U_H100_PCIe_80GBx4_TRT

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:Quanta_Cloud_TechnologyAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnectPCIe Gen5 x16, NVLink 600GB/s
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM2e
accelerator_model_nameNVIDIA H100-PCIe-80GB
accelerator_on-chip_memories
accelerators_per_node4

Processor and Memory Details

host_memory_capacity1 TB
host_memory_configuration DDR4-4800 64GB x 16
host_processor_caches
host_processor_core_count52
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8470
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notesQuantaGrid D54U-3U
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count2x 400Gb InfiniBand
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 252GB/s; PCIe-NIC: 100GB/s
host_networking_topologyEthernet/InfiniBand on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0.19, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemRocky Linux 9.2
other_software_stackCUDA 12.4, cuDNN 8.9.7.29, Driver 550.90.07
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| bert-99 | F1: 89.9653 | Queries/s 17759.3 | Samples/s 23131.4 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s 4.0094 | Samples/s 4.91259 |
| dlrm-v2-99 | AUC: 79.5069 | Queries/s 175023.0 | Samples/s 184239.0 |
| dlrm-v2-99.9 | AUC: 80.2297 | Queries/s 100010.0 | Samples/s 106363.0 |
| retinanet | mAP: 37.1745 | Queries/s 4003.24 | Samples/s 4633.9 |
| resnet | acc: 75.6954 | Queries/s 188028.0 | Samples/s 224868.0 |
| 3d-unet-99 | DICE: 0.8531 | | Samples/s 18.447 |
| 3d-unet-99.9 | DICE: 0.8608 | | Samples/s 18.447 |
diff --git a/closed/Quanta_Cloud_Technology/results/D54U_3U_H100_PCIe_80GBx4_TRT/summary/summary.html b/closed/Quanta_Cloud_Technology/results/D54U_3U_H100_PCIe_80GBx4_TRT/summary/summary.html new file mode 100644 index 00000000..d94661c8 --- /dev/null +++ b/closed/Quanta_Cloud_Technology/results/D54U_3U_H100_PCIe_80GBx4_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Quanta_Cloud_Technology

+

D54U_3U_H100_PCIe_80GBx4_TRT

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:Quanta_Cloud_TechnologyAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnectPCIe Gen5 x16, NVLink 600GB/s
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM2e
accelerator_model_nameNVIDIA H100-PCIe-80GB
accelerator_on-chip_memories
accelerators_per_node4

Processor and Memory Details

host_memory_capacity1 TB
host_memory_configuration DDR4-4800 64GB x 16
host_processor_caches
host_processor_core_count52
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8470
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notesQuantaGrid D54U-3U
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count2x 400Gb InfiniBand
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 252GB/s; PCIe-NIC: 100GB/s
host_networking_topologyEthernet/InfiniBand on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0.19, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemRocky Linux 9.2
other_software_stackCUDA 12.4, cuDNN 8.9.7.29, Driver 550.90.07
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| bert-99 | F1: 89.9653 | Queries/s 17759.3 | Samples/s 23131.4 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s 4.0094 | Samples/s 4.91259 |
| dlrm-v2-99 | AUC: 79.5069 | Queries/s 175023.0 | Samples/s 184239.0 |
| dlrm-v2-99.9 | AUC: 80.2297 | Queries/s 100010.0 | Samples/s 106363.0 |
| retinanet | mAP: 37.1745 | Queries/s 4003.24 | Samples/s 4633.9 |
| resnet | acc: 75.6954 | Queries/s 188028.0 | Samples/s 224868.0 |
| 3d-unet-99 | DICE: 0.8531 | | Samples/s 18.447 |
| 3d-unet-99.9 | DICE: 0.8608 | | Samples/s 18.447 |
+ + + + \ No newline at end of file diff --git a/closed/Quanta_Cloud_Technology/results/D54U_3U_L40S_PCIe_48GBx4_TRT/summary/README.md b/closed/Quanta_Cloud_Technology/results/D54U_3U_L40S_PCIe_48GBx4_TRT/summary/README.md new file mode 100644 index 00000000..fddfbc99 --- /dev/null +++ b/closed/Quanta_Cloud_Technology/results/D54U_3U_L40S_PCIe_48GBx4_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Quanta_Cloud_Technology

+

D54U_3U_L40S_PCIe_48GBx4_TRT

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:Quanta_Cloud_TechnologyAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen4 x16
accelerator_interconnectPCIe Gen4 x16
accelerator_interconnect_topology
accelerator_memory_capacity48 GB
accelerator_memory_configurationGDDR6
accelerator_model_nameNVIDIA L40S
accelerator_on-chip_memories
accelerators_per_node4

Processor and Memory Details

host_memory_capacity1 TB
host_memory_configuration DDR4-4800 64GB x 16
host_processor_caches
host_processor_core_count44
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8458P
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notesQuantaGrid D54U-3U
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count2x 400Gb InfiniBand
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 252GB/s; PCIe-NIC: 100GB/s
host_networking_topologyEthernet/InfiniBand on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0.19, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemRocky Linux 9.2
other_software_stackCUDA 12.4, Driver 550.54.15
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 3096.45 | Tokens/s 3463.19 |
| gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s 3096.45 | Tokens/s 3463.19 |
| bert-99 | F1: 89.9653 | Queries/s 12002.4 | Samples/s 13248.3 |
| dlrm-v2-99 | AUC: 79.5069 | Queries/s 84409.0 | Samples/s 115424.0 |
| dlrm-v2-99.9 | AUC: 80.2297 | Queries/s 51015.1 | Samples/s 51911.1 |
| retinanet | mAP: 37.1745 | Queries/s 3001.4 | Samples/s 3191.45 |
| resnet | acc: 75.6954 | Queries/s 150015.0 | Samples/s 174603.0 |
| 3d-unet-99 | DICE: 0.8531 | | Samples/s 15.551 |
diff --git a/closed/Quanta_Cloud_Technology/results/D54U_3U_L40S_PCIe_48GBx4_TRT/summary/summary.html b/closed/Quanta_Cloud_Technology/results/D54U_3U_L40S_PCIe_48GBx4_TRT/summary/summary.html new file mode 100644 index 00000000..629589f0 --- /dev/null +++ b/closed/Quanta_Cloud_Technology/results/D54U_3U_L40S_PCIe_48GBx4_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Quanta_Cloud_Technology

+

D54U_3U_L40S_PCIe_48GBx4_TRT

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:Quanta_Cloud_TechnologyAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen4 x16
accelerator_interconnectPCIe Gen4 x16
accelerator_interconnect_topology
accelerator_memory_capacity48 GB
accelerator_memory_configurationGDDR6
accelerator_model_nameNVIDIA L40S
accelerator_on-chip_memories
accelerators_per_node4

Processor and Memory Details

host_memory_capacity1 TB
host_memory_configuration DDR4-4800 64GB x 16
host_processor_caches
host_processor_core_count44
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8458P
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notesQuantaGrid D54U-3U
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count2x 400Gb InfiniBand
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 252GB/s; PCIe-NIC: 100GB/s
host_networking_topologyEthernet/InfiniBand on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0.19, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemRocky Linux 9.2
other_software_stackCUDA 12.4, Driver 550.54.15
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 3096.45 | Tokens/s 3463.19 |
| gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s 3096.45 | Tokens/s 3463.19 |
| bert-99 | F1: 89.9653 | Queries/s 12002.4 | Samples/s 13248.3 |
| dlrm-v2-99 | AUC: 79.5069 | Queries/s 84409.0 | Samples/s 115424.0 |
| dlrm-v2-99.9 | AUC: 80.2297 | Queries/s 51015.1 | Samples/s 51911.1 |
| retinanet | mAP: 37.1745 | Queries/s 3001.4 | Samples/s 3191.45 |
| resnet | acc: 75.6954 | Queries/s 150015.0 | Samples/s 174603.0 |
| 3d-unet-99 | DICE: 0.8531 | | Samples/s 15.551 |
+ + + + \ No newline at end of file diff --git a/closed/Quanta_Cloud_Technology/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/README.md b/closed/Quanta_Cloud_Technology/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/README.md new file mode 100644 index 00000000..5ffe7af8 --- /dev/null +++ b/closed/Quanta_Cloud_Technology/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Quanta_Cloud_Technology

+

GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:Quanta_Cloud_TechnologyAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectNVLink-C2C
accelerator_interconnect1x 400Gbe Infiniband
accelerator_interconnect_topology
accelerator_memory_capacity96 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA GH200 Grace Hopper Superchip 96GB
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity512 GB
host_memory_configuration16x 16DP (32GB) LPDDR5x
host_processor_caches
host_processor_core_count72
host_processor_frequency
host_processor_interconnect
host_processor_model_nameNVIDIA Grace CPU
host_processors_per_node1

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesQuantaGrid S74G-2U
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count1x 10Gbe Intel Ethernet X550T
host_networkingEthernet; Data bandwidth for GPU-NIC is 252.06 GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0.19, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemRocky Linux 9.3
other_software_stackCUDA 12.4, cuDNN 8.9.7.29, Driver 550.54.14
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 2619.09 | Tokens/s 3114.03 |
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 2160.12 | Tokens/s 2803.72 |
| gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s 2160.12 | Tokens/s 2803.72 |
| bert-99 | F1: 89.9653 | Queries/s 7103.12 | Samples/s 9196.01 |
| bert-99.9 | F1: 90.7831 | Queries/s 6600.99 | Samples/s 8092.46 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s 1.84443 | Samples/s 2.08805 |
| dlrm-v2-99 | AUC: 79.5069 | Queries/s 77511.7 | Samples/s 80878.4 |
| dlrm-v2-99.9 | AUC: 80.2297 | Queries/s 46207.5 | Samples/s 48197.0 |
| retinanet | mAP: 37.1745 | Queries/s 1731.14 | Samples/s 1923.2 |
| resnet | acc: 75.6954 | Queries/s 73014.9 | Samples/s 94990.9 |
| 3d-unet-99 | DICE: 0.8531 | | Samples/s 6.77957 |
| 3d-unet-99.9 | DICE: 0.8608 | | Samples/s 6.77957 |
diff --git a/closed/Quanta_Cloud_Technology/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/summary.html b/closed/Quanta_Cloud_Technology/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/summary.html new file mode 100644 index 00000000..947f9c11 --- /dev/null +++ b/closed/Quanta_Cloud_Technology/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Quanta_Cloud_Technology

+

GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:Quanta_Cloud_TechnologyAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectNVLink-C2C
accelerator_interconnect1x 400Gbe Infiniband
accelerator_interconnect_topology
accelerator_memory_capacity96 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA GH200 Grace Hopper Superchip 96GB
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity512 GB
host_memory_configuration16x 16DP (32GB) LPDDR5x
host_processor_caches
host_processor_core_count72
host_processor_frequency
host_processor_interconnect
host_processor_model_nameNVIDIA Grace CPU
host_processors_per_node1

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesQuantaGrid S74G-2U
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count1x 10Gbe Intel Ethernet X550T
host_networkingEthernet; Data bandwidth for GPU-NIC is 252.06 GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0.19, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemRocky Linux 9.3
other_software_stackCUDA 12.4, cuDNN 8.9.7.29, Driver 550.54.14
sw_notes
+ +

Results Table

| Model | Accuracy Target | Server | Offline |
|---|---|---|---|
| llama2-70b-99.9 | ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005 | Tokens/s 2619.09 | Tokens/s 3114.03 |
| gptj-99 | ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2 | Tokens/s 2160.12 | Tokens/s 2803.72 |
| gptj-99.9 | ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2 | Tokens/s 2160.12 | Tokens/s 2803.72 |
| bert-99 | F1: 89.9653 | Queries/s 7103.12 | Samples/s 9196.01 |
| bert-99.9 | F1: 90.7831 | Queries/s 6600.99 | Samples/s 8092.46 |
| stable-diffusion-xl | CLIP_SCORE: 31.6863, FID_SCORE: 23.0109 | Queries/s 1.84443 | Samples/s 2.08805 |
| dlrm-v2-99 | AUC: 79.5069 | Queries/s 77511.7 | Samples/s 80878.4 |
| dlrm-v2-99.9 | AUC: 80.2297 | Queries/s 46207.5 | Samples/s 48197.0 |
| retinanet | mAP: 37.1745 | Queries/s 1731.14 | Samples/s 1923.2 |
| resnet | acc: 75.6954 | Queries/s 73014.9 | Samples/s 94990.9 |
| 3d-unet-99 | DICE: 0.8531 | | Samples/s 6.77957 |
| 3d-unet-99.9 | DICE: 0.8608 | | Samples/s 6.77957 |
+ + + + \ No newline at end of file diff --git a/closed/RedHat/results/L40S-RedHat-OpenShift/summary/README.md b/closed/RedHat/results/L40S-RedHat-OpenShift/summary/README.md new file mode 100644 index 00000000..d5c6cb1e --- /dev/null +++ b/closed/RedHat/results/L40S-RedHat-OpenShift/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

RedHat

+

L40S-RedHat-OpenShift

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:RedHatAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerators_per_node4
accelerator_model_nameNVIDIA L40S
accelerator_host_interconnectPCIe Gen5
accelerator_frequency
accelerator_on-chip_memories
accelerator_memory_configurationHBM2
accelerator_memory_capacity48 GB
accelerator_interconnectPCIe
accelerator_interconnect_topology

Processor and Memory Details

host_processors_per_node2
host_processor_model_nameIntel(R) Xeon(R) Platinum 8480CL
host_processor_core_count112
host_processor_vcpu_count-
host_processor_frequency
host_processor_caches
host_processor_interconnectPCIe
host_memory_capacity2 TB
host_memory_configuration32x 64GB Micron DDR5

Other Hardware Details

coolingNA
hw_notesNVIDIA L40S-48GB

Network and Interconnect Details

host_networkingManagement: 1x Ethernet 10GB/Sec
host_networking_topology-
host_network_card_count-

Software Details

frameworkCUDA 12.2
other_software_stack{'cuda_driver_version': '535.129.03', 'cuda_version': '12.2', 'vllm': '0.5.1'}
operating_systemRed Hat Enterprise Linux CoreOS release 4.14
sw_notesRed Hat OpenShift Container Platform 4.14 + OpenShift AI
+ +

Results Table

+ + + + + + + + + + + + + +
ModelAccuracy TargetServerOffline
MetricPerformanceMetricPerformance
llama2-70b-99.9ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005Tokens/s 1469.19Tokens/s 1717.77
diff --git a/closed/RedHat/results/L40S-RedHat-OpenShift/summary/summary.html b/closed/RedHat/results/L40S-RedHat-OpenShift/summary/summary.html
new file mode 100644
index 00000000..d306ba10
--- /dev/null
+++ b/closed/RedHat/results/L40S-RedHat-OpenShift/summary/summary.html
@@ -0,0 +1,126 @@
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

RedHat

+

L40S-RedHat-OpenShift

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: RedHat | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerators_per_node4
accelerator_model_nameNVIDIA L40S
accelerator_host_interconnectPCIe Gen5
accelerator_frequency
accelerator_on-chip_memories
accelerator_memory_configurationHBM2
accelerator_memory_capacity48 GB
accelerator_interconnectPCIe
accelerator_interconnect_topology

Processor and Memory Details

host_processors_per_node2
host_processor_model_nameIntel(R) Xeon(R) Platinum 8480CL
host_processor_core_count112
host_processor_vcpu_count-
host_processor_frequency
host_processor_caches
host_processor_interconnectPCIe
host_memory_capacity2 TB
host_memory_configuration32x 64GB Micron DDR5

Other Hardware Details

coolingNA
hw_notesNVIDIA L40S-48GB

Network and Interconnect Details

host_networkingManagement: 1x Ethernet 10GB/Sec
host_networking_topology-
host_network_card_count-

Software Details

frameworkCUDA 12.2
other_software_stack{'cuda_driver_version': '535.129.03', 'cuda_version': '12.2', 'vllm': '0.5.1'}
operating_systemRed Hat Enterprise Linux CoreOS release 4.14
sw_notesRed Hat OpenShift Container Platform 4.14 + OpenShift AI
+ +

Results Table

+ + + + + + + + + + + + + +
ModelAccuracy TargetServerOffline
MetricPerformanceMetricPerformance
llama2-70b-99.9ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005Tokens/s 1469.19Tokens/s 1717.77
\ No newline at end of file
diff --git a/closed/Supermicro/results/1-node-2S-EMR-PyTorch/summary/README.md b/closed/Supermicro/results/1-node-2S-EMR-PyTorch/summary/README.md
new file mode 100644
index 00000000..6d570116
--- /dev/null
+++ b/closed/Supermicro/results/1-node-2S-EMR-PyTorch/summary/README.md
@@ -0,0 +1,58 @@
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

supermicro

+

1-node-2S-EMR-PyTorch

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: supermicro | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerators_per_node0
accelerator_model_nameN/A
accelerator_frequencyN/A
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topologyN/A
accelerator_memory_capacityN/A
accelerator_memory_configurationN/A
accelerator_on-chip_memoriesN/A

Processor and Memory Details

host_processor_model_nameINTEL(R) XEON(R) PLATINUM 8592+
host_processors_per_node2
host_processor_core_count64
host_processor_frequencyN/A
host_processor_cachesN/A
host_memory_configuration8 slots / 64GB each / per socket
host_memory_capacity1024GB
host_processor_interconnectN/A

Other Hardware Details

coolingAir
hw_notesQuantaGrid D54Q-2U

Network and Interconnect Details

host_networkingEthernet Controller / 10GBASE-T
host_networking_topologyN/A
host_network_card_count2

Software Details

frameworkPyTorch
operating_systemRocky Linux 8.10
other_software_stack5.14.0-427.24.1.el9_4.x86_64
sw_notesN/A
+ +

Results Table

+ + + + + + + + + + + + + +
ModelAccuracy TargetServerOffline
MetricPerformanceMetricPerformance
bert-99F1: 89.9653Queries/s 1256.61Samples/s 1595.13
resnetacc: 75.6954Queries/s 21001.6Samples/s 23674.2
diff --git a/closed/Supermicro/results/1-node-2S-EMR-PyTorch/summary/summary.html b/closed/Supermicro/results/1-node-2S-EMR-PyTorch/summary/summary.html
new file mode 100644
index 00000000..6eeeb40b
--- /dev/null
+++ b/closed/Supermicro/results/1-node-2S-EMR-PyTorch/summary/summary.html
@@ -0,0 +1,126 @@
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

supermicro

+

1-node-2S-EMR-PyTorch

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: supermicro | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerators_per_node0
accelerator_model_nameN/A
accelerator_frequencyN/A
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topologyN/A
accelerator_memory_capacityN/A
accelerator_memory_configurationN/A
accelerator_on-chip_memoriesN/A

Processor and Memory Details

host_processor_model_nameINTEL(R) XEON(R) PLATINUM 8592+
host_processors_per_node2
host_processor_core_count64
host_processor_frequencyN/A
host_processor_cachesN/A
host_memory_configuration8 slots / 64GB each / per socket
host_memory_capacity1024GB
host_processor_interconnectN/A

Other Hardware Details

coolingAir
hw_notesQuantaGrid D54Q-2U

Network and Interconnect Details

host_networkingEthernet Controller / 10GBASE-T
host_networking_topologyN/A
host_network_card_count2

Software Details

frameworkPyTorch
operating_systemRocky Linux 8.10
other_software_stack5.14.0-427.24.1.el9_4.x86_64
sw_notesN/A
+ +

Results Table

+ + + + + + + + + + + + + +
ModelAccuracy TargetServerOffline
MetricPerformanceMetricPerformance
bert-99F1: 89.9653Queries/s 1256.61Samples/s 1595.13
resnetacc: 75.6954Queries/s 21001.6Samples/s 23674.2
\ No newline at end of file
diff --git a/closed/Supermicro/results/AS_4125GS_TNHR2_LCC_H100_SXM_80GBx8_TRT/summary/README.md b/closed/Supermicro/results/AS_4125GS_TNHR2_LCC_H100_SXM_80GBx8_TRT/summary/README.md
new file mode 100644
index 00000000..bc61b8d9
--- /dev/null
+++ b/closed/Supermicro/results/AS_4125GS_TNHR2_LCC_H100_SXM_80GBx8_TRT/summary/README.md
@@ -0,0 +1,58 @@
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Supermicro

+

AS-4125GS-TNHR2-LCC (8x H100-SXM-80GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Supermicro | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2.3 TB
host_memory_configuration24x 96GB DDR5 4800MHz
host_processor_caches
host_processor_core_count96
host_processor_frequency
host_processor_interconnect
host_processor_model_nameAMD EPYC 9654
host_processors_per_node2

Other Hardware Details

coolingLiquid-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count1x Intel X550T 10GbE, 1x NVIDIA B3220 200GbE/NDR200, 8x NVIDIA ConnectX-7 400GbE/NDR
host_networkingEthernet, Infiniband
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4 LTS
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54.15
sw_notes
+ +

Results Table

+ + + + + + + + + + + + + +
ModelAccuracy TargetServerOffline
MetricPerformanceMetricPerformance
llama2-70b-99ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005Tokens/s 23699.7Tokens/s 24216.8
llama2-70b-99.9ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005Tokens/s 23699.7Tokens/s 24216.8
gptj-99ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2Tokens/s 19810.7Tokens/s 19539.2
gptj-99.9ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2Tokens/s 19810.7Tokens/s 19539.2
bert-99F1: 89.9653Queries/s 57846.1Samples/s 72222.5
bert-99.9F1: 90.7831Queries/s 51049.3Samples/s 61490.8
stable-diffusion-xlCLIP_SCORE: 31.6863, FID_SCORE: 23.0109Queries/s 15.7157Samples/s 16.1466
dlrm-v2-99.9AUC: 80.2297Queries/s 354036.0Samples/s 370389.0
retinanetmAP: 37.1745Queries/s 13803.0Samples/s 14460.8
resnetacc: 75.6954Queries/s 632629.0Samples/s 708730.0
3d-unet-99DICE: 0.8531Samples/s 52.298
3d-unet-99.9DICE: 0.8608Samples/s 52.298
diff --git a/closed/Supermicro/results/AS_4125GS_TNHR2_LCC_H100_SXM_80GBx8_TRT/summary/summary.html b/closed/Supermicro/results/AS_4125GS_TNHR2_LCC_H100_SXM_80GBx8_TRT/summary/summary.html
new file mode 100644
index 00000000..5413de6f
--- /dev/null
+++ b/closed/Supermicro/results/AS_4125GS_TNHR2_LCC_H100_SXM_80GBx8_TRT/summary/summary.html
@@ -0,0 +1,126 @@
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Supermicro

+

AS-4125GS-TNHR2-LCC (8x H100-SXM-80GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Supermicro | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2.3 TB
host_memory_configuration24x 96GB DDR5 4800MHz
host_processor_caches
host_processor_core_count96
host_processor_frequency
host_processor_interconnect
host_processor_model_nameAMD EPYC 9654
host_processors_per_node2

Other Hardware Details

coolingLiquid-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count1x Intel X550T 10GbE, 1x NVIDIA B3220 200GbE/NDR200, 8x NVIDIA ConnectX-7 400GbE/NDR
host_networkingEthernet, Infiniband
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4 LTS
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54.15
sw_notes
+ +

Results Table

+ + + + + + + + + + + + + +
ModelAccuracy TargetServerOffline
MetricPerformanceMetricPerformance
llama2-70b-99ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005Tokens/s 23699.7Tokens/s 24216.8
llama2-70b-99.9ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005Tokens/s 23699.7Tokens/s 24216.8
gptj-99ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2Tokens/s 19810.7Tokens/s 19539.2
gptj-99.9ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2Tokens/s 19810.7Tokens/s 19539.2
bert-99F1: 89.9653Queries/s 57846.1Samples/s 72222.5
bert-99.9F1: 90.7831Queries/s 51049.3Samples/s 61490.8
stable-diffusion-xlCLIP_SCORE: 31.6863, FID_SCORE: 23.0109Queries/s 15.7157Samples/s 16.1466
dlrm-v2-99.9AUC: 80.2297Queries/s 354036.0Samples/s 370389.0
retinanetmAP: 37.1745Queries/s 13803.0Samples/s 14460.8
resnetacc: 75.6954Queries/s 632629.0Samples/s 708730.0
3d-unet-99DICE: 0.8531Samples/s 52.298
3d-unet-99.9DICE: 0.8608Samples/s 52.298
\ No newline at end of file
diff --git a/closed/Supermicro/results/AS_8125GS_TNHR_H100_SXM_80GBx8_TRT/summary/README.md b/closed/Supermicro/results/AS_8125GS_TNHR_H100_SXM_80GBx8_TRT/summary/README.md
new file mode 100644
index 00000000..2d29f42f
--- /dev/null
+++ b/closed/Supermicro/results/AS_8125GS_TNHR_H100_SXM_80GBx8_TRT/summary/README.md
@@ -0,0 +1,58 @@
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Supermicro

+

AS-8125GS-TNHR (8x H100-SXM-80GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Supermicro | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity1.5 TB
host_memory_configuration24x 64GB DDR5 4800MHz
host_processor_caches
host_processor_core_count48
host_processor_frequency
host_processor_interconnect
host_processor_model_nameAMD EPYC 9474F
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count1x Intel X550T 10GbE, 9x NVIDIA ConnectX-7 400GbE/NDR
host_networkingEthernet, Infiniband
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4 LTS
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54.15
sw_notes
+ +

Results Table

+ + + + + + + + + + + + + +
ModelAccuracy TargetServerOffline
MetricPerformanceMetricPerformance
llama2-70b-99ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005Tokens/s 21775.5Tokens/s 24011.7
llama2-70b-99.9ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005Tokens/s 21775.5Tokens/s 24011.7
gptj-99ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2Tokens/s 19304.4Tokens/s 19418.9
gptj-99.9ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2Tokens/s 19304.4Tokens/s 19418.9
bert-99F1: 89.9653Queries/s 57488.7Samples/s 71861.8
bert-99.9F1: 90.7831Queries/s 50729.5Samples/s 61128.0
stable-diffusion-xlCLIP_SCORE: 31.6863, FID_SCORE: 23.0109Queries/s 15.6683Samples/s 16.0092
dlrm-v2-99.9AUC: 80.2297Queries/s 354035.0Samples/s 359682.0
retinanetmAP: 37.1745Queries/s 13731.3Samples/s 14244.1
resnetacc: 75.6954Queries/s 633551.0Samples/s 703377.0
3d-unet-99DICE: 0.8531Samples/s 52.1198
3d-unet-99.9DICE: 0.8608Samples/s 52.1198
diff --git a/closed/Supermicro/results/AS_8125GS_TNHR_H100_SXM_80GBx8_TRT/summary/summary.html b/closed/Supermicro/results/AS_8125GS_TNHR_H100_SXM_80GBx8_TRT/summary/summary.html
new file mode 100644
index 00000000..6b0ea4d1
--- /dev/null
+++ b/closed/Supermicro/results/AS_8125GS_TNHR_H100_SXM_80GBx8_TRT/summary/summary.html
@@ -0,0 +1,126 @@
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Supermicro

+

AS-8125GS-TNHR (8x H100-SXM-80GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Supermicro | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity1.5 TB
host_memory_configuration24x 64GB DDR5 4800MHz
host_processor_caches
host_processor_core_count48
host_processor_frequency
host_processor_interconnect
host_processor_model_nameAMD EPYC 9474F
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count1x Intel X550T 10GbE, 9x NVIDIA ConnectX-7 400GbE/NDR
host_networkingEthernet, Infiniband
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4 LTS
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54.15
sw_notes
+ +

Results Table

+ + + + + + + + + + + + + +
ModelAccuracy TargetServerOffline
MetricPerformanceMetricPerformance
llama2-70b-99ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005Tokens/s 21775.5Tokens/s 24011.7
llama2-70b-99.9ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005Tokens/s 21775.5Tokens/s 24011.7
gptj-99ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2Tokens/s 19304.4Tokens/s 19418.9
gptj-99.9ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2Tokens/s 19304.4Tokens/s 19418.9
bert-99F1: 89.9653Queries/s 57488.7Samples/s 71861.8
bert-99.9F1: 90.7831Queries/s 50729.5Samples/s 61128.0
stable-diffusion-xlCLIP_SCORE: 31.6863, FID_SCORE: 23.0109Queries/s 15.6683Samples/s 16.0092
dlrm-v2-99.9AUC: 80.2297Queries/s 354035.0Samples/s 359682.0
retinanetmAP: 37.1745Queries/s 13731.3Samples/s 14244.1
resnetacc: 75.6954Queries/s 633551.0Samples/s 703377.0
3d-unet-99DICE: 0.8531Samples/s 52.1198
3d-unet-99.9DICE: 0.8608Samples/s 52.1198
\ No newline at end of file
diff --git a/closed/Supermicro/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/README.md b/closed/Supermicro/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/README.md
new file mode 100644
index 00000000..515a2bd0
--- /dev/null
+++ b/closed/Supermicro/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/README.md
@@ -0,0 +1,58 @@
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Supermicro

+

NVIDIA GH200-GraceHopper-Superchip (1x GH200-96GB_aarch64, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Supermicro | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectNVLink-C2C
accelerator_interconnect1x 400Gbe Infiniband
accelerator_interconnect_topology
accelerator_memory_capacity96 GB
accelerator_memory_configurationHBM3e
accelerator_model_nameNVIDIA GH200 Grace Hopper Superchip 96GB
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity480 GB
host_memory_configuration15x 16DP (32GB) LPDDR5x
host_processor_caches
host_processor_core_count72
host_processor_frequency
host_processor_interconnect
host_processor_model_nameNVIDIA Grace CPU
host_processors_per_node1

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesNVIDIA MGX Reference Platform;
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count1x 10Gbe Intel Ethernet X550T
host_networkingEthernet; Data bandwidth for GPU-NIC is 252.06 GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.2
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 9.1.0, Driver 550.54
sw_notes
+ +

Results Table

+ + + + + + + + + + + + + +
ModelAccuracy TargetServerOffline
MetricPerformanceMetricPerformance
gptj-99ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2Tokens/s 2159.89Tokens/s 2659.47
gptj-99.9ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2Tokens/s 2159.89Tokens/s 2659.47
diff --git a/closed/Supermicro/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/summary.html b/closed/Supermicro/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/summary.html
new file mode 100644
index 00000000..e403d162
--- /dev/null
+++ b/closed/Supermicro/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/summary.html
@@ -0,0 +1,126 @@
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Supermicro

+

NVIDIA GH200-GraceHopper-Superchip (1x GH200-96GB_aarch64, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Supermicro | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectNVLink-C2C
accelerator_interconnect1x 400Gbe Infiniband
accelerator_interconnect_topology
accelerator_memory_capacity96 GB
accelerator_memory_configurationHBM3e
accelerator_model_nameNVIDIA GH200 Grace Hopper Superchip 96GB
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity480 GB
host_memory_configuration15x 16DP (32GB) LPDDR5x
host_processor_caches
host_processor_core_count72
host_processor_frequency
host_processor_interconnect
host_processor_model_nameNVIDIA Grace CPU
host_processors_per_node1

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesNVIDIA MGX Reference Platform;
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count1x 10Gbe Intel Ethernet X550T
host_networkingEthernet; Data bandwidth for GPU-NIC is 252.06 GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.2
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 9.1.0, Driver 550.54
sw_notes
+ +

Results Table

+ + + + + + + + + + + + + +
ModelAccuracy TargetServerOffline
MetricPerformanceMetricPerformance
gptj-99ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2Tokens/s 2159.89Tokens/s 2659.47
gptj-99.9ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2Tokens/s 2159.89Tokens/s 2659.47
\ No newline at end of file
diff --git a/closed/Supermicro/results/SYS_421GE_TNHR2_LCC_H100_SXM_80GBx8_TRT/summary/README.md b/closed/Supermicro/results/SYS_421GE_TNHR2_LCC_H100_SXM_80GBx8_TRT/summary/README.md
new file mode 100644
index 00000000..9240958f
--- /dev/null
+++ b/closed/Supermicro/results/SYS_421GE_TNHR2_LCC_H100_SXM_80GBx8_TRT/summary/README.md
@@ -0,0 +1,58 @@
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Supermicro

+

SYS-421GE-TNHR2-LCC (8x H100-SXM-80GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Supermicro | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB DDR5 4800MHz
host_processor_caches
host_processor_core_count56
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8570
host_processors_per_node2

Other Hardware Details

coolingLiquid-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count8x NVIDIA B3140H 400GbE/NDR, 1x NVIDIA ConnectX-7 400GbE/NDR
host_networkingEthernet, Infiniband
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4 LTS
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54.15
sw_notes
+ +

Results Table

+ + + + + + + + + + + + + +
ModelAccuracy TargetServerOffline
MetricPerformanceMetricPerformance
llama2-70b-99ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005Tokens/s 21888.6Tokens/s 24180.6
llama2-70b-99.9ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005Tokens/s 21888.6Tokens/s 24180.6
gptj-99ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2Tokens/s 19725.6Tokens/s 19808.2
gptj-99.9ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2Tokens/s 19725.6Tokens/s 19808.2
bert-99F1: 89.9653Queries/s 58928.5Samples/s 72876.0
bert-99.9F1: 90.7831Queries/s 52049.4Samples/s 62036.9
stable-diffusion-xlCLIP_SCORE: 31.6863, FID_SCORE: 23.0109Queries/s 16.0608Samples/s 16.4933
dlrm-v2-99AUC: 79.5069Queries/s 556101.0Samples/s 602108.0
dlrm-v2-99.9AUC: 80.2297Queries/s 358000.0Samples/s 372277.0
retinanetmAP: 37.1745Queries/s 13979.3Samples/s 14538.1
resnetacc: 75.6954Queries/s 633672.0Samples/s 710521.0
3d-unet-99DICE: 0.8531Samples/s 52.2025
3d-unet-99.9DICE: 0.8608Samples/s 52.2025
diff --git a/closed/Supermicro/results/SYS_421GE_TNHR2_LCC_H100_SXM_80GBx8_TRT/summary/summary.html b/closed/Supermicro/results/SYS_421GE_TNHR2_LCC_H100_SXM_80GBx8_TRT/summary/summary.html
new file mode 100644
index 00000000..a80b2912
--- /dev/null
+++ b/closed/Supermicro/results/SYS_421GE_TNHR2_LCC_H100_SXM_80GBx8_TRT/summary/summary.html
@@ -0,0 +1,126 @@
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Supermicro

+

SYS-421GE-TNHR2-LCC (8x H100-SXM-80GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Supermicro | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB DDR5 4800MHz
host_processor_caches
host_processor_core_count56
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8570
host_processors_per_node2

Other Hardware Details

coolingLiquid-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count8x NVIDIA B3140H 400GbE/NDR, 1x NVIDIA ConnectX-7 400GbE/NDR
host_networkingEthernet, Infiniband
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4 LTS
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54.15
sw_notes
+ +

Results Table

+ + + + + + + + + + + + + +
ModelAccuracy TargetServerOffline
MetricPerformanceMetricPerformance
llama2-70b-99ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005Tokens/s 21888.6Tokens/s 24180.6
llama2-70b-99.9ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005Tokens/s 21888.6Tokens/s 24180.6
gptj-99ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2Tokens/s 19725.6Tokens/s 19808.2
gptj-99.9ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2Tokens/s 19725.6Tokens/s 19808.2
bert-99F1: 89.9653Queries/s 58928.5Samples/s 72876.0
bert-99.9F1: 90.7831Queries/s 52049.4Samples/s 62036.9
stable-diffusion-xlCLIP_SCORE: 31.6863, FID_SCORE: 23.0109Queries/s 16.0608Samples/s 16.4933
dlrm-v2-99AUC: 79.5069Queries/s 556101.0Samples/s 602108.0
dlrm-v2-99.9AUC: 80.2297Queries/s 358000.0Samples/s 372277.0
retinanetmAP: 37.1745Queries/s 13979.3Samples/s 14538.1
resnetacc: 75.6954Queries/s 633672.0Samples/s 710521.0
3d-unet-99DICE: 0.8531Samples/s 52.2025
3d-unet-99.9DICE: 0.8608Samples/s 52.2025
\ No newline at end of file
diff --git a/closed/Supermicro/results/SYS_821GE_TNHR_H100_SXM_80GBx8_TRT/summary/README.md b/closed/Supermicro/results/SYS_821GE_TNHR_H100_SXM_80GBx8_TRT/summary/README.md
new file mode 100644
index 00000000..d4b6aa5a
--- /dev/null
+++ b/closed/Supermicro/results/SYS_821GE_TNHR_H100_SXM_80GBx8_TRT/summary/README.md
@@ -0,0 +1,58 @@
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Supermicro

+

SYS-821GE-TNHR (8x H100-SXM-80GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Supermicro | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB DDR5 4800MHz
host_processor_caches
host_processor_core_count48
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8468
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count1x Intel X550T 10GbE, 8x NVIDIA ConnectX-7 400GbE/NDR
host_networkingEthernet, Infiniband
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4 LTS
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54.15
sw_notes
+ +

Results Table

+ + + + + + + + + + + + + +
ModelAccuracy TargetServerOffline
MetricPerformanceMetricPerformance
llama2-70b-99ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005Tokens/s 21986.0Tokens/s 24140.1
llama2-70b-99.9ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005Tokens/s 21986.0Tokens/s 24140.1
gptj-99ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2Tokens/s 19635.2Tokens/s 19803.6
gptj-99.9ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2Tokens/s 19635.2Tokens/s 19803.6
bert-99F1: 89.9653Queries/s 57928.3Samples/s 71806.2
bert-99.9F1: 90.7831Queries/s 51570.7Samples/s 62153.9
stable-diffusion-xlCLIP_SCORE: 31.6863, FID_SCORE: 23.0109Queries/s 15.994Samples/s 16.3778
dlrm-v2-99AUC: 79.5069Queries/s 548900.0Samples/s 592829.0
dlrm-v2-99.9AUC: 80.2297Queries/s 356561.0Samples/s 363656.0
retinanetmAP: 37.1745Queries/s 13803.0Samples/s 14405.1
resnetacc: 75.6954Queries/s 634193.0Samples/s 707052.0
3d-unet-99DICE: 0.8531Samples/s 52.2038
3d-unet-99.9DICE: 0.8608Samples/s 52.2038
diff --git a/closed/Supermicro/results/SYS_821GE_TNHR_H100_SXM_80GBx8_TRT/summary/summary.html b/closed/Supermicro/results/SYS_821GE_TNHR_H100_SXM_80GBx8_TRT/summary/summary.html
new file mode 100644
index 00000000..d64bf088
--- /dev/null
+++ b/closed/Supermicro/results/SYS_821GE_TNHR_H100_SXM_80GBx8_TRT/summary/summary.html
@@ -0,0 +1,126 @@
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Supermicro

+

SYS-821GE-TNHR (8x H100-SXM-80GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Supermicro | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB DDR5 4800MHz
host_processor_caches
host_processor_core_count48
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8468
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count1x Intel X550T 10GbE, 8x NVIDIA ConnectX-7 400GbE/NDR
host_networkingEthernet, Infiniband
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4 LTS
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54.15
sw_notes
+ +

Results Table

+ + + + + + + + + + + + + +
ModelAccuracy TargetServerOffline
MetricPerformanceMetricPerformance
llama2-70b-99ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005Tokens/s 21986.0Tokens/s 24140.1
llama2-70b-99.9ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005Tokens/s 21986.0Tokens/s 24140.1
gptj-99ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2Tokens/s 19635.2Tokens/s 19803.6
gptj-99.9ROUGE1: 42.9435, ROUGE2: 20.1034, ROUGEL: 29.9581, GEN_LEN: 3615190.2Tokens/s 19635.2Tokens/s 19803.6
bert-99F1: 89.9653Queries/s 57928.3Samples/s 71806.2
bert-99.9F1: 90.7831Queries/s 51570.7Samples/s 62153.9
stable-diffusion-xlCLIP_SCORE: 31.6863, FID_SCORE: 23.0109Queries/s 15.994Samples/s 16.3778
dlrm-v2-99AUC: 79.5069Queries/s 548900.0Samples/s 592829.0
dlrm-v2-99.9AUC: 80.2297Queries/s 356561.0Samples/s 363656.0
retinanetmAP: 37.1745Queries/s 13803.0Samples/s 14405.1
resnetacc: 75.6954Queries/s 634193.0Samples/s 707052.0
3d-unet-99DICE: 0.8531Samples/s 52.2038
3d-unet-99.9DICE: 0.8608Samples/s 52.2038
\ No newline at end of file
diff --git a/closed/Sustainable_Metal_Cloud/results/SMC_H100_SXM_80GBX8_TRT/summary/README.md b/closed/Sustainable_Metal_Cloud/results/SMC_H100_SXM_80GBX8_TRT/summary/README.md
new file mode 100644
index 00000000..d3ef0428
--- /dev/null
+++ b/closed/Sustainable_Metal_Cloud/results/SMC_H100_SXM_80GBX8_TRT/summary/README.md
@@ -0,0 +1,58 @@
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Sustainable_Metal_Cloud

+

SMC H100 (8x H100-SXM-80GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Sustainable_Metal_Cloud | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectInfiniband NDR
accelerator_interconnectNVLINK Gen4 900 GB/s + NVSWITCH Gen3
accelerator_interconnect_topologyNVLINK + NVSWITCH
accelerator_memory_capacity81559 MiB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories80 GB
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB DDR5
host_processor_caches
host_processor_core_count32
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8462Y+
host_processors_per_node2

Other Hardware Details

coolingSMC IMMERSION COOLING TECHNOLOGY
disk_controllersNVMe
disk_drivesSSD
hw_notes
other_hardware
power_management
power_supply_detailsSMC TECHNOLOGY
power_supply_quantity_and_rating_wattsSMC TECHNOLOGY

Network and Interconnect Details

host_network_card_count10x 400Gbe Infiniband
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 504GB/s; PCIe-NIC: 500GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes
+ +

Results Table

+ + + + + + + + + + + + + +
ModelAccuracy TargetServerOffline
MetricPerformanceMetricPerformance
llama2-70b-99ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005Tokens/s 21327.0Tokens/s 24459.6
llama2-70b-99.9ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005Tokens/s 21327.0Tokens/s 24459.6
gptj-99ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2Tokens/s 19233.8Tokens/s 19711.1
bert-99F1: 89.9653Queries/s 56008.8Samples/s 69043.2
bert-99.9F1: 90.7831Queries/s 49613.0Samples/s 61778.8
dlrm-v2-99AUC: 79.5069Queries/s 510155.0Samples/s 597885.0
dlrm-v2-99.9AUC: 80.2297Queries/s 340067.0Samples/s 369334.0
retinanetmAP: 37.1745Queries/s 12884.5Samples/s 14405.3
resnetacc: 75.6954Queries/s 584207.0Samples/s 706789.0
3d-unet-99DICE: 0.8531Samples/s 51.8258
3d-unet-99.9DICE: 0.8608Samples/s 51.8258
diff --git a/closed/Sustainable_Metal_Cloud/results/SMC_H100_SXM_80GBX8_TRT/summary/summary.html b/closed/Sustainable_Metal_Cloud/results/SMC_H100_SXM_80GBX8_TRT/summary/summary.html
new file mode 100644
index 00000000..a0c0802a
--- /dev/null
+++ b/closed/Sustainable_Metal_Cloud/results/SMC_H100_SXM_80GBX8_TRT/summary/summary.html
@@ -0,0 +1,126 @@
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Sustainable_Metal_Cloud

+

SMC H100 (8x H100-SXM-80GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Sustainable_Metal_Cloud | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectInfiniband NDR
accelerator_interconnectNVLINK Gen4 900 GB/s + NVSWITCH Gen3
accelerator_interconnect_topologyNVLINK + NVSWITCH
accelerator_memory_capacity81559 MiB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories80 GB
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB DDR5
host_processor_caches
host_processor_core_count32
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8462Y+
host_processors_per_node2

Other Hardware Details

coolingSMC IMMERSION COOLING TECHNOLOGY
disk_controllersNVMe
disk_drivesSSD
hw_notes
other_hardware
power_management
power_supply_detailsSMC TECHNOLOGY
power_supply_quantity_and_rating_wattsSMC TECHNOLOGY

Network and Interconnect Details

host_network_card_count10x 400Gbe Infiniband
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 504GB/s; PCIe-NIC: 500GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 550.54
sw_notes
+ +

Results Table

+ + + + + + + + + + + + + +
ModelAccuracy TargetServerOffline
MetricPerformanceMetricPerformance
llama2-70b-99ROUGE1: 43.9869, ROUGE2: 21.8148, ROUGEL: 28.33, TOKENS_PER_SAMPLE: 265.005Tokens/s 21327.0Tokens/s 24459.6
llama2-70b-99.9ROUGE1: 44.3868, ROUGE2: 22.0132, ROUGEL: 28.5876, TOKENS_PER_SAMPLE: 265.005Tokens/s 21327.0Tokens/s 24459.6
gptj-99ROUGE1: 42.5566, ROUGE2: 19.9223, ROUGEL: 29.6882, GEN_LEN: 3615190.2Tokens/s 19233.8Tokens/s 19711.1
bert-99F1: 89.9653Queries/s 56008.8Samples/s 69043.2
bert-99.9F1: 90.7831Queries/s 49613.0Samples/s 61778.8
dlrm-v2-99AUC: 79.5069Queries/s 510155.0Samples/s 597885.0
dlrm-v2-99.9AUC: 80.2297Queries/s 340067.0Samples/s 369334.0
retinanetmAP: 37.1745Queries/s 12884.5Samples/s 14405.3
resnetacc: 75.6954Queries/s 584207.0Samples/s 706789.0
3d-unet-99DICE: 0.8531Samples/s 51.8258
3d-unet-99.9DICE: 0.8608Samples/s 51.8258
\ No newline at end of file
diff --git a/closed/UntetherAI/results/h13_u1_preview/summary/README.md b/closed/UntetherAI/results/h13_u1_preview/summary/README.md
new file mode 100644
index 00000000..69f5c766
--- /dev/null
+++ b/closed/UntetherAI/results/h13_u1_preview/summary/README.md
@@ -0,0 +1,58 @@
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

UntetherAI

+

Supermicro SuperServer H13 (1x speedAI240 Preview)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: UntetherAI | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 16x (32 GT/s)
accelerator_interconnectN/A
accelerator_interconnect_topologyN/A
accelerator_memory_capacitydisabled
accelerator_memory_configurationLPDDR5 64x
accelerator_model_nameUntetherAI speedAI240 Preview
accelerator_on-chip_memories238 MB SRAM
accelerators_per_node1

Processor and Memory Details

host_memory_capacity64 GB
host_memory_configuration4x 16 GB DDR5 (Samsung M321R2GA3PB0-CWMXJ 4800 MT/s)
host_processor_cachesL1d cache: 512 KiB (16 instances); L1i cache: 512 KiB (16 instances); L2 cache: 16 MiB (16 instances); L3 cache: 64 MiB (4 instances)
host_processor_core_count16
host_processor_frequency1500 MHz (min); 3000 MHz (base); 3700 MHz (boost)
host_processor_interconnectN/A
host_processor_model_nameAMD EPYC 9124 16-Core Processor
host_processor_urlhttps://www.amd.com/en/products/processors/server/epyc/4th-generation-9004-and-8004-series/amd-epyc-9124.html
host_processors_per_node1

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_network_card_count1
host_networkingintegrated
host_networking_topology1GbE

Software Details

frameworkUntetherAI imAIgine SDK v24.07.19
operating_systemUbuntu 22.04.4 LTS (Linux kernel 6.5.0-44-generic #44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Jun 18 14:36:16 UTC 2 x86_64 x86_64 x86_64 GNU/Linux)
other_software_stack{'KILT': 'mlperf_4.1', 'Docker': '27.1.0, build 6312585', 'Python': '3.10.12'}
sw_notesPowered by the KRAI X and KILT technologies
+ +

Results Table

+ + + + + + + + + + + + + +
ModelAccuracy TargetServerOffline
MetricPerformanceMetricPerformance
resnetacc: 75.6954Samples/s 70348.2
diff --git a/closed/UntetherAI/results/h13_u1_preview/summary/summary.html b/closed/UntetherAI/results/h13_u1_preview/summary/summary.html
new file mode 100644
index 00000000..3301fe4b
--- /dev/null
+++ b/closed/UntetherAI/results/h13_u1_preview/summary/summary.html
@@ -0,0 +1,126 @@
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

UntetherAI

+

Supermicro SuperServer H13 (1x speedAI240 Preview)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: UntetherAI | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 16x (32 GT/s)
accelerator_interconnectN/A
accelerator_interconnect_topologyN/A
accelerator_memory_capacitydisabled
accelerator_memory_configurationLPDDR5 64x
accelerator_model_nameUntetherAI speedAI240 Preview
accelerator_on-chip_memories238 MB SRAM
accelerators_per_node1

Processor and Memory Details

host_memory_capacity64 GB
host_memory_configuration4x 16 GB DDR5 (Samsung M321R2GA3PB0-CWMXJ 4800 MT/s)
host_processor_cachesL1d cache: 512 KiB (16 instances); L1i cache: 512 KiB (16 instances); L2 cache: 16 MiB (16 instances); L3 cache: 64 MiB (4 instances)
host_processor_core_count16
host_processor_frequency1500 MHz (min); 3000 MHz (base); 3700 MHz (boost)
host_processor_interconnectN/A
host_processor_model_nameAMD EPYC 9124 16-Core Processor
host_processor_urlhttps://www.amd.com/en/products/processors/server/epyc/4th-generation-9004-and-8004-series/amd-epyc-9124.html
host_processors_per_node1

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_network_card_count1
host_networkingintegrated
host_networking_topology1GbE

Software Details

frameworkUntetherAI imAIgine SDK v24.07.19
operating_systemUbuntu 22.04.4 LTS (Linux kernel 6.5.0-44-generic #44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Jun 18 14:36:16 UTC 2 x86_64 x86_64 x86_64 GNU/Linux)
other_software_stack{'KILT': 'mlperf_4.1', 'Docker': '27.1.0, build 6312585', 'Python': '3.10.12'}
sw_notesPowered by the KRAI X and KILT technologies
+ +

Results Table

+ + + + + + + + + + + + + +
ModelAccuracy TargetServerOffline
MetricPerformanceMetricPerformance
resnetacc: 75.6954Samples/s 70348.2
\ No newline at end of file
diff --git a/closed/UntetherAI/results/h13_u1_preview_dc/summary/README.md b/closed/UntetherAI/results/h13_u1_preview_dc/summary/README.md
new file mode 100644
index 00000000..0153638b
--- /dev/null
+++ b/closed/UntetherAI/results/h13_u1_preview_dc/summary/README.md
@@ -0,0 +1,58 @@
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

UntetherAI

+

Supermicro SuperServer H13 (1x speedAI240 Preview)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: UntetherAI | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 16x (32 GT/s)
accelerator_interconnectN/A
accelerator_interconnect_topologyN/A
accelerator_memory_capacitydisabled
accelerator_memory_configurationLPDDR5 64x
accelerator_model_nameUntetherAI speedAI240 Preview
accelerator_on-chip_memories238 MB SRAM
accelerators_per_node1

Processor and Memory Details

host_memory_capacity64 GB
host_memory_configuration4x 16 GB DDR5 (Samsung M321R2GA3PB0-CWMXJ 4800 MT/s)
host_processor_cachesL1d cache: 512 KiB (16 instances); L1i cache: 512 KiB (16 instances); L2 cache: 16 MiB (16 instances); L3 cache: 64 MiB (4 instances)
host_processor_core_count16
host_processor_frequency1500 MHz (min); 3000 MHz (base); 3700 MHz (boost)
host_processor_interconnectN/A
host_processor_model_nameAMD EPYC 9124 16-Core Processor
host_processor_urlhttps://www.amd.com/en/products/processors/server/epyc/4th-generation-9004-and-8004-series/amd-epyc-9124.html
host_processors_per_node1

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_network_card_count1
host_networkingintegrated
host_networking_topology1GbE

Software Details

frameworkUntetherAI imAIgine SDK v24.07.19
operating_systemUbuntu 22.04.4 LTS (Linux kernel 6.5.0-44-generic #44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Jun 18 14:36:16 UTC 2 x86_64 x86_64 x86_64 GNU/Linux)
other_software_stack{'KILT': 'mlperf_4.1', 'Docker': '27.1.0, build 6312585', 'Python': '3.10.12'}
sw_notesPowered by the KRAI X and KILT technologies
+ +

Results Table

+ + + + + + + + + + + + + +
ModelAccuracy TargetServerOffline
MetricPerformanceMetricPerformance
resnetacc: 75.6954Queries/s 70096.9Samples/s 70348.6
diff --git a/closed/UntetherAI/results/h13_u1_preview_dc/summary/summary.html b/closed/UntetherAI/results/h13_u1_preview_dc/summary/summary.html
new file mode 100644
index 00000000..c406e3d2
--- /dev/null
+++ b/closed/UntetherAI/results/h13_u1_preview_dc/summary/summary.html
@@ -0,0 +1,126 @@
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

UntetherAI

+

Supermicro SuperServer H13 (1x speedAI240 Preview)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: UntetherAI | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 16x (32 GT/s)
accelerator_interconnectN/A
accelerator_interconnect_topologyN/A
accelerator_memory_capacitydisabled
accelerator_memory_configurationLPDDR5 64x
accelerator_model_nameUntetherAI speedAI240 Preview
accelerator_on-chip_memories238 MB SRAM
accelerators_per_node1

Processor and Memory Details

host_memory_capacity64 GB
host_memory_configuration4x 16 GB DDR5 (Samsung M321R2GA3PB0-CWMXJ 4800 MT/s)
host_processor_cachesL1d cache: 512 KiB (16 instances); L1i cache: 512 KiB (16 instances); L2 cache: 16 MiB (16 instances); L3 cache: 64 MiB (4 instances)
host_processor_core_count16
host_processor_frequency1500 MHz (min); 3000 MHz (base); 3700 MHz (boost)
host_processor_interconnectN/A
host_processor_model_nameAMD EPYC 9124 16-Core Processor
host_processor_urlhttps://www.amd.com/en/products/processors/server/epyc/4th-generation-9004-and-8004-series/amd-epyc-9124.html
host_processors_per_node1

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_network_card_count1
host_networkingintegrated
host_networking_topology1GbE

Software Details

frameworkUntetherAI imAIgine SDK v24.07.19
operating_systemUbuntu 22.04.4 LTS (Linux kernel 6.5.0-44-generic #44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Jun 18 14:36:16 UTC 2 x86_64 x86_64 x86_64 GNU/Linux)
other_software_stack{'KILT': 'mlperf_4.1', 'Docker': '27.1.0, build 6312585', 'Python': '3.10.12'}
sw_notesPowered by the KRAI X and KILT technologies
+ +

Results Table

+ + + + + + + + + + + + + +
ModelAccuracy TargetServerOffline
MetricPerformanceMetricPerformance
resnetacc: 75.6954Queries/s 70096.9Samples/s 70348.6
\ No newline at end of file
diff --git a/closed/UntetherAI/results/h13_u1_slim/summary/README.md b/closed/UntetherAI/results/h13_u1_slim/summary/README.md
new file mode 100644
index 00000000..59580419
--- /dev/null
+++ b/closed/UntetherAI/results/h13_u1_slim/summary/README.md
@@ -0,0 +1,58 @@
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

UntetherAI

+

Supermicro SuperServer H13 (1x speedAI240 Slim)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: UntetherAI | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 16x (32 GT/s)
accelerator_interconnectN/A
accelerator_interconnect_topologyN/A
accelerator_memory_capacitydisabled
accelerator_memory_configurationLPDDR5 64x
accelerator_model_nameUntetherAI speedAI240 Slim
accelerator_on-chip_memories238 MB SRAM
accelerators_per_node1

Processor and Memory Details

host_memory_capacity64 GB
host_memory_configuration4x 16 GB DDR5 (Samsung M321R2GA3PB0-CWMXJ 4800 MT/s)
host_processor_cachesL1d cache: 512 KiB (16 instances); L1i cache: 512 KiB (16 instances); L2 cache: 16 MiB (16 instances); L3 cache: 64 MiB (4 instances)
host_processor_core_count16
host_processor_frequency1500 MHz (min); 3000 MHz (base); 3700 MHz (boost)
host_processor_interconnectN/A
host_processor_model_nameAMD EPYC 9124 16-Core Processor
host_processor_urlhttps://www.amd.com/en/products/processors/server/epyc/4th-generation-9004-and-8004-series/amd-epyc-9124.html
host_processors_per_node1

Other Hardware Details

coolingair
hw_notesSKU: sai240L-F-A-ES

Network and Interconnect Details

host_network_card_count1
host_networkingintegrated
host_networking_topology1GbE

Software Details

frameworkUntetherAI imAIgine SDK v24.07.19
operating_systemUbuntu 22.04.4 LTS (Linux kernel 6.5.0-44-generic #44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Jun 18 14:36:16 UTC 2 x86_64 x86_64 x86_64 GNU/Linux)
other_software_stack{'KILT': 'mlperf_4.1', 'Docker': '27.1.0, build 6312585', 'Python': '3.10.12'}
sw_notesPowered by the KRAI X and KILT technologies
+ +

Results Table

+ + + + + + + + + + + + + +
ModelAccuracy TargetServerOffline
MetricPerformanceMetricPerformance
resnetacc: 75.6954Samples/s 56277.1
diff --git a/closed/UntetherAI/results/h13_u1_slim/summary/summary.html b/closed/UntetherAI/results/h13_u1_slim/summary/summary.html
new file mode 100644
index 00000000..a42801a9
--- /dev/null
+++ b/closed/UntetherAI/results/h13_u1_slim/summary/summary.html
@@ -0,0 +1,126 @@
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

UntetherAI

+

Supermicro SuperServer H13 (1x speedAI240 Slim)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: UntetherAI | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 16x (32 GT/s)
accelerator_interconnectN/A
accelerator_interconnect_topologyN/A
accelerator_memory_capacitydisabled
accelerator_memory_configurationLPDDR5 64x
accelerator_model_nameUntetherAI speedAI240 Slim
accelerator_on-chip_memories238 MB SRAM
accelerators_per_node1

Processor and Memory Details

host_memory_capacity64 GB
host_memory_configuration4x 16 GB DDR5 (Samsung M321R2GA3PB0-CWMXJ 4800 MT/s)
host_processor_cachesL1d cache: 512 KiB (16 instances); L1i cache: 512 KiB (16 instances); L2 cache: 16 MiB (16 instances); L3 cache: 64 MiB (4 instances)
host_processor_core_count16
host_processor_frequency1500 MHz (min); 3000 MHz (base); 3700 MHz (boost)
host_processor_interconnectN/A
host_processor_model_nameAMD EPYC 9124 16-Core Processor
host_processor_urlhttps://www.amd.com/en/products/processors/server/epyc/4th-generation-9004-and-8004-series/amd-epyc-9124.html
host_processors_per_node1

Other Hardware Details

coolingair
hw_notesSKU: sai240L-F-A-ES

Network and Interconnect Details

host_network_card_count1
host_networkingintegrated
host_networking_topology1GbE

Software Details

frameworkUntetherAI imAIgine SDK v24.07.19
operating_systemUbuntu 22.04.4 LTS (Linux kernel 6.5.0-44-generic #44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Jun 18 14:36:16 UTC 2 x86_64 x86_64 x86_64 GNU/Linux)
other_software_stack{'KILT': 'mlperf_4.1', 'Docker': '27.1.0, build 6312585', 'Python': '3.10.12'}
sw_notesPowered by the KRAI X and KILT technologies
+ +

Results Table

+ + + + + + + + + + + + + +
ModelAccuracy TargetServerOffline
MetricPerformanceMetricPerformance
resnetacc: 75.6954Samples/s 56277.1
\ No newline at end of file
diff --git a/closed/UntetherAI/results/h13_u2_preview/summary/README.md b/closed/UntetherAI/results/h13_u2_preview/summary/README.md
new file mode 100644
index 00000000..e00a041e
--- /dev/null
+++ b/closed/UntetherAI/results/h13_u2_preview/summary/README.md
@@ -0,0 +1,58 @@
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

UntetherAI

+

Supermicro SuperServer H13 (2x speedAI240 Preview)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: UntetherAI | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 16x (32 GT/s)
accelerator_interconnectN/A
accelerator_interconnect_topologyN/A
accelerator_memory_capacitydisabled
accelerator_memory_configurationLPDDR5 64x
accelerator_model_nameUntetherAI speedAI240 Preview
accelerator_on-chip_memories238 MB SRAM
accelerators_per_node2

Processor and Memory Details

host_memory_capacity64 GB
host_memory_configuration4x 16 GB DDR5 (Samsung M321R2GA3PB0-CWMXJ 4800 MT/s)
host_processor_cachesL1d cache: 512 KiB (16 instances); L1i cache: 512 KiB (16 instances); L2 cache: 16 MiB (16 instances); L3 cache: 64 MiB (4 instances)
host_processor_core_count16
host_processor_frequency1500 MHz (min); 3000 MHz (base); 3700 MHz (boost)
host_processor_interconnectN/A
host_processor_model_nameAMD EPYC 9124 16-Core Processor
host_processor_urlhttps://www.amd.com/en/products/processors/server/epyc/4th-generation-9004-and-8004-series/amd-epyc-9124.html
host_processors_per_node1

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_network_card_count1
host_networkingintegrated
host_networking_topology1GbE

Software Details

frameworkUntetherAI imAIgine SDK v24.07.19
operating_systemUbuntu 22.04.4 LTS (Linux kernel 6.5.0-44-generic #44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Jun 18 14:36:16 UTC 2 x86_64 x86_64 x86_64 GNU/Linux)
other_software_stack{'KILT': 'mlperf_4.1', 'Docker': '27.1.0, build 6312585', 'Python': '3.10.12'}
sw_notesPowered by the KRAI X and KILT technologies
+ +

Results Table

Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
resnet | acc: 75.6954 | - | Samples/s 140625.0
diff --git a/closed/UntetherAI/results/h13_u2_preview/summary/summary.html b/closed/UntetherAI/results/h13_u2_preview/summary/summary.html new file mode 100644 index 00000000..85f792ae --- /dev/null +++ b/closed/UntetherAI/results/h13_u2_preview/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

UntetherAI

+

Supermicro SuperServer H13 (2x speedAI240 Preview)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:UntetherAIAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 16x (32 GT/s)
accelerator_interconnectN/A
accelerator_interconnect_topologyN/A
accelerator_memory_capacitydisabled
accelerator_memory_configurationLPDDR5 64x
accelerator_model_nameUntetherAI speedAI240 Preview
accelerator_on-chip_memories238 MB SRAM
accelerators_per_node2

Processor and Memory Details

host_memory_capacity64 GB
host_memory_configuration4x 16 GB DDR5 (Samsung M321R2GA3PB0-CWMXJ 4800 MT/s)
host_processor_cachesL1d cache: 512 KiB (16 instances); L1i cache: 512 KiB (16 instances); L2 cache: 16 MiB (16 instances); L3 cache: 64 MiB (4 instances)
host_processor_core_count16
host_processor_frequency1500 MHz (min); 3000 MHz (base); 3700 MHz (boost)
host_processor_interconnectN/A
host_processor_model_nameAMD EPYC 9124 16-Core Processor
host_processor_urlhttps://www.amd.com/en/products/processors/server/epyc/4th-generation-9004-and-8004-series/amd-epyc-9124.html
host_processors_per_node1

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_network_card_count1
host_networkingintegrated
host_networking_topology1GbE

Software Details

frameworkUntetherAI imAIgine SDK v24.07.19
operating_systemUbuntu 22.04.4 LTS (Linux kernel 6.5.0-44-generic #44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Jun 18 14:36:16 UTC 2 x86_64 x86_64 x86_64 GNU/Linux)
other_software_stack{'KILT': 'mlperf_4.1', 'Docker': '27.1.0, build 6312585', 'Python': '3.10.12'}
sw_notesPowered by the KRAI X and KILT technologies
+ +

Results Table

Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
resnet | acc: 75.6954 | - | Samples/s 140625.0
+ + + + \ No newline at end of file diff --git a/closed/UntetherAI/results/h13_u2_preview_dc/summary/README.md b/closed/UntetherAI/results/h13_u2_preview_dc/summary/README.md new file mode 100644 index 00000000..e1af81d8 --- /dev/null +++ b/closed/UntetherAI/results/h13_u2_preview_dc/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

UntetherAI

+

Supermicro SuperServer H13 (2x speedAI240 Preview)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:UntetherAIAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 16x (32 GT/s)
accelerator_interconnectN/A
accelerator_interconnect_topologyN/A
accelerator_memory_capacitydisabled
accelerator_memory_configurationLPDDR5 64x
accelerator_model_nameUntetherAI speedAI240 Preview
accelerator_on-chip_memories238 MB SRAM
accelerators_per_node2

Processor and Memory Details

host_memory_capacity64 GB
host_memory_configuration4x 16 GB DDR5 (Samsung M321R2GA3PB0-CWMXJ 4800 MT/s)
host_processor_cachesL1d cache: 512 KiB (16 instances); L1i cache: 512 KiB (16 instances); L2 cache: 16 MiB (16 instances); L3 cache: 64 MiB (4 instances)
host_processor_core_count16
host_processor_frequency1500 MHz (min); 3000 MHz (base); 3700 MHz (boost)
host_processor_interconnectN/A
host_processor_model_nameAMD EPYC 9124 16-Core Processor
host_processor_urlhttps://www.amd.com/en/products/processors/server/epyc/4th-generation-9004-and-8004-series/amd-epyc-9124.html
host_processors_per_node1

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_network_card_count1
host_networkingintegrated
host_networking_topology1GbE

Software Details

frameworkUntetherAI imAIgine SDK v24.07.19
operating_systemUbuntu 22.04.4 LTS (Linux kernel 6.5.0-44-generic #44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Jun 18 14:36:16 UTC 2 x86_64 x86_64 x86_64 GNU/Linux)
other_software_stack{'KILT': 'mlperf_4.1', 'Docker': '27.1.0, build 6312585', 'Python': '3.10.12'}
sw_notesPowered by the KRAI X and KILT technologies
+ +

Results Table

Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
resnet | acc: 75.6954 | Queries/s 140239.0 | Samples/s 140631.0
diff --git a/closed/UntetherAI/results/h13_u2_preview_dc/summary/summary.html b/closed/UntetherAI/results/h13_u2_preview_dc/summary/summary.html new file mode 100644 index 00000000..62dd02ca --- /dev/null +++ b/closed/UntetherAI/results/h13_u2_preview_dc/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

UntetherAI

+

Supermicro SuperServer H13 (2x speedAI240 Preview)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:UntetherAIAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 16x (32 GT/s)
accelerator_interconnectN/A
accelerator_interconnect_topologyN/A
accelerator_memory_capacitydisabled
accelerator_memory_configurationLPDDR5 64x
accelerator_model_nameUntetherAI speedAI240 Preview
accelerator_on-chip_memories238 MB SRAM
accelerators_per_node2

Processor and Memory Details

host_memory_capacity64 GB
host_memory_configuration4x 16 GB DDR5 (Samsung M321R2GA3PB0-CWMXJ 4800 MT/s)
host_processor_cachesL1d cache: 512 KiB (16 instances); L1i cache: 512 KiB (16 instances); L2 cache: 16 MiB (16 instances); L3 cache: 64 MiB (4 instances)
host_processor_core_count16
host_processor_frequency1500 MHz (min); 3000 MHz (base); 3700 MHz (boost)
host_processor_interconnectN/A
host_processor_model_nameAMD EPYC 9124 16-Core Processor
host_processor_urlhttps://www.amd.com/en/products/processors/server/epyc/4th-generation-9004-and-8004-series/amd-epyc-9124.html
host_processors_per_node1

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_network_card_count1
host_networkingintegrated
host_networking_topology1GbE

Software Details

frameworkUntetherAI imAIgine SDK v24.07.19
operating_systemUbuntu 22.04.4 LTS (Linux kernel 6.5.0-44-generic #44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Jun 18 14:36:16 UTC 2 x86_64 x86_64 x86_64 GNU/Linux)
other_software_stack{'KILT': 'mlperf_4.1', 'Docker': '27.1.0, build 6312585', 'Python': '3.10.12'}
sw_notesPowered by the KRAI X and KILT technologies
+ +

Results Table

Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
resnet | acc: 75.6954 | Queries/s 140239.0 | Samples/s 140631.0
+ + + + \ No newline at end of file diff --git a/closed/UntetherAI/results/h13_u3_slim/summary/README.md b/closed/UntetherAI/results/h13_u3_slim/summary/README.md new file mode 100644 index 00000000..28b56a3f --- /dev/null +++ b/closed/UntetherAI/results/h13_u3_slim/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

UntetherAI

+

Supermicro SuperServer H13 (3x speedAI240 Slim)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:UntetherAIAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 16x (32 GT/s)
accelerator_interconnectN/A
accelerator_interconnect_topologyN/A
accelerator_memory_capacitydisabled
accelerator_memory_configurationLPDDR5 64x
accelerator_model_nameUntetherAI speedAI240 Slim
accelerator_on-chip_memories238 MB SRAM
accelerators_per_node3

Processor and Memory Details

host_memory_capacity64 GB
host_memory_configuration4x 16 GB DDR5 (Samsung M321R2GA3PB0-CWMXJ 4800 MT/s)
host_processor_cachesL1d cache: 512 KiB (16 instances); L1i cache: 512 KiB (16 instances); L2 cache: 16 MiB (16 instances); L3 cache: 64 MiB (4 instances)
host_processor_core_count16
host_processor_frequency1500 MHz (min); 3000 MHz (base); 3700 MHz (boost)
host_processor_interconnectN/A
host_processor_model_nameAMD EPYC 9124 16-Core Processor
host_processor_urlhttps://www.amd.com/en/products/processors/server/epyc/4th-generation-9004-and-8004-series/amd-epyc-9124.html
host_processors_per_node1

Other Hardware Details

coolingair
hw_notesSKU: sai240L-F-A-ES

Network and Interconnect Details

host_network_card_count1
host_networkingintegrated
host_networking_topology1GbE

Software Details

frameworkUntetherAI imAIgine SDK v24.07.19
operating_systemUbuntu 22.04.4 LTS (Linux kernel 6.5.0-44-generic #44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Jun 18 14:36:16 UTC 2 x86_64 x86_64 x86_64 GNU/Linux)
other_software_stack{'KILT': 'mlperf_4.1', 'Docker': '27.1.0, build 6312585', 'Python': '3.10.12'}
sw_notesPowered by the KRAI X and KILT technologies
+ +

Results Table

Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
resnet | acc: 75.6954 | - | Samples/s 168720.0
diff --git a/closed/UntetherAI/results/h13_u3_slim/summary/summary.html b/closed/UntetherAI/results/h13_u3_slim/summary/summary.html new file mode 100644 index 00000000..a273ad61 --- /dev/null +++ b/closed/UntetherAI/results/h13_u3_slim/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

UntetherAI

+

Supermicro SuperServer H13 (3x speedAI240 Slim)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:UntetherAIAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 16x (32 GT/s)
accelerator_interconnectN/A
accelerator_interconnect_topologyN/A
accelerator_memory_capacitydisabled
accelerator_memory_configurationLPDDR5 64x
accelerator_model_nameUntetherAI speedAI240 Slim
accelerator_on-chip_memories238 MB SRAM
accelerators_per_node3

Processor and Memory Details

host_memory_capacity64 GB
host_memory_configuration4x 16 GB DDR5 (Samsung M321R2GA3PB0-CWMXJ 4800 MT/s)
host_processor_cachesL1d cache: 512 KiB (16 instances); L1i cache: 512 KiB (16 instances); L2 cache: 16 MiB (16 instances); L3 cache: 64 MiB (4 instances)
host_processor_core_count16
host_processor_frequency1500 MHz (min); 3000 MHz (base); 3700 MHz (boost)
host_processor_interconnectN/A
host_processor_model_nameAMD EPYC 9124 16-Core Processor
host_processor_urlhttps://www.amd.com/en/products/processors/server/epyc/4th-generation-9004-and-8004-series/amd-epyc-9124.html
host_processors_per_node1

Other Hardware Details

coolingair
hw_notesSKU: sai240L-F-A-ES

Network and Interconnect Details

host_network_card_count1
host_networkingintegrated
host_networking_topology1GbE

Software Details

frameworkUntetherAI imAIgine SDK v24.07.19
operating_systemUbuntu 22.04.4 LTS (Linux kernel 6.5.0-44-generic #44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Jun 18 14:36:16 UTC 2 x86_64 x86_64 x86_64 GNU/Linux)
other_software_stack{'KILT': 'mlperf_4.1', 'Docker': '27.1.0, build 6312585', 'Python': '3.10.12'}
sw_notesPowered by the KRAI X and KILT technologies
+ +

Results Table

Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
resnet | acc: 75.6954 | - | Samples/s 168720.0
+ + + + \ No newline at end of file diff --git a/closed/UntetherAI/results/r760_u4_slim/summary/README.md b/closed/UntetherAI/results/r760_u4_slim/summary/README.md new file mode 100644 index 00000000..7f84960b --- /dev/null +++ b/closed/UntetherAI/results/r760_u4_slim/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

UntetherAI

+

Dell PowerEdge R760xa (4x speedAI240 Slim)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:UntetherAIAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen 5 16x (32 GT/s)
accelerator_interconnectN/A
accelerator_interconnect_topologyN/A
accelerator_memory_capacitydisabled
accelerator_memory_configurationLPDDR5 64x
accelerator_model_nameUntetherAI speedAI240 Slim
accelerator_on-chip_memories238 MB SRAM
accelerators_per_node4

Processor and Memory Details

host_memory_capacity256 GB
host_memory_configuration16x 16 GB DDR5 (Samsung M321R2GA3BB6-CQKDS 4800 MT/s)
host_processor_cachesL1d cache: 3 MiB (64 instances); L1i cache: 2 MiB (64 instances); L2 cache: 128 MiB (64 instances); L3 cache: 120 MiB (2 instances)
host_processor_core_count32
host_processor_frequency800 MHz (min); 2100 MHz (base); 4100 MHz (boost)
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Gold 6448Y
host_processor_urlhttps://www.intel.com/content/www/us/en/products/sku/232384/intel-xeon-gold-6448y-processor-60m-cache-2-10-ghz/specifications.html
host_processors_per_node2

Other Hardware Details

coolingair
hw_notesSKU: sai240L-F-A-ES

Network and Interconnect Details

host_network_card_count2
host_networkingembedded; integrated
host_networking_topologyBroadcom NetXtreme 1GbE (BCM5720); Broadcom Adv. Dual 25GbE

Software Details

frameworkUntetherAI imAIgine SDK v24.07.19
operating_systemUbuntu 22.04.4 LTS (Linux kernel 6.5.0-44-generic #44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Jun 18 14:36:16 UTC 2 x86_64 x86_64 x86_64 GNU/Linux)
other_software_stack{'KILT': 'mlperf_4.1', 'Docker': '27.1.0, build 6312585', 'Python': '3.10.12'}
sw_notesPowered by the KRAI X and KILT technologies
+ +

Results Table

Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
resnet | acc: 75.6954 | - | Samples/s 221845.0
diff --git a/closed/UntetherAI/results/r760_u4_slim/summary/summary.html b/closed/UntetherAI/results/r760_u4_slim/summary/summary.html new file mode 100644 index 00000000..532c45c4 --- /dev/null +++ b/closed/UntetherAI/results/r760_u4_slim/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

UntetherAI

+

Dell PowerEdge R760xa (4x speedAI240 Slim)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:UntetherAIAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen 5 16x (32 GT/s)
accelerator_interconnectN/A
accelerator_interconnect_topologyN/A
accelerator_memory_capacitydisabled
accelerator_memory_configurationLPDDR5 64x
accelerator_model_nameUntetherAI speedAI240 Slim
accelerator_on-chip_memories238 MB SRAM
accelerators_per_node4

Processor and Memory Details

host_memory_capacity256 GB
host_memory_configuration16x 16 GB DDR5 (Samsung M321R2GA3BB6-CQKDS 4800 MT/s)
host_processor_cachesL1d cache: 3 MiB (64 instances); L1i cache: 2 MiB (64 instances); L2 cache: 128 MiB (64 instances); L3 cache: 120 MiB (2 instances)
host_processor_core_count32
host_processor_frequency800 MHz (min); 2100 MHz (base); 4100 MHz (boost)
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Gold 6448Y
host_processor_urlhttps://www.intel.com/content/www/us/en/products/sku/232384/intel-xeon-gold-6448y-processor-60m-cache-2-10-ghz/specifications.html
host_processors_per_node2

Other Hardware Details

coolingair
hw_notesSKU: sai240L-F-A-ES

Network and Interconnect Details

host_network_card_count2
host_networkingembedded; integrated
host_networking_topologyBroadcom NetXtreme 1GbE (BCM5720); Broadcom Adv. Dual 25GbE

Software Details

frameworkUntetherAI imAIgine SDK v24.07.19
operating_systemUbuntu 22.04.4 LTS (Linux kernel 6.5.0-44-generic #44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Jun 18 14:36:16 UTC 2 x86_64 x86_64 x86_64 GNU/Linux)
other_software_stack{'KILT': 'mlperf_4.1', 'Docker': '27.1.0, build 6312585', 'Python': '3.10.12'}
sw_notesPowered by the KRAI X and KILT technologies
+ +

Results Table

Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
resnet | acc: 75.6954 | - | Samples/s 221845.0
+ + + + \ No newline at end of file diff --git a/closed/UntetherAI/results/r760_u6_slim/summary/README.md b/closed/UntetherAI/results/r760_u6_slim/summary/README.md new file mode 100644 index 00000000..a316ffca --- /dev/null +++ b/closed/UntetherAI/results/r760_u6_slim/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

UntetherAI

+

Dell PowerEdge R760xa (6x speedAI240 Slim)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:UntetherAIAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen 5 16x (32 GT/s)
accelerator_interconnectN/A
accelerator_interconnect_topologyN/A
accelerator_memory_capacitydisabled
accelerator_memory_configurationLPDDR5 64x
accelerator_model_nameUntetherAI speedAI240 Slim
accelerator_on-chip_memories238 MB SRAM
accelerators_per_node6

Processor and Memory Details

host_memory_capacity256 GB
host_memory_configuration16x 16 GB DDR5 (Samsung M321R2GA3BB6-CQKDS 4800 MT/s)
host_processor_cachesL1d cache: 3 MiB (64 instances); L1i cache: 2 MiB (64 instances); L2 cache: 128 MiB (64 instances); L3 cache: 120 MiB (2 instances)
host_processor_core_count32
host_processor_frequency800 MHz (min); 2100 MHz (base); 4100 MHz (boost)
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Gold 6448Y
host_processor_urlhttps://www.intel.com/content/www/us/en/products/sku/232384/intel-xeon-gold-6448y-processor-60m-cache-2-10-ghz/specifications.html
host_processors_per_node2

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_network_card_count2
host_networkingembedded; integrated
host_networking_topologyBroadcom NetXtreme 1GbE (BCM5720); Broadcom Adv. Dual 25GbE

Software Details

frameworkUntetherAI imAIgine SDK v24.07.19
operating_systemUbuntu 22.04.4 LTS (Linux kernel 6.5.0-44-generic #44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Jun 18 14:36:16 UTC 2 x86_64 x86_64 x86_64 GNU/Linux)
other_software_stack{'KILT': 'mlperf_4.1', 'Docker': '27.1.0, build 6312585', 'Python': '3.10.12'}
sw_notesPowered by the KRAI X and KILT technologies
+ +

Results Table

Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
resnet | acc: 75.6954 | Queries/s 309752.0 | Samples/s 334462.0
diff --git a/closed/UntetherAI/results/r760_u6_slim/summary/summary.html b/closed/UntetherAI/results/r760_u6_slim/summary/summary.html new file mode 100644 index 00000000..1d1ed745 --- /dev/null +++ b/closed/UntetherAI/results/r760_u6_slim/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

UntetherAI

+

Dell PowerEdge R760xa (6x speedAI240 Slim)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:UntetherAIAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen 5 16x (32 GT/s)
accelerator_interconnectN/A
accelerator_interconnect_topologyN/A
accelerator_memory_capacitydisabled
accelerator_memory_configurationLPDDR5 64x
accelerator_model_nameUntetherAI speedAI240 Slim
accelerator_on-chip_memories238 MB SRAM
accelerators_per_node6

Processor and Memory Details

host_memory_capacity256 GB
host_memory_configuration16x 16 GB DDR5 (Samsung M321R2GA3BB6-CQKDS 4800 MT/s)
host_processor_cachesL1d cache: 3 MiB (64 instances); L1i cache: 2 MiB (64 instances); L2 cache: 128 MiB (64 instances); L3 cache: 120 MiB (2 instances)
host_processor_core_count32
host_processor_frequency800 MHz (min); 2100 MHz (base); 4100 MHz (boost)
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Gold 6448Y
host_processor_urlhttps://www.intel.com/content/www/us/en/products/sku/232384/intel-xeon-gold-6448y-processor-60m-cache-2-10-ghz/specifications.html
host_processors_per_node2

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_network_card_count2
host_networkingembedded; integrated
host_networking_topologyBroadcom NetXtreme 1GbE (BCM5720); Broadcom Adv. Dual 25GbE

Software Details

frameworkUntetherAI imAIgine SDK v24.07.19
operating_systemUbuntu 22.04.4 LTS (Linux kernel 6.5.0-44-generic #44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Jun 18 14:36:16 UTC 2 x86_64 x86_64 x86_64 GNU/Linux)
other_software_stack{'KILT': 'mlperf_4.1', 'Docker': '27.1.0, build 6312585', 'Python': '3.10.12'}
sw_notesPowered by the KRAI X and KILT technologies
+ +

Results Table

Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
resnet | acc: 75.6954 | Queries/s 309752.0 | Samples/s 334462.0
+ + + + \ No newline at end of file diff --git a/open/CTuning/results/cm-demo-gfursin-aws-g4dn.4xlarge-nvidia_original-gpu-tensorrt-vdefault-default_config/summary/README.md b/open/CTuning/results/cm-demo-gfursin-aws-g4dn.4xlarge-nvidia_original-gpu-tensorrt-vdefault-default_config/summary/README.md new file mode 100644 index 00000000..b32bb583 --- /dev/null +++ b/open/CTuning/results/cm-demo-gfursin-aws-g4dn.4xlarge-nvidia_original-gpu-tensorrt-vdefault-default_config/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

CTuning

+

aws-g4dn-4xlarge

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:CTuningAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency1590.000000 MHz
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacity14.56805419921875 GB
accelerator_memory_configurationN/A
accelerator_model_nameTesla T4
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity63G
host_memory_configurationundefined
host_processor_cachesL1d cache: 256 KiB, L1i cache: 256 KiB, L2 cache: 8 MiB, L3 cache: 35.8 MiB
host_processor_core_count8
host_processor_frequencyundefined
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
host_processors_per_node1

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_network_card_count1
host_networkingGig Ethernet
host_networking_topologyN/A

Software Details

frameworkTensorRT
operating_systemUbuntu 20.04 (linux-6.5.0-1023-aws-glibc2.31)
other_software_stackPython: 3.8.10, GCC-9.4.0
sw_notescTuning.org/ae: Collective Mind demo for our reproducibility initiatives and artifact evaluation at ACM, IEEE and MLCommons ; automated by MLCommons CM v2.3.4 ; taken by Grigori Fursin
+ +

Results Table

Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
bert-99 | - | - | Samples/s 381.124
diff --git a/open/CTuning/results/cm-demo-gfursin-aws-g4dn.4xlarge-nvidia_original-gpu-tensorrt-vdefault-default_config/summary/summary.html b/open/CTuning/results/cm-demo-gfursin-aws-g4dn.4xlarge-nvidia_original-gpu-tensorrt-vdefault-default_config/summary/summary.html new file mode 100644 index 00000000..e7591ff9 --- /dev/null +++ b/open/CTuning/results/cm-demo-gfursin-aws-g4dn.4xlarge-nvidia_original-gpu-tensorrt-vdefault-default_config/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

CTuning

+

aws-g4dn-4xlarge

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:CTuningAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency1590.000000 MHz
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacity14.56805419921875 GB
accelerator_memory_configurationN/A
accelerator_model_nameTesla T4
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity63G
host_memory_configurationundefined
host_processor_cachesL1d cache: 256 KiB, L1i cache: 256 KiB, L2 cache: 8 MiB, L3 cache: 35.8 MiB
host_processor_core_count8
host_processor_frequencyundefined
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
host_processors_per_node1

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_network_card_count1
host_networkingGig Ethernet
host_networking_topologyN/A

Software Details

frameworkTensorRT
operating_systemUbuntu 20.04 (linux-6.5.0-1023-aws-glibc2.31)
other_software_stackPython: 3.8.10, GCC-9.4.0
sw_notescTuning.org/ae: Collective Mind demo for our reproducibility initiatives and artifact evaluation at ACM, IEEE and MLCommons ; automated by MLCommons CM v2.3.4 ; taken by Grigori Fursin
+ +

Results Table

Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
bert-99 | - | - | Samples/s 381.124
+ + + + \ No newline at end of file diff --git a/open/CTuning/results/cm-demo-gfursin-scaleway-L4-1-24G-reference-gpu-pytorch-v2.3.1-default_config/summary/README.md b/open/CTuning/results/cm-demo-gfursin-scaleway-L4-1-24G-reference-gpu-pytorch-v2.3.1-default_config/summary/README.md new file mode 100644 index 00000000..4c1f6267 --- /dev/null +++ b/open/CTuning/results/cm-demo-gfursin-scaleway-L4-1-24G-reference-gpu-pytorch-v2.3.1-default_config/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

CTuning

+

scaleway-L4-1-24G

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:CTuningAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency2040.000000 MHz
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacity21.95147705078125 GB
accelerator_memory_configurationN/A
accelerator_model_nameNVIDIA L4
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity48G
host_memory_configurationundefined
host_processor_cachesL1d cache: 512 KiB (8 instances), L1i cache: 512 KiB (8 instances), L2 cache: 4 MiB (8 instances), L3 cache: 128 MiB (8 instances)
host_processor_core_count8
host_processor_frequencyundefined
host_processor_interconnect
host_processor_model_nameAMD EPYC 7413 24-Core Processor
host_processors_per_node1

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_network_card_count1
host_networkingGig Ethernet
host_networking_topologyN/A

Software Details

frameworkpytorch v2.3.1
operating_systemUbuntu 22.04 (linux-5.15.0-116-generic-glibc2.35)
other_software_stackPython: 3.10.12, LLVM-15.0.6
sw_notescTuning.org/ae: Collective Mind demo for our reproducibility initiatives and artifact evaluation at ACM, IEEE and MLCommons ; automated by MLCommons CM v2.3.4 ; taken by Grigori Fursin
+ +

Results Table

Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
stable-diffusion-xl | - | - | Samples/s 0.125716
diff --git a/open/CTuning/results/cm-demo-gfursin-scaleway-L4-1-24G-reference-gpu-pytorch-v2.3.1-default_config/summary/summary.html b/open/CTuning/results/cm-demo-gfursin-scaleway-L4-1-24G-reference-gpu-pytorch-v2.3.1-default_config/summary/summary.html new file mode 100644 index 00000000..2e105e50 --- /dev/null +++ b/open/CTuning/results/cm-demo-gfursin-scaleway-L4-1-24G-reference-gpu-pytorch-v2.3.1-default_config/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

CTuning

+

scaleway-L4-1-24G

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:CTuningAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency2040.000000 MHz
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacity21.95147705078125 GB
accelerator_memory_configurationN/A
accelerator_model_nameNVIDIA L4
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity48G
host_memory_configurationundefined
host_processor_cachesL1d cache: 512 KiB (8 instances), L1i cache: 512 KiB (8 instances), L2 cache: 4 MiB (8 instances), L3 cache: 128 MiB (8 instances)
host_processor_core_count8
host_processor_frequencyundefined
host_processor_interconnect
host_processor_model_nameAMD EPYC 7413 24-Core Processor
host_processors_per_node1

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_network_card_count1
host_networkingGig Ethernet
host_networking_topologyN/A

Software Details

frameworkpytorch v2.3.1
operating_systemUbuntu 22.04 (linux-5.15.0-116-generic-glibc2.35)
other_software_stackPython: 3.10.12, LLVM-15.0.6
sw_notescTuning.org/ae: Collective Mind demo for our reproducibility initiatives and artifact evaluation at ACM, IEEE and MLCommons ; automated by MLCommons CM v2.3.4 ; taken by Grigori Fursin
+ +

Results Table

Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
stable-diffusion-xl | - | - | Samples/s 0.125716
+ + + + \ No newline at end of file diff --git a/open/HPE/results/HPE_Cray_XD670_H100_SXM_80GBx8_TRT/summary/README.md b/open/HPE/results/HPE_Cray_XD670_H100_SXM_80GBx8_TRT/summary/README.md new file mode 100644 index 00000000..1fb05bef --- /dev/null +++ b/open/HPE/results/HPE_Cray_XD670_H100_SXM_80GBx8_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

HPE

+

HPE Cray XD670 (8x H100-SXM-80GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:HPEAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectN/A
accelerator_interconnect8x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2048GB
host_memory_configuration32x 64GB MTC40F2046S1RC48BA1
host_processor_cachesL1d: 4.5 MiB (96 instances), L1i: 3 MiB (96 instances), L2: 192 MiB (96 instances), L3: 210 MiB (2 instances)
host_processor_core_count48
host_processor_frequency3.8GHz
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8468
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count10x 400Gbe Infiniband
host_networkingInfiniband
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_system22.04.4 LTS
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 555.42.06
sw_notes
+ +

Results Table

Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
gptj-99.9 | - | Tokens/s 19627.6 | Tokens/s 19889.7
diff --git a/open/HPE/results/HPE_Cray_XD670_H100_SXM_80GBx8_TRT/summary/summary.html b/open/HPE/results/HPE_Cray_XD670_H100_SXM_80GBx8_TRT/summary/summary.html new file mode 100644 index 00000000..097fb3e4 --- /dev/null +++ b/open/HPE/results/HPE_Cray_XD670_H100_SXM_80GBx8_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

HPE

+

HPE Cray XD670 (8x H100-SXM-80GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:HPEAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectN/A
accelerator_interconnect8x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2048GB
host_memory_configuration32x 64GB MTC40F2046S1RC48BA1
host_processor_cachesL1d: 4.5 MiB (96 instances), L1i: 3 MiB (96 instances), L2: 192 MiB (96 instances), L3: 210 MiB (2 instances)
host_processor_core_count48
host_processor_frequency3.8GHz
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8468
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count10x 400Gbe Infiniband
host_networkingInfiniband
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_system22.04.4 LTS
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 555.42.06
sw_notes
+ +

Results Table

Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
gptj-99.9 | - | Tokens/s 19627.6 | Tokens/s 19889.7
+ + + + \ No newline at end of file diff --git a/open/HPE/results/HPE_ProLiant_DL380a_H100_NVL_94GBx4_TRT/summary/README.md b/open/HPE/results/HPE_ProLiant_DL380a_H100_NVL_94GBx4_TRT/summary/README.md new file mode 100644 index 00000000..bf5d9d3d --- /dev/null +++ b/open/HPE/results/HPE_ProLiant_DL380a_H100_NVL_94GBx4_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

HPE

+

HPE Cray XD670 (8x H100-SXM-80GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:HPEAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectN/A
accelerator_interconnect8x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2048GB
host_memory_configuration32x 64GB MTC40F2046S1RC48BA1
host_processor_cachesL1d: 4.5 MiB (96 instances), L1i: 3 MiB (96 instances), L2: 192 MiB (96 instances), L3: 210 MiB (2 instances)
host_processor_core_count48
host_processor_frequency3.8GHz
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8468
host_processors_per_node2

Network and Interconnect Details

host_network_card_count10x 400Gbe Infiniband
host_networkingInfiniband
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_system22.04.4 LTS
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 555.42.06
sw_notes
+ +

Results Table

Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
diff --git a/open/HPE/results/HPE_ProLiant_DL380a_H100_NVL_94GBx4_TRT/summary/summary.html b/open/HPE/results/HPE_ProLiant_DL380a_H100_NVL_94GBx4_TRT/summary/summary.html new file mode 100644 index 00000000..40cc2650 --- /dev/null +++ b/open/HPE/results/HPE_ProLiant_DL380a_H100_NVL_94GBx4_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

HPE

+

HPE Cray XD670 (8x H100-SXM-80GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:HPEAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectN/A
accelerator_interconnect8x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity2048GB
host_memory_configuration32x 64GB MTC40F2046S1RC48BA1
host_processor_cachesL1d: 4.5 MiB (96 instances), L1i: 3 MiB (96 instances), L2: 192 MiB (96 instances), L3: 210 MiB (2 instances)
host_processor_core_count48
host_processor_frequency3.8GHz
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8468
host_processors_per_node2

Network and Interconnect Details

host_network_card_count10x 400Gbe Infiniband
host_networkingInfiniband
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 10.2.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_system22.04.4 LTS
other_software_stackTensorRT 10.2.0, CUDA 12.4, cuDNN 8.9.7, Driver 555.42.06
sw_notes
+ +

Results Table

Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
+ + + + \ No newline at end of file diff --git a/open/HPE/results/HPE_ProLiant_DL380a_L40S_PCIe_48GBx4_TRT/summary/README.md b/open/HPE/results/HPE_ProLiant_DL380a_L40S_PCIe_48GBx4_TRT/summary/README.md new file mode 100644 index 00000000..df0851b0 --- /dev/null +++ b/open/HPE/results/HPE_ProLiant_DL380a_L40S_PCIe_48GBx4_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

HPE

+

HPE ProLiant DL380a Gen11 (4x L40S-PCIe-48GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:HPEAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 Switch
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacity48 GB
accelerator_memory_configurationHBM2e
accelerator_model_nameNVIDIA L40S
accelerator_on-chip_memories
accelerators_per_node4

Processor and Memory Details

host_memory_capacity1024GB
host_memory_configuration16x 64GB 36ASF8G72PZ-3G2E1
host_processor_caches
host_processor_core_count60
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8580
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count7-slots of 2-port 200G Infiniband (Max 2800GB HDR) or 2-port 100G Ethernet (Max 1400GbE)
host_networking1Gbe
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 9.0.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.3
other_software_stackTensorRT 9.3.0, CUDA 12.4, Driver 535.183.01
sw_notes
+ +

Results Table

Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
stable-diffusion-xl | - | - | Samples/s 1.79467
3d-unet-99.9 | - | - | Samples/s 15.4277
diff --git a/open/HPE/results/HPE_ProLiant_DL380a_L40S_PCIe_48GBx4_TRT/summary/summary.html b/open/HPE/results/HPE_ProLiant_DL380a_L40S_PCIe_48GBx4_TRT/summary/summary.html new file mode 100644 index 00000000..a0af6136 --- /dev/null +++ b/open/HPE/results/HPE_ProLiant_DL380a_L40S_PCIe_48GBx4_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

HPE

+

HPE ProLiant DL380a Gen11 (4x L40S-PCIe-48GB, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:HPEAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 Switch
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacity48 GB
accelerator_memory_configurationHBM2e
accelerator_model_nameNVIDIA L40S
accelerator_on-chip_memories
accelerators_per_node4

Processor and Memory Details

host_memory_capacity1024GB
host_memory_configuration16x 64GB 36ASF8G72PZ-3G2E1
host_processor_caches
host_processor_core_count60
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8580
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllers
disk_drives
hw_notes
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count7-slots of 2-port 200G Infiniband (Max 2800GB HDR) or 2-port 100G Ethernet (Max 1400GbE)
host_networking1Gbe
host_networking_topologyN/A
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 9.0.0, CUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.3
other_software_stackTensorRT 9.3.0, CUDA 12.4, Driver 535.183.01
sw_notes
+ +

Results Table

Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
stable-diffusion-xl | - | - | Samples/s 1.79467
3d-unet-99.9 | - | - | Samples/s 15.4277
+ + + + \ No newline at end of file diff --git a/open/Krai/results/7920t-kilt-onnxruntime_cpu/summary/README.md b/open/Krai/results/7920t-kilt-onnxruntime_cpu/summary/README.md new file mode 100644 index 00000000..c0ecaa81 --- /dev/null +++ b/open/Krai/results/7920t-kilt-onnxruntime_cpu/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Krai

+

Dell Precision 7920 Tower

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:KraiAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequencyN/A
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topologyN/A
accelerator_memory_capacityN/A
accelerator_memory_configurationN/A
accelerator_model_nameN/A
accelerator_on-chip_memoriesN/A
accelerators_per_node0

Processor and Memory Details

host_memory_capacity96 GB
host_memory_configuration6x 16 GB
host_processor_cachesL1d cache: 768 KiB (24 instances); L1i cache: 768 KiB (24 instances); L2 cache: 24 MiB (24 instances); L3 cache: 35.8 MiB (1 instance)
host_processor_core_count24
host_processor_frequency1000 MHz (min); 2400 MHz (base); 4000 MHz (boost)
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz
host_processors_per_node1
host_processor_urlhttps://www.intel.com/content/www/us/en/products/sku/199343/intel-xeon-gold-6240r-processor-35-75m-cache-2-40-ghz/specifications.html

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_networkingEthernet
host_networking_topologyIntegrated
host_network_card_count1

Software Details

frameworkKRAI Inference Library Technology (KILT) with ONNX Runtime support
operating_systemUbuntu 22.04.4 LTS (Linux kernel 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux)
other_software_stackONNX Runtime v1.18.1; Python v3.9.19; GCC v11.4.0
sw_notesPowered by the KRAI X and KILT technologies
+ +

Results Table

Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
bert-99 | - | - | Samples/s 2.68543
resnet | - | - | Samples/s 149.84
diff --git a/open/Krai/results/7920t-kilt-onnxruntime_cpu/summary/summary.html b/open/Krai/results/7920t-kilt-onnxruntime_cpu/summary/summary.html new file mode 100644 index 00000000..fa393392 --- /dev/null +++ b/open/Krai/results/7920t-kilt-onnxruntime_cpu/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Krai

+

Dell Precision 7920 Tower

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:KraiAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequencyN/A
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topologyN/A
accelerator_memory_capacityN/A
accelerator_memory_configurationN/A
accelerator_model_nameN/A
accelerator_on-chip_memoriesN/A
accelerators_per_node0

Processor and Memory Details

host_memory_capacity96 GB
host_memory_configuration6x 16 GB
host_processor_cachesL1d cache: 768 KiB (24 instances); L1i cache: 768 KiB (24 instances); L2 cache: 24 MiB (24 instances); L3 cache: 35.8 MiB (1 instance)
host_processor_core_count24
host_processor_frequency1000 MHz (min); 2400 MHz (base); 4000 MHz (boost)
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz
host_processors_per_node1
host_processor_urlhttps://www.intel.com/content/www/us/en/products/sku/199343/intel-xeon-gold-6240r-processor-35-75m-cache-2-40-ghz/specifications.html

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_networkingEthernet
host_networking_topologyIntegrated
host_network_card_count1

Software Details

frameworkKRAI Inference Library Technology (KILT) with ONNX Runtime support
operating_systemUbuntu 22.04.4 LTS (Linux kernel 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux)
other_software_stackONNX Runtime v1.18.1; Python v3.9.19; GCC v11.4.0
sw_notesPowered by the KRAI X and KILT technologies
+ +

Results Table

Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
bert-99 | - | - | Samples/s 2.68543
resnet | - | - | Samples/s 149.84
+ + + + \ No newline at end of file diff --git a/open/Krai/results/7920t-kilt-onnxruntime_gpu/summary/README.md b/open/Krai/results/7920t-kilt-onnxruntime_gpu/summary/README.md new file mode 100644 index 00000000..3aa490b3 --- /dev/null +++ b/open/Krai/results/7920t-kilt-onnxruntime_gpu/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Krai

+

Dell Precision 7920 Tower (1x NVIDIA RTX A5000 GPU)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:KraiAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency1170 MHz (base); 1695 MHz (turbo)
accelerator_host_interconnectPCIe Gen 4
accelerator_interconnectNVIDIA NVLink
accelerator_interconnect_topology
accelerator_memory_capacity24 GB
accelerator_memory_configuration1x 24 GB
accelerator_model_nameNVIDIA RTX A5000 GPU
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity96 GB
host_memory_configuration6x 16 GB
host_processor_cachesL1d cache: 768 KiB (24 instances); L1i cache: 768 KiB (24 instances); L2 cache: 24 MiB (24 instances); L3 cache: 35.8 MiB (1 instance)
host_processor_core_count24
host_processor_frequency1000 MHz (min); 2400 MHz (base); 4000 MHz (boost)
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz
host_processors_per_node1
host_processor_urlhttps://www.intel.com/content/www/us/en/products/sku/199343/intel-xeon-gold-6240r-processor-35-75m-cache-2-40-ghz/specifications.html

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_networkingEthernet
host_networking_topologyIntegrated
host_network_card_count1

Software Details

frameworkKRAI Inference Library Technology (KILT) with ONNX Runtime support
operating_systemUbuntu 22.04.4 LTS (Linux kernel 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux)
other_software_stackCUDA v12.5; ONNX Runtime v1.18.1; Python v3.9.19; GCC v11.4.0
sw_notesPowered by the KRAI X and KILT technologies
+ +

Results Table

Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
bert-99 | - | - | Samples/s 65.5708
resnet | - | - | Samples/s 1090.09
diff --git a/open/Krai/results/7920t-kilt-onnxruntime_gpu/summary/summary.html b/open/Krai/results/7920t-kilt-onnxruntime_gpu/summary/summary.html new file mode 100644 index 00000000..aff4a788 --- /dev/null +++ b/open/Krai/results/7920t-kilt-onnxruntime_gpu/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Krai

+

Dell Precision 7920 Tower (1x NVIDIA RTX A5000 GPU)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:KraiAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency1170 MHz (base); 1695 MHz (turbo)
accelerator_host_interconnectPCIe Gen 4
accelerator_interconnectNVIDIA NVLink
accelerator_interconnect_topology
accelerator_memory_capacity24 GB
accelerator_memory_configuration1x 24 GB
accelerator_model_nameNVIDIA RTX A5000 GPU
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity96 GB
host_memory_configuration6x 16 GB
host_processor_cachesL1d cache: 768 KiB (24 instances); L1i cache: 768 KiB (24 instances); L2 cache: 24 MiB (24 instances); L3 cache: 35.8 MiB (1 instance)
host_processor_core_count24
host_processor_frequency1000 MHz (min); 2400 MHz (base); 4000 MHz (boost)
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz
host_processors_per_node1
host_processor_urlhttps://www.intel.com/content/www/us/en/products/sku/199343/intel-xeon-gold-6240r-processor-35-75m-cache-2-40-ghz/specifications.html

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_networkingEthernet
host_networking_topologyIntegrated
host_network_card_count1

Software Details

frameworkKRAI Inference Library Technology (KILT) with ONNX Runtime support
operating_systemUbuntu 22.04.4 LTS (Linux kernel 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux)
other_software_stackCUDA v12.5; ONNX Runtime v1.18.1; Python v3.9.19; GCC v11.4.0
sw_notesPowered by the KRAI X and KILT technologies
+ +

Results Table

Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
bert-99 | - | - | Samples/s 65.5708
resnet | - | - | Samples/s 1090.09
+ + + + \ No newline at end of file diff --git a/open/NVIDIA/results/H200-SXM-141GBx1_TRT_DepthPruned/summary/README.md b/open/NVIDIA/results/H200-SXM-141GBx1_TRT_DepthPruned/summary/README.md new file mode 100644 index 00000000..adebc30b --- /dev/null +++ b/open/NVIDIA/results/H200-SXM-141GBx1_TRT_DepthPruned/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NVIDIA

+

NVIDIA H200 (1x H200-SXM-141GB)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:NVIDIAAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity141 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H200-SXM-141GB
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB MTC40F2046S1RC48BA1
host_processor_caches
host_processor_core_count56
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8480C
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesH200 TGP 700W
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count10x 400Gbe Infiniband
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 504GB/s; PCIe-NIC: 500GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkCUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackCUDA 12.4, Driver 550.54
sw_notes
+ +

Results Table

Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
llama2-70b-99 | - | - | Tokens/s 11189.1
diff --git a/open/NVIDIA/results/H200-SXM-141GBx1_TRT_DepthPruned/summary/summary.html b/open/NVIDIA/results/H200-SXM-141GBx1_TRT_DepthPruned/summary/summary.html new file mode 100644 index 00000000..faa99eb7 --- /dev/null +++ b/open/NVIDIA/results/H200-SXM-141GBx1_TRT_DepthPruned/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NVIDIA

+

NVIDIA H200 (1x H200-SXM-141GB)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category:DatacenterMLPerf Inference Division:Closed
Submitted by:NVIDIAAvailability:Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity141 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H200-SXM-141GB
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB MTC40F2046S1RC48BA1
host_processor_caches
host_processor_core_count56
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8480C
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesH200 TGP 700W
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count10x 400Gbe Infiniband
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 504GB/s; PCIe-NIC: 500GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkCUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackCUDA 12.4, Driver 550.54
sw_notes
+ +

Results Table

+ + + + + + + + + + + + + +
Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
llama2-70b-99 | Tokens/s 11189.1
+ + + + \ No newline at end of file diff --git a/open/NVIDIA/results/H200-SXM-141GBx1_TRT_LCM/summary/README.md b/open/NVIDIA/results/H200-SXM-141GBx1_TRT_LCM/summary/README.md new file mode 100644 index 00000000..6c6aabeb --- /dev/null +++ b/open/NVIDIA/results/H200-SXM-141GBx1_TRT_LCM/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NVIDIA

+

NVIDIA H200 (1x H200-SXM-141GB)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: NVIDIA | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity141 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H200-SXM-141GB
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB MTC40F2046S1RC48BA1
host_processor_caches
host_processor_core_count56
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8480C
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesH200 TGP 700W
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count10x 400Gbe Infiniband
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 504GB/s; PCIe-NIC: 500GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkCUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackCUDA 12.4, Driver 550.54
sw_notes
+ +

Results Table

+ + + + + + + + + + + + + +
Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
stable-diffusion-xl | Samples/s 10.7891
diff --git a/open/NVIDIA/results/H200-SXM-141GBx1_TRT_LCM/summary/summary.html b/open/NVIDIA/results/H200-SXM-141GBx1_TRT_LCM/summary/summary.html new file mode 100644 index 00000000..8f7028ea --- /dev/null +++ b/open/NVIDIA/results/H200-SXM-141GBx1_TRT_LCM/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NVIDIA

+

NVIDIA H200 (1x H200-SXM-141GB)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: NVIDIA | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity141 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H200-SXM-141GB
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB MTC40F2046S1RC48BA1
host_processor_caches
host_processor_core_count56
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8480C
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesH200 TGP 700W
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count10x 400Gbe Infiniband
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 504GB/s; PCIe-NIC: 500GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkCUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackCUDA 12.4, Driver 550.54
sw_notes
+ +

Results Table

+ + + + + + + + + + + + + +
Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
stable-diffusion-xl | Samples/s 10.7891
+ + + + \ No newline at end of file diff --git a/open/NVIDIA/results/H200-SXM-141GBx1_TRT_Sparse/summary/README.md b/open/NVIDIA/results/H200-SXM-141GBx1_TRT_Sparse/summary/README.md new file mode 100644 index 00000000..6a0d9df2 --- /dev/null +++ b/open/NVIDIA/results/H200-SXM-141GBx1_TRT_Sparse/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NVIDIA

+

NVIDIA H200 (1x H200-SXM-141GB)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: NVIDIA | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity141 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H200-SXM-141GB
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB MTC40F2046S1RC48BA1
host_processor_caches
host_processor_core_count56
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8480C
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesH200 TGP 700W
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count10x 400Gbe Infiniband
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 504GB/s; PCIe-NIC: 500GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkCUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackCUDA 12.4, Driver 550.54
sw_notes
+ +

Results Table

+ + + + + + + + + + + + + +
Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
llama2-70b-99 | Tokens/s 4575.06
diff --git a/open/NVIDIA/results/H200-SXM-141GBx1_TRT_Sparse/summary/summary.html b/open/NVIDIA/results/H200-SXM-141GBx1_TRT_Sparse/summary/summary.html new file mode 100644 index 00000000..6808369f --- /dev/null +++ b/open/NVIDIA/results/H200-SXM-141GBx1_TRT_Sparse/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NVIDIA

+

NVIDIA H200 (1x H200-SXM-141GB)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: NVIDIA | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity141 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H200-SXM-141GB
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity2 TB
host_memory_configuration32x 64GB MTC40F2046S1RC48BA1
host_processor_caches
host_processor_core_count56
host_processor_frequency
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8480C
host_processors_per_node2

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesH200 TGP 700W
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count10x 400Gbe Infiniband
host_networkingInfiniband; Data bandwidth for GPU-PCIe: 504GB/s; PCIe-NIC: 500GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkCUDA 12.4
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.4
other_software_stackCUDA 12.4, Driver 550.54
sw_notes
+ +

Results Table

+ + + + + + + + + + + + + +
Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
llama2-70b-99 | Tokens/s 4575.06
+ + + + \ No newline at end of file diff --git a/open/NVIDIA/results/Orin_TRT_DepthPruned/summary/README.md b/open/NVIDIA/results/Orin_TRT_DepthPruned/summary/README.md new file mode 100644 index 00000000..2833d885 --- /dev/null +++ b/open/NVIDIA/results/Orin_TRT_DepthPruned/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NVIDIA

+

NVIDIA Jetson AGX Orin Developer Kit 64G (TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: NVIDIA | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacityShared with host
accelerator_memory_configurationLPDDR5
accelerator_model_nameNVIDIA Jetson AGX Orin 64G
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity64 GB
host_memory_configuration64GB 256-bit LPDDR5
host_processor_caches
host_processor_core_count12
host_processor_frequency
host_processor_interconnect
host_processor_model_name12-core ARM Cortex-A78AE CPU
host_processors_per_node1

Other Hardware Details

coolingAir-cooled
disk_controllerseMMC 5.1
disk_driveseMMC 5.1
hw_notesGPU and both DLAs are used in resnet50 and Retinanet, in Offline scenario
other_hardware
power_management
power_supply_detailsDell USB-C 130.0W Adapter (HA130PM170)
power_supply_quantity_and_rating_watts130W

Network and Interconnect Details

host_network_card_count1 Integrated
host_networkingGig Ethernet
host_networking_topologyUSB forwarded
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkJetpack 6.0, TensorRT 10.1, CUDA 12.2
management_firmware_version
nics_enabled_firmware
operating_systemJetson r36.3.1 L4T
other_software_stackJetpack 6.0, TensorRT 10.1, CUDA 12.2, cuDNN 8.9.4
sw_notes
+ +

Results Table

+ + + + + + + + + + + + + +
Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
llama2-70b-99 | Tokens/s 184.893
diff --git a/open/NVIDIA/results/Orin_TRT_DepthPruned/summary/summary.html b/open/NVIDIA/results/Orin_TRT_DepthPruned/summary/summary.html new file mode 100644 index 00000000..3d821d24 --- /dev/null +++ b/open/NVIDIA/results/Orin_TRT_DepthPruned/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NVIDIA

+

NVIDIA Jetson AGX Orin Developer Kit 64G (TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: NVIDIA | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacityShared with host
accelerator_memory_configurationLPDDR5
accelerator_model_nameNVIDIA Jetson AGX Orin 64G
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity64 GB
host_memory_configuration64GB 256-bit LPDDR5
host_processor_caches
host_processor_core_count12
host_processor_frequency
host_processor_interconnect
host_processor_model_name12-core ARM Cortex-A78AE CPU
host_processors_per_node1

Other Hardware Details

coolingAir-cooled
disk_controllerseMMC 5.1
disk_driveseMMC 5.1
hw_notesGPU and both DLAs are used in resnet50 and Retinanet, in Offline scenario
other_hardware
power_management
power_supply_detailsDell USB-C 130.0W Adapter (HA130PM170)
power_supply_quantity_and_rating_watts130W

Network and Interconnect Details

host_network_card_count1 Integrated
host_networkingGig Ethernet
host_networking_topologyUSB forwarded
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkJetpack 6.0, TensorRT 10.1, CUDA 12.2
management_firmware_version
nics_enabled_firmware
operating_systemJetson r36.3.1 L4T
other_software_stackJetpack 6.0, TensorRT 10.1, CUDA 12.2, cuDNN 8.9.4
sw_notes
+ +

Results Table

+ + + + + + + + + + + + + +
Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
llama2-70b-99 | Tokens/s 184.893
+ + + + \ No newline at end of file diff --git a/open/NeuralMagic/results/4xH100-SXM-80GB_vLLM_FP8-reference-cpu-pytorch-v2.3.1-default_config/summary/README.md b/open/NeuralMagic/results/4xH100-SXM-80GB_vLLM_FP8-reference-cpu-pytorch-v2.3.1-default_config/summary/README.md new file mode 100644 index 00000000..25b68768 --- /dev/null +++ b/open/NeuralMagic/results/4xH100-SXM-80GB_vLLM_FP8-reference-cpu-pytorch-v2.3.1-default_config/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NeuralMagic

+

PCSPECIALIST AMD AM5

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: NeuralMagic | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacityN/A
accelerator_memory_configurationN/A
accelerator_model_nameN/A
accelerator_on-chip_memories
accelerators_per_node0

Processor and Memory Details

host_memory_capacity128G
host_memory_configurationundefined
host_processor_cachesL1d cache: 512 KiB (16 instances), L1i cache: 512 KiB (16 instances), L2 cache: 16 MiB (16 instances), L3 cache: 64 MiB (2 instances)
host_processor_core_count16
host_processor_frequency5881.0000
host_processor_interconnect
host_processor_model_nameAMD Ryzen 9 7950X 16-Core Processor
host_processors_per_node1

Network and Interconnect Details

host_network_card_count1
host_networkingGig Ethernet
host_networking_topologyN/A

Software Details

frameworkdeepsparse v1.5.2
operating_systemUbuntu 22.04 (linux-6.5.0-41-generic-glibc2.35)
other_software_stackPython: 3.10.12, GCC-11.4.0
sw_notesAutomated by MLCommons CM v2.3.1.
+ +

Results Table

+ + + + + + + + + + + + + +
Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
llama2-70b-99 | Tokens/s 1468.24
diff --git a/open/NeuralMagic/results/4xH100-SXM-80GB_vLLM_FP8-reference-cpu-pytorch-v2.3.1-default_config/summary/summary.html b/open/NeuralMagic/results/4xH100-SXM-80GB_vLLM_FP8-reference-cpu-pytorch-v2.3.1-default_config/summary/summary.html new file mode 100644 index 00000000..3ae18950 --- /dev/null +++ b/open/NeuralMagic/results/4xH100-SXM-80GB_vLLM_FP8-reference-cpu-pytorch-v2.3.1-default_config/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NeuralMagic

+

PCSPECIALIST AMD AM5

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: NeuralMagic | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacityN/A
accelerator_memory_configurationN/A
accelerator_model_nameN/A
accelerator_on-chip_memories
accelerators_per_node0

Processor and Memory Details

host_memory_capacity128G
host_memory_configurationundefined
host_processor_cachesL1d cache: 512 KiB (16 instances), L1i cache: 512 KiB (16 instances), L2 cache: 16 MiB (16 instances), L3 cache: 64 MiB (2 instances)
host_processor_core_count16
host_processor_frequency5881.0000
host_processor_interconnect
host_processor_model_nameAMD Ryzen 9 7950X 16-Core Processor
host_processors_per_node1

Network and Interconnect Details

host_network_card_count1
host_networkingGig Ethernet
host_networking_topologyN/A

Software Details

frameworkdeepsparse v1.5.2
operating_systemUbuntu 22.04 (linux-6.5.0-41-generic-glibc2.35)
other_software_stackPython: 3.10.12, GCC-11.4.0
sw_notesAutomated by MLCommons CM v2.3.1.
+ +

Results Table

+ + + + + + + + + + + + + +
Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
llama2-70b-99 | Tokens/s 1468.24
+ + + + \ No newline at end of file diff --git a/open/NeuralMagic/results/4xH100-SXM-80GB_vLLM_GPTQ-reference-cpu-pytorch-v2.3.1-default_config/summary/README.md b/open/NeuralMagic/results/4xH100-SXM-80GB_vLLM_GPTQ-reference-cpu-pytorch-v2.3.1-default_config/summary/README.md new file mode 100644 index 00000000..31fc8802 --- /dev/null +++ b/open/NeuralMagic/results/4xH100-SXM-80GB_vLLM_GPTQ-reference-cpu-pytorch-v2.3.1-default_config/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NeuralMagic

+

SYS-821GE-TNHR H100 'beaker' (4x H100-SXM-80GB, vLLM, GPTQ)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: NeuralMagic | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node4

Processor and Memory Details

host_memory_capacity2.1T
host_memory_configurationundefined
host_processor_cachesL1d cache: 3 MiB (64 instances), L1i cache: 2 MiB (64 instances), L2 cache: 128 MiB (64 instances), L3 cache: 120 MiB (2 instances)
host_processor_core_count32
host_processor_frequency4100.0000
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8462Y+
host_processors_per_node2

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_network_card_count1
host_networkingGig Ethernet
host_networking_topologyN/A

Software Details

frameworkvLLM 0.5.2
operating_systemUbuntu 22.04 (linux-6.5.0-35-generic-glibc2.35)
other_software_stackPython: 3.10.12, LLVM-15.0.6
sw_notesAutomated by MLCommons CM v2.3.4.
+ +

Results Table

+ + + + + + + + + + + + + +
Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
llama2-70b-99 | Tokens/s 1164.04 (Server) | Tokens/s 1577.11 (Offline)
diff --git a/open/NeuralMagic/results/4xH100-SXM-80GB_vLLM_GPTQ-reference-cpu-pytorch-v2.3.1-default_config/summary/summary.html b/open/NeuralMagic/results/4xH100-SXM-80GB_vLLM_GPTQ-reference-cpu-pytorch-v2.3.1-default_config/summary/summary.html new file mode 100644 index 00000000..e10cb329 --- /dev/null +++ b/open/NeuralMagic/results/4xH100-SXM-80GB_vLLM_GPTQ-reference-cpu-pytorch-v2.3.1-default_config/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NeuralMagic

+

SYS-821GE-TNHR H100 'beaker' (4x H100-SXM-80GB, vLLM, GPTQ)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: NeuralMagic | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen5 x16
accelerator_interconnect18x 4th Gen NVLink, 900GB/s
accelerator_interconnect_topology
accelerator_memory_capacity80 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA H100-SXM-80GB
accelerator_on-chip_memories
accelerators_per_node4

Processor and Memory Details

host_memory_capacity2.1T
host_memory_configurationundefined
host_processor_cachesL1d cache: 3 MiB (64 instances), L1i cache: 2 MiB (64 instances), L2 cache: 128 MiB (64 instances), L3 cache: 120 MiB (2 instances)
host_processor_core_count32
host_processor_frequency4100.0000
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) Platinum 8462Y+
host_processors_per_node2

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_network_card_count1
host_networkingGig Ethernet
host_networking_topologyN/A

Software Details

frameworkvLLM 0.5.2
operating_systemUbuntu 22.04 (linux-6.5.0-35-generic-glibc2.35)
other_software_stackPython: 3.10.12, LLVM-15.0.6
sw_notesAutomated by MLCommons CM v2.3.4.
+ +

Results Table

+ + + + + + + + + + + + + +
Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
llama2-70b-99 | Tokens/s 1164.04 (Server) | Tokens/s 1577.11 (Offline)
+ + + + \ No newline at end of file diff --git a/open/NeuralMagic/results/ASUS_Vivobook-reference-cpu-deepsparse_v1.8.0-default_config/summary/README.md b/open/NeuralMagic/results/ASUS_Vivobook-reference-cpu-deepsparse_v1.8.0-default_config/summary/README.md new file mode 100644 index 00000000..2f7139c6 --- /dev/null +++ b/open/NeuralMagic/results/ASUS_Vivobook-reference-cpu-deepsparse_v1.8.0-default_config/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NeuralMagic

+

ASUS_Vivobook

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: NeuralMagic | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacityN/A
accelerator_memory_configurationN/A
accelerator_model_nameN/A
accelerator_on-chip_memories
accelerators_per_node0

Processor and Memory Details

host_memory_capacity19G
host_memory_configurationundefined
host_processor_cachesL1d cache: 192 KiB (6 instances), L1i cache: 192 KiB (6 instances), L2 cache: 3 MiB (6 instances), L3 cache: 8 MiB (2 instances)
host_processor_core_count6
host_processor_frequency4056.0000
host_processor_interconnect
host_processor_model_nameAMD Ryzen 5 5500U with Radeon Graphics
host_processors_per_node1

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_network_card_count1
host_networkingGig Ethernet
host_networking_topologyN/A

Software Details

frameworkdeepsparse v1.8.0
operating_systemUbuntu 22.04 (linux-6.5.0-44-generic-glibc2.35)
other_software_stackPython: 3.10.12, GCC-11.4.0
sw_notesAutomated by MLCommons CM v2.3.3.
+ +

Results Table

+ + + + + + + + + + + + + +
Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
bert-99 | Samples/s 16.1606
diff --git a/open/NeuralMagic/results/ASUS_Vivobook-reference-cpu-deepsparse_v1.8.0-default_config/summary/summary.html b/open/NeuralMagic/results/ASUS_Vivobook-reference-cpu-deepsparse_v1.8.0-default_config/summary/summary.html new file mode 100644 index 00000000..03ecb126 --- /dev/null +++ b/open/NeuralMagic/results/ASUS_Vivobook-reference-cpu-deepsparse_v1.8.0-default_config/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NeuralMagic

+

ASUS_Vivobook

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: NeuralMagic | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacityN/A
accelerator_memory_configurationN/A
accelerator_model_nameN/A
accelerator_on-chip_memories
accelerators_per_node0

Processor and Memory Details

host_memory_capacity19G
host_memory_configurationundefined
host_processor_cachesL1d cache: 192 KiB (6 instances), L1i cache: 192 KiB (6 instances), L2 cache: 3 MiB (6 instances), L3 cache: 8 MiB (2 instances)
host_processor_core_count6
host_processor_frequency4056.0000
host_processor_interconnect
host_processor_model_nameAMD Ryzen 5 5500U with Radeon Graphics
host_processors_per_node1

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_network_card_count1
host_networkingGig Ethernet
host_networking_topologyN/A

Software Details

frameworkdeepsparse v1.8.0
operating_systemUbuntu 22.04 (linux-6.5.0-44-generic-glibc2.35)
other_software_stackPython: 3.10.12, GCC-11.4.0
sw_notesAutomated by MLCommons CM v2.3.3.
+ +

Results Table

+ + + + + + + + + + + + + +
Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
bert-99 | Samples/s 16.1606
+ + + + \ No newline at end of file diff --git a/open/NeuralMagic/results/GO_2xRTX4090-reference-cpu-pytorch-v2.2.1-default_config/summary/README.md b/open/NeuralMagic/results/GO_2xRTX4090-reference-cpu-pytorch-v2.2.1-default_config/summary/README.md new file mode 100644 index 00000000..9290f970 --- /dev/null +++ b/open/NeuralMagic/results/GO_2xRTX4090-reference-cpu-pytorch-v2.2.1-default_config/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NeuralMagic

+

GATE Overflow Intel Sapphire Rapids RTX 4090 (2x RTX 4090, vLLM, FP8)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: NeuralMagic | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency2520.000000 MHz
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacity23.64703369140625 GB
accelerator_memory_configurationN/A
accelerator_model_nameNVIDIA GeForce RTX 4090
accelerator_on-chip_memories
accelerators_per_node2

Processor and Memory Details

host_memory_capacity202G
host_memory_configurationundefined
host_processor_cachesL1d cache: 1.1 MiB (24 instances), L1i cache: 768 KiB (24 instances), L2 cache: 48 MiB (24 instances), L3 cache: 45 MiB (1 instance)
host_processor_core_count24
host_processor_frequency4800.0000
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) w7-2495X
host_processors_per_node1

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_network_card_count1
host_networkingGig Ethernet
host_networking_topologyN/A

Software Details

frameworkvLLM 0.5.2
operating_systemUbuntu 23.04 (linux-6.2.0-39-generic-glibc2.37)
other_software_stackPython: 3.11.4, LLVM-10.0.1
sw_notesAutomated by MLCommons CM v2.3.1.
+ +

Results Table

+ + + + + + + + + + + + + +
Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
llama2-70b-99 | Tokens/s 424.895
llama2-70b-99.9 | Queries/s 1954.36 (Server) | Tokens/s 424.895 (Offline)
diff --git a/open/NeuralMagic/results/GO_2xRTX4090-reference-cpu-pytorch-v2.2.1-default_config/summary/summary.html b/open/NeuralMagic/results/GO_2xRTX4090-reference-cpu-pytorch-v2.2.1-default_config/summary/summary.html new file mode 100644 index 00000000..16f8a21e --- /dev/null +++ b/open/NeuralMagic/results/GO_2xRTX4090-reference-cpu-pytorch-v2.2.1-default_config/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NeuralMagic

+

GATE Overflow Intel Sapphire Rapids RTX 4090 (2x RTX 4090, vLLM, FP8)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: NeuralMagic | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency2520.000000 MHz
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacity23.64703369140625 GB
accelerator_memory_configurationN/A
accelerator_model_nameNVIDIA GeForce RTX 4090
accelerator_on-chip_memories
accelerators_per_node2

Processor and Memory Details

host_memory_capacity202G
host_memory_configurationundefined
host_processor_cachesL1d cache: 1.1 MiB (24 instances), L1i cache: 768 KiB (24 instances), L2 cache: 48 MiB (24 instances), L3 cache: 45 MiB (1 instance)
host_processor_core_count24
host_processor_frequency4800.0000
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) w7-2495X
host_processors_per_node1

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_network_card_count1
host_networkingGig Ethernet
host_networking_topologyN/A

Software Details

frameworkvLLM 0.5.2
operating_systemUbuntu 23.04 (linux-6.2.0-39-generic-glibc2.37)
other_software_stackPython: 3.11.4, LLVM-10.0.1
sw_notesAutomated by MLCommons CM v2.3.1.
+ +

Results Table

+ + + + + + + + + + + + + +
Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
llama2-70b-99 | Tokens/s 424.895
llama2-70b-99.9 | Queries/s 1954.36 (Server) | Tokens/s 424.895 (Offline)
+ + + + \ No newline at end of file diff --git a/open/NeuralMagic/results/GO_Intel_SPR-intel-cpu-pytorch-vdefault-default_config/summary/README.md b/open/NeuralMagic/results/GO_Intel_SPR-intel-cpu-pytorch-vdefault-default_config/summary/README.md new file mode 100644 index 00000000..4760b333 --- /dev/null +++ b/open/NeuralMagic/results/GO_Intel_SPR-intel-cpu-pytorch-vdefault-default_config/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NeuralMagic

+

GO Intel SPR 1S 24C

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: NeuralMagic | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacityN/A
accelerator_memory_configurationN/A
accelerator_model_nameN/A
accelerator_on-chip_memories
accelerators_per_node0

Processor and Memory Details

host_memory_capacity202G
host_memory_configurationundefined
host_processor_cachesL1d cache: 1.1 MiB (24 instances), L1i cache: 768 KiB (24 instances), L2 cache: 48 MiB (24 instances), L3 cache: 45 MiB (1 instance)
host_processor_core_count24
host_processor_frequency4800.0000
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) w7-2495X
host_processors_per_node1

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_network_card_count1
host_networkingGig Ethernet
host_networking_topologyN/A

Software Details

frameworkpytorch v2.2.1
operating_systemUbuntu 23.04 (linux-6.2.0-39-generic-glibc2.37)
other_software_stackPython: 3.11.4, LLVM-10.0.1
sw_notesAutomated by MLCommons CM v2.3.1.
+ +

Results Table

+ + + + + + + + + + + + + +
Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
dlrm-v2-99 | Samples/s 822.557
diff --git a/open/NeuralMagic/results/GO_Intel_SPR-intel-cpu-pytorch-vdefault-default_config/summary/summary.html b/open/NeuralMagic/results/GO_Intel_SPR-intel-cpu-pytorch-vdefault-default_config/summary/summary.html new file mode 100644 index 00000000..7e1bbf58 --- /dev/null +++ b/open/NeuralMagic/results/GO_Intel_SPR-intel-cpu-pytorch-vdefault-default_config/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NeuralMagic

+

GO Intel SPR 1S 24C

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: NeuralMagic | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacityN/A
accelerator_memory_configurationN/A
accelerator_model_nameN/A
accelerator_on-chip_memories
accelerators_per_node0

Processor and Memory Details

host_memory_capacity202G
host_memory_configurationundefined
host_processor_cachesL1d cache: 1.1 MiB (24 instances), L1i cache: 768 KiB (24 instances), L2 cache: 48 MiB (24 instances), L3 cache: 45 MiB (1 instance)
host_processor_core_count24
host_processor_frequency4800.0000
host_processor_interconnect
host_processor_model_nameIntel(R) Xeon(R) w7-2495X
host_processors_per_node1

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_network_card_count1
host_networkingGig Ethernet
host_networking_topologyN/A

Software Details

frameworkpytorch v2.2.1
operating_systemUbuntu 23.04 (linux-6.2.0-39-generic-glibc2.37)
other_software_stackPython: 3.11.4, LLVM-10.0.1
sw_notesAutomated by MLCommons CM v2.3.1.
+ +

Results Table

+ + + + + + + + + + + + + +
Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
dlrm-v2-99 | Samples/s 822.557
+ + + + \ No newline at end of file diff --git a/open/NeuralMagic/results/pcspecialist_amd_am5-reference-gpu-pytorch-v2.2.1-default_config/summary/README.md b/open/NeuralMagic/results/pcspecialist_amd_am5-reference-gpu-pytorch-v2.2.1-default_config/summary/README.md new file mode 100644 index 00000000..41ae9327 --- /dev/null +++ b/open/NeuralMagic/results/pcspecialist_amd_am5-reference-gpu-pytorch-v2.2.1-default_config/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NeuralMagic

+

PCSPECIALIST AMD AM5 (1x RTX 4090)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: NeuralMagic | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency2610.000000 MHz
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacity23.64971923828125 GB
accelerator_memory_configurationN/A
accelerator_model_nameNVIDIA GeForce RTX 4090
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity128G
host_memory_configurationundefined
host_processor_cachesL1d cache: 512 KiB (16 instances), L1i cache: 512 KiB (16 instances), L2 cache: 16 MiB (16 instances), L3 cache: 64 MiB (2 instances)
host_processor_core_count16
host_processor_frequency5881.0000
host_processor_interconnect
host_processor_model_nameAMD Ryzen 9 7950X 16-Core Processor
host_processors_per_node1

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_network_card_count1
host_networkingGig Ethernet
host_networking_topologyN/A

Software Details

frameworkdeepsparse v1.5.2
operating_systemUbuntu 22.04 (linux-6.5.0-41-generic-glibc2.35)
other_software_stackPython: 3.10.12, GCC-11.4.0
sw_notesAutomated by MLCommons CM v2.3.1.
+ +

Results Table

+ + + + + + + + + + + + + +
Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
llama2-70b-99.9 | Queries/s 1429.85 (Server) | Samples/s 1337.46 (Offline)
diff --git a/open/NeuralMagic/results/pcspecialist_amd_am5-reference-gpu-pytorch-v2.2.1-default_config/summary/summary.html b/open/NeuralMagic/results/pcspecialist_amd_am5-reference-gpu-pytorch-v2.2.1-default_config/summary/summary.html new file mode 100644 index 00000000..a3cf8501 --- /dev/null +++ b/open/NeuralMagic/results/pcspecialist_amd_am5-reference-gpu-pytorch-v2.2.1-default_config/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NeuralMagic

+

PCSPECIALIST AMD AM5 (1x RTX 4090)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: NeuralMagic | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency2610.000000 MHz
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacity23.64971923828125 GB
accelerator_memory_configurationN/A
accelerator_model_nameNVIDIA GeForce RTX 4090
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity128G
host_memory_configurationundefined
host_processor_cachesL1d cache: 512 KiB (16 instances), L1i cache: 512 KiB (16 instances), L2 cache: 16 MiB (16 instances), L3 cache: 64 MiB (2 instances)
host_processor_core_count16
host_processor_frequency5881.0000
host_processor_interconnect
host_processor_model_nameAMD Ryzen 9 7950X 16-Core Processor
host_processors_per_node1

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_network_card_count1
host_networkingGig Ethernet
host_networking_topologyN/A

Software Details

frameworkdeepsparse v1.5.2
operating_systemUbuntu 22.04 (linux-6.5.0-41-generic-glibc2.35)
other_software_stackPython: 3.10.12, GCC-11.4.0
sw_notesAutomated by MLCommons CM v2.3.1.
+ +

Results Table

+ + + + + + + + + + + + + +
Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
llama2-70b-99.9 | Queries/s 1429.85 (Server) | Samples/s 1337.46 (Offline)
+ + + + \ No newline at end of file diff --git a/open/NeuralMagic/results/phoenix_Amd_Am5-reference-cpu-deepsparse-vdefault-default_config/summary/README.md b/open/NeuralMagic/results/phoenix_Amd_Am5-reference-cpu-deepsparse-vdefault-default_config/summary/README.md new file mode 100644 index 00000000..99855dd8 --- /dev/null +++ b/open/NeuralMagic/results/phoenix_Amd_Am5-reference-cpu-deepsparse-vdefault-default_config/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NeuralMagic

+

PCSPECIALIST AMD AM5

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: NeuralMagic | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacityN/A
accelerator_memory_configurationN/A
accelerator_model_nameN/A
accelerator_on-chip_memories
accelerators_per_node0

Processor and Memory Details

host_memory_capacity128G
host_memory_configurationundefined
host_processor_cachesL1d cache: 512 KiB (16 instances), L1i cache: 512 KiB (16 instances), L2 cache: 16 MiB (16 instances), L3 cache: 64 MiB (2 instances)
host_processor_core_count16
host_processor_frequency5881.0000
host_processor_interconnect
host_processor_model_nameAMD Ryzen 9 7950X 16-Core Processor
host_processors_per_node1

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_network_card_count1
host_networkingGig Ethernet
host_networking_topologyN/A

Software Details

frameworkdeepsparse v1.5.2
operating_systemUbuntu 22.04 (linux-6.5.0-41-generic-glibc2.35)
other_software_stackPython: 3.10.12, GCC-11.4.0
sw_notesAutomated by MLCommons CM v2.3.1.
+ +

Results Table

+ + + + + + + + + + + + + +
Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
bert-99 | Samples/s 98.963
diff --git a/open/NeuralMagic/results/phoenix_Amd_Am5-reference-cpu-deepsparse-vdefault-default_config/summary/summary.html b/open/NeuralMagic/results/phoenix_Amd_Am5-reference-cpu-deepsparse-vdefault-default_config/summary/summary.html new file mode 100644 index 00000000..64839abf --- /dev/null +++ b/open/NeuralMagic/results/phoenix_Amd_Am5-reference-cpu-deepsparse-vdefault-default_config/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NeuralMagic

+

PCSPECIALIST AMD AM5

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: NeuralMagic | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectN/A
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacityN/A
accelerator_memory_configurationN/A
accelerator_model_nameN/A
accelerator_on-chip_memories
accelerators_per_node0

Processor and Memory Details

host_memory_capacity128G
host_memory_configurationundefined
host_processor_cachesL1d cache: 512 KiB (16 instances), L1i cache: 512 KiB (16 instances), L2 cache: 16 MiB (16 instances), L3 cache: 64 MiB (2 instances)
host_processor_core_count16
host_processor_frequency5881.0000
host_processor_interconnect
host_processor_model_nameAMD Ryzen 9 7950X 16-Core Processor
host_processors_per_node1

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_network_card_count1
host_networkingGig Ethernet
host_networking_topologyN/A

Software Details

frameworkdeepsparse v1.5.2
operating_systemUbuntu 22.04 (linux-6.5.0-41-generic-glibc2.35)
other_software_stackPython: 3.10.12, GCC-11.4.0
sw_notesAutomated by MLCommons CM v2.3.1.
+ +

Results Table

+ + + + + + + + + + + + + +
Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
bert-99 | Samples/s 98.963
+ + + + \ No newline at end of file diff --git a/open/NeuralMagic/results/vLLM_8xL40S-reference-cpu-pytorch-v2.3.1-default_config/summary/README.md b/open/NeuralMagic/results/vLLM_8xL40S-reference-cpu-pytorch-v2.3.1-default_config/summary/README.md new file mode 100644 index 00000000..912f0c8f --- /dev/null +++ b/open/NeuralMagic/results/vLLM_8xL40S-reference-cpu-pytorch-v2.3.1-default_config/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NeuralMagic

+

Crusoe Cloud L40S (8x L40S PCIe, vLLM)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: NeuralMagic | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen4 x16
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacity48 GB
accelerator_memory_configurationGDDR6
accelerator_model_nameNVIDIA L40S
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity1.5T
host_memory_configurationundefined
host_processor_cachesL1d cache: 1.3 MiB (40 instances), L1i cache: 1.3 MiB (40 instances), L2 cache: 40 MiB (40 instances), L3 cache: 160 MiB (5 instances)
host_processor_core_count4
host_processor_frequencyundefined
host_processor_interconnect
host_processor_model_nameAMD EPYC 9254 24-Core Processor
host_processors_per_node2

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_network_card_count1
host_networkingGig Ethernet
host_networking_topologyN/A

Software Details

frameworkvLLM 0.5.2
operating_systemUbuntu 22.04 (linux-5.15.0-94-generic-glibc2.35)
other_software_stackPython: 3.10.12, LLVM-15.0.6
sw_notesAutomated by MLCommons CM v2.3.3.
+ +

Results Table

+ + + + + + + + + + + + + +
Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
llama2-70b-99 | Tokens/s 923.333
llama2-70b-99.9 | Tokens/s 923.333
diff --git a/open/NeuralMagic/results/vLLM_8xL40S-reference-cpu-pytorch-v2.3.1-default_config/summary/summary.html b/open/NeuralMagic/results/vLLM_8xL40S-reference-cpu-pytorch-v2.3.1-default_config/summary/summary.html new file mode 100644 index 00000000..100cb5b4 --- /dev/null +++ b/open/NeuralMagic/results/vLLM_8xL40S-reference-cpu-pytorch-v2.3.1-default_config/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

NeuralMagic

+

Crusoe Cloud L40S (8x L40S PCIe, vLLM)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: NeuralMagic | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectPCIe Gen4 x16
accelerator_interconnectN/A
accelerator_interconnect_topology
accelerator_memory_capacity48 GB
accelerator_memory_configurationGDDR6
accelerator_model_nameNVIDIA L40S
accelerator_on-chip_memories
accelerators_per_node8

Processor and Memory Details

host_memory_capacity1.5T
host_memory_configurationundefined
host_processor_cachesL1d cache: 1.3 MiB (40 instances), L1i cache: 1.3 MiB (40 instances), L2 cache: 40 MiB (40 instances), L3 cache: 160 MiB (5 instances)
host_processor_core_count4
host_processor_frequencyundefined
host_processor_interconnect
host_processor_model_nameAMD EPYC 9254 24-Core Processor
host_processors_per_node2

Other Hardware Details

coolingair
hw_notes

Network and Interconnect Details

host_network_card_count1
host_networkingGig Ethernet
host_networking_topologyN/A

Software Details

frameworkvLLM 0.5.2
operating_systemUbuntu 22.04 (linux-5.15.0-94-generic-glibc2.35)
other_software_stackPython: 3.10.12, LLVM-15.0.6
sw_notesAutomated by MLCommons CM v2.3.3.
+ +

Results Table

+ + + + + + + + + + + + + +
Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
llama2-70b-99 | Tokens/s 923.333
llama2-70b-99.9 | Tokens/s 923.333
+ + + + \ No newline at end of file diff --git a/open/Oracle/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/README.md b/open/Oracle/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/README.md new file mode 100644 index 00000000..12a6a228 --- /dev/null +++ b/open/Oracle/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/README.md @@ -0,0 +1,58 @@ + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Oracle

+

NVIDIA GH200-GraceHopper-Superchip (1x GH200-96GB_aarch64, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Oracle | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectNVLink-C2C
accelerator_interconnect1x 400Gbe Infiniband
accelerator_interconnect_topology
accelerator_memory_capacity96 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA GH200 Grace Hopper Superchip 96GB
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity512 GB
host_memory_configuration16x 16DP (32GB) LPDDR5x
host_processor_caches
host_processor_core_count72
host_processor_frequency
host_processor_interconnect
host_processor_model_nameNVIDIA Grace CPU
host_processors_per_node1

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesNVIDIA MGX Reference Platform;
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count1x 10Gbe Intel Ethernet X550T
host_networkingEthernet; Data bandwidth for GPU-NIC is 252.06 GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 9.3.0, CUDA 12.2
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.2
other_software_stackTensorRT 9.3.0, CUDA 12.2, cuDNN 8.9.6, Driver 535.65, DALI 1.28.0
sw_notes
+ +

Results Table

+ + + + + + + + + + + + + +
Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
stable-diffusion-xl | Queries/s 1.80046 (Server) | Samples/s 2.32815 (Offline)
dlrm-v2-99 | Samples/s 62061.8
dlrm-v2-99.9 | Samples/s 40885.9
diff --git a/open/Oracle/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/summary.html b/open/Oracle/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/summary.html new file mode 100644 index 00000000..f4af1c69 --- /dev/null +++ b/open/Oracle/results/GH200-GraceHopper-Superchip_GH200-96GB_aarch64x1_TRT/summary/summary.html @@ -0,0 +1,126 @@ + + + + + + + + + + +
+
+ +
+

MLPerf Inference v4.1

+

Copyright 2019-2024 MLCommons

+
+
+ + + + +
+

Oracle

+

NVIDIA GH200-GraceHopper-Superchip (1x GH200-96GB_aarch64, TensorRT)

+
+ + + + + + + + + + + + + + + +
MLPerf Inference Category: Datacenter | MLPerf Inference Division: Closed
Submitted by: Oracle | Availability: Available as of Aug 2024
+ + + + + +

Accelerator Details

accelerator_frequency
accelerator_host_interconnectNVLink-C2C
accelerator_interconnect1x 400Gbe Infiniband
accelerator_interconnect_topology
accelerator_memory_capacity96 GB
accelerator_memory_configurationHBM3
accelerator_model_nameNVIDIA GH200 Grace Hopper Superchip 96GB
accelerator_on-chip_memories
accelerators_per_node1

Processor and Memory Details

host_memory_capacity512 GB
host_memory_configuration16x 16DP (32GB) LPDDR5x
host_processor_caches
host_processor_core_count72
host_processor_frequency
host_processor_interconnect
host_processor_model_nameNVIDIA Grace CPU
host_processors_per_node1

Other Hardware Details

coolingAir-cooled
disk_controllersNVMe
disk_drivesSSD
hw_notesNVIDIA MGX Reference Platform;
other_hardware
power_management
power_supply_details
power_supply_quantity_and_rating_watts

Network and Interconnect Details

host_network_card_count1x 10Gbe Intel Ethernet X550T
host_networkingEthernet; Data bandwidth for GPU-NIC is 252.06 GB/s
host_networking_topologyEthernet/Infiniband on switching network
network_speed_mbit
nics_enabled_connected
nics_enabled_firmware
nics_enabled_os
number_of_type_nics_installed

Software Details

boot_firmware_version
frameworkTensorRT 9.3.0, CUDA 12.2
management_firmware_version
nics_enabled_firmware
operating_systemUbuntu 22.04.2
other_software_stackTensorRT 9.3.0, CUDA 12.2, cuDNN 8.9.6, Driver 535.65, DALI 1.28.0
sw_notes
+ +

Results Table

+ + + + + + + + + + + + + +
Model | Accuracy Target | Server (Metric, Performance) | Offline (Metric, Performance)
stable-diffusion-xl | Queries/s 1.80046 (Server) | Samples/s 2.32815 (Offline)
dlrm-v2-99 | Samples/s 62061.8
dlrm-v2-99.9 | Samples/s 40885.9
+ + + + \ No newline at end of file