NVIDIA DGX Servers and Supercomputers

NVIDIA DGX A100 and Selene

GTC Sessions:

  • Introducing NVIDIA DGX A100: the Universal AI System for Enterprise S21702
  • Under the Hood of the new DGX A100 System Architecture S21884
  • NVIDIA Selene: Leadership-Class AI Supercomputing Infrastructure S31844
  • Scheduling, Resource-Managing, and Monitoring Selene, a Supercomputer for Large-Scale DL and HPC S31700
  • Accelerating AI at-scale with Selene DGXA100 SuperPOD and Parallel Filesystem * Storage S31522
  • Advanced containerized workloads in HPC environment: the Selene example S31704

Hot Chips

  • Hot Chips Tutorial - Scale Out Training Experiences – Megatron Language Model YouTube

    • Part I: Scale Out Systems

      • DGX A100 SuperPOD, Michael Houston, NVIDIA
      • Google TPU Pod, Sameer Kumar and Dehao Chen, Google
      • Cerebras System, Natalia Vassilieva, Cerebras
    • Part II: Scale Out Training Experiences

      • Megatron Language Model, Mohammad Shoeybi, NVIDIA
      • Distributed Parameter Server for Massive Recommender System; Weijie Zhao, Baidu
      • GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding; Zhifeng Chen, Google
  • Hot Chips Session - NVIDIA’s A100 GPU: Performance and Innovation for GPU Computing

Distributed HPC Applications with Unprivileged Containers