Pedro Louro edited this page May 19, 2023 · 7 revisions

Thanks to FCT and other funding sources, our research group has access to a high-performance GPU/Deep Learning server for running complex computations. The server is available to researchers, but please note that it is shared between users (MIR/MER and Health Informatics), so follow the netiquette rules defined below.

Netiquette Rules

  1. Do not hoard resources for your tasks (GPUs / Memory / CPU cores)
    • The general rule is to use 1 GPU per user and a reasonable number of CPU cores
    • If you need more, at least warn your colleagues (Skype group) and try to give an estimate (how much and for how long?)
  2. Design, plan and test your experiments with smaller subsets of data to confirm they work before going all in for weeks
    • Not cool to hoard resources for long without at least getting results
  3. Do not reboot the machine without asking, especially if there are tasks running
    • Similarly, be careful when changing software or killing processes
  4. If editing files remotely (e.g., using remote desktop), make sure you save your work before leaving.
  5. For long experiments use checkpoints (save/load progress) and logs. This saves time in case your process goes down. Random TensorFlow/Keras example here.
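
As a framework-agnostic illustration of rule 5, here is a minimal sketch of checkpointing with logging that uses only the Python standard library. The file names (checkpoint.json, experiment.log), the run_experiment function and the stand-in work step are all hypothetical; with TensorFlow/Keras or another framework you would save the model state through the framework's own mechanism instead.

```python
import json
import logging
import os

CHECKPOINT = "checkpoint.json"  # hypothetical file name


def load_checkpoint(path=CHECKPOINT):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def save_checkpoint(state, path=CHECKPOINT):
    """Write to a temp file, then rename, so a crash never leaves a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run_experiment(n_steps, path=CHECKPOINT, every=10):
    """Resume from the last checkpoint and save progress every `every` steps."""
    state = load_checkpoint(path)
    for step in range(state["step"], n_steps):
        state["total"] += step  # stand-in for one unit of real work
        state["step"] = step + 1
        if state["step"] % every == 0 or state["step"] == n_steps:
            save_checkpoint(state, path)
            logging.info("checkpoint at step %d", state["step"])
    return state


if __name__ == "__main__":
    logging.basicConfig(filename="experiment.log", level=logging.INFO)
    print(run_experiment(100))
```

If the process is killed and started again, it continues from the last saved step instead of from zero, and the log shows how far it got.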

Hardware Description

SuperServer 4029GP-TRT2 - 4U Dual Processor (Intel), Single-Root GPU System with Up to 8 PCI-E GPUs. Currently with:

  • CPU: 2x Intel® Xeon® Silver 4214 Processor (lscpu)
    • Frequency: 2.20 GHz (Turbo 3.20 GHz)
    • Total Cores: 24 (12 per CPU)
    • Total Threads: 48 (24 per CPU)
    • Cache: 16.5 MB (per CPU)
  • RAM: 320 GB DDR4 (sudo lshw -short -C memory)
    • 8 x 32GiB DIMM DDR4 Synchronous 2666 MHz (0.4 ns)
    • 2 x 32GiB DIMM DDR4 Synchronous 3200 MHz (0.3 ns)
  • GPUs: 8 installed (lspci | grep -i --color 'vga\|3d\|2d' and nvidia-smi)
    • 5 x NVIDIA RTX A5000 24GB GDDR6
      • 8192 CUDA Cores ; 24GB GDDR6 ; 27.8 TFLOPS
    • 3 x NVIDIA Quadro P5000 16GB GDDR5X
      • 2560 CUDA Cores ; 16 GB GDDR5X ; 8.9 TFLOPS
  • HDD: 20+TB (lsblk -f, sudo hwinfo --disk --short, cat /proc/mdstat)
    • 2x Intel SSD DC S4500 Series 480GB 2.5in SATA 6Gb/s
      • RAID 1, mounted at / (OS)
    • 6x Seagate 4TB Barracuda 2.5in 5400rpm SATA III 128MB (ST4000LM024)
      • RAID 5, mounted at /home (user files)
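
Since the 8 GPUs are shared, one way to follow the 1-GPU-per-user rule is to check which card is least used before starting and expose only that card to your process. The sketch below parses the CSV output of nvidia-smi and sets CUDA_VISIBLE_DEVICES accordingly. The helper names are hypothetical, and choosing by lowest memory use is only one possible heuristic; it also assumes every GPU reports numeric values for both fields.

```python
import os
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_gpu_usage(csv_text):
    """Parse lines like '0, 1234, 56' into (index, memory_used_mib, util_percent)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, mem, util = (int(field.strip()) for field in line.split(","))
        gpus.append((index, mem, util))
    return gpus


def least_used_gpu(gpus):
    """Pick the GPU with the lowest memory use, breaking ties by utilization."""
    return min(gpus, key=lambda g: (g[1], g[2]))[0]


def select_gpu():
    """Query nvidia-smi and restrict this process to the least-used GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    index = least_used_gpu(parse_gpu_usage(out))
    # Make CUDA number devices in PCI bus order, as nvidia-smi does; the
    # default order (fastest first) can differ on a mixed A5000/P5000 machine.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Must be set before the deep learning framework initializes CUDA.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return index


if __name__ == "__main__":
    print("Using GPU", select_gpu())
```

Call select_gpu() at the very top of a script, before importing TensorFlow or PyTorch. A GPU that looks idle may still be reserved by a colleague, so this does not replace checking with the group.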

Available Software

Several packages are already installed. Should we install software for all users, or should each user install what they need?

  • NVIDIA-SMI 520.61.05 / Driver Version: 520.61.05 / CUDA Version: 11.8
  • MATLAB R2021b Update 1 (9.11.0.1809720) 64-bit (glnxa64)
    • It will probably ask you for a personal license; check MATLAB activation @ helpdesk
    • Can also be used over SSH just to run scripts (explain how here?)
    • MATLAB R2014b is also installed (ll /usr/local/bin/matlab_r2014b)
  • Python 3.8
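
On running MATLAB scripts over SSH without the desktop: one possible approach is MATLAB's -batch startup option, called here from Python so that it can sit inside an experiment pipeline. This is a sketch under assumptions: that the matlab executable on the PATH is the R2021b install, and that my_script is a hypothetical my_script.m in the current folder. The helper names are also hypothetical.

```python
import subprocess


def matlab_batch_command(statement, executable="matlab"):
    """Build the argv for a non-interactive MATLAB run.

    -batch (available since R2019a) runs the statement without the desktop
    and exits with a non-zero status if the statement raises an error.
    """
    return [executable, "-batch", statement]


def run_matlab(statement, executable="matlab"):
    """Run a MATLAB statement and return its captured standard output."""
    result = subprocess.run(
        matlab_batch_command(statement, executable),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if MATLAB reports failure
    )
    return result.stdout


if __name__ == "__main__":
    # "my_script" is a hypothetical script name, not a file on the server.
    print(run_matlab("my_script"))
```

The -batch option does not exist in R2014b; for that version the older combination of -nodisplay and -r with an explicit exit at the end of the statement would be needed instead.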