This is DUMSES - hybrid version

Copyright 2013-2022, CEA, Authors (see the AUTHORS file)

DUMSES is a 3D MPI/OpenMP & MPI/OpenACC Eulerian second-order Godunov (magneto)hydrodynamic simulation code in Cartesian, spherical and cylindrical coordinates.

CONFIGURATION/COMPILATION

Configuration is done with Autotools; run ./configure --help for more details and the available options. Note that MPI and OpenMP are activated by default, and that explicitly enabling OpenACC deactivates OpenMP. The default problem is magnetic_loop.
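
For a default build (MPI and OpenMP enabled, magnetic_loop problem), the usual Autotools sequence should be enough:

$ ./configure
$ make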

If you want to compile the stratified problem with GCC and MPI:

$ FC=gfortran ./configure --with-problem=stratified

If you want to compile with NVHPC and OpenACC:

$ FC=nvfortran ./configure --enable-openacc

Note that the default optimization level is -O3. If you define FCFLAGS explicitly, you will have to specify the optimization level yourself. With NVHPC, targeting GPUs, it is advised to use FCFLAGS='-O3 -gpu=<cc-arch>,nordc'; the -gpu=<cc-arch>,nordc flag allows you to gain a few percent of performance.
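
For instance, a GPU build could look like the following (cc80 is only an example targeting an A100; replace it with the compute capability of your GPU):

$ FC=nvfortran FCFLAGS='-O3 -gpu=cc80,nordc' ./configure --enable-openacc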

Note also that CUDA-aware MPI (direct communication of device variables on the GPU) is activated by default when both MPI and OpenACC are enabled. For now, no check is performed to verify that the given MPI implementation actually is CUDA-aware.

If you want to check whether your MPI implementation is CUDA-aware, you can compile and run this simple example:

program ring
  ! Minimal CUDA-aware MPI test: each rank doubles a device buffer with
  ! OpenACC, then exchanges device addresses directly through MPI_Sendrecv.
  use mpi
  use openacc
  implicit none

  integer :: nproc, rank, comm, ierr
  integer, dimension(MPI_STATUS_SIZE) :: status
  integer :: ngpu
  integer, allocatable, dimension(:) :: buf

  call MPI_Init(ierr)
  comm = MPI_COMM_WORLD

  call MPI_Comm_size(comm, nproc, ierr)
  call MPI_Comm_rank(comm, rank, ierr)
  if (rank == 0) print '("Running with ", I3, " MPI processes")', nproc

  ngpu = acc_get_num_devices(acc_device_nvidia)
  if (rank == 0) print '("OpenACC - # of devices available: ", I2)', ngpu
  call MPI_Barrier(comm, ierr)

  ! Bind each rank to a device, round-robin if there are more ranks than GPUs
  if (ngpu > 0) call acc_set_device_num(mod(rank, ngpu), acc_device_nvidia)

  allocate(buf(2))
  buf = rank

  !$acc data copy(buf)

  ! Update the buffer on the device
  !$acc kernels
  buf = buf*2
  !$acc end kernels

  ! Pass device addresses to MPI; this only works if MPI is CUDA-aware
  !$acc host_data use_device(buf)
  call MPI_Sendrecv(buf(1), 1, MPI_INTEGER, modulo(rank+1, nproc), 1, &
                    buf(2), 1, MPI_INTEGER, modulo(rank-1, nproc), 1, &
                    comm, status, ierr)
  !$acc end host_data
  !$acc end data

  print '("rank: ", I2, " - send and recv: ", I3, I3)', rank, buf(1), buf(2)

  deallocate(buf)

  call MPI_Finalize(ierr)

end program ring
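
As a rough sketch, assuming the NVHPC compilers and the MPI wrappers shipped with them (wrapper name and flags may differ on your system), this test can be built and run with something like:

$ mpif90 -acc ring.f90 -o ring
$ mpirun -np 2 ./ring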

If it runs correctly, your MPI implementation is most likely CUDA-aware. If not, you might want to turn off CUDA-aware MPI with --with-cuda-aware-mpi=0.

Additional timers

If you want to run a version with additional timers, you could use the Python preprocessor:

$ python3
>>> import sys
>>> sys.path.append('utils/pyutils/')
>>> from preproc import FileTree
>>> tree = FileTree('./src')
>>> tree.processAllFiles()

The preprocessed sources are written to tmp/; configure and build from there as usual:

$ cd tmp/
$ ./configure <...>; make

RUNNING THE CODE

Problems can be found in src/problem/. Select a problem in your Makefile, then copy its input file to your execution directory to run the code.
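
For illustration only (the executable name dumses, the input file name input and its location under src/problem/ are assumptions; adapt them to your build and problem), a typical MPI run could look like:

$ cp src/problem/<my-problem>/input .
$ mpirun -np 8 ./dumses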

ANALYZING THE RESULTS

In utils/dumpy/ you will find a small Python package to read DUMSES data.

TESTING THE CODE

A test suite is available in utils/test/ and can be run with the test.py script:

$ python test.py

It can take several minutes to run. It produces a PDF file in utils/test/fig/ summarizing the results and comparing them to a reference test suite execution.

DOCUMENTATION

Documentation of the code can be generated by running:

$ doxygen doc/Doxyfile

You can then access it with your favorite browser:

$ <browser> doc/html/index.html

The user manual can be generated by running:

$ pdflatex doc/manual/manual.tex

or, if you have minted installed:

$ pdflatex -shell-escape doc/manual/manual.tex