Copyright 2013-2022, CEA, Authors (see AUTHORS file)
DUMSES is a 3D MPI/OpenMP & MPI/OpenACC Eulerian second-order Godunov (magneto)hydrodynamic simulation code in Cartesian, spherical and cylindrical coordinates.
Configuration is done with Autotools; please run ./configure --help for more details and the available options. Note that MPI and OpenMP are activated by default, and that explicitly activating OpenACC deactivates OpenMP. The default problem is magnetic_loop.
For example, to compile the stratified problem with GCC and MPI:
$ FC=gfortran ./configure --with-problem=stratified
If you want to compile with NVHPC and OpenACC:
$ FC=nvfortran ./configure --enable-openacc
Note that the default optimization level is -O3. If you define FCFLAGS explicitly, you will have to specify the optimization level you want yourself. With NVHPC targeting GPUs, it is advisable to use FCFLAGS='-O3 -gpu=<cc-arch>,nordc'; the -gpu=nordc flag gains a few percent of performance.
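For example, a hypothetical invocation targeting an NVIDIA device with compute capability 8.0 (replace cc80 with your architecture) could look like:
$ FC=nvfortran FCFLAGS='-O3 -gpu=cc80,nordc' ./configure --enable-openacc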
Note also that CUDA-aware MPI (direct communication of device variables on the GPU) is activated by default when both MPI and OpenACC are activated. For now, no check is performed to verify that the given MPI implementation actually is CUDA-aware.
If you want to check whether your MPI implementation is CUDA-aware, you can compile and run this simple example:
program ring
   ! Minimal CUDA-aware MPI check: each rank doubles a buffer on its GPU
   ! with OpenACC, then exchanges the device memory directly through MPI.
   use mpi
   use openacc
   implicit none
   integer :: nproc, rank, comm, ierr
   integer, dimension(MPI_STATUS_SIZE) :: status
   integer :: ngpu
   integer, allocatable, dimension(:) :: buf

   call MPI_Init(ierr)
   comm = MPI_COMM_WORLD
   call MPI_Comm_size(comm, nproc, ierr)
   call MPI_Comm_rank(comm, rank, ierr)
   if (rank == 0) print '("Running with ", I3, " MPI processes")', nproc

   ngpu = acc_get_num_devices(acc_device_nvidia)
   if (rank == 0) print '("OpenACC - # of devices available: ", I2)', ngpu
   call MPI_Barrier(comm, ierr)

   ! Bind each rank to a device, round-robin if there are fewer GPUs than ranks
   if (ngpu > 0) call acc_set_device_num(mod(rank, ngpu), acc_device_nvidia)

   allocate(buf(2))
   buf = rank

   !$acc data copy(buf)
   !$acc kernels
   buf = buf*2
   !$acc end kernels
   ! host_data exposes the device address of buf to MPI; the exchange
   ! succeeds only if the MPI implementation is CUDA-aware
   !$acc host_data use_device(buf)
   call MPI_Sendrecv(buf(1), 1, MPI_INTEGER, mod(rank+1,nproc), 1, &
                     buf(2), 1, MPI_INTEGER, modulo(rank-1,nproc), 1, &
                     comm, status, ierr)
   !$acc end host_data
   !$acc end data

   print '("rank: ", I2, " - send and recv: ", I3, I3)', rank, buf(1), buf(2)

   deallocate(buf)
   call MPI_Finalize(ierr)
end program ring
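To build and run it, a hypothetical sequence (assuming NVHPC's bundled MPI compiler wrapper and two ranks; adapt to your environment) would be:
$ mpif90 -acc -o ring ring.f90
$ mpirun -np 2 ./ring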
If it runs correctly, your MPI implementation is probably CUDA-aware. If not, you might want to turn off CUDA-aware MPI with --with-cuda-aware-mpi=0.
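For example, to reuse the earlier NVHPC configuration without CUDA-aware MPI:
$ FC=nvfortran ./configure --enable-openacc --with-cuda-aware-mpi=0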
If you want to run a version with additional timers, you could use the Python preprocessor:
$ python3
>>> import sys
>>> sys.path.append('utils/pyutils/')
>>> from preproc import FileTree
>>> tree = FileTree('./src')
>>> tree.processAllFiles()
$ cd tmp/
$ ./configure <...>; make
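Equivalently, the preprocessing step can be run non-interactively from the repository root (this simply chains the FileTree calls shown above):
$ python3 -c "import sys; sys.path.append('utils/pyutils/'); from preproc import FileTree; FileTree('./src').processAllFiles()"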
Problems can be found in src/problem/. Select a problem in your Makefile, then copy its input file into your execution directory to run the code.
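A hypothetical sequence to run the stratified problem, assuming the build produces an executable named dumses and the problem directory provides a file named input (both names are illustrative; adapt them to your tree):
$ cp src/problem/stratified/input .
$ mpirun -np 4 ./dumses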
In utils/dumpy/ you will find a small Python package to read DUMSES data.
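A minimal sketch for getting started with it, assuming the package directory is importable as dumpy (the actual reading functions are documented in the package itself):
$ python3
>>> import sys
>>> sys.path.append('utils/dumpy/')
>>> import dumpy
>>> help(dumpy)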
A test suite can be run in utils/test/ with the test.py script:
$ python test.py
It can take several minutes to run, and produces a PDF file in utils/test/fig/ summarizing the results and comparing them to a reference execution of the test suite.
Documentation of the code can be generated by running:
$ doxygen doc/Doxyfile
You can then access it with your favorite browser:
$ <browser> doc/html/index.html
The user manual can be generated by running:
$ pdflatex doc/manual/manual.tex
or, if you have minted installed:
$ pdflatex -shell-escape doc/manual/manual.tex