allow for optional printing of errors from GPUs (#2923)
Right now we #ifdef out a lot of error messages on GPUs because printf
increases register pressure. This adds a compilation flag, USE_GPU_PRINTF=TRUE,
that enables prints from GPUs using AMREX_DEVICE_PRINTF().
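The compile-time guard this commit introduces can be sketched in plain C++. This is a host-only illustration, not Castro code. `AMREX_USE_GPU` and `ALLOW_GPU_PRINTF` are the macro names used in the diff, but `report_invalid_X` is a hypothetical helper, and `std::printf` stands in for `AMREX_DEVICE_PRINTF` so that the sketch compiles without AMReX.

```cpp
#include <cstdio>
#include <iostream>

// Hypothetical helper showing the three-way preprocessor guard from this commit.
// Returns 1 when a warning is printed and 0 when the print is compiled out.
int report_invalid_X(int n, double X, int i, int j, int k, double rho)
{
#ifndef AMREX_USE_GPU
    // CPU build: iostreams are used, with no register-pressure concern.
    std::cout << "Invalid X[" << n << "] = " << X << " in zone "
              << i << ", " << j << ", " << k
              << " with density = " << rho << "\n";
    return 1;
#elif defined(ALLOW_GPU_PRINTF)
    // GPU build with USE_GPU_PRINTF=TRUE: Castro calls AMREX_DEVICE_PRINTF here.
    // std::printf is only a stand-in so this sketch builds without AMReX.
    std::printf("Invalid X[%d] = %g in zone (%d,%d,%d) with density = %g\n",
                n, X, i, j, k, rho);
    return 1;
#else
    // Default GPU build: the warning is removed entirely by the preprocessor.
    (void) n; (void) X; (void) i; (void) j; (void) k; (void) rho;
    return 0;
#endif
}
```

In an ordinary host build, where `AMREX_USE_GPU` is not defined, a call such as `report_invalid_X(0, -1.0e-3, 4, 5, 6, 1.0e7)` takes the `std::cout` branch. Because the choice is made by the preprocessor rather than a runtime `if`, the default GPU build contains no print code at all. This is the property that avoids the extra register usage.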
zingale authored Jul 25, 2024
1 parent b218304 commit 8ffd8d7
Showing 4 changed files with 43 additions and 0 deletions.
1 change: 1 addition & 0 deletions .github/workflows/good_defines.txt
@@ -1,3 +1,4 @@
ALLOW_GPU_PRINTF
AMREX_DEBUG
AMREX_PARTICLES
AMREX_SPACEDIM
31 changes: 31 additions & 0 deletions Docs/source/mpi_plus_x.rst
@@ -130,6 +130,37 @@ To enable this, compile with::
   USE_HIP = TRUE


Printing Warnings from GPU Kernels
==================================

.. index:: USE_GPU_PRINTF

Castro will output warnings when any of several assumptions are
violated (often triggering a retry in the process). On GPUs, printing
from a kernel (using ``printf()``) can increase the number of registers
the kernel needs, which hurts performance. As a result, these warnings
are disabled in GPU builds by wrapping them in ``#ifndef AMREX_USE_GPU``.

However, when debugging GPU runs it is often useful to see these
warnings. The build option ``USE_GPU_PRINTF=TRUE`` enables them by
defining the preprocessor macro ``ALLOW_GPU_PRINTF``. The warnings are
then printed from the device with ``AMREX_DEVICE_PRINTF()``.

.. note::

   Not every warning has been enabled for GPUs.

.. tip::

   On AMD architectures, it appears necessary to use unbuffered I/O. With
   SLURM, this can be done in the job submission script by passing ``-u``
   (``--unbuffered``) to ``srun``:

   ::

      srun -u ./Castro...




Working at Supercomputing Centers
=================================

4 changes: 4 additions & 0 deletions Exec/Make.Castro
@@ -136,6 +136,10 @@ ifeq ($(USE_GPU),TRUE)
endif
endif

ifeq ($(USE_GPU_PRINTF),TRUE)
DEFINES += -DALLOW_GPU_PRINTF
endif

CASTRO_AUTO_SOURCE_DIR := $(TmpBuildDir)/castro_sources/$(optionsSuffix).EXE


7 changes: 7 additions & 0 deletions Source/driver/Castro.cpp
@@ -3300,6 +3300,9 @@ Castro::check_for_negative_density ()
std::cout << "Invalid X[" << n << "] = " << X << " in zone "
<< i << ", " << j << ", " << k
<< " with density = " << rho << "\n";
#elif defined(ALLOW_GPU_PRINTF)
AMREX_DEVICE_PRINTF("Invalid X[%d] = %g in zone (%d,%d,%d) with density = %g\n",
n, X, i, j, k, rho);
#endif
X_check_failed = 1;
}
@@ -3310,6 +3313,10 @@ Castro::check_for_negative_density ()
return {rho_check_failed, X_check_failed};
});

#ifdef ALLOW_GPU_PRINTF
std::fflush(nullptr);
#endif

}

ReduceTuple hv = reduce_data.value();
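The `Castro.cpp` hunk prints each invalid mass fraction and then calls `std::fflush(nullptr)` once, after the reduction lambda. The sketch below is a host-only approximation of that flow, not Castro code. `check_mass_fractions` is hypothetical. So is its out-of-range test (X outside `[-tol, 1 + tol]`), because the actual criterion is not visible in this diff. `std::printf` stands in for `AMREX_DEVICE_PRINTF`, and a single zone index replaces `(i,j,k)`.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical host-side stand-in for the mass-fraction check in
// Castro::check_for_negative_density. Mass fractions are stored as
// X[zone * nspec + n]. The [-tol, 1 + tol] bound is an assumed criterion.
int check_mass_fractions(const std::vector<double>& X,
                         const std::vector<double>& rho,
                         int nspec, double tol)
{
    int X_check_failed = 0;
    const int nzones = static_cast<int>(rho.size());
    for (int zone = 0; zone < nzones; ++zone) {
        for (int n = 0; n < nspec; ++n) {
            const double x = X[zone * nspec + n];
            if (x < -tol || x > 1.0 + tol) {
                // Format modeled on the AMREX_DEVICE_PRINTF call in the diff.
                std::printf("Invalid X[%d] = %g in zone %d with density = %g\n",
                            n, x, zone, rho[zone]);
                X_check_failed = 1;
            }
        }
    }
    // A null argument flushes every open output stream, so all warnings
    // are written out once, after the loop, instead of after each message.
    std::fflush(nullptr);
    return X_check_failed;
}
```

Passing a null pointer to `fflush` is standard C and C++ behavior. Calling it once after the loop, as the diff does, keeps the per-zone path limited to the print itself.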