Merge pull request #185 from awslabs/sjg/gpu-docs-dev
Documentation for GPU support
sebastiangrimberg authored Mar 4, 2024
2 parents 71b3813 + a79d79e commit a16c3bf
Showing 19 changed files with 80 additions and 2 deletions.
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -21,6 +21,12 @@ The format of this changelog is based on
- Added documentation for various timer categories and improved timing breakdown of
various sections of a simulation.
- Fixed bug in implementation of numeric wave ports for driven simulations.
- Added GPU support for *Palace* via its dependencies, and added the
`config["Solver"]["Device"]` and `config["Solver"]["Backend"]` options for runtime
configuration of the MFEM device (`"CPU"` or `"GPU"`) and libCEED backend, with suitable
defaults for users.
- Added a new section to the documentation on
[Parallelism and GPU support](https://awslabs.github.io/palace/dev/guide/parallelism/).

## [0.12.0] - 2023-12-21

4 changes: 4 additions & 0 deletions README.md
@@ -39,6 +39,9 @@ the frequency or time domain, using the
[high-order operator partial assembly](https://mfem.org/performance/), parallel sparse
direct solvers, and algebraic multigrid (AMG) preconditioners, for fast performance on
platforms ranging from laptops to HPC systems.
- Support for hardware acceleration using NVIDIA or AMD GPUs, including multi-GPU
parallelism, using pure CUDA and HIP code as well as [MAGMA](https://icl.utk.edu/magma/)
and other libraries.

## Getting started

@@ -62,6 +65,7 @@ System requirements:
- C and Fortran (optional) compilers for dependency builds
- MPI distribution
- BLAS, LAPACK libraries
- CUDA Toolkit or ROCm installation (optional, for GPU support only)

## Documentation

3 changes: 2 additions & 1 deletion docs/make.jl
@@ -23,7 +23,8 @@ makedocs(
"guide/problem.md",
"guide/model.md",
"guide/boundaries.md",
"guide/postprocessing.md"
"guide/postprocessing.md",
"guide/parallelism.md"
],
"Configuration File" => Any[
"config/config.md",
1 change: 1 addition & 0 deletions docs/src/guide/guide.md
@@ -14,3 +14,4 @@ which can be performed with *Palace* and the various features available in the s
- [Simulation Models](model.md)
- [Boundary Conditions](boundaries.md)
- [Postprocessing and Visualization](postprocessing.md)
- [Parallelism and GPU Support](parallelism.md)
42 changes: 42 additions & 0 deletions docs/src/guide/parallelism.md
@@ -0,0 +1,42 @@
```@raw html
<!--- Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. --->
<!--- SPDX-License-Identifier: Apache-2.0 --->
```

# Parallelism and GPU Support

*Palace* employs multiple types of parallelism to maximize performance across a wide range
of deployments. The first is MPI-based distributed-memory parallelism, controlled using the
`-np` command-line flag as outlined in [Running *Palace*](../run.md).
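
For example, a minimal invocation on a single node might look like the following (the
configuration file name and process count are illustrative):

```bash
# Run Palace with 8 MPI processes
palace -np 8 config.json
```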

Shared-memory parallelism using OpenMP is also available. To enable this, the
`-DPALACE_WITH_OPENMP=ON` option should be specified at configure time. At runtime, the
number of threads is configured with the `-nt` argument to the `palace` executable, or by
setting the [`OMP_NUM_THREADS`](https://www.openmp.org/spec-html/5.0/openmpse50.html)
environment variable.
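
As a sketch, assuming a build configured with `-DPALACE_WITH_OPENMP=ON`, a hybrid
MPI-OpenMP run might look like:

```bash
# 4 MPI processes with 2 OpenMP threads each
palace -np 4 -nt 2 config.json

# Equivalently, take the thread count from the environment
OMP_NUM_THREADS=2 palace -np 4 config.json
```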

Lastly, *Palace* supports GPU acceleration using NVIDIA and AMD GPUs, activated with the
build options `-DPALACE_WITH_CUDA=ON` and `-DPALACE_WITH_HIP=ON`, respectively. At runtime,
the [`config["Solver"]["Device"]`](../config/solver.md#config%5B%22Solver%22%5D) parameter
in the configuration file can be set to `"CPU"` (the default) or `"GPU"` in order to
configure *Palace* and MFEM to use the available GPU(s). The
[`config["Solver"]["Backend"]`](../config/solver.md#config%5B%22Solver%22%5D) parameter, on
the other hand, controls the
[libCEED backend](https://libceed.org/en/latest/gettingstarted/#backends). Users typically
do not need to provide a value for this option and can instead rely on *Palace*'s default,
which selects the most appropriate backend for the given value of
[`config["Solver"]["Device"]`](../config/solver.md#config%5B%22Solver%22%5D).
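
As a minimal sketch, assuming default behavior elsewhere, the relevant portion of a
configuration file for a GPU run might look like this; the commented `"Backend"` line
illustrates the kind of libCEED resource string that could be passed explicitly, though
relying on the default is recommended:

```json
"Solver":
{
  "Order": 3,
  "Device": "GPU"
  // "Backend": "/gpu/cuda/magma" // illustrative libCEED resource string, normally omitted
}
```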

To take full advantage of GPU acceleration, it is recommended to use
[operator partial assembly](https://mfem.org/performance/), activated when the value of
[`config["Solver"]["PartialAssemblyOrder"]`](../config/solver.md#config%5B%22Solver%22%5D)
is less than [`config["Solver"]["Order"]`](../config/solver.md#config%5B%22Solver%22%5D).
This approach avoids assembling a global sparse matrix and instead uses operator data
structures with more efficient asymptotic storage and application costs. See
[https://libceed.org/en/latest/intro/](https://libceed.org/en/latest/intro/) for more
details. Partial assembly in *Palace* supports mixed meshes including both tensor product
elements (hexahedra and quadrilaterals) and non-tensor product elements (tetrahedra,
prisms, pyramids, and triangles).
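
Continuing the sketch above, a solver block that activates partial assembly for a GPU run
could look like the following (values are illustrative):

```json
"Solver":
{
  "Order": 3,
  "PartialAssemblyOrder": 1, // less than "Order" above, so partial assembly is used
  "Device": "GPU"
}
```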
2 changes: 1 addition & 1 deletion docs/src/guide/postprocessing.md
@@ -70,7 +70,7 @@ These include:
## Boundary postprocessing

Boundary postprocessing capabilities are enabled by including objects under
`config["Boundaries"]["Postprocessing"]`](../config/boundaries.md) in the configuration
[`config["Boundaries"]["Postprocessing"]`](../config/boundaries.md) in the configuration
file. These include:

- [`config["Boundaries"]["Postprocessing"]["Capacitance"]`](../config/boundaries.md#boundaries%5B%22Postprocessing%22%5D%5B%22Capacitance%22%5D) :
4 changes: 4 additions & 0 deletions docs/src/index.md
@@ -42,6 +42,10 @@ the frequency or time domain, using the
[high-order operator partial assembly](https://mfem.org/performance/), parallel sparse
direct solvers, and algebraic multigrid (AMG) preconditioners, for fast performance on
platforms ranging from laptops to HPC systems.
- Support for
[hardware acceleration using NVIDIA or AMD GPUs](https://libceed.org/en/latest/intro/),
including multi-GPU parallelism, using pure CUDA and HIP code as well as
[MAGMA](https://icl.utk.edu/magma/) and other libraries.

## Contents

9 changes: 9 additions & 0 deletions docs/src/install.md
@@ -56,6 +56,9 @@ A build from source requires the following prerequisites installed on your syste
- C and Fortran (optional) compilers for dependency builds
- MPI distribution
- BLAS, LAPACK libraries (described below in [Math libraries](#Math-libraries))
- [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit) or
[ROCm](https://rocm.docs.amd.com/en/latest/) installation (optional, for GPU support
only)

In addition, builds from source require the following system packages which are typically
already installed and are available from most package managers (`apt`, `dnf`, `brew`, etc.):
@@ -101,6 +104,9 @@ The *Palace* build respects standard CMake variables, including:
desired compilers.
- `CMAKE_CXX_FLAGS`, `CMAKE_C_FLAGS`, and `CMAKE_Fortran_FLAGS` which define the
corresponding compiler flags.
- `CMAKE_CUDA_COMPILER`, `CMAKE_CUDA_FLAGS`, `CMAKE_CUDA_ARCHITECTURES`, and the
corresponding `CMAKE_HIP_COMPILER`, `CMAKE_HIP_FLAGS`, and `CMAKE_HIP_ARCHITECTURES` for
GPU-accelerated builds with CUDA or HIP.
- `CMAKE_INSTALL_PREFIX` which specifies the path for installation (if none is provided,
defaults to `<BUILD_DIR>`).
- `CMAKE_BUILD_TYPE` which defines the build type such as `Release`, `Debug`,
@@ -116,6 +122,9 @@ Additional build options are (with default values in brackets):

- `PALACE_WITH_64BIT_INT [OFF]` : Build with 64-bit integer support
- `PALACE_WITH_OPENMP [OFF]` : Use OpenMP for shared-memory parallelism
- `PALACE_WITH_CUDA [OFF]` : Use CUDA for NVIDIA GPU support
- `PALACE_WITH_HIP [OFF]` : Use HIP for AMD or NVIDIA GPU support
- `PALACE_WITH_GPU_AWARE_MPI [OFF]` : Use GPU-aware MPI when the installed MPI
  distribution supports it (see the example configure command after this list)
- `PALACE_WITH_SUPERLU [ON]` : Build with SuperLU_DIST sparse direct solver
- `PALACE_WITH_STRUMPACK [OFF]` : Build with STRUMPACK sparse direct solver
- `PALACE_WITH_MUMPS [OFF]` : Build with MUMPS sparse direct solver
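
As an example of how these options combine, a sketch of a configure step for an NVIDIA GPU
build might look like the following (the CUDA architecture value is illustrative and
system dependent):

```bash
# Configure and build Palace with CUDA and GPU-aware MPI enabled
mkdir build && cd build
cmake .. \
  -DPALACE_WITH_CUDA=ON \
  -DPALACE_WITH_GPU_AWARE_MPI=ON \
  -DCMAKE_CUDA_ARCHITECTURES=80
make -j
```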
1 change: 1 addition & 0 deletions examples/cavity/cavity_impedance.json
@@ -49,6 +49,7 @@
"Solver":
{
"Order": 4,
"Device": "CPU",
"Eigenmode":
{
"N": 15,
1 change: 1 addition & 0 deletions examples/cavity/cavity_pec.json
@@ -46,6 +46,7 @@
"Solver":
{
"Order": 4,
"Device": "CPU",
"Eigenmode":
{
"N": 15,
1 change: 1 addition & 0 deletions examples/coaxial/coaxial_matched.json
@@ -48,6 +48,7 @@
"Solver":
{
"Order": 3,
"Device": "CPU",
"Transient":
{
"Type": "GeneralizedAlpha",
1 change: 1 addition & 0 deletions examples/coaxial/coaxial_open.json
@@ -46,6 +46,7 @@
"Solver":
{
"Order": 3,
"Device": "CPU",
"Transient":
{
"Type": "GeneralizedAlpha",
1 change: 1 addition & 0 deletions examples/coaxial/coaxial_short.json
@@ -42,6 +42,7 @@
"Solver":
{
"Order": 3,
"Device": "CPU",
"Transient":
{
"Type": "GeneralizedAlpha",
1 change: 1 addition & 0 deletions examples/cpw/cpw_lumped_adaptive.json
@@ -164,6 +164,7 @@
"Solver":
{
"Order": 2,
"Device": "CPU",
"Driven":
{
"MinFreq": 2.0, // GHz
1 change: 1 addition & 0 deletions examples/cpw/cpw_lumped_uniform.json
@@ -164,6 +164,7 @@
"Solver":
{
"Order": 2,
"Device": "CPU",
"Driven":
{
"MinFreq": 2.0, // GHz
1 change: 1 addition & 0 deletions examples/cpw/cpw_wave_adaptive.json
@@ -128,6 +128,7 @@
"Solver":
{
"Order": 2,
"Device": "CPU",
"Driven":
{
"MinFreq": 2.0, // GHz
1 change: 1 addition & 0 deletions examples/cpw/cpw_wave_uniform.json
@@ -128,6 +128,7 @@
"Solver":
{
"Order": 2,
"Device": "CPU",
"Driven":
{
"MinFreq": 2.0, // GHz
1 change: 1 addition & 0 deletions examples/rings/rings.json
@@ -78,6 +78,7 @@
"Solver":
{
"Order": 2,
"Device": "CPU",
"Magnetostatic":
{
"Save": 2
1 change: 1 addition & 0 deletions examples/spheres/spheres.json
@@ -74,6 +74,7 @@
"Solver":
{
"Order": 3,
"Device": "CPU",
"Electrostatic":
{
"Save": 2
