MPI segfaulting #68

abid1214 · 2024-08-02T15:22:44Z

When I run QAOA with the 'gpumpi' simulator, I get a segfault when using more than 1 node.

Specifically, I run

mpiexec -n 2 python benchmark.py

where benchmark.py has the lines

f = get_qaoa_maxcut_objective(N, p, G, simulator='gpumpi')
f(theta)

Calling f(theta) eventually calls furx_all in qokit/fur/mpi_nbcuda/fur.py. The segfault occurs at the first comm.Alltoall call, marked in the code below:

from ..lazy_import import MPI
from ..nbcuda.fur import furx_all as furx_local


def furx_all(x, theta: float, n_local_qubits: int, n_all_qubits: int, comm):
    assert n_local_qubits <= n_all_qubits
    assert n_all_qubits <= 2 * n_local_qubits, "n_all_qubits > 2*n_local_qubits is not yet implemented"

    for i in range(n_local_qubits):
        furx_local(x, theta, i)

    if n_all_qubits > n_local_qubits:
        comm.Alltoall(MPI.IN_PLACE, x)  #  <- SEGFAULTS AT THIS LINE ------ #
        for i in range(2 * n_local_qubits - n_all_qubits, n_local_qubits):
            furx_local(x, theta, i)
        comm.Alltoall(MPI.IN_PLACE, x)
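
For anyone reading along, the data movement that the Alltoall calls above are expected to perform can be sketched without MPI or a GPU. This is only a pure-Python emulation of the standard MPI_Alltoall block exchange under the assumption that each rank's buffer splits into equal blocks; `emulate_alltoall` is a hypothetical helper for illustration, not part of qokit or mpi4py.

```python
def emulate_alltoall(buffers):
    """Emulate MPI_Alltoall on equal-length per-rank buffers.

    buffers[r] is rank r's data, viewed as len(buffers) equal blocks.
    After the exchange, block j on rank r holds what was block r on rank j.
    """
    n_ranks = len(buffers)
    block = len(buffers[0]) // n_ranks
    # Every rank must hold exactly n_ranks blocks of the same size.
    assert all(len(b) == block * n_ranks for b in buffers)
    return [
        [v for j in range(n_ranks) for v in buffers[j][r * block:(r + 1) * block]]
        for r in range(n_ranks)
    ]


# 2 ranks, n_local_qubits = 2, n_all_qubits = 3: label each amplitude
# by its global index so the movement is easy to follow.
before = [[0, 1, 2, 3], [4, 5, 6, 7]]
after = emulate_alltoall(before)
print(after)

# A second exchange restores the original layout, which matches the
# second Alltoall call in furx_all.
restored = emulate_alltoall(after)
print(restored)
```

In this small case the exchange swaps the rank bit with the top local qubit, so the loop over range(2 * n_local_qubits - n_all_qubits, n_local_qubits) can treat the formerly global qubit as local. The real call does this in place on a GPU buffer, which is where the MPI build matters.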

I've attached benchmark.py to replicate this issue. The branch I'm working on is fix-mpi.

benchmark.py.txt

abid1214 · 2024-11-08T22:52:17Z

Running on Perlmutter, I've found no segfault issues when using the following procedure:

Using GPU-enabled MPI on Perlmutter

Ensure that NVLink is enabled:

nvidia-smi nvlink -s

With your GPU instances allocated, for example:

salloc --nodes 1 --qos interactive --time 01:00:00 --constraint gpu --gpus 4 --account=####

Activate your virtual environment for qokit:

source qokit/bin/activate

Load OpenMPI:

module use /global/common/software/m3169/perlmutter/modulefiles
module load openmpi

Run the benchmark:

mpiexec -n 4 python benchmark.py

Notes

On Perlmutter with 4 GPUs, this script runs through N=30.

Here is the benchmark.py file that I used:
benchmark.py.txt
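
Since the procedure above depends on a GPU-enabled MPI, it may help to check whether the loaded Open MPI reports CUDA support before running the benchmark. The sketch below assumes an Open MPI installation whose `ompi_info --parsable --all` output includes the `mpi_built_with_cuda_support` parameter; the function names are hypothetical helpers, and other MPI implementations would need a different check.

```python
import shutil
import subprocess


def parse_cuda_support(ompi_info_output):
    """Return True/False from `ompi_info --parsable --all` text, or None if absent."""
    for line in ompi_info_output.splitlines():
        # Expected form (assumption based on Open MPI's parsable output):
        # mca:mpi:base:param:mpi_built_with_cuda_support:value:true
        if "mpi_built_with_cuda_support:value:" in line:
            return line.rsplit(":", 1)[1].strip().lower() == "true"
    return None


def open_mpi_is_cuda_aware():
    """Run ompi_info if it is on PATH; return None when it cannot be determined."""
    exe = shutil.which("ompi_info")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--parsable", "--all"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_cuda_support(result.stdout)
```

Calling open_mpi_is_cuda_aware() after `module load openmpi` returns True, False, or None (when ompi_info is missing or does not report the parameter). A result other than True would suggest that GPU buffers should not be passed directly to comm.Alltoall with that build.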
