Parallel GMRES solver fails when n isn't a perfect square #667

quantumsteve · 2024-02-23T20:19:02Z

Describe the bug
A clear and concise description of what the bug is.

solver_test passes for 1 and 4 processes, but fails for 2 and 3 processes

To Reproduce
Steps to reproduce the behavior:

git commit hash being built

Found during Add parallel gmres implementation #662.

cmake command

cmake -DASGARD_USE_MPI=TRUE ..

full program/test invocation command
mpirun -np 2 -i -s gmres
mpirun -np 2 ./solver-tests
additional steps

Expected behavior
A clear and concise description of what you expected to happen.

asgard runs, tests pass

System:

system name [e.g. fusiont5, summit]
modules loaded [e.g. output of module list]
other systems where this is reproducible [e.g. "my laptop", "none"]

Additional context
Add any other context about the problem here.

quantumsteve · 2024-02-23T20:24:15Z

diff --git a/src/time_advance.cpp b/src/time_advance.cpp
index 36f504ae..e9a38437 100644
--- a/src/time_advance.cpp
+++ b/src/time_advance.cpp
@@ -34,7 +34,7 @@ get_sources(PDE<P> const &pde, adapt::distributed_grid<P> const &grid,
   {
     auto const source_vect = transform_and_combine_dimensions(
         pde, source.source_funcs, grid.get_table(), transformer,
-        my_subgrid.row_start, my_subgrid.row_stop, degree, time,
+        my_subgrid.col_start, my_subgrid.col_stop, degree, time,
         source.time_func(time));
     fm::axpy(source_vect, sources);
   }

This change appears to fix the failed assertion, but I'm not sure whether it creates other issues.
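
To illustrate why swapping the row range for the column range can change the size of the local vector, here is a minimal sketch. It assumes a plain 2D block partition of n indices over a process grid with grid_rows x grid_cols ranks. The names `index_range` and `block_range` and the even-split rule are hypothetical and are not taken from distribution.cpp; the layout ASGarD actually picks for a given process count may differ.

```cpp
#include <cassert>

// Hypothetical inclusive index range, loosely mirroring the
// row_start/row_stop and col_start/col_stop pairs in the diff.
struct index_range
{
  int start;
  int stop;
  int size() const { return stop - start + 1; }
};

// Split n indices into `parts` contiguous blocks and return block `which`.
// The first (n % parts) blocks receive one extra index each.
inline index_range block_range(int n, int parts, int which)
{
  assert(n > 0 && parts > 0 && which >= 0 && which < parts);
  int const base  = n / parts;
  int const extra = n % parts;
  int const start = which * base + (which < extra ? which : extra);
  int const len   = base + (which < extra ? 1 : 0);
  return {start, start + len - 1};
}
```

Under these assumptions, a 1 x 2 process grid (2 ranks) with n = 8 gives a row range of 8 indices (`block_range(8, 1, 0)`) but a column range of 4 (`block_range(8, 2, 0)`), so a vector sized by one range does not match a vector sized by the other. On a 2 x 2 grid both ranges have 4 indices, which would be consistent with the mismatch only surfacing for non-square process counts.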

mkstoyanov · 2024-02-23T20:56:47Z

The size of the local source vector should match the number of rows in the grid.

Check the function redistribute_vector() in distribution.cpp: it uses the col_map to do the redistribution instead of the row_map (lines 1068 and 1069). This method is called during the coarsen stage of the refinement process (coarsen solution on line 79).

This is odd. How did it work before? Most likely the sources were not properly tested.

Can you look up the uses of redistribute_vector()?

Maybe we just need to use the row maps and it will be fine.
Maybe we need to make a second variant of the method to switch between row and col distribution.
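
The second option could look roughly like the sketch below: one helper that takes a selector for row or column distribution, so each caller states which range it needs. The field names follow the diff above. `subgrid_sketch`, `dist_axis`, `local_range`, and `local_size` are hypothetical names, and treating the stop index as inclusive is an assumption. This is not the signature of redistribute_vector().

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for a subgrid. Field names follow the diff;
// inclusive stop indices are an assumption.
struct subgrid_sketch
{
  int row_start, row_stop;
  int col_start, col_stop;
};

// Selects which distribution a caller wants to size or index by.
enum class dist_axis { row, col };

// Return the (start, stop) pair for the requested axis.
inline std::pair<int, int> local_range(subgrid_sketch const &g, dist_axis axis)
{
  return (axis == dist_axis::row) ? std::make_pair(g.row_start, g.row_stop)
                                  : std::make_pair(g.col_start, g.col_stop);
}

// Number of indices owned locally along the requested axis.
inline int local_size(subgrid_sketch const &g, dist_axis axis)
{
  auto const r = local_range(g, axis);
  assert(r.second >= r.first);
  return r.second - r.first + 1;
}
```

Making the axis an explicit argument means a call site that needs a vector matching the number of rows has to say so, rather than relying on which map the helper happens to use internally.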
