Parallel GMRES solver fails when n isn't a perfect square #667

Open
quantumsteve opened this issue Feb 23, 2024 · 2 comments
@quantumsteve
Collaborator

Describe the bug

solver_test passes for 1 and 4 processes, but fails for 2 and 3 processes

To Reproduce
Steps to reproduce the behavior:

  1. git commit hash being built
  2. cmake command
     • cmake -DASGARD_USE_MPI=TRUE ..
  3. full program/test invocation command
     mpirun -np 2 -i -s gmres
     mpirun -np 2 ./solver-tests
  4. additional steps

Expected behavior

asgard runs, tests pass

System:

  • system name [e.g. fusiont5, summit]
  • modules loaded [e.g. output of module list]
  • other systems where this is reproducible [e.g. "my laptop", "none"]

Additional context
Add any other context about the problem here.

image

@quantumsteve
Collaborator Author

diff --git a/src/time_advance.cpp b/src/time_advance.cpp
index 36f504ae..e9a38437 100644
--- a/src/time_advance.cpp
+++ b/src/time_advance.cpp
@@ -34,7 +34,7 @@ get_sources(PDE<P> const &pde, adapt::distributed_grid<P> const &grid,
   {
     auto const source_vect = transform_and_combine_dimensions(
         pde, source.source_funcs, grid.get_table(), transformer,
-        my_subgrid.row_start, my_subgrid.row_stop, degree, time,
+        my_subgrid.col_start, my_subgrid.col_stop, degree, time,
         source.time_func(time));
     fm::axpy(source_vect, sources);
   }

This appears to fix the failed assertion, but I'm not sure whether it creates other issues.

@mkstoyanov
Collaborator

The size of the local source vector should match the number of rows in the grid.
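
A minimal sketch of why a row/column mix-up could go unnoticed on perfect-square process counts. It assumes ranks are arranged in a `grid_rows x grid_cols` process grid, numbered row-major, with the table split evenly and index ranges inclusive. Only the four field names come from the diff above; `subgrid_extent` and `make_extent` are hypothetical and the actual partitioning in the code base may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a rank's subgrid. Only the field names
// row_start/row_stop/col_start/col_stop are taken from the diff.
struct subgrid_extent
{
  std::int64_t row_start, row_stop; // inclusive row range (assumed)
  std::int64_t col_start, col_stop; // inclusive column range (assumed)
  std::int64_t nrows() const { return row_stop - row_start + 1; }
  std::int64_t ncols() const { return col_stop - col_start + 1; }
};

// Assumed layout: an n x n table split evenly over a
// grid_rows x grid_cols process grid, ranks numbered row-major.
// Requires n to be divisible by both grid_rows and grid_cols.
inline subgrid_extent
make_extent(std::int64_t n, int grid_rows, int grid_cols, int rank)
{
  std::int64_t const rows_per = n / grid_rows;
  std::int64_t const cols_per = n / grid_cols;
  std::int64_t const r        = rank / grid_cols; // process-grid row
  std::int64_t const c        = rank % grid_cols; // process-grid column
  return {r * rows_per, (r + 1) * rows_per - 1,
          c * cols_per, (c + 1) * cols_per - 1};
}
```

Under these assumptions, `make_extent(8, 2, 2, 1)` gives 4 rows and 4 columns, so a vector sized by the wrong range still has the expected length, even though the ranges themselves differ (rows 0 to 3, columns 4 to 7). With 2 ranks laid out as 1 x 2, `make_extent(8, 1, 2, 1)` gives 8 rows but only 4 columns, and a size check would fail.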

Check the function redistribute_vector() in distribution.cpp: the method uses the col_map to do the redistribution instead of the row_map (lines 1068 and 1069). This method is called during the coarsen stage of the refinement process (coarsen solution on line 79).

This is odd. How did it work before? Most likely the sources were not properly tested.

Can you look up the uses of redistribute_vector()?

  • maybe we just need to use the row maps and it will be fine
  • maybe we need to make a second variant of the method to switch between row or col distribution
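
The second option could be sketched roughly as below. This is not the signature of redistribute_vector(); `index_range`, `range_map`, `distribute_by`, `select_map`, and `local_size` are hypothetical names used only to show one way of selecting between the row map and the column map with a single parameter.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical types: an inclusive index range owned by a rank,
// and a map from rank to that range.
struct index_range
{
  std::int64_t start, stop;
};
using range_map = std::map<int, index_range>;

// Hypothetical switch between the two distributions.
enum class distribute_by
{
  rows,
  cols
};

// Pick the map that should drive the redistribution.
inline range_map const &select_map(range_map const &row_map,
                                   range_map const &col_map,
                                   distribute_by mode)
{
  return mode == distribute_by::rows ? row_map : col_map;
}

// Length of the local vector segment a rank would own under the chosen mode.
// Throws std::out_of_range if the rank is not present in the selected map.
inline std::int64_t local_size(range_map const &row_map,
                               range_map const &col_map, int rank,
                               distribute_by mode)
{
  index_range const &r = select_map(row_map, col_map, mode).at(rank);
  return r.stop - r.start + 1;
}
```

A caller handling sources would pass `distribute_by::rows`, while existing callers that rely on the column layout would pass `distribute_by::cols`. Whether a default argument is appropriate depends on what the existing uses of redistribute_vector() expect.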
