Improve `partition_load_balance` #2206

thorstenhater · 2023-08-11T12:52:55Z

Spin-off from #2005. Make the primary load balancing cleaner, faster, and more maintainable.
Thus:

- remove all MPI calls, this is now purely local
- remove temporary data structures and/or corral them into their own little scopes
- simplify super_cells vs regular_cells
- sort less
- sparse connection tables
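
To illustrate the "sparse connection tables" item, here is a minimal sketch, not the code from this PR. It assumes gap-junction links arrive as plain `(gid, peer)` pairs; `gid_type`, `connection_table`, `build_table`, and `component` are hypothetical names chosen for the example. Only gids that actually have a peer get an entry, and the connected component of a cell is collected with a simple worklist.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical alias; stands in for whatever gid type the library uses.
using gid_type = std::uint32_t;

// Sparse table: only gids with at least one peer have an entry.
using connection_table = std::unordered_map<gid_type, std::vector<gid_type>>;

// Build a symmetric table from (gid, peer) pairs, dropping duplicate links.
connection_table build_table(const std::vector<std::pair<gid_type, gid_type>>& edges) {
    connection_table table;
    for (const auto& [a, b]: edges) {
        table[a].push_back(b);
        table[b].push_back(a);
    }
    for (auto& [gid, peers]: table) {
        std::sort(peers.begin(), peers.end());
        peers.erase(std::unique(peers.begin(), peers.end()), peers.end());
    }
    return table;
}

// Collect all gids reachable from `start`, returned in ascending order.
// A gid without any entry forms a component of size one.
std::vector<gid_type> component(const connection_table& table, gid_type start) {
    std::vector<gid_type> result;
    std::unordered_set<gid_type> seen{start};
    std::vector<gid_type> todo{start};
    while (!todo.empty()) {
        auto gid = todo.back();
        todo.pop_back();
        result.push_back(gid);
        auto it = table.find(gid);
        if (it == table.end()) continue;
        for (auto peer: it->second) {
            if (seen.insert(peer).second) todo.push_back(peer);
        }
    }
    std::sort(result.begin(), result.end());
    return result;
}
```

In this sketch the table grows with the number of linked cells rather than with the total cell count, and the only sorting is per peer list plus one sort of each finished component.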

Partially inspired by external feedback

- Remove redundant MPI call for global gid list
- Make GJ table local
- Extract connection table building.

- add group parameters struct to bundle info
- corral temporary structures into their own scopes to avoid RSS growth.
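
As a rough illustration of these two points, and not the actual change, the sketch below bundles per-group settings in a struct and confines a temporary to its own block so it is destroyed before the function returns. `group_parameters`, its fields, and `make_groups` are hypothetical. Whether freeing the temporary lowers RSS depends on the allocator; the scope only guarantees the buffer is released back to it early.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical alias; stands in for whatever gid type the library uses.
using gid_type = std::uint32_t;

// Hypothetical bundle of the values needed to emit cell groups,
// passed as one argument instead of several loose parameters.
struct group_parameters {
    int kind;              // stand-in for a cell kind tag
    int backend;           // stand-in for a backend tag
    std::size_t max_size;  // maximum number of cells per group
};

// Split gids into sorted groups of at most `params.max_size` cells.
std::vector<std::vector<gid_type>>
make_groups(const std::vector<gid_type>& gids, const group_parameters& params) {
    std::vector<std::vector<gid_type>> groups;
    if (params.max_size == 0) return groups;
    {
        // Temporary copy lives only inside this block.
        std::vector<gid_type> sorted(gids);
        std::sort(sorted.begin(), sorted.end());
        for (std::size_t lo = 0; lo < sorted.size(); lo += params.max_size) {
            auto hi = std::min(lo + params.max_size, sorted.size());
            groups.emplace_back(sorted.begin() + lo, sorted.begin() + hi);
        }
    } // `sorted` is destroyed here, before any later work would run.
    return groups;
}
```

Calling `make_groups` with five gids and `max_size = 2` yields three groups, the last holding a single cell.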

Also, add weird test to see that we can _almost_ construct the partition.

arbor/partition_load_balance.cpp

example/busyring/init-only-2048-complex.json

example/busyring/ring.cpp

arbor/partition_load_balance.cpp

arbor/include/arbor/common_types.hpp

arbor/partition_load_balance.cpp

test/unit/test_domain_decomposition.cpp

boeschf · 2023-09-05T08:38:07Z

test/unit/test_domain_decomposition.cpp

+            ctx->distributed = std::make_shared<distributed_context>(dummy_context{rank, nranks});
+            for (const auto& R: {gj_symmetric(nranks, true), gj_symmetric(nranks, false)}) {
+                // NOTE: This is a bit silly, but allows us to test _most_ of
+                // the invariants without proper MPI support. If we could get `gather_gids` to


just curious: can you think of a way to make gather_gids work?

thorstenhater · 2023-09-05

Not really, without major surgery. We could split this function into three, tentatively:

- register groups, called in parallel and shoving the data into the decomposition
- `gather_gids`, a collective call
- actually construct the decomposition

but that'll require putting the data threaded between those calls into the decomposition,
since they must be independent and pass around some state.
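
One way to picture the tentative three-way split is sketched below. This is an assumption-laden illustration, not a proposal for the actual API: `partial_decomposition`, `register_groups`, `finalize`, and the `decomposition` struct here are hypothetical, and the collective step is modelled by a caller-supplied `gather` callable so that a test can substitute a fake for MPI.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical alias; stands in for whatever gid type the library uses.
using gid_type = std::uint32_t;

// Hypothetical state threaded between the three phases.
struct partial_decomposition {
    std::vector<std::vector<gid_type>> local_groups;  // filled in phase 1
    std::vector<gid_type> global_gids;                // filled in phase 2
    bool gathered = false;
};

// Hypothetical, much reduced result type.
struct decomposition {
    std::size_t num_local_cells;
    std::size_t num_global_cells;
    std::vector<std::vector<gid_type>> groups;
};

// Phase 1: purely local, no communication.
void register_groups(partial_decomposition& d, std::vector<std::vector<gid_type>> groups) {
    d.local_groups = std::move(groups);
}

// Phase 2: the only collective step. `gather` takes the local gids and
// returns the gids of all ranks; a real run would call into MPI here.
template <typename Gather>
void gather_gids(partial_decomposition& d, Gather&& gather) {
    std::vector<gid_type> local;
    for (const auto& g: d.local_groups) local.insert(local.end(), g.begin(), g.end());
    d.global_gids = gather(local);
    d.gathered = true;
}

// Phase 3: purely local again; refuses to run before the gather.
decomposition finalize(partial_decomposition d) {
    if (!d.gathered) throw std::logic_error("gather_gids must run before finalize");
    std::size_t n_local = 0;
    for (const auto& g: d.local_groups) n_local += g.size();
    return {n_local, d.global_gids.size(), std::move(d.local_groups)};
}
```

With this shape a unit test could pass a lambda that appends a fixed list of "remote" gids in place of the real gather, at the cost of carrying the intermediate state in an extra struct.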

thorstenhater · 2023-09-14T06:51:47Z

@boeschf the failure seems to be a timeout on CSCS CI/CD?!

thorstenhater · 2024-01-08T10:49:30Z

@boeschf any news here?

boeschf

sorry it took so long! looks good!

thorstenhater added 8 commits August 11, 2023 11:54

Re-factor, simplify, and optimise load balancing. 1/n

dfda303

- Remove redundant MPI call for global gid list
- Make GJ table local
- Extract connection table building.

Remove obsolete MPI methods.

204c33f

Remove super/regular cell split. Much simplification ensues.

c97e9c5

Simplify range computation.

9fa29bd

More clean-up, save on RSS.

b4e9202

- add group parameters struct to bundle info
- corral temporary structures into their own scopes to avoid RSS growth.

Add a duplicate connection to test.

b431cd7

Style and polish.

27a0aa7

More comments less work.

8195b52

thorstenhater requested review from AdhocMan and boeschf August 11, 2023 12:53

thorstenhater added 3 commits August 11, 2023 21:11

Moar simple, moar correct connection table.

cd13d2d

Also, add weird test to see that we can _almost_ construct the partition.

Reenable tests.

29f2f98

Another slight clean-up.

c86036f

boeschf reviewed Aug 17, 2023

arbor/partition_load_balance.cpp (outdated, resolved)

Simplify test.

f4ea31e

boeschf reviewed Sep 5, 2023

thorstenhater added 2 commits September 5, 2023 12:42

Remove remains of testing.

89aa985

Merge remote-tracking branch 'origin/master' into qa/clean-up-load-ba…

0aab605

…lance

Merge remote-tracking branch 'origin/master' into qa/clean-up-load-ba…

21ec625

…lance

boeschf approved these changes Feb 27, 2024

thorstenhater merged commit ed67763 into arbor-sim:master Feb 27, 2024
19 checks passed


thorstenhater commented Aug 11, 2023

boeschf Sep 5, 2023

thorstenhater Sep 5, 2023

thorstenhater commented Sep 14, 2023

thorstenhater commented Jan 8, 2024

boeschf left a comment
