Solve mathematical programs in parallel #21957

Conversation

@AlexandreAmice AlexandreAmice commented Sep 26, 2024

Closes #19119.



@AlexandreAmice AlexandreAmice left a comment

Would close #19119. I have not yet written the tests or the bindings, but I am looking for feedback on the interface design before finalizing the tests.

cc @hongkai-dai, @jwnimmer-tri for design
cc @iamsavva, @cohnt since I know you want this feature and I want to be sure the interface feels natural to you both.

Reviewable status: 2 unresolved discussions, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits), missing label for release notes (waiting on @AlexandreAmice)


solvers/BUILD.bazel line 733 at r1 (raw file):

    implementation_deps = [
        ":choose_best_solver",
        ":get_program_type",

nit: This is not useful.


solvers/solve.cc line 68 at r1 (raw file):

    DRAKE_THROW_UNLESS(
        solvers.at(solver_id)->AreProgramAttributesSatisfied(*(progs.at(i))));
  }

As far as I can tell, all of our solvers are reentrant thread-safe, and so it suffices to have only one copy of each solver. I may be wrong about this, in which case I would want each thread to have its own list of solvers.

For the sake of safety, it may be a good idea to do this anyway. Thoughts?

Code quote:

  std::unordered_map<SolverId, std::unique_ptr<SolverInterface>> solvers;
  for (int i = 0; i < ssize(progs); ++i) {
    // Identify the best solver for the program and store it in the map of
    // available solvers.
    const SolverId solver_id =
        solver_ids->at(i).value_or(ChooseBestSolver(*(progs.at(i))));
    if (!solvers.contains(solver_id)) {
      solvers.insert({solver_id, MakeSolver(solver_id)});
    }
    DRAKE_THROW_UNLESS(
        solvers.at(solver_id)->AreProgramAttributesSatisfied(*(progs.at(i))));
  }

@jwnimmer-tri jwnimmer-tri self-assigned this Sep 26, 2024
@cohnt cohnt commented Sep 27, 2024

Very happy with how the interface looks -- this should be able to do everything I'm looking for.

I'm not sure what dynamic scheduling is, but I'm guessing I shouldn't really worry about it?

@jwnimmer-tri

(We should be sure the docs explain this.)

Static scheduling means if you have N cores, each core solves 1/N'th of the programs based on a fixed schedule (e.g., every N'th program on the list). This is most efficient (less overhead) in the case where the programs all take a comparable amount of time to solve.

Dynamic scheduling is a pool where each core takes the next unsolved program on the list. This works better for programs that might take widely variable amounts of time to solve, but has higher overhead for each program (it needs some kind of semaphore or memory barrier to figure out what's next on the list and claim it).

See "Scheduling_clauses" here: https://en.wikipedia.org/wiki/OpenMP

@cohnt cohnt commented Sep 27, 2024

Ah, in that case, it is relevant for me! I would appreciate this being included in the documentation in some form.

@cohnt cohnt left a comment

Reviewable status: 4 unresolved discussions, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits), missing label for release notes (waiting on @AlexandreAmice)


solvers/solve.cc line 112 at r1 (raw file):

    // Convert the initial guess into the requisite optional.
    std::optional<Eigen::VectorXd> initial_guess{std::nullopt};
    if (initial_guesses->at(i) != nullptr) {

Need to check if initial_guesses is a nullptr before trying to dereference it. How about

if (initial_guesses != nullptr && initial_guesses->at(i) != nullptr)

Code quote:

if (initial_guesses->at(i) != nullptr)

solvers/solve.cc line 146 at r1 (raw file):

      // Convert the initial guess into the requisite optional.
      std::optional<Eigen::VectorXd> initial_guess{std::nullopt};
      if (initial_guesses->at(i) != nullptr) {

Need to check if initial_guesses is a nullptr before trying to dereference it. How about

if (initial_guesses != nullptr && initial_guesses->at(i) != nullptr)

Code quote:

if (initial_guesses->at(i) != nullptr)

@cohnt cohnt left a comment

Reviewable status: 5 unresolved discussions, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits), missing label for release notes (waiting on @AlexandreAmice)


solvers/solve.cc line 119 at r1 (raw file):

    // Solve the program.
    MathematicalProgramResult* result_local{nullptr};

This should be a MathematicalProgramResult, and then passed into solvers.at(solver_id)->Solve(...). That function is expecting a pointer that it can store its data into -- it shouldn't be nullptr.
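
In other words, something along these lines (sketch only; `initial_guess` and `solver_options_i` stand in for whatever per-program values apply):

MathematicalProgramResult result_local;
solvers.at(solver_id)->Solve(*(progs.at(i)), initial_guess, solver_options_i,
                             &result_local);
results.at(i) = std::move(result_local);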

@jwnimmer-tri jwnimmer-tri left a comment

Two questions / suggestions on the API:

(1) Now that I see that SolverInterface::Solve is phrased in terms of optional instead of nullable, we should probably do the same here to avoid extra copying.

(2) For the parameters where we want to allow either a single value (broadcast) or an array value (per program), instead of overloading we could use a variant. This would allow the user to mix and match how they want to broadcast versus not.

SolveInParallel(
    ...,
    const std::variant<
        std::optional<SolverOptions>,
        std::vector<std::optional<SolverOptions>>
            >& solver_options = {},
    const std::variant<
        std::optional<SolverId>,
        std::vector<std::optional<SolverId>>
            >& solver_ids = {},

Reviewable status: 10 unresolved discussions, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits), missing label for release notes (waiting on @AlexandreAmice)


solvers/solve.h line 54 at r1 (raw file):

 * the solver and the problem formulation. If solver_ids == nullptr then this is
 * done for every progs[i]. Uses at most parallelism cores, with dynamic
 * scheduling by default.

BTW As per the discussion above, we need to explain dynamic vs static scheduling more clearly. That might be best as a @param instead of inline in the overview here.


solvers/solve.h line 60 at r1 (raw file):

 * manner.
 *
 * @throws if initial guess and solver options are provided and not the same

typos

Suggestion:

initial_guesses or solver_options or solver_ids

solvers/solve.h line 75 at r1 (raw file):

    bool dynamic_schedule = true);

/**

nit As with code, we should not copy-paste documentation, because it's brittle and annoying to keep large sections of text identical over time as people change things.

If we keep this as two separate functions, we should change this one to explain its contract by citing the other overload and explaining what's different.


solvers/solve.cc line 68 at r1 (raw file):

Previously, AlexandreAmice (Alexandre Amice) wrote…

As far as I can tell, all of our solvers are reentrant thread-safe and so it suffices to have only one copy of each solver. I may be wrong about this, in which case I would want to have each thread have it's own list of solvers.

For the case of safety, it may be a good idea to do this anyway. Thoughts?

The right question isn't whether they currently happen to be thread safe, it's whether we want to document them as such, and thus write regression tests for all of them to guard that documented promise. Currently, only SNOPT has such a regression test.

It is probably easiest to not try to coalesce them in this first PR. We can always circle back and do that in a follow-up if we find that it would be meaningfully faster.

Note that for the non-coalesced design, we should not pre-allocate all 1000 solvers. Each inner work loop should create (and destroy) the solver it needs moment by moment.
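
In sketch form, that non-coalesced worker could look roughly like this (illustrative only; everything other than ChooseBestSolver, MakeSolver, and SolverInterface::Solve is a made-up name):

#include <memory>
#include <optional>
#include <vector>

#include "drake/solvers/choose_best_solver.h"
#include "drake/solvers/mathematical_program.h"
#include "drake/solvers/mathematical_program_result.h"
#include "drake/solvers/solver_interface.h"

namespace drake {
namespace solvers {

// Illustrative only: the i'th unit of work creates (and destroys) its own
// solver instead of reading from a shared, pre-allocated list.
void SolveIth(const std::vector<const MathematicalProgram*>& progs,
              const std::vector<std::optional<SolverId>>& solver_ids, int i,
              std::vector<MathematicalProgramResult>* results) {
  const SolverId solver_id =
      solver_ids[i].value_or(ChooseBestSolver(*progs[i]));
  const std::unique_ptr<SolverInterface> solver = MakeSolver(solver_id);
  solver->Solve(*progs[i], /* initial_guess = */ std::nullopt,
                /* solver_options = */ std::nullopt, &(*results)[i]);
  // `solver` is destroyed here, releasing any license it held.
}

}  // namespace solvers
}  // namespace drake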


solvers/solve.cc line 160 at r1 (raw file):

                                   &(results.at(i)));
    } else {
      results[i] = *results_parallel[i];

It's somewhat wasteful to copy around the results again. How about having the workers solve directly into the final results, with a separate bookkeeping vector for those which were unsafe? (Don't use vector<bool>.)
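
That is, roughly this shape (sketch; result_is_populated is a made-up name, and a byte vector is used to avoid std::vector<bool>):

std::vector<MathematicalProgramResult> results(progs.size());
// One byte per program recording whether the parallel pass wrote results[i].
std::vector<uint8_t> result_is_populated(progs.size(), 0);

// In the parallel worker, for a thread-safe progs[i]:
//   solver->Solve(..., &results[i]);
//   result_is_populated[i] = 1;

// Afterwards, a serial pass handles whatever was skipped:
for (size_t i = 0; i < progs.size(); ++i) {
  if (!result_is_populated[i]) {
    // ... solve progs[i] serially, writing directly into results[i] ...
  }
}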


solvers/solve.cc line 178 at r1 (raw file):

  }));
  if (initial_guesses != nullptr) {
    DRAKE_THROW_UNLESS(progs.size() == initial_guesses->size());

nit The next three checks all have the LHS and RHS switched. The suspicious quantity being examined should appear on the LHS.

@jwnimmer-tri

Two questions / suggestions on the API: ...

Actually, let me retract that suggestion for now. Doing variant probably increases the copying, and doing optional probably helps C++ callers but hurts Python callers. I need to pencil out the details more carefully.

@jwnimmer-tri jwnimmer-tri left a comment

(1) Okay, I think actually we should stick with nullptr instead of optional and (eventually) change SolverBase to accept nullptr to save the extra copying from pydrake.

(2) Using variant is not efficient for C++ callers, so we should avoid that.

In short, I think the proposed two function signatures are good as-is.

Reviewable status: 11 unresolved discussions, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits), missing label for release notes (waiting on @AlexandreAmice)


solvers/solve.cc line 198 at r1 (raw file):

    const std::optional<SolverId>& solver_id, const Parallelism parallelism,
    bool dynamic_schedule) {
  DRAKE_THROW_UNLESS(std::all_of(progs.begin(), progs.end(), [](auto prog) {

There is no reason for using the Impl stuff and repeating these argument checks in both overloads. This overload can broadcast the solver_options into a std::vector and then call the other overload, like we already do for the IDs.
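
For example, something along these lines (a sketch of the delegation with abbreviated signatures, not the final code):

std::vector<MathematicalProgramResult> SolveInParallel(
    const std::vector<const MathematicalProgram*>& progs,
    const std::vector<const Eigen::VectorXd*>* initial_guesses,
    const SolverOptions* solver_options,
    const std::optional<SolverId>& solver_id, Parallelism parallelism,
    bool dynamic_schedule) {
  // Broadcast the single options/id into per-program vectors, then delegate
  // to the vector overload, which does all of the argument checking.
  const std::vector<const SolverOptions*> options_vec(progs.size(),
                                                      solver_options);
  const std::vector<std::optional<SolverId>> ids_vec(progs.size(), solver_id);
  return SolveInParallel(progs, initial_guesses, &options_vec, &ids_vec,
                         parallelism, dynamic_schedule);
}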

@AlexandreAmice AlexandreAmice left a comment

Thanks for taking a look at that! I hope to get to this tomorrow.

Reviewable status: 11 unresolved discussions, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits), missing label for release notes


solvers/solve.cc line 68 at r1 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

The right question isn't whether they currently happen to be thread safe, it's whether we want to document them as such, and thus write regression tests for all of them to guard that documented promise. Currently, only SNOPT has such a regression test.

It is probably easiest to not try to coalesce them in this first PR. We can always circle back and do that in a follow-up if we find that it would be meaningfully faster.

Note that for the non-coalesced design, we should not pre-allocate all 1000 solvers. Each inner work loop should create (and destroy) the solver it needs moment by moment.

To avoid having to create and destroy each solver I was thinking about having the solvers stored as:

std::vector<std::unordered_map<SolverId, std::unique_ptr<SolverInterface>>> solvers(num_threads);

The ith entry of this vector would be all the solvers that thread i has needed up until this moment. If it encounters a program which needs a solver that the ith thread has not yet used, then it constructs that solver, otherwise it just accesses the already made solver.

My reasoning is if I want to solve thousands of small programs, I don't want to be allocating the solver memory thousands of times.
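
Concretely, the per-thread lookup would be something like (sketch; get_or_make_solver is a made-up name):

std::vector<std::unordered_map<SolverId, std::unique_ptr<SolverInterface>>>
    solvers(parallelism.num_threads());

// Illustrative only: build a solver the first time this thread needs that
// SolverId; reuse it for later programs handled by the same thread.
const auto get_or_make_solver =
    [&solvers](int thread_num,
               const SolverId& solver_id) -> const SolverInterface& {
  auto& cache = solvers[thread_num];
  auto it = cache.find(solver_id);
  if (it == cache.end()) {
    it = cache.emplace(solver_id, MakeSolver(solver_id)).first;
  }
  return *it->second;
};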

@jwnimmer-tri jwnimmer-tri left a comment

Reviewable status: 11 unresolved discussions, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits), missing label for release notes (waiting on @AlexandreAmice)


solvers/solve.cc line 68 at r1 (raw file):

Previously, AlexandreAmice (Alexandre Amice) wrote…

To avoid having to create and destroy each solver I was thinking about having the solvers stored as:

std::vector<std::unordered_map<SolverId, std::unique_ptr<SolverInterface>>> solvers(num_threads);

The ith entry of this vector would be all the solvers that thread i has needed up until this moment. If it encounters a program which needs a solver that the ith thread has not yet used, then it constructs that solver, otherwise it just accesses the already made solver.

My reasoning is if I want to solve thousands of small programs, I don't want to be allocating the solver memory thousands of times.

That approach seems safe, but I'm not sure it's necessary. I would suggest running a performance experiment and see whether it matters or not.

My mental model is that creating and destroying solver instances is an order (or two) of magnitude cheaper than the other things that happen during solving (e.g., populating the results), so it would not actually make much of a difference compared to how much extra complexity that would introduce in the implementation.

There is also the problem that certain solver instances (e.g., mosek) hold a license as a member field, so keeping them around longer than strictly necessary might actually make things worse (in case license seats are a limited resource).

@AlexandreAmice AlexandreAmice left a comment

Reviewable status: 11 unresolved discussions, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits), missing label for release notes


solvers/solve.cc line 68 at r1 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

That approach seems safe, but I'm not sure it's necessary. I would suggest running a performance experiment and see whether it matters or not.

My mental model is that creating and destroying solver instances is an order (or two) of magnitude cheaper than the other things that happen during solving (e.g., populating the results), so it would not actually make much of a difference compared to how much extra complexity that would introduce in the implementation.

There is also the problem that certain solver instances (e.g., mosek) hold a license as a member field, so keeping them around longer than strictly necessary might actually make things worse (in case license seats are a limited resource).

The license seats are actually an excellent reason why I shouldn't keep the solvers around, and in fact they reveal a subtlety I had not considered.

Say that an organization has two licenses to a solver (e.g., MOSEK) and they attempt to solve in parallel using 16 threads. If I create 16 solvers, this will crash their code, as they don't have enough licenses to solve their problems. However, if I created only one solver and re-used it across all the threads, then they can safely use 16 threads.

Should I:
1.) Only create as many solvers of a particular kind as there are licenses? How would I even do this without a try-catch?
2.) Allow the error and throw a better error message?
3.) Stick with the current design and test our solvers for the promise that they are re-entrant thread safe?

@jwnimmer-tri jwnimmer-tri left a comment

Reviewable status: 11 unresolved discussions, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits), missing label for release notes (waiting on @AlexandreAmice)


solvers/solve.cc line 68 at r1 (raw file):

However, if I created only one solver and re-used it across all the threads, then they can safely use 16 threads.

I suppose they could use 16 threads in some sense, but I think only one at a time? As I understand it, each (e.g.) mosek seat buys you one concurrent call to (e.g.) MSK_optimizetrm. They would probably be better served by using a serial solve and giving MOSEK itself all 16 cores.

My initial thought is that the user should pass in a parallelism = N which does not exceed the number of seats they have purchased. Then, so long as each thread has at most 1 solver at a time, we know that the seats will never be exceeded.

@AlexandreAmice AlexandreAmice left a comment

I believe this is a complete PR now. Are there any volunteers for a thorough feature review?

Reviewable status: 4 unresolved discussions, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits), missing label for release notes (waiting on @AlexandreAmice)


solvers/solve.cc line 68 at r1 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

However, if I created only one solver and re-used it across all the threads, then they can safely use 16 threads.

I suppose they could use 16 threads in some sense, but I think only one at a time? As I understand it, each (e.g.) mosek seat buys you one concurrent call to (e.g.) MSK_optimizetrm. They would probably be better served by using a serial solve and giving MOSEK itself all 16 cores.

My initial thought is that the user should pass in a parallelism = N which does not exceed the number of seats they have purchased. Then, so long as each thread has at most 1 solver at a time, we know that the seats will never be exceeded.

I added a note to this effect in the documentation. I kept the cached map scheme in the code however.


solvers/solve.cc line 112 at r1 (raw file):

Previously, cohnt (Thomas Cohn) wrote…

Need to check if initial_guesses is a nullptr before trying to dereference it. How about

if (initial_guesses != nullptr && initial_guesses->at(i) != nullptr)

Done.


solvers/solve.cc line 119 at r1 (raw file):

Previously, cohnt (Thomas Cohn) wrote…

This should be a MathematicalProgramResult, and then passed into solvers.at(solver_id)->Solve(...). That function is expecting a pointer that it can store its data into -- it shouldn't be nullptr.

Done.


solvers/solve.cc line 146 at r1 (raw file):

Previously, cohnt (Thomas Cohn) wrote…

Need to check if initial_guesses is a nullptr before trying to dereference it. How about

if (initial_guesses != nullptr && initial_guesses->at(i) != nullptr)

Done.

@jwnimmer-tri jwnimmer-tri left a comment

After we come to an agreement on the solver instance caching (see below), my plan is to push a bunch of cleanups to the implementation for simplicity, conciseness, and reduced copying.

Reviewable status: 5 unresolved discussions, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits), missing label for release notes (waiting on @AlexandreAmice)


solvers/solve.cc line 68 at r1 (raw file):

Previously, AlexandreAmice (Alexandre Amice) wrote…

I added a note to this effect in the documentation. I kept the cached map scheme in the code however.

The solver caching is a plausible design, but as I said a handful of messages upthread, keeping the caching means we need to lay a lot more groundwork first -- write a regression test for each one of our solvers, checking that it is threadsafe (and having that pass under all of our CI flavors). Currently only SNOPT has such a test. I still think it would be easier for this first spin not to cache. The only possibly slow thing about create-destroy is the license check for two of the proprietary solvers, which users can (for now) work around by calling AcquireLicense themselves before doing anything in their app, in case it's a real problem. They would need to do that anyway, in case they are calling a higher-layer algorithm which itself calls SolveInParallel more than once.
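
For reference, the AcquireLicense workaround is roughly (sketch; assumes a MOSEK-enabled build, and the return type is elided via auto):

#include "drake/solvers/mosek_solver.h"

int main() {
  // Illustrative only: hold the license for the lifetime of the application,
  // so repeated solver create/destroy cycles (and repeated SolveInParallel
  // calls) do not re-acquire it each time.
  const auto mosek_license = drake::solvers::MosekSolver::AcquireLicense();
  // ... build programs and call SolveInParallel(...) as often as needed while
  //     `mosek_license` remains in scope ...
  return 0;
}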


solvers/solve.cc line 113 at r3 (raw file):

    }
    const SolverInterface* solver = solvers.at(thread_num).at(solver_id).get();
    DRAKE_THROW_UNLESS(solver->AreProgramAttributesSatisfied(*(progs.at(i))));

nit We don't need this check. The SolverBase already checks for this and throws, with a much nicer message.

@jwnimmer-tri jwnimmer-tri left a comment

Okay, with a closer look, this is ready for me to push/propose some changes. I'll try to get that by tomorrow.

Reviewable status: 4 unresolved discussions, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits), missing label for release notes (waiting on @AlexandreAmice)


solvers/solve.cc line 68 at r1 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

The solver caching is a plausible design, but as I said a handful of messages upthread, keeping the caching means we need to lay a lot more groundwork first -- write a regression test for each one of our solvers, checking that it is threadsafe (and having that pass under all of our CI flavors). Currently only SNOPT has such a test. I still think it would be easier for this first spin not to cache. The only possibly slow thing about create-destroy is the license check for two of the proprietary solvers, which users can (for now) work around by calling AcquireLicense themselves before doing anything in their app, in case it's a real problem. They would need to do that anyway, in case they are calling a higher-layer algorithm which itself calls SolveInParallel more than once.

Oops, I take it back. I misread the code -- it has a separate cache per thread. That's fine as-is.

AlexandreAmice and others added 6 commits October 4, 2024 11:41
* remove the pointless impl indirection
* remove non-GSG const in header
* don't pay the cost of broadcasting null options
fix bug where `parallelism` was not obeyed in the serial section
@jwnimmer-tri jwnimmer-tri left a comment

I pushed some changes. Please have a look.

Dismissed @cohnt from 3 discussions.
Reviewable status: 1 unresolved discussion, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits), missing label for release notes (waiting on @AlexandreAmice)


bindings/pydrake/solvers/test/mathematicalprogram_test.py line 1530 at r4 (raw file):

                         mp.SolutionResult.kSolutionResultNotSet)

    def test_solve_in_parallel(self):

This is missing test coverage where an argument is non-None but its list contains a mixture of i'th-specific values and None.

I think it would be fine just to have the hard-coded initial_guesses and solver_ids and solver_options lists have 1 or 2 null elements each. We don't really need to vary our test cases to have different amounts of nulls, we just need coverage of at least one mixture.

@jwnimmer-tri jwnimmer-tri left a comment

Reviewable status: 2 unresolved discussions, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @AlexandreAmice)


tools/workspace/coinutils_internal/defs.bzl line 126 at r17 (raw file):

            # On Clang 12, "-w" doesn't suppress this for some reason.
            "-Wno-register",
            # The Coin family of software uses NDEBUG for gross things, not

Working

Needs #22063 to merge first. (This change is due to that PR, so should not be reviewed here.)

@jwnimmer-tri jwnimmer-tri left a comment

Reviewed 1 of 1 files at r18, all commit messages.
Reviewable status: 1 unresolved discussion, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @AlexandreAmice)

@jwnimmer-tri

+@calderpg-tri for feature review, please.

@calderpg-tri calderpg-tri left a comment

Reviewed 1 of 16 files at r8, 1 of 4 files at r13, 10 of 13 files at r14, 1 of 2 files at r15, 1 of 5 files at r17, 1 of 1 files at r18, 1 of 2 files at r19, 1 of 2 files at r20, all commit messages.
Reviewable status: 5 unresolved discussions, LGTM missing from assignees jwnimmer-tri(platform),calderpg-tri, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @AlexandreAmice)


solvers/solve.h line 83 at r20 (raw file):

std::vector<MathematicalProgramResult> SolveInParallel(
    const std::vector<const MathematicalProgram*>& progs,
    const std::vector<const Eigen::VectorXd*>* initial_guesses,

If these are optional, wouldn't it make sense to default them to nullptr?


solvers/solve.cc line 80 at r20 (raw file):

      solvers(parallelism.num_threads());

  // The worker lambda behaves slightly differently depending on whether we are

Something you may want to do here to improve clarity would be to use a couple of lambdas so the call sites are clearer:

const auto solve_ith = [&](const bool in_parallel, const int thread_num, const int64_t i) {
  ...
};

const auto solve_ith_parallel = [&](const int thread_num, const int64_t i) {
  solve_ith(/* in parallel */ true, thread_num, i);
};

const auto solve_ith_serial = [&](const int64_t i) {
  solve_ith(/* in parallel */ false, /* thread_num */ 0, i);
};

solvers/solve.cc line 107 at r20 (raw file):

    }

    // Find (or create) the specified solver.

btw In addition to the {Dynamic,Static}ParallelForIndexLoop, CRU also provides {Dynamic,Static}ParallelForRangeLoop which (particularly in the static case) would allow you to reduce the frequency of some lookups to once per range rather than once per index. If you went that route, you would probably end up with a structure like

const auto solve_range = [&](const ThreadWorkRange& work_range) {
  // Get solvers for thread.

  // Do any common SolverOptions config.

  for (int64_t i = work_range.GetRangeStart(); i < work_range.GetRangeEnd(); ++i) {
    if (not_safe_in_parallel(i)) {
      continue;
    }

    // Do per-index SolverOptions config.

    // Solve.
  }
};

solvers/solve.cc line 149 at r20 (raw file):

  // Call solve_ith in parallel for all of the progs.
  if (parallelism.num_threads() > 1) {
    (dynamic_schedule ? &(DynamicParallelForIndexLoop<decltype(solve_ith)>)

I think it would be clearer to write this more simply like

if (dynamic_schedule) {
  DynamicParallelForIndexLoop(...);
} else {
  StaticParallelForIndexLoop(...);
}

For only two cases, the duplication isn't a problem.


solvers/solve.cc line 162 at r20 (raw file):

  // Finish solving the programs that couldn't be solved in parallel.
  operating_in_parallel = false;
  for (int i = 0; i < ssize(progs); ++i) {

It's a little odd to run the index through three different types (int, size_t, int64_t). This could instead be

for (size_t i = 0; i < progs.size(); ++i) {
  if (!result_is_populated[i]) {
    solve_ith(/* thread_num */ 0, static_cast<int64_t>(i));
  }
}

@jwnimmer-tri jwnimmer-tri left a comment

Reviewed 2 of 2 files at r19, 1 of 2 files at r20, all commit messages.
Reviewable status: 5 unresolved discussions, LGTM missing from assignees jwnimmer-tri(platform),calderpg-tri, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @AlexandreAmice)


solvers/solve.h line 83 at r20 (raw file):

Previously, calderpg-tri wrote…

If these are optional, wouldn't it make sense to default them to nullptr?

That would lead to overload ambiguity with the next function, in case the user only passed progs.

@calderpg-tri calderpg-tri left a comment

Reviewed 1 of 5 files at r17.
Reviewable status: 6 unresolved discussions, LGTM missing from assignees jwnimmer-tri(platform),calderpg-tri, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @AlexandreAmice)


bindings/pydrake/solvers/solvers_py_mathematicalprogram.cc line 1687 at r20 (raw file):

      .def("GetProgramType", &solvers::GetProgramType, doc.GetProgramType.doc)
      .def(
          "SolveInParallel",

I believe both of these bindings should get py::call_guard<py::gil_scoped_release>() to avoid potential deadlocks
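
For reference, a minimal pybind11 sketch of what that guard does (illustrative only; not Drake's actual binding code):

#include <pybind11/pybind11.h>

namespace py = pybind11;

PYBIND11_MODULE(gil_release_example, m) {
  m.def(
      "long_running_call",
      [](int n) {
        // Long-running C++ work (e.g., a parallel solve) would go here.
        // Because of the call_guard below, the GIL is released for the
        // duration, so other Python threads keep running and cannot deadlock
        // against a GIL held across the C++ call.
        return n * 2;
      },
      py::call_guard<py::gil_scoped_release>());
}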

@calderpg-tri calderpg-tri left a comment

Reviewable status: 5 unresolved discussions, LGTM missing from assignees jwnimmer-tri(platform),calderpg-tri, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @AlexandreAmice)


solvers/solve.h line 83 at r20 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

That would lead to overload ambiguity with the next function, in case the user only passed progs.

Ah, true.

@AlexandreAmice AlexandreAmice left a comment

Reviewable status: 5 unresolved discussions, LGTM missing from assignees jwnimmer-tri(platform),calderpg-tri, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits)


solvers/solve.cc line 80 at r20 (raw file):

Previously, calderpg-tri wrote…

Something you may want to do here to improve clarity would be to use a couple of lambdas so the call sites are clearer:

const auto solve_ith = [&](const bool in_parallel, const int thread_num, const int64_t i) {
  ...
};

const auto solve_ith_parallel = [&](const int thread_num, const int64_t i) {
  solve_ith(/* in parallel */ true, thread_num, i);
};

const auto solve_ith_serial = [&](const int64_t i) {
  solve_ith(/* in parallel */ false, /* thread_num */ 0, i);
};

Done.


solvers/solve.cc line 107 at r20 (raw file):

Previously, calderpg-tri wrote…

btw In addition to the {Dynamic,Static}ParallelForIndexLoop, CRU also provides {Dynamic,Static}ParallelForRangeLoop which (particularly in the static case) would allow you to reduce the frequency of some lookups to once per range rather than once per index. If you went that route, you would probably end up with a structure like

const auto solve_range = [&](const ThreadWorkRange& work_range) {
  // Get solvers for thread.

  // Do any common SolverOptions config.

  for (int64_t i = work_range.GetRangeStart(); i < work_range.GetRangeEnd(); ++i) {
    if (not_safe_in_parallel(i)) {
      continue;
    }

    // Do per-index SolverOptions config.

    // Solve.
  }
};

I think I am okay with the for-index loop. Unless you think there is a substantial impact on performance, the current code reads fairly well.


solvers/solve.cc line 149 at r20 (raw file):

Previously, calderpg-tri wrote…

I think it would be clearer to write this more simply like

if (dynamic_schedule) {
  DynamicParallelForIndexLoop(...);
} else {
  StaticParallelForIndexLoop(...);
}

For only two cases, the duplication isn't a problem.

Done.


solvers/solve.cc line 162 at r20 (raw file):

Previously, calderpg-tri wrote…

It's a little odd to run the index through three different types (int, size_t, int64_t). This could instead be

for (size_t i = 0; i < progs.size(); ++i) {
  if (!result_is_populated[i]) {
    solve_ith(/* thread_num */ 0, static_cast<int64_t>(i));
  }
}

Done.


bindings/pydrake/solvers/solvers_py_mathematicalprogram.cc line 1687 at r20 (raw file):

Previously, calderpg-tri wrote…

I believe both of these bindings should get py::call_guard<py::gil_scoped_release>() to avoid potential deadlocks

Done.

@jwnimmer-tri jwnimmer-tri left a comment

+@SeanCurtis-TRI for platform review per schedule, please.

:lgtm: from me as co-author.

Reviewed 2 of 2 files at r21, all commit messages.
Reviewable status: 5 unresolved discussions, LGTM missing from assignees calderpg-tri,SeanCurtis-TRI(platform), commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @AlexandreAmice)


bindings/pydrake/solvers/solvers_py_mathematicalprogram.cc line 1687 at r20 (raw file):

Previously, AlexandreAmice (Alexandre Amice) wrote…

Done.

Yes, very important! Good catch.

@calderpg-tri calderpg-tri left a comment

Reviewed 1 of 2 files at r6, 1 of 2 files at r19, 1 of 2 files at r20, 2 of 2 files at r21, all commit messages.
Reviewable status: 1 unresolved discussion, LGTM missing from assignees SeanCurtis-TRI(platform),calderpg-tri, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @AlexandreAmice)


solvers/test/solve_in_parallel_test.cc line 239 at r21 (raw file):

with Drake CI's sanitizers and memory checkers, this would flag memory or thread
errors seen during the parallel solves. */
class SolveInParallelIntegrationTest : public ::testing::Test {

As far as I can tell, these tests don't actually cover the case where Parallelism::None() is used, or cover a case where some programs can be solved in parallel and some in serial. You probably don't need a whole test fixture to test all solvers in serial, but a couple cases are necessary.
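
For example, one small extra case along these lines would exercise the serial path (sketch only; the argument order follows the vector overload discussed above):

GTEST_TEST(SolveInParallelTest, NoParallelismSketch) {
  // Illustrative only: ten copies of a trivial LP, solved with no parallelism.
  MathematicalProgram prog;
  const auto x = prog.NewContinuousVariables<1>();
  prog.AddLinearCost(x[0]);
  prog.AddBoundingBoxConstraint(0, 1, x[0]);
  const std::vector<const MathematicalProgram*> progs(10, &prog);
  const std::vector<MathematicalProgramResult> results = SolveInParallel(
      progs, /* initial_guesses = */ nullptr, /* solver_options = */ nullptr,
      /* solver_ids = */ nullptr, Parallelism::None(),
      /* dynamic_schedule = */ false);
  for (const MathematicalProgramResult& result : results) {
    EXPECT_TRUE(result.is_success());
  }
}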

@AlexandreAmice AlexandreAmice left a comment

Reviewable status: 1 unresolved discussion, LGTM missing from assignees SeanCurtis-TRI(platform),calderpg-tri, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits)


solvers/test/solve_in_parallel_test.cc line 239 at r21 (raw file):

Previously, calderpg-tri wrote…

As far as I can tell, these tests don't actually cover the case where Parallelism::None() is used, or cover a case where some programs can be solved in parallel and some in serial. You probably don't need a whole test fixture to test all solvers in serial, but a couple cases are necessary.

Done. Good catch.

In the test of diverse programs, I cover the case where some programs can be solved in parallel and others must be solved serially.

I changed the test fixture to test both Parallelism::Max() and Parallelism::None().

@calderpg-tri calderpg-tri left a comment

:lgtm:

Reviewed 1 of 1 files at r22, all commit messages.
Reviewable status: LGTM missing from assignee SeanCurtis-TRI(platform), commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @AlexandreAmice)

@SeanCurtis-TRI SeanCurtis-TRI left a comment

:LGTM: With some small notes

Reviewed 1 of 2 files at r6, 1 of 16 files at r8, 1 of 4 files at r13, 10 of 13 files at r14, 1 of 2 files at r15, 1 of 5 files at r17, 1 of 1 files at r18, 1 of 2 files at r19, 1 of 2 files at r20, 2 of 2 files at r21, 1 of 1 files at r22, all commit messages.
Reviewable status: 9 unresolved discussions, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @AlexandreAmice)


solvers/solve.h line 51 at r22 (raw file):

 * solver_options[i] if given, by invoking the solver at solver_ids[i] if
 * provided.  If solver_ids[i] is nullopt then the best available solver is
 * selected for each progs[i] individually depending on the availability of

nit editorial

The spelling solver_ids[i] implies we're talking about a single choice which corresponds to a single progs[i].

Suggestion:

selected for progs[i] 

solvers/solve.h line 78 at r22 (raw file):
BTW Subtle rhetorical thought.

Would you consider solver_ids[i] = nullptr (or solver_ids = nullptr) as having "specified" solver_ids[i]? Alternative language:

if any program cannot be solved.

Also, the language of using the subscript i seems to suggest we're missing the phrase, "for any valid i" (or some such thing). In other words, if one program has a problem, an exception messes everything up.

Code quote:

specified

solvers/test/solve_in_parallel_test.cc line 42 at r22 (raw file):

  std::vector<std::unique_ptr<MathematicalProgram>> progs;
  std::vector<const MathematicalProgram*> prog_ptrs;
  std::vector<const Eigen::VectorXd*> initial_guesses;

BTW Given the API that allows you to specify one solver id and one solver options, why not use that API in this test?


solvers/test/solve_in_parallel_test.cc line 59 at r22 (raw file):

  symbolic::Variable x("x", symbolic::Variable::Type::BINARY);
  symbolic::Variable y("y");
  symbolic::Variable z("y");

nit: Two variables with the same symbol, "y"? Surely a typo, yes?

Code quote:

("y"

solvers/test/solve_in_parallel_test.cc line 63 at r22 (raw file):

  vars << x, y, z;

  int num_trials = 10;

nit: const (here and below).


solvers/test/solve_in_parallel_test.cc line 65 at r22 (raw file):

  int num_trials = 10;
  for (int i = 0; i < num_trials; i++) {
    progs.push_back(std::make_unique<MathematicalProgram>());

BTW As I read this, every program is identical. You're just creating the same program over and over. In a test below, you simply copy a single program pointer N times. Why not do that here? Why create unique instances?


solvers/test/solve_in_parallel_test.cc line 101 at r22 (raw file):

  std::vector<const SolverOptions*> solver_options;
  std::vector<std::optional<SolverId>> solver_ids;
  int num_trials = 10;

nit: const


solvers/test/solve_in_parallel_test.cc line 202 at r22 (raw file):

  // if certain costs/constraints are flagged for different thread safety in the
  // future, this test will fail and need to be updated.
  EXPECT_TRUE(convex_prog.IsThreadSafe());

nit: This should probably be ASSERT_TRUE? Does it make sense to proceed if any of these flags fail?


solvers/test/solve_in_parallel_test.cc line 209 at r22 (raw file):

  std::vector<const MathematicalProgram*> progs;
  std::vector<std::optional<SolverId>> solver_ids;
  int num_trials = 10;

nit: const

@AlexandreAmice AlexandreAmice left a comment

Reviewable status: 1 unresolved discussion, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits)


solvers/test/solve_in_parallel_test.cc line 42 at r22 (raw file):

Previously, SeanCurtis-TRI (Sean Curtis) wrote…

BTW Given the API that allows you to specify one solver id and one solver options, why not use that API in this test?

Good call.

@AlexandreAmice AlexandreAmice added the "status: squashing now" label Oct 24, 2024
@AlexandreAmice AlexandreAmice left a comment

+(status: squashing now)

Waiting for CI if everyone is satisfied. Thank you everyone for your help!

Reviewable status: :shipit: complete! all discussions resolved, LGTM from assignees jwnimmer-tri(platform),SeanCurtis-TRI(platform),calderpg-tri

@SeanCurtis-TRI SeanCurtis-TRI left a comment

Reviewed 2 of 2 files at r23, all commit messages.
Reviewable status: :shipit: complete! all discussions resolved, LGTM from assignees jwnimmer-tri(platform),SeanCurtis-TRI(platform),calderpg-tri


solvers/solve.h line 51 at r22 (raw file):

Previously, SeanCurtis-TRI (Sean Curtis) wrote…

nit editorial

The spelling solver_ids[i] implies we're talking about a single choice which corresponds to a single progs[i].

You have to delete the word "each" as well.

Labels: release notes: feature, status: squashing now