[Roadmap area] Solver improvements #3910

brosaplanella · 2024-03-20T13:29:35Z

I'd like to continue making pybamm easier to work with for the parameter inference libraries like PyBOP (https://github.com/pybop-team/PyBOP). At the moment I think this includes:

GPU as well as multithreaded support for running many simulations in parallel (Add support for MLIR-based expression evaluation #3826)
Integrate sensitivities more into Simulation (generalise simulation, include sensititivies #3834)
Run multiple simulations from different starting points (IDAKLU solver: add option for multiple initial conditions #3713). Should be possible to reset the starting points during the middle of a solve
This is generally "making it faster", so profiling would fit in here (see delay xarray.DataArray initialization #3862)
others?....

Originally posted by @martinjrobins in #3839 (comment)

rtimms · 2024-06-07T15:18:40Z

Not solver, but this is related to speed-up: #4058

martinjrobins · 2024-07-15T18:17:01Z

MarcBerliner · 2024-07-31T15:02:10Z

Here are my thoughts on improving the solver. Please let me know if you have comments or questions @martinjrobins and others.

Solver options

Add more idaklu solver options (number 2) #4282

Initialization

Improve consistent initialization speed and robustness #4301

Time stepping

In PyBaMM, the solver currently stops at all values in the t_eval vector. It seems that we use t_eval and set some dt for a few distinct purposes:

To enforce time-dependent discontinuities within a single step, like with a drive cycle
To set the data collection frequency (as in the period experiment kwarg)
Setting a small dt to force better solver convergence (take smaller time steps)

These three reasons for setting a dt are all valid, but stopping the simulation at all time steps can have a drastic impact on the adaptive time stepping and performance. For example, consider a full C/10 discharge with

a. t_eval = [0, 36000] (i.e., no stops)
b. t_eval = np.arange(0, 36060, 60) (pybamm default)

If we compare the solver stats,

Number of steps: a. 165 vs b. 715
Number of residual calls: a. 288 vs b. 823
Number of linear solver setups: a. 28 vs b. 91
Number of nonlinear iterations: a. 286 vs b. 821
DAE integration time: a. 25.5 ms vs b. 97.1 ms

Even though we solve the same system, the dense t_eval b. is nearly 4x slower! To address these issues, I propose the following changes that align the time-stepping options with Julia's differential equation solvers (see Output Control):

(Breaking) By default, save every t and y determined by IDA's adaptive time stepping algorithm. This eliminates issues like the one above with the C/10 discharge. We can accurately post-interpolate the solution onto specific times with IDA's Hermite interpolator. This is a huge benefit for parameter estimation because we will always obtain the full solution accuracy regardless of the data collection frequency.
(Non-breaking) Change the description of t_eval to match Julia's tstops: "Denotes extra times that the time-stepping algorithm must step to. This should be used to help the solver deal with discontinuities and singularities, since stepping exactly at the time of the discontinuity will improve accuracy." With this option, drive cycles in 1. still work
(Non-breaking) Add a solver option that matches Julia's saveat: "Denotes specific times to save the solution at, during the solving phase. The solver will save at each of the timepoints in this array in the most efficient manner available to the solver." This addresses the data collection frequency in 2. without negatively affecting performance since it interpolates the solution with IDAGetDky().
(Non-breaking) Discourage modifying t_eval for performance issues and encourage modifying appropriate solver options (rtol, atol, dt_max, etc.), which addresses 3.

martinjrobins · 2024-07-31T18:55:39Z

yea, agree that it would be great to get rid of IDASetStopTime! or at least reduce the number of times we use it. Would this play nicely with the casadi solver?

MarcBerliner · 2024-07-31T21:00:29Z

Would this play nicely with the casadi solver?

If we can access IDA's internal time steps via Casadi, I think we can do most of this stuff

agriyakhetarpal · 2024-08-06T14:35:41Z

Corollary from the PyBaMM developer meeting on 05/08/2024 as a part of PyBaMM running in WASM:

CasADi can be compiled with the Emscripten toolchain starting with the upcoming v3.6.6 and will be available in Pyodide with the next patch or minor version (either v0.26.3 or v0.27.0).
- uses NumPy for computations, so no floating point imprecisions were noticed (so far)
Compiling IDAS doesn't take too much effort; linking with KLU could be difficult. I'm unsure if IREE is available/possible. Reference BLAS/LAPACK implementations should be available and configurable.
- the IDAKLU solver should be possible to compile in theory, but threading/OpenMP will need to be disabled via I4087-multiprocessing #4260 or similar
- the current sdist is broken, can be fixed via further changes in scikit-build-core CI Builds #4242
Best way to proceed with this is to compile PyBaMM without the IDAKLU solver for now (i.e., as a pure Python package) and provide pybamm.CasadiSolver() through a Pyodide instance, which can then be used in the docs for the example notebooks or other in-browser uses (more or less, JupyterLite)
- ship a pure Python wheel (with BUILD_IDAKLU set to OFF) in addition to platform-specific wheels, since pip always chooses the most specific wheel available

martinjrobins · 2024-08-06T14:55:25Z

@agriyakhetarpal: FYI, I've compiled IDA with KLU using enscripten in the past, see this repo: https://github.com/martinjrobins/diffeq-runtime. However I agree that focusing on the casadi solver for now is the best approach since you have it already compiled and in Pyodide

agriyakhetarpal · 2024-08-06T15:16:22Z

Thanks for the resource, @martinjrobins! I see you've built a static lib for KLU and also used NO_LAPACK – should work well. I'll tinker with the settings and maybe I can reduce the binary size a bit :)

MarcBerliner · 2024-08-27T14:58:53Z

To improve the speed of our ODE models, I'd like to add CVODE from SUNDIALS to our suite of C solvers in addition to IDA. I have a few questions about the implementation, and I'd like to hear your thoughts @jsbrittain, @martinjrobins, @pipliggins, and others.

In C++, we can make a base solver class and derived classes for IDA and CVODE. The solver code structure for both will be very similar, so this will help with code reuse. However, this will cause a headache for some of the existing PRs.
The CasadiSolver automatically determines if the system is an ODE or a DAE and passes it to the appropriate solver. I think this approach also makes sense for our C solvers. If we do take this approach, then...
One minor issue is that the IDAKLUSolver name will no longer be accurate. Since we plan on streamlining the IDAKLU installation process and changing the default away from CasadiSolver, maybe we can also rename IDAKLUSolver to SundialsSolver or something. This might make the "new and improved default solver" announcement splashier (cc @valentinsulzer @rtimms).

And just FYI, I don't plan on starting this work for at least a couple of weeks.

agriyakhetarpal · 2024-11-20T21:34:45Z

Hi @martinjrobins, an update: I've made a substantial amount of progress recently in compiling PyBaMM + the IDAKLU solver for WASM and I'm at one of the last stages for it – which is to finish up and get it to compile fully so that we have a shareable wasm32 wheel (and hopefully correctly in terms of computations, which I can't verify until we do have a wheel at hand).

I've linked a PR with my progress above, and the blocker seems to be the fact that post #4260 (which can be closed now since it was superseded, I think?) and its latter variant #4449 that was merged recently, is that I need to figure out how to turn off OpenMP/threading support completely using a build-time flag/CMake argument that I can pass on/an environment variable of sorts/a hardcoded define I can patch, that will set the number of available threads to one so that the solver's compilation can proceed with the NVector_Serial pathway and Emscripten would never attempt to compile the code for the OpenMP pathway – thus, making OpenMP optional for the compilation procedure (please see the error logs in CI for context). Could you give me any pointers towards this, in case I'm missing anything? As far as I understand, we aren't exposing setup_opts.num_threads anywhere to do this, and changes in our build scripts and compilation will be required to turn off threads entirely. I can use OMP_NUM_THREADS to control the parallelisation as you mentioned in #4087 (comment), but this seems to be something I can access only at runtime when we already have wheels and not during build time.

That aside, PyBaMM without the IDAKLU solver, i.e., with just the CasadiSolver, and as a pure Python wheel, as you suggested in #3910 (comment), should work now 🎉 We have almost all the dependencies available for PyBaMM in Pyodide, including all of the compiled ones. It should just a matter of deploying it with JupyterLite and my custom Pyodide distribution.

martinjrobins · 2024-11-21T10:25:48Z

Hi @agriyakhetarpal. I would define a new variable in the CMakeLists.txt (e.g. PYBAMM_OPENMP), that if is equal to ON (this should be the default as well), adds a compile definition using https://cmake.org/cmake/help/latest/command/add_compile_definitions.html (e.g. call the definition PYBAMM_OPENMP). This will be available at compile time and you can do stuff like:

#ifdef PYBAMM_OPENMP
  #include <nvector/nvector_openmp.h>
#endif

Then you need to go through the code and switch off any openmp related code using this compile definition

remember also to switch off the searching and linking against openmp in CMakeLists.txt (line 147-169)

martinjrobins · 2024-11-21T10:26:58Z

looking forward to trying out pybamm in the browser, let me know when you have a deployment :)

agriyakhetarpal · 2024-11-21T11:35:55Z

Thanks for the ideas, @martinjrobins; I did have something like that in mind! I'll put together a PR for this and add another patch. I'll also need changes from #4449 for it, so it might be better for me to wait until we have a new release with #4598. I'll consider moving my repository to our GitHub organisation once the deployments are up and working.

[Note to self] some more changes that be needed to be done:

make pooch optional: while pooch is a pure Python package, we also need to interact with the in-browser file system and PyBaMM won't work outside JupyterLite (offers its own file system implementation) without doing this – it might work with a mounted folder with NodeFS (but we need pooch to save to that folder, then).
make platformdirs optional and disable posthog-based telemetry
patch away the pybamm.BatchStudy class under WASM because it uses multiprocessing – the import multiprocessing statement in itself does not break Pyodide because we've patched it to do nothing, but acquiring a Pool will break and we should fail with a helpful error message.
Fail early if someone sets num_threads at runtime as more than 1 when initialising an IDAKLUSolver() object, so that we avoid internal Emscripten traceback that comes with a crash
Run test suite under WASM with out-of-tree builds and see if there are any threading/multiprocessing issues to catch
Possibly upstream PyBaMM's recipe to the Pyodide repo once the patches are minimal (it does help that I help maintain it :D).

Most of these changes can very well be upstreamed; it would only help us in the future and I wouldn't need to maintain these patches.

agriyakhetarpal · 2024-11-21T11:51:44Z

Additional notes for further reference: we don't have a recipe for SUNDIALS in Pyodide yet – there will soon be one, though, but it will be for the latest version (7.1.1) and not the version we use (6.5.0), so I've been compelled to create a custom one myself and patch out or exclude the Fortran bits however we can because we don't have a relatively stable Fortran to WASM compiler (flang is starting to offer support starting with LLVM 19).

The same goes for SuiteSparse; we have v5.11.0 in Pyodide (for SparseQR, and does not compile KLU so far), but we use 6.0.3 here in PyBaMM (side note: we're getting quite behind, though, it might be worth testing the latest 7.8.3 for our conventional targets), which lacks some of the "build system convenience" changes such as being able to pass -DSUITESPARSE_ENABLE_PROJECTS or -DUSE_FORTRAN=NO, so I've compiled AMD, COLAMD, BTF, and KLU by going into their directories one-by-one in a pattern similar to what we do in install_KLU_Sundials.py. It would be nice to not maintain this too much, either.

For CasADi, we can't simply use the PyPI/non-PyPI wheels because we need the CasADi shared objects as side modules for the IDAKLU shared object to link to. Again, it might be nice to compile it some time as a native setuptools Extension (this gives us the added benefit of speeding up some of the Windows builds for which we use vcpkg).

brosaplanella assigned martinjrobins Mar 20, 2024

martinjrobins mentioned this issue May 14, 2024

refactor multiprocessing and multiple inputs #4087

Open

martinjrobins mentioned this issue Jul 30, 2024

Improve consistent initialization speed and robustness #4301

Merged

6 tasks

martinjrobins mentioned this issue Aug 2, 2024

Time stepping in IDA - minimise solver restarts #4312

Closed

4 tasks

MarcBerliner mentioned this issue Sep 3, 2024

Add CVODE solver #4407

Open

This was referenced Oct 28, 2024

remove explicit sensitivity calculation (casadi solver) #4545

Open

do sensitivity calculations for processed variables in C++ #4547

Open

agriyakhetarpal mentioned this issue Nov 20, 2024

Get a working build for PyBaMM – with IDAKLU enabled agriyakhetarpal/pybamm-lite#1

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Roadmap area] Solver improvements #3910

[Roadmap area] Solver improvements #3910

brosaplanella commented Mar 20, 2024

rtimms commented Jun 7, 2024

martinjrobins commented Jul 15, 2024 •

edited by agriyakhetarpal

Loading

MarcBerliner commented Jul 31, 2024 •

edited

Loading

martinjrobins commented Jul 31, 2024

MarcBerliner commented Jul 31, 2024

agriyakhetarpal commented Aug 6, 2024

martinjrobins commented Aug 6, 2024 •

edited

Loading

agriyakhetarpal commented Aug 6, 2024

MarcBerliner commented Aug 27, 2024

agriyakhetarpal commented Nov 20, 2024 •

edited

Loading

martinjrobins commented Nov 21, 2024

martinjrobins commented Nov 21, 2024

agriyakhetarpal commented Nov 21, 2024

agriyakhetarpal commented Nov 21, 2024

[Roadmap area] Solver improvements #3910

[Roadmap area] Solver improvements #3910

Comments

brosaplanella commented Mar 20, 2024

rtimms commented Jun 7, 2024

martinjrobins commented Jul 15, 2024 • edited by agriyakhetarpal Loading

Sensitivities

Multithreaded/GPU support:

Post Processing optimisation and refactoring

Solver refactoring

Solver documentation

WASM build

MarcBerliner commented Jul 31, 2024 • edited Loading

Solver options

Initialization

Time stepping

martinjrobins commented Jul 31, 2024

MarcBerliner commented Jul 31, 2024

agriyakhetarpal commented Aug 6, 2024

martinjrobins commented Aug 6, 2024 • edited Loading

agriyakhetarpal commented Aug 6, 2024

MarcBerliner commented Aug 27, 2024

agriyakhetarpal commented Nov 20, 2024 • edited Loading

martinjrobins commented Nov 21, 2024

martinjrobins commented Nov 21, 2024

agriyakhetarpal commented Nov 21, 2024

agriyakhetarpal commented Nov 21, 2024

martinjrobins commented Jul 15, 2024 •

edited by agriyakhetarpal

Loading

MarcBerliner commented Jul 31, 2024 •

edited

Loading

martinjrobins commented Aug 6, 2024 •

edited

Loading

agriyakhetarpal commented Nov 20, 2024 •

edited

Loading